14941058 - (D) (P.T.A.B. Aug. 27, 2019)

Kshitiz Kumar et al.

Patent Trial and Appeal Board, Aug. 27, 2019

UNITED STATES PATENT AND TRADEMARK OFFICE

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450
www.uspto.gov

APPLICATION NO.: 14/941,058
FILING DATE: 11/13/2015
FIRST NAMED INVENTOR: Kshitiz Kumar
ATTORNEY DOCKET NO.: 357524-US-NP
CONFIRMATION NO.: 5862

69316 7590 08/27/2019
MICROSOFT CORPORATION
ONE MICROSOFT WAY
REDMOND, WA 98052

EXAMINER: AZAD, ABUL K
ART UNIT: 2657
PAPER NUMBER:
NOTIFICATION DATE: 08/27/2019
DELIVERY MODE: ELECTRONIC

Please find below and/or attached an Office communication concerning this application or proceeding.

The time period for reply, if any, is set in the attached communication.

Notice of the Office communication was sent electronically on above-indicated "Notification Date" to the following e-mail address(es):
chriochs@microsoft.com
usdocket@microsoft.com

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

Ex parte KSHITIZ KUMAR, HOSAM KHALIL, YIFAN GONG, ZIAD AL BAWAD, and CHAOJUN LIU
____________

Appeal 2018-000949
Application 14/941,058¹
Technology Center 2600
____________

Before MAHSHID D. SAADAT, CARL L. SILVERMAN, and LILAN REN, Administrative Patent Judges.

SILVERMAN, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner’s Final Rejection of claims 1–22, which constitute all pending claims. We have jurisdiction under 35 U.S.C. § 6(b). An Oral Hearing was held August 15, 2019.

We reverse.

¹ The real party in interest is identified as Microsoft Technology Licensing, LLC. App. Br. 2.

STATEMENT OF THE CASE

The invention relates to speech recognition arbitration employing Automated Speech Recognition (ASR) and confidence features. Abstract; Spec. ¶¶ 1–3, 10, 11, Figs. 1–3. Claim 1, reproduced below, is exemplary of the subject matter on appeal (emphasis added):

1.
A speech recognition system for transforming an acoustic utterance into a transcribed speech recognition result by arbitrating between speech recognition results generated by a first automated speech recognition (ASR) engine and a second ASR engine, the system comprising:
at least one memory device;
at least one processing device;
an arbitrator stored in the at least one memory device and executable by the at least one processing device, the arbitrator configured to receive a set of confidence features of an utterance and to select between a first speech recognition result representing the acoustic utterance as transcribed by the first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by the second ASR engine, the selection being based on the received set of confidence features.

App. Br. 20 (Claims Appendix).

THE REJECTIONS

Claims 1–22 are rejected under 35 U.S.C. § 102(e) as being anticipated by Endo et al. (US 7,228,275 B1; iss. June 5, 2007) (“Endo”). Final Act. 2–5.

ANALYSIS

Appellants argue that the Examiner errs in finding Endo discloses claim 1 because the Examiner’s findings are based on an improper claim interpretation and because Endo does not disclose the claim 1 limitation

the arbitrator configured to receive a set of confidence features of an utterance and to select between a first speech recognition result representing the acoustic utterance as transcribed by the first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by the second ASR engine, the selection being based on the received set of confidence features [(also referred to as “disputed limitation”)].

App. Br. 7–14; Reply Br. 3–7.

In the Final Action, the Examiner finds Endo teaches the claim 1 limitations. Final Act. 2–3. In particular, the Examiner finds that Endo teaches the disputed limitation. Id. at 3 (citing Endo 4:14–31, 5:13–6:62; Fig.
2, elements 208, 202, 2024, 206, and 102).

Appellants argue that the Examiner errs because Endo teaches arbitration based on “confidence scores,” not based on “confidence features” as recited in claim 1 (emphasis added). App. Br. 6–7. According to Appellants:

In contrast to arbitration based on “confidence features,” Endo discloses arbitration based strictly on “confidence scores.” As discussed below, “confidence scores” represent a “degree of confidence” and do not quantify “some auditory, linguistic, or syntactical aspect of the utterance compared to the potential result.” Consequently, Endo cannot be relied on as disclosing or suggesting: “an arbitrator . . . to select between a first speech recognition result . . . and a second speech recognition result . . . based on the received set of confidence features.”

Id. at 7.

Regarding the distinction between “confidence scores” and “confidence features,” Appellants argue “[i]n prior solutions, such as those described in Endo and in Applicant’s ‘Background’ section, ‘arbitration has been performed based on a ‘confidence score’ that quantifies a degree of confidence (e.g., expected accuracy) that an ASR engine has in its speech recognition results,’ (see Applicant’s specification, paragraph [0002]).” Id. Appellants provide an example in which two different ASR engines, using different language models, produce confidence scores of 92% and 86%, and the arbitrator selects and outputs the result corresponding to the higher (92%) confidence score. Id. at 8. Appellants assert that “[t]his type of arbitration can be susceptible to errors due to the fact that confidence scores may, at times, require normalization prior to comparison.” Id. (citing Endo, 5:55–6:19, discussing normalization).
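For illustration only, the score-based arbitration described in Appellants’ 92% versus 86% example can be sketched as follows. This is a minimal sketch under stated assumptions, not Endo’s disclosed implementation: the names `AsrResult` and `arbitrate_by_score`, the engine labels, and the placeholder result texts are hypothetical, and the assignment of 92% and 86% to particular engines is arbitrary. Normalization of scores prior to comparison is not modeled.

```python
from dataclasses import dataclass


@dataclass
class AsrResult:
    """One engine's best result plus its single confidence score (hypothetical type)."""
    engine: str
    text: str
    confidence_score: float  # degree of confidence, 0.0 to 1.0


def arbitrate_by_score(first: AsrResult, second: AsrResult) -> AsrResult:
    """Select whichever result carries the higher confidence score.

    The arbitrator sees only the scores, not the features used to compute them.
    """
    return first if first.confidence_score >= second.confidence_score else second


# Scores taken from Appellants' example; engine labels and texts are placeholders.
first = AsrResult(engine="engine_a", text="result A", confidence_score=0.86)
second = AsrResult(engine="engine_b", text="result B", confidence_score=0.92)
selected = arbitrate_by_score(first, second)
```

In this sketch the only input to the selection is the pair of scores, which is the property Appellants contrast with claim 1.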
Regarding the arbitration set forth in claim 1, Appellants argue:

In an example scenario employing the arbitration technology of claim 1, each of two ASR engines may interpret a same acoustic utterance: “what is today?” according to two different language and/or acoustic models. Similar to the example described above, each of the ASR engines independently selects a “best” result based on an associated “confidence score” representing a degree of confidence in the result. For example, a first ASR engine outputs a result “what is today?” and a second ASR engine outputs a result “what did you say?” These results are provided to an arbitrator along with a set of confidence features. Thus, rather than merely receiving and comparing confidence scores (e.g., 86% v. 92%), as in Endo, the arbitrator receives and compares the underlying confidence features (e.g., the features used by the ASR engines as a basis for computing the confidence scores).

In one example implementation, the arbitrator compares the computed confidence features to statistical distributions associated with each of the ASR engines to determine a likelihood that each ASR engine is correct (see, e.g., paragraphs [0019], [0020]: “the arbitration logic 128 performs a probability distribution analysis based on combinations of confidence features and observed confidence values”). This comparison allows the arbitrator to make inferences based directly on the confidence feature(s). If, for example, the arbitrator receives a confidence feature quantifying the pitch of an acoustic utterance from two different ASR engines, the arbitrator may refer to a statistical distribution (e.g., a trained dataset) to determine that a first ASR engine of the two ASR engines does not perform well within a pitch range of this particular utterance.
As a result of this inference, the arbitrator may choose to throw away the result from the first ASR engine and defer to the result of the second ASR engine. Thus, the confidence feature (e.g., the value representing pitch) allows the arbitrator to make inferences not possible when implementing the solution of Endo (e.g., arbitration is based exclusively on comparison of confidence scores). This arbitration based on confidence features “facilitate[es] richer access to data during arbitration and increase[e] arbitration success,” (paragraph [0011]).

Id. at 8–9.

Regarding the Endo teachings, Appellants argue:

The cited portions of Endo (Col. 5, line 13-Col. 6, line 62) generally disclose computation of confidence scores and arbitration based on the computed confidence scores. Endo discloses, for example, “[t]he present invention provides a speech recognition system that recognizes an input speech signal by using a plurality of speech recognizers each outputting recognized speech texts (e.g., “results”) and associated confidence scores and a decision module [e.g., arbitrator] selecting one of the speech texts based upon their associated confidence scores,” (Col. 2, lines 28-33) (emphasis added). “The decision module selects either the first speech text or the second speech text as the output speech text depending upon which of the first and second confidence scores is higher,” (Col. 2, lines 45-48; see also, Col. 6, lines 20-24 and Col. 6, lines 50-51) (emphasis added). Thus, Endo’s disclosure is consistent with the above described solutions whereby arbitration is performed based on confidence scores but not on confidence features. Endo’s arbitrator is never disclosed as receiving confidence features.

Id. at 10.
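For illustration only, the feature-based arbitration in Appellants’ pitch example can be sketched as follows. This is a sketch under stated assumptions, not the claimed or disclosed implementation: the table `ACCURACY_BY_PITCH`, its pitch bands and accuracy values, and the names `Candidate`, `estimated_accuracy`, and `arbitrate_by_features` are hypothetical stand-ins for the “statistical distribution (e.g., a trained dataset)” mentioned in the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One engine's result plus the confidence features it reports (hypothetical type)."""
    engine: str
    text: str
    features: dict  # e.g. {"pitch_hz": 240.0}


# Hypothetical per-engine statistics: (low_hz, high_hz, observed accuracy in that band).
ACCURACY_BY_PITCH = {
    "engine_a": [(0.0, 180.0, 0.95), (180.0, 1000.0, 0.60)],
    "engine_b": [(0.0, 180.0, 0.85), (180.0, 1000.0, 0.90)],
}


def estimated_accuracy(engine: str, pitch_hz: float) -> float:
    """Look up how well an engine has performed for utterances near this pitch."""
    for low, high, accuracy in ACCURACY_BY_PITCH[engine]:
        if low <= pitch_hz < high:
            return accuracy
    return 0.0  # no statistics for this pitch: treat the engine as unreliable


def arbitrate_by_features(first: Candidate, second: Candidate) -> Candidate:
    """Select the result whose engine is expected to be more accurate for this feature."""
    a = estimated_accuracy(first.engine, first.features["pitch_hz"])
    b = estimated_accuracy(second.engine, second.features["pitch_hz"])
    return first if a >= b else second


# Result texts from Appellants' example; the pitch value is an assumption.
first = Candidate("engine_a", "what is today?", {"pitch_hz": 240.0})
second = Candidate("engine_b", "what did you say?", {"pitch_hz": 240.0})
chosen = arbitrate_by_features(first, second)
```

Here the first engine’s result is discarded because the hypothetical statistics show it performing poorly in the pitch range of the utterance, an inference that a bare comparison of two confidence scores would not support.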
Appellants further argue that the terms “confidence scores” and “confidence features” are used throughout the Specification to convey specific and different meanings:

For example, the Applicant’s specification provides that a confidence score is a numerical score conveying a degree of confidence that is “computed based on the computed confidence features,” (paragraph [0018]; see also paragraph [0011]). Confidence features are, in contrast, features that “quantif[y] some auditory, linguistic, or syntactical aspect of the utterance compared to the potential result,” (paragraph [0019]). Therefore, it is unreasonable to interpret the term “confidence score” as disclosing or suggesting “confidence features.”

Id. at 11.

Appellants further argue, if the Examiner believes that Endo inherently discloses “an arbitrator configured to receive confidence features,” the findings do not meet the requirements for inherency. Id. at 11–12 (“To establish inherency, the extrinsic evidence ‘must make clear that the missing descriptive matter is necessarily present in the thing described in the reference, and that it would be so recognized by persons of ordinary skill. Inherency, however, may not be established by probabilities or possibilities. The mere fact that a certain thing may result from a given set of circumstances is not sufficient.’” In re Robertson, 169 F.3d 743, 745, 49 USPQ2d 1949, 1950-51 (Fed. Cir. 1999) (Id.) “In relying upon the theory of inherency, the examiner must provide a basis in fact and/or technical reasoning to reasonably support the determination that the allegedly inherent characteristic necessarily flows from the teachings of the applied prior art.” Ex parte Levy, 17 USPQ2d 1461, 1464 (Bd. Pat. App. & Inter. 1990) (Id.)).
In the Answer, the Examiner finds that Endo teaches the speech recognizers each output (1) recognized speech text (recognized speech text constitutes the claimed “confidence features”) and (2) the associated “confidence scores,” and the decision module (“arbitrator”) selects one of the recognized speech texts (“confidence features”) “according to the methods of the [Endo] present invention.” Ans. 5–6 (citing Endo 5:13–6:19, Fig. 2, elements 202, 204, 206, 102, and 120).

In the Reply Brief, Appellants argue the Examiner errs in interpreting “confidence features” to include the Endo recognized speech text because the interpretation is unreasonably broad and renders other claim terms without any meaning. Reply Br. 3–7. According to Appellants, the Examiner maps Endo’s “recognized speech text” interchangeably to the claim terms “confidence features,” “first speech recognition result,” and “second speech recognition result” without realizing the distinct differences in meaning that these terms have within the context of claim 1. Id. at 3–4.

Appellants argue that mapping “confidence features” to Endo’s “recognized speech text” ignores the Specification’s description of the meaning for “confidence features.” Id. at 4. According to Appellants:

For example, paragraph [0016] of the Appellant’s specification provides:

For each potential result, the [automated speech recognition (ASR)] engines compute a number of different metrics, herein referred to as confidence features, that each quantifies some auditory, linguistic, or syntactical aspect of the utterance compared to [a] potential result [e.g., result that may be potentially ‘output’ by the ASR engine] (emphasis added).

In light of this language, the Appellant submits that any interpretation of “confidence features” as being the same as a “result” from a speech recognition engine is unreasonably broad. As recently noted by the Federal Circuit in In re Smith Int'l (Fed. Cir.
2017), “[t]he correct inquiry in giving a claim term its broadest reasonable interpretation in light of the specification is not whether the specification proscribes or precludes some broad reading of the claim term adopted by the examiner. And it is not simply an interpretation that is not inconsistent with the specification. It is an interpretation that corresponds with what and how the inventor describes his invention in the specification, i.e., an interpretation that is ‘consistent with the specification,’” (emphasis added). The Examiner’s interpretation “confidence features” as a “recognized speech text” is clearly inconsistent with the Appellant’s specification (see, e.g., paragraph [0016], quoted above, describing confidence features as “quantif[ying] some auditory, linguistic, or syntactical aspect” of an acoustic utterance as compared to a potential result). Accordingly, the “recognized speech text” that is provided to Endo’s “decision module 208” does not disclose or suggest the “confidence features” recited in claim 1 or any other pending claim.

Id. at 6–7.

A claim is anticipated only if each and every element as set forth in the claims is found, either expressly or inherently described, in a single prior art reference, and arranged as required by the claim. Verdegaal Bros., Inc. v. Union Oil Co. of Cal., 814 F.2d 628, 631 (Fed. Cir. 1987).

Claim terms in a patent application are given the broadest reasonable interpretation consistent with the Specification, as understood by one of ordinary skill in the art. In re Crish, 393 F.3d 1253, 1256 (Fed. Cir. 2004). Our reviewing court states that “the words of a claim ‘are generally given their ordinary and customary meaning.’” Phillips v. AWH Corp., 415 F.3d 1303, 1312 (Fed. Cir. 2005) (en banc) (citations omitted). However, the broadest reasonable interpretation differs from the broadest possible interpretation. In re Smith Int’l, Inc., 871 F.3d 1375, 1383 (Fed.
Cir. 2017). The correct inquiry in giving a claim term its broadest reasonable interpretation in light of the specification is “an interpretation that corresponds with what and how the inventor describes his invention in the specification, i.e., an interpretation that is ‘consistent with the specification.’” Id. at 1382–83 (quoting In re Morris, 127 F.3d 1048, 1054 (Fed. Cir. 1997)).

We are persuaded by Appellants’ arguments that the Examiner’s claim interpretations are unreasonably broad. Here, claim 1 expressly recites the term “confidence features,” not “confidence scores” and the Specification provides consistent, and distinctive, use of each of these terms. See, for example, Spec. ¶¶ 1–3, 10–12, 16, Figs. 1–3. Endo does not teach the disputed limitation including “[an] arbitrator configured to receive a set of confidence features . . .” based on the distinction drawn in the Specification between the “confidence features” and the “confidence scores.” Additionally, based on the record before us, the Examiner provides insufficient support for a finding that Endo’s speech recognition text constitutes the claimed “confidence features.”

In view of the above, we do not sustain the rejection of claim 1, independent claims 9 and 18 which are argued together with claim 1, and dependent claims 2–8, 10–17, and 19–22.

Because our decision is dispositive of the rejection of these claims, we do not address additional arguments raised by Appellants.

DECISION

We reverse the Examiner’s decision rejecting claims 1–22.

REVERSED