Freshub, Ltd., IPR2020-01146 (P.T.A.B. Jan. 11, 2021)

Trials@uspto.gov                                        Paper 10
571.272.7822                               Date: January 11, 2021

UNITED STATES PATENT AND TRADEMARK OFFICE
____________

BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________

AMAZON.COM, INC., AMAZON.COM SERVICES LLC (formerly AMAZON DIGITAL SERVICES LLC), PRIME NOW LLC, and WHOLE FOODS MARKET SERVICES, INC.,
Petitioner,

v.

FRESHUB, LTD.,
Patent Owner.
____________

IPR2020-01146
Patent 10,232,408 B2
____________

Before WILLIAM V. SAINDON, FRANCES L. IPPOLITO, and ERIC C. JESCHKE, Administrative Patent Judges.

IPPOLITO, Administrative Patent Judge.

DECISION
Denying Institution of Inter Partes Review
35 U.S.C. § 314(a)

I. INTRODUCTION

Amazon.com, Inc., Amazon.com Services LLC, Prime Now LLC, and Whole Foods Market Services, Inc. (collectively, “Petitioner”) filed a Petition (Paper 1, “Pet.”) requesting inter partes review of claims 1–30 of U.S. Patent No. 10,232,408 B2 (Ex. 1001, “the ’408 patent”). Patent Owner filed a Preliminary Response. Paper 6 (“Prelim. Resp.”). Pursuant to the Board’s Order (Paper 7), Petitioner filed a Reply to the Preliminary Response addressing Patent Owner’s contentions directed to discretionary denial under § 314(a) (Paper 8, “Prelim. Reply”), and Patent Owner filed a Sur-reply to that Reply (Paper 9, “Prelim. Sur-reply”).

Under 35 U.S.C. § 314(a), an inter partes review may not be instituted unless the information presented in the Petition and any response thereto shows “there is a reasonable likelihood that the petitioner would prevail with respect to at least 1 of the claims challenged in the petition.” Taking into account the arguments presented in the parties’ briefs, we conclude that the information presented in the Petition does not establish a reasonable likelihood that Petitioner would prevail in challenging at least one of claims 1–30 of the ’408 patent as unpatentable under the grounds presented in the Petition. Accordingly, pursuant to § 314, we do not institute an inter partes review as to these claims of the ’408 patent.

A. Related Matters

According to Petitioner, the ’408 patent is the subject of Freshub, Inc. v. Amazon.com Inc. et al., Case No. 1:19-CV-00885-ADA (W.D. Tex. June 24, 2019) (“the Freshub case”). Pet. 2; see also Paper 3, 1.

B. The ’408 Patent

The ’408 patent relates to a system for processing voice orders and presenting lists of items based on users’ verbal orders. Ex. 1001, 2:25–32, 2:59–60. The system uses a voice recording device to record a user’s spoken words regarding, e.g., product descriptions and verbally provided product orders, and to create digital files from the recordings. Id. at 8:17–40. The system then uses voice recognition software to translate the digital files into text files. Id.

Figure 2 of the ’408 patent, reproduced below, illustrates a networked storage system that converts spoken language, including spoken orders, into a digital representation. Id. at 2:49–50, 12:11–18.

Figure 2 illustrates a networked storage system that converts spoken language into a digital representation. Id. at 2:49–50, 12:11–18. The networked storage system illustrated in Figure 2 includes a computer system 202 that collects and stores information scanned from items stored in multiple storage units. Id. at 12:10–35. The computer system is coupled to a local scanner 204, a screen 206 (such as, e.g., a touch screen that can receive user inputs via finger and/or pen), and a microphone 203. Id. at 12:11–16. The microphone is coupled to a digitizer that converts spoken language into a digital representation. Id. at 12:16–18. Computer system 202, scanner 204, and screen 206 may be removably mounted to a refrigerator 208, or may be mounted on a wall, stand, or other supporting structure. Id. at 12:20–25. Scanners coupled to computer system 202 are configured to scan other storage units, such as another refrigerator 210 and a cabinet 212. Id. at 12:25–31. Each storage unit may also have its own associated computer system. Id. at 12:33–35.

Figure 8 of the ’408 patent, reproduced below, illustrates a method for processing a voice order. Id. at 2:59–60, 13:55–57.

Figure 8 illustrates a method for processing a voice order. Id. at 2:59–60, 13:55–57. As illustrated in Figure 8, the method starts with a user’s verbally provided order (state 802). Id. at 13:56–14:7. To provide the order, the user may press a “record shopping list” control, pursuant to which the system prompts the user via a display and/or via a spoken instruction to verbally record a shopping list or speak the order. Id. The user may speak the order into microphone 203 illustrated in Figure 2. Id. at 14:8–9. The system then digitizes and records the spoken order in a file, and transmits the digitized verbal order to a remote system, such as remote system 214 illustrated in Figure 2 (state 804). Id. at 14:9–13. The remote system performs voice recognition on the order in order to interpret the spoken order, and converts the spoken order into text (state 806). Id. at 14:11–15. The remote system may use grammar-constrained recognition and/or natural language recognition. Id. at 14:15–18. The remote system then transmits the text version of the order for display to the user so that the user can check whether the text version is an accurate interpretation of the spoken order (state 808). Id. at 14:18–20. If the user determines that the order was not correctly translated, the user can provide a corrected order (e.g., via keyboard, or by speaking the order again) to the remote system. Id. at 14:21–26. The remote system then transmits the translated version of the order (the text version) to one or more providers (e.g., supermarkets, wholesale establishments, etc.) in order to receive quotes (state 810). Id. at 14:27–30. Upon receiving quotes from potential providers, the remote system transmits the quotes to the user (state 812). Id. at 14:35–36. Thereafter, the user selects a provider and authorizes placement of the order (state 814), and the remote system places the order with the selected provider (state 816). Id. at 14:37–39.
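Traced end to end, the Figure 8 states can be lined up in a short sketch. The Python below is purely illustrative of the flow described above; every function name and value in it is a hypothetical stand-in, not code or data from the ’408 patent or the record, and the price-based provider choice merely stands in for the user’s selection.

```python
# Illustrative sketch of the Figure 8 voice-order flow (states 802-816).
# All functions and values are hypothetical stand-ins, not the patent's code.

def digitize_and_transmit(spoken_order: bytes) -> bytes:
    # States 802-804: record the spoken order, digitize it, and transmit
    # the digitized file to the remote system.
    return spoken_order  # stand-in: treat the input as the digitized file

def voice_recognition(digitized_order: bytes) -> str:
    # State 806: the remote system translates the order to text, e.g., via
    # grammar-constrained and/or natural language recognition.
    return "one gallon whole milk"  # canned result for illustration

def confirm_or_correct(text_order: str) -> str:
    # State 808: the text version is displayed for verification; a user
    # correction (typed or re-spoken) would replace the text here.
    return text_order

def gather_quotes(text_order: str) -> dict[str, float]:
    # States 810-812: the translated order goes to providers, and their
    # quotes are relayed back to the user.
    return {"Supermarket A": 4.29, "Wholesaler B": 3.99}

def process_voice_order(spoken_order: bytes) -> str:
    digitized = digitize_and_transmit(spoken_order)
    text_order = confirm_or_correct(voice_recognition(digitized))
    quotes = gather_quotes(text_order)
    provider = min(quotes, key=quotes.get)  # stand-in for the user's
    # provider selection and authorization (state 814)
    return f"Order '{text_order}' placed with {provider}"  # state 816

print(process_voice_order(b"\x01\x02"))
```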
C. Challenged Claims

Petitioner challenges claims 1–30. Claims 1, 20, and 30 are independent claims and are reproduced below:
    1. A voice processing system comprising:
    a networks interface;
    a computer;
    non-transitory memory that stores instructions that when executed by the computer cause the computer to perform operations comprising:
    receive, using the network interface, a digitized order of a user from a remote system configured to receive user spoken words, the remote system comprising a microphone, a wireless network interface, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
    translate at least a portion of the digitized order to text;
    match the text, translated from the digitized order, to a text description stored in a database comprising text descriptions of items and associated unique product identifiers;
    based at least in part on the identified match of the text translated from the digitized order to the text description stored in a database, identify an item corresponding to the text description;
    add the identified item to a set of items associated with the user; and
    enable the set of items, including the identified item, to be displayed via a user display.

    20. A computer-implemented method, the method comprising:
    receiving over a network at a network interface a digitized order of a user from a remote system configured to receive user spoken words, the remote system comprising a microphone, a wireless network interface, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
    translating, using a processing system comprising at least one processing device and configured to perform translation of voice orders to text, at least a portion of the digitized order to text;
    matching, using the processing system, the text, translated from the digitized order, to a text description associated with a unique product identifier;
    based at least in part on the unique product identifier associated with the text description matched to the text translated from the digitized order, identifying, using the processing system, an item corresponding to the text;
    causing the identified item to be placed on an item set associated with the user; and
    enabling the item set, including at least the identified item, to be displayed via a user display remote from the processing system.

    30. Non-transitory memory that stores instructions that when executed by a computer cause the computer to perform operations comprising:
    receive a digitized voice communication of a user from a remote system configured to receive user spoken words, the remote system comprising a microphone, a wireless network interface, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
    translate at least a portion of the digitized voice communication to text;
    match the text, translated from the digitized voice communication, to a text description associated with a unique product identifier, wherein the text description is accessed from a data store;
    based at least in part on the unique product identifier associated with the text description matched to the text translated from the digitized voice communication, identify an item corresponding to the text;
    cause the identified item to be included in an item set associated with the user; and
    enable the item set, including the identified item, to be displayed via a user display.

Ex. 1001, 14:46–17:16.
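For orientation, the back-end operations these claims share—translate, match against stored text descriptions with associated unique product identifiers, identify, add to the user’s item set, and display—can be lined up in a short sketch. The Python below is offered only as an illustration; the catalog entries, identifiers, and function names are invented, and nothing here is code from the ’408 patent or the record.

```python
# Illustrative sketch of the claimed back-end operations. All names,
# identifiers, and data are hypothetical; this is not the patent's code.

# A database comprising text descriptions of items and associated unique
# product identifiers (mirroring claim 1's "database").
CATALOG = {"whole milk 1 gal": "UPI-0001", "sliced wheat bread": "UPI-0002"}
ITEM_SETS: dict[str, list[str]] = {}  # per-user sets of identified items

def translate(digitized_order: bytes) -> str:
    # "translate at least a portion of the digitized order to text"
    return "whole milk 1 gal"  # canned speech-to-text result

def match_description(text: str) -> str:
    # "match the text ... to a text description stored in a database"
    return next(d for d in CATALOG if d == text)

def identify_item(description: str) -> str:
    # "based at least in part on the identified match ... identify an
    # item"; the unique product identifier stands in for the item here.
    return CATALOG[description]

def handle_order(user: str, digitized_order: bytes) -> list[str]:
    item = identify_item(match_description(translate(digitized_order)))
    ITEM_SETS.setdefault(user, []).append(item)  # "add the identified item"
    return ITEM_SETS[user]  # "enable the set of items ... to be displayed"

print(handle_order("user-1", b"\x00"))  # -> ['UPI-0001']
```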
D. Alleged Grounds of Unpatentability

Petitioner asserts the following grounds of unpatentability:

Claims Challenged                 Reference(s)/Basis                           35 U.S.C. §
1, 14, 15, 17–20, 28–30           Calderone,1 Ogasawara,2 Sanchez3             § 103
2–4, 7–9, 11–13, 21, 22, 25–27    Calderone, Ogasawara, Sanchez, Partovi4      § 103
5, 23                             Calderone, Ogasawara, Sanchez, Kuhn5         § 103
6, 24                             Calderone, Ogasawara, Sanchez, Sichelman6    § 103
10, 16                            Calderone, Ogasawara, Sanchez, Cooper7       § 103

Pet. 2–3. In addition to the references listed above, Petitioner relies on the Declaration of Dr. Dan R. Olsen Jr. Ex. 1002 (“Olsen Decl.”).

1 U.S. Patent App. Pub. 2001/0056350 A1, published Dec. 27, 2001 (Exhibit 1003) (“Calderone”).
2 U.S. Patent No. 6,543,052 B1, issued Apr. 1, 2003 (Exhibit 1004) (“Ogasawara”).
3 U.S. Patent App. Pub. 2002/0194604 A1, published Dec. 19, 2002 (Exhibit 1005) (“Sanchez”).
4 U.S. Patent No. 7,376,586 B1, issued May 20, 2008 (Exhibit 1006) (“Partovi”).
5 U.S. Patent No. 6,553,345 B1, issued Apr. 22, 2003 (Exhibit 1020) (“Kuhn”).
6 U.S. Patent App. Pub. 2003/0235282 A1, published Dec. 25, 2003 (Exhibit 1008) (“Sichelman”).
7 U.S. Patent No. 6,757,362 B1, issued June 29, 2004 (Exhibit 1007) (“Cooper”).

II. ANALYSIS

A. Claim Construction

For petitions such as this one, filed after November 13, 2018, we apply the same claim construction standard “used in federal courts, in other words, the claim construction standard that would be used to construe the claim in a civil action under 35 U.S.C. [§] 282(b),” which is articulated in Phillips v. AWH Corp., 415 F.3d 1303 (Fed. Cir. 2005) (en banc). See 37 C.F.R. § 42.100(b) (2019). Under the Phillips standard, the “words of a claim ‘are generally given their ordinary and customary meaning,’” which is “the meaning that the term would have to a person of ordinary skill in the art in question at the time of the invention, i.e., as of the effective filing date of the patent application.” Phillips, 415 F.3d at 1312–13.

Petitioner notes that in the Freshub case, the parties agreed that the term “increase recognition accuracy” in claims 5 and 6 means “improve the likelihood of correctly identifying the item,” and requests that the Board adopt this same construction. Pet. 6–7 (citing Ex. 1001, 8:38–43). For the remaining terms in the challenged claims of the ’408 patent, Petitioner urges plain and ordinary meanings. Id. Patent Owner does not challenge Petitioner’s assertions regarding claim construction. Prelim. Resp. 14–15.

We determine that, for purposes of this Decision, it is unnecessary to expressly construe any claim term. See Vivid Techs., Inc. v. Am. Sci. & Eng’g, Inc., 200 F.3d 795, 803 (Fed. Cir. 1999) (only terms in controversy must be construed, and only to the extent necessary to resolve the controversy); see also Nidec Motor Corp. v. Zhongshan Broad Ocean Motor Co., 868 F.3d 1013, 1017 (Fed. Cir. 2017) (citing Vivid Techs. in the context of an inter partes review).

B. Alleged Obviousness
In Graham v. John Deere Co. of Kansas City, 383 U.S. 1 (1966), the Supreme Court set out a framework for assessing obviousness under § 103 that requires consideration of four factors: (1) the “level of ordinary skill in the pertinent art,” (2) the “scope and content of the prior art,” (3) the “differences between the prior art and the claims at issue,” and (4) “secondary considerations” of non-obviousness such as “commercial success, long-felt but unsolved needs, failure of others, etc.” Id. at 17–18. “While the sequence of these questions might be reordered in any particular case,” KSR Int’l Co. v. Teleflex Inc., 550 U.S. 398, 407 (2007), the Federal Circuit has “repeatedly emphasized that an obviousness inquiry requires examination of all four Graham factors and that an obviousness determination can be made only after consideration of each factor.” Nike, Inc. v. Adidas AG, 812 F.3d 1326, 1335 (Fed. Cir. 2016). We note that, with respect to the fourth Graham factor, the current record in this proceeding does not include any argument or evidence directed to secondary considerations of nonobviousness. The analysis below addresses the first three Graham factors.

C. Level of Ordinary Skill in the Art

In determining the level of skill in the art, we consider the type of problems encountered in the art, the prior art solutions to those problems, the rapidity with which innovations are made, the sophistication of the technology, and the educational level of active workers in the field. Custom Accessories, Inc. v. Jeffrey-Allan Indus. Inc., 807 F.2d 955, 962 (Fed. Cir. 1986); Orthopedic Equip. Co. v. United States, 702 F.2d 1005, 1011 (Fed. Cir. 1983).

Petitioner contends that an ordinarily skilled artisan at the time of the invention of the ’408 patent would have had at least a Bachelor-level degree in computer science, computer engineering, electrical engineering, or a related field in computing technology, and two years of experience with automatic speech recognition and natural language understanding, or equivalent education, research experience, or knowledge. Pet. 3–4 (citing Ex. 1002 ¶¶ 24–26). Patent Owner does not dispute this level of skill. For purposes of this Decision, we adopt Petitioner’s proposal, which comports with the teachings of the ’408 patent and the asserted prior art. See Okajima v. Bourdeau, 261 F.3d 1350, 1355 (Fed. Cir. 2001).

D. Obviousness Based on Calderone, Ogasawara, and Sanchez – Claims 1, 14, 15, 17–20, and 28–30

Petitioner asserts that claims 1, 14, 15, 17–20, and 28–30 of the ’408 patent would have been obvious over the combination of Calderone, Ogasawara, and Sanchez. Pet. 3, 16–40. For the reasons discussed below, we are not persuaded that Petitioner has demonstrated a reasonable likelihood of prevailing on this challenge.

1. Summary of Calderone (Ex. 1003)

Calderone describes a system and method for voice recognition near a wireline node of a network supporting cable television and/or video delivery. Ex. 1003, Title, Abstract. Calderone’s system provides speech recognition services to a collection of users over a network, user identification based upon the speech recognition over the network, and user-identified speech contracting over the network for real-time auctions and contracting. Id. ¶ 39.
Spoken commands from a cable subscriber are recognized and acted upon to control the delivery of entertainment and information services, such as Video On Demand (VOD), Pay Per View, Channel control, and on-line shopping. Id. ¶¶ 41, 539.

Calderone’s Figure 3, reproduced below, illustrates a system providing speech recognition services. Id. ¶ 60.

Figure 3 illustrates a remote control unit 1000 coupled to a set-top apparatus 1100 communicating via a wireline physical transport 1200, a distributor node 1300, and a high speed physical transport 1400, with one or more gateways 3100 and one or more server arrays 3200 of a server farm 3000. Id. ¶ 60. As shown in Figure 3, remote control unit 1000 is fitted with a microphone that relays the subscriber’s speech commands to a central speech recognition engine. Id. ¶¶ 110–111. “The analog signals picked up from the microphone are converted to digital signals where they undergo additional processing before being transmitted to the speech recognition and identification engine located in the . . . centralized location.” Id. ¶ 115. Calderone’s Figures 20A and 20B further teach that the set-top apparatus 1100 may include computer 1150 (i.e., first computer), remote interface 1130, network interface 1170 (i.e., wireless network interface), and memory 1160. Id. ¶¶ 268–275, Figs. 20A–B.

Calderone’s central speech recognition engine may process a multiplicity of received speech channels to create a multiplicity of identified speech content, and then respond to the identified speech content to create an identified speech content response for each of the multiplicity of the identified speech contents. Id. ¶¶ 217–218. Once a complete spoken request has been received, the speech input processor may use a sample’s source address identifying a user site to target the speech data to a specific speech processing processor. Id. ¶ 148. The speech engine determines the most likely spoken request based on statistical analysis, and may return a text string corresponding to the spoken request. Id. ¶ 162. Additionally, Calderone teaches that when the speech recognition engine returns a result, visual text corresponding to the recognized spoken request may be transmitted back to the set-top box. Id. ¶¶ 166–167. Software executing within the set-top box displays the text information. Id. ¶ 167. “By displaying the text of the possible recognition results, the user can easily select from the returned list.” Id. ¶ 168.
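The division of labor Calderone describes—digitization at the remote/set-top side, recognition at a centralized engine that returns text strings, and display and selection back at the set-top box—can be sketched as follows. The Python is illustrative only; every name and value is a hypothetical stand-in, not Calderone’s code.

```python
# Illustrative sketch of Calderone's division of labor as summarized
# above. Hypothetical names and data throughout; not Calderone's code.

def digitize_at_remote(analog_samples: list[float]) -> bytes:
    # Remote-control microphone audio is converted to digital signals
    # before transmission upstream (Ex. 1003 paras. 110-115).
    return bytes(int(max(0.0, min(1.0, s)) * 255) for s in analog_samples)

def central_speech_engine(speech_data: bytes) -> list[str]:
    # The centralized engine determines the most likely spoken request
    # and returns text string(s), not items (Ex. 1003 paras. 162,
    # 166-168); several candidates may come back when accuracy is poor.
    return ["casablanca", "castaway"]  # canned candidates

def display_and_select(candidates: list[str]) -> str:
    # The set-top box overlays the candidate text, and the user selects
    # from the returned list (Ex. 1003 paras. 167-168).
    return candidates[0]  # stand-in for the user's selection

digitized = digitize_at_remote([0.2, 0.7, 0.4])
print(display_and_select(central_speech_engine(digitized)))  # casablanca
```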
2. Summary of Ogasawara (Ex. 1004)

Ogasawara discloses “an Internet shopping system hosted on a television-set-top-box combination including a remote controller with voice recognition capabilities.” Ex. 1004, 1:6–10, Abstract. The remote control unit includes a keypad and a microphone, and the set-top box (STB) includes voice recognition software and bar code recognition software to support the electronic shopping system. Id. Data is input to an Internet shopping Web program accessed through a Web browser associated with the STB. Id.

Ogasawara’s Figure 1, reproduced below, illustrates an electronic shopping system. Id. at 2:48–51.

Figure 1 illustrates an electronic shopping system including a television set, a set-top box, and a remote control unit. Id. at 2:48–51. The electronic shopping system illustrated in Figure 1 includes STB 10, television 12, and remote control unit 14. Id. at 3:54–65. STB 10 receives television signals for performing conventional television reception functions. Id. Remote control unit 14 is in communication with STB 10, and includes a keypad for allowing input of keypad data to STB 10 and a microphone for capturing voice data from the user. Id. at 4:13–38. The user may thus provide oral commands to the system during Internet shopping, e.g., to select purchase items. Id. at 4:28–38, 9:40–42. The STB is configured with voice recognition software, and a user’s voice input recognized by the voice recognition software is transferred to an Internet shopping Web program. Id. at 3:28–36, 9:47–60. Ogasawara describes a purchasing process in which

    voice recognition is performed by converting the voice data to the corresponding character data. The extracted character data is then transferred to the transaction program [downloaded to the STB]. . . . The data input process continues until all necessary selections have been made by the user to complete an item selection . . . . The client purchase transaction program in the STB 10 is in communication with the server purchase transaction program on the Web server 72 [to which the STB’s tuner provides an Internet connection]. Upon client selection of an item, the server program retrieves information corresponding to the selected item from a Price Lookup (PLU) Table. In the described embodiment, all merchandise information is maintained in the PLU Table. The PLU Table is, in turn, stored and maintained in the Web server 72 database.

Id. at 9:30–10:5.
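The client/server split in this passage—voice recognition and item selection in the set-top box, with the Web server looking up an already-selected item in the PLU Table—can be sketched as follows. The Python is an illustration with invented names and data; it is not Ogasawara’s code, and the PLU entry shown is fabricated for the example.

```python
# Illustrative sketch of Ogasawara's purchasing flow as summarized above.
# Invented names and data; not Ogasawara's code.

PLU_TABLE = {  # server side: merchandise information keyed by item
    "ITEM-42": {"description": "AcmeCo instant coffee, 8 oz", "price": 5.99},
}

def stb_voice_recognition(voice_data: bytes) -> str:
    # Client side: voice data is converted to the corresponding
    # character data (Ex. 1004, 9:47-49).
    return "instant coffee"  # canned character data

def stb_transaction_program(character_data: str) -> str:
    # Client side: the downloaded transaction program receives the
    # character data, and the user completes an item selection
    # (Ex. 1004, 9:50-66).
    return "ITEM-42"  # stand-in for the user's completed selection

def web_server_lookup(selected_item: str) -> dict:
    # Server side: upon client selection, the server retrieves info for
    # the *selected* item from the PLU Table (Ex. 1004, 9:65-10:5); the
    # character data itself never reaches the server in this sketch.
    return PLU_TABLE[selected_item]

selection = stb_transaction_program(stb_voice_recognition(b"\x00"))
print(web_server_lookup(selection))
```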
3. Summary of Sanchez (Ex. 1005)

Sanchez describes an interactive television virtual shopping cart that facilitates product purchases in an interactive television system. Ex. 1005, Abstract. Upon presentation of an advertisement, movie, or other television program in a programming stream, an indication such as an icon may be presented to a viewer indicating that product or service information is available. Id. The viewer may select the icon and store the corresponding product or service information in a virtual shopping cart or shopping list. Id. The viewer may also tune to a virtual channel and interact with the virtual shopping cart in order to add, delete, or initiate a purchase of products or services. Id. The viewer’s purchase requests may be conveyed via the Internet. Id.

4. Independent Claims 1, 20, and 30

The Petition contains an analysis of how the combination of Calderone, Ogasawara, and Sanchez allegedly teaches or suggests each limitation of independent claim 1. Pet. 18–30. Petitioner presents similar contentions with respect to independent claims 20 and 30, with reference to its arguments and evidence made for claim 1. Pet. 34–40. Patent Owner disputes Petitioner’s contentions, as discussed in detail below. Prelim. Resp. 17–23.

a. Claim 1

Claim 1 recites, in part:

    a computer;
    non-transitory memory that stores instructions that when executed by the computer cause the computer to perform operations comprising:
    receive, using the network interface, a digitized order of a user from a remote system configured to receive user spoken words, the remote system comprising a microphone, a wireless network interface, and a digitizer coupled to the microphone, wherein the digitizer is configured to convert spoken words into a digital representation;
    translate at least a portion of the digitized order to text;
    match the text, translated from the digitized order, to a text description stored in a database comprising text descriptions of items and associated unique product identifiers; [and]
    based at least in part on the identified match of the text translated from the digitized order to the text description stored in a database, identify an item corresponding to the text description.

Ex. 1001, 14:48–67 (emphasis added).

i. Petitioner’s Contentions

Petitioner contends the remote control coupled to a set-top box in Calderone’s Figure 3 teaches claim 1’s “remote system” comprising a digitizer (Calderone’s speech command preprocessor) coupled to a microphone, the digitizer configured to convert spoken words into a digital representation. Pet. 22–25 (citing Ex. 1003 ¶¶ 40–41, 106, 110–111, 115–116, 141, 175, Fig. 3; Ex. 1002 ¶¶ 92–93, 96–97). Petitioner further contends Calderone’s speech processing system includes at least one computer performing the operations of identifying and responding to speech content, thereby teaching claim 1’s “computer” that receives a digitized order of a user from the remote system and translates at least a portion of the digitized order to text. Pet. 20, 22–26 (citing Ex. 1003 ¶¶ 40, 106, 162, 166–167, 214, 217–220, 280–284, 299–301, 303, 308, 381, Fig. 23; Ex. 1002 ¶¶ 87, 92–95, 98).

Petitioner further contends Calderone, in view of Ogasawara’s use of a Price Lookup (PLU) Table and SKU (Stock Keeping Unit) numbers, teaches matching text translated from a digitized order to a text description stored in a database comprising text descriptions of items and associated unique product identifiers, and identifying an item corresponding to the text description, as recited in claim 1. Pet. 26–28 (citing Ex. 1003 ¶¶ 41, 48, 137, 163, 170–171, 217–218, 539; Ex. 1004, 9:48–64, 9:67–10:19; Ex. 1002 ¶¶ 99–104). Additionally, Petitioner argues Calderone discloses searching databases, based on translated text, for matching movie names, program names, and names or descriptions of products in response to user requests, and Ogasawara discloses using “extracted character data” to search the PLU Table for matching “text string[s] giving the brand or trade name of the product and including a generic description of the product.” Pet. 28. Petitioner then asserts that a person of ordinary skill in the art (“POSITA”) “would have understood that Calderone and Ogasawara disclose the function of using the matching names or descriptions to identify the actual item to provide in response to each user request, whether it is a movie, a program, or other merchandise.” Id.
ii. Patent Owner’s Contentions

Patent Owner asserts Calderone in view of Ogasawara does not disclose the limitation “based at least in part on the identified match of the text translated from the digitized order to the text description stored in a database, identify an item corresponding to the text description,” as recited in claim 1. Prelim. Resp. 17–23. In particular, Patent Owner argues that in Calderone, “the process of identifying takes place on the user’s front-end system, not the back-end system,” and that similarity searching for content items occurs on the front-end system. Id. at 18. Additionally, Patent Owner contends that “the portions of Calderone that [Petitioner] refers to only disclose identifying speech, not identifying a content item associated with translated speech.” Id. Patent Owner adds that Petitioner relies on “unsupported expert testimony” and conclusory testimony to supply a claim limitation and to provide motivation. Id. at 19.

Patent Owner further contends that Petitioner offered no evidence or explanation as to how Ogasawara performs identification at the back-end system because “Ogasawara discloses selections having been made by the user to complete an item selection, meaning that any item identification occurs on the front-end system (specifically, the set-top box), rather than the back-end system,” and Ogasawara’s “‘extracted character data’ is generated on the front-end system and is transferred to the downloaded transaction program (which is also on the front-end system) for analysis, including identification of an item corresponding to the user’s request.” Id. at 20. Patent Owner adds that Ogasawara’s back-end system “simply receives an item selection and performs a lookup to retrieve ‘merchandise information’ for purchase of an item which the set-top box’s downloaded transaction program has already identified,” while claim 1 requires that translation and identification both happen on the back-end system. Id. at 21 (citing Ex. 1003 ¶¶ 41, 48, 137, 163, 166–170, 217–218, 539; Ex. 1004, 2:59–62, 7:53–55, 9:48–49, 9:62–10:5, 10:17–19, Fig. 5A; Ex. 1002 ¶ 105).

iii. Discussion

To start, we note that claim 1 recites “a computer” that: (1) “translate[s] at least a portion of the digitized order to text”; (2) “match[es] the text, translated from the digitized order, to a text description stored in a database comprising text descriptions of items and associated unique product identifiers”; and (3) “based at least in part on the identified match of the text translated from the digitized order to the text description stored in a database, identif[ies] an item corresponding to the text description.” Ex. 1001, 14:59–67 (emphases added).

First, with regard to these limitations, Petitioner relies on nearly the same disclosure in Calderone and Ogasawara for both the “match” and “identify” steps. As discussed, Petitioner asserts that Calderone’s “speech recognition engine” (or “speech engine”)8 teaches the recited “computer.” Pet. 20 (“Calderone’s speech processing system includes ‘at least one computer’ performing the operations of identifying and responding to speech content in Figure 10. Ex. 1003, [0217]-[0220], [0381].”). More specifically, Petitioner contends that Calderone’s speech engine matches translated text to stored descriptions because
    [t]he Calderone system performs this function by responding to “identified speech content to create an identified speech content response.” Ex. 1003, [0217]-[0218]; Ex. 1002, ¶ 99. To respond to VOD, “Pay Per View” (“PPV”), or “on-line shopping” requests (Ex. 1003, [0041]) requires identifying content, i.e. items, matching the text of those requests. For VOD and PPV, the system searches for matching content using movie titles or actor names, i.e. text descriptions related to the movie or program. Id. at [0170]. Analogously, for on-line shopping, a POSITA would understand the system to search for names or descriptions of items that match purchase requests. See id. at [0539] (system supports “[i]nteractive shopping, based upon shopping content placed on a VOD server and activated through the speech recognition responses”).

Pet. 26 (emphasis added). Petitioner further relies on Ogasawara’s disclosure of PLU Tables and SKU numbers as text descriptions. Id. at 27 (“[I]t would have been obvious to include the SKU numbers disclosed by Ogasawara for items such as movies, videos, and online shopping merchandise in the database of Calderone, for example, to distinguish between items with similar names and to track available items with alpha-numeric identifiers rather than text descriptions.”).

8 Based on the disclosures in the cited paragraphs, we agree with Petitioner’s implicit position that Calderone uses at least the terms “speech recognition engine” and “speech engine” interchangeably. See Ex. 1003 ¶ 162 (referring to a “speech engine”), ¶ 166 (referring to a “speech recognition engine”).

Referring to essentially these same sections, Petitioner argues this disclosure also teaches the separate “identify” limitation recited in claim 1. Petitioner asserts that

    [a]s described in Section IX.A.1[1.5], Calderone discloses searching databases, based on translated text, for matching “names of movies,” “program names,” and names or descriptions of products in response to user requests for VOD, PPV, and “on-line shopping” services. Ex. 1003, [0048], [0137], [0170], [0539]. Ogasawara similarly discloses using “extracted character data” to search the PLU Table for matching “text string[s] giving the brand or trade name of the product and including a generic description of the product.” Ex. 1004, 9:67-10:14. A POSITA would have understood that Calderone and Ogasawara disclose the function of using the matching names or descriptions to identify the actual item to provide in response to each user request, whether it is a movie, a program, or other merchandise. Ex. 1003, [0041], [0163], [0539] (user can receive the movies/programs requested or obtain “shopping content”); Ex. 1004, 10:17-19 (the user can buy the item identified by the merchandise entry); Ex. 1002, ¶ 104.

Pet. 28 (emphases added).

Yet, claim 1 recites the “match” and “identify” steps as distinct limitations and separate operations performed by the recited “computer.” Although one aspect of a prior art reference’s disclosure may satisfy multiple claim limitations, see, e.g., Powell v. Home Depot U.S.A., Inc., 663 F.3d 1221, 1231–32 (Fed. Cir. 2011), Petitioner has not explained sufficiently how the same disclosure in Calderone and Ogasawara reads on both limitations. See Bicon, Inc. v. Straumann Co., 441 F.3d 945, 950 (Fed. Cir. 2006) (holding that claims are “interpreted with an eye toward giving effect to all terms in the claim”); see also 37 C.F.R. § 42.104(b) (requiring Petitioner to set forth “[h]ow the challenged claim is to be construed” and “[h]ow the construed claim is unpatentable”).
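For illustration only, the distinction can be rendered schematically: giving effect to both limitations means treating “match” and “identify” as operations with their own inputs and outputs, so that a petition must explain what disclosure performs each. The sketch below is an editorial device with invented names and data; it is not drawn from the claims, the references, or the parties’ papers.

```python
# Editorial rendering of "match" and "identify" as distinct operations.
# Invented names and data; not drawn from the record.

DESCRIPTIONS = {"organic eggs, dozen": "UPI-7"}  # description -> identifier
ITEMS = {"UPI-7": "stocked item #7 (organic eggs, 12 ct)"}  # id -> item

def match_step(translated_text: str) -> str:
    # Limitation (2): match the translated text to a stored text
    # description.
    return next(d for d in DESCRIPTIONS if d == translated_text)

def identify_step(matched_description: str) -> str:
    # Limitation (3): based on that match, identify the corresponding
    # item -- a further operation, not a restatement of the match.
    return ITEMS[DESCRIPTIONS[matched_description]]

print(identify_step(match_step("organic eggs, dozen")))
```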
Second, Petitioner has failed to clearly explain the precise nature of any proposed combination of Calderone and Ogasawara as to the “identify” step. Turning to Petitioner’s cited passages, Calderone explains that the speech engine processes “a multiplicity of the received identified speech channels to create a multiplicity of identified speech content.” Ex. 1003 ¶ 217. Referring to Figure 10, Calderone provides that “[o]peration 2032 performs responding to the identified speech content to create an identified speech content response, for each of the multiplicity of the identified speech contents.” Id. ¶ 218. Calderone further teaches that “a single computer may perform the operations of the speech engine as shown in FIG. 10. The computer performs these operations as controlled by a program system including program steps residing in accessibly coupled memory.” Ex. 1003 ¶ 381. Based on this disclosure, Petitioner asserts that Calderone’s disclosed “identified speech content response” satisfies claim 1’s requirement to “identify an item corresponding to the text description,” such as a movie or video on demand. Pet. 28.

Nonetheless, as Patent Owner points out, Calderone teaches that once the speech engine receives the digitized spoken request from the set-top box, the speech engine determines the most likely spoken request based on statistical analysis and voice samples. Ex. 1003 ¶ 162. The speech engine then returns a text string, rather than an item, corresponding to the spoken request. Id. In this way, the speech processing engine performs the translation of the digitized user request into text.

Further, the disclosure in paragraphs 217–220 of Calderone more strongly supports Patent Owner’s view. See Prelim. Resp. 18 (“[T]he portions of Calderone that Amazon refers to only disclose identifying speech, not identifying a content item associated with translated speech.”). Indeed, as discussed above, Calderone’s paragraphs 217–220 do not mention identifying any item. Rather, Figure 10, which is reproduced below, describes the receipt and processing of digitized content from multiple voice channels.

Figure 10 depicts a flowchart of a method using a back channel from a multiplicity of user sites containing a multiplicity of identified speech channels presented to a speech processing system at a wireline node in a network supporting cable television delivery in accordance with the invention. Ex. 1003 ¶ 67. While paragraphs 217–220 and Figure 10 also teach the step of “responding to identified speech content to create identified speech content response,” this disclosure is not specific to any type of response. Id. ¶¶ 217–220, Fig. 10.

Still, Petitioner presumes that the identified speech content response must be a movie, video, etc., described in other parts of Calderone. See Pet. 28 (citing Ex. 1003 ¶¶ 41, 48, 137, 163, 170, 539). However, relied-upon paragraphs 41, 48, 137, and 539 of Calderone together indicate that an objective of the system overall is to use speech recognition to, for example, “control the delivery of entertainment and information services, such as Video On Demand, Pay Per View, Channel control, on-line shopping, and the Internet.” See, e.g., Ex. 1003 ¶ 41. But, as argued by Patent Owner, these paragraphs do not address the specific requirements of the “identify” step.
See Prelim. Resp. 18–19 (arguing that these paragraphs “reference speech recognition or accuracy improvement methodologies generally” but that none “suggest identifying a corresponding content item associated with translated speech”). For example, none of these paragraphs discloses the use of “text” translated from speech to identify a movie or other content, and none discloses that process taking place in the identified “computer”—i.e., the speech recognition engine. Additionally, it is unclear how the disclosure in these paragraphs must teach the specific operations performed by the identified “computer” (i.e., speech engine) described in paragraphs 217–220 and Figure 10, as opposed to other components of Calderone’s system.

Likewise, relied-upon paragraph 163 of Calderone discusses how the “recognition results” of translations from speech to text of different information can have either “low cost” or “high cost.” Ex. 1003 ¶ 163. As an example of “low cost” recognition results, Calderone discloses a “request to display listings for a particular movie,” whereas, as an example of “high cost” recognition results, Calderone discloses a “request to purchase a movie.” Id. To the extent Petitioner relies upon a selection by the user based on either of these results, however, that selection would be made on set-top box 1100 or remote 1000 in Calderone (identified as the “remote system” in claim 1), not the speech recognition engine. See Pet. 24–25; Ex. 1003 ¶ 167 (“This rapid visual feedback may be accomplished by transmitting the recognized text string back to the set-top box. Software executing within the set-top box displays the text information in a special window on top or overlaying of the existing application display.”), ¶ 168 (“In cases where the recognition accuracy is particularly poor, and the speech engine returns several possible recognition results, this overlay display capability may be used to help refine the user’s query. By displaying the text of the possible recognition results, the user can easily select from the returned list.”). Moreover, to the extent Petitioner relies on some operation after the selection by the user, Petitioner has not sufficiently shown that that process takes place in the speech engine (i.e., the “computer”).

Relied-upon paragraph 170 of Calderone discloses the use of “similarity searching”—i.e., searching for names of movie titles and actors “which are only partially matched, or which resemble the recognized phrase, without requiring precise specification of the exact title or name.” Ex. 1003 ¶ 170. Even assuming that this process uses a “text string” returned from the speech processing system and then selected by the user (see id. ¶¶ 167–169), Petitioner has not sufficiently shown that this process takes place in the speech recognition engine—rather than another part of the system overall.

In addition, Petitioner contends that
    it would have been obvious to include the SKU numbers disclosed by Ogasawara for items such as movies, videos, and online shopping merchandise in the database of Calderone, for example, to distinguish between items with similar names and to track available items with alpha-numeric identifiers rather than text descriptions. A POSITA could readily implement this modification to Calderone by adding a data column to the database structure. Ex. 1002, ¶ 103. The modified system could use these identifiers to locate and access movies, programs, and merchandise in response to user requests.

Pet. 27. Petitioner further argues in its Reply (though not in the Petition) that Ogasawara discloses identifying a corresponding item (by the back-end system) when it “‘retrieves information corresponding to the selected item from a [PLU] Table’ which maintains ‘all merchandise information.’” Prelim. Reply 6. Initially, we note that this argument does not appear in the Petition. Even so, we are not persuaded that Ogasawara’s retrieval of PLU information corresponding to the selected item meets the limitation of claim 1.

Ogasawara teaches that the voice recognition program translates voice commands from the user. Ex. 1004, 4:29–34 (“The remote control unit 14 also includes a microphone 32 for capturing voice data upon an utterance by the user. Thus, a user may provide oral commands to the system during Internet shopping instead of keypad commands, making it easier and more pleasant for an average user to use the system.”), 9:14–43 (“[N]eeded programs such as a voice recognition program, video data recognition program, voice generating program, and IC card interface program, if not already preloaded in the STB, are loaded from the local storage unit 74 or downloaded from the Web server 72.”). Specifically, Ogasawara teaches that “[i]f microphone input is detected 112, voice recognition is performed by converting the voice data to the corresponding character data. The extracted character data is then transferred to the transaction program 116.” Id. at 9:47–49.

Once the voice data has been translated, the extracted character data is provided to client transaction program 116, which is also in the set-top box. Ex. 1004, 9:50–51, 9:65–66. At the client site, the user then selects an item for purchase. In this way, Ogasawara does not teach that the voice recognition program performs any identification or selection of items. Rather, “[t]he client purchase transaction program in the STB 10 is in communication with the server purchase transaction program on the Web server . . . [and] [u]pon client selection of an item, the server program retrieves information corresponding to the selected item from a Price Lookup (PLU) Table.” Id. at 9:65–10:2.

Thus, we agree with Patent Owner that the lookup operation by the back-end server in Ogasawara does not “use” the identified “text”—i.e., the “extracted character data” (Ex. 1004, 9:50)—to “identify an item” as recited in claim 1. Instead, the user selects an item at the set-top box using the “text” to “identify an item.” See Ex. 1004, 9:62–64, cited at Prelim. Resp. 20; see also Prelim. Resp. 21 (“The downloaded transaction program on the set-top box is the only program that receives and uses the extracted character data derived from the user’s voice data.”). Further, as discussed, Ogasawara’s downloaded client transaction program communicates with a server program to retrieve information corresponding to the selected item from a PLU Table stored and maintained in the server database. Ex. 1004, 10:1–5. As such, Ogasawara does not teach that the local voice recognition program communicates with a server or database, or otherwise retrieves item information. Thus, even if Calderone’s system could be modified to include Ogasawara’s remote-server features, Petitioner has not explained persuasively why a POSITA would have applied these teachings to the speech engine (i.e., the “computer”) described in Calderone. See Pet. 27.
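The contrast just described can be summarized schematically: claim 1’s back-end “computer” itself uses the translated text to identify an item, whereas Ogasawara’s back-end server, as we read it, receives only an already-selected item. The sketch below is an illustration with invented names and data; it is not code from Ogasawara or the record.

```python
# Editorial contrast of the two architectures discussed above.
# Invented names and data; not from Ogasawara or the record.

CATALOG = {"instant coffee": "ITEM-42"}  # text description -> identifier

def back_end_per_claim_1(translated_text: str) -> str:
    # What claim 1 recites: the back-end computer uses the translated
    # text to match a stored description and identify the item itself.
    return CATALOG[translated_text]

def back_end_per_ogasawara(selected_item_id: str) -> str:
    # What Ogasawara's server does: it receives an item the set-top box
    # already identified and merely looks up merchandise information.
    return f"merchandise info for {selected_item_id}"

print(back_end_per_claim_1("instant coffee"))  # text in, item out
print(back_end_per_ogasawara("ITEM-42"))       # item id in, info out
```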
Thus, even if Calderone’s IPR2020-01146 Patent 10,232,408 B2 29 system could be modified to include Ogasawara’s remote server features, Petitioner has not explained persuasively why a POSITA would have applied these teachings to the speech engine (i.e., “computer”) described in Calderone. See Pet. 27. Additionally, Dr. Olsen’s relied upon testimony does not persuade us otherwise. Dr. Olsen’s testimony and cited evidence mirrors the arguments presented in the Petition, which we have determined are not supported by the disclosures of Calderone or Ogasawara. Compare Pet. 26–28, with Ex. 1002 ¶¶ 99–104; see also 37 C.F.R. § 42.65(a) (“Expert testimony that does not disclose the underlying facts or data on which the opinion is based is entitled to little or no weight.”). Accordingly, on this record, we determine that Petitioner has not shown sufficient basis for instituting trial on the ground that claim 1 would have been obvious based on Calderone, Ogasawara, and Sanchez. b. Claims 20 and 30 Independent claims 20 and 30 recite limitations similar to those recited in claim 1. See Ex. 1001, 16:17–21 (Claim 20 recites “based at least in part on the unique product identifier associated with the text description matched to the text translated from the digitized order, identifying, using the processing system, an item corresponding to the text”), 17:9–12 (Claim 30 recites “based at least in part on the unique product identifier associated with the text description matched to the text translated from the digitized voice communication, identify an item corresponding to the text.”). Petitioner provides similar mappings to Calderone and Ogasawara for independent claims 20 and 30. Pet. 36–37, 40. Patent Owner relies on the same arguments discussed for claim 1. Prelim. Resp. 17–23. IPR2020-01146 Patent 10,232,408 B2 30 For the same reasons discussed above, we determine that Petitioner has not shown sufficient basis for instituting. 5. Claims 14, 15, 17–19, 28, and 29 Dependent claims 14, 15, 17–19, 28, and 29 each depend from one of independent claims 1 and 20. Ex. 1001, 15:51–16:60. Petitioner asserts that the combination of Calderone, Ogasawara, and Sanchez discloses each of the additional limitations of these dependent claims. Pet. 30–34, 38. For the reasons discussed with respect to claims 1 and 20, we determine that Petitioner has not established a reasonable likelihood of prevailing in its contention that claims 14, 15, 17–19, 28, and 29 would have been obvious over the combination of Calderone, Ogasawara, and Sanchez. 6. Conclusion Accordingly, on this record, we determine that Petitioner has not shown sufficient basis for instituting trial on the ground that claims 1, 20, and 30 would have been obvious based on Calderone, Ogasawara, and Sanchez. Similarly, for the same reasons, we are not persuaded that Petitioner has shown sufficient basis for instituting trial on this same ground for claims 14, 15, 17–19, 28, and 29. See Pet. 16–40. E. Obviousness based on Calderone, Ogasawara, Sanchez, and Partovi – Claims 2–4, 7–9, 11–13, 21, 22, and 25–27 1. Summary of Partovi (Ex. 1006) Partovi discloses a “voice portal supporting telephone to web server commerce” allowing users to access web servers to complete commercial transactions. Ex 1006, Abstract. 
Partovi’s portal may use a “one word commerce model” that permits a user to identify a product and signal the user’s purchase intentions with a single word, phrase, or touch-tone command, pursuant to which the voice portal can complete a transaction with an electronic commerce vendor. Id. Partovi’s portal may also provide a record of commerce transactions in the form of voice receipts including vendor-specific tracking and status information, and details of the purchase such as product/service description, price paid, and credit card used. Id. at 4:46–57, 22:16–22. The voice receipts can be reviewed over a telephone interface and/or from a web site coupled to the voice portal. Id. at 4:46–57. Partovi’s portal may also employ personalization techniques to supply customized commerce suggestions to customers, assist purchasers in selecting items, and present suggested items. Id. at 4:64–5:2.

2. Discussion

Petitioner asserts that claims 2–4, 7–9, 11–13, 21, 22, and 25–27 of the ’408 patent would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Partovi. Pet. 40–53. Dependent claims 2–4, 7–9, 11–13, 21, 22, and 25–27 each depend from one of independent claims 1 and 20. Ex. 1001, 15:5–16:54. Petitioner does not rely on Partovi to correct the deficiencies we have identified with respect to Petitioner’s arguments based on Calderone, Ogasawara, and Sanchez for claims 1 and 20. See Pet. 40–52. Based on the record before us and for the reasons discussed for claims 1 and 20, we determine that Petitioner has not established a reasonable likelihood of prevailing in its contention that claims 2–4, 7–9, 11–13, 21, 22, and 25–27 would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Partovi.

F. Obviousness Based on Calderone, Ogasawara, Sanchez, and Kuhn – Claims 5 and 23

1. Summary of Kuhn (Ex. 1020)

Kuhn discloses a universal remote control allowing a natural language modality for television and multimedia searches and requests. Ex. 1020, Abstract. The remote control houses a microphone into which the user can input natural language speech, which is recognized and interpreted by a natural language parser that extracts the semantic content of the user’s speech. Id. The parser works in conjunction with an electronic program guide, through which the remote control system is able to ascertain what programs are available for viewing or recording and supply appropriate prompts to the user. Id.

2. Discussion

Petitioner asserts that claims 5 and 23 of the ’408 patent would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Kuhn. Pet. 53–57. Dependent claims 5 and 23 each depend from one of independent claims 1 and 20. Ex. 1001, 15:14–18, 16:35–38. Petitioner does not rely on Kuhn to correct the deficiencies we have identified with respect to Petitioner’s arguments based on Calderone, Ogasawara, and Sanchez for claims 1 and 20. See Pet. 53–57. Based on the record before us and for the reasons discussed for claims 1 and 20, we determine that Petitioner has not established a reasonable likelihood of prevailing in its contention that claims 5 and 23 would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Kuhn.

G. Obviousness Based on Calderone, Ogasawara, Sanchez, and Sichelman – Claims 6 and 24

1. Summary of Sichelman (Ex. 1008)
Sichelman discloses an automated, scalable call-taking system that integrates with existing telephony infrastructures and enables, through speech recognition and text-to-speech (TTS), inputting, accessing, and retrieving information to and from multiple back-end dispatch and booking systems without the need for a human operator. Ex. 1008, Abstract. Sichelman’s system allows passengers in a ground transportation system to access a telephony gateway that performs speech recognition, TTS, and audio playback. Id. ¶¶ 7, 10–12. An application speech server processes passenger and transportation transactions such as ordering a vehicle, gathering information in real time about available vehicles, and choosing a vehicle type for a reservation. Id. ¶ 12. The speech server is in real-time communication with back-end fleet dispatch and booking systems, enabling transactions typically undertaken by a human dispatcher or agent. Id. ¶ 13.

2. Discussion

Petitioner asserts that claims 6 and 24 of the ’408 patent would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Sichelman. Pet. 57–60. Dependent claims 6 and 24 each depend from one of independent claims 1 and 20. Ex. 1001, 15:19–23, 16:39–42. Petitioner does not rely on Sichelman to correct the deficiencies we have identified with respect to Petitioner’s arguments based on Calderone, Ogasawara, and Sanchez for claims 1 and 20. See Pet. 57–60. Based on the record before us and for the reasons discussed for claims 1 and 20, we determine that Petitioner has not established a reasonable likelihood of prevailing in its contention that claims 6 and 24 would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Sichelman.

H. Obviousness Based on Calderone, Ogasawara, Sanchez, and Cooper – Claims 10 and 16

1. Summary of Cooper (Ex. 1007)

Cooper discloses a “computer-based virtual assistant the behavior of which can be changed by the user, comprising a voice user interface for inputting information into and receiving information from the virtual assistant by speech.” Ex. 1007, Abstract. Cooper’s virtual assistant automatically adapts its behavior responsive to input received by the virtual assistant. Id. The virtual assistant adapts to the user based on input received by the virtual assistant, such input including user information, such as information about the user’s experience, the time between user sessions, and the amount of time a user pauses when recording a message. Id.

2. Discussion

Petitioner asserts that claims 10 and 16 of the ’408 patent would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Cooper. Pet. 61–64. Dependent claims 10 and 16 each depend from independent claim 1. Ex. 1001, 15:34–36, 15:56–58. Petitioner does not rely on Cooper to correct the deficiencies we have identified with respect to Petitioner’s arguments based on Calderone, Ogasawara, and Sanchez for claim 1. See Pet. 61–64. Based on the record before us and for the reasons discussed for claim 1, we determine that Petitioner has not established a reasonable likelihood of prevailing in its contention that claims 10 and 16 would have been obvious over the combination of Calderone, Ogasawara, Sanchez, and Cooper.
III. CONCLUSION

After considering the evidence and arguments presented in the Petition, we determine that Petitioner has not demonstrated a reasonable likelihood of success in proving that at least one claim of the ’408 patent is unpatentable. Therefore, we do not institute an inter partes review on the asserted grounds as to any of the challenged claims.

IV. ORDER

In consideration of the foregoing, it is hereby:

ORDERED that the Petition is denied as to all challenged claims of the ’408 patent and no inter partes review is instituted.

PETITIONER:

J. David Hadden
Saina Shamilov
Allen Wang
FENWICK & WEST LLP
dhadden@fenwick.com
sshamilov@fenwick.com
allen.wang@fenwick.com

PATENT OWNER:

James Hannah
Jeffrey H. Price
Jonathan Caplan
KRAMER LEVIN NAFTALIS & FRANKEL LLP
jhannah@kramerlevin.com
jprice@kramerlevin.com
jcaplan@kramerlevin.com