Ex Parte KUMAR et al
Patent Trial and Appeal Board
Mar 26, 2019
13721276 - (D) (P.T.A.B. Mar. 26, 2019)

UNITED STATES PATENT AND TRADEMARK OFFICE

UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 13/721,276
FILING DATE: 12/20/2012
FIRST NAMED INVENTOR: RAKESH KUMAR
ATTORNEY DOCKET NO.: SRI6631-2
CONFIRMATION NO.: 3639
EXAMINER: HE, YINGCHUN
ART UNIT: 2613
NOTIFICATION DATE: 03/28/2019
DELIVERY MODE: ELECTRONIC

14824 7590 03/28/2019
Moser Taboada / SRI International
1030 Broad Street, Suite 203
Shrewsbury, NJ 07702

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication.

Notice of the Office communication was sent electronically on above-indicated "Notification Date" to the following e-mail address(es): docketing@mtiplaw.com llinardakis@mtiplaw.com

PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE

BEFORE THE PATENT TRIAL AND APPEAL BOARD

Ex parte RAKESH KUMAR, SUPUN SAMARASEKERA, GIRISH ACHARYA, MICHAEL JOHN WOLVERTON, NECIP FAZIL AYAN, ZHIWEI ZHU, and RYAN VILLAMIL

Appeal 2017-011807
Application 13/721,276
Technology Center 2600

Before JEREMY J. CURCURI, BARBARA A. BENOIT, and AARON W. MOORE, Administrative Patent Judges.

CURCURI, Administrative Patent Judge.

DECISION ON APPEAL

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner's rejection of claims 1, 3-8, 10-17, 19, and 21-23. Final Act. 1. We have jurisdiction under 35 U.S.C. § 6(b).

Claims 1, 3, 4, 6, 10-13, 15, 19, 21, and 22 are rejected under pre-AIA 35 U.S.C. § 103(a) as obvious over Geisner (US 2013/0083063 A1; Apr.
4, 2013), Birnholtz (Jeremy Birnholtz, Abhishek Ranjan, and Ravin Balakrishnan, Providing Dynamic Visual Information for Collaborative Tasks: Experiments With Automatic Camera Control, Taylor & Francis Group, LLC, Human-Computer Interaction, 2010, Volume 25, pp. 261-287), Bremond (François Bremond. Scene Understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition. Computer Science [cs]. Université Nice Sophia Antipolis, 2007.), and Wilson (US 2004/0193413 A1; Sep. 30, 2004). Final Act. 4-14.

Claims 5 and 14 are rejected under pre-AIA 35 U.S.C. § 103(a) as obvious over Geisner, Birnholtz, Bremond, Wilson, and Dishongh (US 2009/0083205 A1; Mar. 26, 2009). Final Act. 14-17.

Claims 7, 8, 16, and 17 are rejected under pre-AIA 35 U.S.C. § 103(a) as obvious over Geisner, Birnholtz, Bremond, Wilson, and Crespo (US 2001/0041978 A1; Nov. 15, 2001). Final Act. 17-19.

Claim 23 is rejected under pre-AIA 35 U.S.C. § 103(a) as obvious over Geisner, Birnholtz, Bremond, Wilson, and Bar-Zeev (US 2012/0068913 A1; Mar. 22, 2012). Final Act. 19-26.

We affirm.

STATEMENT OF THE CASE

Appellants' invention relates to "mentoring via an augmented reality assistant." Spec. ¶ 1. Claim 1 is illustrative and reproduced below, with the key disputed limitations emphasized:

1.
A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task comprising:

generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;

correlating the scene understanding with a domain knowledge data to create a task understanding wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;

recognizing from the video input at least one object within the view of the user;

interpreting a plurality of natural language utterances from the speech received from the user, based at least partly on the task understanding and the at least one object;

generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the set of goals, the ongoing interaction comprising at least one audible interaction with the user based on the video input and the interpreted natural language utterances;

presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation; and

guiding a user to perform the task understanding during operation of the task understanding via audio output.
PRINCIPLES OF LAW

We review the appealed rejections for error based upon the issues identified by Appellants, and in light of the arguments and evidence produced thereon. Ex parte Frye, 94 USPQ2d 1072, 1075 (BPAI 2010) (precedential).

ANALYSIS

THE OBVIOUSNESS REJECTION OF CLAIMS 1, 3, 4, 6, 10-13, 15, 19, 21, AND 22 OVER GEISNER, BIRNHOLTZ, BREMOND, AND WILSON

Contentions

The Examiner finds Geisner, Birnholtz, Bremond, and Wilson teach all limitations of claim 1. Final Act. 4-10.

In particular, the Examiner finds Geisner teaches creating, based on the scene understanding and a domain knowledge base, a task understanding, (claim 1) "wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding." See Final Act. 5 (citing Geisner ¶¶ 46, 150-151, 166).

In particular, the Examiner finds Geisner combined with Bremond teaches (claim 1) "correlating the scene understanding with a domain knowledge data to create a task understanding wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding." See Final Act. 9 (citing Bremond 34). The Examiner reasons it would have been obvious at the time of the invention to incorporate the teachings of Bremond into the teachings of Geisner "in order to implement an automatic augmented reality application." Final Act. 9.

In particular, the Examiner finds Birnholtz teaches (claim 1) "interpreting a plurality of natural language utterances from the speech received from the user, based at least partly on the task understanding and the at least one object." See Final Act. 6-8 (citing Birnholtz 261, 263, 268, 271).
The Examiner reasons it would have been obvious at the time of the invention to incorporate the teachings of Birnholtz into the teachings of Geisner "in order to allow groups of geographically distributed individuals to work together as taught by Birnholtz." Final Act. 8 (citing Birnholtz 261).

Appellants present the following principal arguments:

i. The references do not teach "correlating the scene understanding with a domain knowledge data to create a task understanding wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding" as recited in claim 1. See App. Br. 9-11; see also Reply Br. 2-5.

The combined teachings at most teach that reference data can be used by a service provider to develop a scene understanding and then the service provider can use his/her knowledge to understand a task being performed by a service consumer in that scene. In such combined teachings reference data is used to develop a scene understanding instead of domain knowledge data being used in combination with the scene understanding to create a task understanding. Instead, in such combined teachings a task understanding is created in a service provider's mind.

App. Br. 10-11.

ii. The references do not teach "interpreting a plurality of natural language utterances from the speech received from the user, based at least partly on the task understanding and the at least one object" as recited in claim 1. See App. Br. 11-13; see also Reply Br. 6-9.

Any allowable combination of the references would only teach a system in which detailed visual information is provided to participants and in which any utterances from the participants are heard by a service provider such that a service provider can learn what a consumer desires to do and in which the consumer utterances are recorded and coded based on attributes of the speech to classify consumer speech.
Even if it could be argued that the combination of the references teach that a service provider uses the recorded and coded utterances of a consumer to interpret what a consumer wants to accomplish, the recorded and coded utterances taught by any allowable combination of the references are not interpreted based on the task understanding and at last one object in the user's view as taught and claimed by the Appellants.

App. Br. 12-13.

iii. The rejection is based upon improper hindsight reasoning. See App. Br. 14-16; see also Reply Br. 9-10.

[U]sing reference data to develop a scene understanding, as taught by Bremond, clearly does not provide any reasoning or motivation to modify Geisner such that the scene understanding is correlated with a domain knowledge data to create a task understanding as claimed by the Appellants. Bremond has nothing to do with methods and an apparatus for utilizing augmented reality to assist a user in performing a real-world task as claimed by the Appellants.

App. Br. 15.

[T]he teachings of Birnholtz that participants will have to ask fewer questions to clarify what is being discussed when detailed visual information is provided than when only overview information is provided in order to clarify a task needed to be completed among multiple participants clearly does not provide any reasoning or motivation to modify Geisner such that a plurality of natural language utterances from the speech received from the user is interpreted based at least partly on the task understanding and the at least one object as claimed by the Appellants. Birnholtz has nothing to do with methods and an apparatus for utilizing augmented reality to assist a user in performing a real-world task as claimed by the Appellants.

App. Br. 15.
[T]here are no teachings in Wilson that provide any reasoning or motivation to modify Geisner such [that] the scene understanding is correlated with a domain knowledge data to create a task understanding and that a plurality of natural language utterances from the speech received from the user is interpreted based at least partly on the task understanding and the at least one object as claimed by the Appellants.

App. Br. 15-16.

Our Review

The primary reference, Geisner, discloses the following:

A collaborative on-demand system allows a user of a head-mounted display device (HMDD) to obtain assistance with an activity from a qualified service provider. In a session, the user and service provider exchange camera-captured images and augmented reality images. A gaze-detection capability of the HMDD allows the user to mark areas of interest in a scene. The service provider can similarly mark areas of the scene, as well as provide camera-captured images of the service provider's hand or arm pointing to or touching an object of the scene. The service provider can also select an animation or text to be displayed on the HMDD. A server can match user requests with qualified service providers which meet parameters regarding fee, location, rating and other preferences. Or, service providers can review open requests and self-select appropriate requests, initiating contact with a user.

Geisner Abstract.

Thus, Geisner teaches augmented reality to assist a user in performing a real-world task. Geisner Abstract.

The Examiner relies on Geisner to teach most limitations of claim 1. See Final Act. 4-10. The remaining, secondary references are relied on by the Examiner for particular details of claim 1. See Final Act. 6-10.

We have reviewed Appellants' arguments in the Appeal Brief and the Reply Brief, and have given all arguments their full weight.
However, we do not see any error in the contested Examiner's findings, and we concur with the Examiner's conclusions of obviousness.

Regarding Appellants' argument (i), [t]he test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference; nor is it that the claimed invention must be expressly suggested in any one or all of the references. Rather, the test is what the combined teachings of the references would have suggested to those of ordinary skill in the art. In re Keller, 642 F.2d 413, 425 (CCPA 1981).

In our view, the collective teachings of the references suggest the argued limitation. We agree with and adopt as our own the Examiner's explanation that "Bremond discloses the use of reference data to improve scene understanding." Ans. 3. We further agree with and adopt as our own the Examiner's further explanation that

the combination of Geisner and Bremond teaches correlating the scene understanding (the service provider's knowledge as taught by Geisner or computer vision, cognition and software engineering as taught by Bremond) with a domain knowledge data (the reference data as taught by Bremond) to create a task understanding wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding in order to generate an automatic and improved [] scene understanding for the augmented reality application.

Ans. 4.

Put another way, Bremond is relied on for the concept of correlating scene understanding with data. See Bremond 34 ("To be able to improve scene understanding systems, we need at one point to evaluate their performance. The classical methodology for performance evaluation consists in using reference data (called ground truth)."). When this teaching from Bremond is incorporated into Geisner, Geisner is improved to correlate the scene understanding with a domain knowledge data.
See Ans. 4.

In reaching our conclusion, we recognize that Bremond's reference data is used for performance evaluation of scene understanding systems. However, the rejection does not attempt to incorporate this specific use of reference data in Geisner. Rather, it is the more general concept of correlating scene understanding with data that is being incorporated into Geisner. See Ans. 4.

Thus, Appellants' argument (i) does not persuade us of any Examiner error.

Regarding Appellants' argument (ii), in our view, the collective teachings of the references suggest the argued limitation. We agree with and adopt as our own the Examiner's explanation that "Geisner discloses through audio ([0150] line 19) input, the service provider has spoken with the service consumer to learn that the service consumer desires to add oil to the engine." Ans. 4. We further agree with and adopt as our own the Examiner's further explanation that Birnholtz combines video and natural language utterances in order to clarify a task. Ans. 4-5 (citing Birnholtz 263, 268, 271); see also Ans. 5 (citing Birnholtz 261) ("it would have been obvious ... in order to allow groups of geographically distributed individuals to work together as taught by Birnholtz.").

We understand Appellants' emphasis on the need for utterances to be interpreted based on the task understanding and the at least one object. However, Birnholtz teaches "interpreting a plurality of natural language utterances from the speech received from the user." See Birnholtz 261, 263, 268, 271. When combined with Geisner, the references collectively teach "interpreting a plurality of natural language utterances from the speech received from the user, based at least partly on the task understanding and the at least one object." See Final Act. 6-8 (citing Birnholtz 261, 263, 268, 271).
Put another way, based on the combined teachings of the references, a skilled artisan would have interpreted the utterances based on the task understanding. See Ans. 4-5.

Thus, Appellants' argument (ii) does not persuade us of any Examiner error.

Regarding Appellants' argument (iii), with respect to modifying Geisner in light of Bremond to teach (claim 1) "correlating the scene understanding with a domain knowledge data to create a task understanding wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding," Appellants assert that the Examiner does not provide any reasoning. See App. Br. 15. We disagree. The Examiner articulates a reason to combine the references that is rational on its face and supported by evidence drawn from the record. See Final Act. 9 ("in order to implement an automatic augmented reality application"), Geisner Abstract (describing augmented reality to assist a user in performing a real-world task), Bremond 34 (describing correlating scene understanding with data); see also Ans. 6. On the record before us, we find this reasoning outweighs Appellants' assertion.

Regarding Appellants' argument (iii), with respect to modifying Geisner in light of Birnholtz to teach (claim 1) "interpreting a plurality of natural language utterances from the speech received from the user, based at least partly on the task understanding and the at least one object," Appellants assert that the Examiner does not provide any reasoning. See App. Br. 15. We disagree. The Examiner articulates a reason to combine the references that is rational on its face and supported by evidence drawn from the record. See Final Act. 8 (citing Birnholtz 261) ("in order to allow groups of geographically distributed individuals to work together as taught by Birnholtz."); see also Ans. 7. On the record before us, we find this reasoning outweighs Appellants' assertion.
We, therefore, sustain the Examiner's rejection of claim 1.

We also sustain the Examiner's rejection of claims 3, 4, 6, 10-13, 15, 19, 21, and 22, which are not separately argued with particularity.

THE OBVIOUSNESS REJECTION OF CLAIMS 5 AND 14 OVER GEISNER, BIRNHOLTZ, BREMOND, WILSON, AND DISHONGH

Appellants argue Dishongh fails to cure the deficiency in the rejection of independent claims 1 and 11, from which claims 5 and 14 depend, respectively. App. Br. 17.

For reasons given above, we see no error in the Examiner's rejection of independent claims 1 and 11.

We, therefore, sustain the Examiner's rejection of claims 5 and 14.

THE OBVIOUSNESS REJECTION OF CLAIMS 7, 8, 16, AND 17 OVER GEISNER, BIRNHOLTZ, BREMOND, WILSON, AND CRESPO

Appellants argue Crespo fails to cure the deficiency in the rejection of independent claims 1 and 11, from which claims 7, 8, 16, and 17 variously depend. App. Br. 17-18.

For reasons given above, we see no error in the Examiner's rejection of independent claims 1 and 11.

We, therefore, sustain the Examiner's rejection of claims 7, 8, 16, and 17.

THE OBVIOUSNESS REJECTION OF CLAIM 23 OVER GEISNER, BIRNHOLTZ, BREMOND, WILSON, AND BAR-ZEEV

Appellants argue Bar-Zeev fails to cure the deficiency in the rejection of independent claims 1 and 11, and that independent claim 23 recites similar relevant features. App. Br. 18.

For reasons given above, we see no error in the Examiner's rejection of independent claims 1 and 11.

We, therefore, sustain the Examiner's rejection of claim 23.

ORDER

The Examiner's decision rejecting claims 1, 3-8, 10-17, 19, and 21-23 is affirmed.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1).

AFFIRMED