10756534 (B.P.A.I. Aug. 8, 2012)

Ex Parte Steely et al

Board of Patent Appeals and Interferences, Aug. 8, 2012

UNITED STATES PATENT AND TRADEMARK OFFICE
____________
BEFORE THE BOARD OF PATENT APPEALS AND INTERFERENCES
____________
Ex parte SIMON C. STEELY, JR. and GREGORY EDWARD TIERNEY
____________
Appeal 2010-004607
Application 10/756,534
Technology Center 2100
____________
Before JOSEPH F. RUGGIERO, DENISE M. POTHIER, and BRIAN J. McNAMARA, Administrative Patent Judges.

McNAMARA, Administrative Patent Judge.

DECISION ON APPEAL

SUMMARY

Appellants appeal under 35 U.S.C. § 134(a) from the Examiner’s final rejection of claims 1-6 and 9-34. Claims 7, 8, 35, and 36 have been objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. (App. Br. 3). We have jurisdiction under 35 U.S.C. § 6(b).

We reverse.

STATEMENT OF THE CASE

Appellants’ invention relates to a cache coherency protocol in a multiprocessor system in which a source processor speculatively executes load instructions beyond a memory barrier to retrieve cache lines for the processor. The memory barrier prevents the processor from retiring the instructions beyond the barrier until all instructions within the barrier have been executed. One or more of the cache lines may be invalidated prior to retiring an instruction if a comparison between updated data fills and the invalidated cache line indicates that the load instruction has violated the memory consistency of the system. (Spec. ¶¶ [0002], [0017], [0023]).

Claim 1 is illustrative:

1.
A multi-processor system that conforms to a cache coherency protocol, the system comprising:

a processor that executes program instructions beyond at least one memory barrier of at least one executed load instruction;

a request engine that retrieves an updated data fill with an undetermined coherency state from one or more other processors of the multi-processor system via a processor-to-processor data fill, the updated data fill corresponding to an invalid cache line from which data had been retrieved by the at least one executed load instruction; and

a load compare component that compares the invalid cache line to the updated data fill to evaluate the consistency of the at least one executed load instruction prior to retiring the at least one executed load instruction.

THE REJECTIONS

Claims 1-3, 9-11, 14-18, 24-26, and 28 are rejected under 35 U.S.C. § 103(a) as unpatentable over US 6,775,749 B1, issued on August 10, 2004 (Mudgett), in view of “Value Locality and Load Value Prediction,” ASPLOS-VII, October 1996 preprint (Lipasti) and US Patent No. 5,937,431, issued on August 10, 1999 (Kong).

Claims 4-6, 12, 13, 19-23, 27, and 29-34 are rejected under 35 U.S.C. § 103(a) as unpatentable over Mudgett in view of Lipasti and Kong in further view of “The Cache Memory Book” (Handy).[1]

[1] Claims 7-8, 35, and 36 were considered allowable by the Examiner and are not the subject of this appeal.

CONTENTIONS

In rejecting claim 1, the Examiner finds that Mudgett teaches the claimed multiprocessor system having a cache coherency protocol. (Ans. 3-4).[2] The Examiner maps (i) the claimed processor that executes instructions beyond a memory barrier to CPU 102A, (ii) the other claimed processors to CPU 102B of Mudgett, (iii) the claimed request engine to the part of processor 102A that communicates with Mudgett’s cache coherency mechanism 104, (iv) the retrieval of an updated data fill with an undetermined coherency to the speculative response taught by Mudgett at column 5, lines 65-67, and (v) the claimed updated data fill to the request caused by a cache miss in Mudgett. (Ans. 4). The Examiner finds that Mudgett does not disclose the further functionality recited in claim 1.

[2] Throughout, we refer to the Appeal Brief filed on August 18, 2009, the Examiner’s Answer mailed on Nov. 30, 2009, and the Reply Brief filed on February 1, 2010.

The Examiner finds that Lipasti teaches (i) a processor that executes program instructions beyond a memory barrier of at least one executed load instruction, (ii) retrieving the data fill with an undetermined coherency state from the multiprocessor system (in the form of a predicted value), and (iii) comparing the invalid cache line to the updated data fill to evaluate the consistency of the executed load instruction prior to retiring it. (Ans. 4). The Examiner finds that Kong teaches the claimed processor-to-processor data fill. (Ans. 5).

Appellants contend that the combination of Mudgett, Lipasti, and Kong would require substantial redesign of Lipasti’s processor. More to the point, Appellants argue that Lipasti teaches a load value prediction (LVP) method in which the processor that includes an LVP unit does not receive values, but instead generates predictions. (App. Br. 12). According to Appellants, since Lipasti discloses predicting the load value, the combination of Mudgett and Lipasti cannot be read to teach the claimed feature of comparing an updated data fill to an invalid cache line.
(Reply Br. 2). Appellants further argue that Kong teaches that a data fill may come from shared memory or local memory but not another processor as claimed. (App. Br. 13; Reply Br. 1-2).

ISSUE

Did the Examiner err in finding Lipasti teaches comparing an invalid cache line to an updated data fill to evaluate the consistency of an executed load instruction before retiring the instruction?

ANALYSIS

Independent claims 1 and 24 are rejected over Mudgett, Lipasti, and Kong. Independent claims 19 and 29 are rejected over these same references and Handy. Claims 1 and 19 recite a load compare component, claim 24 recites a means for comparing, and claim 29 recites a comparing step. Each of these claims recites comparing the invalidated cache line to the updated data fill. Because we conclude that this feature is not disclosed in the applied references, we reverse the rejections of independent claims 1 and 24 over Mudgett, Lipasti, and Kong, and claims 19 and 29 over Mudgett, Lipasti, Kong, and Handy.

Mudgett teaches a system in which each processor may request data from a shared memory whenever that data is not in the processor’s cache. (Col. 5, ll. 42-44). However, before requested data is provided from a shared portion of memory, a cache coherency mechanism ascertains whether the other processor has a copy of the requested line in its cache and, if so, whether the other processor has modified its copy of the requested line. (Col. 5, ll. 44-50). In Mudgett, a first processor’s request for a cache fill from a shared memory is fulfilled before receipt of a response to a probe that the first processor sends to a second processor to test the integrity of the data. If the second processor’s subsequent response to the probe indicates that the shared memory data received by the first processor is stale, the speculative cache fill from the shared memory is invalidated. (Col. 5, l. 42 - Col. 6, l. 10).
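
The Mudgett sequence described above can be illustrated with a minimal Python sketch. It is offered only as an aid to reading and is not drawn from the record: the names `ProbeResponse` and `speculative_fill`, and the modeling of shared memory as a dictionary, are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProbeResponse:
    """Hypothetical reply from the second processor to a probe."""
    has_copy: bool   # the second processor holds the requested line
    modified: bool   # the second processor has modified its copy


def speculative_fill(shared_memory, address, probe_other_processor):
    """Return (data, valid) for a speculative fill from shared memory.

    Assumption: the fill is delivered before the probe response arrives,
    and is invalidated afterward if the response shows the data is stale.
    """
    data = shared_memory[address]              # fill is fulfilled first
    response = probe_other_processor(address)  # probe response arrives later
    stale = response.has_copy and response.modified
    return data, not stale
```

In this sketch the validity decision rests solely on the probe response; the fill is never compared against any other copy of the line.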
Among the features common to all the claims that the Examiner does not find taught by Mudgett is that of a load compare component that compares the invalid cache line to the updated data fill. (Ans. 4). The Examiner’s reliance on Lipasti for this teaching is flawed.

While Lipasti also teaches a speculative cache fill, instead of probing the second processor to test the integrity of the data, Lipasti predicts the register values to be loaded from memory (col. 3,[3] ll. 14-17). When the actual value returns from the data cache, Lipasti compares it with the predicted data, and the dependent speculative instructions are either written back or reissued. (Col. 9, ll. 29-38). Lipasti’s comparison of the returned value with a predicted value is in contrast to comparing the updated data fill to an invalid cache line from which data had been retrieved.

Mudgett teaches determining if the speculative cache fill is invalid based on the response to a probe, while Lipasti teaches comparing the actual returned value to a predicted value. Neither reference teaches the claimed feature of comparing the actual returned value to an invalid cache line from which data had been retrieved by an executed load instruction. The Examiner identifies no other teaching of this claimed feature in any of the references. Instead, the Examiner asserts that Kong teaches the claimed processor-to-processor data fill (Ans. 5) and that Handy teaches a gate-level implementation of Lipasti’s system. (Ans. 9).

We conclude that the Examiner erred in finding that Lipasti teaches comparing an invalid cache line to an updated data fill to evaluate the consistency of an executed load instruction before retiring the instruction.
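
The distinction drawn above between the two comparisons can be shown in a short Python sketch. It is a simplified illustration under stated assumptions, not an implementation taken from Lipasti or from the application: the function names are hypothetical, and a cache line is assumed to be a list of words compared in full.

```python
def lipasti_verify(predicted_value, actual_value):
    """Lipasti-style check: compare the value returned from the data
    cache with a value the processor itself predicted."""
    return "write back" if predicted_value == actual_value else "reissue"


def claimed_load_compare(invalid_line, updated_fill):
    """Claim-1-style check: compare the invalid cache line, from which
    the executed load already retrieved data, with the updated data fill
    received from another processor. True means the load is consistent."""
    return invalid_line == updated_fill
```

The operands differ in kind. The first function takes a prediction as one input, while the second takes two copies of actual cache line data, which is the difference on which the analysis turns.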
Since the Examiner’s erroneous application of Lipasti is common to all the rejections and because the Examiner does not assert that any of Mudgett, Kong, or Handy teaches the limitations common to all the claims that are lacking in Lipasti, we reverse the rejections of all the claims.

[3] Lipasti does not include column numbers. Column 3 is located on its second page, first column. The other cites to Lipasti follow accordingly.

ORDER

The rejection of claims 1-3, 9-11, 14-18, 24-26, and 28 under 35 U.S.C. § 103(a) as unpatentable over Mudgett, in view of Lipasti and Kong, is reversed.

The rejection of claims 4-6, 12, 13, 19-23, 27, and 29-34 under 35 U.S.C. § 103(a) as unpatentable over Mudgett in view of Lipasti and Kong in further view of Handy is reversed.

REVERSED

rwk