Ex Parte Lippincott et al., No. 12/044,775 (P.T.A.B. Apr. 17, 2015)

UNITED STATES PATENT AND TRADEMARK OFFICE
UNITED STATES DEPARTMENT OF COMMERCE
United States Patent and Trademark Office
Address: COMMISSIONER FOR PATENTS, P.O. Box 1450, Alexandria, Virginia 22313-1450, www.uspto.gov

APPLICATION NO.: 12/044,775    FILING DATE: 03/07/2008    FIRST NAMED INVENTOR: Lisa Ellen LIPPINCOTT    ATTORNEY DOCKET NO.: AUS920105019US4    CONFIRMATION NO.: 7914

50170 7590 04/17/2015
IBM CORP. (WIP)
c/o WALDER INTELLECTUAL PROPERTY LAW, P.C.
17304 PRESTON ROAD, SUITE 200
DALLAS, TX 75252

EXAMINER: GMAHL, NAVNEET K    ART UNIT: 2166
MAIL DATE: 04/17/2015    DELIVERY MODE: PAPER

Please find below and/or attached an Office communication concerning this application or proceeding. The time period for reply, if any, is set in the attached communication. PTOL-90A (Rev. 04/07)

UNITED STATES PATENT AND TRADEMARK OFFICE
____________________
BEFORE THE PATENT TRIAL AND APPEAL BOARD
____________________
Ex parte LISA ELLEN LIPPINCOTT, PETER JAMES LINCROFT, PETER BENJAMIN LOER, JOHN EDWARD FIREBAUGH, and DENNIS SIDNEY GOODROW
____________________
Appeal 2013-001027
Application 12/044,775
Technology Center 2100
____________________

Before STANLEY M. WEINBERG, DANIEL J. GALLIGAN, and CARL L. SILVERMAN, Administrative Patent Judges.

GALLIGAN, Administrative Patent Judge.

DECISION ON APPEAL

Appellants 1 seek our review under 35 U.S.C. § 134(a) of the Examiner's Final Rejection of claims 1–23. We have jurisdiction under 35 U.S.C. § 6(b). We affirm and designate our affirmance of the rejection under 35 U.S.C. § 103(a) as a new ground of rejection. 2

1 The Appeal Brief identifies International Business Machines Corp. as the real party in interest. App. Br. 3.

2 Our Decision refers to Appellants' Appeal Brief filed May 15, 2012 ("App.
Br."); Appellants' Reply Brief filed October 16, 2012 ("Reply Br."); Final Office Action mailed December 22, 2011 ("Final Act."); Examiner's Answer mailed August 16, 2012 ("Ans."); and original Specification filed March 7, 2008 ("Spec.").

STATEMENT OF THE CASE

Claims on Appeal

Claims 1 and 23 are independent claims. Claim 1 is reproduced below:

1. An apparatus for synchronizing network element state when a network connection is restored between a plurality of servers after a network failure, comprising:

said plurality of servers configured to maintain a plurality of corresponding objects, each object existing in a plurality of different versions, wherein each said different object version results from modifications to an object made by corresponding different servers during said network failure when one or more of said plurality of servers are unable to communicate with each other but otherwise continue to function;

each said object comprising a vector including two or more elements, each element associated with each said server and each element containing a separate version number for each said server, wherein responsive to modifying said object, each said server finds its associated element of the two or more elements in the vector, and wherein each said server increments its version number in said associated element in said vector;

each said server further comprising an automatic conflict resolution processor configured for providing, at each server, a most up-to-date view of all objects across all of said plurality of servers upon restoration of said network connection between said plurality of servers after said network failure, said conflict resolution processor configured for reconciling the existence of said plurality of different versions of an object to determine which object version should take precedence over other object versions, wherein conflict resolution is performed when there are multiple versions of a same object at a server, said conflict resolution processor further configured for enforcing comprising a tie breaking rule dictating that a server having a lowest version number takes precedence over other servers when determining which object version should take precedence over other object versions.

The prior art relied upon by the Examiner in rejecting the claims on appeal:

Shaheen et al., US 5,434,994, July 18, 1995 (hereinafter "Shaheen")
Lev Ran et al., US 2004/0255048 A1, Dec. 16, 2004 (hereinafter "Lev" or "Lev Ran")

Examiner's Rejections

Claims 1–22 3 stand rejected under 35 U.S.C. § 101 as being directed to non-statutory subject matter. Final Act. 3.

Claims 1–23 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Lev Ran and Shaheen. Final Act. 4–12.

ANALYSIS

Rejection of Claims 1–22 under 35 U.S.C. § 101

The Examiner found claim 1 encompasses software per se because it does not recite hardware for the claimed apparatus and, therefore, rejected claim 1 and dependent claims 2–22 under 35 U.S.C. § 101. Final Act. 3. Appellants contend claim 1 "claims a machine, i.e. each of a plurality of servers" and argue the Examiner erred in concluding claim 1 is not directed to statutory subject matter because "[a] server is a piece of hardware, i.e. a machine." App. Br. 12.

3 Although Appellants identify claim 23 as also rejected under 35 U.S.C. § 101 (App. Br. 11–12), the Examiner did not include claim 23 in the rejection (Final Act. 3).

We are not persuaded of Examiner error. Although the term "server" can refer to hardware, the description of "server" in Appellants' own Specification indicates that its meaning is not so limited: "Note that the term server, as used herein, does not refer only to a server in the generic sense of a server. Rather, a server as a collection of policy and a collection of state." Spec. 10, ll. 16–19.
As such, Appellants' argument that the recitation of the term "server" renders the claim patent eligible is not persuasive. Therefore, we sustain the rejection of claim 1 under 35 U.S.C. § 101. Appellants do not separately argue dependent claims 2–22, and, therefore, we also sustain the rejection of these claims under 35 U.S.C. § 101. 4

4 Should there be further prosecution of this application (including any review for allowance), the Examiner may wish to review claims 1–23 for compliance under 35 U.S.C. § 101 in light of the recently issued preliminary examination instructions on patent eligible subject matter. See "Interim Guidance on Patent Subject Matter Eligibility," December 16, 2014.

Rejection of Claims 1–23 under 35 U.S.C. § 103(a)

Appellants contend the Examiner erred in finding the combination of Lev Ran and Shaheen teaches or suggests the limitation of claim 1 reciting:

each said object comprising a vector including two or more elements, each element associated with each said server and each element containing a separate version number for each said server, wherein responsive to modifying said object, each said server finds its associated element of the two or more elements in the vector, and wherein each said server increments its version number in said associated element in said vector.

App. Br. 15–21; Reply Br. 2–3. Claim 23 recites a similar limitation. See App. Br. 28.

In rejecting claim 1, the Examiner acknowledged "Lev does not disclose a vector including a separate version number for each server explicitly as claimed." Final Act. 5. Instead, the Examiner found "Shaheen however discloses a fileset version vector being maintained by each server in column 7 lines 22 – 40 and lines 58 – 67." Id. Shaheen teaches: "A fileset version vector is maintained by each server for each fileset replica [see Parker et al.
'Detection of Mutual Inconsistency in Distributed Systems', IEEE Transactions on Software Engineering, May 1983]." Shaheen, col. 7, ll. 22–25. Although Shaheen teaches the use of a version vector, Shaheen itself does not disclose the details of the version vector. Rather, those details are disclosed in the publication by Parker et al. (hereinafter "Parker") referred to in the cited passage of Shaheen.

Parker teaches a version vector having two or more elements (sequence of n pairs), with each element associated with a server and having a version number for the server (ith pair (Si: vi)):

A version vector for a file f is a sequence of n pairs, where n is the number of sites at which f is stored. The ith pair (Si: vi) gives the index of the latest version of f made at site Si. In other words, the ith vector entry counts the number vi of updates to f made at site Si. We will use leters [sic] A, B, C, · · · to designate site names, and vectors will be written as .

Parker 243.

Parker further teaches incrementing the version number of the server that modifies a file: "Each time an update to f originates at site Si, we increment the Si th component of f's version vector by one. The vector is committed with the updated file." Parker 243. Thus, we find Parker, which is referenced in the cited portion of Shaheen, teaches the limitation Appellants argue is lacking in the teachings of Lev Ran and Shaheen.

The Examiner provided the following rationale for combining Lev Ran and Shaheen:

It would have been obvious to one of ordinary skill in the art of data processing at the time of the present invention to combine the teachings of cited references because both references are directed towards information and data being consistently maintained in a network.
Furthermore, the version vector being maintained in Shaheen by each server assists in consist [sic] information management and also efficient conflict resolution in column 7 lines 22 - 40 and lines 58 - 67.

Final Act. 5. This suffices as an articulated reason with rational underpinning to support the combination of Lev Ran and Shaheen, and we find this rationale is also applicable with respect to Parker, which teaches particular details of the version vector taught in Shaheen.

Appellants also argue Lev Ran's teaching of locking a document teaches away from the invention because Lev Ran does not allow servers to modify objects when they are locked. App. Br. 17. We are not persuaded because Lev Ran teaches that conflicts can occur. See, e.g., Lev Ran ¶¶ 233, 346. As the Examiner found, "the version vector being maintained in Shaheen by each server assists in consist [sic] information management and also efficient conflict resolution." Final Act. 5. Thus, we are not persuaded that Lev Ran's teachings discourage a combination with Shaheen's and Parker's teachings with respect to a version vector.

Appellants further argue Lev Ran teaches away from a vector with separate version numbers for each server because Lev Ran is silent about multiple servers and, instead, is concerned with a negotiation between a server and a client. App. Br. 21 (citing Lev ¶ 320). We are not persuaded because, as noted above, Lev Ran teaches conflicts can occur. Therefore, the version vector teachings of Shaheen and Parker provide a useful tool for identifying conflicts arising from conflicting writing processes at different computers.

On the record before us, Appellants have failed to persuade us of error in the rejection of claim 1. Accordingly, we sustain the rejection of claim 1 and dependent claims 2–22 and independent claim 23, which were not separately argued with particularity.
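As an aside for readers following the technical dispute, the version vector scheme Parker describes (one counter per site, incremented by the site that originates an update) can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the record or from Parker; the class and method names are invented for the example.

```python
class VersionVector:
    """One (site: version) counter per site storing the file."""

    def __init__(self, sites):
        # Every site starts with a version count of zero.
        self.counts = {site: 0 for site in sites}

    def record_update(self, site):
        # Per Parker: each time an update to f originates at site Si,
        # increment the Si-th component of f's version vector by one.
        self.counts[site] += 1


# A file stored at three sites; site A updates twice, site B once.
vv = VersionVector(["A", "B", "C"])
vv.record_update("A")
vv.record_update("A")
vv.record_update("B")
print(vv.counts)  # {'A': 2, 'B': 1, 'C': 0}
```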
Because our decision sets forth reasons for sustaining the rejection with respect to Parker that enlarge on the Examiner's findings and explanations, we designate our affirmance of the rejection under 35 U.S.C. § 103(a) as a new ground of rejection.

DECISION

The decision of the Examiner to reject claims 1–22 under 35 U.S.C. § 101 is AFFIRMED.

The decision of the Examiner to reject claims 1–23 5 under 35 U.S.C. § 103(a) is AFFIRMED.

5 Should there be further prosecution of this application (including any review for allowance), the Examiner may wish to determine if claims 1–23 comply with 35 U.S.C. § 112 ¶ 1. Each of independent claims 1 and 23 recites: "a tie breaking rule dictating that a server having a lowest version number takes precedence over other servers when determining which object version should take precedence over other object versions" (emphasis added). Appellants' Specification states that the tie breaking rule is "the object version having the lower server number wins" and explains that, in a conflict between SERVER0 and SERVER2, the tie breaker rule gives precedence to SERVER0 because it has a lower server number than SERVER2. Spec. 12, ll. 6–13. Thus, the tie breaking rule of the claim, which looks at a lowest version number of a server, may be inconsistent with the teachings of the Specification, which resolves conflicts between versions based on the lowest server number, not the lowest version number.

No time period for taking any subsequent action in connection with this appeal may be extended under 37 C.F.R. § 1.136(a)(1)(iv).

AFFIRMED

37 C.F.R. 41.50(b)

msc

Notice of References Cited: Application/Control No. 12/044,775, Applicant: LISA ELLEN LIPPINCOTT, Art Unit 2166, Page 1 of 1.

NON-PATENT DOCUMENTS
U. D. Stott Parker, Jr. et al., Detection of Mutual Inconsistency in Distributed Systems, IEEE Transactions on Software Engineering, Vol. SE-9, No. 3, 240–247 (May 1983).

*A copy of this reference is not being furnished with this Office action. (See MPEP § 707.05(a).)
U.S. Patent and Trademark Office, PTO-892 (Rev. 01-2001)

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-9, NO. 3, MAY 1983

Detection of Mutual Inconsistency in Distributed Systems

D. STOTT PARKER, JR., GERALD J. POPEK, GERARD RUDISIN, ALLEN STOUGHTON, BRUCE J. WALKER, EVELYN WALTON, JOHANNA M. CHOW, DAVID EDWARDS, STEPHEN KISER, AND CHARLES KLINE

Abstract: Many distributed systems are now being developed to provide users with convenient access to data via some kind of communications network. In many cases it is desirable to keep the system functioning even when it is partitioned by network failures. A serious problem in this context is how one can support redundant copies of resources such as files (for the sake of reliability) while simultaneously monitoring their mutual consistency (the equality of multiple copies). This is difficult since network failures can lead to inconsistency, and disrupt attempts at maintaining consistency. In fact, even the detection of inconsistent copies is a nontrivial problem. Naive methods either 1) compare the multiple copies entirely or 2) perform simple tests which will diagnose some consistent copies as inconsistent.
Here a new approach, involving version vectors and origin points, is presented and shown to detect single file, multiple copy mutual inconsistency effectively. The approach has been used in the design of LOCUS, a local network operating system at UCLA.

Index Terms: Availability, distributed systems, mutual consistency, network failures, network partitioning, replicated data.

I. INTRODUCTION

A NUMBER of operating systems have been developed recently in which user files are distributed almost without restriction around a network. These systems range from network operating systems (NOS's) such as RSEXEC, NSW, ELAN [17], and DCS [4], to distributed database management systems (DDBMS's) like SDD-1 [5], [13] and INGRES [15]. These systems emphasize the uniform interfacing of multiple file systems. Files are to be accessible throughout the network, without regard to the accessor or file location. Unfortunately, a file can be made inaccessible by network failures or crashes of the site where the file is located, so users may obtain randomly fluctuating views of the state of the network. To alleviate this problem, many of the systems propose to keep duplicate copies of files as a reliability mechanism. This solution engenders another problem. As soon as

Manuscript received November 11, 1980; revised November 13, 1981. This work was supported in part by ARPA Research Contract DSS MDA-903-77-C-0211 and in part by ONR Grant N00014-79-C-0866.
D. S. Parker, Jr., G. J. Popek, A. Stoughton, B. J. Walker, E. Walton, J. M. Chow, D. Edwards, and C. Kline are with the Department of Computer Science, University of California, Los Angeles, CA 90024.
G. Rudisin was with the Department of Computer Science, University of California, Los Angeles, CA 90024. He is now with the Systems Technology Center, Western Digital Company, Pittsburgh, PA 15213.
S. Kiser was with the Department of Computer Science, University of California, Los Angeles, CA 90024.
He is now with Xerox Corporation, El Segundo, CA 90245.

multiple copies of a file exist, the system must ensure the mutual consistency of these copies: when one copy of the file is modified, all must be modified correspondingly before an independent access can take place.

Much has been written about the problem of maintaining consistency in distributed systems, ranging from internal consistency methods (ways to keep a single copy of a resource looking consistent to multiple processes attempting to access it concurrently) to various ingenious updating algorithms which ensure mutual consistency [1], [2], [6], [8], [16], etc. We concern ourselves here with mutual consistency in the face of network partitioning, i.e., the situation where various sites in the network cannot communicate with each other for some length of time due to network failures or site crashes. This is a very real problem in most networks. For example, even in the Ethernet [10], gateways can be inoperative for significant lengths of time, while the Ether segments they normally connect operate correctly.

Network partitioning can completely destroy mutual consistency in the worst case, and this fact has led to a certain amount of restrictiveness, vagueness, and even nervousness in past discussions of how it may be handled. In some environments it is desirable or necessary to permit users to continue modifying resources such as files when the network is partitioned. A network operating system would be a good example. In such environments mutual inconsistency becomes a fact of life which must be dealt with. This paper shows that in this case mutual inconsistency can be efficiently detected through the use of what we call version vectors and origin points. Once inconsistency is detected, some reconciliation steps are needed. In those cases where the semantics of the operations involved are straightforward, automatic reconciliation may be possible.
It is worth reflecting for a moment on the worth of keeping redundant copies. Although redundancy increases reliability and availability, and in most cases improves access time, it leads to mutual consistency problems when network partitions occur. When considering whether to store a file redundantly one must weigh the advantage of greater availability, the probability of a mutual inconsistency, and the ramifications of such an inconsistency. In many NOS environments, file update rates are moderate and "conflicts" would occur only rarely. However, in transaction-oriented DDBMS's update rates may be high, semantics of operations complex, and consistency extremely important.

0098-5589/83/0500-0240$01.00 © 1983 IEEE

The results of this paper may be nevertheless useful in any system where mutual inconsistency, presumably due to network partitioning, is tolerated. Since our application (LOCUS [12], [14], [18]) is concerned with files, we will restrict our discussion henceforth to mutual consistency of files rather than of general resources. It is clear, however, that all results here may be applied to more general contexts.

The paper is organized as follows. Section II briefly surveys previous research on the partitioning problem. Section III then lays the formal groundwork on inconsistency detection. An accurate and easily implemented technique for detecting mutual inconsistency is developed. Section IV points out briefly what must be done in the reconciliation of inconsistent copies. Although the reconciliation of these conflicts must necessarily be left to the user in some cases, it is also demonstrated that for certain kinds of files (mailboxes, directories) the reconciliation may be performed automatically by the system. Finally, conclusions are offered in Section V.

II.
PREVIOUS WORK ON PARTITIONING

Network partitioning is the situation occurring when a network is broken into logically separate components because of site or link failures. There are many partitioning-related issues which must be addressed in the design of distributed file systems. These issues include the relative importance of availability over mutual consistency of files, what occurs when one finds a file has become inaccessible or out of date, and so forth. To our knowledge, however, partitioning has not been investigated very thoroughly. It has been mentioned in several proposed methods for updating files in distributed systems. The most typical response has been to enforce consistency by permitting files to be accessed only in one partition. Unfortunately, effective implementation of this policy can often result in the files being accessible in zero partitions. We outline several existing proposals below.

Voting: In voting-based systems such as proposed by Thomas [16] and Menasce et al. [9], mutual consistency is guaranteed at the expense of availability. Users desiring to modify a file must lock it by obtaining majority assent in a vote. Since there can be at most one partition containing a majority of the sites, any file will be accessible in at most one partition. Unfortunately, it is possible that there will be no partition which contains a majority of the sites, so in this case no updates could occur anywhere.

Tokens: Here it is assumed that each file has a token associated with it, which permits the bearer to modify the file. Obtaining the token is another issue, reducible more or less to locking. In this model only sites in the partition containing the token are permitted to modify the file, so using tokens is less restrictive than using voting. However, the problem of recreating lost tokens is nontrivial. Moreover, when a partition occurs, the token may happen to be resident in a rarely used part of the network, effectively making the resource unavailable.

Primary Sites: Originally discussed by Alsberg and Day [1], this approach suggests that a single site be appointed responsible for a file's activities. Upon partitioning (possibly involving a primary site crash) either 1) a backup site is elected as the new primary site and consistency becomes a possible problem (the proposed approach), or else 2) the file becomes inaccessible in all but the primary site partition.

Reliable Networks and Optimism: Communications in the SDD-1 system are based on the use of a "reliable network" [5], which guarantees the eventual delivery of all messages even if partitioning occurs. This delivery depends on "spoolers" which save messages to be transmitted following a break in communications. No guarantee of postpartition consistency exists; as with the primary site model, assuming consistent data afterwards is "optimistic" [6] in the sense that it may work out, but quite possibly the work done in different partitions will have to be detected in some way as inconsistent, and then undone or coalesced somehow by users.

Disk Toting: In this approach, employed at Xerox Parc and other installations where very intelligent terminals are linked via a network, files are not stored redundantly but are kept on removable storage media which can be carried around during prolonged partitions. Thus, availability and consistency are simultaneously achieved, but they are not achieved automatically. This approach is clearly only useful for local networks with compatible portable storage media at each site, where the delay and inconvenience implied is acceptable.

Note that none of these approaches openly states either 1) how conflicting versions of files are detected or 2) what is to be done when these conflicting files are detected upon merge of several partitions.
Either the possibility of conflict is precluded by restricting file availability, or else any seemingly conflicting files must be "rolled back" to the most recent point at which there was no conflict. We show in the next sections how, without restricted availability, we can ensure correct propagation of updates in all cases except when unavoidably conflicting file versions are found.

III. DETECTION OF MUTUAL INCONSISTENCY

One of the reasons the partition problem is so difficult is that each partition can break into subpartitions and/or merge with other partitions many times before the entire network finally becomes connected. Indeed, it is possible that the network will never be completely reconnected! However, all messages sent might be delivered eventually through dynamically changing partitions. In this unpleasant eventuality, how can one hope to guarantee mutual consistency of files without restricting file availability as in Section II? We now show how inconsistencies or "conflicts" in the file system can be accurately detected easily; this solves the major part of the problem. The next section will discuss how these inconsistencies may then be reconciled.

We must formalize what we mean by a file "conflict" which arises after a partition, and pinpoint the kinds of inconsistency which partitioning can cause. This is important since, as mentioned above, many basic systems principles are invalidated in systems subject to partitioning. First, the semantics of renaming, deletion, and even creation of redundantly stored files or resources in systems which are partitioned are totally unclear. Second, and worse, user-visible names of entities in the system may no longer be assumed to either uniquely specify, or even correctly specify, the entities themselves. After a partition, it may be discovered either that two files with the same name have been independently created, or that two independent updates to the same file have been made. In general, names in one partition bear no relation to entities in another. This is a principal reason for the difficulty in defining the semantics of renaming and deletion of files. We need some form of identification of system entities which is immune to partitioning. We achieve this below by using "origin points" and "version vectors."

A. File Conflict Types and Origin Points

A (network) partition is a set of sites which share a common, synchronized, view of some set of files.

An origin point OP(f) of a file f is a system wide unique identifier which is generated when f is created. It is an immutable attribute of f, although f's name is not (indeed f may have multiple system wide names). Thus, no number of modifications or renamings of f will change OP(f). An origin point for a file might be something like a (creation time, creation site) pair.

Now, just as names cannot uniquely specify files, origin points cannot either, but they do give us important information. Origin points tell us when two files are based on a common file, but do not tell us whether the two files are identical, since both could have been independently modified.

There are two types of conflicts that we wish to consider: name conflicts and version conflicts. A name conflict occurs when two files with different origin points have the same system wide name. In contrast, a version conflict occurs when two versions of the same file (same origin point) have been "incompatibly" modified. After some preliminaries, a version conflict occurrence is defined more precisely below.

A modification id for a version of a file f is a system wide unique identifier of a modification of f in some partition and at some time relative to that partition. A modification history for a version of a file f is the set of modification ids corresponding to the modifications of that version of f which have occurred. Two modification histories are compatible if they are identical or if one is an initial history of the other, and incompatible otherwise.

We define a version conflict to occur when two versions of the same file f (same origin point) have incompatible modification histories.

Note that when two versions of a file are not equal, their modification histories are always different. However, it is possible for two versions of a file to be equal yet have incompatible histories. For example, consider a file which contains a bank account balance. If the balance is $20 million initially, and both partitions decrease it to $0, then at partition merge time although both versions are $0 a conflict will be indicated. Further, if the semantics of "decrease" mean "withdraw," a conflict intuitively should occur. We claim that this definition of version conflict occurrence is a reasonable one given that nothing is known about the file content's semantics.

Clearly, name conflicts are easy to detect. Version conflicts, however, are more difficult to detect efficiently. This latter problem is addressed in the following sections: modifying, deleting, or renaming the various copies.

B. The Problem of Version Conflict Detection

One might think that a simple timestamp scheme could be used to detect possible version conflicts among files: every time a file is modified in a partition, one marks it with an update time and the previous update time. Upon partition merge, one checks whether the timestamps on the copies of a file are either all identical (no update on the file occurred), or one copy of the files differs from the others by a single update. Thus, no conflict is signaled when at most one update is made, but in any more complex situation a version conflict condition is raised.
This approach is deficient in general, since some nonconflict situations will be handled as conflicts. Let us describe the version conflict problem in the following way. Think of a partition for a file as a subset of sites in the network in which all copies of the file may be maintained with mutual consistency. Note that this definition is not strictly tied to the physical details of network failure. Instead, here partitions are defined relative to files and to the higher con- cept of consistency. Although two sites with different versions of a file f may be communicating for some time, we do not consider the sites to be in a common partition relative to f unless this difference in the two versions is resolved. Definition: A partition graph G(f) for any file f is a directed acycic graph (dag) which is labeled as follows. The source node (and the sink node if it exists) is labeled with the names of all sites in the network having copies of file f, and all other nodes are labeled with a subset of this set of names. Each node can only be labeled with site names appearing on its ancestor nodes in the graph; conversely every site name on a node must appear on exactly one of its descendants. In addition, a node is marked with a "+" iff is modified one or more times within the corresponding partition, and/or a version conflict had to be reconciled. We define this latter situation recursively as follows. Let P be a node in G(f). A version conflict had to be reconciled at P if there are backward paths from P to distinct nodes P1 and P2 in G(f), such that 1) an update to f and/or a version conflict reconciliation for f occurred at both P1 and P2, and 2) there is no ancestor node ofP having two backward paths to both PI and P2. Each node in G(f) thus corresponds to a partition for f, a period of time during which the labeled sites maintain "synchronized" information about f. 
All sites appearing in the node label resolve any differences that might exist among their copies of f. All connections in G(f) between nodes indicate transitions of the network under partitions or merges.

The definition of conflict and reconciliation models the notions of Section III-A for the following reasons. First, any version conflict that is reconciled must have been generated by two prior partitions P1 and P2, giving incompatible modification sets. Second, and conversely, if a file modification of some kind (update or reconciliation of updates) occurs independently in two partitions P1 and P2, a version conflict must arise later whenever sites from these partitions inspect f. Condition 2) guarantees that partition P is the first point at which mutual consistency is again established.

PARKER et al.: MUTUAL INCONSISTENCY IN DISTRIBUTED SYSTEMS

An example of a partition graph is shown in Fig. 1. Here there are four sites, A, B, C, and D, which support f. Multiple partitions of these initially connected sites occur, so that at first sites A and B can communicate, but are isolated from sites C and D. Later A and B become isolated, as do C and D, but B and C resume communication. Ultimately, all four sites are reconnected in the bottom node of the graph. The file f is modified first in the {A, B} partition, and subsequently in both the {A} and {B, C} partitions. Note that this sequence of modifications should not result in a version conflict in the BC or BCD partitions, since site B at all times has the latest version of f; an intelligent implementation of conflict detection should realize this fact and avoid notifying sites C or D that their f versions conflict with the current one. However, in the final ABCD partition a conflict is (and should be) reconciled, since in this case both versions of f have incompatible modification sets.
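The recursive definition above can be replayed directly on the Fig. 1 scenario. The following is our own straightforward implementation, offered as a sketch only; the node names ("src", "AB", ..., "sink") and the function name are our assumptions.

```python
# A direct implementation (our own) of the recursive definition of
# "a version conflict had to be reconciled at P", applied to Fig. 1.

def reconciliations(parents, updated):
    """parents: node -> list of immediate predecessors in G(f);
    updated: set of nodes whose partition modified f ("+" marks).
    Returns the set of nodes where a version conflict must be reconciled."""
    anc = {}
    def ancestors(n):            # all nodes backward-reachable from n
        if n not in anc:
            acc = set()
            for p in parents[n]:
                acc.add(p)
                acc |= ancestors(p)
            anc[n] = acc
        return anc[n]
    recon = set()
    # process ancestors before descendants (fewer ancestors first)
    for n in sorted(parents, key=lambda m: len(ancestors(m))):
        marked = [a for a in ancestors(n) if a in updated or a in recon]
        for i, p1 in enumerate(marked):
            for p2 in marked[i + 1:]:
                # condition 2): no ancestor of n reaches both p1 and p2
                if not any({p1, p2} <= ancestors(a) | {a}
                           for a in ancestors(n)):
                    recon.add(n)
    return recon

# Fig. 1: ABCD splits into {A,B} and {C,D}; then {A}, {B,C}, {D};
# {B,C} and {D} merge into {B,C,D}; finally all four sites reconnect.
parents = {"src": [], "AB": ["src"], "CD": ["src"], "A": ["AB"],
           "BC": ["AB", "CD"], "D": ["CD"], "BCD": ["BC", "D"],
           "sink": ["A", "BCD"]}
updated = {"AB", "A", "BC"}      # the partitions in which f was modified
assert reconciliations(parents, updated) == {"sink"}   # conflict only at ABCD
```

Note how node BCD is exempt: its marked ancestors AB and BC are both reachable from BC itself, so condition 2) fails there, matching the intuition that B transitively carries the latest version.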
Now, as mentioned above, it is simple to provide some mechanism which detects all possible version conflicts; a simple timestamp algorithm will be adequate. What is more difficult is to find a mechanism which detects version conflicts only when they are real. In Fig. 1, for example, even though the first update may have been initiated by site A, this information is transitively passed by site B without conflict to sites C and D.

C. Version Conflicts and Version Vectors

Many possible approaches exist for attacking the problem of accurately detecting version conflicts. More elaborate timestamp schemes are a possibility, and there are a number of methods based on "update log files" (sometimes referred to as "journaling"). Unfortunately, these approaches suffer from either or both of 1) a need to maintain some kind of global network time (in itself nontrivial [7]) and 2) a need to store the entire partition graph, or its equivalent, someplace where it may be accessed later on. Since the partition graph may get arbitrarily large, the latter requirement is undesirable.

We now present instead a straightforward solution to this problem based on a version numbering scheme encoding just the necessary characteristics of the history graph. One maintains a vector with each copy of each file. Within every partition (unit of mutual consistency), these vectors keep an update history for the file. As partitions merge, these vectors for the possibly inconsistent files are compared. It turns out that version conflicts are signalled when, and only when, the vectors are "incompatible." We formalize this as follows.

Definition: A version vector for a file f is a sequence of n pairs, where n is the number of sites at which f is stored. The ith pair (Si: vi) gives the index of the latest version of f made at site Si. In other words, the ith vector entry counts the number vi of updates to f made at site Si.
We will use letters A, B, C, ... to designate site names, and vectors will be written as, for example, <A:1, B:2, C:4, D:3>.

Fig. 1. Partition graph G(f) for file f stored redundantly at sites A, B, C, D.

Definition: A set of version vectors is compatible when one vector is at least as large as any other vector in every site component for which they each have entries. A set of vectors conflict when they are not compatible.

For example, the version vector <A:1, B:2, C:4, D:3> dominates <A:0, B:2, C:2, D:3>, so the two are compatible; <A:1, B:2, C:4, D:3> and <A:1, B:2, C:3, D:4> conflict; but <A:1, B:2, C:3, D:4>, <A:1, B:2, C:4, D:3>, and <A:1, B:2, C:4, D:4> do not conflict, since the third vector dominates the other two.

In Fig. 2 version vectors are given for f in every partition of Fig. 1. The vector <A:2, B:0, C:1, D:0> associated with the node labeled BCD indicates that f was modified twice at site A, once at site C, and nowhere else. Note in particular that during the {A, B} partition, the file is modified twice at site A. The final merge results in a conflict.

We adopt the following usage of version vectors.
1) Each time an update to f originates at site Si, we increment the Sith component of f's version vector by one. The vector is committed with the updated file.
2) File deletion and renaming are treated as file updates. Deletion results in a version of the file of length zero, for example; when all versions of a file are of length zero, information on the file may be removed from the system.
3) When version conflicts are reconciled within a partition, the Sith entry of the version vector for the reconciled file is set to be the maximum of the Sith entries of all of its predecessors, and in addition the site initiating the reconciliation increments its entry. This ensures future compatibility with any old versions of the file still remaining on the network.
4) When copies of a file are subsequently stored at new sites, the version vector is augmented to include the new site information. The definition of compatibility above still applies in this case.
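Rules 1) and 3) and the compatibility test can be sketched as follows, replaying the partitions of Fig. 1. This is our own dict-based formulation; the function names are assumptions, not the paper's.

```python
# Sketch (our formulation) of version-vector comparison and maintenance.

def dominates(v, w):
    """v is at least as large as w in every site component w carries."""
    return all(v.get(s, 0) >= c for s, c in w.items())

def compatible(vectors):
    """A set of vectors is compatible iff some member dominates all others."""
    return any(all(dominates(v, w) for w in vectors) for v in vectors)

def update(v, site):
    """Rule 1): an update originating at `site` increments that component."""
    v = dict(v)
    v[site] = v.get(site, 0) + 1
    return v

def reconcile(vectors, site):
    """Rule 3): component-wise maxima, then the reconciling site increments."""
    merged = {}
    for v in vectors:
        for s, c in v.items():
            merged[s] = max(merged.get(s, 0), c)
    return update(merged, site)

# Replay Fig. 1: two updates at A in {A,B}; one more at A alone; one at C
# in {B,C}.
zero = {"A": 0, "B": 0, "C": 0, "D": 0}
ab = update(update(zero, "A"), "A")   # {A,B}: <A:2, B:0, C:0, D:0>
a  = update(ab, "A")                  # {A}:   <A:3, B:0, C:0, D:0>
bc = update(ab, "C")                  # {B,C}: <A:2, B:0, C:1, D:0>
assert compatible([bc, zero])         # BCD merge: no conflict
assert not compatible([a, bc])        # final ABCD merge: conflict
assert reconcile([a, bc], "B") == {"A": 3, "B": 1, "C": 1, "D": 0}
```

The last assertion reproduces the reconciliation shown in Fig. 2: component maxima of the two conflicting vectors, with site B (the reconciling site) incrementing its own entry.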
Point 4) states that the vectors are not required to be of fixed length, but may grow (or shrink, actually) as long as the relevant site information is maintained. If a copy of f is added at a site E during some partition, the vector in the partition where the copy was obtained is simply augmented to reflect the existence of the E copy. Thereafter, sites merging with this partition will be required to augment their vectors accordingly. Also, note that the version counts should be of variable length, so running out of space will not be a problem.

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-9, NO. 3, MAY 1983

Fig. 2. Partition graph G(f) for f with version vectors effective at the end of each partition. (In the final ABCD partition a conflict is signalled; the vector becomes <A:3, B:1, C:1, D:0> after reconciliation at site B.)

Version vectors serve basically to encode the partial order defined by the partition graph: if one node in the graph "precedes" another, i.e., there is a path from the graph source through the former to the latter, then the version vectors of the two nodes will not conflict. This observation leads us to the following result, which shows us that version vectors are all we basically need to detect version conflicts.

Theorem: A version conflict must be reconciled at a node in G(f) if and only if f's version vectors conflict at that point.

Proof: It is clear that if there is a conflict reconciliation at some node P in G(f), then the version vectors will conflict at P. (Version vectors detect real conflicts just as well as the simple timestamp algorithm: what must be shown is that they detect only real conflicts.) Conversely, suppose that f's version vectors conflict at some node P. Then two of the vectors must conflict and not be dominated by any third vector. These two vectors were generated in two earlier partitions P1 and P2, both having paths to P, where f was modified independently. All that must be shown is that there is no ancestor P' of P which also has backward paths to P1 and P2.
(Note P' could be either P1 or P2.) Suppose P' exists. We know that in this case the P1 and P2 version conflict will be reconciled at P', giving a version vector whose components are the maxima of the components from the vectors in P1 and P2. But this vector will dominate both the P1 and P2 vectors at P, removing their conflict. This contradicts our original assumption. Hence, P' cannot exist, so the partition graph conditions are satisfied and there is a conflict reconciliation at P. □

D. Conclusions on File Conflict Detection

The theorem above shows us that version vectors may be used to detect version conflicts and request user reconciliation of the conflicts. Version vectors will detect only "real" conflicts, i.e., situations in which versions of a file were modified independently in separate partitions. Thus, our work differs from previous research in that, where many people have developed mechanisms (e.g., timestamps) which detect sufficient conditions for a conflict to exist, we have striven to provide a mechanism detecting necessary and sufficient conditions for conflict.

It should be noted that if an identical update is made in two separate partitions, version vectors will indicate a file conflict even though there may be none. In some applications, then, it may be desirable to actually check a file for differences when several copies are found to have conflicting vectors. Indeed, this cross-checking of copies may have to be done eventually if the user is to resolve the file conflict.

It is also important to recognize that what has been presented here is applicable only when single files are being processed. Consider the following example (of Mark Brown) where two "transactions" T1 and T2 execute in different network partitions. Let

readset(T1) = readset(T2) = {f, g}
writeset(T1) = {f}
writeset(T2) = {g}

and assume that both T1 and T2 complete prior to reconnection of the network.
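The setup above can be checked against per-file version vectors. In this sketch (our formulation; site names S1 and S2 are assumptions), each file is written in only one partition, so each file's vectors remain compatible on their own:

```python
# Sketch: per-file version vectors for Mark Brown's two-transaction example.

def dominates(v, w):
    return all(v.get(s, 0) >= c for s, c in w.items())

def compatible(vectors):
    return any(all(dominates(v, w) for w in vectors) for v in vectors)

# T1 runs at site S1 and writes f; T2 runs at site S2 and writes g.
f_s1, g_s1 = {"S1": 1, "S2": 0}, {"S1": 0, "S2": 0}   # partition 1's copies
f_s2, g_s2 = {"S1": 0, "S2": 0}, {"S1": 0, "S2": 1}   # partition 2's copies

# On reconnection, each file considered alone shows no conflict,
# even though T1 and T2 are jointly unserializable.
assert compatible([f_s1, f_s2])
assert compatible([g_s1, g_s2])
# For contrast, a file written in both partitions would conflict:
assert not compatible([{"S1": 1, "S2": 0}, {"S1": 0, "S2": 1}])
```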
Then a conflict (serialization error) should occur after reconnection, but it is easily seen that in this case version vectors will not detect anything amiss. Many appropriate extensions immediately suggest themselves; one is presented in [11]. Note, interestingly, that many of the "solutions" for providing mutual consistency mentioned in Section II also do not solve this problem. In particular, such conflicts can still occur with the Token and Primary Site approaches, unless all updates are constrained to occur within a single partition.

We have shown in this section that file conflicts, whether they are name conflicts or version conflicts, can be accurately detected by maintaining just two pieces of information with each file f:
1) an origin point;
2) a version vector.
In the following section we take up the question of how to resolve file conflicts, now that the problem of detection has been clarified.

IV. RESOLUTION OF MUTUAL INCONSISTENCY

A conflict detection mechanism, while valuable, has increased effect if there is also a method for reconciling conflicts automatically. From several conflicting versions of a file, this method should produce a subsequent version that dominates these versions, while preserving the operations which were done to them. Although this is certainly not possible in general, there are many cases which admit automated reconciliation.

Clearly, conflict reconciliation must take into account the semantics of the operations which were done to the data objects in conflict. This has been noted by many researchers (e.g., [1, p. 568], [5, p. 65], [13, p. 57]). In those cases where the nature of the semantics is sufficiently constrained, straightforward reconciliation algorithms can be given. For example, consider two important types of files in LOCUS, directories and user mailboxes.
In both of these cases, there are just two available operations:
* insert an item (e.g., create a file, or receive a message);
* remove an item (e.g., delete a file, or process a message).
Such files have the characteristic that version conflicts can be reconciled simply by taking the union of the entries in the component files, then removing any entries which had been deleted. Reconciliation for both of these file types is handled automatically in LOCUS.¹

Automatic reconciliation applies in much more far-reaching contexts than on the systems level. An instructive example can be found in electronic funds transfer. Consider a checking account, as proposed earlier in Section III-A. Credits and debits can be made to different copies of the account. Resolution is straightforward so long as the ith copy is represented as x + δi(x), where x was the original account balance before partition and δi(x) is the change in that partition. Then the new balance is x + Σi δi(x).

This approach may be improper if we require the balance to remain positive. However, there are many ways to deal with this problem. When x is the balance of a large corporation, presumably the problem will not occur. More generally, one may operate the system in a more constrained fashion when it is partitioned, either by limiting withdrawals in those cases where the customer is not trusted, or by imposing quota-like limits on withdrawals within each partition.

¹Most directory systems, and some mail systems, permit additional operations. Therefore the automatic recovery software in LOCUS for these file types is more involved than indicated here.

A number of existing applications permit automated reconciliation while still allowing robust operation during partition. Two cases which have been studied carefully are banking and airline reservation systems [3]. Extensive, although not full, operation of these systems is quite feasible while partitioned.
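The two simple semantic classes above, union-of-entries and additive deltas, can be sketched as follows. This is our own formulation, not the LOCUS implementation; in particular, it assumes each copy keeps a log of its own deletions, which the paper does not specify.

```python
# Sketch (our formulation) of two simple reconciliation classes.

def reconcile_sets(copies, removed):
    """Directory/mailbox style: union the entries of all copies, then drop
    anything any partition deleted (assumes per-copy deletion logs)."""
    return set().union(*copies) - set().union(*removed)

def reconcile_balance(x, deltas):
    """Checking-account style: new balance is x plus the sum of the
    per-partition changes delta_i."""
    return x + sum(deltas)

# A directory initially holding {"a"}: partition 1 creates "b",
# partition 2 deletes "a".
assert reconcile_sets([{"a", "b"}, set()], [set(), {"a"}]) == {"b"}

# The $20 million account of Section III-A: both partitions withdraw
# $10 million; the merged balance is $0.
assert reconcile_balance(20_000_000, [-10_000_000, -10_000_000]) == 0
```

As the text notes, the balance rule may need constraints (e.g., per-partition withdrawal quotas) if the merged balance must stay positive.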
A desirable characteristic of system operation semantics, or of the reduced partition semantics, is that reconciliation of a data item not necessitate the alteration of many other data items. In order to keep automatic reconciliation cost low in a database, for example, one might insist that most transactions executed during partitions not require undoing and redoing when their read sets are subsequently altered during a reconciliation. This is the case today for portions of banking systems such as automated tellers.

In general, it is often possible to break the semantics of operations into classes, and for each class give rules by which the reconciliation algorithms can be constructed. Simple semantic classes permit reconciliation in a straightforward way without keeping much history. As the semantics become more complex, more history and work is required.

Of course, even when the semantics of operations are clear, automatic reconciliation can be very difficult, expensive, and in some cases impossible. Reconciliation cannot be performed in those cases where, as part of the system's activity, an external action has been taken that cannot be undone, nor can a compensating action be taken. These cases are the same ones for which general purpose data management recovery is impossible too. One suspects that in many systems, automatic reconciliation will be feasible for the large majority of data items. However, there will remain cases that require human intervention.

Independent of the degree of automatic reconciliation, a consistent system policy must be defined for each of the following questions.
* When and how are data conflicts detected?
* Is permission to access a data item altered by the fact that the item is in conflict? No alteration of permission raises the question of which version to make available, and leads to the possibility of propagating inappropriate values.
* How are users informed of conflicts?
* What support does the system provide the user for reconciling conflicts?
These questions raise a number of architectural issues, some of which are addressed in [12], [14], [18].

V. CONCLUSIONS

We have developed an effective method for detecting mutual inconsistency in distributed systems. Here inconsistency has been assumed to be caused by multiple users modifying different copies of a common file without mutually excluding one another. Such a situation would arise, for example, when network failures isolate these users in different partitions of the network. The technique also applies when partitions are artificially introduced; for example, when stations in a connected network delay their transmissions to take advantage of batching or lower communications rates at various times of day. The method used is simple, relying only on two newly introduced constructs, version vectors and origin points, for its operation. Although the method was discussed specifically in the context of file systems, it applies equally well to any class of resources for which occasional mutual inconsistency is tolerable for the sake of availability, or where the semantics of the allowed operations permit automated recovery.

The general problem of how to resolve mutual inconsistency of copies of a resource, once it is detected, is a complex question. We have only given it a summary treatment here, since it raises many design issues and can be answered thoroughly only when the semantics regarding the use of the resource are explicitly known. We have noted, however, that for some resources automatic reconciliation is straightforward to implement.

REFERENCES

[1] P. A. Alsberg and J. D. Day, "A principle for resilient sharing of distributed resources," in Proc. 2nd Int. Conf. Software Eng., Oct. 1976.
[2] C. A. Ellis, "A robust algorithm for updating duplicate databases," in Proc.
2nd Berkeley Workshop Distributed Data Management and Comput. Networks, 1977, pp. 146-158.
[3] S. Faissol, "Operation of distributed database systems under network partitions," Ph.D. dissertation, Dep. Comput. Sci., UCLA, 1981.
[4] D. J. Farber and F. R. Heinrich, "The structure of a distributed computer system - The distributed file system," in Proc. ICCC, 1972.
[5] M. Hammer and D. Shipman, "An overview of reliability mechanisms for a distributed data base system," in Proc. Spring Compcon, San Francisco, CA, Feb. 28-Mar. 3, 1978.
[6] H. T. Kung and J. T. Robinson, "On optimistic methods for concurrency control," ACM Trans. Database Syst., vol. 6, pp. 213-226, June 1981.
[7] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Commun. Ass. Comput. Mach., vol. 21, pp. 558-565, July 1978.
[8] B. Lampson and H. Sturgis, "Crash recovery in a distributed data storage system," Tech. Rep., Xerox PARC, 1976.
[9] D. A. Menasce, G. J. Popek, and R. R. Muntz, "A locking protocol for resource coordination in distributed systems."
[10] R. M. Metcalfe and D. R. Boggs, "Ethernet: Distributed packet switching for local computer networks," Commun. Ass. Comput. Mach., vol. 19, pp. 395-404, July 1976.
[11] D. S. Parker and R. Ramos, "A distributed file system architecture supporting high availability," in Proc. 6th Berkeley Conf. Distributed Data Management and Comput. Networks, Asilomar, CA, Feb. 1982.
[12] G. Popek, B. Walker, J. Chow, D. Edwards, C. Kline, G. Rudisin, and G. Thiel, "LOCUS: A network-transparent, high reliability distributed system," in Proc. 8th Symp. Oper. Syst. Principles, Asilomar, CA, Dec. 1981.
[13] J. B. Rothnie and N. Goodman, "A survey of research and development in distributed database management," in Proc. 3rd VLDB, Tokyo, Oct. 1977.
[14] G. J. Rudisin, "Architectural issues in a reliable distributed file system," M.S. thesis, Dep. Comput. Sci., UCLA, Rep. UCLA-ENG-8014 SDPS-80-001, Apr. 1980.
[15] M.
Stonebraker, "Concurrency control and consistency of multiple copies of data in distributed INGRES," IEEE Trans. Software Eng., vol. SE-5, pp. 188-194, May 1979.
[16] R. H. Thomas, "A solution to the concurrency control problem for multiple copy data bases," in Proc. Spring COMPCON, Feb. 28-Mar. 3, 1978.
[17] R. H. Thomas, R. E. Schantz, and H. C. Forsdick, "Network operating systems," Rome Air Develop. Cen., Tech. Rep. RADC-TR-78-117, May 1978.
[18] B. J. Walker, "Issues of network transparency and file replication in distributed systems: LOCUS," UCLA Tech. Rep., June 1981.

D. Stott Parker, Jr. received the A.B. degree in mathematics from Princeton University, Princeton, NJ, in 1974 and the M.S. and Ph.D. degrees in computer science from the University of Illinois, Urbana-Champaign, in 1976 and 1978, respectively. Currently, he is an Associate Professor with the Department of Computer Science, University of California, Los Angeles. His main interests include distributed processing, algorithms, architecture, and logic programming. Dr. Parker is a member of Sigma Xi and the Association for Computing Machinery.

Gerald J. Popek received the Ph.D. degree from Harvard University, Cambridge, MA, in 1972. He then joined the faculty at the University of California, Los Angeles, where he is now Professor of Computer Science and in charge of the department's computing facilities. He is Principal Investigator of an ARPA contract, under which basic work in computer security was done, and which is now focused on distributed systems. He is the author of approximately 50 professional publications. He is also Vice President of Palyn Associates, Inc., and there has been responsible for major software efforts. Activities have focused on languages, operating systems, and distributed computing functions.

Gerard Rudisin received the B.S. degree from the Massachusetts Institute of Technology, Cambridge, in 1975 and the M.S.
degree from the University of California, Los Angeles, in 1980, both in computer science. He is currently a Ph.D. candidate in computer science at the University of California, Los Angeles, and also works for Western Digital Corporation's Systems Technology Center. His major research interests are local computer networks, operating systems, database technology, and Ada implementations and support environments. Mr. Rudisin is a member of the Association for Computing Machinery and Sigma Xi.

Allen Stoughton received the B.S. degree in mathematics/computer science and the M.S. degree in computer science from the University of California, Los Angeles, in 1979 and 1981, respectively. His current interests include operating system protection models and programming language semantics.

Bruce J. Walker received the B.Math degree from the University of Waterloo, Waterloo, Ont., Canada, in 1975 and the M.S. degree in computer science from the University of California, Los Angeles, in 1977. He is currently finishing the Ph.D. degree at UCLA. The topic of his dissertation is the transparency and replication principles in distributed systems and, in particular, LOCUS. Research interests also include computer security, about which he has published articles and a book.

Evelyn Walton was born in Little Rock, AR. She received the B.A. and M.S. degrees from the University of California, Los Angeles, in 1972 and 1975, respectively. She has been working on research projects in the Department of Computer Science at UCLA since 1971. At present, she is working on the Ph.D. degree in computer science. Her research interests are in the areas of computer languages, systems, and networks.

Johanna M. Chow received the B.S. degree (summa cum laude) in mathematics/computer science from the University of California, Los Angeles, in 1980. She is currently working on the Ph.D. degree in computer science at UCLA.
Her thesis deals with the issues of transaction management and recoverable processes in distributed systems.

David Edwards was born on June 15, 1957 in Glendale, CA and grew up in La Crescenta. He received the B.S. degree in mathematics/computer science from the University of California, Los Angeles, in 1979. As a graduate student at UCLA, he has worked on the LOCUS Distributed Operating System project under Gerald Popek. He has been a departmental scholar and the recipient of a fellowship from System Engineering Laboratories. His current interests are in operating system design, especially distributed, and in the interaction between the designs of hardware architecture and system software.

Stephen Kiser received the B.S. degree in mathematics/computer science from the University of California, Los Angeles, in 1979. He is currently working towards the M.S. degree at UCLA and is also employed by the Xerox Corporation, El Segundo, CA. His interests are in distributed computing, computer theory, and programming languages.

Charles Kline received the B.S., M.S., and Ph.D. degrees from the University of California, Los Angeles, in 1970, 1971, and 1980, respectively. His Ph.D. research was in the areas of computer and network security, especially the use of security kernels and encryption. He is the author of numerous papers on the issues of computer security, security kernels, and the use of public key and conventional encryption in computer network security. He has been employed as a researcher in the Department of Computer Science at UCLA since 1966. During that period his research areas have included operating systems, languages, networks, distributed systems, and computer security problems.

Input-Output Tools: A Language Facility for Interactive and Real-Time Systems

JAN VAN DEN BOS, MARINUS J. PLASMEIJER, AND PIETER H.
HARTEL

Abstract-A conceptual model is discussed which allows the hierarchic definition of high-level input driven objects, called input-output tools, from any set of basic input primitives. An input-output tool is defined as a named object. Its most important elements are the input rule, output rule, internal tool definitions, and a tool body consisting of executable statements. The input rule contains an expression with tool designators as operands and with operators allowing for sequencing, selection, interleaving, and repetition. Input rules are similar in appearance to production rules in grammars. The input expression specifies one or more input sequences, or input patterns, in terms of tool designators. An input parser tries, at run-time, to match (physical) input tokens against active input sequences. If a match between an input token and a tool designator is found, the corresponding tool body is executed, and the output is generated according to specifications in the tool body. The control structures in the input expression allow a variety of input patterns from any number of sources. Tool definitions may occur in-line or be stored in a library. All tools are ultimately encompassed in one tool representing the program. The input-output tool model offers a nonprocedural input specification language with a parser provided by the run-time system. It forces clean and structured programs and allows for easy definition of abstract input devices and simulation of physical devices on other devices. Implementations have been completed and are being evaluated.

Index Terms-Computer graphics, dialogue, input functions, input tools, interaction language, process control, programming language, real time, specification language.

Manuscript received October 2, 1981; revised October 1, 1982. This work was supported in part by a grant from The Netherlands Organization for the Advancement of Pure Research (ZWO). J. van den Bos and M. J. Plasmeijer are with the Department of Computer Science and the Computer Graphics Group, University of Nijmegen, Nijmegen, The Netherlands. P. H. Hartel was with the Department of Computer Science and the Computer Graphics Group, University of Nijmegen, Nijmegen, The Netherlands. He is now with the Department of Computer Science, University of Amsterdam, Amsterdam, The Netherlands.

I. INTRODUCTION

INTERACTIVE computing, in which we include real-time systems and process control, forms a sizable, perhaps more than 50 percent, share of all computing. Most interactive systems have been programmed using regular (or slight derivatives of) batch programming languages. Unfortunately, these languages are badly lacking in provisions for handling input and output on an advanced level. Few, if any, of these languages have, for example, provisions to read one input source out of several specified. Facilities to define named abstract input devices in terms of (a collection of) existing physical devices

0098-5589/83/0500-0247$01.00 © 1983 IEEE