United States v. Lewis

UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA
Mar 3, 2020
442 F. Supp. 3d 1122 (D. Minn. 2020)

Opinion

Criminal No. 18-194 ADM/DTS

UNITED STATES of America, Plaintiff, v. Kenneth Davon LEWIS, Defendant.


MEMORANDUM OPINION AND ORDER

I. INTRODUCTION

This matter is before the undersigned United States District Judge for a ruling on Defendant Kenneth Davon Lewis' ("Lewis") Objection [Docket No. 119] to Magistrate Judge David T. Schultz's January 6, 2020 Report and Recommendation [Docket No. 115] ("R&R"). In the R&R, Judge Schultz recommends denying in part and granting in part Lewis' Motion to Exclude DNA Evidence [Docket No. 27]. For the reasons stated below, the Objection is overruled and the R&R is adopted.

II. BACKGROUND

The full background of this motion is thoroughly set forth in the R&R and is incorporated by reference. Briefly, Lewis faces trial after being federally indicted on one count of being an armed career criminal in possession of a firearm. See Superseding Indictment [Docket No. 126]. Lewis was arrested on April 18, 2018 after scuffling with police officers and a landlord in the stairwell of an apartment building. R&R at 1132–33. A Smith & Wesson 9mm gun was recovered at the scene. R&R at 1131–32, 1132–33.

After Lewis' arrest, the gun was sent to Midwest Regional Forensic Laboratory ("MRFL") for DNA testing. The MRFL lab analyzed three DNA swabs from the gun using a probabilistic genotyping software program called STRmix. Gov't Ex. 1 at 2; Gov't Ex. 19. Each of the analyzed swabs contained a large amount of "good quality" DNA. Tr. Vol. III [Docket No. 83] at 456.

The lab received five DNA swabs from the gun, but was able to analyze only three of the swabs because two did not include sufficient DNA to permit testing. R&R at 1156; Gov't Ex. 1 at 2.

Unless otherwise specified, all Exhibits referenced in this Order were introduced during the three days of hearings on the Daubert motion.

Based on the STRmix analysis, the MRFL lab determined the DNA on the gun was a mixture from four persons. Gov't Ex. 1 at 2; Def. Ex. 6 at 1. The STRmix results showed that the DNA mixture in each of the three swabs is "greater than one billion times more likely if it originated from [Lewis] and three unknown unrelated individuals than if it originated from four unknown unrelated individuals." Gov't Ex. 1 at 2. This statistic, called a likelihood ratio ("LR"), is interpreted as providing "extremely strong support for inclusion" of Lewis as a contributor to the DNA mixture found on the gun. Id. Lewis was determined to be the highest contributor and was estimated to have contributed 56% to the DNA mixture. Def. Ex. 6 at 1; Tr. Vol. II [Docket No. 60] at 166–67. The lowest contributor was estimated to have contributed 6% to the mixture. Def. Ex. 6 at 1.

The STRmix results also showed that all involved police officers and the landlord were excluded as contributors to the DNA mixture on the gun. Gov't Ex. 1 at 2. No likelihood ratio was provided for this result. See id.

Lewis challenges the admissibility of the DNA evidence, arguing the use of STRmix to analyze the DNA samples at issue is not sufficiently reliable to be admissible under Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993). Judge Schultz conducted three days of evidentiary hearings and heard testimony from several experts, including Dr. John Buckleton. Dr. Buckleton is the co-developer of STRmix, which was created in 2011 through a joint venture that included New Zealand's Institute of Environmental Science and Research ("ESR"), where Dr. Buckleton serves as lead scientist. Tr. Vol. I [Docket No. 53] at 7, 12–13. Testimony was also heard from Anne Ciecko, the DNA technical leader at the MRFL lab. Tr. Vol. II at 127. Three defense experts also testified: Nathaniel Adams, a software engineer who reviews and analyzes forensic DNA testing results; Dr. Dan E. Krane, Ph.D., professor of biological sciences at Wright State University and president of Forensic Bioinformatics, Inc.; and Dr. Mats Heimdahl, professor of computer science at the University of Minnesota. Id. at 299–300; Tr. Vol. III at 378–79, 555–56; Def. Exs. 18, 25, 33.

Additionally, Judge Schultz appointed Dr. William Thompson as a special master to advise the Court on the issues of scientific reliability. See Order [Docket No. 77]. Dr. Thompson is a professor emeritus in the University of California–Irvine's Department of Criminology, Law & Society. Thompson CV [Docket No. 114]. He also chairs the Human Factors Committee and is a member of the Forensic Science Standards Board of the Organization of Scientific Area Committees, which is sponsored by the National Institute of Standards and Technology. Id. He has written numerous articles on the topic of forensic DNA evidence. Id. Dr. Thompson reviewed the transcripts and exhibits from the first two days of testimony and personally attended the third day, during which he had the opportunity to ask questions of the witnesses. Dr. Thompson provided a Special Master's Report [Docket No. 113] to the Court on October 31, 2019. The Report was shared with the parties prior to their submission of post-hearing briefs. R&R at 1134–35.

On January 6, 2020, Judge Schultz issued a 63-page R&R finding that STRmix meets Daubert admissibility standards as to the DNA evidence of inclusion—that is, the evidence regarding the likelihood that Lewis is a potential contributor to the DNA mixture on the gun. As to the DNA evidence of exclusion—that is, the evidence excluding the police officers and landlord as potential contributors to the DNA mixture—Judge Schultz determined this DNA evidence falls below STRmix's established threshold for reliability. The R&R thus recommends denying Lewis' Daubert motion as to the evidence of inclusion, and granting the motion as to the evidence of exclusion. Lewis objects to the determination that the DNA evidence of inclusion is admissible. There were no objections to the conclusion that the evidence of exclusion is inadmissible.

III. DISCUSSION

A. Standard of Review

In reviewing a magistrate judge's report and recommendation, the district court "shall make a de novo determination of those portions of the report or specified proposed findings or recommendations to which objection is made." 28 U.S.C. § 636(b)(1)(C) ; see also D. Minn. L.R. 72.2(b). A district judge "may accept, reject, or modify, in whole or in part, the findings or recommendations made by the magistrate judge." 28 U.S.C. § 636(b)(1)(C).

B. Daubert Standard

The admission of expert testimony is governed by Rule 702 of the Federal Rules of Evidence, which provides:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if: (a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue; (b) the testimony is based on sufficient facts or data; (c) the testimony is the product of reliable principles and methods; and (d) the expert has reliably applied the principles and methods to the facts of the case.

When evaluating the admissibility of expert testimony, a trial court serves as the gatekeeper to "ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable." Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 589, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).

"In a case involving scientific evidence, evidentiary reliability will be based on scientific validity ." Id. at 590 n.9, 113 S.Ct. 2786 (emphasis in original). A trial court may consider one or more of the following non-exclusive factors in assessing scientific validity: "(1) whether the theory or technique can be (and has been) tested; (2) whether the theory or technique has been subjected to peer review and publication; (3) the known or potential rate of error; and (4) whether the theory has been generally accepted [in the relevant scientific community]." Lauzon v. Senco Prods., Inc., 270 F.3d 681, 687 (8th Cir. 2001) (citing Daubert, 509 U.S. at 593–94, 113 S.Ct. 2786 ). A district court possesses broad discretion in making its reliability determination. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 142, 119 S.Ct. 1167, 143 L.Ed.2d 238 (1999). The proponent of the expert testimony bears the burden of showing by a preponderance of the evidence that the testimony is admissible. Lauzon, 270 F.3d at 686.

"As a general rule, the factual basis of an expert opinion goes to the credibility of the testimony, not the admissibility, and it is up to the opposing party to examine the factual basis for the opinion in cross-examination." United States v. Finch, 630 F.3d 1057, 1062 (8th Cir. 2011) (internal quotations and alterations omitted). Although Rule 702 favors admissibility rather than exclusion, the district court must ensure that an expert's testimony "rests on a reliable foundation." Daubert, 509 U.S. at 597, 113 S.Ct. 2786.

C. Lewis' Objections

Lewis raises three objections to the R&R's conclusion that STRmix evidence of Lewis' inclusion as a potential contributor to the DNA mixtures meets Daubert's admissibility standard. Lewis argues: 1) STRmix has not demonstrated foundational validity; 2) STRmix has no known error rate; and 3) STRmix did not follow minimum software industry practices to ensure its software performed reliably.

1. Foundational Validity

Lewis argues STRmix has not been shown to be a sufficiently reliable method for analyzing the type of complex DNA mixtures at issue here. Lewis contends that validation studies show the range of reliability for STRmix does not extend to DNA mixtures involving more than three contributors or in which the minor contributor constitutes less than 20%. The DNA mixtures at issue here each involve four contributors with the minor contributor constituting 6%. Def. Ex. 6 at 1. Lewis thus contends these mixtures exceed the bounds of reliability which have been validated for STRmix.

Lewis largely relies on a 2016 report by the President's Council of Advisors on Science and Technology ("PCAST") that addressed the validity and reliability of probabilistic genotyping software programs such as STRmix. Def. Ex. 2. The PCAST Report examined the published evidence and concluded:

The two most widely used methods (STRmix and TrueAllele) appear to be reliable within a certain range, based upon the available evidence and the inherent difficulty of the problem. Specifically, these methods appear to be reliable for three-person mixtures in which the minor contributor constitutes at least 20 percent of the intact DNA in the mixture and in which the DNA amount exceeds the minimum level required for the method.

Id. at 80 (footnote omitted). The PCAST Report further stated that "[t]he range in which foundational validity has been established is likely to grow as adequate evidence for more complex mixtures is obtained and published." Id. at 82. An addendum to the PCAST Report was issued in 2017. Addendum PCAST Report [Docket No. 48, Attach. 2]. The addendum clarifies that the concerns related to the minor contributor arise when the person of interest contributes less than 20% of the DNA in the mixture. Id. at 8; Tr. Vol. I at 89.

In response to the PCAST Report, a study was conducted and published by STRmix co-developer Dr. Buckleton and his colleagues at ESR. Gov't Ex. 16. The study, titled Internal Validation of STRmix—a Multi-Laboratory Response to PCAST (the "PCAST Response Study"), examined 2,825 DNA mixtures compiled from 31 laboratories. Id. at 12. Mixtures of three, four, five, and six contributors were specifically targeted. Id. The mixtures were interpreted by staff at ESR using STRmix Version 2.5.02. Id. As stated in Dr. Thompson's Special Master Report, "[w]hen the mixtures were compared with the DNA profiles of thousands of known contributors and millions of non-contributors, STRmix was able to distinguish the contributors from non-contributor[s] with a high level of accuracy." Special Master Report at 31. The study "show[s] persuasively that STRmix is capable of producing accurate results with extremely low error rates: STRmix not only works, it seems to work extremely well, at least when used in the manner it was used in these studies." Id. at 30–31.

In the R&R, Judge Schultz determined that the PCAST Response Study "satisfies the criteria set forth by PCAST for expanding [STRmix's] foundational validity to mixtures of four and five persons, assuming sufficient DNA material is present and that a sufficient percentage of that DNA was contributed by the most minor contributor." R&R at 1156. Judge Schultz thus concluded that STRmix is a reliable principle and method for analyzing complex DNA mixtures including the DNA mixture in this case.

Lewis objects to this conclusion, arguing the PCAST Response Study used Version 2.5.02 of the STRmix software, whereas the MRFL lab used STRmix Version 2.4.05 to process the DNA mixture in this case. Lewis contends several changes were made to STRmix between Version 2.4.05 and Version 2.5.02, and that these changes directly affect the LR calculations performed by the different software versions. Lewis thus argues the PCAST Response Study's validation testing on the later version of STRmix did not validate the earlier version that was used here.

The R&R anticipated this objection and concluded the argument lacked merit because the newer STRmix version "changed nothing fundamental in the program that would be likely to affect its accuracy." R&R at 1149. In reaching this conclusion, the R&R examined four miscodes in Version 2.4.05 that were corrected in the later version, and determined that none of the miscodes in Version 2.4.05 had a material impact on the LR calculations. Id. at 1149–50. The R&R also noted that ESR had performed diagnostics on Version 2.4.05 and confirmed it was valid. Id. at 1149 (citing Tr. Vol. III at 515). The MRFL lab also performed its own internal validation study of STRmix Version 2.4.05 and determined it was valid. Id. at 1149–50 (citing Def. Ex. 14).

The R&R examined Miscode Numbers 4, 5.1, 5.2, and 5.3 listed in Defendant's Exhibit 28. See R&R at 1149–50.

Dr. Buckleton testified that ESR performs regression testing "on all versions [of STRmix] against the new one. So 2.4.05 has been tested against 2.5." Tr. Vol. III at 515.

Lewis nevertheless argues that three additional changes were made between Versions 2.4.05 and 2.5.02 that were not discussed in the R&R. Lewis contends that these changes affected LR calculations, and that the combined effect of the changes on LRs is unknown. This argument ignores that the ESR performed diagnostics on Version 2.4.05 to confirm its validity, and that the MRFL lab conducted an internal validation study of Version 2.4.05 for use in the lab. Given these safeguards, the Court agrees with the conclusion in the R&R that the changes from Version 2.4.05 to 2.5.02 did not fundamentally change STRmix's accuracy.

Because the updates from Version 2.4.05 to Version 2.5.02 did not materially alter the STRmix software's accuracy, the PCAST Response Study's validation testing on the later version was sufficient to establish the foundational validity and reliability of Version 2.4.05.

2. Error Rate

Lewis next argues that STRmix is unreliable because it does not have a known error rate. This argument was also addressed in the R&R, which concluded that the "error rate for false inclusion is known and is acceptably small." R&R at 1154. In reaching this conclusion, the R&R relied on Dr. Buckleton's testimony that the error rate for false inclusion is "immeasurably small," and the Special Master Report stating that "[w]hile there were a few instances in which STRmix produced results that falsely linked non-contributors to the [DNA] mixtures, these misleading results were rare." Tr. Vol. I at 64; Special Master Report at 8.

A false inclusion occurs when a person who did not contribute to the DNA mixture is falsely linked to the mixture. Special Master Report at 8.

Lewis argues that "despite these claims that the error rate is ‘small’ or ‘rare,’ no numerical error rate or range of error rates for STRmix false inclusions has ever been stated on the record [or] in the documentary evidence." Obj. at 9. However, Daubert does not require that an error rate be numerically identified for scientific evidence to be found sufficiently reliable. Rather, the known or potential error rate is one of several non-exclusive factors that courts consider when assessing the scientific validity of a theory or technique. Daubert, 509 U.S. at 593–94, 113 S.Ct. 2786.

Here, the R&R found that the error rate for STRmix "can be and has been estimated by checking how often the program assigns highly incriminating likelihood ratios to the profiles of known non-contributors." R&R at 1154. The PCAST Response Study shows that such errors are rare and occur no more often than would be expected by chance due to an "adventitious match," meaning two individuals having the same or extremely close DNA. Special Master Report at 8, Tr. Vol. I at 42–43. "In other words, the rate of false inclusions was approximately what would be expected if STRmix performed its function flawlessly." R&R at 1154 (quoting Special Master Report at 8). Based on this evidence, the R&R correctly concluded that the error rate for STRmix is acceptably small, and that the absence of a precisely calculated error rate does not alter the reliability and validity of STRmix.
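
To make this estimation method concrete, the following is a minimal Python sketch of how a false-inclusion rate can be computed from non-contributor tests. The function name, threshold, and numbers are illustrative assumptions only; they are not drawn from the record and do not depict STRmix's internal code.

```python
# Illustrative sketch (not STRmix code): estimating a false-inclusion rate by
# checking how often known non-contributors receive incriminating likelihood
# ratios (LRs).

def false_inclusion_rate(non_contributor_lrs, threshold=1.0):
    """Fraction of known non-contributors whose LR exceeds the threshold.

    non_contributor_lrs: LRs computed by comparing the mixture against
        profiles of people known NOT to have contributed to it.
    threshold: an LR above this value is counted as a (false) inclusion.
    """
    false_inclusions = sum(1 for lr in non_contributor_lrs if lr > threshold)
    return false_inclusions / len(non_contributor_lrs)

# Hypothetical example: one million non-contributor comparisons, three of
# which happened to produce an LR above the threshold.
lrs = [0.001] * 999_997 + [5.0, 20.0, 150.0]
print(false_inclusion_rate(lrs))  # 3e-06
```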

Lewis also argues that the R&R only addressed error rates for false inclusion, and did not account for the possibility that STRmix might generate inaccurate LRs. As an example, Lewis argues STRmix might calculate a DNA sample as being a billion times more likely to have originated from the defendant, when in actuality the LR is only 20 times more likely. Lewis contends there are no published studies showing how often STRmix miscalculates the magnitude of a LR.

Judge Schultz considered the accuracy of the LR generated by STRmix and found that "[t]he variation in the precise LR numbers generated by STRmix from one run to the next is very small. The STRmix validation studies have demonstrated that the variance in LRs is consistently less than an order of magnitude." R&R at 1152 (citing Tr. Vol. II at 151, 170); see also Def. Ex. 14 at 28–29 (discussing LR precision and noting that LR does not vary more than one order of magnitude). Thus, there is no basis to assume that STRmix would generate an inaccurate LR of the magnitude provided in Lewis' example. To the contrary, Dr. Thompson has stated that "[s]tatistical analyses suggest that, in the aggregate, the LRs produced by STRmix are properly calibrated and do not overstate the value of incriminating evidence." Special Master Report at 36. Thus, the absence of a specified error rate does not render STRmix unreliable.

3. Software Industry Standards

Finally, Lewis argues STRmix is not sufficiently validated from a software engineering perspective because it does not satisfy industry practices for the development and testing of new software. This argument lacks merit. As the R&R recognized, STRmix complies with all published guidance documents specifically directed to software validation for probabilistic genotyping systems. Those guidelines include standards published by the Scientific Working Group on DNA Analysis Methods, the Forensic Science Regulator, and the International Society for Forensic Genetics.

The R&R also found that STRmix "very nearly" complies with the safety-critical software development standards published by the Institute of Electrical and Electronics Engineers ("IEEE"), and that strict compliance with IEEE standards is not required because STRmix has been rigorously tested and shown to be reliable. R&R at 1151–52.

Resisting this conclusion, Lewis relies on the following testimony of Dr. Mats Heimdahl, Dean of the University of Minnesota Computer Science and Engineering Department, to argue that the STRmix software has not been adequately validated:

[The STRmix software] seems to be a plethora of configuration parameters that may or may not have been explored effectively during the validation testing efforts, but this is where the verification testing would come in and start twiddling all of those in all kinds of different ways to see if they interact in unexpected ways, if it always works and if it fails when it's expected to fail, and so on and so forth.

Tr. Vol. III at 591. This testimony does not overcome the R&R's thorough and well reasoned analysis leading to the conclusion that the STRmix software has been sufficiently validated.

IV. CONCLUSION

Based upon the foregoing, and all the files, records, and proceedings herein, IT IS HEREBY ORDERED that:

1. Defendant Kenneth Davon Lewis' Objection [Docket No. 119] to Magistrate Judge David T. Schultz's January 6, 2020 Report and Recommendation is OVERRULED ;

2. The Report and Recommendation [Docket No. 115] is ADOPTED ; and

3. Lewis' Motion to Exclude DNA Evidence [Docket No. 27] is GRANTED IN PART and DENIED IN PART as follows:

a. The evidence as to Lewis' inclusion as a potential contributor to the DNA mixtures found on the gun is admissible at trial.

b. The evidence as to the exclusion of the relevant police officers and the landlord as potential contributors to the DNA mixtures found on the gun is not admissible at trial.

REPORT AND RECOMMENDATION

DAVID T. SCHULTZ, United States Magistrate Judge

INTRODUCTION

Defendant Kenneth Davon Lewis is charged with being a felon in possession of a firearm as an armed career criminal. Indictment, Docket No. 1. Lewis challenges the admissibility of forensic DNA evidence relating to the firearm at issue, a Smith & Wesson 9mm. Motion, Docket No. 27. The DNA evidence in this case was generated by use of a probabilistic genotyping software program called STRmix, which is relatively new and exceedingly complex. The Government seeks to introduce two opinions based on this forensic analysis. First, it seeks to introduce testimony that the DNA mixture found on the gun is "greater than one billion times more likely if it originated from [Lewis] and three unknown unrelated individuals than if it originated from four unknown unrelated individuals." Gov't Ex. 1. This statistic, called a likelihood ratio, is said to provide "extremely strong support for inclusion" of Lewis as a contributor to the DNA found on the gun. Id.

Pronounced "star mix."

References to exhibit numbers are from exhibits introduced at the Daubert hearing.

Second, the Government seeks to introduce testimony that this same STRmix analysis excludes all relevant police officers as contributors to the DNA mixture found on the gun. This opinion is offered to rebut Lewis's defense that, if his DNA is on the gun, it was transferred there by police.

For the reasons set forth below, this Court finds that the evidence regarding the likelihood ratio as to Lewis's DNA is sufficiently reliable to pass muster under Daubert and its progeny. At the same time, however, the evidence regarding the exclusion of the officers as contributors to the DNA mixtures found on the gun is not reliable and therefore must be excluded. Accordingly, the undersigned recommends that the Court deny Lewis's Daubert challenge as to the former evidence and grant it as to the latter.

FINDINGS OF FACT

I. The Arrest

On August 15, 2018, Lewis was charged in this Court with one count of being an armed career criminal in possession of a firearm in violation of 18 U.S.C. §§ 922(g)(1) and 924(e). Indictment, Docket No. 1. Lewis was also charged in state court with first degree assault.

Lewis's charges arise out of a scuffle with police that occurred at the Loftus Centre apartment building, 3931 Coon Rapids Boulevard, on April 18, 2018. According to police officers, the scuffle ensued because Lewis was found at the property in violation of a court restraining order. Lewis had just been released from prison and was allegedly at the property to collect some personal effects when someone spotted him in the stairwell and notified police. According to Officer Nelson's affidavit, when police confronted Lewis, he was uncooperative and resisted arrest. During the encounter Officer Sharon saw something silver in Lewis's pocket and believed it was either a knife or a gun. When Lewis refused to take his hand out of his pocket, Officer Sharon tried to handcuff Lewis but could not hold him or get him under control.

The facts recounted in this section are taken from the search warrant application submitted to Anoka County District Court Judge Dyanna Street. The search warrant application is not made part of this Court's record but is part of the state court proceedings.

The owner of the Loftus Centre, Michael Moriarty, followed Officer Sharon into the stairwell. Mr. Moriarty told Officer Sharon he believed Lewis carried handguns. Mr. Moriarty said he was concerned for Officer Sharon's safety and joined in the attempt to subdue Lewis whom he described as freakishly strong. During the scuffle Mr. Moriarty discovered Lewis was holding a gun and was afraid Lewis was going to shoot the officer. Eventually the three ended up on the ground. Mr. Moriarty said he felt the gun against his chest and thought he was going to get shot. Other officers responded to the scene and were able to subdue Lewis. The gun was ejected from the pile and landed in the stairwell. As a result of the scuffle Lewis was charged with first-degree assault (the state charge) and being a felon in possession of a firearm as an armed career criminal (the federal charge).

Lewis's account of the interaction with police differs. According to a custodial interrogation with police, Lewis claimed he was in the stairwell of the apartment building having sex with a woman, BH, when police interrupted the couple and trespassed Lewis. Lewis claims the gun was never in his possession, that he did not know its whereabouts, and that it was planted in the stairwell to frame him.

The facts recounted in this section are taken from an audiotape of a custodial interrogation of Lewis that took place on April 19, 2018. A copy of that audio was entered into evidence during a prior suppression hearing as Gov't Ex. 1. See generally Report and Recommendation, Docket No. 36.

As noted, these allegations are pieced together from various court filings and police interviews given by Lewis. Because this motion is filed before trial, the Court does not have the benefit of a fully developed factual record with respect to these underlying facts.

The gun was recovered from the scene of the scuffle and sent to the Midwest Regional Forensic Laboratory (MRFL) for DNA processing. DNA profiling was performed on swabs collected from the grip and trigger area of the gun (Item 8-A), the slide serrations, safety and slide release levers and hammer (Item 8-B), and the empty firearm magazine (Item 10-A). Gov't Ex. 1. The lab processed these swabs and analyzed the DNA recovered using a probabilistic genotyping software known as STRmix. Id. The lab interpreted the DNA found on the gun as a four-person mixture and concluded from its analysis that there was "extremely strong" support for inclusion of Lewis as a contributor to the DNA on the gun and that the officers and Mr. Moriarty were all "excluded" as contributors to the DNA mixture. Id.

II. The Procedural History

On October 29, 2018, Lewis moved for a Daubert hearing to challenge the admissibility of the analysis of the DNA evidence allegedly found on the gun identified in the indictment. Id. at 2; Motion, Docket No. 27. Lewis asserts that the evidence is not admissible under Daubert because the STRmix probabilistic genotyping software is not reliable and, to the extent it is a reliable scientific method in some circumstances, it was not reliably applied in this case. Motion, Docket No. 27. The motion was referred to the undersigned, who granted an evidentiary hearing. Order, Docket No. 37; Order, Docket No. 45.

An evidentiary hearing was conducted on March 5, April 2, and August 16, 2019. Docket Nos. 49, 57, 79. The software developer, Dr. John Buckleton, and the head of the lab, Anne Ciecko, testified for the Government on March 5 and April 2 respectively. Dr. Buckleton holds Ph.D. and DSc degrees from the University of Auckland in New Zealand. Tr. I at 14. He has been a forensic scientist since 1983 and has worked on DNA analysis since its advent in 1987. Tr. I at 13-14. Dr. Buckleton is a co-developer of STRmix, which was created in 2011 as a joint venture between scientists from an Australian laboratory that had closed in 2010 and the Institute of Environmental Science and Research (ESR), the New Zealand government's forensic science service. Tr. I at 7, 12-13. Dr. Buckleton is the principal scientist at ESR. He has testified about 12 times in state and federal courts in the United States, has published over 200 papers in peer-reviewed literature, and written several textbooks on DNA evidence interpretation. Tr. I at 14-15. Dr. Buckleton has published about 50 peer-reviewed articles on STRmix, of which five or so pertain to the development and validation of the software. Tr. I at 16. Before STRmix, he published a handful of articles on the testing and development of software code. Tr. I at 16-17.

Transcripts of the three days of testimony are cited by volume and page number.

STRmix is used by about 63 laboratories worldwide, including laboratories in New Zealand, Australia, Finland, Dubai, the United Kingdom, Canada, and Switzerland, as well as about 43 laboratories in the United States. Tr. I at 19-20, 25; Gov't Ex. 2, Docket No. 34-2. About 75 percent of labs in North America have purchased STRmix. Tr. I at 21. All U.S. federal laboratories, including the U.S. Army Criminal Investigation Laboratory and the FBI, use STRmix. Tr. I at 25-26; Tr. III at 461. STRmix has been the subject of approximately 60 peer-reviewed articles. Tr. III at 470-71; Gov't Ex. 17.

Defense expert Nathaniel Adams testified at the April 2, 2019 hearing. Adams has a Bachelor of Science degree in Computer Science from Wright State University and has been a Systems Engineer at Forensic Bioinformatic Services, Inc. in Fairborn, Ohio since 2012. Tr. II at 299; Def. Ex. 18 (Adams CV). He consults with attorneys and reviews and analyzes forensic DNA testing results that were obtained using commercial software. He has testified about STRmix on multiple occasions. Tr. II at 300; Def. Ex. 18.

Following the second day of testimony the Court appointed a special master, Dr. William Thompson, to assist its understanding of the science, particularly the STRmix software program, and to write a report advising the Court regarding these issues. Dr. Thompson is a Professor Emeritus in the Department of Criminology, Law & Society at the University of California – Irvine where he has been a faculty member since 1983. Docket No. 114 (Thompson CV). He is the Chair of the Human Factors Committee and a member of the Forensic Science Standards Board of the Organization of Scientific Area Committees (OSAC), which is sponsored by the National Institute of Standards and Technology (NIST). Id. He has written over 50 articles in law reviews and peer-reviewed journals. Id. Dr. Thompson reviewed the transcripts and exhibits from the first two days of the hearing and personally attended the final day of testimony, at which he was given the opportunity to ask questions of the witnesses. Special Master Report, Docket No. 113; Tr. III at 371-76, 424-39, 538-54.

Dr. Buckleton, Lab Director Ciecko, and two additional defense experts, Dr. Dan Krane and Dr. Mats Heimdahl, testified on the final day of the hearing. Tr. III. Dan Krane, Ph.D., is a Professor of Biological Sciences at Wright State University in Dayton, Ohio, where he has been a faculty member since 1993. He is also the President, CEO, and Senior Analyst at Forensic Bioinformatics, Inc. Def. Ex. 25 (Krane CV). Mats Heimdahl is a Professor of Computer Science at the University of Minnesota. Tr. III at 555-56. He has worked in the field of software development and engineering for about 30 years, with a focus on evaluating safety critical systems. Tr. III at 556; Def. Ex. 33 (Heimdahl CV).

Dr. Thompson provided his report to the Court on October 31. It was shared with the parties but not publicly filed pending the issuance of this Report and Recommendation. Dr. Thompson's report has now been filed at Docket No. 113. After reviewing his report, the parties submitted simultaneous briefs on November 21 and December 6. Docket Nos. 107, 108, 109, 110.

III. The Science

This science section will first discuss the biological side of DNA analysis, focusing on how a forensic laboratory can determine the genetic characteristics of biological samples. It will then discuss the statistical side of the analysis, focusing on how STRmix assesses whether a particular individual contributed DNA to a mixed sample of biological material.

A. DNA

" ‘Deoxyribonucleic acid, or DNA, is a molecule that encodes the genetic information in all living organisms. 4 Mod. Sci. Evidence § 30:1 (2018-2019 ed.), The Law and Science of Expert Testimony, DNA Typing, Introduction to basic principles. ’ " United States v. Gissantaner , 417 F.Supp.3d 857, 861 (W.D. Mich. 2019), appeal filed (6th Cir. Nov. 7, 2019) (No. 19-2305). "Although 99.9% of the DNA sequences in human cells are the same between any two individuals, enough of the DNA is different that it is possible to distinguish one individual from another (other than identical twins). David H. Kaye and George Sansabaugh, Reference Guide on DNA Identification Evidence, Reference Manual on Scientific Evidence , 136-137 (Federal Judicial Center, 3d ed. 2011)." Id. DNA evidence refers to the chemical and physical analysis of DNA that reveals structural differences in the DNA molecules found in individuals.

B. The biological analysis: generating DNA profiles

In order to fully appreciate the nature of the DNA evidence at issue in this case and, more importantly, the method of its analysis, it is necessary to understand how DNA evidence is processed to generate DNA profiles. Dr. Thompson's Special Master Report contains a clear and concise description of the biological process of DNA extraction and amplification:

... DNA is first chemically extracted from a sample containing biological material, such as blood, semen, hair, or skin cells. Next, a predetermined set of DNA segments ("loci") containing small repeated sequences are amplified using the Polymerase Chain Reaction (PCR), an enzymatic process that replicates a targeted DNA segment over and over to yield millions of copies. After amplification, the lengths of the resulting DNA fragments are measured using a technique called capillary electrophoresis, which is based on the fact that longer fragments move more slowly than shorter fragments through a polymer solution. The raw data collected from this process are analyzed by a software program to produce a graphical image (an electropherogram) and a list of numbers (the DNA profile) corresponding to the sizes of each of the fragments (by comparing them to known "molecular size standards") .... (PCAST, 2016, p. 69) (footnotes omitted).

Special Master Report at 15, Docket No. 113.

A close look at the DNA testing conducted in this case provides a useful illustration. At MRFL, analysts used cotton swabs to collect cellular material from various parts of the Smith & Wesson pistol. Gov't Ex. 1. They extracted the DNA by using chemicals to break open the cells and release the DNA, and additional chemicals to clean and purify the DNA for analysis. Next, they amplified DNA found at 24 loci using a commercial test kit called the Promega PowerPlex Fusion System and a laboratory instrument known as a thermal cycler (which used PCR to create millions of copies of DNA fragments from the targeted loci). The analysts then injected the DNA samples into a computer-controlled instrument called the Applied Biosystems 3500 Genetic Analyzer, which used capillary electrophoresis to measure the length of the amplified DNA fragments. The Genetic Analyzer then produced electropherograms showing the lengths of the DNA fragments that were detected at each locus of each sample.

The DNA fragments examined in this process originate at loci that contain short, repeating sequences of genetic code called short tandem repeats (STR). The number of repetitions tends to vary from person to person, which causes the length of these DNA fragments also to vary in ways that can be used to distinguish different individuals. At each locus there are several possible lengths that the fragments might have. Each possible length variant is called an allele. The alleles are identified by numbers that correspond to the number of "repeats" in the STR. For example, a fragment containing eleven repetitions of a short sequence of genetic code will be labeled "allele 11."

The Genetic Analyzer identifies the alleles by measuring the lengths of the DNA fragments (using a process known as capillary electrophoresis). The results of this analysis are displayed in graphs known as electropherograms, such as the one shown below (Figure 1). Amplification efficiency varies across loci, resulting in some loci with peak heights that are either higher or lower than average.

At each locus a person generally inherits two of these alleles, one from each parent. For a given individual, the same pair of alleles will be found at that locus in the DNA of all his nucleated cells and remains consistent throughout his lifetime. Different individuals tend to have different alleles. While two people may, by chance, have the same alleles at a few loci, the chances of such a coincidence diminish rapidly as more loci are examined. The set of alleles that a person has across multiple loci is called a multi-locus genotype or a DNA profile.

While this is generally a true statement, scientists recently reported that an individual who underwent a bone marrow transplant experienced a change to his DNA profile such that he took on the DNA profile of the bone marrow donor. See When a DNA Test Says You're a Younger Man, Who Lives 5,000 Miles Away, New York Times, Dec. 7, 2019.

Figure 1 displays one of several electropherograms produced in this case for Item 8-A, the swab from the grip and trigger area of the pistol. It shows the alleles detected at five genetic loci. The names of the five loci are D8S1179, D12S391, D19S433, FGA, and D22S1045. Other electropherograms (not shown here) display the alleles detected at 18 other loci when Item 8-A was tested.

The electropherograms display a set of peaks that signal the presence of alleles. The position of a peak along the graph indicates the length of the amplified DNA fragments containing an STR. Based on the position of the peak (relative to molecular size standards, which are not shown here), the computer determines which allele the peak represents and applies a label shown at the top of the box seen immediately below the peak on the electropherogram. The first set of peaks, on the left side of the electropherogram, show the alleles detected at locus D8S1179. There are seven peaks (indicating seven alleles) which the computer has determined to be alleles 9, 10, 11, 12, 13, 14 and 15. By contrast, at the far right side of the electropherogram there are two peaks detected at locus D22S1045, which the computer has determined to be alleles 12 and 15.

Figure 1: Electropherogram Showing Alleles Detected at Five Genetic Loci (From Left to Right: D8S1179, D12S391, D19S433, FGA, and D22S1045). Labels under the peaks indicate (from top to bottom) the allele detected, the height of the peak (in RFU), and the peak's measured position on the x-axis.

The height of each peak corresponds roughly to the quantity of DNA detected—taller peaks indicate more DNA. The second number in the box under each peak indicates the measured height of the peak in "relative fluorescent units" (RFU). The third number at the bottom of each box is a measurement (not important for this discussion) indicating where the peak falls along the X-axis of the electropherogram.

From this electropherogram, a DNA analyst can draw inferences about DNA Sample 8-A. First, the large number of alleles indicates the sample contains DNA of more than one individual. Because each individual generally contributes at most two distinct alleles at each locus, the presence of seven alleles at locus D8S1179, and eight alleles at locus D12S391, indicates there are at least four contributors to this DNA mixture. Second, there is considerable variation in the height of the peaks that represent these alleles at loci D8S1179 and D12S391. This height variation indicates that the contributors are contributing different amounts of DNA to the mixture.

There is considerable uncertainty, however, about the genotypes of individual contributors to this mixture. In general, the two alleles of each contributor will produce peaks of roughly equal height. So, an analyst might infer that the two tallest peaks found at locus D8S1179, alleles 13 and 15, came from a single individual who contributed more DNA to the mixture than the other contributors, but that is not the only possibility. The peaks representing alleles 13 and 15 might also be taller because multiple individuals share those alleles. For example, a mixture of DNA from three individuals who have genotypes 10,13; 11,13; and 12,13 would produce a tall 13 peak and shorter peaks at 10, 11 and 12 (as seen in this electropherogram). So, possibilities of that nature must also be considered. Furthermore, individuals sometimes inherit the same allele from both parents. When DNA of these homozygous individuals appears in the mixture it produces a single peak of roughly double the height of a heterozygous peak. So, it is also possible that alleles 13 and 15 each came from a different homozygous individual. The number of combinations of possible genotypes that might account for the peaks observed at a given locus will vary depending on the number of peaks and the analyst's assumptions about the number of contributors.

This phenomenon is called "allele sharing."

Assessment of the genotypes of possible contributors is further complicated by the possibility that the test may have failed to detect all alleles of all contributors. Notice, for example, that only two alleles were detected at locus D22S1045. This might have occurred because the contributors to the mixture collectively have no alleles other than the two alleles that were detected (12 and 15). Another possibility, however, is that the test lacked the sensitivity to detect some of the contributors' alleles at this locus. A variety of factors affect the sensitivity of the test, including the degree to which the DNA has degraded due to age, environmental exposure, or conditions of storage. Moreover, there is variation among the loci in their overall sensitivity (i.e. , in the amount of DNA needed to produce a detectable allele) and in their susceptibility to loss of signal due to degradation.

Another complicating factor is that peaks are sometimes produced spuriously due to the presence of small amounts of contaminating DNA (this is called "allelic drop-in") or due to technical limitations of the PCR process that produce small false peaks adjacent to large peaks due to a phenomenon known as "stutter." In mixed DNA samples it is often impossible to distinguish false peaks produced due to allelic drop-in or "stutter" from true peaks originating from a minor contributor or a contributor whose DNA is somewhat degraded.

To evaluate various theories regarding the genotypes of contributors, the analyst must also take into account uncertainty about the number of contributors. For example, the theory that alleles 13 and 15 at locus D8S1179 each came from a homozygous contributor leaves five other peaks to be explained, which implies that there were three additional contributors (for a total of five contributors to the mixture). So the analyst will need to weigh the probability that there could be as many as five contributors when assessing the plausibility of this theory, which requires consideration of matters such as the number of alleles detected at other loci, taking into account, of course, uncertainty about the sensitivity of the test, whether all alleles were detected, and whether some of the peaks could be spurious.

The reason DNA evidence is so powerful can be illustrated using a simple case under ideal conditions. Assume that the DNA recovered at a crime scene is blood found on the sidewalk twenty feet from the body of a murder victim. Assume further that the blood is not a mixture but was left by a single individual. The electropherogram produced as a result of DNA analysis of that blood drop may, for example, show that the depositor of the blood stain had the following DNA profile at six identified loci: 17, 18; 12, 13; 10, 14; 13, 15; 9,11; and 7,14. In such a case, a suspect whose profile at the first locus is 16, 18, rather than 17, 18 is easily excluded as the depositor of the blood stain, whereas a suspect whose profile matches exactly the entire profile obtained has a high likelihood of being the depositor of the blood. In such a simple case it is easy to both understand the evidence and to appreciate its power to persuade the trier of fact.

But when the DNA mixture at issue in a particular case has, for example, four contributors, the complexity of analysis can become mind-boggling. By way of illustration, assume a DNA profile yields eight distinct alleles at one locus. Assuming there are four contributors to that mixture, each with two alleles at that particular locus, the number of potential combinations of alleles at that one locus is 28 (7 + 6 + 5 + 4 + 3 + 2 + 1). That is, there are 28 different potential contributor profiles to explain the DNA at that one locus. But that is only one contributor and one locus. Even without accounting for the number of combinations for all four contributors, the number of potential combinations for one contributor across 10 loci becomes too large to identify by hand. If the number of combinations for one contributor at one locus is 28, the number across ten loci is 28¹⁰, a very large number.
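
The arithmetic in the preceding paragraph can be reproduced directly. The short sketch below assumes only the figures given above (eight distinct alleles, contributors carrying two distinct alleles each, ten loci); it is a back-of-the-envelope check, not a model of how STRmix enumerates genotypes.

```python
from math import comb

alleles_at_locus = 8

# Pairs of two distinct alleles that a single contributor could carry at this
# locus: 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28, i.e. "8 choose 2".
pairs_one_locus = comb(alleles_at_locus, 2)
print(pairs_one_locus)        # 28

# For one contributor across ten such loci, the combinations multiply.
print(pairs_one_locus ** 10)  # 296196766695424, roughly 3 x 10^14
```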

Nor, as described above, is the complexity of the DNA mixture analysis merely defined by the number of contributors. Numerous things complicate that analysis, including allele sharing among contributors, homozygousness, allelic drop-out or drop-in, the presence of stutter, and the phenomenon of degradation. Moreover, DNA mixtures recovered from crime scenes and physical objects are far from pristine. The absolute amount of DNA present and the relative amount contributed by the various contributors can differ significantly. While modern computing has made analysis of such complex mixtures possible, the question before this Court is whether and in what circumstances is such an analysis reliable.

C. Probabilistic genotyping

The technology involved in this case is called probabilistic genotyping. "Probabilistic genotyping refers to the use of biological modeling, statistical theory, computer algorithms, and probability distributions to calculate likelihood ratios (LR) and/or infer genotypes for the DNA-typing results of forensic samples ...." Def. Ex. 1 (SWGDAM Guidelines). In other words, probabilistic genotyping systems use mathematical algorithms to perform a systematic assessment of how likely the observed DNA mixture is to occur under various assumptions about possible contributors. Special Master Report at 4-5, Docket No. 113. A probabilistic genotyping system is comprised of software and hardware with analytical and statistical functions that incorporate complex algorithms.

There are two methods of probabilistic genotyping—semi-continuous and continuous. Def. Ex. 22 at 5 (Forensic Science Regulator Guidance). Semi-continuous probabilistic genotyping methods use the "observed peak heights and incorporate a probability of allele drop-out and drop-in to explain missing or extra alleles." Id. However, such methods do not take into account other variables such as peak height ratios, mixture ratios, and stutter percentages in the calculation of the likelihood ratio. Id. In contrast, continuous methods utilize all peak heights and do not require the analyst to determine whether a given peak is allelic, stutter, or over-stutter; instead, they analyze all of the observed information. Stated otherwise, the continuous method uses more of the information present in a DNA sample than the semi-continuous method. Id.

Probabilistic genotyping systems generate what is known as a likelihood ratio. The LR compares how likely the observed DNA evidence is under hypotheses regarding the source of the DNA. Typically, though not always, the LR compares the probability of obtaining the observed DNA profile under the hypothesis that the defendant contributed to the DNA mixture being analyzed (the "inclusion hypothesis") with the probability under the hypothesis that the defendant did not contribute to the DNA mixture (the "exclusion hypothesis"). The likelihood ratio considers how likely the observed findings are under the two hypotheses. In the case of a complex, multi-contributor mixture, as in this case, the likelihood ratio is phrased as follows:

This mixture is greater than X times more likely if it originated from the defendant and three unknown unrelated individuals than if it originated from four unknown unrelated individuals.

In this case, the DNA profiles found on the gun were determined to be "greater than one billion times more likely if [they] originated from Kenneth Davon Lewis and three unknown unrelated individuals than if [they] originated from four unknown unrelated individuals." Gov't Ex. 1 at 2 (MRFL Lab Report). It is of critical importance to understand what the likelihood ratio represents and what it does not represent. Special Master Report at 44-45, Docket No. 113. The likelihood ratio is not a statistic of inclusion or exclusion. It does not measure the probability that the defendant in fact contributed to the DNA mixture. Rather, it measures how much more likely the observed mixture is to occur if the defendant is assumed to have contributed to it than if he is assumed not to have contributed to it. Likelihood ratios greater than 1 support the hypothesis that the defendant is included in the DNA mixture; likelihood ratios less than 1 support the hypothesis that the defendant did not contribute to the analyzed mixture. Depending on the magnitude of the likelihood ratio, the support for one hypothesis or the other is stronger or weaker. In this case, the likelihood ratio of one billion to one is said to provide "extremely strong support for inclusion." Gov't Ex. 1 at 2. It is very important that the likelihood ratio be precisely communicated to the jurors lest it mislead them as to its significance.
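
Purely by way of illustration, the following sketch shows what the ratio compares. The two probabilities are invented for demonstration; they are not values produced by STRmix or found in the record.

```python
# Illustrative only: a likelihood ratio compares the probability of the
# observed mixture data under two competing hypotheses.

p_given_inclusion = 1e-12  # hypothetical: defendant plus three unknown contributors
p_given_exclusion = 1e-21  # hypothetical: four unknown unrelated contributors

lr = p_given_inclusion / p_given_exclusion
print(lr)  # roughly 1e9, i.e. "greater than one billion times more likely"

# Note: an LR of about one billion does NOT mean the probability that the
# defendant contributed is a billion to one; it only compares how well each
# hypothesis explains the observed data.
```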

Likelihood ratios are preferable to the old binary method that yielded a statistic described as a probability of exclusion or inclusion. These binary methods have been criticized as "an inadequately specified—and thus inappropriately subjective—method." PCAST 2016 Report at 78. SWGDAM has noted that these historical methods of mixture interpretation consider all interpreted genotype combinations to be equally probable, whereas probabilistic approaches provide a statistical weighting to the different genotype combinations. It is a misstatement of the LR in this case to say that it is a billion times more likely that the defendant contributed to this mixture than that he did not.

There are several probabilistic genotyping systems commercially available and in use at forensic laboratories. Def. Ex. 1. The two most frequently used systems are STRmix and TrueAllele. Id. at 80. STRmix predominates and is currently used in over 40 laboratories in the United States. Tr. I at 19; Tr. III at 413.

1. Determining the number of contributors

The application of the STRmix program starts with the analyst's interpretation of the DNA profile to assign an assumed number of contributors to the DNA mix. Ct. Ex. 1 at 2 ("A Guide to Results and Diagnostics Within a STRmix™ Report," hereafter "Diagnostics Guide"). The true number of contributors to a DNA sample taken from a crime scene is unknowable. See id. ; Tr. I at 62-63; Tr. II at 147; Def. Ex. 14 at 45 (MRFL Internal Validation Study). Therefore, the determination of the number of contributors relies upon the analyst's knowledge, experience, and expertise to provide his or her best estimate; it is, by definition, subjective. Ct. Ex. 1 (Diagnostics Guide). Certain well-recognized guideposts assist in this assignment. For example, analysts will find the locus with the maximum number of alleles present and, if appropriate, use that locus to determine the minimum number of contributors to the mixture. Tr. II at 147. If there are eight distinct alleles at one locus the operating assumption is that there are at least four contributors to the DNA mixture that is being analyzed.
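
A minimal sketch of the maximum allele count guidepost described above appears below. The per-locus allele counts are hypothetical (loosely modeled on the electropherogram discussed earlier), and the calculation illustrates the guidepost only; it does not depict STRmix's internal logic.

```python
from math import ceil

# Hypothetical allele counts observed per locus in a mixed profile.
alleles_per_locus = {"D8S1179": 7, "D12S391": 8, "D19S433": 5, "D22S1045": 2}

# Each contributor carries at most two alleles per locus, so the locus with
# the most observed alleles sets a floor on the number of contributors.
min_contributors = ceil(max(alleles_per_locus.values()) / 2)
print(min_contributors)  # 4
```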

A laboratory utilizing STRmix software must first analyze and identify its laboratory parameters. So, for example, certain laboratories will generate stutter peaks more frequently at some alleles than others. In addition, each laboratory will have a discernable stutter variation, a rate of allelic drop-in and drop-out, and other parameters. Knowing and applying these particularities allows the lab to "calibrate" its program for analysis. Tr. I at 11; Special Master Report at 22, Docket No. 113.

Gov't Ex. 1 at 2 (Analysts most commonly use this maximum allele count method in conjunction with peak-height information to arrive at an assumed number of contributors.).

2. Deconvolution

The next step in the STRmix process is "deconvolution," which begins by identifying all possible genotype sets that might reasonably explain the observed data, including consideration of allele drop-out and drop-in. Ct. Ex. 1.

STRmix uses settings for maximum allowable stutter ratios and maximum allowable peak heights to eliminate unreasonable genotype sets from further consideration. Ct. Ex. 1 at 2.

Next the deconvolution process begins to assess the probability of each of these potential genotype sets. To do this STRmix uses a complex mathematical simulation called the Markov Chain Monte Carlo (MCMC). MCMC is a well-known mathematical modeling process that is used to solve complex problems across a wide array of disciplines including weather forecasting, physics, and engineering. Tr. I at 8; Gissantaner , 417 F.Supp.3d at 866-67. MCMC uses an iterative process in which it proposes "solutions" in the form of specified genotype combinations (DNA profiles) and certain biological parameters. It then compares the proposed solution to the observed data to see how well it "fits." For each iteration, the genotype set and the mass parameters are used to generate an expected DNA profile (E). The expected profile generated from the proposal is then compared to the actual observed profile to determine how well the proposed values explain the observed data. STRmix runs eight "chains"—sets of proposals—simultaneously. Each chain generates its own expected profile (E), which is compared to the actual observed data.

These biological parameters—or "mass parameters"—are proposals as to (1) the DNA amount (template) for each contributor to the profile; (2) the level of DNA degradation for each contributor to the profile; and (3) amplification efficiencies for each locus within the profile. Ct. Ex. 1 at 3.

Biological models are employed for the modeling of expected stutter heights, expected allele heights, and the variance in peak heights, with peak height variability dependent on the kit, number of PCR cycles, and capillary electrophoresis instrumentation used.

The comparison of the observed data to the expected data generates what is called a probability density function, which reflects the relative likelihood that various genotypes explain the observed data. The probability density across all peaks in the profile is a measure of the model fit, that is, a measure of how well the proposed genotype set and parameters describe the actual electropherogram of the observed data. The higher the probability density, the better the proposal fits the observed data. At each iteration, the probability density is compared to the probability density of the previous iteration and the new proposal is either accepted or rejected based on that comparison—i.e. , does the new proposal better fit the observed data as measured by the probability density. If it does, the proposal is accepted and a new set of genotypes and mass parameters is proposed, and new probability densities are assigned and compared.

The developer of the STRmix software, Dr. Buckleton, likens this process to a game of "hot and cold." The MCMC starts at a location within the data that represents the proposed genotypes and mass parameters. It then takes a "step" (i.e. , proposes a different set of genotypes and mass parameters). If that step is "hotter" (that is, if it better fits the actual observed data), the step is accepted; if the step is "colder" (that is, if it moves away from the observed data), the step is rejected. Like the child's game of "hot and cold," the MCMC process moves the proposed solution gradually toward a set of genotypes and mass parameters that most accurately model the observed data from the DNA mixture.

Periodically STRmix will deliberately propose and accept a significantly colder step. It does this so it can assess whether the proposed solution is actually getting closer to the observed value. To illustrate, imagine a game of "hot and cold" being played in a three-story house where the object being sought is hidden on the second floor directly above the first-floor kitchen. If a child on the first floor moves toward the kitchen directly under the object being sought, her steps are getting "hotter"—that is, closer in three-dimensional space to the object. But, as long as that child remains on the first floor, circling around the kitchen trying to find the object, she is not actually getting closer to the hidden object. In order for the child (the MCMC) to succeed in getting to the observed value, she must take a large "cold step"—out of the kitchen toward the stairs to the second floor, for example—before she can take "hotter" steps that bring her to the hidden object. The process by which STRmix determines whether to accept or reject a proposed step is an algorithm called Metropolis-Hastings acceptance-rejection sampling. Tr. II at 310.
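
The acceptance rule just described can be sketched in a few lines of code. The following is a generic, toy Metropolis-Hastings sampler, not STRmix's implementation; the one-dimensional "fit" function and all names are invented. It illustrates the "hot and cold" behavior: a proposal that fits better is always accepted, and a proposal that fits worse is still accepted occasionally, which is what allows the chain to climb out of the first-floor kitchen in the analogy.

```python
import math
import random

def metropolis_hastings(log_prob, initial, propose, n_steps, rng):
    """Toy Metropolis-Hastings sampler (illustration only, not STRmix code).

    log_prob : log probability density of a state (how well a proposed set of
               genotypes and mass parameters "fits" the observed data)
    propose  : function producing a new candidate state (a "step")
    """
    state, current_lp = initial, log_prob(initial)
    samples, n_accepted = [], 0
    for _ in range(n_steps):
        candidate = propose(state, rng)
        candidate_lp = log_prob(candidate)
        # "Hotter" steps (higher probability density) are always accepted;
        # "colder" steps are accepted only occasionally, with probability
        # exp(candidate_lp - current_lp), so the chain can escape local optima.
        if rng.random() < math.exp(min(0.0, candidate_lp - current_lp)):
            state, current_lp = candidate, candidate_lp
            n_accepted += 1
        samples.append(state)
    return samples, n_accepted

if __name__ == "__main__":
    rng = random.Random(0)
    # One-dimensional stand-in for the probability density surface, peaked at 3.0.
    fit = lambda x: -0.5 * (x - 3.0) ** 2
    step = lambda x, r: x + r.gauss(0.0, 1.0)
    samples, accepted = metropolis_hastings(fit, initial=0.0, propose=step, n_steps=5000, rng=rng)
    print(f"accepted {accepted} of 5000 proposals; sample mean = {sum(samples) / len(samples):.2f}")
```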

The MCMC deconvolution process proceeds in two phases. During the first phase, called "burn-in," the MCMC is run to ensure that in the second phase—called "post burn-in"—the MCMC begins in an area of high probability space. Burn-in begins with a randomly chosen genotype set and fixed mass parameters. Once 100,000 iterations (steps) have been accepted, the MCMC is said to have "burned in."

Post burn-in begins with genotype and mass parameter proposals that already start from these higher probability densities. The post burn-in process is undertaken until the iterations also reach a set number of acceptances. The genotype sets accepted during post burn-in are tallied. At the completion of the MCMC, these values are normalized at each locus so that they range from zero (indicating that the observed data cannot be explained by the proposed genotype set) to one (indicating that this is the only genotype set that explains the data). Although mathematically unnecessary, the counts are normalized so that the values provide an intuitively helpful diagnostic for analysts. These counts are described as "weights" and are the primary output of STRmix. As previously mentioned, STRmix's MCMC process runs eight chains—sets of proposals—simultaneously. These eight chains start in different locations (i.e., have different proposed genotypes and mass parameters) and walk through the iterations (the "hot and cold" game) separately but at the same time. When these separate chains converge, it indicates that the proposed solutions are likely. This convergence statistic—called the Gelman-Rubin Convergence Statistic—is used as a measure of the consistency or repeatability of the output of the STRmix process. Tr. II at 175.
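
The tallying and normalization of accepted genotype sets into "weights," and the idea of comparing chains for agreement, can likewise be illustrated with a toy sketch. The genotype labels and counts below are invented, and the spread-of-chain-weights check is a simplified stand-in for, not an implementation of, the Gelman-Rubin statistic.

```python
from collections import Counter
import statistics

def genotype_weights(accepted_sets):
    """Tally accepted genotype sets and normalize the counts so they sum to one."""
    counts = Counter(accepted_sets)
    total = sum(counts.values())
    return {genotype: count / total for genotype, count in counts.items()}

# Invented post burn-in acceptances from three chains at a single locus.
chain_1 = ["12,14"] * 700 + ["12,15"] * 300
chain_2 = ["12,14"] * 680 + ["12,15"] * 320
chain_3 = ["12,14"] * 710 + ["12,15"] * 290

# Pooled, normalized weights: the primary output of the deconvolution for this locus.
print(genotype_weights(chain_1 + chain_2 + chain_3))

# Crude convergence check: if the separate chains assign similar weight to the
# leading genotype, the runs agree (a simplified stand-in for Gelman-Rubin).
per_chain = [genotype_weights(c)["12,14"] for c in (chain_1, chain_2, chain_3)]
print(f"spread of per-chain weights: {statistics.pstdev(per_chain):.4f}")
```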

3. Calculation of the LR

Once the deconvolution process is completed, the likelihood ratio may be generated. A likelihood ratio can be assigned for any person of interest whose DNA profile is known. As indicated, the likelihood ratio is a ratio of two probabilities that evaluates the evidence given two mutually exclusive propositions: LR = P(observed profile | hypothesis 1) / P(observed profile | hypothesis 2). The probabilities in the numerator and denominator are numerical expressions of the observed data under the two competing hypotheses. That is, the numerator is the probability of finding that profile if hypothesis number 1 is true; the denominator is the probability of finding that profile if hypothesis number 2 is true. (Typically, hypothesis 1 is that the mixture includes DNA from the person of interest or suspect, whereas hypothesis 2 does not include that person as a contributor to the DNA mixture.) Def. Ex. 22 at 14. The two probabilities are compared to each other, and the LR is generated.

To summarize, STRmix uses MCMC to estimate the probability that particular genotypes are included in the observed DNA profiles, and then uses those estimates to evaluate two hypotheses: (1) that the mixture includes the defendant and (2) that it does not. It uses a likelihood ratio to describe how strongly the evidence supports one hypothesis over the other. As indicated previously, the likelihood ratio in this case is that the DNA mixture is "greater than one billion times more likely if it originated from Kenneth Davon Lewis and three unknown unrelated individuals than if it originated from four unknown unrelated individuals." This is the primary evidence the Government seeks to admit over Lewis's objection. He argues that it is not admissible because STRmix is not a reliable method and has not been reliably applied in this case.
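
As a purely arithmetic illustration of the likelihood ratio just described (the probabilities below are invented and far simpler than anything STRmix actually computes), the reported figure is simply the quotient of the two hypothesis-conditioned probabilities.

```python
# Toy likelihood-ratio arithmetic (invented numbers; illustration only).
# H1: the mixture is the person of interest plus three unknown contributors.
# H2: the mixture is four unknown contributors.
p_evidence_given_h1 = 2.0e-18   # probability of the observed profile if H1 is true
p_evidence_given_h2 = 1.0e-27   # probability of the observed profile if H2 is true

likelihood_ratio = p_evidence_given_h1 / p_evidence_given_h2
print(f"LR = {likelihood_ratio:.2e}")  # 2.00e+09
if likelihood_ratio > 1e9:
    print("Reported as 'greater than one billion times more likely' under H1.")
```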

CONCLUSIONS OF LAW

I. The Daubert Standard

Federal Rule of Evidence 702 governs the admissibility of expert testimony. The rule provides:

A witness who is qualified as an expert by knowledge, skill, experience, training, or education may testify in the form of an opinion or otherwise if:

(a) the expert's scientific, technical, or other specialized knowledge will help the trier of fact to understand the evidence or to determine a fact in issue;

(b) the testimony is based on sufficient facts or data;

(c) the testimony is the product of reliable principles and methods; and

(d) the expert has reliably applied the principles and methods to the facts of the case.

The Government, as the proponent of the expert evidence here, has the burden to prove its admissibility by a preponderance of the evidence. Lauzon v. Senco Prods., Inc. , 270 F.3d 681, 686 (8th Cir. 2001) (citing Daubert v. Merrell Dow Pharmaceuticals, Inc. , 509 U.S. 579, 592 n.10, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) ).

The trial court acts as a gatekeeper to "ensure that any and all scientific testimony or evidence admitted is not only relevant, but reliable." Daubert , 509 U.S. at 589, 113 S.Ct. 2786. The Supreme Court recognized that expert evidence "can be both powerful and quite misleading because of the difficulty in evaluating it." Id. at 595, 113 S.Ct. 2786. Lewis objects both to STRmix's reliability as a method and to the method's application in this case. "In a case involving scientific evidence, evidentiary reliability will be based upon scientific validity. " Id. at 590 n.9, 113 S.Ct. 2786. Daubert identifies several non-exclusive factors for the Court to consider when assessing scientific validity: (1) whether the theory or technique can be (and has been) tested, (2) whether the theory or technique has been subjected to peer review and publication, (3) its known or potential rate of error, and (4) whether the theory or technique has been generally accepted. Id. at 593-94, 113 S.Ct. 2786. The Supreme Court emphasized that the "inquiry envisioned by Rule 702 is ... a flexible one." Id. at 594, 113 S.Ct. 2786.

Lewis has not challenged the evidence under either Rule 702(a) or (b).

In Daubert the Supreme Court also acknowledged the tension between law and science and the competing concerns about where to draw the line on admissibility of expert scientific evidence. On one side is the apprehension that setting the admissibility bar too low will mean "befuddled juries [will be] confounded by absurd and irrational pseudoscientific assertions." Id. at 595, 113 S.Ct. 2786. On the other side is the concern that setting the bar too high "will sanction a stifling and repressive scientific orthodoxy and will be inimical to the search for truth." Id. at 596, 113 S.Ct. 2786. The Court stated:

[T]here are important differences between the quest for truth in the courtroom and the quest for truth in the laboratory. Scientific conclusions are subject to perpetual revision. Law, on the other hand, must resolve disputes finally and quickly. The scientific project is advanced by broad and wide-ranging consideration of a multitude of hypotheses, for those that are incorrect will eventually be shown to be so, and that in itself is an advance. Conjectures that are probably wrong are of little use, however, in the project of reaching a quick, final, and binding legal judgment – often of great consequence — about a particular set of events in the past. We recognize that, in practice, a gatekeeping role for the judge, no matter how flexible, inevitably on occasion will prevent the jury from learning of authentic insights and innovations. That, nevertheless, is the balance that is struck by Rules of Evidence designed not for the exhaustive search for cosmic understanding but for the particularized resolution of legal disputes.

Id. at 596-97, 113 S.Ct. 2786.

If the evidence is admitted, the Court noted, vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are available as the "appropriate means of attacking shaky but admissible evidence." Id. at 596, 113 S.Ct. 2786.

II. Is the Expert Testimony in this Case the Product of Reliable Principles and Methods?

A. The PCAST Report and The Internal Validation Study

Lewis contends that the evidence is inadmissible because STRmix has not been shown to be a sufficiently reliable method for analyzing complex DNA mixtures of the type involved here. The underlying principles on which STRmix is based—the MCMC, the Metropolis-Hastings algorithm, and Bayesian statistics—are not themselves challenged as unreliable. Nor is the purely mathematical calculation of relative probabilities (i.e. , the likelihood ratio). Rather, Lewis challenges the manner in which STRmix applies these principles; STRmix, he asserts, does not have foundational validity—it does not produce reliable results that can be validated.

The Court's analysis of this question of foundational validity most properly starts with the 2016 report to President Obama by the President's Council of Advisors on Science and Technology (PCAST). In 2015 President Obama tasked PCAST to assess whether certain "feature-comparison" forensic methods have been scientifically established to be valid and reliable. Def. Ex. 2 (PCAST 2016 Report). "Feature-comparison methods" are methods that attempt to determine whether an evidentiary sample is or is not associated with a potential source based on the presence or absence of similar patterns, impressions, or other features. Examples include DNA, fingerprints, handwriting, tool marks, bite marks, shoe prints, tire tracks, and ballistic evidence. Id. at 1.

In order to be foundationally valid, a forensic science method must be shown, based on empirical study, to be repeatable, reproducible, and accurate. Id. at 4-5. "Foundational validity, then, means that a method can, in principle , be reliable. It is the scientific concept we mean to correspond to the legal requirement, in Rule 702(c), of ‘reliable principles and methods.’ " Id. In its review of DNA evidence, PCAST divided its inquiry into the validity of methods of analyzing single-source and simple DNA mixtures, and methods of analyzing complex DNA mixtures. This case involves a complex DNA mixture. As PCAST noted, "the fundamental difference between DNA analysis of complex mixture samples and DNA analysis of single-source and simple (two-person) mixtures lies not in the laboratory processing, but in the interpretation of the resulting DNA profile." Id. at 75. DNA analysis of complex mixtures—defined as mixtures with more than two contributors—is inherently difficult and even more so when the mixture contains only small amounts of DNA. The initial approaches to interpretation of complex mixtures relied on subjective judgments by examiners, which resulted in marked inconsistencies across examiners and laboratories in their interpretations. Id. at 76-78. In response to these inconsistencies, several groups launched efforts to develop probabilistic genotyping computer programs. "As of March 2014, at least eight probabilistic genotyping software programs had been developed (called LRmix, Lab Retriever, Like Ltd., FST, Armed Expert, TrueAllele, STRmix, and DNA View Mixture Solution), with some being open source software and some being commercial products." Id. at 78-79. As PCAST concluded:

These probabilistic genotyping software programs clearly represent a major improvement over purely subjective interpretation. However, they still require

careful scrutiny to determine (1) whether the methods are scientifically valid, including defining the limitations on their reliability (that is, the circumstances in which they may yield unreliable results) and (2) whether the software correctly implements the methods. This is particularly important because the programs employ different mathematical algorithms and can yield different results from the same mixture profile.

Id. at 79.

PCAST further noted:

Most importantly, current studies have adequately explored only a limited range of mixture types (with respect to number of contributors, ratio of minor contributors, and total amount of DNA). The two most widely used methods (STRmix and True Allele) appear to be reliable within a certain range, based on the available evidence and the inherent difficulty of the problem. Specifically, these methods appear to be reliable for three-person mixtures in which the minor contributor constitutes at least 20% of the intact DNA in the mixture and in which the DNA amount exceeds the minimum level required for the method.

Id. at 80 (emphasis supplied).

Thus, PCAST specifically found "at present, public evidence supports the foundational validity of analysis, with some programs, of DNA mixtures of three individuals in which the minor contributor constitutes at least 20% intact DNA in the mixture and in which the DNA amount exceeds the minimum required level for the method. The range in which foundational validity has been established is likely to grow as adequate evidence for more complex mixtures is obtained and published." Id. at 82.

TrueAllele and STRmix

Whether STRmix's foundational validity may now be extended beyond the limits PCAST identified in 2016 lies at the heart of this case. This case involves a DNA mixture in which the assumed number of contributors is four, rather than three, and in which the minor contributor is estimated to have contributed only 6% to the DNA mixture. Gov't Ex. 1 (MRFL Lab Report). The prosecution contends that the foundational validity of STRmix has been established for such mixtures; the defense contends it has not.

When the PCAST Report was issued it defined the path forward as requiring the peer-reviewed publication of validation studies applying STRmix to more complex DNA mixtures. The prosecution asserts that such studies have been undertaken and published in peer-reviewed scientific journals. The defense contends that the studies that have been done do not comply with PCAST's recommendations. In direct response to PCAST, the developers of STRmix along with 31 forensic laboratories published a study entitled Internal Validation of STRmix—a Multi-Laboratory Response to PCAST (The STRmix Internal Validation Study). Gov't Ex. 16. The STRmix Internal Validation Study purports to address how well the STRmix method performs (1) as a function of the known number of contributors to a DNA mixture, (2) when the number of contributors to the mixture is unknown, (3) as a function of the number of alleles shared among individuals who contributed to the DNA mixture, including when the mixture includes related individuals, and (4) as a function of the absolute and relative amounts of DNA from the various contributors. Id. at 12.

The Government also relies on a second validation study that assessed STRmix's performance analyzing 277 DNA mixtures prepared by the FBI's laboratory, which involved between two and five contributors in varying proportions. Special Master Report at 30-31, 33, 47, Docket No. 113. The authors of this study concluded that STRmix "is fit for the interpretation and statistical assessment of ... mixtures originating from two, three, four and five individuals." Id. at 9, citing Moretti, T.R., Just, R.S., Kehl, S., Willis, L.E., Buckleton, J.S., Bright, J., Taylor, D.A. & Onorato, A.J. (2017). Internal validation of STRmix for the interpretation of single source and mixed DNA profiles. Forensic Science International: Genetics , 29:126-144.

Lewis's critique of these studies focuses primarily on their lack of independence. Specifically, he argues that The STRmix Internal Validation Study violates PCAST's recommendation that "appropriate evaluation of the proposed methods should consist of studies by multiple groups, not associated with the software developers , that investigate the performance and define the limitations by testing on a wide range of mixtures with different properties." Def. Ex. 2 at 79 (PCAST 2016 Report) (emphasis supplied). Here, Lewis asserts, The STRmix Internal Validation Study fails because the underlying analysis was performed by ESR, the software developer. Indeed, the primary authors of the study were the developers of STRmix, Dr. Buckleton, Dr. Jo-Anne Bright, and Mr. Duncan Taylor. Gov't Ex. 16. If PCAST's recommendation is construed to mean that foundational validity may only be established by validation studies strictly divorced from the software developers, then The STRmix Internal Validation Study does not meet that level of independence. As one critique put it, "[w]ithout independent [verification and validation], courts will only have the self-interested assurance of the software developers themselves that the system works properly ... those assurances are not good enough." Def. Ex. 27 at 1.

However, PCAST's independence recommendation is not quite so absolute. In a later section of the report, PCAST also wrote, referring to the very language just quoted, "as noted above, such studies should be performed by or should include independent research groups not connected with the developers of the methods and with no stake in the outcome." Def. Ex. 1 (emphasis supplied). Dr. Buckleton opined that The STRmix Internal Validation Study satisfies the PCAST requirement because 31 independent laboratories were involved in the study that produced the report. Tr. I at 47, 94-95.

To assess whether these studies are sufficiently independent to satisfy the PCAST recommendation requires analysis of the study design. In The STRmix Internal Validation Study, the participating forensic laboratories prepared DNA mixtures (with varying numbers of contributors in varying proportions) and processed those mixtures in their laboratories to produce the electropherograms. Gov't Ex. 16 at 12. ESR was not involved in the process of preparation, amplification or electrophoresis. Id. Nor was ESR provided the details regarding the mixtures prior to running the STRmix analysis on those DNA mixtures. Id. The question thus is whether the involvement by ESR in the analysis of the data submitted by the 31 labs so taints the independence of the study as to call its validity into question.

The Court finds that The STRmix Internal Validation Study is sufficiently independent to satisfy PCAST's recommendation. First, the PCAST report by its own language permits the involvement of the software developers when it states such studies "should include independent research groups" (emphasis supplied). This language expressly refers back to the statement that seemingly disallowed any involvement by the STRmix developer. Regardless, this sentence indicates that PCAST recognized that the potential involvement of the software developers in further foundational validity studies did not per se negate the validity of those studies. This makes practical sense for, as Dr. Buckleton testified, no forensic laboratory or independent body would have sufficient capability or interest to undertake and publish such a large, broad-based study. Tr. I at 48-49; Gov't Ex. 16. Only by combining the results of many different laboratories utilizing hundreds of different DNA mixtures is it possible to generate a large and novel study, such as The STRmix Internal Validation Study, that might appear in a peer-reviewed publication.

Moreover, Dr. Buckleton testified that he discussed this very critique with PCAST and that PCAST agreed that The STRmix Internal Validation Study, if published, would satisfy PCAST's concerns. Tr. I at 47. The PCAST 2016 Report directly references its discussion with Dr. Buckleton. Def. Ex. 2 at 81 n.219.

The question is really what value is being served by assuring that future validation efforts are independent from the software developers, and does The STRmix Internal Validation Study provide sufficient independence to meet that goal. The reason for insisting that testing be independent is to ensure that the results are reliable and not improperly biased by the developers, either deliberately or subconsciously. The STRmix Internal Validation Study meets that goal because ESR had no role in or knowledge of what comprised the DNA mixtures they analyzed. ESR's role was limited to running the STRmix program on the data generated by the 31 labs using each lab's parameters. Gov't Ex. 16 at 12. In short, it was a blind study. It is unclear how ESR could inappropriately influence the STRmix output, and no questions were raised about the qualifications of ESR personnel who analyzed the data. Moreover, the reliability of the interpretation of that output is reasonably established by the process of comparing that output to the "ground truth" of each lab's DNA mixtures, which was done, and to a lesser extent by the process of peer review itself.

But, the defense asserts, The STRmix Internal Validation Study (and the Moretti Study) do not satisfy PCAST's concern for independence because the labs that were involved also were not "independent research groups ... with no stake in the outcome." All 31 laboratories and the FBI laboratory use STRmix. Thus, by definition, Lewis argues, the labs and ESR have a mutual stake in proving the foundational validity of the STRmix software in which they have all invested considerable time, money, and effort.

This criticism, though a fair one, is ultimately unpersuasive. It is true that the submitting labs all have a "stake" in the outcome of The STRmix Internal Validation Study and the Moretti Study. However, by dividing the process of creating the input (the DNA mixtures and electropherograms) from the process of generating and interpreting the output, the study's design provides a reasonable level of assurance of its reliability. The process of peer review of the study, as mentioned, provides further assurance of its bona fides. Unless one posits some deliberate fraud in the studies—for which there is simply no evidence—the Court finds it is sufficiently independent to comply with PCAST's direction to establish the foundational validity of STRmix for analyzing complex DNA mixtures involving more than three contributors. Moreover, each of the 63 labs using STRmix worldwide has performed its own internal validation study (to generate its lab-specific parameters), and the mathematics have been repeated by hand by the U.S. Army and the California Department of Justice, all of which supports the foundational validity of STRmix. Tr. I at 25. The precise limits of that foundational validity—how far it is extended by this study—are less clear. The Court addresses that topic later in its analysis of whether the method is reasonably applied in this case.

The Court recognizes that the process of peer review is not a panacea. There are numerous reports of peer-reviewed articles that have appeared in scientific journals only to be later retracted, either due to fraud or faulty science. See, e.g. , Study Finds Misconduct Widespread in Retracted Scientific Papers , New York Times, Oct. 1, 2012; Castillo, M., The Fraud and Retraction Epidemic, Am. J. of Neuroradiology, Sept. 2014, 35(9): 1653-1654.

Lewis also contends that, even if The STRmix Internal Validation Study establishes the foundational validity of STRmix as a method, it does not establish foundational validity in this case because The STRmix Internal Validation Study utilized STRmix version 2.5.02 whereas MRFL utilized a prior version of the STRmix software, version 2.4.05. Therefore, the defense argues, the method of STRmix v.2.4.05 has not been established as reliable by The STRmix Internal Validation Study. That validation study cannot "retroactively" validate an earlier version of the software it did not test. This too is a fair criticism, but also unpersuasive. Dr. Buckleton testified that, even though The STRmix Internal Validation Study involved a later version of the software, the updates to the software from version 2.4 to 2.5 changed nothing fundamental in the program that would be likely to affect its accuracy. See, e.g. , Tr. I at 28; Tr. III at 483-95, 534; Special Master Report at 29 (summarizing testimony), Docket No. 113. Moreover, ESR ran various diagnostics to establish that prior iterations of the software were nonetheless valid. Tr. III at 515. Further, ESR has publicly released a summary of its software miscodes, which identifies all known potential defects in the software that could potentially affect the likelihood ratio. Tr. I at 28; Def. Ex. 28 (Summary of Miscodes). There are only two software miscodes that relate to version 2.4.05 utilized by MRFL. Id. Miscode Number 4 involves drop-in modeling and forward-stutter modeling that produced "no detectable affect on the likelihood ratio in a number of trials." Def. Ex. 28. Therefore, that miscode provides no basis on which the Court might find that STRmix v.2.4.05 is not foundationally valid.

Miscode Number 5 is also present in versions preceding version 2.5, including version 2.4.05. Miscode Number 5 comprises three separate miscodes, 5.1, 5.2, and 5.3. Miscode 5.1 relates to mixed DNA profiles when there are multiple contributors who are considered persons of interest under the inclusion hypothesis but who are considered unknown under the exclusion hypothesis. Miscode 5.3 deals with a "minor anomaly in the familial search LR." Neither scenario is involved in this case—there were not multiple contributors who were considered persons of interest under the inclusion hypothesis but unknown under the exclusion hypothesis, and there was no familial search conducted with respect to this DNA mixture. Therefore, miscodes 5.1 and 5.3 do not apply.

Miscode 5.2, however, involved a change to the way drop-in alleles are assigned within the determination of the genotype array in the pre-burn phase of STRmix. While that change will affect all likelihood ratio calculations included within the database search, the resulting changes to the likelihood ratio were usually less than one order of magnitude. Def. Ex. 28.

Finally, MRFL also undertook its own internal validation study of STRmix version 2.4.05. Def. Ex. 14 (MRFL Internal Validation Study). This study provided further support for the foundational validity of STRmix v.2.4.05 as used in that lab. Lewis asserts this lab validation is inadequate because the computational work was actually performed at ESR. Def. Br. at 14-16, Docket No. 108. But as defense expert Dr. Krane also acknowledged, the concern was more pedagogical in nature than it was a concern that the validation itself was flawed. Tr. III at 380-82.

B. The absence of standards

Lewis also challenges the foundational validity of STRmix by arguing it cannot be a reliable method because it cannot be shown that STRmix complies with applicable standards. Lewis correctly points out that there is no one standard that governs the foundational reliability of probabilistic genotyping software systems. There are, however, three published guidance documents that specifically pertain to probabilistic genotyping systems.

In 2015 the Scientific Working Group on DNA Analysis Methods (SWGDAM) published "Guidelines for the Validation of Probabilistic Genotyping Systems." Def. Ex. 1 (SWGDAM Guidelines). SWGDAM is a consortium of approximately 50 scientists representing federal, state and local forensic DNA laboratories in the United States and Canada. Id. The SWGDAM Guidelines provide that "[d]evelopmental validation may be conducted by the manufacturer/developer of the application or the testing laboratory. Developmental validation should also demonstrate any known or potential limitations of the system." Id. at 5. The SWGDAM Guidelines also set out a number of steps in the validation process. STRmix complies with the SWGDAM Guidelines. Tr. I at 26; Tr. III at 370; Tr. III at 471.

Pronounced "swigdam."

A second guidance document is "Software Validation for DNA Mixture Interpretation" published by the Forensic Science Regulator. Def. Ex. 22 (FSR Guidance). The Forensic Science Regulator is the government official in the U.K. who regulates forensic science activities within the U.K.'s legal system and ensures forensic science services are subject to appropriate scientific quality standards. www.gov.uk/government/organisations/forensic-science-regulator (last visited Dec. 26, 2019). The FSR Guidance contains more detailed requirements than the SWGDAM Guidelines for validation of probabilistic genotyping software. STRmix complies with the FSR Guidance. Tr. I at 26.

Finally, the International Society for Forensic Genetics (ISFG) published guidelines in 2016 for the validation of software for probabilistic genotyping. Def. Ex. 21 (ISFG Guidance). The ISFG is an international non-profit scientific society whose aim is "to promote scientific knowledge in the field of genetic markers." www.ISFG.org (last visited Dec. 23, 2019). The ISFG Guidance proposes minimum requirements for validation and addresses both developmental and internal validation. Def. Ex. 21. STRmix complies with the ISFG Guidance as well. Tr. I at 27.

Lewis contends, however, that STRmix's reliability as a method is not established by compliance with these guidance documents. Rather, he asserts, to be foundationally valid STRmix must undergo and comply with the stringent verification and validation process specified by IEEE for safety critical systems. The IEEE standards have been applied across an array of disciplines and vary in their stringency depending on the consequences of a software failure. When the consequences of such a failure are severe (e.g. , loss of life or liberty) the IEEE standards require that validation be performed by people who are completely independent of the software developers. Def. Ex. 20 (IEEE Standard). In part this is done because, according to Dr. Heimdahl, the purpose of validation is to test and establish the limits of the software and discover when it fails. Tr. III at 583-85. That is, the persons who validate the software should set out to "break" it, a task that the developers will resist, either consciously or subconsciously. Tr. III at 585. Though the parties debate which IEEE classification the STRmix software belongs in, Dr. Buckleton testified he thought it fair to evaluate STRmix according to the highest—safety critical—IEEE standards, even though by IEEE's own definition he believes STRmix falls into a lower classification level. Tr. III at 465.

The Institute of Electrical and Electronics Engineers.

As the special master noted, the defense experts seemed to assert that STRmix did not fully comply with SWGDAM, FSR or ISFG guidelines, but their main objection was that STRmix should be validated under the IEEE standards. Special Master Report at 11-12, Docket No. 113. The experts could not say how, specifically, STRmix allegedly fails to comply with these guidelines. Tr. II at 345; Tr. III at 396.

Notwithstanding, Dr. Buckleton testified that STRmix "very nearly" complies with the IEEE's most stringent safety critical standards. Tr. I at 108. The Court finds that STRmix's lack of strict compliance with this standard does not render it unreliable as a method. First, STRmix complies with all three guidance documents that were specifically adopted for probabilistic genotyping systems, and there is no testimony that probabilistic genotyping software must comply with the IEEE standard. Moreover, the IEEE standard document itself acknowledges that "the existence of an IEEE standard does not imply that there are no other ways to produce, test, measure, purchase, market, or provide other goods and services related to the scope of the IEEE standard." Def. Ex. 20 at 3; Tr. II at 340. Further, Lewis's experts' principal concern with STRmix's lack of compliance with IEEE is that it creates uncertainty as to the limits of the software to produce reliable results, Tr. III at 583-85, not its reliability in general. Dr. Krane testified that the post-PCAST validation studies fail to demonstrate when STRmix might fail to produce accurate results. It may be particularly vulnerable where there is uncertainty about the number of contributors or where there are large differences in the proportion of DNA among the contributors. While this criticism (i.e. , that STRmix may prove brittle in certain unidentified extreme circumstances) deserves consideration (see, e.g. , Def. Ex. 22 at 25), the failure to "break" the system in order to reveal its limits goes more to the question of its reliability as applied in this case than to its reliability as a method. The rigorous testing that STRmix has been subjected to has shown that "STRmix not only works, it seems to work extremely well, at least when used in the manner" used in the post-PCAST validation studies. Special Master Report at 31, 33, Docket No. 113.

Cf. People v. Lopez , Indictment No. 3927/16 (N.Y. Cty. Sup. Ct. Apr. 27, 2018 Decision and Order) at 3, Docket No. 34-7, discussing evidence submitted by ISFG that it rejects the claim that the computer science community must validate the statistical software underlying STRmix.

C. Absence of ground truth

Lewis contends that STRmix is unreliable because there is no "ground truth" regarding the likelihood ratio it produces. Def. Br. at 5, Docket No. 110. That is, there is no knowable, precise likelihood ratio based on the MCMC. This is a true statement. The likelihood ratio is based on highly complex statistical and probability modeling of potential distributions within a massive data set. Even though precise numbers can be generated, the precise number is not "true" in the sense that there is only one likelihood ratio. Indeed, it is well known that if the MCMC process of STRmix is run multiple times on the same exact DNA mixture sample, it will yield different likelihood ratios each time, though the differences will be small. Tr. II at 151, 170.

Lewis's assertion that the absence of a "ground truth" and the variability of LRs render it impossible to verify STRmix's reliability as a method appears to have two components. First, because there is no absolute ground truth of a likelihood ratio, STRmix cannot be checked against itself to determine whether the method is repeatable—that is, that STRmix always gets to the same answer. Second, because there is no absolute ground truth LR and because the program will produce variations in the LR that is generated, the process's accuracy cannot be validated and it is, by definition, unreliable.

The first criticism—that the STRmix software does not generate the same answer even when it is given the same input and is thus not repeatable—is unfounded. The STRmix program, during the validation process, is run against a "seed value" where the precise DNA profile and parameters are set. When run against this "seed value," STRmix returns the exact same likelihood ratio each time. Tr. I at 82. This is further evidenced by MRFL's internal validation in which STRmix was run on the lab's three separate computers against a seed value and, in each instance, returned the exact same LR. Def. Ex. 14 at 1 (MRFL Internal Validation Study). Thus, STRmix does produce consistent answers and is repeatable. See id.
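
The repeatability point is a general property of seeded pseudo-random computation, which the following sketch illustrates with a generic Monte Carlo estimate (it is not STRmix code): with the same seed, the generator produces the same sequence of numbers, so every proposal and every accept-or-reject decision, and therefore the final answer, is identical from run to run and from computer to computer.

```python
import random

def monte_carlo_estimate(seed, n=100_000):
    """A seeded pseudo-random simulation is exactly repeatable (toy estimate of pi)."""
    rng = random.Random(seed)
    inside = sum(1 for _ in range(n) if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return 4.0 * inside / n

print(monte_carlo_estimate(seed=42) == monte_carlo_estimate(seed=42))  # True: same seed, same answer
print(monte_carlo_estimate(seed=1), monte_carlo_estimate(seed=2))      # different seeds: close, not identical
```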

Lewis's second concern—that the LR that is generated when STRmix is run "in the field" is variable and therefore cannot be verified as true—though important, is ultimately unfounded. The variation in the precise LR numbers generated by STRmix from one run to the next is very small. The STRmix validation studies have demonstrated that the variance in LRs is consistently less than an order of magnitude. Tr. II at 151, 170. That is, the LRs are in fact consistent and are not materially variable. Further, the likelihood ratios presented in court are not intended to be presented as exact numbers, but rather are described in very broad ranges. In MRFL's system the likelihood ratio ranges are: >1,000,000,000; 1,000,000,000 – 1,000,000; 1,000,000 – 1,000; 1,000 – 100; 100 – 10; and 10 – 1. (A simplified illustration of this binning appears after the guidance excerpts below.) These ranges are broad enough to overcome minor variation or imprecision in the precise number generated. At the smallest range (e.g. , one to ten) the variability is handled by the verbal equivalent of that scale, which states that the results are simply uninformative. See Gov't Ex. 1. The absence of a "ground truth" and the minor variability in the precise LR generated are in fact noted by the Forensic Science Regulator, whose guidance states:

In the event that a likelihood ratio is generated that is on the cusp of one range, the Court could, in its discretion, appropriately limit the testimony to the range immediately below or simply exclude numeric testimony altogether.

6.2.14 Some statistical methods such as maximization methods and Markov Chain Monte Carlo (MCMC) do not generate precisely the same number each time the same calculation is repeated. This is not a problem, as long as the variation of the numbers does not affect the number reported in court, and provided that the user is:

(a) fully trained in the use of the software; (b) aware of the tradeoff between complexity, information content, run time and precision; and (c) able to explain those issues in layman's terms.

6.2.15 Meaningful precision should be reported by the software during validation. Overall, it is necessary to be aware that absolute precision in the evidential weight presented to a court is not necessary. In many situations, an order of magnitude for the LR is sufficient. For example, ‘1 Billion’ is suited to court use; ‘1.135 Billion’ is unlikely to be a justifiable level of precision and would in any event add no extra value in a court setting. See also § 7.9.1c.

Section 7.9.1d further elucidates:

The end product is an assessment of weight of evidence, via a LR. There is no ‘correct’ value for the LR. There is a question with regard to the precision with which the LR should be given and experience from across the spectrum of forensic science suggests that a ‘ballpark’ figure is all that is required. This view is exemplified in existing policy, agreed among all [U.K. forensic labs] to round any calculated LR greater than [1,000,000,000]; so if a sensitivity analysis on a particular case yields LRs [greater than 1,000,000,000], this is of little more than academic interest. For LRs of more modest magnitudes, there is no reason to believe that anything more precise than an order of magnitude for the evidential weight is needed. Furthermore, whereas the scientist would be expected to consider issues of sensitivity there is no requirement that he/she should provide a range of LRs to the court.

Def. Ex. 22 at 30.

Similarly, the SWGDAM Guidelines recognize this phenomenon:

3.2.3.1 Some probabilistic genotyping approaches may not produce the same LR from repeat analyses. Where applicable, these studies should therefore demonstrate the range of LR value that can be expected from multiple analyses of the same data and are the basis for establishing an acceptable amount of variation in the LRs.

3.2.3.2 ... increasing the number of MCMC iterations can reduce variation in the likelihood ratio.

Def. Ex. 1 at 6-7.

STRmix runs eight separate chains during its analysis. Moreover, during the burn-in process it requires a minimum of 100,000 acceptances, and post burn-in it requires a minimum of 50,000 acceptances. MRFL's Internal Validation Study does report the range of expected LR values. See Def. Ex. 14. In this case the MRFL's analysis utilized 1,000,000 "Burn-in Accepts," and 5,000,000 "MCMC Accepts" with over 99 million total iterations. Gov't. Ex. 1 (MRFL Lab Report). In short, the absence of a "ground truth" for the LR resulting from STRmix does not render the method unreliable.
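
By way of illustration, the following sketch maps a numeric LR onto broad reporting ranges of the kind described above. The range boundaries are those recited in this Order; the function itself is invented for illustration and is not MRFL's reporting software.

```python
# Map a numeric likelihood ratio onto broad reporting ranges of the kind used
# by MRFL (boundaries as recited above; the code itself is illustrative only).
def reporting_range(lr):
    if lr > 1e9:
        return "greater than 1,000,000,000"
    if lr > 1e6:
        return "1,000,000,000 to 1,000,000"
    if lr > 1e3:
        return "1,000,000 to 1,000"
    if lr > 100:
        return "1,000 to 100"
    if lr > 10:
        return "100 to 10"
    return "10 to 1 (uninformative)"

for lr in (2.3e9, 999_000_000, 4.7e4, 6.0):
    print(f"{lr:g} -> {reporting_range(lr)}")
```

Because the bins span orders of magnitude, small run-to-run variation in the precise number ordinarily does not change the range that would be reported.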

D. Error rate

Lewis castigates STRmix as unreliable because it seemingly does not have a known error rate, either associated with the LR or, more broadly, the percentage of time the LR leads to a false inclusion or a false exclusion. Def. Br. at 20-21, Docket No. 108. Lewis points to the "black box" quality of STRmix to argue that, because there is no known error rate and because it is a black box, there is no way to check the validity of any result offered in court. To the extent this critique is directed to the precision of the LR, it is addressed above. To the extent it asserts a "black box" character of STRmix, it is unfounded. As Dr. Buckleton testified, numerous diagnostics render STRmix's internal processes transparent and reviewable. Tr. III at 374-75.

To the extent that there is no published or known precise error rate with respect to the risk of false inclusion or false exclusion, the criticism is not unjustified. However, as to false inclusion, Dr. Buckleton testified that the error rate is "immeasurably small." Tr. I at 64; Tr. I at 49 ("The error rate is somewhat less than one over the LR."); Special Master Report at 33-34, Docket No. 113. Moreover, changes in the number of contributors, or deviation of the assumed number of contributors from the real number, reduce the likelihood of a false inclusion. Tr. I at 100; Special Master Report at 32, Docket No. 113; Gov't. Ex. 16 (The STRmix Internal Validation Study).

In addition, as the special master concluded, "[w]hile there were a few instances in which STRmix produced results that falsely linked non-contributors to the mixtures, these misleading results were rare and occurred no more often than would be expected by chance due to adventitious (coincidental) similarity between DNA profiles of different individuals. In other words, the rate of false inclusions was approximately what would be expected if STRmix performed its function flawlessly." Special Master Report at 8, Docket No. 113.

In short, the absence of a precisely calculated error rate (because there is no precise ground truth LR) is not the same as saying there is no known error rate. The rate of error can be and has been estimated by checking how often the program assigns highly incriminating likelihood ratios to the profiles of known non-contributors. Errors of that type can and do occur, but have been shown (in The STRmix Internal Validation Study) to occur at a rate that is acceptably small. The error rate for false inclusion is therefore known and acceptably small, and STRmix is not an unreliable method based on that criticism.
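
The kind of empirical check just described can be sketched as follows. The "LR for a known non-contributor" is simulated here with an invented distribution solely to show the counting logic; in the actual validation studies those values come from running the software against known non-contributor profiles.

```python
import random

def simulated_noncontributor_log10_lr(rng):
    """Invented stand-in for the log10(LR) assigned to a known non-contributor."""
    return rng.gauss(-6.0, 3.0)  # usually strongly exclusionary (well below zero)

def estimated_false_inclusion_rate(n_tests=100_000, threshold_log10_lr=2.0, seed=0):
    """Fraction of known non-contributors who nonetheless receive an incriminating LR."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_tests)
               if simulated_noncontributor_log10_lr(rng) > threshold_log10_lr)
    return hits / n_tests

print(f"estimated false-inclusion rate: {estimated_false_inclusion_rate():.5f}")
```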

Concern regarding the rate of false exclusions , however, is different and is addressed more fully below.

E. General acceptance

Lewis contends that STRmix is not a reliable method because it has not gained general acceptance within the relevant scientific community. Much of his critique revolves around the definition of "relevant scientific community." Currently, STRmix is used in over 75 percent of the forensic laboratories in the United States (Tr. I at 21) and over 60 forensic laboratories worldwide. Tr. I at 25. The federal forensic laboratories in the United States that use STRmix include the FBI forensic laboratory and the U.S. Army Criminal Investigation Laboratory. Id. In short, "[Probabilistic genotyping] methods are now accepted and are in widespread use by the forensic community." Ct. Ex. 1 at 2; Tr. II at 142, 150.

Lewis, however, argues that widespread use in forensic laboratories does not satisfy the general acceptance criterion of Daubert because forensic scientists are not disinterested parties (they may have an interest in promoting techniques they have chosen to use) and they may lack the knowledge and breadth of experience needed to fully assess a complex program like STRmix. His expert, Dr. Krane, argues for a broader definition of "relevant scientific community" to include not merely forensic scientists but molecular biologists, statisticians, and most importantly software developers. Tr. III at 390-92. There is no evidence to establish general acceptance in that broadly defined scientific community.

To the extent that acceptance equates to use, there is no doubt that STRmix has gained general acceptance. It is the most widely used probabilistic genotyping software today. Tr. I at 21. Moreover, the underlying science (MCMC, the Metropolis-Hastings algorithm, and Bayesian statistics) is generally accepted in many scientific disciplines. While certain software developers and some forensic scientists have asserted STRmix is not reliable, see Def. Exs. 26, 27, the Court finds that STRmix has gained general acceptance within the relevant scientific community. "General acceptance" does not require unanimity or uniformity, nor does it require that the science be without critique. See United States v. Bonds , 12 F.3d 540, 562 (6th Cir. 1993) ; see also Abarca v. Franklin Cty. Water Dist. , 761 F.Supp.2d 1007, 1038 (E.D. Cal. 2011) (reasonable scientific disagreement among experts did not make evidence unreliable and inadmissible but instead goes to credibility and weight of the expert opinion). STRmix's widespread use in forensic laboratories worldwide is evidence that it has gained general acceptance as a reliable method of DNA analysis, at least in certain circumstances.

In People v. Lopez , the New York trial court found, under the Frye standard, that "the relevant scientific community is the forensic DNA science community ... made up of scientists who are directors of and analysts working in forensic laboratories, experts in fields underlying forensic DNA science (such as molecular biology and genetics), and members of the governing bodies created to develop and set standards for DNA laboratory work and oversight of DNA laboratories." Order at 3, Docket No. 34-7.

F. Peer review

As to the Daubert criterion of whether the principles and methods have been subjected to peer review and publication, the Court finds ample evidence of such peer review. The parties have introduced a great deal of peer-reviewed literature regarding probabilistic genotyping and STRmix. Indeed, Dr. Buckleton has supplied a list of 47 such articles appearing in peer-reviewed journals. Gov't Ex. 3. Most of this literature describes STRmix as a reliable method. Id. However, Lewis argues that many of the articles were written by, or included, the developers of the software and/or the laboratories that have a stake in its admissibility. While true, one purpose of subjecting articles to peer review is to (hopefully) ferret out unreliable science. Because the authors are well known to have an interest in the outcome, the process of peer review acts as a check and balance on that interest. Lewis's argument that certain reviewers have questioned the reliability of STRmix in certain ways, though important, does not undermine its general acceptance by the scientific community or its reliability as a principle and method.

III. The Reliability of STRmix as a Method: The Midwest Regional Forensic Laboratory

To be admitted, the STRmix evidence in this case must not only be based on reliable principles and methods, the method must have been reliably applied. Thus, even assuming STRmix is capable of producing reliable results, the next question is whether MRFL is capable of using that software in a manner that produces reliable results. As discussed below, the evidence establishes that MRFL is capable of using the STRmix software in a manner that produces reliable results and did so in this case.

First, in order to properly model the data and compute the probabilities, STRmix must have information about the operating characteristics of the particular laboratory's DNA extraction, amplification, and genetic analysis equipment and software. Each lab's equipment will generate parameters (e.g. , allele drop-out frequency, stutter variance, etc.) that are specific to that laboratory. The MRFL went through its own internal validation utilizing software version 2.4.05 to establish its parameters. Def. Ex. 14 (MRFL Internal Validation Study); Special Master Report at 39, Docket No. 113. Using a seed value, the internal validation established that the trained MRFL analysts will produce the same result regardless of which of MRFL's three computers is used to run the program. Def. Ex. 14 at 1. Moreover, MRFL's internal validation was used to establish the laboratory's parameters for stutter variance, back stutter, forward stutter, allele drop-in, and allele drop-out. Tr. II at 162-63, 180-84, 243-48. These lab parameter values were then programmed into the STRmix software at MRFL so that analyses are run utilizing these values. Using the lab's parameters, ESR's analysis also verified the accuracy of the results as compared to known DNA mixture samples. Def. Ex. 14. Thus, though Lab Director Ciecko was unable to answer a number of questions at the time of her testimony, it is evident from the hearing that MRFL is trained on the software and is able to utilize the STRmix program to produce reliable results. Defendant Lewis makes no serious argument to the contrary.

In sum, for all the foregoing reasons, the Court finds that STRmix is a reliable principle and method. The STRmix Internal Validation Study published in 2018 satisfies the criteria set forth by PCAST for expanding its foundational validity to mixtures of four and five persons, assuming sufficient DNA material is present and that a sufficient percentage of that DNA was contributed by the most minor contributor. However, as shown below, there are aspects of STRmix as applied to this case that render certain evidence it has generated unreliable and inadmissible.

IV. Has the STRmix Method Been Reliably Applied in this Case?

In order to assess the reliability of STRmix as applied in this case, it is useful to review some of the essential facts. First, the DNA mixture recovered from the gun was processed as a four-person mixture. MRFL was only able to analyze three of the five swabs taken from the gun; two of the swabs did not yield sufficient DNA to permit testing. As to the three that did, all three resulted in a likelihood ratio that provided "very strong support" for inclusion of Lewis as a contributor. The profile observed in each swab was estimated to be "greater than one billion times more likely if it originated from Kenneth Davon Lewis and three unknown unrelated individuals than if it originated from four unknown unrelated individuals." Gov't Ex. 1 (Lab Report). Of the four assumed contributors to the mixture on the gun, the lab further determined that the contributor whose profile was consistent with that of Lewis was the major contributor, having contributed 56% of the total DNA recovered from the gun. See Gov't Ex. 1; Tr. II at 166-67. The lowest presumed contributor, Contributor Number 4, contributed only 6% of the total DNA found in the mixture. This minor contribution is well below the 20% minimum contribution threshold identified in the 2016 PCAST report. Def. Ex. 1 at 82 (PCAST 2016 Report).

The MRFL lab notes (Gov't Ex. 20) indicate that one of the three mixtures (10-A) was analyzed as a mixture of "3 or more" contributors but was reported in the lab report (Gov't Ex. 1) as a mixture of four contributors.

The DNA report also states that, as to each of the three tested swabs, all relevant police officers and the property owner involved in the scuffle were excluded as contributors to the DNA mixture. The report provides no further detail regarding that exclusion, including what, if any, likelihood ratio was generated to support that conclusion.

While it is well understood that likelihood ratios less than one are exculpatory (i.e. , they support the conclusion that the person in question was not a contributor), there does not appear to be agreement in the scientific community about how low the LR must be to warrant the conclusion that the person is "excluded" as a potential contributor.

In addition, MRFL is incapable of processing a five-person DNA mixture because the lab's computing capabilities are insufficient. When a five-person mixture is analyzed at the lab using STRmix, MRFL's computers crash. Tr. II at 141, 244; Def. Ex. 14 (1/26/17 Memo). Recognizing this limitation, the lab does not process five-person mixtures as if they were simply four-person mixtures, but rather, has adopted a policy that "mixtures with an assumed number of five contributors will not be interpreted until capable computers are available. Major contributors within mixtures with an assumed number of five contributors can still be interpreted." Def. Ex. 14 (1/26/17 Memo). The number of contributors is not determined by STRmix, but is an input to the STRmix program determined by the lab analyst. Tr. I at 10-11. The DNA mixture in this case was processed because the lab analyst determined the assumed number of contributors to be four.

The computing capabilities required by STRmix are breathtaking. Even ESR, the software developer, lacks the computing capabilities to process a six-person mixture, even though the STRmix software itself has the ability to analyze such a mixture. Tr. II at 245.

Identified in the lab report as emmoriar.

The evidence at the hearing demonstrated that MRFL has a high likelihood of underestimating the number of contributors to a DNA mixture. See Tr. II at 229-36, 268. The MRFL Internal Validation Study demonstrated that, of the three analysts trained to perform STRmix analysis, the likelihood that any of these analysts would underestimate the number of contributors is between 55% and 65%. Def. Ex. 16. That is, every lab analyst at MRFL is more likely to underestimate the number of contributors than to get it right. Id. The lab analyst who analyzed the mixture at issue in this case, Erin Moriarity, underestimated the number of contributors 60% of the time. Id. In addition, studies, including The STRmix Internal Validation Study and the MRFL Internal Validation Study, establish that underestimating the number of contributors leads to false exclusions. Def. Ex. 3; Special Master Report at 32, Docket No. 113. Thus, while underestimation of the number of contributors tends to have little effect on the LR for inclusion, it tends to "provoke" exclusion of known contributors. Gov't. Ex. 16 at 15, 16, 18, 22; Tr. I at 100, 103. The error rate for such false exclusions is not reported. Def. Ex. 16; Def. Ex. 26 at 1.

Or, as Dr. Buckleton testified generally, "we have a habit of underestimating the number of contributors in sixes." Tr. I at 103.

A. Evidence regarding inclusion of Lewis

Turning first to the evidence for inclusion of Lewis as a contributor to the DNA mixture analyzed in this case, the Court finds this evidence admissible. First, this mixture is well within the reported limitations of STRmix—it involves four assumed contributors, the DNA analyzed was of sufficient quality and quantity, and the major contributor's contribution percentage of 56% is well above even the 2016 PCAST threshold for reliability (i.e., minor contributor must contribute at least 20% of the DNA mixture). Def. Ex. 1 at 82 (PCAST 2016 Report). As the 2017 Addendum to the PCAST report clarified, the concerns regarding reliability arise when the person of interest contributes less than 20% of the DNA. Def. Ex. 2 at 8 (PCAST 2017 Addendum). If the person of interest contributes a greater percentage (assuming sufficient template), the reliability regarding the evidence of his inclusion is satisfied. Moreover, the STRmix Internal Validation Study reports that STRmix is accurate to minor contribution percentages well below 20%, but does not specify a precise percentage. Graphs depicted in the report suggest STRmix is accurate even when the minor contributor only contributed 5% of the DNA found in the mixture. See Gov't Ex. 16 at 14. Both Dr. Buckleton and Lab Director Ciecko testified that this mixture—from the perspective of determining inclusion of Lewis—was not complicated. Tr. III at 456, 465. As the special master observed with respect to the evidence supporting inclusion of Lewis as a contributor:

"When a proposed person of interest aligns with the dominant component in a mixed DNA profile, the support for their inclusion to a mixture will not be markedly altered by an increase in the number of contributors under which the DNA profile is analyzed." Gov't Ex. 16 at 18. Thus, even if MRFL underestimated the number of contributors, the support for inclusion remains robust. See also Tr. I at 100, 103.

Although this case unquestionably involves a complex mixture, and the number of contributors cannot be known with certainty, the defendant's DNA profile is consistent with the genotypes that STRmix assigned to the primary contributor, who was estimated to have contributed more than 50% of the DNA in the mixture. All of the defendant's alleles across 22 [sic] loci were detected in the mixture, so this is not a case where the difference between an incriminating or exculpatory finding depends critically on the correctness of the parameters ... that might not have been adequately evaluated for the circumstances at hand. The DNA mixtures analyzed in this case appear similar in all relevant respects to the bulk of mixtures that were successfully analyzed in the validation studies .... There is nothing about the evidence to suggest that this was a particularly difficult or challenging case for STRmix to analyze.

Special Master Report at 41, Docket No. 113.

Moreover, the STRmix program includes internal diagnostic measures that assess whether the program operated properly in a particular case. These internal diagnostics will detect instances where the program fails to identify the most likely explanations for the observed data or employs implausible assumptions about the "mass parameters" to make the model fit the data. Ct. Ex. 1; Special Master Report at 41, Docket No. 113. Dr. Buckleton, Lab Director Ciecko, and Dr. Krane observed nothing in the internal diagnostics that signaled any problems in this case (Tr. III at 374-75, 434), and the values reported in the internal diagnostics (Def. Ex. 23; Gov't Ex. 19) appear to be within the expected ranges. Ct. Ex. 1; Special Master Report at 43, Docket No. 113; Tr. III at 374. This Court agrees that STRmix's application in this case relative to the inclusion of Lewis as a contributor to the DNA mixture is reliable and therefore admissible.

B. Evidence regarding exclusion of police and Mr. Moriarty as contributors to this mixture

The first question is why it matters, in the context of this case, whether the evidence excluding the police officers as potential contributors to this DNA mixture is admissible. It matters because Lewis hypothesizes that if his DNA was on the gun, it was transferred there by one or more police officers. Def. Br. at 17, Docket No. 108. Studies have shown that DNA from one person may be transferred to an object such as a gun through the intermediary of another person. Def. Exs. 10, 11. That is, Lewis's DNA can be present on the gun even though he in fact never handled it. Id. Moreover, studies also demonstrate that such transfer is more likely if the suspect's DNA is transferred "wet" (e.g., via sweat) and the intermediary thereafter handles an object such as a gun. Id.

In this case the prosecution alleges that officers engaged in a hand-to-hand struggle with Lewis in order to subdue him. This hand-to-hand combat was apparently of some duration and involved physical touching of Lewis by one or more officers. Therefore, the record establishes that the police officers may well have "accepted" DNA from Lewis during their hand-to-hand struggle. In addition, evidence in the record establishes that police officers may have violated the lab's standard operating procedures by failing to wear gloves when handling the gun. See Def. Exs. 9, 12.

Thus, the defense's transfer theory is not inconsistent with the facts. DNA from Lewis may have been picked up by one or more police officers, who may have then transferred it to the surface of the gun. In addition, studies introduced in evidence demonstrate that if Lewis is a "high shedder" of DNA and an officer who handled both Lewis and the gun is a "low shedder," the DNA mixture found on the gun may show Lewis's (the high shedder's) DNA and may not reveal the presence of much, if any, of the officer's (the low shedder's) DNA. Thus, these studies may support Lewis's argument that his DNA, even if transferred to the gun by the police, appears on the gun in much greater quantities than that of a police officer who handled the gun and transferred Lewis's DNA to it. Def. Ex. 11.

The prosecution presumably wishes to introduce evidence from the STRmix report that the police and Mr. Moriarty are "excluded" as contributors to the DNA mixture on the gun. While this conclusion is not necessarily inconsistent with Lewis's theory (in light of the above-mentioned studies), it may tend to mislead the jury as this evidence of exclusion is not sufficiently reliable to be admissible.

To begin, the Court has virtually no information regarding how STRmix generated the result that excluded all of the police officers. Presumably that evidence is the result of a likelihood ratio less than 1. Given that MRFL is likely to underestimate the number of actual contributors to the DNA mixture on the gun, and given that underestimation leads to false exclusions of contributors at an unknown error rate, this evidence of exclusion is not reliable. Moreover, the police are presumably excluded because they do not match even the minor contributor and, if they were in fact contributors not accounted for in the assumed number of four, would have contributed even less than 6% of the combined DNA in the mixture. Tr. II at 278, 284. At these low percentages the evidence does not establish that STRmix is a reliable basis to exclude a contributor. Tr. II at 295. Therefore, on the facts of this case, the exclusion of the police officers as contributors to the DNA mixture on the gun is not reliable and is not admissible.

As Lab Director Ciecko testified, "if you underestimate by two contributors, you are falsely excluding your bottom two contributors from that mixture." Since one of the minor contribution percentages to one of the swabs (10-A) was only 2%, a falsely excluded contributor's percentage would be less than 2% of the mixture. See Gov't Ex. 19.
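
For context on how the likelihood ratio behind an inclusion or exclusion result is read, the sketch below illustrates the basic interpretation. It is not the STRmix computation, and the verbal descriptions are an assumed illustrative reading rather than anything drawn from the record.

```python
# Illustrative only: how a likelihood ratio (LR) is read, not how STRmix calculates one.
# LR = P(observed DNA profile | person of interest is a contributor)
#      ---------------------------------------------------------------
#      P(observed DNA profile | person of interest is not a contributor)

def read_lr(lr: float) -> str:
    """Qualitative reading of an LR; the wording below is an assumed illustrative scale."""
    if lr > 1:
        return "supports inclusion (support grows as the LR increases)"
    if lr == 1:
        return "neutral: the profile is equally probable under both hypotheses"
    return "supports exclusion (support grows as the LR approaches zero)"

print(read_lr(1e6))   # any LR well above 1 favors the inclusion hypothesis
print(read_lr(0.5))   # an LR below 1, the kind of result presumably underlying the exclusions here
```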

V. Decisions From Other Courts

A. Federal court decisions

To this Court's knowledge, this is the first case within the Eighth Circuit to consider the admissibility of DNA evidence generated by the STRmix probabilistic genotyping software as applied to a complex DNA mixture. Though there have been several reported decisions addressing probabilistic genotyping in general, and STRmix in particular, few are instructive in this instance.

By far the most comprehensive opinion on the admissibility of STRmix is Gissantaner, with which this Report and Recommendation is consistent. The court in that case noted the 2016 PCAST Report's finding that STRmix appeared reliable for three-person mixtures in which the minor contributor constitutes at least 20% of the intact DNA in the mixture and in which the DNA amount exceeds the minimum level required for the STRmix software. 417 F.Supp.3d at 869-70. The DNA sample analyzed in Gissantaner did not satisfy that standard because it was a mixture of three, and possibly four, individuals in which the minor contributor (allegedly Gissantaner) was responsible for only 7% of the DNA analyzed, of which there was very little. Thus, given the low template amount of DNA in the mixture, the defendant's contribution amounted to only 49 picograms or approximately 8-9 human cells. Id. at 869-70 n.8, 872-73, 877-78, 884-85. Unlike the inclusion of Lewis in this case, the inclusion of the defendant in Gissantaner involved the "outer limits of a complex mixture of low-template, low level DNA, which has very little comparable validation, if any, by the ... lab." Id. at 882. There, unlike here, the amount of DNA attributed to the contributor whose profile matched the defendant was from the minor contributor rather than the major contributor and was far too low to be reliably tested using STRmix, and therefore the likelihood ratio generated from the STRmix analysis was not reliable. See id. at 877 ("the evidence on the record does not establish adequate testing and validation of the STRmix software under the conditions of the DNA evidence in this case") and *23 (citing expert testimony). The Gissantaner court pointed out that its decision not to admit the DNA evidence was "not an indictment of probabilistic genotyping, and certainly not of STRmix software in particular." Id. at 885. It recognized that STRmix has "some general acceptance in the scientific community, particularly with respect to ... 'mainstream' higher quality and quantity DNA." Id. at 884. But the analysis in Gissantaner "extend[ed] beyond the standard DNA analysis in order to decipher a low copy number/low-template (quantity) mixture" that represented the "outer limits" for STRmix analysis and fell short of admissibility under Daubert. Id. at 870-71, 881-83.

When testing such a small amount of DNA, random fluctuations in testing results can distort the profile. See Gissantaner, 417 F.Supp.3d 857, Attachment 1 (Glossary), which defines "Low-copy number (LCN)/low-template (LT) DNA" as "DNA test results at or below the stochastic threshold" and defines "Stochastic effects" as "Random fluctuations in testing results that can adversely influence DNA profile interpretation (e.g., exaggerated peak height imbalance, exaggerated stutter, allelic drop-out, and allelic drop-in)."

"The evidence sample in this case seems to fall outside of (below) the ranges of %-contribution and quantity-of-template-contributed for which the MSP Laboratory has validated STRmix.... [I]t appears that the evidence sample ... is a mixture of DNA from at least three individuals where the individual who contributed the least material is responsible for only 7% of the total DNA that was used for testing (approximately 49 pg). These values are well below the levels at which the 2016 PCAST report felt that some probabilistic genotyping systems had been foundationally validated." Id. at 884 (internal quotation marks omitted).

The court in Gissantaner also noted the short period STRmix had been in use by the Michigan lab at the time it analyzed the DNA mixture in that case—three months. This also contrasts with the instant case. Here, the MRFL had been using STRmix for approximately 16 months at the time of testing. Tr. II at 136; Gov't Ex. 20 at 1.

In stark contrast to Gissantaner, the evidence of inclusion of Lewis as a contributor is based on a large amount of "high quality" DNA (Tr. III at 456), to which he is said to have contributed over 56%. The MRFL lab notes (Gov't Ex. 20) indicate that the three swabs analyzed contained 564 picograms of DNA (Item 8-A), 570 picograms (Item 8-B), and 550 picograms (Item 10-A). Gov't Ex. 20 at 54. At the percentage attributed to the major contributor, 56%, the DNA amounts are between 308 and 319 picograms, more than six times the amount attributed to the defendant in Gissantaner. The amount of DNA attributed in this case to the major contributor is in excess of the lower-level contribution amounts noted to be necessary in The STRmix Internal Validation Study or even the 20% noted in the 2016 PCAST Report. Its reliability for purposes of inclusion of Lewis is established.
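
The arithmetic behind that comparison can be laid out directly. The sketch below simply multiplies the per-swab DNA amounts quoted above by the 56% attributed to the major contributor and compares the result to the roughly 49 picograms at issue in Gissantaner.

```python
# Arithmetic underlying the comparison in the text (per-swab amounts from Gov't Ex. 20 as quoted above).

swab_totals_pg = {"8-A": 564, "8-B": 570, "10-A": 550}   # total DNA per swab, in picograms
major_share = 0.56                                        # share attributed to the major contributor
gissantaner_pg = 49                                       # amount attributed to the Gissantaner defendant

for item, total_pg in swab_totals_pg.items():
    major_pg = total_pg * major_share
    print(f"Item {item}: {major_pg:.0f} pg to the major contributor "
          f"(~{major_pg / gissantaner_pg:.1f}x the Gissantaner amount)")
```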

As to the exclusion of the officers and Mr. Moriarty, however, the conclusion is the opposite. As in Gissantaner, the exclusion of these potential contributors implicates concerns based on minor contribution percentages and subjective judgments by the lab analyst, including, but not limited to, the assignment of the number of contributors, a determination at which the MRFL analysts are significantly inaccurate. At 6% the amount attributed to the minor contributor is only 34 picograms of DNA, well below the amount noted in Gissantaner. If the officers and/or Mr. Moriarty contributed even less DNA than the identified minor contributor, the amount is smaller still. As Lab Director Ciecko testified, STRmix cannot model trace contributions very well (Tr. II at 295), and anything below 100 picograms is considered a low template amount of DNA. Tr. II at 173. Moreover, as Dr. Krane persuasively testified, The STRmix Internal Validation Study still leaves "an outstanding question as to where we should consider STRmix reliable ...." Tr. III at 431. Thus, as to the evidence of exclusion, the Government has not met its burden of demonstrating such reliability as to render it admissible.
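
By the same arithmetic, the minor contributor's share falls well into the low-template range. The sketch below applies the 6% figure to the per-swab amounts quoted above and compares the result to the 100-picogram low-template cutoff described in the testimony.

```python
# Minor-contributor arithmetic for the exclusion analysis (per-swab amounts from Gov't Ex. 20).

swab_totals_pg = {"8-A": 564, "8-B": 570, "10-A": 550}   # total DNA per swab, in picograms
minor_share = 0.06                                        # lowest estimated contribution percentage
LOW_TEMPLATE_PG = 100                                     # low-template cutoff per Tr. II at 173

for item, total_pg in swab_totals_pg.items():
    minor_pg = total_pg * minor_share
    flag = "low template" if minor_pg < LOW_TEMPLATE_PG else "above cutoff"
    print(f"Item {item}: {minor_pg:.0f} pg to the minor contributor ({flag})")
```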

Decisions from other federal jurisdictions do not alter this Court's conclusion, as they are inapposite and otherwise unhelpful to this Court's analysis. In United States v. Pettway, No. 12-CR-103S, 2016 WL 6134493 (W.D.N.Y. Oct. 21, 2016), the court denied a motion in limine to exclude DNA evidence or, in the alternative, for a Daubert hearing. The Erie County Forensic Laboratory had used STRmix to analyze DNA taken from two firearms. Id. at *1. The two defendants asserted that STRmix was unreliable but did not offer any expert opinion, scientific evidence, or specific challenge to STRmix's methodology or reliability. Id. at *2. In rebuttal, the Government submitted an affidavit and letter setting forth the validity and reliability of STRmix, to which the defendants did not respond. Id. The court summarized the information provided by the prosecution regarding STRmix, but the opinion is short and does not discuss details particular to the case such as the purported number of contributors, percentage contribution by either defendant on either firearm, and so forth. The court found that the defendants' concerns went to weight rather than admissibility of the DNA evidence.

A more recent opinion is similarly unenlightening here. In United States v. Russell, No. CR-14-2563, 2018 WL 7286831 (D.N.M. Jan. 10, 2018), the court analyzes the reliability of STRmix, but the case is factually dissimilar and not instructive for Lewis's case. Russell involves DNA samples taken from a sexual abuse victim, the defendant, clothing, and other items. Id. at *5. The samples were first tested by the FBI in 2014 and again in 2016, the latter using STRmix after the FBI laboratory implemented it in 2015. Id. at *1, 4. In some cases, the test results "differ[ed] substantially." Id. at *5. Russell also involves other issues not present in this case, such as the use of STRmix to analyze "low copy number," i.e., a low amount, of DNA in some of the samples, and whether the Native American database had been validated for use in the analysis. Id. at *5, 8, 10. The court ultimately found the Government had met its burden and denied the defendant's motion to exclude the DNA evidence.

B. State court decisions

In 2017 a Minnesota state court analyzed the use of STRmix under the Frye-Mack standard rather than Daubert and found DNA evidence taken from a firearm was admissible. See State v. Edwards, No. 02-CR-17-3290, State v. Hill-Turnipseed, No. 02-CR-17-3291 (Anoka Cty. Dist. Ct. Nov. 11, 2017 Order & Mem.), Docket No. 34-12. Two DNA samples from the firearm were interpreted as four-person mixtures to which defendant Edwards was a contributor. Id. at 3. DNA evidence using STRmix had not previously been admitted at trial in Minnesota. Id. at 6. However, admissibility was not a contested issue in Edwards because the defendant made a tactical decision to withdraw his motion to exclude the DNA evidence, and he stipulated that STRmix was not novel science under the first prong of Frye-Mack. Id. at 4-7 and n.3. The court held an abbreviated hearing out of an abundance of caution and to make a record for potential appellate review, expressing uncertainty about the scope of its gatekeeping obligations under these circumstances. Id. at 6-7, 10, 13. The sole witness was Anne Ciecko of MRFL, and her opinion that STRmix was not new or novel science was not challenged by either defendant. Id. at 8. Accordingly, the court adopted that conclusion for purposes of that case and, after reviewing the testimony and exhibits, also found that, even if STRmix was novel science, it was generally accepted in the relevant scientific community. Id. at 12-13. Having concluded the DNA reports satisfied the first prong of Frye-Mack, the court admitted the evidence. Id. at 2.

The Frye-Mack standard is discussed in State v. Roman Nose, 649 N.W.2d 815, 818-19 (Minn. 2002).

Similarly, the opinion in People v. Bullard-Daniel, 54 Misc.3d 177, 42 N.Y.S.3d 714 (Niagara Cty. Ct. 2016) is of little value to this Court's analysis. In Bullard-Daniel, the court admitted testimony regarding a four-person mixture analyzed by STRmix. Because the case arose under state law, the court considered admissibility under the Frye standard rather than Daubert and specifically eschewed questions of the reliability of the scientific evidence:

[I]t is not the court's duty to reach its own conclusion about the reliability of the proposed scientific procedure, but rather to determine whether most of the relevant scientific community believes [it] is reliable .... Judges should be "counting scientists' votes," and not verifying the soundness of a scientific conclusion.

42 N.Y.S.3d at 720 (citations omitted).

At the Frye hearing the defense offered no relevant evidence to counter the prosecution's evidence of general acceptance. Id. at 721. Given the well-established acceptance of the underlying mathematical models and the New York State Commission on Forensic Science's unanimous approval of STRmix, the court found it had gained general acceptance in the relevant scientific community. Id. at 725. Moreover, the decision was published several months before PCAST issued its 2016 report, so the court did not have the benefit of that analysis. As the court in Bullard-Daniel noted, its decision was based on what was presented to it. The court did not expect its decision to be unassailable in the face of future developments in the science:

It may be among the first words in New York courts on the admissibility of STRmix, but the Court certainly does not expect it to be the last.

Id. at 726. Thus, the decision is of little guidance to this Court.

Subsequent to the decision in Bullard-Daniel, two state trial courts in New York denied motions for a Frye hearing and admitted DNA evidence taken from firearms that was analyzed using STRmix. Neither decision is instructive in Lewis's case, however. The first case relies heavily on the approval of STRmix by the DNA Subcommittee of the New York State Commission on Forensic Science to satisfy Frye's "general acceptance" requirement. See People v. Lopez, Indictment No. 3927/16 (New York Cty. Sup. Ct. Apr. 27, 2018 Decision & Order) at 5-7, Docket No. 34-7. The Lopez court was also persuaded by the decision in Bullard-Daniel, which the Court has distinguished above, and by other courts that had found general acceptance of STRmix and the science underlying it. See id. at 7-9.

The second New York decision is similarly of little value here. See People v. Yates, Indictment No. 10663-2016 (Kings Cty. Sup. Ct. Oct. 4, 2018 Decision & Order), Docket No. 34-6. It totals less than four pages and, like Lopez, focuses on the Frye standard in which the "court does not have to determine whether or not a novel scientific theory is actually reliable, only whether it is generally accepted in the relevant scientific community. The court's focus should be on counting the votes of scientists." Id. at 3.

In People v. Muhammad, 326 Mich.App. 40, 931 N.W.2d 20 (2018), the Michigan Court of Appeals upheld the trial court's admission of a STRmix-generated DNA analysis. Though Michigan has adopted the Daubert standard, the case itself is factually distinct because it involved analysis of a two-person mixture rather than the complex mixture involved in this case. The appellate court found the trial court had not abused its discretion in admitting the evidence. The procedural posture of the decision, with its deferential standard of review, and the fact that the DNA mixture that was tested involved only two contributors make this opinion of little persuasive value here.

The first DNA mixture that was identified involved four donors but could not be analyzed. A second swab taken from a different part of the same physical evidence (a shoe) involved only two contributors. This two-person mixture was the subject of the court's analysis. 931 N.W.2d at 25-26.

In 2016 a Michigan state court admitted DNA evidence taken from a car door that was analyzed using STRmix, finding that the reliability criteria of Daubert and Michigan Rule of Evidence (MRE) 702 were satisfied. People v. Alford, No. 15-696-FC (Ingham Cty. Cir. Ct. Nov. 28, 2016 Order & Mem.), Docket No. 34-9. The DNA sample was determined to be a three-person mixture to which the defendant was a contributor. Id. at 2-3, 12. The defendant did not present any expert rebuttal to the results obtained using STRmix. Id. at 17. His specific arguments seemed to focus not on the reliability of STRmix generally but on his claim that it was not reliably applied by the Michigan State Police (MSP) Crime Lab. Id. at 15. The court rejected his challenge to the training and experience of the MSP lab scientist, saying it went to weight and not admissibility, and to the sufficiency of the lab's internal validation methods. Id. at 15, 20-21. The court found that the evidence and testimony from the prosecution witnesses, including Dr. Buckleton, regarding the use of STRmix satisfied the reliability criteria of Daubert and MRE 702 and therefore the DNA evidence was admissible. Id. at 15, 20-21.

A state trial court in Wyoming found STRmix to be a reliable methodology or technique and held the resulting DNA report was admissible after conducting a Daubert hearing. State v. Fairbourn, No. CR-16-178-L (Sweetwater Cty. Dist. Ct. Oct. 24, 2017 Order & Mem.), Docket No. 34-8. The Wyoming State Crime Lab determined the DNA sample was a three-person mixture to which the defendant was a contributor. Id. at 5-6. However, the short opinion is of limited utility in Lewis's case, as the defendant's motion in Fairbourn did not claim STRmix was unreliable but rather that it was a new methodology or technique never before presented in a Wyoming court, and demanded that the prosecution carry its burden to establish admissibility. Id. at 1. The opinion does not mention any defense witness or expert and states that the defendant did not argue that STRmix was not used properly in that case. See id. at 3 n.2. After hearing testimony from Dr. Buckleton and a lab scientist, the court concluded that STRmix was a reliable method under Daubert and Wyoming Rule of Evidence 702 and therefore admissible, and stated that any dispute over the accuracy of the test results was a question of the weight of the evidence for the jury to decide. Id. at 6.

In 2018, a California court, applying the state's three-prong Kelly-Frye test, admitted DNA evidence analyzed using STRmix. See People v. Venegas, No. 17 CR F 2383 (Shasta Cty. Sup. Ct. Nov. 6, 2018 Order), Docket No. 34-5. The court's opinion does not materially assist the Court here, not only because it involves the "general acceptance" framework but also because it is quite case-specific as it focuses on the particular DNA samples at issue, including very detailed and technical testimony from defense and prosecution witnesses disputing the number of contributors, whether peaks were alleles or instead were stutter or artifact, and whether a known third party was a contributor to the DNA mixture. Id. at 5-6, 8, 10. The court concluded the Kelly-Frye requirements were met and the issues raised by Defendant went to weight rather than admissibility. Id. at 10.

Finally, the opinion of a Virgin Islands trial court in People v. Blash, No. ST-2015-CR-156, 2018 WL 4062322 (V.I. Super. Ct. Aug. 24, 2018), though applying Daubert to a rule of evidence identical to FRE 702, does not address the objections Lewis raises here, and this Court has not relied on that opinion in reaching its conclusion.

CONCLUSION

Though STRmix is a relatively new and exceedingly complicated technology that employs principles of probabilistic genotyping, the admissibility of the evidence of inclusion it generated in this case is clear. STRmix is foundationally valid for mixtures of four persons when the mixture is of sufficient quality and quantity and the person of interest's contribution to the mixture is of a sufficient amount. STRmix has gained general acceptance in the forensic community, has been validated according to the guidelines published by the forensic bodies directly interested in the process, and has satisfied concerns previously articulated by the President's Council of Advisors on Science and Technology. It has been subject to peer-review publication and testing and, within certain parameters, is a reliable method. As to the analysis of Lewis as a potential contributor to the DNA mixtures at issue, the application of STRmix is well within its reliable parameters. The Government has established admissibility of the evidence as to Lewis's inclusion.

However, the evidence regarding exclusion of the officers and the property owner as potential contributors to the DNA on the handgun is a different matter. First, it is well demonstrated that MRFL regularly underestimates the number of contributors to complex DNA mixtures. The number of contributors in this case was deemed to be four, but given the lab's history, it is entirely possible, if not likely, that the true number of contributors was five or six. Those contributors could have been any of the individuals allegedly excluded. It is well known that underestimation of the assumed number of contributors leads to false exclusions, though at an unspecified rate. If any of the excluded individuals were in fact contributors, their percentage contributions would be exceedingly small and well below STRmix's established threshold for reliability. For these reasons, the Court recommends that the evidence excluding the officers and Mr. Moriarty not be admitted at trial.

RECOMMENDATION

IT IS HEREBY RECOMMENDED that Defendant Kenneth Davon Lewis's motion to exclude DNA evidence [Docket No. 27] be GRANTED IN PART AND DENIED IN PART as follows:

1. The evidence as to Lewis's inclusion as a potential contributor to the DNA mixtures found on the gun be admitted.

2. The evidence as to the exclusion of the officers and Mr. Moriarty as potential contributors to the DNA mixtures found on the gun be excluded.

