
United States v. Ortiz

United States District Court, Southern District of California
Jun 9, 2024
No. 21-CR-2503-GPC (S.D. Cal. Jun. 9, 2024)

Summary

granting motion to exclude DNA evidence analyzed by STRmix due to concerns over the number of assumed contributors and the lab's lack of validation for a higher number of contributors


Opinion

21-CR-2503-GPC

06-09-2024

UNITED STATES OF AMERICA, Plaintiff, v. FRANCISCO ORTIZ, Defendant.


ORDER GRANTING MOTION TO EXCLUDE DNA EVIDENCE

[ECF NO. 120]

The Government has charged Defendant Francisco Ortiz with being a Felon in Possession of a Firearm. The Government intends to prove the charge with DNA analysis of the firearm performed by probabilistic genotyping software called STRmix. Pending before the Court is Mr. Ortiz's motion to exclude that DNA analysis. ECF No. 120. The motion argues that the DNA sample from the firearm likely contained at least six contributors and that STRmix had not been properly validated for use in a mixture with this many contributors. The Government filed an opposition, ECF No. 123, and Mr. Ortiz filed a reply, ECF No. 124. The Court held three days of evidentiary hearings and heard testimony from the San Diego Police Department Crime Laboratory (“SDPDCL”) employee who conducted the DNA analysis, Adam Dutra, and from Mr. Ortiz's expert, Dr. Dan Krane. The Court also solicited additional briefing from the parties. See ECF Nos. 159-160, 164-165. For the reasons that follow, the Court GRANTS Mr. Ortiz's motion.

BACKGROUND

On January 26, 2020, officers from the San Diego Police Department stopped a vehicle in which Mr. Ortiz was a passenger. The other occupants included the driver, Jasmine Canchola, and two other passengers: Jasmine's mother, Julieta Canchola, and her mother's friend, Norma Zambrano. The officers had stopped the car because they were looking for Jasmine's brother, and they eventually directed Mr. Ortiz to exit the car so that they might better identify him. At this time, officers observed a handgun which led to a search of the vehicle and seizure of the handgun and eighteen grams of methamphetamine. Mr. Ortiz has since been charged federally for possession of the firearm and the methamphetamine.

At the SDPDCL, a forensic swab of the seized firearm revealed a complex DNA sample, one that multiple individuals had contributed to. Complex DNA samples are difficult to analyze because the DNA profiles within such samples are superimposed upon one another, much like fingerprints on an object touched by multiple people. Ex. Q (PCAST, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods 8 (2016)). The SDPDCL interprets complex samples using STRmix, software that takes a statistical approach. An analyst tells STRmix how many different profiles are in the mix, and STRmix relies upon that number to find the combination of profiles that best explains the sample. The greater the number of profiles, or number of contributors (“NOC”), the greater the number of combinations that STRmix must sort through before it can generate a conclusion. After running millions of simulations, STRmix calculates whether the inclusion of a suspect's unique genetic profile makes the complex sample more or less likely. That conclusion is called a likelihood ratio (“LR”), and it quantifies how much “more likely [it is] to obtain the complex sample if the [suspect] is a contributor than if [they are] not.” Ex. 10 (SDPDCL Forensic Biology Unit Report).
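
Stated in general terms, and not as the SDPDCL's own notation, a likelihood ratio compares two conditional probabilities; the symbols below are illustrative only:

```latex
% Illustrative form of a likelihood ratio (LR): E denotes the observed
% mixture (the evidence), H_p the hypothesis that the suspect is a
% contributor, and H_d the hypothesis that he is not.
LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}
```

An LR greater than one favors inclusion, and the larger the ratio, the stronger the claimed support for the suspect's presence in the mixture.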

STRmix relies entirely upon the analyst to determine NOC. Analysts determine NOC by looking at an electropherogram, a graphical representation of DNA sequencing, essentially a genetic fingerprint. DNA analysis considers spots on the human genome, called loci, which contain segments of DNA, called alleles, that vary significantly from one person to another. The alleles are depicted on electropherograms as peaks. While a fingerprint may have countless ridges, at any DNA locus a person has only two alleles, one inherited from each parent.

Because a person has only two alleles at any locus, analysts determine NOC by counting the number of alleles, which present as peaks on the electropherogram, and dividing by two. A sample that displays six allelic peaks, for instance, likely has a NOC of at least three.
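
As an arithmetic illustration only, and not a statement of the SDPDCL's protocol, the minimum NOC implied by an allele count can be computed as the peak count divided by two, rounded up:

```python
import math

def minimum_noc(allelic_peaks: int) -> int:
    """Minimum number of contributors implied by a locus showing
    `allelic_peaks` distinct allelic peaks, given that each person
    carries at most two alleles per locus."""
    return math.ceil(allelic_peaks / 2)

# Six allelic peaks imply at least three contributors;
# twelve peaks would imply at least six.
print(minimum_noc(6))   # 3
print(minimum_noc(12))  # 6
```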

This seemingly straightforward analysis is complicated by the electropherogram's capacity for generating false information, which often leads analysts to either overestimate or underestimate NOC. For instance, electropherograms can generate fake peaks, which analysts refer to as “artifactual peaks.” Artifactual peaks may be generated for a variety of reasons, including simple machine fluctuation or “stutter,” a phenomenon where a real peak is surrounded by fake ones. If an analyst incorrectly considers an artifactual peak in their analysis, they run the risk of overestimating NOC.

On the other hand, electropherograms may also lead an analyst to underestimate NOC. This can occur where the electropherogram fails to detect an allele. This phenomenon, called allelic dropout, occurs when the DNA sample is small and the undetected allelic contribution, minimal. Transcript at 354. Underestimation can also occur as a product of allele sharing. Though alleles are highly variable between individuals, it is possible for two people to share one or both alleles at any particular locus. Where this occurs, the shared alleles overlap on the electropherogram, so that two merged peaks create the impression of a single one, which may lead an analyst to undercount the total number of alleles.

Mr. Dutra testified that the SDPDCL has developed strategies to address these complex mixture concerns, as recommended by the Scientific Working Group on DNA Analysis Methods (“SWGDAM”), which issues guidelines that “have become generally, if not universally accepted by the forensic science community.” ECF No. 160 at 3. Specifically, Mr. Dutra spoke of stutter thresholds, emphasizing repeatedly that stutter thresholds are created by testing “hundreds and hundreds and hundreds of known samples.” Transcript at 506-07. Mr. Dutra explained that based on those thresholds, after identifying “peaks that are in position for stutter for other alleles . . . we would mathematically calculate the ratio of those and see if those fit within our expectations for stutter, and if they fit within the expectations for stutter, the best explanation for that peak is that that is within the stutter.” Id. at 538. Mr. Dutra testified that the lab's use of a technique called “peak-height balance” is much the same; analysts rely on “validated data” to “verify whether those peaks are within our expectation, from the empirical data that we've obtained from running hundreds of samples, if not thousands of samples, that we've gone through to determine whether it meets our expectations based on the validation.” Id. at 507. The essence of both strategies is that an analyst distinguishes artifactual peaks based upon patterns observed from extensive sampling.
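
A minimal sketch of the kind of stutter-ratio check this testimony describes follows; the fifteen-percent threshold is an assumed placeholder, not the SDPDCL's validated, locus-specific value:

```python
def likely_stutter(candidate_height: float, parent_height: float,
                   stutter_threshold: float = 0.15) -> bool:
    """Treat a peak sitting in a stutter position as artifactual if its
    height, relative to the neighboring parent peak, falls within the
    laboratory's empirically derived stutter expectation.

    The 15% threshold is illustrative; real thresholds are set per locus
    from validation data, not assumed.
    """
    return (candidate_height / parent_height) <= stutter_threshold

# A 120-RFU peak beside a 1,000-RFU parent peak (ratio 0.12) would be
# explained as stutter under this illustrative threshold.
print(likely_stutter(120, 1000))  # True
```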

Nevertheless, the risk of overestimation and underestimation rises with the number of contributors to the sample. As Mr. Dutra explained: “The number of opportunities for overlapping is greater with the number of peaks and the number of contributors. So the more peaks, the more stutter; the more stutter, the more opportunity to have overlap of alleles and stutter and other factors.” Transcript at 541. Indeed, peer-reviewed studies on the matter suggest that analysts correctly determine NOC for five-person mixtures less than half the time, and that they almost never correctly determine NOC for six-person mixtures. ECF No. 159 at 2-3.

While the studies reflect significant ambiguities in NOC determination for five- and six-person samples, the SDPDCL wields NOC as a bright-line rule: STRmix may only be used where NOC is five or less. Most labs have only validated STRmix's use in samples where NOC is four or less. Ex. M (R. Austin Hicklin et al., Variation in assessments of suitability and number of contributors for DNA mixtures, 65 Forensic Science International: Genetics 5 (2023)). That is how the SDPDCL operated prior to 2016: when an analyst set NOC at five or higher, the “exponential” increase in the number of possible configurations would overwhelm the analyst's computer and cause it to crash. Transcript at 25. Now, that threshold is set at six. As of the date of this order, the SDPDCL has never validated or even attempted to validate STRmix's use in samples with NOC of six or higher. Instead, those samples are deemed unsuitable for analysis. Transcript at 170.

Here, Mr. Dutra concluded that the NOC was five, and after running the sample through STRmix determined that it was 542,000,000 times more likely to obtain the DNA profile found on the handgun if Mr. Ortiz was a contributor than if he was not. Ex. 10. Mr. Ortiz argues that this analysis must be excluded, because NOC was likely six or more and, accordingly, Mr. Dutra ran STRmix on a sample for which it was not validated.

STANDARD

Under Federal Rule of Evidence 702, the Court is tasked with the “preliminary assessment of whether the reasoning or methodology underlying the testimony is scientifically valid and of whether that reasoning or methodology properly can be applied to the facts in issue.” Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 592-93 (1993). Rule 702 thus establishes a standard of evidentiary reliability that focuses on the scientific validity of the expert's methods rather than the soundness of his specific conclusions. United States v. Bonds, 12 F.3d 540, 566 (6th Cir. 1993).

Among the factors that a court should consider in determining whether scientific testimony is reliable are: (1) whether the expert's opinion can be or has been tested; (2) whether the theory or technique on which the opinion is based has been subjected to peer review and publication; (3) the technique's known or potential error rate; (4) the existence and maintenance of standards controlling the technique's operations; and (5) “general acceptance.” Daubert, 509 U.S. at 592-95.

The Ninth Circuit has identified three steps in DNA analysis: (1) processing of DNA samples to produce DNA prints; (2) comparison of the prints to see whether there is a “match”; and (3) estimating the statistical significance of the match. See United States v. Chischilly, 30 F.3d 1144, 1156 (9th Cir. 1994). Under Rule 702, each component of DNA testing must pass muster. Id.; see also United States v. Shea, 957 F.Supp. 331, 337 (D.N.H. 1997); People v. Castro, 545 N.Y.S.2d 985 (Sup. Ct. 1989) (holding that admissibility is conditioned on a finding that the expert properly performed the protocols underlying DNA profiling); United States v. Two Bulls, 918 F.2d 56, 61 (8th Cir. 1990) (same), vacated and dismissed as moot, 925 F.2d 1127 (8th Cir. 1991). Failure to use a reliable method for any of these steps precludes admission of the evidence. Id. And before conclusions grounded in a particular methodology can be admitted into evidence, the methodology must be supported by “appropriate validation.” Daubert, 509 U.S. at 590.

ANALYSIS

As a starting point, Mr. Ortiz does not question that STRmix satisfies Rule 702 as a product of reliable principles and methods. Instead, he raises the issue of whether these principles were “reliably applied” in this case. With respect to probabilistic genotyping software programs like STRmix, the President's Council of Advisors on Science and Technology (PCAST) specifically advises that “[w]hen considering the admissibility of testimony about complex mixtures (or complex samples), judges should ascertain whether the published validation studies adequately address the nature of the sample being analyzed (e.g., DNA quantity and quality, number of contributors, and mixture proportion for the person of interest).” PCAST, An Addendum to the PCAST Report on Forensic Science in Criminal Courts 9 (2017); accord Katherine Kwong, The Algorithm Says You Did It: The Use of Black Box Algorithms to Analyze Complex DNA Evidence, 31 Harv. J.L. & Tech. 275, 277-82, 300 (2017) (stating that “courts should rigorously examine whether a given algorithmic system has been validated for a particular type of evidence analysis and refuse to admit evidence that lacks demonstrated validity for a given mixture type”). A probabilistic genotyping system must be validated on mixtures containing a given number of contributors before the system can be applied to such mixtures. See Scientific Working Group on DNA Analysis Methods (SWGDAM) Guidelines for Validation of Probabilistic Genotyping Systems § 4.1.6.3 (“The number of contributors evaluated should be based on the laboratory's intended use of the software. A range of contributor numbers should be evaluated in order to define the limitations of the software.”).

In 2016, PCAST considered validity “proven” for mixtures containing “three contributors where the person of interest comprises at least twenty percent of the sample,” and a 2018 study of thirty-one laboratories found that analysts correctly identify three-person mixtures ninety-eight percent of the time. Ex. 7 (Jo-Anne Bright et al., Internal validation of STRmix™ - A multi laboratory response to PCAST, 34 Forensic Science International: Genetics 11, 21 (2018)). The PCAST Report predicted that “[t]he range in which foundational validity has been established is likely to grow as adequate evidence for more complex mixtures is obtained and published.” Ex. Q at 82. In response to the PCAST Report, the developers of STRmix published a study based upon the laboratory work of thirty-one forensic laboratories which provided internal validation of STRmix for mixtures up to five contributors when the person of interest contributed at least twenty percent of the DNA and sufficient overall sample exists. Ex. 7 at 11-24. To date, there have not been any validation studies establishing the reliability of STRmix testing of mixtures containing the DNA of six contributors.

In United States v. Lewis, as predicted by the PCAST report, the court extended STRmix's foundational validity beyond the limits PCAST identified in 2016. 442 F.Supp.3d 1122, 1129 (D. Minn. 2020). In doing so, it conducted a detailed and complete review of the STRmix validation study, along with a second validation study by the FBI involving between two and five contributors in varying proportions, and found the studies established foundational validity for complex mixtures up to four contributors. Id.

1. Determining the Number of the Contributors

Here, Mr. Ortiz does not challenge the foundational validity for mixtures up to five contributors or the computer algorithms and biological models that undergird STRmix's probabilistic analysis. Instead, he challenges the process by which STRmix was applied to a complex DNA mixture that likely contained six contributors, given that STRmix had not been subjected to developmental validation for six-person mixtures by the developer or internal validation by the SDPDCL.

The Government argues that it has shown the validity of the NOC determination of five contributors and the DNA analysis through Mr. Dutra's testimony. Mr. Dutra testified that NOC is a judgment call informed by an analyst's “professional judgment and expertise.” Transcript at 121. In other words, NOC is an analyst's best guess.

The Government made this argument at the hearing on April 26, 2024. The Government's briefing, by contrast, spends little to no time defending the reliability of the NOC determination.

This is because, in “real-life situations,” one can “never know the amount of contributors.” Transcript at 121, 471. While there were indications that the sample contained the DNA of six contributors, Mr. Dutra opined, “I can't say for certain it's not a six-person mixture. I would say the best estimate that I have is that it's a five-person mixture.” Id. at 578. Mr. Dutra did not test the sample for a six-person mixture because the SDPDCL had not validated STRmix with six-person mixtures. Moreover, there is no developmental validation of STRmix for the testing of a six-person mixture.

At the hearing, it was established that NOC accuracy flounders as the complexity of a mixture rises. As SWGDAM recognizes, “[f]or mixtures in which minor contributors are determined to be present, a peak in stutter position may be determined to be . . . indistinguishable as being either an allelic and/or stutter peak.” Ex. 13 (SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories at 32). Mr. Dutra testified that “[t]he number of opportunities for overlapping is greater with the number of peaks and the number of contributors. So the more peaks, the more stutter; the more stutter, the more opportunity to have overlap of alleles and stutter and other factors.” Transcript at 541. Mr. Dutra further recognized that similar issues attend the use of peak height expectations, noting that peak height is not an “end-all/be-all,” and “doesn't cure the deficiencies that exist” regarding allele counting. Id. at 601. While these strategies may provide an analyst with some guidance, the analyst's final decision remains a subjective attempt to distinguish between real and artifactual peaks. Moreover, these strategies do not provide an analyst with the means to address allele sharing or allele dropout, two phenomena that further complicate the NOC analysis.

These weaknesses and concerns are not merely theoretical; they impact NOC accuracy in practice. The 2018 Bright paper studied 2,825 mixtures compiled from 31 laboratories, including the SDPDCL. Ex. 7 at 12. The analysts at these laboratories informed their NOC estimate with peak-height ratios and stutter thresholds. Transcript at 551; see also Transcript at 454 (Mr. Dutra: “[Y]ou [can] look at additional information beyond allele count, that can provide further information that can help support your number of contributor assessment.”). But the analysts nonetheless mischaracterized sixty-four percent of five-person samples and one hundred percent of six-person samples. Ex. 7 at 21. Similar inaccuracy was observed in the Hicklin paper, which surveyed 134 analysts representing 67 laboratories and concluded that only five percent of analysts correctly identified the five-person mixtures and zero percent correctly identified the six-person mixture. Transcript at 169. These error rates demonstrate to the Court that stutter thresholds and peak-height expectations do little to address the many ambiguities inherent to an analyst's NOC estimate.

Some analysts provided a range which included the possibility of a NOC of six.

The SDPDCL analyst was not asked to consider any six-person mixtures.

The lesson taught by the numerous studies submitted to the Court is that in complex mixtures involving five or more contributors, NOC is usually underestimated. That is, the mixtures are mistaken by analysts as being less complicated than they actually are. The Hicklin paper observed that “[o]verestimations almost always occurred on 2-3 person mixtures, whereas all underestimations occurred on 4-5 person mixtures.” Ex. M (parentheticals omitted). This was true, too, in the Bright paper, where every mischaracterization of five- and six-person mixtures was an underestimation. So, too, in the SDPDCL's internal validation study. The trend of the papers suggests overwhelmingly, then, that NOC in the present mixture was determined erroneously, and that it was underestimated.

Dr. Krane testified that at several loci, the electropherogram displayed twelve peaks, which indicates a NOC of at least six. He concluded that the peaks were not the result of simple machine fluctuation. Even the small peaks “ha[d] the same relative height to width . . . as the peaks that [we]re above 100 RFU” and “ha[d] all the appearance of a true peak.” Transcript at 344. Mr. Dutra disagreed, testifying that in his judgment, one of the twelve peaks was an artifactual product of stutter, and the other an artifactual product of “spectral pull-up.”

Even if Mr. Dutra correctly concluded that the electropherogram displayed only ten peaks, allelic sharing suggests that those peaks underrepresented the true number of alleles in the sample. Allelic sharing occurs where two or more contributors to a sample share at least one allele. Allele sharing is challenging for analysts because shared alleles overlap on the electropherogram, creating the impression that only one major contributor is present when in fact two or more lesser contributors have merely overlapped. This risk intensifies in six-person samples, so much so that a 2015 study of “tens of billions of simulated mixtures” found that, due to allele sharing, eighty-six percent of six-person mixtures present as five or fewer. Transcript at 280; see also Ex. L (Michael D. Coble et al., Uncertainty in the number of contributors in the proposed new CODIS set, 19 Forensic Sci. Int'l: Genetics 207, 209 (2015)).
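
A toy version of the kind of simulation the Coble study reports can be sketched as follows; the uniform allele frequencies, the fixed number of alleles per locus, and the locus count are simplifying assumptions, not the CODIS loci and population frequency data the published study used:

```python
import math
import random

def apparent_noc(true_contributors: int, loci: int = 21,
                 alleles_per_locus: int = 12) -> int:
    """Simulate one mixture and return the NOC implied by naive allele
    counting: the maximum over loci of ceil(distinct alleles / 2)."""
    implied = 1
    for _ in range(loci):
        observed = set()
        for _ in range(true_contributors):
            # Each contributor adds two alleles, one inherited from each parent.
            observed.add(random.randrange(alleles_per_locus))
            observed.add(random.randrange(alleles_per_locus))
        implied = max(implied, math.ceil(len(observed) / 2))
    return implied

# Fraction of simulated six-person mixtures that present as five or fewer
# contributors under naive allele counting.
trials = 10_000
undercounted = sum(apparent_noc(6) < 6 for _ in range(trials))
print(undercounted / trials)
```

Because shared alleles collapse into a single peak, a naive count of distinct alleles routinely understates the true number of contributors; the published study, using real loci and population allele frequencies, quantified that effect at eighty-six percent of six-person mixtures.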

Dr. Krane noted that the risk of mischaracterization was worse for African Americans and Hispanics. Transcript at 294.

Those risks are heightened here, where evidence in the record demonstrates that the sample may have contained DNA from related individuals. Specifically, the gun was found in a car occupied by Jasmine Canchola and her mother. Alleles are inherited, and thus the presence of a parent and child in a sample produces at least one instance of allele sharing. Transcript at 577. A sample that includes a parent and child and displays twelve peaks, then, likely hides a thirteenth, suggesting a NOC of at least seven. Id. Critically, here, Mr. Dutra performed his NOC analysis without ever being told about the occupants in the car or the potential for related contributors. Transcript at 153. He assumed the sample included Mr. Ortiz and “four unknown and unrelated co-contributors.” Transcript at 448. In United States v. Gissantaner, 990 F.3d 457, 461 (6th Cir. 2021), the court observed that “[v]isual inspection runs the risk of cognitive biases too. Studies suggest that an examiner's knowledge of the case-other evidence about the suspect-affects interpretations, frequently not in the suspect's favor.” Unlike in Gissantaner, here the lack of knowledge of the related occupants prevented a full consideration of salient details needed to arrive at an accurate NOC. This matters because even if the electropherogram displayed only ten peaks, as Mr. Dutra concluded, the presence of a mother-daughter pair suggests that an eleventh peak may have been obscured, which would have indicated a NOC of at least six.

The facts of this case also demonstrate a significant risk that certain allelic information was simply not captured. This phenomenon, known as allelic dropout, occurs when, due to the “small amounts of material being sampled,” an allele that “was present in the sample at the start of the process, because of a random effect, some sort of sampling error typically, [i]s not detected.” Transcript at 244. Here, the DNA sample collected from the handgun was small, just 450 picograms, already below the recommended 500 picograms. Transcript at 319. That amount was further reduced in subsequent testing. Mr. Dutra first ran the sample for fifteen seconds. Noting that “there were possible DNA types that were below [the] analytical threshold,” Mr. Dutra ran the sample again as a twenty-four second injection. Transcript at 186. He then re-injected the sample once more as a twenty-four second injection. This third injection was the one that Mr. Dutra “specifically used for the number-of-contributors assessment.” Transcript at 587. Each injection removed approximately fifteen percent of the DNA that was in the sample. Transcript at 328. This led to “a reduction” in allelic peaks and a reduction in STRmix's ability to detect all relevant alleles.
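
As an arithmetic illustration only, and assuming each injection consumed roughly fifteen percent of the DNA then remaining in the extract, the three injections would have compounded as follows:

```latex
% Illustrative compounding of three injections, each assumed to consume
% roughly 15% of the remaining extract; the testimony gives only the
% approximate per-injection figure, not the compounding rule.
450\ \text{pg} \times 0.85^{3} \approx 276\ \text{pg}
```

On that assumption, roughly 276 picograms of an already sub-threshold extract would remain after the three injections, barely more than half of the recommended 500-picogram input.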

Mr. Dutra attempted to run STRmix with this twenty-four second injection, but the computer crashed. Transcript at 588. The parties disagree as to why the program crashed, but neither produced a software expert to testify as to the matter.

In general, disputes about the accuracy, reliability, or validity of a testing method “provide[] grist for adversarial examination, not grounds for exclusion.” Gissantaner, 990 F.3d at 464. A claim that scientific methods are unsound must be addressed initially by the trial judge, while a claim that scientifically sound methods have been applied improperly ordinarily should be left for the jury to resolve unless the alleged “error negates the basis for the reliability of the principle itself.” United States v. Martinez, 3 F.3d 1191, 1198 (8th Cir. 1993), cert. denied, 510 U.S. 1062 (1994).

Here, the alleged error involves the NOC determination. The court in United States v. Williams observed that NOC is a “foundational part of every calculation Bullet performs. If that input is in doubt, the reliability of the entire analysis is necessarily in doubt.” 382 F.Supp.3d 928, 937 (N.D. Cal. 2019). Williams involved a nearly identical challenge to a probabilistic genotyping software program called Bullet. Id. Bullet had been validated for four-person mixtures but not five, and the defendant argued that the relevant mixture contained five or more contributors. Id. at 936. The court concluded that there was “simply not enough evidence to conclude reliably that this mixture is a four-person mixture.” Id. at 937. As such, Bullet was not validated to perform the analysis on the multi-contributor sample.

In this case, Mr. Dutra's NOC determination is placed in doubt given the twelve peaks identified by Dr. Krane and a small DNA sample that was further depleted by repeated testing, increasing the likelihood of allelic dropout. In addition, the likely stochastic effects of allele sharing were not taken into account because Mr. Dutra was unaware that there were two related passengers in close proximity to the firearm in the vehicle.

Meanwhile, the Government argues that the court's treatment of NOC in United States v. Lewis, 442 F.Supp.3d 1122 (D. Minn. 2020), is “instructive given the parallels to the current case.” ECF No. 154 at 3. As previously discussed, the Lewis court found that STRmix was properly validated to analyze a four-person mixture. The Government points out that the court in Lewis found STRmix reliable even though the lab had “NOC error rates of 55% to 65%” and that lab was “incapable of processing 5-person mixtures.” Id. However, in Lewis, there was little indication, aside from validation study error rates in general, that the NOC for the sample had been underestimated, whereas here those indications for the particular sample are numerous.

The Government lists without much discussion a number of cases that it argues stand for the proposition that NOC determination is a fact issue for the jury. But the Court's gatekeeping function under Rule 702 is predicated upon “the facts presented in this litigation.” Gen. Elec. Co. v. Joiner, 522 U.S. 136, 144 (1997). Thus, the persuasiveness of the cases cited depends upon their factual similarity, and none of the cited cases involved facts similar to the instant case, including mixtures of six contributors or the high error rates that characterize NOC determinations of mixtures with six or more contributors. See United States v. Barton, No. 8:14-CR-496-T-17AEP, 2016 WL 11469438, at *7 (M.D. Fla. Sept. 10, 2016) (three-person mixture); People v. Burrus, 200 N.Y.S.3d 655, 730 (Sup. Ct. 2023) (two or three); State v. Warner, No. A-15-858, 2016 WL 4443559, at *5 (Neb. Ct. App. Aug. 23, 2016) (two and three); People v. Debraux, 21 N.Y.S.3d 535, 542 (N.Y. Sup. Ct. 2015) (same); United States v. Morgan, 53 F.Supp.3d 732, 746 (S.D.N.Y. 2014), aff'd, 675 Fed.Appx. 53 (2d Cir. 2017) (two or three); People v. Davis, 75 Cal.App. 5th 694, 722 (2022) (three).

The facts of this specific case-including the presence of twelve peaks, the heightened risk of related contributors, the severely depleted DNA sample, and the failure to account for related individuals in close proximity to the firearm-indicate a significant likelihood that NOC was six or greater. The Court concludes that the Government has failed to demonstrate that it reliably determined that STRmix performed an analysis that it was validated to perform.

2. STRmix Analysis of Mixtures Involving Six or More Contributors

The Government argues, in the alternative, that even if NOC was determined inaccurately, STRmix's overall analysis remains reliable for six-person mixtures. It argues that errors in NOC simply cause STRmix to be more conservative in its analysis, reducing the risk of falsely accusing an innocent non-contributor. In support of this argument, the Government directs this Court to three studies: the 2017 Moretti Paper (Ex. 8), the 2018 Bright Paper (Ex. 7), and the SDPDCL's 2016 Internal Validation (Ex. 14). None of the studies purport to establish foundational validity as to STRmix's analysis of six-person mixtures. The Court recites the relevant findings.

The 2017 Moretti Paper detailed the results of the FBI laboratory's internal validation of STRmix, including a discussion of the FBI's attempts to study the effect of an erroneous NOC determination upon STRmix's overall accuracy. See Ex. 8. It did not involve testing of six-person mixtures. The FBI purposefully overestimated NOC for twenty-seven samples of “one, two and three-person profiles” and purposefully underestimated NOC for three samples of three-person profiles. Id. at 140. The FBI found that overestimation slightly reduced the LR for actual contributors, resulting in a decrease of more than one order of magnitude in only thirteen percent of the samples. Id. Overestimation also increased the LR for non-contributors, though “at relatively low contributor amounts and LRs . . . barely in inclusionary territory.” ECF No. 160 at 10. On the other hand, underestimation of the three three-person mixtures revealed no effect. Ex. 8 at 141.

The Government's Supplemental Opposition incorrectly attributes portions of the 2018 Bright paper to the 2017 Moretti paper. See ECF No. 160 at 10.

The 2018 Bright paper observed a similar trend in its analysis of samples consisting of three or more contributors, which included 182 five-person samples and 65 six-person samples. See Ex. 7. As previously mentioned, every six-person sample was underestimated. Id. at 21. Aside from their unintentional NOC errors, the participating laboratories intentionally overestimated certain three- and four-person mixtures in order to study the reliability of STRmix while using inaccurate NOC determinations. Id. at 19. The paper observed that overestimation of NOC “generally leads to lower LRs for true contributors and an increase in LRs for non-contributors,” while underestimation “can result in false exclusions of true donors.” Id. at 22. Only overestimation, the paper observed, creates the risk of falsely including a non-contributor. With regard to “true donors, you are either correct or conservative when N[OC] is either under or overestimated.” Id. at 22.

Finally, the Government relies upon the SDPDCL's 2018 validation study for the proposition that underestimation of NOC for five-person mixtures increased the “potential for false inclusion” but only at “lower LRs, the highest being 539, which would be described as moderate support for inclusion.” ECF No. 160 at 13-14.

Although none of the studies analyzed the effect of an inaccurate NOC determination on six-person mixtures, the Government argues that this Court can extrapolate. The Government argues that any errors produced by an inaccurate NOC tend to favor a defendant, “because even in these situations, the program[']s conservative analysis skewed towards exclusion rather than inclusion.” ECF No. 160 at 7. The Government further argues that when errors resulting from inaccurate NOC determinations falsely include non-contributors, these errors are mild and limited to lower LRs. Thus, the Government concludes, the 542,000,000 LR produced in this case is not the sort of anomaly ordinarily attributable to an error in NOC and is instead the sort of reliable result produced in spite of such errors.

Though an expert's opinion may be grounded in “extrapolat[ions] from existing data,” the Court may decline to admit such opinion evidence where “there is simply too great an analytical gap between the data and the opinion proffered.” Joiner, 522 U.S. at 146 (citing Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1360 (6th Cir. 1992)). The Court is not persuaded that extrapolation is justified here.

As an initial matter, none of the studies cited by the Government attempted to analyze or extrapolate as to the effect of an inaccurate NOC on six-person mixtures. The 2017 Moretti Paper was designed to analyze NOC errors in one-, two-, and three-person mixtures. The 2018 Bright Paper was designed to analyze NOC overestimation in three- and four-person mixtures. And the SDPDCL's internal validation study only analyzed five-person mixtures. None of the papers suggested that their results could be extrapolated to mixtures of greater complexity. In fact, the 2018 Bright Paper alluded to the dangers of such extrapolation, noting that three- and four-person mixtures respond differently to inaccuracies in NOC. Ex. 7 at 21.

As Dr. Krane explains, “smaller number of contributor mixtures do not behave similarly to higher number of contributor mixtures across a variety of tests.” Transcript at 308. Six-person samples require STRmix to crunch exponentially more possibilities, requiring so much more computing power that laboratories like the SDPDCL have never even attempted to run six-person samples. Indeed, in Williams, attempts to run the probabilistic genotyping software at a NOC of five, for which it had not been validated, resulted in a new LR that was substantially lower than the one produced by running the sample at a NOC of four. 382 F.Supp.3d at 935. And while the 2018 Bright paper's study of three- and four-person samples resulted in false inclusion only where NOC was overestimated, the SDPDCL's internal validation study of five-person samples contradicted this trend, producing several false inclusions where NOC was underestimated.

The inability to test the Government's theory further dissuades this Court from adopting its argument. The record demonstrates that most laboratories, including the SDPDCL, currently lack the capacity to run STRmix on six-person samples. Accordingly, the record contains no evidence suggesting that the results of a successful six-person analysis would be similar to those obtained from a four- or five-person sample. Recognition of this lack of evidence appears to be reflected in the scientific community's measured approach to validation. Laboratories progressively validate STRmix at each NOC level; they do not simply analyze complex mixtures beyond their analytical capacity as if they were lower-contributor mixtures.

Indeed, the Court finds it persuasive that the SDPDCL's own practices do not appear to reflect the Government's position. The Government's argument is that NOC does not matter, that even if NOC was determined incorrectly, even if the handgun swab actually contained a six-person mixture, STRmix's analysis of the sample as a five-person mixture is reliable. If this were true, or if this were accepted as true by the SDPDCL, then the SDPDCL would have no issue running six-person samples as five-person samples. Under this practice, it might avoid altogether the limitations in computing power that prevent it from analyzing mixtures with a NOC of six or more. But this is not the SDPDCL's policy. When asked about the SDPDCL's approach to six-person mixtures, Mr. Dutra responded that the protocol is to “deem [the six-person sample] as being unsuitable for comparisons,” a quote taken “right out of the technical manual.” Transcript at 170. To the extent that Mr. Dutra and the SDPDCL do not practice what the Government proposes, this would weigh against the Government's position.

The practice of construing six-person mixtures as four- or five-person mixtures would not be difficult. As previously noted, “the probability of a six contributor profile appearing as five or fewer contributors” can be as high as ninety-nine percent. Ex. L (Michael D. Coble et al., Uncertainty in the number of contributors in the proposed new CODIS set, 19 Forensic Science International: Genetics 207, 209 (2015)).

As PCAST concluded:

These probabilistic genotyping software programs clearly represent a major improvement over purely subjective interpretation. However, they still require careful scrutiny to determine (1) whether the methods are scientifically valid, including defining the limitations on their reliability (that is, the circumstances in which they may yield unreliable results) and (2) whether the software correctly implements the methods. This is particularly important because the programs employ different mathematical algorithms and can yield different results from the same mixture profile.
Id. at 79.

In this case, the limits on reliability for STRmix testing have been established through developmental and internal validation studies at five-contributor mixtures. Meanwhile, there has not been general acceptance or peer review approval of STRmix for six-person mixtures. And to the extent that the Government argues that the Court may infer reliability from the conclusions in this case-that STRmix was reliable, here, because when STRmix errs, it does not output “significant contributor ratios (over 10%) and likelihood ratios of 10⁴ or greater”-such an argument is at odds with the Supreme Court's instruction in Daubert that the focus of the Rule 702 inquiry “must be solely on principles and methodology, not on the conclusions that they generate.” 509 U.S. 579, 595 (1993).

Accordingly, the Court concludes that the Government has failed to demonstrate that STRmix's analysis remains reliable for six-person samples.

3. Rule 403 Analysis

In the alternative, even if the DNA analysis were not excluded under Rule 702, the Court would exclude it under Rule 403. Rule 403 permits the exclusion of evidence if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury. See United States v. Layton, 767 F.2d 549, 553 (9th Cir. 1985). On one hand, the Ninth Circuit has observed that “statistical evidence derived from sample processing and match analysis, properly documented and performed in compliance with established, peer-reviewed laboratory protocols, is certainly probative of the defendant's guilt or innocence.” United States v. Chischilly, 30 F.3d 1144, 1158 (9th Cir. 1994), overruled by United States v. Preston, 751 F.3d 1008 (9th Cir. 2014). On the other, the Daubert court cited with approval the views of Judge Weinstein, who recognized that: “Expert evidence can be both powerful and quite misleading because of the difficulty in evaluating it. Because of this risk, the judge in weighing possible prejudice against probative force under Rule 403 of the present rules exercises more control over experts than over lay witnesses.” 509 U.S. at 588-89 (quoting Jack B. Weinstein, Rule 702 of the Federal Rules of Evidence Is Sound; It Should Not Be Amended, 138 F.R.D. 631, 632 (1991)); see also John W. Strong, Language and Logic in Expert Testimony: Limiting Expert Testimony by Restrictions of Function, Reliability, and Form, 71 Or. L. Rev. 349, 367 n.81 (1992) (first citing United States v. Addison, 498 F.2d 741, 744 (D.C. Cir. 1974); then citing Reed v. State, 391 A.2d 364, 370 (Md. 1978); and then citing Paul C. Giannelli, The Admissibility of Novel Scientific Evidence: Frye v. United States, a Half Century Later, 80 Colum. L. Rev. 1197, 1237 (1980)) (“There is virtual unanimity among courts and commentators that evidence perceived by jurors to be ‘scientific' in nature will have particularly persuasive effect.”).

As to probative value, if the jury were to find that there was a five-person mixture tested with STRmix in compliance with established and validated laboratory protocols, the evidence would have great probative value and would assist the factfinder in their determination as to whether Mr. Ortiz possessed a firearm. Meanwhile, if the jury were to find that the mixture involved six persons, the evidence would not be reliable and, as a result, would create a considerable likelihood of unfair prejudice and would mislead the jury. Generally, where reasonable experts disagree, it is up to the jury to determine which expert to believe. Kumho Tire Co. v. Carmichael, 526 U.S. 137, 153 (1999). However, in this case, there are two problems. First, in arriving at their opinions, neither expert took into account the presence of two related individuals in close proximity to where the gun was found. As a result, it will be up to the jury, unguided by the experts, to determine what effect these omitted facts have on the NOC determination. Not only is NOC analysis complex, but the failure of Mr. Dutra to account for a salient factor will invite the jury to guess as to what effect to give these additional facts. In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 747 (3d Cir. 1994) (“[F]or a district court to exclude scientific evidence, there must be something particularly confusing about the scientific evidence at issue-something other than the general complexity of scientific evidence.”). Second, to the extent that members of the jury find that there are six or more contributors, the power of DNA testimony will likely make it difficult for them to disregard the probabilistic testimony as it applies to a five-person mixture.

“If highly consequential evidence emerges from what looks like an indecipherable computer program to most non-scientists, non-statisticians, and non-programmers, it is imperative that qualified individuals explain how the program works and ensure that it produces reliable information about the case.” Gissantaner, 990 F.3d at 463.

The formulaic view that the evidence should be admitted and subjected to vigorous cross-examination does not account, here, for the complicated nature of the evidence and the fact that neither Mr. Dutra nor Dr. Krane, at any time before or after determining NOC, considered the likelihood of allelic stacking arising from two close relatives being in close proximity to the firearm. In sum, balancing the various Rule 403 considerations, the Court finds that even if the evidence were not excluded under Rule 702, the probative value of the DNA analysis is substantially outweighed by the likelihood of undue prejudice. The evidence is therefore excluded under Rule 403 as well.

CONCLUSION

Because the Court concludes that the DNA analysis is inadmissible under both Rule 702 and Rule 403, the Motion to Exclude the STRmix testimony is GRANTED.

IT IS SO ORDERED.

