Opinion
Supreme Court Nos. S-16191/16193/16214/16449 (Consolidated)
01-04-2019
Diane L. Wendlandt, Assistant Attorney General, Office of Criminal Appeals, Anchorage, and Jahna Lindemuth, Attorney General, Juneau, for Petitioner and Cross-Respondent and Appellee State of Alaska. Sharon Barr, Assistant Public Defender, and Quinlan Steiner, Public Defender, Anchorage, for Respondents and Cross-Petitioners Sharpe and Alexander. Brooke Berens, Assistant Public Advocate, and Richard Allen, Public Advocate, Anchorage, for Appellant Holt. Gordon L. Vaughan, Vaughan & DeMuro, Colorado Springs, Colorado, and Gavin Kentch, Law Office of Gavin Kentch, LLC, Anchorage, for Amicus Curiae American Polygraph Association.
Before: Stowers, Chief Justice, Winfree, Maassen, Bolger, and Carney, Justices.
OPINION
STOWERS, Chief Justice.
I. INTRODUCTION
In each of the three underlying criminal cases in this consolidated appeal, the defendant sought to introduce expert testimony by a polygraph examiner that the defendant was truthful when he made exculpatory statements relating to the charges against him during a polygraph examination conducted using the "comparison question technique" (CQT). In two of the cases, the superior courts found that testimony based on a CQT polygraph examination satisfied the requirements for scientific evidence under Daubert v. Merrell Dow Pharmaceuticals, Inc. and State v. Coon. In the third case, the superior court reached the opposite conclusion and found the evidence inadmissible. We are now asked to revisit the appellate standard of review for rulings on the admissibility of scientific evidence and to determine the admissibility of CQT polygraph evidence.
509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
974 P.2d 386 (Alaska 1999).
We conclude that appellate review of Daubert / Coon determinations should be conducted under a hybrid standard: the superior court’s preliminary factual determinations are reviewed for clear error; based on those findings and the evidence available, whether a particular scientific theory or technique has been shown to be "scientifically valid" under Daubert and Coon is a question of law to which we apply our independent judgment; and where proposed scientific evidence passes muster under that standard, the superior court’s case-specific determinations and further evidentiary rulings are reviewed for abuse of discretion. Applying this standard here, we conclude that CQT polygraph evidence has not been shown to be sufficiently reliable to satisfy the Daubert / Coon standard.
II. BACKGROUND
A. State v. Alexander
Thomas Alexander was charged with multiple counts of sexual abuse of a minor. Before trial, Alexander hired David Raskin, Ph.D., a polygraph examiner, to administer a CQT polygraph examination. Based on the polygraph results, Dr. Raskin concluded that Alexander answered truthfully when he denied committing the acts with which he was charged. At Alexander’s request, Superior Court Judge Gregory Miller held an evidentiary hearing to address the admissibility of the polygraph results. For the purpose of that hearing, Alexander’s case was consolidated with an unrelated criminal case pending before Superior Court Judge pro tem Daniel Schally because the two cases involved similar polygraph testimony by the same polygraph examiner, Dr. Raskin. The two judges held a joint evidentiary hearing over the course of two days, spanning more than ten hours of testimony. Dr. Raskin testified for the defense in support of admitting testimony about the polygraph results, while William Iacono, Ph.D., a research psychologist at the University of Minnesota, testified for the State in opposition. Both sides also submitted copious evidence in the form of declarations by the two experts, scientific studies, treatises, etc.
The other defendant later pleaded guilty to the charged offense and is not a party on appeal.
The judges issued a joint order for both cases concluding that CQT polygraph testing satisfies the Daubert / Coon requirements for scientific validity. The judges also concluded that the proposed testimony was not otherwise excluded by the Alaska Rules of Evidence relating to relevance, unfair prejudice, credibility bolstering, expert testimony, or hearsay. Their order held that the polygraph evidence would be admissible, but on the condition that the defendants first testified at their respective trials and subjected themselves to cross-examination. Their ruling was also premised on each defendant agreeing to sit for a second polygraph test administered by the State, which the judges reasoned would mitigate concerns relating to possible bias by a "friendly" examiner and add additional "guarantees of trustworthiness."
The "friendly examiner" bias hypothesis was explored at the evidentiary hearing. The hypothesis posits that when a polygraph examiner is hired by the defense and the test is administered to the defendant without giving the prosecution notice or an opportunity to observe, various factors might work together to bias the examination in ways favorable to the defendant "passing" the test. The validity of this hypothesis and the extent to which a "friendly" examiner might affect the results of a polygraph examination are disputed. See Paul C. Giannelli et al., 1 Scientific Evidence § 8.03[f], at 460 (5th ed. 2012).
It appears the superior court was under the belief that Alexander had already been subjected to a polygraph examination administered by the Department of Corrections. It was later clarified that no such test had taken place, but Alexander did agree to sit for a State-administered exam. The parties appear to have proceeded with the understanding that doing so was a prerequisite for admitting the polygraph evidence.
B. State v. Sharpe
In a case unrelated to Alexander’s, Jyzyk Sharpe was charged with murder and manslaughter in connection with the death of his girlfriend’s two-year-old son. Sharpe also hired Dr. Raskin to administer a polygraph examination, after which Dr. Raskin concluded that Sharpe answered truthfully when he denied the charges against him.
Before trial, the State moved to preclude Sharpe’s polygraph evidence and Dr. Raskin’s testimony. As in Alexander’s case, the State argued that polygraph examinations are not supported by valid science and that additional accuracy problems are presented in the case of a "friendly" polygraph examiner. For those reasons, the State argued that the polygraph testimony should be excluded under Alaska Evidence Rule 403 because its probative value would be outweighed by risks of unfair prejudice, confusion, delay, and wasted time. The State also argued that the proposed testimony included inadmissible hearsay, that the testimony was inadmissible as expert testimony under Daubert / Coon and under the Alaska Rules of Evidence, and that the testimony was inadmissible character evidence under Evidence Rule 608.
No new Daubert / Coon hearing was held; instead, Superior Court Judge Eric Smith relied on the record and evidence presented in Alexander’s Daubert / Coon evidentiary hearing. The superior court held that the testimony would be admissible pursuant to the same reasoning as in that case. However, the court added the additional limiting instruction that the polygraph examiners—Dr. Raskin and the State’s examiner—could testify only to whether Sharpe "believed what he was saying" and not to whether he was "telling the truth"; the court reasoned that the latter would impermissibly imply that a polygraph test can reveal whether a statement is objectively accurate.
During a second polygraph test, administered for the State by former FBI agent Kendall Shull, Sharpe prematurely terminated the examination when Shull asked Sharpe if he was using countermeasures against the polygraph test. The State asked the court to reconsider the admissibility of Dr. Raskin’s testimony based on Sharpe’s lack of cooperation with the second examination. The court ultimately reaffirmed its original decision, ruling that Dr. Raskin’s testimony was admissible but that the State could present evidence of Sharpe’s lack of cooperation in rebuttal.
The term "countermeasures" refers to conscious efforts by an examinee to manipulate the results of a polygraph examination by altering the physiological indicators measured by the polygraph. Classes of countermeasures include using drugs or alcohol to suppress responses to questions; physical techniques such as breath control, biting one’s tongue, or contracting various muscles to create artificial responses; or mental techniques such as disassociation or counting backward to either suppress or create responses. See generally Giannelli et al., supra note 4 § 8.03[d], at 458-59; Nat’l Research Council, The Polygraph and Lie Detection 4-5, 139-48 (2003), https://doi.org/10.17226/10420.
C. State v. Holt
Jeffery Holt was charged with five counts of first-degree sexual assault. Before trial, Holt hired Dr. Raskin to administer a polygraph examination, after which Dr. Raskin concluded Holt was being truthful when he denied the charges on the grounds that the alleged victim consented to sexual activity. In lieu of a Daubert / Coon hearing, both parties suggested and the court agreed it could determine the admissibility of Dr. Raskin’s testimony by reviewing the record of the hearing and subsequent order in Alexander’s case. The parties also submitted additional scholarly articles on polygraph testing, an audio recording of Holt’s polygraph examination, the raw data from that examination, and the prosecutor’s recorded interview of Dr. Raskin about the procedure used in that examination.
Superior Court Judge Charles Huguelet reviewed the evidence from Alexander’s case, heard oral argument, and then concluded that polygraph evidence is not sufficiently reliable to be admitted. The court further concluded that Dr. Raskin’s testimony would in any case be inadmissible under the evidence rules governing character evidence, bolstering, and prior consistent statements, as well as under the Rule 403 balancing test. After a jury trial, Holt was convicted of one count of first-degree sexual assault and four counts of second-degree sexual assault; he was sentenced to 28 years imprisonment with 8 suspended.
D. Proceedings In The Court Of Appeals
In Alexander’s case, the State filed a petition for review to the court of appeals challenging the conclusion that the proposed polygraph testimony was admissible; Alexander filed a cross-petition challenging the conditions that he agree to testify and agree to submit to a State-administered polygraph exam. In its decision, the court of appeals observed that in accordance with our opinion in Coon , determinations regarding the validity of scientific evidence are reviewed on appeal only for abuse of discretion. The court expressed concern about applying such a deferential standard and suggested that this court should revisit Coon and adopt a more probing standard of review. The court explained:
State v. Alexander, 364 P.3d 458, 460 (Alaska App. 2015).
Id. at 466.
Id. at 466, 468.
As it happened, [Judges Miller and Schally] reached the same conclusion regarding the scientific validity of polygraph examinations. But, as illustrated by the competing testimony offered by Dr. Raskin and Dr. Iacono, this is clearly a matter on which reasonable people can differ—and on which they do differ.
Thus, the two judges in this case might easily have reached differing conclusions regarding the scientific validity of polygraph examinations, even though they heard exactly the same evidence. And if the two judges had reached different conclusions, we apparently would have been required to affirm both of the conflicting decisions under the "abuse of discretion" standard of review.
....
This essentially means that the scientific validity of polygraph evidence will never be judicially resolved at an appellate level: it will remain an open question, and it will need to be litigated anew each time the issue is raised.
Id. (emphasis in original).
Ultimately, applying the abuse of discretion standard of review, the court of appeals affirmed the order admitting Dr. Raskin’s testimony. The court also upheld the conditions on admissibility imposed by the superior court.
Id. at 471.
Id.
In Sharpe’s case, the State again filed a petition for review challenging the ruling admitting Dr. Raskin’s testimony; the court of appeals denied the petition based on its ruling in Alexander.
The State filed petitions for hearing to this court in both cases; Alexander and Sharpe filed a joint cross-petition challenging the requirement that they agree to testify before their respective polygraph evidence could be admitted. We granted all three petitions and consolidated the cases for briefing.
Sharpe and Alexander are no longer challenging the requirement that they submit to a state-administered polygraph exam if requested to do so.
Holt appealed his convictions and his sentence to the court of appeals. One of Holt’s grounds for appeal was Judge Huguelet’s order excluding Dr. Raskin’s testimony. The court of appeals reasoned that the polygraph issue in Holt’s case was the same as the one in State v. Alexander, and that the trial court’s decision "present[ed] the very problem that [the court] noted when [it] decided Alexander: the problem that reasonable judges who heard exactly the same evidence concerning polygraph testing could rationally reach differing conclusions as to whether polygraph evidence meets the Daubert test for admission." Because we had already granted review of Alexander’s and Sharpe’s cases, the court of appeals severed Holt’s polygraph question and certified it to this court, again asking us to revisit the applicable standard of review. We accepted certification and consolidated Holt’s case with Sharpe’s and Alexander’s.
We are not presented with the other issues and arguments raised in Holt’s initial appeal to the court of appeals, and we do not address them.
III. STANDARD OF REVIEW
Broadly speaking, we review the admission or exclusion of evidence for abuse of discretion. But whether the trial court applied the correct legal rule is a question of law subject to de novo review. Similarly, "[w]hen the admissibility of evidence ‘turns on ... the correct scope or interpretation of a rule of evidence, we apply our independent judgment.’ " Findings of fact underlying a judgment of the superior court are reviewed for clear error, which we will find "if a review of the entire record leaves us with a definite and firm conviction that a mistake has been made."
Timothy W. v. Julia M., 403 P.3d 1095, 1100 (Alaska 2017) (citing State v. Carpenter, 171 P.3d 41, 63 (Alaska 2007)).
Id. (citing Carpenter, 171 P.3d at 63).
Sanders v. State, 364 P.3d 412, 419-20 (Alaska 2015) (cleaned up) (quoting Barton v. N. Slope Borough Sch. Dist., 268 P.3d 346, 350 (Alaska 2012)).
Kiva O. v. State, Dep’t of Health & Soc. Servs., Office of Children’s Servs., 408 P.3d 1181, 1186 (Alaska 2018) (quoting Bigley v. Alaska Psychiatric Inst., 208 P.3d 168, 178 (Alaska 2009)). We have not previously stated explicitly what standard of review applies to findings of fact preliminary to evidentiary rulings. However, under Alaska Evidence Rule 104(b), "[w]hen the relevancy of evidence depends upon the fulfillment of a condition of fact, the court shall admit it upon, or subject to, the introduction of evidence sufficient to support a finding of the fulfillment of the condition." Thus, the relevant question on appeal is whether there is sufficient evidence in the record to support the necessary factual finding, i.e., whether that finding is clearly erroneous.
In State v. Coon we addressed the applicable standards of review for a decision admitting or excluding scientific evidence and concluded that a "determination of reliability under Daubert " is "best left to the discretion of the trial court." However, whether to revisit the standard outlined in Coon is one of the issues raised on appeal and one which the court of appeals has explicitly urged us to reconsider. When deciding whether to overrule a prior decision, we will do so only when "clearly convinced that the rule was originally erroneous or is no longer sound because of changed conditions, and that more good than harm would result from a departure from precedent." A previous decision may be considered "originally erroneous" if it "proves to be unworkable in practice."
974 P.2d 386, 399 (Alaska 1999).
Young v. State, 374 P.3d 395, 413 (Alaska 2016) (quoting Pratt & Whitney Canada, Inc. v. Sheehan, 852 P.2d 1173, 1176 (Alaska 1993)).
Thomas v. Anchorage Equal Rights Comm’n, 102 P.3d 937, 943 (Alaska 2004) (quoting Pratt & Whitney Canada, Inc., 852 P.2d at 1176).
IV. DISCUSSION
A. The Daubert/Coon Standard
Under Alaska Evidence Rule 702(a), a qualified expert witness may testify to "scientific, technical, or other specialized knowledge" if that knowledge "will assist the trier of fact to understand the evidence or to determine a fact in issue." In Daubert v. Merrell Dow Pharmaceuticals, Inc., the United States Supreme Court set forth new requirements for admitting scientific evidence under the equivalent Federal Rule of Evidence. Prior to Daubert the prevailing standard had been established in Frye v. United States, under which an "expert opinion based on a scientific technique is inadmissible unless the technique is ‘generally accepted’ as reliable in the relevant scientific community." Daubert concluded that the Frye test was superseded by the adoption of the Federal Rules of Evidence.
509 U.S. 579, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
Id. at 584, 113 S.Ct. 2786 (citing Frye v. United States, 293 F. 1013, 1014 (D.C. App. 1923)).
Id. at 587, 113 S.Ct. 2786.
The new standard laid out in Daubert is two-pronged. First, the court must determine whether the proffered testimony is based on "scientific knowledge," meaning that it is "derived by the scientific method" and "supported by appropriate validation" —in short, that it is "scientifically valid." Second, because Evidence Rule 702 requires that the testimony must "assist the trier of fact to understand or determine a fact in issue," the court must determine "whether the reasoning or methodology underlying the testimony ... properly can be applied to the facts in issue."
Id. at 590, 113 S.Ct. 2786.
Id. at 593, 113 S.Ct. 2786.
Id. at 592-93, 113 S.Ct. 2786.
The Daubert Court also outlined a number of key considerations relevant to the determination of scientific validity, although it noted that these considerations were not "a definitive checklist or test." The first question is whether the scientific theory or technique in question can be and has been empirically tested. The second is whether the theory or technique "has been subjected to peer review and publication." But the Supreme Court cautioned that publication, including in a peer-reviewed journal, "does not necessarily correlate with reliability"; rather, the Court reasoned that publication and peer review is relevant because "submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in the methodology will be detected." The third consideration that the Court found relevant is "the known or potential rate of error, and the existence and maintenance of standards controlling the technique’s operation." And finally, although Daubert rejected general acceptance in the scientific community as an absolute prerequisite to admissibility, the Supreme Court recognized that "[w]idespread acceptance can be an important factor in ruling particular evidence admissible, and ‘a known technique which has been able to attract only minimal support within the community,’ may properly be viewed with skepticism."
Id. at 593, 113 S.Ct. 2786.
Id.
Id.
Id.
Id. at 594, 113 S.Ct. 2786 (internal citations omitted).
Id. (quoting United States v. Downing, 753 F.2d 1224, 1238 (3d Cir. 1985)).
In 1999 we adopted Daubert as the applicable admissibility standard for scientific expert testimony under the Alaska Rules of Evidence in State v. Coon.
974 P.2d 386, 393-94 (Alaska 1999).
B. Polygraph Testing And The Comparison Question Technique
This opinion concerns the admissibility of expert testimony regarding the results of a polygraph examination, informally known as a "lie detector test." However, it does not concern the entire field of polygraph testing; rather, it involves the technique known as the "comparison question test" or "control question test" (CQT). The following is a summary of the undisputed aspects of CQT polygraph testing.
The technique was originally known as the "control question" technique; "comparison question" is now the preferred term because the technique does not use a "control" as that term is understood in the scientific community. See Giannelli et al., supra note 4 § 8.02[a], at 437. For simplicity, we refer to the technique primarily by the shorthand "CQT."
In all polygraph examinations, whether the CQT or some other approach is used, the examinee is connected to a polygraph, an instrument that measures multiple physiological phenomena: pulse rate, blood pressure, respiration rate, and galvanic skin response in the hands and fingers. It is generally accepted that the polygraph is a highly sensitive instrument capable of measuring these physiological phenomena.
Nat’l Research Council, supra note 6, at 12-13; John Synnott et al., A Review of the Polygraph: History, Methodology and Current Status, 1 Crime Psych. Rev. 59, 62-65 (2015). Galvanic skin response, also known as electrodermal response, refers to the electrical conductivity of the skin, which is affected by activity in the skin’s sweat glands. See Nat’l Research Council, supra note 6, at 81, 155.
See Giannelli et al., supra note 4 § 8.02[c], at 439.
The CQT exams Dr. Raskin administered in these cases are a form of specific-incident polygraph testing, as opposed to a polygraph examination for screening or background check purposes. Screening tests ask about a broad range of conduct, such as whether the examinee has ever committed a crime or used illegal drugs, but specific-incident tests, like the ones Dr. Raskin administered, focus on a particular crime, event, or other occurrence under investigation. The CQT examiner asks three types of questions: "neutral" or "irrelevant" questions ("Is your name Thomas?"), broad "control" or "comparison" questions ("During the first 35 years of your life, did you ever engage in a sexual act of which you should be ashamed?"), and specific "relevant" questions ("Did you ever touch G.B.’s breast?"). Each comparison question will ask about a broad category of past conduct, similar to but excluding the specific occurrence being investigated, and each question will be specifically designed to be ambiguous, broad, and vague but elicit a "No" answer. Because the comparison questions are broadly worded and address sensitive topics, the examinee is assumed to be deceptive or at least unsure of his answer. The underlying rationale of the CQT is that deceptive subjects will feel more threatened by the relevant questions and will view the comparison questions as less important; thus, deceptive subjects will have a stronger physiological reaction to the relevant questions. In contrast, truthful subjects are expected to feel more threatened by the comparison questions and will have a stronger physiological reaction than to the truthfully answered relevant questions. There are two reasons for this expectation: first, the sensitive topic of the comparison questions is assumed to generate a response; second, the examiner will have explained prior to the exam that the examinee’s reactions to the comparison questions are important to the ultimate test result. 
Thus, the CQT is based on the premise that the relative magnitudes of the examinee’s reactions to the relevant and comparison questions are indicative of his truthfulness or lack thereof when answering the relevant questions.
See Nat’l Research Council, supra note 6, at 1 ("Polygraph testing is used for three main purposes: event-specific investigations (e.g., after a crime); employee screening, and preemployment screening. The different uses involve the search for different kinds of information and have different implications.").
Id. at 23-24.
See Giannelli et al., supra note 4 § 8.02[e], at 442-43; Nat’l Research Council, supra note 6, at 254-55; David C. Raskin & Charles R. Honts, The Comparison Question Test, in Handbook of Polygraph Testing 1, 5-27 (Murray Kleiner ed., 2001).
Raskin & Honts, supra note 40, at 15. If the examinee answers a comparison question affirmatively, indicating that some past event matches the described conduct, the examiner will elicit an explanation of that event before repeating the question in a way that excludes the admitted conduct ("Other than what you told me, ... did you ever...."). Id. at 16. In a variant of the CQT known as the "directed lie test," the examinee is simply instructed to lie to the comparison question and informed that the results will be inconclusive if there is not a strong enough response. Id. at 23; see also Giannelli et al., supra note 4 § 8.02[e], at 444; Synnott et al., supra note 36, at 67-68.
See Raskin & Honts, supra note 40, at 15.
Giannelli et al., supra note 4 § 8.02[e], at 441; Nat’l Research Council, supra note 6, at 14-15, 70-71, 255.
Giannelli et al., supra note 4 § 8.02[e], at 441; Nat’l Research Council, supra note 6, at 14-15, 70-71, 255.
Raskin & Honts, supra note 40, at 15-16.
Giannelli et al., supra note 4 § 8.02[e], at 441; Nat’l Research Council, supra note 6, at 14-15, 70, 255; Raskin & Honts, supra note 40, at 7, 18-21.
The examiner asks the examinee a list of prepared questions multiple times. For each relevant question, the examiner will compare the subject’s reaction to his reaction to an adjacent comparison question. Each measured parameter is given a numerical score for each question pair, for example from -3 to +3, with a positive number indicating a stronger reaction to the comparison question and a negative number indicating a stronger reaction to the relevant question. The examiner totals the numerical scores: a high positive overall score is interpreted as indicating a truthful result; a high negative score is interpreted as indicating deception; a score close to zero, whether positive or negative, is considered inconclusive. As will be explained in further detail below, the main scientific criticisms of CQT polygraph testing relate to the validity and testability of the assumptions underlying the technique.
Raskin & Honts, supra note 40, at 17-18.
Id. at 7, 19.
Giannelli et al., supra note 4 § 8.02[f], at 445-46; Raskin & Honts, supra note 40, at 19.
Depending on the circumstances and the need for particularized test results, the scores may be totaled either for the test as a whole or for each relevant question individually. Raskin & Honts, supra note 40, at 20.
Giannelli et al., supra note 4 § 8.02[f], at 446; Raskin & Honts, supra note 40, at 20.
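The numerical scoring procedure described above can be illustrated with a minimal sketch. The channel names follow the physiological measures listed earlier in this section. The function names, the sample scores, and in particular the cutoff value of 6 are hypothetical: the sources summarized here do not state a numeric threshold for a "high" score, so the cutoff is an assumption made only for illustration.

```python
from typing import Dict, List

# Physiological channels the opinion says a polygraph records.
CHANNELS = ("pulse_rate", "blood_pressure", "respiration", "skin_response")

# ASSUMPTION: the opinion gives no numeric cutoff for a "high" score;
# 6 is a placeholder chosen only to make the example concrete.
ASSUMED_CUTOFF = 6


def total_score(question_pairs: List[Dict[str, int]]) -> int:
    """Sum the per-channel scores over every relevant/comparison pair.

    Each score runs from -3 to +3. A positive score means the stronger
    reaction was to the comparison question; a negative score means the
    stronger reaction was to the relevant question.
    """
    total = 0
    for pair in question_pairs:
        for channel, score in pair.items():
            if channel not in CHANNELS:
                raise ValueError(f"unknown channel: {channel}")
            if not -3 <= score <= 3:
                raise ValueError(f"score out of range: {score}")
            total += score
    return total


def interpret(total: int, cutoff: int = ASSUMED_CUTOFF) -> str:
    """Map a total to the three outcomes described in the text."""
    if total >= cutoff:
        return "truthful"
    if total <= -cutoff:
        return "deceptive"
    return "inconclusive"


# Hypothetical scores for three question pairs (not data from any case).
example_pairs = [
    {"pulse_rate": 1, "blood_pressure": 1, "respiration": 0, "skin_response": 2},
    {"pulse_rate": 0, "blood_pressure": 1, "respiration": 1, "skin_response": 2},
    {"pulse_rate": -1, "blood_pressure": 0, "respiration": 0, "skin_response": 1},
]

if __name__ == "__main__":
    overall = total_score(example_pairs)
    print(overall, interpret(overall))
```

The sketch totals the scores for the test as a whole. Totaling separately for each relevant question would amount to calling total_score on the pairs belonging to that question alone.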
C. The Appellate Standard Of Review For Scientific Evidence Rulings
The first question we must address is what standard of review the appellate court should apply to appeals from a Daubert / Coon determination made by the trial court. Our current standard, which the court of appeals urges us to reconsider, is the one laid out in State v. Coon: abuse of discretion.
974 P.2d 386 (Alaska 1999).
In Coon the superior court held an evidentiary hearing to determine whether proffered expert testimony on spectrographic voice identification would be admissible under Frye’s general-acceptance standard; the superior court then admitted the testimony. After an initial appeal, we remanded the case with directions to the superior court to enter findings of fact and conclusions of law relating to Evidence Rule 703, as well as detailed findings of fact and conclusions of law under both the Frye and Daubert standards; the superior court on remand determined the testimony was admissible under both standards. On appeal again we expressly adopted the Daubert standard, and we then considered the superior court’s ruling admitting the evidence under this newly adopted standard.
Id. at 388.
Id. at 389.
Id. at 389-98.
Id. at 398-403.
The superior court’s conclusion was based on a number of preliminary findings: it found that the technique of spectrographic voice identification "had been empirically tested," that it "had been subjected to peer review and publication," that "when properly performed ... voice spectrography has a known error rate of less than one percent," that "when voice spectrography is properly performed by a qualified person, it has attained widespread acceptance within the relevant scientific community," that "the reasoning and methodology underlying [the expert’s] testimony were scientifically valid," and that the expert in that case "had properly performed the voice spectrographic analysis." We examined each of those preliminary findings in turn, and concluded for each finding that the superior court "did not err" in making it. We then reviewed for abuse of discretion the superior court’s definition of the "relevant scientific community" and its ultimate determination, in light of its preliminary findings, that the evidence presented satisfied the Daubert standard. We noted that "the majority of the federal circuits have chosen to apply the abuse of discretion standard when reviewing district court decisions under Daubert," and that "the Supreme Court [had] recently approved the abuse of discretion standard in General Electric Co. v. Joiner."
Id. at 400.
Id. at 401-02 ("[T]he trial court did not err in finding on remand that this technique has been subjected to empirical testing.... [T]he trial court did not err in finding on remand that the technique had been subjected to peer review and publication.... The trial court did not err in finding on remand that the known error rate ... was sufficiently low to make this evidence reliable.... [W]e do not find that the trial court clearly erred in making its general acceptance finding....").
See id. ("[W]e conclude that the trial court did not abuse its discretion in determining the relevant scientific community[,] ... in ruling that the evidence satisfied Daubert[,] ... [or] in finding the voice spectrographic evidence admissible....").
Id. at 399 (citing cases from the Courts of Appeal for the First, Second, Fourth, Fifth, Sixth, Eighth, Ninth, Tenth, and D.C. Circuits, and citing General Electric Co. v. Joiner, 522 U.S. 136, 118 S.Ct. 512, 139 L.Ed.2d 508 (1997)).
Justice Fabe dissented from the court’s opinion. She argued that applying "an abuse of discretion standard of review to the validity of scientific techniques will most likely lead to inconsistent treatment of similarly situated claims." This non-uniformity, she suggested, "must be reconciled at the appellate level. Otherwise, inconsistent jury verdicts, widely disparate compensation for similar injuries, and erroneous criminal verdicts will continue to erode public confidence in our justice system." Justice Fabe explained that "[t]he reliability of scientific evidence does not change from one case to the next; a scientific method is either reliable or unreliable." For that reason, her dissent advocated reviewing "the question of the validity of scientific information" de novo, while reviewing for abuse of discretion "a trial judge’s assessment of the competency of a particular expert witness to render an opinion."
Id. at 404 (Fabe, J., dissenting).
Id. (Fabe, J., dissenting) (quoting Jay P. Kesan, An Autopsy of Scientific Evidence in a Post Daubert World , 84 Geo. L.J. 1985, 2037 (1996) ).
Id. at 404-05 (Fabe, J., dissenting).
Id. at 405 (Fabe, J., dissenting).
Prior to our decision in Coon , a number of commentators had criticized the federal courts’ abuse of discretion standard and proposed a hybrid standard similar to the one described in Justice Fabe’s dissent. For example, Professor David Faigman argued in a 1997 law review article that the relevance and reliability of scientific evidence "involves several layers of scientific work" and that different standards of review should apply to each. According to Faigman, "[w]hen the scientific evidence transcends the particular case, the appellate court should apply a ‘hard-look’ or de novo review to the basis for the expert opinion," but "[w]hen the scientific evidence involves facts specific to the particular case, the appellate court should defer to the trier of fact below."
See, e.g. , Confronting the New Challenges of Scientific Evidence , 108 Harv. L. Rev. 1509, 1528 (1995) ; David L. Faigman, Appellate Review of Scientific Evidence Under Daubert and Joiner, 48 Hastings L.J. 969, 976 (1997) ; David L. Faigman et al., Check Your Crystal Ball at the Courthouse Door, Please: Exploring the Past, Understanding the Present, and Worrying About the Future of Scientific Evidence , 15 Cardozo L. Rev. 1799, 1822 (1994) ; Michael H. Gottesman, From Barefoot to Daubert to Joiner: Triple Play or Double Error?, 40 Ariz. L. Rev. 753, 776-80 (1998) ; Jay P. Kesan, An Autopsy of Scientific Evidence in a Post Daubert World , 84 Geo. L.J. 1985, 2038 (1996).
Faigman, Appellate Review , supra note 65, at 976.
Id.
Id.
Although all federal circuits have adopted Joiner ’s abuse of discretion standard for appellate review, a number of state courts have ruled to the contrary and adopted a stricter standard of review. For example, the New Mexico Supreme Court held in Lee v. Martinez that the validity of a particular scientific theory is a form of "legislative fact" not specific to the circumstances of any particular case, and it therefore applies de novo review to such questions. Other states that have adopted a hybrid or de novo standard of review for Daubert determinations include Oklahoma, Washington, Kentucky, New Hampshire, West Virginia, and Oregon. In states that continue to apply the Frye standard of general acceptance, most apply de novo review on appeal.
General Elec. Co. v. Joiner , 522 U.S. 136, 118 S.Ct. 512, 139 L.Ed.2d 508 (1997).
See Hughes v. Kia Motors Corp. , 766 F.3d 1317, 1331 (11th Cir. 2014) ; Calhoun v. Yamaha Motor Corp., U.S.A. , 350 F.3d 316, 320 (3d Cir. 2003) ; Dura Auto. Sys. of Indiana, Inc. v. CTS Corp. , 285 F.3d 609, 617 (7th Cir. 2002) ; Raskin v. Wyatt Co. , 125 F.3d 55, 65-66 (2d Cir. 1997) ; United States v. Kayne , 90 F.3d 7, 11 (1st Cir. 1996) ; Duffee ex rel. Thornton v. Murray Ohio Mfg. Co. , 91 F.3d 1410, 1411 (10th Cir. 1996) ; Benedi v. McNeil-P.P.C. , 66 F.3d 1378, 1384 (4th Cir. 1995) ; Pedraza v. Jones , 71 F.3d 194, 197 (5th Cir. 1995) ; American & Foreign Ins. Co. v. General Elec. Co. , 45 F.3d 135, 137 (6th Cir. 1995) ; Hose v. Chicago N.W. Transp. Co. , 70 F.3d 968, 972 (8th Cir. 1995) ; United States v. Chischilly , 30 F.3d 1144, 1152 (9th Cir. 1994) ; Joy v. Bell Helicopter Textron, Inc. , 999 F.2d 549, 567 (D.C. Cir. 1993).
136 N.M. 166, 96 P.3d 291, 296 (2004).
Taylor v. State , 889 P.2d 319, 331-32 (Okla. Crim. App. 1995) ("[A] trial judge’s decision to admit novel scientific evidence" is subject to "an independent, thorough review ... not limited by deference to the trial judge’s discretion").
State v. Cauthron , 120 Wash.2d 879, 846 P.2d 502, 505 (1993) ("We review the trial court’s decision to admit or exclude novel scientific evidence de novo."), overruled in part on other grounds by State v. Buckner , 133 Wash.2d 63, 941 P.2d 667 (1997).
Miller v. Eldridge , 146 S.W.3d 909, 915 (Ky. 2004) (explaining that "findings of fact, i.e. reliability or non-reliability" are reviewed for clear error and "discretionary decisions, i.e. whether the evidence will assist [the] trier of fact and the ultimate decision as to admissibility" are reviewed for abuse of discretion).
State v. Dahood , 148 N.H. 723, 814 A.2d 159, 161 (2002) ("Generally, we review the trial court’s rulings on evidentiary matters, including those regarding the reliability of novel scientific evidence, with considerable deference.... When the reliability or general acceptance of novel scientific evidence is not likely to vary according to the circumstances of a particular case, however, we review that evidence independently.").
State v. Beard , 194 W.Va. 740, 461 S.E.2d 486, 492 n.5 (1995) (explaining that West Virginia appellate courts review de novo whether "the reasoning or methodology underlying the testimony is scientifically valid," but that whether the scientific evidence "will assist the trier of fact to understand the evidence or to determine a fact in issue" is reviewed under the abuse of discretion standard).
State v. Lyons , 324 Or. 256, 924 P.2d 802, 805 (1996) ("Notwithstanding the usual deference to trial court discretion, we review [a] ruling on the admissibility of scientific evidence de novo ." (emphasis in original) (internal citation omitted) ).
See, e.g. , Goeb v. Tharaldson , 615 N.W.2d 800, 814 (Minn. 2000) (explaining that under Minnesota’s Frye -Mack standard, "the trial judge defers to the scientific community’s assessment of a given technique, and the appellate court reviews de novo the legal determination of whether the scientific methodology has obtained general acceptance in the scientific community"); Brim v. State , 695 So.2d 268, 274 (Fla. 1997) (explaining that "[a]ppellate review of a Frye determination will be treated as a matter of law" and be reviewed de novo).
The primary concern raised by jurisdictions applying abuse of discretion review, as well as by commentators and Justice Fabe’s dissent in Coon , is the potential for inconsistent rulings in similarly situated cases. Our opinion in Coon dismissed this concern, finding it unlikely "that the inconsistency will be of such magnitude as to ‘compromise the integrity of the judiciary in the eyes of the public.’ " In light of the posture of the cases now before us, we may have been too optimistic. If two defendants offer similar scientific testimony and—after separate evidentiary hearings—one judge deems the testimony to be scientifically valid while another does not, that could be the result of differences between the particular cases and differences in the evidence presented at the hearings. But when the judge in the latter case relied on the evidentiary hearing from the first, and reached the opposite conclusion based on identical evidence, it is clear that the difference in outcome cannot be attributed to a difference in the amount or quality of the evidence.
State v. Coon , 974 P.2d 386, 399 (Alaska 1999) (quoting Coon , 974 P.2d at 404 (Fabe, J., dissenting) ).
That is essentially what happened in these cases: the scientific evidence Alexander and Sharpe presented was deemed valid and admissible by the judges in their cases; essentially identical evidence based on the same scientific principles was deemed unreliable as a matter of law and inadmissible in Holt’s case, even though the trial judge relied on the very testimony presented at Alexander’s Daubert hearing. This raises at least the appearance of arbitrariness, i.e., the appearance that the outcome of a Daubert determination in our courts depends more on which judge was assigned to the case than on the objective application of law to the evidence presented. Regardless of how accurate this appearance might be, it certainly has the potential to raise serious questions in the eyes of the public about the integrity of our judicial system, particularly when such inconsistencies occur in the context of serious criminal proceedings.
An evidentiary hearing in which the judge considers the admissibility of expert testimony is also known as a Daubert hearing, and will be hereafter referred to as such.
We explained in Coon that "the premise that the scientific validity of a technique is a legal issue which does not turn on case-sensitive facts" fails to "adequately take account of the reality of the judicial process and the variable state of science." We quoted with approval the New Mexico Supreme Court’s reasoning that the idea that appellate courts are best suited to rule on the validity of a scientific theory or technique assumes "that the record on appeal contains all of the relevant, most recent data concerning the scientific method" and that "there is always a reservoir of scientific literature that an appellate court might independently reference in a de novo review." We also expressed concern about making determinative rulings at all, again noting the New Mexico Supreme Court’s reasoning that "the state of science is not constant; it progresses daily." We explained that "[t]he principal reason for adopting the Daubert standard is to give the courts greater flexibility in determining the admissibility of expert testimony, so as to keep pace with science as it evolves," and concluded that abuse of discretion review "best comports with these aims."
Coon , 974 P.2d at 399.
Id. (quoting State v. Alberico , 116 N.M. 156, 861 P.2d 192, 205 (1993) ).
Id. (quoting Alberico , 861 P.2d at 205 ).
Id.
We do not take these concerns lightly: the record on appeal is limited to the testimony and exhibits in the superior court’s case file, so there is a non-negligible risk that reviewing the validity of scientific evidence de novo could lead us or the court of appeals to decide a case involving the admissibility of scientific evidence based on incomplete information. But the superior court is also limited to the testimony and evidence presented at the hearing. And appellate courts will often have more time than trial courts to mitigate that risk through careful study of secondary sources such as scientific treatises and surveys of academic literature in the relevant field.
Overturning a prior appellate decision requires showing that the decision was either "originally erroneous or is no longer sound because of changed conditions." If an appellate court has made a Daubert determination and then new scientific research becomes available, or if a litigant identifies research that the appellate court overlooked, the trial court would be justified in holding an evidentiary hearing to make a complete record and rule in the alternative. The appellate court would then have the ability to reconsider admissibility under Daubert and Coon . In either case, presenting this new or overlooked evidence is no more of a burden on litigants than the burden they would otherwise have to present relevant evidence at an original Daubert hearing.
Young v. State , 374 P.3d 395, 413 (Alaska 2016) (quoting Pratt & Whitney Canada, Inc. v. Sheehan , 852 P.2d 1173, 1176 (Alaska 1993) ).
In short, Coon ’s fears that de novo review of Daubert determinations would result in the law of scientific evidence becoming set or stagnant and unchanging appear somewhat exaggerated. However, for the reasons discussed above, de novo review will not necessarily allow appellate courts to decide once and for all time whether a particular technique is scientifically valid, as the court of appeals seems to hope. Nonetheless, adopting a less deferential standard of review on appeal would allow trial courts and parties to avoid repeatedly relitigating the validity of scientific evidence, saving the court and parties the time, effort, and cost of a Daubert hearing—at least absent new or previously overlooked research and evidence. It would also ensure that the admissibility of scientific evidence is consistent throughout the courts of this state.
For these reasons, we agree with the court of appeals—and with the dissent in Coon —that a more probing standard of review is warranted in an appeal from a Daubert determination. As explained above, our decision in Coon reviewed the preliminary findings underlying the superior court’s application of the Daubert standard—whether the technique had been tested, whether it had been subject to publication and peer review, etc.—for clear error, but reviewed the court’s ultimate determination of reliability for abuse of discretion. Going forward, we will instead apply our independent judgment to the question whether—based on the evidence presented and the scientific literature available—the technique or theory underlying the proposed expert testimony is sufficiently reliable to satisfy Daubert and Coon .
This approach is consistent with our standard of review in a number of other contexts. For example, we have explained in the context of reviewing a denial of a motion to suppress evidence that although "[t]he trial court’s findings of fact will not be disturbed unless they are clearly erroneous," the question "[w]hether the trial court’s findings support its legal conclusions is a question we answer with our independent judgment." State v. Wagar , 79 P.3d 644, 650 (Alaska 2003) (quoting State v. Joubert , 20 P.3d 1115, 1118 (Alaska 2001) ).
Coon , 974 P.2d at 400-02.
Whether the evidence being offered is ultimately admissible will also depend on case-specific factors, including whether the evidence is helpful to the trier of fact, whether the relevant scientific theory or technique "properly can be applied to the facts in issue," and whether the proposed expert testimony satisfies or runs afoul of other evidentiary rules. Daubert v. Merrell Dow Pharm., Inc. , 509 U.S. 579, 592-95, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) ; see also Alaska R. Evid. 702. These questions generally fall within the discretion of the trial court, and we will review them accordingly.
In sum, we will limit our independent review to the broad question whether the underlying scientific theory or technique is "scientifically valid" under the first prong of the Daubert analysis.
Daubert , 509 U.S. at 592-95, 113 S.Ct. 2786.
D. Admissibility
1. Alaska’s case law on polygraph testing
Although we have not previously addressed the admissibility of polygraph evidence under Daubert and Coon, a discussion of our pre- Daubert case law on the subject provides useful context and perspective. In 1970 we concluded in Pulakis v. State that polygraph evidence offered in a criminal trial is generally inadmissible. Pulakis was convicted of larceny after a jury trial. At trial the prosecution introduced testimony from a police polygraph examiner that Pulakis underwent two polygraph examinations and that, in the examiner’s opinion, "the examinations revealed that deceptive answers were given to four crucial questions." Pulakis challenged his conviction on appeal, arguing that admitting the polygraph testimony was plain error. Citing Frye, as well as language from some of our previous opinions, we observed that "[t]he general rule is that the results of polygraph tests are not admissible in evidence." We explained that "judicial antipathy" to polygraph evidence had not diminished significantly since Frye was decided in 1923, and that court decisions considering the issue "reflect a high degree of sensitivity to the numerous potential sources of error in the ascertainment of deception through polygraph examinations." We concluded that the "central problem regarding admissibility is not that polygraph evidence has been proved unreliable, but that polygraph proponents have not yet developed persuasive data demonstrating its reliability." We therefore held that, although we were "not prepared to say whether polygraph examiners’ opinions are reliable[,] ... the results of polygraph examinations should not be received in evidence over objection." However, we ultimately upheld Pulakis’s conviction because he had waived objection to the evidence at trial and we did not "find polygraph tests so demonstrably unreliable as to require a finding of plain error."
476 P.2d 474, 478-79 (Alaska 1970).
Id. at 474-75.
Id. at 477.
Id. at 476.
Id. at 477 (quoting Gafford v. State , 440 P.2d 405, 410 (Alaska 1968) ).
Id. at 478.
Id. at 479.
Id.
Id. at 479-80.
After we decided Pulakis , several cases in the court of appeals dealt not with the admissibility of polygraph evidence directly, but rather with the admissibility of references in other testimony to a party’s willingness to submit to a polygraph test. The court of appeals noted that "[d]espite its unreliability, polygraph evidence might be perceived by the jury as a complete answer to questions of credibility" and "could also lull the jury into a false sense of security and result in the jury failing to carefully scrutinize conflicting witness testimony." Similarly, the court of appeals was concerned that "a jury may conclude that a witness’s willingness to take a polygraph test is circumstantial evidence that the witness is telling the truth," and therefore concluded that even references to polygraph tests should be either inadmissible or subject to significant limiting instructions.
See, e.g. , Willis v. State , 57 P.3d 688 (Alaska App. 2002) ; Leonard v. State , 655 P.2d 766 (Alaska App. 1982).
Leonard , 655 P.2d at 770 ; see also Willis , 57 P.3d at 692.
Willis , 57 P.3d at 692 ; see also Leonard , 655 P.2d at 771.
The court of appeals first considered the admissibility of polygraph test results in Haakanson v. State . In that case the court was asked to reconsider Pulakis and find polygraph testimony admissible in light of alleged changes in polygraph technology and increased "acceptance among polygraph examiners of the polygraph’s reliability to show truthfulness." The court of appeals applied Frye ’s general acceptance standard: it concluded that for purposes of that analysis, the relevant question could not be limited to the acceptance of polygraph testing among polygraph examiners; rather, the court decided that under our decision in Contreras v. State , the "relevant scientific community" includes the "professions which have studied and/or utilized [the technique] for clinical, therapeutic, research and investigative applications" and specifically excludes "those whose involvement with [the technique] is strictly limited to that of practitioner." Applying that standard, the court of appeals concluded that there was "considerable controversy over the reliability of polygraphs as a scientific process," and that "Haakanson ha[d] not established that there [was] a consensus among the experts regarding the reliability of the polygraph technique." The court of appeals also expressed "concern[ ] about the disproportionate impact polygraph evidence may have on a jury." Citing its previous concerns about polygraph testimony being "perceived by the jury as a complete answer to questions of credibility" and its potential to "lull the jury into a false sense of security," the court of appeals held that "[a]ny evidence which has such great potential to mislead or prejudice the jury should be excluded unless its probative value clearly outweighs the prejudice." The court of appeals found the "probative value of polygraph evidence [to be] insubstantial because the polygraph has not been proven reliable"; thus, the polygraph evidence in that case was inadmissible.
760 P.2d 1030 (Alaska App. 1988).
Id. at 1031-32.
Id. at 1034 (quoting Contreras v. State , 718 P.2d 129, 135 (Alaska 1986) ).
Id. at 1035.
Id.
Id. (quoting Leonard v. State , 655 P.2d 766, 770 (Alaska App. 1982) ).
Id.
2. Polygraph evidence under Daubert in other states
Other jurisdictions that apply the Daubert test have also rejected evidence based on the CQT method. For example, in State v. Porter the Connecticut Supreme Court adopted Daubert as the relevant standard for scientific evidence and upheld its traditional per se ban on admitting polygraph evidence. Jurisdictions that have adopted Daubert and maintain a per se exclusion of polygraph evidence include Idaho, West Virginia, Hawaii, Vermont, the District of Columbia, and the Court of Appeals for the Fourth Circuit. In United States v. Scheffer the Supreme Court held that a per se rule excluding polygraph evidence does not infringe on the constitutional rights of an accused to present evidence in his defense; implied in the Court’s reasoning is the corollary conclusion that such a rule is also not inconsistent with Daubert . According to one treatise on scientific evidence, a majority of states still followed this "traditional rule" of excluding polygraph evidence as of 2012, when Alexander’s evidentiary hearing took place. The superior court in Alexander’s case surveyed polygraph admissibility in "all 50 states and the federal circuits" at the time of the hearing and found that "30 jurisdictions still have a per se ban, 17 admit polygraph results based upon stipulation, and 12 leave the decision to the trial court’s discretion on a case-by-case basis."
State v. Porter , 241 Conn. 57, 698 A.2d 739, 742 (1997).
State v. Perry , 139 Idaho 520, 81 P.3d 1230, 1235-36 (2003) (concluding that polygraph evidence is "useful to bolster [the examinee’s] credibility but do[es] not provide the trier of fact with any additional information" and that it is inadmissible "because it does not assist the trier of fact to understand the evidence or to determine a fact in issue").
State v. Beard , 194 W.Va. 740, 461 S.E.2d 486, 492-493 (1995) ("Despite Appellant’s noteworthy efforts at trying to elevate the image of polygraph results, we remain convinced that the reliability of such examinations is still suspect and not generally accepted within the relevant scientific community. Therefore, any speculation that our position ... regarding polygraph admissibility is in question due to the Daubert /Wilt rulings is put to rest today." (emphasis in original) (footnote omitted) ).
State v. Okumura , 78 Hawai'i 383, 894 P.2d 80, 94 (1995) (reaffirming Hawaii’s per se exclusion of polygraph evidence), abrogated on other grounds by State v. Cabagbag , 127 Hawai'i 302, 277 P.3d 1027, 1038-39 (2012).
Rathe Salvage, Inc. v. R. Brown & Sons, Inc. , 191 Vt. 284, 46 A.3d 891, 897-901 (2012) (affirming denial of Daubert hearing on polygraph reliability on grounds that even assuming polygraph evidence satisfies Daubert it is still inadmissible under Rule 403 ).
See Rowland v. United States , 840 A.2d 664, 673-74 (D.C. 2004) (citing Proctor v. United States , 728 A.2d 1246, 1249 (D.C. 1999) and Peyton v. United States , 709 A.2d 65, 65 (D.C. 1998) ) (excluding polygraph testimony). The D.C. Court of Appeals only recently adopted Daubert , see Motorola Inc. v. Murray , 147 A.3d 751, 756-57 (D.C. 2016), and it does not appear to have since heard a case involving polygraph testimony.
See United States v. Prince-Oyibo , 320 F.3d 494, 501 (4th Cir. 2003). In addition, the Sixth Circuit has held that, although it "has never adopted a per se prohibition on the introduction of polygraph evidence," it "generally disfavor[s] admitting the results of polygraph evidence" because "the results of a polygraph are inherently unreliable." United States v. Thomas , 167 F.3d 299, 308 (6th Cir. 1999). Furthermore, the Sixth Circuit has "repeatedly held that ‘unilaterally obtained polygraph evidence is almost never admissible under Evidence Rule 403.’ " Id. at 309 (quoting United States v. Sherlin , 67 F.3d 1208, 1216 (6th Cir. 1995), and citing Wolfel v. Holbrook , 823 F.2d 970, 973-75 (6th Cir. 1987) ; Barnier v. Szentmiklosi , 810 F.2d 594, 597 (6th Cir. 1987) ).
See id. at 309-12, 118 S.Ct. 1261.
See Giannelli, et al. , supra note 4 § 804[b], at 465 & n.173.
Of the jurisdictions that allow polygraph evidence based on the judge’s discretion, New Mexico is a notable example. Unlike the Alaska Evidence Rules, the New Mexico Rules of Evidence (NMRE) specifically address polygraph examinations. Under NMRE 11-707, the opinion of a polygraph examiner "as to the truthfulness of a person’s answers in a polygraph examination may be admitted" if a number of specific criteria regarding the examiner’s qualifications and the test procedure are met. In Lee v. Martinez the New Mexico Supreme Court held that when the expert’s qualification and the examination meet this rule’s standards, "polygraph examination results are sufficiently reliable to be admitted" under the Daubert standard and NMRE 11-702—New Mexico’s equivalent to Alaska Evidence Rule 702. However, the court also concluded that NMRE 11-707 only makes polygraph evidence admissible subject to the discretion of the trial judge’s balancing of probative value against unfair prejudice.
N.M. R. Evid. 11-707 (2018).
136 N.M. 166, 96 P.3d 291, 293-94 (2004).
Id. at 294.
3. The Daubert factors, applied
Both the Supreme Court in Daubert and our court in Coon explained that the listed factors should not be seen as a determinative checklist, but that the standard is a flexible one. Because the Daubert factors are a good starting point, and the superior court started with them in Alexander , these factors will be discussed in turn here.
Daubert v. Merrell Dow Pharm., Inc. , 509 U.S. 579, 594-95, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) ("The inquiry envisioned by Rule 702 is, we emphasize, a flexible one.... The focus, of course, must be solely on principles and methodology, not on the conclusions that they generate."); State v. Coon , 974 P.2d 386, 395 (Alaska 1999) ("The factors identified in Daubert provide a useful approach.... Other factors may apply in a given case.").
i. Empirical testing
The first relevant question is whether CQT polygraphy can be, and has been, empirically tested. The superior court in Alexander found that "the hypotheses underlying the polygraph can be and ha[ve] been tested repeatedly, including tests by both Drs. Raskin and Iacono." In light of the record before us and the scientific literature available, this finding is at least partly erroneous.
It is true that Dr. Raskin and Dr. Iacono both testified about a number of studies—conducted by them and others—that have tested the practical application of CQT polygraphy. But one central criticism that Dr. Iacono’s testimony raised was the lack of studies testing the psychological hypotheses that serve as the underlying premise of polygraph testing. For a CQT polygraph test to yield reliable inferences about deception, it must be the case that (1) deception on relevant and comparison questions produces different psychological states; (2) these psychological states produce measurably different physiological responses; (3) these physiological responses include the ones that the polygraph instrument measures; (4) these physiological responses are unlikely to arise from causes other than deception; (5) the scoring system captures the physiological differences relevant to deception; and (6) examiners accurately assign conclusions of deception or honesty to certain score values when they interpret scores. Many of these assumptions and hypotheses appear not to have been tested; even more important, some may not be readily testable.
This is the concept of criterion validity , or the degree to which an empirical measure actually "matches a phenomenon that the test is intended to capture." Nat’l Research Council , supra note 6, at 31.
See id. at 67.
In particular, CQT polygraph examinations are based on the theory that while a truthful person will respond more strongly to the comparison questions, a deceptive person will have a stronger reaction to the relevant questions. Dr. Iacono criticized this as an unfounded assumption, arguing for example that a truthful person might react strongly to the relevant questions due to the implications of a false accusation, while a guilty person outside of laboratory studies might have a reduced reaction to the relevant questions due to the phenomenon of habituation. On those grounds, Dr. Iacono concluded that "the CQT has ... a weak theoretical foundation." He testified that this underlying theory has not been properly tested, in part because laboratory studies cannot duplicate all of the considerations that might be relevant in the field—like habituation or a truthful examinee reacting to the relevant questions out of fear of being falsely accused—and in part because field studies have difficulties establishing the "ground truth" of whether an examined person was actually lying. Determining ground truth presents practical problems that are difficult, perhaps even impossible, to overcome, meaning that true accuracy rates may not be empirically verifiable. Dr. Iacono testified that many field studies focus on criminal cases and use confessions to determine ground truth, but noted that this is problematic because whether or not a defendant passes or fails a polygraph exam affects how likely he is to subsequently confess. Several studies and surveys of polygraph research have reached similar conclusions. For example, a 2003 review of the scientific evidence on polygraphy by the National Research Council concluded that "[p]olygraph research has not developed and tested theories of the underlying factors that produce the observed responses." 
Similarly, a more recent survey of academic literature concluded that "[i]t appears unlikely that the proponents of the CQT will be able to reconcile the theoretical flaws of their technique in the foreseeable future." Although there have been numerous studies testing the practical applications of the comparison question technique, our review of the record and the available academic literature reveals no studies actually testing the underlying psychological theories. Ultimately, given the fact that certain assumptions of polygraph testing not only are untested, but may be functionally untestable , we conclude that this factor weighs decidedly against admitting polygraph testimony as scientific evidence.
The term "habituation" refers to a "decline in responsiveness to a stimulus due to repeated exposure." Habituation , American Heritage Dictionary (5th ed. 2014). In the context of a polygraph test administered to a criminal defendant, this phenomenon could influence the test results because the relevant questions on the test are directed at the same conduct the defendant has already been accused of and charged with: "[I]f the individual has discussed the crime at length or on numerous occasions, they may have become habituated to talking about the case and no arousal is detected." Erin M. Oksol & William T. O’Donohue, A Critical Analysis of the Polygraph , in Handbook of Forensic Psychology 601, 621 (William O’Donohue & Eric Levensky eds., 2003); see also Lee v. Martinez , 136 N.M. 166, 96 P.3d 291, 318 (2004).
Confessions may also be unreliable measures of ground truth for other reasons. The Innocence Project reports that of the more than 360 DNA exoneration cases in the United States, roughly 28% involved a false confession in the initial conviction. DNA Exonerations in the United States , Innocence Project (2017), https://www.innocenceproject.org/dna-exonerations-in-the-united-states/ (last visited Oct. 16, 2018). It is not possible to infer the overall rate of false confessions from this data, but it is enough to raise questions about how accurately confessions establish ground truth.
Nat’l Research Council , supra note 6, at 2.
Synnott et al., supra note 36, at 76.
ii. Peer review
The superior court in Alexander found that CQT polygraphy has been the subject of various publications, many of which were peer reviewed. This finding is amply supported by the record, and the State does not suggest otherwise. However, as the Supreme Court explained in Daubert, the mere fact of publication in a peer-reviewed journal is not itself probative of a technique’s validity; rather, peer review and "submission to the scrutiny of the scientific community" is relevant because "it increases the likelihood that substantive flaws in the methodology will be detected." As discussed above, the published studies on CQT testing have been subject to substantial scrutiny, and a vigorous debate has arisen about substantive flaws in the theoretical underpinnings of the technique. Notwithstanding this debate, which has been ongoing for decades, the practice of CQT polygraph testing does not appear to have developed in any significant way. Most of the studies cited by Dr. Raskin in support of the technique are from the 1980s and 1990s, with some dated as far back as the late 1970s; and although the superior court’s Daubert hearing was conducted in 2012, Dr. Raskin did not cite any studies published more recently than 2003. Thus, although studies regarding CQT polygraphy have been published in peer-reviewed journals, it does not appear that this has resulted in the kind of refinement and development that makes publication and peer review relevant to a Daubert analysis. For this reason, although the superior court in Alexander did not clearly err in finding that polygraph testing has been the subject of publication and peer review, we give this finding little weight.
Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594-95, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
See United States v. Scheffer, 523 U.S. 303, 309-10, 118 S.Ct. 1261, 140 L.Ed.2d 413 (1998) (citing sources debating the validity of CQT polygraphy dating to the late 1980s).
Again, 2003 was the year the National Research Council concluded that polygraph research had not developed or tested the psychological theories assumed to underlie the physical responses the polygraph measures. Nat’l Research Council, supra note 6, at ii, 2.
iii. Acceptable error rate
The superior court in Alexander found that the error rate of CQT polygraph testing is "sufficiently reliable" to be acceptable. The court reasoned that the studies cited by Dr. Raskin showed an accuracy rate of 89% to 98%, while those cited by Dr. Iacono had accuracy rates from 51% to 98%, with an average of 71%. Dr. Raskin estimated that the overall accuracy rate of CQT polygraph testing was around 90%. The court recognized a number of concerns that might affect the accuracy rate of polygraph exams in practice, including the "friendly examiner" hypothesis and the possibility of examinees using countermeasures to "beat" the test. But the court concluded that these concerns "are already built in to the error rate" and are relevant to the weight the jury should assign to the testimony, not to admissibility.
As a preliminary matter, the superior court appears to have misunderstood Dr. Iacono’s testimony. As discussed above, Dr. Iacono criticized each study he discussed, testifying that the accuracy rates reported in those studies were either invalid or not applicable to practical applications of the CQT technique in the field; he concluded that "it’s not possible to accurately estimate the error rate of the controlled question test when it’s used in real life applications." The court’s conclusion that the various concerns discussed are "already built in to the error rate" has no support in the record: while individual studies may have tested specific variables such as countermeasures, neither expert cited any laboratory study that controlled for all of them.
Dr. Iacono also testified that field studies on polygraph testing are unreliable and often "contain a bias of potentially serious magnitude toward overestimating the accuracy" of the test. A typical study, according to Dr. Iacono, would look at cases where the defendant took a polygraph test and later confessed; in such cases, the polygraph chart would be blindly rescored and then compared to the confession. But Dr. Iacono testified that failing a polygraph test often pressures a defendant into confessing, while passing the test substantially decreases the chance of a confession. As such, he explained, field studies are subject to a substantial selection bias: a case is most likely to end up in the study only if the defendant failed a polygraph test and subsequently confessed. When the study then rescores the polygraph chart, Dr. Iacono testified that it is not surprising the results exceed 90% accuracy.
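The selection effect Dr. Iacono described can be illustrated with a short calculation. The sketch below is not drawn from the record: the detection rate and the two confession rates are hypothetical numbers chosen only to show the mechanism, and the function name `apparent_accuracy` is likewise an assumption. It considers only deceptive examinees and asks what accuracy a study would report if it included only cases that ended in a confession.

```python
def apparent_accuracy(true_detection_rate, confess_if_failed, confess_if_passed):
    """Accuracy a confession-verified field study would observe.

    All three inputs are hypothetical illustration values, not record evidence:
      true_detection_rate: share of deceptive examinees the test actually flags
      confess_if_failed:   chance a deceptive examinee confesses after failing
      confess_if_passed:   chance a deceptive examinee confesses after passing
    """
    # Deceptive examinees who failed the test and later confessed (scored "correct").
    failed_and_confessed = true_detection_rate * confess_if_failed
    # Deceptive examinees who passed the test and later confessed (scored "wrong").
    passed_and_confessed = (1.0 - true_detection_rate) * confess_if_passed
    # Only confessed cases enter the study, so accuracy is measured on that subset.
    return failed_and_confessed / (failed_and_confessed + passed_and_confessed)


# Hypothetical inputs: the test catches 70% of deceptive examinees, half of
# those who fail go on to confess, and 5% of those who pass go on to confess.
observed = apparent_accuracy(0.70, 0.50, 0.05)
print(f"assumed true detection rate: 70%, observed in study: {observed:.1%}")
```

Under these assumed inputs the study would report roughly 96% accuracy even though the assumed true detection rate is 70%. If confession were equally likely after a pass or a fail, the observed figure would equal the true rate, which is why the link between failing the test and confessing drives the overestimate.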
In addition to potential flaws in the perceived accuracy rates of CQT tests, the empirical basis for polygraph examinations suffers from another fault: the lack of a reliable "base rate." In the three cases currently before this court, each defendant was said to have passed his polygraph test; the relevant question for the factfinder is whether, given this fact, the defendant was likely truthful or whether the test was a false negative. To determine this likelihood, more information is required; specifically, information about the base rate of deceptive and truthful subjects.
The "base rate" refers to the probability "of the target condition in the population or in the sample at hand—for security screening, this might refer to the proportion of spies or terrorists or potential spies or terrorists among those being screened." Nat’l Research Council , supra note 6, at 46. A sample population of criminal suspects, for example, may have a higher base rate of deceivers than other sample populations. Id. at 47.
The lack of a reliable base rate estimate was the underlying reason for the Connecticut Supreme Court upholding its traditional per se ban on admitting polygraph evidence in State v. Porter . Noting "wide disagreement" about the accuracy rates for "a well run polygraph exam," the court decided that, even if the estimates of polygraph proponents were accepted, the technique would still be "of questionable validity." The court cited a field study by Dr. Raskin indicating a sensitivity of 87% and a specificity of 59%: "In other words, 13 percent of those who are in fact deceptive will be labeled as truthful ... [and] 41 percent of subjects who are, in fact, truthful will be labeled as deceptive." The court further reasoned that, even if a test is accurate, its probative value as scientific evidence depends on its "predictive value"—the likelihood "that a person really is lying given that the polygraph labels the subject as deceptive" and the likelihood "that a subject really is truthful given that the polygraph labels the subject as not deceptive." This predictive value, the court explained, depends not only on the accuracy of the test but also "on the ‘base rate’ of deceptiveness among the people tested by the polygraph." Because the Porter court found a "complete absence of reliable data on base rates," it concluded that it had no possible way of assessing the test’s probative value. With that in mind, the court concluded that even if polygraph evidence satisfies the Daubert standard, which it assumed without deciding, the probative value of such evidence is very low and substantially outweighed by its prejudicial effects.
241 Conn. 57, 698 A.2d 739, 766-69 (1997).
Id. at 764, 766.
"There are two distinct aspects to accuracy. One is sensitivity. A perfectly sensitive indicator of deception is one that shows positive whenever deception is in fact present: it is a test that gives a positive result for all the positive (deceptive) cases; that is, it produces no false negative results. The greater the proportion of deceptive examinees that appear as deceptive in the test, the more sensitive the test. Thus, a test that shows negative when an examinee who is being deceptive uses certain countermeasures is not sensitive to deception. The other aspect of accuracy is specificity. An indicator that is perfectly specific to deception is one that always shows negative when deception is absent (is positive only when deception is present). It produces no false positive results. The greater the proportion of truthful examinees who appear truthful on the test, the more specific the test. Thus, a test that shows positive when a truthful examinee is highly anxious because of a fear of being falsely accused is not specific to deception because it also indicates fear." Nat’l Research Council , supra note 6, at 38.
Porter, 698 A.2d at 766.
Id.
Id. at 766-67 (footnote omitted).
Id. at 768. As the Porter court described, "[t]he base rate is important because it can greatly accentuate the impact of the false positive and false negative rates arising from any given specificity and sensitivity values." Id. at 767 n.53. For example, "[i]f one assumes base rates progressively higher than 50 percent, then, by definition, the number of deceptive examinees increases and the number of honest examinees decreases." Id. Thus, "even holding specificity and sensitivity rates constant, as the base rate increases the number of false negatives (the labeling of deceptive subjects as truthful) also rises and the number of false positives (the labeling of truthful subjects as deceptive) falls." Id.
Id. at 768-69.
As in Porter, the record before us is devoid of reliable data about the base rate of deceptiveness among polygraph examinees outside of lab tests; we also have not found such data in academic literature. Absent some reliable estimate of this base rate there is no way to estimate the reliability of polygraph results, and thus no way to determine whether any particular accuracy rate is acceptable. We conclude that the superior court clearly erred in finding the error rate of CQT polygraph testing to be "sufficiently reliable." Accordingly, this factor weighs against admitting polygraph evidence.
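The dependence of predictive value on the base rate, as described in Porter, can be illustrated with a short calculation using Bayes' rule. This is a sketch under stated assumptions: the sensitivity (87%) and specificity (59%) are the field-study figures quoted in Porter, but the three base rates are hypothetical, because no reliable base rate data exists. The function name `predictive_values` is an assumption made for illustration.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (ppv, npv) for a test applied to a population.

    ppv: probability a subject is deceptive given a "deceptive" result
    npv: probability a subject is truthful given a "not deceptive" result
    base_rate: assumed share of deceptive subjects among those tested
    """
    deceptive = base_rate
    truthful = 1.0 - base_rate

    true_positive = sensitivity * deceptive           # deceptive, labeled deceptive
    false_negative = (1.0 - sensitivity) * deceptive  # deceptive, labeled truthful
    true_negative = specificity * truthful            # truthful, labeled truthful
    false_positive = (1.0 - specificity) * truthful   # truthful, labeled deceptive

    ppv = true_positive / (true_positive + false_positive)
    npv = true_negative / (true_negative + false_negative)
    return ppv, npv


# Sensitivity and specificity as quoted in Porter; base rates are hypothetical.
for base_rate in (0.10, 0.50, 0.90):
    ppv, npv = predictive_values(0.87, 0.59, base_rate)
    print(f"base rate {base_rate:.0%}: "
          f"P(deceptive | failed) = {ppv:.1%}, P(truthful | passed) = {npv:.1%}")
```

With these inputs, the probability that a subject who passed was in fact truthful is about 98% at an assumed 10% base rate, about 82% at 50%, and about 34% at 90%. The same test result therefore carries very different probative value depending on a quantity that is unknown.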
iv. Standards for operation
Under Daubert the court should consider "the existence and maintenance of standards controlling the technique’s operation." The superior court in Alexander found "that although there is no single published protocol that all polygraphers must follow, that nonetheless there are published protocols and training criteria" that are sufficiently utilized so as to be considered standard. Additionally, the court found there was no indication that "Dr. Raskin did not properly administer the two exams."
Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594-95, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993).
Standards do control some aspects of polygraph testing and many states also have statutes governing polygraph test administration, examinees’ privacy rights, and licensing of examiners. To describe the standards for administration of polygraphs, Dr. Raskin pointed to New Mexico Evidence Rule 11-707 as providing "clear standards for tests to be offered as evidence" and described the rule as "a superior model for national standards." He also referenced standards adopted by national polygraph organizations and standards imposed by government agencies.
See, e.g., La. Stat. Ann. §§ 37:2831-2854 (2018); Me. Rev. Stat. tit. 32, §§ 7351-7390 (2018); Nev. Rev. Stat. Ann. §§ 648.183-.199 (West 2017); Or. Rev. Stat. Ann. §§ 703.010-.310 (West 2018); Vt. Stat. Ann. tit. 26, §§ 2901-2910 (2018).
Rule 11-707 provides that a polygraph examiner’s opinion testimony is admissible if the examiner is qualified, the scoring method used is "generally accepted as reliable by polygraph experts," the examiner was informed of relevant information regarding the examinee prior to the exam, two or more relevant questions were asked, three or more charts were taken, and the exam was recorded. However, what constitutes a "generally accepted" scoring method is not further defined. A "relevant question" is simply defined as "a clear and concise question which refers to specific objective facts directly related to the purpose of the examination and does not allow rationalization in the answer." Even if we were to conclude that these standards are sufficient to "control[ ] the technique’s operation," Rule 11-707 is not a national standard. As both the court in Alexander and Dr. Raskin acknowledged, there is no one "controlling" industry standard and there may be great differences in "generally accepted principles."
Id. 11-707(A)(4).
Daubert, 509 U.S. at 594, 113 S.Ct. 2786.
It is clear that some aspects of the test lack standards, or at least consistent standards. Specifically, the formulation and ordering of questions, the conducting of the pretest interview, the choice of scoring system, and the evaluation of the examinee’s demeanor leave much to the examiner’s discretion. While the superior court’s finding regarding CQT protocols was not clearly erroneous, we conclude that the lack of clear controlling standards for CQT administration weighs against its admissibility.
See Synnott et al., supra note 36, at 68 ("The number of total questions asked, the order in which ... questions are placed and whether any or all questions are repeated ... [depend] on the situation, examiner’s preference and the school the examiner subscribes to.").
Id. at 67 ("[D]epending on the situation, examiner’s personal preferences and the ‘polygraph school’ the examiner subscribes to, ... [much of] the pre-test interview can vary greatly.... [and it] can last anywhere between 30 min and 2 h....").
Id. at 68 (describing examiner discretion to set cut-off points for numerical scoring systems and outlining several types of computerized scoring systems).
See Nat’l Research Council, supra note 6, at 16 ("[T]he polygraph examiner is likely to form impressions of the examinee’s truthfulness, based on the examinee’s demeanor.... These impressions are likely to affect the conduct and interpretation of the examination and might, therefore, influence the outcome and the validity of the polygraph examination.").
v. General acceptance
The superior court found that the record is "inconclusive as to whether there is general acceptance within the relevant scientific community." The State argues that CQT polygraphy has not gained general acceptance, while the defendants appear to argue primarily that "inconclusiveness on this factor goes to the weight and not the admissibility of the evidence."
Both Dr. Raskin and Dr. Iacono testified about a variety of surveys regarding the acceptance of polygraphy. Dr. Iacono also testified about a number of scientific publications that conclude polygraph examinations are unreliable. Based on a review of this evidence and literature, it appears that the parts of the scientific community who regularly utilize polygraphy have—perhaps unsurprisingly—widely accepted the technique, while the broader scientific community views the technique more skeptically.
We note that under Contreras v. State, 718 P.2d 129, 135 (Alaska 1986), the "relevant scientific community" for a general acceptance analysis excludes "those whose involvement with [the technique] is strictly limited to that of practitioner." This would not exclude those who, like Dr. Raskin, both conduct research into polygraph testing and administer polygraph examinations. But it would exclude those who do only the latter.
In light of this record and the scientific literature, the superior court’s finding that it is "inconclusive" whether polygraphy is generally accepted is not clearly erroneous. But as the Supreme Court noted in Daubert, "‘a known technique which has been able to attract only minimal support within the community’ may properly be viewed with skepticism." The Supreme Court’s comment appears particularly apt in this case. Given the decades-long debate over the validity of polygraph evidence, the apparent lack of development in the technique as a response to that debate, and the apparently lackluster support for the technique outside the community of practicing polygraph examiners, we conclude that this factor also weighs against admitting polygraph evidence.
Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 594, 113 S.Ct. 2786, 125 L.Ed.2d 469 (1993) (quoting United States v. Downing, 753 F.2d 1224, 1238 (3d Cir. 1985)).
vi. Other relevant factors
As noted above, both Daubert and Coon recognize that factors other than those discussed above may be relevant in some cases. For example, Coon briefly mentions the possibility of "‘independent’ research funded by tobacco companies" carrying with it "the danger of a hidden litigation motive." This is a relevant consideration in this case. Dr. Raskin, who testified at the Daubert hearing in favor of admitting polygraph evidence, is himself a practicing polygraph examiner and has financial ties to one manufacturer of polygraphs, earning royalties from the sale of polygraph machines he invented. Many of the studies cited as approving polygraph testing as scientifically valid were performed by Dr. Raskin or by other practicing examiners, and a number of the studies were published in polygraph industry publications. While we do not entirely discount this research and have examined it on its merits, we recognize that the polygraph industry has an obvious financial interest in confirming polygraph testing as valid and promoting its use and admissibility in court.
State v. Coon, 974 P.2d 386, 395 (Alaska 1999).
vii. Conclusion
In light of each of the factors discussed above, we conclude that on the evidence before us, CQT polygraph testing has not been shown to satisfy the standard for scientific evidence set forth in Daubert and Coon. We reiterate what we said in Pulakis: "polygraph proponents have not yet developed persuasive data demonstrating its reliability." Absent such data, we are unconvinced that the opinion of polygraph examiners amounts to "scientific, technical, or other specialized knowledge" that "will assist the trier of fact to understand the evidence or to determine a fact in issue," as required under Evidence Rule 702. Our opinion here does not mean that CQT polygraph testing will never be sufficiently reliable to pass muster as scientific evidence, but absent substantial evidence demonstrating that CQT polygraph testing produces reliable results based on sound, verifiable science, the results of CQT polygraph examinations cannot be admitted in evidence over objection.
Pulakis v. State, 476 P.2d 474, 479 (Alaska 1970).
V. CONCLUSION
We REVERSE the judgment of the court of appeals affirming the superior court’s order admitting Alexander’s polygraph evidence. We REVERSE the superior court’s order admitting Sharpe’s polygraph evidence. We AFFIRM the superior court’s order excluding Holt’s polygraph evidence. We REMAND Alexander’s and Sharpe’s cases to the superior court for further proceedings consistent with this opinion relating to their respective criminal charges. We also REMAND Holt’s case to the court of appeals for further proceedings as appropriate on Holt’s remaining points of appeal. We do not retain jurisdiction.