Few questions in applied psychology have been argued more publicly than whether the polygraph works. This post separates the questions that get conflated under the single word “accuracy,” reads the modern peer-reviewed meta-analyses on their own terms, and tests the standard criticisms against data rather than assertion. The full citations — with links where available — are in Sources, and each inline footnote links to its numbered entry.
At a glance — seven takeaways
- Unaided human lie-detection averages about 54% — barely above a coin-flip — and training does not improve it[1].
- The Comparison Question Test discriminates far better: APA 2011 puts validated accuracy at .85–.89[2] and the 2021 meta-analysis at rdec .69 / AUC .91[3].
- The forty-year field-versus-laboratory dispute has been settled empirically — in favour of the field[4].
- The “confession criterion inflates field accuracy” and “no better than chance” critiques fail against algorithmic-criterion field data and against exoneration records[5].
- The Concealed Information Test, where it can be run, is a still more accurate recognition method that is structurally protective of the innocent[6].
- The test is not perfect, and its evidence for clearing the innocent (specificity) is thinner than for catching the guilty (sensitivity)[7].
- Which polygraph matters most of all: single-issue diagnostic testing, low-base-rate screening, and recognition testing are three different propositions and should never be argued as one.
1. Why the question is harder than it looks
A serious answer keeps at least three questions apart. First, the discrimination question: how well does the technique separate truth-tellers from deceivers under conditions where ground truth is known? Second, the decision question: how should an examiner, court, or employer use a single result in the field, given the base rate of guilt in that setting? An instrument that discriminates well can still produce misleading individual verdicts when the prior probability of guilt is very high or very low. Third, the critique question: how do the standard objections actually hold up when checked against data?
The distinction that drives most of the confusion
A fourth question cuts across the other three: which polygraph? A single-issue, event-specific examination of a named suspect is a very different instrument from a multiple-issue security-screening examination of an applicant against whom there is no specific allegation. Most impressive accuracy figures come from the former; most damaging criticism — including the most pointed passages of the 2003 National Research Council report[8] — is really aimed at the latter. Conflating them is the single most common error in public discussion, and it runs both ways.
It is also worth separating two technical properties that laypeople routinely merge. Reliability is consistency: would two competent examiners scoring the same charts reach the same call? Validity is correctness: does the call correspond to the truth? A test can be highly reliable and still invalid. What follows is overwhelmingly about validity; inter-rater reliability on numerical scores is generally high and is not the locus of the controversy.
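To make the reliability/validity distinction concrete, here is a minimal Python sketch using Cohen's kappa, one common chance-corrected measure of agreement between two raters. The examiner calls and the ground truth are invented for illustration and are not drawn from any polygraph study; the point is only that two examiners can agree perfectly while both being wrong.

```python
# Hypothetical illustration: two examiners can agree perfectly (high
# reliability) while both being wrong about the truth (low validity).
# All calls below are invented for the example, not study data.
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two raters' categorical calls."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if each rater assigned calls at their own base rates.
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


def accuracy(calls: Sequence[str], truth: Sequence[str]) -> float:
    """Share of calls that match ground truth (validity, not reliability)."""
    return sum(c == t for c, t in zip(calls, truth)) / len(truth)


examiner_a = ["DI", "DI", "NDI", "NDI", "DI", "NDI"]
examiner_b = ["DI", "DI", "NDI", "NDI", "DI", "NDI"]
ground_truth = ["DI", "NDI", "DI", "NDI", "NDI", "DI"]  # correct call per case

print(f"kappa    = {cohens_kappa(examiner_a, examiner_b):.2f}")  # reliability
print(f"accuracy = {accuracy(examiner_a, ground_truth):.2f}")    # validity
```

Run as written, it reports a kappa of 1.00 (perfect reliability) next to an accuracy of 0.33 (poor validity).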
2. A short history of a long argument
The modern instrument has a surprisingly continuous lineage. William Moulton Marston’s discontinuous systolic-blood-pressure test was the technique at issue in Frye v. United States (1923), the case that gave American law its “general acceptance” standard — a standard later used, with some irony, to keep polygraph evidence out of court[9]. John Larson built the first continuous recorder of blood pressure, pulse and respiration in 1921; Leonarde Keeler added the electrodermal channel; and in the 1960s Cleve Backster introduced the Zone Comparison Technique and the idea of “psychological set.”
From the late 1970s the research bifurcated into two camps. The Utah group (Raskin, Kircher, Honts, with Bell and Podlesny) built the laboratory paradigm, the standardized Utah scoring system, the directed-lie variant and the computer algorithms; their position is that the Comparison Question Test, properly run, is a valid forensic tool. The Minnesota group (Lykken, Iacono, Patrick) pressed the opposite case: that the CQT lacks a coherent theory and that the only defensible psychophysiological method is the recognition-based Concealed Information Test. Gershon Ben-Shakhar and Eitan Elaad, in Israel, became the leading voices on that recognition paradigm. The argument has been institutional as much as scientific.
3. The interpersonal baseline
Any honest comparison starts with the alternative on offer when the polygraph is not used: unaided human judgment. The canonical meta-analysis put average accuracy at roughly 54%[10]. Trained, professionally interested observers do no better than laypeople; their experience mainly changes the shape of their errors, tilting them toward a lie bias[11][12].
The benchmark to beat
The relevant policy question is almost never “is the polygraph perfect?” It is “is the polygraph better than the judgment that would otherwise be made in its place?” 54% is the benchmark every credibility-assessment technology must beat. An instrument meaningfully better than that is adding information to a decision even if it falls a long way short of certainty.
4. What a Comparison Question Test measures, and how it is scored
The CQT records several autonomic channels simultaneously: thoracic and abdominal respiration, electrodermal activity, relative blood pressure and pulse via a cardio cuff, and, in modern instruments, peripheral vasomotor activity via a fingertip photoplethysmograph. None of these is a “lie response.” There is no physiological signature of deception as such; what the instrument measures is differential reactivity to different kinds of question.
A CQT interleaves two categories. Relevant questions address the matter directly (“Did you take the missing money?”). Comparison questions are built during the pre-test interview so that every examinee is left at least uncertain whether their answer is fully truthful (“Before the age of 25, did you ever take something that wasn’t yours?”). The governing logic is that a deceptive examinee reacts more strongly to the relevant questions, a truthful examinee to the comparison questions[13].
Probable-lie versus directed-lie comparisons
The older probable-lie comparison (PLC) manoeuvres the examinee into a denial the examiner believes is probably false. The newer directed-lie comparison (DLC) simply instructs the examinee to answer a generic question untruthfully. The DLC is more standardized and less dependent on examiner manipulation — one reason the Utah group favours it — and it costs nothing in accuracy (see §10).
Scoring has moved far from impressionistic “global” reading. The Utah numerical system scores each channel, at each relevant-versus-comparison pairing, on a seven-position scale from −3 to +3; the totals are compared against thresholds for one of three calls[14]. Computerized algorithms (OSS-3, ESS) now do much the same with explicit statistical models and agree closely with competent human scorers. The output vocabulary is precise: in single-issue testing, Deception Indicated (DI) / No Deception Indicated (NDI); in screening, Significant Response (SR) / No Significant Response (NSR); and, in both, Inconclusive (INC). For a plain-English walkthrough of what an examinee actually experiences on the day, see What Is a Polygraph Test?
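The three-way decision logic can be sketched in a few lines of Python. This is a hypothetical illustration, not the published Utah rules: the sign convention, the cutoffs of −6 and +6 on the grand total, the channel names and the scores are all assumptions chosen for the example, and operational protocols specify their own cutoffs and decision rules.

```python
# Hypothetical sketch of three-way numerical scoring in the spirit of the
# Utah system. ASSUMPTIONS (illustrative, not a protocol):
#   * negative scores mean a stronger reaction to the relevant question
#     (toward deception), positive scores a stronger reaction to the
#     comparison question (toward truthfulness);
#   * the grand total is compared against symmetric cutoffs of -6 / +6.
from typing import Dict, List

ScoreSheet = Dict[str, List[int]]  # channel name -> one score per R/C pairing


def grand_total(sheet: ScoreSheet) -> int:
    """Sum every channel score after checking each lies on the -3..+3 scale."""
    total = 0
    for channel, scores in sheet.items():
        for s in scores:
            if not -3 <= s <= 3:
                raise ValueError(f"{channel}: score {s} outside -3..+3")
            total += s
    return total


def call(total: int, cutoff: int = 6) -> str:
    """Map a grand total to DI, NDI or INC using symmetric cutoffs."""
    if total <= -cutoff:
        return "DI"
    if total >= cutoff:
        return "NDI"
    return "INC"


# Invented scores for three relevant/comparison pairings.
example: ScoreSheet = {
    "respiration": [1, 0, 1],
    "electrodermal": [2, 1, 2],
    "cardio": [1, 1, 0],
    "vasomotor": [0, 1, 0],
}
total = grand_total(example)
print(total, call(total))
```

With the invented scores the grand total is +10, beyond the assumed +6 cutoff, so the sketch returns NDI; any total between −5 and +5 would come out Inconclusive.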
5. Two paradigms, two literatures
“The polygraph” names two qualitatively different things. The CQT is a deception test. The Concealed Information Test (CIT; historically the Guilty Knowledge Test) is a recognition test: it asks whether the examinee’s nervous system reveals knowledge only a guilty party could possess[15][16]. Each paradigm has its own validity literature; they should never be cited interchangeably.
| | Comparison Question Test (CQT) | Concealed Information Test (CIT) |
|---|---|---|
| What it asks | Are you lying about the matter at hand? | Does your body reveal guilty knowledge? |
| Mechanism | Differential reactivity to relevant vs comparison questions | Orienting response to recognized crime detail |
| Protects the innocent by design? | No (specificity weaker than sensitivity) | Yes (false-positive rate controllable by adding items) |
| Main limitation | Requires a defensible comparison-question structure | Requires undisclosed crime details — usually unavailable |
| Field use | Dominant worldwide | Rare outside Japan |
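The table's claim that the CIT's false-positive rate is controllable by adding items follows from elementary probability. The sketch below assumes an idealised innocent examinee whose strongest response lands on the true crime detail purely by chance (probability 1/n with n equally plausible alternatives, independently per question), and a hypothetical rule that flags the examinee when this happens on at least m of k questions. It models specificity only and says nothing about the CIT's weaker sensitivity.

```python
# Sketch of why adding CIT items controls the false-positive rate.
# ASSUMPTIONS (simplified model, hypothetical decision rule):
#   * each question offers n equally plausible alternatives, one of them
#     the true crime detail;
#   * for an innocent examinee, the strongest response lands on the crime
#     detail purely by chance (probability 1/n), independently per question;
#   * the examinee is flagged if that happens on at least m of k questions.
from math import comb


def innocent_flag_probability(k: int, n: int, m: int) -> float:
    """Binomial tail: P(at least m chance 'hits' out of k questions)."""
    p = 1 / n
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(m, k + 1))


for k in (1, 3, 6):
    # Strictest rule: flag only if every question points at the crime detail.
    print(k, f"{innocent_flag_probability(k, n=5, m=k):.6f}")
```

With five alternatives per question and the strictest rule (m = k), the chance of flagging an innocent examinee falls from 0.2 with one question to 0.008 with three and 0.000064 with six.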
The CIT’s laboratory record is, on its own terms, very strong — arguably stronger than the CQT’s. Ben-Shakhar and Elaad’s meta-analysis pooled 169 conditions from 80 studies (5,198 participants) and found d ≈ 1.55 overall, rising to d ≈ 3.12 (AUC ≈ .95) under optimal conditions[17]. Later work extending the analysis to respiration, heart rate and the P300 brain potential confirmed the picture[18].
Its weaknesses are equally real. Sensitivity to the guilty is lower than specificity to the innocent — guilty examinees sometimes fail to react, from poor encoding, the passage of time, or individual differences[19]. More fundamentally, the CIT can run only when the investigation possesses crime details kept from the public and the suspect — the exception, not the rule. The headline accuracy debate is therefore necessarily about the CQT.
6. What the meta-analyses actually say
If you want the headline accuracy figures by test type without the full apparatus, see How Accurate Is a Polygraph? What follows is the underlying evidence in detail.
6.1 APA 2011 — the canonical numbers
The American Polygraph Association’s 2011 meta-analytic survey remains the most-cited single source. It coded 45 samples from 38 studies — 11,737 scored results across 3,723 examinations[20].
| Technique class | Aggregate accuracy (excl. inconclusives) | 95% CI | Inconclusive rate |
|---|---|---|---|
| Event-specific single-issue (diagnostic) | .890 | .829–.951 | .110 |
| Multiple-issue / screening (DLST, LEPET, TES) | .850 | .773–.926 | .125 |
| All validated techniques aggregated | .869 | .798–.940 | .128 |
The available data are sufficient to describe the polygraph as highly accurate, but insufficient to support any claim that it delivers perfect or near-perfect accuracy.
— the defensible one-sentence summary of the APA 2011 findings
For a detailed reading of one influential synthesis that pulls these statistical threads together, see our summary of Nelson (2015), Scientific Basis for Polygraph Testing.
6.2 Kircher, Horowitz & Raskin 1988 — the antecedent
The 2011 survey did not appear from nowhere. Kircher, Horowitz and Raskin’s 1988 meta-analysis of mock-crime CQT studies established both the method and the metric — the Detection Efficiency Coefficient — that the modern literature still uses, and reported large effects two decades before the APA consolidated the field[21].
6.3 Honts, Thurber & Handler 2021 — the comprehensive recent meta-analysis
The most recent comprehensive treatment used deliberately broad inclusion criteria (the authors preferred to risk the charge of over-inclusion rather than that of cherry-picking), capturing 138 datasets and 11,053 independent CQT decisions, with a pooled rdec of .69 [.66, .79][22]. That converts to a Cohen’s d of about 1.92, an AUC of about .91, and a Cohen’s U₃ of .973: the median of the innocent distribution sits above 97.3% of the guilty distribution.
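The conversions in the previous paragraph can be checked with a short Python sketch. It assumes a textbook model that the meta-analysis may not have used in exactly this form: rdec is treated as a point-biserial correlation with equal group sizes, and guilty and innocent scores as normal distributions with equal variance.

```python
# Sketch: converting a detection-efficiency correlation into d, AUC and U3.
# ASSUMPTIONS: rdec is treated as a point-biserial r with equal group sizes,
# and the guilty and innocent score distributions are normal with equal
# variance (the usual "binormal" textbook model).
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def r_to_d(r: float) -> float:
    """Cohen's d from a point-biserial r (equal-n approximation)."""
    return 2 * r / sqrt(1 - r * r)


def d_to_auc(d: float) -> float:
    """AUC for two equal-variance normal distributions separated by d."""
    return STD_NORMAL.cdf(d / sqrt(2))


def d_to_u3(d: float) -> float:
    """Cohen's U3: share of one distribution lying below the other's mean."""
    return STD_NORMAL.cdf(d)


d = r_to_d(0.69)
print(f"d = {d:.2f}, AUC = {d_to_auc(d):.3f}, U3 = {d_to_u3(d):.3f}")
```

Starting from rdec = .69 it gives d ≈ 1.91, AUC ≈ .91 and U₃ ≈ .97, consistent with the quoted figures to within the rounding of the published coefficient.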
Key numbers — the median sample of 998 decisions
- 91.6% of deceptive calls correct (excl. inconclusives)
- 78.9% of truthful calls correct
- ~86% of decisions overall correct
- 18.3% / 10% inconclusive rate (innocent / guilty)
The asymmetric inconclusive rate means an inconclusive result itself carries mild information toward truthfulness. The recurring theme: the test is consistently better at catching the guilty than at clearing the innocent.
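Both the ~86% overall figure and the claim about inconclusives can be approximated with a few lines of Bayes' rule. The sketch assumes that 91.6% and 78.9% are the shares of guilty and innocent examinees correctly classified among decided cases, and that the pool contains equal numbers of guilty and innocent examinees; neither assumption is taken from the paper.

```python
# Sketch: what the 2021 headline rates imply, under stated assumptions.
# ASSUMPTIONS: 91.6% / 78.9% are the shares of guilty / innocent examinees
# correctly classified among decided (non-inconclusive) cases, and the
# pool holds equal numbers of guilty and innocent examinees.
CORRECT_GUILTY, CORRECT_INNOCENT = 0.916, 0.789
INC_GUILTY, INC_INNOCENT = 0.10, 0.183


def overall_accuracy(share_guilty: float = 0.5) -> float:
    """Accuracy over decided cases for a given mix of guilty examinees."""
    decided_guilty = share_guilty * (1 - INC_GUILTY)
    decided_innocent = (1 - share_guilty) * (1 - INC_INNOCENT)
    correct = decided_guilty * CORRECT_GUILTY + decided_innocent * CORRECT_INNOCENT
    return correct / (decided_guilty + decided_innocent)


def p_innocent_given_inconclusive(prior_innocent: float = 0.5) -> float:
    """Bayes' rule applied to an inconclusive outcome."""
    num = INC_INNOCENT * prior_innocent
    return num / (num + INC_GUILTY * (1 - prior_innocent))


print(f"overall accuracy         : {overall_accuracy():.3f}")
print(f"likelihood ratio of INC  : {INC_INNOCENT / INC_GUILTY:.2f}")
print(f"P(innocent | INC), 50/50 : {p_innocent_given_inconclusive():.3f}")
```

Under those assumptions overall accuracy comes out at about .856, and an inconclusive result carries a likelihood ratio of about 1.83 toward innocence, moving a 50% prior to roughly 65%: informative, but only mildly.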
7. Field versus laboratory
For forty years the strongest card in the sceptics’ hand was the worry that low-stakes laboratory mock crimes would not generalise to real suspects facing real consequences. The data did not cooperate with that fear — or with its opposite.
Key finding — field outperformed laboratory
Field studies returned rdec = .76 [.71, .81] — higher than laboratory studies at .64 [.60, .67]. Motivation behaved as a smooth linear moderator (Q = 333.15, p < .001), not the on/off switch the critics’ position requires[23]. Separating field from laboratory collapsed most other moderator effects.
The “confession criterion” critique (that field studies look good only because guilt is verified using confessions the polygraph itself helped extract) gets a decisive test. Two field studies derived the guilt criterion algorithmically rather than from confession, and returned rdec values of .80 and .72[24][25]. If the confession criterion were the source of the field numbers, removing it should have collapsed them. It did not.
8. NRC 2003, read carefully
The National Research Council’s report is the source critics cite most, and it deserves to be read on its own terms. Quantitatively, it is not hostile to the instrument: it reported an AUC of about .86 for laboratory studies and about .89 for field studies, the field figure corresponding to a Cohen’s d of roughly 1.74, comfortably in the “large effect” range[26]. On the numbers, the NRC and the APA are not far apart.
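The AUC-to-d conversion behind that comparison can be sketched as follows, assuming equal-variance normal score distributions (under which AUC = Φ(d/√2)). The NRC report is not claimed to have used this exact model.

```python
# Sketch: converting an AUC back into Cohen's d.
# ASSUMPTION: equal-variance normal score distributions, under which
# AUC = Phi(d / sqrt(2)) and therefore d = sqrt(2) * Phi^-1(AUC).
from math import sqrt
from statistics import NormalDist


def auc_to_d(auc: float) -> float:
    """Cohen's d implied by an AUC under the equal-variance binormal model."""
    if not 0.5 <= auc < 1.0:
        raise ValueError("AUC must be in [0.5, 1.0) for a finite, non-negative d")
    return sqrt(2) * NormalDist().inv_cdf(auc)


for label, auc in (("laboratory", 0.86), ("field", 0.89)):
    print(f"{label:10s} AUC {auc:.2f} -> d = {auc_to_d(auc):.2f}")
```

Under this model the field AUC of .89 corresponds to d ≈ 1.73, in line with the roughly 1.74 quoted above, and the laboratory AUC of .86 to d ≈ 1.53. Both sit well above Cohen's conventional 0.8 threshold for a large effect.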
The gap between the NRC’s data and its rhetoric
The report’s qualitative reservations were gloomier than its own effect sizes warranted — and they were aimed at security screening at very low base rates, where even a good test produces an unacceptable absolute number of false positives. That conclusion is defensible. The problem is that it has been quoted ever since as though it were a verdict on event-specific single-issue testing, which it was not.
9. How to use an accuracy figure: base rate and Information Gain
Accuracy in the meta-analyses is computed across pooled samples. A real examiner, judge or employer cares about something narrower: the value of this result for this person. At a 1% base rate of guilt, even a test with 90% sensitivity and 90% specificity generates far more false positives than true positives, simply because there are so many more innocent people to misclassify. This is not peculiar to the polygraph (it is true of every diagnostic test), and it is exactly why screening at low base rates is so treacherous.
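A worked example makes the arithmetic explicit. The sketch assumes, for illustration, that a 90% accurate test means 90% sensitivity and 90% specificity, and uses a hypothetical population of 10,000.

```python
# Worked example of the base-rate problem.
# ASSUMPTION: "90% accurate" is read as 90% sensitivity and 90% specificity.
def screening_outcomes(population: int, base_rate: float,
                       sensitivity: float, specificity: float):
    """Expected true positives, false positives and positive predictive value."""
    guilty = population * base_rate
    innocent = population - guilty
    true_pos = guilty * sensitivity
    false_pos = innocent * (1 - specificity)
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


tp, fp, ppv = screening_outcomes(10_000, 0.01, 0.90, 0.90)
print(f"true positives ~{tp:.0f}, false positives ~{fp:.0f}, PPV = {ppv:.1%}")
```

Of roughly 1,080 positive results, about 990 belong to innocent examinees, so only about 8% of positives are correct. The same test applied where half the examinees are guilty gives a positive predictive value of 90%.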
The standard answer in the research literature is Information Gain (IG): rather than asking “what is the probability of guilt given the result?”, it asks “how much did this result reduce our uncertainty, relative to where we started?”[27][28].
Honts, Thurber and Handler ran the IG analysis across the full base-rate range. Deceptive IG peaked at 0.37 at a 32% base rate; truthful IG at 0.48 at a 78% base rate. The CQT outperformed interpersonal detection across essentially the entire usable range[29].
| Base rate of guilt | Outcome | CQT Information Gain | Layperson IG | Ratio |
|---|---|---|---|---|
| 66% (charged populations, US state courts) | Truthful | ~.47 | ~.05 | ~9× |
| | Deceptive | ~.24 | ~.06 | ~4× |
| 33% (three-suspect pool) | Truthful | ~.28 | ~.04 | ~7× |
| | Deceptive | ~.37 | ~.06 | ~6× |
The lesson is a rule of interpretation, not a single headline: the direction of the result and the base rate of guilt together determine how much weight it deserves. A passed test in a high-prior population and a failed test in a low-prior population are both highly informative; the same results in the opposite settings tell you much less. A polygraph result is evidence to be weighed, never a verdict to be pronounced.
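The CQT column of the table can be approximated with a short Python sketch. It rests on assumptions introduced here, not taken from the paper: IG is taken as the posterior probability of the hypothesis a result points to (guilt for a deceptive call, innocence for a truthful call) minus its prior; the per-group rates are the 2021 figures quoted in §6.3 (91.6% and 78.9% correct among decided cases, 10% and 18.3% inconclusive for guilty and innocent examinees); and inconclusives count as no call. It does not model the layperson column.

```python
# Sketch: Information Gain (IG) of a CQT call across base rates of guilt.
# ASSUMPTIONS (introduced here, not taken from the source papers):
#   * IG = posterior - prior for the hypothesis the call points to;
#   * rates from the 2021 figures: 91.6% / 78.9% correct among decided cases,
#     10% / 18.3% inconclusive for guilty / innocent examinees;
#   * inconclusives are "no call", so they stay in the denominators.
INC_GUILTY, INC_INNOCENT = 0.10, 0.183
P_DI_GIVEN_GUILTY = 0.916 * (1 - INC_GUILTY)            # correct deceptive call
P_DI_GIVEN_INNOCENT = (1 - 0.789) * (1 - INC_INNOCENT)  # false positive
P_NDI_GIVEN_INNOCENT = 0.789 * (1 - INC_INNOCENT)       # correct truthful call
P_NDI_GIVEN_GUILTY = (1 - 0.916) * (1 - INC_GUILTY)     # false negative


def posterior(prior: float, p_if_true: float, p_if_false: float) -> float:
    """Bayes' rule for a binary hypothesis after observing one result."""
    joint = p_if_true * prior
    return joint / (joint + p_if_false * (1 - prior))


def information_gain(base_rate: float, outcome: str) -> float:
    """IG of a 'deceptive' or 'truthful' call at a given base rate of guilt."""
    if outcome == "deceptive":
        prior = base_rate  # hypothesis: guilty
        post = posterior(prior, P_DI_GIVEN_GUILTY, P_DI_GIVEN_INNOCENT)
    elif outcome == "truthful":
        prior = 1 - base_rate  # hypothesis: innocent
        post = posterior(prior, P_NDI_GIVEN_INNOCENT, P_NDI_GIVEN_GUILTY)
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return post - prior


for rate in (0.66, 0.33):
    for outcome in ("truthful", "deceptive"):
        print(f"base rate {rate:.0%}, {outcome:9s}: IG = {information_gain(rate, outcome):.2f}")
```

Under these assumptions the four values land within about .01 of the table's CQT column (.47, .24, .28, .37); treat it as a reconstruction of the logic, not the authors' computation.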
10. The standard critiques, examined
10.1 The Iacono “thought experiment”
For three decades the strongest single sceptical argument was a thought experiment that assumes a polygraph operating at pure chance, then reasons backwards to conditions under which a confession-criterion field study could nevertheless report high accuracy — concluding that the field literature is worthless[30][31].
Why the thought experiment fails
Honts and Thurber dismantle it on two fronts. First, its assumptions fail against evidence: no CQT study has ever reported chance accuracy; the assumed 20% confession-after-failure rate is arbitrary (one DoD programme reported relevant admissions in over 90% of failed exams); and the premise that the polygraph is the only source of guilt information was tested directly, with Honts (1996) finding no accuracy difference between confession-confirmed and independently confirmed cases[32]. Second, its central empirical prediction is simply false: exoneration records do not show innocents failing at high rates[33][34].
10.2 “There is no theory”
The objection that the CQT should not be used because no complete, agreed theory explains why it works meets two responses. First, scientific maturity and clinical usefulness are not the same thing — aspirin was in worldwide use for the better part of a century before its mechanism was understood[35]. Second, the premise is outdated: several theoretical accounts now compete — Ginton’s Relevant-Issue-Gravity model[36], the differential-salience account[37], and Honts’s adaptation of the cognitive-load framework[38]. The meta-analytic evidence also disposes of the most popular folk theory — that fear of detection is the necessary mechanism — because the CQT discriminated strongly in laboratory studies carrying no explicit motivational stakes[39].
10.3 Probable-lie versus directed-lie comparison
Is the directed-lie comparison as valid as the older probable-lie comparison? A 250-subject mock-crime experiment scored two ways — by the OSS2 algorithm and by human Utah scoring — found that test type had no significant effect on accuracy, and that spontaneous countermeasure attempts trended lower under the directed-lie format[40]. The more standardized, less manipulative format costs nothing in accuracy.
10.4 Countermeasures
Where the instrument is genuinely most vulnerable
Spontaneous countermeasures (invented by the examinee) are common but largely ineffective and detectable. Trained countermeasures — physical (pressing toes, biting the tongue) or mental (counting backwards) manoeuvres timed to the comparison questions — are a more serious matter: sustained expert coaching can degrade CQT accuracy, which is why countermeasure detection is now standard in examiner training and algorithmic scoring[41]. The honest position is neither that the test is countermeasure-proof nor that it is trivially beaten. See Can You Beat a Polygraph? and Countermeasures in UK Field Polygraph Practice for a fuller treatment.
11. Screening versus single-issue testing
The sharpest practical division in the field is between event-specific diagnostic testing and screening. A single-issue examination concerns one known event and one named suspect; the criterion variance is non-independent (every relevant question targets the same act, so the examinee is either truthful or deceptive to all of them), and this is the use case that produces the .89-level accuracy. Screening (TES, LEPET, DLST) asks several unrelated questions of someone against whom there is no specific allegation, at a low base rate; the criterion variance is independent (an examinee may be truthful to some questions and deceptive to others), accuracy is lower (the .85 figure), and the absolute false-positive problem is far worse[42]. This is the terrain on which the NRC’s pessimism is most justified.
A distinct application is post-conviction sex-offender testing (PCSOT), used to monitor compliance and elicit disclosure rather than determine guilt (see A Decade of Mandatory PCSOT in England and Wales). Its purpose — therapeutic and supervisory leverage — is different enough from forensic truth-finding that its results should not be pooled with diagnostic accuracy figures or read as verdicts.
12. The legal landscape
Law has mostly excluded the polygraph. Frye (1923) established “general acceptance,” which has remained elusive. Daubert v. Merrell Dow Pharmaceuticals (1993) replaced Frye in federal courts with a flexible reliability inquiry — which in principle opened a door, but most jurisdictions kept it shut[43]. In United States v. Scheffer (1998) the Supreme Court upheld a per se exclusion of polygraph evidence in the military, leaning on the absence of scientific consensus[44].
Employment law moved decisively. The Employee Polygraph Protection Act of 1988 ended roughly one million annual private-sector screening exams, prohibiting most private employers from requiring lie-detector tests — with exemptions (government; certain security and pharmaceutical roles; ongoing investigation of a specific economic loss) and a critical safeguard: a polygraph result may not be the sole basis for an adverse employment action[45].
The instructive exception: New Mexico
Under New Mexico Rule of Evidence 11-707, polygraph results have been admissible in that state’s courts since 1975, reaffirmed under Daubert in Lee v. Martinez (2004)[46]. What makes the regime worth studying is the conditions attached: full disclosure of all data and recordings at least 30 days before the proceeding, and of every polygraph the examinee has ever taken. Transparency substitutes for prohibition — a voluntary model the profession could adopt without waiting for legislatures.
13. What the evidence does not support — in both directions
Several propositions common in public discussion fail against the evidence: that the CQT is no more accurate than chance; that the confession criterion inflates field accuracy; that laboratory studies dramatically overstate field performance; and that no theory exists for how the CQT works.
Equally, several propositions common in professional settings overstate what is established. The APA 2011 committee noted that while sensitivity to deception is significantly above chance for every validated technique, the evidence for specificity rests on a thinner literature[47]. And because individual studies used different scoring thresholds, pooling their sensitivities and specificities is statistically unjustified[48][49][50]. The favourable verdict survives these cautions, but a practitioner who ignores them is overclaiming.
14. Two structural problems worth naming
Two field-practice problems routinely pull real-world accuracy below the meta-analytic ceiling, and both are about systems rather than physiology.
The first is decision policy. An instrument does not decide anything; a policy decides, using its output. One US FBI screening policy engineered to minimize false negatives — including by treating inconclusive outcomes as deceptive — left only about 17% of genuinely innocent examinees avoiding interrogation[51]. The same charts with a balanced cutoff would not expose innocent people to anything like that risk.
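The point that a policy, not the instrument, sets the error trade-off can be illustrated with a toy model. The sketch assumes normal, equal-variance score distributions for innocent and guilty examinees separated by d = 1.9 (roughly the pooled 2021 effect size), ignores inconclusives, and compares a balanced cutoff with a hypothetical strict one that lets only 1% of guilty examinees pass. It does not attempt to reproduce the FBI figure.

```python
# Hypothetical illustration: one set of score distributions, two policies.
# ASSUMPTIONS: scores for innocent and guilty examinees are normal with
# equal variance, separated by d = 1.9 (roughly the pooled effect size in
# section 6.3); higher scores look more truthful; inconclusives are ignored.
from statistics import NormalDist
from typing import Tuple

D = 1.9
INNOCENT = NormalDist(mu=D / 2, sigma=1.0)
GUILTY = NormalDist(mu=-D / 2, sigma=1.0)


def pass_rates(cutoff: float) -> Tuple[float, float]:
    """Share of innocent and guilty examinees scoring above the cutoff."""
    return 1 - INNOCENT.cdf(cutoff), 1 - GUILTY.cdf(cutoff)


balanced_cutoff = 0.0                 # midway between the two means
strict_cutoff = GUILTY.inv_cdf(0.99)  # only 1% of guilty examinees pass

for name, cutoff in (("balanced", balanced_cutoff), ("strict", strict_cutoff)):
    innocent_pass, guilty_pass = pass_rates(cutoff)
    print(f"{name:8s} innocent pass {innocent_pass:.0%}, guilty pass {guilty_pass:.0%}")
```

With identical distributions, the balanced cutoff passes about 83% of innocent examinees, while the strict cutoff passes only about a third of them. The charts have not changed; only the policy has.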
The second is regulatory fragmentation. Only 26 of the 50 US states license polygraph examiners; ethical standards are largely advisory and carry no force over non-members; unaccredited schools still train examiners[52]. The remedy most often proposed is not prohibition but transparency — the New Mexico model. The British Polygraph Society’s own Standards of Practice and Code of Ethics reflect the same emphasis on disclosure and on not overstating a single result.
15. The bottom line
Read at face value, the modern research literature supports a small number of propositions that can be stated without hedging. The Comparison Question Test discriminates truthful from deceptive examinees substantially better than chance and far better than any unaided human judgment; the headline accuracies in APA 2011 (.85–.89) and in Honts, Thurber and Handler 2021 (rdec .69, AUC .91) are mutually consistent. The field-versus-laboratory dispute has been settled in favour of field performance. The confession-criterion concern is not supported by the data. The thought-experiment critique is falsified by exoneration records. And the Concealed Information Test, where it can be run, is a still more accurate recognition method, structurally protective of the innocent.
At the same time, the test is not perfectly accurate and never will be; its specificity literature is thinner than its sensitivity literature; screening at low base rates is genuinely treacherous in a way single-issue testing is not; trained countermeasures pose a real if surmountable threat; and real-world accuracy depends as much on an agency’s decision policy and the regulatory environment as on the instrument in the examinee’s chair.
The evidence, in one breath
Highly accurate but not perfect; informative across nearly the entire base-rate range; only as good as the policy that consumes its outputs; and a different proposition entirely depending on whether one means single-issue diagnostic testing, security screening, or recognition testing. That summary is also the position most likely to survive cross-examination.
16. Historical timeline
17. Glossary
- CQT: Comparison Question Test. A deception test comparing reactions to relevant and comparison questions.
- CIT: Concealed Information Test (formerly Guilty Knowledge Test). A recognition test for knowledge only a guilty party could hold.
- PLC / DLC: Probable-Lie / Directed-Lie Comparison. Two ways of constructing the comparison question; the DLC instructs an untruthful answer to a generic question.
- DI / NDI: Deception Indicated / No Deception Indicated. The two categorical calls in single-issue testing.
- SR / NSR: Significant Response / No Significant Response. The screening equivalents of DI / NDI.
- Inconclusive (INC): A formal third category used when data do not meet the threshold for either call.
- Sensitivity / Specificity: The proportion of guilty correctly identified (sensitivity) and innocent correctly cleared (specificity).
- rdec: Detection Efficiency Coefficient. The effect-size metric used across the CQT meta-analytic literature; counts inconclusives as errors.
- AUC: Area Under the ROC Curve. A threshold-independent measure of discrimination; .5 is chance, 1.0 is perfect.
- Base rate: The prevalence of guilt in the population being tested. Predictive values shift with base rate; sensitivity and specificity do not.
- Information Gain (IG): How much a result reduces prior uncertainty about the truth, relative to the starting point.
- PCSOT: Post-Conviction Sex Offender Testing. Polygraph used for supervision and disclosure rather than to determine guilt.
- EDA: Electrodermal activity. Skin-conductance changes reflecting sympathetic arousal; a robust scoring channel.
Frequently asked questions
Does the polygraph actually work?
Yes, in the sense that matters for a decision: it discriminates truthful from deceptive examinees far better than unaided human judgment, which sits at about 54% — barely above chance. It does not “detect lies” directly; it measures differential physiological reactivity and applies statistical decision rules. It is highly accurate but never perfect.
How accurate is a polygraph test?
For event-specific, single-issue examinations, the validated accuracy is around 89% (APA 2011), and the most comprehensive recent meta-analysis reports an AUC of about .91 (Honts, Thurber & Handler, 2021). Multiple-issue screening is lower (around 85%) and far more error-prone at low base rates of guilt.
Can you beat a polygraph with countermeasures?
Spontaneous countermeasures are common but largely ineffective and detectable. Trained, expert-coached countermeasures can degrade accuracy, which is why countermeasure detection is now built into examiner training and algorithmic scoring. Defeating a competently administered, countermeasure-aware test is harder than popular accounts suggest, and the attempt itself leaves traces.
Is a polygraph reliable enough to clear an innocent person?
This is its weaker side. The evidence for specificity — correctly clearing the truthful — is thinner than the evidence for catching the guilty, and a result should never be the sole basis for a decision. How much weight a single result deserves depends on its direction and on the base rate of guilt in that setting.
18. Sources (numbered references)
Sources are listed in the order of first citation. Each inline footnote links to the source entry here.
- Bond, C. F. & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214–234.
- American Polygraph Association (2011). Meta-analytic survey of criterion accuracy of validated polygraph techniques. Polygraph, 40(4), 194–305.
- Honts, C. R., Thurber, S. & Handler, M. (2021). A comprehensive meta-analysis of the comparison question polygraph test. Applied Cognitive Psychology, 35(2), 411–427.
- Honts, C. R. & Thurber, S. (2019). Analyzing Iacono’s thought experiment about polygraph field studies: Reason or fantasy? Polygraph & Forensic Credibility Assessment, 48(2), 75–83.
- Ben-Shakhar, G. & Elaad, E. (2003). The validity of psychophysiological detection of information with the Guilty Knowledge Test: A meta-analytic review. Journal of Applied Psychology, 88(1), 131–151.
- National Research Council (2003). The Polygraph and Lie Detection. Washington, DC: National Academies Press.
- Frye v. United States, 293 F. 1013 (D.C. Cir. 1923).
- Hartwig, M. & Bond, C. F. (2011). Why do lie-catchers fail? A lens-model meta-analysis of human lie judgments. Psychological Bulletin, 137(4), 643–659. See also Hartwig & Bond (2014), Applied Cognitive Psychology, 28(5), 661–676.
- Vrij, A. (2008). Detecting Lies and Deceit: Pitfalls and Opportunities (2nd ed.). Chichester: Wiley.
- Raskin, D. C. & Honts, C. R. (2002). The comparison question test. In M. Kleiner (Ed.), Handbook of Polygraph Testing (pp. 1–48). Academic Press.
- Bell, B. G., Raskin, D. C., Honts, C. R. & Kircher, J. C. (1999). The Utah numerical scoring system. Polygraph, 28(1), 1–9.
- Lykken, D. T. (1959). The GSR in the detection of guilt. Journal of Applied Psychology, 43, 385–388.
- Ben-Shakhar, G. (1977). A further study of the dichotomisation theory in detection of information. Psychophysiology, 14, 408–413.
- Meijer, E. H., Klein Selle, N., Elber, L. & Ben-Shakhar, G. (2014). Memory detection with the Concealed Information Test: A meta-analysis of skin conductance, respiration, heart rate, and P300 data. Psychophysiology, 51(9), 879–904.
- Elaad, E. (1990). Detection of guilty knowledge in real-life criminal investigations. Journal of Applied Psychology, 75(5), 521–529.
- Kircher, J. C., Horowitz, S. W. & Raskin, D. C. (1988). Meta-analysis of mock-crime studies of the control question polygraph technique. Law and Human Behavior, 12(1), 79–90.
- Ginton, A. (2013). A non-standard method for estimating accuracy of lie detection techniques demonstrated on a self-validating set of field polygraph examinations. Psychology, Crime & Law, 19(7), 577–594.
- Mao, Y., Liang, Y. & Hu, Z. (2014). Accuracy rate of lie-detection in China: Estimate the validity of CQT on field cases. Physiology & Behavior, 140, 104–110.
- Wells, G. L. & Olson, E. A. (2002). Eyewitness identification: Information gain from incriminating and exonerating behaviors. Journal of Experimental Psychology: Applied, 8(3), 155–167.
- Honts, C. R. & Schweinle, W. (2009). Information gain of psychophysiological detection of deception in forensic and screening settings. Applied Psychophysiology and Biofeedback, 34(3), 161–172.
- Iacono, W. G. (1991). Can we determine the accuracy of polygraph tests? In J. R. Jennings, P. K. Ackles & M. G. H. Coles (Eds.), Advances in Psychophysiology (Vol. 4, pp. 201–207). Jessica Kingsley.
- Iacono, W. G. & Ben-Shakhar, G. (2019). Current status of forensic lie detection with the comparison question test: An update of the 2003 National Academy of Sciences report on polygraph testing. Law and Human Behavior, 43(1), 86–98.
- Honts, C. R. (1996). Criterion development and validity of the control question test in field application. The Journal of General Psychology, 123, 309–324.
- Bonpasse, M. (2013). Polygraph and 215 wrongful conviction exonerations. Polygraph, 42(2), 112–127.
- Honts, C. R. & Reavy, R. (2015). The comparison question polygraph test: A contrast of methods and scoring. Physiology & Behavior, 143, 15–26.
- Ginton, A. (2009). Relevant Issue Gravity (RIG) strength — a new concept in PDD that reframes the notion of psychological set and the role of attention in the CQT polygraph examination. Polygraph, 38(3), 204–217.
- Senter, S., Weatherman, D., Krapohl, D. & Horvath, F. (2010). Psychological set or differential salience: A proposal for reconciling theory and terminology in polygraph testing. Polygraph, 39(2), 109–117.
- Honts, C. R. (2014). Countermeasures and credibility assessment. In D. C. Raskin, C. R. Honts & J. C. Kircher (Eds.), Credibility Assessment: Scientific Research and Applications (pp. 131–158). Academic Press.
- Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).
- United States v. Scheffer, 523 U.S. 303 (1998).
- Employee Polygraph Protection Act of 1988, Pub. L. No. 100–347, 29 U.S.C. §§ 2001–2009.
- Lee v. Martinez, 136 N.M. 166, 96 P.3d 291 (2004).
- Hand, D. J. (2009). Measuring classifier performance: A coherent alternative to the area under the ROC curve. Machine Learning, 77(1), 103–123.
- Jones, C. M. & Athanasiou, T. (2005). Summary receiver operating characteristic curve analysis techniques in the evaluation of diagnostic tests. The Annals of Thoracic Surgery, 79(1), 16–20.
- Honts, C. R. (2017). Current FBI polygraph practices put the innocent at high risk of wrongful accusation, interrogation, and false confession. Paper presented at the American Psychology–Law Society meeting, Seattle, WA.
This article was prepared for the British Polygraph Society as a reading of the peer-reviewed evidence. Readers making professional, legal, or regulatory use of the material should consult the original sources. The inline fact-check popovers are intended to make the evidence transparent, not to substitute for the source papers.