I. Psychology of False Perception
Why we hear things that aren’t there

Top-Down Processing
The brain doesn’t just listen: it constructs perception
- When audio is unclear, the brain fills in gaps
- External suggestions become “heard” speech
- This process is subconscious and involuntary
- Even experts cannot resist it through willpower

Human perception relies heavily on “top-down processing”: using prior knowledge and expectations to interpret ambiguous sensory input. When the acoustic signal is weak or noisy, the brain doesn’t simply report “unclear.” Instead, it actively constructs a perception based on what it expects to hear. This happens automatically and is immune to conscious effort to resist it. You cannot simply “try harder” to hear objectively.

Source: Fraser, H. (2021). Forensic Transcription: Legal and scientific perspectives. In Speaker individuality in phonetics and speech sciences.

Textual Priming
A transcript doesn’t assist perception: it creates perception
Once a listener sees a word on a page while listening to indistinct audio, they often hear that word “with their own ears”—even if the transcript is demonstrably wrong.
Textual priming is one of the most dangerous phenomena in forensic audio. When you read a transcript while listening to unclear audio, your brain uses those words to construct what you hear. Research shows that once primed, listeners will insist they heard specific words even when acoustic analysis proves those words were never spoken. The transcript becomes the perception.

Source: Fraser, H. & Stevenson, B. (2014). The power and persistence of contextual priming. International Journal of Evidence and Proof, 18(3), 205–229.

The “Crisis Call” Study
Condition | Heard “I shot the prick”
No transcript provided | 0%
After seeing transcript | 33%
Refused to change mind after being told transcript was wrong | Many
This study played an actual crisis call recording, one that had been admitted in a murder trial, to 190 participants. Without any transcript, zero participants heard the phrase “I shot the prick.” After being shown a transcript containing that phrase, one-third of participants heard exactly those words. Most disturbing: many refused to change their minds even after being told the transcript was fabricated. This demonstrates how powerfully transcripts shape perception.

Source: Fraser, H., Stevenson, B., & Marks, T. (2011). Interpretation of a crisis call: Persistence of a primed perception of a disputed utterance. International Journal of Speech Language and the Law, 18(2), 261–292.

Contextual Priming
Background knowledge about a case biases what we hear
- Knowing someone has a criminal history
- Believing a confession exists
- Having a theory of guilt
→ Listeners hear incriminating words in innocuous sounds
Context matters as much as transcripts. Research shows that when participants are told they’re listening to “criminal suspects,” they make significantly more incriminating misinterpretations. If an investigator believes a suspect is guilty, they will literally hear different words than an unbiased listener. This is not dishonesty; it’s how human perception works.

Source: Giroux, M. E. (2022). Confirmation Bias for Degraded Forensic Audio Evidence. PhD Thesis, Simon Fraser University.

Confirmation Bias
People interpret ambiguous audio to support existing hypotheses
- Evidence against the theory is discounted
- Contradictory acoustic evidence is ignored
- Initial interpretations become entrenched
- “Corroboration inflation” spreads bias to other evidence

Confirmation bias causes investigators to hear what they expect to hear. If you expect a specific suspect to answer a wiretapped phone, you’re significantly more likely to misidentify a different voice as that suspect. When a jury believes a recording contains a confession, this “confession” colors their interpretation of all other evidence. One piece of biased evidence contaminates the entire case.

Source: Giroux, M. E. (2022). Confirmation Bias for Degraded Forensic Audio Evidence. PhD Thesis, Simon Fraser University.

Fluency Misattribution
The ease of processing is mistaken for proof of accuracy
- Transcript makes unclear words easy to “hear”
- Brain experiences processing fluency
- Fluency feels like recognition
- Listener concludes: “That must be correct”

Fluency misattribution explains why transcripts feel so convincing. When a transcript makes it easy to “make out” words in noisy audio, the brain experiences conceptual fluency. This ease of processing creates a subjective sense of certainty. Listeners mistake the feeling of “getting it” for evidence that their interpretation is correct. Research shows reaction times directly mediate incriminating misinterpretations.

Sources:
Giroux, M. E. (2022). Confirmation Bias for Degraded Forensic Audio Evidence. PhD Thesis, Simon Fraser University.
Higham, P. A., Neil, G. J., & Bernstein, D. M. (2017). Auditory hindsight bias: Fluency misattribution versus memory reconstruction. Journal of Experimental Psychology: Human Perception and Performance, 43, 1143–1159.

The “Ad Hoc Expert” Problem
Courts allow police to provide transcripts because they’ve listened “many times”
The assumption: Repetitive listening confers expertise
The reality:
- Repetition entrains initial (often wrong) perceptions
- Investigators’ case knowledge primes their hearing
- Their “ability” to hear more is actually bias

Courts routinely treat police investigators as audio experts based solely on repeated listening. But research shows repetitive listening to poor audio does not improve accuracy. It does the opposite: it entrains and strengthens initial perceptions, right or wrong. The investigator’s familiarity with the case primes them to hear words that align with their theory. What appears to be expertise is actually contextual bias.

Source: French, P. & Fraser, H. (2018). Why “Ad Hoc Experts” Should Not Provide Transcripts of Indistinct Forensic Audio. Criminal Law Journal, 42(5), 298–302.

IV. Prevention Methods
Protocols and practices that can prevent wrongful convictions

Core Principle
Treat forensic audio as a scientific discipline, not “common knowledge”
- Formal methods, not intuition
- Independent analysis, not investigator transcripts
- Documented procedures, not ad hoc judgments
- Acknowledged uncertainty, not false confidence

The fundamental shift required is treating forensic audio as a scientific discipline. It should not be treated as common sense that anyone can apply. This means using formal, validated methods, not intuition. Analysis must be independent from the investigation, not produced by investigators. Procedures must be documented and uncertainty must be acknowledged.

Source: French, P. & Fraser, H. (2018). Why “Ad Hoc Experts” Should Not Provide Transcripts. Criminal Law Journal, 42(5), 298–302.

Linear Sequential Unmasking (LSU)
Analyze evidence before learning case context
Protocol:
- Examine raw audio in complete isolation from case information
- Document initial findings
- Only then reveal context in controlled stages
- Separate each stage of analysis

Linear Sequential Unmasking is a protocol designed to prevent contextual bias. The analyst examines raw audio evidence in total isolation from reference materials or case context. Initial findings are documented before any potentially contaminating information is revealed. Context is then revealed in controlled, iterative stages. This prevents case theory from shaping what the analyst “hears.”

Sources:
Dror, I. E., et al. (2015). Context Management Toolbox: A Linear Sequential Unmasking (LSU) Approach. Journal of Forensic Sciences, 60(4).
Krane, D. E., et al. (2008). Sequential unmasking: A means of minimizing observer effects. Journal of Forensic Sciences, 53(4).

LSU in Practice
Stage | Action
1 | Receive audio only, no case information
2 | Document what you hear independently
3 | Receive limited context (e.g., topic area)
4 | Revise analysis if warranted
5 | Receive full case context
6 | Final analysis with all biases documented
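One way a lab might enforce the stage order in software is an append-only log that refuses to record stages out of sequence. The sketch below is only an illustration: the stage names, the `UnmaskingLog` class and the `context_allowed` rule are hypothetical and are not taken from the cited LSU papers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stage names mirroring the six stages in the table above.
STAGES = (
    "audio_only",
    "independent_findings",
    "limited_context",
    "revised_analysis",
    "full_context",
    "final_analysis",
)


@dataclass
class UnmaskingLog:
    """Append-only record that rejects skipped or reordered stages."""

    entries: list = field(default_factory=list)

    def record(self, stage: str, note: str) -> None:
        if len(self.entries) >= len(STAGES):
            raise ValueError("all stages have already been recorded")
        expected = STAGES[len(self.entries)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        # Timestamp each entry so later review can see when context was revealed.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.entries.append((timestamp, stage, note))

    def context_allowed(self) -> bool:
        """Assumed rule: context is released only after independent findings exist."""
        return len(self.entries) >= 2


log = UnmaskingLog()
log.record("audio_only", "Received file; no case information supplied")
log.record("independent_findings", "Three spans marked inaudible")
```

Making the log append-only means a revision after context is revealed shows up as a new entry instead of overwriting the independent findings.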
In practice, LSU works as a staged process. First, the analyst receives only the audio with no case information. They document what they hear independently. Then limited context is provided, and the analysis is revised if needed. Full case context comes last. At each stage, any changes and potential biases are documented.

Source: Dror, I. E., et al. (2015). Context Management Toolbox: A Linear Sequential Unmasking (LSU) Approach. Journal of Forensic Sciences, 60(4).

Separation of Roles
Forensic analysis must be independent from investigation
- Analysts should not be employed by police
- Analysts should not know the “desired” outcome
- Accredited practitioners only
- Double-blind procedures for voice lineups

Forensic science should be conducted by accredited practitioners strictly separated from police investigators. Analysts should never know what outcome would help the prosecution. Voice identification lineups should be double-blind: neither the witness nor the administrator should know which recording belongs to the suspect. This prevents subtle cues from contaminating results.

Source: French, P. & Fraser, H. (2018). Why “Ad Hoc Experts” Should Not Provide Transcripts. Criminal Law Journal, 42(5), 298–302.

Transcription Protocols
End the practice of police-produced transcripts
Requirements:
- Independent transcribers employed in public service
- Isolation from specific investigation
- "[Inaudible]" for unclear words, no guessing
- Acoustic-phonetic verification of auditory findings

Courts should end the practice of allowing police investigators to provide transcripts. Transcripts should be produced by teams employed in public service who are entirely independent from the investigation. Words that are unclear should be marked as “inaudible” rather than guessed. Guessing creates “phantom speech” for the jury to follow. All auditory findings should be verified with acoustic analysis.

Source: Fraser, H. (2025). Submission to House of Lords Police Transcription Inquiry (Written Evidence POL0005).

Expert Requirements
Scientific recognition: Transcription is linguistic science, not legal precedent
Epistemic modesty: Acknowledge limits and potential for error
Core competencies:
- Track auditory acuity regularly
- Deep knowledge of EMI (electromagnetic interference) mitigation
- Signal path design expertise
- No statements of “absolute certainty”

Forensic transcription must be recognized as a branch of linguistic science governed by evidence-based methods. Experts must demonstrate epistemic modesty by acknowledging the limits of their knowledge and their potential for error. Statements of “absolute certainty” should never be made. Practitioners should regularly track their own auditory acuity. They need deep technical knowledge of electromagnetic interference mitigation and signal path design.

Source: SWGDE (2025). Core Competencies for Forensic Audio (Version 3.3).

Jury Instructions
Current instructions often fail: timing and language matter
Effective practices:
- Present instructions before evidence is played
- Use simple, explicit language
- State clearly: “Transcripts are not evidence”
- Warn that confidence ≠ accuracy
- Note unfamiliar accent/language unreliability

Research shows jury instructions are more effective when presented before the evidence rather than after. Instructions should use simple language and explicitly state that transcripts are not evidence; they are merely aids that jurors are free to reject. Juries must be warned that a witness’s confidence is a poor indicator of accuracy. They should be told that identifying speakers with unfamiliar accents or languages is notoriously unreliable.

Source: Fraser, H. (2021). Forensic Transcription: Legal and scientific perspectives.

Technical Standards
Audio format: Uncompressed PCM (WAV), 16-bit minimum, ≥16 kHz sampling
Hash verification: MD5 or SHA to confirm data integrity
Chain of custody: Document every transfer and access
Enhancement caution: “Clearer” audio is not necessarily more intelligible—can boost false transcript credibility
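The format and hash requirements listed above lend themselves to automated checks. The following is a minimal sketch using only the Python standard library; the function names and default thresholds are illustrative assumptions, and SHA-256 is chosen here as one member of the SHA family that the slide mentions.

```python
import hashlib
import wave


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hash(path, expected_hex):
    """Compare the current hash with the value logged at acquisition."""
    return sha256_of_file(path) == expected_hex.strip().lower()


def check_wav_format(path, min_bits=16, min_rate_hz=16_000):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        # The standard wave module reads uncompressed PCM WAV files only.
        with wave.open(path, "rb") as wav:
            bits = wav.getsampwidth() * 8
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return [f"not readable as uncompressed PCM WAV: {exc}"]
    problems = []
    if bits < min_bits:
        problems.append(f"sample size {bits}-bit is below {min_bits}-bit")
    if rate < min_rate_hz:
        problems.append(f"sample rate {rate} Hz is below {min_rate_hz} Hz")
    return problems
```

A custody workflow could call `sha256_of_file` at acquisition, store the value in the chain-of-custody record, and call `verify_hash` at each later transfer or access.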
Audio should be recorded and maintained as uncompressed PCM data with at least 16-bit samples and minimum 16 kHz sampling. Cryptographic hash functions should verify that no data has changed during transfer or storage. Chain of custody must be documented for every transfer and access. Courts should require objective evidence that enhancement actually improves intelligibility. “Clearer” sounding audio can paradoxically boost the credibility of an inaccurate transcript.

Sources:
SWGDE (2022). Best Practices for Forensic Audio (Version 2.5).
SWGDE (2018). Best Practices for Digital Audio Authentication (Version 1.3).

ENF Analysis
Electric Network Frequency: The power grid “hum” embedded in recordings
Uses:
- Verify date and time of recording
- Detect butt-splices and edits
- Identify broad geographic location (different grids have different patterns)

ENF analysis examines the power line hum embedded in recordings. Electrical grids operate at nominal frequencies (60 Hz US, 50 Hz Europe) that fluctuate slightly over time. These fluctuations are identical across an entire grid at any moment. By matching the hum against a reference database, analysts can determine when and where a recording was made. Discontinuities in the ENF can reveal edits that are otherwise invisible.

Sources:
Jenkins, C. W. (2011). An Investigative Approach to Configuring Forensic Electric Network Frequency Databases. MS Thesis, University of Colorado Denver.
Grigoras, C. (2005). Digital Audio Recording Analysis: The Electric Network Frequency (ENF) Criterion. International Journal of Speech, Language, and the Law, 12(1).

ENF Limitations
- Recording must have been near power grid or electromagnetic field
- Difficult with low signal-to-noise ratio
- Heavily compressed audio may not work
- Requires access to specialized reference databases
- Not a silver bullet: one tool among many

ENF analysis has important limitations. The recording must have been made while the device was plugged in or near an electromagnetic field. Extraction is difficult or impossible with low signal-to-noise ratios. Heavily compressed audio may not retain the signal. Analysis requires access to specialized databases that were active on the specific grid at the time. ENF is a valuable tool but not a complete solution.

Source: Jenkins, C. W. (2011). An Investigative Approach to Configuring Forensic ENF Databases. MS Thesis, University of Colorado Denver.
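
To make the ENF idea concrete, the sketch below tracks the strongest frequency near the nominal mains value frame by frame, then flags abrupt jumps between frames. It is a deliberately simplified illustration and not the method of the cited authors: the function names, the frame length, the search range and the jump threshold are all assumptions, and practical tools typically band-pass filter and downsample first and compare the track against a grid reference database.

```python
import cmath
import math


def synth_hum(freq_hz, seconds, sample_rate):
    """Hypothetical test signal: a faint mains hum under a louder 220 Hz tone."""
    count = int(seconds * sample_rate)
    return [
        0.1 * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        + 1.0 * math.sin(2 * math.pi * 220.0 * n / sample_rate)
        for n in range(count)
    ]


def estimate_enf(samples, sample_rate, nominal_hz=50.0, search_hz=0.5,
                 step_hz=0.01, frame_seconds=2.0):
    """Return one frequency estimate (Hz) per non-overlapping frame."""
    frame_len = int(frame_seconds * sample_rate)
    steps = int(round(2 * search_hz / step_hz))
    candidates = [nominal_hz - search_hz + i * step_hz for i in range(steps + 1)]
    # A Hann window limits leakage from speech and other louder components.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    track = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        best_freq, best_mag = candidates[0], -1.0
        for freq in candidates:
            # Correlate the frame with a complex tone at the candidate frequency.
            rotation = cmath.exp(-2j * math.pi * freq / sample_rate)
            phasor, total = complex(1.0, 0.0), complex(0.0, 0.0)
            for value in frame:
                total += value * phasor
                phasor *= rotation
            if abs(total) > best_mag:
                best_freq, best_mag = freq, abs(total)
        track.append(best_freq)
    return track


def flag_jumps(track, max_jump_hz=0.1):
    """Naive edit indicator: frame indices where the estimate jumps abruptly."""
    return [i for i in range(1, len(track))
            if abs(track[i] - track[i - 1]) > max_jump_hz]
```

Calling `estimate_enf(synth_hum(50.02, 4.0, 1000), 1000)` yields one estimate per two-second frame. For a 60 Hz grid, pass `nominal_hz=60.0`. A flagged jump is only a prompt for closer examination, since noise and compression can also disturb the track.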