Audio Enhancement Techniques

Forensic Audio Analysis — Week 8

Today’s Topics

  • Enhancement assessment and standards
  • Speech: quality versus intelligibility
  • Filtering and equalization methods
  • Gain compression and expansion

What is Forensic Enhancement?

  • Improve SNR to aid transcription
  • Not cosmetic — must be measurable
  • Always work from a verified copy
  • Document every processing step
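
One way to make "verified copy" and "document every step" concrete is sketched below. It assumes verification means a matching SHA-256 digest between the original and the working copy. The function names and the log record layout are hypothetical, not taken from any standard.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_verified(original_path, copy_path):
    """True when the working copy is bit-identical to the original."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)


def log_step(log, name, params):
    """Append one processing step to a list-based log (hypothetical layout)."""
    log.append({
        "step": len(log) + 1,
        "name": name,
        "params": params,
        "utc": datetime.now(timezone.utc).isoformat(),
    })
    return log
```

A log entry such as `log_step(log, "highpass", {"fc_hz": 200})` records what was done and with which settings, so another examiner can repeat the processing.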

I. Enhancement Assessment

Why Assessment Matters

  • Admissibility requires scientific validation
  • Enhancement can help or harm evidence
  • Courts demand repeatability and rigor
  • Subjective “sounds better” is not enough

Subjective Evaluation

  • Listeners rate audio as “clearer”
  • Fast, intuitive, and easily applied
  • Ratings shaped by expectation and context
  • Not replicable or standardized

Limits of Subjective Measures

  • “Sounds better” ≠ more intelligible
  • Transcript priming distorts listener response
  • Two listeners may reach different verdicts
  • Courts expect scientific, not personal, opinion

Objective Metrics

  • STI — speech transmission fidelity
  • AI — spectral intelligibility estimate
  • PESQ — perceived speech quality
  • WIS — human word identification score

Speech Transmission Index (STI)

  • Measures intensity modulation preservation
  • Seven octave bands: 125 Hz – 8000 Hz
  • 96%+ correlation with intelligibility scores
  • Preferred objective standard for forensics
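
The core STI mapping can be sketched in a few lines: each modulation transfer value m is converted to an apparent SNR, clipped to ±15 dB, and rescaled to a 0 to 1 transmission index. The sketch below is a simplification. It averages bands with equal weights, whereas the full standard applies band weighting and correction factors that are omitted here. The function names are illustrative.

```python
import math


def transmission_index(m):
    """Map a modulation transfer value m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1 - 1e-6)           # keep the logarithm finite
    snr_db = 10 * math.log10(m / (1 - m))     # apparent SNR in dB
    snr_db = min(max(snr_db, -15.0), 15.0)    # clip to the +/-15 dB range
    return (snr_db + 15.0) / 30.0


def simplified_sti(mtf_by_band):
    """Equal-weight average over bands (assumption: real STI weights the bands).

    mtf_by_band: one list of modulation transfer values per octave band.
    """
    band_means = [
        sum(transmission_index(m) for m in band) / len(band)
        for band in mtf_by_band
    ]
    return sum(band_means) / len(band_means)
```

With m = 0.5 in every band the apparent SNR is 0 dB and the index sits at the midpoint of the scale, which is a useful sanity check when testing an implementation.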

Image credit: Acousticator, own work, CC BY-SA 3.0

Word Identification Score (WIS)

  • Human listeners transcribe audio samples
  • Percentage correct = intelligibility score
  • Highest validity — but slow and expensive
  • Most persuasive evidence for courts
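
The "percentage correct" calculation might look like the sketch below. It assumes words are aligned with Python's `difflib.SequenceMatcher`, which is a convenient approximation and not a standardized scoring procedure. Real listening tests define their own scoring rules.

```python
from difflib import SequenceMatcher


def word_identification_score(reference, response):
    """Percent of reference words the listener reproduced, in order.

    Matching is case-insensitive and done on whitespace-separated words.
    """
    ref = reference.lower().split()
    hyp = response.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * correct / len(ref)
```

Scoring the same listeners on the original and the enhanced file gives a direct before-and-after comparison. Listeners should not see a transcript beforehand.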

Bruce Koenig’s 11-Step Protocol

  • Evidence marking and chain of custody
  • Physical inspection of media
  • Critical listening and waveform review
  • Spectrographic and digital data analysis
  • Enhancement applied iteratively, documented

AES & SWGDE Standards

  • AES27-1996: managing recorded materials
  • AES43-2000: authenticating analog tapes
  • SWGDE: digital enhancement best practices
  • Document every step for court admissibility

Discussion

  • How do we define “good” enhancement?
  • When might STI mislead an examiner?
  • Can subjective testing ever be valid in court?

II. Speech: Quality vs. Intelligibility

The Core Paradox

  • Better-sounding ≠ more understandable
  • Enhancement may destroy speech cues
  • “Comfort” and “clarity” are different goals

What Makes Speech Intelligible?

  • Fricatives: /s/, /f/, /sh/ — high frequency
  • Plosives: /p/, /t/, /k/ — short transients
  • These cues live in the high-frequency band
  • Removing “noise” can remove these cues

The Filtering Problem

  • Filters reduce noise — but at a cost
  • Fricatives and plosives resemble noise
  • Aggressive filtering smooths them out
  • Speech sounds “cleaner” but loses phonemes

PESQ vs. STI: Different Goals

[Figure: MOS/PESQ quality trend compared with STI intelligibility trend]
  • PESQ — measures naturalness and comfort
  • STI — measures actual intelligibility
  • A processed file can score high on PESQ
  • And simultaneously low on STI

The Transcript Effect

  • A written transcript “primes” the listener
  • Enhanced audio sounds more convincing
  • Listeners “hear” words that aren’t there
  • Biggest risk: inaccurate courtroom transcripts

Courtroom Implications

  • Jurors may treat enhancement as proof
  • Enhanced audio amplifies priming effects
  • Present unenhanced original alongside
  • Expert must explain enhancement limits

Research Evidence

  • Quality ratings improve with higher SNR
  • Intelligibility also improves — differently
  • Rates of improvement diverge at high SNR
  • “Better audio” is not a reliable proxy

Discussion

  • Should enhanced audio be used in court?
  • How do we protect against transcript priming?
  • When does enhancement do more harm than good?

III. Filtering and Equalization

Linear Processing

[Figure: band-pass filter frequency response shape]
  • Adjusts gain at specific frequencies
  • Does not introduce new signals
  • No synthesis — only attenuation or boost
  • Forensic advantage: verifiable and reversible

Highpass and Lowpass Filters

  • Highpass: removes low-frequency rumble
  • Lowpass: removes high-frequency hiss
  • Combined: define the speech pass band
  • Rolloff slope: steep vs. gentle (dB/octave)
  • ReEQ Demo: control over curve shape and slope
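
Rolloff slope can be illustrated with the textbook magnitude formula for an ideal analog Butterworth lowpass. This is an assumption for illustration, since the slides do not name a filter family. Each filter order adds roughly 6 dB/octave of stopband slope, and a highpass mirrors the same behavior below its cutoff.

```python
import math


def butterworth_lowpass_db(f_hz, fc_hz, order):
    """Magnitude in dB of an ideal analog Butterworth lowpass at f_hz."""
    return -10 * math.log10(1 + (f_hz / fc_hz) ** (2 * order))


def slope_db_per_octave(fc_hz, order, octaves_above=4):
    """Estimate stopband slope from two points one octave apart."""
    f1 = fc_hz * 2 ** octaves_above
    return (butterworth_lowpass_db(2 * f1, fc_hz, order)
            - butterworth_lowpass_db(f1, fc_hz, order))


for n in (1, 2, 4):
    print(n, round(slope_db_per_octave(4000.0, n), 1), "dB/octave")
```

A steeper slope removes more out-of-band noise but also cuts more sharply into any speech energy near the cutoff.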

Bandpass Filtering

  • Combines highpass + lowpass filters
  • Forensic speech band: 200 Hz – 4 kHz
  • Preserves most vowel and consonant energy
  • Attenuates rumble, hiss, and interference
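
A bandpass built from a highpass plus a lowpass can be sketched with two second-order sections. The coefficients follow the widely used "Audio EQ Cookbook" biquad formulas. The 16 kHz sample rate and Butterworth Q are assumptions for illustration, and only the frequency response is evaluated, not a full audio pipeline.

```python
import cmath
import math


def rbj_biquad(kind, fc_hz, fs_hz, q=2 ** -0.5):
    """Return normalized (b, a) coefficients for a highpass or lowpass biquad."""
    w0 = 2 * math.pi * fc_hz / fs_hz
    cw = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    elif kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    a = [1 + alpha, -2 * cw, 1 - alpha]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def gain_db(sections, f_hz, fs_hz):
    """Combined gain in dB of cascaded biquad sections at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)   # z^-1 on the unit circle
    h = 1.0
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


FS = 16000.0  # assumed sample rate
speech_band = [rbj_biquad("highpass", 200.0, FS), rbj_biquad("lowpass", 4000.0, FS)]
for f in (50, 200, 1000, 4000, 7500):
    print(f, "Hz:", round(gain_db(speech_band, f, FS), 1), "dB")
```

Printing the response at a few frequencies before processing any evidence is a cheap way to confirm that the pass band really covers the intended speech range.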

Notch Filtering

  • Extremely narrow bandstop filter
  • Targets specific interference frequencies
  • Common: 60 Hz mains hum (US/Canada)
  • Also: 50 Hz (Europe), fan whines, motor tones

Mains Hum: 60 Hz and Harmonics

  • Fundamental: 60 Hz
  • Harmonics: 120, 180, 240 Hz, continuing…
  • Each harmonic needs its own notch
  • Harmonics can extend into the speech band
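
The harmonic series and one notch per harmonic can be sketched as follows. The notch uses the "Audio EQ Cookbook" biquad form. The Q of 30, the 1 kHz upper limit, and the 16 kHz sample rate are illustrative assumptions, not recommended settings.

```python
import cmath
import math


def hum_harmonics(fundamental_hz=60.0, max_hz=1000.0):
    """List the fundamental and its harmonics up to max_hz."""
    count = int(max_hz // fundamental_hz)
    return [fundamental_hz * k for k in range(1, count + 1)]


def notch_biquad(f0_hz, fs_hz, q=30.0):
    """Return normalized (b, a) coefficients for a narrow notch at f0_hz."""
    w0 = 2 * math.pi * f0_hz / fs_hz
    cw = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    return [1 / a0, -2 * cw / a0, 1 / a0], [1.0, -2 * cw / a0, (1 - alpha) / a0]


def magnitude(b, a, f_hz, fs_hz):
    """Linear magnitude of one biquad at f_hz (0 means fully rejected)."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    return abs((b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1))


FS = 16000.0  # assumed sample rate
notches = [notch_biquad(f, FS) for f in hum_harmonics(60.0, 1000.0)]
print(len(notches), "notches, highest at", hum_harmonics(60.0, 1000.0)[-1], "Hz")
```

Every added notch also removes a thin slice of whatever speech energy sits at that frequency, so the number of harmonics treated is a trade-off and should be recorded.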

Consonant Boost: 1–4 kHz

  • Boosts consonant spectral peaks
  • Targets phoneme-distinguishing frequencies
  • Helps distinguish /d/ vs /t/, /b/ vs /p/
  • Apply carefully to avoid amplifying noise
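
A consonant boost is commonly implemented as a peaking (bell) EQ. The sketch below uses the "Audio EQ Cookbook" peaking biquad. The 2 kHz center, +6 dB gain, Q of 1, and 16 kHz sample rate are assumptions chosen only to show the shape of the curve.

```python
import cmath
import math


def peaking_biquad(f0_hz, fs_hz, gain_db, q=1.0):
    """Return normalized (b, a) coefficients for a peaking EQ at f0_hz."""
    a_lin = 10 ** (gain_db / 40.0)
    w0 = 2 * math.pi * f0_hz / fs_hz
    cw = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * cw, 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * cw, 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f_hz, fs_hz):
    """Gain in dB of one biquad at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


FS = 16000.0  # assumed sample rate
b, a = peaking_biquad(2000.0, FS, gain_db=6.0, q=1.0)
for f in (100, 1000, 2000, 4000):
    print(f, "Hz:", round(response_db(b, a, f, FS), 2), "dB")
```

The boost lifts noise in the same band by the same amount, which is why the gain should stay modest and be checked against an intelligibility measure.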

EQ and Filter Workflow

  • Analyze spectrogram: identify interference
  • Apply cuts before boosts
  • Iterate: listen, measure STI, adjust
  • Document each step and parameter setting

Discussion

  • What frequencies can you never remove?
  • Why might notching harmonics cause problems?
  • How does EQ interact with the paradox?

IV. Gain Compression and Expansion

Dynamic Range in Forensic Recordings

  • Surreptitious recordings vary wildly
  • Distant speakers: very low amplitude
  • Loud events: may clip or saturate
  • Dynamics processing normalizes the range

AGC and Compression

  • AGC: auto-adjusts gain to normalize loudness
  • Boosts quiet talkers, reduces loud ones
  • Compression: ratio-based gain reduction
  • Above threshold: level increases are scaled down by the set ratio

Key Compression Parameters

  • Threshold: level where gain reduction starts
  • Ratio: input:output relationship (e.g., 4:1)
  • Attack: how fast compression engages
  • Release: how fast compression disengages
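
Threshold and ratio together define the compressor's static curve. The sketch below assumes a hard-knee compressor with levels in dB. The −20 dB threshold is illustrative.

```python
def compressor_output_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Static hard-knee compression curve: input level (dB) -> output level (dB)."""
    if input_db <= threshold_db:
        return input_db                      # below threshold: unchanged
    # Above threshold: every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Gain change applied by the compressor (0 or negative, in dB)."""
    return compressor_output_db(input_db, threshold_db, ratio) - input_db


for level in (-30.0, -20.0, -8.0, 0.0):
    print(level, "->", compressor_output_db(level), "dB")
```

At 4:1 with a −20 dB threshold, an input 12 dB over the threshold comes out only 3 dB over it. Attack and release then decide how quickly that gain change is applied and removed.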

The Pumping Problem

  • Excessive compression causes “pumping”
  • Background noise rises between words
  • Short release times make it worse
  • Pumping degrades intelligibility and credibility
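
The effect of release time can be sketched with a one-pole gain smoother. This is a common textbook model and not necessarily what any particular tool implements. The scenario and the 1 kHz control rate are hypothetical: a loud word calls for 12 dB of gain reduction, followed by a 100 ms pause.

```python
import math


def smooth_gain_db(target_gain_db, rate_hz, attack_s, release_s):
    """Smooth a sequence of target gains (dB, <= 0) with attack/release time constants."""
    a_att = math.exp(-1.0 / (attack_s * rate_hz))
    a_rel = math.exp(-1.0 / (release_s * rate_hz))
    gain = 0.0
    out = []
    for target in target_gain_db:
        coeff = a_att if target < gain else a_rel   # more reduction -> attack
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(gain)
    return out


RATE = 1000.0                                   # assumed control rate (values/s)
target = [-12.0] * 500 + [0.0] * 100            # 0.5 s loud word, 0.1 s pause
fast = smooth_gain_db(target, RATE, attack_s=0.005, release_s=0.02)
slow = smooth_gain_db(target, RATE, attack_s=0.005, release_s=0.5)
print("gain at end of pause, fast release:", round(fast[-1], 2), "dB")
print("gain at end of pause, slow release:", round(slow[-1], 2), "dB")
```

With the fast release the gain is almost fully restored within the pause, so the background noise swells between words. The slow release keeps the noise down but may also hold down a quiet talker who speaks next.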

Expansion and Noise Gating

  • Expansion: reduces gain below threshold
  • Noise gate: hard mute below threshold
  • Silences background noise between utterances
  • Less destructive than silence-cut editing
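
The difference between expansion and gating shows up in their static curves. The sketch below assumes levels in dB, a downward expander with a hard knee, and a gate that drops everything below threshold to a fixed floor. Threshold, ratio, and floor values are illustrative.

```python
def expander_output_db(input_db, threshold_db=-40.0, ratio=2.0):
    """Downward expander: each dB below threshold becomes `ratio` dB below it."""
    if input_db >= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) * ratio


def gate_output_db(input_db, threshold_db=-40.0, floor_db=-120.0):
    """Noise gate: anything below threshold is pushed to the floor level."""
    return input_db if input_db >= threshold_db else floor_db


for level in (-30.0, -45.0, -50.0):
    print(level, "expander:", expander_output_db(level), "gate:", gate_output_db(level))
```

The expander turns low-level material down gradually, so a weak consonant just under the threshold is attenuated but still present. The gate removes it entirely.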

Threshold: A Critical Setting

  • Too low: background noise passes through
  • Too high: word onsets and offsets clipped
  • Onset clipping destroys critical phoneme cues
  • Verify gate behavior with spectrogram
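
The threshold trade-off can be checked numerically as well as visually. The sketch below assumes per-frame levels have already been measured. The level values and speech labels are hypothetical, made up for illustration.

```python
def gate_open(frame_levels_db, threshold_db):
    """Per-frame gate decision: True where the frame passes."""
    return [level >= threshold_db for level in frame_levels_db]


def count_errors(frame_levels_db, is_speech, threshold_db):
    """Return (speech frames muted, noise frames passed) for a threshold."""
    decisions = gate_open(frame_levels_db, threshold_db)
    lost_speech = sum(1 for d, s in zip(decisions, is_speech) if s and not d)
    passed_noise = sum(1 for d, s in zip(decisions, is_speech) if d and not s)
    return lost_speech, passed_noise


# Hypothetical frames: noise, weak onset, vowel, vowel, weak offset, noise
LEVELS_DB = [-50.0, -38.0, -20.0, -18.0, -36.0, -50.0]
IS_SPEECH = [False, True, True, True, True, False]

for threshold in (-55.0, -45.0, -30.0):
    print(threshold, "dB ->", count_errors(LEVELS_DB, IS_SPEECH, threshold))
```

In this made-up example only the middle threshold keeps both error counts at zero. The high threshold mutes the weak onset and offset frames, which are the ones most likely to carry consonant cues.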

Multiband Dynamics

  • Separate compression per frequency sub-band
  • Each band has its own threshold and ratio
  • Useful for recordings with strong reverberation
  • More surgical than broadband compression

Discussion

  • When would AGC hurt speaker recognition?
  • How do you detect over-compression?
  • Should noise gating be used for court evidence?

Enhancement: Guiding Principles

  • Always work from a verified copy
  • Iterate: measure STI after each step
  • Document every process and parameter
  • Present original alongside enhanced version

Key Takeaways

  • Assessment: objective metrics, not impressions
  • Paradox: quality ≠ intelligibility
  • Filtering: protect speech-critical frequencies
  • Dynamics: threshold setting is high-stakes

Summary

  • Four pillars: assess, filter, compress, verify
  • Court-ready: documented, measured, defensible
  • Tools serve the evidence, not the other way around
  • Next: speaker recognition and transcription