Audio Enhancement Techniques
Forensic Audio Analysis — Week 8
Today’s Topics
- Enhancement assessment and standards
- Speech: quality versus intelligibility
- Filtering and equalization methods
- Gain compression and expansion
What is Forensic Enhancement?
- Improve SNR to aid transcription
- Not cosmetic — must be measurable
- Always work from a verified copy
- Document every processing step
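One way to make "measurable" concrete is to estimate SNR before and after each processing step. The sketch below assumes the examiner can mark a noise-only stretch of the recording and that the noise is roughly stationary; the function names are illustrative, not from any forensic toolkit.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def snr_db(speech_segment, noise_segment):
    """Estimate SNR in dB from a speech-plus-noise segment and a noise-only segment.

    Assumption: the noise-only segment is representative of the noise that
    is also present underneath the speech. noise_segment must not be silent.
    """
    p_mix = rms(speech_segment) ** 2
    p_noise = rms(noise_segment) ** 2
    # Subtract the noise power from the mixture; the floor avoids log(0).
    p_speech = max(p_mix - p_noise, 1e-12)
    return 10.0 * math.log10(p_speech / p_noise)
```

Running the same measurement on the same marked segments before and after a step gives a number that can go into the processing log.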
I. Enhancement Assessment
Why Assessment Matters
- Admissibility requires scientific validation
- Enhancement can help or harm evidence
- Courts demand repeatability and rigor
- Subjective “sounds better” is not enough
Subjective Evaluation
- Listeners rate audio as “clearer”
- Fast, intuitive, and easily applied
- Ratings shaped by expectation and context
- Not replicable or standardized
Limits of Subjective Measures
- “Sounds better” ≠ more intelligible
- Transcript priming distorts listener response
- Two listeners may reach different verdicts
- Courts expect scientific, not personal, opinion
Objective Metrics
- STI: Speech Transmission Index, modulation-based intelligibility
- AI: Articulation Index, band-weighted intelligibility estimate
- PESQ: Perceptual Evaluation of Speech Quality
- WIS: Word Identification Score, human listener accuracy
Speech Transmission Index (STI)
- Measures intensity modulation preservation
- Seven octave bands: 125 Hz – 8000 Hz
- Reported correlation with intelligibility scores: above 0.96
- Preferred objective standard for forensics
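The core STI idea, how well intensity modulations survive, can be sketched as a mapping from a modulation index m to an apparent SNR and then to a 0–1 transmission index. This is a simplified illustration only: the full standardized method also uses multiple modulation frequencies per band, specific band weights, and masking corrections, all omitted here. The equal band weighting is an assumption.

```python
import math

def transmission_index(m):
    """Map one modulation index m (0..1) to a transmission index in [0, 1]."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)           # keep the log argument finite
    snr_app = 10.0 * math.log10(m / (1.0 - m))  # apparent SNR in dB
    snr_app = min(max(snr_app, -15.0), 15.0)    # clip to the +/-15 dB range
    return (snr_app + 15.0) / 30.0

def simplified_sti(mod_indices_per_band, band_weights=None):
    """Average transmission indices within each band, then across bands.

    band_weights is a placeholder: equal weights are used unless supplied.
    """
    band_means = [
        sum(transmission_index(m) for m in band) / len(band)
        for band in mod_indices_per_band
    ]
    if band_weights is None:
        band_weights = [1.0 / len(band_means)] * len(band_means)
    return sum(w * b for w, b in zip(band_weights, band_means))
```

A modulation index of 0.5 corresponds to an apparent SNR of 0 dB and a transmission index of 0.5; fully preserved modulation approaches 1.0.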
Word Identification Score (WIS)
- Human listeners transcribe audio samples
- Percentage correct = intelligibility score
- Highest validity — but slow and expensive
- Most persuasive evidence for courts
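Scoring a listener response against a reference transcript can be sketched as below. This assumes a known ground-truth reference (as in a controlled test) and uses in-order word matching from Python's `difflib`; real listening tests may use different scoring rules, so treat the matching method as an assumption.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase and replace punctuation with spaces so only words are compared."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() or ch == "'" else " "
        for ch in text.lower()
    )
    return cleaned.split()

def word_identification_score(reference, response):
    """Percentage of reference words the listener reproduced in order."""
    ref, hyp = normalize(reference), normalize(response)
    if not ref:
        raise ValueError("reference transcript is empty")
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * correct / len(ref)
```

Comparing the average score on original and enhanced versions, with listeners who have not seen a transcript, shows whether a step helped or harmed intelligibility.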
Bruce Koenig’s 11-Step Protocol
- Evidence marking and chain of custody
- Physical inspection of media
- Critical listening and waveform review
- Spectrographic and digital data analysis
- Enhancement applied iteratively, documented
AES & SWGDE Standards
- AES27-1996: managing recorded materials
- AES43-2000: authenticating analog tapes
- SWGDE: digital enhancement best practices
- Document every step for court admissibility
Discussion
- How do we define “good” enhancement?
- When might STI mislead an examiner?
- Can subjective testing ever be valid in court?
II. Speech: Quality vs. Intelligibility
The Core Paradox
- Better-sounding ≠ more understandable
- Enhancement may destroy speech cues
- “Comfort” and “clarity” are different goals
What Makes Speech Intelligible?
- Fricatives: /s/, /f/, /ʃ/ ("sh"), high-frequency energy
- Plosives: /p/, /t/, /k/, short transients
- These cues live in the high-frequency band
- Removing “noise” can remove these cues
The Filtering Problem
- Filters reduce noise — but at a cost
- Fricatives and plosives resemble noise
- Aggressive filtering smooths them out
- Speech sounds “cleaner” but loses phonemes
PESQ vs. STI: Different Goals
- PESQ: estimates perceived quality (naturalness, comfort)
- STI: estimates intelligibility
- A processed file can score high on PESQ yet low on STI
The Transcript Effect
- A written transcript “primes” the listener
- Enhanced audio sounds more convincing
- Listeners “hear” words that aren’t there
- Biggest risk: inaccurate courtroom transcripts
Courtroom Implications
- Jurors may treat enhancement as proof
- Enhanced audio amplifies priming effects
- Present unenhanced original alongside
- Expert must explain enhancement limits
Research Evidence
- Quality ratings improve with higher SNR
- Intelligibility also improves, but along a different curve
- Rates of improvement diverge at high SNR
- “Better audio” is not a reliable proxy
Discussion
- Should enhanced audio be used in court?
- How do we protect against transcript priming?
- When does enhancement do more harm than good?
III. Filtering and Equalization
Linear Processing
- Adjusts gain at specific frequencies
- Does not introduce new signals
- No synthesis — only attenuation or boost
- Forensic advantage: verifiable, repeatable, often reversible
Highpass and Lowpass Filters
- Highpass: removes low-frequency rumble
- Lowpass: removes high-frequency hiss
- Combined: define the speech pass band
- Rolloff slope: steep vs. gentle (dB/octave)
- ReEQ Demo: control over curve shape and slope
Bandpass Filtering
- Combines highpass + lowpass filters
- Forensic speech band: 200 Hz – 4 kHz
- Preserves most vowel and consonant energy
- Attenuates rumble, hiss, and interference
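A bandpass built from a highpass and a lowpass in series can be sketched with two second-order (12 dB/octave) sections. The coefficient formulas follow the widely used "Audio EQ Cookbook" biquad design; the 200 Hz and 4 kHz defaults come from the slide, while the function names and the choice of a Butterworth-style Q are assumptions for illustration.

```python
import math

def biquad(kind, cutoff_hz, fs, q=1 / math.sqrt(2)):
    """Return normalized (b, a) coefficients for a 2nd-order highpass or lowpass."""
    w0 = 2 * math.pi * cutoff_hz / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    if kind == "highpass":
        b = [(1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2]
    elif kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    else:
        raise ValueError("kind must be 'highpass' or 'lowpass'")
    a0 = 1 + alpha
    b = [coef / a0 for coef in b]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def run_biquad(samples, b, a):
    """Filter samples with a direct-form I biquad (a[0] is assumed to be 1)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def speech_bandpass(samples, fs, low_hz=200.0, high_hz=4000.0):
    """Highpass at low_hz followed by lowpass at high_hz."""
    hp_b, hp_a = biquad("highpass", low_hz, fs)
    lp_b, lp_a = biquad("lowpass", high_hz, fs)
    return run_biquad(run_biquad(samples, hp_b, hp_a), lp_b, lp_a)
```

Steeper rolloff would need more sections in series, and a single forward pass like this shifts phase; both settings belong in the processing notes.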
Notch Filtering
- Extremely narrow bandstop filter
- Targets specific interference frequencies
- Common: 60 Hz mains hum (US/Canada)
- Also: 50 Hz (Europe), fan whines, motor tones
Mains Hum: 60 Hz and Harmonics
- Fundamental: 60 Hz
- Harmonics: 120, 180, 240 Hz, continuing…
- Each harmonic needs its own notch
- Harmonics can extend into the speech band
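The "one notch per harmonic" idea can be sketched by listing the harmonics and designing a narrow notch for each. The notch formula is the standard cookbook biquad; the Q value of 30 and the helper names are assumptions, and `gain_at` only evaluates the filter's response rather than processing audio.

```python
import cmath
import math

def hum_harmonics(fundamental_hz, max_hz):
    """List the fundamental and its harmonics up to max_hz (inclusive)."""
    count = int(max_hz // fundamental_hz)
    return [fundamental_hz * k for k in range(1, count + 1)]

def notch_coefficients(f0, fs, q=30.0):
    """Cookbook-style notch biquad, normalized so a[0] == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def gain_at(b, a, freq_hz, fs):
    """Linear magnitude of the biquad's response at freq_hz."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq_hz / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv * z_inv
    den = a[0] + a[1] * z_inv + a[2] * z_inv * z_inv
    return abs(num / den)

# One notch per harmonic of 60 Hz up to 300 Hz, at an 8 kHz sample rate.
notches = [notch_coefficients(f, 8000) for f in hum_harmonics(60, 300)]
```

Each added notch also removes a sliver of speech energy at that frequency, which is why harmonics inside the speech band call for narrow notches and a check of intelligibility afterward.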
Consonant Boost: 1–4 kHz
- Boosts consonant spectral peaks
- Targets phoneme-distinguishing frequencies
- Helps distinguish /d/ vs /t/, /b/ vs /p/
- Apply carefully: boosting also amplifies noise in the same band
EQ and Filter Workflow
- Analyze spectrogram: identify interference
- Apply cuts before boosts
- Iterate: listen, measure STI, adjust
- Document each step and parameter setting
Discussion
- What frequencies can you never remove?
- Why might notching harmonics cause problems?
- How does EQ interact with the paradox?
IV. Gain Compression and Expansion
Dynamic Range in Forensic Recordings
- Surreptitious recordings vary wildly
- Distant speakers: very low amplitude
- Loud events: may clip or saturate
- Dynamics processing normalizes the range
AGC and Compression
- AGC: auto-adjusts gain to normalize loudness
- Boosts quiet talkers, reduces loud ones
- Compression: ratio-based gain reduction
- Above threshold: level increases are reduced by the set ratio
Key Compression Parameters
- Threshold: level where gain reduction starts
- Ratio: input:output level change above threshold (e.g., 4:1 means 4 dB in, 1 dB out)
- Attack: how fast compression engages
- Release: how fast compression disengages
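The four parameters can be sketched in two parts: a static curve (threshold and ratio) and a smoother that controls how quickly gain changes (attack and release). The default values and the one-pole smoothing are illustrative assumptions, not settings recommended by the lecture.

```python
import math

def compressor_output_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Static curve: levels above threshold rise only 1 dB per `ratio` dB of input."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def gain_reduction_db(input_db, threshold_db=-20.0, ratio=4.0):
    """Gain applied by the compressor in dB (zero or negative)."""
    return compressor_output_db(input_db, threshold_db, ratio) - input_db

def smooth_gain(target_gains_db, fs, attack_ms=5.0, release_ms=200.0):
    """One-pole smoothing of a gain trajectory sampled at fs.

    Attack is used when more reduction is requested, release when gain recovers.
    """
    attack = math.exp(-1.0 / (attack_ms * 0.001 * fs))
    release = math.exp(-1.0 / (release_ms * 0.001 * fs))
    gain, out = 0.0, []
    for target in target_gains_db:
        coeff = attack if target < gain else release
        gain = coeff * gain + (1.0 - coeff) * target
        out.append(gain)
    return out
```

With a -20 dB threshold and 4:1 ratio, an input at -12 dB is 8 dB over threshold, so the output sits 2 dB over threshold at -18 dB, a 6 dB gain reduction.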
The Pumping Problem
- Excessive compression causes “pumping”
- Background noise rises between words
- Short release times make it worse
- Pumping degrades intelligibility and credibility
Expansion and Noise Gating
- Expansion: reduces gain below threshold
- Noise gate: hard mute below threshold
- Silences background noise between utterances
- Less destructive than silence-cut editing
Threshold: A Critical Setting
- Too low: background noise passes through
- Too high: word onsets and offsets clipped
- Onset clipping destroys critical phoneme cues
- Verify gate behavior with spectrogram
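The effect of the threshold can be sketched on a list of frame levels in dB. The frame values below are invented for illustration (noise, a weak consonant onset, a vowel, a decaying offset), and the expander ratio is an assumed default.

```python
def expander_gain_db(level_db, threshold_db, ratio=2.0):
    """Downward expander: each dB below threshold becomes `ratio` dB below."""
    if level_db >= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (ratio - 1.0)

def gate_open(frame_levels_db, threshold_db):
    """Hard gate decision per frame: True passes audio, False mutes it."""
    return [level >= threshold_db for level in frame_levels_db]

# Hypothetical frame levels: noise, weak onset, vowel, decaying offset, noise.
frames = [-50, -42, -35, -20, -22, -38, -50]

well_set = gate_open(frames, threshold_db=-45)  # only the noise frames are muted
too_high = gate_open(frames, threshold_db=-30)  # onset and offset are muted too
```

In the second case the -42 dB and -35 dB onset frames are muted along with the noise, which is the onset clipping the slide warns about. An expander attenuates those frames instead of removing them.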
Multiband Dynamics
- Separate compression per frequency sub-band
- Each band has its own threshold and ratio
- Useful for recordings with strong reverberation
- More surgical than broadband compression
Discussion
- When would AGC hurt speaker recognition?
- How do you detect over-compression?
- Should noise gating be used for court evidence?
Enhancement: Guiding Principles
- Always work from a verified copy
- Iterate: measure STI after each step
- Document every process and parameter
- Present original alongside enhanced version
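Two of these principles lend themselves to a small script: confirming that the working copy matches the original by hash, and keeping a structured log of each step. The use of SHA-256 and the log fields shown are assumptions for illustration; a lab's own procedures would define the actual requirements.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_verified_copy(original_path, copy_path):
    """True when the working copy is bit-for-bit identical to the original."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)

def log_step(log, process, parameters, sti_after=None):
    """Append one processing step (hypothetical field names) to the log."""
    entry = {
        "step": len(log) + 1,
        "process": process,
        "parameters": parameters,
        "sti_after": sti_after,
    }
    log.append(entry)
    return entry
```

A log built this way can be exported with the report so that another examiner can repeat each step with the same parameters.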
Key Takeaways
- Assessment: objective metrics, not impressions
- Paradox: quality ≠ intelligibility
- Filtering: protect speech-critical frequencies
- Dynamics: threshold setting is high-stakes
Summary
- Four pillars: assess, filter, compress, verify
- Court-ready: documented, measured, defensible
- Tools serve the evidence, not the other way around
- Next: speaker recognition and transcription