Other Enhancement Techniques

Forensic Audio Analysis — Week 9

Today’s Topics

  • Enhancement order of operations
  • Spectral subtraction methods
  • Deep learning for audio
  • Specialized enhancement techniques

I. The Zjalic Processing Framework

Why Order Matters

  • Each step feeds the next
  • Wrong order degrades results
  • Later tools depend on earlier fixes
  • Like following a recipe
[Figure: the five steps of the Zjalic Processing Framework, with arrows showing the flow from one step to the next]

Step 1: Waveform Restoration

  • Fix clicks, pops, and clipping
  • Repair damaged waveform peaks
  • Uses interpolation and spectral repair
  • Must happen first

Step 2: Signal Separation

  • Isolate the target speaker
  • Subtract known interference sources
  • Requires original bit-stream integrity
  • Must precede heavy filtering

Step 3: Filtering and EQ

  • Remove out-of-band noise
  • Notch out hum and tones
  • Define the speech bandwidth
  • Recap from Week 8
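
The hum-notching step can be sketched with a standard biquad notch filter, using the coefficient formulas from the widely cited Audio EQ Cookbook. This is a minimal pure-Python sketch; the 50 Hz hum frequency, Q value, and 8 kHz sample rate are chosen for illustration only:

```python
import math

def notch_coeffs(f0, fs, q=5.0):
    """Biquad notch (Audio EQ Cookbook): zeros on the unit circle at f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1.0, -2 * math.cos(w0), 1.0]
    a = [1 + alpha, -2 * math.cos(w0), 1 - alpha]
    return [v / a[0] for v in b], [v / a[0] for v in a]   # normalise so a[0] == 1

def biquad(samples, b, a):
    """Direct-form I filtering."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, x0, y1, y0
        out.append(y0)
    return out

# Illustration: 50 Hz mains hum riding on a 300 Hz tone, sampled at 8 kHz
fs = 8000
hum_and_tone = [math.sin(2 * math.pi * 50 * t / fs)
                + 0.5 * math.sin(2 * math.pi * 300 * t / fs)
                for t in range(3000)]
b, a = notch_coeffs(50, fs)
cleaned = biquad(hum_and_tone, b, a)
```

After the filter settles, the hum is gone while the nearby 300 Hz content passes almost untouched, which is exactly why a narrow notch is preferred over broad EQ cuts.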

Step 4: Broadband Noise Reduction

  • Addresses complex background noise
  • Spectral subtraction or deep learning
  • Automatic, adaptive techniques
  • Most effective after manual filtering

Step 5: Dynamics and Normalization

  • Balance volume across speakers
  • Boost quiet speech, tame loud peaks
  • Normalize to standard playback level
  • Always the final step
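
The final normalization pass can be as simple as scaling to a target peak level. A minimal sketch; the -3 dBFS target here is illustrative, not a standard:

```python
def peak_normalize(samples, target_dbfs=-3.0):
    """Scale the recording so its loudest sample sits at target_dbfs (0 dBFS = 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]

balanced = peak_normalize([0.10, -0.25, 0.20])   # loudest sample moved to -3 dBFS
```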

Discussion

  • What happens if you skip a step?
  • Why must separation come before filtering?
  • Could you ever justify a different order?

II. Spectral Subtraction

The Basic Idea

  • Recording = speech + noise
  • Estimate the noise alone
  • Subtract noise from recording
  • What remains is cleaner speech
[Figure: spectral subtraction, showing a noisy recording decomposed into an estimated noise profile and the cleaned speech left after subtraction]

How It Works (Simplified)

  1. Find silent moments — noise only
  2. Build a “noise profile” from those moments
  3. Subtract the profile from every frame
  4. Fix any negative values (set to zero)
  5. Reconstruct the cleaned audio
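
The steps above can be sketched for a single frame of textbook magnitude spectral subtraction. A naive DFT keeps the sketch self-contained; real tools work on overlapping windowed FFT frames, and the tone frequencies here are chosen purely for illustration:

```python
import cmath, math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]

def spectral_subtract(frame, noise_profile):
    """Subtract a magnitude noise profile, keeping the noisy phase."""
    cleaned = []
    for bin_val, noise_mag in zip(dft(frame), noise_profile):
        mag = max(abs(bin_val) - noise_mag, 0.0)   # step 4: floor negatives at zero
        cleaned.append(cmath.rect(mag, cmath.phase(bin_val)))
    return idft(cleaned)                            # step 5: reconstruct

n = 64
speech = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]        # "speech" tone
noise = [0.5 * math.sin(2 * math.pi * 12 * t / n) for t in range(n)]  # steady noise
noisy = [s + v for s, v in zip(speech, noise)]

# Steps 1-2: build the noise profile from a noise-only ("silent") frame
profile = [abs(b) for b in dft(noise)]
# Steps 3-5: subtract it from the noisy frame and reconstruct
enhanced = spectral_subtract(noisy, profile)
```

Because the noise here is perfectly stationary, the subtraction is nearly exact; on real recordings the profile is only an estimate, which is where the artifacts on the next slide come from.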

The Musical Noise Problem

  • Subtraction creates random artifacts
  • Short tones that flicker on and off
  • Sounds like “birdies” or digital chirping
  • Can be worse than original noise
[Figure: spectrogram of musical noise, with random bright tonal spots scattered across the frequency spectrum left by basic spectral subtraction]

Artifacts scattered across frequency that flicker in time (“birdies”)

Fix 1: Over-Subtraction

  • Remove more noise than estimated
  • Trades some speech for less artifact
  • Adjustable “aggressiveness” factor
  • Adapts based on noise level
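
The adjustable aggressiveness can be sketched as a Berouti-style rule in which the subtraction factor falls linearly with frame SNR; the base factor and slope below are illustrative values that implementations tune:

```python
def oversubtraction_factor(snr_db, alpha0=4.0):
    """Berouti-style rule: subtract harder in noisy frames, gentler in clean ones.

    Alpha falls linearly with the frame's estimated SNR and is clamped
    outside the -5..20 dB range. Constants here are illustrative.
    """
    snr_db = max(-5.0, min(20.0, snr_db))
    return alpha0 - (3.0 / 20.0) * snr_db
```

At 20 dB SNR this backs off to plain subtraction (alpha = 1); in very noisy frames it subtracts nearly five times the estimated noise, trading a little speech for fewer artifacts.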

Fix 2: Multi-Band (MBSS)

  • Splits audio into frequency bands
  • Each band gets its own settings
  • Handles uneven noise better
  • More accurate than one-size-fits-all
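
The per-band idea can be sketched as a lookup that gives each spectral bin the subtraction factor of its band; the band edges and factors below are illustrative, not taken from any particular MBSS implementation:

```python
def band_alphas(num_bins, fs, bands):
    """Assign each spectral bin the over-subtraction factor of its band.

    bands: (upper_edge_hz, alpha) pairs in ascending order; bin
    frequencies are assumed to span 0..fs/2 evenly.
    """
    alphas = []
    for k in range(num_bins):
        freq = k * fs / (2 * (num_bins - 1))
        alpha = bands[-1][1]                 # default: top band
        for edge, a in bands:
            if freq <= edge:
                alpha = a
                break
        alphas.append(alpha)
    return alphas

# Aggressive below 1 kHz (hum/rumble), gentle in the speech band, moderate above
per_bin = band_alphas(9, 16000, [(1000, 3.0), (4000, 1.5), (8000, 2.0)])
```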

Fix 3: Geometric Approach

  • Uses phase information (others ignore it)
  • Smoother, more natural result
  • Eliminates musical noise effectively
  • More mathematically sophisticated

Measuring Success

  • STI: best for rooms and linear systems
  • STOI: best for digital enhancement
  • PESQ: quality only, not intelligibility
  • Seg.SNR: local signal quality per frame
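
Of the four, Seg.SNR is simple enough to sketch directly: SNR is computed per frame, clamped, and averaged. The frame length and the -10/35 dB clamp are one common recipe, not a fixed standard:

```python
import math

def segmental_snr(clean, enhanced, frame_len=256, floor=-10.0, ceil=35.0):
    """Average per-frame SNR in dB, clamped to a fixed range (a common recipe)."""
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal = sum(x * x for x in ref)
        error = sum((x - y) ** 2 for x, y in zip(ref, est))
        if error == 0 or signal == 0:
            snr = ceil if error == 0 else floor   # perfect or silent frame
        else:
            snr = 10 * math.log10(signal / error)
        snrs.append(max(floor, min(ceil, snr)))
    return sum(snrs) / len(snrs)

clean = [math.sin(2 * math.pi * 8 * t / 512) for t in range(512)]
noisy = [c + 0.1 for c in clean]   # a small constant error in every frame
```

The clamping is what makes the measure "local": one disastrous frame cannot be hidden by many good ones, which matters when the evidential content sits in a few seconds of audio.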

Discussion

  • Why can’t we perfectly estimate noise?
  • When would musical noise matter in court?
  • Which variant would you choose and why?

III. Deep Learning for Enhancement

What Is Deep Learning?

  • Computer learns from examples
  • Trained on thousands of audio pairs
  • Input: noisy recording
  • Output: cleaned version
  • No explicit rules needed
[Figure: deep learning pipeline for audio enhancement, with a noisy input processed by a neural network (shown as a black box) to produce a cleaner output]

Source: block diagram of an active noise-cancellation headphone based on CNN-based SFANC

Three Key Architectures

Convolutional Neural Network (CNN) — finds patterns in frequency

Recurrent Neural Network (RNN)/Long Short-Term Memory (LSTM) — tracks patterns over time

Convolutional Recurrent Network (CRN) — combines both strengths
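
To make the CNN row concrete: a convolutional layer slides a small kernel along its input and fires where a local pattern appears. The toy kernel below is hand-set to respond to a jump in spectral energy; in a real network, thousands of such kernels are learned from training data rather than written by hand:

```python
def conv1d(values, kernel):
    """Valid-mode 1-D convolution (cross-correlation, as CNN layers compute it)."""
    k = len(kernel)
    return [sum(values[i + j] * kernel[j] for j in range(k))
            for i in range(len(values) - k + 1)]

def relu(values):
    """Standard CNN nonlinearity: keep positive responses, zero the rest."""
    return [max(0.0, v) for v in values]

# Toy "spectrum" with an energy jump at index 3; the edge-detecting kernel fires there
spectrum = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1]
activation = relu(conv1d(spectrum, [-1.0, 1.0]))
```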

Neural In-Painting

  • Fills in damaged audio segments
  • Like photo restoration for sound
  • Rebuilds missing speech patterns
  • Trained on clean/noisy audio pairs

Neural In-Painting Example

[Figure: corrupted spectrogram alongside its neural reconstruction]

Detecting Dangerous Sounds

  • Gunshots, explosions, breaking glass
  • CNN-RNN hybrids detect signatures
  • Over 96% accuracy reported
  • Critical for security and evidence

DL vs. Traditional Methods

                   Traditional      Deep Learning
  Noise types      Stationary       Any
  Artifacts        Musical noise    Fewer
  Explainability   High             Low
  Court-ready      Established      Emerging

Discussion

  • Can you trust a “black box” with evidence?
  • How would you explain DL to a jury?
  • What’s the risk of AI “hallucinating” speech?

IV. Specialized Techniques

Reference Signal Cancellation

  • Identify background interference source
  • Obtain clean copy of that source
  • Align and subtract it out
  • Isolates the target speech
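
The align-and-subtract step can be sketched in three moves: estimate the delay by cross-correlation, estimate the gain by a least-squares fit, then subtract the aligned, scaled copy. The scenario below (speech plus a delayed, attenuated copy of a known interference source) is entirely synthetic:

```python
import math, random

def align_and_subtract(mixture, reference, max_lag):
    """Estimate the delay and gain of a known interference source, then remove it."""
    span = len(mixture) - max_lag

    def corr(lag):
        return sum(mixture[t + lag] * reference[t] for t in range(span))

    # 1. Delay: the lag where the mixture lines up best with the reference
    lag = max(range(max_lag + 1), key=corr)
    # 2. Gain: least-squares fit of the aligned reference to the mixture
    gain = corr(lag) / sum(r * r for r in reference[:span])
    # 3. Subtract the aligned, scaled copy; what remains is the target speech
    residual = [mixture[t + lag] - gain * reference[t] for t in range(span)]
    return lag, gain, residual

rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(500)]   # clean copy of the source
speech = [0.3 * math.sin(2 * math.pi * 0.02 * t) for t in range(420)]
mixture = [speech[t] + (0.8 * reference[t - 15] if t >= 15 else 0.0)
           for t in range(420)]

lag, gain, residual = align_and_subtract(mixture, reference, max_lag=30)
```

Note the dependence on a genuinely clean reference: any speech leaking into the reference copy biases the gain estimate and leaves audible residue.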

Acoustic Fingerprinting

  • Software identifies audio recordings
  • Like Shazam for forensic analysis
  • Matches frequency peak patterns
  • Finds the exact interference source

Multiband Compression for Intelligibility

  • Divides spectrum into hearing bands
  • Boosts quiet consonants
  • Tames loud vowels that mask consonants
  • Reallocates energy for clarity
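
The per-band behaviour can be sketched with a static downward-compression curve. The threshold and ratio below are illustrative; a real MB-DRC adds band splitting, attack/release smoothing, and makeup gain:

```python
import math

def compressor_gain_db(level_db, threshold_db=-30.0, ratio=4.0):
    """Static curve: above threshold, output rises only 1 dB per `ratio` dB of input."""
    if level_db <= threshold_db:
        return 0.0                        # quiet material (consonants) passes untouched
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db          # negative: gain reduction in dB

def rms_db(samples):
    """Band level as RMS in dBFS, used to drive the curve above."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else -120.0

quiet_band_gain = compressor_gain_db(-40.0)   # consonant band: no reduction
loud_band_gain = compressor_gain_db(-10.0)    # vowel band: pulled down 15 dB
```

After makeup gain, the untouched quiet band ends up louder relative to the compressed loud band, which is the energy reallocation the slide describes.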

Quality vs. Intelligibility Revisited

  • MB-DRC may sound less natural
  • But word recognition improves
  • Forensic goal: understanding, not comfort
  • Always measure with STOI, not just ears

Selective Noise Cancellation

  • AI distinguishes speech from noise
  • Learns individual speaker patterns
  • Solves the “cocktail party problem”
  • Used in smart hearing aids

Discussion

  • How would you explain reference cancellation to a jury?
  • What are the ethical limits of speaker isolation?
  • Could MB-DRC change the meaning of words?

Key Takeaways

  • Order matters: restore, separate, filter, reduce, normalize
  • Spectral subtraction: powerful but watch for artifacts
  • Deep learning: better results, harder to explain
  • Always measure, document, and preserve originals

Summary

  • Advanced tools extend Week 8 foundations
  • Match technique to the specific problem
  • Court admissibility requires transparency
  • Next: forensic transcription and interpretation