Workshop: Audio Enhancement with STI Validation

Overview

Session Date: Thursday, March 5, 2026
Duration: ~75 minutes

Objective: Classify three unknown speech clips provided in the repo by baseline intelligibility, then apply forensic EQ/filtering and dynamics processing in Adobe Audition (or Reaper), validating each change with Speech Transmission Index (STI) measurements from FSTIC. Students will practice the lecture rule: base enhancement decisions on intelligibility metrics, not on perceived quality alone.

Primary tools:

Adobe Audition or Reaper (with EQ and dynamics plugins)
FSTIC (adapted STI tool): https://github.com/tatecarson/FSTIC

What to submit: Each exercise ends with a Turn in block. Submit one Word document (.docx) with short answers organized by exercise number, plus your workshop audio files (originals and processed versions), to the D2L assignment.

Part 1: Setup and Baseline (~15 minutes)

Exercise 1: Build Your 3-Clip Case Set and Baseline STI

Create a folder on your desktop named Week8_STI_Workshop.
Download the course repo as a ZIP from GitHub, then copy the three workshop files from /speech into your Week8_STI_Workshop folder:
- Open: https://github.com/tatecarson/FSTIC
- Click Code (green button) > Download ZIP
- Extract the ZIP on your computer
- Open the extracted folder, go to /speech, and copy these files:
  - clip_1_unknown.wav
  - clip_2_unknown.wav
  - clip_3_unknown.wav
From the extracted FSTIC folder, run setup once:
- Install a Python version lower than 3.13 before running setup. FSTIC is not compatible with Python 3.13+.
- macOS: ./scripts/mac/setup.command
- Windows: scripts\windows\setup.bat
Launch FSTIC:
- macOS: ./scripts/mac/launch.command
- Windows: scripts\windows\launch.bat
In the FSTIC notebook, choose Single File mode, select each clip, click Run Analysis, and record baseline STI values.
Listen to each clip in Audition and note key audible issues.
Classify each clip as probable high, mid, or low baseline intelligibility.
Assign working labels for your workflow:
- clip_A_high.wav
- clip_B_mid.wav
- clip_C_low.wav
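
The Python requirement in the setup step can be checked before running the setup script. The following is a minimal sketch, assuming only the version limit stated above; the function name is_supported is hypothetical and is not part of FSTIC.

```python
import sys

# Per the setup notes, FSTIC is not compatible with Python 3.13 or newer.
FIRST_UNSUPPORTED = (3, 13)


def is_supported(version=None):
    """Return True if the (major, minor) version is lower than 3.13."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) < FIRST_UNSUPPORTED


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported():
        print(f"Python {found}: OK for FSTIC setup")
    else:
        print(f"Python {found}: too new, install a version lower than 3.13")
```

Run it with the same interpreter the setup script will use; a different Python on your PATH may give a different answer.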

Target setup for this workshop:

Clip A: already intelligible (high STI) -> minimal or no processing justified
Clip B: somewhat degraded (mid STI) -> moderate processing
Clip C: clearly degraded (low STI) -> careful iterative processing
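
One way to turn the three baseline STI values into working labels is to rank them. This is only a sketch: it assumes the case set contains exactly one high, one mid, and one low clip, as the target setup describes. The STI values in the example are placeholders, not measurements, and the function name assign_working_labels is hypothetical.

```python
# Working labels from the workshop, ordered from highest to lowest baseline STI.
WORKING_LABELS = ["clip_A_high.wav", "clip_B_mid.wav", "clip_C_low.wav"]


def assign_working_labels(baseline_sti):
    """Map each unknown clip to a working label by ranking its baseline STI."""
    if len(baseline_sti) != len(WORKING_LABELS):
        raise ValueError("expected exactly three clips")
    for name, sti in baseline_sti.items():
        if not 0.0 <= sti <= 1.0:
            raise ValueError(f"STI for {name} must be between 0 and 1")
    # Highest STI first, so it pairs with the "high" label.
    ranked = sorted(baseline_sti, key=baseline_sti.get, reverse=True)
    return dict(zip(ranked, WORKING_LABELS))


if __name__ == "__main__":
    # Placeholder values for illustration only; use your own FSTIC readings.
    placeholder = {
        "clip_1_unknown.wav": 0.41,
        "clip_2_unknown.wav": 0.78,
        "clip_3_unknown.wav": 0.57,
    }
    for source, label in assign_working_labels(placeholder).items():
        print(f"{source} -> {label}")
```

Ranking only orders the clips. Your written classification should still cite the audible problems you noted while listening.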

Turn in:
Word doc questions:
Report baseline STI for A, B, and C.
Why did you classify A as high, B as mid, and C as low?
For each clip, list 1-2 audible problems.

Part 2: EQ and Filtering with STI Checks (~25 minutes)

Exercise 2: Build an Iterative Filter Chain

Create processed versions incrementally for clips B and C. After each step, export a new WAV and run FSTIC.

Highpass step
- Apply a highpass filter (start around 120-200 Hz).
- Export as clip_B_hp.wav and clip_C_hp.wav.
- Run FSTIC and log STI.
Notch step
- Identify any tonal interference (for example, around 60 Hz and harmonics).
- Apply narrow notch filter(s) with conservative Q.
- Export as clip_B_hp_notch.wav and clip_C_hp_notch.wav.
- Run FSTIC and log STI.
Band-limit / lowpass step
- If needed, apply gentle top-end control (avoid removing speech-critical consonant detail).
- Export as clip_B_hp_notch_lp.wav and clip_C_hp_notch_lp.wav.
- Run FSTIC and log STI.
Consonant presence step
- Apply a modest presence boost in the speech-critical range (for example 1-4 kHz).
- Export as clip_B_eq_final.wav and clip_C_eq_final.wav.
- Run FSTIC and log STI.
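
To see what the highpass, notch, and presence steps do at specific frequencies, the sketch below evaluates the magnitude response of three second-order (biquad) filters. It uses the widely published Audio EQ Cookbook formulas, not Audition's or Reaper's internal filters, which this handout does not specify. The 44.1 kHz sample rate, the Q values, and the 150 Hz, 60 Hz, and 2.5 kHz settings are assumptions chosen from the ranges above.

```python
import cmath
import math

FS = 44100  # assumed sample rate in Hz; check your actual clips


def _biquad(b, a):
    """Normalize so a[0] == 1 and return (b, a) coefficient lists."""
    return [x / a[0] for x in b], [x / a[0] for x in a]


def highpass(f0, q=0.7071, fs=FS):
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    return _biquad([(1 + c) / 2, -(1 + c), (1 + c) / 2],
                   [1 + alpha, -2 * c, 1 - alpha])


def notch(f0, q=30.0, fs=FS):
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    return _biquad([1.0, -2 * c, 1.0], [1 + alpha, -2 * c, 1 - alpha])


def peaking(f0, gain_db_at_f0, q=1.0, fs=FS):
    w0 = 2 * math.pi * f0 / fs
    c, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    amp = 10 ** (gain_db_at_f0 / 40)
    return _biquad([1 + alpha * amp, -2 * c, 1 - alpha * amp],
                   [1 + alpha / amp, -2 * c, 1 - alpha / amp])


def gain_db(coeffs, freq, fs=FS):
    """Magnitude response in dB at `freq`, floored at -200 dB."""
    b, a = coeffs
    z = cmath.exp(-2j * math.pi * freq / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * math.log10(max(abs(h), 1e-10))


if __name__ == "__main__":
    chain = {
        "highpass 150 Hz": highpass(150),
        "notch 60 Hz": notch(60),
        "presence +3 dB at 2.5 kHz": peaking(2500, 3.0),
    }
    for name, coeffs in chain.items():
        row = ", ".join(f"{f} Hz: {gain_db(coeffs, f):+.1f} dB"
                        for f in (60, 500, 2500))
        print(f"{name}: {row}")
```

Checking the response at a few speech-band frequencies is a quick way to confirm that a filter setting leaves the 1-4 kHz range alone before you export and measure STI.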

If STI drops after any step, revert that step and note why.
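
The revert rule can be sketched as a simple check over your STI log. This sketch assumes each STI value was measured with that step applied on top of the last version you kept. The numbers in the example are placeholders, not measurements, and the function name review_steps is hypothetical.

```python
def review_steps(baseline_sti, measurements):
    """Apply the revert rule to an ordered list of (step, sti) pairs.

    Returns (kept_steps, reverted_steps_with_delta, final_sti).
    """
    kept, reverted = [], []
    current = baseline_sti
    for step, sti in measurements:
        if sti < current:
            # STI dropped: revert this step and record how much it cost.
            reverted.append((step, round(sti - current, 3)))
        else:
            kept.append(step)
            current = sti
    return kept, reverted, current


if __name__ == "__main__":
    # Placeholder log for illustration only.
    log = [("highpass", 0.54), ("notch", 0.56),
           ("lowpass", 0.53), ("presence", 0.60)]
    kept, reverted, final = review_steps(0.50, log)
    print("kept:", kept)
    print("reverted:", reverted)
    print("final STI:", final)
```

The delta stored with each reverted step gives you the number to cite when you note why the step was dropped.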

For clip_A_high.wav, do a quick verification pass only. If baseline STI is already high, justify minimal/no EQ processing.

Turn in:
Word doc questions:
Fill this table for B:
File	Processing Step	STI
clip_B_mid.wav	Baseline
clip_B_hp.wav	Highpass
clip_B_hp_notch.wav	+ Notch
clip_B_hp_notch_lp.wav	+ Lowpass/Band-limit
clip_B_eq_final.wav	+ Presence boost
Fill in the same table for C (use the clip_C_* files, with clip_C_low.wav as the baseline).
Which step produced the largest STI gain for B and for C?
Did any step make a clip sound cleaner but reduce STI?
For A, explain why minimal/no processing was appropriate.

Part 3: Compression and Expansion with STI Checks (~20 minutes)

Exercise 3: Dynamics Processing Without Over-Processing

Use clip_B_eq_final.wav and clip_C_eq_final.wav as inputs for this part.

Moderate compression
- Start with conservative settings (for example ratio 2:1-4:1).
- Tune threshold, attack, and release to avoid pumping and transient damage.
- Export as clip_B_comp.wav and clip_C_comp.wav.
- Run FSTIC and log STI.
Expansion or noise gate (optional but recommended)
- Apply gentle downward expansion or gate to reduce between-word noise.
- Verify that word onsets are not cut off.
- Export as clip_B_comp_expand.wav and clip_C_comp_expand.wav.
- Run FSTIC and log STI.
Compare versions pairwise in FSTIC's Compare Two Files mode:
- clip_B_eq_final.wav vs clip_B_comp.wav
- clip_B_comp.wav vs clip_B_comp_expand.wav
- clip_C_eq_final.wav vs clip_C_comp.wav
- clip_C_comp.wav vs clip_C_comp_expand.wav

If pumping, chopped consonants, or a drop in STI appears, back off the settings.
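
To reason about how far a ratio and threshold will move levels before you touch the controls, the static gain curves can be sketched as below. This is the textbook input/output relationship only: it ignores attack, release, knee, and makeup gain, and it does not describe any specific plugin. The threshold values in the example are hypothetical.

```python
def compress_db(level_db, threshold_db, ratio):
    """Downward compression: above threshold, output rises 1 dB per `ratio` dB."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def expand_db(level_db, threshold_db, ratio):
    """Downward expansion: below threshold, output falls `ratio` dB per dB."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    if level_db >= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) * ratio


if __name__ == "__main__":
    # Hypothetical settings: compressor threshold -20 dB, expander threshold -50 dB.
    for ratio in (2, 4):
        out = compress_db(-10, -20, ratio)
        print(f"{ratio}:1 compression, -10 dB in -> {out} dB out")
    print(f"2:1 expansion, -60 dB in -> {expand_db(-60, -50, 2)} dB out")
```

A peak 10 dB over the threshold is pulled down by 5 dB at 2:1 and by 7.5 dB at 4:1, which is why the higher ratio is more likely to produce audible pumping.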

Turn in:
Word doc questions:
Report final dynamics settings (threshold, ratio, attack, release, and gate threshold if used).
Did dynamics processing improve STI for B and C relative to each *_eq_final.wav file?
What artifact checks did you perform (pumping, onset clipping, noise surges)?

Part 4: Decision and Defensibility (~15 minutes)

Exercise 4: Pick the Court-Defensible Version

Review your STI results across all versions.
Choose:
- one final version for A (likely unchanged or minimally changed),
- one final version for B,
- one final version for C.
Keep each original clip available, unmodified, as the unenhanced reference.
Write a short forensic justification for each clip based on:
- Measured STI trend
- Processing transparency (what was changed)
- Any trade-offs observed (quality vs intelligibility)
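
The selection itself can be sketched as picking the highest-STI version per clip and falling back to the original when no processed version beats it. The STI values in the example are placeholders, not measurements. The function name pick_final and the file clip_A_hp.wav are hypothetical, standing in for whatever quick verification export you made for clip A.

```python
def pick_final(versions, original):
    """Return (chosen_file, sti_gain_over_original).

    `versions` maps file names to measured STI and must include `original`.
    The original is kept unless a processed version scores strictly higher.
    """
    if original not in versions:
        raise ValueError("versions must include the original clip")
    best = max(versions, key=versions.get)
    if versions[best] <= versions[original]:
        best = original
    return best, round(versions[best] - versions[original], 3)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    clip_b = {
        "clip_B_mid.wav": 0.52,
        "clip_B_eq_final.wav": 0.61,
        "clip_B_comp.wav": 0.63,
        "clip_B_comp_expand.wav": 0.60,
    }
    clip_a = {"clip_A_high.wav": 0.82, "clip_A_hp.wav": 0.81}
    print(pick_final(clip_b, "clip_B_mid.wav"))
    print(pick_final(clip_a, "clip_A_high.wav"))
```

The STI figure is only the starting point for the justification; the statement still has to describe what was changed and any trade-offs you heard.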

Turn in:
Word doc questions:
Which final file did you select for A, B, and C, and why?
Provide a 3-4 sentence defensibility statement for each clip using STI evidence.
For A, did STI evidence support doing little or no processing?
Would you submit any processed version that lowered STI even if it sounded cleaner? Explain.