Extract One Speaker from a Recording
Isolate the main speaker from everything else (other voices, music and noise) to get a focused single-voice track from a busy recording.
Pro Premium AI tool, included with any paid plan.
How it works
The separation engine extracts the foreground voice and pushes background talkers, music and ambience down, leaving the target speaker as the clear centre of the recording.
What it's good for
- Pulling one host from a noisy room
- Focusing a presenter over a crowd
- Isolating dialogue from ambience
- Prepping a clean voice for cloning
Details
- Engine: Demucs
- Formats: MP3, WAV, M4A, FLAC, OGG, AAC
- Price: Paid plans
Frequently asked questions
How is this different from denoising?
Denoising removes non-speech noise but leaves other voices and music in place. Target-speaker extraction removes competing voices and music as well, keeping only the main speaker.
Can I choose which speaker to keep?
Not today. The tool automatically extracts the dominant foreground voice. Reference-guided extraction is planned for a future update.
Are other voices removed completely?
They're strongly reduced. A background voice as loud as the target is the hardest case and may leave faint traces.
How is this different from speaker separation?
Speaker separation untangles two people talking over each other on an otherwise clean track. This tool isolates one voice from a full mix of other talkers, music and ambient noise.
Can I use the result for voice cloning?
Yes. It produces a focused single-voice track that works well as cloning input, though a short, naturally clean recording will always beat a heavily processed extraction.
Does it remove music and background noise too?
Yes. Music and ambience are pushed down along with competing talkers, so the target speaker is left as the clear foreground of the recording.
How long does processing take?
Most clips finish within a minute. Processing time depends on the length of the recording, not on how crowded the background is.