Separate Overlapping Speakers
When two people talk over each other on one track, this tool pulls the dominant voice forward and reduces the overlap, making cross-talk easier to follow and transcribe.
Pro: a Premium AI tool, included with any paid plan.
How it works
Overlapping speech is separated by source so the foreground speaker is isolated from the competing voice and room. The result is a clearer single-speaker track from a messy cross-talk recording.
What it's good for
- Cross-talk in interviews
- Single-mic two-person recordings
- Cleaning up debate audio
- Preparing overlapping speech for transcription
Details
- Engine: Demucs
- Formats: MP3, WAV, M4A, FLAC, OGG, AAC
- Price: Paid plans
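If you batch-prepare files before uploading, a quick extension check against the formats listed above can catch unsupported files early. This is a minimal sketch only: the helper name `is_supported_format` is hypothetical and not part of any published API, and it checks only the file extension, not the actual audio encoding.

```python
from pathlib import Path

# Formats listed in the Details section of this page. The helper below is
# hypothetical and only inspects the file extension, not the audio stream.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac"}


def is_supported_format(filename: str) -> bool:
    """Return True if the final extension matches a listed format (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["interview.WAV", "debate.flac", "notes.txt", "archive.mp3.zip"]:
        print(name, is_supported_format(name))
```

Because only the last suffix counts, `archive.mp3.zip` is rejected even though it contains `.mp3` in its name.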
Frequently asked questions
It isolates the dominant near-field speaker and suppresses the overlap. Splitting every speaker into their own track (full per-speaker diarization and separation) is on our roadmap; today the tool returns the cleaned foreground voice.
Yes. Reducing the competing voice and room sound makes speech-to-text far more accurate on overlapping sections.
Recordings where your target speaker is closest to the mic separate best, since proximity gives the model a strong foreground cue.
Speaker separation is built for two voices talking over each other on a clean track, while target-speaker extraction pulls one voice out of a wider mess of voices, music and noise.
It is tuned for two overlapping voices; with three or more it still lifts the dominant near-field speaker, but the result is less clean than a true two-person cross-talk recording.
A typical interview segment processes in under a minute, scaling with clip length rather than with how much overlap there is.
Equal-volume voices are the hardest case and leave more of the competing speaker behind; the tool performs best when your target is clearly the closer, louder voice.
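A simple loudness comparison shows why equal-volume voices are hard. Suppose you already had the two voices as separate stems: a level-based heuristic can pick the closer speaker only when there is a clear level gap between them. The sketch below is a hypothetical illustration and does not describe how the tool decides internally; the function names and the 3 dB margin are assumptions chosen for the example.

```python
import math
from typing import Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a mono signal (samples in the range -1.0..1.0)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(stem_a: Sequence[float], stem_b: Sequence[float],
                        floor: float = 1e-12) -> float:
    """Level of stem_a relative to stem_b in dB: 20 * log10(rms_a / rms_b).
    The floor avoids dividing by zero or taking log10(0) on silent input."""
    return 20.0 * math.log10(max(rms(stem_a), floor) / max(rms(stem_b), floor))


def pick_foreground(stem_a: Sequence[float], stem_b: Sequence[float],
                    min_margin_db: float = 3.0) -> Optional[str]:
    """Return 'a' or 'b' for the louder stem, or None when the levels are too
    close to call. min_margin_db is an illustrative threshold (an assumption)."""
    diff = level_difference_db(stem_a, stem_b)
    if abs(diff) < min_margin_db:
        return None
    return "a" if diff > 0 else "b"


if __name__ == "__main__":
    near = [0.5, -0.5] * 100   # louder, closer voice
    far = [0.1, -0.1] * 100    # quieter, more distant voice
    print(pick_foreground(near, far))   # clear gap: the near voice wins
    print(pick_foreground(near, near))  # equal levels: no reliable cue
```

With a 5:1 amplitude ratio the gap is about 14 dB and the louder stem is chosen; with equal levels the difference is 0 dB and the heuristic declines to pick, which mirrors why equal-volume cross-talk leaves more of the competing speaker behind.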