Separate Overlapping Speakers

When two people talk over each other on one track, this tool pulls the dominant voice forward and reduces the overlap, making cross-talk easier to follow and transcribe.

Premium AI tool, included with any paid plan.

How it works

Overlapping speech is separated by source: the foreground speaker is isolated from the competing voice and from the room sound. The result is a clearer single-speaker track from a messy cross-talk recording.
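As a rough illustration of source separation with the engine named under Details, the sketch below builds arguments for the open-source `demucs` command line. This page does not describe how the tool actually calls Demucs or how it picks the dominant speaker, so this is an assumption-laden sketch, not the tool's pipeline. The helper name `build_demucs_args`, the file name `interview.wav`, and the choice of the `htdemucs` model are all hypothetical. Stock Demucs with `--two-stems vocals` splits voice from everything else; it does not by itself separate one speaker from another.

```python
from pathlib import Path


def build_demucs_args(input_path: str,
                      out_dir: str = "separated",
                      model: str = "htdemucs") -> list[str]:
    """Build arguments for the open-source `demucs` CLI (hypothetical helper).

    Assumption: this only mirrors the public command-line flags; it is not
    how this tool is known to invoke its engine.
    """
    return [
        "--two-stems", "vocals",  # write only two stems: vocals and no_vocals
        "-n", model,              # name of the pretrained model to load
        "-o", out_dir,            # folder where the stems are written
        str(Path(input_path)),    # the recording to process
    ]


args = build_demucs_args("interview.wav")
print("demucs", *args)
```

With Demucs and ffmpeg installed, the printed command can be run from a shell, or the same list can be passed to `demucs.separate.main(args)` from Python.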

What it's good for

  • Cross-talk in interviews
  • Single-mic two-person recordings
  • Cleaning up debate audio
  • Transcription prep for overlapping speech

Details

Engine: Demucs
Formats: MP3, WAV, M4A, FLAC, OGG, AAC
Price: Paid plans

Frequently asked questions

The tool isolates the dominant near-field speaker and suppresses the overlapping voice. Splitting every speaker onto a separate track is on our roadmap; today it cleans up the foreground voice only.

It also helps transcription: reducing the competing voice and the room sound makes speech-to-text far more accurate on overlapping sections.

Recordings where your target speaker is closest to the mic separate best, since proximity gives the model a strong foreground cue.

Speaker separation is built for two voices talking over each other on a clean track, while target-speaker extraction pulls one voice out of a wider mess of voices, music and noise.

It is tuned for two overlapping voices; with three or more it still lifts the dominant near-field speaker, but the result is less clean than a true two-person cross-talk recording.

A typical interview segment processes in under a minute, scaling with clip length rather than with how much overlap there is.

Equal-volume voices are the hardest case and leave more of the competing speaker behind; the tool performs best when your target is clearly the closer, louder voice.
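One rough way to check whether a recording has a clearly louder target voice is to compare the RMS level of each speaker, measured on short stretches where each talks alone. The sketch below is only an illustration: the function names are hypothetical, the samples are assumed to be floats in the range -1.0 to 1.0, and this page gives no level threshold at which separation becomes reliable.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(target: list[float], competing: list[float]) -> float:
    """How many dB louder the target speaker is than the competing one.

    Positive values mean the target is louder; values near 0 dB correspond
    to the equal-volume case described above.
    """
    target_rms = rms(target)
    competing_rms = rms(competing)
    if target_rms == 0.0 or competing_rms == 0.0:
        raise ValueError("both excerpts must contain non-silent audio")
    return 20.0 * math.log10(target_rms / competing_rms)


# Synthetic excerpts: the target has twice the amplitude of the competing voice.
target_excerpt = [0.5, -0.5] * 100
competing_excerpt = [0.25, -0.25] * 100
print(round(level_difference_db(target_excerpt, competing_excerpt), 2))  # 6.02
```

A difference close to 0 dB suggests the equal-volume case, where more of the competing speaker is likely to remain in the result.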
