Dubbing Workflow

import { Steps } from '@astrojs/starlight/components';

The dubbing pipeline runs four stages in sequence: ASR transcription → translation → TTS synthesis → mux. This page explains each stage, how to monitor progress, and how to review and fix individual segments without re-running the whole pipeline.

Before you start

Make sure you have:

A project with a source video configured.
A target language and voice selected.
The required AI models downloaded (UltiVoice prompts you if any are missing).

Start the pipeline

Click Start Dubbing in the Pipeline tab (or press Ctrl+Enter). UltiVoice queues the job and begins immediately.

A progress panel shows the current stage and estimated time remaining. You can navigate to other tabs or minimise the window — the pipeline runs in the background.

Stage 1 — Transcription (ASR)

Whisper large-v3 reads the audio and produces a timed transcript: each spoken segment with start/end timestamps and the original text.

Typical speed: 5–10× real-time on GPU; 0.5–1× on CPU.
A 5-minute video takes ~30–60 seconds on a modern NVIDIA GPU.
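
The timing figures above follow from dividing the clip length by the real-time factor. A minimal sketch of that arithmetic; the function name estimate_asr_seconds is illustrative only and is not part of UltiVoice:

```python
def estimate_asr_seconds(duration_s: float, realtime_factor: float) -> float:
    """Estimate transcription time for a clip processed at a given real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return duration_s / realtime_factor


five_minutes = 5 * 60  # clip length in seconds

gpu_best = estimate_asr_seconds(five_minutes, 10)    # 10x real-time
gpu_worst = estimate_asr_seconds(five_minutes, 5)    # 5x real-time
cpu_worst = estimate_asr_seconds(five_minutes, 0.5)  # 0.5x real-time
print(gpu_best, gpu_worst, cpu_worst)
```

On CPU at 0.5× real-time, the same 5-minute clip would take roughly 10 minutes.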

When transcription completes, the Transcript tab populates with all segments.

Stage 2 — Translation

Each segment is translated from the source language into the target language using the local Qwen2.5-7B model (or the translation engine configured in Settings).

Translation preserves segment boundaries and timing anchors.
Names, proper nouns, and untranslatable terms are carried through unchanged.
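
One way to picture "preserves segment boundaries" is a translation step that replaces only the text of each segment and leaves the timestamps untouched. The sketch below is an assumption about the data shape, not UltiVoice's internal model: the Segment class, translate_segments, and the toy glossary translator are all hypothetical stand-ins.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def translate_segments(
    segments: List[Segment], translate_text: Callable[[str], str]
) -> List[Segment]:
    """Return new segments with translated text and unchanged timing."""
    return [replace(seg, text=translate_text(seg.text)) for seg in segments]


# Toy translator for demonstration only; a real engine would call a model.
GLOSSARY = {
    "Hello, Maria.": "Hola, Maria.",
    "Welcome back.": "Bienvenida de nuevo.",
}


def toy_translate(text: str) -> str:
    # Text without a glossary entry is carried through unchanged.
    return GLOSSARY.get(text, text)


source = [Segment(0.0, 1.4, "Hello, Maria."), Segment(1.6, 3.0, "Welcome back.")]
translated = translate_segments(source, toy_translate)
```

Because each output segment is built from its input segment, the segment count and every start/end pair stay identical across the translation stage.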

Stage 3 — TTS synthesis

The translated text is converted to speech using the selected voice. Timing is adjusted to fit the original segment durations where possible.

Segments that are significantly shorter or longer than the original audio are flagged for review (shown in yellow in the Transcript tab).
Voice cloning applies the reference timbre to each synthesised segment.
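
The review flag can be thought of as a relative duration check. The sketch below is illustrative: the page does not state what counts as "significantly" shorter or longer, so the 15% tolerance and the needs_review name are assumptions.

```python
def needs_review(original_s: float, synthesised_s: float, tolerance: float = 0.15) -> bool:
    """Flag a segment whose synthesised length deviates too far from the original.

    tolerance is the allowed relative difference (0.15 = 15%); this default is
    an assumed value chosen for illustration.
    """
    if original_s <= 0:
        raise ValueError("original_s must be positive")
    return abs(synthesised_s - original_s) / original_s > tolerance


print(needs_review(4.0, 4.2))  # 5% longer: within tolerance
print(needs_review(4.0, 5.0))  # 25% longer: flagged
print(needs_review(4.0, 3.0))  # 25% shorter: flagged
```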

Stage 4 — Mux

FFmpeg assembles the final video: the dubbed audio track replaces the original, subtitles are embedded or burnt in per your settings, and the output is written to the project folder.
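
For orientation, a mux of this kind could be expressed as an FFmpeg command along the following lines. This is a sketch under stated assumptions, not the command UltiVoice actually runs: it assumes MP4 output, AAC audio, and soft (embedded) subtitles, and the file names and the build_mux_command helper are hypothetical.

```python
from typing import List, Optional


def build_mux_command(
    video: str, dubbed_audio: str, output: str, subtitles: Optional[str] = None
) -> List[str]:
    """Build an ffmpeg argument list that swaps in the dubbed audio track."""
    cmd = ["ffmpeg", "-y", "-i", video, "-i", dubbed_audio]
    if subtitles:
        cmd += ["-i", subtitles]
    # Take video from the first input and audio from the second,
    # which drops the original audio track from the output.
    cmd += ["-map", "0:v:0", "-map", "1:a:0"]
    if subtitles:
        # Embed subtitles as a selectable track (mov_text is the MP4 text codec).
        cmd += ["-map", "2:s:0", "-c:s", "mov_text"]
    # Copy the video stream without re-encoding; encode the audio as AAC.
    cmd += ["-c:v", "copy", "-c:a", "aac", output]
    return cmd


cmd = build_mux_command("source.mp4", "dubbed.wav", "dubbed.mp4", "subs.srt")
print(" ".join(cmd))
```

Burnt-in subtitles would need a video filter and a video re-encode instead of the stream copy shown here.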

Reviewing the output

After the pipeline completes:

Open the Preview tab. Use the Original / Dubbed toggle to compare audio tracks.
Open the Transcript tab to see all segments. Flagged segments (timing mismatch or low TTS confidence) appear highlighted.

Editing a segment

Click any segment row in the Transcript tab to select it.
Edit the Translated text field to correct terminology, smooth unnatural phrasing, or adjust text that has to fit the segment's timing.
Click Re-synthesise segment to regenerate only that segment’s audio. This takes a few seconds.
Preview the segment in isolation with the play button.
Repeat for any other segments that need adjustment.

Re-synthesising an individual segment does not re-run the full pipeline: only that segment's audio is regenerated, and the mux is updated.
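
Conceptually, a per-segment re-synthesis touches one entry and leaves the rest of the audio alone. The following is a minimal sketch of that idea, assuming audio is stored per segment ID; resynthesise_segment and the fake synthesiser are hypothetical and stand in for the real TTS call.

```python
from typing import Callable, Dict, List


def resynthesise_segment(
    audio_by_id: Dict[int, bytes],
    segment_id: int,
    new_text: str,
    synthesise: Callable[[str], bytes],
) -> Dict[int, bytes]:
    """Return a copy of the audio map with only one segment regenerated."""
    if segment_id not in audio_by_id:
        raise KeyError(f"unknown segment: {segment_id}")
    updated = dict(audio_by_id)  # every other segment keeps its existing audio
    updated[segment_id] = synthesise(new_text)
    return updated


calls: List[str] = []


def fake_synthesise(text: str) -> bytes:
    # Stand-in for a TTS engine: records the call and returns placeholder bytes.
    calls.append(text)
    return text.encode("utf-8")


audio = {1: b"hola", 2: b"adios"}
updated = resynthesise_segment(audio, 2, "hasta luego", fake_synthesise)
```

The synthesiser is invoked once, for the edited segment only, which is why the operation takes seconds instead of a full pipeline run.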

Re-running the full pipeline

If you change the source language, target language, or voice, you need to re-run the full pipeline. Click Re-run pipeline in the Pipeline tab. The new output replaces the previous output files, which are first moved to a _previous subfolder.
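
The backup behaviour described above can be sketched as follows. This illustrates the idea only: the _previous folder name comes from this page, while the archive_previous_outputs helper, the file names, and the choice to overwrite older backups are assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def archive_previous_outputs(project_dir: Path, filenames: Iterable[str]) -> List[Path]:
    """Move existing output files into a _previous subfolder before a re-run."""
    previous = project_dir / "_previous"
    previous.mkdir(exist_ok=True)
    moved = []
    for name in filenames:
        src = project_dir / name
        if src.is_file():
            dest = previous / name
            # Path.replace overwrites an older backup with the same name.
            src.replace(dest)
            moved.append(dest)
    return moved
```

A call such as archive_previous_outputs(Path("my_project"), ["dubbed.mp4"]) would leave the project folder free for the new render while keeping one earlier copy.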

Pipeline errors

If any stage fails, the progress panel shows an error badge with a log excerpt. Common causes and fixes are covered in Troubleshooting. The pipeline is restartable — fix the issue and click Start Dubbing again; completed stages are cached and skipped.
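
The restart behaviour can be pictured as a runner that remembers which stages have finished. The sketch below is a simplified model, not UltiVoice's implementation: run_pipeline, the in-memory completed set, and the simulated failure are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage], completed: Set[str]) -> List[str]:
    """Run stages in order, skipping any already recorded as completed."""
    ran = []
    for name, stage in stages:
        if name in completed:
            continue  # cached from an earlier run
        stage()  # an exception here stops the run and leaves `completed` intact
        completed.add(name)
        ran.append(name)
    return ran


log: List[str] = []
state = {"translation_fails": True}


def asr() -> None:
    log.append("asr")


def translation() -> None:
    if state["translation_fails"]:
        raise RuntimeError("simulated failure")
    log.append("translation")


def tts() -> None:
    log.append("tts")


def mux() -> None:
    log.append("mux")


stages = [("asr", asr), ("translation", translation), ("tts", tts), ("mux", mux)]
completed: Set[str] = set()

try:
    run_pipeline(stages, completed)  # fails at translation; asr is already recorded
except RuntimeError:
    state["translation_fails"] = False  # simulate fixing the issue

second_run = run_pipeline(stages, completed)  # resumes at translation
```

In this model the transcription stage runs exactly once across both attempts, which mirrors the "cached and skipped" behaviour described above.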

Next steps

Subtitles — edit timing and style before export.
Export & Download — render the final video.