Audio / Voice / Video

This area covers 48 features. Watch the walkthrough, then use the reference below — each feature links to the exact moment it appears (▶).

Who this is for: Translator (oral / audio drafting). Each feature below lists the role/permission it requires.

Features

Audio lens toggle (Text ↔ Audio) ▶ 00:00

As a translator, I want to switch between text and audio editing lenses so that I can focus on different aspects of my work.

How it works. The header bar contains a segmented control with "Text" and "Audio" buttons. Clicking a button toggles its aria-pressed state without changing the route; cells stay mounted during the lens switch.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/EditorTable.tsx

Voice library creation — Gemini voice ▶ 00:06

As a voice actor, I want to create a new voice character by naming it and describing how it sounds, so that my TTS output matches my creative vision.

How it works. CharacterModal (or NewVoiceModal) opens with a Name input and an Engine/Provider selector. Selecting the Gemini engine shows a base-voice picker and a Guidance textarea. The user enters a name and an optional description prompt; Save creates the voice and adds it to the project library.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:1-80
src/lib/audio/voices.ts:45-110

Voice library creation — Clone voice ▶ 00:00

As a voice actor, I want to clone a voice from a reference recording so that my TTS output preserves my unique timbre.

How it works. CharacterModal shows VoiceCloneSection, which offers three ways to capture a reference: record from the mic, upload a file, or reuse a take from the project's cells. A recorded or uploaded clip is auto-saved to R2, and its id is stored in Voice.referenceAudioId. Save creates the voice; generated audio is later re-voiced into that timbre via Seed-VC.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/VoiceCloneSection.tsx:1-150
src/components/voice/CharacterModal.tsx:200-250
src/lib/audio/voice-clone.ts:1-80

Voice library — Make narrator (default voice) ▶ 00:06

As a translator, I want to designate one voice as the narrator so that lines without explicit voice assignments use that voice by default.

How it works. When the voice is not the default, the CharacterModal footer shows a 'Make narrator' button. Clicking it stores the voice's id as defaultVoiceId in ProjectTtsSettings, and the footer then shows a 'Narrator' badge instead. From then on, that Voice.id is the fallback used by the project's resolveVoice().

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:286-298
src/lib/audio/voices.ts:52-72
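
The narrator fallback can be pictured as a two-step lookup: the cell's explicitly assigned voice first, then defaultVoiceId. This is a minimal sketch; the Voice and ProjectTtsSettings shapes are simplified assumptions, not the actual types in voices.ts, and falling back when the assigned id no longer exists is an assumed behavior.

```typescript
// Simplified, hypothetical shapes; the real types in src/lib/audio/voices.ts may differ.
interface Voice {
  id: string;
  name: string;
}

interface ProjectTtsSettings {
  voices: Voice[];
  defaultVoiceId?: string; // set by "Make narrator"
}

// Explicit assignment wins; otherwise fall back to the narrator (if any).
function resolveVoice(
  settings: ProjectTtsSettings,
  assignedVoiceId: string | undefined,
): Voice | undefined {
  const byId = (id: string | undefined): Voice | undefined =>
    id === undefined ? undefined : settings.voices.find((v) => v.id === id);
  return byId(assignedVoiceId) ?? byId(settings.defaultVoiceId);
}

// "Make narrator": record the voice as the project default.
function makeNarrator(settings: ProjectTtsSettings, voiceId: string): ProjectTtsSettings {
  return { ...settings, defaultVoiceId: voiceId };
}
```

With Ana as narrator, an unassigned line resolves to Ana while a line assigned to Ben stays with Ben.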

Voice library — Edit character ▶ 00:00

As a voice actor, I want to edit an existing character's name, engine, base voice, guidance, and reference clip so that I can refine my voice over time.

How it works. CharacterModal opens with the voice's current values pre-filled. Name, color, engine, base voice name, and guidance are editable, and the reference clip can be replaced or removed. Switching engines resets the base voice to the new engine's default. Save persists the changes.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:138-310

Voice library — Delete character ▶ 00:00

As a project admin, I want to delete a voice character so that it no longer appears in the cast library.

How it works. For voices that are not built in, the CharacterModal footer shows a 'Delete' button. Clicking it opens a confirmation dialog that requires ticking a checkbox before the destructive action is allowed. Deleting removes the voice from the project library for all team members.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:299-331

Voice preview/test synthesis ▶ 00:00

As a voice actor, I want to hear how my character sounds on a test sentence before committing to it.

How it works. CharacterModal has a Preview button (Play/Pause icon). Clicking it synthesizes SAMPLE_TEXT ("The quick brown fox...") with the active voice, engine, and API key. For local engines (Kokoro, MMS), a consent dialog may appear first. Audio plays inline, and the button reflects the loading, playing, or idle state. Errors surface in the preview pane.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:189-239

Voice reference clip preview ▶ 00:00

As a voice actor, I want to hear my reference recording so that I can verify it was captured correctly.

How it works. When referenceAudioId is set, VoiceCloneSection shows a ReferencePreview component with Play/Pause buttons. The reference clip is fetched from R2 and streamed for preview.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/VoiceCloneSection.tsx:148-200

Record audio — Full-screen modal ▶ 00:00

As a translator, I want to record audio for a cell using my microphone in a dedicated, distraction-free interface so that I can focus on capturing clear takes.

How it works. Clicking the 'Record audio' button opens AudioRecordingModal full-screen. The modal shows the active cell label, a 3-2-1 countdown before recording starts, a live waveform while recording, a duration bar, and a beep toggle. Keyboard shortcuts: Space starts/stops recording, Esc cancels, and the arrow keys move between cells. Audio is recorded as WebM/Opus, or another format the browser supports. The modal moves through these phases: idle → counting → recording → preview → uploading → saved.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/AudioRecordingModal.tsx:1-150
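
The phase sequence behaves like a small finite-state machine. The sketch below uses the phase names from this page; the event names, the cancel transitions, and the "saved → idle on next cell" step are hypothetical and not taken from AudioRecordingModal.

```typescript
// Phase names come from the description above; event names are hypothetical.
type Phase = "idle" | "counting" | "recording" | "preview" | "uploading" | "saved";
type RecorderEvent =
  | "start" // Space while idle
  | "countdownDone"
  | "stop" // Space while recording
  | "save"
  | "uploaded"
  | "retake"
  | "cancel" // Esc
  | "nextCell"; // arrow key after saving (assumed)

const transitions: Record<Phase, Partial<Record<RecorderEvent, Phase>>> = {
  idle: { start: "counting" },
  counting: { countdownDone: "recording", cancel: "idle" },
  recording: { stop: "preview", cancel: "idle" },
  preview: { save: "uploading", retake: "idle", cancel: "idle" },
  uploading: { uploaded: "saved" },
  saved: { nextCell: "idle" },
};

// Events that are not valid in the current phase leave it unchanged.
function nextPhase(phase: Phase, event: RecorderEvent): Phase {
  return transitions[phase][event] ?? phase;
}
```

A table-driven design keeps illegal transitions (for example, cancelling mid-upload) as no-ops instead of scattering guards across key handlers.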

Record audio — Mic permission handling ▶ 00:00

As a user, I want clear guidance when my microphone access is blocked so that I can unblock it and resume recording.

How it works. CellAudioRecordButton calls getUnsupportedReason() on mount; if recording is unsupported, the button is disabled and a tooltip explains why. If micDenied is true, the button shows a MicOff icon in amber, and clicking it opens a help popover explaining how to re-enable the microphone in the browser's site settings. The modal's startFlow() also probes mic permission before the countdown starts.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/CellAudioRecordButton.tsx:18-92
src/components/AudioRecorder/probeMicPermission.ts

Recording takes — Audition (preview playback) ▶ 00:00

As a translator, I want to preview a recording take before saving it so that I can decide whether to keep or retake.

How it works. In the preview phase, AudioRecordingModal shows a Play button that plays the just-recorded blob. Playback stops at the end of the clip or when the user stops it; the button reflects the play/pause state.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/AudioRecordingModal.tsx:145-230

Recording takes — Save recording ▶ 00:00

As a translator, I want to save a recording to the cell so that it persists in the project and can be played back later.

How it works. In the preview phase, AudioRecordingModal has a 'Save' button. Clicking it uploads the blob to R2, emits a cell.audio.attach event, and marks the cell as having audio in the 'recording' slot. A spinner and progress indicator show during upload. On success, the phase moves to 'saved' and the user can move to the next cell or close the modal.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/AudioRecordingModal.tsx:150-300

Recording takes — Retake ▶ 00:00

As a translator, I want to discard a preview and record again so that I can capture a better take.

How it works. In the preview phase, AudioRecordingModal has a 'Retake' button. Clicking it discards the preview, resets the recorder state, and returns to the idle phase so the user can record again.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/AudioRecordingModal.tsx:200-250

Recording takes — Delete take ▶ 00:00

As a translator, I want to delete an old or unwanted recording so that only my best takes remain.

How it works. TakesStrip in the recording modal shows the active cell's recorded takes as circular badges, each with a delete (X) icon. Clicking the icon removes the take from R2 and updates the UI; the deletion goes through emitCellAudioDetach or a direct R2 DELETE request.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/TakesStrip.tsx

Recording takes — Select active take ▶ 00:00

As a translator, I want to choose which of my multiple recordings is the active one for the cell so that I can use my favorite take.

How it works. TakesStrip badges show each take's state, with the current active take highlighted by an emerald ring. Clicking another badge selects it by emitting a cell.audio.select event. Only one take per slot is active for a cell at any time.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/AudioRecorder/TakesStrip.tsx
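
The "one active take per cell per slot" rule maps naturally onto a map keyed by (cell, slot). A minimal sketch, assuming a hypothetical cell.audio.select payload of { cellId, slot, audioId }; the real event may carry more fields.

```typescript
// Slot names appear on this page; the event payload shape is an assumption.
type AudioSlot = "recording" | "generatedVoice";

interface CellAudioSelectEvent {
  type: "cell.audio.select";
  cellId: string;
  slot: AudioSlot;
  audioId: string;
}

// Key = "cellId:slot", value = the active audioId. One entry per (cell, slot).
type ActiveTakes = Map<string, string>;

const slotKey = (cellId: string, slot: AudioSlot): string => `${cellId}:${slot}`;

// Selecting a take replaces whatever was active in that cell's slot.
function applySelect(active: ActiveTakes, ev: CellAudioSelectEvent): ActiveTakes {
  const next = new Map(active);
  next.set(slotKey(ev.cellId, ev.slot), ev.audioId);
  return next;
}

function activeTake(active: ActiveTakes, cellId: string, slot: AudioSlot): string | undefined {
  return active.get(slotKey(cellId, slot));
}
```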

Remove noise (RNNoise denoising) ▶ 00:00

As a translator, I want to remove background noise from my recording so that my audio is clearer and more professional.

How it works. DenoiseButton appears in the cell audio tab when a recording exists. It has three states: 'Remove noise' (available), an animated 'Removing noise...' (processing), and 'Noise removed ✓' with a Revert option (done). Clicking 'Remove noise' runs RNNoise on-device, creates a new denoised take whose audioId is prefixed with 'dn-', and optimistically selects it. Revert re-selects the original take via referenceAudioId.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/audio/DenoiseButton.tsx:1-162
src/lib/audio/denoise.ts:105-178
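
The bookkeeping around a denoised take (prefix, back-pointer, revert target) can be sketched separately from the RNNoise processing itself. This assumes, hypothetically, that the suffix after 'dn-' is the original audioId; the page only states the prefix.

```typescript
// Hypothetical take record; only audioId and referenceAudioId are named on this page.
interface Take {
  audioId: string;
  referenceAudioId?: string; // for a denoised take: the take it was derived from
}

const DENOISED_PREFIX = "dn-";

// Metadata for the new denoised take (assumption: "dn-" + original id).
function makeDenoisedTake(original: Take): Take {
  return {
    audioId: `${DENOISED_PREFIX}${original.audioId}`,
    referenceAudioId: original.audioId,
  };
}

function isDenoised(take: Take): boolean {
  return take.audioId.startsWith(DENOISED_PREFIX);
}

// Revert: re-select the take the denoised one points back to.
function revertTarget(selected: Take): string {
  return isDenoised(selected) && selected.referenceAudioId
    ? selected.referenceAudioId
    : selected.audioId;
}
```

Because the original take is never overwritten, Revert is just another selection, not an undo of audio processing.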

Noise removal support detection ▶ 00:00

As a user, I want to know whether my browser supports noise removal so that I'm not confused by a missing feature.

How it works. DenoiseButton is disabled and greyed out when isDenoiseSupported() returns false; its tooltip reads 'Noise removal isn't supported in this browser'.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/audio/DenoiseButton.tsx:56-157
src/lib/audio/denoise.ts:104-115

Transcribe audio to text (Whisper) ▶ 00:00

As a translator, I want to transcribe a recording to text so that I can use it as a starting point for translation or to verify accuracy.

How it works. The CellTranscribeBadge button in the cell audio tab reads 'Transcribe', or 'Transcribing...' while in progress. Clicking it sends the recording's bytes to the on-device Whisper worker, which produces word-level timings (chunks). On success, the app emits cell.audio.attach with the timings payload and the badge shows a success checkmark; on failure, it shows an error state.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/transcribe.ts:95-252
src/components/CellTranscribeBadge.tsx

Transcription consent gate ▶ 00:00

As a user, I want to be informed about the Whisper model download before it happens so that I can make an informed choice.

How it works. The first time a user clicks 'Transcribe', AiModelConsentDialog appears with the Whisper model's label, its download size (~140 MB), and the rationale. The user clicks 'Accept' or 'Cancel'. Accepting stores consent in localStorage (codex.aiConsent.whisper = '1'), so later transcriptions skip the dialog.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/ai-consent.ts:88-152
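
A consent gate of this kind can be sketched against a minimal Web Storage-like interface, so it runs outside a browser (in the app, window.localStorage would be passed in). Only the Whisper key is documented here; using the same "codex.aiConsent.<model>" pattern for Kokoro and MMS is an assumption, as is the askUser callback.

```typescript
// Minimal subset of the Web Storage API (localStorage satisfies it).
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type AiModel = "whisper" | "kokoro" | "mms";

// Documented for whisper; the same pattern for other models is assumed.
const consentKey = (model: AiModel): string => `codex.aiConsent.${model}`;

// Ask only when no consent is stored; persist "1" when the user accepts.
async function ensureConsent(
  store: KeyValueStore,
  model: AiModel,
  askUser: () => Promise<boolean>, // hypothetical: resolves true on Accept, false on Cancel
): Promise<boolean> {
  if (store.getItem(consentKey(model)) === "1") return true;
  const accepted = await askUser();
  if (accepted) store.setItem(consentKey(model), "1");
  return accepted;
}
```

Cancelling stores nothing, so the dialog appears again on the next attempt.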

Generate voice audio (TTS synthesis) ▶ 00:00

As a translator, I want to generate a voiced version of a cell's text using a character voice so that I can hear how it sounds.

How it works. CellVoicePanel shows cast chips for the active cell. Clicking a chip (or Generate in the overflow menu) synthesizes the cell's text with that voice, uploads the result to R2, and attaches it in the 'generatedVoice' slot. A spinner shows during synthesis; on success, the audio plays automatically.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/generate-voice.ts:48-174
src/components/cell/CellVoicePanel.tsx

Generate voice — Provider selection ▶ 00:00

As a project admin, I want to choose which TTS engine to use (OmniVoice, Gemini, Kokoro, MMS) so that I can balance quality, cost, and latency.

How it works. The project settings TTS section and CharacterModal show engine options grouped by tier (Cloud: OmniVoice, Gemini; On-device: Kokoro, MMS), each with an icon, title, caveat, and tooltip. Selecting an engine updates ProjectTtsSettings.provider. An engine set on an individual voice overrides the project default.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/tts-providers.ts:43-203
src/components/voice/CharacterModal.tsx:329-380
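
The override order reduces to a short precedence chain: per-voice engine, then the project provider, then OmniVoice as the documented default. The engine identifiers and object shapes below are assumptions; the real values in tts-providers.ts may differ.

```typescript
// Hypothetical identifiers for the four engines named on this page.
type TtsEngine = "omnivoice" | "gemini" | "kokoro" | "mms";

const ENGINE_TIER: Record<TtsEngine, "cloud" | "on-device"> = {
  omnivoice: "cloud",
  gemini: "cloud",
  kokoro: "on-device",
  mms: "on-device",
};

interface VoiceEngineChoice {
  engine?: TtsEngine; // per-voice override
}

interface ProjectTts {
  provider?: TtsEngine; // ProjectTtsSettings.provider
}

// Voice override > project provider > OmniVoice (the default project engine).
function resolveEngine(voice: VoiceEngineChoice, project: ProjectTts): TtsEngine {
  return voice.engine ?? project.provider ?? "omnivoice";
}
```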

Generate voice — OmniVoice (recommended cloud engine) ▶ 00:00

As a translator, I want a default TTS option that requires no setup so that I can start voicing immediately without API keys.

How it works. OmniVoice is the project's default TTS engine and needs no API key. Synthesis runs server-side (sync-worker) and supports voice cloning natively from a reference clip. Progress moves through two states: loading → synthesizing. The speed/quality trade-off is handled server-side.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/generate-voice.ts:63-101

Generate voice — Gemini TTS (BYOK cloud engine) ▶ 00:00

As a user with a Google API key, I want to use Gemini TTS for its high quality and multilingual support so that I can voice multiple languages.

How it works. Project settings accept a Gemini API key; when a valid key is present, synthesis succeeds. The CharacterModal engine picker shows Gemini with the caveat 'Needs your own Google AI key'. Synthesis uses synthesizeGeminiTtsToWavBlob. If the key is missing, synthesis fails with an 'Add a Gemini API key' error.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/gemini-tts.ts
src/lib/audio/tts.ts:140-148

Generate voice — Kokoro TTS (on-device English engine) ▶ 00:00

As a user who wants free, private TTS without API keys, I want to use Kokoro for English text so that my data never leaves my browser.

How it works. Kokoro is an engine option in CharacterModal. First use shows a consent dialog for the ~80 MB model download. Synthesis runs on-device in the Kokoro worker, and no API key is needed. Named voices (e.g., af_heart) are available, and a speed parameter is supported.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/tts.ts:177-202
src/lib/audio/kokoro-worker.ts

Generate voice — MMS TTS (on-device multilingual engine) ▶ 00:00

As a translator working with multiple languages, I want to use MMS for free multilingual TTS so that I can voice all my languages without buying multiple API keys.

How it works. MMS is an engine option in CharacterModal. The language is set via voiceName (e.g., 'eng', 'fra', 'spa'). The first use of each language downloads that language's model (~130 MB) behind a consent gate. Synthesis runs on-device. Language-mismatch errors are caught and shown as a friendly error message.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/tts.ts:150-175
src/lib/audio/mms-worker.ts

Voice together — Combined synthesis ▶ 00:00

As a voice actor, I want to synthesize multiple cells as one continuous clip so that the prosody is natural instead of choppy.

How it works. The user selects multiple cells and clicks 'Voice together' (or uses batch generation). The modal joins the cells' text with a separator, synthesizes ONE clip, and uploads it once; the same audioId is attached to every selected cell, untrimmed. CombinedBoundaryEditor then lets the user set where each line's audio starts and ends.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/combined-voice.ts:70-150
src/components/voice/CombinedBoundaryEditor.tsx
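
The data flow of combined synthesis (join, attach one clip to many cells, then cut) can be sketched as below. The default separator, the field names, and the simplification that boundaries are contiguous cut points are all assumptions; the real boundary editor lets each line's start and end be set independently.

```typescript
// Hypothetical shapes for illustration only.
interface CellText {
  cellId: string;
  text: string;
}

interface CellClipRef {
  cellId: string;
  audioId: string;
  startSec: number | null; // null = untrimmed
  endSec: number | null;
}

// One synthesis request for all selected lines (separator value is assumed).
function joinForSynthesis(cells: CellText[], separator = "\n\n"): string {
  return cells.map((c) => c.text.trim()).join(separator);
}

// After a single upload, every cell points at the same clip, untrimmed.
function attachCombined(cells: CellText[], audioId: string): CellClipRef[] {
  return cells.map((c) => ({ cellId: c.cellId, audioId, startSec: null, endSec: null }));
}

// Simplified boundary editing: N-1 cut points split the clip into N contiguous ranges.
function applyBoundaries(refs: CellClipRef[], cuts: number[], totalSec: number): CellClipRef[] {
  if (cuts.length !== refs.length - 1) {
    throw new Error("need exactly one cut between each pair of lines");
  }
  const edges = [0, ...cuts, totalSec];
  return refs.map((r, i) => ({ ...r, startSec: edges[i], endSec: edges[i + 1] }));
}
```

Synthesizing once keeps prosody continuous across lines; the per-cell ranges only change which part of the shared clip each cell plays.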

Audio playback — Global play-all bar ▶ 00:00

As a translator, I want to listen to all voiced lines in sequence with a global player so that I can hear the full narration without clicking each cell.

How it works. VoicePlaybackBar is pinned to the bottom of the Audio lens. It shows the now-playing cell name and voice avatar, a full-width scrubber, transport controls (prev/play/pause/next), a speed selector (0.5x–2x), the current/total time, and a volume slider with mute. Play starts the queue from cell 0, next/prev move between cells, and dragging the scrubber seeks.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/VoicePlaybackBar.tsx:42-134
src/lib/audio/play-queue.ts:1-150
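
A minimal sketch of the queue semantics described above: start at cell 0, step with next/prev, and keep the rate and volume across clips. The class is hypothetical (the real play-queue.ts API is not shown on this page), and clamping at the ends of the queue is an assumed behavior.

```typescript
// Speed options as listed in the speed-control feature.
const SPEEDS = [0.5, 0.75, 1, 1.25, 1.5, 2] as const;

class PlayQueue {
  private index = -1; // -1 = not started
  rate = 1; // persists across clips
  volume = 1; // persists across clips

  constructor(private readonly cellIds: string[]) {}

  // Play starts the queue from cell 0.
  play(): string | undefined {
    this.index = this.cellIds.length > 0 ? 0 : -1;
    return this.current();
  }

  // Clamp at the ends (assumption: no wrap-around).
  next(): string | undefined {
    if (this.index >= 0 && this.index < this.cellIds.length - 1) this.index++;
    return this.current();
  }

  prev(): string | undefined {
    if (this.index > 0) this.index--;
    return this.current();
  }

  current(): string | undefined {
    return this.index >= 0 ? this.cellIds[this.index] : undefined;
  }

  setRate(rate: number): void {
    if ((SPEEDS as readonly number[]).indexOf(rate) === -1) {
      throw new Error(`unsupported speed ${rate}`);
    }
    this.rate = rate;
  }

  // Mute toggles the volume between 0 and 1.
  toggleMute(): void {
    this.volume = this.volume > 0 ? 0 : 1;
  }
}
```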

Audio playback — Speed control ▶ 00:00

As a translator, I want to control playback speed so that I can listen faster or slower depending on my focus needs.

How it works. The speed button in the playback bar shows the current rate (e.g., '1x'). Clicking it opens a popover with the options 0.5x, 0.75x, 1x, 1.25x, 1.5x, and 2x. The selected speed applies immediately to the playing clip and persists for subsequent clips.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/VoicePlaybackBar.tsx:191-222

Audio playback — Volume control ▶ 00:00

As a user, I want to adjust volume and mute/unmute so that I can hear the playback at the right level.

How it works. The playback bar has a mute button (Volume2/VolumeX icon) and a volume slider (0–1, hidden on mobile). Clicking mute toggles the volume between 0 and 1; dragging the slider adjusts it live. The volume persists across tracks.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/VoicePlaybackBar.tsx:225-250

Audio attachment — Select active recording ▶ 00:00

As a translator, I want to choose which of my multiple recordings is the active one for a cell so that I can use my best take.

How it works. A cell's audio tabs list its recording takes, each as a selectable option (e.g., a radio button or button). Selecting a take emits cell.audio.select and updates cell.selectedAudioId, making that audio the active one for playback.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/sync/events-emit.ts: emitCellAudioSelect

Audio attachment — Crop/trim recording ▶ 00:00

As a translator, I want to trim a recording so that I can remove unwanted sections without re-recording.

How it works. CropButton (scissors icon) in the cell audio tab opens a popover with the waveform and two draggable trim handles (start/end). The handles are kept at least MIN_GAP (0.15 s) apart. Dragging a handle to the edge clears that bound (sets it to null), and 'Reset' clears both. The trim is stored in audio-cell-prefs and applied at playback time, so the original audio is never modified.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/cell/CropEditor.tsx:1-120
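
The handle rules (minimum gap, edge clears the bound, reset) can be expressed as pure functions over a { start, end } pair. MIN_GAP comes from this page; the function names and the exact edge test (value at or beyond 0 or the clip duration) are assumptions.

```typescript
const MIN_GAP = 0.15; // seconds, per the description above

interface Trim {
  start: number | null; // null = no trim at the start
  end: number | null; // null = no trim at the end
}

// Move the start handle; dragging to the left edge clears the bound.
function setStart(trim: Trim, value: number, duration: number): Trim {
  if (value <= 0) return { ...trim, start: null };
  const end = trim.end ?? duration;
  return { ...trim, start: Math.min(value, end - MIN_GAP) };
}

// Move the end handle; dragging to the right edge clears the bound.
function setEnd(trim: Trim, value: number, duration: number): Trim {
  if (value >= duration) return { ...trim, end: null };
  const start = trim.start ?? 0;
  return { ...trim, end: Math.max(value, start + MIN_GAP) };
}

const resetTrim = (): Trim => ({ start: null, end: null });

// Non-destructive: playback just uses the effective range.
function playRange(trim: Trim, duration: number): [number, number] {
  return [trim.start ?? 0, trim.end ?? duration];
}
```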

Audio attachment — Delete recording ▶ 00:00

As a translator, I want to delete a recording so that it no longer takes up space and doesn't appear in playback options.

How it works. The audio tab or TakesStrip shows a delete icon (trash/X) for each take. Clicking it removes the attachment from R2 and updates the UI; the cell no longer offers that audio.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/upload.ts:142-153

Audio attachment — Upload recording file ▶ 00:00

As a translator, I want to upload an audio file from my computer so that I can use existing recordings in the project.

How it works. The audio tab has an 'Upload' button or file input. Selecting a file uploads it to R2 via uploadCellAudio and emits cell.audio.attach. Supported file types: WebM, MP3, MP4, M4A, and WAV. On success, the audio appears as a take option.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/upload.ts:95-118

Video attachment — Save video URL ▶ 00:00

As a translator working with video, I want to attach a video URL to a file so that I can sync subtitles with the video.

How it works. VideoAttachmentDialog opens from the header ⋯ menu ('Attach video', shown only for VTT/SRT files). The dialog is titled 'Attach Video' and has a URL input. The user pastes a video URL (e.g., YouTube, Vimeo) and clicks Save; the URL is persisted in the file's metadata.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

e2e/specs/editor/video-attachment-save-url.smoke.spec.ts

Video attachment — Remove video URL ▶ 00:00

As a translator, I want to remove a video URL so that the file no longer references it.

How it works. When a video is already attached, VideoAttachmentDialog shows the saved URL and a Remove button. Clicking Remove clears the URL, leaving the file with no video reference.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

e2e/specs/editor/video-attachment-remove.smoke.spec.ts

Subtitles — Import VTT/SRT with voice tags ▶ 00:00

As a translator, I want to import a subtitle file with speaker voice tags so that the cast is auto-created and lines are assigned to speakers.

How it works. The import dialog accepts VTT/SRT files. For VTT cues with `<v Name>` tags, the speaker name is extracted, a matching cast member is found or auto-created, and the cell is assigned to that speaker via castAssignments. Timecodes from the VTT are preserved. Plain narrator lines (no tag) stay untagged.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/parsers/subtitle.ts
src/lib/timeline/diarization-loader.ts
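
Voice-tag import can be sketched as: split the file into blank-line-separated blocks, keep the timing line verbatim, and read a leading WebVTT voice span (`<v Name>` or `<v.class Name>`). This is a simplified sketch, not the parser in subtitle.ts: it handles only a leading voice span per cue, and the "cast-N" ids are hypothetical.

```typescript
interface ParsedCue {
  timing: string; // stored verbatim, e.g. "00:00:01.000 --> 00:00:03.000"
  speaker: string | null; // null = untagged narrator line
  text: string;
}

// Leading voice span such as "<v Mary>" or "<v.loud Mary>".
const VOICE_TAG = /^<v(?:\.[\w-]+)*\s+([^>]+)>/;

function parseCueText(raw: string): { speaker: string | null; text: string } {
  const m = VOICE_TAG.exec(raw);
  if (!m) return { speaker: null, text: raw.trim() };
  const text = raw.slice(m[0].length).replace(/<\/v>/g, "").trim();
  return { speaker: m[1].trim(), text };
}

// Cues are blank-line-separated blocks containing a "-->" timing line.
function parseVtt(vtt: string): ParsedCue[] {
  const cues: ParsedCue[] = [];
  for (const block of vtt.replace(/\r\n/g, "\n").split(/\n{2,}/)) {
    const lines = block.split("\n");
    const t = lines.findIndex((l) => l.includes("-->"));
    if (t === -1) continue; // WEBVTT header, NOTE blocks, etc.
    const { speaker, text } = parseCueText(lines.slice(t + 1).join("\n"));
    cues.push({ timing: lines[t].trim(), speaker, text });
  }
  return cues;
}

// Find-or-create one cast member per distinct speaker (hypothetical ids).
function castFromCues(cues: ParsedCue[]): Map<string, string> {
  const cast = new Map<string, string>();
  for (const c of cues) {
    if (c.speaker && !cast.has(c.speaker)) cast.set(c.speaker, `cast-${cast.size + 1}`);
  }
  return cast;
}
```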

Subtitles — Export VTT with voice tags ▶ 00:00

As a translator, I want to export subtitles as VTT with voice tags so that the voiced version preserves speaker identity.

How it works. In the export dialog, choose the 'WebVTT (subtitles)' format. The exported file starts with a WEBVTT header followed by the cues. Cast-assigned cells include a `<v Name>` tag; untagged cells (narrator or unassigned) export as plain text. Timecodes are taken from cell.context unchanged.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/video/vtt-generator.ts:34-50
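
Export is the mirror image: a WEBVTT header, then one block per cell with the stored timing line and, for cast-assigned cells, a leading `<v Name>` span. A minimal sketch with hypothetical field names; escaping `&`, `<`, and `>` is an addition of this sketch (those are valid WebVTT character references), not something this page says vtt-generator.ts does.

```typescript
interface ExportCell {
  context: string; // stored timing line, written back unchanged
  text: string;
  speaker?: string; // cast member name, if assigned
}

// Escape characters that WebVTT would otherwise read as markup.
const escapeCueText = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

function generateVtt(cells: ExportCell[]): string {
  const cues = cells.map((c) => {
    const body = escapeCueText(c.text);
    // Untagged cells (narrator or unassigned) export as plain text.
    const tagged = c.speaker ? `<v ${escapeCueText(c.speaker)}>${body}` : body;
    return `${c.context}\n${tagged}`;
  });
  return ["WEBVTT", ...cues].join("\n\n") + "\n";
}
```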

Subtitles — VTT timecode round-trip ▶ 00:00

As a translator, I want subtitles to preserve their original timecodes through import and export so that they stay in sync with video.

How it works. On import, a cue timing such as '00:00:01.000 --> 00:00:03.000' is stored verbatim in Cell.context. On export, the same range is written to the output VTT, so timestamps survive the full round trip unchanged.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/video/vtt-generator.ts:8-28
src/lib/parsers/subtitle.ts
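
Because the range is stored verbatim, export needs no arithmetic. Where a numeric form is needed (for example to seek a video player), a parse/format pair that works in integer milliseconds round-trips without floating-point drift. The helpers below are hypothetical, not functions from vtt-generator.ts; they accept WebVTT's optional-hours form but not SRT's comma separator.

```typescript
// Parse "HH:MM:SS.mmm" or "MM:SS.mmm" (hours are optional in WebVTT) to integer ms.
function parseTimestamp(ts: string): number {
  const m = /^(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`bad timestamp: ${ts}`);
  const [, h = "0", min, s, ms] = m;
  return ((Number(h) * 60 + Number(min)) * 60 + Number(s)) * 1000 + Number(ms);
}

function pad(n: number, width: number): string {
  let out = String(n);
  while (out.length < width) out = "0" + out;
  return out;
}

// Always emits the HH:MM:SS.mmm form.
function formatTimestamp(totalMs: number): string {
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const min = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(min, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// "start --> end [cue settings]" to [startMs, endMs].
function parseRange(range: string): [number, number] {
  const [a, b] = range.split("-->");
  if (b === undefined) throw new Error(`not a cue timing: ${range}`);
  return [parseTimestamp(a), parseTimestamp(b.trim().split(/\s+/)[0])];
}
```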

Diarization — Speaker segmentation (Pyannote) ▶ 00:00

As a translator with a media file, I want to automatically split audio by speaker so that each speaker's section becomes a separate cell.

How it works. For media files, the workspace header ⋯ menu shows a 'Diarize audio' option that opens DiarizationModal. The user can optionally specify the number of speakers. The modal shows progress: Starting → Running (% progress and 'Please wait, analyzing...') → Applying → Done. On success, the media cells are replaced with one cell per speaker turn, each trimmed to that speaker's time range, and cast members ('Speaker 1', 'Speaker 2', etc.) are auto-created and assigned.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/diarization/run-diarization.ts:172-188
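
Turning diarization output into cells can be sketched as: sort segments by start time, name speakers 'Speaker 1', 'Speaker 2', ... in order of first appearance, and emit one cell per turn. The segment shape is hypothetical, and merging consecutive segments from the same speaker into a single turn is an assumption about how "turn" is defined.

```typescript
// Hypothetical shapes; times in seconds.
interface Segment {
  speaker: string; // raw diarization label
  start: number;
  end: number;
}

interface TurnCell {
  speakerName: string; // "Speaker 1", "Speaker 2", ...
  start: number;
  end: number;
}

// Display names in order of first appearance.
function speakerNames(segments: Segment[]): Map<string, string> {
  const names = new Map<string, string>();
  for (const s of segments) {
    if (!names.has(s.speaker)) names.set(s.speaker, `Speaker ${names.size + 1}`);
  }
  return names;
}

// One cell per speaker turn; consecutive same-speaker segments merge (assumption).
function turnsToCells(segments: Segment[]): TurnCell[] {
  const sorted = [...segments].sort((a, b) => a.start - b.start);
  const names = speakerNames(sorted);
  const cells: TurnCell[] = [];
  for (const s of sorted) {
    const name = names.get(s.speaker)!;
    const last = cells[cells.length - 1];
    if (last && last.speakerName === name) last.end = Math.max(last.end, s.end);
    else cells.push({ speakerName: name, start: s.start, end: s.end });
  }
  return cells;
}
```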

Diarization — Auto-assign speakers to cast ▶ 00:00

As a translator, I want diarization to automatically create and assign speaker roles so that I don't have to manually create cast members.

How it works. After diarization completes, cast members 'Speaker 1', 'Speaker 2', etc. are created with auto-assigned colors and added to project TTS settings. Each media segment's cell is assigned to its speaker via castAssignments.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/diarization/run-diarization.ts:155-165
src/lib/import/cast-from-speakers.ts

Audio export — Audio by character ▶ 00:00

As a voice actor, I want to export audio grouped by character so that I can send separate files to collaborators or post individual voice tracks.

How it works. In the export dialog, choose 'Audio by character'. The scope is locked to the current file ('Whole project' is disabled). If the file has no audio, the preview shows 'No cells with audio found'. Exporting downloads a ZIP containing one WAV per cast member (character), each concatenating that character's voiced cells.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

e2e/specs/editor/audio-by-character.smoke.spec.ts
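
The export boils down to grouping voiced cells by cast member in document order, then concatenating each group's decoded samples. A minimal sketch: the cell shape is hypothetical, unassigned cells are simply skipped (the page does not say how narrator lines are bucketed), and WAV encoding and ZIP packaging are left out. Concatenation assumes all clips share one sample rate and channel layout.

```typescript
// Hypothetical cell shape for illustration.
interface ExportableCell {
  id: string;
  castId?: string;
  audioId?: string;
}

// castId -> audioIds in document order; cells without audio or cast are skipped.
function groupAudioByCharacter(cells: ExportableCell[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const c of cells) {
    if (!c.audioId || !c.castId) continue;
    const list = groups.get(c.castId) ?? [];
    list.push(c.audioId);
    groups.set(c.castId, list);
  }
  return groups;
}

// Join decoded mono PCM chunks end to end (same sample rate assumed).
function concatPcm(chunks: Float32Array[]): Float32Array {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```

An empty map corresponds to the 'No cells with audio found' preview.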

Cast assignment — Assign cell to character (chip click) ▶ 00:00

As a translator, I want to click a cast member's chip to assign a line to that character so that the character's voice is used.

How it works. In the Audio lens, each cell shows cast chips, one per project voice. The chips are buttons with aria-pressed. Clicking an inactive chip assigns the cell to that voice, sets voiceId, kicks off TTS synthesis, and marks the chip as pressed (aria-pressed='true'). Clicking the active chip unassigns the cell.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/cell/CellVoicePanel.tsx
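
The chip behaves as a toggle over a cell-to-voice map. In this sketch the synthesize callback is injected so the logic stays testable; the CastAssignments shape and the callback are assumptions, not CellVoicePanel's actual API.

```typescript
// Hypothetical: castAssignments maps cellId -> voiceId.
type CastAssignments = Record<string, string | undefined>;

function onChipClick(
  assignments: CastAssignments,
  cellId: string,
  voiceId: string,
  synthesize: (cellId: string, voiceId: string) => void,
): CastAssignments {
  if (assignments[cellId] === voiceId) {
    // Clicking the active chip unassigns the cell (no synthesis).
    const next = { ...assignments };
    delete next[cellId];
    return next;
  }
  synthesize(cellId, voiceId); // assigning kicks off TTS for the cell
  return { ...assignments, [cellId]: voiceId };
}

// Value for the chip's aria-pressed attribute.
const chipPressed = (a: CastAssignments, cellId: string, voiceId: string): "true" | "false" =>
  a[cellId] === voiceId ? "true" : "false";
```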

Audio cell waveform display ▶ 00:00

As a translator, I want to see the waveform of a recording so that I can visually inspect audio quality and timing.

How it works. The cell audio tab shows a CellWaveform component: a visual representation of the recording computed from the audio file's peaks. The same component is used in the crop editor, the takes display, and the cell audio details.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/CellWaveform.tsx
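
Peak extraction for a waveform usually means splitting the samples into one bucket per bar and keeping each bucket's maximum absolute value. A minimal sketch, assuming mono samples such as those returned by the Web Audio API's AudioBuffer.getChannelData(0); the bar count and function name are hypothetical, not CellWaveform's implementation.

```typescript
// Reduce decoded samples to one peak (max |sample|) per bar.
function computePeaks(samples: Float32Array, bars: number): number[] {
  if (bars <= 0 || samples.length === 0) return [];
  const peaks: number[] = [];
  const bucket = samples.length / bars;
  for (let i = 0; i < bars; i++) {
    const start = Math.floor(i * bucket);
    // Every bar covers at least one sample, even when bars > samples.length.
    const end = Math.max(start + 1, Math.floor((i + 1) * bucket));
    let peak = 0;
    for (let j = start; j < end && j < samples.length; j++) {
      peak = Math.max(peak, Math.abs(samples[j]));
    }
    peaks.push(peak);
  }
  return peaks;
}
```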

Transcription progress badge ▶ 00:00

As a translator, I want to see transcription progress so that I know whether transcription is complete or still running.

How it works. The cell audio tab shows CellTranscribeBadge in one of four states: 'Transcribe' (idle), a spinner with 'Transcribing...' (in progress), a checkmark with the word count (done), or an error state. Clicking it starts transcription; on the first run, it also shows the Whisper model download progress.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/CellTranscribeBadge.tsx

TTS synthesis progress badge ▶ 00:00

As a translator, I want to see TTS synthesis progress so that I know when audio generation will complete.

How it works. While TTS runs, the cell voice panel or playback row shows a spinner with 'Synthesizing...'. On success, it is replaced by a play button; on error, an error message with a retry option appears.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/tts.ts:24-82

Voice guidance / prompt field (Gemini) ▶ 00:00

As a voice director, I want to write guidance for Gemini TTS so that the synthesized voice has the tone and accent I want.

How it works. With the Gemini engine selected, CharacterModal shows a 'Guidance' textarea. The user enters free-text guidance (e.g., 'warm, older man, calm'), which is stored in Voice.prompt and passed to the Gemini TTS API. The guidance overrides or supplements the default prompt template.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/components/voice/CharacterModal.tsx:375-415

MMS language selection ▶ 00:00

As a translator using MMS for multilingual voicing, I want to select the language so that the synthesizer uses the right model.

How it works. With the MMS engine selected, CharacterModal shows a language dropdown or text field. The user selects or enters an MMS language code (e.g., 'eng', 'fra', 'spa'), which is stored in Voice.voiceName. Synthesis loads that language's model on first use.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/mms-languages.ts

Model download progress (Whisper/Kokoro/MMS) ▶ 00:00

As a user, I want to see model download progress so that I know how long the first synthesis/transcription will take.

How it works. The first time a local model (Whisper, Kokoro, or MMS) is used, AiModelDownloadChip appears in the bottom-left corner, showing 'Downloading <model>: <loaded>/<total> MB' with a progress bar. The chip dismisses itself when the download completes.

Who: Translator (oral / audio drafting). Permissions: attach/select/remove cell audio requires contributor (400); playback requires viewer (100).
Key files

src/lib/audio/prefetch.ts
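
The chip's label and bar can be derived from a simple { loaded, total } progress record. A minimal sketch; the progress shape, whole-number rounding, and 1 MB = 1,048,576 bytes are assumptions (the chip may use decimal megabytes or show decimals).

```typescript
const BYTES_PER_MB = 1024 * 1024; // assumption: the real chip may use 1,000,000

interface DownloadProgress {
  model: string;
  loadedBytes: number;
  totalBytes: number;
}

// "Downloading <model>: <loaded>/<total> MB"
function chipLabel(p: DownloadProgress): string {
  const mb = (b: number) => Math.round(b / BYTES_PER_MB);
  return `Downloading ${p.model}: ${mb(p.loadedBytes)}/${mb(p.totalBytes)} MB`;
}

// Progress-bar fraction clamped to [0, 1]; 0 while the total is unknown.
function progressFraction(p: DownloadProgress): number {
  if (p.totalBytes <= 0) return 0;
  return Math.min(1, Math.max(0, p.loadedBytes / p.totalBytes));
}

// The chip dismisses once the download is complete.
const isComplete = (p: DownloadProgress): boolean =>
  p.totalBytes > 0 && p.loadedBytes >= p.totalBytes;
```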