Last verified: 2026-03-06
Target: apps/video-processor
Video Transcoding Architecture: WebM to MP4
How browser codec selection creates WebM files that break NLE editors, and how server-side transcoding fixes it.
Extends: video-recording-data-architecture.md — read that first for the full data flow from browser to admin dashboard.
Overview
Videos recorded in the platform can arrive in two container formats:
- MP4 (H.264 + AAC) — universally compatible with NLE editors (Premiere Pro, DaVinci Resolve, Final Cut Pro)
- WebM (VP8/VP9 + Opus) — the only option on Firefox and some older browsers, but not supported by any major NLE editor
When a pro user downloads a WebM video and tries to import it into Premiere Pro, they get an "unsupported compression type" error. The file is perfectly fine for browser playback but useless for professional editing.
The fix: Detect WebM files during server-side processing and transcode them to MP4 (H.264 + AAC) before they reach any downstream consumer.
Browser Codec Selection Strategy
Why WebM Exists (It's Not a Choice)
The codec selection in getBestSupportedMediaRecorderCodec() strongly prefers MP4 H.264/AAC. WebM is a fallback, not a preference. The reason WebM exists at all is that Firefox does not support MP4 recording via MediaRecorder — it can only produce WebM.
Codec Priority Cascade
The browser tries codecs in this order (first successful construction wins):
1. video/mp4;codecs="avc1.42E01E,mp4a.40.2" → MP4 H.264 Baseline + AAC (best for NLEs)
2. video/mp4 → MP4 (browser picks codecs)
3. video/webm;codecs="vp9,opus" → WebM VP9 + Opus (Chrome/Firefox fallback)
4. video/webm;codecs="vp8,opus" → WebM VP8 + Opus (legacy fallback)
5. video/webm → WebM (browser picks codecs)
6. (no mimeType — browser default) → Last resort
iOS Safari special path: iOS Safari doesn't support explicit codec strings in MediaRecorder. The code tries plain video/mp4 first, then construction with no mimeType at all. Both infer H.264 + AAC.
Construction Testing (Not isTypeSupported)
The code uses actual new MediaRecorder(stream, { mimeType }) construction to verify support — not MediaRecorder.isTypeSupported(). This is intentional:
// isTypeSupported can lie: some browsers report false but accept construction
const tryConstruct = (candidate?: string): string | null => {
  try {
    const rec = new MediaRecorder(stream, candidate ? { mimeType: candidate } : undefined);
    return rec.mimeType || candidate || null; // Read canonical mimeType from UA
  } catch {
    return null;
  }
};
The canonical mimeType read back from the constructed MediaRecorder is the ground truth for what the browser will actually produce.
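The cascade and the construction test combine into a simple "first success wins" loop. The following is a minimal sketch, not the code in capabilities.ts: `CODEC_CANDIDATES`, `pickFirstSupported`, and the `firefoxLike` fake are hypothetical names, and the construction test is injected as a function so the logic can run outside a browser.

```typescript
// Candidates in the priority order listed above.
// `undefined` stands for "construct with no mimeType" (browser default).
const CODEC_CANDIDATES: ReadonlyArray<string | undefined> = [
  'video/mp4;codecs="avc1.42E01E,mp4a.40.2"',
  "video/mp4",
  'video/webm;codecs="vp9,opus"',
  'video/webm;codecs="vp8,opus"',
  "video/webm",
  undefined,
];

// Same contract as tryConstruct: the canonical mimeType on success, null on failure.
type TryConstruct = (candidate?: string) => string | null;

// Hypothetical helper: returns the first mimeType the environment accepts, or null.
function pickFirstSupported(tryConstruct: TryConstruct): string | null {
  for (const candidate of CODEC_CANDIDATES) {
    const accepted = tryConstruct(candidate);
    if (accepted !== null) return accepted;
  }
  return null;
}

// Fake environment for illustration: rejects every MP4 candidate, accepts WebM.
const firefoxLike: TryConstruct = (candidate) =>
  candidate !== undefined && candidate.startsWith("video/webm") ? candidate : null;

const chosen = pickFirstSupported(firefoxLike);
```

In a browser, the injected function would be the `tryConstruct` closure shown earlier, bound to the active `MediaStream`.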
NLE Compatibility Risk
Each codec selection is tagged with an NLE compatibility risk level:
| Container | Video Codec | Audio Codec | NLE Risk | Why |
|---|---|---|---|---|
| MP4 | H.264 | AAC | Low | Universal NLE support |
| MP4 | Unknown | Unknown | Medium | Depends on actual codec |
| WebM | VP9 | Opus | High | No NLE supports WebM import |
| WebM | VP8 | Opus | High | No NLE supports WebM import |
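The table can be expressed as a small classifier over the canonical mimeType. This is an illustrative sketch only: `NleRisk` and `classifyNleRisk` are hypothetical names, the real tagging lives in capabilities.ts, and treating an unrecognized container as high risk is an assumption the table does not cover.

```typescript
type NleRisk = "low" | "medium" | "high";

// Hypothetical helper mirroring the risk table above.
function classifyNleRisk(mimeType: string): NleRisk {
  const normalized = mimeType.toLowerCase();

  // Any WebM variant is high risk, regardless of codec.
  if (normalized.startsWith("video/webm")) return "high";

  if (normalized.startsWith("video/mp4")) {
    // Explicit H.264 (avc1) + AAC (mp4a) codec strings are the low-risk case.
    const hasH264 = normalized.includes("avc1");
    const hasAac = normalized.includes("mp4a");
    // Plain "video/mp4" leaves the codecs unknown, hence medium.
    return hasH264 && hasAac ? "low" : "medium";
  }

  // Assumption: an unrecognized container is treated conservatively.
  return "high";
}

const risks = [
  classifyNleRisk('video/mp4;codecs="avc1.42E01E,mp4a.40.2"'),
  classifyNleRisk("video/mp4"),
  classifyNleRisk('video/webm;codecs="vp9,opus"'),
];
```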
Browser Support Matrix
| Browser | MP4 H.264+AAC | MP4 (plain) | WebM VP9+Opus | WebM VP8+Opus | Result |
|---|---|---|---|---|---|
| Chrome (desktop) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Chrome (Android) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Safari (macOS) | Yes | Yes | No | No | MP4 H.264 |
| Safari (iOS) | Special* | Special* | No | No | MP4 H.264 (inferred) |
| Firefox (all) | No | No | Yes | Yes | WebM VP9 |
| Edge (Chromium) | Yes | Yes | Yes | Yes | MP4 H.264 |
* iOS Safari uses plain video/mp4 or default construction; codec strings are inferred as H.264+AAC.
Codec Selection Decision Tree
Implemented State
Transcoding is fully implemented as Step 1.5, inserted between metadata extraction (Step 1) and parallel operations (Step 2). All downstream operations (audio extraction, thumbnails, silence detection, HLS encoding) work on the transcoded MP4 file.
Note: This section was originally written as "Target State (To-Be)". Transcoding was implemented in February 2026.
Data Model: transcodedVideoUrl vs videoUrl
The original videoUrl is preserved (pointing to the WebM in blob storage). A separate transcodedVideoUrl field carries the transcoded MP4 URL. Callers (e.g., process-video-response.ts) decide whether to use videoUrl or transcodedVideoUrl for downstream consumers.
// processVideo return type (processVideo.ts)
{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string; // Set if wasTranscoded=true; points to transcoded MP4
  originalFormat?: OriginalFormat; // "webm" | "mp4" | "unknown"
  // ...other fields
}
Transcoding Decision Flowchart
Pipeline Integration
Where transcoding fits within the existing processVideo() pipeline in apps/video-processor/src/operations/processVideo.ts:
processVideo()
│
├── Step 0: Download video to shared temp file (disk, not heap)
│   └── downloadVideoToFile(videoUrl, sharedVideoPath)
│
├── Step 1: Extract metadata (sequential; needed for format detection + thumbnail duration)
│   └── getVideoMetadataFromFile(sharedVideoPath) → metadata
│
├── Step 1.5: Transcode if needed
│   ├── Detect format from metadata (codec_name, format_name)
│   ├── If WebM/VP8/VP9:
│   │   ├── transcodeWebmToMp4(sharedVideoPath, transcodedPath)
│   │   ├── Set effectivePath = transcodedPath
│   │   └── Set wasTranscoded = true
│   └── If already MP4/H.264: skip (wasTranscoded = false, effectivePath = sharedVideoPath)
│
├── Step 2: Parallel operations (all read from effectivePath, which is the transcoded file if one was produced)
│   ├── extractAudioFromVideoFromFile(effectivePath)
│   ├── generateThumbnailFromVideoFile(effectivePath, duration)
│   └── detectSilenceInVideoFromFile(effectivePath)
│
└── Return: { audioPath, thumbnailUrl, metadata, silenceIntervals, wasTranscoded, transcodedVideoUrl?, originalFormat? }
Key insight: Every downstream operation reads from effectivePath, which points to the transcoded MP4 when transcoding occurred and to the original file otherwise. Audio extraction, thumbnail generation, and silence detection therefore use the MP4 version automatically, without any code changes.
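The control flow of Step 1.5 and Step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the contents of processVideo.ts: `PipelineOps`, `SilenceInterval`, and `runFromStepOnePointFive` are hypothetical names, and the real operations are replaced by injected functions so the ordering can be exercised without FFmpeg.

```typescript
// Hypothetical shape of one detected silence interval.
interface SilenceInterval {
  startSeconds: number;
  endSeconds: number;
}

// Injected stand-ins for the real operations (assumed signatures).
interface PipelineOps {
  needsTranscode(inputPath: string): Promise<boolean>;
  transcode(inputPath: string, outputPath: string): Promise<void>;
  extractAudio(path: string): Promise<string>;
  generateThumbnail(path: string): Promise<string>;
  detectSilence(path: string): Promise<SilenceInterval[]>;
}

async function runFromStepOnePointFive(
  sharedVideoPath: string,
  transcodedPath: string,
  ops: PipelineOps,
) {
  // Step 1.5: transcode only when detection says so.
  let effectivePath = sharedVideoPath;
  let wasTranscoded = false;
  if (await ops.needsTranscode(sharedVideoPath)) {
    await ops.transcode(sharedVideoPath, transcodedPath);
    effectivePath = transcodedPath;
    wasTranscoded = true;
  }

  // Step 2: all parallel operations read the same effective file.
  const [audioPath, thumbnailUrl, silenceIntervals] = await Promise.all([
    ops.extractAudio(effectivePath),
    ops.generateThumbnail(effectivePath),
    ops.detectSilence(effectivePath),
  ]);

  return { audioPath, thumbnailUrl, silenceIntervals, wasTranscoded, effectivePath };
}
```

Because the parallel operations only ever receive `effectivePath`, none of them needs to know whether a transcode happened.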
FFmpeg Transcoding Settings
Settings reused from the existing rotateVideo90FromBuffer() in rotation.ts, which already produces NLE-compatible H.264+AAC output:
| Setting | Value | Rationale |
|---|---|---|
| -c:v libx264 | H.264 video codec | Universal NLE + browser support |
| -profile:v main | Main profile | Broad decoder compatibility (vs. High, which some mobile decoders struggle with) |
| -preset medium | Encoding speed/quality tradeoff | Good balance for server-side; fast saves ~30% time but produces ~10% larger files |
| -crf 23 | Constant Rate Factor | Good quality at a reasonable file size (the libx264 default); 18=near-lossless, 23=good quality, 28=noticeable loss |
| -c:a aac | AAC audio codec | Universal NLE + browser support |
| -b:a 128k | 128 kbps audio bitrate | Standard quality for speech; Opus→AAC transcoding is lossy-to-lossy anyway |
| -movflags +faststart | Move moov atom to start | Enables progressive playback in browsers without full download |
| -y | Overwrite output | Standard for temp file pipelines |
FFmpeg Command
ffmpeg -i input.webm \
  -c:v libx264 -profile:v main -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  -y output.mp4
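When the command is launched from Node, passing the arguments as an array avoids shell quoting problems with paths. The sketch below builds that array; `buildTranscodeArgs` is a hypothetical helper name, and how the array reaches `executeFfmpegProcess()` is not shown because that helper's signature is not documented here.

```typescript
// Hypothetical helper: the same settings as the command above, as an argv array.
function buildTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    // Video: H.264, Main profile, medium preset, CRF 23.
    "-c:v", "libx264", "-profile:v", "main", "-preset", "medium", "-crf", "23",
    // Audio: AAC at 128 kbps.
    "-c:a", "aac", "-b:a", "128k",
    // Move the moov atom to the front for progressive playback.
    "-movflags", "+faststart",
    // Overwrite the output file if it already exists.
    "-y", outputPath,
  ];
}

const transcodeArgs = buildTranscodeArgs("input.webm", "output.mp4");
```

The resulting array can be handed to Node's `child_process.spawn("ffmpeg", transcodeArgs)`, which takes the command and its arguments separately.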
Why These Specific Settings
- CRF 23 (not 18): The source is already a lossy WebM recording from a webcam. Re-encoding at CRF 18 would produce unnecessarily large files without visible quality improvement. CRF 23 preserves the existing quality while keeping file sizes reasonable.
- Main profile (not High): High profile would give ~5-10% better compression at the cost of compatibility. Since these videos are for NLE import (not streaming optimization), compatibility is more important.
- Medium preset (not fast/ultrafast): Server-side processing can afford the extra encoding time. medium produces noticeably smaller files than fast for the same quality.
Data Model
processVideo Return Fields (Implemented)
The processVideo function in apps/video-processor/src/operations/processVideo.ts returns:
{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string; // URL of the new MP4 in R2 (only if wasTranscoded=true)
  originalFormat?: OriginalFormat; // "webm" | "mp4" | "unknown" from @repo/video/formats
  // ...other fields (audioPath, thumbnailUrl, metadata, silenceIntervals, etc.)
}
Field semantics:
| Field | When Set | Value |
|---|---|---|
| transcodedVideoUrl | wasTranscoded === true | R2 URL of the transcoded MP4 |
| wasTranscoded | Always | true if transcoding occurred |
| originalFormat | Always | Derived from ffprobe format_name via deriveOriginalFormat() |
Design note: The original videoUrl is NOT replaced; it still points to the original WebM in blob storage. Callers receive both the original and the transcoded URL and can use whichever is appropriate. This differs from the original design, which proposed replacing videoUrl with the transcoded version.
Why preserve the original? The original WebM is archived for debugging. If transcoding introduces artifacts or if the source video needs re-processing with different settings, the original is available without re-recording.
Storage Strategy
Dual Storage (Original + Transcoded)
When transcoding occurs, both files are stored in blob storage:
Blob Storage
├── flows/{flowId}/sessions/{sessionId}/
│   ├── question1/response.webm ← original (videoUrl, unchanged)
│   └── question1/response-transcoded.mp4 ← transcoded (transcodedVideoUrl)
Naming convention: The transcoded file uses a -transcoded suffix to distinguish it from the original. Both files include a random suffix (via addRandomSuffix: true) to prevent caching issues.
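The naming convention amounts to swapping the extension and appending the suffix to the file stem. A minimal sketch, assuming the original blob path is available as a string; `transcodedBlobPath` is a hypothetical name, and the random suffix is left to the storage layer.

```typescript
// Hypothetical helper:
// ".../question1/response.webm" becomes ".../question1/response-transcoded.mp4".
function transcodedBlobPath(originalPath: string): string {
  const lastSlash = originalPath.lastIndexOf("/");
  const directory = originalPath.slice(0, lastSlash + 1); // "" when there is no slash
  const fileName = originalPath.slice(lastSlash + 1);

  // Drop the extension if there is one; a leading dot is not an extension.
  const lastDot = fileName.lastIndexOf(".");
  const stem = lastDot > 0 ? fileName.slice(0, lastDot) : fileName;

  return `${directory}${stem}-transcoded.mp4`;
}

const transcodedPathExample = transcodedBlobPath(
  "flows/f1/sessions/s1/question1/response.webm",
);
```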
Cost Implications
- Additional storage: ~1x the original file size (H.264 at CRF 23 produces files of similar size to VP9 at typical webcam quality)
- Who gets transcoded: Only Firefox users (~3-5% of respondents based on PostHog analytics)
- Net impact: Minimal — a small percentage of videos stored twice
Download/Zip Fix
Planned behavior (Phase 3, not yet implemented): once the download helpers use transcodedVideoUrl where present, every downloaded file will be an MP4 with a correct .mp4 extension:
| Scenario | Before | After |
|---|---|---|
| Chrome user downloads | response.mp4 (correct) | response.mp4 (unchanged) |
| Firefox user downloads | response.webm with .mp4 extension (broken) | response.mp4 (transcoded, correct) |
| Zip download | Mixed extensions, some wrong | All .mp4 |
The planned download logic reads transcodedVideoUrl when it is set and falls back to videoUrl otherwise, rather than constructing extensions from the original upload.
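One way the download helper could pick a URL and extension from the processVideo fields is sketched below. It is illustrative only: `DownloadableVideo` and `resolveDownload` are hypothetical names, and treating an "unknown" or missing originalFormat as MP4 is an assumption.

```typescript
// Subset of fields a download helper would need (hypothetical type).
interface DownloadableVideo {
  videoUrl: string;
  wasTranscoded: boolean;
  transcodedVideoUrl?: string;
  originalFormat?: "webm" | "mp4" | "unknown";
}

// Hypothetical helper: choose the file to serve and name it by its real container.
function resolveDownload(
  video: DownloadableVideo,
  baseName: string,
): { url: string; filename: string } {
  // A transcoded file is always MP4, so prefer it whenever it exists.
  if (video.wasTranscoded && video.transcodedVideoUrl) {
    return { url: video.transcodedVideoUrl, filename: `${baseName}.mp4` };
  }
  // Otherwise name the original by its detected container, not by its upload name.
  // Assumption: "unknown" or missing is treated as MP4.
  const extension = video.originalFormat === "webm" ? "webm" : "mp4";
  return { url: video.videoUrl, filename: `${baseName}.${extension}` };
}

const firefoxDownload = resolveDownload(
  {
    videoUrl: "https://example.invalid/response.webm",
    wasTranscoded: true,
    transcodedVideoUrl: "https://example.invalid/response-transcoded.mp4",
    originalFormat: "webm",
  },
  "question1-response",
);
```

Deriving the extension from originalFormat rather than from the upload name is what removes the extension guessing.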
Performance
Transcoding Time Estimates
Based on FFmpeg benchmarks for webcam-quality video (720p-1080p, 30fps) with -preset medium:
| Video Duration | Estimated Transcode Time | Notes |
|---|---|---|
| 30 seconds | ~5-8 seconds | Typical single-question response |
| 2 minutes | ~15-25 seconds | Long response |
| 5 minutes | ~40-60 seconds | Maximum typical length |
| 10 minutes | ~80-120 seconds | Edge case |
Pipeline Impact
- Latency increase: Transcoding adds roughly 0.15-0.25x the video duration to total processing time (per the estimates above)
- Only affects Firefox users: Chrome/Safari videos skip transcoding entirely
- Parallel ops unaffected: Audio extraction, thumbnails, and silence detection run after transcoding completes and use the same shared temp file pattern
- Memory: File-based (consistent with OOM-prevention pattern) — no buffers in Node.js heap
Cloud Run Considerations
- Temp disk (/tmp): tmpfs backed by instance memory. Transcoding reads and writes temp files, so the instance needs enough memory to hold the video file and the transcoded output simultaneously (~2x video size)
- Timeout: The Cloud Run default of 300s should accommodate most videos. For very long videos (>5 min), the transcoding step should have its own timeout, separate from ffprobe
Edge Cases
1. Timeout During Transcoding
FFmpeg can hang on corrupt or malformed input. The executeFfmpegProcess() helper already handles timeouts:
- Default: 180 seconds (from the rotation.ts pattern)
- Scales with video duration: BASE_TIMEOUT_MS + durationSeconds * 2 * 1000 (milliseconds), capped at 600 s
- On timeout, the process is killed and VideoProcessingError is thrown
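The scaling rule is easy to check with concrete numbers. The sketch below assumes the base timeout equals the 180-second default and applies the 600-second cap listed under Implementation Status; `transcodeTimeoutMs` is a hypothetical name, and the constant values in transcode.ts may differ.

```typescript
// Assumption: the base timeout equals the 180 s default mentioned above.
const BASE_TIMEOUT_MS = 180 * 1000;
// Cap taken from the Implementation Status notes (600 s).
const MAX_TIMEOUT_MS = 600 * 1000;

// Hypothetical helper: base timeout plus 2 s of budget per second of video, capped.
function transcodeTimeoutMs(durationSeconds?: number): number {
  // Fall back to the base timeout when the duration is missing or unusable.
  if (
    durationSeconds === undefined ||
    !Number.isFinite(durationSeconds) ||
    durationSeconds <= 0
  ) {
    return BASE_TIMEOUT_MS;
  }
  return Math.min(BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, MAX_TIMEOUT_MS);
}

const shortClipTimeout = transcodeTimeoutMs(30);  // 180000 + 60000 = 240000 ms
const longClipTimeout = transcodeTimeoutMs(300);  // 180000 + 600000 exceeds the cap, so 600000 ms
```

Under these assumptions the cap takes effect for any video longer than 210 seconds, since 180 s + 2 × 210 s = 600 s.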
2. Corrupt Input File
FFmpeg will exit with a non-zero code. The existing error handling in executeFfmpegProcess() captures stderr, detects SIGKILL (OOM), and wraps errors in VideoProcessingError.
3. Disk Space Exhaustion
On Cloud Run, /tmp is backed by memory. If the input + output exceed available tmpfs space:
- FFmpeg will fail with a write error
- The finally block in processVideo() cleans up the shared temp directory
- Mitigation: ensure Cloud Run instance memory is at least 3x the maximum expected video size
4. VP8/VP9 in MP4 Container (Edge Case)
Some browsers might report an MP4 container but use VP8/VP9 codecs (technically valid but extremely rare). Format detection should check both container and codec:
const needsTranscode =
  metadata.format.codec?.some(c => ["vp8", "vp9", "vp09", "vp08"].includes(c)) ||
  containerFormat === "webm" ||
  containerFormat === "matroska";
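A self-contained version of this check against raw ffprobe JSON might look like the sketch below. The `FfprobeResult` type covers only the fields used here, and `needsTranscodeFromProbe` is a hypothetical name rather than the signature of the project's detection function. One detail matters in practice: ffprobe reports format_name as a comma-separated list (for example "matroska,webm" for WebM files), so the value has to be split before comparing.

```typescript
// Minimal subset of ffprobe's JSON output (-show_format -show_streams).
interface FfprobeStream {
  codec_type?: string; // "video", "audio", ...
  codec_name?: string; // "vp8", "vp9", "h264", "opus", ...
}
interface FfprobeResult {
  format?: { format_name?: string };
  streams?: FfprobeStream[];
}

const VPX_CODECS = ["vp8", "vp9"];

// Hypothetical helper: true when either the container or the video codec is WebM-family.
function needsTranscodeFromProbe(probe: FfprobeResult): boolean {
  // format_name is a comma-separated list such as "matroska,webm".
  const containers = (probe.format?.format_name ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase());
  const isWebmContainer = containers.some(
    (name) => name === "webm" || name === "matroska",
  );

  // Check the video stream's codec separately to catch VP8/VP9 inside MP4.
  const hasVpxVideo = (probe.streams ?? []).some(
    (stream) =>
      stream.codec_type === "video" &&
      VPX_CODECS.indexOf((stream.codec_name ?? "").toLowerCase()) !== -1,
  );

  return isWebmContainer || hasVpxVideo;
}

const vp9InMp4 = needsTranscodeFromProbe({
  format: { format_name: "mov,mp4,m4a,3gp,3g2,mj2" },
  streams: [{ codec_type: "video", codec_name: "vp9" }],
});
```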
5. Audio-Only or Silent Video
If the input has no audio stream, FFmpeg should not fail — -c:a aac is simply a no-op when there's no audio input. The transcoded output will also have no audio.
6. Already-Transcoded Re-Processing
If a video is re-processed (e.g., after a processing pipeline upgrade), the wasTranscoded flag and originalVideoUrl (not yet added to QuestionResponse) are meant to prevent double-transcoding. The pipeline should check whether the current videoUrl is already an MP4 before transcoding.
Key Files
| File | Role |
|---|---|
| packages/app-video-flow/src/web/video-recording/capabilities.ts | Browser codec selection (getBestSupportedMediaRecorderCodec), NLE risk tagging |
| packages/app-video-flow/src/web/video-recording/useMediaRecorder.ts | MediaRecorder hook that uses codec capabilities for recording |
| apps/video-processor/src/operations/processVideo.ts | Main processing pipeline; the transcoding step runs here as Step 1.5 |
| apps/video-processor/src/operations/rotation.ts | Existing H.264+AAC FFmpeg pattern reused for transcoding |
| apps/video-processor/src/operations/metadata.ts | Format detection via ffprobe (getVideoMetadataFromFile) |
| apps/video-processor/src/utils/ffmpeg-helpers.ts | executeFfmpegProcess(): shared FFmpeg process execution with timeout/OOM handling |
| packages/registries/src/server/video-processing-types.ts | QuestionResponse type; to be extended with originalVideoUrl, wasTranscoded, originalFormat |
| packages/services/src/server/testimonials/process-video-response.ts | Orchestration layer calling video-processor; passes through new fields |
| packages/services/src/server/testimonials/steps/batch-video-processing.ts | HTTP client call to video-processor processVideo endpoint |
| docs/architecture/video-recording-data-architecture.md | Parent doc; covers full data flow from browser to admin |
Implementation Status
Phase 1: Core Transcoding — ✅ Complete
- apps/video-processor/src/operations/transcode.ts
  - transcodeWebmToMp4(inputPath, outputPath, durationSeconds?): Promise<void>
  - FFmpeg settings reused from rotation.ts (H.264 main + AAC 128k, CRF 23, faststart)
  - Timeout scales with video duration (BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, capped at 600s)
- apps/video-processor/src/operations/format-detection.ts
  - detectNeedsTranscode(metadata): TranscodeDetectionResult
  - Checks container (format_name from ffprobe JSON) and codec separately
  - Edge case: VP8/VP9 inside MP4 container also triggers transcode
- apps/video-processor/src/operations/processVideo.ts
  - Step 1.5 between metadata and parallel ops
  - effectivePath used for all downstream ops (transcoded or original)
  - Returns wasTranscoded, transcodedVideoUrl?, originalFormat?
Phase 2: Storage & Propagation — ✅ Partially Complete
- Transcoded MP4 uploaded to R2 (transcodedVideoUrl)
- Caller-side propagation of transcodedVideoUrl to the QuestionResponse type (consumer responsibility)
- originalVideoUrl field in QuestionResponse (not yet added)
Phase 3: Download Fix — 🔲 Not yet implemented
- Zip download extension fix (use transcodedVideoUrl where present)
- Remove extension-guessing logic in download helpers
Phase 4: Admin Visibility — 🔲 Not yet implemented
- Show transcoding status in VideoInfoCard
- Show originalFormat in metadata display