
docs/architecture/video-transcoding-architecture.md

Last verified: 2026-03-06 Target: apps/video-processor

Video Transcoding Architecture: WebM to MP4

How browser codec selection creates WebM files that break NLE editors, and how server-side transcoding fixes it.

Extends: video-recording-data-architecture.md — read that first for the full data flow from browser to admin dashboard.


Overview

Videos recorded in the platform can arrive in two container formats:

  • MP4 (H.264 + AAC) — universally compatible with NLE editors (Premiere Pro, DaVinci Resolve, Final Cut Pro)
  • WebM (VP8/VP9 + Opus) — the only option on Firefox and some older browsers, but not supported by any major NLE editor

When a pro user downloads a WebM video and tries to import it into Premiere Pro, they get an "unsupported compression type" error. The file is perfectly fine for browser playback but useless for professional editing.

The fix: Detect WebM files during server-side processing and transcode them to MP4 (H.264 + AAC) before they reach any downstream consumer.


Browser Codec Selection Strategy

Why WebM Exists (It's Not a Choice)

The codec selection in getBestSupportedMediaRecorderCodec() strongly prefers MP4 H.264/AAC. WebM is a fallback, not a preference. The reason WebM exists at all is that Firefox does not support MP4 recording via MediaRecorder — it can only produce WebM.

Codec Priority Cascade

The browser tries codecs in this order (first successful construction wins):

1. video/mp4;codecs="avc1.42E01E,mp4a.40.2"   → MP4 H.264 Baseline + AAC (best for NLEs)
2. video/mp4                                     → MP4 (browser picks codecs)
3. video/webm;codecs="vp9,opus"                  → WebM VP9 + Opus (Chrome/Firefox fallback)
4. video/webm;codecs="vp8,opus"                  → WebM VP8 + Opus (legacy fallback)
5. video/webm                                     → WebM (browser picks codecs)
6. (no mimeType — browser default)               → Last resort
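A minimal TypeScript sketch of this cascade, where a `supports` predicate stands in for a successful `MediaRecorder` construction (the real logic lives in `getBestSupportedMediaRecorderCodec()` in capabilities.ts):

```typescript
// Illustrative priority list mirroring the cascade above.
const CODEC_CANDIDATES: (string | undefined)[] = [
  'video/mp4;codecs="avc1.42E01E,mp4a.40.2"', // 1. H.264 Baseline + AAC
  "video/mp4",                                // 2. browser picks codecs
  'video/webm;codecs="vp9,opus"',             // 3. VP9 + Opus
  'video/webm;codecs="vp8,opus"',             // 4. VP8 + Opus
  "video/webm",                               // 5. browser picks codecs
  undefined,                                  // 6. no mimeType at all
];

// `supports` stands in for a successful `new MediaRecorder(stream, ...)` call.
function pickCodec(supports: (candidate?: string) => boolean): string | undefined {
  for (const candidate of CODEC_CANDIDATES) {
    if (supports(candidate)) return candidate;
  }
  return undefined;
}
```

On a simulated Firefox (only `video/webm` constructions succeed), the cascade lands on VP9 + Opus, matching the browser support matrix below.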

iOS Safari special path: iOS Safari doesn't support explicit codec strings in MediaRecorder. The code tries plain video/mp4 first, then construction with no mimeType at all. In both cases the output codecs are inferred to be H.264 + AAC.

Construction Testing (Not isTypeSupported)

The code uses actual new MediaRecorder(stream, { mimeType }) construction to verify support — not MediaRecorder.isTypeSupported(). This is intentional:

// isTypeSupported can lie — some browsers report false but accept construction
const tryConstruct = (candidate?: string): string | null => {
  try {
    const rec = new MediaRecorder(stream, candidate ? { mimeType: candidate } : undefined);
    return rec.mimeType || candidate || null;  // Read canonical mimeType from UA
  } catch {
    return null;
  }
};

The canonical mimeType read back from the constructed MediaRecorder is the ground truth for what the browser will actually produce.

NLE Compatibility Risk

Each codec selection is tagged with an NLE compatibility risk level:

| Container | Video Codec | Audio Codec | NLE Risk | Why |
| --- | --- | --- | --- | --- |
| MP4 | H.264 | AAC | Low | Universal NLE support |
| MP4 | Unknown | Unknown | Medium | Depends on actual codec |
| WebM | VP9 | Opus | High | No NLE supports WebM import |
| WebM | VP8 | Opus | High | No NLE supports WebM import |
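A hypothetical tagging helper illustrating the mapping above (the actual names and shape in capabilities.ts may differ):

```typescript
// Maps a selected MediaRecorder mimeType to the NLE risk levels in the table.
type NleRisk = "low" | "medium" | "high";

function nleRiskFor(mimeType: string | undefined): NleRisk {
  if (!mimeType) return "medium";                       // browser default: codecs unknown
  if (mimeType.startsWith("video/webm")) return "high"; // no NLE imports WebM
  if (mimeType.includes("avc1") && mimeType.includes("mp4a")) return "low"; // explicit H.264 + AAC
  return "medium";                                      // plain video/mp4: depends on actual codec
}
```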

Browser Support Matrix

| Browser | MP4 H.264+AAC | MP4 (plain) | WebM VP9+Opus | WebM VP8+Opus | Result |
| --- | --- | --- | --- | --- | --- |
| Chrome (desktop) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Chrome (Android) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Safari (macOS) | Yes | Yes | No | No | MP4 H.264 |
| Safari (iOS) | Special* | Special* | No | No | MP4 H.264 (inferred) |
| Firefox (all) | No | No | Yes | Yes | WebM VP9 |
| Edge (Chromium) | Yes | Yes | Yes | Yes | MP4 H.264 |

* iOS Safari uses plain video/mp4 or default construction; codec strings are inferred as H.264+AAC.

Codec Selection Decision Tree


Implemented State

Transcoding is fully implemented as Step 1.5, inserted between metadata extraction (Step 1) and parallel operations (Step 2). All downstream operations (audio extraction, thumbnails, silence detection, HLS encoding) work on the transcoded MP4 file.

Note: This section was originally written as "Target State (To-Be)". Transcoding was implemented in February 2026.

Data Model: transcodedVideoUrl vs videoUrl

The original videoUrl is preserved (pointing to the WebM in blob storage). A separate transcodedVideoUrl field carries the transcoded MP4 URL. Callers (e.g., process-video-response.ts) decide whether to use videoUrl or transcodedVideoUrl for downstream consumers.

// processVideo return type (processVideo.ts)
{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string;  // Set if wasTranscoded=true; points to transcoded MP4
  originalFormat?: OriginalFormat;  // "webm" | "mp4" | "unknown"
  // ...other fields
}
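A hedged sketch of the caller-side decision; the helper name is invented, and process-video-response.ts implements its own version of this rule:

```typescript
// Callers receive both URLs and pick the one appropriate for the consumer.
interface ProcessVideoUrls {
  videoUrl: string;            // always the original upload (possibly WebM)
  wasTranscoded: boolean;
  transcodedVideoUrl?: string; // set only when wasTranscoded is true
}

// Prefer the transcoded MP4 when it exists; fall back to the original.
function effectiveVideoUrl(result: ProcessVideoUrls): string {
  return result.wasTranscoded && result.transcodedVideoUrl
    ? result.transcodedVideoUrl
    : result.videoUrl;
}
```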

Transcoding Decision Flowchart


Pipeline Integration

Where transcoding fits within the existing processVideo() pipeline in apps/video-processor/src/operations/processVideo.ts:

processVideo()
│
├── Step 0: Download video to shared temp file (disk, not heap)
│   └── downloadVideoToFile(videoUrl, sharedVideoPath)
│
├── Step 1: Extract metadata (sequential — needed for format detection + thumbnail duration)
│   └── getVideoMetadataFromFile(sharedVideoPath) → metadata
│
├── Step 1.5: Transcode if needed (NEW)
│   ├── Detect format from metadata (codec_name, format_name)
│   ├── If WebM/VP8/VP9:
│   │   ├── transcodeToMp4(sharedVideoPath, transcodedPath)
│   │   ├── Replace sharedVideoPath reference with transcodedPath
│   │   └── Set wasTranscoded = true
│   └── If already MP4/H.264: skip (wasTranscoded = false)
│
├── Step 2: Parallel operations (all read from shared file — now transcoded if needed)
│   ├── extractAudioFromVideoFromFile(effectivePath)
│   ├── generateThumbnailFromVideoFile(effectivePath, duration)
│   └── detectSilenceInVideoFromFile(effectivePath)
│
└── Return: { audioPath, thumbnailUrl, metadata, silenceIntervals, wasTranscoded }

Key insight: The transcoded file replaces the original in the shared temp directory. All subsequent operations automatically use the MP4 version without any code changes to audio extraction, thumbnail generation, or silence detection.
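The Step 1.5 path swap can be sketched as follows (function and argument names are assumed; the real code lives in processVideo.ts):

```typescript
// Transcode in place within the shared temp directory and return the path all
// downstream operations should read from.
async function resolveEffectivePath(
  sharedVideoPath: string,
  needsTranscode: boolean,
  transcode: (input: string, output: string) => Promise<void>,
): Promise<{ effectivePath: string; wasTranscoded: boolean }> {
  if (!needsTranscode) {
    return { effectivePath: sharedVideoPath, wasTranscoded: false };
  }
  // Write the MP4 next to the original in the shared temp directory
  const transcodedPath = sharedVideoPath.replace(/\.[^.]+$/, "-transcoded.mp4");
  await transcode(sharedVideoPath, transcodedPath);
  return { effectivePath: transcodedPath, wasTranscoded: true };
}
```

All Step 2 operations then read the returned effective path, which is why audio extraction, thumbnail generation, and silence detection need no changes.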


FFmpeg Transcoding Settings

Settings reused from the existing rotateVideo90FromBuffer() in rotation.ts, which already produces NLE-compatible H.264+AAC output:

| Setting | Value | Rationale |
| --- | --- | --- |
| -c:v libx264 | H.264 video codec | Universal NLE + browser support |
| -profile:v main | Main profile | Broad decoder compatibility (vs. High, which some mobile decoders struggle with) |
| -preset medium | Encoding speed/quality tradeoff | Good balance for server-side; fast saves ~30% time but produces ~10% larger files |
| -crf 23 | Constant Rate Factor | Visually transparent quality; 18 = near-lossless, 23 = good quality, 28 = noticeable loss |
| -c:a aac | AAC audio codec | Universal NLE + browser support |
| -b:a 128k | 128 kbps audio bitrate | Standard quality for speech; Opus→AAC transcoding is lossy-to-lossy anyway |
| -movflags +faststart | Move moov atom to start | Enables progressive playback in browsers without a full download |
| -y | Overwrite output | Standard for temp file pipelines |

FFmpeg Command

ffmpeg -i input.webm \
  -c:v libx264 -profile:v main -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  -y output.mp4
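The same invocation expressed as a spawn-style argument array, roughly as it might be handed to executeFfmpegProcess() (a sketch; the helper's exact signature is not shown here):

```typescript
// Builds the FFmpeg argument list for the WebM → MP4 transcode described above.
function buildTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-c:v", "libx264",         // H.264 video
    "-profile:v", "main",      // broad decoder compatibility
    "-preset", "medium",       // speed/quality balance
    "-crf", "23",              // visually transparent for webcam sources
    "-c:a", "aac",             // AAC audio
    "-b:a", "128k",            // standard speech bitrate
    "-movflags", "+faststart", // moov atom up front for progressive playback
    "-y", outputPath,          // overwrite temp output
  ];
}
```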

Why These Specific Settings

  • CRF 23 (not 18): The source is already a lossy WebM recording from a webcam. Re-encoding at CRF 18 would produce unnecessarily large files without visible quality improvement. CRF 23 preserves the existing quality while keeping file sizes reasonable.
  • Main profile (not High): High profile would give ~5-10% better compression at the cost of compatibility. Since these videos are for NLE import (not streaming optimization), compatibility is more important.
  • Medium preset (not fast/ultrafast): Server-side processing can afford the extra encoding time. medium produces noticeably smaller files than fast for the same quality.

Data Model

processVideo Return Fields (Implemented)

The processVideo function in apps/video-processor/src/operations/processVideo.ts returns:

{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string;   // URL of the new MP4 in R2 (only if wasTranscoded=true)
  originalFormat?: OriginalFormat;  // "webm" | "mp4" | "unknown" from @repo/video/formats
  // ...other fields (audioPath, thumbnailUrl, metadata, silenceIntervals, etc.)
}

Field semantics:

| Field | When Set | Value |
| --- | --- | --- |
| transcodedVideoUrl | wasTranscoded === true | R2 URL of the transcoded MP4 |
| wasTranscoded | Always | true if transcoding occurred |
| originalFormat | Always | Derived from ffprobe format_name via deriveOriginalFormat() |

Design note: The original videoUrl is NOT replaced — it still points to the original WebM in blob storage. Callers receive both the original and the transcoded URL and can use whichever is appropriate. This differs from the original design, which proposed replacing videoUrl with the transcoded version.

Why preserve the original? The original WebM is archived for debugging. If transcoding introduces artifacts or if the source video needs re-processing with different settings, the original is available without re-recording.


Storage Strategy

Dual Storage (Original + Transcoded)

When transcoding occurs, both files are stored in blob storage:

Blob Storage
├── flows/{flowId}/sessions/{sessionId}/
│   ├── question1/response.webm           ← original (videoUrl, preserved)
│   └── question1/response-transcoded.mp4 ← transcoded (transcodedVideoUrl)

Naming convention: The transcoded file uses a -transcoded suffix to distinguish it from the original. Both files include a random suffix (via addRandomSuffix: true) to prevent caching issues.
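An illustrative derivation of the transcoded sibling's storage key; the random suffix applied via addRandomSuffix is handled by the upload helper and omitted here:

```typescript
// "…/question1/response.webm" → "…/question1/response-transcoded.mp4"
function transcodedKeyFor(originalKey: string): string {
  return originalKey.replace(/\.[^./]+$/, "-transcoded.mp4");
}
```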

Cost Implications

  • Additional storage: ~1x the original file size (H.264 at CRF 23 produces files of similar size to VP9 at typical webcam quality)
  • Who gets transcoded: Only Firefox users (~3-5% of respondents based on PostHog analytics)
  • Net impact: Minimal — a small percentage of videos stored twice

Download/Zip Fix

Once the Phase 3 download fix is in place, all downloaded files are guaranteed to be MP4 with correct .mp4 extensions:

| Scenario | Before | After |
| --- | --- | --- |
| Chrome user downloads | response.mp4 (correct) | response.mp4 (unchanged) |
| Firefox user downloads | response.webm with .mp4 extension (broken) | response.mp4 (transcoded, correct) |
| Zip download | Mixed extensions, some wrong | All .mp4 |

The download logic reads transcodedVideoUrl where present (falling back to videoUrl) rather than constructing extensions from the original upload.


Performance

Transcoding Time Estimates

Based on FFmpeg benchmarks for webcam-quality video (720p-1080p, 30fps) with -preset medium:

| Video Duration | Estimated Transcode Time | Notes |
| --- | --- | --- |
| 30 seconds | ~5-8 seconds | Typical single-question response |
| 2 minutes | ~15-25 seconds | Long response |
| 5 minutes | ~40-60 seconds | Maximum typical length |
| 10 minutes | ~80-120 seconds | Edge case |

Pipeline Impact

  • Latency increase: Transcoding adds roughly 15-25% of the video duration to total processing time, consistent with the estimates above
  • Only affects Firefox users: Chrome/Safari videos skip transcoding entirely
  • Parallel ops unaffected: Audio extraction, thumbnails, and silence detection run after transcoding completes and use the same shared temp file pattern
  • Memory: File-based (consistent with OOM-prevention pattern) — no buffers in Node.js heap

Cloud Run Considerations

  • Temp disk (/tmp): tmpfs backed by instance memory. Transcoding reads/writes temp files, so the instance needs enough memory for the video file + transcoded output simultaneously (~2x video size)
  • Timeout: Cloud Run default 300s should accommodate most videos. For very long videos (>5 min), the transcoding step should have its own timeout separate from ffprobe

Edge Cases

1. Timeout During Transcoding

FFmpeg can hang on corrupt or malformed input. The executeFfmpegProcess() helper already handles timeouts:

  • Default: 180 seconds (from rotation.ts pattern)
  • Should scale with video duration: base_timeout + (duration_seconds * 2) * 1000
  • On timeout, process is killed and VideoProcessingError is thrown
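The duration-scaled timeout can be sketched as follows, assuming the 180-second base from the rotation.ts pattern and the 600-second cap listed under Implementation Status:

```typescript
// Scale the FFmpeg timeout with video duration so long inputs don't get killed
// prematurely, while corrupt inputs still fail within a bounded window.
const BASE_TIMEOUT_MS = 180_000; // default from the rotation.ts pattern
const MAX_TIMEOUT_MS = 600_000;  // hard cap

function transcodeTimeoutMs(durationSeconds?: number): number {
  if (!durationSeconds) return BASE_TIMEOUT_MS; // unknown duration: use the default
  return Math.min(BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, MAX_TIMEOUT_MS);
}
```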

2. Corrupt Input File

FFmpeg will exit with a non-zero code. The existing error handling in executeFfmpegProcess() captures stderr, detects SIGKILL (OOM), and wraps errors in VideoProcessingError.

3. Disk Space Exhaustion

On Cloud Run, /tmp is backed by memory. If the input + output exceed available tmpfs space:

  • FFmpeg will fail with a write error
  • The finally block in processVideo() cleans up the shared temp directory
  • Mitigation: ensure Cloud Run instance memory is at least 3x the maximum expected video size

4. VP8/VP9 in MP4 Container (Edge Case)

Some browsers might report an MP4 container but use VP8/VP9 codecs (technically valid but extremely rare). Format detection should check both container and codec:

// Sketch: ffprobe reports format_name "matroska,webm" for WebM input, and
// per-stream codec_name values like "vp8" / "vp9"
const videoCodecName = metadata.streams?.find((s) => s.codec_type === "video")?.codec_name;
const needsTranscode =
  ["vp8", "vp9"].includes(videoCodecName ?? "") ||
  containerFormat.includes("webm") ||
  containerFormat.includes("matroska");

5. Audio-Only or Silent Video

If the input has no audio stream, FFmpeg should not fail — -c:a aac is simply a no-op when there's no audio input. The transcoded output will also have no audio.

6. Already-Transcoded Re-Processing

If a video is re-processed (e.g., after a processing pipeline upgrade), the wasTranscoded flag and originalVideoUrl prevent double-transcoding. The pipeline should check if the current videoUrl is already an MP4 before transcoding.


Key Files

| File | Role |
| --- | --- |
| packages/app-video-flow/src/web/video-recording/capabilities.ts | Browser codec selection (getBestSupportedMediaRecorderCodec), NLE risk tagging |
| packages/app-video-flow/src/web/video-recording/useMediaRecorder.ts | MediaRecorder hook that uses codec capabilities for recording |
| apps/video-processor/src/operations/processVideo.ts | Main processing pipeline where the transcoding step (Step 1.5) is inserted |
| apps/video-processor/src/operations/rotation.ts | Existing H.264+AAC FFmpeg pattern reused for transcoding |
| apps/video-processor/src/operations/metadata.ts | Format detection via ffprobe (getVideoMetadataFromFile) |
| apps/video-processor/src/utils/ffmpeg-helpers.ts | executeFfmpegProcess() — shared FFmpeg process execution with timeout/OOM handling |
| packages/registries/src/server/video-processing-types.ts | QuestionResponse type — extend with originalVideoUrl, wasTranscoded, originalFormat |
| packages/services/src/server/testimonials/process-video-response.ts | Orchestration layer calling video-processor; passes through new fields |
| packages/services/src/server/testimonials/steps/batch-video-processing.ts | HTTP client call to the video-processor processVideo endpoint |
| docs/architecture/video-recording-data-architecture.md | Parent doc — covers the full data flow from browser to admin |

Implementation Status

Phase 1: Core Transcoding — ✅ Complete

  • apps/video-processor/src/operations/transcode.ts
    • transcodeWebmToMp4(inputPath, outputPath, durationSeconds?): Promise<void>
    • FFmpeg settings reused from rotation.ts (H.264 main + AAC 128k, CRF 23, faststart)
    • Timeout scales with video duration (BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, capped at 600s)
  • apps/video-processor/src/operations/format-detection.ts
    • detectNeedsTranscode(metadata): TranscodeDetectionResult
    • Checks container (format_name from ffprobe JSON) and codec separately
    • Edge case: VP8/VP9 inside MP4 container also triggers transcode
  • apps/video-processor/src/operations/processVideo.ts
    • Step 1.5 between metadata and parallel ops
    • effectivePath used for all downstream ops (transcoded or original)
    • Returns wasTranscoded, transcodedVideoUrl?, originalFormat?

Phase 2: Storage & Propagation — 🟡 Partially Complete

  • Transcoded MP4 uploaded to R2 (transcodedVideoUrl)
  • Caller-side propagation of transcodedVideoUrl to QuestionResponse type — consumer responsibility
  • originalVideoUrl field in QuestionResponse — not yet added

Phase 3: Download Fix — 🔲 Not yet implemented

  • Zip download extension fix (use transcodedVideoUrl where present)
  • Remove extension-guessing logic in download helpers

Phase 4: Admin Visibility — 🔲 Not yet implemented

  • Show transcoding status in VideoInfoCard
  • Show originalFormat in metadata display