
docs/architecture/video-transcoding-architecture.md

Last verified: 2026-03-06 Target: apps/video-processor

Video Transcoding Architecture: WebM to MP4

How browser codec selection creates WebM files that break NLE editors, and how server-side transcoding fixes it.

Extends: video-recording-data-architecture.md — read that first for the full data flow from browser to admin dashboard.


Overview

Videos recorded in the platform can arrive in two container formats:

  • MP4 (H.264 + AAC) — universally compatible with NLE editors (Premiere Pro, DaVinci Resolve, Final Cut Pro)
  • WebM (VP8/VP9 + Opus) — the only option on Firefox and some older browsers, but not supported by any major NLE editor

When a pro user downloads a WebM video and tries to import it into Premiere Pro, they get an "unsupported compression type" error. The file is perfectly fine for browser playback but useless for professional editing.

The fix: Detect WebM files during server-side processing and transcode them to MP4 (H.264 + AAC) before they reach any downstream consumer.


Browser Codec Selection Strategy

Why WebM Exists (It's Not a Choice)

The codec selection in getBestSupportedMediaRecorderCodec() strongly prefers MP4 H.264/AAC. WebM is a fallback, not a preference. The reason WebM exists at all is that Firefox does not support MP4 recording via MediaRecorder — it can only produce WebM.

Codec Priority Cascade

The browser tries codecs in this order (first successful construction wins):

1. video/mp4;codecs="avc1.42E01E,mp4a.40.2"   → MP4 H.264 Baseline + AAC (best for NLEs)
2. video/mp4                                     → MP4 (browser picks codecs)
3. video/webm;codecs="vp9,opus"                  → WebM VP9 + Opus (Chrome/Firefox fallback)
4. video/webm;codecs="vp8,opus"                  → WebM VP8 + Opus (legacy fallback)
5. video/webm                                     → WebM (browser picks codecs)
6. (no mimeType — browser default)               → Last resort
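A minimal TypeScript sketch of this cascade, where a `supports` predicate stands in for a successful `MediaRecorder` construction (the real logic lives in `getBestSupportedMediaRecorderCodec()` in capabilities.ts):

```typescript
// Illustrative priority list mirroring the cascade above.
const CODEC_CANDIDATES: (string | undefined)[] = [
  'video/mp4;codecs="avc1.42E01E,mp4a.40.2"', // 1. H.264 Baseline + AAC
  "video/mp4",                                // 2. browser picks codecs
  'video/webm;codecs="vp9,opus"',             // 3. VP9 + Opus
  'video/webm;codecs="vp8,opus"',             // 4. VP8 + Opus
  "video/webm",                               // 5. browser picks codecs
  undefined,                                  // 6. no mimeType at all
];

// `supports` stands in for a successful `new MediaRecorder(stream, ...)` call.
function pickCodec(supports: (candidate?: string) => boolean): string | undefined {
  for (const candidate of CODEC_CANDIDATES) {
    if (supports(candidate)) return candidate;
  }
  return undefined;
}
```

On a simulated Firefox (only `video/webm` constructions succeed), the cascade lands on VP9 + Opus, matching the browser support matrix below.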

iOS Safari special path: iOS Safari doesn't support explicit codec strings in MediaRecorder. The code tries plain video/mp4 first, then construction with no mimeType at all. In both cases the output codecs are inferred to be H.264 + AAC.

Construction Testing (Not isTypeSupported)

The code uses actual new MediaRecorder(stream, { mimeType }) construction to verify support — not MediaRecorder.isTypeSupported(). This is intentional:

// isTypeSupported can lie — some browsers report false but accept construction
const tryConstruct = (candidate?: string): string | null => {
  try {
    const rec = new MediaRecorder(stream, candidate ? { mimeType: candidate } : undefined);
    return rec.mimeType || candidate || null;  // Read canonical mimeType from UA
  } catch {
    return null;
  }
};

The canonical mimeType read back from the constructed MediaRecorder is the ground truth for what the browser will actually produce.

NLE Compatibility Risk

Each codec selection is tagged with an NLE compatibility risk level:

| Container | Video Codec | Audio Codec | NLE Risk | Why |
| --- | --- | --- | --- | --- |
| MP4 | H.264 | AAC | Low | Universal NLE support |
| MP4 | Unknown | Unknown | Medium | Depends on actual codec |
| WebM | VP9 | Opus | High | No NLE supports WebM import |
| WebM | VP8 | Opus | High | No NLE supports WebM import |
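A hypothetical tagging helper illustrating the mapping above (the actual names and shape in capabilities.ts may differ):

```typescript
// Maps a selected MediaRecorder mimeType to the NLE risk levels in the table.
type NleRisk = "low" | "medium" | "high";

function nleRiskFor(mimeType: string | undefined): NleRisk {
  if (!mimeType) return "medium";                       // browser default: codecs unknown
  if (mimeType.startsWith("video/webm")) return "high"; // no NLE imports WebM
  if (mimeType.includes("avc1") && mimeType.includes("mp4a")) return "low"; // explicit H.264 + AAC
  return "medium";                                      // plain video/mp4: depends on actual codec
}
```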

Browser Support Matrix

| Browser | MP4 H.264+AAC | MP4 (plain) | WebM VP9+Opus | WebM VP8+Opus | Result |
| --- | --- | --- | --- | --- | --- |
| Chrome (desktop) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Chrome (Android) | Yes | Yes | Yes | Yes | MP4 H.264 |
| Safari (macOS) | Yes | Yes | No | No | MP4 H.264 |
| Safari (iOS) | Special* | Special* | No | No | MP4 H.264 (inferred) |
| Firefox (all) | No | No | Yes | Yes | WebM VP9 |
| Edge (Chromium) | Yes | Yes | Yes | Yes | MP4 H.264 |

* iOS Safari uses plain video/mp4 or default construction; codec strings are inferred as H.264+AAC.

Codec Selection Decision Tree


Implemented State

Transcoding is fully implemented as Step 1.5, inserted between metadata extraction (Step 1) and parallel operations (Step 2). All downstream operations (audio extraction, thumbnails, silence detection, HLS encoding) work on the transcoded MP4 file.

Note: This section was originally written as "Target State (To-Be)". Transcoding was implemented in February 2026.

Data Model: transcodedVideoUrl vs videoUrl

The original videoUrl is preserved (pointing to the WebM in blob storage). A separate transcodedVideoUrl field carries the transcoded MP4 URL. Callers (e.g., process-video-response.ts) decide whether to use videoUrl or transcodedVideoUrl for downstream consumers.

// processVideo return type (processVideo.ts)
{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string;  // Set if wasTranscoded=true; points to transcoded MP4
  originalFormat?: OriginalFormat;  // "webm" | "mp4" | "unknown"
  // ...other fields
}
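A hedged sketch of the caller-side decision; the helper name is invented, and process-video-response.ts implements its own version of this rule:

```typescript
// Callers receive both URLs and pick the one appropriate for the consumer.
interface ProcessVideoUrls {
  videoUrl: string;            // always the original upload (possibly WebM)
  wasTranscoded: boolean;
  transcodedVideoUrl?: string; // set only when wasTranscoded is true
}

// Prefer the transcoded MP4 when it exists; fall back to the original.
function effectiveVideoUrl(result: ProcessVideoUrls): string {
  return result.wasTranscoded && result.transcodedVideoUrl
    ? result.transcodedVideoUrl
    : result.videoUrl;
}
```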

Transcoding Decision Flowchart


Pipeline Integration

Where transcoding fits within the existing processVideo() pipeline in apps/video-processor/src/operations/processVideo.ts:

processVideo()
│
├── Step 0: Download video to shared temp file (disk, not heap)
│   └── downloadVideoToFile(videoUrl, sharedVideoPath)
│
├── Step 1: Extract metadata (sequential — needed for format detection + thumbnail duration)
│   └── getVideoMetadataFromFile(sharedVideoPath) → metadata
│
├── Step 1.5: Transcode if needed (NEW)
│   ├── Detect format from metadata (codec_name, format_name)
│   ├── If WebM/VP8/VP9:
│   │   ├── transcodeToMp4(sharedVideoPath, transcodedPath)
│   │   ├── Replace sharedVideoPath reference with transcodedPath
│   │   └── Set wasTranscoded = true
│   └── If already MP4/H.264: skip (wasTranscoded = false)
│
├── Step 2: Parallel operations (all read from shared file — now transcoded if needed)
│   ├── extractAudioFromVideoFromFile(effectivePath)
│   ├── generateThumbnailFromVideoFile(effectivePath, duration)
│   └── detectSilenceInVideoFromFile(effectivePath)
│
└── Return: { audioPath, thumbnailUrl, metadata, silenceIntervals, wasTranscoded }

Key insight: The transcoded file replaces the original in the shared temp directory. All subsequent operations automatically use the MP4 version without any code changes to audio extraction, thumbnail generation, or silence detection.
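The Step 1.5 path swap can be sketched as follows (function and argument names are assumed; the real code lives in processVideo.ts):

```typescript
// Transcode in place within the shared temp directory and return the path all
// downstream operations should read from.
async function resolveEffectivePath(
  sharedVideoPath: string,
  needsTranscode: boolean,
  transcode: (input: string, output: string) => Promise<void>,
): Promise<{ effectivePath: string; wasTranscoded: boolean }> {
  if (!needsTranscode) {
    return { effectivePath: sharedVideoPath, wasTranscoded: false };
  }
  // Write the MP4 next to the original in the shared temp directory
  const transcodedPath = sharedVideoPath.replace(/\.[^.]+$/, "-transcoded.mp4");
  await transcode(sharedVideoPath, transcodedPath);
  return { effectivePath: transcodedPath, wasTranscoded: true };
}
```

All Step 2 operations then read the returned effective path, which is why audio extraction, thumbnail generation, and silence detection need no changes.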


FFmpeg Transcoding Settings

Settings reused from the existing rotateVideo90FromBuffer() in rotation.ts, which already produces NLE-compatible H.264+AAC output:

| Setting | Value | Rationale |
| --- | --- | --- |
| -c:v libx264 | H.264 video codec | Universal NLE + browser support |
| -profile:v main | Main profile | Broad decoder compatibility (vs. High, which some mobile decoders struggle with) |
| -preset medium | Encoding speed/quality tradeoff | Good balance for server-side; fast saves ~30% time but produces ~10% larger files |
| -crf 23 | Constant Rate Factor | Visually transparent quality; 18 = near-lossless, 23 = good quality, 28 = noticeable loss |
| -c:a aac | AAC audio codec | Universal NLE + browser support |
| -b:a 128k | 128 kbps audio bitrate | Standard quality for speech; Opus→AAC transcoding is lossy-to-lossy anyway |
| -movflags +faststart | Move moov atom to start | Enables progressive playback in browsers without a full download |
| -y | Overwrite output | Standard for temp file pipelines |

FFmpeg Command

ffmpeg -i input.webm \
  -c:v libx264 -profile:v main -preset medium -crf 23 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  -y output.mp4
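The same invocation expressed as a spawn-style argument array, roughly as it might be handed to executeFfmpegProcess() (a sketch; the helper's exact signature is not shown here):

```typescript
// Builds the FFmpeg argument list for the WebM → MP4 transcode described above.
function buildTranscodeArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-i", inputPath,
    "-c:v", "libx264",         // H.264 video
    "-profile:v", "main",      // broad decoder compatibility
    "-preset", "medium",       // speed/quality balance
    "-crf", "23",              // visually transparent for webcam sources
    "-c:a", "aac",             // AAC audio
    "-b:a", "128k",            // standard speech bitrate
    "-movflags", "+faststart", // moov atom up front for progressive playback
    "-y", outputPath,          // overwrite temp output
  ];
}
```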

Why These Specific Settings

  • CRF 23 (not 18): The source is already a lossy WebM recording from a webcam. Re-encoding at CRF 18 would produce unnecessarily large files without visible quality improvement. CRF 23 preserves the existing quality while keeping file sizes reasonable.
  • Main profile (not High): High profile would give ~5-10% better compression at the cost of compatibility. Since these videos are for NLE import (not streaming optimization), compatibility is more important.
  • Medium preset (not fast/ultrafast): Server-side processing can afford the extra encoding time. medium produces noticeably smaller files than fast for the same quality.

Data Model

processVideo Return Fields (Implemented)

The processVideo function in apps/video-processor/src/operations/processVideo.ts returns:

{
  wasTranscoded: boolean;
  transcodedVideoUrl?: string;   // URL of the new MP4 in R2 (only if wasTranscoded=true)
  originalFormat?: OriginalFormat;  // "webm" | "mp4" | "unknown" from @repo/video/formats
  // ...other fields (audioPath, thumbnailUrl, metadata, silenceIntervals, etc.)
}

Field semantics:

| Field | When Set | Value |
| --- | --- | --- |
| transcodedVideoUrl | wasTranscoded === true | R2 URL of the transcoded MP4 |
| wasTranscoded | Always | true if transcoding occurred |
| originalFormat | Always | Derived from ffprobe format_name via deriveOriginalFormat() |

Design note: The original videoUrl is NOT replaced — it still points to the original WebM in blob storage. Callers receive both the original and the transcoded URL and can use whichever is appropriate. This differs from the original design, which proposed replacing videoUrl with the transcoded version.

Why preserve the original? The original WebM is archived for debugging. If transcoding introduces artifacts or if the source video needs re-processing with different settings, the original is available without re-recording.


Storage Strategy

Dual Storage (Original + Transcoded)

When transcoding occurs, both files are stored in blob storage:

Blob Storage
├── flows/{flowId}/sessions/{sessionId}/
│   ├── question1/response.webm           ← original (videoUrl, preserved)
│   └── question1/response-transcoded.mp4 ← transcoded (transcodedVideoUrl)

Naming convention: The transcoded file uses a -transcoded suffix to distinguish it from the original. Both files include a random suffix (via addRandomSuffix: true) to prevent caching issues.
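An illustrative derivation of the transcoded sibling's storage key; the random suffix applied via addRandomSuffix is handled by the upload helper and omitted here:

```typescript
// "…/question1/response.webm" → "…/question1/response-transcoded.mp4"
function transcodedKeyFor(originalKey: string): string {
  return originalKey.replace(/\.[^./]+$/, "-transcoded.mp4");
}
```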

Cost Implications

  • Additional storage: ~1x the original file size (H.264 at CRF 23 produces files of similar size to VP9 at typical webcam quality)
  • Who gets transcoded: Only Firefox users (~3-5% of respondents based on PostHog analytics)
  • Net impact: Minimal — a small percentage of videos stored twice

Download/Zip Fix

Once the Phase 3 download fix is in place, all downloaded files are guaranteed to be MP4 with correct .mp4 extensions:

| Scenario | Before | After |
| --- | --- | --- |
| Chrome user downloads | response.mp4 (correct) | response.mp4 (unchanged) |
| Firefox user downloads | response.webm with .mp4 extension (broken) | response.mp4 (transcoded, correct) |
| Zip download | Mixed extensions, some wrong | All .mp4 |

The download logic reads transcodedVideoUrl where present (falling back to videoUrl) rather than constructing extensions from the original upload.


Performance

Transcoding Time Estimates

Based on FFmpeg benchmarks for webcam-quality video (720p-1080p, 30fps) with -preset medium:

| Video Duration | Estimated Transcode Time | Notes |
| --- | --- | --- |
| 30 seconds | ~5-8 seconds | Typical single-question response |
| 2 minutes | ~15-25 seconds | Long response |
| 5 minutes | ~40-60 seconds | Maximum typical length |
| 10 minutes | ~80-120 seconds | Edge case |

Pipeline Impact

  • Latency increase: Transcoding adds roughly 15-25% of the video duration to total processing time, consistent with the estimates above
  • Only affects Firefox users: Chrome/Safari videos skip transcoding entirely
  • Parallel ops unaffected: Audio extraction, thumbnails, and silence detection run after transcoding completes and use the same shared temp file pattern
  • Memory: File-based (consistent with OOM-prevention pattern) — no buffers in Node.js heap

Cloud Run Considerations

  • Temp disk (/tmp): tmpfs backed by instance memory. Transcoding reads/writes temp files, so the instance needs enough memory for the video file + transcoded output simultaneously (~2x video size)
  • Timeout: Cloud Run default 300s should accommodate most videos. For very long videos (>5 min), the transcoding step should have its own timeout separate from ffprobe

Edge Cases

1. Timeout During Transcoding

FFmpeg can hang on corrupt or malformed input. The executeFfmpegProcess() helper already handles timeouts:

  • Default: 180 seconds (from rotation.ts pattern)
  • Should scale with video duration: base_timeout + (duration_seconds * 2) * 1000
  • On timeout, process is killed and VideoProcessingError is thrown
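The duration-scaled timeout can be sketched as follows, assuming the 180-second base from the rotation.ts pattern and the 600-second cap listed under Implementation Status:

```typescript
// Scale the FFmpeg timeout with video duration so long inputs don't get killed
// prematurely, while corrupt inputs still fail within a bounded window.
const BASE_TIMEOUT_MS = 180_000; // default from the rotation.ts pattern
const MAX_TIMEOUT_MS = 600_000;  // hard cap

function transcodeTimeoutMs(durationSeconds?: number): number {
  if (!durationSeconds) return BASE_TIMEOUT_MS; // unknown duration: use the default
  return Math.min(BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, MAX_TIMEOUT_MS);
}
```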

2. Corrupt Input File

FFmpeg will exit with a non-zero code. The existing error handling in executeFfmpegProcess() captures stderr, detects SIGKILL (OOM), and wraps errors in VideoProcessingError.

3. Disk Space Exhaustion

On Cloud Run, /tmp is backed by memory. If the input + output exceed available tmpfs space:

  • FFmpeg will fail with a write error
  • The finally block in processVideo() cleans up the shared temp directory
  • Mitigation: ensure Cloud Run instance memory is at least 3x the maximum expected video size

4. VP8/VP9 in MP4 Container (Edge Case)

Some browsers might report an MP4 container but use VP8/VP9 codecs (technically valid but extremely rare). Format detection should check both container and codec:

// Sketch: ffprobe reports format_name "matroska,webm" for WebM input, and
// per-stream codec_name values like "vp8" / "vp9"
const videoCodecName = metadata.streams?.find((s) => s.codec_type === "video")?.codec_name;
const needsTranscode =
  ["vp8", "vp9"].includes(videoCodecName ?? "") ||
  containerFormat.includes("webm") ||
  containerFormat.includes("matroska");

5. Audio-Only or Silent Video

If the input has no audio stream, FFmpeg should not fail — -c:a aac is simply a no-op when there's no audio input. The transcoded output will also have no audio.

6. Already-Transcoded Re-Processing

If a video is re-processed (e.g., after a processing pipeline upgrade), the wasTranscoded flag and originalVideoUrl prevent double-transcoding. The pipeline should check if the current videoUrl is already an MP4 before transcoding.


Key Files

| File | Role |
| --- | --- |
| packages/app-video-flow/src/web/video-recording/capabilities.ts | Browser codec selection (getBestSupportedMediaRecorderCodec), NLE risk tagging |
| packages/app-video-flow/src/web/video-recording/useMediaRecorder.ts | MediaRecorder hook that uses codec capabilities for recording |
| apps/video-processor/src/operations/processVideo.ts | Main processing pipeline where the transcoding step (Step 1.5) is inserted |
| apps/video-processor/src/operations/rotation.ts | Existing H.264+AAC FFmpeg pattern reused for transcoding |
| apps/video-processor/src/operations/metadata.ts | Format detection via ffprobe (getVideoMetadataFromFile) |
| apps/video-processor/src/utils/ffmpeg-helpers.ts | executeFfmpegProcess() — shared FFmpeg process execution with timeout/OOM handling |
| packages/registries/src/server/video-processing-types.ts | QuestionResponse type — extend with originalVideoUrl, wasTranscoded, originalFormat |
| packages/services/src/server/testimonials/process-video-response.ts | Orchestration layer calling video-processor; passes through new fields |
| packages/services/src/server/testimonials/steps/batch-video-processing.ts | HTTP client call to the video-processor processVideo endpoint |
| docs/architecture/video-recording-data-architecture.md | Parent doc — covers the full data flow from browser to admin |

Implementation Status

Phase 1: Core Transcoding — ✅ Complete

  • apps/video-processor/src/operations/transcode.ts
    • transcodeWebmToMp4(inputPath, outputPath, durationSeconds?): Promise<void>
    • FFmpeg settings reused from rotation.ts (H.264 main + AAC 128k, CRF 23, faststart)
    • Timeout scales with video duration (BASE_TIMEOUT_MS + durationSeconds * 2 * 1000, capped at 600s)
  • apps/video-processor/src/operations/format-detection.ts
    • detectNeedsTranscode(metadata): TranscodeDetectionResult
    • Checks container (format_name from ffprobe JSON) and codec separately
    • Edge case: VP8/VP9 inside MP4 container also triggers transcode
  • apps/video-processor/src/operations/processVideo.ts
    • Step 1.5 between metadata and parallel ops
    • effectivePath used for all downstream ops (transcoded or original)
    • Returns wasTranscoded, transcodedVideoUrl?, originalFormat?

Phase 2: Storage & Propagation — 🟡 Partially Complete

  • Transcoded MP4 uploaded to R2 (transcodedVideoUrl)
  • Caller-side propagation of transcodedVideoUrl to QuestionResponse type — consumer responsibility
  • originalVideoUrl field in QuestionResponse — not yet added

Phase 3: Download Fix — 🔲 Not yet implemented

  • Zip download extension fix (use transcodedVideoUrl where present)
  • Remove extension-guessing logic in download helpers

Phase 4: Admin Visibility — 🔲 Not yet implemented

  • Show transcoding status in VideoInfoCard
  • Show originalFormat in metadata display