
docs/architecture/video-processing-pipeline-performance.md

Last verified: 2026-03-06 Target: apps/video-processor

Video Processing Pipeline — Performance & Bottleneck Analysis

Date: 2026-02-26

How the processVideo pipeline performs on Cloud Run, which operations are bottlenecks, and why the EBU R128 audio quality and signalstats video quality operations add zero latency to the total pipeline.

Related: video-transcoding-architecture.md — WebM-to-MP4 transcoding details.


Pipeline Structure

processVideo (apps/video-processor/src/operations/processVideo.ts) processes each respondent video in three sequential phases:

SEQUENTIAL
  1. Download video to disk (stream from blob storage → /tmp)
  2. Extract metadata via ffprobe
  3. Transcode WebM → MP4 (if needed — WebM/VP8/VP9 containers)

PARALLEL (Promise.allSettled — wall time = slowest operation)
  4a. Extract audio (ffmpeg transcode + blob upload)           ← CRITICAL
  4b. Generate thumbnails (frame extracts + animated GIF + blob uploads) ← CRITICAL
  4c. Detect silence (ffmpeg silencedetect filter)             ← CRITICAL
  4d. Estimate face (frame extract + TensorFlow WASM face-api) ← non-critical
  4e. Detect audio quality — ebur128 (ffmpeg filter → /dev/null) ← non-critical
  4f. Detect video quality — signalstats (ffmpeg filter → /dev/null) ← non-critical
  4g. HLS multi-bitrate encoding (480p/720p/1080p → R2, added 2026-03) ← non-critical

Operations 4d–4g are non-blocking: if they fail, the pipeline logs a warning and continues. They never contribute to the failures[] array that would cause the pipeline to throw. Operations 4a–4c are critical — failure throws VideoProcessingError.
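The critical/non-critical split can be sketched as follows. This is a minimal illustration of the Promise.allSettled pattern, not the actual processVideo code; the task names and shapes are hypothetical:

```typescript
// Sketch: run all parallel operations, but only let critical ones
// contribute to failures[]; non-critical rejections become warnings.
interface ParallelTask {
  name: string;
  critical: boolean;
  run: () => Promise<unknown>;
}

async function runParallelPhase(
  tasks: ParallelTask[],
): Promise<{ failures: string[]; warnings: string[] }> {
  const settled = await Promise.allSettled(tasks.map((t) => t.run()));
  const failures: string[] = [];
  const warnings: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === "rejected") {
      // Critical failures (4a–4c) propagate; non-critical ones (4d–4g)
      // are logged and the pipeline continues.
      (tasks[i].critical ? failures : warnings).push(tasks[i].name);
    }
  });
  return { failures, warnings };
}
```

A caller would then throw VideoProcessingError only when `failures.length > 0`, and merely log each entry in `warnings`.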

HLS encoding (4g) was added in March 2026. It encodes all three renditions in a single FFmpeg pass (-var_stream_map) and uploads all .ts segments + .m3u8 playlists to Cloudflare R2. Timeout is 180s + durationSeconds × 3s, capped at 15 minutes.
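The documented timeout formula works out as below; the helper name is hypothetical, the constants come straight from the text:

```typescript
// HLS timeout: 180 s base + 3 s per second of source video,
// capped at 15 minutes (900 s).
function computeHlsTimeoutSeconds(durationSeconds: number): number {
  const CAP_SECONDS = 15 * 60; // 15-minute ceiling
  return Math.min(180 + durationSeconds * 3, CAP_SECONDS);
}
```

So a 60-second clip gets a 360 s budget, and any source longer than 240 s hits the 15-minute cap.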


Cloud Run Measurements (2026-02-26)

Note: These measurements were taken before HLS encoding (4g) was added. HLS is non-blocking and runs in parallel, so it should not increase total pipeline latency if it completes before thumbnails (7.7s). Its actual Cloud Run timing has not yet been benchmarked. Also note: blob storage was migrated from Vercel Blob to Cloudflare R2 in March 2026; network I/O timings may differ slightly.

Preview environment — single pipeline run with ebur128 + signalstats

Source: video-processor-preview, revision 00457-gkm, europe-west1.

| Phase      | Operation                            | Start        | End          | Duration |
|------------|--------------------------------------|--------------|--------------|----------|
| Sequential | Download                             | 06:28:12.224 | 06:28:12.654 | 0.4s     |
| Sequential | Metadata (ffprobe)                   | 06:28:12.658 | 06:28:13.905 | 1.2s     |
| Parallel   | ebur128 (audio quality)              | 06:28:13.930 | 06:28:16.831 | 2.9s     |
| Parallel   | Silence detection                    | 06:28:13.921 | 06:28:16.854 | 2.9s     |
| Parallel   | signalstats (video quality)          | 06:28:13.934 | 06:28:17.110 | 3.2s     |
| Parallel   | Face estimation                      | 06:28:13.930 | 06:28:17.482 | 3.6s     |
| Parallel   | Audio extraction (ffmpeg + upload)   | 06:28:13.908 | 06:28:18.305 | 4.4s     |
| Parallel   | Thumbnails (frames + GIF + upload)   | 06:28:13.920 | 06:28:21.619 | 7.7s     |

Total parallel phase: 7.7s — determined entirely by thumbnail GIF generation.
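The Duration column is just end minus start on the wall-clock timestamps above; a small sketch (timestamp format and rounding assumed from the table, not from the pipeline's logging code):

```typescript
// Parse an HH:MM:SS.mmm wall-clock timestamp into milliseconds since midnight.
function toMillis(ts: string): number {
  const [h, m, s] = ts.split(":");
  return (Number(h) * 3600 + Number(m) * 60 + Number(s)) * 1000;
}

// Duration in seconds, rounded to one decimal as in the table.
function durationSeconds(start: string, end: string): number {
  return Math.round((toMillis(end) - toMillis(start)) / 100) / 10;
}
```

For example, the thumbnail row: 06:28:13.920 → 06:28:21.619 gives 7.7s, matching the table.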

ebur128 output for this run

{
  "integratedLufs": -20.7,
  "loudnessRange": 10.1,
  "meanVolumeDb": -20.7,
  "maxVolumeDb": -11,
  "speechPresenceRatio": 0.88,
  "speechLoudnessStddev": 5.99,
  "mFrameCount": 95,
  "speechFrameCount": 84
}
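The reported speechPresenceRatio is consistent with the two frame counts in the same payload: 84 speech frames out of 95 momentary-loudness frames. A sketch of that relationship (the exact rounding in the pipeline is an assumption):

```typescript
// speechPresenceRatio appears to be speechFrameCount / mFrameCount,
// rounded to two decimals: 84 / 95 ≈ 0.88 for the run above.
function speechPresenceRatio(speechFrameCount: number, mFrameCount: number): number {
  return Math.round((speechFrameCount / mFrameCount) * 100) / 100;
}
```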

Bottleneck Analysis

What dominates the parallel phase

Thumbnails (7.7s)  ████████████████████████████████████████  100%
Audio extract      ██████████████████████████                 57%  (ffmpeg 2.2s + upload 2.2s)
Face estimation    █████████████████████                      47%
signalstats        ██████████████████                         42%
ebur128            █████████████████                          38%
Silence            █████████████████                          38%

The thumbnail operation is ~2x slower than the next-slowest operation (audio extraction at 4.4s) because it:

  1. Extracts multiple static frames (5 ffmpeg invocations)
  2. Generates a color palette for the GIF
  3. Encodes the animated GIF
  4. Uploads all results to blob storage
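Steps 2 and 3 describe FFmpeg's standard two-pass GIF encode (palettegen, then paletteuse). A sketch of the argument lists involved; filenames, fps, and scale values here are illustrative, not the pipeline's actual settings:

```typescript
// Pass 1: generate an optimized 256-color palette from the source frames.
function gifPaletteArgs(input: string, palette: string): string[] {
  return ["-i", input, "-vf", "fps=10,scale=320:-1:flags=lanczos,palettegen", "-y", palette];
}

// Pass 2: encode the GIF using that palette via the paletteuse filter.
function gifEncodeArgs(input: string, palette: string, output: string): string[] {
  return [
    "-i", input, "-i", palette,
    "-filter_complex", "fps=10,scale=320:-1:flags=lanczos[x];[x][1:v]paletteuse",
    "-y", output,
  ];
}
```

Each list would be passed to a spawned `ffmpeg` process; the two-pass structure is one reason GIF generation dominates the thumbnail operation's wall time.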

Why ebur128 and signalstats add zero total latency

Both operations are "compute-only" — they read the local file and pipe to /dev/null. No network I/O, no blob uploads. They finish in 2.9s and 3.2s respectively, well before the thumbnail operation completes at 7.7s.

Since all six operations measured here (seven once HLS was added) run under Promise.allSettled, the total time is max(all) = 7.7s. Removing ebur128 and signalstats would still leave the pipeline at 7.7s.
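"Compute-only" here means FFmpeg decodes the local file, runs the analysis filter, and discards the output through the null muxer. A sketch of the two argument lists (filter options are an assumption; the pipeline's exact flags may differ):

```typescript
// ebur128 runs as a filter_complex on the audio stream; output is discarded
// via the null muxer, so there is no encode and no file or network I/O.
function ebur128Args(input: string): string[] {
  return ["-i", input, "-filter_complex", "ebur128", "-f", "null", "-"];
}

// signalstats runs on the video stream; metadata=print dumps the per-frame
// stats to the log, and the null muxer again discards the frames themselves.
function signalstatsArgs(input: string): string[] {
  return ["-i", input, "-vf", "signalstats,metadata=print", "-f", "null", "-"];
}
```

Both commands' results come from parsing FFmpeg's stderr/log output, not from any produced file, which is why neither operation touches blob storage.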

If we ever want to speed up the pipeline

The priority order for optimization would be:

  1. Animated GIF generation (~3.9s of the 7.7s thumbnail time) — consider skipping GIF or generating it asynchronously
  2. Audio extraction upload (~2.2s network I/O) — already unavoidable (blob storage round-trip)
  3. Face estimation (~3.6s) — TensorFlow WASM face-api inference overhead (see 4d); could be replaced with a lighter model

Audio/video quality analysis would be among the last things to optimize.


Local Benchmarks (Apple Silicon, FFmpeg 7.1.1)

For reference, local timing on synthetic test clips:

30-second clip

| Operation                   | Wall time |
|-----------------------------|-----------|
| ebur128                     | 0.19s     |
| signalstats                 | 0.06s     |
| silencedetect               | 0.21s     |
| volumedetect (old, removed) | 0.49s     |
| thumbnail (1 frame)         | 0.10s     |
| ffprobe metadata            | 0.06s     |

90-second clip

| Operation            | Wall time |
|----------------------|-----------|
| ebur128              | 0.99s     |
| signalstats          | 0.08s     |
| silencedetect        | 0.65s     |
| thumbnail (3 frames) | 0.55s     |

Note: local benchmarks are significantly faster than Cloud Run, mostly due to Apple Silicon's stronger single-thread performance; storage also differs (local NVMe vs. Cloud Run's in-memory tmpfs). The relative ordering is what matters.