Last verified: 2026-03-06
Target: apps/video-processor
Video Processing Pipeline — Performance & Bottleneck Analysis
Date: 2026-02-26
How the processVideo pipeline performs on Cloud Run, which operations are bottlenecks, and why the EBU R128 audio quality and signalstats video quality operations add zero latency to the total pipeline.
Related: video-transcoding-architecture.md — WebM-to-MP4 transcoding details.
Pipeline Structure
processVideo (apps/video-processor/src/operations/processVideo.ts) processes each respondent video in three sequential phases:
SEQUENTIAL
1. Download video to disk (stream from blob storage → /tmp)
2. Extract metadata via ffprobe
3. Transcode WebM → MP4 (if needed — i.e., the source is a WebM container with VP8/VP9 video)
PARALLEL (Promise.allSettled — wall time = slowest operation)
4a. Extract audio (ffmpeg transcode + blob upload) ← CRITICAL
4b. Generate thumbnails (frame extracts + animated GIF + blob uploads) ← CRITICAL
4c. Detect silence (ffmpeg silencedetect filter) ← CRITICAL
4d. Estimate face (frame extract + TensorFlow WASM face-api) ← non-critical
4e. Detect audio quality — ebur128 (ffmpeg filter → /dev/null) ← non-critical
4f. Detect video quality — signalstats (ffmpeg filter → /dev/null) ← non-critical
4g. HLS multi-bitrate encoding (480p/720p/1080p → R2, added 2026-03) ← non-critical
Operations 4d–4g are non-blocking: if they fail, the pipeline logs a warning and continues. They never contribute to the failures[] array that would cause the pipeline to throw. Operations 4a–4c are critical — failure throws VideoProcessingError.
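The critical/non-critical split above can be sketched as follows. This is a minimal illustration, not the actual processVideo.ts wiring; the names (runParallelPhase, op.critical) are hypothetical:

```typescript
// Minimal sketch of the parallel phase: all ops run via Promise.allSettled,
// but only failures of ops marked `critical` (4a-4c) make the pipeline throw.
interface ParallelOp {
  name: string;
  critical: boolean;
  run: () => Promise<unknown>;
}

async function runParallelPhase(ops: ParallelOp[]): Promise<void> {
  const settled = await Promise.allSettled(ops.map((op) => op.run()));
  const failures: string[] = [];
  settled.forEach((result, i) => {
    if (result.status === "rejected") {
      if (ops[i].critical) {
        // 4a-4c: collected; the pipeline throws after the phase completes
        failures.push(ops[i].name);
      } else {
        // 4d-4g: log a warning and continue
        console.warn(`non-critical op failed: ${ops[i].name}`);
      }
    }
  });
  if (failures.length > 0) {
    throw new Error(`VideoProcessingError: ${failures.join(", ")}`);
  }
}
```

Note that Promise.allSettled (unlike Promise.all) never short-circuits, so a fast-failing non-critical op cannot abort the slower critical ones.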
HLS encoding (4g) was added in March 2026. It encodes all three renditions in a single FFmpeg pass (-var_stream_map) and uploads all .ts segments + .m3u8 playlists to Cloudflare R2. Timeout is 180s + durationSeconds × 3s, capped at 15 minutes.
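The timeout rule works out as follows (the helper name hlsTimeoutMs is illustrative, not from the codebase):

```typescript
// HLS encode timeout per the rule above: 180s base + 3s per second of
// source video, capped at 15 minutes. Clips of 240s or longer hit the cap.
function hlsTimeoutMs(durationSeconds: number): number {
  const uncappedMs = (180 + durationSeconds * 3) * 1000;
  return Math.min(uncappedMs, 15 * 60 * 1000);
}
```

For example, a 60s clip gets a 360s timeout; a 10-minute clip is capped at 900s.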
Cloud Run Measurements (2026-02-26)
Note: These measurements were taken before HLS encoding (4g) was added. HLS is non-blocking and runs in parallel, so it should not increase total pipeline latency if it completes before thumbnails (7.7s). Its actual Cloud Run timing has not yet been benchmarked. Also note: blob storage was migrated from Vercel Blob to Cloudflare R2 in March 2026; network I/O timings may differ slightly.
Preview environment — single pipeline run with ebur128 + signalstats
Source: video-processor-preview, revision 00457-gkm, europe-west1.
| Phase | Operation | Start | End | Duration |
|---|---|---|---|---|
| Sequential | Download | 06:28:12.224 | 06:28:12.654 | 0.4s |
| Sequential | Metadata (ffprobe) | 06:28:12.658 | 06:28:13.905 | 1.2s |
| Parallel | ebur128 (audio quality) | 06:28:13.930 | 06:28:16.831 | 2.9s |
| Parallel | Silence detection | 06:28:13.921 | 06:28:16.854 | 2.9s |
| Parallel | signalstats (video quality) | 06:28:13.934 | 06:28:17.110 | 3.2s |
| Parallel | Face estimation | 06:28:13.930 | 06:28:17.482 | 3.6s |
| Parallel | Audio extraction (ffmpeg + upload) | 06:28:13.908 | 06:28:18.305 | 4.4s |
| Parallel | Thumbnails (frames + GIF + upload) | 06:28:13.920 | 06:28:21.619 | 7.7s |
Total parallel phase: 7.7s — determined entirely by thumbnail GIF generation.
ebur128 output for this run
{
"integratedLufs": -20.7,
"loudnessRange": 10.1,
"meanVolumeDb": -20.7,
"maxVolumeDb": -11,
"speechPresenceRatio": 0.88,
"speechLoudnessStddev": 5.99,
"mFrameCount": 95,
"speechFrameCount": 84
}
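The metrics above come from ffmpeg's ebur128 summary, which is printed to stderr. A hedged sketch of extracting the integrated loudness (I) and loudness range (LRA) lines — the exact summary layout is an assumption based on FFmpeg's usual output, and real parsing should be defensive:

```typescript
// Pull I (integrated loudness, LUFS) and LRA (loudness range, LU) out of
// ffmpeg's ebur128 stderr summary. Line format is an assumption; values
// missing from the output come back as undefined.
function parseEbur128Summary(stderr: string): {
  integratedLufs?: number;
  loudnessRange?: number;
} {
  const integrated = stderr.match(/I:\s*(-?\d+(?:\.\d+)?)\s*LUFS/);
  // \b after LU prevents a false match against "LUFS"
  const range = stderr.match(/LRA:\s*(-?\d+(?:\.\d+)?)\s*LU\b/);
  return {
    integratedLufs: integrated ? Number(integrated[1]) : undefined,
    loudnessRange: range ? Number(range[1]) : undefined,
  };
}
```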
Bottleneck Analysis
What dominates the parallel phase
Thumbnails (7.7s) ████████████████████████████████████████ 100%
Audio extract ██████████████████████████ 57% (ffmpeg 2.2s + upload 2.2s)
Face estimation █████████████████████ 47%
signalstats ██████████████████ 42%
ebur128 █████████████████ 38%
Silence █████████████████ 38%
The thumbnail operation is nearly 2x slower than the next-slowest operation (7.7s vs 4.4s) because it:
- Extracts multiple static frames (5 ffmpeg invocations)
- Generates a color palette for the GIF
- Encodes the animated GIF
- Uploads all results to blob storage
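The palette + GIF steps typically map to a two-pass ffmpeg invocation. A sketch of the argument lists, assuming illustrative fps/scale values — the actual invocation in the repo may differ:

```typescript
// Hypothetical two-pass animated-GIF encode: pass 1 (palettegen) builds a
// 256-color palette from the source, pass 2 (paletteuse) applies it.
// fps=5 and scale=320 are illustrative, not the repo's real settings.
function gifPassArgs(input: string, palette: string, output: string): string[][] {
  const filters = "fps=5,scale=320:-1:flags=lanczos";
  return [
    // pass 1: generate the palette
    ["-i", input, "-vf", `${filters},palettegen`, "-y", palette],
    // pass 2: encode the GIF using that palette
    [
      "-i", input, "-i", palette,
      "-filter_complex", `${filters}[x];[x][1:v]paletteuse`,
      "-y", output,
    ],
  ];
}
```

The two-pass approach is what makes GIF generation expensive: the source frames are decoded and filtered twice.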
Why ebur128 and signalstats add zero total latency
Both operations are "compute-only" — they read the local file and pipe to /dev/null. No network I/O, no blob uploads. They finish in 2.9s and 3.2s respectively, well before the thumbnail operation completes at 7.7s.
Since all 6 operations run in Promise.allSettled, the total time is max(all) = 7.7s. Removing ebur128 and signalstats would still leave the pipeline at 7.7s.
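A worked check of that claim, using the measured durations from the table above (seconds):

```typescript
// Wall time of a Promise.allSettled phase is the max of its operations.
const durations: Record<string, number> = {
  audioExtract: 4.4,
  thumbnails: 7.7,
  silence: 2.9,
  face: 3.6,
  ebur128: 2.9,
  signalstats: 3.2,
};

const wallTime = Math.max(...Object.values(durations));

// Drop the two quality ops and recompute: the max is unchanged.
const withoutQuality = Math.max(
  ...Object.entries(durations)
    .filter(([name]) => name !== "ebur128" && name !== "signalstats")
    .map(([, seconds]) => seconds),
);
// wallTime === 7.7 and withoutQuality === 7.7: removing them saves nothing.
```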
If we ever want to speed up the pipeline
The priority order for optimization would be:
- Animated GIF generation (~3.9s of the 7.7s thumbnail time) — consider skipping GIF or generating it asynchronously
- Audio extraction upload (~2.2s network I/O) — largely unavoidable (blob storage round-trip), so little headroom here
- Face estimation (~3.6s) — TensorFlow WASM model-load and inference overhead; could be replaced with a lighter model
Audio/video quality analysis would be among the last things to optimize.
Local Benchmarks (Apple Silicon, FFmpeg 7.1.1)
For reference, local timing on synthetic test clips:
30-second clip
| Operation | Wall time |
|---|---|
| ebur128 | 0.19s |
| signalstats | 0.06s |
| silencedetect | 0.21s |
| volumedetect (old, removed) | 0.49s |
| thumbnail (1 frame) | 0.10s |
| ffprobe metadata | 0.06s |
90-second clip
| Operation | Wall time |
|---|---|
| ebur128 | 0.99s |
| signalstats | 0.08s |
| silencedetect | 0.65s |
| thumbnail (3 frames) | 0.55s |
Note: local benchmarks are significantly faster than Cloud Run, mostly reflecting Apple Silicon's single-thread CPU performance (these filters are compute-bound, so storage matters little). The relative ordering is what matters.