Last verified: 2026-03-06
Target: apps/video-processor
Companion: none
# Video Processor App Architecture
`apps/video-processor` is an Express.js service deployed on Cloud Run. It handles all FFmpeg-based media operations (video transcoding, audio extraction, thumbnails, silence detection, HLS encoding), AI-based operations (face estimation, AV quality analysis), and Sharp-based image processing (logo transformation, image variants). Vercel apps cannot run FFmpeg, so all such work is proxied here.
## Module Overview
## Primary Data Flow — processVideo Pipeline
The main pipeline, triggered by the `POST /process-video` endpoint.
### Critical vs Non-Critical Operations
| Operation | Critical? | Failure behavior |
|---|---|---|
| Audio extraction | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Thumbnail generation | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Silence detection | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Face estimation | ❌ No | Logged as `warn`, pipeline continues |
| Audio quality (`ebur128`) | ❌ No | Logged as `warn`, pipeline continues |
| Video quality (`signalstats`) | ❌ No | Logged as `warn`, pipeline continues |
| HLS encoding | ❌ No | Logged as `warn`, pipeline continues |
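The split in the table above can be sketched as two small wrappers. The names (`runCritical`, `runNonCritical`) and the shape of the `failures` collector are illustrative assumptions, not the actual pipeline code:

```typescript
// Hypothetical sketch of the critical vs non-critical handling described above.
class VideoProcessingError extends Error {}

// Names of critical steps that failed; assumed to be reported to the caller.
const failures: string[] = [];

// Critical step: record the failure and abort the pipeline.
async function runCritical<T>(name: string, op: () => Promise<T>): Promise<T> {
  try {
    return await op();
  } catch (err) {
    failures.push(name);
    throw new VideoProcessingError(`${name} failed: ${String(err)}`);
  }
}

// Non-critical step: log a warning and let the pipeline continue with null.
async function runNonCritical<T>(
  name: string,
  op: () => Promise<T>,
): Promise<T | null> {
  try {
    return await op();
  } catch (err) {
    console.warn(`${name} failed (non-critical), continuing`, err);
    return null;
  }
}
```

The point of the split is that a missing HLS rendition or face estimate degrades the result, while a missing audio track or thumbnail invalidates it.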
## Integration Map
## API Endpoint Surface
### Video Processing (from `videoProcessingApiContract`)

All video processing routes are registered from `videoProcessingApiContract`, a typed contract in `@repo/video/types` that enforces both the path and the request/response shapes.
| Endpoint | Operation | Notes |
|---|---|---|
| `POST /process-video` | `processVideo` | Main pipeline: transcode + audio + thumbnails + silence + HLS + face + AV quality |
| `POST /process-asset-video` | `processAssetVideo` | Asset video variant generation |
| `POST /detect-silence` | `detectSilence` | Standalone silence detection |
| `POST /get-video-metadata` | `getVideoMetadata` | ffprobe metadata extraction |
| `POST /rotate-video-90-and-enrich` | `rotateVideo90AndEnrich` | 90° rotation + audio/thumbnail enrichment |
| `POST /extract-audio-from-video` | `extractAudioFromVideo` | WAV extraction from URL |
| `POST /generate-thumbnail` | `generateThumbnail` | Single thumbnail from URL |
| `POST /get-video-duration` | `getVideoDuration` | Duration from URL |
### Image Processing (from `imageProcessingApiContract`)
| Endpoint | Operation | Notes |
|---|---|---|
| `POST /process-image` | `processImage` | Multi-format (WebP/JPEG), multi-size (thumbnail/medium/full) Sharp processing |
| `POST /process-logo-to-white` | `processLogoToWhite` | Logo color transformation to white |
| `POST /process-logo-to-white-debug` | `processLogoToWhiteDebug` | Debug variant with intermediate steps |
### Health & Test Endpoints
| Endpoint | Notes |
|---|---|
| `GET /health` | Memory health check, returns version headers |
| `GET /api/info` | Build SHA, ref, time metadata |
| `GET /api/sentry-error` | Trigger test Sentry capture |
| `POST /test/transcode` | Format detection + WebM→MP4 test (non-prod only) |
| `DELETE /test/cleanup-artifacts` | Delete blob URLs by array (non-prod only) |
| `POST /test/r2-smoke` | R2 connectivity cycle test (non-prod only, cloud only) |
## Blob Storage Architecture
Two storage backends are used, chosen based on file type and environment:
| Client | Backend (cloud) | Backend (local) | Prefix | Used for |
|---|---|---|---|---|
| `VideoClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/video` | Thumbnails, transcoded MP4, HLS segments/manifests |
| `AudioClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/audio` | Extracted WAV files |
| `AssetsClient` | Vercel Blob | Vercel Blob | `processed-assets` | Image processing output (always Vercel Blob) |
R2 is configured via six environment variables (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`, `R2_PUBLIC_URL`, `R2_ENDPOINT`). All are required in cloud and default to `""` locally.
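The required-in-cloud, empty-locally rule can be sketched as a small loader. The function name and shape are assumptions, not the real config module:

```typescript
// Illustrative sketch of the R2 env-var handling described above.
const R2_KEYS = [
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
  "R2_PUBLIC_URL",
  "R2_ENDPOINT",
] as const;

function loadR2Config(
  env: Record<string, string | undefined>,
  isCloud: boolean,
): Record<string, string> {
  const config: Record<string, string> = {};
  for (const key of R2_KEYS) {
    const value = env[key] ?? ""; // defaults to "" locally
    if (isCloud && value === "") {
      throw new Error(`Missing required R2 env var in cloud: ${key}`);
    }
    config[key] = value;
  }
  return config;
}
```

Failing fast on missing cloud config keeps misconfiguration errors at startup rather than mid-upload.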
## HLS Multi-Bitrate Encoding
Added in March 2026 as a non-critical parallel step in `processVideo`. Encodes the transcoded (or original) video into three renditions in a single FFmpeg pass using `-var_stream_map`.
| Rendition | Resolution | Video Bitrate | Audio Bitrate | Preset |
|---|---|---|---|---|
| 480p | `-2:480` | 800k | 96k | fast |
| 720p | `-2:720` | 1500k | 128k | fast |
| 1080p | `-2:1080` | 3000k | 128k | fast |
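A single-pass argument list matching the table above might be built like this. The exact flags the service passes may differ; this is a hedged illustration of the `-var_stream_map` approach, not the production command:

```typescript
// Sketch: build FFmpeg args for a three-rendition, single-pass HLS encode.
interface Rendition {
  name: string;
  height: number;
  videoBitrate: string;
  audioBitrate: string;
}

const renditions: Rendition[] = [
  { name: "480p", height: 480, videoBitrate: "800k", audioBitrate: "96k" },
  { name: "720p", height: 720, videoBitrate: "1500k", audioBitrate: "128k" },
  { name: "1080p", height: 1080, videoBitrate: "3000k", audioBitrate: "128k" },
];

function buildHlsArgs(inputPath: string, outDir: string): string[] {
  const args = ["-i", inputPath];
  renditions.forEach((r, i) => {
    args.push(
      "-map", "0:v:0", "-map", "0:a:0",
      `-filter:v:${i}`, `scale=-2:${r.height}`, // -2 keeps the width even
      `-b:v:${i}`, r.videoBitrate,
      `-b:a:${i}`, r.audioBitrate,
    );
  });
  args.push(
    "-preset", "fast",
    "-var_stream_map",
    renditions.map((r, i) => `v:${i},a:${i},name:${r.name}`).join(" "),
    "-master_pl_name", "master.m3u8",
    "-f", "hls",
    `${outDir}/%v/playlist.m3u8`, // %v expands to each variant's name
  );
  return args;
}
```

Mapping the same input into three outputs in one pass avoids decoding the source three times.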
Output structure in R2:

```
videos/{videoFlowId}/{sessionId}/{questionId}/hls/
├── master.m3u8          ← master playlist (hlsManifestUrl)
├── 480p/
│   ├── playlist.m3u8
│   └── seg_000.ts, seg_001.ts...
├── 720p/
│   └── ...
└── 1080p/
    └── ...
```
Segments are uploaded in batches of 10. On partial upload failure, already-uploaded segments are cleaned up before rethrowing.
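The batch-then-rollback behavior can be sketched as follows; `uploadFn` and `deleteFn` are hypothetical injection points, not the real storage client API:

```typescript
// Sketch: upload HLS segments in batches, cleaning up on partial failure.
async function uploadSegmentsInBatches(
  segments: string[],
  uploadFn: (seg: string) => Promise<string>, // returns uploaded URL
  deleteFn: (url: string) => Promise<void>,
  batchSize = 10,
): Promise<string[]> {
  const uploaded: string[] = [];
  try {
    for (let i = 0; i < segments.length; i += batchSize) {
      const batch = segments.slice(i, i + batchSize);
      const urls = await Promise.all(batch.map(uploadFn));
      uploaded.push(...urls);
    }
    return uploaded;
  } catch (err) {
    // Clean up already-uploaded segments before rethrowing, so a failed
    // encode does not leave orphaned segments in the bucket.
    await Promise.allSettled(uploaded.map(deleteFn));
    throw err;
  }
}
```

One caveat of this sketch: segments from the failing batch that did upload before the rejection are not tracked, so cleanup is best-effort within the failed batch.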
## Face Estimation
Non-critical parallel step using the TensorFlow WASM backend + `@vladmandic/face-api` (TinyFaceDetector + AgeGenderNet). Lazy-loaded with a concurrency lock to prevent double initialization.

Input: a single frame extracted at `min(1.0, duration - 0.5)` seconds (target: respondent facing the camera at the start).

Output: `FaceEstimationResult = { age, ageBucket, gender, genderConfidence } | null`

Runtime dependencies in Docker: `libcairo2`, `libpango1.0-0`, `libjpeg62-turbo`, `libgif7`, `librsvg2-2` (all required by the `canvas` npm package). These can be removed if face estimation is dropped.
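Two details above fit in a few lines of code: the frame timestamp formula and the lazy-load lock. Both sketches use illustrative names, not the actual module's API:

```typescript
// Sketch: pick the frame timestamp for face estimation. Long clips sample at
// 1.0s; very short clips back off to 0.5s before the end.
function frameTimestamp(durationSec: number): number {
  return Math.min(1.0, durationSec - 0.5);
}

// Sketch: concurrency lock for lazy model loading. Concurrent callers share
// one in-flight initialization instead of loading the models twice.
let modelPromise: Promise<unknown> | null = null;
async function loadModelsOnce(load: () => Promise<unknown>): Promise<unknown> {
  if (!modelPromise) modelPromise = load();
  return modelPromise;
}
```

Caching the promise (rather than the resolved value) is what makes the lock safe under concurrency: the second caller awaits the same in-flight load.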
## Deployment
- Runtime: Node.js 22 LTS on Cloud Run (europe-west1)
- Docker build: multi-stage (`base → pruner → installer → runner`) using `turbo prune video-processor --docker`
- System packages in runner stage: `ffmpeg`, `libcairo2`, `libpango1.0-0`, `libjpeg62-turbo`, `libgif7`, `librsvg2-2`
- Port: 8080 (Cloud Run default)
- Sentry: enabled in cloud when `SENTRY_DSN` is set; `tracesSampleRate: 0.2`, `sampleRate: 1.0`; includes `nodeProfilingIntegration`
- tsup bundling: `skipNodeModulesBundle: true` (externalizes all `node_modules`); custom banner injects `require = createRequire(import.meta.url)` for face-estimation's `require.resolve()` calls
## Key Design Decisions
### OOM Prevention: File-Based Pipeline
All video operations use shared temp files on `/tmp` (tmpfs, backed by instance memory) rather than Node.js heap buffers. The pipeline downloads the video once to `sharedVideoPath`, and all FFmpeg operations read from that file. This eliminated ~450MB of heap usage on large (400MB+) videos.
```
/tmp/processVideo-{nanoid}/
├── input-{nanoid}.mp4        ← downloaded video
├── transcoded-{nanoid}.mp4   ← WebM→MP4 output (if transcoding needed)
└── hls/                      ← HLS encoding output
    ├── master.m3u8
    ├── 480p/{seg_*.ts, playlist.m3u8}
    ├── 720p/{...}
    └── 1080p/{...}
```
The entire `sharedTempDir` is cleaned up in a `finally` block.
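The lifecycle above, create once, work inside, always clean up, can be sketched with a small wrapper. The function name `withSharedTempDir` is an assumption for illustration:

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch: run file-based pipeline work inside a shared temp dir that is
// always removed in `finally`, even when the work throws.
async function withSharedTempDir<T>(
  work: (dir: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "processVideo-"));
  try {
    return await work(dir);
  } finally {
    // Remove the whole tree (inputs, transcode output, HLS segments).
    await rm(dir, { recursive: true, force: true });
  }
}
```

Scoping cleanup this way means no individual step needs to track which temp files it created.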
### Route Type Safety
All endpoints derive from `videoProcessingApiContract` / `imageProcessingApiContract`, typed contracts that define `{ path, args: ZodSchema, result: ZodSchema }`. A `RequiredRouteRegistry` type ensures TypeScript errors if any contract endpoint is missing an implementation.
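The mechanism can be sketched in a few lines. The real contracts use zod schemas; plain parse functions stand in here, and the two endpoints shown are just an illustrative subset:

```typescript
// Sketch of the contract-driven route registry described above.
const videoProcessingApiContract = {
  detectSilence: { path: "/detect-silence", parseArgs: (v: unknown) => v },
  getVideoDuration: { path: "/get-video-duration", parseArgs: (v: unknown) => v },
};

// Mapped type: every operation in the contract must have a handler, or the
// registry assignment below fails to compile.
type RequiredRouteRegistry<C> = {
  [K in keyof C]: (args: unknown) => Promise<unknown>;
};

const registry: RequiredRouteRegistry<typeof videoProcessingApiContract> = {
  detectSilence: async () => ({ silences: [] }),
  getVideoDuration: async () => ({ durationSec: 0 }),
  // Omitting either handler here would be a TypeScript error.
};
```

Because the check is a mapped type over `keyof` the contract, adding a new endpoint to the contract immediately breaks the build until a handler is written.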
## Related Docs
- `video-transcoding-architecture.md` — WebM→MP4 transcoding design and FFmpeg settings
- `video-processing-pipeline-performance.md` — Cloud Run timing benchmarks, bottleneck analysis
- `av-quality-classification-thresholds.md` — EBU R128 audio quality classification thresholds
- `internal-packages-and-docker.md` — Docker build pattern and tsup bundling
- `error-logging-and-sentry.md` — Sentry integration pattern