Last verified: 2026-03-06
Target: apps/video-processor
Companion: none
# Video Processor App Architecture
`apps/video-processor` is an Express.js service deployed on Cloud Run. It handles all FFmpeg-based media operations (video transcoding, audio extraction, thumbnails, silence detection, HLS encoding), AI-based operations (face estimation, AV quality analysis), and Sharp-based image processing (logo transformation, image variants). Vercel apps cannot run FFmpeg, so all such work is proxied here.
## Module Overview
## Primary Data Flow — processVideo Pipeline
The main pipeline, triggered by the `POST /process-video` endpoint.
### Critical vs Non-Critical Operations
| Operation | Critical? | Failure behavior |
|---|---|---|
| Audio extraction | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Thumbnail generation | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Silence detection | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Face estimation | ❌ No | Logged as `warn`, pipeline continues |
| Audio quality (`ebur128`) | ❌ No | Logged as `warn`, pipeline continues |
| Video quality (`signalstats`) | ❌ No | Logged as `warn`, pipeline continues |
| HLS encoding | ❌ No | Logged as `warn`, pipeline continues |
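The split in the table above can be sketched as two small wrappers. The names (`runCritical`, `runNonCritical`) and the shape of the `failures` collector are illustrative assumptions, not the actual pipeline code:

```typescript
// Hypothetical sketch of the critical vs non-critical handling described above.
class VideoProcessingError extends Error {}

// Names of critical steps that failed; assumed to be reported to the caller.
const failures: string[] = [];

// Critical step: record the failure and abort the pipeline.
async function runCritical<T>(name: string, op: () => Promise<T>): Promise<T> {
  try {
    return await op();
  } catch (err) {
    failures.push(name);
    throw new VideoProcessingError(`${name} failed: ${String(err)}`);
  }
}

// Non-critical step: log a warning and let the pipeline continue with null.
async function runNonCritical<T>(
  name: string,
  op: () => Promise<T>,
): Promise<T | null> {
  try {
    return await op();
  } catch (err) {
    console.warn(`${name} failed (non-critical), continuing`, err);
    return null;
  }
}
```

The point of the split is that a missing HLS rendition or face estimate degrades the result, while a missing audio track or thumbnail invalidates it.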
## Integration Map
## API Endpoint Surface
### Video Processing (from `videoProcessingApiContract`)

All video processing routes are registered from `videoProcessingApiContract`, a typed contract in `@repo/video/types` that enforces both the path and the request/response shapes.
| Endpoint | Operation | Notes |
|---|---|---|
| `POST /process-video` | `processVideo` | Main pipeline: transcode + audio + thumbnails + silence + HLS + face + AV quality |
| `POST /process-asset-video` | `processAssetVideo` | Asset video variant generation |
| `POST /detect-silence` | `detectSilence` | Standalone silence detection |
| `POST /get-video-metadata` | `getVideoMetadata` | ffprobe metadata extraction |
| `POST /rotate-video-90-and-enrich` | `rotateVideo90AndEnrich` | 90° rotation + audio/thumbnail enrichment |
| `POST /extract-audio-from-video` | `extractAudioFromVideo` | WAV extraction from URL |
| `POST /generate-thumbnail` | `generateThumbnail` | Single thumbnail from URL |
| `POST /get-video-duration` | `getVideoDuration` | Duration from URL |
### Image Processing (from `imageProcessingApiContract`)
| Endpoint | Operation | Notes |
|---|---|---|
| `POST /process-image` | `processImage` | Multi-format (WebP/JPEG), multi-size (thumbnail/medium/full) Sharp processing |
| `POST /process-logo-to-white` | `processLogoToWhite` | Logo color transformation to white |
| `POST /process-logo-to-white-debug` | `processLogoToWhiteDebug` | Debug variant with intermediate steps |
### Health & Test Endpoints
| Endpoint | Notes |
|---|---|
| `GET /health` | Memory health check, returns version headers |
| `GET /api/info` | Build SHA, ref, time metadata |
| `GET /api/sentry-error` | Trigger test Sentry capture |
| `POST /test/transcode` | Format detection + WebM→MP4 test (non-prod only) |
| `DELETE /test/cleanup-artifacts` | Delete blob URLs by array (non-prod only) |
| `POST /test/r2-smoke` | R2 connectivity cycle test (non-prod only, cloud only) |
## Blob Storage Architecture
Two storage backends are used, chosen based on file type and environment:
| Client | Backend (cloud) | Backend (local) | Prefix | Used for |
|---|---|---|---|---|
| `VideoClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/video` | Thumbnails, transcoded MP4, HLS segments/manifests |
| `AudioClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/audio` | Extracted WAV files |
| `AssetsClient` | Vercel Blob | Vercel Blob | `processed-assets` | Image processing output (always Vercel Blob) |
R2 is configured via six environment variables (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`, `R2_PUBLIC_URL`, `R2_ENDPOINT`). All are required in cloud and default to `""` locally.
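The required-in-cloud, empty-locally rule can be sketched as a small loader. The function name and shape are assumptions, not the real config module:

```typescript
// Illustrative sketch of the R2 env-var handling described above.
const R2_KEYS = [
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
  "R2_PUBLIC_URL",
  "R2_ENDPOINT",
] as const;

function loadR2Config(
  env: Record<string, string | undefined>,
  isCloud: boolean,
): Record<string, string> {
  const config: Record<string, string> = {};
  for (const key of R2_KEYS) {
    const value = env[key] ?? ""; // defaults to "" locally
    if (isCloud && value === "") {
      throw new Error(`Missing required R2 env var in cloud: ${key}`);
    }
    config[key] = value;
  }
  return config;
}
```

Failing fast on missing cloud config keeps misconfiguration errors at startup rather than mid-upload.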
## HLS Multi-Bitrate Encoding
Added in March 2026 as a non-critical parallel step in `processVideo`. Encodes the transcoded (or original) video into three renditions in a single FFmpeg pass using `-var_stream_map`.
| Rendition | Resolution | Video Bitrate | Audio Bitrate | Preset |
|---|---|---|---|---|
| 480p | `-2:480` | 800k | 96k | fast |
| 720p | `-2:720` | 1500k | 128k | fast |
| 1080p | `-2:1080` | 3000k | 128k | fast |
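A single-pass argument list matching the table above might be built like this. The exact flags the service passes may differ; this is a hedged illustration of the `-var_stream_map` approach, not the production command:

```typescript
// Sketch: build FFmpeg args for a three-rendition, single-pass HLS encode.
interface Rendition {
  name: string;
  height: number;
  videoBitrate: string;
  audioBitrate: string;
}

const renditions: Rendition[] = [
  { name: "480p", height: 480, videoBitrate: "800k", audioBitrate: "96k" },
  { name: "720p", height: 720, videoBitrate: "1500k", audioBitrate: "128k" },
  { name: "1080p", height: 1080, videoBitrate: "3000k", audioBitrate: "128k" },
];

function buildHlsArgs(inputPath: string, outDir: string): string[] {
  const args = ["-i", inputPath];
  renditions.forEach((r, i) => {
    args.push(
      "-map", "0:v:0", "-map", "0:a:0",
      `-filter:v:${i}`, `scale=-2:${r.height}`, // -2 keeps the width even
      `-b:v:${i}`, r.videoBitrate,
      `-b:a:${i}`, r.audioBitrate,
    );
  });
  args.push(
    "-preset", "fast",
    "-var_stream_map",
    renditions.map((r, i) => `v:${i},a:${i},name:${r.name}`).join(" "),
    "-master_pl_name", "master.m3u8",
    "-f", "hls",
    `${outDir}/%v/playlist.m3u8`, // %v expands to each variant's name
  );
  return args;
}
```

Mapping the same input into three outputs in one pass avoids decoding the source three times.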
Output structure in R2:

```
videos/{videoFlowId}/{sessionId}/{questionId}/hls/
├── master.m3u8          ← master playlist (hlsManifestUrl)
├── 480p/
│   ├── playlist.m3u8
│   └── seg_000.ts, seg_001.ts...
├── 720p/
│   └── ...
└── 1080p/
    └── ...
```
Segments are uploaded in batches of 10. On partial upload failure, already-uploaded segments are cleaned up before rethrowing.
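The batch-then-rollback behavior can be sketched as follows; `uploadFn` and `deleteFn` are hypothetical injection points, not the real storage client API:

```typescript
// Sketch: upload HLS segments in batches, cleaning up on partial failure.
async function uploadSegmentsInBatches(
  segments: string[],
  uploadFn: (seg: string) => Promise<string>, // returns uploaded URL
  deleteFn: (url: string) => Promise<void>,
  batchSize = 10,
): Promise<string[]> {
  const uploaded: string[] = [];
  try {
    for (let i = 0; i < segments.length; i += batchSize) {
      const batch = segments.slice(i, i + batchSize);
      const urls = await Promise.all(batch.map(uploadFn));
      uploaded.push(...urls);
    }
    return uploaded;
  } catch (err) {
    // Clean up already-uploaded segments before rethrowing, so a failed
    // encode does not leave orphaned segments in the bucket.
    await Promise.allSettled(uploaded.map(deleteFn));
    throw err;
  }
}
```

One caveat of this sketch: segments from the failing batch that did upload before the rejection are not tracked, so cleanup is best-effort within the failed batch.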
## Face Estimation
Non-critical parallel step using the TensorFlow WASM backend + `@vladmandic/face-api` (TinyFaceDetector + AgeGenderNet). Lazy-loaded with a concurrency lock to prevent double initialization.

Input: a single frame extracted at `min(1.0, duration - 0.5)` seconds (target: respondent facing the camera at the start).

Output: `FaceEstimationResult = { age, ageBucket, gender, genderConfidence } | null`

Runtime dependencies in Docker: `libcairo2`, `libpango1.0-0`, `libjpeg62-turbo`, `libgif7`, `librsvg2-2` (all required by the `canvas` npm package). These can be removed if face estimation is dropped.
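Two details above fit in a few lines of code: the frame timestamp formula and the lazy-load lock. Both sketches use illustrative names, not the actual module's API:

```typescript
// Sketch: pick the frame timestamp for face estimation. Long clips sample at
// 1.0s; very short clips back off to 0.5s before the end.
function frameTimestamp(durationSec: number): number {
  return Math.min(1.0, durationSec - 0.5);
}

// Sketch: concurrency lock for lazy model loading. Concurrent callers share
// one in-flight initialization instead of loading the models twice.
let modelPromise: Promise<unknown> | null = null;
async function loadModelsOnce(load: () => Promise<unknown>): Promise<unknown> {
  if (!modelPromise) modelPromise = load();
  return modelPromise;
}
```

Caching the promise (rather than the resolved value) is what makes the lock safe under concurrency: the second caller awaits the same in-flight load.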
## Deployment
- Runtime: Node.js 22 LTS on Cloud Run (europe-west1)
- Docker build: multi-stage (`base → pruner → installer → runner`) using `turbo prune video-processor --docker`
- System packages in runner stage: `ffmpeg`, `libcairo2`, `libpango1.0-0`, `libjpeg62-turbo`, `libgif7`, `librsvg2-2`
- Port: 8080 (Cloud Run default)
- Sentry: enabled in cloud when `SENTRY_DSN` is set; `tracesSampleRate: 0.2`, `sampleRate: 1.0`; includes `nodeProfilingIntegration`
- tsup bundling: `skipNodeModulesBundle: true` (externalizes all `node_modules`); custom banner injects `require = createRequire(import.meta.url)` for face-estimation's `require.resolve()` calls
## Key Design Decisions
### OOM Prevention: File-Based Pipeline
All video operations use shared temp files on `/tmp` (tmpfs, backed by instance memory) rather than Node.js heap buffers. The pipeline downloads the video once to `sharedVideoPath`, and all FFmpeg operations read from that file. This eliminated ~450MB of heap usage on large (400MB+) videos.
```
/tmp/processVideo-{nanoid}/
├── input-{nanoid}.mp4        ← downloaded video
├── transcoded-{nanoid}.mp4   ← WebM→MP4 output (if transcoding needed)
└── hls/                      ← HLS encoding output
    ├── master.m3u8
    ├── 480p/{seg_*.ts, playlist.m3u8}
    ├── 720p/{...}
    └── 1080p/{...}
```
The entire `sharedTempDir` is cleaned up in a `finally` block.
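The lifecycle above, create once, work inside, always clean up, can be sketched with a small wrapper. The function name `withSharedTempDir` is an assumption for illustration:

```typescript
import { mkdtemp, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch: run file-based pipeline work inside a shared temp dir that is
// always removed in `finally`, even when the work throws.
async function withSharedTempDir<T>(
  work: (dir: string) => Promise<T>,
): Promise<T> {
  const dir = await mkdtemp(join(tmpdir(), "processVideo-"));
  try {
    return await work(dir);
  } finally {
    // Remove the whole tree (inputs, transcode output, HLS segments).
    await rm(dir, { recursive: true, force: true });
  }
}
```

Scoping cleanup this way means no individual step needs to track which temp files it created.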
### Route Type Safety
All endpoints derive from `videoProcessingApiContract` / `imageProcessingApiContract`, typed contracts that define `{ path, args: ZodSchema, result: ZodSchema }`. A `RequiredRouteRegistry` type ensures TypeScript errors if any contract endpoint is missing an implementation.
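The mechanism can be sketched in a few lines. The real contracts use zod schemas; plain parse functions stand in here, and the two endpoints shown are just an illustrative subset:

```typescript
// Sketch of the contract-driven route registry described above.
const videoProcessingApiContract = {
  detectSilence: { path: "/detect-silence", parseArgs: (v: unknown) => v },
  getVideoDuration: { path: "/get-video-duration", parseArgs: (v: unknown) => v },
};

// Mapped type: every operation in the contract must have a handler, or the
// registry assignment below fails to compile.
type RequiredRouteRegistry<C> = {
  [K in keyof C]: (args: unknown) => Promise<unknown>;
};

const registry: RequiredRouteRegistry<typeof videoProcessingApiContract> = {
  detectSilence: async () => ({ silences: [] }),
  getVideoDuration: async () => ({ durationSec: 0 }),
  // Omitting either handler here would be a TypeScript error.
};
```

Because the check is a mapped type over `keyof` the contract, adding a new endpoint to the contract immediately breaks the build until a handler is written.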
## Related Docs
- `video-transcoding-architecture.md` — WebM→MP4 transcoding design and FFmpeg settings
- `video-processing-pipeline-performance.md` — Cloud Run timing benchmarks, bottleneck analysis
- `av-quality-classification-thresholds.md` — EBU R128 audio quality classification thresholds
- `internal-packages-and-docker.md` — Docker build pattern and tsup bundling
- `error-logging-and-sentry.md` — Sentry integration pattern