
docs/architecture/video-processor-app-architecture.md

Last verified: 2026-03-06 · Target: `apps/video-processor` · Companion: none

# Video Processor App Architecture

apps/video-processor is an Express.js service deployed on Cloud Run. It handles all FFmpeg-based media operations (video transcoding, audio extraction, thumbnails, silence detection, HLS encoding), AI-based operations (face estimation, AV quality analysis), and Sharp-based image processing (logo transformation, image variants). Vercel apps cannot run FFmpeg — all such work is proxied here.


## Module Overview


## Primary Data Flow — `processVideo` Pipeline

The main pipeline is triggered by the `POST /process-video` endpoint.

### Critical vs Non-Critical Operations

| Operation | Critical? | Failure behavior |
| --- | --- | --- |
| Audio extraction | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Thumbnail generation | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Silence detection | ✅ Yes | Adds to `failures[]`, throws `VideoProcessingError` |
| Face estimation | ❌ No | Logged as `warn`, pipeline continues |
| Audio quality (`ebur128`) | ❌ No | Logged as `warn`, pipeline continues |
| Video quality (`signalstats`) | ❌ No | Logged as `warn`, pipeline continues |
| HLS encoding | ❌ No | Logged as `warn`, pipeline continues |
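The critical/non-critical split can be sketched roughly as follows. This is a simplified, sequential sketch (the real pipeline runs several steps in parallel), and the names `runSteps`, `Step`, and the error shape are illustrative, not the actual implementation:

```typescript
// Illustrative sketch: critical failures are collected and thrown,
// non-critical failures are logged and skipped.
class VideoProcessingError extends Error {
  constructor(public readonly failures: string[]) {
    super(`Video processing failed: ${failures.join(", ")}`);
  }
}

type Step = { name: string; critical: boolean; run: () => Promise<unknown> };

async function runSteps(steps: Step[]): Promise<void> {
  const failures: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch (error) {
      if (step.critical) {
        failures.push(step.name); // collected, then thrown below
      } else {
        console.warn(`${step.name} failed, continuing`, error);
      }
    }
  }
  if (failures.length > 0) throw new VideoProcessingError(failures);
}
```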

## Integration Map


## API Endpoint Surface

### Video Processing (from `videoProcessingApiContract`)

All video processing routes are registered from the `videoProcessingApiContract` — a typed contract in `@repo/video/types` that enforces both the paths and the request/response shapes.

| Endpoint | Operation | Notes |
| --- | --- | --- |
| `POST /process-video` | `processVideo` | Main pipeline: transcode + audio + thumbnails + silence + HLS + face + AV quality |
| `POST /process-asset-video` | `processAssetVideo` | Asset video variant generation |
| `POST /detect-silence` | `detectSilence` | Standalone silence detection |
| `POST /get-video-metadata` | `getVideoMetadata` | ffprobe metadata extraction |
| `POST /rotate-video-90-and-enrich` | `rotateVideo90AndEnrich` | 90° rotation + audio/thumbnail enrichment |
| `POST /extract-audio-from-video` | `extractAudioFromVideo` | WAV extraction from URL |
| `POST /generate-thumbnail` | `generateThumbnail` | Single thumbnail from URL |
| `POST /get-video-duration` | `getVideoDuration` | Duration from URL |
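A contract entry can be pictured roughly like this. This is a dependency-free simplification: the real contract in `@repo/video/types` stores Zod schemas for `args` and `result`, and `ContractEndpoint` is a hypothetical name:

```typescript
// Hypothetical simplification of one videoProcessingApiContract entry.
// Plain parse functions stand in for the real Zod schemas.
interface ContractEndpoint<Args, Result> {
  path: string;
  args: (input: unknown) => Args;      // stands in for args: ZodSchema
  result: (output: unknown) => Result; // stands in for result: ZodSchema
}

const detectSilence: ContractEndpoint<
  { videoUrl: string },
  { silences: Array<{ start: number; end: number }> }
> = {
  path: "/detect-silence",
  args: (input) => input as { videoUrl: string },
  result: (output) => output as { silences: Array<{ start: number; end: number }> },
};
```

Because the route registration reads `path`, `args`, and `result` from the same object, the URL and the validated shapes cannot drift apart.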

### Image Processing (from `imageProcessingApiContract`)

| Endpoint | Operation | Notes |
| --- | --- | --- |
| `POST /process-image` | `processImage` | Multi-format (WebP/JPEG), multi-size (thumbnail/medium/full) Sharp processing |
| `POST /process-logo-to-white` | `processLogoToWhite` | Logo color transformation to white |
| `POST /process-logo-to-white-debug` | `processLogoToWhiteDebug` | Debug variant with intermediate steps |

### Health & Test Endpoints

| Endpoint | Notes |
| --- | --- |
| `GET /health` | Memory health check, returns version headers |
| `GET /api/info` | Build SHA, ref, and build-time metadata |
| `GET /api/sentry-error` | Triggers a test Sentry capture |
| `POST /test/transcode` | Format detection + WebM→MP4 test (non-prod only) |
| `DELETE /test/cleanup-artifacts` | Deletes blob URLs given as an array (non-prod only) |
| `POST /test/r2-smoke` | R2 connectivity cycle test (non-prod only, cloud only) |

## Blob Storage Architecture

Two storage backends are used, chosen based on file type and environment:

| Client | Backend (cloud) | Backend (local) | Prefix | Used for |
| --- | --- | --- | --- | --- |
| `VideoClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/video` | Thumbnails, transcoded MP4, HLS segments/manifests |
| `AudioClient` | Cloudflare R2 | Vercel Blob (local emulation) | `video-processor/audio` | Extracted WAV files |
| `AssetsClient` | Vercel Blob | Vercel Blob | `processed-assets` | Image processing output (always Vercel Blob) |

R2 is configured via six environment variables (`R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`, `R2_PUBLIC_URL`, `R2_ENDPOINT`). All are required in cloud and default to `""` locally.
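That required-in-cloud, empty-locally rule can be sketched as a small loader. The variable names come from the list above; `loadR2Config` itself is a hypothetical helper, not the service's actual config code:

```typescript
// Sketch: all six R2 vars must be set in cloud; locally they default to "".
const R2_VARS = [
  "R2_ACCOUNT_ID",
  "R2_ACCESS_KEY_ID",
  "R2_SECRET_ACCESS_KEY",
  "R2_BUCKET_NAME",
  "R2_PUBLIC_URL",
  "R2_ENDPOINT",
] as const;

function loadR2Config(env: Record<string, string | undefined>, isCloud: boolean) {
  const config = {} as Record<(typeof R2_VARS)[number], string>;
  for (const name of R2_VARS) {
    const value = env[name] ?? "";
    if (isCloud && value === "") {
      throw new Error(`Missing required R2 env var in cloud: ${name}`);
    }
    config[name] = value;
  }
  return config;
}
```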


## HLS Multi-Bitrate Encoding

Added in March 2026 as a non-critical parallel step in `processVideo`. Encodes the transcoded (or original) video into three renditions in a single FFmpeg pass using `-var_stream_map`.

| Rendition | Resolution | Video bitrate | Audio bitrate | Preset |
| --- | --- | --- | --- | --- |
| 480p | `-2:480` | 800k | 96k | fast |
| 720p | `-2:720` | 1500k | 128k | fast |
| 1080p | `-2:1080` | 3000k | 128k | fast |
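Assembling the single-pass invocation might look roughly like this. The rendition values come from the table above, but `buildHlsArgs` and the exact flag layout are assumptions about a typical `-var_stream_map` setup, not the service's actual command:

```typescript
// Sketch of a three-rendition, single-pass HLS argument list.
const RENDITIONS = [
  { name: "480p", scale: "-2:480", video: "800k", audio: "96k" },
  { name: "720p", scale: "-2:720", video: "1500k", audio: "128k" },
  { name: "1080p", scale: "-2:1080", video: "3000k", audio: "128k" },
] as const;

function buildHlsArgs(input: string, outDir: string): string[] {
  const args: string[] = ["-i", input];
  RENDITIONS.forEach((r, i) => {
    args.push(
      "-map", "0:v:0", "-map", "0:a:0",     // one video+audio pair per rendition
      `-filter:v:${i}`, `scale=${r.scale}`, // -2 keeps aspect ratio with even width
      `-b:v:${i}`, r.video,
      `-b:a:${i}`, r.audio,
    );
  });
  args.push(
    "-preset", "fast",
    "-f", "hls",
    "-var_stream_map", RENDITIONS.map((r, i) => `v:${i},a:${i},name:${r.name}`).join(" "),
    "-master_pl_name", "master.m3u8",
    "-hls_segment_filename", `${outDir}/%v/seg_%03d.ts`,
    `${outDir}/%v/playlist.m3u8`,
  );
  return args;
}
```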

Output structure in R2:

```
videos/{videoFlowId}/{sessionId}/{questionId}/hls/
├── master.m3u8                  ← master playlist (hlsManifestUrl)
├── 480p/
│   ├── playlist.m3u8
│   └── seg_000.ts, seg_001.ts...
├── 720p/
│   └── ...
└── 1080p/
    └── ...
```

Segments are uploaded in batches of 10. On partial upload failure, already-uploaded segments are cleaned up before rethrowing.
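The batch-of-10 upload with cleanup on partial failure can be sketched like this. `uploadSegment` and `deleteSegment` are hypothetical stand-ins for the R2 client calls:

```typescript
// Sketch: upload in batches; on any failure, delete what was already
// uploaded, then rethrow so the caller sees the original error.
async function uploadSegmentsInBatches(
  segments: string[],
  uploadSegment: (seg: string) => Promise<void>,
  deleteSegment: (seg: string) => Promise<void>,
  batchSize = 10,
): Promise<void> {
  const uploaded: string[] = [];
  try {
    for (let i = 0; i < segments.length; i += batchSize) {
      const batch = segments.slice(i, i + batchSize);
      await Promise.all(
        batch.map(async (seg) => {
          await uploadSegment(seg);
          uploaded.push(seg); // only recorded after a successful upload
        }),
      );
    }
  } catch (error) {
    // Best-effort cleanup of already-uploaded segments before rethrowing.
    await Promise.allSettled(uploaded.map((seg) => deleteSegment(seg)));
    throw error;
  }
}
```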


## Face Estimation

Non-critical parallel step using TensorFlow WASM backend + @vladmandic/face-api (TinyFaceDetector + AgeGenderNet). Lazy-loaded with a concurrency lock to prevent double-initialization.

Input: a single frame extracted at `min(1.0, duration - 0.5)` seconds (target: the respondent facing the camera at the start of the clip).

Output: `FaceEstimationResult = { age, ageBucket, gender, genderConfidence } | null`
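The sampling rule and the result shape above, as a tiny sketch. The helper name `faceFrameTimestamp` is hypothetical, and the field types are assumptions about the real `FaceEstimationResult`:

```typescript
// Assumed field types for the documented result shape (null when no face found).
type FaceEstimationResult = {
  age: number;
  ageBucket: string;
  gender: string;
  genderConfidence: number;
} | null;

// Sample no later than 1.0s in; back off for very short clips.
function faceFrameTimestamp(durationSeconds: number): number {
  return Math.min(1.0, durationSeconds - 0.5);
}
```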

Runtime dependencies in Docker: libcairo2, libpango1.0-0, libjpeg62-turbo, libgif7, librsvg2-2 (all from the canvas npm package requirement). These can be removed if face estimation is dropped.


## Deployment

- Runtime: Node.js 22 LTS on Cloud Run (europe-west1)
- Docker build: multi-stage (base → pruner → installer → runner) using `turbo prune video-processor --docker`
- System packages in runner stage: ffmpeg, libcairo2, libpango1.0-0, libjpeg62-turbo, libgif7, librsvg2-2
- Port: 8080 (Cloud Run default)
- Sentry: enabled in cloud when `SENTRY_DSN` is set; `tracesSampleRate: 0.2`, `sampleRate: 1.0`; includes `nodeProfilingIntegration`
- tsup bundling: `skipNodeModulesBundle: true` (externalizes all node_modules); a custom banner injects `require = createRequire(import.meta.url)` for face-estimation's `require.resolve()` calls
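The tsup setup described in the last bullet might look roughly like this. The option names (`skipNodeModulesBundle`, `banner.js`) are real tsup options; the entry path and exact banner text are assumptions:

```typescript
// Hypothetical tsup.config.ts sketch for this service.
import { defineConfig } from "tsup";

export default defineConfig({
  entry: ["src/index.ts"],
  format: "esm",
  skipNodeModulesBundle: true, // externalize all node_modules
  banner: {
    // ESM bundles have no `require`; face-estimation calls require.resolve()
    // at runtime, so a CommonJS-style require is recreated at module top.
    js: "import { createRequire } from 'module'; const require = createRequire(import.meta.url);",
  },
});
```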

## Key Design Decisions

### OOM Prevention: File-Based Pipeline

All video operations use shared temp files on /tmp (tmpfs, backed by instance memory) rather than Node.js heap buffers. The pipeline downloads once to sharedVideoPath and all FFmpeg operations read from that file. This eliminated ~450MB of heap usage on large (400MB+) videos.

```
/tmp/processVideo-{nanoid}/
├── input-{nanoid}.mp4       ← downloaded video
├── transcoded-{nanoid}.mp4  ← WebM→MP4 output (if transcoding needed)
└── hls/                     ← HLS encoding output
    ├── master.m3u8
    ├── 480p/{seg_*.ts, playlist.m3u8}
    ├── 720p/{...}
    └── 1080p/{...}
```

The entire `sharedTempDir` is cleaned up in a `finally` block.

### Route Type Safety

All endpoints derive from `videoProcessingApiContract` / `imageProcessingApiContract` — typed contracts that define `{ path, args: ZodSchema, result: ZodSchema }`. A `RequiredRouteRegistry` type produces a TypeScript error if any contract endpoint is missing an implementation.
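The exhaustiveness check can be sketched with a mapped type. `demoContract` and the simplified handler signature are hypothetical; the real `RequiredRouteRegistry` keys off the actual contracts:

```typescript
// Sketch: a registry type keyed by the contract, so every endpoint
// must have a handler or the registry object fails to type-check.
type Contract = Record<string, { path: string }>;

type RequiredRouteRegistry<C extends Contract> = {
  [K in keyof C]: (args: unknown) => Promise<unknown>;
};

const demoContract = {
  detectSilence: { path: "/detect-silence" },
  getVideoDuration: { path: "/get-video-duration" },
} satisfies Contract;

// Omitting either key below is a compile-time error, so a contract
// endpoint can never silently lack an implementation.
const registry: RequiredRouteRegistry<typeof demoContract> = {
  detectSilence: async () => ({ silences: [] }),
  getVideoDuration: async () => ({ duration: 0 }),
};
```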


## Related Docs