
docs/architecture/performance-profiling-2026-03.md

Performance Profiling — March 2026

TL;DR

The platform is slow because of three systemic patterns, not random one-off issues:

  1. N+1 data fetching — Multiple pages fetch a list of IDs, then loop over each ID with individual Redis calls. This turns a 50ms query into a 2–6 second waterfall. Affects: library respondent detail (3.1s TTFB), BVF admin (5.6s TTFB), admin dashboard (7.9s LCP).

  2. Unoptimized heavy assets — Animated GIF thumbnails are 500KB–2MB each, loaded raw without optimization. A grid with 6 videos downloads 3–12MB of GIFs sequentially, taking 15 seconds per image. This is the biggest perceived slowness for library users.

  3. Invisible backend — The video-processor (our slowest service at 61.7s p75) has tracing completely disabled. We can't diagnose why it's slow because it sends zero data to Sentry. We also have zero custom spans anywhere in the codebase — we can see "page took 4s" but not which Redis call, Clerk call, or function caused it.


Bottlenecks — User-Facing First

Ordered by user impact. Each section includes the original finding and the investigation results.

1. Animated GIF thumbnails — 15s per image (biggest user-visible slowness)

Who's affected: Every library user viewing video grids — the most common screen. Ticket: perf-investigate-gif-thumbnails

Root cause: Video-processor generates 500KB–2MB animated GIFs (apps/video-processor/src/operations/thumbnails.ts). The library loads them via raw <img> tags without Next.js Image optimization (packages/app-library/src/shared/AnimatedThumbnail.tsx). The preload logic (packages/app-library/src/shared/extract-preload-thumbnails.ts) actually prefers animated thumbnails, front-loading the heaviest assets.

Investigation findings:

  • The AnimatedThumbnail component already accepts separate staticSrc and animatedSrc props. The fix requires only ~5 lines: add hover state to VideoDashboardGridItem, pass animatedSrc only when hovered.
  • Video-processor already generates static JPEG thumbnails (30–60KB each) for every video. No backend changes needed.
  • Preload logic is a one-line fix: flip animatedThumbnailUrl || thumbnailUrl to thumbnailUrl || animatedThumbnailUrl.
  • Payload reduction: ~95% — grid initial load drops from 3–12MB to 180–360KB.
  • WebP/MP4 alternatives are worth doing as Phase 2 but diminishing returns after the JPEG-default fix.

Fix: Show static JPEG by default (already generated), only load animated GIF on hover. Update preload logic. Effort: XS (1–2 hours).
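The preload flip and hover gating can be sketched as two pure helpers (the helper names are hypothetical; the real change lives in VideoDashboardGridItem and extract-preload-thumbnails.ts):

```typescript
// Sketch only — assumes the staticSrc/animatedSrc split described above.
interface ThumbnailUrls {
  thumbnailUrl?: string;         // static JPEG, ~30-60KB
  animatedThumbnailUrl?: string; // animated GIF, 500KB-2MB
}

// Preload flip: prefer the lightweight static JPEG over the GIF.
function preloadSrc(t: ThumbnailUrls): string | undefined {
  return t.thumbnailUrl || t.animatedThumbnailUrl; // was: animated || static
}

// Hover gating: only hand the <img> the animated source while hovered.
function displaySrc(t: ThumbnailUrls, hovered: boolean): string | undefined {
  return hovered
    ? t.animatedThumbnailUrl || t.thumbnailUrl
    : t.thumbnailUrl || t.animatedThumbnailUrl;
}
```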

Expected improvement: Grid pages with multiple videos drop from 30s+ to under 5s.

2. Hello /start — 2.7s TTFB (every new user's first impression)

Who's affected: Every new user hitting the onboarding flow. Ticket: perf-investigate-hello-start

Root cause (corrected): Originally described as "two independent Clerk API calls running sequentially." Investigation revealed 7 sequential async calls forming a complete waterfall: searchParams → getDraftIdFromCookie() → registry.getDraft() → auth() → clerkClient() → clerk.getOrganization() → registry.getOrganizationProfile().

Investigation findings:

  • Two fully independent branches that run sequentially: (A) cookie-based draft lookup (no auth needed), (B) Clerk auth-based org profile lookup.
  • Inside branch B, after auth() returns orgId, clerk.getOrganization() and registry.getOrganizationProfile() are independent of each other but run serially.
  • Important caveat: For genuinely new users (no Clerk session), auth() returns { userId: null } immediately and getMyOrganizationProfile() early-returns null. The 2.7s TTFB may NOT be caused by these serial calls for new users — it may be Clerk middleware overhead.

Fix: Two changes: (A) Parallelize the page-level branches with Promise.all(), (B) Parallelize Clerk API + KV fetch inside getMyOrganizationProfile(). Effort: XS–S (1 hour).

Expected improvement: ~600–700ms saved for authenticated users (2.7s → ~2.0s). New user improvement needs Sentry spans to confirm bottleneck location.
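A minimal sketch of both changes, with timed stubs standing in for the real Clerk/registry calls (the function names follow the report's description; the stubs and their shapes are hypothetical):

```typescript
// Hypothetical stubs: each stands in for one network call in the waterfall.
const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

const auth = async () => { await delay(20); return { orgId: "org_1" as string | null }; };
const clerk = { getOrganization: async (id: string) => { await delay(20); return { id }; } };
const registry = { getOrganizationProfile: async (id: string) => { await delay(20); return { orgId: id }; } };
const loadDraftFromCookie = async () => { await delay(20); return { draftId: "d_1" }; };

// Change B: Clerk API and KV fetch run in parallel after auth() resolves.
async function getMyOrganizationProfile() {
  const { orgId } = await auth();
  if (!orgId) return null; // genuinely new users early-return here
  const [org, profile] = await Promise.all([
    clerk.getOrganization(orgId),
    registry.getOrganizationProfile(orgId),
  ]);
  return { org, profile };
}

// Change A: the two independent page-level branches run concurrently.
async function loadStartPageData() {
  const [draft, orgProfile] = await Promise.all([
    loadDraftFromCookie(),      // branch A: cookie-based draft lookup
    getMyOrganizationProfile(), // branch B: auth-based org profile
  ]);
  return { draft, orgProfile };
}
```

With four 20ms calls, the original serial waterfall costs roughly 80ms of stub latency; the parallelized version costs roughly 40ms (branch A overlaps branch B, and the two calls inside branch B overlap each other).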

3. Library respondent detail — 3.1s TTFB (flow manager workflow)

Who's affected: Flow managers reviewing individual respondent submissions — a key workflow screen. Ticket: perf-investigate-library-respondent

Root cause: N+1 inside getAllSessionIds() (packages/registries/src/server/video-processing-registry.ts, line 383). To return session IDs sorted by createdAt, it fetches the full session data for every session in the flow. Each getRespondentData() call makes 2 HGETALL calls — one for session data, one to re-verify the session belongs to the flow (redundant).

Investigation findings:

  • The page already uses Promise.all() for its 5 top-level data calls — the bottleneck is the internal N+1.
  • Total Redis round-trips: 2N + 6 where N = respondents. For 20 respondents: 46 Redis calls.
  • Additional waste: getClient() makes 2 HGETALL calls (one clientExists check + one getAllValues) when 1 suffices.
  • Session index (client__{flowId}__sessions) is re-fetched N+1 times redundantly.

Fix (three tiers):

  1. P0: New getAllSessionIdsSorted method — fetch only createdAt per session (1 HGET vs 2 HGETALL). Saves N calls. Effort: S.
  2. P1: Fix getClient double-fetch (2→1 HGETALL). Effort: XS.
  3. P2: Pre-fetch session index and share across callers. Saves ~N+1 more calls. Effort: M.
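The P0 tier can be sketched as follows (getAllSessionIdsSorted is the report's proposed name; the in-memory store stands in for Redis, where the real implementation would issue 1 single-field HGET per session instead of 2 full HGETALLs):

```typescript
// Hypothetical stand-in for the sessions hash store.
const sessions: Record<string, { createdAt: number }> = {
  s1: { createdAt: 300 },
  s2: { createdAt: 100 },
  s3: { createdAt: 200 },
};

// 1 HGET of only the createdAt field per session.
async function getSessionCreatedAt(id: string): Promise<number> {
  return sessions[id].createdAt;
}

async function getAllSessionIdsSorted(ids: string[]): Promise<string[]> {
  const stamped = await Promise.all(
    ids.map(async (id) => [id, await getSessionCreatedAt(id)] as const),
  );
  return stamped.sort((a, b) => a[1] - b[1]).map(([id]) => id);
}
```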

Expected improvement: Redis round-trips drop from ~46 to ~24 for 20 respondents. Wall-clock savings ~100–200ms. Remaining TTFB dominated by Clerk API calls.

4. Clerk JS bundle — 25s on slow connections (affects all apps)

Who's affected: Every user on a slow connection (mobile, poor wifi). Ticket: perf-clerk-eager-loading-spike (Backlog — spike for later)

Root cause: ClerkProvider wraps the entire app at root layout level. The Clerk JS bundle is eagerly loaded and blocks all rendering. On slow 3G/4G, Sentry shows it taking up to 25.7s.

Status: Parked for later spike. High risk (auth flow changes affect all apps). Needs investigation of Clerk's lazy loading options and careful testing.


Bottlenecks — Internal

5. BVF /admin — 5.6s TTFB

Who's affected: Flow managers using BVF admin (internal). Ticket: perf-investigate-bvf-admin

Root cause (corrected): Originally described as "per-flow fetches are serialized." Investigation confirmed the outer Promise.all() for per-flow getMergedSessionsForFlow calls already exists. The actual bottleneck is the inner N+1: each getAllClientSessions(flowId) fetches session IDs then individually fetches each session's data.

Investigation findings:

  • Total Redis round-trips on cold cache: ~318 Redis calls + 7 Clerk API calls for ~30 flows / ~200 sessions.
  • Two caching layers: getFilteredFlows() cached 1h, getAllFlowsSessions() cached 5min. The 5.6s TTFB is a cold-cache scenario.
  • Flow config fetching has its own N+1: getAllFlows() does 1 + 2N calls, getFlowOrgMapping() duplicates the client index fetch.

Update (2026-03-04): The admin/page.tsx now calls videoFlowActions.getAllSessions() (a single bulk Redis fetch for all raw sessions) and groups results in-memory, eliminating the outermost raw-session N+1. The remaining bottleneck is getMergedSessionsForFlow fanning out per-flow to the processing registry for enrichment.

Remaining fix: Reduce getMergedSessionsForFlow processing registry fan-out. Pagination (10 flows per page) and Redis pipeline support in BaseStorage are still valuable for further improvement.
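The bulk-fetch-then-group shape of the update can be sketched as below: one Redis call returns the flat session list, and grouping happens in memory (the RawSession shape is hypothetical):

```typescript
// Hypothetical minimal session shape; the real objects carry more fields.
interface RawSession { id: string; flowId: string }

// Replaces the per-flow N+1: group one bulk result instead of refetching.
function groupSessionsByFlow(sessions: RawSession[]): Map<string, RawSession[]> {
  const byFlow = new Map<string, RawSession[]>();
  for (const s of sessions) {
    const bucket = byFlow.get(s.flowId) ?? [];
    bucket.push(s);
    byFlow.set(s.flowId, bucket);
  }
  return byFlow;
}
```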

6. Admin dashboard — 7.9s LCP

Who's affected: Every admin user on first page load of the day (internal). Ticket: perf-investigate-admin-dashboard

Root cause (corrected): Investigation found getAllClients() is called TWICE — once via getAllVideoFlows() and once via getGroupedVideoFlows(). This doubles the N+1 cost. The 60s in-memory cache only covers Clerk organizations, NOT the Redis client data.

Investigation findings:

  • Each getAllClients() call: 1 + N Redis round-trips. Called twice = 2 + 2N calls. With 30 flows: 62 Redis round-trips.
  • getAllClients() already uses Promise.all internally (individual fetches are concurrent). The issue is duplication + lack of caching.
  • BaseStorage/StorageDriver have no bulk/pipeline methods. RedisDriver uses ioredis which natively supports pipeline().

Fix (three tiers):

  1. Quick win: Deduplicate getAllClients() — call once, derive grouping in JS. Effort: XS (15 min).
  2. Cache: Add 60s in-memory cache to getAllClients(). Effort: S (30 min).
  3. Pipeline: Add hgetallMulti() to BaseStorage/RedisDriver for true batch operations. Effort: M (2–3 hours).
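Tiers 1 and 2 can be sketched together (helper names are hypothetical): fetch the client list once, memoize it for 60s, and derive both dashboard views from the same array in JS.

```typescript
interface Flow { id: string; clientId: string }
interface Client { id: string; flows: Flow[] }

// Hypothetical stand-in for the expensive 1 + N Redis round-trips.
let redisFetches = 0;
async function fetchAllClientsFromRedis(): Promise<Client[]> {
  redisFetches += 1;
  return [{ id: "c1", flows: [{ id: "f1", clientId: "c1" }, { id: "f2", clientId: "c1" }] }];
}

// Tier 2: a 60s in-memory cache in front of the Redis fetch.
let cache: { at: number; clients: Client[] } | null = null;
const TTL_MS = 60_000;

async function getAllClientsCached(): Promise<Client[]> {
  if (cache && Date.now() - cache.at < TTL_MS) return cache.clients;
  const clients = await fetchAllClientsFromRedis();
  cache = { at: Date.now(), clients };
  return clients;
}

// Tier 1: both views derive from one call instead of two getAllClients() calls.
async function getDashboardData() {
  const clients = await getAllClientsCached();
  const allFlows = clients.flatMap((c) => c.flows);
  const grouped = new Map<string, Flow[]>();
  for (const f of allFlows) {
    const bucket = grouped.get(f.clientId) ?? [];
    bucket.push(f);
    grouped.set(f.clientId, bucket);
  }
  return { allFlows, grouped };
}
```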

The Observability Gap

Ticket (video-processor): perf-investigate-video-processor-tracing
Ticket (custom spans): perf-investigate-custom-sentry-spans

Zero custom Sentry spans exist in the entire codebase. All performance data comes from auto-instrumentation.

Video-processor tracing (investigation complete)

  • setupExpressErrorHandler(app) is already correctly placed in both services. No changes needed.
  • @sentry/node@10.32.1 with OpenTelemetry — enabling tracing auto-instruments Express routes.
  • Performance overhead: negligible (FFmpeg is CPU-bound in a child process, Sentry traces only Node.js event loop).
  • No PII concerns (request bodies not captured by default, only HTTP paths/methods/timing).
  • Estimated additional spans: ~500–1,000/month (0.02% of quota).

Fix: One-line change in two files: tracesSampleRate: 0 → 0.2. Effort: XS (15 min).
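Sketched against the standard Sentry.init() shape (the DSN env var name is illustrative; the sample rate is the report's target):

```typescript
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 0.2, // was 0 (tracing disabled); 0.2 matches the Next.js apps
});
```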

Custom Sentry spans (investigation complete)

Recommended pattern: withRedisSpan() / withAuthSpan() helper functions wrapping Sentry.startSpan().

Key findings:

  • API confirmed: Sentry.startSpan() with onlyIfParent: true is the correct API (@sentry/nextjs@^10.32.1). Zero overhead when no transaction is active.
  • Proxy pattern doesn't work: BaseStorage methods are protected and only called via this.method() from subclasses. A Proxy only intercepts external property access — it would capture zero calls.
  • Instrument 12 BaseStorage methods + 3 OrganizationClient leaf methods (checkOrganizationAuth, getCurrentOrganization, getAllUserMemberships). The verify* methods don't need direct wrapping — Sentry auto-creates parent-child spans via async context.
  • Key redaction: redactKey() helper replaces IDs with placeholders (org:{orgId}, vf:{flowId}) to avoid PII in spans.
  • Naming: op: "db.redis" + name: "{REDIS_CMD} {redacted_key}" for storage; op: "auth.clerk" + name: "clerk.{method}" for auth.
  • Estimated additional spans: ~25K–42K/month (0.7% of quota, bringing total to ~24.7%).

Fix: Create helper functions, wrap 15 methods, add peer dependencies. Effort: S (2–3 hours).


Recommended Priority

Ordered by user impact and effort:

| # | Finding | Ticket | Effort | Quick Win? |
|---|---------|--------|--------|------------|
| 1 | GIF thumbnails (95% payload reduction) | perf-investigate-gif-thumbnails | XS | Yes — ~5 lines |
| 2 | Hello /start parallelization | perf-investigate-hello-start | XS–S | Yes |
| 3 | Library respondent N+1 | perf-investigate-library-respondent | S | Yes (P0 tier) |
| 4 | Video-processor tracing | perf-investigate-video-processor-tracing | XS | Yes — 1-line x2 |
| 5 | Admin dashboard dedup + cache | perf-investigate-admin-dashboard | XS–S | Yes |
| 6 | Custom Sentry spans | perf-investigate-custom-sentry-spans | S | Foundation for future |
| 7 | BVF admin bulk fetch + pagination | perf-investigate-bvf-admin | M | No |
| 8 | Clerk eager loading | perf-clerk-eager-loading-spike | M–L | No — needs spike |

Cross-cutting improvement: Adding hgetallMulti() pipeline support to BaseStorage/RedisDriver would benefit findings #3, #5, #6, and #7. This is the single highest-leverage infrastructure change.
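The cross-cutting method can be sketched as below (hgetallMulti is the proposed name; PipelineClient models the slice of the real ioredis API relied on here, namely pipeline() accepting command arrays and exec() resolving to [error, result] pairs in command order):

```typescript
type HashValue = Record<string, string>;

// Structural stand-in for ioredis: pipeline([...cmds]).exec().
interface PipelineClient {
  pipeline(cmds: string[][]): { exec(): Promise<Array<[Error | null, unknown]>> };
}

// One network round-trip for N HGETALLs instead of N round-trips.
async function hgetallMulti(redis: PipelineClient, keys: string[]): Promise<HashValue[]> {
  if (keys.length === 0) return [];
  const results = await redis.pipeline(keys.map((k) => ["hgetall", k])).exec();
  return results.map(([err, value]) => {
    if (err) throw err;
    return (value ?? {}) as HashValue;
  });
}
```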


Appendix A: Per-App Performance Data

Raw Sentry data tables for reference. Data window: 14 days (ending 2026-03-02), p75 unless noted.

Admin (Sentry project 4510642999197776)

Frontend — Page Loads (p75)

| Page | Duration | LCP | FCP | TTFB | Hits |
|------|----------|-----|-----|------|------|
| / (dashboard) | 7,918ms | 5,048ms | 4,373ms | 65ms | 356 |
| /hello-drafts | 6,556ms | 1,190ms | 1,190ms | 3,929ms | 41 |
| /sign-in | 6,224ms | 5,907ms | 4,092ms | 1,076ms | 106 |
| /video-exporter | 6,204ms | 3,324ms | 3,324ms | 150ms | 6 |
| /flows/:videoFlowId/:respondentId | 4,088ms | 3,908ms | 2,684ms | 135ms | 178 |
| /flows/:videoFlowId/intelligence | 3,411ms | 1,287ms | 1,287ms | 99ms | 58 |
| /flows/:videoFlowId | 2,707ms | 3,659ms | 3,558ms | 183ms | 256 |

Backend — Server Transactions (p75)

| Route | Duration | Hits |
|-------|----------|------|
| POST /flows/[videoFlowId]/[respondentId] | 61,736ms | 35 |
| GET /api/export-raw-clips | 7,602ms | 40 |
| GET / | 1,631ms | 735 |
| GET /sign-in/[[...sign-in]] | 1,390ms | 695 |


Library (Sentry project 4510080388366416)

Frontend — Page Loads (p75)

| Page | Duration | LCP | FCP | TTFB | Hits |
|------|----------|-----|-----|------|------|
| /sign-in | 5,432ms | 5,032ms | 5,426ms | 1,081ms | 105 |
| /dashboard/:videoFlowId/:respondentId | 4,633ms | 4,656ms | 3,304ms | 3,149ms | 30 |
| /switch-org | 4,581ms | 2,388ms | 2,388ms | | 5 |
| / | 4,385ms | 2,640ms | 1,908ms | 670ms | 170 |
| /dashboard/:videoFlowId | 2,945ms | 1,908ms | 1,164ms | 741ms | 50 |

Slowest Spans (7-day window)

| Operation | Description | p75 | Max | Count |
|-----------|-------------|-----|-----|-------|
| default | /api/webhooks/process-video/route | 37,169ms | 39,002ms | 40 |
| http.client | POST video-processor-prod | 30,621ms | 33,735ms | 65 |
| resource.img | Animated GIF from Vercel Blob | 15,080ms | 15,080ms | 5 |


Branded Video Flow (Sentry project 4510080275120208)

| Page | Duration | LCP | FCP | TTFB | Hits |
|------|----------|-----|-----|------|------|
| /admin | 7,652ms | 5,696ms | 5,696ms | 5,610ms | 5 |
| / | 4,657ms | 2,292ms | 2,556ms | 1,229ms | 40 |
| /flows/:flowId | 3,468ms | 1,918ms | 1,792ms | 1,167ms | 535 |
| /flows/:flowId/frame | 1,305ms | 520ms | 484ms | 419ms | 5 |


Hello (Sentry project 4510515885834320)

| Page | Duration | LCP | FCP | TTFB | Hits |
|------|----------|-----|-----|------|------|
| /start | 7,486ms | 5,307ms | 5,702ms | 2,741ms | 110 |
| /flows/vf_ki4j-rSNmDlM | 5,643ms | 3,372ms | 3,372ms | 2,322ms | 5 |
| / | 4,050ms | 3,024ms | 3,218ms | 662ms | 100 |



Appendix B: Sentry Configuration Reference

SDK Config per App

| App | tracesSampleRate | profilesSampleRate | Browser Profiling | Spans (30d) |
|-----|------------------|--------------------|-------------------|-------------|
| library | 0.2 | 0.1 | 10% | 420,405 |
| admin | 0.2 | 0.1 | 10% | 260,949 |
| branded-video-flow | 0.2 | 0.1 | 10% | 266,684 |
| hello | 0.2 | 0.1 | 10% | 214,478 |
| links | 0.2 | 0.1 | 10% | 38,980 |
| demo | 0.1 | | No | 140 |
| video-processor | 0 | 0.1 | N/A | 0 |
| sync-worker | 0 | 0.1 | N/A | 0 |

Total: ~1.2M spans/month of 5M included (24% of quota).

Config File Locations

  • Next.js server: apps/*/instrumentation.ts
  • Next.js client: apps/*/sentry.client.config.ts
  • Express: apps/video-processor/src/index.ts (lines 46–79), apps/sync-worker/src/index.ts (line 23)
  • Shared builders: packages/core/src/server/sentry.ts, packages/core/src/web/sentry.ts

Data Layer (Instrumentation Targets)

All app data flows through a registry pattern. Instrumenting BaseStorage covers everything:

App Page → Registry (e.g., VideoProcessingRegistry)
               → BaseStorage (12 methods)  ← instrument here
                    → RedisDriver (11 methods)
                         → ioredis → Redis

BaseStorage (packages/storage/src/base-storage.ts) — 12 methods:

  • Hash: getValue, setValue, getAllValues, deleteValue
  • String: getSimpleValue, setSimpleValue, setSimpleValueWithTTL, deleteSimpleValue
  • Sorted set: sortedSetAdd, sortedSetRevRange, sortedSetRemove
  • Scan: scanSimpleKeys

OrganizationClient (packages/auth/src/server/organization-client.ts) — 8 methods (3 leaf methods to instrument):

  • Instrument: checkOrganizationAuth, getCurrentOrganization, getAllUserMemberships
  • Skip (delegate to above): verifySignedInUser, verifySignedInOrganization, verifyOrganisationAdmin, checkUserMembership, checkOrganization (sync)



Related Documents