docs/architecture/nightly-maintenance-architecture.md

Nightly Maintenance Architecture

Status: Partially implemented (4/9 active, 5 planned)
Kanban task: simplify-nightly-kanban-actions
Demo visualization: /nightly-maintenance in demo app

Overview

The nightly maintenance pipeline is a set of focused, single-purpose jobs that run sequentially after hours (UTC). Each job answers one question about the codebase, makes one type of change, and produces output reviewable in under a minute.

Goal: Automate the work that would otherwise require dedicated engineers — code review, documentation, dependency management, tech debt reduction — without adding headcount.

Pipeline Timeline

UTC   Job                        Status      Slack Channel
────  ─────────────────────────  ──────────  ──────────────
00:00 ① Daily Overview           ✅ ACTIVE    #daily-overview
02:00 ② Dead Code Cleanup        ✅ ACTIVE    #ai-janitor
03:00 ③ New Code Reviewer        🔲 PLANNED   #ai-janitor
04:00 ④ Boy Scout Scanner        🔲 PLANNED   #ai-janitor
05:00 ⑤ Documentation Generator  🔲 PLANNED   #ai-janitor
06:00 ⑥ Dependency Health        🔲 PLANNED   #ai-janitor
07:00 ⑦ Kanban Hygiene           ✅ ACTIVE    #ai-janitor
08:00 ⑧ Performance Baseline     🔲 PLANNED   #ai-janitor
05:00 ⑨ Architecture Review      ✅ ACTIVE    #ai-janitor

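Start times are staggered to avoid CI resource contention and to let later jobs consume earlier outputs: the four active jobs (①, ②, ⑨, ⑦) run at least 2 hours apart, and planned jobs are pencilled into the hourly slots between them. Note that ⑨ currently shares the 05:00 slot with the planned ⑤.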
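The spacing rule can be checked mechanically. The sketch below is illustrative only: `ScheduledJob` and `findSchedulingConflicts` are hypothetical names that do not exist in packages/ci-scripts, the hours are copied from the timeline and job headings in this document, and the check ignores the gap across midnight.

```typescript
// Hypothetical schedule entry; hours mirror the timeline in this document.
interface ScheduledJob {
  name: string;
  utcHour: number; // start hour in UTC, 0-23
}

// Returns adjacent pairs of jobs that start less than minGapHours apart.
// Does not consider the gap across midnight (last job to first job).
function findSchedulingConflicts(
  jobs: readonly ScheduledJob[],
  minGapHours: number,
): Array<[string, string]> {
  const sorted = [...jobs].sort((a, b) => a.utcHour - b.utcHour);
  const conflicts: Array<[string, string]> = [];
  let previous: ScheduledJob | undefined;
  for (const job of sorted) {
    if (previous !== undefined && job.utcHour - previous.utcHour < minGapHours) {
      conflicts.push([previous.name, job.name]);
    }
    previous = job;
  }
  return conflicts;
}

// The four jobs marked ACTIVE in Job Details.
const activeJobs: ScheduledJob[] = [
  { name: "Daily Overview", utcHour: 0 },
  { name: "Dead Code Cleanup", utcHour: 2 },
  { name: "Architecture Review", utcHour: 5 },
  { name: "Kanban Hygiene", utcHour: 7 },
];

const activeConflicts = findSchedulingConflicts(activeJobs, 2);

// Adding the planned Documentation Generator at 05:00 surfaces a slot collision.
const conflictsWithDocs = findSchedulingConflicts(
  [...activeJobs, { name: "Documentation Generator", utcHour: 5 }],
  2,
);
```

Running a check like this in CI before enabling a planned job would catch slot collisions before they cause resource contention.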

Universal Job Pattern

Every nightly job follows the same 3-step architecture:

┌─────────────────────────────────────────────────────┐
│                   GitHub Actions                     │
│                                                      │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────┐ │
│  │  Step 1       │   │  Step 2       │   │  Step 3   │ │
│  │  PRE-SCAN     │──▶│  LLM REVIEW   │──▶│  APPLY    │ │
│  │              │   │              │   │          │ │
│  │ Deterministic │   │ Claude (if    │   │ Create PR │ │
│  │ TypeScript    │   │ candidates    │   │ Post Slack│ │
│  │ Zero LLM cost│   │ exist)        │   │ Update    │ │
│  │              │   │              │   │ memory    │ │
│  └──────────────┘   └──────────────┘   └──────────┘ │
│        │                   │                  │      │
│        ▼                   ▼                  ▼      │
│   candidates.json    decisions.json    PR + Slack    │
└─────────────────────────────────────────────────────┘

Why this pattern?

  1. Zero cost when clean — Pre-scan gates the LLM. No candidates = no Claude API call.
  2. Testable — Deterministic steps can be unit tested without LLM mocking.
  3. Reviewable — Each step produces a JSON artifact that's inspectable.
  4. Bounded cost — LLM step has explicit --max-turns cap.
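The gating behaviour behind point 1 can be sketched as a small orchestrator. This is a minimal illustration, not the workflow code: `Candidate`, `Decision`, `JobSteps`, and `runNightlyJob` are hypothetical names, the JSON artifact shapes are assumed, and running Step 3 on the clean path (so memory files can still be updated) is an assumption based on the Dead Code Cleanup fast path.

```typescript
// Hypothetical shapes for the candidates.json and decisions.json artifacts.
interface Candidate {
  id: string;
  reason: string;
}

interface Decision {
  candidateId: string;
  action: "apply" | "skip";
}

interface JobSteps {
  preScan(): Promise<Candidate[]>; // Step 1: deterministic, no LLM
  llmReview(candidates: Candidate[]): Promise<Decision[]>; // Step 2: LLM call
  apply(decisions: Decision[]): Promise<void>; // Step 3: PR, Slack, memory
}

// Runs one job. The LLM step is skipped entirely when the pre-scan is clean.
async function runNightlyJob(steps: JobSteps): Promise<"clean" | "reviewed"> {
  const candidates = await steps.preScan();
  if (candidates.length === 0) {
    // Assumption: Step 3 still runs so rotation and memory files get updated.
    await steps.apply([]);
    return "clean";
  }
  const decisions = await steps.llmReview(candidates);
  await steps.apply(decisions);
  return "reviewed";
}
```

Because Steps 1 and 3 are plain functions, they can be unit tested by passing a stub for `llmReview`, which is the testability property listed in point 2.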

Job Details

① Daily Overview (00:00 UTC) — ✅ ACTIVE

Aspect    Detail
────────  ──────────────────────────────────────────────────────────
Question  What did we ship today?
Pre-scan  get-daily-overview-window.ts computes 24h window, gh fetches merged PRs + commits
LLM       Classifies changes by impact, generates HEAD (Slack headline) + COMPACT (detail)
Output    .kanbn/daily-overview/YYYY-MM-DD-daily-overview.md → PR → Slack #daily-overview
Model     Claude (via claude-code-action)
Workflow  .github/workflows/daily-overview.yml
Prompt    .github/prompts/daily-overview.md (148 lines)

Data flow:

gh pr list (merged) ──▶ Claude ──▶ daily-overview.md ──▶ PR ──▶ Slack
gh log (commits)    ──┘
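The window computation in the pre-scan might look like the sketch below. The real get-daily-overview-window.ts is not shown in this document, so the semantics (the 24 hours leading up to the run time) and the names `OverviewWindow`, `computeOverviewWindow`, and `overviewFilePath` are assumptions. Which day stamps the output file is also not specified here, so the path helper takes the day explicitly.

```typescript
interface OverviewWindow {
  since: string; // ISO 8601, UTC
  until: string; // ISO 8601, UTC
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the 24 hours leading up to `now` (assumed window semantics).
function computeOverviewWindow(now: Date): OverviewWindow {
  return {
    since: new Date(now.getTime() - DAY_MS).toISOString(),
    until: now.toISOString(),
  };
}

// Builds the output path from the YYYY-MM-DD naming scheme shown in the table.
function overviewFilePath(day: Date): string {
  const stamp = day.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `.kanbn/daily-overview/${stamp}-daily-overview.md`;
}

const overviewWindow = computeOverviewWindow(new Date("2025-03-02T00:00:00Z"));
// overviewWindow.since is "2025-03-01T00:00:00.000Z"
```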

② Dead Code Cleanup (02:00 UTC) — ✅ ACTIVE

Aspect     Detail
─────────  ──────────────────────────────────────────────────────────
Question   Is there unused code in workspace X?
Pre-scan   Knip static analysis on rotation target (35 workspaces)
LLM        Reviews Knip findings, removes dead code with skip rules
Output     Auto-merge PR + Slack #ai-janitor
Fast path  No findings → update tracker, skip Claude entirely
Model      Claude Sonnet 4 (--max-turns 100)
Workflow   .github/workflows/nightly-dead-code-cleanup.yml
Memory     .kanbn/memory/dead-code-rotation.json

Rotation: 35 workspaces, one per night. Full rotation ≈ 5 weeks.
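A rotation like this reduces to an index that wraps around the target list, plus a capped history. The sketch below assumes a shape for dead-code-rotation.json: the field names (`nextIndex`, `history`) and the helpers `currentTarget` and `recordScan` are hypothetical. Only the "last 10" history cap comes from the Memory & State section of this document.

```typescript
// Hypothetical shape for dead-code-rotation.json; field names are illustrative.
interface ScanRecord {
  target: string;
  date: string; // YYYY-MM-DD
  findings: number;
}

interface RotationState {
  nextIndex: number;
  history: ScanRecord[];
}

const HISTORY_LIMIT = 10; // "scan history (last 10)"

// Picks tonight's workspace from the rotation.
function currentTarget(state: RotationState, targets: readonly string[]): string {
  const target = targets[state.nextIndex % targets.length];
  if (target === undefined) {
    throw new Error("rotation has no targets");
  }
  return target;
}

// Returns the state to persist after a scan: index advanced, history capped.
function recordScan(
  state: RotationState,
  targets: readonly string[],
  record: ScanRecord,
): RotationState {
  return {
    nextIndex: (state.nextIndex + 1) % targets.length,
    history: [...state.history, record].slice(-HISTORY_LIMIT),
  };
}
```

Returning a new state object instead of mutating the old one keeps the step easy to unit test: the caller decides when to write the JSON back to disk.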

Data flow:

rotation.json ──▶ Knip scan ──▶ findings?
                                  │
                    ┌─────────────┼─────────────┐
                    ▼             ▼              ▼
                 0 findings    N findings     Invalid JSON
                    │             │              │
                    ▼             ▼              ▼
               Update tracker  Claude review   Warn + skip
               Clean PR        Remove code
               Auto-merge      Verify (lint+tc)
                               PR + Slack
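The three branches in the data flow above map naturally onto a discriminated union. This sketch assumes the scan output has already been normalized to a JSON array of findings. Knip's actual reporter output has its own structure and would need an adapter, which is not shown. `ScanOutcome`, `classifyScanOutput`, and `nextAction` are hypothetical names.

```typescript
type ScanOutcome =
  | { kind: "clean" }
  | { kind: "findings"; count: number }
  | { kind: "invalid"; error: string };

// Assumption: the scan output has been normalized to a JSON array of findings.
function classifyScanOutput(raw: string): ScanOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { kind: "invalid", error: message };
  }
  if (!Array.isArray(parsed)) {
    return { kind: "invalid", error: "expected a JSON array of findings" };
  }
  return parsed.length === 0
    ? { kind: "clean" }
    : { kind: "findings", count: parsed.length };
}

// Maps each outcome to the matching branch of the data flow.
function nextAction(outcome: ScanOutcome): string {
  switch (outcome.kind) {
    case "clean":
      return "update tracker, skip Claude";
    case "findings":
      return `Claude review of ${outcome.count} finding(s)`;
    case "invalid":
      return `warn and skip: ${outcome.error}`;
  }
}
```

Treating unparseable output as its own outcome, rather than as "no findings", avoids silently marking a workspace clean when the scanner itself failed.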

③ New Code Reviewer (03:00 UTC) — 🔲 PLANNED

Aspect           Detail
───────────────  ──────────────────────────────────────────────────────────
Question         Does yesterday's new code have potential bugs or anti-patterns?
Pre-scan         Collect diffs from PRs merged in last 24h (from daily overview)
LLM              Review each PR diff for: security issues, error handling gaps, race conditions, missing edge cases, performance regressions
Output           Review report markdown, kanban tasks for critical findings
Key distinction  Not a linter; focuses on semantic/logical bugs that static analysis misses

Candidate categories:

  • Security (OWASP top 10, injection, XSS)
  • Error handling (missing try/catch, swallowed errors)
  • Race conditions (async patterns, state management)
  • Edge cases (null checks, boundary conditions)
  • Performance (N+1 queries, unnecessary re-renders, large bundles)
  • Pattern violations (direct process.env, import-time env access)
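Since this job is still planned, the following is only one possible shape for its findings. The category names follow the list above. The `Severity` scale, the `ReviewFinding` fields, and `partitionFindings` are hypothetical, and the rule "critical findings become kanban tasks, everything else stays in the report" is an interpretation of the Output row.

```typescript
type FindingCategory =
  | "security"
  | "error-handling"
  | "race-condition"
  | "edge-case"
  | "performance"
  | "pattern-violation";

type Severity = "critical" | "major" | "minor"; // hypothetical scale

interface ReviewFinding {
  prNumber: number;
  category: FindingCategory;
  severity: Severity;
  summary: string;
}

// Splits findings into those that become kanban tasks and those that
// only appear in the review report.
function partitionFindings(findings: readonly ReviewFinding[]): {
  tasks: ReviewFinding[];
  reportOnly: ReviewFinding[];
} {
  return {
    tasks: findings.filter((f) => f.severity === "critical"),
    reportOnly: findings.filter((f) => f.severity !== "critical"),
  };
}
```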

④ Boy Scout Scanner (04:00 UTC) — 🔲 PLANNED

Aspect       Detail
───────────  ──────────────────────────────────────────────────────────
Question     What small improvements can we make to recently touched code?
Pre-scan     Files changed in last 7 days. Measure: file length (>500 lines), function complexity, duplicate patterns, inconsistent naming
LLM          Triage: fix-inline (trivial), create-task (larger), skip (not worth it)
Output       Auto-fix PR for inline fixes + new kanban tasks for larger items
Scope guard  Only touches files modified in last sprint, never random refactoring
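The scope guard and the file-length measure from the table above can be expressed as two small predicates. This is a sketch under assumptions: `FileMetrics`, `isInScope`, and `findOversizedFiles` are hypothetical names, the metrics are assumed to be collected elsewhere (for example from git history), and the 7-day window from the Pre-scan row is used as the scope.

```typescript
interface FileMetrics {
  path: string;
  lineCount: number;
  lastModified: Date;
}

const MAX_LINES = 500; // "file length (>500 lines)"
const LOOKBACK_DAYS = 7; // "Files changed in last 7 days"
const DAY_MS = 24 * 60 * 60 * 1000;

// Scope guard: only files touched within the lookback window are eligible.
function isInScope(file: FileMetrics, now: Date): boolean {
  const ageMs = now.getTime() - file.lastModified.getTime();
  return ageMs >= 0 && ageMs <= LOOKBACK_DAYS * DAY_MS;
}

// Pre-scan: in-scope files that exceed the length threshold.
function findOversizedFiles(files: readonly FileMetrics[], now: Date): FileMetrics[] {
  return files.filter((f) => isInScope(f, now) && f.lineCount > MAX_LINES);
}
```

Applying the scope guard inside the deterministic pre-scan means the LLM never sees out-of-scope files, so it cannot propose refactors for them.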

Examples of inline fixes:

  • Replace raw console.error with useSentryToast
  • Add missing lazy initialization pattern
  • Extract 20-line inline function to named function
  • Remove unused imports

Examples of kanban tasks created:

  • "Refactor 1500-line VideoRespondentDashboard into sub-components"
  • "Extract duplicate Sentry registration to shared utility" (already done as boyscout task)

⑤ Documentation Generator (05:00 UTC) — 🔲 PLANNED

Aspect    Detail
────────  ──────────────────────────────────────────────────────────
Question  Are our user-facing docs up to date with the latest features?
Pre-scan  Compare recent PRs (last 7 days) with existing docs. Detect: new features without docs, changed behavior not reflected, FAQ-worthy patterns
LLM       Generate/update: feature explanations, FAQ entries, API docs, workflow guides
Output    PR with doc updates
Format    TBD; research how Loom, Notion, Linear structure public docs

Content categories:

  • Feature explanation pages (what it does, how to use it)
  • FAQ entries (common questions from support/usage patterns)
  • API endpoint documentation (request/response schemas)
  • Integration guides (webhooks, embed codes)
  • Changelog entries (human-readable release notes)

⑥ Dependency Health (06:00 UTC) — 🔲 PLANNED

Aspect    Detail
────────  ──────────────────────────────────────────────────────────
Question  Are our dependencies safe, current, and lean?
Pre-scan  pnpm audit (security), pnpm outdated (versions), bundle impact analysis
LLM       For major version bumps: read changelogs, assess migration effort
Output    Security fix PR (auto-merge for patch), report for major upgrades
Slack     "2 security patches applied, 3 major upgrades need review"

Auto-merge criteria (no LLM needed):

  • Patch version bumps (1.2.3 → 1.2.4)
  • Known-safe minor bumps (types packages, linters)

LLM review needed:

  • Major version bumps (breaking changes)
  • Minor bumps with large changelogs
  • New transitive dependencies
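The routing rules above could be implemented roughly as follows. This is a sketch for a planned job, not existing code: `classifyBump`, `routeUpdate`, and the `SAFE_MINOR_PACKAGES` allow-list are hypothetical. It ignores pre-release tags and the 0.x convention where minor bumps may be breaking, and it conservatively sends every minor bump that is not on the allow-list to LLM review.

```typescript
type BumpKind = "none" | "patch" | "minor" | "major";
type Route = "auto-merge" | "llm-review" | "ignore";

// Hypothetical allow-list for "known-safe" minor bumps (linters and similar).
const SAFE_MINOR_PACKAGES = new Set(["eslint", "prettier"]);

// Parses the leading MAJOR.MINOR.PATCH triple; pre-release tags are ignored.
function parseVersion(version: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    throw new Error(`not a semver version: ${version}`);
  }
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Names the most significant version component that changed.
function classifyBump(from: string, to: string): BumpKind {
  const [fromMajor, fromMinor, fromPatch] = parseVersion(from);
  const [toMajor, toMinor, toPatch] = parseVersion(to);
  if (toMajor !== fromMajor) return "major";
  if (toMinor !== fromMinor) return "minor";
  if (toPatch !== fromPatch) return "patch";
  return "none";
}

// Decides whether an update can merge without the LLM step.
function routeUpdate(name: string, from: string, to: string): Route {
  const kind = classifyBump(from, to);
  if (kind === "none") return "ignore";
  if (kind === "patch") return "auto-merge";
  const knownSafe = name.startsWith("@types/") || SAFE_MINOR_PACKAGES.has(name);
  if (kind === "minor" && knownSafe) return "auto-merge";
  return "llm-review"; // major bumps and all other minor bumps
}
```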

⑦ Kanban Hygiene (07:00 UTC) — ✅ ACTIVE

Aspect    Detail
────────  ──────────────────────────────────────────────────────────
Question  Is the kanban board accurate?
Pre-scan  nightly-pv-collect.ts: cross-reference daily overview with task statuses
LLM       Review candidates: approve/reject status changes and duplicates
Output    Single-purpose PR with clear commit messages per change type
Workflow  .github/workflows/nightly-kanban-hygiene.yml
Prompt    .github/prompts/nightly-kanban-hygiene.md

Focused on 4 checks (simplified from original 12 responsibilities):

  1. Status sync (task status matches merged PR state)
  2. Staleness detection (>60 days without activity in Backlog)
  3. Duplicate flagging (shared impactedApps + overlapping title keywords)
  4. Archive Done tasks (move to .kanbn/archived-tasks/)
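Checks 2 and 3 are deterministic and could live in the pre-scan. The sketch below is not taken from nightly-pv-collect.ts: the `KanbanTask` fields, the keyword heuristic (lowercase words longer than three characters), and the default of two overlapping keywords are all assumptions chosen for illustration.

```typescript
interface KanbanTask {
  id: string;
  title: string;
  column: string;
  lastActivity: Date;
  impactedApps: string[];
}

const STALE_AFTER_DAYS = 60;
const DAY_MS = 24 * 60 * 60 * 1000;

// Check 2: Backlog tasks with no activity for more than 60 days.
function isStale(task: KanbanTask, now: Date): boolean {
  const idleDays = (now.getTime() - task.lastActivity.getTime()) / DAY_MS;
  return task.column === "Backlog" && idleDays > STALE_AFTER_DAYS;
}

// Hypothetical keyword heuristic: lowercase words longer than three characters.
function titleKeywords(title: string): string[] {
  return title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 3);
}

// Check 3: at least one shared app and enough overlapping title keywords.
function looksLikeDuplicate(
  a: KanbanTask,
  b: KanbanTask,
  minSharedKeywords = 2,
): boolean {
  const sharesApp = a.impactedApps.some((app) => b.impactedApps.includes(app));
  const bWords = new Set(titleKeywords(b.title));
  const shared = new Set(titleKeywords(a.title).filter((word) => bWords.has(word)));
  return sharesApp && shared.size >= minSharedKeywords;
}
```

These functions only produce candidates. Per the table above, the LLM step still approves or rejects each flagged duplicate.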

⑧ Performance Baseline (08:00 UTC) — 🔲 PLANNED

Aspect    Detail
────────  ──────────────────────────────────────────────────────────
Question  Is the app getting faster or slower?
Pre-scan  pnpm build: capture bundle sizes per app. Optional: Lighthouse CI against staging
LLM       Compare against baselines, explain significant changes
Output    Trend data in .kanbn/memory/performance-baselines.json. Kanban task if bundle grows >5% in a week
Slack     Weekly trend to #ai-janitor

Metrics tracked:

  • Bundle size per app (JS + CSS)
  • Build time per app
  • Number of dependencies per app
  • Optional: Core Web Vitals from Lighthouse
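The ">5% in a week" rule from the Output row is a simple ratio check. In this sketch, `BundleSample` and `findBundleRegressions` are hypothetical names. The format of performance-baselines.json is not defined yet, so the function takes week-ago and current sizes directly rather than reading the file.

```typescript
interface BundleSample {
  app: string;
  weekAgoBytes: number;
  currentBytes: number;
}

const GROWTH_THRESHOLD = 0.05; // "bundle grows >5% in a week"

// Returns apps whose bundle grew by more than the threshold, with the ratio.
// Samples without a positive baseline are skipped to avoid division by zero.
function findBundleRegressions(
  samples: readonly BundleSample[],
): Array<{ app: string; growth: number }> {
  return samples
    .filter((s) => s.weekAgoBytes > 0)
    .map((s) => ({
      app: s.app,
      growth: (s.currentBytes - s.weekAgoBytes) / s.weekAgoBytes,
    }))
    .filter((r) => r.growth > GROWTH_THRESHOLD);
}
```

Each returned entry would correspond to one kanban task. Shrinking bundles produce a negative ratio and are never flagged.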

⑨ Architecture Review (05:00 UTC) — ✅ ACTIVE

Aspect     Detail
─────────  ──────────────────────────────────────────────────────────
Question   Do our architecture docs match the actual code? Are there improvement opportunities?
Pre-scan   nightly-architecture-review-collect.ts: rotation through 22 targets, finds existing docs, computes recent changes, lists structure
LLM        Reads code and docs deeply, verifies alignment, updates outdated docs with Mermaid diagrams, creates improvement kanban tasks
Output     PR with doc updates + kanban tasks with ## Human Decision Needed sections
Fast path  None: always invokes Claude (docs need semantic review even when code hasn't changed)
Model      Claude Sonnet 4 (--max-turns 80)
Workflow   .github/workflows/nightly-architecture-review.yml
Prompt     .github/prompts/nightly-architecture-review.md
Memory     .kanbn/memory/architecture-review-rotation.json
Skill      .claude/skills/architecture-review/SKILL.md (on-demand companion)

Rotation: 22 targets (11 apps + 11 packages), one per night. Full rotation ≈ 3 weeks.

Human decisions captured as kanban tasks — when the review identifies an improvement that requires judgment (e.g. "should we extract this to a shared package?"), a kanban task is created with a ## Human Decision Needed section. The reviewer answers in a follow-up session.
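A task of this kind could be rendered as shown below. Only the `## Human Decision Needed` heading comes from this document. The rest of the layout, the `ImprovementTask` fields, and `renderTaskMarkdown` are assumptions and may not match the actual .kanbn task format.

```typescript
interface ImprovementTask {
  title: string;
  context: string;
  questions: string[];
}

// Renders a task body. The layout is an assumption, apart from the
// "## Human Decision Needed" heading named in this document.
function renderTaskMarkdown(task: ImprovementTask): string {
  const lines = [
    `# ${task.title}`,
    "",
    task.context,
    "",
    "## Human Decision Needed",
    "",
    ...task.questions.map((question) => `- [ ] ${question}`),
  ];
  return lines.join("\n") + "\n";
}

const exampleTask = renderTaskMarkdown({
  title: "Consider extracting shared upload logic",
  context: "Two apps contain near-identical upload helpers.",
  questions: ["Should we extract this to a shared package?"],
});
```

Keeping the questions under a fixed heading makes open decisions easy to find with a plain text search across .kanbn/tasks/.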

Data flow:

rotation.json ──▶ collect candidates ──▶ Claude reviews docs vs code
                                              │
                              ┌────────────────┼───────────────┐
                              ▼                ▼               ▼
                        Update outdated   Create new     Create improvement
                        docs + Mermaid    docs           kanban tasks with
                        diagrams                         human questions
                              │                │               │
                              └────────┬───────┘               │
                                       ▼                       ▼
                                   PR (needs review)    Tasks in .kanbn/tasks/
                                   Slack #ai-janitor    (human answers later)

Future Ideas (Backlog)

Job                      Question                                              Complexity
───────────────────────  ────────────────────────────────────────────────────  ──────────────────────────────────
Type Safety Progression  How many any types remain? Are we getting stricter?   Low: grep + count
API Contract Validator   Do our API responses match their documented schemas?  Medium: needs staging access
Test Coverage Reporter   Is test coverage increasing or dropping?              Low: Jest coverage report
Accessibility Audit      Are our pages WCAG compliant?                         Medium: needs axe-core + browser
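As an illustration of why the first idea is rated low complexity, a "grep + count" for explicit `any` can be a few lines. This is a rough textual heuristic with hypothetical names (`ANY_PATTERN`, `countAnyUsages`, `countAnyInSources`). It also counts matches inside comments and string literals, and a compiler-based check would be more accurate.

```typescript
// Matches `: any`, `as any`, and `<any>`; a textual heuristic, not a parser.
const ANY_PATTERN = /(:\s*any\b|\bas\s+any\b|<any>)/g;

// Counts explicit `any` usages in one source string.
function countAnyUsages(source: string): number {
  const matches = source.match(ANY_PATTERN);
  return matches === null ? 0 : matches.length;
}

// Sums the counts over several source strings (for example, file contents).
function countAnyInSources(sources: readonly string[]): number {
  return sources.reduce((total, source) => total + countAnyUsages(source), 0);
}
```

Tracking the nightly total over time would answer the "are we getting stricter?" half of the question.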

Memory & State

Nightly jobs persist state in .kanbn/memory/:

File                               Purpose                                        Updated by
─────────────────────────────────  ─────────────────────────────────────────────  ──────────────────────
dead-code-rotation.json            Rotation index, scan history (last 10)         ② Dead Code Cleanup
board-health.json                  Board metrics, recurring flags, priority list  ⑦ Kanban Hygiene
performance-baselines.json         Bundle sizes, build times (planned)            ⑧ Performance Baseline
architecture-review-rotation.json  Rotation index, review history (last 10)       ⑨ Architecture Review

Implementation Priority

Phase           Jobs  Rationale
──────────────  ────  ──────────────────────────────────────────────────────────
Phase 1 (done)  ①②⑦   Foundation: daily overview, dead code cleanup, kanban hygiene
Phase 2 (next)  ③     Add new code reviewer
Phase 3         ④⑥    Boy scout + dependency health: high automation value
Phase 4         ⑤⑧    Documentation + performance: requires more design work
Phase 5 (done)  ⑨     Architecture review: doc alignment and improvement discovery

File Index

.github/
├── workflows/
│   ├── daily-overview.yml              ① trigger
│   ├── daily-overview-post.yml         ① Slack relay
│   ├── nightly-dead-code-cleanup.yml   ② trigger
│   ├── nightly-product-verification.yml ⑦ legacy (disabled)
│   ├── nightly-kanban-hygiene.yml      ⑦ trigger (active)
│   └── nightly-architecture-review.yml ⑨ trigger
├── prompts/
│   ├── daily-overview.md               ① prompt
│   ├── nightly-dead-code-cleanup.md    ② prompt
│   ├── nightly-pv-review.md            ⑦ legacy prompt
│   ├── nightly-kanban-hygiene.md       ⑦ prompt (Step 2)
│   └── nightly-architecture-review.md  ⑨ prompt

packages/ci-scripts/src/
├── get-daily-overview-window.ts              ① helper
├── find-daily-overview-file.ts               ① helper
├── post-daily-overview-to-slack.ts           ① Slack
├── post-dead-code-cleanup-to-slack.ts        ② Slack
├── nightly-pv-collect.ts                     ⑦ Step 1
├── nightly-pv-apply.ts                       ⑦ Step 3
├── post-product-verification-to-slack.ts     ⑦ Slack
├── nightly-architecture-review-collect.ts    ⑨ Step 1
└── post-architecture-review-to-slack.ts      ⑨ Slack

.kanbn/memory/
├── dead-code-rotation.json            ② state
├── board-health.json                  ⑦ state
├── performance-baselines.json         ⑧ state (planned)
└── architecture-review-rotation.json  ⑨ state

.claude/skills/architecture-review/
├── SKILL.md                           ⑨ on-demand companion skill
└── references/
    ├── coverage-map.md                ⑨ doc coverage tracking
    └── review-checklist.md            ⑨ verification checklist

docs/architecture/
└── nightly-maintenance-architecture.md  This document