The problem
Code review is the bottleneck for most small teams. Junior devs merge PRs that introduce N+1 queries, leak secrets in env var comments, or silently break six downstream services. Senior devs spend half their review time on mechanical checks that a tool should catch.
I wanted a GitHub App that runs on every PR, catches the mechanical stuff automatically, and posts actionable inline comments directly on the diff: not a summary at the bottom, but a comment on the exact line that's wrong.
Architecture: 4 agents in parallel
On each PR webhook, the orchestrator spawns 4 specialist agents that run concurrently:
- Security agent: Regex + entropy analysis for secrets, hardcoded credentials, dangerous patterns (eval, shell injection).
- Quality agent: N+1 detection, cyclomatic complexity heuristics, dead code patterns.
- Blast Radius agent: Import graph traversal to find how many modules depend on changed files. Flags PRs that touch high-fan-out code.
- LLM agent: Sends the diff to an LLM for logical review, covering things static analysis can't catch, such as wrong business logic or missing edge cases.
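The Security agent's regex-plus-entropy approach can be sketched in a few lines of Python. This is an illustration, not the app's code: the two known-format patterns, the 20-character minimum token length, and the 4.0 bits-per-character threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Illustrative known-format patterns; a real scanner would carry many more.
KNOWN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

# Candidate tokens: long runs of characters typical of keys and tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, estimated from character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_line(line: str, threshold: float = 4.0) -> list[str]:
    """Return the names of every rule that fires on one line of the diff."""
    findings = [name for name, pattern in KNOWN_PATTERNS.items() if pattern.search(line)]
    for token in TOKEN_RE.findall(line):
        if shannon_entropy(token) >= threshold:
            findings.append("high_entropy_string")
    return findings
```

Entropy alone produces false positives on legitimately random-looking strings, which is why it is usually paired with known-format patterns like the two above.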
Each agent is independently time-boxed. If any agent times out, the others still complete and post their comments. The orchestrator collects results and maps each finding to a file path + line number for the GitHub inline comment API.
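Independent time-boxing maps naturally onto `asyncio.wait_for` plus `asyncio.gather`. The sketch below assumes each agent is an async function that takes the diff and returns a list of findings; `Finding`, `run_agent`, `to_review_comment`, and `orchestrate` are hypothetical names, and the output dicts use the `path`/`line`/`side`/`body` fields accepted in the `comments` array of GitHub's "create a review for a pull request" endpoint.

```python
import asyncio
import logging
from dataclasses import dataclass


@dataclass
class Finding:
    agent: str
    path: str  # file path relative to the repo root
    line: int  # line number in the new version of the file
    message: str


async def run_agent(name, agent_fn, diff, timeout_s):
    """Run one agent under its own time box; a timeout yields no findings instead of an error."""
    try:
        return await asyncio.wait_for(agent_fn(diff), timeout=timeout_s)
    except asyncio.TimeoutError:
        logging.warning("agent %s timed out after %.1fs", name, timeout_s)
        return []


def to_review_comment(f: Finding) -> dict:
    # One entry of the "comments" array for GitHub's
    # "create a review for a pull request" REST endpoint.
    return {"path": f.path, "line": f.line, "side": "RIGHT",
            "body": f"[{f.agent}] {f.message}"}


async def orchestrate(agents, diff, timeout_s=5.0):
    """Run all agents concurrently and flatten their findings into inline comments."""
    batches = await asyncio.gather(
        *(run_agent(name, fn, diff, timeout_s) for name, fn in agents.items())
    )
    return [to_review_comment(f) for batch in batches for f in batch]
```

Because a timed-out agent returns an empty list instead of raising, `gather` still returns every other agent's findings. An agent that raises some other exception would still fail the whole `gather` in this sketch; whether `run_agent` should also swallow those is a judgment call.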
Zero-LLM static analysis
The Security and Quality agents do not call any LLM. They run pure static analysis, which is fast, deterministic, and free. This matters for two reasons:
- Latency: static analysis finishes in milliseconds; LLM calls take seconds.
- Cost: most of the issues in a typical PR are obvious ones (a hardcoded API key, a missing null check) that don't need LLM reasoning. Only the LLM agent incurs API cost.
Security: customer key management
Users bring their own LLM API keys, and storing them in plaintext in Firestore is a non-starter. Keys are encrypted with GCP KMS (AES-256) before storage and decrypted at request time in the worker. The application never holds the KMS key itself (the key material stays inside Cloud KMS); Firestore stores only the ciphertext.
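With the official `google-cloud-kms` Python client, the encrypt-before-store and decrypt-at-request-time steps are each a single call. The sketch below takes the client as a parameter so it can be stubbed in tests; `encrypt_api_key` and `decrypt_api_key` are hypothetical helper names, and the surrounding Firestore reads and writes are omitted.

```python
# Sketch: encrypt a customer's LLM API key with Cloud KMS before storing it,
# and decrypt it again when a worker needs to call the LLM.
# `client` is expected to behave like google.cloud.kms.KeyManagementServiceClient.


def encrypt_api_key(client, key_name: str, api_key: str) -> bytes:
    """Return KMS ciphertext for the API key; only this value is written to storage."""
    response = client.encrypt(
        request={"name": key_name, "plaintext": api_key.encode("utf-8")}
    )
    return response.ciphertext


def decrypt_api_key(client, key_name: str, ciphertext: bytes) -> str:
    """Ask KMS to decrypt stored ciphertext; the key material itself stays in KMS."""
    response = client.decrypt(request={"name": key_name, "ciphertext": ciphertext})
    return response.plaintext.decode("utf-8")
```

In production the client would be `kms.KeyManagementServiceClient()` (after `from google.cloud import kms`), and the key name could be built with `client.crypto_key_path(project, location, key_ring, crypto_key)`; the project, ring, and key names are placeholders you'd supply.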
Testing
222 passing tests across all four agents, the orchestrator, and webhook deduplication. Each agent has unit tests against known-bad code patterns. The orchestrator has integration tests with mocked GitHub webhooks. Webhook dedup has fuzz tests to verify idempotency under repeated delivery.
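The source doesn't say what key webhook deduplication uses. One natural choice is the `X-GitHub-Delivery` header, a unique ID GitHub sends with each delivery. The sketch below assumes that key and uses an in-memory dict with a TTL as a stand-in for a shared store; `DeliveryDeduper`, `first_time`, and the 24-hour TTL are hypothetical.

```python
import time


class DeliveryDeduper:
    """Remembers recently seen delivery IDs so a repeated delivery is processed once."""

    def __init__(self, ttl_s: float = 24 * 3600, clock=time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock  # injectable so tests can control time
        self._seen: dict[str, float] = {}

    def first_time(self, delivery_id: str) -> bool:
        """True the first time an ID is seen within the TTL, False on repeats."""
        now = self._clock()
        # Drop expired entries so memory stays bounded.
        for d in [d for d, t in self._seen.items() if now - t > self._ttl_s]:
            del self._seen[d]
        if delivery_id in self._seen:
            return False
        self._seen[delivery_id] = now
        return True
```

With more than one worker, the check-and-insert would need to be atomic in a shared store (for example, a create-if-absent write); otherwise two workers could both treat the same delivery as new. An idempotency test can replay sequences with many repeated IDs and assert that each ID is processed exactly once.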
What I'd improve
The blast radius agent currently traverses only one level of the import graph, so deep transitive dependencies aren't flagged: if A imports B and B imports C, a change to C flags B but not A. Full transitive traversal is the obvious next step, though it requires caching the dependency graph between PRs to stay fast.
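Moving from one level to full transitive traversal amounts to a breadth-first search over the reverse import graph (each module mapped to the modules that import it). The sketch assumes the import graph is already available as a dict from module name to the names it imports; all function names are hypothetical, and the `seen` set keeps import cycles from looping forever.

```python
from collections import deque


def reverse_graph(imports: dict[str, set[str]]) -> dict[str, set[str]]:
    """Turn 'module -> modules it imports' into 'module -> modules that import it'."""
    importers: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)
    return importers


def direct_dependents(importers: dict[str, set[str]], changed: list[str]) -> set[str]:
    """One level only: modules that import a changed module directly."""
    return {m for c in changed for m in importers.get(c, set())} - set(changed)


def transitive_dependents(importers: dict[str, set[str]], changed: list[str]) -> set[str]:
    """Breadth-first search outward from the changed modules through their importers."""
    seen = set(changed)
    queue = deque(changed)
    affected: set[str] = set()
    while queue:
        module = queue.popleft()
        for importer in importers.get(module, set()):
            if importer not in seen:  # `seen` also stops import cycles from looping
                seen.add(importer)
                affected.add(importer)
                queue.append(importer)
    return affected
```

For the A/B/C case, `reverse_graph({"A": {"B"}, "B": {"C"}})` gives a one-level result of `{"B"}` for a change to C and a transitive result of `{"A", "B"}`. The reversed graph is the piece worth caching between PRs; keying it by the base commit SHA is one option, offered here as an assumption rather than the author's design.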