In December 2025 I shipped a content platform called bkamp.ai. Eleven microservices. A Next.js portal. GitOps on EKS. A cloud setup I had never deployed before. The first production merge went out nine days after the first commit. One person, with Claude Code as a second pair of hands.
Four months and about 1,500 commits later, that workflow has been packed into a Claude Code plugin called bkit. This post is two things at once. It's the story of how those 9 days actually went. And it's a write-up of what those 9 days taught.
The result first: 9 days, 11 services, one person

Here are the numbers, before the story:
| Metric | Value |
|---|---|
| Repository | popup-studio-ai/bkamp-portal |
| Day 0 | 2025-12-01 (Mon) |
| First production merge | 2025-12-09 20:11 KST, PR #21 |
| Time to launch | 9 days (D+8) |
| Commits to launch | 207 across 25 PRs |
| Microservices at launch | 11 (Auth, User, Project, Content, Community, Chat, Media, Search, Admin, Notification, Recipe) |
| Claude Co-Authored ratio (final 14 days) | 177/320 = 55% |
| Structured docs produced in first 14 days | 70+ |
That last row is the one that explains all the others. Most "AI shipped a project in N days" stories speed up prompt typing. This one did the opposite. It put the spec first, numbered each doc, and broke work into units the AI could translate. The model wrote a lot of code. It almost never picked what code to write.
The rest of this post follows the real timeline. Then it pulls out the seven patterns that show up everywhere. Then it walks through how those patterns are now baked into bkit.
Day 0: write the rules before writing code
The first four commits to bkamp-portal had zero lines of business
logic. They had this:
| Commit | Contents |
|---|---|
baeb2aff | README.md + .claude/instructions/CLAUDE.md (159 lines) + docs/00-requirement/ |
ab8b00d8 | Commit-message convention added to the Claude instruction file |
70110820 | A market-strategy PDF in docs/00-requirement/ |
ace3314c | A short brand/strategy video as a binary file |
That CLAUDE.md is the load-bearing piece. It draws a PDCA cycle as ASCII
flow. It tells the AI to do Plan→Do→Check→Act per task. It names me (kay)
as the verifier of every Claude output. It demands Korean for commit
messages. It requires TodoWrite for any non-trivial change. About 100
lines of useful meta-rules.
Those 100 lines shaped the other 1,170 commits. Every PR after this inherited the rhythm, the language split, the named verifier, and the doc-numbering habit from one file. Day 0 is when you build the house the AI is going to live in. Skip it and you'll argue about conventions inside every prompt for the rest of the project.
Day 1: 11 microservices in 24 hours (because the spec was ready)
Day 1 produced 17 commits and merged PRs #1–#3. By the end of day, the
services/ folder held Auth, User, Project, Content, Community, Chat,
Media, Search, Admin, Notification, and a shared/ package. Eleven
Clean-Architecture skeletons with Pydantic schemas, FastAPI routers,
and docker-compose wiring. From scratch.
That sounds like a story about how smart the model is. It isn't. Read the commits in order:
45302a2a market analysis report
57b1a6ba system architecture design doc
7d7199c2 brand rename: bkii → bkamp (sweep)
8152bd1a PostgreSQL schema design doc
e5e43b2b static mockup pages
b4d1c7a8 API contract spec
807d5fde realtime + architecture refinement
4d4e9d1e Phase 1: environment
01bdc15b Phase 2/3: MVP core
a13f18fe Phase 3 extension + Gap analysis → PR #1, #2, #3 mergeSix design docs land before the first scaffold commit. By the time Claude Code is asked to build the eleven services, the spec for each one is already a numbered document. The work unit isn't "build a chat service." It's "translate Document 7 §3.2 into FastAPI." Eleven of those translations fit in a day, because the model never had to invent boundaries. It just rendered specs it had already been handed.
This is the first speed trick. The spec is the bottleneck. When it's ready in advance, the keyboard isn't.
Day 3: when terms drift, the AI drifts
Day 3 is the most useful day to study. The platform had two related ideas — voting on showcase entries, and "liking" community posts. The codebase used both terms in different places. API responses, DB columns, Service Worker push payloads, and admin UI labels all said different things.
The fix is in the order of the commits:
| Commit | Contents |
|---|---|
3a2d2154 | Add a "Like vs Vote terminology guide" + update coding convention |
a52ef726 | Sweep upvote → like everywhere; produce Gap Analysis report v2 |
f0db2ff5 | Add totalLikes, upvotes fields to admin API types |
b8f43cef | Apply terminology guide consistently |
9d3b2413 | Firebase Service Worker payload: upvote → like |
c39263fa | Refactor: collapse vote/upvote concept into like |
A guide-document commit comes first. Then the sweep. The model needs one written, link-targetable definition of "Like" vs "Vote" to stay consistent across six layers of the codebase. Without that doc, the same prompt makes different terms in different sessions.
The lesson is plain and important: the LLM mirrors the mess it sees. Pin the term in a doc, then sweep — that beats asking the model to be consistent on top of an inconsistent codebase. That same day also rolled in "unified logging," "environment variable cleanup," and "OAuth error standardization" all at once. A planned cross-cutting day to clear future drift in one pass.
Day 4: checkpoint, then tear it down
Day 4 made the boldest call of the launch. Rebuild three days of frontend on top of shadcn/ui, and lift the codebase into a monorepo at the same time.
ee56f2b3 checkpoint: hero/cards/buttons polish ("rollback point")
4e633430 refactor: Portal frontend rebuilt on shadcn/ui
122b1ce1 feat: extract packages/ui; expand seed data 10×
... PR #7 (Major Refactoring)Two things matter here. First, the checkpoint commit has the words
"rollback point" right in its message. There's one known-good spot to
return to if the rebuild fails. Second, the rewrite is bundled with a
structural fix (packages/ui extraction). It isn't "redo the UI." It's
"redo the UI on the structure we should have started with."
This is the third speed trick: safety net first, then nerve. A 9-day timeline doesn't survive being timid on Day 4. It also doesn't survive having no escape route from a bold call.
Day 8: the infrastructure big bang
For seven days the infra/ folder didn't exist. Then Day 8 ran 56
commits in 24 hours — one every 26 minutes on average — and shipped:
42b74321 Terraform AWS (VPC, EKS, RDS, ElastiCache, ALB)
cbdb4564 K8s manifests (kustomize base + staging overlay)
e91880b1 GitHub Actions CI/CD: 5 workflows
(ci, build-backend, build-frontend, deploy-staging, deploy-prod)
301a4a43 Production K8s overlay + ArgoCD application
3414ca2f CORS + production OAuth
... PR #8 (GitOps pipeline + portal complete)
... PR #9–#17 (GitOps simplification storm: image-tag auto-update, ArgoCD sync)
b35c2b17 PR #21 staging→production release ← LIVE 20:11 KSTThis 24-hour sprint is only possible because of the seven days before it. Backend and frontend were kept in a state where they "just need to be put on rails." The conventions, env var layout, and Docker boundaries had been pre-aligned. Nine of seventeen Day-8 PRs are GitOps-simplification PRs. That's the visible signature of the alignment work.
The next day was hotfix mode. ALB to Nginx Ingress. CloudFront CDN in front of media buckets. Redis-backed view counters. The image-tag auto-update workflow was already running on Day 9, which is why each hotfix took minutes, not hours.
What bkamp is now, four months later
That same codebase has since grown to 19 microservices. It made an i18n trip from 8 languages back to 2. It picked up an MCP v2 surface that exposes bkamp's data to outside AI agents. It added a competition domain. The homepage you see at the top of this post is the visible tip. Below the surface sit the community feed and showcase that the launch was built to power.


Two PRs from after the launch are worth flagging. They make the workflow easy to read. PR #249 (April 2026) rolled the i18n surface back from 8 languages to ko/en, on purpose, after data showed 86% of translation traffic was being burned on languages with tiny read share. The PR kept DB rows and OpenSearch fields in place rather than deleting them. A reversible step back, by design. PR #245 shrank the Redis PVC from 8Gi to 1Gi, with a cost analysis doc as a co-author on the call. Both PRs read like PDCA Act-phase work, not "let's clean up." That's not luck. The first 100 lines of CLAUDE.md set the rhythm and it stuck.
Seven patterns of "vibe coding," extracted
Watch the 1,170-commit timeline from a distance and the same seven patterns repeat. They're the rules I'd hand to anyone trying to copy this kind of timeline:
| # | Pattern | What it looks like |
|---|---|---|
| 1 | Day 0 meta-rules | A 100–200 line CLAUDE.md before any business logic. Names the verifier, the cadence, the language split |
| 2 | Korean intent / English code / Korean commit | Plan in your strongest language; let the model write code in its strongest. AI is great at the translation |
| 3 | Numbered docs as work units | "Document 28 §3" beats "build the chat service." Drops output variance to near zero |
| 4 | One PDCA cycle per day | Plan→Do→Check→Act per day; one PR ≈ one cycle. Keeps context fresh and scope finite |
| 5 | Cross-cutting day | Logging, env vars, terms, design system — bundled together on one day so feature days stay clean |
| 6 | Checkpoint then rebuild | Mark the rollback spot, then make the bold call. Day 4 shadcn rebuild and Day 8 infra big bang are both this |
| 7 | Spec-first, code-second | Six design docs before the first scaffold. The keyboard is not the bottleneck |
The thread that ties these together isn't "the model is smart." It's "the human work moves earlier in the cycle." The author still has to write the design doc, write the convention, pick the rollback point. What changes is that those notes become inputs to a translation step, not stuff added to the code after the fact.
For a closer look at one of these patterns, the harness engineering post goes deeper on why the workflow around the model matters more than which model you pick.
From practice to plugin: meet bkit
Exactly one month after the bkamp launch — 2026-01-09 — I opened a new repo called bkit-claude-code. The slogan is one sentence:
The only Claude Code plugin that verifies AI-generated code against its own design specs.

bkit exists because the seven patterns above are hard to keep up by willpower across many sessions. A 9-day push is doable. A 12-month one isn't. The plugin makes the patterns the default, not the discipline.
The v2.1.12 surface area, in numbers:
| Surface | Count | Purpose |
|---|---|---|
| Skills | 43 | Structured domain knowledge invocable as /skill |
| Agents | 36 | Roles with model + tool + memory constraints |
| Hook events | 21 | Pre/Post/SessionStart points the plugin observes |
| Lib modules | 142 | The actual code, partitioned across 4 architecture layers |
| MCP servers | 2 | bkit-pdca (10 tools) + bkit-analysis (6 tools) |
| Output styles | 4 | Learning / PDCA-guide / Enterprise / PDCA-Enterprise |
These are the surfaces. The patterns live inside them. The Day-0
meta-rule habit is now an auto-injected SessionStart context. The
numbered-doc habit is now docs/01-plan/features/{feature}.plan.md
through docs/04-report/features/{feature}.completion-report.md, with
a strict folder schema. Korean+English intent is now an 8-language
intent router (lib/intent/) plus a KO/EN translation pool with a
6-language fallback.
PDCA as a state machine, not a vibe
The single most important call in bkit was modeling PDCA as a written
finite state machine, not as etiquette. The file
lib/pdca/state-machine.js defines:
- States (11):
idle, pm, plan, design, do, check, act, qa, report, archived, error - Events (22):
START, PM_DONE, PLAN_DONE, DESIGN_DONE, DO_COMPLETE, MATCH_PASS, ITERATE, ANALYZE_DONE, QA_PASS, ROLLBACK, RECOVER, RESET, ERROR, … - Transitions (25): forward path
idle→pm→plan→design→do→check→(qa|act)→report→archivedplus an iteration loopcheck ──ITERATE→ act ──ANALYZE_DONE→ check - Guards (9):
guardDeliverableExists,guardDesignApproved,guardMatchRatePass,guardCanIterate(max 5 iterations),guardCheckpointExists, …

The Match Rate ≥ 90% threshold sits in exactly one place
(bkit.config.json:67). It became a single source of truth in v2.1.10
after we caught a 100/90 mismatch between the doc and the gate. That's
the kind of bug bkit is built to make impossible.
The shift from bkamp to bkit isn't "now the AI is smarter." It's "now
the workflow is checkable." When gap-detector reports that the code
matches the design at 87%, pdca-iterator is the agent that re-enters
Act, fixes the gap, and re-runs the gate — up to five times. The human
in the loop reviews outcomes. The loop itself runs without prodding.
For the method behind that loop, the
PDCA-for-Claude-Code post is the
deep dive.
The payoff: 79 straight Claude Code releases
The discipline pays off in numbers you can see from outside:
- 79 straight compatible CC releases (v2.1.34 → v2.1.118+) without a break
- 117+ test files / 4,000+ test cases with zero failures on
main - Invocation Contract L1–L5 with 226 CI-gated assertions that re-run on every push — the public surface of the plugin can't change shape silently
- Domain-purity CI (
scripts/check-domain-purity.js) blocksfs,child_process,net,http,osfrom enteringlib/domain/— the architectural boundary is enforced by code - Docs=Code CI (
scripts/docs-code-sync.js) blocks drift across 8 architecture counters (Skills, Agents, Hook events, Lib modules, MCP servers, …), 5 BKIT_VERSION locations, and 5 one-liner SSoT pins
The reason 79 straight CC versions haven't broken bkit isn't luck. It isn't "we move fast." It's that the contract between bkit and Claude Code is written down as 226 assertions, and every commit re-proves them. The same idea bkamp used on Day 0 — pin the convention before you write code — is now an automatic property of the bkit codebase itself.
Takeaways
Five things to walk away with:
- Day 0 isn't optional. The 100 lines of CLAUDE.md decide the shape of the next 1,000 commits more than any model upgrade does.
- Move human work earlier in the cycle. Specs, conventions, and rollback points all belong before the keyboard, not after the bug.
- The model is a translator, not an author. Numbered design docs turn "build X" into "render Document N §3." Variance drops, output compounds.
- Checkpoint, then be brave. The Day 4 shadcn rewrite and the Day 8 infra big bang were possible because there was a known-good commit to return to. No safety net, no nerve.
- Bottle the discipline. Personal willpower scales to one project. A plugin like bkit scales it to every session. The gap between bkamp and bkit is the gap between a 9-day sprint and a workflow you can hand to someone else.
If any of this is useful to your own work, the source is open: github.com/popup-studio-ai/bkit-claude-code. And the platform that proved the method is live at bkamp.ai.
Terms used in this post
ArgoCD — A tool that watches a Git repo and keeps a Kubernetes cluster in sync with what the repo says. Push to Git, the cluster updates.
Clean Architecture — A way to lay out code so the business rules don't depend on the framework or the database. Easy to test, easy to swap out parts.
Domain-purity CI — A check that runs on every commit. It blocks the core "domain" folder from importing things like file system or network code, so the boundary stays clean.
GitOps — A workflow where the live system is described by files in a Git repo. Want to change the system? Edit the files and merge. A bot applies the change.
Harness engineering — Building the rails around an AI model — the prompts, hooks, checks, and rollback points — instead of just chatting with the model. The rails matter more than the model.
Invocation Contract — A written-down list of what a tool's outside surface looks like (commands, inputs, outputs). Tests re-prove it on every push, so callers don't break by surprise.
K8s (Kubernetes) — A system that runs and manages many small server processes (containers) across a fleet of machines. Restarts them, scales them, networks them.
Match Rate — A score from bkit. It compares the finished code to the design doc and returns a percent. If it's below the threshold (default 90%), the cycle loops back to fix the gap.
MCP — Model Context Protocol. A standard way for AI tools to call external functions. bkit ships two MCP servers so the model can read PDCA state and run analyses.
MSA (microservices) — An app split into many small services that talk over the network, instead of one big program. Each service can be built and deployed on its own.
PDCA — Plan, Do, Check, Act. A four-step loop for getting work done. You plan, you do it, you check the result, then you act on what you learned and start again.
State machine — A list of named states (like plan, do, check) and the rules
for moving between them. If a move isn't on the list, it can't
happen. Stops the workflow from going off the rails.
Terraform — A tool that builds cloud infrastructure (servers, networks, databases) from text files. Edit the files, run the tool, the cloud changes to match.
FAQ
Could anyone reproduce the 9-day timeline?
Not by typing faster. The compression came from work done before Day 1: a 159-line meta-rule file, a numbered design archive, and a habit of writing the spec before asking the model to scaffold. Without that scaffolding, even a faster model just produces 70% slop faster. With it, scaffolding eleven microservices in a day is a translation problem, not an authoring one.
Why turn a personal workflow (bkamp) into a public plugin (bkit)?
Because the workflow that made bkamp possible was not in the code — it was in the conventions, the document numbering, the daily PDCA cadence, and the checkpoint-then-rewrite habit. None of that travels with a Git repo. bkit packages it as Claude Code Skills, Agents, Hooks, an MCP server, and a state machine, so the next person doesn't have to re-derive it from scratch.
Does bkit only work with Claude Code?
Today, yes — the harness is built on CC's plugin surface (Skills, Agents, Hooks, MCP). The patterns it encodes — Plan/Design/Do/Check/Act, Match Rate gates, trust-graduated automation, port/adapter purity — are model-agnostic. A Codex- or Cursor-shaped fork would be feasible but does not exist yet.
Is 'vibe coding' just hype?
Read literally — typing prompts and accepting whatever comes back — yes, it is hype, and it produces brittle code. Read as a discipline — context engineering, design-first, AI-evaluator loops, automatic checkpoints — it is just engineering with a new tool in the loop. The 9-day bkamp launch is empirical evidence that the disciplined version works. The seven patterns below are the rules of the disciplined version.
Related reading: