Field Guide · May 20, 2026 · Last updated 2026-05-21 · 15 min read
Vibe-Coded Repo Cleanup With Claude Code or Codex

Questions this page answers
- How do I clean up a vibe-coded repo without rewriting everything?
- What should Claude Code or Codex verify before opening a cleanup PR?
- Which repo signals tell me an AI-generated codebase is unsafe to ship?
- Why does long-running cleanup work need a persistent development host?
The Cleanup Order That Actually Works
| Phase | Goal | Agent job |
|---|---|---|
| Freeze | Stop behavior drift before refactoring. | Document current flows, routes, env vars, and known failures. |
| Prove | Create tests around the paths users rely on. | Add smoke tests, fixtures, and golden-path scripts. |
| Map | Find dead files, duplicate models, circular imports, and unsafe state. | Generate a risk map and propose small PR boundaries. |
| Cut | Remove waste only after tests can catch regressions. | Delete unreachable paths, merge duplicate utilities, and simplify config. |
| Harden | Make the repo safe for the next agent. | Update AGENTS.md, docs, runbooks, and CI checks. |
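The Freeze phase asks the agent to document env vars before anything changes. One way to seed that inventory is to compare the variables the code reads with the ones a sample env file documents. The sketch below is a minimal illustration, not a complete scanner: the regex patterns, the `.env.example`-style input, and every function name are assumptions introduced here, and it will miss variables that are read through wrappers or built dynamically.

```python
import re
from pathlib import Path

# Assumed access patterns: os.environ["X"], os.environ.get("X"),
# os.getenv("X") in Python and process.env.X in JavaScript/TypeScript.
ENV_PATTERNS = [
    re.compile(r"""os\.environ(?:\.get)?[\[(]\s*["']([A-Z][A-Z0-9_]*)["']"""),
    re.compile(r"""os\.getenv\(\s*["']([A-Z][A-Z0-9_]*)["']"""),
    re.compile(r"""process\.env\.([A-Z][A-Z0-9_]*)"""),
]


def env_vars_in_text(text):
    """Return the env var names referenced in one source file's text."""
    found = set()
    for pattern in ENV_PATTERNS:
        found.update(pattern.findall(text))
    return found


def documented_vars(env_example_text):
    """Return the names declared in a NAME=value style sample env file."""
    names = set()
    for line in env_example_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.split("=", 1)[0].strip().removeprefix("export ").strip())
    return names


def env_drift(source_texts, env_example_text):
    """Compare variables the code reads with variables the docs declare."""
    used = set()
    for text in source_texts:
        used |= env_vars_in_text(text)
    documented = documented_vars(env_example_text)
    return {
        "undocumented": sorted(used - documented),
        "unused": sorted(documented - used),
    }


def read_sources(root, suffixes=(".py", ".js", ".ts")):
    """Yield file contents under root, skipping vendored and VCS directories."""
    skip = {".git", "node_modules", ".venv"}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes and not skip & set(path.parts):
            yield path.read_text(encoding="utf-8", errors="ignore")
```

Calling `env_drift(read_sources("."), Path(".env.example").read_text())` gives the agent two lists to reconcile in the frozen documentation. Treat the "unused" list as a prompt for investigation rather than a deletion list, since deployment scripts may read those variables.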
Why AI-Written Repos Rot So Quickly
Vibe coding rewards local progress. A generated feature compiles, the demo works, and nobody asks whether the new helper duplicated an older helper, skipped the migration path, or hid a state bug behind a happy-path browser session.
- Parallel abstractions appear because each prompt sees a partial repo.
- Tests are added around generated behavior instead of business behavior.
- Old files stay alive because nobody knows which imports are safe to remove.
- Env vars, background jobs, and auth callbacks drift away from docs.
- Agents keep patching symptoms because the repo has no shared operating rules.
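Parallel abstractions are often visible as the same helper name defined in more than one module. The sketch below, which assumes a Python codebase and uses only the standard `ast` module, lists top-level function names that appear in several files. The function names and the `modules` mapping are illustrative, and a shared name is only a hint: two helpers with the same name may still encode different assumptions.

```python
import ast
from collections import defaultdict


def top_level_functions(source):
    """Return the names of functions defined at module level."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def duplicate_helpers(modules):
    """Map each function name defined in more than one module to those modules.

    `modules` maps a module path to its source text.
    """
    owners = defaultdict(set)
    for module_path, source in modules.items():
        for name in top_level_functions(source):
            owners[name].add(module_path)
    return {name: sorted(paths) for name, paths in owners.items() if len(paths) > 1}
```

The output is a review list for a human or an agent, not a merge plan. Each pair still needs a behavior comparison before one helper absorbs the other's call sites.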
Run A First-Pass Repo Triage
Give the agent a narrow inspection prompt before allowing edits. The first output should be a map, not a patch.
Inspect this repo without editing files.
Return:
1. Entry points and user-visible workflows
2. Build, lint, typecheck, test, and e2e commands
3. Duplicated concepts or modules
4. Dead files or unreachable paths
5. State, auth, billing, and data migration risks
6. Three smallest cleanup PRs with rollback plans

That prompt keeps Claude Code or Codex in investigator mode. Once the map is useful, convert each cleanup into a bounded task with a test requirement and a rollback note.
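Once the agent has reported the build, lint, typecheck, and test commands, it helps to record which of them pass before any cleanup starts, so later failures can be attributed to the cleanup rather than to pre-existing breakage. The sketch below is one possible baseline runner. The command table is a hypothetical placeholder; substitute whatever commands the triage actually found.

```python
import subprocess
import sys

# Hypothetical placeholder: replace with the commands the triage reported.
EXAMPLE_COMMANDS = {
    "compile": [sys.executable, "-m", "compileall", "-q", "."],
}


def run_baseline(commands, cwd=None, timeout=600):
    """Run each named command and record whether it passed.

    `commands` maps a label to an argv list. Output is truncated to its
    last 2000 characters so the report stays readable.
    """
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
            )
            results[name] = {
                "ok": proc.returncode == 0,
                "returncode": proc.returncode,
                "tail": (proc.stdout + proc.stderr)[-2000:],
            }
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing binary or a hung command is recorded as a failure too.
            results[name] = {"ok": False, "returncode": None, "tail": repr(exc)}
    return results
```

Saving the returned dictionary alongside the risk map gives each cleanup PR a known starting point: a check that was already red before the PR should not be counted against it.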
Use A Cleanup PR Template Agents Can Follow
## Cleanup claim
This PR removes or simplifies: ...
## Proof
- [ ] Existing behavior covered by tests
- [ ] New regression test added when behavior was unclear
- [ ] Lint/typecheck/build run
- [ ] Manual smoke path checked
## Risk
- Files touched:
- State or data migration:
- Rollback:
## Agent notes
- What was intentionally left alone:
- What should be cleaned next:

This is where an always-on workspace helps. Long cleanup passes need repeated test runs, dependency installs, browser smoke tests, and review loops. If the machine sleeps or loses terminal state, the agent loses the thread.
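A template only helps if it is actually filled in, and that part can be checked mechanically before review. The sketch below assumes the PR description is available as a plain string and uses the section names from the cleanup template above. The function name is illustrative, and requiring every checkbox to be ticked is a deliberately strict assumption that a team may want to relax for the conditional regression-test item.

```python
import re

REQUIRED_SECTIONS = ["Cleanup claim", "Proof", "Risk", "Agent notes"]
CHECKBOX = re.compile(r"^\s*-\s*\[( |x|X)\]\s*(.+)$")
HEADING = re.compile(r"^##[ \t]+(.+)$", flags=re.MULTILINE)
ROLLBACK = re.compile(r"^-[ \t]*Rollback:[ \t]*(.*)$", flags=re.MULTILINE)


def check_pr_body(body):
    """Return a list of problems found in a cleanup PR description."""
    problems = []

    headings = {match.group(1).strip() for match in HEADING.finditer(body)}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")

    for line in body.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            problems.append(f"unchecked: {match.group(2).strip()}")

    rollback = ROLLBACK.search(body)
    if not rollback or not rollback.group(1).strip():
        problems.append("rollback plan is empty")

    return problems
```

An empty list means the description is structurally complete. It says nothing about whether the claims are true, so the proof items still need a reviewer.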
What To Delete First
| Candidate | Delete when | Do not delete when |
|---|---|---|
| Unused files | Static analysis, tests, and grep agree they are unreachable. | They are generated, dynamically imported, or used by deployment scripts. |
| Duplicate helpers | One helper can absorb call sites without changing behavior. | They encode different auth, billing, or data assumptions. |
| Old migrations | Production state has already crossed the boundary. | Local dev, tests, or customer tenants still need them. |
| Generated docs | They contradict working commands or shipped behavior. | They capture a decision the team still relies on. |
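The "unused files" row depends on static analysis as one of three agreeing signals. For a Python codebase, a rough version of that signal is import reachability from known entry points. The sketch below assumes modules are given as a mapping from dotted module name to source text. It ignores relative imports and cannot see dynamic imports or files used by deployment scripts, which is why the table still asks for tests and grep to agree before anything is deleted.

```python
import ast


def imported_modules(source):
    """Return dotted names that a module imports with absolute imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module)
            # "from pkg import x" may refer to the submodule pkg.x.
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def unreachable_modules(modules, entry_points):
    """Return modules that no entry point imports, directly or transitively.

    `modules` maps a dotted module name to its source text.
    """
    seen = set()
    stack = [name for name in entry_points if name in modules]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for name in imported_modules(modules[current]):
            # Importing a.b.c also loads the packages a and a.b.
            parts = name.split(".")
            for end in range(1, len(parts) + 1):
                candidate = ".".join(parts[:end])
                if candidate in modules and candidate not in seen:
                    stack.append(candidate)
    return sorted(set(modules) - seen)
```

Anything this returns is a candidate, not a verdict. A plugin loaded by name at runtime will look unreachable here, which matches the "dynamically imported" exception in the table.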
Where Hyperbox Fits
Hyperbox, a persistent macOS machine with SSH, VNC, and full desktop access, is useful when cleanup becomes an operating loop: an agent runs tests, waits for review, updates docs, retries failed checks, and keeps repo state warm across hours or days. The Mac is not magic. It is the stable place where the work keeps living.
- Keep dependency caches, browser sessions, local services, and repo state warm.
- Run Claude Code, Codex, Cursor, and smoke-test browsers from one persistent workspace.
- Use SSH for command work and VNC when cleanup requires visual inspection.
- Preserve logs and terminal history after your laptop closes.
Frequently asked questions
Should I rewrite a vibe-coded repo from scratch?
Usually no. First freeze behavior, add tests around the paths that matter, map dead paths, then refactor in small PRs. Rewrite only when the dependency graph, data model, or security model cannot be recovered safely.
Can Claude Code or Codex clean up AI-generated code?
Yes, if you give the agent a bounded runbook, tests, repo context, and a persistent workspace. Treat the agent as a refactoring operator, not as a one-shot prompt box.
Why use an always-on Mac for repo cleanup?
Cleanup runs often span hours of tests, migrations, browser checks, and review passes. An always-on Mac keeps the repo, caches, credentials, logs, and agent sessions available after your laptop sleeps.
Always-on Mac runtime
Give your agent a Mac that stays online after your laptop closes.
Hyperbox gives Codex, Claude Code, OpenClaw, and remote dev workflows a persistent macOS machine with SSH, VNC, and full desktop access.