Field Guide · May 20, 2026 · Last updated May 21, 2026 · 15 min read

Vibe-Coded Repo Cleanup With Claude Code or Codex

The scariest vibe-coded repo is not the one that looks broken. It is the one that works just enough for people to keep building on top of it. The first cleanup job is not to make it elegant. It is to make the repo observable, testable, and safe enough that Claude Code or Codex can help you repair it without inventing a second codebase beside the first one.

Figure: Hyperbox code review dashboard cleaning up an AI-generated repo. Treat vibe-coded cleanup like incident response: freeze behavior, map risk, add tests, then let agents work in bounded PRs.

Questions this page answers

How do I clean up a vibe-coded repo without rewriting everything?
What should Claude Code or Codex verify before opening a cleanup PR?
Which repo signals tell me an AI-generated codebase is unsafe to ship?
Why does long-running cleanup work need a persistent development host?

Start here

The Cleanup Order That Actually Works

Phase	Goal	Agent job
Freeze	Stop behavior drift before refactoring.	Document current flows, routes, env vars, and known failures.
Prove	Create tests around the paths users rely on.	Add smoke tests, fixtures, and golden-path scripts.
Map	Find dead files, duplicate models, circular imports, and unsafe state.	Generate a risk map and propose small PR boundaries.
Cut	Remove waste only after tests can catch regressions.	Delete unreachable paths, merge duplicate utilities, and simplify config.
Harden	Make the repo safe for the next agent.	Update AGENTS.md, docs, runbooks, and CI checks.

The practical rule

Do not ask an agent to clean the whole repo. Ask it to make one claim true, prove the claim with tests, then open a reviewable PR.
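One way to make a claim provable is to freeze current behavior with a golden (characterization) test before anything is refactored. The sketch below is a minimal illustration, not a prescribed tool: the helper name `check_golden` and the `golden/` directory are assumptions, and it only works for outputs that can be serialized as JSON.

```python
import json
from pathlib import Path


def check_golden(name: str, actual: object, golden_dir: str = "golden") -> bool:
    """Compare `actual` with a recorded snapshot; record it on the first run.

    Hypothetical helper: the snapshot captures what the code does today,
    which may include bugs. It proves "behavior did not change", nothing more.
    """
    path = Path(golden_dir) / f"{name}.json"
    # sort_keys keeps the snapshot stable regardless of dict insertion order.
    serialized = json.dumps(actual, indent=2, sort_keys=True)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(serialized, encoding="utf-8")
        return True
    return path.read_text(encoding="utf-8") == serialized
```

In a test, an assertion such as `assert check_golden("checkout_total", compute_total(cart))` (with `compute_total` standing in for whatever the repo actually exposes) turns a cleanup claim into something a reviewer can rerun. Commit the snapshot files so the agent cannot silently regenerate them.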

Why AI-Written Repos Rot So Quickly

Vibe coding rewards local progress. A generated feature compiles, the demo works, and nobody asks whether the new helper duplicated an older helper, skipped the migration path, or hid a state bug behind a happy-path browser session.

Parallel abstractions appear because each prompt sees a partial repo.
Tests are added around generated behavior instead of business behavior.
Old files stay alive because nobody knows which imports are safe to remove.
Env vars, background jobs, and auth callbacks drift away from docs.
Agents keep patching symptoms because the repo has no shared operating rules.
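Parallel abstractions are often detectable mechanically. As a rough sketch, assuming the repo is Python, the hypothetical function below groups functions whose arguments and bodies are structurally identical even when their names differ. It misses near-duplicates, so an empty result does not mean the repo is free of duplication.

```python
import ast
from collections import defaultdict


def duplicate_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions with structurally identical arguments and bodies.

    `sources` maps a file path to its Python source text.
    Returns one sorted list of "path:function" labels per duplicate group.
    """
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for path, text in sources.items():
        try:
            tree = ast.parse(text)
        except SyntaxError:
            continue  # skip files that do not parse; report them separately
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                # ast.dump omits line numbers by default, so two copies of the
                # same code in different places produce the same fingerprint.
                fingerprint = (
                    ast.dump(node.args),
                    "".join(ast.dump(stmt) for stmt in node.body),
                )
                groups[fingerprint].append(f"{path}:{node.name}")
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each group is a candidate for a "merge duplicate helpers" PR, but only after checking that the copies do not encode different assumptions at their call sites.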

Run A First-Pass Repo Triage

Give the agent a narrow inspection prompt before allowing edits. The first output should be a map, not a patch.

Inspect this repo without editing files.

Return:
1. Entry points and user-visible workflows
2. Build, lint, typecheck, test, and e2e commands
3. Duplicated concepts or modules
4. Dead files or unreachable paths
5. State, auth, billing, and data migration risks
6. Three smallest cleanup PRs with rollback plans

That prompt keeps Claude Code or Codex in investigator mode. Once the map is useful, convert each cleanup into a bounded task with a test requirement and a rollback note.
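Item 2 of the prompt can be cross-checked without an agent. The sketch below assumes the repo declares its commands in a `package.json` `scripts` block or as `Makefile` targets. The function name `discover_commands` is hypothetical, and other ecosystems would need their own readers.

```python
import json
import re
from pathlib import Path


def discover_commands(repo_root: str) -> dict[str, list[str]]:
    """List candidate build/lint/test commands declared in common manifests."""
    root = Path(repo_root)
    found: dict[str, list[str]] = {}

    package_json = root / "package.json"
    if package_json.is_file():
        try:
            manifest = json.loads(package_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            manifest = {}
        scripts = manifest.get("scripts", {}) if isinstance(manifest, dict) else {}
        if isinstance(scripts, dict):
            found["package.json"] = [f"npm run {name}" for name in sorted(scripts)]

    makefile = root / "Makefile"
    if makefile.is_file():
        # Match "target:" at line start; the lookahead skips "VAR:=value".
        targets = re.findall(
            r"^([A-Za-z0-9_-]+):(?!=)",
            makefile.read_text(encoding="utf-8"),
            flags=re.MULTILINE,
        )
        found["Makefile"] = [f"make {target}" for target in targets]

    return found
```

Comparing this list with what the agent reports is a cheap way to catch commands it invented or overlooked. The script only lists what is declared and does not confirm that any command actually passes.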

Use A Cleanup PR Template Agents Can Follow

## Cleanup claim
This PR removes or simplifies: ...

## Proof
- [ ] Existing behavior covered by tests
- [ ] New regression test added when behavior was unclear
- [ ] Lint/typecheck/build run
- [ ] Manual smoke path checked

## Risk
- Files touched:
- State or data migration:
- Rollback:

## Agent notes
- What was intentionally left alone:
- What should be cleaned next:

Working through PRs like this is where an always-on workspace helps. Long cleanup passes need repeated test runs, dependency installs, browser smoke tests, and review loops. If the machine sleeps or loses terminal state, the agent loses the thread and has to rebuild its context.
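The template above is regular enough to check mechanically, for example in CI before a human looks at the PR. This is a sketch under stated assumptions: `check_cleanup_pr` is a hypothetical name, and it expects the PR body to use the template's exact `## ` headings and `- [ ]` checkboxes.

```python
import re

REQUIRED_SECTIONS = ("Cleanup claim", "Proof", "Risk", "Agent notes")


def check_cleanup_pr(body: str) -> list[str]:
    """Return a list of problems; an empty list means the body follows the template."""
    # Split the body into {heading: [lines]} using the template's "## " headings.
    sections: dict[str, list[str]] = {}
    current = None
    for line in body.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    problems = [
        f"missing section: {name}"
        for name in REQUIRED_SECTIONS
        if name not in sections
    ]

    # Every checkbox under Proof must be ticked before review.
    for line in sections.get("Proof", []):
        if re.match(r"^\s*-\s\[\s\]", line):
            problems.append(f"unchecked proof item: {line.strip()[6:]}")

    # A cleanup PR without a rollback note is not reviewable.
    rollback = [
        line for line in sections.get("Risk", [])
        if line.strip().startswith("- Rollback:")
    ]
    if not rollback or not rollback[0].split(":", 1)[1].strip():
        problems.append("rollback plan is empty")

    return problems
```

A ticked box only records that the agent says it ran something. The check enforces the shape of the claim, while the tests and the reviewer still have to confirm its substance.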

What To Delete First

Candidate	Delete when	Do not delete when
Unused files	Static analysis, tests, and grep agree they are unreachable.	They are generated, dynamically imported, or used by deployment scripts.
Duplicate helpers	One helper can absorb call sites without changing behavior.	They encode different auth, billing, or data assumptions.
Old migrations	Every production environment has already applied them.	Local dev, tests, or customer tenants still need them.
Generated docs	They contradict working commands or shipped behavior.	They capture a decision the team still relies on.
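For the "unused files" row, the grep signal is the easiest one to automate, and it catches references that import-based analysis misses, such as deployment scripts. The sketch below is illustrative only: `find_mentions` and the `SKIP_DIRS` list are assumptions, and it uses plain substring matching, which over-reports for short or common file names.

```python
from pathlib import Path

# Directories assumed to hold vendored or generated content; adjust per repo.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}


def find_mentions(repo_root: str, candidate: str) -> list[str]:
    """Return repo-relative paths of files that mention the candidate's stem.

    `candidate` is a path relative to `repo_root`. An empty result is
    necessary but not sufficient for deletion: names built at runtime
    (dynamic imports, string concatenation) will not show up here.
    """
    root = Path(repo_root)
    target = (root / candidate).resolve()
    needle = Path(candidate).stem  # "utils/old_helper.py" -> "old_helper"
    hits: list[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.resolve() == target:
            continue
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        if needle in text:
            hits.append(path.relative_to(root).as_posix())
    return hits
```

Treat any hit as a reason to keep the file until someone explains the reference. Treat an empty list as permission to run the other two checks, static analysis and the test suite, not as permission to delete.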

Where Hyperbox Fits

Hyperbox, a persistent macOS machine, is useful when cleanup becomes an operating loop: an agent runs tests, waits for review, updates docs, retries failed checks, and keeps repo state warm across hours or days. The Mac is not magic. It is the stable place where the work keeps living.

Keep dependency caches, browser sessions, local services, and repo state warm.
Run Claude Code, Codex, Cursor, and smoke-test browsers from one persistent workspace.
Use SSH for command work and VNC when cleanup requires visual inspection.
Preserve logs and terminal history after your laptop closes.

Frequently asked questions

Should I rewrite a vibe-coded repo from scratch?

Usually no. First freeze behavior, map dead paths, add tests around the paths that matter, then refactor in small PRs. Rewrite only when the dependency graph, data model, or security model cannot be recovered safely.

Can Claude Code or Codex clean up AI-generated code?

Yes, if you give the agent a bounded runbook, tests, repo context, and a persistent workspace. Treat the agent as a refactoring operator, not as a one-shot prompt box.

Why use an always-on Mac for repo cleanup?

Cleanup runs often span hours of tests, migrations, browser checks, and review passes. An always-on Mac keeps the repo, caches, credentials, logs, and agent sessions available after your laptop sleeps.

Always-on Mac runtime

Give your agent a Mac that stays online after your laptop closes.

Hyperbox gives Codex, Claude Code, OpenClaw, and remote dev workflows a persistent macOS machine with SSH, VNC, and full desktop access.
