Field Notes · June 1, 2026 · Last updated 2026-06-01
Vibe Coding 102: Personal AI Infrastructure

Outline
Build a small AI workbench that makes models, sessions, and background jobs easy to control

- Use a desktop agent app as the front door so sessions, repos, browsers, and tasks stay visible.
- Run a local OpenAI-compatible router so agents can call subscriptions, hosted models, and local models through one interface.
- Pick models by job: research, build, review, validate, or run cheap background work.
- Track token usage before running long jobs so retries and thinking budgets do not surprise you.
- Run background workers as explicit terminal jobs with logs, health checks, and handoff notes.
- Keep the router, browser profiles, model caches, uploads, and agent sessions on an always-on Mac.
The point is not to collect every AI tool. The point is to make a repeatable local workbench where a developer can assign work, choose the right model, watch the job run, and recover the state later.
Questions this page answers
- How should a desktop agent app organize repos, chats, browsers, and task state?
- Why run a local OpenAI-compatible router for hosted, subscription, and local models?
- How do model roles, token budgets, and background worker logs make AI coding safer?
- Why does personal AI infrastructure belong on an always-on Mac runtime?
Interface
Use a desktop agent app as the control plane for repos, chats, browsers, and tasks

A terminal chat is fine for one task. It breaks down when you have several repos, several agent sessions, a browser preview, a model picker, and a checklist of unfinished work. A desktop agent app earns its keep by keeping those pieces visible.
The useful interface is not flashy. It should show the project, current branch, chosen model, prompt history, active browser, task list, and validation state. If a job ran overnight, you should be able to reopen it and immediately see what changed and what still needs review.
| Control-plane feature | Why it matters |
|---|---|
| Pinned sessions | You can return to a long-running job without searching terminal history. |
| Project sidebar | Every chat stays attached to the repo or experiment it belongs to. |
| Model picker | You can choose the right model for the job without rewriting the workflow. |
| Browser panel | UI work can be checked in the same place the agent is editing. |
| Task checklist | The handoff is visible before you reread the whole transcript. |
Routing
Run a local OpenAI-compatible router so every agent can use the same model endpoint

Once you use multiple AI products, the messy part is not intelligence. The messy part is auth, model names, rate limits, subscriptions, CLI behavior, local model servers, and usage tracking. A local OpenAI-compatible router turns that mess into one endpoint your tools can call.
The router should expose a model catalog, a health check, provider status, and usage data. Before you start a long agent run, you should know which providers are reachable and what budget the job is allowed to burn.
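As a rough illustration of that pre-run check, the sketch below asks the router whether it is healthy and whether the models a job needs are in the catalog. The `/healthz` and `/v1/models` paths are the ones named on this page; the base URL, the port, and the function names are assumptions, and the catalog is assumed to follow the OpenAI list-models shape (a `data` array of objects with an `id`). Provider token and budget checks are router-specific and are left out.

```python
import json
import urllib.request

# Assumption: the router listens locally on this port; adjust to your setup.
ROUTER_URL = "http://127.0.0.1:8080"


def missing_models(catalog: dict, expected: set[str]) -> set[str]:
    """Compare an OpenAI-style /v1/models payload with the models a job needs."""
    available = {entry["id"] for entry in catalog.get("data", [])}
    return expected - available


def preflight(expected: set[str], base_url: str = ROUTER_URL) -> list[str]:
    """Return a list of problems; an empty list means the router looks ready."""
    problems: list[str] = []
    try:
        # urlopen raises for 4xx/5xx, so reaching the body means a 2xx status.
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=5) as resp:
            if resp.status != 200:
                problems.append(f"healthz returned {resp.status}")
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            catalog = json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers connection and HTTP errors; ValueError covers bad JSON.
        return problems + [f"router unreachable or bad response: {exc}"]
    missing = missing_models(catalog, expected)
    if missing:
        problems.append(f"missing models: {sorted(missing)}")
    return problems
```

A job launcher could call `preflight({"some-model-id"})` and refuse to start when the returned list is not empty.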
Router checks before a long run:
GET /healthz returns healthy.
GET /v1/models returns the expected model catalog.
Provider tokens are valid.
Budget limits are visible.
The agent records which model handled each task.
Model strategy
Assign models by role instead of sending every task to the same frontier model

A strong workflow does not need one model to do everything. Use one model to research constraints, another to implement inside the repo, another to review assumptions, and another to validate the result in a browser or test suite. The split matters more than the brand names.
This also keeps cost under control. Expensive reasoning models should handle the tasks where reasoning matters. Cheaper or local models can handle summarization, log review, checklist expansion, mechanical cleanup, or slow background experiments.
| Role | Use it for |
|---|---|
| Researcher | Read docs, compare APIs, find constraints, and produce a scoped plan. |
| Builder | Make the first implementation inside a bounded repo area. |
| Reviewer | Find hallucinated assumptions, security issues, missed edge cases, and overbroad changes. |
| Validator | Run tests, browser checks, screenshots, logs, and reproduction steps. |
| Background worker | Cheap long-running tasks where latency is acceptable. |
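One way to keep the role split explicit is a small lookup table that every job consults before it calls the router. This is a minimal sketch: the model ids and token limits are hypothetical placeholders, not recommendations, and should be replaced with ids from your own router's catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleConfig:
    model: str              # hypothetical model id as listed by the router
    max_output_tokens: int  # illustrative limit, not a measured value


# Hypothetical names: swap in ids from your router's model catalog.
ROLE_MODELS = {
    "researcher": RoleConfig("frontier-reasoning", 8000),
    "builder": RoleConfig("frontier-coding", 8000),
    "reviewer": RoleConfig("frontier-reasoning", 4000),
    "validator": RoleConfig("mid-tier-fast", 4000),
    "background": RoleConfig("local-small", 2000),
}


def model_for(role: str) -> RoleConfig:
    """Look up the model for a role; fail loudly instead of using a default."""
    try:
        return ROLE_MODELS[role]
    except KeyError:
        known = sorted(ROLE_MODELS)
        raise ValueError(f"unknown role {role!r}; expected one of {known}") from None
```

Raising on an unknown role is a deliberate choice here: a silent fallback to the most expensive model is exactly the kind of surprise the role split is meant to prevent.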
Workers
Run background agents as named jobs with logs, checks, and handoff notes

The bad version of multi-agent work is several chats editing the same code blindly. The useful version is more disciplined: one worker researches, one builds, one reviews, one validates, and each worker writes down what it changed or learned.
Run these as named jobs. Give each job a repo, branch or worktree, prompt file, expected output, test command, and log path. When the job finishes, it should leave a short handoff note: files changed, checks run, failures, decisions, and next recommended task.
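The job fields and handoff note described above can be sketched as two small records. The class and field names are assumptions made for illustration; the job values reuse the example contract on this page, except the log path and the failure text, which are made up.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerJob:
    """Everything a named job needs before it starts."""
    name: str
    repo: str
    branch: str
    prompt_file: str
    expected_output: str
    test_command: str
    log_path: str


@dataclass
class Handoff:
    """What the job leaves behind when it finishes."""
    job: WorkerJob
    files_changed: list[str] = field(default_factory=list)
    checks_run: list[str] = field(default_factory=list)
    failures: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    next_task: str = "none recorded"

    def to_markdown(self) -> str:
        lines = [f"# Handoff: {self.job.name}",
                 f"Repo: {self.job.repo} (branch {self.job.branch})", ""]
        sections = [
            ("Files changed", self.files_changed),
            ("Checks run", self.checks_run),
            ("Failures", self.failures),
            ("Decisions", self.decisions),
        ]
        for title, items in sections:
            lines.append(f"## {title}")
            # An explicit "none" is easier to trust than a missing section.
            lines.extend(f"- {item}" for item in (items or ["none"]))
            lines.append("")
        lines.append(f"Next recommended task: {self.next_task}")
        return "\n".join(lines)


job = WorkerJob(
    name="checkout-validator",
    repo="app-web",
    branch="agent/checkout-validation",
    prompt_file="tasks/checkout-browser-check.md",
    expected_output="reports/checkout-validation.md",
    test_command="bun test",
    log_path="logs/checkout-validator.log",  # hypothetical path
)
# Illustrative result only; a real worker would fill this in from its run.
note = Handoff(job, checks_run=[job.test_command],
               failures=["browser smoke test timed out"])
print(note.to_markdown())
```

Writing the note from a structured record rather than free text makes it harder for a worker to skip a section such as failures or decisions.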
Worker contract:
Name: checkout-validator
Repo: app-web
Branch: agent/checkout-validation
Input: tasks/checkout-browser-check.md
Output: reports/checkout-validation.md
Checks: bun test && browser smoke test
Handoff: files changed, screenshots, failures, next step
Cost control
Track token usage before retries, thinking budgets, and background runs get expensive

High-volume AI work can get expensive quietly. A model retries a tool call. A reasoning model spends a large thinking budget. A background worker runs the wrong prompt for an hour. A validation loop keeps opening the same failing browser path. None of that feels expensive until you look at usage.
Put budget checks into the workflow before the run starts. The agent should know the model, the budget, the allowed retry count, the stop condition, and where to write usage notes. This is especially important when several workers run at once.
- Set a max retry count for tool failures.
- Use cheaper models for summaries, log reading, and checklist expansion.
- Reserve expensive models for ambiguous architecture or security decisions.
- Record model name and rough usage in the handoff note.
- Stop the job when the validation target is impossible instead of looping.
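A minimal sketch of such a budget guard follows, assuming the agent loop can report token counts and retries after each step. The class names, limits, and model id are hypothetical; how token usage is actually obtained depends on your router and is not shown.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a job should stop instead of continuing to spend."""


@dataclass
class RunBudget:
    model: str
    max_tokens: int
    max_retries: int
    tokens_used: int = 0
    retries_used: int = 0

    def record_usage(self, tokens: int) -> None:
        """Add tokens from one model call and stop the job once over budget."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(
                f"{self.model}: {self.tokens_used} tokens used, limit {self.max_tokens}")

    def record_retry(self) -> None:
        """Count one tool-failure retry and stop once the cap is passed."""
        self.retries_used += 1
        if self.retries_used > self.max_retries:
            raise BudgetExceeded(
                f"{self.model}: {self.retries_used} retries, limit {self.max_retries}")

    def usage_note(self) -> str:
        """One line suitable for the handoff note."""
        return (f"model={self.model} "
                f"tokens={self.tokens_used}/{self.max_tokens} "
                f"retries={self.retries_used}/{self.max_retries}")
```

The worker loop would call `record_usage` after every model response and `record_retry` after every failed tool call, catch `BudgetExceeded`, and write `usage_note()` into the handoff before exiting.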
Runtime
Keep the router, browser profiles, model caches, and agent sessions on an always-on Mac

This setup belongs on an always-on Mac, the runtime where the agents actually live. The workflow depends on local services, desktop apps, browser profiles, model caches, terminal panes, uploads, logs, and screenshots. Those should not disappear because a laptop sleeps.
| Stack component | Why persistence matters |
|---|---|
| Model router | Agents keep using the same endpoint, provider auth, and model catalog. |
| Desktop agent app | Pinned sessions survive reconnects and context switches. |
| Browser profiles | Cookies, permissions, previews, and test state remain available. |
| Terminal workers | Long jobs can keep running while attention moves elsewhere. |
| Model caches and files | Downloads, logs, reports, and screenshots stay inspectable. |
Frequently asked questions
What is personal AI infrastructure?
Personal AI infrastructure is the developer-owned stack around daily AI work: desktop agent apps, model routing, provider auth, local models, token budgets, browser automation, terminal panes, logs, and a persistent machine to keep it all running.
Why use more than one model for AI coding?
Different models are better at different jobs. One can research, another can implement, another can review hallucination risk, and another can validate the result in a browser or test suite.
Why does this workflow need an always-on Mac?
The workflow depends on persistent desktop apps, browser profiles, local proxy services, terminal panes, model caches, logs, screenshots, and long-running jobs that should survive laptop sleep and reconnects.