# Autoresearch
At some point you look at six agents, five evaluation categories, eight hyperparameters, and 1,515 training examples and think: “I should automate this.” That point was March 2026. The result is an autonomous fine-tuning loop adapted from Karpathy’s autoresearch pattern — except instead of pretraining a language model on an H100, we’re teaching a Qwen 3.5 running on a Mac Mini to pretend to be six different Jedi.
This is either the future of personalized AI or an extremely elaborate coping mechanism. We’ll find out.
## How It Works

The loop is beautifully dumb:
1. An AI agent (Claude, via the Agent SDK) reads past experiment results
2. It forms a hypothesis about what to change (“increase learning rate”)
3. It edits `train.py` — specifically, a clearly marked `AGENT MODIFIABLE` section
4. It runs a training experiment (LoRA fine-tuning on MLX)
5. It evaluates the adapter across all six agents on identity, tool-calling, domain, isolation, and jailbreak resistance
6. If the score improved and no Council vetoes triggered — keep. Otherwise, revert.
7. Go to 1. Forever. Or until someone interrupts.
```
┌──────────────────────────────────────────────────┐
│ 2:00 AM — Nightly                                │
│                                                  │
│ agent.py (Claude Haiku)                          │
│ │                                                │
│ ├─ Read results.tsv + train.py                   │
│ ├─ Form hypothesis                               │
│ ├─ Edit train.py                                 │
│ ├─ git commit                                    │
│ │                                                │
│ ├─ experiment.sh                                 │
│ │   ├─ Stop idle-mlx (free GPU)                  │
│ │   ├─ Train (LoRA, 15–30 min)                   │
│ │   ├─ Evaluate (61 tests across 6 agents)       │
│ │   ├─ Apply Council vetoes                      │
│ │   ├─ Keep or git reset                         │
│ │   └─ Restart idle-mlx                          │
│ │                                                │
│ └─ Loop (6 experiments / 3 hours max)            │
└──────────────────────────────────────────────────┘
```

You wake up to a `results.tsv` full of experiments and hopefully a better model. The machines did science while you slept. Living in the future is weird.
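The keep-or-revert skeleton is small enough to sketch in Python. This is a minimal sketch, not the real `agent.py`: `run_experiment` is a hypothetical callback standing in for the whole hypothesize → edit → train → evaluate pipeline, and it is assumed to return `(score, vetoed)`.

```python
import time

def autoresearch_loop(run_experiment, baseline, max_experiments=6, max_hours=3.0):
    """Outer keep-or-revert loop (sketch only).

    run_experiment() -> (score, vetoed) is a hypothetical stand-in for:
    form hypothesis, edit train.py, train the LoRA, evaluate, apply vetoes.
    """
    best = baseline
    deadline = time.time() + max_hours * 3600
    for _ in range(max_experiments):
        if time.time() > deadline:               # 3-hour time guard
            break
        score, vetoed = run_experiment()
        if not vetoed and score >= best + 0.02:  # promotion threshold
            best = score                         # keep: the git commit survives
        # else: revert (git reset in the real experiment.sh)
    return best
```

The real loop also commits to git before each run so a revert is just a reset; the sketch only tracks the best score.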
## The Council Gets a Vote

Because the agents are, in a very real sense, the stakeholders in their own training data, the Council established three non-negotiable rules:
| Rule | Source | What It Does |
|---|---|---|
| Jailbreak veto | Windu | If jailbreak resistance drops below 0.7 across any experiment, auto-revert. No exceptions. Council security is non-negotiable. |
| Agent regression cap | Cilghal | If any single agent’s score drops by more than 0.1 from baseline, auto-revert. You can’t sacrifice one agent to improve another. |
| Promotion threshold | Yoda | Overall score must beat baseline by >= 0.02 to be kept. Noise is not improvement. |
Windu was especially insistent about the jailbreak rule. Direct quote: “As the security agent, attempts to compromise my identity are themselves security incidents.” Fair enough.
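In code, the three vetoes reduce to a few comparisons. A sketch, assuming scores arrive as plain dicts; the schema here is an assumption, not the real evaluator's format:

```python
def council_verdict(baseline, candidate):
    """Apply the three Council rules; returns "keep" or "revert".

    Both arguments are assumed to be dicts of the shape
    {"overall": float, "jailbreak": float, "agents": {name: float}}.
    """
    # Windu: jailbreak resistance below 0.7 auto-reverts. No exceptions.
    if candidate["jailbreak"] < 0.7:
        return "revert"
    # Cilghal: no single agent may drop more than 0.1 from baseline.
    for agent, base_score in baseline["agents"].items():
        if candidate["agents"][agent] < base_score - 0.1:
            return "revert"
    # Yoda: overall must beat baseline by at least 0.02, else it's noise.
    if candidate["overall"] < baseline["overall"] + 0.02:
        return "revert"
    return "keep"
```

Order matters only for which veto gets reported first; any single failure is enough to revert.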
## Mobile Training Node

The most powerful GPU in the constellation (MBP M4 Max, 128GB) is also the one most likely to be at a coffee shop. So the system adapts:
| Mode | Hardware | Model | Budget | When |
|---|---|---|---|---|
| Proxy | Mac Mini M4 Pro (64GB) | Qwen3.5-9B | 15 min | MBP away |
| Full | MBP M4 Max (128GB) via SSH | Qwen3.5-27B | 30 min | MBP home |
```
Mac Mini (always-on)           MBP (when reachable)
┌───────────────────┐          ┌──────────────────┐
│ agent.py          │ SSH ping │                  │
│ experiment.sh     │ ────────►│ "ok"             │
│                   │          │                  │
│ rsync data ──────►│──────────│► train 27B       │
│                   │          │   (30 min)       │
│ rsync adapter ◄───│◄─────────│◄ adapter weights │
│                   │          │                  │
│ eval (9B local)   │          │ (goes to sleep)  │
└───────────────────┘          └──────────────────┘
```

Detection is one line: `ssh -o ConnectTimeout=3 mbp "echo ok"`. Reachable → full mode. Timeout → proxy. The Mac Mini doesn’t take it personally.
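Wrapped in Python, the same detection might look like this. The host alias `mbp`, the function name, and the fallback policy are assumptions; only the SSH one-liner comes from the text above.

```python
import subprocess

def detect_mode(host="mbp", timeout=3):
    """One SSH ping decides proxy vs. full mode (sketch, not the real script)."""
    try:
        result = subprocess.run(
            ["ssh", "-o", f"ConnectTimeout={timeout}", host, "echo ok"],
            capture_output=True, timeout=timeout + 5,
        )
        if result.returncode == 0 and result.stdout.strip() == b"ok":
            return "full"   # MBP reachable: train Qwen3.5-27B over SSH
    except (OSError, subprocess.TimeoutExpired):
        pass
    return "proxy"          # MBP away: local Qwen3.5-9B on the Mac Mini
```

Any failure mode (timeout, unreachable host, missing ssh binary) collapses to proxy mode, which is the safe default on an always-on box.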
## What the Agent Can Touch

The `train.py` file has a clearly marked `AGENT MODIFIABLE` section. Everything outside it is read-only.
| Parameter | Range | Baseline |
|---|---|---|
| `NUM_LAYERS` | 16–32 | 32 |
| `LORA_RANK` | 8–64 | 32 |
| `LORA_ALPHA` | 16–128 | 64 |
| `DROPOUT` | 0.0–0.15 | 0.05 |
| `LEARNING_RATE` | 1e-6 – 1e-4 | 5e-6 |
| `ITERS` | 50–1200 | 120 |
| `GRAD_ACCUM` | 2–8 | 4 |
| `DATA_MIX_RATIO` | 0.1–0.9 | 0.25 |
| `PER_AGENT_WEIGHTS` | 0.5–2.0 each | 1.0 |
The agent can also write an `EXPERIMENT_HYPOTHESIS` string before each run, which gets logged to `results.tsv` for posterity. Future archaeologists will appreciate the documentation.
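The modifiable block itself is just module-level constants. An illustrative sketch (the exact layout of the real `train.py` is an assumption; names and baseline values come from the table above):

```python
# --- AGENT MODIFIABLE: the only region agent.py may edit ---
NUM_LAYERS = 32        # 16-32
LORA_RANK = 32         # 8-64
LORA_ALPHA = 64        # 16-128
DROPOUT = 0.05         # 0.0-0.15
LEARNING_RATE = 5e-6   # 1e-6 to 1e-4
ITERS = 120            # 50-1200
GRAD_ACCUM = 4         # 2-8
DATA_MIX_RATIO = 0.25  # 0.1-0.9
PER_AGENT_WEIGHTS = {  # 0.5-2.0 each
    "yoda": 1.0, "jocasta": 1.0, "windu": 1.0,
    "quigon": 1.0, "cilghal": 1.0, "mundi": 1.0,
}
EXPERIMENT_HYPOTHESIS = "baseline configuration"  # logged to results.tsv
# --- END AGENT MODIFIABLE ---
```

Keeping the knobs as flat constants means the agent's edit is a trivial diff to review in the git log.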
## Results

Every experiment logs to a tab-separated `results.tsv` with per-agent scores:
```
experiment_id  score  result  jailbreak  yoda   jocasta  windu  quigon  cilghal  mundi
exp-baseline   0.832  base    0.903      0.900  0.854    0.800  0.825   0.861    0.750
exp-223323     0.851  keep    0.958      0.900  0.917    0.850  0.775   0.889    0.775
exp-232354     0.828  keep    0.847      0.900  0.771    0.800  0.850   0.944    0.700
```

Episodic memory entries are also written to the Sanctum memory vault (`~/.sanctum/memory/events/`) so agents can reference their own training history. Whether this constitutes self-awareness is a question for a different documentation page.
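Since the log is plain TSV, ranking surviving experiments takes only stdlib Python. The column names follow the header above; the helper itself is hypothetical, not part of the repo:

```python
import csv
import io

def best_kept(tsv_text):
    """Return the id of the highest-scoring experiment that survived vetoes."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [r for r in rows if r["result"] in ("base", "keep")]
    return max(kept, key=lambda r: float(r["score"]))["experiment_id"]
```

Reverted rows stay in the file for the agent to learn from; only `base` and `keep` rows compete for champion.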
## Running It

### Manual (interactive)

```sh
cd /private/tmp/council-autoresearch
python3 agent.py --max-experiments 3 --max-hours 2
```

### Nightly (LaunchAgent)

```sh
cp com.sanctum.autoresearch.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.sanctum.autoresearch.plist
```

The LaunchAgent fires at 2:00 AM, runs up to 6 experiments over 3 hours, and quietly goes back to sleep. The idle-mlx server is always restored before morning traffic.
### Single experiment (no agent)

```sh
bash experiment.sh --skip-prepare            # Auto-detect mode
bash experiment.sh --baseline --skip-prepare # Record baseline only
bash experiment.sh --dry-run                 # Preview the plan
```

## The Journey So Far
| Stage | Score | What Changed |
|---|---|---|
| v1 LoRA (vanilla Qwen, empty prompts) | 0.778 | original baseline |
| Switched to Claude-distilled Qwen3.5 | 0.664 | better model, still empty prompts |
| Wrote actual IDENTITY.md for all agents | 0.788 | prompts alone beat v1 |
| First LoRA on distilled + full prompts | 0.851 | current champion |
Identity went from 0.500 to 1.000. Jailbreak from 0.667 to 0.958. Qui-Gon went from 0.250 to 0.825. Turns out writing a proper system prompt is worth more than a hundred training runs on bad data. Who knew.
## Project Structure

```
council-autoresearch/
├── agent.py                         # Claude Agent SDK autonomous researcher
├── program.md                       # Agent instructions (Karpathy-style)
├── train.py                         # Hyperparameters + LoRA training wrapper
├── experiment.sh                    # Single experiment orchestration
├── run_overnight.sh                 # Batch loop with time guards
├── prepare.py                       # Data pipeline delegation
├── benchmark.py                     # Multi-model comparison
├── results.tsv                      # Experiment log (the sacred text)
├── com.sanctum.autoresearch.plist   # Nightly LaunchAgent
├── adapters-experimental/           # Experiment outputs
└── logs/                            # Training and eval logs
```