01 Modern Python Scaffolding
One command. Full toolchain. Same quality bar for human and AI code. 8 quality tools on every commit. copier update pulls upstream improvements without losing local changes.
poe check — single gate for hooks, CI, and AI
copier update — pulls upstream template improvements

my-project/
├── pyproject.toml           # uv, poethepoet, all tool config
├── .python-version
├── .gitignore
├── .pre-commit-config.yaml  # 8 quality tools
├── AGENTS.md                # canonical AI instructions
├── CLAUDE.md                # → symlink to AGENTS.md
├── docs/
│   ├── SPEC.md              # requirements (highest authority)
│   ├── DESIGN.md            # architecture (traces to SPEC)
│   └── PATTERNS.md          # conventions (can't contradict)
├── src/my_project/
│   ├── __init__.py
│   └── py.typed             # PEP 561 marker
└── tests/
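One plausible way the poe check gate could be wired up in pyproject.toml; the task names and tool selection here are assumptions for illustration, not the template's actual configuration:

```toml
[tool.poe.tasks]
lint = "ruff check src tests"            # static lint
fmt = "ruff format --check src tests"    # formatting check
types = "mypy src"                       # type checking
test = "pytest"                          # test suite
check = ["lint", "fmt", "types", "test"] # the single gate: run everything
```

With a composite task like this, pre-commit hooks, CI, and AI agents all invoke the same `poe check` entry point, so there is only one definition of "passing".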
02 Hierarchical Documentation
Three documents with strict authority — higher overrides lower. A dedicated agent walks you through each, presenting trade-offs while you decide. Each level hard-rejects content that belongs at a different one.
Why three levels? Start broad, narrow down. SPEC defines what, not how. DESIGN chooses technologies, not code. PATTERNS specifies code, not architecture. Broader levels are more authoritative but less specific — a design discussion can't silently rewrite a requirement, and a convention can't contradict an architectural choice.
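The level gate can be sketched as a simple keyword check. This is a toy illustration with invented marker lists, not prothon's actual mechanism:

```python
# Illustrative only: each doc level rejects content that belongs elsewhere.
FORBIDDEN = {
    "SPEC": ["import ", "def ", "PostgreSQL"],   # what, not how: no code, no tech picks
    "DESIGN": ["def ", "class "],                # technologies, not code
    "PATTERNS": ["shall ", "must support"],      # conventions, not requirements
}

def violations(level: str, text: str) -> list[str]:
    """Return the forbidden markers found in a doc at the given level."""
    return [marker for marker in FORBIDDEN[level] if marker in text]

print(violations("SPEC", "The system shall use PostgreSQL for storage"))
# → ['PostgreSQL']: a requirement tried to smuggle in a technology choice
```

The point is the shape of the rule, not the keyword list: a broader level can constrain a narrower one, never the reverse.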
After you run design or patterns, doc-harmonizer cross-references all three levels.

03 Design Workflow
Seven CLI commands. new and init scaffold the project. The rest are preconfigured Claude agent prompts — each scoped to a single concern, each producing a versioned artifact in the repo.
prothon new — scaffold a fresh project from the template
prothon init — add prothon to an existing project
prothon spec — preconfigured Claude agent prompt: "What are you building, who is it for, and why?"
prothon design — preconfigured Claude agent prompt: researches tech, presents trade-offs
prothon patterns — preconfigured Claude agent prompt: code style, testing, conventions
prothon execute — preconfigured Claude agent prompt: fresh subagents, verifies promises
prothon compliance — preconfigured Claude agent prompt: evidence tables, code vs docs

Not generators — guided conversations. Each skill presents one decision at a time and waits for your approval. Skills refuse content at the wrong level — spec won't discuss technology, design won't write code, patterns won't add requirements.
04 Drift Detection & Reconciliation
Three automated agents guard consistency. doc-harmonizer keeps docs aligned with each other. compliance-checker keeps code aligned with docs. tech-researcher generates reference material after design decisions.
doc-harmonizer — doc ↔ doc
Runs after the design and patterns commands.

compliance-checker — code ↔ doc
Runs via prothon compliance; produces file:line evidence and prioritized action items.

tech-researcher — generates skills (tech-*, style-*, optim-*, domain-*), auto-loaded during execution.

05 Skills Collection
After DESIGN is written, tech-researcher generates reference skills for your exact stack. Queries Context7 live docs, falls back to web search, then training knowledge. Current material, not generic training data.
tech-* — library usage, idioms, gotchas, version-specific APIs
style-* — naming conventions, import organization, type annotations
optim-* — performance patterns, GPU batching, subprocess management
domain-* — field-specific concepts: geospatial, ML, finance, etc.

Example: ML + geospatial project
.agents/skills/
├── tech-pytorch.md
├── tech-fastapi.md
├── tech-polars.md
├── style-python.md
├── optim-gpu.md
└── domain-geospatial.md
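A hedged sketch of how an executor might auto-load these prefixed skill files into a subagent's context; the loader and prefix tuple are illustrative, not prothon's actual implementation:

```python
import tempfile
from pathlib import Path

SKILL_PREFIXES = ("tech-", "style-", "optim-", "domain-")

def load_skills(skills_dir: str) -> dict[str, str]:
    """Map skill name to markdown body for every prefixed *.md file."""
    return {
        path.stem: path.read_text()
        for path in sorted(Path(skills_dir).glob("*.md"))
        if path.name.startswith(SKILL_PREFIXES)
    }

# Demonstration with a throwaway directory standing in for .agents/skills/
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "tech-pytorch.md").write_text("# PyTorch idioms")
    (Path(d) / "notes.md").write_text("no skill prefix, ignored")
    print(sorted(load_skills(d)))  # → ['tech-pytorch']
```

Prefix-based discovery means a new skill file becomes available to execution the moment tech-researcher writes it, with no registry to update.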
06 Execution Promises
Before execution starts, the planner writes change_promise.toml — a contract that declares exactly what each task will produce. This turns open-ended code generation into a bounded, verifiable process.
The core problem: context pollution. As context fills, agent behaviour diverges from intent. The promise captures what was planned before drift set in. Fresh subagents prevent pollution between tasks. Together they make drift measurable — not just "did it work" but "did it do what it said it would do."
Why line predictions? Requiring the AI to predict line counts forces thoughtful scoping. If it predicts 50 lines but writes 300, either the plan was sloppy or execution went sideways.
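A minimal sketch of how a line-count promise could be checked after a task runs; the tolerance and drift rule here are illustrative assumptions, not prothon's actual verifier:

```python
def line_drift(promised: int, actual: int, tolerance: float = 0.5) -> bool:
    """Flag a task whose actual diff size diverges from its promise
    by more than the given tolerance fraction."""
    return abs(actual - promised) > tolerance * max(promised, 1)

print(line_drift(promised=50, actual=300))  # sloppy plan or sideways execution → True
print(line_drift(promised=50, actual=60))   # within tolerance → False
```

The exact threshold matters less than the existence of one: any fixed rule turns "did it do what it said it would do" into a yes/no check.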
Example
[metadata]
base_commit = "a3f2c1b"

[[tasks]]
title = "Add auth handler"
goal = "Implement JWT auth"
success_criteria = "Tests pass"
files_to_create = ["src/auth/handler.py"]
files_to_modify = ["src/__init__.py"]
files_to_remove = []
expected_lines_added = 85
expected_lines_removed = 0
context_files = ["src/config.py"]
doc_sections = ["docs/DESIGN.md#auth"]
reference_skills = ["tech-pyjwt"]
dependencies = []
completed = false
attempts = 0

[[tasks]]
title = "Add auth tests"
goal = "Test auth flows"
success_criteria = "100% coverage"
files_to_create = ["tests/test_auth.py"]
files_to_modify = []
files_to_remove = []
expected_lines_added = 120
expected_lines_removed = 0
context_files = ["src/auth/handler.py"]
doc_sections = []
reference_skills = []
dependencies = [0]
completed = false
attempts = 0