01 Modern Python Scaffolding
One command. Full toolchain. Same quality bar for human and AI code. 8 quality tools on every commit. copier update pulls upstream improvements without losing local changes.
poe check — single gate for hooks, CI, and AI
copier update — pulls upstream template improvements

my-project/
├── pyproject.toml           # uv, poethepoet, all tool config
├── .python-version
├── .gitignore
├── .pre-commit-config.yaml  # 8 quality tools
├── AGENTS.md                # canonical AI instructions
├── CLAUDE.md                # → symlink to AGENTS.md
├── docs/
│   ├── SPEC.md              # requirements (highest authority)
│   ├── DESIGN.md            # architecture (traces to SPEC)
│   └── PATTERNS.md          # conventions (can't contradict)
├── src/my_project/
│   ├── __init__.py
│   └── py.typed             # PEP 561 marker
└── tests/
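One plausible way the poe check gate could be wired up in pyproject.toml; the task names and tool selection here are assumptions for illustration, not the template's actual configuration:

```toml
[tool.poe.tasks]
lint = "ruff check src tests"            # static lint
fmt = "ruff format --check src tests"    # formatting check
types = "mypy src"                       # type checking
test = "pytest"                          # test suite
check = ["lint", "fmt", "types", "test"] # the single gate: run everything
```

With a composite task like this, pre-commit hooks, CI, and AI agents all invoke the same `poe check` entry point, so there is only one definition of "passing".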
02 Hierarchical Documentation
Three documents with strict authority — higher overrides lower. A dedicated agent walks you through each, presenting trade-offs while you decide. Each level hard-rejects content that belongs at a different one.
Why three levels? Start broad, narrow down. SPEC defines what, not how. DESIGN chooses technologies, not code. PATTERNS specifies code, not architecture. Broader levels are more authoritative but less specific — a design discussion can't silently rewrite a requirement, and a convention can't contradict an architectural choice.
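The level gate can be sketched as a simple keyword check. This is a toy illustration with invented marker lists, not prothon's actual mechanism:

```python
# Illustrative only: each doc level rejects content that belongs elsewhere.
FORBIDDEN = {
    "SPEC": ["import ", "def ", "PostgreSQL"],   # what, not how: no code, no tech picks
    "DESIGN": ["def ", "class "],                # technologies, not code
    "PATTERNS": ["shall ", "must support"],      # conventions, not requirements
}

def violations(level: str, text: str) -> list[str]:
    """Return the forbidden markers found in a doc at the given level."""
    return [marker for marker in FORBIDDEN[level] if marker in text]

print(violations("SPEC", "The system shall use PostgreSQL for storage"))
# → ['PostgreSQL']: a requirement tried to smuggle in a technology choice
```

The point is the shape of the rule, not the keyword list: a broader level can constrain a narrower one, never the reverse.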
After you run design or patterns, doc-harmonizer cross-references all three levels.

03 Design Workflow
Seven CLI commands. new and init scaffold the project. The rest are preconfigured Claude agent prompts — each scoped to a single concern, each producing a versioned artifact in the repo.
prothon new — scaffold a fresh project from the template
prothon init — add prothon to an existing project
prothon spec — preconfigured Claude agent prompt: "What are you building, who is it for, and why?"
prothon design — preconfigured Claude agent prompt: researches tech, presents trade-offs
prothon patterns — preconfigured Claude agent prompt: code style, testing, conventions
prothon execute — preconfigured Claude agent prompt: fresh subagents, verifies promises
prothon compliance — preconfigured Claude agent prompt: evidence tables, code vs docs

Not generators — guided conversations. Each skill presents one decision at a time and waits for your approval. Skills refuse content at the wrong level — spec won't discuss technology, design won't write code, patterns won't add requirements.
04 Drift Detection & Reconciliation
Three automated agents guard consistency. doc-harmonizer keeps docs aligned with each other. compliance-checker keeps code aligned with docs. tech-researcher generates reference material after design decisions.
doc-harmonizer — doc ↔ doc
Runs after the design and patterns commands.

compliance-checker — code ↔ doc
Runs via prothon compliance; produces file:line evidence and prioritized action items.

tech-researcher — generates skills (tech-*, style-*, optim-*, domain-*), auto-loaded during execution.

05 Skills Collection
After DESIGN is written, tech-researcher generates reference skills for your exact stack. Queries Context7 live docs, falls back to web search, then training knowledge. Current material, not generic training data.
tech-* — library usage, idioms, gotchas, version-specific APIs
style-* — naming conventions, import organization, type annotations
optim-* — performance patterns, GPU batching, subprocess management
domain-* — field-specific concepts: geospatial, ML, finance, etc.

Example: ML + geospatial project
.agents/skills/
├── tech-pytorch.md
├── tech-fastapi.md
├── tech-polars.md
├── style-python.md
├── optim-gpu.md
└── domain-geospatial.md
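A hedged sketch of how an executor might auto-load these prefixed skill files into a subagent's context; the loader and prefix tuple are illustrative, not prothon's actual implementation:

```python
import tempfile
from pathlib import Path

SKILL_PREFIXES = ("tech-", "style-", "optim-", "domain-")

def load_skills(skills_dir: str) -> dict[str, str]:
    """Map skill name to markdown body for every prefixed *.md file."""
    return {
        path.stem: path.read_text()
        for path in sorted(Path(skills_dir).glob("*.md"))
        if path.name.startswith(SKILL_PREFIXES)
    }

# Demonstration with a throwaway directory standing in for .agents/skills/
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "tech-pytorch.md").write_text("# PyTorch idioms")
    (Path(d) / "notes.md").write_text("no skill prefix, ignored")
    print(sorted(load_skills(d)))  # → ['tech-pytorch']
```

Prefix-based discovery means a new skill file becomes available to execution the moment tech-researcher writes it, with no registry to update.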
06 Execution Promises
Before execution starts, the planner writes change_promise.toml — a contract that declares exactly what each task will produce. This turns open-ended code generation into a bounded, verifiable process.
The core problem: context pollution. As context fills, agent behaviour diverges from intent. The promise captures what was planned before drift set in. Fresh subagents prevent pollution between tasks. Together they make drift measurable — not just "did it work" but "did it do what it said it would do."
Why line predictions? Requiring the AI to predict line counts forces thoughtful scoping. If it predicts 50 lines but writes 300, either the plan was sloppy or execution went sideways.
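A minimal sketch of how a line-count promise could be checked after a task runs; the tolerance and drift rule here are illustrative assumptions, not prothon's actual verifier:

```python
def line_drift(promised: int, actual: int, tolerance: float = 0.5) -> bool:
    """Flag a task whose actual diff size diverges from its promise
    by more than the given tolerance fraction."""
    return abs(actual - promised) > tolerance * max(promised, 1)

print(line_drift(promised=50, actual=300))  # sloppy plan or sideways execution → True
print(line_drift(promised=50, actual=60))   # within tolerance → False
```

The exact threshold matters less than the existence of one: any fixed rule turns "did it do what it said it would do" into a yes/no check.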
Example
[metadata]
base_commit = "a3f2c1b"

[[tasks]]
title = "Add auth handler"
goal = "Implement JWT auth"
success_criteria = "Tests pass"
files_to_create = ["src/auth/handler.py"]
files_to_modify = ["src/__init__.py"]
files_to_remove = []
expected_lines_added = 85
expected_lines_removed = 0
context_files = ["src/config.py"]
doc_sections = ["docs/DESIGN.md#auth"]
reference_skills = ["tech-pyjwt"]
dependencies = []
completed = false
attempts = 0

[[tasks]]
title = "Add auth tests"
goal = "Test auth flows"
success_criteria = "100% coverage"
files_to_create = ["tests/test_auth.py"]
files_to_modify = []
files_to_remove = []
expected_lines_added = 120
expected_lines_removed = 0
context_files = ["src/auth/handler.py"]
doc_sections = []
reference_skills = []
dependencies = [0]
completed = false
attempts = 0