Recovering specifications from existing code
Most codebases already contain specs.
They just don't look like specs.
They look like CLI commands, tests, database migrations, model defaults, parser errors, file layouts, fixtures, and weird edge cases somebody fixed six months ago and forgot to document.
Coherence starts with three primitives:
Spec = promise the system makes
AC = falsifiable claim under that promise
Evidence = executable proof linked to the claim
That's already how the bootstrap README frames the model: specs describe promises, acceptance criteria make those promises falsifiable, and evidence connects claims to executable verification. (GitHub)
The interesting question is what happens when the spec didn't come first.
What if the code already exists?
Can we recover the spec tree from the implementation?
That is what coherence-bootstrap does to itself.
Not by generating pretty documentation.
By routing behavior.
The problem
Imagine a small CLI command:
coherence-bootstrap ac add \
--spec-id SPEC-demo-greeting \
--title "Prints greeting"
From the outside, this looks simple.
But this command already implies several claims:
The command accepts an existing spec id.
The operator must provide a title.
The operator may omit intent.
The operator may provide a stable slug.
The command returns the created AC identity.
The command rejects unknown spec ids.
Those are not implementation details.
They are product behavior.
If an agent changes the CLI and breaks one of them, a user will notice.
So we can recover this:
SPEC product/cli/ac
AC ac-add-requires-existing-spec
AC ac-add-requires-title
AC ac-add-allows-empty-intent
AC ac-add-allows-stable-slug
AC ac-add-returns-created-identity
This is already much more useful than:
src/main.rs has an ac add command
That sentence describes code.
The recovered ACs describe promises.
The naive solution fails
The obvious move is:
Let's scan the repo and document everything.
That produces garbage.
For example, suppose the codebase has this test helper:
pub struct TestWorld {
repo_dir: TempDir,
dolt_port: u16,
env: TestEnv,
}
impl TestWorld {
pub fn new() -> Self {
// create isolated repo
// start temporary Dolt server
// run migrations
}
}
A naive documentation generator might produce:
The system has a TestWorld helper.
TestWorld creates a temporary repository.
TestWorld starts a Dolt server.
TestWorld runs migrations.
Technically true.
Mostly useless.
The user does not care that TestWorld exists.
The product does not promise TestWorld.
This should not become a product AC.
At most, it is evidence for another claim:
SYSTEM system/process/test-isolation
AC tests run against isolated project catalogs
verified_by tests/support/test_world.rs
That's the key move.
Not everything discovered in code becomes an acceptance criterion.
Some things become evidence.
Some are demoted.
Some move lower.
Some are ignored as accidental structure.
The bootstrap recovery explicitly uses this routing discipline: build inventory, convert it into a ledger, apply the taxonomy, group findings into specs, promote contract-level claims, and mark implementation-only details as evidence-only or demoted. (GitHub)
The actual recovery pipeline
The pipeline looks like this:
codebase
↓
inventory
↓
candidate ledger
↓
taxonomy
↓
routing decision
↓
spec tree
↓
AC catalog
↓
evidence links
A ledger row might start like this:
source: tests/cli_ac_add.rs
observed: ac add rejects unknown spec id
surface: CLI
Then Coherence asks:
Is this user-visible?
Is this a system process?
Is this a component contract?
Is this a foundation invariant?
Is this only evidence?
Is this accidental?
That produces a routed row:
final_spec: product/cli/ac
group: ac-add
action: promoted_to_ac
reason: AC authoring UX belongs to product/cli/ac
Now we have an actual acceptance criterion:
{
"spec_slug": "product/cli/ac",
"ac_slug": "ac-add-rejects-unknown-spec",
"title": "AC creation rejects unknown specs",
"intent": "When the operator creates an AC for a missing spec, the CLI reports a not-found error and does not create the AC.",
"review_mode": "automated",
"risk_level": "medium"
}
And we can attach evidence:
{
"ac_slug": "ac-add-rejects-unknown-spec",
"verified_by": "cargo test -p coherence-bootstrap cli_ac_add_rejects_unknown_spec"
}
This is the difference between docs and recovered specs.
Docs say:
There is a command.
Recovered specs say:
This command promises this behavior, and this test currently proves it.
Example 1: product behavior
Start from a CLI test:
#[test]
fn ac_add_rejects_unknown_spec() {
cmd()
.args([
"ac", "add",
"--spec-id", "SPEC-does-not-exist",
"--title", "Prints greeting",
])
.assert()
.failure()
.stderr(contains("spec not found"));
}
A static scanner might say:
There is a test named ac_add_rejects_unknown_spec.
That's not enough.
The recovered claim is:
SPEC product/cli/ac
AC ac-add-rejects-unknown-spec
intent: Creating an AC for a missing spec fails with a not-found error.
evidence: cargo test ac_add_rejects_unknown_spec
Why product/cli/ac?
Because this is operator-facing CLI behavior.
It is not a repository invariant.
It is not a database invariant.
It is not a parser detail.
The user ran a command and got a promise.
So the claim belongs at the product level.
Example 2: foundation behavior
Now take a different observation:
#[test]
fn new_ac_defaults_to_manual_review() {
let ac = AcceptanceCriterion::new("SPEC-demo", "prints-message");
assert_eq!(ac.review_mode, ReviewMode::Manual);
assert_eq!(ac.risk_level, RiskLevel::Medium);
}
This is not product UX.
The CLI may expose this behavior, but the real promise belongs lower.
Recovered claim:
SPEC foundation/domain/model/acceptance-criteria
AC new-acs-have-default-review-mode
intent: New ACs default to manual review when no review mode is provided.
AC new-acs-have-default-risk-level
intent: New ACs default to medium risk when no risk level is provided.
This exact kind of routing appears in the bootstrap ledger: rows about AC review mode and risk level are routed to the foundation domain model, not to the product CLI. (GitHub)
That decision matters.
If the product CLI spec repeats every model invariant, the graph becomes noisy.
Higher-level specs should not re-verify lower-level invariants. The README states this as a rule directly. (GitHub)
So the CLI spec can say:
ac add creates an AC
But the foundation spec owns:
new ACs default to Manual review
new ACs default to Medium risk
That's cleaner.
Example 3: moved lower
Here is a subtle one.
Suppose the CLI lets the operator omit a slug:
coherence-bootstrap ac add \
--spec-id SPEC-demo-greeting \
--title "Prints message"
The system generates an identity:
AC-demo-greeting-prints-message
At first glance, this looks like product behavior.
The user sees the generated id.
But the actual rule is deeper:
AC identity generation belongs to the AC model lifecycle.
So the routing decision is:
original: product/ac-authoring
observed: CLI can generate an AC id when slug is omitted
action: moved_to_lower_level
final: foundation/domain/model/acceptance-criteria
reason: ID generation is part of AC model lifecycle
The bootstrap routing table has this exact shape: "CLI can generate an AC id when the operator does not provide…" is moved lower because identity generation belongs to the AC model lifecycle. (GitHub)
This is the whole point.
The goal is not to ask:
Where did we notice this behavior?
The goal is to ask:
Who owns this promise?
The CLI may reveal identity generation.
The domain model owns identity generation.
Example 4: evidence-only
Now take test infrastructure.
#[test]
fn project_env_selection_uses_isolated_catalog() {
let world = TestWorld::new()
.with_project_slug("demo-app")
.with_env("test");
world.run("coherence-bootstrap spec list")
.assert()
.success();
}
There is real behavior here.
But not all of it should become ACs.
Bad recovery:
PRODUCT
AC TestWorld creates a temporary project
AC TestWorld configures environment variables
AC TestWorld starts Dolt
Better recovery:
SYSTEM system/process/test-isolation
AC verification uses isolated project catalogs
evidence:
tests/project_env_selection.rs
tests/support/test_world.rs
The helper stays evidence.
The claim is about isolation.
The bootstrap spec tree calls out system/test/world as evidence-only rows, not ACs in the catalog. (GitHub)
That is a good result.
It means the recovery process did not blindly turn every helper into a promise.
The taxonomy
The recovery needs levels.
Otherwise every finding becomes a flat pile.
The bootstrap catalog uses:
FOUNDATION → domain models + infrastructure contracts
MODULE → bounded capabilities using foundation models
COMPONENT → concrete adapters
SYSTEM → end-to-end processes
PRODUCT → user-facing surfaces
The README describes the same five-level taxonomy and explicitly separates product surfaces, system processes, concrete adapters, bounded modules, and foundation contracts. (GitHub)
This gives routing rules:
CLI output? → PRODUCT
End-to-end workflow? → SYSTEM
Parser/router/repo? → COMPONENT
Bounded capability? → MODULE
Domain model / DB rule? → FOUNDATION
Test helper? → EVIDENCE
Accidental structure? → DEMOTED
So a raw inventory can become a reviewable graph.
Example:
raw finding:
verify-spec prints per-AC outcome
routing:
level: PRODUCT
final_spec: product/cli/verify
group: verify-cli
action: promoted_to_ac
recovered AC:
verify-spec surfaces per-AC outcome within the spec
The bootstrap routing table ends with product verification rows like verify-spec surfaces per-AC outcome within the spec and verification output structure is consistent..., routed to product/cli/verify. (GitHub)
That's concrete.
A user runs verify-spec.
The product promises useful output.
The AC captures the promise.
The evidence proves it.
What the recovered tree looks like
After routing, the tree is no longer "files and functions".
It becomes behavior:
PRODUCT
product/cli/spec
spec-add
spec-list-show
product/cli/ac
ac-add
ac-list-show
product/cli/verify
verify-ac
verify-spec
product/cli/ac-tests
materialize-check
test-file-layout
product/tui/navigation
product/tui/editing
product/tui/verification
SYSTEM
system/process/ac-authoring
system/process/spec-authoring
system/process/verification
system/process/evidence-capture
system/process/test-isolation
COMPONENT
component/cli/parser
component/cli/router
component/repository/spec-store
FOUNDATION
foundation/domain/model/specs
foundation/domain/model/acceptance-criteria
foundation/infra/dolt/catalog-naming
foundation/infra/dolt/migrations
foundation/infra/filesystem/project-manifest
The current README lists the final phases in this shape: foundation specs first, then component specs, system process specs, and product CLI/TUI specs. (GitHub)
Now review becomes possible.
Not easy.
Possible.
Instead of reading every line of code, a reviewer can ask:
Is this behavior real?
Is this claim worded correctly?
Is it at the right level?
Is the linked test actually proving it?
Did we accidentally promote implementation detail into product promise?
That is much sharper than:
Does this generated documentation look okay?
The bootstrap result
The current recovered bootstrap catalog has:
252 ledger rows
219 promoted ACs
27 evidence-only rows
4 demoted rows
2 moved to lower level
28 final specs
The README reports those numbers directly and describes the goal as exhaustive routing where no row disappears. (GitHub)
That is the holy-shit part.
Not because 219 ACs is a magical number.
Because the system found 252 pieces of behavioral evidence and did not treat them all the same.
This is what the split means:
promoted_to_ac
This is a real claim the system should continue to satisfy.
evidence_only
This supports another claim, but is not itself a promise.
demoted
This was too implementation-specific or too low-level.
moved_to_lower_level
This was noticed higher up, but owned lower down.
That is the difference between "generate docs" and "recover intent".
Why this matters for agents
Agents are very good at producing code.
That is the problem.
A human can review a small patch by intuition.
But when an agent produces a large coherent-looking diff, the hard question is not:
Does this compile?
The hard question is:
Which promises did this change touch?
Without a recovered spec graph, the agent has to infer that from the whole repo.
That means the agent reads code, tests, names, comments, previous decisions, file layout, and maybe a README. Then it guesses the intent.
Sometimes the guess is good.
Sometimes the guess is cursed.
With Coherence, the agent gets a slice:
Change request:
Improve verify-spec output.
Relevant spec slice:
product/cli/verify
AC verify-spec-accepts-id
AC verify-spec-reports-aggregate-counts
AC verify-spec-surfaces-per-ac-outcome
AC verification-output-is-consistent
system/process/verification
AC verification-aggregates-link-results
AC no-evidence-is-reported-clearly
foundation/domain/model/ac-verification-latest
AC latest-result-is-stored-per-ac
Now the agent can work against known claims.
The human reviews whether the claims changed.
The test runner verifies linked evidence.
The important part is not that the agent has more context.
The important part is that the context is routed.
Pull request review changes
A normal PR review asks:
Did the code change look sane?
Did tests pass?
Did the agent break anything obvious?
A Coherence-style review asks:
Which ACs changed?
Which evidence links changed?
Did new behavior get a new claim?
Did removed behavior delete or deprecate a claim?
Did a product claim accidentally depend on a foundation invariant?
Did a test still verify the claim it says it verifies?
Example PR note:
This patch changes verify-spec output.
Affected ACs:
product/cli/verify.verify-spec-reports-aggregate-counts
product/cli/verify.verify-spec-surfaces-per-ac-outcome
Evidence updated:
cargo test verify_spec_reports_aggregate_counts
cargo test verify_spec_surfaces_per_ac_outcome
No foundation claims changed.
No product claims were added.
That is a review surface.
A reviewer can actually attack it.
They can say:
No, this also changes no-links behavior.
Add/update the AC for that.
That is exactly the kind of correction agents need.
Not vague "be careful".
A concrete missing claim.
What Coherence adds
Coherence does not replace tests.
It does not replace docs.
It does not replace code review.
It adds a durable relationship:
SPEC
has AC
implemented_by code
verified_by test
constrained_by another spec
The README describes this as a graph of requirements where specs relate to each other and ACs connect outcomes to implementation and executable evidence. (GitHub)
The test stays normal Rust:
#[test]
fn verify_spec_surfaces_per_ac_outcome() {
let world = TestWorld::new();
world.seed_spec("SPEC-demo");
world.seed_ac("AC-one");
world.seed_ac("AC-two");
world.run("coherence-bootstrap verify-spec SPEC-demo")
.assert()
.success()
.stdout(contains("AC-one"))
.stdout(contains("AC-two"))
.stdout(contains("OVERALL"));
}
Coherence records why that test matters:
{
"spec_slug": "product/cli/verify",
"ac_slug": "verify-spec-surfaces-per-ac-outcome",
"verified_by": "cargo test verify_spec_surfaces_per_ac_outcome"
}
That link is the trust boundary.
A passing test alone is not enough.
Someone must confirm that this test actually verifies that AC.
After that, automation can keep checking it.
The point
Reverse-spec recovery is not:
read code → generate docs
It is:
read code
→ discover behavioral evidence
→ route each finding
→ promote only real claims
→ keep helpers as evidence
→ move lower-level invariants down
→ produce a reviewable spec tree
That is why this matters.
Most teams do not have clean specs.
They have working code.
They have tests.
They have production behavior.
They have old decisions buried in implementation.
Coherence gives you a way to recover a map from that mess.
Not perfectly.
Not magically.
But concretely.
A recovered spec is a claim you can review:
Is this true?
Should this remain true?
Is this at the right level?
What proves it?
That is already a better object than a giant diff.
And once the map exists, both humans and agents can stop pretending the only source of truth is "read the whole repo and vibe-check it".
Code is not enough.
Tests are not enough.
Docs are not enough.
The durable object is the relationship:
promise
→ claim
→ implementation
→ evidence
Reverse-spec recovery is how you build that relationship when the code came first.