First of all, I don't want to run anyone's code without proper explanation, so help me understand this.
Let's start with the verifier. The third-party verifier receives a bundle without knowing what its contents are, without access to the tool used to take the measurements, and just runs a single command against the bundle, which presumably contains both the expected results and the actual measurements, either of which can easily be tampered with. What does that actually prove?
Fair question. The bundle alone proves nothing; you're right about that.
Two things make it non-trivial to fake:

1. The pipeline is public. You can read scripts/steward_audit.py before running anything. It's not a black box.
2. For materials claims, the expected value isn't in the bundle. Young's modulus for aluminium is ~70 GPa. That's not my number; it's physics. The verifier checks against that, not against something I provided.
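That second point reduces to comparing a measured value against a literature constant with some tolerance, where the constant lives in the public verifier rather than in the bundle. A minimal sketch of the idea, with an assumed function name and tolerance (this is not the actual steward_audit.py code):

```python
# Literature value for Young's modulus of aluminium, in GPa.
# This constant ships with the (public) verifier, not inside the bundle,
# so a tampered bundle can't move the goalposts.
E_ALUMINIUM_GPA = 70.0

def check_modulus(measured_gpa: float, rel_tol: float = 0.05) -> bool:
    """PASS only if the bundle's measurement is within rel_tol of physics."""
    return abs(measured_gpa - E_ALUMINIUM_GPA) / E_ALUMINIUM_GPA <= rel_tol

print(check_modulus(69.0))   # plausible aluminium measurement -> True
print(check_modulus(210.0))  # that's steel, not aluminium -> False
```

Faking a PASS here means faking a measurement that agrees with physics, not just editing an "expected" field in the bundle.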
ML and pipeline claims are provenance-only, with no physical grounding. That's documented in known_faults.yaml :: SCOPE_001.
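Machine-readable limitations let a verifier gate on scope automatically instead of relying on a reader noticing a caveat. A sketch of how a consumer might do that, using a hypothetical entry shape shown as a Python dict (the real known_faults.yaml schema may differ):

```python
# Hypothetical shape of known_faults.yaml entries, inlined as a dict.
# Field names here are assumptions, not the real schema.
known_faults = {
    "SCOPE_001": {
        "area": "ml_pipelines",
        "status": "provenance_only",
        "note": "ML and pipeline claims have no physical grounding.",
    },
}

def physically_grounded(claim_area: str) -> bool:
    """Return False for claim areas flagged as provenance-only."""
    for fault in known_faults.values():
        if fault["area"] == claim_area and fault["status"] == "provenance_only":
            return False
    return True

print(physically_grounded("materials"))     # True: checked against physics
print(physically_grounded("ml_pipelines"))  # False: provenance only
```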
Claude + Cursor wrote the structure. I fixed hundreds of errors: wrong tests, broken pipelines, docs that didn't match the code. That's literally why the verification layer exists. AI gets it wrong constantly.
This comment was also written by Claude, on my direction. That's the point: a tool, not an author.
The adversarial test is public and runnable in 5 minutes.
If the output isn't PASS/PASS on your machine, I want to know. If the protocol design is flawed, I want to know where, specifically. Known limitations are machine-readable: reports/known_faults.yaml