Methods & Reproducibility
Every science repo carries a mandatory methods-and-reproducibility record. The record is not prose alone: it is backed by a provenance manifest that captures the exact inputs, code, environment, seed, and timing of a run so any result can be replayed.
Five Provenance Fields
Every provenance manifest records five mandatory fields. The pipeline fails closed: a run with missing provenance cannot progress.
| Field | Captures |
|---|---|
| data_hash | SHA-256 of every input data file |
| code_hash | SHA-256 of the executing code file |
| library_versions | All installed package versions via importlib.metadata |
| random_seed | RNG seed propagated through the run (HORDAGO_SEED) |
| timestamp | Run timestamp for traceability |
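
The five fields and the fail-closed rule can be illustrated with a minimal sketch. It is not the code in src/hordago/provenance.py: the helper names `sha256_file`, `build_manifest`, and `check_manifest`, the per-file layout of `data_hash`, and the UTC ISO-8601 timestamp format are all assumptions made for illustration.

```python
import hashlib
import importlib.metadata
import os
from datetime import datetime, timezone
from pathlib import Path

REQUIRED_FIELDS = (
    "data_hash", "code_hash", "library_versions", "random_seed", "timestamp",
)


def sha256_file(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(input_paths, code_path):
    """Hypothetical helper: assemble the five provenance fields."""
    seed = os.environ.get("HORDAGO_SEED")
    return {
        # Assumed layout: one digest per input file, keyed by path.
        "data_hash": {str(p): sha256_file(p) for p in input_paths},
        "code_hash": sha256_file(code_path),
        "library_versions": {
            dist.metadata["Name"]: dist.version
            for dist in importlib.metadata.distributions()
        },
        "random_seed": int(seed) if seed is not None else None,
        # Assumed format: timezone-aware UTC timestamp in ISO 8601.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def check_manifest(manifest):
    """Fail closed: raise if any mandatory field is absent or empty."""
    missing = [f for f in REQUIRED_FIELDS if manifest.get(f) in (None, "", {})]
    if missing:
        raise RuntimeError(f"provenance incomplete, missing: {missing}")
    return manifest
```

In this sketch a seed of 0 counts as present, while an unset HORDAGO_SEED leaves `random_seed` as `None` and makes `check_manifest` raise, so the run stops instead of continuing without a recorded seed.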
Reproducibility Template
The reproducibility record is the per-repo methods template. It documents the pinned environment, the provenance runtime, the canonical artifact contract, and the RNG seed propagation rule so a reader can reconstruct the run end to end.
| Surface | Mechanism | Location |
|---|---|---|
| Python deps | requirements-lock.txt exact pins | repo root |
| Container base image | Tag-pinned python:3.11.11-slim | containers/base.Dockerfile |
| Runtime artifacts | SHA-256 hashed provenance manifests | src/hordago/provenance.py |
| Script outputs | Self-hashed provenance bundles | scripts/provenance_bundle.py |
| Container runs | Auto-init provenance at startup | containers/scripts/provenance_init.sh |
Provenance Runtimes
Three provenance surfaces produce the manifest. The Python runtime
(make_manifest) hashes inputs, code, and library versions at every pipeline
boundary; the script runtime (create_provenance_bundle) emits a self-hashed
JSON bundle including the git commit and dirty-tree flag; the container runtime
writes /output/provenance.json at container startup.
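
One way to read "self-hashed" is that the bundle stores a SHA-256 digest computed over its own canonical JSON form, so later tampering is detectable. The sketch below follows that reading. It is an assumption about the approach, not the implementation of create_provenance_bundle: the names `git_state`, `seal_bundle`, and `verify_bundle` and the keys `bundle_hash`, `git_commit`, and `git_dirty` are hypothetical.

```python
import hashlib
import json
import subprocess


def git_state(repo_dir="."):
    """Return (commit, dirty) for repo_dir, or (None, None) if git fails."""
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
        status = subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None, None
    # Any porcelain output means uncommitted or untracked changes.
    return commit, bool(status.strip())


def seal_bundle(payload):
    """Return a copy of payload with a SHA-256 over its canonical JSON."""
    body = {k: v for k, v in payload.items() if k != "bundle_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "bundle_hash": digest}


def verify_bundle(bundle):
    """Recompute the hash and compare it with the stored one."""
    return seal_bundle(bundle)["bundle_hash"] == bundle.get("bundle_hash")
```

A caller might combine the two as `commit, dirty = git_state()` followed by `seal_bundle({"git_commit": commit, "git_dirty": dirty, ...})`. Sorting keys before hashing keeps the digest stable regardless of the order in which fields were added.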
Seed Propagation
Pass HORDAGO_SEED=<integer> as an environment variable to propagate a fixed
seed through analysis runs. The provenance manifest captures this seed in
env_capture so sessions stay traceable and reproducible.
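
A minimal sketch of the propagation rule, assuming the seed is read once from the environment and handed to a generator from Python's standard `random` module. The helpers `read_seed` and `seeded_rng` and the shape of the returned capture dict are hypothetical; the actual `env_capture` structure is not shown in this record.

```python
import os
import random


def read_seed(env=None):
    """Return HORDAGO_SEED as an int, or None when it is unset."""
    env = os.environ if env is None else env
    raw = env.get("HORDAGO_SEED")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        raise ValueError(
            f"HORDAGO_SEED must be an integer, got {raw!r}"
        ) from None


def seeded_rng(env=None):
    """Build a random.Random from HORDAGO_SEED and report the seed used."""
    seed = read_seed(env)
    # With seed=None, random.Random falls back to OS entropy (not replayable).
    rng = random.Random(seed)
    return rng, {"HORDAGO_SEED": seed}
```

Two generators built from the same seed produce the same sequence, which is what makes a run replayable. Passing an explicit generator to each analysis step, rather than seeding the global `random` state, keeps the seed's path through the code visible.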
Source Pointers
- docs/reproducibility-policy.md
- src/hordago/provenance.py
- scripts/provenance_bundle.py
- containers/scripts/provenance_init.sh