Methods & Reproducibility

Every science repo carries a mandatory methods-and-reproducibility record. The record is not prose alone: it is backed by a provenance manifest that captures the exact inputs, code, environment, seed, and timing of a run so any result can be replayed.

Five Provenance Fields

Every provenance manifest records five mandatory fields. The check is fail-closed: if provenance is missing, the pipeline does not progress.

Field              Captures
data_hash          SHA-256 of every input data file
code_hash          SHA-256 of the executing code file
library_versions   All installed package versions via importlib.metadata
random_seed        RNG seed propagated through the run (HORDAGO_SEED)
timestamp          Run timestamp for traceability
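
As an illustration of how the five fields could be assembled and checked, here is a minimal sketch. It is not the code in src/hordago/provenance.py: the function names (sha256_file, build_manifest, require_complete), the per-path layout of data_hash, and the exact error type are assumptions made for this example. Only the field names, the use of SHA-256, importlib.metadata, and HORDAGO_SEED come from the table above.

```python
import hashlib
import os
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path

REQUIRED_FIELDS = ("data_hash", "code_hash", "library_versions", "random_seed", "timestamp")


def sha256_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, code_file):
    """Assemble the five provenance fields (hypothetical layout, not the hordago API)."""
    seed = os.environ.get("HORDAGO_SEED")
    return {
        # Assumption: one digest per input file, keyed by its path.
        "data_hash": {str(p): sha256_file(Path(p)) for p in data_files},
        "code_hash": sha256_file(Path(code_file)),
        "library_versions": {
            dist.metadata["Name"]: dist.version for dist in metadata.distributions()
        },
        "random_seed": int(seed) if seed is not None else None,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def require_complete(manifest):
    """Fail closed: raise if any mandatory field is absent or empty."""
    missing = [f for f in REQUIRED_FIELDS if manifest.get(f) in (None, "", {})]
    if missing:
        raise RuntimeError(f"provenance incomplete, missing: {missing}")
    return manifest
```

Calling require_complete(build_manifest(inputs, script_path)) before a pipeline stage hands off its outputs is one way to get the fail-closed behaviour: an unset HORDAGO_SEED leaves random_seed as None and the call raises instead of letting the run continue.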

Reproducibility Template

The reproducibility record is the per-repo methods template. It documents the pinned environment, the provenance runtime, the canonical artifact contract, and the RNG seed propagation rule so a reader can reconstruct the run end to end.

Surface               Mechanism                             Location
Python deps           requirements-lock.txt exact pins      repo root
Container base image  Tag-pinned python:3.11.11-slim        containers/base.Dockerfile
Runtime artifacts     SHA-256 hashed provenance manifests   src/hordago/provenance.py
Script outputs        Self-hashed provenance bundles        scripts/provenance_bundle.py
Container runs        Auto-init provenance at startup       containers/scripts/provenance_init.sh
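
The "exact pins" requirement on requirements-lock.txt can be checked mechanically. The sketch below is one possible check, not a tool shipped with the repo: the function name unpinned_requirements and the regular expression are assumptions for illustration. It accepts only name==version lines (optionally with extras and an environment marker) and reports everything else.

```python
import re

# name, optional [extras], "==", then a concrete version followed by end of line,
# whitespace, or an environment marker. Wildcards such as "1.*" do not match.
EXACT_PIN = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[A-Za-z0-9.!+_-]+(?=$|[\s;])"
)


def unpinned_requirements(lock_text: str) -> list[str]:
    """Return requirement lines that are not exact '==' pins."""
    offenders = []
    for raw in lock_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as --hash
            continue
        if not EXACT_PIN.match(line):
            offenders.append(line)
    return offenders
```

An empty return value means every requirement line is an exact pin; a CI step could fail the build when the list is non-empty.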

Provenance Runtimes

Three provenance surfaces produce the manifest:

  • The Python runtime (make_manifest) hashes inputs, code, and library versions at every pipeline boundary.
  • The script runtime (create_provenance_bundle) emits a self-hashed JSON bundle including the git commit and dirty-tree flag.
  • The container runtime writes /output/provenance.json at container startup.
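
To make "self-hashed" concrete, here is a minimal sketch of one way such a bundle could work. It is not the implementation of create_provenance_bundle: the helper names (git_state, canonical_digest, self_hashed_bundle, verify_bundle) and the key names git_commit, git_dirty, and bundle_sha256 are hypothetical. The idea shown is to hash a canonical JSON encoding of the bundle body and store the digest alongside it, so later edits to the body are detectable.

```python
import hashlib
import json
import subprocess


def git_state():
    """Return (commit, dirty), or (None, None) when git or a checkout is unavailable."""
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        ).stdout.strip()
        status = subprocess.run(
            ["git", "status", "--porcelain"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None, None
    # Any porcelain output means uncommitted or untracked changes.
    return commit, bool(status.strip())


def canonical_digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def self_hashed_bundle(payload: dict) -> dict:
    """Attach git state, then embed the digest of everything except the digest itself."""
    commit, dirty = git_state()
    body = {**payload, "git_commit": commit, "git_dirty": dirty}
    return {**body, "bundle_sha256": canonical_digest(body)}


def verify_bundle(bundle: dict) -> bool:
    """Recompute the digest over the body and compare it with the stored value."""
    body = {k: v for k, v in bundle.items() if k != "bundle_sha256"}
    return bundle.get("bundle_sha256") == canonical_digest(body)
```

Sorting keys and fixing separators matters here: without a canonical encoding, two semantically identical bundles could serialize differently and produce different digests.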

Seed Propagation

Pass HORDAGO_SEED=<integer> as an environment variable to propagate a fixed seed through analysis runs. The provenance manifest captures this seed in env_capture so sessions stay traceable and reproducible.
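
A minimal sketch of consuming the seed inside an analysis script follows. How hordago itself reads the variable is not shown in this document, so the helper names seed_from_env and seeded_rng are assumptions, as is the choice of a dedicated random.Random instance over global RNG state.

```python
import os
import random


def seed_from_env(default=None):
    """Read HORDAGO_SEED and return it as an int, or `default` if it is unset."""
    raw = os.environ.get("HORDAGO_SEED")
    if raw is None:
        return default
    return int(raw)  # raises ValueError on a non-integer value


def seeded_rng(default=None):
    """Return (rng, seed): a random.Random seeded from HORDAGO_SEED, plus the seed used."""
    seed = seed_from_env(default)
    # With seed=None, random.Random falls back to OS entropy, so the run is not replayable.
    return random.Random(seed), seed
```

Run as, for example, HORDAGO_SEED=42 python analysis.py (analysis.py is a placeholder name). Two runs with the same value draw the same sequence from the returned generator, and returning the seed alongside the generator makes it easy to record in the manifest.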

Source Pointers

  • docs/reproducibility-policy.md
  • src/hordago/provenance.py
  • scripts/provenance_bundle.py
  • containers/scripts/provenance_init.sh