Reproducibility Envelope (Provenance)
Overview
Provenance is the frozen dataclass attached to every NavResult to record the exact
runtime state under which a navigation produced its outputs. Two navigations with
identical inputs produce byte-identical Provenance except for pipeline_run_iso8601,
which is wall-clock by construction; regression-baseline comparison strips that field
before comparing.
Theory
The provenance envelope captures three independent kinds of state:
Code state. rms_nav_version and rms_nav_git_sha together identify the exact source code that ran.
External-data state. spice_kernels lists every SPICE kernel actually loaded; static_data_hashes sha256-hashes every YAML in src/nav/config_files whose filename matches one of the _STATIC_DATA_PREFIXES (config_220_ for the body shape catalogue, config_3 for ring catalogues, config_4 for per-instrument blocks).
Pipeline state. technique_names and extractor_names enumerate every registered NavTechnique and NavModel under the current process, so a regression run pinned to an old code revision but with a new technique registered records the difference in its provenance even when the outputs are otherwise byte-identical.
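The static-data hashing described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name hash_static_data is hypothetical, the prefix tuple simply copies the three prefixes named in the text, and the .yaml extension is an assumption.

```python
import hashlib
from pathlib import Path

# Prefixes copied from the text; the real module-level constant may differ.
_STATIC_DATA_PREFIXES = ("config_220_", "config_3", "config_4")


def hash_static_data(config_dir: Path) -> dict[str, str]:
    """Map each matching YAML filename to the sha256 hex digest of its bytes.

    Hypothetical helper: only files whose names start with one of the
    prefixes above are included. The '*.yaml' glob is an assumption.
    """
    hashes: dict[str, str] = {}
    for path in sorted(config_dir.glob("*.yaml")):
        # str.startswith accepts a tuple and matches any of its members.
        if path.name.startswith(_STATIC_DATA_PREFIXES):
            # Raw bytes are hashed, so comments and whitespace count.
            hashes[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes
```

Because the digest is taken over raw bytes rather than parsed YAML, two files that load to the same data can still hash differently.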
Restrictions and assumptions
The static-data hash list is built once per navigate call by collect_provenance_metadata(). Comments and whitespace are included in the hashed bytes, so a YAML edit that only adds a comment changes the hash.
The SPICE-kernel list is read live from spiceypy.ktotal / spiceypy.kdata; the orchestrator does not coerce or sort the list itself (the dataclass sorts it for byte-identical output).
The git SHA is read from git rev-parse HEAD plus a --is-dirty check; the reported value is 'dirty' when the working tree has uncommitted changes and None when neither git nor a recorded SHA is available.
The dataclass is frozen; the spice_kernel_count derived field is populated in __post_init__ from the kernel list length.
Sources of uncertainty
The envelope reports no uncertainty. Every field is a deterministic readout of runtime state at navigate time.
Configuration
The envelope carries no YAML configuration of its own. The list of filename prefixes
counted as static data lives in module-level
_STATIC_DATA_PREFIXES; downstream callers that want a different set of YAML files
hashed must extend the list at module level.
Implementation
Source file: src/nav/nav_orchestrator/provenance.py, which defines Provenance,
ProvenanceMetadata, and collect_provenance_metadata().
Public surface (autodocumented at nav.nav_orchestrator):
Provenance: frozen dataclass. Public fields:
  rms_nav_version: version string (e.g. '0.5.2').
  image_et: observation midtime ET (TDB seconds past J2000).
  pipeline_run_iso8601: UTC timestamp when the run began; excluded from regression-baseline comparison.
  rms_nav_git_sha: short git SHA, 'dirty', or None.
  spice_kernels: sorted tuple of SPICE kernel filenames.
  static_data_hashes: read-only mapping of YAML filename to sha256 hex digest.
  technique_names: sorted tuple of registered technique class names.
  extractor_names: sorted tuple of registered extractor class names.
  spice_kernel_count: derived; populated from len(spice_kernels) in __post_init__.
ProvenanceMetadata: internal dataclass returned by collect_provenance_metadata(), carrying the freshly-read git SHA, kernel list, and static-data hash dict.
collect_provenance_metadata(): runs the live readouts. Called once per navigate().
The dataclass enforces invariants in __post_init__: every collection input is coerced
to its read-only / sorted form so two
Provenance instances with the same inputs are
byte-identical for hash / serialisation.
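The coercion pattern described above can be illustrated with a reduced stand-in. ProvenanceSketch is hypothetical and carries only three of the documented fields; the real class may coerce its inputs differently. The sketch shows one common way to normalise fields on a frozen dataclass, by calling object.__setattr__ inside __post_init__.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Iterable, Mapping


@dataclass(frozen=True)
class ProvenanceSketch:
    """Hypothetical, reduced stand-in for Provenance (not the real class)."""

    spice_kernels: Iterable[str]
    static_data_hashes: Mapping[str, str]
    # Derived field: excluded from __init__, filled in by __post_init__.
    spice_kernel_count: int = field(init=False)

    def __post_init__(self) -> None:
        kernels = tuple(sorted(self.spice_kernels))
        # Copy into a key-sorted dict, then wrap it in a read-only view.
        hashes = MappingProxyType(dict(sorted(self.static_data_hashes.items())))
        # frozen=True blocks normal assignment, so bypass it explicitly.
        object.__setattr__(self, "spice_kernels", kernels)
        object.__setattr__(self, "static_data_hashes", hashes)
        object.__setattr__(self, "spice_kernel_count", len(kernels))
```

With this normalisation, the order in which a caller supplies kernels or hash entries does not affect the stored values, which is what makes serialised output stable across runs.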
Examples
Two navigations of the same image with the same SPICE kernels. An operator runs a
batch over a Cassini ISS image at two different wall-clock times. Both runs produce
Provenance instances that differ only in
pipeline_run_iso8601; every other
field is byte-identical. A regression-baseline comparator strips that field and confirms
the two outputs match.
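A comparator of the kind described in this example might look like the sketch below. MiniProvenance is a hypothetical three-field stand-in for Provenance, and comparable_view and provenance_matches are illustrative names, not part of the documented public surface.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class MiniProvenance:
    """Hypothetical stand-in carrying three of the documented fields."""

    rms_nav_version: str
    rms_nav_git_sha: Optional[str]
    pipeline_run_iso8601: str


# Fields that are wall-clock by construction and must not be compared.
_VOLATILE_FIELDS = frozenset({"pipeline_run_iso8601"})


def comparable_view(prov: Any) -> dict:
    """Return the dataclass fields as a dict, minus the volatile ones."""
    return {
        f.name: getattr(prov, f.name)
        for f in fields(prov)
        if f.name not in _VOLATILE_FIELDS
    }


def provenance_matches(a: Any, b: Any) -> bool:
    """True when two envelopes agree on every non-volatile field."""
    return comparable_view(a) == comparable_view(b)
```

Reading fields with getattr rather than dataclasses.asdict avoids deep-copying field values, which matters if a field holds a read-only mapping view that cannot be copied.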
Code change that surfaces in provenance. An operator pulls a new commit that touches
nav.nav_technique.dt_fitting and reruns navigation. The new
Provenance carries a different
rms_nav_git_sha; the rest of the
envelope is unchanged unless the per-technique result changed. The reviewer can correlate
output diffs with the SHA delta directly.
Static-data change. An operator edits config_220_body_shape.yaml to refine
Mimas’s ellipsoid residual. The next
Provenance carries a different
static_data_hashes entry for that
file; downstream regression baselines pinned to the old hash flag the difference and
require re-baselining.