Collections
Collections organise candidates, hypotheses, experiments, papers, and provenance into coherent research cohorts. They are scientific operating units, not just lists.
Collection exists today as a deployed RecordType on the Protean Ledger. Off-chain collection manifests still exist as supplemental packaging for review and mirror publication, but the current canonical public object is a Ledger Collection record linked by typed Includes edges.
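To make the relationship concrete, a Collection record linked to its members by typed Includes edges could be modelled roughly as follows. This is a minimal sketch, not the Ledger schema: the class names `CollectionRecord` and `IncludesEdge`, every field name, and the member type strings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IncludesEdge:
    """Typed edge from a collection to one member record (hypothetical shape)."""
    collection_id: str
    member_id: str
    member_type: str  # assumed labels, e.g. "candidate", "experiment", "paper"


@dataclass
class CollectionRecord:
    """Illustrative stand-in for a Ledger Collection record."""
    record_id: str
    title: str
    edges: list[IncludesEdge] = field(default_factory=list)

    def include(self, member_id: str, member_type: str) -> IncludesEdge:
        # Membership is expressed as an edge rather than an embedded payload.
        edge = IncludesEdge(self.record_id, member_id, member_type)
        self.edges.append(edge)
        return edge

    def members(self, member_type: str) -> list[str]:
        # Filter members by the type carried on each edge.
        return [e.member_id for e in self.edges if e.member_type == member_type]
```

Keeping membership on edges means a collection can reference candidates, experiments, and papers uniformly without copying their contents into the collection itself.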
What collections are for
- benchmark sets
- assay cohorts
- experiment batches
- validation groups
- exploration branches
- public full-sequence releases
Lifecycle
private
-> internal review
-> assay pending
-> assay active
-> public reviewed
-> public full
-> archived
Most collection preparation remains private or under internal review before Ledger submission. Public collections expose membership, candidate/family sequences, lifecycle state, disclosure state, and typed lineage. Supplemental manifests pass through the publication guard before mirror publication.
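The lifecycle states can be sketched as an ordered enumeration. The state names come from the list above; the rule that a collection only moves forward through that order is an assumption made for this example, since the text does not say whether states may be skipped or revisited. The names `Lifecycle` and `can_advance` are hypothetical.

```python
from enum import IntEnum


class Lifecycle(IntEnum):
    """Lifecycle states in the order they are listed in this section."""
    PRIVATE = 0
    INTERNAL_REVIEW = 1
    ASSAY_PENDING = 2
    ASSAY_ACTIVE = 3
    PUBLIC_REVIEWED = 4
    PUBLIC_FULL = 5
    ARCHIVED = 6


def can_advance(current: Lifecycle, target: Lifecycle) -> bool:
    # Assumption: transitions are forward-only; the real policy may be stricter.
    return target > current
```

For example, `can_advance(Lifecycle.PRIVATE, Lifecycle.INTERNAL_REVIEW)` is true under this assumption, while moving from `PUBLIC_FULL` back to `ASSAY_ACTIVE` is rejected.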
Collections as scientific operating units
Collections let the runtime reason about groups, not just individual candidates. They bind candidate families, evidence traces, assay readiness, Ledger records, and review state without exposing private payloads. Collections support wet-lab batch planning, failure-memory studies, exploration branches, and public proof surfaces.
How they connect to the workflow
The canonical workflow DAG places update_collections after prepare_provenance, gated by the public_collection_redaction review gate. The active loop wires collection preparation after lineage and provenance stages where the relevant artifacts are available.
python3 pipelines/collection_system.py --build
python3 pipelines/collection_system.py --verify
Collection manifests are public-safe by construction: published sequences remain visible, while salts, local paths, embeddings, scoring weights, source traces, and wet-lab automation fields are excluded.
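The exclusion rule can be illustrated with a small filter that drops private fields from a manifest before publication. This is a sketch under stated assumptions, not the behaviour of `collection_system.py` or the publication guard: the key names in `EXCLUDED_FIELDS` and the function `strip_private` are hypothetical stand-ins for the categories listed above.

```python
# Hypothetical key names for the excluded categories; real manifests may differ.
EXCLUDED_FIELDS = frozenset({
    "salt",
    "local_path",
    "embedding",
    "scoring_weights",
    "source_trace",
    "wet_lab_automation",
})


def strip_private(value):
    """Return a copy of a manifest with excluded keys removed at every depth."""
    if isinstance(value, dict):
        return {
            key: strip_private(inner)
            for key, inner in value.items()
            if key not in EXCLUDED_FIELDS
        }
    if isinstance(value, list):
        return [strip_private(item) for item in value]
    return value  # scalars such as published sequences pass through unchanged
```

Filtering recursively matters because private fields may be nested inside member entries rather than sitting at the top level of the manifest.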
