How a discovery cycle works
A Protean cycle moves through sixteen bounded stages — from evidence intake to a confirmed RuntimeCycle record on the Protean Ledger. The early stages curate scientific input; the middle stages propose, validate, and rank; the final stages explain, draft typed records, anchor them to Base mainnet, and hand off to human review.
Stages 1–14 run on every cycle. Stage 9 (assay-preparation handoff) is currently drafts-only; the remaining wet-lab lifecycle states are reserved. Stage 16 is the on-chain step: terminal scientific objects are submitted to the Protean Ledger at 0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0 via the role-separated Galen → operator-approval → Bankr write path.
FIG · 01 · Runtime cycle topology
- 01 model scan
- 02 healthcheck
- 03 ingest
- 04 extract
- 05 index
- 06 normalize
- 07 constrain
- 08 features
- 09 rank
- 10 train
- 11 explain
- 12 claim QA
- 13 learn
- 14 cognition
- 15 provenance
- 16 ledger write
The shape of a cycle
A cycle has a fixed contract. It cannot reorder its stages, skip them, or recurse. Each stage receives an explicit input, applies an explicit control, and writes an explicit output that the next stage reads. The full stage list and the canonical labels are defined in the cycle executor and verified by the operations kernel.
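The fixed contract can be pictured as a runner that owns the stage order itself. The following is a minimal sketch, not the cycle executor: the stage labels are copied from the topology figure, while `run_cycle` and the handler mapping are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Mapping

# Stage labels as listed in the runtime cycle topology figure.
STAGES: tuple[str, ...] = (
    "model scan", "healthcheck", "ingest", "extract", "index", "normalize",
    "constrain", "features", "rank", "train", "explain", "claim QA",
    "learn", "cognition", "provenance", "ledger write",
)

StageFn = Callable[[Any], Any]


def run_cycle(handlers: Mapping[str, StageFn], initial_input: Any) -> list[tuple[str, Any]]:
    """Run every stage exactly once, in the fixed order, feeding each output forward."""
    missing = [name for name in STAGES if name not in handlers]
    extra = [name for name in handlers if name not in STAGES]
    if missing or extra:
        # A cycle may not skip a stage or smuggle in an unknown one.
        raise ValueError(f"stage contract violated: missing={missing} extra={extra}")

    trace: list[tuple[str, Any]] = []
    value = initial_input
    for name in STAGES:  # the order comes from STAGES, never from the caller
        value = handlers[name](value)
        trace.append((name, value))
    return trace
```

Because the loop iterates over the constant tuple rather than over the caller's mapping, reordering or recursion is not expressible through the handler argument.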
The earliest stages handle curation: evidence intake, deduplication, normalization, and entity extraction. The middle stages compose the constraint surface, propose candidates inside it, and run deterministic validation. The late stages rank the survivors, explain why they ranked where they did, and check generated claims against the evidence index. They then apply a single bounded-feedback pass when its preconditions are met, prepare a reviewed public projection (with full candidate and family sequences where records are intentionally published), and draft Ledger envelopes for the terminal scientific objects. The final stage broadcasts those envelopes, gated by an operator-signed approval token, through the Bankr automation wallet, which holds the contract's AUTOMATION_WRITER_ROLE.
The end of a cycle is a record on chain
The terminal artifact of a cycle is no longer a local snapshot directory — it is a typed RuntimeCycle record on the Protean Ledger, optionally accompanied by Hypothesis, Experiment, Candidate, and ScientificAsset records produced inside the same run. The shape of the discovery flow:
Hypothesis + Evidence + Candidate
    │           │           │
    └───────────┴───────────┘
                │
                ▼
RuntimeCycle (anchors the cohort)
                │
                ▼
ScientificAsset (citable aggregation)
                │
                ▼
Protean Ledger record on Base mainnet
                │
                ▼
   Indexer · digest · explorer
 GitHub mirror · Gitlawb mirror
Local snapshots are still produced. They support operator audit and feed the GitHub and Gitlawb public verification rails, but they are downstream of the chain, not the canonical artifact. A new reader who wants to verify the cycle does not start from a local bundle; they start from getRecord(bytes32) on the Ledger proxy and walk the record's lineage edges.
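Walking lineage edges amounts to a graph traversal over parent references. The sketch below assumes records have already been fetched into a dictionary; the `Record` dataclass, its field names, and `walk_lineage` are hypothetical and do not mirror the on-chain Record struct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str                 # stand-in for the on-chain bytes32 identifier
    kind: str                      # e.g. "Hypothesis", "RuntimeCycle", "ScientificAsset"
    parents: tuple[str, ...] = ()  # lineage edges to parent record IDs


def walk_lineage(store: dict[str, Record], start_id: str) -> list[Record]:
    """Return the start record and all of its ancestors, each visited once."""
    seen: set[str] = set()
    order: list[Record] = []
    stack = [start_id]
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue
        seen.add(rid)
        record = store[rid]  # a KeyError here means a dangling lineage edge
        order.append(record)
        stack.extend(record.parents)
    return order
```

In a real verification pass each lookup would be a `getRecord(bytes32)` call rather than a dictionary read; the traversal logic stays the same.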
Sixteen stages, in detail
The atlas below summarises each stage's mechanism and why it exists in the cycle. The full input, control, output, and signal contract for every stage lives in the runtime data layer; this view is a reader's overview.
Evidence acquisition
Literature, prior records, proteomics summary metadata, failure signals, patent context, and assay-summary sources enter as provenance-bearing evidence.
Control
source trust scoring · deduplication · bounded ingestion
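As an illustration of how these three controls might compose, the sketch below filters by a trust threshold, deduplicates on a normalized content hash, and stops at a fixed item budget. The `EvidenceItem` fields, the threshold, and the budget are assumptions made for this example, not values from the runtime.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceItem:
    source: str
    text: str
    trust: float  # hypothetical trust score in [0, 1]


def ingest(items, min_trust: float = 0.5, max_items: int = 100) -> list[EvidenceItem]:
    """Keep trusted, unique items in arrival order, up to max_items."""
    seen: set[str] = set()
    kept: list[EvidenceItem] = []
    for item in items:
        if len(kept) >= max_items:
            break  # bounded ingestion: never exceed the budget
        if item.trust < min_trust:
            continue  # source trust scoring: drop low-trust sources
        # Deduplicate on case- and whitespace-normalized text.
        normalized = " ".join(item.text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(item)
    return kept
```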
Multimodal scientific memory
Typed peptide, evidence, embedding, proteomics, motif-family, failure, lineage, and assay-summary records persist as context for future cycles.
Control
artifact-only consolidation · no raw spectra or vectors
Hypothesis formation
Motif, cleavage, shielding, novelty, comparative, and uncertainty hypotheses become reviewable artifacts with confidence labels.
Control
confidence labels · supporting and contradicting evidence
Computational experiment design
Mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive sets, and anti-pattern tests are planned before the next proposal pass.
Control
bounded candidate counts · explicit decision rules
Constrained candidate proposal
Candidates are proposed inside an optimization surface shaped by cleavage risk, shielding logic, archive lineage, failure proximity, novelty floor, and synthesis plausibility.
Control
deterministic bounds · proposal routing · lineage retention
Validation and structural analysis
Residue validity, motif burden, cleavage exposure, embedding similarity, novelty, readiness prerequisites, and contradiction checks decide what survives into ranking.
Control
deterministic gates · warning burden
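A deterministic gate with a warning burden could look like the sketch below. Residue validity is checked against the 20 canonical amino-acid letters; the two warnings (an adjacent K/R pair and a length limit) and the `max_warnings` threshold are illustrative assumptions, not the runtime's actual checks.

```python
CANONICAL_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def validate(sequence: str, max_warnings: int = 1, max_length: int = 50) -> dict:
    """Apply a hard residue gate, then count soft warnings against a budget."""
    if not sequence or any(r not in CANONICAL_RESIDUES for r in sequence):
        return {"passed": False, "reason": "invalid residue", "warnings": []}

    warnings: list[str] = []
    # Illustrative cleavage-exposure proxy: two adjacent basic residues.
    if any(a in "KR" and b in "KR" for a, b in zip(sequence, sequence[1:])):
        warnings.append("dibasic site")
    if len(sequence) > max_length:
        warnings.append("long sequence")

    passed = len(warnings) <= max_warnings
    return {
        "passed": passed,
        "reason": None if passed else "warning burden",
        "warnings": warnings,
    }
```

The same input always yields the same verdict, which is what makes the gate replayable by a reviewer.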
Ranking and scientific prioritization
Multi-axis scoring balances protease signals, solubility, permeability proxies, synthesis practicality, novelty, stable similarity, failure memory, and bounded model-informed signals.
Control
normalized scoring caps · single-pass rerank · no model-only promotion
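One way to read these controls is sketched below: axis scores are clamped to [0, 1] and combined with normalized weights, the model-informed signal can add at most a small capped bonus, and it adds nothing when the deterministic score is under a floor. The cap, the floor, and the function names are assumptions chosen for illustration.

```python
def _clamp(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def blended_score(axes: dict, weights: dict, model_signal: float,
                  model_cap: float = 0.1, floor: float = 0.5) -> float:
    """Weighted mean of clamped axis scores plus a capped model bonus."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    base = sum(weights[k] / total_weight * _clamp(axes[k]) for k in weights)
    if base < floor:
        return base  # no model-only promotion: weak candidates get no bonus
    return base + min(_clamp(model_signal), model_cap)


def rank_once(scored: list) -> list:
    """Single-pass rerank: sort (candidate_id, score) pairs once, best first."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```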
Candidate explanations and assessments
Evidence retrieval, reranking, claim QA, editorial synthesis, candidate briefs, and feasibility assessments turn runtime output into reviewable communication.
Control
claim QA · uncertainty language
Assay-preparation handoff
Top candidates move into scientific review with assay categories, comparison groups, risk notes, and rationale for possible downstream testing.
Control
human review · experimental controls
IP and provenance package
Sequence lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve scientific and IP context.
Control
founder review · disclosure caution
Feedback gate
Explanation feedback, synthetic rehearsal labels, and reviewed assay labels when present are kept behind calibration gates with rollback paths.
Control
bounded deltas · normalized weights · active-learning refusal by default
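The sketch below shows one possible shape for such a gate: updates are refused unless explicitly enabled, each weight may move by at most `max_delta`, the result is renormalized to sum to one, and a degenerate result rolls back to the previous weights. All names and the default bound are hypothetical.

```python
def apply_feedback(weights: dict, proposed: dict,
                   enabled: bool = False, max_delta: float = 0.05) -> dict:
    """Return updated weights, or the originals when the gate refuses."""
    if not enabled:
        return dict(weights)  # refusal by default

    updated = {}
    for key, current in weights.items():
        delta = proposed.get(key, current) - current
        delta = max(-max_delta, min(max_delta, delta))  # bounded delta
        updated[key] = max(current + delta, 0.0)

    total = sum(updated.values())
    if total <= 0:
        return dict(weights)  # rollback path: keep the previous weights
    return {key: value / total for key, value in updated.items()}
```

Note that the clamp is applied before renormalization, so the final per-weight change can differ slightly from `max_delta`; a stricter gate could check the bound again after normalizing.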
Research orchestration
Runtime modes coordinate proposal, exploration, hypothesis work, experiment planning, memory consolidation, and data expansion across cycles.
Control
mode priorities · forbidden mutations
Stage 9 — review-gated handoff draft
Top candidates from the ranking pass are organised into a draft review batch with assay categories, comparison groups, risk notes, and rationale for inclusion. The runtime does not submit the batch. The provider packet passes through the external-provider safety gate, and quote acceptance, payment, and order dispatch all require human review.
Today. Only the draft_review_required state is operational; provider packets are drafted and pass through the external-provider safety gate.
On the short-term path. The remaining fifteen wet-lab lifecycle states (scientific_review_pending through closed) are not yet operational, nor is the consumer that turns experiment plans into provider packets at scale. Because the current scope is drafts-only, the safety guarantee holds trivially; extending the wet-lab path requires every gate on this page to remain enforced.
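A drafts-only scope can be enforced with a simple state guard. The sketch below is an assumption about how such a guard might look; only the state names mentioned on this page are used, and `set_batch_state` and `ReservedStateError` are hypothetical.

```python
# Only this state is operational today, per the text. The reserved states are
# not enumerated on this page, so they are deliberately not listed here.
OPERATIONAL_STATES = frozenset({"draft_review_required"})


class ReservedStateError(RuntimeError):
    """Raised when a batch is asked to enter a state that is not operational."""


def set_batch_state(batch: dict, new_state: str) -> dict:
    """Return a copy of the batch in new_state, refusing reserved states."""
    if new_state not in OPERATIONAL_STATES:
        raise ReservedStateError(f"state {new_state!r} is reserved")
    # The runtime drafts the batch but never submits it.
    return {**batch, "state": new_state, "submitted": False}
```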
Stage 16 — Ledger submission
After the cognition cluster, provenance build, and collection update complete, the cycle drafts a RuntimeCycle envelope (plus any Hypothesis/Experiment/Candidate/ScientificAsset envelopes produced in the same run) against the protean.ledger.v1 schema. The envelope is queued, not signed: Galen does not hold any contract role, and the broadcast cannot proceed without an operator-issued approval token.
The operator reviews the queued envelope and runs bin/galen_ledger_approve.py --review-record-id <id> to mint a single-use, 5-minute-TTL token at ~/.openclaw/exec-approvals.json (mode 0600). The Ledger submitter consumes the token, validates the envelope against the on-chain schema + allow-list, encodes the calldata, and hands it to Bankr. Bankr — which holds only AUTOMATION_WRITER_ROLE — signs and broadcasts under its spend policy ($100/tx cap, $500/day cap, destination allow-list, halt switch). The chain emits RecordRegistered and RecordContentEmitted; the Vercel cron indexer picks them up within ~90 seconds; the explorer renders the new record at www.protean.sh/ledger/record/<recordId>.
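The single-use, short-lived token can be sketched as a small file-backed gate. This is an illustration only: the JSON layout, the function names, and the one-token-per-file design are assumptions, and the real format of exec-approvals.json is not described here.

```python
import json
import os
import secrets
import time

TOKEN_TTL_SECONDS = 300  # 5-minute TTL, per the text


def mint_token(path: str, review_record_id: str, now=None) -> str:
    """Write a fresh approval token to path with owner-only permissions."""
    now = time.time() if now is None else now
    payload = {
        "token": secrets.token_hex(16),
        "review_record_id": review_record_id,
        "expires_at": now + TOKEN_TTL_SECONDS,
    }
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.chmod(path, 0o600)  # tighten the mode if the file already existed
    return payload["token"]


def consume_token(path: str, token: str, review_record_id: str, now=None) -> bool:
    """Validate and delete the token; a second call always fails."""
    now = time.time() if now is None else now
    try:
        with open(path) as fh:
            stored = json.load(fh)
    except FileNotFoundError:
        return False
    os.remove(path)  # single use: gone whether or not it validates
    return (
        secrets.compare_digest(stored["token"], token)
        and stored["review_record_id"] == review_record_id
        and now < stored["expires_at"]
    )
```

Deleting the file before checking the fields means a failed attempt also burns the token, which errs on the side of requiring a fresh operator approval.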
The same step also writes a local snapshot directory using atomic primitives (O_CREAT|O_EXCL + fsync + os.replace). That directory is integrity scaffolding: operator audit, event-driven GitHub and Gitlawb replication, and per-record replay-artifact authoring all read from it. It is downstream of the chain. The canonical record is the on-chain Record struct; the snapshot is the source material for the replayPointer it cites.
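The three primitives named above combine into a familiar write-then-rename pattern. The following is a minimal sketch of that pattern for a single file, not the runtime's snapshot writer; `atomic_write` and the temporary-name scheme are hypothetical.

```python
import os


def atomic_write(path: str, data: bytes) -> None:
    """Write data so that readers see either the old file or the complete new one."""
    tmp = f"{path}.tmp.{os.getpid()}"
    # O_EXCL makes creation fail if the temporary file already exists,
    # so two writers can never share one partially written file.
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the bytes to stable storage
        os.replace(tmp, path)      # atomic rename over the destination
    except BaseException:
        try:
            os.unlink(tmp)         # clean up the temporary file on failure
        except FileNotFoundError:
            pass
        raise
```

A stricter variant on POSIX systems would also fsync the containing directory after the rename so that the new directory entry itself is durable.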
A reviewer who wants to verify this cycle starts from the Ledger:
- cast call 0xE3c261F3…94cf5f0 "getRecord(bytes32)" <recordId> → confirm the record exists and read its contentDigest, replayPointer, lifecycle, and disclosure state.
- Walk the lineage edges to the parent records (EvidenceBundle, Hypothesis, prior RuntimeCycle) the same way.
- Reproduce the indexer state digest from genesis against any Base RPC; compare to /ledger/api/v1/indexer/digest.
- If the replayPointer is a public mirror anchor, fetch the artifact from proteanlabs1/ledger-mirror or the matching Gitlawb repo and re-run shasum -a 256 on it. If it's a historical bootstrap pointer, the verify page surfaces a "Historical Bootstrap Record" badge instead.
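The final hashing step can also be done without the shasum binary. The sketch below streams a downloaded artifact through SHA-256 and compares it with an expected hex digest; it assumes the expected value is the plain SHA-256 of the artifact bytes, optionally 0x-prefixed, and both function names are hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected digest, ignoring case and a 0x prefix."""
    expected = expected_hex.lower().removeprefix("0x")
    return sha256_file(path) == expected
```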
The full replay walkthrough lives in Provenance layer.
Stage signals at a glance
Every stage emits a signal that the cycle records — gate state, priority vector, review trace, handoff readiness. The table below summarises what each stage produces.
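| Stage | Signal |
| --- | --- |
| Evidence acquisition | curated evidence plane · peptide-spectrum links where present |
| Multimodal scientific memory | motif · contradiction · lineage · multimodal memory |
| Hypothesis formation | bounded hypothesis set |
| Computational experiment design | reviewable experiment plan |
| Constrained candidate proposal | candidate field · archive events |
| Validation and structural analysis | gate-passed candidate set · readiness gate state |
| Ranking and scientific prioritization | ranked candidate slate |
| Candidate explanations and assessments | briefs · reports · candidate assessments |
| Assay-preparation handoff | review-ready draft handoff batch |
| IP and provenance package | provenance-aware review package |
| Feedback gate | conservative feedback report |
| Research orchestration | next investigation plan |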
What remains private
The cycle records its complete state internally. The public surface — what the docs site reads from the public export — is a reviewed projection of that state. Published candidate and family sequences are visible; private salts, embedding vectors, scoring internals, provider secrets, and unfiled IP stay in the private vault. The publication boundary is detailed in Disclosure boundary.
