
How a discovery cycle works

A Protean cycle moves through sixteen bounded stages — from evidence intake to a confirmed RuntimeCycle record on the Protean Ledger. The early stages curate scientific input; the middle stages propose, validate, and rank; the final stages explain, draft typed records, anchor them to Base mainnet, and hand off to human review.

Target shape

Stages 1–14 run on every cycle. Stage 9 (assay-preparation handoff) is currently drafts-only; the remaining wet-lab lifecycle states are reserved. Stage 16 is the on-chain step: terminal scientific objects are submitted to the Protean Ledger at 0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0 via the role-separated Galen → operator-approval → Bankr write path.

FIG · 01 · Runtime cycle topology

01 model scan
02 healthcheck
03 ingest
04 extract
05 index
06 normalize
07 constrain
08 features
09 rank
10 train
11 explain
12 claim QA
13 learn
14 cognition
15 provenance
16 ledger write

Block gate · halts on failure (entered before stages 09 & 15)
Ledger path · stages 15 & 16 prepare provenance and register records

Sixteen stages execute in order. Stage 16 submits the cycle's terminal records to the Protean Ledger. The two block-gates halt the cycle on failure.

The shape of a cycle

A cycle has a fixed contract. It cannot reorder its stages, skip them, or recurse. Each stage receives an explicit input, applies an explicit control, and writes an explicit output that the next stage reads. The full stage list and the canonical labels are defined in the cycle executor and verified by the operations kernel.
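
The fixed contract can be illustrated with a short Python sketch. It is not the cycle executor: Stage, GateFailure, and run_cycle are hypothetical names, and block gates are modelled as plain predicates keyed by the number of the stage they guard.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


class GateFailure(Exception):
    """Raised when a block gate refuses to let the cycle continue."""


@dataclass(frozen=True)
class Stage:
    number: int                 # position in the cycle, starting at 1
    label: str                  # canonical stage label
    run: Callable[[Any], Any]   # explicit input -> explicit output


def run_cycle(stages: List[Stage],
              gates: Dict[int, Callable[[Any], bool]],
              first_input: Any) -> Tuple[Any, List[str]]:
    """Run stages strictly in order; a failing gate halts before its stage."""
    # No reordering and no skipping: numbers must be exactly 1..N in sequence.
    expected = list(range(1, len(stages) + 1))
    if [s.number for s in stages] != expected:
        raise ValueError("stages must be contiguous and in order")

    value = first_input
    trace = []
    for stage in stages:
        gate = gates.get(stage.number)
        if gate is not None and not gate(value):
            raise GateFailure(f"gate before stage {stage.number:02d} ({stage.label}) failed")
        value = stage.run(value)   # the output becomes the next stage's input
        trace.append(stage.label)
    return value, trace
```

Because each stage only sees the previous stage's output, there is no path by which a stage can recurse or reach around a gate in this model.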

The earliest stages are about curation — evidence intake, deduplication, normalization, and entity extraction. The middle stages compose the constraint surface, propose candidates inside it, and run deterministic validation. The late stages rank survivors, explain why they ranked where they did, check generated claims against the evidence index, apply a single bounded-feedback pass when preconditions exist, prepare a reviewed public projection with full candidate and family sequences where records are intentionally published, and draft Ledger envelopes for the terminal scientific objects. The final stage broadcasts those envelopes — gated by an operator-signed approval token — through the Bankr automation wallet, which holds the contract's AUTOMATION_WRITER_ROLE.

The end of a cycle is a record on chain

The terminal artifact of a cycle is no longer a local snapshot directory — it is a typed RuntimeCycle record on the Protean Ledger, optionally accompanied by Hypothesis, Experiment, Candidate, and ScientificAsset records produced inside the same run. The shape of the discovery flow:

Hypothesis  +  Evidence  +  Candidate
         │       │             │
         └───────┴─────────────┘
                 │
                 ▼
            RuntimeCycle  (anchors the cohort)
                 │
                 ▼
           ScientificAsset  (citable aggregation)
                 │
                 ▼
         Protean Ledger record on Base mainnet
                 │
                 ▼
      Indexer · digest · explorer
      GitHub mirror · Gitlawb mirror

Local snapshots are still produced. They support operator audit and feed the GitHub and Gitlawb public verification rails, but they are downstream of the chain, not the canonical artifact. A new reader who wants to verify the cycle does not start from a local bundle; they start from getRecord(bytes32) on the Ledger proxy and walk the record's lineage edges.
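
The lineage walk can be sketched as a graph traversal. The sketch below is an assumption-heavy illustration: get_record stands in for a call to getRecord(bytes32), and the "parents" field is a hypothetical name for the record's lineage edges, not the on-chain struct layout.

```python
from typing import Callable, Dict, List, Optional


def walk_lineage(get_record: Callable[[str], Optional[Dict]],
                 record_id: str) -> List[str]:
    """Visit a record and every ancestor reachable through its lineage edges."""
    seen = set()
    order = []
    stack = [record_id]
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue  # shared ancestors are visited once
        record = get_record(rid)
        if record is None:
            raise LookupError(f"record {rid} not found")
        seen.add(rid)
        order.append(rid)
        stack.extend(record.get("parents", []))  # hypothetical edge field
    return order


# Usage with an in-memory stand-in for the Ledger:
records = {
    "cycle": {"parents": ["hypothesis", "evidence"]},
    "hypothesis": {"parents": ["evidence"]},
    "evidence": {"parents": []},
}
visited = walk_lineage(records.get, "cycle")
```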

Sixteen stages, in detail

The atlas below summarises each stage's mechanism and why it exists in the cycle. The full input, control, output, and signal contract for every stage lives in the runtime data layer; this view is a reader's overview.

Evidence acquisition

Literature, prior records, proteomics summary metadata, failure signals, patent context, and assay-summary sources enter as provenance-bearing evidence.

Control

source trust scoring · deduplication · bounded ingestion
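
A minimal sketch of how these three controls could compose, assuming evidence items are dictionaries with "source" and "text" keys and that trust is a per-source score between 0 and 1. The function name, thresholds, and field names are illustrative assumptions, not the runtime's actual intake code.

```python
import hashlib
from typing import Dict, List


def ingest(items: List[Dict], trust: Dict[str, float],
           min_trust: float = 0.5, max_items: int = 100) -> List[Dict]:
    """Drop low-trust sources, deduplicate by content hash, cap the batch size."""
    seen = set()
    accepted = []
    for item in items:
        if trust.get(item["source"], 0.0) < min_trust:
            continue  # unknown or low-trust source
        normalized = item["text"].strip().lower().encode("utf-8")
        digest = hashlib.sha256(normalized).hexdigest()
        if digest in seen:
            continue  # duplicate content
        seen.add(digest)
        accepted.append({**item, "digest": digest})  # the digest travels with the record
        if len(accepted) >= max_items:
            break  # bounded ingestion
    return accepted
```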

Multimodal scientific memory

Typed peptide, evidence, embedding, proteomics, motif-family, failure, lineage, and assay-summary records persist as context for future cycles.

Control

artifact-only consolidation · no raw spectra or vectors

Hypothesis formation

Motif, cleavage, shielding, novelty, comparative, and uncertainty hypotheses become reviewable artifacts with confidence labels.

Control

confidence labels · supporting and contradicting evidence

Computational experiment design

Mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive sets, and anti-pattern tests are planned before the next proposal pass.

Control

bounded candidate counts · explicit decision rules

Constrained candidate proposal

Candidates are proposed inside an optimization surface shaped by cleavage risk, shielding logic, archive lineage, failure proximity, novelty floor, and synthesis plausibility.

Control

deterministic bounds · proposal routing · lineage retention

Validation and structural analysis

Residue validity, motif burden, cleavage exposure, embedding similarity, novelty, readiness prerequisites, and contradiction checks decide what survives into ranking.

Control

deterministic gates · warning burden
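
As an illustration of a deterministic gate, the sketch below checks residue validity against the twenty canonical one-letter amino-acid codes and applies a warning-burden limit. The limit of two warnings and the return strings are assumptions made for the example; the runtime's real gates cover more checks than this.

```python
CANONICAL_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate(sequence: str, warnings: list, max_warnings: int = 2) -> str:
    """Return a gate state for one candidate; the same input always gives the same state."""
    if not sequence or any(r not in CANONICAL_RESIDUES for r in sequence.upper()):
        return "rejected: invalid residue"
    if len(warnings) > max_warnings:
        return "rejected: warning burden"
    return "passed"
```

The gate takes no random or model-derived input, so a rejected candidate can be replayed and will be rejected for the same reason.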

Ranking and scientific prioritization

Multi-axis scoring balances protease signals, solubility, permeability proxies, synthesis practicality, novelty, stable similarity, failure memory, and bounded model-informed signals.

Control

normalized scoring caps · single-pass rerank · no model-only promotion
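
One way to read these controls is as a weighted score with a hard cap on the model-informed term. The sketch below is an interpretation, not the production scorer: the axis names, the 0.1 cap, and the rule that a zero deterministic score cannot be lifted by the model signal are all assumptions.

```python
from typing import Dict, List


def clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def score(axes: Dict[str, float], weights: Dict[str, float],
          model_signal: float = 0.0, model_cap: float = 0.1) -> float:
    """Normalized weighted sum of clamped axes, plus a capped model bonus."""
    total_weight = sum(weights.values())
    base = sum(w * clamp01(axes.get(name, 0.0)) for name, w in weights.items()) / total_weight
    if base == 0.0:
        return 0.0  # no model-only promotion
    return base + clamp01(model_signal) * model_cap


def rank(candidates: List[Dict], weights: Dict[str, float]) -> List[Dict]:
    """Single-pass ranking: score once, sort once."""
    return sorted(candidates,
                  key=lambda c: score(c["axes"], weights, c.get("model", 0.0)),
                  reverse=True)
```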

Candidate explanations and assessments

Evidence retrieval, reranking, claim QA, editorial synthesis, candidate briefs, and feasibility assessments turn runtime output into reviewable communication.

Control

claim QA · uncertainty language

Assay-preparation handoff

Top candidates move into scientific review with assay categories, comparison groups, risk notes, and rationale for possible downstream testing.

Control

human review · experimental controls

IP and provenance package

Sequence lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve scientific and IP context.

Control

founder review · disclosure caution

Feedback gate

Explanation feedback, synthetic rehearsal labels, and reviewed assay labels when present are kept behind calibration gates with rollback paths.

Control

bounded deltas · normalized weights · active-learning refusal by default
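
Bounded deltas and normalized weights can be sketched as a clamp followed by a renormalization. The maximum delta of 0.05 and the rollback behaviour on a degenerate result are assumptions chosen for the example.

```python
from typing import Dict


def apply_feedback(weights: Dict[str, float], deltas: Dict[str, float],
                   max_delta: float = 0.05) -> Dict[str, float]:
    """Clamp each proposed delta, keep weights non-negative, renormalize to sum to 1."""
    updated = {}
    for name, weight in weights.items():
        delta = max(-max_delta, min(max_delta, deltas.get(name, 0.0)))
        updated[name] = max(0.0, weight + delta)
    total = sum(updated.values())
    if total == 0.0:
        return dict(weights)  # roll back rather than emit a degenerate vector
    return {name: weight / total for name, weight in updated.items()}
```

However large the proposed adjustment, a single pass can move any weight by at most max_delta before normalization, which keeps one feedback report from dominating the ranking.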

Research orchestration

Runtime modes coordinate proposal, exploration, hypothesis work, experiment planning, memory consolidation, and data expansion across cycles.

Control

mode priorities · forbidden mutations

Stage 9 — review-gated handoff draft

Top candidates from the ranking pass are organised into a draft review batch with assay categories, comparison groups, risk notes, and rationale for inclusion. The runtime does not submit the batch. The provider packet passes through the external-provider safety gate, and quote acceptance, payment, and order dispatch all require human review.

Today. Only the draft_review_required state is operational; provider packets are drafted and pass through the external-provider safety gate.

On the short-term path. The remaining fifteen wet-lab lifecycle states (scientific_review_pending through closed) are reserved, as is the consumer that turns experiment plans into provider packets at scale. The safety guarantee currently holds trivially because the scope is drafts-only; extending the wet-lab path requires every gate on this page to remain enforced.
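
The drafts-only scope can be pictured as a state machine in which only one state is reachable. This is a sketch under assumptions: the batch layout, ReservedStateError, and the transition function are hypothetical, and only the two state names quoted above come from the runtime.

```python
OPERATIONAL_STATES = frozenset({"draft_review_required"})


class ReservedStateError(Exception):
    """Raised when a transition targets a lifecycle state that is not yet operational."""


def draft_batch(candidates):
    """Create a review batch; the runtime drafts it but never submits it."""
    return {"state": "draft_review_required",
            "candidates": list(candidates),
            "submitted": False}


def transition(batch, new_state):
    """Refuse every state outside the operational set."""
    if new_state not in OPERATIONAL_STATES:
        raise ReservedStateError(f"{new_state} is reserved")
    return {**batch, "state": new_state}
```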

Stage 16 — Ledger submission

After the cognition cluster, provenance build, and collection update complete, the cycle drafts a RuntimeCycle envelope (plus any Hypothesis/Experiment/Candidate/ScientificAsset envelopes produced in the same run) against the protean.ledger.v1 schema. The envelope is queued, not signed: Galen does not hold any contract role, and the broadcast cannot proceed without an operator-issued approval token.

The operator reviews the queued envelope and runs bin/galen_ledger_approve.py --review-record-id <id> to mint a single-use, 5-minute-TTL token at ~/.openclaw/exec-approvals.json (mode 0600). The Ledger submitter consumes the token, validates the envelope against the on-chain schema + allow-list, encodes the calldata, and hands it to Bankr. Bankr — which holds only AUTOMATION_WRITER_ROLE — signs and broadcasts under its spend policy ($100/tx cap, $500/day cap, destination allow-list, halt switch). The chain emits RecordRegistered and RecordContentEmitted; the Vercel cron indexer picks them up within ~90 seconds; the explorer renders the new record at www.protean.sh/ledger/record/<recordId>.
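
The two controls in this path, a single-use token with a five-minute TTL and a spend policy, can be sketched as follows. This is not the code in bin/galen_ledger_approve.py or in Bankr: the JSON layout, function names, and refusal strings are assumptions, and the sketch stores one token per file for simplicity.

```python
import json
import os
import secrets
import time

TTL_SECONDS = 300  # five-minute TTL


def mint_token(path, review_record_id, now=None):
    """Write a fresh approval token to a file created with mode 0600."""
    now = time.time() if now is None else now
    entry = {"token": secrets.token_hex(16),
             "review_record_id": review_record_id,
             "expires_at": now + TTL_SECONDS}
    # The mode argument only applies when the file is created by this call.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(entry, fh)
    return entry["token"]


def consume_token(path, presented, review_record_id, now=None):
    """Validate and delete the token; a second attempt always fails."""
    now = time.time() if now is None else now
    try:
        with open(path) as fh:
            stored = json.load(fh)
    except FileNotFoundError:
        return False
    os.remove(path)  # single use: removed whether or not it validates
    return (secrets.compare_digest(stored["token"], presented)
            and stored["review_record_id"] == review_record_id
            and now <= stored["expires_at"])


def check_spend(amount_usd, spent_today_usd, destination, allow_list,
                halted=False, per_tx_cap=100.0, per_day_cap=500.0):
    """Apply the halt switch, allow-list, and both caps, in that order."""
    if halted:
        return "refused: halt switch engaged"
    if destination not in allow_list:
        return "refused: destination not allow-listed"
    if amount_usd > per_tx_cap:
        return "refused: per-transaction cap"
    if spent_today_usd + amount_usd > per_day_cap:
        return "refused: daily cap"
    return "allowed"
```

Deleting the token before checking it means a failed validation also burns the token, so a rejected broadcast requires a fresh operator approval.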

The same step also writes a local snapshot directory using atomic primitives (O_CREAT|O_EXCL + fsync + os.replace). That directory is integrity scaffolding: operator audit, event-driven GitHub and Gitlawb replication, and per-record replay-artifact authoring all read from it. It is downstream of the chain. The canonical record is the on-chain Record struct; the snapshot is the source material for the replayPointer it cites.
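
The atomic primitives named above combine in a well-known pattern: create a temporary file exclusively, flush it to disk, then rename it over the final path. The sketch below shows that pattern in general form; write_atomic and the ".tmp" suffix are illustrative choices, not the runtime's own helper.

```python
import os


def write_atomic(final_path: str, data: bytes) -> None:
    """Write data so readers see either the old file or the complete new one."""
    tmp_path = final_path + ".tmp"
    # O_EXCL makes creation fail if another writer already owns the temp file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the bytes to stable storage
    except BaseException:
        os.unlink(tmp_path)  # do not leave a partial temp file behind
        raise
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
```

A crash before os.replace leaves the previous snapshot untouched, and a crash after it leaves the new one complete; there is no state in which a reader sees a half-written file at the final path.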

A reviewer who wants to verify this cycle starts from the Ledger:

1. cast call 0xE3c261F3…94cf5f0 "getRecord(bytes32)" <recordId> → confirm the record exists and read its contentDigest, replayPointer, lifecycle, and disclosure state.
2. Walk the lineage edges to the parent records (EvidenceBundle, Hypothesis, prior RuntimeCycle) the same way.
3. Reproduce the indexer state digest from genesis against any Base RPC and compare it to /ledger/api/v1/indexer/digest.
4. If the replayPointer is a public mirror anchor, fetch the artifact from proteanlabs1/ledger-mirror or the matching Gitlawb repo and recompute its hash with shasum -a 256. If it is a historical bootstrap pointer, the verify page surfaces a "Historical Bootstrap Record" badge instead.
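
The hash check in the last step can also be done in Python instead of shasum. The sketch assumes the value being compared is a hex-encoded SHA-256 digest, optionally prefixed with 0x; whether that value is the record's contentDigest, and how it is encoded on chain, is an assumption to confirm against the Provenance layer documentation.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 to an expected hex digest, ignoring case and a 0x prefix."""
    expected = expected_hex.lower()
    if expected.startswith("0x"):
        expected = expected[2:]
    return sha256_file(path) == expected
```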

The full replay walkthrough lives in Provenance layer.

Stage signals at a glance

Every stage emits a signal that the cycle records — gate state, priority vector, review trace, handoff readiness. The table below summarises what each stage produces.

Stage | Output | Signal
--- | --- | ---
Acquire | curated evidence plane · peptide-spectrum links where present | provenance-bearing record
Remember | motif · contradiction · lineage · multimodal memory | persistent scientific context
Hypothesize | bounded hypothesis set | investigation candidate
Plan | reviewable experiment plan | planned comparison set
Propose | candidate field · archive events | bounded proposal set
Validate | gate-passed candidate set · readiness gate state | gate state
Rank | ranked candidate slate | priority vector
Explain | briefs · reports · candidate assessments | review trace
Handoff | review-ready draft handoff batch | truth layer pending
Trace | provenance-aware review package | traceable research state
Learn | conservative feedback report | gated adaptation
Orchestrate | next investigation plan | research direction

What remains private

The cycle records its complete state internally. The public surface — what the docs site reads from the public export — is a reviewed projection of that state. Published candidate and family sequences are visible; private salts, embedding vectors, scoring internals, provider secrets, and unfiled IP stay in the private vault. The publication boundary is detailed in Disclosure boundary.
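
A reviewed projection of this kind is often implemented as an allow-list rather than a deny-list, so that a newly added internal field stays private by default. The sketch below illustrates that design choice only; the field names and the reviewed flag are hypothetical and do not describe the actual public export.

```python
PUBLIC_FIELDS = frozenset({"record_id", "sequence", "family"})  # hypothetical field names


def project_public(record: dict, reviewed: bool):
    """Return only allow-listed fields, and nothing at all for unreviewed records."""
    if not reviewed:
        return None
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}
```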