System overview
Evidence enters; constraints shape the search space; proposal systems explore; deterministic validators decide; ranking prioritises; learning adapts within bounds; review packages the output. One operating fabric, one cycle at a time.
All layers below run on every cycle. The full stage list lives in the cycle executor and is verified by the operations kernel.
The platform, in seven layers
evidence plane
-> constraint engine
-> candidate field
-> validation gates
-> failure-aware ranking
-> interpretability layer
-> bounded adaptation
-> review package
-> Protean Ledger (Base mainnet)
The advantage is not a single model. The advantage is the operating fabric that keeps evidence, constraints, proposal systems, ranking logic, and learning behaviour aligned across repeated discovery cycles, and ends each cycle with a typed, content-addressed record on a public chain.
The on-chain layer
             Operator-signed
             approval token
                   │
Galen ────────────►│
(cognition         │    Bankr
runtime,           │    automation wallet
zero on-chain      ▼    (AUTOMATION_WRITER_ROLE)
authority)   registerRecord
                   │
                   ▼
Protean Ledger (Base mainnet, UUPS proxy)
0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0
                   │
         emits RecordRegistered
         + RecordContentEmitted
                   │
                   ▼
Vercel cron indexer (every minute, Neon)
                   │
         sha256 state digest at
         /ledger/api/v1/indexer/digest
                   │
                   ▼
Explorer at protean.sh/ledger
GitHub mirror at github.com/proteanlabs1/ledger-mirror
Gitlawb mirrors under DID did:key:z6Mkt6MEeSCJM2krT1PfX8BmTWbi9YYkLqdaRXSF6UZvy5QB
The chain is the source of truth. The indexer reflects it. The digest reproduces it. The explorer is a lens; GitHub and Gitlawb are public-distribution surfaces. None of those downstream surfaces are the authority. Any third party can recompute the record graph from the contract's event log alone.
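The idea that anyone can reproduce a digest from the event log alone can be sketched as follows. This is a minimal illustration under stated assumptions: the digest scheme (events ordered by block and log index, serialised as canonical JSON, hashed with SHA-256), the event field names, and the `state_digest` helper are all hypothetical, and the indexer's real digest format may differ.

```python
import hashlib
import json

def state_digest(events):
    """Hash an event log in a deterministic order (hypothetical scheme)."""
    # Sort by chain position so any replay yields the same byte stream.
    ordered = sorted(events, key=lambda e: (e["block"], e["log_index"]))
    # Canonical JSON: sorted keys, no insignificant whitespace.
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Illustrative events only; field names are assumptions.
events = [
    {"block": 2, "log_index": 0, "event": "RecordContentEmitted", "record_id": 1},
    {"block": 1, "log_index": 0, "event": "RecordRegistered", "record_id": 1},
]

# The digest depends on the set of events, not on the order they arrived in.
same = state_digest(events) == state_digest(list(reversed(events)))
```

Because the ordering and serialisation are fixed, two independent parties replaying the same log arrive at the same hex digest, which is the property the downstream surfaces rely on.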
FIG · 01 · Runtime cycle topology
- 01 model scan
- 02 healthcheck
- 03 ingest
- 04 extract
- 05 index
- 06 normalize
- 07 constrain
- 08 features
- 09 rank
- 10 train
- 11 explain
- 12 claim QA
- 13 learn
- 14 cognition
- 15 provenance
- 16 ledger write
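The sixteen stages above can be modelled as an ordered list that an executor walks once per cycle. The sketch below is illustrative only: `run_cycle`, the `handlers` mapping, and the shape of `state` are hypothetical names, not the actual cycle executor or operations kernel.

```python
# Stage names taken from the runtime cycle topology, in order.
STAGES = [
    "model scan", "healthcheck", "ingest", "extract", "index", "normalize",
    "constrain", "features", "rank", "train", "explain", "claim QA",
    "learn", "cognition", "provenance", "ledger write",
]

def run_cycle(handlers):
    """Run every stage in order; refuse to start if any stage lacks a handler."""
    missing = [s for s in STAGES if s not in handlers]
    if missing:
        raise ValueError(f"stages without handlers: {missing}")
    state = {"completed": []}
    for number, name in enumerate(STAGES, start=1):
        handlers[name](state)
        state["completed"].append((number, name))
    return state

# No-op handlers stand in for the real stage implementations.
handlers = {name: (lambda state: None) for name in STAGES}
result = run_cycle(handlers)
```

Checking for missing handlers before the first stage runs mirrors the statement that all layers run on every cycle: a partial cycle is rejected up front.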
Core layers
Evidence plane. Captures source records, extracted scientific entities, literature signals, negative evidence, internal observations, and candidate lineage. Built with BAAI/bge-m3 for text embeddings, BAAI/bge-reranker-v2-m3 for reranking, urchade/gliner_large-v2 for entity extraction.
Constraint engine. Turns research objectives into design boundaries before candidate generation begins. Hard gates halt invalid candidates; soft constraints become warnings and ranking context.
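One way to picture the split between hard gates and soft constraints is the sketch below. The `Constraint` type, the `evaluate` function, and both example constraints are hypothetical and chosen only for illustration; the real engine's constraint vocabulary is not described here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[str], bool]  # True when the candidate satisfies it
    hard: bool                    # hard gates halt; soft constraints warn

def evaluate(candidate, constraints):
    """Return (accepted, messages). Any failed hard gate halts the candidate."""
    warnings = []
    for c in constraints:
        if c.check(candidate):
            continue
        if c.hard:
            return False, [f"hard gate failed: {c.name}"]
        warnings.append(c.name)  # carried forward as ranking context
    return True, warnings

# Hypothetical constraints for illustration.
constraints = [
    Constraint("length <= 30", lambda s: len(s) <= 30, hard=True),
    Constraint("no methionine", lambda s: "M" not in s, hard=False),
]

soft_result = evaluate("ACDM", constraints)
hard_result = evaluate("A" * 31, constraints)
```

Here `soft_result` is accepted with a warning attached, while `hard_result` is rejected before anything downstream sees it.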
Proposal systems. Explore candidate space through embedding-guided synthesis on facebook/esm2_t12_35M_UR50D and deterministic generation. Proposals cannot override validators.
Validation gates. Reject invalid residues, malformed sequences, excessive repetition, unacceptable cleavage exposure against the four-enzyme panel (trypsin, chymotrypsin, pepsin, elastase), and candidates too close to known failure patterns. Deterministic; authoritative.
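A deterministic gate of this kind might look like the sketch below. It is a simplification under loud assumptions: the P1 residue sets are textbook approximations of each enzyme's preference, and the thresholds `max_run` and `max_sites` are hypothetical; neither is taken from the platform's actual validator.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Simplified P1 preferences, for illustration only; real specificity
# depends on neighbouring residues and reaction conditions.
CLEAVAGE_P1 = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FYW"),
    "pepsin": set("FL"),
    "elastase": set("AV"),
}

def validate(seq, max_run=3, max_sites=2):
    """Return a list of gate failures; an empty list means the candidate passes."""
    if not seq:
        return ["malformed: empty sequence"]
    failures = []
    invalid = sorted(set(seq) - STANDARD_RESIDUES)
    if invalid:
        failures.append(f"invalid residues: {''.join(invalid)}")
    # Longest run of a single repeated residue.
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_run:
        failures.append(f"repetition: run of {longest}")
    # Count internal positions whose residue matches an enzyme's P1 set.
    for enzyme, p1 in CLEAVAGE_P1.items():
        sites = sum(1 for aa in seq[:-1] if aa in p1)
        if sites > max_sites:
            failures.append(f"cleavage exposure: {enzyme} ({sites} sites)")
    return failures
```

The function has no randomness and no learned component, so the same sequence always produces the same verdict, which is what lets the gate act as the authority over proposal systems.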
Ranking architecture. Balances seven canonical weights: protease_resistance (0.25), solubility (0.15), permeability (0.15), novelty (0.15), synthesis_risk (0.10), failure_similarity (0.10), stable_similarity (0.10). Adaptation is bounded to within ±20% of base (±50% with trusted assay data), and weights are normalised at write time.
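The bounded-adaptation rule can be sketched as a clamp followed by a renormalisation. The base weights are the ones listed above; treating the ±20% and ±50% bounds as relative to each base weight, and clamping before normalising, are assumptions of this sketch, as is the `adapt_weights` name.

```python
BASE_WEIGHTS = {
    "protease_resistance": 0.25,
    "solubility": 0.15,
    "permeability": 0.15,
    "novelty": 0.15,
    "synthesis_risk": 0.10,
    "failure_similarity": 0.10,
    "stable_similarity": 0.10,
}

def adapt_weights(proposed, trusted_assay=False):
    """Clamp each proposed weight to its band around base, then renormalise."""
    band = 0.50 if trusted_assay else 0.20  # assumed to be relative to base
    clamped = {}
    for name, base in BASE_WEIGHTS.items():
        lo, hi = base * (1 - band), base * (1 + band)
        clamped[name] = min(max(proposed.get(name, base), lo), hi)
    total = sum(clamped.values())
    return {name: value / total for name, value in clamped.items()}

# A learner asks for far more novelty weight than the band allows.
proposed = dict(BASE_WEIGHTS, novelty=0.40)
weights = adapt_weights(proposed)
```

In this example the requested novelty weight of 0.40 is first clamped to 0.18 (20% above its 0.15 base) and only then rescaled so the seven weights sum to 1.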
Interpretability layer. Produces structured rationale for why a candidate advanced, stalled, or was rejected. Claim QA via tasksource/ModernBERT-base-nli flags unsupported statements.
Bounded learning. Single pass per cycle. Records the delta vector, the reason, the prior state, and the rollback path as operator-audit scaffolding, then submits only approved public-safe Ledger records.
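The audit scaffolding described here (delta vector, reason, prior state, rollback path) could be captured in a small immutable record. The `AdaptationRecord` class and its field names are hypothetical, intended only to show how a rollback path falls out of storing the prior state alongside the delta.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AdaptationRecord:
    prior: dict   # weights before the learning pass
    delta: dict   # per-weight change applied in this cycle
    reason: str   # human-readable justification for the operator audit

    def applied(self):
        """State after the delta is applied to the prior."""
        return {k: v + self.delta.get(k, 0.0) for k, v in self.prior.items()}

    def rollback(self):
        """Rolling back is simply restoring a copy of the prior state."""
        return dict(self.prior)

record = AdaptationRecord(
    prior={"novelty": 0.15, "solubility": 0.15},
    delta={"novelty": 0.02},
    reason="illustrative example only",
)
after = record.applied()
restored = record.rollback()
```

Keeping one record per cycle matches the single-pass rule: each cycle contributes at most one delta, and each delta can be undone on its own.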
Review package. Converts selected outputs into reviewable artifacts for scientific planning and IP strategy. Drafts only; no auto-submit.
The object model under the cycle
The runtime treats every artifact as a typed scientific object with lineage, lifecycle state, and disclosure state. The Protean Ledger defines fourteen RecordType enum values (Unknown plus thirteen usable types: RuntimeCycle, Hypothesis, Experiment, EvidenceBundle, Candidate, Thesis, AssayResult, Collection, RetractionNotice, ExternalSignal, Governance, ScientificAsset, IPAsset) and fourteen RelationType enum values. Records flow into the chain through Stage 16 of the cycle.
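For readers who think in code, the record types can be mirrored as an enum. The member names come from the list above; the numeric values (Unknown as 0, the rest in listed order) are an assumption of this sketch and are not confirmed by the contract. RelationType is omitted because its members are not listed here.

```python
from enum import IntEnum

class RecordType(IntEnum):
    # Ordinals are assumed, not taken from the contract source.
    Unknown = 0
    RuntimeCycle = 1
    Hypothesis = 2
    Experiment = 3
    EvidenceBundle = 4
    Candidate = 5
    Thesis = 6
    AssayResult = 7
    Collection = 8
    RetractionNotice = 9
    ExternalSignal = 10
    Governance = 11
    ScientificAsset = 12
    IPAsset = 13

def usable_types():
    """Every record type except the Unknown sentinel."""
    return [t for t in RecordType if t is not RecordType.Unknown]

usable = usable_types()
```

The split between the full enum and `usable_types()` reflects the count in the text: fourteen values in total, thirteen of which can label a real record.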
FIG · 03 · Protean Ledger record graph
Control surfaces
Autonomy stays useful because the cycle preserves control surfaces at every stage:
- source provenance before extraction
- constraints before generation
- deterministic validation before scoring
- failure penalties before ranking
- bounded adaptation before reranking
- human review before any wet-lab or IP decision
