What Protean is

Protean is a provenance-aware scientific operating system for peptide discovery: evidence intake, constrained proposal, deterministic validation, computational prioritization, public proof, and review-gated handoff. Start here to understand what is implemented, what is bounded, and what remains reserved for human scientific review.

The shape, in one sentence

The runtime ingests evidence, proposes candidates inside an explicit constraint surface, ranks them against deterministic validators, and writes every step as a content-addressed record to the Protean Ledger contract at 0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0. Any third party can replay the contract's event log from genesis and reconstruct the identical state digest. Published candidates and families show their full sequences. The cycle does not order assays, and it does not modify its own scoring, validators, prompts, or code.
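As a rough illustration of what replaying from genesis means, the sketch below content-addresses each record and folds an ordered event log into one digest. It is a minimal sketch under stated assumptions: SHA-256, canonical JSON, the all-zero genesis value, and the names content_address, replay, and GENESIS_DIGEST are stand-ins chosen for this example, not the Ledger contract's actual encoding or interface.

```python
import hashlib
import json

# Hypothetical starting state; the real genesis value is not described here.
GENESIS_DIGEST = "00" * 32


def content_address(record: dict) -> str:
    """Return the hex SHA-256 digest of the record's canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay(events: list) -> str:
    """Fold an ordered event log into a single state digest (assumed scheme)."""
    state = GENESIS_DIGEST
    for event in events:
        chained = state + content_address(event)
        state = hashlib.sha256(chained.encode("ascii")).hexdigest()
    return state


# Illustrative events only; field names and payloads are made up.
log = [
    {"stage": "ingest", "payload": "evidence-batch-1"},
    {"stage": "rank", "payload": "candidate-set-1"},
]

digest_a = replay(log)
# Same events with keys in a different order: canonical JSON hides the difference.
digest_b = replay([dict(reversed(list(e.items()))) for e in log])
```

Two parties holding the same ordered log compute the same digest, while reordering or altering any event produces a different one.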

Sixteen stages, one cadence

A cycle runs through sixteen bounded stages. The runtime cannot reorder, skip, or recurse them. The stage list is defined in the cycle executor and verified by the operations kernel.

FIG · 01 · Runtime cycle topology

01 model scan
02 healthcheck
03 ingest
04 extract
05 index
06 normalize
07 constrain
08 features
09 rank
10 train
11 explain
12 claim QA
13 learn
14 cognition
15 provenance
16 ledger write

Block gate · halts on failure (entered before stages 09 & 15)
Ledger path · stages 15 & 16 prepare provenance and register records

Sixteen stages execute in order. Stage 16 registers approved Ledger records. Local snapshots are supplemental. The two block gates halt the cycle on failure.
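The ordering rule can be sketched as a check on an execution trace. This is a minimal sketch, not the cycle executor or the operations kernel: the stage names are copied from the figure, while check_trace, the trace format, and the idea that a halted cycle leaves a prefix ending just before a gated stage are assumptions made for illustration.

```python
# Stage names as listed in the figure, in order.
STAGES = (
    "model scan", "healthcheck", "ingest", "extract", "index", "normalize",
    "constrain", "features", "rank", "train", "explain", "claim QA",
    "learn", "cognition", "provenance", "ledger write",
)

# 1-based numbers of the stages that a block gate precedes (figure legend).
GATED_STAGES = {9, 15}


def check_trace(trace: list) -> str:
    """Classify a trace as 'complete', 'halted at gate', or 'invalid'."""
    observed = tuple(trace)
    if observed == STAGES:
        return "complete"
    # A halted cycle is assumed to leave an in-order prefix that stops
    # immediately before a gated stage.
    is_prefix = observed == STAGES[:len(observed)]
    if is_prefix and (len(observed) + 1) in GATED_STAGES:
        return "halted at gate"
    # Anything else was reordered, skipped, repeated, or cut short elsewhere.
    return "invalid"


full_run = check_trace(list(STAGES))
gate_halt = check_trace(list(STAGES[:8]))
skipped = check_trace([s for s in STAGES if s != "index"])
```

Comparing against one fixed tuple keeps the check simple: any skip, reorder, or repeat fails the prefix comparison without stage-specific logic.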

Three of those stages (pattern learning, score and rank, and bounded feedback) no-op when their preconditions are unmet. They record a warning and continue; they do not halt the cycle or create an active-learning claim. The full walkthrough lives in How a discovery cycle works.
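The difference between a no-op stage and a halting gate can be sketched as follows. This is an illustrative sketch only: CycleLog, run_soft_stage, and the boolean precondition flag are hypothetical names, and the real preconditions for these stages are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleLog:
    visited: list = field(default_factory=list)   # stages entered, in order
    warnings: list = field(default_factory=list)  # no-op notices


def run_soft_stage(
    name: str,
    precondition_met: bool,
    body: Callable[[], None],
    log: CycleLog,
) -> None:
    """Enter the stage; run its body only when the precondition holds."""
    log.visited.append(name)  # the stage is still visited, never skipped
    if not precondition_met:
        # Record a warning and return normally so the cycle continues.
        log.warnings.append(f"{name}: precondition unmet, no-op")
        return
    body()


log = CycleLog()
results = []
run_soft_stage("pattern learning", False, lambda: results.append("learned"), log)
run_soft_stage("score and rank", True, lambda: results.append("ranked"), log)
```

Here the first stage leaves a warning and produces no result, while the second runs normally; neither raises, so the surrounding cycle would proceed to the next stage.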

Pick your path

The docs are organised into six groups. Three reading paths are pre-sequenced for the readers most likely to land here. Each path is four pages and takes about thirty minutes of attentive reading.

For scientists evaluating the research approach. What Protean is → How a discovery cycle works → Failure-aware optimization → Scientific boundaries. You will see the end-to-end methodological walkthrough, the distinctive commitment to negative evidence as memory, and the integrity contract that says rankings are prioritization, not validation.

For infrastructure engineers evaluating the system. System overview → Runtime architecture → Galen → Provenance layer. You will see the layered platform map, the execution layer with cycle contract and replay, the orchestration kernel that verifies the workflow without becoming the truth layer, and the cryptographic substrate that turns each cycle into a verifiable artifact.

For governance and compliance reviewers auditing control surfaces. Scientific boundaries → Disclosure boundary → Bounded learning → Galen. You will see the eight code-enforced rules the runtime ships with, the publication boundary that separates public sequence provenance from private salts and internals, the bounded-learning envelope that prevents drift, and the orchestration kernel's complete refusal list.

Browse by topic

Six groups, twenty-six pages. The sidebar carries the full list; below is the one-line orientation for each group.

Overview — start here; what the system is and what it does. Four pages.
Runtime — the scheduled scientific execution layer and the cycle contract. Four pages.
Evidence layer — how scientific information enters and is organised. Five pages.
Reasoning & memory — how the system reasons, remembers, and stays in bounds. Five pages.
Provenance & disclosure — what is recorded, anchored, and made publicly verifiable. Four pages.
Operations & governance — who is in charge, what is reserved, where this is going. Four pages.

What this surface is, and is not

The docs describe the runtime as systems research. They name what is implemented, what is partial, what is target, and what is reserved. They expose full sequences for intentionally published candidates and families, while scoring internals, embeddings, private manifests, provider secrets, and unfiled IP stay in the private vault by construction.
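One way to read "by construction" is as an allowlist: a record's public view contains only fields that were explicitly marked publishable, so anything new stays private by default. The sketch below assumes that reading; PUBLIC_FIELDS, public_view, the field names, and the placeholder sequence are all hypothetical and do not reflect Protean's actual record schema.

```python
# Hypothetical allowlist of publishable fields.
PUBLIC_FIELDS = frozenset({"candidate_id", "family", "sequence", "record_digest"})


def public_view(record: dict) -> dict:
    """Return only allowlisted fields; every other field stays private."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


# Made-up record for illustration; the sequence is a placeholder string.
record = {
    "candidate_id": "example-001",
    "family": "example-family",
    "sequence": "ACDEFGHIK",
    "scoring_internals": {"weights": [0.1, 0.9]},
    "embedding": [0.12, -0.03],
    "provider_secret": "not-for-publication",
}

published = public_view(record)
```

An allowlist fails closed: adding a new internal field to a record cannot leak it, because publication requires an explicit entry in PUBLIC_FIELDS.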