Protean

System overview

Evidence enters; constraints shape the search space; proposal systems explore; deterministic validators decide; ranking prioritises; learning adapts within bounds; review packages the output. One operating fabric, one cycle at a time.

Operational

All layers below run on every cycle. The full stage list lives in the cycle executor and is verified by the operations kernel.

The platform, in seven layers

evidence plane
-> constraint engine
-> candidate field
-> validation gates
-> failure-aware ranking
-> interpretability layer
-> bounded adaptation
-> review package
                     -> Protean Ledger  (Base mainnet)

The advantage is not a single model. The advantage is the operating fabric that keeps evidence, constraints, proposal systems, ranking logic, and learning behaviour aligned across repeated discovery cycles — and ends each cycle with a typed, content-addressed record on a public chain.

The on-chain layer

              Operator-signed
              approval token
                    │
       Galen ──────►│
   (cognition       │     Bankr
    runtime,        │   automation wallet
    zero on-chain   ▼   (AUTOMATION_WRITER_ROLE)
    authority)   registerRecord
                    │
                    ▼
        Protean Ledger  (Base mainnet, UUPS proxy)
        0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0
                    │
            emits RecordRegistered
                  + RecordContentEmitted
                    │
                    ▼
              Vercel cron indexer  (every minute, Neon)
                    │
        sha256 state digest at
        /ledger/api/v1/indexer/digest
                    │
                    ▼
              Explorer at protean.sh/ledger
              GitHub mirror at github.com/proteanlabs1/ledger-mirror
              Gitlawb mirrors under DID did:key:z6Mkt6MEeSCJM2krT1PfX8BmTWbi9YYkLqdaRXSF6UZvy5QB

The chain is the source of truth. The indexer reflects it. The digest reproduces it. The explorer is a lens; GitHub and Gitlawb are public-distribution surfaces. None of those downstream surfaces are the authority. Any third party can recompute the record graph from the contract's event log alone.
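As a rough illustration of that recomputation claim, the sketch below hashes a list of already-fetched contract events into a single sha256 digest. It is a minimal sketch under stated assumptions: the event field names (`blockNumber`, `logIndex`, `recordId`) and the canonical form (sort by block and log position, compact JSON with sorted keys) are hypothetical, and the digest served at /ledger/api/v1/indexer/digest may well be computed differently.

```python
import hashlib
import json


def state_digest(events):
    """Return a sha256 hex digest over a canonical ordering of event records.

    Assumption: each event is a dict with 'blockNumber' and 'logIndex' keys.
    The canonical form used here is illustrative, not the indexer's own.
    """
    ordered = sorted(events, key=lambda e: (e["blockNumber"], e["logIndex"]))
    hasher = hashlib.sha256()
    for event in ordered:
        # Compact JSON with sorted keys so equal events serialise identically.
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        hasher.update(line.encode("utf-8"))
        hasher.update(b"\n")
    return hasher.hexdigest()


# Hypothetical sample events, deliberately listed out of chain order.
events = [
    {"blockNumber": 2, "logIndex": 0, "event": "RecordContentEmitted", "recordId": "0x01"},
    {"blockNumber": 1, "logIndex": 3, "event": "RecordRegistered", "recordId": "0x01"},
]
```

Because the events are sorted before hashing, two parties who fetch the same log in different orders arrive at the same digest, which is the property a reproducible state digest needs.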

FIG · 01 · Runtime cycle topology

  01 model scan
  02 healthcheck
  03 ingest
  04 extract
  05 index
  06 normalize
  07 constrain
  08 features
  09 rank
  10 train
  11 explain
  12 claim QA
  13 learn
  14 cognition
  15 provenance
  16 ledger write

Block gate · halts on failure (entered before stages 09 & 15)
Ledger path · stages 15 & 16 prepare provenance and register records
The seven layers above expand into sixteen bounded stages in the actual cycle. Stage 16 submits the cycle's terminal records to the Protean Ledger.
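The stage order and block gates can be sketched as a small sequential executor. This is a minimal sketch, not the actual cycle executor: the stage names and the gate positions (before stages 09 and 15) come from the figure, while `run_cycle`, `handlers`, `gate_check`, and `GateFailure` are hypothetical names introduced for illustration.

```python
STAGES = [
    "model scan", "healthcheck", "ingest", "extract", "index", "normalize",
    "constrain", "features", "rank", "train", "explain", "claim QA",
    "learn", "cognition", "provenance", "ledger write",
]

# Block gates are entered before stages 09 and 15 (per the figure legend).
GATED_BEFORE = {9, 15}


class GateFailure(Exception):
    """Raised when a block gate fails; the cycle halts at that point."""


def run_cycle(handlers, gate_check):
    """Run stages 01..16 in order, halting if a block gate fails.

    handlers:   dict mapping stage number -> callable(state)  (hypothetical)
    gate_check: callable(stage_number, state) -> bool         (hypothetical)
    """
    state = {"completed": []}
    for number, name in enumerate(STAGES, start=1):
        if number in GATED_BEFORE and not gate_check(number, state):
            raise GateFailure(f"block gate before stage {number:02d} ({name}) failed")
        handler = handlers.get(number)
        if handler is not None:
            handler(state)
        state["completed"].append(number)
    return state
```

With a gate that fails before stage 15, the executor raises before provenance and ledger write run, so nothing reaches the ledger path from a halted cycle.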

Core layers

Evidence plane. Captures source records, extracted scientific entities, literature signals, negative evidence, internal observations, and candidate lineage. Built with BAAI/bge-m3 for text embeddings, BAAI/bge-reranker-v2-m3 for reranking, and urchade/gliner_large-v2 for entity extraction.

Constraint engine. Turns research objectives into design boundaries before candidate generation begins. Hard gates halt invalid candidates; soft constraints become warnings and ranking context.
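The split between hard gates and soft constraints might look like the following. This is a minimal sketch assuming constraints are expressed as named predicates over a candidate; `ConstraintResult`, `apply_constraints`, and the two example constraints are hypothetical and not taken from the platform.

```python
from dataclasses import dataclass, field


@dataclass
class ConstraintResult:
    halted: bool                       # True when a hard gate failed
    warnings: list = field(default_factory=list)


def apply_constraints(candidate, hard_gates, soft_constraints):
    """Evaluate (name, predicate) pairs against a candidate.

    A failing hard gate halts immediately; failing soft constraints are
    collected as warnings that can later serve as ranking context.
    """
    for name, passes in hard_gates:
        if not passes(candidate):
            return ConstraintResult(halted=True, warnings=[f"hard gate failed: {name}"])
    warnings = [name for name, passes in soft_constraints if not passes(candidate)]
    return ConstraintResult(halted=False, warnings=warnings)


# Hypothetical example constraints for a peptide sequence.
hard = [("max_length_30", lambda seq: len(seq) <= 30)]
soft = [("no_cysteine", lambda seq: "C" not in seq)]
```

Here `apply_constraints("ACDK", hard, soft)` does not halt but carries a `no_cysteine` warning forward, while a 31-residue sequence is halted by the hard gate.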

Proposal systems. Explore candidate space through embedding-guided synthesis on facebook/esm2_t12_35M_UR50D and deterministic generation. Proposals cannot override validators.

Validation gates. Reject invalid residues, malformed sequences, excessive repetition, unacceptable cleavage exposure against the four-enzyme panel (trypsin, chymotrypsin, pepsin, elastase), and candidates too close to known failure patterns. Deterministic; authoritative.
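A deterministic gate of this kind could be sketched as below. Everything specific here is an assumption for illustration: the cleavage sets are simplified textbook preferences for the residue preceding the cleaved bond, not the platform's rules, and the thresholds `max_run` and `max_site_fraction` are hypothetical. The failure-pattern proximity check is omitted.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

# Simplified textbook preferences (assumption: real specificity is richer,
# e.g. neighbouring-residue effects are ignored here).
CLEAVAGE_PANEL = {
    "trypsin": set("KR"),
    "chymotrypsin": set("FWY"),
    "pepsin": set("FLWY"),
    "elastase": set("AGSV"),
}


def longest_run(sequence):
    """Length of the longest run of one repeated residue."""
    best = run = 0
    previous = None
    for residue in sequence:
        run = run + 1 if residue == previous else 1
        best = max(best, run)
        previous = residue
    return best


def validate(sequence, max_run=3, max_site_fraction=0.5):
    """Return a list of rejection reasons; an empty list means the gate passed."""
    if not sequence:
        return ["malformed: empty sequence"]
    reasons = []
    invalid = sorted(set(sequence) - STANDARD_RESIDUES)
    if invalid:
        reasons.append(f"invalid residues: {''.join(invalid)}")
    if longest_run(sequence) > max_run:
        reasons.append("excessive repetition")
    for enzyme, residues in CLEAVAGE_PANEL.items():
        # Exclude the C-terminal residue: there is no bond after it to cleave.
        sites = sum(1 for r in sequence[:-1] if r in residues)
        if sites / len(sequence) > max_site_fraction:
            reasons.append(f"cleavage exposure: {enzyme}")
    return reasons
```

The function has no randomness and no learned component, so the same sequence always yields the same verdict, which is what makes a gate like this usable as the authoritative check.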

Ranking architecture. Balances seven canonical weights: protease_resistance (0.25), solubility (0.15), permeability (0.15), novelty (0.15), synthesis_risk (0.10), failure_similarity (0.10), stable_similarity (0.10). Adaptation is bounded to ±20% of base, or ±50% with trusted assay data, and weights are normalised at write time.
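The bounded adaptation can be sketched as clamp-then-normalise. The base weights are the ones listed above; the reading of the bounds as relative to each weight's own base value is an assumption, and `adapt_weights` is a hypothetical helper rather than the platform's function.

```python
BASE_WEIGHTS = {
    "protease_resistance": 0.25,
    "solubility": 0.15,
    "permeability": 0.15,
    "novelty": 0.15,
    "synthesis_risk": 0.10,
    "failure_similarity": 0.10,
    "stable_similarity": 0.10,
}


def adapt_weights(proposed, trusted_assay=False):
    """Clamp proposed weights to the allowed band, then normalise to sum to 1.

    Assumption: the band is relative to each base weight, i.e. ±20% of base,
    or ±50% of base when trusted assay data backs the change.
    """
    bound = 0.50 if trusted_assay else 0.20
    clamped = {}
    for name, base in BASE_WEIGHTS.items():
        low, high = base * (1 - bound), base * (1 + bound)
        clamped[name] = min(max(proposed.get(name, base), low), high)
    total = sum(clamped.values())
    # Normalise at write time so the stored weights always sum to 1.
    return {name: value / total for name, value in clamped.items()}
```

For example, proposing `novelty=1.0` without trusted assay data is clamped to 0.18 before normalisation, so a single cycle cannot let one signal dominate the ranking. Note that normalising after clamping shifts every weight slightly, so the stored values are not exactly the clamped ones.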

Interpretability layer. Produces structured rationale for why a candidate advanced, stalled, or was rejected. Claim QA via tasksource/ModernBERT-base-nli flags unsupported statements.

Bounded learning. Single pass per cycle. Records the delta vector, the reason, the prior state, and the rollback path as operator-audit scaffolding, then submits only approved public-safe Ledger records.
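The audit scaffolding described here could be captured in a record like the one below. This is a minimal sketch: `AdaptationRecord` and its fields are hypothetical names mirroring the four items in the text (delta vector, reason, prior state, rollback path), and rollback is modelled simply as restoring the prior state.

```python
from dataclasses import dataclass


@dataclass
class AdaptationRecord:
    prior_state: dict   # weights before the learning pass
    delta: dict         # per-weight change applied in this pass
    reason: str         # operator-readable justification

    def new_state(self):
        """State after applying the delta to the prior state."""
        return {k: v + self.delta.get(k, 0.0) for k, v in self.prior_state.items()}

    def rollback(self):
        """Rollback path: return a copy of the prior state unchanged."""
        return dict(self.prior_state)


# Hypothetical single-pass update for one cycle.
record = AdaptationRecord(
    prior_state={"novelty": 0.15, "solubility": 0.15},
    delta={"novelty": 0.01},
    reason="illustrative example only",
)
```

Keeping the prior state alongside the delta means an operator can reproduce the update or undo it without consulting any other record.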

Review package. Converts selected outputs into reviewable artifacts for scientific planning and IP strategy. Drafts only; no auto-submit.

The object model under the cycle

The runtime treats every artifact as a typed scientific object with lineage, lifecycle state, and disclosure state. The Protean Ledger defines fourteen RecordType enum values (Unknown plus thirteen usable types: RuntimeCycle, Hypothesis, Experiment, EvidenceBundle, Candidate, Thesis, AssayResult, Collection, RetractionNotice, ExternalSignal, Governance, ScientificAsset, IPAsset) and fourteen RelationType enum values. Records flow into the chain through Stage 16 of the cycle.

FIG · 03 · Protean Ledger record graph

Protean Ledger · 17 RecordTypes · 20 RelationTypes
Nodes: RuntimeCycle, EvidenceBundle, Hypothesis, Experiment, Candidate, Thesis, AssayResult, Collection, ScientificAsset, IPAsset, ExternalSignal, Governance, RetractionNotice
Edge labels: Anchors, Tests, Supports, Produces, Includes, AssetOf, ProtectedBy, Cites, Retracts, ReviewedBy
Legend: temporal root · RecordType (on chain) · edge labels = on-chain RelationType
Thirteen usable RecordTypes plus the Unknown sentinel, and a representative subset of the typed RelationTypes that wire them together on chain. Every node is a RecordType from contracts/ProteanLedger.sol; every edge label is a real RelationType from the contract enum.

Control surfaces

Autonomy stays useful because the cycle preserves control surfaces at every stage:

  • source provenance before extraction
  • constraints before generation
  • deterministic validation before scoring
  • failure penalties before ranking
  • bounded adaptation before reranking
  • human review before any wet-lab or IP decision