Protean Labs Docs

Discovery Lifecycle

Protean moves autonomous peptide discovery through a controlled lifecycle. Evidence enters with provenance, memory forms across cycles, hypotheses shape computational experiments, candidate sets are generated under constraints, deterministic gates decide what survives, and review artifacts preserve uncertainty for downstream scientific decisions.

scientific evidence
-> source provenance
-> scientific memory
-> hypothesis formation
-> computational experiment design
-> constrained candidate generation
-> deterministic validation
-> feature assembly and ranking
-> explanation and claim QA
-> bounded learning
-> research package
-> wet-lab review boundary

Lifecycle Thesis

Protean is not a free-form sequence generator. It is a scientific runtime that coordinates evidence, constraints, candidate proposals, deterministic validation, ranking, claim discipline, memory, and research planning into one bounded operating system.

The value is continuity between stages. A rejection is not just a discarded candidate. It can become failure memory. A ranking is not proof. It becomes a prioritization signal. A generated paper is not validation. It becomes a review artifact that exposes what is supported, weakly supported, unsupported, or contradicted.

01 Scientific Data Acquisition
Literature, evidence records, failures, patents, peptide databases, and planned assay or structure sources enter as provenance-bearing records.
Control: source trust scoring, deduplication, bounded ingestion

02 Scientific Memory Formation
Stable motifs, failure motifs, contradictions, lineage, retrieval history, and experiment memory become persistent context for future cycles.
Control: artifact-only memory consolidation

03 Hypothesis Formation
Motif, cleavage, shielding, novelty, comparative, and uncertainty hypotheses are proposed as reviewable research artifacts.
Control: confidence labels, supporting and contradictory evidence

04 Computational Experiment Design
Mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive sets, and anti-pattern tests are planned before another candidate push.
Control: bounded candidate counts and explicit decision rules

05 Constraint-Guided Candidate Generation
Candidates are proposed inside a constrained optimization surface shaped by cleavage risk, shielding logic, failure proximity, novelty, and practical sequence bounds.
Control: deterministic bounds and proposal routing

06 Validation And Structural Analysis
Residue validity, motif burden, cleavage exposure, sequence-space position, embedding similarity, novelty, and contradiction checks shape candidate eligibility.
Control: deterministic gates and warning burden

07 Ranking And Scientific Prioritization
Multi-objective ranking balances protease-related signals, solubility, permeability proxies, synthesis practicality, novelty, stable similarity, and failure memory.
Control: normalized scoring caps and rerank limits

08 Candidate Explanations And Scientific Papers
Evidence retrieval, reranking, claim QA, editorial synthesis, candidate briefs, and computational feasibility papers turn runtime output into reviewable communication.
Control: claim QA and uncertainty language

09 Wet-Lab Handoff
Top candidates move into scientific review with assay categories, comparison groups, risk notes, and rationale for inclusion in a downstream validation batch.
Control: human scientific review and experimental controls

10 IP And Provenance Layer
Sequence lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve scientific and IP-oriented context.
Control: founder review and disclosure caution

11 Recursive Bounded Learning
Explanation-guided or assay-guided feedback can adjust prioritization within caps, logging reasons, prior state, evidence, and rollback paths.
Control: bounded deltas, normalized weights, replayable reports

12 Autonomous Research Orchestration
Runtime modes coordinate generation, exploration, hypothesis work, experiment planning, memory consolidation, and data expansion.
Control: mode priorities and forbidden mutations

Stage 1: Scientific Data Acquisition

The lifecycle begins with source discipline. Literature, evidence records, failure data, patent context, peptide databases, assay repositories, and planned structural sources are treated as provenance-bearing inputs rather than interchangeable text.

The system continuously expands its scientific memory through bounded ingestion and retrieval. Source identity, record freshness, duplicate suppression, and extraction confidence remain part of the evidence state.
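The acquisition controls can be illustrated with the Python sketch below. It is not Protean's implementation. EvidenceRecord, TRUST_PRIOR, trust_score, ingest, and every threshold are hypothetical names and values. The sketch does three things:
- It suppresses duplicates using a normalized content fingerprint.
- It scores trust as a source-type prior scaled by extraction confidence.
- It stops a single ingestion pass at a fixed record cap.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

# Hypothetical trust priors per source type; the real source weighting is private.
TRUST_PRIOR = {"peptide_db": 0.9, "literature": 0.8, "failure": 0.7, "patent": 0.6}


@dataclass(frozen=True)
class EvidenceRecord:
    source_id: str                # e.g. a DOI, patent number, or database accession
    source_type: str              # "literature", "patent", "peptide_db", "failure", ...
    text: str
    retrieved_on: date            # kept so record freshness stays part of the evidence state
    extraction_confidence: float  # 0.0-1.0, reported by the extractor


def content_key(record: EvidenceRecord) -> str:
    """Fingerprint for duplicate suppression, ignoring case and whitespace runs."""
    normalized = " ".join(record.text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def trust_score(record: EvidenceRecord) -> float:
    """Source prior scaled by extraction confidence; unknown source types get 0.5."""
    return TRUST_PRIOR.get(record.source_type, 0.5) * record.extraction_confidence


def ingest(records, max_records: int, min_trust: float = 0.3) -> list:
    """Bounded ingestion: skip duplicates and low-trust records, stop at max_records."""
    seen, accepted = set(), []
    for record in records:
        key = content_key(record)
        if key in seen or trust_score(record) < min_trust:
            continue
        seen.add(key)
        accepted.append(record)
        if len(accepted) >= max_records:
            break
    return accepted
```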

Stage 2: Scientific Memory Formation

Normalized evidence becomes persistent scientific context. Motif memory, stable motif accumulation, failure motif accumulation, contradiction tracking, lineage tracking, retrieval history, and experiment memory give each cycle continuity.

This memory is artifact-only. It gives the runtime better context without granting permission to rewrite code, mutate scoring rules, or remove validation controls.
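One way to express the artifact-only rule uses a hypothetical ScientificMemory class and a hypothetical SCORING_RULES mapping. Memory is an append-only store. Scoring rules are exposed only through a read-only types.MappingProxyType view. Consolidation code can therefore add context but cannot change the rules. The rule names and values are illustrative.

```python
from dataclasses import dataclass, field
from types import MappingProxyType

# Scoring rules are owned elsewhere; memory code only ever sees this read-only view.
# Rule names and values are hypothetical.
SCORING_RULES = MappingProxyType({"failure_penalty": 0.5, "max_warnings": 1})


@dataclass
class ScientificMemory:
    """Append-only context carried across discovery cycles."""
    stable_motifs: list = field(default_factory=list)
    failure_motifs: list = field(default_factory=list)
    contradictions: list = field(default_factory=list)

    def record_failure(self, motif: str, cycle: int, reason: str) -> None:
        self.failure_motifs.append({"motif": motif, "cycle": cycle, "reason": reason})

    def record_contradiction(self, claim: str, evidence_ids: list) -> None:
        self.contradictions.append({"claim": claim, "evidence": list(evidence_ids)})
```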

Stage 3: Hypothesis Formation

The hypothesis layer asks what should be studied next. It can propose motif hypotheses, cleavage hypotheses, shielding hypotheses, novelty hypotheses, comparative hypotheses, and uncertainty-oriented hypotheses.

Each hypothesis should carry confidence, supporting evidence, contradictory evidence, candidate groups, investigation status, and lineage. This moves the system away from endless candidate production and toward directed scientific investigation.
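The hypothesis fields listed above map naturally onto a record type. This sketch uses hypothetical Hypothesis, Confidence, and Status types. The status values are illustrative. Evidence is referenced by record ID rather than copied.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Confidence(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Status(Enum):
    PROPOSED = "proposed"
    UNDER_INVESTIGATION = "under investigation"
    CLOSED = "closed"


@dataclass
class Hypothesis:
    hypothesis_id: str
    kind: str  # "motif", "cleavage", "shielding", "novelty", "comparative", "uncertainty"
    statement: str
    confidence: Confidence = Confidence.LOW
    supporting_evidence: list = field(default_factory=list)     # evidence record IDs
    contradictory_evidence: list = field(default_factory=list)  # evidence record IDs
    candidate_groups: list = field(default_factory=list)
    status: Status = Status.PROPOSED
    parent_id: Optional[str] = None  # lineage: the hypothesis this one refines

    def is_contested(self) -> bool:
        """True when any contradictory evidence is attached."""
        return bool(self.contradictory_evidence)
```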

Stage 4: Computational Experiment Design

Hypotheses become bounded computational study designs. The planner can create mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive candidate sets, and anti-pattern tests.

These plans are not wet-lab results. They are structured ways to make the next computational cycle more informative by comparing candidate families and stress-testing assumptions.
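A mutation sweep is the simplest of these designs to show concretely. The sketch below uses hypothetical single_site_sweep and ExperimentPlan names. It enumerates single-residue substitutions over the 20 canonical amino acids at chosen positions and caps the variant count. It also stores the decision rule alongside the variants before anything is scored.

```python
from dataclasses import dataclass
from itertools import islice

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def single_site_sweep(parent: str, positions, max_variants: int) -> list:
    """Single-substitution variants of parent at the given positions, capped at max_variants."""
    def variants():
        for pos in positions:
            for aa in AMINO_ACIDS:
                if aa != parent[pos]:
                    yield parent[:pos] + aa + parent[pos + 1:]
    return list(islice(variants(), max_variants))


@dataclass(frozen=True)
class ExperimentPlan:
    hypothesis_id: str
    design: str         # e.g. "mutation sweep", "contrastive set", "anti-pattern test"
    variants: tuple
    decision_rule: str  # fixed before scoring so results cannot reshape it
```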

Stage 5: Constraint-Guided Candidate Generation

Candidate generation happens inside a constrained optimization surface. The generator is not a free-form peptide writer. It is bounded by sequence rules, cleavage-aware logic, failure proximity, novelty pressure, and practical candidate constraints.

Local models may assist as proposal sources where routed and available, but deterministic constraints define what the system is allowed to explore.
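The routing described above can be sketched as follows. The sketch assumes hypothetical within_bounds, point_mutant, and propose functions and an assumed 8 to 30 residue length window. An optional local model can supply proposals. When no model is available, a seeded rule-based proposer takes over. Either way, every proposal passes the same deterministic bounds check.

```python
import random
from typing import Callable, Optional

ALLOWED = frozenset("ACDEFGHIKLMNPQRSTVWY")


def within_bounds(seq: str, min_len: int = 8, max_len: int = 30) -> bool:
    """Deterministic bounds: canonical residues and a practical length window (assumed)."""
    return min_len <= len(seq) <= max_len and set(seq) <= ALLOWED


def point_mutant(parent: str, rng: random.Random) -> str:
    """Seeded rule-based proposer: one random substitution."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(sorted(ALLOWED)) + parent[pos + 1:]


def propose(parent: str, n: int, rng: random.Random,
            model: Optional[Callable[[str], str]] = None) -> list:
    """Route to a local model when one is available; bounds apply to every proposal."""
    proposer = model if model is not None else (lambda p: point_mutant(p, rng))
    proposals = []
    for _ in range(n):
        candidate = proposer(parent)
        if within_bounds(candidate):  # model output never bypasses the deterministic check
            proposals.append(candidate)
    return proposals
```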

Stage 6: Validation And Structural Analysis

Candidates are evaluated through deterministic validation and sequence analysis before they can become serious review objects. The lifecycle can include amino acid validation, motif validation, cleavage scoring, embedding similarity, sequence-space positioning, novelty scoring, reranking, and contradiction checks.

LLMs are proposal and synthesis systems only. Deterministic validators remain authoritative.
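A deterministic gate can be sketched as a function that hard-fails invalid sequences and accumulates a warning burden for everything else. Here the classic trypsin rule (cleavage after K or R unless the next residue is P) stands in for cleavage exposure. It is used purely for illustration. The real cleavage model, the validate function, and the max_sites and max_warnings thresholds are all assumptions.

```python
import re
from dataclasses import dataclass, field

CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Illustrative cleavage-exposure proxy (trypsin rule): a cleavable bond follows K or R
# when a next residue exists and it is not P.
TRYPSIN_SITE = re.compile(r"[KR](?=[^P])")


@dataclass
class GateResult:
    passed: bool
    warnings: list = field(default_factory=list)


def validate(seq: str, failure_motifs, max_sites: int = 2, max_warnings: int = 1) -> GateResult:
    """Hard-fail invalid residues, then pass only if the warning burden stays within limits."""
    if not seq or not set(seq) <= CANONICAL:
        return GateResult(False, ["empty sequence or non-canonical residue"])
    warnings = []
    sites = len(TRYPSIN_SITE.findall(seq))
    if sites > max_sites:
        warnings.append(f"cleavage exposure: {sites} trypsin-like sites")
    for motif in failure_motifs:
        if motif in seq:
            warnings.append(f"failure motif present: {motif}")
    return GateResult(len(warnings) <= max_warnings, warnings)
```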

Stage 7: Ranking And Scientific Prioritization

Ranking is multi-objective prioritization. Candidate scores can reflect protease-related signals, solubility and permeability proxies, synthesis practicality, novelty, stable similarity, failure similarity, warning burden, and bounded adaptive weights.

Ranking evolves as evidence and memory evolve, but it remains a prioritization layer. It does not establish biological activity, safety, efficacy, or experimental validation.
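Multi-objective ranking with normalized weights and scoring caps might look like this sketch. The objective names, the failure_weight penalty, and top_k as the rerank limit are hypothetical; the real weights are private. Each objective is clipped to [0, 1] before weighting. Similarity to known failures is subtracted rather than ignored.

```python
def normalize(weights: dict) -> dict:
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


def clip01(x: float) -> float:
    """Cap every objective to [0, 1] so no single signal can dominate."""
    return max(0.0, min(1.0, x))


def priority(features: dict, weights: dict, failure_weight: float = 0.5) -> float:
    """Weighted sum of capped objectives minus a failure-similarity penalty."""
    w = normalize(weights)
    base = sum(w[name] * clip01(features.get(name, 0.0)) for name in w)
    return base - failure_weight * clip01(features.get("failure_similarity", 0.0))


def rank(candidates: dict, weights: dict, top_k: int = 10) -> list:
    """Return at most top_k candidate IDs, highest priority first."""
    ordered = sorted(candidates, key=lambda c: priority(candidates[c], weights), reverse=True)
    return ordered[:top_k]
```

The score is only a prioritization signal. A higher value orders review; it does not indicate activity or safety.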

Stage 8: Candidate Explanations And Scientific Papers

For high-priority candidates, the system retrieves relevant evidence, reranks context, checks generated claims, and creates candidate explanations or computational assessment papers.

This communication layer is designed for founder, researcher, and scientific review. It should translate raw runtime outputs into interpretable rationale while preserving uncertainty.
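Claim QA can give each generated claim one of the four evidence labels used on this page (supported, weakly supported, unsupported, contradicted). It can then pick matching uncertainty language. The counting rule below is an assumption chosen for illustration, not Protean's QA policy: any contradiction dominates, and two or more supporting records count as supported. The HEDGES phrasing is likewise assumed.

```python
from enum import Enum


class ClaimLabel(Enum):
    SUPPORTED = "supported"
    WEAKLY_SUPPORTED = "weakly supported"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"


# Hypothetical phrasing attached to each label in generated text.
HEDGES = {
    ClaimLabel.SUPPORTED: "Retrieved evidence is consistent with the claim that",
    ClaimLabel.WEAKLY_SUPPORTED: "Limited evidence suggests that",
    ClaimLabel.UNSUPPORTED: "It is not established that",
    ClaimLabel.CONTRADICTED: "Retrieved evidence conflicts with the claim that",
}


def label_claim(supporting: int, contradicting: int, strong_threshold: int = 2) -> ClaimLabel:
    """Conservative rule: any contradiction wins; otherwise grade by supporting count."""
    if contradicting > 0:
        return ClaimLabel.CONTRADICTED
    if supporting >= strong_threshold:
        return ClaimLabel.SUPPORTED
    if supporting > 0:
        return ClaimLabel.WEAKLY_SUPPORTED
    return ClaimLabel.UNSUPPORTED


def hedge(claim: str, supporting: int, contradicting: int) -> str:
    """Prefix a claim with uncertainty language matching its evidence label."""
    return f"{HEDGES[label_claim(supporting, contradicting)]} {claim}"
```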

Stage 9: Wet-Lab Handoff

Top candidates can be organized into review batches with assay category suggestions, comparison groups, risk notes, and rationale for inclusion.

Wet-lab validation is the downstream truth layer. Computational prioritization can improve what gets tested, but it cannot replace controlled assays, comparison groups, experimental controls, or human scientific review.

Stage 10: IP And Provenance Layer

Candidate lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve research context for IP-oriented review.

The system can prepare review artifacts. It does not determine patentability, provide legal advice, or convert computational novelty signals into legal conclusions.
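Sequence lineage and source traces can be represented as parent links between candidate IDs. This sketch uses hypothetical lineage and source_trace functions. It walks a candidate back to its root, rejects cyclic lineage, and collects the evidence sources attached at each step. That is the kind of trace a founder-review package would carry. It records context only and says nothing about patentability.

```python
def lineage(candidate_id: str, parents: dict) -> list:
    """Walk parent links from a candidate back to its root; parents maps child -> parent or None."""
    chain, seen = [candidate_id], {candidate_id}
    current = parents.get(candidate_id)
    while current is not None:
        if current in seen:
            raise ValueError(f"lineage cycle at {current}")
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


def source_trace(candidate_id: str, parents: dict, sources: dict) -> list:
    """Collect evidence source IDs along the lineage, nearest ancestor first."""
    trace = []
    for node in lineage(candidate_id, parents):
        trace.extend(sources.get(node, []))
    return trace
```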

Stage 11: Recursive Bounded Learning

Learning is intentionally constrained. Explanation-guided or assay-guided feedback can adjust prioritization within bounded deltas while logging the reason, evidence, prior state, and rollback path.

The system is prevented from becoming an uncontrolled self-modifying loop. It does not rewrite architecture, mutate scoring code, remove failure penalties, or recursively retrain itself without bounds.
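Bounded learning can be sketched as a weight store with these behaviors:
- It clips each proposed change.
- It keeps a floor under every existing objective.
- It renormalizes.
- It logs the reason, evidence, and prior state so any step can be rolled back.

BoundedWeights, Adjustment, max_delta, and floor are hypothetical names. Proposals that name objectives not already present are ignored. The clip is applied before renormalization, so final weight changes can differ slightly from the clipped deltas.

```python
from dataclasses import dataclass


def normalize(weights: dict) -> dict:
    """Scale weights so they sum to 1."""
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}


@dataclass(frozen=True)
class Adjustment:
    reason: str
    evidence: tuple
    prior: dict
    updated: dict


class BoundedWeights:
    """Ranking weights that move only by clipped deltas, with a logged, reversible history."""

    def __init__(self, weights: dict, max_delta: float = 0.05, floor: float = 0.01):
        self.weights = normalize(weights)
        self.max_delta = max_delta
        self.floor = floor  # stops any existing objective from being driven to zero
        self.log = []

    def adjust(self, proposed: dict, reason: str, evidence) -> dict:
        prior = dict(self.weights)
        updated = {}
        for name, old in prior.items():  # names absent from prior are never added
            delta = max(-self.max_delta, min(self.max_delta, proposed.get(name, 0.0)))
            updated[name] = max(self.floor, old + delta)
        updated = normalize(updated)
        self.log.append(Adjustment(reason, tuple(evidence), prior, updated))
        self.weights = updated
        return updated

    def rollback(self) -> dict:
        """Restore the weights in place before the most recent adjustment."""
        if self.log:
            self.weights = dict(self.log.pop().prior)
        return self.weights
```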

Stage 12: Autonomous Research Orchestration

The research planner coordinates runtime modes such as generation, exploration, hypothesis formation, experiment planning, memory consolidation, and data expansion.

This is the move toward an autonomous scientific operating system: the platform proposes the next investigation, records the basis for that proposal, and leaves a replayable trace for review.
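The orchestration loop can be sketched as a mode selector that records why it chose each mode. A guard alongside it rejects forbidden mutation targets. The Mode values follow the runtime modes named above. The priority order, state keys, thresholds, and FORBIDDEN_TARGETS are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    GENERATION = "generation"
    EXPLORATION = "exploration"
    HYPOTHESIS = "hypothesis formation"
    EXPERIMENT_PLANNING = "experiment planning"
    MEMORY_CONSOLIDATION = "memory consolidation"
    DATA_EXPANSION = "data expansion"


# Targets no mode may modify, regardless of what the planner proposes (hypothetical list).
FORBIDDEN_TARGETS = frozenset({"architecture", "scoring_code", "failure_penalties", "validators"})


def choose_mode(state: dict):
    """Hypothetical priority order, checked top to bottom; returns (mode, reason)."""
    if state.get("unconsolidated_records", 0) > 100:
        return Mode.MEMORY_CONSOLIDATION, "memory backlog"
    if state.get("open_hypotheses", 0) == 0:
        return Mode.HYPOTHESIS, "no open hypotheses"
    if state.get("unplanned_hypotheses", 0) > 0:
        return Mode.EXPERIMENT_PLANNING, "hypotheses without plans"
    if state.get("evidence_gaps", 0) > 0:
        return Mode.DATA_EXPANSION, "evidence gaps"
    if state.get("stagnant_cycles", 0) >= 3:
        return Mode.EXPLORATION, "ranking has stagnated"
    return Mode.GENERATION, "plans ready"


def plan_next(state: dict, trace: list) -> Mode:
    """Choose a mode and append the basis for the choice to a replayable trace."""
    mode, reason = choose_mode(state)
    trace.append({"mode": mode.value, "reason": reason, "state": dict(state)})
    return mode


def guard(target: str) -> None:
    """Reject any action aimed at a forbidden mutation target."""
    if target in FORBIDDEN_TARGETS:
        raise PermissionError(f"forbidden mutation target: {target}")
```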

Stage | Signal
Acquire: curated evidence plane | provenance-bearing evidence
Remember: motif, contradiction, lineage, and exploration memory | persistent scientific context
Hypothesize: bounded hypothesis set | investigation candidates
Plan: reviewable computational experiment plans | planned comparison sets
Generate: candidate field | bounded proposal set
Validate: validated candidate set | gate state
Rank: ranked candidate slate | priority vector
Explain: briefs, reports, and candidate assessment papers | review trace
Handoff: review-ready handoff batch | truth layer pending
Trace: provenance-aware review package | traceable research state
Learn: conservative ranking adjustment | bounded adaptation
Orchestrate: next investigation plan | scientific operating layer

What Remains Private

The public lifecycle explains the operating architecture without exposing moat-critical implementation details.

  • Exact routing policy.
  • Prompt design.
  • Proprietary scoring weights.
  • Private source curation and weighting.
  • Full failure memory.
  • Operator setup, schedules, credentials, deployment paths, and runtime internals.

Scientific boundary

Computational prioritization is not experimental validation. Each lifecycle stage improves review quality, but biological claims still require assay evidence, controls, and human scientific review.

Operational discipline

Where local models are unavailable, exhausted, or contradicted, the runtime records warnings and falls back to deterministic behavior instead of silently promoting claims.
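The fallback behavior can be sketched as a wrapper that uses model text only when three conditions hold: a model is available, the call succeeds, and the output agrees with a deterministic consistency check. Otherwise the wrapper returns a deterministic summary together with a recorded warning. ModelUnavailable, explain, and is_consistent are hypothetical names.

```python
from typing import Callable, Optional, Tuple


class ModelUnavailable(Exception):
    """Raised by a model wrapper when the model is offline or its quota is exhausted."""


def explain(candidate: str,
            deterministic_summary: Callable[[str], str],
            local_model: Optional[Callable[[str], str]] = None,
            is_consistent: Callable[[str], bool] = lambda text: True) -> Tuple[str, list]:
    """Prefer model text only when it is available and consistent; otherwise fall back."""
    warnings = []
    if local_model is None:
        warnings.append("local model unavailable")
        return deterministic_summary(candidate), warnings
    try:
        text = local_model(candidate)
    except (ModelUnavailable, TimeoutError) as exc:
        warnings.append(f"local model failed: {exc}")
        return deterministic_summary(candidate), warnings
    if not is_consistent(text):
        warnings.append("model output contradicts deterministic state")
        return deterministic_summary(candidate), warnings
    return text, warnings
```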