Discovery Lifecycle
Protean moves autonomous peptide discovery through a controlled lifecycle. Evidence enters with provenance, memory forms across cycles, hypotheses shape computational experiments, candidate sets are generated under constraints, deterministic gates decide what survives, and review artifacts preserve uncertainty for downstream scientific decisions.
scientific evidence
-> source provenance
-> scientific memory
-> hypothesis formation
-> computational experiment design
-> constrained candidate generation
-> deterministic validation
-> feature assembly and ranking
-> explanation and claim QA
-> bounded learning
-> research package
-> wet-lab review boundary
Lifecycle Thesis
Protean is not a free-form sequence generator. It is a scientific runtime that coordinates evidence, constraints, candidate proposals, deterministic validation, ranking, claim discipline, memory, and research planning into one bounded operating system.
The value is continuity between stages. A rejection is not just a discarded candidate. It can become failure memory. A ranking is not proof. It becomes a prioritization signal. A generated paper is not validation. It becomes a review artifact that exposes what is supported, weakly supported, unsupported, or contradicted.
Scientific Data Acquisition
Literature, evidence records, failures, patents, peptide databases, and planned assay or structure sources enter as provenance-bearing records.
Controls: source trust scoring, deduplication, bounded ingestion.
Scientific Memory Formation
Stable motifs, failure motifs, contradictions, lineage, retrieval history, and experiment memory become persistent context for future cycles.
Controls: artifact-only memory consolidation.
Hypothesis Formation
Motif, cleavage, shielding, novelty, comparative, and uncertainty hypotheses are proposed as reviewable research artifacts.
Controls: confidence labels, supporting and contradictory evidence.
Computational Experiment Design
Mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive sets, and anti-pattern tests are planned before another candidate push.
Controls: bounded candidate counts and explicit decision rules.
Constraint-Guided Candidate Generation
Candidates are proposed inside a constrained optimization surface shaped by cleavage risk, shielding logic, failure proximity, novelty, and practical sequence bounds.
Controls: deterministic bounds and proposal routing.
Validation And Structural Analysis
Residue validity, motif burden, cleavage exposure, sequence-space position, embedding similarity, novelty, and contradiction checks shape candidate eligibility.
Controls: deterministic gates and warning burden.
Ranking And Scientific Prioritization
Multi-objective ranking balances protease-related signals, solubility, permeability proxies, synthesis practicality, novelty, stable similarity, and failure memory.
Controls: normalized scoring caps and rerank limits.
Candidate Explanations And Scientific Papers
Evidence retrieval, reranking, claim QA, editorial synthesis, candidate briefs, and computational feasibility papers turn runtime output into reviewable communication.
Controls: claim QA and uncertainty language.
Wet-Lab Handoff
Top candidates move into scientific review with assay categories, comparison groups, risk notes, and rationale for inclusion in a downstream validation batch.
Controls: human scientific review and experimental controls.
IP And Provenance Layer
Sequence lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve scientific and IP-oriented context.
Controls: founder review and disclosure caution.
Recursive Bounded Learning
Explanation-guided or assay-guided feedback can adjust prioritization within caps, logging reasons, prior state, evidence, and rollback paths.
Controls: bounded deltas, normalized weights, replayable reports.
Autonomous Research Orchestration
The research planner coordinates runtime modes for generation, exploration, hypothesis work, experiment planning, memory consolidation, and data expansion.
Controls: mode priorities and forbidden mutations.
Stage 1: Scientific Data Acquisition
The lifecycle begins with source discipline. Literature, evidence records, failure data, patent context, peptide databases, assay repositories, and planned structural sources are treated as provenance-bearing inputs rather than interchangeable text.
The system continuously expands its scientific memory through bounded ingestion and retrieval. Source identity, record freshness, duplicate suppression, and extraction confidence remain part of the evidence state.
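A minimal Python sketch of provenance-bearing, bounded ingestion might look like the following. The record fields, the `TRUST_PRIOR` table, the fingerprinting scheme, and the `ingest` function are all hypothetical illustrations; the actual source trust scoring and weighting are private.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class EvidenceRecord:
    source_id: str                # hypothetical stable identifier for the source
    source_type: str              # e.g. "literature", "patent", "database"
    retrieved_on: date            # kept so record freshness stays part of the evidence state
    text: str
    extraction_confidence: float  # 0.0-1.0, assumed to come from an upstream extractor

    def fingerprint(self) -> str:
        # Normalize case and whitespace so trivially reformatted copies collide.
        normalized = " ".join(self.text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hypothetical trust priors per source type; unknown types get a low default.
TRUST_PRIOR = {"database": 0.9, "literature": 0.8, "patent": 0.6}
DEFAULT_TRUST = 0.3

def ingest(records, max_records=100, min_confidence=0.5):
    """Deduplicate by fingerprint, drop low-confidence extractions, and stop
    after max_records accepted records. Returns (record, trust_score) pairs."""
    seen = set()
    accepted = []
    for rec in records:
        if len(accepted) >= max_records:
            break  # bounded ingestion: never accept more than the cap per pass
        fp = rec.fingerprint()
        if fp in seen or rec.extraction_confidence < min_confidence:
            continue
        seen.add(fp)
        trust = TRUST_PRIOR.get(rec.source_type, DEFAULT_TRUST) * rec.extraction_confidence
        accepted.append((rec, round(trust, 3)))
    return accepted
```

In this sketch the trust score travels next to the record instead of being folded into its text, so later stages can still weigh evidence by provenance.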
Stage 2: Scientific Memory Formation
Normalized evidence becomes persistent scientific context. Motif memory, stable motif accumulation, failure motif accumulation, contradiction tracking, lineage tracking, retrieval history, and experiment memory give each cycle continuity.
This memory is artifact-only. It gives the runtime better context without granting permission to rewrite code, mutate scoring rules, or remove validation controls.
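One way to read "artifact-only" is that consolidation produces new, read-only snapshots and never receives a handle to scoring or validation code. The sketch below assumes that motifs are contiguous k-mers and that a motif becomes a contradiction once it recurs in both accepted and rejected candidates; `MemorySnapshot`, `consolidate`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class MemorySnapshot:
    cycle: int
    stable_motifs: MappingProxyType   # motif -> count in accepted candidates
    failure_motifs: MappingProxyType  # motif -> count in rejected candidates
    contradictions: frozenset         # motifs recurring on both sides, flagged for review

def motifs(seq, k=3):
    """Hypothetical motif definition: the set of contiguous k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def consolidate(prev, accepted, rejected, min_count=2):
    """Build the next read-only snapshot; the previous snapshot is never modified."""
    stable = Counter(prev.stable_motifs)
    failure = Counter(prev.failure_motifs)
    for seq in accepted:
        stable.update(motifs(seq))
    for seq in rejected:
        failure.update(motifs(seq))
    contradictions = {m for m in stable
                      if stable[m] >= min_count and failure[m] >= min_count}
    return MemorySnapshot(
        cycle=prev.cycle + 1,
        stable_motifs=MappingProxyType(dict(stable)),
        failure_motifs=MappingProxyType(dict(failure)),
        contradictions=frozenset(contradictions),
    )

EMPTY = MemorySnapshot(0, MappingProxyType({}), MappingProxyType({}), frozenset())
```

Because each snapshot is immutable, a cycle can add context but cannot quietly rewrite what an earlier cycle recorded.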
Stage 3: Hypothesis Formation
The hypothesis layer asks what should be studied next. It can propose motif hypotheses, cleavage hypotheses, shielding hypotheses, novelty hypotheses, comparative hypotheses, and uncertainty-oriented hypotheses.
Each hypothesis should carry confidence, supporting evidence, contradictory evidence, candidate groups, investigation status, and lineage. This moves the system away from endless candidate production and toward directed scientific investigation.
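The fields listed above map naturally onto a record type. In this hypothetical sketch, the confidence label is derived from supporting and contradictory evidence rather than stored as free text; the `Confidence` and `Status` values and the labeling thresholds are assumptions for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Confidence(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

class Status(Enum):
    PROPOSED = "proposed"
    UNDER_INVESTIGATION = "under investigation"
    SUPPORTED = "supported"
    REFUTED = "refuted"

@dataclass
class Hypothesis:
    hypothesis_id: str
    kind: str        # "motif", "cleavage", "shielding", "novelty", "comparative", "uncertainty"
    statement: str
    supporting: List[str] = field(default_factory=list)     # evidence record ids
    contradicting: List[str] = field(default_factory=list)  # evidence record ids
    candidate_groups: List[str] = field(default_factory=list)
    status: Status = Status.PROPOSED
    parent_id: Optional[str] = None  # lineage: the hypothesis this one refines

    def confidence(self) -> Confidence:
        # Hypothetical rule: no support, or contradiction at least as broad as
        # support -> LOW; any contradiction or fewer than three supports -> MODERATE.
        s, c = len(self.supporting), len(self.contradicting)
        if s == 0 or c >= s:
            return Confidence.LOW
        if c > 0 or s < 3:
            return Confidence.MODERATE
        return Confidence.HIGH
```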
Stage 4: Computational Experiment Design
Hypotheses become bounded computational study designs. The planner can create mutation sweeps, motif perturbation studies, local evolutionary branches, contrastive candidate sets, and anti-pattern tests.
These plans are not wet-lab results. They are structured ways to make the next computational cycle more informative by comparing candidate families and stress-testing assumptions.
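A mutation sweep with a bounded candidate count and a decision rule fixed before any scoring might be sketched as follows. `mutation_sweep`, `decision_rule`, and the `min_gain` threshold are hypothetical; the scores passed to the rule are assumed to come from later deterministic stages.

```python
from itertools import islice

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def mutation_sweep(parent, positions, max_candidates=50):
    """Single-residue substitutions at the chosen 0-based positions,
    capped at max_candidates."""
    def variants():
        for pos in positions:
            for aa in AMINO_ACIDS:
                if aa != parent[pos]:
                    yield parent[:pos] + aa + parent[pos + 1:]
    return list(islice(variants(), max_candidates))

def decision_rule(parent_score, variant_scores, min_gain=0.05):
    """Pre-registered rule: keep only variants that beat the parent by at least min_gain."""
    return [seq for seq, score in variant_scores.items() if score - parent_score >= min_gain]
```

Fixing `min_gain` before results arrive is the point of the design here: the rule cannot be tuned after the fact to favor a particular candidate.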
Stage 5: Constraint-Guided Candidate Generation
Candidate generation happens inside a constrained optimization surface. The generator is not a free-form peptide writer. It is bounded by sequence rules, cleavage-aware logic, failure proximity, novelty pressure, and practical candidate constraints.
Local models may assist as proposal sources where routed and available, but deterministic constraints define what the system is allowed to explore.
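The split between proposal sources and deterministic bounds can be sketched like this. Any source, including a local model, only proposes; `within_bounds` alone decides admission. The length limits, the similarity threshold, and the use of `difflib.SequenceMatcher` as a failure-proximity measure are stand-in assumptions, not the platform's actual constraints.

```python
from difflib import SequenceMatcher

# Hypothetical bounds; the real constraint values are not published.
BOUNDS = {"min_len": 8, "max_len": 30, "max_failure_similarity": 0.8}
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

def within_bounds(seq, failures, bounds=BOUNDS):
    if not (bounds["min_len"] <= len(seq) <= bounds["max_len"]):
        return False
    if not set(seq) <= STANDARD:
        return False
    # Failure proximity: reject proposals too similar to any known failure.
    return all(SequenceMatcher(None, seq, f).ratio() <= bounds["max_failure_similarity"]
               for f in failures)

def route_proposals(proposal_sources, failures, limit=20):
    """Pull proposals from each source in priority order. Only proposals that pass
    the deterministic bounds are admitted, whichever source produced them."""
    admitted, seen = [], set()
    for source in proposal_sources:
        for seq in source():
            if len(admitted) >= limit:
                return admitted
            if seq not in seen and within_bounds(seq, failures):
                seen.add(seq)
                admitted.append(seq)
    return admitted
```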
Stage 6: Validation And Structural Analysis
Candidates pass through deterministic validation and sequence analysis before they qualify for scientific review. This stage can include amino acid validation, motif validation, cleavage scoring, embedding similarity, sequence-space positioning, novelty scoring, reranking, and contradiction checks.
LLMs are proposal and synthesis systems only. Deterministic validators remain authoritative.
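A minimal sketch of deterministic gates with a warning burden follows. It uses a simplified trypsin-like rule (cleavage after K or R unless the next residue is P) as an illustrative protease check and treats Asp-Gly and Asn-Gly pairs as example anti-pattern motifs; `validate`, `FLAGGED_MOTIFS`, and `MAX_WARNINGS` are hypothetical, and the platform's real gates are not published.

```python
import re
from dataclasses import dataclass, field

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")
# Illustrative protease rule (trypsin-like): cleave after K or R unless followed by P.
TRYPSIN_SITE = re.compile(r"[KR](?!P)")
# Hypothetical anti-pattern motifs, e.g. deamidation/isomerization-prone pairs.
FLAGGED_MOTIFS = ("DG", "NG")
MAX_WARNINGS = 2  # hypothetical warning-burden cap

@dataclass
class GateResult:
    eligible: bool
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

def validate(seq):
    result = GateResult(eligible=True)
    bad = set(seq) - STANDARD
    if bad:
        result.errors.append(f"non-standard residues: {sorted(bad)}")
    # A K/R at the final position has nothing after it to release, so skip it.
    sites = [m.start() for m in TRYPSIN_SITE.finditer(seq) if m.start() < len(seq) - 1]
    if sites:
        result.warnings.append(f"cleavage-exposed positions: {sites}")
    for motif in FLAGGED_MOTIFS:
        if motif in seq:
            result.warnings.append(f"flagged motif: {motif}")
    # Errors are hard failures; warnings only disqualify once they exceed the cap.
    result.eligible = not result.errors and len(result.warnings) <= MAX_WARNINGS
    return result
```

A candidate with valid residues can still become ineligible here once its accumulated warnings exceed the cap, which is one way to express "warning burden" as a deterministic gate.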
Stage 7: Ranking And Scientific Prioritization
Ranking is multi-objective prioritization. Candidate scores can reflect protease-related signals, solubility and permeability proxies, synthesis practicality, novelty, stable similarity, failure similarity, warning burden, and bounded adaptive weights.
Ranking evolves as evidence and memory evolve, but it remains a prioritization layer. It does not establish biological activity, safety, efficacy, or experimental validation.
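Multi-objective prioritization with normalized weights and capped inputs might look like the sketch below. The objective names, `WEIGHTS`, and both penalty scales are hypothetical placeholders; the production weights are proprietary.

```python
# Hypothetical objective weights; the production weights are proprietary.
WEIGHTS = {"protease_stability": 0.35, "solubility": 0.2, "permeability": 0.15,
           "synthesis": 0.15, "novelty": 0.15}
FAILURE_PENALTY = 0.3   # hypothetical penalty scale for similarity to known failures
WARNING_PENALTY = 0.05  # hypothetical per-warning penalty

def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def clamp(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))

def priority(features, failure_similarity, warnings, weights=WEIGHTS):
    """Weighted sum of capped [0, 1] objectives minus failure and warning penalties.
    The result is a prioritization score, not evidence of activity or safety."""
    w = normalize(weights)
    base = sum(w[k] * clamp(features.get(k, 0.0)) for k in w)
    return clamp(base - FAILURE_PENALTY * clamp(failure_similarity) - WARNING_PENALTY * warnings)

def rank(candidates):
    # candidates: list of (sequence, features, failure_similarity, warning_count)
    scored = [(seq, priority(f, fs, n)) for seq, f, fs, n in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Capping each feature at 1.0 keeps any single objective from dominating the slate, however large its raw value.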
Stage 8: Candidate Explanations And Scientific Papers
For high-priority candidates, the system retrieves relevant evidence, reranks context, checks generated claims, and creates candidate explanations or computational assessment papers.
This communication layer is designed for founder, researcher, and scientific review. It should translate raw runtime outputs into interpretable rationale while preserving uncertainty.
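Claim QA can attach one of the four labels used earlier in this document (supported, weakly supported, unsupported, contradicted) and choose matching uncertainty language. The counting rule and hedge phrases below are hypothetical; they show the shape of such a check, not the platform's actual criteria.

```python
from enum import Enum

class ClaimLabel(Enum):
    SUPPORTED = "supported"
    WEAKLY_SUPPORTED = "weakly supported"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"

# Hypothetical hedging phrases attached to each label in generated text.
HEDGE = {
    ClaimLabel.SUPPORTED: "Retrieved evidence is consistent with",
    ClaimLabel.WEAKLY_SUPPORTED: "Limited evidence suggests",
    ClaimLabel.UNSUPPORTED: "No retrieved evidence supports",
    ClaimLabel.CONTRADICTED: "Retrieved evidence conflicts with",
}

def label_claim(supporting_sources, contradicting_sources):
    """Hypothetical rule over distinct sources: contradiction at least as broad as
    support -> contradicted; any contradiction or a single source -> weakly supported."""
    s, c = len(set(supporting_sources)), len(set(contradicting_sources))
    if c and c >= s:
        return ClaimLabel.CONTRADICTED
    if s >= 2 and c == 0:
        return ClaimLabel.SUPPORTED
    if s >= 1:
        return ClaimLabel.WEAKLY_SUPPORTED
    return ClaimLabel.UNSUPPORTED

def qa_sentence(claim, supporting_sources, contradicting_sources):
    label = label_claim(supporting_sources, contradicting_sources)
    return label, f"{HEDGE[label]} the claim that {claim}."
```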
Stage 9: Wet-Lab Handoff
Top candidates can be organized into review batches with assay category suggestions, comparison groups, risk notes, and rationale for inclusion.
Wet-lab validation is the downstream truth layer. Computational prioritization can improve what gets tested, but it cannot replace controlled assays, comparison groups, experimental controls, or human scientific review.
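A review batch can carry the handoff fields directly and refuse release until a human reviewer signs off. `ReviewBatch`, `build_batch`, the comparison groups, and the default assay categories below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReviewBatch:
    candidates: list                   # (sequence, score, rationale)
    comparison_groups: dict            # group name -> sequences
    risk_notes: dict                   # sequence -> warnings carried from validation
    assay_categories: list             # suggestions only; reviewers decide
    approved_by: Optional[str] = None  # set only by a human reviewer

    def release(self):
        if self.approved_by is None:
            raise PermissionError("batch requires human scientific review before release")
        return self.candidates

def build_batch(ranked, warnings, parent, known_failure, top_k=3,
                assays=("protease stability", "solubility")):
    top = ranked[:top_k]
    return ReviewBatch(
        candidates=[(seq, score, f"ranked #{i + 1} by computational priority")
                    for i, (seq, score) in enumerate(top)],
        comparison_groups={"parent": [parent], "known_failure": [known_failure]},
        risk_notes={seq: warnings.get(seq, []) for seq, _ in top},
        assay_categories=list(assays),
    )
```

The rationale string deliberately says "computational priority": the batch records why a candidate was included, not that it works.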
Stage 10: IP And Provenance Layer
Candidate lineage, source traces, novelty notes, disclosure awareness, and founder-review packages preserve research context for IP-oriented review.
The system can prepare review artifacts. It does not determine patentability, provide legal advice, or convert computational novelty signals into legal conclusions.
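Lineage tracing can be as simple as walking parent links back to a root sequence while carrying source traces along. The `parent_of` mapping and `lineage` function below are hypothetical; the output is a provenance record for review, not a novelty or patentability determination.

```python
def lineage(seq, parent_of):
    """Walk parent links back to the root.
    parent_of maps sequence -> (parent_sequence, operation, source_ids)."""
    chain, seen = [], set()
    while seq in parent_of:
        if seq in seen:
            raise ValueError(f"lineage cycle at {seq}")
        seen.add(seq)
        parent, op, sources = parent_of[seq]
        chain.append({"sequence": seq, "derived_from": parent,
                      "operation": op, "sources": list(sources)})
        seq = parent
    # The root has no recorded parent in this mapping.
    chain.append({"sequence": seq, "derived_from": None, "operation": "root", "sources": []})
    return chain
```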
Stage 11: Recursive Bounded Learning
Learning is intentionally constrained. Explanation-guided or assay-guided feedback can adjust prioritization within bounded deltas while logging the reason, evidence, prior state, and rollback path.
These bounds keep the system from becoming an uncontrolled self-modifying loop. It does not rewrite its architecture, mutate scoring code, remove failure penalties, or retrain itself recursively without bounds.
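The bounded update described above can be sketched as clamped, logged, renormalized weight changes with a recorded rollback point. `MAX_DELTA`, the `FORBIDDEN` keys, and the `Adjustment` log entry are hypothetical; the real caps and weights are private.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

MAX_DELTA = 0.02  # hypothetical cap on any single weight change per update
# Hypothetical names for things feedback may never touch.
FORBIDDEN = {"failure_penalty", "validation_gates", "scoring_code"}

@dataclass(frozen=True)
class Adjustment:
    reason: str
    evidence: tuple
    prior: dict     # full weight state before the update (rollback point)
    proposed: dict  # raw deltas as requested by feedback
    applied: dict   # deltas after clamping
    timestamp: str

def bounded_update(weights, proposed_deltas, reason, evidence, log):
    """Apply clamped deltas to existing objective weights, renormalize, and log."""
    forbidden = FORBIDDEN.intersection(proposed_deltas)
    if forbidden:
        raise PermissionError(f"forbidden mutation: {sorted(forbidden)}")
    new, applied = dict(weights), {}
    for key, delta in proposed_deltas.items():
        if key not in weights:
            continue  # feedback cannot introduce new objectives
        d = max(-MAX_DELTA, min(MAX_DELTA, delta))
        new[key] = max(0.0, new[key] + d)
        applied[key] = d
    total = sum(new.values())
    if total <= 0:
        raise ValueError("update would zero every weight")
    new = {k: v / total for k, v in new.items()}  # keep weights normalized
    log.append(Adjustment(reason, tuple(evidence), dict(weights), dict(proposed_deltas),
                          applied, datetime.now(timezone.utc).isoformat()))
    return new

def rollback(log):
    """Return the weight state recorded before the most recent adjustment."""
    return dict(log[-1].prior)
```

Logging both the proposed and the applied deltas makes it visible in replay when feedback asked for more than the cap allowed.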
Stage 12: Autonomous Research Orchestration
The research planner coordinates runtime modes such as generation, exploration, hypothesis formation, experiment planning, memory consolidation, and data expansion.
This is the move toward an autonomous scientific operating system: the platform proposes the next investigation, records the basis for that proposal, and leaves a replayable trace for review.
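A mode planner that records the basis for each choice might be sketched as a priority ladder plus an action filter. The priority order, the `CycleState` fields, and the `FORBIDDEN_ACTIONS` names are hypothetical, and only some of the modes named above appear; the actual routing policy is private.

```python
from dataclasses import dataclass

# Hypothetical names for mutations no runtime mode may request.
FORBIDDEN_ACTIONS = {"edit_scoring_code", "disable_validator", "drop_failure_memory"}

@dataclass
class CycleState:
    new_evidence: int           # records ingested since the last hypothesis pass
    open_hypotheses: int
    unplanned_hypotheses: int   # open hypotheses with no experiment plan yet
    pending_consolidation: bool

def next_mode(state):
    """Hypothetical priority ladder; returns (mode, basis) so each choice is explainable."""
    if state.pending_consolidation:
        return "memory_consolidation", "unconsolidated cycle artifacts"
    if state.unplanned_hypotheses:
        return "experiment_planning", f"{state.unplanned_hypotheses} hypotheses lack a plan"
    if state.open_hypotheses == 0:
        mode = "hypothesis_formation" if state.new_evidence else "data_expansion"
        return mode, "no open hypotheses"
    return "generation", "planned experiments are ready"

def plan_next(state, trace):
    mode, basis = next_mode(state)
    trace.append(("mode", mode, basis))  # replayable record of what was chosen and why
    return mode

def request_action(action, trace):
    allowed = action not in FORBIDDEN_ACTIONS
    trace.append(("allowed" if allowed else "rejected", action))
    return allowed
```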
Stage Outputs
- Stage 1: curated evidence plane.
- Stage 2: motif, contradiction, lineage, and exploration memory.
- Stage 3: bounded hypothesis set.
- Stage 4: reviewable computational experiment plans.
- Stage 5: candidate field.
- Stage 6: validated candidate set.
- Stage 7: ranked candidate slate.
- Stage 8: briefs, reports, and candidate assessment papers.
- Stage 9: review-ready handoff batch.
- Stage 10: provenance-aware review package.
- Stage 11: conservative ranking adjustment.
- Stage 12: next investigation plan.
What Remains Private
The public lifecycle explains the operating architecture without exposing moat-critical implementation details.
- Exact routing policy.
- Prompt design.
- Proprietary scoring weights.
- Private source curation and weighting.
- Full failure memory.
- Operator setup, schedules, credentials, deployment paths, and runtime internals.
Scientific boundary
Computational prioritization is not experimental validation. Each lifecycle stage improves review quality, but biological claims still require assay evidence, controls, and human scientific review.
Operational discipline
Where local models are unavailable, exhausted, or contradicted, the runtime records warnings and falls back to deterministic behavior instead of silently promoting claims.
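The fallback rule can be sketched as a wrapper that records a warning whenever the model path is unavailable or contradicted by a validator. `ModelUnavailable` and `propose_with_fallback` are hypothetical names; `validator` stands in for whichever deterministic check applies.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a local model is offline or out of budget."""

def propose_with_fallback(model_call, deterministic_fallback, validator, warnings):
    """Return (proposals, origin). Every departure from the model path is logged."""
    try:
        proposals = model_call()
    except (ModelUnavailable, TimeoutError) as exc:
        warnings.append(f"model unavailable: {exc}")
        return deterministic_fallback(), "deterministic"
    kept = [p for p in proposals if validator(p)]
    rejected = len(proposals) - len(kept)
    if rejected:
        # Contradicted proposals are dropped and counted, never promoted.
        warnings.append(f"validator rejected {rejected} of {len(proposals)} model proposals")
    if not kept:
        warnings.append("no model proposals survived; using deterministic fallback")
        return deterministic_fallback(), "deterministic"
    return kept, "model"
```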
