Research Cognition
Protean Labs separates candidate production from scientific reasoning. The runtime continues to generate, validate, rank, explain, and learn within bounded controls, while a second layer recommends what should be studied next.
This layer is artifact-only: it produces reviewable artifacts but does not rewrite code, mutate scoring rules, or alter learning logic.
Operating Model
evidence retrieval
-> hypothesis formation
-> computational experiment design
-> candidate-set comparison
-> bounded memory consolidation
-> next-investigation priority
The goal is to move from repeated candidate ranking toward persistent scientific exploration.
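The operating model above reads as an ordered loop. Here is a minimal Python sketch. It assumes each stage is a handler that takes the current artifacts and returns new ones. The `Stage` enum, `run_cycle`, and the rule that a stage may add an artifact but never overwrite one are illustrative assumptions, not the Protean Labs implementation.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Stage(Enum):
    """Stages of the research-cognition loop, in the order listed above."""
    EVIDENCE_RETRIEVAL = 1
    HYPOTHESIS_FORMATION = 2
    EXPERIMENT_DESIGN = 3
    CANDIDATE_SET_COMPARISON = 4
    MEMORY_CONSOLIDATION = 5
    NEXT_INVESTIGATION_PRIORITY = 6


def run_cycle(
    handlers: Dict[Stage, Callable[[dict], dict]], state: dict
) -> Tuple[dict, List[str]]:
    """Run one pass of the loop (hypothetical design).

    Each handler receives the artifacts produced so far and returns a dict of
    new artifacts. A handler may add keys but not overwrite existing ones, so
    earlier evidence and hypotheses remain available for review.
    """
    artifacts = dict(state)
    trace = []
    for stage in Stage:  # Enum iteration follows definition order
        produced = handlers[stage](artifacts)
        for key, value in produced.items():
            if key in artifacts:
                raise ValueError(f"{stage.name} tried to overwrite artifact {key!r}")
            artifacts[key] = value
        trace.append(stage.name)
    return artifacts, trace
```

Refusing overwrites is one simple way to keep the loop artifact-only: each pass leaves a complete, reviewable record of what every stage produced.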
Hypotheses
The hypothesis engine proposes reviewable scientific hypotheses across motif behavior, cleavage exposure, shielding patterns, novelty pressure, failure correlation, and structural constraints.
Each hypothesis carries confidence, supporting evidence, contradictory evidence, candidate groups, status, and lineage.
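Here is one possible shape for a hypothesis record that covers the fields listed above. This is a hypothetical sketch. The class and field names, the `Status` values, and the use of a single `parent_id` link to represent lineage are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    # Hypothetical lifecycle states for a reviewable hypothesis.
    PROPOSED = "proposed"
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    RETIRED = "retired"


@dataclass
class Hypothesis:
    hypothesis_id: str
    statement: str
    confidence: float  # assumed to lie in [0, 1]
    supporting_evidence: List[str] = field(default_factory=list)
    contradictory_evidence: List[str] = field(default_factory=list)
    candidate_groups: List[str] = field(default_factory=list)
    status: Status = Status.PROPOSED
    parent_id: Optional[str] = None  # lineage: the hypothesis this one refines

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def revise(self, new_id: str, statement: str, confidence: float) -> "Hypothesis":
        """Return a new child hypothesis that copies the evidence lists and records its parent."""
        return Hypothesis(
            hypothesis_id=new_id,
            statement=statement,
            confidence=confidence,
            supporting_evidence=list(self.supporting_evidence),
            contradictory_evidence=list(self.contradictory_evidence),
            candidate_groups=list(self.candidate_groups),
            parent_id=self.hypothesis_id,
        )


def lineage(h: Hypothesis, index: Dict[str, Hypothesis]) -> List[str]:
    """Follow parent links back to the root; `index` maps id to Hypothesis."""
    chain = [h.hypothesis_id]
    while h.parent_id is not None:
        h = index[h.parent_id]
        chain.append(h.hypothesis_id)
    return chain
```

Revising a hypothesis creates a new record instead of editing the old one, so the original claim and its evidence remain inspectable.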
Sequence Space
Sequence-space analysis maps the current candidate field into clusters, motif families, neighborhoods, redundancy groups, novelty gradients, and underexplored regions.
ESM-derived sequence signals remain the basis for peptide/protein similarity. Text embeddings support evidence retrieval; they do not replace sequence similarity.
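This toy example shows how candidates might be grouped in sequence space. It assumes each candidate already has a fixed-length embedding vector, such as one derived from ESM. It uses greedy leader clustering on cosine similarity as a stand-in technique. The `threshold`, the `min_group_size` cutoff, and treating small clusters as underexplored regions are assumptions rather than the documented method.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def leader_clusters(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Greedy leader clustering: each candidate joins the first cluster whose
    leader is at least `threshold` cosine-similar; otherwise it starts a new one."""
    clusters: List[List[str]] = []
    leaders: List[Sequence[float]] = []
    for name, vec in embeddings.items():
        for members, leader in zip(clusters, leaders):
            if cosine(vec, leader) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
            leaders.append(vec)
    return clusters


def summarize(clusters: List[List[str]], min_group_size: int = 2) -> dict:
    """Split clusters into redundancy groups and sparse (underexplored) regions."""
    return {
        "redundancy_groups": [c for c in clusters if len(c) >= min_group_size],
        "underexplored": [c for c in clusters if len(c) < min_group_size],
    }
```

Leader clustering is order-dependent and cheap. A production system would more likely use a dedicated clustering or nearest-neighbor library, but the sketch shows how similarity feeds redundancy and novelty signals.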
Scientific Memory
Scientific memory persists what the runtime has learned as replayable artifacts:
- motif memory
- failure memory
- contradiction memory
- exploration memory
- candidate lineage
- retrieval history
- experiment memory
This gives the system continuity without giving it permission to mutate its own architecture.
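Replayable memory can be sketched as an append-only log. The example below assumes a JSON Lines file and a hypothetical `ScientificMemory` class. The kind names mirror the list above. Rejecting any other kind is an illustrative guard, not a documented behavior.

```python
import json
from pathlib import Path
from typing import Iterator, Optional

# Hypothetical identifiers for the memory categories listed above.
MEMORY_KINDS = {
    "motif", "failure", "contradiction", "exploration",
    "lineage", "retrieval", "experiment",
}


class ScientificMemory:
    """Append-only JSON Lines log; replaying the file rebuilds memory state."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def record(self, kind: str, payload: dict) -> None:
        """Append one entry; existing entries are never rewritten."""
        if kind not in MEMORY_KINDS:
            raise ValueError(f"unknown memory kind: {kind}")
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"kind": kind, "payload": payload}) + "\n")

    def replay(self, kind: Optional[str] = None) -> Iterator[dict]:
        """Yield payloads in the order they were recorded, optionally filtered by kind."""
        if not self.path.exists():
            return
        with self.path.open(encoding="utf-8") as fh:
            for line in fh:
                entry = json.loads(line)
                if kind is None or entry["kind"] == kind:
                    yield entry["payload"]
```

An append-only file gives continuity across runs. It stores only data, so replaying it cannot change how the runtime behaves.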
Experiment Planning
The experiment planner creates computational study designs:
- mutation sweeps
- motif perturbations
- contrastive candidate sets
- local exploration branches
- ablation and challenge studies
These plans are scientific review artifacts, not wet-lab findings.
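As an example of a mutation sweep, the sketch below enumerates every single-residue substitution at chosen positions, using the 20 standard amino acids. The function name and the `A2G`-style labels (wild type, 1-based position, substitute) are assumptions for illustration. The output is a list of computational variants to study, not measured results.

```python
from typing import Iterable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def single_point_sweep(sequence: str, positions: Iterable[int]) -> List[Tuple[str, str]]:
    """Return (mutation_label, variant_sequence) for every single-residue
    substitution at the given 0-based positions. Labels use 1-based numbering,
    so 'A2G' means residue 2 changes from A to G."""
    variants = []
    for pos in positions:
        if not 0 <= pos < len(sequence):
            raise IndexError(f"position {pos} outside sequence")
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue  # skip the identity substitution
            variant = sequence[:pos] + aa + sequence[pos + 1:]
            variants.append((f"{wild_type}{pos + 1}{aa}", variant))
    return variants
```

For example, `single_point_sweep("MAK", [1])` yields the 19 variants at the alanine position, including `("A2G", "MGK")`.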
Data Provenance
The provenance layer scores evidence sources by authority, curation, access quality, freshness, provenance granularity, replication utility, and compliance posture.
Source reliability influences review context. It does not convert computational claims into biological validation.
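A weighted score is one simple way to combine the provenance criteria. In this sketch, the weights, the [0, 1] scale, and the names `SourceProfile`, `reliability`, and `review_context` are hypothetical. `review_context` attaches the score to a claim and copies the claim's status unchanged. This mirrors the rule that reliability informs review but does not validate.

```python
from dataclasses import dataclass, fields

# Hypothetical weights: the criteria are named in the text, but how they combine is not.
WEIGHTS = {
    "authority": 0.2,
    "curation": 0.2,
    "access_quality": 0.1,
    "freshness": 0.1,
    "provenance_granularity": 0.15,
    "replication_utility": 0.15,
    "compliance_posture": 0.1,
}


@dataclass
class SourceProfile:
    authority: float
    curation: float
    access_quality: float
    freshness: float
    provenance_granularity: float
    replication_utility: float
    compliance_posture: float


def reliability(profile: SourceProfile) -> float:
    """Weighted mean of per-criterion scores, each expected in [0, 1]."""
    total = 0.0
    for f in fields(profile):
        value = getattr(profile, f.name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{f.name} must be in [0, 1]")
        total += WEIGHTS[f.name] * value
    return total / sum(WEIGHTS.values())


def review_context(claim: dict, profile: SourceProfile) -> dict:
    """Copy the claim (including its status) and add a reliability score for reviewers."""
    context = dict(claim)
    context["source_reliability"] = round(reliability(profile), 3)
    return context
```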
Bounded Planning
The research planner prioritizes work across six runtime modes:
- generation
- exploration
- hypothesis
- experiment
- memory consolidation
- data expansion
Every planned action records its allowed writes and forbidden mutations. The system may recommend investigations, but it cannot modify its own code, scoring rules, or learning rules.
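The write constraint can be enforced at the point where an action is constructed. This sketch assumes hypothetical target names (`code`, `scoring_rules`, `learning_rules`) for the protected surfaces, along with a numeric `priority`. The constructor rejects any action that requests a protected write, and `prioritize` only orders recommendations without executing them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Iterable, List


class Mode(Enum):
    GENERATION = "generation"
    EXPLORATION = "exploration"
    HYPOTHESIS = "hypothesis"
    EXPERIMENT = "experiment"
    MEMORY_CONSOLIDATION = "memory_consolidation"
    DATA_EXPANSION = "data_expansion"


# Targets no planned action may write to (hypothetical names for code, scoring, and learning rules).
PROTECTED: FrozenSet[str] = frozenset({"code", "scoring_rules", "learning_rules"})


@dataclass(frozen=True)
class PlannedAction:
    mode: Mode
    priority: float
    allowed_writes: FrozenSet[str]
    forbidden_mutations: FrozenSet[str] = PROTECTED

    def __post_init__(self) -> None:
        overlap = self.allowed_writes & PROTECTED
        if overlap:
            raise PermissionError(f"action may not write to {sorted(overlap)}")
        if not PROTECTED <= self.forbidden_mutations:
            raise ValueError("forbidden_mutations must include every protected target")


def prioritize(actions: Iterable[PlannedAction]) -> List[PlannedAction]:
    """Order recommendations by descending priority; nothing is executed here."""
    return sorted(actions, key=lambda a: a.priority, reverse=True)
```

Validating when an action is constructed means an invalid plan never exists as an object, so downstream code does not need to re-check it.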
