Protean

Constraint engine

The constraint engine composes the surface that candidate proposal explores. Cleavage exposure, motif redundancy, novelty floor, synthesis plausibility, and composition baseline each act as either a hard gate or a soft constraint, applied before scoring.

Operational

Constraints are composed at cycle start from evidence, failure memory, and project priorities. Hard gates halt candidates before scoring; soft constraints become warnings and ranking context.

What the engine constrains

The constraint surface narrows candidate space along nine axes. The runtime treats constraints as an operating surface, not a prompt accessory.

  • Protease exposure. Cleavage risk against the four-enzyme panel: trypsin, chymotrypsin, pepsin, and elastase. Thermolysin is annotation-only and does not affect scoring. The cleavage rules themselves are defined in the canonical protease ruleset.
  • Digestion stability. A composite signal combining the protease panel, literature-derived degradation patterns, and internal failure memory.
  • Sequence complexity. Composition penalty unless the most frequent residue makes up ≤ 35% of the sequence and the longest single-residue run is ≤ 4. Shannon entropy of the residue distribution (bits per residue) below a threshold flags low-complexity sequences before ranking.
  • Composition baseline. Composition KL divergence vs the UniProt baseline, measured in nats. Values above ~1.5 indicate severe bias and gate the candidate.
  • Hydrophobicity and amphipathicity. Hydrophobic moment (Eisenberg, computed at the α-helical period of 100°) and Boman index (Boman 2003) as sequence-derived signals.
  • Aggregation propensity. AGGRESCAN a3v (Conchillo-Solé 2007; HSA threshold −0.02).
  • Disorder propensity. TOP-IDP (Campen 2008).
  • Novelty and redundancy. Embedding similarity to stable examples and to failure examples, plus neighbourhood distance in sequence space.
  • Synthesis practicality. Heuristic checks on synthesis-risk-correlated motifs (sticky residues, modification handles, etc.).
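The complexity and composition checks above reduce to a few counting formulas. The sketch below is a minimal Python illustration, assuming a non-empty sequence of canonical residues and composition KL measured as KL(p_seq ‖ baseline). The uniform baseline and all function names are hypothetical stand-ins: the runtime's actual UniProt baseline and entropy threshold are not given here, so the entropy is reported without a cutoff.

```python
import math
from collections import Counter

CANONICAL = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical stand-in for the UniProt baseline: uniform over the 20 residues.
UNIFORM_BASELINE = {aa: 1 / len(CANONICAL) for aa in CANONICAL}


def shannon_entropy_bits(seq: str) -> float:
    """Shannon entropy of the residue distribution, in bits per residue."""
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def max_residue_fraction(seq: str) -> float:
    """Fraction of the sequence occupied by its most frequent residue."""
    return max(Counter(seq).values()) / len(seq)


def longest_run(seq: str) -> int:
    """Length of the longest run of a single repeated residue."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def composition_kl_nats(seq: str, baseline: dict[str, float]) -> float:
    """KL(p_seq || baseline) in nats; residues absent from seq contribute 0."""
    n = len(seq)
    return sum((c / n) * math.log((c / n) / baseline[aa])
               for aa, c in Counter(seq).items())


def complexity_report(seq: str, baseline: dict[str, float] = UNIFORM_BASELINE,
                      max_fraction: float = 0.35, max_run: int = 4,
                      kl_gate: float = 1.5) -> dict:
    """Collect the complexity and composition signals for one candidate."""
    if not seq:
        raise ValueError("empty sequence")
    kl = composition_kl_nats(seq, baseline)
    return {
        "entropy_bits": shannon_entropy_bits(seq),
        "fraction_ok": max_residue_fraction(seq) <= max_fraction,
        "run_ok": longest_run(seq) <= max_run,
        "kl_nats": kl,
        "kl_gated": kl > kl_gate,  # above ~1.5 nats: severe bias, candidate is gated
    }
```

Natural logarithms give the KL value in nats, matching the unit above; `log2` gives the entropy in bits.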

Hard gates and soft constraints

Constraints split into two enforcement classes. The split is critical: hard gates halt a candidate before scoring, whereas soft constraints only attach warnings.

Hard gates halt a candidate before it reaches scoring. They include: invalid residues, malformed length, severe composition imbalance, AGGRESCAN HSA below threshold, and any motif on the hard-rejection list. Hard gates protect the platform from invalid or obviously weak candidates.
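A hard gate behaves like a short-circuiting predicate: the first failed check returns a reason, and the candidate never reaches scoring. The Python sketch below illustrates this under stated assumptions: the length bounds and the motif list are hypothetical placeholders, composition imbalance is measured as KL against a uniform stand-in baseline rather than the runtime's UniProt baseline, and the AGGRESCAN check is omitted because it needs the published a3v scale.

```python
import math
from collections import Counter
from typing import Optional

CANONICAL = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Hypothetical hard-rejection list; the runtime's real list is not given here.
HARD_REJECT_MOTIFS = ("NG", "DP")


def composition_kl_uniform(seq: str) -> float:
    """KL(p_seq || uniform over 20 residues) in nats; uniform is a stand-in baseline."""
    n = len(seq)
    return sum((c / n) * math.log((c / n) * 20) for c in Counter(seq).values())


def hard_gate(seq: str, min_len: int = 5, max_len: int = 50,
              kl_limit: float = 1.5,
              motifs: tuple[str, ...] = HARD_REJECT_MOTIFS) -> Optional[str]:
    """Return the first failing hard-gate reason, or None if seq may proceed to scoring."""
    invalid = set(seq) - CANONICAL
    if invalid:
        return f"invalid residues: {''.join(sorted(invalid))}"
    # Length is checked before composition so an empty sequence never reaches the KL step.
    if not min_len <= len(seq) <= max_len:
        return f"length {len(seq)} outside [{min_len}, {max_len}]"
    if composition_kl_uniform(seq) > kl_limit:
        return "severe composition imbalance"
    for motif in motifs:
        if motif in seq:
            return f"hard-rejection motif: {motif}"
    return None
```

Ordering the checks from cheapest to most specific keeps invalid input from ever reaching the arithmetic steps.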

Soft constraints become warnings, ranking penalties, or review context. They include most novelty signals, the failure-similarity surface, and synthesis-risk heuristics. Soft constraints preserve nuance for scientific review — a candidate near failure memory may still be scientifically interesting, but it should advance with visible context.
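Soft constraints adjust ranking and attach visible context instead of removing candidates. A minimal Python sketch, assuming the failure-similarity value (in [0, 1]) and the synthesis-risk flags have already been computed; the `Candidate` type, the warning threshold, and the penalty weights are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    seq: str
    score: float                                   # score from the scoring surface
    warnings: list[str] = field(default_factory=list)


def apply_soft_constraints(cand: Candidate, failure_similarity: float,
                           synthesis_flags: list[str], warn_at: float = 0.85,
                           failure_weight: float = 0.5,
                           flag_penalty: float = 0.05) -> float:
    """Return a ranking score adjusted by soft penalties; the candidate is never dropped."""
    adjusted = cand.score
    if failure_similarity >= warn_at:
        # Near failure memory: keep the candidate, but make the proximity visible.
        cand.warnings.append(f"near failure memory (similarity {failure_similarity:.2f})")
        adjusted -= failure_weight * (failure_similarity - warn_at)
    for flag in synthesis_flags:
        cand.warnings.append(f"synthesis risk: {flag}")
        adjusted -= flag_penalty
    return adjusted
```

The penalty scales with how far the similarity exceeds the warning threshold, so a candidate just past the line moves only slightly in the ranking while still carrying its warning into review.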

Constraint sources

Constraints are composed at cycle start from:

  • literature-derived stability and degradation signals
  • internal failure memory (the contradiction graph and failure-similarity surface)
  • known stable and unstable reference sets
  • sequence-level feature ranges
  • project-specific research priorities
  • experimental planning requirements from the hypothesis layer

The engine does not treat all signals equally. Source quality, evidence type, failure proximity, and review confidence affect how strongly a signal influences candidate space.
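One way to model unequal signal influence is a multiplicative weight per signal. The Python sketch below is hypothetical throughout: the evidence-type factors, the [0, 1] ranges, and the choice to let failure proximity amplify a signal are illustrative assumptions, not the engine's documented scheme.

```python
from dataclasses import dataclass

# Hypothetical evidence-type factors; the engine's actual weighting is not documented here.
EVIDENCE_FACTOR = {"experimental": 1.0, "literature": 0.7, "heuristic": 0.4}


@dataclass(frozen=True)
class Signal:
    name: str
    value: float              # raw signal contribution, e.g. a penalty
    source_quality: float     # 0..1
    evidence_type: str        # key into EVIDENCE_FACTOR
    review_confidence: float  # 0..1
    failure_proximity: float  # 0..1, closeness to known failures


def signal_weight(s: Signal) -> float:
    """Multiplicative weight; proximity to failure memory amplifies the signal (assumption)."""
    base = s.source_quality * EVIDENCE_FACTOR[s.evidence_type] * s.review_confidence
    return base * (1.0 + s.failure_proximity)


def combined_influence(signals: list[Signal]) -> float:
    """Weighted sum of signal values; weakly sourced signals move candidate space less."""
    return sum(signal_weight(s) * s.value for s in signals)
```

A multiplicative form means any single weak factor, such as low review confidence, is enough to damp a signal.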

Embedding-guided proposal

The proposal layer reads the composed constraint surface and explores within it. Embedding-guided proposal uses ESM-2 sequence embeddings (facebook/esm2_t12_35M_UR50D) to bias exploration toward unexplored regions of sequence space, but the proposal layer cannot violate hard gates. KL gates on the composed proposal distribution keep the layer from drifting into degenerate regions.
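The two mechanisms can be sketched in Python under stated assumptions. Embeddings are taken as mean-pooled final-layer ESM-2 states via Hugging Face `transformers` (the pooling choice is an assumption, and it also averages over the special start and end tokens). Novelty is scored as one minus the maximum cosine similarity to already-explored embeddings, and the KL gate compares a proposal's residue sampling distribution with a baseline. The function names and the 1.5-nat limit (borrowed from the composition gate) are hypothetical.

```python
import numpy as np


def esm2_embed(seqs: list[str], model_name: str = "facebook/esm2_t12_35M_UR50D") -> np.ndarray:
    """Mean-pooled final-layer ESM-2 embeddings (pooling over all non-padding tokens)."""
    import torch
    from transformers import AutoTokenizer, EsmModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = EsmModel.from_pretrained(model_name).eval()
    batch = tokenizer(seqs, return_tensors="pt", padding=True)
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)            # 0 on padding positions
    return ((hidden * mask).sum(dim=1) / mask.sum(dim=1)).numpy()


def novelty(candidate: np.ndarray, explored: np.ndarray) -> float:
    """1 - max cosine similarity to explored embeddings; higher means less explored."""
    c = candidate / np.linalg.norm(candidate)
    e = explored / np.linalg.norm(explored, axis=1, keepdims=True)
    return 1.0 - float(np.max(e @ c))


def kl_nats(p, q) -> float:
    """KL(p || q) in nats over residue distributions; zero-probability terms in p are skipped."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


def passes_kl_gate(proposal_dist, baseline_dist, limit: float = 1.5) -> bool:
    """Reject proposal distributions that have drifted too far from the baseline."""
    return kl_nats(proposal_dist, baseline_dist) <= limit
```

`esm2_embed` imports PyTorch and `transformers` lazily, so the scoring helpers work on any precomputed embedding matrix without downloading the model.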

The constraint engine is the contract that the proposal layer obeys. The proposal layer is free to be creative inside the contract; it cannot rewrite the contract.

Design discipline

The engine turns research priorities into operational boundaries. Candidates that do not respect those boundaries are stopped early — usually before any model is invoked. This is what makes the runtime tractable: most rejections happen at the constraint surface, not at the scoring surface.
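The ordering argument can be made concrete: run every cheap gate first and call the expensive scorer only on survivors, so most rejections never cost a model invocation. A minimal Python sketch; `run_cycle` and the gate signature (a reason string, or `None` to pass) are hypothetical.

```python
from typing import Callable, Optional

Gate = Callable[[str], Optional[str]]   # returns a rejection reason, or None to pass


def run_cycle(candidates: list[str], gates: list[Gate],
              score: Callable[[str], float]) -> tuple[list[tuple[float, str]], dict[str, str]]:
    """Apply gates in order; only candidates that pass every gate are scored."""
    ranked: list[tuple[float, str]] = []
    rejected: dict[str, str] = {}
    for seq in candidates:
        reason = None
        for gate in gates:
            reason = gate(seq)
            if reason is not None:
                break                       # first failing gate stops the candidate
        if reason is not None:
            rejected[seq] = reason          # rejected at the constraint surface
            continue
        ranked.append((score(seq), seq))    # the model is invoked only here
    ranked.sort(reverse=True)
    return ranked, rejected
```

Listing the cheapest gates first keeps the per-candidate cost proportional to how far a candidate gets before it is stopped.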