Protean

Bounded learning

Protean's adaptation envelope is narrow on purpose. Scoring weights move within ±20% of the canonical base (±50% with trusted assay data), the deltas are normalised and replayable, and the system never rewrites its own validators or code.

Operational

The bounded-feedback surface runs at most once per cycle, writes any delta vector into local operator-audit state, and re-normalises against the canonical BASE_WEIGHTS. When trusted labels are absent, it records the no-op state instead of claiming active learning.

The envelope, in numbers

The canonical scoring surface is a seven-component weighted sum. The components and their base weights are defined in the canonical BASE_WEIGHTS:

| Component | Base weight |
| --- | --- |
| protease_resistance | 0.25 |
| solubility | 0.15 |
| permeability | 0.15 |
| novelty | 0.15 |
| synthesis_risk | 0.10 |
| failure_similarity | 0.10 |
| stable_similarity | 0.10 |

The bounded-feedback surface may adjust each weight within ±20% of its base value for explanation-guided adaptation, or ±50% of base when trusted assay data is present. After every update, the weights are normalised to sum to 1.0. The runtime_verify check confirms the invariant at write time; a violation halts the cycle.
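The update rule can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Protean's implementation: the function names `apply_bounded_update` and `verify_weights`, the clamp-then-normalise ordering, and the tolerance are hypothetical. Only the base weights, the caps, and the sum-to-1.0 invariant come from the text.

```python
# Base weights as listed in the table above.
BASE_WEIGHTS = {
    "protease_resistance": 0.25,
    "solubility": 0.15,
    "permeability": 0.15,
    "novelty": 0.15,
    "synthesis_risk": 0.10,
    "failure_similarity": 0.10,
    "stable_similarity": 0.10,
}


def apply_bounded_update(proposed, cap=0.20):
    """Clamp each proposed weight to base * (1 +/- cap), then renormalise.

    Assumption: clamping happens before normalisation; the source does not
    specify the order of the two steps.
    """
    clamped = {}
    for name, base in BASE_WEIGHTS.items():
        low, high = base * (1 - cap), base * (1 + cap)
        # A component missing from the proposal keeps its base weight.
        clamped[name] = min(max(proposed.get(name, base), low), high)
    total = sum(clamped.values())
    return {name: weight / total for name, weight in clamped.items()}


def verify_weights(weights, tol=1e-9):
    """Hypothetical stand-in for the write-time check: raise on violation."""
    if set(weights) != set(BASE_WEIGHTS):
        raise ValueError("component set differs from BASE_WEIGHTS")
    if abs(sum(weights.values()) - 1.0) > tol:
        raise ValueError("weights do not sum to 1.0")


# A proposal that tries to push protease_resistance to 0.40 is clamped to
# 0.30 (base 0.25 plus 20%) before normalisation.
updated = apply_bounded_update({"protease_resistance": 0.40})
verify_weights(updated)
```

In this example the clamped weights sum to 1.05, so normalisation divides every component by 1.05 and the untouched components shrink slightly. Whether the envelope is checked again after normalisation is not stated in the text, so the sketch checks only the component set and the sum.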

Two learning modes

Adaptation has two regimes, distinguished by the maturity of the evidence behind the proposed change.

Explanation-guided. When no assay data is available, the system can adjust prioritisation based on repeated rationale patterns, warnings, strengths, and failure signals surfaced by the explanation layer. The mode improves prioritisation; it does not assert biological truth. Adaptation in this mode is capped at ±20%.

Assay-guided. When reviewed measured outcomes exist for the relevant candidate population, the runtime weighs measured evidence above explanation-derived signal. Adaptation in this mode is capped at ±50%, still normalised, still replayable. Without those reviewed labels, this mode remains inactive.
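The relationship between mode and cap can be written down as a small lookup. The sketch below is illustrative only: the `Mode` enum, `select_mode`, and `weight_bounds` are hypothetical names, and the sketch does not decide whether an update is actually applied; it only maps the evidence regime to the permitted range.

```python
from enum import Enum


class Mode(Enum):
    EXPLANATION_GUIDED = "explanation_guided"
    ASSAY_GUIDED = "assay_guided"


# Caps taken from the text: +/-20% and +/-50% of the base weight.
CAP_BY_MODE = {Mode.EXPLANATION_GUIDED: 0.20, Mode.ASSAY_GUIDED: 0.50}


def select_mode(has_reviewed_assay_outcomes: bool) -> Mode:
    """Assay-guided only when reviewed measured outcomes exist."""
    if has_reviewed_assay_outcomes:
        return Mode.ASSAY_GUIDED
    return Mode.EXPLANATION_GUIDED


def weight_bounds(base: float, mode: Mode) -> tuple[float, float]:
    """Return the (low, high) range a weight may occupy in the given mode."""
    cap = CAP_BY_MODE[mode]
    return base * (1 - cap), base * (1 + cap)


explanation_range = weight_bounds(0.25, select_mode(False))
assay_range = weight_bounds(0.25, select_mode(True))
```

For protease_resistance (base 0.25) the explanation-guided range is 0.20 to 0.30, and the assay-guided range is 0.125 to 0.375.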

Both modes write a delta vector with the reason, the supporting evidence, the prior state, and the rollback path into supplemental local operator-audit state. The cycle's terminal RuntimeCycle record on the Protean Ledger is the canonical public record; a verifier reproduces Ledger events through the indexer Digest, then uses replay artifacts only as supplemental material.
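One way to picture the delta vector is as an immutable record carrying the four items named above. The field names, the `DeltaRecord` class, and the JSON serialisation are assumptions made for illustration; the actual audit-state format is not described here. The sketch also assumes the delta is the difference between the normalised post-state and the prior state.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DeltaRecord:
    """Hypothetical shape of one audit entry for a bounded update."""

    cycle_id: str
    mode: str
    reason: str
    evidence: tuple[str, ...]
    prior_weights: dict[str, float]
    delta: dict[str, float]

    def post_weights(self) -> dict[str, float]:
        # Replaying the update: prior state plus delta gives the post-state.
        return {k: w + self.delta.get(k, 0.0) for k, w in self.prior_weights.items()}

    def rollback(self) -> dict[str, float]:
        # The rollback path is simply the recorded prior state.
        return dict(self.prior_weights)

    def to_json(self) -> str:
        # Sorted keys keep the serialised record stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


# Toy two-component state, used only to keep the example short.
record = DeltaRecord(
    cycle_id="cycle-0001",
    mode="explanation_guided",
    reason="repeated solubility warnings",
    evidence=("explanation:warning:solubility",),
    prior_weights={"a": 0.6, "b": 0.4},
    delta={"a": -0.05, "b": 0.05},
)
post = record.post_weights()
```

Because the record stores the prior state and the delta together, a reviewer can recompute the post-state or restore the prior state without consulting anything else.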

What learning cannot do

Bounded feedback is an adjustment, not a redefinition. The system refuses, by construction, to:

  • rewrite the base scoring contract (BASE_WEIGHTS are the anchor; deltas float around them, never replace them)
  • remove failure-motif penalties (the failure_similarity component cannot be zeroed)
  • mutate validator code or thresholds
  • self-modify any runtime source (the remediation surface is refused)
  • recursively retrain (recursion depth is 0; one pass per cycle)
  • convert model explanations into validation claims
  • run open-ended feedback amplification

These are enforced in code. Every adaptation path that would touch one of them halts at the first refusal.
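A halt-at-first-refusal guard could look like the following. This is a sketch under assumptions, not the enforcement code itself: the `Refusal` exception, the target labels, and the `check_proposal` signature are all hypothetical, and only three of the rules listed above are modelled.

```python
class Refusal(Exception):
    """Raised when a proposed adaptation touches a refused surface."""


# Hypothetical labels for surfaces the text says learning may not touch.
REFUSED_TARGETS = frozenset(
    {"base_weights", "validators", "thresholds", "runtime_source"}
)


def check_proposal(weights, targets=(), recursion_depth=0):
    """Raise Refusal at the first violated rule; return None if all pass."""
    for target in targets:
        if target in REFUSED_TARGETS:
            raise Refusal(f"refused target: {target}")
    if recursion_depth != 0:
        raise Refusal("recursive retraining is not allowed")
    if weights.get("failure_similarity", 0.0) <= 0.0:
        raise Refusal("failure_similarity cannot be zeroed")


# A proposal that only moves weights and keeps the failure penalty passes.
check_proposal({"failure_similarity": 0.08, "novelty": 0.17})
```

Raising an exception rather than returning a flag means the caller cannot continue past a refusal by accident.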

One pass per cycle

The cadence is fixed. Bounded feedback runs at most once per cycle, between ranking and explanation. Reranking — the optional pass that re-orders candidates after the bounded update — also runs at most once per cycle. The single-pass constraint exists for one reason: reproducibility. A reviewer can read the prior state, the delta, and the post-state and reconstruct the adaptation deterministically against the cycle's on-chain RuntimeCycle record.
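The at-most-once rule can be illustrated with a small gate that remembers which passes have already run in a cycle. The `CycleGate` class and the pass names are hypothetical; the sketch shows the constraint, not how the runtime actually tracks it.

```python
class CycleGate:
    """Allows each named pass to run at most once per cycle."""

    def __init__(self):
        self._seen = set()

    def enter(self, cycle_id, pass_name):
        key = (cycle_id, pass_name)
        if key in self._seen:
            raise RuntimeError(f"{pass_name} already ran in cycle {cycle_id}")
        self._seen.add(key)


gate = CycleGate()
gate.enter("cycle-0001", "bounded_feedback")  # first pass is allowed
gate.enter("cycle-0001", "rerank")            # a different pass, also allowed

try:
    gate.enter("cycle-0001", "bounded_feedback")  # second attempt is rejected
    second_pass_allowed = True
except RuntimeError:
    second_pass_allowed = False
```

Keying the gate on both the cycle and the pass name lets bounded feedback and reranking each run once in the same cycle while rejecting any repeat.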

Trust-tier discipline for labels

Active calibration requires trusted labels: approved wet-lab assay outcomes, human-reviewed scientific labels, or curated external assay records with provenance. Explanation-derived patterns remain reviewable proposals until trusted labels exist; the trust-tier classification is reviewed at the evidence boundary, not at the scoring boundary.

The runtime emits an explicit signal when the bounded-feedback pass cannot find trusted-tier evidence for the current candidate population. In that condition, the system falls back to the prior state and records the reason. The cycle still completes; the scoring surface simply does not move.
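The fallback behaviour can be sketched as a pass that filters labels by trust tier before doing anything else. The tier names, the label dictionary shape, and the `bounded_feedback_pass` function are assumptions made for this example; the tiers mirror the three trusted sources named above.

```python
# Hypothetical tier labels mirroring the three trusted sources named above.
TRUSTED_TIERS = frozenset(
    {"approved_wet_lab_assay", "human_reviewed_label", "curated_external_assay"}
)


def bounded_feedback_pass(prior_weights, labels, propose_update):
    """Return (weights, audit_entry); weights equal the prior state on a no-op."""
    trusted = [label for label in labels if label.get("tier") in TRUSTED_TIERS]
    if not trusted:
        audit = {
            "status": "no_op",
            "reason": "no trusted-tier evidence for this population",
        }
        return dict(prior_weights), audit
    audit = {"status": "updated", "trusted_label_count": len(trusted)}
    return propose_update(prior_weights, trusted), audit


# Only an explanation-derived pattern is available, so the pass is a no-op.
prior = {"novelty": 0.15, "solubility": 0.15}
labels = [{"tier": "explanation_pattern", "candidate": "c-001"}]
weights, audit = bounded_feedback_pass(prior, labels, lambda w, t: w)
```

The function returns in both branches, which matches the described behaviour: the cycle completes either way, and the audit entry records whether the scoring surface moved.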

The objective

The objective of bounded feedback is not unbounded self-improvement. The objective is disciplined prioritisation that preserves reproducibility. The runtime can become more responsive as trusted evidence accumulates, but every public cycle remains a Ledger record whose state can be reproduced from Base mainnet events.