Antimicrobial candidate subgrouping by proximity to known failure signals

Protean Labs — Mechanistic Thesis Layer

Runtime Memo: Antimicrobial candidate subgrouping by proximity to known failure signals

Autonomous runtime memo · 2026-05-25T15:54:56+00:00

Confidence: runtime_memo (autonomous) · evidence 5↑ / 0↓ (2 trusted-tier) · strength 0.35 · uncertainty 0.16
Provenance: prose machine-synthesized by openai-codex/gpt-5.5; deterministic skeleton from seed seed_f2f776aeeb081718.
Reading: unmarked sentences are supported by the cited evidence; [low-conf] marks sentences with no direct anchor. Per-section confidence appears beneath each prose heading; structured per-claim classifications live in metadata.json → section_confidence.

Abstract

Galen flagged a prioritization gap in which apparent peptide rank can mask candidates adjacent to failure signals. The runtime associated failure-correlation proximity with proline-rich motifs, separating rank-favored candidates from a degradation-risk subgroup. Support came from three Failure Correlation metric records, CyclicMPNN, and degradation;instability, forming a small motif-centered evidence cluster. No contradicting evidence was supplied, so the constraint comes from limited evidence strength (0.35) rather than opposing records. The runtime holds engine confidence 0.76 under uncertainty 0.16, with the §9 assay panel positioned to adjudicate subgroup assay priority.

1. Introduction

conf 0.09 · evidence 5 sup / 0 con · trusted-tier 2 · class mix: unr:5
Note: majority of sentences in this section lack direct evidence anchors — see Limitations.

Galen flagged a prioritization gap where top-ranked antimicrobial candidates can sit near known failure signals without receiving a separate degradation-risk assay. A sequence-divergence discriminator would isolate candidates by distance from failure-correlated sequences, instead of grouping them with motif-recombined stability analogs from CyclicMPNN. The mechanistic locus sits in proline-rich runs within protease-resistant scaffolds, including PPGP and PGPP, with no receptor family assigned in the graph. Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) and degradation;instability anchor the failure-side signal at evidence strength 0.35. The runtime proposes separate assay routing for this subgroup at confidence 0.76, with uncertainty 0.16 limiting claims to workflow triage.

2. Related Work

The following trusted-tier references inform this synthesis:

1. Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) · ranked_candidates · source_id:20260523T190743Z-037
2. Failure Correlation metric for sequence WPLPVTNGEPPGSQCHQQWPP · ranked_candidates · source_id:20260523T190743Z-038
3. Failure Correlation metric for legacy sequence hash a42d5ef3 (full sequence not found in public provenance) · ranked_candidates · source_id:20260523T190743Z-016
4. CyclicMPNN: Stable Cyclic Peptide Sequence Generation · paperclip · source_id:bio_e1a320b06d40
5. degradation;instability · pubmed · source_id:38401875

3. Mechanistic Framework

conf 0.09 · evidence 5 sup / 0 con · trusted-tier 2 · class mix: unr:6
Note: majority of sentences in this section lack direct evidence anchors — see Limitations.

Motif extraction surfaced proline-rich runs near failure-correlation candidates, with PPGP, PGPP, and PPPG recurring across the redacted sequence metrics. PPGP couples to protease_resistance through constrained backbone geometry, because adjacent prolines can reduce accessible cleavage conformations in short antimicrobial peptides. Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) covers proximity to degradation-like failure signals rather than primary antimicrobial activity. The synthesis pass coupled CyclicMPNN: Stable Cyclic Peptide Sequence Generation to cyclic stabilization context, not direct evidence for linear motif durability. The framework does not yet account for cleavage-site mapping, carrier-mediated penetration, or assay-specific degradation kinetics across PPGW, PGWP, and GWPP variants. Runtime confidence stays bounded because degradation;instability supplies a broad failure label, while no contradicting record narrows the subgroup boundary.

4. Evidence Synthesis

[TRUST_T1] Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) — failure_similarity_score=0.944; notes=0.9442 similarity against 4 failure examples (source_id:20260523T190743Z-037)
[TRUST_T1] Failure Correlation metric for sequence WPLPVTNGEPPGSQCHQQWPP — failure_similarity_score=0.954; notes=0.9539 similarity against 4 failure examples (source_id:20260523T190743Z-038)
[TRUST_T1] Failure Correlation metric for legacy sequence hash a42d5ef3 (full sequence not found in public provenance) — failure_similarity_score=0.934; notes=0.9337 similarity against 4 failure examples (source_id:20260523T190743Z-016)
[TRUST_T2] CyclicMPNN: Stable Cyclic Peptide Sequence Generation — Cyclic peptides are a promising class of therapeutics due to their attractive drug qualities such as increased structural stability, cell permeability, and resistance to proteolytic degradation. With recent advancements in cyclic peptide backbone generation models like CyclicCAE and RFPeptide, generating (source_id:bio_e1a320b06d40)
[TRUST_T2] degradation;instability — While thuricin CD was degraded by proteases and was unstable and poorly soluble in gastric fluid, it showed increased solubility in intestinal fluid, probably due to micelle encapsulation. Thuricin CD is a two-peptide antimicrobial produced by Bacillus thuringiensis. Unlike previous antibiotics, it has shown narrow spectrum activity a (source_id:38401875)
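
The three TRUST_T1 scores above can be summarized in a few lines. This is a minimal sketch rather than the runtime's own aggregation: the dictionary layout and the `summarize` helper are illustrative assumptions, and only the score values and source IDs come from the records.

```python
from statistics import mean

# Scores copied from the three TRUST_T1 records listed above,
# keyed by source_id. The dictionary layout is an assumption.
failure_similarity = {
    "20260523T190743Z-037": 0.944,
    "20260523T190743Z-038": 0.954,
    "20260523T190743Z-016": 0.934,
}


def summarize(scores):
    """Return (min, mean, max) of the similarity scores."""
    values = list(scores.values())
    return min(values), mean(values), max(values)


low, avg, high = summarize(failure_similarity)
print(f"min={low:.3f} mean={avg:.3f} max={high:.3f}")
```

All three scores fall within 0.02 of each other, so the summary mostly confirms that the failure-side signal is uniform across the records; it says nothing about how the underlying similarity was computed.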

5. Peptide Motif Analysis

Recurring 4-mer motifs in associated candidates: PPGP, PGPP, PPPG, GPPG, PPGW, PGWP, GWPP, PCPP, GPPP, CPPG.
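
A sliding-window count is one plausible way such 4-mer motifs could be extracted. The sketch below is an assumption about the approach, not the runtime's extractor: `kmer_counts`, `flagged_hits`, and the toy sequence are hypothetical, and only the motif list is taken from this section.

```python
from collections import Counter

# Motif list copied from the line above.
FLAGGED_MOTIFS = {
    "PPGP", "PGPP", "PPPG", "GPPG", "PPGW",
    "PGWP", "GWPP", "PCPP", "GPPP", "CPPG",
}


def kmer_counts(sequence, k=4):
    """Count every overlapping k-mer in a peptide sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def flagged_hits(sequence, motifs=FLAGGED_MOTIFS, k=4):
    """Return counts for only those k-mers that are in the flagged set."""
    return {m: n for m, n in kmer_counts(sequence, k).items() if m in motifs}


# Hypothetical toy sequence for illustration; not a candidate from this memo.
toy = "GPPGPPGW"
hits = flagged_hits(toy)
print(hits)
```

Overlapping windows matter here: a short proline-rich run contributes to several motifs at once, which is why the listed 4-mers overlap so heavily with one another.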

Candidate sequence visibility: full sequences are displayed directly for published candidate references; any unresolved legacy hash is labeled explicitly with its public provenance limitation.

6. Hypothesis

Statement. Candidates nearest to known failure signals should be assayed as a separate subgroup so apparent rank does not hide degradation-like behavior.

Type. failure-correlation. Engine confidence. 0.76. Aggregate uncertainty (this thesis). 0.16.
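
The routing implied by the statement can be sketched as a threshold split. Everything here is illustrative: the `Candidate` fields, the candidate names and rank scores, and the 0.90 threshold are assumptions, since the memo does not define a cutoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    rank_score: float          # apparent rank score (higher is better)
    failure_similarity: float  # proximity to known failure signals


def split_lanes(candidates, threshold=0.90):
    """Split candidates into (near_failure, standard) lanes.

    The threshold is a hypothetical cutoff, not a value from the memo.
    Each lane keeps its own rank ordering so that a high rank score
    cannot hide a near-failure candidate inside the standard lane.
    """
    near = [c for c in candidates if c.failure_similarity >= threshold]
    standard = [c for c in candidates if c.failure_similarity < threshold]

    def by_rank(c):
        return c.rank_score

    return (sorted(near, key=by_rank, reverse=True),
            sorted(standard, key=by_rank, reverse=True))


# Hypothetical candidates for illustration only.
pool = [
    Candidate("cand_a", 0.95, 0.954),
    Candidate("cand_b", 0.80, 0.40),
    Candidate("cand_c", 0.90, 0.934),
]
near_failure, standard = split_lanes(pool)
```

Candidates in the near-failure lane are not discarded; they are queued for the separate degradation-risk assay while the standard lane proceeds on rank alone.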

7. Discussion

conf 0.09 · evidence 5 sup / 0 con · trusted-tier 2 · class mix: unr:10
Note: majority of sentences in this section lack direct evidence anchors — see Limitations.

The runtime would split near-failure candidates into a separate decision lane if the §9 assay panel reproduces degradation-like behavior. Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) supplied the nearest-neighbor failure signal, while degradation;instability supplied the mechanistic class. Candidate prioritization would demote entries with high apparent rank that carry PPGP, PGPP, PPPG, or GPPG runs until protease-resistance data clears them. Motif-family scoring would add a penalty for proline-rich runs overlapping PPGW, PGWP, GWPP, PCPP, GPPP, and CPPG. Receptor-screen sequencing would move later, because antimicrobial triage should first separate cell-activity loss from protease-driven attrition.
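
The motif-family penalty described above could take many forms. The sketch below assumes the simplest one, a fixed deduction per flagged 4-mer occurrence; the weight of 0.05, the function names, and the example sequence are hypothetical.

```python
# Motif families named in the paragraph above.
PENALIZED = frozenset({
    "PPGP", "PGPP", "PPPG", "GPPG", "PPGW",
    "PGWP", "GWPP", "PCPP", "GPPP", "CPPG",
})


def motif_penalty(sequence, weight=0.05):
    """Penalty proportional to the number of flagged 4-mer occurrences.

    The per-occurrence weight is an assumed value, not one from the memo.
    """
    occurrences = sum(
        1 for i in range(len(sequence) - 3) if sequence[i:i + 4] in PENALIZED
    )
    return weight * occurrences


def adjusted_score(rank_score, sequence, weight=0.05):
    """Demote the apparent rank score by the motif penalty."""
    return rank_score - motif_penalty(sequence, weight)


# Hypothetical sequence for illustration; it contains five flagged 4-mers.
score = adjusted_score(0.90, "GPPGPPGW")
```

A linear penalty demotes rather than excludes, which matches the memo's position that the rule should guide subgroup assays and not discard candidates.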

Contradiction weighting found no named contradicting records, so constraint comes from §9 adjudication rather than literature conflict. The §9 protease-resistance experiment narrows the rule if near-failure candidates decay faster without matching antimicrobial loss. The §9 motif-family ablation experiment narrows scoring if PPGP-family edits change stability but leave rank-linked failure unchanged. The hypothesis is falsified if the near-failure subgroup matches controls on both degradation kinetics and antimicrobial readout while motif-enriched candidates retain rank. With evidence_strength 0.35 and uncertainty_score 0.16, the rule should guide subgroup assays, not discard candidates.

8. Limitations

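No explicit blocking limitations detected by automated triage. Manual scientific review remains required. As the per-section notes indicate, most sentences in the Introduction, Mechanistic Framework, Discussion, and Conclusion lack direct evidence anchors: each of these sections carries confidence 0.09 (§13), so their claims should be read as unanchored synthesis.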

9. Future Experiments

Synthesize representative candidates carrying the listed motifs and run the standard developability + protease-resistance assay panel.
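
One way the protease-resistance readout could be reduced to a single comparable number is a half-life estimate. The sketch below assumes first-order decay of the intact peptide and a fraction of 1.0 at time zero; the function name, the time points, and the synthetic data are hypothetical, not results from this memo.

```python
import math


def first_order_half_life(times, fractions):
    """Estimate half-life from intact-peptide fractions over time.

    Assumes first-order decay, fraction(t) = exp(-k * t), and fits
    ln(fraction) = -k * t by least squares through the origin.
    Fractions must be strictly positive.
    """
    denominator = sum(t * t for t in times)
    if denominator == 0:
        raise ValueError("need at least one non-zero time point")
    numerator = sum(t * math.log(f) for t, f in zip(times, fractions))
    k = -numerator / denominator
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / k


# Synthetic, noise-free data generated with k = 0.02 per minute.
times_min = [0, 15, 30, 60]
intact = [math.exp(-0.02 * t) for t in times_min]
half_life = first_order_half_life(times_min, intact)
print(f"half-life = {half_life:.1f} min")
```

Comparing half-lives between the near-failure subgroup and matched controls, alongside the antimicrobial readout, would give the panel a direct handle on whether degradation or loss of activity drives attrition.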

10. Conclusion

conf 0.09 · evidence 5 sup / 0 con · trusted-tier 2 · class mix: unr:5
Note: majority of sentences in this section lack direct evidence anchors — see Limitations.

Galen treated candidates with proline-rich motifs near failure signals as a separate assay stratum, so that rank scores do not mask degradation-like behavior. Motif extraction centered on PPGP, PGPP, PPPG, and GPPG with antimicrobial and protease-resistance tags. The cheapest discriminator is a protease-challenge time course with LC-MS intact-peptide tracking and matched antimicrobial readout. No contradicting record entered the graph; constraint comes from low evidence strength (0.35). Runtime scope holds as a failure-correlation proposal at 0.76 confidence.

11. References

Supporting (trusted tier):

1. Failure Correlation metric for legacy sequence hash b8787c3d (full sequence not found in public provenance) · [TRUST_T1] · source_id:20260523T190743Z-037
2. Failure Correlation metric for sequence WPLPVTNGEPPGSQCHQQWPP · [TRUST_T1] · source_id:20260523T190743Z-038
3. Failure Correlation metric for legacy sequence hash a42d5ef3 (full sequence not found in public provenance) · [TRUST_T1] · source_id:20260523T190743Z-016
4. CyclicMPNN: Stable Cyclic Peptide Sequence Generation · [TRUST_T2] · source_id:bio_e1a320b06d40
5. degradation;instability · [TRUST_T2] · source_id:38401875

12. Runtime Investigation

Runtime capability investigation. Before this synthesis was drafted, Protean queried Galen's bounded capability surface to enrich the seed with structural and prior-art context. The full investigation ledger is preserved in a private provenance snapshot; this section reports the public-safe rollup.

Wall-clock duration: 18 ms
Capability calls: db.uniprot:motif_search: 3, pdb: 2
Call statuses: ok: 2, skipped: 3

Motifs investigated against UniProt:

PPGP → no family-level hits
PGPP → no family-level hits
PPPG → no family-level hits

PDB cross-references (0 resolved):

No PDB IDs mentioned in supporting evidence.

Candidate-sequence QC distribution. No candidate sequences were resolvable for this seed.

Structural analog search. 0 Foldseek ticket(s) were submitted against AFDB50 + PDB100; results poll asynchronously and are appended in subsequent cycles.

Prior-failure motif overlap. The following seed motifs also appear in prior rejected/low-scoring candidates and warrant caution in §9 prioritization: CPPG, GPPG, GPPP, GWPP, PCPP, PGPP, PGWP, PPGP.

13. Runtime Metadata

Operational context for this thesis cycle. Sourced from the synthesis seed and the prose-model log; not part of the scientific claim graph.

Publication tier: runtime_memo
Prose model: openai-codex/gpt-5.5 · 6/6 sections via primary model

Prose model call log:

Section	Winner	Latency (ms)	Validation codes
title	openai-codex/gpt-5.5	19688	—
abstract	openai-codex/gpt-5.5	22490	—
introduction	openai-codex/gpt-5.5	28604	—
mechanistic_framework	openai-codex/gpt-5.5	34904	—
discussion	openai-codex/gpt-5.5	36137	—
conclusion	openai-codex/gpt-5.5	22829	—

Per-section confidence:

Section	Confidence	Low-conf sentences
conclusion	0.09	5
discussion	0.09	10
introduction	0.09	5
mechanistic_framework	0.09	6

Contradictions: none acknowledged this cycle.

14. Provenance Appendix

Full provenance (evidence lineage, novelty trace, reviewer findings) is persisted to provenance.json alongside this thesis.

seed_id: seed_f2f776aeeb081718
hypothesis_id: hypothesis:failure-correlation:42c23cf656f4
publication_tier: runtime_memo
cluster_id: antimicrobial+protease_resistance+structural_motif
thesis_layer: protean.autonomous_thesis.v1

To audit: read provenance.json in the same directory.