Protean
Research archive

Mechanistic thesis · thesis_3e705fe3bd850ff8 · published 2026-06-03 05:20 UTC · openai-codex/gpt-5.5

Validation risk subgrouping of peptide candidates by failure similarity

Superseded thesis

This thesis was collapsed into a representative thesis for the same concept during concept-level deduplication. It is retained for provenance.


Research Note · autonomous synthesis · 2026-05-30T16:31:26+00:00

Confidence: research_note (autonomous) · evidence 5↑ / 5↓ (4 trusted-tier) · strength 0.45 · uncertainty 0.40

Provenance: prose machine-synthesized by openai-codex/gpt-5.5; deterministic skeleton from seed seed_4f1c52f03bc9ba1a.

Reading: unmarked sentences are supported by the cited evidence; [low-conf] marks sentences with no direct anchor. Per-section confidence appears beneath each prose heading; structured per-claim classifications live in metadata.json under section_confidence.

Scope note: most sentences in the LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) lack direct per-sentence evidence anchors. The per-section confidence gutter quantifies this; see §9 Limitations.

Figure 1: snapshot of the evidence base + claim distribution + cluster anchors used in this synthesis. Panels (a) supporting and contradicting record counts by trust tier; (b) per-section confidence for the LLM-drafted sections; (c) total claim-class distribution across those sections; (d) the cluster anchors the seed was assembled from.

Abstract

Evidence clusters identify a prioritization gap in which apparent peptide rank can mask degradation-like behavior among candidates near known failure signals. We propose a nearest-failure-signal discriminator that separates this subgroup from rank-adjacent candidates lacking failure correlation. Supporting evidence is indirect: oral delivery and stability reviews, including “Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery”, frame degradation as assay-relevant. The “Failure Correlation metric for sequence VLPTQCGCTLPGWHQ” record supplies candidate-local support without resolving mechanism. Contradicting evidence, including “Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide” and “Proteolytic stabilization of a spider venom peptide”, shows that proximity to degradation signals is not a universal liability. At an engine confidence of 0.58 and an aggregate uncertainty of 0.40, the §10 panel should adjudicate whether subgroup assays alter prioritization.

1. Introduction

conf 0.08 · evidence 5 sup / 5 con · trusted-tier 4 · class mix: unr:4

We identify a prioritization gap: candidates nearest to known failure signals should be assayed as a separate subgroup so that apparent rank does not hide degradation-like behavior. Our supporting evidence converges on a mechanistic surface that covers aging_pathways, antimicrobial, and structural_motif. Motif analysis recovered no discriminator beyond the proposed one. We frame the present synthesis as a candidate hypothesis awaiting the experimental program in §10.

2. Methods

This synthesis was produced by Protean's autonomous thesis layer on top of the local provenance graph. The procedure for this cycle was:

1. Evidence selection. 5 supporting and 5 contradicting records were drawn from the trusted-tier evidence pool. Of the supporting records, 4 carry tier TRUST_T2 or higher (peer-reviewed literature or replicated runtime measurements); the remaining one is TRUST_T1 (a runtime-internal observation).

2. Seed construction. A hypothesis seed (seed_4f1c52f03bc9ba1a) was assembled by clustering the selected evidence on mechanistic + receptor + motif tags (cluster aging_pathways+antimicrobial+structural_motif), then proposing a discriminator hypothesis that the cited evidence could constrain or falsify.

3. Prose generation. Section bodies (Introduction, Mechanistic Framework, Discussion, Conclusion) were drafted by an LLM provider chain (openai-codex/gpt-5.5 → ollama/deepseek-r1:latest). The chain falls back deterministically when every provider fails; the deterministic skeleton is preserved verbatim in provenance.json for replay. All other sections (Methods, Related Work, Evidence Synthesis, Peptide Motif Analysis, Hypothesis, Limitations, Future Experiments, References, Provenance Appendix) are deterministic.

4. Claim classification. Every sentence in the LLM-drafted prose was passed through Protean's epistemic classifier (pipelines/autonomous_thesis/epistemics.py), which labels sentences as OBSERVED, INFERRED, WEAKLY_SUPPORTED, SPECULATIVE, UNRESOLVED, or CONTRADICTORY based on language markers and reference anchors. The per-section confidence header reports the resulting class mix.

5. Gates before publication. The full draft was scored by an internal reviewer committee + novelty engine. Both gates returned publish for this synthesis; the verdicts are persisted in provenance.json. The published markdown is additionally scrubbed by pipelines/public_thesis_export._scrub_markdown to remove any residual absolute paths, file URIs, private paths, epistemic-label markers, and HTML script tags.
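
The claim-classification step (item 4) can be illustrated with a minimal sketch. The marker lists, the anchor pattern, and the names classify_sentence and class_mix are hypothetical; the actual rules in pipelines/autonomous_thesis/epistemics.py are not reproduced in this note, and the sketch emits only a subset of the six labels.

```python
import re

# Hypothetical surface cues; the real classifier's rules are not shown in this note.
SPECULATIVE_MARKERS = ("may", "might", "could", "possibly")
CONTRADICTION_MARKERS = ("contradicts", "conflicts with", "in contrast")
# Assumed anchor shapes, modeled on the identifiers used in the reference lists.
ANCHOR_PATTERN = re.compile(r"(source_id:\S+|doi:\S+|PMC\d+)")


def classify_sentence(sentence: str) -> str:
    """Assign one coarse epistemic label from language markers and anchors."""
    lowered = sentence.lower()
    has_anchor = bool(ANCHOR_PATTERN.search(sentence))
    if any(marker in lowered for marker in CONTRADICTION_MARKERS):
        return "CONTRADICTORY"
    hedged = any(
        re.search(rf"\b{re.escape(marker)}\b", lowered)
        for marker in SPECULATIVE_MARKERS
    )
    if hedged:
        # A hedged sentence with an anchor is treated as weakly supported.
        return "WEAKLY_SUPPORTED" if has_anchor else "SPECULATIVE"
    if has_anchor:
        return "OBSERVED"
    # No anchor and no recognizable marker: nothing to resolve the claim against.
    return "UNRESOLVED"


def class_mix(sentences):
    """Count labels across a section, as in the per-section confidence header."""
    counts = {}
    for sentence in sentences:
        label = classify_sentence(sentence)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Under these assumed rules, a section whose sentences carry no anchors and no hedging markers is labeled entirely UNRESOLVED, which is the shape of the "unr:4" class mix reported for the Introduction.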

Publication tier for this cycle: research_note. Tier reflects evidence strength + reviewer verdict + novelty score; it does NOT reflect peer review.

3. Related Work

The following trusted-tier references inform this synthesis:

1. Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery: Update on Clinical Advances · paperclip · source_id:PMC12030352
2. Overcoming Oral Cavity Barriers for Peptide Delivery Using Advanced Pharmaceutical Techniques and Nano-Formulation Platforms · paperclip · source_id:PMC12650023
3. On the Utility of Chemical Strategies to Improve Peptide Gut Stability · paperclip · source_id:PMC9059125
4. Strategies for Improving Peptide Stability and Delivery · paperclip 2022 · doi:10.3390/ph15101283
5. Failure Correlation metric for sequence VLPTQCGCTLPGWHQ · ranked_candidates · source_id:cycle-20260526T020837Z-02-011

4. Mechanistic Framework

conf 0.08 · evidence 5 sup / 5 con · trusted-tier 4 · class mix: unr:6

Evidence clusters converged on a failure-correlation rule that treats proline-rich candidates near degradation-like signals as a subgroup before potency ranking. PPGP couples to structural_motif through repeated proline-glycine turns, which can constrain backbone exposure while preserving protease-recognition ambiguity. “Strategies for Improving Peptide Stability and Delivery” covers gut stability and delivery mechanisms that can separate intrinsic peptide liability from formulation-mediated rescue. The “Failure Correlation metric for sequence VLPTQCGCTLPGWHQ” record supplies the nearest-signal axis, but its candidate-level mechanism remains sequence-local. The framework does not yet account for the pepsin- and chymotrypsin-resistant exceptions described in “Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide”. “Proteolytic stabilization of a spider venom peptide results in an orally active bioinsecticide” constrains any rule equating protease proximity with functional loss.

5. Evidence Synthesis

  • [TRUST_T2] Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery: Update on Clinical Advances · Peptide and protein (PP) therapeutics are highly specific and potent biomolecules that treat chronic and complex diseases. However, their oral delivery is significantly hindered by enzymatic degradation, instability, and poor permeability through the gastr… (source_id:PMC12030352)
  • [TRUST_T2] Overcoming Oral Cavity Barriers for Peptide Delivery Using Advanced Pharmaceutical Techniques and Nano-Formulation Platforms · Therapeutic peptides have gained significant attention due to their high specificity, potency, and safety profiles in treating various diseases. However, their clinical application via the oral route remains challenging. Peptides are i… (source_id:PMC12650023)
  • [TRUST_T2] On the Utility of Chemical Strategies to Improve Peptide Gut Stability · Inherent susceptibility of peptides to enzymatic degradation in the gastrointestinal tract is a key bottleneck in oral peptide drug development. Here, we present a systematic analysis of (i) the gut stability of disulfide-rich peptide scaffolds, orally administered peptide therapeutics, a… (source_id:PMC9059125)
  • [TRUST_T2] Strategies for Improving Peptide Stability and Delivery · Peptides play an important role in many fields, including immunology, medical diagnostics, and drug discovery, due to their high specificity and positive safety profile. However, for their delivery as active pharmaceutical ingredients, delivery vectors, or diagnostic imaging molecules, they suffer from two serious shortcomings: their poor metabolic stabilit… (doi:10.3390/ph15101283)
  • [TRUST_T1] Failure Correlation metric for sequence VLPTQCGCTLPGWHQ · failure_similarity_score=0.962; notes=0.9624 similarity against 4 failure examples (source_id:cycle-20260526T020837Z-02-011)

6. Peptide Motif Analysis

Recurring 4-mer motifs in associated candidates: PPGP, PGPP, PPPG, GPPG, PPGW, PGWP, GWPP, PCPP, GPPP, CPPG.
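
One way such a list of recurring 4-mers could be derived is sketched below. The function name recurring_kmers, the min_sequences cutoff, and the two example sequences are assumptions for illustration only; the note does not state how Protean counts motifs or which candidates were scanned.

```python
from collections import Counter


def recurring_kmers(sequences, k=4, min_sequences=2):
    """Return k-mers that occur in at least `min_sequences` distinct sequences."""
    seen = Counter()
    for seq in sequences:
        # A set ensures each sequence contributes at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        seen.update(kmers)
    return {kmer: n for kmer, n in seen.items() if n >= min_sequences}


# Hypothetical proline-rich fragments, not candidates from this thesis.
example = ["APPGPPGW", "PPGPA"]
print(recurring_kmers(example))  # {'PPGP': 2}
```

Counting per sequence rather than per occurrence keeps a single repeat-heavy candidate from dominating the motif list; whether the original analysis makes the same choice is not stated.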

Candidate sequence visibility: full sequences are displayed directly for published candidate references; any unresolved legacy hash is labeled explicitly with its public provenance limitation.

7. Hypothesis

Statement. Candidates nearest to known failure signals should be assayed as a separate subgroup so apparent rank does not hide degradation-like behavior.

Type. failure-correlation. Engine confidence. 0.58. Aggregate uncertainty (this thesis). 0.40.
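
The subgrouping in the statement can be sketched as a simple split on failure similarity. The Candidate type, the 0.9 threshold, the ranks, and the two placeholder sequences are assumptions; only the sequence VLPTQCGCTLPGWHQ and its failure_similarity_score of 0.962 come from the evidence records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sequence: str
    rank: int                  # position in the apparent ranking (1 = best)
    failure_similarity: float  # similarity to known failure examples, 0..1


def split_by_failure_similarity(candidates, threshold=0.9):
    """Split candidates into (near_failure, rest), each kept in rank order."""
    near_failure, rest = [], []
    for cand in sorted(candidates, key=lambda c: c.rank):
        if cand.failure_similarity >= threshold:
            near_failure.append(cand)
        else:
            rest.append(cand)
    return near_failure, rest


candidates = [
    Candidate("VLPTQCGCTLPGWHQ", rank=2, failure_similarity=0.962),  # rank is hypothetical
    Candidate("AAAA", rank=1, failure_similarity=0.31),              # placeholder
    Candidate("GGGG", rank=3, failure_similarity=0.12),              # placeholder
]
near_failure, rest = split_by_failure_similarity(candidates)
```

Here the rank-2 candidate is routed to its own assay queue even though it sits between two candidates with low failure similarity, which is the situation the hypothesis is meant to expose.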

8. Discussion

conf 0.08 · evidence 5 sup / 5 con · trusted-tier 4 · class mix: unr:10

Evidence clusters around oral peptide delivery and stability predict a practical reranking if the §10 panel supports subgrouping near failure signals. Candidates carrying PPGP, PGPP, PPPG, or GPPG would shift into a degradation-risk queue before receptor-screen sequencing. “Strategies for Improving Peptide Stability and Delivery” and “On the Utility of Chemical Strategies to Improve Peptide Gut Stability” support prioritizing stability gates. “Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide” constrains any simple proline-rich liability rule.

Contradiction weighting is assigned to mechanisms that would narrow the reranking rule. Each contradiction maps to an adjudicating assay; these assays are named here but are not yet specified in the §10 table. “Gut hormone stimulation as a therapeutic approach in oral peptide delivery” constrains receptor-screen delays; a receptor-timing assay would adjudicate that tradeoff. “cyclicpeptide” constrains motif-family scoring through cyclization; a cyclic-stability comparator would adjudicate rescue. “Protease production by Serratia liquefaciens” constrains degradation-like behavior as substrate generation; a microbial-protease assay would adjudicate directionality. “Proteolytic stabilization of a spider venom peptide” constrains failure-near penalties; a proteolytic-stabilization rescue assay would adjudicate falsification. With evidence_strength 0.45 and uncertainty_score 0.40, this remains a proposal for subgroup triage, not a general peptide ranking rule.

9. Limitations

  • Synthesis class. This paper is an autonomous proposal, not a peer-reviewed result. The LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) are constrained by the per-section confidence gates but are not yet adjudicated by human reviewers.
  • Evidence scope. Conclusions are constrained to Protean's runtime provenance graph at the time of this cycle; sources not yet ingested are by construction absent from the synthesis.
  • No wet-lab validation. Computational rankings are research prioritization, not biological proof. Acceptance of any specific claim requires the experiments outlined in §10.
  • Low evidence strength. Aggregate evidence strength is 0.45 (max 1.0). Individual sentence-level confidence is reported per section; the claim graph behind those numbers lives in provenance.json.
  • Unresolved contradictions. 5 contradicting reference(s) are acknowledged and have not been resolved within this cycle. Direct replication of those records is among the highest-value follow-ups.

10. Future Experiments

| Experiment | Hypothesis tested | Primary readout | Falsification criterion |
| --- | --- | --- | --- |
| Motif-resolved protease challenge | Candidates carrying PPGP, PGPP, PPPG, GPPG, PPGW, PGWP retain integrity longer than motif-stripped controls | LC-MS intact-peptide tracking over 0/30/120 min exposure to a standard protease cocktail | Motif-bearing and control candidates show indistinguishable degradation half-lives |
| Contradiction replication | The conflict identified in the contradicting reference(s) reproduces under Protean's standard assay conditions | Same primary readout as the original record; comparison statistic depends on the conflict class | Original contradictory result fails to reproduce; the synthesis claim survives unchallenged |
| Developability triage | Top candidates pass standard developability filters (solubility, aggregation, hERG, hepatotoxicity proxies) | Profile against the in-house developability filter panel | Candidates fail developability filters faster than Protean's baseline rate (>50%) |
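
The falsification criterion for the protease challenge compares degradation half-lives. A minimal sketch of estimating a half-life from the 0/30/120 min readout follows, assuming first-order decay and strictly positive intact fractions; the function name and the example fractions are hypothetical, and the note does not specify the kinetic model Protean would use.

```python
import math


def half_life_minutes(times_min, fraction_intact):
    """Fit ln(fraction) = intercept - k * t by least squares; return ln(2) / k.

    Assumes first-order decay and fractions strictly greater than zero.
    Returns math.inf when the fitted slope shows no decay.
    """
    logs = [math.log(f) for f in fraction_intact]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope >= 0:
        return math.inf
    return math.log(2) / -slope


# Hypothetical LC-MS intact fractions at the three time points in the table.
t_half = half_life_minutes([0, 30, 120], [1.0, 0.5, 0.0625])
```

With only three time points the fit has little redundancy, so a comparison between motif-bearing and motif-stripped candidates would need replicate runs before two half-lives could be called indistinguishable.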

11. Conclusion

conf 0.08 · evidence 5 sup / 5 con · trusted-tier 4 · class mix: unr:4

We rank the hypothesis on 5 trusted reference(s) at aggregate uncertainty 0.40. We recommend the §10 experimental program as the next step. Contradicting records constrain the claim surface but do not retire it. At the present runtime confidence, this remains a proposal.

12. References

Supporting (trusted tier):

1. Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery: Update on Clinical Advances · [TRUST_T2] · source_id:PMC12030352
2. Overcoming Oral Cavity Barriers for Peptide Delivery Using Advanced Pharmaceutical Techniques and Nano-Formulation Platforms · [TRUST_T2] · source_id:PMC12650023
3. On the Utility of Chemical Strategies to Improve Peptide Gut Stability · [TRUST_T2] · source_id:PMC9059125
4. Strategies for Improving Peptide Stability and Delivery · [TRUST_T2] · doi:10.3390/ph15101283 · https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9610364/
5. Failure Correlation metric for sequence VLPTQCGCTLPGWHQ · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-011

Contradicting:

1. Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide in the α Subunit of the 11S Globulin Legumin from Common Bean ( Phaseolus v… · [TRUST_T2] · source_id:PMC11228969
2. Gut hormone stimulation as a therapeutic approach in oral peptide delivery · [TRUST_T2] · source_id:PMC11413617
3. cyclicpeptide : a Python package for cyclic peptide drug design · [TRUST_T2] · source_id:PMC11713021
4. Protease production by Serratia liquefaciens NRC1 using fish gut waste as a sustainable approach to antimicrobial peptide generation and combating Candida auri… · [TRUST_T2] · source_id:PMC12220321
5. Proteolytic stabilization of a spider venom peptide results in an orally active bioinsecticide · [TRUST_T2] · source_id:PMC12441774

13. Computational Investigation

Runtime capability investigation. Before this synthesis was drafted, Protean queried Galen's bounded capability surface to enrich the seed with structural and prior-art context. The full investigation ledger is preserved in the private snapshot (investigation.json); this section reports the public-safe rollup.

  • Wall-clock duration: 8 ms
  • Capability calls: db.uniprot:motif_search: 3, pdb: 1
  • Call statuses: ok: 1, skipped: 3

Motifs investigated against UniProt:

  • PPGP: no family-level hits
  • PGPP: no family-level hits
  • PPPG: no family-level hits

PDB cross-references (0 resolved):

  • No PDB IDs mentioned in supporting evidence.

Candidate-sequence QC distribution. No candidate sequences were resolvable for this seed.

Structural analog search. 0 Foldseek ticket(s) were submitted against AFDB50 + PDB100; results poll asynchronously and are appended in subsequent cycles.

Prior-failure motif overlap. The following seed motifs also appear in prior rejected/low-scoring candidates and warrant caution in §9 prioritization: CPPG, GWPP, PCPP, PGWP.

14. Provenance Appendix

Full provenance — evidence lineage, novelty trace, reviewer findings, per-section LLM call log, per-claim classifications — is persisted to provenance.json alongside this thesis.

  • seed_id: seed_4f1c52f03bc9ba1a
  • hypothesis_id: hypothesis:failure-correlation:018924c304ce
  • publication_tier: research_note
  • cluster_id: aging_pathways+antimicrobial+structural_motif
  • thesis_layer: protean.autonomous_thesis.v1

To audit: read provenance.json in the same directory.

Confidence breakdown

  • evidence: 0.45
  • certainty: 0.60
  • novelty: 0.77

Derived from evidence / certainty / novelty signals.

Contradictions

5 contradicting evidence records were surfaced during review. The notes are summarized in the thesis body above; contradictions are retained as scientific signal, not discarded.

Citation

How to cite.

@misc{protean_thesis_thesis_3e705fe3bd850ff8,
  title  = {Validation risk subgrouping of peptide candidates by failure similarity},
  author = {Protean Labs — Mechanistic Thesis Layer},
  year   = {2026},
  url    = {https://www.protean.sh/papers/thesis_3e705fe3bd850ff8},
  note   = {Mechanistic hypothesis proposal — not peer-reviewed.
            Computational rankings are research prioritization, not biological proof.}
}

Computational rankings are research prioritization, not biological proof. Wet-lab review remains authoritative.