Peptide candidates for protease-resistance assays distinguished by cleavage-rule score
Research Note · autonomous synthesis · 2026-06-01T16:31:05+00:00
Confidence: research_note (autonomous) · evidence 5↑ / 5↓ (2 trusted-tier) · strength 0.35 · uncertainty 0.41
Provenance: prose machine-synthesized by openai-codex/gpt-5.5; deterministic skeleton from seed seed_d3f302b70908da22.
Reading: unmarked sentences are supported by the cited evidence; [low-conf] marks sentences with no direct anchor. Per-section confidence appears beneath each prose heading; structured per-claim classifications live in metadata.json → section_confidence.
Scope note: most sentences in the LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) lack direct per-sentence evidence anchors. The per-section confidence gutter quantifies this; see §9 Limitations.

Abstract
Evidence clusters in this cycle target a prioritization gap among candidate peptides with similar annotation but uneven cleavage-rule scores. We propose joint assay ranking as a discriminator separating lower-degradation candidates from peptides predicted to remain protease-labile. Support is score-centered and indirect: it includes the cleavage metric for sequence GHQMQHHCDDSQPTDCWP and the resistant-peptide record from the 11S globulin legumin study. Contradicting evidence from Strategies for Improving Peptide Stability and Delivery and from oral-delivery barrier reviews constrains translation beyond controlled proteolysis. Confidence remains moderate-low; the §10 motif-resolved protease challenge should adjudicate whether PPGP, PGPP, PPPG, GPPG, and related motifs track measured degradation.
1. Introduction
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:4
We identify a prioritization gap: candidates with higher cleavage-rule scores should be assayed together to test whether the rule set predicts lower proteolytic degradation. The supporting evidence converges on a single mechanistic cluster, protease_resistance. Motif analysis recovered no discriminator beyond the proposed one. We frame the present synthesis as a candidate hypothesis awaiting the experimental program in §10.
2. Methods
This synthesis was produced by Protean's autonomous thesis layer on top of the local provenance graph. The procedure for this cycle was:
1. Evidence selection. 5 supporting and 5 contradicting records were drawn from the trusted-tier evidence pool. Of the supporting records, 2 carry tier TRUST_T2 or higher (peer-reviewed literature or replicated runtime measurements); the remaining 3 are TRUST_T1 (runtime-internal observations). All 5 contradicting records are listed as TRUST_T2 in §12.
2. Seed construction. A hypothesis seed (seed_d3f302b70908da22) was assembled by clustering the selected evidence on mechanistic + receptor + motif tags (cluster protease_resistance), then proposing a discriminator hypothesis that the cited evidence could constrain or falsify.
3. Prose generation. Section bodies (Introduction, Mechanistic Framework, Discussion, Conclusion) were drafted by an LLM provider chain (openai-codex/gpt-5.5 → ollama/deepseek-r1:latest). The chain falls back deterministically when every provider fails; the deterministic skeleton is preserved verbatim in provenance.json for replay. All other sections (Methods, Related Work, Evidence Synthesis, Peptide Motif Analysis, Hypothesis, Limitations, Future Experiments, References, Provenance Appendix) are deterministic.
4. Claim classification. Every sentence in the LLM-drafted prose was passed through Protean's epistemic classifier (pipelines/autonomous_thesis/epistemics.py), which labels sentences as OBSERVED, INFERRED, WEAKLY_SUPPORTED, SPECULATIVE, UNRESOLVED, or CONTRADICTORY based on language markers and reference anchors. The per-section confidence header reports the resulting class mix.
5. Gates before publication. The full draft was scored by an internal reviewer committee + novelty engine. Both gates returned publish for this synthesis; the verdicts are persisted in provenance.json. The published markdown is additionally scrubbed by pipelines/public_thesis_export._scrub_markdown to remove any residual absolute paths, file URIs, private paths, epistemic-label markers, and HTML script tags.
Publication tier for this cycle: research_note. Tier reflects evidence strength + reviewer verdict + novelty score; it does NOT reflect peer review.
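Step 4 describes labeling sentences from language markers and reference anchors. The following is a minimal sketch of that general idea only. The marker lists, the `ANCHOR` pattern, and the `classify` and `class_mix` helpers are all hypothetical; this note does not disclose the rules used by pipelines/autonomous_thesis/epistemics.py, and the sketch should not be read as that implementation.

```python
import re
from collections import Counter

# Hypothetical cue lists, invented for illustration; checked in order.
MARKERS = [
    ("CONTRADICTORY", ("contradict", "conflicts with", "in contrast")),
    ("SPECULATIVE", ("might", "could", "speculat")),
    ("WEAKLY_SUPPORTED", ("suggests", "indirect", "preliminary")),
    ("INFERRED", ("therefore", "implies", "consistent with")),
]

# Hypothetical notion of a "reference anchor": a source_id, DOI, or PMC id.
ANCHOR = re.compile(r"source_id:\S+|doi:\S+|PMC\d+")


def classify(sentence):
    """Return the first label whose cue appears; else fall back on anchors."""
    text = sentence.lower()
    for label, cues in MARKERS:
        if any(cue in text for cue in cues):
            return label
    # No language cue: an anchored sentence counts as OBSERVED here,
    # an unanchored one as UNRESOLVED.
    return "OBSERVED" if ANCHOR.search(sentence) else "UNRESOLVED"


def class_mix(sentences):
    """Tally labels across a section's sentences."""
    return Counter(classify(s) for s in sentences)


section = [
    "We propose joint assay ranking as a discriminator.",
    "cleavage_risk_score=0.844 (source_id:cycle-20260526T020837Z-02-001).",
    "Formulation could alter peptide access.",
]
print(class_mix(section))
```

Under these assumed rules, a sentence with neither a cue nor an anchor lands in UNRESOLVED, which is one way a section could end up with an all-unresolved class mix.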
3. Related Work
The following trusted-tier references inform this synthesis:
1. Cleavage metric for sequence GHQMQHHCDDSQPTDCWP · ranked_candidates · source_id:cycle-20260526T020837Z-02-001
2. Cleavage metric for sequence DCDQTNWPCGGQQHCDKA · ranked_candidates · source_id:cycle-20260526T020837Z-02-005
3. Cleavage metric for sequence HQMAQHCDDCDQFPTDCG · ranked_candidates · source_id:cycle-20260526T020837Z-02-002
4. Bowman–Birk Inhibitor Mutants of Soybean Generated by CRISPR-Cas9 Reveal Drastic Reductions in Trypsin and Chymotrypsin Inhibitor Activities · paperclip · source_id:PMC11171862
5. Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide in the α Subunit of the 11S Globulin Legumin from Common Bean ( Phaseolus v… · paperclip · source_id:PMC11228969
4. Mechanistic Framework
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:6
Evidence clusters converged on proline-rich cleavage-rule candidates, prioritizing joint assays where predicted scores track proteolytic degradation rather than single-sequence stability. PPGP couples to protease_resistance through constrained proline-rich runs that reduce backbone accessibility and disfavour protease binding geometry. Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide covers protease_resistance against pepsin and chymotrypsin within 11S globulin legumin. Cleavage metric for sequence GHQMQHHCDDSQPTDCWP and its companion metrics supply rule-score inputs but not degradation measurements under matched protease exposure. The framework does not yet account for formulation, permeability, or oral delivery constraints emphasized by Strategies for Improving Peptide Stability and Delivery, doi:10.3390/ph15101283. Impact of Peptide Structure on Colonic Stability and Tissue Permeability also limits direct extrapolation from cleavage rules to tissue-level persistence.
5. Evidence Synthesis
- [TRUST_T1] Cleavage metric for sequence GHQMQHHCDDSQPTDCWP: cleavage_risk_score=0.844; high_risk_sites=3. (source_id:cycle-20260526T020837Z-02-001)
- [TRUST_T1] Cleavage metric for sequence DCDQTNWPCGGQQHCDKA: cleavage_risk_score=0.800; high_risk_sites=4. (source_id:cycle-20260526T020837Z-02-005)
- [TRUST_T1] Cleavage metric for sequence HQMAQHCDDCDQFPTDCG: cleavage_risk_score=0.889; high_risk_sites=2. (source_id:cycle-20260526T020837Z-02-002)
- [TRUST_T2] Bowman–Birk Inhibitor Mutants of Soybean Generated by CRISPR-Cas9 Reveal Drastic Reductions in Trypsin and Chymotrypsin Inhibitor…: Bowman–Birk Inhibitor Mutants of Soybean Generated by CRISPR-Cas9 Reveal Drastic Reductions in Trypsin and Chymotrypsin Inhibitor Activities. Despite the high quality of soybean protein, raw soybeans and soybean meal cannot be directly included in animal feed mixtures due to the presence of Kunitz (KTi) and Bowman–Birk protease inhibitors (BBis), which reduce… (source_id:PMC11171862)
- [TRUST_T2] Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide in the α Subunit of the 11S Globulin Legumin…: Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide in the α Subunit of the 11S Globulin Legumin from Common Bean (Phaseolus vulgaris L.). The 11S globulin legumin typically accounts for approximately 3% of the total protein in common beans (Phaseolus vulgaris). It was previously reported that a legumin peptide of approximat… (source_id:PMC11228969)
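The three TRUST_T1 cleavage metrics above can be put in order for panel assembly. This is a minimal sketch, assuming the records are held as simple in-memory objects; the `CleavageMetric` class and the `rank_by_score` helper are hypothetical names, not part of Protean. The scores and site counts are the values listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleavageMetric:
    sequence: str
    cleavage_risk_score: float
    high_risk_sites: int


METRICS = [
    CleavageMetric("GHQMQHHCDDSQPTDCWP", 0.844, 3),
    CleavageMetric("DCDQTNWPCGGQQHCDKA", 0.800, 4),
    CleavageMetric("HQMAQHCDDCDQFPTDCG", 0.889, 2),
]


def rank_by_score(metrics):
    """Sort by descending score; ties are broken by fewer high-risk sites."""
    return sorted(
        metrics,
        key=lambda m: (-m.cleavage_risk_score, m.high_risk_sites),
    )


for rank, m in enumerate(rank_by_score(METRICS), start=1):
    print(rank, m.sequence, m.cleavage_risk_score, m.high_risk_sites)
```

In these three records the descending score order coincides with the ascending high_risk_sites order (2, 3, 4), so the two fields do not give independent rankings for this set.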
6. Peptide Motif Analysis
Recurring 4-mer motifs in associated candidates: PPGP, PGPP, PPPG, GPPG, PPGW, PGWP, GWPP, PCPP, GPPP, CPPG.
Candidate sequence visibility: full sequences are displayed directly for published candidate references; any unresolved legacy hash is labeled explicitly with its public provenance limitation.
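Recurring 4-mers of this kind can be found with a sliding-window scan. The sketch below is an assumption about how such a scan might look, not the procedure Protean used; `kmers` and `motif_hits` are hypothetical helpers, and the example sequence is invented for illustration.

```python
from collections import Counter

MOTIFS = ["PPGP", "PGPP", "PPPG", "GPPG", "PPGW",
          "PGWP", "GWPP", "PCPP", "GPPP", "CPPG"]


def kmers(sequence, k=4):
    """Return every overlapping k-mer in the sequence, left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def motif_hits(sequences, motifs=MOTIFS, k=4):
    """Count, per motif, how many sequences contain it at least once."""
    wanted = set(motifs)
    counts = Counter()
    for seq in sequences:
        counts.update(set(kmers(seq, k)) & wanted)
    return counts


# Hypothetical proline-rich sequence, invented for illustration only.
print(motif_hits(["GPPGPPGW"]))

# The three scored sequences from the Evidence Synthesis section.
SCORED = ["GHQMQHHCDDSQPTDCWP", "DCDQTNWPCGGQQHCDKA", "HQMAQHCDDCDQFPTDCG"]
print(motif_hits(SCORED))
```

Run against the three scored sequences, this scan returns no hits, which suggests the listed motifs derive from other associated candidates rather than from those three records.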
7. Hypothesis
Statement. Candidates with higher cleavage-rule scores should be assayed together to test whether the rule set predicts lower proteolytic degradation.
Type. cleavage. Engine confidence. 0.55. Aggregate uncertainty (this thesis). 0.41.
8. Discussion
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:11
Evidence clusters support a downstream triage rule if the §10 protease-challenge panel aligns with cleavage scores. We would prioritize candidates carrying PPGP, PGPP, PPPG, GPPG, PPGW, PGWP, GWPP, PCPP, GPPP, or CPPG motifs. Cleavage metric for sequence GHQMQHHCDDSQPTDCWP anchors one scored candidate class. Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide supports protease-resistance framing. Strategies for Improving Peptide Stability and Delivery constrains immediate translation from degradation ranking to delivery readiness.
Contradiction weighting would narrow the rule if the §10 degradation assays separate cleavage scores from measured proteolysis. Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery constrains candidate prioritization beyond enzyme exposure. Overcoming Oral Cavity Barriers constrains receptor-screen sequencing when formulation alters peptide access before CRISPR readout. On the Utility of Chemical Strategies constrains motif-family scoring because chemical modification can dominate intrinsic cleavage rules. Impact of Peptide Structure on Colonic Stability constrains proline-rich motif scoring by linking stability to structure and permeability context. With evidence_strength 0.35 and uncertainty_score 0.41, this remains a proposal limited to within-panel ranking until the §10 degradation data align.
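Whether measured degradation aligns with cleavage scores could be checked with a rank correlation. This is a minimal sketch, assuming a tie-free Spearman coefficient is an acceptable comparison statistic; the note does not specify one. The scores are those reported in §5, while the half-lives are hypothetical placeholders, not measurements.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation for tie-free samples of equal length."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


scores = [0.844, 0.800, 0.889]        # cleavage_risk_score values from §5
half_lives_min = [40.0, 25.0, 70.0]   # HYPOTHETICAL half-lives, illustration only

print(spearman_rho(scores, half_lives_min))
```

Under the hypothesis as stated, a higher score would pair with a longer half-life, giving a coefficient near +1. With only three candidates there are six possible orderings, so perfect agreement can arise by chance one time in six; a larger panel would be needed before the coefficient carries weight.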
9. Limitations
- Synthesis class. This paper is an autonomous proposal, not a peer-reviewed result. The LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) are constrained by the per-section confidence gates but are not yet adjudicated by human reviewers.
- Evidence scope. Conclusions are constrained to Protean's runtime provenance graph at the time of this cycle; sources not yet ingested are by construction absent from the synthesis.
- No wet-lab validation. Computational rankings are research prioritization, not biological proof. Acceptance of any specific claim requires the experiments outlined in §10.
- Low evidence strength. Aggregate evidence strength is 0.35 (max 1.0). Individual sentence-level confidence is reported per section; the claim graph behind those numbers lives in provenance.json.
- Unresolved contradictions. 5 contradicting reference(s) are acknowledged and have not been resolved within this cycle. Direct replication of those records is among the highest-value follow-ups.
10. Future Experiments
| Experiment | Hypothesis tested | Primary readout | Falsification criterion |
|---|---|---|---|
| Motif-resolved protease challenge | Candidates carrying PPGP, PGPP, PPPG, GPPG, PPGW, PGWP retain integrity longer than motif-stripped controls | LC-MS intact-peptide tracking over 0/30/120 min exposure to a standard protease cocktail | Motif-bearing and control candidates show indistinguishable degradation half-lives |
| Receptor-binding screen | Candidates engage at least one receptor from CRISPR with measurable affinity | Binding assay (BLI/SPR) titration; IC50 / Kd | All candidates show no binding above buffer background across the receptor set |
| Contradiction replication | The conflict identified in the contradicting reference(s) reproduces under Protean's standard assay conditions | Same primary readout as the original record; comparison statistic depends on the conflict class | Original contradictory result fails to reproduce; the synthesis claim survives unchallenged |
| Developability triage | Top candidates pass standard developability filters (solubility, aggregation, hERG, hepatotoxicity proxies) | Profile against the in-house developability filter panel | Candidates fail developability filters faster than Protean's baseline rate (>50%) |
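The falsification criterion for the motif-resolved protease challenge compares degradation half-lives. A minimal sketch of how a half-life might be estimated from intact-peptide fractions at the 0/30/120 min time points follows. It assumes first-order (single-exponential) decay, which the table does not state, and the `half_life_minutes` helper and the example fractions are hypothetical.

```python
import math


def half_life_minutes(times_min, fraction_intact):
    """Estimate half-life by least-squares fit of ln(fraction) against time.

    Assumes first-order decay, fraction(t) = exp(-k * t) up to a constant
    factor, so half-life = ln(2) / k. Returns math.inf if no decay is seen.
    """
    if len(times_min) != len(fraction_intact) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    if any(f <= 0 for f in fraction_intact):
        raise ValueError("fractions must be positive to take logarithms")
    logs = [math.log(f) for f in fraction_intact]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx  # fitted value of -k
    if slope >= 0:
        return math.inf
    return math.log(2) / -slope


# HYPOTHETICAL intact fractions constructed to decay with a 60 min half-life.
times = [0, 30, 120]
fractions = [1.0, 2 ** -0.5, 0.25]
print(half_life_minutes(times, fractions))
```

Three time points give a single residual degree of freedom, so a fit like this cannot by itself show whether the first-order assumption holds; replicate measurements or additional time points would be needed to compare motif-bearing and control half-lives statistically.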
11. Conclusion
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:4
We rank the hypothesis on 5 trusted reference(s) at aggregate uncertainty 0.41. We recommend the §10 experimental program as the next step. Contradicting records constrain the claim surface but do not retire it. At the present runtime confidence, this remains a proposal.
12. References
Supporting (trusted tier):
1. Cleavage metric for sequence GHQMQHHCDDSQPTDCWP · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-001
2. Cleavage metric for sequence DCDQTNWPCGGQQHCDKA · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-005
3. Cleavage metric for sequence HQMAQHCDDCDQFPTDCG · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-002
4. Bowman–Birk Inhibitor Mutants of Soybean Generated by CRISPR-Cas9 Reveal Drastic Reductions in Trypsin and Chymotrypsin Inhibitor Activities · [TRUST_T2] · source_id:PMC11171862
5. Identification and Characterization of a Pepsin- and Chymotrypsin-Resistant Peptide in the α Subunit of the 11S Globulin Legumin from Common Bean ( Phaseolus v… · [TRUST_T2] · source_id:PMC11228969
Contradicting:
1. Barriers and Strategies for Oral Peptide and Protein Therapeutics Delivery: Update on Clinical Advances · [TRUST_T2] · source_id:PMC12030352
2. Overcoming Oral Cavity Barriers for Peptide Delivery Using Advanced Pharmaceutical Techniques and Nano-Formulation Platforms · [TRUST_T2] · source_id:PMC12650023
3. On the Utility of Chemical Strategies to Improve Peptide Gut Stability · [TRUST_T2] · source_id:PMC9059125
4. Strategies for Improving Peptide Stability and Delivery · [TRUST_T2] · doi:10.3390/ph15101283 · https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9610364/
5. Impact of Peptide Structure on Colonic Stability and Tissue Permeability · [TRUST_T2] · source_id:PMC10384666
13. Computational Investigation
Runtime capability investigation. Before this synthesis was drafted, Protean queried Galen's bounded capability surface to enrich the seed with structural and prior-art context. The full investigation ledger is preserved in the private snapshot (investigation.json); this section reports the public-safe rollup.
- Wall-clock duration: 8 ms
- Capability calls: db.uniprot:motif_search: 3, pdb: 1
- Call statuses: ok: 1, skipped: 3
Motifs investigated against UniProt:
- PPGP → no family-level hits
- PGPP → no family-level hits
- PPPG → no family-level hits
PDB cross-references (0 resolved):
- No PDB IDs mentioned in supporting evidence.
Candidate-sequence QC distribution. No candidate sequences were resolvable for this seed.
Structural analog search. 0 Foldseek ticket(s) were submitted against AFDB50 + PDB100; results poll asynchronously and are appended in subsequent cycles.
Prior-failure motif overlap. The following seed motifs also appear in prior rejected/low-scoring candidates and warrant caution in §9 prioritization: CPPG, GWPP, PCPP, PGWP.
14. Provenance Appendix
Full provenance — evidence lineage, novelty trace, reviewer findings, per-section LLM call log, per-claim classifications — is persisted to provenance.json alongside this thesis.
- seed_id: seed_d3f302b70908da22
- hypothesis_id: hypothesis:cleavage:435e31fc53af
- publication_tier: research_note
- cluster_id: protease_resistance
- thesis_layer: protean.autonomous_thesis.v1
To audit: read provenance.json in the same directory.
