Distinguishing constrained peptide candidates by cyclization, disulfide, and high-proline handles
Research Note · autonomous synthesis · 2026-05-26T18:10:02+00:00
Confidence: research_note (autonomous) · evidence 5↑ / 5↓ (2 trusted-tier) · strength 0.35 · uncertainty 0.33
Provenance: prose machine-synthesized by openai-codex/gpt-5.5; deterministic skeleton from seed seed_4aad7c358947dfb6.
Reading: unmarked sentences are supported by the cited evidence; [low-conf] marks sentences with no direct anchor. Per-section confidence appears beneath each prose heading; structured per-claim classifications live in metadata.json → section_confidence.
Scope note: most sentences in the LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) lack direct per-sentence evidence anchors. The per-section confidence gutter quantifies this; see §9 Limitations.
Abstract
Evidence clusters from the synthesis pass point to a prioritization gap between constrained peptides and unconstrained linear candidates. We propose a structural discriminator for cyclization, disulfide, and high-proline handles that separates validation paths before ranking. Support is sparse but convergent and includes Structural metric for [redacted-seq:18aa:4158ed12] and Cyclic Peptide Nanotubes in Deep Eutectic Solvents. Two contradicting records, Quality by Design-Based Formulation Development of an Oral Semaglutide Tablet and Protease-Resistant Azapeptide GLP-1 Analogue, constrain structural-only triage. With evidence_strength 0.35 and uncertainty_score 0.33, we assign provisional confidence; the §10 experimental panel adjudicates escalation versus standard linear-peptide validation.
1. Introduction
conf 0.09 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: spec:1 | unr:4
We identify a prioritization gap in workflows that score constrained and unconstrained linear candidates on the same structural path. A sequence-divergence discriminator would separate candidates with distinct constraint logic, rather than recycling motif-recombined stability analogs as nearest comparators. The mechanistic locus is proline-rich runs, including PPGP and PGPP, within collagen-like or protease-resistant scaffolds; no receptor family is assigned. Support comes from Structural metric for [redacted-seq:18aa:4158ed12] and Cyclic Peptide Nanotubes in Deep Eutectic Solvents. The contradicting records listed in §12 constrain scope, so we propose a separate validation branch at engine confidence 0.73.
2. Methods
This synthesis was produced by Protean's autonomous thesis layer on top of the local provenance graph. The procedure for this cycle was:
1. Evidence selection. 5 supporting and 5 contradicting records were drawn from the trusted-tier evidence pool. Of the supporting records, 2 carry tier TRUST_T2 or higher (peer-reviewed literature or replicated runtime measurements); the remaining 3 are TRUST_T1 (runtime-internal observations). All 5 contradicting records are listed as TRUST_T2 in §12.
2. Seed construction. A hypothesis seed (seed_4aad7c358947dfb6) was assembled by clustering the selected evidence on mechanistic + receptor + motif tags (cluster structural_motif), then proposing a discriminator hypothesis that the cited evidence could constrain or falsify.
3. Prose generation. Section bodies (Introduction, Mechanistic Framework, Discussion, Conclusion) were drafted by an LLM provider chain (openai-codex/gpt-5.5 → ollama/deepseek-r1:latest). The chain falls back deterministically when every provider fails; the deterministic skeleton is preserved verbatim in provenance.json for replay. All other sections (Methods, Related Work, Evidence Synthesis, Peptide Motif Analysis, Hypothesis, Limitations, Future Experiments, References, Provenance Appendix) are deterministic.
4. Claim classification. Every sentence in the LLM-drafted prose was passed through Protean's epistemic classifier (pipelines/autonomous_thesis/epistemics.py), which labels sentences as OBSERVED, INFERRED, WEAKLY_SUPPORTED, SPECULATIVE, UNRESOLVED, or CONTRADICTORY based on language markers and reference anchors. The per-section confidence header reports the resulting class mix.
5. Gates before publication. The full draft was scored by an internal reviewer committee + novelty engine. Both gates returned publish for this synthesis; the verdicts are persisted in provenance.json. The published markdown is additionally scrubbed by pipelines/public_thesis_export._scrub_markdown to remove any residual absolute paths, file URIs, private paths, epistemic-label markers, and HTML script tags.
Publication tier for this cycle: research_note. Tier reflects evidence strength + reviewer verdict + novelty score; it does NOT reflect peer review.
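The scrub step in item 5 can be illustrated with a minimal sketch. This is not the code of pipelines/public_thesis_export._scrub_markdown, which is not shown in this note; the function name `scrub_markdown_sketch`, the regular expressions, the replacement tokens, and the bracketed label format (for example `[SPECULATIVE]`) are all assumptions made for illustration.

```python
import re

# All patterns below are illustrative assumptions, not Protean's actual rules.
_SCRIPT_TAG = re.compile(r"<script\b.*?</script\s*>", re.IGNORECASE | re.DOTALL)
_FILE_URI = re.compile(r"file://\S+")
# An absolute POSIX path: a leading "/" that does not continue a word, URL, or relative path.
_ABS_PATH = re.compile(r"(?<![\w/.:])/(?:[\w.-]+/)+[\w.-]+")
# Assumed marker format: the class names from step 4 wrapped in square brackets.
_LABELS = re.compile(
    r"\[(?:OBSERVED|INFERRED|WEAKLY_SUPPORTED|SPECULATIVE|UNRESOLVED|CONTRADICTORY)\]\s*"
)


def scrub_markdown_sketch(text: str) -> str:
    """Remove script tags, file URIs, absolute paths, and epistemic-label markers."""
    text = _SCRIPT_TAG.sub("", text)
    text = _FILE_URI.sub("[redacted-uri]", text)    # URIs first, so their paths are not re-matched
    text = _ABS_PATH.sub("[redacted-path]", text)
    text = _LABELS.sub("", text)
    return text


sample = "See /home/user/run/out.json and file:///tmp/x.md <script>alert(1)</script>[SPECULATIVE] ok"
cleaned = scrub_markdown_sketch(sample)
```

Relative paths such as pipelines/autonomous_thesis/epistemics.py and DOI-style identifiers are left untouched by this sketch, because the path pattern only fires on a leading slash that does not follow a word character, slash, dot, or colon.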
3. Related Work
The following trusted-tier references inform this synthesis:
1. Structural metric for [redacted-seq:18aa:4158ed12] · ranked_candidates · source_id:cycle-20260526T020837Z-02-001
2. Structural metric for [redacted-seq:18aa:f38d64c0] · ranked_candidates · source_id:cycle-20260526T020837Z-02-005
3. Structural metric for [redacted-seq:17aa:f1f03e5e] · ranked_candidates · source_id:cycle-20260526T020837Z-02-013
4. Cyclic Peptide Nanotubes in Deep Eutectic Solvents: Insights into Stability, Hydration, and Thermal Effects · crossref · source_id:doi:10.1021/acs.jpcb.5c02104.s001
5. NanoClick: A High Throughput, Target-Agnostic Peptide Cell Permeability Assay · crossref · source_id:doi:10.1021/acschembio.0c00804.s001
4. Mechanistic Framework
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:3
Proline-rich motifs like PPGP couple to structural_motif stability via constrained backbone flexibility. The cyclic peptide nanotube study illuminates this mechanism, detailing the role of such motifs in enhanced structural integrity. However, the framework does not yet account for protease resistance, as highlighted by the azapeptide GLP-1 analogue study, which suggests alternative stabilization mechanisms may exist.
5. Evidence Synthesis
- [TRUST_T1] Structural metric for [redacted-seq:18aa:4158ed12] · modifications=suggested: cyclization or N-methylation for top wet-lab picks; cysteine_count=2; proline_fraction=0.111. (source_id:cycle-20260526T020837Z-02-001)
- [TRUST_T1] Structural metric for [redacted-seq:18aa:f38d64c0] · modifications=suggested: cyclization or N-methylation for top wet-lab picks; cysteine_count=3; proline_fraction=0.056. (source_id:cycle-20260526T020837Z-02-005)
- [TRUST_T1] Structural metric for [redacted-seq:17aa:f1f03e5e] · modifications=suggested: cyclization or N-methylation for top wet-lab picks; cysteine_count=2; proline_fraction=0.118. (source_id:cycle-20260526T020837Z-02-013)
- [TRUST_T2] Cyclic Peptide Nanotubes in Deep Eutectic Solvents: Insights into Stability, Hydration, and Thermal Effects (source_id:doi:10.1021/acs.jpcb.5c02104.s001)
- [TRUST_T2] NanoClick: A High Throughput, Target-Agnostic Peptide Cell Permeability Assay (source_id:doi:10.1021/acschembio.0c00804.s001)
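The cysteine_count and proline_fraction fields above can be reproduced with a short sketch. It assumes proline_fraction is the number of P residues divided by sequence length, rounded to three decimals; the note does not define the metric, but the reported values are consistent with that reading (2/18 ≈ 0.111, 1/18 ≈ 0.056, 2/17 ≈ 0.118). The function name and the placeholder sequence are hypothetical; the redacted candidate sequences are not used.

```python
def structural_metrics(sequence: str) -> dict:
    """Compute simple composition metrics for a one-letter amino-acid sequence.

    Assumption: proline_fraction = count('P') / length, rounded to 3 decimals.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    return {
        "length": len(seq),
        "cysteine_count": seq.count("C"),
        "proline_fraction": round(seq.count("P") / len(seq), 3),
    }


# Placeholder 18-residue sequence with two C and two P residues;
# it is NOT one of the redacted candidates.
placeholder = "ACDEFGHIKLMNPQRSCP"
metrics = structural_metrics(placeholder)
```

For the placeholder, the sketch returns a length of 18, a cysteine_count of 2, and a proline_fraction of 0.111.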
6. Peptide Motif Analysis
Recurring 4-mer motifs in associated candidates: PPGP, PGPP, PPPG, GPPG, PPGW, PGWP, GWPP, PCPP, GPPP, CPPG.
0 candidate sequences are referenced by opaque ID — raw sequences remain in the private workspace by design (publication boundary). Operators can resolve the IDs locally via papers/candidates/.
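A minimal sketch of how recurring 4-mer motifs could be enumerated with a sliding window follows. The note does not state how "recurring" is defined; the sketch assumes a motif recurs when it appears in at least two candidates. The function name, the `min_sequences` parameter, and the two toy sequences are hypothetical.

```python
from collections import Counter


def recurring_kmers(sequences, k=4, min_sequences=2):
    """Return k-mers present in at least `min_sequences` distinct sequences, sorted."""
    seen_in = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set counts each k-mer once per sequence, however often it repeats inside it.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        seen_in.update(kmers)
    return sorted(kmer for kmer, n in seen_in.items() if n >= min_sequences)


# Toy sequences for illustration only; not drawn from the private workspace.
toy = ["APPGPPW", "GPPGPA"]
shared = recurring_kmers(toy)
```

With these toy inputs the only 4-mer shared by both sequences is PPGP.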
7. Hypothesis
Statement. Candidates with cyclization, disulfide, or high-proline constraint handles may need a separate structural validation path from unconstrained linear peptides.
Type. structural. Engine confidence. 0.73. Aggregate uncertainty (this thesis). 0.33.
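The routing implied by the hypothesis can be sketched as follows. This is an illustration under stated assumptions, not the discriminator used by the engine: the `Candidate` fields, the `cyclization_planned` flag, and the `HIGH_PROLINE_FRACTION` cutoff of 0.25 are hypothetical. The only chemistry assumed is that a disulfide bridge needs at least two cysteines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    cysteine_count: int
    proline_fraction: float
    cyclization_planned: bool = False  # hypothetical flag, e.g. set from a suggested modification


# Hypothetical cutoff; the note does not define "high-proline".
HIGH_PROLINE_FRACTION = 0.25


def constraint_handles(candidate: Candidate) -> list:
    """List the constraint handles a candidate carries under the assumptions above."""
    handles = []
    if candidate.cyclization_planned:
        handles.append("cyclization")
    if candidate.cysteine_count >= 2:  # two cysteines are the minimum for one disulfide
        handles.append("disulfide")
    if candidate.proline_fraction >= HIGH_PROLINE_FRACTION:
        handles.append("high_proline")
    return handles


def validation_path(candidate: Candidate) -> str:
    """Route to the constrained path if any handle is present, else the linear path."""
    return "constrained" if constraint_handles(candidate) else "linear"


example = Candidate("example-001", cysteine_count=2, proline_fraction=0.111)
```

Under this hypothetical cutoff, a candidate with cysteine_count=2 and proline_fraction=0.111 routes to the constrained path through the disulfide handle alone, not through proline content.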
8. Discussion
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:11
Evidence clusters link constraint handles with separable validation needs when proline-rich runs, cyclization, or disulfide logic dominate candidate structure. If the §10 experimental panel supports the hypothesis, prioritization shifts toward early structural triage before receptor-screen sequencing. Structural metric for [redacted-seq:18aa:4158ed12] supports scoring PPGP, PGPP, PPPG, and GPPG as motif-family liabilities rather than simple linear features. Cyclic Peptide Nanotubes in Deep Eutectic Solvents adds a stability context for constrained architectures. NanoClick supports placing permeability checks after constraint-aware structural validation.
Contradiction weighting narrows these consequences through formulation, transport, metabolism, and protease-resistance records. Quality by Design-Based Formulation Development of an Oral Semaglutide Tablet constrains deprioritizing constrained oral candidates; the §10 formulation-stress assay adjudicates. Improved brain penetration of neurotensin(8-13) constrains delaying receptor-screen sequencing when shuttle conjugation dominates exposure; the §10 permeability-routing assay adjudicates. Gap Analysis of Metabolic Conversions constrains motif-family scoring when substrate conversion drives liabilities; the §10 metabolite-mapping assay adjudicates. In-vitro Metabolite Identification for MEDI7219 and Protease-Resistant Azapeptide GLP-1 Analogue constrain assigning protease risk from structure alone; the §10 protease-challenge assay adjudicates. With evidence_strength 0.35 and uncertainty_score 0.33, this remains a proposal for routing candidates, not a general rule.
9. Limitations
- Synthesis class. This paper is an autonomous proposal, not a peer-reviewed result. The LLM-drafted sections (Introduction, Mechanistic Framework, Discussion, Conclusion) are constrained by the per-section confidence gates but are not yet adjudicated by human reviewers.
- Evidence scope. Conclusions are constrained to Protean's runtime provenance graph at the time of this cycle; sources not yet ingested are by construction absent from the synthesis.
- No wet-lab validation. Computational rankings are research prioritization, not biological proof. Acceptance of any specific claim requires the experiments outlined in §10.
- Low evidence strength. Aggregate evidence strength is 0.35 (max 1.0). Individual sentence-level confidence is reported per section; the claim graph behind those numbers lives in provenance.json.
- Unresolved contradictions. 5 contradicting references are acknowledged and have not been resolved within this cycle. Direct replication of those records is among the highest-value follow-ups.
10. Future Experiments
| Experiment | Hypothesis tested | Primary readout | Falsification criterion |
|---|---|---|---|
| Motif-resolved protease challenge | Candidates carrying PPGP, PGPP, PPPG, GPPG, PPGW, PGWP retain integrity longer than motif-stripped controls | LC-MS intact-peptide tracking over 0/30/120 min exposure to a standard protease cocktail | Motif-bearing and control candidates show indistinguishable degradation half-lives |
| Contradiction replication | The conflict identified in the contradicting reference(s) reproduces under Protean's standard assay conditions | Same primary readout as the original record; comparison statistic depends on the conflict class | Original contradictory result fails to reproduce; the synthesis claim survives unchallenged |
| Developability triage | Top candidates pass standard developability filters (solubility, aggregation, hERG, hepatotoxicity proxies) | Profile against the in-house developability filter panel | Candidates fail developability filters at a rate above Protean's baseline (>50%) |
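The falsification criterion for the protease challenge compares degradation half-lives. The table does not say how half-lives are estimated; the sketch below assumes first-order decay and fits ln(intact fraction) against time by ordinary least squares over the 0/30/120 min time points. The function name and the example fractions are hypothetical.

```python
import math


def half_life_minutes(times_min, intact_fraction):
    """Estimate a half-life from intact-peptide fractions, assuming first-order decay.

    Fits ln(fraction) = intercept + slope * t by least squares; half-life = ln(2) / -slope.
    Returns math.inf when no decay is observed (slope >= 0).
    """
    if len(times_min) != len(intact_fraction) or len(times_min) < 2:
        raise ValueError("need at least two matched time points")
    if any(f <= 0 for f in intact_fraction):
        raise ValueError("fractions must be positive to take logarithms")
    ys = [math.log(f) for f in intact_fraction]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    return math.inf if slope >= 0 else math.log(2) / -slope


# Made-up fractions that halve every 30 minutes; not measured data.
estimate = half_life_minutes([0, 30, 120], [1.0, 0.5, 0.0625])
```

With only three time points the fit is coarse, so comparing motif-bearing and control half-lives would still need replicates and an uncertainty estimate, which this sketch does not provide.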
11. Conclusion
conf 0.08 · evidence 5 sup / 5 con · trusted-tier 2 · class mix: unr:4
We rank the hypothesis on 5 supporting trusted-tier references at aggregate uncertainty 0.33. We recommend the §10 experimental program as the next step. The 5 contradicting records constrain the claim surface but do not retire it. At the present runtime confidence, this remains a proposal.
12. References
Supporting (trusted tier):
1. Structural metric for [redacted-seq:18aa:4158ed12] · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-001
2. Structural metric for [redacted-seq:18aa:f38d64c0] · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-005
3. Structural metric for [redacted-seq:17aa:f1f03e5e] · [TRUST_T1] · source_id:cycle-20260526T020837Z-02-013
4. Cyclic Peptide Nanotubes in Deep Eutectic Solvents: Insights into Stability, Hydration, and Thermal Effects · [TRUST_T2] · source_id:doi:10.1021/acs.jpcb.5c02104.s001
5. NanoClick: A High Throughput, Target-Agnostic Peptide Cell Permeability Assay · [TRUST_T2] · source_id:doi:10.1021/acschembio.0c00804.s001
Contradicting:
1. Quality by Design-Based Formulation Development of an Oral Semaglutide Tablet · [TRUST_T2] · source_id:42076092
2. Improved brain penetration of neurotensin(8-13) via blood-brain barrier shuttle conjugation underlies strong analgesia · [TRUST_T2] · source_id:42176569
3. Gap Analysis of Metabolic Conversions of Off-Flavors and Antinutrients in Plant-Based Substrates · [TRUST_T2] · source_id:PMC13039779
4. In-vitro Metabolite Identification for MEDI7219, an Oral GLP-1 Peptide, using LC-MS/MS with CID and EAD Fragmentation · [TRUST_T2] · source_id:bio_430605347e7d
5. Protease-Resistant Azapeptide GLP-1 Analogue Improves Metabolic Control in Diet-Induced Obesity · [TRUST_T2] · source_id:bio_4e476b486cc3
13. Computational Investigation
Runtime capability investigation. Before this synthesis was drafted, Protean queried Galen's bounded capability surface to enrich the seed with structural and prior-art context. The full investigation ledger is preserved in the private snapshot (investigation.json); this section reports the public-safe rollup.
- Wall-clock duration: 20 ms
- Capability calls: db.uniprot:motif_search: 3, pdb: 2
- Call statuses: ok: 2, skipped: 3
Motifs investigated against UniProt:
- PPGP → no family-level hits
- PGPP → no family-level hits
- PPPG → no family-level hits
PDB cross-references (0 resolved):
- No PDB IDs mentioned in supporting evidence.
Candidate-sequence QC distribution. No candidate sequences were resolvable for this seed.
Structural analog search. 0 Foldseek ticket(s) were submitted against AFDB50 + PDB100; results poll asynchronously and are appended in subsequent cycles.
Prior-failure motif overlap. The following seed motifs also appear in prior rejected/low-scoring candidates and warrant caution in §9 prioritization: CPPG, GWPP, PCPP, PGWP.
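The overlap check can be sketched as a simple membership split. The motif lists are copied from §6, §10, and the paragraph above; the function name is hypothetical and the sketch makes no claim about how Protean computes the overlap internally.

```python
# Motif lists copied from this note (§6, the §10 protease-challenge row, and the paragraph above).
SEED_MOTIFS = ["PPGP", "PGPP", "PPPG", "GPPG", "PPGW", "PGWP", "GWPP", "PCPP", "GPPP", "CPPG"]
PROTEASE_CHALLENGE_MOTIFS = ["PPGP", "PGPP", "PPPG", "GPPG", "PPGW", "PGWP"]
PRIOR_FAILURE_MOTIFS = {"CPPG", "GWPP", "PCPP", "PGWP"}


def split_by_prior_failure(motifs, flagged):
    """Split motifs into (caution, clear) while preserving the input order."""
    caution = [m for m in motifs if m in flagged]
    clear = [m for m in motifs if m not in flagged]
    return caution, clear


seed_caution, seed_clear = split_by_prior_failure(SEED_MOTIFS, PRIOR_FAILURE_MOTIFS)
challenge_caution, challenge_clear = split_by_prior_failure(
    PROTEASE_CHALLENGE_MOTIFS, PRIOR_FAILURE_MOTIFS
)
```

Applied to the motifs named in the §10 protease-challenge row, the split flags PGWP as the one motif that also appears in the prior-failure list.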
14. Provenance Appendix
Full provenance — evidence lineage, novelty trace, reviewer findings, per-section LLM call log, per-claim classifications — is persisted to provenance.json alongside this thesis.
- seed_id: seed_4aad7c358947dfb6
- hypothesis_id: hypothesis:structural:5a8ccb563ea2
- publication_tier: research_note
- cluster_id: structural_motif
- thesis_layer: protean.autonomous_thesis.v1
To audit: read provenance.json in the same directory.
