Galen · Continuous scientific cognition
Galen is the scientific cognition layer of Protean — a persistent biological reasoning runtime that ingests evidence, evaluates proteins, ranks binders, and prepares wet-lab handoff in continuous, review-gated cycles. Operated by Protean Labs, anchored on Base mainnet.
01—Why Galen
Galen of Pergamon (129–216 CE) systematised medical observation into a coherent reasoning framework — anatomy, physiology, pharmacology, the recording of evidence. Two millennia later, biology is large enough to require the same discipline at machine scale. Our runtime takes the name as a small commitment: that computational biology is *biology first,* and reasoning over it must be systematic, observed, and recorded.
Galen is not a chatbot, an assistant, a copilot, or an LLM wrapper. It is the scientific cognition layer of a peptide discovery runtime that operates continuously and writes every cycle into a sealed, replayable record.
02—Cognition loops
Galen runs fifteen autonomous cron schedules covering biological evaluation, structural comparison, protein QC, design-handoff bundling, and evidence reconciliation. Four are named publicly below; the rest stay operator-internal until their public surface has been reviewed.
galen-esm2-pll-scoring-sweep
On-device ESM-2 pseudo-likelihood scoring across the active candidate population.
Active · every 6h
Local model execution
galen-protein-qc-rollup
Protein QC pipeline rollup — composition, hydrophobic moment, AGGRESCAN, TOP-IDP, Boman, Shannon entropy.
Active · every 4h
Quality control
galen-foldseek-similarity-watch
Foldseek similarity sweep against the structural reference set; surfaces new structural neighbours for review.
Active · daily
Structural integration
galen-design-handoff-bundler
Bundles review-ready candidates and their lineage records into a draft wet-lab handoff packet.
Active · every 6h
Handoff infrastructure
+ 11 additional cognition loops · operator-internal
Operational state · verified 2026-05-23 · not live telemetry
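As an illustration of the kind of deterministic descriptor the QC rollup reports, the sketch below computes residue composition and Shannon entropy for a single sequence. It is a minimal example under stated assumptions, not Galen's pipeline: the function names are hypothetical, entropy is taken in bits over the observed residue distribution, and the other descriptors (hydrophobic moment, AGGRESCAN, TOP-IDP, Boman) are not shown.

```python
import math
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Fraction of the sequence made up by each residue that occurs in it."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {residue: n / len(seq) for residue, n in counts.items()}


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue distribution.

    Low values indicate a low-complexity sequence (few distinct residues);
    the maximum for the 20 standard amino acids is log2(20), about 4.32 bits.
    """
    return -sum(p * math.log2(p) for p in composition(seq).values())


if __name__ == "__main__":
    for peptide in ("AAAA", "ACDE", "GIGKFLHSAKKFGKAFVGEIMNS"):
        print(peptide, round(shannon_entropy(peptide), 3))
```

A homopolymer such as AAAA scores zero, while a sequence with four equally frequent residues scores two bits.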
03—Capabilities
Capabilities are the verbs the cognition runtime can perform. Today's surface spans four categories: live biological database integrations, on-device protein evaluation, the discovery runtime proper, and the orchestration layer that schedules and sequences them.
Biological databases
Live, read-only integrations with structural and sequence repositories. Read paths, never write paths.
PDB
structural reference and motif lookups
UniProt
sequence annotations and provenance
Foldseek
structural similarity sweeps over the candidate set
Local model execution
On-device protein evaluation. No external API is used for sequence inference; if local model inference fails, evaluation falls back to deterministic descriptors.
ESM-2 scoring
facebook/esm2_t12_35M_UR50D pseudo-likelihood + embeddings
Protein QC
composition, hydrophobic moment, AGGRESCAN, TOP-IDP, Boman, Shannon entropy
Binder ranking
multi-axis prioritisation against the canonical scoring surface
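Pseudo-likelihood scoring with a masked language model is usually defined as masking each position in turn and summing the log-probability the model assigns to the true residue. The sketch below shows that definition in a model-agnostic form. The scorer interface, the mask symbol, and the uniform toy model are assumptions made for illustration; in practice the scorer would wrap a masked language model such as the ESM-2 checkpoint listed above, and whether Galen length-normalises the score is not stated here.

```python
import math
from typing import Callable, Mapping

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "?"  # placeholder mask symbol for this sketch (hypothetical)

# Hypothetical interface: given a sequence with one position masked and the
# index of that position, return log-probabilities over residues there.
MaskedScorer = Callable[[list[str], int], Mapping[str, float]]


def pseudo_log_likelihood(seq: str, scorer: MaskedScorer) -> float:
    """Sum over positions of log p(true residue | all other residues)."""
    total = 0.0
    for i, residue in enumerate(seq):
        masked = list(seq)
        masked[i] = MASK  # hide the residue being scored
        total += scorer(masked, i)[residue]
    return total


def uniform_scorer(masked: list[str], position: int) -> Mapping[str, float]:
    """Toy stand-in for a real model: every residue is equally likely."""
    log_p = math.log(1.0 / len(AMINO_ACIDS))
    return {aa: log_p for aa in AMINO_ACIDS}


if __name__ == "__main__":
    print(pseudo_log_likelihood("ACDK", uniform_scorer))
```

Under the uniform toy model every sequence of the same length scores identically; a trained model separates plausible from implausible sequences because its per-position probabilities depend on context.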
Discovery runtime
The scientific cycle itself — generation through review-ready handoff, bounded at every step.
Candidate extraction
from literature, prior cycles, failure motifs, patent context
Scoring
seven canonical weights, normalised at write time
Validation
deterministic gates: residue validity, motif burden, cleavage exposure
Rejection learning
failure-similarity scoring fed by the contradiction graph
Assay preparation
review-gated drafts; no auto-submit
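The scoring and validation steps above can be sketched together. This is a minimal illustration under assumptions: the seven axis names are placeholders (the canonical weight names are not given here), "normalised at write time" is read as rescaling the weights to sum to one before they are stored, and only the residue-validity gate is shown, not the motif-burden or cleavage-exposure gates.

```python
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWY")  # 20 standard amino acids


def normalise_weights(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {axis: w / total for axis, w in weights.items()}


def passes_residue_gate(seq: str) -> bool:
    """Deterministic gate: non-empty and only standard residues."""
    return bool(seq) and set(seq.upper()) <= VALID_RESIDUES


def weighted_score(features: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-axis feature values; every axis must be present."""
    return sum(weights[axis] * features[axis] for axis in weights)


if __name__ == "__main__":
    # Placeholder axis names, for illustration only.
    raw = {f"axis_{i}": 1.0 for i in range(1, 8)}
    raw["axis_1"] = 3.0
    weights = normalise_weights(raw)
    candidate = "ACDKLLGF"
    if passes_residue_gate(candidate):
        features = {axis: 0.5 for axis in weights}
        print(weighted_score(features, weights))
```

Running the gate before scoring keeps invalid sequences out of the ranking entirely, rather than letting them receive a low score.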
Orchestration
Bounded scheduling, provenance envelopes, and design-handoff infrastructure across the runtime.
Cron cognition loops
15 autonomous schedules, hourly to daily cadence
Provenance envelopes
every output carries a content hash and a lineage record
Design handoffs
candidate bundles prepared for human review and wet-lab handoff
Bounded runtime signaling
health, capability, and cycle state composed by Galen
50 capabilities surfaced · 38 full execution · 12 partial / target · verified 2026-05-23
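A provenance envelope of the kind described under Orchestration, an output carrying a content hash and a lineage record, can be sketched as follows. The field names, the use of SHA-256, and the canonical-JSON serialisation are assumptions chosen for illustration; the actual envelope format is not specified here.

```python
import hashlib
import json


def content_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_envelope(payload: dict, parent_hashes: list[str]) -> dict:
    """Wrap an output with its content hash and the hashes it was derived from."""
    return {
        "payload": payload,
        "content_hash": content_hash(payload),
        "lineage": {"parents": list(parent_hashes)},
    }


def verify_envelope(envelope: dict) -> bool:
    """Recompute the hash; False means the payload changed after sealing."""
    return envelope["content_hash"] == content_hash(envelope["payload"])


if __name__ == "__main__":
    evidence = make_envelope({"source": "example", "note": "toy record"}, [])
    ranking = make_envelope(
        {"candidate": "ACDKLLGF", "rank": 1},
        parent_hashes=[evidence["content_hash"]],
    )
    print(verify_envelope(ranking))
```

Serialising with sorted keys makes the hash independent of dictionary ordering, so two outputs with the same content always share a hash.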
04—Runtime topology
The continuous flow Galen orchestrates. Structure generation via RFdiffusion is the principal Phase I capability; every stage on either side of it already runs today.
FIG · 01 · Galen runtime topology
Research ingestion
PDB · UniProt · literature · prior cycles · failure motifs.
Galen cognition runtime
Evidence → constraints → proposal → validation → ranking, on a continuous cadence.
RFdiffusion / Folding
Local structure generation and folding — Phase I expansion capability.
Candidate ranking
Binder QC, scoring against canonical weights, claim QA, rationale assembly.
Wet-lab handoff
Review-gated draft packets; redacted provider packets prepared for human dispatch.
05—Phase I Runtime Expansion
Phase I expands the cognition runtime from simulation-scale orchestration into continuous local structure generation and the first wet-lab handoff batches. The first $30,000 in bankr fees is allocated against the targets below — capability earned by the system's own throughput.
Phase I · target allocation · bankr fees
$30,000
Three allocations across local structure generation, continuous cognition infrastructure, and the first review-gated wet-lab handoff batches.
01
Local structure generation + folding
Local CUDA structure-generation and folding workloads — RFdiffusion, BindCraft, AlphaFold-class validation, Boltz/Chai-style model workflows, and future GPU-only protein design tools — running continuously on-prem instead of staged or cloud-bound.
02
Continuous cognition runtime
Stable continuous runtime operation for Galen cognition loops, ingestion, scoring, indexing, local model execution, provenance preparation, and long-running scientific orchestration with error-corrected memory.
03
Real-world assay validation
Move top-ranked computational candidates into real-world assay validation — beginning to close the loop between autonomous candidate generation, biological evidence, and experimental feedback.
Runtime impact
What expanded capability produces on the public surface — visible in cycle artifacts and the provenance graph.
Phase I is a runtime-expansion target, not a fundraise. The expansion is earned by protocol activity inside the Protean network and deployed against the named hardware and handoff targets above. Wet-lab validation and human scientific review remain authoritative.
Boundary
Computational rankings are research prioritisation signals. They do not prove biological activity, therapeutic effect, safety, efficacy, patentability, clinical readiness, or experimental validation. Wet-lab validation and human scientific review remain the authoritative downstream layer.