Protean Frontier · public training observatory
Protean Observatory
A public window into Protean Frontier training, evaluation, and model progress.
Every run shown here is sanitized and publicly safe. Training curves, evaluation metrics, and progress history are displayed without exposing private infrastructure or raw datasets.
- Successful runs: 6 completed
- Training runs: 16
- Evaluation runs: 22
- Public metric series: 13 with real data
Latest milestone
Phase 103
Phase 103 SeqQA GPU Decode Repair Eval
Current model status
Where Protean Frontier stands today
snapshot · 2026-06-21T21:52:41Z
- Training status: QLoRA fine-tuning on GPU is complete; the training-loss trajectory is recorded over its full step history.
- Latest evaluation: SeqQA exact-match was recorded on a GPU evaluation run; unscored records are excluded, never counted as zeros.
- Public metrics: 13 sanitized metric series are published, led by the training-loss curve.
- Benchmark eligibility: Observability only; not Frontier 0.1 certified and not scorecard-eligible.
- Next milestone: Broader SeqQA coverage and continued instrumented training.
- Provenance: Sanitized public bundle; no private infrastructure or raw datasets are exposed.
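The evaluation rule above (unscored records are excluded rather than counted as zeros) can be illustrated with a minimal sketch. This is not the Observatory's actual scoring code: the function name `exact_match_rate`, the use of `None` to mark an unscored record, and the whitespace-stripping comparison are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def exact_match_rate(
    predictions: Sequence[Optional[str]],
    references: Sequence[str],
) -> Optional[float]:
    """Exact-match rate over scored records only.

    Assumption: a prediction of None marks an unscored record.
    Unscored records are dropped from the denominator instead of
    being treated as wrong answers.
    """
    scored = [(p, r) for p, r in zip(predictions, references) if p is not None]
    if not scored:
        # Nothing was scored: report "no value" rather than 0.0.
        return None
    # Assumption: matching ignores leading/trailing whitespace only.
    hits = sum(1 for p, r in scored if p.strip() == r.strip())
    return hits / len(scored)


# Hypothetical records: the second prediction was never scored.
rate = exact_match_rate(
    ["ACGT", None, "TTGA", " GATTACA "],
    ["ACGT", "CCCC", "TTGG", "GATTACA"],
)
```

In this sketch the unscored record lowers neither the numerator nor the denominator, so the rate is computed over three records, not four. Counting it as a zero would instead pull the reported score down for a reason unrelated to model quality.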
Progress
The path so far
- 01 complete
Foundation & guardrails
Run-lab harness, claim boundaries, and sanitization rules established so progress can be shown without exposing private infrastructure.
- 02 complete
Dataset materialization
SFT datasets staged and validated as the training corpus.
- 03 complete
H100 QLoRA training
QLoRA fine-tuning on GPU produced the training-loss trajectory shown below.
- 04 complete
Instrumented continuation
Training continued with full metric instrumentation (loss, optimizer steps, examples seen).
- 05 complete
Numeric / range / unit training
Focused training passes on numeric, range, and unit handling.
- 06 complete
SeqQA evaluation path
Sequence-QA evaluation harness with exact-match scoring on a GPU eval run.
- 07 complete
Observatory launch
This public window — sanitized runs, training curves, and evaluation metrics.
Metrics
Training & evaluation curves
Real public metric series recovered from the training tracker. Only meaningful series are charted; counters and absent values are not plotted as curves or zeros.
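The charting rule above (absent values are not plotted as zeros) might look like the following sketch. The helper names `plottable_points` and `is_curve`, the `None`/NaN convention for missing values, and the two-point minimum are assumptions for illustration, not the Observatory's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple


def plottable_points(
    values: Sequence[Optional[float]],
) -> List[Tuple[int, float]]:
    """Return (step, value) pairs, skipping missing values.

    Assumption: a missing value is represented as None or NaN.
    Missing steps are dropped, never replaced with 0.0.
    """
    return [
        (step, v)
        for step, v in enumerate(values)
        if v is not None and not math.isnan(v)
    ]


def is_curve(points: Sequence[Tuple[int, float]], min_points: int = 2) -> bool:
    """Assumption: a series needs at least min_points real points to be drawn as a curve."""
    return len(points) >= min_points


# Hypothetical loss series with two gaps.
points = plottable_points([2.1, None, 1.7, float("nan")])
```

Here `points` keeps the original step indices of the two real values, so a gap in the data appears as a gap in the chart rather than as a dip to zero.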
Training loss
QLoRA fine-tuning loss recorded over its full step history.
Validation loss
14 pts
Training & evaluation stats
- Best examples / sec: 6.303
- BixBench subset accuracy: 0.04918
- BixBench baseline accuracy: 0.000
- BixBench candidate accuracy: 0.000
- BixBench correct: 6.000
- BixBench delta: 0.000
- BixBench positive delta: 0.000
- Examples seen: 500.0
- Peak GPU memory (MiB): 3.071e+4
- Optimizer steps: 1000
- SeqQA exact match: 0.000
Featured
Successful runs
Completed runs with public results. Each links to a detailed view; the full sanitized run bundle is available as JSON below.
Phase 103 SeqQA GPU Decode Repair Eval
Evaluation run · phase 103
View run →
Phase 101 - RunPod SeqQA GPU Eval
Evaluation run · phase 101
View run →
Phase 79: Full Phase75 Cap And Utilization Repair
Phase run · phase 79
View run →
Phase 78: Post-Phase76 Numeric/Range/Unit Repair Decision Packet
Phase run · phase 78
View run →
Phase 52 - Local Disk Relief and redacted_storage_backend Enforcement Audit
Phase run · phase 52
View run →
Phase 47 - BixBench Deterministic Subset Delta
Phase run · phase 47
View run →
Methodology & limits
What this is — and what it is not
The Observatory is a sanitized, read-only window into Protean Frontier run history. Metrics are computational observability telemetry. No run here is Frontier 0.1 certified, no metric is scorecard-eligible, and nothing here is biological proof — wet-lab and provider validation is a separate, gated process. Charts show only real recorded points; missing values are omitted rather than shown as zero.
