Protean Frontier · public training observatory

Protean Observatory

A public window into Protean Frontier training, evaluation, and model progress.

Every run shown here is sanitized and publicly safe. Training curves, evaluation metrics, and progress history are displayed without exposing private infrastructure or raw datasets.

Latest milestone

Phase 103

Phase 103 SeqQA GPU Decode Repair Eval

Current model status

Where Protean Frontier stands today

snapshot · 2026-06-21T21:52:41Z

Training status: QLoRA fine-tuning on GPU is complete; the training-loss trajectory is recorded over its full step history.
Latest evaluation: SeqQA exact-match was recorded on a GPU evaluation run; unscored records are excluded, never counted as zeros.
Public metrics: 13 sanitized metric series are published, led by the training-loss curve.
Benchmark eligibility: Observability only — not Frontier 0.1 certified and not scorecard-eligible.
Next milestone: Broader SeqQA coverage and continued instrumented training.
Provenance: Sanitized public bundle; no private infrastructure or raw datasets are exposed.
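
The scoring rule under "Latest evaluation" can be sketched as follows: exact match is computed over scored records only, so an unscored record shrinks the denominator instead of counting as a miss. The record layout (`prediction`, `reference`, with `prediction` set to `None` for an unscored record), the whitespace stripping, and the function name `exact_match` are assumptions made for illustration; the Observatory's actual evaluation schema is not published here.

```python
from typing import Optional, Sequence, TypedDict


class Record(TypedDict):
    # Hypothetical record layout, not the Observatory's real schema.
    prediction: Optional[str]  # None marks an unscored record
    reference: str


def exact_match(records: Sequence[Record]) -> Optional[float]:
    """Return the exact-match rate over scored records, or None if none were scored."""
    scored = [r for r in records if r["prediction"] is not None]
    if not scored:
        return None  # nothing was scored: report the metric as absent, not 0.0
    hits = sum(r["prediction"].strip() == r["reference"].strip() for r in scored)
    return hits / len(scored)


# Made-up example records, for illustration only.
records = [
    {"prediction": "ATG", "reference": "ATG"},
    {"prediction": "TTA", "reference": "TAA"},
    {"prediction": None, "reference": "GGC"},  # unscored: excluded from the denominator
]
print(exact_match(records))  # 0.5
```

Returning `None` for an empty scored set keeps "no result" distinguishable from a genuine score of zero.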

Progress

The path so far

01 complete
Foundation & guardrails
Run-lab harness, claim boundaries, and sanitization rules established so progress can be shown without exposing private infrastructure.
02 complete
Dataset materialization
SFT datasets staged and validated as the training corpus.
03 complete
H100 QLoRA training
QLoRA fine-tuning on GPU produced the training-loss trajectory shown below.
04 complete
Instrumented continuation
Training continued with full metric instrumentation (loss, optimizer steps, examples seen).
05 complete
Numeric / range / unit training
Focused training passes on numeric, range, and unit handling.
06 complete
SeqQA evaluation path
Sequence-QA evaluation harness with exact-match scoring on a GPU eval run.
07 complete
Observatory launch
This public window — sanitized runs, training curves, and evaluation metrics.

Metrics

Training & evaluation curves

These are real public metric series recovered from the training tracker. Only series with a meaningful trajectory are charted: scalar counters are reported as single values rather than curves, and absent values are omitted rather than plotted as zeros.
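
As a rough sketch of this charting rule, the snippet below keeps only steps with a recorded, finite value and draws a series as a curve only when enough real points remain. The `(step, value)` pair layout, the function names, and the two-point threshold are assumptions for illustration, not the Observatory's actual implementation.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, Optional[float]]


def chartable_points(series: List[Point]) -> List[Tuple[int, float]]:
    """Drop steps whose value is missing or non-finite instead of substituting zero."""
    return [
        (step, value)
        for step, value in series
        if value is not None and math.isfinite(value)
    ]


def is_curve(series: List[Point], min_points: int = 2) -> bool:
    """Assumed rule: a series needs at least `min_points` real values to be drawn as a curve."""
    return len(chartable_points(series)) >= min_points


# Made-up values for illustration; these are not Observatory data.
loss = [(0, 2.31), (10, None), (20, 1.87), (30, float("nan"))]
print(chartable_points(loss))  # [(0, 2.31), (20, 1.87)]
print(is_curve([(1000, 1000.0)]))  # False: a single counter value is a stat, not a curve
```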

Training loss

QLoRA fine-tuning loss recorded over its full step history.

275 points

Validation loss

14 points

Training & evaluation stats

Best examples / sec: 6.303
BixBench subset accuracy: 0.04918
BixBench baseline accuracy: 0.000
BixBench candidate accuracy: 0.000
BixBench correct: 6.000
BixBench delta: 0.000
BixBench positive delta: 0.000
Examples seen: 500.0
Peak GPU memory (MiB): 3.071e+4
Optimizer steps: 1000
SeqQA exact match: 0.000

Featured

Successful runs

Completed runs with public results. Each links to a detailed view; the full sanitized run bundle is available as JSON below.

eval · completed
Phase 103 SeqQA GPU Decode Repair Eval
Evaluation run · phase 103

eval · completed
Phase 101 - RunPod SeqQA GPU Eval
Evaluation run · phase 101

phase · completed
Phase 79: Full Phase75 Cap And Utilization Repair
Phase run · phase 79

phase · completed
Phase 78: Post-Phase76 Numeric/Range/Unit Repair Decision Packet
Phase run · phase 78

phase · completed
Phase 52 - Local Disk Relief and redacted_storage_backend Enforcement Audit
Phase run · phase 52

phase · completed
Phase 47 - BixBench Deterministic Subset Delta
Phase run · phase 47

Methodology & limits

What this is — and what it is not

The Observatory is a sanitized, read-only window into Protean Frontier run history. Metrics are computational observability telemetry. No run here is Frontier 0.1 certified, no metric is scorecard-eligible, and nothing here is biological proof — wet-lab and provider validation is a separate, gated process. Charts show only real recorded points; missing values are omitted rather than shown as zero.

manifest.json · index.json · bundle sha256 96841006796a… · generated 2026-06-21T21:52:41Z
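
A reader who downloads the bundle could compare its digest with the sha256 prefix shown in the footer above. This is only a sketch: the footer does not say which file the digest covers, so the path passed in is hypothetical, and because the published digest is truncated only a prefix comparison is possible. The helper names `sha256_of` and `matches_prefix` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in fixed-size chunks so large bundles do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_prefix(path: Path, published_prefix: str) -> bool:
    """Check that the file's digest starts with the published (truncated) prefix."""
    return sha256_of(path).startswith(published_prefix.lower())
```

For example, `matches_prefix(Path("index.json"), "96841006796a")` would return `True` only if the downloaded file (hypothetical path) hashes to a digest beginning with the visible prefix. A prefix match is weaker evidence than a full-digest match.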
