Protean

Protean Frontier · public training observatory

Protean Observatory

A public window into Protean Frontier training, evaluation, and model progress.

Every run shown here is sanitized and publicly safe. Training curves, evaluation metrics, and progress history are displayed without exposing private infrastructure or raw datasets.

Successful runs: 6 completed
Training runs: 16
Evaluation runs: 22
Public metric series: 13 with real data

Latest milestone

Phase 103 SeqQA GPU Decode Repair Eval

Current model status

Where Protean Frontier stands today

snapshot · 2026-06-21T21:52:41Z

Training status
QLoRA fine-tuning on GPU is complete; the training-loss trajectory is recorded over its full step history.
Latest evaluation
SeqQA exact-match was recorded on a GPU evaluation run; unscored records are excluded, never counted as zeros.
Public metrics
13 sanitized metric series are published, led by the training-loss curve.
Benchmark eligibility
Observability only — not Frontier 0.1 certified and not scorecard-eligible.
Next milestone
Broader SeqQA coverage and continued instrumented training.
Provenance
Sanitized public bundle; no private infrastructure or raw datasets are exposed.
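The evaluation note above says unscored records are excluded rather than counted as zeros. A minimal sketch of that scoring rule follows. The record shape (a prediction that is `None` when unscored), the function name `exact_match_rate`, and the whitespace-stripping comparison are all assumptions made for illustration, not the Observatory's actual harness.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record shape: (prediction, reference).
# A prediction of None marks a record that could not be scored.
Record = Tuple[Optional[str], str]


def exact_match_rate(records: Sequence[Record]) -> Optional[float]:
    """Exact-match rate over scored records only.

    Unscored records are dropped from both numerator and denominator,
    so they never pull the rate toward zero. Returns None when nothing
    was scored, instead of reporting a misleading 0.0.
    """
    scored = [(pred, ref) for pred, ref in records if pred is not None]
    if not scored:
        return None
    # Assumption: surrounding whitespace is ignored when comparing.
    hits = sum(1 for pred, ref in scored if pred.strip() == ref.strip())
    return hits / len(scored)


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [("ACGT", "ACGT"), (None, "TTAA"), ("GG", "GC")]
    print(exact_match_rate(demo))
```

Returning `None` for an all-unscored run keeps "no data" distinct from "zero accuracy", which is the distinction the status note draws.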

Progress

The path so far

  1. Foundation & guardrails (complete)

    Run-lab harness, claim boundaries, and sanitization rules established so progress can be shown without exposing private infrastructure.

  2. Dataset materialization (complete)

    SFT datasets staged and validated as the training corpus.

  3. H100 QLoRA training (complete)

    QLoRA fine-tuning on GPU produced the training-loss trajectory shown below.

  4. Instrumented continuation (complete)

    Training continued with full metric instrumentation (loss, optimizer steps, examples seen).

  5. Numeric / range / unit training (complete)

    Focused training passes on numeric, range, and unit handling.

  6. SeqQA evaluation path (complete)

    Sequence-QA evaluation harness with exact-match scoring on a GPU eval run.

  7. Observatory launch (complete)

    This public window: sanitized runs, training curves, and evaluation metrics.

Metrics

Training & evaluation curves

Real public metric series recovered from the training tracker. Only meaningful series are charted; counters and absent values are not plotted as curves or zeros.
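The charting rule above (plot only meaningful series, never draw counters or absent values as curves or zeros) could be sketched roughly as follows. The data layout, the name `chartable_series`, and the two-point threshold are assumptions for illustration; the page does not describe how the tracker stores or filters series.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: series name -> list of (step, value-or-None).
RawPoint = Tuple[int, Optional[float]]
RealPoint = Tuple[int, float]


def chartable_series(
    series: Dict[str, List[RawPoint]],
    min_points: int = 2,  # assumed threshold separating curves from counters
) -> Dict[str, List[RealPoint]]:
    """Keep only series with enough real points to draw as a curve.

    Absent values (None) are dropped, not replaced with 0.0. Series left
    with fewer than min_points values are treated as counters and are
    not charted.
    """
    charted: Dict[str, List[RealPoint]] = {}
    for name, points in series.items():
        real = [(step, value) for step, value in points if value is not None]
        if len(real) >= min_points:
            charted[name] = sorted(real)  # order by step for plotting
    return charted


if __name__ == "__main__":
    # Made-up values, for illustration only.
    raw = {
        "train_loss": [(3, 4.0), (1, 5.0), (2, None)],
        "optimizer_steps": [(250, 1000.0)],
        "never_recorded": [(1, None), (2, None)],
    }
    print(chartable_series(raw))
```

In this sketch a counter with a single value would still be available for a stats panel; it is only excluded from the curves.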

Training loss

QLoRA fine-tuning loss recorded over its full step history.

275 points · step 1 to step 250 · loss 5.57 to 2.37

Validation loss

14 points · step 10 to step 250 · loss 5.66 to 4.86

Training & evaluation stats

Best examples / sec: 6.303
BixBench subset accuracy: 0.04918
BixBench baseline accuracy: 0.000
BixBench candidate accuracy: 0.000
BixBench correct: 6.000
BixBench delta: 0.000
BixBench positive delta: 0.000
Examples seen: 500.0
Peak GPU memory (MiB): 3.071e+4
Optimizer steps: 1000
SeqQA exact match: 0.000

Methodology & limits

What this is — and what it is not

The Observatory is a sanitized, read-only window into Protean Frontier run history. Metrics are computational observability telemetry. No run here is Frontier 0.1 certified, no metric is scorecard-eligible, and nothing here is biological proof — wet-lab and provider validation is a separate, gated process. Charts show only real recorded points; missing values are omitted rather than shown as zero.

manifest.json · index.json · bundle sha256 96841006796a… · generated 2026-06-21T21:52:41Z
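A reader who has downloaded the public bundle could compare it against the digest prefix shown in the footer. The sketch below assumes the published value is a plain SHA-256 over the bytes of a single bundle file; the page does not say how the digest is computed, and the helper names and the file path in the usage comment are hypothetical. Only a prefix is published here, so the check is a prefix match, not a full-digest comparison.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_prefix(path: str, published_prefix: str) -> bool:
    """True if the file's digest starts with the published (truncated) prefix."""
    return sha256_of_file(path).startswith(published_prefix.lower())


# Usage (hypothetical local path; the prefix is the truncated value from the footer):
#   matches_published_prefix("observatory_bundle.tar", "96841006796a")
```

A prefix match is weaker evidence than a full-digest match, so where the manifest carries the complete hash, that value is the better one to compare against.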