Provenance & disclosure · Foundation

Provenance layer

Galen → proposal → operator approval → Bankr → Protean Ledger → events → indexer → Digest → Explorer. The chain is the source of truth. The Digest is the reproducibility primitive. The explorer is a lens. Replay artifacts on GitHub and Gitlawb are supplemental.

Operational

The canonical contract is the Protean Ledger at proxy 0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0 on Base mainnet, schema protean.ledger.v1, UUPS-upgradeable behind explicit role-based access control. Public verification mirrors live at github.com/proteanlabs1/ledger-mirror and Gitlawb DID did:key:z6Mkt6MEeSCJM2krT1PfX8BmTWbi9YYkLqdaRXSF6UZvy5QB; the indexer digest endpoint at /ledger/api/v1/indexer/digest re-derives the entire state from the contract's event log alone.

FIG · 04 · The deployed verification path

Stage 1 · Galen

Detects a publishable scientific record (RuntimeCycle, Hypothesis, Experiment, EvidenceBundle, Candidate, CandidateFamily, CandidateLineage, Thesis, AssayResult, Collection, ExternalSignal, ScientificAsset). Drafts a typed envelope against protean.ledger.v1.

Holds zero on-chain roles. Cannot sign or broadcast.

↓ Proposal queued for human/operator review.

Stage 2 · Operator approval

Reviews the proposed envelope, binds it to a review_record_id, and issues a single-use 5-minute approval token.

Operational approval only. Does not grant treasury powers or bypass Ledger RBAC.

↓ Approved envelope and token are handed to Bankr.

Stage 3 · Bankr automation wallet

Validates approval token, envelope, selector allow-list, spend policy, destination allow-list, and halt switch; signs and broadcasts to Base mainnet.

AUTOMATION_WRITER_ROLE only · per-tx + per-day spend caps · Ledger proxy destination allow-list.

↓ Transaction mined. Chain emits events.

Stage 4 · Protean Ledger

UUPS proxy at 0xE3c261F3…94cf5f0 on Base mainnet emits RecordRegistered + RecordContentEmitted (+ PublicationAttested and EdgeLinked for candidate/family publication).

17 RecordTypes · 20 RelationTypes · 10 LifecycleStates · 6 DisclosureStates.

↓ Indexer picks up the event on the next minute-cadence cron tick.

Stage 5 · Public indexer + digest

scripts/index_ledger_from_genesis.py replays events into a sha256 state digest, served at /ledger/api/v1/indexer/digest.

Open source · reproducible from any Base RPC · 12-block confirmation window.

↓ Digest equality with any third-party indexer is the reproducibility primitive.

Stage 6 · Explorer + verification rails

Lens over indexed chain state at protean.sh/ledger; approved public replay artifacts replicate to GitHub and Gitlawb after public records land.

Read-only views. Authority lives upstream.

Protected surface · additional contract role gates

Operator approval is necessary for every Bankr broadcast, but the actions below also require on-chain roles or timelocks that Bankr does not hold.

IPAsset: requires IP_DECLARANT_ROLE · Bankr does not hold it
RetractionNotice / proposeRetraction / executeRetraction: treasury-only RETRACTOR_ROLE · 24h timelock
Governance / role grants / role revocations: treasury-only DEFAULT_ADMIN_ROLE
Upgrades / pause / unpause: treasury-only UPGRADER_ROLE / PAUSER_ROLE
setLifecycle / setDisclosure on prior records: operator-only mutation of observable state
revokeLineage: operator-only · can deny prior science

Role separation · three principals · no overlap

Treasury (proteanlabs.base.eth): DEFAULT_ADMIN_ROLE · UPGRADER_ROLE · PAUSER_ROLE · RETRACTOR_ROLE · LINEAGE_REVOKER_ROLE · IP_DECLARANT_ROLE
Operator (0x827Ba9…9C2C7): OPERATOR_WRITER_ROLE · PAUSER_ROLE · LINEAGE_REVOKER_ROLE · IP_DECLARANT_ROLE where granted
Bankr (spend-capped automation wallet): AUTOMATION_WRITER_ROLE only

Operator compromise is not governance compromise.

Disclosure boundary · what crosses the publication guard

Private: Private salts, embeddings, scoring internals, provider secrets, unfiled IP. These never leave the operator runtime.
Reviewed: Draft envelopes pass through the publication guard before any public write. Full published sequences are allowed; salts, internals, and local paths fail closed.
Public on chain: Typed records and publication attestations carry plaintext sequence provenance, lineage, operator attribution, and content-addressed digests on the Ledger. No secrets.

Six stages from cognition to public lens. Each row names the principal that acts, what it actually does, and the authority it carries. The chain is the source of truth; everything downstream is reproducible or supplemental.

The pipeline, in plain words

Every public Ledger write uses one approval-mediated execution path.

Stage 1 — Galen proposes. The cognition runtime composes a typed envelope for the next record against protean.ledger.v1 and queues it with a review record. Galen holds zero on-chain roles and cannot sign or broadcast.

Stage 2 — Operator approves. The operator reviews the proposal and mints a single-use, 5-minute-TTL approval token bound to the review record. The token authorises one broadcast attempt; it does not grant treasury roles or bypass contract checks.
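The approval token's three stated properties (single-use, 5-minute TTL, bound to a review record) can be illustrated with a minimal sketch. This is not the operator's implementation: the `ApprovalTokens` class, the HMAC construction, the `envelope_digest` binding, and the `|`-delimited token layout are all assumptions made for illustration, and identifiers are assumed not to contain `|`.

```python
import hashlib
import hmac
import time
from typing import Optional

TOKEN_TTL_SECONDS = 5 * 60  # the 5-minute TTL stated in the text


class ApprovalTokens:
    """Hypothetical in-memory issuer/validator for single-use approval tokens."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._used = set()  # tokens that have already authorised a broadcast

    def _mac(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()

    def mint(self, review_record_id: str, envelope_digest: str,
             now: Optional[float] = None) -> str:
        issued_at = int(time.time() if now is None else now)
        payload = f"{review_record_id}|{envelope_digest}|{issued_at}"
        return f"{payload}|{self._mac(payload)}"

    def redeem(self, token: str, review_record_id: str, envelope_digest: str,
               now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        try:
            rid, digest, issued_at, mac = token.split("|")
            issued = int(issued_at)
        except ValueError:
            return False  # malformed token
        if not hmac.compare_digest(mac, self._mac(f"{rid}|{digest}|{issued_at}")):
            return False  # forged or tampered
        if rid != review_record_id or digest != envelope_digest:
            return False  # bound to a different review record or envelope
        if not 0 <= now - issued <= TOKEN_TTL_SECONDS:
            return False  # expired (or issued in the future)
        if token in self._used:
            return False  # single-use: already redeemed
        self._used.add(token)
        return True
```

A token redeemed once is rejected on any later attempt, which is what limits an approval to one broadcast attempt in this sketch.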

Stage 3 — Bankr broadcasts. Bankr holds exactly AUTOMATION_WRITER_ROLE. It validates the approval token, envelope schema, selector allow-list, destination allow-list, per-transaction cap, per-day cap, and halt switch, then signs and broadcasts to Base mainnet.
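The pre-broadcast checks listed for Bankr can be read as a fail-closed gate: every check must pass before anything is signed. The sketch below is illustrative only. `Policy`, `Tx`, `check_broadcast`, the field names, and the example selector are hypothetical, and token and envelope validation are reduced to booleans supplied by the caller.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Policy:
    allowed_destinations: FrozenSet[str]  # lowercase addresses, e.g. the Ledger proxy
    allowed_selectors: FrozenSet[str]     # 4-byte function selectors, hex
    per_tx_cap_wei: int
    per_day_cap_wei: int
    halted: bool = False                  # the halt switch


@dataclass(frozen=True)
class Tx:
    to: str
    selector: str
    cost_wei: int  # assumed worst-case spend for this transaction


def check_broadcast(tx: Tx, policy: Policy, spent_today_wei: int,
                    token_ok: bool, envelope_ok: bool) -> List[str]:
    """Return the names of failed checks; an empty list means signing may proceed."""
    failures = []
    if policy.halted:
        failures.append("halt_switch")
    if not token_ok:
        failures.append("approval_token")
    if not envelope_ok:
        failures.append("envelope_schema")
    if tx.selector not in policy.allowed_selectors:
        failures.append("selector_allow_list")
    if tx.to.lower() not in policy.allowed_destinations:
        failures.append("destination_allow_list")
    if tx.cost_wei > policy.per_tx_cap_wei:
        failures.append("per_tx_cap")
    if spent_today_wei + tx.cost_wei > policy.per_day_cap_wei:
        failures.append("per_day_cap")
    return failures
```

Returning every failed check, rather than stopping at the first, is a design choice of this sketch that makes rejected broadcasts easier to audit.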

Stage 4 — Ledger emits. The Protean Ledger contract validates the call and emits RecordRegistered, RecordContentEmitted, and (for lineage) EdgeLinked.

Stage 5 — Indexer reproduces. A Vercel cron runs the open-source indexer (scripts/index_ledger_from_genesis.py) every minute, replays events into Neon, and computes a sha256 over canonical bytes of the confirmed records and edges. The result is served at /ledger/api/v1/indexer/digest. Any third party with a Base RPC must compute the same digest — that is the reproducibility primitive.
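The idea behind the digest, a sha256 over a canonical serialisation of confirmed records and edges, can be sketched as follows. The real canonical byte layout is defined by scripts/index_ledger_from_genesis.py; the JSON encoding, the `block` and `log_index` field names, and the reading of the 12-block window as "at least 12 blocks behind head" are assumptions here, so this sketch will not reproduce the served digest.

```python
import hashlib
import json
from typing import Dict, List

CONFIRMATIONS = 12  # confirmation window stated for the public indexer


def state_digest(records: List[Dict], edges: List[Dict], head_block: int) -> str:
    """Hash confirmed records and edges in a deterministic order."""
    confirmed_max = head_block - CONFIRMATIONS

    def canonical(items: List[Dict]) -> List[Dict]:
        # Drop unconfirmed events, then order by position in the chain so the
        # result does not depend on the order in which an RPC returned them.
        kept = [i for i in items if i["block"] <= confirmed_max]
        return sorted(kept, key=lambda i: (i["block"], i["log_index"]))

    state = {"records": canonical(records), "edges": canonical(edges)}
    # sort_keys and fixed separators make the byte encoding deterministic.
    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Two indexers that replay the same confirmed events get the same digest regardless of fetch order, which is the property the reproducibility claim rests on.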

Stage 6 — Explorer + mirrors. The public explorer at protean.sh/ledger renders indexed chain state. GitHub and Gitlawb distribute schema, indexer code, deployment metadata, and replay artifacts. They are public verification surfaces, not authorities.

Additional role-gated surfaces

IPAsset records — minting carries provisional-IP intent and requires IP_DECLARANT_ROLE in addition to a writer role. Bankr does not hold IP_DECLARANT_ROLE; even if a broadcast path were bypassed, the on-chain role check would reject.
RetractionNotice and the three retraction proposal kinds (proposeRetraction, cancelRetraction, executeRetraction) — retraction is a treasury-only action behind a 24-hour timelock.
Governance records — operator-authored only.
setLifecycle / setDisclosure mutations on already registered records — operator-only.
revokeLineage — operator-only; could be used to deny prior science.

Approval is necessary for these actions, but not sufficient. The Ledger role check and, where applicable, the retraction timelock are the final authority.
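The "necessary but not sufficient" rule for retraction can be modelled as a role check plus a 24-hour timelock. The sketch below is a toy model of that behaviour, not the contract's code: `RetractionGate`, its method names, and the use of plain strings for callers are hypothetical.

```python
from typing import Dict, Set

RETRACTION_DELAY_SECONDS = 24 * 60 * 60  # the 24-hour timelock stated in the text


class RetractionGate:
    """Toy model of a role-gated, timelocked retraction flow."""

    def __init__(self, roles: Dict[str, Set[str]]) -> None:
        self.roles = roles       # role name -> addresses holding it
        self.proposals = {}      # record_id -> proposal timestamp

    def _require(self, caller: str, role: str) -> None:
        if caller not in self.roles.get(role, set()):
            raise PermissionError(f"{caller} lacks {role}")

    def propose(self, caller: str, record_id: str, now: int) -> None:
        self._require(caller, "RETRACTOR_ROLE")
        self.proposals[record_id] = now

    def cancel(self, caller: str, record_id: str) -> None:
        self._require(caller, "RETRACTOR_ROLE")
        self.proposals.pop(record_id, None)

    def execute(self, caller: str, record_id: str, now: int) -> bool:
        self._require(caller, "RETRACTOR_ROLE")
        proposed_at = self.proposals.get(record_id)
        if proposed_at is None:
            raise LookupError("no pending retraction for this record")
        if now - proposed_at < RETRACTION_DELAY_SECONDS:
            raise RuntimeError("timelock has not elapsed")
        del self.proposals[record_id]
        return True
```

In this model an approved broadcast from a wallet without RETRACTOR_ROLE still fails at the role check, and a treasury call still fails until the delay has passed.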

What lives on chain — and what does not

On chain: the record itself. Every record carries plaintext content fields (title, summary, author, runtime) and content-addressed pointers (contentDigest, replayPointer, referencesDigest), plus its RecordType, lifecycle state, and disclosure state. Publication attestations add full sequences, candidate/family IDs, lineage references, provenance hashes, operator signatures, and durable content URIs for published candidates and families. Every lineage edge between records is a typed first-class event (EdgeLinked), not metadata.
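Because each lineage edge is its own EdgeLinked event, a reader can rebuild the graph from decoded logs without any side metadata. The following sketch assumes events have already been decoded into dictionaries; the keys `from`, `to`, and `relation` are hypothetical names, not the contract's ABI field names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Adjacency = Dict[str, List[Tuple[str, str]]]


def build_lineage(edge_events: Iterable[dict]) -> Tuple[Adjacency, Adjacency]:
    """Index decoded EdgeLinked events by source and by target record."""
    outgoing: Adjacency = defaultdict(list)
    incoming: Adjacency = defaultdict(list)
    for ev in edge_events:
        outgoing[ev["from"]].append((ev["relation"], ev["to"]))
        incoming[ev["to"]].append((ev["relation"], ev["from"]))
    return outgoing, incoming


def ancestors(record_id: str, outgoing: Adjacency,
              relation: str = "DerivedFrom") -> Set[str]:
    """Follow edges of one relation type transitively from record_id."""
    seen: Set[str] = set()
    stack = [record_id]
    while stack:
        current = stack.pop()
        for rel, target in outgoing.get(current, []):
            if rel == relation and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen
```

Keeping both directions indexed lets the same structure answer "what does this record derive from" and "what was derived from it".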

Off chain (private vault): private salts, embedding vectors, scoring internals, unfiled IP, provider secrets, and unpublished review material. These never cross the publication guard. Published Candidate and CandidateFamily records carry full sequences in their public attestation payloads while preserving hashes and commitments beside them.

Off chain (supplemental): the local cycle snapshot directory (atomic writes via O_CREAT|O_EXCL + fsync + os.replace), and the per-record replay artifact mirrored to proteanlabs1/ledger-mirror. These exist for operator audit and to let a verifier recompute the replayPointer's sha256. They are not the canonical record. A reader who wants to verify a record does not start from a snapshot — they start from getRecord(bytes32) on the Ledger proxy.
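The atomic-write recipe named above (O_CREAT|O_EXCL, fsync, os.replace) can be sketched in a few lines. This is one common way to combine those three calls; the function name, the `.tmp` suffix, and the choice to apply O_EXCL to the temporary file are assumptions, not a description of the operator's snapshot code.

```python
import os


def write_snapshot_atomic(directory: str, name: str, data: bytes) -> str:
    """Write data so readers see either the old file or the complete new one."""
    final_path = os.path.join(directory, name)
    tmp_path = final_path + ".tmp"
    # O_EXCL makes creation fail if another writer already owns the temp file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # force the bytes to disk before publishing
    except BaseException:
        os.unlink(tmp_path)        # never leave a partial temp file behind
        raise
    os.replace(tmp_path, final_path)  # atomic rename over any previous version
    return final_path
```

A crash before os.replace leaves the previous snapshot untouched, and a crash after it leaves the fully written new one.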

The typed surface

The Ledger schema (protean.ledger.v1) defines:

17 RecordType enum values — Unknown, RuntimeCycle, Hypothesis, Experiment, EvidenceBundle, Candidate, Thesis, AssayResult, Collection, RetractionNotice, ExternalSignal, Governance, ScientificAsset, IPAsset, CandidateFamily, CandidateLineage, FamilyLineage. Unknown is a sentinel; the other sixteen are usable record types.
20 RelationType enum values — Unknown, DerivedFrom, Tests, Supports, Contradicts, Supersedes, Retracts, Includes, Produces, Cites, ReviewedBy, Anchors, AssetOf, ProtectedBy, ParentOf, ChildOf, MemberOfFamily, VariantOf, FamilyDerivedFrom, PublishedAs. Unknown is a sentinel; the other nineteen are usable edge types.
10 LifecycleStates — Draft, ReviewReady, Anchored, AssayRequested, AssayReturned, IPReview, PatentFiled, Published, Superseded, Disputed.
6 DisclosureStates — PrivateCommitmentOnly, RedactedPublic, CounselReviewed, PatentPending, Public, Retracted.
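For client code, the four enumerations can be mirrored as Python enums. The member names below come from the lists above; the numeric values are an assumption (members numbered from 0 in the order listed) and should be checked against the contract ABI before decoding real events.

```python
from enum import IntEnum

# Assumption: ordinals follow the listing order, starting at 0 with Unknown.
RecordType = IntEnum("RecordType", [
    "Unknown", "RuntimeCycle", "Hypothesis", "Experiment", "EvidenceBundle",
    "Candidate", "Thesis", "AssayResult", "Collection", "RetractionNotice",
    "ExternalSignal", "Governance", "ScientificAsset", "IPAsset",
    "CandidateFamily", "CandidateLineage", "FamilyLineage",
], start=0)

RelationType = IntEnum("RelationType", [
    "Unknown", "DerivedFrom", "Tests", "Supports", "Contradicts", "Supersedes",
    "Retracts", "Includes", "Produces", "Cites", "ReviewedBy", "Anchors",
    "AssetOf", "ProtectedBy", "ParentOf", "ChildOf", "MemberOfFamily",
    "VariantOf", "FamilyDerivedFrom", "PublishedAs",
], start=0)

LifecycleState = IntEnum("LifecycleState", [
    "Draft", "ReviewReady", "Anchored", "AssayRequested", "AssayReturned",
    "IPReview", "PatentFiled", "Published", "Superseded", "Disputed",
], start=0)

DisclosureState = IntEnum("DisclosureState", [
    "PrivateCommitmentOnly", "RedactedPublic", "CounselReviewed",
    "PatentPending", "Public", "Retracted",
], start=0)


def usable(enum_cls):
    """Members other than the Unknown sentinel, where the enum has one."""
    return [m for m in enum_cls if m.name != "Unknown"]
```

Counting members this way gives a quick consistency check against the totals quoted in the text.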

Every record carries plaintext content (title, summary, author, runtime) and content-addressed pointers: contentDigest, replayPointer, and referencesDigest. contentDigest and record identity are computed by the contract from ABI-encoded fields. referencesDigest is the contract's keccak over the ABI-encoded references list. The replay artifact sha256 is a supplemental mirror check.

Candidate publication attestations

Published candidate sequences appear in public attestation payloads. Each candidate also keeps a domain-separated salted hash so historical commitment checks remain reproducible:

keccak256(
"PROTEAN_CANDIDATE_COMMITMENT_V1|<candidate_id>|<sequence.upper()>|<private_salt>"
)

The salt persists in the private vault via setdefault, so the same candidate_id always receives the same salt. The salt is not published.
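The preimage layout and the setdefault behaviour can be sketched as below. Python's standard library has no Keccak-256 (hashlib.sha3_256 uses different padding and gives different output), so the hash is passed in as a parameter; the stand-in used here is for demonstration only and will not reproduce on-chain commitments. The helper names, the UTF-8 encoding, and the 32-byte hex salt format are assumptions.

```python
import hashlib
import secrets
from typing import Callable, MutableMapping

DOMAIN = "PROTEAN_CANDIDATE_COMMITMENT_V1"  # domain separator from the formula above


def salt_for(vault: MutableMapping[str, str], candidate_id: str) -> str:
    # setdefault stores the fresh salt only if candidate_id has none yet,
    # so repeated calls return the first salt ever assigned.
    return vault.setdefault(candidate_id, secrets.token_hex(32))


def commitment_preimage(candidate_id: str, sequence: str, private_salt: str) -> bytes:
    return f"{DOMAIN}|{candidate_id}|{sequence.upper()}|{private_salt}".encode("utf-8")


def commit(vault: MutableMapping[str, str], candidate_id: str, sequence: str,
           hash_fn: Callable[[bytes], bytes]) -> str:
    salt = salt_for(vault, candidate_id)
    return hash_fn(commitment_preimage(candidate_id, sequence, salt)).hex()


def stand_in_hash(data: bytes) -> bytes:
    """NOT keccak256: a placeholder so the sketch runs without third-party packages."""
    return hashlib.sha3_256(data).digest()
```

To match the formula in the text, `hash_fn` would need to be a real Keccak-256 implementation from a third-party library.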

The publication attestation proves that a specific full sequence, candidate ID, family ID, lineage set, content URI, provenance hash, dossier hash, archive hash, timestamp, and operator signature were committed together. The commitment is an integrity field beside the sequence, not a replacement for visibility.

Verification — what a reader actually runs

A reader can re-verify any published record end-to-end without trusting us. The procedure has four steps. Each one is grounded in chain state.

Read the record from chain. cast call 0xE3c261F3C05D4c4710003cd6066EfD95094cf5f0 "getRecord(bytes32)" <recordId> --rpc-url https://mainnet.base.org. The returned struct is the authoritative record — title, summary, recordType, lifecycle, disclosure, contentDigest, replayPointer, writer, registeredAt. If you trust nothing else on the page, trust this.
Walk lineage on chain. Read the contract's EdgeLinked events filtered on the recordId. Every edge type (DerivedFrom, Tests, Supports, Contradicts, Supersedes, Retracts, plus the 13 other usable RelationTypes) is a typed event with both endpoints recoverable. The graph is reconstructable from event logs alone.
Reproduce the state digest. Clone github.com/proteanlabs1/ledger-mirror and run scripts/index_ledger_from_genesis.py with --digest-only against any Base RPC, replaying from genesis. The output must equal the digest served at /ledger/api/v1/indexer/digest. Equality means the served indexer state matches the state re-derived from the chain's events.
Recompute the replay artifact sha256 (when the record's replayPointer is a public mirror anchor). Fetch the artifact from the mirror at the cited commit, run shasum -a 256, compare to the sha256 suffix on the on-chain pointer. For the four historical bootstrap records the public mirror was introduced after registration, so this step honestly returns LIMITED — the explorer surfaces that explicitly.
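The last step, comparing an artifact's sha256 with the suffix of the on-chain pointer, is equivalent to running shasum -a 256 and comparing by eye. A small sketch follows. The exact replayPointer format is not specified here, so this code assumes the pointer ends in 64 hex characters; the function names and the use of LIMITED for pointers without such a suffix are illustrative assumptions.

```python
import hashlib
import re
from typing import Optional


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pointer_sha256(replay_pointer: str) -> Optional[str]:
    # Assumption: the digest is the trailing 64 hex characters of the pointer.
    match = re.search(r"([0-9a-fA-F]{64})$", replay_pointer)
    return match.group(1).lower() if match else None


def verify_replay_artifact(path: str, replay_pointer: str) -> str:
    expected = pointer_sha256(replay_pointer)
    if expected is None:
        return "LIMITED"  # nothing to compare against
    return "PASS" if sha256_of_file(path) == expected else "FAIL"
```

A FAIL here says only that the mirror artifact differs from what the pointer committed to; the on-chain record remains the authority.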

If every check passes, the reader has reproduced the Ledger state from Base mainnet events and confirmed any available supplemental mirror artifact.

Contradiction graph and failure memory

Failure memory is one of Protean's distinctive commitments. Rejected motifs, contradicted claims, degraded candidates, and validation failures become first-class signal: the scoring layer reads from failure memory directly, and the failure-similarity component of the ranking surface penalises candidates that look like prior failures.
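To make the failure-similarity idea concrete, here is a toy version of a ranking penalty. Nothing below reflects Protean's actual scoring internals, which are private: the k-mer Jaccard similarity, the `weight` parameter, and the multiplicative penalty are all assumptions chosen to keep the example small.

```python
from typing import Iterable, Set


def kmers(sequence: str, k: int = 3) -> Set[str]:
    s = sequence.upper()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def failure_similarity(candidate: str, failures: Iterable[str], k: int = 3) -> float:
    """Highest Jaccard similarity between the candidate and any prior failure."""
    cand = kmers(candidate, k)
    best = 0.0
    for failed in failures:
        other = kmers(failed, k)
        union = cand | other
        if union:
            best = max(best, len(cand & other) / len(union))
    return best


def penalised_score(base_score: float, candidate: str,
                    failures: Iterable[str], weight: float = 0.5) -> float:
    """Scale the score down in proportion to resemblance to prior failures."""
    return base_score * (1.0 - weight * failure_similarity(candidate, failures))
```

With an empty failure memory the score is unchanged; a candidate identical to a recorded failure loses the full `weight` fraction of its score.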

Today, failure memory feeds the failure-similarity scoring signal at rank time and claim-QA flags reach reviewers as warnings. The back-edge from contradiction memory or failure memory into proposal generation or hypothesis prioritisation is reserved — when that loop closes, it will be a single review-gated change visible inside the cycle executor.