Protean

Entity extraction

Entity extraction turns local scientific records into structured review signals. It is additive, not authoritative: it augments ingestion and evidence organisation without replacing the deterministic validators.

What gets extracted

The extractor is configured for scientific and peptide-adjacent entities. When the local GLiNER runtime is available, extraction is routed through the urchade/gliner_large-v2 model; when it is not, deterministic field extraction and a regex fallback remain active. It targets the following entity types:

  • peptide names
  • sequence-like strings
  • assay names
  • proteases
  • organisms
  • route-of-administration terms
  • degradation and stability terms
  • permeability terms
  • toxicity terms
  • failure signals
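The model-or-fallback routing can be sketched roughly as follows. This is a minimal illustration, not Protean's implementation. It assumes the `gliner` Python package's `GLiNER.from_pretrained` and `predict_entities` calls. The label strings, regex patterns, protease vocabulary, and record fields (`text`, `label`, `start`, `end`, `source`) are hypothetical. Deterministic field extraction is not shown, and the fallback covers only a few labels.

```python
import re

# Hypothetical label strings mirroring the entity types listed above.
LABELS = [
    "peptide name", "sequence-like string", "assay name", "protease",
    "organism", "route of administration", "degradation or stability term",
    "permeability term", "toxicity term", "failure signal",
]

# Hypothetical regex fallback; only a few labels are covered for brevity.
FALLBACK_PATTERNS = {
    # Runs of five or more one-letter amino-acid codes, e.g. "GLPRKY".
    "sequence-like string": re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{5,}\b"),
    "protease": re.compile(
        r"\b(?:trypsin|chymotrypsin|pepsin|elastase|neprilysin)\b", re.IGNORECASE
    ),
    "degradation or stability term": re.compile(
        r"\b(?:half-life|degradation|proteolysis|stabilit\w*)\b", re.IGNORECASE
    ),
    "permeability term": re.compile(r"\b(?:permeab\w*|Caco-2)", re.IGNORECASE),
}


def load_gliner(model_id="urchade/gliner_large-v2"):
    """Return a GLiNER model if the gliner package is installed, else None."""
    try:
        from gliner import GLiNER
    except ImportError:
        return None
    return GLiNER.from_pretrained(model_id)


def extract_entities(text, model=None, threshold=0.5):
    """Use GLiNER when a model is supplied; otherwise apply the regex fallback."""
    if model is not None:
        hits = model.predict_entities(text, LABELS, threshold=threshold)
        return [
            {"text": h["text"], "label": h["label"],
             "start": h["start"], "end": h["end"], "source": "gliner"}
            for h in hits
        ]
    records = []
    for label, pattern in FALLBACK_PATTERNS.items():
        for match in pattern.finditer(text):
            records.append({"text": match.group(0), "label": label,
                            "start": match.start(), "end": match.end(),
                            "source": "regex"})
    # Order by character offset so the output reads like the source text.
    return sorted(records, key=lambda r: r["start"])
```

A caller would use `extract_entities(text, model=load_gliner())`. If `gliner` is not installed, `load_gliner` returns `None` and the regex path runs. On first use, `from_pretrained` may fetch weights from the Hugging Face Hub unless they are already cached locally. The sequence pattern also shows how a fallback can over-select. Any all-caps token of five or more amino-acid letters matches, including the assay acronym PAMPA.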

Where the output lives

data/processed/entities/entities_latest.jsonl
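This sketch assumes the file is standard JSON Lines, with one JSON object per line, and shows how a downstream consumer might load it. The helper names and the `label` field are hypothetical; the actual record schema is not documented here.

```python
import json
from collections import Counter
from pathlib import Path

ENTITIES_PATH = Path("data/processed/entities/entities_latest.jsonl")


def parse_entity_lines(lines):
    """Parse JSON Lines: one object per non-blank line, reporting the bad line on failure."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_no}: {exc.msg}") from exc
    return records


def load_entity_records(path=ENTITIES_PATH):
    """Read and parse the entity file from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_entity_lines(fh)


def count_by_label(records):
    """Tally records by the (hypothetical) 'label' field."""
    return Counter(r.get("label", "unknown") for r in records)
```

Parsing is separated from file access so that the same logic can run against any iterable of lines.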

Entity records help the retrieval layer and the paper generator identify relevant assay context, protease language, degradation signals, and failure vocabulary. They appear in candidate explanation context, not in the scoring contract.
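The retrieval-side use can be sketched as grouping entity surface forms into the context categories named above: assay context, protease language, degradation signals, and failure vocabulary. The label-to-slot mapping, the slot names, and the record fields are hypothetical. In this sketch, the result would be attached to a candidate's explanation and never passed to a scoring function.

```python
from collections import defaultdict

# Hypothetical mapping from entity labels to explanation-context slots.
CONTEXT_SLOTS = {
    "assay name": "assay_context",
    "protease": "protease_language",
    "degradation or stability term": "degradation_signals",
    "failure signal": "failure_vocabulary",
}


def build_explanation_context(entity_records):
    """Group entity text by slot, dropping case-insensitive duplicates.

    Labels without a slot (e.g. "organism") are ignored. The returned dict is
    explanation context only; nothing here computes or alters a score.
    """
    context = defaultdict(list)
    seen = defaultdict(set)
    for record in entity_records:
        slot = CONTEXT_SLOTS.get(record.get("label"))
        term = (record.get("text") or "").strip()
        if slot is None or not term:
            continue
        key = term.lower()
        if key in seen[slot]:
            continue
        seen[slot].add(key)
        context[slot].append(term)  # keeps first-seen spelling and order
    return dict(context)
```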

What entity extraction is not

Entity extraction can miss entities, over-select terms, or depend on local package support (the GLiNER runtime) being installed. It is a structuring aid, not a scientific claim engine. The extracted records influence retrieval and context; they do not influence validation, scoring, or learning.