Evaluating frontier models on biological reasoning

Benchmarking agentic reasoning on therapeutically-relevant modalities including proteins, small molecules, and antibodies.

30 tasks · 1/4 domains live · 2 models

Benchmark Results

Given proteins whose small-molecule binding residues are labelled, agents must infer the labelling rules and apply them to held-out chains. The rules cover which distance threshold defines a contact with a small molecule, which co-crystallised molecules are real ligands rather than crystallisation buffer, when a metal is a cofactor rather than a surface ion, how to handle sugars and modified amino acids, and whether every duplicate copy of a ligand contributes. Structures come from real PDB depositions. Difficulty tiers vary the number of labelled examples: 3–5 for extra-hard and 12–13 for medium-hard. The tasks are self-contained: every criterion is demonstrated by at least one positive example that the agent sees.
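To make the distance-threshold criterion concrete, here is a minimal Python sketch of how a contact rule could be applied once a cutoff has been chosen. It is an illustration only, not the benchmark's labelling code: the function name `binding_residues`, the `(chain, residue number)` keys, the toy coordinates, and the 4.0 Å default are all assumptions, and the actual threshold is one of the things an agent has to infer from the labelled examples.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Coord = Tuple[float, float, float]
ResidueId = Tuple[str, int]  # (chain ID, residue number); hypothetical key format


def binding_residues(
    residue_atoms: Dict[ResidueId, List[Coord]],
    ligand_atoms: Iterable[Coord],
    cutoff: float = 4.0,  # assumed value in angstroms, NOT the benchmark's threshold
) -> Set[ResidueId]:
    """Return residues with at least one atom within `cutoff` of any ligand atom."""
    ligand = list(ligand_atoms)
    contacts: Set[ResidueId] = set()
    for res_id, atoms in residue_atoms.items():
        # A residue counts as a contact if any atom pair is close enough.
        if any(math.dist(a, b) <= cutoff for a in atoms for b in ligand):
            contacts.add(res_id)
    return contacts


# Toy coordinates, invented for illustration (not taken from a PDB entry).
residues = {
    ("A", 10): [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    ("A", 11): [(10.0, 0.0, 0.0)],
    ("A", 12): [(0.0, 4.5, 0.0)],
}
ligand = [(0.0, 0.0, 3.0)]

strict = binding_residues(residues, ligand)             # default 4.0 cutoff
loose = binding_residues(residues, ligand, cutoff=5.5)  # a looser candidate cutoff
```

With these toy coordinates, residue ("A", 12) lies about 5.4 Å from the ligand atom, so it is excluded at 4.0 Å and included at 5.5 Å. Comparing candidate cutoffs against the labelled examples in this way is one route to inferring the threshold. The other criteria (buffer molecules, metals, sugars, duplicate ligand copies) would need separate filtering that this sketch does not attempt.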