Benchmarking agentic reasoning on therapeutically relevant modalities, including proteins, small molecules, and antibodies.
Given proteins with their small-molecule binding residues labelled, agents must infer the labelling rules and apply them to held-out chains: which distance threshold defines a contact with a small molecule, which co-crystallised molecules are real ligands vs … crystallisation buffer, when a metal is a cofactor vs a surface ion, how to handle sugars and modified amino acids, and whether every duplicate copy of a ligand contributes. Structures come from real PDB depositions. Difficulty tiers vary the number of labelled examples: 3–5 for extra-hard and 12–13 for medium-hard. The tasks are self-contained: every criterion is demonstrated by at least one positive example shown to the agent.
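
To make the kind of rule an agent has to recover more concrete, here is a minimal sketch of a distance-based contact labeller. It is an illustration only, not the benchmark's labelling procedure: the 4.0 Å cutoff, the set of excluded buffer codes, the function name `binding_residues`, and the input layout are all assumptions chosen for the example. In the actual tasks these values have to be inferred from the labelled examples.

```python
import math

# Hypothetical values for illustration; the benchmark expects the agent
# to infer the real cutoff and exclusion rules from labelled examples.
ASSUMED_CUTOFF_ANGSTROM = 4.0
ASSUMED_BUFFER_CODES = frozenset({"GOL", "EDO", "SO4", "PEG"})


def binding_residues(protein_atoms, hetero_atoms,
                     cutoff=ASSUMED_CUTOFF_ANGSTROM,
                     excluded=ASSUMED_BUFFER_CODES):
    """Return sorted residue ids with any atom within `cutoff` of a kept ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)) tuples, one per atom.
    hetero_atoms:  list of (het_code, (x, y, z)) tuples, one per atom.
    """
    # Drop hetero atoms whose component code is treated as buffer.
    ligand_coords = [xyz for code, xyz in hetero_atoms if code not in excluded]

    contacts = set()
    for residue_id, xyz in protein_atoms:
        # A residue counts as binding if any of its atoms is close enough
        # to any atom of any retained ligand copy.
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            contacts.add(residue_id)
    return sorted(contacts)


# Toy coordinates (not from a real structure).
protein = [(10, (0.0, 0.0, 0.0)), (11, (3.0, 0.0, 0.0)), (12, (10.0, 0.0, 0.0))]
hetero = [("LIG", (1.0, 0.0, 0.0)), ("GOL", (10.5, 0.0, 0.0))]

print(binding_residues(protein, hetero))                   # buffer excluded
print(binding_residues(protein, hetero, excluded=set()))   # buffer kept
```

Changing `cutoff` or `excluded` changes which residues are labelled, which is why the threshold and the ligand-versus-buffer decision both have to be read off the demonstrated examples rather than assumed.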