SMDD-Bench Leaderboard

A benchmark for LLM agents on small molecule drug design. Agents are given a sandboxed Python environment plus a structure-prediction tool (Boltz) and an ADMET predictor, then asked to solve tasks whose solutions are easy to verify computationally but hard to produce.


Full benchmark (502 tasks per run).

Agent	Pass rate	2D Pharmacophore Identification	Interaction Point Discovery	Scaffold Hopping	Lead Optimization	Fragment Assembly	Cost / task	Avg time/task
gpt-5.4_medium_minimalist_agent boltz=8 · admet=15	40.2%	12.0%	0.0%	3.8%	57.6%	1.7%	$0.78	23.4m
gemini-3.1-pro_medium_minimalist_agent boltz=8 · admet=15	39.0%	20.0%	4.0%	0.0%	55.6%	1.7%	$0.62	19.1m
claude-4.6-sonnet_medium_minimalist_agent boltz=8 · admet=15	38.0%	28.0%	0.0%	3.8%	53.5%	0.0%	$1.61	23.8m
kimi-k2.5-thinking_minimalist_agent boltz=8 · admet=15	30.3%	12.0%	0.0%	1.9%	43.5%	0.0%	$0.40	30.2m
qwen3.5-397b-a17b_minimalist_agent boltz=8 · admet=15	27.5%	4.0%	0.0%	1.9%	40.0%	0.0%	$0.75	18.6m
deepseek-v3.2_minimalist_agent boltz=8 · admet=15	24.3%	8.0%	0.0%	3.8%	34.7%	0.0%	$0.43	43.0m
minimax-m2.7_minimalist_agent boltz=8 · admet=15	19.3%	16.0%	0.0%	1.9%	27.1%	0.0%	$0.36	47.0m

About the benchmark

Each task is synthetically generated to be both challenging and guaranteed to be solvable. The benchmark is intended as a testbed for LLM capability under realistic chemistry constraints; the individual tasks themselves, however, may not be of significant therapeutic interest. Binding affinity values throughout SMDD-Bench are Boltz-2 predictions of log₁₀(IC₅₀); lower values indicate stronger binding.
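Because affinities are reported on a log scale where lower is better, it can help to convert them to the more familiar pIC₅₀ scale. The sketch below assumes, as the Boltz-2 documentation describes for its affinity output, that IC₅₀ is expressed in μM. The function names are illustrative and not part of SMDD-Bench.

```python
def pic50_from_log_ic50_um(log_ic50_um: float) -> float:
    """Convert log10(IC50 in uM) to pIC50 = -log10(IC50 in M).

    Assumes the IC50 unit is micromolar, so pIC50 = 6 - log10(IC50 [uM]).
    """
    return 6.0 - log_ic50_um


def ic50_nm_from_log_ic50_um(log_ic50_um: float) -> float:
    """Convert log10(IC50 in uM) to IC50 in nanomolar (1 uM = 1000 nM)."""
    return 1000.0 * 10.0 ** log_ic50_um


# A predicted value of -1.0 corresponds to IC50 = 0.1 uM = 100 nM, i.e. pIC50 = 7.
print(pic50_from_log_ic50_um(-1.0))               # 7.0
print(round(ic50_nm_from_log_ic50_um(-1.0), 6))   # 100.0
```

Under this convention, a 1-unit decrease in the reported value corresponds to a 10-fold lower IC₅₀ and a 1-unit increase in pIC₅₀.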

Type 1

2D Pharmacophore Identification

Given sets of active and inactive molecules for a protein target, write a Python predicate that distinguishes actives from inactives via 2D structural reasoning.
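A submitted predicate is easy to check: apply it to every labeled molecule and count how many actives it accepts and how many inactives it rejects. The sketch below assumes a predicate that takes a SMILES string and returns a bool; the exact interface and pass threshold SMDD-Bench uses are not specified here, and the toy molecules and the `"S(=O)(=O)N"` string test are purely illustrative. A real predicate would parse each molecule and use substructure matching (for example RDKit's `HasSubstructMatch`), since a raw string test depends on how each SMILES happens to be written.

```python
from typing import Callable, Iterable


def evaluate_predicate(
    predicate: Callable[[str], bool],
    actives: Iterable[str],
    inactives: Iterable[str],
) -> tuple[float, float]:
    """Return (fraction of actives accepted, fraction of inactives rejected)."""
    actives, inactives = list(actives), list(inactives)
    accepted = sum(1 for smi in actives if predicate(smi))
    rejected = sum(1 for smi in inactives if not predicate(smi))
    return accepted / len(actives), rejected / len(inactives)


# Toy predicate: "contains a sulfonamide", tested naively on the SMILES text.
def has_sulfonamide(smiles: str) -> bool:
    return "S(=O)(=O)N" in smiles


actives = ["c1ccccc1S(=O)(=O)N", "Cc1ccc(cc1)S(=O)(=O)N"]
inactives = ["c1ccccc1C(=O)N", "Cc1ccccc1"]
print(evaluate_predicate(has_sulfonamide, actives, inactives))  # (1.0, 1.0)
```

A predicate that perfectly separates the two sets scores 1.0 on both fractions; a trivial `lambda s: True` would score (1.0, 0.0).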

Type 2

Interaction Point Discovery

Given a protein pocket, propose the 3D coordinates of the three interaction points (e.g., hydrogen-bond donor, acceptor, or hydrophobic) most likely to be conserved across diverse binders.
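One way such a proposal could be scored is to look for a one-to-one pairing between predicted and reference points in which each pair shares an interaction type and lies within a distance tolerance. The sketch below is a hypothetical checker, not SMDD-Bench's scoring rule: the `(type, x, y, z)` tuple format, the type labels, and the 2.0 Å default tolerance are all assumptions.

```python
import math
from itertools import permutations

# Hypothetical point format: (interaction_type, x, y, z), coordinates in angstroms.
Point = tuple[str, float, float, float]


def points_match(pred: list[Point], ref: list[Point], tol: float = 2.0) -> bool:
    """True if some one-to-one pairing of pred with ref gives every pair the
    same interaction type and a Euclidean distance of at most tol."""
    if len(pred) != len(ref):
        return False
    # With three points there are only 3! = 6 pairings, so brute force is fine.
    for ordering in permutations(ref):
        if all(
            p[0] == r[0] and math.dist(p[1:], r[1:]) <= tol
            for p, r in zip(pred, ordering)
        ):
            return True
    return False


ref = [("donor", 0.0, 0.0, 0.0), ("acceptor", 3.0, 0.0, 0.0), ("hydrophobic", 0.0, 4.0, 0.0)]
pred = [("hydrophobic", 0.5, 3.5, 0.0), ("donor", 0.3, 0.2, 0.1), ("acceptor", 3.0, 1.0, 0.0)]
print(points_match(pred, ref))  # True
```

The brute-force search is only reasonable because the point count is fixed at three; a larger set would call for an assignment algorithm such as the Hungarian method.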

Type 3

Scaffold Hopping

Given an active molecule, propose a molecule with a chemically distinct scaffold that preserves the protein-ligand interactions and remains a binder.

Type 4

Lead Optimization

Modify a strong binder to improve sampled objectives (potency, ADMET, etc.) while satisfying hard constraints and holding other properties constant.
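The acceptance rule for a lead-optimization submission can be thought of as three separate checks. The sketch below illustrates this under stated assumptions and is not SMDD-Bench's actual grader: properties are plain numbers in a dict, each sampled objective must strictly improve in its stated direction, hard constraints are inclusive `(lower, upper)` bounds, and "held constant" means staying within an absolute tolerance of the lead's value. All property names and thresholds are hypothetical.

```python
from typing import Optional

Bounds = tuple[Optional[float], Optional[float]]


def accepts(
    lead: dict[str, float],
    cand: dict[str, float],
    objectives: dict[str, str],      # property -> "min" or "max"
    constraints: dict[str, Bounds],  # property -> (lower, upper); None = unbounded
    held: dict[str, float],          # property -> allowed absolute change
) -> bool:
    # 1. Every sampled objective must strictly improve over the lead.
    for prop, direction in objectives.items():
        if direction not in ("min", "max"):
            raise ValueError(f"unknown direction {direction!r} for {prop}")
        better = cand[prop] < lead[prop] if direction == "min" else cand[prop] > lead[prop]
        if not better:
            return False
    # 2. Every hard constraint must be satisfied by the candidate.
    for prop, (lo, hi) in constraints.items():
        if (lo is not None and cand[prop] < lo) or (hi is not None and cand[prop] > hi):
            return False
    # 3. Properties that should be held constant may drift only within tolerance.
    return all(abs(cand[p] - lead[p]) <= tol for p, tol in held.items())


lead = {"log_ic50": -0.5, "logp": 4.2, "mw": 410.0, "tpsa": 70.0}
cand = {"log_ic50": -1.1, "logp": 3.1, "mw": 428.0, "tpsa": 72.0}
print(accepts(lead, cand,
              objectives={"log_ic50": "min", "logp": "min"},
              constraints={"mw": (None, 500.0)},
              held={"tpsa": 5.0}))  # True
```

Keeping the three checks separate makes it easy to report which one a rejected candidate failed.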

Type 5

Fragment Assembly

Given 1–2 fragments with 3D poses in a pocket, design a single drug-like molecule that incorporates the fragments and binds the target.
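Because the fragments arrive with 3D poses in the pocket, one plausible check is that the fragment atoms in the assembled molecule's predicted pose stay close to where they started. This is an assumed criterion for illustration only. The sketch further assumes that both poses share the pocket's coordinate frame (so RMSD is computed directly, with no superposition), that the atom correspondence between fragment and assembled molecule was already found (e.g., by substructure matching), and that a 2.0 Å cutoff is appropriate; all of these are hypothetical.

```python
import math


def fragment_rmsd(
    fragment_xyz: list[tuple[float, float, float]],
    assembled_xyz: list[tuple[float, float, float]],
) -> float:
    """RMSD between matched fragment atoms, computed in the shared pocket frame
    (no alignment), where fragment_xyz[i] corresponds to assembled_xyz[i]."""
    if len(fragment_xyz) != len(assembled_xyz) or not fragment_xyz:
        raise ValueError("need equal-length, non-empty coordinate lists")
    squared = [math.dist(a, b) ** 2 for a, b in zip(fragment_xyz, assembled_xyz)]
    return math.sqrt(sum(squared) / len(squared))


def keeps_fragment_pose(fragment_xyz, assembled_xyz, cutoff: float = 2.0) -> bool:
    """Hypothetical pass rule: fragment RMSD must not exceed cutoff angstroms."""
    return fragment_rmsd(fragment_xyz, assembled_xyz) <= cutoff


frag = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.25, 1.25, 0.0)]
moved = [(x + 1.0, y, z) for x, y, z in frag]  # every atom shifted 1 angstrom along x
print(round(fragment_rmsd(frag, moved), 6))  # 1.0
```

Skipping superposition is deliberate here: aligning the two poses first would hide a fragment that had drifted out of its original position in the pocket.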

Diversity Leaderboard

The diversity benchmark resamples a 20-task subset of Lead Optimization in which every agent produces 10 submissions per task. It measures whether agents produce successful solutions that are diverse, distinct, and novel, rather than converging on a single solution (even a passing one).
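The Pairwise Tanimoto column presumably averages the similarity between an agent's submissions for a task. Tanimoto similarity between two binary fingerprints is |A ∩ B| / |A ∪ B|, so a lower average means more diverse outputs. The sketch below represents each fingerprint as a set of on-bit indices and also counts unique successful submissions by canonical SMILES. How SMDD-Bench builds fingerprints (type, radius, bit length) and canonicalizes molecules is not stated here; in practice both would typically come from a toolkit such as RDKit, and the toy data is made up.

```python
from itertools import combinations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """|A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # two empty fingerprints: 0.0 by convention here


def mean_pairwise_tanimoto(fps: list[frozenset[int]]) -> float:
    """Average Tanimoto similarity over all unordered pairs of fingerprints."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


def unique_successful(submissions: list[tuple[str, bool]]) -> int:
    """Count distinct canonical SMILES among submissions that passed."""
    return len({smiles for smiles, passed in submissions if passed})


fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 2, 3})]
print(round(mean_pairwise_tanimoto(fps), 4))  # 0.6667
```

Counting uniqueness on canonical SMILES matters: two differently written SMILES for the same molecule would otherwise be counted as distinct solutions.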

Diversity benchmark: 20 tasks × 10 rollouts = 200 submissions per agent.

Agent	Avg successful (of 10)	Avg unique & successful	Novel & successful	Pairwise Tanimoto (lower = more diverse)	Cost / task	Avg time/task
claude-4.6-sonnet_medium_minimalist_agent boltz=8 · admet=15	8.40	3.70	74.0%	0.823	$0.87	14.7m
gemini-3.1-pro_medium_minimalist_agent boltz=8 · admet=15	8.00	4.00	67.6%	0.809	$0.59	17.0m
gpt-5.4_medium_minimalist_agent boltz=8 · admet=15	7.90	2.75	64.6%	0.863	$0.66	18.4m
qwen3.5-397b-a17b_minimalist_agent boltz=8 · admet=15	7.25	3.55	67.2%	0.814	$0.72	15.3m
kimi-k2.5-thinking_minimalist_agent boltz=8 · admet=15	6.00	3.85	65.0%	0.786	$0.44	28.9m
minimax-m2.7_minimalist_agent boltz=8 · admet=15	6.00	4.05	73.1%	0.763	$0.34	47.7m
deepseek-v3.2_minimalist_agent boltz=8 · admet=15	5.35	3.85	68.4%	0.763	$0.47	59.6m

Citing SMDD-Bench

If you use SMDD-Bench in your work, please cite:

@misc{han2026smddbench,
      title={SMDD-Bench: Can LLMs Solve Real-World Small Molecule Drug Design Tasks?}, 
      author={Kevin Han and Renfei Zhang and Kathy Wei and Hamed Mahdavi and Niloofar Mireshghallah and Amir Farimani},
      year={2026},
      eprint={2605.21740},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2605.21740}, 
}