SMDD-Bench Leaderboard
A benchmark for LLM agents on small molecule drug design. Agents are given a sandboxed Python environment plus a structure-prediction tool (Boltz) and an ADMET predictor, then asked to solve tasks whose solutions are easy to verify computationally but hard to produce.
| Agent | Pass rate ▼ | 2D Pharmacophore Identification | Interaction Point Discovery | Scaffold Hopping | Lead Optimization | Fragment Assembly | Cost / task | Avg time |
|---|---|---|---|---|---|---|---|---|
| gpt-5.4_medium_minimalist_agent boltz=8 · admet=15 | 40.2% | 12.0% | 0.0% | 3.8% | 57.6% | 1.7% | $0.78 | 23.4m |
| gemini-3.1-pro_medium_minimalist_agent boltz=8 · admet=15 | 39.0% | 20.0% | 4.0% | 0.0% | 55.6% | 1.7% | $0.62 | 19.1m |
| claude-4.6-sonnet_medium_minimalist_agent boltz=8 · admet=15 | 38.0% | 28.0% | 0.0% | 3.8% | 53.5% | 0.0% | $1.61 | 23.8m |
| kimi-k2.5-thinking_minimalist_agent boltz=8 · admet=15 | 30.3% | 12.0% | 0.0% | 1.9% | 43.5% | 0.0% | $0.40 | 30.2m |
| qwen3.5-397b-a17b_minimalist_agent boltz=8 · admet=15 | 27.5% | 4.0% | 0.0% | 1.9% | 40.0% | 0.0% | $0.75 | 18.6m |
| deepseek-v3.2_minimalist_agent boltz=8 · admet=15 | 24.3% | 8.0% | 0.0% | 3.8% | 34.7% | 0.0% | $0.43 | 43.0m |
| minimax-m2.7_minimalist_agent boltz=8 · admet=15 | 19.3% | 16.0% | 0.0% | 1.9% | 27.1% | 0.0% | $0.36 | 47.0m |
About the benchmark
Each task is generated and synthesized to be both challenging and guaranteed-solvable. The benchmark is intended as a testbed for LLM capability under realistic chemistry constraints; the individual tasks themselves may not be of significant therapeutic interest. Binding affinity values throughout SMDD-Bench are Boltz-2 outputs of log10(IC50); lower is stronger.
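Because affinities are reported on a log10 scale, a difference of 1.0 corresponds to a tenfold change in predicted IC50. The sketch below illustrates that arithmetic. The function names are hypothetical and not part of SMDD-Bench, and the concentration unit behind the logarithm is not stated on this page, so the code leaves it unspecified.

```python
def ic50_from_log10(log_ic50: float) -> float:
    """Undo the log10 transform; the result is in whatever unit the log was taken in."""
    return 10.0 ** log_ic50


def is_stronger(candidate_log_ic50: float, reference_log_ic50: float) -> bool:
    """Lower log10(IC50) means a stronger predicted binder."""
    return candidate_log_ic50 < reference_log_ic50


def fold_improvement(candidate_log_ic50: float, reference_log_ic50: float) -> float:
    """Ratio reference IC50 / candidate IC50; values above 1 favor the candidate."""
    return 10.0 ** (reference_log_ic50 - candidate_log_ic50)


if __name__ == "__main__":
    # Illustrative numbers only, not benchmark results.
    print(is_stronger(-1.0, 1.0))
    print(fold_improvement(-1.0, 1.0))
```

A candidate at -1.0 against a reference at 1.0 differs by two log units, which is a hundredfold lower predicted IC50.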
2D Pharmacophore Identification
Given sets of active and inactive molecules for a protein target, write a Python predicate that distinguishes actives from inactives via 2D structural reasoning.
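To make the task shape concrete, here is a minimal sketch of how a submitted predicate might be scored against labeled molecules. Everything in it is an assumption for illustration: the benchmark's actual grading criteria are not described here, `score_predicate` is a hypothetical helper, and `toy_predicate` uses naive SMILES substring matching only to keep the example dependency-free. A realistic predicate would more likely use proper substructure matching from a cheminformatics toolkit.

```python
from typing import Callable, Dict, Iterable


def toy_predicate(smiles: str) -> bool:
    """Hypothetical stand-in: flag SMILES strings containing one sulfonamide spelling."""
    return "S(=O)(=O)N" in smiles


def score_predicate(
    predicate: Callable[[str], bool],
    actives: Iterable[str],
    inactives: Iterable[str],
) -> Dict[str, float]:
    """Report how well a predicate separates actives (True) from inactives (False)."""
    actives, inactives = list(actives), list(inactives)
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive molecule")
    true_pos = sum(1 for s in actives if predicate(s))
    true_neg = sum(1 for s in inactives if not predicate(s))
    return {
        "true_positive_rate": true_pos / len(actives),
        "true_negative_rate": true_neg / len(inactives),
        "accuracy": (true_pos + true_neg) / (len(actives) + len(inactives)),
    }


if __name__ == "__main__":
    # Made-up molecules, not taken from the benchmark.
    actives = ["CC1=CC=C(C=C1)S(=O)(=O)N", "NS(=O)(=O)c1ccccc1"]
    inactives = ["c1ccccc1", "CCO"]
    print(score_predicate(toy_predicate, actives, inactives))
```

The second active is also a sulfonamide, but its SMILES is written in a different atom order, so the substring test misses it. That fragility is the reason string matching is only a placeholder for real 2D structural reasoning.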
Interaction Point Discovery
Given a protein pocket, propose 3D coordinates of the three interaction points most likely conserved across diverse binders (donor/acceptor/hydrophobic, etc.).
Scaffold Hopping
Given an active molecule, propose a molecule with a chemically distinct scaffold that preserves the protein-ligand interactions and remains a binder.
Lead Optimization
Modify a strong binder to improve sampled objectives (potency, ADMET, etc.) while satisfying hard constraints and holding other properties constant.
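The three requirements in this task (improve the objectives, satisfy hard constraints, hold other properties constant) can be read as a simple pass/fail check. The sketch below is one possible reading, not the benchmark's grader: the property names, the `tol` tolerance for "constant", and the `passes_lead_opt` helper are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def passes_lead_opt(
    parent: Dict[str, float],
    candidate: Dict[str, float],
    objectives: Dict[str, str],                   # property -> "min" or "max"
    constraints: Dict[str, Tuple[float, float]],  # property -> (low, high), inclusive
    held: Iterable[str],                          # properties that must stay near the parent
    tol: float = 0.1,                             # assumed absolute tolerance for "constant"
) -> bool:
    """Return True if the candidate improves every objective and violates nothing."""
    for name, direction in objectives.items():
        delta = candidate[name] - parent[name]
        if direction == "min":
            if delta >= 0:
                return False
        elif direction == "max":
            if delta <= 0:
                return False
        else:
            raise ValueError(f"unknown direction for {name!r}: {direction!r}")
    for name, (low, high) in constraints.items():
        if not low <= candidate[name] <= high:
            return False
    return all(abs(candidate[name] - parent[name]) <= tol for name in held)


if __name__ == "__main__":
    # Illustrative property values only.
    parent = {"log_ic50": -0.5, "logp": 3.0, "mw": 420.0}
    candidate = {"log_ic50": -1.2, "logp": 3.05, "mw": 435.0}
    print(passes_lead_opt(parent, candidate, {"log_ic50": "min"}, {"mw": (0.0, 500.0)}, ["logp"]))
```

In this reading, a candidate that gains potency but drifts too far on a held property fails, which matches the intent that improvements should not come at the cost of the rest of the profile.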
Fragment Assembly
Given 1–2 fragments with 3D poses in a pocket, design a single drug-like molecule that incorporates the fragments and binds the target.
Diversity Leaderboard
The diversity benchmark resamples a 20-task subset of Lead Optimization, with every model producing 10 submissions per task. We measure whether agents produce diverse, distinct, and novel successful solutions rather than converging on a single solution, even a passing one.
| Agent | Avg successful ▼ | Avg unique & successful | Novel & successful | Pairwise Tanimoto | Cost / task | Avg time |
|---|---|---|---|---|---|---|
| claude-4.6-sonnet_medium_minimalist_agent boltz=8 · admet=15 | 8.40 | 3.70 | 74.0% | 0.823 | $0.87 | 14.7m |
| gemini-3.1-pro_medium_minimalist_agent boltz=8 · admet=15 | 8.00 | 4.00 | 67.6% | 0.809 | $0.59 | 17.0m |
| gpt-5.4_medium_minimalist_agent boltz=8 · admet=15 | 7.90 | 2.75 | 64.6% | 0.863 | $0.66 | 18.4m |
| qwen3.5-397b-a17b_minimalist_agent boltz=8 · admet=15 | 7.25 | 3.55 | 67.2% | 0.814 | $0.72 | 15.3m |
| kimi-k2.5-thinking_minimalist_agent boltz=8 · admet=15 | 6.00 | 3.85 | 65.0% | 0.786 | $0.44 | 28.9m |
| minimax-m2.7_minimalist_agent boltz=8 · admet=15 | 6.00 | 4.05 | 73.1% | 0.763 | $0.34 | 47.7m |
| deepseek-v3.2_minimalist_agent boltz=8 · admet=15 | 5.35 | 3.85 | 68.4% | 0.763 | $0.47 | 59.6m |
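For readers unfamiliar with the Pairwise Tanimoto column, the sketch below shows the standard Tanimoto similarity on sets of fingerprint bits and its mean over all pairs of submissions. How the leaderboard actually computes the column (fingerprint type, which submissions are included) is not stated here, so the set-of-integers fingerprints and both function names are assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(union)


def mean_pairwise_tanimoto(fingerprints: Sequence[FrozenSet[int]]) -> float:
    """Average similarity over every unordered pair of fingerprints."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy fingerprints: each set holds the indices of the on-bits.
    submissions = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 2, 3})]
    print(len(set(submissions)))  # number of distinct fingerprints
    print(mean_pairwise_tanimoto(submissions))
```

Under this definition, a value near 1 means the submissions are close to identical, so a lower mean pairwise similarity indicates a more diverse set.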
Citing SMDD-Bench
If you use SMDD-Bench in your work, please cite it as follows:
@misc{smddbench2026,
title = {SMDD-Bench: A Small Molecule Drug Design Benchmark for LLM Agents},
author = {Your Name and Collaborators},
year = {2026},
eprint = {arXiv:xxxx.xxxxx},
archivePrefix = {arXiv},
url = {https://smddbench.com},
}