SMDD-Bench

Trace Viewer

Inspect what each model did on a specific task: pick a run, pick a task, and view the agent transcript and trajectory side by side with the evaluation verdict.
