Trace Viewer
Inspect what each model did on a specific task: pick a run, pick a task, and view the agent transcript and trajectory side by side with the eval verdict.