Leaderboard

This page shows detailed model performance on the Balsam benchmark in two display modes: a table and charts.

Customize View

Use the filter options to customize the results. With no filters selected, the page shows overall model performance.
When you select categories or tasks, the table and charts update automatically to reflect the selection.

Select Metrics Type
Select Models
Select Categories
Select Tasks
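
The filtering behavior described above can be illustrated with a minimal Python sketch. It is not the leaderboard's actual implementation: the row layout, the `average_scores` function, and the use of a plain arithmetic mean are all assumptions made for illustration, and the sample rows are made-up values, not Balsam results.

```python
from statistics import mean


def average_scores(rows, categories=None, tasks=None):
    """Return each model's mean score over the rows that pass the filters.

    rows: iterable of dicts with keys "model", "category", "task", "score"
    (a hypothetical layout, not the benchmark's actual schema).
    An empty or missing filter means "no restriction", which corresponds
    to the overall view.
    """
    selected = {}
    for row in rows:
        # Skip rows outside the selected categories, if any are selected.
        if categories and row["category"] not in categories:
            continue
        # Skip rows outside the selected tasks, if any are selected.
        if tasks and row["task"] not in tasks:
            continue
        selected.setdefault(row["model"], []).append(row["score"])
    return {model: mean(scores) for model, scores in selected.items()}


# Made-up rows for illustration only; not real benchmark results.
rows = [
    {"model": "model-a", "category": "cat-1", "task": "task-1", "score": 0.5},
    {"model": "model-a", "category": "cat-2", "task": "task-2", "score": 1.0},
    {"model": "model-b", "category": "cat-1", "task": "task-1", "score": 0.25},
]

overall = average_scores(rows)                          # no filters: overall view
filtered = average_scores(rows, categories={"cat-1"})   # one category selected
```

With no filters, every row contributes to a model's average; once a category is selected, only matching rows are averaged, which is why the table and charts change with the selection.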

Overall Performance

Average score comparison across all models