What is this leaderboard about?
BALSAM's leaderboard shows each model's average score across all tasks in the benchmark. This average is computed from the model's individual task scores.
Each row in the leaderboard represents a model, and each column represents a task. You can compare the performance of different models on different tasks using the filter menus.
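As a rough illustration of how an average score could be derived from per-task scores, here is a minimal Python sketch. It assumes an unweighted arithmetic mean, which may not be how BALSAM actually aggregates. The model names, task names, and scores are made up for the example, and `average_score` and `build_leaderboard` are hypothetical helpers, not part of any BALSAM tooling.

```python
from statistics import mean

# Hypothetical per-task scores in [0, 1]; model and task names are illustrative only.
scores = {
    "model_a": {"summarization": 0.50, "translation": 0.75, "qa": 0.25},
    "model_b": {"summarization": 0.75, "translation": 0.50, "qa": 1.00},
}


def average_score(task_scores):
    """Unweighted mean over tasks (an assumption; the real aggregation may differ)."""
    return mean(task_scores.values())


def build_leaderboard(all_scores):
    """Return (model, average) rows sorted from highest to lowest average."""
    rows = [(model, average_score(tasks)) for model, tasks in all_scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


leaderboard = build_leaderboard(scores)
for model, avg in leaderboard:
    print(f"{model}: {avg:.2f}")
```

Each inner dictionary corresponds to one row of the leaderboard, with one entry per task column, so the average is simply a summary of that row.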
What does the score mean?
Each score is computed with a metric specific to its task, so different tasks may use different metrics. Every score is a number between 0 and 1, where 1 is the best possible score.