Leaderboard

What is this leaderboard about?

BALSAM's leaderboard shows each model's average score across all tasks in the benchmark. The average is computed from the model's individual task scores.
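
As a rough illustration of how such an average could be derived, here is a minimal sketch that assumes an unweighted arithmetic mean over the per-task scores. The leaderboard text does not say whether tasks are weighted, so the formula, the function name `average_score`, and the example scores are all assumptions made for illustration.

```python
from statistics import mean


def average_score(task_scores: dict[str, float]) -> float:
    """Return the unweighted mean of per-task scores.

    Assumption: every task counts equally. This is not confirmed
    to be the formula the leaderboard itself uses.
    """
    if not task_scores:
        raise ValueError("at least one task score is required")
    return mean(task_scores.values())


# Hypothetical scores, for illustration only (not real leaderboard data).
example_scores = {"Entailment": 0.5, "Logic": 0.25, "Summarization": 0.75}
example_average = average_score(example_scores)
```

If the benchmark weights tasks differently, for example by number of test items, the mean would need matching weights.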


Each row in the leaderboard represents a model. The columns show the model's average score, followed by its score on each task. Use the filter menus to compare models across tasks.
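
The same kind of comparison can be done offline on exported rows. The sketch below assumes each row is a dictionary keyed by column name; the model names, the scores, and the helper `rank_models` are hypothetical and are not part of the leaderboard.

```python
def rank_models(rows: list[dict], column: str) -> list[dict]:
    """Sort leaderboard rows by one column, best score first."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


# Hypothetical rows, for illustration only (not real leaderboard data).
rows = [
    {"model": "model-a", "Logic": 0.40, "Summarization": 0.70},
    {"model": "model-b", "Logic": 0.55, "Summarization": 0.60},
]

ranked_by_logic = rank_models(rows, "Logic")
ranked_by_summarization = rank_models(rows, "Summarization")
```

Ranking by one task at a time makes it visible when a model that leads on one task trails on another.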


What does the score mean?

Each task is scored with its own metric, so the metric can differ from one task to the next. Every score is a number between 0 and 1, where 1 is the best possible score.

model | average | Creative Writing | Entailment | Factuality | Fill in the Blank | Information Extraction | Logic | Machine Translation | Program Execution | Question Answering | Reading Comprehension | Sequence Tagging | Summarization | Text Classification | Text Manipulation