This page provides a detailed view of model performance in the Balsam benchmark, with two main display modes: a table and visual charts.
You can customize the results using filter options. If no filters are selected, the overall model performance is displayed.
When categories or tasks are selected, the table and charts update automatically based on your selection.
| Model | Average score | Creative Writing | Entailment | Fill in the Blank | Information Extraction | Logic | Program Execution | Question Answering | Reading Comprehension | Sequence Tagging | Summarization | Text Classification | Text Manipulation | Translation/Transliteration |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | 0.684 | 0.645 | 0.714 | 0.614 | 0.698 | 0.664 | 0.598 | 0.715 | 0.756 | 0.688 | 0.662 | 0.734 | 0.666 | 0.738 |
| | 0.682 | 0.627 | 0.714 | 0.504 | 0.709 | 0.659 | 0.845 | 0.702 | 0.729 | 0.635 | 0.643 | 0.625 | 0.671 | 0.796 |
| | 0.678 | 0.649 | 0.667 | 0.572 | 0.707 | 0.656 | 0.721 | 0.727 | 0.774 | 0.658 | 0.663 | 0.651 | 0.655 | 0.713 |
| | 0.674 | 0.669 | 0.762 | 0.565 | 0.719 | 0.679 | 0.512 | 0.718 | 0.756 | 0.662 | 0.666 | 0.629 | 0.673 | 0.747 |
| | 0.668 | 0.579 | 0.738 | 0.542 | 0.685 | 0.646 | 0.767 | 0.681 | 0.718 | 0.636 | 0.658 | 0.664 | 0.668 | 0.705 |
| | 0.668 | 0.612 | 0.691 | 0.497 | 0.676 | 0.631 | 0.794 | 0.695 | 0.742 | 0.636 | 0.656 | 0.571 | 0.710 | 0.771 |
| | 0.661 | 0.639 | 0.691 | 0.500 | 0.685 | 0.701 | 0.712 | 0.704 | 0.760 | 0.588 | 0.660 | 0.666 | 0.614 | 0.673 |
| | 0.648 | 0.615 | 0.619 | 0.523 | 0.659 | 0.646 | 0.777 | 0.672 | 0.754 | 0.604 | 0.655 | 0.686 | 0.617 | 0.597 |
| | 0.615 | 0.601 | 0.643 | 0.512 | 0.643 | 0.572 | 0.474 | 0.679 | 0.734 | 0.587 | 0.638 | 0.633 | 0.631 | 0.643 |
| | 0.606 | 0.615 | 0.643 | 0.490 | 0.618 | 0.639 | 0.618 | 0.632 | 0.692 | 0.580 | 0.645 | 0.581 | 0.586 | 0.539 |
| | 0.598 | 0.558 | 0.643 | 0.407 | 0.629 | 0.558 | 0.665 | 0.619 | 0.724 | 0.488 | 0.655 | 0.538 | 0.630 | 0.665 |
| | 0.596 | 0.585 | 0.762 | 0.471 | 0.644 | 0.583 | 0.531 | 0.617 | 0.737 | 0.547 | 0.664 | 0.572 | 0.523 | 0.514 |
| | 0.592 | 0.578 | 0.595 | 0.395 | 0.642 | 0.534 | 0.558 | 0.643 | 0.723 | 0.538 | 0.613 | 0.528 | 0.611 | 0.743 |
| | 0.583 | 0.578 | 0.691 | 0.462 | 0.580 | 0.539 | 0.440 | 0.627 | 0.641 | 0.546 | 0.596 | 0.537 | 0.655 | 0.693 |
| | 0.581 | 0.586 | 0.714 | 0.398 | 0.624 | 0.562 | 0.436 | 0.613 | 0.737 | 0.497 | 0.628 | 0.532 | 0.634 | 0.593 |
| | 0.530 | 0.553 | 0.524 | 0.380 | 0.548 | 0.552 | 0.434 | 0.536 | 0.754 | 0.440 | 0.597 | 0.551 | 0.467 | 0.549 |
| | 0.522 | 0.507 | 0.595 | 0.316 | 0.551 | 0.509 | 0.514 | 0.576 | 0.694 | 0.438 | 0.563 | 0.376 | 0.562 | 0.586 |
| | 0.495 | 0.502 | 0.405 | 0.451 | 0.538 | 0.533 | 0.499 | 0.568 | 0.647 | 0.421 | 0.518 | 0.493 | 0.392 | 0.465 |
| | 0.484 | 0.503 | 0.571 | 0.257 | 0.572 | 0.449 | 0.309 | 0.576 | 0.676 | 0.428 | 0.605 | 0.323 | 0.502 | 0.515 |
| | 0.314 | 0.228 | 0.309 | 0.102 | 0.359 | 0.274 | 0.322 | 0.398 | 0.488 | 0.371 | 0.262 | 0.384 | 0.217 | 0.364 |
| | - | - | 0.000 | 0.018 | - | 0.018 | - | 0.044 | 0.075 | 0.016 | 0.061 | 0.056 | 0.121 | - |
| | - | 0.557 | 0.643 | 0.365 | 0.609 | 0.552 | 0.649 | 0.609 | 0.647 | 0.425 | 0.636 | 0.506 | 0.571 | - |
| | - | 0.585 | 0.619 | 0.253 | 0.527 | 0.518 | 0.789 | 0.548 | 0.725 | 0.342 | 0.611 | 0.336 | 0.519 | - |
| | - | 0.110 | 0.333 | 0.833 | 0.410 | 0.143 | 0.088 | - | 0.626 | 0.527 | 0.279 | 0.194 | 0.119 | 0.223 |
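
The Average score column appears consistent with an unweighted mean of the 13 category scores: the category scores in the first row average to 0.684, and rows with a missing category show `-` as their average. The sketch below illustrates that reading, along with how a category filter might narrow the average to the selected columns. The function name `average_score` and its `selected` parameter are hypothetical, and the aggregation is inferred from the numbers in the table, not taken from the benchmark's documentation.

```python
from statistics import mean

CATEGORIES = [
    "Creative Writing", "Entailment", "Fill in the Blank",
    "Information Extraction", "Logic", "Program Execution",
    "Question Answering", "Reading Comprehension", "Sequence Tagging",
    "Summarization", "Text Classification", "Text Manipulation",
    "Translation/Transliteration",
]


def average_score(scores, selected=None):
    """Unweighted mean over the selected categories (all if none selected).

    Returns None when any needed category is missing, mirroring the "-"
    shown in the table. This rule is an assumption inferred from the data.
    """
    names = selected or CATEGORIES
    values = [scores[name] for name in names]
    if any(value is None for value in values):
        return None
    return round(mean(values), 3)


# Category scores copied from the first and last rows of the table.
first_row = dict(zip(CATEGORIES, [
    0.645, 0.714, 0.614, 0.698, 0.664, 0.598, 0.715,
    0.756, 0.688, 0.662, 0.734, 0.666, 0.738,
]))
last_row = dict(zip(CATEGORIES, [
    0.110, 0.333, 0.833, 0.410, 0.143, 0.088, None,
    0.626, 0.527, 0.279, 0.194, 0.119, 0.223,
]))

print(average_score(first_row))                                  # 0.684
print(average_score(first_row, ["Logic", "Program Execution"]))  # 0.631
print(average_score(last_row))                                   # None
```

Some other rows differ from this mean by 0.001 (the second row computes to 0.681 against a listed 0.682), which would be expected if the displayed category scores are themselves rounded.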