Leaderboard

This page provides a detailed view of model performance on the Balsam benchmark in two display modes: a table and visual charts.

Customize View

You can customize the results using filter options. If no filters are selected, the overall model performance is displayed.
When categories or tasks are selected, the table and charts update automatically based on your selection.

Select Metrics Type
Select Models
Select Categories
Select Tasks
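
The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the site's implementation: the function name `filtered_view` is hypothetical, and the sketch assumes that the average score is the unweighted mean of the category scores (which matches the first row of the Overall Performance table) and that a filtered view recomputes the average over the selected categories only.

```python
from statistics import mean

# Category scores copied from the first row of the Overall Performance table.
ROW = {
    "Creative Writing": 0.645,
    "Entailment": 0.714,
    "Fill in the Blank": 0.614,
    "Information Extraction": 0.698,
    "Logic": 0.664,
    "Program Execution": 0.598,
    "Question Answering": 0.715,
    "Reading Comprehension": 0.756,
    "Sequence Tagging": 0.688,
    "Summarization": 0.662,
    "Text Classification": 0.734,
    "Text Manipulation": 0.666,
    "Translation/Transliteration": 0.738,
}


def filtered_view(scores, selected=None):
    """Hypothetical helper: return (average, kept scores) for a selection.

    With no selection, every category is used. Categories whose score is
    missing (None) are left out of the average.
    """
    names = [c for c in scores if not selected or c in selected]
    kept = {c: scores[c] for c in names if scores[c] is not None}
    # Assumption: the average is an unweighted mean, rounded to 3 decimals.
    avg = round(mean(kept.values()), 3) if kept else None
    return avg, kept


overall_avg, _ = filtered_view(ROW)
print(overall_avg)  # 0.684, matching the Average score shown for this row

subset_avg, subset = filtered_view(ROW, {"Logic", "Program Execution"})
print(subset_avg, subset)
```

Calling the helper without a selection reproduces the overall average for the row. Passing a set of category names narrows both the returned scores and the average to that selection.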

Overall Performance

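| Model | Average score | Creative Writing | Entailment | Fill in the Blank | Information Extraction | Logic | Program Execution | Question Answering | Reading Comprehension | Sequence Tagging | Summarization | Text Classification | Text Manipulation | Translation/Transliteration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 0.684 | 0.645 | 0.714 | 0.614 | 0.698 | 0.664 | 0.598 | 0.715 | 0.756 | 0.688 | 0.662 | 0.734 | 0.666 | 0.738 |
|  | 0.682 | 0.627 | 0.714 | 0.504 | 0.709 | 0.659 | 0.845 | 0.702 | 0.729 | 0.635 | 0.643 | 0.625 | 0.671 | 0.796 |
|  | 0.678 | 0.649 | 0.667 | 0.572 | 0.707 | 0.656 | 0.721 | 0.727 | 0.774 | 0.658 | 0.663 | 0.651 | 0.655 | 0.713 |
|  | 0.674 | 0.669 | 0.762 | 0.565 | 0.719 | 0.679 | 0.512 | 0.718 | 0.756 | 0.662 | 0.666 | 0.629 | 0.673 | 0.747 |
|  | 0.668 | 0.579 | 0.738 | 0.542 | 0.685 | 0.646 | 0.767 | 0.681 | 0.718 | 0.636 | 0.658 | 0.664 | 0.668 | 0.705 |
|  | 0.668 | 0.612 | 0.691 | 0.497 | 0.676 | 0.631 | 0.794 | 0.695 | 0.742 | 0.636 | 0.656 | 0.571 | 0.710 | 0.771 |
|  | 0.661 | 0.639 | 0.691 | 0.500 | 0.685 | 0.701 | 0.712 | 0.704 | 0.760 | 0.588 | 0.660 | 0.666 | 0.614 | 0.673 |
|  | 0.648 | 0.615 | 0.619 | 0.523 | 0.659 | 0.646 | 0.777 | 0.672 | 0.754 | 0.604 | 0.655 | 0.686 | 0.617 | 0.597 |
|  | 0.615 | 0.601 | 0.643 | 0.512 | 0.643 | 0.572 | 0.474 | 0.679 | 0.734 | 0.587 | 0.638 | 0.633 | 0.631 | 0.643 |
|  | 0.606 | 0.615 | 0.643 | 0.490 | 0.618 | 0.639 | 0.618 | 0.632 | 0.692 | 0.580 | 0.645 | 0.581 | 0.586 | 0.539 |
|  | 0.598 | 0.558 | 0.643 | 0.407 | 0.629 | 0.558 | 0.665 | 0.619 | 0.724 | 0.488 | 0.655 | 0.538 | 0.630 | 0.665 |
|  | 0.596 | 0.585 | 0.762 | 0.471 | 0.644 | 0.583 | 0.531 | 0.617 | 0.737 | 0.547 | 0.664 | 0.572 | 0.523 | 0.514 |
|  | 0.592 | 0.578 | 0.595 | 0.395 | 0.642 | 0.534 | 0.558 | 0.643 | 0.723 | 0.538 | 0.613 | 0.528 | 0.611 | 0.743 |
|  | 0.583 | 0.578 | 0.691 | 0.462 | 0.580 | 0.539 | 0.440 | 0.627 | 0.641 | 0.546 | 0.596 | 0.537 | 0.655 | 0.693 |
|  | 0.581 | 0.586 | 0.714 | 0.398 | 0.624 | 0.562 | 0.436 | 0.613 | 0.737 | 0.497 | 0.628 | 0.532 | 0.634 | 0.593 |
|  | 0.530 | 0.553 | 0.524 | 0.380 | 0.548 | 0.552 | 0.434 | 0.536 | 0.754 | 0.440 | 0.597 | 0.551 | 0.467 | 0.549 |
|  | 0.522 | 0.507 | 0.595 | 0.316 | 0.551 | 0.509 | 0.514 | 0.576 | 0.694 | 0.438 | 0.563 | 0.376 | 0.562 | 0.586 |
|  | 0.495 | 0.502 | 0.405 | 0.451 | 0.538 | 0.533 | 0.499 | 0.568 | 0.647 | 0.421 | 0.518 | 0.493 | 0.392 | 0.465 |
|  | 0.484 | 0.503 | 0.571 | 0.257 | 0.572 | 0.449 | 0.309 | 0.576 | 0.676 | 0.428 | 0.605 | 0.323 | 0.502 | 0.515 |
|  | 0.314 | 0.228 | 0.309 | 0.102 | 0.359 | 0.274 | 0.322 | 0.398 | 0.488 | 0.371 | 0.262 | 0.384 | 0.217 | 0.364 |
|  | - | - | 0.000 | 0.018 | - | 0.018 | - | 0.044 | 0.075 | 0.016 | 0.061 | 0.056 | 0.121 | - |
|  | - | 0.557 | 0.643 | 0.365 | 0.609 | 0.552 | 0.649 | 0.609 | 0.647 | 0.425 | 0.636 | 0.506 | 0.571 | - |
|  | - | 0.585 | 0.619 | 0.253 | 0.527 | 0.518 | 0.789 | 0.548 | 0.725 | 0.342 | 0.611 | 0.336 | 0.519 | - |
|  | - | 0.110 | 0.333 | 0.833 | 0.410 | 0.143 | 0.088 | - | 0.626 | 0.527 | 0.279 | 0.194 | 0.119 | 0.223 |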