BALSAM is a visionary collaboration among esteemed academic and governmental institutions across the Middle East. BALSAM’s objective is to spearhead the development and curation of domain-specific test datasets crucial for benchmarking and evaluating the performance of LLMs on a broad variety of Arabic NLP tasks.
10+
Organizations
50,000+
Questions
67
Language tasks
1,000+
Datasets
Pooling resources and expertise to create high-quality datasets tailored for AI testing, covering diverse domains and various Arabic dialects to enhance the robustness and versatility of LLMs.
Establishing standardized evaluation frameworks and benchmarks to rigorously assess the performance of LLMs developed by consortium members, facilitating transparent comparisons and driving continuous improvement.
See the latest benchmark results for the top Arabic LLMs.
Prioritizing ethical considerations and responsible AI practices throughout the development process, ensuring fairness, transparency, and accountability in AI models and applications.
Bringing the Arabic NLP community together to craft a common vision and to build common datasets and benchmarks.