    Breaking AI News
    Technology & Innovation

    Stanford CRFM partners with Arabic AI on HELM Arabic leaderboard

    By Art Ryan, January 29, 2026

    Stanford University's Center for Research on Foundation Models (CRFM) has partnered with Arabic.AI to launch HELM Arabic, a public leaderboard that measures the performance of Arabic large language models on standardized Arabic benchmarks. The platform extends the existing HELM evaluation framework to Arabic-language tasks.

    Stanford CRFM Collaborates with Arabic.AI on HELM Arabic

    The HELM Arabic leaderboard evaluates models on seven Arabic-language benchmarks: AlGhafa, ArabicMMLU, Arabic EXAMS, MadinahQA, AraTrust, ALRAGE, and ArbMMLU-HT. All are drawn from established Arabic datasets, and each measures different language capabilities. Together they cover multiple-choice reasoning, question answering, grammar understanding, safety evaluation, and academic knowledge.

    Evaluation Methodology Used by Stanford CRFM and Arabic.AI

    The leaderboard uses a standardized evaluation process. Instruction-tuned models are prompted zero-shot. Multiple-choice tasks label their options with Arabic letters rather than Latin characters. The evaluation samples 1,000 examples per task subset to balance dataset distributions, and optional reasoning features are disabled so that all models are compared under consistent settings. Full model prompts and outputs are recorded to support reproducibility.
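
    The sampling and option-labelling steps described above can be sketched roughly as follows. This is a minimal illustration, not the HELM Arabic implementation: the function names, the fixed seed, the specific Arabic letters, and the prompt layout are all assumptions, since the article does not specify them.

```python
import random

# Assumed option labels; the article only says Arabic letters are used
# instead of Latin ones, not which letters or in what order.
ARABIC_OPTION_LABELS = ["أ", "ب", "ج", "د", "ه"]


def sample_subset(examples, max_examples=1000, seed=0):
    """Return at most max_examples items, drawn without replacement.

    Subsets smaller than the cap are returned unchanged (an assumption).
    The fixed seed keeps the sample reproducible between runs.
    """
    examples = list(examples)
    if len(examples) <= max_examples:
        return examples
    rng = random.Random(seed)
    return rng.sample(examples, max_examples)


def format_multiple_choice(question, options):
    """Build a zero-shot prompt whose options carry Arabic letter labels."""
    if len(options) > len(ARABIC_OPTION_LABELS):
        raise ValueError("more options than available Arabic labels")
    lines = [question]
    for label, option in zip(ARABIC_OPTION_LABELS, options):
        lines.append(f"{label}. {option}")
    return "\n".join(lines)
```

    Capping every subset at the same size keeps a large subset from dominating a task's score, which is one plausible reading of "balance dataset distributions".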

    Model Rankings and Benchmark Results

    In the initial HELM Arabic results, Arabic.AI LLM-X (Pronoia) achieved the highest overall score across all seven tasks. Among open-weights models, Qwen3 235B ranked highest by mean score. Other open-weights models appearing in the top ten include Llama 4 Maverick, Qwen3-Next 80B, and DeepSeek v3.1. Several Arabic-focused models, such as AceGPT-v2, ALLaM, JAIS, and SILMA, were evaluated but did not rank above leading multilingual models.
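
    Ranking by mean score, as mentioned above, can be sketched as below. The helper name, the unweighted average across tasks, and every model name and number in the example are placeholders chosen for illustration; none of them are actual HELM Arabic results or a confirmed description of how the leaderboard aggregates scores.

```python
from statistics import mean


def rank_by_mean_score(scores_by_model):
    """Sort models by the unweighted mean of their per-task scores, best first."""
    means = {
        model: mean(task_scores.values())
        for model, task_scores in scores_by_model.items()
    }
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Placeholder values only, NOT real leaderboard data.
placeholder_scores = {
    "model_a": {"task_1": 0.80, "task_2": 0.60},
    "model_b": {"task_1": 0.70, "task_2": 0.90},
}

ranking = rank_by_mean_score(placeholder_scores)
for model, score in ranking:
    print(f"{model}: {score:.2f}")
```

    With an unweighted mean, a model that leads on one task can still rank below a model that is more consistent across all of them.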

    Purpose of the HELM Arabic Platform

    The collaboration aims to address gaps in Arabic model evaluation infrastructure. HELM Arabic provides a transparent system for comparing proprietary and open models on the same benchmarks, and it allows researchers to replicate results and track progress in Arabic language modeling over time.

    Source: https://www.middleeastainews.com/p/stanford-crfm-collabs-with-arabic
