In this presentation, we’ll cover how LLM evaluation is done in practice, from the theoretical basics to the practical details you should pay attention to when evaluating, based on the evaluation work done at Hugging Face. We’ll wrap up with an overview of the most relevant recent evaluation datasets.
Short Bio:
Dr Clémentine Fourrier is the head of evaluation at Hugging Face. Her team has worked on improving LLM evaluation through several initiatives, such as the Open LLM Leaderboard (evaluation at scale: 13K models over 2 years), the LLM Evaluation Guidebook (a free online resource), and lighteval (an LLM evaluation library). The team has also taken part in the creation of several datasets, such as GAIA and Global-MMLU.