In this presentation, we’ll cover how LLM evaluation is done in practice, from the theoretical basics to the practical details you should pay attention to when evaluating, based on the evaluation work done at Hugging Face. We’ll wrap up with an overview of the most relevant recent evaluation datasets.
Short Bio:
Dr Clémentine Fourrier is the head of evaluation at Hugging Face. Her team has worked on improving LLM evaluation through several initiatives, such as the Open LLM Leaderboard (evaluation at scale: 13K models over 2 years), the LLM Evaluation Guidebook (a free online resource), and lighteval (an LLM evaluation library). The team has also taken part in the creation of several datasets, such as GAIA and Global-MMLU.