From the course: AI Orchestration: Validation and User Feedback and Performance Metrics
Evaluating large language models (LLMs)
- [Instructor] We've discussed the metrics we'd use to evaluate traditional machine learning models. How would we evaluate large language models? But first, let's talk about why evaluation of LLMs is even necessary. LLMs are increasingly used in applications that directly impact users, such as chatbots, content creation tools, and even medical diagnosis systems. This means it's very important to ensure that the outputs of these models are accurate, relevant to what the user wants, and safe for end users. This is especially important because LLMs can be used to influence opinions, disseminate information, and even spread propaganda. We've already discussed that LLMs are prone to hallucinations. They can sometimes generate incorrect or nonsensical outputs, and they can exhibit biases present in the vast data sets they are trained on. Evaluating LLMs helps detect these issues, such as hallucinations, biases, or inconsistencies in responses. Evaluation also helps align model…
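To make the idea of checking an LLM's output concrete, here is a minimal sketch of a reference-based evaluation. It scores a model's answer by unigram overlap with a trusted reference answer; this is a deliberately simplified toy score, not any specific metric from the course (ROUGE, BLEU, and others are more nuanced), and the function name and examples are illustrative assumptions.

```python
from collections import Counter


def overlap_f1(candidate: str, reference: str) -> float:
    """Toy unigram-overlap F1 between a model output and a reference answer."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Count tokens the candidate and reference share (multiset intersection).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # shared tokens / candidate length
    recall = overlap / sum(ref.values())      # shared tokens / reference length
    return 2 * precision * recall / (precision + recall)


# A hypothetical check: compare a model's answer against a reference.
print(round(overlap_f1("Paris is the capital of France",
                       "The capital of France is Paris"), 2))  # → 1.0
```

A score like this catches only surface-level divergence; it cannot detect a fluent hallucination that happens to reuse the reference's words, which is why the later lessons move on to statistical and model-based methods.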
Contents
- Evaluating models using metrics (1m 50s)
- Evaluating regression models (2m 48s)
- Evaluating classification models (4m 8s)
- Evaluating clustering models (1m 52s)
- Accuracy, precision, recall (5m 45s)
- Evaluating large language models (LLMs) (5m 3s)
- Human evaluation (2m 12s)
- Statistical methods for LLM evaluation (2m 28s)
- ROUGE scores (3m 29s)
- BLEU score (1m 13s)
- METEOR score (57s)
- Perplexity (2m 48s)
- Model-based methods for LLM evaluation (1m 53s)
- Natural language inference (3m 22s)
- BLEURT (3m 57s)
- Judge models (4m 16s)
- LLM evaluation (10m 11s)