From the course: AI in Risk Management and Fraud Detection
Handling imbalanced datasets
- [Instructor] Fraud detection often deals with a unique challenge: imbalanced datasets. In many real-world cases, only a tiny fraction of transactions, say less than 1%, are actually fraudulent. That means your dataset is dominated by normal transactions, and this imbalance can cause major problems for machine learning models. Why is this a problem? Let's say your model simply predicts non-fraud every time. It would be at least 99% accurate but completely useless, because it would miss every actual fraud case. That's why accuracy by itself is misleading. In fraud detection, we care much more about recall (catching as much of the actual fraud as possible) and precision (avoiding false alarms). Fortunately, there are several techniques to help address this imbalance. The first is oversampling: increasing the number of fraud cases in your training set. One popular method is SMOTE, which stands for Synthetic Minority Oversampling Technique. It works by generating realistic synthetic examples of fraud cases, helping the…
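The two ideas in the excerpt above can be illustrated with a small, dependency-free sketch. The first part scores an "always non-fraud" predictor on toy labels to show why accuracy misleads. The second part generates synthetic fraud points by interpolating between a fraud example and one of its nearest fraud neighbors, which is the core idea behind SMOTE. Everything here is an assumption made for illustration: the toy data, the feature choice (amount, hour), and the function names `accuracy_precision_recall` and `smote_like_oversample` are hypothetical and do not come from the course. The oversampler is a simplified stand-in, not the reference SMOTE implementation.

```python
import random


def accuracy_precision_recall(y_true, y_pred):
    """Return (accuracy, precision, recall), treating label 1 as fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors (SMOTE-style sketch).
    Requires at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # Rank the other minority points by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy labels: 1,000 transactions, 10 of them fraudulent (1%).
y_true = [1] * 10 + [0] * 990
always_non_fraud = [0] * 1000
accuracy, precision, recall = accuracy_precision_recall(y_true, always_non_fraud)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")

# Hypothetical fraud examples as (amount, hour_of_day) feature pairs.
fraud_points = [(950.0, 2.0), (1200.0, 3.0), (870.0, 1.0), (1010.0, 4.0)]
synthetic_fraud = smote_like_oversample(fraud_points, n_new=6, k=2)
print(len(synthetic_fraud), "synthetic fraud points generated")
```

On this toy data the always-non-fraud predictor reaches 0.99 accuracy with a recall of 0, which is the failure mode the instructor describes. In practice, oversampling would normally be applied to the training split only, and a maintained library (for example the `SMOTE` class in imbalanced-learn, with its `fit_resample` method) would be a more robust choice than this sketch.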
Contents
- Data understanding and preparation in ChatGPT (5m 11s)
- Handling imbalanced datasets (4m 15s)
- Feature engineering and preprocessing in ChatGPT (4m 36s)
- Building and evaluating baseline models (4m 4s)
- Advanced modeling and hyperparameter tuning (5m 18s)
- Model interpretation, validation, and monitoring (2m 53s)