From the course: AI in Risk Management and Fraud Detection

Handling imbalanced datasets

- [Instructor] Fraud detection often deals with a unique challenge: imbalanced datasets. In many real-world cases, only a tiny fraction, say less than 1%, of transactions are actually fraudulent. That means your dataset is dominated by normal transactions, and this imbalance can cause major problems for machine learning models. Why is this a problem? Let's say your model simply predicts non-fraud every time. It would be about 99% accurate but completely useless, because it would miss every actual fraud case. That's why accuracy by itself is misleading. In fraud detection, we care much more about recall, the share of actual fraud we catch, and precision, the share of fraud alerts that turn out to be real fraud, which keeps false alarms down. Fortunately, there are several techniques to help address this imbalance. The first is oversampling: increasing the number of fraud cases in your training set. One popular method is called SMOTE, which stands for Synthetic Minority Oversampling Technique. It works by generating realistic synthetic examples of fraud cases, helping the…
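The two ideas in this passage can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's own code: the 1,000-transaction dataset, the four fraud points, and the helper names (`confusion_counts`, `smote_like`) are all hypothetical. The `smote_like` function is a simplified stand-in for SMOTE that only shows the core idea of interpolating between a fraud case and one of its nearest fraud neighbors; a real project would more likely use a maintained library implementation.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    # Share of actual fraud cases that were flagged.
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    # Share of fraud alerts that were real fraud.
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def smote_like(minority, n_new, k=2, seed=0):
    """Simplified SMOTE-style oversampling (illustrative only).

    Each synthetic point lies on the line segment between a randomly
    chosen minority point and one of its k nearest minority neighbors.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical dataset: 1,000 transactions, 10 of them fraudulent (1%).
y_true = [1] * 10 + [0] * 990
always_non_fraud = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_non_fraud)
baseline_recall = recall(y_true, always_non_fraud)
baseline_precision = precision(y_true, always_non_fraud)
print(f"accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")

# Hypothetical fraud cases described by two numeric features.
fraud_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(fraud_points, n_new=6)
print(len(fraud_points) + len(new_points), "fraud examples after oversampling")
```

The always-non-fraud baseline scores high on accuracy while its recall is zero, which is the gap the passage describes. Oversampling should be applied to the training split only, so that evaluation still reflects the real class balance.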
