From the course: CompTIA SecAI+ (CY0-001) Cert Prep

Data cleansing

To build secure and reliable AI models, we must start with high-quality data. Data cleansing is a key step in preparing trustworthy data for AI. It involves identifying and removing errors, inconsistencies, and irrelevant information before the data ever reaches a training pipeline. The primary goal of data cleansing is to make the data accurate and consistent. Just as a chef inspects ingredients before cooking, data engineers review data sets to make sure nothing erroneous or misleading is included. Cleansing includes activities such as fixing typographical errors, filling in missing values, standardizing formats, and eliminating duplicate or incomplete records. In a cybersecurity context, data cleansing might involve removing log entries with invalid timestamps, correcting inconsistent log formats, and flagging sensor readings that fall outside realistic ranges. Each of these steps helps prevent a model from learning patterns that do not reflect real-world behavior. Data cleansing also…
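As a rough illustration, the log-cleansing steps just described can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the record fields (`timestamp`, `host`, `severity`, `reading`), the list of accepted timestamp formats, the `SEVERITY_FIXES` typo table, the -40 to 125 "realistic" sensor range, and the `cleanse_logs` function name are all hypothetical choices made for this example. It drops entries with invalid timestamps or missing hosts, standardizes formats, fixes known typos, fills a missing severity, removes duplicates, and flags (rather than deletes) out-of-range readings.

```python
from datetime import datetime, timezone

# Hypothetical timestamp formats seen across different log sources.
TIMESTAMP_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M:%S")

# Hypothetical table of known typos in severity labels.
SEVERITY_FIXES = {"WARNNING": "WARNING", "ERR": "ERROR", "INFOO": "INFO"}


def parse_timestamp(raw):
    """Return a UTC datetime, or None if no known format matches."""
    if not isinstance(raw, str):
        return None
    for fmt in TIMESTAMP_FORMATS:
        try:
            ts = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        # Assumption for this sketch: naive timestamps are already in UTC.
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        return ts.astimezone(timezone.utc)
    return None


def cleanse_logs(records, reading_range=(-40.0, 125.0)):
    """Cleanse a list of log-record dicts (hypothetical schema)."""
    low, high = reading_range
    cleaned, seen = [], set()
    for rec in records:
        ts = parse_timestamp(rec.get("timestamp"))
        host = str(rec.get("host") or "").strip().lower()  # standardize format
        if ts is None or not host:
            continue  # drop invalid-timestamp or incomplete records
        # Fill a missing severity, standardize case, then fix known typos.
        severity = str(rec.get("severity") or "unknown").strip().upper()
        severity = SEVERITY_FIXES.get(severity, severity)
        try:
            reading = float(rec["reading"]) if rec.get("reading") is not None else None
        except (TypeError, ValueError):
            reading = None  # unparseable reading treated as missing
        out = {
            "timestamp": ts.isoformat(),
            "host": host,
            "severity": severity,
            "reading": reading,
            # Flag, rather than drop, readings outside the realistic range.
            "out_of_range": reading is not None and not (low <= reading <= high),
        }
        key = (out["timestamp"], out["host"], out["severity"], out["reading"])
        if key in seen:
            continue  # duplicate after normalization
        seen.add(key)
        cleaned.append(out)
    return cleaned
```

Because formats are normalized before the duplicate check, two entries that differ only in timestamp layout, host capitalization, or a typo in the severity label collapse into one record. Out-of-range readings are kept but flagged so an analyst can decide whether they reflect a faulty sensor or a genuine anomaly worth keeping.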