From the course: Data Quality: Core Concepts

Impact of poor data quality

- [Instructor] In this video, we're going to talk about the impact of poor data quality and make it real for you, so you can see why it's important to account for it. I'm going to go over three specific use cases. In the best-case scenario, poor data quality means you get a few numbers wrong and people have the wrong idea about the business. In the worst-case scenario, which is what these cases highlight, the data is wrong in a way that has a regulatory impact or affects people's lives negatively. So let's jump right in.

The first case study is in healthcare: the Epic Sepsis Model. Epic, an electronic health record (EHR) company, used its massive EHR dataset to build a model to predict sepsis, with the goal of saying, "Hey, if we can predict this life-threatening condition, we'll reduce the number of patients affected by it." What ended up happening was the opposite. The model performed worse than the standard of care, meaning what doctors were already doing, and it generated so many alerts that clinicians stopped paying attention to it. This all came to light when researchers publishing in "JAMA," one of the top medical journals, showed that the model was wrong, and the whole case blew up on Epic. What they found was that Epic had trained the model on EHR data pooled globally, but when hospitals deployed it locally, the data did not match their patient populations from a data quality perspective, so the model gave wrong results. Epic eventually addressed the issue by telling hospitals, "You need to train on your own specific patient data." But because of all the news coverage and the "JAMA" publication, Epic took a huge reputational hit, and people became more wary of using the model.

The second case is in finance, and this one is pretty common.
Banks regularly pay regulatory fines because they operate such complex systems. One recent example is Citibank, which was fined $136 million for failing to fix longstanding data issues. The Federal Reserve Board called out that Citi was supposed to make sufficient progress in remediating its data quality management problems and failed to do so. This came on top of the $400 million fine from 2020 that kicked all of this off. The SEC filings are included in the course documents if you want to see the details.

Finally, this is one of my favorite use cases to call out, because it shows how insidious poor data quality can be. It can hide in one of the most critical aspects of your business and set you up for failure in the long term. It's a slow burn. In this case, we have Bird, an electric scooter-sharing company. I was in San Francisco (laughing) when this was happening, and there were Bird scooters everywhere. Bird was a big venture-backed company that raised a lot of money, and it had to go back and say, "For the past two years, we have been reporting our revenue wrong." It had to tell the SEC, "None of the financial data we reported for the past two years can be trusted." From a data quality perspective, here's what happened: riders preloaded money into wallets and used those wallets to pay for rides. But when a wallet was empty or had insufficient funds, Bird's logic didn't account for it, so rides that couldn't be paid for were still counted and revenue was overestimated. Again, it was a business logic issue that seemed very minor, but it built up over years.
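To make the Bird example concrete, here's a minimal Python sketch of how a rule that ignores wallet balances overstates revenue. It's a simplified illustration, not Bird's actual system or accounting treatment: the `Ride` record, its fields, and the function names are hypothetical, and it assumes each ride carries the rider's wallet balance at the time the ride was charged.

```python
from dataclasses import dataclass


@dataclass
class Ride:
    # Hypothetical ride record; field names are illustrative assumptions.
    ride_id: str
    fare_cents: int
    wallet_balance_cents: int  # wallet balance when the ride was charged


def naive_revenue(rides):
    # Flawed rule: every completed ride counts as revenue,
    # whether or not the wallet could actually cover the fare.
    return sum(r.fare_cents for r in rides)


def collectible_revenue(rides):
    # Corrected rule (simplified): count only the part of each fare
    # the wallet could cover; a negative balance covers nothing.
    return sum(min(r.fare_cents, max(r.wallet_balance_cents, 0)) for r in rides)


def uncollected_revenue(rides):
    # The gap between the two rules is the overstatement.
    return naive_revenue(rides) - collectible_revenue(rides)


def flag_insufficient_funds(rides):
    # A simple data quality check: list rides whose fare exceeded the wallet balance.
    return [r.ride_id for r in rides if r.wallet_balance_cents < r.fare_cents]


if __name__ == "__main__":
    rides = [
        Ride("r1", 500, 2000),  # fully covered
        Ride("r2", 700, 300),   # partially covered
        Ride("r3", 400, 0),     # empty wallet
    ]
    print(naive_revenue(rides))            # 1600
    print(collectible_revenue(rides))      # 800
    print(flag_insufficient_funds(rides))  # ['r2', 'r3']
```

In this toy data, the naive rule reports twice the revenue that could actually be collected. A check like `flag_insufficient_funds`, run routinely, is the kind of small guard that surfaces this sort of slow-burn error before it compounds over years.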
