Five Data Quality Standards You Should Be Targeting
As a software engineer by trade, I've always wanted to help my children learn to code and to share my passion for building computer algorithms. But quite often, we learn things from our kids as well. Inspired by the researchers at MIT who built Scratch, a block-based visual programming language designed to teach children to code, we designed the DataGroomr visual rule editor to be simple and easy to use, yet powerful and flexible. I think the kids would approve – it's that easy to use. We've just released the latest enhancements to the DataGroomr app and, if you're interested, you can read more about them on our website. But what I really want to talk about is why data cleansing is so important.
As you'll know by now from reading my posts, I helped to build the DataGroomr app, which leverages Machine Learning to find duplicate Salesforce records automatically. Two concepts matter here. First, Salesforce cannot catch all of your duplicates, and for those it does identify, it can't dedupe across objects. Second, Machine Learning enables an approach that constantly improves and refines how the algorithms are executed. I may build the initial algorithms, but Machine Learning "remembers" how users work in the app, adjusts the rules accordingly, and ensures that the most recent data is cleaned against an up-to-date set of rules. As my partner and co-founder Steve Pogrebivsky noted last week, "In the case of duplicate management, Machine Learning algorithms can actually be trained to identify and eliminate the seemingly ceaseless introduction of duplicate data before it gets into your database."
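To make the idea of similarity-based matching concrete, here is a minimal sketch using only the Python standard library. This is not DataGroomr's actual model – the field handling and threshold are illustrative assumptions – but it shows why scoring record pairs catches duplicates that exact-match rules miss.

```python
# Toy similarity-based duplicate detection. Real systems use trained models;
# this sketch just illustrates scoring pairs instead of requiring exact matches.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two field values."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_likely_duplicate(rec1: dict, rec2: dict, threshold: float = 0.85) -> bool:
    """Average field-level similarity across the fields both records share."""
    fields = set(rec1) & set(rec2)
    if not fields:
        return False
    score = sum(similarity(str(rec1[f]), str(rec2[f])) for f in fields) / len(fields)
    return score >= threshold

a = {"Name": "Acme Corp.", "City": "Philadelphia"}
b = {"Name": "ACME Corp", "City": "Philadelphia"}
print(is_likely_duplicate(a, b))  # → True, despite case/punctuation differences
```

An exact-match rule would treat these two records as distinct; a similarity score sees past the casing and trailing period.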
That essentially means that when a company implements Machine Learning for this purpose, it is taking the necessary steps to eliminate the risks of bad data for itself and for its customers. We've all heard the alarming statistic from Gartner that poor data quality costs the average business between $9.7M and $14.2M annually. Another study, by Royal Mail Data Services, reveals that the organizations surveyed believe inaccurate customer data costs them six percent of their annual revenue. And that's why data cleansing is so important. "Dirty" data undermines businesses and disrupts customer relationships, to say nothing of the lost revenue.
Three approaches are at the core of data cleansing: deduplication, normalization, and record completion. To keep good data from degrading into bad, all three should be performed periodically; unfortunately, doing them once and forgetting about them is not really an option. As new data is entered, the risk of bad data rises again, and you could end up in the same spot you started from, or a worse one. Maintaining the value of your data requires continuous vigilance and a repeatable approach. Again, that is where Machine Learning is so valuable: it becomes the vigilant monitoring system your business needs to maintain data integrity so that you can proceed with business. Time lost to checking and rechecking data records is time lost to building the business.
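Here is a toy sketch of how the three steps fit together. The field names and the email match key are assumptions for illustration – real matching is fuzzier than this exact-key approach – but it shows how normalization makes equivalent values comparable, deduplication collapses them, and record completion fills in the blanks from the duplicates you remove.

```python
# Toy pipeline: normalization -> deduplication -> record completion.
# Field names ("Email", "Phone", etc.) are illustrative assumptions.
def normalize(record: dict) -> dict:
    """Normalization: trim whitespace and standardize casing on the key field."""
    out = {}
    for field, value in record.items():
        value = " ".join(str(value).split()) if value else ""
        out[field] = value.lower() if field == "Email" else value
    return out

def merge(primary: dict, secondary: dict) -> dict:
    """Record completion: fill blank fields in the kept record from a duplicate."""
    merged = dict(primary)
    for field, value in secondary.items():
        if value and not merged.get(field):
            merged[field] = value
    return merged

def deduplicate(records: list) -> list:
    """Deduplication: collapse records sharing the same normalized key."""
    kept = {}
    for rec in map(normalize, records):
        key = rec.get("Email", "")
        kept[key] = merge(kept[key], rec) if key in kept else rec
    return list(kept.values())

rows = [
    {"Name": "Pat Lee", "Email": " PAT@example.com ", "Phone": ""},
    {"Name": "Pat Lee", "Email": "pat@example.com", "Phone": "555-0100"},
]
print(deduplicate(rows))
# → one record, with both the email and the phone number filled in
```

Note that removing the duplicate without the merge step would have discarded the phone number – which is why the three approaches belong together rather than as separate chores.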
The need for data cleansing is not a one-and-done affair. At DataGroomr, we recommend that you first conduct an assessment of your data quality. For any data management platform, you will want to adhere to five key data quality standards: validity, reliability, precision, integrity, and timeliness. Look at how data flows through the system. An obvious red flag is a preponderance of duplicate records, a common occurrence on most CRM platforms.
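A data quality assessment can be framed as a set of per-record checks against those standards. The sketch below is one possible interpretation – the field names, formats, and thresholds are assumptions, not DataGroomr's actual logic – and reliability is left unscored because it usually means agreement with a second source of truth, which a single record can't demonstrate.

```python
# Rough per-record checks against four of the five quality standards.
# Field names, formats, and the one-year freshness window are assumptions.
import re
from datetime import datetime, timedelta

def assess(record: dict, now: datetime) -> dict:
    """Score one record on each checkable dimension (True = passes).
    Reliability (agreement with a second source) needs cross-source
    comparison, so it is not scored here."""
    email = record.get("Email", "")
    return {
        # Validity: values conform to the expected format.
        "validity": bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email)),
        # Precision: values carry enough detail, e.g. a full 5-digit ZIP code.
        "precision": bool(re.fullmatch(r"\d{5}(-\d{4})?", record.get("Zip", ""))),
        # Integrity: required fields are present and non-blank.
        "integrity": all(record.get(f) for f in ("Name", "Email")),
        # Timeliness: the record was updated recently (here, within a year).
        "timeliness": now - record["LastModified"] < timedelta(days=365),
    }

record = {"Name": "Pat Lee", "Email": "pat@example.com",
          "Zip": "19103", "LastModified": datetime(2024, 6, 1)}
print(assess(record, now=datetime(2024, 7, 1)))
# → {'validity': True, 'precision': True, 'integrity': True, 'timeliness': True}
```

Aggregating these scores across a whole table gives you the kind of baseline assessment worth running before any cleansing effort, so you can measure whether the cleansing actually moved the needle.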
For businesses on the Salesforce platform, third-party tools can help maintain those data quality standards. But you'll first want to understand why the problem is occurring so that you can take steps to prevent it in the future. In our experience, the typical causes are poor manual data entry, botched imports, and third-party apps.
Staying aware of all the ways data can go wrong is how we stay on top of the best data cleansing approaches. And when it comes to duplicate records, that's where the DataGroomr tool excels. If you want to discuss strategic approaches to managing your data duplication issues in Salesforce, or the value of Machine Learning, we'd love to chat – send me a message. We also offer a free, 14-day trial of DataGroomr so you can see for yourself how it works.