Data Quality is a Money Pit

It’s not. It shouldn’t be.

It should be contributing to better results, to better performance, to better decisions.

So Data Quality shouldn’t be a money pit, yet in some awkward way that is what it is becoming.

Although the awareness of the strategic importance of data exists, with a special focus on its quality, most organizations are still struggling to enable their data capabilities, risking poor strategic decision making and misallocation of critical resources.

I came across an article today (https://sdtimes.com/data/report-75-of-developers-say-theyre-responsible-for-data-quality/) with the suggestive title “Report: 75% of developers say they’re responsible for data quality”.

This suggests that the lack of a structured approach to data quality is eating away at organizations’ financial performance, impairing decision processes and preventing additional gains in markets that are increasingly competitive and complex.

Leaving aside the direct impacts of poor data quality on business processes, which I’ve already highlighted on previous occasions, this suggests that organizations have valuable people redirecting their time and skills to bring data up to a minimal level of quality for their specific needs, often out of alignment with the global business and data strategy.

If 75% of developers feel in some way responsible for the quality of the data they need, this means that a percentage of their time is diverted from the tasks they were hired to do and should be focusing on.

It’s not unlikely that the remaining 25% of developers, although not feeling directly responsible for data quality, are also spending some of their time handling data quality issues.

Let me complement this reasoning with a couple of additional examples.

It is common to hear that 80% of a data scientist’s time is spent on cleansing and classifying data, and this is even accepted as part of the job. It simply means that they are using their specific and valuable skills to create value just 20% of the time.

It is also common for critical reports within organizations to be built on “Excel chains”, where different people contribute inputs to assure the quality of the results. Again, a considerable share of the time of sometimes highly skilled people is being diverted into reviewing and fixing data.

Laptop data quality

This can be called laptop data quality.

Here a responsibility of the organization, the quality of its data, is informally delegated to individuals who, according to their specific needs and context, autonomously develop their own version of quality data, relying on their individual judgment.

Back to Excel

Although some of the impacts are easier to quantify than others, there’s at least one that can be directly attributed to this approach:

How many hours are being spent across the organization on ad-hoc tasks related to data quality, and what is the cost of those hours?

What is the effective cost, or value not being generated, due to the hours that are diverted to these tasks?

Answering these questions makes it possible to understand what percentage of this value would be needed to set up a structured approach to the problem.
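To make the arithmetic concrete, here is a minimal Python sketch of that estimate. Everything in it is an assumption made for illustration: the `Role` structure, the function names, the headcounts, the hourly costs, the 46 working weeks, and the 10% share for developers are hypothetical placeholders, not figures from the report. Only the 80% share for data scientists echoes the commonly quoted number mentioned earlier. Replace every value with your own organization’s data before drawing any conclusion.

```python
from dataclasses import dataclass


@dataclass
class Role:
    """A group of people doing ad-hoc data quality work (all values hypothetical)."""
    name: str
    headcount: int
    hours_per_week: float       # working hours per person per week
    share_on_data_fixes: float  # fraction of time spent on ad-hoc data quality tasks
    hourly_cost: float          # fully loaded cost of one hour


def annual_diverted_hours(role: Role, weeks_per_year: int = 46) -> float:
    """Hours per year this role spends on ad-hoc data quality tasks."""
    return (role.headcount * role.hours_per_week
            * role.share_on_data_fixes * weeks_per_year)


def annual_diverted_cost(role: Role, weeks_per_year: int = 46) -> float:
    """Cost per year of those diverted hours for one role."""
    return annual_diverted_hours(role, weeks_per_year) * role.hourly_cost


def total_diverted_cost(roles: list[Role], weeks_per_year: int = 46) -> float:
    """Cost per year of diverted hours across all roles."""
    return sum(annual_diverted_cost(r, weeks_per_year) for r in roles)


def share_of_cost(program_cost: float, diverted_cost: float) -> float:
    """Fraction of the diverted cost that a structured approach would consume."""
    if diverted_cost <= 0:
        raise ValueError("diverted_cost must be positive")
    return program_cost / diverted_cost


# Illustrative inputs only: none of these numbers are measured.
example_roles = [
    Role("developers", headcount=10, hours_per_week=40,
         share_on_data_fixes=0.10, hourly_cost=50.0),
    Role("data scientists", headcount=2, hours_per_week=40,
         share_on_data_fixes=0.80, hourly_cost=60.0),
]

if __name__ == "__main__":
    for role in example_roles:
        print(role.name, annual_diverted_hours(role), annual_diverted_cost(role))
    print("total", total_diverted_cost(example_roles))
```

This only covers the first question, the direct cost of the hours. The value not being generated is harder to put a number on and would need its own estimate per role.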