Blazent has published its latest findings, taken from a report entitled The Role of Data Quality Management in Machine Learning and Predictive Analytics, which was co-authored by 451 Research. The key finding is that more than half of those who were keen on advanced technologies such as machine learning felt that the state of their company’s data was a problem.
Charlie Piper, CEO, Blazent, said: “As foundational as data quality is to an organization’s success we continue to see that a majority of IT Execs are not confident in their data quality management practices. The pace of change and seemingly never-ending increase in the amount of data and data sources are significant drivers of this lack of confidence.”
The findings will come as no surprise to anyone involved in dealing with large amounts of data inside organisations, especially marketing teams. It is not unusual for marketing teams to spend significant sums cleaning data prior to every mailshot and campaign. This repetition is due to poorly designed applications and a lack of quality at the data gathering phase.
Piper went on to say: “Critical business decisions are made without a complete and accurate picture. These findings further validate how crucial it is for IT and the C-suite to continue to prioritize data quality, employing an organization-wide streamlined process for data management.”
Getting data quality right is a balancing act
Large amounts of money have been spent trying to simplify data input and reduce errors, but they continue to creep in. Sometimes the cause is poor spelling, while at other times the culprit is a failure to check whether a record already exists. Errors are also present in data where there is no human input. These are often introduced when data is moved between systems and merged, with mismatched field types and formatting the common issues.
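The two failure modes described above, a missing check for an existing record and mismatched field types on merge, can be illustrated with a minimal Python sketch. It is not taken from the report or from any Blazent product. The function names, the `name` and `email` fields and the six-digit customer ID width are all assumptions made for illustration.

```python
import unicodedata


def normalise_key(name, email):
    """Build a comparison key that ignores case, accents and stray whitespace."""
    def clean(value):
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", value)
        stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        # Collapse runs of whitespace and fold case.
        return " ".join(stripped.split()).casefold()

    return clean(name), clean(email)


def add_record(store, record):
    """Insert the record unless an equivalent one already exists.

    `store` is a plain dict keyed by the normalised (name, email) pair.
    Returns True if the record was added, False if it was a duplicate.
    """
    key = normalise_key(record["name"], record["email"])
    if key in store:
        return False
    store[key] = record
    return True


def coerce_customer_id(value, width=6):
    """Hypothetical merge rule: treat IDs as zero-padded text.

    One system may hold the ID as the integer 42 and another as the
    string "000042"; converting both to the same text form lets the
    merged records line up.
    """
    return str(value).strip().zfill(width)
```

This sketch only catches records that differ in formatting. Genuine misspellings would need some form of fuzzy matching, which is not shown here.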
The solution is to use Data Quality Management (DQM) processes to sort out the data. The report shows that 60% of respondents have already invested in DQM, which raises the question of why they don’t trust their data. The answer perhaps lies in the problem that marketing departments face: a lack of basic data cleanliness at source, which means data becomes dirty again almost as soon as it has been through the DQM process.
One of the interesting points from the survey is a recognition that external data is not always clean. This creates a particular challenge. Externally acquired data often has check records so that the provider can see how many times the data is used. The contract to use the data will often have clauses that prevent records being removed from the data set, although that doesn’t stop companies from creating their own subsets of the data and then cleaning those records prior to their use.
Another issue highlighted in the report is that of perception vs reality. In some cases the perception of data accuracy is far higher than the reality. Importantly, there are also cases where the reality is higher than the perception. The problem for board members is working out which one they should trust: their perception or the reality.
DQM must solve cost vs business value
As DQM has matured, so have customer expectations. No customer now expects to get back perfectly clean data. The time and cost to do that often far exceed the urgency of the business unit. In the last few years, marketing teams have instead decided what they can live with. This might be data that is rated 90%, 80%, 70% or even less clean. That is not an acceptable approach for machine learning or for some of the advanced analytics that companies are beginning to undertake.
The report shows that around 94% of respondents believe that dirty data has reduced the business value of their data. The largest group estimates the impact at 10-19% of value lost. The problem is that any loss means money wasted and potential business and revenue opportunities that will never present themselves again.
As well as a loss of business value there are other costs. Among these are the extra time taken to reconcile data, bad decision making and customer dissatisfaction. The latter is rated just half a percentage point higher than loss of credibility in a system. These are serious issues and suggest that while companies may have invested in DQM, they have yet to implement sufficiently robust processes to ensure that data quality is delivered.
There is positive news on the horizon
The report shows that companies are aware of the problem. A third of companies are developing a plan to deal with the issue, while just over 31% are already implementing a solution. A further 24% are already seeing the benefits from addressing the issue but, importantly, just over 6% say their response plan failed and they are trying again.
This latter group is far smaller than might have been expected, and it will be interesting to see in a year whether it grows as those currently developing and implementing a plan hit their own failure points.
There is a lot more in the report, which should be widely circulated around IT, marketing and anyone involved in handling data. The report is still valid even if the end user organisation is not using Blazent for its DQM, as the comments and numbers are likely to be fairly similar across other organisations.
The question for many companies is this: while those in this report have begun to deal with their DQM failures, are you even aware of yours? The answer for many, based on previous reports by other DQM vendors, is almost certainly no.