Soda has released Soda Cloud, which it describes as a “data observability and collaboration platform that helps data teams get ahead of the silent data issues affecting organizations every day.” The goal is to provide organisations with trusted data using tools to detect and resolve data issues. It also delivers a collaboration platform so that organisations can define what ‘good data’ looks like.
Tom Baeyens, CTO, Soda, said: “The biggest problem faced by data teams today is that they are flying blind without the capabilities to detect problems with their data. The result is that data issues remain silent and undetected.
“Without a clear strategy to monitor data for issues, most organizations do not know how to start addressing the problem which leaves their systems exposed and can result in serious downstream issues for the data products they are building. Soda gives those responsible for data quality and integrity the capability to create a culture and community of good data practices that starts at the sources of data and flows through an organization. This is what Soda and our new Soda Cloud are all about: giving data teams the power to create trusted data products, and an integrated platform to bring everyone closer to data.”
With companies looking to leverage their data to improve revenue, more needs to be done. So what can they do?
The problems of ‘dirty data’
Dirty data is a problem that has bedevilled companies for decades. The problems are often deep and complex, and are as much software-driven as they are user-driven. Some of the common problems are:
- Coding errors where the same piece of data is not recorded in the same format, making it hard to combine or search data.
- Incomplete or abandoned records created when a customer service agent takes a call and the caller hangs up partway through the call.
- Duplicate records, often created by mistakes in spelling names or mistyping postcodes or telephone numbers.
- Poor data merging, which mixes data from multiple records, not only makes the data worthless but can also result in GDPR breaches.
- Sending mailshots to people who have died or to people who have since moved house. Both cause problems. One upsets grieving family members, and the other can create opportunities for fraud.
Clean-up or enhance?
Getting clean data using external companies can be expensive, and most companies limit their spending. For example, do you want data that is 100% clean, or will 80%, 60%, or even 50% do?
Do you just want the data tidied up? One solution that marketing teams use is an outside agency to go through postcodes and normalise addresses. The downside of that approach is that choosing the wrong postcode database means those living in new builds may not appear on the list.
If the goal is data enhancement, then using the ABC Scale database is common. It provides a socio-economic indicator of those living at a postcode to ensure marketing materials go to those who can afford what is being sold.
These solutions allow for data to reach a certain level of cleanliness and utility ‘at a defined moment in time’. The problem is keeping that data clean after that.
What is Soda Cloud looking to do?
Soda Cloud is looking to get to the root cause of dirty data. The idea is that organisations go through the data, get it to a quality level and then use Soda Cloud to ensure it stays at that level. The platform allows the user to create a wide range of Key Performance Indicators (KPIs) against which data cleanliness can be measured.
Those KPIs focus on both core and transformed data. This is critical. Measuring the cleanliness and accuracy of the core data provides organisations with a way to spot data gathering problems. It could be measuring the number of incomplete records or running regular checks to find duplicate records. It can also help developers ensure that field formats or checking systems are working as expected.
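To make the idea of a core-data KPI concrete, here is a minimal Python sketch that measures the share of incomplete records and counts likely duplicates. It is not Soda Cloud's API. The function name `quality_kpis`, the field names and the sample records are all hypothetical and exist only for illustration.

```python
from collections import Counter


def quality_kpis(records, required_fields, key_fields):
    """Return simple data-quality metrics for a list of dict records.

    Hypothetical helper for illustration only; not part of any Soda product.
    """
    total = len(records)

    # A record is incomplete if any required field is missing or blank.
    incomplete = sum(
        1
        for r in records
        if any(not str(r.get(f) or "").strip() for f in required_fields)
    )

    # Normalise the key fields (trim, lower-case) so that trivial
    # differences in typing do not hide a duplicate.
    keys = Counter(
        tuple(str(r.get(f) or "").strip().lower() for f in key_fields)
        for r in records
    )
    duplicates = sum(n - 1 for n in keys.values() if n > 1)

    return {
        "total": total,
        "incomplete_pct": 100.0 * incomplete / total if total else 0.0,
        "duplicate_count": duplicates,
    }


# Made-up sample data: one duplicate pair and two incomplete records.
records = [
    {"name": "Ann Lee", "postcode": "AB1 2CD", "phone": "01234 567890"},
    {"name": "ann lee ", "postcode": "ab1 2cd", "phone": ""},
    {"name": "Bob Ray", "postcode": "", "phone": "09876 543210"},
    {"name": "Cy Fox", "postcode": "XY9 8ZW", "phone": "01111 222333"},
]

kpis = quality_kpis(
    records,
    required_fields=["name", "postcode", "phone"],
    key_fields=["name", "postcode"],
)
print(kpis)
```

Run on a schedule, figures like these could be compared against an agreed threshold, so that a rise in incomplete or duplicate records is noticed when it happens rather than at the next clean-up.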
When it comes to transformed data, it also looks at where there are problems bringing data together. It allows data engineers to spot problems with their ETL code and processes. It enables them to reduce and even eliminate mismatching of records quickly and easily.
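One simple way to spot this kind of ETL problem is a reconciliation check that compares record keys before and after a transformation step. The Python sketch below is an assumption about how such a check might look, not a description of how Soda Cloud works. The function `reconcile` and the `id` key are hypothetical.

```python
def reconcile(source_rows, target_rows, key="id"):
    """Compare record keys before and after a transformation step.

    Hypothetical helper for illustration only.
    """
    source_keys = {row[key] for row in source_rows}
    target_keys = {row[key] for row in target_rows}
    return {
        # Records that went into the step but never came out.
        "missing_in_target": sorted(source_keys - target_keys),
        # Records that came out without a matching input.
        "unexpected_in_target": sorted(target_keys - source_keys),
    }


# Made-up sample data: record 2 is dropped and record 4 appears from nowhere.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 3}, {"id": 4}]

report = reconcile(source, target)
print(report)
```

A non-empty list in either field points to a join or merge step that is dropping or inventing records, which is where a data engineer would start looking.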
Importantly, Soda has also integrated this into a collaboration platform. It means that teams are not relying on meetings or email chains to get things done. When a problem is spotted, it can be dealt with quickly by bringing the key people together to get code changed and data cleaned up.
Enterprise Times: What does this mean?
Data cleansing by third-party organisations is big business and will continue to be so. However, one of the reasons they make so much is that organisations have such poor data cleanliness. Bad code, poor input checking and user errors are all at the core of the problem. Spending large amounts of money to get data cleaned by a third party without addressing the root causes is pointless.
With Soda Cloud, Soda believes it can help companies solve some of their internal problems in how data is managed. The challenge will be getting enough people in the data ownership chain to buy in. Marketing teams will need little convincing as they know they need good, accurate and trustworthy data. However, persuading developers to add to their workload by correcting the underlying application code and input field checking will be more difficult.