Data is the new oil. That has been the mantra for at least the last five years. Granted, most of those using it are vendors looking to sell analytics, machine learning and AI solutions. However, the amount of data that organisations currently hold is vast. Statista put it at 59 zettabytes in 2020.
That data volume is expected to almost treble by 2024. It raises several questions. How useful is that data? Do you trust your data? How much time is spent massaging the data to get it into a usable state? Is dirty data of any use other than to help storage vendors sell more hardware?
Does your data suck?
Enterprise Times started by asking O’Connor about dirty data and where it comes from.
O’Connor said: “We have a saying here at Precisely, which is, ‘your data sucks’. I actually have a T-shirt that says your data sucks. And being the chief data officer and CIO for Precisely, our data sucks like everyone else’s. It’s the nature of the beast.”
Why is that?
“You have to go back and look at where data comes from. It comes from transactions, from amassing information about various different things. People are motivated and compensated to transact, make a sale, answer a customer request, and fulfil something logistically. Without the right bounds there, the amount of data collected, the type of data collected, and the quality of the data are just enough to get the transaction done.
“To me, the root of data issues comes from when it is originally created. There’s this aspect of, “I created it for this purpose.” When it comes to analytics, it becomes “but I want to consume it for other purposes.” And that’s where the rubber meets the road.”
Build a centralised data team
In most organisations, data is seen as something to democratise. But when it comes to managing, cleaning and making that data usable, it’s a different story. The problem is that data is scattered all over the organisation. It is often only brought together when there is a use case, and that is the point at which data scientists get involved. How does Precisely approach this?
“I am a firm believer that in many organisations you need a centralised data team, not a centralised analytics team. It’s a Centre of Excellence that focuses only on the data. It helps the functional organisations to figure out where they get their data from. It provides a way to measure data quality by building feedback loops into the source systems.
“To your point, when someone’s focused on a use case, sometimes they get done with that use case, and they move on. The worst thing to me is when people extract data from source systems, fix it somewhere, and don’t get those fixes in a feedback loop back to the source systems. Because then you’re just continually repeating the same thing over and over again, and you’re wasting time and money.”
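The feedback loop O’Connor describes can be sketched in code. Below is a minimal, hypothetical Python example (all names invented for illustration, not Precisely’s implementation): instead of silently fixing bad records in the downstream copy, each problem is logged as an issue that can be routed back to the source system.

```python
# Hypothetical sketch of a data-quality feedback loop. Bad records are
# not silently "fixed" downstream; each problem becomes a feedback item
# that can be sent back to the source system for correction at origin.

def validate_record(record, required_fields):
    """Return a list of quality issues found in one source record."""
    issues = []
    for field in required_fields:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append({"id": record.get("id"), "field": field,
                           "problem": "missing or empty"})
    return issues

def quality_report(records, required_fields):
    """Validate a batch, separating clean records from feedback items."""
    clean, feedback = [], []
    for record in records:
        issues = validate_record(record, required_fields)
        if issues:
            feedback.extend(issues)  # route back to the source system
        else:
            clean.append(record)
    return clean, feedback

# Example: two transactions, one missing the customer email.
records = [
    {"id": 1, "customer": "Acme", "email": "ops@acme.example"},
    {"id": 2, "customer": "Globex", "email": ""},
]
clean, feedback = quality_report(records, ["customer", "email"])
print(len(clean), len(feedback))  # 1 clean record, 1 issue to feed back
```

The point of the `feedback` list is exactly the loop described above: the issues are returned to the owning source system rather than patched in place, so the same defects do not have to be re-fixed for every future use case.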
How do you manage data privacy and security?
Data is rarely simple, especially when it is extracted, combined and built into a use case. Two of the most challenging issues around data are PII and security. How do you ensure your data team understands that?
“Inside Precisely, I run IT, data and Infosec, and we explicitly call them out separately. I understand these are major different functions, and they’re related. All three of those functions get along great and exchange ideas. The really fine line that you need to make is that this team is there to help improve the data for consumption by the business units. It has to work in partnership with the business units.”
If the team is focused on the data engineering rather than on key use cases, that is where we see analytics and PII risks emerge. How do you deal with that?
“The central team is focused more on data engineering and not so much on data science and analytical use cases. There has to be enough expertise in that central team to understand that you are engineering the data for consumption in analytical/data science use cases. While you’re engineering data, you need to analyse it. You can also use machine learning principles to increase the quality of it as well. The central team provides this level of expertise that advises all the functional groups and gets involved very closely with them.”
Clean data is needed for corporate acquisitions
Engineering the data for your organisation and working on feedback loops to keep it clean makes sense. But what about mergers and acquisitions? What can you do to speed up data integration without threatening data standards?
“We’re a very acquisitive company. We started out as Syncsort. Along the way, we acquired a number of other data management companies. Last year, we acquired the Pitney Bowes software and data division. You can only imagine how many different business models were represented in the data inside those two companies, and you’re spot on.
“When I think about acquisitions, I think about data. The first thing we need to do is understand the acquired company’s data, what business model is represented and how strictly they adhere to that business model. What kind of quality is manifested? What do we need to do to transform it, to put it into our data models to support our business?
“It’s quite an interesting journey. It wasn’t until I got involved in the M&A side that I realised one of the biggest success factors is going to be whether or not the data can be used.”
Leave alone, carve-out or tuck-in?
Should companies leave their acquisition targets alone until the data is sorted, or do they rush in without enough due diligence?
“Probably a bit of both. Depending upon the size of the acquisition, there can easily be a case for ‘let it continue to run, see what’s going on there, and get to understand it much better.’ Due diligence around data is pretty hard. You can’t really do it before you close because you have to get in there, into the systems. Until then, you’re left with only some ideas as to what it might look like.
“But then there’s also the consideration of ‘Did you acquire the entire business, or did you carve it out?’ When we acquired the Pitney Bowes software and data division, it was a carve-out of a larger company, so we didn’t get the systems. We were on these transition service agreements, and we had to move fast and move that stuff.
“We do a lot of acquisitions that we call tuck-ins. I used this analogy a year ago. We tucked a lot of little things into that twin bed, and they’re falling out the sides. Even in the tuck-in, you still have the same considerations. You’ve got to get them into the main business processes because you can’t afford to continue operating smaller order management and selling processes. You may also have to move more quickly to enable cross-sell initiatives, as those might be some of the major drivers behind the acquisition.
“You need to look at it from that perspective of, what were the acquisition drivers, and therefore, how quickly do you need to move?”
Has cloud made things easier or harder?
The explosion of cloud has come alongside the explosion in data. One enables the other, but are companies really using the cloud to utilise their data better?
“From my perspective, it’s a dream come true. Our strategy is to leverage SaaS, and as a software vendor, our models aren’t different from those of other software vendors. Why do we need to diverge? I get plenty of people in sales, sales operations and fulfilment who say, ‘customise that system to do blah, blah, blah.’ And I say, no customisations!
“Can we change our business model? Can we change some of the other things that we’re doing to be more efficient there? Quite frankly, what I want to be able to do is upgrade easily to the next SaaS version. We’re growing rapidly. We’ve doubled in size in the last year, we doubled in size the year before, and we fully intend to double in size again. I’ve got to be able to continually stay on top of those SaaS applications and take advantage of what those vendors are providing for me. So from that perspective, it’s great.
“The tool every IT person needs is the ability to say, “No, it’s not in the best interest of the business.” When you have built your own applications and customised them, it’s really hard to say no. Recently, we were asked to make one little change to map to the old system we used to have. We showed that making that one little change would mean doubling the size of the data model underneath. 50% of the data model would be non-standard. So the next time the SaaS vendor upgrades, we can’t. It would take a good portion of the next six months to catch up.”
Chief Data Officers’ biggest nightmares
When you talk to other Chief Data Officers, what are their three biggest nightmares?
“Number one, I would say, is the state of data quality. A lot of folks just assume you’re going to have data that’s usable. “I’ll get a Chief Data Officer, and they’ll just make that happen.” It’s taken us decades to get into the situations we’re in, and it’s going to take a long time to get out of it. There’s a lot of hard work underneath. You can’t throw a chief data officer in, and then magically, this stuff happens. It’s hard, hard work.
“Number two is, even if people realise it’s hard work, are they going to invest appropriately? Is the board going to value the importance of data enough to even have a discussion and ask, “what’s the risk associated with us not having data of appropriate integrity for our business?”
“The last one is the appropriate use of data. If you haven’t had a Chief Data Officer or any sort of centralised management, you’ve got to start getting your arms around: how are we already using this? Are we okay? Then you have to figure out how to rein that stuff in appropriately to keep the balance between compliance and innovation.”