Is synthetic data the key to greater data science use?

(Image Credit: Robert Bye on Unsplash)

Data science teams want access to data and lots of it. They rightly argue that without prompt and unfettered access, their effectiveness is limited. The problem for organisations is that they have so much data that they struggle to clean it in order to allow such access. Most attempts to sanitise or anonymise data fall short. The majority of solutions perform pseudo-anonymisation, which takes little effort to reverse engineer.

So how do we solve this problem? At Spark+AI Summit in Amsterdam, Enterprise Times talked with Arm Insight's Chad Whitney, SVP Data Partnerships, and Jonathan Chin, Co-Founder and SVP of Product and Strategy. Arm Insight provides tools to create synthetic data.

The target market is risk-averse organisations in both the finance and healthcare markets. The synthesised data is typically used for their internal analytics. It also gives data science teams faster access to data than they would previously have had.

How is synthesised data different from sanitised data?

Jonathan Chin, Co-Founder and SVP of Product and Strategy, Arm Insight (Image Credit: Arm Insight)

ET asked Chin what the difference was between synthetic data and sanitised data. Chin replied: “Sanitised data just removes the sensitive information. Synthetic data goes further and adds statistical obfuscation. That is the secret sauce that we add that allows them to be sure that there isn’t any risk of correlating data points across data sets or being able to reverse engineer to the original.”

Data obfuscation is key for Arm Insight’s customer base. It allows those customers to give data science teams quick access to data for analytics. It also provides a solution to the issue of insider threat. Data leaks happen. Most organisations hope that the encryption they are using and the sanitised nature of the data make it unreadable. Chin and Whitney say that synthesised data removes that concern.
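
To make Chin's distinction concrete, here is a minimal Python sketch. It is not Arm Insight's algorithm, which the interview does not describe. It assumes one simple form of "statistical obfuscation": perturbing each value by a small random percentage. The records, field names and the 10% spread are all invented for illustration.

```python
import random
import statistics

# Hypothetical card transactions; every name and value here is invented.
RAW = [
    {"name": "A. Jones", "card": "4111-0001", "amount": 12.50},
    {"name": "B. Patel", "card": "4111-0002", "amount": 80.00},
    {"name": "C. Novak", "card": "4111-0003", "amount": 43.25},
    {"name": "D. Okoro", "card": "4111-0004", "amount": 19.99},
]


def sanitise(records):
    """Drop the directly identifying fields; amounts are left exact."""
    return [{"amount": r["amount"]} for r in records]


def obfuscate(records, rng, spread=0.10):
    """Drop identifiers and shift each amount by up to +/- spread (relative).

    This is an assumed, simplistic stand-in for 'statistical obfuscation'.
    """
    return [
        {"amount": round(r["amount"] * (1 + rng.uniform(-spread, spread)), 2)}
        for r in records
    ]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
sanitised = sanitise(RAW)
synthetic = obfuscate(RAW, rng)

raw_mean = statistics.mean(r["amount"] for r in RAW)
syn_mean = statistics.mean(r["amount"] for r in synthetic)
print("sanitised amounts:", [r["amount"] for r in sanitised])
print("perturbed amounts:", [r["amount"] for r in synthetic])
print("raw mean:", round(raw_mean, 2), "perturbed mean:", round(syn_mean, 2))
```

In the sanitised copy the exact amounts survive, so they can still be matched against another data set. In the perturbed copy the individual values move while the aggregate stays close, which is the property analytics teams usually care about. A production system would need far stronger guarantees than this toy provides.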

Several Arm Insight clients are doing more than just creating analytics around the data. They are using the synthesised data to build new products. At no point is control over the relevancy of the data taken away from the data owner.

Eventually any encryption can be cracked

The problem with any form of encryption is that, given enough compute power, it can be cracked. This is where Chin and Whitney point out that what they are doing is not encryption. It is a wholly different process. They’ve also subjected it to third-party attacks.

Chad Whitney, SVP Data Partnerships, Arm Insight (Image Credit: Arm Insight)

Whitney said: “There are three different buckets of data that we look at. The raw data which has the don’t touch, taboo, fear factor. You have anonymised data which you could pair with some mobile location and different data aspects and deanonymise. Then there is the synthetic data where we have obfuscated all of those data points. Part of the secret sauce is how it’s a deviation from the actual transaction.

“We’ve had multiple third-party firms come in and try and reverse engineer that transaction and no-one’s been able to crack the algorithm and reverse engineer it back.”

Those third parties have all failed. It provides Arm Insight with a set of proof points that make its customers happy. Whitney continued: “That’s where a lot of the banks are coming in and saying we can now get comfortable with this. In the medical field they are getting comfortable with it as well. We are able to satisfy HIPAA, CCPA and GDPR. By the time the data is synthesised and distributed out [it] is a complete non-referenceable data set to the original.”
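
Whitney's second bucket, anonymised data that can be paired with location data and deanonymised, can be illustrated with a toy linkage in Python. All records, names and fields below are invented, and the fixed two-hour shift is only an assumed stand-in for obfuscation, not a description of Arm Insight's method.

```python
# Anonymised transactions: names removed, but place and hour kept exact.
TRANSACTIONS = [
    {"txn_id": "t1", "place": "cafe_12", "hour": 9, "amount": 4.20},
    {"txn_id": "t2", "place": "gym_3", "hour": 18, "amount": 30.00},
    {"txn_id": "t3", "place": "cafe_12", "hour": 13, "amount": 6.10},
]

# A separate, hypothetical mobile location trace that does carry identities.
LOCATION_PINGS = [
    {"person": "alice", "place": "cafe_12", "hour": 9},
    {"person": "bob", "place": "gym_3", "hour": 18},
    {"person": "carol", "place": "cafe_12", "hour": 13},
    {"person": "dave", "place": "park_1", "hour": 9},
]


def link(transactions, pings):
    """Attach a person to each transaction that has exactly one
    location ping at the same place and hour."""
    matches = {}
    for txn in transactions:
        candidates = {
            p["person"]
            for p in pings
            if p["place"] == txn["place"] and p["hour"] == txn["hour"]
        }
        if len(candidates) == 1:
            matches[txn["txn_id"]] = candidates.pop()
    return matches


# Joining the two data sets re-identifies every "anonymised" transaction.
reidentified = link(TRANSACTIONS, LOCATION_PINGS)
print("anonymised data, linked:", reidentified)

# Assumed obfuscation for illustration only: shift each hour by two.
SHIFTED = [dict(txn, hour=txn["hour"] + 2) for txn in TRANSACTIONS]
relinked = link(SHIFTED, LOCATION_PINGS)
print("perturbed data, linked:", relinked)
```

In this toy the exact place and hour act as a join key, so removing names alone does not prevent re-identification. Once the key is perturbed, the same join finds nothing. A real attacker could try fuzzy matching, which is why independent testing of the kind Whitney describes matters.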

Don’t medical staff need identifiable data to do their job?

The healthcare industry is a vast, complex machine that shares significant amounts of data in order to solve medical problems. If the data is completely non-referenceable, ET wanted to know how they were working with it.

According to Whitney, the use case in healthcare is not about clinicians. One challenge that exists in many private healthcare environments is how to separate medical data from administrative data. For example, when accounts access patient data to create a bill, they can often see the whole medical history.

Whitney said: “That’s where we are focusing. Rev cycle management companies that are ingesting enormous amount of PII information that, at the end of the day, isn’t relevant for what they are doing. So there are a couple of ways that we are helping organisations bring that data in to do analytics for their business in a way that they have never really felt comfortable doing.”

Is this a solution to the supply chain problem?

One of the challenges of complex supply chains is data protection. At one end of the scale are companies that do data processing for the data controller. Under GDPR and an increasing number of other privacy laws around the world, the data controller is jointly liable for how the data processor handles the data. Few organisations are able to impose their data controls on their suppliers.

A different example is that of manufacturing companies. Many are now using machinery with large numbers of embedded sensors. Their suppliers use those sensors to create digital twins to know how the equipment is being used. The idea is that this allows them to predict failure based on usage rather than on best guess. The problem is that the data can reveal a lot about how busy a manufacturer is and what their workloads are.

ET asked Chin and Whitney if synthesised data would work in these environments.

Chin replied: “Right now, not that far. The question of how do we push it further along that supply chain is a question we are still trying to figure out ourselves.”

One of the reasons for the reticence to expand into new areas is that Arm Insight only introduced synthetic data in the last 18 months. This means it is a relatively new product to the market. Arm Insight has a lot of domain knowledge in finance and it sees a lot of potential business there. As such, it is loath to expand into other markets until it has to.

Where is the Arm Insight partner ecosystem?

The last two years have seen the UK become one of the biggest markets for fintech start-ups. It has also seen significant growth in insurtech and the emergence of a lawtech market. All of these are investing heavily in data science and have a need for secure access to data.

ET asked Chin and Whitney if they were looking at building an ecosystem that would give them that expansion into other markets through partners. Whitney said: “We’ve had conversations along those lines in different capacities be that cloud providers, analytics infrastructure on top of that or data marketplaces. We are focused on working directly with the customers. At this point we are still having the conversations but we haven’t committed to anything. A partner ecosystem is still something that is still in development.”

Enterprise Times: What does this mean?

Organisations are constantly being told that “data is the new oil.” Most, however, are wary that instead of ending up with a slick new product, they’ll end up with an oil slick that will damage the business. The challenge of how to make data accessible without the risk of a damaging breach is a real one for enterprises. But if they can’t solve it, they won’t be able to monetise the data they hold or develop new products.

Arm Insight is offering them a solution, albeit only for those who operate in the finance and healthcare markets. For some, the idea of synthesised data might sound too good to be true. After all, anonymising data was felt to be the best solution for years.

The question is can Arm Insight satisfy the wider market? At the moment it is playing to its strengths and not overreaching. ET sees this as being a sensible move. There is no point entering a market where you have no domain expertise when there is so much business in your core market.

How long will it be before Arm Insight looks at licensing its technology to partners? If it can find the right partner with the right domain expertise, it seems to make sense that it uses them to grab a larger share of the data analytics market.
