Enterprise Times sat down with Bharath Gowda, VP Product Marketing at Databricks, to discuss the company's latest announcement, Databricks Ingest. As organisations look to create massive data lakes on which they can run analytics and build machine learning models, getting access to the data is a non-trivial task.
Gowda said: “The path towards data science and machine learning and all this analytics is a rather complicated long journey for them to get there. The number one challenge that they face is data. Where is the data? Can I get my hands on the right kind of data sets? How performant is it? How reliable is my data?”
Ingest provides an easy way to bring in data from multiple locations, including the mainframe, SaaS applications and other data stores. But this is far from the end of the story. Merging data sets creates problems with field names, sizes and even types. Anyone who has done ETL will know the complexity it brings. Gowda says this is something Databricks has focused on.
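To make the field name and type problem concrete, here is a minimal Python sketch of reconciling two sources into one shared schema. It is an illustration only and does not use or represent the Databricks Ingest API. The source names, field names and the `FIELD_MAP`, `TARGET_TYPES` and `normalise` helpers are all hypothetical.

```python
from typing import Any, Callable

# Hypothetical mapping from each source's field names to a shared schema.
FIELD_MAP: dict[str, dict[str, str]] = {
    "crm": {"CustID": "customer_id", "EmailAddr": "email", "Spend": "total_spend"},
    "mainframe": {"CUST_NO": "customer_id", "EMAIL": "email", "TOT_SPEND": "total_spend"},
}

# Hypothetical target type (or cleaning function) for each shared field.
TARGET_TYPES: dict[str, Callable[[Any], Any]] = {
    "customer_id": str,
    "email": lambda v: str(v).strip().lower(),
    "total_spend": float,
}


def normalise(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Rename fields to the shared schema and coerce each value to its target type."""
    mapping = FIELD_MAP[source]
    out: dict[str, Any] = {}
    for src_name, value in record.items():
        if src_name not in mapping:
            continue  # drop fields the shared schema does not know about
        target = mapping[src_name]
        out[target] = TARGET_TYPES[target](value)
    return out


# Two sources describing customers with different names and types.
crm_rows = [{"CustID": 1001, "EmailAddr": " Ann@Example.com ", "Spend": "250.50"}]
mainframe_rows = [{"CUST_NO": "1002", "EMAIL": "BOB@EXAMPLE.COM", "TOT_SPEND": 99}]

merged = [normalise(r, "crm") for r in crm_rows] + [
    normalise(r, "mainframe") for r in mainframe_rows
]
print(merged)
```

Even in this toy case, the customer ID arrives as an integer from one source and a string from the other, and spend arrives as text from one and a number from the other. Every such mismatch has to be decided explicitly before the data sets can be merged.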
Gowda talks about the problems of data cleanliness. Marketing teams often outsource data for cleaning prior to major campaigns. They even set limits on how clean the data has to be in order to be acceptable.
Gowda is seeing customers adopt a bronze, silver, gold approach to their data. At one end is the raw, incomplete, dirty data. At the other is data that is clean, organised and complete. Gowda believes that most companies are targeting silver. This is because it allows them to work on the data at a level of cleanliness that they can accommodate. It is also a level where they can begin to build out their machine learning and AI systems.
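As a rough illustration of the tiers, the Python sketch below labels individual records as bronze, silver or gold based on completeness and a basic validity check. The required fields, the email pattern and the thresholds in the hypothetical `tier` function are assumptions made for this example, not rules described by Gowda. Real pipelines would more likely promote data between tiers in stages than label single records.

```python
import re

# Hypothetical fields a record needs before it counts as complete.
REQUIRED_FIELDS = ("customer_id", "email", "total_spend")

# Deliberately simple email check, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def tier(record: dict) -> str:
    """Label a record bronze, silver or gold using assumed cleanliness rules."""
    present = [f for f in REQUIRED_FIELDS if record.get(f) not in (None, "")]
    email = record.get("email")
    email_ok = isinstance(email, str) and EMAIL_PATTERN.match(email) is not None

    # Gold: every required field is present and the email looks valid.
    if len(present) == len(REQUIRED_FIELDS) and email_ok:
        return "gold"
    # Silver: identifiable and mostly complete, so usable for analysis.
    if record.get("customer_id") not in (None, "") and len(present) >= 2:
        return "silver"
    # Bronze: raw, incomplete or dirty.
    return "bronze"


records = [
    {"customer_id": "1", "email": "a@b.com", "total_spend": 10.0},
    {"customer_id": "2", "email": "", "total_spend": 5.0},
    {"email": "x"},
]
labels = [tier(r) for r in records]
print(labels)
```

The silver rule here accepts a record with a missing or malformed email as long as it can still be tied to a customer, which mirrors the idea of working at a level of cleanliness the organisation can accommodate.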
To hear more of what Gowda had to say, listen to the podcast.
Where can I get it?
obtain it for Android devices from play.google.com/music/podcasts
use the Enterprise Times page on Stitcher
listen to the Enterprise Times channel on Soundcloud
listen to the podcast (below) or download the podcast to your local device and then listen there