Why 2021 Is Graph Data Science’s Year for the Enterprise

February 10, 2021

Graph data technology is fast becoming mainstream in enterprise IT. In its ‘Top 10 Data and Analytics Technology Trends for 2020’ report, Gartner states, “Finding relationships in combinations of diverse data, using graph techniques at scale, will form the foundation of modern data and analytics.” Gartner also surveyed companies about using AI and machine learning techniques, finding a remarkably high 92% said they plan to employ graph techniques within five years.

There has been an explosion in academic research in this space underpinning this trend. Over 28,000 peer-reviewed scientific papers about graph-powered data science have been published in the last decade. Graph data science is earmarked as a significant area of focus by the global research community.

Taking your business to the next level with machine learning

Graph data science is a powerful and innovative technique. It can reason about the ‘shape’ of the connected context for each piece of data through graph algorithms. It enables far superior and richer machine learning predictions. In the past, working with massive volumes of connected data was solely the domain of cutting-edge companies with highly trained R&D scientists, like Google. Today, these approaches are available to anyone.
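To make the idea of a data point's ‘shape’ concrete, here is a minimal sketch in plain Python. It assumes a small, made-up graph of accounts and computes two simple structural features for each node: its degree and its local clustering coefficient. The graph, the node names and the choice of features are illustrative assumptions, not something taken from a particular product or from the organisations discussed in this article.

```python
from itertools import combinations

# Hypothetical toy graph: each edge links two accounts that share some detail.
EDGES = [
    ("a", "b"), ("a", "c"), ("b", "c"),  # a tightly knit triangle
    ("c", "d"), ("d", "e"),              # a chain hanging off the triangle
]


def build_adjacency(edges):
    """Return an undirected adjacency map: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def node_features(adjacency, node):
    """Compute degree and local clustering coefficient for one node."""
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        # With fewer than two neighbours no triangle is possible.
        return {"degree": degree, "clustering": 0.0}
    # Count the pairs of neighbours that are themselves connected.
    linked = sum(
        1 for x, y in combinations(sorted(neighbours), 2) if y in adjacency[x]
    )
    possible = degree * (degree - 1) / 2
    return {"degree": degree, "clustering": linked / possible}


ADJ = build_adjacency(EDGES)
FEATURES = {node: node_features(ADJ, node) for node in sorted(ADJ)}

for node, features in FEATURES.items():
    print(node, features)
```

Features like these can be added as extra columns alongside the ordinary attributes of each record before training a model, which is one simple way connected context can feed into a prediction.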

Graph data science can revolutionise the way enterprises make predictions in many diverse scenarios, from fraud detection to tracking a customer or a patient journey. It leverages the connections between data points for more accurate and interpretable predictions. In a drug discovery use case, this means identifying possible new associations between genes, diseases, drugs and proteins, and providing immediate context to assess the relevance or validity of any such discovery. For customer recommendations, it means learning from user journeys to make accurate recommendations for future purchases while presenting options within their buying history to build confidence in suggestions.

The ability to rapidly ‘learn’ generalised, predictive features from data could take organisations to the next level with machine learning. Some companies are still learning how to leverage connected data in their existing machine learning workflows, but there is now less friction in getting started and a growing number of real-world examples and case studies to draw on.

As a starting point, knowledge graphs provide immediate value. They connect concepts and offer a foundation on which more sophisticated approaches – from graph algorithms to deep learning – can be applied. Given this potential payback, data scientists increasingly acknowledge that much of their work isn’t possible without graph technology, from queries that help domain experts uncover patterns to the identification of high-value features for training machine learning models.

Real-world success with graph data science

The British government is using graph data science. In a GOV.UK blog post, data scientists Felisia Loukou and Dr Matthew Gregory discuss deploying their first machine learning model. Built with graph technology, it automatically recommends content to users of the central government online resource, based upon the page they are visiting. They explain that their application learns continuous feature representations for the nodes, which can then be used for various machine learning tasks, such as recommending content.
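The idea of turning each node into a vector of numbers and then recommending similar nodes can be sketched in a few lines of Python. This is not the GOV.UK team's model. As a deliberately simple stand-in for a learned embedding, each page is represented by the probabilities of where a short random walk starting from it would end up, and pages are compared by cosine similarity. The page names and links are hypothetical.

```python
import math

# Hypothetical page graph: an edge means users often move between two pages.
LINKS = {
    "passport-apply": ["passport-renew", "passport-fees"],
    "passport-renew": ["passport-apply", "passport-fees"],
    "passport-fees": ["passport-apply", "passport-renew", "driving-licence"],
    "driving-licence": ["passport-fees", "vehicle-tax"],
    "vehicle-tax": ["driving-licence"],
}
PAGES = sorted(LINKS)


def step(distribution):
    """Advance a probability distribution over pages by one random-walk step."""
    following = {page: 0.0 for page in PAGES}
    for page, mass in distribution.items():
        for neighbour in LINKS[page]:
            following[neighbour] += mass / len(LINKS[page])
    return following


def embed(page):
    """Represent a page as the average of its 1-step and 2-step walk distributions."""
    one = step({page: 1.0})
    two = step(one)
    return [(one[p] + two[p]) / 2 for p in PAGES]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


EMBEDDINGS = {page: embed(page) for page in PAGES}


def recommend(page, k=2):
    """Return the k pages whose vectors are most similar to the given page."""
    others = [p for p in PAGES if p != page]
    others.sort(key=lambda p: cosine(EMBEDDINGS[page], EMBEDDINGS[p]), reverse=True)
    return others[:k]


print(recommend("passport-apply"))
```

A production system would typically learn its vectors from far larger graphs using a dedicated embedding algorithm. The recommendation step, ranking by vector similarity, would look much the same.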

Using data science for predictive maintenance

Graph data science is also essential to US-based Caterpillar, a household-name manufacturer of construction equipment. The firm combines natural language processing (NLP) with a knowledge graph for predictive maintenance. Its IT team recognised valuable data sealed in over 27 million documents. The team created an NLP tool to extract and relate concepts to uncover unseen connections and trends.

The resulting classification tool learns from the portion of the data already tagged with terms such as ‘cause’ or ‘complaint’, then applies those tags to the rest of the data. The system uses WordNet as a lexicographic dictionary while accessing the Stanford Dependency Parser. It then uses text parsing and graph technology to find patterns and connections, build hierarchies and add ontologies. Once this is all applied, users can conduct meaningful, data science-enhanced searches on the newly connected data.

Improving the detection of infections

Another example of graph data science in action comes from the New York-Presbyterian Hospital. Its analytics team uses the technology to track infections and take strategic action to contain them. Its developers found that graph technology offered them a flexible way to connect all the dimensions of an event: the ‘what’, the ‘when’ and the ‘where’.

Empowered with this insight, the team created a ‘time’ tree and then a ‘space’ tree, the latter modelling all the rooms in which patients could be treated on-site. This initial model revealed a large number of inter-relationships, but that alone did not meet project goals. An event entity was therefore added to connect the time and space trees. The resulting data model allows the analytics team to analyse everything in its facilities, so that it can proactively identify and contain diseases before they spread.
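The shape of such a model can be sketched in Python. This is an illustration under stated assumptions rather than the hospital's actual schema: the levels of each tree (year, month and day for time; building, floor and room for space), the patient identifiers and the events themselves are all hypothetical. Each event is attached to every level of both trees, so a question such as "who was on this floor in this month?" becomes a simple intersection.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    patient: str
    what: str
    when: tuple   # path in the time tree: (year, month, day)
    where: tuple  # path in the space tree: (building, floor, room)


# Hypothetical events, invented for illustration only.
EVENTS = [
    Event("patient-1", "treated", (2021, 2, 1), ("main", "floor-3", "room-301")),
    Event("patient-2", "treated", (2021, 2, 1), ("main", "floor-3", "room-301")),
    Event("patient-3", "treated", (2021, 2, 1), ("main", "floor-3", "room-305")),
    Event("patient-4", "treated", (2021, 2, 2), ("main", "floor-4", "room-410")),
]


def index_by_prefix(events, attribute):
    """Attach each event to every level of its tree path (e.g. year, month, day)."""
    index = defaultdict(set)
    for position, event in enumerate(events):
        path = getattr(event, attribute)
        for depth in range(1, len(path) + 1):
            index[path[:depth]].add(position)
    return index


TIME_TREE = index_by_prefix(EVENTS, "when")
SPACE_TREE = index_by_prefix(EVENTS, "where")


def patients_at(when_prefix, where_prefix):
    """Patients with an event under both the given time node and space node."""
    hits = TIME_TREE.get(when_prefix, set()) & SPACE_TREE.get(where_prefix, set())
    return sorted(EVENTS[i].patient for i in hits)


# Who shared a room on a given day, and who was on the same floor that month?
print(patients_at((2021, 2, 1), ("main", "floor-3", "room-301")))
print(patients_at((2021, 2), ("main", "floor-3")))
```

Because the event sits between the two trees, the same structure answers questions at any level of detail, from a single room on a single day to a whole building over a year.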

Graph data science has moved into business

Use cases like these are just a small selection. Gartner’s data industry team predicts that a quarter of global Fortune 1000 companies will have built a graph technologies skills base within three years. This will allow them to leverage graph technologies as part of their data and analytics initiatives.

Graph-enabled data science will become a key part of business analytics, underpinning valuable business insights in 2021 and beyond. Make understanding it a priority, and use it to unlock business advantage for you and your team.

The author is Lead Product Manager – Data Science at the world’s leading graph database, Neo4j

Neo4j is the leader in graph database technology. As the world’s most widely deployed graph database, we help global brands – including Comcast, NASA, UBS and Volvo – to reveal and predict how people, processes and systems are interrelated. Using this relationships-first approach, applications built using Neo4j tackle connected data challenges such as analytics and artificial intelligence, fraud detection, real-time recommendations and knowledge graphs.