Knowledge graphs are a big trend, yet they are not widely known in the data science world. Why? asks Neo4j’s Maya Natarajan.
The Turing Institute frames knowledge graphs as the best way to “encode knowledge to use at scale in open, evolving, decentralised systems.” Put simply, a knowledge graph flexibly captures everything an organization knows about a particular topic.
Using a knowledge graph, we can start to reason about the underlying data and use it for complex decision-making. While not every knowledge graph is built the same way, we’re finding that every graph data science project starts with a knowledge graph. That fits with Gartner’s prediction that by 2025, graph technologies will be used in 80% of data and analytics innovations, up from 10% in 2021.
What is the difference between a graph and a knowledge graph?
You might wonder about the difference between a graph and a knowledge graph. In practice, it’s a short journey from graph to knowledge graph, achieved by adding semantics. That could mean a simple product catalog or a complex ontology like FIBO, the Financial Industry Business Ontology. FIBO is a business model of how all financial instruments, business entities and processes work in the financial industry.
Data can be easily ingested into a graph data store, which is one of the essential elements required for building a knowledge graph. Graph databases enable storage, management, and analysis of connected data in the form of nodes and the relationships between them.
Relational databases store unconnected data as tables. They require resource-intensive JOIN operations for any type of analysis. Relationships between data are computed on the fly, not stored, in an RDBMS.
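As a rough illustration of that difference, the sketch below answers the same question twice: once by matching rows across tables at query time, as a relational engine conceptually does, and once by following relationships stored with each node. The data, table names, and function names are hypothetical and do not come from any Neo4j or RDBMS API.

```python
# Hypothetical toy data: which customers bought which products.
customers = [(1, "Ada"), (2, "Grace")]
products = [(10, "Laptop"), (11, "Monitor")]
purchases = [(1, 10), (1, 11), (2, 11)]  # (customer_id, product_id)


def products_via_join(customer_name):
    """Relational style: match rows across three tables at query time."""
    return sorted(
        p_name
        for c_id, c_name in customers
        if c_name == customer_name
        for pc_id, pp_id in purchases
        if pc_id == c_id
        for p_id, p_name in products
        if p_id == pp_id
    )


# Graph style: each node keeps its own outgoing relationships.
graph = {
    "Ada": {"BOUGHT": ["Laptop", "Monitor"]},
    "Grace": {"BOUGHT": ["Monitor"]},
}


def products_via_graph(customer_name):
    """Graph style: follow the relationships already stored on one node."""
    return sorted(graph[customer_name]["BOUGHT"])
```

Both functions return the same answer. The point of the sketch is only where the work happens: the first recomputes the connection on every call, while the second reads a connection that was stored when the data was loaded.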
Relationships are essential for building a knowledge graph, as they provide the first level of context. We call this ‘dynamic context’. That’s because when you put data into a dynamic structure like a graph database, you get a structure that’s immediately contextually connected to all of its neighbours. And neighbours are connected to all their neighbours, and so on and so forth. The knowledge graph grows and becomes richer as new information is added.
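The idea of neighbours being connected to their own neighbours can be sketched as a breadth-first walk outward from one node. This is a minimal illustration on a hypothetical four-edge graph, not a description of how any graph database implements traversal; the node names and the `within_hops` helper are invented for the example.

```python
from collections import deque

# Hypothetical undirected graph stored as adjacency sets.
edges = [
    ("Ada", "Laptop"),
    ("Grace", "Laptop"),
    ("Grace", "Monitor"),
    ("Monitor", "Acme"),
]
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def within_hops(start, max_hops):
    """Map every node reachable in at most max_hops steps to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]  # the start node is not its own neighbour
    return distance
```

Calling `within_hops("Ada", 1)` reaches only the directly connected node, while raising the hop limit pulls in progressively more context from the same stored relationships.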
Why are we going to all this knowledge graph trouble?
Context is always being dynamically added with graphs. By contrast, if you put information into a knowledge base instead of a knowledge graph, you only get out what you put in. Knowledge bases have a static, shallow context, not a rich, dynamic one.
A knowledge graph lends itself to advanced analytics. That’s because it connects everything you have on a topic, which ultimately fuels new types of discoveries. Because you capture and store relationships, you get out more information than you put in.
To flesh out knowledge graphs, you need semantics. This is the second layer of context. Adding this contextual or semantic layer is the step where you make the data smart. It drives intelligence into the data so we can infer meaning from it. Semantics means meaning, and that’s what confers context.
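One simple way to picture how a semantic layer lets us infer meaning is a class hierarchy: if the graph asserts that an entity belongs to a specific class, the ontology implies it also belongs to every broader class. The sketch below assumes a tiny hypothetical hierarchy (it is not FIBO) and a hypothetical `inferred_types` helper.

```python
# Hypothetical asserted facts as (subject, predicate, object) triples.
facts = [("bond_42", "is_a", "CorporateBond")]

# Hypothetical ontology fragment: each class points to its parent class.
subclass_of = {
    "CorporateBond": "Bond",
    "Bond": "DebtInstrument",
    "DebtInstrument": "FinancialInstrument",
}


def inferred_types(entity):
    """Return the asserted types plus every ancestor class the ontology implies."""
    types = []
    for subject, predicate, obj in facts:
        if subject == entity and predicate == "is_a":
            current = obj
            # Walk up the hierarchy until there is no parent left.
            while current is not None and current not in types:
                types.append(current)
                current = subclass_of.get(current)
    return types
```

Only one fact was stored about `bond_42`, yet a query for all debt instruments can now find it, because the hierarchy supplies the meaning that the raw data lacks.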
Why go to all this trouble? It’s because knowledge graphs enable you to integrate all your data right where it sits. They are not another system to fit into your landscape, but a structure that relates all of the data you have.
Knowledge graphs function as a non-disruptive insight layer on top of whatever existing data landscape or infrastructure you may have in place.
And because knowledge graphs connect all the data, a single knowledge graph can serve multiple purposes. A knowledge graph you initially create for product 360 can also support Bill of Materials (BOM) management, which entails looking at the same kinds of data from different angles.
From siloed data to advanced analytics on connected data
When we asked our customers about knowledge graphs recently, 67% said they had already implemented them. These are responses from large organisations in various vertical sectors across multiple use cases, so there is undoubted momentum. But why are knowledge graphs becoming such a hot topic for enterprises?
Knowledge graphs serve many use cases. There are at least 40 that we know of. They tend to fall into two groups: data management and data analytics.
The data management knowledge graph drives action by providing assurance or insight. Data assurance knowledge graphs focus on data aggregation, validation, and governance. They include examples like data lineage, data provenance, data governance, compliance, and risk management.
Funnelling all your data into a knowledge graph can be a challenge. MANTA, built on a knowledge graph, is an automated lineage platform. It shows users how data flows and transforms in its journey across systems. Lineage and data flows are connected in a knowledge graph, so customers quickly gain a more holistic and scalable view of their data pipelines, allowing better governance, compliance, migration, and metadata activation.
Going beyond the visibility of information
By contrast, the data insight knowledge graph goes beyond the visibility of information and focuses on exploration, deduction, and inference of new knowledge. Examples include X-360, where the X denotes Customer, Patient, Product, or Agent. This category also includes use cases such as identity and access management, AML (anti-money laundering), root cause analysis, recommendations, and many others.
The magic of knowledge graphs comes into play as data scientists run graph algorithms to analyse them as a whole, uncovering patterns and anomalies. The results from these algorithms can enrich the graph and existing ML models with highly predictive features. That’s where data analytics knowledge graphs come in. They improve decisions, forecasts, and predictions and prescribe optimal actions. Use them to predict churn and perform any type of what-if analysis.
Meredith Corporation uses a decisioning knowledge graph to gain deeper insight into its customers and their diverse interests. Using graph algorithms designed for community detection, Meredith turned billions of anonymous data points into strong user profiles that span its media properties. Better knowledge of the user spurred far greater audience engagement, resulting in an increase in web traffic of more than 600%.
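To give a feel for what grouping anonymous data points into profiles can look like, here is a minimal sketch using connected components, one of the simplest grouping techniques, implemented with a union-find structure. The source does not say which algorithm Meredith used, so this is an assumption for illustration only, and the identifiers in the data are hypothetical.

```python
# Hypothetical observations: an anonymous cookie seen with a device or email hash.
observations = [
    ("cookie_a", "device_1"),
    ("cookie_b", "device_1"),
    ("cookie_b", "email_x"),
    ("cookie_c", "device_2"),
]

parent = {}


def find(node):
    """Return the representative of node's group, shortening the path as we go."""
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def union(a, b):
    """Merge the groups that contain a and b."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a


for left, right in observations:
    union(left, right)

# Collect every node under its representative: each group is one candidate profile.
profiles = {}
for node in list(parent):
    profiles.setdefault(find(node), set()).add(node)

# The group id can be attached to each record as a feature for a downstream model.
community_feature = {node: find(node) for node in list(parent)}
```

Here `cookie_a` and `cookie_b` end up in the same profile because they share a device, even though they were never observed together directly. The resulting group id is an example of a graph-derived feature that could enrich an existing ML model.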
Summing up, you can see in a few steps how we’ve gone from data insight knowledge graphs to sophisticated knowledge graphs fuelling AI and machine learning. That’s a typical graph technology journey for our customers, with knowledge graphs at the centre. It’s a powerful trend that we see across every industry and in every data science team.
So for graphs, and particularly knowledge graphs, it looks like The Turing Institute is onto something here. Make sure you and your data science team don’t get left behind.
Maya Natarajan is Senior Director, Product Marketing, at Neo4j, the world’s leading graph technology company.
Neo4j is the world’s leading graph data platform. We help organizations – including Comcast, ICIJ, NASA, UBS, and Volvo Cars – capture the rich context of the real world that exists in their data to solve challenges of any size and scale. Our customers transform their industries by curbing financial fraud and cybercrime, optimizing global networks, accelerating breakthrough research, and providing better recommendations. Neo4j delivers real-time transaction processing, advanced AI/ML, intuitive data visualization, and more.