Graph representing the metadata of thousands of archive documents, documenting the social network of hundreds of League of Nations personnel. By Martin Grandjean [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Graph databases have been the fastest-growing database category of the last five years. There is growing awareness that connected data is the raw material for business success. Neo4j has seen numerous use cases where graphs have displaced existing RDBMS or NoSQL systems.

Which graph database?

Excitement and innovation in the graph category have created a rush to join the fray. Several notable vendors have released graph databases or added graph capabilities to their existing database offerings. But not all graph databases are created equal. CIOs will have to be clear not just on why they need a graph database, but on which graph database to choose.

What makes a good technology choice in the context of enterprise graph software? The primary consideration is whether the database management system is native or non-native in design. As the name suggests, native graph databases are built to handle graph workloads across the entire stack, whereas non-native databases piggyback atop another DBMS. This is not an implementation detail: the design of a graph database strongly influences its performance and dependability.

Two types of graph database

Non-native graph databases tend to come in two flavours:

  1. A graph API on top of some other existing database management system.
  2. Multi-model semantics where one engine claims some level of integrated support for several kinds of data model.

A native graph database exclusively serves graph workloads across its entire stack. From the query language through to the database management engine and file system considerations, and from clustering to backup and monitoring, the native graph database is designed from the ground up for those tasks.

Native technologies tend to process graph queries faster, scale better (retaining their hallmark query speed as the dataset grows), and run far more efficiently on less hardware (we’ve seen reductions in excess of 10x). Conversely, non-native stores optimise for their primary workload at the expense of graphs, which means they struggle with the complexities of supporting multiple first-class models.

Native graph storage

One reason native graph databases outperform is their choice of storage. Storage that is architected specifically for graph data is known as native graph storage. Native graph databases are designed to use the file system in a way that is expressly sympathetic towards graphs, which makes them highly performant and safe for graph workloads. This approach is championed by Neo4j. Because of the on-disk format and the in-memory data structure design, traversing a relationship has a constant cost irrespective of the size of the graph database. [1]

Mechanically, the cost of traversing a relationship between nodes is tiny. This is because of mechanical sympathy between the software and hardware. Neo4j optimises for the fastest thing a computer can do: fetching and dereferencing pointers, which lets it navigate the graph efficiently and at extremely high speeds.

The implication of this is simple and profound: query latency is proportional to the amount of data we choose to explore, not to the total size of the data. And queries are very fast.
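As a minimal sketch of that idea (and not a description of Neo4j's actual storage engine), the Python below models nodes that hold direct references to their neighbours. The `Node` class, the `explore` and `build_graph` functions and the sample names are all hypothetical. Following a relationship is a single reference lookup, so the work done depends on the neighbourhood explored rather than on how many unrelated nodes exist elsewhere in the graph.

```python
from collections import deque


class Node:
    """A node that holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct object references, no index lookup

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)


def explore(start, max_depth):
    """Breadth-first traversal; returns (names reached, relationships followed)."""
    seen = {start.name}
    hops = 0
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour in node.neighbours:
            hops += 1  # one reference followed per relationship
            if neighbour.name not in seen:
                seen.add(neighbour.name)
                queue.append((neighbour, depth + 1))
    return seen, hops


def build_graph(extra_nodes):
    """A small neighbourhood around 'alice' plus an unrelated chain of nodes."""
    alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
    alice.connect(bob)
    bob.connect(carol)
    previous = Node("other-0")
    for i in range(1, extra_nodes):
        current = Node(f"other-{i}")
        previous.connect(current)
        previous = current
    return alice


small = explore(build_graph(10), max_depth=2)
large = explore(build_graph(10_000), max_depth=2)
print(small[1], large[1])  # the same number of relationships is followed in both
```

Both runs follow the same three relationships around `alice`, even though the second graph contains far more nodes in total.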

Real-world project success

Such rapid performance is critical for real-world systems. For example, when building a cross-asset investment-banking platform, one Global 500 financial services company ran into significant challenges with user entitlement and authorisation access. Various levels of users required access to the platform for activities as diverse as portfolio management, equities trading and foreign currency exchange.

For such a high-profile platform, the firm required a robust permissions management solution. With that solution in place, the firm can track which users have access to which assets, activities, decision-making tools and more. Queries can also be run in real time as users are added, removed or audited for fraud prevention. Moreover, since the solution is built on native graph technology, the data is stored dependably and processed performantly for a wide range of business-critical activities.

Non-native performance issues

Non-native storage is optimised for other data models, such as relational, columnar, document, or simple key-value models. While these stores are optimised for their own native model, they’re not set up to exploit graph structure safely or efficiently. To treat columnar, relational, document, or key-value data as a graph, the database management system must perform costly translations to and from the primary internal model of the database.

Database management system designers can try to amortise the cost of such translations through radical denormalisation. However, non-native approaches still risk high latency when querying graphs. There are also well-understood safety risks when persisting graph data, and radical denormalisation tends to amplify them.[2]
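To illustrate where the extra latency can come from, here is a deliberately simplified sketch in which relationships are stored as sorted rows and every hop needs an index search. The `EdgeTable` class and its comparison counter are hypothetical stand-ins; real non-native stores use B-trees, hash indexes and caches, so treat this as a model of the argument rather than a benchmark of any product.

```python
class EdgeTable:
    """Hypothetical stand-in for relationships stored as rows in a non-graph store."""

    def __init__(self, edges):
        # Rows are (source, target) pairs kept sorted, like a table indexed on source.
        self.rows = sorted(edges)
        self.comparisons = 0  # counts index comparisons made while searching

    def neighbours(self, source):
        """Find all targets of `source` by binary-searching the sorted rows."""
        lo, hi = 0, len(self.rows)
        while lo < hi:
            mid = (lo + hi) // 2
            self.comparisons += 1
            if self.rows[mid][0] < source:
                lo = mid + 1
            else:
                hi = mid
        result = []
        while lo < len(self.rows) and self.rows[lo][0] == source:
            result.append(self.rows[lo][1])
            lo += 1
        return result


def two_hops(table, start):
    """Neighbours of `start`, then neighbours of those neighbours."""
    first = table.neighbours(start)
    second = []
    for node in first:
        second.extend(table.neighbours(node))
    return first, second


def make_table(filler_rows):
    """The same tiny path 0 -> 1 -> 2, surrounded by unrelated rows."""
    edges = [(0, 1), (1, 2)]
    edges += [(i, i + 1) for i in range(100, 100 + filler_rows)]
    return EdgeTable(edges)


small_table = make_table(10)
large_table = make_table(100_000)
small_result = two_hops(small_table, 0)
large_result = two_hops(large_table, 0)
print(small_table.comparisons, large_table.comparisons)
```

The two queries return identical results, but the larger table needs more comparisons per hop because each lookup searches an index whose size depends on the total number of rows.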

Native graph databases include transactional mechanisms to ensure that data safety remains impervious to network blips, server failures, and even contention from competing transactions or scaling decisions. Non-native graph architectures, especially those built on eventually consistent stores that lack transactional isolation, will eventually corrupt graph data.
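The risk described above can be sketched with a toy example. Here a relationship must be recorded against both of its endpoints, and a simulated failure between the two writes leaves the graph inconsistent unless the update is applied atomically. `TinyGraph` and its methods are hypothetical, and the snapshot-and-restore rollback only illustrates atomicity; it says nothing about isolation or durability, and it is not how any particular database implements transactions.

```python
import copy


class TinyGraph:
    """Toy graph that records each relationship at both of its endpoints."""

    def __init__(self):
        self.outgoing = {}  # node -> set of target nodes
        self.incoming = {}  # node -> set of source nodes

    def add_node(self, name):
        self.outgoing.setdefault(name, set())
        self.incoming.setdefault(name, set())

    def add_relationship(self, source, target, fail_midway=False):
        """Two separate writes with no protection between them."""
        self.outgoing[source].add(target)
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        self.incoming[target].add(source)

    def add_relationship_atomically(self, source, target, fail_midway=False):
        """Apply both writes or neither, by restoring a snapshot on failure."""
        snapshot = copy.deepcopy((self.outgoing, self.incoming))
        try:
            self.add_relationship(source, target, fail_midway)
        except Exception:
            self.outgoing, self.incoming = snapshot
            raise

    def is_consistent(self):
        """Every relationship must be visible from both of its endpoints."""
        forward = {(s, t) for s, targets in self.outgoing.items() for t in targets}
        backward = {(s, t) for t, sources in self.incoming.items() for s in sources}
        return forward == backward


def consistent_after_failure(method_name):
    graph = TinyGraph()
    graph.add_node("a")
    graph.add_node("b")
    try:
        getattr(graph, method_name)("a", "b", fail_midway=True)
    except RuntimeError:
        pass  # the failure itself is expected; we care about the state left behind
    return graph.is_consistent()


unsafe = consistent_after_failure("add_relationship")
safe = consistent_after_failure("add_relationship_atomically")
print(unsafe, safe)
```

After the simulated failure, the unprotected update leaves a relationship that only one endpoint knows about, while the atomic version leaves the graph as it was before the attempt.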

Native graph query processing

Native graph query processing refers to how a graph database describes, plans, optimises and executes queries. With a native graph system, every layer of the architecture – from the user’s expression in the graph query language to the files on disk – is optimised for storing and retrieving graph data.

Through radical denormalisation, non-native graph databases may try to avoid mechanical penalties. For example, a non-native store may be optimised for three levels of traversal depth by duplicating and co-locating data or by creating extensive indexes for each query. Beyond that depth, traversal performance degrades drastically, whereas the native approach provides consistently high traversal performance at any depth. The upshot is that queries initially seem performant, but there is a mechanical ‘cliff edge’ beyond which latency rapidly increases.
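The ‘cliff edge’ can be pictured with a small sketch. The hypothetical `DenormalisedStore` below pre-computes and stores the answer for every node up to a fixed depth of three, so shallow queries are a single lookup, while anything deeper falls back to hop-by-hop traversal. The class, the helper function and the sample data are assumptions made for illustration, and queries are assumed to ask for a depth of at least one.

```python
def within(adjacency, start, depth):
    """Nodes reachable from `start` in at most `depth` hops (plain traversal)."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        frontier = {n for node in frontier for n in adjacency.get(node, ())} - seen
        seen |= frontier
    return seen - {start}


class DenormalisedStore:
    """Hypothetical store that pre-computes answers up to a fixed depth."""

    def __init__(self, adjacency, materialised_depth=3):
        self.adjacency = adjacency
        self.materialised_depth = materialised_depth
        # Duplicate data up front: one stored answer per (node, depth) pair.
        self.materialised = {
            (node, depth): within(adjacency, node, depth)
            for node in adjacency
            for depth in range(1, materialised_depth + 1)
        }
        self.fallback_traversals = 0

    def query(self, start, depth):
        if depth <= self.materialised_depth:
            return self.materialised[(start, depth)]  # single lookup
        self.fallback_traversals += 1  # past the cliff: traverse hop by hop
        return within(self.adjacency, start, depth)


chain = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"], "e": ["f"], "f": []}
store = DenormalisedStore(chain)
shallow = store.query("a", 3)
deep = store.query("a", 5)
print(sorted(shallow), sorted(deep), store.fallback_traversals)
```

Note the trade-off in the constructor: the store holds one pre-computed answer per node and depth, which is the duplication the text refers to, and that investment stops paying off as soon as a query goes one level deeper than planned.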

Supporting next generation data apps

Neither SQL nor NoSQL can handle graph workloads well enough for modern applications. For example, the eBay ShopBot team created a graph-AI-powered chatbot that acts as a friendly personal shopper for eBay’s enormous catalogue.[3]

Existing product searches and recommendation engines were unable to provide or infer contextual information within a shopping request, leading to suboptimal conversion. By contrast, eBay’s real-time recommendation engine both understands and learns from the contextual language supplied by the shopper and quickly zeroes in on specific product recommendations at Internet scale – all of which is supported by native graph technology.

Dependability

The disconnect between graph data and non-graph storage is problematic for both performance and scalability. The only way to ensure data safety is to update the graph via Atomic, Consistent, Isolated and Durable (ACID) transactions (although, interestingly, old-fashioned strict two-phase commit is not required, and fast modern transaction protocols can be used). Another problem is that maintaining relationships between records demands stronger guarantees than weaker-than-ACID consistency models can provide.

Get the most out of your connections

It’s often convenient to think that non-native graph technology may be ‘good enough’, particularly if you have that non-native technology installed for its native use-case.

But data tends to grow over time. Today’s datasets are more variably structured, interconnected and interrelated than ever before. This means if you want to take advantage of graph technology, native graph technology is what you should use.

We know that the value is in the connections, and a non-native approach puts that value at risk. A native graph database will also serve you better over the long-term and won’t require extraordinary hardware investments.

The choice of graph technology is not always clear in an area of high growth and rapid innovation. But businesses hoping to get the most out of the connections in their data will find the integrity, performance, efficiency and scaling advantages of a native approach are the best bases for long-term success.

[1] https://neo4j.com/whitepapers/rdbms-developers-graph-databases-ebook/

[2] https://neo4j.com/whitepapers/rdbms-developers-graph-databases-ebook/

[3] https://neo4j.com/case-studies/ebay-shopbot/



Neo4j, Inc. is the graph company behind the leading platform for connected data. The Neo4j graph platform helps organizations make sense of their data by revealing how people, processes and digital systems are interrelated. This connections-first approach powers intelligent applications tackling challenges such as artificial intelligence, fraud detection, real-time recommendations and master data.

The company boasts the world’s largest dedicated investment in native graph technology, has amassed more than ten million downloads, and has a huge developer community deploying graph applications around the globe. More than 270 commercial customers, including global enterprises like Walmart, Comcast, Cisco, eBay and UBS use Neo4j to create a competitive advantage from connections in their data.

Neo4j is privately held and funded by Eight Roads Ventures (an investment arm of Fidelity International Limited), Sunstone Capital, Conor Venture Partners, Creandum, Dawn Capital and Greenbridge Investment Partners. Headquartered in San Mateo, Calif., Neo4j has regional offices in Sweden, Germany and the UK. For more information, please visit Neo4j.com and @Neo4j.
