Neo4j has announced new capabilities in its products for cloud and self-managed customers. It claims these will make analytical queries up to 100x faster. They will also allow customers to “run both transactional and analytical processing within one database.”
A third claim is that customers can now “automatically track data changes in real time for faster mission-critical decision making.”
Sudhir Hasbe, Chief Product Officer, Neo4j, said, “Neo4j’s integration of operational and analytical workloads within a single database is now enhanced by the power of parallel runtime and change data capture, empowering our customers with real-time insights, cost-efficient data management, and simplified architecture.
“The results foster quicker decision-making, superior customer experiences, and a competitive edge in the market at a magnitude of speed, performance and agility that is far greater than ever before.”
What are the new capabilities?
In its announcement, Neo4j highlighted four capabilities and benefits of the new release:
- Up to 100X faster performance of analytical queries via Parallel Runtime capability, which adds concurrent threads across multiple CPU cores to run analytical graph queries. Neo4j also leverages a technique called morsel-based parallelism to optimize this capability, for greater scalability, better resource utilization, and seamless multitasking.
- Faster mission-critical decisions now enabled by native Change Data Capture (CDC), which automates the real-time tracking and notification of data changes in the database. CDC is also integrated with Neo4j Connector for Kafka and Confluent, which streams these changes for easier consumption across other data platforms and applications.
- Easier Knowledge Graph creation via new embedding models that predict and find missing relationships and infer new connections within an organization’s knowledge graph for greater semantic understanding.
- New pathfinding algorithms that make complex workflows more efficient by identifying the best sequence and critical paths between nodes on a graph.
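To make the change data capture item in the list above more concrete, here is a minimal sketch of the general idea in plain Python. It is not Neo4j's CDC API. `TrackedStore`, `ChangeEvent` and `changes_since` are hypothetical names invented for this illustration. Every write is recorded as an ordered change event, and a consumer asks for everything that has happened since the last position it saw.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeEvent:
    seq: int          # monotonically increasing position in the change log
    operation: str    # "create", "update" or "delete"
    key: str
    before: object    # state before the change (None for a create)
    after: object     # state after the change (None for a delete)


@dataclass
class TrackedStore:
    """Toy key-value store that logs every change (hypothetical, for illustration)."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def _record(self, operation, key, before, after):
        self.log.append(ChangeEvent(len(self.log) + 1, operation, key, before, after))

    def put(self, key, value):
        before = self.data.get(key)
        operation = "update" if key in self.data else "create"
        self.data[key] = value
        self._record(operation, key, before, value)

    def delete(self, key):
        before = self.data.pop(key)
        self._record("delete", key, before, None)

    def changes_since(self, cursor):
        """Return all events after `cursor`, plus the cursor to use next time."""
        events = [e for e in self.log if e.seq > cursor]
        new_cursor = events[-1].seq if events else cursor
        return events, new_cursor


store = TrackedStore()
store.put("alice", {"age": 30})
store.put("alice", {"age": 31})
events, cursor = store.changes_since(0)
for event in events:
    print(event.seq, event.operation, event.key, event.before, event.after)
```

A downstream consumer, such as a streaming connector, would keep its own cursor and poll or subscribe for new events. That is the pattern that lets other platforms react to changes without rescanning the whole database.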
Morsel-driven parallelism is another big step for Neo4j
These enhancements are significant. Analytics depend on speed and performance to be relevant and timely. A claimed 100x improvement will interest customers. Doing this by using parallel runtime capabilities ensures that the underlying hardware is used more effectively.
Key to that parallel enhancement is the adoption of morsel-driven parallelism. It is not a new technique. However, it has historically required careful tuning and balancing to ensure that the right morsel size is selected. What is not clear with this release is how that will be done.
Will Neo4j dynamically adjust the morsel size based on the query, the data set and the available resources? If so, how long will it take to build a picture of what the customer is doing? How quickly will it adapt when the query goes out of bounds for the available resources and selected morsel size?
The question of resources is especially important in a cloud environment, where customers can add or remove resources as required. In addition, there is no discussion of how this would balance CPU and GPU threads for the analytics. There is also no mention of a mechanism to allow data scientists to select what should be used.
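For readers unfamiliar with the technique, the following is a minimal sketch of the general morsel-driven pattern, not Neo4j's implementation. The input is cut into small fixed-size chunks (morsels) and worker threads pull the next morsel from a shared queue whenever they are free. The function names, the `morsel_size` parameter and the summing workload are all assumptions made for illustration. In CPython, threads do not execute bytecode in parallel, so this shows how work is distributed rather than a real speed-up.

```python
import queue
import threading


def make_morsels(rows, morsel_size):
    """Split the input into fixed-size chunks ("morsels")."""
    return [rows[i:i + morsel_size] for i in range(0, len(rows), morsel_size)]


def parallel_sum(rows, morsel_size=1000, workers=4):
    """Sum `rows` by letting worker threads pull morsels from a shared queue."""
    work = queue.Queue()
    for morsel in make_morsels(rows, morsel_size):
        work.put(morsel)

    partials = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                morsel = work.get_nowait()
            except queue.Empty:
                return  # no morsels left, so this worker is done
            local = sum(morsel)  # process one morsel independently
            with lock:
                partials.append(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # combine the per-morsel results


print(parallel_sum(list(range(10_000)), morsel_size=256, workers=4))
```

The tuning question raised above shows up directly in `morsel_size`. Very small morsels mean more queue traffic and locking per unit of work, while very large ones leave some workers idle as the last few chunks finish.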
In September, the company added native vector search to its core capabilities. It added the ability to provide both explicit and implicit responses to an LLM, improving accuracy, context and explainability. The result is more contextually accurate responses to queries. As organisations begin to roll out private LLMs, vector search offers features that will improve the user experience.
Enterprise Times: What does this mean?
Improving the performance of analytics is key to getting more out of the data organisations hold. However, there are questions here about how the speed is calculated and how best to take advantage of the new capabilities.
There is little on the Neo4j website to show how to deploy morsel-driven parallelism effectively. There is also nothing to show what this means for database configuration, or even whether it can be configured. Customers will want to know more before taking this step. They will want to understand the benefits and configuration challenges. They will also want to be certain that this won’t impact other technologies they are using to improve query performance.
Setting that aside, however, this is an interesting move by Neo4j. Two major announcements in the last two months show that the company is working hard to separate itself from others in this space. NODES 2023 is taking place online at the moment. Will there be talks on the native vector search and these latest capabilities? We can only wait and see.