Hazelcast has just upped the ante for in-memory databases with the release of Jet. It describes Jet as a distributed, parallel processing engine for big data streams. Jet sits on top of Hazelcast’s in-memory data grid (IMDG), which provides the underlying storage. What will interest many is that Jet is open source, which means it can also be used with other storage architectures.
In a statement, Greg Luck, CEO of Hazelcast, said: “Hazelcast Jet is a super fast, low latency, next generation DAG Engine for Big Data processing. We believe that the Hadoop and Spark ecosystems are too complex to program and to deploy and have set out to bring Hazelcast’s legendary simplicity to Big Data. We have designed it as a general purpose engine for the intersect of Big Data programmers and Java programmers. But if you are already a Hazelcast user or have data in Hazelcast it will be the easiest way to solve your Big Data problems.”
Can Jet disrupt the big database vendors?
There is a massive fight brewing in the cloud over in-memory databases. IBM, Microsoft, Oracle, SAP and Amazon are all pushing their in-memory solutions hard. Meanwhile, other vendors, including Huawei, are looking to open source projects to get into this space. Huawei launched its Astro project last year, bringing together HBase and Apache Spark as part of FusionInsight. IBM has also invested heavily in Spark, offering it as a service on its Bluemix developer platform. This is where Hazelcast Jet comes in.
According to unaudited numbers in the press release, Jet outperforms Hadoop by a factor of 20. It also outperforms Spark, while using some of the same techniques as Spark, such as directed acyclic graphs (DAGs). DAGs allow users to share in-memory data and run multiple queries simultaneously. This reduces the need to create multiple in-memory copies of the data. It also removes the problem of record blocking, which in turn reduces problems for users.
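To make the DAG idea more concrete, here is a minimal Java sketch of a generic DAG executor. It is not Hazelcast Jet's API: the `DagSketch` class, the `Vertex` record, the `run` method and the sample data are all hypothetical, chosen only for illustration. A source vertex loads a dataset once, two query vertices (`sum` and `max`) read that same in-memory list in parallel without copying it, and a sink vertex combines their results.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical, generic DAG executor for illustration; NOT Hazelcast Jet's API.
public class DagSketch {
    // A vertex is a named computation over the outputs of its upstream vertices.
    record Vertex(String name, List<String> inputs, Function<List<Object>, Object> fn) {}

    // Runs vertices in dependency (topological) order. All vertices whose inputs
    // are ready run in parallel. Outputs live in one shared map, so downstream
    // vertices read the same in-memory object rather than a copy.
    static Map<String, Object> run(List<Vertex> vertices) throws Exception {
        Map<String, Object> results = new ConcurrentHashMap<>();
        Set<String> done = ConcurrentHashMap.newKeySet();
        List<Vertex> pending = new ArrayList<>(vertices);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            while (!pending.isEmpty()) {
                // Collect every vertex whose inputs have all been computed.
                List<Vertex> ready = new ArrayList<>();
                for (Vertex v : pending) {
                    if (done.containsAll(v.inputs())) ready.add(v);
                }
                if (ready.isEmpty()) throw new IllegalStateException("cycle or missing input");
                // Submit the ready vertices concurrently.
                List<Future<?>> futures = new ArrayList<>();
                for (Vertex v : ready) {
                    futures.add(pool.submit(() -> {
                        List<Object> args = new ArrayList<>();
                        for (String in : v.inputs()) args.add(results.get(in));
                        results.put(v.name(), v.fn().apply(args));
                    }));
                }
                for (Future<?> f : futures) f.get(); // wait for this batch
                for (Vertex v : ready) done.add(v.name());
                pending.removeAll(ready);
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    @SuppressWarnings("unchecked")
    static Map<String, Object> demo() throws Exception {
        List<Vertex> dag = List.of(
            // Source: load the dataset into memory once.
            new Vertex("source", List.of(), in -> List.of(3, 1, 4, 1, 5, 9, 2, 6)),
            // Two independent queries over the same shared list, run in parallel.
            new Vertex("sum", List.of("source"),
                in -> ((List<Integer>) in.get(0)).stream().mapToInt(Integer::intValue).sum()),
            new Vertex("max", List.of("source"),
                in -> ((List<Integer>) in.get(0)).stream().mapToInt(Integer::intValue).max().getAsInt()),
            // Sink: combine both query results.
            new Vertex("report", List.of("sum", "max"),
                in -> "sum=" + in.get(0) + ", max=" + in.get(1)));
        return run(dag);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo().get("report")); // prints "sum=31, max=9"
    }
}
```

The design choice worth noting is that the source's output is stored once and handed by reference to both queries, which is the property the article attributes to DAG engines: multiple queries over shared in-memory data without duplicating it.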
It will be interesting to see how much data Hazelcast can give Jet access to. Spark also allows companies to have multiple in-memory datasets that are addressed by the same query. If Hazelcast can match this and show how Jet handles complex multi-dataset analysis, it will get a lot of attention.
What Hazelcast needs to do now is run a wider range of benchmarks on Jet and publish the results. If it can show the same improvements over Hadoop and Spark as in the benchmark it has already run, it will challenge the major database vendors.
Where is Hazelcast positioning Jet?
On the Jet product page, Hazelcast has identified six areas it sees as typical use cases for Jet. These are:
- Real-time (low-latency) stream processing
- Implementing Change Data Capture (CDC)
- Moving from batch to stream processing
- Fast batch processing
- Internet of Things (IoT) data ingestion, processing and storage
- Data processing microservice architectures
Other applications come readily to mind. Cybersecurity is one where Jet has particular relevance: holding large amounts of real-time data from multiple sources in memory while querying it for behavioural patterns is a real challenge. Another area of interest is fraud investigation, where datasets come from different companies.
It’s very early days for Jet, but the numbers already look encouraging. If it performs as well on other data architectures as it does on Hazelcast’s IMDG, it will be seen as a real challenger to Spark. While the big players are investing heavily in their own database solutions, some, such as Huawei and IBM, are also pouring money into open source solutions.