Apache Cassandra now has an AIOps service designed to make running Cassandra and DataStax Enterprise (DSE) easier. DataStax has released Project Vector into private beta. The service continuously monitors the health and behaviour of a Cassandra cluster and recommends ways to improve performance.
“Our enterprises requested a service like Vector because, as a community, we know it takes a high level of skill to be consistently successful with Cassandra,” said DataStax Chief Product Officer Ed Anuff.
“Our goal is to provide an AIOps service to proactively monitor the health of clusters to help developers and operators be more productive and effective with Cassandra.”
DataStax has already signed up NTT Data and Cisco to Vector.
What is Vector delivering?
The press release calls out four key features for Vector. They are:
Automated expert advice: Proactively identifies current and potential issues to help developers and operators solve problems quickly. Automated advice provides contextual learning with background knowledge to build skills.
Continuous updates: Rules and advice are continuously updated, deployed to SaaS and on-premises applications, and automatically applied to clusters.
Hands-off management: Advanced visualisations of system usage with insightful charting to understand tables, keyspaces, and nodes. Vector helps developers and operators see and understand how the cluster is performing and its configuration without having to log into Cassandra nodes.
Cassandra skills development: Helps strengthen Cassandra skills and knowledge by providing detailed advice and recommendations. Vector helps to reduce unexpected and unplanned items so experts can spend more time innovating.
A deeper dive into the announcement
To understand more about this announcement, Enterprise Times spoke with Anuff and Aaron Morton, Vice President, Engineering – AIOps at DataStax. Morton joined DataStax as part of a recent acquisition. Anuff began by explaining why DataStax has announced Vector.
“DataStax has been focused this year on solving the operator experience. Cassandra has a great reputation for being very powerful, and if you’re dealing with massive amounts of data, one of the best choices. But developers and operators, people who are running the software, say it’s complicated. It’s a distributed system, and there’s a lot of idiosyncrasies that are unique to it.
“We’ve tried to address that in a couple of different ways. We introduced Kubernetes support earlier this year, and we also launched Astra, Cassandra in the cloud. It’s very simple for people who just want to go and sign up and have a turnkey experience.
“We also wanted to improve the experience for every Cassandra user. Aaron has been working on automating a lot of things in Cassandra. Vector is designed to give what we call predictive maintenance. It tells you that the cluster might be having problems that your ops people should look at. Maybe the issue is network latency between individual nodes in the Cassandra cluster because Cassandra is a distributed database. It can also provide recommendations around the database configuration, tables, indexing, load and queries.”
Morton added: “We’ve got the experience to detect most of those things with heuristics and algorithms. It’s about being consistently successful. There are some unique challenges with Cassandra because it’s open-source and it’s a distributed system. To be consistently successful means helping people. Catching developers and giving them better ways to use features. Catching people as they’re doing load testing, detecting hotspots in the data model, and explaining that to them.”
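The interview does not describe how Vector's heuristics work. As a purely illustrative sketch, and not Vector's actual rule, a hotspot check run during load testing might compare each partition's request count with the median across partitions. The function name, the sample data and the ratio of 10 are all assumptions made for this example.

```python
from statistics import median


def find_hotspots(requests_per_partition, ratio=10.0):
    """Return partition keys whose request count exceeds `ratio` times the median.

    Hypothetical heuristic for illustration only; it is not taken from Vector.
    """
    if not requests_per_partition:
        return []
    typical = median(requests_per_partition.values())
    # Guard against a zero median so that quiet tables do not flag everything.
    threshold = ratio * max(typical, 1)
    return sorted(
        key for key, count in requests_per_partition.items() if count > threshold
    )


# Made-up request counts gathered during a load test.
sample = {"user:1": 120, "user:2": 95, "user:3": 110, "user:4": 4800, "user:5": 101}
print(find_hotspots(sample))
```

With this sample data, only the partition that receives far more traffic than its peers is reported. A real service would presumably combine many such signals before advising an operator.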
Is this about automating the solution?
Database teams use scripts to take care of mundane tasks all the time. However, Cassandra instances tend to hold mission-critical data. There is often little time for making changes, and there are strict controls on how those changes are made. Will Vector work in manual or automatic mode?
Anuff replied: “It doesn’t make the changes automatically for you. This is one of the questions we get a lot from an AIOps standpoint. What we hear from operators is they want the recommendations, but they’ll make the changes themselves. They don’t want the system messing with their configuration or installation behind their back.”
How do you get the right data to make those recommendations?
Anuff replied: “Everything in the product right now is around the data model that exists within Cassandra. It’s happening through the agent that’s installed in the Cassandra cluster. We’re not doing monitoring around the cluster on these things. We’re getting it out of the Cassandra diagnostics.
“One aspect of this is we’ve designed it so that it can be deployed fully on-premises for customers that absolutely do not want this data leaving the perimeter of the organisation. Several alpha testers have been financial institutions that are very interested in this, but it has to run fully inside their data centre.”
Once the data is gathered, the models behind the AIOps engine need to be trained on it. It takes a lot of data over time to give accurate recommendations. How is this being done?
Anuff responded: “In terms of the data sets that we’ve trained pieces against, it is based on the same systems that we’re using for our Astra database service. It is what we used for managed Cassandra before that, and so it is based on data models that have been trained against a pretty wide set.
“Part of the reason why we worked with Aaron’s company was that DataStax had a lot of visibility into our own flavour of Cassandra, but Aaron had built his product against Apache Cassandra. I agree that how these algorithms get optimised is about being able to see more and more Cassandra installations. Part of the goal of this is to open it up and get a lot more data points so that we can build these models and train them.”
What about Cassandra derivatives?
Not everyone is using Apache Cassandra or DataStax Enterprise. Will this run on any derivative of the Cassandra code?
Anuff commented: “At the moment, it’s Apache Cassandra, it’s DataStax Cassandra. Amazon Keyspaces is a compatibility layer on top of DynamoDB. Some of these things may apply to that, and we’ll try to figure that out. Cosmos DB is a similar thing. It’s a database engine that Microsoft created. ScyllaDB is at least based on the Cassandra architecture, although they’ve rewritten it from the ground up in C++.
“There’s enough Cassandra users so yes, people will be expecting this to work with their Cassandra look-alike. At the moment we’re focusing on all the different versions of Cassandra people are using, 2.3, 3.0 and now 4.0 is coming out. You’ve also got the DataStax flavour of it.”
Morton responded: “Apache Cassandra and DSE could have a cluster that’s in six regions around the world. Internally, we have metrics that help us understand message loss and latency between those regions. In a SaaS model, we can’t, in any way, provide that level of expertise against something like Keyspaces. What we can do with Astra, the Cassandra as a Service model, is we’ll be able to get that and help our internal ops team understand if there are network issues.
“It’s great that Apache Cassandra is seen as the gold standard for distributed databases. Our next goal after Apache Cassandra and DSE is to integrate with Astra, Cassandra as a Service. We’ll be able to add this as value to anyone on that platform. When it comes to Keyspaces, Cosmos, etc, we could add value. But they’re on their own journey, and we’ve got some challenges we want to get through first.”
Enterprise Times: What does this mean?
Reducing and simplifying the workload on DBAs is a good thing. The more complex the software, the greater the risk of error, and errors can have catastrophic consequences for the enterprise. By introducing AIOps to Apache Cassandra and DataStax Enterprise, DataStax is responding to issues raised by its users. It’s a sensible move, and it keeps the company in step with competitors.
However, the question is, how far can it go? Is DataStax going to allow the recommendations to be examined to see exactly how they were arrived at? This will help give users greater trust in what the AIOps product does.
Will it provide the option to automate some of those tasks? It’s definitely a tricky option, but when it comes to deciding the right sequence to update and refresh a cluster, an automated solution can work and respond to issues faster than a human. The challenge here is persuading customers to trust a greater degree of autonomy. That said, Oracle seems to be achieving this, and other database vendors are beginning to look at self-patching and self-tuning capabilities.
The more significant challenge will be extending the AIOps service to the various Cassandra derivatives. They make up a large user base and, importantly, a potentially lucrative one. Expanding AIOps to those solutions would be good news.