Couchbase has released a developer preview of Generative AI capabilities for its Capella Database-as-a-Service (DBaaS) called Capella iQ. The company claims that it will “significantly enhance developer productivity and accelerate time to market for modern applications.”
Scott Anderson, SVP of product management and business operations at Couchbase, said, “Code that used to take hours for a developer to write will now be generated in a matter of minutes in sample sets from Capella iQ.
“This makes developers more efficient when building modern apps, ultimately accelerating innovation for customers. By incorporating generative AI into our fully managed DBaaS, we are making it easier for developers to get started with Capella and significantly boost their productivity.”
The company goes on to say that “Capella iQ enables developers to write SQL++ and application-level code more quickly by delivering recommended sample code.”
What is this all about?
Speeding up the developer experience is nothing new. Integrated development environments (IDEs) have evolved significantly over the last three decades. They have gone from basic scripting interfaces to tools that offer code completion and syntax checking, among other things.
Couchbase wants to go further. With this announcement, it wants to allow the developer to ask questions of Capella iQ and have it deliver syntactically correct SQL++ code. To do that, it will use a dedicated LLM built by and run by Couchbase. It also means that Capella iQ will understand best practices for writing SQL++.
Importantly, this is aimed at developers, not end-users looking for something to write queries for them. As such, it should have a greater understanding of SQL++ and Couchbase products.
But, there are many questions to be answered here. For example, how much developer time will be saved by having Capella iQ write code? How will that code be tested? Will developers get lazy and let their skills atrophy by just handing off code writing? How optimised will the code be? Who owns the code in terms of copyright? Who will check the code – this is a key question given the tech industry’s history of sample code being used in production apps and creating security and other issues.
Getting a deeper view
To get an understanding of some of these questions, Enterprise Times talked with Jeff Morris, VP Product & Solutions Marketing, Couchbase.
What is the 1,000-foot view of what you are announcing?
Morris replied, “We’re announcing a developer preview of Capella iQ. It’s a ChatGPT-powered coding assistant built into the Couchbase workbench, which is our user interface inside of Couchbase Capella.
“Capella is the Database-as-a-Service for Couchbase. It allows the developer to have conversational conversations via a chatbot with Capella iQ. Everything is scoped to the context of Couchbase, so you can’t go and do weird things and use the account for accessing the LLM to do things outside of the context of Couchbase.”
What does it know about?
Morris said, “It knows the SQL++ query language, the programming language of the developer. It also knows the structure of Couchbase itself. The notion of, I have a database, the database has scopes inside of it, which are groups of collections. Collections are like a table, and then there are documents inside of the collection. It was a data structure that we introduced pervasively across the database, about a year and a half ago, as an organising mechanism.”
Morris continued, “Capella iQ understands that, and will create data for you, create new documents, it will build indexes for you, based on a query that you ask it to write. You might ask it to recommend an index for a query. It will also build full-text searches for you.”
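To make the hierarchy Morris describes concrete, here is a minimal Python sketch of how a scope and collection path maps onto a SQL++ query and a matching index. It is an illustration only, not output from Capella iQ. The `Keyspace` class and the helper functions are hypothetical names, the top level is assumed to be what SQL++ keyspace paths call a bucket, and the bucket, scope and collection names are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyspace:
    """Hypothetical helper: a fully qualified path, bucket -> scope -> collection."""
    bucket: str
    scope: str
    collection: str

    def path(self) -> str:
        # Backticks quote each identifier so names containing hyphens stay valid.
        parts = (self.bucket, self.scope, self.collection)
        return ".".join(f"`{part}`" for part in parts)


def select_by_field(ks: Keyspace, field: str) -> str:
    """Build a SQL++ query that filters one collection on one field.

    $value is a named parameter, to be supplied when the query is run.
    """
    return f"SELECT c.* FROM {ks.path()} AS c WHERE c.`{field}` = $value"


def index_for_field(ks: Keyspace, field: str) -> str:
    """Build a CREATE INDEX statement on the field used in the filter."""
    name = f"idx_{ks.collection}_{field}"
    return f"CREATE INDEX `{name}` ON {ks.path()}(`{field}`)"


# Example values only; substitute your own bucket, scope and collection.
ks = Keyspace(bucket="travel-sample", scope="inventory", collection="airline")
print(select_by_field(ks, "country"))
print(index_for_field(ks, "country"))
```

The sketch only builds statement text. Running the statements, and deciding whether the index is the right one for a given workload, remains the developer's job.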
Morris sees Capella iQ as a smart assistant that will take on a task from the developer, allowing them to focus on the problem in hand, and not on writing code. But this is more than just a code generator. He also believes it will help debug a piece of code to speed up the resolution of problems.
He commented, “One of my engineering leads keeps pointing out to me, you have to remember how text-oriented a developer is. You’re writing scripts all the time. This is a very, very natural way to be conversing with something to help you do your work.”
Code security
Over the last few decades, we’ve seen many problems with sample code. It’s not production ready, and that has stopped many vendors from releasing the same level of sample code they did in the ’80s, ’90s and even ’00s. How can you be sure that although the code may be syntactically and functionally correct, it is also secure? Because once you hand that code to a developer, they are not necessarily going to do anything other than deploy the code.
Morris responded, “Code security is the responsibility of the developer, whether they were using this tool or not. We are trying to assist them in building test cases for the code that will be generated from this. But this is still an area where we recognise we’ve got to keep the good guardrails on what we’re doing in terms of interaction.”
The plan is to help test teams and developers build valid test scripts for any code generated by Capella iQ. That’s a really important step, but those scripts will still need significant oversight. Without that, Capella iQ is simply marking its own homework, and that is a dangerous route. Interestingly, Morris also sees a real opportunity to build test pipelines for teams doing CI/CD. If it closes that gap, so that test pipelines can be built and dropped into projects, that will be a huge leap.
Data security
It is not just code security that is an issue. There have been too many cases of people leaking sensitive data to LLMs. That has caused many large organisations to start blocking staff from using tools such as ChatGPT and Bard. How will Capella iQ prevent data leakage?
Morris sees this as a matter of organisations adopting best practices. He said, “Don’t give personal information, don’t give intellectual property to the LLM unless you’ve licensed the enterprise version of ChatGPT.”
Having a private instance makes a lot of sense but is likely to be out of the reach of many organisations due to the cost, complexity and challenges of maintaining the model.
With the code generated here, Morris made the point that “the database itself is still designed to be enterprise-level secure. All of our RBAC controls are enforced, and all of our code execution rules are enforced inside of the eventing engine. But, because we’re part of the larger application stack for the new product development team, there are the pieces that I can control and pieces that I can’t control.
“I recognise that that is an area that I have to be sensitive to, but it’s also the development team that still has that degree of responsibility for what they built here.”
Preventing others from seeing my queries and data
If I’m using this AI assistant tool to write queries against my data, at some point it will see the underlying data. I cannot simply obfuscate the data because then it cannot know what it is querying. How do you ring-fence the data when helping the developer write queries, so that it cannot be cleverly extracted by somebody from another organisation using the model?
Morris replied, “Much of it is following your existing rules. Let the AI generate test data for us. Let it read the structure of your sample JSON document and a John Doe record that you’ve created. Then ask it to make more data that looks real, but make it not real.
“You could arguably generate a good asset for yourself in the billions of records if you want it to because size is usually your biggest desire. It’s why developers want to clone the customer database in the first place. That problem we can assist with.
“But that larger one of sharing your sensitive information with your LLM is a universal AI issue right now. The LLM providers are all going to end up realising we’ve got a copyright problem. We’ve read too much, and we know too much. Be very, very careful about the production-level database or production-level data that you work with.
“Our customers store really sensitive data. We store account information, and JSON is the greatest format for storing account information about all kinds of things. An AI could take advantage of that, and we have to be careful about people learning from your learnings.”
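The approach Morris outlines, reading the structure of a sample “John Doe” document and generating more records that look real but are not, can be sketched without any AI at all. The Python below is a minimal illustration under stated assumptions: `fake_value` and `synthesise` are hypothetical helpers, the sample document is invented, and simple random values stand in for the realistic-looking data an LLM would produce. The point is that only the shape of the document is reused, never production values.

```python
import random
import string


def fake_value(sample, rng):
    """Return a random value with the same type and rough shape as `sample`."""
    if isinstance(sample, bool):  # check bool before int: bool is a subclass of int
        return rng.random() < 0.5
    if isinstance(sample, int):
        return rng.randint(0, max(1, abs(sample) * 2))
    if isinstance(sample, float):
        return round(rng.uniform(0, max(1.0, abs(sample) * 2)), 2)
    if isinstance(sample, str):
        length = max(1, len(sample))
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    if isinstance(sample, dict):
        return {key: fake_value(value, rng) for key, value in sample.items()}
    if isinstance(sample, list):
        return [fake_value(item, rng) for item in sample]
    return None  # anything else, such as a JSON null, stays null


def synthesise(sample_doc, count, seed=0):
    """Generate `count` fake documents that mirror the structure of `sample_doc`."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [fake_value(sample_doc, rng) for _ in range(count)]


# An invented sample record; no real customer data is involved.
john_doe = {
    "name": "John Doe",
    "age": 42,
    "balance": 1250.75,
    "active": True,
    "address": {"city": "Springfield", "zip": "00000"},
    "tags": ["new", "retail"],
}
records = synthesise(john_doe, count=3)
for record in records:
    print(record)
```

Because every generated record has the same keys and value types as the sample, queries and indexes written against the fake data should still apply to the real collection, while nothing sensitive is shared with the assistant.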
The problem is more than just the data and the queries. There is a risk here of people discovering the structure of my database and using that to craft attacks. How can a customer be sure someone isn’t learning about their database structures?
Morris commented, “My initial answer is we’re trying to keep the context of everything, to your organisation and your use of pathways in your project. Get it scoped down and then encourage the LLM to control the boundaries of access.”
How much time can be saved?
How much time will be saved? You are talking about better syntax, better code and faster delivery. Have you done some testing? What metrics are there to show how much time people save? For example, will it be five man-hours per developer per month? Do you have any hard numbers?
Morris responded, “In conversing with other analysts who talk to developers all the time, I try and get them to tell me the same answer. It’s like 30% to potentially 50% more efficient for development. I don’t have my exact block of time.
“You’ll have a better ability to maintain your contextual train of thought while you’re writing something in a much, much easier way. You don’t get hung up for an hour or two on a particular weird function that you wrote or somebody else wrote. You are not spending time to try and figure that out.”
Morris went on to say that the time savings are based on anecdotal rather than quantitative data. Couchbase is running surveys to understand more.
How optimised will code such as queries be?
What about the optimisation of queries? Have you looked at the queries generated in your own testing to see how optimised they are for a particular customer’s database environment? Or is that something that’s going to have to come over time as the body of knowledge increases?
Morris admits that this is only going to come over time. Of interest to many customers will be his statement that “in the last two years, we’ve picked up about five patents, and they all tend to be around query optimization, index optimization. We’re storing everything to be faster and faster and more efficient.
“In the most recent release of the main database, which is underpinning all of this, we introduced a new cost-based optimizer. We’ve carried that over now into not only our original query engine, but also the analytics engine, and we’re starting to carry all of that over into the full-text search engine as well.”
Enterprise Times: What does this mean?
Over the last decade, low-code and no-code environments have become mainstream. They are all about speeding up the time to deliver applications. It’s long overdue, given that every organisation is struggling with a massive backlog of software development.
One area that has not had enough attention has been the database. What Couchbase is doing here is to put the focus solely on the database. This is not just about writing faster code. It is about writing more reliable, more trustworthy code and making life easier for the enterprise developer. If Couchbase succeeds, it will be the biggest change to database development in a very long time.
There is still a lot to do, however. This is still a developer preview, and there are many unknowns. How good will the code be? Will it save significant time? This latter question depends as much on the quality of the code as it does on the ability to test and trust the code.
It will be interesting to see how this preview goes. Will Couchbase present Capella iQ as a way to build effective data pipelines in its next webcast series? Will we have to wait until Couchbase Connect 2024 to get a realistic look with real data around quality and time saved? Only time will tell.