Teradata is bringing generative AI to VantageCloud Lake with the launch of ask.ai. The solution is currently available only on VantageCloud Lake installations on Azure. A private preview is promised for customers on AWS, and general availability is expected in H1 2024.
Hillary Ashton, Teradata’s Chief Product Officer, said, “Teradata ask.ai for VantageCloud Lake enables enterprises to quickly get to the value of their data, wherever it is, and democratizes AI and ML.
“Teradata was recently called out by Forrester for its strong vision and strategy that includes AI/ML at scale and now Teradata ask.ai takes this even further with a dramatic improvement in productivity and ease of use. Enterprises choose Teradata’s open and connected platform which empowers AI and ML at massive scale, harmonizes data, and delivers a price-per-query advantage.”
What is ask.ai?
Teradata describes ask.ai as a “natural language interface designed to allow anyone with approved access to ask questions of their company’s data and receive instant responses from VantageCloud Lake.”
It goes on to say, “By reducing the need for complex coding and querying, ask.ai can dramatically increase productivity, speed and efficiency for technical users and expands analytic use to non-technical roles, who can now also use Teradata’s powerful, cloud-native platform to sift through mountains of data and draw insights.”
The announcement lists a set of key use cases that include:
- Data Insights: Teradata ask.ai enables quicker data exploration and insights. Users can ask questions in natural language and receive instant responses, eliminating the need to manually construct queries and scripts.
- Model and Code-generation: Those without extensive coding experience can now create code snippets by expressing their intention in natural language, democratizing the process of data analytics. Data scientists will see accelerated development by reducing the need to manually write code, which is also intended to reduce syntax errors, increase code consistency, and enhance general productivity as developers can focus on high-level logic and problem-solving.
- System Administration: It becomes easier and faster to retrieve system information related to VantageCloud Lake, such as environment and compute groups. An administrator can log in and simply ask questions about the system (such as, “What is the state?” or “What is the current consumption?”) as if speaking to an informed colleague.
- Metadata Analysis: Teradata ask.ai can provide information on table design, making it easier to explore datasets and schemas, helping users understand the nuances in data attributes and existing relationships between datasets. This makes it faster to understand the data available for building analytics, providing deeper insights.
- Help: Any user can ask for help on a wide variety of topics, including general documentation, questions about what Teradata functions exist in a particular database, detailed descriptions for a particular function, SQL generation for that function, and more.
It is easy to gloss over these announcements and position them as saving time for all users who need to access corporate data. What such an approach misses is the time and effort required to prepare the data and the underlying LLM. Additionally, there are challenges with the maintenance of the underlying model and how the data will be kept current and accurate.
Who will use this?
Enterprise Times took the opportunity to ask a number of questions of Teradata about the announcement. We started by asking, is this aimed at end-users? The reason for the question is that the announcement talks about using it for data insights along with other tasks that are more IT-related.
Teradata responded with, “The natural language interface is designed to allow anyone with approved access to ask questions of their company’s data and receive instant responses from VantageCloud Lake. This includes technical roles, like data scientists, as well as end-users of such data within an organisation, like a line-of-business manager.”
It will be interesting to see how this plays out. Many of the use cases seem to be aimed at technical teams, not end-users. What is not clear is, when used for generating code, how Teradata will test that code.
Other database vendors have provided details on code testing and even incorporated unit tests into their products to verify code. Additionally, if used for code generation, how optimised will that code be?
What work is required to build and manage the underlying LLM?
It takes time to build and verify an LLM and to manage and maintain it. When asked about building the LLM, Teradata stated, “Teradata is using Microsoft’s OpenAI Service set to provide this functionality. We have a great relationship with Microsoft and think highly of the LLMs available to this service.”
It seems that customers will not build their own LLM, which raises the question of what will happen with both the AWS and GCP versions when the product goes GA.
When it comes to management, Teradata replied, “Customer IT departments will not need to manage the model. Utilizing the RAG technique, the data will be dynamically available to the model at runtime.”
The RAG model (Retrieval Augmented Generation) is a technique first developed at Meta that is being adopted by a number of companies, including Teradata and its competitors. Because data is retrieved and supplied to the model at query time, rather than baked into its training, it makes it easier to bring in new data.
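The RAG flow can be sketched in a few lines. This is a minimal illustration, assuming a toy in-memory document store and naive keyword retrieval; a production system would use vector embeddings and a real LLM call, and none of the names below come from Teradata.

```python
# Minimal RAG sketch: retrieve relevant documents at query time,
# then ground the model's prompt in that retrieved context only.

def retrieve(question, documents, k=2):
    """Rank documents by naive word overlap with the question (toy retriever)."""
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, context_docs):
    """Instruct the model to answer only from the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

docs = [
    "The compute group 'analytics' has 4 active nodes.",
    "VantageCloud Lake environments are provisioned per region.",
    "Quarterly revenue grew 8 percent year over year.",
]
question = "How many nodes are in the analytics compute group?"
prompt = build_prompt(question, retrieve(question, docs))
```

The key property is visible in the sketch: updating the document store changes the answers without retraining or redeploying the model, which is why Teradata can say customer IT departments will not need to manage it.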
What data is involved, and how is context generated?
Two of the key questions for building an LLM are what data is used and how context is built. Context is always critical for understanding data.
Teradata responded with, “VantageCloud Lake supports virtually all types of data. Teradata ask.ai has access to only a certain set of data, using the same role-based access controls that Teradata customers have come to rely on.”
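Teradata does not describe the mechanism, but the principle of role-based access control in a RAG pipeline can be sketched: candidate documents are filtered by the user's roles before any of them can reach the model as context. The structure and names below are purely illustrative, not Teradata's implementation.

```python
# Hypothetical sketch: role-based filtering applied before retrieval,
# so the model never sees documents the asking user cannot access.

def accessible_docs(user_roles, documents):
    """Return only documents whose ACL intersects the user's roles."""
    return [d for d in documents if d["acl"] & user_roles]

documents = [
    {"text": "Payroll table schema: employee_id, salary", "acl": {"hr_admin"}},
    {"text": "Sales table schema: region, amount", "acl": {"analyst", "hr_admin"}},
]

# An analyst asking about table design only ever retrieves the sales schema.
visible = accessible_docs({"analyst"}, documents)
```

Filtering before retrieval, rather than after generation, matters: it prevents restricted data from ever entering the prompt, rather than trying to redact it from an answer.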
What that certain set of data consists of was not clarified.
In terms of context, Teradata again pointed to the RAG model, saying, “All of the queries fall into the category of RAG (Retrieval Augmented Generation), meaning that ask.ai is basically asking the model to answer based on the information provided to it. RAG is the standard being adopted by the wider community.”
What is not clear is how this is going to work on both corporate data and public data. Both change all the time. RAG helps by reducing the amount of time spent updating the data set. However, customer data is rarely sitting in one location or cloud.
That led to another question. Will this be a single LLM or, indeed, multiple LLMs either based on different clouds or different sets of data?
Teradata responded with, “We are spinning up an Azure AI service for each customer and using a single tenant boundary to ensure customer data stays within the boundary.”
Protecting sensitive data
One of the major challenges of the wider use of generative AI has been that of sensitive data leakage. It has led many organisations to start to ban the use of public LLMs to protect data. When asked how sensitive data will be protected from exposure due to questions, Teradata provided its longest answer.
“The Microsoft OpenAI Service is designed with security measures that Teradata has implemented in order to deliver a low-risk solution to its customers. As stated, we are spinning up an Azure AI service for each customer and using a single tenant boundary to ensure customer data stays within the boundary.
“Beyond using a secure Azure AI service for each customer and safe tenant boundaries for data segregation, we also offer robust data encryption both at rest and in transit. This reinforces the security of the OpenAI usage.
“To ensure utmost security and data privacy, we strictly follow the principle that customer prompts or data are not used for training our AI models. Each customer’s data is isolated within their tenant boundary.
“Also, data is pulled out of the database at the global level, and can then be passed down to the AI Service in the tenant. But tenant data is not passed back up to the global level as the AI Service lives in the tenant.”
Enterprise Times: What does this mean?
There are still questions here to be answered by Teradata. For example, a single instance on Azure is not going to suit every customer. Nor will many want to move all their data to Azure. Customers will not want to limit ask.ai to just one set of data. They will want it to use their entire data set, and this is not going to be possible, at least until 2024.
There are also questions on how Teradata will incorporate code generated by ask.ai into customers’ software lifecycle. Testing, verification of code, optimisation of code and especially queries, all need more explanation. This is not about using RAG but about ask.ai speeding things up for Teradata customers.
Despite this, there is a lot of interesting stuff in this announcement. Partnering with Microsoft and building the first generation on Azure says more about where Teradata sees its customer base than anything else. That AWS is only just entering private preview, and there is no clear mention of GCP, the third major public cloud that Teradata supports, is a surprise. Hopefully, this will all be resolved by H1 2024, when the product is generally available.