Piggy bank © 2018. Image by 3D Animation Production Company from Pixabay (https://pixabay.com/illustrations/piggy-bank-money-save-finance-3612928/)

CAST AI has announced the launch of AI Optimizer at Google Cloud Next ’24. The company claims that the AI Optimizer service reduces the cost of deploying Large Language Models (LLMs). The LLMs run on CAST AI-optimised Kubernetes clusters, which the company claims will unlock unprecedented Gen AI savings.

Leon Kuperman, CTO at CAST AI (image credit: LinkedIn)

CAST AI Co-Founder and CTO Leon Kuperman said, “Not all large language models are created equal. Some may be more efficient than others in terms of cost, performance, and accuracy across numerous use cases. But organizations haven’t had a way to identify and deploy the most optimal model in terms of performance and cost.

“We’re addressing that gap by extending our expertise in automation to LLM optimization. What makes AI Optimizer so compelling is that it significantly reduces costs without requiring organizations to swap their existing technology stacks or even change a line of application code, which will help democratize generative AI.”

How does the AI Optimizer service work?

According to the company, the AI Optimizer service is designed to integrate with any OpenAI-compatible API endpoint. Once it has identified the LLM in use, it searches both commercial and open-source models for the best combination of performance and inference cost.
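Because the service targets OpenAI-compatible endpoints, the likely integration point is the API base URL rather than application code. The sketch below illustrates that general pattern using the standard `openai` Python client; the proxy URL is a hypothetical placeholder, as CAST AI has not published endpoint details in this announcement.

```python
# Minimal sketch, assuming the optimizer sits in front of the application
# as an OpenAI-compatible proxy. The base URL is hypothetical.
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-proxy.example.com/v1",  # hypothetical proxy endpoint
    api_key="YOUR_API_KEY",                       # placeholder credential
)

# Application code is otherwise unchanged; only the endpoint differs.
# A proxy at this layer could route the request to whichever model it
# judges optimal for cost and performance.
response = client.chat.completions.create(
    model="gpt-4o",  # the model the application asks for
    messages=[{"role": "user", "content": "Summarise our Q3 cloud spend."}],
)
print(response.choices[0].message.content)
```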

AI Optimizer does several things to deliver performance and cost savings. Among them, it:

  • Identifies the number of users and usage patterns: This matters because many LLMs rely on usage-based pricing, so more users means higher costs. Analysing usage patterns shows how costs can grow, and it allows an organisation to weigh usage and benefits against costs to prevent an LLM becoming a net cost to the business (see the sketch after this list).
  • Analyses the cost benefits of model fine-tuning: Models will need to be fine-tuned as users make more use of them. Fine-tuning keeps results accurate and ensures they deliver the best benefits for an organisation.
  • Creates budgets: Using a range of criteria, the AI Optimizer service can deliver budgets and alerts for customers. These allow them to keep control of costs and identify unexpected cost increases.
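To make the usage-based pricing point concrete, here is a minimal sketch of the arithmetic involved. The model names, per-token prices, usage figures, and budget are all invented for illustration; they are not CAST AI's data or its method.

```python
# Illustrative sketch of usage-based cost estimation and a budget alert.
# All prices, model names, and usage figures below are assumptions.

PRICE_PER_1K_TOKENS = {      # hypothetical blended prices, USD per 1K tokens
    "commercial-large": 0.030,
    "commercial-small": 0.002,
    "open-source-hosted": 0.0008,
}

def monthly_cost(model: str, users: int, tokens_per_user_per_day: int) -> float:
    """Estimate monthly spend: with usage-based pricing, cost scales with users."""
    monthly_tokens = users * tokens_per_user_per_day * 30
    return monthly_tokens / 1000 * PRICE_PER_1K_TOKENS[model]

BUDGET = 5_000.0  # hypothetical monthly budget, USD

for model in PRICE_PER_1K_TOKENS:
    cost = monthly_cost(model, users=2_000, tokens_per_user_per_day=4_000)
    flag = "  <-- over budget" if cost > BUDGET else ""
    print(f"{model}: ${cost:,.2f}/month{flag}")
```

Even this toy example shows the effect the article describes: at 2,000 users the hypothetical large commercial model exceeds the budget while cheaper alternatives come in far under it, which is the kind of trade-off an optimiser would surface.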

All of these, CAST AI claims, mean that organizations can expect substantial cost reductions on AWS, Azure, and GCP.

Enterprise Times: What does this mean?

In the rush to cloud services, the mantra was that OPEX would save considerable sums of money over CAPEX. By and large, that has been true, but the early savings have been eroded by increased, and often uncontrolled, use of cloud services. While that increased use has brought benefits, the associated costs are making companies rethink their use of certain cloud services.

Recognising this, CAST AI is moving to provide a means for organisations to understand, and perhaps reduce, the cost of using an LLM. The key word here is perhaps. There is no definitive evidence yet of how accurate the budget advice will be or whether the promised savings will materialise. It should also be noted that this is only for LLMs that expose OpenAI-compatible APIs.

It will be interesting to see how quickly CAST AI delivers case studies that substantiate the cost savings it is promising. If it can show significant savings, it will push others to do more to make LLM costs transparent. It may also make commercial LLM vendors rethink their charging policies.
