Predibase has launched a software development kit (SDK) for efficient fine-tuning and serving of Large Language Models (LLMs). The company claims the SDK will dramatically increase training speed while reducing deployment costs and complexity. Alongside it, Predibase has released the Predibase AI Cloud, which uses Nvidia A100 GPUs and which, the company says, will efficiently train the largest open-source LLMs.
Dev Rishi, co-founder and CEO of Predibase, said, “More than 75% of organizations won’t use commercial LLMs in production due to concerns over ownership, privacy, cost, and security, but productionizing open-source LLMs comes with its own set of infrastructure challenges.
“Even with access to high-performance GPUs in the cloud, training costs can reach thousands of dollars per job due to a lack of automated, reliable, cost-effective fine-tuning infrastructure. Debugging and setting up environments require countless engineering hours. As a result, businesses can spend a fortune even before getting to the cost of serving in production.”
What is Predibase releasing?
The key part of this announcement is the SDK, and Predibase is making some bold claims about it. What is interesting is that the performance gains from using the SDK are also tied to Predibase's lightweight, modular LLM architecture. That makes it hard to separate the gains delivered by the SDK from those delivered by the architecture.
So what are those claims? Predibase is promising:
- A 50x improvement in training speed for task-specific models.
- A 15x reduction in deployment costs.
To achieve this, Predibase points to a number of new innovations, calling out three in the announcement:
- Automatic Memory-Efficient Fine-Tuning: Compresses any open-source LLM to make it trainable on commodity GPUs (such as the Nvidia T4). It is built on top of the open-source Ludwig framework for declarative model building, and Predibase then automatically adjusts training settings to fit the available hardware. A rough sketch of what declarative, memory-efficient fine-tuning looks like in Ludwig itself appears after this list.
- Serverless Right-Sized Training Infrastructure: The built-in orchestration logic uses the most cost-effective hardware in your cloud to run each training job.
- Cost-Effective Serving for Fine-Tuned Models: LLM deployments can be scaled up and down with traffic. Dynamically served LLMs can be co-deployed alongside hundreds of other specialized fine-tuned LLMs. Predibase says this can result in over 100x cost reduction compared with dedicated deployments. Importantly, each fine-tuned LLM does not need its own separate GPU. The second sketch after this list illustrates the general idea of many fine-tuned adapters sharing one base model.
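The announcement does not include code, but because the fine-tuning layer is built on the open-source Ludwig framework, the declarative approach can be illustrated with Ludwig itself. This is a minimal sketch, not Predibase's SDK: the base model, dataset name, and hyperparameter values are placeholders, and 4-bit quantization with a LoRA adapter is shown as one common way to fit a large model onto a commodity GPU such as a T4.

```python
# Illustrative only: declarative fine-tuning with the open-source Ludwig
# framework (0.8+), which Predibase says its SDK builds on. The model,
# dataset and hyperparameters below are placeholders, not Predibase defaults.
from ludwig.api import LudwigModel

config = {
    "model_type": "llm",
    "base_model": "meta-llama/Llama-2-7b-hf",  # any open-source base model
    # 4-bit quantization plus a LoRA adapter keeps memory low enough
    # to train on a commodity GPU such as an Nvidia T4.
    "quantization": {"bits": 4},
    "adapter": {"type": "lora"},
    "input_features": [{"name": "prompt", "type": "text"}],
    "output_features": [{"name": "response", "type": "text"}],
    "trainer": {
        "type": "finetune",
        "learning_rate": 0.0001,
        "batch_size": 1,
        "epochs": 3,
    },
}

model = LudwigModel(config=config)
results = model.train(dataset="customer_support_pairs.csv")  # hypothetical dataset
```

The point of the declarative style is that the config describes what to train, while the framework works out how to fit it onto the hardware it finds.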
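Similarly, the announcement does not describe how co-deployment works under the hood. A common way to serve many fine-tuned models on shared hardware is to keep one base model resident on the GPU and attach lightweight fine-tuned adapters to it per request. The sketch below uses the open-source Hugging Face transformers and peft libraries to show that idea; the adapter names and paths are hypothetical, and this is not Predibase's serving API.

```python
# Illustrative only: one shared base model, many fine-tuned LoRA adapters.
# Adapter names and paths are hypothetical; this is not Predibase's API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# The first adapter creates the PeftModel; further adapters share the base weights.
model = PeftModel.from_pretrained(base, "adapters/support-bot", adapter_name="support")
model.load_adapter("adapters/billing-bot", adapter_name="billing")

def generate(adapter: str, prompt: str) -> str:
    # Route each request to the fine-tuned adapter it needs; the large base
    # model stays resident on the GPU and is shared by every adapter.
    model.set_adapter(adapter)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Because each adapter is a small set of weights rather than a full copy of the model, hundreds of specialized fine-tuned LLMs can, in principle, share the same GPU footprint, which is the basis of the cost-reduction claim.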
The Predibase AI Cloud is also part of this announcement. Predibase describes it as “a service for selecting the most cost-effective compute resources optimized for your workload with support for multiple environments and regions.”
The AI Cloud is available on request and has access to Nvidia A100 GPUs. These, claims Predibase, are “optimized for distributed training and serving.”
Enterprise Times: What does this mean?
Anything that reduces the time and complexity of building LLMs is good news. In this case, Predibase has released two things that will interest those looking to deploy LLMs for specific purposes, such as technical support and customer service. These are areas that organisations are focusing on as they look to use LLMs and GenAI for something other than generic queries.
It is also a market with many possibilities, but one where organisations are not sure what to do. It would have been interesting to see how Predibase intends to attract companies and make itself their first-look platform. At the moment, it is just one of several platforms competing in this space, but the ability to deploy multiple custom LLMs on the same infrastructure could significantly reduce costs and appeal to smaller companies.