At Insight 2023, NetApp announced portfolio updates that the company claims will enhance the data pipeline for hybrid multi-cloud AI. It says the new features will drive simplicity, savings, security and sustainability for customers.

Harv Bhela, Chief Product Officer at NetApp (Image Credit: LinkedIn)

Harv Bhela, Chief Product Officer at NetApp, said, “There are two major disruptions today for customers: the opportunity of AI and the threat of ransomware.

“Today, NetApp unveiled new innovations that make the AI data pipeline simple to deploy as well as scalable and performant across your hybrid multi-cloud data estate – while protecting that same data from ever-increasing threats. These solutions position us at the forefront of creating successful business AI outcomes for our customers.”

NetApp’s five-stage data pipeline for AI

Organisations need to get the most from their AI investments. However, NetApp says they "may struggle to get meaningful results from massive amounts of disparate data flowing unhindered through a five-stage pipeline." NetApp makes that statement because of the increasingly complex data landscape organisations now have. Some data sits on-prem, and some is spread across multiple cloud platforms.

The NetApp pipeline below shows four key stages, with retraining as the fifth. Importantly, it also shows a range of vendors and partners that NetApp works with at each stage. It will be interesting to see how organisations utilise this pipeline.

NetApp data pipeline for hybrid multi-cloud AI (Image Credit: NetApp)

For many organisations, the preparation phase, which covers cleaning, transforming, organising and enriching data, is nothing new. Yet it is often a point approach where the cleaned and enriched data is not fed back to the main data lake. Feeding it back would reduce repeated effort and wasted resources. It would also provide a trusted base for many AI projects.
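That feedback loop can be sketched in a few lines. The example below is a minimal, hypothetical illustration (the lake paths, record fields and enrichment rule are all assumptions, not part of any NetApp product): raw records are cleaned, deduplicated and enriched, then written back into a curated zone of the same lake so later projects reuse the trusted set rather than repeating the work.

```python
import json
from pathlib import Path

def clean(record):
    """Normalise one raw record; return None to drop unusable rows."""
    email = record.get("email", "").strip().lower()
    if not email:
        return None  # cannot deduplicate or enrich without a key
    return {
        "email": email,
        "name": record.get("name", "").strip().title(),
        # Enrichment example: derive the mail domain as a new trusted field.
        "domain": email.split("@", 1)[-1],
    }

def curate(raw_path: Path, curated_path: Path) -> int:
    """Clean raw JSONL records and write the result back into the lake."""
    seen, kept = set(), []
    for line in raw_path.read_text().splitlines():
        row = clean(json.loads(line))
        if row and row["email"] not in seen:  # deduplicate on the key field
            seen.add(row["email"])
            kept.append(row)
    curated_path.parent.mkdir(parents=True, exist_ok=True)
    # Writing back to a shared curated zone is the point: downstream AI
    # projects read from here instead of re-cleaning the raw data.
    curated_path.write_text("\n".join(json.dumps(r) for r in kept))
    return len(kept)
```

The design choice worth noting is that the curated output lands beside the raw data in the same lake, not in a project-private location, which is what turns a one-off prep step into a reusable asset.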

From experimentation through to training, deployment and retraining, a key question is the speed of data access. This is not just about retrieval times from storage devices but also network speeds, especially in multi-cloud environments. Running machine learning over the data is also resource-intensive and is best done close to where the data is stored.

NetApp is delivering high-performance all-flash and cloud storage. This improves read/write performance, but the NetApp ONTAP AI solution goes further than flash storage alone. It pairs NetApp's AFF C-Series all-flash arrays with NVIDIA DGX systems, bringing GPUs to the data rather than requiring large amounts of data to be moved across the network.

Support for all the major public clouds

NetApp has also announced that its ONTAP AI solutions will support all the major clouds. There is a version for Google Cloud Vertex AI. It takes advantage of Google Cloud NetApp volumes to speed up the creation of GenAI solutions.

Amazon FSx for NetApp ONTAP does a similar thing for data stored on AWS. Users can access and collaborate on Amazon SageMaker notebooks. They can also use the managed service Amazon Bedrock to build applications on foundation models. In addition, customers can use Apache Kafka to process data in real time.

Surprisingly, NetApp has not said much about any new features for Microsoft Azure. It will be interesting to see when NetApp says more about solutions for that platform.

Enterprise Times: What does this mean?

The move to private AI solutions, such as LLMs that are either company-wide or solution-specific, like technical and sales support, is growing. For organisations to build such solutions, however, there is a large problem to overcome: the way data is spread across the organisation, whether in local on-prem storage, corporate datacentres or the cloud.

Compounding that problem is the need to make sure the underlying data is trusted. That is easier said than done when you consider how that data is stored and how many apps have their own data silos. Much of the AI and ML work to date, along with much of the data analytics, assumes data is moved to a new location for each project. That is no longer enough when GenAI is involved.

For the retraining process to work effectively, data has to be constantly accessible. In multi-cloud environments, that means access needs to be considered alongside the costs and time to move and process data. The move to deliver all-flash storage that works with the NVIDIA DGX platform therefore makes sense. The data can be processed locally and the results combined.
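The cost-and-time trade-off above can be made concrete with some rough arithmetic. The figures below are illustrative assumptions, not published cloud prices or NetApp numbers: a hypothetical 50 TB training set, a 10 Gbps link fully available to the transfer, and an assumed egress rate of $0.09 per GB.

```python
def transfer_estimate(dataset_gb, link_gbps, egress_usd_per_gb):
    """Rough time (hours) and egress cost (USD) to move a dataset between clouds.

    Assumes 1 GB = 8 gigabits and that the link is fully available
    to this one transfer, which is optimistic in practice.
    """
    hours = (dataset_gb * 8) / (link_gbps * 3600)
    cost = dataset_gb * egress_usd_per_gb
    return hours, cost

# Illustrative scenario: 50 TB over 10 Gbps at an assumed $0.09/GB egress.
hours, cost = transfer_estimate(50_000, 10, 0.09)
# Roughly 11 hours of transfer time and $4,500 in egress charges, before
# any processing begins, which is why computing close to the data pays off.
```

Even under these optimistic assumptions, repeatedly moving a training set of that size for each retraining cycle adds meaningful delay and cost, which is the argument for bringing the GPUs to the data.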

Another area where this has a benefit is explainability. Because the solution always knows where the data is, it can deliver transparency over answers on demand.

