OmniSci has announced a CPU-driven version of the OmniSci Enterprise Edition. It also announced a collaboration with Intel to improve the performance of its Accelerated Analytics Platform. The news came at Converge 2019, OmniSci’s inaugural user conference, held at the Computer History Museum in Mountain View.
According to Todd Mostak, OmniSci CEO and co-founder: “From our beginning, OmniSci has been focused on providing our users the fastest and most frictionless path to insight on the largest datasets. To do so, we architected our platform to take advantage of all the performance and parallelism of modern hardware, whether GPU or CPU.
“Today’s announcement of our collaboration with Intel, and the launch of a CPU version of our enterprise platform, is a natural next step towards our mission, and provides an even larger set of users access to the speed and capabilities of the OmniSci platform, regardless of the hardware they use.”
Why expand to include CPUs?
The appeal of GPUs is parallel processing. A GPU has many times more cores than a CPU. This is why OmniSci claims its Accelerated Analytics Platform solves the parallel processing problem. Its website states: “OmniSci is a breakthrough technology, originating from MIT, designed to leverage the massively parallel processing of GPUs alongside traditional CPU compute, for extraordinary performance at scale.”
So why change? The answer is customer demand. Not every solution requires the cost and complexity of GPUs. Moving to CPUs means fewer processor cores but opens the product up to a much wider group of customers. It allows them to run the software on-premises or in their own data centre, taking advantage of racks of cheap servers.
Another benefit is that it allows customers to move data science work off the servers. Data scientists and other users can now work from workstations, desktops and even laptops. Mostak showed a demo running over 6 million rows of data on a MacBook with 6 cores and 32GB of RAM. Allowing this level of access across a wider diversity of devices enables more customers to take advantage of the software.
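To give a sense of what that looks like in practice, here is a minimal sketch of a data scientist querying a local, CPU-only OmniSci instance from a laptop using the pymapd Python client. The credentials, port and the taxi_trips table are illustrative assumptions, not details taken from Mostak’s demo.

```python
# Minimal sketch: an aggregate query against a local, CPU-only OmniSci
# instance, run from a laptop via the pymapd client (pip install pymapd).
# The credentials, port and table name below are illustrative defaults,
# not details from OmniSci's demo.
from pymapd import connect

con = connect(user="admin", password="HyperInteractive",
              host="localhost", port=6274, dbname="omnisci")

cur = con.cursor()
cur.execute(
    "SELECT passenger_count, COUNT(*) AS trips "
    "FROM taxi_trips "
    "GROUP BY passenger_count "
    "ORDER BY trips DESC"
)

# pymapd follows the Python DB-API, so rows come back as plain tuples
for passenger_count, trips in cur:
    print(passenger_count, trips)
```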
OmniSci benchmarks show performance levels remain high
As part of the announcement, OmniSci conducted its own benchmarks. It stated: “OmniSci recently benchmarked the OmniSci platform running on Intel Xeon Scalable processors and Intel Optane DC Persistent Memory. The results show orders-of-magnitude better performance for certain workloads compared to open source analytics technologies.”
What is missing from this announcement is any comparison between the GPU and CPU versions. This is disappointing. The benchmarks should also have been carried out independently, something Mostak said might well happen in the future. It would also have been better to use real-world workloads in the tests. Customers don’t have the tuned datasets that vendors often use in their own benchmarks.
There was also no comparison between different generations of Intel processors or server architectures. This is likely to show a significant difference that customers will want to consider when adopting the software. The performance figures released by the company also assume the use of Intel Optane DC Persistent Memory. How many customers are actually using Optane? The answer is likely to be very few.
Another issue the company needs to address is how it intends to support multiple fabrics across different server designs and memory architectures. Doing so would allow it to target the hyperscalers and cloud companies looking for solutions in this space.
Enterprise Times: What does this mean?
Refocusing the software so that it runs effectively on CPUs as well as GPUs is a good thing. However, the lack of detail in the benchmarks is disappointing. OmniSci also needs to think about what customer environments really look like and publish a wider set of benchmarks and performance markers.
Putting that aside, anything that improves the underlying performance of massive dataset analytics is to be welcomed. An increasing number of organisations are looking to pull in data from vast arrays of sensors, IoT and other devices. Processing that data in real time and producing meaningful, actionable analytics requires solutions engineered for the task. Most large-scale analytics solutions fail at this point.
With this announcement coming just a few days after the Spark+AI Summit in Amsterdam, we are moving into a new era of analytics. An increasing amount of that processing is also feeding into machine learning and AI. For organisations that have struggled to get meaningful results from their existing projects without significant architectural change, the future is looking increasingly interesting.