IBM S822LC Servers

At Supercomputing 2016 (SC16) IBM and NVIDIA announced what they call the fastest deep learning enterprise solution. The system is based on the IBM Power System S822LC platforms announced in September. These systems contain the latest version of the IBM POWER8 processor, which has NVIDIA NVLink embedded in it. IBM has also released a new deep learning toolkit called IBM PowerAI.

The solution is capable of running AlexNet with Caffe up to 2x faster than equivalent systems. It is also capable of outperforming x86 systems running AlexNet with BVLC Caffe on eight M40 GPUs. This means that IBM has overtaken Intel in the battle for the world's fastest deep learning platform, and not on one configuration but two.

Ken King, General Manager, OpenPOWER Alliances, IBM Systems

The IBM PowerAI toolkit gives companies working on machine learning access to five different frameworks. This will appeal to researchers, as it does not lock them into any single solution. It also enables companies to build their own deep learning methods to train IBM Watson. This should accelerate the development of new solutions for many customers.

According to Ken King, General Manager, OpenPOWER: “PowerAI democratizes deep learning and other advanced analytic technologies by giving enterprise data scientists and research scientists alike an easy to deploy platform to rapidly advance their journey on AI. Coupled with our high performance computing servers built for AI, IBM provides what we believe is the best platform for enterprises building AI-based software, whether it’s chatbots for customer engagement, or real-time analysis of social media data.”

It’s all about the accelerators

What is important here is the use of accelerators. These are delivering a significant performance boost to IBM POWER8-based systems. Xilinx has been a major supporter of the OpenPOWER-based SuperVessel developer cloud. In the last year the OpenPOWER Foundation has deployed three SuperVessel clouds, the latest of them in Europe, alongside other tools for developers working with POWER8 processors.

Mellanox has added accelerators to its network cards. It recently announced the ConnectX-6 adapters and Quantum switches that will run at 200Gb/s. It has not announced when it will add accelerators to these cards, but it is on its roadmap. This will allow developers to deploy applications at the edge of the network.

The POWER8 with NVIDIA NVLink chip is another piece of this accelerator story. It provides an ultra-high-speed connection between the POWER8 processor and the NVIDIA Tesla P100 GPU accelerators, removing the PCIe bottleneck found in the current generation of motherboards. IBM claims that this use of accelerators is what allows it to outperform equivalent Intel processors.
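
To put that claim in context, the sketch below times the kind of host-to-device copy that this link governs. It is a minimal illustration, assuming PyCUDA is installed; the buffer size and iteration count are arbitrary rather than anything used in IBM's benchmark.

```python
# Minimal sketch (assumes PyCUDA): time repeated host-to-device copies, the
# kind of transfer NVLink accelerates relative to PCIe. Buffer size and
# iteration count are illustrative, not the benchmark's settings.
import time
import numpy as np
import pycuda.autoinit        # creates a context on the default GPU
import pycuda.driver as cuda

SIZE_MB = 256
ITERS = 20

host_buf = cuda.pagelocked_empty(SIZE_MB * 1024 * 1024, dtype=np.uint8)
dev_buf = cuda.mem_alloc(host_buf.nbytes)

cuda.memcpy_htod(dev_buf, host_buf)          # warm-up copy
start = time.perf_counter()
for _ in range(ITERS):
    cuda.memcpy_htod(dev_buf, host_buf)      # synchronous host-to-device copy
elapsed = time.perf_counter() - start

print("Host-to-device bandwidth: %.2f GB/s" % (SIZE_MB * ITERS / elapsed / 1024.0))
```

On a PCIe 3.0 x16 slot this figure tops out at around 16 GB/s in theory, while the NVLink connection between the POWER8 and the P100 is rated several times higher, which is where the claimed advantage comes from.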

Looking forward to the POWER9 revolution

Over the summer IBM has not been quiet when it comes to knocking Intel off the benchmark top spots. This announcement follows the four world records set by Tencent Cloud using IBM POWER8 without the use of accelerators. As we get closer to the launch of POWER9, IBM is beginning to show its plans.

At the moment, IBM has the POWER8 chip, which is a general purpose processor. This is why the addition of the accelerators makes such a difference in performance. With POWER9 that will change. The first big change will be a series of processors spread over three years. These will address specific markets and workloads. They will all be capable of working with different accelerators.

IBM has already said there will be another generation of the POWER9 with NVIDIA NVLink technology. What is not clear is what it might do to incorporate other accelerators and technologies from OpenPOWER Foundation partners into the POWER9 chip. Over the next six months as we approach the launch of that processor we will get more information about IBM’s plans.

What were the server configurations?

Below are the server configurations for both tests, as supplied by IBM.

Based on AlexNet training for Top-1 50% accuracy. IBM Power S822LC for HPC configuration: 16 cores (8 cores/socket) at 4.025 GHz with 4x NVIDIA Pascal P100 GPUs; 512 GB memory; Ubuntu 16.04.1 running NVCaffe 0.14.5, compared to IBM Power S822L configuration: 20 cores (10 cores/socket) at 3.694 GHz with 4x NVIDIA M40 GPUs; 512 GB memory; Ubuntu 16.04 running BVLC-Caffe f28f5ae2f2453f42b5824723efc326a04dd16d85. Software stack details for both configurations: G++ 5.3.1, Gfortran 5.3.1, OpenBLAS 0.2.18, Boost 1.58.0, CUDA 8.0 Toolkit, LAPACK 3.6.0, HDF5 1.8.16, OpenCV 2.4.9.

IBM Power S822LC for HPC configuration: 20 cores (10 cores/socket) at 3.95 GHz with 4x NVIDIA Pascal P100 GPUs; 512 GB memory; Ubuntu 16.04 LE running IBM version BVLC 1.0.0-rc3, compared to Intel E5-2640v4 (Broadwell): 20 cores (10 cores/socket) at 3.6 GHz with 8x NVIDIA M40 GPUs; 512 GB memory; Ubuntu 16.04 LE running BVLC-Caffe 985493e9ce3e8b61e06c072a16478e6a74e3aa5a. Software stack details for both configurations: G++ 5.4, Gfortran 5.4, OpenBLAS 0.2.19, Boost 1.58.0, CUDA 8.0 Toolkit, LAPACK 3.6.0, HDF5 1.8.16, OpenCV 2.4.9.
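
For readers who want to try something similar, the snippet below is a minimal single-GPU sketch of starting an AlexNet training run through Caffe's Python interface. The solver path is the one shipped in the BVLC repository; the published numbers above were measured with the multi-GPU configurations and Caffe builds listed, not with this exact invocation.

```python
# Minimal single-GPU sketch using pycaffe; the benchmark runs above used
# four or eight GPUs and the NVCaffe / BVLC-Caffe builds listed in the
# configurations, so treat this only as an illustration of the workload.
import caffe

caffe.set_device(0)      # select the first GPU
caffe.set_mode_gpu()

# The solver prototxt defines the AlexNet network and its hyperparameters;
# this path is the one shipped with the BVLC Caffe repository.
solver = caffe.SGDSolver('models/bvlc_alexnet/solver.prototxt')
solver.solve()           # trains until max_iter, snapshotting along the way
```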

Conclusion

This is yet another big success for IBM's POWER8 processor. Once again it has been achieved by IBM working with OpenPOWER Foundation partners. It continues to validate the risk IBM took in making the POWER8 processor an open hardware design.

The question is: what will Intel do to compete? It has so far refused to follow ARM and IBM in allowing wider access to its core CPU architectures. It has also failed to deliver the same level of GPU integration that NVIDIA NVLink enables. Intel still controls the CPU market, although IBM is eating into it.

Unless Intel starts showing its hand in terms of GPU accelerators to rival IBM and NVIDIA, IBM will continue to grow its market share. So far Intel seems to be treating IBM as an inconvenience. Of all the benchmarks IBM has taken from Intel, this one is the most important, as it affects one of the key technologies of the next few years.
