Tencent Cloud has smashed four world records for sorting data. "Smashed" is the right word: the new results are between 2.8 and 5 times better than the 2015 records. Tencent Cloud now holds the record in both versions of the GraySort and MinuteSort tests administered by Sort Benchmark. The records were set using IBM POWER8 processors, IBM OpenPOWER hardware and Mellanox 100Gbps ConnectX4-EN network cards.
The tests are not just about raw horsepower. They are also used to gauge the performance of new architectures. This matters for scientific research, governments and organisations such as banks, all of which have to deal with increasing volumes of complex data that must be pulled across networks to be sorted and used. They therefore need new compute and networking architectures capable of handling ultra-large volumes of data.
According to Zeus Jiang, Vice President of Tencent Cloud and General Manager of Tencent’s Data Platform Department: “In the future, the ability to manage big data will be the foundation of successful Internet businesses. Tencent Cloud can provide precise high performance computing to enterprises using minimal time and resources. We will continue to improve the back end technology for our cloud service by optimizing architecture, software and hardware to help global enterprises solve complex business challenges by leveraging hyper-scale computing platforms.”
What are the four new records?
There are two categories, each of which is run using two performance tests. The two categories are GraySort and MinuteSort. GraySort measures how many terabytes of data can be sorted per minute and requires at least 100TB of data to be sorted. MinuteSort measures the amount of data that can be sorted in 60 seconds or less.
The two performance tests are named Daytona and Indy. Daytona focuses on sort code that can be applied to general purpose computing. This makes it a good test for the type of data enterprises and research establishments are dealing with. Success here also depends on how effectively the data can be moved across the network from the storage arrays.
Indy is about pure speed. The data records are just 100 bytes in length and the keys are just 10 bytes. One area where Indy is likely to be applied is in sorting metadata sets. This is applicable to intelligence and government agencies working with anonymised data, and to some types of specialist research.
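To make the record format concrete, here is a minimal single-machine sketch in Python of sorting fixed-size 100-byte records by a 10-byte key. It assumes the key is the first 10 bytes of each record and that keys compare as raw bytes; the function names are illustrative only, and this says nothing about how Tencent Cloud's distributed sort was actually implemented.

```python
import os

RECORD_SIZE = 100  # bytes per record, as described in the article
KEY_SIZE = 10      # assumption: the key is the first 10 bytes of each record


def split_records(buf: bytes) -> list[bytes]:
    """Split a buffer into fixed-size 100-byte records."""
    if len(buf) % RECORD_SIZE != 0:
        raise ValueError("buffer length must be a multiple of RECORD_SIZE")
    return [buf[i:i + RECORD_SIZE] for i in range(0, len(buf), RECORD_SIZE)]


def sort_records(buf: bytes) -> bytes:
    """Return the records in buf ordered by their 10-byte key prefix."""
    records = split_records(buf)
    # bytes objects compare lexicographically, byte by byte
    records.sort(key=lambda rec: rec[:KEY_SIZE])
    return b"".join(records)


def make_random_records(count: int) -> bytes:
    """Generate `count` records of random bytes (hypothetical test data)."""
    return os.urandom(count * RECORD_SIZE)


if __name__ == "__main__":
    data = make_random_records(1_000)
    ordered = sort_records(data)
    print(len(ordered) // RECORD_SIZE, "records sorted")
```

At benchmark scale the hard part is not this comparison but moving and partitioning the data across hundreds of nodes, which is why the network and storage matter as much as the processors.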
Sort benchmark competition | 2016 World Record (Tencent Cloud) | 2015 World Record | 2016 Improvement |
Daytona GraySort | 44.8 TB/min | 15.9 TB/min | 2.8x greater performance |
Indy GraySort | 60.7 TB/min | 18.2 TB/min | 3.3x greater performance |
Daytona MinuteSort | 37 TB/min | 7.7 TB/min | 4.8x greater performance |
Indy MinuteSort | 55 TB/min | 11 TB/min | 5x greater performance |
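The improvement factors in the table are simply the 2016 figure divided by the 2015 figure. The short Python sketch below reproduces that arithmetic and also estimates how long a GraySort run would take at a given rate. The 100TB dataset size is an assumption based on the stated minimum, and the helper names are illustrative.

```python
# Figures copied from the table above, in TB/min: (2016 record, 2015 record).
RESULTS = {
    "Daytona GraySort": (44.8, 15.9),
    "Indy GraySort": (60.7, 18.2),
    "Daytona MinuteSort": (37.0, 7.7),
    "Indy MinuteSort": (55.0, 11.0),
}


def improvement(new: float, old: float) -> float:
    """Ratio of the new record to the old one."""
    return new / old


def seconds_to_sort(tb_per_min: float, dataset_tb: float = 100.0) -> float:
    """Time to sort dataset_tb at a sustained rate (assumes exactly 100TB by default)."""
    return dataset_tb / tb_per_min * 60.0


if __name__ == "__main__":
    for name, (new, old) in RESULTS.items():
        print(f"{name}: {improvement(new, old):.1f}x")
```

On these numbers, sorting 100TB at the Daytona GraySort rate of 44.8 TB/min would take roughly 134 seconds, and at the Indy rate of 60.7 TB/min roughly 99 seconds.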
What hardware did Tencent Cloud use?
The same hardware configuration was used in all four tests. It consisted of:
512 nodes x (2 OpenPOWER 10-core POWER8 2.926 GHz, 512 GB RAM, 4 x Huawei ES3600P V3 1.2TB NVMe SSD, 100Gb Mellanox ConnectX4-EN).
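To get a feel for the scale of this cluster, the per-node figures can be multiplied out. The Python sketch below does so, reading "2 OpenPOWER 10-core POWER8" as two 10-core processors per node and assuming one 100Gb network card per node. Both readings are assumptions, and the totals are raw capacity, not usable or measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware, taken from the configuration line above."""
    sockets: int = 2            # assumption: two POWER8 processors per node
    cores_per_socket: int = 10
    ram_gb: int = 512
    ssd_count: int = 4
    ssd_tb: float = 1.2
    nic_gbps: int = 100         # assumption: one 100Gb card per node


def cluster_totals(nodes: int, spec: NodeSpec = NodeSpec()) -> dict:
    """Multiply per-node resources by the node count."""
    return {
        "cores": nodes * spec.sockets * spec.cores_per_socket,
        "ram_gb": nodes * spec.ram_gb,
        "ssd_tb": nodes * spec.ssd_count * spec.ssd_tb,
        "network_tbps": nodes * spec.nic_gbps / 1000,
    }


if __name__ == "__main__":
    for key, value in cluster_totals(512).items():
        print(f"{key}: {value}")
```

Under those assumptions the 512 nodes add up to 10,240 cores, 262,144 GB of RAM, about 2,458TB of raw NVMe flash and 51.2Tb/s of aggregate network bandwidth.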
These tests were conducted before IBM released the latest generation of its POWER8 processor with support for NVIDIA NVLink. That means that Tencent Cloud only had access to the IBM Coherent Accelerator Processor Interface (CAPI) to speed up access to data on flash storage. Next year it will also be able to draw on GPUs to improve performance, as well as on the POWER9 processor, which is expected to ship in 2017.
Surprisingly, there was no mention of the use of accelerators on the Mellanox network components. With them, it would have been possible to pre-sort the data as it moved across the network. This would enable the processor to work through data even faster and handle larger data sets.
The use of GPUs and accelerators is growing. It will be interesting to see whether the Sort Benchmark committee introduces new tests next year. This would be useful as it would provide a way to explore new system designs that use both these elements.
Conclusion
IBM is continuing to eat into the benchmark world that has for a long time been owned by Intel. This will cause some disquiet at Intel, especially as IBM has overtaken it in the race to integrate GPUs and accelerators into the compute cycle. While these tests didn’t take advantage of those technologies, next year’s tests will.
Managing ever larger volumes of data is essential to many enterprises. Benchmarks like this are important because of their focus on general purpose data sets and architectures. It will be interesting to see if this leads to an uptake of systems from IBM’s OpenPOWER partners.
Impressive numbers, and without accelerators … I can’t imagine what they’d do with NVLink + GPUs + etc. … Intel will probably react and we’ll see some submissions with Xeon Phis … let’s see. Good news for IBM folks! They have always led the innovation.