Micro… Supermicro!
We were once asked to test a Supermicro platform built around the E1.S form factor, and fortunately, we happened to have such servers available in a laboratory in the Netherlands. So we ran the tests and are now ready to tell you how it went.
But before we dive into the tests, let us remind you what these E1.S drives are and what they are made for. So let's start with the basics.
How the EDSFF appeared
As the market moved away from outdated data access methods, NVMe SSD manufacturers turned to the form factor itself. This is how the new EDSFF standard appeared, which, according to its Intel developers, was better adapted to the data center environment. EDSFF (which stands for Enterprise and Data Center SSD Form Factor, and is known to many as the Ruler) was invented to do just one thing: minimize the total cost of Flash storage at data center scale, following the principle of 'less space, more 1PB drives per 1U, lower costs'.
But, of course, this approach required a sacrifice or two, per-drive performance being one of them.
Supermicro servers support two types of drives, short and long. These form factors are described in the SNIA specifications:
- ‘short’ — SFF-TA-1006;
- ‘long’ — SFF-TA-1007.
In addition to density in TB, IOps, and GBps, drives from the EDSFF family can significantly reduce power consumption: in their marketing materials, Intel and Supermicro claim savings of tens of percent compared to U.2.
How did our tests go?
So, our colleagues from Supermicro installed an SSG-1029P-NES32R server in that Dutch laboratory. The server is positioned as hardware for databases, IOps-intensive applications, and HPC infrastructures. It is based on the X11DSF-E motherboard with two sockets for second-generation Intel Xeon Scalable processors. In our case, it carried two Intel® Xeon® Gold 6252 processors, eight 32 GB DDR4-2933 DIMMs, and 32 Intel® SSD DC P4511 Series drives. The platform, by the way, supports Intel® Optane™ DCPMM.
For communication with the 'outside world', the following interfaces were available:
- 2x PCI-E 3.0 x16 (FHHL) slots,
- 1x PCI-E 3.0 x4 (LP) slot.
We will not list the rest of the technical characteristics as all the information is available on the vendor’s website. It is better to pay attention to the configuration and the results.
It is worth saying that Supermicro supplies such platforms only fully assembled. As a vendor, we can understand this policy, but as a potential buyer, we are not exactly excited 🙂
FIO configuration
[global]
filename=/dev/era_dimec
ioengine=libaio
direct=1
group_reporting=1
runtime=900
norandommap
random_generator=tausworthe64

[seq_read]
stonewall
rw=read
bs=1024k
offset_increment=10%
numjobs=8

[seq_write]
stonewall
rw=write
bs=1024k
offset_increment=10%
numjobs=8

[rand_read]
stonewall
rw=randread
bs=4k
numjobs=48

[rand_write]
stonewall
rw=randwrite
bs=4k
numjobs=48
The iodepth value was varied with a script.
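We won't reproduce the exact script here, but a minimal sketch of such a sweep could look like this. The job file name era.fio and the depth series are our assumptions; fio expands environment variables in job files, so the [global] section would carry iodepth=${IODEPTH}:

#!/usr/bin/env bash
# Sweep the queue depth for the fio job file above, one run per depth.
# Assumes the job file contains "iodepth=${IODEPTH}" in [global].
for qd in 1 2 4 8 16 32 64 128; do
    IODEPTH="$qd" fio --output="result_qd${qd}.txt" era.fio
done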
System configuration
OS: Ubuntu Server 20.04.3 LTS
Kernel: 5.11.0-34-generic
RAIDIX ERA: raidix-era-3.3.1-321-dkms-ubuntu-20.04-kver-5.11
Kernel command line: BOOT_IMAGE=/vmlinuz-5.11.0-34-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off tsx=on tsx_async_abort=off mitigations=off
So, we had 32 Intel® SSD DC P4511 installed in our system.
As we always suggest, you need to do the math first and ask yourself: how much performance, in theory, can I get? We do that arithmetic right after the list below.
According to the specification, the capabilities of each drive are as follows:
- maximum sequential read speed — 2,800 MB/s;
- maximum sequential write speed — 2,400 MB/s;
- random read speed — 610,200 IOps (4K blocks);
- random write speed — 75,000 IOps (4K blocks).
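A quick back-of-the-envelope for all 32 drives, multiplying the per-drive spec numbers above (a sketch, not a measurement):

#!/usr/bin/env bash
# Theoretical ceiling for 32 x Intel SSD DC P4511: straight
# multiplication of the per-drive spec numbers, ignoring bus limits.
drives=32
echo "seq read:   $(( drives * 2800 )) MB/s"    # 89,600 MB/s
echo "seq write:  $(( drives * 2400 )) MB/s"    # 76,800 MB/s
echo "rand read:  $(( drives * 610200 )) IOps"  # 19,526,400 IOps
echo "rand write: $(( drives * 75000 )) IOps"   # 2,400,000 IOps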
But when we ran a simultaneous performance test on all the drives, we reached 9,999,000 IOps.
Almost 10 million! Although the expected performance should have been close to 20 million IOps... At first, we thought it was us, not the system. But after examining everything thoroughly, it turned out that the problem lay in PCIe lane oversubscription: in such a system, the maximum per drive can only be obtained when the drives are half-loaded.
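A rough lane budget shows why. This is a sketch under our assumptions: each E1.S drive negotiates PCIe 3.0 x4, and each second-generation Xeon Scalable CPU provides 48 PCIe 3.0 lanes (how exactly the backplane distributes them is not something we verified):

#!/usr/bin/env bash
# Rough PCIe 3.0 lane budget for the SSG-1029P-NES32R.
drives=32; lanes_per_drive=4
cpu_lanes=$(( 2 * 48 ))                      # two CPUs, 96 lanes total
slot_lanes=$(( 16 + 16 + 4 ))                # two x16 FHHL + one x4 LP slot
wanted=$(( drives * lanes_per_drive ))       # 128 lanes requested by drives
left=$(( cpu_lanes - slot_lanes ))           # ~60 lanes left for the drives
echo "requested: $wanted, available: $left"

Roughly two lanes requested per lane available, which lines up with getting about half of the theoretical IOps.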
By reducing the block size to 512b, we managed to achieve a total drive performance of 12 million IOps for reads and 5 million IOps for writes.
Sure, it hurt to lose half the theoretical performance, but 12 million IOps per 1U is more than enough. It is unlikely you will ever find an application that would put such a workload on a two-socket storage system.
But it wouldn’t be us…
if we hadn’t run two tests with RAIDIX on board!
As usual, we tested RAIDIX ERA in comparison with mdraid. Here is a summary of the results at 1U:
Parameter | RAIDIX ERA RAID 5/50 | Linux SW RAID 10 | Linux SW RAID 5
--- | --- | --- | ---
Random read performance (IOps / latency) | 11,700,000 / 0.2 ms | 2,700,000 / 1.7 ms | 2,000,000 / 1.5 ms
Random write performance (IOps / latency) | 2,500,000 / 0.6 ms | 350,000 / 5.3 ms | 150,000 / 10 ms
Usable capacity | 224 TB | 128 TB | 250 TB
Sequential read performance | 53 GBps | 56.2 GBps | 53.1 GBps
Sequential write performance | 45 GBps | 24.6 GBps | 1.7 GBps
Sequential read performance, degraded mode | 42.5 GBps | 42.6 GBps | 1.4 GBps
Mean CPU load at max performance | 13% | 24% | 37%
We also got ERA results for various workloads:
Workload / Configuration | Performance
--- | ---
4k Random Reads / 32 drives, RAID 5 | 9,999,000 IOps, latency 0.25 ms
512b Random Reads / 32 drives, RAID 5 | 11,700,000 IOps, latency 0.2 ms
4k Random Reads / 16 drives, RAID 5 | 5,380,000 IOps, latency 0.25 ms
512b Random Reads / 16 drives, RAID 5 | 8,293,000 IOps, latency 0.2 ms
4k Random Writes / 32 drives, RAID 50 | 2,512,000 IOps, latency 0.6 ms
512b Random Writes / 32 drives, RAID 50 | 1,644,000 IOps, latency 0.7 ms
4k Random Writes / 16 drives, RAID 50 | 1,548,000 IOps, latency 0.6 ms
512b Random Writes / 16 drives, RAID 50 | 859,000 IOps, latency 0.7 ms
1024k Sequential Reads / RAID 5, RAID 50 | 53 GBps
1024k Sequential Writes / RAID 50 (ss=64, merges=1, mm=7000, mw=7000) | 45.6 GBps
Of course, all the drives were prepared beforehand in accordance with the SNIA methodology.
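For reference, here is a minimal sketch of the kind of preconditioning pass this implies; the device path, fill count, and runtime are our assumptions, not the exact procedure we ran:

#!/usr/bin/env bash
# SNIA-style preconditioning sketch for a single drive: fill the
# whole device sequentially twice, then hammer it with 4k random
# writes long enough to reach steady state. DEV is hypothetical.
DEV=/dev/nvme0n1
fio --name=fill --filename="$DEV" --rw=write --bs=1024k \
    --iodepth=32 --direct=1 --loops=2
fio --name=steady --filename="$DEV" --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=4 --direct=1 --group_reporting \
    --time_based --runtime=3600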
And at launch we varied the load, that is, the queue depth. At what queue depth did we get this latency, you may ask. It was the depth at which we reached the performance peak while latency was still at its minimum; on average, that is about 16.
During the tests, we discovered another feature of the platform (or rather of the drives): with a low offset_increment value, their performance dropped, and quite noticeably. It seems the drives really hate it when too little time passes between accesses to the same LBA.
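For context: offset_increment in the job file above spaces the cloned jobs apart, so clone N starts at N times the increment. A toy illustration of where the eight sequential jobs land, assuming a hypothetical 8,000 GB device:

#!/usr/bin/env bash
# With numjobs=8 and offset_increment=10%, fio clone N starts at
# N * 10% of the device, so jobs don't revisit each other's LBAs
# until much later in the run.
size_gb=8000
for n in 0 1 2 3 4 5 6 7; do
    echo "job $n starts at $(( n * 10 ))% = $(( n * size_gb / 10 )) GB"
done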
Final Thoughts
The use scenarios for a system based on the SSG-1029P-NES32R platform are, of course, not limitless. The reasons lie in the rather high cost of the system and in the small number of PCIe slots for such storage-subsystem capacity.
On the other hand, we managed to achieve excellent performance results, which is rare for E1.S drives. We knew, of course, that RAIDIX ERA would boost IOps, but witnessing (again) a 5-10x increase in random-workload IOps at a modest 13% CPU load is always nice.
Do you need all this? Ask yourself (and a couple of other people at work). Maybe everything suits you as it is, and then let the server, the form factor, and the performance stay as they are. But if you want something more modern and faster, you have just read about one alternative.