SupremeRAID™ by Graid Technology – Next Generation RAID using GPUs

Posted on 07 June, 2023

Introduction

Allow me to introduce our latest partner, Graid Technology. Over the past few years, Graid Technology have been working hard on something special and new for NVMe hardware RAID, pushing the market forwards with an innovative idea. The idea centres on the unusual concept of using a traditional GPU as the compute device for hardware RAID acceleration. It sounds unusual, but it really does make a lot of sense.

As you know, traditional hardware RAID cards have been around for a long time and have served a critical purpose in storage server solutions, allowing SAS/SATA drives (both spinning disk and flash-based SSD) to be configured in RAID arrays and thereby providing redundancy for storage. As storage media technology has evolved, we now have drives far faster than the traditional SAS/SATA protocols can handle, namely NVMe. Offering dramatically higher performance, NVMe drives use the PCIe bus rather than the much slower SAS/SATA interfaces.

A traditional hardware RAID card occupies a PCIe slot and is then connected directly to the HDD/SSD drives via cables; in servers this usually involves a backplane to which all the drives are connected.

The PCIe bandwidth of these cards (they are usually x8 PCIe lane cards) is simply not enough to deliver the performance the NVMe drives require. So, whilst hardware RAID cards do exist for NVMe, there are limitations in performance and in the number of drives that can be used. If you have a PCIe x8 card, for example, and want to use x4 lane NVMe drives, you could connect the card to only two drive bays before you potentially saturate that link. Alternatively, you would have to use multiple controllers to connect more drives, which in turn means you lose the ability to RAID across all of those drives, as they sit on separate controllers.
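To put rough numbers on that, taking PCIe 4.0 as an example at roughly 2GB/s per lane in each direction:

  • An x8 RAID card uplink provides around 8 x 2GB/s ≈ 16GB/s
  • A single x4 NVMe drive can demand up to 4 x 2GB/s ≈ 8GB/s
  • So just two x4 drives running flat out are enough to saturate the card's x8 link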

Software-defined storage solutions using NVMe do not generally require hardware RAID, as redundancy is built into the software to cater for drive failures, but this does mean host CPU performance can take a hit, since the CPU is responsible for handling that task. The same is true of software-based RAID solutions such as MDADM in Linux and Intel's VROC - the CPU handles the computational work of RAID management.

This is worst for arrays with parity calculations, such as RAID 5 and 6, but applies to all RAID levels to varying degrees. Due to limitations in their original design and architecture, software RAID tends to have trouble scaling linearly as arrays become more complex – generally it hits a ceiling on how many drives' worth of performance it can handle before maxing out.

Graid Technology have studied these limitations and offer an innovative solution to truly unleash NVMe drive performance in RAID.

Meet the SupremeRAID™ SR-1010

The SR-1010 is a next generation hardware RAID card from Graid Technology. It uses PCIe 4.0 for the higher bandwidth available on the latest generation of NVMe drives. The previous model, the SR-1000, still performs excellently and offers the same functionality, but is based on PCIe 3.0.

As we mentioned earlier, what makes this a unique product is that it uses a GPU to perform the RAID calculations. Graid Technology have taken a standard NVIDIA A2000 and re-purposed it with their software to create a next generation RAID controller.

The card doesn't require any physical connections to the NVMe drives or the server backplane, so installation is very easy. The NVMe drives attach to the PCI Express bus in the usual way, and the SR-1010 manages the RAID array data through its host OS driver. What enables this to scale is quite simple – the SR-1010 doesn't need to sit in the path of data being read from the NVMe drives; reads go straight to where they are needed. Only parity writes need to be sent to the card for processing, and since it is a high-performance GPU with thousands of CUDA cores, it handles that task with ease.

The SR-1010 is a double-width x16 PCIe 4.0 card, so it occupies two slots. It can fit in a variety of Boston's NVMe storage servers.

The SupremeRAID™ SR-1010

The SupremeRAID™ SR-1000

Boston Labs Test Setup

At Boston we have a plethora of server configurations, including many options for NVMe flash storage, with 1U servers able to support up to a massive 32x NVMe drives!

For our SupremeRAID™ SR-1010 test we have a Boston 2U Ultra server using Supermicro hardware. This server supports 12x hybrid 3.5” and 2.5” SATA/SAS/NVMe drive bays.

The full spec of the testing:

  • 2x Intel Xeon Scalable 3rd Generation 6330 28C 56T 2.0/3.1GHz
  • 16x 32GB DDR4 2933MHz = 512GB Memory
  • SupremeRAID™ SR-1010 Storage Controller
  • 10x KCD6XLUL1T92 - Kioxia CD6 1.92TB PCIe 4.0 x4 2.5" NVMe drives

As mentioned, the setup was very easy. I installed the card with Windows Server 2022 and later switched to Linux (Ubuntu); in both cases the driver and software setup was straightforward. All that's needed is the NVIDIA driver for the card, the SupremeRAID™ software driver and the command line utility that controls the RAID functionality. I would like to praise the utility, because it makes creating and managing drive groups and RAID volumes very easy.

You can create, delete and list physical drives and drive groups using simple commands, and even use abbreviations to make things quicker.
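As a flavour of the workflow, creating a RAID 5 array across the ten drives in this system goes roughly as follows. Treat this as an illustrative sketch - the exact syntax can vary between software releases, so check the SupremeRAID™ user guide for your version:

  # register the NVMe devices with SupremeRAID as physical drives
  sudo graidctl create physical_drive /dev/nvme0-9n1

  # build a RAID 5 drive group from those physical drives
  sudo graidctl create drive_group raid5 0-9

  # expose the drive group as a virtual drive (block device)
  sudo graidctl create virtual_drive 0

  # check the result (abbreviated form: graidctl ls vd)
  sudo graidctl list virtual_drive

The resulting volume then appears as a standard block device, /dev/gdg0n1 in my case, which is the device you'll see referenced in the FIO job file later on.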

Performance of SupremeRAID SR-1010 vs MDADM/VROC

For the testing I needed something to compare the SR-1010 with. There isn't a like-for-like hardware NVMe RAID solution (without caveats), so the alternatives to this technology are software solutions such as MDADM and Intel VROC. Broadcom do offer Tri-Mode RAID cards that can connect all types of drives (SAS/SATA/NVMe), but I have found them limited in the number of drives they can use. For example, my test server has 12 NVMe bays and the Tri-Mode solutions I found were limited to 4 drives per card, so I would need multiple cards to connect all the NVMe bays in the server. Graid Technology therefore presents a solution without the need to cable the controller to the server's drive backplane.
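For reference, the MDADM comparison arrays are built with the standard Linux tooling; I haven't reproduced the exact lab commands here, but a typical 10-drive RAID 5 would be created with something like the following (device names are examples). VROC is likewise managed through mdadm, using Intel's IMSM metadata:

  # create a 10-drive software RAID 5 array with MDADM
  sudo mdadm --create /dev/md0 --level=5 --raid-devices=10 /dev/nvme{0..9}n1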

FIO

Flexible I/O (FIO) is a benchmark tool for storage devices, offering a host of options for generating I/O and finely tuning test parameters. For my testing I compared the SR-1010, MDADM and VROC using FIO on Ubuntu 20.04.

I ran tests for Random read/write using a 4KB block size and Sequential read and write using 1MB block size. As these are NVMe drives and can handle a lot of I/O, I set the number of jobs (threads) to 56 to match my CPU core count and the Queue Depth to 64 for Random and 32 for Sequential tests.

Enterprise users will be more interested in RAID 5/6, as these make better use of the capacity of the NVMe drives than RAID 10, for example. But for completeness I ran the benchmarks on RAID 10 too.
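As a rough illustration with the ten 1.92TB drives used here (ignoring formatting overheads):

  • RAID 5: (10 - 1) x 1.92TB ≈ 17.3TB usable
  • RAID 6: (10 - 2) x 1.92TB ≈ 15.4TB usable
  • RAID 10: (10 / 2) x 1.92TB = 9.6TB usable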

My FIO job files were as follows; the values that changed between tests are noted in the comments:

[global]
filename=/dev/gdg0n1
# bs=4K for the random tests, bs=1M for the sequential tests
bs=4K
time_based=1
runtime=300
numjobs=56
cpus_allowed=0-55
randrepeat=0
ioengine=libaio
direct=1
random_generator=tausworthe64
cpus_allowed_policy=split
group_reporting=1

[randread]
# iodepth=64 for the random tests, 32 for the sequential tests
iodepth=64
# rw=randwrite, read or write for the other tests
rw=randread

The queue depth could be increased, but I found latency getting very high on certain MDADM and VROC tests, so I settled on 64/32 across the board.
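Each test was then simply a case of running the job file with fio; the file names here are only examples:

  # run the job file and save the results to a text file
  fio --output=raid5_randread_results.txt randread.fio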

Single Drive Test

To start with, I tested a single drive using FIO in order to get some baseline figures. The results below are for a single Kioxia CD6 1.92TB drive.

Read / Writes – higher is better

RAID 5

For the random tests we look at IOPS, and for the sequential tests throughput is measured. For RAID 5 we can see the SR-1010 perform much better in the random data tests, hitting over 7 million IOPS in the read test, roughly double that of the MDADM and VROC arrays.

RAID 5                                SR-1010      MDADM        VROC
Random Read (56T/64QD) – IOPS         7,277,000    3,624,000    3,061,000
Random Write (56T/64QD) – IOPS        152,000      44,200       43,000
Sequential Read (56T/32QD) – GB/s     34.9         34.9         36.3
Sequential Write (56T/32QD) – GB/s    10.6         0.25         0.23

Read / Writes – higher is better

Sequential read is a much closer result, with VROC showing the best performance. But on sequential write the SR-1010 is again far and away the best performer, at 10.6GB/s versus roughly 0.25GB/s on the software RAID solutions.

A word on CPU Usage and Latency

I mentioned latency earlier, and in the sequential tests for MDADM and VROC it was very high; we are talking seconds rather than milliseconds, which makes those a hard sell for any enterprise solution. The SupremeRAID™ SR-1010 performs much better here, across all tests.

RAID 5 Latency                        SR-1010    MDADM      VROC
Random Read – Avg latency (ms)        0.49       0.99       1.17
Random Write – Avg latency (ms)       23.51      81.04      83.56
Sequential Read – Avg latency (ms)    56.2       53.78      51.77
Sequential Write – Avg latency (ms)   177.38     7178.74    8212.17

Latency - lower is better

With regards to CPU usage, the random read tests hit the CPU hard for MDADM and VROC, appearing to consume all available cycles during the FIO run; perhaps the number of jobs (56, to match the CPU cores) contributes to the high usage. Either way, there is a stark difference when comparing the SupremeRAID™ SR-1010 with the others. Those freed-up CPU cycles remain available for other tasks, raising overall server performance.
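If you want to observe this for yourself while a test runs, something as simple as mpstat from the sysstat package will do the job (this is just one way to watch it, not necessarily the exact tooling used in the lab):

  # report per-CPU utilisation every 5 seconds while the benchmark runs
  mpstat -P ALL 5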

CPU usage – lower is better

RAID 6

Back to the bandwidth results: looking at RAID 6 we see a similar pattern, with the SR-1010 performing better in all tests except sequential read.
Note that VROC does not support RAID 6, hence there are no VROC results here.

RAID 6                                SR-1010      MDADM        VROC
Random Read (56T/64QD) – IOPS         7,241,000    3,677,000    n/a
Random Write (56T/64QD) – IOPS        605,000      37,500       n/a
Sequential Read (56T/32QD) – GB/s     30.8         34.5         n/a
Sequential Write (56T/32QD) – GB/s    9.71         0.19         n/a

Read / Writes – higher is better

RAID 10

Again, for RAID 10 there are no results for VROC, as it supports only 4x drives in this configuration.
The random tests again show much higher performance from the SR-1010, with MDADM edging ahead in sequential read only.

RAID 10                               SR-1010      MDADM        VROC
Random Read (56T/64QD) – IOPS         6,857,000    1,541,000    n/a
Random Write (56T/64QD) – IOPS        1,435,000    150,000      n/a
Sequential Read (56T/32QD) – GB/s     35.4         36.1         n/a
Sequential Write (56T/32QD) – GB/s    6.11         0.63         n/a

Read / Writes – higher is better

Future Updates

Graid Technology are always looking to increase the performance of the SR-1010 and SR-1000 cards with driver and software updates. A future update is set to increase RAID 5 sequential write performance by 4x. With talk of 90GB/s in RAID 5 sequential writes, this is a huge boost in performance and emphasises that there is more to come from Graid Technology. We will be sure to test the new release once it is out in May 2023! To read more about it, please see the press release:
https://www.graidtech.com/blocks-files-features-new-supremeraid-software-release/

Conclusion

With the SupremeRAID™ SR-1010 from Graid Technology, we have a very interesting solution for NVMe based RAID arrays. Until this point there have been very few options for getting the best performance out of NVMe drives whilst maintaining redundancy against drive failures. There are potential bottlenecks with both traditional RAID cards and software solutions: they are restricted by the number of drives, bandwidth, higher CPU usage and higher storage latency.

Therefore, to utilise the NVMe drives to their true capabilities and to get the most out of your investment, Graid Technology offers an enticing solution.

The technology is new, so our recommendation is to try it for yourselves, and that is where we can come in! Boston Labs is all about enabling our customers to make informed decisions in selecting the right hardware, software and overall solution for their specific challenges. If you'd like to request a test drive of the SupremeRAID™ SR-1010, please get in contact by emailing [email protected] or calling us on 01727 876100, and one of our experienced sales engineers will gladly guide you through building the perfect solution just for you.

Written by: Sukhdip Mander - Field Application Engineer at Boston Limited

Tags: RAID, GPU, tech blog, GRAID, Benchmark

