
RAIDIX Data Storage System for the Intel Lustre Cluster

22.02.2018

Data storage solutions for HPC must deliver data protection, data availability, scalability and sustained system performance. RAIDIX management software bundled with Intel® Enterprise Edition for Lustre* enables the essential supercomputer storage functionality and creates an effective storage cluster based on COTS hardware.

This document describes the RAIDIX and Lustre solutions, the recommended hardware architecture, and a data storage deployment scheme tailored to HPC requirements.

 

Introduction

Today’s High-Performance Computing (HPC) is no longer just an IT tool for researchers and scientists. A growing number of companies across diverse verticals are unlocking the benefits of HPC. Enterprises generate large data volumes and employ high-performance applications for data analysis and processing. Beyond business process continuity, the corporate sector depends on data availability and access performance. Companies require highly scalable storage infrastructures with top throughput and fault-tolerance metrics.

The commercial solution Intel® Enterprise Edition for Lustre* builds on the Lustre software functionality, optimized for reliable storage and maximum performance in HPC. The solution’s key benefits include record performance, scalable capacity, proprietary management components and round-the-clock support.

To service HPC workloads, RAIDIX offers an integrated solution based on the cluster-in-a-box RAIDIX HPC technology and the Intel Lustre software. At the core of the comprehensive system sits the RAIDIX volume manager operating on commodity hardware with a pre-installed Lustre OSS (object storage server)/OST (object storage target) or MDS (metadata server)/MDT (metadata target).

RAIDIX is essentially a building block for high-availability Lustre HPC infrastructures. Each block can house from 8 to 128 drives in a single high-density chassis, with performance of up to 12 GB/s. Separate storage nodes can be consolidated into a scale-out system using Intel® Enterprise Edition for Lustre*.

RAIDIX Data Storage adheres to stringent performance, fault-tolerance and data integrity requirements, and ensures record throughput, low latencies and fast failover thanks to its patented RAID 6 and RAID 7.3 algorithms. These algorithms deliver parity calculation speeds of 37 GB/s (RAID 6) and 25 GB/s (RAID 7.3) per core.

The traditional methods of Lustre OSS/MDS set-up require dedicated hardware and standalone configuration of each server. In contrast to this approach, RAIDIX allows the administrator to build an HPC infrastructure from integrated blocks and decrease TCO by leveraging full compatibility with COTS hardware and SAN/NAS interfaces.

The sections below describe the hardware requirements for a dual controller RAIDIX configuration and the functionality of the comprehensive solution.

 

Dual Controller RAIDIX Storage

RAIDIX is a management component for building high-performance data storage systems from commodity hardware platforms based on Intel processors. To ensure full fault-tolerance, the RAIDIX solution can be configured in a Dual Controller (DC), Active-Active cluster mode. DC configurations are a perfect match for SBB (Storage Bridge Bay)-compatible platforms that include HA storage components out of the box.

General technical requirements for the RAIDIX DC platform:

  • CPU: Intel Xeon E5-2637 v4 / E5-2667 v4 processors
  • Motherboard: must support the chosen processor and PCI Express 3.0 x8/x16
  • Internal cache memory: must match the chosen motherboard; 64 GB or more per node
  • Chassis: dual power supply and dual motherboard recommended
  • SAS controller (extra ports may be used for JBOD connectivity): Broadcom 93xx recommended
  • HBA (cache synchronization controller): Mellanox ConnectX-3 VPI or higher recommended
  • HBA (Lustre network connectivity controller): Mellanox ConnectX-3 VPI or higher recommended
  • HDD: SAS disks are required for DC architectures
  • Level 2 cache devices: HGST SS200 SSDs
  • Lustre network: InfiniBand* QDR/FDR/EDR, or 10GbE/40GbE/100GbE Ethernet
  • Management network: 1GbE Ethernet

Comprehensive Solution

RAIDIX allows the end customer to build a storage system with fast and reliable failover, high-performance data processing, advanced data protection and monitoring functionality. The RAIDIX software integrated with Intel® Enterprise Edition for Lustre* includes an installation package for Intel Xeon-based systems. The RAIDIX Erasure Coding (EC) algorithms, tailored to Intel architectures, enable record I/O performance.

As for the scale-out Intel Lustre, this technology brings a host of functional benefits:

  • High manageability with Intel Manager for Lustre.
  • High I/O performance for enterprise applications such as MapReduce.
  • Support for Intel Xeon Phi clients.
  • A Hadoop connector that enables Lustre use in Hadoop applications.
  • Full hierarchical storage management.
  • A single-thread performance improvement patch.

Storage Management

RAIDIX-based data storage employs a user-friendly web interface that allows for volume configuration and system performance monitoring.

Lustre Cluster Management

The Lustre-based cluster is managed through Intel Manager for Lustre, a web application with a REST API and a full-fledged CLI. The application’s functionality encompasses:

  • Set-up and monitoring of Lustre file systems.
  • Server and volume configuration.
  • Performance and resource allocation management.
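
As an illustration of scripted monitoring, the Python sketch below polls such a REST API for the list of managed file systems. The endpoint path, response fields and credentials are assumptions made for this example rather than the documented schema; consult the Intel Manager for Lustre API reference for the exact interface.

    import requests

    MANAGER_URL = "https://manager.example.com"    # hypothetical manager address

    session = requests.Session()
    session.auth = ("admin", "secret")             # placeholder credentials

    # List the managed Lustre file systems with basic capacity figures.
    resp = session.get(f"{MANAGER_URL}/api/filesystem/")
    resp.raise_for_status()
    for fs in resp.json().get("objects", []):
        print(fs.get("name"), fs.get("bytes_total"), fs.get("bytes_free"))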

Volume Data Protection

The RAIDIX software employs patented erasure coding algorithms optimized for high-performance tasks. RAIDIX supports RAID 0, RAID 5, RAID 6, RAID 6i, RAID 7.3, RAID 7.3i, RAID 10, and RAID N+M levels.

RAID 6. A block-striping level with dual distributed parity, based on RAIDIX’s proprietary mathematical algorithms. RAID 6 is characterized by improved performance, since each drive processes I/O requests independently, allowing parallel data access. RAID 6 can sustain the complete failure of two drives in the same group.
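
For reference, the Python sketch below shows the textbook P/Q double-parity construction over GF(2^8) on which RAID 6 implementations are commonly built. RAIDIX’s patented variant is not public, so this is purely illustrative of the principle.

    def gf_mul(a, b, poly=0x11D):
        """Multiply two GF(2^8) elements modulo the reducing polynomial."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    def pq_parity(stripe):
        """P is the XOR of all blocks; Q weighs block i by 2^i in GF(2^8)."""
        p, q, coeff = 0, 0, 1
        for d in stripe:
            p ^= d
            q ^= gf_mul(coeff, d)
            coeff = gf_mul(coeff, 2)
        return p, q

    # One byte per drive for brevity; real stripes apply this per byte offset.
    stripe = [0x37, 0xA2, 0x5C, 0x11]
    p, q = pq_parity(stripe)

    # A single lost block is the XOR of P with the surviving blocks.
    lost = 2
    restored = p
    for i, d in enumerate(stripe):
        if i != lost:
            restored ^= d
    assert restored == stripe[lost]
    # Losing two blocks requires solving a 2x2 system over GF(2^8) using
    # both P and Q, which is what gives RAID 6 its two-failure tolerance.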

RAID 7.3. A block-striping level with triple distributed parity that allows data reconstruction when up to three drives fail. The patented RAID 7.3 delivers high performance without additional CPU load.

RAID 7.3 also offers greater reliability than RAID 6: three checksums are calculated using different algorithms, and the capacity of three drives is reserved for checksums. This RAID level is highly recommended for arrays over 32 TB.

RAID N+M. A block-striping level with a variable number (M) of checksums. Another patented development, RAID N+M lets the user define how many disks are allocated for checksums. RAID N+M requires at least 8 disks and can sustain the complete failure of up to 32 drives in the same group, depending on the number of parity disks.
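
The trade-off across these levels is usable capacity versus fault tolerance. The short Python helper below makes the arithmetic concrete for a hypothetical group of 24 drives of 8 TB each; the drive size is assumed for the example.

    def raid_usable(n_drives, drive_tb, parity_drives):
        """Usable capacity and parity overhead of a parity-based RAID group."""
        usable_tb = (n_drives - parity_drives) * drive_tb
        overhead = parity_drives / n_drives
        return usable_tb, overhead

    # 24 drives of 8 TB each (drive size assumed for the arithmetic).
    for name, m in (("RAID 6", 2), ("RAID 7.3", 3), ("RAID N+M, M=4", 4)):
        usable, ovh = raid_usable(24, 8, m)
        print(f"{name}: {usable} TB usable, {ovh:.1%} overhead, "
              f"survives {m} drive failures")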

Silent Data Corruption Protection

Silent Data Corruption can be caused by failures in drivers and drive firmware, memory errors, drive-head crashes and similar software and hardware problems. Silent write errors may go undetected by drive firmware or the host operating system and result in corrupted data structures and subsequent data loss.

RAIDIX includes a silent error correction algorithm that analyzes RAID metadata to detect and fix corruption incidents while regular drive operations are performed, without performance degradation.
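
Conceptually, this kind of scrubbing recomputes a stored checksum on read and repairs any mismatch from redundancy, as in the Python sketch below. RAIDIX’s actual metadata layout is not public, so all names here are illustrative.

    import zlib

    def verify_stripe(data_blocks, stored_crc, rebuild_from_parity):
        """Return clean stripe data, repairing silently corrupted blocks."""
        if zlib.crc32(b"".join(data_blocks)) != stored_crc:
            # The drives returned data that no longer matches the stored
            # metadata: rebuild the stripe from parity instead of trusting it.
            return rebuild_from_parity()
        return data_blocks

    # Toy usage: a bit flip the drive never reported is caught and repaired.
    good = [b"abcd", b"efgh"]
    crc = zlib.crc32(b"".join(good))
    corrupted = [b"abXd", b"efgh"]
    assert verify_stripe(corrupted, crc, lambda: good) == good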

Partial Reconstruction

RAIDIX employs a mechanism of partial RAID reconstruction that allows the system to restore only a particular area containing corrupted data on a hard drive, thus reducing overall array recovery time. Partial reconstruction is effective for larger arrays.

RAIDIX continuously tracks drive state at a granularity of 1/2048 of the drive and can restore only the damaged data, on the fly. Reconstruction of damaged blocks is performed at high speed, while correct writing is maintained for all data streams. As a result, the system cuts reconstruction time and simplifies enclosure maintenance.
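
The idea can be pictured as a dirty-region map, sketched below in Python with the 1/2048 granularity mentioned above; the class and method names are illustrative, not RAIDIX internals.

    REGIONS = 2048    # matches the 1/2048 tracking granularity above

    class RebuildMap:
        def __init__(self, drive_bytes):
            self.region_size = drive_bytes // REGIONS
            self.dirty = set()

        def note_write(self, offset):
            """Mark the region touched by a write while a drive is absent."""
            self.dirty.add(offset // self.region_size)

        def to_rebuild(self):
            """Only these regions are reconstructed; the rest are skipped."""
            return sorted(self.dirty)

    # Two writes during an outage dirty 2 of 2048 regions, so the rebuild
    # touches about 0.1% of an 8 TB drive (size assumed) instead of all of it.
    m = RebuildMap(8 * 10**12)
    m.note_write(4096)
    m.note_write(5 * 10**11)
    print(m.to_rebuild())    # [0, 128]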

Sustainable High Performance

All RAID calculations run on standard Intel Xeon processors with high performance and a high degree of parallelization. The Advanced Reconstruction feature optimizes read performance by excluding the drives with the lowest read rates: the fast computing algorithms make it cheaper to recompute data from the remaining drives and parity than to read it physically from slow disk media. Advanced Reconstruction allows the data storage system to sustain the required data rate even in degraded mode and during RAID reconstruction.
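
The read path can be sketched as follows in Python, with a single-parity stripe and an arbitrary latency threshold standing in for whatever heuristics the real implementation uses.

    def xor_all(blocks):
        out = 0
        for b in blocks:
            out ^= b
        return out

    def read_stripe(data, parity, latency_ms, threshold_ms=50):
        """Return the stripe, recomputing any block whose drive is too slow."""
        result = []
        for i, block in enumerate(data):
            if latency_ms[i] > threshold_ms:
                # XOR of the fast drives' blocks plus parity recovers block i
                # without waiting on the slow drive at all.
                others = [b for j, b in enumerate(data) if j != i]
                result.append(xor_all(others + [parity]))
            else:
                result.append(block)
        return result

    stripe = [0x12, 0x34, 0x56]
    parity = xor_all(stripe)
    # Drive 1 answers in 120 ms: its block is reconstructed, not read.
    assert read_stripe(stripe, parity, [5, 120, 7]) == stripe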

High Data Availability

The RAIDIX cluster system allows the administrator to create a fault-tolerant high-performance cluster (in the dual controller mode) and place arrays asymmetrically on the nodes. Each RAID’s data remains available through the other node. In addition, the parallel Lustre file system enables the customer to perform concurrent reading and writing to all OST volumes, thus boosting overall performance.

The manual and automatic failover features in RAIDIX ensure rock-solid fault-tolerance. In addition, RAIDIX delivers well-balanced performance thanks to seamless RAID migration across the nodes.

Lustre integration into the dual controller RAIDIX system allows the user to:

  • Asymmetrically allocate multiple Lustre OSTs on each RAIDIX cluster node and balance performance for each node;
  • Ensure high availability of OST- and MDT-stored data (should a node fail, the data will remain available on the other node);
  • Blend the Lustre OST and MDT fault-tolerance mechanism into the node-level failover process: instead of requiring additional services such as Corosync or Pacemaker, the RAIDIX cluster itself performs the Lustre failover (see the sketch after this list).
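
A minimal Python sketch of this placement and failover logic, with illustrative node and OST names:

    def assign_osts(osts, nodes=("node-a", "node-b")):
        """Alternate OSTs across the nodes so each carries part of the load."""
        return {ost: nodes[i % len(nodes)] for i, ost in enumerate(osts)}

    def failover(placement, failed, nodes=("node-a", "node-b")):
        """Hand every OST owned by the failed node to its partner."""
        survivor = next(n for n in nodes if n != failed)
        return {ost: survivor if node == failed else node
                for ost, node in placement.items()}

    placement = assign_osts(["OST0000", "OST0001", "OST0002", "OST0003"])
    print(failover(placement, "node-a"))    # all four OSTs now on node-b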

 

Deployment Scheme

Image 1. The system deployment scheme

The system deployment scheme above is recommended for a typical HPC application:

  • The RAIDIX DC architecture is utilized for higher availability of each OST.
  • A Lustre OSS in Active-Active mode is installed on each RAIDIX DC controller used for OSTs.
  • Each OST in the RAIDIX cluster registers on both OSS servers installed on the cluster nodes. The native RAIDIX failover is configured: should one OSS fail, the RAIDIX fault-tolerant mechanism hands control to the remaining functional OSS.
  • The Lustre MGS (management server) and MDS (metadata server) should also be configured in the fault-tolerant DC mode to ensure greater availability of the MGT and MDT targets.
  • Intel Manager for Lustre is installed to attain advanced management and system monitoring functionality.
  • 1GbE Ethernet is used for the management network.
  • InfiniBand 56Gb (FDR) is employed for Lustre connectivity.
  • Each client machine has the Lustre client software installed.

Following these recommendations allows the user to build a high availability HPC infrastructure.

 

Suggested Architecture

For a hardware platform, RAIDIX recommends the use of a single chassis and identical SBB devices. The platform scales up with JBOD enclosures for increased capacity and performance.

AIC HA201-TP is a 2U high-availability cluster-in-a-box solution based on commodity off-the-shelf components. The dual controller configuration builds on two Intel server nodes (S2600TP family). Each node supports dual Intel Xeon E5-2600 v4 series processors.

The HA201-TP solution ensures high data availability in Active-Active mode and includes fault-tolerant, hot-swappable compute nodes, 24 HDD bays and 5 PCIe Gen3 slots per node.

Suggested configuration:

  • Platform: AIC HA201-TP SBB
  • CPU: dual Intel Xeon E5-26xx v4 processors per motherboard
  • Motherboard: Intel Server Board S2600TP
  • Internal cache memory: 64 GB per node
  • Chassis: AIC HA201-TP with dual motherboard, dual power supply and 24 hot-swappable HDD bays
  • SAS controller (internal backplane connection): Broadcom 9300-8i
  • HBA (cache synchronization): dual-port Mellanox ConnectX-3 or higher
  • HBA (Lustre network connection): Mellanox ConnectX-3 or higher
  • HDD: 24x NL-SAS 7.2K
  • RAIDIX software: v. 4.5
  • Intel® Enterprise Edition for Lustre*: v. 2.x/3.x

Image 2. AIC HA201-TP SBB module, front and back panels

Business Impact

The integrated solution comprised of the RAIDIX HPC technology and Intel® Enterprise Edition for Lustre* is a reliable building block for HPC infrastructures. The solution complies with high performance, fault-tolerance and data integrity requirements, and ensures low latencies and reliability. The bundle of RAIDIX and Lustre delivers:

  • Decreased hardware expenses.
  • Decreased connectivity expenses.
  • Flexible configuration, ease of implementation and maintenance.
  • Fast failover and high data availability.
