RAID L1/5/10 Benchmarks for oVirt/KVM Virtualization Servers

Preface

One day we finally decided to install a virtualization cluster. Although we had just several Linux and 2 Windows servers, managing all that stuff scattered across different boxes had become too cumbersome and time consuming, so moving them under KVM (kernel virtual machine) on 2 physical servers was a logical solution. The virtualization cluster would consist of only 2 nodes (HP ProLiant DL380) + 1 controller (hosted engine), managed by oVirt (a great free open source virtualization manager developed by smart folks from Red Hat). CPU resources were not a problem; my only concern was a hard disk subsystem bottleneck under many parallel IO threads from a dozen virtual machines. Large capacity server grade SLC SSDs were quite expensive in 2018, so due to overall cost concerns I had to opt for conventional hard drives.

WARNING – consumer grade MLC and TLC SATA SSDs are NOT fully compatible with SAS RAID controllers! They may work for some time, yet during high load peaks the system will crash, often rendering partitions on the SSD unreadable. I tested WD, KingFast and KingSpec SSDs with HP and Broadcom (3ware/AMCC) SAS RAID controllers, and none of these combinations performed reliably. An IOzone disk test always led to a complete freeze!

Benchmark tests were run on RAID L1 (2 x HP SAS 10000 rpm HDs), L5 (4 x WD Red NASware 5400 rpm) and L10 (same 4 x HDs as L5), plus L10 via loopback NFS. Shared networked storage is a crucial component of an oVirt installation (I explain why below).

Hardware and Software Setup

  • Server HP ProLiant DL380 G7, 2 x Quad-Core Xeon L5630 2.13GHz, 36 GB RAM
  • HP SAS Smart Array P410i 512MB Cache (25% Read / 75% Write) <- remember this ratio!
  • Hard drive set #1: HP SAS 146GB 10000rpm DG146ABAB4 2.5” – 2 pcs (formatted as L1 RAID)
  • Hard drive set #2: WD Red NASware 3.0 SATA 1TB 5400rpm WD10JFCX-68N 2.5” 16 MB Cache – 4 pcs (formatted as 3TB L5 or 2TB L10 RAID)
  • Linux CentOS 7.4 with 4.x mainline kernel

As the name implies, Western Digital NASware hard drives are designed especially for network attached storage and a 24/7 duty cycle. They run silent and relatively cool. My only concern was the slow 5400 rpm spindle speed. The HP ProLiant DL380 server has 8 x 2.5″ hot-swap slots for SAS or SATA hard drives. In the finished system the 2 x 146 GB SAS HDs in RAID L1 configuration were used for the OS installation, and the 4 x 1TB WD NASware drives in RAID L10 for VM data (qcow2 and raw disk images).

In order to directly compare the HP SAS 10000rpm and WD NASware SATA 5400rpm drives, one would have to run tests on single drives or on an equal RAID level, e.g. L1. Unfortunately, I had very limited time, so my only concern was final system performance. The HP SAS drives were used only for the OS (CentOS/oVirt node) installation. The objective was to choose between RAID L5 and L10 for VM data storage, and to measure the negative impact of the NFS loopback which is necessary for the oVirt shared storage domain. Jumping ahead (and correlating the numbers), I can state that the WD NASware drives show lower (sometimes up to 30%) random read/write performance (due to higher access time and lower RPM); the rest is comparable. The higher read/write throughput figures of the WD NASware RAID are a result of parallel data transfer to several drives simultaneously.

RAID L0 – data striped across 2 or more (n) drives, read and write transfer rates up to (n) times as high as an individual drive. Not used on mission critical servers, since failure of 1 drive means complete data loss. However, for video editing it may be a great asset.
RAID L1 – data mirrored across 2 drives, performance slightly slower than a single drive.
RAID L5 – data striped across drives with distributed parity; write performance is usually increased (compared to a single drive or RAID L1) since all RAID members participate in serving write requests, yet final scores heavily depend upon the particular controller. Parity calculation overhead may play its negative role.
RAID L10 – nested RAID, a stripe of mirrors, theoretically the highest performance level after RAID 0.
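
For the 4 x 1TB WD set used here, the usable capacities quoted above work out as:
RAID L5:  (n - 1) x drive size = (4 - 1) x 1TB = 3TB
RAID L10: (n / 2) x drive size = (4 / 2) x 1TB = 2TB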

A Little Theory – and This is Really Important

Conventional hard drives are mechanical devices. In order to read a piece of information from the platter, the head has to be positioned over a certain concentric track. Heads can’t be moved instantly from track to track, so the more the small chunks of information are scattered across the platter, the lower the read/write speed will be. Don’t be fooled by the throughput figures in a factory data sheet; in most cases they are only theory (the WD 2.5″ NASware 1TB is specified at 144 MB/s internal transfer rate, which is rarely achievable, as clearly seen from the benchmarks). Hard drive controllers, RAID cards, the OS and software all use caching to improve performance. A large cache may significantly improve random write by caching and queuing data, yet it will not yield a significant gain in random read, since even read-ahead will result in many missed hits. This can be cured with specialized software design, e.g. a database server, which may use a large portion of RAM to keep the most often accessed data (like indexes) and avoid excess disk operations.
In short – small chunk random read is the worst operation in terms of overall speed.
The fact outlined above implies a crucial rule: to improve overall disk subsystem performance it is much better to split data across several smaller RAIDs, rather than create a single large one. For example, if you have a 12-bay hot-swap drive cage, 3 x L10 RAIDs consisting of 4 hard drives each are much better than a single L10 RAID populated with all 12 available disks.
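
The arrays in this article were built on the hardware P410i controller, but the same rule can be illustrated with Linux software RAID. A minimal mdadm sketch, assuming 12 hypothetical disks /dev/sdb through /dev/sdm:

# option 1: a single 12-disk RAID L10 – unrelated workloads end up
# competing for the same heads
mdadm --create /dev/md0 --level=10 --raid-devices=12 /dev/sd[b-m]

# option 2 (the rule above): three independent 4-disk RAID L10 arrays,
# each dedicated to its own group of VMs or data sets
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[f-i]
mdadm --create /dev/md2 --level=10 --raid-devices=4 /dev/sd[j-m]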

OS Settings and Benchmark Software

In order to reduce the influence of OS caching, the server was booted with the grub “mem=1536M” option, limiting visible RAM to only 1.5GB. Partitions were formatted as ext4 and mounted with the noatime option. One additional test (crucial for oVirt) was performed with the ext4 partition mounted via loopback NFS (NFS server and NFS mount on the same PC). Swap was turned off with “swapoff -a“. The Gnome 3 default desktop remained enabled during the tests. At first I tried the Phoronix Test Suite and was somewhat disappointed by the bugs and quirks in this software. Additionally, running blogbench (a file server load simulation) as a standalone test revealed 30% – 40% inconsistency between consecutive runs (blogbench is also part of the Phoronix Test Suite). Fortunately, IOzone proved to be very flexible, powerful and capable of producing very close results over several runs in a row.
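
For reference, a minimal sketch of these settings on CentOS 7 (the /dev/sdb1 device name is a placeholder; /vmraid matches the mount point used in the IOzone commands at the end of the page):

# limit visible RAM to 1.5GB: add mem=1536M to GRUB_CMDLINE_LINUX
# in /etc/default/grub, then regenerate the config and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg

# mount the RAID partition as ext4 with noatime
mount -o noatime /dev/sdb1 /vmraid

# turn swap off for the duration of the benchmarks
swapoff -a
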
I decided to perform 3 tests with IOzone. The command line options are listed at the end of this page.
 
1) Single threaded test (record size 4kB/64kB/1MB, test file size 2GB), to check the overall performance of the RAID controller + selected hard drive combination.
2) 3 parallel threads, record size 10 MB, test file size 1GB x 3. A typical scenario for shared network storage dedicated to large graphics and video editing.
3) 12 parallel threads, record size 64kB, test file size 200MB x 12. Simulates the lightweight workload of several virtual machines running web, e-mail and database servers.
 
When judging the results, take into account the 25% read / 75% write cache setting on the RAID controller!
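
On the P410i this ratio can be checked (and changed) with the HP Smart Array CLI. A sketch, assuming the controller sits in slot 0; the tool name varies by generation (hpacucli / hpssacli / ssacli):

# show the current accelerator (cache) settings
hpssacli ctrl slot=0 show detail

# example: set 25% read / 75% write
hpssacli ctrl slot=0 modify cacheratio=25/75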

The final charts (with comments) are shown below.
For throughput graphs, higher bars are better; for CPU load, the opposite.

For whatever reason, RAID L5 shows uneven, inconsistent performance numbers.
 
As I stated before, small chunk random read/write is the worst case scenario, yielding minuscule numbers; the chart had to be redrawn on a logarithmic scale to be readable (see below).
Here the 10000rpm hard drives show their advantage, especially at the 4kB record size.
 
In this 3-thread large file test, RAID L5 proved to be an excellent performer. CPU load (%) scales linearly with throughput. The 4-disk RAID L5 can hog up to 38.52% of the CPU.
 
For the oVirt VM storage domain, RAID L10 seems to be the preferable solution over RAID L5 due to better random read/write throughput.

NFS Loopback

The impact of NFS loopback is very low, 10% or even less on average. The setup described above was used for a while with the virtualization cluster running ISPConfig (on Debian), 2 accounting systems (1 on Linux, 1 on Windows), and several additional VMs for various needs (e.g. IP telephony). I can rate the performance as satisfactory for the accounting systems and excellent for everything else.
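
For completeness, a loopback NFS setup along these lines looks roughly like the sketch below; the /vmraid export path and the mount point are placeholders rather than the exact configuration used here, and in a real oVirt deployment the mount is performed by the storage domain itself:

# /etc/exports – export the VM data volume to the host itself
# (oVirt expects the export to be owned by vdsm:kvm, uid/gid 36)
/vmraid   127.0.0.1(rw,sync,no_root_squash)

# apply the export and start the NFS server
exportfs -ra
systemctl enable nfs-server
systemctl start nfs-server

# mount the volume back over NFS, the way an oVirt storage domain would see it
mount -t nfs 127.0.0.1:/vmraid /mnt/vmstore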

YouTube Video

A short version (slide show) of this article is available on YouTube (link).
Other oVirt-related videos:
oVirt – how to add 2nd (DMZ) network interface for virtual machines
oVirt – moving away from local storage setup to NFS/GlusterFS

IOZONE Command Line Options
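
A quick key to the IOzone flags used below: -i 0 / -i 1 / -i 2 / -i 8 select the write/rewrite, read/reread, random read/write and mixed-workload tests; -r and -s set the record and file sizes; -t with -l/-u runs the throughput mode with the given number of threads; -f / -F name the temporary test files; -p purges the processor cache before each operation; -+u adds CPU utilization to the output; -O reports results in operations per second instead of kB/s; -R with -b produces an Excel-compatible report file.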

1) Single threaded test (record size 4kB/64kB/1MB, test file size 2GB).
 
iozone -+u -p -+T -r 4k -r 64k -r 1m -s 2g -i 0 -i 1 -i 2 -i 8 -f /vmraid/iozone.tmp -R -b ~/bench/test-1thr.xls
 
2) 3 parallel threads, record size 10 MB, test file size 1GB x 3.
 
iozone -+u -p -t 3 -l 3 -u 3 -r 10m -s 1g -i 0 -i 1 -i 2 -i 8 -F /vmraid/F1 /vmraid/F2 /vmraid/F3 -R -b ~/bench/test-thr3.xls
iozone -O -+u -p -t 3 -l 3 -u 3 -r 10m -s 1g -i 0 -i 1 -i 2 -i 8 -F /vmraid/F1 /vmraid/F2 /vmraid/F3 -R -b ~/bench/test-thr3-io.xls
 
3) 12 parallel threads, record size 64kB, test file size 200MB x 12.
 
iozone -+u -p -t 12 -l 12 -u 12 -r 64k -s 200m -i 0 -i 1 -i 2 -i 8 -F /vmraid/F1 /vmraid/F2 /vmraid/F3 /vmraid/F4 /vmraid/F5 /vmraid/F6 /vmraid/F7 /vmraid/F8 /vmraid/F9 /vmraid/F10 /vmraid/F11 /vmraid/F12 -R -b ~/bench/test-thr12.xls
iozone -O -+u -p -t 12 -l 12 -u 12 -r 64k -s 200m -i 0 -i 1 -i 2 -i 8 -F /vmraid/F1 /vmraid/F2 /vmraid/F3 /vmraid/F4 /vmraid/F5 /vmraid/F6 /vmraid/F7 /vmraid/F8 /vmraid/F9 /vmraid/F10 /vmraid/F11 /vmraid/F12 -R -b ~/bench/test-thr12-io.xls
 
