Root-causing vastly different performance on iozone O_SYNC benchmark for two HDD manufacturers

I have two servers A and B with the following configuration:

  • A: 4TB HDDs in RAID 1 (MegaRAID SAS 2008), 128MB cache, no BBU, write-through mode, 7.2k RPM, manufacturer A.
  • B: 1.5TB HDDs in RAID 1 (MegaRAID SAS 3108), 64MB cache, with BBU but in write-through mode, 10.5k RPM, manufacturer B.

I run the following benchmark on a single RAIDed partition: iozone -a -s 10240 -r 4 -+r (the -+r flag opens the test file with O_RSYNC|O_SYNC).
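To sanity-check the same write pattern outside iozone, dd can issue synchronous 4 kB writes as well; a minimal sketch, assuming the array is mounted at /mnt/raid (a hypothetical path):

    # ~10 MB in 4 kB records, with O_SYNC on the output file (oflag=sync),
    # roughly matching iozone's behavior at reclen 4 with -+r
    dd if=/dev/zero of=/mnt/raid/syncfile bs=4k count=2560 oflag=sync

    # The same write without the sync constraint, for comparison
    dd if=/dev/zero of=/mnt/raid/asyncfile bs=4k count=2560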

Results from A (excerpt):

                                                            random  random    bkwd   record   stride
          kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
       10240       4     108     474  4193564  6667334 6556395     701 4058822      475  3653175  2303202  2616201 6785306  6101840

Results from B (excerpt):

                                                            random  random    bkwd   record   stride
          kB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
       10240       4    3332   46961  5478410  6836065 4994841    2951 2853077      728  2299133  1722202  2008983 4549365  4712594

Both servers have write-through caching enabled, but I am unable to root-cause why write throughput is horribly slow on server A (108 kB/s) compared to server B (3332 kB/s), assuming I am interpreting the results correctly. At a 4 kB record size, 108 kB/s is only about 27 writes per second, as if every single write were waiting on the disk.

What could be the reason? Both servers otherwise have identical file system options (ext4 with the same default options).

Could it just be the case that disks from manufacturer B are superior to those from A for workloads involving a lot of synchronous writes?

Thanks.

Answer

Regarding the measured ~30x difference between your results: following up on our discussion in the comments, it turned out that MegaCli64 -LDGetProp -DskCache -Lall -aAll showed that setup B had the disk drive cache enabled by default, while it was disabled on setup A.

Disabling it on B with MegaCli64 -LDSetProp -DisDskCache -Immediate -Lall -aAll resulted in both systems showing similar performance.
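For reference, the relevant MegaCli invocations side by side; the -EnDskCache variant for re-enabling the drive cache is an assumption based on common MegaCli syntax, so verify it against your version:

    # Query the current disk drive cache setting for all logical drives
    MegaCli64 -LDGetProp -DskCache -Lall -aAll

    # Disable the disk drive cache, taking effect immediately
    MegaCli64 -LDSetProp -DisDskCache -Immediate -Lall -aAll

    # Re-enable it (assumed counterpart of -DisDskCache)
    MegaCli64 -LDSetProp -EnDskCache -Immediate -Lall -aAll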

Is it safe to run the RAID with disk drive cache enabled?

Running a RAID with the disk drive cache enabled is effectively the same as running a RAID controller with a non-BBU-backed volatile cache and write caching enabled (forced write-back mode). It enhances performance, but at the same time increases the chance of data loss and data inconsistency in the event of a power failure.

If you want to avoid that risk while still getting decent I/O performance, it is advisable to use a controller with a BBU-backed cache and to configure your volume for write-back mode with the disk cache disabled.
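On a MegaRAID controller, that recommended configuration might look like the following sketch; WB and NoCachedBadBBU are standard MegaCli property names, but double-check them for your controller:

    # Put the logical drive into write-back mode (uses the BBU-backed cache)
    MegaCli64 -LDSetProp WB -Lall -aAll

    # Drop back to write-through automatically while the BBU is bad or charging
    MegaCli64 -LDSetProp NoCachedBadBBU -Lall -aAll

    # Keep the individual drives' volatile caches disabled
    MegaCli64 -LDSetProp -DisDskCache -Lall -aAll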

The difference between your two RAID controllers

I don’t know if you were already aware, but there are more categories than just software and hardware RAID (this is an interesting article on the topic).

In the end, the MegaRAID SAS 2008 is more or less an HBA or I/O controller with added RAID capability, while the MegaRAID SAS 3108 is a real RAID Controller™ (also called a ROC or RAID-on-Chip), with a dedicated processor for handling the RAID calculations.

The SAS 2008 is especially known for horrible write performance with some OEM firmwares (like Dell's in the PERC H310, which I mentioned in the comments).

The synchronous mode in particular, combined with your chosen record length and file size, seems to produce really poor results with software/fake RAID.

For reference, this is what I get on my workstation using 10k WD VelociRaptors in software RAID 1:

                                                    random  random    bkwd   record   stride                                   
      KB  reclen   write rewrite    read    reread    read   write    read  rewrite     read   fwrite frewrite   fread  freread
   10240       4     182     181  1804774  2127084 2110984     167 1673159      153  1760968   954589  1203989 2022512  2062824

If you are running in synchronous mode (O_SYNC), your result for A therefore seems reasonable in terms of what soft/fake RAID can deliver.
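To verify that the O_SYNC constraint, rather than the drives themselves, dominates the number, you could rerun the benchmark with and without the sync flag and with a larger record size; a sketch using the same parameters as above:

    # Original run: O_RSYNC|O_SYNC on every write, 4 kB records
    iozone -a -s 10240 -r 4 -+r

    # Same run without the sync constraint; a large gap implicates the
    # sync-write path (controller/drive cache), not raw drive speed
    iozone -a -s 10240 -r 4

    # Larger records amortize the per-write sync cost
    iozone -a -s 10240 -r 64 -+r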


Does write-through cache mode cause a performance degradation of the array over time?

I don’t think so. With the write cache activated, the controller is able to perform certain optimizations on the pending write operations.

For example, this description of the cache operation is taken from the whitepaper for HP Smart Array controllers:

    The write cache will typically fill up and remain full most of the time in high-workload environments. The controller uses this opportunity to analyze the pending write commands to improve their efficiency. The controller can use write coalescing that combines small writes to adjacent logical blocks into a single larger write for quicker execution. The controller can also perform command reordering, rearranging the execution order of the writes in the cache to reduce the overall disk latency.

As you can read, the cache is used to further enhance the write performance of the array, but it has no lasting effect on the performance of subsequent write or read operations.

Regarding disk fragmentation, that is a file-system/OS-level problem. The RAID controller, operating at the block level, cannot optimize file system fragmentation at all, so it makes no difference whether it runs in write-through or write-back mode.

Attribution
Source: Link, Question Author: Vimal, Answer Author: s1lv3r
