(originally posted on Google+ 07/2014)
Today's file server bottlenecks are often caused by storage performance rather than by CPU limits. This article describes the performance and reliability considerations when using RAID storage connected to a server. The conclusions apply to most server setups and are independent of the particular storage system and server hardware, and of whether the server runs a native OS or a VM server environment like VMware ESX or Microsoft Hyper-V with multiple VMs.
RAID level considerations
In our case we have 24 disks within a RAID enclosure/controller. Ideally we would use just RAID 1 mirrored disks, i.e. 12 mirrored pairs. This would provide twelve channels of independent I/O, and a disk failure would affect only 1/12 of the total storage. However, this RAID 1 setup wastes 50% of the storage capacity, and sequential I/O is limited to the performance of a single disk.
A good tradeoff is to use RAID 5 (or RAID 6) and to use as many RAID 5 groups as possible. Putting all 24 disks into a single RAID group is a big mistake: even though large RAID groups offer great sequential read/write performance, processing many concurrent (non-sequential) I/O requests will be horribly slow. This is because each request needs to synchronize 24 disks before it can read or write a single block.
After extensive testing, we decided to use four RAID 5 groups of five disks each (plus one spare disk per RAID group). Our 15k 300 GB SAS (Serial Attached SCSI) disks offer about 200 MByte/sec with 1-2 ms seek time. With five disks, the capacity per group is 1.2 TB (not 1.5 TB, because RAID 5 provides the capacity of the number of disks minus one). The maximum sequential read/write performance is up to 800 MByte/sec per RAID 5 group. Random I/O is reasonable because a request needs to synchronize only five disks, while the three other RAID groups can perform independent I/O in parallel. Another benefit is that a failure of a RAID 5 group requires restoring only 1.2 TB of data, versus 7 TB with a single 24-disk RAID 5 group. This is a great tradeoff; it would be even better to use fewer disks per RAID 5 group.
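The arithmetic behind these layout choices can be sketched in a few lines. The disk figures (300 GB capacity, 200 MByte/sec) are the ones quoted above; everything else is simple bookkeeping, not a claim about any particular controller.

```python
def raid_layout(disks, disk_gb=300, disk_mb_s=200, level=5):
    """Usable capacity (GB) and best-case sequential throughput (MByte/sec)
    of one RAID group.

    RAID 5 loses one disk's worth of capacity to parity; RAID 1 (mirrored
    pairs) loses half. Sequential throughput is the optimistic sum of the
    data-bearing spindles, so a RAID 1 pair is limited to a single disk.
    """
    if level == 5:
        data_disks = disks - 1
    elif level == 1:
        data_disks = disks // 2
    else:
        raise ValueError("only RAID 1 and RAID 5 are modelled here")
    return data_disks * disk_gb, data_disks * disk_mb_s

# One 5-disk RAID 5 group as described above:
cap, seq = raid_layout(5)
print(cap, seq)                        # 1200 GB usable, up to 800 MByte/sec

# The whole enclosure: four small groups vs. one big 24-disk group
print(4 * raid_layout(5)[0])           # 4800 GB usable across four groups
print(raid_layout(24)[0])              # 6900 GB (~7 TB) in one large group
print(raid_layout(24, level=1)[0])     # 3600 GB if fully mirrored (RAID 1)
```

The four-group layout gives up some capacity compared to one large group, but each random request ties up only five spindles instead of 24.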
HBA (Host Bus Adapter) controller considerations
Today's enterprise disks are connected via 6 Gbit/sec SAS, which offers up to 600 MByte/sec throughput per channel. Each SAS controller usually has one or more mini SAS connectors (SFF 8087 internal or SFF 8088 external), each bundling four 6 Gbit SAS channels. An HBA connected to an external RAID uses all four channels in parallel via a single cable, offering 24 Gbit/sec (a maximum of 2400 MByte/sec). The problem is that when a lot of data is produced, e.g. when backing up to disk, unpacking large archives, or copying large amounts of data, the controller queue can fill up. With LSI controllers, for example, the max_queue_depth is usually 600 I/O requests but can be configured even higher. In addition, the operating system I/O queue can hold gigabytes of data. This means a single HBA connection to the storage can be busy for many seconds; the I/O queue is so deep that additional requests may take forever (e.g. 10 seconds for a single ssh login, or 120 seconds for mounting an AFP or SMB server volume).

To overcome this problem, the best practice is to have multiple HBA connections to the storage. A RAID usually has multiple SAS or Fibre Channel ports, so configure each RAID group on a dedicated port and use multiple HBAs with separate lines to the RAID, preferably a dedicated line per RAID group. It sounds complex, but offers the best performance. Using only a single line between the server and the storage works, but it is a single bottleneck through which all I/O must pass, and under heavy load this makes things worse. We spent weeks of testing with multiple servers using internal as well as external RAIDs. Our final configuration is an external RAID connected to the server via four SAS lines (each 24 Gbit).
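A rough back-of-envelope shows why a deep queue stalls interactive requests: a newly arriving request has to wait for the queue ahead of it to drain over the available SAS links. The 600-entry queue depth is from the LSI example above; the per-request size and the multi-gigabyte OS queue figure are assumptions for illustration only.

```python
def queue_drain_seconds(queued_mb, link_mb_s=2400, links=1):
    """Seconds until a newly arrived request is serviced, assuming the
    backlog of queued_mb MBytes must drain first over `links` SAS links,
    each sustaining link_mb_s MByte/sec (2400 = one 4-channel 24 Gbit HBA).
    """
    return queued_mb / (link_mb_s * links)

# 600 queued controller requests of ~1 MB each (request size assumed):
print(queue_drain_seconds(600 * 1))        # 0.25 s behind one 24 Gbit link

# A multi-gigabyte OS-level queue is the real problem:
print(queue_drain_seconds(4096))           # ~1.7 s on a single link
print(queue_drain_seconds(4096, links=4))  # ~0.43 s spread over four links
```

The model ignores seek overhead and queue scheduling, so real stalls are longer than these numbers, but it illustrates why extra HBA lines shorten the wait for everyone.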
Volume setup considerations
By volumes we mean file systems, which may be served to clients via NFS, AFP, and SMB, or even serve as the storage for a VM. It is recommended to use multiple volumes, each backed by one or more RAID groups. The best configuration makes sure that busy volumes are assigned to different RAID groups; this allows independent, parallel I/O with great performance. A volume limited to the size of a single RAID group has the advantage that backup and restore are easier and complete in a shorter time. However, volume management like LVM2 or ZFS can combine multiple RAID groups into a single larger volume when needed. All of this is file system and operating system independent and applies to any server OS or VM server environment.
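The benefit of spreading busy volumes over separate RAID groups can be estimated with a very crude model, following the reasoning above: each RAID group services roughly one small random request per seek/service interval (all of its disks move together), and independent groups work in parallel. The 2 ms service time comes from the seek figures quoted earlier; treating it as the full per-request cost is a simplification assumed here.

```python
def random_iops(groups, service_ms=2.0):
    """Rough random-I/O ceiling: one RAID group completes about one small
    random request per service interval; independent groups add up."""
    per_group = 1000.0 / service_ms
    return groups * per_group

print(random_iops(1))   # 500 IOPS if every busy volume shares one group
print(random_iops(4))   # 2000 IOPS with busy volumes on four groups
```

Real controllers overlap and reorder requests, so absolute numbers will differ, but the 4x scaling from independent groups is the point.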
HELIOS desktop and index server database
It is recommended to use the “desktoplocation” preference to relocate the desktop databases to a separate, idle disk. This has the advantage that busy volumes will not delay database access. Another benefit is that a volume (AFP or SMB) running out of disk space will not cause problems saving database content. The desktop location disk does not need to be a flash disk; a single idle disk serving all volumes is perfect.
Most customers probably just plug together their storage and server hardware and use a single RAID, maybe with a separate system disk, and may experience performance problems later on. For busy servers, or for a VM server solution which consolidates multiple servers into multiple VMs on a single box, the storage design is most important. Even entirely SSD-based storage can become busy, and the same rules apply. For heavy-duty server environments there can also be good reasons to deploy dedicated servers, each with its own storage, to get predictable performance. Keep in mind that average disk access time has not changed much over the last two decades, and that RAID storage adds latency on top of it; Fibre Channel and iSCSI increase latency further. A single server with a single disk may well be faster than a poorly designed storage solution. We have spent a lot of time testing over the years, and recently did extensive testing of our new servers before they went into production. This whole article is based on that know-how and experience. We hope this information is valuable to you.