Tech Info 157: HELIOS LanTest performance vs. Finder/Explorer copy performance

HELIOS Tech Info #157

Wed, 6 Aug 2014

When copying large files to and from the server with Mac OS X (AFP) or Windows (SMB) via the Mac Finder or Windows Explorer, the copy may be significantly faster than the HELIOS LanTest UB64 read/write results. There are two reasons for this:

Mac Finder copy block size

The Mac automatically increases the Finder copy read/write block size up to several megabytes until there is no additional performance gain, then continues with the most stable setting. This is a dynamic process, and the Finder does a great job. HELIOS LanTest uses a static block size during the entire test, depending on the selected network speed, e.g. 128 kByte buffers for Gigabit networks. The HELIOS LanTest approach is more realistic because no application adjusts its block size dynamically, and there is no API that does so automatically. Another reason is that HELIOS LanTest is meant to measure the network performance and the server latency; using very large block sizes would hide the server latency. Network and server may be capable of offering additional performance when multiple threads read/write in parallel and very large blocks are used. However, this is not realistic, because it causes drawdowns on slower network connections and with limited network switch buffering, as well as limited socket I/O buffers on the server and client.
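The effect of block size on measured throughput can be illustrated with a simple model in which every request pays one round trip of fixed latency plus the transfer time of the block. The wire speed and latency figures below are illustrative assumptions for a Gigabit link, not HELIOS measurements:

```python
def modeled_throughput_mb_s(block_kb, wire_mb_s=117.0, latency_ms=0.3):
    """Naive model: each request costs one fixed-latency round trip
    plus the time to move the block at wire speed.
    wire_mb_s and latency_ms are illustrative assumptions."""
    block_mb = block_kb / 1024.0
    transfer_s = block_mb / wire_mb_s
    total_s = transfer_s + latency_ms / 1000.0
    return block_mb / total_s

for kb in (32, 128, 1024, 4096):
    print(kb, round(modeled_throughput_mb_s(kb), 1))
```

The model shows why very large blocks approach raw wire speed: the per-request latency is amortized away, so a measurement with such blocks no longer reflects the server latency that real applications experience.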

HELIOS LanTest comes much closer to real client/server environments and measures the true performance, including the latency.

Block sizes used in HELIOS LanTest:
Network                                        Block size   Test file size
Slow networks, e.g. Standard Ethernet/DSL      32 kBytes    3 MB
Fast networks, e.g. Fast Ethernet              64 kBytes    30 MB
Very fast networks, e.g. Gigabit Ethernet      128 kBytes   300 MB
Enterprise networks, e.g. 10 Gigabit Ethernet  1024 kBytes  3000 MB
Backbone networks, e.g. 40 Gigabit Ethernet    4096 kBytes  12000 MB

HELIOS believes that these block sizes are realistic and represent typical use cases.

Windows Explorer CopyFile behavior

Starting with Windows 7, customers experience very fast copy speeds to and from the HELIOS PCShare file server. Microsoft internally optimized the CopyFile API, which performs up to eight asynchronous reads or writes in parallel, each 32 kBytes in size. This feature is only available via the CopyFile API or by issuing asynchronous reads and writes in parallel, e.g. with multiple threads. The new CopyFile is an improvement for SMB servers when clients copy files using the Explorer or the CopyFile API. HELIOS LanTest measures the performance with sequential read/write operations using a specified block size, e.g. 128 kBytes for Gigabit Ethernet testing. This reflects much better how applications work, because it is very unlikely that applications use multiple asynchronous I/O requests in parallel. Depending on the network, client and server performance, a LanTest measurement on a 1 Gb network may reach around 60 MB/sec reading and writing, while the Windows file copy may reach around 100 MB/sec due to multiple asynchronous I/Os.

Jumbo Frames – Ethernet MTU (maximum transfer unit)

By default, lower-level TCP/IP packets are 1500 bytes in size, which is the standard for all servers and clients available today. With Gigabit Ethernet and newer, MTU sizes up to 9220 bytes are allowed. MTU sizes larger than 1500 depend heavily on the settings of the network NIC, network switch, VLAN, and layer 3 routing capabilities, as well as client and server OS settings. Larger MTU sizes can also result in failing TCP/IP connections and must be tested and enabled with care. Often a dual network interface configuration is suitable, e.g. 1 Gb Ethernet for standard services and Internet use, plus a 10 GbE interface with Jumbo Frames from the client directly to the server for dedicated performance needs, e.g. video editing, file syncing, etc.
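The payload efficiency gain from Jumbo Frames can be estimated with a back-of-the-envelope calculation, assuming IPv4 and TCP headers without options plus the fixed per-frame Ethernet cost (header, FCS, preamble, inter-frame gap):

```python
def tcp_payload_efficiency(mtu):
    """Fraction of wire bytes carrying TCP payload for a given MTU.
    Assumes IPv4 (20 B) + TCP (20 B) headers without options, and the
    fixed Ethernet per-frame cost: 14 B header + 4 B FCS + 8 B preamble
    + 12 B inter-frame gap."""
    wire_bytes = mtu + 14 + 4 + 8 + 12
    payload = mtu - 20 - 20
    return payload / wire_bytes

for mtu in (1500, 9000):
    print(mtu, round(tcp_payload_efficiency(mtu), 4))
```

Under these assumptions the efficiency rises from roughly 95% at MTU 1500 to roughly 99% at MTU 9000; the larger benefit of Jumbo Frames in practice is usually the reduced per-packet processing load on client and server.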

Server and network latency testing

To measure the latency between the LanTest client and the server, run the “Lock/Unlock” test. As record locking is based on many small 10-byte lock requests (forty rotating concurrent locks are used), it does not involve any disk I/O. The HELIOS EtherShare and PCShare server implementation uses shared memory for locking, so no blocking operating system API is involved. This means the locking test performance depends only on network speed, TCP/IP performance, server process scheduling, and finally the client OS with its network redirector. If latency is high, “Lock/Unlock” testing takes very long, and the source of the performance problem must be investigated. Using different network connections, e.g. a crossover cable between client and server, or different clients, or a different server, allows isolating the problem. The LanTest latency testing is ideal because it omits any disk I/O and allows end-to-end client/server testing.
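The principle of such a latency test, many tiny request/reply round trips with no disk I/O, can be sketched with a local echo server. This is a hypothetical illustration, not the actual HELIOS lock protocol:

```python
import socket
import threading
import time

PAYLOAD = b"0123456789"   # 10-byte requests, like the small lock records

def _echo(server_sock):
    """Accept one connection and echo everything back."""
    conn, _ = server_sock.accept()
    with conn:
        while data := conn.recv(64):
            conn.sendall(data)

def average_round_trip_ms(requests=1000, host="127.0.0.1"):
    """Time many tiny request/reply round trips. No disk I/O is involved,
    so the result reflects network and process scheduling latency only."""
    srv = socket.create_server((host, 0))
    threading.Thread(target=_echo, args=(srv,), daemon=True).start()
    with socket.create_connection(srv.getsockname()) as c:
        # Disable Nagle batching so each tiny request goes out immediately.
        c.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        start = time.monotonic()
        for _ in range(requests):
            c.sendall(PAYLOAD)
            c.recv(64)          # block until the reply arrives
        elapsed = time.monotonic() - start
    srv.close()
    return elapsed / requests * 1000.0
```

Because each request waits for its reply before sending the next, the average is dominated by the round-trip latency, not by bandwidth; running the client against a remote host instead of loopback exposes the network path under test.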

Summary

There is a race between Microsoft and Apple, with both vendors having enhanced their Finder/Explorer file copying over several releases. This works great against HELIOS servers and brings even greater performance with EtherShare (AFP) and PCShare (SMB1). (SMB2 is still in development at HELIOS and not ready for production.) HELIOS LanTest offers true performance results and reflects how 99% of applications access files on server volumes. The HELIOS LanTest testing procedures are identical on Mac and Windows clients. LanTest also turns off client-side caching to ensure that it represents end-to-end performance between the client and the server, including disk I/O, TCP and network NIC performance, as well as process task switching latency. HELIOS LanTest is ideal for file server performance measuring and reliability testing.

References

HELIOS LanTest: http://www.helios.de/web/EN/products/LanTest.html
AFP vs. SMB and NFS file sharing for network clients: http://www.helios.de/web/EN/news/AFP_vs_SMB-NFS.html
10 Gb Ethernet tuning in VMware ESX/Mac/Linux environments: http://www.helios.de/web/EN/support/TI/154.html
Tech Info 156: Optimizing RAID storage for high-performance server environments