Over the past few weeks, we worked with the Gaudet and Jeruzalmi labs at Harvard University to develop an SBGrid-compatible setup built around their newly acquired 4-node (8-CPU) XServe G5 cluster. The solution we provided these affiliated labs reflects the level of support and commitment our affiliates have come to expect as members of SBGrid.
Initial Setup
When the Gaudet and Jeruzalmi labs contracted us for administrative services in July, their computer systems formed an ad-hoc network in which individual Red Hat Linux computers hosted their own local user accounts. There was no data redundancy or effective file sharing. Our first task was to centralize the accounts and home directories by creating a master NIS/NFS server on one of their dual-Xeon machines. We then began pushing our software distribution to this head node, which was cross-mounted on the other clients in their subnet. While this solution was effective, the labs were interested in higher-performance computing.
XServe Integration
We were excited to integrate the labs' newly acquired cluster of 2.3 GHz XServe G5s into their network; Apple has done an excellent job of creating high-availability, high-performance hardware to back up its robust server OS. We also recognized that with OS X on the head node, many administrative tasks (such as user administration and file-sharing privileges) could be performed by the hosting labs themselves. As a result, the labs don't need to employ a highly trained admin to handle routine tasks, and we are free to work on larger projects for our affiliates.
We configured the head node (running Tiger) to use LDAP for user authentication on their Linux and Mac clients. Home directories and our software distribution are now shared via NFS from a RAID-5 volume. GridMP and PBSPro handle job scheduling. The XServe also operates as a gigabit NAT router, which provides greater performance and protection for the networked clients. With this setup, the clients need only minimal configuration, and there is greater transparency across all nodes in the network.
Caveats
Immediately after implementing this solution, it became evident that NFS support under OS X wasn't fully compatible with the Linux clients. Many legacy X11 applications failed to list directory contents properly on NFS mounts. However, the OS reported no errors, and the built-in applications we tested (such as Firefox and GIMP) worked fine with Apple's brand of NFS. With the help of our Apple representatives, we ultimately narrowed the problem down to OS X's implementation of NFS version 3, which expects clients to use a 64-bit-aware version of the readdir() function call. By temporarily mounting the NFS shares using version 2 on the Linux clients, we provided a functional workaround until the affected legacy applications can be recompiled with the "-D_FILE_OFFSET_BITS=64" flag.
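To illustrate what the recompilation changes, here is a minimal sketch in C of a directory listing built with 64-bit file offsets. It is not taken from any of the affected applications; the function name count_entries is hypothetical, and the sketch assumes a Linux client with glibc, where defining _FILE_OFFSET_BITS as 64 before the first system header has the same effect as passing "-D_FILE_OFFSET_BITS=64" to the compiler.

```c
/* Must come before any system header. With glibc on 32-bit Linux this
   makes off_t 64 bits wide and selects the 64-bit-aware readdir().
   Equivalent to compiling with -D_FILE_OFFSET_BITS=64. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: count the entries readdir() returns for `path`
   (including "." and ".." where the filesystem reports them).
   Returns -1 with errno set if the directory cannot be opened or read. */
static long count_entries(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;

    /* readdir() returns NULL both at end-of-directory and on error,
       so errno is cleared before each call to tell the two apart. */
    errno = 0;
    while ((entry = readdir(dir)) != NULL) {
        count++;
        errno = 0;
    }

    int saved_errno = errno;   /* closedir() may overwrite errno */
    closedir(dir);

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

Calling count_entries() on an NFS-mounted directory from a small test program, once built with the define and once without it, is one way to check whether a particular client is affected: a read error or a missing listing only in the build without the define would point to the readdir() issue described above.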
Conclusion
In the course of a few months, we provided the Gaudet and Jeruzalmi labs with an effective alternative to their ad-hoc network environment. With a G5 cluster serving as their NFS/LDAP server and gateway, the labs can concentrate on using the scientific tools we provide without needing much system administration expertise. We identified and isolated a specific bug in Apple's implementation of NFS and employed a workaround that is easily reversed when no longer needed.
SBGrid - News and Events
Case Study: SBGrid XServe G5 Integration
Updated: March 27, 2014
Originally Published: August 9, 2005