NPACI Rocks Clusters: Tools for Easily Deploying and Maintaining Manageable High-Performance Linux Clusters
Philip M. Papadopoulos, Mason J. Katz and Greg Bruno
San Diego Supercomputer Center, La Jolla, CA, USA
Abstract
High-performance computing clusters based on Linux (commodity hardware with low-latency, high-bandwidth interconnects) are
rapidly becoming the dominant computing platform for a wide range of scientific disciplines. Yet straightforward software
installation, maintenance, and health monitoring for large-scale clusters have been a consistent and nagging problem for people
who are not cluster experts. The complexity of managing hardware heterogeneity, tracking security and bug fixes, ensuring consistency
of software across nodes, and orchestrating wholesale (or forklift) upgrades of Linux OS releases (every 6 months) often discourages
would-be cluster users.
The NPACI Rocks toolkit takes a fresh perspective on management and installation of clusters to dramatically simplify this
software tracking. The basic notion is that complete (re)installation of OS images on every node is an easy function and the
preferred mode of software management. The NPACI Rocks toolkit builds on this simple notion by leveraging existing single-node
installation software (Red Hat’s Kickstart), scalable services (e.g., NIS, HTTP), automation, and database-driven configuration
management (MySQL) to make clusters approachable and maintainable by non-experts. The benefits include straightforward methods
to derive user-defined distributions that facilitate testing and system development and methods to easily include the latest
upgrades and security enhancements for production environments. Installation performance scales well: a complete
reinstallation of a 96-node cluster (from a single 100 Mbit HTTP server) takes only 28 minutes, which is only 3 times
longer than reinstalling just a single node.
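The scaling figures quoted above can be made concrete with a little arithmetic. The following is a rough sketch, not a measurement from the paper: it treats "3 times longer" as an exact ratio, and the one-node-at-a-time serial baseline is a hypothetical comparison point introduced here purely for illustration.

```python
# Figures quoted in the abstract.
NODES = 96
CLUSTER_REINSTALL_MIN = 28.0     # full reinstall of all 96 nodes
SLOWDOWN_VS_SINGLE_NODE = 3.0    # "only 3 times longer" than one node (treated as exact)


def single_node_minutes(cluster_min: float, slowdown: float) -> float:
    """Implied time to reinstall one node on its own."""
    return cluster_min / slowdown


def serial_minutes(nodes: int, per_node_min: float) -> float:
    """Hypothetical baseline (assumption): reinstall nodes one after another."""
    return nodes * per_node_min


# Derived, approximate quantities.
per_node = single_node_minutes(CLUSTER_REINSTALL_MIN, SLOWDOWN_VS_SINGLE_NODE)
serial = serial_minutes(NODES, per_node)
speedup = serial / CLUSTER_REINSTALL_MIN

print(f"implied single-node reinstall: {per_node:.1f} min")
print(f"hypothetical serial reinstall: {serial:.0f} min")
print(f"speedup of concurrent reinstall over serial: {speedup:.0f}x")
```

Under these assumptions a single node takes a little over 9 minutes, and reinstalling all 96 nodes concurrently is roughly 32 times faster than doing them one at a time would be.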
The toolkit incorporates the latest Red Hat distribution (including security patches) with additional cluster-specific software.
Using the same software tools that are used to create the base distribution, users can customize and localize Rocks for
their site. This flexibility means that the software structure is dynamic enough to meet the needs of cluster-software developers,
yet simple enough to allow non-experts to effectively manage clusters. Rocks is a solid infrastructure and is extensible so
that the community can adapt the software toolset to incorporate the latest functionality that defines a modern computing
cluster. Strong adherence to widely used (de facto) tools allows Rocks to move with the rapid pace of Linux development.