Clusters of loosely connected machines are becoming an important model for commercial computing. Their favorable cost/performance ratio
makes these scale-out solutions an attractive platform for a class of computational needs. The work we describe in this paper
focuses on understanding performance when using a scale-out environment to run commercial workloads. We describe the novel
scale-out environment we configured and the workload we ran on it. We explain the unique performance challenges that such
an environment poses and the tools we applied and improved to address them. We present data from
the tools that proved useful in optimizing performance on our system. We discuss the lessons we learned in applying existing
tools to a commercial scale-out environment and modifying them for it, and offer insights into making future performance tools
effective in this environment.