Dynamic binary instrumentation for performance analysis on large-scale architectures such as the IBM Blue Gene/L system (BG/L)
poses unique challenges. The unprecedented scale of these systems and their often limited OS support require new mechanisms to
organize binary instrumentation, to interact with the target application, and to collect the resulting data.
We describe the design and current status of a new implementation of the Dynamic Probe Class Library (DPCL) API for
large-scale systems. DPCL provides an easy-to-use layer for dynamic instrumentation of parallel MPI applications; it is built on
DynInst, a dynamic instrumentation library for sequential platforms. Our work includes modifying DynInst to control instrumentation from
remote I/O nodes and porting DPCL’s communication for performance data collection to use MRNet, a tree-based overlay network
(TBON) that supports scalable multicast and data reduction. We describe extensions to the DPCL API that support instrumentation
of task subsets and aggregation of collected performance data.
Keywords: massively parallel architectures, binary instrumentation, scalable data collection, performance analysis tools