Load balance is one of the critical factors affecting the overall performance of BSP (Bulk Synchronous Parallel) programs.
Without sufficient performance profiling information generated by effective profiling tools, it is often difficult to determine
where, and to what extent, load imbalance has occurred in a BSP program. In this paper, we introduce a new parallel performance
profiling system for the BSP model. The system traces each process in each superstep and generates comprehensive timing and
communication information. Its aim is to assist in improving BSP program performance by identifying load
imbalance among processors. The profiling data is visualised via a series of performance profiling graphs, making it easier
to identify overloaded processes in a superstep. The visualising component of the system is written in Java, so it runs on
almost any type of computer system.
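The abstract does not say how load imbalance is quantified, so the following is only a minimal sketch of one common approach, not the system's actual method. It assumes a hypothetical trace format (a dict mapping each superstep to the per-process local computation times) and a hypothetical flagging threshold. Because every process in a BSP superstep waits at the barrier for the slowest one, the ratio of the maximum to the mean work time, together with each process's idle time before the barrier, indicates how unbalanced a superstep is and which process is overloaded.

```python
from statistics import mean


def superstep_imbalance(work_times, threshold=1.2):
    """Summarise load balance for each superstep.

    work_times: hypothetical trace format, a dict mapping a superstep index
        to a list of per-process local computation times in seconds.
    threshold: hypothetical max/mean ratio above which a superstep is flagged.
    """
    report = {}
    for step, per_proc in sorted(work_times.items()):
        slowest_time = max(per_proc)
        avg = mean(per_proc)
        # All processes wait at the barrier for the slowest one, so max/mean
        # measures how far this superstep is from perfect balance (1.0).
        ratio = slowest_time / avg if avg > 0 else 1.0
        # Time each process spends idle before the barrier synchronisation.
        idle = [slowest_time - t for t in per_proc]
        report[step] = {
            "ratio": ratio,
            "slowest": per_proc.index(slowest_time),  # overloaded process
            "idle": idle,
            "imbalanced": ratio > threshold,
        }
    return report


if __name__ == "__main__":
    # Hypothetical trace: superstep 1 has process 3 doing three times the work.
    trace = {0: [1.0, 1.0, 1.0, 1.0], 1: [1.0, 1.0, 1.0, 3.0]}
    for step, info in superstep_imbalance(trace).items():
        print(step, info)
```

A per-superstep summary like this maps naturally onto the kind of graphs the abstract describes: plotting `idle` per process for each superstep makes the overloaded process stand out as the one with zero idle time while the others wait.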