The distributed shared memory (DSM) architecture simplifies the development of parallel programs by relieving the programmer of the
tedious task of distributing data across processors. Furthermore, it allows incremental parallelization using, for example,
OpenMP or Java threads. While it is easy to demonstrate good performance on a few processors, achieving good scalability still
requires a thorough understanding of data flow in the application. In this paper we discuss ADAPT, an Automatic Data Alignment
and Placement Tool that detects data congestion in array-oriented FORTRAN codes and suggests code transformations to resolve
it. We then show how ADAPT-suggested transformations, including data blocking, data placement, data transposition, and page
size control, improve the performance of the NAS Parallel Benchmarks.