In the second quarter of 2000, the Leibniz-Rechenzentrum in Munich started operating a 112-node Hitachi SR8000-F1 with a peak
performance of 1.3 Teraflops, making it the fastest computer in Europe. To make use of the full memory bandwidth and hence
to obtain a significant fraction of the peak performance for memory-intensive applications, the compilers offer preload and
prefetch optimization strategies to pipeline load/store operations, as well as automatic parallelization across the 8 processors
contained in every node. The nodes are connected by a conflict-free crossbar, enabling efficient communication via standard
message-passing interfaces. We give an overview of the innovative architectural concepts and examine to what extent
the compiler's ability to automatically pseudovectorize and parallelize typical application code suffices to produce
well-performing code.