Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages
using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and
performance are becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data
placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose
a new data organization model called PAX (Partition Attributes Across) that significantly improves cache performance by grouping
together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage
penalty and does not affect I/O behavior. According to our experimental results (which were obtained without using any indices
on the participating relations), when compared to NSM: (a) PAX exhibits superior cache and memory bandwidth utilization, saving
at least 75% of NSM's stall time due to data cache accesses; (b) range selection queries and updates on memory-resident relations
execute 17–25% faster; and (c) TPC-H queries involving I/O execute 11–48% faster. Finally, we show that PAX performs well
across different memory system designs.
Keywords: Relational data placement – Disk page layout – Cache-conscious database systems
Received: November 1, 2001 / Accepted: August 29, 2002 / Published online: November 22, 2002
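
The difference between the two in-page layouts described in the abstract can be illustrated with a small sketch. It is a toy model under stated assumptions, not the paper's implementation: the three-field schema, the function names (`nsm_page`, `pax_page`, `pax_record`), and the use of Python lists in place of byte-level pages are all hypothetical. It only shows that NSM interleaves the fields of each record, whereas PAX keeps the values of each attribute adjacent within the same page, so a scan of one attribute reads neighboring slots.

```python
from typing import Any, List, Tuple

# Hypothetical schema used only for illustration: (id, name, age).
Record = Tuple[int, str, int]


def nsm_page(records: List[Record]) -> List[Any]:
    """NSM-style layout: the fields of each record are stored back to back."""
    flat: List[Any] = []
    for rec in records:
        flat.extend(rec)
    return flat


def pax_page(records: List[Record]) -> List[List[Any]]:
    """PAX-style layout: one group per attribute, all inside the same page."""
    return [list(column) for column in zip(*records)]


def pax_record(groups: List[List[Any]], i: int) -> Tuple[Any, ...]:
    """Rebuild record i by picking the i-th value from every attribute group."""
    return tuple(group[i] for group in groups)


records: List[Record] = [(1, "Jane", 30), (2, "John", 45), (3, "Jim", 20)]
nsm = nsm_page(records)
pax = pax_page(records)

# Scanning the 'age' attribute:
ages_nsm = nsm[2::3]  # NSM: every third slot, skipping over id and name
ages_pax = pax[2]     # PAX: one contiguous group of values

# Both pages hold exactly the same values, only arranged differently.
total_nsm = len(nsm)
total_pax = sum(len(group) for group in pax)
```

In this toy model both pages store the same nine values, which mirrors the abstract's point that PAX changes only the arrangement inside a page. The strided access in `ages_nsm` stands in for the cache lines that an NSM scan fills with unneeded fields, while `ages_pax` reads adjacent values only.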