The Complex Streamed Instruction (CSI) set is an architectural paradigm designed to accelerate multimedia applications. These
applications are characterized by streaming operations on small-width data elements such as 8-bit pixels or 16-bit audio samples.
CSI instructions operate on two-dimensional data streams in a SIMD fashion and can process streams of arbitrary length.
In this paper, we evaluate the performance of the CSI architecture on a set of important image processing kernels. These kernels
exhibit little data reuse, which results in poor cache performance. Simulation results show that CSI achieves
a speedup of up to 3.98 (2.60 on average) over VIS, Sun’s media ISA extension. We also analyze the scalability
of VIS and CSI with respect to memory bandwidth. The results show that CSI scales much better than VIS with increasing bandwidth.
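
To make the notion of a two-dimensional stream of arbitrary length concrete, the following is a minimal scalar sketch in C of what a single streamed operation on 8-bit pixels could conceptually compute. It is an illustration only and does not reproduce actual CSI instruction semantics or encodings: the `stream2d` descriptor, its fields, and the function `stream_sat_add_u8` are hypothetical names introduced here, and the choice of a saturating add is likewise an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor of a two-dimensional stream of 8-bit elements
 * (an assumption for illustration, not the CSI stream format). */
typedef struct {
    uint8_t *base;    /* address of the first element */
    size_t   row_len; /* elements per row */
    size_t   rows;    /* number of rows */
    size_t   stride;  /* elements between the starts of consecutive rows */
} stream2d;

/* Saturating unsigned 8-bit addition, a common pixel operation. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Scalar emulation of one streamed operation: dst = sat(a + b) over the
 * whole stream. The extent is taken from dst; a and b are assumed to
 * cover at least the same rows and row length. Hardware with a SIMD
 * datapath would process several elements per step; here each element
 * is handled one at a time. */
void stream_sat_add_u8(stream2d dst, stream2d a, stream2d b)
{
    for (size_t r = 0; r < dst.rows; r++) {
        for (size_t c = 0; c < dst.row_len; c++) {
            dst.base[r * dst.stride + c] =
                sat_add_u8(a.base[r * a.stride + c],
                           b.base[r * b.stride + c]);
        }
    }
}
```

In this sketch the stream length is a run-time property of the descriptor rather than a fixed register width, so one call covers an entire image block regardless of its size, whereas a fixed-width SIMD extension would need an explicit loop over register-sized chunks.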