Media processing has driven significant changes in the focus and design of processors. Adding μSIMD multimedia extensions such as MMX is a cost-effective way to improve the performance of program regions with large amounts of data-level parallelism (DLP). This paper provides an initial evaluation of μSIMD-enhanced and vector-SIMD-enhanced VLIW architectures. We show that these two architectures execute, on average, 40% and 57% fewer operations, respectively, than the reference VLIW architecture. However, once most of the available DLP has been exploited through multimedia extensions or wide-issue static scheduling, the remainder of the program exhibits only modest instruction-level parallelism (ILP): 1.40 operations per cycle on an 8-issue architecture. We claim that, in general, vector-SIMD extensions achieve the highest speed-ups while also reducing fetch pressure, although for wide-issue configurations μSIMD architectures reach similar performance at a lower cost.
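The operation savings of μSIMD extensions come from sub-word parallelism: several narrow media values are packed into one wide register and processed by a single instruction. The sketch below is not taken from the paper and does not model MMX itself. It emulates the idea in portable Python with the SWAR (SIMD within a register) technique. Eight unsigned 8-bit values are packed into one 64-bit word and added lane-wise with wraparound and no carries between lanes, which is the behavior of the MMX `PADDB` instruction. The helper names `pack`, `unpack`, `packed_add_u8`, and `scalar_add_u8` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only (hypothetical helpers, not the paper's code):
# software emulation of a packed 8-bit add, similar in effect to MMX PADDB.

MASK_LOW7 = 0x7F7F7F7F7F7F7F7F  # low 7 bits of each byte lane
MASK_HIGH = 0x8080808080808080  # top bit of each byte lane


def pack(lanes):
    """Pack eight 8-bit values into one 64-bit word (lane 0 = lowest byte)."""
    assert len(lanes) == 8
    word = 0
    for i, v in enumerate(lanes):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def packed_add_u8(a, b):
    """Add all eight byte lanes at once, wrapping modulo 256 per lane."""
    # Adding only the low 7 bits of each lane can never carry into the next lane.
    low = (a & MASK_LOW7) + (b & MASK_LOW7)
    # XOR folds in each lane's top bit (sum of the two top bits plus the carry-in, mod 2).
    return low ^ ((a ^ b) & MASK_HIGH)


def scalar_add_u8(xs, ys):
    """Reference version: one add per element, as on a plain scalar datapath."""
    return [(x + y) & 0xFF for x, y in zip(xs, ys)]


xs = [10, 20, 30, 40, 250, 128, 0, 255]
ys = [1, 2, 3, 4, 10, 128, 0, 1]
print(unpack(packed_add_u8(pack(xs), pack(ys))))  # [11, 22, 33, 44, 4, 0, 0, 0]
print(scalar_add_u8(xs, ys))                      # [11, 22, 33, 44, 4, 0, 0, 0]
```

The scalar version issues one addition per element. In hardware, a μSIMD instruction performs all eight lane additions as a single operation. The software emulation above still needs a few bitwise steps, but it has no per-element loop.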