This paper considers code optimization for the novel TS1xx processor from Analog Devices. Very long instruction word (VLIW)
architectures, such as the TS1xx, represent the state of the art in high-performance signal processing. The theoretically achievable
peak performance of VLIW processors increases steadily with the use of on-chip parallelism. It is demonstrated that C compiler
technology cannot achieve peak computing rates on a statically scheduled processor, so the applications programmer must rely
on hand-optimized assembler libraries. This necessitates intimate knowledge of the specific compiler optimization techniques,
as well as the underlying hardware. Compiler-friendly code, optimized by the VisualC2.0 compiler, is compared against hand-optimized
assembler code for a common operation involving a loop with multiple memory accesses, floating-point arithmetic
and pointer operations. It is found that mature C code for matrix-vector multiplication executes in roughly 1.18 * n * m cycles, whereas the same operation optimized in assembler requires 0.5 * n(m + 16) cycles, a speedup that approaches a factor of 2.36 as m grows.
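
As an illustration of the operation being benchmarked, the following is a minimal sketch in portable C of a matrix-vector product together with the two cycle-count models quoted above. It is not the code measured in the paper: the function names (matvec, cycles_c, cycles_asm) and the row-major matrix layout are assumptions made for this example, and the cycle functions simply evaluate the stated formulas.

```c
#include <stddef.h>

/* Reference matrix-vector product y = A * x.
 * Assumption: A is n-by-m and stored row-major in a[]. */
void matvec(size_t n, size_t m, const float *a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i) {
        const float *row = a + i * m;   /* pointer to start of row i */
        float acc = 0.0f;
        for (size_t j = 0; j < m; ++j)
            acc += row[j] * x[j];       /* two loads, one multiply-add */
        y[i] = acc;
    }
}

/* Cycle model quoted for the compiled C code: 1.18 * n * m. */
double cycles_c(size_t n, size_t m)
{
    return 1.18 * (double)n * (double)m;
}

/* Cycle model quoted for the hand-optimized assembler: 0.5 * n * (m + 16). */
double cycles_asm(size_t n, size_t m)
{
    return 0.5 * (double)n * ((double)m + 16.0);
}
```

Under these models, a 64 x 64 matrix costs about 4833 cycles in C against 2560 cycles in assembler, a ratio of roughly 1.9. The fixed per-row overhead of 16 in the assembler formula means the two models break even near m = 12, so the advantage only appears for rows longer than that.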