The Itanium processor, an implementation of the Explicitly Parallel Instruction Computing (EPIC) architecture, is an in-order
processor: it fetches instructions, executes them, and forwards results to functional units in program order. The architecture relies heavily on the
compiler to expose instruction-level parallelism (ILP) and thereby avoid the stalls that in-order processing creates.
The goal of this paper is to examine, in small steps, how the in-order Itanium processor model can be changed to allow
out-of-order execution, with the aim of hiding memory and functional-unit latencies. To accomplish this, we consider
an architecture with Pending Functional Units (PFU). The PFU architecture assigns instructions to functional units in order.
An instruction waits at its pending functional unit until its operands become ready and then executes, possibly out of order
with respect to other instructions. While an instruction is pending at a functional
unit, no other instruction can be scheduled to that unit. We examine several PFU architecture designs. The minimal
design does not perform register renaming and supports bypassing of non-speculative result values only. We then make the PFU design
more aggressive, first by supporting speculative register state and then by adding register renaming. We show that the
minimal PFU architecture provides an average speedup of 18% over an in-order EPIC processor and achieves up to half of the
speedup that a full out-of-order architecture would provide.
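
The pending-functional-unit idea described above can be illustrated with a small cycle-level sketch. This is a toy model under stated assumptions, not the simulator or design evaluated in the paper: the names `Instr`, `simulate`, and `program` are hypothetical, at most one instruction is assigned per cycle, the latencies are invented, and every register is written at most once so that the hazards a design without renaming must handle do not arise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str        # kind of functional unit required, e.g. "mem" or "alu"
    dest: str        # destination register
    srcs: tuple      # source registers
    latency: int     # execution latency in cycles (assumed >= 1)


def simulate(program, units, pending=True, max_cycles=10_000):
    """Return {instruction name: cycle at which it starts executing}.

    pending=True  : PFU-style; assign in order to a free unit, wait there for operands.
    pending=False : in-order baseline; assignment stalls until operands are ready.
    """
    produced = {i.dest for i in program}   # registers written by the program
    ready_at = {}                          # register -> cycle its value is available
    slots = [None] * len(units)            # per unit: [instr, finish_cycle or None]
    started = {}
    next_pc = 0

    def operands_ready(instr, now):
        # Registers never written by the program are treated as always ready.
        return all(r not in produced or ready_at.get(r, now + 1) <= now
                   for r in instr.srcs)

    for cycle in range(max_cycles):
        # 1. Release units whose instruction has finished executing.
        for u, slot in enumerate(slots):
            if slot and slot[1] is not None and slot[1] <= cycle:
                slots[u] = None

        # 2. In-order assignment: at most one instruction per cycle (assumption).
        if next_pc < len(program):
            instr = program[next_pc]
            free = [u for u, kind in enumerate(units)
                    if kind == instr.unit and slots[u] is None]
            if free and (pending or operands_ready(instr, cycle)):
                slots[free[0]] = [instr, None]
                next_pc += 1

        # 3. Any held instruction whose operands are ready starts executing.
        for slot in slots:
            if slot and slot[1] is None and operands_ready(slot[0], cycle):
                instr = slot[0]
                started[instr.name] = cycle
                slot[1] = cycle + instr.latency
                ready_at[instr.dest] = slot[1]

        if len(started) == len(program):
            return started
    raise RuntimeError("simulation did not finish within max_cycles")


# Hypothetical three-instruction program: a long-latency load, an add that
# depends on it, and an add that is independent of both.
program = [
    Instr("load", "mem", "r1", ("r0",), 10),
    Instr("add_dep", "alu", "r2", ("r1", "r0"), 1),
    Instr("add_indep", "alu", "r3", ("r0", "r0"), 1),
]
```

Under these assumptions, calling `simulate(program, ["mem", "alu", "alu"])` starts the independent add at cycle 2 while the dependent add waits at its unit until cycle 10. With `pending=False` the two adds start at cycles 10 and 11. With a single ALU (`["mem", "alu"]`), the pending dependent add occupies the unit and the independent add cannot start until cycle 11, which illustrates the restriction that no other instruction can be scheduled to a unit while an instruction is pending there.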