Stream parallelism lets parallel programs execute different parts of a computation concurrently on distinct
input data items. It can also exploit the concurrent evaluation of the same function on different input items.
These techniques are usually called “pipelining” and “farming out”, respectively. The P3L language includes two stream parallel skeletons: the Pipe and the Farm constructors. The paper presents a methodology for
efficient implementation of the P3L Pipe and Farm on a BSP computer. The methodology provides a set of analytical models that predict the constructors' performance
using the BSP cost model. From these models, a set of optimisation rules is derived to decide the optimal degree of parallelism and the optimal
size (grain) of input tasks. A prototype has been validated on a cluster of PCs and on a Cray T3D computer.
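
As an illustration only, the two stream parallel forms and the cost of a single BSP superstep can be sketched in Python. This is not P3L code and not the paper's analytical models. The names `pipe`, `farm`, `superstep_cost` and `degree` are hypothetical and chosen for this sketch. `pipe` shows only the functional semantics of a pipeline (each item passes through every stage in order) and runs sequentially, while `farm` applies one worker function to different items concurrently with a thread pool. `superstep_cost` uses the standard BSP formulation w + h*g + l, where w is the maximum local work, h the maximum number of words sent or received by any processor, g the communication cost per word, and l the barrier synchronisation cost.

```python
from concurrent.futures import ThreadPoolExecutor


def pipe(stages, stream):
    """Pipeline semantics: every item flows through stage 1, then stage 2, ...

    Sequential, lazy sketch of the functional behaviour only; a real
    pipeline would run the stages concurrently on distinct items.
    """
    for stage in stages:
        stream = map(stage, stream)
    return stream


def farm(worker, stream, degree):
    """Farm semantics: the same worker is applied to different items.

    `degree` (hypothetical name) is the number of concurrent workers,
    i.e. the degree of parallelism. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return list(pool.map(worker, stream))


def superstep_cost(w, h, g, l):
    """Standard BSP cost of one superstep: w + h*g + l."""
    return w + h * g + l


if __name__ == "__main__":
    print(list(pipe([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])))
    print(farm(lambda x: x * x, range(5), degree=3))
    print(superstep_cost(w=100, h=10, g=2, l=50))
```

In this sketch the `degree` argument of `farm` plays the role of the degree of parallelism that the optimisation rules are meant to choose. The task size (grain) would correspond to how many stream items are grouped into one unit of work, which the sketch does not model.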