The availability of low-cost, powerful parallel graphics cards has stimulated a trend to port genetic programming (GP) to Graphics Processing Units
(GPUs). Previous work on GPUs has shown speedups of the evaluation phase for large sets of training cases. Using the CUDA language
on the G80 GPU, we show that it is possible to efficiently interpret several GP programs in parallel, thus also obtaining speedups
for small training sets, starting at fewer than 100 training cases. Our scheme was embedded in the well-known ECJ library, providing
an easy entry point for owners of G80 GPUs.
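
As a rough illustration of what interpreting several GP programs over a set of training cases involves, the following minimal Python sketch evaluates a small population sequentially. It is not the CUDA implementation described here. The postfix program encoding, the primitive set `OPS`, the function names `interpret` and `evaluate_population`, and the suggested mapping of programs and training cases onto GPU thread blocks and threads are all assumptions made for the example.

```python
import operator

# Hypothetical primitive set: binary operators for a postfix (RPN) encoding.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def interpret(program, case):
    """Evaluate one postfix-encoded program on one training case."""
    stack = []
    for token in program:
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif isinstance(token, str):
            stack.append(case[token])    # variable lookup in the training case
        else:
            stack.append(float(token))   # numeric constant
    return stack.pop()


def evaluate_population(population, cases, targets):
    """Return the sum of absolute errors for each program.

    Assumption: on a GPU, the outer loop could map to thread blocks (one
    program each) and the inner loop to threads (one training case each).
    Here both loops simply run sequentially on the CPU.
    """
    fitness = []
    for program in population:
        error = 0.0
        for case, target in zip(cases, targets):
            error += abs(interpret(program, case) - target)
        fitness.append(error)
    return fitness


# Two toy programs evaluated against the target function x^2.
population = [
    ["x", "x", "*"],   # x * x
    ["x", 1, "+"],     # x + 1
]
cases = [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}]
targets = [0.0, 1.0, 4.0]
fitness = evaluate_population(population, cases, targets)
print(fitness)
```

Because every program in the population is evaluated independently, the work can be spread across programs as well as across training cases, which is why parallelism remains available even when the training set is small.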