Predictive performance in model selection is often estimated using out-of-sample validation and test datasets. This approach
assumes that the validation and test datasets are drawn from the same population as the training dataset. This assumption may not hold
in the common setting where the model is used to score future data. This paper proposes a sample design
that can lead to better model performance and robust estimates of model generalization error. The sample design is
demonstrated on a collection scoring application.