Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

Overfitting Avoidance as Bias

Cullen Schaffer1

(1) Department of Computer Science, CUNY/Hunter College, 695 Park Avenue, New York, NY, 10021

Abstract  Strategies for increasing predictive accuracy through selective pruning have been widely adopted by researchers in decision tree induction. It is easy to get the impression from research reports that there are statistical reasons for believing that these overfitting avoidance strategies do increase accuracy and that, as a research community, we are making progress toward developing powerful, general methods for guarding against overfitting in inducing decision trees. In fact, any overfitting avoidance strategy amounts to a form of bias and, as such, may degrade performance instead of improving it. If pruning methods have often proven successful in empirical tests, this is due, not to the methods, but to the choice of test problems. As examples in this article illustrate, overfitting avoidance strategies are not better or worse, but only more or less appropriate to specific application domains. We are not—and cannot be—making progress toward methods both powerful and general.

Overfitting avoidance - decision tree pruning - inductive bias


References

Barren, A.R., amp; Cover, T.M. (1991). Minimum complexity density estimation.IEEE Transactions on Information Theory, 37, 1034–1054.
 
Blumer, A., Ehrenfeucht, A., Haussler, D., amp; Warmuth, M.K. (1987). Occam'srazor. Information Processing Letters, 24, 377–380.
 
Breiman, L., Friedman, J., Olshen, R., amp; Stone, C. (1984). Classification andRegression Trees. Pacific Grove,CA: Wadsworth amp; Brooks.
 
Buntine, W. (1990). A theory of learning classification rules. Doctoraldissertation, University of Technology, Sydney.
 
Cestnik, B., amp; Bratko, I. (1991). On estimating probabilities in tree pruning. InY., Kodratoff (Ed.), Machine learning, EWSL-91. Berlin: Springer-Verlag.
 
Conover, W.J. (1980). Practical nonparametric statistics. New York: JohnWiley.
 
Holte, R.C. (1991). Very simple classification rules perform well on most datasets(Technical Report TR-91-16).Ottawa, Canada: University of Ottawa, Department of Computer Science.
 
Jensen, D. (1991). Induction with randomization testing: Decision-orientedanalysis of large data sets. Doctoraldissertation, Washington University, Sever Institute of Technology.
 
Mingers, J. (1987). Expert systems-rule induction with statistical data. Journal ofthe Operational Research Society,38, 39–47.
ChemPortPubMed
 
Mingers, J. (1989). An empirical comparison of pruning methods for decision treeinduction. Machine Learning,4, 227–243.
 
Niblett, T., amp; Bratko, I. (1986). Learning decision rules in noisy domains. InM.A. Bramer (Ed)., Research and development in expert systems III,Cambridge: Cambridge University Press.
 
Pearl, J. (1978). On the connection between the complexity and credibility of inferredmodels. International Journal of General Systems, 4, 255–264.
 
Quinlan, J.R. (1986). The effect of noise on concept learning. In R.S. Michalski, J.G. Carbonell, amp; T.M. Mitchell(Eds.), Machine learning: An artificial intelligence approach (Vol. 2).San Mateo, CA: Morgan Kaufmann.
 
Quinlan, J.R. (1987). Simplifying decision trees. International Journal ofMan-Machine Studies, 27, 221–234.
CrossRef
 
Quinlan, J.R., amp; Rivest, R.L. (1989). Inferring decision trees using the minimumdescription length principle.Information and Computation, 80, 227–248.
 
Schaffer, C. (1992a). Deconstructing the digit recognition problem. MachineLearning: Proceedings of the Ninth International Conference (pp. 394–399).San Mateo, CA: Morgan Kaufmann.
 
Schaffer, C. (1992b). Sparse data and the effect of overfitting avoidance in decision treeinduction. Proceedings of the Tenth National Conference on ArtificialIntelligence (pp. 147–152). Cambridge, MA: MIT Press.
 
Schaffer, C. (1991). When does overfitting decrease prediction accuracy in induceddecision trees and rule sets?In Y. Kodratoff (Ed.), Machine learning, EWSL-91. Berlin: Springer-Verlag.
 
Valiant, L.G. (1984). A theory of the learnable. Communications of the ACM,27, 1134–1142.
 
Weiss, S., amp; Indurkhya, N. (1991). Reduced complexity rule induction. InProceedings of the 12th International Joint Conference on Artificial Intelligence(pp. 678–684). San Mateo, CA: Morgan Kaufmann.
 


Export this article
Export this article as RIS | Text
 
Remote Address: 38.107.191.110 • Server: mpweb23
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)