
Khiops: A Discretization Method of Continuous Attributes with Guaranteed Resistance to Noise

Marc Boullé

France Telecom R&D, 2, Avenue Pierre Marzin, 22300 Lannion, France
Abstract
In supervised machine learning, some algorithms are restricted to discrete data and need continuous attributes to be discretized. The Khiops discretization method, based on chi-square statistics, optimizes the chi-square criterion globally over the whole discretization domain. In this paper, we propose a major evolution of the Khiops algorithm, which provides guarantees against overfitting and thus significantly improves the robustness of the discretizations. This enhancement is based on a statistical model of the Khiops algorithm, derived from a study of the variations of the chi-square value during the discretization process. This model, checked experimentally, makes it possible to modify the algorithm and to bring true control of overfitting. Extensive experiments demonstrate the validity of the approach and show that the Khiops method builds high-quality discretizations, both in terms of accuracy and in terms of a small number of intervals.
French patents No 01 07006 and No 02 16733
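
To make the chi-square criterion mentioned in the abstract concrete, the following minimal sketch computes the chi-square statistic of the interval-by-class contingency table induced by a set of cut points. It is an illustration only: the function names (contingency_table, chi_square), the toy data, and the half-open interval convention are assumptions made here, and the sketch does not reproduce the Khiops merge rule, its stopping criterion, or the overfitting control that the paper describes.

```python
from bisect import bisect_right


def contingency_table(values, labels, cut_points):
    """Count class labels per interval (hypothetical helper, not from the paper).

    cut_points must be sorted; a value equal to a cut point falls in the
    interval to the right of it (an assumed convention).
    """
    classes = sorted(set(labels))
    column = {c: j for j, c in enumerate(classes)}
    table = [[0] * len(classes) for _ in range(len(cut_points) + 1)]
    for value, label in zip(values, labels):
        row = bisect_right(cut_points, value)  # index of the interval
        table[row][column[label]] += 1
    return table


def chi_square(table):
    """Pearson chi-square statistic of an interval-by-class table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty rows or columns
                stat += (observed - expected) ** 2 / expected
    return stat


# Toy data: class "a" below 4.5, class "b" above.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

fine = chi_square(contingency_table(values, labels, [2.5, 4.5, 6.5]))
coarse = chi_square(contingency_table(values, labels, [4.5]))
single = chi_square(contingency_table(values, labels, []))
print(fine, coarse, single)  # 8.0 8.0 0.0
```

On this toy data, merging neighbouring intervals that have the same class distribution leaves the chi-square value unchanged (8.0 with four intervals or with two), while merging everything into one interval drops it to 0. A chi-square-based method tracks changes of this kind when deciding which intervals to merge.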

Contact Information Marc Boullé
Email: marc.boulle@francetelecom.com