A Modified ID3 Algorithm for Continuous Numerical Attributes Using Cut Point Approach
Perry Alexander
Prasad Kulkarni
Data classification is a data mining technique that organizes data into relevant categories to extract meaningful information. A model is built from a training set and then used to assign test data to predetermined groups or classes. One of the most widely used models is the decision tree, which uses a tree-like structure to lay out the possible outcomes. Decision trees are an important predictive analysis method in data mining because they require minimal effort from users to interpret.
This project implements ID3, an algorithm that builds decision trees using the information gain metric. While illustrating the basic ideas of ID3, the project also addresses ID3's inefficiency in handling continuous numerical attributes. A cut point approach is presented that discretizes numeric attributes into intervals so that ID3 can be applied to them. Experiments show that the resulting decision trees contain fewer nodes and branches than trees obtained with the basic ID3 algorithm. The modified algorithm can be used to classify real-valued domains that contain both symbolic and numeric attributes and have multiple discrete outcomes.
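To make the cut point idea concrete, the following is a minimal Python sketch of how a single numeric attribute might be split using information gain. It is an illustration under stated assumptions, not the project's actual implementation: the function names (`entropy`, `gain_for_cut`, `best_cut_point`) are hypothetical, the split is assumed to be binary (value <= cut versus value > cut), and candidate cut points are assumed to be the midpoints between adjacent distinct sorted values, which is one common choice.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gain_for_cut(values, labels, cut):
    """Information gain of the binary split value <= cut vs. value > cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    total = len(labels)
    # Weighted entropy of the two intervals produced by the cut.
    remainder = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


def best_cut_point(values, labels):
    """Return (cut, gain) for the candidate cut with the highest gain.

    Assumption: candidates are midpoints between adjacent distinct
    sorted values. Returns (None, 0.0) if no split is possible.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    scored = ((c, gain_for_cut(values, labels, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy data: one numeric attribute and a two-class outcome.
temperatures = [60, 62, 70, 75, 80, 85]
outcomes = ["no", "no", "no", "yes", "yes", "yes"]
cut, gain = best_cut_point(temperatures, outcomes)
```

On this toy data the classes separate cleanly between 70 and 75, so the selected cut is the midpoint 72.5. Once a cut is chosen, the numeric attribute can be treated as a two-valued symbolic attribute, which is the form ID3 expects.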