1、Data Mining Classification: Basic Concepts, Decision Trees, and Model EvaluationLecture Notes for Chapter 4Introduction to Data MiningbyTan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2 Classification: Defi
2、nitionlGiven a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.lFind a model for class attribute as a function of the values of other attributes.lGoal: previously unseen records should be assigned a class as accurately as possible. A
3、 test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with training set used to build the model and test set used to validate it. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3 Illustrating Classification Task Tan,
4、Steinbach, Kumar Introduction to Data Mining 4/18/2004 4 Examples of Classification TasklPredicting tumor cells as benign or malignantlClassifying credit card transactions as legitimate or fraudulentlClassifying secondary structures of protein as alpha-helix, beta-sheet, or random coillCategorizing
5、news stories as finance, weather, entertainment, sports, etc Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5 Classification TechniqueslDecision Tree based MethodslRule-based MethodslMemory based reasoninglNeural NetworkslNave Bayes and Bayesian Belief NetworkslSupport Vector Machines Ta
6、n,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6 Example of a Decision TreecategoricalcategoricalcontinuousclassRefundMarStTaxIncYESNONONOYes NoMarried Single, Divorced80KSplitting AttributesTraining Data Model: Decision Tree Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7 Ano
7、ther Example of Decision Treecategoricalcategoricalcontinuousclass MarStRefundTaxIncYESNONONOYes NoMarried Single, Divorced80KThere could be more than one tree that fits the same data! Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8 Decision Tree Classification TaskDecision Tree Tan,Ste
8、inbach, Kumar Introduction to Data Mining 4/18/2004 9 Apply Model to Test DataRefundMarStTaxIncYESNONONOYes NoMarried Single, Divorced80KTest DataStart from the root of tree. Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 10 Apply Model to Test DataRefundMarStTaxIncYESNONONOYes NoMarried Single, Divorced80KTest Data