Data Mining: Practical Machine Learning Tools and Techniques
Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall

Ensemble learning
- Combining multiple models
  - The basic idea
- Bagging
  - Bias-variance decomposition, bagging with costs
- Randomization
  - Rotation forests
- Boosting
  - AdaBoost, the power of boosting
- Additive regression
  - Numeric prediction, additive logistic regression
- Interpretable ensembles
  - Option trees, alternating decision trees, logistic model trees
- Stacking

Combining multiple models
- Basic idea: build different "experts" and let them vote (a minimal voting sketch follows this slide)
- Advantage: often improves predictive performance
- Disadvantage: usually produces output that is very hard to analyze
  - But: there are approaches that aim to produce a single comprehensible structure
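
A minimal sketch of the voting idea in Python, assuming a list of already trained scikit-learn-style models; the names `classifiers` and `majority_vote` are illustrative, not from the slides:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine several trained 'experts' by simple voting.

    Each classifier gets one equally weighted vote for the single
    instance x; the most frequently predicted class wins.
    """
    votes = [clf.predict([x])[0] for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```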

Bagging
- Combining predictions by voting/averaging
  - Simplest way
  - Each model receives equal weight
- "Idealized" version:
  - Sample several training sets of size n (instead of just having one training set of size n)
  - Build a classifier for each training set
  - Combine the classifiers' predictions
- Learning scheme is unstable → bagging almost always improves performance
  - Small change in the training data can make a big change in the model (e.g. decision trees)

Bias-variance decomposition
- Used to analyze how much the selection of any specific training set affects performance
- Assume infinitely many classifiers, built from different training sets of size n
- For any learning scheme:
  - Bias = expected error of the combined classifier on new data
  - Variance = expected error due to the particular training set used
- Total expected error ≈ bias + variance
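
For numeric prediction under squared loss (the setting in which the decomposition was first derived, as a later slide notes), the decomposition can be written out explicitly. This is a standard supplementary formula rather than one from the slides; here y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and \hat{f}_D denotes the model learned from training set D:

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\bigl(y - \hat{f}_D(x)\bigr)^2\right]
  = \underbrace{\bigl(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

Averaging many models built from different training sets shrinks the variance term while leaving the bias term essentially unchanged, which is why bagging helps unstable learners.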

More on bagging
- Bagging works because it reduces variance by voting/averaging
  - Note: in some pathological hypothetical situations the overall error might increase
  - Usually, the more classifiers the better
- Problem: we only have one dataset!
- Solution: generate new ones of size n by sampling from it with replacement (see the resampling sketch below)
- Can help a lot if the data is noisy
- Can also be applied to numeric prediction
  - Aside: the bias-variance decomposition was originally only known for numeric prediction
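
A small numpy sketch of the resampling step (variable names are illustrative). Sampling n instances with replacement leaves each original instance out with probability (1 − 1/n)^n ≈ e^{-1} ≈ 0.368, so each replicate contains roughly 63% of the distinct original instances plus duplicates, which is what makes the resampled training sets differ from one another:

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw one bootstrap replicate: n instances sampled with replacement."""
    n = len(X)
    idx = rng.choice(n, size=n, replace=True)
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X, y = np.arange(1000).reshape(-1, 1), np.zeros(1000)
Xb, yb = bootstrap_sample(X, y, rng)
# Fraction of distinct original instances present in the replicate (~0.632).
print(len(np.unique(Xb)) / len(X))
```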

Bagging classifiers

Model generation:
  Let n be the number of instances in the training data.
  For each of t iterations:
    Sample n instances from the training set (with replacement).
    Apply the learning algorithm to the sample.
    Store the resulting model.

Classification:
  For each of the t models:
    Predict the class of the instance using the model.
  Return the class that is predicted most often.
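
The pseudocode above translates almost line for line into a short implementation. The sketch below is an illustration only (not the book's Weka code); it assumes numpy arrays and scikit-learn's DecisionTreeClassifier as the unstable base learner, and t = 25 is an arbitrary choice:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, t=25, seed=0):
    """Model generation: build t classifiers, each on a bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(t):
        idx = rng.choice(n, size=n, replace=True)   # sample n instances with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, x):
    """Classification: return the class predicted most often by the t models."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```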

Bagging with costs
- Bagging unpruned decision trees is known to produce good probability estimates
  - Here, instead of voting, the individual classifiers' probability estimates are averaged
  - Note: this can also improve the success rate
- Can use this with the minimum-expected-cost approach for learning problems with costs (sketch below)
- Problem: not interpretable
  - MetaCost re-labels the training data using bagging with costs and then builds a single tree
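
A sketch of the minimum-expected-cost step on top of bagged probability estimates; the cost matrix, function name, and the assumption that every model exposes predict_proba are illustrative, not from the slides:

```python
import numpy as np

def min_expected_cost_predict(models, x, cost):
    """Average the bagged models' class probability estimates, then pick the
    class whose expected misclassification cost is lowest.

    cost[i, j] = cost of predicting class j when the true class is i.
    """
    probs = np.mean([m.predict_proba(x.reshape(1, -1))[0] for m in models], axis=0)
    expected = probs @ cost        # expected cost of predicting each class
    return int(np.argmin(expected))

# Example two-class cost matrix: missing class 1 is five times as costly.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])
```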

Randomization
- Can randomize the learning algorithm instead of the input
- Some algorithms already have a random component: e.g. initial weights in a neural net
- Most algorithms can be randomized, e.g. greedy algorithms:
  - Pick from the N best options at random instead of always picking the best option (see the sketch after this slide)
  - E.g.: attribute selection in decision trees
- More generally applicable than bagging: e.g. random subsets in the nearest-neighbor scheme
- Can be combined with bagging
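
A sketch of the "pick from the N best options at random" idea applied to attribute selection in a greedy tree learner; the scoring function and the choice N = 3 are illustrative assumptions:

```python
import random

def randomized_attribute_choice(attributes, score, n_best=3, rng=random):
    """Greedy step with randomization: rank candidate attributes by score,
    then pick uniformly at random among the n_best top-ranked ones instead
    of always taking the single best."""
    ranked = sorted(attributes, key=score, reverse=True)
    return rng.choice(ranked[:n_best])
```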

Rotation forests
- Bagging creates ensembles of accurate classifiers with relatively low diversity
  - Bootstrap sampling creates training sets with a distribution that resembles the original data
- Randomness in the learning algorithm increases diversity but sacrifices the accuracy of individual ensemble members
- Rotation forests aim to create ensemble members that are both accurate and diverse