Introduction to Data Mining (English edition): chap9-advanced-cluster-analysis.ppt

Uploaded by: 99****p  Document ID: 1420384  Upload date: 2019-02-25  Format: PPT  Pages: 37  Size: 1.10 MB

Data Mining
Cluster Analysis: Advanced Concepts and Algorithms
Lecture Notes for Chapter 9
Introduction to Data Mining
by Tan, Steinbach, Kumar

Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

Hierarchical Clustering: Revisited
- Creates nested clusters
- Agglomerative clustering algorithms vary in terms of how the proximity of two clusters is computed
  - MIN (single link): susceptible to noise/outliers
  - MAX/GROUP AVERAGE: may not work well with non-globular clusters
  - CURE algorithm tries to handle both problems
- Often starts with a proximity matrix
  - A type of graph-based algorithm

CURE: Another Hierarchical Approach
- Uses a number of points to represent a cluster
- Representative points are found by selecting a constant number of points from a cluster and then “shrinking” them toward the center of the cluster
- Cluster similarity is the similarity of the closest pair of representative points from different clusters

CURE
- Shrinking representative points toward the center helps avoid problems with noise and outliers
- CURE is better able to handle clusters of arbitrary shapes and sizes

Experimental Results: CURE
[Figure: picture from CURE, Guha, Rastogi, Shim.]

Experimental Results: CURE
[Figure: picture from CURE, Guha, Rastogi, Shim. Panels: (centroid), (single link)]

CURE Cannot Handle Differing Densities
[Figure panels: Original Points; CURE]

Graph-Based Clustering
- Graph-based clustering uses the proximity graph
  - Start with the proximity matrix
  - Consider each point as a node in a graph
  - Each edge between two nodes has a weight, which is the proximity between the two points
  - Initially the proximity graph is fully connected
  - MIN (single-link) and MAX (complete-link) can be viewed as starting with this graph
- In the simplest case, clusters are connected components in the graph.

Graph-Based Clustering: Sparsification
- The amount of data that needs to be processed is drastically reduced
  - Sparsification can eliminate more than 99% of the entries in a proximity matrix
  - The amount of time required to cluster the data is drastically reduced
  - The size of the problems that can be handled is increased

Graph-Based Clustering: Sparsification
- Clustering may work better
  - Sparsification techniques keep the connections to the most similar (nearest) neighbors of a point while breaking the connections to less similar points.
  - The nearest neighbors of a point tend to belong to the same class as the point itself.
  - This reduces the impact of noise and outliers and sharpens the distinction between clusters.
- Sparsification facilitates the use of graph partitioning algorithms (or algorithms based on graph partitioning algorithms)
  - Chameleon and Hypergraph-based Clustering
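
The sparsification idea above (keep only the links to each point's most similar neighbors, then read clusters off the sparse graph as connected components) can be illustrated with a minimal sketch. It assumes a k-nearest-neighbor rule as the sparsification technique and a similarity of 1 / (1 + distance); the function names `knn_sparsify` and `connected_components`, the parameter `k`, and the toy data are illustrative and do not come from the slides.

```python
import numpy as np

def knn_sparsify(similarity, k):
    """Keep, for each point, only the links to its k most similar other points."""
    n = similarity.shape[0]
    sim = similarity.astype(float)           # astype returns a copy
    np.fill_diagonal(sim, -np.inf)           # a point is not its own neighbor
    nearest = np.argsort(-sim, axis=1)[:, :k]  # k largest similarities per row
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), k), nearest.ravel()] = True
    keep = keep | keep.T                     # keep an edge if either endpoint chose it
    return np.where(keep, similarity, 0.0)

def connected_components(adjacency):
    """Label the connected components of an undirected graph (depth-first search)."""
    n = adjacency.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.nonzero(adjacency[node])[0]:
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels

# Made-up data: two well-separated groups of 1-D points.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
distance = np.abs(points[:, None] - points[None, :])
similarity = 1.0 / (1.0 + distance)          # assumed similarity measure

sparse = knn_sparsify(similarity, k=2)
labels = connected_components(sparse > 0)
print(labels)                                # [0, 0, 0, 1, 1, 1]
```

In this toy case the fully connected proximity graph has 36 nonzero entries and the sparsified one has 12, and the two groups fall out as separate connected components. Real data usually needs a graph partitioning step rather than plain connected components, as the slides note.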

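The CURE representative-point step described earlier (select a constant number of points from a cluster, shrink them toward the center, and compare clusters by their closest pair of representatives) can also be sketched in a few lines. This is a rough illustration under stated assumptions, not the authors' implementation: the farthest-point selection rule, the names `representative_points` and `cluster_distance`, and the parameters `num_rep` and `alpha` are choices made here, and Euclidean distance stands in for the similarity the slides mention.

```python
import numpy as np

def representative_points(cluster, num_rep=4, alpha=0.3):
    """Pick up to num_rep scattered points, then shrink them toward the centroid."""
    centroid = cluster.mean(axis=0)
    # Assumed selection rule: start with the point farthest from the centroid,
    # then repeatedly add the point farthest from all points chosen so far.
    chosen = [int(np.argmax(np.linalg.norm(cluster - centroid, axis=1)))]
    min_dist = np.linalg.norm(cluster - cluster[chosen[0]], axis=1)
    while len(chosen) < min(num_rep, len(cluster)):
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(cluster - cluster[nxt], axis=1)
        )
    scattered = cluster[chosen]
    # Shrink each representative a fraction alpha of the way to the centroid.
    return scattered + alpha * (centroid - scattered)

def cluster_distance(reps_a, reps_b):
    """Distance between the closest pair of representative points."""
    diffs = reps_a[:, None, :] - reps_b[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).min())

# Made-up data: two unit-spaced squares, 10 units apart along x.
a = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]])
b = a + np.array([10.0, 0.0])

reps_a = representative_points(a, num_rep=4, alpha=0.5)
reps_b = representative_points(b, num_rep=4, alpha=0.5)
print(cluster_distance(reps_a, reps_b))      # 9.0
```

Without shrinking, the closest corners of the two squares are 8 units apart; shrinking with `alpha=0.5` pulls the representatives inward, so the distance between the closest pair grows to 9. This is the effect that makes the representatives less sensitive to outlying points at a cluster's edge.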