Pattern Recognition & Artificial Intelligence
Lecture 7: Clustering Algorithms (Part 3, continued)

Hierarchical Clustering Realization (code implementation of hierarchy-based clustering algorithms)
- BIRCH: balanced iterative reducing and clustering using hierarchies
- Chameleon: hierarchical clustering using dynamic modeling
- ROCK: hierarchical clustering for categorical attributes
- CURE: an intermediate strategy between centroid-based and representative-object-based methods

BIRCH
BIRCH: Balanced Iterative Reducing and Clustering Using Hierarchies
Agglomerative clustering designed for clustering a large amount of numerical data.

What does the BIRCH algorithm try to solve?
- Most of the existing algorithms DO NOT consider the case that datasets can be too large to fit in main memory.
- They DO NOT concentrate on minimizing the number of scans of the dataset.
- I/O costs are very high.
The complexity of BIRCH is O(n), where n is the number of objects to be clustered.

BIRCH: The Idea by Example
[Figure: data objects 1 to 6 on the left; the clustering process (building a tree) on the right.]

Object 1: it starts Cluster 1, which is held in a leaf node.

Object 2: if cluster 1 becomes too large (not compact) by adding object 2, then split the cluster. The result is a leaf node with two entries: entry 1 (Cluster 1) and entry 2 (Cluster 2).

Object 3: entry 1 is the closest to object 3. If cluster 1 becomes too large by adding object 3, then split the cluster. The result is a leaf node with three entries: entry 1, entry 2, and entry 3, covering Cluster 1, Cluster 2, and Cluster 3.

Object 4: entry 3 is the closest to object 4. Cluster 2 remains compact when adding object 4, so object 4 is added to cluster 2.

Object 5: entry 2 is the closest to object 5. Cluster 3 becomes too large by adding object 5, so should cluster 3 be split? BUT there is a limit to the number of entries a node can have. Thus, split the node.

Result: the leaf node is split in two under a new non-leaf node. The non-leaf node holds entry 1 and entry 2; one leaf node holds entries 1.1 and 1.2, and the other holds entries 2.1 and 2.2. The figure now shows Cluster 1, Cluster 2, Cluster 3, and Cluster 4.
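
The insertion steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the names Entry, insert, split, threshold, and max_entries are hypothetical; "too large (not compact)" is assumed to mean that the cluster radius would exceed threshold; a flat list of leaf nodes stands in for the tree, so there is no real non-leaf level; and an overfull leaf is assumed to be split around its two farthest entries. The sample points are made up and do not reproduce the slide figure.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Summary of one cluster: point count, linear sum, sum of squared norms."""
    n: int = 0
    ls: list = field(default_factory=list)
    ss: float = 0.0

    def add(self, p):
        if self.n == 0:
            self.ls = [0.0] * len(p)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, p):
        """Cluster radius after a hypothetical insertion of p (assumed compactness test)."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, p)]
        ss = self.ss + sum(x * x for x in p)
        r2 = ss / n - sum((s / n) ** 2 for s in ls)  # mean squared distance to centroid
        return math.sqrt(max(r2, 0.0))


def split(leaf):
    """Split an overfull leaf around its two farthest entries (assumed policy)."""
    pairs = [(math.dist(a.centroid(), b.centroid()), i, j)
             for i, a in enumerate(leaf) for j, b in enumerate(leaf) if i < j]
    _, i, j = max(pairs)
    seed_i, seed_j = leaf[i].centroid(), leaf[j].centroid()
    left, right = [], []
    for e in leaf:
        c = e.centroid()
        (left if math.dist(c, seed_i) <= math.dist(c, seed_j) else right).append(e)
    return left, right


def insert(leaves, p, threshold, max_entries):
    """Insert point p into a list of leaf nodes (each leaf is a list of Entry)."""
    best = None
    for leaf in leaves:
        for e in leaf:
            d = math.dist(e.centroid(), p)
            if best is None or d < best[0]:
                best = (d, leaf, e)
    if best is None:                      # very first object: start the first cluster
        first = Entry()
        first.add(p)
        leaves.append([first])
        return
    _, leaf, entry = best
    if entry.radius_if_added(p) <= threshold:
        entry.add(p)                      # cluster stays compact: absorb the object
        return
    new = Entry()                         # cluster would become too large: new entry
    new.add(p)
    leaf.append(new)
    if len(leaf) > max_entries:           # node holds too many entries: split the node
        idx = next(k for k, l in enumerate(leaves) if l is leaf)
        leaves[idx:idx + 1] = list(split(leaf))


leaves = []
for point in [(0, 0), (5, 0), (0.5, 0), (10, 0), (5.5, 0), (15, 0)]:
    insert(leaves, point, threshold=1.0, max_entries=3)

print([[e.n for e in leaf] for leaf in leaves])  # prints [[2, 2], [1, 1]]
```

Each object is read once and only the per-cluster summaries are kept, which is in line with the single-scan, memory-conscious motivation listed earlier. A full BIRCH tree would also keep summaries in non-leaf nodes and descend through them to find the closest leaf entry; that part is left out of this sketch.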