10ChallengingProblemsinDataMiningResearch.ppt

Slide 1: Data Mining: Opportunities and Challenges
Xindong Wu
University of Vermont, USA; Hefei University of Technology, China (Changjiang Scholar Chair Professor of Computer Applications, Hefei University of Technology)

Slide 2: Deduction and Induction: My Research Background

Slide 3: Outline
- Data Mining Opportunities
- Major Conferences and Journals in Data Mining
- Main Topics in Data Mining
- Some Research Directions in Data Mining
- 10 Challenging Problems in Data Mining Research

Slide 4: What Is Data Mining?
- The discovery of knowledge (in the form of rules, trees, frequent patterns, etc.) from large volumes of data
- A hot field: 15 "data mining" conferences in 2003, including KDD, ICDM, SDM, IDA, PKDD and PAKDD, and excluding IJCAI, COMPSTAT, SIGMOD and other more general conferences that also publish data mining papers

Slide 5: Main Activities in Data Mining: Conferences
- The birth of data mining/KDD: 1989 IJCAI Workshop on Knowledge Discovery in Databases
- 1991-1994: Workshops on Knowledge Discovery in Databases
- 1995 to date: International Conferences on Knowledge Discovery in Databases and Data Mining (KDD)
- 2001 to date: IEEE ICDM and SIAM-DM (SDM)
- Several regional conferences, incl. PAKDD (since 1997) and PKDD (since 1997)

Slide 6: Data Mining: Major Journals
- Data Mining and Knowledge Discovery (DMKD, since 1997)
- Knowledge and Information Systems (KAIS, since 1999)
- IEEE Transactions on Knowledge and Data Engineering (TKDE)
- Many others, incl. TPAMI, ML, IDA

Slide 7: ACM KDD vs. IEEE ICDM

Slide 8: Main Topics in Data Mining
- Association analysis (frequent patterns)
- Classification (trees, Bayesian methods, etc.)
- Clustering and outlier analysis
- Sequential and spatial patterns, and time-series analysis
- Text and Web mining
- Data visualization and visual data mining

Slide 9: Some Research Directions
- Web mining (incl. Web structures, usage analysis, authoritative pages, and document classification)
- Intelligent data analysis in Bioinformatics
- Mining with data streams (in continuous, real-time, dynamic data environments)
- Integrated, intelligent data mining environments and tools (incl. induction, deduction, and heuristic computation)

Slide 10: Outline
- Data Mining Opportunities
- Major Conferences and Journals in Data Mining
- Main Topics in Data Mining
- Some Research Directions in Data Mining
- 10 Challenging Problems in Data Mining Research

Slide 11: 10 Challenging Problems in Data Mining Research
Joint efforts with Qiang Yang (Hong Kong Univ. of Sci. & Tech.)
Xindong Wu (University of Vermont, USA; Hefei University of Technology, China)

Slide 12: Why "Most Challenging Problems"?
- What are the 10 most challenging problems in data mining today?
- Different people have different views, and the views change over time as well
- What do the experts think?
- Experts we consulted: previous organizers of IEEE ICDM and ACM KDD
- We asked them to list their 10 problems (requests sent out in Oct 05, and replies obtained in Nov 05)
- Replies edited into an article: hopefully useful for young researchers
- Not in any particular order of importance

Slide 13: 1. Developing a Unifying Theory of Data Mining
- The current state of the art of data-mining research is too "ad hoc"
  - techniques are designed for individual problems
  - no unifying theory
- Needs unifying research
  - Exploration vs explanation
- Long standing theoretical issues
  - How to avoid spurious correlations?
- Deep research
  - Knowledge discovery on hidden causes?
  - Similar to the discovery of Newton's Law?

An Example (from Tutorial Slides by Andrew Moore): VC dimension. If you've got a learning algorithm in one hand and a dataset in the other hand, to what extent can you decide whether the learning algorithm is in danger of overfitting or underfitting? VC theory gives a formal analysis of the fascinating question of how overfitting can happen, estimating how well an algorithm will perform on future data based solely on its training set error and a property (VC dimension) of the learning algorithm. VC dimension thus gives an alternative to cross-validation, called Structural Risk Minimization (SRM), for choosing classifiers. CV, SRM, AIC and BIC.

Slide 14: 2. Scaling Up for High Dimensional Data and High Speed Streams
- Scaling up is needed
  - ultra-high dimensional classification problems (millions or billions of features, e.g., bio data)
  - ultra-high speed data streams
- Streams
  - continuous, online process
  - e.g., how to monitor network packets for intruders?
  - concept drift and environment drift?
  - RFID network and sensor network data

Excerpt from Jian Pei's Tutorial: http://www.cs.sfu.ca/~jpei/

Slide 15: 3. Sequential and Time Series Data
- How to efficiently and accurately cluster, classify and predict the trends?
- Time series data used for predictions are contaminated by noise
  - How to do accurate short-term and long-term predictions?
  - Signal processing techniques introduce lags in the filtered data, which reduces accuracy
  - Key in source selection, domain knowledge in rules, and optimization methods

[Figure: real time series data obtained from wireless sensors in the Hong Kong UST CS department hallway]

Slide 16: 4. Mining Complex Knowledge from Complex Data
- Mining graphs
- Data that are not i.i.d. (independent and identically distributed)
  - many objects are not independent of each other, and are not of a single type
  - mine the rich structure of relations among objects, e.g., interlinked Web pages, social networks, metabolic networks in the cell
- Integration of data mining and knowledge inference
  - The biggest gap: current systems are unable to relate the results of mining to the real-world decisions they affect; all they can do is hand the results back to the user
- More research on interestingness of knowledge

[Figure labels: Citation (Paper 2), Author (Paper 1), Title, Conference Name]

Slide 17: 5. Data Mining in a Network Setting
- Community and Social Networks
  - Linked data between emails, Web pages, blogs, citations, sequences and people
  - Static and dynamic structural behavior
- Mining in and for Computer Networks
  - detect anomalies (e.g., sudden traffic spikes due to a DoS (Denial of Service) attack)
  - Need to handle 10 Gig Ethernet links: (a) detect, (b) trace back, (c) drop packet

[Figures: picture from Matthew Pirretti's slides, Penn State; an example of packet streams (data courtesy of NCSA, UIUC)]

Slide 18: 6. Distributed Data Mining and Mining Multi-agent Data
- Need to correlate the data seen at the various probes (such as in a sensor network)
- Adversary data mining: adversaries deliberately manipulate the data to sabotage the miners (e.g., make them produce false negatives)
- Game theory may be needed for help

[Figure: a game between Player 1 (the miner) and Player 2; each chooses action H or T, and the outcomes are (-1,1) or (1,-1)]

Slide 19: 7. Data Mining for Biological and Environmental Problems
- New problems raise new questions
- Large scale problems especially so
  - Biological data mining, such as HIV vaccine design
  - DNA, chemical properties, 3D structures, and functional properties need to be fused
  - Environmental data mining
  - Mining for solving the energy crisis

Slide 20: 8. Data-mining-Process Related Problems
- How to automate the mining process?
  - the composition of data mining operations
  - Data cleaning, with logging capabilities
  - Visualization and mining automation
- Need a methodology: help users avoid many data mining mistakes
  - What is a canonical set of data mining operations?

[Figure labels: Sampling, Feature Sel, Mining]

Slide 21: 9. Security, Privacy and Data Integrity
- How to ensure the users' privacy while their data are being mined?
- How to do data mining for protection of security and privacy?
- Knowledge integrity assessment
  - Data are intentionally modified from their original version, in order to misinform the recipients or for privacy and security
  - Development of measures to evaluate the knowledge integrity of a collection of
    - Data
    - Knowledge and patterns

http://www.cdt.org/privacy/
Headlines (Nov 21 2005): Senate Panel Approves Data Security Bill - The Senate Judiciary Committee on Thursday passed legislation designed to protect consumers against data security failures by, among other things, requiring companies to notify consumers when their personal information has been compromised. While several other committees in both the House and Senate have their own versions of data security legislation, S. 1789 breaks new ground by including provisions permitting consumers to access their personal files ...

Slide 22: 10. Dealing with Non-static, Unbalanced and Cost-sensitive Data
- The UCI datasets are small and not highly unbalanced
- Real world data are large (10^5 features), but only 1% belong to the useful classes (+ve)
- There is much information on costs and benefits, but no overall model of profit and loss
- Data may evolve with a bias introduced by sampling

[Figure labels: Each test incurs a cost; Data extremely unbalanced; Data change with time]

Slide 23: 10 Challenging Problems: Summary
1. Developing a Unifying Theory of Data Mining
2. Scaling Up for High Dimensional Data/High Speed Streams
3. Mining Sequence Data and Time Series Data
4. Mining Complex Knowledge from Complex Data
5. Data Mining in a Network Setting
6. Distributed Data Mining and Mining Multi-agent Data
7. Data Mining for Biological and Environmental Problems
8. Data-Mining-Process Related Problems
9. Security, Privacy and Data Integrity
10. Dealing with Non-static, Unbalanced and Cost-sensitive Data

Slide 24: Contributors
Pedro Domingos, Charles Elkan, Johannes Gehrke, Jiawei Han, David Heckerman, Daniel Keim, Jiming Liu, David Madigan, Gregory Piatetsky-Shapiro, Vijay V. Raghavan and associates, Rajeev Rastogi, Salvatore J. Stolfo, Alexander Tuzhilin, and Benjamin W. Wah
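
Problem 6 above suggests that game theory may be needed when an adversary manipulates the data being mined. As a rough illustration only, the Python sketch below solves a 2x2 zero-sum game for its mixed-strategy equilibrium using the standard indifference conditions. The payoff matrix is an assumption: it arranges the (-1,1) and (1,-1) outcomes from the game figure in a matching-pennies layout, but the extracted slide text does not say which pair of H/T actions leads to which outcome. The function name `solve_2x2_zero_sum` is hypothetical.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a):
    """Mixed-strategy equilibrium of a 2x2 zero-sum game.

    a[i][j] is the payoff to the row player (Player 1) when the row player
    picks action i and the column player picks action j; the column player
    receives -a[i][j]. Assumes integer payoffs and no pure-strategy saddle point.
    Returns (p, q, value): p is the probability the row player picks action 0,
    q is the probability the column player picks action 0.
    """
    (a11, a12), (a21, a22) = a
    denom = a11 - a12 - a21 + a22
    if denom == 0:
        raise ValueError("degenerate game: no unique mixing probabilities")
    # Row player mixes so that the column player is indifferent between columns.
    p = Fraction(a22 - a21, denom)
    # Column player mixes so that the row player is indifferent between rows.
    q = Fraction(a22 - a12, denom)
    if not (0 <= p <= 1 and 0 <= q <= 1):
        raise ValueError("probabilities outside [0, 1]: the game likely has a saddle point")
    value = Fraction(a11 * a22 - a12 * a21, denom)
    return p, q, value


# Assumed payoffs to Player 1 (the miner); rows and columns are actions H, T.
payoffs = [[-1, 1], [1, -1]]
p, q, value = solve_2x2_zero_sum(payoffs)
print(p, q, value)  # 1/2 1/2 0
```

Under these assumed payoffs both players randomize evenly and the game value is zero, so neither side can gain by being predictable.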

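Problem 10 above notes that real world data are extremely unbalanced and that costs and benefits matter. One common way to act on misclassification costs, shown here only as a sketch and not as a method the slides prescribe, is to predict the positive class whenever doing so has the lower expected cost. This moves the decision threshold away from 0.5. The cost values and the function names `cost_sensitive_threshold` and `predict_positive` are hypothetical.

```python
def cost_sensitive_threshold(cost_fp, cost_fn):
    """Probability threshold above which predicting positive is cheaper.

    Assumes correct predictions cost nothing. With p = P(positive | x),
    predicting positive has expected cost (1 - p) * cost_fp and predicting
    negative has expected cost p * cost_fn, so positive is cheaper when
    p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_positive(prob_positive, cost_fp, cost_fn):
    """Return True when predicting positive minimizes expected cost."""
    return prob_positive > cost_sensitive_threshold(cost_fp, cost_fn)


# Hypothetical costs: missing a rare positive is 99 times as costly as a false alarm.
threshold = cost_sensitive_threshold(cost_fp=1.0, cost_fn=99.0)
print(threshold)                           # 0.01
print(predict_positive(0.05, 1.0, 99.0))   # True
```

With these assumed costs, an example with only a 5% estimated chance of being positive is still flagged, which is the kind of behavior a plain 0.5 threshold would miss on data where positives are rare.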