Discuss the Application of Data Mining in Bioinformatics

Abstract. Bioinformatics is an active field that intersects and interpenetrates with a wide range of disciplines. Studying the background and current state of bioinformatics, and the application of data mining within it, helps promote the development of biology and related sciences. Progress in bioinformatics depends on breakthroughs in related disciplines; at the same time, its development provides information, materials and research methods to those disciplines.

Key words: Bioinformatics, Data mining, Application.

1. Introduction

Bioinformatics is the core of biotechnology and emerged alongside genome research. It is a discipline that combines biology, computer science and networking, and its research content has developed together with genome research. Since the Human Genome Project was launched and carried out, nucleic acid and protein data have increased rapidly, and how to obtain useful information from these massive data has become an urgent problem for bioinformatics. This places higher demands on bioinformatics, and it also challenges information theory and technology to meet the needs of data collection, collation, retrieval and analysis. As an emerging technology based on databases, statistics and artificial intelligence, data mining offers genome scientists data analysis tools that were not previously available.

It gives researchers a new and powerful tool for analyzing and extracting gene and protein information. Data mining and bioinformatics combine well, and the application potential of this combination is drawing increasing attention in the field of bioinformatics. This article introduces the concept of data mining and the steps of biological data mining, and discusses potential applications of data mining and the development and application of bioinformatics mining tools. Studies show that data mining technology is a powerful tool for processing biological information, and its application is expected to make further progress.

2. The concept of bioinformatics

Bioinformatics is a science that uses computers to store, retrieve and analyze biological information; it is one of the important frontiers of the life sciences and natural sciences. The development of bioinformatics depends on breakthroughs in biology, computer science and other related disciplines; in turn, bioinformatics provides information, materials and methods for these disciplines. It queries, searches, compares and analyzes biological information in order to gain rational knowledge of gene coding, gene regulation, and the structure, function and interrelationships of proteins and nucleic acids. Bioinformatics uses the genomic information in protein-coding regions to simulate the spatial structure of proteins and predict protein function. It combines this information with physiological and biochemical information about organisms and life processes to clarify the underlying molecular mechanisms, and finally applies the results to the molecular design of proteins and nucleic acids, drug design and personalized health care design.

The three important parts of bioinformatics are genome informatics, protein structure modeling and drug design. The starting point is the analysis of DNA sequence information; information from the protein-coding regions is then used for protein structure prediction and simulation, and drugs are then designed on the basis of the functions of specific proteins.

3. The relation of data mining and knowledge discovery

There are two popular views of the relation between data mining and knowledge discovery. One view holds that they are the same concept under different names in different areas: "knowledge discovery" in scientific research and "data mining" in engineering applications. The other view holds that knowledge discovery is the acquisition and mining of knowledge from massive data, where that knowledge is implicit, previously unknown and potentially useful information. Under this view, data mining is the core stage of knowledge discovery.

Data mining and the knowledge discovery system form an organic whole. A data mining system is the knowledge discovery process organized around a data mining task, and all of its algorithms serve that system. Studying data mining systems helps establish a sound system architecture, which makes mining algorithms easier to reuse and embed and easier to combine with the other modules of the system. Figure 1 shows the prototype structure of one mining system.

4. Data mining classification and mining steps

4.1 Data mining involves many fields and methods, including artificial intelligence, statistics, visualization and parallel computing. It can be classified in several ways.

4.1.1 By mining task, it can be divided into classification models, clustering, association rule discovery, sequence analysis, variance analysis and data visualization.

4.1.2 By mining object, it can be divided into relational databases, object-oriented databases, spatial databases, temporal databases, text data sources, multimedia databases and the Web.

4.1.3 By mining method, it can be divided into machine learning methods, statistical methods, neural network methods, decision trees, visualization and nearest-neighbor techniques. Machine learning methods can be further divided into inductive learning (such as decision trees and rule induction), case-based learning and genetic algorithms. Statistical methods can be further divided into regression analysis (multivariate regression and other regression models), discriminant analysis (BDF, Fisher discriminant, nonparametric discriminant), cluster analysis (hierarchical clustering, dynamic clustering) and exploratory analysis (principal component analysis, correlation analysis and so on).

4.2 Data mining involves three elements: business requirements, a large amount of data, and mining algorithms. The first thing to establish in a real data mining project is the business requirement. Mining algorithms are a current research hotspot, with work mainly focused on adopting new algorithms to solve specific business problems. Mining algorithms can be packaged into a mining tool. The common process is as follows:

(1) Analyze the problem. The source database must be assessed to confirm that it meets the requirements of data mining. Determine the expected results and choose the most suitable algorithm for the job.

(2) Extract, clean and check the data. Load the extracted data into a database whose structure and data model are compatible, so that the data are clean, consolidated and uniformly structured. Then browse the created model to ensure that all the data are present and complete.

(3) Create and debug the model. Apply the algorithm to the model to produce a structure, then browse the data in that structure and confirm that it accurately represents the "facts" in the source data. This is the key point. Although it may not be possible to check every detail, viewing the generated model may reveal important characteristics.

(4) Query the data mining model. Once the model has been built, its data can be used for decision support. In the Microsoft data mining solution, the front-end query program is usually written in VB or ASP using the OLE DB for Data Mining provider.

(5) Maintain the data mining model. After the model has been built, the characteristics of the initial data (such as validity) may change, and some of these changes greatly affect accuracy because they alter the nature of the data on which the original model was based. Maintaining the data mining model is therefore a very important step.

5. The application of data mining in bioinformatics

5.1 Data mining based on privacy protection

Data mining technology provides effective tools for biologists, but it also raises privacy protection problems. For example, a research unit's confidential data, personal medical diagnostic records and medical records are all potentially open to misuse. During data mining, privacy can be protected by methods such as limiting data access, blurring the data, eliminating unnecessary groupings and adding noise to the data.

Anonymization is the most direct technique for hiding identity. As a privacy protection technique for data mining, it neither protects the mining results nor hides or disguises the original data. Instead, all the data are released, including the private data, but others who obtain the private data cannot deduce the identity of the data owner from them.

For example, in a medical information table, date of birth, zip code and drug allergy together form the set of attributes that identifies a specific record, and past medical history is the private attribute to be protected. Anonymity-based privacy protection hides the attribute set that could serve as a unique identifier, and so protects privacy indirectly. In the table, the identifying attribute values all differ, so a set of identifying values can be associated with one particular record and hence with one particular person. The private data are then matched to a particular person, and privacy cannot be protected. But if only zip code and drug allergy are chosen as the identifying attributes, with past medical history as the private attribute, then two records share the value 07030 with no allergy, and their private attribute values differ (polio and colitis). A record marked 07030 with no allergy can no longer be uniquely determined, which achieves the purpose of protecting personal privacy.
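The effect of choosing different identifying attributes can be checked mechanically. The following is a minimal sketch, not the author's method: the records, the field names and the helper `smallest_group` are all hypothetical and chosen only to mirror the zip-code example. It groups records by the chosen identifying attributes and reports the size of the smallest group; a size of 1 means at least one record can be tied to a single person.

```python
from collections import Counter

# Hypothetical records for illustration only; not taken from the source table.
records = [
    {"birth": "1964-03-18", "zip": "07030", "allergy": "none", "history": "polio"},
    {"birth": "1962-05-03", "zip": "07030", "allergy": "none", "history": "colitis"},
    {"birth": "1961-09-20", "zip": "07028", "allergy": "penicillin", "history": "pharyngitis"},
    {"birth": "1964-03-05", "zip": "07028", "allergy": "penicillin", "history": "stroke"},
]

def smallest_group(rows, identifying_attributes):
    """Return the size of the smallest group of rows that share the same
    values for the given identifying attributes."""
    counts = Counter(tuple(row[a] for a in identifying_attributes) for row in rows)
    return min(counts.values())

with_birth = smallest_group(records, ["birth", "zip", "allergy"])
without_birth = smallest_group(records, ["zip", "allergy"])
print(with_birth)     # 1: every record is uniquely identifiable
print(without_birth)  # 2: each record is hidden among at least two
```

Dropping date of birth from the identifying set raises the smallest group size from 1 to 2, which is the property the example in the text relies on.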

After many years of research and practice, many data mining and machine learning systems and tools have been applied to the processing of biological information. General-purpose data mining analysis systems include SAS Enterprise Miner, IBM Intelligent Miner, SGI MineSet and others. Some specialized integrated software packages also play a large role in processing biological information. GCG (Genetics Computer Group) is used mainly for analyzing DNA sequences and protein sequences. Staden is a software package for DNA and protein sequence analysis. There are also Sequencher, which is used for large-scale sequencing, and Vector NTI, which is used for rapid cloning. GeneMine is a bioinformatics data mining system developed by Molecular Application Group; it supports filtering, computation and clustering of biological data, as well as further comprehensive analysis and visualization. At present, the database giants Oracle and IBM have embedded biological information mining tools in Oracle 9i and DB2, which greatly improves the security of biological data and the accuracy of analysis.

5.2 Data cleaning, data integration, and semantic integration of heterogeneous, distributed databases

Many countries and organizations have established biological sequence databases and protein structure and function databases, which provide a wealth of information. However, the data are scattered, and the storage media are increasingly varied. Within a single database there are also many repeated sequence records and some highly similar data, which easily leads to data redundancy. Semantic integration of heterogeneous, distributed databases has therefore become an important task. The data cleaning and data integration methods of data mining can help solve the problem of data redundancy.

5.3 Similarity search and alignment of DNA sequences

Sequence alignment can identify the evolutionary relationship between a newly discovered gene and a known gene family, establish their homology or similarity, and find the maximum match between them, thereby quantifying their degree of similarity. Because sequence data are strings of symbols rather than numeric values, the exact order and interconnection of the nucleotides plays an important role. Exploring efficient search and alignment algorithms is therefore very important in sequence analysis.

Path analysis and evolution analysis, in turn, are used to find the genes involved at different stages of a disease. A disease may be caused by more than one gene, with different genes playing a role at different stages. Path analysis and evolution analysis can identify the pathogenic gene sequences at each stage, so that drugs can be developed for the different stages and a more effective therapeutic effect achieved.

5.4 The analysis of genome characterization and simultaneous occurrence of gene sequences

For a group of sequences in a gene family, the only way of
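The similarity search discussed in Section 5.3 depends on an alignment algorithm that turns "degree of similarity" into a number. The text does not name a specific algorithm, so the following is only a sketch under stated assumptions: it uses the Needleman-Wunsch global alignment recurrence, and the scoring values (match +1, mismatch -1, gap -1) and the function name `global_alignment_score` are chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b.

    The scoring values are illustrative assumptions, not from the source.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATTTCA"))  # 5
```

A higher score means a closer match: the identical pair scores 7, and a single substitution lowers the score to 5. Only two rows of the table are kept in memory, which is enough when the score alone is needed and not the alignment itself.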
