Data Processing, Spectral Analysis, and Data Mining.ppt

Uploaded by: ga****84 | Document ID: 322284 | Upload date: 2018-09-22 | Format: PPT | Pages: 63 | Size: 9.29 MB
Resource description

Alignment, Mining, and Functional Analysis of Gene Sequences
Quan Zou (Ph.D. & Professor), School of Computer Science and Technology, Tianjin University, 2017.10

Outline
Sequence alignment
  Algorithm
  Parallel
Identification and mining
  microRNA
  machine learning related works
Function prediction
  miRNA disease relationship
  crops yield related genes

Multiple Sequence Alignment (MSA) vs. BLAST

Multiple Sequence Alignment (MSA): What & Where
Multiple Sequence Alignment
Multiple DNA Sequence Alignment
Multiple Similar DNA Sequence Alignment (our focus)
Application: phylogenetic tree, virus sequences, population SNV calling

Techniques for similar DNA MSA
1. k-band Dynamic Programming
[Figure: k-band region of the dynamic programming matrix]
How to set k for k-band?
Greedy search with suffix tree
T = GTCCTGAAGCTCCGT
    1234567890123456
S = GTCCGAAGCTCCGG
Matches: (1,1,4), (5,6,9)

Techniques for similar DNA MSA
2. Center star strategy
[Figure: sequences S1 to S5 shown under tree alignment and under the center star strategy; steps labeled sum up, update, final result]

Extreme MSA for Very Similar DNA Sequences

Experiments
Input: 100 human mitochondrial genome sequences, 16k length (1555 KB)
Our output: 1558 KB
Clustal: 1627 KB
Time cost of each step
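The greedy search slide lists S, T, and the triples (1,1,4) and (5,6,9). Read as (start in S, start in T, length) with 1-based positions, they mark the segments GTCC and GAAGCTCCG that the two sequences share. The sketch below reproduces those triples under that reading. It is only an illustration: a naive scan stands in for the suffix tree, and the function names and the `min_len` cutoff are assumptions, not part of the slides.

```python
def match_length(s, i, t, j):
    """Length of the common run of s[i:] and t[j:] starting at those offsets."""
    n = 0
    while i + n < len(s) and j + n < len(t) and s[i + n] == t[j + n]:
        n += 1
    return n


def greedy_matches(s, t, min_len=2):
    """Greedily collect shared segments, left to right.

    Returns 1-based triples (start in s, start in t, length).
    A suffix tree over t would answer each longest-match query faster;
    the inner loop here is a plain scan used only for clarity.
    """
    matches = []
    i = 0        # current position in s (0-based)
    t_min = 0    # matches in t must start at or after this position
    while i < len(s):
        best_len, best_j = 0, -1
        for j in range(t_min, len(t)):
            n = match_length(s, i, t, j)
            if n > best_len:
                best_len, best_j = n, j
        if best_len >= min_len:
            matches.append((i + 1, best_j + 1, best_len))
            i += best_len
            t_min = best_j + best_len
        else:
            i += 1   # no useful match here; leave this base for the DP step
    return matches


S = "GTCCGAAGCTCCGG"
T = "GTCCTGAAGCTCCGT"
print(greedy_matches(S, T))
```

Only the short regions between matched segments are left for dynamic programming, which is why this kind of search pays off on very similar sequences.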

Multiple sequence alignment in Hadoop

Multiple sequence alignment in Spark

Running time of different software tools on mtDNA datasets

Running time with HPTree on 16S rRNA datasets

Comparison of CPU-based and Spark-based MSA
CPU-based MSA can only address small datasets (10% memory size), and slowly.
GPU-based MSA can address small datasets in a shorter time than CPU-based MSA.
Spark-based MSA can address ultra-large datasets in acceptable time.
[Chart: running time (sec) by software; some runs are marked "Memory Limit Exceeded"]

http:/

2. Web Server
Step 1: After you click the link (http:/) shown above, you will see the HAlign web server.
Step 2: After you submit your experiment task successfully, wait a moment and you will see the results.
Step 3: Click the View link to see the visualization of your multiple sequence alignment results.
Step 4: Click the Generate link to see the visualization of your phylogenetic tree.

References on MSA
Quan Zou, Qinghua Hu, Maozu Guo, Guohua Wang. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics. 2015, 31(15): 2475-2481
Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu, Quan Zou. CMSA: A heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment. BMC Bioinformatics. 2017, 18: 315
Shixiang Wan, Quan Zou*. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms for Molecular Biology. 2017, 12: 25
Wenhe Su, Quan Zou, et al. MASC: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework. Journal of Computational Biology. Accepted
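As a rough illustration of how k-band dynamic programming, the center star strategy, and a parallel map stage fit together, the sketch below computes banded edit distances, doubles k until the result is provably exact (one common answer to "how to set k"), and picks the center as the sequence with the smallest total distance to the others. Everything here is an assumption made for illustration: the slides do not say how HAlign or HAlign-II choose k or the center, and a local thread pool stands in for a Hadoop or Spark map stage.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def banded_edit_distance(a, b, k):
    """Edit distance using only DP cells with |i - j| <= k.

    Returns None when the band cannot reach the bottom-right corner.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = float("inf")
    prev = {j: j for j in range(min(m, k) + 1)}   # row 0 inside the band
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev.get(j - 1, inf) + cost,   # match or mismatch
                         prev.get(j, inf) + 1,          # gap in b
                         cur.get(j - 1, inf) + 1)       # gap in a
        prev = cur
    return prev[m]


def edit_distance(a, b):
    """Double k until the banded result is at most k, which makes it exact."""
    k = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_edit_distance(a, b, k)
        if d is not None and d <= k:
            return d
        k *= 2


def pick_center(seqs, workers=4):
    """Index of the sequence with the smallest sum of distances to all others."""
    pairs = list(combinations(range(len(seqs)), 2))
    # Each pair is independent, so this map is the step a cluster would distribute.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dists = list(pool.map(lambda p: edit_distance(seqs[p[0]], seqs[p[1]]), pairs))
    totals = [0] * len(seqs)
    for (i, j), d in zip(pairs, dists):
        totals[i] += d
        totals[j] += d
    return min(range(len(seqs)), key=totals.__getitem__)


print(pick_center(["ACGT", "ACGA", "ACGT", "TCGT"]))
```

For very similar sequences the first small k usually succeeds, so each pairwise comparison costs roughly k times the sequence length instead of the full quadratic table.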

Identification of microRNA
AUCGUGCAGAGACUAGACUGACAUCGUGCAGAGACUAGACUGACAUCGUGCAGAGACUAGACUGACAUCGUGCAGAGACUAGACUGACAUCGUGCAGAGACUAGACUGAC
1 tgcgcgaauucacccauggauccauucaucuuccaagggcaccagc
2 agcgcgaauuccaagucacccauggauccauucaucuggcagcgu
3 agucgcgaauucaucaucuuccaagggcacccauggauccaucca

microRNA prediction based on machine learning
obvious differences; weak generalization
[Flowchart labels: 100nt, 100nt, Parameter Filter, Prediction Model, Extend, Compute Secondary Structures, Extract, Human CDs, Human Mature microRNAs, Blast, Mature-like Reads, Original Negative Set, Mined Sequences, Rebuilt, Replace; innovation point]

microRNA family identification
http:/

miRNA found by our method
Dinoflagellates genome
Lin, et al. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science. 2015, 350(6261): 691-694.

Machine learning frame in gene identification
[Figure: sequences represented as numeric feature vectors, with a reduced subset of features selected]
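The framework slide shows sequences turned into numeric feature vectors before any classifier is applied. The slides do not say which features are used, so the sketch below substitutes a common, simple choice, normalized k-mer frequencies. The function name, the default `k`, and the alphabet are assumptions.

```python
from itertools import product


def kmer_features(seq, k=2, alphabet="ACGU"):
    """Map an RNA (or DNA) sequence to a fixed-length k-mer frequency vector.

    The vector has len(alphabet) ** k entries in lexicographic k-mer order.
    Windows containing characters outside the alphabet are skipped.
    """
    seq = seq.upper().replace("T", "U")   # treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


# One of the microRNA example strings from the slides.
vector = kmer_features("AUCGUGCAGAGACUAGACUGAC", k=2)
print(len(vector), round(sum(vector), 6))
```

Every sequence maps to a vector of the same length regardless of its own length, which is what lets a feature selection step and a classifier operate on the whole dataset.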

Ensemble learning: combine weak classifiers into a strong one
[Figure: weak classifiers h1() to h7() are combined to form the final strong classifier, which gives the classification result]

Ensemble learning for Class Imbalance Problem

http:/ in Bioinformatics
DNA binding proteins
Li Song, Dapeng Li, Xiangxiang Zeng, Yunfeng Wu, Li Guo*, Quan Zou*. nDNA-prot: Identification of DNA-binding Proteins Based on Unbalanced Classification. BMC Bioinformatics. 2014, 15: 298.
tRNA
Quan Zou, et al. Improving tRNAscan-SE annotation results via ensemble classifiers. Molecular Informatics. 2015, 34(11-12): 761-770
miRNA
Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201
circleRNA
Xiangxiang Zeng, Wei Lin, Maozu Guo, Quan Zou*. A comprehensive overview and evaluation of circular RNA detection tools. PLoS Computational Biology. 2017, 13(6): e1005420

Using the ensemble learning method proposed by Associate Professor Quan Zou

References
Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201
Quan Zou*, Yaozong Mao, Lingling Hu, Yunfeng Wu, Zhiliang Ji*. miRClassify: An advanced web server for miRNA family classification and annotation. Computers in Biology and Medicine. 2014, 45: 157-160
Chen Lin, Wenqiang Chen, Cheng Qiu, Yunfeng Wu, Sridhar Krishnan, Quan Zou*. LibD3C: Ensemble Classifiers with a Clustering and Dynamic Selection Strategy. Neurocomputing. 2014, 123: 424-435.
Quan Zou, Jiancang Zeng, Liujuan Cao, Rongrong Ji. A Novel Features Ranking Metric with Application to Scalable Visual and Bioinformatics Data Classification. Neurocomputing. 2016, 173: 346-354

Similarity between two microRNAs
[Figure: panels (A), (B), (C), each showing the targets of miR1 and the targets of miR2]
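The figure compares two microRNAs through their target sets. One simple way to turn that picture into a number is the Jaccard index of the two sets, sketched below. This is offered only as an illustration: the slides do not state which similarity measure is used, and the gene names are hypothetical.

```python
def target_jaccard(targets_a, targets_b):
    """Jaccard similarity of two target-gene sets: |A & B| / |A | B|."""
    a, b = set(targets_a), set(targets_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical target lists for two microRNAs.
mir1_targets = {"GENE1", "GENE2", "GENE3"}
mir2_targets = {"GENE2", "GENE3", "GENE4"}
print(target_jaccard(mir1_targets, mir2_targets))
```

A score of 1.0 means the two microRNAs regulate exactly the same targets and 0.0 means they share none. Measures that also weigh functional relatedness between different targets would replace the plain set intersection.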

Quan Zou, et al. Similarity computation strategies in the microRNA-disease network: A Survey. Briefings in Functional Genomics. 2016, 15(1): 55-64.

Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. DOI: 10.18632/oncotarget.12828.

http:/ Origin Detection

References
http:/ Zeng, Xuan Zhang, Quan Zou*. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Briefings in Bioinformatics. 2016, 17(2): 193-203.
Yuansheng Liu, Xiangxiang Zeng, Zengyou He*, Quan Zou*. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2017, 14(4): 905-915
Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. 2016, 7(51): 85613-85623
Wei Tang, Shixiang Wan, Zhen Yang, Andrew E. Teschendorff*, Quan Zou*. Tumor Origin Detection with Tissue-Specific miRNA and DNA methylation Markers. Bioinformatics. DOI: 10.1093/bioinformatics/btx622
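Returning to the ensemble learning slides (weak classifiers h1 to h7 combined into one strong classifier, applied to the class imbalance problem), the idea can be sketched in a few lines. This is a generic illustration and not LibD3C or any method from the cited papers: each weak learner is a one-feature threshold rule trained on all minority samples plus an equally sized random draw from the majority class, and the final label is a majority vote. All names and the toy data are assumptions.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a threshold rule on (x, label) pairs with labels 0 or 1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for sign in (1, -1):
            errors = sum(
                1 for x, y in samples
                if (1 if sign * (x - threshold) >= 0 else 0) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, sign)
    _, threshold, sign = best
    return lambda x: 1 if sign * (x - threshold) >= 0 else 0


def train_balanced_ensemble(samples, n_learners=7, seed=0):
    """Train weak learners on balanced subsets of an imbalanced dataset."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    learners = []
    for _ in range(n_learners):
        # Undersample the majority class so each learner sees balanced data.
        subset = minority + rng.sample(majority, len(minority))
        learners.append(train_stump(subset))
    return learners


def predict(learners, x):
    """Majority vote over the weak learners."""
    votes = Counter(h(x) for h in learners)
    return votes.most_common(1)[0][0]


# Toy data: 20 negatives, 3 positives.
data = [(i * 0.3, 0) for i in range(20)] + [(8.0, 1), (9.0, 1), (10.0, 1)]
ensemble = train_balanced_ensemble(data)
print(predict(ensemble, 9.5), predict(ensemble, 2.0))
```

Using an odd number of learners avoids tied votes, and because every learner sees all of the minority class, the rare positives are not drowned out by the negatives.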
