Alignment, Mining and Functional Analysis of Gene Sequences
Quan Zou (Ph.D. & Professor), School of Computer Science and Technology, Tianjin University, 2017.10

Outline
Sequence alignment: Algorithm; Parallel
Identification and mining: microRNA; machine learning related works
Function prediction: miRNA-disease relationship; crop-yield-related genes

Multiple Sequence Alignment (MSA) vs. BLAST

Multiple Sequence Alignment (MSA): What & Where
Multiple sequence alignment; multiple DNA sequence alignment; multiple similar DNA sequence alignment (our focus)
Applications: phylogenetic trees, virus sequences, population SNV calling

Techniques for similar DNA MSA
1. k-band dynamic programming: only the cells of the dynamic programming matrix that lie within distance k of the main diagonal are computed.
[Figure: the k-band in the dynamic programming matrix]
How to set k for the k-band?
Greedy search with a suffix tree
Example: T = GTCCTGAAGCTCCGT, S = GTCCGAAGCTCCGG. The matched segments, written as (start in S, start in T, length), are (1,1,4) for GTCC and (5,6,9) for GAAGCTCCG.

2. Center star strategy
[Figure: sequences S1-S5 aligned by tree alignment and by the center star strategy]
Center star steps: sum up, update, final result

Extreme MSA for Very Similar DNA Sequences
Experiments
Dataset: 100 human mitochondrial genome sequences, each about 16k in length (1555 KB in total)
Output size: ours 1558 KB; Clustal 1627 KB
[Figure: time cost of each step]

Multiple sequence alignment in Hadoop
Multiple sequence alignment in Spark
[Figure: running time of different software tools on mtDNA datasets]
[Figure: running time with HPTree on 16S rRNA datasets]

Comparison with CPU-based and Spark-based MSA
CPU-based MSA can only handle small datasets (< 10% of memory size), and it handles them slowly.
GPU-based MSA can handle small datasets in less time than CPU-based MSA.
Spark-based MSA can handle ultra-large datasets in acceptable time.
[Figure: running time (sec) by software; some runs end with "Memory Limit Exceeded"]

2. Web Server
http:/
Step 1: After you click the link (http:/ as shown above), you will see the HAlign web server.
Step 2: After you submit your task successfully, wait a moment and you will see the results.
Step 3: You can view the visualized multiple sequence alignment results by clicking the View link.
Step 4: You can view the phylogenetic tree
visualization by clicking the Generate link.

References on MSA
Quan Zou, Qinghua Hu, Maozu Guo, Guohua Wang. HAlign: Fast Multiple Similar DNA/RNA Sequence Alignment Based on the Centre Star Strategy. Bioinformatics. 2015, 31(15): 2475-2481.
Xi Chen, Chen Wang, Shanjiang Tang, Ce Yu, Quan Zou. CMSA: A heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment. BMC Bioinformatics. 2017, 18: 315.
Shixiang Wan, Quan Zou*. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms for Molecular Biology. 2017, 12: 25.
Wenhe Su, Quan Zou, et al. MASC: A Linear Method for Multiple Nucleotide Sequence Alignment on Spark Parallel Framework. Journal of Computational Biology. Accepted.

Identification of microRNA
[Figure: sequence AUCGUGCAGAGACUAGACUGAC]
Example sequences:
1 tgcgcgaauucacccauggauccauucaucuuccaagggcaccagc
2 agcgcgaauuccaagucacccauggauccauucaucuggcagcgu
3 agucgcgaauucaucaucuuccaagggcacccauggauccaucca

microRNA prediction based on machine learning: obvious differences, weak generalization
[Figure (innovation point): rebuilding the negative set. Labels: 100 nt, Parameter Filter, Prediction Model, Extend, Compute Secondary Structures, Extract, Human CDs, Human Mature microRNAs, BLAST, Mature-like Reads, Original Negative Set, Mined Sequences, Rebuilt, Replace]

microRNA family identification
2018/9/22
http:/
miRNA found by our method
Dinoflagellate genome
Lin, et al. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science. 2015, 350(6261): 691-694.

Machine learning framework for gene identification
[Figure: example numeric feature vectors]
Ensemble learning:
Make weak classifiers into a strong one
[Figure: weak classifiers h1 to h7 each produce a classification result; the results are combined to form the final strong classifier]

Ensemble learning for the class imbalance problem
http:/ in Bioinformatics
DNA-binding proteins: Li Song, Dapeng Li, Xiangxiang Zeng, Yunfeng Wu, Li Guo*, Quan Zou*. nDNA-prot: Identification of DNA-binding Proteins Based on Unbalanced Classification. BMC Bioinformatics. 2014, 15: 298.
tRNA: Quan Zou, et al. Improving tRNAscan-SE annotation results via ensemble classifiers. Molecular Informatics. 2015, 34(11-12): 761-770.
miRNA: Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201.
circRNA: Xiangxiang Zeng, Wei Lin, Maozu Guo, Quan Zou*. A comprehensive overview and evaluation of circular RNA detection tools. PLoS Computational Biology. 2017, 13(6): e1005420.

Using the ensemble learning method proposed by Associate Professor Quan Zou

References
Leyi Wei, Minghong Liao, Yue Gao, Rongrong Ji, Zengyou He*, Quan Zou*. Improved and Promising Identification of Human MicroRNAs by Incorporating a High-quality Negative Set. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2014, 11(1): 192-201.
Quan Zou*, Yaozong Mao, Lingling Hu, Yunfeng Wu, Zhiliang Ji*. miRClassify: An advanced web server for miRNA family classification and annotation. Computers in Biology and Medicine. 2014, 45: 157-160.
Chen Lin, Wenqiang Chen, Cheng Qiu, Yunfeng Wu, Sridhar Krishnan, Quan Zou*. LibD3C: Ensemble Classifiers with a Clustering and Dynamic Selection Strategy. Neurocomputing. 2014, 123: 424-435.
Quan Zou, Jiancang Zeng, Liujuan Cao, Rongrong Ji. A Novel Features Ranking Metric with Application to Scalable Visual and Bioinformatics Data Classification. Neurocomputing. 2016, 173: 346-354.

Similarity between two microRNAs
[Figure: panels (A), (B) and (C), each comparing the targets of miR1 with the targets of miR2]
Quan Zou, et al. Similarity computation strategies in the microRNA-disease network: A Survey. Briefings in Functional Genomics. 2016, 15(1): 55-64.
Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. DOI: 10.18632/oncotarget.12828.
http:/ Origin Detection

References
http:/ Zeng, Xuan Zhang, Quan Zou*. Integrative approaches for predicting microRNA function and prioritizing disease-related microRNA using biological interaction networks. Briefings in Bioinformatics. 2016, 17(2): 193-203.
Yuansheng Liu, Xiangxiang Zeng, Zengyou He*, Quan Zou*. Inferring microRNA-disease associations by random walk on a heterogeneous network with multiple data sources. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2017, 14(4): 905-915.
Wei Tang, Zhijun Liao, Quan Zou*. Which statistical significance test best detects oncomiRNAs in cancer tissues? An exploratory analysis. Oncotarget. 2016, 7(51): 85613-85623.
Wei Tang, Shixiang Wan, Zhen Yang, Andrew E. Teschendorff*, Quan Zou*. Tumor Origin Detection with Tissue-Specific miRNA and DNA methylation Markers. Bioinformatics. DOI: 10.1093/bioinformatics/btx622.
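A minimal sketch can reproduce the worked example on the "Greedy search with suffix tree" slide. The sketch makes three assumptions. First, the triples mean (start in S, start in T, length), 1-based, which matches both listed segments. Second, a plain `str.find` scan stands in for the suffix tree, so the code runs in quadratic rather than linear time. Third, the name `greedy_segments` and the `min_len` threshold are hypothetical and are not taken from HAlign.

```python
def greedy_segments(s, t, min_len=3):
    """Greedily match substrings of s against t, left to right.

    Returns 1-based (start_in_s, start_in_t, length) triples.
    Assumption: a plain str.find scan replaces the suffix tree (quadratic time).
    """
    segments = []
    i = 0       # current 0-based position in s
    t_from = 0  # later matches must start after the previous match in t
    while i < len(s):
        best_len, best_pos = 0, -1
        length = 1
        # Grow s[i:i+length] while it still occurs in t at or after t_from.
        while i + length <= len(s):
            pos = t.find(s[i:i + length], t_from)
            if pos == -1:
                break
            best_len, best_pos = length, pos
            length += 1
        if best_len >= min_len:
            segments.append((i + 1, best_pos + 1, best_len))
            i += best_len
            t_from = best_pos + best_len
        else:
            i += 1  # no long enough match starting here; skip this base
    return segments


print(greedy_segments("GTCCGAAGCTCCGG", "GTCCTGAAGCTCCGT"))
# [(1, 1, 4), (5, 6, 9)]
```

Requiring each match to start after the previous one in T keeps the segments collinear. The unmatched stretches between them would still have to be aligned separately, for example by dynamic programming.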
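For the "k-band dynamic programming" slide, here is a minimal sketch of banded global alignment. It assumes linear gap costs and the illustrative scores match = 1, mismatch = -1, gap = -2, because the scores in the slide's figure are not recoverable. `kband_score` is a hypothetical name.

```python
def kband_score(a, b, k, match=1, mismatch=-1, gap=-2):
    """Global alignment score computed only inside the band |i - j| <= k.

    Returns None when |len(a) - len(b)| > k, because the band cannot reach
    the bottom-right cell. Assumption: the scores are illustrative only.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    neg = float("-inf")
    # For clarity the full table is allocated; cells outside the band stay -inf.
    # A space-efficient version would keep only the 2k+1 band cells per row.
    dp = [[neg] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if i > 0 and j > 0:
                pair = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, dp[i - 1][j - 1] + pair)
            if i > 0:
                best = max(best, dp[i - 1][j] + gap)  # a[i-1] against a gap
            if j > 0:
                best = max(best, dp[i][j - 1] + gap)  # b[j-1] against a gap
            dp[i][j] = best
    return dp[n][m]


print(kband_score("ACGT", "AGT", k=1))  # 1
```

When k is at least max(len(a), len(b)), the band covers the whole matrix and the result equals unbanded global alignment. The slide leaves open how to set k. One textbook answer, stated here as an assumption rather than the talk's method, is band doubling. Under unit-cost edit distance, any alignment that leaves the band needs more than k indels, so a banded distance of at most k is already optimal. Otherwise k is doubled and the DP is rerun.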
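The "Center star strategy" slides can be illustrated with the textbook version of the method, which has four steps:

1. Pick the sequence with the smallest total distance to the others as the center.
2. Align every sequence to the center.
3. Merge the gaps inserted into the center.
4. Pad each row accordingly.

Some assumptions apply. Reading the slide's "sum up" and "update" labels as the merge and pad steps is an interpretation. Unit-cost edit distance stands in for whatever pairwise aligner HAlign actually uses. All function names are hypothetical.

```python
def align_pair(a, b):
    """Unit-cost global alignment (edit distance).

    Returns (distance, a_aligned, b_aligned).
    Assumption: stands in for HAlign's own pairwise aligner.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    # Trace back one optimal path from (n, m) to (0, 0).
    i, j, ra, rb = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            ra.append(a[i - 1])
            rb.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ra.append(a[i - 1])
            rb.append("-")
            i -= 1
        else:
            ra.append("-")
            rb.append(b[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(ra)), "".join(reversed(rb))


def gap_profile(center_aligned, length):
    """gaps[p] = gap columns in the aligned center just before center base p.

    gaps[length] counts trailing gaps.
    """
    gaps = [0] * (length + 1)
    p = 0
    for ch in center_aligned:
        if ch == "-":
            gaps[p] += 1
        else:
            p += 1
    return gaps


def center_star(seqs):
    """Return (aligned rows, index of the center sequence)."""
    # 1. Center: smallest total edit distance to all sequences (O(n^2) alignments).
    c = min(range(len(seqs)),
            key=lambda k: sum(align_pair(seqs[k], s)[0] for s in seqs))
    center = seqs[c]
    # 2. Align every sequence to the center; each alignment is independent.
    pairs = [align_pair(center, s)[1:] for s in seqs]
    # 3. Merge: at each center position keep the largest number of inserted gaps.
    profiles = [gap_profile(c_al, len(center)) for c_al, _ in pairs]
    merged = [max(col) for col in zip(*profiles)]
    # 4. Pad every pairwise row with the extra gaps it is missing.
    rows = []
    for (c_al, s_al), own in zip(pairs, profiles):
        out, p = [], 0
        for c_ch, s_ch in zip(c_al, s_al):
            if c_ch != "-":
                out.append("-" * (merged[p] - own[p]))  # extra gaps before center base p
                p += 1
            out.append(s_ch)
        out.append("-" * (merged[p] - own[p]))  # extra trailing gaps
        rows.append("".join(out))
    return rows, c


rows, c = center_star(["ACGT", "ACGGT", "AGT"])
for row in rows:
    print(row)
```

Steps 2 and 3 are what make the strategy easy to distribute. Each alignment to the center is independent, and the element-wise maximum is associative and commutative. In a Spark setting they could therefore map onto something like `rdd.map(...)` followed by `reduce(...)`. Whether the Hadoop and Spark versions on the slides are organized exactly this way is an assumption.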
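The ensemble learning slide shows weak classifiers h1 to h7 whose results are combined into a final strong classifier. Here is a minimal sketch of the simplest combination rule, an unweighted majority vote. It is not the clustering and dynamic-selection strategy of LibD3C, nor the imbalance handling of nDNA-prot. The threshold "stumps" are hypothetical toy classifiers.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Combine weak classifiers by unweighted majority vote.

    Counter.most_common orders equal counts by first appearance, so a tie
    goes to whichever tied label appears first in the vote list.
    """
    votes = [h(x) for h in classifiers]
    return Counter(votes).most_common(1)[0][0]


def make_stump(threshold):
    """Hypothetical weak classifier: label 1 if the feature exceeds threshold."""
    return lambda x: 1 if x > threshold else 0


# Seven toy weak classifiers h1..h7 on a single numeric feature (assumption).
hs = [make_stump(t) for t in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7)]
print(majority_vote(hs, 0.45))  # 4 of 7 stumps vote 1 -> 1
print(majority_vote(hs, 0.35))  # 3 of 7 stumps vote 1 -> 0
```

A plain vote treats every classifier and class equally. Under class imbalance, that equal treatment is what weighted voting or classifier selection tries to correct.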
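The "Similarity between two microRNAs" slide compares the target sets of miR1 and miR2. One simple measure, assumed here for illustration, is the Jaccard index of the two sets. The survey cited on that slide reviews several strategies, and this sketch is not claimed to match any particular one of them. The gene names are hypothetical.

```python
def target_similarity(targets_1, targets_2):
    """Jaccard index of two miRNA target sets: |shared| / |union|.

    Assumption: one illustrative measure, not the survey's specific method.
    """
    a, b = set(targets_1), set(targets_2)
    if not a | b:
        return 0.0  # neither miRNA has known targets
    return len(a & b) / len(a | b)


# Hypothetical target gene sets for miR1 and miR2.
print(target_similarity({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))  # 0.5
```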