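Bioinformatics Analysis of Monogenic Genetic Diseases with NGS
Zhang Chao, 2015-3-25

HGP: Human Genome Project
1990-2001
UK, USA, France, Germany, Japan, China
USD 3 billion
Ranked alongside the Manhattan Project (atomic bomb) and the Apollo moon-landing program as one of the three great science programs.

NGS: Next Generation Sequencing
High throughput: 1.8T/run, more than ten genomes per run
Low cost: 10000 RMB/sample
Fast: 3 d/run

Monogenic genetic disease & NGS
Use genomics to explain genetic disease by linking genotype to phenotype.
Protein is the direct cause of the phenotype; DNA is the root cause.

NGS sequencing techniques
WGS (Whole Genome Sequencing)
WES (Whole Exome Sequencing)
TES (Target Enrichment Sequencing)
RNA-seq, ChIP-seq, RRBS, MeDIP, Hi-C

Illumina platform
MiSeq
NextSeq (500, 550)
HiSeq (2000, 2500, 3000, 4000)
HiSeq X (X Ten, X Five)
Bridge PCR, SBS (sequencing by synthesis)
Output: sequenced reads
WES: capture

Basic analysis workflow for genetic disease

1. QC
(1) Remove reads that contain adapter sequence.
(2) Remove reads in which more than 10% of the bases are N (N means the base could not be determined).
(3) If low-quality bases (quality below 5) make up more than 50% of the length of either read of a pair, remove the whole pair of paired reads.

2. Mapping
Read1 and read2 do not map to the same strand: each read is sequenced from the 5' end of its own strand, so the two reads of a pair face each other on the reference (Ref).
SAM/BAM format
header
Reads: QNAME (read name), FLAG, RNAME (chromosome), POS, MAPQ, CIGAR
Depth & Coverage
Avg_depth = total_base / genome_length
Coverage = covered_base / genome_length
Capture_efficiency = total_base_in_TR / total_base (TR: target region)
Example (reads aligned to ref): Depth = 3, coverage = 50%

3. SNP/indel calling
GATK

4. CNV calling
CNV detection methods:
PEM (paired-end mapping)
split reads
RD (read depth)
de novo assembly
combination of the above

5. SV calling

Result annotation
Chromosome structure annotation: cytoBand, genomic repeat regions, conserved regions
Gene name, transcript, intergenic region
Gene structure annotation: exon, intron, UTR, splice region
Exonic variant types: missense, splicing, synonymous, stop loss, stop gain, frameshift (e.g. OR4F5:NM_001005484:exon1:c.A421G:p.T141A)
Functional prediction of mutations: SIFT, PolyPhen2, MutationTaster
Disease-related databases: 1000G, dbSNP_nonflagged, GWAS, OMIM, HGMD

Screening for pathogenic mutations
Filter out high-frequency variants
Filter out synonymous variants
Keep variants in exons and splice regions
Predict how damaging each variant is

Family-based mutation screening
Recessive disease
Dominant disease
De novo mutation

Dominant disease
Screening strategy:
Affected individuals are 0/1
Unaffected individuals are 0/0

Recessive disease
Screening strategy:
Parents are 0/1
Affected children are 1/1 or compound heterozygous
Unaffected children are 0/1 or 0/0

De novo mutation
Screening strategy:
Parents are 0/0
The affected child is 0/1
Unaffected children are 0/0

Other analyses
Virus insertion
Pathway enrichment
Protein function network
Cancer: somatic variation, driver mutation, clone analysis (heterogeneity, metastasis and recurrence)

Thanks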
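
Note on step 1 (QC): the three read-filtering rules can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the talk. It assumes reads are already parsed into a sequence string plus a list of Phred scores, that adapter contamination is detected by exact substring match (real trimmers allow mismatches), and that a pair is dropped as soon as either read fails any rule. The function names and the adapter argument are hypothetical.

```python
def n_fraction(seq):
    """Fraction of bases in seq that are N (undetermined)."""
    if not seq:
        return 1.0
    return seq.upper().count("N") / len(seq)


def low_quality_fraction(quals, min_q=5):
    """Fraction of bases whose Phred quality is below min_q."""
    if not quals:
        return 1.0
    return sum(1 for q in quals if q < min_q) / len(quals)


def keep_pair(read1, read2, adapter, max_n=0.10, max_low=0.50, min_q=5):
    """Return True if a read pair passes the three QC rules.

    read1 and read2 are (sequence, qualities) tuples.
    Assumption: the pair is discarded if EITHER read fails ANY rule.
    """
    for seq, quals in (read1, read2):
        if adapter in seq:                      # rule (1): adapter present
            return False
        if n_fraction(seq) > max_n:             # rule (2): more than 10% N
            return False
        if low_quality_fraction(quals, min_q) > max_low:  # rule (3)
            return False
    return True
```

A caller would stream FASTQ record pairs through `keep_pair` and write out only the pairs for which it returns True.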
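
Note on step 2 (Mapping): the statement that read1 and read2 lie on different strands can be checked from the SAM FLAG field. The sketch below decodes the relevant bits; the bit values come from the SAM specification, while the function names are hypothetical and chosen only for this example.

```python
# Bit values defined by the SAM specification for the FLAG field.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
REVERSE = 0x10
MATE_REVERSE = 0x20
READ1 = 0x40
READ2 = 0x80


def decode_flag(flag):
    """Translate a SAM FLAG integer into named booleans."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "unmapped": bool(flag & UNMAPPED),
        "mate_unmapped": bool(flag & MATE_UNMAPPED),
        "reverse": bool(flag & REVERSE),
        "mate_reverse": bool(flag & MATE_REVERSE),
        "read1": bool(flag & READ1),
        "read2": bool(flag & READ2),
    }


def on_opposite_strands(flag):
    """True if the read and its mate are both mapped, to different strands."""
    info = decode_flag(flag)
    both_mapped = not info["unmapped"] and not info["mate_unmapped"]
    return info["paired"] and both_mapped and info["reverse"] != info["mate_reverse"]
```

For a typical properly paired fragment, the two records carry FLAG values 99 and 147, and `on_opposite_strands` is true for both.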
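
Note on Depth & Coverage: the three formulas can be worked through on a toy genome. The sketch below is an assumption-laden illustration: reads are given as 0-based half-open intervals on a single reference, target regions do not overlap each other, and all function names are hypothetical.

```python
def per_base_depth(read_intervals, genome_length):
    """Count how many reads cover each position (0-based, half-open intervals)."""
    depth = [0] * genome_length
    for start, end in read_intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            depth[pos] += 1
    return depth


def summarize(depth, target_regions):
    """Apply the Avg_depth, Coverage and capture-efficiency formulas.

    Assumption: target_regions are non-overlapping (start, end) intervals.
    """
    genome_length = len(depth)
    total_base = sum(depth)
    covered_base = sum(1 for d in depth if d > 0)
    total_base_in_tr = sum(sum(depth[s:e]) for s, e in target_regions)
    return {
        "avg_depth": total_base / genome_length,
        "coverage": covered_base / genome_length,
        "capture_efficiency": total_base_in_tr / total_base if total_base else 0.0,
    }


# Toy case: three identical reads stacked on the first half of a 10 bp genome.
depth = per_base_depth([(0, 5), (0, 5), (0, 5)], genome_length=10)
stats = summarize(depth, target_regions=[(0, 3)])
```

In this toy case every covered position has depth 3 and half of the genome is covered, but because Avg_depth divides by the full genome length it comes out as 1.5 rather than 3.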
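
Note on result annotation: the example string OR4F5:NM_001005484:exon1:c.A421G:p.T141A is a colon-separated record of gene, transcript, exon, cDNA change and protein change. A minimal parsing sketch follows; it assumes exactly five fields and only understands simple single-base substitutions of the form c.A421G, so indels and other notations return None. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class ExonicAnnotation(NamedTuple):
    gene: str
    transcript: str
    exon: str
    cdna_change: str
    protein_change: str


def parse_annotation(text):
    """Split a gene:transcript:exon:cDNA:protein annotation string."""
    fields = text.strip().split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    return ExonicAnnotation(*fields)


# Matches only simple substitutions written as c.<ref><position><alt>.
_SUBSTITUTION = re.compile(r"^c\.([ACGT])(\d+)([ACGT])$")


def parse_cdna_substitution(cdna_change):
    """Return (ref, position, alt) for a simple substitution, else None."""
    match = _SUBSTITUTION.match(cdna_change)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


ann = parse_annotation("OR4F5:NM_001005484:exon1:c.A421G:p.T141A")
change = parse_cdna_substitution(ann.cdna_change)
```

Here `ann.gene` is OR4F5 and `change` describes an A to G substitution at cDNA position 421.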
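
Note on family-based screening: the genotype patterns for the dominant, recessive and de novo models translate directly into per-variant checks. The sketch below assumes biallelic genotypes written as VCF-style strings such as 0/1 or 1|0, and it covers only the homozygous (1/1) form of the recessive model. Compound heterozygosity needs two variants in the same gene to be examined together and is left out. All function names are hypothetical.

```python
def normalize(gt):
    """Make a genotype string unphased and order-independent, e.g. 1|0 -> 0/1."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def fits_dominant(affected, unaffected):
    """Dominant model: every affected person is 0/1, every unaffected is 0/0."""
    return (all(normalize(g) == "0/1" for g in affected)
            and all(normalize(g) == "0/0" for g in unaffected))


def fits_recessive_homozygous(father, mother, affected, unaffected):
    """Recessive model (homozygous form): carrier parents, 1/1 affected children."""
    parents_carrier = normalize(father) == "0/1" and normalize(mother) == "0/1"
    return (parents_carrier
            and all(normalize(g) == "1/1" for g in affected)
            and all(normalize(g) in ("0/1", "0/0") for g in unaffected))


def fits_de_novo(father, mother, affected, unaffected):
    """De novo model: both parents 0/0, affected child 0/1, unaffected 0/0."""
    parents_ref = normalize(father) == "0/0" and normalize(mother) == "0/0"
    return (parents_ref
            and all(normalize(g) == "0/1" for g in affected)
            and all(normalize(g) == "0/0" for g in unaffected))
```

Each function would be applied to one candidate variant at a time, after the frequency and functional filters, to keep only variants whose segregation matches the suspected inheritance model.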