Experiments in Bioinformatics (Bioinformatics Laboratory Guide)
Edited by Longjiang Fan

Experiment 1. Construction of Genetic Maps
Experiment 2. Analysis of DNA Sequence
Experiment 3. Analysis of Protein Sequence
Experiment 4. Multiple Sequence Alignments
Experiment 5. Analysis of Gene Function

Last updated: 2002/1/16

Experiment 1. Construction of Genetic Maps (2 class hours)

THE PURPOSE
About 400-1000 bases can be read in a sequencing run with the most modern sequencer, and sequencing projects such as the Human Genome Project (HGP) have presented many challenges for sequence assembly because of the size and complexity of the genomes involved. Constructing a landmark map of the whole genome is therefore essential for the accurate assembly of a given genome. Genetic (linkage) maps are the basic landmark maps. The main targets of the HGP are its “four maps”, i.e. the genetic map, physical map, sequence map and gene map. Genetic maps are also important for plant breeding, genetic disease research and the like.

QTL (quantitative trait locus) mapping is one such effort that relies on constructing genetic maps. Many genes responsible for the polygenic inheritance of particular characteristics are scattered around the genome. Their positions are known as quantitative trait loci (QTLs). It is useful to know where they are for medical and agricultural reasons. In animal and plant breeding it would be useful to identify young individuals with favorable alleles without waiting for their expression at maturity.

The procedure in breeding situations is to take inbred lines that differ in the trait of interest and that also vary for markers, typically variable number tandem repeats (VNTRs), at numerous probe sites. The lines are crossed, and both the F2 progeny and later generations are examined for the desired trait and for the variations at the probe sites. If the presence of the trait correlates with inheritance of a particular marker allele, it is likely that one or more genes affecting the trait are located on the DNA close to that marker. The same procedure can be followed in human families, particularly for disease susceptibility loci, but it is more complicated and difficult because family sizes are smaller. Usually one or two genes cause most of the variation, and progressively more genes contribute progressively smaller effects. Genes that contribute 5% or less to the variation in a trait are very difficult to find.

In this experiment you will learn the steps of QTL mapping and use QTL mapping programs based on different methods, i.e. interval mapping (IM), composite interval mapping (CIM) and mixed-model based composite interval mapping (MCIM), to complete a QTL map from the corresponding experimental data.

LIST OF MATERIALS AND TOOLS
Materials: Two sets of experimental data (on drive D:): Mletest.mcd (for WinQTLCart) or Mletest.map and Mletest.txt (for QTLMapper); riceQTL.map and riceQTL.txt (for QTLMapper)
Tools: Windows QTL Cartographer (WinQTLCart) Version 1.30 by Shengchu Wang, C. J. Basten and Z. B. Zeng, or MAPMAKER/QTL Version 1.1 by Stephen E. Lincoln, Mark J. Daly and Eric S. Lander; QTLMapper Version 1.0 by Daolong Wang, Jun Zhu, et al.

PROCEDURE
Download and set up the software (WinQTLCart and QTLMapper), the data files and this guide
File list for this experiment: WinQTLCart; QTLMapper; Bioinformaticsguide.doc; riceQTL.map and riceQTL.txt; Mletest.map and Mletest.txt
Download these files from the Bioinplant Lab page.

Mapping QTLs based on IM and CIM with WinQTLCart
Step1. Start by inputting or creating the source data and verify the data
Open and verify the Mletest.mcd file.
Step2. View and modify the source data
Identify its data body.
Step3. Select the “Interval mapping” item in the METHOD menu to analyze the source data
The result file mletest-I.qrt is created.
Step4. Select the “Composite Interval mapping” item in the METHOD menu to analyze the source data
The result file mletest-C.qrt is created.
Step5. View the mapping results as graphics and compare the results
Open mletest-I.qrt and mletest-C.qrt in turn and compare their LR or LOD graphs.

Mapping QTLs based on MCIM with QTLMapper
Step1. Start by getting to know the menu system of QTLMapper 1.0
Step2. Preparing input files
To use QTLMapper 1.0 for QTL mapping analysis, you need to put your marker linkage map and your marker and trait data into two plain text files (a map file and a data file) in a format recognized by QTLMapper 1.0. These files are collectively called Input Files.
Step3. Working with the “File” sub-menu
The File sub-menu performs operations related to input files. Open the map file (mletest.map) and the data file (mletest.txt) in turn.
Step4. Working with the “Run” sub-menu
The Run sub-menu implements all the operations related to mapping QTLs.
Some suggestions:
Select “2. Map main-effect QTL” to run;
Change the “Genomic range” of “setting mapping ranges” to “All” (i.e. the whole genome);
Change the “No” for “For all testing points” of “How to save results” to “Yes”;
Run filtration, Bayesian test and calculation.
Four main files are created on drive D: of your computer: mletest.qtl, mletest.flq, mletest.bye and mletest.ctq
Step5. Working with the “Output” sub-menu
The Output sub-menu is designed for processing the original result file from mapping QTLs with additive/epistatic effects, so that the manual work needed to present the original results is greatly reduced. In addition, the Output sub-menu can be used to obtain the results of hypothesis tests using some special methods.
You can make the LR graph file (mletest.plt) at this step.
Run the Wgnuplot software (packaged with QTLMapper) and open the mletest.plt file.
Compare the LR graph with the other two graphs created by WinQTLCart.
Step6. Understanding the files created by QTLMapper 1.0
Several kinds of result files are created when mapping QTLs with QTLMapper 1.0. To make inferences about the putative QTLs for the traits under study, the user needs to understand the contents of these result files. In general, every result file consists of two portions: a description of the conditions under which the result was obtained, and the result body. These files usually end with the word “End”.
Focus on the mletest.flq file (open it with Notepad).

QUESTIONS FOR DISCUSSION
1. How many QTLs can be mapped in the rice experiment (see the riceQTL files)? Where are they located?

Experiment 2. Analysis of DNA Sequence (2 class hours)

THE PURPOSE
1. Molecular databases: you will learn how to use and understand the molecular databases that store the wealth of information so useful to the molecular biologist, including how to find and retrieve sequences in public databases, how to read the coding of database entries, etc.
2. Similarity searching: perform your own similarity searches of provided “unknown” sequences
on the nucleotide databases with BLAST, the most popular sequence alignment search tool. You have been given an unknown sequence to identify, but no clues as to what it is. The provider wants an unbiased opinion.

LIST OF MATERIALS AND TOOLS
SOD gene sequences;
Two unknown sequences:
AAAAGAAAAGGTTAGAAAGA
TGAGAGATGATAAAGGGTCCATTTGAGGTTAGGTAATATGGTTTGGTATCCCTGTAGTTAAAAGTTTTTGTCTTATTTTAGAATACTGTGATCTATTTCTTTAGTATTAATTTTTCCTTCTGTTTTCCTCATCTAGGGAACCCCAAGAGCATCCAATAGAAGCTGTGCAATTATGTAAAATTTTCAACTGTCTTCCTCAAAATAAAGAAGTATGGTAATCTTTACCTGTATACAGTGCAGAGCCTTCTCAGAAGCACAGAATATTTTTATATTTCCTTTATGTGAATTTTTAAGCTGCAA
ATCTGATGGCCTTAATTTCCTTTTTGACACTGAAAGTTTTGTAAAAGAAATCATGTCCATACACTTTGTTGCAAGATGTGAATTATTGACACTGAACTTAATAACTGTGTACTGTTCGGAAGGGGTTCCTCAAATTTTTTGACTTTTTTTGTATGTGTGTTTTTTCTTTTTTTTTAAGTTCTTATGAGGAGGGGAGGGTAAATAAACCACTGTGCGTCTTGGTGTAATTTGAAGATTGCCCCATCTAGACTAGCAATCTCTTCATTATTCTCTGCTATATATAAAACGGTGCTGTGAGGG
AGGGGAAAAGCATTTTTCAATATATTGAACTTTTGTACTGAATTTTTTTGTAATAAGCAATCAAGGTTATAATTTTTTTTAAAATAGAAATTTTGTAAGAAGGCAATATTAACCTAATCACCATGTAAGCACTCTGGATGATGGATTCCACAAAACTTGGTTTTATGGTTACTTCTTCTCTTAGATTCTTAATTCATGAGGAGGGTGGGGGAGGGAGGTGGAGGGAGGGAAGGGTTTCTCTATTAAAATGCATTCGTTGTGTTTTTTAAGATAGTGTAACTTGCTTAAATTTCTTATGTG
ACATTAACAAATAAAAAAGCTCTTTTAATATTAGATAAGTCCGGCCTGGGCGACAGAGCAAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Databases: the three main public databases (GenBank, EMBL and DDBJ).
Tools: PubMed; Entrez and SRS; BLAST

PROCEDURE
Step1: Obtaining a sequence of interest
There are many ways to obtain a sequence of interest (the SOD gene in this experiment):
Search GenBank (or EMBL and DDBJ) for sequences of interest (US).
Search PubMed, a public version of full Medline, for topics of interest (US).
Search a variety of sequence and structure databases using the SRS server at EMBL (Germany) or the Entrez server at NCBI (US).
Step2: Reading database entries (records)
Step3: Finding similar sequences in the databases with BLAST
1. Go to the EXPASY mirror site at Peking University.
2. Enter the EXPASY (EMBnet) Basic BLAST Server WWW page.
If for any reason you cannot access the EXPASY BLAST server, you can use any other BLAST server.
3. Select the program: BLASTN
This is the BLAST program that compares a nucleotide query sequence against a nucleotide database.
4. Select the DNA database: All EMBL (without HTG and ESTs)
This is the main EMBL nucleotide database.
5. Ignore the matrix option.
It is not used by BLASTN.
6. Select the sequence input format: Plain Text
You will be submitting the nucleotide sequence in plain text.
7. Select the following:
Gapped Alignment: ON; BLAST filter: ON; Graphic Output: ON.
These are all ON by default.
8. Paste the query sequence into the specified area.
9. Hit the button: Run BLAST
10. Wait while your query is processed by the server.
11. Examine the output.
Step4: Reading the output of BLAST
Step5: Understanding BLAST
1. Copy the second unknown query sequence into the pasting window and run the same Basic BLAST search. Examine
the result.

QUESTIONS FOR DISCUSSION
1. There are three main searching programs (BLAST, FASTA and BLITZ) available. Which program is best to use for a given type of sequence?
2. Explain the result of Step5.

Experiment 3. Analysis of Protein Sequence (2 class hours)

THE PURPOSE
1. Protein sequence databases: There are two major, non-specialised protein databases that you will frequently encounter: PIR and SWISS-PROT. Unlike the three major nucleotide databases, the entries in PIR and SWISS-PROT are not mirrored (copied). Each one has its advantages and disadvantages, which you should consider before deciding which database to search. You will learn how to use and understand the entries in those databases.
2. Protein database searching: Protein database searching is the most important method to master. It is between two and five times more sensitive than DNA database searching. Perform similarity searches in a protein database of your choice with the same programs (BLAST or FASTA).

LIST OF MATERIALS AND TOOLS
Protein IDs CAA32643.1 and CAA00826.1
A human amino acid sequence:
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDVNLTHIESRPSRLKK
DEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSEKPKLLPLELEKTAI
QNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK
Databases: PIR and SWISS-PROT; PDB and PAHdb
Tools: BLAST or FASTA

PROCEDURE
Step1. Obtaining the sequences of interest and examining the results (not necessary in this experiment)
Search PubMed, a public version of full Medline, for topics of interest (US).
Search a variety of sequence and structure databases using the SRS server at EMBL (Germany) or the Entrez server at NCBI (US).
Search PIR (or SWISS-PROT) for sequences of interest (US).
Step2. An exercise in searching
In the exercise given below, you will integrate the knowledge you have gained from the last experiment and from the classroom. You should also realize how easy it is to use other databases and related sources of information, particularly now that you have an understanding of the molecular databases.
1. Go to a sequence alignment program of your choice. You might choose
to use:
The EXPASY BLAST server.
Or the GeneStream FASTA server.
2. Copy the human amino acid sequence (given in the one-letter code).
3. Paste the sequence into the query sequence window and adjust the options as necessary. You won't need to specify advanced options, but you should choose a program and a database. For simplicity, please use the main SWISS-PROT database. You may wish to try other databases, but you should return to SWISS-PROT when continuing with this exercise.
4. Run the search and identify the protein. Select the following:
Gapped Alignment: ON; BLAST filter: ON; Graphic Output: ON.
Matrix: blosum62
5. Use the link provided to see the SWISS-PROT report. If the link fails for any reason, you can do a text search of SWISS-PROT. Go to SWISS-PROT and search by the identifier you found in the BLAST or FASTA search.
Step3. Answer the following questions and check your answers
Now, try to answer all of the questions below. You may need to look at pages that are linked from the SWISS-PROT report, but you will not need to search further than the first page of any site. Answering all of the questions may take some time, but you will get a feel for what is available and how to get it. You may even find yourself becoming fascinated by the report, and exploring on your own! Write down your answers, and check them against the correct answers that follow.
1. What is the SWISS-PROT name of the entry?
2. What is the SWISS-PROT primary accession number?
3. What is the most common name of the protein?
4. What is the gene called?
5. In which year was the crystal structure of the catalytic domain determined? Name the first author.
6. Does the enzyme require a co-factor to function? If so, what?
7. Name the most common disease that arises as a result of deficiency of this enzyme.
8. At which cytogenetic locus does the gene reside? (e.g. 13p10.1)
9. What is the PAHdb?
10. How many amino acid residues are there in the protein?
11. What is the molecular weight of the protein?
12. More tasks (if you can): Look briefly at the entries in GeneCards and MIM (Mendelian Inheritance in Man), obtain the nucleic acid sequence and locate a FASTA report for the protein sequence. View a three-dimensional (3D) image of the protein that the gene codes for (Hint: PDB stores such files!).

Exercise answers:
What is the SWISS-PROT name of the entry?
PH4H_HUMAN
What is the SWISS-PROT primary acce