SNP Pipeline
- Developer: Brad Sickler
- Started: October 2006
- Completed: March 2007
Overview
Prof. David G. Smith, of the UC Davis Department of Anthropology, studies QTL linkage and population dynamics in rhesus macaque monkeys. This work requires a library with a sufficient number of genetic markers. The R. macaque draft genome is close to completion (~6.1X coverage), but only a small number of markers are currently available. This project was started as the first of its kind to use 454 pyrosequencing technology to generate a large pool of candidate SNP/Indel markers.
454’s sequencing technology can produce hundreds of thousands of reads, each around 100 bp long, sampled randomly across the entire genome.
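Because the reads land at random positions, the expected depth and genome fraction they cover can be estimated with the standard Lander-Waterman approximation: mean depth c = N·L/G, and the expected fraction of bases covered by at least one read is 1 − e^(−c). The sketch below applies that formula and also samples error-free reads at uniformly random start positions, as a simple stand-in for a simulated run. The function names, the uniform-sampling model, and the example numbers in the `__main__` block (200,000 reads, ~2.9 Gb genome) are illustrative assumptions, not figures or code from this project.

```python
import math
import random


def expected_coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Mean per-base depth c = N * L / G (Lander-Waterman)."""
    return n_reads * read_len / genome_size


def fraction_covered(c: float) -> float:
    """Expected fraction of bases hit by at least one read: 1 - e^(-c)."""
    return 1.0 - math.exp(-c)


def simulate_reads(genome: str, n_reads: int, read_len: int,
                   seed: int = 0) -> list[tuple[int, str]]:
    """Sample (start, sequence) reads with uniformly random starts.

    Assumption: no sequencing errors and no homopolymer artifacts,
    so this only models where reads land, not 454 error profiles.
    """
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # inclusive upper bound
        reads.append((start, genome[start:start + read_len]))
    return reads


if __name__ == "__main__":
    # Hypothetical run: 200,000 reads of 100 bp on a ~2.9 Gb genome.
    c = expected_coverage(200_000, 100, 2_900_000_000)
    print(f"mean depth {c:.4f}, fraction covered {fraction_covered(c):.4%}")
```

At these hypothetical numbers the mean depth is well below 1X, so almost every sampled position is seen by a single read; that is why this kind of survey finds markers by comparing individual reads against the reference rather than by deep pileups.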
Project Goals
- 1. Analyze the feasibility of using 454-generated sequences to identify potential polymorphisms. Specifically, analyze existing sets of 454 data and simulate a run on the R. macaque genome to verify experimental validity. 454 Distribution | 454 Simulation
- 2. Develop, test, and tune a computational pipeline for SNP and Indel discovery by comparing 454 sequences against a reference genome.
- 3. Develop a set of visualization and analysis tools to facilitate candidate selection for sequencing and polymorphism verification. http://mamusnp.genomecenter.ucdavis.edu
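The core comparison in goal 2 can be illustrated with a minimal mismatch scan: for reads already aligned to the reference, count high-quality bases that differ from the reference base at each position and report those positions as candidate SNPs. This is a generic sketch, not the project's actual pipeline; it assumes ungapped alignments (so it cannot find Indels, which need gapped alignment), Phred base qualities, and hypothetical names (`AlignedRead`, `candidate_snps`, `min_qual`, `min_alt_reads`).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlignedRead:
    start: int          # 0-based reference position of the first base
    seq: str            # read bases, assumed ungapped against the reference
    quals: list[int]    # Phred base qualities, one per base


def candidate_snps(reference: str, reads: list[AlignedRead],
                   min_qual: int = 20, min_alt_reads: int = 1):
    """Return {pos: (ref_base, alt_base, alt_count, depth)} for mismatch sites.

    Bases below min_qual are ignored entirely; a site is reported when its
    most frequent non-reference base is seen in at least min_alt_reads reads.
    """
    depth = defaultdict(int)
    alt_counts = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for offset, (base, q) in enumerate(zip(read.seq, read.quals)):
            pos = read.start + offset
            if pos >= len(reference) or q < min_qual:
                continue  # skip off-reference or low-quality bases
            depth[pos] += 1
            if base != reference[pos] and base != "N":
                alt_counts[pos][base] += 1
    calls = {}
    for pos, alts in alt_counts.items():
        # Pick the most frequent alternate allele at this position.
        alt, count = max(alts.items(), key=lambda kv: kv[1])
        if count >= min_alt_reads:
            calls[pos] = (reference[pos], alt, count, depth[pos])
    return calls


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [
        AlignedRead(0, "ACGTTCGT", [30] * 8),
        AlignedRead(2, "GTTCGTAC", [30] * 8),
        AlignedRead(4, "ACGT", [10] * 4),  # low quality, filtered out
    ]
    print(candidate_snps(ref, reads))
```

With sub-1X coverage, `min_alt_reads=1` is the only practical setting, so a real pipeline would lean heavily on base quality and alignment uniqueness to keep false positives down; the thresholds here are placeholders.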
Final Status
All goals were met as of November 2006. The completed pipeline is fully reusable, and all generated data and queries are stored in a relational database. Overall, the pipeline identified 22,892 candidate SNPs and 2,923 candidate Indels from two initial 454 runs. Preliminary resequencing results show that over 60% of the tested candidate SNPs were verified. SNP and Indel tracks were developed for the UCSC Genome Browser, and all SNP results are available online at http://mamusnp.genomecenter.ucdavis.edu.
With the success of the project, Dr. Smith is submitting a 5-year grant proposal to the NIH to develop markers using this method. The methods used to generate these markers have been published in PLoS ONE: MamuSNP: A Resource for Rhesus Macaque (Macaca mulatta) Genomics, 2007.
Time Estimates
285 total working hours.