Please use this identifier to cite or link to this item: http://hdl.handle.net/1880/51151
Title: A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank
Authors: Abebe, Michael
Candales, Manuel A
Duong, Adrian
Hood, Keyar S
Li, Tony
Neufeld, Ryan AE
Shakenov, Abat
Sun, Runda
Wu, Li
Jarding, Ashley M
Semper, Cameron
Zimmerly, Steven
Keywords: Bacteria;Genomes;Retroelement;Reverse transcriptase;Ribozyme
Issue Date: Dec-2013
Citation: Abebe, M., Candales, M.A., Duong, A., Hood, K.S., Li, T., Neufeld, R.A.E., Shakenov, A., Sun, R., Wu, L., Jarding, A.M., Semper, C., Zimmerly, S. (2013) A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank. Mobile DNA 4: 28. doi:10.1186/1759-8753-4-28
Abstract: Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate.
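Illustration (not from the article): the final redundancy-reduction step described in the abstract, grouping introns into sets of ≥95% identity with one representative per set, could be sketched roughly as below. This is a minimal greedy sketch under stated assumptions, not the authors' program. The function names `identity` and `cluster_by_identity` are hypothetical, the sequences are assumed to be pre-aligned to equal length, and the first sequence placed in a set is assumed to serve as its representative.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences.

    Assumption: a simple position-by-position comparison stands in for
    whatever identity measure the actual pipeline uses.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def cluster_by_identity(seqs: Dict[str, str], threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedily group sequences; each key is a representative, each value its set."""
    clusters: Dict[str, List[str]] = {}
    for name, seq in seqs.items():
        for rep in clusters:
            # Join the first existing set whose representative is similar enough.
            if identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is close enough: start a new set with this sequence.
            clusters[name] = [name]
    return clusters
```

Calling `cluster_by_identity` on a dictionary mapping intron names to aligned sequences returns a dictionary keyed by representative name. A greedy pass like this depends on input order, so a real pipeline may choose representatives differently.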
URI: http://hdl.handle.net/1880/51151
Appears in Collections: Zimmerly, Steven

Files in This Item:
File: mobile dna-pipeline.pdf
Size: 754.92 kB
Format: Adobe PDF


This item is licensed under a Creative Commons License.