A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank

dc.contributor.author: Abebe, Michael
dc.contributor.author: Candales, Manuel A
dc.contributor.author: Duong, Adrian
dc.contributor.author: Hood, Keyar S
dc.contributor.author: Li, Tony
dc.contributor.author: Neufeld, Ryan AE
dc.contributor.author: Shakenov, Abat
dc.contributor.author: Sun, Runda
dc.contributor.author: Wu, Li
dc.contributor.author: Jarding, Ashley M
dc.contributor.author: Semper, Cameron
dc.contributor.author: Zimmerly, Steven
dc.date.accessioned: 2016-04-15T22:41:35Z
dc.date.available: 2016-04-15T22:41:35Z
dc.date.issued: 2013-12
dc.description.abstract: Background: Accurate and complete identification of mobile elements is a challenging task in the current era of sequencing, given their large numbers and frequent truncations. Group II intron retroelements, which consist of a ribozyme and an intron-encoded protein (IEP), are usually identified in bacterial genomes through their IEP; however, the RNA component that defines the intron boundaries is often difficult to identify because of a lack of strong sequence conservation corresponding to the RNA structure. Compounding the problem of boundary definition is the fact that a majority of group II intron copies in bacteria are truncated. Results: Here we present a pipeline of 11 programs that collect and analyze group II intron sequences from GenBank. The pipeline begins with a BLAST search of GenBank using a set of representative group II IEPs as queries. Subsequent steps download the corresponding genomic sequences and flanks, filter out non-group II introns, assign introns to phylogenetic subclasses, filter out incomplete and/or non-functional introns, and assign IEP sequences and RNA boundaries to the full-length introns. In the final step, the redundancy in the data set is reduced by grouping introns into sets of ≥95% identity, with one example sequence chosen to be the representative. Conclusions: These programs should be useful for comprehensive identification of group II introns in sequence databases as data continue to rapidly accumulate. [en_US]
dc.description.refereed: Yes [en_US]
dc.identifier.citation: Abebe, M., Candales, M.A., Duong, A., Hood, K.S., Li, T., Neufeld, R.A.E., Shakenov, A., Sun, R., Wu, L., Jarding, A.M., Semper, C., Zimmerly, S. (2013) A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank. Mobile DNA 4: 28. doi:10.1186/1759-8753-4-28 [en_US]
dc.identifier.doi: 10.1186/1759-8753-4-28
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/35159
dc.identifier.uri: http://hdl.handle.net/1880/51151
dc.language.iso: en [en_US]
dc.publisher.department: Biological Sciences [en_US]
dc.publisher.faculty: Science [en_US]
dc.publisher.institution: University of Calgary [en_US]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Bacteria [en_US]
dc.subject: Genomes [en_US]
dc.subject: Retroelement [en_US]
dc.subject: Reverse transcriptase [en_US]
dc.subject: Ribozyme [en_US]
dc.title: A pipeline of programs for collecting and analyzing group II intron retroelement sequences from GenBank [en_US]
dc.type: journal article
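
The final step described in the abstract, grouping introns into sets of ≥95% identity with one representative per set, can be illustrated with a small sketch. This is not the pipeline's own code: the greedy strategy, the function names `identity` and `cluster_by_identity`, and the use of Python's `difflib` ratio as a stand-in for alignment-based percent identity are all assumptions made for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Approximate similarity in [0, 1].

    Assumption: difflib's ratio stands in for a real alignment-based
    percent identity, which a production pipeline would compute instead.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_identity(
    seqs: dict[str, str], threshold: float = 0.95
) -> dict[str, list[str]]:
    """Greedy clustering (hypothetical helper).

    Each sequence joins the first existing representative it matches at
    >= threshold; otherwise it becomes the representative of a new set.
    """
    clusters: dict[str, list[str]] = {}
    # Visit longest sequences first so representatives tend to be full length;
    # ties are broken by name to keep the result deterministic.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]
    return clusters
```

Calling `cluster_by_identity({"a": seq_a, "b": seq_b})` returns a dictionary keyed by representative name, with each value listing the members of that set. A greedy pass is order dependent, so a different visiting order can yield different representatives.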
Files
Original bundle
Name: mobile dna-pipeline.pdf
Size: 754.92 KB
Format: Adobe Portable Document Format