As a member of the Buerkle lab this summer, I have been involved in the assembly and annotation of 454-generated EST sequences for lodgepole pine (Pinus contorta), one of the most ecologically and economically important species in the Rocky Mountains. This EST collection contains more than 500,000 sequences generated from a normalized cDNA collection and appears to represent a substantial fraction of the expressed genes in lodgepole pine. In particular, I have worked with Dr. Tom Parchman on characterizing the sequences in this resource to facilitate the identification of gene-based markers for investigating the genetics of cone serotiny, the fire-stimulated release of seeds from cones. My work has focused primarily on the annotation and characterization of the contigs and singletons that resulted from the assembly of the 454 sequences. This includes identifying the genes contained in the sequence collection through searches based on sequence similarity to known proteins, in order to characterize and enumerate the unique genes sequenced. This work has relied on a variety of computational tools, including BLAST comparisons of assembled reads against the UniRef50 and TAIR annotated protein databases and a set of custom Perl scripts that I wrote for unique gene identification, redundancy and retrotransposon filtering, and data management. In addition, I have developed Perl scripts for extracting assembly statistics and for discovering and enumerating LTR retrotransposons in the lodgepole 454 data. This work will be included in a manuscript on the characterization of the lodgepole pine transcriptome, and the resource will be used to develop thousands of genetic markers. I will continue this research and my collaboration with Drs. Parchman and Buerkle, and my senior thesis at Beloit College will be based on this work.