At the Intersection of Computer Science and Biology: Using Bioinformatics in Analyzing, Visualizing, and Exploring Genome-Wide Patterns
The first part of my main project over the summer consisted of creating a database of insertions and deletions of nucleotide bases (indels), as well as single nucleotide polymorphisms (SNPs), for 23 different dog breeds. A SNP is a variation in the DNA sequence at a single nucleotide base relative to a reference genome. In our case, the reference breed was the Boxer, the first dog to have its DNA sequenced. The database system was MySQL, accessed through Python and Java APIs. Creating this database and having it interact with other servers online was the "bioinformatics" portion of my project. The second part of my main project concerned making the information in the database actionable so as to ascertain genome-wide patterns. First, the data had to be analyzed, so I developed a "sliding window" algorithm that "slides" along the genome and counts the SNPs and/or indels in each interval for any chosen breed(s). Then, to visualize the patterns, I displayed the counts per window as a heat map in which areas of high SNP/indel frequency are red, areas of medium frequency are white, and areas of low frequency are blue. In addition to my main project, I also helped other researchers and student interns with their presentations. This ranged from helping to mine data from online databases (reducing the time spent "copying and pasting") to running scripts and programs to analyze results. Together, my main project and my work with the other researchers gave me a greater understanding of where bioinformatics and computational biology fit into modern research.
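The sliding-window counting and the three-color heat map described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the project: it assumes variants are available as hypothetical `(breed, chromosome, position, type)` tuples (in the project they were stored in MySQL), counts variants in half-open windows `[start, start + window_size)`, and maps counts onto a blue-white-red scale by linear interpolation between the lowest and highest window count. The function names, breed labels, and parameters are illustrative only.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical record layout (breed, chromosome, position, variant_type),
# standing in for rows that would come from the MySQL variant tables.
Variant = Tuple[str, str, int, str]


def positions_for(variants: Iterable[Variant], chrom: str, breeds: Set[str],
                  types: Optional[Set[str]] = None) -> List[int]:
    """Sorted positions on `chrom` for the chosen breeds (and variant types, if given)."""
    return sorted(
        pos for breed, c, pos, vtype in variants
        if c == chrom and breed in breeds and (types is None or vtype in types)
    )


def sliding_window_counts(positions: List[int], chrom_length: int,
                          window_size: int, step: int) -> List[Tuple[int, int, int]]:
    """Count positions in each half-open window [start, end) as it slides by `step`.

    `positions` must be sorted. The last window is clipped at `chrom_length`.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window_size, chrom_length)
        # Binary search gives the number of positions p with start <= p < end.
        n = bisect_left(positions, end) - bisect_left(positions, start)
        counts.append((start, end, n))
    return counts


def heat_color(count: int, lo: int, hi: int) -> Tuple[int, int, int]:
    """Map a window count to RGB: low -> blue, medium -> white, high -> red."""
    if hi == lo:
        return (255, 255, 255)  # no variation across windows: render everything white
    t = min(max((count - lo) / (hi - lo), 0.0), 1.0)  # normalize to [0, 1]
    if t < 0.5:
        v = round(255 * t / 0.5)  # blue (0, 0, 255) fading toward white
        return (v, v, 255)
    v = round(255 * (1 - (t - 0.5) / 0.5))  # white fading toward red (255, 0, 0)
    return (255, v, v)
```

For example, `sliding_window_counts([5, 12, 15], 30, 10, 10)` returns `[(0, 10, 1), (10, 20, 2), (20, 30, 0)]`. Setting `step` equal to `window_size` gives non-overlapping windows, while a smaller `step` gives overlapping windows that smooth the resulting pattern. Sorting the positions once and counting each window with `bisect` avoids rescanning every variant per window, which matters when the window parameters are varied across a whole genome.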