At the Intersection of Computer Science and Biology: Using Bioinformatics in Analyzing, Visualizing, and Exploring Genome-Wide Patterns
Keselring, Alex
The first part of my main project over the summer consisted of creating a
database of insertions and deletions of nucleotide bases (indels), as well as single
nucleotide polymorphisms (SNPs), for 23 different dog breeds. A SNP is a
variation in the DNA sequence at a single nucleotide base relative to a
reference organism. In our case, the reference breed was the first dog to have
its DNA sequenced, a Boxer. The database was built in MySQL and accessed through
Python and Java APIs. Creating this database and connecting it to other online
servers made up the "bioinformatics" portion of my project.
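The report does not describe the database layout, so the following is only a minimal sketch of how SNP and indel records for several breeds might be stored and queried by genomic region. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name `variant`, its columns, and the helper functions are hypothetical.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not the
# project's. sqlite3 stands in for MySQL so the sketch runs without a server.
SCHEMA = """
CREATE TABLE variant (
    id         INTEGER PRIMARY KEY,
    breed      TEXT    NOT NULL,   -- one of the breeds in the study
    chromosome TEXT    NOT NULL,   -- e.g. 'chr1'
    position   INTEGER NOT NULL,   -- coordinate on the Boxer reference
    kind       TEXT    NOT NULL CHECK (kind IN ('SNP', 'INS', 'DEL')),
    ref_allele TEXT    NOT NULL,   -- base(s) in the reference at this position
    alt_allele TEXT    NOT NULL    -- base(s) observed in the breed
);
CREATE INDEX idx_variant_locus ON variant (breed, chromosome, position);
"""


def open_db(path=":memory:"):
    """Open (or create) a database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_variant(conn, breed, chromosome, position, kind, ref, alt):
    """Insert one SNP or indel record; placeholders avoid SQL injection."""
    conn.execute(
        "INSERT INTO variant (breed, chromosome, position, kind, ref_allele, alt_allele)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (breed, chromosome, position, kind, ref, alt),
    )


def variants_in_range(conn, breed, chromosome, start, end):
    """Return (position, kind) for one breed's variants with start <= position < end."""
    rows = conn.execute(
        "SELECT position, kind FROM variant"
        " WHERE breed = ? AND chromosome = ? AND position >= ? AND position < ?"
        " ORDER BY position",
        (breed, chromosome, start, end),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_db()
    # Made-up example rows for hypothetical breeds, not data from the project.
    add_variant(conn, "breed_A", "chr1", 1200, "SNP", "A", "G")
    add_variant(conn, "breed_A", "chr1", 1350, "DEL", "CT", "C")
    add_variant(conn, "breed_A", "chr2", 500, "SNP", "T", "C")
    print(variants_in_range(conn, "breed_A", "chr1", 1000, 2000))
    # [(1200, 'SNP'), (1350, 'DEL')]
```

The composite index on (breed, chromosome, position) is one reasonable choice here, since most downstream questions ("which variants does this breed carry in this region?") filter on exactly those three columns.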
The second part of my main project concerned making the information in
the database actionable so as to reveal genome-wide patterns. First, the data
had to be analyzed, so I developed a "sliding window" algorithm that "slides"
along the genome and counts SNPs and/or indels in each interval for any
specific breed(s). Then, to visualize the patterns, I displayed the counts per
window as a heat map, in which areas of high SNP/indel frequency are red, areas
of medium frequency are white, and areas of low frequency are blue.
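The sliding-window count and the blue-white-red coloring described above could be sketched roughly as follows. This is a minimal illustration, not the project's code: the (breed, position, kind) input format, the window size and step, 0-based positions, and the linear color scale between the lowest and highest window counts are all assumptions, and the function names are hypothetical.

```python
from bisect import bisect_left


def select_positions(variants, breeds, kinds=("SNP", "INS", "DEL")):
    """Keep positions of variants from the chosen breed(s) and variant kinds.

    variants: iterable of (breed, position, kind) tuples on one chromosome.
    Each (breed, variant) pair is counted separately; wrap the result in
    set() first if a site shared by several breeds should count once.
    """
    return [pos for breed, pos, kind in variants if breed in breeds and kind in kinds]


def sliding_window_counts(positions, chrom_length, window_size, step):
    """Count positions in windows [start, start + window_size) every `step` bases.

    Assumes 0-based positions in [0, chrom_length). Windows near the end are
    truncated at chrom_length. step < window_size gives overlapping windows.
    Returns a list of (window_start, count).
    """
    pos = sorted(positions)
    counts = []
    start = 0
    while start < chrom_length:
        end = min(start + window_size, chrom_length)
        # Binary search: number of sorted positions p with start <= p < end.
        counts.append((start, bisect_left(pos, end) - bisect_left(pos, start)))
        start += step
    return counts


def heat_color(count, low, high):
    """Map a count to (R, G, B): blue at `low`, white at the midpoint, red at `high`."""
    if high <= low:
        return (255, 255, 255)  # all windows equal: nothing to contrast
    t = max(0.0, min(1.0, (count - low) / (high - low)))  # clamp into [0, 1]
    if t < 0.5:
        v = round(255 * (t / 0.5))  # blue (0, 0, 255) -> white
        return (v, v, 255)
    v = round(255 * (1 - (t - 0.5) / 0.5))  # white -> red (255, 0, 0)
    return (255, v, v)


if __name__ == "__main__":
    # Made-up variants for hypothetical breeds on a 1,000-base toy chromosome.
    variants = [("breed_A", 120, "SNP"), ("breed_A", 450, "DEL"),
                ("breed_A", 460, "SNP"), ("breed_B", 470, "SNP"),
                ("breed_A", 910, "INS")]
    windows = sliding_window_counts(select_positions(variants, {"breed_A"}),
                                    chrom_length=1000, window_size=250, step=250)
    lo = min(c for _, c in windows)
    hi = max(c for _, c in windows)
    for start, c in windows:
        print(start, c, heat_color(c, lo, hi))
    # 0 1 (255, 255, 255)
    # 250 2 (255, 0, 0)
    # 500 0 (0, 0, 255)
    # 750 1 (255, 255, 255)
```

Sorting once and using binary search keeps each window's count at O(log n), so overlapping windows (a step smaller than the window size) stay cheap. For the actual plot, a plotting library's diverging colormap, such as matplotlib's built-in `bwr`, gives the same blue-white-red scale without a hand-written color function.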
In addition to my main project, I helped other researchers and student
interns with their respective presentations. This work ranged from helping mine data
from online databases, which cut down on time spent "copying and pasting," to running scripts
and programs to analyze results. Together, my main project and my work with the other
researchers deepened my understanding
of where bioinformatics and computational biology play their part in modern
28 p.
U.S. copyright laws protect this material. Commercial use or distribution of this material is not permitted without prior written permission.