Parallel Algorithm Design for Compression of DNA Data Files
Abstract
In genetic research, processing and analyzing DNA sequences is essential. Individual sequences can be very long, and researchers often must analyze large numbers of them. Fast, efficient algorithms for compressing and decompressing these files are therefore important, because they make processing, sharing, and analyzing the data as efficient as possible. DNA sequences are commonly stored in FastQ files, in which each record contains a sequence identifier line, the DNA sequence itself, a separator line, and a line of per-base quality scores. The goal of my research was to build a fast, efficient compression algorithm that uses parallel programming techniques to speed up the processing and sharing of these DNA sequences. Over the course of the research, I analyzed three compression algorithms, selected the one I judged best suited to FastQ files, and modified it to run in parallel.