Surveying Features of Spherical Viruses
The main research goal was to investigate the relationship between gauge points and other capsid properties. Data were collected from three online databases: ViperDB (VIrus Particle ExploreR database), RCSB (Research Collaboratory for Structural Bioinformatics), and SCOP (Structural Classification of Proteins). I wrote code to streamline and automate this process, analyze the capsid data, and compile a database, and then I built a website to visualize the results.
Dr. Wilson already had a MATLAB script called SC_frankencode.m (which I will refer to as GP.m) that finds gauge points given a capsid's PDB (Protein Data Bank) coordinates from ViperDB, so I wrote scripts to perform other analyses on PDB and point array files. The first was find_aas.py, which detects the amino acids within 5 Å of each point in a given point array and writes the results to an Excel file. I then wrote a shell script to loop over a folder of point arrays, and later used pyinstaller to create an executable of the Python script, along with a correspondingly adapted version of the point-array-folder script.
Next I created a pipeline that downloads all of the capsid coordinate files from ViperDB and performs every step involved in collecting the gauge point and amino acid data. The pipeline creates a list of all PDB IDs using the API, downloads the coordinate file for each ID, and then runs, in order: makeicos.pl to create the full capsid from the AU in the coordinate file, extract_coords.pl to get the XYZ coordinates, pdb_indo.pl to load the capsid, my version of GP.m, which writes its output to an Excel file (along with the points of the 5 point arrays closest to protrusions), and find_aas.py on each point array. This produced ~1,200 Excel files (available here; the full_*.xlsx files contain amino acid data, and the others contain gauge point data).
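The proximity check described for find_aas.py can be illustrated with a minimal sketch. This is not the original script: the function names parse_atoms and residues_near_points are hypothetical, the point array is assumed to be a plain list of (x, y, z) tuples, and the result is returned as a dictionary rather than written to an Excel file. The only parts taken from the text are the 5 Å cutoff and the use of PDB coordinates, which are read here from the standard fixed-width ATOM record columns.

```python
import math

CUTOFF = 5.0  # angstroms; the threshold stated in the text


def parse_atoms(pdb_text):
    """Return (res_name, chain, res_seq, (x, y, z)) for each ATOM record.

    Uses the fixed-column layout of PDB ATOM records.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            res_name = line[17:20].strip()
            chain = line[21]
            res_seq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((res_name, chain, res_seq, xyz))
    return atoms


def residues_near_points(atoms, points, cutoff=CUTOFF):
    """Map each point index to the sorted residues with an atom within cutoff.

    A residue is reported once per point, even if several of its atoms
    fall inside the cutoff radius.
    """
    result = {}
    for i, point in enumerate(points):
        near = set()
        for res_name, chain, res_seq, xyz in atoms:
            if math.dist(point, xyz) < cutoff:
                near.add((chain, res_seq, res_name))
        result[i] = sorted(near)
    return result
```

This brute-force loop compares every point with every atom, which is simple but slow for a full capsid; a spatial index such as a k-d tree would be a natural substitute if speed matters.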
I made an IPython notebook, xlfiles_json.ipynb, to compile a database from these results, additional data scraped from SCOP, RCSB, and ViperDB, and the output of some other analysis scripts (detailed in the data section below). The last stage of my work has focused on building a website where you can search, filter, and create visualizations of the information stored in the database.
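The compilation step can be sketched as a merge of per-capsid records keyed by PDB ID. This is only an illustration under assumptions, not the contents of xlfiles_json.ipynb: the function names merge_records and to_json are hypothetical, each source is assumed to have already been parsed into a dictionary of the form {pdb_id: {field: value}}, and the output is assumed to be JSON (suggested by the notebook's name).

```python
import json


def merge_records(*sources):
    """Merge dicts of {pdb_id: {field: value}} into one record per PDB ID.

    IDs are lower-cased so that "1ABC" and "1abc" refer to the same entry;
    later sources add fields to, or overwrite fields of, earlier ones.
    """
    merged = {}
    for source in sources:
        for pdb_id, fields in source.items():
            merged.setdefault(pdb_id.lower(), {}).update(fields)
    return merged


def to_json(merged):
    """Serialize the merged database as pretty-printed JSON."""
    return json.dumps(merged, indent=2, sort_keys=True)
```

Keying every source on the PDB ID keeps the merge independent of the order in which the Excel files and scraped pages are processed, apart from which value wins when two sources supply the same field.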