306 
Viroverse: A Research Database and Bioinformatics Analysis Framework
Brandon Maust*1, W Deng1, J Stoddard1, Z Frazier2, M Guerquin1, G Learn1, R Samudrala1, R Bumgarner1, and J Mullins1
1Univ of Washington, Seattle, US and 2Univ of Southern California, Los Angeles, US
Background: Groups around the world have collected
pathogen and human gene sequence data and relevant clinical and laboratory
parameters from large cohorts. While these data are useful for their original
purpose in supporting basic research, researchers who wish to mine them face
major obstacles. Obtaining data often requires repository-specific request
procedures, including submission of a specific hypothesis, and heterogeneously
collected data must be converted to a useful common format and encoding before
a study can begin.
Methods: Using the Seattle Primary Infection Project and Multi-Center
AIDS Cohort Study (MACS) cohorts as prototypes, we developed a database and
toolkit, which together constitute a software infrastructure for the
acquisition, retention, and evaluation of clinical, laboratory, and genetic
data derived from human hosts and their infecting pathogens. Focusing on HIV,
we built a highly normalized relational database structure specific to the
molecular biology and attendant data of viral pathogens and a series of tools
to capture experimental data and couple it to analysis.
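As a rough illustration of what a normalized relational layout for this kind of data could look like, the sketch below links subjects to samples and samples to viral sequences. It is not the Viroverse schema: every table name, column name, and helper function (`build_demo_db`, `sequences_for_cohort`) is a hypothetical assumption, the rows are made-up toy values, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical, simplified schema. Table and column names are illustrative
# assumptions and do not reflect the actual Viroverse design.
SCHEMA = """
CREATE TABLE subject (
    subject_id INTEGER PRIMARY KEY,
    cohort     TEXT NOT NULL
);
CREATE TABLE sample (
    sample_id    INTEGER PRIMARY KEY,
    subject_id   INTEGER NOT NULL REFERENCES subject(subject_id),
    collected_on TEXT NOT NULL
);
CREATE TABLE sequence (
    sequence_id INTEGER PRIMARY KEY,
    sample_id   INTEGER NOT NULL REFERENCES sample(sample_id),
    gene        TEXT NOT NULL,
    residues    TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up subject, sample, and sequence."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO subject VALUES (1, 'MACS')")
    conn.execute("INSERT INTO sample VALUES (10, 1, '2001-05-14')")
    conn.execute("INSERT INTO sequence VALUES (100, 10, 'env', 'ATGAGAGTGAAG')")
    conn.commit()
    return conn


def sequences_for_cohort(conn, cohort):
    """Return (subject_id, collected_on, gene, residues) rows for one cohort."""
    rows = conn.execute(
        """
        SELECT s.subject_id, sa.collected_on, sq.gene, sq.residues
        FROM subject s
        JOIN sample sa ON sa.subject_id = s.subject_id
        JOIN sequence sq ON sq.sample_id = sa.sample_id
        WHERE s.cohort = ?
        ORDER BY sq.sequence_id
        """,
        (cohort,),
    )
    return rows.fetchall()
```

Because each fact is stored once and joined through keys, a sequence can always be traced back to the sample and subject it came from, which is the property a normalized design is meant to provide.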
Results: This database, Viroverse, currently includes >1800 subject
records, most of which have complete information, including medical history,
demographics, laboratory tests, risk assessment and sexual behavior data, host
genetic markers, viral gene sequences, and cytotoxic T lymphocyte (CTL)
recognition data. Existing data were assembled from a variety of heterogeneous
formats using a generalized data loading interface. Additional entry forms
capture experimental data from sample acquisition onward. EpitopeDB
is a specialized interface to collect and query subject data from enzyme-linked
immunosorbent spot assay (ELISpot) reactivity experiments. Diver is
an automated interface to standard phylogenetic analyses of gene sequences.
Conclusions: Viroverse is proving to be a useful tool for handling large
amounts of data generated within a lab and through collaborations with other
groups. A consistent analytical framework and means for interchanging
information on viral infections facilitate efficient manual exploration and
extraction of data and development of new tools for discovery-based data mining
approaches. The former will facilitate hypothesis testing while the latter will
allow rapid exploration of statistically significant correlations that, in
turn, will generate novel hypotheses and increase the pace at which scientific
exploration can proceed.