Session 59
Poster Presentations: Viral Genetic Diversity
Session Day and Time: Thursday 1:30-3:30 pm
Room: Hall A
Background: It has been hypothesized that the selective environment within the
CNS compartment differs from that of peripheral tissues. If so, virus
replicating within the CNS would be expected to develop genotypic and
phenotypic characteristics distinct from those of peripheral virus. The V3
genotype in particular is likely to differ, because the predominant target cell
type differs between the two compartments. We applied a battery of machine
learning techniques to discriminate between CSF- and plasma-derived V3 amino
acid sequences.
Methods:
All 188 CSF-derived V3 sequences in the Los Alamos HIV Sequence Database were
downloaded. A matching set of 188 plasma-derived V3 sequences was randomly
chosen from the same database. The combined set of sequences was aligned using
ClustalW. A subset of algorithms within the Weka
(Waikato Environment for Knowledge Analysis) machine learning suite was
applied to classify V3 amino acid sequences by compartment of origin. The
accuracy of every classifier tested was assessed by 10-fold cross-validation.
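The evaluation protocol above (classify aligned sequences, score by 10-fold cross-validation) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the study's pipeline: the abstract used Weka classifiers including a support vector machine, whereas here a 1-nearest-neighbour classifier with Hamming distance stands in only to keep the example self-contained. The helper names (`hamming`, `nearest_neighbor_predict`, `stratified_folds`, `cross_validated_accuracy`), the round-robin stratified fold assignment, and the toy sequences are all hypothetical.

```python
import random

def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_predict(train, query):
    """Label of the training sequence closest to `query` (1-NN, Hamming distance).
    Stand-in classifier for illustration; the study used Weka classifiers such as an SVM."""
    _, label = min(train, key=lambda item: hamming(item[0], query))
    return label

def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds, dealing each class out round-robin
    so that every fold keeps roughly the same CSF/plasma proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds

def cross_validated_accuracy(seqs, labels, k=10, seed=0):
    """Fraction of sequences classified correctly when each fold is held out in turn
    and the classifier sees only the remaining k-1 folds."""
    correct = 0
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [(seqs[i], labels[i]) for i in range(len(seqs)) if i not in held_out]
        correct += sum(nearest_neighbor_predict(train, seqs[i]) == labels[i] for i in fold)
    return correct / len(seqs)

# Hypothetical toy "aligned" sequences, NOT real V3 data.
csf = ["KKKKK", "KKKKR", "KKKRK", "KKRKK", "KRKKK",
       "RKKKK", "KKKKQ", "KKKQK", "KKQKK", "KQKKK"]
plasma = ["EEEEE", "EEEED", "EEEDE", "EEDEE", "EDEEE",
          "DEEEE", "EEEEN", "EEENE", "EENEE", "ENEEE"]
seqs = csf + plasma
labels = ["CSF"] * len(csf) + ["plasma"] * len(plasma)

print(cross_validated_accuracy(seqs, labels, k=10))
```

Because every sequence is held out exactly once, the reported accuracy is computed only on predictions for sequences the classifier did not see during training. The toy classes here are trivially separable, so the sketch scores perfectly; real V3 alignments would not.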
Results:
The support vector machine was the best overall classifier, assigning V3
sequences to their compartment of origin with 90.6% accuracy. A decision tree
inducer achieved 89.6% accuracy in 10-fold cross-validation and revealed a
complex sequence signature associated with compartmentalization.
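A decision tree inducer finds a sequence signature by repeatedly choosing the alignment position whose residues best separate the classes. The abstract does not name the inducer or its splitting criterion, so the following is a hypothetical sketch of one common criterion, information gain (as in ID3); C4.5-style inducers normalize this into a gain ratio. The function names and the toy alignment are illustrative assumptions, not the study's data.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(seqs, labels, column):
    """Drop in class entropy after partitioning sequences by the residue at `column`."""
    groups = {}
    for seq, y in zip(seqs, labels):
        groups.setdefault(seq[column], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_columns(seqs, labels):
    """(gain, column) pairs for every alignment column, most informative first."""
    return sorted(((information_gain(seqs, labels, c), c) for c in range(len(seqs[0]))),
                  reverse=True)

# Hypothetical toy alignment, NOT real V3 data.
seqs = ["KAS", "KAT", "EAS", "EAS"]
labels = ["CSF", "CSF", "plasma", "plasma"]
for gain, column in rank_columns(seqs, labels):
    print(column, round(gain, 3))
```

In this toy alignment, column 0 separates the two classes perfectly and ranks first, while the invariant column 1 carries no information. A full inducer would split on the top column and then repeat the scoring within each branch, so a signature spread over many positions appears as many splits rather than as one dominant column.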
Conclusions: The performance of these classifiers suggests that there is a
strong compartment-specific sequence signature. Unlike signatures associated
with other phenotypes, e.g., primary drug resistance and coreceptor usage, the
underlying pattern was complex and involved many positions across the V3
sequence. The discovery of compartment-specific genotypic characteristics may
prove invaluable in tracking evolutionary patterns during HIV-1 infection. In
addition, the ability of the support vector machine and the decision tree
inducer to decipher these patterns speaks to the efficacy of a machine learning
approach to sequence-based classification.