589 Combining Genotype Groups and Recursive Partitioning: An Application to HIV-1 Genetics Data
A. Foulkes*¹, V. Degruttola², K. Hertogs³, L. Bachelor⁴
¹Univ of Pennsylvania Sch of Med, Philadelphia; ²Harvard Sch of Public Hlth, Boston, MA; ³Tibotec-Virco, Mechelen, Belgium; and ⁴Vircolab, Inc, Durham, NC
Background: Understanding the relationship between HIV-1 genotypic markers of resistance and response to therapy is analytically challenging because of the high dimensionality of the viral genome and the complex interactions among sites on the genome. Recursive partitioning (RP) is well suited to both of these features and offers a natural way to classify patients (pts) by covariates so that the resulting groups capture variability in a response variable. In the HIV genetics setting, we aim to classify pts by the genotypic characteristics of their infecting viral population so that the resulting groups explain as much of the variability in drug susceptibility phenotype as possible. Combining dimension reduction techniques with RP yields alternative classification schemes that are potentially more informative. The relative benefits of the different approaches are illustrated using 2,559 protease sequences from the Virco Group and the corresponding 50% inhibitory concentrations (IC50) for indinavir (IDV).
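As a rough illustration of how RP forms groups that explain variability in a response, the sketch below grows a CART-style regression tree on binary mutation indicators, choosing at each node the site whose split most reduces the within-group sum of squares. It is a minimal sketch under assumptions not stated in the abstract: the response is a numeric phenotype such as log10 IC50, each predictor is a 0/1 indicator of mutation at one protease site, and the names `best_split`, `grow`, `min_node`, and `max_depth` are hypothetical. The toy data are invented.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(X, y, min_node=2):
    """Find the site whose mutant/wild-type split most reduces the SSE of y.

    X[i][j] is 1 if patient i's virus is mutant at site j, else 0 (assumed coding).
    Returns (site, reduction), or (None, 0.0) if no split leaves at least
    min_node patients on each side and lowers the SSE.
    """
    parent = sse(y)
    best_site, best_gain = None, 0.0
    n_sites = len(X[0]) if X else 0
    for j in range(n_sites):
        wild = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        mutant = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        if len(wild) < min_node or len(mutant) < min_node:
            continue
        gain = parent - sse(wild) - sse(mutant)
        if gain > best_gain:
            best_site, best_gain = j, gain
    return best_site, best_gain

def grow(X, y, depth=0, max_depth=3, min_node=2):
    """Recursively partition the patients; each returned leaf (a list of
    responses) plays the role of one resistance class."""
    if depth >= max_depth:
        return [y]
    site, _ = best_split(X, y, min_node)
    if site is None:
        return [y]
    leaves = []
    for value in (0, 1):
        idx = [i for i, xi in enumerate(X) if xi[site] == value]
        leaves += grow([X[i] for i in idx], [y[i] for i in idx],
                       depth + 1, max_depth, min_node)
    return leaves

# Invented toy data: two sites, response on an assumed log10 IC50 scale.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1]]
y = [0.1, 0.2, 1.0, 1.4, 0.0, 1.6]
print(best_split(X, y)[0])  # → 0
print(len(grow(X, y)))      # → 2
```

The number of leaves is the number of resistance classes; `min_node` and `max_depth` stand in for whatever pruning or stopping rule an actual RP fit would use.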
Methods: We propose a two-stage approach to characterizing drug susceptibility phenotype by genotype. The first stage groups observations with similar or identical viral genotypes. The second stage uses these groups, in addition to individual amino acid sites, as predictors in an RP model. Two grouping approaches are considered. The first (patterning) uses information on known resistance sites to classify pts by their patterns of mutations. The second uses K-means clustering to form the initial groups. Bootstrapping is used to compare the combination of patterning and RP (PRP), the combination of clustering and RP (CRP), and RP alone.
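The two stages can be sketched as follows. Stage one assigns each patient a group label, either from the exact pattern of mutations at a set of known resistance sites (patterning) or by K-means clustering of the full genotype vector; stage two appends one 0/1 indicator per group to the site indicators, so a binary-split tree can choose between a group and a single site at any node. This is a hypothetical sketch: the function names, the resistance-site indices, K-means with squared Euclidean distance on 0/1 vectors, and one-hot encoding of the groups are all assumptions, not details taken from the abstract.

```python
import random

def pattern_groups(X, resistance_sites):
    """Stage 1 (patterning): patients with the same mutation pattern at the
    chosen resistance sites share a group label (0, 1, 2, ... in order seen)."""
    seen, labels = {}, []
    for row in X:
        key = tuple(row[j] for j in resistance_sites)
        labels.append(seen.setdefault(key, len(seen)))
    return labels

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans(X, k, iters=100, seed=0):
    """Stage 1 (clustering): Lloyd's K-means on 0/1 genotype vectors, using
    squared Euclidean distance and k randomly chosen rows as starting centers."""
    rng = random.Random(seed)
    centers = [[float(v) for v in row] for row in rng.sample(X, k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center (ties go to the lowest index).
        labels = [min(range(k), key=lambda c: sq_dist(row, centers[c]))
                  for row in X]
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [row for row, lab in zip(X, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels

def augment(X, labels):
    """Stage 2 input: the original site indicators plus one 0/1 column per
    group, so a binary-split tree can split on a group or on a single site."""
    n_groups = max(labels) + 1
    return [list(row) + [int(lab == g) for g in range(n_groups)]
            for row, lab in zip(X, labels)]

# Hypothetical example: 4 patients, 4 sites; sites 0 and 1 stand in for
# "known resistance sites".
X = [[0, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 1, 0]]
labels = pattern_groups(X, resistance_sites=[0, 1])
print(labels)                 # → [0, 0, 1, 1]
print(augment(X, labels)[0])  # → [0, 1, 0, 1, 1, 0]
print(kmeans(X, k=2))
```

K-means results depend on the starting centers, so in practice one would typically run several restarts and keep the solution with the smallest within-cluster sum of squares before passing the labels to stage two.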
Results: In this data example, both PRP and CRP explain more of the variability in the IDV response than RP alone and produce fewer resistance classes. Of 100 bootstrap samples, 89 gave a smaller cross-validated error with CRP than with RP alone, while all 100 gave a smaller cross-validated error with PRP than with RP alone. These results indicate that, for these data, PRP explains more of the variability than RP. In all 100 bootstrap samples, RP alone produced more resistance classes than either PRP or CRP.
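The bootstrap comparison can be sketched as a paired experiment: draw a bootstrap sample, compute k-fold cross-validated squared error for two methods on that same sample with the same folds, and count how often the first method wins (89 of 100 for CRP versus RP in the abstract). The abstract does not describe how cross-validation was nested within the bootstrap, so this is an assumed scheme; `cv_error` and `bootstrap_wins` are hypothetical names, and the two fitting functions are hypothetical stand-ins (a group-mean predictor and an overall-mean baseline), not the PRP, CRP, or RP models themselves.

```python
import random
from statistics import fmean

def cv_error(data, fit, k=5, seed=0):
    """k-fold cross-validated mean squared error.

    data: list of (x, y) pairs; fit(train) returns a function x -> predicted y.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    sq_errs = []
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [data[i] for i in idx if i not in held_out]
        predict = fit(train)
        sq_errs += [(data[i][1] - predict(data[i][0])) ** 2 for i in test]
    return fmean(sq_errs)

def bootstrap_wins(data, fit_a, fit_b, n_boot=100, k=5, seed=0):
    """Number of bootstrap samples in which method A has lower CV error than B."""
    rng = random.Random(seed)
    wins = 0
    for b in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        # Same sample and same folds (seed=b) for both methods: a paired comparison.
        if cv_error(sample, fit_a, k, seed=b) < cv_error(sample, fit_b, k, seed=b):
            wins += 1
    return wins

# Hypothetical stand-ins, not the RP/PRP/CRP models from the abstract.
def fit_overall_mean(train):
    m = fmean(y for _, y in train)
    return lambda x: m

def fit_group_mean(train):
    overall = fmean(y for _, y in train)
    by_group = {}
    for x, y in train:
        by_group.setdefault(x, []).append(y)
    means = {g: fmean(ys) for g, ys in by_group.items()}
    return lambda x: means.get(x, overall)  # unseen group -> overall mean

# Invented data: x is a group label, y a response that differs by group.
noise = [0.0, 0.1, -0.1, 0.05, -0.05, 0.02, -0.02, 0.08, -0.08, 0.03]
data = [(0, e) for e in noise] + [(1, 3.0 + e) for e in noise]
print(bootstrap_wins(data, fit_group_mean, fit_overall_mean), "of 100")
```

Because a bootstrap sample contains duplicated observations, copies of one patient can fall in both training and test folds, which makes each absolute cross-validated error optimistic; the sketch relies only on the paired comparison within each sample.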
Conclusions: This research demonstrates the value of combining dimension reduction techniques with recursive partitioning to predict phenotype from a large number of viral characteristics.