Machine learning methods for HIV/AIDS diagnostics and therapy planning

Prabhakaran, Sandhya. Machine learning methods for HIV/AIDS diagnostics and therapy planning. 2014, Doctoral Thesis, University of Basel, Faculty of Science.


Official URL: http://edoc.unibas.ch/diss/DissB_10700



This thesis develops and applies machine learning methods to HIV/AIDS diagnostics and therapy planning, approaching the domain from two facets.

In Facet I, we analyse the genetically diverse HIV populations present in an infected patient's blood samples. Understanding genetic diversity is crucial for further insight into virus-host interactions and into the evolution of drug-resistant viral lineages within an infected host, and for personalised medication, where drugs are prescribed to a patient based on their viral lineage. Recent sequencing technologies generate short fragments of the viral strains, called reads, from infected blood samples, and these reads are used in genetic-diversity studies. The puzzle is to match every read to its parent strain, or haplotype, which at first glance is a standard clustering task. However, because the reads are error-prone and of limited length, the main modelling challenge is that non-overlapping reads have no suitable a priori pairwise similarity measure; this leads to a non-standard clustering problem. None of the previous approaches has provided a convincing strategy to solve this issue. In this work we overcome the problem by introducing a propagating Dirichlet Process Mixture Model.

In Facet II, we take the first steps towards identifying similarity patterns between the drugs used in HIV/AIDS therapy and active chemical compounds. Currently only a small number of anti-HIV drugs is available for preparing drug cocktails. When a viral lineage becomes resistant to a particular drug, it tends to show resistance to other drugs in the same drug category, a property called cross-resistance. This situation demands the development of new, resilient drugs, and therefore an in-depth understanding of the similarities between the current drugs and active chemical compounds. We pursue this understanding by examining a landscape of active chemical compounds that also contains the drugs. To this end, we develop two models: one for Network Inference and another for Automatic Archetype Analysis. For network inference, we present a fully probabilistic approach that infers networks from the pairwise Euclidean distances of 'n' objects, where the objects are active chemical compounds. For automatic archetype analysis, we develop a sparsity-inducing model based on a Group-Lasso formulation that identifies the representative (archetypal) objects among a set of 'n' objects, again active chemical compounds. The model is paired with a well-defined criterion, the Bayesian Information Criterion (BIC), which enables automatic model selection.
Advisors: Roth, Volker
Committee Members: Vetter, Thomas
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Biomedical Data Analysis (Roth)
UniBasel Contributors: Prabhakaran, Sandhya and Roth, Volker and Vetter, Thomas
Item Type: Thesis
Thesis Subtype: Doctoral Thesis
Thesis no: 10700
Thesis status: Complete
Number of Pages: 147
Last Modified: 22 Jan 2018 15:51
Deposited On: 05 May 2014 12:59
