Rey, Mélanie. Copula models in machine learning. 2015, PhD Thesis, University of Basel, Faculty of Science.
Available under License CC BY-NC-ND (Attribution-NonCommercial-NoDerivatives).
Official URL: http://edoc.unibas.ch/diss/DissB_11283
Our first contribution is the introduction of a copula mixture model to perform dependency-seeking clustering for co-occurring samples from different data sources. The model takes advantage of the great flexibility offered by the copula framework to extend mixtures of Canonical Correlation Analyzers to multivariate data with arbitrary continuous marginal densities. We formulate our model as a non-parametric Bayesian mixture and provide an efficient Markov Chain Monte Carlo inference algorithm for it. Experiments on real and synthetic data demonstrate that the increased flexibility of the copula mixture significantly improves the quality of the clustering and the interpretability of the results.
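As an illustration of the flexibility mentioned above, the following is a minimal sketch of how a single Gaussian copula component can combine a Gaussian dependence structure with arbitrary continuous marginals. It is not the thesis's mixture model or its inference algorithm. The function names, the correlation value 0.8, and the choice of exponential and logistic marginals are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def sample_gaussian_copula(n, rho, inv_cdf_x, inv_cdf_y, rng):
    """Draw n pairs with Gaussian dependence (correlation rho) and chosen marginals."""
    eps = 1e-12  # keeps the uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        # 1. Correlated standard normals: the latent Gaussian layer.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # 2. The normal CDF maps each coordinate to a uniform (a copula draw).
        u1 = min(max(STD_NORMAL.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(STD_NORMAL.cdf(z2), eps), 1.0 - eps)
        # 3. Inverse marginal CDFs impose the desired marginal densities.
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs


def exponential_inv_cdf(u, rate=2.0):
    """Quantile function of an exponential distribution (illustrative marginal)."""
    return -math.log(1.0 - u) / rate


def logistic_inv_cdf(u):
    """Quantile function of the standard logistic distribution (illustrative marginal)."""
    return math.log(u / (1.0 - u))


rng = random.Random(0)
pairs = sample_gaussian_copula(5000, 0.8, exponential_inv_cdf, logistic_inv_cdf, rng)
mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
print(f"empirical means of the two marginals: {mean_x:.3f}, {mean_y:.3f}")
```

The dependence is set entirely in step 1 and the marginals entirely in step 3, which is the separation the copula framework provides.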
The second contribution is a reformulation of the information bottleneck (IB) problem in terms of a copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula, we extend the analytical IB solution available for the multivariate Gaussian case to meta-Gaussian distributions which retain a Gaussian dependence structure but allow arbitrary marginal densities. The resulting approach extends the range of applicability of IB to non-Gaussian continuous data and is less sensitive to outliers than the original IB formulation.
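The link between mutual information and the copula can be illustrated in the bivariate case: under a Gaussian copula with correlation rho, the mutual information is -0.5 * log(1 - rho^2), whatever the marginals are. The sketch below is a hedged illustration, not the thesis's estimator. It assumes a rank-based normal-scores transform, and its function names and test data are invented for the example.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """Map a sample to standard-normal scores through its ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the pseudo-observation strictly inside (0, 1).
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_copula_mi(x, y):
    """Mutual information (in nats) implied by a fitted bivariate Gaussian copula."""
    rho = pearson(normal_scores(x), normal_scores(y))
    return -0.5 * math.log(1.0 - rho * rho)


# Latent Gaussian pair with correlation 0.8, then strictly increasing
# transforms that change the marginals but not the copula.
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]
w = [0.8 * a + 0.6 * rng.gauss(0.0, 1.0) for a in z]
x = [math.exp(a) for a in z]   # log-normal marginal
y = [b ** 3 for b in w]        # heavy-tailed marginal

mi_latent = gaussian_copula_mi(z, w)
mi_transformed = gaussian_copula_mi(x, y)
print(f"MI on latent data: {mi_latent:.3f}, on transformed data: {mi_transformed:.3f}")
```

Because the estimate depends on the data only through ranks, strictly increasing transforms of either variable leave it unchanged, and a single extreme value can shift it only slightly.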
Our third and final contribution is the development of a novel sparse compression technique that is based on the information bottleneck (IB) principle and takes side information into account. We achieve this by introducing a sparse variant of IB that compresses the data by preserving the information in only a few selected input dimensions. By assuming a Gaussian copula, we can capture arbitrary non-Gaussian marginals, continuous or discrete. We use our model to select a subset of biomarkers relevant to the evolution of malignant melanoma and show that our sparse selection provides reliable predictors.
Committee Members: Elidan, Gal
Faculties and Departments: 05 Faculty of Science > Departement Mathematik und Informatik > Informatik > Datenanalyse (Roth)
Number of Pages: 88 p.
Last Modified: 30 Jun 2016 10:57
Deposited On: 14 Jul 2015 13:10