Bayesian grouped variable selection

Date Issued
2012
Author(s)
Shankar Raman, Sudhir  
DOI
10.5451/unibas-005976904
Abstract
Traditionally, variable selection in linear regression has been approached with optimization-based methods such as the classical Lasso. Such methods provide a sparse point estimate of the regression coefficients but give no further information about the distribution of those coefficients, such as expectation or variance estimates. In recent years, there has been some progress on Bayesian formulations of variable selection, for example the Bayesian Lasso. Motivated by these developments, in this thesis we build an omnibus Bayesian framework for grouped-variable selection in linear regression models. This framework summarizes the posterior distribution over the regression coefficients with estimates of its moments and its mode. Inference is carried out using Markov chain Monte Carlo (MCMC) sampling. The estimate of the posterior mode is generated by the same MCMC sampling algorithm, with minimal changes, using simulated annealing.

Going beyond simple linear regression, the framework is extended, with minimal changes, to accommodate generalized linear models such as Poisson and binomial models. On the algorithmic side, we develop a highly efficient MCMC sampling algorithm for inference. Apart from the Poisson and binomial models, the framework also incorporates the Weibull model, which is used extensively in survival analysis. This extension has been combined with an additional clustering component using a survival mixture-of-experts model. The clustering component makes it possible to perform variable selection (per cluster) simultaneously with cluster identification using Dirichlet processes, which avoids the need to fix the number of clusters in advance.

The resulting framework has been applied to several biological problems, such as identifying novel compound biomarkers for breast cancer from tissue microarray data and analyzing splice site data to identify distinguishing features of true splice sites. Survival data for breast cancer patients has been used to identify low-risk and high-risk patients and the significant compound markers of each group.
File(s)
Name
Thesis_6_.pdf
Size
3.41 MB
Format
Adobe PDF
Checksum (MD5)
9f503164344f567e6e4fb6fe7cf2a74b

University of Basel

edoc
Open Access Repository University of Basel
