Jointly profiling the expression of thousands of genes and searching for their associations with cancer development and prognosis provides preventive and prognostic information for cancer treatment management. The availability of large-scale gene profiling datasets has stimulated multi-dataset approaches. Among these, integrative analysis, which synthesizes information from multiple data sources at the raw-data level, outperforms single-dataset analysis in separating signal from noise, producing more efficient estimates, and improving reproducibility. Multiple methods, including penalization, thresholding, and boosting, have been developed to accommodate heterogeneous datasets, but few studies work within the Bayesian framework.
This paper is motivated by three observations: gene marker identification is important to understanding cancer development and prognosis, integrative analysis is effective in improving marker selection and model prediction, and few integrative approaches exist in the Bayesian framework. The goal of this paper is to identify a set of gene markers accurately using a novel Bayesian integrative approach, and thereby to fill the gap in Bayesian variable selection methods for the integrative analysis of heterogeneous datasets.
This paper proposes a new integrative Bayesian variable selection method for the linear model and the accelerated failure-time (AFT) model under a heterogeneous data structure. A weighted approach is introduced to accommodate user-defined observation weights in the linear model and censoring in the AFT model. Simulation studies show that the proposed approach performs satisfactorily in terms of closeness to the true effects, true positive rate, false positive rate, and computational cost. The proposed method is then applied to three cancer prognosis studies with gene expression measurements to identify associated genes.
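To illustrate how a weighting scheme can handle censoring in an AFT model, the sketch below computes Kaplan-Meier (Stute) weights and uses them in a weighted least-squares fit of log survival time. This is only an assumption about what such a weighted approach might look like: the abstract does not state which weights are used, and the function names `kaplan_meier_weights` and `weighted_least_squares` are hypothetical. The sketch does not include the Bayesian variable selection step.

```python
import numpy as np

def kaplan_meier_weights(time, event):
    """Kaplan-Meier (Stute) weights; an assumed choice, not confirmed by the paper.

    time  : observed times (event or censoring time)
    event : 1 if the event was observed, 0 if censored
    Censored observations receive weight 0.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    n = len(time)
    # Sort by time; at tied times, events are placed before censored cases.
    order = np.lexsort((1 - event, time))
    d = event[order]
    w_sorted = np.zeros(n)
    running = 1.0  # product over earlier ranks of ((n - j) / (n - j + 1)) ** delta_j
    for i in range(n):  # i is the 0-based rank
        w_sorted[i] = d[i] / (n - i) * running
        running *= ((n - i - 1) / (n - i)) ** d[i]
    # Return weights in the original observation order.
    w = np.empty(n)
    w[order] = w_sorted
    return w

def weighted_least_squares(X, y, w):
    """Minimize sum_i w_i * (y_i - x_i' beta)^2 by rescaling rows with sqrt(w_i)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Toy usage: intercept-only AFT fit on log time with one censored observation.
time = np.array([1.0, 2.0, 3.0])
event = np.array([1, 0, 1])
w = kaplan_meier_weights(time, event)
beta = weighted_least_squares(np.ones((3, 1)), np.log(time), w)
```

With no censoring, every observation receives weight 1/n and the fit reduces to ordinary least squares. In the linear model, user-defined weights could be passed to the same weighted fit in place of the Kaplan-Meier weights.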