Feature Selection For Genomic Data By Combining Filter And Wrapper Approaches

Ali El Akadi
Aouatif Amine
Abdeljalil El Ouardighi
Driss Aboutajdine

Abstract

Gene expression data usually contain a large number of genes but only a small number of samples. Feature selection for gene expression data aims to find a set of genes that best discriminates biological samples of different types. In this paper, we propose a two-stage selection algorithm for genomic data that combines MRMR (Minimum Redundancy Maximum Relevance) and a GA (Genetic Algorithm). In the first stage, MRMR filters out noisy and redundant genes from high-dimensional microarray data. In the second stage, the GA uses classifier accuracy as its fitness function to select the most discriminative genes. The proposed method is tested on five publicly available datasets (NCI, Lymphoma, Lung, Leukemia and Colon) using Support Vector Machine and Naïve Bayes classifiers. A comparison of MRMR-GA with the MRMR filter alone and the GA wrapper alone shows that our method finds the smallest gene subset that yields the highest classification accuracy under leave-one-out cross-validation (LOOCV).
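
The two-stage pipeline described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and every detail below is an assumption: gene values are already discretized into a few expression levels; the MRMR stage uses the difference criterion (relevance minus mean redundancy); a categorical naive Bayes stands in for the wrapper classifier; and the GA settings (population 20, 30 generations, one-point crossover, bit-flip mutation rate 0.1, elitism, an initial population seeded with the full MRMR set, ties in accuracy broken in favour of smaller subsets) are illustrative. The names `mrmr`, `ga_wrapper`, `nb_predict` and `loocv_accuracy` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_info(a, b):
    """Mutual information I(a; b) in bits between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(X, y, k):
    """Stage 1 (filter): greedy MRMR, assuming the difference criterion."""
    n_genes = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_genes)]
    relevance = [mutual_info(col, y) for col in cols]
    selected = [max(range(n_genes), key=lambda j: relevance[j])]
    while len(selected) < min(k, n_genes):
        def score(j):
            # Relevance to the class minus mean redundancy with genes already chosen.
            redundancy = sum(mutual_info(cols[j], cols[s]) for s in selected) / len(selected)
            return relevance[j] - redundancy
        candidates = [j for j in range(n_genes) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected


def nb_predict(train_X, train_y, x, genes):
    """Categorical naive Bayes with Laplace smoothing, using only `genes`."""
    best, best_score = None, -math.inf
    for c, n_c in Counter(train_y).items():
        rows = [r for r, label in zip(train_X, train_y) if label == c]
        score = math.log(n_c / len(train_y))
        for g in genes:
            n_values = len({r[g] for r in train_X} | {x[g]})
            match = sum(1 for r in rows if r[g] == x[g])
            score += math.log((match + 1) / (n_c + n_values))
        if score > best_score:
            best, best_score = c, score
    return best


def loocv_accuracy(X, y, genes):
    """Leave-one-out cross-validation accuracy of naive Bayes restricted to `genes`."""
    hits = 0
    for i in range(len(X)):
        hits += nb_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], genes) == y[i]
    return hits / len(X)


def ga_wrapper(X, y, candidates, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Stage 2 (wrapper): GA over bit masks of the MRMR candidates (illustrative settings)."""
    rng = random.Random(seed)
    m = len(candidates)
    cache = {}

    def genes_of(mask):
        return [g for g, bit in zip(candidates, mask) if bit]

    def fitness(mask):
        # (LOOCV accuracy, -subset size): equally accurate smaller subsets rank higher.
        key = tuple(mask)
        if key not in cache:
            genes = genes_of(mask)
            cache[key] = (loocv_accuracy(X, y, genes), -len(genes)) if genes else (-1.0, 0)
        return cache[key]

    # Assumption: seed with the full MRMR set so the result never scores below the filter alone.
    pop = [[1] * m] + [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_pop) < pop_size:
            a, b = rng.sample(ranked[:pop_size // 2], 2)  # parents drawn from the top half
            cut = rng.randint(1, m - 1) if m > 1 else 0  # one-point crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in a[:cut] + b[cut:]]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return genes_of(best), fitness(best)[0]


# Toy discretized data: gene 0 copies the class label, the other genes are weaker or unrelated.
y = [0] * 10 + [1] * 10
X = [[y[i], i % 2, (i // 3) % 2, (i * 7) % 3] for i in range(20)]
candidates = mrmr(X, y, k=3)
genes, acc = ga_wrapper(X, y, candidates)
print(candidates, genes, acc)
```

Because the filter runs first, the GA only searches over subsets of the `k` MRMR candidates rather than the full gene set, which is what keeps the accuracy-driven wrapper stage affordable on microarray data.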

How to Cite
El Akadi, A., Amine, A., El Ouardighi, A., & Aboutajdine, D. (2009). Feature Selection For Genomic Data By Combining Filter And Wrapper Approaches. INFOCOMP Journal of Computer Science, 8(4), 28–36. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/279