Data dimensionality reduction based on genetic selection of feature subsets

K. M. Faraoun
A. Rabhi

Abstract

In this paper, we show that a multi-class classification process can be significantly improved by selecting an optimal subset of the features used as input for training. Selecting such a subset reduces the dimensionality of the data samples and eliminates the redundancy and ambiguity introduced by some attributes. The classifier then performs the learning process on the selected features only. A genetic search is used to explore the space of all possible feature subsets, whose size grows exponentially with the number of features (2^n subsets for n features). A new measure is proposed to compute the information gain provided by each feature subset, and it serves as the fitness function of the genetic search. Experiments are performed on the KDD99 dataset to classify DoS network intrusions described by 41 features. The optimality of the selected feature subset is then assessed using a multilayer neural network. The results show that the proposed approach can improve both the classification rate and the learning time.
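
The abstract does not specify the authors' new information-gain measure or their genetic-algorithm settings. The following is therefore only an illustrative sketch under stated assumptions, not the paper's method. Each feature subset is encoded as a bitmask chromosome. As a stand-in fitness, the sketch uses the joint information gain IG(S) = H(Y) - H(Y | X_S) minus a hypothetical per-feature penalty (`penalty=0.05` bits). The penalty is needed because joint information gain never decreases when features are added, so without it the search would drift toward the full feature set. Tournament selection, uniform crossover, the 1/n mutation rate, elitism, and the synthetic XOR data (`make_xor_data`) are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def subset_information_gain(X, y, mask):
    """IG(S) = H(Y) - H(Y | X_S), treating the selected columns as one joint variable.

    Stand-in measure (assumption); the paper proposes its own, unspecified here.
    """
    groups = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j, bit in enumerate(mask) if bit)
        groups.setdefault(key, []).append(label)
    n = len(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(y) - conditional


def genetic_feature_selection(X, y, pop_size=30, generations=50,
                              penalty=0.05, seed=0):
    """Hypothetical GA over feature bitmasks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    cache = {}

    def fitness(mask):
        # Information gain of the subset, minus a size penalty (assumed form).
        if mask not in cache:
            cache[mask] = subset_information_gain(X, y, mask) - penalty * sum(mask)
        return cache[mask]

    def tournament(pop):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite]  # elitism: keep the best subset found so far
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            # Uniform crossover, then bit-flip mutation with rate 1/n_features.
            child = tuple(rng.choice(pair) for pair in zip(p1, p2))
            child = tuple(bit ^ (rng.random() < 1 / n_features) for bit in child)
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


def make_xor_data(n_samples=200, n_features=8, seed=1):
    """Synthetic binary data: the class is x0 XOR x1; other columns are noise."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [row[0] ^ row[1] for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_xor_data()
    mask, score = genetic_feature_selection(X, y)
    selected = [j for j, bit in enumerate(mask) if bit]
    print("selected features:", selected, "fitness:", round(score, 3))
```

On this toy data, neither x0 nor x1 is informative on its own, but together they determine the class. A subset-level measure can detect this interaction, whereas ranking features one at a time would miss it. With 8 features, the search space holds only 2^8 = 256 subsets and could be enumerated exhaustively. With the 41 KDD99 features it holds 2^41 (about 2.2 × 10^12) subsets, which is where a heuristic search such as a genetic algorithm becomes worthwhile.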

How to Cite
Faraoun, K. M., & Rabhi, A. (2007). Data dimensionality reduction based on genetic selection of feature subsets. INFOCOMP Journal of Computer Science, 6(2), 9–19. Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/169