CORESTA Congress, Paris, 2006, PT 20

Feature selection from near infrared spectra using genetic algorithms for tobacco blend recognition

Altadis Research Centre, Fleury-les-Aubrais, France

Near infrared reflectance spectroscopy has proved very useful for predicting the concentrations of chemical compounds in tobacco and for the classification or pattern recognition of cigarette blends. The measurement of a sample yields a whole reflectance spectrum, which can be considered as the sum of the spectra of its major chemical constituents and consists of many overlapping absorption bands. However, the success of these applications is limited by the properties of spectral data: a large number of variables, redundancy or collinearity of the wavelengths, and a poorly understood relationship between the spectral variables and the dependent variables. In a classification context, the only information of interest is that concerning the differences between groups of tobacco blends. The goal is therefore to search for informative spectral regions that allow good discrimination. Informative regions are those that contain useful information for building a Factorial Discriminant Analysis (FDA) model, help to improve the performance of the model, and give a better understanding of the differences between groups. For this purpose we introduce a genetic algorithm (GA)-based approach as a method of spectral feature selection. Genetic algorithms are global search and optimization methods based on the principles of natural evolution and selection. The implementation of a GA consists of several basic steps: initialization of a gene population (set of selected wavelength intervals), crossover, mutation, evaluation, and selection of the best individuals with respect to the criterion to be optimized (fitness). Once the GA has converged, an FDA model is built from the selected wavelength intervals. The method, denoted FDA-GA, was applied to spectral data consisting of up to 1050 wavelengths (visible and near infrared regions) in order to discriminate tobacco blends from 363 finished products covering different taste lines.
The efficiency of this method is discussed in comparison with other well-known classification algorithms. Moreover, the spectral patterns of the discriminant factors are examined in an attempt to interpret the absorption bands chemically.
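
The GA steps listed in the abstract (initialization, crossover, mutation, evaluation, selection) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the encoding, the operators, or the fitness criterion, so everything here is an assumption made for illustration. Each individual is a bit mask over fixed-width wavelength intervals, the fitness is the accuracy of a simple nearest-centroid classifier (a stand-in for the FDA-based criterion) minus a small penalty per selected interval, and the spectra are synthetic. The names interval_features, fitness, run_ga and demo, and all parameter values, are hypothetical.

```python
import random


def interval_features(spectrum, width):
    """Average a spectrum over consecutive wavelength intervals of `width` points."""
    chunks = [spectrum[i:i + width] for i in range(0, len(spectrum), width)]
    return [sum(c) / len(c) for c in chunks]


def fitness(mask, X, y, penalty=0.01):
    """Nearest-centroid accuracy on the selected intervals, minus a size penalty.

    Assumption: this replaces the FDA-based criterion, which is not specified.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def sq_dist(x, label):
        return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[label]))

    correct = sum(min(centroids, key=lambda c: sq_dist(x, c)) == lab
                  for x, lab in zip(X, y))
    return correct / len(y) - penalty * len(cols)


def run_ga(X, y, n_intervals, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve bit masks over intervals; returns the best mask found."""
    rng = random.Random(seed)
    # Initialization: random sets of selected intervals.
    pop = [[rng.randint(0, 1) for _ in range(n_intervals)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluation and selection: keep the better half of the population.
        ranked = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_intervals)          # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))


def demo(seed=0):
    """Synthetic two-class data: only points 10..14 (interval 2) differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(20):
            spectrum = [rng.gauss(0.5, 0.05) for _ in range(40)]
            if label == 1:
                for i in range(10, 15):
                    spectrum[i] += 0.5
            X.append(interval_features(spectrum, 5))
            y.append(label)
    best = run_ga(X, y, n_intervals=8, seed=seed)
    return best, fitness(best, X, y)


if __name__ == "__main__":
    mask, score = demo()
    print("selected intervals:", [j for j, bit in enumerate(mask) if bit])
    print("fitness:", round(score, 3))
```

In this sketch the per-interval penalty is what pushes the search towards a small number of intervals; without it, any mask containing the discriminating interval would score equally well. In a setting like the one described in the abstract, the final step would be to fit an FDA model on the intervals returned by the search, which the sketch leaves out.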