DML-PL - Yadda

1

Correlation-based feature selection strategy in classification problems

100%

Michalak K., Kwaśnicka H.

International Journal of Applied Mathematics and Computer Science | 2006 | vol. 16 | no. 4 | 503-511 | EN

In classification problems, the issue of high dimensionality of data is often considered important. To lower data dimensionality, feature selection methods are often employed. To select a set of features that will span a representation space that is as good as possible for the classification task, one must take into consideration possible interdependencies between the features. As a trade-off between the complexity of the selection process and the quality of the selected feature set, a pairwise selection strategy has recently been suggested. In this paper, a modified pairwise selection strategy is proposed. Our research suggests that computation time can be significantly lowered while maintaining the quality of the selected feature sets by using mixed univariate and bivariate feature evaluation based on the correlation between the features. This paper presents a comparison of the performance of our method with that of the unmodified pairwise selection strategy based on several well-known benchmark sets. Experimental results show that, in most cases, it is possible to lower computation time and that, with high statistical significance, the quality of the selected feature sets is not lower compared with those selected using the unmodified pairwise selection process.
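The correlation step mentioned in this abstract can be illustrated with a minimal sketch. It only computes the Pearson correlation for every feature pair and splits the pairs by a threshold. The threshold value, the function names and the toy data are assumptions made for illustration, and the abstract does not say which group of pairs receives bivariate rather than univariate evaluation, so the sketch stops at the partition.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant feature: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def split_pairs_by_correlation(features, threshold=0.5):
    """Partition feature pairs by |r| against a hypothetical threshold."""
    strong, weak = [], []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        (strong if abs(r) >= threshold else weak).append((name_a, name_b, r))
    return strong, weak


# Toy data, invented for illustration only.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 10.0],   # linear function of f1
    "f3": [1.0, -1.0, 0.0, -1.0, 1.0],  # uncorrelated with f1 and f2
}
strong, weak = split_pairs_by_correlation(features)
```

Here `strong` holds the single pair (f1, f2) and `weak` holds the two pairs involving f3. A selection procedure could then choose a different evaluation mode for each group.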

2

Rough sets methods in feature reduction and classification

100%

Świniarski R. W.

International Journal of Applied Mathematics and Computer Science | 2001 | vol. 11 | no. 3 | 565-582 | EN

The paper presents an application of rough sets and statistical methods to feature reduction and pattern recognition. The presented description of rough sets theory emphasizes the role of rough sets reducts in feature selection and data reduction in pattern recognition. The overview of methods of feature selection emphasizes feature selection criteria, including rough set-based methods. The paper also contains a description of the algorithm for feature selection and reduction based on the rough sets method proposed jointly with Principal Component Analysis. Finally, the paper presents numerical results of face recognition experiments using the learning vector quantization neural network, with feature selection based on the proposed principal components analysis and rough sets methods.

3

On the order equivalence relation of binary association measures

88%

Paradowski M.

International Journal of Applied Mathematics and Computer Science | 2015 | vol. 25 | no. 3 | 645-657 | EN

Over a century of research has resulted in a set of more than a hundred binary association measures. Many of them share similar properties. An overview of binary association measures is presented, focused on their order equivalences. Association measures are grouped according to their relations. Transformations between these measures are shown, both formally and visually. A generalization coefficient is proposed, based on joint probability and marginal probabilities. Combining association measures is one of the recent trends in computer science. Measures are combined in linear and nonlinear discrimination models, and in automated feature selection or construction. Knowledge about their relations is particularly important to avoid problems of meaningless results, zeroed generalized variances, the curse of dimensionality, or simply to save time.
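Order equivalence can be illustrated with two well-known measures that the abstract does not name: the Jaccard index J and the Dice coefficient D. They satisfy D = 2J / (1 + J), a strictly increasing transformation, so they always rank pairs of binary variables in the same order. The sketch below checks this on a few invented contingency counts. The choice of measures, the function names and the data are assumptions for illustration.

```python
def jaccard(a, b, c):
    """Jaccard index from 2x2 counts: a = both present, b and c = exactly one present."""
    return a / (a + b + c)


def dice(a, b, c):
    """Dice coefficient from the same counts."""
    return 2 * a / (2 * a + b + c)


def ranking(measure, tables):
    """Indices of the tables sorted by increasing value of the measure."""
    return sorted(range(len(tables)), key=lambda i: measure(*tables[i]))


# Hypothetical (a, b, c) counts for four pairs of binary variables.
tables = [(10, 5, 5), (3, 1, 0), (7, 20, 2), (1, 9, 9)]

jaccard_order = ranking(jaccard, tables)
dice_order = ranking(dice, tables)
```

Both rankings come out identical, which is why using both measures as separate inputs to a model adds no ordering information.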

4

Data mining methods for prediction of air pollution

88%

Siwek K., Osowski S.

International Journal of Applied Mathematics and Computer Science | 2016 | vol. 26 | no. 2 | 467-478 | EN

The paper discusses methods of data mining for prediction of air pollution. Two tasks in such a problem are important: generation and selection of the prognostic features, and the final prognostic system of the pollution for the next day. An advanced set of features, created on the basis of the atmospheric parameters, is proposed. This set is subject to analysis and selection of the most important features from the prediction point of view. Two methods of feature selection are compared. One applies a genetic algorithm (a global approach), and the other a linear method of stepwise fit (a locally optimized approach). On the basis of such analysis, two sets of the most predictive features are selected. These sets take part in prediction of the atmospheric pollutants PM10, SO2, NO2 and O3. Two approaches to prediction are compared. In the first one, the features selected are directly applied to the random forest (RF), which forms an ensemble of decision trees. In the second case, intermediate predictors built on the basis of neural networks (the multilayer perceptron, the radial basis function and the support vector machine) are used. They create an ensemble integrated into the final prognosis. The paper shows that preselection of the most important features, cooperating with an ensemble of predictors, allows increasing the forecasting accuracy of atmospheric pollution in a significant way.

5

Selecting differentially expressed genes for colon tumor classification

75%

Fujarewicz K., Wiench M.

International Journal of Applied Mathematics and Computer Science | 2003 | vol. 13 | no. 3 | 327-335 | EN

DNA microarrays provide a new technique of measuring gene expression, which has attracted a lot of research interest in recent years. It was suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, e.g., in cancer classification. Although several methods of classification, both new and existing, were tested, the selection of a proper (optimal) set of genes, the expressions of which can serve during classification, is still an open problem. Recently we have proposed a new recursive feature replacement (RFR) algorithm for choosing a suboptimal set of genes. The algorithm uses the support vector machines (SVM) technique. In this paper we use the RFR method for finding suboptimal gene subsets for tumor/normal colon tissue classification. The obtained results are compared with the results of applying other methods recently proposed in the literature. The comparison shows that the RFR method is able to find the smallest gene subset (only six genes) that gives no misclassifications in leave-one-out cross-validation for a tumor/normal colon data set. In this sense the RFR algorithm outperforms all other investigated methods.
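The evaluation criterion named in this abstract, the number of misclassifications in leave-one-out cross-validation for a given gene subset, can be sketched as follows. This is not the RFR algorithm, and it does not use an SVM: a simple nearest-centroid classifier is swapped in so that the example stays self-contained. All function names and data are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((s - c) ** 2 for s, c in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_errors(xs, ys, gene_idx):
    """Count misclassifications when each sample is held out exactly once."""
    reduced = [[row[i] for i in gene_idx] for row in xs]  # keep selected genes only
    errors = 0
    for k in range(len(reduced)):
        train_x = reduced[:k] + reduced[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        if nearest_centroid_predict(train_x, train_y, reduced[k]) != ys[k]:
            errors += 1
    return errors


# Invented data: gene 0 separates the classes, gene 1 does not.
xs = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.5], [3.0, 4.0], [3.2, 2.0], [2.8, 0.4]]
ys = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

errors_gene0 = leave_one_out_errors(xs, ys, [0])
errors_gene1 = leave_one_out_errors(xs, ys, [1])
```

A subset search such as the one described in the abstract would call a scoring function of this kind for each candidate subset and prefer the smallest subset that reaches zero errors.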

6

Data mining methods for gene selection on the basis of gene expression arrays

75%

Muszyński M., Osowski S.

International Journal of Applied Mathematics and Computer Science | 2014 | vol. 24 | no. 3 | 657-668 | EN

The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of a gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.
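As an illustration of the Fisher method listed among the selection techniques, the sketch below ranks genes by one common form of the Fisher score: the squared difference of the class means divided by the sum of the class variances. The exact formula used in the paper may differ. The function names, the 0/1 label encoding and the expression values are assumptions for illustration.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared difference of class means divided by the sum of class variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Degenerate case: no within-class variation at all.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / spread


def rank_genes(expression, labels):
    """Gene names ordered from most to least discriminative."""
    return sorted(expression, key=lambda g: fisher_score(expression[g], labels), reverse=True)


# Hypothetical expression levels for three genes over six samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_a": [5.0, 6.0, 7.0, 1.0, 2.0, 3.0],  # classes well separated
    "gene_b": [3.0, 4.0, 5.0, 2.0, 4.0, 6.0],  # identical class means
    "gene_c": [4.0, 5.0, 6.0, 3.0, 4.0, 5.0],  # slight separation
}
ranked = rank_genes(expression, labels)
```

Rankings from several such criteria could then be combined, for example by counting how often each gene appears near the top, in the spirit of the combination step the abstract describes.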
