Search results

1

Machine-learning in optimization of expensive black-box functions

100%

Tenne Y.

International Journal of Applied Mathematics and Computer Science | 2017 | vol. 27 | no. 1 | 105-118

EN

Modern engineering design optimization often uses computer simulations to evaluate candidate designs. For some of these designs the simulation can fail for an unknown reason, which in turn may hamper the optimization process. To handle such scenarios more effectively, this study proposes the integration of classifiers, borrowed from the domain of machine learning, into the optimization process. Several implementations of the proposed approach are described. An extensive set of numerical experiments shows that the proposed approach improves search effectiveness.

2

Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse

100%

Górecki T., Łuczak M.

International Journal of Applied Mathematics and Computer Science | 2013 | vol. 23 | no. 2 | 463-471

EN

The Linear Discriminant Analysis (LDA) technique is an important and well-developed area of classification, and to date many linear (and also nonlinear) discrimination methods have been put forward. A complication in applying LDA to real data occurs when the number of features exceeds that of observations. In this case, the covariance estimates do not have full rank, and thus cannot be inverted. There are a number of ways to deal with this problem. In this paper, we propose improving LDA in this area, and we present a new approach which uses a generalization of the Moore-Penrose pseudoinverse to remove this weakness. Our new approach, in addition to managing the problem of inverting the covariance matrix, significantly improves the quality of classification, also on data sets where we can invert the covariance matrix. Experimental results on various data sets demonstrate that our improvements to LDA are efficient and our approach outperforms LDA.

3

A variant of gravitational classification

100%

Górecki T., Łuczak M.

Biometrical Letters | 2014 | vol. 51 | no. 1 | 1-12

EN

This article proposes a new two-parameter variant of the gravitational classification method. We use the general idea of how objects behave in a gravity field. Classification depends on a test object's motion in the gravity field of the training points. To solve this motion problem, we use a simulation method. The classifier is compared with the 1NN method, because our method tends towards it for some parameter values. Experimental results on different data sets demonstrate that this approach outperforms the 1NN method, providing a significant reduction in the mean classification error rate.

4

Self-adaptation of parameters in a learning classifier system ensemble machine

100%

Troć M., Unold O.

International Journal of Applied Mathematics and Computer Science | 2010 | vol. 20 | no. 1 | 157-174

EN

Self-adaptation is a key feature of evolutionary algorithms (EAs). Although EAs have been used successfully to solve a wide variety of problems, the performance of this technique depends heavily on the selection of the EA parameters. Moreover, the process of setting such parameters is considered a time-consuming task. Several research works have tried to deal with this problem; however, the construction of algorithms letting the parameters adapt themselves to the problem is a critical and open problem of EAs. This work proposes a novel ensemble machine learning method that is able to learn rules, solve problems in a parallel way and adapt parameters used by its components. A self-adaptive ensemble machine consists of simultaneously working extended classifier systems (XCSs). The proposed ensemble machine may be treated as a meta classifier system. A new self-adaptive XCS-based ensemble machine was compared with two other XCS-based ensembles in relation to one-step binary problems: Multiplexer, One Counts, Hidden Parity, and randomly generated Boolean functions, in a noisy version as well. Results of the experiments have shown the ability of the model to adapt the mutation rate and the tournament size. The results are analyzed in detail.

5

A topological approach for protein classification

100%

Cang Z., Mu L., Wu K., Opron K., Xia K., Wei G.-W.

Molecular Based Mathematical Biology | 2015 | vol. 3 | no. 1

EN

Protein function and dynamics are closely related to its sequence and structure. However, prediction of protein function and dynamics from its sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done through measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found its success in the topological data analysis in a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding by using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug bound and unbound M2 channels. Secondly, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. Thirdly, the identification of all alpha, all beta, and alpha-beta protein domains is carried out using 900 proteins. We have found an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples and 246 tasks over 11944 samples. Average accuracies of 82% and 73% are attained. The present study establishes computational topology as an independent and effective alternative for protein classification.

6

A rainfall forecasting method using machine learning models and its application to the Fukuoka city case

88%

Sumi S., Zaman M., Hirose H.

International Journal of Applied Mathematics and Computer Science | 2012 | vol. 22 | no. 4 | 841-854

EN

In the present article, an attempt is made to derive optimal data-driven machine learning methods for forecasting the average daily and monthly rainfall of Fukuoka city in Japan. This comparative study concentrates on three aspects: modelling inputs, modelling methods and pre-processing techniques. Linear correlation analysis and average mutual information are compared to find an optimal input technique. For the modelling of the rainfall, a novel hybrid multi-model method is proposed and compared with its constituent models. The models include the artificial neural network, multivariate adaptive regression splines, the k-nearest neighbour, and radial basis support vector regression. Each of these methods is applied to model the daily and monthly rainfall, coupled with a pre-processing technique including moving average and principal component analysis. In the first stage of the hybrid method, sub-models from each of the above methods are constructed with different parameter settings. In the second stage, the sub-models are ranked with a variable selection technique and the higher ranked models are selected based on the leave-one-out cross-validation error. The forecast of the hybrid model is obtained as a weighted combination of the finally selected models.

7

Data-driven models for fault detection using kernel PCA: A water distribution system case study

88%

Nowicki A., Grochowski M., Duzinkiewicz K.

International Journal of Applied Mathematics and Computer Science | 2012 | vol. 22 | no. 4 | 939-949

EN

Kernel Principal Component Analysis (KPCA), an example of machine learning, can be considered a non-linear extension of the PCA method. While various applications of KPCA are known, this paper explores the possibility of using it to build a data-driven model of a non-linear system: the water distribution system of the town of Chojnice (Poland). This model is utilised for fault detection, with an emphasis on water leakage detection. A systematic description of the system's framework is followed by an evaluation of its performance. Simulations show that the presented approach is both flexible and efficient.

8

Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms

88%

Trawiński B., Smętek M., Telec Z., Lasota T.

International Journal of Applied Mathematics and Computer Science | 2012 | vol. 22 | no. 4 | 867-881

EN

In the paper we present some guidelines for the application of nonparametric statistical tests and post-hoc procedures devised to perform multiple comparisons of machine learning algorithms. We emphasize that it is necessary to distinguish between pairwise and multiple comparison tests. We show that the pairwise Wilcoxon test, when applied to multiple comparisons, will lead to overoptimistic conclusions. We carry out an intensive normality examination employing ten different tests, showing that the output of machine learning algorithms for regression problems does not satisfy normality requirements. We conduct experiments on nonparametric statistical tests and post-hoc procedures designed for multiple 1 × N and N × N comparisons with six different neural regression algorithms over 29 benchmark regression data sets. Our investigation demonstrates the usefulness and strength of multiple comparison statistical procedures for analysing and selecting machine learning algorithms.
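The "overoptimistic conclusions" point can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes a significance level of 0.05 and, as a simplification, independent tests, and it uses the Bonferroni correction only as one familiar example of a multiple-comparison adjustment. The function name `familywise_error_rate` is hypothetical.

```python
from math import comb


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests.

    Assumption for illustration only: the tests are independent and each
    is run at the same unadjusted level `alpha`.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Six algorithms, as in the abstract; an N x N comparison tests every pair.
num_algorithms = 6
pairwise_tests = comb(num_algorithms, 2)

# Error rate if each pairwise test is run at 0.05 without any correction.
fwer = familywise_error_rate(pairwise_tests)

# Bonferroni: one simple (conservative) per-test level that keeps the
# family-wise rate at or below 0.05.
bonferroni_alpha = 0.05 / pairwise_tests

print(f"pairwise tests: {pairwise_tests}")
print(f"uncorrected family-wise error rate: {fwer:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")
```

Under these assumptions, running every pairwise test unadjusted makes at least one spurious "significant" difference more likely than not, which is why dedicated multiple-comparison procedures are preferred.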

9

Application of agent-based simulated annealing and tabu search procedures to solving the data reduction problem

88%

Czarnowski I., Jędrzejowicz P.

International Journal of Applied Mathematics and Computer Science | 2011 | vol. 21 | no. 1 | 57-68

EN

The problem considered concerns data reduction for machine learning. Data reduction aims at deciding which features and instances from the training set should be retained for further use during the learning process. Data reduction results in increased capabilities and generalization properties of the learning model and a shorter time of the learning process. It can also help in scaling up to large data sources. The paper proposes an agent-based data reduction approach with the learning process executed by a team of agents (A-Team). Several A-Team architectures with agents executing the simulated annealing and tabu search procedures are proposed and investigated. The paper includes a detailed description of the proposed approach and discusses the results of a validating experiment.

10

Multi-label classification using error correcting output codes

63%

Kajdanowicz T., Kazienko P.

International Journal of Applied Mathematics and Computer Science | 2012 | vol. 22 | no. 4 | 829-840

EN

A framework for multi-label classification extended by Error Correcting Output Codes (ECOCs) is introduced and empirically examined in the article. The solution treats the base multi-label classifiers as a noisy channel and applies ECOCs in order to correct the classification errors made by individual classifiers. The framework was examined through exhaustive studies over combinations of three distinct classification algorithms and four ECOC methods employed in the multi-label classification problem. The experimental results revealed that (i) the Bose-Chaudhuri-Hocquenghem (BCH) code matched with any multi-label classifier results in better classification quality; (ii) the accuracy of the binary relevance classification method strongly depends on the coding scheme; (iii) the label power-set and the RAkEL classifier consume the same time for computation irrespective of the coding utilized; (iv) in general, they are not suitable for ECOCs because they cannot benefit from ECOC error-correcting abilities; (v) the all-pairs code combined with binary relevance is not suitable for datasets with larger label sets.
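The noisy-channel idea can be sketched in a few lines. The example below is an illustration only, not the framework from the article: it assumes two labels, a small hand-made 5-bit code with minimum Hamming distance 3 (not a BCH code), and nearest-codeword decoding. The names `encode`, `decode` and `hamming_distance` are hypothetical.

```python
from itertools import product


def encode(labels):
    """Map a 2-label indicator vector to a 5-bit codeword.

    Hand-made code for illustration: each label bit is repeated once and
    a parity bit is appended, giving a minimum Hamming distance of 3.
    """
    a, b = labels
    return (a, b, a, b, a ^ b)


def hamming_distance(u, v):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(u, v))


def decode(received):
    """Return the label vector whose codeword is nearest to `received`."""
    candidates = product((0, 1), repeat=2)
    return min(candidates, key=lambda c: hamming_distance(encode(c), received))


# One binary classifier would predict each codeword bit; here the third
# classifier is assumed to make a mistake.
true_labels = (1, 0)
sent = encode(true_labels)       # (1, 0, 1, 0, 1)
received = (1, 0, 0, 0, 1)       # third bit flipped by the "noisy channel"

recovered = decode(received)
print(sent, received, recovered)
```

Because any two codewords of this toy code differ in at least three positions, a single wrong bit still leaves the received word closest to the correct codeword, so the original label vector is recovered.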
