
Results found: 4


Search results

Searched for:
keywords: data mining

1
Content available remote

Center-based l₁-clustering method

100%
EN
In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function need be neither convex nor differentiable, and it may generally have many local or global minima; the task therefore becomes a complex global optimization problem. The paper proposes a method of searching for a locally optimal solution, proves the convergence of the corresponding iterative process and gives the corresponding algorithm. The method is illustrated and compared with some other clustering methods, especially the l₂-clustering method, also known in the literature as the smooth k-means method, in a few typical situations such as the presence of outliers among the data and the clustering of incomplete data. In these cases, numerical experiments show that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.
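The abstract does not spell out the algorithm, so the sketch below is only a rough illustration of the l₁-clustering idea: a Lloyd-style iteration that assigns points to the nearest center in the l₁ (Manhattan) distance and updates each center to the coordinate-wise median of its cluster, which minimizes the within-cluster l₁ criterion. Function names, initialization and the stopping rule are assumptions, not taken from the paper.

```python
import numpy as np

def l1_clustering(X, k, max_iter=100, seed=0):
    """Illustrative l1 clustering (k-medians): assign by Manhattan
    distance, update centers as coordinate-wise medians."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # l1 distance of every point to every center, shape (n, k)
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            np.median(X[labels == j], axis=0) if np.any(labels == j)
            else centers[j]                      # keep an empty cluster's center
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    objective = np.abs(X - centers[labels]).sum()  # value of the l1 criterion
    return centers, labels, objective

# Example: two well-separated groups plus one gross outlier
X = np.vstack([np.random.normal(0, 0.5, (50, 2)),
               np.random.normal(5, 0.5, (50, 2)),
               [[100.0, 100.0]]])
centers, labels, obj = l1_clustering(X, k=2)
```

Because the median (unlike the mean) is insensitive to a single extreme value, the centers in this toy example stay near the two true groups even with the outlier present, which matches the robustness argument made in the abstract.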
2
Content available remote

An alternative extension of the k-means algorithm for clustering categorical data

100%
EN
Most of the earlier work on clustering has focused on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points. Recently, the problem of clustering categorical data has started drawing interest. However, the computational cost makes most of the previous algorithms unacceptable for clustering very large databases. The k-means algorithm is well known for its efficiency in this respect; at the same time, its restriction to numerical data prevents it from being used for clustering categorical data. The main contribution of this paper is to show how to apply the notion of 'cluster centers' to a dataset of categorical objects and how to use this notion to formulate the clustering of categorical objects as a partitioning problem. Finally, a k-means-like algorithm for clustering categorical data is introduced. The clustering performance of the algorithm is demonstrated with two well-known data sets, namely, the soybean disease and nursery databases.
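As a hedged sketch of the general idea (closely related to the well-known k-modes approach, though not necessarily the paper's exact algorithm), categorical 'cluster centers' can be taken as the per-attribute modes of a cluster, and dissimilarity as the number of mismatched attributes. All names below are illustrative.

```python
import numpy as np
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, max_iter=50, seed=0):
    """Illustrative k-means-like clustering for categorical records:
    assign each record to the nearest mode, then recompute each
    cluster 'center' as the per-attribute most frequent category."""
    rng = np.random.default_rng(seed)
    modes = [records[i] for i in rng.choice(len(records), size=k, replace=False)]
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
                  for r in records]
        new_modes = []
        for j in range(k):
            members = [r for r, l in zip(records, labels) if l == j]
            if not members:
                new_modes.append(modes[j])          # keep an empty cluster's mode
                continue
            new_modes.append(tuple(
                Counter(col).most_common(1)[0][0]   # most frequent category
                for col in zip(*members)
            ))
        if new_modes == modes:
            break
        modes = new_modes
    return modes, labels

# Toy categorical data: (color, shape, size)
records = [("red", "round", "small"), ("red", "round", "large"),
           ("blue", "square", "small"), ("blue", "square", "large")]
modes, labels = k_modes(records, k=2)
```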
3
EN
The paper deals with reducing the dimension and size of a data set (random sample) for exploratory data analysis procedures. The algorithm investigated here is based on a linear transformation to a space of smaller dimension that preserves, as far as possible, the distances between particular elements. The elements of the transformation matrix are computed using the metaheuristic of parallel fast simulated annealing. Moreover, data set elements whose location has changed significantly relative to the others are eliminated or have their importance decreased. The presented method can be applied in a wide range of data exploration problems, offering flexible customization, the possibility of use in a dynamic data environment, and performance comparable to or better than that of principal component analysis. Its positive features were verified in detail for the fundamental tasks of the domain: clustering, classification and detection of atypical elements (outliers).
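The sketch below is a rough, single-threaded illustration of the distance-preserving part of this idea only; the paper's parallel fast simulated annealing and the element-weighting step are not reproduced. A random projection matrix is repeatedly perturbed and each perturbation is accepted or rejected by a simulated-annealing rule, minimizing the distortion of pairwise distances. All names and parameter values are assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist

def sa_linear_reduction(X, target_dim, n_iter=2000, t0=1.0, cooling=0.995, seed=0):
    """Illustrative simulated annealing over the entries of a linear
    transformation matrix A, minimizing the squared difference between
    pairwise distances before and after the projection X @ A."""
    rng = np.random.default_rng(seed)
    d_orig = pdist(X)                           # original pairwise distances
    A = rng.normal(size=(X.shape[1], target_dim))

    def stress(A):
        return np.sum((pdist(X @ A) - d_orig) ** 2)

    best_A, best_s = A, stress(A)
    current_s, t = best_s, t0
    for _ in range(n_iter):
        candidate = A + rng.normal(scale=0.05, size=A.shape)   # random perturbation
        s = stress(candidate)
        # accept improvements always, worse solutions with Boltzmann probability
        if s < current_s or rng.random() < np.exp((current_s - s) / t):
            A, current_s = candidate, s
            if s < best_s:
                best_A, best_s = candidate, s
        t *= cooling                                            # cool down
    return best_A, best_s

# Example: reduce 10-dimensional data to 2 dimensions
X = np.random.default_rng(1).normal(size=(60, 10))
A, final_stress = sa_linear_reduction(X, target_dim=2)
Y = X @ A   # reduced-dimension representation
```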
4
Content available remote

Graph-based generation of a meta-learning search space

75%
EN
Meta-learning is becoming more and more important in current and future research centered around broadly defined data mining and computational intelligence. It can solve problems that cannot be solved by any single, specialized algorithm. The overall characteristic of each meta-learning algorithm depends mainly on two elements: the learning machine space and the supervisory procedure. The former restricts the space of all possible learning machines to a subspace to be browsed by the meta-learning algorithm. The latter determines the order of the selected learning machines with a module responsible for machine complexity evaluation, organizes tests and performs analysis of results. In this article we present a framework for meta-learning search that can be seen as a method of sophisticated description and evaluation of functional search spaces of learning machine configurations used in meta-learning. Machine spaces are defined by dedicated graphs whose vertices are specialized machine configuration generators. With such graphs the learning machine space may be modeled in a much more flexible way, depending on the characteristics of the problem considered and on a priori knowledge. The presented method of search space description is used together with an advanced algorithm which orders test tasks according to their complexity.
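The abstract does not describe the framework in enough detail to reproduce it; the sketch below only illustrates the general idea in a hedged way: a small directed graph whose vertices are configuration generators is walked to enumerate candidate learning-machine configurations, which are then ordered by an assumed complexity estimate. All class, function and parameter names are hypothetical.

```python
from itertools import product

# Hypothetical generator graph: each vertex yields partial configurations,
# and edges describe which generator outputs are combined into full machines.
generators = {
    "scaler":     lambda: [{"scaler": s} for s in ("none", "standard")],
    "classifier": lambda: [{"model": "knn", "k": k} for k in (1, 5, 15)]
                  + [{"model": "tree", "depth": d} for d in (3, 10)],
}
edges = [("scaler", "classifier")]   # the scaler's output feeds the classifier

def generate_configurations():
    """Walk the generator graph and combine partial configurations
    along its edges into full learning-machine configurations."""
    for src, dst in edges:
        for part_a, part_b in product(generators[src](), generators[dst]()):
            yield {**part_a, **part_b}

def complexity(cfg):
    """Assumed complexity estimate used to order test tasks
    (simpler machines are tried first)."""
    base = {"knn": 1.0, "tree": 2.0}[cfg["model"]]
    penalty = cfg.get("k", 0) * 0.1 + cfg.get("depth", 0) * 0.3
    penalty += 0.5 if cfg["scaler"] != "none" else 0.0
    return base + penalty

# Order the search space from simplest to most complex configuration
search_space = sorted(generate_configurations(), key=complexity)
for cfg in search_space[:3]:
    print(cfg)
```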