Data mining methods for gene selection on the basis of gene expression arrays

Muszyński, Michał; Osowski, Stanisław

doi:10.2478/amcs-2014-0048

Article - details

Journal

International Journal of Applied Mathematics and Computer Science

2014 | 24 | 3 | 657-668

Article title

Data mining methods for gene selection on the basis of gene expression arrays

Authors

Michał Muszyński, Stanisław Osowski

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/amc/amc24/amc24315.pdf [remote]

Publication languages

EN

Abstracts

EN

The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined together to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.
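The ranking-and-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Fisher score is assumed here in one common form, |mean_a - mean_b| / (std_a + std_b), the fusion step is assumed to be simple vote counting over top-ranked gene lists, and the function names, gene labels, and toy expression values are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def fisher_score(values_a, values_b):
    """Assumed Fisher-type score for one gene: |mean_a - mean_b| / (std_a + std_b)."""
    spread = pstdev(values_a) + pstdev(values_b)
    if spread == 0:
        return 0.0  # no within-class variation: treat the gene as uninformative
    return abs(mean(values_a) - mean(values_b)) / spread


def rank_genes(expr_a, expr_b, top_k):
    """Return the top_k gene names by Fisher score.

    expr_a and expr_b map a gene name to its expression values in class A / class B.
    """
    scores = {gene: fisher_score(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def fuse_rankings(rankings, min_votes):
    """Keep the genes that appear in at least min_votes of the selection lists."""
    votes = Counter(gene for ranking in rankings for gene in set(ranking))
    return sorted(gene for gene, count in votes.items() if count >= min_votes)


# Toy data (hypothetical): three genes, three samples per class.
cancer = {"g1": [5.0, 5.2, 4.8], "g2": [1.0, 3.0, 2.0], "g3": [2.0, 2.1, 1.9]}
normal = {"g1": [1.0, 1.2, 0.8], "g2": [1.5, 2.5, 2.0], "g3": [2.0, 2.2, 1.8]}

fisher_top = rank_genes(cancer, normal, top_k=1)
# Lists standing in for the output of other selection methods (made up for illustration).
other_methods = [["g1", "g3"], ["g3", "g1"]]
print(fuse_rankings([fisher_top] + other_methods, min_votes=2))
```

In a full pipeline the fused gene list would then define the input features of a classifier such as an SVM; that step is omitted here to keep the sketch dependency-free.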

Keywords

EN

gene expression array, gene ranking, feature selection, clusterization measures, fusion, SVM classification

Publisher

University of Zielona Gora Press

Journal

International Journal of Applied Mathematics and Computer Science

Year

2014

Volume

24

Issue

3

Pages

657-668

Dates

published

2014

received

2013-09-17

revised

2014-01-15

revised

2014-03-09

Contributors

author

Michał Muszyński

Faculty of Electrical Engineering, Warsaw University of Technology, pl. Politechniki 1, 00-661 Warsaw, Poland

author

Stanisław Osowski

Faculty of Electrical Engineering, Warsaw University of Technology, pl. Politechniki 1, 00-661 Warsaw, Poland
Faculty of Electronic Engineering, Military University of Technology, ul. Kaliskiego 2, 00-908 Warsaw, Poland

References

Baldi, P. and Long, A. (2001). A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inference of gene changes, Bioinformatics 17(4): 509-519.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology 2(3): 1-27.
De Rinaldis, E. (2007). DNA Microarrays: Current Applications, Horizon Scientific Press, Norfolk.
Duda, R., Hart, P. and Stork, D. (2003). Pattern Classification and Scene Analysis, John Wiley, New York, NY.
Eisen, M., Spellman, P. and Brown, P. (1998). Cluster analysis and display of genome wide expression patterns, Proceedings of the National Academy of Sciences 95(25): 14863-14868.
Fan, R.-E., Chen, P.-H. and Lin, C.-J. (2005). Working set selection using second order information for training SVM, Journal of Machine Learning Research 6(12): 1889-1918.
Furey, T., Cristianini, N., Duffy, N., Bednarski, D., Schummer, M. and Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data, Bioinformatics 16(10): 906-914.
Golub, T., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A. and Bloomfield, C.D. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring, Science 286(5439): 531-537.
Guyon, I. and Elisseeff, A. (2003). An introduction to variable and feature selection, Journal of Machine Learning Research 3(3): 1157-1182.
Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer classification using SVM, Machine Learning 46(1-3): 389-422.
Haykin, S. (1999). Neural Networks. A Comprehensive Foundation, 2nd Edition, Prentice-Hall, Englewood Cliffs, NJ.
Herrero, J., Valencia, A. and Dopazo, J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns, Bioinformatics 17(2): 126-136.
Hewett, R. and Kijsanayothin, P. (2008). Tumor classification ranking from microarray data, BMC Genomics 9(2): 1-11.
Huang, T.M. and Kecman, V. (2005). Gene extraction for cancer diagnosis by support vector machines-an improvement, Artificial Intelligence in Medicine 9(35): 185-194.
Huang, X. and Pan, W. (2003). Linear regression and two-class classification with gene expression data, Bioinformatics 19(16): 2072-2078.
Makinaci, M. (2007). Support vector machine approach for classification of cancerous prostate regions, World Academy of Science, Engineering and Technology 1(7): 166-169.
Matlab (2012). Matlab User Manual-Statistics Toolbox, MathWorks, Natick.
Mitsubayashi, H., Aso, S., Nagashima, T. and Okada, Y. (2008). Accurate and robust gene selection for disease classification using a simple statistics, Biomedical Informatics 3(2): 68-71.
Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E. and Golub, T. (2001). Multiclass cancer diagnosis using tumor gene expression signatures, Proceedings of the National Academy of Sciences 98(26): 15149-15154.
Sabo, K. (2014). Center-based l₁-clustering method, International Journal of Applied Mathematics and Computer Science 24(1): 151-163, DOI: 10.2478/amcs-2014-0012.
Schölkopf, B. and Smola, A. (2002). Learning with Kernels, MIT Press, Cambridge, MA.
Sprent, P. and Smeeton, N. (2007). Applied Nonparametric Statistical Methods, Chapman and Hall-CRC, Boca Raton, FL.
Świniarski, R.W. (2001). Rough sets methods in feature reduction and classification, International Journal of Applied Mathematics and Computer Science 11(3): 565-582.
Tan, P.N., Steinbach, M. and Kumar, V. (2006). Introduction to Data Mining, Pearson Education, Boston, MA.
Vanderbilt (2002). Data base of prostate cancer, Vanderbilt University, http://discover1.mc.vanderbilt.edu/discover/public/mcsvm.
Vert, J. (2007). Kernel methods in genomics and computational biology, in G. Camps-Valls, J.L. Rojo-Alvarez and M. Martinez-Ramon (Eds.), Kernel Methods in Bioengineering, Signal and Image Processing, Idea Group, London, pp. 42-64.
Wang, X. and Gotoh, O. (2009). Cancer classification using single genes, Genome Informatics 23(1): 179-188.
Wang, X. and Gotoh, O. (2010). A robust gene selection method for microarray-based cancer classification, Cancer Informatics 9(2): 15-30.
Wiliński, A. and Osowski, S. (2012). Ensemble of data mining methods for gene ranking, Bulletin of the Polish Academy of Sciences 60(3): 461-471.
Woolf, P.J. and Wang, Y. (2000). A fuzzy logic approach to analyzing gene expression data, Physiological Genomics 3(1): 9-15.
Yang, F. (2011). Robust feature selection for microarray data based on multicriterion fusion, IEEE Transactions on Computational Biology and Bioinformatics 8(4): 1080-1092.

Document type

Bibliography

Identifiers

DOI

10.2478/amcs-2014-0048

YADDA identifier

bwmeta1.element.bwnjournal-article-amcv24i3p657bwm