Quality improvement of rule-based gene group descriptions using information about GO terms importance occurring in premises of determined rules

Sikora, Marek; Gruca, Aleksandra

doi:10.2478/v10006-010-0041-3

Article details

Journal

International Journal of Applied Mathematics and Computer Science

2010 | 20 | 3 | 555-570

Article title

Quality improvement of rule-based gene group descriptions using information about GO terms importance occurring in premises of determined rules

Authors

Marek Sikora, Aleksandra Gruca

Content

Full texts:

http://matwbn.icm.edu.pl/ksiazki/amc/amc20/amc20311.pdf [remote]

Publication languages

EN

Abstracts

EN

In this paper we present a method for evaluating the importance of the GO terms that make up multi-attribute rules. The rules are generated for the purpose of biologically interpreting gene groups. Each multi-attribute rule is a combination of GO terms, and the relationships among these terms make it possible to obtain a functional description of a gene group. We present a method for evaluating the influence of a given GO term on the quality of a single rule and on the quality of a whole rule set. For each GO term, we compute how strongly it influences the quality of the generated rule set and, therefore, the quality of the resulting description. Based on the computed importance of GO terms, we propose a new rule induction algorithm that yields a more concise and more accurate description of gene groups than the description given by the initially determined rules. The resulting ranking of GO terms and the newly obtained rules provide additional information about the biological function of the genes in the analyzed group.
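The abstract does not spell out how a GO term's influence on rule quality is computed. As a minimal sketch, assuming one approach from the cited literature on condition importance (Banzhaf, 1965; Greco et al., 2007) rather than the authors' exact procedure, the GO terms in a rule premise can be treated as players in a cooperative game whose payoff is rule quality. The toy annotations, the quality measure p/P - n/N (sensitivity + specificity - 1), the summation of scores over rules, and all function names below are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each gene maps to the set of GO terms annotating it.
# "A", "B" and "C" are placeholders for real GO term identifiers.
ANNOTATIONS = {
    "g1": {"A", "B"},
    "g2": {"A", "B"},
    "g3": {"A"},
    "g4": {"B"},
    "g5": {"C"},
}
GROUP = {"g1", "g2", "g3"}  # the gene group to be described


def covered(premise, annotations):
    """Genes annotated with every GO term in the rule premise."""
    return {g for g, terms in annotations.items() if premise <= terms}


def rule_quality(premise, annotations, group):
    """Assumed quality measure: sensitivity + specificity - 1, i.e. p/P - n/N.

    An empty premise covers every gene and therefore scores 0.
    """
    cov = covered(premise, annotations)
    pos = len(group)                  # P: genes in the group
    neg = len(annotations) - pos      # N: genes outside it (assumed > 0)
    p = len(cov & group)              # covered genes from the group
    n = len(cov - group)              # covered genes outside the group
    return p / pos - n / neg


def banzhaf_importance(term, premise, annotations, group):
    """Banzhaf value of `term` in the game whose coalitions are subsets of
    the premise and whose payoff is the quality of the resulting rule."""
    others = [t for t in premise if t != term]
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            # Marginal contribution of `term` when added to coalition s.
            total += (rule_quality(s | {term}, annotations, group)
                      - rule_quality(s, annotations, group))
    return total / 2 ** len(others)


def term_ranking(rules, annotations, group):
    """Sum each term's importance over all rules containing it (an assumed
    aggregation) and return terms sorted from most to least important."""
    scores = {}
    for premise in rules:
        for term in premise:
            scores[term] = scores.get(term, 0.0) + banzhaf_importance(
                term, premise, annotations, group)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical rule set: each premise is a conjunction of GO terms.
RULES = [frozenset({"A", "B"}), frozenset({"A"})]

for term, score in term_ranking(RULES, ANNOTATIONS, GROUP):
    print(term, round(score, 3))
```

Run as is, this prints A 1.75 and B -0.083. The negative score for B means that adding B to A lowers the quality of the rule A ∧ B, which illustrates how such a ranking could point to terms whose removal gives a shorter and more accurate description. Exact Banzhaf values need 2^(k-1) marginal contributions per term in a k-term premise, so this brute-force version suits only short premises.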

Keywords

EN

decision rules; importance of rule premises; measures of rule interestingness; gene ontology; descriptions of gene groups

Publisher

University of Zielona Góra Press

Journal

International Journal of Applied Mathematics and Computer Science

Year

2010

Volume

20

Issue

3

Pages

555-570

Dates

published

2010

received

2010-01-10

revised

2010-06-01

Contributors

author

Marek Sikora

Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland

author

Aleksandra Gruca

Institute of Innovative Technologies EMAG, Leopolda 31, 40-189 Katowice, Poland

References

Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules, VLDB'94, Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, pp. 487-499.
Agresti, A. (2002). Categorical Data Analysis, Wiley Interscience, Hoboken, NJ.
Al-Shahrour, F., Minguez, P., Vaquerizas, J., Conde, L. and Dopazo, J. (2005). Babelomics: A suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments, Nucleic Acids Research 33: W460-W464.
An, A. and Cercone, N. (2001). Rule quality measures for rule induction systems: Description and evaluation, Computational Intelligence 17(3): 409-424.
Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H., Cherry, J., Davis, A., Dolinski, K., Dwight, S., Eppig, J., Harris, M., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J., Richardson, J., Ringwald, M., Rubin, G. and Sherlock, G. (2000). Gene Ontology: Tool for the unification of biology, Nature Genetics 25(1): 25-29.
Bairagi, R. and Suchindran, C. (1989). An estimator of the cutoff point maximizing sum of sensitivity and specificity, Sankhya, Indian Journal of Statistics 51(B-2): 263-269.
Baldi, P. and Hatfield, G. (2002). DNA Microarrays and Gene Expression, Cambridge University Press, Cambridge.
Banzhaf, J. (1965). Weighted voting doesn't work: A mathematical analysis, Rutgers Law Review 19(2): 317-343.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B 57(1): 289-300.
Bruckmann, A., Hensbergen, P., Balog, C., Deelder, A., de Steensma, H. and van Heusden, G. (2007). Posttranscriptional control of the Saccharomyces cerevisiae proteome by 14-3-3 proteins, Journal of Proteome Research 6(5): 1689-1699.
Brzezinska, I., Greco, S. and Słowiński, R. (2007). Mining Pareto-optimal rules with respect to support and confirmation or support and anti-support, Engineering Applications of Artificial Intelligence 20(5): 587-600.
Carmona-Saez, P., Chagoyen, M., Rodriguez, A., Trelles, O., Carazo, J. and Pascual-Montano, A. (2006). Integrated analysis of gene expression by association rules discovery, BMC Bioinformatics 7(1): 54.
Carmona-Saez, P., Chagoyen, M., Tirado, F., Carazo, J. and Pascual-Montano, A. (2007). GENECODIS: A web-based tool for finding significant concurrent annotations in gene lists, Genome Biology 8(1): R3.
Eisen, M., Spellman, P., Brown, P. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns, Proceedings of the National Academy of Sciences of the United States of America 95(25): 14863-14868.
Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). From data mining to knowledge discovery: An overview, in U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, American Association for Artificial Intelligence, Menlo Park, CA, pp. 1-34.
Fürnkranz, J. (1999). Separate-and-conquer rule learning, Artificial Intelligence Review 13(1): 3-54.
Fürnkranz, J. and Flach, P. (2005). ROC 'n' rule learning: Towards a better understanding of covering algorithms, Machine Learning 58(1): 39-77.
Greco, S., Pawlak, Z. and Słowiński, R. (2004). Can Bayesian confirmation measures be useful for rough set decision rules?, Engineering Applications of Artificial Intelligence 17(4): 345-361.
Greco, S., Słowiński, R. and Stefanowski, J. (2007). Evaluating importance of conditions in the set of discovered rules, RSFDGrC '07: Proceedings of the 11th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, Toronto, Ontario, Canada, pp. 314-321.
Gruca, A. and Sikora, M. (2009). Ontological description of gene groups by the multiattribute statistically significant logical rules, in S. Safeeullah (Ed.), Engineering the Computer Science and IT, INTECH, Vukovar, pp. 277-303.
Gruca, A., Sikora, M., Chróst, Ł. and Polański, A. (2009). RuleGO: Bioinformatical internet service system architecture, Proceedings of the 16th Conference on Computer Networks. Communications in Computer and Information Sciences, Wisła, Poland, pp. 160-167.
Grzymała-Busse, J., Stefanowski, J. and Wilk, S. (2005). A comparison of two approaches to data mining from imbalanced data, Journal of Intelligent Manufacturing 16(6): 565-573.
Grzymała-Busse, J. and Ziarko, W. (2003). Data mining based on rough sets, in J. Wang (Ed.), Data Mining: Opportunities and Challenges, IGI Publishing, Hershey, PA, pp. 142-173.
Guillet, F. and Hamilton, H. (2007). Quality Measures in Data Mining (Studies in Computational Intelligence), Springer-Verlag New York, Inc., Secaucus, NJ.
Hackenberg, M. and Matthiesen, R. (2008). AnnotationModules: A tool for finding significant combinations of multisource annotations for gene lists, Bioinformatics 24(11): 1386-1393.
Hvidsten, T., Legreid, A. and Komorowski, H. (2003). Learning rule-based models of biological process from gene expression time profiles using gene ontology, Bioinformatics 19(9): 1116-1123.
Iyer, V., Eisen, M., Ross, D., Schuler, G., Moore, T., Lee, J., Trent, J., Staudt, L., Hudson, J., Boguski, M., Lashkari, D., Shalon, D., Botstein, D. and Brown, P. (1999). The transcriptional program in the response of human fibroblasts to serum, Science 283(5398): 83-87.
Kano, M., Morishita, Y., Iwata, C., Iwasaka, S., Watabe, T., Ouchi, Y., Miyazono, K. and Miyazawa, K. (2005). VEGF-A and FGF-2 synergistically promote neoangiogenesis through enhancement of endogenous PDGF-B-PDGFRβ signaling, Journal of Cell Science 118(Pt 16): 3759-3768.
Khatri, P. and Drăghici, S. (2005). Ontological analysis of gene expression data: Current tools, limitations, and open problems, Bioinformatics 21(18): 3587-3595.
Maere, S., Heymans, K. and Kuiper, M. (2005). BiNGO: A Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks, Bioinformatics 21(16): 3448-3449.
Mata-Greenwood, E., Meyrick, B., Soifer, S., Fineman, J. and Black, S. (2003). Expression of VEGF and its receptors Flt-1 and Flk-1/KDR is altered in lambs with increased pulmonary blood flow and pulmonary hypertension, American Journal of Physiology: Lung Cellular and Molecular Physiology 285(1): L222-L231.
Michalski, R., Bratko, I. and Kubat, M. (1998). Machine Learning and Data Mining: Methods and Applications, John Wiley and Sons, New York, NY.
Midelfart, H. (2005a). Supervised learning in the gene ontology, Part I: A rough set framework, in J. Peters and A. Skowron (Eds.) Transactions on Rough Sets IV, Lecture Notes in Computer Science, Vol. 3700, Springer, Berlin/Heidelberg, pp. 69-97.
Midelfart, H. (2005b). Supervised learning in gene ontology, Part II: A bottom-up algorithm, in J. Peters and A. Skowron (Eds.) Transactions on Rough Sets IV, Lecture Notes in Computer Science, Vol. 3700, Springer, Berlin/Heidelberg, pp. 98-124.
Seghezzi, G., Patel, S., Ren, C., Gualandris, A., Pintucci, G., Robbins, E., Shapiro, R., Galloway, A., Rifkin, D. and Mignatti, P. (1998). Fibroblast growth factor-2 (FGF-2) induces vascular endothelial growth factor (VEGF) expression in the endothelial cells of forming capillaries: An autocrine mechanism contributing to angiogenesis, The Journal of Cell Biology 141(7): 1659-1673.
Sikora, M. (2006). Rule Quality Measures in Creation and Reduction of Data Rule Models, Lecture Notes in Artificial Intelligence, Vol. 4259, Springer, Heidelberg, pp. 716-725.
Sikora, M. (2010). Decision rules-based data models using TRS and NetTRS: Methods and algorithms, in J. Peters and A. Skowron (Eds.), Transactions on Rough Sets XI, Lecture Notes in Computer Science, Vol. 5946, Springer, Berlin/Heidelberg, pp. 130-160.
Stefanowski, J. and Vanderpooten, D. (2001). Induction of decision rules in classification and discovery-oriented perspectives, International Journal on Intelligent Systems 16(1): 13-27.

Document type

Bibliography

Identifiers

DOI

10.2478/v10006-010-0041-3

YADDA identifier

bwmeta1.element.bwnjournal-article-amcv20i3p555bwm
