Pipelined language model construction for Polish speech recognition
The aim of the work described in this article is to develop and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method takes into account the features and specific problems encountered in practical applications of speech recognition in Polish: rich inflection, a loose word order and the tendency for short word deletion. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical methods of LM smoothing are used to create the initial model; four of the most frequently used methods of LM construction are considered here. At the second stage, the model is extended to take into account words that co-occur indirectly in the corpus. At the third stage, the LM is modified to reduce short word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to ensure highly accurate recognition of very important utterances. The performance of the applied methods is tested in four language domains.
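The staged construction described above can be pictured as a chain of functions, each taking a model and returning a modified one. The following is only a minimal sketch under stated assumptions, not the method evaluated in the article: the model is a plain bigram table, the first stage uses add-k smoothing as a stand-in for the smoothing methods mentioned, and the unseen-word stage simply reserves a fixed probability mass. All names (`initial_bigram_model`, `add_unseen_words`, `run_pipeline`, the `mass` parameter) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List

# Assumption: a model is P(word | previous word), stored as nested dicts.
Model = Dict[str, Dict[str, float]]
Stage = Callable[[Model], Model]


def initial_bigram_model(sentences: List[List[str]], k: float = 1.0) -> Model:
    """Stand-in for stage 1: bigram model with add-k smoothing."""
    vocab = sorted({w for s in sentences for w in s})
    counts = defaultdict(Counter)
    for s in sentences:
        for prev, word in zip(s, s[1:]):
            counts[prev][word] += 1
    model: Model = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + k * len(vocab)
        model[prev] = {w: (counts[prev][w] + k) / total for w in vocab}
    return model


def add_unseen_words(new_words: List[str], mass: float = 0.05) -> Stage:
    """Stand-in for stage 4: reserve `mass` of every row for unseen words."""
    def stage(model: Model) -> Model:
        extra = [w for w in new_words if w not in model]
        if not extra:
            return model
        out: Model = {}
        for prev, dist in model.items():
            # Scale existing probabilities down, then share `mass` equally.
            row = {w: p * (1.0 - mass) for w, p in dist.items()}
            for w in extra:
                row[w] = mass / len(extra)
            out[prev] = row
        # No evidence exists for the new words as histories: use a uniform row.
        full_vocab = list(model) + extra
        for w in extra:
            out[w] = {v: 1.0 / len(full_vocab) for v in full_vocab}
        return out
    return stage


def run_pipeline(model: Model, stages: List[Stage]) -> Model:
    """Apply each stage to the model produced by the previous one."""
    for stage in stages:
        model = stage(model)
    return model


# Toy corpus, for illustration only.
corpus = [["ala", "ma", "kota"], ["ala", "ma", "psa"]]
base = initial_bigram_model(corpus)
final = run_pipeline(base, [add_unseen_words(["lek"])])
```

Because every stage has the same signature, further stages (for example, one handling indirectly co-occurring words) could be added to the list passed to `run_pipeline` without changing the others, which mirrors the incremental design the abstract describes.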