2014 | 24 | 1 | 199-212
Article title

Survival analysis on data streams: Analyzing temporal events in dynamically changing environments

Content
Title variants
Publication languages
EN
Abstracts
EN
In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for studying temporal “events” or, more specifically, the temporal distribution of event occurrences and its dependence on covariates of the data sources. To make this method applicable to data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazards model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely, the analysis of earthquake data and of Twitter data. To explain the frequency of events by the spatial location of the data source, both studies use the location as a covariate of the sources.
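The sliding-window idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a constant baseline hazard within the window, so that source i has hazard exp(b0 + b · x_i), takes the exposure of every source to be the full window length, and refits the parameters by plain gradient ascent on the resulting Poisson-type log-likelihood. The class name `SlidingWindowHazard`, its methods, and the step-size settings are hypothetical choices made only for illustration.

```python
import math
from collections import deque


class SlidingWindowHazard:
    """Constant-baseline proportional hazards model refit on a sliding window.

    Assumed model (not taken from the paper): the hazard of source i is
    exp(b0 + b . x_i) and is constant within the current window.
    """

    def __init__(self, covariates, window, lr=0.05, steps=200):
        self.covariates = covariates      # source id -> list of floats
        self.window = window              # window length in time units
        self.lr = lr                      # gradient ascent step size (illustrative)
        self.steps = steps                # ascent steps per update (illustrative)
        dim = len(next(iter(covariates.values())))
        self.beta = [0.0] * (dim + 1)     # beta[0] is the log baseline hazard
        self.events = deque()             # (timestamp, source id), oldest first

    def hazard(self, source):
        """Current hazard estimate (events per time unit) for one source."""
        x = self.covariates[source]
        z = self.beta[0] + sum(b * v for b, v in zip(self.beta[1:], x))
        return math.exp(z)

    def observe(self, timestamp, source):
        """Add one event, expire events older than the window, and refit."""
        self.events.append((timestamp, source))
        while self.events and self.events[0][0] <= timestamp - self.window:
            self.events.popleft()
        self._refit()

    def _refit(self):
        # Event counts per source inside the current window.
        counts = {s: 0 for s in self.covariates}
        for _, s in self.events:
            counts[s] += 1
        n = len(self.covariates)
        # Warm-started gradient ascent on sum_i [n_i * log(h_i) - h_i * window].
        for _ in range(self.steps):
            grad = [0.0] * len(self.beta)
            for s, x in self.covariates.items():
                resid = counts[s] - self.hazard(s) * self.window
                grad[0] += resid
                for j, v in enumerate(x):
                    grad[j + 1] += resid * v
            self.beta = [b + self.lr * g / n for b, g in zip(self.beta, grad)]


# Usage: source "B" emits events five times as often as source "A".
model = SlidingWindowHazard({"A": [0.0], "B": [1.0]}, window=10.0)
for t in range(1, 41):
    if t % 5 == 0:
        model.observe(float(t), "A")
    model.observe(float(t), "B")
print(model.hazard("A"), model.hazard("B"), model.beta)
```

Because the parameters are warm-started from the previous window, each update needs only a modest number of ascent steps. The fixed step size is tuned to the small event counts in this example and would need adjusting (or replacing with a Newton step) for busier streams.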
Year
Volume
24
Issue
1
Pages
199-212
Physical description
Dates
published
2014
received
2013-01-30
revised
2013-08-30
Authors
author
  • Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
  • Department of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Straße, 35032 Marburg, Germany
References
  • Aggarwal, C.C., Han, J., Wang, J. and Yu, P.S. (2003). A framework for clustering evolving data streams, Proceedings of the 29th International Conference on Very Large Data Bases, Berlin, Germany, pp. 81-92.
  • Allan, J., Papka, R. and Lavrenko, V. (1998). On-line new event detection and tracking, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), Melbourne, Australia, pp. 37-45.
  • Amati, G., Amodeo, G. and Gaibisso, C. (2012). Survival analysis for freshness in microblogging search, Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM-2012), Maui, HI, USA, pp. 2483-2486.
  • Amodeo, G., Blanco, R. and Brefeld, U. (2011). Hybrid models for future event prediction, Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM-2011), Glasgow, UK, pp. 1981-1984.
  • Babcock, B., Babu, S., Datar, M., Motwani, R. and Widom, J. (2002). Models and issues in data stream systems, Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Madison, WI, USA, pp. 1-16.
  • Beringer, J. and Hüllermeier, E. (2006). Online clustering of parallel data streams, Data and Knowledge Engineering 58(2): 180-204.
  • Bottou, L. (1998). Online algorithms and stochastic approximations, in D. Saad (Ed.), Online Learning and Neural Networks, Cambridge University Press, Cambridge.
  • Chen, G., Wu, X. and Zhu, X. (2005). Sequential pattern mining in multiple streams, Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), Houston, TX, USA, pp. 585-588.
  • Cheon, S.-P., Kim, S., Lee, S.-Y. and Lee, C.-B. (2009). Bayesian networks based rare event prediction with sensor data, Knowledge-Based Systems 22(5): 336-343.
  • Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Cetintemel, U., Xing, Y. and Zdonik, S. (2003). Scalable distributed stream processing, Proceedings of CIDR-03: 1st Biennial Conference on Innovative Database Systems, Asilomar, CA, USA.
  • Considine, J., Li, F., Kollios, G. and Byers, J. (2004). Approximate aggregation techniques for sensor databases, ICDE-04: 20th IEEE International Conference on Data Engineering, Boston, MA, USA, pp. 449-460.
  • Cormode, G. and Muthukrishnan, S. (2005). What's hot and what's not: Tracking most frequent items dynamically, ACM Transactions on Database Systems 30(1): 249-278.
  • Cox, D. (1972). Regression models and life tables, Journal of the Royal Statistical Society B 34(2): 187-220.
  • Cox, D. R. and Oakes, D. (1984). Analysis of Survival Data, Chapman & Hall, London.
  • Das, A., Gehrke, J. and Riedewald, M. (2003). Approximate join processing over data streams, Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA, USA, pp. 40-51.
  • Domingos, P. and Hulten, G. (2003). A general framework for mining massive data streams, Journal of Computational and Graphical Statistics 12(4): 945-949.
  • Gaber, M.M., Zaslavsky, A. and Krishnaswamy, S. (2005). Mining data streams: A review, ACM SIGMOD Record 34(1): 18-26.
  • Gama, J. (2012). A survey on learning from data streams: Current and future trends, Progress in Artificial Intelligence 1(1): 45-55.
  • Gama, J. and Gaber, M.M. (2007). Learning from Data Streams, Springer-Verlag, Berlin/New York, NY.
  • Garofalakis, M., Gehrke, J. and Rastogi, R. (2002). Querying and mining data streams: You only get one look, Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, Madison, WI, USA, pp. 635-635.
  • Golab, L. and Özsu, M.T. (2003). Issues in data stream management, ACM SIGMOD Record 32(2): 5-14.
  • Hulten, G., Spencer, L. and Domingos, P. (2001). Mining time-changing data streams, Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, pp. 97-106.
  • Ikonomovska, E., Gama, J. and Dzeroski, S. (2011). Learning model trees from evolving data streams, Data Mining and Knowledge Discovery 23(1): 128-168.
  • Krizanovic, K., Galic, Z. and Baranovic, M. (2011). Data types and operations for spatio-temporal data streams, IEEE International Conference on Mobile Data Management (MDM), Luleå, Sweden, pp. 11-14.
  • Li, R., Lei, K.H., Khadiwala, R. and Chang, K.C.-C. (2012). Tedas: A twitter-based event detection and analysis system, Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA, pp. 1273-1276.
  • Oliveira, M. and Gama, J. (2012). A framework to monitor clusters evolution applied to economy and finance problems, Intelligent Data Analysis 16(1): 93-111.
  • Radinsky, K. and Horvitz, E. (2013). Mining the web to predict future events, Proceedings of the 6th ACM International Conference on Web Search and Data Mining (WSDM 2013), Rome, Italy, pp. 255-264.
  • Sakaki, T., Okazaki, M. and Matsuo, Y. (2013). Tweet analysis for real-time event detection and earthquake reporting system development, IEEE Transactions on Knowledge and Data Engineering 25(4): 919-931.
  • Weng, J. and Lee, B.-S. (2011). Event detection in twitter, Proceedings of the 5th International Conference on Weblogs and Social Media (ICWSM 2011), Barcelona, Spain.
  • Yang, Y., Pierce, T. and Carbonell, J.G. (1998). A study of retrospective and on-line event detection, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1998), Melbourne, Australia, pp. 28-36.
  • Zadeh, L. (1965). Fuzzy sets, Information and Control 8(3): 338-353.
  • Zupan, B., Demšar, J., Kattan, M.W., Beck, J.R. and Bratko, I. (2000). Machine learning for survival analysis: A case study on recurrence of prostate cancer, Artificial Intelligence in Medicine 20(1): 59-75.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.bwnjournal-article-amcv24i1p199bwm