2013 | 1 | 65-93
Article title

Prediction of time series by statistical learning: general losses and fast rates

We establish rates of convergence in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence √(d/n) for the Gibbs estimator under the absolute loss were given in a previous work [7], where n is the sample size and d the dimension of the set of predictors. Under the same weak dependence conditions, we extend this result to any convex Lipschitz loss function. We also identify a condition on the parameter space that ensures similar rates for the classical penalized ERM procedure. We apply this method to quantile forecasting of the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and for uniformly mixing processes, we prove that the Gibbs estimator actually achieves fast rates of convergence d/n. We discuss the optimality of these different rates, pointing out references to lower bounds when they are available. In particular, these results generalize the results of [29] on sparse regression estimation to some autoregression.
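As an illustration only, the following minimal sketch shows what a Gibbs (exponentially weighted) estimator under a convex Lipschitz loss can look like in the simplest case. Everything specific here is an assumption made for the example and is not taken from the article: the one-parameter linear predictor theta * x_{t-1}, the finite grid `thetas` standing in for a uniform prior, the simulated AR(1) series, and the inverse temperature `lam` of order √n. The quantile (pinball) loss is used because it is the kind of loss relevant to quantile forecasting.

```python
import math
import random


def pinball_loss(y, y_hat, tau):
    """Quantile (pinball) loss at level tau; convex and Lipschitz in y_hat."""
    diff = y - y_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def empirical_risk(series, theta, tau):
    """Mean pinball loss of the one-step predictor y_hat_t = theta * x_{t-1}."""
    losses = [pinball_loss(series[t], theta * series[t - 1], tau)
              for t in range(1, len(series))]
    return sum(losses) / len(losses)


def gibbs_weights(risks, lam):
    """Normalized weights proportional to exp(-lam * risk), uniform prior."""
    r_min = min(risks)  # shift by the minimum for numerical stability
    raw = [math.exp(-lam * (r - r_min)) for r in risks]
    total = sum(raw)
    return [w / total for w in raw]


def gibbs_estimate(series, thetas, tau, lam):
    """Mean of the Gibbs distribution over the (hypothetical) grid thetas."""
    risks = [empirical_risk(series, th, tau) for th in thetas]
    weights = gibbs_weights(risks, lam)
    theta_hat = sum(w * th for w, th in zip(weights, thetas))
    return theta_hat, weights


# Simulated AR(1) series with coefficient 0.5 (assumption, for illustration).
rng = random.Random(0)
x = [0.0]
for _ in range(500):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))

thetas = [i / 10 for i in range(-9, 10)]  # grid playing the role of the prior
theta_hat, weights = gibbs_estimate(x, thetas, tau=0.5, lam=math.sqrt(len(x)))
```

With `tau=0.5` the pinball loss is half the absolute loss, so the weights concentrate on values of theta that predict the conditional median well. The choice of `lam` governs how sharply they concentrate and is left here as a tunable assumption.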
  • University College Dublin, School of Mathematical Sciences
  • INSIGHT Centre for Data Analytics
  • Université de Cergy, Laboratoire Analyse Géométrie Modélisation
  • Université Paris-Dauphine, CEREMADE
  • [1] A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Trans. Inform. Theory 59 (2011), no. 1, 573–587.
  • [2] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory (B. N. Petrov and F. Csaki, eds.), Budapest: Akademia Kiado, 1973, pp. 267–281.
  • [3] P. Alquier and P. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electron. J. Stat. 5 (2011), 127–145.
  • [4] P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Math. Methods Statist. 17 (2008), no. 4, 279–304.
  • [5] K. B. Athreya and S. G. Pantula, Mixing properties of Harris chains and autoregressive processes, J. Appl. Probab. 23 (1986), no. 4, 880–892. MR 867185 (88c:60127)
  • [6] J.-Y. Audibert, Fast rates in statistical inference through aggregation, Ann. Statist. 35 (2007), no. 2, 1591–1646.
  • [7] P. Alquier and O. Wintenberger, Model selection for weakly dependent time series forecasting, Bernoulli 18 (2012), no. 3, 883–913.
  • [8] G. Biau, O. Biau, and L. Rouvière, Nonparametric forecasting of the manufacturing output growth with firm-level survey data, Journal of Business Cycle Measurement and Analysis 3 (2008), 317–332.
  • [9] A. Belloni and V. Chernozhukov, L1-penalized quantile regression in high-dimensional sparse models, Ann. Statist. 39 (2011), no. 1, 82–130.
  • [10] P. Brockwell and R. Davis, Time series: Theory and methods (2nd edition), Springer, 2009.
  • [11] E. Britton, P. Fisher, and J. Whitley, The inflation report projections: Understanding the fan chart, Bank of England Quarterly Bulletin 38 (1998), no. 1, 30–37.
  • [12] L. Birgé and P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001), no. 3, 203–268.
  • [13] G. Biau and B. Patra, Sequential quantile prediction of time series, IEEE Trans. Inform. Theory 57 (2011), 1664–1674.
  • [14] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, Aggregation for Gaussian regression, Ann. Statist. 35 (2007), no. 4, 1674–1697.
  • [15] O. Catoni, A PAC-Bayesian approach to adaptative classification, preprint (2003).
  • [16] O. Catoni, Statistical learning theory and stochastic optimization, Springer Lecture Notes in Mathematics, 2004.
  • [17] O. Catoni, PAC-Bayesian supervised classification (the thermodynamics of statistical learning), Lecture Notes-Monograph Series, vol. 56, IMS, 2007.
  • [18] N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, New York, 2006.
  • [19] L. Clavel and C. Minodier, A monthly indicator of the French business climate, Documents de Travail de la DESE, 2009.
  • [20] M. Cornec, Constructing a conditional GDP fan chart with an application to French business survey data, 30th CIRET Conference, New York, 2010.
  • [21] N. V. Cuong, L. S. Tung Ho, and V. Dinh, Generalization and robustness of batched weighted average algorithm with v-geometrically ergodic Markov data, Proceedings of ALT’13 (S. Jain, R. Munos, F. Stephan, and T. Zeugmann, eds.), Springer, 2013, pp. 264–278.
  • [22] J. C. Duchi, A. Agarwal, M. Johansson, and M. I. Jordan, Ergodic mirror descent, SIAM J. Optim. 22 (2012), no. 4, 1549–1578.
  • [23] J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur, Weak dependence, examples and applications, Lecture Notes in Statistics, vol. 190, Springer-Verlag, Berlin, 2007.
  • [24] M. Devilliers, Les enquêtes de conjoncture, Archives et Documents, no. 101, INSEE, 1984.
  • [25] E. Dubois and E. Michaux, Étalonnages à l’aide d’enquêtes de conjoncture: de nouveaux résultats, Économie et Prévision, no. 172, INSEE, 2006.
  • [26] P. Doukhan, Mixing, Lecture Notes in Statistics, Springer, New York, 1994.
  • [27] K. Dowd, The inflation fan charts: An evaluation, Greek Economic Review 23 (2004), 99–111.
  • [28] A. Dalalyan and J. Salmon, Sharp oracle inequalities for aggregation of affine estimators, Ann. Statist. 40 (2012), no. 4, 2327–2355.
  • [29] A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Mach. Learn. 72 (2008), 39–61.
  • [30] F. X. Diebold, A. S. Tay, and K. F. Wallis, Evaluating density forecasts of inflation: the survey of professional forecasters, Discussion Paper No. 48, ESRC Macroeconomic Modelling Bureau, University of Warwick and Working Paper No. 6228, National Bureau of Economic Research, Cambridge, Mass., 1997.
  • [31] M. D. Donsker and S. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. III, Comm. Pure Appl. Math. 28 (1976), 389–461.
  • [32] P. Doukhan and O. Wintenberger, Weakly dependent chain with infinite memory, Stochastic Process. Appl. 118 (2008), no. 11, 1997–2013.
  • [33] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of variance of United Kingdom inflation, Econometrica 50 (1982), 987–1008.
  • [34] C. Francq and J.-M. Zakoian, GARCH models: Structure, statistical inference and financial applications, Wiley-Blackwell, 2010.
  • [35] S. Gerchinovitz, Sparsity regret bounds for individual sequences in online linear regression, Proceedings of COLT’11, 2011.
  • [36] J. Hamilton, Time series analysis, Princeton University Press, 1994.
  • [37] H. Hang and I. Steinwart, Fast learning from α-mixing observations, Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2012.
  • [38] I. A. Ibragimov, Some limit theorems for stationary processes, Theory Probab. Appl. 7 (1962), no. 4, 349–382.
  • [39] A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, Recursive aggregation of estimators by the mirror descent algorithm with averaging, Probl. Inf. Transm. 41 (2005), no. 4, 368–384.
  • [40] A. B. Juditsky, P. Rigollet, and A. B. Tsybakov, Learning by mirror averaging, Ann. Statist. 36 (2008), no. 5, 2183–2206.
  • [41] R. Koenker and G. Bassett Jr., Regression quantiles, Econometrica 46 (1978), 33–50.
  • [42] R. Koenker, Quantile regression, Cambridge University Press, Cambridge, 2005.
  • [43] S. Kullback, Information theory and statistics, Wiley, New York, 1959.
  • [44] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, Information and Computation 108 (1994), 212–261.
  • [45] P. Massart, Concentration inequalities and model selection - Ecole d’été de probabilités de Saint-Flour XXXIII - 2003, Lecture Notes in Mathematics - J. Picard Editor, vol. 1896, Springer, 2007.
  • [46] D. A. McAllester, PAC-Bayesian model averaging, Procs. of the 12th Annual Conf. On Computational Learning Theory, Santa Cruz, California (Electronic), ACM, New-York, 1999, pp. 164–170.
  • [47] R. Meir, Nonparametric time series prediction through adaptive model selection, Mach. Learn. 39 (2000), 5–34.
  • [48] C. Minodier, Avantages comparés des séries premières valeurs publiées et des séries des valeurs révisées, Documents de Travail de la DESE, 2010.
  • [49] D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Trans. Inform. Theory 44 (1998), no. 1, 117–133.
  • [50] S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, Communications and Control Engineering Series, Springer-Verlag London Ltd., London, 1993. MR 1287609 (95j:60103)
  • [51] A. Nemirovski, Topics in nonparametric statistics, Lectures on Probability Theory and Statistics - Ecole d’été de probabilités de Saint-Flour XXVIII (P. Bernard, ed.), Springer, 2000, pp. 85–277.
  • [52] R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2008.
  • [53] E. Rio, Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, C. R. Math. Acad. Sci. Paris 330 (2000), 905–908.
  • [54] P.-M. Samson, Concentration of measure inequalities for Markov chains and φ-mixing processes, Ann. Probab. 28 (2000), no. 1, 416–461.
  • [55] I. Steinwart and A. Christmann, Fast learning from non-i.i.d. observations, Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds.), 2009, pp. 1768–1776.
  • [56] I. Steinwart, D. Hush, and C. Scovel, Learning from dependent observations, J. Multivariate Anal. 100 (2009), 175–194.
  • [57] Y. Seldin, F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, J. Peters, and P. Auer, PAC-Bayesian inequalities for martingales, IEEE Trans. Inform. Theory 58 (2012), no. 12, 7086–7093.
  • [58] A. Sanchez-Perez, Time series prediction via aggregation: an oracle bound including numerical cost, Preprint arXiv:1311.4500, 2013.
  • [59] G. Stoltz, Agrégation séquentielle de prédicteurs : méthodologie générale et applications à la prévision de la qualité de l’air et à celle de la consommation électrique, Journal de la SFDS 151 (2010), no. 2, 66–106.
  • [60] J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayes estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT’97, ACM, 1997, pp. 2–9.
  • [61] N. N. Taleb, Black swans and the domains of statistics, Amer. Statist. 61 (2007), no. 3, 198–200.
  • [62] A. S. Tay and K. F. Wallis, Density forecasting: a survey, J. Forecast 19 (2000), 235–254.
  • [63] V. Vapnik, The nature of statistical learning theory, Springer, 1999.
  • [64] V. G. Vovk, Aggregating strategies, Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT), 1990, pp. 372–283.
  • [65] O. Wintenberger, Deviation inequalities for sums of weakly dependent time series, Electron. Commun. Probab. 15 (2010), 489–503.
  • [66] Y.-L. Xu and D.-R. Chen, Learning rate of regularized regression for exponentially strongly mixing sequence, J. Statist. Plann. Inference 138 (2008), 2180–2189.
  • [67] B. Zou, L. Li, and Z. Xu, The generalization performance of ERM algorithm with strongly mixing observations, Mach. Learn. 75 (2009), 275–295.