An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Arciniegas-Alarcón, Sergio; García-Peña, Marisol; Krzanowski, Wojtek Janusz; Santos Dias, Carlos Tadeu dos

doi:10.2478/bile-2014-0006

Article - details

Journal

Biometrical Letters

2014 | 51 | 2 | 75-88

Article title

An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Authors

Sergio Arciniegas-Alarcón, Marisol García-Peña, Wojtek Janusz Krzanowski, Carlos Tadeu dos Santos Dias

Content

Full texts:

http://www.degruyter.com/view/j/bile.2014.51.issue-2/bile-2014-0006/bile-2014-0006.pdf [remote]

Title variants

Publication languages

EN

Abstracts

EN

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
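
The idea of imputing by a lower-rank SVD approximation, and of scoring the result with a normalised root mean squared error, can be illustrated with a minimal sketch. This is not the authors' algorithm: it leaves out the regression step, the cross-validated weights and multiple imputation, and uses a plain iterative rank-r SVD fill instead. The function names `svd_impute` and `nrmse`, the column-mean starting values, the toy data and the choice to normalise by the standard deviation of the deleted true values are all assumptions made for illustration.

```python
import numpy as np

def svd_impute(X, rank=1, n_iter=200, tol=1e-8):
    """Fill NaN cells of X by iterating a rank-`rank` SVD approximation.

    Hypothetical helper: a generic iterative SVD fill, not the
    regression-plus-SVD scheme described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumed starting values: column (environment) means of observed cells.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Observed cells are kept; only missing cells are updated.
        new = np.where(missing, approx, X)
        change = np.max(np.abs(new - filled)) if missing.any() else 0.0
        filled = new
        if change < tol:
            break
    return filled

def nrmse(imputed, truth, missing):
    """RMSE over the deleted cells, divided by the standard deviation
    of the true deleted values (assumed normalisation)."""
    err = imputed[missing] - truth[missing]
    return np.sqrt(np.mean(err ** 2)) / np.std(truth[missing])

# Toy genotype-by-environment table with purely multiplicative structure.
genotype = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
environment = np.array([1.0, 1.5, 2.0, 2.5])
truth = np.outer(genotype, environment)

# Delete three cells, then impute and score them.
X = truth.copy()
missing = np.zeros_like(truth, dtype=bool)
missing[[0, 2, 5], [1, 3, 0]] = True
X[missing] = np.nan

filled = svd_impute(X, rank=1)
score = nrmse(filled, truth, missing)
```

Because the toy table is exactly rank one, the rank-one fill should recover the deleted cells almost exactly; with real trial data the rank would itself have to be chosen, for example by cross-validation.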

Keywords

EN

cross-validation; singular value decomposition; imputation; genotype-by-environment interaction; weights; missing values

Publisher

De Gruyter Open

Journal

Biometrical Letters

Year

2014

Volume

51

Issue

2

Pages

75-88

Physical description

Dates

published

2014-12-01

online

2014-12-20

Contributors

author

Sergio Arciniegas-Alarcón

sergio.arciniegas@gmail.com

Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil

author

Marisol García-Peña

Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil

author

Wojtek Janusz Krzanowski

sergio.arciniegas@gmail.com

College of Engineering, Mathematics and Physical Sciences, Harrison Building, University of Exeter, North Park Road, Exeter, EX4 4QF, United Kingdom

author

Carlos Tadeu dos Santos Dias

Departamento de Ciências Exatas, Universidade de São Paulo/ESALQ, Cx.P.09, CEP.13418-900, Piracicaba, SP - Brasil

References

Arciniegas-Alarcón S., García-Peña M., Dias C.T.S. (2011): Data imputation in trials with genotype×environment interaction. Interciencia 36(6): 444-449.
Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010): An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47(1): 1-14.
Bergamo G.C., Dias C.T.S., Krzanowski W.J. (2008): Distribution-free multiple imputation in an interaction matrix through singular value decomposition. Scientia Agricola 65(4): 422-427.
Calinski T., Czajka S., Kaczmarek Z., Krajewski P., Pilarczyk W. (2009): Analyzing the Genotype-by-Environment Interactions Under a Randomization-Derived Mixed Model. Journal of Agricultural, Biological and Environmental Statistics 14(2): 224-241.
Ching W., Li L., Tsing N., Tai C., Ng T. (2010): A weighted local least squares imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4(3): 331-347.
Denis J.B., Baril C.P. (1992): Sophisticated models with numerous missing values: the multiplicative interaction model as an example. Biuletyn Oceny Odmian 24-25: 33-45.
Di Ciaccio A. (2011): Bootstrap and nonparametric predictors to impute missing data. In: B. Fichet et al. (eds.), Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag Berlin Heidelberg.
Dias C.T.S., Krzanowski W.J. (2003): Model selection and cross validation in additive main effect and multiplicative interaction models. Crop Science 43: 865-873.
Gabriel K.R. (2002): Le biplot - outil d’exploration de données multidimensionelles. Journal de la Société Française de Statistique 143(3-4): 5-55.
García-Peña M., Dias C.T.S. (2009): Analysis of bivariate additive models with multiplicative interaction (AMMI). Biometric Brazilian Journal 27(4): 586-602.
Gauch H.G. (2013): A simple protocol for AMMI analysis of yield trials. Crop Science 53: 1860-1869.
Gauch H.G., Zobel R.W. (1990): Imputing missing yield trial data. Theoretical and Applied Genetics 79: 753-761.
Josse J., Pagès J., Husson F. (2011): Multiple imputation in PCA. Advances in data analysis and classification 5(3): 231-246.
Josse J., Husson F. (2012): Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique 153(2): 79-99.
Krzanowski W.J. (1988): Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biometrical Letters XXV(1-2): 31-39.
Krzanowski W.J. (2000): Principles of multivariate analysis: A user’s perspective. Oxford University Press, Oxford.
Kroonenberg P.M. (2008): Applied multiway data analysis. John Wiley & Sons.
Kumar A., Verulkar S.B., Mandal N.P., Variar M., Shukla V.D., Dwivedi J.L., Singh B.N., Singh O.N., Swain P., Mall A.K., Robin S., Chandrababu R., Jain A., Haefele S.M., Piepho H.P., Raman A. (2012): High-yielding, drought-tolerant, stable rice genotypes for the shallow rainfed lowland drought-prone ecosystem. Field Crops Research 133: 37-47.
Little R., Rubin D. (2002): Statistical analysis with missing data. 2nd ed. John Wiley & Sons, New York, NY.
Paderewski J., Rodrigues P.C. (2014): The usefulness of EM-AMMI to study the influence of missing data pattern and application to Polish post-registration winter wheat data. Australian Journal of Crop Science 8: 640-645.
Piepho H.P. (1995): Methods for estimating missing genotype-location combinations in multilocation trials - an empirical comparison. Informatik Biometrie und Epidemiologie in Medizin und Biologie 26: 335-349.
Piepho H.P., Möhring J. (2006): Selection in cultivar trials - Is it ignorable? Crop Science 46: 192-201.
R Development Core Team (2013): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
Rodrigues P., Pereira D.G.S., Mexia J.T. (2011): A comparison between joint regression analysis and the additive main and multiplicative interaction model: the robustness with increasing amounts of missing data. Scientia Agricola 68(6): 679-686.
Rubin D.B. (1978): Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Survey Research Methods Section Of The American Statistical Association. Proceedings: 20-34.
Sabaghnia N., Karimizadeh R., Mohammadi M. (2012): Model selection in additive main effect and multiplicative interaction model in durum wheat. Genetika 44(2): 325-339.
Schafer J.L., Graham J.W. (2002): Missing data: our view of the state of the art. Psychological Methods 7(2): 147-177.
van Buuren S. (2012): Flexible imputation of missing data. CRC press.
Wright K. (2012): agridat: Agricultural datasets. R package version 1.4. http://CRAN.R-project.org/package=agridat
Yan W., Pageau D., Frégeau-Reid J., Durand J. (2011): Assessing the representativeness and repeatability of test locations for genotype evaluation. Crop Science 51: 1603-1610.
Yan W. (2013): Biplot analysis of incomplete two-way data. Crop Science 53(1): 48-57.

Document type

Bibliography

Identifiers

DOI

10.2478/bile-2014-0006

YADDA identifier

bwmeta1.element.doi-10_2478_bile-2014-0006
