imputation {geneticae}    R Documentation

Imputation of missing cells in two-way data sets

Description

Missing values are not allowed by the AMMI or GGE methods. This function provides several methods to impute missing observations in data from multi-environment trials, so that these models can then be fitted to the completed data.

Usage

imputation(
  Data,
  genotype = "gen",
  environment = "env",
  response = "yield",
  rep = NULL,
  type = "EM-AMMI",
  nPC = 2,
  initial.values = NA,
  precision = 0.01,
  maxiter = 1000,
  change.factor = 1,
  simplified.model = FALSE,
  scale = TRUE,
  method = "EM",
  row.w = NULL,
  coeff.ridge = 1,
  seed = NULL,
  nb.init = 1,
  Winf = 0.8,
  Wsup = 1
)

Arguments

Data

data frame containing genotypes, environments, replications (if any) and the phenotypic trait of interest. Other variables that are not used in the analysis may be present.

genotype

column name containing genotypes.

environment

column name containing environments.

response

column name containing the phenotypic trait.

rep

column name containing replications. If this argument is NULL, there are no replications available in the data. Defaults to NULL.

type

imputation method. One of "EM-AMMI", "Gabriel", "WGabriel" or "EM-PCA". Defaults to "EM-AMMI".

nPC

number of components used to predict the missing values. Defaults to 2.

initial.values

initial values of the missing cells. It can be a single value or a vector of length equal to the number of missing cells (starting from the missing values in the first column). If omitted, the initial values will be obtained by the main effects from the corresponding model, that is, by the grand mean of the observed data increased (or decreased) by row and column main effects.

precision

threshold for assessing convergence.

maxiter

maximum number of iterations for the algorithm.

change.factor

When 'change.factor' is equal to 1, the previous approximation is replaced by the new values of the missing cells (standard EM-AMMI algorithm). When 'change.factor' is less than 1, the new approximations are computed and the values of the missing cells are moved in the direction of this new approximation, but by a smaller amount. This can be useful if the changes are cyclic and convergence cannot otherwise be reached. Usually, this argument should not affect the final outcome (that is, the imputed values) as compared to the default value of 'change.factor' = 1.

simplified.model

the AMMI model contains the general mean, the effects of rows and columns, and interaction terms. In step 2, the EM-AMMI algorithm calculates the current effects of rows and columns; these effects change from iteration to iteration because the initially empty cells are filled with different values in each iteration. In step 3, EM-AMMI uses those effects to re-estimate the cells marked as missing (this is the default, simplified.model=FALSE). It is, however, possible that this procedure will not converge. The user is therefore offered a simplified EM-AMMI procedure that calculates the general mean and the effects of rows and columns only in the first iteration and reuses these values in the following iterations (simplified.model=TRUE). In this simplified procedure the initial values affect the outcome (whereas EM-AMMI results usually do not depend on the initial values). For the simplified procedure the number of iterations to convergence is usually smaller and, furthermore, convergence will be reached even in some cases where the regular procedure fails. If the regular procedure does not converge for the standard initial values, the simplified model can be used to determine a better set of initial values.

scale

boolean. TRUE by default, which gives the same weight to each variable.

method

"EM" by default, or "Regularized".

row.w

row weights (by default, a vector of 1 for uniform row weights)

coeff.ridge

1 by default to perform the regularized imputePCA algorithm; useful only if method="Regularized". Other regularization terms can be implemented by setting the value to less than 1 in order to regularize less (to get closer to the results of the EM method).

seed

integer. By default, seed = NULL implies that missing values are initially imputed by the mean of each variable. Other values lead to a random initialization.

nb.init

integer corresponding to the number of random initializations; the first initialization is the initialization with the mean imputation

Winf

lower weight.

Wsup

upper weight.
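The default starting values described for initial.values (the grand mean of the observed data adjusted by row and column main effects) can be illustrated with a small base R sketch. The toy matrix and the object names (Y, Y0, grand_mean, row_eff, col_eff) are assumptions made for this illustration, and the simple marginal means used here are only an approximation; the package may estimate the main effects differently.

```r
# Toy genotype-by-environment table with two missing cells (made-up values).
Y <- matrix(c(10, 12, NA,
              11, NA, 15,
              13, 14, 16,
               9, 11, 12),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("G", 1:4), paste0("E", 1:3)))
miss <- is.na(Y)

# Main effects estimated from the observed cells only.
grand_mean <- mean(Y, na.rm = TRUE)
row_eff <- rowMeans(Y, na.rm = TRUE) - grand_mean
col_eff <- colMeans(Y, na.rm = TRUE) - grand_mean

# Additive value for every cell: grand mean + row effect + column effect.
additive <- grand_mean + outer(row_eff, col_eff, "+")

# Fill only the missing cells; observed cells are left untouched.
Y0 <- Y
Y0[miss] <- additive[miss]
```

Here Y0[miss] lists the filled cells column by column, which is the ordering described for a vector passed to initial.values.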

Details

Often, multi-environment experiments are unbalanced because several genotypes are not tested in some environments. Several methodologies have been proposed to deal with this lack of balance caused by missing values. Some of them are available in this function through the type argument: "EM-AMMI", "Gabriel", "WGabriel" and "EM-PCA".
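To make the roles of nPC, precision, maxiter, change.factor and simplified.model more concrete, the following is a minimal base R sketch of an EM-AMMI style iteration. It is not the code used by imputation(): the function name em_ammi_sketch, the helper additive_part, the convergence rule (largest change in a missing cell below precision) and the toy data are all assumptions made for illustration.

```r
em_ammi_sketch <- function(Y, nPC = 2, precision = 0.01, maxiter = 1000,
                           change.factor = 1, simplified.model = FALSE) {
  miss <- is.na(Y)
  if (!any(miss)) return(list(X = Y, iterations = 0L))

  # Additive part of a table: grand mean + row effect + column effect.
  additive_part <- function(M) {
    gm <- mean(M, na.rm = TRUE)
    gm + outer(rowMeans(M, na.rm = TRUE) - gm,
               colMeans(M, na.rm = TRUE) - gm, "+")
  }

  # Starting values: additive prediction from the observed cells.
  X <- Y
  X[miss] <- additive_part(Y)[miss]
  additive <- additive_part(X)

  for (iter in seq_len(maxiter)) {
    # Regular procedure: re-estimate the main effects in every iteration.
    # Simplified procedure: keep those from the first iteration.
    if (!simplified.model && iter > 1) additive <- additive_part(X)

    # Interaction approximated by the first nPC singular components.
    s <- svd(X - additive)
    k <- seq_len(min(nPC, length(s$d)))
    interaction <- s$u[, k, drop = FALSE] %*%
      diag(s$d[k], nrow = length(k)) %*%
      t(s$v[, k, drop = FALSE])

    # Move the missing cells a fraction change.factor towards the new fit.
    step <- change.factor * ((additive + interaction)[miss] - X[miss])
    X[miss] <- X[miss] + step
    if (max(abs(step)) < precision) break
  }
  list(X = X, iterations = iter)
}

# Toy data (made up): a rank-one table plus noise, with two cells removed.
set.seed(1)
Y <- outer(1:6, 1:4) + matrix(rnorm(24, sd = 0.1), 6, 4)
Y[2, 3] <- NA
Y[5, 1] <- NA
fit <- em_ammi_sketch(Y, nPC = 1)
fit_damped <- em_ammi_sketch(Y, nPC = 1, change.factor = 0.5)
fit_simple <- em_ammi_sketch(Y, nPC = 1, simplified.model = TRUE)
```

In this sketch a change.factor below 1 only shortens each update step, which mirrors the description of that argument: the iteration takes smaller moves but aims at the same kind of fixed point.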

Value

imputed data matrix

References

Paderewski, J. (2013). An R function for imputation of missing cells in two-way data sets by EM-AMMI algorithm. Communications in Biometry and Crop Science 8, 60–69.

Josse J., Husson F. (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software 70, 1–31.

Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010). An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47, 1–14.

Arciniegas-Alarcón S., García-Peña M., Krzanowski W.J., Dias C.T.S. (2014). An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects. Biometrical Letters 51, 75–88.

Examples

library(geneticae)
# Data without replications
library(agridat)
data(yan.winterwheat)

# Introduce missing values in the response column (selected by name)
yan.winterwheat[1:3, "yield"] <- NA

imputation(yan.winterwheat, genotype = "gen", environment = "env",
           response = "yield", type = "EM-AMMI")

# Data with replications
data(plrv)
# Introduce missing values in the response column (selected by name)
plrv[1:3, "Yield"] <- NA
imputation(plrv, genotype = "Genotype", environment = "Locality",
           response = "Yield", rep = "Rep", type = "EM-AMMI")


[Package geneticae version 0.4.0 Index]