R: Empirical Best Linear Unbiased Predictor

EBLUP {qape}

R Documentation

Empirical Best Linear Unbiased Predictor

Description

The function computes the value of the EBLUP of the linear combination of the variable of interest under the linear mixed model estimated using REML.

Usage

EBLUP(YS, fixed.part, random.part, reg, con, gamma, weights, estMSE)

Arguments

`YS`	values of the variable of interest observed in the sample.
`fixed.part`	fixed-effects terms declared as in lmer object.
`random.part`	random-effects terms declared as in lmer object.
`reg`	the population matrix of auxiliary variables named in fixed.part and random.part.
`con`	the population 0-1 vector with 1s for elements in the sample and 0s for elements which are not in the sample.
`gamma`	the population vector which transpose multiplied by the population vector of the variable of interest gives the predicted characteristic. For example, if gamma is the population vector of 1s, the sum of the values of the variable of interest in the whole dataset is predicted.
`weights`	the population vector of weights, defined as in lmer object, allowing to include heteroscedasticity of random components in the mixed linear model.
`estMSE`	logical. If TRUE, the naive MSE estimator and its components are computed.

Details

The function computes the value of the EBLUP of the linear combination of the variable of interest based on the formula (21) in Zadlo (2017) (see Remark 5.1 in the paper for further explanations). Predicted values for unsampled population elements in subsets for which random effects are not observed in the sample are computed based only on fixed effects. The naive MSE estimator of the EBLUP, which is the sum of two components given by equations (31) and (32) in Zadlo (2017) p. 8094, where unknown parameters are replaced by their REML estimates, is also computed. The naive MSE estimator ignores the variability of EBLUP resulting from the estimation of variance components.

Value

The function returns a list with the following objects:

`fixed.part`	the fixed part of the formula of model.
`random.part`	the random part of the formula of model.
`thetaP`	the value of the predictor.
`beta`	the estimated vector of fixed effects.
`Xbeta`	the product of two matrices: the population model matrix of auxiliary variables X and the estimated vector of fixed effects.
`sigma2R`	the estimated variance parameter of the distribution of random components.
`R`	the estimated covariance matrix of random components for sampled elements.
`G`	the estimated covariance matrix of random effects.
`model`	the formula of the model (as in lmer object).
`mEst`	lmer object with the estimated model.
`YS`	the sample vector of the variable of interest.
`reg`	the population matrix of auxiliary variables named in fixed.part and random.part.
`con`	the population 0-1 vector with 1s for elements in the sample and 0s for elements which are not in the sample.
`regS`	the sample matrix of auxiliary variables named in fixed.part and random.part.
`regR`	the matrix of auxiliary variables named in fixed.part and random.part for population elements which are not observed in the sample.
`gamma`	the population vector which transpose multiplied by the population vector of the variable of interest gives the predicted characteristic.
`gammaS`	the subvector of gamma for sampled elements.
`gammaR`	the subvector of gamma for population elements which are not observed in the sample.
`weights`	the population vector of weights, defined as in lmer object, allowing to include the heteroscedasticity of random components in the mixed linear model.
`Z`	the population model matrix of auxiliary variables associated with random effects.
`ZBlockNames`	labels of blocks of random effects in Z matrix.
`X`	the population model matrix of auxiliary variables associated with fixed effects.
`ZS`	the submatrix of Z matrix where the number of rows equals the number of sampled elements and the number of columns equals the number of estimated random effects.
`XR`	the submatrix of X matrix (with the same number of columns) for population elements which are not observed in the sample.
`ZR`	the submatrix of Z matrix where the number of rows equals the number of population elements which are not observed in the sample and the number of columns equals the number of estimated random effects.
`eS`	the sample vector of estimated random components.
`vS`	the estimated vector of random effects.
`g1`	the first component of the naive MSE estimator (computed if estMSE = TRUE).
`g2`	the second component of the naive MSE estimator (computed if estMSE = TRUE).
`neMSE`	the naive MSE estimator (computed if estMSE = TRUE).

Author(s)

Alicja Wolny-Dominiak, Tomasz Zadlo

References

1. Henderson, C.R. (1950) Estimation of Genetic Parameters (Abstract). Annals of Mathematical Statistics 21, 309-310.
2. Royall, R.M. (1976) The Linear Least Squares Prediction Approach to Two-Stage Sampling. Journal of the American Statistical Association 71, 657-473.
3. Zadlo, T. (2017) On prediction of population and subpopulation characteristics for future periods, Communications in Statistics - Simulation and Computation 461(10), 8086-8104.

Examples

library(lme4)
library(Matrix)


### Prediction of the subpopulation mean based on the cross-sectional data

data(invData) 
# data from one period are considered: 
invData2018 <- invData[invData$year == 2018,] 
attach(invData2018)

N <- nrow(invData2018) # population size
n <- 100 # sample size
# subpopulation of interest: NUTS4type==2
Nd <- sum(NUTS4type == 2) # subpopulation size

set.seed(123456)
sampled_elements <- sample(N,n)
con <- rep(0,N)
con[sampled_elements] <- 1 # elements in the sample
YS <- investments[sampled_elements]
fixed.part <- 'newly_registered'
random.part <- '(1| NUTS2)' 
reg = invData2018[, -which(names(invData2018) == 'investments')]

gamma <- rep(0,N)
gamma[NUTS4type == 2] <- 1/Nd

weights <- rep(1,N) # homoscedastic random components
estMSE <- TRUE 

# Predicted value of the mean in the following subpopulation: NUTS4type==2
EBLUP(YS, fixed.part, random.part, reg, con, gamma, weights, estMSE)$thetaP

# All results
EBLUP(YS, fixed.part, random.part, reg, con, gamma, weights, estMSE)

detach(invData2018)

##########################################################

### Prediction of the subpopulation total based on the longitudinal data

data(invData)
attach(invData)

N <- nrow(invData[(year == 2013),]) # population size in the first period
n <- 38 # sample size in the first period
# subpopulation and time period of interest: NUTS2 == '02' & year == 2018
# subpopulation size in the period of interest:
Ndt <- sum(NUTS2 == '02' & year == 2018)

set.seed(123456)
sampled_elements_in_2013 <- sample(N,n)
con2013 <- rep(0,N)
con2013[sampled_elements_in_2013] <- 1 # elements in the sample in 2013

# balanced panel sample - the same elements in all 6 periods:
con <- rep(con2013,6)

YS <- investments[con == 1]
fixed.part <- 'newly_registered'
random.part <- '(newly_registered  | NUTS4)' 
reg <- invData[, -which(names(invData) == 'investments')]

gamma <- rep(0,nrow(invData))
gamma[NUTS2 == '02' & year == 2018] <- 1

weights <- rep(1,nrow(invData)) # homoscedastic random components
estMSE <- TRUE 

# Predicted value of the total 
# in the following subpopulation: NUTS4type == 2 
# in the following time period: year == 2018
EBLUP(YS, fixed.part, random.part, reg, con, gamma, weights, estMSE)$thetaP

# All results
EBLUP(YS, fixed.part, random.part, reg, con, gamma, weights, estMSE)

detach(invData)

[Package qape version 2.1 Index]