ebpLMMne {qape}

R Documentation

Empirical Best Predictor based on the nested error linear mixed model

Description

The function computes the value of the empirical best predictor (EBP) under the nested error linear mixed model. The model is estimated using REML and is assumed for the (possibly transformed) variable of interest.

Usage

ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)

Arguments

`YS`	values of the variable of interest (already transformed if necessary) observed in the sample and used in the model as the dependent variable.
`fixed.part`	fixed-effects terms declared as in an lmer object.
`division`	the variable dividing the population dataset into subsets (the nested error linear mixed model with 'division'-specific random components is estimated).
`reg`	the population matrix of auxiliary variables named in fixed.part and division.
`con`	the population 0-1 vector with 1s for elements in the sample and 0s for elements which are not in the sample.
`backTrans`	back-transformation function of the variable of interest (e.g. if YS is log-transformed, then backTrans <- function(x) exp(x)).
`thetaFun`	the predictor function (e.g. mean or sd).
`L`	the number of iterations used to compute the value of the predictor.
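
As a minimal sketch of how these arguments fit together, the following base R code builds them for a small artificial population. The data frame pop and its columns area, x and y are hypothetical and are not part of the qape package; the final call is commented out because it requires qape to be installed.

```r
# Hypothetical toy population; pop, area, x and y are illustrative names only.
set.seed(1)
pop <- data.frame(area = rep(c("A", "B", "C"), each = 10),
                  x = runif(30, 1, 10))
pop$y <- exp(1 + 0.5 * log(pop$x) + rnorm(30, sd = 0.2))

N <- nrow(pop)
con <- rep(0, N)
con[sample(N, 12)] <- 1                 # 1 = sampled, 0 = not sampled

YS <- log(pop$y[con == 1])              # transformed sample values, ordered as the rows with con == 1
backTrans <- function(v) exp(v)         # undoes the log transformation
fixed.part <- "log(x)"                  # fixed-effects terms as a character string
division <- "area"                      # column of reg that defines the random effects
reg <- pop[, c("area", "x")]            # population auxiliary data, without y
thetaFun <- function(v) c(mean(v), sd(v))  # two characteristics on the original scale
L <- 5

# With qape installed, the call would then be:
# ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)
```

Note that YS has one value per sampled element, while reg and con have one row or entry per population element.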

Details

The function computes the value of the EBP based on the algorithm described in Molina and Rao (2010) in Section 4.
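
The Monte Carlo idea behind this algorithm can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: the model parameters (beta, sig2u, sig2e) are treated as known rather than estimated by REML, the data are simulated, and all object names other than con, thetaFun, backTrans, L and thetaP are hypothetical.

```r
# Monte Carlo EBP sketch for a nested error model (base R only).
# ASSUMPTION: parameters are known; ebpLMMne estimates them by REML instead.
set.seed(1)
D <- 4; Nd <- 25                                  # areas and units per area
area <- rep(seq_len(D), each = Nd)
x <- runif(D * Nd, 1, 5)
beta <- c(1, 0.5); sig2u <- 0.2; sig2e <- 0.1     # assumed parameter values
u <- rnorm(D, sd = sqrt(sig2u))
# Simulated values on the transformed (log) scale; in practice only the
# sampled part of y would be observed.
y <- beta[1] + beta[2] * x + u[area] + rnorm(D * Nd, sd = sqrt(sig2e))

con <- rep(0, D * Nd)
con[sample(D * Nd, 20)] <- 1
s <- con == 1                                     # sampled elements
r <- !s                                           # unsampled elements

xb <- beta[1] + beta[2] * x                       # fixed part for the population
areaF <- factor(area, levels = seq_len(D))
nd <- as.vector(table(areaF[s]))                  # sample size per area (may be 0)
rbar <- as.vector(tapply((y - xb)[s], areaF[s], mean))  # mean sample residual per area
rbar[is.na(rbar)] <- 0                            # areas without sampled units
gam <- sig2u / (sig2u + sig2e / nd)               # shrinkage factor; equals 0 when nd is 0

thetaFun <- function(v) c(median = median(v), sd = sd(v))
backTrans <- exp
L <- 200

theta <- replicate(L, {
  # area effects drawn from their conditional distribution given the sample
  v <- rnorm(D, sd = sqrt(sig2u * (1 - gam)))
  yl <- y                                         # sampled values are kept as observed
  yl[r] <- xb[r] + (gam * rbar)[area[r]] + v[area[r]] +
    rnorm(sum(r), sd = sqrt(sig2e))               # regenerate unsampled values
  thetaFun(backTrans(yl))                         # characteristic on the original scale
})
thetaP <- rowMeans(theta)                         # average over the L replicates
thetaP
```

Sampled values enter each replicate unchanged and only the unsampled values are regenerated, so the variation across replicates reflects the uncertainty about the unobserved part of the population.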

Value

The function returns a list with the following objects:

`thetaP`	the value(s) of the predictor (more than one value is computed if thetaFun defines more than one population characteristic).
`fixed.part`	the fixed part of the formula of the model.
`random.part`	the random part of the formula of the model.
`division`	the variable dividing the population dataset into subsets (the nested error linear mixed model with 'division'-specific random components is estimated).
`thetaFun`	the function of the population values of the variable of interest (on the original scale) which defines at least one population or subpopulation characteristic to be predicted.
`backTrans`	back-transformation function of the variable of interest (e.g. if YS is log-transformed, then backTrans <- function(x) exp(x)).
`L`	the number of iterations used to compute the value of the predictor.
`beta`	the estimated vector of fixed effects.
`Xbeta`	the product of two matrices: the population model matrix of auxiliary variables X and the estimated vector of fixed effects.
`sigma2R`	the estimated variance parameter of the distribution of random components.
`R`	the estimated covariance matrix of random components for sampled elements.
`G`	the estimated covariance matrix of random effects.
`model`	the formula of the model (as in lmer object).
`mEst`	lmer object with the estimated model.
`YS`	values of the variable of interest (already transformed if necessary) observed in the sample and used in the model as the dependent variable.
`reg`	the population matrix of auxiliary variables named in fixed.part and random.part.
`con`	the population 0-1 vector with 1s for elements in the sample and 0s for elements which are not in the sample.
`regS`	the sample matrix of auxiliary variables named in fixed.part and random.part.
`regR`	the matrix of auxiliary variables named in fixed.part and random.part for unsampled population elements.
`weights`	the population vector of weights, defined as in an lmer object, which allows heteroscedasticity of random components to be included in the mixed linear model.
`Z`	the population model matrix of auxiliary variables associated with random effects.
`ZBlockNames`	labels of blocks of random effects in Z matrix.
`X`	the population model matrix of auxiliary variables associated with fixed effects.
`ZS`	the submatrix of Z matrix where the number of rows equals the number of sampled elements and the number of columns equals the number of estimated random effects.
`XR`	the submatrix of X matrix (with the same number of columns) for unsampled population elements.
`ZR`	the submatrix of Z matrix where the number of rows equals the number of unsampled population elements and the number of columns equals the number of estimated random effects.
`eS`	the sample vector of estimated random components.
`vS`	the estimated vector of random effects.

Author(s)

Alicja Wolny-Dominiak, Tomasz Zadlo

References

1. Chwila, A., Zadlo, T. (2022) On properties of empirical best predictors. Communications in Statistics - Simulation and Computation, 51(1), 220-253, https://doi.org/10.1080/03610918.2019.1649422
2. Molina, I., Rao, J.N.K. (2010) Small area estimation of poverty indicators. Canadian Journal of Statistics, 38(3), 369-385.
3. Zadlo, T. (2017) On prediction of population and subpopulation characteristics for future periods. Communications in Statistics - Simulation and Computation, 46(10), 8086-8104.

Examples


library(lme4)
library(Matrix)


### Prediction of the subpopulation median 
### and the subpopulation standard deviation 
### based on the cross-sectional data

data(invData) 
# data from one period are considered: 
invData2018 <- invData[invData$year == 2018,] 
attach(invData2018)

N <- nrow(invData2018) # population size
n <- 100 # sample size

set.seed(123456)
sampled_elements <- sample(N,n)
con <- rep(0,N)
con[sampled_elements] <- 1 # elements in the sample
YS <- log(investments[con == 1]) # log-transformed values, ordered as the sampled rows of reg
backTrans <- function(x) exp(x) # back-transformation of the variable of interest
fixed.part <- 'log(newly_registered)'
division <- 'NUTS2' # NUTS2-specific random effects are taken into account
reg <- invData2018[, -which(names(invData2018) == 'investments')]


# Characteristics to be predicted - the median and the standard deviation
# in the subpopulation of interest: NUTS4type==2
thetaFun <- function(x) {c(median(x[NUTS4type == 2]), sd(x[NUTS4type == 2]))}

L <- 5

# Predicted values of the median and the standard deviation
# in the following subpopulation: NUTS4type==2
set.seed(123456)
ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)$thetaP

set.seed(123456)
ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)

# All results
set.seed(123456)
str(ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L))

detach(invData2018)

##########################################################

### Prediction of the subpopulation quartiles based on longitudinal data

data(invData)
attach(invData)

N <- nrow(invData[(year == 2013),]) # population size in the first period
n <- 38 # sample size in the first period

set.seed(123456)
sampled_elements_in_2013 <- sample(N,n)
con2013 <- rep(0,N)
con2013[sampled_elements_in_2013] <- 1 # elements in the sample in 2013

# balanced panel sample - the same elements in all 6 periods:
con <- rep(con2013,6)

YS <- log(investments[con == 1]) # log-transformed values
backTrans <- function(x) exp(x) # back-transformation of the variable of interest
fixed.part <- 'log(newly_registered)'
division <- 'NUTS4' # NUTS4-specific random effects are taken into account
reg <- invData[, -which(names(invData) == 'investments')]
thetaFun <- function(x) {quantile(x[NUTS2 == '02' & year == 2018],probs = c(0.25,0.5,0.75))}

L <- 5

# Predicted values of quartiles 
# in the following subpopulation: NUTS2=='02' 
# in the following time period: year==2018
set.seed(123456)
ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)$thetaP

set.seed(123456)
ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L)

# All results
set.seed(123456)
str(ebpLMMne(YS, fixed.part, division, reg, con, backTrans, thetaFun, L))


detach(invData)

[Package qape version 2.1 Index]