MLDF {Frames2}

R Documentation

Multinomial logistic estimator under dual frame approach with auxiliary information from each frame

Description

Produces estimates of class totals and proportions from survey data collected under a dual frame sampling design. The estimator follows a model-assisted approach based on multinomial logistic regression, and each frame may use a different set of auxiliary variables. Confidence intervals are also computed on request.

Usage

MLDF (ysA, ysB, pik_A, pik_B, domains_A, domains_B, xsA, xsB, xA, xB, ind_samA, 
 ind_samB, ind_domA, ind_domB, N, conf_level = NULL)

Arguments

`ysA`	A data frame containing information about one or more factors, each one of dimension `n_A`, collected from `s_A`.
`ysB`	A data frame containing information about one or more factors, each one of dimension `n_B`, collected from `s_B`.
`pik_A`	A numeric vector of length `n_A` containing first order inclusion probabilities for units included in `s_A`.
`pik_B`	A numeric vector of length `n_B` containing first order inclusion probabilities for units included in `s_B`.
`domains_A`	A character vector of length `n_A` indicating the domain each unit from `s_A` belongs to. Possible values are "a" and "ab".
`domains_B`	A character vector of length `n_B` indicating the domain each unit from `s_B` belongs to. Possible values are "b" and "ba".
`xsA`	A numeric vector of length `n_A` or a numeric matrix or data frame of dimensions `n_A` x `m_A`, with `m_A` the number of auxiliary variables in frame A, containing auxiliary information in frame A for units included in `s_A`.
`xsB`	A numeric vector of length `n_B` or a numeric matrix or data frame of dimensions `n_B` x `m_B`, with `m_B` the number of auxiliary variables in frame B, containing auxiliary information in frame B for units included in `s_B`.
`xA`	A numeric vector of length `N_A` or a numeric matrix or data frame of dimensions `N_A` x `m_A`, with `m_A` the number of auxiliary variables in frame A, containing auxiliary information for the units in frame A.
`xB`	A numeric vector of length `N_B` or a numeric matrix or data frame of dimensions `N_B` x `m_B`, with `m_B` the number of auxiliary variables in frame B, containing auxiliary information for the units in frame B.
`ind_samA`	A numeric vector of length `n_A` containing the identifiers (from 1 to `N_A`) of the units of frame A that belong to `s_A`.
`ind_samB`	A numeric vector of length `n_B` containing the identifiers (from 1 to `N_B`) of the units of frame B that belong to `s_B`.
`ind_domA`	A character vector of length `N_A` indicating the domain each unit from frame A belongs to. Possible values are "a" and "ab".
`ind_domB`	A character vector of length `N_B` indicating the domain each unit from frame B belongs to. Possible values are "b" and "ba".
`N`	A numeric value indicating the size of the population.
`conf_level`	(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.
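
The length and value constraints listed above can be checked before calling the estimator. The following is a minimal sketch for the frame A inputs only; `check_frame_inputs` is a hypothetical helper written for illustration and is not part of Frames2. The same checks apply to frame B with `allowed = c("b", "ba")`.

```r
# Hypothetical helper (NOT part of Frames2): checks that the inputs for one
# frame agree with the lengths and values described in the Arguments section.
check_frame_inputs <- function(ys, pik, domains, xs, x, ind_sam, ind_dom,
                               allowed = c("a", "ab")) {
  n <- length(pik)            # sample size in this frame
  N_frame <- length(ind_dom)  # frame size
  # Number of units in a vector, matrix or data frame
  n_rows <- function(obj) if (is.null(dim(obj))) length(obj) else nrow(obj)
  stopifnot(
    n_rows(ys) == n,
    n_rows(xs) == n,
    length(domains) == n,
    length(ind_sam) == n,
    n_rows(x) == N_frame,
    all(pik > 0 & pik <= 1),
    all(domains %in% allowed),
    all(ind_dom %in% allowed),
    all(ind_sam %in% seq_len(N_frame))
  )
  invisible(TRUE)
}

# Toy inputs (made up for illustration): 3 sampled units out of a frame of 5
ok <- check_frame_inputs(
  ys = factor(c("x", "y", "x")),
  pik = c(0.5, 0.25, 0.5),
  domains = c("a", "ab", "a"),
  xs = c(1.2, 0.7, 3.1),
  x = c(1.2, 2.0, 0.7, 3.1, 1.5),
  ind_sam = c(1, 3, 4),
  ind_dom = c("a", "a", "ab", "a", "ab")
)
```

The helper stops with an error at the first violated condition, for example when a frame A unit is labelled "b".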

Details

The multinomial logistic estimator of a proportion in a dual frame setting, using auxiliary information from each frame, is given by

\hat{P}_{MLi}^{DF} = \frac{1}{N} \left(\sum_{k \in U_a} p_{ki}^A + \eta \sum_{k \in U_{ab}} p_{ki}^A + (1 - \eta) \sum_{k \in U_{ba}} p_{ki}^B + \sum_{k \in U_b} p_{ki}^B \right.

+ \sum_{k \in s_a} d_k^A (z_{ki} - p_{ki}^A) + \eta \sum_{k \in s_{ab}} d_k^A (z_{ki} - p_{ki}^A)

\left. + (1 - \eta) \sum_{k \in s_{ba}} d_k^B (z_{ki} - p_{ki}^B) + \sum_{k \in s_b} d_k^B (z_{ki} - p_{ki}^B)\right), \hspace{0.3cm} i = 1,...,m

with \eta \in (0,1), m the number of categories of the response variable, z_{ki} the indicator that unit k belongs to the i-th category of the response variable, d_k^A and d_k^B the design weights for each frame, defined as the inverse of the first order inclusion probabilities, and

p_{ki}^A = \frac{\exp(x_k^{'}\beta_i^A)}{\sum_{r=1}^m \exp(x_k^{'}\beta_r^A)},

where \beta_i^A are the maximum likelihood parameter estimates of the multinomial logistic model fitted with weights d^A. p_{ki}^B is defined analogously from the frame B data.
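
As an illustration of the formula above, the following base R sketch evaluates the estimator on toy data. It is not the implementation used by MLDF: the coefficient matrices `beta_A` and `beta_B` are made-up values rather than weighted maximum likelihood fits, and the function names, population sizes, samples and weights are all assumptions chosen for the example.

```r
# Multinomial logistic probabilities: row k, column i holds
# exp(x_k' beta_i) / sum_r exp(x_k' beta_r).
softmax_probs <- function(X, beta) {
  lin <- X %*% beta                   # n x m linear predictors
  lin <- lin - apply(lin, 1, max)     # subtract row maxima for numerical stability
  e <- exp(lin)
  e / rowSums(e)
}

# Sum over a sample of d_k * (z_ki - p_ki), one value per category.
weighted_resid <- function(d, Z, P) colSums(d * (Z - P))

set.seed(1)
# Made-up coefficients (2 x 3): intercept and one auxiliary variable, 3 categories
beta_A <- cbind(0, c(0.5, -0.2), c(-0.3, 0.4))
beta_B <- cbind(0, c(0.1, 0.3), c(0.2, -0.1))

# Toy population covariates per domain (intercept + one auxiliary variable)
X_a  <- cbind(1, rnorm(4))
X_ab <- cbind(1, rnorm(3))   # overlap units, auxiliary information from frame A
X_ba <- cbind(1, rnorm(3))   # the same overlap units, information from frame B
X_b  <- cbind(1, rnorm(5))
N <- 4 + 3 + 5               # overlap units are counted once

P_a  <- softmax_probs(X_a,  beta_A)
P_ab <- softmax_probs(X_ab, beta_A)
P_ba <- softmax_probs(X_ba, beta_B)
P_b  <- softmax_probs(X_b,  beta_B)

# Toy samples: first units of each domain, with made-up categories and weights
I3 <- diag(3)
Z_a  <- I3[c(1, 2), , drop = FALSE];    d_a  <- c(2, 2)
Z_ab <- I3[c(3, 1), , drop = FALSE];    d_ab <- c(1.5, 1.5)
Z_ba <- I3[c(2, 2), , drop = FALSE];    d_ba <- c(1.5, 1.5)
Z_b  <- I3[c(1, 3, 3), , drop = FALSE]; d_b  <- c(5, 5, 5) / 3

eta <- 0.5
est <- (colSums(P_a) + eta * colSums(P_ab) +
        (1 - eta) * colSums(P_ba) + colSums(P_b) +
        weighted_resid(d_a,  Z_a,  P_a[1:2, , drop = FALSE]) +
        eta * weighted_resid(d_ab, Z_ab, P_ab[1:2, , drop = FALSE]) +
        (1 - eta) * weighted_resid(d_ba, Z_ba, P_ba[1:2, , drop = FALSE]) +
        weighted_resid(d_b,  Z_b,  P_b[1:3, , drop = FALSE])) / N
```

Because each row of the probability and indicator matrices sums to one, the residual terms cancel across categories and the estimated proportions in `est` sum to one for any value of `eta`.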

Value

MLDF returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

`Call`	the matched call.
`Est`	estimates of class frequencies and proportions for the main variable(s).

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

Lehtonen, R. and Veijanen, A. (1998) On multinomial logistic generalized regression estimators. Technical report 22, Department of Statistics, University of Jyvaskyla.

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

N <- nrow(DatPopM)
levels(DatPopM$Domain) <- c(levels(DatPopM$Domain), "ba")
# Frame A population: units in domains "a" and "ab"
DatPopMA <- subset(DatPopM, Domain %in% c("a", "ab"))
# Frame B population: units in domains "b" and "ab"
DatPopMB <- subset(DatPopM, Domain %in% c("b", "ab"))
# Overlap units are labelled "ba" when they are viewed from frame B
DatPopMB$Domain[DatPopMB$Domain == "ab"] <- "ba"

# Let us calculate the proportions of the categories of variable Prog using the
# MLDF estimator with Read as auxiliary variable
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N)

# Let us obtain 95% confidence intervals together with the estimates
MLDF(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$Domain, DatMB$Domain, 
DatMA$Read, DatMB$Read, DatPopMA$Read, DatPopMB$Read, DatMA$Id_Frame, DatMB$Id_Frame, 
DatPopMA$Domain, DatPopMB$Domain, N, conf_level = 0.95)

[Package Frames2 version 0.2.1 Index]