MLCSW {Frames2}    R Documentation

Multinomial logistic calibration estimator under single frame approach with auxiliary information from the whole population

Description

Produces estimates for class totals and proportions using multinomial logistic regression from survey data obtained from a dual frame sampling design using a model calibrated single frame approach with auxiliary information from the whole population. Confidence intervals are also computed, if required.

Usage

MLCSW (ysA, ysB, pik_A, pik_B, pik_ab_B, pik_ba_A, domains_A, domains_B, xsA, xsB,
 x, ind_sam, N_A, N_B, N_ab = NULL, met = "linear", conf_level = NULL)

Arguments

ysA

A data frame containing information about one or more factors, each one of dimension n_A, collected from s_A.

ysB

A data frame containing information about one or more factors, each one of dimension n_B, collected from s_B.

pik_A

A numeric vector of length n_A containing first order inclusion probabilities for units included in s_A.

pik_B

A numeric vector of length n_B containing first order inclusion probabilities for units included in s_B.

pik_ab_B

A numeric vector of size n_A containing first order inclusion probabilities, according to the sampling design in frame B, for units belonging to the overlap domain that have been selected in s_A.

pik_ba_A

A numeric vector of size n_B containing first order inclusion probabilities, according to the sampling design in frame A, for units belonging to the overlap domain that have been selected in s_B.

domains_A

A character vector of size n_A indicating the domain each unit from s_A belongs to. Possible values are "a" and "ab".

domains_B

A character vector of size n_B indicating the domain each unit from s_B belongs to. Possible values are "b" and "ba".

xsA

A numeric vector of length n_A or a numeric matrix or data frame of dimensions n_A x m, with m the number of auxiliary variables, containing auxiliary information in frame A for units included in s_A.

xsB

A numeric vector of length n_B or a numeric matrix or data frame of dimensions n_B x m, with m the number of auxiliary variables, containing auxiliary information in frame B for units included in s_B.

x

A numeric vector of length N or a numeric matrix or data frame of dimensions N x m, with m the number of auxiliary variables, containing auxiliary information for every unit in the population.

ind_sam

A numeric vector of length n = n_A + n_B containing the identifiers of the population units (from 1 to N) that belong to s_A or s_B.

N_A

A numeric value indicating the size of frame A.

N_B

A numeric value indicating the size of frame B.

N_ab

(Optional) A numeric value indicating the size of the overlap domain.

met

(Optional) A character vector indicating the distance to be used in the calibration process. Possible values are "linear", "raking" and "logit". Default is "linear".

conf_level

(Optional) A numeric value indicating the confidence level for the confidence intervals, if desired.
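
The length and domain-label requirements listed above can be checked before calling the function. The following is a minimal sketch in base R. The helper check_dual_frame_inputs is hypothetical and not part of Frames2, the argument N (population size) is introduced only for the range check on ind_sam, and the toy values are invented for illustration.

```r
# Hypothetical helper (not part of Frames2): verifies that the sample-level
# arguments have the lengths and domain labels described in the Arguments section.
check_dual_frame_inputs <- function(pik_A, pik_B, pik_ab_B, pik_ba_A,
                                    domains_A, domains_B, ind_sam, N) {
  n_A <- length(pik_A)
  n_B <- length(pik_B)
  stopifnot(
    length(pik_ab_B) == n_A,
    length(pik_ba_A) == n_B,
    length(domains_A) == n_A,
    length(domains_B) == n_B,
    all(domains_A %in% c("a", "ab")),
    all(domains_B %in% c("b", "ba")),
    length(ind_sam) == n_A + n_B,
    all(ind_sam >= 1 & ind_sam <= N)
  )
  invisible(TRUE)
}

# Toy inputs (assumed values): three units sampled from frame A, two from frame B
ok <- check_dual_frame_inputs(
  pik_A     = c(0.10, 0.20, 0.10),
  pik_B     = c(0.05, 0.25),
  pik_ab_B  = c(0, 0.25, 0.05),
  pik_ba_A  = c(0, 0.20),
  domains_A = c("a", "ab", "ab"),
  domains_B = c("b", "ba"),
  ind_sam   = c(3, 7, 12, 20, 7),
  N         = 25
)
```

A failed check stops with the message of the first violated condition, which makes it easier to see which argument has the wrong shape.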

Details

Multinomial logistic calibration estimator in single frame using auxiliary information from the whole population for a proportion is given by

\hat{P}_{MLCi}^{SW} = \frac{1}{N} \left(\sum_{k \in s_A \cup s_B} \tilde{w}_k z_{ki}\right), \hspace{0.3cm} i = 1,...,m

with m the number of categories of the response variable, z_i the indicator variable for the i-th category of the response variable, and \tilde{w}_k the calibration weights, which are calculated under a different set of constraints depending on the case. For instance, if N_A, N_B and N_{ab} are known, the calibration constraints are

\sum_{k \in s_a}\tilde{w}_k = N_a, \quad \sum_{k \in s_{ab} \cup s_{ba}}\tilde{w}_k = N_{ab}, \quad \sum_{k \in s_b}\tilde{w}_k = N_b

and

\sum_{k \in s_A \cup s_B}\tilde{w}_k \tilde{p}_{ki} = \sum_{k \in U} \tilde{p}_{ki}

with

\tilde{p}_{ki} = \frac{\exp(x_k^{'}\tilde{\beta_i})}{\sum_{r=1}^m \exp(x_k^{'}\tilde{\beta_r})},

where \tilde{\beta_i} are the maximum likelihood parameter estimates of the multinomial logistic model, fitted with weights

\tilde{d}_k = \left\{\begin{array}{ll} d_k^A & \textrm{if } k \in a \\ (1/d_k^A + 1/d_k^B)^{-1} & \textrm{if } k \in ab \cup ba \\ d_k^B & \textrm{if } k \in b \end{array} \right.
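
The two building blocks of the formulas above, the single frame weights \tilde{d}_k and the multinomial logistic probabilities \tilde{p}_{ki}, can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used inside MLCSW: the inclusion probabilities, the design matrix and the coefficient matrix are invented toy values, multinom_probs is a hypothetical helper, the first category is taken as reference (zero coefficients), and the calibration step that turns \tilde{d}_k into \tilde{w}_k is not shown.

```r
# Toy pooled sample (assumed values): one unit per domain
domain <- c("a", "ab", "ba", "b")
pik_A  <- c(0.10, 0.10, 0.10, 0)     # inclusion probabilities under the frame A design
pik_B  <- c(0,    0.05, 0.05, 0.20)  # inclusion probabilities under the frame B design

# With d_k^A = 1 / pik_A and d_k^B = 1 / pik_B, the overlap weight
# (1/d_k^A + 1/d_k^B)^(-1) equals 1 / (pik_A + pik_B).
d_tilde <- ifelse(domain == "a", 1 / pik_A,
           ifelse(domain == "b", 1 / pik_B,
                  1 / (pik_A + pik_B)))

# Hypothetical helper: multinomial logistic probabilities for given coefficients.
# x is an n x q design matrix, beta a q x m matrix with one column per category.
multinom_probs <- function(x, beta) {
  eta <- x %*% beta
  eta <- eta - apply(eta, 1, max)  # subtract row maxima for numerical stability
  e <- exp(eta)
  e / rowSums(e)                   # each row sums to 1
}

x    <- cbind(1, c(0.5, -1, 2, 0))                      # intercept plus one auxiliary variable
beta <- cbind(c(0, 0), c(0.3, 0.8), c(-0.2, 0.1))       # first category as reference
p    <- multinom_probs(x, beta)
```

Here d_tilde holds one weight per sampled unit and p has one row per unit and one column per category of the response variable.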

Value

MLCSW returns an object of class "MultEstimatorDF" which is a list with, at least, the following components:

Call

the matched call.

Est

estimated class frequencies and proportions for the main variable(s).
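
As a brief illustration of how the returned list might be inspected, the sketch below stores the result of a call and prints the two documented components. It assumes the Frames2 package and its example data sets (DatMA, DatMB, DatPopM) are installed, and it is skipped otherwise. The object name res is chosen here for illustration.

```r
# Runs only if Frames2 is available; otherwise nothing is executed.
if (requireNamespace("Frames2", quietly = TRUE)) {
  library(Frames2)
  data(DatMA)
  data(DatMB)
  data(DatPopM)

  IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
  N_FrameA  <- sum(DatPopM$Domain %in% c("a", "ab"))
  N_FrameB  <- sum(DatPopM$Domain %in% c("b", "ab"))

  res <- MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB,
               DatMA$ProbB, DatMB$ProbA, DatMA$Domain, DatMB$Domain,
               DatMA$Read, DatMB$Read, DatPopM$Read, IndSample,
               N_FrameA, N_FrameB)

  print(class(res))  # documented class of the returned object
  print(res$Call)    # the matched call
  print(res$Est)     # estimated class frequencies and proportions
}
```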

References

Molina, D., Rueda, M., Arcos, A. and Ranalli, M. G. (2015) Multinomial logistic estimation in dual frame surveys. Statistics and Operations Research Transactions (SORT). To be printed.

See Also

JackMLCSW

Examples

data(DatMA)
data(DatMB)
data(DatPopM) 

IndSample <- c(DatMA$Id_Pop, DatMB$Id_Pop)
N_FrameA <- nrow(DatPopM[DatPopM$Domain == "a" | DatPopM$Domain == "ab",])
N_FrameB <- nrow(DatPopM[DatPopM$Domain == "b" | DatPopM$Domain == "ab",])
N_Domainab <- nrow(DatPopM[DatPopM$Domain == "ab",])
# Let us calculate the proportions of the categories of variable Prog using the
# MLCSW estimator, with Read as auxiliary variable
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB)

# Now, let us suppose that the overlap domain size is known
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB, N_Domainab)

# Let us obtain 95% confidence intervals together with the estimates
MLCSW(DatMA$Prog, DatMB$Prog, DatMA$ProbA, DatMB$ProbB, DatMA$ProbB, DatMB$ProbA,
DatMA$Domain, DatMB$Domain, DatMA$Read, DatMB$Read, DatPopM$Read, IndSample, N_FrameA, 
N_FrameB, N_Domainab, conf_level = 0.95)

[Package Frames2 version 0.2.1 Index]