harmonize.x {StatMatch} | R Documentation |
Harmonizes the marginal (joint) distribution of a set of variables observed independently in two sample surveys referred to the same target population
Description
This function permits to harmonize the marginal or the joint distribution of a set of variables observed independently in two sample surveys carried out on the same target population. This harmonization is carried out by using the calibration of the survey weights of the sample units in both the surveys according to the procedure suggested by Renssen (1998).
Usage
harmonize.x(svy.A, svy.B, form.x, x.tot=NULL,
cal.method="linear", ...)
Arguments
svy.A |
A |
svy.B |
A |
form.x |
A R formula specifying which of the variables, common to both the surveys, have to be considered, and how have to be considered. For instance When dealing with categorical variables, the formula Due to weights calibration features, it is preferable to work with categorical X variables. In some cases, the procedure may be successful when a single continuous variable is considered jointly with one or more categorical variables. When dealing with several continuous variable it may be preferable to categorize them. |
x.tot |
A vector or table with known population totals for the X variables. A vector is required when When |
cal.method |
A string that specifies how the calibration of the weights in Alternatively, it is possible to rake the origin survey weights by specifying |
... |
Further arguments that may be necessary for calibration or post-stratification. The number of iterations used in calibration can be modified too by using the argument See |
Details
This function harmonizes the totals of the X variables, observed in both survey A and survey B, to be equal to given known totals specified via x.tot
. When these totals are not known (x.tot=NULL
) they are estimated by combining the estimates derived from the two separate surveys. The harmonization is carried out according to a procedure suggested by Renssen (1998) based on calibration of survey weights (for major details on calibration see Sarndal and Lundstrom, 2005). The procedure is particularly suited to deal with categorical X variables, in this case it permits to harmonize the joint or the marginal distribution of the categorical variables being considered. Note that an incomplete crossing of the X variables can be considered: i.e. harmonisation wrt to the joint distribution of X_1 \times X_2
and the marginal distribution of X_3
).
The calibration procedure may not produce the final result due to convergence problems. In this case an error message appears. In order to reach convergence it may be necessary to launch the procedure with less constraints (i.e a reduced number of population totals) by joining adjacent categories or by discarding some variables.
In some limited cases, it could be possible to consider both categorical and continuous variables. In this situation it may happen that calibration is not successful. In order to reach convergence it may be necessary to categorize the continuous X variables.
Post-stratification is a special case of calibration; all the weights of the units in a given post-stratum are modified so as to reproduce the known total for that post-stratum. Post-stratification avoids problems of convergence but, on the other hand, it may produce final weights with a higher variability than those derived from the calibration.
Value
A R with list the results of calibration procedures carried out on survey A and survey B, respectively. In particular the following components will be provided:
cal.A |
The survey object |
cal.B |
The survey object |
weights.A |
The new calibrated weights associated to the the units in |
weights.B |
The new calibrated weights associated to the the units in |
call |
Stores the call to this function with all the values specified for the various arguments ( |
Author(s)
Marcello D'Orazio mdo.statmatch@gmail.com
References
D'Orazio, M., Di Zio, M. and Scanu, M. (2006). Statistical Matching: Theory and Practice. Wiley, Chichester.
Renssen, R.H. (1998) “Use of Statistical Matching Techniques in Calibration Estimation”. Survey Methodology, N. 24, pp. 171–183.
Sarndal, C.E. and Lundstrom, S. (2005) Estimation in Surveys with Nonresponse. Wiley, Chichester.
See Also
comb.samples
, calibrate
, svydesign
, postStratify
,
Examples
data(quine, package="MASS") #loads quine from MASS
str(quine)
# split quine in two subsets
suppressWarnings(RNGversion("3.5.0"))
set.seed(7654)
lab.A <- sample(nrow(quine), 70, replace=TRUE)
quine.A <- quine[lab.A, c("Eth","Sex","Age","Lrn")]
quine.B <- quine[-lab.A, c("Eth","Sex","Age","Days")]
# create svydesign objects
require(survey)
quine.A$f <- 70/nrow(quine) # sampling fraction
quine.B$f <- (nrow(quine)-70)/nrow(quine)
svy.qA <- svydesign(~1, fpc=~f, data=quine.A)
svy.qB <- svydesign(~1, fpc=~f, data=quine.B)
#------------------------------------------------------
# example (1)
# Harmonizazion of the distr. of Sex vs. Age
# usign poststratification
# (1.a) known population totals
# the population toatal are computed on the full data frame
tot.sex.age <- xtabs(~Sex+Age, data=quine)
tot.sex.age
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, form.x=~Sex+Age,
x.tot=tot.sex.age, cal.method="poststratify")
tot.A <- xtabs(out.hz$weights.A~Sex+Age, data=quine.A)
tot.B <- xtabs(out.hz$weights.B~Sex+Age, data=quine.B)
tot.sex.age-tot.A
tot.sex.age-tot.B
# (1.b) unknown population totals (x.tot=NULL)
# the population total is estimated by combining totals from the
# two surveys
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, form.x=~Sex+Age,
x.tot=NULL, cal.method="poststratify")
tot.A <- xtabs(out.hz$weights.A~Sex+Age, data=quine.A)
tot.B <- xtabs(out.hz$weights.B~Sex+Age, data=quine.B)
tot.A
tot.A-tot.B
#-----------------------------------------------------
# example (2)
# Harmonizazion wrt the maginal distribution
# of 'Eth', 'Sex' and 'Age'
# using linear calibration
# (2.a) vector of population total known
# estimated from the full data set
# note the formula! only marginal distribution of the
# variables are considered
tot.m <- colSums(model.matrix(~Eth+Sex+Age-1, data=quine))
tot.m
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, x.tot=tot.m,
form.x=~Eth+Sex+Age-1, cal.method="linear")
summary(out.hz$weights.A) #check for negative weights
summary(out.hz$weights.B) #check for negative weights
tot.m
svytable(formula=~Eth, design=out.hz$cal.A)
svytable(formula=~Eth, design=out.hz$cal.B)
svytable(formula=~Sex, design=out.hz$cal.A)
svytable(formula=~Sex, design=out.hz$cal.B)
# Note: margins are equal but joint distributions are not!
svytable(formula=~Sex+Age, design=out.hz$cal.A)
svytable(formula=~Sex+Age, design=out.hz$cal.B)
# (2.b) vector of population total unknown
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, x.tot=NULL,
form.x=~Eth+Sex+Age-1, cal.method="linear")
svytable(formula=~Eth, design=out.hz$cal.A)
svytable(formula=~Eth, design=out.hz$cal.B)
svytable(formula=~Sex, design=out.hz$cal.A)
svytable(formula=~Sex, design=out.hz$cal.B)
#-----------------------------------------------------
# example (3)
# Harmonizazion wrt the joint distribution of 'Sex' vs. 'Age'
# and the marginal distribution of 'Eth'
# using raking
# vector of population total known
# estimated from the full data set
# note the formula!
tot.m <- colSums(model.matrix(~Eth+(Sex:Age-1)-1, data=quine))
tot.m
out.hz <- harmonize.x(svy.A=svy.qA, svy.B=svy.qB, x.tot=tot.m,
form.x=~Eth+(Sex:Age)-1, cal.method="raking")
summary(out.hz$weights.A) #check for negative weights
summary(out.hz$weights.B) #check for negative weights
tot.m
svytable(formula=~Eth, design=out.hz$cal.A)
svytable(formula=~Eth, design=out.hz$cal.B)
svytable(formula=~Sex+Age, design=out.hz$cal.A)
svytable(formula=~Sex+Age, design=out.hz$cal.B)