dynarski {DOS2} | R Documentation
A Natural Experiment Concerning Subsidized College Education
Description
The data are from Susan Dynarski's (2003) study of the effects of a subsidy for college education provided by Social Security to children whose fathers had died. The subsidy was eliminated in 1982. As presented here, the data are only a portion of the data used in Dynarski (2003), specifically the period 1979-1981, when the subsidy was available. The data set here is Table 14.1 of "Design of Observational Studies" (2nd edition), where it was used with the limited goal of illustrating matching techniques.
Usage
data("dynarski")
Format
A data frame with 2820 observations on the following 10 variables.
id
identification number
zb
treatment indicator, zb=1 if 1979-1981 with father deceased, or zb=0 if 1979-1981 with father alive
faminc
family income, in units of tens of thousands of dollars
incmiss
indicator for missing family income
black
1 if black, 0 otherwise
hisp
1 if Hispanic, 0 otherwise
afqtpct
test score percentile, Armed Forces Qualification Test
edmissm
indicator for missing education of mother
edm
education of mother: 1 is less than high school, 2 is high school, 3 is some college, 4 is a BA degree or higher
female
1 if female, 0 if male
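The integer codes for edm can be turned into a labelled factor before tabulating. The following is a minimal sketch on a small hypothetical vector of codes, not values taken from the data set; the names edm_codes, edm_labels and edm_factor are introduced here purely for illustration.

```r
# Hypothetical codes standing in for dynarski$edm (not real data)
edm_codes <- c(1, 2, 2, 3, 4, 2)

# Labels follow the coding documented for 'edm' above
edm_labels <- c("<high school", "high school", "some college", "BA or higher")

# Map codes 1..4 to their labels
edm_factor <- factor(edm_codes, levels = 1:4, labels = edm_labels)

# Count how many observations fall in each education category
table(edm_factor)
```

With the actual data, the same call could be applied to dynarski$edm. How to treat rows flagged by edmissm is a separate question; see the discussion of missing covariate values in the book.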
Details
The examples reproduce steps in Chapter 14 of "Design of Observational Studies" (2nd edition). Portions of the examples require you to load Ben Hansen's 'optmatch' package and accept its academic license; these portions of the examples do not run automatically. Hansen's 'optmatch' package uses the Fortran code of Bertsekas and Tseng (1988).
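Because the matching steps depend on 'optmatch', a script may want to check that the package is installed before attempting them. The following is a minimal sketch using base R's requireNamespace(); the variable name has_optmatch is an assumption made for illustration and is not part of either package.

```r
# Check whether 'optmatch' is installed, without attaching it
has_optmatch <- requireNamespace("optmatch", quietly = TRUE)

if (has_optmatch) {
  message("'optmatch' is installed; the matching examples can be run.")
} else {
  message("'optmatch' is not installed; skipping the matching examples.")
}
```

This only tests for availability. You would still call library(optmatch) yourself before running the matching code.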
Source
Dynarski (2003).
References
Bertsekas, D. P. and Tseng, P. (1988) <doi:10.1007/BF02288322> "The relax codes for linear minimum cost network flow problems". Annals of Operations Research, 13(1), 125-190.
Dynarski, S. M. (2003) <doi:10.1257/000282803321455287> "Does aid matter? Measuring the effect of student aid on college attendance and completion". American Economic Review, 93(1), 279-288.
Hansen, B. B. (2007) <https://www.r-project.org/conferences/useR-2007/program/presentations/hansen.pdf> "Flexible, optimal matching for observational studies". R News, 7, 18-24.
Hansen, B. B. and Klopfer, S. O. (2006) <doi:10.1198/106186006X137047> "Optimal full matching and related designs via network flows". Journal of Computational and Graphical Statistics, 15(3), 609-627.
Rosenbaum, P. R. (1989) <doi:10.1080/01621459.1989.10478868> "Optimal matching for observational studies". Journal of the American Statistical Association, 84(408), 1024-1032.
Rosenbaum, P. R. (1991) <doi:10.1111/j.2517-6161.1991.tb01848.x> "A characterization of optimal designs for observational studies". Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 597-610.
Examples
#
data(dynarski)
# Table 14.1 of "Design of Observational Studies" (2nd edition)
head(dynarski)
# Table 14.2 of "Design of Observational Studies" (2nd edition)
zb<-dynarski$zb
zbf<-factor(zb,levels=c(1,0),labels=c("Father deceased",
"Father not deceased"))
table(zbf)
Xb<-dynarski[,3:10]
# Estimate the propensity score, as in section 14.3 of
# "Design of Observational Studies" (2nd edition)
p<-glm(zb~Xb$faminc+Xb$incmiss+Xb$black+Xb$hisp
+Xb$afqtpct+Xb$edmissm+Xb$edm+Xb$female,
family=binomial)$fitted.values
# Figure 14.1 in "Design of Observational Studies" (2nd edition)
boxplot(p~zbf,ylab="Propensity score",main="1979-1981 Cohort")
# Read about missing covariate values in section 14.4
# of "Design of Observational Studies" (2nd edition)
# Robust Mahalanobis distance matrix, treated x control
dmat<-smahal(zb,Xb)
dim(dmat)
# Table 14.3 in "Design of Observational Studies" (2nd edition)
round(dmat[1:5,1:5],2)
# Add a caliper on the propensity score using a penalty function
dmat<-addcaliper(dmat,zb,p,caliper=.2)
dim(dmat)
# Table 14.4 in "Design of Observational Studies" (2nd edition)
round(dmat[1:5,1:5],2)
## Not run:
# YOU MUST LOAD the 'optmatch' package and accept its license to continue.
# Note that the 'optmatch' package has changed since 2010. It now suggests
# that you indicate the data explicitly as data=dynarski.
# Creating a 1-to-10 match, as in section 14.6 of
# "Design of Observational Studies" (2nd edition)
# This may take a few minutes.
m<-fullmatch(dmat,data=dynarski,min.controls = 10,max.controls = 10,
omit.fraction = 1379/2689)
length(m)
sum(matched(m))
1441/11 # There are 131 matched sets, 1 treated, 10 controls
# Alternative, simpler code to do the same thing
m2<-pairmatch(dmat,controls=10,data=dynarski)
# Results are the same:
sum(m[matched(m)]!=m2[matched(m2)])
# Housekeeping
im<-as.integer(m)
dynarski<-cbind(dynarski,im)
dm<-dynarski[matched(m),]
dm<-dm[order(dm$im,1-dm$zb),]
# Table 14.5 in "Design of Observational Studies" (2nd edition)
which(dm$id==10)
dm[188:198,]
which(dm$id==396)
dm[23:33,]
which(dm$id==3051)
dm[1068:1078,]
# In principle, there can be a tie, in which several different
# matched samples all minimize the total distance. On my
# computer, this calculation reproduces Table 14.5. If there
# were a tie, 'optmatch' would return one of the tied optimal
# matches, with no guarantee as to which one.
## End(Not run)
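The counts that appear in the 1-to-10 match can be checked with simple arithmetic. This sketch assumes only the figures quoted on this page: 2820 observations in total and 2689 potential controls (the denominator of omit.fraction). The variable names are illustrative and are not objects created by the examples.

```r
n_total   <- 2820                     # rows in the data frame
n_control <- 2689                     # controls (zb = 0), from omit.fraction
n_treated <- n_total - n_control      # treated (zb = 1)

controls_used    <- 10 * n_treated             # 10 controls per treated child
controls_omitted <- n_control - controls_used  # numerator of omit.fraction
omit_fraction    <- controls_omitted / n_control

matched_total <- n_treated + controls_used     # individuals in matched sets
matched_sets  <- matched_total / 11            # 1 treated + 10 controls per set

c(n_treated = n_treated, controls_omitted = controls_omitted,
  matched_total = matched_total, matched_sets = matched_sets)
```

The value of controls_omitted is the numerator 1379 passed to fullmatch(), and matched_total is the 1441 that is divided by 11 in the examples.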