dynarski {DOS} | R Documentation |
A Natural Experiment Concerning Subsidized College Education
Description
The data are from Susan Dynarski's (2003) study of the effects of a subsidy for college education that Social Security provided to children whose fathers had died. The subsidy was eliminated in 1982. The data presented here are only a portion of those used in Dynarski (2003), specifically the period 1979-1981, when the subsidy was available. This data set is Table 13.1 of Design of Observational Studies (1st edition), where it was used with the limited goal of illustrating matching techniques.
Usage
data("dynarski")
Format
A data frame with 2820 observations on the following 10 variables.
id
identification number
zb
treatment indicator, zb=1 if 1979-1981 with father deceased, or zb=0 if 1979-1981 with father alive
faminc
family income, in units of tens of thousands of dollars
incmiss
indicator for missing family income
black
1 if black, 0 otherwise
hisp
1 if Hispanic, 0 otherwise
afqtpct
test score percentile, Armed Forces Qualification Test
edmissm
indicator for missing education of mother
edm
education of mother, 1 is <high school, 2 is high school, 3 is some college, 4 is a BA degree or higher
female
1 if female, 0 if male
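As a minimal sketch of how the codings above might be made readable in output, the following builds a small hypothetical data frame (the object toy and its values are invented for illustration and are not rows of dynarski), attaches labels to the four levels of edm, and converts faminc from tens of thousands of dollars to dollars.

```r
# Hypothetical toy rows that follow the documented coding; NOT taken from dynarski
toy <- data.frame(
  zb     = c(1, 0, 0, 1),          # 1 = father deceased, 0 = father alive
  edm    = c(1, 2, 3, 4),          # mother's education, coded 1 to 4
  faminc = c(2.5, 4.1, 3.0, 1.8)   # tens of thousands of dollars
)

# Attach readable labels to the four education codes
toy$edm_label <- factor(toy$edm, levels = 1:4,
                        labels = c("<high school", "high school",
                                   "some college", "BA or higher"))

# Convert family income to dollars
toy$faminc_dollars <- toy$faminc * 10000

# Cross-tabulate treatment status by mother's education
table(toy$zb, toy$edm_label)
```

The same factor() call could be applied to dynarski$edm once the data are loaded; the label strings are a choice made here, not part of the data set.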
Details
The examples reproduce steps in Chapter 13 of Rosenbaum (2010). Portions of the examples require you to load Ben Hansen's optmatch package and accept its academic license; these portions of the examples do not run automatically. Hansen's optmatch package uses the Fortran code of Bertsekas and Tseng (1988).
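Because the matching steps depend on optmatch, one possible way to guard them in a script is to check whether the package is installed before attaching it. This is only a sketch; the variable name has_optmatch is hypothetical, and the message text is arbitrary.

```r
# Check for optmatch without attaching it; returns TRUE or FALSE
has_optmatch <- requireNamespace("optmatch", quietly = TRUE)

if (has_optmatch) {
  # Attach the package so fullmatch(), pairmatch() and matched() are available
  library(optmatch)
} else {
  message("optmatch is not installed; skipping the matching steps")
}
```

The steps that call fullmatch() or pairmatch() could then be wrapped in if (has_optmatch) { ... } so that the rest of the script still runs.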
Source
Dynarski (2003).
References
Bertsekas, D. P. and Tseng, P. (1988). The relax codes for linear minimum cost network flow problems. Annals of Operations Research, 13(1), 125-190.
Dynarski, S. M. (2003). Does aid matter? Measuring the effect of student aid on college attendance and completion. American Economic Review, 93(1), 279-288.
Hansen, B. B. (2007). Flexible, optimal matching for observational studies. R News, 7, 18-24.
Hansen, B. B. and Klopfer, S. O. (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609-627.
Rosenbaum, P. R. (2010). Design of Observational Studies. New York: Springer.
Examples
#
data(dynarski)
# Table 13.1 of Rosenbaum (2010)
head(dynarski)
# Table 13.2 of Rosenbaum (2010)
zb<-dynarski$zb
zbf<-factor(zb,levels=c(1,0),labels=c("Father deceased","Father not deceased"))
table(zbf)
Xb<-dynarski[,3:10]
# Estimate the propensity score, Rosenbaum (2010, Section 13.3)
p<-glm(zb~Xb$faminc+Xb$incmiss+Xb$black+Xb$hisp
+Xb$afqtpct+Xb$edmissm+Xb$edm+Xb$female,
family=binomial)$fitted.values
# Figure 13.1 in Rosenbaum (2010)
boxplot(p~zbf,ylab="Propensity score",main="1979-1981 Cohort")
# Read about missing covariate values in section 13.4 of Rosenbaum (2010)
# Robust Mahalanobis distance matrix, treated x control
dmat<-smahal(zb,Xb)
dim(dmat)
# Table 13.3 in Rosenbaum (2010)
round(dmat[1:5,1:5],2)
# Add a caliper on the propensity score using a penalty function
dmat<-addcaliper(dmat,zb,p,caliper=.2)
dim(dmat)
# Table 13.4 in Rosenbaum (2010)
round(dmat[1:5,1:5],2)
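# Illustrative sketch only, on toy numbers with hypothetical names; this is
# not the DOS source code. It assumes the caliper width is measured in
# standard deviations of the propensity score and that the penalty grows
# with the amount by which a pair exceeds the caliper.
toy_p_treated<-c(0.30,0.60)
toy_p_control<-c(0.28,0.45,0.90)
toy_dist<-matrix(1,nrow=2,ncol=3) # pretend covariate distances
toy_width<-0.2*sd(c(toy_p_treated,toy_p_control))
toy_gap<-abs(outer(toy_p_treated,toy_p_control,"-"))
toy_excess<-pmax(toy_gap-toy_width,0) # zero when inside the caliper
toy_penalized<-toy_dist+1000*toy_excess
round(toy_penalized,2)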
## Not run:
# YOU MUST LOAD the optmatch package and accept its license to continue.
# Note that the optmatch package has changed since 2010. It now suggests
# that you indicate the data explicitly as data=dynarski.
# Creating a 1-to-10 match, as in section 13.6 of Rosenbaum (2010)
# This may take a few minutes.
m<-fullmatch(dmat,data=dynarski,min.controls = 10,max.controls = 10,omit.fraction = 1379/2689)
length(m)
sum(matched(m))
1441/11 # There are 131 matched sets, each with 1 treated and 10 controls
# Alternative, simpler code to do the same thing
m2<-pairmatch(dmat,controls=10,data=dynarski)
# Results are the same:
sum(m[matched(m)]!=m2[matched(m2)])
# Housekeeping
im<-as.integer(m)
dynarski<-cbind(dynarski,im)
dm<-dynarski[matched(m),]
dm<-dm[order(dm$im,1-dm$zb),]
# Table 13.5 in Rosenbaum (2010)
which(dm$id==10)
dm[188:198,]
which(dm$id==396)
dm[23:33,]
which(dm$id==3051)
dm[1068:1078,]
# In principle, there can be a tie, in which several different
# matched samples all minimize the total distance. On my
# computer, this calculation reproduces Table 13.5. If there
# were a tie, optmatch should return one of the tied optimal
# matches, with no guarantee as to which one.
## End(Not run)