build.panel {psidR} | R Documentation |
build.panel: Build PSID panel data set
Description
Builds a panel data set with id variables pid (unique person identifier) and year from individual PSID family files and supplemental wealth files.
Usage
build.panel(
datadir = NULL,
fam.vars,
ind.vars = NULL,
heads.only = FALSE,
current.heads.only = FALSE,
sample = NULL,
design = "balanced",
loglevel = INFO
)
Arguments
datadir |
either |
fam.vars |
data.frame of variables to retrieve from the family files. See the examples for the required format. |
ind.vars |
data.frame of variables to get from the individual file. In almost all cases this will be the type of survey weights you want to use. Don't include the id variables ER30001 and ER30002. |
heads.only |
logical. TRUE if the user wants household heads only, that is, individuals who are household heads in the sample year. |
current.heads.only |
logical. TRUE if the user wants current household heads only. Distinguishes current heads from heads who have moved out. |
sample |
string indicating which sample to select: "SRC" (Survey Research Center), "SEO" (Survey of Economic Opportunity), "immigrant" (immigrant sample), "latino" (Latino family sample). Defaults to NULL, so no subsetting takes place. |
design |
either the character "balanced" or "all", or an integer. "balanced" keeps only individuals who appear in every wave. "all" keeps all individuals. An integer sets the minimum number of consecutive years of participation, e.g. design=3 keeps individuals present in at least 3 consecutive waves. |
loglevel |
one of INFO, WARN and DEBUG. INFO by default. |
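As a rough illustration of how the three design settings differ, the sketch below applies them to a toy participation table in base R. It reflects one reading of the argument description above, not the package's internal code; `obs`, `longest_run` and `keep_ids` are hypothetical names introduced only for this example.

```r
# Toy participation record: one row per person-wave observed (hypothetical data).
obs <- data.frame(
  pid  = c(1, 1, 1, 2, 2, 3),
  year = c(1985, 1986, 1987, 1985, 1987, 1986)
)
waves <- sort(unique(obs$year))

# Length of the longest run of consecutive waves in which a person appears.
longest_run <- function(years, waves) {
  present <- waves %in% years
  runs <- rle(present)
  max(c(0, runs$lengths[runs$values]))
}

runs <- tapply(obs$year, obs$pid, longest_run, waves = waves)

# Return the ids that survive a given design setting.
keep_ids <- function(design) {
  if (identical(design, "all")) return(as.numeric(names(runs)))
  # "balanced" is treated as "present in every wave"
  min_run <- if (identical(design, "balanced")) length(waves) else design
  as.numeric(names(runs)[runs >= min_run])
}

keep_ids("balanced")
keep_ids("all")
keep_ids(2)
```

In this toy data only person 1 appears in all three waves, so "balanced" and design=2 keep that person alone, while "all" keeps everyone.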
Details
There are several supported approaches. The first downloads Stata data, uses Stata to build each wave, then puts the waves together with 'psidR'. The second (recommended) approach downloads all data directly from the PSID servers (no Stata needed). For this approach you need to supply the precise names of PSID variables, which vary by year; for example, total family income has a different name in each wave. The function getNamesPSID greatly helps with collecting names for all waves.
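Because variable names differ by wave, fam.vars ends up with one row per year and one column per concept. The base R sketch below shows one way to get from a long list of (year, name, variable) entries to that wide layout. The variable names are the artificial ones used in the Examples section, and `long` and `wide` are hypothetical objects, not part of psidR.

```r
# Long list: which PSID variable holds which concept in which year.
long <- data.frame(
  year     = c(1985, 1986, 1985, 1986),
  name     = c("money", "money", "age", "age"),
  variable = c("Money85", "Money86", "age85", "age86"),
  stringsAsFactors = FALSE
)

# Reshape to one row per year and one column per concept.
wide <- reshape(long, idvar = "year", timevar = "name", direction = "wide")

# reshape() prefixes the new columns with "variable."; strip that prefix.
names(wide) <- sub("^variable\\.", "", names(wide))
rownames(wide) <- NULL

wide
```

The Examples section performs the same step with data.table's dcast; base R is used here only to keep the sketch free of dependencies.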
Value
The resulting data.table. The variable pid is the unique person identifier, constructed from ID1968 and pernum.
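A minimal sketch of how such an identifier can be built from its two parts is shown below. The formula ID1968 * 1000 + pernum is an assumption made for illustration (it works whenever pernum stays below 1000); this page does not state the exact rule, and the toy values are invented.

```r
# Toy individual records (hypothetical values).
ind <- data.frame(
  ID1968 = c(4, 4, 5001),
  pernum = c(1, 170, 3)
)

# Assumed construction: shift ID1968 by three digits and add pernum.
ind$pid <- ind$ID1968 * 1000 + ind$pernum

# Each (ID1968, pernum) pair should yield a distinct pid.
anyDuplicated(ind$pid) == 0
ind
```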
Merging
The interview number variable in each family file maps to the interview number variable of the corresponding year in the individual file. Run example(build.panel) for a demonstration.
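The mapping described above amounts to a join on the interview number. The base R sketch below illustrates it on toy data; all column names (`fam_interview85`, `ind_interview85`) and values are hypothetical, and this is not the package's own merge code.

```r
# Toy family file for 1985: one row per interviewed family.
fam1985 <- data.frame(
  fam_interview85 = c(10, 11),
  Money85         = c(52000, 31000)
)

# Toy individual file: each person carries the 1985 interview number
# of the family they belonged to in that year.
ind <- data.frame(
  ID1968          = c(1, 1, 2),
  pernum          = c(1, 2, 1),
  ind_interview85 = c(10, 10, 11)
)

# Attach family-level variables to every individual of that family.
merged <- merge(ind, fam1985,
                by.x = "ind_interview85", by.y = "fam_interview85")
merged
```

Both members of family 10 receive the same family income, which is the behaviour one would expect when family-level data is spread over individuals.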
Supplements
Note that support for wealth supplements is disabled. Recent releases of the main family file include wealth data. Earlier waves must be merged manually, again by the interview number variable, as above.
Examples
## Not run:
# ################################################
# Real-world example: not run because takes long.
# Build panel with income, wage, age and education
# optionally: add wealth supplements!
# ################################################
# The package is installed with a list of variables
# Alternatively, search for names with getNamesPSID
# This is the body of function build.psid()
# (so why not call build.psid() and see what happens!)
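library(data.table)  # provides fread, setkey and dcast
r = system.file(package="psidR")
small = TRUE  # an argument of build.psid(); TRUE selects the "-small" variable lists
if (small){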
f = fread(file.path(r,"psid-lists","famvars-small.txt"))
i = fread(file.path(r,"psid-lists","indvars-small.txt"))
} else {
f = fread(file.path(r,"psid-lists","famvars.txt"))
i = fread(file.path(r,"psid-lists","indvars.txt"))
}
setkey(i,"name")
setkey(f,"name")
i = dcast(i[,list(year,name,variable)],year~name)
f = dcast(f[,list(year,name,variable)],year~name)
d = build.panel(datadir="~/datasets/psid/",fam.vars=f,
ind.vars=i,
heads.only =TRUE,sample="SRC",
design="all")
save(d,file="~/psid.RData")
## End(Not run)
# ######################################
# reproducible example on artificial data.
# run this with example(build.panel).
# ######################################
## make reproducible family data sets for 2 years
## variables are: family income (Money) and age
## Data acquisition step:
## run build.panel with sascii=TRUE
# testPSID creates artificial PSID data
td <- testPSID(N=12,N.attr=0)
fam1985 <- data.table::copy(td$famvars1985)
fam1986 <- data.table::copy(td$famvars1986)
IND2019ER <- data.table::copy(td$IND2019ER)
# create a temporary datadir
my.dir <- tempdir()
#save those in the datadir
# notice different R formats admissible
save(fam1985,file=paste0(my.dir,"/FAM1985ER.rda"))
save(fam1986,file=paste0(my.dir,"/FAM1986ER.RData"))
save(IND2019ER,file=paste0(my.dir,"/IND2019ER.RData"))
## end Data acquisition step.
# now define which famvars
famvars <- data.frame(year=c(1985,1986),
money=c("Money85","Money86"),
age=c("age85","age86"))
# create ind.vars
indvars <- data.frame(year=c(1985,1986),ind.weight=c("ER30497","ER30534"))
# call the builder
# data will contain column "relation.head" holding the relationship code.
d <- build.panel(datadir=my.dir,fam.vars=famvars,
ind.vars=indvars,
heads.only=FALSE)
# see what happens if we drop non-heads
# only the ones who are heads in BOTH years
# are present (since design='balanced' by default)
d <- build.panel(datadir=my.dir,fam.vars=famvars,
ind.vars=indvars,
heads.only=TRUE)
print(d[order(pid)],nrows=Inf)
# change sample design to "all":
# we keep individuals in the year in which they are head,
# and drop them in the year in which they are not
d <- build.panel(datadir=my.dir,fam.vars=famvars,
ind.vars=indvars,heads.only=TRUE,
design="all")
print(d[order(pid)],nrows=Inf)
file.remove(paste0(my.dir,"/FAM1985ER.rda"),
paste0(my.dir,"/FAM1986ER.RData"),
paste0(my.dir,"/IND2019ER.RData"))
# END psidR example
# #####################################################################
# Please go to https://github.com/floswald/psidR for more example usage
# #####################################################################