
getData {EdSurvey}

R Documentation

Read Data to a Data Frame

Description

Reads in selected columns to a data.frame or a light.edsurvey.data.frame. On an edsurvey.data.frame, the data are stored on disk.

Usage

getData(
  data,
  varnames = NULL,
  drop = FALSE,
  dropUnusedLevels = TRUE,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  formula = NULL,
  recode = NULL,
  includeNaLabel = FALSE,
  addAttributes = FALSE,
  returnJKreplicates = TRUE,
  omittedLevels = deprecated()
)

Arguments

`data`	an `edsurvey.data.frame` or a `light.edsurvey.data.frame`
`varnames`	a character vector of variable names that will be returned. When both `varnames` and a `formula` are specified, variables associated with both are returned. Set to `NULL` by default.
`drop`	a logical value. When set to the default value of `FALSE`, a result with a single column is still returned as a `data.frame` and is not converted to a vector.
`dropUnusedLevels`	a logical value. When set to the default value of `TRUE`, drops unused levels of all factor variables.
`dropOmittedLevels`	a logical value. When set to the default value of `TRUE`, drops those levels of all factor variables that are specified in an `edsurvey.data.frame`. Use `print` on an `edsurvey.data.frame` to see the omitted levels. The omitted levels also can be adjusted with `setAttributes`; see Examples.
`defaultConditions`	a logical value. When set to the default value of `TRUE`, uses the default conditions stored in an `edsurvey.data.frame` to subset the data. Use `print` on an `edsurvey.data.frame` to see the default conditions.
`formula`	a `formula`. When included, `getData` returns data associated with all variables of the `formula`. When both `varnames` and a formula are specified, the variables associated with both are returned. Set to `NULL` by default.
`recode`	a list of lists to recode variables. Defaults to `NULL`. Can be set as `recode = list(var1 = list(from = c("a","b","c"), to = "d"))`. See Examples.
`includeNaLabel`	a logical value to indicate if `NA` (missing) values are returned as literal `NA` values or as factor levels coded as `NA`. Defaults to `FALSE`.
`addAttributes`	a logical value set to `TRUE` to get a `data.frame` that can be used in calls to other functions that usually would take an `edsurvey.data.frame`. This `data.frame` also is called a `light.edsurvey.data.frame`. See Description section in `edsurvey.data.frame` for more information on `light.edsurvey.data.frame`.
`returnJKreplicates`	a logical value indicating if JK replicate weights should be returned. Defaults to `TRUE`.
`omittedLevels`	this argument is deprecated. Use `dropOmittedLevels`.
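
The list-of-lists shape expected by `recode` can be easier to see on a toy example. The following base R sketch mimics the documented `from`/`to` semantics on an ordinary factor. `applyRecode` and the `toy` data are hypothetical names made up for illustration; they are not part of EdSurvey, and the package's own implementation may differ.

```r
# Hypothetical helper (not an EdSurvey function): applies a getData-style
# recode list to a plain data.frame whose columns are factors.
applyRecode <- function(df, recode) {
  for (i in seq_along(recode)) {
    var  <- names(recode)[i]   # variable this rule applies to
    rule <- recode[[i]]        # list with elements `from` and `to`
    x  <- as.factor(df[[var]])
    lv <- levels(x)
    lv[lv %in% rule$from] <- rule$to   # relabel every matching level
    levels(x) <- lv                    # duplicated labels are merged
    df[[var]] <- x
  }
  df
}

# Made-up data for illustration only
toy <- data.frame(access = factor(c("Yes, available", "Yes, I have access",
                                    "No, have no access", "Yes, available")))

out <- applyRecode(
  toy,
  list(access = list(from = c("Yes, available", "Yes, I have access"), to = "Yes"),
       access = list(from = "No, have no access", to = "No"))
)
table(out$access)
```

As in the Examples section, the same variable name may appear more than once in the outer list, with each entry collapsing a different set of levels.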

Details

By default, an edsurvey.data.frame does not have data read into memory until getData is called and returns a data frame. This structure allows EdSurvey to have a minimal memory footprint. To keep the footprint small, you need to limit varnames to just the necessary variables.

There are two methods of attaching survey attributes to a data.frame to make it usable by the functions in the EdSurvey package (e.g., lm.sdf): (a) setting the addAttributes argument to TRUE in the call to getData or (b) appending the attributes to the data frame with rebindAttributes.

When getData is called, it returns a data frame. Setting the addAttributes argument to TRUE adds the survey attributes and changes the resultant data.frame to a light.edsurvey.data.frame.

Alternatively, a data.frame can be coerced into a light.edsurvey.data.frame using rebindAttributes. See Examples in the rebindAttributes documentation.

If both formula and varnames are populated, the variables on both will be included.
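
One way to picture how the two arguments combine is as a union of variable names. The sketch below uses base R's all.vars to pull the names out of a formula and merges them with varnames. requestedVars is a hypothetical helper, not an EdSurvey function, and it ignores anything else getData may add or expand (for example, replicate weights or plausible values).

```r
# Hypothetical helper (not part of EdSurvey): the set of variable names
# implied by a varnames vector together with a formula.
requestedVars <- function(varnames = NULL, formula = NULL) {
  fvars <- if (is.null(formula)) character(0) else all.vars(formula)
  unique(c(varnames, fvars))
}

vars <- requestedVars(varnames = c("dsex", "origwt"),
                      formula  = composite ~ dsex + b017451)
vars
# "dsex" "origwt" "composite" "b017451"
```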

See the vignette titled Using the getData Function in EdSurvey for long-form documentation on this function.

Value

When addAttributes is FALSE, getData returns a data.frame containing data associated with the requested variables. When addAttributes is TRUE, getData returns a light.edsurvey.data.frame.
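
The effect of the drop and dropUnusedLevels arguments on the shape of the result resembles two base R behaviors. The sketch below is an analogy on a made-up data.frame (toy is a hypothetical name), not a call into EdSurvey, and it assumes getData behaves like ordinary data.frame subsetting in this respect.

```r
# Made-up data: "Multiple" is a declared level that never occurs
toy <- data.frame(dsex = factor(c("Male", "Female", "Male"),
                                levels = c("Male", "Female", "Multiple")))

# Analogous to drop = FALSE: a single column stays a data.frame
asFrame <- toy[, "dsex", drop = FALSE]
class(asFrame)

# Analogous to drop = TRUE: a single column collapses to a vector (here a factor)
asVector <- toy[, "dsex", drop = TRUE]
is.factor(asVector)

# Analogous to dropUnusedLevels = TRUE: levels with no observations are removed
levels(droplevels(asVector))
```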

Author(s)

Tom Fink, Paul Bailey, and Ahmad Emad

Examples

## Not run: 
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# get two variables, without weights
df <- getData(data=sdf, varnames=c("dsex", "b017451"))
table(df)

# example of using recode
df2 <- getData(data=sdf, varnames=c("dsex", "t088301"),
               recode=list(t088301=list(from=c("Yes, available","Yes, I have access"),
                                        to=c("Yes")),
                           t088301=list(from=c("No, have no access"),
                                        to=c("No"))))
table(df2)

# when readNAEP is called on a data file, it appends a default 
# condition to the edsurvey.data.frame. You can see these conditions
# by printing the sdf
sdf

# As specified by the default condition, getData restricts the data to the
# Reporting Sample. This behavior can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), defaultConditions = FALSE)
table(df2)

# similarly, the default behavior of omitting certain levels specified
# in the edsurvey.data.frame can be changed as follows:
df2 <- getData(data=sdf, varnames=c("dsex", "b017451"), dropOmittedLevels = FALSE)
table(df2)

# omittedLevels can also be edited with setAttributes()
# here, the omitted level "Multiple" is removed from the list
sdfIncludeMultiple <- setAttributes(data=sdf, attribute="omittedLevels", value=c(NA, "Omitted"))
# check that it was set
getAttributes(data=sdfIncludeMultiple, attribute="omittedLevels")
# notice that dropOmittedLevels is still TRUE by default, so NA and "Omitted" are still removed
dfIncludeMultiple <- getData(data=sdfIncludeMultiple, varnames=c("dsex", "b017451"))
table(dfIncludeMultiple)

# the variable "c052601" is from the school-level data file; merging is handled automatically.
# returns a light.edsurvey.data.frame using addAttributes=TRUE argument
gddat <- getData(data=sdf, 
                 varnames=c("composite", "dsex", "b017451","c052601"),
                 addAttributes = TRUE)
class(gddat)
# look at the first few lines
head(gddat)

# get a selection of variables, recode using ifelse, and reappend attributes
# with rebindAttributes so that it can be used with EdSurvey analysis functions
df0 <- getData(data=sdf, varnames=c("composite", "dsex", "b017451", "origwt"))
df0$sex <- ifelse(df0$dsex=="Male", "boy", "girl")
df0 <- rebindAttributes(data=df0, attributeData=sdf)

# getting all the data can use up all the memory and is generally a bad idea
df0 <- getData(data=sdf, varnames=colnames(sdf),
               dropOmittedLevels=FALSE, defaultConditions=FALSE)

## End(Not run)

[Package EdSurvey version 4.0.7 Index]