R: EdSurvey Standard Deviation

SD {EdSurvey}

R Documentation

EdSurvey Standard Deviation

Description

Calculate the standard deviation of a numeric variable in an edsurvey.data.frame.

Usage

SD(
  data,
  variable,
  weightVar = NULL,
  jrrIMax = 1,
  varMethod = "jackknife",
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  targetLevel = NULL,
  jkSumMultiplier = getAttributes(data, "jkSumMultiplier"),
  returnVarEstInputs = FALSE,
  omittedLevels = deprecated()
)

Arguments

`data`	an `edsurvey.data.frame`, an `edsurvey.data.frame.list`, or a `light.edsurvey.data.frame`
`variable`	character vector of variable names
`weightVar`	character weight variable name. Default is the default weight of `data` if it exists. If the given survey data do not have a default weight, the function will produce unweighted statistics instead. Can be set to `NULL` to return unweighted statistics.
`jrrIMax`	a numeric value; when using the jackknife variance estimation method, the default estimation option, `jrrIMax=1`, uses the sampling variance from the first plausible value as the component for sampling variance estimation. The `Vjrr` term (see Statistical Methods Used in EdSurvey) can be estimated with any number of plausible values, and values larger than the number of plausible values on the survey (including `Inf`) will result in all plausible values being used. Higher values of `jrrIMax` lead to longer computing times and more accurate variance estimates.
`varMethod`	deprecated parameter; `SD` always uses the jackknife variance estimation
`dropOmittedLevels`	a logical value. When set to the default value of `TRUE`, drops the omitted levels of the specified `variable`. Use `print` on an `edsurvey.data.frame` to see the omitted levels.
`defaultConditions`	a logical value. When set to the default value of `TRUE`, uses the default conditions stored in an `edsurvey.data.frame` to subset the data. Use `print` on an `edsurvey.data.frame` to see the default conditions.
`recode`	a list of lists to recode variables. Defaults to `NULL`. Can be set as `recode = list(var1 = list(from = c("a","b","c"), to = "d"))`.
`targetLevel`	a character string. When specified, calculates the gap in the percentage of students at `targetLevel` in the `variable` argument, which is useful for comparing the gap in the percentage of students at a survey response level.
`jkSumMultiplier`	when the jackknife variance estimation method (or the balanced repeated replication (BRR) method) multiplies the final jackknife variance estimate by a value, set `jkSumMultiplier` to that value. For an `edsurvey.data.frame` or a `light.edsurvey.data.frame`, the recommended value can be recovered with `EdSurvey::getAttributes(myData, "jkSumMultiplier")`.
`returnVarEstInputs`	a logical value set to `TRUE` to return the inputs to the jackknife and imputation variance estimates, which allows for the computation of covariances between estimates.
`omittedLevels`	this argument is deprecated. Use `dropOmittedLevels` instead.
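
The shape of the `recode` argument can be easier to see on a toy example. The sketch below uses base R only; `apply_recode` and the `toy` data frame are hypothetical names introduced for illustration and are not part of EdSurvey, and the package's own recoding may differ in detail (for example, in how factor levels are handled).

```r
# Hypothetical helper (not an EdSurvey function): apply a recode list of the
# form list(var = list(from = <old levels>, to = <new level>)) to a data frame.
apply_recode <- function(df, recode) {
  for (v in names(recode)) {
    rule <- recode[[v]]
    x <- as.character(df[[v]])
    # every value listed in `from` is replaced by the single value in `to`
    x[x %in% rule$from] <- rule$to
    df[[v]] <- x
  }
  df
}

# Toy data, not survey data
toy <- data.frame(var1 = c("a", "b", "c", "e"), stringsAsFactors = FALSE)
recode <- list(var1 = list(from = c("a", "b", "c"), to = "d"))

apply_recode(toy, recode)$var1
# "d" "d" "d" "e"
```

Each top-level name in the list is a variable, and each inner list collapses the `from` levels into the single `to` level.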

Value

a list object with elements:

`mean`	the mean assessment score for `variable`, calculated according to the vignette titled Statistical Methods Used in EdSurvey
`std`	the standard deviation of `variable`
`stdSE`	the standard error of the `std`
`df`	the degrees of freedom of the `std`
`varEstInputs`	the variance estimate inputs used for calculating covariances with `varEstToCov`. Only returned when `returnVarEstInputs` is `TRUE`
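
To make the relationship between `std`, `stdSE`, and `jkSumMultiplier` more concrete, the following base R sketch computes a weighted standard deviation and a jackknife standard error on simulated data. It is an illustration under stated assumptions, not EdSurvey's implementation: the helper names (`weighted_sd`, `jackknife_var`), the simulated scores, and the delete-a-group replicate weights are all hypothetical, the weighted SD uses the sum of weights as its denominator, and the imputation variance that arises with plausible values is ignored.

```r
# Toy illustration only: simulated data and hypothetical helpers,
# not EdSurvey internals.
weighted_sd <- function(x, w) {
  mu <- sum(w * x) / sum(w)            # weighted mean
  sqrt(sum(w * (x - mu)^2) / sum(w))   # weighted SD (assumed denominator: sum of weights)
}

# Jackknife sampling variance: multiplier times the sum of squared deviations
# of the replicate estimates from the full-sample estimate.
jackknife_var <- function(x, full_w, rep_w, jk_sum_multiplier = 1) {
  theta_full <- weighted_sd(x, full_w)
  theta_reps <- apply(rep_w, 2, function(w) weighted_sd(x, w))
  jk_sum_multiplier * sum((theta_reps - theta_full)^2)
}

set.seed(42)
n <- 200
x <- rnorm(n, mean = 250, sd = 35)     # simulated scores
full_w <- runif(n, 0.5, 2)             # simulated full-sample weights

# Ten replicate weights: each drops one group and reweights the rest
groups <- rep(1:10, length.out = n)
rep_w <- sapply(1:10, function(g) ifelse(groups == g, 0, full_w * 10 / 9))

std <- weighted_sd(x, full_w)
std_se <- sqrt(jackknife_var(x, full_w, rep_w, jk_sum_multiplier = 1))
c(std = std, stdSE = std_se)
```

In this sketch, doubling `jk_sum_multiplier` doubles the estimated variance, which is why the multiplier must match the replication design of the survey.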

Author(s)

Paul Bailey and Huade Huo

Examples

## Not run: 
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package="NAEPprimer"))

# get the standard deviation of the composite score for male students
SD(data = subset(sdf, dsex == "Male"), variable = "composite")

# get several standard deviations

# build an edsurvey.data.frame.list
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)

sdfl <- edsurvey.data.frame.list(datalist=list(sdfA, sdfB, sdfC, sdfD),
                                 labels=c("A locations",
                                          "B locations",
                                          "C locations",
                                          "D locations"))

# this shows how these datasets will be described:
sdfl$covs

# SD results for each survey
SD(data = sdfl, variable = "composite")
# SD results more compactly and with comparisons
gap(variable="composite", data=sdfl, stDev=TRUE, returnSimpleDoF=TRUE)

## End(Not run)

[Package EdSurvey version 4.0.7 Index]