R: Multivariate Regression

mvrlm.sdf {EdSurvey}

R Documentation

Multivariate Regression

Description

Fits a multivariate linear model that uses weights and variance estimates appropriate for the edsurvey.data.frame.

Usage

mvrlm.sdf(
  formula,
  data,
  weightVar = NULL,
  relevels = list(),
  jrrIMax = 1,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  returnVarEstInputs = FALSE,
  estMethod = "OLS",
  verbose = TRUE,
  omittedLevels = deprecated()
)

Arguments

`formula`	a `Formula` for the linear model. See `Formula`; left-hand side variables are separated with vertical pipes (`|`). See Examples.
`data`	an `edsurvey.data.frame` or an `edsurvey.data.frame.list`
`weightVar`	character indicating the weight variable to use (see Details). The `weightVar` must be one of the weights for the `edsurvey.data.frame`. If `NULL`, uses the default for the `edsurvey.data.frame`.
`relevels`	a list. Used to change the contrasts from the default treatment contrasts to treatment contrasts with a chosen omitted group (the reference group). To do this, add an element to the list with the same name as the variable whose contrasts should change, and set the value of that element to the level that should be the omitted group (the reference group).
`jrrIMax`	a numeric value; when using the jackknife variance estimation method, the default estimation option, `jrrIMax=1`, uses the sampling variance from the first plausible value as the component for sampling variance estimation. The `V_{jrr}` term (see Statistical Methods Used in EdSurvey) can be estimated with any number of plausible values, and values larger than the number of plausible values on the survey (including `Inf`) will result in all plausible values being used. Higher values of `jrrIMax` lead to longer computing times and more accurate variance estimates.
`dropOmittedLevels`	a logical value. When set to the default value of `TRUE`, drops those levels of all factor variables that are specified in `edsurvey.data.frame`. Use `print` on an `edsurvey.data.frame` to see the omitted levels.
`defaultConditions`	a logical value. When set to the default value of `TRUE`, uses the default conditions stored in `edsurvey.data.frame` to subset the data. Use `print` on an `edsurvey.data.frame` to see the default conditions.
`recode`	a list of lists to recode variables. Defaults to `NULL`. Can be set as `recode = list(var1 = list(from = c("a","b","c"), to = "d"))`.
`returnVarEstInputs`	a logical value. Set to `TRUE` to return the inputs to the jackknife and imputation variance estimates, which allow for computation of covariances between estimates.
`estMethod`	a character value indicating which estimation method to use. Default is `OLS`; other option is `GLS`.
`verbose`	logical; indicates whether a detailed printout should display during execution
`omittedLevels`	this argument is deprecated. Use `dropOmittedLevels`
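
The effect of `relevels` is analogous to calling `relevel()` on a factor before fitting an ordinary model. The following is a minimal base R sketch of that idea, not the EdSurvey implementation; the data frame `toy` and its columns `score` and `group` are made up for illustration.

```r
# Hypothetical toy data; not taken from any EdSurvey file
toy <- data.frame(
  score = c(250, 260, 270, 280, 290, 300),
  group = factor(c("A", "A", "B", "B", "C", "C"))
)

# Default treatment contrasts omit the first level ("A")
fit_default <- lm(score ~ group, data = toy)
names(coef(fit_default))

# Make "C" the omitted (reference) group; this mirrors what
# relevels = list(group = "C") asks for in mvrlm.sdf
toy$group <- relevel(toy$group, ref = "C")
fit_releveled <- lm(score ~ group, data = toy)
names(coef(fit_releveled))
```

After releveling, the intercept is the mean of group "C" and the remaining coefficients are differences from that group.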

Details

This function implements an estimator that correctly handles multiple left-hand side variables that are either numeric or plausible values, allows for survey sampling weights, and estimates variances using the jackknife replication method. The vignette titled Statistical Methods Used in EdSurvey describes estimation of the reported statistics.

The coefficients are estimated using the sample weights according to the section “Estimation of Weighted Means When Plausible Values Are Not Present” or the section “Estimation of Weighted Means When Plausible Values Are Present,” depending on whether the model includes assessment variables or other variables with plausible values.

The coefficient of determination (R-squared value) is similarly estimated by finding the average R-squared using the sample weights for each set of plausible values.
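
The averaging of R-squared over plausible values can be sketched in base R as follows. This is an illustration on simulated data, not the EdSurvey code; the sample size, the number of plausible values `M`, the weights `w`, and the matrix `pv` are all assumptions made for the example.

```r
set.seed(42)
n <- 200
M <- 5                          # number of plausible values (assumed)
x <- rnorm(n)
w <- runif(n, 0.5, 2)           # made-up sampling weights

# Simulated plausible values: one column per draw
pv <- sapply(seq_len(M), function(m) 1 + 0.5 * x + rnorm(n))

# Weighted R-squared for each plausible value
r2_by_pv <- sapply(seq_len(M), function(m) {
  summary(lm(pv[, m] ~ x, weights = w))$r.squared
})

# Reported value: the simple average across plausible values
r2 <- mean(r2_by_pv)
```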

Variance estimation of coefficients

All variance estimation methods are shown in the vignette titled Statistical Methods Used in EdSurvey.

When the predicted value does not have plausible values, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Jackknife Method.”

When plausible values are present, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Jackknife Method.”
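
As a rough illustration of how the sampling and imputation components combine for a single coefficient, the sketch below uses simulated data in base R. It assumes a jackknife multiplier of 1, uses made-up replicate weights `repw`, and takes the sampling variance from the first plausible value only (the `jrrIMax = 1` case); see the vignette for the authoritative formulas.

```r
set.seed(7)
n <- 300; M <- 5; J <- 20       # sizes are illustrative assumptions
x  <- rnorm(n)
w0 <- runif(n, 0.5, 2)          # made-up full-sample weights

# Made-up replicate weights: each replicate perturbs the full-sample weights
repw <- sapply(seq_len(J), function(j) {
  w0 * sample(c(0, 1, 2), n, replace = TRUE, prob = c(0.05, 0.90, 0.05))
})

# Simulated plausible values: one column per draw
pv <- sapply(seq_len(M), function(m) 1 + 0.5 * x + rnorm(n))

# Weighted slope of y on x
slope <- function(y, w) unname(coef(lm(y ~ x, weights = w))[2])

# Point estimate: average of the per-plausible-value estimates
b_m <- sapply(seq_len(M), function(m) slope(pv[, m], w0))
b   <- mean(b_m)

# Sampling variance: squared deviations of replicate estimates
# from the full-sample estimate, first plausible value only
b_j  <- sapply(seq_len(J), function(j) slope(pv[, 1], repw[, j]))
Vjrr <- sum((b_j - b_m[1])^2)

# Imputation variance: between-plausible-value variance times (M + 1) / M
Vimp <- (M + 1) / M * var(b_m)

se <- sqrt(Vjrr + Vimp)
```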

For more information on the specifics of multivariate regression, see the vignette titled Methods and Overview of Using EdSurvey for Multivariate Regression.

Value

An edsurvey.mvrlm with elements:

`call`	the function call
`formula`	the formula used to fit the model
`coef`	the estimates of the coefficients
`se`	the standard error estimates of the coefficients
`Vimp`	the estimated variance caused by uncertainty in the scores (plausible value variables)
`Vjrr`	the estimated variance caused by sampling
`M`	the number of plausible values
`varm`	the variance estimates under the various plausible values
`coefm`	the values of the coefficients under the various plausible values
`coefmat`	the coefficient matrix (typically produced by the summary of a model)
`r.squared`	the coefficient of determination
`weight`	the name of the weight variable
`npv`	the number of plausible values
`njk`	the number of the jackknife replicates used
`varEstInputs`	returned only when `returnVarEstInputs` is `TRUE`; these inputs are used for calculating covariances with `varEstToCov`
`residuals`	residuals for each of the PV models
`fitted.values`	model fitted values
`residCov`	residual covariance matrix for dependent variables
`residPV`	residuals for each dependent variable
`inputs`	coefficient estimation input matrices
`n0`	the number of observations in the full data
`nUsed`	the number of observations used to fit the model
`B`	imputation variance-covariance matrix, before multiplication by (M+1)/M
`U`	sampling variance-covariance matrix
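
Because `B` is stored before multiplication by (M+1)/M, a total variance-covariance matrix can presumably be assembled from `B`, `U`, and `M`. The sketch below shows that arithmetic on small made-up matrices; the numbers are hypothetical, and the assumption that the total is `U + (M + 1) / M * B` should be checked against the vignette.

```r
M <- 5                                        # number of plausible values (assumed)
U <- matrix(c(4.0, 1.0, 1.0, 9.0), 2, 2)      # hypothetical sampling covariance
B <- matrix(c(1.0, 0.5, 0.5, 2.0), 2, 2)      # hypothetical imputation covariance

# Total covariance: sampling part plus scaled imputation part
V  <- U + (M + 1) / M * B

# Standard errors are the square roots of the diagonal
se <- sqrt(diag(V))
```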

Author(s)

Alex Lishinski and Paul Bailey

Examples

## Not run: 
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# use | symbol to separate dependent variables in the left-hand side of formula
mvrlm.fit <- mvrlm.sdf(formula=algebra | geometry ~ dsex + m072801, jrrIMax = 5, data = sdf)

# print method returns coefficients, as does coef method
mvrlm.fit
coef(mvrlm.fit)

# for more detailed results, use summary:
summary(mvrlm.fit)

# details of model can also be accessed through components of the returned object; for example:

# coefficients (one column per dependent variable)
mvrlm.fit$coef
# coefficient table with standard errors and p-values (1 table per dependent variable)
mvrlm.fit$coefmat
# R-squared values (one per dependent variable)
mvrlm.fit$r.squared
# residual covariance matrix
mvrlm.fit$residCov

# dependent variables can have plausible values or not (or a combination)

mvrlm.fit <- mvrlm.sdf(formula=composite | mrps22 ~ dsex + m072801, data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

mvrlm.fit <- mvrlm.sdf(formula=algebra | geometry | measurement ~ dsex + m072801,
                       data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

mvrlm.fit <- mvrlm.sdf(formula=mrps51 | mrps22 ~ dsex + m072801, data = sdf, jrrIMax = 5)
summary(mvrlm.fit)

# hypotheses about coefficient restrictions can also be tested using the Wald test

mvr <- mvrlm.sdf(formula=algebra | geometry ~ dsex + m072801, data = sdf)

hypothesis <- c("geometry_dsexFemale = 0", "algebra_dsexFemale = 0")

# test statistics based on the F and chi-squared distribution are available
linearHypothesis(model=mvr, hypothesis = hypothesis, test = "F")
linearHypothesis(model=mvr, hypothesis = hypothesis, test = "Chisq")

## End(Not run)
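
For readers unfamiliar with the Wald test used in the last example, the chi-squared version of the statistic can be computed by hand. This is a generic base R sketch with made-up numbers, not the internals of `linearHypothesis`; the coefficient vector `b`, covariance matrix `V`, restriction matrix `R`, and right-hand side `r` are all hypothetical.

```r
b <- c(2, -1)                   # hypothetical coefficient estimates
V <- diag(c(1, 0.25))           # hypothetical covariance matrix of b
R <- diag(2)                    # one row per restriction: both coefficients
r <- c(0, 0)                    # tested against zero

# Wald statistic: (Rb - r)' [R V R']^{-1} (Rb - r)
d <- R %*% b - r
W <- as.numeric(t(d) %*% solve(R %*% V %*% t(R)) %*% d)

# Chi-squared p-value with one degree of freedom per restriction
p <- pchisq(W, df = nrow(R), lower.tail = FALSE)
```

Here each row of `R` encodes one restriction, so testing two coefficients jointly against zero gives two degrees of freedom.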

[Package EdSurvey version 4.0.7 Index]