cor.sdf {EdSurvey}  R Documentation 
Computes the correlation of two variables on an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list.
The correlation accounts for plausible values and the survey design.
cor.sdf(
  x,
  y,
  data,
  method = c("Pearson", "Spearman", "Polychoric", "Polyserial"),
  weightVar = "default",
  reorder = NULL,
  omittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  condenseLevels = TRUE,
  fisherZ = if (match.arg(method) %in% "Pearson") { TRUE } else { FALSE },
  jrrIMax = Inf,
  verbose = TRUE
)
x 
a character variable name from the 
y 
a character variable name from the 
data 
an 
method 
a character string indicating which correlation coefficient (or covariance) is to be computed.
One of 
weightVar 
character indicating the weight variable to use. See Details section in 
reorder 
a list of variables to reorder. Defaults to 
omittedLevels 
a logical value. When set to the default value of 
defaultConditions 
a logical value. When set to the default value of 
recode 
a list of lists to recode variables. Defaults to 
condenseLevels 
a logical value. When set to the default value of

fisherZ 
for standard error and mean calculations, set to 
jrrIMax 
a numeric value; when using the jackknife variance estimation method, the default estimation option, 
verbose 
a logical value. Set to 
The getData arguments and recode.sdf may be useful. (See Examples.)
The correlation methods are calculated as described in the documentation for the wCorr package; see browseVignettes(package="wCorr").
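As a rough illustration of what a weighted correlation involves, here is a minimal base-R sketch of a weighted Pearson correlation. It is not the wCorr or EdSurvey implementation: the helper weighted_pearson and the simulated x, y, and w are hypothetical, and the sketch ignores plausible values and the survey design entirely.

```r
# Simulated, illustrative data (not NAEP data)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
w <- runif(n, 0.5, 2)   # hypothetical sampling weights

# Hypothetical helper: weighted Pearson correlation from weighted moments
weighted_pearson <- function(x, y, w) {
  mx <- sum(w * x) / sum(w)   # weighted mean of x
  my <- sum(w * y) / sum(w)   # weighted mean of y
  sum(w * (x - mx) * (y - my)) /
    sqrt(sum(w * (x - mx)^2) * sum(w * (y - my)^2))
}

r_manual <- weighted_pearson(x, y, w)

# Cross-check against stats::cov.wt, which also returns a weighted correlation
r_covwt <- cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
```

With equal weights the helper reduces to the ordinary cor(x, y).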
When method is set to Polyserial, all x arguments are assumed to be continuous and all y arguments are assumed to be discrete. Therefore, be mindful of variable selection, as a poor choice may result in calculations taking a very long time to complete.
The Fisher Z-transformation is both a variance-stabilizing and normalizing transformation for the Pearson correlation coefficient (Fisher, 1915). The transformation takes the inverse hyperbolic tangent of the correlation coefficients, and all variances and confidence intervals are calculated in that space. These are then transformed back to the correlation space (values between -1 and 1, inclusive) using the hyperbolic tangent function. The Taylor series approximation (or delta method) is applied for the standard errors.
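The transformation steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the cor.sdf source: the values of r and zse are made up, and the interval uses a normal quantile, which may differ from the reference distribution cor.sdf actually uses.

```r
# Hypothetical inputs: a Pearson estimate and its standard error in the
# Z (atanh) space; both numbers are illustrative only
r   <- 0.45
zse <- 0.03

# Fisher Z-transformation: inverse hyperbolic tangent of the correlation
z <- atanh(r)

# 95% confidence interval in the transformation space
# (normal quantile is an assumption of this sketch)
z_ci <- z + c(-1, 1) * qnorm(0.975) * zse

# Back-transform the interval to the correlation space with tanh
r_ci <- tanh(z_ci)

# Delta method: d tanh(z) / dz = 1 - tanh(z)^2 = 1 - r^2
se_r <- zse * (1 - r^2)
```

Because tanh is monotone and bounded, the back-transformed interval always stays inside [-1, 1], which is not guaranteed when the interval is built directly in the correlation space.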
An edsurvey.cor that has print and summary methods.
The class includes the following elements:
correlation 
numeric estimated correlation coefficient 
Zse 
standard error of the correlation ( 
correlates 
a vector of length two showing the columns for which the correlation coefficient was calculated 
variables 

order 
a list that shows the order of each variable 
method 
the type of correlation estimated 
Vjrr 
the jackknife component of the variance estimate. For Pearson, in the atanh space. 
Vimp 
the imputation component of the variance estimate. For Pearson, in the atanh space. 
weight 
the weight variable used 
npv 
the number of plausible values used 
njk 
the number of jackknife replicates used 
n0 
the original number of observations 
nUsed 
the number of observations used in the analysis, after any conditions and any listwise deletion of missing values are applied 
se 
the standard error of the correlation, in the correlation ([-1,1]) space 
ZconfidenceInterval 
the confidence interval of the correlation in the transformation space 
confidenceInterval 
the confidence interval of the correlation in the correlation ([-1,1]) space 
transformation 
the name of the transformation used when calculating standard errors 
Paul Bailey; relies heavily on the wCorr package, written by Ahmad Emad and Paul Bailey
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521.
cor and weightedCorr
## Not run:
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
# for two categorical variables any of the following work
c1_pears <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Pearson",
                    weightVar="origwt")
c1_spear <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Spearman",
                    weightVar="origwt")
c1_polyc <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Polychoric",
                    weightVar="origwt")
c1_pears
c1_spear
c1_polyc
# for categorical variables, users can either keep the original numeric levels of the variables
# or condense the levels (default)
# the following call condenses the levels of the variable 'c046501'
cor.sdf(x="c046501", y="c044006", data=sdf)
# the following call keeps the original levels of the variable 'c046501'
cor.sdf(x="c046501", y="c044006", data=sdf, condenseLevels = FALSE)
# these take a while to calculate for large datasets, so limit to a subset
sdf_dnf <- subset(sdf, b003601 == 1)
# for a categorical variable and a scale score any of the following work
c2_pears <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Pearson",
                    weightVar="origwt")
c2_spear <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Spearman",
                    weightVar="origwt")
c2_polys <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Polyserial",
                    weightVar="origwt")
c2_pears
c2_spear
c2_polys
# recode two variables
cor.sdf(x="c046501", y="c044006", data=sdf, method="Spearman", weightVar="origwt",
        recode=list(c046501=list(from="0%", to="None"),
                    c046501=list(from=c("1-5%", "6-10%", "11-25%", "26-50%",
                                        "51-75%", "76-90%", "Over 90%"),
                                 to="Between 0% and 100%"),
                    c044006=list(from=c("1-5%", "6-10%", "11-25%", "26-50%",
                                        "51-75%", "76-90%", "Over 90%"),
                                 to="Between 0% and 100%")))
# reorder two variables
cor.sdf(x="b017451", y="sdracem", data=sdf, method="Spearman", weightVar="origwt",
        reorder=list(sdracem=c("White", "Hispanic", "Black", "Asian/Pacific Island",
                               "Amer Ind/Alaska Natv", "Other"),
                     b017451=c("Every day", "2 or 3 times a week", "About once a week",
                               "Once every few weeks", "Never or hardly ever")))
# recode two variables and reorder
cor.sdf(x="pared", y="b013801", data=subset(sdf, !pared %in% "I Don\'t Know"),
        method="Spearman", weightVar = "origwt",
        recode=list(pared=list(from="Some ed after H.S.", to="Graduated H.S."),
                    pared=list(from="Graduated college", to="Graduated H.S."),
                    b013801=list(from="0-10", to="Less than 100"),
                    b013801=list(from="11-25", to="Less than 100"),
                    b013801=list(from="26-100", to="Less than 100")),
        reorder=list(b013801=c("Less than 100", ">100")))
## End(Not run)