gap {EdSurvey}  R Documentation 
Compares the average levels of a variable between two groups that potentially share members.
gap(
variable,
data,
groupA = "default",
groupB = "default",
percentiles = NULL,
achievementLevel = NULL,
achievementDiscrete = FALSE,
stDev = FALSE,
targetLevel = NULL,
weightVar = NULL,
jrrIMax = 1,
varMethod = c("jackknife"),
omittedLevels = TRUE,
defaultConditions = TRUE,
recode = NULL,
referenceDataIndex = 1,
returnVarEstInputs = FALSE,
returnSimpleDoF = FALSE,
returnSimpleN = FALSE,
returnNumberOfPSU = FALSE,
noCov = FALSE,
pctMethod = c("unbiased", "symmetric", "simple"),
includeLinkingError = FALSE
)
variable 
a character indicating the variable to be compared, potentially with a subject scale or subscale 
data 
an 
groupA 
an expression or character expression that defines a condition for the subset.
This subset will be compared to groupB.
groupB 
an expression or character expression that defines a condition for the subset.
This subset will be compared to groupA.
percentiles 
a numeric vector. The 
achievementLevel 
the achievement level(s) at which percentages should be calculated 
achievementDiscrete 
a logical indicating if the achievement level
specified in the 
stDev 
a logical, set to 
targetLevel 
a character string. When specified, calculates the gap in
the percentage of students at

weightVar 
a character indicating the weight variable to use. See Details. 
jrrIMax 
a numeric value; when using the jackknife variance estimation method, the default estimation option, 
varMethod 
deprecated parameter, 
omittedLevels 
a logical value. When set to the default value of

defaultConditions 
a logical value. When set to the default value
of 
recode 
a list of lists to recode variables. Defaults to 
referenceDataIndex 
a numeric used only when the 
returnVarEstInputs 
a logical value; set to 
returnSimpleDoF 
a logical value set to 
returnSimpleN 
a logical value set to 
returnNumberOfPSU 
a logical value set to 
noCov 
set the covariances to zero in result 
pctMethod 
a character that is one of 
includeLinkingError 
a logical value set to 
This function calculates the gap between groupA and groupB (which may be omitted to indicate the full sample). The gap is calculated for one of four statistics:
The mean score gap (in the score variable) identified in the variable argument. This is the default. The means and their standard errors are calculated using the methods described in the lm.sdf function documentation.
The gap between respondents at the percentiles specified in the percentiles argument. This is returned when the percentiles argument is defined. The mean and standard error are computed as described in the percentile function documentation.
The gap in the percentage of students at (when achievementDiscrete is TRUE) or at or above (when achievementDiscrete is FALSE) a particular achievement level. This is used when the achievementLevel argument is defined. The mean and standard error are calculated as described in the achievementLevels function documentation.
The gap in the percentage of respondents responding at targetLevel to variable. This is used when targetLevel is defined. The mean and standard deviation are calculated as described in the edsurveyTable function documentation.
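The four statistics above can be illustrated with a minimal base R sketch on made-up data. This is not how EdSurvey computes them (it ignores plausible values and jackknife variance estimation); the data frame toy, the helpers wpct, wquantile, and gapOf, and the cut score of 260 are all hypothetical names and values chosen for illustration.

```r
# Toy data: scores, a grouping variable, a survey response, and weights (all made up)
set.seed(42)
n <- 200
toy <- data.frame(
  score = rnorm(n, mean = 250, sd = 30),
  sex   = sample(c("Male", "Female"), n, replace = TRUE),
  talk  = sample(c("Never or hardly ever", "About once a week", "Every day"),
                 n, replace = TRUE),
  w     = runif(n, 0.5, 2)
)
inA <- toy$sex == "Male"    # plays the role of groupA
inB <- toy$sex == "Female"  # plays the role of groupB

# weighted percentage of rows where `hit` is TRUE
wpct <- function(hit, w) 100 * sum(w[hit]) / sum(w)

# simple weighted percentile: the smallest value whose cumulative
# weight share reaches p percent (an assumed, simplified definition)
wquantile <- function(x, w, p) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= p / 100)[1]]
}

# gap = statistic in group A minus the same statistic in group B
gapOf <- function(stat) stat(toy[inA, ]) - stat(toy[inB, ])

cutScore <- 260  # hypothetical achievement-level cut point
toyGaps <- c(
  mean      = gapOf(function(d) weighted.mean(d$score, d$w)),
  p50       = gapOf(function(d) wquantile(d$score, d$w, 50)),
  atOrAbove = gapOf(function(d) wpct(d$score >= cutScore, d$w)),
  target    = gapOf(function(d) wpct(d$talk == "Never or hardly ever", d$w))
)
print(round(toyGaps, 2))
```

Each entry is the group A statistic minus the group B statistic, which is the sense of "gap" used throughout this page.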
The return type depends on whether the class of the data argument is an edsurvey.data.frame or an edsurvey.data.frame.list. Both include the call (called call), a list called labels, an object named percentage that shows the percentage in groupA and groupB, and an object that shows the gap called results.
The labels include the following elements:
definition 
the definitions of the groups 
nFullData 
the n-size for the full dataset (before applying the definition) 
nUsed 
the n-size for the data after the group is subsetted and other restrictions (such as omitted values) are applied 
nPSU 
the number of PSUs used in the calculation; only returned when returnNumberOfPSU is TRUE
The percentages are computed according to the vignette titled Statistical Methods Used in EdSurvey in the section “Estimation of Weighted Percentages When Plausible Values Are Not Present.” The standard errors are calculated according to “Estimation of the Standard Error of Weighted Percentages When Plausible Values Are Not Present, Using the Jackknife Method.” Standard errors of differences are calculated as the square root of the typical variance formula
Var(A - B) = Var(A) + Var(B) - 2 Cov(A, B)
where the covariance term is calculated as described in the vignette titled Statistical Methods Used in EdSurvey in the section “Estimation of Covariances.” These degrees of freedom are available only with the jackknife variance estimation. The degrees of freedom used for hypothesis testing are always set to the number of jackknife replicates in the data.
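As a rough numeric sketch of the formula above, the following base R code computes the standard error of a difference and a t-test p-value. All estimates, standard errors, and the covariance are made-up numbers; diffSE is a hypothetical helper, not an EdSurvey function; and the two-sided test is an assumption, since this page does not state the form of the test.

```r
# standard error of A - B from the two variances and their covariance
diffSE <- function(varA, varB, covAB = 0) sqrt(varA + varB - 2 * covAB)

# made-up estimates and standard errors for groups A and B
estA <- 277.2; seA <- 0.9
estB <- 275.0; seB <- 0.8
covAB <- 0.3   # made-up covariance between the two estimates
dof <- 62      # hypothetical number of jackknife replicates

diffAB <- estA - estB
diffABse <- diffSE(seA^2, seB^2, covAB)

# two-sided t-test that the difference is zero (assumed form of the test)
tStat <- diffAB / diffABse
diffABpValue <- 2 * pt(-abs(tStat), df = dof)

# with the covariance set to zero, as the noCov argument describes
diffABseNoCov <- diffSE(seA^2, seB^2)

print(c(diffAB = diffAB, diffABse = diffABse,
        diffABpValue = diffABpValue, diffABseNoCov = diffABseNoCov))
```

With a positive covariance, dropping the covariance term gives a larger standard error, so a comparison made with the covariance set to zero is the more conservative one in that case.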
the data argument is an edsurvey.data.frame
When the data argument is an edsurvey.data.frame, gap returns an S3 object of class gap.
The percentage object is a numeric vector with the following elements:
pctA 
the percentage of respondents in 
pctAse 
the standard error on the percentage of respondents in

dofA 
degrees of freedom appropriate for a t-test involving 
pctB 
the percentage of respondents in 
pctBse 
the standard error on the percentage of respondents in

dofB 
degrees of freedom appropriate for a t-test involving 
diffAB 
the value of 
covAB 
the covariance of 
diffABse 
the standard error of 
diffABpValue 
the p-value associated with the t-test used
for the hypothesis test that 
dofAB 
degrees of freedom used in calculating

The results object is a numeric data frame with the following elements:
estimateA 
the mean estimate of 
estimateAse 
the standard error of 
dofA 
degrees of freedom appropriate for a t-test involving 
estimateB 
the mean estimate of 
estimateBse 
the standard error of 
dofB 
degrees of freedom appropriate for a t-test involving 
diffAB 
the value of 
covAB 
the covariance of 
diffABse 
the standard error of 
diffABpValue 
the p-value associated with the t-test used
for the hypothesis test that 
dofAB 
degrees of freedom used for the t-test on 
If the gap was in achievement levels or percentiles and more than one percentile or achievement level is requested, then an additional column labeled percentiles or achievementLevel is included in the results object.
When results has a single row and when returnVarEstInputs is TRUE, the additional elements varEstInputs and pctVarEstInputs also are returned. These can be used for calculating covariances with varEstToCov.
the data argument is an edsurvey.data.frame.list
When the data argument is an edsurvey.data.frame.list, gap returns an S3 object of class gapList.
The results object in the edsurveyResultList is a data.frame. Each row corresponds to a particular dataset from the edsurvey.data.frame.list, and a reference dataset is dictated by the referenceDataIndex argument.
The percentage object is a data.frame with the following elements:
covs 
a data frame with a column for each column in the 
... 
all elements in the 
diffAA 
the difference in 
covAA 
the covariance of 
diffAAse 
the standard error for 
diffAApValue 
the p-value associated with the t-test used
for the hypothesis test that 
diffBB 
the difference in 
covBB 
the covariance of 
diffBBse 
the standard error for 
diffBBpValue 
the p-value associated with the t-test used
for the hypothesis test that 
diffABAB 
the value of 
covABAB 
the covariance of 
diffABABse 
the standard error for 
diffABABpValue 
the p-value associated with the t-test used
for the hypothesis test that 
The results object is a data.frame with the following elements:
... 
all elements in the 
diffAA 
the value of 
covAA 
the covariance of 
diffAAse 
the standard error for 
diffAApValue 
the p-value associated with the t-test used
for the hypothesis test that 
diffBB 
the value of 
covBB 
the covariance of 
diffBBse 
the standard error for 
diffBBpValue 
the p-value associated with the t-test used
for the hypothesis test that 
diffABAB 
the value of 
covABAB 
the covariance of 
diffABABse 
the standard error for 
diffABABpValue 
the p-value associated with the t-test used
for the hypothesis test that 
sameSurvey 
a logical value indicating if this line uses the same
survey as the reference line. Set to 
Paul Bailey, Trang Nguyen, and Huade Huo
## Not run:
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
# find the mean score gap in the primer data between males and females
gap("composite", sdf, dsex=="Male", dsex=="Female")
# find the score gap at the median, then at the quartiles, in the primer data between males and females
gap("composite", sdf, dsex=="Male", dsex=="Female", percentiles=50)
gap("composite", sdf, dsex=="Male", dsex=="Female", percentiles=c(25, 50, 75))
# find the gap in the percentage at or above each achievement level
# in the primer data between males and females
gap("composite", sdf, dsex=="Male", dsex=="Female",
    achievementLevel=c("Basic", "Proficient", "Advanced"))
# find the discrete achievement level gap; this is harder to interpret
gap("composite", sdf, dsex=="Male", dsex=="Female",
    achievementLevel="Proficient", achievementDiscrete=TRUE)
# find the gap in the primer data between males and females in the percentage
# who never or hardly ever talk about studies at home (b017451)
gap("b017451", sdf, dsex=="Male", dsex=="Female",
    targetLevel="Never or hardly ever")
# example showing how to compare multiple levels by recoding them into one
gap("b017451", sdf, dsex=="Male", dsex=="Female", targetLevel="Infrequently",
    recode=list(b017451=list(from=c("Never or hardly ever",
                                    "Once every few weeks",
                                    "About once a week"),
                             to=c("Infrequently"))))
# make subsets of sdf by scrpsu, "Scrambled PSU and school code"
sdfA <- subset(sdf, scrpsu %in% c(5,45,56))
sdfB <- subset(sdf, scrpsu %in% c(75,76,78))
sdfC <- subset(sdf, scrpsu %in% 100:200)
sdfD <- subset(sdf, scrpsu %in% 201:300)
sdfl <- edsurvey.data.frame.list(list(sdfA, sdfB, sdfC, sdfD),
                                 labels=c("A locations", "B locations",
                                          "C locations", "D locations"))
gap("composite", sdfl, dsex=="Male", dsex=="Female", percentiles=c(50))
## End(Not run)
## Not run:
# example showing using linking error with gap
# load Grade 4 math data
# requires NAEP RUD license with these files in the folder the user is currently in
g4math2015 <- readNAEP("M46NT1AT.dat")
g4math2017 <- readNAEP("M48NT1AT.dat")
g4math2019 <- readNAEP("M50NT1AT.dat")
# make an edsurvey.data.frame.list from math grade 4 2015, 2017, and 2019 data
g4math <- edsurvey.data.frame.list(list(g4math2019, g4math2017, g4math2015),
                                   labels = c("2019", "2017", "2015"))
# gap analysis with linking error in variance estimation across surveys
gap("composite", g4math, dsex == "Male", dsex == "Female", includeLinkingError=TRUE)
gap("composite", g4math, dsex == "Male", dsex == "Female", percentiles = c(10, 25),
    includeLinkingError=TRUE)
gap("composite", g4math, dsex == "Male", dsex == "Female",
    achievementDiscrete = TRUE, achievementLevel=c("Basic", "Proficient", "Advanced"),
    includeLinkingError=TRUE)
## End(Not run)