lsa.lin.reg {RALSA}    R Documentation

Compute linear regression coefficients within specified groups

Description

lsa.lin.reg computes linear regression coefficients within groups defined by one or more variables.

Usage

lsa.lin.reg(
  data.file,
  data.object,
  split.vars,
  bckg.dep.var,
  PV.root.dep,
  bckg.indep.cont.vars,
  bckg.indep.cat.vars,
  bckg.cat.contrasts,
  bckg.ref.cats,
  PV.root.indep,
  interactions,
  standardize = FALSE,
  weight.var,
  include.missing = FALSE,
  shortcut = FALSE,
  save.output = TRUE,
  output.file,
  open.output = TRUE
)

Arguments

data.file

The file containing an lsa.data object. Either this or data.object shall be specified, but not both. See details.

data.object

The object in memory containing an lsa.data object. Either this or data.file shall be specified, but not both. See details.

split.vars

Categorical variable(s) to split the results by. If no split variables are provided, the results will be computed for each country's overall population. If one or more variables are provided, the results will be split by all but the last variable and the percentages of respondents will be computed by the unique values of the last splitting variable.

bckg.dep.var

Name of a continuous background or contextual variable used as a dependent variable in the model. See details.

PV.root.dep

The root name for a set of plausible values used as a dependent variable in the model. See details.

bckg.indep.cont.vars

Names of continuous independent background or contextual variables used as predictors in the model. See details.

bckg.indep.cat.vars

Names of categorical independent background or contextual variables used as predictors in the model to compute contrasts for (see bckg.cat.contrasts and bckg.ref.cats). See details.

bckg.cat.contrasts

String vector with the same length as the length of bckg.indep.cat.vars specifying the type of contrasts to compute in case bckg.indep.cat.vars are provided. See details.

bckg.ref.cats

Vector of integers with the same length as the length of bckg.indep.cat.vars and bckg.cat.contrasts specifying the reference categories for the contrasts to compute in case bckg.indep.cat.vars are provided. See details.

PV.root.indep

The root names for the sets of plausible values used as independent variables in the model. See details.

interactions

Interaction terms: a list containing vectors of length two. See details.

standardize

Shall the dependent and independent variables be standardized to produce beta coefficients? The default is FALSE. See details.

weight.var

The name of the variable containing the weights. If no name of a weight variable is provided, the function will automatically select the default weight variable for the provided data, depending on the respondent type.

include.missing

Logical, shall the missing values of the splitting variables be included as categories to split by and all statistics produced for them? The default (FALSE) keeps only the cases with valid (non-missing) values on the splitting variables before computing any statistics. See details.

shortcut

Logical, shall the "shortcut" method for IEA TIMSS, TIMSS Advanced, TIMSS Numeracy, eTIMSS PSI, PIRLS, ePIRLS, PIRLS Literacy and RLII be applied? The default (FALSE) applies the "full" design when computing the variance components and the standard errors of the estimates.

save.output

Logical, shall the output be saved in an MS Excel file (default) or not (printed to the console or assigned to an object)?

output.file

If save.output = TRUE (default), full path to the output file including the file name. If omitted, a file with a default file name "Analysis.xlsx" will be written to the working directory (getwd()). Ignored if save.output = FALSE.

open.output

Logical, shall the output be opened after it has been written? The default (TRUE) opens the output in the default spreadsheet program installed on the computer. Ignored if save.output = FALSE.

Details

Either data.file or data.object shall be provided as source of data. If both of them are provided, the function will stop with an error message.

The function computes linear regression coefficients by the categories of the splitting variables. The percentages of respondents in each group are computed within the groups specified by the last splitting variable. If no splitting variables are added, the results will be computed only by country.
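The way percentages are computed within groups can be illustrated with a small base R sketch. It is not RALSA code: the data frame, its column names (country, sex, w) and the weights are made up for illustration, and the actual function additionally estimates standard errors from the replicate weights.

```r
# Toy data: two countries, sex as the last (here: only) splitting variable,
# and a hypothetical weight column "w".
dat <- data.frame(
  country = rep(c("A", "B"), each = 4),
  sex     = c("F", "F", "M", "M", "F", "M", "M", "M"),
  w       = c(1, 1, 1, 1, 2, 1, 1, 1)
)

# Sum of weights per country and category of the last splitting variable
wt <- xtabs(w ~ country + sex, data = dat)

# Weighted percentages of the categories of the last splitting variable,
# computed within each country (each row sums to 100)
pct <- 100 * prop.table(wt, margin = 1)
print(round(pct, 1))
```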

If standardize = TRUE, the variables will be standardized before computing any statistics to provide beta regression coefficients.
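As a rough illustration of what standardizing before estimation means, the following unweighted base R sketch uses simulated data. It is an assumption-laden toy, not the RALSA implementation, which works with sampling weights and, where applicable, plausible values. The helper z() and all variable names are hypothetical.

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n, mean = 10, sd = 2)
x2 <- rnorm(n, mean = 50, sd = 10)
y  <- 3 + 1.5 * x1 + 0.2 * x2 + rnorm(n)

# Hypothetical helper: convert a variable to z-scores
z <- function(v) (v - mean(v)) / sd(v)

# Unstandardized and standardized models
raw.fit <- lm(y ~ x1 + x2)
std.fit <- lm(z(y) ~ z(x1) + z(x2))

# The standardized slopes equal the raw slopes rescaled by sd(x) / sd(y)
beta.from.raw <- coef(raw.fit)[-1] * c(sd(x1), sd(x2)) / sd(y)
print(cbind(standardized = unname(coef(std.fit)[-1]),
            rescaled.raw = unname(beta.from.raw)))
```

With all variables standardized, the intercept is zero up to rounding error, which is why beta coefficients are usually reported for the slopes only.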

Either a background/contextual variable (bckg.dep.var) or a root name of a set of plausible values (PV.root.dep) can be provided as the dependent variable, but not both.

Background/contextual variables passed to bckg.indep.cont.vars will be treated as numeric variables in the model. Variables with a discrete number of categories (i.e. factors) passed to bckg.indep.cat.vars will be used to compute contrasts. In this case the type of contrast for each of the bckg.indep.cat.vars has to be passed to bckg.cat.contrasts and the number of its reference category to bckg.ref.cats. The number of contrast types and the number of reference categories must be the same as the number of bckg.indep.cat.vars. The currently supported contrast coding schemes are:

Note that when using standardize = TRUE, the contrast coding of bckg.indep.cat.vars is not standardized. Thus, the regression coefficients may not be comparable to other software solutions for analyzing large-scale assessment data which rely on, for example, SPSS or SAS where the contrast coding of categorical variables (e.g. dummy coding) takes place by default. However, the model statistics will be identical.
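To make the effect of the contrast type and the reference category more concrete, here is a small unweighted base R sketch of "dummy" and "simple" coding as they are usually defined (see the UCLA reference). Whether RALSA builds its contrast matrices in exactly this way is an assumption. The simulated variables books and score are hypothetical.

```r
set.seed(1)
# Balanced toy data: a four-category predictor and a continuous outcome
books <- factor(rep(1:4, each = 25))
score <- 500 + 10 * as.integer(books) + rnorm(100, sd = 5)
group.means <- tapply(score, books, mean)

ref <- 2  # reference category, analogous to an entry in bckg.ref.cats

# Dummy coding: the intercept is the mean of the reference category
contrasts(books) <- contr.treatment(4, base = ref)
dummy.fit <- lm(score ~ books)

# Simple coding: same slopes, but the intercept is the mean of the group means
contrasts(books) <- contr.treatment(4, base = ref) - 1 / 4
simple.fit <- lm(score ~ books)

print(rbind(dummy = coef(dummy.fit), simple = coef(simple.fit)))
```

In both schemes the slopes are the differences between each non-reference category and the reference category; only the meaning of the intercept changes.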

Multiple continuous or categorical background variables and/or sets of plausible values can be provided to compute regression coefficients for. Please note that in this case the results will slightly differ compared to using each pair of the same background continuous variables or PVs in separate analyses. This is because the cases with missing values are removed in advance and the more variables are provided, the more cases are likely to be removed. That is, the function supports only listwise deletion.
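The consequence of listwise deletion can be seen in a tiny base R sketch with made-up data (the data frame and its column names are hypothetical, and this is not RALSA code): every additional variable in the model can only reduce the number of complete cases.

```r
dat <- data.frame(
  y  = c(1, 2, NA, 4, 5, 6),
  x1 = c(1, NA, 3, 4, 5, 6),
  x2 = c(1, 2, 3, NA, 5, 6)
)

# Cases usable in a model with y and x1 only
n.two <- sum(complete.cases(dat[, c("y", "x1")]))

# Cases usable in a model with y, x1 and x2
n.three <- sum(complete.cases(dat))

print(c(y.x1 = n.two, y.x1.x2 = n.three))
```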

Computation of regression coefficients involving plausible values requires providing a root of the plausible values names in PV.root.dep and/or PV.root.indep. All studies (except CivED, TEDS-M, SITES, TALIS and TALIS Starting Strong Survey) have a set of PVs per construct (e.g. in TIMSS five for overall mathematics, five for algebra, five for geometry, etc.). In some studies (say TIMSS and PIRLS) the names of the PVs in a set always start with a character string and end with the sequential number of the PV. For example, the names of the set of PVs for overall mathematics in TIMSS are BSMMAT01, BSMMAT02, BSMMAT03, BSMMAT04 and BSMMAT05. The root of the PVs for this set to be added to PV.root.dep or PV.root.indep will be "BSMMAT". The function will automatically find all the variables in this set of PVs and include them in the analysis. In other studies, like OECD PISA and IEA ICCS and ICILS, the sequential number of each PV is included in the middle of the name. For example, in ICCS the names of the set of PVs are PV1CIV, PV2CIV, PV3CIV, PV4CIV and PV5CIV. The root PV name has to be specified in PV.root.dep or PV.root.indep as "PV#CIV". More than one set of PVs can be added in PV.root.indep.
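The two naming conventions for PV roots can be illustrated with a short base R sketch. The helper find.pv.names() is hypothetical and only mimics the idea of expanding a root into variable names; how RALSA resolves the roots internally is not shown here.

```r
# Hypothetical helper: expand a PV root into the matching variable names.
# A "#" in the root marks the position of the PV number; without "#",
# the number is assumed to be at the end of the name.
find.pv.names <- function(root, var.names) {
  pattern <- if (grepl("#", root, fixed = TRUE)) {
    sub("#", "[0-9]+", root, fixed = TRUE)
  } else {
    paste0(root, "[0-9]+")
  }
  grep(paste0("^", pattern, "$"), var.names, value = TRUE)
}

# Made-up vector of variable names mixing both conventions
vars <- c("IDCNTRY", paste0("BSMMAT0", 1:5), paste0("BSMALG0", 1:5),
          paste0("PV", 1:5, "CIV"))

print(find.pv.names("BSMMAT", vars))
print(find.pv.names("PV#CIV", vars))
```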

The function can also compute two-way interaction effects between independent variables by passing a list to the interactions argument. The list must contain vectors of length two and all variables in these vectors must also be passed as independent variables (see the examples). Note the following:

If include.missing = FALSE (default), all cases with missing values on the splitting variables will be removed and only cases with valid values will be retained in the statistics. Note that the data from the studies can be exported in two different ways: (1) setting all user-defined missing values to NA; and (2) importing all user-defined missing values as valid ones and adding their codes in an additional attribute to each variable. If include.missing is set to FALSE (default) and the data were exported using option (2), the function will remove all values from the variable matching the values in its missings attribute. If include.missing = TRUE, it will include them as valid values and compute statistics for them.
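The handling of user-defined missing values under export option (2) can be sketched in base R as follows. The attribute name "missings" is taken from the description above, while the helper drop.user.missing() and the codes 9 and 99 are hypothetical. The actual storage inside lsa.data objects may differ.

```r
# A variable where 9 and 99 are user-defined missing codes kept as valid values
x <- c(1, 2, 9, 1, 99, 2)
attr(x, "missings") <- c(9, 99)

# Hypothetical helper: set all values listed in the "missings" attribute to NA
drop.user.missing <- function(v) {
  v[v %in% attr(v, "missings")] <- NA
  v
}

# Roughly what include.missing = FALSE implies for a splitting variable
table(drop.user.missing(x), useNA = "no")

# Roughly what include.missing = TRUE implies: the codes remain categories
table(x)
```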

The shortcut argument is valid only for TIMSS, eTIMSS, TIMSS Advanced, TIMSS Numeracy, eTIMSS PSI, PIRLS, ePIRLS, PIRLS Literacy and RLII. Previously, these studies computed the standard errors using 75 replicates: in each of the 75 JK zones, one of the schools had its weights doubled and the other one was taken out. Since TIMSS 2015 and PIRLS 2016 the studies use 150 replicates: in each JK zone a school once has its weights doubled and once is taken out, i.e. the computations are done twice for each zone. For more details see LaRoche, Joncas, & Foy (2016) and LaRoche, Joncas, & Foy (2017). If replication of the tables and figures is needed, the shortcut argument has to be changed to TRUE.
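The difference between the two replication schemes can be sketched with a toy example in base R. This is only an illustration of the doubling and dropping of weights described above, with three zones instead of 75. The function make.jk.weights() and the arguments zone and rep.ind (a 0/1 indicator of the school within a zone) are hypothetical names, and the code is not taken from RALSA.

```r
# Hypothetical helper: build jackknife replicate weights.
# shortcut = TRUE : one replicate per zone (indicator 1 doubled, 0 dropped)
# shortcut = FALSE: two replicates per zone (the roles are also swapped)
make.jk.weights <- function(w, zone, rep.ind, shortcut = FALSE) {
  zones <- sort(unique(zone))
  doubled <- sapply(zones, function(h) ifelse(zone == h, 2 * w * rep.ind, w))
  if (shortcut) {
    return(doubled)
  }
  swapped <- sapply(zones, function(h) ifelse(zone == h, 2 * w * (1 - rep.ind), w))
  cbind(doubled, swapped)
}

# Toy data: six cases in three zones, two "schools" per zone
w       <- c(10, 10, 20, 20, 15, 15)
zone    <- c(1, 1, 2, 2, 3, 3)
rep.ind <- c(1, 0, 1, 0, 1, 0)

jk.short <- make.jk.weights(w, zone, rep.ind, shortcut = TRUE)
jk.full  <- make.jk.weights(w, zone, rep.ind)

print(dim(jk.short))  # one column per zone
print(dim(jk.full))   # two columns per zone
```

With 75 zones the same logic gives 75 replicate weights for the shortcut and 150 for the full design.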

lsa.lin.reg also provides the model Wald F-statistic, as this is the appropriate statistic with complex sampling designs. See Bate (2004) and Rao & Scott (1984). The Wald F-statistic is computed using the chi-square distribution and tests only the null hypothesis.

The function provides two-tailed t-tests and p-values for the regression coefficients.

Value

If save.output = FALSE, a list containing the estimates and analysis information. If save.output = TRUE (default), an MS Excel (.xlsx) file (which can be opened in any spreadsheet program), as specified with the full path in the output.file. If the argument is missing, an Excel file with the generic file name "Analysis.xlsx" will be saved in the working directory (getwd()). The workbook contains four spreadsheets. The first one ("Estimates") contains a table with the results by country and the final part of the table contains averaged results from all countries' statistics. The following columns can be found in the table, depending on the specification of the analysis:

When interaction terms are included, the cells with the interactions in the Variables column will contain the names of the two variables in each of the interaction terms, separated by a colon, e.g. ASBGSSB:ASBGHRL.

The second sheet contains the model statistics:

The third sheet contains some additional information related to the analysis per country in columns:

The fourth sheet contains the call to the function with values for all parameters as it was executed. This is useful if the analysis needs to be replicated later.

References

Bate, S. M. (2004). Generalized Linear Models for Large Dependent Data Sets [Doctoral Thesis]. University of London.

LaRoche, S., Joncas, M., & Foy, P. (2016). Sample Design in TIMSS 2015. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and Procedures in TIMSS 2015. TIMSS & PIRLS International Study Center.

LaRoche, S., Joncas, M., & Foy, P. (2017). Sample Design in PIRLS 2016. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and Procedures in PIRLS 2016 (pp. 3.1-3.34). Lynch School of Education, Boston College.

Rao, J. N. K., & Scott, A. J. (1984). On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data. The Annals of Statistics, 12(1). https://doi.org/10.1214/aos/1176346391

UCLA: Statistical Consulting Group. (2020). R LIBRARY CONTRAST CODING SYSTEMS FOR CATEGORICAL VARIABLES. IDRE Stats - Statistical Consulting Web Resources. https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/

See Also

lsa.convert.data

Examples

# Compute linear regression coefficients with the complex student background scale "Student
# Sense of School Belonging/SCL" as dependent variable, and "Home Educational Resources/SCL"
# and "Students Value Science/SCL" as independent variables, by sex of students in TIMSS 2015
# grade 8 using data file, omit missing from the splitting variable (female and male as answered
# by the students), without shortcut, and open the output after the computations are done
## Not run: 
lsa.lin.reg(data.file = "C:/temp/test.RData", split.vars = "BSBG01", bckg.dep.var = "BSBGSSB",
bckg.indep.cont.vars = c("BSBGHER", "BSBGSVS"))

## End(Not run)

# Compute linear regression coefficients with the set of PVs on overall mathematics achievement
# as dependent variable, and "Home Educational Resources/SCL" and "Students Value Science/SCL"
# independent variables, by sex of students in TIMSS 2015 grade 8 using data file, omit missing
# from the splitting variable (female and male as answered by the students), with shortcut, and
# without opening the output after the computations are done
## Not run: 
lsa.lin.reg(data.file = "C:/temp/test.RData", split.vars = "BSBG01", PV.root.dep = "BSMMAT",
bckg.indep.cont.vars = c("BSBGHER", "BSBGSVS"), shortcut = TRUE, open.output = FALSE)

## End(Not run)

# Same as above, standardizing the coefficients
## Not run: 
lsa.lin.reg(data.file = "C:/temp/test.RData", split.vars = "BSBG01", PV.root.dep = "BSMMAT",
bckg.indep.cont.vars = c("BSBGHER", "BSBGSVS"), standardize = TRUE, shortcut = TRUE,
open.output = FALSE)

## End(Not run)

# Compute linear regression with contrast coded categorical variables, using student sex as
# splitting variable, the set of five PVs on overall mathematics achievement as dependent
# variable, and the frequency of speaking the language of test at home and the number of
# books at home as contrast (dummy and simple) coded variables where the second and the third
# categories, respectively, are the reference, without shortcut, saving the output in the
# working directory and opening it after the computations are done
## Not run: 
lsa.lin.reg(data.object = merged.TIMSS.2015, split.vars = "BSBG01", PV.root.dep = "BSMMAT",
bckg.indep.cat.vars = c("BSBG03", "BSBG04"), bckg.cat.contrasts = c("dummy", "simple"),
bckg.ref.cats = c(2, 3))

## End(Not run)

# Compute linear regression with interaction terms using PIRLS 2016 student data.
## Not run: 
lsa.lin.reg(data.file = "C:/temp/test.RData", bckg.dep.var = "ASBGSB",
bckg.indep.cont.vars = c("ASBGSSB", "ASBGHRL"), bckg.indep.cat.vars = "ASBG01",
interactions = list(c("ASBG01", "ASBGSSB"), c("ASBGHRL", "ASBGSSB")))

## End(Not run)


[Package RALSA version 1.4.7 Index]