lsa.crosstabs {RALSA}    R Documentation

Compute crosstabulations and design-corrected chi-square statistics

Description

lsa.crosstabs computes two-way tables and estimates the Rao-Scott first- and second-order adjusted chi-square.

Usage

lsa.crosstabs(
  data.file,
  data.object,
  split.vars,
  bckg.row.var,
  bckg.col.var,
  expected.cnts = TRUE,
  row.pcts = FALSE,
  column.pcts = FALSE,
  total.pcts = FALSE,
  weight.var,
  include.missing = FALSE,
  shortcut = FALSE,
  graphs = FALSE,
  graph.row.label = NULL,
  graph.col.label = NULL,
  save.output = TRUE,
  output.file,
  open.output = TRUE
)

Arguments

data.file

The file containing an lsa.data object. Either this or data.object shall be specified, but not both. See details.

data.object

An object in memory containing lsa.data. Either this or data.file shall be specified, but not both. See details.

split.vars

Categorical variable(s) to split the results by. If no split variables are provided, the results will be for the overall countries' populations. If one or more variables are provided, the results will be split by all but the last variable and the percentages of respondents will be computed by the unique values of the last splitting variable.

bckg.row.var

Name of the categorical background row variable. The results will be computed by all groups specified by the splitting variables. See details.

bckg.col.var

Name of the categorical background column variable. The results will be computed by all groups specified by the splitting variables. See details.

expected.cnts

Logical, shall the expected counts be computed as well? The default (TRUE) will compute the expected counts. If FALSE, only the observed counts will be included in the output.

row.pcts

Logical, shall row percentages be computed? The default (FALSE) will skip the computation of the row percentages.

column.pcts

Logical, shall column percentages be computed? The default (FALSE) will skip the computation of the column percentages.

total.pcts

Logical, shall percentages of total be computed? The default (FALSE) will skip the computation of the total percentages.

weight.var

The name of the variable containing the weights. If no name of a weight variable is provided, the function will automatically select the default weight variable for the provided data, depending on the respondent type.

include.missing

Logical, shall the missing values of the splitting variables be included as categories to split by and all statistics produced for them? The default (FALSE) retains only the cases without missing values on the splitting variables before computing any statistics. See details.

shortcut

Logical, shall the "shortcut" method for IEA TIMSS, TIMSS Advanced, TIMSS Numeracy, eTIMSS, PIRLS, ePIRLS, PIRLS Literacy and RLII be applied? The default (FALSE) applies the "full" design when computing the variance components and the standard errors of the estimates.

graphs

Logical, shall graphs be produced? Default is FALSE. See details.

graph.row.label

String, custom label for the row variable in graphs. Ignored if graphs = FALSE. See details.

graph.col.label

String, custom label for the column variable in graphs. Ignored if graphs = FALSE. See details.

save.output

Logical, shall the output be saved in an MS Excel file (default) or not (printed to the console or assigned to an object)?

output.file

If save.output = TRUE (default), full path to the output file including the file name. If omitted, a file with a default file name "Analysis.xlsx" will be written to the working directory (getwd()). Ignored if save.output = FALSE.

open.output

Logical, shall the output be opened after it has been written? The default (TRUE) opens the output in the default spreadsheet program installed on the computer. Ignored if save.output = FALSE.

Details

The function computes two-way tables between two categorical variables and estimates the Rao-Scott first- and second-order design correction of the chi-square statistics. All statistics are computed within the groups specified by the last splitting variable. If no splitting variables are added, the results will be computed only by country.

Either data.file or data.object shall be provided as source of data. If both of them are provided, the function will stop with an error message.

Only two (row and column) categorical variables can be provided. The function always computes the observed counts. If requested, the expected counts, row percentages, column percentages and total percentages can be computed as well.
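As a rough illustration of the quantities listed above, the following base R sketch builds a weighted two-way table from made-up data and derives the expected counts and the three kinds of percentages. The data frame, its column names (sex, proud, wgt) and all values are hypothetical. This is not the package's internal code, which also estimates standard errors.

```r
# Hypothetical toy data: two categorical variables and a sampling weight.
toy <- data.frame(
  sex   = c("Girl", "Girl", "Boy", "Boy", "Girl", "Boy"),
  proud = c("Agree", "Disagree", "Agree", "Agree", "Agree", "Disagree"),
  wgt   = c(10, 20, 30, 10, 20, 10)
)

# Observed (weighted) counts: sum of the weights in each cell.
observed <- xtabs(wgt ~ sex + proud, data = toy)

# Expected counts under independence:
# row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Row, column and total percentages.
row.pcts    <- 100 * prop.table(observed, margin = 1)
column.pcts <- 100 * prop.table(observed, margin = 2)
total.pcts  <- 100 * prop.table(observed)
```

In this toy table, boys who agree have an observed weighted count of 40 against an expected count of 35, a row percentage of 80 and a total percentage of 40.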

If include.missing = FALSE (default), all cases with missing values on the splitting variables will be removed and only cases with valid values will be retained in the statistics. Note that the data from the studies can be exported in two different ways: (1) setting all user-defined missing values to NA; and (2) importing all user-defined missing values as valid ones and adding their codes in an additional attribute to each variable. If include.missing is set to FALSE (default) and the data used were exported using option (2), the function will remove all values of the variable that match the values in its missings attribute. Otherwise, it will include them as valid values and compute statistics for them.
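The effect of export option (2) can be pictured with a small base R sketch. It assumes that the codes of the user-defined missing values are stored in an attribute literally named "missings"; the vector and its values are made up, and the package's own handling may differ in detail.

```r
# Hypothetical splitting variable exported with option (2): the user-defined
# missing code 9 is kept as a value and listed in the "missings" attribute
# (attribute name assumed here).
split.var <- c(1, 2, 9, 1, NA, 2)
attr(split.var, "missings") <- 9

# Mimic include.missing = FALSE: drop NA and every value listed in the
# attribute, keeping only the valid values.
is.valid <- !is.na(split.var) & !(split.var %in% attr(split.var, "missings"))
valid.only <- split.var[is.valid]

# Categories left to split by once the missing codes are dropped.
table(valid.only)

# If the codes are not dropped, 9 remains a category of its own.
table(split.var)
```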

The shortcut argument is valid only for TIMSS, eTIMSS, TIMSS Advanced, TIMSS Numeracy, PIRLS, ePIRLS, PIRLS Literacy and RLII. Previously, these studies computed the standard errors using 75 replicates: in each of the 75 JK zones, one of the schools had its weights doubled and the other one was taken out. Since TIMSS 2015 and PIRLS 2016 the studies use 150 replicates: in each JK zone a school once has its weights doubled and is once taken out, i.e. the computations are done twice for each zone. For more details see Foy & LaRoche (2016) and Foy & LaRoche (2017).
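The difference between the two approaches can be sketched in base R on a toy sample. The sketch assumes the usual jackknife repeated replication formulas: the sum of squared deviations of the replicate estimates from the full-sample estimate for the "shortcut" method, and half of that sum over twice as many replicates for the "full" method. The data, the column names (jk.zone, jk.rep, wgt, y) and the helper functions are hypothetical, and this is not the package's internal code.

```r
# Hypothetical toy sample: 3 JK zones (the real data have 75), two schools
# per zone flagged by jk.rep (0/1), one student per school for brevity.
toy <- data.frame(
  jk.zone = c(1, 1, 2, 2, 3, 3),
  jk.rep  = c(0, 1, 0, 1, 0, 1),
  wgt     = c(1, 1, 1, 1, 1, 1),
  y       = c(0, 1, 1, 1, 0, 0)   # 1 = respondent falls in the category
)

# Weighted percentage of respondents with y == 1.
pct <- function(w) 100 * sum(w * toy$y) / sum(w)

# Replicate weights for one zone: double the school whose jk.rep equals
# `doubled`, zero out the other school, leave all other zones untouched.
rep.wgt <- function(zone, doubled) {
  in.zone <- toy$jk.zone == zone
  ifelse(in.zone, toy$wgt * 2 * (toy$jk.rep == doubled), toy$wgt)
}

full.est <- pct(toy$wgt)
zones <- unique(toy$jk.zone)

# "Shortcut": one replicate per zone.
short.reps <- sapply(zones, function(z) pct(rep.wgt(z, 1)))
se.shortcut <- sqrt(sum((short.reps - full.est)^2))

# "Full": two replicates per zone, squared deviations halved.
full.reps <- c(short.reps, sapply(zones, function(z) pct(rep.wgt(z, 0))))
se.full <- sqrt(sum((full.reps - full.est)^2) / 2)
```

In this symmetric toy case both standard errors coincide; with real data the two methods generally give slightly different results.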

If graphs = TRUE, the function will produce heatmaps of the counts (population estimates) for each combination of bckg.row.var and bckg.col.var categories, per group defined by the split.vars. All plots are produced per country. If no split.vars are specified, at the end there will be a heatmap for all countries together. By default the row and column variable names are used for labeling the axes of the heatmaps, unless the graph.row.label and/or graph.col.label arguments are supplied. These two arguments accept strings of length 1 which will be used to label the axes.

The function also computes chi-square statistics with Rao-Scott first- and second-order design corrections because of the clustering in complex survey designs. For more details, see Rao & Scott (1984, 1987) and Skinner (2019).
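For orientation, the textbook form of the two corrections can be sketched in base R. The sketch assumes that the Pearson chi-square and the eigenvalues of the generalized design effect matrix have already been estimated; the function name rao.scott.adjust and its arguments are hypothetical, and the package's internal computations may differ.

```r
# x2:     Pearson chi-square computed from the estimated cell proportions.
# deltas: eigenvalues of the generalized design effect matrix, one per
#         degree of freedom, i.e. (rows - 1) * (columns - 1) values.
rao.scott.adjust <- function(x2, deltas) {
  nu <- length(deltas)                 # nominal degrees of freedom
  delta.bar <- mean(deltas)            # mean generalized design effect
  # Squared coefficient of variation of the eigenvalues.
  a2 <- sum((deltas - delta.bar)^2) / (nu * delta.bar^2)

  first.order  <- x2 / delta.bar       # first-order correction
  second.order <- first.order / (1 + a2)  # second-order correction
  df.second    <- nu / (1 + a2)        # adjusted degrees of freedom

  list(
    first.order  = first.order,
    p.first      = pchisq(first.order, df = nu, lower.tail = FALSE),
    second.order = second.order,
    df.second    = df.second,
    p.second     = pchisq(second.order, df = df.second, lower.tail = FALSE)
  )
}

# Made-up numbers: a chi-square of 12 with two equal design effects of 2.
res <- rao.scott.adjust(x2 = 12, deltas = c(2, 2))
```

When all eigenvalues are equal, as in this usage example, the second-order correction reduces to the first-order one, because the design effects do not vary.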

Value

If save.output = FALSE, a list containing the estimates and analysis information. If graphs = TRUE, the plots will be added to the list of estimates.

If save.output = TRUE (default), an MS Excel (.xlsx) file (which can be opened in any spreadsheet program), as specified with the full path in the output.file. If the argument is missing, an Excel file with the generic file name "Analysis.xlsx" will be saved in the working directory (getwd()). The workbook contains four spreadsheets. The first one ("Estimates") contains a table with the results by country and the final part of the table contains averaged results from all countries' statistics. The following columns can be found in the table, depending on the specification of the analysis:

The second sheet contains some additional information related to the analysis per country in the following columns:

The third sheet contains some additional information related to the analysis per country in the following columns:

The fourth sheet contains the call to the function with values for all parameters as it was executed. This is useful if the analysis needs to be replicated later.

If graphs = TRUE there will be an additional "Graphs" sheet containing all plots.

If any warnings resulting from the computations are issued, these will be included in an additional "Warnings" sheet in the workbook as well.

References

LaRoche, S., Joncas, M., & Foy, P. (2016). Sample Design in TIMSS 2015. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and Procedures in TIMSS 2015 (pp. 3.1-3.37). Chestnut Hill, MA: TIMSS & PIRLS International Study Center.

LaRoche, S., Joncas, M., & Foy, P. (2017). Sample Design in PIRLS 2016. In M. O. Martin, I. V. S. Mullis, & M. Hooper (Eds.), Methods and Procedures in PIRLS 2016 (pp. 3.1-3.34). Chestnut Hill, MA: Lynch School of Education, Boston College.

Rao, J. N. K., & Scott, A. J. (1984). On Chi-Squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data. The Annals of Statistics, 12(1), 46-60. https://doi.org/10.1214/aos/1176346391

Rao, J. N. K., & Scott, A. J. (1987). On Simple Adjustments to Chi-Square Tests with Sample Survey Data. The Annals of Statistics, 15(1), 385-397.

Skinner, C. (2019). Analysis of Categorical Data for Complex Surveys. International Statistical Review, 87(S1), S64-S78. https://doi.org/10.1111/insr.12285

See Also

lsa.convert.data

Examples

# Compute a two-way table between student sex and how proud they are to go to
# school using PIRLS 2016 student data.
## Not run: 
lsa.crosstabs(data.file = "C:/Data/PIRLS_2016_G8_Student_Miss_to_NA.RData",
              bckg.row.var = "ITSEX", bckg.col.var = "ASBG12E")

## End(Not run)

# Same as the above, this time explicitly requesting the expected counts (the default) and
# also computing the row percentages, column percentages and percentages of total.
## Not run: 
lsa.crosstabs(data.file = "C:/Data/PIRLS_2016_G8_Student_Miss_to_NA.RData",
              bckg.row.var = "ITSEX", bckg.col.var = "ASBG12E", expected.cnts = TRUE,
              row.pcts = TRUE, column.pcts = TRUE, total.pcts = TRUE)

## End(Not run)


[Package RALSA version 1.4.7 Index]