R: Produce data diagnostic tables

lsa.data.diag {RALSA}

R Documentation

Produce data diagnostic tables

Description

lsa.data.diag is a utility function which produces diagnostic tables for variables in an lsa.data object held in memory or saved in an .RData file. The function can also be used with a regular data.frame or data.table, i.e. it is applicable not only to large-scale assessment data.

Usage

lsa.data.diag(
  data.file,
  data.object,
  split.vars,
  variables,
  weight.var,
  cont.freq = FALSE,
  include.missing = FALSE,
  output.file,
  open.output = TRUE,
  ...
)

Arguments

`data.file`	The file containing the `lsa.data` object. Either this or `data.object` shall be specified, but not both. See details.
`data.object`	The object in the memory containing the `lsa.data` object. Either this or `data.file` shall be specified, but not both. See details.
`split.vars`	Variable(s) to split the results by. If no split variables are provided, the results will be computed on country level (if weights are used) or samples (if no weights are used). See details.
`variables`	Names of the variables to compute statistics for. If the variables are factors or character, frequencies will be computed, and if they are numeric, descriptives will be computed, unless `cont.freq = TRUE`. See details.
`weight.var`	The name of the variable containing the weights, if weighted statistics are needed. If no name of a weight variable is provided, the function will automatically select the default weight variable for the provided `lsa.data`, depending on the respondent type. `"none"` is for unweighted statistics. See details.
`cont.freq`	Logical, shall the numeric variables be treated as categorical, so that frequencies are computed for their values instead of descriptives? The default is `FALSE`. See details.
`include.missing`	Shall the `NA` and user-defined missing values (if available) be included as splitting categories for the variables in `split.vars`? The default is `FALSE`. See details.
`output.file`	Full path to the output file including the file name. If omitted, a file with a default file name "Analysis.xlsx" will be written to the working directory (`getwd()`).
`open.output`	Logical, shall the output be opened after it has been written? The default (`TRUE`) opens the output in the default spreadsheet program installed on the computer.
`...`	Further arguments.

Details

The function produces data diagnostic tables for variables in an lsa.data set by the categories of splitting variables. The function is also applicable to data sets which are not of class lsa.data: a regular data.frame or a data.table is accepted as well. If the data is of class lsa.data and no split.vars variables are provided, the results will be automatically split and computed by country. The country ID variable will be added automatically, so there is no need to specify it explicitly in split.vars. If the data is not of class lsa.data and no split.vars variables are provided, the results will be computed without any split.

Either data.file or data.object shall be provided as source of data. If both of them are provided, the function will stop with an error message.
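The either-or rule for the two data sources can be pictured with a small base R sketch. The helper `check_data_source` is hypothetical and is not part of RALSA; it only illustrates the kind of check described above, assuming the arguments are tested with `missing()` and that supplying neither source is also treated as an error.

```r
# Hypothetical helper (not a RALSA function): exactly one data source is allowed.
check_data_source <- function(data.file, data.object) {
  has.file <- !missing(data.file)
  has.object <- !missing(data.object)
  if (has.file && has.object) {
    stop("Specify either 'data.file' or 'data.object', not both.")
  }
  if (!has.file && !has.object) {
    # Assumption: at least one source is required.
    stop("Specify one of 'data.file' or 'data.object'.")
  }
  # Report which source would be used.
  if (has.file) "file" else "object"
}

source.from.object <- check_data_source(data.object = mtcars)
source.from.file <- check_data_source(data.file = "Merged.RData")
```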

If variables are provided for the split.vars argument and include.missing = TRUE, the function will automatically add the NA and user-defined missing values from the missings attribute (if available) of the split.vars variables to the categories to split by and will compute statistics for the provided variables for these as well. See the documentation on lsa.convert.data for more details on the conversion of data with and without user-defined missing values.
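The effect of adding missing values to the split categories can be illustrated in base R. This is a sketch on made-up data, not the RALSA implementation: it covers only plain `NA` values (not user-defined missing values stored in a missings attribute), and the variable names `group` and `score` are invented for the example.

```r
# Toy data: one split variable with missing values (names are made up).
toy <- data.frame(
  group = factor(c("Low", "High", NA, "High", "Low", NA)),
  score = c(10, 20, 30, 40, 50, 60)
)

# Missing categories excluded: rows with NA in the split variable are dropped.
split.valid <- tapply(toy$score, toy$group, mean)

# Missing categories included: addNA() turns NA into a level of its own,
# so statistics are computed for that group as well.
split.all <- tapply(toy$score, addNA(toy$group), mean)

print(split.valid)
print(split.all)
```

With the missing category included, the result has one extra group holding the cases whose split value is `NA`.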

If no variable names are provided to the variables argument, all variables available in the data set will be added automatically, except the weighting and splitting variables, and statistics for all of them will be computed.
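The default selection described above amounts to taking all column names and removing the weighting and splitting variables. A minimal base R sketch, using a made-up data set and made-up column names, and not claiming to mirror the package internals:

```r
# Toy data set; all column names here are illustrative only.
toy <- data.frame(
  COUNTRY = c(1, 1, 2),
  WEIGHT = c(1.5, 2.0, 1.0),
  VAR1 = c(3, 4, 5),
  VAR2 = c("a", "b", "a")
)

split.vars <- "COUNTRY"
weight.var <- "WEIGHT"

# All variables except the splitting and weighting ones.
vars.to.process <- setdiff(names(toy), c(split.vars, weight.var))
print(vars.to.process)
```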

If the variables provided to the variables argument are factor or character, the function will compute frequencies, percentages, valid percentages, and cumulative percentages. If the variables are numeric, the computed statistics will include the total number of cases, range, minimum, maximum, mean, variance, and standard deviation. If cont.freq = TRUE, then the numeric variables will be treated as factors.
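To make the listed statistics concrete, here is an unweighted base R sketch on toy data. It is not the RALSA code: it assumes that percentages are taken over all cases, that valid and cumulative percentages are taken over non-missing cases only, and that the number of cases for numeric variables counts non-missing values. The source does not spell out these formulas.

```r
# Categorical variable with one missing value.
x <- factor(c("Yes", "No", "Yes", NA, "Yes"), levels = c("No", "Yes"))

n <- table(x, useNA = "ifany")        # frequencies, NA as its own row
valid <- !is.na(names(n))             # TRUE for non-missing categories
counts <- as.vector(n)

pct <- 100 * counts / sum(counts)                        # share of all cases
valid.pct <- ifelse(valid, 100 * counts / sum(counts[valid]), NA)
cum.pct <- ifelse(valid, cumsum(ifelse(valid, valid.pct, 0)), NA)

freq.table <- data.frame(
  category = names(n),
  n = counts,
  percent = pct,
  valid.percent = valid.pct,
  cumulative.percent = cum.pct
)
print(freq.table)

# Numeric variable: descriptives computed on the non-missing values.
y <- c(2, 4, 4, 6, NA)
desc <- c(
  n = sum(!is.na(y)),
  range = diff(range(y, na.rm = TRUE)),
  min = min(y, na.rm = TRUE),
  max = max(y, na.rm = TRUE),
  mean = mean(y, na.rm = TRUE),
  variance = var(y, na.rm = TRUE),
  sd = sd(y, na.rm = TRUE)
)
print(desc)
```

Treating a numeric variable as categorical, as with cont.freq = TRUE, corresponds in this sketch to building the frequency table on `factor(y)` instead of computing the descriptives.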

If the data set is of class lsa.data and no weight variable is provided, the computed statistics will be automatically weighted by the default weight for the respondents' data in the object. If the data is not of class lsa.data and no weight.var is provided, the computed statistics will be unweighted. If the name of a weight variable is provided, the statistics will be weighted by it, regardless of the class of the data. If weight.var = "none", the computed statistics will be unweighted.
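The difference between weighted and unweighted statistics can be shown with a short base R sketch on made-up values. It illustrates the general idea only (sums of weights in place of case counts); the exact formulas used by the package are not stated in the source.

```r
# Toy values, categories and weights (all made up).
x <- c(1, 2, 3, 4)
g <- factor(c("A", "B", "A", "B"))
w <- c(1, 1, 2, 4)

# Unweighted: every case counts once.
mean.unweighted <- mean(x)
pct.unweighted <- 100 * table(g) / length(g)

# Weighted: every case counts in proportion to its weight.
mean.weighted <- weighted.mean(x, w)
w.freq <- tapply(w, g, sum)            # sum of weights per category
pct.weighted <- 100 * w.freq / sum(w.freq)

print(c(unweighted = mean.unweighted, weighted = mean.weighted))
print(pct.weighted)
```

Unweighted statistics are the special case in which all weights are equal to 1.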

Value

An MS Excel (.xlsx) file (which can be opened in any spreadsheet program), written to the full path given in output.file. If the argument is missing, an Excel file with the generic file name "Analysis.xlsx" will be saved in the working directory (getwd()).

The first sheet in the workbook is an Index sheet. All other sheets contain the computed statistics, one sheet per variable. The Index sheet contains columns with the names of the variables for which statistics are computed and their labels, if available. The names are clickable links that switch to the sheet with the statistics for the corresponding variable. If the data is of class lsa.data, the Index sheet also contains the study name, cycle, respondent type and the weight used. If the data is not of class lsa.data, the Index sheet only states which weight was used.

Each sheet with statistics for a variable contains a clickable link back to the Index sheet, the variable name and label (if any), and the table with statistics for that variable.

Note

This function is intended only as a utility function for diagnostic purposes, to inspect the variables prior to performing an actual analysis. It is not intended for actual analysis of large-scale assessment data. Reporting statistics from it can and will lead to biased and erroneous conclusions.

Examples

# Merge PIRLS 2016 school principal data for all countries
## Not run: 
lsa.merge.data(inp.folder = "C:/Data", file.types = list(acg = NULL),
out.file = "C:/Merged/Merged.RData")

## End(Not run)

# Produce diagnostic tables for some factor (categorical) and numeric (continuous) variables
# by country
## Not run: 
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
output.file = "C:/temp/test.xlsx", open.output = TRUE)

## End(Not run)

# Repeat the above, splitting the results by country and percentage of students at school
# coming from economically affluent homes ("ACBG03B")
## Not run: 
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
output.file = "C:/temp/test.xlsx", open.output = TRUE)

## End(Not run)

# Repeat the above, this time treating the numeric variables ("ACBGELS" and "ACBGRRS")
# as categorical and adding the missing values of "ACBG03B" to the split categories
## Not run: 
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", include.missing = TRUE, cont.freq = TRUE,
variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
output.file = "C:/temp/test.xlsx", open.output = TRUE)

## End(Not run)

# Produce diagnostic tables for all variables in the data set by country and percentage
# of students coming from economically affluent homes ("ACBG03B")
## Not run: 
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", output.file = "C:/temp/test.xlsx",
open.output = TRUE)

## End(Not run)

[Package RALSA version 1.4.7 Index]