lsa.data.diag {RALSA} | R Documentation |
Produce data diagnostic tables
Description
lsa.data.diag is a utility function which produces diagnostic tables for variables in an lsa.data object available in the memory or saved in an .RData file. The function can also be used with a regular data.frame or data.table, i.e. it is applicable not only to large-scale assessment data.
Usage
lsa.data.diag(
data.file,
data.object,
split.vars,
variables,
weight.var,
cont.freq = FALSE,
include.missing = FALSE,
output.file,
open.output = TRUE,
...
)
Arguments
data.file |
The file containing |
data.object |
The object in the memory containing |
split.vars |
Variable(s) to split the results by. If no split variables are provided, the results will be computed on country level (if weights are used) or samples (if no weights are used). See details. |
variables |
Names of the variables to compute statistics for. If the variables are
factors or character, frequencies will be computed, and if they are
numeric, descriptives will be computed, unless |
weight.var |
The name of the variable containing the weights, if weighted statistics
are needed. If no name of a weight variable is provided, the function
will automatically select the default weight variable for the provided
|
cont.freq |
Logical, shall the values of the numeric variables be treated as categories to compute frequencies for? See details. |
include.missing |
Shall the |
output.file |
Full path to the output file including the file name. If omitted, a file
with a default file name "Analysis.xlsx" will be written to the working
directory ( |
open.output |
Logical, shall the output be opened after it has been written? The default
( |
... |
Further arguments. |
Details
The function produces data diagnostic tables for variables in an lsa.data set by the categories of splitting variables. The function is also applicable to data sets which are not of class lsa.data; a regular data.frame or a data.table is accepted as well. If the data is of class lsa.data and no split.vars variables are provided, the results will be automatically split and computed by country. The country ID variable will be added automatically, so there is no need to specify it explicitly in split.vars. If the data is not of class lsa.data and no split.vars variables are provided, the results will be computed without any split.
Either data.file or data.object shall be provided as the source of data. If both of them are provided, the function will stop with an error message.
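The rule about the two data sources can be pictured with a small base R sketch. The helper name resolve_data_source is hypothetical and is not part of RALSA; it only mimics the documented behavior. Stopping when neither argument is supplied is an assumption added for completeness, since the text above only states what happens when both are given.

```r
# Hypothetical helper, NOT part of RALSA: mimics the documented rule that
# data.file and data.object must not be supplied together.
resolve_data_source <- function(data.file, data.object) {
  has.file <- !missing(data.file)
  has.object <- !missing(data.object)
  if (has.file && has.object) {
    stop("Provide either data.file or data.object, not both.")
  }
  # Assumption: supplying neither source is treated as an error as well.
  if (!has.file && !has.object) {
    stop("Provide one of data.file or data.object.")
  }
  if (has.file) "file" else "object"
}

resolve_data_source(data.object = mtcars)                   # "object"
resolve_data_source(data.file = "C:/Merged/Merged.RData")   # "file"
```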
If variables are provided for the split.vars argument and include.missing = TRUE, the function will automatically add the NA and user-defined missing values from the missings attribute (if available) of the split.vars variables to the categories to split by and will compute statistics for the provided variables for these as well. See the documentation on lsa.convert.data for more details on the conversion of data with and without user-defined missing values.
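The effect of treating missing values of a split variable as a category of their own can be illustrated in base R. This is only an analogy using made-up data and made-up variable names (school.type, score); it covers system missing values (NA) only, not the user-defined missing values stored in the missings attribute, and it is not how RALSA computes its tables.

```r
# Toy split variable with system missing values (NA); all values are made up.
school.type <- factor(c("Public", "Private", NA, "Public", NA, "Private", "Public"))
score <- c(510, 480, 495, 530, 470, 500, 520)

# Analogous to include.missing = FALSE:
# cases with NA in the split variable are dropped from the split.
means.without.na <- tapply(score, school.type, mean)
means.without.na

# Analogous to include.missing = TRUE:
# addNA() turns NA into an extra level, so it forms its own group.
means.with.na <- tapply(score, addNA(school.type), mean)
means.with.na
```

The second result has one more group than the first, holding the mean of the cases whose split value is missing.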
If no variable names are provided to variables, all variables available in the data set will be added automatically, except the weighting and splitting variables, and statistics for all of them will be computed.
If the variables provided to the variables argument are factor or character, the function will compute frequencies, percentages, valid percentages, and cumulative percentages. If the variables are numeric, the computed statistics will include the total number of cases, range, minimum, maximum, mean, variance, and standard deviation. If cont.freq = TRUE, then the numeric variables will be treated as factors.
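As a rough illustration of the two kinds of tables described above, the following base R sketch computes unweighted frequencies and descriptives for made-up data. The exact definitions used by RALSA are not stated here, so the choices below (cumulative percentages accumulated over all cases, the number of cases counted over non-missing values) are assumptions.

```r
# Categorical variable (made-up data): frequency table.
answers <- factor(c("Yes", "No", "Yes", NA, "Yes", "No", NA, "Yes"))
counts <- table(answers, useNA = "ifany")
n <- as.vector(counts)
is.valid <- !is.na(names(counts))

freq <- data.frame(
  category = ifelse(is.valid, names(counts), "<NA>"),
  n = n,
  percent = 100 * n / sum(n),
  # Valid percentages use only the non-missing cases as the base.
  valid.percent = ifelse(is.valid, 100 * n / sum(n[is.valid]), NA)
)
# Assumption: cumulative percentages accumulate the overall percentages.
freq$cumulative.percent <- cumsum(freq$percent)
freq

# Numeric variable (made-up data): descriptives over non-missing values.
x <- c(2.5, 3.0, 4.5, NA, 5.0)
valid <- x[!is.na(x)]
desc <- c(
  n = length(valid),
  range = diff(range(valid)),
  min = min(valid),
  max = max(valid),
  mean = mean(valid),
  variance = var(valid),
  sd = sd(valid)
)
desc

# Analogous to cont.freq = TRUE: treat the numeric values as categories.
table(factor(x), useNA = "ifany")
```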
If the data set is of class lsa.data and no weight variable is provided, the computed statistics will be automatically weighted by the default weight for the respondents' data in the object. If the name of a weight variable is provided, the statistics will be weighted by it. If weight.var = "none", the computed statistics will be unweighted. If the data is not of class lsa.data and no weight.var is provided, the computed statistics will be unweighted. If a weight variable is provided, the computed statistics will be weighted by it.
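To show what the difference between weighted and unweighted statistics amounts to, here is a minimal base R sketch on made-up data. The column names (group, score, wgt) are hypothetical, and the estimators RALSA uses internally are not described in this page, so this is an illustration of the general idea only.

```r
# Made-up data with a hypothetical weight column.
dat <- data.frame(
  group = c("A", "B", "A", "B", "A"),
  score = c(10, 20, 30, 40, 50),
  wgt = c(1, 2, 1, 2, 4)
)

# Unweighted frequencies count rows; weighted frequencies sum the weights.
unweighted.freq <- table(dat$group)
weighted.freq <- xtabs(wgt ~ group, data = dat)

# Unweighted and weighted means of a numeric variable.
unweighted.mean <- mean(dat$score)
weighted.mean.score <- weighted.mean(dat$score, dat$wgt)

unweighted.freq
weighted.freq
c(unweighted = unweighted.mean, weighted = weighted.mean.score)
```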
Value
An MS Excel (.xlsx) file (which can be opened in any spreadsheet program), as specified with the full path in output.file. If the argument is missing, an Excel file with the generic file name "Analysis.xlsx" will be saved in the working directory (getwd()). The first sheet in the workbook is an Index sheet. All other sheets contain the computed statistics for the variables, one sheet per variable. The Index sheet contains columns with the names of the variables for which statistics are computed and their labels, if available. The names are clickable links; clicking one switches to the sheet with statistics for the corresponding variable. If the data is of class lsa.data, the Index sheet also contains the study name, cycle, respondent type and the weight used. If the data is not of class lsa.data, the Index sheet contains only information on which weight was used. Each sheet with statistics for a variable contains a clickable link to go back to the Index sheet, the variable name and label (if any), and the table with statistics for that variable.
Note
This function is intended only as a utility function for diagnostic purposes, to inspect the variables prior to performing an actual analysis. It is not intended for actual analysis of large-scale assessment data. Reporting statistics from it can and will lead to biased and erroneous conclusions.
See Also
Examples
# Merge PIRLS 2016 school principal data for all countries
## Not run:
lsa.merge.data(inp.folder = "C:/Data", file.types = list(acg = NULL),
out.file = "C:/Merged/Merged.RData")
## End(Not run)
# Produce diagnostic tables for some factor (categorical) and numeric (continuous) variables
# by country
## Not run:
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
output.file = "C:/temp/test.xlsx", open.output = TRUE)
## End(Not run)
# Repeat the above, splitting the results by country and percentage of students at school
# coming from economically affluent homes ("ACBG03B")
## Not run:
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
output.file = "C:/temp/test.xlsx", open.output = TRUE)
## End(Not run)
# Repeat the above, this time treating the numeric variables ("ACBGELS" and "ACBGRRS")
# as categorical and including the missing values of the split variable
## Not run:
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", include.missing = TRUE,
variables = c("ACBG05A", "ACBG04", "ACBGELS", "ACBGRRS"),
cont.freq = TRUE, output.file = "C:/temp/test.xlsx", open.output = TRUE)
## End(Not run)
# Produce diagnostic tables for all variables in the data set by country and percentage
# of students coming from economically affluent homes ("ACBG03B")
## Not run:
lsa.data.diag(data.file = "C:/Merged/Merged.RData",
split.vars = "ACBG03B", output.file = "C:/temp/test.xlsx",
open.output = TRUE)
## End(Not run)