
Util_nstud_wide {SchoolDataIT}

R Documentation

Clean the raw dataframe of the number of students and arrange it in a wide format

Description

This function first cleans the output of the `Get_nstud` function by removing outliers in terms of the school-level average number of students by class and, optionally, by imputing the number of classes to 1 when it is missing. It then rearranges the data into a wide format, so that each school is represented by a single observation containing the number of students, the number of classes and the average number of students by class for each school grade.
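The reshaping step can be illustrated with a minimal base R sketch. It assumes a hypothetical long-format table with one row per school and grade, using illustrative column names (`school_id`, `grade`, `n_students`, `n_classes`) that are not necessarily those produced by `Get_nstud`; it is not the package's implementation.

```r
# Hypothetical long-format input: one row per school and grade.
# Column names are illustrative assumptions, not those used by SchoolDataIT.
long <- data.frame(
  school_id  = c("A", "A", "B", "B"),
  grade      = c(1, 2, 1, 2),
  n_students = c(20, 18, 45, 40),
  n_classes  = c(1, 1, 2, 2)
)

# Average number of students by class for each school grade
long$students_by_class <- long$n_students / long$n_classes

# One row per school; each variable gets one column per grade,
# e.g. n_students.1, n_classes.1, students_by_class.1, n_students.2, ...
wide <- reshape(
  long,
  idvar     = "school_id",
  timevar   = "grade",
  v.names   = c("n_students", "n_classes", "students_by_class"),
  direction = "wide"
)
rownames(wide) <- NULL
wide
```

In this sketch, a school with no record for a given grade simply gets `NA` in that grade's columns.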

Usage

Util_nstud_wide(
  data = NULL,
  missing_to_1 = FALSE,
  nstud_imputation_thresh = 19,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  verbose = TRUE,
  autoAbort = FALSE,
  ...
)

Arguments

`data`	Object of class `list`, including two objects of class `tbl_df`, `tbl` and `data.frame`, obtained as output of the `Get_nstud` function with the default `filename` parameter. If `NULL`, the function downloads the data automatically, but does not save them in the global environment. `NULL` by default.
`missing_to_1`	Logical. Whether the number of classes should be imputed to 1 when it is missing and the number of students is at or below a threshold (argument `nstud_imputation_thresh`). `FALSE` by default.
`nstud_imputation_thresh`	Numeric. The maximum number of students in a given grade for which a missing number of classes is imputed to 1, if `missing_to_1 == TRUE`. E.g. if the threshold is 19, then for every school in which a given grade has 19 or fewer students but a missing number of classes, the number of classes for that grade is imputed to 1. `19` by default.
`UB_nstud_byclass`	Numeric. The upper limit of the acceptable school-level average number of students by class. If a school has, on average, more students by class than this limit, the record is considered an outlier and filtered out. `99` by default, i.e. in practice no restriction is imposed. Note that the bounds are included in the acceptance interval.
`LB_nstud_byclass`	Numeric. The lower limit of the acceptable school-level average number of students by class. If a school has, on average, fewer students by class than this limit, the record is considered an outlier and filtered out. `1` by default. Note that the bounds are included in the acceptance interval.
`verbose`	Logical. If `TRUE`, the function prints messages that let the user keep track of the main underlying operations. `TRUE` by default.
`autoAbort`	Logical. If any data must be retrieved, whether to automatically abort the operation and return `NULL` in case of a missing internet connection or server response errors. `FALSE` by default.
`...`	Arguments to `Get_nstud`, needed if `data` is not provided.
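The cleaning rules governed by `missing_to_1`, `nstud_imputation_thresh`, `UB_nstud_byclass` and `LB_nstud_byclass` can be sketched in base R as follows. The helper `clean_nstud()` and the column names are hypothetical. The sketch also assumes that imputation happens before filtering and that the school-level average is total students divided by total known classes; `Util_nstud_wide` may compute it differently.

```r
# Hypothetical long-format data; column names are illustrative assumptions.
long <- data.frame(
  school_id  = c("A", "A", "B", "B"),
  grade      = c(1, 2, 1, 2),
  n_students = c(12, 25, 200, 180),
  n_classes  = c(NA, 1, 2, 2)
)

# Hypothetical helper mirroring the documented arguments (not the package code)
clean_nstud <- function(data, missing_to_1 = FALSE, nstud_imputation_thresh = 19,
                        UB_nstud_byclass = 99, LB_nstud_byclass = 1) {
  if (missing_to_1) {
    # Impute one class where the count is missing and the grade has
    # at most `nstud_imputation_thresh` students
    to_impute <- is.na(data$n_classes) & data$n_students <= nstud_imputation_thresh
    data$n_classes[to_impute] <- 1
  }
  # School-level average students by class over grades with a known number
  # of classes (assumption); schools with no known classes are dropped
  known <- !is.na(data$n_classes)
  tot_students <- tapply(data$n_students[known], data$school_id[known], sum)
  tot_classes  <- tapply(data$n_classes[known],  data$school_id[known], sum)
  avg <- tot_students / tot_classes
  # Both bounds are inclusive
  keep <- names(avg)[avg >= LB_nstud_byclass & avg <= UB_nstud_byclass]
  data[data$school_id %in% keep, ]
}

res_default <- clean_nstud(long)
res_narrow  <- clean_nstud(long, missing_to_1 = TRUE,
                           UB_nstud_byclass = 35, LB_nstud_byclass = 5)
nrow(res_default)  # 4: both schools are kept
nrow(res_narrow)   # 2: school B (95 students by class) is removed
```

With the narrow settings, school A's first grade (12 students, missing classes) is imputed to one class, giving an average of 18.5 students by class, which falls inside the interval.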

Details

The examples compare the data frame obtained with the default settings with the one obtained under narrower inclusion criteria.

Value

An object of class `tbl_df`, `tbl` and `data.frame`.

Examples



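# Default settings: upper bound of 99 (no effective restriction), lower bound of 1
nstud.default <- Util_nstud_wide(example_input_nstud23)

# Narrower inclusion criteria: keep schools averaging 5 to 35 students by class
nstud.narrow <- Util_nstud_wide(example_input_nstud23,
  UB_nstud_byclass = 35, LB_nstud_byclass = 5)

# Compare the number of schools retained under the two settings
nrow(nstud.default)
nrow(nstud.narrow)

nstud.default

summary(nstud.default)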

[Package SchoolDataIT version 0.2.0 Index]