Util_nstud_wide {SchoolDataIT}    R Documentation

Clean the raw dataframe of the number of students and arrange it in a wide format

Description

This function first cleans the output of the Get_nstud function by removing outliers in terms of the school-level average number of students by class and, if requested, imputes the number of classes to 1 when it is missing. It then rearranges the data into a wide format, so that each school is a single observation reporting the number of students, the number of classes and the average number of students by class at each school grade.

Usage

Util_nstud_wide(
  data = NULL,
  missing_to_1 = FALSE,
  nstud_imputation_thresh = 19,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  verbose = TRUE,
  autoAbort = FALSE,
  ...
)

Arguments

data

Object of class list, including two objects of class tbl_df, tbl and data.frame, obtained as output of the Get_nstud function with the default filename parameter. If NULL, the function will download it automatically, but it will not be saved in the global environment. NULL by default.

missing_to_1

Logical. Whether the number of classes should be imputed to 1 when it is missing and the number of students does not exceed a threshold (argument nstud_imputation_thresh). FALSE by default.

nstud_imputation_thresh

Numeric. The threshold at or below which the number of classes is imputed to 1 if missing, when missing_to_1 == TRUE. E.g. if the threshold is 19, for all the schools with 19 or fewer students in a given grade but a missing number of classes for that grade, the number of classes is imputed to 1. 19 by default.

UB_nstud_byclass

Numeric. The upper limit of the acceptable school-level average of the number of students by class. If a school has, on average, a higher number of students by class, the record is considered an outlier and filtered out. 99 by default, i.e. no restriction is made. Note that the boundaries are included in the acceptance interval.

LB_nstud_byclass

Numeric. The lower limit of the acceptable school-level average of the number of students by class. If a school has, on average, a smaller number of students by class, the record is considered an outlier and filtered out. 1 by default. Note that the boundaries are included in the acceptance interval.

verbose

Logical. If TRUE, messages on the main underlying operations are printed, so that the user can keep track of them. TRUE by default.

autoAbort

Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.

...

Arguments to Get_nstud, needed if data is not provided.
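
The two cleaning steps controlled by the arguments above can be illustrated on toy data. The following is only a minimal sketch in base R, not the package's implementation: the helper names (impute_classes, filter_by_avg) and the column names (school, grade, students, classes) are hypothetical, and the school-level average is assumed here to be total students divided by total classes.

```r
# Toy long-format data; column names are hypothetical, not those of Get_nstud.
toy <- data.frame(
  school   = c("A", "A", "B", "C", "D"),
  grade    = c(1, 2, 1, 1, 1),
  students = c(15, 48, 12, 3, 40),
  classes  = c(NA, 2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: impute one class where the class count is missing
# and the number of students does not exceed the threshold.
impute_classes <- function(df, nstud_imputation_thresh = 19) {
  small <- is.na(df$classes) & df$students <= nstud_imputation_thresh
  df$classes[small] <- 1
  df
}

# Hypothetical helper: keep only the schools whose average number of
# students by class lies in [LB, UB], boundaries included.
# Assumption of this sketch: average = total students / total classes;
# schools whose average cannot be computed (NA) are dropped.
filter_by_avg <- function(df, UB_nstud_byclass = 99, LB_nstud_byclass = 1) {
  avg <- tapply(df$students, df$school, sum) /
    tapply(df$classes, df$school, sum)
  keep <- names(avg)[which(avg >= LB_nstud_byclass & avg <= UB_nstud_byclass)]
  df[df$school %in% keep, ]
}

imputed <- impute_classes(toy)            # school A, grade 1: classes becomes 1
wide_open <- filter_by_avg(imputed)       # default bounds
narrow <- filter_by_avg(imputed, UB_nstud_byclass = 35, LB_nstud_byclass = 5)

unique(wide_open$school)
unique(narrow$school)
```

With these toy values the school averages are 21 (A), 12 (B), 3 (C) and 40 (D), so the narrow bounds retain only schools A and B, while the default bounds retain all four.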

Details

In the example, we compare the dataframe obtained with the default settings and the one obtained by imposing narrower inclusion criteria.

Value

An object of class tbl_df, tbl and data.frame
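
To give an idea of what a wide layout with one row per school looks like, the following sketch reshapes a toy long table with base R's reshape(). The input columns and the resulting column names (e.g. students_1, classes_2) are hypothetical and need not match those actually returned by Util_nstud_wide.

```r
# Toy long-format data: one row per school and grade (hypothetical names).
long <- data.frame(
  school   = c("A", "A", "B", "B"),
  grade    = c(1, 2, 1, 2),
  students = c(15, 48, 12, 20),
  classes  = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Average number of students by class at each grade.
long$students_by_class <- long$students / long$classes

# One row per school; each measure is repeated once per grade,
# with the grade appended to the column name.
wide <- reshape(long, idvar = "school", timevar = "grade",
                direction = "wide", sep = "_")

names(wide)
nrow(wide)
```

Each school now occupies a single row, with one group of columns per grade.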

Examples

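
# Default settings: practically no restriction on the average class size
nstud.default <- Util_nstud_wide(example_input_nstud23)

# Narrower inclusion criteria: keep only the schools with an average
# of 5 to 35 students by class (boundaries included)
nstud.narrow <- Util_nstud_wide(example_input_nstud23,
  UB_nstud_byclass = 35, LB_nstud_byclass = 5)

# Compare the number of schools retained under the two settings
nrow(nstud.default)
nrow(nstud.narrow)

# Inspect the wide dataframe obtained with the default settings
nstud.default

summary(nstud.default)
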
[Package SchoolDataIT version 0.2.0 Index]