Set_DB {SchoolDataIT} | R Documentation |
Build up a comprehensive database regarding the school system
Description
This function generates a single data frame of school system data from a custom selection of the available datasets. The user can aggregate any of the following datasets, when available:
Invalsi census survey
School buildings
Number of students and school classes
Number of teachers
Broadband connection availability
To save time, ready-made input data can be plugged in; otherwise the data are downloaded automatically but not saved in the global environment. When a new dataset is joined to the existing ones, some observations in this dataset may be missing. In this case, by default, the user is left to choose between keeping as many observational units as possible and removing the units with missing variables.
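The trade-off between keeping all units and dropping incomplete ones can be illustrated with a minimal base R sketch. It uses toy data and hypothetical column names (Municipality_code, mean_score, students_per_class); it is not the package's internal join logic.

```r
# Toy inputs: three municipalities with Invalsi scores, only two of which
# also appear in the students-per-class data (all values are made up).
invalsi <- data.frame(
  Municipality_code = c("001", "002", "003"),
  mean_score = c(61.2, 58.4, 63.0)
)
nstud <- data.frame(
  Municipality_code = c("001", "003"),
  students_per_class = c(19.5, 21.0)
)

# Option 1: keep as many observational units as possible.
# Municipality "002" is retained with NA in students_per_class.
all_units <- merge(invalsi, nstud, by = "Municipality_code", all.x = TRUE)

# Option 2: remove the units with missing variables.
complete_units <- all_units[complete.cases(all_units), ]

nrow(all_units)
nrow(complete_units)
```

The first option preserves coverage at the cost of NA values in the joined columns; the second yields a complete table with fewer rows.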
Usage
Set_DB(
Year = 2023,
level = "LAU",
conservative = TRUE,
Invalsi = TRUE,
SchoolBuildings = TRUE,
nstud = TRUE,
nteachers = TRUE,
BroadBand = TRUE,
verbose = TRUE,
show_col_types = FALSE,
Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
Invalsi_grade = c(2, 5, 8, 10, 13),
Invalsi_WLE = FALSE,
SchoolBuildings_include_numerics = TRUE,
SchoolBuildings_include_qualitatives = FALSE,
SchoolBuildings_row_cutout = FALSE,
SchoolBuildings_col_cut_thresh = 20000,
SchoolBuildings_flag_outliers = TRUE,
SchoolBuildings_count_missing = FALSE,
nstud_imputation_thresh = 19,
nstud_missing_to_1 = FALSE,
UB_nstud_byclass = 99,
LB_nstud_byclass = 1,
InnerAreas = TRUE,
ord_InnerAreas = FALSE,
nstud_check = TRUE,
nstud_check_registry = "Any",
BroadBand_impute_missing = TRUE,
Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
NA_autoRM = NULL,
input_Invalsi_IS = NULL,
input_Registry = NULL,
input_SchoolBuildings = NULL,
input_nstud = NULL,
input_School2mun = NULL,
input_AdmUnNames = NULL,
input_InnerAreas = NULL,
input_teachers4student = NULL,
input_nteachers = NULL,
input_BroadBand = NULL,
autoAbort = FALSE
)
Arguments
Year |
Numeric or Character. The relevant school year. Available in the formats: |
level |
Character. The administrative level of detail at which data must be aggregated.
Either |
conservative |
Logical. If |
Invalsi |
Logical. Whether the Invalsi census data must be included (see |
SchoolBuildings |
Logical. Whether the school buildings dataset must be included (see |
nstud |
Logical. Whether the students number per class must be included (see |
nteachers |
Logical. Whether the number of teachers by province must be included (see |
BroadBand |
Logical. Whether the broadband availability in schools must be included (see |
verbose |
Logical. If |
show_col_types |
Logical. If |
Invalsi_subj |
Character. If |
Invalsi_grade |
Numeric. If |
Invalsi_WLE |
Logical. Whether to express Invalsi scores as the average WLE score rather than the percentage of sufficient tests, if both are Invalsi_grade is either or |
SchoolBuildings_include_numerics |
Logical. Whether to include strictly numeric variables alongside Boolean ones in the school buildings database (see |
SchoolBuildings_include_qualitatives |
Logical. Whether to include qualitative variables alongside Boolean ones in the school buildings database (see |
SchoolBuildings_row_cutout |
Logical. Whether to filter out rows including missing fields in the school buildings database (see |
SchoolBuildings_col_cut_thresh |
Numeric. The threshold of missing values allowed for each variable in the school buildings database (see |
SchoolBuildings_flag_outliers |
Logical. Whether to assign NA to outliers in numeric variables; see |
SchoolBuildings_count_missing |
Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also |
nstud_imputation_thresh |
Numeric. If |
nstud_missing_to_1 |
Logical. If |
UB_nstud_byclass |
Numeric. The upper limit of the acceptable school-level average of the number of students by class if |
LB_nstud_byclass |
Numeric. The lower limit of the acceptable school-level average of the number of students by class if |
InnerAreas |
Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see |
ord_InnerAreas |
Logical. If |
nstud_check |
Logical. If |
nstud_check_registry |
Character. If |
BroadBand_impute_missing |
Logical. Whether the schools not included in the Broadband dataset must be counted in the total number of schools (i.e. the denominator of the Broadband availability indicator). |
Date |
Character or Date. The threshold date for broadband activation, i.e. the date before which the broadband activation works must be finished for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year. |
NA_autoRM |
Logical. Either |
input_Invalsi_IS |
Object of class |
input_Registry |
Object of class |
input_SchoolBuildings |
Object of class |
input_nstud |
Object of class |
input_School2mun |
Object of class |
input_AdmUnNames |
Object of class |
input_InnerAreas |
Object of class |
input_teachers4student |
Object of class |
input_nteachers |
Object of class |
input_BroadBand |
Object of class |
autoAbort |
Logical. If any data must be retrieved, whether to automatically abort the operation and return NULL when the internet connection is missing or the server returns an error. |
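The interplay between Date and BroadBand_impute_missing can be illustrated with a small base R sketch. The data, the column names (School_code, activation_date) and the hard-coded starting year are assumptions made for illustration. In particular, the starting year stands in for the result of the package's year.patternA() helper, whose exact behaviour is not shown here.

```r
# Toy data: five schools in the registry, four of which appear in the
# broadband dataset with an activation date (all values are made up).
registry <- data.frame(School_code = c("A", "B", "C", "D", "E"))
broadband <- data.frame(
  School_code = c("A", "B", "C", "D"),
  activation_date = as.Date(c("2021-03-15", "2022-08-30",
                              "2022-11-02", "2023-01-20"))
)

# Threshold built like the Usage default, assuming that school year 2022/23
# starts in calendar year 2022 (a stand-in, not the real year.patternA()).
start_year <- "2022"
threshold <- as.Date(paste0(start_year, "-09-01"))

# A school counts as connected if activation was completed before the threshold.
connected <- sum(broadband$activation_date < threshold)

# Denominator that includes schools absent from the broadband dataset ...
share_all <- connected / nrow(registry)
# ... versus only the schools listed in the broadband dataset.
share_listed <- connected / nrow(broadband)
```

In this toy case two schools are connected before the threshold, so the indicator is 0.4 when the unlisted school is counted in the denominator and 0.5 when it is not.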
Value
An object of class tbl_df, tbl and data.frame
See Also
Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability, Get_School2mun for similar arguments.
Examples
# Build the province-level (NUTS-3) database for school year 2023 from
# the example inputs, skipping the teachers and broadband datasets
DB23_prov <- Set_DB(Year = 2023, level = "NUTS-3", Invalsi_grade = c(5, 8, 13),
                    Invalsi_subj = "Italian", nteachers = FALSE, BroadBand = FALSE,
                    SchoolBuildings_count_missing = FALSE, NA_autoRM = TRUE,
                    input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 10:27)],
                    input_Invalsi_IS = example_Invalsi23_prov,
                    input_nstud = example_input_nstud23,
                    input_InnerAreas = example_InnerAreas,
                    input_School2mun = example_School2mun23,
                    input_AdmUnNames = example_AdmUnNames20220630)

# Print the resulting tibble
DB23_prov

# Summarise the columns left after dropping columns 22 to 62
summary(DB23_prov[, -c(22:62)])