Set_DB {SchoolDataIT}    R Documentation

Build up a comprehensive database regarding the school system


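Description

This function generates a single data frame of school system data from a custom selection of the available datasets. The user can aggregate the desired datasets, when available, among these: the Invalsi census data, the school buildings database, the number of students per class, the number of teachers by province, the broadband availability in schools, and the inner areas classification.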

To save time, ready-made input data can be plugged in; otherwise they will be downloaded automatically, but not saved in the global environment. When a new dataset is joined to the existing ones, some observations may be missing from it. In this case, by default, the user is asked whether to keep as many observational units as possible or to remove the units with missing variables.
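The two ways of handling missing observations after a join can be illustrated with a minimal base R sketch. It is not the package's internal implementation: the toy data frames, the column names (Municipality_code, MAT_5, BB_share) and the use of merge() are assumptions made only for illustration.

```r
# Toy data: hypothetical municipality codes and indicators
invalsi <- data.frame(
  Municipality_code = c("001001", "001002", "001003"),
  MAT_5 = c(55.2, 61.0, 58.4)
)
broadband <- data.frame(
  Municipality_code = c("001001", "001003", "001004"),
  BB_share = c(0.80, 0.50, 1.00)
)

# Full join: every unit found in at least one dataset is kept,
# with NA where a dataset has no observation for that unit
joined <- merge(invalsi, broadband, by = "Municipality_code", all = TRUE)

# Option 1: keep as many observational units as possible
keep_all <- joined

# Option 2: remove the units with missing variables
complete_only <- joined[complete.cases(joined), ]

nrow(keep_all)
nrow(complete_only)
```

In this toy case the first option retains the units observed in only one dataset, at the cost of NA values; the second returns a smaller but complete table.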


Usage

Set_DB(
  Year = 2023,
  level = "LAU",
  conservative = TRUE,
  Invalsi = TRUE,
  SchoolBuildings = TRUE,
  nstud = TRUE,
  nteachers = TRUE,
  BroadBand = TRUE,
  verbose = TRUE,
  show_col_types = FALSE,
  Invalsi_subj = c("ELI", "ERE", "ITA", "MAT"),
  Invalsi_grade = c(2, 5, 8, 10, 13),
  Invalsi_WLE = FALSE,
  SchoolBuildings_include_numerics = TRUE,
  SchoolBuildings_include_qualitatives = FALSE,
  SchoolBuildings_row_cutout = FALSE,
  SchoolBuildings_col_cut_thresh = 20000,
  SchoolBuildings_flag_outliers = TRUE,
  SchoolBuildings_count_missing = FALSE,
  nstud_imputation_thresh = 19,
  nstud_missing_to_1 = FALSE,
  UB_nstud_byclass = 99,
  LB_nstud_byclass = 1,
  InnerAreas = TRUE,
  ord_InnerAreas = FALSE,
  nstud_check = TRUE,
  nstud_check_registry = "Any",
  BroadBand_impute_missing = TRUE,
  Date = as.Date(paste0(substr(year.patternA(Year), 1, 4), "-09-01")),
  NA_autoRM = NULL,
  input_Invalsi_IS = NULL,
  input_Registry = NULL,
  input_SchoolBuildings = NULL,
  input_nstud = NULL,
  input_School2mun = NULL,
  input_AdmUnNames = NULL,
  input_InnerAreas = NULL,
  input_teachers4student = NULL,
  input_nteachers = NULL,
  input_BroadBand = NULL,
  autoAbort = FALSE
)



Arguments

Year
Numeric or Character. The relevant school year. Accepted formats: 2023, "2022/2023", 202223, 20222023. Important: if input datasets are plugged in, please select the same Year argument used to download the input data. 2023 by default.


level
Character. The administrative level of detail at which data must be aggregated. Either "LAU"/"Municipality" or "NUTS-3"/"Province". "LAU" by default.


conservative
Logical. If FALSE, only the schools included in all the datasets are taken as input. TRUE by default.


Invalsi
Logical. Whether the Invalsi census data must be included (see Get_Invalsi_IS). TRUE by default.


SchoolBuildings
Logical. Whether the school buildings dataset must be included (see Get_DB_MIUR, Util_DB_MIUR_num). TRUE by default.


nstud
Logical. Whether the number of students per class must be included (see Get_nstud). TRUE by default.


nteachers
Logical. Whether the number of teachers by province must be included (see Get_nteachers_prov). TRUE by default.


BroadBand
Logical. Whether the broadband availability in schools must be included (see Get_BroadBand). TRUE by default.


verbose
Logical. If TRUE, the user is kept informed of the main underlying operations. TRUE by default.


show_col_types
Logical. If TRUE and the verbose argument is also TRUE, the columns of the raw dataset are shown during the download. FALSE by default.


Invalsi_subj
Character. If Invalsi == TRUE, the school subject(s) to include, among "English_listening"/"ELI", "English_reading"/"ERE", "Italian"/"ITA" and "Mathematics"/"MAT". All four by default.


Invalsi_grade
Numeric. If Invalsi == TRUE, the educational grade(s) to include. Any of 2 (2nd year of primary school), 5 (last year of primary school), 8 (last year of middle school), 10 (2nd year of high school) or 13 (last year of school). All by default.


Invalsi_WLE
Logical. Whether to express Invalsi scores as the average WLE score rather than the percentage of sufficient tests, if both are available, i.e. if Invalsi_grade is either 2 or 5. FALSE by default.


SchoolBuildings_include_numerics
Logical. Whether to include strictly numeric variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). TRUE by default.


SchoolBuildings_include_qualitatives
Logical. Whether to include qualitative variables alongside Boolean ones in the school buildings database (see Util_DB_MIUR_num). FALSE by default.


SchoolBuildings_row_cutout
Logical. Whether to filter out rows including missing fields in the school buildings database (see Util_DB_MIUR_num). FALSE by default.


SchoolBuildings_col_cut_thresh
Numeric. The threshold of missing values allowed for each variable in the school buildings database (see Util_DB_MIUR_num). If a variable has a higher number of missing observations, it is cut out. 20000 by default. Warning: if the option SchoolBuildings_row_cutout is active, please select a lower threshold (e.g. 1000).


SchoolBuildings_flag_outliers
Logical. Whether to assign NA to outliers in numeric variables; see Util_DB_MIUR_num for more details. TRUE by default.


SchoolBuildings_count_missing
Logical. Whether the function should return the percentage of NAs in the input school buildings database (see also Group_DB_MIUR). FALSE by default.


nstud_imputation_thresh
Numeric. If nstud_missing_to_1 == TRUE, the threshold on the number of students below which a missing number of classes is imputed to 1; see also Util_nstud_wide. 19 by default.


nstud_missing_to_1
Logical. If nstud == TRUE, whether the number of classes should be imputed to 1 when it is missing and the number of students is below a threshold (argument nstud_imputation_thresh, see Util_nstud_wide). FALSE by default.


UB_nstud_byclass
Numeric. The upper limit of the acceptable school-level average number of students per class if nstud == TRUE; see also Util_nstud_wide. 99 by default, i.e. no restriction is made. Please note that the boundaries are included in the acceptance interval.


LB_nstud_byclass
Numeric. The lower limit of the acceptable school-level average number of students per class if nstud == TRUE; see also Util_nstud_wide. 1 by default. Please note that the boundaries are included in the acceptance interval.


InnerAreas
Logical. Whether the percentage of schools belonging to inner/internal areas must be included (see Get_InnerAreas). TRUE by default.


ord_InnerAreas
Logical. If check == TRUE and InnerAreas == TRUE, whether the inner areas classification should be treated as an ordinal variable rather than as a categorical one (see Get_InnerAreas for the classification). FALSE by default.


nstud_check
Logical. If nstud == TRUE, whether to check the availability of the number of students across all the schools included in the school registries (see Util_Check_nstud_availability). TRUE by default.


nstud_check_registry
Character. If nstud == TRUE and nstud_check == TRUE, the school registries whose availability has to be checked. Either "Registry_from_buildings" (buildings registry), "Registry_from_registry" (proper registry), "Any" or "Both". "Any" by default.


BroadBand_impute_missing
Logical. Whether the schools not included in the broadband dataset must be counted in the total number of schools (i.e. the denominator of the broadband availability indicator). TRUE by default.


Date
Character or Date. The cutoff date for broadband activation, i.e. the date by which the activation works must be finished for a school to be considered as provided with broadband. By default, September 1st at the beginning of the school year.


NA_autoRM
Logical. Either TRUE, FALSE or NULL. If TRUE, the observations missing in a single dataset are automatically deleted from the final DB. If FALSE, the missing observations are kept automatically. If NULL, the choice is left to the user through an interactive menu. NULL by default.


input_Invalsi_IS
Object of class tbl_df, tbl and data.frame. If Invalsi == TRUE, the raw Invalsi survey data, obtained as output of the Get_Invalsi_IS function. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_Registry
Object of class tbl_df, tbl and data.frame. The school registry corresponding to the year in scope, obtained as output of the function Get_Registry. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_SchoolBuildings
Object of class tbl_df, tbl and data.frame. If SchoolBuildings == TRUE, the raw school buildings dataset, obtained as output of the function Get_DB_MIUR. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_nstud
Object of class list, including two objects of class tbl_df, tbl and data.frame. If nstud == TRUE, the student and class counts, obtained as output of the function Get_nstud with the default filename parameter. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_School2mun
Object of class list with elements of class tbl_df, tbl and data.frame. If nstud == TRUE, the mapping from school codes to municipality (and province) codes, obtained as output of the function Get_School2mun. Needed only if nstud_check == TRUE. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_AdmUnNames
Object of class tbl_df, tbl and data.frame, obtained as output of the function Get_AdmUnNames. If necessary, the ISTAT file including all the codes and the names of the administrative units for the year in scope. Required either if nstud == TRUE and nstud_check == TRUE, or if SchoolBuildings == TRUE, input_SchoolBuildings is not provided, and the school year is one of 2015/16, 2017/18 or 2018/19. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_InnerAreas
Object of class tbl_df, tbl and data.frame. If InnerAreas == TRUE, the classification of peripheral municipalities, obtained as output of the function Get_InnerAreas. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_teachers4student
Object of class tbl_df, tbl and data.frame. If nteachers == TRUE and nstud == TRUE, the number of teachers per student by province, obtained as output of the function Group_teachers4stud. Please note that this object cannot be considered a substitute for the number of students by class, since it provides no information on the number of schools in single educational grades but only at the school order level. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_nteachers
Object of class tbl_df, tbl and data.frame. If nteachers == TRUE, the number of teachers by province, obtained as output of the function Get_nteachers_prov. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


input_BroadBand
Object of class tbl_df, tbl and data.frame. If BroadBand == TRUE, the raw broadband connection dataset, obtained as output of the function Get_BroadBand. If NULL, it will be downloaded automatically, but not saved in the global environment. NULL by default.


autoAbort
Logical. In case any data must be retrieved, whether to automatically abort the operation and return NULL in case of missing internet connection or server response errors. FALSE by default.
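The relation between the Year formats and the default Date can be sketched in base R. The helpers start_year() and default_date() below are hypothetical and are not part of the package; start_year() only stands in for substr(year.patternA(Year), 1, 4), under the assumption that a single year such as 2023 denotes the school year 2022/2023.

```r
# Hypothetical helper: calendar year in which the school year begins
start_year <- function(Year) {
  y <- gsub("[^0-9]", "", as.character(Year))   # keep digits only
  switch(as.character(nchar(y)),
         "4" = as.integer(y) - 1L,               # 2023       -> 2022
         "6" = as.integer(substr(y, 1, 4)),      # 202223     -> 2022
         "8" = as.integer(substr(y, 1, 4)),      # 20222023   -> 2022
         stop("Unrecognised school year format: ", Year))
}

# Hypothetical helper: September 1st at the beginning of the school year
default_date <- function(Year) {
  as.Date(paste0(start_year(Year), "-09-01"))
}

# The four documented formats refer to the same school year
default_date(2023)
default_date("2022/2023")
default_date(202223)
default_date(20222023)
```

Under these assumptions, passing a later Date would count as connected also the schools whose broadband activation is completed during the school year rather than before its start.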


Value

An object of class tbl_df, tbl and data.frame.

See Also

Util_DB_MIUR_num, Group_DB_MIUR, Group_nstud, Util_Check_nstud_availability, Get_School2mun for similar arguments.


Examples

DB23_prov <- Set_DB(Year = 2023, level = "NUTS-3", Invalsi_grade = c(5, 8, 13),
      Invalsi_subj = "Italian", nteachers = FALSE, BroadBand = FALSE,
      SchoolBuildings_count_missing = FALSE, NA_autoRM = TRUE,
      input_SchoolBuildings = example_input_DB23_MIUR[, -c(11:18, 10:27)],
      input_Invalsi_IS = example_Invalsi23_prov,
      input_nstud = example_input_nstud23,
      input_InnerAreas = example_InnerAreas,
      input_School2mun = example_School2mun23,
      input_AdmUnNames = example_AdmUnNames20220630)


summary(DB23_prov[, -c(22:62)])

[Package SchoolDataIT version 0.2.0 Index]