R: Connect to PISA Data

readPISA {EdSurvey}

R Documentation

Connect to PISA Data

Description

Opens a connection to a PISA data file and returns an edsurvey.data.frame with information about the file and data.

Usage

readPISA(
  path,
  database = c("INT", "CBA", "FIN"),
  countries,
  cognitive = c("score", "response", "none"),
  forceReread = FALSE,
  verbose = TRUE
)

Arguments

`path`	a character vector of the full directory path(s) to the PISA-extracted fixed-width files and SPSS control files (.txt).
`database`	a character string indicating the database to read. Must be one of `INT` (general database that most people use), `CBA` (computer-based database in PISA 2012 only), or `FIN` (financial literacy database in PISA 2012 and 2018). Defaults to `INT`.
`countries`	a character vector of the country/countries to include using the three-letter ISO country code (for example, `sgp` for Singapore). A list of country codes can be found in the PISA codebook or https://en.wikipedia.org/wiki/ISO_3166-1#Current_codes. If files are downloaded using `downloadPISA`, a country dictionary text file can be found in the filepath.
`cognitive`	one of `none`, `score`, or `response`. Default is `score`. The PISA database often has three student files: student questionnaire, cognitive item response, and scored cognitive item response. The student questionnaire file is used as the main student file with student background information. Use `score` or `response` to merge the corresponding cognitive data into the main file, or `none` to merge neither.
`forceReread`	a logical value to force rereading of all processed data. Defaults to `FALSE`. Setting `forceReread` to `TRUE` causes the PISA data to be reread, which increases processing time.
`verbose`	a logical value indicating whether progress messages are printed while the function is running. Defaults to `TRUE`.

Details

Reads in the unzipped files downloaded from the PISA database using the OECD Repository (https://www.oecd.org/pisa/). Users can use downloadPISA to download all required files. Student questionnaire files (with weights and plausible values) are used as main files, which are then merged with cognitive, school, and parent files (if available).
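
The download-then-read workflow described above can be sketched as follows. This is a minimal illustration, not a prescribed setup: the root directory `~/` is a placeholder, and the sketch assumes `downloadPISA` places the files in a `PISA/2012` folder under `root`, so adjust `path` if your files live elsewhere.

```r
library(EdSurvey)

# Download the PISA 2012 international database.
# "~/" is a placeholder root directory; the files are assumed to land in ~/PISA/2012.
downloadPISA(root = "~/", years = 2012, database = "INT")

# Open a connection to the downloaded files for a single country (Singapore).
sgp2012 <- readPISA(path = "~/PISA/2012", database = "INT", countries = "sgp")
```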

The average first-time processing time for one year and one database for all countries is 10–15 minutes. If forceReread is FALSE, subsequent calls to this function take only 5–10 seconds.
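
One way to observe the difference between a first read and a later read is to time the calls. The sketch below assumes the PISA 2012 files are already in the placeholder directory `~/PISA/2012`. Actual timings depend on the machine and on the number of countries requested.

```r
library(EdSurvey)

# First call: the raw files are processed, so this is the slow step.
# "~/PISA/2012" is a placeholder path.
system.time(
  sgp2012 <- readPISA(path = "~/PISA/2012", countries = "sgp", verbose = FALSE)
)

# Second call with the default forceReread = FALSE: reuses the processed data.
system.time(
  sgp2012 <- readPISA(path = "~/PISA/2012", countries = "sgp", verbose = FALSE)
)

# Reprocess everything from the raw files, for example after replacing them.
sgp2012 <- readPISA(path = "~/PISA/2012", countries = "sgp", forceReread = TRUE)
```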

For the PISA 2000 study, please note that the study weights are subject specific. Each weight has different adjustment factors for reading, mathematics, and science based on its original subject source file. For example, the w_fstuwt_read weight is associated with the reading subject data file. Take special care to select the weight that matches the subject of your analysis. See the OECD documentation for further details. Use the showWeights function to see all three student-level subject weights:

w_fstuwt_read = Reading (default)
w_fstuwt_scie = Science
w_fstuwt_math = Mathematics
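
As a hedged illustration of choosing a subject-specific weight, the sketch below lists the weights on a PISA 2000 connection and then extracts the mathematics scale together with its matching weight. The path `~/PISA/2000`, the country `usa`, and the scale name `math` are assumptions made for this example. Check the names reported for your own data before relying on them.

```r
library(EdSurvey)

# "~/PISA/2000" is a placeholder path to extracted PISA 2000 files.
usa2000 <- readPISA(path = "~/PISA/2000", database = "INT", countries = "usa")

# List the available weights; the reading weight is the default.
showWeights(usa2000)

# Pair the mathematics scale (assumed here to be named "math")
# with the mathematics weight rather than the default reading weight.
mathData <- getData(usa2000, c("math", "w_fstuwt_math"))
head(mathData)
```

Analysis functions such as edsurveyTable take a weightVar argument, which can be set to the matching weight (for example, weightVar = "w_fstuwt_math") instead of relying on the default.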

Value

an edsurvey.data.frame for a single specified country or an edsurvey.data.frame.list if multiple countries are specified
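
A brief sketch of the multiple-country case, assuming the PISA 2012 files are in the placeholder directory `~/PISA/2012` and that both requested countries are present in them:

```r
library(EdSurvey)

# Requesting more than one country returns an edsurvey.data.frame.list.
pisa2012 <- readPISA(path = "~/PISA/2012", database = "INT",
                     countries = c("sgp", "usa"))
class(pisa2012)

# The individual edsurvey.data.frame objects are stored in the datalist element.
sgp2012 <- pisa2012$datalist[[1]]
class(sgp2012)
```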

Author(s)

Tom Fink, Trang Nguyen, and Paul Bailey

References

Organisation for Economic Co-operation and Development. (2017). PISA 2015 technical report. Paris, France: OECD Publishing. Retrieved from https://www.oecd.org/pisa/data/2015-technical-report/

Examples

## Not run: 
# the following call returns an edsurvey.data.frame for the
# PISA 2012 International Database for Singapore
sgp2012 <- readPISA(path = "~/PISA/2012", database = "INT", countries = "sgp")

# extract a data.frame with a few variables
gg <- getData(sgp2012, c("cnt", "read", "w_fstuwt"))
head(gg)

# conduct an analysis on the edsurvey.data.frame
edsurveyTable(formula = read ~ st04q01 + st20q01, data = sgp2012)

## End(Not run)

[Package EdSurvey version 4.0.7 Index]