dataInfo {esmtools} | R Documentation |
Display information regarding the dataset in a succinct way.
Description
The 'dataInfo()' function displays detailed information about a dataset in a style similar to 'sessionInfo()'. It reports details such as file size, creation and update times, number of columns and rows, number of participants, variable names, and more. This information is useful for reproducibility, tracking the dataset, and ensuring transparency in data analysis workflows.
Usage
dataInfo(
  file_path = NULL,
  read_fun = NULL,
  idvar = NULL,
  timevar = NULL,
  validvar = NULL,
  citation = NULL,
  URL = NULL,
  DOI = NULL,
  path = TRUE,
  variables = TRUE
)
Arguments
file_path |
The path or URL of the dataset file. |
read_fun |
The function used to read the dataset file. |
idvar |
The identifier variable(s) in the dataset, represented as a character vector. |
timevar |
The name(s) of the time variable(s) in the dataset. Preferably use the sent timestamp variable (the time when the beep was sent to the participant). |
validvar |
The name of the validation variable in the dataset, which is expected to be a numerical vector. If NULL, the function does not display compliance rate information. |
citation |
A character string citing the article or document associated with the script. |
URL |
The URL associated with the dataset (for example, the URL of the associated article), represented as a character string. If NULL, the function will not display the URL information. |
DOI |
The Digital Object Identifier (DOI) of the dataset, if applicable. If NULL, the function will not display the DOI information. |
path |
If TRUE, the function will display the path information. |
variables |
A logical value indicating whether to display the names of the dataset's variables. Set to TRUE to display variable information, and FALSE to omit it. The default is TRUE. |
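As an illustration of the 'read_fun' argument, the sketch below defines a reader using base R only. The temporary file and its columns ('id', 'sent') are made up for demonstration and are not part of the package; only the use of 'read.csv2()' and the timestamp format mirror the example in this page. The 'tz = "UTC"' setting is an assumption added to keep the result independent of the local time zone.

```r
# Hypothetical demo file: semicolon-separated, as expected by read.csv2()
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "id;sent",
  "1;2023-05-01 09:00:00",
  "1;2023-05-01 12:30:00",
  "2;2023-05-01 09:05:00"
), tmp)

# Reader that loads the file and converts 'sent' to POSIXct
read_fun <- function(x) {
  df <- read.csv2(x)
  df$sent <- as.POSIXct(as.character(df$sent),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df
}

df <- read_fun(tmp)
str(df)
```

A function written this way takes a single path argument and returns a data frame, which is the shape used for 'read_fun' in the Examples section.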
Details
The 'dataInfo()' function provides a comprehensive summary of information about the dataset. The information returned includes:
Size: The size of the dataset file in bytes (octets).
File extension
Creation and Update Times: The date and time when the data file was created and last updated.
Number of Columns and Rows
Number of Participants
Average Observations per Participant
Compliance Mean: The mean compliance value for the dataset.
Data Collection Period: The duration or period during which the data was collected.
Path: The path or URL of the dataset file.
Variable Names: The names of the variables in the dataset.
Associated Links: Any associated URL, DOI, or citation links for the dataset.
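To make the items above concrete, here is a minimal base R sketch of how several of them could be computed by hand. It is not the implementation used by 'dataInfo()'. The toy data, the column names ('id', 'sent', 'valid'), and the choice to take the compliance mean over all rows are assumptions made for illustration.

```r
# Hypothetical ESM data: 2 participants, one row per beep
df <- data.frame(
  id = c(1, 1, 1, 2, 2),
  sent = as.POSIXct(c("2023-05-01 09:00:00", "2023-05-01 12:00:00",
                      "2023-05-02 09:00:00", "2023-05-01 09:30:00",
                      "2023-05-03 18:00:00"), tz = "UTC"),
  valid = c(1, 0, 1, 1, 1)
)

# Write it to a temporary file so that file-level details can be queried
tmp <- tempfile(fileext = ".csv")
write.csv2(df, tmp, row.names = FALSE)

n_participants <- length(unique(df$id))

info <- list(
  size_bytes      = file.size(tmp),           # file size in bytes
  extension       = tools::file_ext(tmp),     # file extension
  last_modified   = file.mtime(tmp),          # last update time
  n_col           = ncol(df),
  n_row           = nrow(df),
  n_participants  = n_participants,
  avg_obs         = nrow(df) / n_participants, # observations per participant
  compliance_mean = mean(df$valid, na.rm = TRUE),
  period          = range(df$sent, na.rm = TRUE) # first and last beep sent
)
str(info)
```

Collecting the values in a named list keeps them easy to inspect or save alongside an analysis script.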
Value
The 'dataInfo()' function displays detailed information about the dataset. The result can also be stored as a list in a variable.
A kable object that summarizes the information on the data, the current R session, and the article or document associated with the script.
Examples
library(dplyr)

# Load data
file_path <- system.file("extdata", "esmdata_sim.csv", package = "esmtools")

# Create a function to read the data
read_fun <- function(x) read.csv2(x) %>%
  mutate(sent = as.POSIXct(as.character(sent), format = "%Y-%m-%d %H:%M:%S"))

# Get data information
dataInfo(
  file_path = file_path, read_fun = read_fun,
  idvar = "id", timevar = "sent"
)