prep {prepdat} | R Documentation |
Creates One Finalized Table Ready for Statistical Analysis
Description
prep() aggregates a single dataset in long format according to any number
of grouping variables. This makes prep() suitable for aggregating data from
various types of experimental designs, such as between-subjects,
within-subjects (i.e., repeated measures), and mixed designs (i.e.,
experimental designs that include both between- and within-subjects
independent variables). prep() returns a data frame with a number of
dependent measures for further analysis for each aggregated cell (i.e.,
experimental cell) according to the provided grouping variables (i.e.,
independent variables). Dependent measures for each experimental cell
include, among others: means before and after rejecting observations
according to flexible standard deviation criteria; the number and
proportion of observations rejected according to those criteria; the
number of observations before rejection; means after rejecting
observations according to the procedures described in Van Selst &
Jolicoeur (1994; suitable when measuring reaction times); standard
deviations; medians; the dependent variable at any percentile (e.g., 0.05,
0.25, 0.75, 0.95); and harmonic means. The data frame prep() returns can
also be exported as a txt or csv file for statistical analysis in other
programs.
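The kind of aggregation described above can be illustrated in base R. The following is a minimal sketch only, not prep()'s implementation: the data frame trials, its columns, and the resulting column names are hypothetical, and only the cell mean is computed.

```r
# Toy long-format data: one row per trial (all names here are hypothetical).
trials <- data.frame(
  subject   = rep(c("s1", "s2"), each = 4),
  condition = rep(c("congruent", "incongruent"), times = 4),
  rt        = c(500, 620, 540, 660, 450, 700, 470, 680)
)

# One row per subject x condition cell, holding the cell mean of rt.
cell_means <- aggregate(rt ~ subject + condition, data = trials, FUN = mean)

# Reshape so that each subject occupies a single row, with one column per
# experimental cell (rt.congruent, rt.incongruent).
wide <- reshape(
  cell_means,
  idvar     = "subject",
  timevar   = "condition",
  direction = "wide"
)
print(wide)
```

The result has one row per subject and one column per within-subjects cell, which is the general layout most statistical programs expect for repeated-measures data.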
Usage
prep(
dataset = NULL
, file_name = NULL
, file_path = NULL
, id = NULL
, within_vars = c()
, between_vars = c()
, dvc = NULL
, dvd = NULL
, keep_trials = NULL
, drop_vars = c()
, keep_trials_dvc = NULL
, keep_trials_dvd = NULL
, id_properties = c()
, sd_criterion = c(1, 1.5, 2)
, percentiles = c(0.05, 0.25, 0.75, 0.95)
, outlier_removal = NULL
, keep_trials_outlier = NULL
, decimal_places = 4
, notification = TRUE
, dm = c()
, save_results = TRUE
, results_name = "results.txt"
, results_path = NULL
, save_summary = TRUE
)
Arguments
dataset |
Name of the data frame in R that contains the long format
table after merging the individual data files using
|
file_name |
A string with the name of a txt or csv file (including the
file extension, e.g. |
file_path |
A string with the path of the folder in which
|
id |
A string with the name of the column in |
within_vars |
String vector with names of grouping variables in
|
between_vars |
String vector with names of grouping variables in
|
dvc |
A string with the name of the column in |
dvd |
A string with the name of the column in |
keep_trials |
A string. Allows deleting unnecessary observations and
keeping necessary observations in |
drop_vars |
String vector with names of columns to delete in |
keep_trials_dvc |
A string. Allows deleting unnecessary observations
and keeping necessary observations in |
keep_trials_dvd |
A string. Allows deleting unnecessary observations
and keeping necessary observations in |
id_properties |
String vector with names of columns in |
sd_criterion |
Numeric vector specifying a number of standard deviation
criteria for which |
percentiles |
Numeric vector containing wanted percentiles for |
outlier_removal |
Numeric. Specifies which outlier removal procedure
with moving criterion to calculate for |
keep_trials_outlier |
A string. Allows deleting unnecessary
observations and keeping necessary observations in |
decimal_places |
Numeric. Specifies number of decimals to be written
in |
notification |
Logical. If |
dm |
String vector with names of dependent measures the function
returns. If empty (i.e., |
save_results |
Logical. If TRUE, the function creates a txt file
containing the returned data frame. Default is |
results_name |
A string with the name of the file |
results_path |
A string with the path of the folder in which
|
save_summary |
Logical. If |
Value
A data frame with dependent measures for the dependent variables in dvc
and dvd by id and grouping variables.
The first column in the finalized table is the id column.
If id_properties was used, the next columns contain the value of each
id_properties for each id.
If between_vars was used, the next columns contain the value of each
between_vars for each id.
The next columns of the finalized table contain the dependent measures
according to the design specified. If within_vars was used, the data for
each dependent measure are first divided according to the levels of the
first grouping variable in within_vars, and then, within each of those
levels, prep() divides the data according to the next variable in
within_vars, and so forth.
The dependent measures in the finalized table are:
mdvc: mean dvc.
sdvc: SD for dvc.
meddvc: median dvc.
tdvc: mean dvc after rejecting observations above standard deviation criteria specified in sd_criterion.
ntr: number of observations rejected for each standard deviation criterion specified in sd_criterion.
ndvc: number of observations before rejection.
ptr: proportion of observations rejected for each standard deviation criterion specified in sd_criterion.
rminv: harmonic mean of dvc.
prt: dvc according to each of the percentiles specified in percentiles.
mdvd: mean dvd.
merr: mean error.
nrmc: mean dvc according to non-recursive procedure with moving criterion.
nnrmc: number of observations rejected for dvc according to non-recursive procedure with moving criterion.
pnrmc: percent of observations rejected for dvc according to non-recursive procedure with moving criterion.
tnrmc: total number of observations upon which the non-recursive procedure with moving criterion was applied.
mrmc: mean dvc according to modified-recursive procedure with moving criterion.
nmrmc: number of observations rejected for dvc according to modified-recursive procedure with moving criterion.
pmrmc: percent of observations rejected for dvc according to modified-recursive procedure with moving criterion.
tmrmc: total number of observations upon which the modified-recursive procedure with moving criterion was applied.
hrmc: mean dvc according to hybrid-recursive procedure with moving criterion.
nhrmc: number of observations rejected for dvc according to hybrid-recursive procedure with moving criterion.
thrmc: total number of observations upon which the hybrid-recursive procedure with moving criterion was applied.
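Several of the measures listed above can be approximated for a single experimental cell in base R. This is a sketch under stated assumptions, not prep()'s own code: the values in rt are made up, the helper trim_by_sd() is hypothetical, and the rejection rule (an observation is dropped when it lies more than k sample SDs above the cell mean) is one reading of the tdvc description that may differ from what prep() does.

```r
# Reaction times for one experimental cell (made-up values).
rt <- c(400, 450, 500, 550, 600, 650, 700, 750, 800, 1000)

# Hypothetical helper. Assumption: reject observations lying more than
# k sample SDs above the cell mean (one-sided rule).
trim_by_sd <- function(x, k) {
  keep <- x <= mean(x) + k * sd(x)
  c(
    trimmed_mean = mean(x[keep]),  # analogous to tdvc
    n_rejected   = sum(!keep),     # analogous to ntr
    p_rejected   = mean(!keep)     # analogous to ptr
  )
}

# Apply the helper once per criterion; each column is one criterion.
sd_criterion <- c(1, 1.5, 2)
trimmed <- sapply(sd_criterion, trim_by_sd, x = rt)
colnames(trimmed) <- paste0("sd_", sd_criterion)
print(trimmed)

# Harmonic mean (analogous to rminv): n divided by the sum of reciprocals.
harmonic_mean <- length(rt) / sum(1 / rt)

# Percentiles (analogous to prt), using quantile()'s default method.
pct <- quantile(rt, probs = c(0.05, 0.25, 0.75, 0.95))
```

With these values the observation 1000 is rejected at criteria 1 and 1.5 but kept at criterion 2, so the trimmed mean depends on the criterion chosen.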
References
Grange, J.A. (2015). trimr: An implementation of common response time trimming methods. R Package Version 1.0.1. https://CRAN.R-project.org/package=trimr
Van Selst, M., & Jolicoeur, P. (1994). A solution to the effect of sample size on outlier elimination. The Quarterly Journal of Experimental Psychology, 47(3), 631-650.
Examples
data(stroopdata)
finalized_stroopdata <- prep(
dataset = stroopdata
, file_name = NULL
, file_path = NULL
, id = "subject"
, within_vars = c("block", "target_type")
, between_vars = c("order")
, dvc = "rt"
, dvd = "ac"
, keep_trials = NULL
, drop_vars = c()
, keep_trials_dvc = "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"
, keep_trials_dvd = "raw_data$rt > 100 & raw_data$rt < 3000"
, id_properties = c()
, sd_criterion = c(1, 1.5, 2)
, percentiles = c(0.05, 0.25, 0.75, 0.95)
, outlier_removal = 2
, keep_trials_outlier = "raw_data$ac == 1"
, decimal_places = 0
, notification = TRUE
, dm = c()
, save_results = FALSE
, results_name = "results.txt"
, results_path = NULL
, save_summary = FALSE
)
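
The keep_trials_dvc, keep_trials_dvd and keep_trials_outlier arguments in the example above are strings containing logical conditions on raw_data. How prep() applies them internally is not shown here. The sketch below illustrates one plausible mechanism, assuming the string is parsed and evaluated as an R expression against a data frame named raw_data. The toy data are made up.

```r
# Hypothetical stand-in for the data; the name raw_data mirrors the
# strings used in the example above.
raw_data <- data.frame(
  rt = c(80, 350, 420, 3500, 610),
  ac = c(1, 1, 0, 1, 1)
)

keep_trials_dvc <- "raw_data$rt > 100 & raw_data$rt < 3000 & raw_data$ac == 1"

# Assumption: the condition string is parsed and evaluated as R code,
# yielding one logical value per row.
keep <- eval(parse(text = keep_trials_dvc))

# Keep only the rows for which the condition holds.
filtered <- raw_data[keep, ]
print(filtered)
```

Under this assumption, only correct trials with reaction times between 100 and 3000 remain, which here leaves the trials with rt of 350 and 610.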