
rapidsplit {rapidsplithalf}

R Documentation

rapidsplit

Description

A very fast algorithm for computing stratified permuted split-half reliability.

Usage

rapidsplit(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits,
  aggfunc = c("means", "medians"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL, fixedpenalty =
    600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE
)

## S3 method for class 'rapidsplit'
print(x, ...)

## S3 method for class 'rapidsplit'
plot(
  x,
  type = c("average", "minimum", "maximum", "random", "all"),
  show.labels = TRUE,
  ...
)

rapidsplit.chunks(
  data,
  subjvar,
  diffvars = NULL,
  stratvars = NULL,
  subscorevar = NULL,
  aggvar,
  splits,
  aggfunc = c("means", "medians", "custom"),
  errorhandling = list(type = c("none", "fixedpenalty"), errorvar = NULL, fixedpenalty =
    600, blockvar = NULL),
  standardize = FALSE,
  include.scores = TRUE,
  verbose = TRUE,
  check = TRUE,
  chunks = 4,
  cluster = NULL
)

Arguments

`data`	Dataset, a `data.frame`.
`subjvar`	Subject ID variable name, a `character`.
`diffvars`	Names of variables that determine which conditions need to be subtracted from each other; a `character` vector.
`stratvars`	Additional variables that the splits should be stratified by; a `character` vector.
`subscorevar`	Name of a variable identifying subgroups within a participant's data from which separate scores should be computed; a `character`. To compute a participant's final score, these subscores are averaged together. A typical use case is the D-score of the implicit association test (IAT).
`aggvar`	Name of variable whose values to aggregate, a `character`. Examples include reaction times and error rates.
`splits`	Number of split-halves to average, an `integer`. It is recommended to use around 5000.
`aggfunc`	The function by which to aggregate the variable defined in `aggvar`; can be `"means"`, `"medians"`, or a custom function (the function object itself, not its name). A custom function must take a numeric vector and return a single value. Custom functions are supported by `rapidsplit.chunks()`.
`errorhandling`	A list with 4 named items, used to replace error trials with the block mean of correct responses plus a fixed penalty, as in the IAT D-score. The 4 items are: `type`, which can be `"none"` for no error replacement or `"fixedpenalty"` to replace error trials as described; `errorvar`, the name of the `logical` variable indicating an incorrect response (as `TRUE`); `fixedpenalty`, the penalty to add to said block mean (600 by default); and `blockvar`, the name of the block variable.
`standardize`	Whether to divide scores by the subject's SD; a `logical`. Regardless of whether error penalization is used, this standardization is based on the unpenalized SD of both correct and incorrect trials, as in the IAT D-score.
`include.scores`	Include all individual split-half scores?
`verbose`	Display progress bars? Defaults to `TRUE`.
`check`	Check input for possible problems?
`x`	`rapidsplit` object to print or plot.
`...`	Ignored.
`type`	Character argument indicating what should be plotted. By default (`"average"`), this plots the random split whose correlation is closest to the average. Alternatively, it can plot the split with the `"minimum"` or `"maximum"` split-half correlation, or any single `"random"` split; `"all"` plots all splits together in one figure.
`show.labels`	Should participant IDs be shown above their points in the scatterplot? Defaults to `TRUE` and is ignored when `type` is `"all"`.
`chunks`	Number of chunks to divide the splits into, for more memory-efficient computation and, if requested, to distribute the work over multiple cores.
`cluster`	A cluster, or an `integer` specifying the number of cores, on which to run the chunks in parallel. If `NULL` (the default), the chunks are run sequentially.
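As an illustration of what the `errorhandling` (`type = "fixedpenalty"`) and `standardize` arguments describe, here is a minimal base-R sketch for one participant's trials. The helper names `penalize_errors()` and `dscore_sketch()` are hypothetical, the trial values are made up, and the sign convention (incongruent minus congruent) is an assumption; the package performs these steps internally, within each split.

```r
# Hypothetical helpers (not part of rapidsplithalf) illustrating the
# "fixedpenalty" error handling and the standardization step for the
# trials of one participant.

# Replace error RTs with the mean RT of correct trials in the same block
# plus a fixed penalty (600 ms by default, as in the IAT D-score).
# Note: a block without any correct trials would yield NaN here.
penalize_errors <- function(rt, error, block, penalty = 600) {
  out <- rt
  for (b in unique(block)) {
    in_block <- block == b
    out[in_block & error] <- mean(rt[in_block & !error]) + penalty
  }
  out
}

# D-score-style (sub)score: difference between condition means of the
# penalized RTs, divided by the SD of the *unpenalized* RTs of all trials.
# Assumption: the score is incongruent minus congruent.
dscore_sketch <- function(rt, error, block, congruent, penalty = 600) {
  pen <- penalize_errors(rt, error, block, penalty)
  (mean(pen[!congruent]) - mean(pen[congruent])) / sd(rt)
}

# Usage with six made-up trials (two blocks of three):
rt        <- c(500, 700, 900, 600, 800, 1000)
error     <- c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
block     <- c(1, 1, 1, 2, 2, 2)
congruent <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
penalize_errors(rt, error, block)  # 500 700 1200 600 1400 1000
dscore_sketch(rt, error, block, congruent)
```

The SD in the denominator is computed from `rt` rather than the penalized values, mirroring the description of `standardize` above.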

Details

The order of operations (optional steps in parentheses) is:

1. Splitting
2. (Replacing error trials within block within split)
3. Computing aggregates per condition (per subscore) per person
4. Subtracting conditions from each other
5. (Dividing the resulting (sub)score by the SD of the data used to compute that (sub)score)
6. (Averaging subscores together into a single score per person)
7. Correlating scores from one half with scores from the other half
8. Applying the Spearman-Brown correction using spearmanBrown()
9. Computing the average split-half reliability using cormean()
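The order of operations above can be sketched for a single split of a simple difference score. This is a minimal, unoptimized base-R illustration, not the package's implementation: the function `one_split()` and the column names `subj`, `cond`, `stim` and `rt` are hypothetical, stratification is done by alternating shuffled trials within each subject x condition x stimulus cell, and the optional steps are omitted.

```r
# Hypothetical sketch (not the package's optimized code) of ONE stratified
# random split for a simple difference score. Assumes a data frame with
# columns subj, cond (logical), stim (stratification variable) and rt.
one_split <- function(trials) {
  # Splitting: within each subject x condition x stimulus cell, shuffle
  # the trials and alternate them between half 1 and half 2.
  cell <- interaction(trials$subj, trials$cond, trials$stim, drop = TRUE)
  half <- integer(nrow(trials))
  for (idx in split(seq_len(nrow(trials)), cell)) {
    shuffled <- idx[sample.int(length(idx))]
    half[shuffled] <- rep_len(1:2, length(idx))
  }
  # Aggregating per condition per person, then subtracting conditions.
  score <- function(h) {
    d <- trials[half == h, ]
    m <- tapply(d$rt, list(d$subj, d$cond), mean)
    m[, "TRUE"] - m[, "FALSE"]
  }
  # Correlating the scores of both halves.
  r <- cor(score(1), score(2))
  # Spearman-Brown correction for the halved number of trials.
  2 * r / (1 + r)
}

# Usage with simulated data: 20 subjects, 2 conditions, 4 stimuli, 10 repetitions.
set.seed(1)
effect <- rnorm(20, mean = 50, sd = 40)
trials <- expand.grid(rep = 1:10, stim = 1:4, cond = c(FALSE, TRUE), subj = 1:20)
trials$rt <- 500 + trials$cond * effect[trials$subj] + rnorm(nrow(trials), sd = 20)
one_split(trials)
# Averaging over many splits (plain mean for brevity; the package uses cormean()):
mean(replicate(100, one_split(trials)))
```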

Value

A list containing

r: the averaged reliability.
allcors: a vector with the reliability of each iteration.
nobs: the number of participants.
scores: the individual participants' scores in each split-half, contained in a list with two matrices (only included if include.scores = TRUE).
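The `r` component summarizes the per-split values in `allcors`. A common way to average correlation coefficients is to average their Fisher z-transforms and back-transform the result; the sketch below assumes that approach, and the name `fisher_mean()` is hypothetical. That `cormean()` does exactly the same is an assumption.

```r
# Hypothetical stand-in for averaging per-split coefficients into one value.
# Assumption: Fisher z averaging; the package's cormean() may differ in detail.
fisher_mean <- function(cors) {
  # atanh() is the Fisher z-transform and tanh() its inverse.
  # Coefficients of exactly +/-1 map to +/-Inf and are not handled here.
  tanh(mean(atanh(cors)))
}

fisher_mean(c(0.5, 0.5))  # 0.5
fisher_mean(c(0.2, 0.8))  # about 0.572, slightly above the plain mean of 0.5
```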

Note

This function can use a lot of memory at once. If you are computing the reliability of a large dataset or have little RAM, it may pay off to use the chunked version of this function instead, rapidsplit.chunks(), which by default processes the splits sequentially in chunks.
It is currently unclear whether it is better to pre-process your data before or after splitting it. If you are computing the IAT D-score, you can therefore either use errorhandling and standardize to perform these two steps after splitting, or pre-process your data before splitting and forgo these two options.
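The memory trade-off described above comes from processing the splits in smaller groups. The sketch below shows the general chunking pattern in base R, optionally distributing chunks over a cluster with `parallel::parLapply()`; `run_in_chunks()` is a hypothetical name, and this is not the internal code of `rapidsplit.chunks()`.

```r
# Hypothetical sketch of the chunking idea behind rapidsplit.chunks():
# divide `splits` iterations over `chunks` groups, compute each group with
# `fun(n)` (which must return n split-half coefficients), and concatenate.
# Each call to `fun` only handles the intermediate data of its own chunk.
run_in_chunks <- function(splits, chunks, fun, cluster = NULL) {
  # Near-equal chunk sizes that sum to `splits`.
  sizes <- tabulate(rep_len(seq_len(chunks), splits), nbins = chunks)
  res <- if (is.null(cluster)) {
    lapply(sizes, fun)                        # sequential
  } else {
    parallel::parLapply(cluster, sizes, fun)  # chunks spread over workers
  }
  unlist(res)
}

# Usage with a dummy fun that just returns n placeholder values:
run_in_chunks(10, 4, function(n) rep(0.8, n))
# Parallel variant (some_fun is a hypothetical per-chunk function):
# cl <- parallel::makeCluster(2)
# run_in_chunks(5000, 8, some_fun, cluster = cl)
# parallel::stopCluster(cl)
```

Unlike the `cluster` argument of `rapidsplit.chunks()`, this sketch only accepts a cluster object, not a number of cores.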

Examples


data(foodAAT)
# Reliability of the double difference score:
# [RT(push food)-RT(pull food)] - [RT(push object)-RT(pull object)]

frel<-rapidsplit(data=foodAAT,
                 subjvar="subjectid",
                 diffvars=c("is_pull","is_target"),
                 stratvars="stimid",
                 aggvar="RT",
                 splits=100)
                 
print(frel)

plot(frel,type="all")

           
# Compute a single random split-half reliability of the error rate
rapidsplit(data=foodAAT,
           subjvar="subjectid",
           aggvar="error",
           splits=1,
           aggfunc="means")

# Compute the reliability of an IAT D-score
data(raceIAT)
rapidsplit(data=raceIAT,
           subjvar="session_id",
           diffvars="congruent",
           subscorevar="blocktype",
           aggvar="latency",
           errorhandling=list(type="fixedpenalty",errorvar="error",
                              fixedpenalty=600,blockvar="block_number"),
           splits=100,
           standardize=TRUE)


# Unstratified reliability of the median RT
rapidsplit.chunks(data=foodAAT,
                  subjvar="subjectid",
                  aggvar="RT",
                  splits=100,
                  aggfunc="medians",
                  chunks=8)

# Compute the reliability of Tukey's trimean of the RT
# on 2 CPU cores
trimean<-function(x){ 
  sum(quantile(x,c(.25,.5,.75))*c(1,2,1))/4
}
rapidsplit.chunks(data=foodAAT,
                  subjvar="subjectid",
                  aggvar="RT",
                  splits=200,
                  aggfunc=trimean,
                  cluster=2)
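Because a custom `aggfunc` must take a numeric vector and return a single value, it can be worth checking a candidate function before starting a long run. The helper `check_aggfunc()` below is hypothetical and not part of rapidsplithalf; the test vector is arbitrary.

```r
# Hypothetical helper (not part of rapidsplithalf): check that a candidate
# custom aggregation function maps a numeric vector to one non-missing number.
check_aggfunc <- function(f, x = c(320, 450, 510, 610, 980)) {
  out <- f(x)
  is.numeric(out) && length(out) == 1 && !is.na(out)
}

check_aggfunc(median)                            # TRUE
check_aggfunc(function(x) mean(x, trim = 0.2))   # TRUE
check_aggfunc(range)                             # FALSE: returns two values
check_aggfunc(function(x) "fast")                # FALSE: not numeric
```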

[Package rapidsplithalf version 0.2]