by_split {splithalfr} | R Documentation |
Calculate split scores per participant
Description
Calculates split scores by applying fn_score to subsets of data, as specified via participants. It provides a range of additional arguments for different splitting methods and to support parallel processing. To learn more about writing scoring algorithms for use with the splithalfr package, see the included vignettes.
by_split is modeled after the by function: its first three arguments accept values similar to those of by's data, INDICES, and FUN. For more information about different methods for splitting data, see get_split_indexes_from_stratum. For more information about stratification, see split_df.
Usage
by_split(
data,
participants,
fn_score,
stratification = NULL,
replications = 1,
method = c("random", "odd_even", "first_second"),
replace = FALSE,
split_p = 0.5,
subsample_p = 1,
subsample_n = NULL,
careful = TRUE,
match_participants = FALSE,
ncores = detectCores(),
seed = NULL,
verbose = TRUE
)
Arguments
data |
(data frame) data frame containing data to score. Data should be in long format, with one row per combination of participant and trial or item. |
participants |
(vector) Vector that identifies participants in data. |
fn_score |
(function) receives full or split sets, should return a single number. |
stratification |
(vector) Vector that identifies which subsets of data … |
replications |
(numeric) Number of replications for which split scores are calculated. |
method |
(character) Splitting method. Note that |
replace |
(logical) If TRUE, stratum is sampled with replacement. |
split_p |
(numeric) Desired length of both parts, expressed as a
proportion of the length of the data per participant. If |
subsample_p |
(numeric) Subsample a proportion of participants before splitting. |
subsample_n |
(numeric) Subsample a number of participants before splitting. |
careful |
(logical) If TRUE, stop with an error when called with arguments that may yield unexpected splits. |
match_participants |
(logical) Default FALSE. If FALSE, the split-halves
are newly randomized for each replication and participant. If TRUE, the
split-halves are newly randomized for each replication, but within a
replication the same randomization is applied across participants. If the
order of rows of datasets per participant denotes similar observations
(such as items in a questionnaire), |
ncores |
(integer) By default, all available CPU cores are used. If 1,
split replications are executed serially (via |
seed |
(integer) When split replications are executed in parallel,
|
verbose |
(logical) If TRUE, reports progress. Note that progress across split replications is not displayed when these are executed in parallel. |
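The method, split_p, and match_participants arguments above jointly determine which of a participant's rows end up in each part. The sketch below illustrates one plausible reading of the three method values for a single participant with n rows, plus what matching across participants amounts to. `split_indexes_sketch` is a hypothetical helper; the package's actual logic lives in get_split_indexes_from_stratum and may differ, for example in how split_p is rounded or how it interacts with odd_even.

```r
# Hypothetical sketch of how each splitting method could pick row indexes for
# one participant with n rows; not the splithalfr implementation.
split_indexes_sketch <- function(n, method = c("random", "odd_even", "first_second"),
                                 split_p = 0.5) {
  method <- match.arg(method)
  # Assumption: part 1 receives floor(n * split_p) rows
  n_1 <- floor(n * split_p)
  idx_1 <- switch(method,
    # First n_1 rows versus the remaining rows
    first_second = seq_len(n_1),
    # Odd-numbered rows versus even-numbered rows (split_p ignored in this sketch)
    odd_even = seq(1, n, by = 2),
    # A random subset of n_1 rows versus the remaining rows
    random = sort(sample(seq_len(n), n_1))
  )
  list(part_1 = idx_1, part_2 = setdiff(seq_len(n), idx_1))
}

set.seed(42)
split_indexes_sketch(6, "first_second")  # part_1 = 1 2 3, part_2 = 4 5 6
split_indexes_sketch(6, "odd_even")      # part_1 = 1 3 5, part_2 = 2 4 6

# match_participants, illustrated: TRUE reuses one random split for every
# participant within a replication (so all need the same number of rows);
# FALSE draws a fresh split per participant.
matched_split <- split_indexes_sketch(6, "random")
matched <- rep(list(matched_split), 3)
unmatched <- lapply(1:3, function(p) split_indexes_sketch(6, "random"))
```

Matching only makes sense when row positions mean the same thing for every participant, such as questionnaire items stored in a fixed order.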
Value
(data frame) Returns a data frame with a column participant, a column replication that counts split replications, and columns score_1 and score_2 holding the scores calculated on each part via fn_score.
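Because the returned data frame stacks all replications, a reliability coefficient is usually computed per replication and then summarized; the Examples section averages the output of split_coefs across replications. The base-R sketch below mimics that pattern on a hand-made data frame with the documented columns, using a plain Pearson correlation as a stand-in coefficient. The scores are made up, and this is not how split_coefs is implemented.

```r
# Toy data frame with the columns documented under Value (values are made up)
split_scores <- data.frame(
  participant = rep(1:4, times = 2),
  replication = rep(1:2, each = 4),
  score_1 = c(3, 5, 2, 8, 4, 4, 1, 9),
  score_2 = c(4, 6, 1, 7, 3, 6, 2, 8)
)
# One Pearson correlation per replication (a stand-in for a split-half
# coefficient), then the mean across replications
per_replication <- sapply(
  split(split_scores, split_scores$replication),
  function(d) cor(d$score_1, d$score_2)
)
mean(per_replication)
```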
Examples
# N.B. This example uses R script from the vignette: "rapi_sum"
library(splithalfr)
data("ds_rapi", package = "splithalfr")
# Convert to long format
ds_long <- reshape(
ds_rapi,
varying = paste("V", 1 : 23, sep = ""),
v.names = "answer",
direction = "long",
idvar = "twnr",
timevar = "item"
)
# Function for RAPI sum score
rapi_fn_score <- function (data) {
return (sum(data$answer))
}
# Calculate scores on full data
by(
ds_long,
ds_long$twnr,
rapi_fn_score
)
# Permutation split, one replication, items matched across participants
split_scores <- by_split(
ds_long,
ds_long$twnr,
rapi_fn_score,
ncores = 1,
match_participants = TRUE
)
# Mean Flanagan-Rulon coefficient across split replications
fr <- mean(split_coefs(split_scores, flanagan_rulon))
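For reference, the Flanagan-Rulon coefficient can be written in Rulon's form as 1 - var(d) / var(t), with d = score_1 - score_2 and t = score_1 + score_2, which is algebraically identical to Flanagan's form 2 * (1 - (var(score_1) + var(score_2)) / var(t)). The base-R sketch below computes both forms on made-up scores to show they agree. `fr_rulon` and `fr_flanagan` are hypothetical helpers, not splithalfr's flanagan_rulon, whose handling of details such as missing values may differ.

```r
# Hypothetical helper: Flanagan-Rulon coefficient in Rulon's form, i.e. the
# variance of the part difference relative to the variance of the total score
fr_rulon <- function(score_1, score_2) {
  1 - var(score_1 - score_2) / var(score_1 + score_2)
}
# Hypothetical helper: the same coefficient in Flanagan's form
fr_flanagan <- function(score_1, score_2) {
  2 * (1 - (var(score_1) + var(score_2)) / var(score_1 + score_2))
}
# Made-up split scores for five participants
s1 <- c(3, 5, 2, 8, 6)
s2 <- c(4, 6, 1, 7, 5)
fr_rulon(s1, s2)
fr_flanagan(s1, s2)
```

Both forms reduce to 4 * cov(score_1, score_2) / var(score_1 + score_2), so they differ only by floating-point rounding.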