basefunctions {Rsampling} | R Documentation |
Shuffling and resampling functions
Description
Functions to run (un)restricted sampling with or without replacement in a dataframe.
Usage
within_rows(dataframe, cols = 1:ncol(dataframe), replace = FALSE,
FUN = base::sample)
within_columns(dataframe, cols = 1:ncol(dataframe), stratum = rep(1,
nrow(dataframe)), replace = FALSE, FUN = base::sample)
normal_rand(dataframe, cols = 1:ncol(dataframe), stratum = rep(1,
nrow(dataframe)), replace = FALSE, FUN = base::sample)
rows_as_units(dataframe, stratum = rep(1, nrow(dataframe)), replace = FALSE,
length.out = NULL)
columns_as_units(dataframe, cols = 1:ncol(dataframe), replace = FALSE,
length.out = NULL)
Arguments
dataframe |
a dataframe with the data to be shuffled or resampled. |
cols |
columns of dataframe that should be selected to be resampled/shuffled. Defaults to all columns. |
replace |
(logical) should the data be permuted (FALSE) or resampled with replacement (TRUE)? |
FUN |
function used for the sampling procedure. The default is base::sample. |
stratum |
factor or integer vector that divides the data into groups or strata. Randomizations will be performed within each level of the stratum. Each level needs at least two observations. Default is a single-level stratum. |
length.out |
(integer) specifies the size of the resulting data set.
For columns_as_units, a data.frame with length.out columns will be returned, and for
rows_as_units, a data.frame with length.out rows will be returned.
Note that if length.out is larger than the relevant dimension, |
Value
a dataframe with the same structure as the input dataframe, with values randomized accordingly.
Details
Each function reproduces as closely as possible the corresponding option in the Resampling Stats add-in for Excel (www.resample.com) for permuting (shuffling) and sampling with replacement (resampling) values in tabular data:
- normal_rand corresponds to the 'normal shuffle' and 'normal resample' options. For shuffling (replace=FALSE) the data are permuted over all cells of dataframe. For resampling (replace=TRUE) data from any cell can be sampled and attributed to any other cell.
- within_rows and within_columns correspond to the options with the same names. The randomization is done within each row or column of dataframe. So for shuffling the values of each row/column are permuted independently, and for resampling the values are sampled independently from each row/column and attributed only to cells of the row/column they were sampled from.
- rows_as_units and columns_as_units also correspond to the options with the same names. Each row or column of dataframe is shuffled or resampled as a whole. Only the placement of rows and columns in the dataframe changes. The values and their positions within each row/column remain the same.
All functions assemble the randomized values in a dataframe with the same configuration as the original. Columns that were not selected to be randomized with argument cols are then bound to the resulting dataframe. The order and names of the rows and columns are preserved, unless length.out is specified. In that case, the randomized rows/columns may be shifted to the end of the table.
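As a rough illustration of randomizing only the selected columns while leaving the others in place, the following base R sketch permutes two columns independently. It is an assumption-laden sketch, not the package's code: the data, the cols vector and the object names are made up for the example.

```r
# Sketch only: shuffle selected columns, keep the remaining columns untouched.
set.seed(7)  # only to make the illustration reproducible
df <- data.frame(site = c("s1", "s2", "s3", "s4"),
                 x = 1:4,
                 y = c(10, 20, 30, 40))
cols <- c("x", "y")  # hypothetical choice of columns to randomize

shuffled <- df
# Permute each selected column independently; 'site' is not in cols,
# so it stays as it was, and column order and names are unchanged.
shuffled[cols] <- lapply(df[cols], sample)
```

Assigning back into the same positions keeps the column order and names of the original table, which mirrors the behavior described above for the case where length.out is not given.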
When both stratum and length.out are used, the function will try to keep the proportion of each stratum close to the original.
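One way to picture how stratum and length.out can interact is a proportional allocation: each stratum contributes a number of rows proportional to its share of the original data. The base R sketch below assumes that simple rule with plain rounding; how Rsampling actually allocates rows is not stated here, and the helper resample_rows_stratified and the example data are hypothetical.

```r
# Sketch only: stratified resampling of rows with a target number of rows.
set.seed(1)  # only to make the illustration reproducible
df <- data.frame(id = 1:10, x = (1:10) / 10)
stratum <- rep(c("A", "B"), times = c(6, 4))  # 60% of rows in A, 40% in B

# Hypothetical helper: draw rows within each stratum, with the number of
# draws proportional to the stratum's share of the original rows.
resample_rows_stratified <- function(d, stratum, length.out = nrow(d),
                                     replace = TRUE) {
  idx <- unlist(lapply(unique(stratum), function(s) {
    rows <- which(stratum == s)
    # Simple rounding (an assumption); the total may differ from length.out
    # when the shares do not divide evenly.
    n_s <- round(length.out * length(rows) / length(stratum))
    rows[sample.int(length(rows), n_s, replace = replace)]
  }))
  d[idx, , drop = FALSE]
}

res <- resample_rows_stratified(df, stratum, length.out = 20)
```

With 20 rows requested, this sketch draws 12 rows from stratum A and 8 from stratum B, so the 60/40 split of the original data is kept. Because length.out exceeds the number of rows available, the draws are made with replacement.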
References
Statistics.com LLC. 2009. Resampling Stats Add-in for Excel User's Guide. http://www.resample.com/content/software/excel/userguide/RSXLHelp.pdf