sample_dat {imputeTestbench} | R Documentation |
Sample time series data
Description
Sample a time series using missing completely at random (MCAR) or missing at random (MAR) sampling
Usage
sample_dat(datin, smps = "mcar", repetition = 10, b = 10,
blck = 50, blckper = TRUE, plot = FALSE)
Arguments
datin |
input numeric vector |
smps |
chr string of sampling type to use, options are "mcar" or "mar" |
repetition |
numeric for repetitions to be done for each missPercent value |
b |
numeric indicating the total amount of missing data as a percentage to remove from the complete time series |
blck |
numeric indicating block sizes as a proportion of the sample size for the missing data |
blckper |
logical indicating if the value passed to blck is a proportion rather than an absolute block size |
plot |
logical indicating if a plot is returned showing the sampled data, plots only the first repetition |
Value
Input data with NA values for the sampled observations if plot = FALSE, otherwise a plot showing the missing observations over the complete dataset.
The missing data if smps = 'mar' are based on random sampling by blocks. The start location of each block is random, and observations in overlapping blocks are counted only once toward the required sample size given by b. Final blocks are truncated to ensure the correct value of b is returned. Blocks are fixed at 1 if the proportion is too small, in which case "mcar" should be used. When blckper = FALSE, block sizes are also truncated to the required sample size if the input value is too large. In that case, the result is the same as setting blck = 1 and blckper = TRUE.
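The block sampling described above can be sketched in base R. This is a minimal illustration only, not the package's implementation: the helper sample_blocks and its argument blck_size are hypothetical names, and the block size is assumed to be an absolute number of observations (loosely analogous to blckper = FALSE).

```r
# Hypothetical helper (not part of imputeTestbench): choose indices to remove
# in randomly placed blocks until b percent of the series is missing.
sample_blocks <- function(n, b = 10, blck_size = 5) {
  n_miss <- min(round(n * b / 100), n - 2)  # total observations to remove
  candidates <- 2:(n - 1)                   # first and last are never removed
  removed <- integer(0)
  while (length(removed) < n_miss) {
    # random start location for the next block
    start <- candidates[sample.int(length(candidates), 1)]
    block <- start:min(start + blck_size - 1, n - 1)
    # observations already removed by an earlier block are not counted again
    block <- setdiff(block, removed)
    # truncate the final block so the total matches n_miss exactly
    need <- n_miss - length(removed)
    if (length(block) > need) block <- block[seq_len(need)]
    removed <- c(removed, block)
  }
  sort(removed)
}

set.seed(1)
x <- rnorm(1000)
idx <- sample_blocks(length(x), b = 10, blck_size = 25)
x_mar <- x
x_mar[idx] <- NA
sum(is.na(x_mar))
```

Under these assumptions the number of NA values equals the requested percentage of the series length, regardless of how the blocks overlap.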
For all cases, the first and last observations will never be removed to allow comparability of interpolation schemes. This is especially relevant for cases when b is large and smps = 'mar' is used. For example, method = na.approx will have rmse = 0 for a dataset where the removed block includes the last n observations. This result could provide misleading information in comparing methods.
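The concern about trailing blocks can be illustrated with base R's approx, used here only as a stand-in for linear interpolation. That na.approx treats a trailing block the same way is an assumption of this sketch, and the vectors are made-up illustration data.

```r
# Illustration data: a short series with the last two observations removed
x <- c(1, 2, 3, 4, 5, 6)
x_miss <- x
x_miss[5:6] <- NA

# Linear interpolation over the observed points; rule = 1 returns NA
# for positions outside the range of the observed data
obs <- !is.na(x_miss)
filled <- approx(which(obs), x_miss[obs], xout = seq_along(x_miss), rule = 1)$y

# The trailing block stays NA, so no error can be measured on those points
is.na(filled)
```

Because the trailing positions are never filled, an error summary computed only over filled values would not reflect the removed block, which is why keeping the last observation matters when comparing methods.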
Examples
a <- rnorm(1000)
# default sampling
sample_dat(a)
# use mar sampling
sample_dat(a, smps = 'mar')
# show a plot of one repetition
sample_dat(a, plot = TRUE)
# show a plot of one repetition, mar sampling
sample_dat(a, smps = 'mar', plot = TRUE)
# change plot aesthetics
library(ggplot2)
p <- sample_dat(a, plot = TRUE)
p + scale_colour_manual(values = c('black', 'grey'))
p + theme_minimal()
p + ggtitle('Example of simulating missing data')