R: Partitioning A Dataset Randomly

partition_random {cleandata}

R Documentation

Partitioning A Dataset Randomly

Description

Creates a partition column that randomly assigns the rows of a data frame to training, validation, and (optionally) test sets. Optionally records the result in a log file.

Usage

partition_random(x, name = 'Partition', train,
    val = 10^ceiling(log10(train))-train, test = TRUE,
    seed = FALSE, log = eval.parent(in_log_default))

Arguments

`x`	The data frame
`name`	The name of the validation column.
`train`	The proportion of the training set, given as a relative weight (the examples use `train = 7`).
`val`	The proportion of the validation set. If not given, a default value is calculated by assuming the sum of `train` and `val` is an nth power of 10; for example, `train = 7` gives `val = 3`.
`test`	Whether to include a test set. If `TRUE`, a default value is calculated by assuming the sum of `train` and `val` is an nth power of 10.
`seed`	Whether to set a random seed. If you want a reproducible result, pass a number to `seed` as the random seed.
`log`	Controls log files. To produce log files, supply a list of arguments for `sink()`, such as `file`, `append`, and `split`, either directly to `log` or through the `log_arg` variable in the parent environment (dynamic scope).
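
The default for `val` can be checked by hand. The sketch below copies the expression from the Usage section into a helper named `default_val`, a hypothetical name used only for illustration and not part of cleandata.

```r
# Hypothetical helper (not exported by cleandata): reproduces the default
# expression for `val` shown in the Usage section.
default_val <- function(train) 10^ceiling(log10(train)) - train

default_val(7)    # 3, so train + val = 10
default_val(70)   # 30, so train + val = 100
default_val(0.7)  # about 0.3, so train + val = 1
```

The default therefore pads `train` up to the next power of 10, which lets the two values be read as parts of 10, 100, and so on.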

Value

A partition column, named by `name`, that can be bound to `x` with `cbind()`.

Warning

`x` must be a data frame. Do not pass a vector.

Examples

# refer to vignettes if you want to use log files
message('refer to vignettes if you want to use log files')

# building a data frame
A <- 2:16
B <- letters[12:26]
df <- data.frame(A, B)

# partitioning with the default val (train = 7 gives val = 3)
df0 <- partition_random(df, train = 7)
df0 <- cbind(df, df0)
print(df0)
# partitioning with an explicit val
df0 <- partition_random(df, train = 7, val = 2)
df0 <- cbind(df, df0)
print(df0)
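
The effect of passing a number to `seed` can be illustrated in base R. This is a minimal sketch, not the cleandata implementation: `sketch_partition` is a hypothetical helper, and drawing labels with `sample()` weighted by `train` and `val` is an assumption about how such a partition might be produced.

```r
# Hypothetical helper, not part of cleandata: draws one label per row with
# probability proportional to the given weights.
sketch_partition <- function(n, train, val, seed = FALSE) {
  # As in the documented interface, FALSE means "do not set a seed";
  # a number is used as the random seed.
  if (!identical(seed, FALSE)) set.seed(seed)
  weights <- c(Training = train, Validation = val)
  # sample() rescales `prob` internally, so the weights need not sum to 1.
  sample(names(weights), size = n, replace = TRUE, prob = weights)
}

first  <- sketch_partition(15, train = 7, val = 3, seed = 42)
second <- sketch_partition(15, train = 7, val = 3, seed = 42)
identical(first, second)  # TRUE: the same seed gives the same partition
```

With `seed = FALSE` the two calls would generally differ, which is why a numeric seed is needed for a reproducible result.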

[Package cleandata version 0.3.0 Index]