partition_random {cleandata}    R Documentation

Partitioning A Dataset Randomly

Description

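Randomly partitions the rows of a data frame by creating a validation column that marks the set each row belongs to (training, validation, or test). Optionally records the result in a log file.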

Usage

partition_random(x, name = 'Partition', train,
    val = 10^ceiling(log10(train))-train, test = TRUE,
    seed = FALSE, log = eval.parent(in_log_default))
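
The default expression for val in the signature above can be evaluated on its own to see which value it produces for a given train. The snippet below is only a sketch of that arithmetic; the helper name default_val is hypothetical and is not part of cleandata.

```r
# Hypothetical helper: the default val expression copied from the Usage signature.
default_val <- function(train) 10^ceiling(log10(train)) - train

default_val(7)    # 10^1 - 7
default_val(65)   # 10^2 - 65
default_val(0.8)  # 10^0 - 0.8
```

In each case train plus the computed val adds up to the smallest power of 10 that is not less than train.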

Arguments

x

The data frame to partition.

name

The name of the validation column.

train

The proportion of rows assigned to the training set, given as a relative share (for example, 7 for 7 parts out of 10).

val

The proportion of the validation set. If not given, a default value is calculated by assuming that the sum of train and val is an nth power of 10 (for example, train = 7 gives val = 3).

test

Whether to create a test set. If TRUE, the test proportion is calculated by assuming that the sum of train, val, and test is an nth power of 10.

seed

Whether to set a random seed. For a reproducible result, pass a number to seed; it is used as the random seed. The default, FALSE, sets no seed.

log

Controls log files. To produce a log file, assign a list of arguments for sink(), such as file, append, and split, either to log or to the log_arg variable in the parent environment (dynamic scope).
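
As an illustration of the kind of list the log argument is described as taking, the sketch below builds a list of sink() arguments and applies it with do.call(). How partition_random() consumes the list internally is not shown in this page, so the do.call() step is an assumption made for demonstration, and the log file path is a hypothetical temporary file.

```r
# Hypothetical log file; tempfile() avoids writing into the working directory.
log_file <- tempfile(fileext = ".log")

# A list of sink() arguments (file, append, split), as named in the log description.
log_arg <- list(file = log_file, append = FALSE, split = FALSE)

do.call(sink, log_arg)            # start diverting printed output to log_file
print("partitioning finished")    # this line goes to the log, not the console
sink()                            # stop diverting output

logged <- readLines(log_file)     # read the log back to inspect it
```

Setting split = TRUE in the list would send output to both the console and the file.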

Value

The partition column, which marks the set that each row of x is assigned to.
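
To give a rough idea of what a randomly generated partition column looks like, the base R sketch below draws one label per row with probabilities proportional to train, val, and test. It is not the implementation used by cleandata; the function name sketch_partition, the label names, and the seed handling are all assumptions made for illustration.

```r
# Illustrative stand-in written in base R; not the cleandata implementation.
sketch_partition <- function(n, train, val, test = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)          # fix the seed for reproducibility
  sets <- c("Train", "Validation", "Test")    # hypothetical label names
  # sample() normalizes prob, so relative shares such as 7:2:1 work directly.
  sample(sets, size = n, replace = TRUE, prob = c(train, val, test))
}

p1 <- sketch_partition(15, train = 7, val = 2, test = 1, seed = 42)
p2 <- sketch_partition(15, train = 7, val = 2, test = 1, seed = 42)
identical(p1, p2)   # TRUE: the same seed reproduces the same labels
```

Because each row is drawn independently in this sketch, the realized set sizes only approximate the requested proportions, especially for small data frames.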

Warning

x can only be a data frame. Don't pass a vector to it.

See Also

sink

Examples

# refer to vignettes if you want to use log files
message('refer to vignettes if you want to use log files')

# building a data frame
A <- 2:16
B <- letters[12:26]
df <- data.frame(A, B)

# partitioning
df0 <- partition_random(df, train = 7)
df0 <- cbind(df, df0)
print(df0)
df0 <- partition_random(df, train = 7, val = 2)
df0 <- cbind(df, df0)
print(df0)

[Package cleandata version 0.3.0 Index]