ArtificialData {MultiJoin}    R Documentation
Create artificial data for testing
Description
This function quickly generates a test data set, split across several files, that can be used with most of the join functions in this package.
Usage
ArtificialData(fakeDataDir = "~/fakeData2/", joinKey = letters[1:20],
numFiles = 4, N = rep(15, numFiles), SORT = 1, GZIP = 0,
sep = c(" ", ",", "\t", "|")[1], prefix = "file", suffix = ".txt",
daten = month.abb, NCOL = rep(3, numFiles), chunkSize = 1000,
verbose = 0)
Arguments
fakeDataDir
directory in which to write the data files
joinKey
set of join keys to sample from; must be longer than each entry of N. The sampled values form the key column used for the join.
numFiles
number of files to split the data across
N
number of rows in each file, one entry per file, e.g. N = c(15, 20, 10, 30)
SORT
should the join key be sorted?
GZIP
should the data files be gzipped?
sep
column delimiter; defaults to a single space
prefix
file name prefix
suffix
file name suffix
daten
values from which the data columns are sampled
NCOL
number of data columns in each file, one entry per file
chunkSize
number of lines written to a file at a time
verbose
level of verbosity
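For a single file, the arguments above combine in a straightforward way. The following is a minimal base-R sketch of how one such file could be produced; it is not the package's implementation. The helper name make_fake_file is hypothetical. The sketch also assumes three things: keys are drawn without replacement (consistent with joinKey having to be longer than N), data columns are drawn with replacement from daten, and no header row is written.

```r
# Hypothetical sketch, NOT MultiJoin's implementation of ArtificialData().
# Mimics the documented arguments for one file: sample N keys from joinKey,
# optionally sort them, add NCOL columns sampled from daten, and write the
# result with the chosen separator (gzipped if GZIP is nonzero).
make_fake_file <- function(path, joinKey = letters[1:20], N = 15, SORT = 1,
                           GZIP = 0, sep = " ", daten = month.abb, NCOL = 3) {
  # Assumption: keys are drawn without replacement, so each key occurs once.
  stopifnot(length(joinKey) > N)
  key <- sample(joinKey, N)
  if (SORT) key <- sort(key)
  # Assumption: each data column is sampled with replacement from daten.
  cols <- replicate(NCOL, sample(daten, N, replace = TRUE), simplify = FALSE)
  names(cols) <- paste0("V", seq_len(NCOL))
  df <- data.frame(key = key, cols, stringsAsFactors = FALSE)
  # gzfile() compresses while writing; file() writes plain text.
  con <- if (GZIP) gzfile(path, "w") else file(path, "w")
  on.exit(close(con))
  # Assumption: no header row and no quoting.
  write.table(df, con, sep = sep, quote = FALSE, row.names = FALSE,
              col.names = FALSE)
  invisible(df)
}

# Usage: write a small comma-separated file and read it back.
set.seed(1)
tmp <- tempfile(fileext = ".txt")
df <- make_fake_file(tmp, N = 5, sep = ",")
back <- read.table(tmp, sep = ",", stringsAsFactors = FALSE)
```

Writing through a connection keeps the plain and gzipped cases on the same code path. read.table() detects gzip compression on its own when reading such a file back.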
Value
Invisibly returns the generated data and the names of the files written.
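Because the generated data are returned as well as written, the result of a join computed on the files can be cross-checked against a plain in-memory join. The sketch below does this in base R with Reduce() and merge(). It builds its own tables to mimic the default arguments (four files, 15 of 20 keys each) and does not call MultiJoin. The exact structure of the returned object is not documented here, so the list of data frames is an assumption made for illustration.

```r
# Hypothetical base-R cross-check (not a MultiJoin function): an in-memory
# inner join of several key/value tables on their "key" column.
set.seed(42)
keys <- letters[1:20]
tables <- lapply(1:4, function(i) {
  # 15 distinct sorted keys per table, one value column sampled from month.abb
  d <- data.frame(key = sort(sample(keys, 15)),
                  val = sample(month.abb, 15, replace = TRUE),
                  stringsAsFactors = FALSE)
  names(d)[2] <- paste0("val", i)  # distinct value-column name per table
  d
})
# Reduce() applies merge() pairwise; merge(..., by = "key") is an inner join.
joined <- Reduce(function(x, y) merge(x, y, by = "key"), tables)
```

Each key appears at most once per table, so the joined result has exactly one row for every key common to all four tables.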
Author(s)
"Markus Loecher, Berlin School of Economics and Law (BSEL)" <markus.loecher@gmail.com>
Examples
if (0) {
  # Not run automatically: these calls write files to disk.
  ArtificialData("fakeData2", verbose = 1)
  ArtificialData("fakeData2", joinKey = 1:2000, N = rep(1500, 4), verbose = 0)
  ret = ArtificialData(fakeDataDir = "/tmp/fakeData")
  ret = ArtificialData(fakeDataDir = "./fakeData", joinKey = letters[1:10],
                       numFiles = 6, N = rep(5, 6))
  ret = ArtificialData(SORT = 1, GZIP = 1)
  ret = ArtificialData(fakeDataDir = "fakeData", joinKey = 0:9, N = rep(6, 4),
                       verbose = 1)
  # on allegro:
  ret = ArtificialData(fakeDataDir = "./fakeData", joinKey = letters,
                       numFiles = 10, N = rep(18, 10), NCOL = rep(5, 10))
}
[Package MultiJoin version 0.1.1 Index]