R: Load PRISMA Data Files

loadPrismaData {PRISMA}

R Documentation

Load PRISMA Data Files

Description

Loads files generated by the sally tool (see http://www.mlsec.org/sally/) and represents the data as a binary token/n-gram x document matrix. After loading, statistical tests are applied to find features that are neither volatile nor constant. Co-occurring features are grouped to compact the data further. See system.file("extdata","sallyPreprocessing.py", package="PRISMA") for a Python script that generates the corresponding .fsally file from a .sally file; loading the .fsally file via loadPrismaData is considerably faster.
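
To make the data layout concrete, the following base R sketch builds a small binary token x document matrix and merges features that always occur together. It is an illustration only: the toy documents and variable names are made up here, exact equality of occurrence patterns stands in for the package's grouping step, and the statistical tests are reduced to dropping constant features. None of this is PRISMA's actual implementation.

```r
# Toy documents, each already split into tokens (hypothetical data).
docs <- list(
  d1 = c("GET", "index", "html", "HTTP"),
  d2 = c("GET", "login", "HTTP"),
  d3 = c("POST", "login", "HTTP"),
  d4 = c("GET", "index", "html", "HTTP")
)

# Binary features x documents matrix: 1 if the token occurs in the document.
tokens <- sort(unique(unlist(docs)))
m <- sapply(docs, function(d) as.integer(tokens %in% d))
rownames(m) <- tokens

# Drop constant features (present in every document or in none).
constant <- rowSums(m) %in% c(0, ncol(m))
m <- m[!constant, , drop = FALSE]

# Group features with identical occurrence patterns (a simplification
# of the co-occurrence grouping described above).
pattern <- apply(m, 1, paste, collapse = "")
groups <- split(rownames(m), pattern)
keep <- !duplicated(pattern)
grouped <- m[keep, , drop = FALSE]
rownames(grouped) <- vapply(
  pattern[keep],
  function(p) paste(groups[[p]], collapse = " "),
  character(1)
)

print(grouped)
```

In this toy data "HTTP" is removed because it occurs in every document, and "html" and "index" collapse into a single row because they always appear together.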

Usage

loadPrismaData(path, maxLines = -1, fastSally = TRUE,
               alpha = 0.05, skipFeatureCorrelation=FALSE)

Arguments

`path`	path of the data file without the .sally extension. loadPrismaData loads path.sally or path.fsally depending on the fastSally switch.
`maxLines`	maximal number of lines to read from the data file. -1 means to read all lines.
`fastSally`	should the .fsally file be used instead of the .sally file? Using it drastically decreases loading time.
`alpha`	significance level for the feature tests. If NULL, all features are kept.
`skipFeatureCorrelation`	should the grouping of features based on correlation analysis be skipped.
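
How path and fastSally interact can be sketched in a few lines. The helper below is hypothetical (resolveSallyFile is not a PRISMA function) and only mirrors the rule documented above: an extension is appended to path, and fastSally selects between .fsally and .sally. The path "data/asap" is a placeholder.

```r
# Hypothetical helper, not part of PRISMA: returns the file name that the
# documented rule implies for a given path and fastSally setting.
resolveSallyFile <- function(path, fastSally = TRUE) {
  ext <- if (fastSally) ".fsally" else ".sally"
  paste0(path, ext)
}

resolveSallyFile("data/asap")                     # "data/asap.fsally"
resolveSallyFile("data/asap", fastSally = FALSE)  # "data/asap.sally"
```

Since fastSally defaults to TRUE, a call with only path set expects the .fsally file to exist.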

Value

prismaData

data object representing the tokenized documents as a features x samples matrix.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

See http://www.mlsec.org/sally/ for the sally utility.

Examples

# please see the vignette for examples
# please see system.file("extdata","asap.tar.gz", package="PRISMA") for
# an example sally output
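
A possible way to try the function on the bundled archive is sketched below. The layout of asap.tar.gz is not documented on this page, so the code assumes only that the archive contains at least one .sally file and searches for it instead of hard-coding a name. The variable names are illustrative.

```r
# Sketch only: assumes the bundled archive contains at least one .sally
# file; the archive layout is an assumption, not documented behaviour.
if (requireNamespace("PRISMA", quietly = TRUE)) {
  archive <- system.file("extdata", "asap.tar.gz", package = "PRISMA")
  if (nzchar(archive)) {
    exdir <- tempfile("asap")
    dir.create(exdir)
    utils::untar(archive, exdir = exdir)

    sallyFiles <- list.files(exdir, pattern = "\\.sally$",
                             recursive = TRUE, full.names = TRUE)
    if (length(sallyFiles) > 0) {
      # loadPrismaData expects the path without the .sally extension.
      path <- sub("\\.sally$", "", sallyFiles[1])
      # Use the preprocessed .fsally file only if it is actually present.
      useFast <- file.exists(paste0(path, ".fsally"))
      asapData <- PRISMA::loadPrismaData(path, fastSally = useFast)
    }
  }
}
```

Checking for the .fsally file first avoids relying on the default fastSally = TRUE when only the raw .sally output is available.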

[Package PRISMA version 0.2-7 Index]