loadPrismaData {PRISMA}    R Documentation

Load PRISMA Data Files

Description

Loads files generated by the sally tool (see http://www.mlsec.org/sally/) and represents the data as a binary matrix of tokens/n-grams x documents. After loading, statistical tests are applied to keep only features that are neither volatile nor constant. Co-occurring features are then grouped to compact the data further. See system.file("extdata","sallyPreprocessing.py", package="PRISMA") for a Python script that generates the corresponding .fsally file from a .sally file; loading the .fsally file via loadPrismaData is considerably faster.
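
The idea of dropping constant features and grouping co-occurring ones can be illustrated on a toy matrix. The following is a minimal base R sketch, not the PRISMA implementation: it replaces the statistical tests with a simple "present in some but not all documents" check and groups only features whose occurrence patterns are exactly identical. The matrix, the token names, and the variables m, varying, pattern, groups and compact are all made up for illustration.

```r
# Toy binary features x documents matrix (rows = tokens, columns = documents).
# All names and values are invented for illustration.
m <- rbind(
  GET  = c(1, 1, 0, 0),
  HTTP = c(1, 1, 0, 0),  # always co-occurs with GET
  POST = c(0, 0, 1, 1),
  Host = c(1, 1, 1, 1)   # constant: present in every document
)
colnames(m) <- paste0("doc", 1:4)

# Simplified stand-in for the feature tests: keep only features that
# occur in at least one, but not in all, documents.
counts <- rowSums(m)
varying <- m[counts > 0 & counts < ncol(m), , drop = FALSE]

# Group features with identical occurrence patterns.
pattern <- apply(varying, 1, paste, collapse = "")
groups <- split(rownames(varying), pattern)

# Keep one row per group and label it with all member features.
first <- !duplicated(pattern)
compact <- varying[first, , drop = FALSE]
rownames(compact) <- unname(
  vapply(groups[pattern[first]], paste, character(1), collapse = " ")
)

print(compact)
```

In this sketch the constant Host row is dropped and GET and HTTP collapse into a single row, so the four features shrink to two.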

Usage

loadPrismaData(path, maxLines = -1, fastSally = TRUE,
               alpha = 0.05, skipFeatureCorrelation=FALSE)

Arguments

path

path of the data file without the .sally extension. loadPrismaData loads path.sally or path.fsally, depending on the fastSally switch.

maxLines

maximum number of lines to read from the data file. -1 reads all lines.

fastSally

logical; if TRUE, the .fsally file is read instead of the .sally file, which drastically decreases loading time.

alpha

significance level for the feature tests. If NULL, all features are kept.

skipFeatureCorrelation

logical; if TRUE, the grouping of features based on correlation analysis is skipped.
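
The naming rule described for path and fastSally can be sketched as follows. sallyFileFor is a hypothetical helper written only for this illustration and is not part of PRISMA; it mirrors the documented rule, which may be handy for checking that the expected file exists before calling loadPrismaData. The path "data/asap" is likewise just an example.

```r
# Hypothetical helper (not part of PRISMA): builds the file name that
# corresponds to a given path and fastSally setting, per the documented rule.
sallyFileFor <- function(path, fastSally = TRUE) {
  paste0(path, if (fastSally) ".fsally" else ".sally")
}

fastFile  <- sallyFileFor("data/asap")                     # "data/asap.fsally"
plainFile <- sallyFileFor("data/asap", fastSally = FALSE)  # "data/asap.sally"

# Example pre-check before loading
if (!file.exists(fastFile)) {
  message("No such file: ", fastFile)
}
```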

Value

prismaData

data object representing the tokenized documents as a features x samples matrix.

Author(s)

Tammo Krueger <tammokrueger@googlemail.com>

References

See http://www.mlsec.org/sally/ for the sally utility.

Examples

# please see the vignette for examples
# please see system.file("extdata","asap.tar.gz", package="PRISMA") for
# an example sally output
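
A possible way to try the function on the bundled archive is sketched below. The layout of asap.tar.gz and the base name "asap" are assumptions and are not confirmed by this page, so the sketch lists the unpacked files first and only calls loadPrismaData if a matching file is found. The variables archive, workDir, base, useFast and loaded are illustrative names.

```r
# Sketch only: the archive layout and the base name "asap" are assumptions.
loaded <- NULL
if (requireNamespace("PRISMA", quietly = TRUE)) {
  archive <- system.file("extdata", "asap.tar.gz", package = "PRISMA")
  if (nzchar(archive)) {
    # Unpack into a fresh temporary directory
    workDir <- tempfile("prisma")
    dir.create(workDir)
    utils::untar(archive, exdir = workDir)

    # Inspect what was unpacked; adjust 'base' if the files live elsewhere
    print(list.files(workDir, recursive = TRUE))

    base <- file.path(workDir, "asap")  # assumed base name, without extension
    useFast <- file.exists(paste0(base, ".fsally"))
    if (useFast || file.exists(paste0(base, ".sally"))) {
      # Use the .fsally file when it is available, else the plain .sally file
      loaded <- PRISMA::loadPrismaData(base, fastSally = useFast)
      str(loaded, max.level = 1)
    }
  }
}
```

If the package is not installed or the files are named differently, loaded simply stays NULL.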

[Package PRISMA version 0.2-7 Index]