loadPrismaData {PRISMA} | R Documentation |
Load PRISMA Data Files
Description
Loads files generated by the sally tool (see
http://www.mlsec.org/sally/) and represents the data as a binary
token/n-gram x documents matrix. After loading, statistical tests are
applied to find features that are neither volatile nor
constant. Co-occurring features are grouped to further compact the
data. See system.file("extdata","sallyPreprocessing.py",
package="PRISMA")
for a Python script that generates the
corresponding .fsally file from a .sally file, which reduces the
loading time of loadPrismaData
considerably.
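The following base-R sketch illustrates the kind of data structure described above on toy data. It is a simplified stand-in, not PRISMA's implementation: the documents are made up, the "test" simply drops features that occur in every document or in none (the package applies statistical tests at level alpha instead), and co-occurring features are taken to be those with identical occurrence patterns.

```r
# Toy documents, already split into tokens (hypothetical data).
docs <- list(
  d1 = c("GET", "index", "HTTP"),
  d2 = c("GET", "login", "user", "HTTP"),
  d3 = c("POST", "login", "user", "HTTP")
)

# Binary features x documents matrix: 1 if the token occurs in the document.
tokens <- sort(unique(unlist(docs)))
m <- sapply(docs, function(d) as.integer(tokens %in% d))
rownames(m) <- tokens

# Crude stand-in for the feature tests (an assumption, not PRISMA's test):
# drop features that occur in all documents or in none.
freq <- rowMeans(m)
kept <- m[freq > 0 & freq < 1, , drop = FALSE]

# Group co-occurring features: rows with identical occurrence patterns
# are merged into a single row named after all of its members.
pattern <- apply(kept, 1, paste, collapse = "")
groups <- split(rownames(kept), pattern)
first <- !duplicated(pattern)
grouped <- kept[first, , drop = FALSE]
merged <- vapply(groups[pattern[first]], paste, character(1), collapse = " ")
rownames(grouped) <- unname(merged)

print(grouped)
```

Here "HTTP" is dropped because it occurs in every document, and "login" and "user" collapse into one feature because they always occur together.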
Usage
loadPrismaData(path, maxLines = -1, fastSally = TRUE,
alpha = 0.05, skipFeatureCorrelation = FALSE)
Arguments
path |
path of the data file without the .sally extension. loadPrismaData loads path.sally or path.fsally depending on the fastSally switch. |
maxLines |
maximum number of lines to read from the data file. -1 reads all lines. |
fastSally |
whether to use the .fsally file, which drastically decreases loading time. |
alpha |
significance level for the feature tests. If NULL, all features are kept. |
skipFeatureCorrelation |
whether to skip the grouping of features based on correlation analysis. |
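As a minimal sketch of how path and fastSally combine: the helper sallyInputFile below is hypothetical and not part of PRISMA, and data/asap is a placeholder path. The call to loadPrismaData runs only if the package is installed and the input file exists.

```r
# Hypothetical helper (not part of PRISMA): the file name that would be
# read for a given path and fastSally setting, per the argument description.
sallyInputFile <- function(path, fastSally = TRUE) {
  paste0(path, if (fastSally) ".fsally" else ".sally")
}

base <- file.path("data", "asap")  # placeholder path, given without extension
input <- sallyInputFile(base)

# Load at most 1000 lines, and only if package and file are available.
if (requireNamespace("PRISMA", quietly = TRUE) && file.exists(input)) {
  prismaData <- PRISMA::loadPrismaData(base, maxLines = 1000,
                                       fastSally = TRUE, alpha = 0.05)
  print(prismaData)
}
```

Passing alpha = NULL instead would keep all features, as noted in the argument description.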
Value
prismaData |
data object representing the tokenized documents as a features x samples matrix. |
Author(s)
Tammo Krueger <tammokrueger@googlemail.com>
References
See http://www.mlsec.org/sally/ for the sally utility.
Examples
# please see the vignette for examples
# please see system.file("extdata","asap.tar.gz", package="PRISMA") for
# an example sally output
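To inspect the bundled example archive without assuming anything about its contents, one option is to list its members first. This is only a sketch: it relies on the system.file call quoted above and on utils::untar, and it does nothing if PRISMA is not installed.

```r
# system.file returns "" when the package or the file cannot be found.
archive <- system.file("extdata", "asap.tar.gz", package = "PRISMA")

if (nzchar(archive)) {
  # List the archive members without extracting them.
  print(untar(archive, list = TRUE))
  # Extract into a temporary directory for further inspection.
  target <- file.path(tempdir(), "asap_example")
  untar(archive, exdir = target)
  print(list.files(target, recursive = TRUE))
}
```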