textToXY,textToXYpred {regtools}    R Documentation

Tools for Text Classification

Description

"R-style," classification-oriented wrappers for the text2vec package.

Usage

    textToXY(docs,labels,kTop=50,stopWords='a') 
    textToXYpred(ttXYout,predDocs) 

Arguments

docs

Character vector, one element per document.

predDocs

Character vector of new documents to be classified, one element per document.

labels

Class labels, as numeric, character or factor. NULL is used at the prediction stage.

kTop

The number of most-frequent words to retain; 0 means retain all.

stopWords

Character vector of common words, e.g. prepositions, to delete. The recommended value is tm::stopwords('english').

ttXYout

Output object from textToXY.

Details

A typical classification/machine learning package will have as arguments a feature matrix X and a labels vector/factor Y. For a "bag of words" analysis in the text case, each row of X would be a document and each column a word.
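As a rough illustration of that layout, the following base-R sketch builds a small document-by-word count matrix by hand. It is not how text2vec or textToXY do the work internally; the toy corpus is made up, and the names stopWords and kTop are borrowed from the arguments above only to show what those ideas mean for X.

```r
# Toy corpus (hypothetical data): one element per document
docs <- c("the cat sat on the mat",
          "the dog chased the cat",
          "a dog barked")

# Tokenize: lower-case, then split on anything that is not a letter
tokens <- strsplit(tolower(docs), "[^a-z]+")

# Remove stop words (a tiny hand-picked list, for illustration only)
stopWords <- c("a", "the", "on")
tokens <- lapply(tokens, function(tk) tk[nzchar(tk) & !(tk %in% stopWords)])

# Vocabulary: the sorted unique words across all documents
vocab <- sort(unique(unlist(tokens)))

# X: one row per document, one column per word, entries are word counts
X <- t(vapply(tokens,
              function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
              integer(length(vocab))))
colnames(X) <- vocab

# Keep only the kTop most frequent words (by total count over the corpus)
kTop <- 2
keep <- names(sort(colSums(X), decreasing = TRUE))[seq_len(kTop)]
Xtop <- X[, keep, drop = FALSE]

print(X)
print(Xtop)
```

Here X has 3 rows (documents) and 6 columns (words), and Xtop retains only the columns for "cat" and "dog", the two words that occur most often.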

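The functions here are basically wrappers for generating X. Wrappers are convenient in that they give text2vec an "R-style" interface and are oriented toward classification, producing X and Y directly from the documents and labels.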

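The typical usage pattern is thus: (1) run the training documents and their labels through textToXY, generating X and Y; (2) fit the classification/machine learning method of your choice to X and Y; (3) when predicting the classes of new documents, run them through textToXYpred, together with the object returned by textToXY, to produce the new X; (4) apply the fitted method's prediction function to that new X.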
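A two-stage workflow of this kind can be sketched in base R as follows. The functions toyTextToXY and toyTextToXYpred are hypothetical stand-ins written for this illustration; they are not the regtools functions and make no claim about their internals. The sketch assumes that the prediction stage reuses the training vocabulary and stop words, so that the new X has the same columns as the training X. A simple nearest-centroid rule stands in for "your favorite classifier."

```r
# Hypothetical helpers: NOT the regtools implementation
tokenize <- function(docs, stopWords) {
  lapply(strsplit(tolower(docs), "[^a-z]+"),
         function(tk) tk[nzchar(tk) & !(tk %in% stopWords)])
}

countMatrix <- function(tokens, vocab) {
  # Documents in rows, vocabulary words in columns
  x <- matrix(0L, nrow = length(tokens), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  for (i in seq_along(tokens)) {
    idx <- match(tokens[[i]], vocab)
    idx <- idx[!is.na(idx)]          # words outside the vocabulary are dropped
    x[i, ] <- tabulate(idx, nbins = length(vocab))
  }
  x
}

toyTextToXY <- function(docs, labels, stopWords = "a") {
  tokens <- tokenize(docs, stopWords)
  vocab <- sort(unique(unlist(tokens)))
  list(x = countMatrix(tokens, vocab), y = labels, stopWords = stopWords)
}

toyTextToXYpred <- function(ttXYout, predDocs) {
  # Reuse the training stop words and vocabulary so the columns line up
  countMatrix(tokenize(predDocs, ttXYout$stopWords), colnames(ttXYout$x))
}

# Step 1: training documents and labels -> X and Y (toy data)
trainDocs <- c("cheap pills buy now", "meeting agenda for monday",
               "buy cheap watches", "monday project meeting notes")
trainLabels <- c("spam", "ham", "spam", "ham")
xy <- toyTextToXY(trainDocs, trainLabels, stopWords = c("a", "for"))

# Step 2: fit a classifier to X and Y; here, one mean count vector per class
centroids <- rowsum(xy$x, xy$y) / as.vector(table(xy$y))

# Step 3: new documents -> new X with the same columns as the training X
newX <- toyTextToXYpred(xy, c("buy pills now", "notes for the meeting"))

# Step 4: predict, assigning each new row to the nearest class centroid
dists <- sapply(rownames(centroids), function(cl)
  rowSums(sweep(newX, 2, centroids[cl, ])^2))
pred <- colnames(dists)[apply(dists, 1, which.min)]
print(pred)   # "spam" "ham"
```

The point of passing the first-stage output to the second stage is visible in step 3: words that never appeared in training (such as "the" above) have no column and are ignored, so newX can be fed to a model fitted on xy$x.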

Value

The function textToXY returns an R list with components x and y for X and Y, and a copy of the input stopWords.

The function textToXYpred returns X for the documents in predDocs.

Author(s)

Norm Matloff


[Package regtools version 1.7.0 Index]