createDTM {tmcn}        R Documentation

Create a Chinese term-document matrix or a document-term matrix.

Description

Create a document-term matrix (createDTM) or a term-document matrix (createTDM) from a character vector of text, typically Chinese.

Usage

createDTM(string, language = c("zh", "en"), tokenize = NULL, removePunctuation = TRUE, 
  removeNumbers = TRUE, removeStopwords = TRUE)
createTDM(string, language = c("zh", "en"), tokenize = NULL, removePunctuation = TRUE, 
  removeNumbers = TRUE, removeStopwords = TRUE)

Arguments

string

A character vector, with one element per document.

language

The language of the text: 'zh' means Chinese, 'en' means English.

tokenize

A tokenizer function. The default is NULL.

removePunctuation

Logical; whether to remove punctuation. The default is TRUE.

removeNumbers

Logical; whether to remove numbers. The default is TRUE.

removeStopwords

Logical; whether to remove stop words. The default is TRUE.

Details

The "tm" package is required.

Value

createTDM returns an object of class TermDocumentMatrix; createDTM returns an object of class DocumentTermMatrix.
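
As an illustration of the usage and return value described above, the following is a minimal sketch, not taken from the package's own examples. It assumes that both "tmcn" and "tm" are installed, and the two sample sentences in docs are invented for the illustration. No output is shown because the exact terms kept depend on the tokenizer and stop word list in use.

```r
# Minimal sketch: assumes the tmcn and tm packages are installed.
library(tm)
library(tmcn)

# Hypothetical input: one element per document.
docs <- c("The first document talks about text mining.",
          "The second document talks about matrices and 123 numbers.")

# Document-term matrix: documents in rows, terms in columns.
dtm <- createDTM(docs, language = "en")

# Term-document matrix: terms in rows, documents in columns.
tdm <- createTDM(docs, language = "en",
                 removePunctuation = TRUE,
                 removeNumbers = TRUE,
                 removeStopwords = TRUE)

class(dtm)
class(tdm)
dim(dtm)
dim(tdm)
```

Because the results are ordinary "tm" matrix objects, they can be passed on to "tm" functions such as inspect().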

Author(s)

Jian Li <rweibo@sina.com>


[Package tmcn version 0.2-13 Index]