m2doc {chinese.misc}    R Documentation

Rewrite Terms and Frequencies into Many Files

Description

Given a matrix representing a document term matrix, this function treats each row as the term frequencies of one document and rewrites each row as a text. Some text mining tools other than R accept segmented Chinese texts as input. If you have already converted your texts into a matrix, you can use this function to convert it back into texts, which can then be used to build a corpus or to create a document term matrix again.

Usage

m2doc(m, checks = FALSE)

Arguments

m

A numeric matrix; a data frame is not allowed. It must be a document term matrix rather than a term document matrix, so each row represents one text. The column names are the terms to be written; if they are NULL, the function uses "term1", "term2", "term3", ... instead. The matrix must not contain NA.

checks

Either TRUE or FALSE. If TRUE, the function checks whether the input contains any NA, whether it is numeric, and whether it contains any negative number. The default is FALSE to save time.
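
Because checks defaults to FALSE, it can be useful to validate a matrix once before calling m2doc repeatedly. The following is a minimal sketch of the three checks described above, plus the matrix requirement from the description of m. The function name check_dtm is hypothetical and is not part of chinese.misc; the package's own checks may differ in order and in error messages.

```r
# Hypothetical validator (not part of chinese.misc) mirroring the
# conditions that the documentation places on the argument m.
check_dtm <- function(m) {
  # A data frame is not allowed, so require a numeric matrix.
  if (!is.matrix(m) || !is.numeric(m)) stop("m must be a numeric matrix")
  # Test for NA first, so the comparison below never yields NA.
  if (anyNA(m)) stop("m must not contain NA")
  if (any(m < 0)) stop("m must not contain negative numbers")
  invisible(TRUE)
}

ok <- check_dtm(matrix(1:4, nrow = 2))
bad <- try(check_dtm(matrix(c(1, NA), nrow = 1)), silent = TRUE)
```

Here ok is TRUE, and bad holds the error raised for the matrix that contains NA.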

Value

A character vector. Each element is one text, in which each term is repeated (by rep) according to its frequency and the terms are separated by single spaces.
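
To illustrate the shape of the return value, the sketch below rebuilds one text per row using only base R. It is an assumption about how such a conversion can be written, not the package's source code; the helper name row_to_text is hypothetical, and the term order produced by m2doc itself may differ.

```r
# Hypothetical helper, not the package's implementation:
# rebuild one text from a row of term frequencies.
row_to_text <- function(freq, terms) {
  # rep() repeats each term by its frequency; terms with a zero
  # count are dropped. The pieces are then joined by single spaces.
  paste(rep(terms, times = freq), collapse = " ")
}

# A small document term matrix: 2 texts (rows), 3 terms (columns).
m <- matrix(c(2, 0, 1,
              1, 3, 0), nrow = 2, byrow = TRUE)
colnames(m) <- c("text", "mining", "data")

# Apply the helper to every row to get a character vector.
texts <- vapply(seq_len(nrow(m)),
                function(i) row_to_text(m[i, ], colnames(m)),
                character(1))
texts
```

With this input, texts contains "text text data" and "text mining mining mining".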

Examples

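# Draw 20 random term frequencies between 1 and 5
s <- sample(1:5, 20, replace = TRUE)
# Arrange them as a 5 x 4 document term matrix (5 texts, 4 terms)
m <- matrix(s, nrow = 5)
# Name the columns with the terms to be written
colnames(m) <- c("r", "text", "mining", "data")
# Rewrite each row as a text
m2doc(m)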
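
The Description mentions creating a document term matrix again from the rewritten texts. The sketch below shows one way this could be done in base R, assuming texts of the kind that m2doc returns (terms separated by single spaces). The variable names are illustrative, and this is not how chinese.misc itself builds a document term matrix.

```r
# Texts of the form described under Value: terms separated by spaces.
texts <- c("text text data", "text mining mining mining")

# Split each text into its terms.
tokens <- strsplit(texts, " ", fixed = TRUE)

# The vocabulary becomes the columns of the rebuilt matrix.
terms <- sort(unique(unlist(tokens)))

# Count every vocabulary term in every text; factor() with fixed
# levels keeps zero counts, so each text yields a full-length vector.
# vapply() returns one column per text, so transpose to get one row
# per text.
dtm <- t(vapply(tokens,
                function(tok) as.numeric(table(factor(tok, levels = terms))),
                numeric(length(terms))))
colnames(dtm) <- terms
dtm
```

The result has one row per text and one column per term, so it can be passed to m2doc again.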

[Package chinese.misc version 0.2.3 Index]