A Text Mining Toolkit for Chinese


Documentation for package ‘tmcn’ version 0.2-13

Help Pages

catUTF8         Print the UTF-8 codes of a string.
createDTM       Create a Chinese term-document matrix or a document-term matrix.
createTDM       Create a Chinese term-document matrix or a document-term matrix.
createWordFreq  Create a word frequency data.frame.
GBK             GBK character set
getCharset      Get the current encoding of the locale.
isBIG5          Indicate whether the encoding of the input string is BIG5.
isGB18030       Indicate whether the encoding of the input string is GB18030.
isGB2312        Indicate whether the encoding of the input string is GB2312.
isGBK           Indicate whether the encoding of the input string is GBK.
isUTF8          Indicate whether the encoding of the input string is UTF-8.
left            Extract the left or right substrings in a character vector.
NTUSD           National Taiwan University Semantic Dictionary
revUTF8         Revert a UTF-8 string to Chinese characters.
right           Extract the left or right substrings in a character vector.
setchs          Set locale to Simplified Chinese/Traditional Chinese/UK.
setcht          Set locale to Simplified Chinese/Traditional Chinese/UK.
setuk           Set locale to Simplified Chinese/Traditional Chinese/UK.
SIMTRA          Dictionary of simplified and traditional Chinese
SPORT           Sport news.
STOPWORDS       Dictionary of Chinese stop words
stopwordsCN     Return Chinese stop words.
strcap          Mixed-case capitalization.
strextract      Extract matched substrings by regular expression.
strpad          Pad a string to a specified length with a padding character.
strstrip        Trim whitespace from a string.
toPinyin        Convert Chinese text to pinyin.
toTrad          Convert Chinese text from simplified to traditional characters and vice versa.
toUTF8          Convert the encoding of a Chinese string to UTF-8.