A Text Mining Toolkit for Chinese

Documentation for package ‘tmcn’ version 0.2-13

DESCRIPTION file.

Help Pages

catUTF8	Print the UTF-8 codes of a string.
createDTM	Create a Chinese term-document matrix or a document-term matrix.
createTDM	Create a Chinese term-document matrix or a document-term matrix.
createWordFreq	Create a word frequency data.frame.
GBK	GBK character set
getCharset	Get the current encoding of the locale.
isBIG5	Indicate whether the encoding of the input string is BIG5.
isGB18030	Indicate whether the encoding of the input string is GB18030.
isGB2312	Indicate whether the encoding of the input string is GB2312.
isGBK	Indicate whether the encoding of the input string is GBK.
isUTF8	Indicate whether the encoding of the input string is UTF-8.
left	Extract the left or right substrings in a character vector.
NTUSD	National Taiwan University Semantic Dictionary
revUTF8	Revert UTF-8 codes to Chinese characters.
right	Extract the left or right substrings in a character vector.
setchs	Set locale to Simplified Chinese/Traditional Chinese/UK.
setcht	Set locale to Simplified Chinese/Traditional Chinese/UK.
setuk	Set locale to Simplified Chinese/Traditional Chinese/UK.
SIMTRA	Dictionary of simplified and traditional Chinese
SPORT	Sport news.
STOPWORDS	Dictionary of Chinese stop words
stopwordsCN	Return Chinese stop words.
strcap	Mixed case capitalizing.
strextract	Extract matched substrings by regular expression.
strpad	Pad a string to a specified length with a padding character.
strstrip	Trim whitespace from a string.
toPinyin	Convert Chinese text to pinyin.
toTrad	Convert a Chinese text from simplified to traditional characters and vice versa.
toUTF8	Convert encoding of Chinese string to UTF-8.