catUTF8 |
Print the UTF-8 codes of a string. |
createDTM |
Create a Chinese term-document matrix or a document-term matrix. |
createTDM |
Create a Chinese term-document matrix or a document-term matrix. |
createWordFreq |
Create a word frequency data.frame. |
GBK |
GBK character set |
getCharset |
Get the current encoding of the locale. |
isBIG5 |
Indicate whether the encoding of the input string is BIG5. |
isGB18030 |
Indicate whether the encoding of the input string is GB18030. |
isGB2312 |
Indicate whether the encoding of the input string is GB2312. |
isGBK |
Indicate whether the encoding of the input string is GBK. |
isUTF8 |
Indicate whether the encoding of the input string is UTF-8. |
left |
Extract the left or right substrings in a character vector. |
NTUSD |
National Taiwan University Sentiment Dictionary |
revUTF8 |
Convert UTF-8 codes back to Chinese characters. |
right |
Extract the left or right substrings in a character vector. |
setchs |
Set locale to Simplified Chinese/Traditional Chinese/UK. |
setcht |
Set locale to Simplified Chinese/Traditional Chinese/UK. |
setuk |
Set locale to Simplified Chinese/Traditional Chinese/UK. |
SIMTRA |
Dictionary of simplified and traditional Chinese |
SPORT |
Sport news. |
STOPWORDS |
Dictionary of Chinese stop words |
stopwordsCN |
Return Chinese stop words. |
strcap |
Mixed case capitalizing. |
strextract |
Extract matched substrings by regular expression. |
strpad |
Pad a string to a specified length with a padding character. |
strstrip |
Trim whitespace from a string. |
toPinyin |
Convert a Chinese text to pinyin format. |
toTrad |
Convert a Chinese text from simplified to traditional characters and vice versa. |
toUTF8 |
Convert encoding of Chinese string to UTF-8. |
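
The index above gives only names and one-line descriptions, not argument lists, so the following is a minimal sketch in base R of what a few of these helpers do conceptually (compare strstrip, left, right, strpad, catUTF8, revUTF8 and isUTF8). The names left_n, right_n and pad_left are hypothetical and introduced only for illustration; they are not the package's functions, and the package's own implementations may differ.

```r
# Base-R analogues only; none of these are the package's own functions.

# Trim leading and trailing whitespace (compare strstrip).
trimmed <- trimws("  hello world  ")

# Left and right substrings of a character vector (compare left, right).
# left_n and right_n are hypothetical names.
left_n <- function(s, n) substr(s, 1, n)
right_n <- function(s, n) {
  len <- nchar(s)
  substr(s, pmax(len - n + 1, 1), len)
}

# Pad a string on the left to a given width (compare strpad).
# pad_left is a hypothetical name; pad is assumed to be a single character.
pad_left <- function(s, width, pad = " ") {
  missing_chars <- pmax(width - nchar(s), 0)
  paste0(strrep(pad, missing_chars), s)
}

# Unicode code points of a Chinese character and back again
# (compare catUTF8, revUTF8). "\u4e2d" is the character for "middle".
zhong <- "\u4e2d"
code_points <- utf8ToInt(zhong)
restored <- intToUtf8(code_points)

# Check that a string is valid UTF-8 (compare isUTF8).
is_valid <- validUTF8(zhong)

print(left_n("hello", 2))
print(right_n("hello", 3))
print(pad_left("7", 3, "0"))
```

The helpers are vectorised over s, so they accept a character vector as well as a single string. Converting between encodings such as GBK and UTF-8 (compare toUTF8) would typically go through base R's iconv, whose supported encoding names depend on the platform.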