Text Processing for Small or Big Data Files


Documentation for package ‘textTinyR’ version 1.1.8

Help Pages

batch_compute Compute batches
big_tokenize_transform String tokenization and transformation for big data sets
bytes_converter bytes converter of a text file ( KB, MB or GB )
cluster_frequency Frequencies of an existing cluster object
cosine_distance cosine distance of two character strings (each string consists of more than one word)
COS_TEXT Cosine similarity for text documents
Count_Rows Number of rows of a file
dense_2sparse convert a dense matrix to a sparse matrix
dice_distance dice similarity of words using n-grams
dims_of_word_vecs dimensions of a word vectors file
Doc2Vec Conversion of text documents to word-vector-representation features ( Doc2Vec )
JACCARD_DICE Jaccard or Dice similarity for text documents
levenshtein_distance levenshtein distance of two words
load_sparse_binary load a sparse matrix in binary format
matrix_sparsity sparsity percentage of a sparse matrix
read_characters read a specific number of characters from a text file
read_rows read a specific number of rows from a text file
save_sparse_binary save a sparse matrix in binary format
select_predictors Exclude highly correlated predictors
sparse_Means RowMeans and colMeans for a sparse matrix
sparse_Sums RowSums and colSums for a sparse matrix
sparse_term_matrix Term matrices and statistics ( document-term-matrix, term-document-matrix)
TEXT_DOC_DISSIM Dissimilarity calculation of text documents
text_file_parser text file parser
text_intersect intersection of words or letters in tokenized text
tokenize_transform_text String tokenization and transformation ( character string or path to a file )
tokenize_transform_vec_docs String tokenization and transformation ( vector of documents )
token_stats token statistics
utf_locale utf-locale for the available languages
vocabulary_parser returns the vocabulary counts for small or medium files ( xml and other formats )