Text Processing for Small or Big Data Files

Documentation for package ‘textTinyR’ version 1.1.8

Help Pages

batch_compute	Compute batches
big_tokenize_transform	String tokenization and transformation for big data sets
bytes_converter	bytes converter of a text file ( KB, MB or GB )
cluster_frequency	Frequencies of an existing cluster object
cosine_distance	cosine distance of two character strings (each string consists of more than one word)
COS_TEXT	Cosine similarity for text documents
Count_Rows	Number of rows of a file
dense_2sparse	convert a dense matrix to a sparse matrix
dice_distance	dice similarity of words using n-grams
dims_of_word_vecs	dimensions of a word vectors file
Doc2Vec	Conversion of text documents to word-vector-representation features ( Doc2Vec )
JACCARD_DICE	Jaccard or Dice similarity for text documents
levenshtein_distance	levenshtein distance of two words
load_sparse_binary	load a sparse matrix in binary format
matrix_sparsity	sparsity percentage of a sparse matrix
read_characters	read a specific number of characters from a text file
read_rows	read a specific number of rows from a text file
save_sparse_binary	save a sparse matrix in binary format
select_predictors	Exclude highly correlated predictors
sparse_Means	rowMeans and colMeans for a sparse matrix
sparse_Sums	rowSums and colSums for a sparse matrix
sparse_term_matrix	Term matrices and statistics ( document-term-matrix, term-document-matrix )
TEXT_DOC_DISSIM	Dissimilarity calculation of text documents
text_file_parser	text file parser
text_intersect	intersection of words or letters in tokenized text
tokenize_transform_text	String tokenization and transformation ( character string or path to a file )
tokenize_transform_vec_docs	String tokenization and transformation ( vector of documents )
token_stats	token statistics
utf_locale	utf-locale for the available languages
vocabulary_parser	returns the vocabulary counts for small or medium files ( xml and other formats )