text_intersect {textTinyR}    R Documentation
intersection of words or letters in tokenized text
Description
Computes the intersection of words or letters between paired tokenized text sequences.
Usage
# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)
Details
This class provides methods for computing the word or letter intersection of paired text sequences. If both distinct and letters are FALSE, the plain word intersection is computed, returned as a count by count_intersect() or as a ratio by ratio_intersect().
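To make the effect of distinct concrete, the following base R sketch computes a word-level intersection for each pair of token sequences. It is an illustration under stated assumptions, not the package's implementation: the helper names count_words_sketch and ratio_words_sketch are hypothetical, the non-distinct count is assumed to be the number of tokens of the first sequence that also occur in the second, and the ratio is assumed to be that count divided by the number of (distinct) tokens in the first sequence. The values returned by text_intersect may be defined differently.

```r
# Hypothetical helper, not part of textTinyR:
# word-level intersection count for one pair of token vectors.
count_words_sketch <- function(x, y, distinct = FALSE) {
  if (distinct) {
    # number of distinct tokens shared by both sequences
    length(intersect(x, y))
  } else {
    # assumption: every occurrence in x that also appears in y is counted
    sum(x %in% y)
  }
}

# Hypothetical helper: the count divided by the number of tokens
# (or distinct tokens) in x. The denominator is an assumption.
ratio_words_sketch <- function(x, y, distinct = FALSE) {
  denom <- if (distinct) length(unique(x)) else length(x)
  if (denom == 0) return(0)
  count_words_sketch(x, y, distinct) / denom
}

tok1 <- list(c('compare', 'this', 'text'),
             c('and', 'this', 'text', 'text'))
tok2 <- list(c('with', 'another', 'set'),
             c('of', 'text', 'documents'))

# one value per pair of sequences
counts <- mapply(count_words_sketch, tok1, tok2)
counts_distinct <- mapply(count_words_sketch, tok1, tok2,
                          MoreArgs = list(distinct = TRUE))
ratios <- mapply(ratio_words_sketch, tok1, tok2)
```

In this sketch the repeated token 'text' in the second pair is counted twice when distinct = FALSE and once when distinct = TRUE, which is the difference the argument is meant to control.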
Value
a numeric vector
Methods
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)
Methods
Public methods
Method new()
Usage
text_intersect$new(token_list1 = NULL, token_list2 = NULL)
Arguments
token_list1
a list in which each element is a tokenized text sequence (token_list1 must have the same length as token_list2)
token_list2
a list in which each element is a tokenized text sequence (token_list2 must have the same length as token_list1)
Method count_intersect()
Usage
text_intersect$count_intersect(distinct = FALSE, letters = FALSE)
Arguments
distinct
either TRUE or FALSE. If TRUE, only distinct words (or letters) are taken into account when computing the intersection
letters
either TRUE or FALSE. If TRUE, the intersection is computed over the letters of the text sequences rather than over words
Method ratio_intersect()
Usage
text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)
Arguments
distinct
either TRUE or FALSE. If TRUE, only distinct words (or letters) are taken into account when computing the intersection
letters
either TRUE or FALSE. If TRUE, the intersection is computed over the letters of the text sequences rather than over words
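The letters argument switches the comparison from whole tokens to individual characters. A minimal base R sketch of that idea follows, assuming each token sequence is first broken into single characters with strsplit(). The helper name letters_count_sketch is hypothetical and the counting rule is an assumption made for illustration; it is not taken from the package source.

```r
# Hypothetical helper, not part of textTinyR:
# letter-level intersection count for one pair of token vectors.
letters_count_sketch <- function(x, y, distinct = FALSE) {
  # break every token into single characters
  lx <- unlist(strsplit(x, ""))
  ly <- unlist(strsplit(y, ""))
  if (distinct) {
    # number of distinct letters shared by both sequences
    length(intersect(lx, ly))
  } else {
    # assumption: every letter occurrence in x that also appears in y is counted
    sum(lx %in% ly)
  }
}

x <- c('text', 'set')
y <- c('test')

letters_distinct <- letters_count_sketch(x, y, distinct = TRUE)
letters_all <- letters_count_sketch(x, y)
```

Here the shared distinct letters are 't', 'e' and 's', while the non-distinct variant counts every occurrence of those letters in x.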
Method clone()
The objects of this class are cloneable with this method.
Usage
text_intersect$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
References
https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi
Examples
library(textTinyR)
tok1 = list(c('compare', 'this', 'text'),
            c('and', 'this', 'text'))
tok2 = list(c('with', 'another', 'set'),
            c('of', 'text', 'documents'))
init = text_intersect$new(tok1, tok2)
init$count_intersect(distinct = TRUE, letters = FALSE)
init$ratio_intersect(distinct = FALSE, letters = TRUE)