R: intersection of words or letters in tokenized text

text_intersect {textTinyR}

R Documentation

intersection of words or letters in tokenized text

Description

Computes the intersection of words or letters in tokenized text.

Usage

# utl <- text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Details

This class provides methods for computing the intersection of words or characters between tokenized text sequences. If both distinct and letters are FALSE, the simple word intersection is computed (as a count or as a ratio, depending on the method called).
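
To make the difference between a simple and a distinct intersection concrete, the following is a minimal base-R sketch. The helpers simple_count and distinct_count are hypothetical names introduced here, and the definitions are an assumption about what the two modes mean (tokens of the first sequence found in the second, versus the size of the set intersection). They are not taken from the textTinyR source, so the package may count differently.

```r
# Illustrative helpers only, NOT textTinyR's implementation.
# Assumption: the simple count is the number of tokens in `a` that also occur in `b`.
simple_count <- function(a, b) sum(a %in% b)

# Assumption: the distinct count is the size of the set intersection.
distinct_count <- function(a, b) length(intersect(a, b))

a <- c("this", "text", "and", "this", "text")
b <- c("text", "documents")

simple_count(a, b)    # "text" appears twice in `a`, so both occurrences are counted
distinct_count(a, b)  # "text" is counted once
```

Under these assumed definitions, repeated tokens raise the simple count but leave the distinct count unchanged.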

Value

a numeric vector

Methods

text_intersect$new(token_list1 = NULL, token_list2 = NULL)
--------------
count_intersect(distinct = FALSE, letters = FALSE)
--------------
ratio_intersect(distinct = FALSE, letters = FALSE)

Methods

Method `new()`

Usage

text_intersect$new(token_list1 = NULL, token_list2 = NULL)

Arguments

token_list1: a list, where each sublist is a tokenized text sequence (token_list1 must have the same length as token_list2)
token_list2: a list, where each sublist is a tokenized text sequence (token_list2 must have the same length as token_list1)
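
A short sketch of how the two token lists might be prepared from raw strings and checked before construction. The whitespace tokenization with strsplit is an assumption made for illustration (any tokenizer that yields a list of character vectors would do), and the constructor call is guarded so the snippet also runs where textTinyR is not installed.

```r
# Two parallel sets of documents: docs1[i] is meant to be compared with docs2[i].
docs1 <- c("compare this text", "and this text")
docs2 <- c("with another set", "of text documents")

# Assumption: simple whitespace tokenization; strsplit() returns a list of
# character vectors, one per input string.
tok1 <- strsplit(docs1, "\\s+")
tok2 <- strsplit(docs2, "\\s+")

# Both arguments must be lists of the same length.
stopifnot(is.list(tok1), is.list(tok2), length(tok1) == length(tok2))

# Construct the object only if the package is available.
if (requireNamespace("textTinyR", quietly = TRUE)) {
  init <- textTinyR::text_intersect$new(token_list1 = tok1, token_list2 = tok2)
}
```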

Method `count_intersect()`

Usage

text_intersect$count_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct: either TRUE or FALSE. If TRUE, the intersection is computed on distinct words (or letters) only
letters: either TRUE or FALSE. If TRUE, the intersection is computed on the letters of the text sequences rather than on whole words
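
To illustrate what a letter-level intersection operates on, here is a base-R sketch that splits two token vectors into single characters and intersects the distinct letters. The helper to_letters is a hypothetical name, and the approach is an assumption about the general idea behind letters = TRUE, not the package's internal code.

```r
# Illustrative only, NOT textTinyR's implementation.
# Split every token into its individual characters and flatten the result.
to_letters <- function(tokens) unlist(strsplit(tokens, ""))

a <- c("and", "this", "text")
b <- c("of", "text", "documents")

letters_a <- to_letters(a)
letters_b <- to_letters(b)

# Distinct letters that occur in both sequences.
shared <- intersect(letters_a, letters_b)
shared
length(shared)
```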

Method `ratio_intersect()`

Usage

text_intersect$ratio_intersect(distinct = FALSE, letters = FALSE)

Arguments

distinct: either TRUE or FALSE. If TRUE, the intersection is computed on distinct words (or letters) only
letters: either TRUE or FALSE. If TRUE, the intersection is computed on the letters of the text sequences rather than on whole words
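
Since distinct and letters can be combined freely, it can help to compare all four settings side by side. The sketch below uses only the documented constructor and ratio_intersect() method; the flags grid and the result names are introduced here for illustration. The method call is guarded so the snippet is harmless where textTinyR is not installed, and no output is shown because the values depend on the package's definition of the ratio.

```r
tok1 <- list(c("compare", "this", "text"), c("and", "this", "text"))
tok2 <- list(c("with", "another", "set"), c("of", "text", "documents"))

# All four combinations of the two logical arguments.
flags <- expand.grid(distinct = c(FALSE, TRUE), letters = c(FALSE, TRUE))

if (requireNamespace("textTinyR", quietly = TRUE)) {
  init <- textTinyR::text_intersect$new(token_list1 = tok1, token_list2 = tok2)

  # One numeric vector per flag combination.
  res <- lapply(seq_len(nrow(flags)), function(i) {
    init$ratio_intersect(distinct = flags$distinct[i], letters = flags$letters[i])
  })
  names(res) <- paste0("distinct=", flags$distinct, ", letters=", flags$letters)
  print(res)
}
```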

Method `clone()`

The objects of this class are cloneable with this method.

Usage

text_intersect$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

References

https://www.kaggle.com/c/home-depot-product-search-relevance/discussion/20427 by Igor Buinyi

Examples

library(textTinyR)

# two lists of tokenized text sequences, both of the same length
tok1 = list(c('compare', 'this', 'text'),
            c('and', 'this', 'text'))

tok2 = list(c('with', 'another', 'set'),
            c('of', 'text', 'documents'))

init = text_intersect$new(tok1, tok2)

# count-based intersection of distinct words
init$count_intersect(distinct = TRUE, letters = FALSE)

# ratio-based intersection of letters
init$ratio_intersect(distinct = FALSE, letters = TRUE)

[Package textTinyR version 1.1.8 Index]
