R: Characteristic words and documents (LexChar)

LexChar {Xplortext}

R Documentation

Characteristic words and documents (LexChar)

Description

Measure of the association between vocabulary or words and quantitative or qualitative contextual variables.

Usage

LexChar(object, proba=0.05, maxCharDoc=10, maxPrnDoc=100, 
              marg.doc="before",  context=NULL, correct=TRUE, nbsample=500,
              seed=12345,...)

Arguments

`object`	TextData, DocumentTermMatrix, dataframe or matrix object
`proba`	threshold on the p-value used when selecting the characteristic words (by default 0.05)
`maxCharDoc`	maximum number of characteristic source-documents to extract (by default 10). See details
`maxPrnDoc`	maximum length to be printed for a characteristic document (by default 100 characters)
`marg.doc`	if after/before, frequencies after/before TextData selection are used as document weighting (by default "before"); if before.RW all words under threshold in TextData function are included as a new word named RemovedWords
`context`	name of quantitative or qualitative variables
`correct`	if TRUE, pvalue correction test is applied for quantitative contextual variables (by default TRUE)
`nbsample`	number of samples drawn to evaluate the pvalues in quantitative contextual variables
`seed`	Seed to obtain the same results using permutation tests (by default 12345)
`...`	further arguments passed to or from other methods

Details

The lexical table provided by TextData can consider either source-documents or aggregate-documents, in accordance with the value of argument "var.agg" in TextData. Context cualitative variables allow to aggregate documents by combining the categories of the qualitative variables and the aggregation variable if any.

Extracting the characteristic words (CharWord) for a too high number of documents is of no interest and time-consuming.

In any case, only the first maxPrnDoc characters of each characteristic document are printed (by default 100).

In the case of the association between words and qualitative variables, the usual characteristic words are provided.

quali$CharWord provides the qualitative variables (including the aggregation variable) and their categories. quali$stats provides association statistics for vocabulary and qualitative variables (including the aggregation variable). quali$CharDoc provides characteristic source-documents for the categories. quanti$CharWord provides characteristic quantitative variables for each word. If there are aggregation variable and/or qualitative contextual variable, from aggregated lexical table. quanti$stats provides statistics for vocabulary and quantitative variables. If there are aggregation variable and/or qualitative contextual variable, from aggregated lexical table.

If the lexical table (object) is not a TextData object, context argument can be columns of the same dataframe. The aggregate lexical table is constructed from the combinations of the categories of the qualitative variables (including the aggregation variable).

Value

Returns a list including:

`CharWord`	characteristic words of all the documents
`stats`	association statistics of the lexical table
`CharDoc`	characteristic source-documents of all the aggregate-documents including qualitative contextual variables
`Vocab`	characteristic quantitative and qualitative variables of the words. CharWord and stats are provided.

Author(s)

Monica Bécue-Bertaut, Ramón Alvarez-Esteban ramon.alvarez@unileon.es, Josep-Antón Sánchez-Espigares, Belchin Kostov

References

Lebart, L., Salem, A., & Berry, L. (1998). Exploring textual data. (D. Kluwer, Ed.). doi:10.1007/978-94-017-1525-6.

Examples

data(open.question)
 res.TD<-TextData(open.question, var.text=c(9,10), var.agg="Gen_Edu", Fmin=10, Dmin=10,
                   remov.number=TRUE, stop.word.tm=TRUE)
 res.LexChar <-LexChar(res.TD)
 summary(res.LexChar)

[Package Xplortext version 1.5.3 Index]