i_mca {idm}R Documentation

Incremental Multiple Correspondence Analysis (MCA)

Description

This function computes the Multiple Correspondence Analysis (MCA) solution on the indicator matrix using two incremental methods described in Iodice D'Enza & Markos (2015)

Usage

i_mca(data1, data2, method=c("exact","live"), current_rank, nchunk = 2, 
 ff = 0, disk = FALSE)

Arguments

data1

Matrix or data frame of starting data or full data if data2 = NULL

data2

Matrix or data frame of incoming data

method

String specifying the type of implementation: "exact" or "live". "exact" refers to the case when all the data is available from the start and dimension reduction is based on the method of Hall et al. (2002). "live" refers to the case when new data comes in as data flows and dimension reduction is based on the method of Ross et al. (2008). The main difference between the two approaches lies in the calculation of the column margins of the input matrix. For the "exact" approach, the analysis is based on the "global" margins, that is, the margins of the whole indicator matrix, which is available in advance. For the "live" approach, the whole matrix is unknown and the global margins are approximated by the "local" margins, that is, the average margins of the data analysed insofar. A detailed description of the two implementations is provided in Iodice D' Enza & Markos (2015).

current_rank

Rank of approximation or number of components to compute; if empty, the full rank is used

nchunk

Number of incoming data chunks (equal splits of 'data2', default = 2) or a Vector with the row size of each incoming data chunk

ff

Number between 0 and 1 indicating the "forgetting factor" used to down-weight the contribution of earlier data blocks to the current solution. When ff = 0 (default) no forgetting occurs; applicable only when method ="live"

disk

Logical indicating whether then output is saved to hard disk

Value

rowpcoord

Row principal coordinates

colpcoord

Column principal coordinates

rowcoord

Row standard coordinates

colcoord

Column standard coordinates

sv

Singular values

inertia.e

Percentages of explained inertia

levelnames

Column labels

rowctr

Row contributions

colctr

Column contributions

rowcor

Row squared correlations

colcor

Column squared correlations

rowmass

Row masses

colmass

Column masses

nchunk

A copy of nchunk in the return object

disk

A copy of disk in the return object

ff

A copy of ff in the return object

allrowcoord

A list containing the row principal coordinates produced after each data chunk is analyzed; returned only when disk = FALSE

allcolcoord

A list containing the column principal coordinates on the principal components produced after each data chunk is analyzed; returned only when disk = FALSE

allrowctr

A list containing the row contributions after each data chunk is analyzed; returned only when disk = FALSE

allcolctr

A list containing the column contributions after each data chunk is analyzed; returned only when disk = FALSE

allrowcor

A list containing the row squared correlations produced after each data chunk is analyzed; returned only when disk = FALSE

allcolcor

A list containing the column squared correlations produced after each data chunk is analyzed; returned only when disk = FALSE

References

Hall, P., Marshall, D., & Martin, R. (2002). Adding and subtracting eigenspaces with eigenvalue decomposition and singular value decomposition. Image and Vision Computing, 20(13), 1009-1016.

Iodice D' Enza, A., & Markos, A. (2015). Low-dimensional tracking of association structures in categorical data, Statistics and Computing, 25(5), 1009–1022.

Iodice D'Enza, A., Markos, A., & Buttarazzi, D. (2018). The idm Package: Incremental Decomposition Methods in R. Journal of Statistical Software, Code Snippets, 86(4), 1–24. DOI: 10.18637/jss.v086.c04.

Ross, D. A., Lim, J., Lin, R. S., & Yang, M. H. (2008). Incremental learning for robust visual tracking. International Journal of Computer Vision, 77(1-3), 125–141.

See Also

update.i_mca, i_pca, update.i_pca, add_es

Examples

##Example 1 - Exact case
data("women", package = "idm")
nc = 5 # number of chunks
res_iMCAh = i_mca(data1 = women[1:300,1:7], data2 = women[301:2107,1:7]
,method = "exact", nchunk = nc)
#static MCA plot of attributes on axes 2 and 3
plot(x = res_iMCAh, dim = c(2,3), what = c(FALSE,TRUE), animation = FALSE)

#\donttest is used here because the code calls the saveLatex function of the animation package 
#which requires ImageMagick or GraphicsMagick and 
#Adobe Acrobat Reader to be installed in your system 
#Creates animated plot in PDF for objects and variables
plot(res_iMCAh, animation = TRUE, frames = 10, movie_format = 'pdf')


##Example 2 - Live case
data("tweet", package = "idm")
nc = 5
#provide attributes with custom labels
labels = c("HLTN", "ICN", "MRT","BWN","SWD","HYT","CH", "-", "-/+", "+", "++", "Low", "Med","High")
#mimics the 'live' MCA implementation 
res_iMCAl = i_mca(data1 = tweet[1:100,], data2 = tweet[101:1000,],
method="live", nchunk = nc, current_rank = 2)


#\donttest is used here because the code calls the saveLatex function of the animation package 
#which requires ImageMagick or GraphicsMagick and 
#Adobe Acrobat Reader to be installed in your system 
#See help(im.convert) for details on the configuration of ImageMagick or GraphicsMagick.
#Creates animated plot in PDF for observations and variables
plot(res_iMCAl, labels = labels, animation = TRUE, frames = 10, movie_format = 'pdf')


[Package idm version 1.8.3 Index]