R: Weighted PCA for data nuggets

DN.Wpca {WCluster}

R Documentation

Weighted PCA for data nuggets

Description

This function conducts weighted PCA on data nuggets, using the data nugget centers as observations and the data nugget weights as observation weights.

Usage

DN.Wpca(datanugget, wcol = NULL, corr = FALSE)

Arguments

`datanugget`	An object of class datanugget, i.e., the output of functions `create.DN` or `refine.DN` in the package `datanugget`.
`wcol`	Column weights: a vector of weights, one for each variable of the data nuggets. Must be of class numeric, integer, or table. If NULL, column weights are not considered, i.e., the weights equal 1 for all columns.
`corr`	A logical value indicating whether to use the correlation matrix instead of the covariance matrix. This is recommended when the column weights are not equal. The default value is FALSE.

Details

Data nuggets are a representative sample that summarizes Big Data: a large dataset is reduced to a much smaller one by eliminating redundant points while preserving the peripheries of the dataset. Each data nugget is defined by a center (location), a weight (importance), and a scale (internal variability). Data nuggets for a large dataset can be created and refined with the functions create.DN and refine.DN in the package datanugget.

This function conducts weighted PCA by the eigen method on the data nugget centers, using the nugget weights as observation weights. Variable (column) weights can also be supplied through wcol. Using the correlation matrix (corr = TRUE) is recommended when the column weights are not equal.
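
The idea of weighted PCA by the eigen method can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the implementation used by DN.Wpca: the helper name weighted_pca_sketch is hypothetical, the weighted covariance is computed with stats::cov.wt using method = "ML", column weights are left out, and random data stand in for nugget centers and weights.

```r
# Hypothetical helper (not part of WCluster): weighted PCA by eigen-decomposition
# of a weighted covariance or correlation matrix. Column weights are ignored here.
weighted_pca_sketch <- function(centers, wrow, corr = FALSE) {
  centers <- as.matrix(centers)
  # cov.wt() rescales the weights to sum to 1; method = "ML" returns the plain
  # weighted covariance without a small-sample correction.
  cw <- stats::cov.wt(centers, wt = wrow / sum(wrow), cor = corr, method = "ML")
  S <- if (corr) cw$cor else cw$cov
  eig <- eigen(S, symmetric = TRUE)
  # Center by the weighted means; also scale by weighted sds when corr = TRUE.
  centered <- sweep(centers, 2, cw$center)
  if (corr) centered <- sweep(centered, 2, sqrt(diag(cw$cov)), "/")
  list(sdev = sqrt(pmax(eig$values, 0)),  # guard against tiny negative eigenvalues
       rotation = eig$vectors,
       x = centered %*% eig$vectors,
       center = cw$center)
}

set.seed(1)
centers <- matrix(rnorm(200 * 3), ncol = 3)  # stand-in for data nugget centers
wrow <- sample(1:20, 200, replace = TRUE)    # stand-in for data nugget weights
fit <- weighted_pca_sketch(centers, wrow)
fit$sdev
```

In this sketch the weighted variance of each column of fit$x equals the corresponding squared entry of fit$sdev, which is the defining property of the weighted principal components.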

Value

A list containing the following components:

`sdev`	The standard deviations of the weighted principal components (i.e., the square roots of the eigenvalues of the weighted covariance/correlation matrix).
`rotation`	The matrix of the loading vectors for each of the weighted principal components.
`x`	The weighted principal components.
`center`, `scale`	The weighted centering and scaling used.
`wrow`, `wcol`	The row weights and column weights used.
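
Because sdev holds the square roots of the eigenvalues, the share of weighted variance captured by each component follows directly from it. A minimal sketch, using made-up values in place of an actual sdev component:

```r
# Hypothetical values standing in for the sdev component of the returned list.
sdev <- c(2, 1, 0.5)

# Proportion of weighted variance explained by each component.
var_explained <- sdev^2 / sum(sdev^2)

# Cumulative proportion, useful for deciding how many components to keep.
cum_explained <- cumsum(var_explained)
cum_explained
```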

Author(s)

Yajie Duan, Javier Cabrera, Ge Cheng

References

Amaratunga, D., & Cabrera, J. (2009). Exploration and analysis of DNA microarray and protein array data. John Wiley & Sons (Vol. 605).

Cherasia, K. E., Cabrera, J., Fernholz, L. T., & Fernholz, R. (2022). Data Nuggets in Supervised Learning. In Robust and Multivariate Statistical Methods: Festschrift in Honor of David E. Tyler (pp. 429-449). Cham: Springer International Publishing.

Beavers, T., Cheng, G., Duan, Y., Cabrera, J., Lubomirski, M., Amaratunga, D., & Teigler, J. (2023). Data Nuggets: A Method for Reducing Big Data While Preserving Data Structure (Submitted for Publication).

Examples


      require(datanugget)  # provides create.DN and refine.DN
      require(WCluster)    # provides DN.Wpca

      ## small example
      X = cbind.data.frame(rnorm(10^3),
                           rnorm(10^3),
                           rnorm(10^3))

      suppressMessages({

        my.DN = create.DN(x = X,
                          R = 500,
                          delete.percent = .1,
                          DN.num1 = 500,
                          DN.num2 = 250,
                          no.cores = 0,
                          make.pbs = FALSE)

        my.DN.PCA.info = DN.Wpca(my.DN)

      })

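      my.DN.PCA.info$sdev      # standard deviations of the weighted principal components
      my.DN.PCA.info$rotation  # loading vectors
      my.DN.PCA.info$x         # weighted principal components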

[Package WCluster version 1.2.0 Index]