R: Functional PCA of probability densities among time

fpcat {dad}

R Documentation

Functional PCA of probability densities among time

Description

Performs functional principal component analysis of probability densities in order to describe a data “foldert”, that is, a set of individuals on which p variables are observed at T times. It returns an object of class fpcat.

Usage

fpcat(xf, group.name="time", method = 1, ind = 1, nvar = NULL, gaussiand = TRUE,
    windowh = NULL, normed=TRUE, centered=TRUE, data.centered = FALSE,
    data.scaled = FALSE, common.variance = FALSE, nb.factors = 3, nb.values = 10,
    sub.title = "", plot.eigen = TRUE, plot.score = FALSE, nscore = 1:3,
    filename = NULL)

Arguments

`xf`	object of class `"foldert"` or data frame. An object of class `"foldert"` is a list of data frames with the same column names, each corresponding to one time of observation. Its elements are data frames with `p` numeric columns; a non-numeric column causes an error. The `t^{th}` element (`t = 1, \ldots, T`) corresponds to the `t^{th}` time of observation. If `xf` is a data frame, its layout depends on `method`. If `method=1`: the column named by the `group.name` argument is a factor giving the groups. The other columns are all numeric; otherwise, there is an error. If `method=2`: the column named by the `ind` argument contains the identifiers of the measured objects, and the observations are organized as follows: given `timecol`, the number of the column named by the `group.name` argument, the observations corresponding to the 1st time are in columns `timecol : (timecol + nvar - 1)`, the observations corresponding to the 2nd time are in columns `(timecol + nvar) : (timecol + 2 * nvar - 1)`, and so on.
`group.name`	string or numeric. If `xf` is an object of class `"foldert"`, string: name of the grouping variable, that is the observation times. The default is `group.name = "time"`. If `xf` is a data frame, string or numeric, as the `timecol` argument of `as.foldert.data.frame`; it is referred to as `timecol` below. If `method = 1`, `timecol` is the name or the number of the column of `xf` containing the times of observation. `xf[, timecol]` must be of class `"numeric"`, `"ordered"`, `"Date"`, `"POSIXlt"` or `"POSIXct"`; otherwise, there is an error. If `method=2`, `timecol` is the name or the number of the first column corresponding to the first observation. If there are duplicated column names and several columns are named by `timecol`, the first one is used.
`method`	if `xf` is a data frame, 1 or 2. Omitted if `xf` is an object of class `"foldert"`. `method` indicates the layout of the data frame and, therefore, how the data are extracted to build the foldert. If `method = 1`, there is a column containing the identifiers of the measured objects and a column containing the times; the other columns contain the observations. If `method = 2`, there is a column containing the identifiers of the measured objects, and the observations are organized as follows: the observations corresponding to the 1st time are in columns `timecol : (timecol + nvar - 1)`, the observations corresponding to the 2nd time are in columns `(timecol + nvar) : (timecol + 2 * nvar - 1)`, and so on.
`ind`	if `xf` is a data frame, string or numeric. Omitted if `xf` is an object of class `"foldert"`. The name or the number of the column of `xf` containing the identifiers of the measured objects. See the `ind` argument of `as.foldert.data.frame`.
`nvar`	if `xf` is a data frame and `method=2`, numeric. Omitted if `xf` is an object of class `"foldert"` or if `method=1`. The number of variables measured at each observation time. See the `nvar` argument of `as.foldert.data.frame`.

All other arguments are the same as for fpcad.

`gaussiand`	logical. If `TRUE` (default), the probability densities are assumed to be Gaussian. If `FALSE`, densities are estimated using the Gaussian kernel method (as `fpcad`).
`windowh`	either a list of `T` bandwidths (one per density associated to a group), or a strictly positive number. If `windowh = NULL` (default), the bandwidths are automatically computed (as `fpcad`). See Details.
`normed`	logical. If `TRUE` (default), the densities are normed before computing the distances (as `fpcad`).
`centered`	logical. If `TRUE` (default), the densities are centered (as `fpcad`).
`data.centered`	logical. If `TRUE` (default is `FALSE`), the data of each group are centered (as `fpcad`).
`data.scaled`	logical. If `TRUE` (default is `FALSE`), the data of each group are centered (even if `data.centered = FALSE`) and scaled (as `fpcad`).
`common.variance`	logical. If `TRUE` (default is `FALSE`), a common covariance matrix (or correlation matrix if `data.scaled = TRUE`), computed on the whole data, is used. If `FALSE` (default), a covariance (or correlation) matrix per group is used (as `fpcad`).
`nb.factors`	numeric. Number of returned principal scores (default `nb.factors = 3`) (as `fpcad`). Warning: The `plot.fpcad` and `interpret.fpcad` functions cannot take into account more than `nb.factors` principal factors (as `fpcad`).
`nb.values`	numeric. Number of returned eigenvalues (default `nb.values = 10`) (as `fpcad`).
`sub.title`	string. Subtitle for the graphs (default `""`) (as `fpcad`).
`plot.eigen`	logical. If `TRUE` (default), the barplot of the eigenvalues is plotted (as `fpcad`).
`plot.score`	logical. If `TRUE`, the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by `nscore` argument (as `fpcad`).
`nscore`	numeric vector. If `plot.score = TRUE`, the numbers of the principal scores which are plotted. By default it is equal to `nscore = 1:3`. Its components cannot be greater than `nb.factors` (as `fpcad`).
`filename`	string. Name of the file in which the results are saved. By default (`filename = NULL`) the results are not saved (as `fpcad`).
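
The two data frame layouts described for `method = 1` and `method = 2` can be pictured with a small base R sketch. Everything in it is illustrative: the column names `id`, `time`, `z1`, `z2`, the suffixes `.t1`, `.t2`, `.t3` and the simulated values are assumptions, and no function of the package is called.

```r
# Illustrative data only: 3 measured objects, 2 variables, 3 observation times.
ids   <- c("a", "b", "c")
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01"))
nvar  <- 2

# Layout for method = 1: one row per (object, time) pair,
# with an identifier column and a time column.
set.seed(1)
long <- data.frame(
  id   = rep(ids, times = length(times)),
  time = rep(times, each = length(ids)),
  z1   = rnorm(9),
  z2   = rnorm(9)
)

# Layout for method = 2: one row per object, nvar columns per time,
# the blocks of columns being placed side by side in time order.
wide <- data.frame(id = ids)
for (k in seq_along(times)) {
  block <- long[long$time == times[k], c("z1", "z2")]
  rownames(block) <- NULL
  names(block) <- paste0(names(block), ".t", k)
  wide <- cbind(wide, block)
}

# Column ranges of each time block, following the formula given for `xf`:
# timecol : (timecol + nvar - 1), (timecol + nvar) : (timecol + 2 * nvar - 1), ...
timecol <- 2  # number of the first column of the first observation in `wide`
time_blocks <- lapply(seq_along(times), function(k) {
  (timecol + (k - 1) * nvar):(timecol + k * nvar - 1)
})
```

With these values, `time_blocks` holds the column numbers 2:3, 4:5 and 6:7, so `wide[, time_blocks[[2]]]` contains the observations made at the second time.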

Details

The T probability densities f_t corresponding to the T times of observation are either parametrically estimated or estimated using the Gaussian kernel method (see fpcad for the use of the arguments indicating the method used to estimate these densities).
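
For the Gaussian case, the idea can be sketched in base R: each density f_t is summarized by a mean vector and a covariance matrix, and the L^2 inner product of two Gaussian densities has a closed form. The helper `l2_inner_gauss` and the simulated data are hypothetical, and `var()` uses the n - 1 denominator, which may differ from the convention used by the package; this is not the code of `fpcat`.

```r
# Closed-form L2 inner product of the Gaussian densities N(m1, S1) and
# N(m2, S2) in dimension p:
#   (2*pi)^(-p/2) * det(S1 + S2)^(-1/2) * exp(-d' (S1 + S2)^(-1) d / 2),
# where d = m1 - m2.
l2_inner_gauss <- function(m1, S1, m2, S2) {
  p <- length(m1)
  S <- S1 + S2
  d <- m1 - m2
  q <- drop(crossprod(d, solve(S, d)))
  (2 * pi)^(-p / 2) * det(S)^(-1 / 2) * exp(-q / 2)
}

# Simulated data: one data frame per observation time (illustrative values).
set.seed(42)
groups <- list(
  t1 = data.frame(z1 = rnorm(50, 1, 5), z2 = rnorm(50, 3, 3)),
  t2 = data.frame(z1 = rnorm(50, 4, 6), z2 = rnorm(50, 5, 2)),
  t3 = data.frame(z1 = rnorm(50, 7, 2), z2 = rnorm(50, 8, 4))
)

# Parametric estimation: one mean vector and one covariance matrix per time.
means     <- lapply(groups, colMeans)
variances <- lapply(groups, var)

# Matrix of inner products between the estimated densities.
nt <- length(groups)
W <- matrix(0, nt, nt, dimnames = list(names(groups), names(groups)))
for (i in seq_len(nt)) {
  for (j in seq_len(nt)) {
    W[i, j] <- l2_inner_gauss(means[[i]], variances[[i]],
                              means[[j]], variances[[j]])
  }
}

# L2 norms of the densities: square roots of the diagonal of W.
norms <- sqrt(diag(W))
```

Taking both densities equal in the formula gives the squared norm `(4 * pi)^(-p / 2) / sqrt(det(S))`, which is what the diagonal of `W` contains.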

Value

Returns an object of class fpcat, that is a list including:

`times`	vector of the times of observation.
`inertia`	data frame of the eigenvalues and percentages of inertia.
`contributions`	data frame of the contributions to the first `nb.factors` principal components.
`qualities`	data frame of the qualities on the first `nb.factors` principal factors.
`scores`	data frame of the first `nb.factors` principal scores.
`norm`	vector of the `L^2` norms of the densities.
`means`	list of the means.
`variances`	list of the covariance matrices.
`correlations`	list of the correlation matrices.
`skewness`	list of the skewness coefficients.
`kurtosis`	list of the kurtosis coefficients.
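
As a rough illustration of how the `inertia` and `scores` components relate to each other, the sketch below runs a centered PCA on a small matrix of inner products in base R. The matrix `W`, the column labels and the scaling of the scores are assumptions made for the example; the conventions used by `fpcat` may differ.

```r
# Illustrative symmetric positive semi-definite matrix standing in for the
# T x T matrix of inner products between the densities (T = 4 here).
set.seed(7)
A <- matrix(rnorm(4 * 6), nrow = 4)
W <- tcrossprod(A)

# Centering: double-center the matrix of inner products.
nt <- nrow(W)
H  <- diag(nt) - matrix(1 / nt, nt, nt)
Wc <- H %*% W %*% H

# Eigen-decomposition of the centered matrix.
eig    <- eigen(Wc, symmetric = TRUE)
lambda <- pmax(eig$values, 0)  # clip tiny negative values due to round-off

# Eigenvalues and percentages of inertia.
inertia <- data.frame(
  eigenvalue = lambda,
  inertia    = 100 * lambda / sum(lambda)
)

# Principal scores on the first nb.factors axes:
# eigenvectors scaled by the square roots of the eigenvalues.
nb.factors <- 3
scores <- sweep(eig$vectors[, 1:nb.factors, drop = FALSE], 2,
                sqrt(lambda[1:nb.factors]), `*`)
colnames(scores) <- paste0("PC", 1:nb.factors)
```

With T = 4 centered densities there are at most 3 non-zero eigenvalues, so the three columns of `scores` reproduce the centered inner products exactly: `tcrossprod(scores)` equals `Wc` up to rounding.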

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Gilles Hunault, Sabine Demotes-Mainard

References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

Examples

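# Four observation times
times <- as.Date(c("2017-03-01", "2017-04-01", "2017-05-01", "2017-06-01"))
# One data frame per time: 6 individuals, 2 variables (simulated Gaussian data)
x1 <- data.frame(z1=rnorm(6,1,5), z2=rnorm(6,3,3))
x2 <- data.frame(z1=rnorm(6,4,6), z2=rnorm(6,5,2))
x3 <- data.frame(z1=rnorm(6,7,2), z2=rnorm(6,8,4))
x4 <- data.frame(z1=rnorm(6,9,3), z2=rnorm(6,10,2))
# Build the foldert from the four data frames
ft <- foldert(x1, x2, x3, x4, times = times, rows.select="intersect")
print(ft)
# Functional PCA of the densities associated with the four times
result <- fpcat(ft)
print(result)
plot(result)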

[Package dad version 4.1.2 Index]