singR {singR} | R Documentation |
SImultaneous Non-Gaussian Component analysis for data integration.
This function combines all steps from the SING paper.
singR(
  dX,
  dY,
  n.comp.X = NULL,
  n.comp.Y = NULL,
  df = 0,
  rho_extent = c("small", "medium", "large"),
  Cplus = TRUE,
  tol = 1e-10,
  stand = FALSE,
  distribution = "JB",
  maxiter = 1500,
  individual = FALSE,
  whiten = c("sqrtprec", "eigenvec", "none"),
  restarts.dbyd = 0,
  restarts.pbyd = 20
)
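A character-vector default such as the one shown for whiten is commonly resolved in R with match.arg(), which returns the first listed choice when the caller supplies nothing and rejects values outside the list. Whether singR resolves whiten exactly this way is an assumption; the function choose_whiten below is hypothetical and only illustrates the idiom.

```r
# Hypothetical helper, not part of singR: shows how a character-vector
# default is typically resolved with match.arg().
choose_whiten <- function(whiten = c("sqrtprec", "eigenvec", "none")) {
  # With no argument supplied, match.arg() returns the first choice.
  # A value outside the listed choices raises an error.
  match.arg(whiten)
}

default_choice  <- choose_whiten()            # first listed choice
explicit_choice <- choose_whiten("eigenvec")  # explicit choice
bad_choice      <- try(choose_whiten("svd"), silent = TRUE)  # not a listed choice
```

Under this idiom, only the strings that appear in the usage are valid inputs.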
dX |
original dataset for decomposition, matrix of n x px. |
dY |
original dataset for decomposition, matrix of n x py. |
n.comp.X |
the number of non-Gaussian components in dataset X. If NULL, the number is estimated using ICtest::FOBIasymp. |
n.comp.Y |
the number of non-Gaussian components in dataset Y. If NULL, the number is estimated using ICtest::FOBIasymp. |
df |
degrees of freedom. The default of 0 is used with "JB". If df > 0, a density for the loadings is estimated using a tilted Gaussian (non-parametric density estimate). |
rho_extent |
Controls the similarity of the scores in the two datasets. Accepts either a numeric value or one of three character options: "small", "medium", or "large", each defined from the JB statistic. Try "small" first and check whether the loadings are equal, then try the other options if needed. A numeric input is multiplied by JBall to obtain rho. |
Cplus |
whether to use C code (faster) in curvilinear search. |
tol |
difference tolerance in curvilinear search. |
stand |
whether to standardize the data. If TRUE, the column and row means are set to 0 and the column standard deviations to 1. If FALSE, only the row means are set to 0. |
distribution |
"JB" or "tiltedgaussian"; "JB" is much faster. In SING, this refers to the "density" formed from the vector of loadings. "tiltedgaussian" with large df can potentially model more complicated patterns. |
maxiter |
the maximum number of iterations for the curvilinear search. |
individual |
whether to return the individual non-Gaussian components. Defaults to FALSE. |
whiten |
whitening method used in lngca: one of "sqrtprec", "eigenvec", or "none". "eigenvec" is the SVD-based method, which uses the n left eigenvectors divided by sqrt(px-1). "sqrtprec" uses the square root of the n x n "precision" matrix. |
restarts.dbyd |
default = 0. These are d x d initial matrices padded with zeros, which results in initializations from the principal subspace. Can speed up convergence but may miss low variance non-Gaussian components. |
restarts.pbyd |
default = 20. Generates p x d random orthogonal matrices. Use a large number for large datasets. Note: it is recommended that you run lngca twice with different seeds and compare the results, which should be similar when a sufficient number of restarts is used. In practice, stability with large datasets and a large number of components can be challenging. |
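As a rough illustration of the restarts.pbyd description, the sketch below draws 20 random p x d matrices with orthonormal columns by taking the Q factor of a QR decomposition of a Gaussian matrix. This is one standard construction and an assumption here; the package may generate its initializations differently. The helper random_orthogonal and the sizes p and d are hypothetical.

```r
set.seed(1)
p <- 50  # hypothetical number of rows of each initial matrix
d <- 3   # hypothetical number of components

# Hypothetical helper: p x d matrix with orthonormal columns,
# obtained as the Q factor of a Gaussian random matrix.
random_orthogonal <- function(p, d) {
  G <- matrix(rnorm(p * d), nrow = p, ncol = d)
  qr.Q(qr(G))
}

# One initial matrix per restart (20 mirrors the default of restarts.pbyd).
inits <- lapply(seq_len(20), function(i) random_orthogonal(p, d))

# Orthonormality check: t(W) %*% W should be close to the d x d identity.
W <- inits[[1]]
orth_error <- max(abs(crossprod(W) - diag(d)))
```

Changing the seed changes every starting point, which is why comparing two runs with different seeds is a useful check on whether enough restarts were used.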
Function outputs a list including the following:
variable loadings for joint NG components in dataset X with matrix rj x px.
variable loadings for joint NG components in dataset Y with matrix rj x py.
variable loadings for individual NG components in dataset X with matrix riX x px.
variable loadings for individual NG components in dataset Y with matrix riY x py.
scores of individual NG components in X with matrix n x riX.
scores of individual NG components in Y with matrix n x riY.
Estimated subject scores for joint components in dataset X with matrix n x rj.
Estimated subject scores for joint components in dataset Y with matrix n x rj.
Average of est.Mjx and est.Mjy as the subject scores for joint components in both datasets with matrix n x rj.
whether to use C version of curvilinear search.
the weight of rho in search
degrees of freedom: 0 when using JB, > 0 when using tiltedgaussian.
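The relationship between the three joint score matrices can be sketched in base R. The matrices below are simulated stand-ins, not output from singR, and the names est_Mjx, est_Mjy, and est_Mj are illustrative placeholders for the est.Mjx, est.Mjy, and averaged scores described above.

```r
set.seed(2)
n  <- 6  # hypothetical number of subjects
rj <- 2  # hypothetical number of joint components

# Simulated stand-ins for the two n x rj joint score matrices.
est_Mjx <- matrix(rnorm(n * rj), nrow = n, ncol = rj)
est_Mjy <- est_Mjx + matrix(rnorm(n * rj, sd = 0.1), nrow = n, ncol = rj)

# Element-wise average of the two score matrices, also n x rj.
est_Mj <- (est_Mjx + est_Mjy) / 2

# Optional diagnostic (an addition, not part of the documented output):
# per-component correlation between the X and Y joint scores.
score_cor <- diag(cor(est_Mjx, est_Mjy))
```

If the joint scores from the two datasets agree closely, the per-component correlations are near 1 and the average loses little information.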
# get simulation data
# use JB stat to compute with singR
# use tiltedgaussian distribution to compute with singR.
# tiltedgaussian may be more accurate but is considerably slower,
# and is not recommended for large datasets.
# use pmse to measure difference from the truth
pmse(M1 = t(output_JB$est.Mj),M2 = t(exampledata$mj),standardize = TRUE)
pmse(M1 = t(output_tilted$est.Mj),M2 = t(exampledata$mj),standardize = TRUE)