gofar_s {gofar} | R Documentation |
Generalized sequential factor extraction via co-sparse unit-rank estimation (GOFAR(S)) using k-fold cross-validation
Description
Divide-and-conquer approach to low-rank and sparse coefficient matrix estimation, extracting the unit-rank factors sequentially.
Usage
gofar_s(
  Yt,
  X,
  nrank = 3,
  nlambda = 40,
  family,
  familygroup = NULL,
  cIndex = NULL,
  ofset = NULL,
  control = list(),
  nfold = 5,
  PATH = FALSE
)
Arguments
Yt |
response matrix |
X |
covariate matrix; when X = NULL, the function performs unsupervised learning |
nrank |
an integer specifying the desired rank/number of factors |
nlambda |
number of lambda values to be used along each path |
family |
list of family objects for the outcome types (Gaussian, Bernoulli, Poisson), e.g. list(gaussian(), binomial(), poisson()) |
familygroup |
index set of the type of multivariate outcomes: "1" for Gaussian, "2" for Bernoulli, "3" for Poisson outcomes |
cIndex |
control index, specifying the indices of the control variables in the design matrix X |
ofset |
offset matrix |
control |
a list of internal parameters controlling the model fitting |
nfold |
number of folds in k-fold cross-validation |
PATH |
logical; if TRUE, generate the solution path of the sequential estimate after the cross-validation step |
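The relationship between family and familygroup can be sketched with base R alone. The layout below (three Gaussian, two Bernoulli and two Poisson response columns) is a hypothetical example, and it assumes that familygroup holds one index per column of Yt, pointing into the family list.

```r
# Hypothetical layout: 3 Gaussian, 2 Bernoulli, 2 Poisson response columns.
family <- list(gaussian(), binomial(), poisson())
familygroup <- c(rep(1, 3), rep(2, 2), rep(3, 2))

# Look up the family name that each response column is assumed to follow.
family_names <- vapply(family, function(f) f$family, character(1))
outcome_type <- family_names[familygroup]

# Tabulate how many response columns fall under each family.
print(table(outcome_type))
```

Under this assumption, the order of the entries in family must match the coding "1", "2", "3" described for familygroup.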
Value
C |
estimated coefficient matrix; based on GIC |
Z |
estimated control variable coefficient matrix |
Phi |
estimated dispersion parameters |
U |
estimated U matrix (generalized latent factor weights) |
D |
estimated singular values |
V |
estimated V matrix (factor loadings) |
lam |
selected lambda values based on the chosen information criterion |
familygroup |
specified familygroup of outcome variables |
fitCV |
output from the cross-validation step, for each sequential step |
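As a rough illustration of how the returned components relate, the sketch below rebuilds a coefficient matrix from mock U, D and V values. It assumes the factorization C = U diag(D) V^T used in the cited paper; the matrices are hypothetical stand-ins, not output of gofar_s, and the relationship should be checked against the C component of an actual fit.

```r
# Mock components standing in for the U, D and V elements of a fit
# (hypothetical values, not produced by gofar_s).
set.seed(1)
p <- 5; q <- 4; r <- 2
U <- matrix(rnorm(p * r), p, r)
U[4:5, ] <- 0                      # two predictors with all-zero weights
V <- matrix(rnorm(q * r), q, r)
D <- c(3, 1.5)

# Assumed relationship: C = U diag(D) V^T.
# diag(D, nrow = length(D)) also behaves correctly when there is one factor.
C_hat <- U %*% diag(D, nrow = length(D)) %*% t(V)

dim(C_hat)                         # p x q
qr(C_hat)$rank                     # at most the number of factors
which(rowSums(U != 0) == 0)        # predictors excluded from every factor
```

Zero rows in U carry over to zero rows in the reconstructed matrix, which is how row sparsity in the factor weights translates into predictor selection under this assumption.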
References
Mishra, Aditya, Dipak K. Dey, Yong Chen, and Kun Chen. Generalized co-sparse factor regression. Computational Statistics & Data Analysis 157 (2021): 107127
Examples
family <- list(gaussian(), binomial(), poisson())
control <- gofar_control()
nlam <- 40 # number of tuning parameter (lambda) values
SD <- 123
# Simulated data for testing
data('simulate_gofar')
attach(simulate_gofar)
n <- nrow(Y) # number of observations
q <- ncol(Y) # number of responses
p <- ncol(X) # number of predictors
#
# Simulate data with 20% missing entries
miss <- 0.20 # Proportion of entries missing
t.ind <- sample.int(n * q, size = miss * n * q)
y <- as.vector(Y)
y[t.ind] <- NA
Ym <- matrix(y, n, q)
naind <- (!is.na(Ym)) + 0 # indicator matrix: 1 = observed, 0 = missing
misind <- any(naind == 0) + 0 # 1 if any entry is missing, 0 otherwise
#
# Model fitting begins:
control$epsilon <- 1e-7
control$spU <- 50 / p
control$spV <- 25 / q
control$maxit <- 1000
# Model fitting: GOFAR(S) (full data)
set.seed(SD)
rank.est <- 5
fit.seq <- gofar_s(Y, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)
# Model fitting: GOFAR(S) (missing data)
set.seed(SD)
rank.est <- 5
fit.seq.m <- gofar_s(Ym, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)
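The missing-data step in the example above can be checked in isolation. The sketch below repeats the same masking logic on a small random matrix; the dimensions, seed and missing proportion are arbitrary choices for illustration, and no gofar function is called.

```r
# Small stand-in response matrix (arbitrary size, for illustration only).
set.seed(1)
n <- 6; q <- 4
Y <- matrix(rnorm(n * q), n, q)

# Blank out a fixed proportion of entries, chosen uniformly at random.
miss <- 0.25
t.ind <- sample.int(n * q, size = round(miss * n * q))
Ym <- Y
Ym[t.ind] <- NA

# Indicator of observed entries and a flag for "any entry missing".
naind <- (!is.na(Ym)) + 0
misind <- any(naind == 0) + 0

sum(naind == 0)   # number of masked entries
misind            # 1 when at least one entry is missing
```

Rounding the size passed to sample.int makes the number of masked entries explicit when miss * n * q is not a whole number.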