gofar_s {gofar}    R Documentation

Generalized sequential factor extraction via co-sparse unit-rank estimation (GOFAR(S)) using k-fold cross-validation

Description

Divide-and-conquer approach for low-rank and sparse coefficient matrix estimation: sequential

Usage

gofar_s(
  Yt,
  X,
  nrank = 3,
  nlambda = 40,
  family,
  familygroup = NULL,
  cIndex = NULL,
  ofset = NULL,
  control = list(),
  nfold = 5,
  PATH = FALSE
)

Arguments

Yt

response matrix

X

covariate matrix; when X = NULL, the function performs unsupervised learning

nrank

an integer specifying the desired rank/number of factors

nlambda

number of lambda values to be used along each path

family

set of families for the outcomes: Gaussian, Bernoulli, Poisson (the Examples pass list(gaussian(), binomial(), poisson()))

familygroup

index set of the type of multivariate outcomes: "1" for Gaussian, "2" for Bernoulli, "3" for Poisson outcomes

cIndex

control index, specifying index of control variable in the design matrix X

ofset

offset matrix; the default is NULL

control

a list of internal parameters controlling the model fitting

nfold

number of folds in k-fold cross-validation

PATH

logical (TRUE/FALSE); whether to generate the solution path of the sequential estimate after the cross-validation step
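
The way family and familygroup line up with the columns of the response matrix can be sketched in base R. This is a minimal illustration, not package code: the column counts and the names Y_mixed, family_list, and familygroup_toy are hypothetical, and it assumes that familygroup holds one code per response column and that the order of the family list matches the codes 1, 2, 3, as the Examples suggest.

```r
# Hypothetical layout: 4 Gaussian, 3 Bernoulli, and 2 Poisson outcome columns.
set.seed(1)
n_obs <- 50
Y_gauss <- matrix(rnorm(n_obs * 4), n_obs, 4)
Y_bern <- matrix(rbinom(n_obs * 3, size = 1, prob = 0.5), n_obs, 3)
Y_pois <- matrix(rpois(n_obs * 2, lambda = 2), n_obs, 2)
Y_mixed <- cbind(Y_gauss, Y_bern, Y_pois)

# One family object per outcome type, ordered to match codes 1, 2, 3
# (assumption based on the Examples section).
family_list <- list(gaussian(), binomial(), poisson())

# Assumed convention: one code per column of the response matrix.
familygroup_toy <- rep(1:3, times = c(4, 3, 2))

# Sanity check: every response column has exactly one outcome-type code.
stopifnot(length(familygroup_toy) == ncol(Y_mixed))
table(familygroup_toy)
```

Objects built this way would presumably be passed as Yt, family, and familygroup; check the packaged simulate_gofar data for the layout the authors actually use.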

Value

C

estimated coefficient matrix; based on GIC

Z

estimated control variable coefficient matrix

Phi

estimated dispersion parameters

U

estimated U matrix (generalized latent factor weights)

D

estimated singular values

V

estimated V matrix (factor loadings)

lam

selected lambda values based on the chosen information criterion

familygroup

specified familygroup of outcome variables

fitCV

output from the cross-validation step, for each sequential step
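
How U, D, and V relate to C can be sketched with toy matrices. The block below assumes the decomposition C = U diag(D) V^T described in the referenced paper, with D stored as a vector of singular values. The matrices are random stand-ins, not output of gofar_s, and all names ending in _toy are hypothetical. The orthonormal U used here is only for illustration and need not match the package's scaling of U.

```r
# Toy stand-ins for the returned components (not actual gofar_s output).
set.seed(2)
p_toy <- 6
q_toy <- 4
r_toy <- 2
U_toy <- qr.Q(qr(matrix(rnorm(p_toy * r_toy), p_toy, r_toy))) # p x r
V_toy <- qr.Q(qr(matrix(rnorm(q_toy * r_toy), q_toy, r_toy))) # q x r
D_toy <- c(3, 1.5) # assumed: singular values stored as a vector

# Assumed relationship: C = U diag(D) V^T.
C_toy <- U_toy %*% diag(D_toy, nrow = r_toy) %*% t(V_toy)

# The same matrix built as a sum of r unit-rank layers d_k * u_k v_k^T,
# which mirrors the sequential, one-layer-at-a-time extraction.
C_layers <- Reduce(`+`, lapply(seq_len(r_toy), function(k) {
  D_toy[k] * tcrossprod(U_toy[, k], V_toy[, k])
}))

stopifnot(isTRUE(all.equal(C_toy, C_layers)))
dim(C_toy)
```

If the fitted D turns out to be a matrix rather than a vector, it can be used directly in place of diag(D_toy, nrow = r_toy).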

References

Mishra, Aditya, Dipak K. Dey, Yong Chen, and Kun Chen. Generalized co-sparse factor regression. Computational Statistics & Data Analysis 157 (2021): 107127.

Examples


family <- list(gaussian(), binomial(), poisson())
control <- gofar_control()
nlam <- 40 # number of tuning parameter (lambda) values
SD <- 123

# Simulated data for testing

data('simulate_gofar')
attach(simulate_gofar)
q <- ncol(Y)
p <- ncol(X)
#
# Simulate data with 20% missing entries
n <- nrow(Y) # number of observations
miss <- 0.20 # Proportion of entries missing
t.ind <- sample.int(n * q, size = round(miss * n * q)) # entries set to NA
y <- as.vector(Y)
y[t.ind] <- NA
Ym <- matrix(y, n, q)
naind <- (!is.na(Ym)) + 0 # 1 = observed, 0 = missing
misind <- any(naind == 0) + 0 # 1 if any entry is missing
#
# Model fitting begins:
control$epsilon <- 1e-7
control$spU <- 50 / p
control$spV <- 25 / q
control$maxit <- 1000



# Model fitting: GOFAR(S) (full data)
set.seed(SD)
rank.est <- 5
fit.seq <- gofar_s(Y, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)


# Model fitting: GOFAR(S) (missing data)
set.seed(SD)
rank.est <- 5
fit.seq.m <- gofar_s(Ym, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)
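
The masking step used for the missing-data fit can also be tried in isolation on a small matrix, without the package data. This is a minimal base R sketch. The dimensions, the seed, and the names Y_toy, Ym_toy, naind_toy, and misind_toy are hypothetical and only mirror the variables used above.

```r
# Toy response matrix (hypothetical size; not the packaged data).
set.seed(123)
n_toy <- 10
q_toy <- 4
Y_toy <- matrix(rnorm(n_toy * q_toy), n_toy, q_toy)

# Choose 20% of the entries, without replacement, to mark as missing.
miss <- 0.20
n_miss <- round(miss * n_toy * q_toy)
drop_idx <- sample.int(n_toy * q_toy, size = n_miss)

Ym_toy <- Y_toy
Ym_toy[drop_idx] <- NA

# Indicator matrix: 1 = observed, 0 = missing.
naind_toy <- (!is.na(Ym_toy)) + 0
# Flag: 1 if at least one entry is missing, 0 otherwise.
misind_toy <- any(naind_toy == 0) + 0

# The number of zeros in the indicator equals the number of masked entries.
stopifnot(sum(naind_toy == 0) == n_miss)
mean(is.na(Ym_toy))
```

Rounding the number of masked entries keeps the size argument an integer when miss * n * q is not a whole number.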


[Package gofar version 0.1 Index]