
gofar_s {gofar}

R Documentation

Generalized sequential factor extraction via co-sparse unit-rank estimation (GOFAR(S)) using k-fold cross-validation

Description

Divide-and-conquer approach for low-rank and sparse coefficient matrix estimation, sequential version: co-sparse unit-rank factors are extracted one at a time.
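The following base-R sketch is deliberately simplified to make the "sequential" idea concrete. It assumes Gaussian responses only and uses no sparsity penalties, no cross-validation and no missing data. Each step fits a unit-rank (rank-one) least-squares component to the current residual, then deflates the residual before the next step. It illustrates sequential unit-rank extraction under those assumptions. It is not the GOFAR(S) algorithm, which additionally imposes co-sparsity on each factor, handles mixed GLM families, and tunes lambda by cross-validation. The helper name `unit_rank_fit` is hypothetical.

```r
set.seed(1)
n <- 100; p <- 6; q <- 4
X <- matrix(rnorm(n * p), n, p)
# Hypothetical rank-2 true coefficient matrix (sum of two outer products)
C_true <- tcrossprod(rnorm(p), rnorm(q)) + tcrossprod(rnorm(p), rnorm(q))
Y <- X %*% C_true + matrix(rnorm(n * q, sd = 0.1), n, q)

# Hypothetical helper: rank-one least-squares fit of the residual R on X
unit_rank_fit <- function(X, R) {
  B_ols <- solve(crossprod(X), crossprod(X, R))   # p x q OLS coefficients
  v <- svd(X %*% B_ols)$v[, 1]                    # leading right singular vector
  B_ols %*% v %*% t(v)                            # keep only that direction: rank one
}

nrank <- 2
C_hat <- matrix(0, p, q)
R <- Y
for (k in seq_len(nrank)) {
  C_k <- unit_rank_fit(X, R)
  C_hat <- C_hat + C_k
  R <- R - X %*% C_k   # deflate: remove the extracted factor before the next step
}
```

In this unpenalized Gaussian case, the sum of the two extracted components equals the rank-2 reduced-rank regression fit. The penalized, non-Gaussian steps in GOFAR(S) have no such closed form.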

Usage

gofar_s(
  Yt,
  X,
  nrank = 3,
  nlambda = 40,
  family,
  familygroup = NULL,
  cIndex = NULL,
  ofset = NULL,
  control = list(),
  nfold = 5,
  PATH = FALSE
)

Arguments

`Yt`	response matrix (n x q); entries may be missing (NA), as in the missing-data example below
`X`	covariate matrix (n x p); when X = NULL, the function performs unsupervised learning
`nrank`	an integer specifying the desired rank/number of factors
`nlambda`	number of lambda values to be used along each path
`family`	list of family objects for the outcome types (Gaussian, Bernoulli, Poisson), e.g. list(gaussian(), binomial(), poisson())
`familygroup`	integer vector giving the type of each outcome (column of Yt): 1 for Gaussian, 2 for Bernoulli, 3 for Poisson
`cIndex`	control index: column indices of the control variables in the design matrix X; their coefficients are returned in Z
`ofset`	offset matrix (default NULL, no offset)
`control`	a list of internal parameters controlling the model fitting, e.g. as returned by gofar_control()
`nfold`	number of folds in k-fold cross-validation
`PATH`	logical; if TRUE, the solution path of the sequential estimate is generated after the cross-validation step
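A small layout makes the pairing between `family` and `familygroup` easier to see. The sketch below uses base R only. It assumes, consistently with the examples, that code k in `familygroup` selects `family[[k]]`. The column counts (4 Gaussian, 3 Bernoulli, 2 Poisson) are made up for illustration.

```r
# Outcome families, in the order used by the familygroup codes 1, 2, 3
family <- list(gaussian(), binomial(), poisson())

# Hypothetical response layout: columns 1-4 Gaussian, 5-7 Bernoulli, 8-9 Poisson
familygroup <- c(rep(1, 4), rep(2, 3), rep(3, 2))

# Look up the family name of each response column via its familygroup code
outcome_family <- vapply(family[familygroup], function(f) f$family, character(1))
table(outcome_family)
```

Only the codes in `familygroup` need to change when response columns are reordered, because the `family` list itself stays fixed.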

Value

`C`	estimated coefficient matrix; based on GIC
`Z`	estimated control variable coefficient matrix
`Phi`	estimated dispersion parameters
`U`	estimated U matrix (generalized latent factor weights)
`D`	estimated singular values
`V`	estimated V matrix (factor loadings)
`lam`	selected lambda values based on the chosen information criterion
`familygroup`	specified familygroup of the outcome variables
`fitCV`	output from the cross-validation step, for each sequential step
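The factor outputs can be combined into a low-rank coefficient matrix as U diag(D) t(V). The sketch below assumes that `U` is p x r, `D` is a length-r vector and `V` is q x r. It also assumes that this product is the non-control part of `C`, following the unit-rank decomposition in Mishra et al. (2021). The `fit` list is a hypothetical stand-in with toy values, not real `gofar_s()` output. Note `diag(D, nrow = length(D))`: a plain `diag(D)` returns an identity matrix when D is a single number (rank 1).

```r
# Hypothetical stand-in for a fitted object; real values come from gofar_s()
fit <- list(
  U = matrix(c(1, 0, 0,
               0, 1, 0), nrow = 3),      # p = 3 covariates, r = 2 factors
  D = c(4, 2),                           # r singular values
  V = matrix(c(0, 1, 0, 0,
               1, 0, 0, 0), nrow = 4)    # q = 4 responses, r = 2 factors
)

# Rebuild the low-rank coefficient matrix; nrow = guards against the r = 1 case
C_rebuilt <- fit$U %*% diag(fit$D, nrow = length(fit$D)) %*% t(fit$V)
dim(C_rebuilt)

# In this toy example, a zero row of U gives a zero row of C (covariate not selected)
selected <- which(rowSums(abs(C_rebuilt)) > 0)
```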

References

Mishra, Aditya, Dipak K. Dey, Yong Chen, and Kun Chen. Generalized co-sparse factor regression. Computational Statistics & Data Analysis 157 (2021): 107127

Examples


family <- list(gaussian(), binomial(), poisson())
control <- gofar_control()
nlam <- 40 # number of tuning parameter values
SD <- 123

# Simulated data for testing

data('simulate_gofar')
attach(simulate_gofar)
n <- nrow(Y) # number of samples, used when masking entries below
q <- ncol(Y)
p <- ncol(X)
#
# Introduce 20% missing entries at random
miss <- 0.20 # Proportion of entries missing
t.ind <- sample.int(n * q, size = miss * n * q)
y <- as.vector(Y)
y[t.ind] <- NA
Ym <- matrix(y, n, q)
naind <- (!is.na(Ym)) + 0 # indicator matrix: 1 = observed, 0 = missing
misind <- any(naind == 0) + 0 # 1 if any entry is missing, else 0
#
# Model fitting begins:
control$epsilon <- 1e-7
control$spU <- 50 / p
control$spV <- 25 / q
control$maxit <- 1000

# Model fitting: GOFAR(S) (full data)
set.seed(SD)
rank.est <- 5
fit.seq <- gofar_s(Y, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)


# Model fitting: GOFAR(S) (missing data)
set.seed(SD)
rank.est <- 5
fit.seq.m <- gofar_s(Ym, X,
  nrank = rank.est, family = family,
  nlambda = nlam, familygroup = familygroup,
  control = control, nfold = 5
)

[Package gofar version 0.1 Index]