big_univLogReg {bigstatsr} | R Documentation |
Slopes of column-wise logistic regressions, one for each column of a Filebacked Big Matrix, along with some associated statistics. Covariates can be added to correct for confounders.
big_univLogReg(
X,
y01.train,
ind.train = rows_along(X),
ind.col = cols_along(X),
covar.train = NULL,
tol = 1e-08,
maxiter = 20,
ncores = 1
)
X |
An object of class FBM. |
y01.train |
Vector of responses, corresponding to ind.train. |
ind.train |
An optional vector of the row indices that are used, for the training part. If not specified, all rows are used. Don't use negative indices. |
ind.col |
An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices. |
covar.train |
Matrix of covariates to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to ind.train. |
tol |
Relative tolerance to assess convergence of the coefficient. Default is 1e-08. |
maxiter |
Maximum number of iterations before giving up. Default is 20. |
ncores |
Number of cores used. Default is 1 (no parallelism). You may use nb_cores. |
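Because ind.train and ind.col must not contain negative indices, rows or columns to exclude have to be converted to positive indices first. A minimal base R sketch, where n and excluded are hypothetical values chosen for illustration (with a real FBM, rows_along(X) would play the role of seq_len(n)):

```r
n <- 10                 # hypothetical number of rows
excluded <- c(2, 5)     # hypothetical rows to leave out

# Positive indices of the rows to keep, instead of -excluded
ind.train <- setdiff(seq_len(n), excluded)
ind.train
```

The same pattern applies to ind.col. Remember that y01.train and covar.train must then be subset with the same indices.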
If convergence is not reached by the main algorithm for some columns, the corresponding niter element is set to NA and a message is given. Then, glm is used instead for the corresponding column. If it can't converge either, all corresponding estimations are set to NA.
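To make the roles of tol, maxiter and the NA value of niter more concrete, here is a rough base R sketch of a Newton (IRLS) fit for a single column. It is an illustration only, not the package's implementation: irls_logreg is a hypothetical helper, and the exact convergence criterion used by big_univLogReg may differ from the relative change on the slope assumed here.

```r
# Hypothetical helper (not part of bigstatsr): logistic regression of y on
# one column x, plus optional covariates, fitted by Newton's method.
irls_logreg <- function(x, y, covar = NULL, tol = 1e-8, maxiter = 20) {
  Z <- cbind(1, x, covar)          # intercept, tested column, covariates
  beta <- rep(0, ncol(Z))
  niter <- NA_integer_             # stays NA if maxiter is reached
  for (it in seq_len(maxiter)) {
    p <- as.vector(plogis(Z %*% beta))
    info <- crossprod(Z, Z * (p * (1 - p)))          # Z' W Z
    beta_new <- beta + as.vector(solve(info, crossprod(Z, y - p)))
    # Assumed criterion: relative change of the slope of x
    rel_change <- abs(beta_new[2] - beta[2]) /
      max(abs(beta_new[2]), .Machine$double.eps)
    beta <- beta_new
    if (rel_change < tol) {
      niter <- it
      break
    }
  }
  # Standard error of the slope from the inverse information matrix
  p <- as.vector(plogis(Z %*% beta))
  info <- crossprod(Z, Z * (p * (1 - p)))
  std_err <- sqrt(diag(solve(info))[[2]])
  c(estim = beta[2], std.err = std_err, niter = niter,
    score = beta[2] / std_err)
}

set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
fit <- irls_logreg(x, y)
ref <- summary(glm(y ~ x, family = "binomial"))$coefficients[2, ]
rbind(sketch = fit[c("estim", "std.err", "score")], glm = ref[1:3])
```

On well-behaved simulated data like this, the sketch and glm should agree closely. With a small maxiter or perfectly separated data, niter would stay NA, which is the situation where the documented fallback to glm applies.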
A data.frame with 4 elements:
the slopes of each regression,
the standard errors of each slope,
the number of iterations for each slope. If it is NA, this means that the algorithm didn't converge, and glm was used instead.
the z-scores associated with each slope.
This is also an object of class mhtest. See methods(class = "mhtest").
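As a sketch of how these quantities relate, the following base R code recomputes a z-score and a p-value for one glm fit. It assumes the usual Wald construction (z-score = slope / standard error, two-sided p-value from the standard normal distribution); that the mhtest predict method uses exactly this transformation is an assumption here, not something stated above.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))

# Slope, standard error, z value and p-value reported by glm for x
coefs <- summary(glm(y ~ x, family = "binomial"))$coefficients[2, ]

# Wald z-score: slope divided by its standard error
z <- coefs[["Estimate"]] / coefs[["Std. Error"]]

# Two-sided p-value under a standard normal null distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

c(z = z, p.value = p)
```

Both values can be compared with the "z value" and "Pr(>|z|)" entries of coefs.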
set.seed(1)
X <- big_attachExtdata()
n <- nrow(X)
y01 <- sample(0:1, size = n, replace = TRUE)
covar <- matrix(rnorm(n * 3), n)
X1 <- X[, 1] # only first column of the Filebacked Big Matrix
# Without covar
test <- big_univLogReg(X, y01)
## new class `mhtest`
class(test)
attr(test, "transfo")
attr(test, "predict")
## plot results
plot(test)
plot(test, type = "Volcano")
## To get p-values associated with the test
test$p.value <- predict(test, log10 = FALSE)
str(test)
summary(glm(y01 ~ X1, family = "binomial"))$coefficients[2, ]
# With covar, using all the data
str(big_univLogReg(X, y01, covar.train = covar))
summary(glm(y01 ~ X1 + covar, family = "binomial"))$coefficients[2, ]
# With only half of the data
ind.train <- sort(sample(n, n/2))
str(big_univLogReg(X, y01[ind.train],
                   covar.train = covar[ind.train, ],
                   ind.train = ind.train))
summary(glm(y01 ~ X1 + covar, family = "binomial",
            subset = ind.train))$coefficients[2, ]