pcsslm {pcsstools} | R Documentation |
Approximate a linear model using PCSS
Description
pcsslm approximates a linear model of a combination of variables using
precomputed summary statistics.
Usage
pcsslm(formula, pcss = list(), ...)
Arguments
formula |
an object of class formula whose dependent variable is a
combination of variables and logical | operators.
All model terms must have appropriate PCSS in |
pcss |
a list of precomputed summary statistics. In all cases, this
should include |
... |
additional arguments. See Details for more information. |
Details
pcsslm parses the input formula's dependent variable for functions such as
sums (+), products (*), or logical operators (| and &).
It then models the combination of variables using one of model_combo,
model_product, model_or, model_and, or model_prcomp.
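The way a formula's left-hand side carries the combining operator can be inspected with base R. The following is a minimal sketch of that idea only, not the parsing code used by pcsslm; the helper name lhs_operator is hypothetical.

```r
# Hypothetical helper (not part of pcsstools): report the outermost operator
# and the variables on the left-hand side of a two-sided formula.
lhs_operator <- function(formula) {
  lhs <- formula[[2]]  # left-hand side, stored as a call or a symbol
  list(
    operator = if (is.call(lhs)) as.character(lhs[[1]]) else NA_character_,
    variables = all.vars(lhs)
  )
}

sum_info <- lhs_operator(y1 + y2 ~ g1 + g2)
prod_info <- lhs_operator(y4 * y5 * y6 ~ g1 + x1)
or_info <- lhs_operator(y4 | y5 ~ g1 + x1)

or_info$operator     # the logical OR operator, as a string
prod_info$variables  # the three response names
```

Because ~ has lower precedence than | and & in R, a formula such as y4 | y5 ~ g1 + x1 keeps the whole logical expression on its left-hand side.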
Different precomputed summary statistics are needed inside pcss
depending on the function that combines the dependent variable.
For linear combinations (and principal component analysis), only n, means, and covs are required.
For products and logical combinations, the additional items predictors and responses are required. These are named lists of objects of class predictor generated by new_predictor, with a predictor object for each independent variable in predictors and each dependent variable in responses. However, if modeling the product or logical combination of only two variables, responses can be NULL without consequence.
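To see why n, means, and covs can be sufficient for a linear model, the sketch below recovers ordinary least squares estimates from those three quantities alone and compares them with lm. It uses the built-in mtcars data and plain base R; it is an illustration of the general idea, not the package's implementation.

```r
# Illustration only: build the summary statistics from individual-level data.
dat <- mtcars[c("mpg", "wt", "hp")]
pcss <- list(means = colMeans(dat), covs = cov(dat), n = nrow(dat))

x <- c("wt", "hp")  # predictors
y <- "mpg"          # response

# Slopes solve Sxx %*% beta = Sxy; the intercept follows from the means.
slopes <- solve(pcss$covs[x, x], pcss$covs[x, y])
intercept <- pcss$means[[y]] - sum(pcss$means[x] * slopes)

# Residual standard error from the covariance matrix and n alone.
p <- length(x)
sse <- (pcss$n - 1) * (pcss$covs[y, y] - sum(pcss$covs[x, y] * slopes))
sigma <- sqrt(sse / (pcss$n - p - 1))

# Reference fit on the individual-level data.
fit <- lm(mpg ~ wt + hp, data = dat)
all.equal(unname(coef(fit)), c(intercept, unname(slopes)))
all.equal(summary(fit)$sigma, sigma)
```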
If modeling a principal component score of a set of variables, include
the argument comp, where comp is an integer indicating which
principal component score to analyze. Optional logical arguments
center and standardize determine if responses should be
centered and standardized before principal components are calculated.
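Principal component weights depend on the data only through the covariance (or correlation) matrix, so a principal component score is a linear combination whose weights can be derived from covs. A minimal base R sketch of that fact, using the built-in iris data and not the package's own code:

```r
# Illustration only: three numeric columns stand in for the responses.
Y <- as.matrix(iris[, 1:3])
S <- cov(Y)

# Eigenvectors of the covariance matrix are the principal component weights.
w1 <- eigen(S)$vectors[, 1]

# Compare with prcomp (centered, not standardized); the sign is arbitrary.
pc <- prcomp(Y, center = TRUE, scale. = FALSE)
all.equal(abs(w1), abs(unname(pc$rotation[, 1])))

# Standardizing first corresponds to using the correlation matrix instead.
w1_std <- eigen(cov2cor(S))$vectors[, 1]
pc_std <- prcomp(Y, center = TRUE, scale. = TRUE)
all.equal(abs(w1_std), abs(unname(pc_std$rotation[, 1])))
```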
If modeling a linear combination, include the argument phi, a named
vector of linear weights for each variable in the dependent variable in
formula.
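The mean and variance of a weighted sum, and its covariance with any predictor, follow directly from the means and covariances of its parts. The sketch below checks these identities in base R with the built-in mtcars data; the variable choices are arbitrary and the code is an illustration, not the package's implementation.

```r
dat <- mtcars[c("wt", "mpg", "qsec")]
phi <- c(mpg = 1, qsec = -1)  # weights, named by response variable

mu <- colMeans(dat)
S <- cov(dat)
v <- names(phi)

# Moments of the combination computed from summary statistics only.
combo_mean <- sum(phi * mu[v])
combo_var <- drop(t(phi) %*% S[v, v] %*% phi)
combo_cov_wt <- sum(phi * S[v, "wt"])

# The same quantities computed from individual-level data.
z <- dat$mpg - dat$qsec
all.equal(combo_mean, mean(z))
all.equal(combo_var, var(z))
all.equal(combo_cov_wt, cov(z, dat$wt))
```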
If modeling a product, include the argument response, a character
equal to either "continuous" or "binary". If "binary",
specialized approximations are performed to estimate means and variances.
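Independently of how pcsslm implements the binary case, a product of 0/1 variables is itself a 0/1 variable, so it coincides with the logical AND and its variance is determined by its mean. The simulated data below are an assumption made purely for illustration.

```r
set.seed(1)
n <- 200
y4 <- rbinom(n, 1, 0.4)
y5 <- rbinom(n, 1, 0.6)

# Product of 0/1 variables equals their conjunction (AND).
prod_is_and <- all(y4 * y5 == as.numeric(y4 & y5))

# Disjunction (OR) via De Morgan: y4 | y5 = 1 - (1 - y4) * (1 - y5).
or_identity <- all(as.numeric(y4 | y5) == 1 - (1 - y4) * (1 - y5))

# For any 0/1 variable, the sample variance follows from the mean and n.
z <- y4 * y5
m <- mean(z)
all.equal(var(z), n / (n - 1) * m * (1 - m))
```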
Value
an object of class "pcsslm".
An object of class "pcsslm" is a list containing at least the
following components:
call |
the matched call |
terms |
the |
coefficients |
a |
sigma |
the square root of the estimated variance of the random error. |
df |
degrees of freedom, a 3-vector |
fstatistic |
a 3-vector with the value of the F-statistic with its numerator and denominator degrees of freedom. |
r.squared |
|
adj.r.squared |
the above |
cov.unscaled |
a |
Sum Sq |
a 3-vector with the model's Sum of Squares Regression (SSR), Sum of Squares Error (SSE), and Sum of Squares Total (SST). |
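Several of these components are tied together by standard regression identities. The sketch below demonstrates them on a base R summary.lm object, under the assumption (suggested by the component names but not stated here) that a "pcsslm" object uses the same definitions; it does not call pcsstools.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
n <- nrow(mtcars)

# Sums of squares: SST = SSR + SSE.
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
sse <- sum(residuals(fit)^2)
ssr <- sst - sse

# fstatistic is a 3-vector: value, numerator df, denominator df.
f_value <- s$fstatistic[["value"]]
numdf <- s$fstatistic[["numdf"]]
dendf <- s$fstatistic[["dendf"]]

all.equal(f_value, (ssr / numdf) / (sse / dendf))
all.equal(s$sigma, sqrt(sse / dendf))
all.equal(s$r.squared, ssr / sst)
all.equal(s$adj.r.squared, 1 - (1 - s$r.squared) * (n - 1) / dendf)

# Overall model p-value from the F-statistic.
p_value <- pf(f_value, numdf, dendf, lower.tail = FALSE)
```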
References
Wolf JM, Westra J, Tintle N (2021). “Using Summary Statistics to Model Multiplicative Combinations of Initially Analyzed Phenotypes With a Flexible Choice of Covariates.” Frontiers in Genetics, 12, 1962. ISSN 1664-8021, doi:10.3389/fgene.2021.745901, https://www.frontiersin.org/articles/10.3389/fgene.2021.745901/full.
Wolf JM, Barnard M, Xia X, Ryder N, Westra J, Tintle N (2020). “Computationally efficient, exact, covariate-adjusted genetic principal component analysis by leveraging individual marker summary statistics from large biobanks.” Pacific Symposium on Biocomputing, 25, 719–730. ISSN 2335-6928, doi:10.1142/9789811215636_0063, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6907735/.
Gasdaska A, Friend D, Chen R, Westra J, Zawistowski M, Lindsey W, Tintle N (2019). “Leveraging summary statistics to make inferences about complex phenotypes in large biobanks.” Pacific Symposium on Biocomputing, 24, 391–402. ISSN 2335-6928, doi:10.1142/9789813279827_0036, https://pubmed.ncbi.nlm.nih.gov/30963077/.
See Also
model_combo, model_product, model_or, model_and, and model_prcomp.
Examples
## Principal Component Analysis
ex_data <- pcsstools_example[c("g1", "x1", "y1", "y2", "y3")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)
pcsslm(y1 + y2 + y3 ~ g1 + x1, pcss = pcss, comp = 1)

## Linear combination of variables
ex_data <- pcsstools_example[c("g1", "g2", "y1", "y2")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data)
)
pcsslm(y1 + y2 ~ g1 + g2, pcss = pcss, phi = c(1, -1))
# Compare with the model fit on individual-level data
summary(lm(y1 - y2 ~ g1 + g2, data = ex_data))

## Product of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5", "y6")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  ),
  # One predictor object per response (columns 3 onward), built from its mean
  responses = lapply(
    colMeans(ex_data)[3:length(colMeans(ex_data))],
    new_predictor_binary
  )
)
pcsslm(y4 * y5 * y6 ~ g1 + x1, pcss = pcss, response = "binary")
# Compare with the model fit on individual-level data
summary(lm(y4 * y5 * y6 ~ g1 + x1, data = ex_data))

## Disjunction (OR statement) of variables
ex_data <- pcsstools_example[c("g1", "x1", "y4", "y5")]
pcss <- list(
  means = colMeans(ex_data),
  covs = cov(ex_data),
  n = nrow(ex_data),
  predictors = list(
    g1 = new_predictor_snp(maf = mean(ex_data$g1) / 2),
    x1 = new_predictor_normal(mean = mean(ex_data$x1), sd = sd(ex_data$x1))
  )
)
pcsslm(y4 | y5 ~ g1 + x1, pcss = pcss)
# Compare with the model fit on individual-level data
summary(lm(y4 | y5 ~ g1 + x1, data = ex_data))