r2_diff {r2redux}    R Documentation
r2_diff function
Description
This function estimates var(R2(y~x[,v1]) - R2(y~x[,v2])), where R2 is the R squared value of the model, y is an N by 1 matrix containing the dependent variable, and x is an N by M matrix containing M explanatory variables. v1 and v2 index columns of the x matrix (each can be a single value or a vector of values between 1 and M; see Arguments below).
Usage
r2_diff(dat, v1, v2, nv)
Arguments
dat
    N by (M+1) matrix holding the variables in the order cbind(y, x)
v1
    Column indices of x specifying the predictor(s) for the first model; can be a single value or a vector, e.g. v1=c(1) or v1=c(1,2)
v2
    Column indices of x specifying the predictor(s) for the second model, e.g. v2=c(2), v2=c(3), v2=c(1,3) or v2=c(3,4)
nv
    Sample size
Value
This function tests the significance of the difference between two R2 values, e.g. from two PGS (which may be dependent or independent, and each based on a single or joint predictors). It provides the test statistics for the difference between R2(y~x[,v1]) and R2(y~x[,v2]) (here we define R2_1 = R2(y~x[,v1]) and R2_2 = R2(y~x[,v2])). The outputs are listed as follows.
rsq1
    R2_1
rsq2
    R2_2
var1
    Variance of R2_1
var2
    Variance of R2_2
var_diff
    Variance of the difference between R2_1 and R2_2
r2_based_p
    Two-tailed p-value for the difference between R2_1 and R2_2
r2_based_p_one_tail
    One-tailed p-value for the difference
mean_diff
    Difference between R2_1 and R2_2
upper_diff
    Upper limit of the 95% CI for the difference
lower_diff
    Lower limit of the 95% CI for the difference
p
    List of additional p-values for the difference: p$nested (nested model comparison), p$nonnested (non-nested model comparison) and p$LRT (likelihood ratio test); see Examples for which to use
Examples
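#A minimal sketch of how the arguments fit together (the object names y and x
#below are hypothetical placeholders for your own data):
#dat <- data.frame(y, x)                   #y first, then the M explanatory variables
#nv  <- nrow(dat)                          #sample size
#out <- r2_diff(dat, v1=c(1), v2=c(2), nv)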
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]) (here we define R2_1=R2(y~x[,1]) and R2_2=R2(y~x[,2]))
dat=dat1
nv=length(dat$V1)
v1=c(1)
v2=c(2)
output=r2_diff(dat,v1,v2,nv)
output
#r2redux output
#output$rsq1 (R2_1)
#0.03836254
#output$rsq2 (R2_2)
#0.03881135
#output$var1 (variance of R2_1)
#0.0001436128
#output$var2 (variance of R2_2)
#0.0001451358
#output$var_diff (variance of difference between R2_1 and R2_2)
#5.678517e-07
#output$r2_based_p (two tailed p-value for significant difference)
#0.5514562
#output$r2_based_p_one_tail (one tailed p-value for significant difference)
#0.2757281
#output$mean_diff (differences between R2_1 and R2_2)
#-0.0004488044
#output$upper_diff (upper limit of 95% CI for the difference)
#0.001028172
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.001925781
#output$p$nested
#1
#output$p$nonnested
#0.5514562
#output$p$LRT
#1
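#The reported two-tailed p-value and 95% CI can be reproduced from mean_diff and
#var_diff with a normal approximation; this is only an illustrative check of how
#the outputs relate to each other, not part of the r2redux API:
z <- output$mean_diff/sqrt(output$var_diff)
2*pnorm(-abs(z))                                 #close to output$r2_based_p
output$mean_diff - 1.96*sqrt(output$var_diff)    #close to output$lower_diff
output$mean_diff + 1.96*sqrt(output$var_diff)    #close to output$upper_diff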
#To get the test statistics for the difference between R2(y~x[,1]+x[,2]) and
#R2(y~x[,1]) (here R2_1=R2(y~x[,1]+x[,2]) and R2_2=R2(y~x[,1]))
dat=dat1
nv=length(dat$V1)
v1=c(1,2)
v2=c(1)
output=r2_diff(dat,v1,v2,nv)
#r2redux output
#output$rsq1 (R2_1)
#0.03896678
#output$rsq2 (R2_2)
#0.03836254
#output$var1 (variance of R2_1)
#0.0001473686
#output$var2 (variance of R2_2)
#0.0001436128
#output$var_diff (variance of difference between R2_1 and R2_2)
#2.321425e-06
#output$r2_based_p (p-value for significant difference between R2_1 and R2_2)
#0.4366883
#output$mean_diff (differences between R2_1 and R2_2)
#0.0006042383
#output$upper_diff (upper limit of 95% CI for the difference)
#0.00488788
#output$lower_diff (lower limit of 95% CI for the difference)
#-0.0005576171
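#For reference, rsq1 and rsq2 are the usual R2 values from the corresponding
#lm() fits (dat1 columns: V1 = y, V2 = x[,1], V3 = x[,2]); an illustrative
#check, not part of r2redux:
#summary(lm(V1 ~ V2 + V3, data=dat))$r.squared   #close to output$rsq1
#summary(lm(V1 ~ V2, data=dat))$r.squared        #close to output$rsq2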
#Note: If the directions of the underlying correlations are not consistent, for
#instance, if one correlation (R_1) is positive and the other (R_2) is negative,
#or vice versa, interpret the comparative test with caution: R^2 alone carries
#no information about the direction (sign) of the relationship between a
#predictor and the response variable.
#When multiple predictors are common to the two models, for example,
#y = any_cov1 + any_cov2 + ... + any_covN + e vs.
#y = PRS + any_cov1 + any_cov2 + ... + any_covN + e,
#a more streamlined approach is to consolidate the common predictors into a
#single predictor (see R code below).
#R
#dat=dat1
#here let's assume we want to test one PRS (dat$V2)
#with 5 covariates (dat$V7 to dat$V11)
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~ dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor1,merged_predictor2)
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e vs.
#y = merged_predictor2 + e
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]) (here x[,1]= merged_predictor1 (from the full model)
#and x[,2]= merged_predictor2 (from the reduced model))
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#note that the merged predictor from the full model (v1) should come first.
#str(output)
#List of 11
#$ rsq1 : num 0.0428
#$ rsq2 : num 0.042
#$ var1 : num 0.000158
#$ var2 : num 0.000156
#$ var_diff : num 2.87e-06
#$ r2_based_p : num 0.658
#$ r2_based_p_one_tail: num 0.329
#$ mean_diff : num 0.000751
#$ upper_diff : num 0.00407
#$ lower_diff : num -0.00257
#$ p :List of 3
#..$ nested : num 0.386
#..$ nonnested: num 0.658
#..$ LRT : num 0.376
#Importantly, note that in this case the reduced model (mod2, covariates only)
#is nested within the full model (mod1, PRS + covariates), so this is a
#nested model comparison. Therefore, output$p$nested (0.386) should be used
#instead of output$p$nonnested (0.658).
#Note that r2_based_p is the same as output$p$nonnested (0.658) here.
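#As an additional illustrative cross-check for this nested case, a
#likelihood-ratio-type comparison of the original (unmerged) models can be run
#with base R; this is not part of r2redux and is only an approximation:
#anova(mod2, mod1, test="Chisq")   #p-value of similar magnitude to output$p$LRT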
#For this scenario, alternatively, the outcome variable (y) can be preadjusted
#for the covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#then the comparative significance test can be approximated using the
#model y_adj = PRS + e, i.e. r2_var(dat, v1, nv):
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V2)
#v1=c(1)
#output=r2_var(dat, v1, nv)
#str(output)
#$ var : num 2e-06
#$ LRT_p :Class 'logLik' : 0.98 (df=2)
#$ r2_based_p: num 0.977
#$ rsq : num 8.21e-07
#$ upper_r2 : num 0.00403
#$ lower_r2 : num -0.000999
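#Illustrative check (base R only, not part of r2redux): the single-predictor R2
#reported by r2_var matches the usual lm() R2 on the preadjusted outcome:
#summary(lm(dat[,1] ~ dat[,2]))$r.squared   #close to output$rsq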
#In another scenario, where the same covariates but different PRS (PRS1 and
#PRS2) are compared, i.e.
#y = PRS1 + any_cov1 + any_cov2 + ... + any_covN + e vs.
#y = PRS2 + any_cov1 + any_cov2 + ... + any_covN + e,
#the following approach can be employed (see R code below).
#R
#dat=dat1
#here let's assume dat$V2 is PRS1, dat$V3 is PRS2, and dat$V7 to dat$V11 are covariates
#mod1 <- lm(dat$V1~dat$V2 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor1 <- mod1$fitted.values
#mod2 <- lm(dat$V1~dat$V3 + dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#merged_predictor2 <- mod2$fitted.values
#dat=data.frame(dat$V1,merged_predictor2,merged_predictor1)
#the comparison can be equivalently expressed as:
#y = merged_predictor1 + e vs.
#y = merged_predictor2 + e
#This comparison can be simply achieved using the r2_diff function, e.g.
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]) (here x[,1]= merged_predictor2 and x[,2]= merged_predictor1)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1 : num 0.043
#$ rsq2 : num 0.0428
#$ var1 : num 0.000159
#$ var2 : num 0.000158
#$ var_diff : num 2.6e-07
#$ r2_based_p : num 0.657
#$ r2_based_p_one_tail: num 0.328
#$ mean_diff : num 0.000227
#$ upper_diff : num 0.00123
#$ lower_diff : num -0.000773
#$ p :List of 3
#..$ nested : num 0.634
#..$ nonnested: num 0.657
#..$ LRT : num 0.627
#Importantly, note that in this case merged_predictor1 and merged_predictor2
#are not nested within each other (see mod1 vs. mod2 above).
#Therefore, this is a non-nested model comparison,
#and output$p$nonnested (0.657) should be used instead of
#output$p$nested (0.634). Note that r2_based_p is the same
#as output$p$nonnested (0.657) here.
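#As a quick illustrative check (base R only, not part of r2redux), the
#single-predictor R2 values reported above are simply squared correlations
#with the outcome:
#cor(dat[,1], dat[,2])^2   #close to output$rsq1 (merged_predictor2)
#cor(dat[,1], dat[,3])^2   #close to output$rsq2 (merged_predictor1)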
#For the above non-nested scenario, alternatively, the outcome variable (y)
#can be preadjusted with covariate(s), following the procedure in R:
#mod <- lm(y ~ any_cov1 + any_cov2 + ... + any_covN)
#y_adj=scale(mod$residuals)
#R
#dat=dat1
#mod <- lm(dat$V1~dat$V7+ dat$V8+ dat$V9+ dat$V10+ dat$V11)
#y_adj=scale(mod$residuals)
#dat=data.frame(y_adj,dat$V3,dat$V2)
#the comparison can be equivalently expressed as:
#y_adj = PRS1 + e vs.
#y_adj = PRS2 + e
#then the comparative significance test can be approximated by using the r2_diff function
#To get the test statistics for the difference between R2(y~x[,1]) and
#R2(y~x[,2]) (here x[,1]= PRS2 (dat$V3) and x[,2]= PRS1 (dat$V2), matching
#the column order of dat constructed above)
#v1=c(1)
#v2=c(2)
#output=r2_diff(dat,v1,v2,nv)
#str(output)
#List of 11
#$ rsq1 : num 5.16e-05
#$ rsq2 : num 4.63e-05
#$ var1 : num 2.21e-06
#$ var2 : num 2.18e-06
#$ var_diff : num 1.31e-09
#$ r2_based_p : num 0.884
#$ r2_based_p_one_tail: num 0.442
#$ mean_diff : num 5.28e-06
#$ upper_diff : num 7.63e-05
#$ lower_diff : num -6.57e-05
#$ p :List of 3
#..$ nested : num 0.942
#..$ nonnested: num 0.884
#..$ LRT : num 0.942
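#As in the earlier non-nested example, PRS1 and PRS2 are not nested within
#each other here, so output$p$nonnested (0.884) should be used rather than
#output$p$nested (0.942).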