ple_train {StratifiedMedicine} | R Documentation |
Patient-level Estimates: Train Model
Description
Wrapper function to train a patient-level estimate (ple) model. Used directly in PRISM and can be used to directly fit a ple model by name.
Usage
ple_train(
Y,
A,
X,
Xtest = NULL,
family = "gaussian",
propensity = FALSE,
ple = "ranger",
meta = ifelse(family == "survival", "T-learner", "X-learner"),
hyper = NULL,
tau = NULL,
...
)
Arguments
Y |
The outcome variable. Must be numeric or survival (ex; Surv(time,cens) ) |
A |
Treatment variable. (Default supports binary treatment, either numeric or factor). "ple_train" accomodates >2 along with binary treatments. |
X |
Covariate space. |
Xtest |
Test set. Default is NULL (no test predictions). Variable types should match X. |
family |
Outcome type. Options include "gaussion" (default), "binomial", and "survival". |
propensity |
Propensity score estimation, P(A=a|X). Default=FALSE which use the marginal estimates, P(A=a) (applicable for RCT data). If TRUE, will use the "ple" base learner to estimate P(A=a|X). |
ple |
Base-learner used to estimate patient-level equantities, such as the conditional average treatment effect (CATE), E(Y|A=1,X)-E(Y|A=0, X) = CATE(X). Default is random based based through "ranger". "None" uses no ple. See below for details on estimating the treatment contrasts. |
meta |
Using the ple model as a base learner, meta-learners can be used for estimating patient-level treatment differences. Options include "T-learner" (treatment specific models), "S-learner" (single model), and "X-learner". For family="gaussian" & "binomial", the default is "X-learner", which uses a two-stage regression approach (See Kunzel et al 2019). For "survival", the default is "T-learner". "X-learner" is currently not supported for survival outcomes. |
hyper |
Hyper-parameters for the ple model (must be list). Default is NULL. |
tau |
Maximum follow-up time for RMST based estimates (family="survival"). Default=NULL, which takes min(max(time[a])), for a=1,..,A. |
... |
Any additional parameters, not currently passed through. |
Details
ple_train uses base-learners along with a meta-learner to obtain patient-level
estimates under different treatment exposures (see Kunzel et al). For family="gaussian"
or "binomial", output estimates of \mu(a,x)=E(Y|x,a)
and treatment differences
(average treatment effect or risk difference). For survival, either logHR based estimates
or RMST based estimates can be obtained. Current base-learner ("ple") options include:
1. linear: Uses either linear regression (family="gaussian"), logistic regression (family="binomial"), or cox regression (family="survival"). No hyper-parameters.
2. ranger: Uses random forest ("ranger" R package). The default hyper-parameters are: hyper = list(mtry=NULL, min.node.pct=0.10)
where mtry is number of randomly selected variables (default=NULL; sqrt(dim(X))) and min.node.pct is the minimum node size as a function of the total data size (ex: min.node.pct=10% requires at least 10
3. glmnet: Uses elastic net ("glmnet" R package). The default hyper-parameters are: hyper = list(lambda="lambda.min")
where lambda controls the penalty parameter for predictions. lambda="lambda.1se" will likely result in a less complex model.
4. bart: Uses bayesian additive regression trees (Chipman et al 2010; BART R package). Default hyper-parameters are:
hyper = list(sparse=FALSE)
where sparse controls whether to perform variable selection based on a sparse Dirichlet prior rather than simply uniform.
Value
Trained ple models and patient-level estimates for train/test sets.
mod - trained model(s)
mu_train - Patient-level estimates (training set)
mu_test - Patient-level estimates (test set)
References
Wright, M. N. & Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw 77:1-17. doi: 10.18637/jss.v077.i01
Friedman, J., Hastie, T. and Tibshirani, R. (2008) Regularization Paths for Generalized Linear Models via Coordinate Descent, https://web.stanford.edu/~hastie/Papers/glmnet.pdf Journal of Statistical Software, Vol. 33(1), 1-22 Feb 2010 Vol. 33(1), 1-22 Feb 2010.
Chipman, H., George, E., and McCulloch R. (2010) Bayesian Additive Regression Trees. The Annals of Applied Statistics, 4,1, 266-298
Kunzel S, Sekhon JS, Bickel PJ, Yu B. Meta-learners for Estimating Hetergeneous Treatment Effects using Machine Learning. 2019.
See Also
Examples
library(StratifiedMedicine)
## Continuous ##
dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A
# X-Learner (With ranger based learners)
mod1 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="ranger", method="X-learner")
summary(mod1$mu_train)
# T-Learner (Treatment specific)
mod2 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="ranger", method="T-learner")
summary(mod2$mu_train)
mod3 = ple_train(Y=Y, A=A, X=X, Xtest=X, ple="bart", method="X-learner")
summary(mod3$mu_train)