surv_fit {survminer} | R Documentation |
Create Survival Curves
Description
Wrapper around the standard survfit() function to create survival curves. Compared to the standard survfit() function, it also supports:
a list of data sets and/or a list of formulas,
grouped data sets as generated by the function surv_group_by(),
the group.by option.
This function is useful in several cases:
-
Case 1: One formula and One data set. Example: You want to fit the survival curves of one biomarker/gene in a given data set. This is the same as the standard survfit() function. Returns one survfit object.
-
Case 2: List of formulas and One data set. Example: You want to fit the survival curves of a list of biomarkers/genes in the same data set. Returns a named list of survfit objects in the same order as the formulas.
-
Case 3: One formula and List of data sets. Example: You want to fit survival curves of one biomarker/gene in multiple cohorts of patients (colon, lung, breast). Returns a named list of survfit objects in the same order as the data sets.
-
Case 4: List of formulas and List of data sets. Example: You want to fit survival curves of multiple biomarkers/genes in multiple cohorts of patients (colon, lung, breast). Each formula is applied to each data set in the data list. Returns a named list of survfit objects.
-
Case 5: One formula and a data set grouped by one or two variables. Example: You want to plot the survival curves of patients treated with drug A vs patients treated with drug B in a data set grouped by TP53 and/or RAS mutations. In this case use the argument group.by. Returns a named list of survfit objects.
-
Case 6: In rare cases you might have a list of formulas and a list of data sets, and want to apply each formula to the matching data set at the same index/position in the list. For example, formula1 is applied to data 1, formula2 is applied to data 2, and so on. In this case the formula and data lists should have the same length and you should specify the argument match.fd = TRUE (which stands for "match formula and data"). Returns a named list of survfit objects.
The output of the surv_fit() function can be passed directly to other survminer functions, such as surv_pvalue() (used in the examples below). These functions return one element or a list of elements depending on the format of the input.
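As a rough illustration of the one-element-versus-list behaviour described above, the sketch below passes a single fit and a list of fits to surv_pvalue(). It assumes the survival and survminer packages are installed and uses the colon data set shipped with survival; the object names single_fit, multi_fit, pval_single and pval_multi are illustrative only.

```r
library("survival")
library("survminer")

# One formula, one data set: a single survfit object
single_fit <- surv_fit(Surv(time, status) ~ sex, data = colon)

# Two formulas, one data set: a named list of survfit objects
multi_fit <- surv_fit(
  list(sex = Surv(time, status) ~ sex,
       rx  = Surv(time, status) ~ rx),
  data = colon
)

# surv_pvalue() accepts either form of input
pval_single <- surv_pvalue(single_fit)
pval_multi  <- surv_pvalue(multi_fit)

# A survfit object is itself a list internally, so test the class
# rather than is.list() to tell the two cases apart
inherits(single_fit, "survfit")
inherits(multi_fit, "survfit")
```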
Usage
surv_fit(formula, data, group.by = NULL, match.fd = FALSE, ...)
Arguments
formula |
survival formula. See survfit.formula. Can be a list of formulas. Named lists are recommended. |
data |
a data frame in which to interpret the variables named in the formula. Can be a list of data sets. Named lists are recommended. Can also be a grouped data set as generated by the function surv_group_by(). |
group.by |
the grouping variables to group the data set by. A character vector containing the names of the grouping variables. Should be of length <= 2. |
match.fd |
logical value. Default is FALSE. Stands for "match formula and data". Useful only when you have a list of formulas and a list of data sets, and you want to apply each formula to the matching data set at the same index/position in the list. For example, formula1 is applied to data 1, formula2 is applied to data 2, and so on. In this case use match.fd = TRUE. |
... |
Other arguments passed to the survfit.formula function. |
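The group.by argument accepts up to two variable names. The following is a minimal sketch of grouping by two variables at once, assuming the survival and survminer packages are installed; it uses the colon data set from survival, and the choice of "rx" and "adhere" as grouping variables is purely illustrative.

```r
library("survival")
library("survminer")

# Group the colon data by treatment (rx) and adherence to nearby
# organs (adhere), then fit sex-stratified curves in each subset
grouped_fits <- surv_fit(
  Surv(time, status) ~ sex,
  data     = colon,
  group.by = c("rx", "adhere")
)

# One survfit object per combination of grouping levels
length(grouped_fits)
names(grouped_fits)
```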
Value
Returns an object of class survfit if one formula and one data set are provided.
Returns a named list of survfit objects when the input is a list of formulas and/or data sets. The same holds true when grouped data sets are provided or when the argument group.by is specified.
If the names of the formula and data lists are available, the names of the resulting list of survfit objects are obtained by collapsing the names of the formula and data lists.
If the formula names are not available, the variables in the formulas are extracted and used to build the names of the survfit objects.
In the case of grouped data sets, the names of the list of survfit objects are obtained by collapsing the levels of the grouping variables and the names of the variables in the survival curve formulas.
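To see how the list names are built in practice, one can inspect names() on the result. The sketch below assumes the survival and survminer packages are installed and splits the colon data set by its etype column (recurrence vs death events) to obtain two named data sets; the list names recurrence, death, sex and rx are illustrative choices.

```r
library("survival")
library("survminer")

# Named list of formulas
named_formulas <- list(
  sex = Surv(time, status) ~ sex,
  rx  = Surv(time, status) ~ rx
)

# Named list of data sets: colon records one row per event type
named_data <- list(
  recurrence = subset(colon, etype == 1),
  death      = subset(colon, etype == 2)
)

# Each formula is applied to each data set (match.fd = FALSE by default)
named_fits <- surv_fit(named_formulas, data = named_data)

# The names combine the data set names and the formula names
names(named_fits)
```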
Examples
library("survival")
library("survminer")
library("magrittr")
# Case 1: One formula and One data set
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- surv_fit(Surv(time, status) ~ sex,
                data = colon)
surv_pvalue(fit)
# Case 2: List of formulas and One data set.
# - Different formulas are applied to the same data set
# - Returns a (named) list of survfit objects
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Create a named list of formulas
formulas <- list(
  sex = Surv(time, status) ~ sex,
  rx = Surv(time, status) ~ rx
)
# Fit survival curves for each formula
fit <- surv_fit(formulas, data = colon)
surv_pvalue(fit)
# Case 3: One formula and List of data sets
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
fit <- surv_fit(Surv(time, status) ~ sex,
                data = list(colon, lung))
surv_pvalue(fit)
# Case 4: List of formulas and List of data sets
# - Each formula is applied to each of the data in the data list
# - argument: match.fd = FALSE
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# Create two data sets
set.seed(123)
colon1 <- dplyr::sample_frac(colon, 1/2)
set.seed(1234)
colon2 <- dplyr::sample_frac(colon, 1/2)
# Create a named list of formulas
formula.list <- list(
  sex = Surv(time, status) ~ sex,
  adhere = Surv(time, status) ~ adhere,
  rx = Surv(time, status) ~ rx
)
# Fit survival curves
fit <- surv_fit(formula.list, data = list(colon1, colon2),
                match.fd = FALSE)
surv_pvalue(fit)
# Case 5: One formula and a grouped data set (grouped survfit)
#:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
# - Group by the treatment "rx" and fit survival curves on each subset
# - Returns a list of survfit objects
fit <- surv_fit(Surv(time, status) ~ sex,
                data = colon, group.by = "rx")
# Alternatively, do this
fit <- colon %>%
  surv_group_by("rx") %>%
  surv_fit(Surv(time, status) ~ sex, data = .)
surv_pvalue(fit)
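The examples above do not cover Case 6 (match.fd = TRUE). A minimal sketch of that case follows, assuming the survival and survminer packages are installed. The two data sets are obtained by splitting colon on its etype column, and the pairing of formulas with data sets is arbitrary and only for illustration.

```r
library("survival")
library("survminer")

# Case 6: List of formulas and List of data sets, matched by position
# - formula 1 is applied to data set 1, formula 2 to data set 2
# - both lists must have the same length
matched_formulas <- list(
  sex = Surv(time, status) ~ sex,
  rx  = Surv(time, status) ~ rx
)
matched_data <- list(
  recurrence = subset(colon, etype == 1),
  death      = subset(colon, etype == 2)
)

matched_fits <- surv_fit(matched_formulas, data = matched_data,
                         match.fd = TRUE)

# One fit per formula/data pair, rather than one per combination
length(matched_fits)
```

With match.fd = FALSE the same inputs would instead apply each formula to each data set, as in Case 4.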