R: Define splits for cross-fitting

make_splits {tidyhte}

R Documentation

Define splits for cross-fitting

Description

This takes a dataset, a column with a unique identifier, and an arbitrary number of covariates on which to stratify the splits. It returns the original dataset with an additional column, .split_id, which identifies the split each row belongs to.
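The idea of assigning splits at the level of the unique identifier can be sketched in base R. This is only an illustration, not the code tidyhte runs: `assign_splits` and the toy data frame are hypothetical names introduced here, and the sketch assumes no stratification variables are supplied.

```r
# Hypothetical helper, not part of tidyhte: assign each unique identifier
# to one of `num_splits` folds of near-equal size.
assign_splits <- function(data, id_col, num_splits) {
  ids <- unique(data[[id_col]])
  # Balanced fold labels (1, 2, ..., num_splits, 1, 2, ...), then shuffled.
  folds <- sample(rep(seq_len(num_splits), length.out = length(ids)))
  # Every row inherits the fold of its identifier.
  data$.split_id <- folds[match(data[[id_col]], ids)]
  data
}

set.seed(1)
# Toy data: 10 units with 2 rows each.
toy <- data.frame(unitid = rep(1:10, each = 2), y = rnorm(20))
out <- assign_splits(toy, "unitid", num_splits = 4)
table(out$.split_id)
```

Because fold labels are drawn once per identifier, repeated rows for the same unit always land in the same split, which is what cross-fitting needs to avoid leakage between folds.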

Usage

make_splits(data, identifier, ..., .num_splits)

Arguments

`data`	dataframe
`identifier`	Unquoted name of unique identifier column
`...`	variables on which to stratify (requires that `quickblock` be installed)
`.num_splits`	number of splits to create. If VIMP is requested in `QoI_cfg`, this must be an even number.
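To illustrate what stratifying the splits on a covariate achieves, here is a simplified base R sketch. It is an assumption-laden stand-in: tidyhte relies on `quickblock` for stratification, whereas this hypothetical `stratified_splits` helper simply balances fold labels within the exact levels of a single categorical column.

```r
# Hypothetical helper for illustration only: balanced fold labels
# within each level of one stratification column.
stratified_splits <- function(data, strata_col, num_splits) {
  data$.split_id <- ave(
    seq_len(nrow(data)),
    data[[strata_col]],
    # Within a stratum, shuffle a balanced vector of fold labels.
    FUN = function(idx) sample(rep(seq_len(num_splits), length.out = length(idx)))
  )
  data
}

set.seed(2)
# Toy data: 12 units, 4 in each of three strata.
toy_strata <- data.frame(unitid = 1:12, species = rep(c("a", "b", "c"), each = 4))
out_strata <- stratified_splits(toy_strata, "species", num_splits = 2)
# Each stratum contributes the same number of units to every split.
table(out_strata$species, out_strata$.split_id)
```

With two splits and four units per stratum, each stratum places two units in each split, so the covariate distribution is the same across splits.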

Details

To see an example analysis, read vignette("experimental_analysis") in the context of an experiment, vignette("observational_analysis") for an observational study, or vignette("methodological_details") for a deeper dive under the hood.

Value

original dataframe with additional .split_id column

Examples

library("dplyr")
library("tidyhte")
if (require("palmerpenguins")) {
  # Load the penguins data and add the columns the analysis needs
  data("penguins", package = "palmerpenguins")
  penguins$unitid <- seq_len(nrow(penguins))
  penguins$propensity <- rep(0.5, nrow(penguins))
  penguins$treatment <- rbinom(nrow(penguins), 1, penguins$propensity)
  # Configure a known propensity score and a GLM outcome model; skip VIMP
  cfg <- basic_config() %>%
    add_known_propensity_score("propensity") %>%
    add_outcome_model("SL.glm.interaction") %>%
    remove_vimp()
  # Attach the config, create four splits, then run the estimation pipeline
  attach_config(penguins, cfg) %>%
    make_splits(unitid, .num_splits = 4) %>%
    produce_plugin_estimates(outcome = body_mass_g, treatment = treatment, species, sex) %>%
    construct_pseudo_outcomes(body_mass_g, treatment) %>%
    estimate_QoI(species, sex)
}

[Package tidyhte version 1.0.2 Index]