node_conditional_prob {simDAG} | R Documentation |
Simulate a Node Using Conditional Probabilities
Description
This function can be used to generate dichotomous or categorical variables that depend on one or more categorical variables, where the probability of occurrence in each stratum defined by those variables is known.
Usage
node_conditional_prob(data, parents, probs, default_probs=NULL,
default_val=NA, labels=NULL,
coerce2factor=FALSE, check_inputs=TRUE)
Arguments
data |
A |
parents |
A character vector specifying the names of the parents that this particular child node has. |
probs |
A named list where each element corresponds to one stratum defined by parents. If only one name is given in |
default_probs |
If not all possible strata of |
default_val |
Value of the produced variable in strata that are not included in the |
labels |
A vector of labels for the generated output. If |
coerce2factor |
A single logical value specifying whether to return the drawn events as a factor or not. |
check_inputs |
A single logical value specifying whether input checks should be performed or not. Set to |
Details
Utilizing the user-defined discrete probability distribution in each stratum of parents (supplied using the probs argument), this function simply calls either the rbernoulli or the rcategorical function.
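The stratum lookup described above can be sketched in base R. This is only an illustration under stated assumptions, not the simDAG implementation: the toy data, the use of interaction() to build stratum labels, and the use of rbinom() in place of rbernoulli are all choices made for this sketch. The naming scheme of probs (parent values joined by a dot) follows the one used in the Examples section.

```r
# Illustrative base-R sketch; NOT the internal code of node_conditional_prob.
set.seed(42)

# toy parent data: one categorical and one logical parent (assumed for the demo)
data <- data.frame(
  sex   = sample(c("male", "female"), size = 1000, replace = TRUE),
  chemo = sample(c(TRUE, FALSE), size = 1000, replace = TRUE)
)

# one success probability per stratum, named <sex>.<chemo>
probs <- list(male.FALSE = 0.5, male.TRUE = 0.8,
              female.FALSE = 0.1, female.TRUE = 0.3)

# stratum label of every row, e.g. "male.TRUE"
strata <- as.character(interaction(data$sex, data$chemo, sep = "."))

# look up the probability of each row and draw one Bernoulli trial per row
p <- unlist(probs)[strata]
A <- rbinom(n = nrow(data), size = 1, prob = p)

# empirical event rate per stratum, to compare against probs
round(tapply(A, strata, mean), 2)
```

With 1000 rows, the empirical rates per stratum should lie close to the values supplied in probs, up to sampling error.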
Formal Description:
Formally, the data generation process can be described as a series of conditional equations. For example, suppose that there is just one parent node sex with the levels male and female, and that the goal is to create a binary outcome that has a probability of occurrence of 0.5 for males and 0.7 for females. The conditional equation is then:
Y \sim Bernoulli(p),
where:
p = \begin{cases}
0.5, & \text{if } \texttt{sex="male"} \\
0.7, & \text{if } \texttt{sex="female"} \\
\end{cases},
and Bernoulli(p) is the Bernoulli distribution with success probability p. If the outcome has more than two categories, the Bernoulli distribution would be replaced by Multinomial(p), with p being replaced by a matrix of class probabilities. If there is more than one parent variable, the conditional distribution would be stratified by the intersection of all subgroups defined by those variables.
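For an outcome with more than two categories, the matrix of class probabilities mentioned above can be illustrated with a short base-R sketch. It is a hedged example, not the way simDAG draws the values internally: the toy sex vector, the helper names p_mat and Y, and the coding of the classes as 1, 2, 3 are assumptions made here for illustration.

```r
# Illustrative base-R sketch; NOT the internal code of rcategorical.
set.seed(42)

# toy parent variable (assumed for the demo)
sex <- sample(c("male", "female"), size = 1000, replace = TRUE)

# class probabilities per stratum; each vector sums to 1
probs <- list(male = c(0.5, 0.2, 0.3), female = c(0.8, 0.1, 0.1))

# matrix with one row of class probabilities per observation
p_mat <- do.call(rbind, probs[sex])

# draw exactly one class per row according to that row's probabilities
Y <- vapply(
  seq_len(nrow(p_mat)),
  function(i) sample.int(ncol(p_mat), size = 1, prob = p_mat[i, ]),
  integer(1)
)

# empirical class distribution per stratum, to compare against probs
round(prop.table(table(sex, Y), margin = 1), 2)
```

Each row of the resulting table estimates the class probabilities of one stratum, so it should approximately reproduce the corresponding vector in probs.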
Value
Returns a numeric vector of length nrow(data).
Author(s)
Robin Denz
See Also
empty_dag, node, node_td, sim_from_dag, sim_discrete_time
Examples
library(simDAG)
set.seed(42)
#### two classes, one parent node ####
# define conditional probs
probs <- list(male=0.5, female=0.8)
# define DAG
dag <- empty_dag() +
  node("sex", type="rcategorical", labels=c("male", "female"),
       coerce2factor=TRUE, probs=c(0.5, 0.5)) +
  node("chemo", type="rbernoulli", p=0.5) +
  node("A", type="conditional_prob", parents="sex", probs=probs)
# generate data
data <- sim_from_dag(dag=dag, n_sim=1000)
#### three classes, one parent node ####
# define conditional probs
probs <- list(male=c(0.5, 0.2, 0.3), female=c(0.8, 0.1, 0.1))
# define DAG
dag <- empty_dag() +
  node("sex", type="rcategorical", labels=c("male", "female"),
       coerce2factor=TRUE, probs=c(0.5, 0.5)) +
  node("chemo", type="rbernoulli", p=0.5) +
  node("A", type="conditional_prob", parents="sex", probs=probs)
# generate data
data <- sim_from_dag(dag=dag, n_sim=1000)
#### two classes, two parent nodes ####
# define conditional probs
probs <- list(male.FALSE=0.5,
              male.TRUE=0.8,
              female.FALSE=0.1,
              female.TRUE=0.3)
# define DAG
dag <- empty_dag() +
  node("sex", type="rcategorical", labels=c("male", "female"),
       coerce2factor=TRUE, probs=c(0.5, 0.5)) +
  node("chemo", type="rbernoulli", p=0.5) +
  node("A", type="conditional_prob", parents=c("sex", "chemo"), probs=probs)
# generate data
data <- sim_from_dag(dag=dag, n_sim=1000)
#### three classes, two parent nodes ####
# define conditional probs
probs <- list(male.FALSE=c(0.5, 0.1, 0.4),
              male.TRUE=c(0.8, 0.1, 0.1),
              female.FALSE=c(0.1, 0.7, 0.2),
              female.TRUE=c(0.3, 0.4, 0.3))
# define DAG
dag <- empty_dag() +
  node("sex", type="rcategorical", labels=c("male", "female"),
       coerce2factor=TRUE, probs=c(0.5, 0.5)) +
  node("chemo", type="rbernoulli", p=0.5) +
  node("A", type="conditional_prob", parents=c("sex", "chemo"), probs=probs)
# generate data
data <- sim_from_dag(dag=dag, n_sim=1000)