node_binomial {simDAG} | R Documentation |
Simulate a Node Using Logistic Regression
Description
Data from the parents is used to generate the node using logistic regression by predicting the covariate specific probability of 1 and sampling from a Bernoulli distribution accordingly.
Usage
node_binomial(data, parents, formula=NULL, betas, intercept,
return_prob=FALSE, coerce2factor=FALSE,
coerce2numeric=FALSE, labels=NULL)
Arguments
data |
A |
parents |
A character vector specifying the names of the parents that this particular child node has. If non-linear combinations or interaction effects should be included, the user may specify the |
formula |
An optional |
betas |
A numeric vector with length equal to |
intercept |
A single number specifying the intercept that should be used when generating the node. |
return_prob |
Either |
coerce2factor |
Either |
coerce2numeric |
Either |
labels |
A character vector of length 2 or |
Details
Using the normal form of a logistic regression model, the observation specific event probability is generated for every observation in the dataset. Using the rbernoulli function, this probability is then used to take one Bernoulli sample for each observation in the dataset. If only the probability should be returned, return_prob should be set to TRUE.
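The two steps described above can be sketched in base R. This is only a minimal illustration under stated assumptions, not the package's implementation: the helper name sketch_node_binomial and the toy data are hypothetical, and rbinom() with size=1 stands in for the rbernoulli function mentioned above.

```r
# Hypothetical helper, for illustration only (not part of simDAG).
sketch_node_binomial <- function(data, parents, betas, intercept,
                                 return_prob=FALSE) {
  # linear predictor: intercept + sum of parent values times their betas
  lin_pred <- intercept + as.matrix(data[, parents, drop=FALSE]) %*% betas
  # map the linear predictor to an event probability (inverse logit)
  prob <- as.vector(plogis(lin_pred))
  if (return_prob) {
    return(prob)
  }
  # one Bernoulli draw per observation, returned as TRUE/FALSE
  rbinom(n=length(prob), size=1, prob=prob) == 1
}

# toy parent data (assumed values, not taken from the package)
set.seed(42)
toy <- data.frame(age=rnorm(10, mean=50, sd=4),
                  sex=rbinom(10, size=1, prob=0.5))

# probabilities only
p <- sketch_node_binomial(toy, parents=c("age", "sex"),
                          betas=c(0.05, 0.4), intercept=-3,
                          return_prob=TRUE)
# sampled events
y <- sketch_node_binomial(toy, parents=c("age", "sex"),
                          betas=c(0.05, 0.4), intercept=-3)
```

With return_prob=TRUE the sketch stops after the probability step, which mirrors the behaviour described for the return_prob argument.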
Formal Description:
Formally, the data generation can be described as:
Y \sim Bernoulli(logit^{-1}(\texttt{intercept} + \texttt{parents}_1 \cdot \texttt{betas}_1 + ... + \texttt{parents}_n \cdot \texttt{betas}_n)),
where Bernoulli(p) denotes one Bernoulli trial with success probability p, n is the number of parents (length(parents)) and logit^{-1}(x) is the inverse of the logit function logit(p) = ln(\frac{p}{1-p}), which maps the linear predictor to a probability:
logit^{-1}(x) = \frac{1}{1 + e^{-x}}.
For example, given intercept=-15, parents=c("A", "B") and betas=c(0.2, 1.3) the data generation process is defined as:
Y \sim Bernoulli(logit^{-1}(-15 + A \cdot 0.2 + B \cdot 1.3)).
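To make the example concrete, the event probability for a single observation can be evaluated by hand in base R. The parent values A=50 and B=1 are hypothetical and chosen only for illustration; plogis() is the base R function that maps a linear predictor to a probability in a logistic regression model.

```r
# Hypothetical parent values for a single observation (illustration only).
A <- 50
B <- 1

# linear predictor with intercept=-15 and betas=c(0.2, 1.3)
lin_pred <- -15 + A * 0.2 + B * 1.3   # -15 + 10 + 1.3 = -3.7

# event probability: 1 / (1 + exp(3.7)), roughly 0.024
prob <- plogis(lin_pred)

# one Bernoulli trial with that success probability
set.seed(1)
y <- rbinom(n=1, size=1, prob=prob) == 1
```

With an intercept this strongly negative, events are rare unless the parent values are large.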
Output Format:
By default this function returns a logical vector containing only TRUE and FALSE entries, where TRUE corresponds to an event and FALSE to no event. If those should be coded as 0/1 instead, the user can use the coerce2numeric argument. If they should be coded as a character with specific labels, the user can use the labels argument. To additionally output it as a factor, the user may use the coerce2factor argument. If both coerce2factor and coerce2numeric are set to TRUE, the result will be a factor. The last three arguments of this function are ignored if return_prob is set to TRUE.
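The output codings described above correspond to simple conversions of a logical vector, sketched here by hand in base R. This is not the function's internal code. The label values are hypothetical, and which label node_binomial assigns to TRUE is not stated in this section, so the mapping below is an assumption.

```r
y <- c(TRUE, FALSE, TRUE)   # default output: logical vector

# 0/1 coding, comparable to what coerce2numeric is described as producing
y_num <- as.numeric(y)

# character coding with two labels (hypothetical labels; the assignment of
# labels[2] to TRUE is an assumption made for this sketch)
labels <- c("no event", "event")
y_chr <- ifelse(y, labels[2], labels[1])

# factor coding, comparable to what coerce2factor is described as producing
y_fac <- factor(y_chr, levels=labels)
```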
Value
Returns a logical vector (or numeric vector if return_prob=TRUE) of length nrow(data).
Author(s)
Robin Denz
See Also
empty_dag, node, node_td, sim_from_dag, sim_discrete_time
Examples
library(simDAG)
set.seed(5425)
# define needed DAG
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("smoking", type="binomial", parents=c("age", "sex"),
       betas=c(1.1, 0.4), intercept=-2)
# simulate data from it
sim_dat <- sim_from_dag(dag=dag, n_sim=100)
# returning only the estimated probability instead
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("smoking", type="binomial", parents=c("age", "sex"),
       betas=c(1.1, 0.4), intercept=-2, return_prob=TRUE)
sim_dat <- sim_from_dag(dag=dag, n_sim=100)