node_gaussian {simDAG} | R Documentation |
Simulate a Node Using Linear Regression
Description
Data from the parents is used to generate the node using linear regression: the covariate-specific mean is predicted, and the node is then sampled from a normal distribution with that mean and a specified standard deviation.
Usage
node_gaussian(data, parents, formula=NULL, betas, intercept, error)
Arguments
data |
A |
parents |
A character vector specifying the names of the parents that this particular child node has. If non-linear combinations or interaction effects should be included, the user may specify the |
formula |
An optional |
betas |
A numeric vector with length equal to |
intercept |
A single number specifying the intercept that should be used when generating the node. |
error |
A single number specifying the standard deviation of the error term that should be used when generating the node. |
Details
Using the general linear regression equation, the value expected under the model is computed for every observation in the dataset generated thus far. We could stop here, but this would create a perfect fit for the node, which is unrealistic. Instead, we add an error term by drawing one sample per observation from a normal distribution with mean zero and standard deviation error. This error term is then added to the predicted mean.
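The two steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual implementation: the helper name sim_gaussian_sketch and the example data are hypothetical, and the sketch handles plain parent columns only (no formula).

```r
# Hypothetical helper, NOT part of simDAG: a base-R sketch of the two steps.
sim_gaussian_sketch <- function(data, parents, betas, intercept, error) {
  stopifnot(length(betas) == length(parents))

  # Step 1: covariate-specific mean from the linear predictor
  x <- as.matrix(data[, parents, drop = FALSE])
  mu <- intercept + as.vector(x %*% betas)

  # Step 2: add one draw from N(0, error) per observation
  mu + rnorm(n = nrow(data), mean = 0, sd = error)
}

# Assumed example data; the column names and distributions are illustrative.
set.seed(42)
dat <- data.frame(age = rnorm(100, mean = 50, sd = 4),
                  sex = rbinom(100, size = 1, prob = 0.5))

bmi <- sim_gaussian_sketch(dat, parents = c("sex", "age"),
                           betas = c(1.1, 0.4), intercept = 12, error = 2)
length(bmi)
```

Setting error = 0 in this sketch returns the predicted means themselves, which is the "perfect fit" case the text warns about.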
Formal Description:
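Formally, the data generation can be described as:
$$Y \sim \texttt{intercept} + \texttt{parents}_1 \cdot \texttt{betas}_1 + \ldots + \texttt{parents}_n \cdot \texttt{betas}_n + N(0, \texttt{error}),$$
where $N(0, \texttt{error})$ denotes the normal distribution with mean 0 and a standard deviation of error and $n$ is the number of parents (length(parents)).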
For example, given intercept=-15, parents=c("A", "B"), betas=c(0.2, 1.3) and error=2 the data generation process is defined as:
$$Y \sim -15 + A \cdot 0.2 + B \cdot 1.3 + N(0, 2).$$
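As a rough check of this example, the same process can be written out by hand in base R and the parameters recovered with lm(). This is a sketch only: the distributions chosen for the parents A and B and the sample size are assumptions made for illustration and are not part of the documented example.

```r
set.seed(1)
n <- 10000

# Assumed parent distributions (illustrative only)
A <- rnorm(n)
B <- rnorm(n)

# intercept = -15, betas = c(0.2, 1.3), error = 2
Y <- -15 + 0.2 * A + 1.3 * B + rnorm(n, mean = 0, sd = 2)

# A linear model fitted to the simulated data should give estimates close
# to the intercept and betas, and a residual SD close to error.
fit <- lm(Y ~ A + B)
coef(fit)
sigma(fit)
```

With a sample this large, the estimates should land near the values used for generation; smaller samples will show more scatter.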
Value
Returns a numeric vector of length nrow(data).
Author(s)
Robin Denz
See Also
empty_dag, node, node_td, sim_from_dag, sim_discrete_time
Examples
library(simDAG)
set.seed(12455432)

# define a DAG with two root nodes (age, sex) and one gaussian child node (bmi)
dag <- empty_dag() +
  node("age", type="rnorm", mean=50, sd=4) +
  node("sex", type="rbernoulli", p=0.5) +
  node("bmi", type="gaussian", parents=c("sex", "age"),
       betas=c(1.1, 0.4), intercept=12, error=2)

# simulate 100 observations from the DAG
sim_dat <- sim_from_dag(dag=dag, n_sim=100)