sim.data {flam}R Documentation

Simulate Data from a Variety of Functional Scenarios

Description

This function generates data according to the simulation scenarios considered in Section 5 and plotted in Figure 2 of Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391. Each scenario has four covariates that have some non-linear association with the outcome. There is the option to also generate a user-specified number of covariates that have no association with the outcome.

Usage

sim.data(n, scenario, zerof, noise = 1, family = "gaussian")

Arguments

n

number of observations.

scenario

simulation scenario to use. Options are 1, 2, 3, or 4, which correspond to the simulation scenarios of Section 5 in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391. Each scenario has four covariates.

zerof

number of noise covariates (those that have no relationship to the outcome) to include. This can be used to replicate the high-dimensional scenarios of Section 5 in Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391. The total number of covariates will be 4 + zerof.

noise

the variance of the errors. If family = "gaussian", the errors of observations are generated using MVN(0, noiseI) with default of 1, otherwise noise is not used if family="binomial".

family

the error distribution of observations (must be family="gaussian" or family="binomial"). If family = "gaussian", the errors of observations are generated using MVN(0, noiseI), otherwise observations are Bernoulli if family="binomial".

Value

x

n x p covariate matrix.

y

n-vector containing the outcomes for the n observations in x.

theta

n x p mean matrix used to generate y.

Author(s)

Ashley Petersen

References

Petersen, A., Witten, D., and Simon, N. (2014). Fused Lasso Additive Model. arXiv preprint arXiv:1409.5391.

Examples

#See ?'flam-package' for a full example of how to use this package

#generate data to fit FLAM model with squared-error loss
set.seed(1)
data <- sim.data(n = 50, scenario = 1, zerof = 10, noise = 1)
flam.out <- flam(x = data$x, y = data$y, family = "gaussian")

#alternatively, generate data for logistic FLAM model 
#note: 'noise' argument no longer needed
data2 <- sim.data(n = 50, scenario = 1, zerof = 0, family = "binomial")
flam.logistic.out <- flam(x = data2$x, y = data2$y, family = "binomial")


#vary generating functions
#choose large n because we want to plot generating functions
data1 <- sim.data(n = 500, scenario = 1, zerof = 0)
data2 <- sim.data(n = 500, scenario = 2, zerof = 0)
data3 <- sim.data(n = 500, scenario = 3, zerof = 0)
data4 <- sim.data(n = 500, scenario = 4, zerof = 0)
#and plot to see functional forms
par(mfrow=c(2,2))
col.vec = c("dodgerblue1","orange","seagreen1","hotpink")
for (i in 1:4) {
	if (i==1) data = data1 else if (i==2) data = data2 
		else if (i==3) data = data3 else data = data4
	plot(1,type="n",xlim=c(-2.5,2.5),ylim=c(-3,3),xlab=expression(x[j]),
		ylab=expression(f[j](x[j])),main=paste("Scenario ",i,sep=""))
	sapply(1:4, function(j) points(sort(data$x[,j]), 
		data$theta[order(data$x[,j]),j],col=col.vec[j],type="l",lwd=3))
}

#include large number of predictors that have no relationship to outcome
data <- sim.data(n = 50, scenario = 1, zerof = 100, noise = 1)

[Package flam version 3.2 Index]