R: Simulate twin data with control over A, C, and E parameters,...

umx_make_TwinData {umx}

R Documentation

Simulate twin data with control over A, C, and E parameters, as well as moderation of A.

Description

Makes MZ and DZ twin data, optionally with moderated A. By default, the three variance components must sum to 1.

See examples for how to use this: it is pretty flexible.

If you provide 2 varNames, they will be used for twin 1 and twin 2. If you provide one, it will be expanded to var_T1 and var_T2. note: the function was designed around nSib = 2 and var names = var_T1. It isn't yet smart enough to do, for instance scaling or shifting to make the min value 0 (normal for most traits we analyse) for nonstandard varNames and 'nSib“.

Note, if you want a power calculator, see power.ACE.test() and umxPower().

Usage

You must supply nMZpairs (you can omit nDZpairs). You can give any two of A, C, or E and the function deduces the missing parameter so A+C+E == 1.

Moderation

Univariate GxE Data To simulate data for umxGxE, offer up a list of the average, min and max values for AA, i.e., c(avg = .5, min = 0, max = 1).

umx_make_TwinData will return moderated data, with average value = avg, swinging down to min and up to max across 3-SDs of the moderator.

Bivariate GxE Data

To simulate data with a moderator that is not shared by both twins. Moderated heritability is specified via the bivariate relationship (AA, CC, EE) and two moderators in each component. AA = list(a11 = .4, a12 = .1, a22 = .15) CC = list(c11 = .2, c12 = .1, c22 = .10) EE = list(e11 = .4, e12 = .3, e22 = .25) Amod = list(Beta_a1 = .025, Beta_a2 = .025) Cmod = list(Beta_c1 = .025, Beta_c2 = .025) Emod = list(Beta_e1 = .025, Beta_e2 = .025)

Usage

umx_make_TwinData(
  nMZpairs,
  nDZpairs = nMZpairs,
  AA = NULL,
  CC = NULL,
  EE = NULL,
  DD = NULL,
  varNames = "var",
  MZr = NULL,
  DZr = MZr,
  nSib = 2,
  dzAr = 0.5,
  scale = FALSE,
  mean = 0,
  sd = 1,
  nThresh = NULL,
  sum2one = TRUE,
  bivAmod = NULL,
  bivCmod = NULL,
  bivEmod = NULL,
  seed = NULL,
  empirical = FALSE
)

Arguments

`nMZpairs`	Number of MZ pairs to simulate
`nDZpairs`	Number of DZ pairs to simulate (defaults to nMZpairs)
`AA`	value for A variance. NOTE: See options for use in GxE and Bivariate GxE
`CC`	value for C variance.
`EE`	value for E variance.
`DD`	value for E variance.
`varNames`	name for variables (defaults to 'var')
`MZr`	If MZr and DZr are set (default = NULL), the function returns dataframes of the request n and correlation.
`DZr`	Set to return dataframe using MZr and Dzr (Default NULL)
`nSib`	Number of siblings in a family (default = 2). "3" = extra sib.
`dzAr`	DZ Ar (default .5)
`scale`	Whether to scale output to var=1 mean=0 (Default FALSE)
`mean`	mean for traits (default = 0) (not applied to moderated cases)
`sd`	sd of traits (default = 1) (not applied to moderated cases)
`nThresh`	If supplied, use as thresholds and return mxFactor output? (default is not to)
`sum2one`	Whether to enforce AA + CC + EE summing the one (default = TRUE)
`bivAmod`	Used for Bivariate GxE data: list(Beta_a1 = .025, Beta_a2 = .025)
`bivCmod`	Used for Bivariate GxE data: list(Beta_c1 = .025, Beta_c2 = .025)
`bivEmod`	Used for Bivariate GxE data: list(Beta_e1 = .025, Beta_e2 = .025)
`seed`	Allows user to set.seed() if wanting reproducible dataset
`empirical`	Passed to mvrnorm

Value

list of mzData and dzData dataframes containing T1 and T2 plus, if needed M1 and M2 (moderator values)

References

https://github.com/tbates/umx, https://tbates.github.io

Examples

# =====================================================================
# = Basic Example, with all elements of std univariate data specified =
# =====================================================================
tmp = umx_make_TwinData(nMZpairs = 10000, AA = .30, CC = .00, EE = .70)
# Show dataframe with 20,000 rows and 3 variables: var_T1, var_T2, and zygosity
str(tmp)

# ===============================
# = How to consume the datasets =
# ===============================

mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
str(mzData); str(dzData); 
cov(mzData[, c("var_T1", "var_T2")])
cov(dzData[, c("var_T1", "var_T2")])
umxAPA(mzData[, c("var_T1", "var_T2")])

# Prefer to work in path coefficient values? (little a?)
tmp    = umx_make_TwinData(2000, AA = .7^2, CC = .0)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxACE(selDVs="var", sep="_T", mzData= mzData, dzData= dzData)

# Examine correlations
cor(mzData[,c("var_T1","var_T2")])
cor(dzData[,c("var_T1","var_T2")])

# Example with D (left un-modeled in ACE)
tmp = umx_make_TwinData(nMZpairs = 500, AA = .4, DD = .2, CC = .2)
m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ")
# |    |   a1|   c1|   e1|
# |:---|----:|----:|----:|
# |var | 0.86| 0.24| 0.45|

m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ", dzCr=.25)
# |    |  a1|d1 |   e1|
# |:---|---:|:--|----:|
# |var | 0.9|.  | 0.44|


# =============
# = Shortcuts =
# =============

# Omit nDZpairs (equal numbers of both by default)
tmp = umx_make_TwinData(100, AA = 0.5, CC = 0.3) # omit any one of A, C, or E (sums to 1)
cov(tmp[tmp$zygosity == "DZ", c("var_T1","var_T2")])

# Not limited to unit variance
tmp = umx_make_TwinData(100, AA = 3, CC = 2, EE = 3, sum2one = FALSE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

# Output can be scaled (mean=0, std=1)
tmp = umx_make_TwinData(100, AA = .7, CC = .1, scale = TRUE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

## Not run: 

# ===============
# = GxE Example =
# ===============

AA = c(avg = .5, min = .1, max = .8)
tmp = umx_make_TwinData(nMZpairs = 140, nDZpairs = 240, AA = AA, CC = .35, EE = .65, scale= TRUE)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxGxE(selDVs = "var", selDefs = "M", sep = "_T", mzData = mzData, dzData = dzData)

# =====================
# = Threshold Example =
# =====================
tmp = umx_make_TwinData(100, AA = .6, CC = .2, nThresh = 3)
str(tmp)
umx_polychoric(subset(tmp, zygosity=="MZ", c("var_T1", "var_T2")))$polychorics
# Running model with 7 parameters
#           var_T1    var_T2
# var_T1 1.0000000 0.7435457
# var_T2 0.7435457 1.0000000


# =================================================
# = Just use MZr and DZr (also works with nSib>2) =
# =================================================
tmp = umx_make_TwinData(100, MZr = .86, DZr = .60, nSib= 3, varNames = "IQ")
umxAPA(subset(tmp, zygosity == "MZ", paste0("IQ_T", 1:2)))
umxAPA(subset(tmp, zygosity == "DZ", paste0("IQ_T", 1:2)))
m1 = umxACE(selDVs= "IQ", data = tmp)
m1 = umxACE(selDVs= "IQ", data = tmp, nSib=3)
# TODO tmx_ examples of unmodeled D etc.

# Bivariate GxSES example (see umxGxEbiv)

AA   = list(a11 = .4, a12 = .1, a22 = .15)
CC   = list(c11 = .2, c12 = .1, c22 = .10)
EE   = list(e11 = .4, e12 = .3, e22 = .25)
Amod = list(Beta_a1 = .025, Beta_a2 = .025)
Cmod = list(Beta_c1 = .025, Beta_c2 = .025)
Emod = list(Beta_e1 = .025, Beta_e2 = .025)
tmp = umx_make_TwinData(5000, AA =AA, CC = CC, EE = EE, 
			bivAmod = Amod, bivCmod =Cmod, bivEmod =Emod)
str(tmp)
# 'data.frame':	10000 obs. of  7 variables:
#  $ defM_T1 : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ defM_T2 : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ M_T1    : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ var_T1  : num  0.011 0.1045 0.5861 0.0583 1.0225 ...
#  $ M_T2    : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ var_T2  : num  -0.502 -0.856 -0.154 0.065 -0.268 ...
#  $ zygosity: Factor w/ 2 levels "MZ","DZ": 1 1 1 1 1 1 1 1 1 1 ...

# TODO tmx example showing how moderation of A introduces heteroscedasticity in a regression model:
# More residual variance at one extreme of the x axis (moderator) 
# m1 = lm(var_T1~ M_T1, data = x); 
# x = rbind(tmp[[1]], tmp[[2]])
# plot(residuals(m1)~ x$M_T1, data=x)

## End(Not run)

[Package umx version 4.20.0 Index]