umx_make_TwinData {umx}R Documentation

Simulate twin data with control over A, C, and E parameters, as well as moderation of A.

Description

Makes MZ and DZ twin data, optionally with moderated A. By default, the three variance components must sum to 1.

See examples for how to use this: it is pretty flexible.

If you provide 2 varNames, they will be used for twin 1 and twin 2. If you provide one, it will be expanded to var_T1 and var_T2. note: the function was designed around nSib = 2 and var names = var_T1. It isn't yet smart enough to do, for instance scaling or shifting to make the min value 0 (normal for most traits we analyse) for nonstandard varNames and 'nSib“.

Note, if you want a power calculator, see power.ACE.test() and umxPower().

Usage

You must supply nMZpairs (you can omit nDZpairs). You can give any two of A, C, or E and the function deduces the missing parameter so A+C+E == 1.

Moderation

Univariate GxE Data To simulate data for umxGxE, offer up a list of the average, min and max values for AA, i.e., c(avg = .5, min = 0, max = 1).

umx_make_TwinData will return moderated data, with average value = avg, swinging down to min and up to max across 3-SDs of the moderator.

Bivariate GxE Data

To simulate data with a moderator that is not shared by both twins. Moderated heritability is specified via the bivariate relationship (AA, CC, EE) and two moderators in each component. AA = list(a11 = .4, a12 = .1, a22 = .15) CC = list(c11 = .2, c12 = .1, c22 = .10) EE = list(e11 = .4, e12 = .3, e22 = .25) Amod = list(Beta_a1 = .025, Beta_a2 = .025) Cmod = list(Beta_c1 = .025, Beta_c2 = .025) Emod = list(Beta_e1 = .025, Beta_e2 = .025)

Usage

umx_make_TwinData(
  nMZpairs,
  nDZpairs = nMZpairs,
  AA = NULL,
  CC = NULL,
  EE = NULL,
  DD = NULL,
  varNames = "var",
  MZr = NULL,
  DZr = MZr,
  nSib = 2,
  dzAr = 0.5,
  scale = FALSE,
  mean = 0,
  sd = 1,
  nThresh = NULL,
  sum2one = TRUE,
  bivAmod = NULL,
  bivCmod = NULL,
  bivEmod = NULL,
  seed = NULL,
  empirical = FALSE
)

Arguments

nMZpairs

Number of MZ pairs to simulate

nDZpairs

Number of DZ pairs to simulate (defaults to nMZpairs)

AA

value for A variance. NOTE: See options for use in GxE and Bivariate GxE

CC

value for C variance.

EE

value for E variance.

DD

value for E variance.

varNames

name for variables (defaults to 'var')

MZr

If MZr and DZr are set (default = NULL), the function returns dataframes of the request n and correlation.

DZr

Set to return dataframe using MZr and Dzr (Default NULL)

nSib

Number of siblings in a family (default = 2). "3" = extra sib.

dzAr

DZ Ar (default .5)

scale

Whether to scale output to var=1 mean=0 (Default FALSE)

mean

mean for traits (default = 0) (not applied to moderated cases)

sd

sd of traits (default = 1) (not applied to moderated cases)

nThresh

If supplied, use as thresholds and return mxFactor output? (default is not to)

sum2one

Whether to enforce AA + CC + EE summing the one (default = TRUE)

bivAmod

Used for Bivariate GxE data: list(Beta_a1 = .025, Beta_a2 = .025)

bivCmod

Used for Bivariate GxE data: list(Beta_c1 = .025, Beta_c2 = .025)

bivEmod

Used for Bivariate GxE data: list(Beta_e1 = .025, Beta_e2 = .025)

seed

Allows user to set.seed() if wanting reproducible dataset

empirical

Passed to mvrnorm

Value

References

See Also

Other Twin Data functions: umx_long2wide(), umx_make_twin_data_nice(), umx_residualize(), umx_scale_wide_twin_data(), umx_wide2long(), umx

Other Data Functions: noNAs(), prolific_anonymize(), prolific_check_ID(), prolific_read_demog(), umxFactor(), umxHetCor(), umx_as_numeric(), umx_cont_2_quantiles(), umx_lower2full(), umx_make_MR_data(), umx_make_fake_data(), umx_make_raw_from_cov(), umx_merge_randomized_columns(), umx_polychoric(), umx_polypairwise(), umx_polytriowise(), umx_read_lower(), umx_rename(), umx_reorder(), umx_score_scale(), umx_select_valid(), umx_stack(), umx_strings2numeric(), umx

Examples

# =====================================================================
# = Basic Example, with all elements of std univariate data specified =
# =====================================================================
tmp = umx_make_TwinData(nMZpairs = 10000, AA = .30, CC = .00, EE = .70)
# Show dataframe with 20,000 rows and 3 variables: var_T1, var_T2, and zygosity
str(tmp)

# ===============================
# = How to consume the datasets =
# ===============================

mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
str(mzData); str(dzData); 
cov(mzData[, c("var_T1", "var_T2")])
cov(dzData[, c("var_T1", "var_T2")])
umxAPA(mzData[, c("var_T1", "var_T2")])

# Prefer to work in path coefficient values? (little a?)
tmp    = umx_make_TwinData(2000, AA = .7^2, CC = .0)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxACE(selDVs="var", sep="_T", mzData= mzData, dzData= dzData)

# Examine correlations
cor(mzData[,c("var_T1","var_T2")])
cor(dzData[,c("var_T1","var_T2")])

# Example with D (left un-modeled in ACE)
tmp = umx_make_TwinData(nMZpairs = 500, AA = .4, DD = .2, CC = .2)
m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ")
# |    |   a1|   c1|   e1|
# |:---|----:|----:|----:|
# |var | 0.86| 0.24| 0.45|

m1 = umxACE(selDVs="var", data = tmp, mzData= "MZ", dzData= "DZ", dzCr=.25)
# |    |  a1|d1 |   e1|
# |:---|---:|:--|----:|
# |var | 0.9|.  | 0.44|


# =============
# = Shortcuts =
# =============

# Omit nDZpairs (equal numbers of both by default)
tmp = umx_make_TwinData(100, AA = 0.5, CC = 0.3) # omit any one of A, C, or E (sums to 1)
cov(tmp[tmp$zygosity == "DZ", c("var_T1","var_T2")])

# Not limited to unit variance
tmp = umx_make_TwinData(100, AA = 3, CC = 2, EE = 3, sum2one = FALSE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

# Output can be scaled (mean=0, std=1)
tmp = umx_make_TwinData(100, AA = .7, CC = .1, scale = TRUE) 
cov(tmp[tmp$zygosity == "MZ", c("var_T1","var_T2")])

## Not run: 

# ===============
# = GxE Example =
# ===============

AA = c(avg = .5, min = .1, max = .8)
tmp = umx_make_TwinData(nMZpairs = 140, nDZpairs = 240, AA = AA, CC = .35, EE = .65, scale= TRUE)
mzData = tmp[tmp$zygosity == "MZ", ]
dzData = tmp[tmp$zygosity == "DZ", ]
m1 = umxGxE(selDVs = "var", selDefs = "M", sep = "_T", mzData = mzData, dzData = dzData)

# =====================
# = Threshold Example =
# =====================
tmp = umx_make_TwinData(100, AA = .6, CC = .2, nThresh = 3)
str(tmp)
umx_polychoric(subset(tmp, zygosity=="MZ", c("var_T1", "var_T2")))$polychorics
# Running model with 7 parameters
#           var_T1    var_T2
# var_T1 1.0000000 0.7435457
# var_T2 0.7435457 1.0000000


# =================================================
# = Just use MZr and DZr (also works with nSib>2) =
# =================================================
tmp = umx_make_TwinData(100, MZr = .86, DZr = .60, nSib= 3, varNames = "IQ")
umxAPA(subset(tmp, zygosity == "MZ", paste0("IQ_T", 1:2)))
umxAPA(subset(tmp, zygosity == "DZ", paste0("IQ_T", 1:2)))
m1 = umxACE(selDVs= "IQ", data = tmp)
m1 = umxACE(selDVs= "IQ", data = tmp, nSib=3)
# TODO tmx_ examples of unmodeled D etc.

# Bivariate GxSES example (see umxGxEbiv)

AA   = list(a11 = .4, a12 = .1, a22 = .15)
CC   = list(c11 = .2, c12 = .1, c22 = .10)
EE   = list(e11 = .4, e12 = .3, e22 = .25)
Amod = list(Beta_a1 = .025, Beta_a2 = .025)
Cmod = list(Beta_c1 = .025, Beta_c2 = .025)
Emod = list(Beta_e1 = .025, Beta_e2 = .025)
tmp = umx_make_TwinData(5000, AA =AA, CC = CC, EE = EE, 
			bivAmod = Amod, bivCmod =Cmod, bivEmod =Emod)
str(tmp)
# 'data.frame':	10000 obs. of  7 variables:
#  $ defM_T1 : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ defM_T2 : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ M_T1    : num  0.171 0.293 -0.173 0.238 -0.73 ...
#  $ var_T1  : num  0.011 0.1045 0.5861 0.0583 1.0225 ...
#  $ M_T2    : num  0.492 -0.405 -0.696 -0.829 -0.858 ...
#  $ var_T2  : num  -0.502 -0.856 -0.154 0.065 -0.268 ...
#  $ zygosity: Factor w/ 2 levels "MZ","DZ": 1 1 1 1 1 1 1 1 1 1 ...

# TODO tmx example showing how moderation of A introduces heteroscedasticity in a regression model:
# More residual variance at one extreme of the x axis (moderator) 
# m1 = lm(var_T1~ M_T1, data = x); 
# x = rbind(tmp[[1]], tmp[[2]])
# plot(residuals(m1)~ x$M_T1, data=x)

## End(Not run)

[Package umx version 4.20.0 Index]