simgeno {netgwas}    R Documentation

Generate genotype data based on Gaussian copula

Description

Generates discrete ordinal data based on an underlying "genome-like" graph structure. The simulation relies on an underlying continuous variable, which can be drawn from either a multivariate normal distribution or a multivariate t-distribution with d degrees of freedom.
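The idea of turning a continuous variable into ordinal categories can be sketched in base R. This is only an illustration and not the netgwas implementation: the correlation matrix `Sigma`, the equal-probability cut points, and the 0, ..., k-1 coding are all assumptions made for the example.

```r
# Minimal base-R sketch: underlying multivariate normal -> ordinal categories.
set.seed(1)
n <- 200; p <- 5; k <- 3

# Hypothetical correlation matrix (unit diagonal) with decaying dependence.
Sigma <- 0.6^abs(outer(1:p, 1:p, "-"))

# Draw n Gaussian vectors with covariance Sigma using the Cholesky factor.
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Cut every column at k - 1 thresholds (equal-probability cuts are assumed).
cuts <- qnorm(seq_len(k - 1) / k)
Y <- apply(Z, 2, function(z) findInterval(z, cuts))

table(Y[, 1])   # category counts for the first variable
```

Each column of `Y` is an ordinal version of the corresponding column of `Z`, so the dependence between columns is inherited from `Sigma`.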

Usage

simgeno(p = 90, n = 200, k = NULL, g = NULL, adjacent = NULL, alpha = NULL,
        beta = NULL, con.dist = "Mnorm", d = NULL, vis = TRUE)

Arguments

p

The number of variables. The default value is 90.

n

The sample size (number of observations). The default value is 200.

k

The number of states (categories). The default value is 3.

g

The number of groups (chromosomes) in the graph. The default value is about p/20 if p >= 40 and 2 if p < 40.

adjacent

The number of adjacent variables on each side to which a variable is linked. For example, adjacent = 1 means that a variable is linked via an edge to its adjacent variable on the left-hand side and to its adjacent variable on the right-hand side; adjacent = 2 means that a variable is linked via an edge to its 2 adjacent variables on the left-hand side and its 2 adjacent variables on the right-hand side. The default value is 1.
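To make the meaning of adjacent concrete, the following base-R sketch builds the within-group links for one hypothetical group of 6 variables with adjacent = 2. The group size and the helper names (`m`, `band`) are chosen for illustration only; this is not code taken from the package.

```r
# Sketch: neighbours linked inside a single group for a given `adjacent`.
m <- 6            # hypothetical group size
adjacent <- 2

idx <- seq_len(m)
dist_mat <- abs(outer(idx, idx, "-"))   # positional distance between variables

# Link variables that are at most `adjacent` positions apart (excluding self).
band <- (dist_mat >= 1 & dist_mat <= adjacent) * 1

which(band[3, ] == 1)   # neighbours of variable 3: 1 2 4 5
```

Variables at the ends of the group have fewer neighbours, since there is nothing further to their left or right within the group.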

alpha

The probability that a pair of non-adjacent variables in the same group is given an edge. The default value is 0.01.

beta

The probability that a pair of variables in different groups is linked with an edge. The default value is 0.02.

con.dist

The distribution of the underlying continuous variable. If con.dist = "Mnorm", a multivariate normal distribution with mean 0 is applied. If con.dist = "Mt", a multivariate t-distribution with d degrees of freedom is applied. The default distribution is con.dist = "Mnorm".

d

The degrees of freedom of the continuous variable, only applicable when con.dist = "Mt". The default value is 3.
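For intuition about the role of d, a multivariate t sample can be obtained from a multivariate normal sample by dividing each row by an independent scaling variable. The sketch below uses this standard construction in base R; how netgwas draws these values internally is not stated here, and `Sigma` is a hypothetical scale matrix.

```r
# Sketch: multivariate t draws with d degrees of freedom (standard construction).
set.seed(2)
n <- 500; p <- 4; d <- 3

# Hypothetical scale matrix.
Sigma <- 0.5^abs(outer(1:p, 1:p, "-"))

# Multivariate normal draws with covariance Sigma.
Z <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# One chi-squared mixing variable per observation (row).
W <- rchisq(n, df = d)

# Dividing an n x p matrix by a length-n vector rescales row i by sqrt(W[i] / d).
X <- Z / sqrt(W / d)

range(X)   # small d gives heavier tails than the Gaussian draws in Z
```

Smaller values of d produce more extreme observations, while large d makes `X` close to the normal case.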

vis

Whether to visualize the graph pattern and the adjacency matrix of the true graph structure. The default value is TRUE.

Details

The graph pattern is generated as follows:

genome-like: the p variables are evenly partitioned into g disjoint groups; the adjacent variables within each group are linked via an edge. With probability alpha, a pair of non-adjacent variables in the same group is given an edge. Variables in different groups are linked with an edge with probability beta.
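The three rules above can be sketched in base R as follows. This is an illustrative reconstruction from the description, not the package source: the way variables are assigned to groups and the independent Bernoulli draws per pair are assumptions.

```r
# Sketch: a genome-like adjacency matrix built from the rules in the text.
set.seed(3)
p <- 20; g <- 4; adjacent <- 1; alpha <- 0.01; beta <- 0.02

# Assign variables to g consecutive groups of (roughly) equal size.
groups <- rep(seq_len(g), each = ceiling(p / g))[seq_len(p)]

idx <- seq_len(p)
same_group <- outer(groups, groups, "==")
dist_mat <- abs(outer(idx, idx, "-"))

# Edge probability for every pair of variables.
prob <- matrix(beta, p, p)                     # pairs in different groups
prob[same_group] <- alpha                      # same group, non-adjacent
prob[same_group & dist_mat <= adjacent] <- 1   # adjacent within a group

# Sample the upper triangle once, then mirror it for a symmetric matrix.
adj <- matrix(0, p, p)
upper <- upper.tri(adj)
adj[upper] <- rbinom(sum(upper), 1, prob[upper])
adj <- adj + t(adj)

sum(adj[upper])   # number of edges in the simulated graph
```

Because adjacent pairs within a group have probability 1, the graph always contains a chain along each group; alpha and beta only control the extra within-group and between-group edges.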

Value

An object with S3 class "simgeno" is returned:

data

The generated data as an n by p matrix.

Theta

A p by p matrix corresponding to the inverse of the covariance matrix.

adj

A p by p matrix corresponding to the adjacency matrix of the true graph structure.

Sigma

A p by p covariance matrix for the generated data.

n.groups

The number of groups.

groups

A vector indicating the group to which each variable belongs.

sparsity

The sparsity levels of the true graph.
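As an illustration of what a sparsity level can mean for a graph, the sketch below computes the fraction of possible edges that are present in a small adjacency matrix. The exact definition used by simgeno is not given in this page, so the formula is an assumption, and the chain graph is a made-up example.

```r
# Sketch: sparsity as the proportion of possible edges that are present.
adj <- matrix(0, 5, 5)
adj[cbind(1:4, 2:5)] <- 1    # chain graph 1-2-3-4-5 (upper triangle)
adj <- adj + t(adj)          # make the adjacency matrix symmetric

p <- ncol(adj)
# sum(adj) counts every edge twice, and p * (p - 1) counts every pair twice.
sparsity <- sum(adj) / (p * (p - 1))
sparsity   # 4 edges out of 10 possible pairs: 0.4
```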

Author(s)

Pariya Behrouzi and Ernst C. Wit
Maintainer: Pariya Behrouzi <pariya.behrouzi@gmail.com>

References

1. Behrouzi, P., Arends, D., and Wit, E. C. (2023). netgwas: An R Package for Network-Based Genome-Wide Association Studies. The R Journal, 14(4), 18-37.
2. Behrouzi, P., and Wit, E. C. (2019). Detecting epistatic selection with partially observed genotype data by using copula graphical models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 68(1), 141-160.

See Also

netsnp and netgwas-package

Examples

#genome-like graph structure
sim1 <- simgeno(alpha = 0.01, beta = 0.02)
plot(sim1)

#genome-like graph structure with more edges between variables in the same or different groups
sim2 <- simgeno(adjacent = 3, alpha = 0.02 , beta = 0.03)
plot(sim2)

#simulate data
D <- simgeno(p = 50, n = 100, g = 5, k = 3, adjacent = 3, alpha = 0.06, beta = 0.08)
plot(D)

#Reconstructing intra- and inter-chromosomal conditional interactions (LD) network
out <- netsnp(data = D$data, n.rho = 4, ncores = 1)
plot(out)
#Select an optimal graph
sel <- selectnet(out)
plot(sel, vis = "CI")

[Package netgwas version 1.14.3 Index]