R: The Elastic Net penalized SEM with Network GPT Framework

elasticNetSEM {sparseSEM}

R Documentation

The Elastic Net penalized SEM with Network GPT Framework

Description

Fit the elastic-net penalized structural equation model (SEM) with input data (X, Y): Y = BY + FX + e.
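To make the model equation concrete, the sketch below simulates data that satisfy Y = BY + FX + e by solving for Y, i.e. Y = (I - B)^{-1}(FX + e). It is illustrative only and does not call the package: the network size, the edge weights, the noise level, and the choice of F as a diagonal matrix built from a per-node weight vector `f` are all assumptions made for this example.

```r
set.seed(42)
M <- 4    # number of nodes (assumed for illustration)
N <- 10   # number of samples (assumed for illustration)

# Hypothetical sparse network: two edges, everything else zero
B <- matrix(0, M, M)
B[2, 1] <- 0.5
B[3, 2] <- -0.4

# One attribute per node, so F is taken to be diagonal with weights f
f <- rep(1, M)
X <- matrix(rnorm(M * N), M, N)             # node attributes, M by N
E <- matrix(rnorm(M * N, sd = 0.1), M, N)   # noise term e, M by N

# Solve (I - B) Y = F X + E for Y
Y <- solve(diag(M) - B, diag(f) %*% X + E)

# Check that the simulated Y satisfies the SEM equation
residual <- Y - (B %*% Y + diag(f) %*% X + E)
max_abs_residual <- max(abs(residual))
```

The residual is zero up to floating-point error, which confirms that the simulated Y is consistent with the stated model.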

For users new to this package, elasticNetSEM provides a simplified entry point: the Missing matrix can be all 0 (no missing data, or missingness unknown), and so can the B matrix (unknown connections in the network), so only Y and X are mandatory.

Internally, the function obtains the optimal hyperparameters (alpha, lambda) by k-fold cross validation (CV) with k fixed at 5. Specifically, for each alpha from 0.95 to 0.05 in steps of -0.05, the function performs 5-fold CV over 20 lambda values from lambda_max to lambda_min to determine the optimal (alpha, lambda) for the data.
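The size of this search can be sketched in a few lines of R. The alpha sequence follows the description above; the values of `lambda_max` and `lambda_min` and the log spacing of the lambda path are assumptions made for illustration, since the documentation does not state how the 20 values are spaced.

```r
# Alpha grid as described: 0.95, 0.90, ..., 0.05
alphas <- seq(0.95, 0.05, by = -0.05)

# Hypothetical lambda path: 20 values from lambda_max down to lambda_min.
# The endpoints and the log spacing are assumptions, not package behaviour.
lambda_max <- 1
lambda_min <- 0.001
lambdas <- exp(seq(log(lambda_max), log(lambda_min), length.out = 20))

# Every (alpha, lambda) pair evaluated by the cross validation
grid <- expand.grid(alpha = alphas, lambda = lambdas)
n_pairs <- nrow(grid)

# With 5 folds, each pair is fitted once per fold
n_fits <- n_pairs * 5
```

Under these assumptions the grid holds 19 alphas times 20 lambdas, which gives a sense of why the default search can take a while on larger networks.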

In general, the program follows the Network GPT Framework below to arrive at the final network structure:
Step 1. Generating a Complete Graph:

- SEM ridge regression (L2 penalty) with k-fold CV: this step finds the optimal ridge hyperparameter rho;

- fit the SEM ridge regression model (L2 penalty) with the rho found above to obtain the initial (non-sparse) state of the network structure (B_ridge);

Step 2. Elastic net penalized SEM regression with k-fold CV: this step finds the optimal hyperparameters (alpha, lambda);

Step 3. Fit the elastic net SEM model with (alpha, lambda) from Step 2. This step applies a block coordinate ascent algorithm, with the complete graph from Step 1 as the initial estimate;

Step 4. Calculate PD and FDR, and return the function output.

For large scale network inference, a standalone C/C++ software with openMPI for parallel computation is also available upon request.

Usage

elasticNetSEM(Y, X, Missing, B, verbose = 0)

Arguments

`Y`	The observed node response data with dimension of M (nodes) by N (samples). Y is normalized inside the function.
`X`	The network node attribute matrix with dimension of M by N. Theoretically, X can be an L by N matrix, with L being the total number of node attributes. In the current implementation, each node is allowed exactly one attribute. If you have more than one attribute for some nodes, please consider selecting the top one by either correlation or principal component methods. If no attribute is available for some nodes, fill the corresponding rows with zeros. See the yeast data 'yeast.rda' for an example. X is normalized inside the function.
`Missing`	Optional M by N matrix corresponding to elements of Y. 0 denotes not missing, and 1 denotes missing. If a node i in sample j has the label missing (Missing[i,j] = 1), then Y[i,j] is set to 0.
`B`	Optional input. For a network with M nodes, B is the M by M adjacency matrix. If the data are simulated or the true network topology is otherwise known (i.e., the adjacency matrix is known), the Power of Detection (PD) and False Discovery Rate (FDR) are computed in the output parameter 'statistics'. If the true network topology is unknown, B is optional, and the PD/FDR in the output parameter 'statistics' should be ignored.
`verbose`	Controls the amount of information printed, from -1 to 10; larger values produce more output.
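As a rough sketch of how the X and Missing inputs described above might be prepared, the example below picks one attribute for a node by absolute correlation with that node's response, uses an all-zero row for a node without an attribute, and derives a 0/1 missingness matrix from NA entries. All data and names here (`candidates`, `Y_raw`, `Missing_demo`, and so on) are made up for illustration, and encoding missing values as NA before calling the function is an assumption rather than something the package requires.

```r
set.seed(1)
N <- 50   # number of samples (assumed for illustration)

# X: choose one attribute per node by absolute correlation with the response
y_i <- rnorm(N)                      # response of one node across samples
candidates <- rbind(                 # three hypothetical candidate attributes
  a1 = rnorm(N),
  a2 = y_i + rnorm(N, sd = 0.1),     # built to correlate strongly with y_i
  a3 = rnorm(N)
)
abs_cor <- apply(candidates, 1, function(a) abs(cor(a, y_i)))
best <- names(which.max(abs_cor))    # name of the selected attribute
x_i <- candidates[best, ]            # row to place in X for this node
x_none <- rep(0, N)                  # row for a node with no attribute

# Missing: 1 where an observation is missing, 0 otherwise
Y_raw <- matrix(c(1.2, NA, 0.7,
                  0.3, 0.9, NA), nrow = 2, byrow = TRUE)
Missing_demo <- 1 * is.na(Y_raw)     # numeric 0/1 matrix, same shape as Y_raw

# Missing entries of Y are set to 0, matching the description of Missing
Y_demo <- Y_raw
Y_demo[is.na(Y_raw)] <- 0
```

Selecting by principal components instead of correlation would follow the same pattern, with the first principal component of the candidate attributes taking the place of `x_i`.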

Details

The function performs cross validation and parameter inference, and calculates the power of detection and FDR.

Value

`Bout`	The computed weights for the network topology. B[i,j] = 0 means there is no edge between nodes i and j; B[i,j] != 0 denotes an (undirected) edge between nodes i and j, with B[i,j] being the weight of the edge.
`fout`	f is a 1 by M array holding the weights for X (in the SEM: Y = BY + FX + e). Theoretically, F can be an M by L matrix, with M being the number of nodes and L being the total number of node attributes. However, in the current implementation, each node is allowed exactly one attribute. If you have more than one attribute for some nodes, please consider selecting the top one by either correlation or principal component methods.
`stat`	A 1 by 6 array of statistics recording: (1) correct positive; (2) total positive; (3) false positive; (4) positive detected; (5) Power of Detection (PD) = correct positive / total positive; (6) False Discovery Rate (FDR) = false positive / positive detected.
`hyperparameters`	Model hyperparameters obtained from cross validation.
`runTime`	The computation time.
`call`	the call that produced this object
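The six statistics can be reproduced by hand from a true and an estimated adjacency matrix. The sketch below follows the definitions given in this section; it is not the package's internal code, and the two small matrices are made up for illustration.

```r
# Hypothetical true network: edges at (1,2) and (2,3)
B_true <- matrix(c(0, 1, 0,
                   0, 0, 1,
                   0, 0, 0), nrow = 3, byrow = TRUE)

# Hypothetical estimate: finds (1,2), misses (2,3), adds a spurious (1,3)
B_est <- matrix(c(0, 0.8, 0.2,
                  0, 0,   0,
                  0, 0,   0), nrow = 3, byrow = TRUE)

true_edges <- B_true != 0
est_edges <- B_est != 0

correct_positive <- sum(true_edges & est_edges)    # true edges that were detected
total_positive <- sum(true_edges)                  # all true edges
false_positive <- sum(est_edges & !true_edges)     # detected edges that are not true
positive_detected <- sum(est_edges)                # all detected edges

PD <- correct_positive / total_positive            # Power of Detection
FDR <- false_positive / positive_detected          # False Discovery Rate

stat_demo <- c(correct_positive, total_positive, false_positive,
               positive_detected, PD, FDR)
```

In this toy case one of the two true edges is recovered and one of the two detected edges is spurious, so both PD and FDR are 0.5. When no true B is supplied there is nothing to compare against, which is why these values should then be ignored.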

Note

Differences among the three functions:
1) elasticNetSEM: default alphas from 0.95 to 0.05 in steps of -0.05; default 20 lambdas
2) elasticNetSEMcv: user-supplied alphas (one or more) and lambdas; computes the optimal hyperparameters and the network parameters
3) elasticNetSEMpoint: user-supplied single alpha and single lambda; computes the network parameters

Author(s)

Anhui Huang; Dept of Electrical and Computer Engineering, Univ of Miami, Coral Gables, FL

References

1. Cai, X., Bazerque, J.A., and Giannakis, G.B. (2013). Inference of Gene Regulatory Networks with Sparse Structural Equation Models Exploiting Genetic Perturbations. PLoS Comput Biol 9, e1003068.
2. Huang, A. (2014). Sparse model learning for inferring genotype and phenotype associations. Ph.D. Dissertation, Chapter 7. University of Miami (1186).

Examples

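	library(sparseSEM)
	# Load the example data sets shipped with the package
	data(B)
	data(Y)
	data(X)
	data(Missing)

	# Fit the model; a larger verbose value prints more information
	OUT <- elasticNetSEM(Y, X, Missing, B, verbose = 1)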

[Package sparseSEM version 4.0 Index]