LearnTgs {TGS}    R Documentation
Implement the TGS Algorithm
Description
The TGS algorithm takes a time-series gene expression dataset as input. It analyses the data and reconstructs the underlying temporal sequence of gene regulatory events. The reconstructed output is given in the form of time-varying gene regulatory networks (GRNs). The TGS algorithm is extremely time-efficient and hence suitable for processing large datasets with hundreds to thousands of genes. More details about the algorithm can be found in: Saptarshi Pyne, Alok Ranjan Kumar, and Ashish Anand. Rapid reconstruction of time-varying gene regulatory networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 17(1):278–291, Jan–Feb 2020.
Usage
LearnTgs(
  isfile = 0,
  json.file = "",
  input.dirname = "",
  input.data.filename = "",
  num.timepts = 2,
  true.net.filename = "",
  input.wt.data.filename = "",
  is.discrete = TRUE,
  num.discr.levels = 2,
  discr.algo = "",
  mi.estimator = "mi.pca.cmi",
  apply.aracne = FALSE,
  clr.algo = "CLR",
  max.fanin = 14,
  allow.self.loop = TRUE,
  scoring.func = "BIC",
  output.dirname = ""
)
Arguments
isfile
Numeric. 1 or 0. 1 if input arguments are given in a json file. Otherwise, 0.
json.file
Character string. Absolute path to the JSON file if isfile is 1.
input.dirname
Character string. Absolute path to the directory where input files are kept. By default, the current working directory.
input.data.filename
Character string. Name of the file containing the input data. It can either be a '.tsv' file or an '.RData' file.
For both '.tsv' and '.RData' input, multiple rows with the same time point ID represent multiple replicates at the same time point. In other words, these rows belong to the same time point in different time series. The time points belonging to the same time series must be together and in ascending order. An exemplary dataset with three genes {G1, G2, G3}, two time points {t1, t2} and two time series is shown below.
num.timepts
Numeric. Positive integer greater than 1. Number of distinct time points.
true.net.filename
Character string. Name of the file containing the true network. In case it is non-empty, the name should refer to an '.RData' file. The '.RData' file must have an object named 'true.net.adj.matrix'. The object can either be a matrix or a list.
input.wt.data.filename
Character string. Name of the file containing the wild type expressions of the genes. If non-empty, it must be a '.tsv' file. The first row should contain the names of the genes. The only exception is the (1, 1)-th cell, which should be empty. The second row should contain the wild type expressions. Therefore, the (2, j)-th cell must contain the wild type expression of the j-th gene. Again, the only exception is the (2, 1)-th cell, which should be empty. An example with three genes {G1, G2, G3} is given below.
is.discrete
Logical. TRUE or FALSE. TRUE if the input data is discrete. Otherwise, FALSE.
num.discr.levels
Numeric. Positive integer greater than 1. Number of discrete levels that each gene has (if the input data is discrete) or each gene should have (if the input data needs to be discretised).
discr.algo
Character string. Name of the discretisation algorithm to be used when the input data needs to be discretised. The available algorithms are 'discretizeData.2L.Tesla' and 'discretizeData.2L.wt.l'. If you choose algorithm 'discretizeData.2L.wt.l', please provide the wild type data using argument input.wt.data.filename.
mi.estimator
Character string. Name of the algorithm for estimating mutual information. Only one algorithm is available at the moment: 'mi.pca.cmi'.
apply.aracne
Logical. TRUE or FALSE. TRUE if you wish to apply ARACNE for refining the mutual information matrix. Otherwise, FALSE.
clr.algo
Character string. Name of the context likelihood relatedness (CLR) algorithm to use. The available algorithms are – 'CLR', 'CLR2', 'CLR2.1', 'CLR3' and 'spearman'.
max.fanin
Numeric. Positive integer. Maximum number of regulators each gene can have.
allow.self.loop
Logical. TRUE or FALSE. TRUE if you wish to allow self loops. Otherwise, FALSE.
scoring.func
Character string. Name of the scoring function to use. At this moment, the only available option is 'BIC'.
output.dirname
Character string. File path to a directory where output files are to be saved. There are three options. Option 1: It can be the absolute path to an existing directory. Option 2: It can also be the absolute path to a non-existing directory. In this case, the directory will be created. Option 3 (default): If provided an empty string, then it will be the current working directory.
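The wild type file layout described under input.wt.data.filename can be sketched in a few lines of R. This is only an illustration under stated assumptions: the expression values are made up, and the file name 'wild-type-example.tsv' is hypothetical. The sketch writes a two-row tab-separated file whose (1, 1)-th and (2, 1)-th cells are empty, then reads it back to confirm the layout.

```r
# Made-up wild type expression values for three genes (illustration only).
wt.expr <- c(G1 = 0.52, G2 = 0.13, G3 = 0.87)

# Row 1: empty (1, 1)-th cell followed by the gene names.
# Row 2: empty (2, 1)-th cell followed by the wild type expressions.
wt.lines <- c(
  paste(c("", names(wt.expr)), collapse = "\t"),
  paste(c("", as.character(wt.expr)), collapse = "\t")
)

# Hypothetical file name; written to a temporary directory.
wt.file <- file.path(tempdir(), "wild-type-example.tsv")
writeLines(wt.lines, wt.file)

# Read the file back as raw tab-separated cells to check the layout.
wt.cells <- strsplit(readLines(wt.file), "\t", fixed = TRUE)
print(wt.cells)
```

In a real run, the directory holding such a file would be passed as input.dirname and the file name as input.wt.data.filename.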
Details
The function does not return any values. Instead, it outputs a set of files and saves them under the directory specified by output.dirname. The output files are described in Section 'Value'.
Value
- input.data.discr.RData
Discretised version of the input data. This file is created only if the input data is not already discrete, that is, if input argument 'is.discrete' is FALSE.
- mut.info.matrix.RData
Mutual information matrix of the given genes. This RData file contains a matrix named 'mut.info.matrix'. The (i, j)-th cell of the matrix represents the mutual information between the i-th and j-th genes. This is a symmetric matrix.
- mi.net.adj.matrix.wt.RData
Weighted Mutual information network of the given genes. This RData file contains a matrix named 'mi.net.adj.matrix.wt'. The (i, j)-th cell of the matrix represents the weight of the edge from the i-th gene to the j-th gene. The edge weight is a non-negative real number.
- mi.net.adj.matrix.RData
Unweighted Mutual information network of the given genes. This RData file contains a matrix named 'mi.net.adj.matrix'. Each cell of the matrix contains a value of 1 or 0. If the (i, j)-th cell contains 1, then there exists an edge from the i-th gene to the j-th gene. Otherwise, the edge does not exist.
- unrolled.DBN.adj.matrix.list.RData
Reconstructed time-varying GRNs. This RData file contains a list named 'unrolled.DBN.adj.matrix.list'. The length of the list is equal to the total number of time intervals, which is (num.timepts - 1). Each element in the list is a network adjacency matrix. The p-th element in the list represents the adjacency matrix of the GRN corresponding to the p-th time interval. In this adjacency matrix, each cell contains a value of 1 or 0. If the (i, j)-th cell contains 1, then there exists a directed edge from the i-th gene to the j-th gene. Otherwise, the edge does not exist.
- di.net.adj.matrix.RData
Rolled GRN. This RData file contains a matrix named 'di.net.adj.matrix'. Each cell in the matrix contains a value of 1 or 0. If the (i, j)-th cell contains 1, then there exists an edge from the i-th gene to the j-th gene. Otherwise, the edge does not exist.
- net.sif
Rolled GRN in the SIF format compatible with Cytoscape.
- Result.RData
Correctness metrics. This file is created only if the true network is given through input argument 'true.net.filename'. Inside this RData file, there is a matrix named 'Result'. The columns represent the correctness metrics, such as TP (number of true positive predictions) and FP (number of false positive predictions). The rows depend upon the nature of the true network. If the true network is a set of time-varying GRNs, then the number of rows is equal to the number of time intervals. In that case, the p-th row contains the correctness metrics of the reconstructed GRN corresponding to the p-th time interval. On the other hand, if the true network is a summary GRN, then there exists only one row. This row represents the correctness metrics of the rolled GRN.
- output.txt
Console output.
- sessionInfo.txt
R session information.
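As a rough sketch of how the RData outputs listed above might be inspected, the code below builds a toy stand-in for 'unrolled.DBN.adj.matrix.list' (three genes, two time intervals), saves and reloads it, and lists the directed edges of each interval. It is not output produced by TGS: the toy networks, the helper name ListEdges, and the presence of gene names as row and column names are all assumptions made for illustration. The TP and FP counts at the end use the standard definitions against a made-up true network and are not necessarily how the 'Result' matrix is computed internally.

```r
# Toy stand-in for the file described above: 3 genes, 2 time intervals.
genes <- c("G1", "G2", "G3")
empty.net <- matrix(0, nrow = 3, ncol = 3, dimnames = list(genes, genes))
net1 <- empty.net
net1["G1", "G2"] <- 1
net2 <- empty.net
net2["G2", "G3"] <- 1
net2["G3", "G1"] <- 1
unrolled.DBN.adj.matrix.list <- list(net1, net2)

out.file <- file.path(tempdir(), "unrolled.DBN.adj.matrix.list.RData")
save(unrolled.DBN.adj.matrix.list, file = out.file)
rm(unrolled.DBN.adj.matrix.list)

# load() restores the object under the name it was saved with.
load(out.file)

# Hypothetical helper: list the directed edges of one adjacency matrix.
# The (i, j)-th cell equal to 1 means an edge from gene i to gene j.
ListEdges <- function(adj.matrix) {
  idx <- which(adj.matrix == 1, arr.ind = TRUE)
  data.frame(
    from = rownames(adj.matrix)[idx[, "row"]],
    to = colnames(adj.matrix)[idx[, "col"]],
    stringsAsFactors = FALSE
  )
}

# One edge table per time interval.
edges.by.interval <- lapply(unrolled.DBN.adj.matrix.list, ListEdges)
print(edges.by.interval)

# Made-up true network for the first interval, used to count TP and FP.
true.net1 <- empty.net
true.net1["G1", "G2"] <- 1
true.net1["G1", "G3"] <- 1
pred.net1 <- unrolled.DBN.adj.matrix.list[[1]]
TP <- sum(pred.net1 == 1 & true.net1 == 1)  # predicted edges that are true
FP <- sum(pred.net1 == 1 & true.net1 == 0)  # predicted edges that are not
```

The same load() pattern should apply to the other RData outputs, each of which contains a single object named after its file.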
Examples
## Not run:
# Continuous input data (is.discrete = FALSE), discretised with
# 'discretizeData.2L.wt.l', which needs input.wt.data.filename.
# Self loops are not allowed.
TGS::LearnTgs(
  isfile = 0,
  json.file = '',
  input.dirname = 'C:/GitHub/TGS/inst/extdata',
  input.data.filename = 'InSilicoSize10-Yeast1-trajectories.tsv',
  num.timepts = 21,
  true.net.filename = 'DREAM3GoldStandard_InSilicoSize10_Yeast1_TrueNet.RData',
  input.wt.data.filename = 'InSilicoSize10-Yeast1-null-mutants.tsv',
  is.discrete = FALSE,
  num.discr.levels = 2,
  discr.algo = 'discretizeData.2L.wt.l',
  mi.estimator = 'mi.pca.cmi',
  apply.aracne = FALSE,
  clr.algo = 'CLR',
  max.fanin = 14,
  allow.self.loop = FALSE,
  scoring.func = 'BIC',
  output.dirname = 'C:/GitHub/TGS/inst/extdata/Output_Ds10n')

# Continuous input data discretised with 'discretizeData.2L.Tesla';
# no wild type file is supplied. Self loops are allowed.
TGS::LearnTgs(
  isfile = 0,
  json.file = '',
  input.dirname = 'C:/GitHub/TGS/inst/extdata',
  input.data.filename = 'edi-data-10n.tsv',
  num.timepts = 21,
  true.net.filename = 'edi.net.10.adj.mx.RData',
  input.wt.data.filename = '',
  is.discrete = FALSE,
  num.discr.levels = 2,
  discr.algo = 'discretizeData.2L.Tesla',
  mi.estimator = 'mi.pca.cmi',
  apply.aracne = FALSE,
  clr.algo = 'CLR',
  max.fanin = 14,
  allow.self.loop = TRUE,
  scoring.func = 'BIC',
  output.dirname = 'C:/GitHub/TGS/inst/extdata/Output_Ed10n')
## End(Not run)