plot.TcGSA {TcGSA} | R Documentation |
Plot a Gene Set Trends Heatmap.
Description
This function plots a gene sets dynamic trends heatmap.
Usage
## S3 method for class 'TcGSA'
plot(
x,
threshold = 0.05,
myproc = "BY",
nbsimu_pval = 1e+06,
expr,
Subject_ID,
TimePoint,
baseline = NULL,
only.signif = TRUE,
group.var = NULL,
Group_ID_paired = NULL,
ref = NULL,
group_of_interest = NULL,
ranking = FALSE,
FUNcluster = NULL,
clustering_metric = "euclidian",
clustering_method = "ward",
B = 500,
max_trends = 4,
aggreg.fun = "median",
na.rm.aggreg = TRUE,
methodOptiClust = "firstSEmax",
indiv = "genes",
verbose = TRUE,
clust_trends = NULL,
N_clusters = NULL,
myclusters = NULL,
label.clusters = NULL,
prev_rowCL = NULL,
descript = TRUE,
plot = TRUE,
color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4",
"darkblue"),
legend.breaks = NULL,
label.column = NULL,
time_unit = "",
cex.label.row = 1,
cex.label.column = 1,
margins = c(5, 25),
heatKey.size = 1,
dendrogram.size = 1,
heatmap.height = 1,
heatmap.width = 1,
cex.clusterKey = 1,
cex.main = 1,
horiz.clusterKey = TRUE,
main = NULL,
subtitle = NULL,
...
)
Arguments
x |
an object of class' |
threshold |
the threshold at which the FDR or the FWER should be controlled. |
myproc |
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: " |
nbsimu_pval |
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is |
expr |
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are |
Subject_ID |
a factor of length |
TimePoint |
a numeric vector or a factor of length |
baseline |
the value of |
only.signif |
logical flag for plotting only the significant gene sets.
If |
group.var |
in the case of several treatment' groups, this is a factor of
length |
Group_ID_paired |
a character vector of length |
ref |
the group which is used as reference in the case of several
treatment groups. Default is |
group_of_interest |
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is |
ranking |
a logical flag. If |
FUNcluster |
the clustering function used to agglomerate genes in
trends. Default is |
clustering_metric |
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when |
clustering_method |
character string defining the agglomerative method
to be used in the hierarchical clustering when |
B |
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is |
max_trends |
integer specifying the maximum number of different clusters
to be tested. Default is |
aggreg.fun |
a character string such as |
na.rm.aggreg |
a logical flag indicating whether |
methodOptiClust |
character string indicating how the "optimal"" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are |
indiv |
a character string indicating by which unit observations are
aggregated (through |
verbose |
logical flag enabling verbose messages to track the computing
status of the function. Default is |
clust_trends |
object of class ClusteredTrends containing
already computed trends for the plotted gene sets. Default is |
N_clusters |
an integer that is the number of clusters in which the
dynamics should be regrouped. The cutoff of the clustering tree is
automatically calculated accordingly. Default is |
myclusters |
a character vector of colors for predefined clusters of the
represented gene sets, with as many levels as the value of |
label.clusters |
if |
prev_rowCL |
a hclust object, such as the one return by the
present plotting function (see Value) for instance. If not |
descript |
logical flag indicating that the description of the gene sets
should appear after their name on the right side of the plot if |
plot |
logical flag indicating whether the heatmap should be plotted or
not. Default is |
color.vec |
a character strings vector used to define the color
palette used in the plot. Default is
|
legend.breaks |
a numeric vector indicating the splitting points for
coloring. Default is |
label.column |
a vector of character strings with the labels to be
displayed for the columns (i.e. the time points). Default is |
time_unit |
the time unit to be displayed (such as |
cex.label.row |
a numerical value giving the amount by which row labels
text should be magnified relative to the default |
cex.label.column |
a numerical value giving the amount by which column
labels text should be magnified relative to the default |
margins |
numeric vector of length 2 containing the margins (see
|
heatKey.size |
the size of the color key for the heatmap fill. Default
is |
dendrogram.size |
the horizontal size of the dendrogram. Default is |
heatmap.height |
the height of the heatmap. Default is |
heatmap.width |
the width of the heatmap. Default is |
cex.clusterKey |
a numerical value giving the amount by which the
clusters legend text should be magnified relative to the default |
cex.main |
a numerical value giving the amount by which title text
should be magnified relative to the default |
horiz.clusterKey |
a logical flag; if |
main |
a character string for an optional title. Default is |
subtitle |
a character string for an optional subtitle. Default is |
... |
other parameters to be passed through to plotting functions. |
Details
On the heatmap, each line corresponds to a gene set, and each column to a time point.
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If descript
is FALSE
, the second element of margins
can
be reduced (for instance use margins = c(5, 10)
), as there is not so
much need for space in order to display only the gene set names, without
their description.
If there is a large number of significant gene sets, the hierarchical clustering
step repeated for each of them can take a few minutes. To speed things up
(especially) when playing with the plotting parameters for having a nice plot,
one can run the clustTrend
function beforehand, and plug its results
in the plot.TcGSA
function via the clust_trends
argument.
Value
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
-
merge
ann-1
by2
matrix. Rowi
ofmerge
describes the merging of clusters at step i of the clustering. If an elementj
in the row is negative, then observation -j
was merged at this stage. Ifj
is positive then the merge was with the cluster formed at the (earlier) stagej
of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons. -
height
a set ofn-1
real values (non-decreasing for ultrametric trees). The clustering height: that is, the value of the criterion associated with the Ward clustering method. -
order
a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches. -
labels
the gene set trends name. -
call
the call which produced the result clustering:hclust(d = dist(map2heat, method = "euclidean"), method = "ward.D2")
-
method
"ward.D2", as it is the clustering method that has been used for clustering the gene set trends. -
dist.method
"euclidean", as it is the distance that has been used for clustering the gene set trends. -
legend.breaks
a numeric vector giving the splitting points used for coloring the heatmap. Ifplot
isFALSE
, then it isNULL
. -
myclusters
a character vector of colors for the dynamic clusters of the represented gene set trends, with as many levels as the value ofN_clusters
. If no dynamic clusters were represented, than this isNULL
. -
ddr
a dendrogram object with the reordering used for the heatmap. Seeheatmap.2
function from packagegplots
. gene set.names character vector with the names of the gene sets used in the heatmap.
-
clust.trends
a ClusteredTrends object. -
clustersExport
a data frame with 2 variables containing the two following variables :-
GeneSet
: the gene set trends clustered. -
Cluster
: the dynamic cluster they belong to.
The data frame is order by the variable
Cluster
. -
-
data_plotted
: the data matrix represented by the heatmap
Author(s)
Boris P. Hejblum
References
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6): e1004310. doi: 10.1371/journal.pcbi.1004310
See Also
Examples
if(interactive()){
data(data_simu_TcGSA)
tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE)
summary(tcgsa_sim_1grp)
plot(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations,
Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
baseline=1,
B=100,
time_unit="H",
dendrogram.size=0.4, heatmap.width=0.8, heatmap.height=2, cex.main=0.7
)
tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE,
group_name="group.var")
summary(tcgsa_sim_2grp)
plot(x=tcgsa_sim_2grp, expr=expr_2grp,
Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
B=100,
time_unit="H",
)
}