meta2d {MetaCycle} | R Documentation |
Detect rhythmic signals from time-series datasets with multiple methods
Description
This is a function that incorporates ARSER, JTK_CYCLE and Lomb-Scargle to detect rhythmic signals from time-series datasets.
Usage
meta2d(infile, outdir = "metaout", filestyle, timepoints, minper = 20,
maxper = 28, cycMethod = c("ARS", "JTK", "LS"),
analysisStrategy = "auto", outputFile = TRUE,
outIntegration = "both", adjustPhase = "predictedPer",
combinePvalue = "fisher", weightedPerPha = FALSE, ARSmle = "auto",
ARSdefaultPer = 24, outRawData = FALSE, releaseNote = TRUE,
outSymbol = "", parallelize = FALSE, nCores = 1, inDF = NULL)
Arguments
infile |
a character string. The name of input file containing time-series data. |
outdir |
a character string. The name of directory used to store output files. |
filestyle |
a character vector(length 1 or 3). The data format of
input file, must be |
timepoints |
a numeric vector corresponding to the sampling time points of the input time-series data; if the sampling time points are in the first line of the input file, it can be set to the character string "Line1" or "line1". |
minper |
a numeric value. The minimum period length of the rhythms of
interest. The default is 20. |
maxper |
a numeric value. The maximum period length of the rhythms of
interest. The default is 28. |
cycMethod |
a character vector(length 1 or 2 or 3). User-defined
methods for detecting rhythmic signals, must be selected as any one, any
two or all three methods(default) from |
analysisStrategy |
a character string. The strategy used to select
proper methods from |
outputFile |
logical. If |
outIntegration |
a character string. This parameter controls what
kinds of analysis results will be outputted, must be one of |
adjustPhase |
a character string. The method used to adjust original
phase calculated by each method in integration file, must be one of
|
combinePvalue |
a character string. The method used to integrate
multiple p-values, must be one of |
weightedPerPha |
logical. If |
ARSmle |
a character string. The strategy of using MLE method in
|
ARSdefaultPer |
a numeric value. The expected period length of
interested rhythm, which is a necessary parameter for |
outRawData |
logical. If |
releaseNote |
logical. If |
outSymbol |
a character string. A common prefix added to the names of output files. |
parallelize |
logical. If |
nCores |
an integer greater than or equal to one. The number of cores to use. |
inDF |
data.frame. If |
Details
ARSER (Yang, 2010), JTK_CYCLE (Hughes, 2010), and Lomb-Scargle (Glynn, 2006) are three popular methods of detecting rhythmic signals. ARS cannot analyze unevenly sampled datasets, or evenly sampled datasets with missing values, replicate samples, or a non-integer sampling interval. JTK is not suitable for unevenly sampled datasets or for evenly sampled datasets with a non-integer sampling interval. If analysisStrategy is set to "auto" (default), meta2d will automatically select the proper methods from cycMethod for each input dataset. If the user knows that the dataset can be analyzed by every method defined in cycMethod and does not want integrated values in the output, analysisStrategy can be set to "selfUSE".
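The restrictions listed above can be sketched as a small base R check. The helper name applicable_methods and its arguments are hypothetical and are not part of MetaCycle; the function only encodes the rules stated in this section and is not the selection code that meta2d actually runs.

```r
# Hypothetical helper (not part of MetaCycle): list the methods whose
# stated requirements are met by a given sampling design.
applicable_methods <- function(timepoints, has_missing = FALSE) {
  tp <- sort(unique(timepoints))
  steps <- diff(tp)
  # Evenly sampled: all gaps between distinct time points are equal.
  evenly_sampled <- length(steps) > 0 && all(abs(steps - steps[1]) < 1e-8)
  # Integer interval: the common gap is a whole number.
  integer_interval <- evenly_sampled && abs(steps[1] - round(steps[1])) < 1e-8
  # Replicates: the same time point appears more than once.
  has_replicates <- any(duplicated(timepoints))

  methods <- "LS"  # no restriction on Lomb-Scargle is stated in the text
  if (evenly_sampled && integer_interval) {
    methods <- c("JTK", methods)
    if (!has_missing && !has_replicates) {
      methods <- c("ARS", methods)
    }
  }
  methods
}

applicable_methods(seq(0, 44, by = 4))                 # even sampling, no replicates
applicable_methods(rep(seq(0, 45, by = 3), each = 3))  # three replicates per time point
applicable_methods(c(0, 2, 5, 9, 14, 20))              # uneven sampling
```

Under these assumptions the first design admits all three methods, the second excludes ARS because of the replicates, and the third leaves only LS.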
ARS used here is translated from its Python version, which always uses the "yule-walker", "burg", and "mle" methods (see ar) to fit autoregressive models to time-series data. Fitting by "mle" is very slow for datasets with many time points. If ARSmle = "auto" is used, meta2d will only include "mle" when the number of time points is smaller than 24. In addition, one evaluation study (Wu, 2014) indicates that ARS shows a relatively high false positive rate when analyzing high-resolution datasets (1h/2days and 2h/2days). JTK (version 3) used here is the latest version, which improves its p-value calculation for datasets with missing values.
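The "auto" rule described above can be illustrated with the ar function from the stats package. This is a minimal sketch: the helper ar_methods_auto, the simulated series, and the choice of order.max = 3 are assumptions made for illustration and do not reproduce the ARS implementation.

```r
# Hypothetical helper mirroring the stated ARSmle = "auto" rule:
# "mle" is only included when there are fewer than 24 time points.
ar_methods_auto <- function(n_timepoints) {
  methods <- c("yule-walker", "burg")
  if (n_timepoints < 24) {
    methods <- c(methods, "mle")
  }
  methods
}

set.seed(1)
tp <- seq(0, 44, by = 2)                                   # 23 time points
x <- cos(2 * pi * tp / 24) + rnorm(length(tp), sd = 0.5)   # simulated profile

# Fit one autoregressive model per selected method with stats::ar().
selected <- ar_methods_auto(length(tp))
fits <- lapply(selected, function(m) ar(x, order.max = 3, method = m))
names(fits) <- selected

# Model order chosen by AIC for each fitting method.
sapply(fits, function(fit) fit$order)
```

With 24 or more time points the same helper would return only "yule-walker" and "burg", which avoids the slow "mle" fit.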
The power of an algorithm to detect rhythmic signals depends on the nature of the data and the periodic pattern of interest (Deckard, 2013), which suggests that integrating analysis results from multiple methods may help rhythm detection. For integrating p-values, Bonferroni correction ("bonferroni") or Fisher's method ("fisher") (Fisher, 1925; implementation code from MADAM) can be selected; "bonferroni" is usually more conservative than "fisher". The integrated period is the arithmetic mean of the periods reported by the methods. For integrating phase, meta2d uses the mean of circular quantities. The integrated period and phase are then used to calculate the baseline value and amplitude by fitting a constructed periodic model.
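The two p-value integration options can be sketched in base R as follows. The function names combine_fisher and combine_bonferroni and the example p-values are made up for illustration; the code shows the textbook forms of both methods and is not necessarily identical to the code used inside meta2d.

```r
# Fisher's method: under the null hypothesis, -2 * sum(log(p)) follows a
# chi-squared distribution with 2k degrees of freedom for k independent
# p-values.
combine_fisher <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Bonferroni-style combination: the smallest p-value multiplied by the
# number of p-values, capped at 1.
combine_bonferroni <- function(p) {
  min(1, length(p) * min(p))
}

p <- c(ARS = 0.01, JTK = 0.04, LS = 0.03)   # made-up p-values for one profile
combine_fisher(p)
combine_bonferroni(p)

# The integrated period is the arithmetic mean of the reported periods.
mean(c(ARS = 23.5, JTK = 24, LS = 24.8))    # made-up periods in hours
```

For these example values the Bonferroni result is larger than the Fisher result, which matches the statement that "bonferroni" is usually the more conservative choice.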
Phases given by JTK and LS need to be adjusted with their predicted period (adjustPhase = "predictedPer") before integration. If adjustPhase = "notAdjusted" is selected, no integrated phase will be calculated. If weightedPerPha is set to TRUE, weighted scores will be used in averaging periods and phases. Weighted scores for one method are based on all its reported p-values, which means that the weighted score assigned to any one profile is affected by all other profiles. Averaging phases with quite different period lengths (e.g. averaging two phases with 16-hour and 30-hour period lengths) is always problematic. Currently, setting minper, maxper and ARSdefaultPer to the same value may be the only way to eliminate this problem completely.
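The mean of circular quantities can be sketched as below. The helper circular_mean_phase, its weights argument, and the example phases are hypothetical; the sketch assumes all phases are expressed in hours on one common period, which is exactly the condition that fails when period lengths differ widely.

```r
# Hypothetical helper: (optionally weighted) circular mean of phases that
# share one period. Phases are mapped to angles, averaged as unit vectors,
# and mapped back to the [0, period) range.
circular_mean_phase <- function(phase, period, weights = rep(1, length(phase))) {
  angle <- 2 * pi * phase / period
  mean_angle <- atan2(sum(weights * sin(angle)), sum(weights * cos(angle)))
  (mean_angle * period / (2 * pi)) %% period
}

phases <- c(22, 1)                        # made-up phases on a 24-hour cycle
circular_mean_phase(phases, period = 24)  # 23.5, halfway across the wrap-around
mean(phases)                              # 11.5, the misleading arithmetic mean
```

The arithmetic mean places the combined phase on the opposite side of the cycle, while the circular mean respects the wrap-around at the end of the period.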
This function was originally designed to analyze large-scale periodic data (e.g. circadian transcriptome data) without individual information. Please pay attention to the data format of the input file (see the Examples part). Apart from the first column and the first row, all entries are time-series experimental values (missing values should be set to NA).
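The layout described above can be illustrated with a small toy table written in base R. The identifiers, column names, and values are made up; the sketch assumes a header in the first row, profile names in the first column, and NA for a missing measurement.

```r
# Toy input table with made-up identifiers and values: the first column
# holds profile names, the first row is the header, and every other cell
# is an experimental value (NA marks a missing measurement).
toy <- data.frame(
  geneID = c("gene_a", "gene_b", "gene_c"),
  T0  = c(10.2, 5.1, 7.7),
  T4  = c(12.8, NA, 7.9),
  T8  = c(14.1, 6.3, 7.6),
  T12 = c(12.5, 7.0, 7.8),
  T16 = c(10.0, 6.1, 7.7),
  T20 = c(8.3, 5.2, 7.9),
  stringsAsFactors = FALSE
)

# Write the table as a csv file and read it back to check the layout.
toy_file <- tempfile(fileext = ".csv")
write.csv(toy, file = toy_file, row.names = FALSE)
back <- read.csv(toy_file, stringsAsFactors = FALSE)

dim(back)     # 3 profiles, 1 name column plus 6 time points
is.na(back)   # the missing value survives the round trip as NA
```

A file laid out like this would be paired with an explicit timepoints vector such as seq(0, 20, by = 4), since the header here holds labels rather than numeric time points.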
Value
meta2d will write analysis results to different files under outdir if outputFile = TRUE. Files named with "ARSresult", "JTKresult" and "LSresult" store analysis results from ARS, JTK and LS respectively. The file named with "meta2d" is the integration file, and it stores integrated values in columns with the common name tag "meta2d". The integration file also contains the p-value, FDR value, period, phase (adjusted phase if adjustPhase = "predictedPer") and amplitude values calculated by each method.
If outputFile = FALSE is selected, meta2d will return a list containing the following components:
ARS | analysis results from ARS method |
JTK | analysis results from JTK method |
LS | analysis results from LS method |
meta | the integrated analysis results as mentioned above |
References
Yang R. and Su Z. (2010). Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation. Bioinformatics, 26(12), i168–i174.
Hughes M. E., Hogenesch J. B. and Kornacker K. (2010). JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. Journal of Biological Rhythms, 25(5), 372–380.
Glynn E. F., Chen J. and Mushegian A. R. (2006). Detecting periodic patterns in unevenly spaced gene expression time series using Lomb-Scargle periodograms. Bioinformatics, 22(3), 310–316.
Wu G., Zhu J., Yu J., Zhou L., Huang J. Z. and Zhang Z. (2014). Evaluation of five methods for genome-wide circadian gene identification. Journal of Biological Rhythms, 29(4), 231–242.
Deckard A., Anafi R. C., Hogenesch J. B., Haase S. B. and Harer J. (2013). Design and analysis of large-scale biological rhythm studies: a comparison of algorithms for detecting periodic signals in biological data. Bioinformatics, 29(24), 3174–3180.
Fisher R. A. (1925). Statistical methods for research workers. Oliver and Boyd (Edinburgh).
Kugler K. G., Mueller L. A. and Graber A. (2010). MADAM - an open source toolbox for meta-analysis. Source Code for Biology and Medicine, 5, 3.
Examples
# write 'cycSimu4h2d', 'cycMouseLiverRNA' and 'cycYeastCycle' into three
# 'csv' files
write.csv(cycSimu4h2d, file="cycSimu4h2d.csv", row.names=FALSE)
write.csv(cycMouseLiverRNA, file="cycMouseLiverRNA.csv", row.names=FALSE)
write.csv(cycYeastCycle, file="cycYeastCycle.csv", row.names=FALSE)
# write 'cycMouseLiverProtein' into a 'txt' file
write.table(cycMouseLiverProtein, file="cycMouseLiverProtein.txt",
            sep="\t", quote=FALSE, row.names=FALSE)
# analyze 'cycMouseLiverRNA.csv' with JTK_CYCLE
# (commented out to keep the total running time within the 10s limit of
# the CRAN check)
# meta2d(infile="cycMouseLiverRNA.csv", filestyle="csv", outdir="example",
#        timepoints=18:65, cycMethod="JTK", outIntegration="noIntegration")
# analyze 'cycMouseLiverProtein.txt' with JTK_CYCLE and Lomb-Scargle
meta2d(infile="cycMouseLiverProtein.txt", filestyle="txt",
       outdir="example", timepoints=rep(seq(0, 45, by=3), each=3),
       cycMethod=c("JTK","LS"), outIntegration="noIntegration")
# analyze 'cycSimu4h2d.csv' with ARSER, JTK_CYCLE and Lomb-Scargle and
# output integration file with analysis results from each method
meta2d(infile="cycSimu4h2d.csv", filestyle="csv", outdir="example",
       timepoints="Line1")
# analyze 'cycYeastCycle.csv' with ARSER, JTK_CYCLE and Lomb-Scargle to
# detect transcripts associated with cell cycle, and only output
# integration file
meta2d(infile="cycYeastCycle.csv", filestyle="csv", outdir="example",
       minper=80, maxper=96, timepoints=seq(2, 162, by=16),
       outIntegration="onlyIntegration", ARSdefaultPer=85,
       outRawData=TRUE)
# return analysis results instead of writing them to files
cyc <- meta2d(infile="cycYeastCycle.csv", filestyle="csv",
              minper=80, maxper=96, timepoints=seq(2, 162, by=16),
              outputFile=FALSE, ARSdefaultPer=85, outRawData=TRUE)
head(cyc$ARS)
head(cyc$JTK)
head(cyc$LS)
head(cyc$meta)