binarizeTimeSeries {BoolNet} | R Documentation |
Binarize a set of real-valued time series
Description
Binarizes a set of real-valued time series using k-means clustering, edge detection, or scan statistics.
Usage
binarizeTimeSeries(measurements,
method = c("kmeans","edgeDetector","scanStatistic"),
nstart = 100,
iter.max = 1000,
edge = c("firstEdge","maxEdge"),
scaling = 1,
windowSize = 0.25,
sign.level = 0.1,
dropInsignificant = FALSE)
Arguments
measurements |
A list of matrices, each corresponding to one time series. Each row of these matrices contains real-valued measurements for one gene on a time line, i. e. column |
method |
The employed binarization technique. "kmeans" uses k-means clustering for binarization. "edgeDetector" searches for a large gradient in the sorted measurements. "scanStatistic" searches for accumulations in the measurements. See Details for descriptions of the techniques. |
nstart |
If |
iter.max |
If |
edge |
If set to "maxEdge", the binarization threshold is the position of the edge with the overall highest gradient. |
scaling |
If |
windowSize |
If |
sign.level |
If |
dropInsignificant |
If this is set to TRUE, genes whose binarizations are insignificant in the scan statistic (see Details) are removed from the binarized time series. Otherwise, a warning is printed if such genes exist. |
Details
This method supports three binarization techniques:
- k-means clustering
For each gene, k-means clusterings are performed to determine a good separation into two groups. The values belonging to the cluster with the smaller centroid are set to 0, and the values belonging to the cluster with the greater centroid are set to 1.
- Edge detector
This approach first sorts the measurements for each gene. In the sorted measurements, the algorithm searches for a difference between two successive values that satisfies a predefined condition: If the "firstEdge" method was chosen, the first pair of successive values whose difference exceeds the scaled average gradient of all values is chosen, and its two values are used as the maximum of the lower group and the minimum of the upper group. If the "maxEdge" method was chosen, the largest difference between two successive values is taken. For details, see Shmulevich and Zhang (2002).
- Scan statistic
The scan statistic assumes that the measurements for each gene are uniformly and independently distributed over a certain range. The scan statistic shifts a scanning window across the data and decides for each window position whether there is an unusual accumulation of data points based on an approximated test statistic (see Glaz et al.). The window with the smallest p-value is remembered. The boundaries of this window form two candidate thresholds, and the one that results in more balanced groups is used for binarization. Each gene's binarization is then rated by comparing the p-value of the chosen window with the supplied significance level.
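The first two techniques can be sketched for a single gene in base R. This is a minimal illustration, not BoolNet's internal code: the helper names binarize_kmeans and binarize_edge are hypothetical, the "average gradient" is assumed to be the mean gap between successive sorted values, and falling back to the largest gap when no gap exceeds the threshold is likewise an assumption.

```r
# Hypothetical helper (not part of BoolNet): k-means binarization of one gene.
binarize_kmeans <- function(x, nstart = 10, iter.max = 1000) {
  cl <- kmeans(x, centers = 2, nstart = nstart, iter.max = iter.max)
  # Values in the cluster with the greater centroid become 1, the others 0.
  high <- which.max(cl$centers[, 1])
  as.integer(cl$cluster == high)
}

# Hypothetical helper (not part of BoolNet): edge-detector binarization.
binarize_edge <- function(x, edge = c("firstEdge", "maxEdge"), scaling = 1) {
  edge <- match.arg(edge)
  s <- sort(x)
  d <- diff(s)                       # gaps between successive sorted values
  if (edge == "maxEdge") {
    i <- which.max(d)                # largest gap overall
  } else {
    # Assumption: the average gradient is the mean gap.
    hits <- which(d > scaling * mean(d))
    # Assumption: fall back to the largest gap if no gap qualifies.
    i <- if (length(hits) > 0) hits[1] else which.max(d)
  }
  # s[i] is the maximum of the lower group, s[i + 1] the minimum of the upper.
  as.integer(x > s[i])
}

set.seed(1)
x <- c(0.1, 0.2, 0.15, 0.9, 1.1, 0.95, 0.12, 1.0)
binarize_kmeans(x)                   # 0 0 0 1 1 1 0 1

y <- c(0, 1, 4, 5, 10, 11)
binarize_edge(y, "firstEdge")        # 0 0 1 1 1 1
binarize_edge(y, "maxEdge")          # 0 0 0 0 1 1
```

The vector y shows how the two edge variants can disagree: the gap between 1 and 4 is the first to exceed the mean gap of 2.2, while the gap between 5 and 10 is the largest overall.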
Value
Returns a list with the following elements:
binarizedMeasurements |
A list of matrices with the same structure as |
reject |
If |
thresholds |
The thresholds used for binarization |
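As a rough sketch of how the returned list might be inspected, the following computes the fraction of 1s per gene, pooled over all time series. The helper fraction_active is hypothetical, and the list named mock is a hand-made stand-in that only mimics the binarizedMeasurements element (one row per gene); it is not output of binarizeTimeSeries.

```r
# Hypothetical helper (not part of BoolNet): fraction of 1s per gene,
# pooled over all binarized time series in the result.
fraction_active <- function(bin) {
  pooled <- do.call(cbind, bin$binarizedMeasurements)
  rowMeans(pooled)
}

# Hand-made stand-in: two genes, two time series of lengths 3 and 2.
genes <- c("GeneA", "GeneB")
mock <- list(
  binarizedMeasurements = list(
    matrix(c(0, 1, 1,
             1, 0, 0), nrow = 2, byrow = TRUE, dimnames = list(genes, NULL)),
    matrix(c(1, 1,
             1, 0), nrow = 2, byrow = TRUE, dimnames = list(genes, NULL))
  )
)

fraction_active(mock)                # GeneA 0.8, GeneB 0.4
```

A gene whose fraction is close to 0 or 1 is almost constant after binarization, which may be worth checking before network reconstruction.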
References
I. Shmulevich and W. Zhang (2002), Binary analysis and optimization-based normalization of gene expression data. Bioinformatics 18(4):555–565.
J. Glaz, J. Naus and S. Wallenstein (2001), Scan Statistics. New York: Springer.
See Also
Examples
# load the package and the test data
library(BoolNet)
data(yeastTimeSeries)
# perform binarization with k-means
bin <- binarizeTimeSeries(yeastTimeSeries)
print(bin)
# perform binarization with scan statistic
# - will find and remove 2 insignificant genes!
bin <- binarizeTimeSeries(yeastTimeSeries, method="scanStatistic",
                          dropInsignificant=TRUE, sign.level=0.2)
print(bin)
# perform binarization with edge detector
bin <- binarizeTimeSeries(yeastTimeSeries, method="edgeDetector")
print(bin)
# reconstruct a network from the data
reconstructed <- reconstructNetwork(bin$binarizedMeasurements,
                                    method="bestfit", maxK=4)
print(reconstructed)