
scalpel {scalpel}

R Documentation

Perform entire SCALPEL pipeline.

Description

Segmentation, Clustering, and Lasso Penalties (SCALPEL) is a method for neuronal calcium imaging data that identifies the locations of neurons and estimates their calcium concentrations over time. The pipeline involves several steps, each of which is described briefly in the documentation for its corresponding function; see scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3 for more details. Full details for the SCALPEL method are provided in Petersen, A., Simon, N., and Witten, D. (Forthcoming). SCALPEL: Extracting Neurons from Calcium Imaging Data.

Usage

scalpel(
  outputFolder,
  rawDataFolder,
  videoHeight,
  minClusterSize = 1,
  lambdaMethod = "trainval",
  lambda = NULL,
  cutoff = 0.18,
  omega = 0.2,
  fileType = "R",
  processSeparately = TRUE,
  minSize = 25,
  maxSize = 500,
  maxWidth = 30,
  maxHeight = 30,
  removeBorder = FALSE,
  alpha = 0.9,
  thresholdVec = NULL,
  maxSizeToCluster = 3000
)

Arguments

`outputFolder`	Step 0 parameter: The existing directory where the results should be saved.
`rawDataFolder`	Step 0 parameter: The directory where the raw data version of Y is saved. Y should be a PxT matrix, where P is the total number of pixels per image frame and T is the number of frames of the video; its (i,j)th element contains the fluorescence of the ith pixel in the jth frame. To create Y, vectorize each 2-dimensional image frame by concatenating its columns. If the data is saved in a single file, it should be named "Y_1.mat", "Y_1.rds", "Y_1.txt", or "Y_1.txt.gz" (depending on `fileType`). If the data is split over multiple files, it should be split by columns (i.e., by frames) into chunks that are named consecutively ("Y_1.mat", "Y_2.mat", etc.; "Y_1.rds", "Y_2.rds", etc.; "Y_1.txt", "Y_2.txt", etc.; or "Y_1.txt.gz", "Y_2.txt.gz", etc.).
`videoHeight`	Step 0 parameter: The height of the video (in pixels).
`minClusterSize`	Step 3 parameter: The minimum number of preliminary dictionary elements that a cluster must contain in order to be included in the sparse group lasso.
`lambdaMethod`	Step 3 parameter: How lambda should be chosen: `"trainval"` (default), `"distn"`, or `"user"`. With `"trainval"`, lambda is chosen using a training/validation set approach. With `"distn"`, lambda is set to the negative of the 0.1% quantile of the elements of Y belonging to active pixels (i.e., pixels contained in at least one dictionary element); this is computationally faster than `"trainval"`. With `"user"`, the value of lambda is specified directly using `lambda`.
`lambda`	Step 3 parameter: The value of lambda to use when fitting the sparse group lasso. By default, the value is automatically chosen using the approach specified by `lambdaMethod`. If a value is provided for `lambda`, then `lambdaMethod` will be ignored.
`cutoff`	Step 2 parameter: A value in [0,1] indicating where to cut the dendrogram that results from hierarchical clustering of the preliminary dictionary elements. The default value is 0.18.
`omega`	Step 2 parameter: A value in [0,1] indicating how to weight spatial vs. temporal information in the dissimilarity metric used for clustering. If `omega=1`, only spatial information is used. The default value is 0.2.
`fileType`	Step 0 parameter: The format of the raw data files: .rds (`fileType="R"`, the default), .mat (`fileType="matlab"`), .txt (`fileType="text"`), or .txt.gz (`fileType="zippedText"`). Text files should not have row or column names.
`processSeparately`	Step 0 parameter: Logical scalar indicating whether multiple raw data files should be processed individually or all at once. Processing the files separately may be preferable for larger videos. The default value is `TRUE`; this argument is ignored if the raw data is saved in a single file.
`minSize`, `maxSize`	Step 1 parameter: The minimum and maximum size, respectively, of a preliminary dictionary element. The default values are 25 and 500.
`maxWidth`, `maxHeight`	Step 1 parameter: The maximum width and height, respectively, of a preliminary dictionary element. Both default to 30.
`removeBorder`	Step 3 parameter: A logical scalar indicating whether the dictionary elements containing pixels in the 10-pixel border of the video should be removed prior to fitting the sparse group lasso. The default value is `FALSE`.
`alpha`	Step 3 parameter: The value of alpha to use when fitting the sparse group lasso. The default value is 0.9.
`thresholdVec`	Step 1 parameter (optional, for advanced users): A vector of thresholds to use for image segmentation. If not specified, the default is to use the negative of the minimum of the processed Y data, the negative of the 0.1% quantile of the processed Y data, and the mean of these two values. If multiple raw data files were processed separately, these values are calculated using only the first part of the data and then reused for the remaining parts.
`maxSizeToCluster`	Step 2 parameter (optional, for advanced users): The maximum number of preliminary dictionary elements to cluster at once. We attempt to cluster each overlapping set of preliminary dictionary elements, but if one of these sets is very large (e.g., >10,000 elements), memory issues may result. We therefore perform a two-stage clustering: we first cluster random subsets of size approximately equal to `maxSizeToCluster`, and then cluster together the representatives from the first stage. Finally, we recalculate the representatives using all of the preliminary dictionary elements in the final clusters. The default value is 3000. If `maxSizeToCluster` is set to `NULL`, single-stage clustering is performed regardless of the size of the overlapping sets; this may cause memory issues if the largest overlapping set of preliminary dictionary elements is very large (e.g., >10,000 elements).
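
The layout of Y described for `rawDataFolder` can be illustrated with a short base-R sketch that builds Y from a video array and saves it in column chunks. The video dimensions, the random data, the chunk size of 50 frames, and the temporary folder are hypothetical choices for illustration; only the PxT layout, the column-wise vectorization of frames, and the "Y_1.rds", "Y_2.rds" naming come from this page. The trailing slash on the folder mirrors the Examples section, where `rawDataFolder` ends with a path separator.

```r
# Hypothetical video: 30 x 40 pixels, 100 frames (random values for illustration only).
videoHeight <- 30
videoWidth <- 40
nFrames <- 100
set.seed(1)
video <- array(rnorm(videoHeight * videoWidth * nFrames),
               dim = c(videoHeight, videoWidth, nFrames))

# R stores arrays in column-major order, so reshaping to a P x T matrix
# vectorizes each frame by concatenating its columns.
Y <- matrix(video, nrow = videoHeight * videoWidth, ncol = nFrames)

# Pixel (row 7, column 5) of frame 12 is row (5 - 1) * videoHeight + 7 of Y.
stopifnot(Y[(5 - 1) * videoHeight + 7, 12] == video[7, 5, 12])

# Split the columns (frames) into chunks of 50 and save them as Y_1.rds, Y_2.rds.
rawDataFolder <- paste0(file.path(tempdir(), "rawData"), "/")
dir.create(rawDataFolder, showWarnings = FALSE)
chunks <- split(seq_len(nFrames), ceiling(seq_len(nFrames) / 50))
for (k in seq_along(chunks)) {
  saveRDS(Y[, chunks[[k]]], paste0(rawDataFolder, "Y_", k, ".rds"))
}
```

Because each chunk holds a contiguous block of frames, reading the files back in order and binding their columns recovers Y.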
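
The default `thresholdVec` and the `"distn"` choice of lambda are both based on low quantiles of the data. The sketch below computes them on a hypothetical matrix. It assumes R's default `quantile` type, which may differ from what the package uses internally, and `activePixels` is a made-up stand-in for the pixels covered by at least one dictionary element; the order of the three thresholds is also an illustrative choice.

```r
# Hypothetical processed data: P x T matrix (random values for illustration only).
set.seed(2)
Yproc <- matrix(rnorm(900 * 200), nrow = 900)

# Default segmentation thresholds as described for thresholdVec:
# -min(Y), -(0.1% quantile of Y), and the mean of these two values.
# Assumes R's default quantile type (type = 7).
negMin <- -min(Yproc)
negQ <- -unname(quantile(Yproc, probs = 0.001))
thresholdVec <- c(negMin, negQ, mean(c(negMin, negQ)))

# lambdaMethod = "distn": negative 0.1% quantile over active pixels only.
# 'activePixels' is a hypothetical logical vector marking the pixels
# contained in at least one dictionary element.
activePixels <- rep(c(TRUE, FALSE), length.out = nrow(Yproc))
lambdaDistn <- -unname(quantile(Yproc[activePixels, ], probs = 0.001))
```

Since the minimum never exceeds the 0.1% quantile, `-min(Y)` is always the largest of the three default thresholds, and their mean lies between the other two.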
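
The first stage of the two-stage clustering controlled by `maxSizeToCluster` splits a large overlapping set into random groups. A minimal sketch of that partitioning step follows; `randomGroups` is a hypothetical helper, not a function exported by scalpel, and neither the clustering of each group nor the second-stage clustering of the representatives is shown.

```r
# Hypothetical helper: split n preliminary dictionary elements (indexed 1..n)
# into random groups of roughly equal size, none larger than maxSizeToCluster.
# maxSizeToCluster = NULL means a single group (single-stage clustering).
randomGroups <- function(n, maxSizeToCluster) {
  if (is.null(maxSizeToCluster) || n <= maxSizeToCluster) {
    return(list(seq_len(n)))
  }
  nGroups <- ceiling(n / maxSizeToCluster)
  perm <- sample.int(n)
  # Deal the shuffled indices round-robin into nGroups groups.
  unname(split(perm, rep_len(seq_len(nGroups), n)))
}

set.seed(3)
groups <- randomGroups(10000, 3000)
lengths(groups)  # 2500 2500 2500 2500
```

Using `ceiling(n / maxSizeToCluster)` groups of near-equal size guarantees that no group exceeds `maxSizeToCluster`, while keeping the groups as close as possible to that size.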

Details

Several files containing data from the pipeline, as well as summaries of each step, are saved in various subdirectories of "outputFolder".

Value

An object of class scalpel. It can be summarized using summary; passed to scalpelStep1, scalpelStep2, and scalpelStep3 to rerun SCALPEL Steps 1-3 with new parameters; or used with any of the plotting functions: plotFrame, plotThresholdedFrame, plotVideoVariance, plotCandidateFrame, plotCluster, plotResults, plotResultsAllLambda, plotSpatial, plotTemporal, and plotBrightest. The individual elements are described in detail in the documentation for the corresponding step: scalpelStep0, scalpelStep1, scalpelStep2, and scalpelStep3.

Examples

## Not run: 
### many of the functions in this package are interconnected so the
### easiest way to learn to use the package is by working through the vignette,
### which is available at ajpete.com/software

#folder to save results (update this to a folder on your computer; it must exist before running)
outputFolder = "scalpelResults"
dir.create(outputFolder, showWarnings = FALSE)
#location on computer of raw data in R package to use
rawDataFolder = gsub("Y_1.rds", "", system.file("extdata", "Y_1.rds", package = "scalpel"), fixed = TRUE)
#video height of raw data in R package
videoHeight = 30
#run SCALPEL pipeline
scalpelOutput = scalpel(outputFolder = outputFolder, rawDataFolder = rawDataFolder,
                       videoHeight = videoHeight)
#summarize each step
summary(scalpelOutput, step = 0)
summary(scalpelOutput, step = 1)
summary(scalpelOutput, step = 2)
summary(scalpelOutput, step = 3)

## End(Not run)

[Package scalpel version 1.0.3 Index]