create_ondisc_matrix_from_mtx {ondisc} | R Documentation |
Create an ondisc_matrix from a .mtx file.
Description
Initializes an ondisc_matrix from a .mtx file, a features.tsv file, and a barcodes.tsv file. Returns an ondisc_matrix along with cell-specific and feature-specific covariate matrices.
Usage
create_ondisc_matrix_from_mtx(
mtx_fp,
barcodes_fp,
features_fp,
n_lines_per_chunk = 3e+08,
on_disk_dir = NULL,
file_name = NULL,
return_metadata_ondisc_matrix = FALSE,
progress = TRUE
)
Arguments
mtx_fp |
file path to a .mtx file storing the expression data. The .mtx file can represent either an integer matrix or a logical (i.e., binary) matrix. If the .mtx file contains only two columns (after the initial three-column row of metadata), then the .mtx file is assumed to represent a logical matrix. |
barcodes_fp |
file path to the .tsv file containing the cell barcodes. |
features_fp |
file path to the features.tsv file. The first column (required) contains the feature IDs (e.g., ENSG00000186092), and the second column (optional) contains the human-readable feature names (e.g., OR4F5). Subsequent columns are discarded. |
n_lines_per_chunk |
(optional) number of lines in .mtx file to process per chunk. Defaults to 3e+08. |
on_disk_dir |
(optional) directory in which to store the on-disk portion of the ondisc_matrix. Defaults to the directory in which the .mtx file is located. |
file_name |
(optional) name of the file in which to store the .h5 data on-disk. Defaults to ondisc_matrix_x.h5, where x is a unique integer starting at 1. |
return_metadata_ondisc_matrix |
(optional) return the output as a metadata_ondisc_matrix (instead of a list)? Defaults to FALSE. |
progress |
(optional) print progress messages? Defaults to TRUE.
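The input layout described in the arguments above can be illustrated with a small base R sketch. It writes toy versions of the three files to a temporary directory and counts the columns in the first data row of the .mtx file. The file contents, the second and third feature IDs, and the helper count_mtx_data_columns are hypothetical and exist only for illustration; the helper is not part of ondisc and may not match how the package inspects the file internally.

```r
# Toy input files written to a temporary directory (base R only).
tmp <- tempfile("toy_mtx_")
dir.create(tmp)

# .mtx file: a comment header, then the three-column metadata row
# (n_features, n_cells, n_entries), then one nonzero entry per line.
mtx_fp <- file.path(tmp, "matrix.mtx")
writeLines(c(
  "%%MatrixMarket matrix coordinate integer general",
  "3 2 4",
  "1 1 5",
  "3 1 2",
  "2 2 1",
  "3 2 7"
), mtx_fp)

# features.tsv: feature ID (required), then feature name (optional).
features_fp <- file.path(tmp, "features.tsv")
writeLines(c(
  "ENSG00000186092\tOR4F5",
  "TOY_ID_2\tMT-TOY",
  "TOY_ID_3\tGENE3"
), features_fp)

# barcodes.tsv: one cell barcode per line.
barcodes_fp <- file.path(tmp, "barcodes.tsv")
writeLines(c("CELL_BARCODE_1", "CELL_BARCODE_2"), barcodes_fp)

# Hypothetical helper: number of columns in the first data row, i.e. the
# first non-comment line after the metadata row.
count_mtx_data_columns <- function(fp) {
  lines <- readLines(fp)
  body <- lines[!startsWith(lines, "%")]
  length(strsplit(trimws(body[2]), "[[:space:]]+")[[1]])
}

n_cols <- count_mtx_data_columns(mtx_fp)
# Two columns would indicate a logical matrix; three an integer matrix.
is_logical_matrix <- n_cols == 2
```

In this toy file each data row has three columns (feature index, cell index, count), so it represents an integer matrix. Dropping the third column from every data row would give the two-column form that is treated as a logical matrix.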
Details
The function can compute the following cell-specific and feature-specific covariates:
cell-specific: (i) total number of features expressed in cell (n_nonzero_cell), (ii) total UMI count (n_umis_cell), and (iii) percentage of UMIs that map to mitochondrial genes (p_mito_cell).
feature-specific: (i) total number of cells in which feature is expressed (n_nonzero_feature), (ii) mean expression of feature across cells (mean_expression_feature), (iii) coefficient of variation of feature expression across cells (coef_of_variation_feature).
The function decides which covariates to compute given the input; in general, the function computes the maximum set of covariates possible.
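The covariates listed above can be made concrete with a small in-memory sketch in base R. This is not how ondisc computes them (the package works chunk by chunk on the .mtx file); it only illustrates the definitions. Several choices here are assumptions: mitochondrial genes are identified by a feature name starting with "MT-", p_mito_cell is expressed as a fraction rather than a percentage, and the coefficient of variation uses the sample standard deviation.

```r
# Toy features-by-cells count matrix (filled column by column).
expr <- matrix(
  c(5, 0, 2,
    0, 1, 7),
  nrow = 3,
  dimnames = list(c("OR4F5", "MT-TOY", "GENE3"), c("cell_1", "cell_2"))
)

# Cell-specific covariates (one value per column).
n_nonzero_cell <- colSums(expr > 0)   # features expressed in each cell
n_umis_cell <- colSums(expr)          # total UMI count per cell
# Assumption: mitochondrial features have names starting with "MT-".
is_mito <- startsWith(rownames(expr), "MT-")
# Assumption: reported as a fraction of the cell's UMIs.
p_mito_cell <- colSums(expr[is_mito, , drop = FALSE]) / n_umis_cell

# Feature-specific covariates (one value per row).
n_nonzero_feature <- rowSums(expr > 0)   # cells expressing each feature
mean_expression_feature <- rowMeans(expr)
# Assumption: sample standard deviation divided by the mean.
coef_of_variation_feature <- apply(expr, 1, sd) / mean_expression_feature

cell_covariates <- data.frame(n_nonzero_cell, n_umis_cell, p_mito_cell)
feature_covariates <- data.frame(
  n_nonzero_feature,
  mean_expression_feature,
  coef_of_variation_feature
)
```

Here cell_covariates has one row per cell and feature_covariates has one row per feature, mirroring the shape of the two covariate matrices described under Value.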
Value
A list containing (i) an ondisc_matrix, (ii) a cell-specific covariate matrix, and (iii) a feature-specific covariate matrix. If return_metadata_ondisc_matrix is set to TRUE, the list is converted to a metadata_ondisc_matrix before being returned.
Examples
## Not run:
# First example: initialize a metadata_ondisc_matrix
# using simulated expression data; store output in tempdir()
file_locs <- system.file("extdata", package = "ondisc",
  c("gene_expression.mtx", "genes.tsv", "cell_barcodes.tsv"))
names(file_locs) <- c("expressions", "features", "barcodes")
expression_data <- create_ondisc_matrix_from_mtx(
  mtx_fp = file_locs[["expressions"]],
  barcodes_fp = file_locs[["barcodes"]],
  features_fp = file_locs[["features"]],
  on_disk_dir = tempdir(),
  file_name = "expressions",
  return_metadata_ondisc_matrix = TRUE)
saveRDS(object = expression_data, file = paste0(tempdir(), "/expressions.rds"))
# Second example: initialize a metadata_ondisc_matrix using simulated
# gRNA perturbation data; store in tempdir()
file_locs <- system.file("extdata", package = "ondisc",
  c("perturbation.mtx", "guides.tsv", "cell_barcodes.tsv"))
names(file_locs) <- c("perturbations", "features", "barcodes")
perturbation_data <- create_ondisc_matrix_from_mtx(
  mtx_fp = file_locs[["perturbations"]],
  barcodes_fp = file_locs[["barcodes"]],
  features_fp = file_locs[["features"]],
  on_disk_dir = tempdir(),
  file_name = "perturbations",
  return_metadata_ondisc_matrix = TRUE)
saveRDS(object = perturbation_data, file = paste0(tempdir(), "/perturbations.rds"))
## End(Not run)