Preprocessing {scapGNN}

R Documentation

Data preprocessing

Description

This function prepares data for the ConNetGNN function.

Usage

Preprocessing(data, parallel.cores = 1, verbose = TRUE)

Arguments

`data`	The input data should be a data frame or a matrix in which rows are genes and columns are cells. A `Seurat` object is also accepted.
`parallel.cores`	Number of processors to use when running calculations in parallel (default: `1`). If `parallel.cores = 0`, all available cores are used.
`verbose`	Whether to print information about each step. Default: `TRUE`.
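
As an illustration of how a `parallel.cores`-style value can be turned into a worker count, here is a minimal sketch. The helper `resolve_cores` is hypothetical and is not part of scapGNN; capping the request at the number of detected cores is an assumption made for this example, not documented package behavior.

```r
# Hypothetical helper (not part of scapGNN): map a parallel.cores-style
# value to a concrete number of workers.
resolve_cores <- function(parallel.cores = 1) {
  stopifnot(is.numeric(parallel.cores),
            length(parallel.cores) == 1,
            parallel.cores >= 0)
  available <- parallel::detectCores()
  # detectCores() can return NA on some platforms; fall back to one core.
  if (is.na(available)) available <- 1L
  if (parallel.cores == 0) {
    # 0 is treated as "use every available core".
    return(as.integer(available))
  }
  # Assumption: never request more workers than the machine reports.
  as.integer(min(parallel.cores, available))
}

resolve_cores(1)  # a single worker
resolve_cores(0)  # all detected cores
```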

Details

Preprocessing

The function can interface with the Seurat framework; see Examples for the Seurat data processing workflow. The input data should contain highly variable genes and be log-transformed. Left-truncated mixed Gaussian (LTMG) modeling is used to calculate the gene regulatory signal matrix. Positively correlated gene-gene and cell-cell pairs are used as the initial gene correlation matrix and cell correlation matrix.
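
The idea of keeping only positive gene-gene and cell-cell correlations can be sketched on toy data as follows. This is an illustration under stated assumptions, not the package's implementation: it uses Pearson correlation from base R (`stats::cor`), the helper `positive_cor` is hypothetical, and the expression values are simulated.

```r
set.seed(1)
# Toy log-transformed expression matrix: 6 genes (rows) x 8 cells (columns).
expr <- matrix(log1p(rpois(48, lambda = 5)), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:8)))

# Hypothetical helper: Pearson correlation between the columns of x,
# with negative and undefined correlations set to zero.
positive_cor <- function(x) {
  r <- stats::cor(x)
  r[is.na(r) | r < 0] <- 0
  r
}

gene_cor <- positive_cor(t(expr))  # genes x genes
cell_cor <- positive_cor(expr)     # cells x cells

dim(gene_cor)
dim(cell_cor)
```

Transposing the matrix before calling `cor` is what switches between the gene-level and cell-level view, since `cor` always correlates columns.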

Value

A list:

orig_dara: User-submitted raw data; rows are highly variable genes and columns are cells.
cell_features: Cell feature matrix.
gene_features: Gene feature matrix.
ltmg_matrix: Gene regulatory signal matrix for LTMG.
cell_adj: The adjacency matrix of the cell correlation network.
gene_adj: The adjacency matrix of the gene correlation network.
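
Before passing the result on, it can be useful to confirm that all documented elements are present. The sketch below assumes the element names are exactly those listed above; `check_prep` is a hypothetical helper, and `mock_prep` is a placeholder list with empty elements, not real output of `Preprocessing`.

```r
# Element names as listed in the Value section.
expected_names <- c("orig_dara", "cell_features", "gene_features",
                    "ltmg_matrix", "cell_adj", "gene_adj")

# Hypothetical helper (not part of scapGNN): stop if any element is absent.
check_prep <- function(prep) {
  absent <- setdiff(expected_names, names(prep))
  if (length(absent) > 0) {
    stop("missing elements: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Placeholder for illustration only; real values come from Preprocessing().
mock_prep <- setNames(vector("list", length(expected_names)), expected_names)
check_prep(mock_prep)
```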

Examples


# Load dependent packages.
# require(coop)

# Seurat data processing.
# require(Seurat)

# Load the PBMC dataset (example data from Seurat).
# pbmc.data <- Read10X(data.dir = "../data/pbmc3k/filtered_gene_bc_matrices/hg19/")

# We recommend keeping only genes with non-zero expression in more than
# 1% of cells, and cells with non-zero expression in more than 1% of genes.
# Users can also filter mitochondrial genes as needed.
# pbmc <- CreateSeuratObject(counts = pbmc.data, project = "case",
#                                     min.cells = 3, min.features = 200)
# pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
# pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

# Normalizing the data.
# pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize")

# Identification of highly variable features.
# pbmc <- FindVariableFeatures(pbmc, selection.method = 'vst', nfeatures = 2000)

# Run Preprocessing.
# Prep_data <- Preprocessing(pbmc)



# Users can also directly input data
# in data frame or matrix format
# containing highly variable genes.
data("Hv_exp")
Hv_exp <- Hv_exp[,1:20]
Hv_exp <- Hv_exp[which(rowSums(Hv_exp) > 0),]
Prep_data <- Preprocessing(Hv_exp[1:10,])

[Package scapGNN version 0.1.4 Index]