RunPCA {MARVEL} R Documentation

Principal component analysis

Description

Performs principal component analysis (PCA) on splicing or gene expression data. This is a wrapper function for RunPCA.PSI and RunPCA.Exp.

Usage

RunPCA(
  MarvelObject,
  cell.group.column,
  cell.group.order = NULL,
  cell.group.colors = NULL,
  sample.ids = NULL,
  min.cells = 25,
  features,
  point.size = 0.5,
  point.alpha = 0.75,
  point.stroke = 0.1,
  seed = 1,
  method.impute = "random",
  cell.group.column.impute = NULL,
  level
)

Arguments

MarvelObject

Marvel object. S3 object generated from TransformExpValues function.

cell.group.column

Character string. The name of the sample metadata column whose values are used to label the cell groups on the PCA plot.

cell.group.order

Character string. The order in which the values of the sample metadata column specified in cell.group.column appear in the PCA cell group legend.

cell.group.colors

Character string. Vector of colors for the cell groups specified using cell.group.column and cell.group.order. If not specified, default ggplot2 colors will be used.

sample.ids

Character strings. Specific cells to plot.

min.cells

Numeric value. The minimum number of cells that must express a splicing event or gene for that event or gene to be included in the analysis.
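The filter described by min.cells can be illustrated with a minimal base R sketch. It is not MARVEL's internal code: the toy matrix, the use of NA to mark non-expressing cells, and the inclusive (>=) threshold are all assumptions made for illustration.

```r
# Toy PSI matrix: rows are splicing events, columns are cells (hypothetical data).
# Assumption: NA marks a cell in which the event is not expressed.
psi <- matrix(
  c(0.9, 0.8,  NA, 0.7,
     NA,  NA, 0.2,  NA,
    0.1, 0.3, 0.2, 0.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("event1", "event2", "event3"), paste0("cell", 1:4))
)

min.cells <- 3

# Count the cells with a non-missing PSI value for each event
n.cells <- rowSums(!is.na(psi))

# Keep only the events observed in at least min.cells cells
# (assumption: the threshold is inclusive)
psi.filtered <- psi[n.cells >= min.cells, , drop = FALSE]
rownames(psi.filtered)
```

In this toy case event2 is observed in a single cell and is dropped, while event1 and event3 are retained.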

features

Character string. Vector of tran_id or gene_id for analysis. Should match tran_id or gene_id column of MarvelObject$ValidatedSpliceFeature or MarvelObject$GeneFeature when level set to "splicing" or "gene", respectively.

point.size

Numeric value. Size of data points on reduced dimension space.

point.alpha

Numeric value. Transparency of the data points on reduced dimension space. Takes any value between 0 and 1. The smaller the value, the more transparent the data points will be.

point.stroke

Numeric value. The thickness of the outline of the data points. The larger the value, the thicker the outline of the data points.

seed

Numeric value. Only applicable when level set to "splicing". Ensures imputed values for NA PSIs are reproducible.

method.impute

Character string. Only applicable when level set to "splicing". Indicates the method for imputing missing PSI values (low coverage). The "random" method assigns each missing PSI a random value between 0 and 1. The "population.mean" method uses the mean PSI value of each cell population. Default option is "random".

cell.group.column.impute

Character string. Only applicable when method.impute set to "population.mean". The name of the sample metadata column whose values define the cell populations used to impute missing values.
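The difference between the two imputation methods can be sketched in base R. This is an illustration of the idea only, not MARVEL's implementation: the toy PSI vector and group labels are hypothetical, and drawing from a uniform distribution is an assumption about what "random" means here.

```r
# Toy PSI values for one splicing event across six cells (hypothetical data)
psi <- c(0.9, NA, 0.7, 0.1, NA, 0.3)
group <- c("iPSC", "iPSC", "iPSC", "Endoderm", "Endoderm", "Endoderm")

# "random": replace each NA with a draw between 0 and 1
# (assumption: uniform draws); set.seed() makes the draws reproducible
set.seed(1)
psi.random <- psi
is.missing <- is.na(psi.random)
psi.random[is.missing] <- runif(sum(is.missing), min = 0, max = 1)

# "population.mean": replace each NA with the mean of the observed
# PSI values in the same cell group
group.mean <- ave(psi, group, FUN = function(x) mean(x, na.rm = TRUE))
psi.mean <- ifelse(is.na(psi), group.mean, psi)
psi.mean
```

Random imputation changes with the seed and ignores cell identity, whereas the group mean pulls each missing value toward the cells in its own population, which is why that method needs a grouping column.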

level

Character string. Indicate "splicing" or "gene" for splicing or gene expression analysis, respectively.

Value

An S3 object with new slots added. When level is set to "splicing", the new slots are MarvelObject$PCA$PSI$Results, MarvelObject$PCA$PSI$Plot, and MarvelObject$PCA$PSI$Plot.Elbow. When level is set to "gene", the new slots are MarvelObject$PCA$Exp$Results, MarvelObject$PCA$Exp$Plot, and MarvelObject$PCA$Exp$Plot.Elbow.

Examples

marvel.demo <- readRDS(system.file("extdata/data", "marvel.demo.rds", package="MARVEL"))

# Define splicing events for analysis
df <- do.call(rbind.data.frame, marvel.demo$PSI)
tran_ids <- df$tran_id

# PCA
marvel.demo <- RunPCA(MarvelObject=marvel.demo,
                      sample.ids=marvel.demo$SplicePheno$sample.id,
                      cell.group.column="cell.type",
                      cell.group.order=c("iPSC", "Endoderm"),
                      cell.group.colors=NULL,
                      min.cells=5,
                      features=tran_ids,
                      level="splicing",
                      point.size=2
                      )

# Check outputs
head(marvel.demo$PCA$PSI$Results$ind$coord)
marvel.demo$PCA$PSI$Plot
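As a variation on the example above, the following sketch switches the imputation method to "population.mean". It uses only arguments documented on this page; reusing the "cell.type" column for cell.group.column.impute is an assumption about the demo metadata, and the call has not been checked against the demo data set.

```r
library(MARVEL)

marvel.demo <- readRDS(system.file("extdata/data", "marvel.demo.rds", package="MARVEL"))

# Define splicing events for analysis
df <- do.call(rbind.data.frame, marvel.demo$PSI)
tran_ids <- df$tran_id

# PCA with missing PSI values imputed from the mean of each cell group
# (assumption: "cell.type" is also a suitable grouping column for imputation)
marvel.demo <- RunPCA(MarvelObject=marvel.demo,
                      sample.ids=marvel.demo$SplicePheno$sample.id,
                      cell.group.column="cell.type",
                      cell.group.order=c("iPSC", "Endoderm"),
                      min.cells=5,
                      features=tran_ids,
                      level="splicing",
                      method.impute="population.mean",
                      cell.group.column.impute="cell.type",
                      point.size=2
                      )

# Elbow plot slot listed under Value
marvel.demo$PCA$PSI$Plot.Elbow
```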

[Package MARVEL version 1.4.0 Index]