R: Show RNAseq data overlaid on a scatter plot

scatterPlot {dittoViz}

R Documentation

Show RNAseq data overlaid on a scatter plot

Description

Show RNAseq data overlaid on a scatter plot

Usage

scatterPlot(
  data_frame,
  x.by,
  y.by,
  color.by = NULL,
  shape.by = NULL,
  split.by = NULL,
  size = 1,
  rows.use = NULL,
  show.others = TRUE,
  x.adjustment = NULL,
  y.adjustment = NULL,
  color.adjustment = NULL,
  x.adj.fxn = NULL,
  y.adj.fxn = NULL,
  color.adj.fxn = NULL,
  split.show.all.others = TRUE,
  opacity = 1,
  color.panel = dittoColors(),
  colors = seq_along(color.panel),
  split.nrow = NULL,
  split.ncol = NULL,
  split.adjust = list(),
  multivar.split.dir = c("col", "row"),
  shape.panel = c(16, 15, 17, 23, 25, 8),
  rename.color.groups = NULL,
  rename.shape.groups = NULL,
  min.color = "#F0E442",
  max.color = "#0072B2",
  min.value = NA,
  max.value = NA,
  plot.order = c("unordered", "increasing", "decreasing", "randomize"),
  xlab = x.by,
  ylab = y.by,
  main = "make",
  sub = NULL,
  theme = theme_bw(),
  do.hover = FALSE,
  hover.data = unique(c(color.by, paste0(color.by, ".color.adj"), "color.multi",
    "color.which", x.by, paste0(x.by, ".x.adj"), y.by, paste0(y.by, ".y.adj"), shape.by,
    split.by)),
  hover.round.digits = 5,
  do.contour = FALSE,
  contour.color = "black",
  contour.linetype = 1,
  add.trajectory.by.groups = NULL,
  add.trajectory.curves = NULL,
  trajectory.group.by,
  trajectory.arrow.size = 0.15,
  add.xline = NULL,
  xline.linetype = "dashed",
  xline.color = "black",
  add.yline = NULL,
  yline.linetype = "dashed",
  yline.color = "black",
  do.letter = FALSE,
  do.ellipse = FALSE,
  do.label = FALSE,
  labels.size = 5,
  labels.highlight = TRUE,
  labels.repel = TRUE,
  labels.repel.adjust = list(),
  labels.split.by = split.by,
  legend.show = TRUE,
  legend.color.title = "make",
  legend.color.size = 5,
  legend.color.breaks = waiver(),
  legend.color.breaks.labels = waiver(),
  legend.shape.title = shape.by,
  legend.shape.size = 5,
  show.grid.lines = TRUE,
  do.raster = FALSE,
  raster.dpi = 300,
  data.out = FALSE
)

Arguments

`data_frame`	A data_frame where columns are features and rows are observations you might wish to visualize.
`x.by`, `y.by`	Single strings denoting the name of a column of `data_frame` containing numeric data to use for the x- and y-axis of the scatterplot.
`color.by`	Single string denoting the name of a column of `data_frame` to use for setting the color of plotted points. Alternatively, a string vector naming multiple such columns of data to plot at once.
`shape.by`	Single string denoting the name of a column of `data_frame` containing discrete data to use for setting the shape of plotted points.
`split.by`	1 or 2 strings denoting the name(s) of column(s) of `data_frame` containing discrete data to use for faceting / separating data points into separate plots. When 2 columns are named, c(row,col), the first is used for rows and the second is used for columns of the resulting facet grid. When 1 column is named, the shape of the facet layout can be controlled with `split.nrow` and `split.ncol`.
`size`	Number which sets the size of data points. Default = 1.
`rows.use`	String vector of rownames of `data_frame` OR an integer vector specifying the row-indices of data points which should be plotted. Alternatively, a Logical vector, the same length as the number of rows in `data_frame`, where `TRUE` values indicate which rows to plot.
`show.others`	Logical. TRUE by default, whether rows not targeted by `rows.use` should be shown in the background in light gray.
`x.adjustment`, `y.adjustment`, `color.adjustment`	A recognized string indicating whether numeric `x.by`, `y.by`, and `color.by` data should be used directly (default) or should be adjusted to be one of: "z-score", scaled with the scale() function to produce a relative-to-mean z-score representation; or "relative.to.max", divided by the maximum value to give percent-of-max values between [0,1]. Ignored if the target data is not numeric, as these known adjustments target numeric data only. In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting.
`x.adj.fxn`, `y.adj.fxn`, `color.adj.fxn`	If you wish to apply a function to edit the `x.by`, `y.by`, or `color.by` data before use, in a way not possible with the `x.adjustment`, `y.adjustment`, or `color.adjustment` inputs, these inputs can each be given a function which takes in a vector of values as input and returns a vector of values of the same length as output. For example, `function(x) {log2(x)}` or `as.factor`. In order to leave the unedited data available for use in other features, the adjusted data are put in a new column and that new column is used for plotting.
`split.show.all.others`	Logical which sets whether gray "others" points of facets should include all points of other facets (`TRUE`) versus just points left out by `rows.use` which would exist in the current facet (`FALSE`).
`opacity`	Number between 0 and 1. 1 = opaque. 0 = invisible. Default = 1. (In terms of typical ggplot variables, = alpha)
`color.panel`	String vector which sets the colors to draw from when `color.by` indicates discrete data. `dittoColors()` by default, see `dittoColors` for contents. A named vector can be used if names are matched to the distinct values of the `color.by` data.
`colors`	Integer vector, the indexes / order, of colors from `color.panel` to actually use. Useful for quickly swapping around colors of the default set (when not using names for color matching).
`split.nrow`, `split.ncol`	Integers which set the dimensions of faceting/splitting when faceting by a single feature.
`split.adjust`	A named list which allows extra parameters to be pushed through to the faceting function call. List elements should be valid inputs to the faceting functions, e.g. 'list(scales = "free")'. For options, when giving 1 column to `split.by`, see `facet_wrap`, OR when giving 2 columns to `split.by`, see `facet_grid`.
`multivar.split.dir`	"row" or "col", sets the direction of faceting used for the `color.by` columns when: `color.by` is given multiple column names AND `split.by` is used to provide an additional feature to facet by.
`shape.panel`	Vector of integers, corresponding to ggplot shapes, which sets what shapes to use in conjunction with `shape.by`. When nothing is supplied to `shape.by`, only the first value is used. Default is a set of 6, `c(16,15,17,23,25,8)`, the first being a simple, solid, circle.
`rename.color.groups`	String vector which sets new names for the identities of `color.by` groups.
`rename.shape.groups`	String vector which sets new names for the identities of `shape.by` groups.
`min.color`	color for `min` value of numeric `color.by`-data. Default = yellow
`max.color`	color for `max` value of numeric `color.by`-data. Default = blue
`min.value`, `max.value`	Number which sets the `color.by`-data value associated with the minimum or maximum colors.
`plot.order`	String. If the data should be plotted based on the order of the color data, sets whether to plot in "increasing", "decreasing", or "randomize"d order.
`xlab`, `ylab`	Strings which set the labels for the axes. To remove, set to `NULL`.
`main`	String, sets the plot title. A default title is automatically generated based on `color.by` and `shape.by` when either are provided. To remove, set to `NULL`.
`sub`	String, sets the plot subtitle.
`theme`	A ggplot theme which will be applied before internal adjustments. Default = `theme_bw()`. See https://ggplot2.tidyverse.org/reference/ggtheme.html for other options and ideas.
`do.hover`	Logical which controls whether the ggplot output will be converted to a plotly object so that data about individual points can be displayed when you hover your cursor over them. The `hover.data` argument is used to determine what data to show upon hover.
`hover.data`	String vector which denotes what data to show for each data point, upon hover, when `do.hover` is set to `TRUE`. Defaults to all data expected to be useful. Only values present in the plotting data are actually used. These can be column names of `data_frame` and any column names which will be created to accommodate multivar and data adjustment functionality. You can run the function with `data.out = TRUE` and inspect the `$Target_data` output's columns to view your available options.
`hover.round.digits`	Integer number specifying the number of decimal digits to round displayed numeric values to, when `do.hover` is set to `TRUE`.
`do.contour`	Logical. Whether density-based contours should be displayed.
`contour.color`	String that sets the color of the `do.contour` contours.
`contour.linetype`	String or numeric which sets the type of line used for `do.contour` contours. Defaults to "solid", but see `linetype` for other options.
`add.trajectory.by.groups`	List of vectors representing trajectory paths, each from start-group to end-group, where vector contents are the group-names indicated by the `trajectory.group.by` column of `data_frame`.
`add.trajectory.curves`	List of matrices, each representing coordinates for a trajectory path, from start to end, where matrix columns represent x and y coordinates of the paths.
`trajectory.group.by`	String denoting the name of a column of `data_frame` to use for generating trajectories from data point groups.
`trajectory.arrow.size`	Number representing the size of trajectory arrows, in inches. Default = 0.15.
`add.xline`	numeric value(s) where one or multiple vertical line(s) should be added.
`xline.linetype`	String which sets the type of line for `add.xline`. Defaults to "dashed", but any ggplot linetype will work.
`xline.color`	String that sets the color(s) of the `add.xline` line(s).
`add.yline`	numeric value(s) where one or multiple horizontal line(s) should be added.
`yline.linetype`	String which sets the type of line for `add.yline`. Defaults to "dashed", but any ggplot linetype will work.
`yline.color`	String that sets the color(s) of the `add.yline` line(s).
`do.letter`	Logical which sets whether letters should be added on top of the colored dots, for extended colorblindness compatibility. NOTE: `do.letter` is ignored if `do.hover = TRUE` or `shape.by` is used, because lettering is incompatible with plotly and with changing the dots to be different shapes.
`do.ellipse`	Logical. Whether `color.by` groups should be surrounded by median-centered ellipses.
`do.label`	Logical. Whether to add text labels near the center (median) of `color.by` groups.
`labels.size`	Number which sets the size of labels text when `do.label = TRUE`.
`labels.highlight`	Logical. Whether labels should have a box behind them when `do.label = TRUE`.
`labels.repel`	Logical, that sets whether the labels' placements will be adjusted with ggrepel to avoid intersections between labels and plot bounds when `do.label = TRUE`. TRUE by default.
`labels.repel.adjust`	A named list which allows extra parameters to be pushed through to ggrepel function calls. List elements should be valid inputs to `geom_label_repel` by default, or to `geom_text_repel` when `labels.highlight = FALSE`.
`labels.split.by`	String of one or two column names which controls the facet-split calculations for label placements. Defaults to `split.by`, so generally there is no need to adjust this unless you plan to apply faceting externally.
`legend.show`	Logical. Whether any legend should be displayed. Default = `TRUE`.
`legend.color.title`, `legend.shape.title`	Strings which set the title for the color or shape legends.
`legend.color.size`, `legend.shape.size`	Numbers representing the size of shapes in the color and shape legends (for discrete variable plotting). Default = 5. *Enlarging the icons in the colors legend is incredibly helpful for making colors more distinguishable by color blind individuals.
`legend.color.breaks`	Numeric vector which sets the discrete values to label in the color-scale legend for `color.by`-data.
`legend.color.breaks.labels`	String vector, with same length as `legend.color.breaks`, which sets the labels for the tick marks of the color-scale.
`show.grid.lines`	Logical which sets whether grid lines should be shown within the plot space.
`do.raster`	Logical. When set to `TRUE`, rasterizes the internal plot layer, changing it from individually encoded points to a flattened set of pixels. This can be useful for editing in external programs (e.g. Illustrator) when there are many thousands of data points.
`raster.dpi`	Number indicating dots/pixels per inch (dpi) to use for rasterization. Default = 300.
`data.out`	Logical. When set to `TRUE`, changes the output, from the plot alone, to a list containing the plot ("p"), a data.frame containing the underlying data for target rows ("Target_data"), a data.frame containing the underlying data for non-target rows ("Others_data"), and the ultimately used mapping of columns to given aesthetic sets ("cols_used"), because modification of newly made columns is required for many features.

Details

This function first makes any requested adjustments to data in the given data_frame, internally only, such as scaling the color.by-column if color.adjustment was given "z-score".
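
As a rough illustration of what the two named adjustments compute, here is a base-R sketch on a made-up data frame. The `toy_df` object and its values are hypothetical, and the `gene1.color.adj` column name only mirrors the pattern seen in the default `hover.data`; how scatterPlot performs and names the adjustment internally is an assumption here.

```r
# Hypothetical data, not part of dittoViz.
toy_df <- data.frame(gene1 = c(2, 4, 6, 8, 10))

# "z-score": scale() centers on the mean and divides by the standard deviation.
z_scored <- as.numeric(scale(toy_df$gene1))

# "relative.to.max": divide by the maximum value, giving values in [0,1]
#   when the data are non-negative.
rel_to_max <- toy_df$gene1 / max(toy_df$gene1)

# A custom function of the kind described for 'color.adj.fxn': it takes a
#   vector and returns a vector of the same length.
log2_fxn <- function(x) {log2(x)}
log2_adjusted <- log2_fxn(toy_df$gene1)

# Adjusted values go in a new column so that the original column stays intact.
#   (The column name is an assumption based on the 'hover.data' default.)
toy_df[["gene1.color.adj"]] <- z_scored

print(toy_df)
```

The same idea would apply to `x.by` and `y.by` data with the corresponding `x.*` and `y.*` inputs.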

Next, if a set of rows to target was indicated with the rows.use input, then the data_frame is split into Target_data and Others_data.
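
The three accepted forms of rows.use (rownames, indices, or a logical vector) can all describe the same target set. The base-R sketch below shows one way such inputs might be resolved and used to split a data frame; `toy_df` and the `resolve_rows` helper are hypothetical and are not the package's internal code.

```r
# Hypothetical data, not part of dittoViz.
toy_df <- data.frame(
    PC1 = c(0.1, 0.5, -0.3, 0.9),
    PC2 = c(1.2, -0.4, 0.7, 0.0),
    groups = c("A", "B", "C", "A"),
    row.names = paste0("obs", 1:4))

# Hypothetical helper: convert any of the three 'rows.use' forms to rownames.
resolve_rows <- function(data, rows.use) {
    all_rows <- rownames(data)
    if (is.logical(rows.use)) {
        # Logical input must have one value per row.
        stopifnot(length(rows.use) == nrow(data))
        return(all_rows[rows.use])
    }
    if (is.numeric(rows.use)) {
        # Numeric input is treated as row indices.
        return(all_rows[rows.use])
    }
    # Otherwise, treat the input as rownames, kept in data order.
    all_rows[all_rows %in% rows.use]
}

by_name <- resolve_rows(toy_df, c("obs1", "obs2", "obs4"))
by_index <- resolve_rows(toy_df, c(1, 2, 4))
by_logical <- resolve_rows(toy_df, toy_df$groups != "C")

# Split into target rows and all remaining ("others") rows.
Target_data <- toy_df[by_logical, ]
Others_data <- toy_df[!rownames(toy_df) %in% by_logical, ]

print(rownames(Target_data))
print(rownames(Others_data))
```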

Then, rows are reordered to match with the requested plot.order behavior.
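
Because ggplot2 draws points in row order, rows that come later end up on top, so the reordering step decides which points remain visible where they overlap. A minimal base-R sketch of the four plot.order options follows; `toy_df` and the `reorder_rows` helper are hypothetical stand-ins and are not taken from the package source.

```r
# Hypothetical data, not part of dittoViz.
toy_df <- data.frame(x = 1:5, gene1 = c(3, 1, 5, 2, 4))

# Hypothetical helper: reorder rows by the column used for color.
reorder_rows <- function(
    data, by,
    plot.order = c("unordered", "increasing", "decreasing", "randomize")) {

    plot.order <- match.arg(plot.order)
    switch(plot.order,
        # Leave rows as given.
        unordered = data,
        # Highest values last, so they are drawn on top.
        increasing = data[order(data[[by]]), , drop = FALSE],
        # Lowest values last, so they are drawn on top.
        decreasing = data[order(data[[by]], decreasing = TRUE), , drop = FALSE],
        # Shuffle rows so that no group is systematically on top.
        randomize = data[sample(nrow(data)), , drop = FALSE])
}

set.seed(1) # only so that the shuffled order is reproducible
inc <- reorder_rows(toy_df, "gene1", "increasing")
dec <- reorder_rows(toy_df, "gene1", "decreasing")
rand <- reorder_rows(toy_df, "gene1", "randomize")
unord <- reorder_rows(toy_df, "gene1")

print(inc$gene1)
print(dec$gene1)
```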

Finally, a scatter plot is created from the resultant data.frames. Non-target data points are colored in gray if show.others=TRUE, and target data points are displayed on top, colored and shaped based on the color.by- and shape.by-associated data. If split.by was used, the plot will be split into a matrix of panels based on the associated groupings.

Value

a ggplot scatterplot where colored dots and/or shapes represent individual rows of the given data_frame.

Alternatively, if data.out=TRUE, a list containing four slots is output: the plot (named 'p'), a data.frame containing the underlying data for target rows (named 'Target_data'), a data.frame containing the underlying data for non-target rows (named 'Others_data'), and a list providing mappings of final column names in 'Target_data' to given plot aesthetics (named 'cols_used') because modification of newly made columns is required for many features.

Alternatively, if do.hover is set to TRUE, the plot is converted from ggplot to plotly, and additional information about each data point, determined by the hover.data input, is displayed upon hovering the cursor over the plot.

Many characteristics of the plot can be adjusted using discrete inputs

size and opacity can be used to adjust the size and transparency of the data points. size can be given a number, or a column name of data_frame.
Colors used can be adjusted with color.panel and/or colors for discrete data, or min.value, max.value, min.color, and max.color for continuous data.
Shapes used can be adjusted with shape.panel.
Color and shape labels can be changed using rename.color.groups and rename.shape.groups.
Titles and axis labels can be adjusted with main, sub, xlab, ylab, legend.color.title, and legend.shape.title arguments.
Legends can also be adjusted in other ways, using variables that all start with "legend." for easy tab completion lookup.

Author(s)

Daniel Bunis

Examples

example("dittoExampleData", echo = FALSE)

# The minimal inputs for scatterPlot are the 'data_frame', and 2 column names,
#   given to 'x.by' and 'y.by', indicating which data to use for the x and y
#   axes, respectively.
scatterPlot(
    example_df, x.by = "PC1", y.by = "PC2")

# 'color.by' and/or 'shape.by' can also be given column names in order to
#   represent that column's data in the color or shape of the data points.
#   'shape.by' must be pointed to discrete data, but 'color.by' can be given
#   discrete or numeric data.
scatterPlot(
    example_df, x.by = "PC1", y.by = "PC2",
    color.by = "groups",
    shape.by = "SNP",
    size = 3)
scatterPlot(
    example_df, x.by = "PC1", y.by = "PC2",
    color.by = "gene1",
    size = 3)

# Data can be "split" or faceted by a discrete variable as well.
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "gene1",
    split.by = "timepoint") # single split.by element
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "gene1",
    split.by = c("groups","SNP")) # row and col split.by elements

# Modify the look with intuitive inputs
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    size = 5,
    opacity = 0.3,
    show.grid.lines = FALSE,
    ylab = NULL, xlab = "PC2 by PC1",
    main = "Plot Title",
    sub = "subtitle",
    legend.color.title = "Legend\nRetitle")

# You can restrict to only certain data points using the 'rows.use' input.
#   The input can be given rownames, indexes, or a logical vector
#   All "other" points will now only be shown as a gray background, or will not
#   be shown at all if you also add 'show.others = FALSE'
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    sub = "show only first 40 observations, by index",
    rows.use = 1:40)
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    sub = "show only 3 observations, by name",
    rows.use = c("obs1", "obs2", "obs25"))
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    sub = "show groups A,B,D only, by logical, without others as background",
    rows.use = example_df$groups!="C",
    show.others = FALSE)

# Many extra features are easy to add as well:
#   Each is started via an input starting with 'do.FEATURE*' or 'add.FEATURE*'
#   And when tweaks for that feature are possible, those inputs will be
#   named starting with 'FEATURE*'. For example, color.by groups can be labeled
#   with 'do.label = TRUE' and the tweaks for this feature are given with inputs
#   'labels.size', 'labels.highlight', and 'labels.repel':
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    sub = "default labeling",
    do.label = TRUE)          # Turns on the labeling feature
scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    sub = "tweaked labeling",
    do.label = TRUE,          # Turns on the labeling feature
    labels.size = 8,          # Adjust the text size of labels
    labels.highlight = FALSE, # Removes white background behind labels
    labels.repel = FALSE)     # Turns off anti-overlap location adjustments

# Faceting can also be used to show multiple continuous variables side-by-side
#   by giving a vector of column names to 'color.by'.
#   This can also be combined with 1 'split.by' variable, with direction then
#   controlled via 'multivar.split.dir':
scatterPlot(example_df, x.by = "PC1", y.by = "PC2",
    color.by = c("gene1", "gene2"))
scatterPlot(example_df, x.by = "PC1", y.by = "PC2",
    color.by = c("gene1", "gene2"),
    split.by = "groups")
scatterPlot(example_df, x.by = "PC1", y.by = "PC2",
    color.by = c("gene1", "gene2"),
    split.by = "groups",
    multivar.split.dir = "row")

# Sometimes, it can be useful for external editing or troubleshooting purposes
#   to see the underlying data that was directly used for plotting.
# 'data.out = TRUE' can be provided in order to obtain not just the plot ("plot"),
#   but also the "Target_data" and "Others_data" data.frames and "cols_used"
#   returned as a list.
out <- scatterPlot(example_df, x.by = "PC1", y.by = "PC2", color.by = "groups",
    rows.use = 1:40,
    data.out = TRUE)
out$plot
summary(out$Target_data)
summary(out$Others_data)
out$cols_used

[Package dittoViz version 1.0.1 Index]
