freqs_list {lares}

R Documentation

Frequencies on Lists and UpSet Plot

Description

Visualize the frequency of elements in a list, a list column, or a vector of comma-separated values. Detect which combinations and elements are the most frequent and how much of your total observations they represent. The result is similar to an UpSet plot, which can be used as an alternative to Venn diagrams.
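The counting idea described above can be sketched in base R. This is a minimal illustration under stated assumptions, not lares' actual implementation: it splits a toy comma-separated vector `x` (invented values) into elements, sorts each set so that "green, blue" and "blue, green" count as one combination (the behaviour the `unique` argument describes), and reports counts and percentages of all observations. For a list column such as `films`, the split step is already done.

```r
# Toy comma-separated observations (invented values)
x <- c("green, blue", "blue, green", "red", "blue", "red, blue")

# Split each observation into its elements and trim surrounding spaces;
# a list column (like `films`) would already look like `elements`
elements <- lapply(strsplit(x, ","), trimws)

# Sort each set so "green, blue" and "blue, green" form one combination
combos <- vapply(elements, function(e) paste(sort(e), collapse = ", "), character(1))

# Frequencies, most frequent first
elem_freq <- sort(table(unlist(elements)), decreasing = TRUE)
combo_freq <- sort(table(combos), decreasing = TRUE)

# Counts plus share (%) of all observations
elem_df <- data.frame(
  element = names(elem_freq),
  n = as.vector(elem_freq),
  pct = round(100 * as.vector(elem_freq) / length(x), 1)
)
combo_df <- data.frame(
  combination = names(combo_freq),
  n = as.vector(combo_freq),
  pct = round(100 * as.vector(combo_freq) / length(x), 1)
)
elem_df
combo_df
```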

Usage

freqs_list(
  df,
  var = NULL,
  wt = NULL,
  fx = "mean",
  rm.na = FALSE,
  min_elements = 1,
  limit = 10,
  limit_x = NA,
  limit_y = NA,
  tail = TRUE,
  size = 10,
  unique = TRUE,
  abc = FALSE,
  title = "",
  plot = TRUE
)

Arguments

`df`	data.frame
`var`	Variable. Variable(s) you wish to process.
`wt`	Variable, numeric. Numeric column used in the colour scale, aggregated (sum or mean, see `fx`) for each combination.
`fx`	Character. Aggregation applied to `wt`: "mean" or "sum".
`rm.na`	Boolean. Remove NA values from `wt`?
`min_elements`	Integer. Exclude combinations with fewer than n elements.
`limit`, `limit_x`, `limit_y`	Integer. Show the top n combinations (x) and/or elements (y); the rest are grouped into a single element. Set to 0 to ignore. `limit_x` and `limit_y` default to the value of `limit`.
`tail`	Boolean. Show the tail grouped into "..." on the plots?
`size`	Numeric. Base text size.
`unique`	Boolean. Treat combinations as unordered, i.e. a,b = b,a?
`abc`	Boolean. Sort in alphabetical order?
`title`	Character. Text that overwrites the plot's title.
`plot`	Boolean. Display the plot? It is generated and stored in the output object anyway.
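One way to picture how `min_elements`, `limit` and `tail` interact is the hypothetical helper `group_top_n()` below. It is an assumption-based sketch operating on a named vector of toy combination counts, not the package's code; lares may, for instance, handle ties or the tail differently.

```r
# Hypothetical helper (not part of lares): filter combinations by size, keep
# the `limit` most frequent ones and optionally lump the rest into "...".
group_top_n <- function(counts, min_elements = 1, limit = 10, tail = TRUE) {
  # Number of elements in each combination label, e.g. "a, b" -> 2
  n_elements <- lengths(strsplit(names(counts), ", ", fixed = TRUE))
  counts <- sort(counts[n_elements >= min_elements], decreasing = TRUE)
  # limit = 0 disables grouping; nothing to group if few combinations remain
  if (limit == 0 || length(counts) <= limit) return(counts)
  top <- counts[seq_len(limit)]
  if (!tail) return(top)
  # Sum everything beyond the top `limit` into a single "..." entry
  c(top, stats::setNames(sum(counts[-seq_len(limit)]), "..."))
}

# Toy counts per combination (invented values)
combo_counts <- c("a, b" = 12, "a" = 9, "b, c" = 7, "a, b, c" = 4, "c" = 3, "a, c" = 2)

group_top_n(combo_counts, min_elements = 2, limit = 2)
group_top_n(combo_counts, limit = 2, tail = FALSE)
```

With `min_elements = 2` the single-element combinations are dropped first, so the tail only sums the remaining smaller combinations ("a, b, c" and "a, c").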

Value

List. Contains data.frames with the data results, elements, and combinations, plus the generated plot.

Examples

## Not run: 
library(lares)
df <- dplyr::starwars
head(df[, c(1, 4, 5, 12)], 10)

# Characters per movies combinations in a list column
head(df$films, 2)
freqs_list(df, films)

# Skin colours in a comma-separated column
head(df$skin_color)
x <- freqs_list(df, skin_color, min_elements = 2, limit = 5, plot = FALSE)
# Inside "x" we'll have:
names(x)

# Using the 'wt' argument to add a continuous value metric
# to a dataset with already one-hot encoded columns (and hide the tail)
csv <- "https://raw.githubusercontent.com/hms-dbmi/UpSetR/master/inst/extdata/movies.csv"
movies <- read.csv(csv, sep = ";")
head(movies)
freqs_list(movies,
  wt = AvgRating, min_elements = 2, tail = FALSE,
  title = "Movies\nMixed Genres\nRanking"
)
# So, please: no more Comedy+SciFi and more Drama+Horror films (based on ~50 movies)!

## End(Not run)
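For the one-hot encoded case (as in the `movies` example), the `wt`/`fx` aggregation can be pictured with the base R sketch below. The toy data frame `movies_toy` and its values are invented for illustration, and the "+"-joined labels are an assumption about presentation, not lares' output format.

```r
# Toy one-hot encoded data (invented values, not the UpSetR movies file)
movies_toy <- data.frame(
  Comedy    = c(1, 1, 0, 0, 1),
  Drama     = c(0, 1, 1, 1, 0),
  Horror    = c(0, 0, 1, 1, 0),
  AvgRating = c(3.0, 3.5, 4.0, 4.5, 2.5)
)
genre_cols <- c("Comedy", "Drama", "Horror")
fx <- "mean"  # or "sum", mirroring the `fx` argument

# One label per row built from the columns equal to 1, e.g. "Drama+Horror"
onehot <- as.matrix(movies_toy[, genre_cols]) == 1
combination <- apply(onehot, 1, function(row) paste(genre_cols[row], collapse = "+"))

# Count rows per combination and aggregate the weight column with `fx`
counts <- table(combination)
values <- tapply(movies_toy$AvgRating, combination, match.fun(fx))

summary_df <- data.frame(
  combination = names(counts),
  n = as.vector(counts),
  value = as.vector(values[names(counts)])
)
summary_df[order(-summary_df$n), ]
```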