PAM.hm {PAMhm} | R Documentation |
Main function to produce a heatmap using PAM clustering.
Description
This is the main wrapper function to be called by end users. It accepts a numeric matrix (or an object that can be coerced to a numeric matrix) or a number of data file formats and produces one or more PDFs with the plots.
Usage
PAM.hm(
x,
project.folder = ".",
nsheets = 1,
dec = ".",
header = TRUE,
symbolcol = 1,
sample.names = NULL,
cluster.number = 4,
trim = NULL,
winsorize.mat = TRUE,
cols = "BlueWhiteRed",
dendrograms = "Both",
autoadj = TRUE,
pdf.height = 10,
pdf.width = 10,
labelheight = 0.25,
labelwidth = 0.2,
r.cex = 0.5,
c.cex = 1,
medianCenter = NULL,
log = FALSE,
do.log = FALSE,
log.base = 2,
metric = "manhattan",
na.strings = "NA",
makeFolder = TRUE,
do.pdf = FALSE,
do.png = FALSE,
save.objects = FALSE
)
Arguments
x
project.folder
nsheets
dec
header
symbolcol
sample.names
cluster.number
trim
winsorize.mat
cols
dendrograms
autoadj
pdf.height
pdf.width
labelheight
labelwidth
r.cex
c.cex
medianCenter
log
do.log
log.base
metric
na.strings
makeFolder
do.pdf
do.png
save.objects
Details
Argument x can be a data.frame or numeric matrix to be used directly for plotting the heatmap. If it is a data.frame, argument symbolcol sets the respective columns for symbols to be used as labels and the column where the numeric data starts.
Matrices will be coerced to data frames.
The read function accepts txt, tsv, csv and xls files.
If PDF, PNG or R object files are to be saved, i.e., if the corresponding arguments are TRUE, a results folder will be created using time and date to create a unique name. The folder will be created in the directory set by argument project.folder. The reasoning behind that behaviour is that during development the heatmap was used as a data analysis tool, testing various cluster.number values with numerous files and comparing the results.
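A time-stamped results folder of this kind could be created along the following lines. This is only a minimal sketch: the helper name make_results_folder, the "results" prefix and the timestamp pattern are assumptions for illustration and are not taken from the PAMhm source.

```r
# Hypothetical helper (not part of PAMhm); the naming pattern is an assumption.
make_results_folder <- function(project.folder = ".", prefix = "results") {
  # Date and time down to the second make the folder name unique per run
  stamp <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")
  path <- file.path(project.folder, paste(prefix, stamp, sep = "_"))
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  path
}

# Use a temporary directory so the sketch leaves no files in the working directory
out <- make_results_folder(tempdir())
dir.exists(out)
```

Keeping each run in its own folder means repeated calls with different cluster.number values do not overwrite each other's output.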
The cluster.number argument defines the numbers of clusters when doing PAM. After processing it is passed one-by-one to argument k in pam. The numbers can be defined in the form c("2","4-7", "9"), for example, depending on the experimental setup. An integer vector is coerced to character.
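The expansion of such a specification into individual values of k might look roughly like the sketch below. The helper expand_cluster_spec is hypothetical and written in base R; the package's actual parsing may differ. The final loop assumes the cluster package is installed and simply passes each value to cluster::pam with the manhattan metric.

```r
# Hypothetical helper (not part of PAMhm): expand entries such as "4-7" into integers.
expand_cluster_spec <- function(spec) {
  spec <- as.character(spec)  # integer vectors are coerced to character
  ks <- unlist(lapply(spec, function(s) {
    bounds <- as.integer(strsplit(trimws(s), "-", fixed = TRUE)[[1]])
    # "4-7" gives two bounds and becomes 4:7; a single number is kept as is
    if (length(bounds) == 2L) seq(bounds[1], bounds[2]) else bounds
  }))
  sort(unique(ks))
}

ks <- expand_cluster_spec(c("2", "4-7", "9"))
ks  # 2 4 5 6 7 9

# Each value is then used one by one as k in pam()
if (requireNamespace("cluster", quietly = TRUE)) {
  set.seed(1234)
  mat <- matrix(rnorm(120), nrow = 20)
  fits <- lapply(ks, function(k) cluster::pam(mat, k = k, metric = "manhattan"))
  # Number of rows assigned to each cluster for the first value of k
  table(fits[[1]]$clustering)
}
```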
If autoadj is TRUE, character expansion (cex) for rows and columns, PDF width and height, and label width and height are adjusted automatically based on the dimensions of the data matrix and the length (number of characters) of the labels.
The default behavior regarding outliers is to winsorize the matrix before plotting, i.e., shrink outliers to the unscattered part of the data by replacing extreme values at both ends of the distribution with less extreme values. This is done for the same reason as trimming, but unlike trimming the resulting data will not be symmetrical around 0.
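The difference between the two approaches can be illustrated with a small base R sketch. Both helpers are hypothetical: the 5% and 95% quantile cut-offs are assumptions rather than PAMhm defaults, and trimming is assumed here to clip values to the symmetric interval from -trim to +trim.

```r
# Hypothetical helpers (not part of PAMhm); cut-offs are illustrative assumptions.
winsorize_sketch <- function(m, probs = c(0.05, 0.95)) {
  lim <- stats::quantile(m, probs = probs, na.rm = TRUE)
  m[!is.na(m) & m < lim[1]] <- lim[1]  # pull the low tail up to the lower quantile
  m[!is.na(m) & m > lim[2]] <- lim[2]  # pull the high tail down to the upper quantile
  m
}

trim_sketch <- function(m, trim) {
  m[!is.na(m) & m > trim] <- trim      # clip at +trim
  m[!is.na(m) & m < -trim] <- -trim    # clip at -trim
  m
}

set.seed(1234)
mat <- matrix(rnorm(120), nrow = 20)
mat[1] <- mat[1] * 10  # introduce an outlier, as in the Examples

range(mat)
range(winsorize_sketch(mat))  # limits follow the data's own quantiles
range(trim_sketch(mat, 2))    # limits lie within -2 and 2
```

Because the winsorizing limits come from the quantiles of the data, they are generally not mirror images of each other, whereas a single trim value gives limits that are symmetrical around 0.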
Value
A list: Invisibly returns the results object from the PAM clustering.
References
Kaufman, L., & Rousseeuw, P. J. (Eds.). (1990). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, Inc. doi: 10.1002/9780470316801
Examples
# Generate a random 20x6 matrix and plot it using default values
set.seed(1234) # for reproducibility
mat <- matrix(rnorm(120), nrow = 20) # standard normal
PAM.hm(mat, cluster.number = 3)
## Plot with more than one cluster number
PAM.hm(mat, cluster.number = 2:4) # integer vector
PAM.hm(mat, cluster.number = c("2", "4-5")) # character vector
# Using the 'trim' argument
## Introduce outlier to the matrix and plot w/o trimming or winsorization
mat[1] <- mat[1] * 10
PAM.hm(mat, cluster.number = 3, trim = NULL, winsorize.mat = FALSE)
## calculate a trim value as the smaller of the absolute values of the
## minimum and maximum (each rounded with ceiling) and plot again
tr <- min(abs(ceiling(c(min(mat, na.rm = TRUE), max(mat, na.rm = TRUE)))),
          na.rm = TRUE)
PAM.hm(mat, cluster.number = 3, trim = tr, winsorize.mat = FALSE)
## Note that the outlier is still visible but since it is less extreme
## it does not distort the colour scheme.
# An example reading data from an Excel file
# The function readxl::read_excel is used internally to read Excel files.
# The example uses their example data.
readxl_datasets <- readxl::readxl_example("datasets.xlsx")
PAM.hm(readxl_datasets, cluster.number = 4, symbolcol = 5)