assess_missing_data_tsne {SNPfiltR}    R Documentation

Visualize how missing data thresholds affect sample clustering

Description

This function can be run in two ways: 1) Without 'thresholds' specified. t-SNE is run on the input vcfR object without filtering, and the clustering of samples is visualized in two-dimensional space, with each sample colored according to its a priori population assignment in the popmap. 2) With 'thresholds' specified. The input vcfR object is filtered to each specified missing data threshold, and a t-SNE clustering analysis is run for each filtering iteration. For each iteration, a 2D plot is output showing clustering according to the specified popmap. This option is ideal for assessing the effects of missing data on clustering patterns.

Usage

assess_missing_data_tsne(
  vcfR,
  popmap = NULL,
  thresholds = NULL,
  perplexity = NULL,
  iterations = NULL,
  initial_dims = NULL,
  clustering = TRUE
)

Arguments

vcfR

a vcfR object

popmap

set of population assignments that will be used to color code the plots

thresholds

a vector specifying the missing data filtering thresholds to explore

perplexity

a numerical value specifying the perplexity parameter for t-SNE (default: 5)

iterations

a numerical value specifying the number of iterations for t-SNE (default: 1000)

initial_dims

a numerical value specifying the number of initial dimensions for t-SNE (default: 5)

clustering

use partitioning around medoids (PAM) to do unsupervised clustering on the output? (default = TRUE, max clusters = # of levels in popmap + 2)

Value

a series of plots showing the clustering of all samples in two-dimensional space

Examples

assess_missing_data_tsne(
  vcfR = SNPfiltR::vcfR.example,
  popmap = SNPfiltR::popmap,
  thresholds = 0.8
)
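
The two modes described above can be sketched side by side, as a minimal illustration rather than a recommended workflow. It uses only the arguments listed in Usage and the example data bundled with the package. The threshold values and the variable name thresholds_to_try are illustrative assumptions, and the calls are skipped when SNPfiltR is not installed.

```r
# Illustrative thresholds only (an assumption, not a recommendation).
thresholds_to_try <- c(0.6, 0.8)

# Guard so the sketch is skipped when SNPfiltR is not installed.
if (requireNamespace("SNPfiltR", quietly = TRUE)) {
  # Mode 1: no thresholds, so t-SNE runs once on the unfiltered vcfR object.
  SNPfiltR::assess_missing_data_tsne(
    vcfR = SNPfiltR::vcfR.example,
    popmap = SNPfiltR::popmap
  )

  # Mode 2: one filtering pass and one t-SNE plot per threshold.
  # clustering = FALSE turns off the PAM clustering step.
  SNPfiltR::assess_missing_data_tsne(
    vcfR = SNPfiltR::vcfR.example,
    popmap = SNPfiltR::popmap,
    thresholds = thresholds_to_try,
    clustering = FALSE
  )
}
```

Comparing the plots across thresholds shows whether sample clustering by population holds up as the missing data filter changes.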

[Package SNPfiltR version 1.0.1 Index]