R: Identify/Delete Spurious Rows and Columns from DNA Alignments

EmptyCells {ips}

R Documentation

Identify/Delete Spurious Rows and Columns from DNA Alignments

Description

After subsetting (see e.g. DNAbin), DNA sequence alignments can contain rows and columns that consist entirely of missing and/or ambiguous character states. identifyEmptyCells identifies and deleteEmptyCells deletes all such rows (taxa) and columns (characters) in a DNA sequence alignment.

Usage

deleteEmptyCells(
  DNAbin,
  margin = c(1, 2),
  nset = c("-", "n", "?"),
  quiet = FALSE
)

identifyEmptyCells(
  DNAbin,
  margin = c(1, 2),
  nset = c("-", "n", "?"),
  quiet = FALSE
)

Arguments

`DNAbin`	An object of class `DNAbin`.
`margin`	A vector giving the subscripts the function will be applied over: `1` indicates rows, `2` indicates columns, and `c(1, 2)` indicates rows and columns.
`nset`	A vector of mode character; rows or columns that consist only of the characters given in `nset` will be deleted from the alignment. Allowed values are `"-"`, `"?"`, `"n"`, `"b"`, `"d"`, `"h"`, `"v"`, `"r"`, `"y"`, `"s"`, `"w"`, `"k"`, and `"m"`.
`quiet`	Logical: if set to `TRUE`, screen output will be suppressed.
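
The roles of `margin` and `nset` can be illustrated with a minimal base R sketch on a plain character matrix. This is not the package's implementation: `find_empty` is a hypothetical helper written only for this illustration, and it assumes rows and columns are tested independently in a single pass.

```r
# Toy alignment as a plain character matrix (rows = taxa, columns = sites).
aln <- rbind(
  t1 = c("a", "c", "-", "g", "t"),
  t2 = c("n", "n", "-", "?", "n"),
  t3 = c("a", "t", "-", "g", "c")
)

# Hypothetical helper (not part of ips): returns the indices of rows and
# columns that consist only of the characters listed in `nset`.
find_empty <- function(m, margin = c(1, 2), nset = c("-", "n", "?")) {
  in_nset <- matrix(m %in% nset, nrow = nrow(m))
  list(
    row = if (1 %in% margin) which(rowSums(in_nset) == ncol(m)) else integer(0),
    col = if (2 %in% margin) which(colSums(in_nset) == nrow(m)) else integer(0)
  )
}

id <- find_empty(aln)

# Logical masks are used instead of negative indices, because
# m[-integer(0), ] would select no rows when nothing was flagged.
keep_row <- !(seq_len(nrow(aln)) %in% id$row)
keep_col <- !(seq_len(ncol(aln)) %in% id$col)
trimmed <- aln[keep_row, keep_col, drop = FALSE]
```

Here row `t2` and the third column contain only characters from the default `nset`, so both are flagged and dropped from `trimmed`; passing `margin = 2` to the helper would flag the column only.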

Details

For faster execution, deleteEmptyCells handles sequences in ape's bit-level coding scheme.
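
As a rough illustration of what working on the byte codes means, the sketch below tests for empty rows and columns directly on a raw matrix, without converting back to characters. The byte values are assumptions based on ape's DNAbin encoding as commonly described and should be checked against the ape documentation; `encode` is a hypothetical helper, and the code is not taken from ips.

```r
# Assumed byte codes for a few symbols (verify against ape's documentation).
bases <- c("a", "c", "g", "t", "n", "-", "?")
codes <- as.raw(c(0x88, 0x28, 0x48, 0x18, 0xf0, 0x04, 0x02))

# Hypothetical helper: encode a character matrix as a raw matrix.
encode <- function(m) {
  matrix(codes[match(m, bases)], nrow = nrow(m))
}

chr <- rbind(
  c("a", "-", "g"),
  c("n", "-", "?")
)
raw_aln <- encode(chr)

# Translate nset to raw once, then compare bytes instead of strings.
nset_raw <- codes[match(c("-", "n", "?"), bases)]
in_nset <- matrix(raw_aln %in% nset_raw, nrow = nrow(raw_aln))

empty_row <- rowSums(in_nset) == ncol(raw_aln)
empty_col <- colSums(in_nset) == nrow(raw_aln)
```

In this toy input the second row and the second column are flagged as empty.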

Value

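deleteEmptyCells returns an object of class DNAbin. identifyEmptyCells returns the indices of the empty rows and columns; in the example below these are accessed as id$row and id$col.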

References

Cornish-Bowden, A. 1985. Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984. Nucleic Acids Res. 13: 3021–3030.

Examples

  # COX1 sequences of bark beetles
  data(ips.cox1)
  # introduce completely ambiguous rows and columns
  x <- as.character(ips.cox1[1:6, 1:60])
  x[3, ] <- rep("n", 60)
  x[, 20:24] <- rep("-", 6)
  x <- as.DNAbin(x)
  image(x)
  # identify those rows and columns
  (id <- identifyEmptyCells(x))
  xx <- x[-id$row, -id$col]
  # delete those rows and columns
  x <- deleteEmptyCells(x)
  image(x)
  identical(x, xx)

[Package ips version 0.0.12 Index]