create.diffsmatrix {TRAMPR}    R Documentation
Calculate Matrix of Distances between Peaks
Description
Generate an array of goodness-of-fit measures (distances) between samples and knowns, based on the sizes (in base pairs) of TRFLP peaks. For each sample/known pair and each enzyme/primer combination, this calculates the minimum distance between any peak in the sample and the single peak in the known.
Usage
create.diffsmatrix(samples, knowns)
Arguments
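samples: A collection of samples (run data, i.e., the unknowns) to be analysed; see Details.
knowns: A database of knowns (identified TRFLP patterns); see Details.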
Details
This function will rarely need to be called directly, but it does most of the calculations behind TRAMP, so it is useful to understand how it works.
This function generates a three-dimensional matrix of the (smallest; see below) distance in base pairs between peaks in a collection of unknowns (run data) and a database of knowns, for several enzyme/primer combinations. The matrix has dimensions s × k × n, where s is the number of different samples in the samples data (length(labels(samples))), k is the number of different types in the knowns database (length(labels(knowns))), and n is the number of different enzyme/primer combinations. The enzyme/primer combinations used are all combinations present in the knowns database; combinations present only in the samples are ignored. Not all samples need to contain all enzyme/primer combinations present in the knowns.
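As a rough illustration of how the dimensions follow from the two inputs, the base R sketch below derives the enzyme/primer combinations from a toy knowns table and lists the sample combinations that would be ignored. The data frames, their column names (knowns.pk, sample.pk, enzyme, primer, size) and the labels E1, P1, etc. are assumptions made for this sketch, not necessarily the layout TRAMPR uses internally.

```r
# Toy stand-ins for the peak tables of a knowns database and a samples
# object; column names and values are hypothetical.
knowns.data <- data.frame(knowns.pk = c(1, 1, 2, 2),
                          enzyme    = c("E1", "E2", "E1", "E2"),
                          primer    = c("P1", "P1", "P1", "P2"),
                          size      = c(210, 95, 400, 150))
samples.data <- data.frame(sample.pk = c(10, 10, 11),
                           enzyme    = c("E1", "E3", "E1"),
                           primer    = c("P1", "P2", "P1"),
                           size      = c(212, 300, 398))

# Combinations come from the knowns only: unique enzyme/primer pairs.
enzyme.primer <- unique(knowns.data[c("enzyme", "primer")])
rownames(enzyme.primer) <- NULL

s <- length(unique(samples.data$sample.pk))  # number of samples
k <- length(unique(knowns.data$knowns.pk))   # number of knowns
n <- nrow(enzyme.primer)                     # number of combinations
dims <- c(s, k, n)
dims  # 2 2 3

# Combinations that occur only in the samples play no part in the result.
combo <- function(d) paste(d$enzyme, d$primer, sep = "/")
ignored <- setdiff(unique(combo(samples.data)), combo(enzyme.primer))
ignored  # "E3/P2"
```

The same reasoning underlies the check identical(dim(m), c(s, k, n)) in the Examples.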
In the resulting array, m[i,j,k] is the difference (in base pairs) between the ith sample and the jth known for the kth enzyme/primer combination. The ordering of the enzyme/primer combinations is arbitrary, so a data.frame of the combinations is included as the attribute enzyme.primer, where enzyme.primer$enzyme[k] and enzyme.primer$primer[k] give the enzyme and primer used for the distances in m[,,k].
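Because the ordering of the combinations is arbitrary, code that needs one particular enzyme/primer pair should look it up through the enzyme.primer attribute rather than assume a position. The sketch below does this on a small hand-built array; the values, the E1/E2/P1/P2 labels and the helper slice.for are hypothetical.

```r
# A toy 2 x 2 x 3 array standing in for create.diffsmatrix() output
# (values made up; filled in column-major order, one slice per line).
m <- array(c(2, -3, NA, 1,
             0.5, NA, 4, -2,
             NA, NA, 1.5, 0),
           dim = c(2, 2, 3))
attr(m, "enzyme.primer") <- data.frame(enzyme = c("E1", "E2", "E2"),
                                       primer = c("P1", "P1", "P2"))

# Hypothetical helper: return the sample x known matrix m[, , k] for the
# requested enzyme/primer pair, locating k via the attribute.
slice.for <- function(m, enzyme, primer) {
  ep <- attr(m, "enzyme.primer")
  k <- which(ep$enzyme == enzyme & ep$primer == primer)
  if (length(k) != 1) stop("combination not found in the knowns")
  m[, , k]
}

d <- slice.for(m, "E2", "P2")
d[1, 2]  # difference between sample 1 and known 2 for E2/P2: 1.5
```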
Each case in the knowns database has a single peak (or none) for each enzyme/primer combination, but a sample may contain multiple peaks for an enzyme/primer combination; the difference is always taken from the sample peak closest to the known peak. Where a sample and/or a known lacks an enzyme/primer combination, the difference is NA. Although the smallest absolute distance is used to choose the sample peak, the sign of the difference is preserved: negative where the closest sample peak is smaller than the known peak, positive where it is larger (see absolute.min).
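The calculation described above can be sketched in base R without TRAMPR. This is a simplified re-implementation for illustration only, not the package's code: the helper signed.closest stands in for absolute.min, the function toy.diffsmatrix and the column names (sample.pk, knowns.pk, enzyme, primer, size) are hypothetical, and the data are made up.

```r
# Hypothetical helper, standing in for TRAMPR's absolute.min: the element
# of x with the smallest absolute value, sign kept; NA if nothing is left
# after dropping NAs.
signed.closest <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  x[which.min(abs(x))]
}

# Hypothetical, simplified version of the calculation on plain data frames.
toy.diffsmatrix <- function(samples.data, knowns.data) {
  ep <- unique(knowns.data[c("enzyme", "primer")])  # combinations from knowns only
  rownames(ep) <- NULL
  s.ids <- unique(samples.data$sample.pk)
  k.ids <- unique(knowns.data$knowns.pk)
  m <- array(NA_real_, dim = c(length(s.ids), length(k.ids), nrow(ep)))
  for (k in seq_len(nrow(ep))) {
    in.k <- function(d) d$enzyme == ep$enzyme[k] & d$primer == ep$primer[k]
    sk <- samples.data[in.k(samples.data), ]
    kk <- knowns.data[in.k(knowns.data), ]
    for (i in seq_along(s.ids)) {
      peaks <- sk$size[sk$sample.pk == s.ids[i]]         # zero or more peaks
      for (j in seq_along(k.ids)) {
        known.peak <- kk$size[kk$knowns.pk == k.ids[j]]  # one peak or none
        if (length(known.peak) == 1)
          m[i, j, k] <- signed.closest(peaks - known.peak)
      }
    }
  }
  attr(m, "enzyme.primer") <- ep
  m
}

samples.data <- data.frame(sample.pk = c(10, 10, 10, 11),
                           enzyme    = c("E1", "E1", "E2", "E1"),
                           primer    = "P1",
                           size      = c(198, 405, 97, 390))
knowns.data <- data.frame(knowns.pk = c(1, 1, 2),
                          enzyme    = c("E1", "E2", "E1"),
                          primer    = "P1",
                          size      = c(200, 95, 400))
m <- toy.diffsmatrix(samples.data, knowns.data)
m[, , 1]  # E1/P1: sample 10 vs known 1 is -2 (198 is 2 bp below 200)
```

Sample 10 has two E1/P1 peaks, so against known 2 (400 bp) the closer one, 405, is used and the difference is +5. Sample 11 and known 2 both lack E2/P1, so those cells of m[, , 2] stay NA.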
Value
A three-dimensional matrix, with the attribute enzyme.primer described above.
See Also
TRAMP, which uses output from create.diffsmatrix.
Examples
library(TRAMPR)
data(demo.samples)
data(demo.knowns)
s <- length(labels(demo.samples))
k <- length(labels(demo.knowns))
n <- nrow(unique(demo.knowns$data[c("enzyme", "primer")]))
m <- create.diffsmatrix(demo.samples, demo.knowns)
dim(m)
identical(dim(m), c(s, k, n))
## Maximum error for each sample/known (i.e., across all enzyme/primer
## combinations), similar to how it is calculated by TRAMP.
## A sample/known pair sharing no combinations gives -Inf (with a warning).
error <- apply(abs(m), 1:2, max, na.rm=TRUE)
dim(error)
## Euclidean error (see ?TRAMP)
error.euclid <- sqrt(rowSums(m^2, na.rm=TRUE, dims=2))/rowSums(!is.na(m), dims=2)
## Euclidean and maximum error will require different values of
## accept.error in TRAMP:
plot(error, error.euclid, pch=".")
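To see how the two aggregations in the example above treat the same differences, the sketch below applies them to a small made-up array in base R, so the results can be checked by hand. The values are invented for illustration, and the Euclidean formula is copied from the example rather than taken from TRAMP itself.

```r
# Made-up 1 x 2 x 3 array of differences: one sample, two knowns,
# three enzyme/primer combinations (filled in column-major order).
m <- array(c(1, -4,    # combination 1: known 1, known 2
             -2, NA,   # combination 2 (known 2 lacks this combination)
             2, 3),    # combination 3
           dim = c(1, 2, 3))

# Maximum absolute error across combinations, as in the example above.
error <- apply(abs(m), 1:2, max, na.rm = TRUE)

# Euclidean error, using the same formula as the example above.
error.euclid <- sqrt(rowSums(m^2, na.rm = TRUE, dims = 2)) /
  rowSums(!is.na(m), dims = 2)

error         # known 1: 2, known 2: 4
error.euclid  # known 1: sqrt(9)/3 = 1, known 2: sqrt(25)/2 = 2.5
```

On these numbers both measures rank the two knowns the same way but on different scales, consistent with the note that they need different accept.error values.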