disteg {lineup} | R Documentation |
Calculate distance between genotype and gene expression data sets
Description
Calculate a distance between all pairs of individuals for a genotype data set and a gene expression data set, based on observed and inferred eQTL genotypes.
Usage
disteg(
  cross,
  pheno,
  pmark,
  min.genoprob = 0.99,
  k = 20,
  min.classprob = 0.8,
  classprob2drop = 1,
  repeatKNN = TRUE,
  max.selfd = 0.3,
  phenolabel = "phenotype",
  weightByLinkage = FALSE,
  map.function = c("haldane", "kosambi", "c-f", "morgan"),
  verbose = TRUE
)
Arguments
cross |
An object of class "cross" (see the qtl package). |
pheno |
A data frame of phenotypes (generally gene expression data), stored as individuals x phenotypes. The row names must contain individual identifiers. |
pmark |
Pseudomarkers that are closest to the genes in pheno (for example, as returned by find.gene.pseudomarker()). |
min.genoprob |
Threshold on genotype probabilities; if the maximum probability is less than this, the observed genotype is taken as NA. |
k |
Number of nearest neighbors to consider in forming a k-nearest neighbor classifier. |
min.classprob |
Minimum proportion of neighbors with a common class to make a class prediction. |
classprob2drop |
If an individual is inferred to have a genotype mismatch with classprob greater than this value, treat it as an outlier, drop it from the analysis, and repeat the KNN construction without it. |
repeatKNN |
If TRUE, repeat k-nearest neighbor a second time, after omitting individuals who do not appear to be self-self matches. |
max.selfd |
Threshold on distance from self (as proportion of mismatches between observed and predicted eQTL genotypes); individuals whose distance is greater than this are excluded from the second round of k-nearest neighbor. |
phenolabel |
Label for expression phenotypes to place in the output distance matrix. |
weightByLinkage |
If TRUE, weight the eQTL to account for their relative positions (for example, two tightly linked eQTL would each count about 1/2 of an isolated eQTL). |
map.function |
Used if weightByLinkage is TRUE. |
verbose |
If TRUE, give verbose output. |
Details
We consider the expression phenotypes in batches, by which pseudomarker they
are closest to. For each batch, we pull the genotype probabilities at the
corresponding pseudomarker and use the individuals that are in common
between cross and pheno and whose maximum genotype probability is above
min.genoprob, to form a classifier of eQTL genotype from expression values,
using k-nearest neighbor (the function class::knn()). The classifier is
applied to all individuals with expression data, to give a predicted eQTL
genotype. (If the proportion of the k nearest neighbors with a common class
is less than min.classprob, the predicted eQTL genotype is left as NA.)
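The thresholding step described above can be sketched with class::knn() on simulated data. This is a minimal illustration, not the internal code of disteg(): the toy objects geno, expr, votes, and inferred are hypothetical, and using prob = TRUE followed by a manual cutoff is just one way to apply min.classprob.

```r
library(class)  # provides knn()

set.seed(1)
# Hypothetical toy data: 120 individuals, 3 genotype classes,
# 2 expression traits whose means shift with genotype
geno <- rep(1:3, each = 40)
expr <- cbind(rnorm(120, mean = 3 * geno), rnorm(120, mean = 3 * geno))

k <- 20
min.classprob <- 0.8

# Classify every individual; prob = TRUE attaches the proportion of
# votes received by the winning class
pred <- knn(train = expr, test = expr, cl = factor(geno), k = k, prob = TRUE)
votes <- attr(pred, "prob")

# Leave the inferred genotype as NA when the vote is not decisive enough
inferred <- as.integer(as.character(pred))
inferred[votes < min.classprob] <- NA

table(inferred, useNA = "ifany")
```

Individuals whose expression values fall between two genotype clusters tend to receive split votes, so they end up as NA rather than being forced into a class.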
If repeatKNN is TRUE, we repeat the construction of the k-nearest
neighbor classifier after first omitting individuals whose proportion of
mismatches between observed and inferred eQTL genotypes is greater than
max.selfd.
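As a rough sketch of this filtering step, suppose the observed and inferred eQTL genotypes for the same individuals are held in two matrices with matching rows. The matrices obs and inf and the vectors selfd and keep are hypothetical names for illustration, and skipping missing genotypes is an assumption of this sketch.

```r
# Hypothetical genotypes: rows = individuals, columns = eQTL
obs <- rbind(a = c(1, 2, 3, 1, 2),
             b = c(2, 2, 1, 3, 1),
             c = c(3, 1, 1, 2, 2),
             d = c(1, 1, 2, 3, 3))
inf <- rbind(a = c(1, 2, 3, 1, 2),
             b = c(2, 2, 1, 3, 2),
             c = c(1, 3, 2, 2, 1),
             d = c(1, 1, 2, NA, 3))

max.selfd <- 0.3

# Proportion of mismatches between each individual's observed and
# inferred genotypes, ignoring eQTL with a missing value
selfd <- rowMeans(obs != inf, na.rm = TRUE)

# Individuals retained for the second round of classifier construction
keep <- names(selfd)[selfd <= max.selfd]
keep
```

Here individual c mismatches at 4 of 5 eQTL and would be omitted before the classifier is rebuilt.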
Finally, we calculate the distance between the observed eQTL genotypes for
each individual in cross and the inferred eQTL genotypes for each
individual in pheno, as the proportion of mismatches between the
observed and inferred eQTL genotypes.
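The mismatch-proportion distance can be illustrated in base R. This is a sketch under stated assumptions rather than the package's implementation: the helper prop_mismatch() and the toy matrices are hypothetical, and eQTL with a missing genotype in either individual are assumed to be left out of both numerator and denominator.

```r
# Hypothetical toy genotypes: rows = individuals, columns = eQTL
obsg <- rbind(m1 = c(1, 2, 3, 1),
              m2 = c(2, 2, 1, NA))
infg <- rbind(e1 = c(1, 2, 3, 2),
              e2 = c(2, NA, 1, 3),
              e3 = c(3, 3, 3, 3))

# Proportion of mismatches for every (observed, inferred) pair
prop_mismatch <- function(obs, inf) {
  d <- denom <- matrix(NA_real_, nrow(obs), nrow(inf),
                       dimnames = list(rownames(obs), rownames(inf)))
  for (i in seq_len(nrow(obs))) {
    for (j in seq_len(nrow(inf))) {
      ok <- !is.na(obs[i, ]) & !is.na(inf[j, ])  # eQTL typed in both
      denom[i, j] <- sum(ok)
      if (denom[i, j] > 0) {
        d[i, j] <- sum(obs[i, ok] != inf[j, ok]) / denom[i, j]
      }
    }
  }
  list(d = d, denom = denom)
}

res <- prop_mismatch(obsg, infg)
res$d      # distances: small values suggest the same individual
res$denom  # number of eQTL compared for each pair
```

A row whose smallest distance lies off the expected self-self pairing is the kind of pattern that points to a sample mix-up.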
If weightByLinkage is TRUE, we use weights on the mismatch
proportions for the various eQTL, taking into account their linkage. Two
tightly linked eQTL will each be given half the weight of a single isolated
eQTL.
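The weighting formula is not spelled out here, so the following is only a hypothetical scheme with the stated property (two tightly linked eQTL each get about half the weight of an isolated one). It assumes eQTL on a single chromosome, the Haldane map function, and a weight of one over the summed 1 - 2r across all eQTL; the names pos, haldane_rf, and w are illustrative and not part of the package.

```r
# Hypothetical eQTL positions (cM) on one chromosome
pos <- c(q1 = 10, q2 = 10.5, q3 = 200)

# Haldane map function: recombination fraction from distance in cM
haldane_rf <- function(d) 0.5 * (1 - exp(-2 * d / 100))

# Pairwise recombination fractions between eQTL
rf <- haldane_rf(abs(outer(pos, pos, "-")))

# Illustrative weight: 1 - 2r is 1 for complete linkage and 0 for
# unlinked loci, so each eQTL's weight shrinks with the number of
# close neighbors it has (itself included)
w <- 1 / rowSums(1 - 2 * rf)
round(w, 3)
```

With these positions, q1 and q2 each receive a weight close to 1/2, while the distant q3 keeps a weight close to 1.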
Value
A matrix with nind(cross) rows and nrow(pheno) columns, containing the
distances. The individual IDs are in the row and column names. The matrix
is assigned class "lineupdist".
The names of the genes that were used to construct the classifier are saved
in an attribute "retained".
The observed and inferred eQTL genotypes are saved as attributes "obsg"
and "infg".
The denominators of the proportions that form the inter-individual distances
are in the attribute "denom".
Author(s)
Karl W Broman, broman@wisc.edu
See Also
distee(), summary.lineupdist(), pulldiag(), omitdiag(), findCommonID(),
find.gene.pseudomarker(), calc.locallod(), plot.lineupdist(), class::knn(),
plotEGclass()
Examples
library(qtl)
# load example data
data(f2cross, expr1, pmap, genepos)
# calculate QTL genotype probabilities
f2cross <- calc.genoprob(f2cross, step=1)
# find nearest pseudomarkers
pmark <- find.gene.pseudomarker(f2cross, pmap, genepos)
# line up individuals
id <- findCommonID(f2cross, expr1)
# calculate LOD score for local eQTL
locallod <- calc.locallod(f2cross[,id$first], expr1[id$second,], pmark)
# take those with LOD > 25
expr1s <- expr1[,locallod>25,drop=FALSE]
# calculate distance between individuals
# (prop'n mismatches between obs and inferred eQTL geno)
d <- disteg(f2cross, expr1s, pmark)
# plot distances
plot(d)
# summary of apparent mix-ups
summary(d)
# plot of classifier for first and second eQTL
par(mfrow=c(2,1), las=1)
plotEGclass(d)
plotEGclass(d, 2)