cta20 {Modalclust} | R Documentation
Two-dimensional data in original and log scale
Description
Two-dimensional data in original and log scale, together with their hierarchical modal clustering. This dataset demonstrates that modal clustering techniques can be used to cluster untransformed data, because they do not depend on parametric assumptions. The clustering results both before and after the log transformation separate the three clusters well.
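To illustrate what clustering by modes means, here is a minimal sketch in base R. It uses a mean-shift style ascent on a Gaussian kernel density estimate as a stand-in; it is not the package's implementation. The simulated data, the bandwidth `h`, and the helper `ascend` are all hypothetical choices made for the illustration.

```r
set.seed(42)
n <- 50
# Hypothetical simulated data with three groups; NOT the cta20 data
centers <- rbind(c(0, 0), c(10, 0), c(0, 10))
dat <- do.call(rbind, lapply(1:3, function(k)
  cbind(rnorm(n, centers[k, 1], 0.5), rnorm(n, centers[k, 2], 0.5))))
truth <- rep(1:3, each = n)

# Move a starting point p uphill on a Gaussian kernel density estimate
# by repeatedly replacing it with the kernel-weighted mean of the data
ascend <- function(p, data, h, max_iter = 500, tol = 1e-8) {
  for (i in seq_len(max_iter)) {
    d2 <- rowSums(sweep(data, 2, p)^2)     # squared distances to p
    w <- exp(-d2 / (2 * h^2))              # Gaussian kernel weights
    p_new <- colSums(data * w) / sum(w)    # weighted mean of the data
    if (sqrt(sum((p_new - p)^2)) < tol) return(p_new)
    p <- p_new
  }
  p
}

# Start one ascent from every observation (h = 1 is an assumed bandwidth)
modes <- t(apply(dat, 1, ascend, data = dat, h = 1))

# Observations whose ascents end at (nearly) the same mode share a cluster
labels <- cutree(hclust(dist(modes), method = "single"), h = 0.5)
table(truth, labels)
```

No distributional form is assumed for the groups here; the clusters are defined only by which density mode each point climbs to.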
Usage
data(cta20)
data(cta20.hmac)
data(logcta20)
data(logcta20.hmac)
Format
cta20 and logcta20 are two-dimensional matrices. cta20.hmac and logcta20.hmac are objects of class hmac, obtained by applying phmac to cta20 and logcta20, respectively.
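A sketch of how an hmac object of this kind might be recomputed from the raw matrix. The argument values `npart = 1` and `parallel = FALSE` are assumptions chosen for a simple serial run; the settings used to produce the shipped objects are not stated here. The variable names `has_pkg` and `refit` are hypothetical.

```r
# Only run if the Modalclust package is installed
has_pkg <- requireNamespace("Modalclust", quietly = TRUE)
if (has_pkg) {
  library(Modalclust)
  data(logcta20)
  data(logcta20.hmac)
  # Assumed settings: a single partition, no parallel computing
  refit <- phmac(logcta20, npart = 1, parallel = FALSE)
  # Both the recomputed and the shipped object should be of class hmac
  print(class(refit))
  print(class(logcta20.hmac))
}
```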
Details
The dataset was generated by GoldenGate, Illumina's high-throughput genotyping technology. The data values are the actual intensity measurements made by the machine, after normalization (background subtraction, etc.). Illumina uses the dataset for making genotype calls. The data points near the X- and Y-axes represent the two homozygous genotypes (e.g. AA and TT), while the cluster along the 45-degree line represents the heterozygous genotype (e.g. AT). Because of noisy reads, data points often lie in between the axes, and cluster detection is used to make automatic genotype calls.
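The geometry described above can be sketched with simulated data in base R. Everything here is hypothetical: the intensities are simulated rather than taken from cta20, and the 30 and 60 degree cut-offs are arbitrary. The naive angle rule is shown only to make the three-cluster layout concrete; it is not the modal clustering used by the package.

```r
set.seed(1)
n <- 100
# Hypothetical normalized intensities; NOT the cta20 data
aa <- cbind(x = rnorm(n, 1.0, 0.05), y = abs(rnorm(n, 0.05, 0.03)))  # near X-axis
at <- cbind(x = rnorm(n, 0.6, 0.03), y = rnorm(n, 0.6, 0.03))        # 45-degree line
tt <- cbind(x = abs(rnorm(n, 0.05, 0.03)), y = rnorm(n, 1.0, 0.05))  # near Y-axis
sim <- rbind(aa, at, tt)
truth <- rep(c("AA", "AT", "TT"), each = n)

# Angle of each point from the X-axis, in degrees
theta <- atan2(sim[, "y"], sim[, "x"]) * 180 / pi

# Naive call using assumed cut-offs at 30 and 60 degrees
geno_call <- cut(theta, breaks = c(-Inf, 30, 60, Inf),
                 labels = c("AA", "AT", "TT"))
table(truth, geno_call)
```

With noisier reads, points drift toward the cut-offs and fixed thresholds start to misclassify, which is the situation where cluster detection is used instead.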
Author(s)
Surajit Ray and Yansong Cheng
Examples
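# Load the log-scale data and the corresponding hmac object
data(logcta20)
data(logcta20.hmac)
# Scatter plot of the log-scale data matrix
plot(logcta20)
# Plot the hmac object with default settings
plot(logcta20.hmac)
# Plot the hmac object with level = 4
plot(logcta20.hmac, level = 4)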