cta20 {Modalclust}R Documentation

Two dimensional data in original and log scale

Description

Two dimensional data in original and log scale and their hierarchical modal clustering. This dataset demonstrates the fact that modal clustering techniques can be used to cluster untransformed data as it does not depend on parametric assumptions. The clustering results, before and after the log transformation both produce nice separation of the three clusters.

Usage

data(cta20)
data(cta20.hmac)
data(logcta20)
data(logcta20.hmac)

Format

cta20 and logcta20 are two dimensional matrices. cta20.hmac and logcta20.hmac are objects of class hmac obtained from applying phmac on cta20 and logcta20 respectively

Details

The dataset is generated by illumina technology for high throughput genotyping named GOLDEN GATE. The data values are actual measurements made by the machine (intensity), after these are normalized (background subtracted etc). The data set is used for making genotype calls by Illumina. The data around X- and Y-axes represents the two homozygous genotypes (e.g. AA and TT), while the cluster along the 45-degree line represents the heterozygous (e.g. AT) genotype. Due to noisy reads, the data points often lie in-between the axes, and cluster detection is used for making automatic genotype calls.

Author(s)

Surajit Ray and Yansong Cheng

Examples

data(logcta20)
data(logcta20.hmac)
plot(logcta20)
plot(logcta20.hmac)
plot(logcta20.hmac,level=4)

[Package Modalclust version 0.7 Index]