tcgaov {eclust}    R Documentation

Subset of TCGA mRNA Ovarian serous cystadenocarcinoma data


A dataset containing a subset of the TCGA mRNA ovarian serous cystadenocarcinoma data generated using Affymetrix HTHGU133a arrays.

Differences in gene expression profiles have led to the identification of robust molecular subtypes of ovarian cancer; these are of biological and clinical importance because they have been shown to correlate with overall survival (Tothill et al., 2008). Improving prediction of survival time based on gene expression signatures can lead to targeted therapeutic interventions (Helland et al., 2011). The proposed ECLUST algorithm was applied to gene expression data from 511 ovarian cancer patients profiled by the Affymetrix Human Genome U133A 2.0 Array. The data were obtained from the TCGA Research Network and downloaded via the TCGA2STAT R package (Wan et al., 2015). Using the 881 signature genes from Helland et al. (2011), we split subjects into two groups based on the subtypes reported in that paper, creating a “positive control” environment variable expected to have a strong effect. Specifically, we defined the environment variable in our framework as E = 0 for subtypes C1 and C2 (n = 253) and E = 1 for subtypes C4 and C5 (n = 258).




A data.table and data.frame with 511 rows and 886 variables:


unique patient identifier (character)


cancer subtype (1, 2, 4 or 5) as per Helland et al. 2011 (integer)


binary environment variable for ECLUST method. E = 0 for subtypes 1 and 2 (n = 253), and E = 1 for subtypes 4 and 5 (n = 258) (numeric)


vital status, 0 = alive, 1 = dead (numeric)


overall survival time (numeric)

columns 6:886

gene expression data for 881 genes. Column names are the gene names (numeric)



Richard W Tothill, Anna V Tinker, Joshy George, Robert Brown, Stephen B Fox, Stephen Lade, Daryl S Johnson, Melanie K Trivett, Dariush Etemadmoghadam, Bianca Locandro, et al. Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clinical Cancer Research, 14(16):5198–5208, 2008.

Aslaug Helland, Michael S Anglesio, Joshy George, Prue A Cowin, Cameron N Johnstone, Colin M House, Karen E Sheppard, Dariush Etemadmoghadam, Nataliya Melnyk, Anil K Rustgi, et al. Deregulation of MYCN, LIN28B and LET7 in a molecular subtype of aggressive high-grade serous ovarian cancers. PLoS ONE, 6(4):e18064, 2011.


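# using data.table syntax from the data.table package
library(data.table)
data("tcgaov", package = "eclust")
# first 5 patients, first 10 columns
tcgaov[1:5, 1:10, with = FALSE]
# cross-tabulate subtype against the environment variable, including NAs
tcgaov[, table(subtype, E, useNA = "always")]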
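
The coding of E described above can be sketched on a small toy table. This is an illustration only and does not use the packaged data: `subtype` and `E` are the column names from the example above, while `toy`, `geneA` and `geneB` are hypothetical placeholders, and the expression values are made up.

```r
library(data.table)

# Toy stand-in for tcgaov (hypothetical values; geneA/geneB are placeholders)
toy <- data.table(
  subtype = c(1L, 2L, 4L, 5L, 1L, 5L),
  geneA   = c(5.1, 6.3, 4.8, 7.2, 5.5, 6.9),
  geneB   = c(2.2, 3.1, 2.9, 3.8, 2.4, 3.5)
)

# E = 0 for subtypes 1 and 2, E = 1 for subtypes 4 and 5
toy[, E := as.numeric(subtype %in% c(4L, 5L))]

# each subtype should fall into exactly one level of E
toy[, table(subtype, E)]

# collect the expression columns into a numeric matrix (genes in columns)
gene_cols <- setdiff(names(toy), c("subtype", "E"))
X <- as.matrix(toy[, gene_cols, with = FALSE])
dim(X)
```

On the packaged data, the same column selection would presumably use the documented gene columns 6:886 rather than `setdiff()`.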

[Package eclust version 0.1.0 Index]