H3K27Ac {MAnorm2} | R Documentation |
ChIP-seq Samples for H3K27Ac in Human Lymphoblastoid Cell Lines
Description
Based on the associated ChIP-seq samples, this dataset profiles H3K27Ac levels along the whole genome for multiple human lymphoblastoid cell lines, each derived from a separate individual. Specifically, a set of genomic intervals of roughly equal size (about 2 kb) has been systematically selected to cover the part of the genome that is enriched with reads in at least one of the ChIP-seq samples. For each of these intervals, the dataset records its raw read count and enrichment status in each sample.
Usage
H3K27Ac
Format
H3K27Ac is a data frame that records the features of 73,828 non-overlapping genomic intervals regarding the H3K27Ac ChIP-seq signals in multiple human lymphoblastoid cell lines. It contains the following variables:
chrom, start, end
Genomic coordinates of each interval. Note that these coordinates are 0-based and correspond to the hg19 genome assembly.
cellLine_H3K27Ac_num.read_cnt
Each variable whose name is of this form records the number of reads from a ChIP-seq sample that fall within each genomic interval. For example, GM12891_H3K27Ac_2.read_cnt corresponds to the 2nd biological replicate of a ChIP-seq experiment that targets H3K27Ac in a cell line named GM12891.
cellLine_H3K27Ac_num.occupancy
Each variable whose name is of this form records the enrichment status of each genomic interval in a ChIP-seq sample. An enrichment status of 1 indicates that the interval is enriched with reads in the sample; an enrichment status of 0 indicates otherwise. In practice, the enrichment status of a genomic interval in a given ChIP-seq sample can be determined by its overlap with the peaks of that sample (see "References" below). Note also that the variables of this class correspond one-to-one to the raw read count variables.
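The naming scheme described above makes it easy to pick out the two classes of variables programmatically. The following is a minimal sketch on a toy data frame with made-up values that only mimics the column layout of H3K27Ac; the object name toy and all numbers in it are hypothetical. The width calculation assumes half-open, BED-style intervals, which this page does not state explicitly.

```r
# Toy stand-in for H3K27Ac: same column naming scheme, made-up values.
toy <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(0L, 2000L, 4000L),
  end   = c(2000L, 4000L, 6000L),
  GM12891_H3K27Ac_1.read_cnt  = c(35L, 4L, 120L),
  GM12891_H3K27Ac_2.read_cnt  = c(41L, 2L, 98L),
  GM12891_H3K27Ac_1.occupancy = c(1L, 0L, 1L),
  GM12891_H3K27Ac_2.occupancy = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Select the two classes of variables by their name suffixes.
cnt_cols <- grep("\\.read_cnt$", names(toy), value = TRUE)
occ_cols <- grep("\\.occupancy$", names(toy), value = TRUE)

# Each occupancy variable should pair with exactly one read count variable
# once the suffix is stripped.
paired <- identical(sub("\\.read_cnt$", "", cnt_cols),
                    sub("\\.occupancy$", "", occ_cols))

# ASSUMPTION: intervals are half-open [start, end), as in BED files, so the
# width is end - start and the 1-based start coordinate is start + 1.
width <- toy$end - toy$start
start_1based <- toy$start + 1L

# Number of samples in which each interval is enriched.
n_enriched <- rowSums(toy[occ_cols])
```

The same grep patterns should apply unchanged to the real data frame, since they rely only on the .read_cnt and .occupancy suffixes documented here.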
Each cell line derives from a separate individual of the Caucasian population. Use attr(H3K27Ac, "metaInfo") to get a data frame that records meta information about the involved individuals.
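As a small sketch of retrieving the meta information, the snippet below loads the dataset and reads the attribute mentioned above. It assumes the MAnorm2 package is installed and is guarded so that it does nothing otherwise; the variable name meta is only illustrative.

```r
# Runs only when the MAnorm2 package is available.
if (requireNamespace("MAnorm2", quietly = TRUE)) {
  # Load the dataset into the current environment.
  data("H3K27Ac", package = "MAnorm2")

  # Data frame describing the individuals behind the cell lines.
  meta <- attr(H3K27Ac, "metaInfo")
  str(meta)
}
```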
Source
Raw sequencing data were obtained from Kasowski et al., 2013 (see "References" below). Adapters and low-quality bases were trimmed from the 3' ends of reads using trim_galore. The resulting reads were then aligned to the hg19 reference genome with bowtie. MACS was used to call peaks for each ChIP-seq sample. Finally, MAnorm2_utils was used to integrate the alignment results and the peaks of the ChIP-seq samples into this regular table. MAnorm2_utils is specifically designed to create input tables for MAnorm2.
See the home page of MAnorm2_utils for more information about it. It is also available from the PyPI repository as a Python package.
References
Zhang, Y., et al., Model-based analysis of ChIP-Seq (MACS). Genome Biol, 2008. 9(9): p. R137.
Kasowski, M., et al., Extensive variation in chromatin states across humans. Science, 2013. 342(6159): p. 750-2.