
read_epigenomic_data {REPTILE}

R Documentation

Reading epigenomic data from epimark file

Description

Reads an epimark file from disk and generates a data.frame from it. The resulting data.frame is the input for model training, prediction and the subsequent steps. An epimark file is a tab-separated file with a header row. The first four columns are "chr", "start", "end" and "id", giving the chromosome, start coordinate, end coordinate and id of each region. Each remaining column contains the values of one epigenetic mark in one sample (condition, cell or tissue type, etc.), and its column name follows the "MARK_SAMPLE" format, such as "H3K4me1_mESC".
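The file layout described above can be illustrated with a minimal base-R sketch. This is not REPTILE's implementation: the toy table, the variable names (`epimark_text`, `column_info`) and the rule of splitting "MARK_SAMPLE" at the first underscore (which assumes mark names contain no underscore) are all assumptions made for illustration.

```r
# Hypothetical toy epimark content: tab-separated, header row, two regions.
# The values are made up purely for illustration.
epimark_text <- paste(
  "chr\tstart\tend\tid\tH3K4me1_mESC\tH3K27ac_mESC\tH3K4me1_heart",
  "chr1\t100\t600\tregion_1\t1.5\t0.2\t0.7",
  "chr1\t900\t1400\tregion_2\t0.3\t2.1\t0.9",
  sep = "\n"
)

# Read the tab-separated text with its header; keep strings as character.
epimark <- read.delim(text = epimark_text, sep = "\t", header = TRUE,
                      stringsAsFactors = FALSE, check.names = FALSE)

# The first four columns describe the regions; the rest hold mark values.
region_cols <- c("chr", "start", "end", "id")
value_cols <- setdiff(colnames(epimark), region_cols)

# Split "MARK_SAMPLE" at the first underscore (assumption: mark names
# such as "H3K4me1" contain no underscore themselves).
column_info <- data.frame(
  column = value_cols,
  mark = sub("_.*$", "", value_cols),
  sample = sub("^[^_]*_", "", value_cols),
  stringsAsFactors = FALSE
)
print(column_info)
```

Splitting at the first underscore keeps sample names that contain underscores intact; if mark names may contain underscores instead, splitting at the last underscore would be the safer choice.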

Usage

read_epigenomic_data(data_info, epimark_file, query_sample,
                     ref_sample = NULL, incl_dev = T)

Arguments

`data_info`	data.frame generated by reading the data information file, which specifies the samples and marks used in the analysis. It must include at least two columns, named "sample" and "mark", listing the samples and marks included.
`epimark_file`	name of the epimark file
`query_sample`	name of the target sample
`ref_sample`	a vector of names of the reference sample(s)
`incl_dev`	logical value indicating whether to calculate the intensity deviation feature. Intensity deviation is defined as the intensity in the target sample minus the mean intensity across the reference samples (i.e. the reference epigenome); it captures the tissue-specificity of each epigenetic mark.
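The arguments above can be illustrated with a small base-R sketch of a `data_info` table and of the intensity deviation as defined for `incl_dev`. It is a hedged illustration, not REPTILE's code: the helper `intensity_deviation`, the toy intensities and the sample names "heart" and "liver" are hypothetical, and the sketch does not claim to reproduce the column names of the data.frame that `read_epigenomic_data` returns.

```r
# Hypothetical data_info: one row per (sample, mark) pair in the analysis.
data_info <- data.frame(
  sample = rep(c("mESC", "heart", "liver"), each = 2),
  mark = rep(c("H3K4me1", "H3K27ac"), times = 3),
  stringsAsFactors = FALSE
)

# Made-up intensities for three regions; columns follow MARK_SAMPLE.
intensity <- data.frame(
  H3K4me1_mESC  = c(1.5, 0.3, 0.8),
  H3K27ac_mESC  = c(0.2, 2.1, 1.0),
  H3K4me1_heart = c(0.7, 0.9, 0.4),
  H3K27ac_heart = c(0.4, 1.1, 0.6),
  H3K4me1_liver = c(0.5, 0.5, 1.2),
  H3K27ac_liver = c(0.6, 0.3, 0.2)
)

# Hypothetical helper: intensity in the query sample minus the mean
# intensity over the reference samples (the "reference epigenome").
intensity_deviation <- function(intensity, mark, query_sample, ref_sample) {
  query_col <- paste(mark, query_sample, sep = "_")
  ref_cols <- paste(mark, ref_sample, sep = "_")
  intensity[[query_col]] - rowMeans(intensity[, ref_cols, drop = FALSE])
}

# One deviation column per mark listed in data_info.
marks <- unique(data_info$mark)
deviation <- sapply(marks, function(m)
  intensity_deviation(intensity, m, "mESC", c("heart", "liver")))
print(deviation)
```

A positive deviation means the mark is stronger in the query sample than in the average reference sample, which is the tissue-specific signal the feature is meant to capture.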

Value

A data.frame containing the intensity values and, when `incl_dev` is TRUE, the intensity deviation values of each mark for each region.

Author(s)