sharpr2 {sharpr2}    R Documentation

sharpr2

Description

For a HiDRA dataset on a given chromosome, this function calls tiled regions (the regions covered by at least one fragment), and calculates regulatory scores for each tiled region. The regulatory scores are based on standardized log(RNA/PLASMID).

Usage

sharpr2(data, l_min = 150, l_max = 600, f_rna = 10, f_dna = 0,
  s_a = 300, verbose = FALSE, auto = TRUE, sig = TRUE, len = FALSE, 
  alpha = 0.05, win = 5, mse = FALSE, max_t = 1)

Arguments

data

A data.frame containing an ATAC-STARR dataset for one chromosome. The data.frame must contain four columns: 'start', 'end', 'PLASMID', 'RNA'. 'PLASMID' and 'RNA' are the values for DNA and RNA, which should be non-negative real numbers (average value over multiple replicates) or integers (counts).
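As a rough illustration of the expected input shape, the sketch below builds a small data.frame with the four required columns and checks the documented constraints. The values and the name toy are invented for illustration and do not come from any real library.

```r
# Hypothetical toy input: one row per fragment on a single chromosome.
toy <- data.frame(
  start   = c(1000L, 1050L, 1200L, 1900L),
  end     = c(1300L, 1400L, 1650L, 2600L),
  PLASMID = c(12, 30.5, 8, 40),   # DNA values (counts or replicate averages)
  RNA     = c(25, 41.0, 3, 90)    # RNA values (counts or replicate averages)
)

# Checks mirroring the documented requirements.
required     <- c("start", "end", "PLASMID", "RNA")
has_cols     <- all(required %in% names(toy))
non_negative <- all(toy$PLASMID >= 0 & toy$RNA >= 0)
```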

l_min

The fragments with a length smaller than l_min will not be processed. The default is 150.

l_max

The fragments with a length larger than l_max will not be processed. The default is 600.

f_rna

The fragments with an RNA count smaller than f_rna will not be processed. The default is 10.

f_dna

The fragments with a DNA count smaller than f_dna will not be processed. The default is 0.
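The four filtering arguments (l_min, l_max, f_rna, f_dna) can be pictured with the base-R sketch below. It is only an illustration of the documented rules, not the package's internal code. It assumes that fragment length is end - start and that values exactly equal to a threshold are kept, as the wording "smaller than" and "larger than" suggests. The data are invented.

```r
# Hypothetical toy fragments (invented values).
frags <- data.frame(
  start   = c(1000L, 1050L, 1200L, 1900L),
  end     = c(1300L, 1400L, 1650L, 2600L),
  PLASMID = c(12, 30, 8, 40),
  RNA     = c(25, 41, 3, 90)
)
l_min <- 150; l_max <- 600; f_rna <- 10; f_dna <- 0

# Assumption: fragment length is end - start.
frag_len <- frags$end - frags$start

# Keep fragments that pass all four documented thresholds.
keep <- frag_len >= l_min & frag_len <= l_max &
  frags$RNA >= f_rna & frags$PLASMID >= f_dna
kept <- frags[keep, ]
```

Here the third fragment is dropped for its low RNA value and the fourth for exceeding l_max, leaving two fragments.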

s_a

A variance hyperparameter in the prior for the latent regulatory scores. The default is 300.

verbose

An indicator of whether to show processing information. The default is FALSE.

auto

An indicator of whether to automatically estimate the ridge coefficient \lambda from the data for each tiled region, using the data-driven method described in the reference. The default is TRUE. If auto is TRUE, s_a is ignored and a ridge coefficient is estimated for each tiled region separately. If auto is FALSE, a global user-defined ridge coefficient (1/s_a) is used.

sig

An indicator of whether to identify significant motif regions for the estimated scores. Only valid if auto=TRUE. The default is TRUE.

len

An indicator of whether to model log(RNA/PLASMID) of each fragment as the average or the sum of the latent regulatory scores. The default is FALSE, which is the sum.
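The difference between the two settings can be sketched with a toy fragment. The latent scores, the covered positions, and the variable names below are all hypothetical; the sketch only illustrates "sum" versus "average" and is not the package's model-fitting code.

```r
# Hypothetical latent regulatory scores at five consecutive positions.
a <- c(0.2, 0.5, 1.0, 0.3, 0.0)

# Hypothetical fragment covering positions 2 to 4.
covered <- 2:4

# len = FALSE: expected log(RNA/PLASMID) modeled as the sum of covered scores.
mean_sum <- sum(a[covered])

# len = TRUE: expected log(RNA/PLASMID) modeled as the average of covered scores.
mean_avg <- mean(a[covered])
```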

alpha

A regional FWER to call high resolution driver elements (the significant regulatory region). The default is 0.05.

win

A window size for removing sporadically identified significant regions. If a consecutive significant region is smaller than win, it will be treated as a false signal. The default is 5.
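One way to picture this filter is with run-length encoding in base R. The sketch below is an illustration of the idea only, not the package's implementation. It assumes win is measured in consecutive positions, which the text above does not state, and the significance calls are invented.

```r
# Hypothetical per-position significance calls along one tiled region.
sig_calls <- c(FALSE, TRUE, TRUE, FALSE,
               TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
win <- 5

# Encode consecutive runs of identical values.
runs <- rle(sig_calls)

# Clear any significant run shorter than win positions.
runs$values[runs$values & runs$lengths < win] <- FALSE

# Expand back to a per-position vector.
filtered <- inverse.rle(runs)
```

The run of two significant positions is discarded, while the run of six is retained.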

mse

An indicator of whether mean square errors are included in the output results. The default is FALSE.

max_t

A value between 0 and 1, indicating the proportion of non-zero eigenvectors used to calculate \lambda when auto=TRUE. The default is 1.

Details

The default value of s_a is 300, which is equivalent to a ridge coefficient of 1/300, approximately 0.0033. This default was chosen as the median of the estimated \lambda from the first library.
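The relationship between s_a and the global ridge coefficient used when auto = FALSE is simple arithmetic; lambda_global is a name chosen here for illustration.

```r
# Global ridge coefficient implied by the default s_a (used when auto = FALSE).
s_a <- 300
lambda_global <- 1 / s_a

# Rounded to four decimal places for display.
lambda_rounded <- round(lambda_global, 4)
```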

Value

score: the regulatory scores for each tiled region. This list contains four components: est_a (the regulatory scores at each locus), sd_e (the square root of the mean square error), var_nb (the variance of the estimate at each locus), \lambda (the ridge coefficient).

region: the start and end positions for each tiled region.

n_reg: total number of tiled regions.

n_read: the number of reads in each tiled region.

sig_reg: identified high resolution driver elements based on the cutoff.

motif: predicted 20 bp motifs.

cutoff: the cutoff used to call high resolution driver elements for the tiled region.

References

Xinchen Wang, Liang He, Sarah Goggin, Alham Saadat, Li Wang, Melina Claussnitzer, Manolis Kellis. High-resolution genome-wide functional dissection of transcriptional regulatory regions in human.

Examples

data(hidra_ex)
re <- sharpr2(hidra_ex[1:2000,], l_min = 150, l_max = 600, f_dna = 5, f_rna = 0, sig=FALSE)

[Package sharpr2 version 1.1.1.0 Index]