inudge.fit {DIME}R Documentation

Function for Fitting iNUDGE model parameters

Description

Function to estimate parameters for NUDGE model, mixture of uniform and k-normal. Parameters are estimated using EM algorithm.

Usage

inudge.fit(data, avg = NULL, K = 2, weights = NULL, weights.cutoff = -1.345,
 pi = NULL, mu = NULL, sigma = NULL, tol = 1e-5, max.iter = 2000, z = NULL)

Arguments

data

an R list of vector of normalized intensities (counts). Each element can correspond to particular chromosome. User can construct their own list containing only the chromosome(s) they want to analyze.

avg

optional vector of mean data (or log intensities). Only required when any one of huber weight (lower, upper or full) is selected.

K

optional number of normal component that will be fitted in iNUDGE model.

weights

optional weights to be used for robust fitting. Can be a matrix the same length as data, or a character description of the huber weight method to be employed: "lower" - only value below weights.cutoff are weighted,\ "upper" - only value above weights.cutoff are weighted,\ "full" - both values above and below weights.cutoff are weighted,\ If selected, mean of data (avg) is required.

weights.cutoff

optional cutoff to be used with the Huber weighting scheme.

pi

optional vector containing initial estimates for proportion of the iNUDGE mixture components. The first entry is for the uniform component, the middle k entries are for normal components.

mu

optional vector containing initial estimates of the Gaussian means in iNUDGE model.

sigma

optional vector containing initial estimates of the Gaussian standard deviation in (i)NUDGE model. Must have K entries.

tol

optional threshold for convergence for EM algorithm to estimate iNUDGE parameters.

max.iter

optional maximum number of iterations for EM algorithm to estimate iNUDGE parameters.

z

optional 2-column matrix with each row giving initial estimate of probability of the region being non-differential and a starting estimate for the probability of the region being differential. Each row must sum to 1. Number of row must be equal to data length.

Value

A list of object:

name

the name of the model "iNUDGE"

pi

a vector of estimated proportion of each components in the model

mu

a vector of estimated Gaussian means for k-normal components.

sigma

a vector of estimated Gaussian standard deviation for k-normal components.

K

the number of normal components in the corresponding mixture model.

loglike

the log likelihood for the fitted mixture model.

iter

the actual number of iterations run by the EM algorithm.

fdr

the local false discover rate estimated based on iNUDGE model.

phi

a matrix of estimated iNUDGE mixture component function.

AIC

Akaike Information Criteria.

BIC

Bayesian Information Criteria.

Author(s)

Cenny Taslim taslim.2@osu.edu, with contributions from Abbas Khalili khalili@stat.ubc.ca, Dustin Potter potterdp@gmail.com, and Shili Lin shili@stat.osu.edu

See Also

DIME, gng.fit, nudge.fit

Examples

library(DIME);

# generate simulated datasets with underlying uniform and 2-normal distributions
set.seed(1234);
N1 <- 1500; N2 <- 500; rmu <- c(-2.25,1.5); rsigma <- c(1,1); 
rpi <- c(.10,.45,.45); a <- (-6); b <- 6; 
chr4 <- list(c(-runif(ceiling(rpi[1]*N1),min = a,max =b),
  rnorm(ceiling(rpi[2]*N1),rmu[1],rsigma[1]), 
  rnorm(ceiling(rpi[3]*N1),rmu[2],rsigma[2])));
chr9 <- list(c(-runif(ceiling(rpi[1]*N2),min = a,max =b),
  rnorm(ceiling(rpi[2]*N2),rmu[1],rsigma[1]), 
  rnorm(ceiling(rpi[3]*N2),rmu[2],rsigma[2])));
# analyzing chromosome 4 and 9
data <- list(chr4,chr9);

# fit iNUDGE model with 2 normal components and maximum iterations = 20
set.seed(1234);
test <- inudge.fit(data, K = 2, max.iter=20);

# Getting the best fitted iNUDGE model (parameters)
test$best$pi # estimated proportion of each component in iNUDGE
test$best$mu # estimated mean of the normal component(s) in iNUDGE
# estimated standard deviation of the normal component(s) in iNUDGE
test$best$sigma 

[Package DIME version 1.3.0 Index]