R: Specify which distribution to fit on the marker values

fit {optimalThreshold}

R Documentation

Specify which distribution to fit on the marker values

Description

This function is a wrapper that creates an S4 object specifying the distribution to fit to the marker values.

Usage

fit(x, distr, ini = NULL, thin = NULL, burnin = NULL, model = NULL,
  paraNames = NULL, mcmcList = NULL, cdf = NULL, gradient = NULL,
  hessian = NULL)

Arguments

`x`	a vector of marker values (NA values allowed, see Details).
`distr`	a character string that specifies the distribution to fit: "norm" (normal), "lnorm" (log-normal), "t" (scaled t), "gamma", "logis" (logistic), "user" (user-defined), or "undefined" (see Details).
`ini`	specification of initial values for the parameters of the marker distribution in the form of a list. Each list must be named. A list should be provided for each MCMC chain. NULL for "norm" and "lnorm".
`thin`	the thinning interval between consecutive observations. NULL for "norm" and "lnorm".
`burnin`	a positive integer that defines the length of the burn-in iterations when performing the MCMC algorithm. NULL for "norm" and "lnorm".
`model`	a character string used to define the model. Must be a model definition compatible with JAGS. Needed only for the t and logistic distributions (see Details).
`paraNames`	a string vector containing the names of the parameters of the submitted distribution. Should be provided only for "user" defined distribution.
`mcmcList`	an object of class mcmc.list where each list contains an MCMC chain. To be provided only for "user" defined distribution.
`cdf`	a function that characterizes the cumulative distribution. To be provided only for "user" defined distribution (see Details).
`gradient`	a function that characterizes the probability density function, i.e. the first derivative of cdf. To be provided only for "user" defined distribution (see Details).
`hessian`	a function that characterizes the first derivative of the probability density function, i.e. the second derivative of cdf. To be provided only for "user" defined distribution (see Details).

Details

This function allows the user to specify which distribution should be fitted to the marker values. If NA values are present in the x argument passed to the function, a warning is produced. However, the user should not discard the NA values from the original data, because the length of the x argument is used internally to estimate the mean risk of event occurrence in each treatment arm; NA values are therefore handled internally by the function. Five theoretical distributions are implemented by the package: normal, log-normal, gamma, scaled t, and logistic. It is also here that the user must specify which of the four distributions is of type 'undefined' (in other words, which distribution is expressed as a function of the three other distributions and the mean risks of event). The user may also define their own theoretical distribution. The details for each theoretical distribution are provided hereafter:

Fit a normal distribution: when specifying distr="norm" you fit a normal distribution to the marker values passed to the x argument of the function. Non-informative priors are used (p(\mu,\sigma^2) \propto (\sigma^2)^{-1}). Posterior values of the normal distribution parameters are sampled directly from the exact posterior distributions. If you don't want to use non-informative priors, see the explanation on how to fit a user-defined distribution.
Fit a log-normal distribution: when specifying distr="lnorm" you fit a log-normal distribution to the marker values passed to the x argument of the function. Non-informative priors are used (p(\mu,\sigma^2) \propto (\sigma^2)^{-1}). Posterior values of the log-normal distribution parameters are sampled directly from the exact posterior distributions. If you don't want to use non-informative priors, see the explanation on how to fit a user-defined distribution.
Fit a gamma distribution: when specifying distr="gamma" you fit a gamma distribution to the marker values passed to the x argument of the function. Non-informative priors are used (p(shape,scale) \propto 1/scale). Posterior values of the gamma distribution parameters are sampled using the ARS method. This method requires that the user specifies a list of initial values passed to the ini argument of the function. Each element of this list must be a list with one element named 'shape'. It also requires the thinning interval of the MCMC chain, passed to the thin argument, and the length of the burn-in phase, passed to the burnin argument. If you don't want to use non-informative priors, see the explanation on how to fit a user-defined distribution.
Fit a scaled t distribution: when specifying distr="t" you fit a scaled t distribution to the marker values passed to the x argument of the function. Posterior values of the scaled t distribution parameters are sampled using an MCMC algorithm through the JAGS software, so the function requires the user to provide the JAGS model as a character string through the model argument of the function. If NULL, a model with vague priors is provided to the function automatically:

mu ~ U(min(x),max(x))

log(sd) ~ U(-10,10)

1/df ~ U(0,1)

This method requires that the user specifies a list of initial values passed to the ini argument of the function. Each element of this list must be a list with three elements named 'mu', 'sd', and 'df'. It also requires the thinning interval of the MCMC chain, passed to the thin argument, and the length of the burn-in phase, passed to the burnin argument.
Fit a logistic distribution: when specifying distr="logis" you fit a logistic distribution to the marker values passed to the x argument of the function. Posterior values of the logistic distribution parameters are sampled using an MCMC algorithm through the JAGS software, so the function requires the user to provide the JAGS model as a character string through the model argument of the function. If NULL, a model with vague priors is provided to the function automatically:

location ~ U(min(x),max(x))

log(scale) ~ U(-10,10)

This method requires that the user specifies a list of initial values passed to the ini argument of the function. Each element of this list must be a list with two elements named 'location' and 'scale'. It also requires the thinning interval of the MCMC chain, passed to the thin argument, and the length of the burn-in phase, passed to the burnin argument.
Fit a user-defined distribution: when specifying distr="user" you fit a user-defined distribution to the marker values passed to the x argument of the function. First, the user must give the parameter names as a character vector in the paraNames argument of the function. Then, the user provides a posterior sample of the distribution parameters, obtained using JAGS or other software, as an object of class mcmc.list passed to the mcmcList argument of the function (this implies that the user performed the Bayesian inference themselves). Note that the variable names in the mcmc.list object must match the names given in the paraNames argument. Finally, the user must specify the cdf, gradient, and hessian functions associated with the fitted distribution. The cdf function is the cumulative distribution function fitted to the marker values, the gradient function is its first derivative (the probability density function fitted to the marker values), and the hessian function is the second derivative of cdf. When the fitted distribution is a supported distribution (e.g. a normal distribution with informative priors), the user may use getMethod(cdf,"normalDist") to reuse the standard method for the normal distribution used in the package. When the fitted distribution is not supported, the user must specify the cdf directly as a function, for instance function(x,mu,sd) pnorm(x,mu,sd) (to keep the example of the normal distribution). The same approach applies to the gradient and hessian functions (see the Examples for more details).
Specify which marker distribution is expressed as a function of the three others and the mean risks of event using distr="undefined".
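
To make "sampled directly from the exact posterior distributions" concrete for the normal case, the following base R sketch draws from the posterior implied by the prior p(\mu,\sigma^2) \propto (\sigma^2)^{-1}. It assumes the standard textbook result (a scaled inverse chi-square posterior for \sigma^2, then a normal posterior for \mu given \sigma^2) and is not taken from the package source. The helper name sample_normal_posterior and its n_draws argument are hypothetical.

```r
# Sketch only: exact posterior draws for a normal model under the prior
# p(mu, sigma^2) proportional to 1 / sigma^2.
# sample_normal_posterior() is a hypothetical helper, not part of optimalThreshold.
sample_normal_posterior <- function(x, n_draws = 2000) {
  x <- x[!is.na(x)]   # NA values are dropped only for this computation
  n <- length(x)
  xbar <- mean(x)
  s2 <- var(x)
  # sigma^2 | x follows a scaled inverse chi-square(n - 1, s^2) distribution
  sigma2 <- (n - 1) * s2 / rchisq(n_draws, df = n - 1)
  # mu | sigma^2, x follows N(xbar, sigma^2 / n)
  mu <- rnorm(n_draws, mean = xbar, sd = sqrt(sigma2 / n))
  data.frame(mu = mu, sd = sqrt(sigma2))
}

set.seed(1)
x <- c(rnorm(250, mean = 2, sd = 1), NA)
post <- sample_normal_posterior(x)
colMeans(post)   # posterior means of mu and sd
```

Under the same assumption, the log-normal case would apply this sketch to log(x), giving draws of the parameters on the log scale.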

Value

Returns an object to be passed to the trtSelThresh and diagThresh functions.

Examples

#Fit a normal distribution
x <- rnorm(250)
fitX <- fit(x, "norm")

#Fit a log-normal distribution
x <- rlnorm(250)
fitX <- fit(x, "lnorm")

#Fit a gamma distribution
x <- rgamma(250, shape = 2, scale = 1.2)
fitX <- fit(x, "gamma", 
            ini = list(list(shape = 1), 
                       list(shape = 2), 
                       list(shape = 3)),
            thin = 1, burnin = 1000)

#Fit a scaled t distribution
x <- optimalThreshold:::rt.scaled(250, df = 4, mean = 2.5, sd = 2)
fitX <- fit(x, "t",
            ini = list(list(mu = 1, sd = 1, df = 2), 
                       list(mu = 2, sd = 2, df = 4), 
                       list(mu = 3, sd = 3, df = 6)),
            thin = 1, burnin = 1000, model = NULL)

#Fit a logistic distribution
x <- rlogis(250)
fitX <- fit(x, "logis", 
            ini = list(list(location = 0.3, scale = 0.5), 
                       list(location = 1, scale = 1), 
                       list(location = 2, scale = 2)), 
            thin = 1, burnin = 1000, model = NULL)

#Specify which distribution is 'undefined'
x <- rnorm(250)
fitX <- fit(x, "undefined")

#Fit a user-defined normal distribution with informative priors
library(rjags)
x <- rnorm(250, mean = 2, sd = 1)
model <- "model
		{
			mu ~ dunif(0, 4)
			log_sd ~ dunif(-1, 1)
			sd <- exp(log_sd)
			tau <- 1 / (sd^2)
			for (i in 1:N)
			{
				x[i] ~ dnorm(mu, tau)
			}
		}
		"
modelJAGS <- jags.model(file = textConnection(model), data = list(x = x, N = length(x)), 
                        inits = list(list(mu = 1, log_sd = -0.5),list(mu = 3.5, log_sd = 0.5)),
                        n.chains = 2, quiet = TRUE)
update(modelJAGS, 1000, progress.bar = "text")
# Monitor sd directly (it is defined in the model as exp(log_sd)), so the
# columns need neither renaming nor back-transformation
mcmcpara <- coda.samples(modelJAGS, c("mu", "sd"), n.iter = 2000, thin = 1)
fitX <- fit(x, "user", paraNames = varnames(mcmcpara), mcmcList = mcmcpara, 
            cdf = function(x, mu, sd) pnorm(x, mu, sd), 
            gradient = getMethod(gradient, "normalDist"), 
            hessian = function(x, mu, sd) ((mu - x) / sd^2) * dnorm(x, mu, sd))
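
Because the gradient and hessian supplied for a "user" distribution must be the first and second derivatives of cdf, it can help to check them numerically before calling fit. The sketch below does this in base R for the normal-distribution functions, using a central finite difference. The names cdf_fun, gradient_fun, hessian_fun, and num_deriv are hypothetical and chosen to avoid clashing with the package generics; the check is an illustration, not something the package requires.

```r
# Candidate functions for a normal distribution
cdf_fun      <- function(x, mu, sd) pnorm(x, mu, sd)
gradient_fun <- function(x, mu, sd) dnorm(x, mu, sd)
hessian_fun  <- function(x, mu, sd) ((mu - x) / sd^2) * dnorm(x, mu, sd)

# Central finite difference of f with respect to x (hypothetical helper)
num_deriv <- function(f, x, ..., h = 1e-5) {
  (f(x + h, ...) - f(x - h, ...)) / (2 * h)
}

grid <- seq(-2, 6, by = 0.5)

# gradient should match the numerical derivative of cdf
err_gradient <- max(abs(num_deriv(cdf_fun, grid, mu = 2, sd = 1) -
                        gradient_fun(grid, mu = 2, sd = 1)))
# hessian should match the numerical derivative of gradient
err_hessian <- max(abs(num_deriv(gradient_fun, grid, mu = 2, sd = 1) -
                       hessian_fun(grid, mu = 2, sd = 1)))

c(gradient = err_gradient, hessian = err_hessian)   # both should be close to 0
```

A large discrepancy in either value would suggest a sign or scaling error in the supplied function.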

[Package optimalThreshold version 1.0 Index]