el2.cen.EMm {emplik2}

R Documentation

Computes p-value for multiple mean-type hypotheses, based on two independent samples that may contain censored data.

Description

This function uses the EM algorithm to calculate a maximized empirical likelihood ratio for a set of p hypotheses as follows:

H_o: E(g(x,y)-mean)=0

where E indicates expected value; g(x,y) is a vector of user-defined functions g_1(x,y), ..., g_p(x,y); and mean is a vector of p hypothesized values of E(g(x,y)). The two samples x and y are assumed independent. Each may be uncensored, right-censored, left-censored, or left-and-right (“doubly”) censored. A p-value for H_o is also calculated, based on the assumption that -2*log(empirical likelihood ratio) is asymptotically distributed as chi-squared with p degrees of freedom.
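The chi-squared calibration described above can be sketched in base R. This is only an illustration: `llr_pvalue` is a hypothetical helper name, and the statistic passed to it is a made-up number rather than output from `el2.cen.EMm`.

```r
# Convert a -2*log(likelihood ratio) statistic into an asymptotic p-value.
# 'stat' and 'p' are illustrative inputs, not output from el2.cen.EMm.
llr_pvalue <- function(stat, p) {
  stopifnot(is.numeric(stat), stat >= 0, p >= 1)
  # Upper tail of chisq(p); same as 1 - pchisq(stat, df = p)
  pchisq(stat, df = p, lower.tail = FALSE)
}

# With p = 2 the upper tail is exp(-stat / 2), so this value gives 0.5
llr_pvalue(2 * log(2), p = 2)
```

Using `lower.tail = FALSE` instead of `1 - pchisq(...)` is a small design choice that avoids loss of precision when the p-value is very small.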

Usage

el2.cen.EMm(x, dx, wx=rep(1,length(x)), y, dy, wy=rep(1,length(y)), 
            p, H, xc=1:length(x), yc=1:length(y), mean, maxit=15)

Arguments

`x`	a vector of the data for the first sample
`dx`	a vector of the censoring indicators for x: 0=right-censored, 1=uncensored, 2=left-censored
`wx`	a vector of case weights for `x`
`y`	a vector of the data for the second sample
`dy`	a vector of the censoring indicators for y: 0=right-censored, 1=uncensored, 2=left-censored
`wy`	a vector of case weights for `y`
`p`	the number of hypotheses
`H`	a matrix defined as `H = [H_1, H_2, ..., H_p]`, where `H_k = [g_k(x_i,y_j)-mu_k], k=1, ..., p`
`xc`	a vector containing the indices of the `x` datapoints; controls whether tied `x` values are collapsed
`yc`	a vector containing the indices of the `y` datapoints; controls whether tied `y` values are collapsed
`mean`	the vector of hypothesized values of `E(g(x,y))`
`maxit`	a positive integer used to control the maximum number of iterations of the EM algorithm; default is 15
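The block layout of `H` can be produced without explicit loops. The sketch below uses a hypothetical helper, `build_G` (not part of emplik2), and a tiny made-up data set. It evaluates only the `g_k(x_i,y_j)` values and binds the p blocks side by side; any centering by `mu_k` is not shown.

```r
# Hypothetical helper (not part of emplik2): evaluate each g_k on the full
# grid of (x_i, y_j) pairs and bind the p blocks side by side.
build_G <- function(x, y, g_list) {
  blocks <- lapply(g_list, function(g) outer(x, y, g))
  do.call(cbind, blocks)
}

# Made-up toy data, for illustration only
x <- c(3, 7, 12)
y <- c(5, 9)
g_list <- list(
  function(a, b) as.numeric(a > b),   # g_1(x, y) = I(x > y)
  function(a, b) as.numeric(a > 6)    # g_2(x, y) = I(x > 6); b is unused
)
G <- build_G(x, y, g_list)
dim(G)   # length(x) rows, p * length(y) columns
```

Each function passed to `outer` must be vectorized, since `outer` calls it once with the fully expanded `x` and `y` vectors.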

Details

The value of mean_k should be chosen between the maximum and minimum values of g_k(x_i,y_j); otherwise there may be no distributions for x and y that will satisfy H_o. If mean_k is inside this interval, but the convergence is still not satisfactory, then the value of mean_k should be moved closer to the NPMLE for E(g_k(x,y)). (The NPMLE itself should always be a feasible value for mean_k.)
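The range condition above can be screened before running the EM algorithm. The following is a minimal sketch under stated assumptions: `mean_in_range` is a hypothetical helper, the data are made up, and the strict inequalities are a choice made here. Passing this screen does not guarantee satisfactory convergence.

```r
# Hypothetical helper: for each hypothesis k, test whether mean_k lies
# strictly between the smallest and largest value of g_k(x_i, y_j).
mean_in_range <- function(g_blocks, mean) {
  stopifnot(length(g_blocks) == length(mean))
  mapply(function(G, m) min(G) < m && m < max(G), g_blocks, mean)
}

# Made-up toy data, for illustration only
x <- c(3, 7, 12)
y <- c(5, 9)
g_blocks <- list(
  outer(x, y, function(a, b) as.numeric(a > b)),  # g_1 = I(x > y)
  outer(x, y, function(a, b) a - b)               # g_2 = x - y
)
mean_in_range(g_blocks, mean = c(0.5, 0))    # both inside their ranges
mean_in_range(g_blocks, mean = c(0.5, 20))   # second value is outside
```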

Value

el2.cen.EMm returns a list of values as follows:

`xd1`	a vector of unique, uncensored `x`-values in ascending order
`yd1`	a vector of unique, uncensored `y`-values in ascending order
`temp3`	a list of values returned by the `el2.test.wtm` function (which is called by `el2.cen.EMm`)
`mean`	the hypothesized value of `E(g(x,y))`
`NPMLE`	the vector of nonparametric maximum likelihood estimates (NPMLE) of `E(g(x,y))`
`logel00`	the log of the unconstrained empirical likelihood
`logel`	the log of the constrained empirical likelihood
`"-2LLR"`	-2*(log-likelihood-ratio) for the `p` simultaneous hypotheses
`Pval`	the p-value for the `p` simultaneous hypotheses, equal to `1 - pchisq(-2LLR, df = p)`
`logvec`	the vector of successive values of `logel` computed by the EM algorithm (should converge toward a fixed value)
`sum_muvec`	sum of the probability jumps for the uncensored `x`-values, should be 1
`sum_nuvec`	sum of the probability jumps for the uncensored `y`-values, should be 1
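The diagnostic components listed above can be checked programmatically. The sketch below assumes a list with the component names documented here; `check_fit`, the tolerance `tol`, and the `mock` values are all hypothetical and are not output from `el2.cen.EMm`.

```r
# Hypothetical checker for a result list shaped like the one described above.
check_fit <- function(res, p, tol = 1e-6) {
  steps <- abs(diff(res$logvec))   # change in logel between EM iterations
  list(
    em_settled  = length(steps) > 0 && steps[length(steps)] < tol,
    mass_ok     = abs(res$sum_muvec - 1) < tol && abs(res$sum_nuvec - 1) < tol,
    pval_recomp = pchisq(res[["-2LLR"]], df = p, lower.tail = FALSE)
  )
}

# Made-up values for illustration only; not output from el2.cen.EMm.
mock <- list(logvec = c(-150.2, -149.9, -149.85, -149.85),
             sum_muvec = 1, sum_nuvec = 1, "-2LLR" = 2 * log(2))
out <- check_fit(mock, p = 2)
```

If `em_settled` is `FALSE` for a real fit, increasing `maxit` is one option to consider, since the default allows only 15 iterations.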

Author(s)

William H. Barton <bbarton@lexmark.com>

References

Barton, W. (2010). Comparison of two samples by a nonparametric likelihood-ratio test. PhD dissertation at University of Kentucky.

Chang, M. and Yang, G. (1987). “Strong Consistency of a Nonparametric Estimator of the Survival Function with Doubly Censored Data.” Ann. Stat., 15, pp. 1536-1547.

Dempster, A., Laird, N., and Rubin, D. (1977). “Maximum Likelihood from Incomplete Data via the EM Algorithm.” J. Roy. Statist. Soc., Series B, 39, pp. 1-38.

Gomez, G., Julia, O., and Utzet, F. (1992). “Survival Analysis for Left-Censored Data.” In Klein, J. and Goel, P. (eds.), Survival Analysis: State of the Art. Kluwer Academic Publishers, Boston, pp. 269-288.

Li, G. (1995). “Nonparametric Likelihood Ratio Estimation of Probabilities for Truncated Data.” J. Amer. Statist. Assoc., 90, pp. 997-1003.

Owen, A.B. (2001). Empirical Likelihood. Chapman and Hall/CRC, Boca Raton, pp. 223-227.

Turnbull, B. (1976). “The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data.” J. Roy. Statist. Soc., Series B, 38, pp. 290-295.

Zhou, M. (2005). “Empirical likelihood ratio with arbitrarily censored/truncated data by EM algorithm.” J. Comput. Graph. Stat., 14, pp. 643-656.

Zhou, M. (2009). emplik package on CRAN website. The function el2.cen.EMm here extends el.cen.EM2 inside emplik from the one-sample to the two-sample case.

Examples

 
x<-c(10, 80, 209, 273, 279, 324, 391, 415, 566, 785, 852, 881, 895, 954, 1101, 1133,
1337, 1393, 1408, 1444, 1513, 1585, 1669, 1823, 1941)
dx<-c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0)
y<-c(21, 38, 39, 51, 77, 185, 240, 289, 524, 610, 612, 677, 798, 881, 899, 946, 1010,
1074, 1147, 1154, 1199, 1269, 1329, 1484, 1493, 1559, 1602, 1684, 1900, 1952)
dy<-c(1,1,1,1,1,1,2,2,1,1,1,1,1,2,1,1,1,1,1,1,0,0,1,1,0,0,1,0,0,0)
nx<-length(x)
ny<-length(y)
xc<-1:nx
yc<-1:ny
wx<-rep(1,nx)
wy<-rep(1,ny)
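mu <- c(0.5, 0.5)  # hypothesized values of E(g_1) and E(g_2)
p <- 2
H1 <- matrix(NA, nrow=nx, ncol=ny)
H2 <- matrix(NA, nrow=nx, ncol=ny)
for (i in 1:nx) {
  for (j in 1:ny) {
    H1[i,j] <- (x[i] > y[j])   # g_1(x,y) = I(x > y)
    H2[i,j] <- (x[i] > 1060)   # g_2(x,y) = I(x > 1060)
  }
}
# Place H1 and H2 side by side: nx rows, p*ny columns
H <- matrix(c(H1,H2), nrow=nx, ncol=p*ny)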

# Ho1: P(X > Y) = 0.5 (X is stochastically equal to Y)
# Ho2: P(X > 1060) = 0.5 (the median of X is 1060)

el2.cen.EMm(x=x, dx=dx, y=y, dy=dy, p=2, H=H, mean=mu, maxit=10)

# Result: Pval is 0.6310234, so the two simultaneous hypotheses Ho1 and Ho2
# cannot be rejected at the 5 percent significance level

[Package emplik2 version 1.32 Index]