R: Type IV Non-Random Labeling of a Given Set of Points

rnonRLIV {nnspat}

R Documentation

Type IV Non-Random Labeling of a Given Set of Points

Description

An object of class "SpatPatterns".

Given a set of n points, dat, in a region, this function labels n_1=round(n*ult.prop,0) of them as cases and the rest as controls. It first selects k_0=round(n*init.prop,0) points as the initial cases, and then assigns the label case to the remaining points with infection probabilities equal to the scaled bivariate normal density values at those points. If the argument poisson=TRUE, the initial and ultimate numbers of cases are random with means round(n*init.prop,0) and round(n*ult.prop,0), respectively (i.e., k_0=rpois(1,round(n*init.prop,0)) and n_1=rpois(1,round(n*ult.prop,0))); otherwise they are exactly k_0=round(n*init.prop,0) and n_1=round(n*ult.prop,0).

More specifically, let z_1,\ldots,z_{k_0} be the initial cases, and for j=1,2,\ldots,k_0 let \phi_{G,j}(z_i) be the value at z_i of the pdf of BVN(z_j,s_1,s_2,\rho), the bivariate normal distribution with mean z_j, standard deviations s_1 and s_2 for the first and second components (the arguments s1 and s2 of the function), and correlation \rho between them (the argument rho); i.e., the covariance matrix is \Sigma=S where S_{11}=s_1^2, S_{22}=s_2^2, S_{12}=S_{21}=s_1 s_2 \rho. Sum these pdf values as p_i=\sum_{j=1}^{k_0} \phi_{G,j}(z_i) for each i=1,2,\ldots,n, and find p_{\max}=\max_i p_i. Then label the points (other than the initial cases) as cases with infection probabilities prob equal to p_i/p_{\max} at these points. The procedure stops when the number of cases first exceeds n_1.

\rho must be in (-1,1) and s_1 and s_2 must be positive for the BVN density to be nondegenerate, and hence for prob to be a valid probability. If rand.init=TRUE, the k_0 initial cases are selected randomly among the data points; otherwise, the first k_0 entries in the data set, dat, are chosen as the initial cases.
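For reference, the density and the infection probability described above can be written out explicitly. This is the standard bivariate normal density; the coordinate names z_i=(x_i,y_i) for the i-th data point and (a_j,b_j) for the j-th initial case are notation assumed here for illustration and do not come from the package.

```latex
\phi_{G,j}(z_i) = \frac{1}{2\pi s_1 s_2 \sqrt{1-\rho^2}}
  \exp\left( -\frac{1}{2(1-\rho^2)} \left[
    \frac{(x_i-a_j)^2}{s_1^2}
    - \frac{2\rho\,(x_i-a_j)(y_i-b_j)}{s_1 s_2}
    + \frac{(y_i-b_j)^2}{s_2^2} \right] \right),
\qquad
p_i = \sum_{j=1}^{k_0} \phi_{G,j}(z_i),
\qquad
\mathrm{prob}_i = \frac{p_i}{p_{\max}}, \quad
p_{\max} = \max_{1 \le i \le n} p_i .
```

Because p_i is divided by its maximum over all n points, prob_i always lies in (0,1] whenever the density is well defined, which is why \rho in (-1,1) and positive s_1, s_2 are required.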

Algorithmically, all dat points are first treated as non-cases (i.e., controls or healthy subjects). The function then labels the points in the following steps:

step 0: If the argument poisson=TRUE, n_1 is generated randomly from a Poisson distribution with mean = round(n*ult.prop,0) and k_0 is generated randomly from a Poisson distribution with mean = round(n*init.prop,0), so that the average numbers of ultimate and initial cases are round(n*ult.prop,0) and round(n*init.prop,0), respectively. Otherwise, n_1=round(n*ult.prop,0) and k_0=round(n*init.prop,0).

step 1: Initially, k_0 points from dat are selected as cases. The selection of the initial cases is determined by the argument rand.init (with default=TRUE): if rand.init=TRUE, the initial cases are selected randomly from the data points, and if rand.init=FALSE, the first k_0 entries in the data set, dat, are selected as the cases.

step 2: Then it assigns the label case to the remaining points with infection probabilities prob=\sum_{j=1}^{k_0} \phi_{G,j}(z_i)/p_{\max}, which is the sum of the BVN densities scaled by the maximum of such sums. See the description for the details of the parameters in the prob.

step 3: The procedure ends when the number of cases n_c exceeds n_1. Then n_c-n_1 of the cases (other than the initial cases) are randomly selected and relabeled as controls, i.e., 0s, so that the number of cases is exactly n_1.
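The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the names bvn_density, sketch_type4 and max.pass are hypothetical, the infection step is assumed to be repeated Bernoulli passes over the remaining controls until at least n_1 cases are reached, and the guards that keep k_0 at least 1 and n_1 between k_0 and n are additions made here so that the sketch is always well defined.

```r
# Hypothetical helper (not part of nnspat): BVN density at the rows of x,
# with mean mu, standard deviations s1, s2 and correlation rho.
bvn_density <- function(x, mu, s1, s2, rho) {
  u <- (x[, 1] - mu[1]) / s1
  v <- (x[, 2] - mu[2]) / s2
  q <- (u^2 - 2 * rho * u * v + v^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

# Hypothetical re-statement of steps 0-3, for illustration only.
sketch_type4 <- function(dat, init.prop, ult.prop, s1, s2, rho,
                         rand.init = TRUE, poisson = FALSE,
                         max.pass = 1000) {
  n <- nrow(dat)

  # step 0: fixed or Poisson-distributed numbers of cases
  k0 <- round(n * init.prop, 0)
  n1 <- round(n * ult.prop, 0)
  if (poisson) {
    k0 <- rpois(1, k0)
    n1 <- rpois(1, n1)
  }
  # Assumption of this sketch: force 1 <= k0 <= n1 <= n
  k0 <- min(max(k0, 1), n)
  n1 <- min(max(n1, k0), n)

  # step 1: choose the initial cases
  init <- if (rand.init) sample(n, k0) else seq_len(k0)
  lab <- integer(n)
  lab[init] <- 1L

  # step 2: sum of BVN densities centred at the initial cases, scaled by its max
  p <- numeric(n)
  for (j in init) p <- p + bvn_density(dat, dat[j, ], s1, s2, rho)
  prob <- p / max(p)

  # Assumption: repeat Bernoulli passes over the current controls until the
  # number of cases reaches n1 (max.pass caps the loop if prob underflows)
  pass <- 0
  while (sum(lab) < n1 && pass < max.pass) {
    ctrl <- which(lab == 0L)
    lab[ctrl] <- rbinom(length(ctrl), 1, prob[ctrl])
    pass <- pass + 1
  }

  # step 3: relabel the excess non-initial cases as controls
  excess <- sum(lab) - n1
  if (excess > 0) {
    later <- setdiff(which(lab == 1L), init)
    lab[later[sample.int(length(later), excess)]] <- 0L
  }

  list(lab = lab, init.cases = init, prob = prob)
}

# Usage on uniform data in the unit square
set.seed(1)
dat <- cbind(runif(40), runif(40))
res <- sketch_type4(dat, init.prop = 0.1, ult.prop = 0.5,
                    s1 = 0.4, s2 = 0.4, rho = 0.1)
table(res$lab)
```

The density is coded by hand so that the sketch needs no packages beyond base R. With poisson=FALSE the number of 1s in res$lab equals round(n*ult.prop,0) whenever the loop finishes before max.pass passes.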

In the output, cases are labeled as 1 and controls as 0, and the initial cases are marked with red crosses in the plot of the pattern.

See Ceyhan (2014) for more detail, where the type IV non-RL pattern is case 4 of the non-RL patterns considered in Section 6, with n_1 and k_0 fixed as parameters, and rho is represented as k_{pow} and rho/k_{den}=1 in the article.

Although the non-RL pattern is described for the case-control setting, it can be adapted to any two-class setting in which it is appropriate to treat one of the classes as cases (or in which one of the classes behaves like cases) and the other class as controls.

Usage

rnonRLIV(
  dat,
  init.prop,
  ult.prop,
  s1,
  s2,
  rho,
  rand.init = TRUE,
  poisson = FALSE
)

Arguments

`dat`	A set of points to which the non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion (see the description).
`init.prop`	A real number between 0 and 1 representing the initial proportion of cases in the data set, `dat`. The selection of the initial cases depends on the parameter `rand.init` and the number of initial cases depends on the parameter `poisson` (see the description).
`ult.prop`	A real number between 0 and 1 representing the ultimate proportion of cases in the data set, `dat`, after the non-RL assignment. The number of ultimate cases depends on the parameter `poisson` (see the description).
`s1`, `s2`	Positive real numbers representing the standard deviations of the first and second components of the bivariate normal distribution.
`rho`	A real number between -1 and 1 representing the correlation between the first and second components of the bivariate normal distribution.
`rand.init`	A logical argument (default is `TRUE`) to determine the choice of the initial cases in the data set, `dat`. If `rand.init=TRUE` then the initial cases are selected randomly from the data points, and if `rand.init=FALSE`, the first `k_0` entries in the data set, `dat`, are labeled as the initial cases.
`poisson`	A logical argument (default is `FALSE`) to determine whether the numbers of initial and ultimate cases, `k_0` and `n_1`, will be random or fixed. If `poisson=TRUE` then `k_0` and `n_1` are drawn from Poisson distributions, `k_0=rpois(1,round(n*init.prop,0))` and `n_1=rpois(1,round(n*ult.prop,0))`; otherwise they are fixed, `k_0=round(n*init.prop,0)` and `n_1=round(n*ult.prop,0)`.

Value

A list with the elements

`pat.type`	`="cc"` for the case-control patterns for RL or non-RL of the given data points, `dat`
`type`	The type of the point pattern
`parameters`	The initial and ultimate proportions of cases after the non-RL procedure is applied to the data, and `s1`, `s2` and `rho`, which are the standard deviations and the correlation for the components of the bivariate normal distribution.
`dat.points`	The set of points to which the non-RL procedure is applied to obtain cases and controls randomly in the type IV fashion
`lab`	The labels of the points as 1 for cases and 0 for controls after the type IV non-RL procedure is applied to the data set, `dat`. Cases are denoted as red dots and controls as black circles in the plot.
`init.cases`	The initial cases in the data set, `dat`. Marked with red crosses in the plot of the points.
`gen.points`, `ref.points`	Both are `NULL` for this function, as the initial set of points, `dat`, is provided for the non-RL procedure.
`desc.pat`	Description of the point pattern
`mtitle`	The `"main"` title for the plot of the point pattern
`num.points`	The `vector` of two numbers, which are the number of cases and controls.
`xlimit`, `ylimit`	The possible ranges of the `x`- and `y`-coordinates of the generated and the reference points

Author(s)

Elvan Ceyhan

References

Ceyhan E (2014). “Segregation indices for disease clustering.” Statistics in Medicine, 33(10), 1662-1684.

Examples

n<-40;  #try also n<-20; n<-100;
ult<-.5; #try also .25, .75
#data generation
dat<-cbind(runif(n,0,1),runif(n,0,1))

int<-.1
s1<-s2<-.4
rho<- .1

Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat

table(Xdat$lab)

summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

#normal original data
n<-40;  #try also n<-20; n<-100;
dat<-cbind(rnorm(n,0,1),rnorm(n,0,1))
ult<-.5; #try also .25, .75

int<-.1
s1<-s2<-.4
rho<-0.1

Xdat<-rnonRLIV(dat,int,ult,s1,s2,rho,poisson=FALSE) #labeled data, try also with poisson=TRUE
Xdat

table(Xdat$lab)

summary(Xdat)
plot(Xdat,asp=1)
plot(Xdat)

[Package nnspat version 0.1.2 Index]