simulateCRT {CRTspat}  R Documentation 
simulateCRT
Generates simulated data for a cluster randomized trial (CRT) with geographic spillover between arms.
simulateCRT(
trial = NULL,
effect = 0,
outcome0 = NULL,
generateBaseline = TRUE,
matchedPair = TRUE,
scale = "proportion",
baselineNumerator = "base_num",
baselineDenominator = "base_denom",
denominator = NULL,
ICC_inp = NULL,
kernels = 200,
sd = NULL,
theta_inp = NULL,
tol = 0.005
)
trial 
an object of class "CRTsp" or a data.frame
effect 
numeric. The simulated effect size (defaults to 0) 
outcome0 
numeric. The anticipated value of the outcome in the absence of intervention 
generateBaseline 
logical. If TRUE then synthetic baseline data are generated
matchedPair 
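logical. If TRUE then pair-matching on the baseline data is used in randomization (requires an even number of clusters)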
scale 
measurement scale of the outcome. Options are: 'proportion' (the default); 'count'; 'continuous'. 
baselineNumerator 
optional name of numerator variable for preexisting baseline data 
baselineDenominator 
optional name of denominator variable for preexisting baseline data 
denominator 
optional name of denominator variable for the outcome 
ICC_inp 
numeric. Target intra-cluster correlation (ICC), provided as input when baseline data are to be simulated
kernels 
number of kernels used to generate a de novo propensity
sd 
numeric. Standard deviation of the Normal kernel measuring the spatial smoothing that leads to spillover
theta_inp
numeric. Input spillover interval
tol
numeric. Tolerance of the output ICC
Synthetic data are generated by sampling around the values of variable propensity, which is a numerical vector (taking positive values) of length equal to the number of locations.
There are three ways in which propensity can arise:
1. propensity can be provided as part of the input trial object.
2. Baseline numerators and denominators (the variables named by baselineNumerator and baselineDenominator) may be provided. propensity is then generated as the numerator:denominator ratio for each location in the input object.
3. Otherwise, propensity is generated using a 2D Normal kernel density. The OOR::StoSOO function is used to achieve an intracluster correlation coefficient (ICC) that approximates the value of ICC_inp by searching for an appropriate value of the kernel bandwidth.
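The second and third routes can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the CRTspat implementation: the function names are hypothetical, kernel centres are assumed to be drawn uniformly over the bounding box of the locations, and the bandwidth search with OOR::StoSOO is omitted, so the bandwidth is supplied directly.

```r
# Hypothetical sketch, not the CRTspat source code.

# Route 2: propensity as the baseline numerator:denominator ratio
propensity_from_baseline <- function(base_num, base_denom) {
  stopifnot(length(base_num) == length(base_denom), all(base_denom > 0))
  base_num / base_denom
}

# Route 3: 2D Normal kernel density evaluated at each location.
# Assumption: kernel centres are uniform over the bounding box of the
# locations. `bandwidth` is the quantity that simulateCRT searches over
# to approach ICC_inp; no such search is performed here.
propensity_from_kernels <- function(x, y, kernels = 200, bandwidth = 1) {
  cx <- runif(kernels, min(x), max(x))
  cy <- runif(kernels, min(y), max(y))
  dens <- numeric(length(x))
  for (k in seq_len(kernels)) {
    # isotropic bivariate Normal kernel centred on (cx[k], cy[k])
    dens <- dens + dnorm(x, cx[k], bandwidth) * dnorm(y, cy[k], bandwidth)
  }
  dens / mean(dens)  # rescale so that the mean propensity is 1
}

# Usage with made-up coordinates
set.seed(42)
x <- runif(50)
y <- runif(50)
p <- propensity_from_kernels(x, y, kernels = 200, bandwidth = 0.2)
summary(p)
```

Smaller bandwidths produce a rougher surface, so locations in the same cluster differ more from those in other clusters; this is the lever the ICC search acts on.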
num[i], the synthetic outcome for location i, is simulated with expectation:
E(num[i]) = outcome0[i] * propensity[i] * denom[i] * (1 - effect*I[i])/mean(outcome0[] * propensity[])
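The expectation formula transcribes directly into R. In this sketch expected_num is a hypothetical helper, and the argument intervention stands in for I[i]; it is supplied by the caller as a numeric vector, because how simulateCRT derives I[i] (including any spillover) is not reproduced here.

```r
# Hypothetical transcription of E(num[i]); not the CRTspat source.
# `intervention` plays the role of I[i] and is supplied by the caller.
expected_num <- function(outcome0, propensity, denom, effect, intervention) {
  # recycle a scalar outcome0 to one value per location
  outcome0 <- rep_len(outcome0, length(propensity))
  outcome0 * propensity * denom * (1 - effect * intervention) /
    mean(outcome0 * propensity)
}

# Four locations, the last two in the intervention arm, denominators of 10
e <- expected_num(outcome0 = 0.5, propensity = c(1, 2, 1, 2),
                  denom = 10, effect = 0.25, intervention = c(0, 0, 1, 1))
e
```

Locations 1 and 3 share the same propensity, so e[3] / e[1] equals 1 - effect = 0.75. Per the description of baseline generation, evaluating the same expression with effect = 0 gives the baseline expectation.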
The sampling distribution of num[i] depends on the value of scale as follows:
- scale = 'continuous': values of num are sampled from Normal distributions with means E(num[i]) and variance determined by the fitting to ICC_inp.
- scale = 'count': simulated events are allocated to locations via multivariate hypergeometric distributions parameterised with E(num[i]).
- scale = 'proportion': simulated events are allocated to locations via multinomial distributions parameterised with E(num[i]).
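A minimal sketch of two of these sampling schemes follows: multinomial allocation for 'proportion' and Normal draws for 'continuous'. It assumes that the total number of events is round(sum(E(num))); simulate_num and resid_sd are hypothetical names, the multivariate hypergeometric scheme for 'count' is not shown, and the sketch does not cap num[i] at denom[i].

```r
# Hypothetical sketch, not the CRTspat source.
simulate_num <- function(expected, scale = c("proportion", "continuous"),
                         resid_sd = 1) {
  scale <- match.arg(scale)
  if (scale == "continuous") {
    # Normal draws centred on E(num[i]); resid_sd stands in for the
    # variance that simulateCRT determines by fitting to ICC_inp
    return(rnorm(length(expected), mean = expected, sd = resid_sd))
  }
  # 'proportion': allocate round(sum(E)) events (an assumption of this
  # sketch) with multinomial probabilities proportional to E(num[i])
  total <- round(sum(expected))
  as.vector(rmultinom(1, size = total, prob = expected))
}

set.seed(1)
E <- c(2, 4, 1.5, 2.5)
num <- simulate_num(E, "proportion")
sum(num)  # equals round(sum(E)) = 10
```

rmultinom() normalises prob internally, so the expectations can be passed without dividing by their sum.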
denominator may name a variable of numeric (non-zero) values in the input "CRTsp" object or data.frame, which is returned as variable denom. It acts as a scale-factor for continuous outcomes, a rate-multiplier for counts, or the denominator for proportions. For discrete data all values of denom must be > 0.5 and are rounded to the nearest integer in calculations of num.
By default, denom is generated as a vector of ones, leading to simulation of dichotomous outcomes if scale = 'proportion'.
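The rules for denom can be captured in a small validation helper. prepare_denom is hypothetical; it returns the values that would enter the calculation of num, treating 'proportion' and 'count' as the discrete scales.

```r
# Hypothetical helper illustrating the rules for `denom`; not CRTspat code.
prepare_denom <- function(n_locations, denom = NULL, scale = "proportion") {
  if (is.null(denom)) {
    return(rep(1, n_locations))  # default: a vector of ones
  }
  stopifnot(length(denom) == n_locations, all(denom != 0))
  if (scale %in% c("proportion", "count")) {
    # discrete outcomes: values must exceed 0.5 and are rounded
    if (any(denom <= 0.5)) stop("all values of denom must be > 0.5")
    denom <- round(denom)
  }
  denom
}

prepare_denom(3)                       # 1 1 1
prepare_denom(3, c(0.6, 1.4, 3.7))     # 1 1 4
```

R's round() rounds halves to even, so a value of 2.5 becomes 2.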
If baseline numerators and denominators are provided then the output vectors base_denom and base_num are set to the input values. If baseline numerators and denominators are not provided then the synthetic baseline data are generated by sampling around propensity in the same way as the outcome data, but with the effect size set to zero.
If matchedPair is TRUE then pair-matching on the baseline data is used in randomization, provided there is an even number of clusters. If there is an odd number of clusters then matched pairs are not generated and an unmatched randomization is output.
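The pair-matching rule can be sketched as follows. The matching criterion is an assumption: clusters are ranked by their baseline proportion and neighbours in the ranking are paired. The unmatched fallback assigning floor(k/2) of k clusters to intervention is also assumed. Both belong to the hypothetical randomize_clusters helper, not to the CRTspat algorithm.

```r
# Hypothetical sketch of pair-matched randomization; not CRTspat code.
randomize_clusters <- function(cluster, base_num, base_denom,
                               matchedPair = TRUE) {
  cluster <- factor(cluster)
  ids <- levels(cluster)
  # cluster-level baseline proportion, in the order of levels(cluster)
  ratio <- as.numeric(tapply(base_num, cluster, sum)) /
    as.numeric(tapply(base_denom, cluster, sum))
  k <- length(ids)
  arm <- setNames(rep("control", k), ids)
  if (matchedPair && k %% 2 == 0) {
    ranked <- ids[order(ratio)]
    for (p in seq(1, k, by = 2)) {
      pair <- ranked[c(p, p + 1)]
      # one member of each matched pair goes to intervention at random
      arm[pair[sample(2, 1)]] <- "intervention"
    }
  } else {
    # unmatched fallback (assumed): floor(k/2) clusters get intervention
    arm[sample(ids, floor(k / 2))] <- "intervention"
  }
  # expand the cluster-level allocation to one value per location
  factor(unname(arm[as.character(cluster)]),
         levels = c("control", "intervention"))
}

set.seed(3)
cl <- rep(c("a", "b", "c", "d"), each = 2)
randomize_clusters(cl, base_num = c(1, 1, 5, 5, 2, 2, 6, 6),
                   base_denom = rep(10, 8))
```

In this example the baseline proportions order the clusters as a, c, b, d, so a and c form one pair and b and d the other.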
Either sd or theta_inp must be provided. If both are provided then the value of sd is overwritten by the standard deviation implicit in the value of theta_inp.
Spillover is simulated as arising from a diffusion-like process. For further details see Multerer (2021).
A list of class "CRTsp" containing the following components:
geom_full (list): summary statistics describing the site, cluster assignments, and randomization
design (list): values of input parameters to the design
trial (data frame): rows correspond to geolocated points, as follows:
    x (numeric vector): x-coordinates of locations
    y (numeric vector): y-coordinates of locations
    cluster (factor): assignment of each location to a cluster
    arm (factor): assignment of each location to control or intervention
    nearestDiscord (numeric vector): signed Euclidean distance to the nearest discordant location (km)
    propensity (numeric vector): propensity for each location
    base_denom (numeric vector): denominator for the baseline
    base_num (numeric vector): numerator for the baseline
    denom (numeric vector): denominator for the outcome
    num (numeric vector): numerator for the outcome
    ... : other objects included in the input "CRTsp" object or data.frame

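library(CRTspat)
# read an example dataset of locations with cluster and arm assignments
smalltrial <- readdata('smalltrial.csv')
# simulate proportions with effect 0.25, target ICC 0.05 and kernel sd 0.6
simulation <- simulateCRT(smalltrial,
  effect = 0.25,
  ICC_inp = 0.05,
  outcome0 = 0.5,
  matchedPair = FALSE,
  scale = 'proportion',
  sd = 0.6,
  tol = 0.05)
summary(simulation)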