FixedBinBinIT {Surrogate}    R Documentation

Fits (univariate) fixed-effect models to assess surrogacy in the binary-binary case based on the Information-Theoretic framework

Description

The function FixedBinBinIT uses the information-theoretic approach (Alonso & Molenberghs, 2007) to estimate trial- and individual-level surrogacy based on fixed-effect models when both S and T are binary variables. The user can specify whether a (weighted or unweighted) full, semi-reduced, or reduced model should be fitted. See the Details section below.

Usage

FixedBinBinIT(Dataset, Surr, True, Treat, Trial.ID, Pat.ID, 
Model=c("Full"), Weighted=TRUE, Min.Trial.Size=2, Alpha=.05, 
Number.Bootstraps=50, Seed=sample(1:1000, size=1))

Arguments

Dataset

A data.frame that should consist of one line per patient. Each line contains (at least) a surrogate value, a true endpoint value, a treatment indicator, a patient ID, and a trial ID.

Surr

The name of the variable in Dataset that contains the surrogate endpoint values.

True

The name of the variable in Dataset that contains the true endpoint values.

Treat

The name of the variable in Dataset that contains the treatment indicators. The treatment indicator should either be coded as 1 for the experimental group and -1 for the control group, or as 1 for the experimental group and 0 for the control group.

Trial.ID

The name of the variable in Dataset that contains the trial ID to which the patient belongs.

Pat.ID

The name of the variable in Dataset that contains the patient's ID.

Model

The type of model that should be fitted, i.e., Model=c("Full"), Model=c("Reduced"), or Model=c("SemiReduced"). See the Details section below. Default Model=c("Full").

Weighted

Logical. In practice it is often the case that different trials (or other clustering units) have different sample sizes. Univariate models are used to assess surrogacy in the information-theoretic approach, so it can be useful to adjust for heterogeneity in information content between the trial-specific contributions (particularly when trial-level surrogacy measures are of primary interest and when the heterogeneity in sample sizes is large). If Weighted=TRUE, weighted regression models are fitted. If Weighted=FALSE, unweighted regression analyses are conducted. See the Details section below. Default TRUE.

Min.Trial.Size

The minimum number of patients that a trial should contain to be included in the analysis. If the number of patients in a trial is smaller than the value specified by Min.Trial.Size, the data of the trial are excluded from the analysis. Default 2.

Alpha

The \alpha-level that is used to determine the confidence intervals around R^2_{h} and R^2_{ht}. Default 0.05.

Number.Bootstraps

The standard errors and confidence intervals for R^2_{h}, R^2_{b.ind} and R^2_{h.ind} are determined based on a bootstrap procedure. Number.Bootstraps specifies the number of bootstrap samples that are used. Default 50.

Seed

The seed to be used in the bootstrap procedure. Default sample(1:1000, size=1).

Details

Individual-level surrogacy

The following univariate generalised linear models are fitted:

g_{T}(E(T_{ij}))=\mu_{Ti}+\beta_{i}Z_{ij},

g_{T}(E(T_{ij}|S_{ij}))=\gamma_{0i}+\gamma_{1i}Z_{ij}+\gamma_{2i}S_{ij},

where i and j are the trial and subject indicators, g_{T} is an appropriate link function (i.e., a logit link when binary endpoints are considered), S_{ij} and T_{ij} are the surrogate and true endpoint values of subject j in trial i, and Z_{ij} is the treatment indicator for subject j in trial i. \mu_{Ti} and \beta_{i} are the trial-specific intercepts and treatment effects on the true endpoint in trial i. \gamma_{0i} and \gamma_{1i} are the trial-specific intercepts and treatment effects on the true endpoint in trial i after accounting for the effect of the surrogate endpoint, and \gamma_{2i} is the trial-specific effect of the surrogate endpoint on the true endpoint.

The -2 log likelihood values of the previous models in each trial i (i.e., L_{1i} and L_{2i}, respectively) are subsequently used to compute individual-level surrogacy based on the so-called Variance Reduction Factor (VRF; for details, see Alonso & Molenberghs, 2007):

R^2_{h}= 1 - \frac{1}{N} \sum_{i} exp \left(-\frac{L_{1i}-L_{2i}}{n_{i}} \right),

where N is the number of trials and n_{i} is the number of patients within trial i.
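
The computation of R^2_{h} can be illustrated with base R. The following is a minimal sketch, not the internal code of FixedBinBinIT: the simulated data frame toy, the helper lrf_term, and all simulation settings are hypothetical. It assumes that the likelihood-ratio statistic in each trial is the -2 log likelihood of the model without S minus that of the model with S, so that it is nonnegative.

```r
# Sketch only: toy data and helper names are illustrative assumptions,
# not the internals of FixedBinBinIT.
set.seed(123)
n_trials <- 5
n_per_trial <- 200
toy <- data.frame(
  Trial.ID = rep(seq_len(n_trials), each = n_per_trial),
  Treat    = rep(c(0, 1), times = n_trials * n_per_trial / 2)
)
# Binary surrogate, and a binary true endpoint that depends on it
toy$Surr <- rbinom(nrow(toy), 1, plogis(-0.5 + 0.8 * toy$Treat))
toy$True <- rbinom(nrow(toy), 1, plogis(-1 + 0.5 * toy$Treat + 2 * toy$Surr))

# Contribution of one trial: exp(-G2_i / n_i)
lrf_term <- function(d) {
  fit_1 <- glm(True ~ Treat, family = binomial, data = d)         # without S
  fit_2 <- glm(True ~ Treat + Surr, family = binomial, data = d)  # with S
  L_1 <- -2 * as.numeric(logLik(fit_1))
  L_2 <- -2 * as.numeric(logLik(fit_2))
  G2 <- L_1 - L_2  # likelihood-ratio statistic (assumed sign convention)
  exp(-G2 / nrow(d))
}

contrib <- vapply(split(toy, toy$Trial.ID), lrf_term, numeric(1))
R2h <- 1 - mean(contrib)  # average over the N trials
R2h
```

Because each trial contributes its own pair of GLMs, a trial in which S or T is constant cannot be used here, which is consistent with the exclusion rules described for Data.Analyze.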

When it can be assumed (i) that the treatment-corrected association between the surrogate and the true endpoint is constant across trials, or (ii) when all data come from a single clinical trial (i.e., when N=1), the previous expression simplifies to:

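R^2_{h.ind}= 1 - exp \left(-\frac{L_{1}-L_{2}}{n} \right),

where L_{1} and L_{2} are the -2 log likelihood values of the two models fitted to the pooled data and n is the total number of patients.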

When T is binary, R^2_{h.ind} cannot reach 1; its maximum is 0.75. Kent (1983) argues that 0.75 is a reasonable upper bound, so R^2_{h.ind} can usually be interpreted without special consideration for the discreteness of T. Alternatively, to address the upper bound problem, a scaled version of the mutual information can be used when both S and T are binary (Joe, 1989):

R^2_{b.ind}= \frac{I(T,S)}{min[H(T), H(S)]},

where I(T,S) is the mutual information between T and S, and H(T) and H(S) are the entropies of T and S. These quantities can be estimated using the log likelihood functions of the GLMs shown above.
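
The scaling can be illustrated with a simple plug-in calculation on a 2 x 2 table. This is a sketch under simplifying assumptions and not the estimator used by FixedBinBinIT: it ignores the treatment indicator, estimates the entropies and the mutual information from observed cell proportions rather than from GLM likelihoods, and uses hypothetical simulated vectors S and T_bin. It relies on the relation R^2_{h} = 1 - exp(-2 I(T,S)) between the likelihood-based measure and the mutual information (Alonso & Molenberghs, 2007).

```r
# Sketch only: plug-in estimates from a 2 x 2 table, ignoring treatment.
set.seed(42)
n <- 1000
S <- rbinom(n, 1, 0.4)                     # hypothetical binary surrogate
T_bin <- rbinom(n, 1, plogis(-1 + 2 * S))  # hypothetical binary true endpoint

p_joint <- prop.table(table(T_bin, S))  # joint cell proportions
p_T <- rowSums(p_joint)                 # marginal distribution of T
p_S <- colSums(p_joint)                 # marginal distribution of S

# Entropy in nats; zero cells contribute nothing
entropy <- function(p) -sum(p[p > 0] * log(p[p > 0]))
H_T <- entropy(p_T)
H_S <- entropy(p_S)

# Mutual information (all four cells are nonzero in this toy example)
I_TS <- sum(p_joint * log(p_joint / outer(p_T, p_S)))

R2h_ind <- 1 - exp(-2 * I_TS)      # bounded above by 0.75 for binary T
R2b_ind <- I_TS / min(H_T, H_S)    # rescaled to the unit interval
c(R2h.ind = R2h_ind, R2b.ind = R2b_ind)
```

Since I(T,S) can never exceed min[H(T), H(S)], the scaled measure lies between 0 and 1, whereas the unscaled measure stays below 0.75.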

Trial-level surrogacy

When a full or semi-reduced model is requested (by using the argument Model=c("Full") or Model=c("SemiReduced") in the function call), trial-level surrogacy is assessed by fitting the following univariate models:

S_{ij}=\mu_{Si}+\alpha_{i}Z_{ij}+\varepsilon_{Sij}, (1)

T_{ij}=\mu_{Ti}+\beta_{i}Z_{ij}+\varepsilon_{Tij}, (1)

where i and j are the trial and subject indicators, S_{ij} and T_{ij} are the surrogate and true endpoint values of subject j in trial i, Z_{ij} is the treatment indicator for subject j in trial i, \mu_{Si} and \mu_{Ti} are the fixed trial-specific intercepts for S and T, and \alpha_{i} and \beta_{i} are the fixed trial-specific treatment effects on S and T, respectively. The error terms \varepsilon_{Sij} and \varepsilon_{Tij} are assumed to be independent.

When a reduced model is requested by the user (by using the argument Model=c("Reduced") in the function call), the following univariate models are fitted:

S_{ij}=\mu_{S}+\alpha_{i}Z_{ij}+\varepsilon_{Sij}, (2)

T_{ij}=\mu_{T}+\beta_{i}Z_{ij}+\varepsilon_{Tij}, (2)

where \mu_{S} and \mu_{T} are the common intercepts for S and T. The other parameters are the same as defined above, and \varepsilon_{Sij} and \varepsilon_{Tij} are again assumed to be independent.

When a full model is requested (by using the argument Model=c("Full") in the function call, i.e., when models (1) are fitted), the following model is subsequently fitted:

\widehat{\beta}_{i}=\lambda_{0}+\lambda_{1}\widehat{\mu}_{Si}+\lambda_{2}\widehat{\alpha}_{i}+\varepsilon_{i}, (3)

where the parameter estimates for \beta_i, \mu_{Si}, and \alpha_i are based on models (1) (see above). When a weighted model is requested (using the argument Weighted=TRUE in the function call), model (3) is a weighted regression model (with weights based on the number of observations in trial i). The -2 log likelihood value of the (weighted or unweighted) model (3) (L_1) is subsequently compared to the -2 log likelihood value of an intercept-only model (\widehat{\beta}_{i}=\lambda_{3}; L_0), and R^2_{ht} is computed based on the Variance Reduction Factor (for details, see Alonso & Molenberghs, 2007):

R^2_{ht}= 1 - exp \left(-\frac{L_0-L_1}{N} \right),

where N is the number of trials.
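
The second-stage computation can be sketched as follows. The trial-specific estimates below are simulated placeholders (the names stage2, beta_hat, mu_S_hat, alpha_hat, and n_i are hypothetical), and using the trial sizes n_i directly as weights is an assumption; the sketch is not the internal code of FixedBinBinIT. The likelihood-ratio statistic is taken as the -2 log likelihood of the intercept-only model minus that of model (3), so that it is nonnegative.

```r
# Sketch only: simulated stage-1 estimates stand in for the output of models (1).
set.seed(7)
N <- 20                                   # number of trials
n_i <- sample(50:300, N, replace = TRUE)  # hypothetical trial sizes
mu_S_hat  <- rnorm(N)
alpha_hat <- rnorm(N, mean = 0.5, sd = 0.4)
beta_hat  <- 0.2 + 0.3 * mu_S_hat + 0.9 * alpha_hat + rnorm(N, sd = 0.15)
stage2 <- data.frame(beta_hat, mu_S_hat, alpha_hat, n_i)

# Weighted version of model (3) and the intercept-only comparison model
fit_1 <- lm(beta_hat ~ mu_S_hat + alpha_hat, data = stage2, weights = n_i)
fit_0 <- lm(beta_hat ~ 1, data = stage2, weights = n_i)

L_1 <- -2 * as.numeric(logLik(fit_1))
L_0 <- -2 * as.numeric(logLik(fit_0))
G2 <- L_0 - L_1  # likelihood-ratio statistic (assumed sign convention)

R2ht <- 1 - exp(-G2 / N)
R2ht
```

For a normal linear model this quantity coincides with the usual (weighted) coefficient of determination, so summary(fit_1)$r.squared offers a quick check on the result.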

When a semi-reduced or reduced model is requested (by using the argument Model=c("SemiReduced") or Model=c("Reduced") in the function call), the following model is fitted:

\widehat{\beta}_{i}=\lambda_{0}+\lambda_{1}\widehat{\alpha}_{i}+\varepsilon_{i},

where the parameter estimates for \beta_i and \alpha_i are based on models (1) when a semi-reduced model is fitted or on models (2) when a reduced model is fitted. The -2 log likelihood value of this (weighted or unweighted) model (L_1) is subsequently compared to the -2 log likelihood value of an intercept-only model (\widehat{\beta}_{i}=\lambda_{3}; L_0), and R^2_{ht} is computed based on the reduction in the likelihood (as described above).

Value

An object of class FixedBinBinIT with components,

Data.Analyze

Prior to conducting the surrogacy analysis, data of patients who have a missing value for the surrogate and/or the true endpoint are excluded. The data of trials (i) in which only one type of treatment was administered, or (ii) in which either the surrogate or the true endpoint was constant (i.e., all patients within a trial had the same surrogate and/or true endpoint value) are also excluded. Finally, the user can specify the minimum number of patients that a trial should contain in order to be included in the analysis: if the number of patients in a trial is smaller than the value specified by Min.Trial.Size, the data of the trial are excluded. Data.Analyze is the dataset on which the surrogacy analysis was conducted.

Obs.Per.Trial

A data.frame that contains the total number of patients per trial and the number of patients who were administered the control treatment and the experimental treatment in each of the trials (in Data.Analyze).

Trial.Spec.Results

A data.frame that contains the trial-specific intercepts and treatment effects for the surrogate and the true endpoints (when a full or semi-reduced model is requested), or the trial-specific treatment effects for the surrogate and the true endpoints (when a reduced model is requested).

R2ht

A data.frame that contains the trial-level surrogacy estimate and its confidence interval.

R2h.ind

A data.frame that contains the individual-level surrogacy estimate R^2_{h.ind} (single-trial based estimate) and its confidence interval.

R2h

A data.frame that contains the individual-level surrogacy estimate R^2_{h} (cluster-based estimate) and its confidence interval (based on a bootstrap).

R2b.ind

A data.frame that contains the individual-level surrogacy estimate R^2_{b.ind} (single-trial based estimate accounting for upper bound) and its confidence interval (based on a bootstrap).

R2h.Ind.By.Trial

A data.frame that contains individual-level surrogacy estimates R^2_{hInd} (cluster-based estimates) and their confidence intervals for each of the trials separately.
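
The exclusion rules described for Data.Analyze can be mimicked in base R. The sketch below is illustrative only: the data frame toy, the object names, and the order in which the rules are applied are assumptions, not the internal code of FixedBinBinIT.

```r
# Sketch only: toy data with one trial per exclusion rule.
toy <- data.frame(
  Trial.ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4),
  Treat    = c(0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0),
  Surr     = c(0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1),
  True     = c(0, 1, NA, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1)
)
min_trial_size <- 2  # plays the role of Min.Trial.Size

# Step 1: drop patients with a missing surrogate or true endpoint
complete <- toy[!is.na(toy$Surr) & !is.na(toy$True), ]

# Step 2: keep trials that are large enough, contain both treatments,
# and show variation in both endpoints
keep_trial <- vapply(split(complete, complete$Trial.ID), function(d) {
  nrow(d) >= min_trial_size &&
    length(unique(d$Treat)) > 1 &&
    length(unique(d$Surr)) > 1 &&
    length(unique(d$True)) > 1
}, logical(1))

analysed <- complete[complete$Trial.ID %in% names(keep_trial)[keep_trial], ]
analysed
```

In this toy example only trial 1 is retained: trial 2 contains a single treatment arm, the surrogate is constant in trial 3, and trial 4 has fewer patients than min_trial_size.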

Author(s)

Wim Van der Elst, Ariel Alonso, & Geert Molenberghs

References

Alonso, A., & Molenberghs, G. (2007). Surrogate marker evaluation from an information theory perspective. Biometrics, 63, 180-186.

Joe, H. (1989). Relative entropy measures of multivariate dependence. Journal of the American Statistical Association, 84, 157-164.

Kent, J. T. (1983). Information gain and a general measure of correlation. Biometrika, 70, 163-173.

See Also

FixedBinContIT, FixedContBinIT, plot Information-Theoretic BinCombn

Examples

## Not run:  # Time consuming (>5sec) code part
# Generate data with continuous Surr and True
Sim.Data.MTS(N.Total=5000, N.Trial=50, R.Trial.Target=.9, R.Indiv.Target=.9,
             Fixed.Effects=c(0, 0, 0, 0), D.aa=10, D.bb=10, Seed=1,
             Model=c("Full"))
# Dichotomize Surr and True
Surr_Bin <- Data.Observed.MTS$Surr
Surr_Bin[Data.Observed.MTS$Surr>.5] <- 1
Surr_Bin[Data.Observed.MTS$Surr<=.5] <- 0
True_Bin <- Data.Observed.MTS$True
True_Bin[Data.Observed.MTS$True>.15] <- 1
True_Bin[Data.Observed.MTS$True<=.15] <- 0
Data.Observed.MTS$Surr <- Surr_Bin
Data.Observed.MTS$True <- True_Bin

# Assess surrogacy using info-theoretic framework
Fit <- FixedBinBinIT(Dataset = Data.Observed.MTS, Surr = Surr, 
True = True, Treat = Treat, Trial.ID = Trial.ID, 
Pat.ID = Pat.ID, Number.Bootstraps=100)

# Examine results
summary(Fit)
plot(Fit, Trial.Level = FALSE, Indiv.Level.By.Trial=TRUE)
plot(Fit, Trial.Level = TRUE, Indiv.Level.By.Trial=FALSE)

## End(Not run)

[Package Surrogate version 3.3.0 Index]