preprocess {IPDfromKM}    R Documentation

Preprocess the read-in coordinates

Description

Preprocess the raw coordinates into a format suitable for reconstructing individual patient data (IPD). The function returns the cleaned dataset and a table displaying the index of read-in points within each time interval.

Usage

preprocess(dat,trisk=NULL,nrisk=NULL,totalpts=NULL,maxy=100)

Arguments

dat

a two-column dataset in which the first column contains the times and the second column contains the survival probabilities extracted from a published K-M curve, either with the getpoints() function or with software such as ScanIt or DigitizeIt.

trisk

a vector containing the risk time points (i.e., the time points at which the number of patients at risk is reported). These can often be found under the x-axis of a K-M curve. The default value is NULL.

nrisk

a vector containing the numbers of patients at risk reported at the risk time points. These can often be found under the x-axis of a K-M curve. The default value is NULL.

totalpts

the initial number of patients, with a default value of NULL. However, when both trisk and nrisk are NULL, this number is required for the estimation.

maxy

the scale of the survival probabilities. Set maxy=100 when the probabilities are reported as percentages (e.g., 70%). Set maxy=1 when the probabilities are reported as decimal numbers (e.g., 0.7).
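
As a rough sketch of how these arguments might be assembled before calling preprocess(), the base R snippet below builds a two-column dataset and matching risk vectors, then runs a few sanity checks. All numbers and the object names (digitized, maxy_example, trisk_example, nrisk_example) are invented for illustration, and the rule used to guess maxy is only a heuristic, not something the package itself does.

```r
# Hypothetical digitized points: time in months, survival in percent.
digitized <- data.frame(
  time = c(0, 2.1, 4.0, 6.3, 9.8, 12.0),
  surv = c(100, 95.2, 88.7, 80.1, 71.4, 65.0)
)

# Heuristic (an assumption for this sketch): values above 1 suggest a
# percentage scale (maxy = 100); otherwise a decimal scale (maxy = 1).
maxy_example <- if (max(digitized$surv) > 1) 100 else 1

# Hypothetical risk table read from under the x-axis of the curve.
trisk_example <- c(0, 6, 12)     # times at which numbers at risk are reported
nrisk_example <- c(120, 95, 70)  # numbers at risk at those times

# Basic consistency checks on the inputs.
stopifnot(
  ncol(digitized) == 2,                            # times and survival only
  length(trisk_example) == length(nrisk_example),  # one count per time point
  !is.unsorted(trisk_example),                     # times in increasing order
  all(diff(nrisk_example) <= 0)                    # numbers at risk never increase
)
```

Objects prepared this way would then be passed as dat, trisk, nrisk and maxy.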

Details

The preprocess() function processes the coordinates dataset extracted from a published K-M curve, either with the getpoints() function or with software such as DigitizeIt or ScanIt.
Most published Kaplan-Meier curves also report the numbers of patients at risk under the x-axis. These numbers, and the times at which they are reported, should be entered manually as vectors (nrisk and trisk). When this information is not available, leave trisk and nrisk as NULL and supply the initial number of patients through totalpts instead.

A sample dataset can be found in Radiationdata.
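
The choice between the two input modes described above can be sketched as a small helper. The function check_risk_inputs() is hypothetical and is not part of IPDfromKM; treating a call with only one of trisk or nrisk as an error is an assumption of this sketch, not documented behavior of preprocess().

```r
# Hypothetical helper: report which source of sample-size information
# a call would rely on, following the rules described in this section.
check_risk_inputs <- function(trisk = NULL, nrisk = NULL, totalpts = NULL) {
  if (!is.null(trisk) && !is.null(nrisk)) {
    # A risk table needs one number at risk per reported time point.
    stopifnot(length(trisk) == length(nrisk))
    return("risk table")
  }
  if (is.null(trisk) && is.null(nrisk) && !is.null(totalpts)) {
    # No risk table: the initial number of patients is used instead.
    return("totalpts")
  }
  stop("Supply both trisk and nrisk, or supply totalpts.")
}

mode_with_table <- check_risk_inputs(trisk = c(0, 6, 12), nrisk = c(120, 95, 70))
mode_without_table <- check_risk_inputs(totalpts = 213)
```

Running such a check before preprocess() makes it explicit which inputs the reconstruction will depend on.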

Value

preprocess() returns a list with the following four items.

preprocessdat: the two-column (i.e., time, survival) table after preprocessing.

intervalIndex: a table displaying the index of read-in points within each time interval.

endpts: the number of patients remaining at the end of the trial.

inputdat: the read-in dataset.

References

Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12:9.

Examples



# Radiationdata$radio is a dataset exported from the ScanIt software =============
radio <- Radiationdata$radio

# Load the time points (in months) at which the numbers =======
# of patients at risk are reported (i.e. trisk) ======
trisk <- Radiationdata$trisk

# Load the numbers of patients at risk (i.e. nrisk) =======
# reported at those time points (trisk) ======
nrisk.radio <- Radiationdata$nrisk.radio

# Use trisk and nrisk as input for preprocessing and reconstruction ==============
pre_radio_1 <- preprocess(dat=radio, trisk=trisk,
             nrisk=nrisk.radio,totalpts=NULL,maxy=100)
est_radio_1 <- getIPD(prep=pre_radio_1,armID=0,tot.events=NULL)

# The output includes the reconstructed individual patient data ==================
head(est_radio_1$IPD)

# When trisk and nrisk are not available, we must input ==========================
# the initial number of patients   ===============================================
pre_radio_2 <- preprocess(dat=Radiationdata$radio, totalpts=213,maxy=100)
est_radio_2 <- getIPD(prep=pre_radio_2,armID=0,tot.events=NULL)

# The output includes the reconstructed individual patient data ==================
head(est_radio_2$IPD)


[Package IPDfromKM version 0.1.10 Index]