preprocess {IPDfromKM} | R Documentation |
Preprocess the read-in coordinates
Description
Preprocess the raw coordinates into an appropriate format for reconstructing individual patient data (IPD). Returns include the cleaned dataset and a table displaying the index of read-in points within each time interval.
Usage
preprocess(dat,trisk=NULL,nrisk=NULL,totalpts=NULL,maxy=100)
Arguments
dat |
a two-column dataset with the first column being times, and the second the survival probabilities extracted from a published K-M curve using |
trisk |
a vector containing risk time points (i.e., time points at which the number of patients at risk is reported). These can often be found under the x-axis of a K-M curve. The default value is NULL. |
nrisk |
a vector containing the numbers of patients at risk reported at the risk time points. These can often be found under the x-axis of a K-M curve. The default value is NULL. |
totalpts |
the initial number of patients, with a default value of NULL. However, when both trisk and nrisk are NULL, this number is required for the estimation. |
maxy |
the scale of survival probability. Set maxy=100 when the probabilities are reported in percentages (e.g., 70%). Set maxy=1 when the probabilities are reported using decimal numbers (e.g., 0.7). |
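The scale convention behind maxy can be pictured with a small base-R sketch. The coordinates below are made up for illustration, and the division only pictures the convention; it is not taken from the package's internal code.

```r
# Hypothetical digitized coordinates (not from the package):
# time in months, survival read off the y-axis in percent.
dat_pct <- data.frame(time = c(0, 6, 12, 18),
                      surv = c(100, 85, 70, 62))

# With percentages on the y-axis, maxy = 100 maps them onto [0, 1].
maxy <- 100
surv_prob <- dat_pct$surv / maxy

# The same curve digitized as decimals would use maxy = 1 instead.
stopifnot(all(surv_prob >= 0 & surv_prob <= 1))
```

Passing the wrong maxy would leave the probabilities off by a factor of 100, so it is worth checking the y-axis units of the digitized curve before calling preprocess().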
Details
The preprocess() function processes the coordinate dataset extracted from a published K-M curve using the getpoints function, or software such as DigitizeIt or ScanIt.
In most published Kaplan-Meier curves, the numbers of patients at risk are also reported under the x-axis. These numbers at risk, and the times at which they are reported, should be entered manually as vectors (nrisk and trisk). When this information is not available, the "trisk" and
"nrisk" parameters can be left as "NULL". In this case, the initial number of patients "totalpts" must be supplied.
A sample dataset can be found in Radiationdata.
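The input rule described above (supply trisk and nrisk together, or fall back to totalpts) can be sketched as a small pre-flight check in base R. The helper check_risk_inputs and the example vectors are hypothetical and not part of IPDfromKM; the checks on ordering and on non-increasing at-risk counts are sanity assumptions added for illustration, not documented requirements of preprocess().

```r
# Hypothetical helper (not part of IPDfromKM): decide which kind of
# input is available before calling preprocess().
check_risk_inputs <- function(trisk = NULL, nrisk = NULL, totalpts = NULL) {
  if (is.null(trisk) && is.null(nrisk)) {
    # No risk table: the initial number of patients is required.
    if (is.null(totalpts)) {
      stop("totalpts is required when trisk and nrisk are both NULL")
    }
    return("totalpts")
  }
  # Assumption: the two vectors are only meaningful as a pair.
  if (is.null(trisk) || is.null(nrisk)) {
    stop("trisk and nrisk must be supplied together")
  }
  if (length(trisk) != length(nrisk)) {
    stop("trisk and nrisk must have the same length")
  }
  # Assumption: reporting times increase, and at-risk counts do not.
  if (is.unsorted(trisk, strictly = TRUE)) {
    stop("trisk must be strictly increasing")
  }
  if (any(diff(nrisk) > 0)) {
    stop("nrisk should not increase over time")
  }
  "risk table"
}

# Made-up values as they might be read from under the x-axis.
trisk_example <- c(0, 12, 24, 36)
nrisk_example <- c(120, 90, 61, 30)
input_kind <- check_risk_inputs(trisk_example, nrisk_example)
```

Running such a check first gives an early, readable error for a mistyped risk table instead of a failure further along in the reconstruction.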
Value
preprocess() returns a list object containing the following four items.
preprocessdat: the two-column (i.e., time, survival) table after preprocessing.
intervalIndex: a table displaying the index of read-in points within each time interval.
endpts: the number of patients remaining at the end of the trial.
inputdat: the read-in dataset.
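As a brief sketch of how the returned list might be inspected, the snippet below calls preprocess() on the bundled Radiationdata and looks at the four items named above. It assumes the IPDfromKM package is installed; the guard simply skips the example otherwise. The variable name pre is arbitrary.

```r
# Runs only if IPDfromKM is installed (assumption of this sketch).
if (requireNamespace("IPDfromKM", quietly = TRUE)) {
  library(IPDfromKM)

  pre <- preprocess(dat = Radiationdata$radio,
                    trisk = Radiationdata$trisk,
                    nrisk = Radiationdata$nrisk.radio,
                    maxy = 100)

  # The four documented items of the returned list.
  print(names(pre))
  print(head(pre$preprocessdat))  # cleaned (time, survival) table
  print(pre$intervalIndex)        # index of read-in points per interval
  print(pre$endpts)               # patients remaining at the end of the trial
  print(head(pre$inputdat))       # the coordinates as read in
}
```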
References
Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012; 12:9.
Examples
library(IPDfromKM)
# Radiationdata$radio is a dataset exported from ScanIt software ================
radio <- Radiationdata$radio
# Load the time points at which the numbers of patients =======
# at risk are reported (i.e. trisk, in months) ======
trisk <- Radiationdata$trisk
# Load the numbers of patients at risk reported (i.e. nrisk) =======
# at the time points (trisk) ======
nrisk.radio <- Radiationdata$nrisk.radio
# Use the trisk and nrisk as input for preprocess and reconstruction ============
pre_radio_1 <- preprocess(dat=Radiationdata$radio, trisk=trisk,
                          nrisk=nrisk.radio,totalpts=NULL,maxy=100)
est_radio_1 <- getIPD(prep=pre_radio_1,armID=0,tot.events=NULL)
# Output includes the reconstructed individual patient data =====================
head(est_radio_1$IPD)
# When trisk and nrisk are not available, we must input ==========================
# the initial number of patients ===============================================
pre_radio_2 <- preprocess(dat=Radiationdata$radio, totalpts=213,maxy=100)
est_radio_2 <- getIPD(prep=pre_radio_2,armID=0,tot.events=NULL)
# Output includes the reconstructed individual patient data ======================
head(est_radio_2$IPD)