prepareCGOneFactorData {cg}    R Documentation
Prepare a data object from a data frame for One Factor / One-Way / Unpaired Samples evaluations
Description
The function prepareCGOneFactorData reads in a data frame and settings in order to create a cgOneFactorData object. The created object is designed to have exploratory and fit methods applied to it.
Usage
prepareCGOneFactorData(dfr, format = "listed", analysisname = "",
endptname = "", endptunits = "", logscale = TRUE, zeroscore = NULL,
addconstant = NULL, rightcensor = NULL, leftcensor = NULL, digits = NULL,
refgrp = NULL, stamps = FALSE)
Arguments
dfr: A valid data frame; see the Details section below.
format: Default value of "listed"; the other accepted value is "groupcolumns". See the Details section below.
analysisname: Optional, a character string or math-valid expression that will be set for default use in graph titles and table methods. The default value is the empty string "".
endptname: Optional, a character string or math-valid expression that will be set for default use as the y-axis label of graph methods, and also used for table methods. The default value is the empty string "".
endptunits: Optional, a character string or math-valid expression that can be used in combination with the endptname argument. Parentheses are automatically added around this input, which is then appended to the end of the endptname character value or expression. The default value is the empty string "".
logscale: Apply a log-transformation to the data for evaluations. The default value is TRUE.
zeroscore: Optional, replace response values of zero with a derived or specified numeric value, as an approach to overcome the presence of zeroes when evaluating on the logarithmic scale (logscale=TRUE). The default value is NULL. See the Details section below.
addconstant: Optional, add a numeric constant to all response values, as an approach to overcome the presence of zeroes when evaluating on the logarithmic scale (logscale=TRUE). The default value is NULL. See the Details section below.
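rightcensor: Optional, can be specified with a numeric value where any value equal to or greater will be regarded as right censored in the evaluation. The value TRUE is also accepted for the three-column "listed" format; see the Details section below. The default value is NULL.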
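leftcensor: Optional, can be specified with a numeric value where any value equal to or lesser will be regarded as left censored in the evaluation. The value TRUE is also accepted for the three-column "listed" format; see the Details section below. The default value is NULL.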
digits: Optional, for output display purposes in graph and table methods, values will be rounded to this number of digits. Only the integers 0, 1, 2, 3, and 4 are accepted. No rounding is done during any calculations. The default value is NULL.
refgrp: Optional, specify one of the factor levels to be the “reference group”, such as a “control” group. The default value is NULL.
stamps: Optional, add a time stamp to graphs, along with cg package version identification. The default value is FALSE.
Details
- Input Data Frame

The input data frame dfr can be of the format "listed" or "groupcolumns". Another distinguishing characteristic is whether or not it contains censored data representations. Censored observations can be represented by < for left-censoring and > for right-censoring. The < value refers to values less than or equal to a numeric value. For example, <0.76 denotes a left-censored value of 0.76 or less. Similarly, >2.02 denotes a right-censored value of 2.02 or greater. There must be no space between the direction indicator and the numeric value. These representations can be used in either the listed or groupcolumns formats for dfr. No interval-censored representations are currently handled when format="groupcolumns".

If format="groupcolumns" is specified for dfr, then the number of columns must equal the number of groups, and any censored values must follow the < and > representations. The individual group values are of mode character, since any censored values will be represented, for example, as <0.76 or >2.02. If any of the groups have fewer observations than the others, i.e. there are unequal sample sizes, then the corresponding "no data" cells in the data frame need to contain empty string "" values.

If format="listed" is specified for dfr, then an input data frame may have anywhere from two to four columns.

  - two columns: The first column has the group levels to define the factor, and the second column contains the response values. Censored representations of < and > can be used here. One or both of rightcensor or leftcensor may also be specified as a number. If a number is specified for rightcensor, then all values in the second column equal to this value will be processed as right-censored. Analogously, if a number is specified for leftcensor, then all values in the second column equal to this value will be processed as left-censored. WARNING: Use this cautiously, and make sure the equality occurs as intended. This convention is designed for simple Type I censoring scenarios.

  - three columns: As in the two-column case, the first column has the group levels to define the factor, and the second column contains the response values, which will all be coerced to numeric. Any censoring information must be specified in the third column. Borrowing the convention of Surv from the survival package, 0=right censored, 1=no censoring, and 2=left censored. If rightcensor=NULL and leftcensor=NULL are left as defaults in the call, and values of 0, 1, and 2 are all represented, then the processing will create a suitable data frame dfru for modeling that the canonical survreg function understands. However, if 0 and 1 are the only values in the third (censoring status) column, then one of rightcensor=TRUE or leftcensor=TRUE must be specified, but not both; otherwise an error occurs. A column of all 1's or all 0's also raises an error.

  - four columns: As in the two-column case, the first column has the group levels to define the factor. The second and third columns must contain numeric response information, and the fourth column must contain the censoring status. This is the most general representation, where any combination of left-censoring, right-censoring, and interval-censoring is permitted. The rightcensor and leftcensor input arguments are ignored and set to NULL. IMPORTANT: The convention of Surv from the survival package with type="interval" is followed: 0=right censored, 1=no censoring, 2=left censored, and 3=interval censored. For status 0, 1, and 2, the second and third columns match in value, and the status in the fourth column determines whether that value is a lower bound (right-censored, 0) or an upper bound (left-censored, 2). For status 3, the two values differ and define the interval boundaries. The processing will create a suitable data frame dfru for modeling that the canonical survreg and survfit functions from the survival package understand.
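The censoring notation and the "groupcolumns" layout can be illustrated with a small base R sketch. It is not the cg package's internal code: the helpers parse_censored() and groupcolumns_to_listed() are hypothetical, and the status codes simply follow the Surv convention quoted above (0 = right censored, 1 = no censoring, 2 = left censored). The sketch turns a "groupcolumns" data frame, including "" padding for unequal sample sizes, into a listed layout with a status column.

```r
# Hypothetical helper (not part of cg): parse character responses such as
# "<0.76" or ">2.02" into a numeric value plus a Surv-style status code
# (0 = right censored, 1 = no censoring, 2 = left censored).
parse_censored <- function(x) {
  x <- trimws(as.character(x))
  status <- rep(1L, length(x))
  status[startsWith(x, ">")] <- 0L   # ">2.02": 2.02 or greater
  status[startsWith(x, "<")] <- 2L   # "<0.76": 0.76 or less
  value <- as.numeric(sub("^[<>]", "", x))  # strip the direction indicator
  data.frame(value = value, status = status)
}

# Hypothetical helper: one column per group -> group / value / status rows.
groupcolumns_to_listed <- function(dfr) {
  pieces <- lapply(names(dfr), function(g) {
    x <- as.character(dfr[[g]])
    x <- x[x != ""]                  # drop the "no data" padding cells
    data.frame(group = g, parse_censored(x))
  })
  out <- do.call(rbind, pieces)
  out$group <- factor(out$group, levels = names(dfr))
  out
}

# Two groups with unequal sample sizes; group B is padded with "".
gc <- data.frame(A = c("1.2", "<0.76", "3.4"),
                 B = c(">2.02", "0.9", ""),
                 stringsAsFactors = FALSE)
listed <- groupcolumns_to_listed(gc)
```

The resulting listed data frame has five rows, with group A contributing one left-censored value and group B one right-censored value.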
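For the four-column "listed" layout, the two numeric columns and the status code together define a range for each unobserved response. The base R sketch below spells out that reading with a hypothetical helper interval_bounds() and hypothetical column names lo and hi; it is an illustration of the Surv type="interval" convention, not cg's processing code. The survival package itself consumes the same columns through Surv(time, time2, event, type = "interval").

```r
# Hypothetical helper (not part of cg): translate four-column "listed" rows
# into explicit lower/upper bounds for the unobserved response, using the
# Surv type = "interval" status codes:
#   0 = right censored, 1 = no censoring, 2 = left censored, 3 = interval censored
interval_bounds <- function(lo, hi, status) {
  stopifnot(length(lo) == length(hi), length(hi) == length(status),
            all(status %in% 0:3))
  # Left censored: the value is at most lo, so there is no lower bound.
  lower <- ifelse(status == 2, -Inf, lo)
  # Right censored: at least lo (no upper bound); interval: up to hi;
  # exact and left censored: the upper bound is lo itself (lo == hi there).
  upper <- ifelse(status == 0, Inf, ifelse(status == 3, hi, lo))
  data.frame(lower = lower, upper = upper, status = status)
}

four <- data.frame(group  = c("A", "A", "B", "B"),
                   lo     = c(1.5, 0.8, 2.0, 3.1),
                   hi     = c(1.5, 0.8, 2.6, 3.1),
                   status = c(1, 2, 3, 0))
b <- interval_bounds(four$lo, four$hi, four$status)
```

Only the status 3 row uses a different value in the third column; in every other row the two numeric columns match, as the convention requires.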
- zeroscore

If zeroscore="estimate" is specified, a number close to zero is derived to replace all zeroes for subsequent log-scale analyses. A spline fit (using spline with method="natural") of the log of the response vector on the original response vector is performed. The zeroscore is then derived from the log-scale value of the spline curve at the original-scale value of zero. This approach comes from the concept of arithmetic-logarithmic scaling discussed in Tukey, Ciminera, and Heyse (1985).

- addconstant

If addconstant="simple" or addconstant="VR" is specified, a number is derived and added to all response values.

  - "simple": Taken from the "white" book on S (Chambers and Hastie, 1992), page 68. The range (max - min) of the response values is multiplied by 0.0001 to derive the number to add to all the response values.

  - "VR": Based on the logtrans function discussed in Venables and Ripley (2002), pages 171-172, and available in the MASS package. The algorithm applies a Box-Cox profile likelihood approach with a log scale translation model.
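The zeroscore="estimate" idea can be sketched in base R with spline(..., method = "natural", xout = 0), which extrapolates linearly beyond the data range. This is a minimal sketch under stated assumptions, not cg's implementation: the hypothetical helper estimate_zeroscore() excludes the zeros from the fit (their log is -Inf) and back-transforms the spline value at zero with exp().

```r
# Hypothetical helper (not part of cg): sketch of zeroscore = "estimate".
# Assumptions: zeros are left out of the fit, and the spline's log-scale
# value at response = 0 is back-transformed with exp().
estimate_zeroscore <- function(y) {
  pos <- y[y > 0]
  # Natural spline of log(response) on response, evaluated at 0;
  # outside the data range a natural spline extrapolates linearly.
  fit <- spline(pos, log(pos), method = "natural", xout = 0)
  exp(fit$y)
}

y <- c(0, 0, 0.4, 1.1, 2.5, 4.8, 9.6)
zs <- estimate_zeroscore(y)
y_log <- log(ifelse(y == 0, zs, y))  # zeros replaced before taking logs
```

Because the log curve is increasing, the extrapolated value lands below the smallest positive response, so zeros are mapped to a small positive number rather than -Inf.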
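The two addconstant options can be compared with a short base R sketch. The helpers simple_constant() and vr_constant() are hypothetical. The "simple" rule follows the description above directly. For "VR", the Details only state a Box-Cox profile likelihood with a log translation model as in MASS::logtrans; the sketch maximizes the profile log-likelihood of log(y + alpha) ~ group over a grid, and both the grid and the one-factor linear model are assumptions.

```r
# Hypothetical helpers (not part of cg) illustrating the addconstant options.

# "simple": 0.0001 times the range (max - min) of the responses.
simple_constant <- function(y) 0.0001 * diff(range(y))

# "VR"-style sketch: profile log-likelihood for alpha in the translation
# model log(y + alpha) ~ group, maximized over a grid of alpha values.
vr_constant <- function(y, group, alpha = seq(0.01, 5, by = 0.01)) {
  stopifnot(all(y + min(alpha) > 0))    # every shifted response must be > 0
  qx <- qr(model.matrix(~ group))
  n <- length(y)
  loglik <- vapply(alpha, function(a) {
    yt <- log(y + a)
    rs <- qr.resid(qx, yt)              # residuals from the one-factor fit
    # Normal profile likelihood plus the log-Jacobian of the transformation
    -n / 2 * log(sum(rs^2)) - sum(yt)
  }, numeric(1))
  alpha[which.max(loglik)]
}

y <- c(0, 1.2, 3.5, 0.4, 2.2, 5.1, 0, 7.8)
group <- factor(rep(c("A", "B"), each = 4))
c_simple <- simple_constant(y)  # 0.0001 * (7.8 - 0) = 0.00078
c_vr <- vr_constant(y, group)
```

The "simple" constant is tiny relative to the data, while the likelihood-based choice adapts to the spread of the responses within groups.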
Value
A cgOneFactorData object is returned, with the following slots:
dfr: The original input data frame that is the specified value of the dfr argument.
dfru: Processed version of the input data frame, which will be used for the various evaluation methods.
fmt.dfru: A list version of the input data frame, which will only differ from the …
has.censored: Boolean (TRUE or FALSE) indicating whether the data contain censored observations.
settings: A list of properties associated with the data frame.
Note
Contact cg@billpikounis.net for bug reports, questions, concerns, and comments.
Author(s)
Bill Pikounis [aut, cre, cph], John Oleynick [aut], Eva Ye [ctb]
References
Tukey, J.W., Ciminera, J.L., and Heyse, J.F. (1985). "Testing the Statistical Certainty of a Response to Increasing Doses of a Drug." Biometrics, Volume 41, 295-301.
Chambers, J.M., and Hastie, T.J. (1992). Statistical Models in S. Chapman & Hall/CRC.
Venables, W.N., and Ripley, B.D. (2002). Modern Applied Statistics with S, Fourth edition. Springer.
See Also
Surv, canine, gmcsfcens, prepare
Examples
library(cg)
## Uncensored data in "groupcolumns" format
data(canine)
canine.data <- prepareCGOneFactorData(canine, format="groupcolumns",
analysisname="Canine",
endptname="Prostate Volume",
endptunits=expression(plain(cm)^3),
digits=1, logscale=TRUE, refgrp="CC")
## Censored Data
data(gmcsfcens)
gmcsfcens.data <- prepareCGOneFactorData(gmcsfcens, format="groupcolumns",
analysisname="cytokine",
endptname="GM-CSF (pg/ml)",
logscale=TRUE)