logreg.myown {LogicReg} | R Documentation |
Writing your own Logic Regression scoring function
Description
Help file for writing your own scoring function for
logreg
!
Usage
logreg.myown()
Details
You can write your own scoring function for logreg
! This may
be useful if you have a model other than those which we already
programmed in.
Essentially you need to provide two routines in the file
Myownscoring.f:
(i) A routine Myownfitting which fits your model: it
provides a coefficient (beta) for each of the logic trees and provides
a score of how good the model is. Low scores are good. (So add a minus
sign if your score is a log-likelihood.)
(ii) A routine Myownscoring which - given the betas -
provides the score of your model. [If you don't use cross-validation,
this second routine is not needed, though some dummy routine to
satisfy the compiler should still be provided.]
After recompilation, you can fit your model using the option
type = 0
in logreg
. Below we give an example for a version
of the My.own functions for conditional logistic regression which are
also provided as inst/condlogic.ff when you downloaded the files.
PROGRAMMING DETAILS
Below is a list of variables that are passed on. Most of them are as
you expect - response, predictors (binary ones and continuous ones),
number of cases, number of predictors. In addition there are two
columns - dcph
and weight
- that can either be used to pass on an
auxiliary variable for each case (discrete for
dcph
and continuous for
weight
), or even some overall auxiliary variables - as these numbers
are not used anywhere else. If you do not need any of the variables -
just ignore them!
prtr
:
the predictions of the logic trees in the current model: this is an
integer matrix of size n1
times ntr
- although only the
first nop
columns contain useful information.
rsp
:
the response variable: this is a real (single precision) vector of
length n1
.
dcph
:
censor times: this is an integer vector of length n1
this could
be used as an auxiliary (integer) vector - as it is just passed on.
(There is no check that this is a 0/1 variable, when you use your own
scoring function.) For example, you could use this to pass on
something like cluster membership.
weight
:
weights for the cases this is a real vector of length n1
. this
could be used as an auxiliary (real) vector - as it is just passed on.
There is no check that these numbers are positive, when you choose
your own scoring function.
ordrs
:
the order (by response size) of the cases This is an integer vector of
length n1
. For the case with the smallest response this one is
1, for the second smallest 2, and so on. Ties are resolved arbitrary.
Always computed, although only used for proportional hazards
models. Use it as you wish.
n1
:
the total number of cases in the data.
ntr
:
the number of logic trees ALLOWED in the tree.
nop
:
the number of logic trees in the CURRENT model. The subroutines
should work if nop
is 0.
wh
:
the index of the tree that has been edited in the last move - i.e. the
column of prtr
that has changes since the last call.
nsep
:
number of variables that get fit a separate parameter The subroutines
should work if nsep
is 0.
seps
:
array of the above variables - this is a single precision matrix of
size nsep
times n1
. Note that seps
and
prtr
are stored in different directions.
For Myownfitting you should return:
betas
:
a vector of parameters of the model that you fit. betas(0)
should be the parameter for the intercept betas(1:nsep)
should
be the parameters for the continuous variables in seps
betas((nsep+1):(nsep+nop))
should be the parameters for the
binary trees in prtr if you have more parameters, use dcph
, or
weight
; these variables will not be printed however.
score
:
whatever score you assign to your model small should be good
(i.e. deviance or -log.likelihood).
reject
:
an indicator whether or not to reject the proposed move *regardless*
of the score (for example when an iteration necessary to determine the
score failed to converge (0 = move is OK ; 1 = reject move) set this
one to 0 if there is no such condition.
You are allowed to change the values of dcph, and weight.
For Myownscoring additional input is:
betas
:
the coefficients
You should return:
score
:
whatever score you assign to your model small should be good
(i.e. deviance or -log.likelihood). If the model "crashes", you should
simply return a very large number.
While we try to prevent that models are singular, it is possible that
for your model a single or degenerate model is passed on for
evaluation. For Myownfitting you can pass the model back with
reject = 1
, for Myownscoring you can pass it on with a very large
value for score
. Currently Myownscoring.f contains empty frames for
the scoring functions; condlogic.ff contains an example with
conditional logistic regression.
The logic regression program is written in Fortran 77.
CONDITIONAL LOGISTIC REGRESSION
A function for a conditional logistic regression score function is attached as an example function on how to write your own scoring function for Logic Regression. Obviously you can also use it if you have conditional logistic data.
Conditional logistic regression is common model fitting technique for matched case-control studies, in which each case is matched with one or more controls. (In conditional logistic regression several cases could be matched to several controls, in the implementation provided here only one case can be matched with each group of controls.) Conditional logistic regression models are parameterized like regular logistic regression models, except that the intercept is no longer identifiable. See, for example, Breslow and Day - Volume 1 (1990, Statistical Methods in Cancer Research, International Agency for Research on Cancer, Lyon) for details. Conditional logistic regression models are most easily fit using a stratified proportional hazards model (if there is one-to-one case-control matching it can also be fit using logistic regression, but that method breaks down if there is more than one control per case). Each group of a case and controls is one stratum. All cases get an arbitrarily event time of 1.00, and all controls get a censoring time of 2.00.
In our implementation we use the response column to indicate the matching. For all controls this column is 0, for a case it is k, indicating that the next k records are the matched controls for the current case. Thus, we order our cases so that each case is followed by its controls. Cases with a negative response are put in a stratum -1, which is not used in any computations. This has implications for cross-validation. See below.
In Myownfitting and Myownscoring we first allocate
various vectors (strata, index, censoring variable) that are local, as
well as some work arrays that are used by our fitting routines. (We
need to set some of the parameters for that, see the help page of
logreg
for details.) We then define idx(j)=j
for
j=1,n1
, and we define the strata
and delta
vectors. We use slightly modified versions of the proportional
hazards routines that are already used otherwise in the Logic
Regression program, to include stratification. After the model is
fitted, we assign minus the partial likelihood to score(1)
and
(for Myownfitting) we pass on the betas.
Recompile after replacing Myownscoring.f by condlogic.ff
The permutation and null model versions are not directly usable (we
could do some permutation tests, but they require more programming),
but we can use cross-validation. Obviously we should keep cases and
controls match. To that extend, we would run permutation with a
negative seed (see logreg
) and we would take care ourselves
that case-control groups are in a random order, and that every block
has the same number of records. We achieve the later by adding some
records with response -1. In particular, suppose that we have 19
pairs of case- (single) control data, and that we want to do 3-fold
cross validation. We would permute the sequence of the 19 pairs, and
add two records with response -1 after the 13th pair, and two records
with -1 at the end of the file, so that the total data file would have
42 records.
Author(s)
Ingo Ruczinski ingo@jhu.edu and Charles Kooperberg clk@fredhutch.org.
References
Ruczinski I, Kooperberg C, LeBlanc ML (2003). Logic Regression, Journal of Computational and Graphical Statistics, 12, 475-511.
Ruczinski I, Kooperberg C, LeBlanc ML (2002). Logic Regression - methods and software. Proceedings of the MSRI workshop on Nonlinear Estimation and Classification (Eds: D. Denison, M. Hansen, C. Holmes, B. Mallick, B. Yu), Springer: New York, 333-344.
Kooperberg C, Ruczinski I, LeBlanc ML, Hsu L (2001). Sequence Analysis using Logic Regression, Genetic Epidemiology, 21, S626-S631.
Selected chapters from the dissertation of Ingo Ruczinski, available from https://research.fredhutch.org/content/dam/stripe/kooperberg/ingophd-logic.pdf
See Also
Examples
logreg.myown() # displays this help file
help(logreg.myown) # equivalent