ohenery {ohenery} R Documentation

The 'ohenery' package.

Description

Modeling of ordinal outcomes via the softmax function under the Harville and Henery models.

Harville and Henery models

The Harville and Henery models describe the probability of ordered outcomes in terms of some parameters. Typically the ordered outcomes are things like place in a race, or winner among a large number of contestants. The Harville model could be described as a softmax probability for the first place finish, with a recursive model on the remaining places. The Henery model generalizes that to adjust the remaining places with another parameter.

These are best illustrated with an example. Suppose you observe a race of 20 contestants. Contestant number 11 takes first place, number 6 takes second place, and 17 takes third place, while the fourth through twentieth places are not recorded or not of interest. Under the Harville model, the probability of this outcome can be expressed as

$$\frac{\mu_{11}}{\sum_i \mu_i} \, \frac{\mu_6}{\sum_{i \ne 11} \mu_i} \, \frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i},$$

where $\mu_i = \exp(\eta_i)$. In a softmax regression under the Harville model, one expresses the odds as $\eta_i = x_i^{\top}\beta$, where $x_i$ are independent variables, for some $\beta$ to be fit by the regression.
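The product above can be checked numerically. The following is a minimal sketch in plain Python, written for illustration only; it does not use the package's R interface, and the function name `harville_prob` and the example scores are hypothetical.

```python
import math

def harville_prob(eta, order):
    """Probability of an ordered finish under the Harville model.

    eta   -- linear scores eta_i, one per contestant
    order -- 0-based contestant indices for the recorded places,
             first place first
    """
    mu = [math.exp(e) for e in eta]      # mu_i = exp(eta_i)
    remaining = set(range(len(mu)))      # contestants not yet placed
    prob = 1.0
    for winner in order:
        denom = sum(mu[i] for i in remaining)
        prob *= mu[winner] / denom       # softmax over those still in the running
        remaining.remove(winner)         # a placed contestant leaves the pool
    return prob

# Hypothetical race: four contestants with equal scores.
# The first two places then have probability (1/4) * (1/3).
p_equal = harville_prob([0.0, 0.0, 0.0, 0.0], order=[2, 0])
```

Only the recorded places enter the product, so passing a three-element `order` for a field of twenty corresponds to the example in the text.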

Under the Henery model, one adds gammas, $\gamma_2, \gamma_3, \ldots$ such that the probability of the outcome above is

$$\frac{\mu_{11}}{\sum_i \mu_i} \, \frac{\mu_6^{\gamma_2}}{\sum_{i \ne 11} \mu_i^{\gamma_2}} \, \frac{\mu_{17}^{\gamma_3}}{\sum_{i \ne 11, i \ne 6} \mu_i^{\gamma_3}}.$$

There is no reason to model a $\gamma_1$ as anything but one, since it would be redundant. The Henery softmax regression estimates the $\beta$ as well as the $\gamma_j$. To simplify the regression, the higher order gammas are assumed to equal the last fit value. That is, we usually model $\gamma_5 = \gamma_4 = \gamma_3$.
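As a rough illustration of how the gammas enter, the sketch below computes the Henery probability in plain Python. It is not the package's R interface: the function name `henery_prob`, the argument layout, and the example numbers are assumptions made for this example. It fixes the first-place gamma at one and reuses the last supplied gamma for all later places, as described above.

```python
import math

def henery_prob(eta, order, gammas):
    """Probability of an ordered finish under the Henery model.

    eta    -- linear scores eta_i, one per contestant
    order  -- 0-based contestant indices for the recorded places
    gammas -- [gamma_2, gamma_3, ...]; gamma_1 is fixed at 1 and the
              last value is reused for any later place
    """
    all_gammas = [1.0] + list(gammas)
    remaining = set(range(len(eta)))
    prob = 1.0
    for place, winner in enumerate(order):
        g = all_gammas[min(place, len(all_gammas) - 1)]
        # mu_i ** gamma equals exp(gamma * eta_i)
        powered = {i: math.exp(g * eta[i]) for i in remaining}
        prob *= powered[winner] / sum(powered.values())
        remaining.remove(winner)
    return prob

# Hypothetical scores for four contestants; gamma_2 = 0.8, gamma_3 = 0.6.
eta_example = [0.5, -0.2, 0.1, 0.0]
p_henery = henery_prob(eta_example, order=[0, 2, 1], gammas=[0.8, 0.6])

# With every gamma equal to one the Henery model reduces to Harville.
p_harville = henery_prob(eta_example, order=[0, 2, 1], gammas=[1.0])
```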

The regression supports weighted estimation as well. The weights are applied to the places, not to the participants. For the example above, the weighted likelihood under the Harville model is

$$\left(\frac{\mu_{11}}{\sum_i \mu_i}\right)^{w_1} \left(\frac{\mu_6}{\sum_{i \ne 11} \mu_i}\right)^{w_2} \left(\frac{\mu_{17}}{\sum_{i \ne 11, i \ne 6} \mu_i}\right)^{w_3}.$$

The weighting mechanism is how this package deals with unobserved places. Rather than marking all runners-up as tied for fourth place, in this case one sets the $w_i = 0$ for $i > 3$. The regression is then not asked to make distinctions between the tied runners-up.
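The effect of zero weights can be seen in a small sketch of the weighted Harville log likelihood. This is plain Python for illustration, not the package's R interface; the function name `weighted_harville_loglik` and the example scores are hypothetical. Taking logs turns each exponent $w_j$ into a multiplier, so a place with weight zero contributes nothing.

```python
import math

def weighted_harville_loglik(eta, order, weights):
    """Weighted log likelihood of an ordered finish under the Harville model.

    eta     -- linear scores eta_i, one per contestant
    order   -- 0-based contestant indices, first place first
    weights -- one weight per place (not per contestant)
    """
    mu = [math.exp(e) for e in eta]
    remaining = set(range(len(mu)))
    loglik = 0.0
    for winner, w in zip(order, weights):
        denom = sum(mu[i] for i in remaining)
        loglik += w * (math.log(mu[winner]) - math.log(denom))
        remaining.remove(winner)
    return loglik

# Hypothetical five-contestant race: only the top three places are observed,
# so the fourth and fifth places get weight zero.
eta_example = [0.2, -0.4, 0.0, 0.9, 0.3]
place_weights = [1.0, 1.0, 1.0, 0.0, 0.0]

# These two orderings differ only in the unobserved places.
ll_a = weighted_harville_loglik(eta_example, [4, 1, 3, 0, 2], place_weights)
ll_b = weighted_harville_loglik(eta_example, [4, 1, 3, 2, 0], place_weights)
```

Here `ll_a` and `ll_b` agree, because the arbitrary ordering assigned to the unobserved runners-up never enters the sum.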

Breaking Changes

This package is a work in progress. Expect breaking changes. Please file any bug reports or issues at https://github.com/shabbychef/ohenery/issues.

Legal Mumbo Jumbo

ohenery is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

Note

This package is maintained as a hobby.

Author(s)

Steven E. Pav shabbychef@gmail.com

References

Harville, D. A. "Assigning probabilities to the outcomes of multi-entry competitions." Journal of the American Statistical Association 68, no. 342 (1973): 312-316. http://dx.doi.org/10.1080/01621459.1973.10482425

Henery, R. J. "Permutation probabilities as models for horse races." Journal of the Royal Statistical Society: Series B (Methodological) 43, no. 1 (1981): 86-91. http://dx.doi.org/10.1111/j.2517-6161.1981.tb01153.x


[Package ohenery version 0.1.1 Index]