R: Find Cutoff Values.

findCutoffs {QuantileGradeR}

R Documentation

Find Cutoff Values.

Description

findCutoffs applies a quantile adjustment to inspection scores within a jurisdiction's subunits (e.g. ZIP codes) and creates a data frame of cutoff values to be used for grading restaurants or other inspected entities.

Usage

findCutoffs(X, z, gamma, resolve.ties = TRUE, restaurant.tol = 10,
  max.iterations = 20)

Arguments

`X`	Numeric matrix of size `n` x `p`, where `n` is the number is restaurants to be graded and `p` is the number of inspections to be used in grade assignment. Entry `X[i,j]` represents the inspection score for the `i`th restaurant in the `j`th most recent inspection.
`z`	Character vector of length `n` representing ZIP codes (or other subunits within a jurisdiction). `z[i]` is the ZIP code corresponding to the restaurant with inspection scores in row `i` of `X`.
`gamma`	Numeric vector representing absolute grade cutoffs. Entries in gamma should be increasing, with `gamma[1] <= gamma[2]` etc (this is related to the "Warning" section and larger scores being associated with higher risk).
`resolve.ties`	Boolean value that determines the definition of quantile to be used after optimal quantiles have been found with the `percentileSeek` function. See Modes below, as well as Appendix J of Ho, D.E., Ashwood, Z.C., and Elias, B. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County".
`restaurant.tol`	An integer indicating the maximum difference in the number of restaurants in a grading category between the unadjusted and adjusted grading algorithms (for the top `length(gamma)` grading categories).
`max.iterations`	The maximum number of iterations that the iterative algorithm (carried out by the internal `percentileSeek` function) should run in order to find optimal quantiles for ZIP cutoffs. The iterative algorithm is described in more detail below.

Details

In our documentation, we use the language "ZIP code" and "restaurant", however, our grading algorithm and our code can be applied to grade other inspected entities; and quantile cutoffs can be sought in subunits of a jurisdiction that are not ZIP codes. For example, it may make sense to search for quantile cutoffs in an inspector's allocated inspection area or within a census tract. We chose to work with ZIP codes in our work because area assignments for inspectors in King County (WA) tend to be single or multiple ZIP codes, and we desired to assign grades based on how a restaurant's scores compare to other restaurants assessed by the same inspector. We could have calculated quantile cutoffs in an inspector's allocated area, but inspector areas are not always contiguous. Because food choices are generally local, ZIP codes offer a transparent and meaningful basis for consumers to distinguish establishments. Where "ZIP code" is referenced, please read "ZIP code or other subunit of a jurisdiction" and "restaurant" should read "restaurant or other entity to be graded".

findCutoffs takes in a vector of cutoff scores, gamma, a matrix of restaurants' scores, X, and a vector corresponding to restaurants' ZIP codes, z, and outputs a data frame of cutoff scores to be used in the gradeAllBus function to assign grades to restaurants. findCutoffs first carries out "unadjusted grading" and compares restaurants' most recent routine inspection scores to the raw cutoff scores contained in gamma and assigns initial grades to restaurants. Grade proportions in this scheme are then used as initial quantiles to find quantile cutoffs in each ZIP code (or quantile cutoffs accommodating for the presence of score ties in the ZIP code, depending on the value of resolve.ties; see the Modes section). Restaurants are then graded with the ZIP code quantile cutoffs, and grading proportions are compared with grading proportions from the unadjusted system. Quantiles are iterated over one at a time (by the internal percentileSeek function, which uses a binary search root finding method) until grading proportions with ZIP code quantile cutoffs are within a certain tolerance (as determined by restaurant.tol) of the unadjusted grading proportions. This iterative step is important because of the discrete nature of the inspection score distribution, and the existence of large numbers of restaurants with the same inspection scores.

The returned ZIP code cutoff data frame has one row for each unique ZIP code and has (length(gamma)+1) columns, corresponding to one column for the ZIP code name, and (length(gamma)) cutoff scores separating the (length(gamma)+1) grading categories. Across each ZIP code's row, cutoff scores increase and we assume, as in the King County (WA) case, that greater risk is associated with larger inspection scores. (If scores are decreasing in risk, users should transform inspection scores with a simple function such as f(score) = - score before using any of the functions in QuantileGradeR.)

Modes

When resolve.ties = TRUE, in order to calculate quantile cutoffs in a ZIP code, we alter the definition of quantile from the usual "Nearest Rank" definition and use the "Quantile Adjustment (with Ties Resolution)" definition that is discussed in Appendix J of Ho, D.E., Ashwood, Z.C., and Elias, B. "Improving the Reliability of Food Safety Disclosure: A Quantile Adjusted Restaurant Grading System for Seattle-King County" (working paper). In particular, once we have found the optimal set of quantiles to be applied across ZIP codes, p, with the percentileSeek function, instead of returning (for B/C cutoffs, for example) the scores in each ZIP code that result in at least (p[2] x 100)% of restaurants in the ZIP code scoring less than or equal to these cutoffs, the mode resolve.ties = TRUE takes into account the ties that exist in ZIP codes. Returned scores for A/B cutoffs are those that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the A/B cutoff to the desired percentage, (p[1] x 100)%. Similarly, B/C cutoffs are the scores in the ZIP code that result in the closest percentage of restaurants in the ZIP code scoring less than or equal to the B/C cutoff and more than the A/B cutoff to the desired percentage, ((p[2] - p[1]) x 100)%.

When resolve.ties = FALSE, we use the usual "Nearest Rank" definition of quantile when applying the optimal quantiles, p, across ZIP codes.

Warning

findCutoffs will produce cutoff scores even for ZIP codes with only one restaurant: situations in which a quantile adjustment shouldn't be used. It is the job of the user to ensure that, if using the findCutoffs function, it makes sense to do so. This may involve only performing the quantile adjustment on larger ZIP codes and providing absolute cutoff points for smaller ZIP codes, or may involve aggregating smaller ZIP codes into a larger geographical unit and then performing the quantile adjustment on the larger area (the latter approach is the one we adopted).

As mentioned previously, findCutoffs was created for an inspection system that associates greater risk with larger inspection scores. If the inspection system of interest associates greater risk with reduced scores, it will be neccessary to perform a transformation of the scores matrix before utilizing the findCutoffs function. However a simple function such as f(score) = - score would perform the necessary transformation.

Examples


## ==== Quantile-Adjusted Grading =====
## ZIP Code Cutoffs
# In King County, meaningful scores in the inspection system are 0 and 30:
# more than 50% of restaurants score 0 points in a single inspection round,
# and 30 is the highest score that a restaurant can be assigned before it is
# subject to a return inspection, hence these values form our gamma vector.
# The output dataframe, zipcode.cutoffs.df, has ten rows and three columns: one
# row for every unique ZIP code in zips.kc, one column for the ZIP name, the
# second column for the A/B cutoff (Gamma.A) and the third column for the B/C
# cutoff (Gamma.B).

 zipcode.cutoffs.df <- findCutoffs(X.kc, zips.kc, gamma = c(0, 30))

## ==== Traditional Grading Systems ====
## ZIP Code Cutoffs
# Traditional (unadjusted) restaurant grading systems use the same cutoff scores
# for all ZIP codes. To allow comparison, an unadjusted ZIP code cutoff frame
# for King County is generated by the internal createCutoffsDF function:

 unadj.cutoffs.df <- createCutoffsDF(X.kc, zips.kc, gamma = c(0, 30), type = "unadj")

[Package QuantileGradeR version 0.1.1 Index]