smahal {DOS} R Documentation

Robust Mahalanobis Distance Matrix for Optimal Matching

Description

Computes a robust Mahalanobis distance matrix between treated individuals and potential controls. The usual Mahalnaobis distance may ignore a variable because it contains one extreme outlier, and it pays too much attention to rare binary variables, but smahal addresses both issues.

Usage

smahal(z, X)


Arguments

 z z is a vector that is 1 for a treated individual and 0 for a control. X A matrix of continuous or binary covariates. The number of rows of X must equal the length of z.

Details

The usual Mahalnaobis distance may ignore a variable because it contains one extreme outlier, and it pays too much attention to rare binary variables, but smahal addresses both issues.

To address outliers, each column of x is replaced by a column of ranks, with average ranks used for ties. This prevents one outlier from inflating the variance for a column, thereby making large differences count little in the Mahalanobis distance.

Rare binary variables have very small variances, p(1-p) for small p, so in the usual Mahalanobis distance, a mismatch for a rare binary variable is counted as very important. If you were matching for US states as 49 binary variables, mismatching for California would not be very important, because p(1-p) is not so small, but mismatching for Wyoming is very important because p(1-p) is very small. To combat this, the variances of the ranked columns are rescaled so they are all the same, all equal to the variance of untied ranks. See Chapter 8 of Design of Observational Studies (2010).

Value

The robust distance matrix has one row for each treated individual (z=1) and one column for each potential control (z=0). The row and column names of the distance matrix refer to the position in z, 1, 2, ..., length(z).

Author(s)

Paul R. Rosenbaum

References

Rosenbaum, P. R. (2010). Design of Observational Studies. New York: Springer. The method and example are discussed in Chapter 8.

Examples

data(costa)
z<-1*(costa$welder=="Y") aa<-1*(costa$race=="A")
smoker=1*(costa$smoker=="Y") age<-costa$age
x<-cbind(age,aa,smoker)
dmat<-smahal(z,x)
# Mahalanobis distances
round(dmat[,1:6],2) # Compare with Table 8.6 in Design of Observational Studies (2010)
# Impose propensity score calipers
prop<-glm(z~age+aa+smoker,family=binomial)\$fitted.values # propensity score
# Mahalanobis distanced penalized for violations of a propensity score caliper.
# This version is used for numerical work.