L1.meas {cem}

R Documentation

Evaluates L1 distance between multidimensional histograms

Description

Evaluates the L1 distance between the multidimensional histograms of the groups being compared, as a measure of imbalance.

Usage

L1.meas(group, data, drop=NULL, breaks = NULL, weights, grouping = NULL)

Arguments

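`group`	the group variable: a vector giving the group of each observation (e.g. LL$treated in the Examples)
`data`	the data frame containing the variables
`drop`	a vector of variable names in the data frame to ignore
`breaks`	a list of vectors of cutpoints; if not specified, an automatic choice is made (see Details)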
`weights`	weights
`grouping`	named list, each element of which is a list of groupings for a single categorical variable. See Details.

Details

This function calculates the L1 distance on the k-dimensional histogram in order to measure the level of imbalance in a matching solution.
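The computation described above can be sketched in base R. This is an illustration of the idea only, not the cem source code: the helper name l1_sketch, its arguments, and the toy data are all hypothetical, and it assumes exactly two groups, numeric variables, and user-supplied cutpoints. The L1 value is taken as half the sum of absolute differences between the two groups' relative cell frequencies.

```r
# Minimal base-R sketch of an L1 imbalance measure (not the cem implementation).
# l1_sketch() and its arguments are hypothetical names used for illustration.
l1_sketch <- function(group, data, breaks) {
  # Coarsen each variable with its own cutpoints.
  binned <- Map(function(x, b) cut(x, breaks = b, include.lowest = TRUE),
                data[names(breaks)], breaks)
  # Label every observation with the k-dimensional cell it falls into.
  cell <- interaction(binned, drop = TRUE)

  # Relative frequency of each cell within each group (columns sum to 1).
  freq <- prop.table(table(cell, group), margin = 2)

  # Half the sum of absolute differences between the two histograms.
  0.5 * sum(abs(freq[, 1] - freq[, 2]))
}

# Toy data: two groups of 50 observations that differ in the mean of x.
set.seed(1)
toy <- data.frame(x = c(rnorm(50, 0), rnorm(50, 1)),
                  y = runif(100))
g <- rep(c(0, 1), each = 50)
l1_sketch(g, toy, breaks = list(x = seq(-5, 6, by = 1),
                                y = seq(0, 1, by = 0.25)))
```

Under this definition the result is 0 when both groups have identical cell frequencies and 1 when no cell contains observations from both groups.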

If breaks is not specified, the Scott automated bin calculation is used (which coarsens less than Sturges, the rule used in cem); please refer to the cem help page. The breaks chosen this way are then used to calculate the L1 measure.
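To get a feel for the two binning rules, the following sketch compares them on simulated data using the implementations that ship with base R (grDevices and graphics). It is an assumption that these match what cem does internally; the variable names are illustrative only.

```r
# Compare the Scott and Sturges binning rules on simulated data.
# These are the base-R implementations; whether cem calls exactly these
# functions internally is not stated in this help page.
set.seed(1)
x <- rnorm(1000)

n_scott   <- grDevices::nclass.scott(x)    # bins suggested by Scott's rule
n_sturges <- grDevices::nclass.Sturges(x)  # bins suggested by Sturges' rule
print(c(scott = n_scott, sturges = n_sturges))

# Cutpoints a histogram would use under each rule (no plot is drawn).
br_scott   <- graphics::hist(x, breaks = "Scott",   plot = FALSE)$breaks
br_sturges <- graphics::hist(x, breaks = "Sturges", plot = FALSE)$breaks
```

More bins means a finer coarsening, so comparing the two counts shows which rule coarsens less for a given variable.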

When choosing breaks for L1, a very fine coarsening (many cutpoints) produces values of L1 close to 1. A very mild coarsening (very few cutpoints) is not able to discriminate, i.e. L1 is close to 0 (this is particularly true when the number of observations is small with respect to the number of continuous variables).
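The effect of the number of cutpoints can be seen in a small simulation. The sketch below is not cem code: the helper l1_equal_bins is hypothetical, uses equal-width bins for every variable, and assumes two groups. Both groups are drawn from the same distribution, so any imbalance reported is an artifact of the coarsening.

```r
# Hypothetical helper: L1 with the same number of equal-width bins per variable.
l1_equal_bins <- function(group, data, nbins) {
  binned <- lapply(data, function(x) cut(x, breaks = nbins))
  cell <- interaction(binned, drop = TRUE)
  freq <- prop.table(table(cell, group), margin = 2)
  0.5 * sum(abs(freq[, 1] - freq[, 2]))
}

# Two groups of 100 observations drawn from the SAME distribution.
set.seed(1)
n <- 100
toy <- data.frame(x1 = rnorm(2 * n), x2 = rnorm(2 * n), x3 = rnorm(2 * n))
g <- rep(c(0, 1), each = n)

# L1 for 2, 4 and 50 bins per variable.
res <- sapply(c(coarse = 2, medium = 4, fine = 50),
              function(k) l1_equal_bins(g, toy, k))
print(res)
```

With 50 bins per variable almost every observation sits alone in its cell, which pushes the value toward 1 even though the groups are comparable.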

The grouping option is a list where each element is itself a list. For example, suppose that variable quest1 has the possible levels "no answer", NA, "negative", "neutral", "positive" and you want to collect ("no answer", NA, "neutral") into a single group; then the grouping argument should contain list(quest1=list(c("no answer", NA, "neutral"))). Or, if you have a discrete variable elements with values 1:10 and you want to collect them into the groups “1:3,NA”, “4”, “5:9”, “10”, you specify the following list in grouping: list(elements=list(c(1:3,NA), 5:9)). Values not listed in the grouping are left as they are. If cutpoints and groupings are defined for the same variable, the groupings take precedence and the corresponding cutpoints are set to NULL.
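The structure of the grouping list, and its effect on a variable, can be illustrated with a small base-R sketch. The helper apply_grouping and the labels it generates are hypothetical and are not taken from cem; only the shape of the grouping list follows the text above.

```r
# Grouping specification in the form described above.
grouping <- list(elements = list(c(1:3, NA), 5:9))

# Hypothetical helper: collapse the values of x according to one variable's
# grouping. Values not mentioned in any group are left as they are.
apply_grouping <- function(x, groups) {
  out <- as.character(x)
  for (g in groups) {
    # Label chosen for illustration only; NA appears as "NA" in the label.
    label <- paste(g, collapse = ",")
    out[x %in% g] <- label   # %in% also matches NA when NA is in the group
  }
  factor(out)
}

elements <- c(1:10, NA)
grouped <- apply_grouping(elements, grouping$elements)
print(table(grouped))
```

Here the values 1:3 and NA end up in one group, 5:9 in another, and 4 and 10 remain separate, giving four groups in total.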

The L1.profile function shows how to compare matching solutions for any level of (i.e., without regard to) coarsening.

This function also calculates the Local Common Support (LCS) measure, which is the proportion of non-empty k-dimensional cells of the histogram that contain at least one observation per group.
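As a worked illustration of this definition, the sketch below computes the proportion from a vector of cell labels and a group vector. The helper lcs_sketch is hypothetical and returns a fraction between 0 and 1; it is not the cem implementation.

```r
# Hypothetical helper: share of non-empty cells that contain every group.
lcs_sketch <- function(group, cell) {
  tab <- table(cell, group)
  nonempty <- rowSums(tab) > 0               # cells with any observation
  shared   <- rowSums(tab > 0) == ncol(tab)  # cells where every group appears
  sum(shared) / sum(nonempty)
}

# Cells "a" and "c" contain both groups; "b" and "d" contain only one.
cell  <- c("a", "a", "b", "c", "c", "d")
group <- c(0, 1, 0, 0, 1, 1)
lcs_sketch(group, cell)  # 2 of the 4 non-empty cells are shared: 0.5
```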

Value

An object of class L1.meas, which is a list with the following fields:

`L1`	The numerical value of the L1 measure
`breaks`	A list of cutpoints used to calculate the L1 measure
`LCS`	The numerical value of the Local Common Support proportion
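A short sketch of how these fields might be used follows. It assumes the cem package is installed and is skipped otherwise; reusing the returned breaks in a second call relies only on the breaks argument and the breaks field documented on this page.

```r
# Runs only if the cem package is available.
if (requireNamespace("cem", quietly = TRUE)) {
  data("LL", package = "cem")

  # First call: cutpoints are chosen automatically.
  imb <- cem::L1.meas(LL$treated, LL, drop = c("treated", "re78"))
  print(imb$L1)    # value of the L1 measure
  print(imb$LCS)   # Local Common Support
  str(imb$breaks)  # cutpoints that were used

  # Second call: reuse the same cutpoints so that the two L1 values
  # are computed on the same histogram.
  imb2 <- cem::L1.meas(LL$treated, LL, drop = c("treated", "re78"),
                       breaks = imb$breaks)
}
```

Keeping the cutpoints fixed in this way is what makes L1 values comparable across data sets or matching solutions.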

Author(s)

Stefano Iacus, Gary King, and Giuseppe Porro

References

Iacus, King, Porro (2011) doi:10.1198/jasa.2011.tm09599

Iacus, King, Porro (2012) doi:10.1093/pan/mpr013

Iacus, King, Porro (2019) doi:10.1017/pan.2018.29

Examples

data(LL)
set.seed(123)
# Imbalance between the groups in LL$treated, ignoring the group indicator and re78
L1.meas(LL$treated, LL, drop = c("treated", "re78"))

[Package cem version 1.1.31 Index]