| whatif {WhatIf} | R Documentation | 
Counterfactual Evaluation
Description
Implements the methods described in King and Zeng (2006, 2007) for
evaluating counterfactuals.  
Usage
whatif(formula = NULL, data, cfact, range = NULL, freq = NULL, nearby = 1, 
distance = "gower", miss = "list", choice = "both", return.inputs = FALSE, 
return.distance = FALSE, mc.cores = detectCores(), ...)
Arguments
| formula | An optional formula without a dependent variable that
is of class "formula" and that follows standard R conventions for
formulas, e.g. ~ x1 + x2.  Allows you to
transform or otherwise re-specify combinations of the variables in
both data and cfact.  To use this
parameter, both data and cfact must be coercible
to data frames; the variables of both data and cfact must be labeled;
and all variables appearing in formula must also appear in both data
and cfact.  Otherwise, errors are returned.  The intercept is
automatically dropped.  Default is NULL. | 
| data | May take one of the following forms:  
 
(1) An R model output object, such as the output from calls to lm, glm,
and zelig.  If it is not a zelig object,
such an output object must be a list.  It must additionally have either
a formula or terms component and either a data or model component; if it
does not, an error is returned.  Of the latter, whatif first looks for
data, which should contain either the original
data set supplied as part of the model call (as in glm)
or the name of this data set (as in zelig), which is
assumed to reside in the global environment.  If data does not
exist, whatif then looks for model, which should
contain the model frame (as in lm).  The intercept is
automatically dropped from the extracted observed covariate data
set if the original model included one.
 
(2) An n-by-k non-character (logical or numeric) matrix or
data frame of observed covariate data with n data points
or units and k covariates.  All desired variable transformations
and interaction terms should be included in this set of k covariates
unless formula is alternatively used to
produce them.  However, an intercept should not be.  Such a matrix
may be obtained by passing model output (e.g., output from a call
to lm) to model.matrix and excluding the intercept
from the resulting matrix if one was
fit.  Note that whatif will attempt to coerce data frames
to their internal numeric values.  Hence, data frames should only
contain logical, numeric, and factor columns; character columns
will lead to an error being returned.
 
(3) A string.  Either the complete path (including file name) of the
file containing the data or the path relative to your working
directory.  This file should be a white space delimited text file.
If it contains a header, you must include a column of row names as
discussed in the help file for the R function read.table.  The data
in the file should be as otherwise described in (2).
 
Missing data is allowed and will be dealt with
via the argument miss.  It should be flagged using R's standard
representation for missing data, NA. | 
| cfact | An R object or a string.  If an R object,
an m-by-k non-character matrix or data frame of
counterfactuals with m counterfactuals and the same k covariates
(in the same order) as in data.  However, if formula is used to
select a subset of the k covariates,
then cfact may contain either only these j <= k covariates or the
complete set of k covariates.  An intercept
should not be included as one of the covariates.  It will be
automatically dropped from the counterfactuals generated by
Zelig if the original model contained one.  Data frames
will again be coerced to their internal numeric values if possible.
If a string, either the complete path (including file name) of the
file containing the counterfactuals or the path relative to your
working directory.  This file should be a white space delimited text
file.  See the discussion under data for instructions on
dealing with a header.  All counterfactuals should be fully
observed: if you supply counterfactuals with missing data, they will
be list-wise deleted and a warning message will be printed to the screen. | 
| range | An optional numeric vector of length k, where k is 
the number of covariates.  Each element represents the range of the corresponding
covariate for use in calculating Gower distances.  Use this argument
when covariate data do not represent the population of interest,
such as selection by stratification or experimental manipulation.
By default, the range of each covariate is calculated
from the data (the difference of its maximum and minimum values in
the sample), which is appropriate when a simple random sampling
design was used.  To supply your own range for the kth covariate,
set the kth element of the vector equal to the desired range
and all other elements equal to NA.  Default is NULL. | 
| freq | An optional numeric vector of any positive length, the elements
of which comprise a set of distances.  Used in calculating
cumulative frequency distributions for the distances of the data
points from each counterfactual.  For each such distance and
counterfactual, the cumulative frequency is the fraction of observed
covariate data points with distance to the counterfactual less
than or equal to the supplied distance value.  The default varies
with the distance measure used.  When the Gower distance measure is employed,
frequencies are calculated for the sequence of Gower distances from
0 to 1 in increments of 0.05.  When the Euclidian distance measure
is employed, frequencies are calculated for the sequence of Euclidian
distances from the minimum to the maximum observed distances in twenty
equal increments, all rounded to two decimal places.  Default is NULL. | 
| nearby | An optional scalar indicating
which observed data points are considered to be nearby (i.e., within ‘nearby’
geometric variances of) the counterfactuals.  Used to calculate the summary statistic
returned by the function: the fraction of the observed data nearby
each counterfactual.  By default, the geometric variance of the
covariate data is used.  For example, setting nearby to
2 will identify the proportion of data points within two geometric variances of a
counterfactual.  Default is 1, as shown under Usage. | 
| distance | An optional string indicating which of two distance measures
to employ.  The choices are either "gower", Gower's non-parametric
distance measure (G^2), which is suitable for both qualitative
and quantitative data; or "euclidian", squared Euclidian distance, which 
is only suitable for quantitative data.  The default is "gower". | 
| miss | An optional string indicating the strategy for dealing
with missing data in the observed covariate data set.
whatif supports two possible missing data strategies: "list",
list-wise deletion of missing cases; and "case",
ignoring missing data case-by-case.  Note that if "case" is
selected, cases with missing values are deleted listwise for the
convex hull test and for computing Euclidian distances, but pairwise deletion is
used in computing the Gower distances to maximally use available
information. The user is strongly encouraged to treat missing data
using specialized tools such as Amelia prior to feeding the data to
whatif.  Default is "list". | 
| choice | An optional string indicating which analyses to 
undertake. The options are either "hull", only perform the convex hull 
membership test; "distance", do not perform the convex
hull test but do everything else, such as calculating the distance between
each counterfactual and data point; or "both", undertake both the
convex hull test and the distance calculations (i.e., do everything).
Default is "both". | 
| return.inputs | A Boolean; should the processed observed
covariate and counterfactual data matrices on which all
whatif computations are performed be returned?  Processing
refers to internal whatif operations such as the subsetting
of covariates via formula, the deletion of cases with
missing values, and the coercion of data frames to numeric matrices.
Primarily intended for diagnostic purposes.  If TRUE, these matrices
are returned as a list.  Default is FALSE. | 
| return.distance | A Boolean; should the matrix of distances
between each counterfactual and data point be returned?  If
TRUE, this matrix is returned as part of the output; if FALSE, it is
not.  Default is FALSE due to the large
size that this matrix may attain. | 
| mc.cores | The number of cores to use for the convex hull test, i.e. at 
most how many child processes will be run simultaneously. Must be at least 
one, and parallelization requires at least two cores. The default is set by
detectCores(). | 
| ... | Further arguments passed to and from other methods. | 
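The following base R sketch illustrates how inputs of the kind described for data, miss, and range might be prepared by hand. It is only an illustration under stated assumptions: the variable names (y, x1, x2) and the objects df, fit, X, raw, and my.range are hypothetical, and the code does not call whatif itself.

```r
## Hypothetical example data; the names y, x1 and x2 are illustrative only.
set.seed(1)
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = runif(20))
df$x2[3] <- NA  # one missing covariate value, flagged as NA

## Form (2) of data: a covariate matrix taken from model output,
## with the intercept column excluded.
fit <- lm(y ~ x1 + I(x2^2), data = df)  # lm drops the incomplete row by default
X <- model.matrix(fit)
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]

## List-wise deletion done by hand on a raw covariate matrix
## (the strategy that miss = "list" describes).
raw <- as.matrix(df[, c("x1", "x2")])
raw.complete <- raw[complete.cases(raw), , drop = FALSE]

## Sample ranges (maximum minus minimum) of each covariate.
default.range <- apply(raw.complete, 2, function(v) diff(range(v)))

## A range vector that overrides only the second covariate;
## NA marks the covariate whose range is left to be computed from the data.
my.range <- c(NA, 1)
```

A matrix such as X or raw.complete could then be passed as data, and my.range as range, provided the counterfactuals have the same columns in the same order.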
Details
This function is the primary tool for evaluating your counterfactuals.  
Specifically, it:
1.  Determines whether or not your counterfactuals are in the
convex hull of the observed covariate data.  
 
2.  Computes the distance of your counterfactuals from each of the
n observed covariate data points.  The default distance function used is Gower's 
non-parametric measure.
 
3.  Computes a summary statistic for each counterfactual based on 
the distances in (2):  the fraction of observed covariate data points with 
distances to your counterfactual less than a value you supply.  By
default, this value is taken to be the geometric variability of the observed
data.
 
4.  Computes the cumulative frequency distribution of each counterfactual
for the distances in (2) using values that you supply.  By default, Gower
distances from 0 to 1 in increments of 0.05 are used.
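The distance and cumulative frequency steps above can be sketched in base R. This is a minimal sketch, not the package's implementation: gower.dist is a hypothetical helper, and it assumes the Gower measure is the average over covariates of the absolute difference divided by the covariate's sample range, with no missing values and no covariate of zero range.

```r
## Hypothetical helper (not part of WhatIf): Gower distance between each
## counterfactual (rows of cfact) and each data point (rows of data).
## Assumes complete numeric matrices and non-zero covariate ranges.
gower.dist <- function(cfact, data, rng = NULL) {
  if (is.null(rng)) rng <- apply(data, 2, function(v) diff(range(v)))
  ## apply() returns an n-by-m matrix, so transpose to m-by-n.
  t(apply(cfact, 1, function(cf) {
    colMeans(abs(t(data) - cf) / rng)
  }))
}

set.seed(2)
my.data <- matrix(rnorm(100 * 5), ncol = 5)
my.cfact <- matrix(rnorm(3 * 5), ncol = 5)
d <- gower.dist(my.cfact, my.data)  # 3-by-100 matrix of distances

## Cumulative frequencies: for each counterfactual, the fraction of data
## points with distance less than or equal to each value in freq.
freq <- seq(0, 1, by = 0.05)
cum.freq <- t(apply(d, 1, function(row) {
  sapply(freq, function(f) mean(row <= f))
}))
colnames(cum.freq) <- freq
```

With the default grid of 21 distance values, cum.freq has one row per counterfactual and 21 columns, which matches the shape described for the cum.freq element under Value.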
 
Value
An object of class "whatif", a list consisting of up to seven of the
following elements (inputs and dist are present only when requested):
| call | The original call to whatif. | 
| inputs | A list with two elements, data and cfact.  Only
present if return.inputs was set equal to TRUE in the call
to whatif.  The first element is the processed observed
covariate data matrix on which all whatif computations were
performed.  The second element is the processed counterfactual data
matrix. | 
| in.hull | A logical vector of length m, where m is the number
of counterfactuals.  Each element of the vector is TRUE if the corresponding
counterfactual is in the convex hull and FALSE otherwise. | 
| dist | An m-by-n numeric matrix, where m is 
the number of counterfactuals and n is the number of data points 
(units).  Only present if return.distance was set equal to TRUE in
the call to whatif.  The [i, j]th entry of the matrix contains the  
distance between the ith counterfactual and the jth data point. | 
| geom.var | A scalar.  The geometric variability of the observed covariate
data. | 
| sum.stat | A numeric vector of length m, where m is the
number of counterfactuals.  Each element contains the summary 
statistic for the corresponding counterfactual.  This summary statistic is 
the fraction of data points with distances to the counterfactual 
less than the threshold set by the argument nearby, which by default is
the geometric variability of the covariates. | 
| cum.freq | A numeric matrix.  By default, the matrix has
dimension m-by-21, where m is the number of
counterfactuals; however, if you supplied your own frequencies via
the argument freq, the matrix has dimension m-by-f,
where f is the length of freq.  Each row of the
matrix contains the cumulative frequency distribution for the
corresponding counterfactual calculated using either the distance 
measure-specific default set of distance values or the set that you supplied (see 
the discussion under the argument freq).  Hence, the [i, j]th
entry of the matrix is the fraction of data points with 
distances to the ith counterfactual less than or equal to the
value represented by the jth column.  The column names contain these
values. | 
Note
This function requires the lpSolve package.
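As an illustration of how a linear programming solver can be used for a convex hull membership test, the sketch below checks whether a point can be written as a convex combination of the observed data points. It is a sketch under assumptions, not the package's own code: in.hull.lp is a hypothetical helper, and the formulation (a feasibility problem with a zero objective) is one standard way to pose the test.

```r
## Hypothetical helper (not part of WhatIf): TRUE if the point x lies in
## the convex hull of the rows of data. Requires the lpSolve package.
in.hull.lp <- function(x, data) {
  if (!requireNamespace("lpSolve", quietly = TRUE)) {
    stop("the lpSolve package is required for this sketch")
  }
  n <- nrow(data)
  ## Find weights lambda >= 0 (lp() assumes non-negative variables) with
  ## t(data) %*% lambda = x and sum(lambda) = 1.
  const.mat <- rbind(t(data), rep(1, n))
  sol <- lpSolve::lp(direction = "min",
                     objective.in = rep(0, n),  # feasibility only
                     const.mat = const.mat,
                     const.dir = rep("=", nrow(const.mat)),
                     const.rhs = c(x, 1))
  sol$status == 0  # status 0: a feasible solution exists
}

## Corners of the unit square as the observed data.
square <- matrix(c(0, 0,
                   1, 0,
                   0, 1,
                   1, 1), ncol = 2, byrow = TRUE)
```

Under this formulation, in.hull.lp(c(0.5, 0.5), square) should report that the point is inside the hull, while a point such as c(2, 2) should be reported as outside.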
Author(s)
Stoll, Heather hstoll@polsci.ucsb.edu, King, Gary
king@harvard.edu and Zeng, Langche zeng@ucsd.edu
References
King, Gary and Langche Zeng.  2006.  "The Dangers of 
Extreme Counterfactuals."  Political Analysis 14 (2).
Available from https://gking.harvard.edu.
King, Gary and Langche Zeng.  2007.  "When Can History Be Our Guide?
The Pitfalls of Counterfactual Inference."  International Studies Quarterly
51 (March).  Available from https://gking.harvard.edu.
See Also
plot.whatif,
summary.whatif,
print.whatif,
print.summary.whatif
Examples
##  Create example data sets and counterfactuals
my.cfact <- matrix(rnorm(3*5), ncol = 5)
my.data <- matrix(rnorm(100*5), ncol = 5)
##  Evaluate counterfactuals
my.result <- whatif(data = my.data, cfact = my.cfact, mc.cores = 1)
##  Evaluate counterfactuals and supply your own Gower distance values
##  for the cumulative frequency distributions
my.result <- whatif(cfact = my.cfact, data = my.data, 
                    freq = c(0, .25, .5, 1, 1.25, 1.5), mc.cores = 1)
[Package WhatIf version 1.5-10]