associate {regclass}    R Documentation
Association Analysis
Description
This function takes two variables and computes relevant numerical measures of the association between them. The p-values of the associations are estimated via permutation tests. Diagnostic plots are also provided, and optional arguments allow classic (non-permutation) tests to be reported as well.
Usage
associate(formula, data, permutations = 500, seed = NA, plot = TRUE, classic = FALSE,
          cex.leg = 0.7, n.levels = NA, prompt = TRUE, color = TRUE, ...)
Arguments
formula
A standard R formula of the form y ~ x, where y is the name of the response variable and x is the name of the explanatory variable.
data
An optional data frame containing the variables named in the formula. If not specified, the variables are taken from the environment from which associate is called.
permutations
The number of permutations used for the Monte Carlo estimate of the p-value. If 0, no permutation test is performed and classic results are reported instead.
seed
An optional random number seed for the permutations.
plot
Logical. If TRUE (the default), diagnostic plots are produced.
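classic
Logical. If TRUE, classic (non-permutation) tests are reported along with plots and tests for checking their assumptions.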
cex.leg
Scale factor for the size of legends in plots. Larger values make legends bigger.
n.levels
An optional argument used only when y is categorical and x is quantitative. It specifies the number of levels to use when converting x into a categorical variable during the analysis. Each level receives the same number of cases; if the cases cannot be divided evenly, randomly chosen levels receive one more case than the others. If unspecified, the number of levels is chosen so that there are 10 cases per level, up to a maximum of 6 levels.
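prompt
Logical. If TRUE (the default), the function may pause to ask for user input, for example when there are many cases. Set to FALSE to turn off the prompt.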
color
Logical. If TRUE (the default), plots are drawn in color.
...
Additional plotting arguments, e.g., pch, lty, lwd.
Details
This function uses Monte Carlo simulation (permutation procedure) to approximate the p-value of an association. Only complete cases are considered in the analysis.
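The permutation idea can be sketched in a few lines of base R. This is a minimal illustration, not the regclass source: the helper permutation_pvalue and the statistic abs_cor are hypothetical names, and the p-value convention (the fraction of permuted statistics at least as large as the observed one) is an assumption.

```r
# Hypothetical helper, not the regclass implementation.
# Assumption: a larger statistic means a stronger association, and the p-value
# is the fraction of permuted statistics at least as large as the observed one.
permutation_pvalue <- function(x, y, statistic, permutations = 500, seed = NA) {
  if (!is.na(seed)) set.seed(seed)
  keep <- complete.cases(x, y)          # analyze complete cases only
  x <- x[keep]
  y <- y[keep]
  observed <- statistic(x, y)
  # Shuffling y breaks any link with x while keeping both distributions intact
  permuted <- replicate(permutations, statistic(x, sample(y)))
  list(observed = observed, permuted = permuted,
       p.value = mean(permuted >= observed))
}

# Example statistic (hypothetical): absolute Pearson correlation
abs_cor <- function(x, y) abs(cor(x, y))

set.seed(42)
x <- c(1:20, NA)                        # the 21st case is incomplete and is dropped
y <- c(2 * (1:20) + rnorm(20), 5)
res <- permutation_pvalue(x, y, abs_cor, permutations = 1000, seed = 1)
res$p.value
```

Because the same shuffling scheme works for any statistic, swapping abs_cor for another function (an F statistic, a chi-squared statistic) reuses the whole procedure.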
Valid formulas may include functions of the variables, e.g., y^2, log10(x), or more complicated expressions such as I(x1/(x2+x3)). In the latter case, the expression must be wrapped in I() so that arithmetic operators such as / and + are computed arithmetically rather than interpreted as formula operators.
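The role of I() can be seen with base R's model.frame(), which applies the standard formula rules. The sketch assumes associate() follows those same rules; the data frame df is invented for illustration.

```r
# Invented data for illustration
df <- data.frame(y  = c(10, 20, 30),
                 x1 = c(2, 4, 6),
                 x2 = c(1, 1, 1),
                 x3 = c(1, 1, 1))

# Without I(), "/" and "+" are formula operators (nesting and adding terms),
# so the model frame holds the separate variables y, x1, x2, x3.
mf_plain <- model.frame(y ~ x1/(x2 + x3), data = df)
names(mf_plain)

# With I(), the ratio is computed arithmetically and becomes a single column.
mf_ratio <- model.frame(y ~ I(x1/(x2 + x3)), data = df)
as.numeric(mf_ratio[[2]])   # 1 2 3

# Ordinary function calls such as log10() need no I().
mf_log <- model.frame(y ~ log10(x1), data = df)
```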
When both x and y are quantitative variables, an analysis of Pearson's correlation and Spearman's rank correlation is provided, along with scatterplots and histograms of the variables. If classic is TRUE, QQ-plots of the variables are also provided, along with tests of assumptions.
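A rough base R sketch of the two correlation analyses follows. The helper permute_cor is hypothetical, the two-sided permutation p-value based on the absolute correlation is an assumption, and cor.test() supplies the classic p-values for comparison.

```r
# Hypothetical helper; the two-sided permutation p-value uses |correlation|.
permute_cor <- function(x, y, method, permutations = 1000) {
  observed <- cor(x, y, method = method)
  permuted <- replicate(permutations, cor(x, sample(y), method = method))
  c(estimate  = observed,
    perm.p    = mean(abs(permuted) >= abs(observed)),
    classic.p = cor.test(x, y, method = method)$p.value)
}

set.seed(320)
x <- rnorm(40)
y <- 0.5 * x + rnorm(40)
pearson  <- permute_cor(x, y, "pearson")
spearman <- permute_cor(x, y, "spearman")
rbind(pearson, spearman)
```

Spearman's correlation is Pearson's correlation applied to the ranks, so it is less sensitive to outliers and detects monotone but nonlinear relationships.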
When x is categorical and y is quantitative, the averages (as well as the mean ranks and medians) of y are compared across the levels of x. The "discrepancy" is the F statistic for averages, the Kruskal-Wallis statistic for mean ranks, and the chi-squared statistic for the median test. Side-by-side boxplots are also provided. If classic is TRUE, QQ-plots of the distribution of y within each level of x are provided.
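The three discrepancy statistics can be computed with base R as below. The helper discrepancies is hypothetical, and the median test is assumed to be the classic version (a chi-squared test on counts above versus not above the overall median); regclass may differ in details such as tie handling.

```r
# Hypothetical helper. Assumption: classic median test on "above the grand
# median" counts; regclass's exact median test may differ.
discrepancies <- function(y, g) {
  g <- factor(g)
  c(F      = unname(oneway.test(y ~ g, var.equal = TRUE)$statistic),
    KW     = unname(kruskal.test(y, g)$statistic),
    median = unname(suppressWarnings(           # small expected counts warn
               chisq.test(table(g, y > median(y)))$statistic)))
}

y <- c(1, 2, 3, 11, 12, 13, 21, 22, 23)
g <- rep(c("a", "b", "c"), each = 3)
observed <- discrepancies(y, g)
observed   # F = 300, KW = 7.2, median = 6.3

# Permutation p-values: shuffle the group labels and recompute all three
set.seed(1)
permuted <- replicate(500, discrepancies(y, sample(g)))   # 3 x 500 matrix
p.values <- rowMeans(permuted >= observed)
```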
When x is quantitative and y is categorical, x is converted to a categorical variable with n.levels levels containing equal numbers of cases, and a chi-squared test is performed for the association. The classic approach uses a multinomial logistic regression to check significance. A mosaic plot showing the distribution of y within each induced level of x is provided, as well as a probability "curve". If classic is TRUE, the multinomial logistic curves for each level of y are plotted against x.
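The equal-count conversion of x can be sketched as follows. The helper equal_count_levels is hypothetical; breaking ties in x by position and clamping the default number of levels to at least 2 are assumptions, not documented behavior.

```r
# Hypothetical helper. Assumptions: ties in x are broken by position, and the
# default min(6, floor(n / 10)) levels is clamped to at least 2.
equal_count_levels <- function(x, n.levels = NA) {
  n <- length(x)
  if (is.na(n.levels)) n.levels <- max(2, min(6, floor(n / 10)))
  sizes <- rep(n %/% n.levels, n.levels)
  extra <- sample(n.levels, n %% n.levels)   # randomly chosen levels get one more case
  sizes[extra] <- sizes[extra] + 1
  lv <- integer(n)
  lv[order(x)] <- rep(seq_len(n.levels), times = sizes)   # lowest x values -> level 1
  factor(lv, levels = seq_len(n.levels))
}

set.seed(320)
x <- runif(23)
y <- factor(ifelse(x + rnorm(23, sd = 0.3) > 0.5, "high", "low"))
xlev <- equal_count_levels(x, n.levels = 5)
table(xlev)                                   # three levels of 5, two of 4
tab <- table(xlev, y)
suppressWarnings(chisq.test(tab))$statistic   # small counts trigger a warning
```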
When both x and y are categorical, a chi-squared test is performed. The contingency table, table of expected counts, and conditional distributions are also reported along with a mosaic plot.
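The tables reported for two categorical variables correspond to standard base R computations, sketched here on invented data. The variable names purchase and area are hypothetical. Note that chisq.test() applies a continuity correction to 2 x 2 tables by default, so the hand-computed Pearson statistic is compared with correct = FALSE.

```r
# Invented data for illustration
purchase <- factor(c("yes", "no", "no", "yes", "no", "no", "yes", "no"))
area     <- factor(c("urban", "urban", "rural", "urban",
                     "rural", "rural", "urban", "rural"))

tab      <- table(area, purchase)                         # contingency table
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)  # expected counts under independence
cond     <- prop.table(tab, margin = 1)                   # distribution of purchase within each area
stat     <- sum((tab - expected)^2 / expected)            # Pearson chi-squared statistic
stat                                                      # 4.8
suppressWarnings(chisq.test(tab, correct = FALSE))$statistic   # matches stat
```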
If the permutation procedure is used, the sampling distribution of the measure of association over the requested number of permutations is displayed, along with the observed value on the actual data (except when y is categorical and x is quantitative).
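A plot of this kind can be approximated with base graphics. The helper plot_permutation is hypothetical, and the statistic (absolute Pearson correlation), colors, and labels are illustrative choices rather than what regclass draws.

```r
# Hypothetical helper: histogram of permuted statistics with the observed value marked.
plot_permutation <- function(permuted, observed, main = "Permutation distribution") {
  hist(permuted, breaks = 30, main = main,
       xlab = "Statistic on permuted data",
       xlim = range(c(permuted, observed)))
  abline(v = observed, col = "red", lwd = 2)   # observed value on the actual data
  invisible(mean(permuted >= observed))         # permutation p-value
}

set.seed(1)
x <- rnorm(30)
y <- x + rnorm(30)
observed <- abs(cor(x, y))
permuted <- replicate(500, abs(cor(x, sample(y))))
p <- plot_permutation(permuted, observed)
```

The further the red line sits in the right tail of the histogram, the smaller the permutation p-value.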
If classic results are desired, plots and tests for checking assumptions are supplied. The functions white.test from package bstats (version 1.1-11-5) and mshapiro.test from package mvnormtest (version 0.1-9) are built into the function to avoid referencing those packages directly (which sometimes causes problems).
Author(s)
Adam Petrie
References
Introduction to Regression and Modeling
See Also
lm, glm, anova, cor, chisq.test, vglm
Examples
# Load the package so that associate() and its data sets are available
library(regclass)

# Two quantitative variables
data(SALARY)
associate(Salary ~ Education, data = SALARY, permutations = 1000)

# y is quantitative while x is categorical
data(SURVEY11)
associate(X07.GPA ~ X40.FavAlcohol, data = SURVEY11, permutations = 0, classic = TRUE)

# y is categorical while x is quantitative
data(WINE)
associate(Quality ~ alcohol, data = WINE, classic = TRUE, n.levels = 5)

# Two categorical variables (many cases; prompt = FALSE turns off the prompt asking for user input)
data(ACCOUNT)
set.seed(320)
# Work with a random subset of 1000 rows
SUBSET <- ACCOUNT[sample(nrow(ACCOUNT), 1000), ]
associate(Purchase ~ Area.Classification, data = SUBSET, classic = TRUE, prompt = FALSE)