get_hotellings {disprofas} | R Documentation |
Hotelling's statistics (for two independent (small) samples)
Description
The function get_hotellings() estimates the parameters for Hotelling's two-sample T^2 statistic for small samples. Note that the function get_hotellings() is deprecated. Upon the introduction of the new function get_T2_one(), it was renamed to get_T2_two(). Please use the new function get_T2_two() instead of the obsolete function get_hotellings().
Usage
get_hotellings(m1, m2, signif)
Arguments
m1 |
A matrix with the data of the reference group, e.g. a matrix representing dissolution profiles, i.e. with rows for the different dosage units and columns for the different time points, or a matrix for the different model parameters (columns) of different dosage units (rows). |
m2 |
A matrix with the same dimensions as matrix |
signif |
A positive numeric value between |
Details
The two-sample Hotelling's T^2 test statistic is given by
T^2 = \frac{n_T n_R}{n_T + n_R} \left( \bm{x}_T - \bm{x}_R
\right)^{\top} \bm{S}_{pooled}^{-1} \left( \bm{x}_T - \bm{x}_R \right) ,
where \bm{x}_T and \bm{x}_R are the vectors of the sample means of the test (T) and reference (R) group, e.g. vectors of the average dissolution per time point or of the average model parameters, n_T and n_R are the numbers of observations of the test and the reference group, respectively (i.e. the number of rows in matrices m2 and m1, respectively, handed over to the get_T2_two() function), and \bm{S}_{pooled} is the pooled variance-covariance matrix which is calculated by
\bm{S}_{pooled} = \frac{(n_R - 1) \bm{S}_R + (n_T - 1) \bm{S}_T}{n_R + n_T - 2} ,
where \bm{S}_R and \bm{S}_T are the estimated variance-covariance matrices which are calculated from the matrices of the two groups being compared, i.e. m1 and m2. The matrix \bm{S}_{pooled}^{-1} is the inverse of the pooled variance-covariance matrix. As the number of columns of matrices m1 and m2 increases, and especially as the correlation between the columns increases, the risk increases that the pooled variance-covariance matrix \bm{S}_{pooled} is ill-conditioned or even singular and thus cannot be inverted. The term
D_M = \sqrt{ \left( \bm{x}_T - \bm{x}_R \right)^{\top}
\bm{S}_{pooled}^{-1} \left( \bm{x}_T - \bm{x}_R \right) }
is the Mahalanobis distance which is used to measure the difference between two multivariate means. For large samples, T^2 is approximately chi-square distributed with p degrees of freedom, where p is the number of variables, i.e. the number of dissolution profile time points or the number of model parameters. In terms of the Mahalanobis distance, Hotelling's T^2 statistic can be expressed as
T^2 = \frac{n_T n_R}{n_T + n_R} \; D_M^2 = k \; D_M^2 .
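The quantities defined so far can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's implementation: the matrices m1 and m2 below are hypothetical toy data (rows are dosage units, columns are time points), and the variable names are chosen to mirror the symbols in the formulas above.

```r
# Hypothetical toy data; matrix() fills column by column.
m1 <- matrix(c(20, 22, 19, 21, 23, 21,
               80, 83, 79, 84, 82, 81), ncol = 2)  # reference group (R)
m2 <- matrix(c(25, 27, 24, 28, 26, 26,
               85, 88, 86, 87, 90, 89), ncol = 2)  # test group (T)

n_R <- nrow(m1)
n_T <- nrow(m2)

# Pooled variance-covariance matrix S_pooled
S_pooled <- ((n_R - 1) * cov(m1) + (n_T - 1) * cov(m2)) / (n_R + n_T - 2)

# Difference of the mean vectors (test minus reference)
d <- colMeans(m2) - colMeans(m1)

# Mahalanobis distance D_M; solve(S, d) avoids forming the inverse explicitly
D_M <- sqrt(drop(t(d) %*% solve(S_pooled, d)))

# Scaling factor k and Hotelling's T^2
k  <- n_T * n_R / (n_T + n_R)
T2 <- k * D_M^2

# Reciprocal condition number: values close to 0 indicate that S_pooled
# is ill-conditioned and that its inversion is unreliable
rc <- rcond(S_pooled)
```

Inspecting rc before trusting D_M is one simple way to notice the ill-conditioning risk described above when many, highly correlated columns are used.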
To transform Hotelling's T^2 statistic into an F-statistic, a conversion factor is necessary, i.e.
K = k \; \frac{n_T + n_R - p - 1}{\left( n_T + n_R - 2 \right) p} .
With this transformation, the following test statistic can be applied:
K \; D_M^2 \leq F_{p, n_T + n_R - p - 1, \alpha} .
Under the null hypothesis, H_0: \bm{\mu}_T = \bm{\mu}_R, this F-statistic is F-distributed with p and n_T + n_R - p - 1 degrees of freedom. H_0 is rejected at significance level \alpha if the F-value exceeds the critical value from the F-table evaluated at \alpha, i.e. F > F_{p, n_T + n_R - p - 1, \alpha}. The null hypothesis is satisfied if, and only if, the population means are identical for all variables. The alternative is that at least one pair of these means is different.
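As a rough check of the conversion, the following base R sketch recomputes the F-related quantities from the values listed under "Expected results" in the Examples section (D_M, p and the significance level). The group sizes n_T = n_R = 6 are not stated explicitly there; they are inferred here from k = 3 and df2 = 9, and the critical value is assumed to be the upper \alpha quantile of the F-distribution.

```r
# Values taken from the expected results in the Examples section
D_M   <- 10.44045   # Mahalanobis distance
p     <- 2          # number of variables (time points)
alpha <- 0.1        # significance level
n_T   <- 6          # inferred from k = 3 and df2 = 9 (assumption)
n_R   <- 6

k   <- n_T * n_R / (n_T + n_R)
df1 <- p
df2 <- n_T + n_R - p - 1

# Conversion factor K and observed F-value
K     <- k * df2 / ((n_T + n_R - 2) * p)
F_obs <- K * D_M^2

# Critical value (upper alpha quantile) and p-value
F_crit <- qf(1 - alpha, df1, df2)
p_F    <- pf(F_obs, df1, df2, lower.tail = FALSE)

# H_0 is rejected if the observed F-value exceeds the critical value
reject_H0 <- F_obs > F_crit
```

With these inputs, K, F_obs and F_crit should agree with the K, F and F.crit entries of res$Parameters up to rounding of D_M.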
The following assumptions concerning the data are made:
The data from population i is a sample from a population with mean vector \mu_i. In other words, it is assumed that there are no sub-populations.
The data from both populations have common variance-covariance matrix \Sigma.
The elements from both populations are independently sampled, i.e. the data values are independent.
Both populations are multivariate normally distributed.
Confidence intervals:
Confidence intervals for the mean differences at each time point or
confidence intervals for the mean differences between the parameter
estimates of the reference and the test group are calculated by aid of the
formula
\left( \bm{x}_T - \bm{x}_R \right) \pm \sqrt{\frac{1}{K} \;
F_{p, n_T + n_R - p - 1, \alpha} \; \bm{s}_{pooled}} ,
where \bm{s}_{pooled} is the vector of the diagonal elements of the pooled variance-covariance matrix \bm{S}_{pooled}. With (1 - \alpha)100\% confidence, this interval covers the respective linear combination of the differences between the means of the two sample groups. If the individual variables rather than a linear combination of the variables are of interest, then the Bonferroni corrected confidence intervals should be used instead, which are given by the expression
\left( \bm{x}_T - \bm{x}_R \right) \pm
t_{n_T + n_R - 2, \frac{\alpha}{2 p}} \;
\sqrt{\frac{1}{k} \; \bm{s}_{pooled}} .
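The two interval formulas can be compared with a short base R sketch. It computes only the half-widths, i.e. the terms that are added to and subtracted from the mean differences, using the diagonal of the pooled matrix shown under "Expected results" in the Examples section. The group sizes n_T = n_R = 6 are inferred from k = 3, and both critical values are assumed to be upper quantiles; neither assumption is stated explicitly in the text.

```r
# Diagonal elements of S.pool from the expected results in the Examples
s_pooled <- c(t.15 = 3.395808, t.90 = 4.434833)

p     <- 2     # number of variables
alpha <- 0.1   # significance level
n_T   <- 6     # inferred from k = 3 (assumption)
n_R   <- 6

k <- n_T * n_R / (n_T + n_R)
K <- k * (n_T + n_R - p - 1) / ((n_T + n_R - 2) * p)

# Half-widths of the Hotelling's T^2 based intervals
F_crit <- qf(1 - alpha, p, n_T + n_R - p - 1)
hw_T2  <- sqrt(F_crit * s_pooled / K)

# Half-widths of the Bonferroni corrected intervals
t_crit  <- qt(1 - alpha / (2 * p), n_T + n_R - 2)
hw_bonf <- t_crit * sqrt(s_pooled / k)

rbind(T2 = hw_T2, Bonferroni = hw_bonf)
```

For this particular configuration the Bonferroni half-widths come out smaller than the T^2 based ones, which is in line with the recommendation to use them when only the individual variables are of interest.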
Value
A list with the following elements is returned:
Parameters |
Parameters determined for the estimation of Hotelling's
|
S.pool |
Pooled variance-covariance matrix. |
covs |
A list with the elements |
means |
A list with the elements |
CI |
A list with the elements |
The Parameters element contains the following information:
dm |
Mahalanobis distance of the samples. |
df1 |
Degrees of freedom (number of variables or time points). |
df2 |
Degrees of freedom (number of rows - number of variables - 1). |
alpha |
Provided significance level. |
K |
Scaling factor for |
k |
Scaling factor for the squared Mahalanobis distance to obtain
the |
T2 |
Hotelling's |
F |
Observed |
F.crit |
Critical |
t.crit |
Critical |
p.F |
|
References
Hotelling, H. (1931) The generalisation of Student's ratio. Ann Math Stat, 2(3), 360-378.
Hotelling, H. (1947) Multivariate quality control illustrated by air testing of sample bombsights. In: Eisenhart, C., Hastay, M.W., and Wallis, W.A., Eds., Techniques of Statistical Analysis, McGraw Hill, New York, 111-184.
See Also
get_T2_one, get_sim_lim, mimcr.
Examples
# Estimation of the parameters for Hotelling's two-sample T2 statistic
# (for small samples)
## Not run:
res <-
  get_hotellings(m1 = as.matrix(dip1[dip1$type == "R", c("t.15", "t.90")]),
                 m2 = as.matrix(dip1[dip1$type == "T", c("t.15", "t.90")]),
                 signif = 0.1)
res$S.pool
res$Parameters
## End(Not run)
# Expected results in res$S.pool
#          t.15     t.90
# t.15 3.395808 1.029870
# t.90 1.029870 4.434833
# Expected results in res$Parameters
#           DM          df1          df2       signif            K
# 1.044045e+01 2.000000e+00 9.000000e+00 1.000000e-01 1.350000e+00
#            k           T2            F       F.crit          p.F
# 3.000000e+00 3.270089e+02 1.471540e+02 3.006452e+00 1.335407e-07