
get_T2_one {disprofas}

R Documentation

Hotelling's statistics (for one (small) sample)

Description

The function get_T2_one() estimates the parameters for Hotelling's one-sample T^2 statistic for small samples.

Usage

get_T2_one(m, mu, signif, na_rm = FALSE)

Arguments

`m`	A matrix with the data of the reference group, e.g. a matrix with the different model parameters (columns) of different dosage units (rows).
`mu`	A numeric vector of, e.g., the hypothetical mean values of the model parameters, with one element per column of `m`.
`signif`	A positive numeric value between `0` and `1` that specifies the significance level. The default value is `0.05`.
`na_rm`	A logical value that indicates whether observations containing `NA` (or `NaN`) values should be removed (`na_rm = TRUE`) or not (`na_rm = FALSE`). The default is `na_rm = FALSE`.

Details

The one-sample Hotelling's T^2 test statistic is given by

T^2 = n \left( \bar{\bm{x}} - \bm{\mu}_0 \right)^{\top} \bm{S}^{-1} \left( \bar{\bm{x}} - \bm{\mu}_0 \right) ,

where \bar{\bm{x}} is the vector of the sample means of the sample group, e.g. the vector of the average dissolution per time point or of the average model parameters, n is the number of observations of the sample group (i.e. the number of rows in the matrix m handed over to the get_T2_one() function), and \bm{S} is the sample variance-covariance matrix. The matrix \bm{S}^{-1} is the inverse of the variance-covariance matrix. The term

D_M = \sqrt{ \left( \bar{\bm{x}} - \bm{\mu}_0 \right)^{\top} \bm{S}^{-1} \left( \bar{\bm{x}} - \bm{\mu}_0 \right) }

is the Mahalanobis distance measuring the difference between the sample mean vector and the vector of the hypothetical values \bm{\mu}_0. For large samples, T^2 is approximately chi-square distributed with p degrees of freedom, where p is the number of variables, i.e. the number of dissolution profile time points or the number of model parameters. In terms of the Mahalanobis distance, the one-sample Hotelling's T^2 statistic can be expressed as

T^2 = n \; D_M^2 = k \; D_M^2 ,

where, in the one-sample case, the scaling factor k equals the number of observations n.
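The identity T^2 = n D_M^2 can be checked numerically. The following is a minimal sketch in base R, not the get_T2_one() source; the helper t2_two_ways() and the toy matrix m_toy are hypothetical and made up for illustration. Note that stats::mahalanobis() returns the squared distance D_M^2, not D_M.

```r
# Minimal sketch (base R only, not the disprofas source): the one-sample
# Hotelling's T^2 computed once from the quadratic form and once from the
# squared Mahalanobis distance returned by stats::mahalanobis().
t2_two_ways <- function(m, mu) {
  n <- nrow(m)
  d <- colMeans(m) - mu                          # x_bar - mu_0
  S <- stats::cov(m)                             # sample variance-covariance matrix
  t2_direct <- n * drop(t(d) %*% solve(S) %*% d) # n (x_bar - mu_0)' S^-1 (x_bar - mu_0)
  dm2 <- stats::mahalanobis(colMeans(m), center = mu, cov = S)  # D_M^2
  c(dm = sqrt(dm2), T2_direct = t2_direct, T2_mahalanobis = n * dm2)
}

# Hypothetical toy data: n = 4 observations (rows), p = 2 variables (columns)
m_toy <- matrix(c(1, 2, 3, 4,
                  2, 1, 4, 3), ncol = 2)
t2_two_ways(m_toy, mu = c(2, 2))
```

For this toy matrix, D_M^2 = 3/16, so both routes give T^2 = 4 x 3/16 = 0.75.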

To transform the one-sample Hotelling's T^2 statistic into an F-statistic, a conversion factor is necessary, i.e.

K = k \; \frac{n - p}{(n - 1) p} .
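As a quick arithmetic check of the formula for K, the sketch below assumes k = n (in the expected results of the Examples, k equals the number of rows of m) and reproduces the K values reported there; conversion_K() is a hypothetical helper, not part of the package.

```r
# Sketch of the conversion factor K for the one-sample case, assuming k = n.
# n: number of observations (rows of m), p: number of variables (columns of m).
conversion_K <- function(n, p) {
  k <- n
  k * (n - p) / ((n - 1) * p)
}

conversion_K(n = 6, p = 2)   # 2.4, the K reported for res1 in the Examples
conversion_K(n = 12, p = 2)  # 5.454545, the K reported for res2 and res3
```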

With this transformation, the test statistic F = K \; D_M^2 is obtained, which is compared with the critical value of the F-distribution:

F = K \; D_M^2 \leq F_{p, n - p, \alpha} .

Under the null hypothesis, H_0: \bm{\mu} = \bm{\mu}_0, this F-statistic is F-distributed with p and n - p degrees of freedom. H_0 is rejected at a significance level of \alpha if the test statistic F exceeds the critical value F_{p, n - p, \alpha}, i.e. the upper \alpha quantile of the F-distribution with p and n - p degrees of freedom (F > F_{p, n - p, \alpha}).
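The decision rule can be sketched with stats::qf() and stats::pf(). This is an illustration under the assumptions that F_{p, n - p, \alpha} is the upper \alpha quantile and that k = n; hotelling_one_F_test() is a hypothetical helper. Feeding it the rounded dm of res1 together with n = 6 and p = 2 (read off the k and df1 entries of the expected results) should reproduce F, F.crit and p.F up to rounding.

```r
# Sketch of the F test described above (base R only; not the package code).
# dm: Mahalanobis distance D_M, n: number of observations,
# p: number of variables, alpha: significance level.
hotelling_one_F_test <- function(dm, n, p, alpha) {
  K <- n * (n - p) / ((n - 1) * p)                        # k = n (one sample)
  F_obs <- K * dm^2                                       # F = K D_M^2
  F_crit <- stats::qf(1 - alpha, df1 = p, df2 = n - p)    # upper-alpha quantile
  p_F <- stats::pf(F_obs, df1 = p, df2 = n - p, lower.tail = FALSE)
  list(F = F_obs, F.crit = F_crit, p.F = p_F, reject_H0 = F_obs > F_crit)
}

# Values taken from the expected results of res1 in the Examples
hotelling_one_F_test(dm = 13.14197, n = 6, p = 2, alpha = 0.1)
```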

The following assumptions concerning the data are made:

The data of population x has no sub-populations, i.e. there are no sub-populations of x with different means.
The observations are based on a common variance-covariance matrix \Sigma.
The observations have been independently sampled.
The observations have been sampled from a multivariate normal distribution.

Confidence intervals:
Simultaneous (1 - \alpha)100\% confidence intervals for all linear combinations of the sample means are given by the expression

\left( \bar{\bm{x}} - \bm{\mu}_0 \right) \pm \sqrt{\frac{1}{K} \; F_{p, n - p, \alpha} \; \bm{s}} ,

where \bm{s} is the vector of the diagonal elements of the variance-covariance matrix \bm{S}. With (1 - \alpha)100\% confidence, these intervals cover the respective linear combinations of the differences between the sample means and the hypothetical means. If the individual variables rather than linear combinations of them are of interest, the Bonferroni-corrected confidence intervals should be used instead, which are given by the expression

\left( \bar{\bm{x}} - \bm{\mu}_0 \right) \pm t_{n - 1, \frac{\alpha}{2 p}} \; \sqrt{\frac{1}{k} \; \bm{s}} .
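Both interval types can be sketched in a few lines of base R. This is not the disprofas implementation; t2_one_ci() and the toy matrix are hypothetical, k = n is assumed, and t_{n - 1, \alpha / (2 p)} is taken as the upper quantile (consistent with the t.crit values in the Examples). The result mimics the layout of the CI element (data frames with LCL and UCL).

```r
# Sketch of the Hotelling and Bonferroni confidence intervals for the
# differences x_bar - mu_0 (base R only; not the disprofas implementation).
t2_one_ci <- function(m, mu, alpha = 0.05) {
  n <- nrow(m)
  p <- ncol(m)
  k <- n                                   # one-sample case
  K <- k * (n - p) / ((n - 1) * p)
  d <- colMeans(m) - mu                    # x_bar - mu_0
  s <- diag(stats::cov(m))                 # diagonal elements of S
  hw_hot <- sqrt(stats::qf(1 - alpha, p, n - p) * s / K)
  hw_bon <- stats::qt(1 - alpha / (2 * p), n - 1) * sqrt(s / k)
  list(Hotelling  = data.frame(LCL = d - hw_hot, UCL = d + hw_hot),
       Bonferroni = data.frame(LCL = d - hw_bon, UCL = d + hw_bon))
}

# Hypothetical toy data: n = 4, p = 2
m_toy <- matrix(c(1, 2, 3, 4,
                  2, 1, 4, 3), ncol = 2)
t2_one_ci(m_toy, mu = c(2, 2), alpha = 0.05)
```

For this toy matrix, F_{2, 2, 0.05} = 19, so the Hotelling half-width is sqrt(19 x (5/3) / (4/3)) = sqrt(23.75) for both variables, while the Bonferroni intervals come out narrower.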

Value

A list with the following elements is returned:

`Parameters`	Parameters determined for the estimation of Hotelling's `T^2`.
`cov`	The variance-covariance matrix of the reference group.
`means`	A list with the elements `mean.r`, `mean.t` and `mean.diff`, i.e. the average model parameters of the reference group, the hypothetical average model parameters (handed over via the `mu` parameter) and the corresponding differences, respectively.
`CI`	A list with the elements `Hotelling` and `Bonferroni`, i.e. data frames with columns `LCL` and `UCL` for the lower and upper `(1 - \alpha)100\%` confidence limits, respectively, and rows for each time point or model parameter.

The Parameters element contains the following information:

`dm`	Mahalanobis distance between the sample mean vector and `mu`.
`df1`	Degrees of freedom (number of variables or time points).
`df2`	Degrees of freedom (number of rows - number of variables).
`signif`	Provided significance level.
`K`	Scaling factor for `F` to account for the distribution of the `T^2` statistic.
`k`	Scaling factor for the squared Mahalanobis distance to obtain the `T^2` statistic.
`T2`	Hotelling's `T^2` statistic (`T^2 = k D_M^2`); multiplied by `(n - p) / ((n - 1) p)` it yields the `F`-distributed test statistic.
`F`	Observed `F` value.
`F.crit`	Critical `F` value.
`t.crit`	Critical `t` value.
`p.F`	`p` value for Hotelling's `T^2` test statistic.
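To show how the entries of the Parameters element relate to each other, the following hypothetical re-implementation assembles them from base R functions, using the same names as listed above. It is an illustrative sketch under the assumptions k = n and upper-tail critical values, not the disprofas source, and it may differ from get_T2_one() in edge cases. The na_rm handling (dropping incomplete rows) is likewise an assumption modelled on the argument description.

```r
# Hypothetical re-implementation for illustration only (not the package code).
t2_one_sketch <- function(m, mu, signif, na_rm = FALSE) {
  if (na_rm) m <- m[stats::complete.cases(m), , drop = FALSE]  # drop NA rows
  n <- nrow(m)
  p <- ncol(m)
  dm <- sqrt(stats::mahalanobis(colMeans(m), mu, stats::cov(m)))
  k <- n                                   # one-sample case
  K <- k * (n - p) / ((n - 1) * p)
  F_obs <- K * dm^2
  c(dm = dm, df1 = p, df2 = n - p, signif = signif, K = K, k = k,
    T2 = k * dm^2, F = F_obs,
    F.crit = stats::qf(1 - signif, p, n - p),
    t.crit = stats::qt(1 - signif / (2 * p), n - 1),
    p.F = stats::pf(F_obs, p, n - p, lower.tail = FALSE))
}

# Hypothetical toy data with one incomplete row that na_rm = TRUE removes
m_toy <- matrix(c(1, 2, 3, 4,
                  2, 1, 4, 3), ncol = 2)
m_na <- rbind(m_toy, c(NA, 5))
t2_one_sketch(m_na, mu = c(2, 2), signif = 0.05, na_rm = TRUE)
```

After the incomplete row is removed, n = 4 and p = 2, giving T2 = 0.75, F = 0.25, F.crit = 19 and p.F = 0.8.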

References

Hotelling, H. (1931) The generalization of Student's ratio. Ann Math Stat, 2(3), 360-378.

Hotelling, H. (1947) Multivariate quality control illustrated by air testing of sample bombsights. In: Eisenhart, C., Hastay, M.W., and Wallis, W.A. (Eds.), Techniques of Statistical Analysis. McGraw Hill, New York, 111-184.

Examples

# Estimation of the parameters for Hotelling's one-sample T2 statistic
# (for small samples)
# Check if there is a significant difference of the test batch results
# from the average reference batch results.
# Since p.F in res1$Parameters is smaller than 0.1, it is concluded that the
# new batch differs from the reference batch.
res1 <-
  get_T2_one(m = as.matrix(dip1[dip1$type == "T", c("t.15", "t.90")]),
             mu = colMeans(dip1[dip1$type == "R", c("t.15", "t.90")]),
             signif = 0.1, na_rm = FALSE)
res1$Parameters

# Expected results in res1$Parameters
#           dm          df1          df2       signif            K
# 1.314197e+01 2.000000e+00 4.000000e+00 1.000000e-01 2.400000e+00
#            k           T2            F       F.crit       t.crit
# 6.000000e+00 1.036268e+03 4.145072e+02 4.324555e+00 2.570582e+00
#          p.F
# 2.305765e-05

# In Tsong (1997) (see reference of dip7), the model-dependent approach is
# illustrated with an example data set of alpha and beta parameters obtained
# by fitting the Weibull curve function to a data set of dissolution profiles
# of three reference batches and one new batch (12 profiles per batch).
# Check if there is a significant difference of the test batch results
# from the average reference batch results.
# Since p.F in res2$Parameters is smaller than 0.05, it is concluded that the
# test batch differs from the reference batches.
res2 <-
  get_T2_one(m = as.matrix(dip7[dip7$type == "test", c("alpha", "beta")]),
             mu = colMeans(dip7[dip7$type == "ref", c("alpha", "beta")]),
             signif = 0.05, na_rm = FALSE)
res2$Parameters

# Expected results in res2$Parameters
#           dm          df1          df2       signif            K
# 5.984856e+00 2.000000e+00 1.000000e+01 5.000000e-02 5.454545e+00
#            k           T2            F       F.crit       t.crit
# 1.200000e+01 4.298220e+02 1.953736e+02 4.102821e+00 2.593093e+00
#          p.F
# 9.674913e-09

# In Sathe (1996) (see reference of dip8), the model-dependent approach is
# illustrated with an example data set of alpha and beta parameters obtained
# by fitting the Weibull curve function to a data set of dissolution profiles
# of one reference batch and one new batch with minor modifications and another
# new batch with major modifications (12 profiles per batch).
# Check if there is a significant difference of the results of the minor or
# major modified batches from the average reference batch results.
# Since p.F in both res3.minor$Parameters and res3.major$Parameters is smaller
# than 0.1, it is concluded that both the minor and the major modification
# batches differ from the reference batch.
res3.minor <-
  get_T2_one(m = log(as.matrix(dip8[dip8$type == "minor",
                                    c("alpha", "beta")])),
             mu = log(colMeans(dip8[dip8$type == "ref",
                                     c("alpha", "beta")])),
             signif = 0.1, na_rm = FALSE)
res3.major <-
  get_T2_one(m = log(as.matrix(dip8[dip8$type == "major",
                                    c("alpha", "beta")])),
             mu = log(colMeans(dip8[dip8$type == "ref",
                                     c("alpha", "beta")])),
             signif = 0.1, na_rm = FALSE)
res3.minor$Parameters
res3.major$Parameters

# Expected results in res3.minor$Parameters
#           dm          df1          df2       signif            K
# 2.718715e+00 2.000000e+00 1.000000e+01 1.000000e-01 5.454545e+00
#            k           T2            F       F.crit       t.crit
# 1.200000e+01 8.869691e+01 4.031678e+01 2.924466e+00 2.200985e+00
#          p.F
# 1.635140e-05

# Expected results in res3.major$Parameters
#           dm          df1          df2       signif            K
# 5.297092e+00 2.000000e+00 1.000000e+01 1.000000e-01 5.454545e+00
#            k           T2            F       F.crit       t.crit
# 1.200000e+01 3.367102e+02 1.530501e+02 2.924466e+00 2.200985e+00
#          p.F
# 3.168664e-08

[Package disprofas version 0.2.0 Index]