dependence.structure {multivariance} | R Documentation |
determines the dependence structure
Description
Determines the dependence structure as described in [3].
Usage
dependence.structure(
x,
vec = 1:ncol(x),
verbose = TRUE,
detection.aim = NULL,
type = "conservative",
structure.type = "clustered",
c.factor = 2,
list.cdm = NULL,
alpha = 0.05,
p.adjust.method = "holm",
stop.too.many = NULL,
...
)
Arguments
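x | matrix, each row of the matrix is treated as one sample |
vec | vector, it indicates which columns are initially treated together as one sample |
verbose | boolean, if TRUE, details are printed during the detection |
detection.aim | NULL or a list of vectors which indicate the expected detection, see Details (Advanced) |
type | the method used for the detection, one of '…' (the default is 'conservative'; 'consistent' selects the consistent estimation, see c.factor) |
structure.type | either the 'clustered' or the 'full' dependence structure is determined |
c.factor | numeric, larger than 0, a constant factor used in the case of type = 'consistent' |
list.cdm | not required, the list of doubly centered distance matrices corresponding to x |
alpha | numeric between 0 and 1, the significance level used for the tests |
p.adjust.method | a string indicating the p-value adjustment for multiple testing, see p.adjust |
stop.too.many | numeric, upper limit for the number of tested tuples. A warning is issued if it is used. Use stop.too.many = NULL (the default) for no limit |
... | these are passed to find.cluster |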
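The grouping performed by vec and the fast growth of the number of candidate tuples (which stop.too.many guards against) can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helpers group_columns and count_candidate_tuples are hypothetical, and counting all k-subsets with choose() only gives an upper bound on the tuples of each order that the algorithm may test.

```r
# Hypothetical helper: list which columns of x are combined into one variable;
# columns sharing the same value in 'vec' form one variable.
group_columns <- function(vec) {
  split(seq_along(vec), vec)
}

# Hypothetical helper: number of k-subsets of m variables for k = 2, ..., m,
# an upper bound for the number of tuples of each order.
count_candidate_tuples <- function(m) {
  if (m < 2) return(numeric(0))
  k <- 2:m
  stats::setNames(choose(m, k), k)
}

vec <- c(1, 2, 1, 2, 3, 4, 5, 6, 7)   # the grouping used in the Examples
groups <- group_columns(vec)
groups[["1"]]                          # 1 3: columns 1 and 3 form one variable
m <- length(groups)                    # 7 variables after grouping
counts <- count_candidate_tuples(m)
sum(counts)                            # 120, i.e. 2^m - m - 1
```

With 26 variables, as in dep_struct_several_26_100, choose(26, 13) alone already exceeds ten million, which shows why a limit on the number of tested tuples can be needed for the full structure.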
Details
Performs the detection of the dependence structure as described in [3]. In the clustered structure, variables are clustered and treated as one variable as soon as a dependence is detected; the full structure always treats each variable separately. The detection is based either on tests with significance level alpha or on a consistent estimator. The latter yields, under very mild conditions and in the limit of increasing sample size, always the correct dependence structure (but the convergence might be very slow).
If fixed.rejection.level is not provided, the significance level alpha is used to determine which multivariances are significant using the distribution-free rejection level. By default, the Holm method is used for the p-value correction for multiple testing.
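The default correction can be reproduced with base R's p.adjust, the function whose method names p.adjust.method refers to. The p-values below are made up for illustration and do not come from a multivariance computation; declaring a test significant when its adjusted p-value is at most alpha is the usual convention and is assumed here.

```r
# Made-up p-values for four tests (illustration only)
p <- c(0.01, 0.02, 0.03, 0.5)
alpha <- 0.05

# Holm step-down correction, the default p.adjust.method
p.holm <- p.adjust(p, method = "holm")
p.holm                      # 0.04 0.06 0.06 0.50

# A test is declared significant if its adjusted p-value is at most alpha
significant <- p.holm <= alpha
significant                 # TRUE FALSE FALSE FALSE
```

Holm multiplies the i-th smallest of n p-values by (n - i + 1) and enforces monotonicity, so it is uniformly less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.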
The resulting graph can be simplified (pairwise dependence can be represented by edges instead of vertices) using clean.graph.
Advanced:
The argument detection.aim is currently only implemented for structure.type = "clustered". It can be used to check whether an expected dependence structure was detected, which might be useful in simulation studies to determine the empirical power of the detection algorithm. To this end, detection.aim is set to a list of vectors which indicate the expected detected dependence structures (one for each run of find.cluster). Each vector has as its first element the k for which k-tuples are detected (for this aim the detection stops without success if no k-tuple is found); the other elements indicate to which cluster each vertex present after the detection belongs. For example, c(3,2,2,1,2,1,1,2,1) expects that 3-tuples are detected and that the graph contains 8 vertices (including those representing the detected 3-dependencies); the order of the 2's and 1's indicates which vertices belong to which cluster. If detection.aim is provided, the vector representing the actual detection is printed, so the output can be copied and pasted to fix the expected detection aims successively.
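To make the layout of a single detection aim concrete, the following base R sketch splits the example vector c(3,2,2,1,2,1,1,2,1) into its parts. The helper decode_aim is hypothetical and only mirrors the description above; it is not part of the package.

```r
# Hypothetical helper: split one detection aim into its parts.
# aim[1] is the order k of the tuples expected to be detected;
# aim[-1] gives, for each vertex present after the detection, its cluster label.
decode_aim <- function(aim) {
  membership <- aim[-1]
  list(
    k = aim[1],
    n.vertices = length(membership),
    clusters = split(seq_along(membership), membership)
  )
}

aim <- decode_aim(c(3, 2, 2, 1, 2, 1, 1, 2, 1))
aim$k            # 3
aim$n.vertices   # 8
aim$clusters     # cluster "1": vertices 3 5 6 8; cluster "2": vertices 1 2 4 7
```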
Note that a failed detection might trigger the warning:
run$mem == detection.aim[[k]][-1] : longer object length is not a multiple of shorter object length
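This is base R's recycling warning for == on vectors whose lengths are not multiples of each other: if a run ends with a different number of vertices than the aim expects, the detected and expected membership vectors differ in length. When the lengths happen to be multiples, R recycles silently, which is presumably why the warning only might appear. A minimal sketch of a length-checked comparison follows; matches_aim is a hypothetical helper, and its mem argument merely stands in for run$mem from the warning.

```r
# Hypothetical helper: TRUE only if the detected memberships 'mem' agree
# exactly with the expected memberships aim[-1]. Checking the lengths first
# avoids recycling (and its warning) when the vertex counts differ.
matches_aim <- function(mem, aim) {
  expected <- aim[-1]
  length(mem) == length(expected) && all(mem == expected)
}

aim <- c(3, 2, 2, 1, 2, 1, 1, 2, 1)
matches_aim(c(2, 2, 1, 2, 1, 1, 2, 1), aim)   # TRUE
matches_aim(c(2, 2, 1), aim)                  # FALSE, without a warning

# The unchecked comparison recycles the shorter vector and warns:
# c(2, 2, 1) == aim[-1]
```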
Value
returns a list with elements:
multivariances
calculated multivariances,
cdms
calculated doubly centered distance matrices,
graph
graph representing the dependence structure,
detected
boolean, this is only included if a detection.aim is given,
number.of.dep.tuples
vector, with the number of dependent tuples for each tested order. For the full dependence structure, a value of -1 indicates that all tuples of this order are already lower-order dependent, and a value of -2 indicates that there were more than stop.too.many tuples,
structure.type
either clustered or full,
type
the type of p-value estimation or consistent estimation used,
total.number.of.tests
numeric vector, with the number of tests for each group of tests,
typeI.error.prob
estimated probability of a type I error,
alpha
significance level used if a p-value estimation procedure is used,
c.factor
factor used if a consistent estimation procedure is used,
parameter.range
significance levels (or 'c.factor' values) which yield the same detection result.
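The special codes in number.of.dep.tuples can be translated into readable text with a few lines of base R. The helper describe_dep_tuples is hypothetical, and the input vector is made up rather than taken from a call to dependence.structure; the sketch only encodes the -1 and -2 conventions stated above for the full dependence structure.

```r
# Hypothetical helper: describe each entry of a number.of.dep.tuples vector.
# -1: all tuples of this order are already lower-order dependent
# -2: there were more than stop.too.many tuples
# otherwise: the number of dependent tuples of this order
describe_dep_tuples <- function(counts) {
  vapply(counts, function(v) {
    if (v == -1) {
      "all tuples already lower-order dependent"
    } else if (v == -2) {
      "more than stop.too.many tuples"
    } else {
      paste0(v, " dependent tuple(s)")
    }
  }, character(1))
}

# Made-up example vector (not output of dependence.structure)
describe_dep_tuples(c(3, 1, -1))
# "3 dependent tuple(s)" "1 dependent tuple(s)" "all tuples already lower-order dependent"
```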
References
For the theoretical background, see reference [3] on the main help page of this package: multivariance-package.
Examples
# structures for the datasets included in the package
dependence.structure(dep_struct_several_26_100)
dependence.structure(dep_struct_star_9_100)
dependence.structure(dep_struct_iterated_13_100)
dependence.structure(dep_struct_ring_15_100)
# basic examples:
x = coins(100) # 3-dependent
dependence.structure(x)
colnames(x) = c("A","B","C")
dependence.structure(x) # names of variables are used as labels
dependence.structure(coins(100),vec = c(1,1,2))
# 3-dependent rvs, of which the first two are used together as one rv, thus 2-dependence.
dependence.structure(x,vec = c(1,1,2)) # names of variables are used as labels
dependence.structure(cbind(coins(200),coins(200,k=5)),verbose = TRUE)
#1,2,3 are 3-dependent, 4,..,9 are 6-dependent
# similar to the previous example, but
# the pair 1,3 is treated as one sample,
# analogously the pair 2,4. In the resulting structure one can no longer
# see that the dependence of 1,2,3,4 with the rest is due
# to 4.
dependence.structure(cbind(coins(200),coins(200,k=5)),
vec = c(1,2,1,2,3,4,5,6,7),verbose = TRUE)
### Advanced:
# How to check the empirical power of the detection algorithm?
# Use a dataset for which the structure is detected, e.g. dep_struct_several_26_100.
# run:
dependence.structure(dep_struct_several_26_100,
detection.aim = list(c(ncol(dep_struct_several_26_100))))
# The output provides the first detection aim. Now we run the same line with the added
# detection aim
dependence.structure(dep_struct_several_26_100,detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4,
5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(ncol(dep_struct_several_26_100))))
# and get the next detection aim ... thus we finally obtain all detection aims.
# now we can run the code with new sample data ....
N = 100
dependence.structure(cbind(coins(N,2),tetrahedron(N),coins(N,4),tetrahedron(N),
tetrahedron(N),coins(N,3),coins(N,3),rnorm(N)),
detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8,
9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11,
11, 12, 1, 2, 8, 9, 10, 11),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
2, 4, 5, 6, 7, 3),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 1,
2, 4, 5, 6, 7, 3)))$detected
# ... and one could start to store the results and compute the rate of successes.
# ... or one could try to check how many samples are necessary for the detection:
re = numeric(100)
for (i in 2:100) {
re[i] =
dependence.structure(dep_struct_several_26_100[1:i,],verbose = FALSE,
detection.aim = list(c(3,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8,
8, 8, 9, 9, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1, 2, 8, 9),
c(4,1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10, 11, 11,
11, 11, 12, 1, 2, 8, 9, 10, 11),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 1, 2, 4, 5, 6, 7, 3),
c(5, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7,
8, 1, 2, 4, 5, 6, 7, 3)))$detected
print(paste("First", i,"samples. Detected?", re[i]==1))
}
# re[1] is never computed (the loop starts at 2), so k = 1 is always listed
cat("Given the first k rows, the structure is not detected for k =",
    paste(which(re == 0), collapse = ", "), "\n")
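The Examples above suggest storing the results of repeated runs and computing the rate of successes. A minimal base R sketch of that last step follows. The helper empirical_power is hypothetical, and fake_run is a random stand-in so the sketch runs without the package; with multivariance, run_once could instead return the $detected element of a dependence.structure(..., detection.aim = ...) call on freshly simulated data, as in the Examples.

```r
# Hypothetical helper: estimate the empirical power as the share of runs in
# which run_once() reports a successful detection (NA counts as failure).
empirical_power <- function(run_once, n.runs) {
  successes <- vapply(seq_len(n.runs), function(i) isTRUE(run_once()), logical(1))
  mean(successes)
}

# Stand-in for one simulation run that succeeds with probability 0.8
set.seed(1)
fake_run <- function() runif(1) < 0.8
est <- empirical_power(fake_run, 1000)
est                         # close to 0.8
```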