cm_distance {qdap} | R Documentation |
Distance Matrix Between Codes
Description
Generate distance measures to ascertain a mean distance measure between codes.
Usage
cm_distance(
dataframe,
pvals = c(TRUE, FALSE),
replications = 1000,
parallel = TRUE,
extended.output = TRUE,
time.var = TRUE,
code.var = "code",
causal = FALSE,
start.var = "start",
end.var = "end",
cores = detectCores()/2
)
Arguments
dataframe |
A data frame from the cm_x2long family
( |
pvals |
A logical vector of length 1 or 2. If element 2 is blank
element 1 will be recycled. If the first element is |
replications |
An integer value for the number of replications used in
resampling the data if any element of pvals is TRUE. |
parallel |
logical. If |
extended.output |
logical. If |
time.var |
An optional variable to split the dataframe by (this must be supplied if your data spans multiple times). |
code.var |
The name of the code variable column. Defaults to "code", as output by the x2long family. |
causal |
logical. If |
start.var |
The name of the start variable column. Defaults to "start", as output by the x2long family. |
end.var |
The name of the end variable column. Defaults to "end", as output by the x2long family. |
cores |
An integer value describing the number of cores to use if
parallel = TRUE. Defaults to half of the available cores. |
Details
Note that row names identify the first code and column names identify the
second (comparison) code. The values for Code A compared to Code B will not be
the same as Code B compared to Code A. This is because, unlike a true
distance measure, cm_distance's matrix is asymmetrical. cm_distance
computes the distance by taking each span (start and end) for Code A and
comparing it to the nearest start or end for Code B.
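The asymmetry can be illustrated with a minimal base-R sketch. This is not qdap's implementation: the helper nearest_endpoint_dist() and the toy spans are hypothetical, and the sketch assumes only what is described above, namely that each Code A span is compared to the nearest start or end of Code B. It deliberately ignores overlapping spans and the causal option.

```r
# Hypothetical helper (not part of qdap): for every span of code `a`, find the
# smallest gap between either of its endpoints and any endpoint of code `b`,
# then average those gaps over the spans of `a`.
nearest_endpoint_dist <- function(a, b) {
  b_points <- c(b$start, b$end)            # all starts and ends of code b
  per_span <- vapply(seq_len(nrow(a)), function(i) {
    a_points <- c(a$start[i], a$end[i])    # both endpoints of this span
    min(abs(outer(a_points, b_points, "-")))
  }, numeric(1))
  mean(per_span)
}

# Toy spans (made-up values for illustration only)
code_A <- data.frame(start = c(0, 20), end = c(2, 22))
code_B <- data.frame(start = 3, end = 10)

ab <- nearest_endpoint_dist(code_A, code_B)  # Code A compared to Code B
ba <- nearest_endpoint_dist(code_B, code_A)  # Code B compared to Code A
c(A_to_B = ab, B_to_A = ba)
```

Because the average is taken over the spans of the first code, a code with a far-away span (the second span of code_A here) yields a larger value in one direction than in the other.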
Value
An object of the class "cm_distance". This is a list with the
following components:
pvals |
A logical indication of whether pvalues were calculated |
replications |
Integer value of number of replications used |
extended.output |
An optional list of individual repeated measures information |
main.output |
A list of aggregated repeated measures information |
adj.alpha |
An adjusted alpha level (based on |
Both the extended.output lists and the main.output list contain the following items:
mean |
A distance matrix of average distances between codes |
sd |
A matrix of standard deviations of distances between codes |
n |
A matrix of counts of distances between codes |
stan.mean |
A matrix of standardized values of distances between codes. The closer a value is to zero, the more closely the two codes are related. |
pvalue |
An optional matrix of simulated p-values associated with the mean distances |
Warning
p-values are estimated and thus subject to error. More replications decrease the error. Use:
p \pm \left ( 1.96 \cdot \sqrt{\frac{\alpha(1-\alpha)}{n}}\right )
to adjust the confidence in the estimated p-values based on the number of replications, n.
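As a rough illustration of the formula above, the following base-R sketch computes the margin for several replication counts. The helper name pvalue_margin() is hypothetical and not part of qdap, and alpha = 0.05 is assumed here purely for illustration.

```r
# Hypothetical helper (not part of qdap): the half-width
# 1.96 * sqrt(alpha * (1 - alpha) / n) taken from the formula above.
pvalue_margin <- function(alpha, n) {
  1.96 * sqrt(alpha * (1 - alpha) / n)
}

# The margin shrinks as the number of replications grows
reps <- c(100, 1000, 10000)
margins <- pvalue_margin(alpha = 0.05, n = reps)
round(margins, 4)
```

With 1000 replications the margin is about 0.0135, so an estimated p-value close to 0.05 should be read with roughly that much uncertainty on either side.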
References
https://stats.stackexchange.com/a/22333/7482
Examples
## Not run:
foo <- list(
AA = qcv(terms="02:03, 05"),
BB = qcv(terms="1:2, 3:10"),
CC = qcv(terms="1:9, 100:150")
)
foo2 <- list(
AA = qcv(terms="40"),
BB = qcv(terms="50:90"),
CC = qcv(terms="60:90, 100:120, 150"),
DD = qcv(terms="")
)
(dat <- cm_2long(foo, foo2, v.name = "time"))
plot(dat)
(out <- cm_distance(dat, replications=100))
names(out)
names(out$main.output)
out$main.output
out$extended.output
print(out, new.order = c(3, 2, 1))
print(out, new.order = 3:2)
#========================================
x <- list(
transcript_time_span = qcv(00:00 - 1:12:00),
A = qcv(terms = "2.40:3.00, 6.32:7.00, 9.00,
10.00:11.00, 59.56"),
B = qcv(terms = "3.01:3.02, 5.01, 19.00, 1.12.00:1.19.01"),
C = qcv(terms = "2.40:3.00, 5.01, 6.32:7.00, 9.00, 17.01")
)
(dat <- cm_2long(x))
plot(dat)
(a <- cm_distance(dat, causal=TRUE, replications=100))
## Plotting as a network graph
datA <- list(
A = qcv(terms="02:03, 05"),
B = qcv(terms="1:2, 3:10, 45, 60, 200:206, 250, 289:299, 330"),
C = qcv(terms="1:9, 47, 62, 100:150, 202, 260, 292:299, 332"),
D = qcv(terms="10:20, 30, 38:44, 138:145"),
E = qcv(terms="10:15, 32, 36:43, 132:140"),
F = qcv(terms="1:2, 3:9, 10:15, 32, 36:43, 45, 60, 132:140, 250, 289:299"),
G = qcv(terms="1:2, 3:9, 10:15, 32, 36:43, 45, 60, 132:140, 250, 289:299"),
H = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277"),
I = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277")
)
datB <- list(
A = qcv(terms="40"),
B = qcv(terms="50:90, 110, 148, 177, 200:206, 250, 289:299"),
C = qcv(terms="60:90, 100:120, 150, 201, 244, 292"),
D = qcv(terms="10:20, 30, 38:44, 138:145"),
E = qcv(terms="10:15, 32, 36:43, 132:140"),
F = qcv(terms="10:15, 32, 36:43, 132:140, 148, 177, 200:206, 250, 289:299"),
G = qcv(terms="10:15, 32, 36:43, 132:140, 148, 177, 200:206, 250, 289:299"),
I = qcv(terms="20, 40, 60, 150, 190, 222, 255, 277")
)
(datC <- cm_2long(datA, datB, v.name = "time"))
plot(datC)
(out2 <- cm_distance(datC, replications=1250))
plot(out2)
plot(out2, label.cex=2, label.dist=TRUE, digits=5)
## End(Not run)