acss {acss} | R Documentation |
ACSS complexity
Description
Functions to obtain the algorithmic complexity for short strings, an approximation of the Kolmogorov Complexity of a short string using the coding theorem method.
Usage
acss(string, alphabet = 9)
local_complexity(string, alphabet = 9, span = 5)
likelihood_d(string, alphabet = 9)
likelihood_ratio(string, alphabet = 9)
prob_random(string, alphabet = 9, prior= 0.5)
Arguments
string |
|
alphabet |
|
prior |
|
span |
size of substrings to be created from |
Details
The algorithmic complexity is computed using the coding theorem method: For a given alphabet size (number of different symbols in a string), all possible or a large number of random samples of Turing machines (TM) with a given number of states (e.g., 5) and number of symbols corresponding to the alphabet size were simulated until they reached a halting state or failed to end.
The outputs of the TMs at the halting states produces a distribution of strings known as the algorithmic probability of the strings. The algorithmic coding theorem (Levin, 1974) establishes the connection between the complexity of a string K(s)
and its algorithmic probability D(s)
as:
K(s)\approx -\log_{2}(D(s))
This package accesses a database containing data on 4.5 million strings from length 1 to 12 simulated on TMs with 2, 4, 5, 6, and 9 symbols.
For a more detailed discussion see Gauvrit, Singmann, Soler-Toscano, and Zenil (2014), http://complexitycalculator.com/methodology.html, or references below.
Value
- "acss"
A matrix in which the rows correspond to the strings entered and the columns to the algorithmic complexity K and the algorithmic probability D of the string (see http://complexitycalculator.com/methodology.html).
- "local_complexity"
A list with elements corresponding to the strings. Each list containes a named vector of algorithmic complexities (K) of all substrings in each string with length span.
- "likelihood_d"
A named vector with the likelihoods for
string
given a detreministic process.- "likelihood_ratio"
A named vector with the likelihood ratios (or Bayes factors) for
string
given a random rather than detreministic process.- "prob_random"
A named vector with the posterior probabilities that for a random process given the strings and the provided prior for being produced by a random process (default is 0.5, which correspond to a prior of 1 - 0.5 = 0.5 for a detereministic process).
Note
The first time per session one of the functions described here is used, a relatively large dataset is loaded into memory which can take a considerable amount of time (> 10 seconds).
References
Delahaye, J.-P., & Zenil, H. (2012). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63-77. doi:10.1016/j.amc.2011.10.006
Gauvrit, N., Singmann, H., Soler-Toscano, F., & Zenil, H. (2014). Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method. arXiv:1409.4080 [cs, stat]. http://arxiv.org/abs/1409.4080.
Gauvrit, N., Zenil, H., Delahaye, J.-P., & Soler-Toscano, F. (2014). Algorithmic complexity for short binary strings applied to psychology: a primer. Behavior Research Methods, 46(3), 732-744. doi:10.3758/s13428-013-0416-0
Levin, L. A. (1974). Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3), 30-35.
Soler-Toscano, F., Zenil, H., Delahaye, J.-P., & Gauvrit, N. (2012). Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE, 9(5): e96223.
Examples
# WARNING: The first call to one of the functions
# discussed on this page loads a large data set
# and usually takes > 10 seconds. Stay patient.
acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))
## K.9 D.9
## HEHHEE 23.38852 9.106564e-08
## GHHGGHGHH 33.50168 8.222205e-11
## HSHSHHSHSS 35.15241 2.618613e-11
acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))[,"K.9"]
## [1] 23.38852 33.50168 35.15241
acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"), alphabet = 2)
## K.2 D.2
## HEHHEE 14.96921 3.117581e-05
## GHHGGHGHH 25.60208 1.963387e-08
## HSHSHHSHSS 26.90906 7.935321e-09
acss(c("HEHHEE", "GHHGGHGHUE", "HSHSHHSHSS"), NULL)
## K.2 K.4 K.5 K.6 K.9
## HEHHEE 14.96921 18.55227 19.70361 20.75762 23.38852
## GHHGGHGHUE NA 31.75832 33.00795 34.27457 37.78935
## HSHSHHSHSS 26.90906 29.37852 30.52566 31.76229 35.15241
## D.2 D.4 D.5 D.6
## HEHHEE 3.117581e-05 2.601421e-06 1.171176e-06 5.640722e-07
## GHHGGHGHUE NA 2.752909e-10 1.157755e-10 4.812021e-11
## HSHSHHSHSS 7.935321e-09 1.432793e-09 6.469341e-10 2.745360e-10
## D.9
## HEHHEE 9.106564e-08
## GHHGGHGHUE 4.209915e-12
## HSHSHHSHSS 2.618613e-11
## Not run:
likelihood_d(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
## HTHTHTHT HTHHTHTT
## 0.010366951 0.003102718
likelihood_ratio(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
## HTHTHTHT HTHHTHTT
## 0.3767983 1.2589769
prob_random(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
## HTHTHTHT HTHHTHTT
## 0.2736772 0.5573217
## End(Not run)
local_complexity(c("01011010111" ,"GHHGGHGHUE"), alphabet = 5, span=5)
## $`01011010111`
## 01011 10110 01101 11010 10101 01011 10111
## 16.22469 16.24766 16.24766 16.22469 16.24322 16.22469 15.93927
##
## $GHHGGHGHUE
## GHHGG HHGGH HGGHG GGHGH GHGHU HGHUE
## 16.44639 16.44639 16.24766 16.22469 16.58986 16.86449
local_complexity(c("01011010111" ,"GHHGGHGHUE"), span=7)
## $`01011010111`
## 0101101 1011010 0110101 1101011 1010111
## 26.52068 26.52068 26.47782 26.62371 26.29186
##
## $GHHGGHGHUE
## GHHGGHG HHGGHGH HGGHGHU GGHGHUE
## 27.04623 26.86992 27.30871 27.84322