data_to_zipfs {latentFactoR}    R Documentation
Transforms simulate_factors Data to Zipf's Distribution
Description
Zipf's distribution is commonly found in text data. Closely related to the Pareto and power-law distributions, Zipf's distribution produces highly skewed data. This transformation is intended to mirror the data-generating process of Zipf's law seen in semantic network and topic modeling data.
Usage
data_to_zipfs(lf_object, beta = 2.7, alpha = 1, dichotomous = FALSE)
Arguments
lf_object
Data object from simulate_factors.
beta
Numeric (length = 1).
Sets the shift in rank.
Defaults to 2.7.
alpha
Numeric (length = 1).
Sets the power of the rank.
Defaults to 1.
dichotomous
Boolean (length = 1).
Whether the data should be dichotomized rather than returned as frequencies (e.g., for semantic network analysis).
Defaults to FALSE.
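When dichotomous = TRUE, the output is binary rather than frequency data. As a hypothetical illustration only (the "count greater than zero becomes 1" rule is an assumption, not necessarily what data_to_zipfs() does internally), a frequency matrix can be reduced to whether each word occurs at all:

```r
# Hypothetical illustration: turn frequency counts into presence/absence.
# ASSUMPTION: the rule used here (count > 0 becomes 1) is illustrative and
# not necessarily how data_to_zipfs() dichotomizes internally.
counts <- matrix(
  c(5, 0, 2,
    0, 1, 0,
    3, 0, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), paste0("word", 1:3))
)

# 1 = word appears at least once, 0 = word absent; dimnames are kept
binary <- (counts > 0) * 1L
binary
```

Frequency information (e.g., 5 versus 2) is lost in the binary matrix; only occurrence is retained.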
Details
The formula used to transform data is (Piantadosi, 2014):
f(r) proportional to 1 / (r + beta)^alpha
where f(r) is the rth highest frequency, r is the rank order of the data, beta is a shift in the rank (following Mandelbrot, 1953, 1962), and alpha is the power of the rank, with greater values producing greater differences between the largest frequency and the next, and so forth.
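The rank-frequency relationship above can be computed directly. The sketch below is a minimal base R illustration, not the package's internal code; zipf_mandelbrot_probs() is a hypothetical helper. It normalizes 1 / (r + beta)^alpha over a finite number of ranks and shows how alpha controls the gap between successive frequencies. Setting beta = 0 recovers the classic Zipf form, 1 / r^alpha.

```r
# Theoretical Zipf-Mandelbrot rank-frequency curve:
# f(r) proportional to 1 / (r + beta)^alpha, normalized to sum to 1.
# zipf_mandelbrot_probs() is a HYPOTHETICAL helper for illustration,
# not a function exported by latentFactoR.
zipf_mandelbrot_probs <- function(n_ranks, beta = 2.7, alpha = 1) {
  r <- seq_len(n_ranks)             # ranks 1, 2, ..., n_ranks
  weights <- 1 / (r + beta)^alpha   # unnormalized frequencies
  weights / sum(weights)            # normalize to proportions
}

probs <- zipf_mandelbrot_probs(10, beta = 2.7, alpha = 1)

# Ratio of the largest frequency to the next: (2 + beta) / (1 + beta)
probs[1] / probs[2]   # 4.7 / 3.7, about 1.27

# A larger alpha widens the gap between successive ranks
steep <- zipf_mandelbrot_probs(10, beta = 2.7, alpha = 2)
steep[1] / steep[2]   # (4.7 / 3.7)^2, about 1.61
```

In general, f(1) / f(2) = ((2 + beta) / (1 + beta))^alpha, so increasing alpha raises this ratio, while increasing beta flattens the top of the distribution.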
The function transforms continuous data output from simulate_factors. See the examples to get started.
Value
Returns a list containing:
data
Simulated data that have been transformed to follow Zipf's distribution
RMSE
A vector of root mean square errors (RMSE) for (a) the transformed data compared with data assumed to follow the theoretical Zipf's distribution and (b) the Spearman's correlation matrix of the transformed data compared with the original population correlation matrix
spearman_correlation
Spearman's correlation matrix of the transformed data
original_correlation
Original population correlation matrix before the data were transformed
original_results
Original lf_object
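The RMSE comparison between a Spearman's correlation matrix and a population correlation matrix can be sketched in base R. This is an assumed version, not the package's implementation: the population matrix values, the exp() transformation standing in for the Zipf transformation, and the choice to average over the lower triangle are all illustrative assumptions. Because Spearman's correlation is computed on ranks, any strictly increasing transformation leaves it unchanged, so the RMSE here reflects sampling error and the small difference between rank-based and Pearson correlations.

```r
set.seed(42)

# ASSUMED population correlation matrix for three variables
population <- matrix(
  c(1.0, 0.3, 0.3,
    0.3, 1.0, 0.3,
    0.3, 0.3, 1.0),
  nrow = 3
)

# Draw multivariate normal data via the Cholesky factor:
# if Z has independent N(0, 1) columns, Z %*% chol(Sigma) has covariance Sigma
n <- 1000
z <- matrix(rnorm(n * 3), nrow = n)
x <- z %*% chol(population)

# exp() is a STAND-IN for the Zipf transformation: monotone and highly skewing
skewed <- exp(3 * x)

spearman <- cor(skewed, method = "spearman")

# ASSUMPTION: RMSE over the unique off-diagonal elements (lower triangle)
lower <- lower.tri(population)
rmse <- sqrt(mean((spearman[lower] - population[lower])^2))
rmse
```

A transformation that is not strictly monotone (for example, one that collapses many values into ties, as dichotomizing does) can change the ranks and therefore increase this RMSE.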
Author(s)
Alexander P. Christensen <alexpaulchristensen@gmail.com>, Hudson Golino <hfg9s@virginia.edu>, Luis Eduardo Garrido <luisgarrido@pucmm.edu>
References
Mandelbrot, B. (1953). An informational theory of the statistical structure of language. Communication Theory, 84, 486–502.
Mandelbrot, B. (1962). On the theory of word frequencies and on related Markovian models of discourse. Structure of Language and its Mathematical Aspects, 190–219.
Piantadosi, S. T. (2014). Zipf’s word frequency law in natural language: A critical review and future directions. Psychonomic Bulletin & Review, 21(5), 1112–1130.
Zipf, G. (1936). The psychobiology of language. London, UK: Routledge.
Zipf, G. (1949). Human behavior and the principle of least effort. New York, NY: Addison-Wesley.
Examples
# Load the package
library(latentFactoR)

# Generate factor data
two_factor <- simulate_factors(
factors = 2, # factors = 2
variables = 6, # variables per factor = 6
loadings = 0.55, # loadings between = 0.45 to 0.65
cross_loadings = 0.05, # cross-loadings N(0, 0.05)
correlations = 0.30, # correlation between factors = 0.30
sample_size = 1000 # number of cases = 1000
)
# Transform data to Mandelbrot's Zipf's distribution
two_factor_zipfs <- data_to_zipfs(
lf_object = two_factor,
beta = 2.7,
alpha = 1
)
# Transform data to Mandelbrot's Zipf's distribution (dichotomous)
two_factor_zipfs_binary <- data_to_zipfs(
lf_object = two_factor,
beta = 2.7,
alpha = 1,
dichotomous = TRUE
)