| dimcalc {lsa} | R Documentation |
Dimensionality Calculation Routines (LSA)
Description
Methods for choosing a ‘good’ number of singular values for the dimensionality reduction in LSA.
Usage
dimcalc_share(share=0.5)
dimcalc_ndocs(ndocs)
dimcalc_kaiser()
dimcalc_raw()
dimcalc_fraction(frac=(1/50))
Arguments
share |
Optional: a fraction of the sum of the selected singular values to the sum of all singular values (default: 0.5). Only needed by |
frac |
Optional: a fraction of the number of the singular values to be used (default: 1/50th). |
ndocs |
Optional: the number of documents (only needed for |
Details
In an LSA process, the diagonal matrix of the singular value decomposition is usually reduced to a specific number of dimensions (also ‘factors’ or ‘singular values’).
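The reduction step can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used inside lsa(): the helper name reduce_rank() is hypothetical, and the matrix is the toy data from the Examples section.

```r
# Hypothetical helper (not part of the lsa package): keep only the first k
# singular values and rebuild the rank-k approximation of the input matrix.
reduce_rank <- function(m, k) {
  dec <- svd(m)
  u_k <- dec$u[, 1:k, drop = FALSE]   # first k left singular vectors
  v_k <- dec$v[, 1:k, drop = FALSE]   # first k right singular vectors
  d_k <- diag(dec$d[1:k], nrow = k)   # k x k diagonal matrix of singular values
  u_k %*% d_k %*% t(v_k)
}

vec1 <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
vec2 <- c(0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0)
vec3 <- c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0)
m <- cbind(vec1, vec2, vec3)

m_2 <- reduce_rank(m, 2)   # same shape as m, but built from 2 dimensions only
m_3 <- reduce_rank(m, 3)   # all 3 dimensions kept: reproduces m up to rounding
```

The dimcalc routines address the question of which k to pass to such a reduction.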
The functions dimcalc_share(), dimcalc_ndocs(), dimcalc_kaiser()
and also the redundant function dimcalc_raw() offer methods to calculate a useful
number of singular values (based on the distribution and values of the given sequence
of singular values).
All of them are tightly coupled to the core LSA functions: each of them generates
a function to be executed by the calling (higher-level)
function lsa(). The generated function takes only one parameter,
namely s, which is expected to be the sequence of singular values.
Within lsa(), the returned function is executed and the mandatory
singular values are passed to it as that parameter.
The dimensionality calculation methods, however, can still be called directly by appending a second, separate argument list that holds the singular values, e.g.
dimcalc_share(share=0.2)(mysingularvalues)
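The calling pattern is that of a function factory. The following base R sketch shows the mechanism only; make_dimcalc() and its threshold rule are hypothetical and are not part of the lsa package.

```r
# Hypothetical factory (not part of lsa): it captures its configuration
# and returns a function that takes the singular values s.
make_dimcalc <- function(threshold) {
  function(s) {
    # count the singular values strictly greater than the captured threshold
    sum(s > threshold)
  }
}

s <- c(4, 3, 2, 1)                 # made-up singular values, descending

pick <- make_dimcalc(1.5)          # first call: configure, get a function back
k <- pick(s)                       # second call: supply the singular values

k_direct <- make_dimcalc(1.5)(s)   # both calls chained in one expression
```

Passing the unevaluated result of the first call to lsa() lets lsa() supply s itself once the SVD has been computed.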
The method dimcalc_share() finds the first position in the descending sequence of
singular values s where the cumulative sum of the values up to that position
(divided by the sum of all values) meets or exceeds the specified share.
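A minimal base R sketch of this rule, written from the description above rather than taken from the package source; the name share_position() and the sample values are assumptions, and the package may treat exact ties differently.

```r
# Hypothetical re-implementation of the share rule as described in the text.
share_position <- function(s, share = 0.5) {
  cum_share <- cumsum(s) / sum(s)   # cumulative share of the total
  which(cum_share >= share)[1]      # first position meeting or exceeding share
}

s <- c(4, 3, 2, 1)                  # cumulative shares: 0.4, 0.7, 0.9, 1.0

p_default <- share_position(s)          # 0.7 is the first share >= 0.5
p_high    <- share_position(s, 0.8)     # 0.9 is the first share >= 0.8
```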
The method dimcalc_ndocs() calculates the first position in the descending sequence
of singular values where the cumulative sum of the values meets or exceeds the number of documents.
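The same idea with an absolute threshold can be sketched as below. The function name ndocs_position() is hypothetical, and the fallback to length(s) when the threshold is never reached is an assumption of this sketch, not documented behaviour of the package.

```r
# Hypothetical re-implementation of the ndocs rule as described in the text.
ndocs_position <- function(s, ndocs) {
  hit <- which(cumsum(s) >= ndocs)  # positions where the running sum reaches ndocs
  if (length(hit) == 0) {
    length(s)                       # assumption: use all values if never reached
  } else {
    hit[1]
  }
}

s <- c(4, 3, 2, 1)                  # running sums: 4, 7, 9, 10

p_five  <- ndocs_position(s, 5)     # 7 is the first running sum >= 5
p_three <- ndocs_position(s, 3)     # already reached at the first value
p_large <- ndocs_position(s, 50)    # never reached: falls back to length(s)
```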
The method dimcalc_kaiser() calculates the number of singular values according to the
Kaiser criterion, i.e. from the descending sequence of values, all values
with s[n] > 1 are taken. The number of these values is returned as the
number of dimensions.
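Expressed in base R, the criterion amounts to counting the values above 1. The name kaiser_dims() and the sample values are assumptions made for this sketch.

```r
# Hypothetical re-implementation of the Kaiser rule as described in the text.
kaiser_dims <- function(s) {
  sum(s > 1)                        # number of singular values strictly above 1
}

s <- c(2.5, 1.2, 1.0, 0.4)          # made-up singular values, descending
k <- kaiser_dims(s)                 # 2.5 and 1.2 qualify; 1.0 is not > 1
```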
The method dimcalc_fraction() returns the specified fraction of the
number of singular values. By default, 1/50th of the available values
is used, and the resulting number of singular values is returned.
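A sketch of the fraction rule follows. The text does not say how a non-integer result is handled, so rounding down and returning at least 1 are assumptions of this sketch; fraction_dims() is a hypothetical name.

```r
# Hypothetical re-implementation of the fraction rule as described in the text.
fraction_dims <- function(s, frac = 1/50) {
  # assumption: round down, but never return fewer than one dimension
  max(1, floor(length(s) * frac))
}

s8 <- c(8, 7, 6, 5, 4, 3, 2, 1)          # eight made-up singular values
k_quarter <- fraction_dims(s8, 1/4)      # 8 * 0.25 = 2

k_small <- fraction_dims(c(3, 2, 1))     # 3 / 50 rounds down to 0, raised to 1
```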
The method dimcalc_raw() returns the maximum number of singular values (= the length
of s). It is included only for completeness.
Value
Returns a function that takes the singular values as a parameter to return the recommended number of dimensions. The expected parameter of this function is
s |
A sequence of singular values (as produced by the SVD). Only needed when calling the dimensionality calculation routines directly. |
Author(s)
Fridolin Wild f.wild@open.ac.uk
References
Wild, F., Stahl, C., Stermsek, G., Neumann, G., Penya, Y. (2005) Parameters Driving Effectiveness of Automated Essay Scoring with LSA. In: Proceedings of the 9th CAA, pp. 485-494, Loughborough.
See Also
Examples
## create some data
vec1 = c( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
vec2 = c( 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0 )
vec3 = c( 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0 )
matrix = cbind(vec1,vec2, vec3)
s = svd(matrix)$d
# standard share of 0.5
dimcalc_share()(s)
# specific share of 0.9
dimcalc_share(share=0.9)(s)
# meeting the number of documents (here: 3)
n = ncol(matrix)
dimcalc_ndocs(n)(s)