R: Estimate Variance Matrix via Statistical Factors

factor.model.stat {BurStFin}

R Documentation

Estimate Variance Matrix via Statistical Factors

Description

Creates a variance matrix based on the principal components of the variables that have no missing values.

Usage

factor.model.stat(x, weights = seq(0.5, 1.5, length.out = nobs), 
	output = "full", center = TRUE, frac.var = 0.5, iter.max = 1, 
	nfac.miss = 1, full.min = 20, reg.min = 40, sd.min = 20, 
	quan.sd = 0.9, tol = 0.001, zero.load = FALSE, 
	range.factors = c(0, Inf), constant.returns.okay = FALSE, 
	specific.floor = 0.1, floor.type = "quantile", verbose=2)

Arguments

`x`	required. A numeric matrix. The rows are observations and the columns are the variables. In finance, this will be a matrix of returns where the rows are times and the columns are assets. For the default value of `weights` the most recent observation should be the last row. The number of columns may exceed the number of rows, and missing values are accepted. A column may even have all missing values.
`weights`	a vector of observation weights, or `NULL`. Equal weights can be specified with `NULL` or with a single positive number. Otherwise, the length must be equal to either the original number of rows in `x` or the number of rows in `x` minus the number of rows that contain all missing values.
`output`	a character string indicating the form of the result. It must partially match one of: `"full"`, `"systematic"`, `"specific"` or `"factor"`.
`center`	either a logical value or a numeric vector with length equal to the number of columns in `x`. If `center` is `TRUE`, then the mean of each column is used as the center. If `center` is `FALSE`, then the center for each variable is taken to be zero.
`frac.var`	a control on the number of factors to use – the number of factors is chosen so that the factors account for (just over) `frac.var` of the total variability.
`iter.max`	the maximum number of times to iterate the search for principal factors of the variables with complete data.
`nfac.miss`	a vector of integers giving the number of factors to use in regressions for variables with missing values. The number of factors used is equal to the i-th element of `nfac.miss` where i is the number of missing values for the variable. Thus the values in the vector should be non-increasing. The last value is used when the number of missing values is greater than the length of `nfac.miss`.
`full.min`	an integer giving the minimum number of variables that must have complete data.
`reg.min`	the minimum number of non-missing values for a variable in order for a regression to be performed on the variable.
`sd.min`	the minimum number of non-missing values for a variable in order for the standard deviation to be estimated from the data.
`quan.sd`	the quantile of the standard deviations to use for the standard deviation of variables that do not have enough data for the standard deviation to be estimated.
`tol`	a number giving the tolerance for the principal factor convergence (using the assets with full data). If the maximum change in uniquenesses (in the correlation scale) is less than `tol` from one iteration to the next, then convergence is assumed and the iterations end.
`zero.load`	a logical value. If `TRUE`, then loadings for variables with missing values are zero except for those estimated by regression. If `FALSE`, then loadings for variables with missing values are the average loading for the factor (when they are not estimated by regression).
`range.factors`	a numeric vector that gives the minimum and maximum number of factors that are allowed to be used.
`constant.returns.okay`	a logical value: if `TRUE`, then a column with all of its non-missing values equal does not cause an error. If the true variance is thought to be non-zero, then a better alternative to setting this to `TRUE` is to set all the values in the column of `x` to be `NA`.
`specific.floor`	a number indicating how much uniquenesses should be adjusted upwards. The meaning of this number depends on the value of the `floor.type` argument.
`floor.type`	a character string that partially matches one of: `"quantile"` or `"fraction"`. If the value is `"quantile"`, then all uniquenesses are made to be at least as big as the `specific.floor` quantile of the uniquenesses. If the value is `"fraction"`, then all uniquenesses are made to be at least `specific.floor`.
`verbose`	a number indicating the level of warning messages desired. This currently controls warnings: If at least 1, then a warning will be issued if all the values in `x` are non-negative. In finance this is an indication that prices rather than returns are input (an easy mistake to make). If at least 1, then a warning will be issued if there are any assets with constant returns (unless `constant.returns.okay` is `FALSE` in which case an error is thrown). If at least 2, then a warning will be issued if there are any specific variances that are adjusted from being negative.
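
The lookup rule described for `nfac.miss` can be illustrated in base R. This is only a sketch of the rule as documented above, not the package's own code, and the helper name `factors_for_missing` is hypothetical.

```r
# Hypothetical helper illustrating the documented lookup rule for nfac.miss:
# a variable with i missing values uses nfac.miss[i] factors, and the last
# element is reused when i exceeds the length of nfac.miss.
factors_for_missing <- function(n.missing, nfac.miss) {
  stopifnot(all(n.missing >= 1))
  nfac.miss[pmin(n.missing, length(nfac.miss))]
}

# Same vector as in the Examples: 5 factors for 1 to 20 missing values,
# 3 factors for 21 to 60 missing values, and 1 factor beyond that.
nfac.miss <- rep(c(5, 3, 1), c(20, 40, 1))
used <- factors_for_missing(c(1, 20, 21, 60, 61, 250), nfac.miss)
print(used)
```

Because the vector is non-increasing, variables with more missing values are regressed on fewer factors.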

Value

If output is "full", then a variance matrix with dimensions equal to the number of columns in the input x. This has two additional attributes: number.of.factors that says how many factors are used in the model, and timestamp that gives the date and time that the object was created.

If output is "systematic", then a matrix with dimensions equal to the number of columns in the input x that contains the systematic portion of the variance matrix.

If output is "specific", then a diagonal matrix with dimensions equal to the number of columns in the input x that contains the specific variance portion of the variance matrix. The full variance matrix is the sum of the systematic and specific matrices.

If output is "factor", then an object of class "statfacmodBurSt" which is a list with components:

`loadings`	a matrix of the loadings for the correlation matrix.
`uniquenesses`	the uniquenesses for the correlation matrix. That is, the proportion of the variance that is not explained by the factors. Note that if there are uniquenesses that have been modified via the `specific.floor` argument, then the actual proportion is the stated proportion divided by one plus the modification.
`sdev`	the standard deviations for the variables. Note that if there are uniquenesses that have been modified via the `specific.floor` argument, then the corresponding standard deviations in `sdev` are smaller than the actual standard deviations in the answer.
`constant.names`	A character vector giving the names of the variables that are constant (if any).
`cumulative.variance.fraction`	numeric vector giving the cumulative fraction of the variance explained by (all) the factors.
`timestamp`	character string giving the date and time the calculation was completed.
`call`	an image of the call that created the object.
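
The notes on `uniquenesses` and `sdev` refer to the adjustment controlled by `specific.floor` and `floor.type`. The following base R sketch shows one reading of the two floor types as described under Arguments. The helper `floor_uniquenesses` and the input values are made up for illustration, and the package may apply the floor differently in detail.

```r
# Hypothetical helper: raise uniquenesses to a floor, following the
# description of specific.floor and floor.type in the Arguments section.
floor_uniquenesses <- function(uniq, specific.floor = 0.1,
                               floor.type = c("quantile", "fraction")) {
  floor.type <- match.arg(floor.type)
  floor.value <- switch(floor.type,
    quantile = unname(quantile(uniq, specific.floor)),
    fraction = specific.floor)
  pmax(uniq, floor.value)
}

uniq <- c(0.02, 0.30, 0.45, 0.50, 0.65, 0.80)   # made-up uniquenesses
floored.q <- floor_uniquenesses(uniq)                    # quantile floor
floored.f <- floor_uniquenesses(uniq, 0.25, "fraction")  # fixed floor
```

In both cases only the smallest uniquenesses change; values already above the floor are left alone.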

Details

Observations that are missing on all variables are deleted. Then a principal components factor model is estimated with the variables that have complete data.
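
As a rough illustration of this step, the sketch below fits an unweighted principal components factor model to simulated complete data in base R and picks the number of factors from `frac.var`. It is a simplification under stated assumptions: the data are simulated, observation weights and the `iter.max` iterations are ignored, and the code is not taken from the package.

```r
set.seed(42)
# Simulated returns: 250 observations on 8 variables, no missing values.
x <- matrix(rnorm(250 * 8, sd = 0.01), nrow = 250, ncol = 8)
x <- x + 0.01 * rnorm(250)   # add one common component to every column

frac.var <- 0.5
eig <- eigen(cor(x), symmetric = TRUE)

# Smallest number of components whose cumulative share reaches frac.var.
cum.frac <- cumsum(eig$values) / sum(eig$values)
nfac <- which(cum.frac >= frac.var)[1]

# Loadings and uniquenesses on the correlation scale.
keep <- seq_len(nfac)
loadings <- eig$vectors[, keep, drop = FALSE] %*%
  diag(sqrt(eig$values[keep]), nrow = nfac)
uniquenesses <- 1 - rowSums(loadings^2)
```

Each uniqueness is the share of a variable's variance that the retained factors leave unexplained, so it lies between 0 and 1.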

For variables that have missing values, the standard deviation is estimated when there are enough observations; otherwise a given quantile of the standard deviations of the other assets is used as the estimate. The loadings for these variables are set to be either the average loading for the variables with no missing data, or zero. The loadings for the most important factors are modified by performing a regression with the non-missing data for each variable (if there is enough data to do the regression).
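
The standard deviation rule can be sketched in base R as follows. The data are simulated, and the sketch assumes that "the other assets" means the variables with at least `sd.min` observations; the package's actual choice may differ.

```r
set.seed(1)
x <- matrix(rnorm(60 * 5, sd = 0.02), nrow = 60, ncol = 5)
x[1:50, 4] <- NA     # column 4 has only 10 observations
x[, 5] <- NA         # column 5 has no data at all

sd.min <- 20
quan.sd <- 0.9

n.obs <- colSums(!is.na(x))
enough <- n.obs >= sd.min

sdev <- rep(NA_real_, ncol(x))
sdev[enough] <- apply(x[, enough, drop = FALSE], 2, sd, na.rm = TRUE)
# Fallback for short histories: a high quantile of the estimated values.
sdev[!enough] <- quantile(sdev[enough], quan.sd)
```

With `quan.sd = 0.9` the variables with little or no data are assigned a relatively high volatility.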

The treatment of variables with missing values can be quite important. You may well benefit from specializing how missing values are handled to your particular problem. To do this, set the output to "factor" – then you can modify the loadings (and perforce the uniquenesses), and the standard deviations to fit your situation. This may include taking sectors and countries into account, for example.

The default settings for missing value treatment are suitable for creating a variance matrix for long-only portfolio optimization – high volatility and average correlation. Take note that the proper treatment of missing values is HIGHLY dependent on the use to which the variance matrix is to be put.

OBSERVATION WEIGHTS. Time weights are quite helpful for estimating variances from returns. The default weighting seems to perform reasonably well over a range of situations.
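
The default weights are easy to inspect in base R. The weighted mean and variance below are a generic illustration of how time weights can enter an estimate; this help page does not say how the function applies the weights internally, so that part is an assumption.

```r
nobs <- 250
weights <- seq(0.5, 1.5, length.out = nobs)   # default from Usage

# The last (most recent) row counts three times as much as the first row.
ratio <- weights[nobs] / weights[1]

# Generic weighted mean and variance of one simulated return series.
set.seed(7)
ret <- rnorm(nobs, sd = 0.01)
w <- weights / sum(weights)
wmean <- sum(w * ret)
wvar <- sum(w * (ret - wmean)^2)
```

The weights rise linearly and average to one, so they shift emphasis toward recent rows.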

FACTOR MODEL TO FULL MODEL. Objects of class "statfacmodBurSt" (returned when output is "factor") have a method for fitted, which returns the variance matrix corresponding to the factor model representation.
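
For orientation, a variance matrix can be rebuilt from components named like those of the "factor" output as sketched below. The numbers are made up, and the arithmetic is the standard factor model identity rather than code taken from the `fitted` method.

```r
# Made-up components with the same names as the "factor" output.
loadings <- cbind(c(0.8, 0.7, 0.6, 0.5), c(0.1, -0.2, 0.3, -0.4))
uniquenesses <- 1 - rowSums(loadings^2)
sdev <- c(0.20, 0.25, 0.30, 0.15)

# Systematic part: correlation-scale loadings rescaled by the
# standard deviations.
systematic <- tcrossprod(loadings) * outer(sdev, sdev)
# Specific part: diagonal matrix of specific variances.
specific <- diag(uniquenesses * sdev^2)

varmat <- systematic + specific
```

Here the diagonal of `varmat` equals `sdev^2` because the uniquenesses are unmodified. If some uniquenesses had been raised by `specific.floor`, the corresponding diagonal entries would be larger than `sdev^2`.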

Warning

The default value for weights assumes that the last row is the most recent observation and the first observation is the most ancient observation.

Research Issues

The method of handling missing values used in the function has not been well studied. It seems not to be the worst approach, but undoubtedly can be improved.

The default method of boosting the result away from singularity is completely unstudied. For optimization it is wise to move away from singularity, but just how best to do that seems like a research question.

Revision

This help was last revised 2014 March 09.

Author(s)

Burns Statistics

Examples

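## Not run: 
# retmat is assumed to be a matrix of returns (rows are times, columns are assets)
varian1 <- factor.model.stat(retmat)

# factor model output; no regression factors and zero loadings
# for variables with missing values
varfac <- factor.model.stat(retmat, nfac.miss=0, zero.load=TRUE, output="factor")

varian2 <- fitted(varfac) # get matrix from factor model

# 5 factors for 1 to 20 missing values, 3 for 21 to 60, 1 beyond that
varian3 <- factor.model.stat(retmat, nfac.miss=rep(c(5,3,1), c(20,40,1)))

## End(Not run)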

[Package BurStFin version 1.3 Index]