R: Bayesian estimation of Linear SDF (B-SDF)

BayesianSDF {BayesianFactorZoo}

R Documentation

Bayesian estimation of Linear SDF (B-SDF)

Description

This function provides the Bayesian estimates of factors' risk prices. The estimates with the flat prior are given by Definitions 1 and 2 in Bryzgalova et al. (2023). The estimates with the normal prior are used in Table I (see the footnote of Table I).

Usage

BayesianSDF(
  f,
  R,
  sim_length = 10000,
  intercept = TRUE,
  type = "OLS",
  prior = "Flat",
  psi0 = 5,
  d = 0.5
)

Arguments

`f`	A `t \times k` matrix of factors, where `k` is the number of factors and `t` is the number of periods
`R`	A `t \times N` matrix of test assets, where `t` is the number of periods and `N` is the number of test assets
`sim_length`	The length of MCMCs
`intercept`	If `intercept = TRUE` (`intercept = FALSE`), the model includes (does not include) the intercept. The default is `intercept = TRUE`
`type`	If `type = 'OLS'` (`type = 'GLS'`), the function returns Bayesian OLS (GLS) estimates of risk prices `\lambda`. The default is 'OLS'
`prior`	If `type = 'Flat'` (`type = 'Normal'`), the function executes the Bayesian estimation with the flat prior (normal prior). The default is 'Flat'
`psi0`	The hyper-parameter of the prior distribution of risk prices `\lambda` used in the normal prior (see Details). This parameter is needed only when the user chooses the normal prior. The default value is 5
`d`	The hyper-parameter of the prior distribution of risk prices `\lambda` used in the normal prior (see Details). The default value is 0.5

Details

Intercept

Consider the cross-sectional step. If one includes the intercept, the model is

\mu_R = \lambda_c 1_N + C_f \lambda_f = C \lambda,

where C = (1_N, C_f) and \lambda^\top = (\lambda_c^\top, \lambda_f^\top)^\top . If one doesn't include the intercept, the model is

\mu_R = C_f \lambda_f = C \lambda,

where C = C_f and \lambda = \lambda_f.

Bayesian Estimation

Let Y_t = f_t \cup R_t. Conditional on the data Y = \{Y_t\}_{t=1}^T, we can draw \mu_{Y} and \Sigma_{Y} from the Normal-inverse-Wishart system

\mu_Y | \Sigma_Y, Y \sim N (\hat{\mu}_Y , \Sigma_Y / T) ,

\Sigma_Y | Y \sim W^{-1} (T-1, \Sigma_{t=1}^{T} (Y_t - \hat{\mu}_Y ) ( Y_t - \hat{\mu}_Y )^\top ) ,

where W^{-1} is the inverse-Wishart distribution. We do not standardize Y_t in the time-series regression. In the empirical implementation, after obtaining posterior draws for \mu_{Y} and \Sigma_{Y}, we calculate \mu_R and C_f as the standardized expected returns of test assets and correlation between test assets and factors. It follows that C is a matrix containing a vector of ones and C_f.

The prior distribution of risk prices is either the flat prior or the normal prior.

With prior = 'Flat' and type = 'OLS', for each draw, the risk price estimate is

\hat{\lambda} = (C^{\top} C)^{-1}C^{T} \mu_{R} .

With prior = 'Flat' and type = 'GLS', for each draw, the risk price estimate is

\hat{\lambda} = (C^{\top} \Sigma^{-1}_{R} C)^{-1} C^{\top} \Sigma^{-1}_{R} \mu_{R}

If one chooses prior = 'Normal', the prior of factor j's risk price is

\lambda_j | \sigma^2 \sim N(0, \sigma^2 \psi \tilde{\rho}_j^\top \tilde{\rho}_j T^d ) ,

where \tilde{\rho}_j = \rho_j - (\frac{1}{N} \Sigma_{i=1}^{N} \rho_{j,i} ) \times 1_N is the cross-sectionally demeaned vector of factor j's correlations with asset returns. Equivalently,

\lambda | \sigma^2 \sim N(0, \sigma^2 D^{-1}) ,

D = diag \{ (\psi \tilde{\rho}_1^\top \tilde{\rho}_1 T^d)^{-1}, ..., (\psi \tilde{\rho}_k^\top \tilde{\rho}_k T^d)^{-1} \} \ \ without \ intercept;

D = diag \{ c, (\psi \tilde{\rho}_1^\top \tilde{\rho}_1 T^d)^{-1}, ..., (\psi \tilde{\rho}_k^\top \tilde{\rho}_k T^d)^{-1} \} \ \ with \ intercept;

where c is a small positive number corresponding to the common cross-sectional intercept (\lambda_c). Default values for \psi (psi0) and d (d) are 5 and 0.5, respectively.

With prior = 'Normal' and type = 'OLS', for each draw, the risk price estimate is

\hat{\lambda} = ( C^{\top} C +D )^{-1} C^{\top} \mu_R .

With prior = 'Normal' and type = 'GLS', for each draw, the risk price estimate is

\hat{\lambda} = ( C^{\top} \Sigma_R^{-1} C +D )^{-1} C^{\top} \Sigma_R^{-1} \mu_R .

Value

The return of BayesianSDF is a list that contains the following elements:

lambda_path: A sim_length\times (k+1) matrix if the intercept is included. NOTE: the first column \lambda_c corresponds to the intercept. The next k columns (i.e., the 2th – (k+1)-th columns) are the risk prices of k factors. If the intercept is excluded, the dimension of lambda_path is sim_length\times k.
R2_path: A sim_length\times 1 matrix, which contains the posterior draws of the OLS or GLS R^2.

References

Bryzgalova S, Huang J, Julliard C (2023). “Bayesian solutions for the factor zoo: We just ran two quadrillion models <https://doi.org/10.1111/jofi.13197>.” Journal of Finance, 78(1), 487–557.

Examples

## <-------------------------------------------------------------------------------->
## Example: Bayesian estimates of risk prices and R2
## This example is from the paper (see Section III. Simulation)
## <-------------------------------------------------------------------------------->

library(reshape2)
library(ggplot2)

# Load the example data
data("BFactor_zoo_example")
HML <- BFactor_zoo_example$HML
lambda_ols <- BFactor_zoo_example$lambda_ols
R2.ols.true <- BFactor_zoo_example$R2.ols.true
sim_f <- BFactor_zoo_example$sim_f
sim_R <- BFactor_zoo_example$sim_R
uf <- BFactor_zoo_example$uf
W_ols <- BFactor_zoo_example$W_ols

cat("Load the simulated example \n")

cat("Cross-section: Fama-French 25 size and value portfolios \n")
cat("True pricing factor in simulations: HML \n")
cat("Pseudo-true cross-sectional R-squared:", R2.ols.true, "\n")
cat("Pseudo-true (monthly) risk price:", lambda_ols[2], "\n")

cat("----------------------------- Bayesian SDF ----------------------------\n")
cat("------------------------ See definitions 1 and 2 ----------------------\n")

cat("--------------------- Bayesian SDF: Strong factor ---------------------\n")

sim_result <- SDF_gmm(sim_R, sim_f, W_ols)   # GMM estimation
# sim_result$lambda_gmm
# sqrt(sim_result$Avar_hat[2,2])
# sim_result$R2_adj

## Now estimate the model using Bayesian method
two_step <- BayesianSDF(sim_f, sim_R, sim_length =  2000, psi0 = 5, d = 0.5)
# apply(X = two_step$lambda_path, FUN = quantile, MARGIN = 2, probs = c(0.05, 0.95))
# quantile(two_step$R2_path, probs = c(0.05, 0.5, 0.95))

# Note that the first element correspond to lambda of the constant term
# So we choose k=2 to get lambda of the strong factor
k <- 2
m1 <- sim_result$lambda_gmm[k]
sd1 <- sqrt(sim_result$Avar_hat[k,k])

bfm<-two_step$lambda_path[1001:2000, k]
fm<-rnorm(5000,mean = m1, sd=sd1)
data<-data.frame(cbind(fm, bfm))
colnames(data)<-c("GMM-OLS", "BSDF-OLS")
data.long<-melt(data)

#
### Figure 1(c)
#
p <- ggplot(aes(x=value, colour=variable, linetype=variable), data=data.long)
p+
 stat_density(aes(x=value, colour=variable),
              geom="line",position="identity", size = 2, adjust=1) +
 geom_vline(xintercept = lambda_ols[2], linetype="dotted", color = "#8c8c8c", size=1.5)+
 guides(colour = guide_legend(override.aes=list(size=2), title.position = "top",
 title.hjust = 0.5, nrow=1,byrow=TRUE))+
 theme_bw()+
 labs(color=element_blank()) +
 labs(linetype=element_blank()) +
 theme(legend.key.width=unit(4,"line")) +
 theme(legend.position="bottom")+
 theme(text = element_text(size = 26))+
 xlab(bquote("Risk price ("~lambda[strong]~")")) +
 ylab("Density" )


cat("--------------------- Bayesian SDF: Useless factor --------------------\n")

sim_result <- SDF_gmm(sim_R, uf, W_ols)
# sim_result$lambda_gmm
# sqrt(sim_result$Avar_hat[2,2])
# sim_result$R2_adj

two_step <- BayesianSDF(uf, sim_R, sim_length =  2000, psi0 = 5, d = 0.5)
#apply(X = two_step$lambda_path, FUN = quantile, MARGIN = 2, probs = c(0.05, 0.95))


## Posterior (Asymptotic) Distribution of lambda
k <- 2
m1 <- sim_result$lambda[k]
sd1 <- sqrt(sim_result$Avar_hat[k,k])

bfm<-two_step$lambda_path[1001:2000, k]
fm<-rnorm(5000,mean = m1, sd=sd1)
data<-data.frame(cbind(fm, bfm))
colnames(data)<-c("GMM-OLS", "BSDF-OLS")
data.long<-melt(data)

#
### Figure 1(a)
#
p <- ggplot(aes(x=value, colour=variable, linetype=variable), data=data.long)
p+
 stat_density(aes(x=value, colour=variable),
              geom="line",position="identity", size = 2, adjust=2) +
 geom_vline(xintercept = 0, linetype="dotted", color = "#8c8c8c", size=1.5)+
 guides(colour = guide_legend(override.aes=list(size=2),
 title.position = "top", title.hjust = 0.5, nrow=1,byrow=TRUE))+
 theme_bw()+
 labs(color=element_blank()) +
 labs(linetype=element_blank()) +
 theme(legend.key.width=unit(4,"line")) +
 theme(legend.position="bottom")+
 theme(text = element_text(size = 26))+
 xlab(bquote("Risk price ("~lambda[spurious]~")")) +
 ylab("Density" )

[Package BayesianFactorZoo version 0.0.0.2 Index]