# Chapter 3 Factor investing and asset pricing anomalies

Asset pricing anomalies are the foundations of factor investing. In this chapter our aim is twofold:

• present simple ideas and concepts: basic factor models and common empirical facts (time-varying nature of returns and risk premia);
• provide the reader with lists of articles that go much deeper to stimulate and satisfy curiosity.

The purpose of this chapter is not to provide a full treatment of the many topics related to factor investing. Rather, it is intended to give a broad overview and cover the essential themes so that the reader is guided towards the relevant references. As such, it can serve as a short, non-exhaustive review of the literature. The subject of factor modelling in finance is incredibly vast and the number of papers dedicated to it is substantial and still rapidly increasing.

The universe of peer-reviewed financial journals can be split into two families. The first consists of academic journals: their articles are mostly written by professors, and their audience consists mostly of scholars. The articles are long and often technical. Prominent examples are the Journal of Finance, the Review of Financial Studies and the Journal of Financial Economics. The second family is more practitioner-oriented. The papers are shorter, easier to read, and target finance professionals predominantly. Two emblematic examples are the Journal of Portfolio Management and the Financial Analysts Journal. This chapter reviews and mentions articles published essentially in the first family of journals.

Beyond academic articles, several monographs are already dedicated to the topic of style allocation (a synonym of factor investing used for instance in theoretical articles () or practitioner papers ()). To cite but a few, we mention:

• : an exhaustive excursion into risk premia, across many asset classes, with a large spectrum of descriptive statistics (across factors and periods),
• Ang (2014): covers factor investing with a strong focus on the money management industry,
• : very complete book on the cross-section of signals with statistical analyses (univariate metrics, correlations, persistence, etc.),
• : a tour on various topics given by field experts (factor purity, predictability, selection versus weighting, factor timing, etc.).

Finally, we mention a few wide-scope papers on this topic: , and .

## 3.1 Introduction

The topic of factor investing, though a decades-old academic theme, has gained traction concurrently with the rise of exchange traded funds (ETFs) as vectors of investment. Both have gathered momentum in the 2010s. Not so surprisingly, the feedback loop between practical financial engineering and academic research has stimulated both sides in a mutually beneficial manner. Practitioners rely on key scholarly findings (e.g., asset pricing anomalies) while researchers dig deeper into pragmatic topics (e.g., factor exposure or transaction costs). Recently, researchers have also tried to quantify and qualify the impact of factor indices on financial markets. For instance, analyze herding behaviors while show that the introduction of composite securities increases volatility and cross-asset correlations.

The core aim of factor models is to understand the drivers of asset prices. Broadly speaking, the rationale behind factor investing is that the financial performance of firms depends on factors, whether they be latent and unobservable, or related to intrinsic characteristics (like accounting ratios for instance). Indeed, as frames it, the first essential question is which characteristics really provide independent information about average returns? Answering this question helps understand the cross-section of returns and may open the door to their prediction.

Theoretically, linear factor models can be viewed as special cases of the arbitrage pricing theory (APT) of Ross (1976), which assumes that the return of an asset $$n$$ can be modelled as a linear combination of underlying factors $$f_k$$: $$$\tag{3.1} r_{t,n}= \alpha_n+\sum_{k=1}^K\beta_{n,k}f_{t,k}+\epsilon_{t,n},$$$

where the usual econometric constraints on linear models hold: $$\mathbb{E}[\epsilon_{t,n}]=0$$, $$\text{cov}(\epsilon_{t,n},\epsilon_{t,m})=0$$ for $$n\neq m$$ and $$\text{cov}(\textbf{f}_n,\boldsymbol{\epsilon}_n)=0$$. If such factors do exist, then they are in contradiction with the cornerstone model in asset pricing: the capital asset pricing model (CAPM) of , and . Indeed, according to the CAPM, the only driver of returns is the market portfolio. This explains why factors are also called ‘anomalies.’
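To make equation (3.1) concrete, below is a minimal simulation sketch (in Python with NumPy, purely for illustration — the loadings, premium levels, noise scale, and sample size are arbitrary choices of ours, not values from the text). We generate returns for one asset from a known intercept $$\alpha_n$$ and loadings $$\beta_{n,k}$$, then recover them by OLS.

```python
import numpy as np

rng = np.random.default_rng(42)
T, K = 2000, 3                       # Sample size and number of factors (arbitrary)

alpha_n = 0.01                       # True intercept of asset n
beta_n = np.array([1.2, -0.5, 0.8])  # True factor loadings of asset n

f = rng.normal(size=(T, K))          # Factor returns f_{t,k}
eps = 0.1 * rng.normal(size=T)       # Idiosyncratic noise, E[eps] = 0
r = alpha_n + f @ beta_n + eps       # Equation (3.1)

# OLS recovery of (alpha, beta): regress r on a constant and the factors
X = np.column_stack([np.ones(T), f])
coef, *_ = np.linalg.lstsq(X, r, rcond=None)
print(coef)                          # Should be close to [0.01, 1.2, -0.5, 0.8]
```

With a long enough sample, the estimated coefficients sit within a couple of standard errors of the true values, which is all the linear specification promises.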

Empirical evidence of asset pricing anomalies has accumulated since the dual publication of and . This seminal work has paved the way for a blossoming stream of literature that has its meta-studies (e.g., , and ). The regression (3.1) can be evaluated once (unconditionally) or sequentially over different time frames. In the latter case, the parameters (coefficient estimates) change and the models are thus called conditional (we refer to and to for recent results on this topic as well as for a detailed review on the related research). Conditional models are more flexible because they acknowledge that the drivers of asset prices may not be constant, which seems like a reasonable postulate.

## 3.2 Detecting anomalies

### 3.2.1 Challenges

Obviously, a crucial step is to be able to identify an anomaly, and the complexity of this task should not be underestimated. Given the publication bias towards positive results (see, e.g., in financial economics), researchers are often tempted to report partial results that are sometimes invalidated by further studies. The need for replication is therefore high and many findings have no tomorrow (, , ), especially if transaction costs are taken into account (, ). Nevertheless, as is demonstrated by , $$p$$-hacking alone cannot account for all the anomalies documented in the literature. One way to reduce the risk of spurious detections is to increase the hurdles (often, the $$t$$-statistics), though the debate is still ongoing (, ); another is to resort to multiple testing (, ). In any case, the large sample sizes used in finance may mechanically lead to very low $$p$$-values and we refer to for a discussion on this topic.

Some researchers document fading anomalies because of publication: once the anomaly becomes public, agents invest in it, which pushes prices up and the anomaly disappears. and document this effect in the US but find that all other countries experience sustained post-publication factor returns (see also ). With a different methodology, introduce a publication bias adjustment for returns and note that this (negative) adjustment is in fact rather small. Likewise, finds that $$p$$-hacking cannot be responsible for all the anomalies reported in the literature. recommends the notion of alpha decay to study the persistence or attenuation of anomalies. even builds a model in which agents invest according to the anomalies reported in academic research.

The destruction of factor premia may be due to herding (, ) and could be accelerated by the democratization of so-called smart-beta products (ETFs notably) that allow investors to invest directly in particular styles (value, low volatility, etc.) - see . For a theoretical perspective on the attractiveness of factor investing, we refer to Jin (2019). For an empirical study that links crowding to factor returns, we point to .

On the other hand, argue that the price impact of crowding in the smart-beta universe is mitigated by trading diversification stemming from external institutions that trade according to strategies outside this space (e.g., high frequency traders betting via order-book algorithms).

The remainder of this subsection is inspired by and .

### 3.2.2 Simple portfolio sorts

This is the most common procedure and the one used in . The idea is simple. On one date,

1. rank firms according to a particular criterion (e.g., size, book-to-market ratio);
2. form $$J\ge 2$$ portfolios (i.e., homogeneous groups) consisting of the same number of stocks according to the ranking (usually, $$J=2$$, $$J=3$$, $$J=5$$ or $$J=10$$ portfolios are built, based on the median, terciles, quintiles or deciles of the criterion);
3. the weight of stocks inside the portfolio is either uniform (equal weights), or proportional to market capitalization;
4. at a future date (usually one month), report the returns of the portfolios.
Then, iterate the procedure until the chronological end of the sample is reached.

The outcome is a time series of portfolio returns $$r_t^j$$ for each grouping $$j$$. An anomaly is identified if the $$t$$-test between the first group ($$j=1$$) and the last one ($$j=J$$) unveils a significant difference in average returns. More robust tests are described in . A strong limitation of this approach is that the sorting criterion could have a non-monotonic impact on returns, which a test based on the two extreme portfolios would not detect. Several articles address this concern: and , for instance. Another concern is that these sorted portfolios may capture not only the priced risk associated with the characteristic, but also some unpriced risk. show that it is possible to disentangle the two and make the most of altered sorted portfolios.
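The four-step sorting procedure and the extreme-portfolio $$t$$-test can be sketched as follows. This is a self-contained simulation (Python/NumPy, for illustration only): the characteristic, the planted small-firm premium, and all numerical parameters are our own assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, J = 240, 100, 5                 # Months, stocks, number of portfolios (arbitrary)

size = rng.lognormal(mean=10, sigma=1, size=(T, N))  # Sorting characteristic (market cap)
rank = np.argsort(np.argsort(size, axis=1), axis=1)  # Cross-sectional rank of each stock
rank_frac = rank / (N - 1)                           # Normalized rank in [0, 1]
# Simulated next-month returns with a planted size premium: small firms earn more
ret = 0.01 - 0.008 * rank_frac + 0.05 * rng.normal(size=(T, N))

port = np.empty((T, J))
for t in range(T):
    order = np.argsort(size[t])                      # Step 1: rank firms
    groups = np.array_split(order, J)                # Step 2: J equally sized portfolios
    port[t] = [ret[t, g].mean() for g in groups]     # Steps 3-4: equal-weight returns

diff = port[:, 0] - port[:, -1]                      # Small-minus-large spread
t_stat = np.sqrt(T) * diff.mean() / diff.std(ddof=1) # t-test between extreme portfolios
print(round(float(t_stat), 2))                       # Large and positive: premium detected
```

Since the premium is planted in the simulated returns, the $$t$$-statistic comfortably exceeds the usual significance thresholds; on real data, the outcome is of course far less clear-cut.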

Instead of focusing on only one criterion, it is possible to group assets according to several characteristics. The original paper also combines market capitalization with book-to-market ratios. Each characteristic is divided into 10 buckets, which makes 100 portfolios in total. Beyond data availability, there is no upper bound on the number of features that can be included in the sorting process. In fact, some authors investigate more complex sorting algorithms that can manage a potentially large number of characteristics (see e.g., and ).

Finally, we refer to for refinements that take into account the covariance structure of asset returns and to for a theoretical study on the statistical properties of the sorting procedure (including theoretical links with regression-based approaches). Notably, the latter paper discusses the optimal number of portfolios and suggests that it is probably larger than the usual 10 often used in the literature.

In the code and Figure 3.1 below, we compute size portfolios (equally weighted: above versus below the median capitalization). According to the size anomaly, the firms with below median market cap should earn higher returns on average. This is verified whenever the orange bar in the plot is above the blue one (it happens most of the time).

```r
data_ml %>%
    group_by(date) %>%
    mutate(large = Mkt_Cap_12M_Usd > median(Mkt_Cap_12M_Usd)) %>% # Creates the cap sort
    ungroup() %>%                                                 # Ungroup
    mutate(year = lubridate::year(date)) %>%                      # Creates a year variable
    group_by(year, large) %>%                                     # Analyze by year & cap
    summarize(avg_return = mean(R1M_Usd)) %>%                     # Compute average return
    ggplot(aes(x = year, y = avg_return, fill = large)) +         # Plot!
    geom_col(position = "dodge") +                                # Bars side-to-side
    theme(legend.position = c(0.8, 0.2)) +                        # Legend location
    coord_fixed(124) + theme(legend.title = element_blank()) +    # x/y aspect ratio
    scale_fill_manual(values = c("#F87E1F", "#0570EA"), name = "", # Colors
                      labels = c("Small", "Large")) +
    ylab("Average returns") + theme(legend.text = element_text(size = 9))
```

### 3.2.3 Factors

The construction of so-called factors follows the same lines as above. Portfolios are based on one characteristic and the factor is a long-short ensemble of one extreme portfolio minus the opposite extreme (small minus large for the size factor or high book-to-market ratio minus low book-to-market ratio for the value factor). Sometimes, subtleties include forming bivariate sorts and aggregating several portfolios together, as in the original contribution of . The most common factors are listed below, along with a few references. We refer to the books listed at the beginning of the chapter for a more exhaustive treatment of factor idiosyncrasies. For most anomalies, theoretical justifications have been brought forward, whether risk-based or behavioral. We list the most frequently cited factors below:

• Size (SMB = small firms minus large firms): Banz (1981), , , , and .
• Value (HML = high minus low: undervalued minus ‘growth’ firms): , , .
• Momentum (WML = winners minus losers): , and . The winners are the assets that have experienced the highest returns over the last year (sometimes the computation of the return is truncated to omit the last month). Cross-sectional momentum is linked, but not equivalent, to time series momentum (trend following), see e.g., and . Momentum is also related to contrarian movements that occur both at higher and lower frequencies (short-term and long-term reversals), see .
• Profitability (RMW = robust minus weak profits): , . In the former reference, profitability is measured as (revenues - (cost and expenses))/equity.
• Investment (CMA = conservative minus aggressive): , . Investment is measured via the growth of total assets (divided by total assets). Aggressive firms are those that experience the largest growth in assets.
• ‘Low risk’ (sometimes, BAB = betting against beta): , , , , and . In this case, the computation of risk changes from one article to the other (simple volatility, market beta, idiosyncratic volatility, etc.).

With the notable exception of the low risk premium, the most mainstream anomalies are kept and updated in the data library of Kenneth French (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). Of course, the computation of the factors follows a particular set of rules, but they are generally accepted in the academic sphere. Another source of data is the AQR repository: https://www.aqr.com/Insights/Datasets.

In the dataset we use for the book, we proxy the value anomaly not with the book-to-market ratio but with the price-to-book ratio (the book value is located in the denominator). As is shown in , the choice of the variable for value can have sizable effects.

Below, we import data from Ken French’s data library. We will use it later on in the chapter.

```r
library(quantmod)                         # Package for data extraction
library(xtable)                           # Package for LaTeX exports
min_date <- "1963-07-31"                  # Start date
max_date <- "2020-03-28"                  # Stop date
temp <- tempfile()
KF_website <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/"
KF_file <- "ftp/F-F_Research_Data_5_Factors_2x3_CSV.zip"
link <- paste0(KF_website, KF_file)       # Link of the file
download.file(link, temp, quiet = TRUE)   # Download!
FF_factors <- read_csv(unz(temp, "F-F_Research_Data_5_Factors_2x3.CSV"),
                       skip = 3) %>%                       # Check the number of lines to skip!
    rename(date = X1, MKT_RF = `Mkt-RF`) %>%               # Change the name of first columns
    mutate_at(vars(-date), as.numeric) %>%                 # Convert values to number
    mutate(date = ymd(parse_date_time(date, "%Y%m"))) %>%  # Date in right format
    mutate(date = rollback(date + months(1)))              # End of month date
FF_factors <- FF_factors %>%
    mutate(MKT_RF = MKT_RF / 100,                          # Scale returns
           SMB = SMB / 100,
           HML = HML / 100,
           RMW = RMW / 100,
           CMA = CMA / 100,
           RF = RF / 100) %>%
    filter(date >= min_date, date <= max_date)             # Finally, keep only recent points
knitr::kable(head(FF_factors), booktabs = TRUE,
             caption = "Sample of monthly factor returns.") # A look at the data (see table)
```
TABLE 3.1: Sample of monthly factor returns.

| date       | MKT_RF  | SMB     | HML     | RMW     | CMA     | RF     |
|------------|---------|---------|---------|---------|---------|--------|
| 1963-07-31 | -0.0039 | -0.0045 | -0.0094 | 0.0066  | -0.0115 | 0.0027 |
| 1963-08-31 | 0.0507  | -0.0082 | 0.0182  | 0.0040  | -0.0040 | 0.0025 |
| 1963-09-30 | -0.0157 | -0.0048 | 0.0017  | -0.0076 | 0.0024  | 0.0027 |
| 1963-10-31 | 0.0253  | -0.0130 | -0.0004 | 0.0275  | -0.0224 | 0.0029 |
| 1963-11-30 | -0.0085 | -0.0085 | 0.0170  | -0.0045 | 0.0222  | 0.0027 |
| 1963-12-31 | 0.0183  | -0.0190 | -0.0006 | 0.0007  | -0.0030 | 0.0029 |

Posterior to the discovery of these stylized facts, some contributions have aimed at building theoretical models that capture these properties. We cite a handful below:

• size and value: , , , , , ;
• momentum: , , , .

In addition, recent bridges have been built between risk-based factor representations and behavioural theories. We refer essentially to and and the references therein.

While these factors (i.e., long-short portfolios) exhibit time-varying risk premia and are magnified by corporate news and announcements (), it is well documented (and accepted) that they deliver positive returns over long horizons. We refer to and to the survey , as well as to the related bibliography, for technical details on estimation procedures of risk premia and the corresponding empirical results. Large-sample studies that document regime changes in factor premia were also carried out by and . Moreover, the predictability of returns is also time-varying (as documented in , and ), and estimation methods can be improved ().

In Figure 3.2, we plot the average monthly return, aggregated over each calendar year, for five common factors. The risk-free rate (which is not a factor per se) is the most stable, while the market factor (aggregate market returns minus the risk-free rate) is the most volatile. This makes sense because it is the only long equity factor among the five series.

```r
FF_factors %>%
    mutate(date = year(date)) %>%                       # Turn date into year
    gather(key = factor, value = value, -date) %>%      # Put in tidy shape
    group_by(date, factor) %>%                          # Group by year and factor
    summarise(value = mean(value)) %>%                  # Compute average return
    ggplot(aes(x = date, y = value, color = factor)) +  # Plot
    geom_line() + coord_fixed(500)                      # Fix x/y ratio
```

The individual attributes of investors who allocate towards particular factors are a blossoming topic. We list a few references below, even though they somewhat lie out of the scope of this book. show that value investors are older, wealthier and face lower income risk compared to growth investors, who are those in the best position to take financial risks. The study leads to different conclusions: it finds that the propensity to invest in value versus growth assets has roots in genetics and in life events (the latter effect being confirmed in , and the former being further detailed in a more general context in ). Psychological traits can also explain some factors: when agents extrapolate, they are likely to fuel momentum (this topic is thoroughly reviewed in ). Micro- and macro-economic consequences of these preferences are detailed in . To conclude this paragraph, we mention that theoretical models have also been proposed that link agents’ preferences and beliefs (via prospect theory) to market anomalies (see for instance ).

Finally, we highlight the need for replicability of factor premia and echo the recent editorial by . As is shown by and , many proclaimed factors are in fact very much data-dependent and often fail to deliver sustained profitability when the investment universe is altered or when the definition of the variable changes ().

Campbell Harvey and his co-authors, in a series of papers, have tried to synthesize the research on factors in , and . Their work underlines the need to set high bars for an anomaly to be called a ‘true’ factor. Increasing thresholds for $$p$$-values is only a partial answer, as it is always possible to resort to data snooping in order to find an optimized strategy that will fail out-of-sample but that will deliver a $$t$$-statistic larger than three (or even four). recommends resorting to a Bayesian approach which blends data-based significance with a prior into a so-called Bayesianized $$p$$-value (see subsection below).

Following this work, researchers have continued to explore the richness of this factor zoo. propose a tractable Bayesian estimation of large-dimensional factor models and evaluate all possible combinations of more than 50 factors, yielding an incredibly large number of coefficients. Combined with a Bayesianized procedure, this makes it possible to distinguish between pervasive and superfluous factors. use simulations of 2 million trading strategies to estimate the rate of false discoveries, that is, when a spurious factor is detected (type I error). They also advise using thresholds for $$t$$-statistics that are well above three. In a similar vein, also underline that sometimes true anomalies may be missed because of a one-time $$t$$-statistic that is too low (type II error).

The propensity of journals to publish positive results has led researchers to estimate the difference between reported returns and true returns. call this difference the publication bias and estimate it at roughly 12%. That is, if a published average return is 8%, the actual value may in fact be closer to (1-12%)*8% ≈ 7%. Qualitatively, this estimation of 12% is smaller than the out-of-sample reduction in returns found in .

### 3.2.4 Predictive regressions, sorts, and p-value issues

For simplicity, we assume a simple form: $$$\tag{3.2} \textbf{r} = a+b\textbf{x}+\textbf{e},$$$ where the vector $$\textbf{r}$$ stacks all returns of all stocks and $$\textbf{x}$$ is a lagged variable so that the regression is indeed predictive. If the estimate $$\hat{b}$$ is significant given a specified threshold, then it can be tempting to conclude that $$\textbf{x}$$ does a good job at predicting returns. Hence, long-short portfolios related to extreme values of $$\textbf{x}$$ (mind the sign of $$\hat{b}$$) are expected to generate profits. This is unfortunately often false because $$\hat{b}$$ gives information on the past ability of $$\textbf{x}$$ to forecast returns. What happens in the future may be another story.

Statistical tests are also used for portfolio sorts. Assume two extreme portfolios are expected to yield very different average returns (like very small cap versus very large cap, or strong winners versus bad losers). The portfolio returns are written $$r_t^+$$ and $$r_t^-$$. The simplest test for the mean is $$t=\sqrt{T}\frac{m_{r_+}-m_{r_-}}{\sigma_{r_+-r_-}}$$, where $$T$$ is the number of points, $$m_{r_\pm}$$ denotes the means of returns and $$\sigma_{r_+-r_-}$$ is the standard deviation of the difference between the two series, i.e., the volatility of the long-short portfolio. In short, the statistic can be viewed as a scaled Sharpe ratio (though usually these ratios are computed for long-only portfolios) and can in turn be used to compute $$p$$-values to assess the robustness of an anomaly. As is shown in and , many factors discovered by researchers fail to survive in out-of-sample tests.
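As a quick numerical illustration of this statistic (all inputs below are hypothetical round numbers, not estimates from any dataset):

```python
import math

T = 240                           # 20 years of monthly observations (hypothetical)
m_plus, m_minus = 0.010, 0.004    # Average monthly returns of the two extreme portfolios
sigma_ls = 0.03                   # Monthly volatility of the long-short portfolio

t = math.sqrt(T) * (m_plus - m_minus) / sigma_ls
print(round(t, 2))                # 3.1
```

A monthly spread of 60 basis points against a 3% long-short volatility thus clears the classical significance bar over 20 years, but would not over a much shorter sample: the $$\sqrt{T}$$ term makes the statistic very sensitive to sample length.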

One reason why people are overly optimistic about the anomalies they detect is the widespread reverse interpretation of the $$p$$-value. It is often thought of as the probability of one hypothesis (e.g., my anomaly exists) given the data. In fact, it is the opposite: it is the likelihood of the data sample, knowing that the hypothesis holds, \begin{align*} p\text{-value} &= P[D|H], \\ \text{target prob.} &= P[H|D]=\frac{P[D|H]}{P[D]}\times P[H], \end{align*} where $$H$$ stands for hypothesis and $$D$$ for data. The equality in the second row is a plain application of Bayes’ identity: the interesting probability is in fact a transform of the $$p$$-value.

Two articles (at least) discuss this idea. introduces Bayesianized $$p$$-values: $$$\tag{3.3} \text{Bayesianized } p-\text{value}=\text{Bpv}= e^{-t^2/2}\times\frac{\text{prior}}{1+e^{-t^2/2}\times \text{prior}} ,$$$ where $$t$$ is the $$t$$-statistic obtained from the regression (i.e., the one that defines the p-value) and prior is the analyst’s estimation of the odds that the hypothesis (anomaly) is true. The prior is coded as follows. Suppose there is a p% chance that the null holds (i.e., (1-p)% for the anomaly). The odds are coded as $$p/(1-p)$$. Thus, if the t-statistic is equal to 2 (corresponding to a p-value of 5% roughly) and the prior odds are equal to 6, then the Bpv is equal to $$e^{-2}\times 6 \times(1+e^{-2}\times 6)^{-1}\approx 0.448$$ and there is a 44.8% chance that the null is true. This interpretation stands in sharp contrast with the original $$p$$-value which cannot be viewed as a probability that the null holds. Of course, one drawback is that the level of the prior is crucial and solely user-specified.
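A direct implementation of equation (3.3), reproducing the numerical example above (the function name is ours):

```python
import math

def bayesianized_p_value(t_stat, prior_odds):
    """Equation (3.3): probability that the null holds, given a
    t-statistic and prior odds p/(1-p) in favour of the null."""
    w = math.exp(-t_stat**2 / 2) * prior_odds
    return w / (1 + w)

# The example from the text: t = 2 and prior odds equal to 6
print(round(bayesianized_p_value(2, 6), 3))   # 0.448
```

Note how strongly the output depends on the prior: with even odds (prior equal to 1), the same $$t=2$$ yields a Bpv of about 12%, which illustrates why the choice of the prior is the method's most delicate ingredient.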

The work of is very different but shares some key concepts, like the introduction of Bayesian priors in regression outputs. They show that coercing the predictive regression with an $$L^2$$ constraint (see the ridge regression in Chapter 5) amounts to introducing views on what the true distribution of $$b$$ is. The stronger the constraint, the more the estimate $$\hat{b}$$ will be shrunk towards zero. One key idea in their work is the assumption of a distribution for the true $$b$$ across many anomalies. It is assumed to be Gaussian and centered. The interesting parameter is the standard deviation: the larger it is, the more frequently significant anomalies are discovered. Notably, the authors show that this parameter changes through time and we refer to the original paper for more details on this subject.

### 3.2.5 Fama-MacBeth regressions

Another detection method was proposed by through a two-stage regression analysis of risk premia. The first stage is a simple estimation of the relationship (3.1): the regressions are run on a stock-by-stock basis over the corresponding time series. The resulting estimates $$\hat{\beta}_{n,k}$$ are then plugged into a second series of regressions: $$$r_{t,n}= \gamma_{t,0} + \sum_{k=1}^K\gamma_{t,k}\hat{\beta}_{n,k} + \varepsilon_{t,n},$$$ which are run date-by-date on the cross-section of assets. Theoretically, the betas would be known and the regression would be run on the $$\beta_{n,k}$$ instead of their estimated values. The $$\hat{\gamma}_{t,k}$$ estimate the premia of factor $$k$$ at time $$t$$. Under suitable distributional assumptions on the $$\varepsilon_{t,n}$$, statistical tests can be performed to determine whether these premia are significant or not. Typically, the statistic based on the time-aggregated (average) premia $$\hat{\gamma}_k=\frac{1}{T}\sum_{t=1}^T\hat{\gamma}_{t,k}$$, namely $$t_k=\frac{\hat{\gamma}_k}{\hat{\sigma}_k/\sqrt{T}}$$, is used in pure Gaussian contexts to assess whether or not the factor is significant ($$\hat{\sigma}_k$$ is the standard deviation of the $$\hat{\gamma}_{t,k}$$).

We refer to and for technical discussions on the biases and losses in accuracy that can be induced by standard ordinary least squares (OLS) estimations. Moreover, as the $$\hat{\beta}_{n,k}$$ in the second-pass regression are estimates, a second level of errors can arise (the so-called errors-in-variables problem). The interested reader will find some extensions and solutions in , and .
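The two-pass logic can be sketched compactly on simulated data (Python/NumPy, for illustration; the premia, loadings, and noise scales below are toy assumptions of ours, independent of the book's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 360, 50, 2                     # Months, assets, factors (arbitrary)

gamma_true = np.array([0.004, 0.002])    # True monthly factor premia (assumed)
f = gamma_true + 0.03 * rng.normal(size=(T, K))   # Factor returns (mean = premium)
beta = rng.uniform(0.5, 1.5, size=(N, K))         # True loadings
r = f @ beta.T + 0.02 * rng.normal(size=(T, N))   # Asset returns, zero alphas

# First pass: time-series regression of each asset on the factors
X1 = np.column_stack([np.ones(T), f])
B = np.linalg.lstsq(X1, r, rcond=None)[0][1:].T   # Estimated betas (N x K)

# Second pass: date-by-date cross-sectional regressions on the estimated betas
X2 = np.column_stack([np.ones(N), B])
G = np.linalg.lstsq(X2, r.T, rcond=None)[0]       # (K+1) x T gamma estimates

gamma_hat = G[1:].mean(axis=1)                    # Time-averaged premia
t_stats = gamma_hat / (G[1:].std(axis=1, ddof=1) / np.sqrt(T))
print(gamma_hat, t_stats)                         # Averages close to the true premia
```

Even in this clean setting, the date-by-date $$\hat{\gamma}_{t,k}$$ inherit the volatility of the factors themselves, which is why the time-aggregated statistic $$t_k$$ is the quantity of interest.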

Below, we perform regressions on our sample. We start with the first pass: individual estimation of betas. We build a dedicated function below and use some functional programming to automate the process. We stick to the original implementation of the estimation and perform synchronous regressions.

```r
nb_factors <- 5                                                     # Number of factors
data_FM <- left_join(data_ml %>%                                    # Join the 2 datasets
                         dplyr::select(date, stock_id, R1M_Usd) %>% # (with returns...
                         filter(stock_id %in% stock_ids_short),     # ... over some stocks)
                     FF_factors,
                     by = "date") %>%
    group_by(stock_id) %>%                                          # Grouping
    mutate(R1M_Usd = lag(R1M_Usd)) %>%                              # Lag returns
    ungroup() %>%
    na.omit() %>%                                                   # Remove missing points
    spread(key = stock_id, value = R1M_Usd)
models <- lapply(paste0("`", stock_ids_short,
                        "` ~  MKT_RF + SMB + HML + RMW + CMA"),     # Model spec
                 function(f){ lm(as.formula(f), data = data_FM,     # Call lm(.)
                                 na.action = "na.exclude") %>%
                     summary() %>%                                  # Gather the output
                     "$"(coef) %>%                                  # Keep only the coefs
                     data.frame() %>%                               # Convert to dataframe
                     dplyr::select(Estimate)}                       # Keep the estimates
)
betas <- matrix(unlist(models),
                ncol = nb_factors + 1, byrow = TRUE) %>%            # Extract the betas
    data.frame(row.names = stock_ids_short)                        # Format: row names
colnames(betas) <- c("Constant", "MKT_RF", "SMB", "HML", "RMW", "CMA") # Format: col names
```

TABLE 3.2: Sample of beta values (row numbers are stock IDs).

|    | Constant | MKT_RF | SMB   | HML    | RMW    | CMA    |
|----|----------|--------|-------|--------|--------|--------|
| 1  | 0.008    | 1.424  | 0.521 | 0.648  | 0.994  | -0.405 |
| 3  | -0.002   | 0.822  | 1.101 | 0.894  | 0.313  | -0.541 |
| 4  | 0.005    | 0.363  | 0.298 | -0.048 | 0.587  | 0.201  |
| 7  | 0.006    | 0.424  | 0.681 | 0.253  | 0.312  | 0.119  |
| 9  | 0.004    | 0.838  | 0.663 | 1.065  | 0.050  | 0.066  |
| 11 | -0.001   | 0.987  | 0.139 | 0.499  | -0.110 | -0.012 |

In the table, MKT_RF is the market return minus the risk free rate. The corresponding coefficient is often referred to as the beta, especially in univariate regressions. We then reformat these betas from Table 3.2 to prepare the second pass. Each line corresponds to one asset: the first 5 columns are the estimated factor loadings and the remaining ones are the asset returns (date by date).

```r
loadings <- betas %>%                            # Start from loadings (betas)
    dplyr::select(-Constant) %>%                 # Remove constant
    data.frame()                                 # Convert to dataframe
ret <- returns %>%                               # Start from returns
    dplyr::select(-date) %>%                     # Keep the returns only
    data.frame(row.names = returns$date) %>%     # Set row names
    t()                                          # Transpose
FM_data <- cbind(loadings, ret)                  # Aggregate both
```

TABLE 3.3: Sample of reformatted beta values (ready for regression).

|    | MKT_RF    | SMB       | HML        | RMW        | CMA        | 2000-01-31 | 2000-02-29 | 2000-03-31 |
|----|-----------|-----------|------------|------------|------------|------------|------------|------------|
| 1  | 1.4244867 | 0.5213076 | 0.6480181  | 0.9935662  | -0.4051943 | -0.036     | 0.263      | 0.031      |
| 3  | 0.8224423 | 1.1008094 | 0.8938485  | 0.3125507  | -0.5409821 | 0.077      | -0.024     | 0.018      |
| 4  | 0.3629633 | 0.2975909 | -0.0480948 | 0.5870450  | 0.2006145  | -0.016     | 0.000      | 0.153      |
| 7  | 0.4236196 | 0.6812306 | 0.2525918  | 0.3120793  | 0.1192179  | -0.009     | 0.027      | 0.000      |
| 9  | 0.8377579 | 0.6628723 | 1.0648010  | 0.0496863  | 0.0664127  | 0.032      | 0.076      | -0.025     |
| 11 | 0.9868678 | 0.1392741 | 0.4990199  | -0.1098016 | -0.0122229 | 0.144      | 0.258      | 0.049      |

We observe that the values of the first column (market betas) revolve around one, which is what we would expect. Finally, we are ready for the second round of regressions.

models <- lapply(paste("", returns$date, "", ' ~ MKT_RF + SMB + HML + RMW + CMA', sep = ""), function(f){ lm(as.formula(f), data = FM_data) %>% # Call lm(.) summary() %>% # Gather the output "$"(coef) %>%                                    # Keep only the coefs
data.frame() %>%                                 # Convert to dataframe
dplyr::select(Estimate)}                         # Keep only estimates
)
gammas <- matrix(unlist(models), ncol = nb_factors + 1, byrow = T) %>%    # Switch to dataframe
data.frame(row.names = returns$date) # & set row names colnames(gammas) <- c("Constant", "MKT_RF", "SMB", "HML", "RMW", "CMA") # Set col names TABLE 3.4: Sample of gamma (premia) values. Constant MKT_RF SMB HML RMW CMA 2000-01-31 -0.012 0.042 0.218 -0.135 -0.272 0.034 2000-02-29 0.012 0.076 -0.130 0.046 0.085 -0.028 2000-03-31 0.007 -0.012 -0.014 0.052 0.039 0.043 2000-04-30 0.137 -0.155 -0.104 0.160 0.076 -0.061 2000-05-31 0.050 -0.009 0.072 -0.096 -0.093 -0.053 2000-06-30 0.027 -0.029 -0.018 0.053 0.045 0.017 Visually, the estimated premia are also very volatile. We plot their estimated values for the market, SMB and HML factors. gammas[2:nrow(gammas),] %>% # Take gammas: # The first row is omitted because the first row of returns is undefined dplyr::select(MKT_RF, SMB, HML) %>% # Select 3 factors bind_cols(date = data_FM$date) %>%                              # Add date
gather(key = factor, value = gamma, -date) %>%                  # Put in tidy shape
ggplot(aes(x = date, y = gamma, color = factor)) +              # Plot
geom_line() + facet_grid( factor~. ) +                          # Lines & facets
scale_color_manual(values=c("#F87E1F", "#0570EA", "#F81F40")) + # Colors
coord_fixed(980)                                                # Fix x/y ratio
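Beyond visual inspection, the sequence of estimated gammas is classically summarized by its time series average, which is tested with a Fama-MacBeth $$t$$-statistic. Below is a minimal self-contained sketch on simulated premia (the numbers are illustrative and unrelated to the sample above):

```r
set.seed(42)                                       # Reproducibility
T_obs  <- 240                                      # 20 years of monthly premia
gamma  <- rnorm(T_obs, mean = 0.004, sd = 0.03)    # Simulated monthly premia
prem   <- mean(gamma)                              # Average premium
t_stat <- prem / (sd(gamma) / sqrt(T_obs))         # Fama-MacBeth t-statistic
p_val  <- 2 * (1 - pnorm(abs(t_stat)))             # Two-sided asymptotic p-value
c(premium = prem, t_stat = t_stat, p_value = p_val)
```

With premia this volatile, even 20 years of monthly data may fail to reject a zero average premium, which is one reason why significance debates in the factor literature are so persistent.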

The two spikes at the end of the sample signal potential collinearity issues; two factors seem to offset each other, with an unclear aggregate effect. This underlines the usefulness of penalized estimates (see Chapter 5).
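To see why collinearity is harmful, consider the small simulation below: returns are driven by one factor only, but the regression includes a second, nearly identical factor. OLS loadings become large and offsetting, while a ridge penalty (a preview of the penalizations of Chapter 5) shrinks them back to sensible magnitudes. All names and numbers are illustrative.

```r
set.seed(0)
n  <- 120
f1 <- rnorm(n)                       # First factor
f2 <- f1 + rnorm(n, sd = 0.01)       # Nearly collinear second factor
y  <- 0.5 * f1 + rnorm(n, sd = 0.1)  # Returns driven by f1 only
X  <- cbind(f1, f2)
b_ols   <- solve(t(X) %*% X) %*% t(X) %*% y                     # OLS loadings (erratic)
lambda  <- 1                                                    # Penalization intensity
b_ridge <- solve(t(X) %*% X + lambda * diag(2)) %*% t(X) %*% y  # Ridge loadings (stable)
cbind(b_ols, b_ridge)                                           # Compare the two estimates
```

The ridge solution splits the aggregate loading roughly equally between the two collinear factors instead of letting them diverge in opposite directions.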

### 3.2.6 Factor competition

The core purpose of factors is to explain the cross-section of stock returns. For theoretical and practical reasons, it is preferable to avoid redundancies among factors. Indeed, redundancy implies collinearity, which is known to perturb estimates (). In addition, when asset managers decompose the performance of their returns into factors, overlaps (high absolute correlations) between factors yield exposures that are less interpretable; positive and negative exposures may spuriously offset each other.

A simple protocol to sort out redundant factors is to run regressions of each factor against all others:

$$
\tag{3.4} f_{t,k} = a_k +\sum_{j\neq k} \delta_{k,j} f_{t,j} + \epsilon_{t,k}.
$$

The interesting metric is then the test statistic associated with the estimate of $$a_k$$. If $$a_k$$ is significantly different from zero, then the cross-section of (other) factors fails to explain exhaustively the average return of factor $$k$$. Otherwise, the return of the factor can be captured by exposures to the other factors and is thus redundant.
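To make the protocol concrete, here is a self-contained toy version of Equation (3.4) on simulated data: a third factor is built as a combination of two others plus noise, so the regression intercept should come out insignificant. All variable names are illustrative.

```r
set.seed(2021)
T_obs <- 600                                         # Number of periods
f1 <- rnorm(T_obs, mean = 0.005, sd = 0.04)          # Autonomous factor 1
f2 <- rnorm(T_obs, mean = 0.003, sd = 0.03)          # Autonomous factor 2
f3 <- 0.4 * f1 + 0.6 * f2 + rnorm(T_obs, sd = 0.01)  # Redundant factor, spanned by f1 and f2
fit <- summary(lm(f3 ~ f1 + f2))                     # Regression of Equation (3.4)
fit$coefficients["(Intercept)", c("Estimate", "Pr(>|t|)")]  # a_k and its p-value
```

A small intercept with a large p-value indicates that the average return of f3 is fully captured by its exposures to f1 and f2, i.e., that f3 is redundant.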

One mainstream application of this technique was performed in , in which the authors show that the HML factor is redundant when taking into account four other factors (Market, SMB, RMW and CMA). Below, we reproduce their analysis on an updated sample. We start our analysis directly with the database maintained by Kenneth French.

We can run the regressions that determine the redundancy of factors via the procedure defined in Equation (3.4).

```r
factors <- c("MKT_RF", "SMB", "HML", "RMW", "CMA")
models <- lapply(paste(factors, ' ~  MKT_RF + SMB + HML + RMW + CMA -', factors),
                 function(f){ lm(as.formula(f), data = FF_factors) %>% # Call lm(.)
                         summary() %>%                                 # Gather the output
                         "$"(coef) %>%                                 # Keep only the coefs
                         data.frame() %>%                              # Convert to dataframe
                         filter(rownames(.) == "(Intercept)") %>%      # Keep only the intercept
                         dplyr::select(Estimate, Pr...t..)}            # Keep the coef & p-value
)
alphas <- matrix(unlist(models), ncol = 2, byrow = T) %>%  # Switch from list to dataframe
    data.frame(row.names = factors)
# alphas  # To see the alphas (optional)
```

We obtain the vector of $$\alpha$$ values from Equation (3.4). Below, we format these figures along with $$p$$-value thresholds and export them in a summary table. The significance levels of coefficients are coded as follows: $$0<(***)<0.001<(**)<0.01<(*)<0.05$$.

```r
results <- matrix(NA, nrow = length(factors), ncol = length(factors) + 1) # Coefs
signif  <- matrix(NA, nrow = length(factors), ncol = length(factors) + 1) # p-values
for(j in 1:length(factors)){
    form  <- paste(factors[j], ' ~  MKT_RF + SMB + HML + RMW + CMA -', factors[j]) # Build model
    fit   <- lm(form, data = FF_factors) %>% summary()                  # Estimate model
    coef  <- fit$coefficients[,1]                                       # Keep coefficients
    p_val <- fit$coefficients[,4]                                       # Keep p-values
    results[j,-(j+1)] <- coef                                           # Fill matrix
    signif[j,-(j+1)]  <- p_val
}
signif[is.na(signif)] <- 1                                              # Kick out NAs
results <- results %>% round(3) %>% data.frame()                        # Basic formatting
results[signif<0.001] <- paste(results[signif<0.001]," (***)")          # 3 star signif
results[signif>0.001&signif<0.01] <-                                    # 2 star signif
    paste(results[signif>0.001&signif<0.01]," (**)")
results[signif>0.01&signif<0.05] <-                                     # 1 star signif
    paste(results[signif>0.01&signif<0.05]," (*)")
results <- cbind(as.character(factors), results)                        # Add dep. variable
colnames(results) <- c("Dep. Variable", "Intercept", factors)           # Add column names
```

TABLE 3.5: Factor competition among the Fama and French (2015) five factors.

| Dep. Variable | Intercept   | MKT_RF       | SMB          | HML         | RMW          | CMA          |
|---------------|-------------|--------------|--------------|-------------|--------------|--------------|
| MKT_RF        | 0.008 (***) | NA           | 0.264 (***)  | 0.102       | -0.345 (***) | -0.903 (***) |
| SMB           | 0.003 (*)   | 0.134 (***)  | NA           | 0.077       | -0.428 (***) | -0.126       |
| HML           | 0           | 0.027        | 0.041        | NA          | 0.151 (***)  | 1.015 (***)  |
| RMW           | 0.004 (***) | -0.091 (***) | -0.222 (***) | 0.149 (***) | NA           | -0.278 (***) |
| CMA           | 0.002 (***) | -0.109 (***) | -0.03        | 0.457 (***) | -0.128 (***) | NA           |

We confirm that the HML factor remains redundant when the four others are present in the asset pricing model. The figures we obtain are very close to the ones in the original paper (), which makes sense, since we only add 5 years to their initial sample.

At a more macro level, researchers also try to determine which models (i.e., combinations of factors) are the most likely, given the data empirically observed (and possibly given priors formulated by the econometrician). For instance, this stream of literature seeks to quantify to what extent the 3-factor model of  outperforms the 5 factors in . In this direction,  introduce a novel computation for p-values that compares the relative likelihood that two models pass a zero-alpha test. More generally, the Bayesian method of  was subsequently improved by  - see also  and  (an R package exists for the former: czfactor).
For a discussion on model comparison from a transaction cost perspective, we refer to . Lastly, even the optimal number of factors is a subject of disagreement in recent work. While the traditional literature focuses on a limited number (3 to 5) of factors, more recent research by , , and  advocates the use of at least 15 factors (in contrast,  argue that a small number of latent factors may suffice).  even find that the number of characteristics that help explain the cross-section of returns varies in time.8

### 3.2.7 Advanced techniques

The ever-increasing number of factors, combined with their importance in asset management, has led researchers to craft more subtle methods to “organize” the so-called factor zoo and, more importantly, to detect spurious anomalies and compare different asset pricing model specifications. We list a few of them below.

-  combine LASSO selection with Fama-MacBeth regressions to test whether new factor models are worth it. They quantify the gain of adding one new factor to a set of predefined factors and show that many factors reported in papers published in the 2010 decade do not add much incremental value;
- (in a similar vein)  use bootstrap on orthogonalized factors. They make the case that correlation among predictors is a major issue and their method aims to solve this problem. Their lengthy procedure seeks to test whether the maximal additional contribution of a candidate variable is significant;
-  compare asset pricing models through squared maximum Sharpe ratios;
-  estimate factor risk premia using a three-pass method based on principal component analysis;
-  disentangle priced and non-priced factors via a combination of principal component analysis and regressions;
-  warn against factor misspecification (when spurious factors are included in the list of regressors). Traded factors (resp. macro-economic factors) seem more likely (resp. less likely) to yield robust identifications (see also ).
There is obviously no infallible method, but the number of contributions in the field highlights the need for robustness. This is evidently a major concern when crafting investment decisions based on factor intuitions. One major hurdle for short-term strategies is the likely time-varying nature of factors. We refer for instance to , , and  for practical results and to  and  for more theoretical treatments (with additional empirical results).

## 3.3 Factors or characteristics?

The decomposition of returns into linear factor models is convenient because of its simple interpretation. There is nonetheless a debate in the academic literature about whether firm returns are indeed explained by exposure to macro-economic factors or simply by the characteristics of firms. In their early study,  argue that one explanation of the value premium comes from incorrect extrapolation of past earning growth rates. Investors are overly optimistic about firms that have recently been profitable. Consequently, future returns are (also) driven by the core (accounting) features of the firm. The question is then to disentangle which effect is the most pronounced when explaining returns: characteristics versus exposures to macro-economic factors.

In their seminal contribution on this topic,  provide evidence in favour of the former (two follow-up papers are  and ). They show that firms with high book-to-market ratios or small capitalizations display higher average returns, even if they are negatively loaded on the HML or SMB factors. Therefore, it seems that it is indeed the intrinsic characteristics that matter, and not the factor exposures.

For further material on the role of characteristics in return explanation or prediction, we refer to the following contributions:

-  estimate predictive regressions based on firm characteristics and show that it is possible to build profitable portfolios based on the resulting predictions. Their method was subsequently enhanced with the adaptive LASSO by Guo (2020).
- Section 2.5.2 in  surveys pre-2010 results on this topic;
-  find that characteristics explain a larger proportion of variation in estimated expected returns than factor loadings;
-  reconcile factor-based explanations of premia with a theoretical model in which some agents' demands are sentiment driven;
-  show with penalized regressions that 20 to 30 characteristics (out of 94) are useful for the prediction of monthly returns of US stocks. Their methodology is interesting: they regress returns against characteristics to build forecasts and then regress the returns on the forecasts to assess whether they are reliable. The latter regression uses a LASSO-type penalization (see Chapter 5) so that useless characteristics are excluded from the model. The penalization is extended to the elastic net in .
-  and  both estimate models in which factors are latent but loadings (betas) and possibly alphas depend on characteristics.  generalizes the first approach by introducing regime switching. In contrast,  and  estimate latent factors without any link to particular characteristics (and provide large-sample asymptotic properties of their methods).
- In the same vein as ,  and  discuss potential errors that arise when working with portfolio sorts that yield long-short returns. The authors show that in some cases, tests based on this procedure may be deceitful. This happens when the characteristic chosen to perform the sort is correlated with an external (unobservable) factor. They propose a novel regression-based approach aimed at bypassing this problem.

More recently and in a separate stream of literature,  have introduced a demand model in which investors form their portfolios according to their preferences towards particular firm characteristics. They show that this allows them to mimic the portfolios of large institutional investors. In their model, aggregate demands (and hence, prices) are directly linked to characteristics, not to factors.
In a follow-up paper,  show that a few sets of characteristics suffice to predict future returns. They also show that, based on institutional holdings from the UK and the US, the largest investors are those who are the most influential in the formation of prices. In a similar vein,  derive an elegant (theoretical) general equilibrium model that generates some well-documented anomalies (size, book-to-market). The models of  and  are also able to theoretically generate known anomalies. Finally, in , characteristics influence returns via the role they play in the predictability of dividend growth. This paper discusses the asymptotic case when the number of assets and the number of characteristics are proportional and both increase to infinity.

## 3.4 Hot topics: momentum, timing and ESG

### 3.4.1 Factor momentum

A recent body of literature unveils a time series momentum property of factor returns. For instance,  report that autocorrelation patterns within these returns are statistically significant.9 Similar results are obtained in . In the same vein,  make the case that the industry momentum found in  can in fact be explained by this factor momentum. Going even further,  conclude that the original momentum factor is in fact the aggregation of the autocorrelation that can be found in all other factors.

Acknowledging the profitability of factor momentum,  seeks to understand its source and decomposes stock factor momentum portfolios into two components: a factor timing portfolio and a static portfolio. The former seeks to profit from the serial correlation of factor returns, while the latter tries to harness factor premia. The author shows that it is the static portfolio that explains the larger portion of factor momentum returns. In , the same author presents a new estimator to gauge factor momentum predictability. Words of caution are provided in .

Given the data obtained on Ken French's website, we compute the autocorrelation function (ACF) of factors.
We recall that
$$\text{ACF}_k(\textbf{x}_t)=\mathbb{E}[(\textbf{x}_t-\bar{\textbf{x}})(\textbf{x}_{t+k}-\bar{\textbf{x}})].$$

```r
library(cowplot)    # For stacking plots
library(forecast)   # For the autocorrelation function
acf_SMB <- ggAcf(FF_factors$SMB, lag.max = 10) + labs(title = "")  # ACF SMB
acf_HML <- ggAcf(FF_factors$HML, lag.max = 10) + labs(title = "")  # ACF HML
acf_RMW <- ggAcf(FF_factors$RMW, lag.max = 10) + labs(title = "")  # ACF RMW
acf_CMA <- ggAcf(FF_factors$CMA, lag.max = 10) + labs(title = "")  # ACF CMA
plot_grid(acf_SMB, acf_HML, acf_RMW, acf_CMA,                      # Plot
          labels = c('SMB', 'HML', 'RMW', 'CMA'))
```

Of the four chosen series, only the size factor is not significantly autocorrelated at the first order.
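The significance bands in such plots are the standard white-noise bounds $$\pm 1.96/\sqrt{T}$$. Below is a self-contained check on a simulated AR(1) series (illustrative data, not the Ken French factors):

```r
set.seed(1)
T_obs <- 600                                                    # Sample size
x  <- as.numeric(arima.sim(model = list(ar = 0.3), n = T_obs))  # AR(1) series, rho = 0.3
xc <- x - mean(x)                                               # Demeaned series
rho_1 <- sum(xc[-1] * xc[-T_obs]) / sum(xc^2)                   # Sample lag-1 autocorrelation
band  <- 1.96 / sqrt(T_obs)                                     # 95% white-noise band
rho_1 > band                                                    # Significant at the first order?
```

The estimate coincides with the first non-zero-lag bar that acf() or ggAcf() would display, and for a persistent series like this one it should lie outside the band.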

### 3.4.2 Factor timing

Given the abundance of evidence of the time-varying nature of factor premia, it is legitimate to wonder if it is possible to predict when factors will perform well or badly. The evidence on the effectiveness of timing is diverse: positive for , , ,  and , negative for , and mixed for . There is no consensus on which predictors to use (general macroeconomic indicators in , stock issuances versus repurchases in , and aggregate fundamental data in ). A method for building reasonable timing strategies for long-only portfolios with sustainable transaction costs is laid out in . In ML-based factor investing, it is possible to resort to more granularity by combining firm-specific attributes with large-scale economic data, as we explain in Section 4.7.2.
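As a purely pedagogical illustration of the timing idea (not any of the cited strategies), one can hold a factor only when its trailing 12-month average return is positive. A sketch on simulated, mildly persistent factor returns:

```r
set.seed(7)
T_obs <- 480                                          # 40 years of monthly data
f <- as.numeric(arima.sim(list(ar = 0.2), n = T_obs, sd = 0.02)) + 0.002 # Persistent returns
signal <- stats::filter(f, rep(1/12, 12), sides = 1)  # Trailing 12-month mean
pos    <- as.numeric(signal > 0)                      # Long when the trailing mean is positive
timed  <- f[13:T_obs] * pos[12:(T_obs - 1)]           # Previous signal applied to current return
c(buy_and_hold = mean(f[13:T_obs]), timed = mean(timed)) # Compare average returns
```

Whether such rules survive transaction costs and estimation noise on real factor data is precisely what the references above disagree on.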

### 3.4.3 The green factors

The demand for ethical financial products has sharply risen during the 2010 decade, leading to the creation of funds dedicated to socially responsible investing (SRI - see ). Though this phenomenon is not really new (, ), its acceleration has prompted research about whether or not characteristics related to ESG criteria (environment, social, governance) are priced. Dozens and even possibly hundreds of papers have been devoted to this question, but no consensus has been reached. More and more, researchers study the financial impact of climate change (see , and ) and the societal push for responsible corporate behavior (, ). We gather below a very short list of papers that suggests conflicting results:

• favorable: ESG investing works (, ), can work (, ), or can at least be rendered efficient (). A large meta-study reports overwhelming favorable results (), but of course, they could well stem from the publication bias towards positive results.
• unfavorable: Ethical investing is not profitable according to and . An ESG factor should be long unethical firms and short ethical ones ().
• mixed: ESG investing may be beneficial globally but not locally (). Portfolios relying on ESG screening do not significantly outperform those with no screening but are subject to lower levels of volatility (, ). As is often the case, the devil is in the details, and results depend on whether to use E, S or G ().

On top of these contradicting results, several articles point towards complexities in the measurement of ESG. Depending on the chosen criteria and on the data provider, results can change drastically (see , and ).

We end this short section by noting that ESG criteria can, of course, be directly integrated into ML models, as is for instance done in .

## 3.6 Coding exercises

1. Compute the annual returns of the growth versus value portfolios, that is, the average returns of firms with above median versus below median price-to-book ratios (the variable is called `Pb` in the dataset).
2. Same exercise, but compute the monthly returns and plot the value (through time) of the corresponding portfolios.
3. Instead of a unique threshold, compute simply sorted portfolios based on quartiles of market capitalization. Compute their annual returns and plot them.

1. This has been a puzzle for the value factor during the 2010 decade, during which the factor performed poorly (see  and ).  argue that it is because some fundamentals of value firms (like ROE) have not improved at the rate of those of growth firms. This underlines that it is hard to pick which fundamental metrics matter and that their importance varies with time.  even find that resorting to AI to make sense of (and mine) the fundamentals' zoo only helps marginally.↩︎

2. Originally, work with the market beta only: $$r_{t,n}=\alpha_n+\beta_nr_{t,M}+\epsilon_{t,n}$$ and the second pass included nonlinear terms: $$r_{t,n}=\gamma_{n,0}+\gamma_{t,1}\hat{\beta}_{n}+\gamma_{t,2}\hat{\beta}^2_n+\gamma_{t,3}\hat{s}_n+\eta_{t,n}$$, where the $$\hat{s}_n$$ are risk estimates for the assets that are not related to the betas. It is then possible to perform asset pricing tests to infer some properties. For instance, test whether betas have a linear influence on returns or not ($$\mathbb{E}[\gamma_{t,2}]=0$$), or test the validity of the CAPM (which implies $$\mathbb{E}[\gamma_{t,0}]=0$$).↩︎

3. Older tests for the number of factors in linear models include and .↩︎

4. Autocorrelation in aggregate/portfolio returns is a widely documented effect since the seminal paper (see also ).↩︎

5. In the same spirit, see also and .↩︎