Wednesday, December 21, 2016

19/12/16: Market Anomalies and Data Mining: Some Pretty Tough Love from Data


Investment anomalies (in other words, the efficacy of exogenous factors in determining abnormal returns to investment) are a puzzle for traditional investment analysis. In basic terms, we normally think of investment as an undertaking that offers no ‘free lunch’: if markets are liquid and deep, then, once we control for risk factors, taxes and transaction costs, an average investor cannot expect to earn an above-market return. Put differently, there should be no way to systematically (luck aside) beat the market.

Anomalies represent cases where some factors do, in fact, generate such abnormal returns. There is a range of classic anomalies, the most commonly known being Small Firms Outperform, January Effect, Low Book Value, Under-dogs or Discounted Assets or Dogs of the Dow, Reversals, Days of the Week, etc. In fact, there is an entire analytics industry built around markets that does one thing: mine for factors that can give investors a leg up on the competition, that is, find anomalies.

One recent paper has identified a list of some 314 factors that were found - in the literature - to generate abnormal returns. As noted by John Cochrane: “We thought 100% of the cross-sectional variation in expected returns came from the CAPM, now we think that’s about zero and a zoo of new factors describes the cross section.”

A recent paper published by NBER and authored by Juhani Linnainmaa and Michael Roberts (see link below) effectively tests Cochrane’s proposition. To do this, the authors “examine cross-sectional anomalies in stock returns using hand-collected accounting data extending back to the start of the 20th century. Specifically, we investigate three potential explanations for these anomalies: unmodeled risk, mispricing, and data-snooping.” In other words, the authors look into three reasons why anomalies can exist:

  1. Unmodeled risk reflects the view that some of the risk premium paid out in the form of investment returns is not captured by traditional models of risk-return relations;
  2. Mispricing reflects the view that market participants can routinely, and over the long run, misprice risk; and
  3. The data-snooping view implies that anomalies generate returns in the historical data that do not replicate in forward-looking implementation, because these anomalies basically arise from ad hoc empirical data mining.

The authors argue that “each of these explanations generate different testable implications across three eras encompassed by our data: (1) pre-sample data existing before the discovery of the anomaly, (2) in-sample data used to identify the anomaly, and (3) post-sample data accumulating after identification of the anomaly.”
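As a rough illustration of the three-era comparison, the sketch below splits a monthly return series into pre-sample, in-sample and post-sample periods and computes the average return and its t-statistic for each. The function names (`mean_and_t`, `split_by_era`), the sample window and the toy return series are all assumptions made for illustration; this is not the authors' code or data.

```python
import math
import statistics


def mean_and_t(returns):
    """Average return and t-statistic for the null hypothesis of a zero mean."""
    n = len(returns)
    avg = statistics.fmean(returns)
    std_err = statistics.stdev(returns) / math.sqrt(n)
    return avg, avg / std_err


def split_by_era(monthly, sample_start, sample_end):
    """Split {(year, month): return} into pre-, in- and post-sample lists.

    sample_start and sample_end are (year, month) tuples marking the window
    used by the (hypothetical) original study; both ends are inclusive.
    """
    eras = {"pre": [], "in": [], "post": []}
    for date in sorted(monthly):
        if date < sample_start:
            eras["pre"].append(monthly[date])
        elif date <= sample_end:
            eras["in"].append(monthly[date])
        else:
            eras["post"].append(monthly[date])
    return eras


# Hypothetical toy data (percent per month): a larger premium inside the
# assumed original sample window (1963-1990) than before or after it.
toy = {
    (year, month): (1.0 if 1963 <= year <= 1990 else 0.4) + 0.5 * (-1) ** month
    for year in range(1940, 2015)
    for month in range(1, 13)
}

eras = split_by_era(toy, (1963, 1), (1990, 12))
for name, rets in eras.items():
    avg, t = mean_and_t(rets)
    print(f"{name}: {len(rets)} months, mean {avg:.2f}% per month, t = {t:.1f}")
```

Comparing the three rows is the whole exercise: a premium that is large in-sample but much smaller on either side of the window is the pattern the authors associate with data-snooping.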

In their first set of tests, the authors focus on profitability and investment factors, because prior literature has shown that “these factors, in concert with the market and size factors, capture much of the cross-sectional variation in stock returns.”

Finding 1: the authors “find no statistically reliable premiums on the profitability and investment factors in the pre-1963 sample period… Between 1963 and 2014, these factors average” statistically and financially significant returns of “30 and 25 basis points per month, respectively.”

Finding 2: “The attenuations of the investment and profitability premiums in the pre-1963 data are representative of most of the other 33 anomalies that we examine. Just eight out of the 36 (investment, profitability, value, and 33 others) earn average returns that are positive and statistically significant at the 5% level in the pre-1963 period.”

Finding 3: All of the measures of abnormal returns used in the study generate premiums that “decrease sharply and statistically significantly when we move out of the original study’s sample period by going either backward or forward in time.” In other words, anomalies tend to disappear or weaken every time the authors extend the time horizon beyond the one used in the original study that uncovered the anomaly.

As the authors note, “these findings are consistent with data-snooping as the anomalies are clearly sensitive to the choice of sample period.”

How? “...If the anomalies are a consequence of multidimensional risk that is not accurately accounted for by the empirical model (i.e., unmodeled risk), then we would have expected them to be similar across periods, absent structural breaks in the risks that matter to investors. Similarly, if the anomalies are a consequence of mispricing, then we would have expected them to be larger during the pre-discovery sample period when limits to arbitrage, such as transaction costs, were greater.”
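The data-snooping mechanism itself can be shown with a small simulation. The sketch below is a toy, not the paper's methodology: it generates many hypothetical "factors" whose true premium is zero by construction, "discovers" those with an in-sample t-statistic above a cutoff, and then compares their in-sample and out-of-sample average returns. The function name, the number of factors, the sample length, the cutoff and the random seed are all assumptions.

```python
import math
import random
import statistics


def t_stat(xs):
    """t-statistic for the null hypothesis that the mean of xs is zero."""
    return statistics.fmean(xs) / (statistics.stdev(xs) / math.sqrt(len(xs)))


def snooping_experiment(n_factors=500, n_months=120, t_cutoff=2.0, seed=0):
    """Select pure-noise factors on in-sample t-stats and check them out of sample."""
    rng = random.Random(seed)
    in_means, out_means = [], []
    for _ in range(n_factors):
        # True premium is zero: every monthly return is pure noise (in %).
        in_sample = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
        out_sample = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
        if t_stat(in_sample) > t_cutoff:  # the factor gets "discovered"
            in_means.append(statistics.fmean(in_sample))
            out_means.append(statistics.fmean(out_sample))
    nan = float("nan")
    return {
        "n_selected": len(in_means),
        "in_sample_mean": statistics.fmean(in_means) if in_means else nan,
        "out_of_sample_mean": statistics.fmean(out_means) if out_means else nan,
    }


result = snooping_experiment()
print(result)
```

Because selection is made on in-sample performance alone, the "discovered" factors look profitable in the window used to find them and lose that edge outside it, even though none of them carries a real premium. In real data the interesting question is how much of the premium survives outside the window.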

But there is a note of caution due. “Our results do not suggest that all return anomalies are spurious. The average in-sample anomaly earns a CAPM alpha of 32 basis points per month (t-value = 10.87). The average alpha is 13 basis points (t-value = 4.42) per month for the pre-discovery sample and 14 basis points (t-value = 4.06) for the post-discovery sample. Although these estimates lie far below the in-sample numbers, they are highly statistically significant.”
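For readers who want to see what a CAPM alpha and its t-value amount to, here is a minimal sketch: regress a strategy's excess return on the market's excess return and read the alpha off the intercept. It assumes plain OLS with homoskedastic standard errors, which may differ from the standard errors used in the paper; the function name `capm_alpha` and the toy numbers are hypothetical.

```python
import math
import statistics


def capm_alpha(excess_returns, market_excess):
    """OLS fit of r_t = alpha + beta * m_t + e_t.

    Returns (alpha, beta, t-value of alpha), using classical
    homoskedastic standard errors (an assumption of this sketch).
    """
    n = len(excess_returns)
    if n != len(market_excess) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    x_bar = statistics.fmean(market_excess)
    y_bar = statistics.fmean(excess_returns)
    sxx = sum((x - x_bar) ** 2 for x in market_excess)
    sxy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(market_excess, excess_returns)
    )
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum(
        (y - alpha - beta * x) ** 2 for x, y in zip(market_excess, excess_returns)
    )
    resid_var = ssr / (n - 2)
    se_alpha = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return alpha, beta, alpha / se_alpha


# Hypothetical toy series, in percent per month.
market = [1.0, 2.0, 3.0, 4.0]
strategy = [2.6, 4.4, 6.4, 8.6]
alpha, beta, t_alpha = capm_alpha(strategy, market)
print(f"alpha = {alpha * 100:.0f} bp per month, beta = {beta:.2f}, t = {t_alpha:.2f}")
```

With returns measured in percent per month, multiplying alpha by 100 gives basis points, so an alpha of 0.32 corresponds to the 32 basis points per month quoted for the average in-sample anomaly.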

The kicker is that “investors, however, face the uncertainty of not knowing which anomalies are real and which are spurious [or due to data mining], and so they need to treat them with caution. …because data-mining bias affects many facets of returns - averages, volatilities, and correlations - it is best to test asset pricing models out of sample,” or, where that is not possible (perhaps because data are scarce), by selecting a model or factor that “is able to explain half of the in-sample alpha”.




Full paper: Linnainmaa, Juhani T. and Roberts, Michael R., The History of the Cross Section of Stock Returns (December 2016). NBER Working Paper No. w22894. https://ssrn.com/abstract=2880332
