
Friday, June 16, 2017

16/6/17: Replicating Scientific Research: Ugly Truth


Continuing with the theme of 'What I've been reading lately?', here is a smashing paper on the 'accuracy' of empirical economic studies.

The paper, authored by Kewei Hou, Chen Xue and Lu Zhang, and titled "Replicating Anomalies" (the most recent version is dated June 12, 2017; an earlier version is also available via NBER), effectively blows the whistle on what is going on in empirical research in economics and finance. Per the authors, the vast literature that detects financial market anomalies (deviations from the efficient markets hypothesis / economic rationality) "is infested with widespread p-hacking".

What's p-hacking? Well, it's a shady practice whereby researchers manipulate sample criteria (selectively including or excluding data points from estimation) and test procedures (including model specifications and selective reporting of favourable test results) until insignificant results become significant. In other words, under p-hacking, researchers attempt to superficially maximise the significance of their models and explanatory variables or, put differently, to achieve results that confirm their intuition or biases.
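To see why specification searching inflates false positives, here is a minimal simulation sketch (my illustration, not from the paper). It generates returns whose true mean is zero, so any 'significant' result is spurious, and lets a hypothetical researcher try 20 arbitrary sample filters and keep the smallest p-value. The function names, the sample sizes and the 'drop roughly 20% of observations' filter are all assumptions, and p-values use a normal approximation to the t-test.

```python
import math
import random

def two_sided_p(sample):
    """Two-sided p-value for H0: mean == 0, using a normal approximation
    to the t-statistic (adequate for samples of ~100 observations)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t = mean / math.sqrt(var / n)
    return math.erfc(abs(t) / math.sqrt(2))

def best_p_after_search(rng, n_obs=120, n_specs=20):
    """One simulated 'study' on pure noise (the true mean return is zero).
    A hypothetical researcher tries n_specs arbitrary sample filters, each
    dropping roughly 20% of observations, and keeps the smallest p-value."""
    returns = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = two_sided_p(returns)
    for _ in range(n_specs):
        subsample = [r for r in returns if rng.random() > 0.2]
        best = min(best, two_sided_p(subsample))
    return best

def false_positive_rate(n_studies=1000, seed=42, **kwargs):
    """Share of null 'studies' reported as significant at the 5% level."""
    rng = random.Random(seed)
    hits = sum(best_p_after_search(rng, **kwargs) < 0.05 for _ in range(n_studies))
    return hits / n_studies

honest = false_positive_rate(n_specs=0)   # no specification search
hacked = false_positive_rate(n_specs=20)  # 20 tries, best one reported
print(f"honest: {honest:.1%}, p-hacked: {hacked:.1%}")
```

Without any search, the false-positive rate stays near the nominal 5%; with 20 tries it climbs well above that, even though there is nothing real to detect.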

What are anomalies? Anomalies are departures of market outcomes (e.g. share prices) from the predictions generated by models consistent with rational expectations and the efficient markets hypothesis. In other words, anomalies occur when market efficiency fails.

There are scores of anomalies detected in the academic literature, prompting many researchers to advocate abandoning the idea that markets are efficient, in all its forms, weak and strong.

Hou, Xue and Zhang put these anomalies to the test. They compile "a large data library with 447 anomalies". The authors then control for a key problem with data across many studies: microcaps. Microcaps (stocks of very small capitalization firms) are numerous in the markets, accounting for roughly 60% of all stocks, but represent only 3% of total market capitalization. This is true for key markets, such as NYSE, Amex and NASDAQ. Yet, as the authors note, evidence shows that microcaps "not only have the highest equal-weighted returns, but also the largest cross-sectional standard deviations in returns and anomaly variables among microcaps, small stocks, and big stocks." In other words, they are a higher-risk, higher-return class of securities. Despite this, "many studies overweight microcaps with equal-weighted returns, and often together with NYSE-Amex-NASDAQ breakpoints, in portfolio sorts." Worse, hundreds of studies use a 1970s regression technique that actually assigns even more weight to microcaps. In simple terms, microcaps are the most common outliers, yet they are given either the same weight in the analysis as other stocks or an elevated one, even though they play little role in driving the actual markets (they account for only about 3% of total market capitalization).
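A toy calculation shows why the weighting scheme matters. Assume a hypothetical market of 10 stocks: six microcaps (60% of names) that together hold 3% of total capitalization and earn 3% a month, and four large stocks holding the other 97% and earning 1% a month. All figures are made up for illustration.

```python
def equal_weighted_return(returns):
    """Average return with every stock counted once, regardless of its size."""
    return sum(returns) / len(returns)

def value_weighted_return(returns, market_caps):
    """Average return with each stock weighted by its share of total market capitalization."""
    total_cap = sum(market_caps)
    return sum(r * cap for r, cap in zip(returns, market_caps)) / total_cap

# Hypothetical market: 6 microcaps worth 3% of total cap, 4 large stocks worth 97%.
caps = [0.5] * 6 + [24.25] * 4
rets = [0.03] * 6 + [0.01] * 4  # monthly returns: microcaps 3%, large stocks 1%

microcap_cap_share = sum(caps[:6]) / sum(caps)
ew = equal_weighted_return(rets)
vw = value_weighted_return(rets, caps)
print(f"microcap share of market cap: {microcap_cap_share:.0%}")
print(f"equal-weighted: {ew:.2%}, value-weighted: {vw:.2%}")
```

Equal weighting reports a 2.2% average monthly return, more than double the 1.06% value-weighted figure, purely because the microcaps' returns are counted once per name rather than per dollar invested.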

So the study corrects for these problems (using NYSE breakpoints and value-weighted returns) and finds that, once microcaps are accounted for, a grand total of 286 anomalies (64% of all anomalies studied), "including 95 out of 102 liquidity variables (93%) are insignificant at the 5% level." Under a stricter significance test (a t-statistic cutoff of 3), the number of insignificant anomalies rises to 380 (85% of all anomalies). In other words, the original studies' claims that these anomalies were significant enough to warrant rejecting market efficiency do not hold up once one recognizes one basic and simple problem with the data. Worse, per the authors, "even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Among the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3)."
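The 5% level and the stricter t = 3 cutoff can be made concrete. For a long-short anomaly portfolio, a common test statistic is the average monthly return divided by its standard error. The sketch below assumes i.i.d. monthly returns (published studies often apply autocorrelation corrections such as Newey-West), and the example spreads and volatilities are hypothetical.

```python
import math

def anomaly_t_stat(mean_monthly, sd_monthly, n_months):
    """t-statistic of an anomaly's average monthly long-short return,
    assuming i.i.d. returns (no autocorrelation correction)."""
    return mean_monthly / (sd_monthly / math.sqrt(n_months))

def verdict(t):
    """Classify a t-statistic against the 5% bar (|t| >= 1.96) and the stricter |t| >= 3 bar."""
    if abs(t) >= 3.0:
        return "significant even at |t| >= 3"
    if abs(t) >= 1.96:
        return "significant at 5% only"
    return "insignificant at 5%"

# Hypothetical anomaly: 0.30% average monthly spread, 3% monthly volatility, 50 years of data.
t = anomaly_t_stat(0.30, 3.0, 600)
print(f"t = {t:.2f}: {verdict(t)}")

# Hypothetical: the same spread halves once microcaps are down-weighted.
t_vw = anomaly_t_stat(0.15, 3.0, 600)
print(f"t = {t_vw:.2f}: {verdict(t_vw)}")
```

Halving the average spread halves the t-statistic, which is how an anomaly that clears the 5% bar under equal weighting can drop below it once microcaps no longer dominate.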

This is pretty damning for those of us who believe, based on empirical results published over the years, that markets are boundedly efficient, and it is outright savaging for those who claim that markets are perfectly inefficient. But this tendency of researchers to silver-plate statistics is hardly new.

Hou, Xue and Zhang provide a nice summary of research on p-hacking and the non-replicability of statistical results across a range of fields. It is worth reading, because it significantly dents one's confidence in the quality of peer review and of scientific research.

As the authors note, "in economics, Leamer (1983) exposes the fragility of empirical results to small specification changes, and proposes to “take the con out of econometrics” by reporting extensive sensitivity analysis to show how key results vary with perturbations in regression specification and in functional form." That call was never taken up by the research community.

"In an influential study, Dewald, Thursby, and Anderson (1986) attempt to replicate empirical results published at Journal of Money, Credit, and Banking [a top-tier journal], and find that inadvertent errors are so commonplace that the original results often cannot be reproduced."

"McCullough and Vinod (2003) report that nonlinear maximization routines from different software packages often produce very different estimates, and many articles published at American Economic Review [highest rated journal in economics] fail to test their solutions across different software packages."

"Chang and Li (2015) report a success rate of less than 50% from replicating 67 published papers from 13 economics journals, and Camerer et al. (2016) show a success rate of 61% from replicating 18 studies in experimental economics."

"Collecting more than 50,000 tests published in American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics, [three top rated journals in economics] Brodeur, Lé, Sangnier, and Zylberberg (2016) document a troubling two-humped pattern of test statistics. The pattern features a first hump with high p-values, a sizeable under-representation of p-values just above 5%, and a second hump with p-values slightly below 5%. The evidence indicates p-hacking that authors search for specifications that deliver just-significant results and ignore those that give just-insignificant results to make their work more publishable."
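The 'missing mass just above 5%' pattern can be checked crudely with a caliper-style comparison: count reported p-values in a narrow window just below the threshold and in an equally narrow window just above it. Without manipulation, and with a smooth distribution of p-values, the two counts should be roughly comparable. This is a simplified sketch, not the method Brodeur et al. actually use, and the sample p-values and window width are invented.

```python
def caliper_counts(p_values, threshold=0.05, width=0.005):
    """Count p-values in [threshold - width, threshold) and in [threshold, threshold + width)."""
    below = sum(threshold - width <= p < threshold for p in p_values)
    above = sum(threshold <= p < threshold + width for p in p_values)
    return below, above

def share_just_below(p_values, **kwargs):
    """Fraction of near-threshold p-values that land just below it (nan if none are near)."""
    below, above = caliper_counts(p_values, **kwargs)
    total = below + above
    return below / total if total else float("nan")

# Invented p-values from a hypothetical batch of published tests.
reported = [0.001, 0.012, 0.046, 0.047, 0.048, 0.049, 0.049, 0.052, 0.21, 0.63]
print(caliper_counts(reported))  # (5, 1)
print(f"share just below 5%: {share_just_below(reported):.2f}")
```

A share far above one half, repeated across thousands of tests, is the kind of asymmetry the 'second hump' describes.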

If you think this phenomenon is encountered only in economics and finance, think again. Here are some findings from other 'hard science' disciplines where, you know, lab coats do not lie.

"...replication failures have been widely documented across scientific disciplines in the past decade. Fanelli (2010) reports that “positive” results increase down the hierarchy of sciences, with hard sciences such as space science and physics at the top and soft sciences such as psychology, economics, and business at the bottom. In oncology, Prinz, Schlange, and Asadullah (2011) report that scientists at Bayer fail to reproduce two thirds of 67 published studies. Begley and Ellis (2012) report that scientists at Amgen attempt to replicate 53 landmark studies in cancer research, but reproduce the original results in only six. Freedman, Cockburn, and Simcoe (2015) estimate the economic costs of irreproducible preclinical studies amount to about 28 billion dollars in the U.S. alone. In psychology, Open Science Collaboration (2015), which consists of about 270 researchers, conducts replications of 100 studies published in top three academic journals, and reports a success rate of only 36%."

Now for the real farce: everyone in science knows all of the above. "Baker (2016) reports that 80% of the respondents in a survey of 1,576 scientists conducted by Nature believe that there exists a reproducibility crisis in the published scientific literature. The surveyed scientists cover diverse fields such as chemistry, biology, physics and engineering, medicine, earth sciences, and others. More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than 50% have failed to reproduce their own experiments. Selective reporting, pressure to publish, and poor use of statistics are three leading causes."

Yeah, you get the idea: you need years of research, testing and re-testing, and, more often than not, the results turn out to be insignificant or only weakly significant. Which means that after years of research you end up with an unpublishable paper (no journal would welcome a paper without significant results, even though absence of evidence is as important in science as evidence of presence), no tenure, no job, no pension, no prospect of a career. So what do you do then? Ah, well... p-hack the shit out of the data until the editor is happy and the referees are satisfied.

Which, for you, the reader, should mean the following: when we say that 'scientific research established fact A' based on reputable journals publishing high quality peer reviewed papers on the subject, know that around half of the findings claimed in these papers, on average, most likely cannot be replicated or verified. And then remember, it takes one or two scientists to turn the world around from believing (based on scientific consensus at the time) that the Earth is flat and is the centre of the Universe, to believing in the world as we know it to be today.


Full link to the paper: Charles A. Dice Center Working Paper No. 2017-10; Fisher College of Business Working Paper No. 2017-03-010. Available at SSRN: https://ssrn.com/abstract=2961979.

Friday, February 14, 2014

14/2/2014: Buffett's Alpha Demystified... or not?


Warren Buffett is probably the most legendary of all investors and his Berkshire Hathaway, despite numerous statements by Buffett explaining his investment philosophy, is still shrouded in a veil of mystery and magic.

The more you wonder about Buffett's fantastic historical track record, the more you ask whether the returns he amassed are a matter of luck, skill, unique strategy or all of the above.

"Buffett’s Alpha" by Andrea Frazzini, David Kabiller, and Lasse H. Pedersen (NBER Working Paper 19681 http://www.nber.org/papers/w19681, November 2013) shows that "looking at all U.S. stocks from 1926 to 2011 that have been traded for more than 30 years, …Berkshire Hathaway has the highest Sharpe ratio among all. Similarly, Buffett has a higher Sharpe ratio than all U.S. mutual funds that have been around for more than 30 years." In fact, for the period 1976-2011, Berkshire Hathaway's realized Sharpe ratio stands at an impressive 0.76, and "Berkshire has a significant alpha to traditional risk factors." According to the authors, "adjusting for the market exposure, Buffett’s information ratio is even lower, 0.66. This Sharpe ratio reflects high average returns, but also significant risk and periods of losses and significant drawdowns."

According to the authors, this raises the question: "If his Sharpe ratio is very good but not super-human, then how did Buffett become among the richest in the world?"

The study looks at Buffett's performance and finds that "The answer is that Buffett has boosted his returns by using leverage, and that he has stuck to a good strategy for a very long time period, surviving rough periods where others might have been forced into a fire sale or a career shift. We estimate that Buffett applies a leverage of about 1.6-to-1, boosting both his risk and excess return in that proportion."
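Why leverage turns a 'very good but not super-human' Sharpe ratio into a fortune is easy to check numerically: scaling excess returns by a leverage factor scales both their mean and their volatility by that factor, so the Sharpe ratio stays put while the excess return grows. The sketch below assumes borrowing costs exactly the risk-free rate and uses made-up monthly excess returns; only the 1.6 leverage figure comes from the paper.

```python
import math
import statistics

def annualised_sharpe(excess_returns, periods_per_year=12):
    """Annualised Sharpe ratio: mean excess return over its standard deviation, times sqrt(periods)."""
    mean = statistics.fmean(excess_returns)
    sd = statistics.stdev(excess_returns)
    return mean / sd * math.sqrt(periods_per_year)

def apply_leverage(excess_returns, leverage):
    """Scale excess returns by a leverage ratio, assuming borrowing at exactly the risk-free rate."""
    return [leverage * r for r in excess_returns]

# Made-up monthly excess returns for a hypothetical unlevered portfolio.
base = [0.020, -0.010, 0.030, 0.005, -0.015, 0.025, 0.010, -0.005]
levered = apply_leverage(base, 1.6)  # the ~1.6-to-1 leverage estimated in the paper

print(f"mean excess return: {statistics.fmean(base):.4f} -> {statistics.fmean(levered):.4f}")
print(f"Sharpe ratio:       {annualised_sharpe(base):.2f} -> {annualised_sharpe(levered):.2f}")
```

Risk and excess return both rise by 60%, exactly the "boosting both his risk and excess return in that proportion" the authors describe, which is why sticking with leverage through drawdowns matters so much.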

The conclusion is that "his many accomplishments include having the conviction, wherewithal, and skill to operate with leverage and significant risk over a number of decades."


But the above still leaves open a key question: "How does Buffett pick stocks to achieve this attractive return stream that can be leveraged?"

The authors "…identify several general features of his portfolio: He buys stocks that are
-- “safe” (with low beta and low volatility),
-- “cheap” (i.e., value stocks with low price-to-book ratios), and
-- high-quality (meaning stocks that are profitable, stable, growing, and with high payout ratios).
This statistical finding is certainly consistent with Graham and Dodd (1934) and Buffett’s writings, e.g.: "Whether we’re talking about socks or stocks, I like buying quality merchandise when it is marked down"  – Warren Buffett, Berkshire Hathaway Inc., Annual Report, 2008."


Of course, such a strategy is not novel: Ben Graham's original selection factors are very much in line with it, not to mention more sophisticated screening factors. Everyone knows (whether they act on this knowledge or not is a different matter altogether) that low-risk, cheap, and high-quality stocks "tend to perform well in general, not just the ones that Buffett buys. Hence, perhaps these characteristics can explain Buffett’s investment? Or, is his performance driven by an idiosyncratic Buffett skill that cannot be quantified?"

The authors look at these questions as well. "The standard academic factors that capture the market, size, value, and momentum premia cannot explain Buffett’s performance so his success has to date been a mystery (Martin and Puthenpurackal (2008)). Given Buffett’s tendency to buy stocks with low return risk and low fundamental risk, we further adjust his performance for the Betting-Against-Beta (BAB) factor of Frazzini and Pedersen (2013) and the Quality Minus Junk (QMJ) factor of Asness, Frazzini, and Pedersen (2013)."

And then 'Eureka!': "We find that accounting for these factors explains a large part of Buffett's performance. In other words, accounting for the general tendency of high-quality, safe, and cheap stocks to outperform can explain much of Buffett’s performance and controlling for these factors makes Buffett’s alpha statistically insignificant… Buffett’s genius thus appears to be at least partly in recognizing early on, implicitly or explicitly, that these factors work, applying leverage without ever having to fire sale, and sticking to his principles. Perhaps this is what he means by his modest comment: "Ben Graham taught me 45 years ago that in investing it is not necessary to do extraordinary things to get extraordinary results." – Warren Buffett, Berkshire Hathaway Inc., Annual Report, 1994."
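The logic of 'controlling for these factors makes the alpha insignificant' can be illustrated with a time-series regression. The sketch simulates a hypothetical fund with zero true alpha whose returns load on the market and on a quality-type factor that earns a positive average premium. Regressing on the market alone leaves a positive intercept (the 'alpha'); adding the second factor drives it to roughly zero. The simulated factor series only stand in for the real market, BAB and QMJ data, and the regression is plain OLS via the normal equations.

```python
import random

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_with_intercept(y, factors):
    """OLS of y on a constant plus the given factor series; returns [alpha, beta_1, ...]."""
    rows = [[1.0] + [f[t] for f in factors] for t in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)

rng = random.Random(7)
months = 600
# Simulated monthly factor excess returns (stand-ins, not real factor data).
market = [rng.gauss(0.006, 0.045) for _ in range(months)]
quality = [rng.gauss(0.005, 0.015) for _ in range(months)]
# Hypothetical fund with zero true alpha: pure factor exposure plus small noise.
fund = [0.8 * mk + 0.6 * q + rng.gauss(0.0, 0.002) for mk, q in zip(market, quality)]

alpha_market_only = ols_with_intercept(fund, [market])[0]
alpha_two_factor = ols_with_intercept(fund, [market, quality])[0]
print(f"monthly alpha, market only:      {alpha_market_only:.4f}")
print(f"monthly alpha, market + quality: {alpha_two_factor:.4f}")
```

In the one-factor regression the quality premium has nowhere to go but the intercept, so it masquerades as skill; once the factor is included, the 'alpha' disappears.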


There is more to be asked about Warren Buffett's investment style and strategy. "…we consider whether Buffett’s skill is due to his ability to buy the right stocks versus his ability as a CEO. Said differently, is Buffett mainly an investor or a manager?"

The authors oblige: "To address this, we decompose Berkshire’s returns into a part due to investments in publicly traded stocks and another part due to private companies run within Berkshire. The idea is that the return of the public stocks is mainly driven by Buffett’s stock selection skill, whereas the private companies could also have a larger element of management."

Another 'Eureka!' moment beckons: "We find that both public and private companies contribute to Buffett’s performance, but the portfolio of public stocks performs the best, suggesting that Buffett’s skill is mostly in stock selection. Why then does Buffett rely heavily on private companies as well, including insurance and reinsurance businesses? One reason might be that this structure provides a steady source of financing, allowing him to leverage his stock selection ability. Indeed, we find that 36% of Buffett’s liabilities consist of insurance float with an average cost below the T-Bill rate."
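The financing advantage of cheap float can be written down directly. If each unit of equity supports L units of assets earning r, and the borrowed (L - 1) units cost c, then the excess return on equity over the T-bill rate r_f is L(r - r_f) + (L - 1)(r_f - c). When c is below r_f, the second term is positive, so cheap float adds return on top of plain leverage. The sketch below uses hypothetical rates and, as a simplification, treats all borrowing as costing c; only the 1.6 leverage figure comes from the paper.

```python
def levered_excess_return(asset_return, risk_free, leverage, funding_cost):
    """Excess return on equity over the risk-free rate when each unit of equity
    supports `leverage` units of assets and the borrowed (leverage - 1) units
    cost `funding_cost`. Algebraically equal to
    leverage * (asset_return - risk_free) + (leverage - 1) * (risk_free - funding_cost)."""
    equity_return = leverage * asset_return - (leverage - 1) * funding_cost
    return equity_return - risk_free

# Hypothetical annual figures: 10% asset return, 3% T-bill rate, 1.6-to-1 leverage.
at_t_bill = levered_excess_return(0.10, 0.03, 1.6, funding_cost=0.03)
cheap_float = levered_excess_return(0.10, 0.03, 1.6, funding_cost=0.02)
print(f"funded at the T-bill rate: {at_t_bill:.1%}")
print(f"funded with cheaper float: {cheap_float:.1%}")
```

Each percentage point by which the float's cost undercuts the T-bill rate adds (L - 1) percentage points, here 0.6, to the excess return on equity.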


So, the core conclusions on Buffett's genius: "In summary, we find that Buffett has developed a unique access to leverage that he has invested in safe, high-quality, cheap stocks and that these key characteristics can largely explain his impressive performance. Buffett’s unique access to leverage is consistent with the idea that he can earn BAB returns driven by other investors’ leverage constraints. Further, both value and quality predict returns and both are needed to explain Buffett’s performance. Buffett’s performance appears not to be luck, but an expression that value and quality investing can be implemented in an actual portfolio (although, of course, not by all investors who must collectively hold the market)."

Awesome study!