Much of the current fascination with behavioural economics is well deserved - the field is a tremendously important merger of psychology and economics, bringing economic research and analysis down to the granular level of human behaviour. However, much of it is also a fad - behavioural economics provides a convenient avenue for advertising companies, digital marketing agencies, digital platform providers and aggregators, as well as congestion-pricing and Gig-Economy firms to milk strategies for revenue raising that are anchored in common sense. In other words, much of the use of behavioural economics in real business (and in Government) is about the convenient plucking out of strategy-confirming results. It is marketing, not analysis.
A lot of this plucking relies on empirically-derived insights from behavioural economics, which, in turn, often rely on experimental evidence. Now, experimental evidence in economics is very often dodgy by design: you can’t compel people to act, so you have to incentivise them; you can’t quite select a representative group, so you assemble a ‘proximate’ group, and so on. Imagine you want to study intervention effects on a group of C-level executives. Good luck getting actual executives to participate in your study and good luck getting selection biases sorted out in analysing the results. Still, experimental economics continues to gain prominence as a backing for behavioural economics. And still, companies and governments spend millions on funding such research.
Now, not all experiments are poorly structured and not all evidence derived from them is dodgy. So, to address the nagging suspicion as to how much error is carried in experiments, a recent paper by Alwyn Young of the London School of Economics, titled “Channelling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results” (http://personal.lse.ac.uk/YoungA/ChannellingFisher.pdf) used “randomization statistical inference to test the null hypothesis of no treatment effect in a comprehensive sample of 2003 regressions in 53 experimental papers drawn from the journals of the American Economic Association.”
The attempt is pretty darn good. The study uses robust methodology to test a statistically valid hypothesis: did the experimental treatment in these studies produce a statistically significant result or not? The paper tests a large sample of studies published (having gone through peer and editorial reviews) in perhaps the most reputable economics journals. This is the crème de la crème of economics studies.
The findings, to put this scientifically: “Randomization tests reduce the number of regression specifications with statistically significant treatment effects by 30 to 40 percent. An omnibus randomization test of overall experimental significance that incorporates all of the regressions in each paper finds that only 25 to 50 percent of experimental papers, depending upon the significance level and test, are able to reject the null of no treatment effect whatsoever. Bootstrap methods support and confirm these results.”
In other words, in the majority of studies claiming to have achieved statistically significant results from experimental evidence, the results could not, in a statistically significant sense, be attributed to the experimental treatment.
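For readers who have not met the idea before, here is a minimal sketch of what a randomization test does, in Python. It is a toy difference-in-means version, not Young's actual procedure (his tests work on the regression specifications and experimental designs of the original papers). The function name `randomization_p_value`, its parameters and the numbers in the example are all made up for illustration. The logic: if treatment truly had no effect, then who was labelled "treated" is arbitrary, so we reshuffle the labels many times and ask how often a reshuffled sample produces a gap at least as large as the one actually observed.

```python
import random
from statistics import mean


def randomization_p_value(outcomes, treated, n_draws=5000, seed=0):
    """Two-sided randomization p-value for a difference in means.

    outcomes: list of numeric outcomes, one per subject.
    treated:  list of booleans, True where the subject was treated.
    (Hypothetical helper for illustration; not taken from Young's paper.)
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    def diff_in_means(assignment):
        t = [y for y, d in zip(outcomes, assignment) if d]
        c = [y for y, d in zip(outcomes, assignment) if not d]
        return mean(t) - mean(c)

    observed = abs(diff_in_means(treated))
    assignment = list(treated)
    extreme = 0
    for _ in range(n_draws):
        # Re-randomize who counts as "treated", keeping group sizes fixed.
        rng.shuffle(assignment)
        if abs(diff_in_means(assignment)) >= observed:
            extreme += 1
    # Add-one correction: the observed assignment counts as one draw.
    return (extreme + 1) / (n_draws + 1)


# Made-up data: ten subjects, the first five treated.
outcomes = [10, 11, 12, 13, 14, 0, 1, 2, 3, 4]
treated = [True] * 5 + [False] * 5
print(randomization_p_value(outcomes, treated))
```

A small p-value here says the observed gap is rare under reshuffled labels. The appeal of the approach is that it leans on the randomization itself rather than on distributional assumptions about the regression errors.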
Now, the author is cautious in his conclusions. “Notwithstanding its results, this paper confirms the value of randomized experiments. The methods used by authors of experimental papers are standard in the profession and present throughout its journals. Randomized statistical inference provides a solution to the problems and biases identified in this paper. While, to date, it rarely appears in experimental papers, which generally rely upon traditional econometric methods, it can easily be incorporated into their analysis. Thus, randomized experiments can solve both the problem of identification and the problem of accurate statistical inference, making them doubly reliable as an investigative tool.”
But this is hogwash. The results of the study effectively tell us that a large (huge) proportion of papers on experimental economics published in the most reputable journals have claimed significant results attributable to experiments where no such significance was really present. Worse, the methods that delivered these false significance results “are standard in the profession”.
Now, consider the even more obvious: these are academic papers, written by highly skilled (in econometrics, data collection and experiment design) authors. Imagine what drivel passes for experimental analysis coming out of marketing and surveying companies? Imagine what passes for policy analysis coming out of public sector outfits? Without peer reviews and without cross-checks like those performed by Young?