Showing posts with label scientific publishing. Show all posts

Friday, June 16, 2017

16/6/17: Replicating Scientific Research: Ugly Truth


Continuing with the theme on 'What I've been reading lately?', here is a smashing paper on 'accuracy' of empirical economic studies.

The paper, authored by Hou, Kewei and Xue, Chen and Zhang, Lu, and titled "Replicating Anomalies" (most recent version is from June 12, 2017, but it is also available in an earlier version via NBER) effectively blows a whistle on what is going on in empirical research in economics and finance. Per authors, the vast literature that detects financial markets anomalies (or deviations away from the efficient markets hypothesis / economic rationality) "is infested with widespread p-hacking".

What's p-hacking? Well, it's a shady practice whereby researchers manipulate (by selective inclusion or exclusion) sample criteria (which data points to exclude from estimation) and test procedures (including model specifications and selective reporting of favourable test results), until insignificant results become significant. In other words, under p-hacking, researchers attempt to artificially maximise the significance of their models and explanatory variables, or, put differently, they attempt to achieve results that confirm their intuition or biases.
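To make the mechanics concrete, here is a minimal simulation sketch of my own (it is not from the paper). It assumes pure-noise data, so every 'significant' result is false by construction. The 'hacked' researcher gets to re-draw the sample criteria up to 20 times, each time dropping a random 20% of observations, and reports success if any attempt clears |t| > 1.96. All names and parameter values are hypothetical.

```python
import math
import random

def t_stat(xs):
    """One-sample t-statistic for the null hypothesis that the mean is zero."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)

def honest_test(sample, critical=1.96):
    """One pre-specified test on the full sample."""
    return abs(t_stat(sample)) > critical

def hacked_test(sample, rng, n_specs=20, critical=1.96):
    """Try n_specs alternative 'sample criteria' and stop at the first
    one that happens to look significant."""
    keep = int(len(sample) * 0.8)  # each spec drops a random 20% of points
    for _ in range(n_specs):
        if abs(t_stat(rng.sample(sample, keep))) > critical:
            return True
    return False

def false_positive_rates(n_studies=1000, n_obs=100, seed=0):
    """Share of pure-noise 'studies' reported as significant,
    with and without specification search."""
    rng = random.Random(seed)
    honest = hacked = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]  # true effect is zero
        is_honest = honest_test(sample)
        honest += is_honest
        hacked += is_honest or hacked_test(sample, rng)
    return honest / n_studies, hacked / n_studies

if __name__ == "__main__":
    honest_rate, hacked_rate = false_positive_rates()
    print(f"honest false-positive rate: {honest_rate:.3f}")
    print(f"hacked false-positive rate: {hacked_rate:.3f}")
```

The honest rate should sit near the nominal 5%, while the hacked rate comes out higher, even though nothing in the data has changed. Real p-hacking is subtler than random subsampling, but the logic is the same.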

What are anomalies? Anomalies are departures in the markets (e.g. in share prices) from the predictions generated by the models consistent with rational expectations and the efficient markets hypothesis. In other words, anomalies occur when market efficiency fails.

There are scores of anomalies detected in the academic literature, prompting many researchers to advocate abandonment (in all its forms, weak and strong) of the idea that markets are efficient.

Hou, Xue and Zhang put these anomalies to the test. They compile "a large data library with 447 anomalies". The authors then control for a key problem with data across many studies: microcaps. Microcaps - or small capitalization firms - are numerous in the markets (accounting for roughly 60% of all stocks), but represent only 3% of total market capitalization. This is true for key markets, such as NYSE, Amex and NASDAQ. Yet, as the authors note, evidence shows that microcaps "not only have the highest equal-weighted returns, but also the largest cross-sectional standard deviations in returns and anomaly variables among microcaps, small stocks, and big stocks." In other words, these are a higher risk, higher return class of securities. Yet, despite this, "many studies overweight microcaps with equal-weighted returns, and often together with NYSE-Amex-NASDAQ breakpoints, in portfolio sorts." Worse, many (hundreds) of studies use a 1970s regression technique that actually assigns more weight to microcaps. In simple terms, microcaps are the most common outliers, and despite this they are given either the same weight in the analysis as non-outliers or an even higher weight relative to normal assets, even though microcaps matter little in driving the actual markets (their weight in the total market is just about 3%).
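A toy illustration of the weighting problem, using made-up numbers of my own rather than anything from the paper: ten hypothetical stocks, six of them microcaps that together hold 3% of market capitalization (matching the 60% / 3% split above), with the microcaps assumed to earn 2% a month and the big stocks 1%.

```python
def equal_weighted_return(returns):
    """Every stock counts the same, regardless of size."""
    return sum(returns) / len(returns)

def value_weighted_return(returns, market_caps):
    """Each stock counts in proportion to its market capitalization."""
    total_cap = sum(market_caps)
    return sum(r * cap for r, cap in zip(returns, market_caps)) / total_cap

# Hypothetical universe: 6 microcaps (3% of total cap) and 4 big stocks (97%).
market_caps = [0.5] * 6 + [24.25] * 4
# Assumed monthly returns: microcaps 2%, big stocks 1%.
returns = [0.02] * 6 + [0.01] * 4

ew = equal_weighted_return(returns)
vw = value_weighted_return(returns, market_caps)

if __name__ == "__main__":
    print(f"equal-weighted return: {ew:.4f}")
    print(f"value-weighted return: {vw:.4f}")
```

With equal weights, the microcaps supply 60% of the portfolio and pull the average return towards their own 2%. With value weights, they supply 3% and barely register. A portfolio sort built on equal-weighted returns is therefore mostly a statement about microcaps, not about the market.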

So the study corrects for these problems and finds that, once microcaps are accounted for, a grand total of 286 anomalies (64% of all anomalies studied), and under a stricter statistical significance test 380 (or 85% of all anomalies), "including 95 out of 102 liquidity variables (93%) are insignificant at the 5% level." In other words, the original studies' claims that these anomalies were significant enough to warrant rejection of market efficiency do not hold up once one recognizes one basic and simple problem with the data. Worse, per the authors, "even for the 161 significant anomalies, their magnitudes are often much lower than originally reported. Among the 161, the q-factor model leaves 115 alphas insignificant (150 with t < 3)."
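For those who like to check the sums, here is a quick sketch that recomputes the shares from the counts quoted above. The variable names are mine, and the last figure (anomalies still significant under the q-factor model) is my own subtraction, not a number reported in the quote.

```python
TOTAL_ANOMALIES = 447
INSIGNIFICANT_AT_5PCT = 286   # insignificant at the 5% level
INSIGNIFICANT_STRICT = 380    # insignificant under the stricter test
LIQUIDITY_INSIGNIFICANT, LIQUIDITY_TOTAL = 95, 102
Q_FACTOR_INSIGNIFICANT = 115  # alphas left insignificant by the q-factor model

def pct(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)

significant = TOTAL_ANOMALIES - INSIGNIFICANT_AT_5PCT
survive_q_factor = significant - Q_FACTOR_INSIGNIFICANT  # my own subtraction

if __name__ == "__main__":
    print(f"insignificant at 5%: {pct(INSIGNIFICANT_AT_5PCT, TOTAL_ANOMALIES)}%")
    print(f"insignificant, strict test: {pct(INSIGNIFICANT_STRICT, TOTAL_ANOMALIES)}%")
    print(f"liquidity variables: {pct(LIQUIDITY_INSIGNIFICANT, LIQUIDITY_TOTAL)}%")
    print(f"significant anomalies: {significant}")
    print(f"still significant under the q-factor model: {survive_q_factor}")
```

The counts are internally consistent: 447 less 286 gives the 161 significant anomalies the authors mention, and taking away the 115 that the q-factor model explains leaves 46, roughly one in ten of the original 447.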

This is pretty damning for those of us who believe, based on empirical results published over the years, that markets are bounded-efficient, and it is outright savaging for those who claim that markets are perfectly inefficient. But, this tendency of researchers to silverplate statistics is hardly new.

Hou, Xue and Zhang provide a nice summary of research on p-hacking and non-replicability of statistical results across a range of fields. It is worth reading, because it significantly dents one's confidence in the quality of peer review and the quality of scientific research.

As the authors note, "in economics, Leamer (1983) exposes the fragility of empirical results to small specification changes, and proposes to “take the con out of econometrics” by reporting extensive sensitivity analysis to show how key results vary with perturbations in regression specification and in functional form." The latter call was never implemented in the research community.

"In an influential study, Dewald, Thursby, and Anderson (1986) attempt to replicate empirical results published at Journal of Money, Credit, and Banking [a top-tier journal], and find that inadvertent errors are so commonplace that the original results often cannot be reproduced."

"McCullough and Vinod (2003) report that nonlinear maximization routines from different software packages often produce very different estimates, and many articles published at American Economic Review [highest rated journal in economics] fail to test their solutions across different software packages."

"Chang and Li (2015) report a success rate of less than 50% from replicating 67 published papers from 13 economics journals, and Camerer et al. (2016) show a success rate of 61% from replicating 18 studies in experimental economics."

"Collecting more than 50,000 tests published in American Economic Review, Journal of Political Economy, and Quarterly Journal of Economics, [three top rated journals in economics] Brodeur, Lé, Sangnier, and Zylberberg (2016) document a troubling two-humped pattern of test statistics. The pattern features a first hump with high p-values, a sizeable under-representation of p-values just above 5%, and a second hump with p-values slightly below 5%. The evidence indicates p-hacking that authors search for specifications that deliver just-significant results and ignore those that give just-insignificant results to make their work more publishable."

If you think this phenomenon is encountered only in economics and finance, think again. Here are some findings from other 'hard science' disciplines where, you know, lab coats do not lie.

"...replication failures have been widely documented across scientific disciplines in the past decade. Fanelli (2010) reports that “positive” results increase down the hierarchy of sciences, with hard sciences such as space science and physics at the top and soft sciences such as psychology, economics, and business at the bottom. In oncology, Prinz, Schlange, and Asadullah (2011) report that scientists at Bayer fail to reproduce two thirds of 67 published studies. Begley and Ellis (2012) report that scientists at Amgen attempt to replicate 53 landmark studies in cancer research, but reproduce the original results in only six. Freedman, Cockburn, and Simcoe (2015) estimate the economic costs of irreproducible preclinical studies amount to about 28 billion dollars in the U.S. alone. In psychology, Open Science Collaboration (2015), which consists of about 270 researchers, conducts replications of 100 studies published in top three academic journals, and reports a success rate of only 36%."

Let's get down to real farce: everyone in sciences knows the above: "Baker (2016) reports that 80% of the respondents in a survey of 1,576 scientists conducted by Nature believe that there exists a reproducibility crisis in the published scientific literature. The surveyed scientists cover diverse fields such as chemistry, biology, physics and engineering, medicine, earth sciences, and others. More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than 50% have failed to reproduce their own experiments. Selective reporting, pressure to publish, and poor use of statistics are three leading causes."

Yeah, you get the idea: you need years of research, testing, re-testing and, more often than not, you get results that are not significant or only weakly significant. Which means that after years of research you end up with an unpublishable paper (no journal would welcome a paper without significant results, even though absence of evidence is as important in science as evidence of presence), no tenure, no job, no pension, no prospect of a career. So what do you do then? Ah, well... p-hack the shit out of data until the editor is happy and the referees are satisfied.

Which, for you, the reader, should mean the following: when we say that 'scientific research established fact A' based on reputable journals publishing high quality peer reviewed papers on the subject, know that around half of the findings claimed in these papers, on average, most likely cannot be replicated or verified. And then remember, it takes one or two scientists to turn the world around from believing (based on scientific consensus at the time) that the Earth is flat and is the centre of the Universe, to believing in the world as we know it to be today.


Full link to the paper: Charles A. Dice Center Working Paper No. 2017-10; Fisher College of Business Working Paper No. 2017-03-010. Available at SSRN: https://ssrn.com/abstract=2961979.

Saturday, July 20, 2013

20/7/2013: WLASze Part 2: Weekend Links on Arts, Sciences and zero economics

The second part of my regular WLASze (Weekly Links on Arts, Sciences and zero economics)... enjoy!

Part one is available here.



Let's start the second WLASze for the weekend we are in with science. Cognitive science, to be more precise. In summary, there's a myth that once we hit our twenties, we are already matured, formed and, although conditions and our responses to them do change, we are basically pre-determined 'emotional intelligence'-wise. I am not so sure my own recollection of my twenties supports this myth, but someone, somewhere, in large enough numbers believes it to be true.

It turns out this is not the case (which puts me, at 45, at last on the sane side of an argument about my own twenties). And here's an article arguing the point: "The brain is going through a second critical period of growth," she explained. "The brain doesn't finish developing until some time in your twentysomething years. Being more specific, the pre-frontal cortex doesn't reach maturation until some time in your twenties. This is the last part of the brain to have evolved; it's the last part of the brain to mature. For our purposes, what's important to know about the pre-frontal cortex is that this is the part of the brain that thinks about time, probability, and uncertainty."


Enough said. And a H/T @raluca3000 for digging the article up...


PS: I have no idea who the Girls are, but they look like something of a horror flick, where a bunch of giggly cheerleaders are about to be terrorised by a crazed alien that emerges from their mom's smile...


With alien worlds, then, here's a tale of a speedy demon: basically, someone digging through old data from that relic of the technology past that keeps on ticking - the Hubble Telescope - has spotted a little dot - a new Moon of Neptune. Quote d'resistance: “This is a moon that never sits still long enough to get its picture taken”. The thing flies around at a speed of ca 16,174 miles per hour. 

Staying with the theme of speed: ArsTechnica reports on a black hole that sucks in gases at a speed of 10 million kmph, or 6.21 million mph, or 384 times faster than Neptune's newest moon moves. Those old enough to remember Ross Perot (no, not Hercule Poirot) can certainly see now where his famous reference to the 'giant sucking sound from the South', coined 21 years ago, has some tangible traction... No, not in Texas, yet...



Shifting gears from pure science (no, not Ross) to a grey area between science and arts: an amazing visualisation of the properties of numbers. Here is a visualisation of π, φ and e: http://mkweb.bcgsc.ca/pi/art/


Impressive visualisation, h/t to Brian O' Hanlon and his comment on last week's WLASze.

And while on the topic: progression and transition for the first 2,000 digits of e:



I have always argued that:

  1. Mathematics is a part of Art,
  2. Art is the most powerful tool of inquiry available to the (wo)mankind, and
  3. Physical sciences (beyond theory) can only aspire to possess the power of Art

Need more evidence? The above was a trip from math to art. Now, from art to math by Roman Opalka:





Moving on from the methodologised (or theorised) madness of subtle beauty, but staying with the theme of the boundary between art and science, here's an interesting post on the evolution of typography and design in scientific publishing. Here's the oldest (albeit not the best designed) academic journal:


Although the French as usual claim the whole thing to be their own invention (they beat the Brits to it by 3 months) with this


Thankfully, we don't have to fight this one, though the French design definitely beats the UK dysfunctional plain-face approach to jumbling together a page of text made up of some 10 fonts and about as many font sizes... 


More on history, this time - a new discovery from the Mayan civilisation. The discovery relates the tales of political battles that raged in the Dark Period (dark because we know little about it, although the entire Mayan civilisation was not exactly 'light' when it came to ethics, but...). This dates back to AD 550-560s, as my reading of the article suggests and gives us the names of two kings we didn't know about... Meanwhile in Europe Justinian's boys smuggle contraband silkworms from Asia and Black Death is all the rage across the continent... Also, rather not very light-filled years...


The silkworm was smuggled from China back in AD 553. In return, we brought Chinese art back into the fold of 'thinking art' (away from pure propaganda utilitarianism) ca AD 1980s (yep, it took that long and even as late as 1989, the Chinese Communist Party was not too keen on modern art, especially when the bosses shut down the first modern art exhibition held in China in February 1989). But as with the silkworm taking hold in Europe, it will take time for art to take hold in China, although the country's art scene has been hugely dynamic and original. The reason for it is that we are now into the early stages of the second generation of Chinese (resident) artists that have any capacity to think beyond the constraints of the limited vocabulary and philosophy of Communist art (Socialist Realism).

To see this, go no further than this example of a superb online flip book of contemporary Chinese artists in Paris: http://flipbook.kohn.fr/private-sale_chine-a-paris/ Much of this is 'soft' - excitingly interesting for its novelty and naivety factors, but conceptually and artistically boring. Take numbers 10 and 11 - iconoclasm does not work in Western art context. 

Not since we broke the taboos of strictly dogmatic interpretation of the subject of art as drivers of form - the school of thought that dominated pre-Rinascimento and then occasionally re-floated under various political regimes throughout the ages, including in the 1930s-40s in fascist states and subsequently in the Warsaw Pact (plus Yugoslavia and Albania). Stuff like the above is now mostly kitsch, unless it has a historical (as opposed to artistic) value. Don't tell the fans of late (post-abstract minimalist) works of Jeff Koons:


Efforts at abstract art as well as reinterpreted traditionalist expressions represented in the e-book on Chinese art in Paris remind me of the period in Russian art around 1988-1998 when Russian artists raced to catch up with the Western vocabulary, philosophy, composition and theory, and semiotics. This process in Russian art is now exhausted, largely, although the market for Russian art still shows strong interest in that expressionist nostalgia for preservation of any departure from the past norm, even if that departure relies on the very same norm for juxtaposition-defined raison d'etre.

The entire book left me in a strange state: I would not want to hold a single work in my collection, but I would not be averse to holding many works, were I to end up with them in my collection… Strange? Try not to think too hard… the e-book is lovely... just lovely... just...


Stay tuned for Part 3 of WLASze coming up later tonight.