Friday, February 24, 2017

How Bad "Science" Happens

The BBC reports that scientists have difficulty replicating the findings of their peers. A quick explanation of what constitutes "replication": read the study, follow the exact same methodology using the same measures, and calculate the results using the same statistical methods. If the original finding is real, you should get the same results.

The problem? Much too often, replication attempts fail - they don't get the same results. How can this be? Is it crookedness? Sure, sometimes it is simple cheating, fudging the data to get the desired results.

Most of the time it is probably not dishonesty. Let me explain two ways it happens. Begin with the understanding that the only research which gets published is research that finds "positive results" - in other words, the results the researchers hypothesized.

So-called "negative results" don't get published, as a grad school mentor explained to me, because you can't tell whether the failure to find hypothesized results really means the hypotheses are false or that in some way the methods used to measure them were faulty.

So ... scientists run many more experiments than ever get published, because they often find no support for their hypotheses. Lots of studies find not much of anything, and get tossed.

The way we decide whether differences are "real" is statistical: a result counts as significant if it is strong enough that it would occur by chance only about one time in twenty. Another way of saying this is that for every 20 studies testing nonsense hypotheses, roughly one will produce results that look good - results that appear to confirm a hypothesis which is in fact false.
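If you would rather see it than take it on faith, here is a minimal simulation sketch in Python (the sample sizes and variable names are my own invention, chosen only for illustration). It runs thousands of imaginary studies in which the two groups genuinely do not differ, then counts how often the conventional one-in-twenty test declares a "significant" difference anyway:

# Sketch: run many studies of a hypothesis that is false by construction
# (two groups drawn from the same population) and count how often the
# standard p < .05 test declares a "significant" difference anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies = 10_000          # imaginary studies, all testing a nonsense hypothesis
n_per_group = 30            # made-up sample size per group
false_positives = 0

for _ in range(n_studies):
    group_a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # same population
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < 0.05:
        false_positives += 1

# Expect roughly 5% -- about one "discovery" per 20 nonsense studies.
print(f"'Significant' results: {false_positives / n_studies:.1%}")

Run it and the answer comes out near 5 percent: about one fluke per twenty nonsense studies.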

Guess how many of these supposedly "good" (but actually false) findings are submitted for publication. Answer: nearly all. And because they were flukes in the first place, it is no surprise they cannot be replicated.

Remember the old comment about a zillion monkeys pounding a zillion typewriters? Somewhere one will reproduce significant chunks of Shakespeare quite by accident. Does that mean the lucky monkey is a genius? Hardly.

The other problem leading to non-replicability is data mining. Data mining happens when researchers gather a wide range of data and skim through it looking for "significant" relationships between variables.

Then they dream up, post hoc, hypotheses which might explain those significant relationships. Except when they write it up, they claim to have had the hypotheses first and then tested them, finding the results reported in the write-up. Unfortunately, significant relationships can occur randomly, or result from obscure causes never imagined by those writing up the "science."
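The same sort of simulation shows why mining is so seductive. Here is a minimal sketch, again in Python, assuming a made-up dataset of twenty variables that are nothing but random noise. Scanning every pair for a correlation at the usual one-in-twenty threshold almost always turns up several "significant" relationships, each of them meaningless:

# Sketch: generate a dataset of unrelated variables (pure noise), then
# "mine" it by testing every pair of variables for a significant correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects = 100
n_variables = 20            # 20 variables -> 190 pairwise tests
data = rng.normal(size=(n_subjects, n_variables))

hits = []
for i in range(n_variables):
    for j in range(i + 1, n_variables):
        r, p_value = stats.pearsonr(data[:, i], data[:, j])
        if p_value < 0.05:
            hits.append((i, j, round(r, 2), round(p_value, 3)))

# With about 190 tests at the .05 level, expect around 9 spurious "findings".
print(f"'Significant' correlations found: {len(hits)}")
for i, j, r, p in hits:
    print(f"  variable {i} vs variable {j}: r = {r}, p = {p}")

Any one of those spurious pairs could be dressed up after the fact as a planned hypothesis, written up, published, and then, naturally, fail to replicate.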