How to Understand Scientific Studies

From John Pohl’s Twitter Feed:

Twenty tips for interpreting scientific claims http://bit.ly/1hY3nD5. Referenced article: Nature 503, 335–337 (21 November 2013) doi:10.1038/503335a

An excerpt:

[W]e suggest 20 concepts that should be part of the education of civil servants, politicians, policy advisers and journalists — and anyone else who may have to interact with science or scientists. Politicians with a healthy scepticism of scientific advocates might simply prefer to arm themselves with this critical set of knowledge…

Differences and chance cause variation…

No measurement is exact. Practically all measurements have some error… Results should be presented with a precision that is appropriate for the associated error, to avoid implying an unjustified degree of accuracy…
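A minimal sketch of that last point in Python (the readings are invented): report the mean with no more precision than the measurement error supports.

```python
import statistics

# Hypothetical repeated readings of the same quantity (arbitrary units).
readings = [9.82, 10.11, 9.95, 10.04, 9.90, 10.08]

mean = statistics.mean(readings)
# Standard error of the mean: sample standard deviation / sqrt(n).
sem = statistics.stdev(readings) / len(readings) ** 0.5

print(f"raw mean: {mean}")                    # 9.9833... implies false accuracy
print(f"reported: {mean:.2f} +/- {sem:.2f}")  # 9.98 +/- 0.05
```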

Bias is rife. Experimental design or measuring devices may produce atypical results in a given direction…

Bigger is usually better for sample size…
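Here is why, in a quick simulation (a normal population with invented parameters): the standard error of an estimated mean shrinks roughly as one over the square root of the sample size.

```python
import random
import statistics

random.seed(1)

# Draw ever-larger samples from the same population (true mean 100, sd 15)
# and watch the estimate stabilise as n grows.
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(100, 15) for _ in range(n)]
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / n ** 0.5
    print(f"n={n:>5}  mean={mean:6.2f}  standard error={sem:.2f}")
```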

Correlation does not imply causation. It is tempting to assume that one pattern causes another. However, the correlation might be coincidental, or it might be a result of both patterns being caused by a third factor — a ‘confounding’ or ‘lurking’ variable…
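The classic toy example, sketched below with invented numbers (statistics.correlation needs Python 3.10+): temperature drives both ice-cream sales and drownings, so the two correlate strongly even though neither causes the other.

```python
import random
import statistics

random.seed(2)

# Lurking variable z (temperature) drives both x (ice-cream sales)
# and y (drownings); x never causes y.
z = [random.gauss(20, 5) for _ in range(1000)]
x = [zi + random.gauss(0, 2) for zi in z]
y = [zi + random.gauss(0, 2) for zi in z]

print(f"corr(x, y) = {statistics.correlation(x, y):.2f}")  # strongly positive

# Adjusting for z removes the association. (Subtracting z is the right
# adjustment here only because z enters both variables with coefficient 1.)
rx = [xi - zi for xi, zi in zip(x, z)]
ry = [yi - zi for yi, zi in zip(y, z)]
print(f"corr(x, y | z) = {statistics.correlation(rx, ry):.2f}")  # near zero
```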

Regression to the mean can mislead. Extreme patterns in data are likely to be, at least in part, anomalies attributable to chance or error…
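A simulation makes this concrete (all numbers invented): if each observed score mixes stable ability with one-off luck, the top scorers on a first test will, on average, score lower on a retest, with no intervention at all.

```python
import random
import statistics

random.seed(3)

# Observed score = stable ability + one-off luck.
ability = [random.gauss(100, 10) for _ in range(10_000)]
test1 = [a + random.gauss(0, 10) for a in ability]
test2 = [a + random.gauss(0, 10) for a in ability]

# Select the "extreme" performers: top 5% on the first test.
cutoff = sorted(test1)[int(0.95 * len(test1))]
top = [i for i, score in enumerate(test1) if score >= cutoff]

print(f"top group, test 1: {statistics.mean(test1[i] for i in top):.1f}")
print(f"top group, test 2: {statistics.mean(test2[i] for i in top):.1f}")  # lower
```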

Extrapolating beyond the data is risky. Patterns found within a given range do not necessarily apply outside that range…
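A toy illustration (statistics.linear_regression needs Python 3.10+): a straight line fits a curved relationship well inside the sampled range, then fails badly outside it.

```python
import statistics

# The true relationship is curved (y = x**2), but over a narrow range
# a straight line fits it almost perfectly.
x = [1.0, 1.5, 2.0, 2.5, 3.0]
y = [xi ** 2 for xi in x]

slope, intercept = statistics.linear_regression(x, y)

def predict(xi):
    return slope * xi + intercept

print(f"inside range,  x=2:  line={predict(2):6.1f}  truth={2 ** 2:6.1f}")
print(f"outside range, x=10: line={predict(10):6.1f}  truth={10 ** 2:6.1f}")
```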

Beware the base-rate fallacy. The ability of an imperfect test to identify a condition depends upon the likelihood of that condition occurring (the base rate). For example, a person might have a blood test that is ‘99% accurate’ for a rare disease and test positive, yet they might be unlikely to have the disease. If 10,001 people have the test, of whom just one has the disease, that person will almost certainly have a positive test, but so too will a further 100 people (1%) even though they do not have the disease.
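The article's arithmetic is easy to reproduce:

```python
# 1 case among 10,001 people; the test catches the case (sensitivity ~100%)
# but also wrongly flags 1% of healthy people.
population = 10_001
cases = 1
false_positive_rate = 0.01

true_positives = cases
false_positives = (population - cases) * false_positive_rate  # 100 people

p_disease_given_positive = true_positives / (true_positives + false_positives)
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")  # ~0.010
```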

Controls are important. A control group is dealt with in exactly the same way as the experimental group, except that the treatment is not applied. Without a control, it is difficult to determine whether a given treatment really had an effect…
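A hypothetical trial shows the trap: if everyone improves over time anyway (natural recovery, placebo response), the treated group's improvement looks impressive on its own, and only the control group reveals that the treatment added nothing.

```python
import random
import statistics

random.seed(4)

n = 200
# Symptom improvement after 8 weeks. Everyone improves by ~5 points
# regardless, and this imaginary treatment adds nothing.
control = [random.gauss(5, 3) for _ in range(n)]
treated = [random.gauss(5, 3) for _ in range(n)]

print(f"improvement in treated group: {statistics.mean(treated):.1f}")  # looks like it works
print(f"improvement vs control:       "
      f"{statistics.mean(treated) - statistics.mean(control):.1f}")     # ~0
```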

Randomization avoids bias. Experiments should, wherever possible, allocate individuals or groups to interventions randomly…
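The mechanics are simple; here is one minimal way to do it (the subject IDs are invented):

```python
import random

random.seed(5)

participants = [f"subject_{i:02d}" for i in range(12)]  # hypothetical IDs
random.shuffle(participants)  # allocate by chance, not by convenience

half = len(participants) // 2
treatment, control = participants[:half], participants[half:]
print("treatment:", treatment)
print("control:  ", control)
```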

Seek replication, not pseudoreplication. Results consistent across many studies, replicated on independent populations, are more likely to be solid…

Scientists are human. Scientists have a vested interest in promoting their work, often for status and further research funding, although sometimes for direct financial gain. This can lead to selective reporting of results and occasionally, exaggeration. Peer review is not infallible: journal editors might favour positive findings and newsworthiness. Multiple, independent sources of evidence and replication are much more convincing.

Significance is significant. Expressed as P, statistical significance is a measure of how likely a result is to occur by chance. Thus P = 0.01 means there is a 1-in-100 probability that what looks like an effect of the treatment could have occurred randomly, and in truth there was no effect at all. Typically, scientists report results as significant when the P-value of the test is less than 0.05 (1 in 20).
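That definition can be checked by simulation: run many experiments in which the treatment truly does nothing, and about 5% of them will still come out "significant" at the 0.05 threshold. (The z-test below assumes a known standard deviation, which is fine for this toy setup.)

```python
import math
import random

random.seed(6)

def z_test_p(a, b, sd=1.0):
    # Two-sided z-test for equal means with known sd.
    z = (sum(a) / len(a) - sum(b) / len(b)) / (sd * math.sqrt(2 / len(a)))
    return math.erfc(abs(z) / math.sqrt(2))

trials = 10_000
false_positives = 0
for _ in range(trials):  # the "treatment" has no effect at all
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if z_test_p(a, b) < 0.05:
        false_positives += 1

print(f"'significant' results under a true null: {false_positives / trials:.3f}")  # ~0.05
```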

Separate no effect from non-significance. The lack of a statistically significant result (say a P-value > 0.05) does not mean that there was no underlying effect: it means that no effect was detected. A small study may not have the power to detect a real difference…

Effect size matters. Small responses are less likely to be detected…
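These last two tips are two sides of the same coin, and a quick power simulation shows both (the 0.3-standard-deviation effect and the sample sizes are invented): small studies usually miss a real but modest effect, so their non-significant results say little.

```python
import math
import random

random.seed(7)

def z_test_p(a, b, sd=1.0):
    # Two-sided z-test for equal means with known sd.
    z = (sum(a) / len(a) - sum(b) / len(b)) / (sd * math.sqrt(2 / len(a)))
    return math.erfc(abs(z) / math.sqrt(2))

reps = 1_000
# A real but small effect: the treatment shifts the mean by 0.3 sd.
for n in (10, 50, 200, 1000):
    detected = sum(
        z_test_p([random.gauss(0.3, 1) for _ in range(n)],
                 [random.gauss(0.0, 1) for _ in range(n)]) < 0.05
        for _ in range(reps)
    )
    print(f"n={n:>4} per arm: effect detected in {detected / reps:.0%} of studies")
```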

Study relevance limits generalizations. The relevance of a study depends on how much the conditions under which it is done resemble the conditions of the issue under consideration…

Feelings influence risk perception. Broadly, risk can be thought of as the likelihood of an event occurring in some time frame, multiplied by the consequences should the event occur…
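The arithmetic itself is one line; the numbers below are purely illustrative. Note how a rare-but-severe event can carry a larger expected loss than a frequent minor one, whatever our feelings say.

```python
# Expected annual loss = probability of the event * cost if it happens.
# All figures are invented, in arbitrary currency units.
risks = {
    "frequent minor outage": (0.50, 10_000),
    "rare major flood":      (0.01, 2_000_000),
}

for name, (p, cost) in risks.items():
    print(f"{name}: expected loss = {p * cost:>9,.0f} per year")
```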

Dependencies change the risks. It is possible to calculate the consequences of individual events, such as an extreme tide, heavy rainfall and key workers being absent. However, if the events are interrelated (for example, a storm causes a high tide, or heavy rain prevents workers from accessing the site), then the probability of their co-occurrence is much higher than might be expected…
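A simulation with a shared driver (all probabilities invented) shows the gap between the independence assumption and reality:

```python
import random

random.seed(8)

trials = 100_000
tide = rain = both = 0
for _ in range(trials):
    storm = random.random() < 0.10  # shared cause of both events
    # 0.044 is chosen so each event still happens ~10% of the time overall.
    high_tide = random.random() < (0.60 if storm else 0.044)
    heavy_rain = random.random() < (0.60 if storm else 0.044)
    tide += high_tide
    rain += heavy_rain
    both += high_tide and heavy_rain

print(f"P(high tide)            ~ {tide / trials:.3f}")
print(f"P(heavy rain)           ~ {rain / trials:.3f}")
print(f"P(both) if independent:   {(tide / trials) * (rain / trials):.4f}")
print(f"P(both) observed:         {both / trials:.4f}")  # several times higher
```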

Data can be dredged or cherry picked. Evidence can be arranged to support one point of view…
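Dredging is easy to demonstrate: test enough outcomes that are pure noise and something will cross the significance threshold, ready to be cherry-picked.

```python
import math
import random

random.seed(9)

def z_test_p(a, b, sd=1.0):
    # Two-sided z-test for equal means with known sd.
    z = (sum(a) / len(a) - sum(b) / len(b)) / (sd * math.sqrt(2 / len(a)))
    return math.erfc(abs(z) / math.sqrt(2))

n = 50
# Forty outcomes, all pure noise: no real effects anywhere.
p_values = []
for _ in range(40):
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    p_values.append(z_test_p(a, b))

hits = sum(p < 0.05 for p in p_values)
print(f"'significant' findings among 40 null outcomes: {hits}")   # typically 1-3
print(f"smallest p-value: {min(p_values):.3f}")  # report only this and it looks real
```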

Extreme measurements may mislead…

Comment: This is a really good reference for understanding scientific studies and their sources of error.