Hello there. I have a question about experimental control groups.

Let's say you're running an experiment to find out whether large doses of vitamin C increase the rate of hair growth in monkeys - or some similar arbitrary question.

So you take 100 monkeys, divide them into a group of 50 who get lots of vitamin C, and a control group of 50 who get an otherwise identical diet, then compare how fast their hair grows over six months.

But...what's wrong with using a single group of monkeys at different times? Take 50 monkeys, measure their hair growth on a normal diet for six months, then switch to the enriched diet for another six, then back to the normal diet and see if there's a difference in the 'enriched' period.

I've heard it argued that the best control group is the same population - but if that's true, I'd expect researchers to use non-identical control groups a lot less.

So why do experimenters not re-use test subjects in this way? Is it just that the experiment takes longer? Or is it that there are too many unpredictable factors which could change at the same time as the diet, and add too much noise to the data? Or what?

There is nothing wrong with your idea; in fact it is used in clinical testing very regularly. It is called a double-blind, placebo-controlled cross-over study.

In this type of study, a selected group of patients is given either a placebo or the active substance for a given period of time, followed by a wash-out period (whose length depends on the half-life of the drug or treatment), and then the other treatment/placebo for the same period of time.

As you say, the advantage of this type of study is that fewer subjects/patients are needed, and the variation over time should be smaller since each person acts as their own control. It is still blinded, since neither the patient nor the observer knows which is the placebo and which is the active arm of the study.
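That "each person acts as their own control" point can be made concrete with a small simulation. This is a hypothetical sketch (all numbers invented, not from any real trial): subjects get a large individual baseline plus a small treatment effect, and the paired within-subject differences of a cross-over cancel the baseline spread that dominates a parallel-group comparison.

```python
import random
import statistics

random.seed(42)

N = 50
TREATMENT_EFFECT = 2.0  # invented effect size

# Each subject has a large individual baseline (big between-subject spread).
baselines = [random.gauss(100, 10) for _ in range(N)]

def measure(baseline, treated):
    """Noisy measurement of one subject in one study period."""
    effect = TREATMENT_EFFECT if treated else 0.0
    return baseline + effect + random.gauss(0, 1)

# Parallel-group design: different subjects in each arm.
control_arm = [measure(random.gauss(100, 10), False) for _ in range(N)]
treated_arm = [measure(random.gauss(100, 10), True) for _ in range(N)]

# Cross-over design: the SAME subjects measured in both periods.
period_off = [measure(b, False) for b in baselines]
period_on = [measure(b, True) for b in baselines]
within_subject_diffs = [on - off for on, off in zip(period_on, period_off)]

# Pairing cancels each subject's baseline, so the spread of the
# differences is far smaller than the spread within either arm.
print("parallel-group arm SD:", round(statistics.stdev(treated_arm), 1))
print("cross-over diff SD:  ", round(statistics.stdev(within_subject_diffs), 1))
print("estimated effect (cross-over):",
      round(statistics.mean(within_subject_diffs), 1))
```

The much smaller spread of the paired differences is exactly why a cross-over needs fewer subjects to detect the same effect.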

There are, however, a number of disadvantages of this sort of study compared to the classic double-blind study with two groups:

1.    Long-term effects cannot be tracked with this approach, since the treatment period is fixed.
2.    Potentially curative therapies cannot be tested one after another, or before a placebo.
3.    The order in which treatments are administered may affect the outcome. For example, a drug with many adverse effects given first may make patients more sensitive to any adverse effect of a second, less harmful medicine.
4.    There is potential carry-over between treatments, though this can be minimised by a wash-out period. It becomes a major issue if the treatment effect is long-lived.

In summary, the cross-over design is usually used for short-term proof-of-concept studies to show that a treatment works. It would usually not be suitable for the large regulatory studies required before a drug is licensed by the licensing authorities (FDA or MHRA) for use in a clinical setting.


Good to see people taking an interest in the world of medical testing. I spend a lot of time railing against case-based studies and arguing with my friends who are medical doctors about the value of such work. For a lot more on the world of trials and statistics as applied to medicine, Ben Goldacre's Bad Science book and website are excellent.


I would also recommend David Colquhoun's website.

Two points worth making from a broader statistical viewpoint.

1. Randomization is regarded as the key to establishing causal relationships, not using the same individuals with both treatments. I suspect there may be some confusion in this question between the statistical use of 'population' (all of the individuals of interest) and 'sample' (a subset of the population from which we obtain results that we hope generalize to the whole population). Apologies if the original poster was clear about this; differences in the use of the words 'population' and 'sample' are the source of endless confusion between statisticians and biologists.
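For concreteness, here is a minimal sketch of what randomization means in the original monkey example (subject names and group sizes invented to match the question): the sample is shuffled and split, so any pre-existing differences between individuals are distributed by chance rather than by the experimenter.

```python
import random

random.seed(1)

# Hypothetical sketch: randomly assigning a sample of 100 subjects to
# two arms. Randomization, not subject re-use, is what lets us attribute
# an outcome difference to the treatment rather than to pre-existing
# differences between the groups.
subjects = [f"monkey_{i:02d}" for i in range(100)]  # the sample we can study
random.shuffle(subjects)

treatment_group = subjects[:50]  # gets the vitamin C enriched diet
control_group = subjects[50:]    # gets the otherwise identical diet

print(len(treatment_group), len(control_group))
```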

2. I am not from a medical background, unlike David W, but I work with data that cycle through the same set of values over time. A familiar example would be temperature through an annual cycle. If there were similar cyclical effects in a drug study (the menstrual cycle in a test using adult females, as an example off the top of my head), I would be worried about them interfering with my analysis.

So I am glad that the randomized, double-blind controlled trial is still regarded as the only suitable test for licensing drugs, although I think the approach taken by the Cochrane Library, which uses meta-analyses to boost sample sizes and thus allow quicker evaluation of the effect of drugs, is brilliant.

They also have a part of their site that deals with assessing methodologies.

One for the people who work on human subjects.

Are there any ethical problems with cross-over trials? If such a design meant that people who could have benefited from a drug were delayed in getting it, would that be ethical grounds for not permitting the design to be used?

"Hope is a duty from which palaeontologists are exempt."
David Quammen

All good points, and thanks, Alistair.

To address your last question: cross-over studies are usually relatively small and done by investigators in an academic setting. They are classed as Phase IIa or proof-of-concept, but must then be followed by much larger, longer, and more definitive studies, often ones in which the new drug/treatment is compared not only to a placebo but also to the current or an existing treatment. In some cases, when these larger studies are underway and the primary end point is death or serious illness, interim analyses are done, so that if a new treatment/drug works or saves lives the study can be terminated early, since it would be unethical to deny the placebo group that treatment. One caveat (as per one of Ben's excellent articles): a company then has an incentive to declare the study a success and stop early, to ensure its drug is used as widely and as quickly as possible.

Lastly, meta-analyses: the old computing adage "garbage in, garbage out" holds true. Taking 20 small, poorly conducted, and badly controlled studies and trying to retrospectively re-analyse them by lumping all the data together to get a larger, more "statistically meaningful" sample size is fraught with problems. I am not saying all meta-analyses are wrong (the converse is true, i.e. taking good studies and pooling them can be a very powerful tool), merely that one has to be VERY careful to look at the quality of the primary data.
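The mechanics of that pooling can be sketched in a few lines. This is a minimal fixed-effect, inverse-variance meta-analysis with entirely invented study figures: each study's effect estimate is weighted by one over its variance, so a single large, precise study dominates several small, noisy ones - which is also why the quality of each primary study matters so much, since a weight says nothing about whether the study was well conducted.

```python
import math

# Hypothetical study results: (effect estimate, standard error).
# All numbers invented for illustration only.
studies = [
    (0.9, 0.50),   # small, noisy study
    (0.2, 0.45),   # small, noisy study
    (0.6, 0.55),   # small, noisy study
    (0.30, 0.10),  # one large, precise study
]

# Fixed-effect inverse-variance weighting: weight = 1 / SE^2.
weights = [1.0 / se**2 for _, se in studies]
pooled = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)

# The pooled standard error shrinks as studies accumulate.
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled effect: {pooled:.3f} +/- {pooled_se:.3f}")
```

The pooled estimate sits close to the large study's 0.30, and its standard error is smaller than any single study's - the "boosted sample size" effect, for better or worse depending on what went in.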