How to Rig Research: The WHO Edition
The Chloroquine Wars Part XXXIII
Fifty years ago today, the New York Times began publication of the Pentagon Papers, informing the public about how a government-engineered narrative controlled the public's perception of the Vietnam War.
When an organism gets invaded by a pathogenic force, its immune system works to restore order. In response to the Pentagon Papers leak, the Nixon administration employed the "White House Plumbers" to investigate, infiltrate, and guide the media in order to restore control over propaganda campaigns. Those efforts eventually led to the Watergate scandal as the Plumbers morphed into a partisan political tool at the hands of the executive branch.
Last night I talked with meteorologist and sports statistician Joe Giannotti on his podcast Joe's Place. We talked through some of the statistics used in biomedical testing, including the following topics:
The comparison of data from optimized versus suboptimal use of early COVID-19 treatment agents.
The strengths and limitations of randomized controlled trials (RCTs), including the fact that RCT results tend to converge with those of other forms of evidence, and in particular their power relative to observational studies.
The often extreme distortions of effect sizes and odds ratio calculations due to Simpson's paradox (or what I've redefined as "Simpson effects," since trend reversal may or may not occur), particularly in relation to the WHO trials (RECOVERY and SOLIDARITY).
Note: In the interview, I mistakenly referred to the SOLIDARITY results I was analyzing as the results of the RECOVERY trial. The results of the two trials were highly similar, but the sizes of the arms were different.
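To make the idea of a "Simpson effect" concrete, here is a minimal Python sketch. The counts are invented purely for illustration and are not data from RECOVERY, SOLIDARITY, or any other trial. The stratum names, the `Stratum` class, and the function names are all assumptions made for this example. The sketch also assumes that the treated and control groups are spread unevenly across the strata, which is what allows the pooled odds ratio to drift away from the stratum-level ones.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Death counts for one hypothetical subgroup (not real trial data)."""
    name: str
    treated_deaths: int
    treated_total: int
    control_deaths: int
    control_total: int


def odds_ratio(treated_deaths, treated_total, control_deaths, control_total):
    """Odds of death in the treated group divided by odds in the control group."""
    treated_odds = treated_deaths / (treated_total - treated_deaths)
    control_odds = control_deaths / (control_total - control_deaths)
    return treated_odds / control_odds


def stratum_odds_ratios(strata):
    """Odds ratio computed separately inside each stratum."""
    return {
        s.name: odds_ratio(s.treated_deaths, s.treated_total,
                           s.control_deaths, s.control_total)
        for s in strata
    }


def pooled_odds_ratio(strata):
    """Odds ratio after naively adding all strata together."""
    return odds_ratio(
        sum(s.treated_deaths for s in strata),
        sum(s.treated_total for s in strata),
        sum(s.control_deaths for s in strata),
        sum(s.control_total for s in strata),
    )


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel estimate: a stratum-adjusted common odds ratio."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.treated_total + s.control_total
        treated_survivors = s.treated_total - s.treated_deaths
        control_survivors = s.control_total - s.control_deaths
        numerator += s.treated_deaths * control_survivors / n
        denominator += treated_survivors * s.control_deaths / n
    return numerator / denominator


# Hypothetical counts chosen only to show the arithmetic.
HYPOTHETICAL_STRATA = [
    Stratum("mild", treated_deaths=10, treated_total=100,
            control_deaths=60, control_total=400),
    Stratum("severe", treated_deaths=160, treated_total=400,
            control_deaths=50, control_total=100),
]

if __name__ == "__main__":
    for name, value in stratum_odds_ratios(HYPOTHETICAL_STRATA).items():
        print(f"{name:>6} stratum odds ratio: {value:.3f}")
    print(f"pooled odds ratio: {pooled_odds_ratio(HYPOTHETICAL_STRATA):.3f}")
    print(f"Mantel-Haenszel odds ratio: "
          f"{mantel_haenszel_odds_ratio(HYPOTHETICAL_STRATA):.3f}")
```

With these made-up counts, the odds ratio is below 1 inside each stratum, yet the naively pooled odds ratio is above 1. The Mantel-Haenszel estimate, which respects the strata, stays below 1. How large such a distortion is in any real data set depends entirely on the actual counts in each subgroup.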
Here's the biggest punchline from that interview: The SOLIDARITY trial (and the nearly identical RECOVERY trial, a similarity that makes no sense given that neither resembled the treatment protocols that almost all doctors describe when they explain how they use HCQ) aggregated patients from hundreds of hospitals in dozens of nations. After correcting for the timing of treatment under each protocol, a true effect size of 86.5% is at least consistent with the data when the results are treated as an aggregation of three distinct protocols (a split that matches the fact that 60% of the RECOVERY trial patients were already on oxygen at the time treatment began).
To a quality statistician [whose livelihood does not depend on their stated answer], this chart is a smoking gun: The WHO trials absolutely did not demonstrate that HCQ is an ineffective medication for COVID-19. In fact, a protocol-level re-analysis of the raw data would likely show that it is effective, even for late-stage (moderate or severe) COVID-19 patients. We just need to know the precise disease progression curve (the percentages I used to the right of the patient numbers) in order to properly sort the patients into three protocol groups. That said, the best evidence that HCQ works is still the early treatment data, as per the Primary HCQ Hypothesis.
Now, I cannot say for sure that the WHO rigged their trials to suggest that HCQ failed to help patients, though I think there are a lot of reasons to believe those involved are disingenuous liars.
The trials tested an antiviral on late-stage patients (median 9 days after symptom onset, which is already a median of 11.5 days after infection), by which point the SARS-CoV-2 virus has stopped replicating in most parts of the body and the disease state no longer depends much on viral replication (image thanks to Ben Althouse, PhD).
The trial designers ignored outside expert advice as to the absurdity of their protocols.
The trial designers described a 4.4g total dose of HCQ as "undoubtedly high" after consulting pharmacokinetic reports (here, here, and here) that were then scrubbed from the WHO trial website before a 9.6g dose was used. That final dosage was not even announced, and the only way to find it was to look up trial documents published by a few of the nations running the trial.
In my interview with Joe, I said that I wasn't sure if the high dosage made a big difference in the mortality results of the trial. I was speaking too quickly. The 2.4g loading dose (the total for the first 24 hours) is close to the level that the WHO described in 1979 as "potentially fatal," and it very well could have brought the sickest and most frail patients to a blood serum level similar to that seen in the bizarre trial in Brazil, in which patients given high doses of chloroquine died at high rates. It may very well be that the dosage level is responsible for some number of the deaths. But my larger point was that organizing the data at the protocol level makes a far larger difference when examining the potentially huge gap between the reported odds ratio and the true odds ratio.
Frankly, the trial looks more like something somebody might design, while sacrificing human subjects, in order to create data to be privately understood while creating a smoke screen for the larger public.