John wants a Rolex. He acknowledges that it’s a silly, expensive trinket and that his desire for it is kind of frivolous. But this is his one major indulgence in an otherwise sensible and frugal life. He’s just always wanted one and he insists that a small part of him will be fulfilled if he gets one. He buys his Rolex and tells you that he’s now satisfied.
Do you doubt John? Do you feel tempted to probe him and measure his happiness over time? Would you even think of doing a clinical trial, where half the participants get a Rolex and half get nothing, and measuring the relative happiness of the two groups? If you did so and found no difference in happiness, would you conclude that Rolexes make no difference to anyone’s happiness? And would you think your conclusion had the patina of “science!” on it? Or would your common sense tell you that different people enjoy different things? It’s possible that one-in-a-hundred feels a real attachment to expensive watches and the other 99 don’t really care. Perhaps two-in-a-hundred feel a sense of guilt over the frivolity of an uber-expensive watch (do some googling for the price of a Rolex to see what we’re talking about here), such that their decrease in happiness overwhelms the increase from the one who enjoys the watch. (Yes, yes, “I’d sell the watch and buy something I actually want.” Please, no arbitrage arguments, because that turns this thought experiment into something it’s not.)
Okay, see where this is going? How about this one:
Jim feels depressed. He wants his psychiatrist to put him on an SSRI. He gets put on an SSRI and insists he feels better.
Do you doubt Jim because of studies that show no difference between the control group and the treatment group for SSRIs? Or do you admit to my point above, about different people responding differently to the same treatment? I like the idea of running clinical trials and studying the effects of medicine in a systematic way, but I have serious doubts about measuring subjective feelings. It’s not hard to understand that different people have different preferences for consumer goods. If you switched my shopping cart with a random person’s at the checkout lane, we’d both be very disappointed. Everyone understands this. We have heterogeneous tastes in consumer goods. When I’m putting things into my shopping cart, I’m mixing my own tonic that will improve my well-being. It would be utter nonsense for someone to take the contents of my cart and give them to a treatment group, while simultaneously monitoring a control group who gets nothing. You just sort of have to trust my subjective judgment that “I want some Old Rasputin Imperial Stout and Hanes Premium boxer-briefs.” If you see that stuff in my cart, you presume that I’m satisfying a set of preferences that you can’t possibly observe, that I know better than anyone else. It’s no big mystery that there are numerous versions of every product, that any one product is purchased by a tiny minority of shoppers, or that any shopping cart with more than a few items is completely unique. Is it hard to believe that such heterogeneity of responses holds true for medicine? Does it ruffle our feathers to admit that the effectiveness of some medicines might be beyond the grasp of science, as much so as is the “effectiveness of shopping carts”? My experience, having talked to people who are on lots of psychiatric medications, is that they have optimized their “shopping cart” over time after learning about their personal response to various mixes of drugs.
A blog that I frequently read (Slate Star Codex), written by a psychiatrist, suggests something similar. The author frequently talks about how one patient might respond well to a given drug while others don’t, and I’m sure his experience is typical of the profession as a whole.
Try another one:
Gary has post-traumatic stress disorder. Gary says that smoking marijuana helps his post-traumatic stress disorder.
I think the only sensible option is to believe Gary. You could do a study and find out that there’s “no difference between the control group and the treatment group.” But maybe that’s because half of the treatment group gets really paranoid and feels worse while the other half improves, such that the overall magnitudes cancel out. In reality, everyone knows goddamn well whether they feel better or not. Everyone given the option of picking and fine-tuning their own treatment can make themselves feel better. The people who are made to feel worse simply stop smoking. These questions of subjective judgment are beyond the realm of science, because they depend on things that aren’t observable. I get very annoyed with people who insist that there’s “no science behind the claim” that marijuana is medicine. For one thing, it’s shown promise in treating objectively measurable problems, such as seizures. So the claim is untrue on its face. But just as importantly, many of the problems that marijuana treats are things that can’t be measured by science. Suppose somebody says, “I smoke marijuana and it makes me feel better. I feel better-rested, less anxious, less bothered by stress.” The decent thing to do is believe them. “I do X and it makes me feel better” is more akin to “I wanted a Rolex and getting one satisfied me” than to “snake-oil cured my cancer.” People who want a scientific answer to this kind of question are barking up the wrong tree.
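To make the cancelling-out point concrete, here is a toy simulation. It is only a sketch with made-up numbers: the function name `simulate_trial`, the 50/50 split between people who improve and people who get worse, and the effect size of one point on some imaginary well-being scale are all my assumptions, not data from any real study.

```python
import random
import statistics

def simulate_trial(n_per_arm=5000, effect_size=1.0, seed=0):
    """Toy trial: every treated person is affected, but in opposite directions.

    All numbers here are hypothetical, chosen only to illustrate cancellation.
    """
    rng = random.Random(seed)
    # Control arm: well-being scores with no treatment effect at all.
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = []
    individual_effects = []
    for _ in range(n_per_arm):
        baseline = rng.gauss(0.0, 1.0)
        # Assumption: half the people improve, half get worse by the same amount.
        effect = effect_size if rng.random() < 0.5 else -effect_size
        individual_effects.append(effect)
        treated.append(baseline + effect)
    return control, treated, individual_effects

control, treated, effects = simulate_trial()

# What the trial reports: the difference in group averages.
mean_difference = statistics.mean(treated) - statistics.mean(control)
# What is actually going on: the share of treated people whose outcome changed.
share_affected = sum(1 for e in effects if e != 0) / len(effects)

print(f"difference in group means: {mean_difference:+.3f}")
print(f"share of treated people affected: {share_affected:.0%}")
print(f"spread (stdev) control vs treated: "
      f"{statistics.stdev(control):.2f} vs {statistics.stdev(treated):.2f}")
```

In this toy world the difference in means comes out close to zero even though every single treated person was affected. The one trace left behind is that the treated group's scores are more spread out than the control group's, which a comparison of averages alone would never show.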
I don’t want to overstate my point. You really can rule out the possibility that, say, vaccines cause autism. You can show that certain cancer drugs are extremely ineffective. You can demonstrate that a back surgery does not meaningfully affect back pain. There are some questions that randomized controlled trials can answer. But even so, this problem of heterogeneous response is lurking in the background. It really is possible that different people respond differently to the same cancer treatment, such that there is no good way of knowing which treatment is most appropriate to which person.
One might hope that we can get a handle on this heterogeneous response problem by identifying what kinds of people respond well/poorly to which medicines. Perhaps some genetic marker or some physical trait makes you more receptive to certain kinds of drugs. And there is certainly some value in this; some genes have been identified that correspond to rapid/slow metabolism of certain drugs, such that the drugs might be ineffective or dangerous to people with those genes. (The textbook Karch’s Pathology of Drug Abuse, which I’ve blogged about before, mentions this genetic heterogeneity problem in practically every section.) But this problem may be very prone to overfitting. There are too many conceivable correlates to specify which one is responsible for good/bad responses to a drug. Suppose you even have a large enough sample (say, thousands of people). If you have thousands of genes and physical/mental traits to test, some of them will correlate very well with outcomes just by sheer chance. We may eventually get a better handle on the problem. Principal component analysis and various clustering methods might reduce the number of correlates to a manageable few. A solid understanding of the chemical and physiological effects of a drug might inform our ideas of who will respond well or poorly. (For a trivial example: “This drug is hard on the liver, so it won’t be effective for people with cirrhosis.”) But no doubt it is a hard problem. The properties that correlate with treatment outcomes might not be observable in any obvious way.
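The "sheer chance" worry is easy to demonstrate with pure noise. The sketch below is an illustration under my own assumptions: 200 imaginary patients, 1000 imaginary traits, and an approximate 5% significance cutoff of 1.96 divided by the square root of the sample size. The traits and the outcome are generated independently, so any correlation that clears the cutoff is spurious by construction.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
n_people = 200    # hypothetical sample size
n_traits = 1000   # hypothetical number of genes/traits being screened

# The "treatment outcome" is pure noise: no trait has any real influence on it.
outcome = [rng.gauss(0.0, 1.0) for _ in range(n_people)]

# Approximate two-sided 5% cutoff for a correlation at this sample size.
cutoff = 1.96 / math.sqrt(n_people)

correlations = []
for _ in range(n_traits):
    # Each trait is generated independently of the outcome.
    trait = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    correlations.append(pearson(trait, outcome))

false_hits = sum(1 for r in correlations if abs(r) > cutoff)
strongest = max(abs(r) for r in correlations)

print(f"traits that look 'significant' by chance: {false_hits} of {n_traits}")
print(f"strongest spurious correlation: {strongest:.3f} (cutoff {cutoff:.3f})")
```

With a 5% cutoff you would expect something like 5% of the traits to look like predictors of drug response even though none of them are. Corrections for multiple testing exist, but they make it correspondingly harder to detect a real effect that applies to only a small subgroup.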
I’m not preaching nihilism here. I’m not saying “…therefore we can’t know anything about anything.” I think this is actually another case in which radical uncertainty leads to libertarian conclusions. We should allow lots of experimentation with lots of different analytical methods. We shouldn’t try to shoehorn everything into the FDA’s “randomized controlled study” paradigm, because it simply isn’t appropriate for many kinds of medicine. Forget the idea that we’ll know the truth if only we have a big enough sample size. To answer the question of “which medicines are effective, and for whom,” we’re going to need to marshal different kinds of evidence from different kinds of sources. We need to consider our Bayesian priors and be open to the possibility that different priors will lead to different conclusions. When these conclusions differ, it means reasonable people can disagree about whether a given drug is effective, or whether the benefits are worth the side effects. Identifying good medicine is an iterative process. The current paradigm of banning everything that doesn’t pass some official review process is wrong-headed.