A sketch of a layered solution to the interpersonal comparison problem
It's founded in psychometrics!
As always, this article is free. If you enjoy it, we ask that you share it in lieu of paying.
I can’t explain, the state that I’m in, the state of my heart, he was my best friend.
-Sufjan Stevens, The Predatory Wasp of the Palisades Is Out to Get Us!
My Ph.D. is partly about the interpersonal comparison problem. It’s a somewhat nerdy topic, often regarded as insoluble. After explaining it to you, I want to convince you that A) it’s a really important question and B) there is a solution that has been hiding in plain sight. I also want to write an essay on a technical philosophical problem in a very accessible way. There’s a great deal of mystery about what it is philosophers do, so I want to draw back the curtain- not by description but by example.
The “answer” takes a special form. I first propose a solution. I then say, “Well, suppose you don’t accept this assumption in the solution. If you accept this alternative, weaker assumption, you can still get the result.” Then I weaken it again, and so on.
It is my hope to show that, so long as you think that the branch of psychology known as psychometrics is broadly acceptable in its methodology- at least as provisional best practice- and you are willing to make some very weak additional assumptions, the interpersonal comparison problem is a solved one. Or rather, the interpersonal comparison problem is solved as a practical barrier to ethical inquiry that relies on interpersonal comparisons. There may still be lingering conceptual questions, but no one should hold these questions up as a reason not to use interpersonal comparisons in ethical inquiry- for example, in thinking about welfare economics.
But I’m getting ahead of myself. Here’s an example to introduce the problem we’re talking about. A hospital is running low on painkillers. There is only enough left for one patient. One has a headache secondary to a head cold. The other has dislocated their shoulder. Which should get the painkiller? Most of us will have zero problems coming to a resolution: the patient with a dislocated shoulder should get it. But what does it mean for one person to be in greater pain than another? How can we quantify pain in a way that can be compared across persons?
Even in the form I have put it, this is not a purely hypothetical problem. I have worked reception jobs in hospitals and watched medical staff make decisions about triage and the like partly on the basis of comparisons of the degree of pain between individuals. If there’s no scientific way to make those judgments, that’s bad news. If those judgments are meaningless even in principle, that’s even worse.
What I’m speaking of is the problem of interpersonal comparison. It’s not just a philosopher’s question! In economics, at least since Lionel Robbins’s book on the subject, it has been something of an article of faith that interpersonal comparison is difficult, or fraught with difficulties, or perhaps nothing but the representation of a decision-maker’s own preferences over the tradeoff rate between people. On the basis of skepticism about interpersonal comparison, welfare economics became all about Pareto improvements and the Kaldor-Hicks criterion (if you don’t know what these mean, don’t worry, it won’t matter here). This, in my opinion, contributed to the perception of economics as an anti-egalitarian science. It may have even led policy in an anti-egalitarian direction.
Practical applications
The problem of interpersonal comparison is a very general one. Consider the field of effective altruism, which aims to maximize the good done with a given set of resources. Good in this field is often conceived of in terms of human (or animal) welfare. Comparing two proposals in terms of their effects on human welfare almost necessarily requires quantifying mental states.
Here’s why. Two of the most popular theories of what it means for a person’s life to go well are the hedonic theory, according to which one’s life goes well to the degree that there is a preponderance of pleasure over pain- and the preference satisfaction theory, according to which one’s life goes well to the degree that one gets what one wants. On both these theories of welfare, quantifying the benefits of different welfare improving programs will likely require comparing the intensities of different mental states across different people- pleasures, pains and wants.
There is a third theory of human welfare- the objective list theory- according to which a person’s welfare is constituted by the degree to which they have certain good things- like friendship, opportunities, security etc. This might seem to get us out of the problem of having to compare intensities of mental states, but really it doesn’t, because, in almost all plausible versions of this account, pleasure and desire satisfaction are important items on that list.
So we can’t really do effective altruism without some method- even if only an implicit one- of comparing the intensity of mental states between people.
Beyond effective altruism, consider also the problem of artificial intelligence alignment. Much research is happening at the moment on the question of how to define human ethical priorities formally in such a way that an intelligent machine could be instructed to respect them. On most accounts of ethics, part of our informal, everyday ethical calculus is making these interpersonal comparisons. A clearer understanding of how comparisons can be done in a principled way is thus necessary for AI alignment research.
Defining the problem
Attentive economists and philosophers might have noticed that I speak here of the interpersonal comparison problem. I do not speak of the interpersonal utility comparison problem, which is its more common name. This is for two reasons.
The first is that utility is a poorly defined term. It is sometimes treated as synonymous with welfare or wellbeing and it is sometimes treated as synonymous with preference fulfillment (as in the von Neumann-Morgenstern utility model). Even more confusingly, these two things- preference fulfillment and wellbeing- are sometimes treated as synonymous with each other and sometimes not.
The second is that there are interesting problems about comparing mental states that may not be directly related to utility at all. I might want to say “Bob is feeling angrier than Alice”, and although, of course, Bob’s degree of anger is related to both his utility and his welfare, on no definition whatsoever is it constitutive of it. The problem of how to compare Bob and Alice’s degree of anger, and the meaning of such comparisons, is an interesting problem in and of itself.
So what we’re really interested in comparing is the intensity of certain kinds of mental states between people. Exactly what is in this bundle of mental states is a little difficult to enumerate- but I would put forward, as a basic list:
Pleasures and pains
Desires and aversions
Emotions
There is no need to include beliefs, at least on a certain definition of belief, since Bayesians have given us an adequate account of how to compare the strength of beliefs using betting behavior.
I call this category affective mental states because they all seem to have a tight conceptual link with motivation.
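The Bayesian betting account of belief strength mentioned above can be sketched in a few lines. This is a minimal illustration, assuming an idealized, risk-neutral bettor who buys a ticket paying $1 if a proposition is true whenever the price is below their credence; the function names and the credences are hypothetical. Bisecting on the price recovers each person’s hidden credence from yes/no choices alone, which is what makes belief strengths comparable between people on this account.

```python
def accepts_bet(credence, price):
    """An idealized, risk-neutral bettor buys a ticket that pays $1 if the
    proposition is true whenever the ticket's expected value exceeds its price."""
    return credence * 1.0 > price

def make_bettor(credence):
    # Only the bettor's yes/no choices are observable; the credence stays hidden.
    return lambda price: accepts_bet(credence, price)

def elicit_credence(bettor, tolerance=1e-6):
    """Recover a hidden credence by bisecting on the ticket price at which
    the bettor switches from buying to refusing."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        price = (low + high) / 2
        if bettor(price):
            low = price   # still buys, so the credence is above this price
        else:
            high = price  # refuses, so the credence is at or below this price
    return (low + high) / 2

# Hypothetical credences that it will rain tomorrow.
alice, bob = make_bettor(0.8), make_bettor(0.3)
print(round(elicit_credence(alice), 3), round(elicit_credence(bob), 3))  # 0.8 0.3
```

Real people are not risk-neutral, so actual elicitation procedures are more careful than this; the sketch only shows why betting behavior gives belief strength a common scale.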
Empirical usefulness and psychometrics
Before certain complexities are added, I don’t really think that the interpersonal comparison problem is that difficult. Consider, what makes us think that we can compare temperatures between objects? We develop hypotheses about ways to measure temperatures, and how hot and cold certain things are. We find that using these hypotheses we can do empirical work- make predictions and so on. That’s really all it is.
Can a guess about the relative intensity of some affective mental state do empirical work? Can it help us make true predictions, and not lead us too often to false ones? Yes!
There’s a whole science called psychometrics which makes estimates of the magnitude of various mental constructs, including, but not limited to, affective states of all the types we discussed above. Indeed, within psychometrics, there is a field of happiness studies, focused specifically on constructs like life satisfaction and happiness that many consider of one essence with welfare itself.
But these psychometric approaches were historically neglected by economists and philosophers working on the problem of interpersonal comparison. Indeed, psychometric approaches have often been neglected in general in these fields, though this is changing now (see the emerging field of happiness economics and the work of the philosopher Alexandrova).
There’s a philosopher called Angner who has been working for a while on the differences between how psychologists and economists measure welfare. His thesis is that the difference comes down to different understandings of the theory of measurement. Psychologists use a more flexible, one might say empiricist, approach to measurement called the psychometric approach, whereas economists prefer the representational theory of measurement, a more rationalist approach which is based on formal axiomatizations.
The way psychometrics and the psychometric theory of measurement operate is by assigning magnitudes to a person’s level of a construct through tests with standardised items (“Barry’s level of happiness is 7/10 whereas Alice’s level of happiness is 9/10”) and then using those assignments of numbers to make predictions.
Let’s say that we’re measuring happiness. We begin by creating a series of questions that we think, based on our understanding of happiness, should measure happiness.
E.g., rate the following propositions 1 to 5, with 1 being strongly disagree and 5 being strongly agree.
I am generally in a positive mood.
I feel good about life.
I am a happy person.
We can see that the test has a certain plausibility because its questions are conceptually related to happiness (it has “face validity”). Even this alone gives us some basis for credence in the test as a measure of happiness.
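Here is one way the three items above could be turned into a number like “7/10”. It is a sketch, assuming the simplest scoring rule (average the 1-5 responses) plus a linear rescaling onto 0-10 that I have chosen purely for illustration; real instruments often use more elaborate scoring. The function name and the responses are hypothetical.

```python
def happiness_score(responses):
    """Average three 1-5 Likert responses and rescale to a 0-10 score.

    The 0-10 rescaling is an illustrative choice, not a standard scoring rule.
    """
    if len(responses) != 3 or any(r not in range(1, 6) for r in responses):
        raise ValueError("expected three responses, each an integer from 1 to 5")
    mean_response = sum(responses) / len(responses)
    return (mean_response - 1) / 4 * 10  # 1 -> 0, 3 -> 5, 5 -> 10

# Hypothetical answers to the three items above.
barry = happiness_score([4, 4, 3])
alice = happiness_score([5, 4, 5])
print(f"Barry: {barry:.1f}/10, Alice: {alice:.1f}/10")  # Barry: 6.7/10, Alice: 9.2/10
```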
Now we administer it to a bank of people, using it to assign estimated happiness scores. We first check that it is measuring something and isn’t just random noise; that is, we check the measure for reliability. There are a few ways to do this, but one is to administer the same test to the same group of people twice, with a gap of, say, an hour between sittings, and check the correlation between the scores at T1 and T2.
Our next task is to check how well it is performing as a happiness estimator. We might use its estimated happiness scores in a regression model to predict results on other tests which measure similar things (we call this “convergent validity”). For example, we might correlate it against a preexisting test of, say, hopefulness. One interesting form of convergent validity is to compare first-person results with third-person results. Have Bob fill out the test, then have Bob’s roommate, Alice, fill out the test as if she were Bob and see what the correlation is.
Or we might use test results to predict a behavioral outcome like suicide rate or frequency of smiling. This is criterion validity of a type we might call behavioral validity.
We might also flip things around and see how well circumstances, like an unhappy breakup, can predict our assignment of scores via a test. This would be another example of criterion validity- I call this subtype situational validity.
We might even develop hypotheses about how our measure should be related to biology if it really does capture happiness. For example, we might check to see if it is inversely related to stress response hormones like cortisol. This would be another type of criterion validity we can call biological validity.
Thus psychometrics gives us a way to estimate the relative intensity of affective states. It then tests these estimates, seeing whether they are borne out in behavior, environment, biology, peer opinion, and other tests. Through an iterative process of testing, theory development, and application, psychometrics aims at better and better ways of assigning numbers to mental states that are valid across persons. There’s a lot I haven’t gotten into here, including the role of statistics (especially factor analysis), and psychometrics is not a field without methodological controversy. Overall, though, it seems psychometricians never got the memo about the impossibility of interpersonal comparison.
The escape route
To me it seems that psychometrics is measuring something comparable between people- its capacity for empirical success shows this. Thus, a skeptic of interpersonal comparison owes us an account of what psychometrics is and is not measuring if they are to maintain that interpersonal comparison for ethical purposes is impossible.
The most plausible approach here is to insist that there is a distinction between mental states conceived of as we experience them and mental states conceived of in terms of how they influence our behavior. This probably seems very abstract, so let me explain.
Consider the concept of qualia. To introduce the idea, consider Alice. Alice has spent her whole life seeing the color spectrum inverted. Her greens are reds. Her yellows are blues.
However, from a young age, she was taught, like everyone else, to associate words with the colors she saw. Thus she calls her green experience of what we consider to be a red object “red”, just as we do. Presumably, no one will ever even know that Alice’s experiences are so very different from ours in this way. This “greenness of green” is what we call qualia, and though it seems immediately present in consciousness, it’s hard to imagine what difference it could make to behavior.
It might be a short step from admitting qualia to making interpersonal comparison impossible. Consider the feel of desire, of longing. Now imagine that all your longings and aversions were exactly twice as great. You might think that this would have notable impacts on your behavior- perhaps making you a more passionate person- but there is a strong argument that it wouldn’t. For example, your strengthened desire to act might be exactly counterbalanced by your increased laziness (your strengthened aversion to effort). Perhaps, then, in the case of affective states, experience can be altered without any functional alteration, so long as all of them are scaled in proportion to each other.
So, in order to prevent the conclusion that psychometrics can be used for interpersonal comparison, what our interlocutor is aiming at is a bifurcation between the functional part of an affective state (which we will call an f-state) and the experiential part of an affective state identifiable with qualia (which we will call an e-state). Our interlocutor acknowledges that psychometrics can measure and compare f-states, but e-states are more mysterious and inscrutable- hence thwarting efforts at interpersonal comparison. Remember those terms, e-state and f-state; they’re going to keep coming up.
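The doubling thought experiment has a simple formal core, sketched here under a loudly simplifying assumption: a choice is fixed by desire for each option’s outcome minus aversion to the effort it takes. The options and intensities are hypothetical. Multiplying every intensity by the same positive factor preserves the ranking of options, so behavior cannot reveal the scale.

```python
def choose(options, scale=1.0):
    """Pick the option with the largest net pull: desire for its outcome minus
    aversion to the effort it takes, with every affect multiplied by `scale`."""
    return max(options,
               key=lambda name: scale * options[name][0] - scale * options[name][1])

# Hypothetical (desire, aversion-to-effort) intensities in arbitrary units.
options = {
    "go for a run": (6.0, 4.5),
    "cook dinner": (5.0, 2.0),
    "stay on the couch": (2.0, 0.5),
}
print(choose(options), "|", choose(options, scale=2.0))  # cook dinner | cook dinner
```

This is the same reason von Neumann-Morgenstern utilities are only unique up to a positive affine transformation: choices pin down ratios of differences, not an absolute scale.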
If you’re wondering why psychometrics can only measure f-states, remember that what psychometrics measures is behavior (even if it’s only question answering behavior), and that if it influences behavior, it’s part of the f-state.
There’s a further assumption here. The critic assumes that it’s these scientifically inscrutable, interpersonally incomparable e-states that matter for ethical purposes- it’s these states which comprise human welfare or suffering. If we acknowledged e-states existed but didn’t regard them as ethically important, they wouldn’t be troubling from the standpoint of ethical decision-making or policymaking. Thus while they would create difficulties for interpersonal comparison, they wouldn’t be difficulties of practical relevance.
Cutting the escape route off at the pass: functionalism
In the previous section, I explained a way out of the seemingly obvious conclusion that psychometrics enables interpersonal comparison. That escape route was to disentangle feeling and behavior in a particular way. In this section, I’m going to outline a counterargument against this “escape route”.
Functionalism is a view in the philosophy of mind about what the mind is. It might be best to explain it by way of comparison to analytical behaviorism because it can be seen as a more evolved version of that doctrine.
Analytic behaviorism, a now almost extinct view in the philosophy of mind, held as follows. Let’s say you are angry. That anger is constituted by certain behaviors and behavioral tendencies. For example, you may raise your volume, tend to act destructively and rashly, become flushed in the face, etc. Those behaviors and behavioral tendencies are your anger. Analytical behaviorism has the advantage of being a purely physical view of what the mind is, but it has disadvantages. For example, we generally think that your anger causes you to raise your voice. But if your anger is partially constituted by your tendency to raise your voice, it’s not really accurate to say that your anger has caused you to raise your voice.
The functionalist has a solution to these and many other problems of analytical behaviorism. What if your anger is whatever state of your central nervous system causes you to behave in an angry way? This keeps a tight conceptual connection between behavior and mental states while making sense of our ordinary intuition that mental states cause behavior.
Functionalism abolishes the possibility of a residual unobservable difference in mental states by holding that e-states separate from f-states don’t exist.
There are many good arguments for this kind of functionalism, which denies that there are separate f-states and e-states. Consider, for example, that if e-states truly are separate from f-states, they have no influence on behavior, since f-states include anything that has an effect on behavior. The theory then faces a problem: why are we talking about e-states at all, if they have no influence on behavior, including the behavior of talking about them? (For those interested, this objection mirrors a classical objection to epiphenomenalism in the philosophy of mind.)
So if you accept functionalism, your confidence in psychometrics as a yardstick of interpersonal comparison will once again be restored.
Epistemic functionalism
But okay, okay, I’ll admit, not everyone is going to be persuaded by my hardline view that all affective states are functional and contain no non-functional components, but I can sweeten the pot, or rather, remove a lot of the vinegar.
We can weaken functionalism considerably, from a claim about how things are to a claim about what it is reasonable to believe (an epistemic claim). If functionalism is the principle that it is a metaphysical truth that no functional differences means no mental differences, epistemic functionalism is the view that it is at least reasonable to assume that there are no mental differences where there are no functional differences, unless shown otherwise. Epistemic functionalism is a weaker claim: functionalism implies epistemic functionalism but not vice versa. By making our premises weaker while still trying to reach the same conclusion (a common strategy in philosophy), we’re trying to make an argument that’s appealing to a broader circle.
To further explain epistemic functionalism, let’s go back to the example of color experiences (even though it’s not strictly related to the problem we’re considering here). It could be that you see green where I see red and vice versa, but until someone comes up with evidence of that, it’s not irrational to think that your green is much like mine. Connecting this to our topic, perhaps it is possible that all your emotions or all your desires are on a different scale from mine, but epistemic functionalism suggests that we can reasonably assume they are similar in the absence of contradictory evidence.
I’m going to label the rest of this essay as an appendix because it gets more complicated from here on out, and I think that for many people the arguments I have made thus far will go through. Nonetheless, keep reading if you want to learn how we can weaken the assumptions we’ve made even further.
APPENDIX: TWO EXTRA ARGUMENTS
If you still don’t accept this: Unbiased estimator functionalism
I find the argument so far persuasive as a solution to the interpersonal comparison problem. I’m a functionalist. I think mental states are definable in terms of functional relationships with behavior, and hence are fully psychometrically measurable. Even if I weren’t a functionalist, I would find epistemic functionalism, the view that it’s reasonable to assume that similar f-states mean similar e-states in the absence of contradictory evidence, persuasive.
However, I think we can add another layer of “even if”. Even if you find all of the above reasoning unpersuasive, an old argument called the equal ignorance argument, combined with an even weaker form of epistemic functionalism that I call unbiased estimator functionalism, might still go through.
Unbiased estimator functionalism: The equal ignorance principle
Let’s suppose that earning another dollar always makes you better off- this means that your utility function is strictly increasing in dollars. Let’s further suppose that, despite this, each additional dollar is worth less to you than the previous one- this means that your utility function is concave in dollars. Perhaps your interest in dollars rises steeply at first and then flattens out as you accumulate more of them.
Now let’s suppose that a decision-maker knows that every single person in the population has these features- utility which is both strictly increasing and concave in dollars. However, the decision-maker has no further information on the utility functions of the population- their shape or magnitude. The decision-maker has a pool of money that they want to dole out to the population. How should they divide it?
You may be able to see the answer intuitively (Abba Lerner proved it mathematically): the decision-maker should split the money equally.
But the equal ignorance argument suffers from a problem if it is meant to apply to real life. Arguably we are not totally ignorant about the scale of other people’s utility functions. For example, arguably we have good reason to think that, on average, rich people like money more than poor people, because, all other things being equal, the person who likes X more will have more of it. We might also have other reasons to think there are differences. For example, the rich might be more habituated to their wealth- this could create in them a greater need for it or might mean that it is largely wasted on them.
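A minimal sketch of the reasoning behind the equal split, assuming the decision-maker’s ignorance is represented as the same probability distribution over utility curves for every person. The power-function curves u(x) = x^b and the list of exponents are hypothetical stand-ins for “some increasing, concave function we know nothing more about.” Because everyone faces the same expected utility curve, and an average of concave curves is itself concave, expected total utility peaks at the equal split.

```python
from math import fsum

# Hypothetical prior: each person's utility is u(x) = x ** b, and the
# decision-maker only knows b is equally likely to be any of these values.
# Every candidate curve is strictly increasing and concave for x > 0.
EXPONENTS = (0.3, 0.5, 0.7, 0.9)

def expected_utility(dollars):
    """Expected utility of a dollar amount under the symmetric prior."""
    return fsum(dollars ** b for b in EXPONENTS) / len(EXPONENTS)

def expected_total(split):
    # The same prior applies to everyone: this is the "equal ignorance" premise.
    return fsum(expected_utility(x) for x in split)

budget = 100
splits = [(k, budget - k) for k in range(10, 100, 10)]
best = max(splits, key=expected_total)
print(best)  # (50, 50)
```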
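The worry can be seen in the same kind of toy model. Suppose, hypothetically, that everyone’s utility is a weight times the square root of their dollars, and that we have a reason to believe person A’s weight is 1.5 times person B’s. Then the best split is no longer equal, which is why the equal ignorance argument needs our ignorance to be symmetric. The square-root form and the weights are illustrative assumptions only.

```python
from math import sqrt

def best_split(weight_a, weight_b, budget=100):
    """Grid-search the whole-dollar amount x for person A that maximizes
    weight_a * sqrt(x) + weight_b * sqrt(budget - x). The weights stand in
    for how much each person is believed to care about money (hypothetical)."""
    return max(range(budget + 1),
               key=lambda x: weight_a * sqrt(x) + weight_b * sqrt(budget - x))

print(best_split(1.0, 1.0))  # no reason to favor either person -> 50
print(best_split(1.5, 1.0))  # believe A cares 1.5x as much -> 69
```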
Unbiased estimator functionalism: The claim
By an unbiased estimator here we mean something slightly different from its normal usage. We mean an estimator that, in expectation, is not biased towards higher or lower values for any person or type of person. We still may not trust the answers it gives, we may think it tends to be wildly inaccurate- but not in the form of a known bias.
Let’s say you have two friends named Alice and Bob. If you tell either of those friends a story about someone, they will then guess that person’s height (it’s very irritating). They are both very bad at it and are on average wrong by two feet. However, when Bob makes predictions, he tends to overestimate the height of men, and underestimate the height of women. Alice has no specific tendency towards underestimating or overestimating the heights of men or women. She’s just all over the shop. Alice is an unbiased estimator.
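The Alice and Bob story can be simulated directly. In this sketch, which uses hypothetical error distributions and heights, both friends are wrong by two feet (24 inches) on average, but Alice’s errors are centered on zero for everyone while Bob’s always push men up and women down. Averaging signed errors by group separates the merely noisy estimator from the biased one.

```python
import random
from statistics import mean

def alice_guess(true_height, sex, rng):
    # Unbiased but noisy: the error is Uniform(-48, 48) inches whatever the
    # person's sex, so the mean error is 0 and the mean absolute error is 24.
    return true_height + rng.uniform(-48, 48)

def bob_guess(true_height, sex, rng):
    # Just as noisy on average (mean absolute error 24 inches), but biased:
    # men are always overestimated and women always underestimated.
    if sex == "man":
        return true_height + rng.uniform(0, 48)
    return true_height - rng.uniform(0, 48)

def error_profile(guesser, sex, n=10_000, seed=1):
    """Return (mean signed error, mean absolute error) in inches."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        # Hypothetical true heights in inches, for illustration only.
        truth = rng.gauss(70, 3) if sex == "man" else rng.gauss(64.5, 3)
        errors.append(guesser(truth, sex, rng) - truth)
    return mean(errors), mean(abs(e) for e in errors)

for name, guesser in (("Alice", alice_guess), ("Bob", bob_guess)):
    for sex in ("man", "woman"):
        bias, mae = error_profile(guesser, sex)
        print(f"{name} guessing a {sex}: bias {bias:+.1f} in, mean error {mae:.1f} in")
```

Both friends look equally bad by mean absolute error; only the signed error by group reveals that Bob’s mistakes have a direction.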
Here’s where unbiased estimator functionalism comes in. According to unbiased estimator functionalism, a person’s functional state can be used as an unbiased estimator of the intensity of their affective states. F-state is an unbiased estimator of e-state.
Glossing quite a bit, the argument from this to interpersonal comparison goes as follows. Let us suppose that we think f-states are measurable and interpersonally comparable, but that e-states are what matters in ethical terms.
If we accept that f-states are unbiased estimators of e-states, even if we don’t think they’re necessarily very accurate estimators, and that we have no further information about the relative magnitudes of e-states, then, using reasoning like that involved in the equal ignorance argument, we can derive the conclusion that our estimates of e-states based on f-states should guide our ethical behavior. We may be wrong, but our wrongness has no tendency to go in a specific direction and we have no further information so we can’t do better than just using f-states as an estimate of e-states.
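Returning to the painkiller case, here is a toy sketch of acting on unbiased but noisy estimates. The “true pain” numbers stand in for e-states and the “estimates” for f-state measurements; both are hypothetical, and the noise has the same distribution for both patients, so it has no tendency to favor either one. Giving the painkiller to whoever scores higher does not always pick the patient in greater pain, but on average it does better than ignoring the measurement and flipping a coin.

```python
import random

rng = random.Random(3)
TRIALS = 20_000

guided, coin_flip, oracle = 0.0, 0.0, 0.0
for _ in range(TRIALS):
    # Hypothetical e-states: true pain of two patients on a 0-10 scale.
    true_pain = [rng.uniform(0, 10), rng.uniform(0, 10)]
    # Hypothetical f-state estimates: unbiased, very noisy, same noise for both.
    estimate = [p + rng.gauss(0, 3) for p in true_pain]

    pick = 0 if estimate[0] >= estimate[1] else 1
    guided += true_pain[pick]                  # act on the estimate
    coin_flip += true_pain[rng.randrange(2)]   # ignore the estimate
    oracle += max(true_pain)                   # impossible ideal: see e-states

print(f"average true pain treated - estimate: {guided / TRIALS:.2f}, "
      f"coin flip: {coin_flip / TRIALS:.2f}, oracle: {oracle / TRIALS:.2f}")
```

The sketch only shows that unbiased estimates beat ignoring them; the stronger claim that we cannot do better rests on the further premise that we have no other information about e-states.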
One final even if
Maybe you’ve read through all the above and you find yourself radically uncertain about whether or not the arguments I have given “go through” in preserving interpersonal comparisons. I think the argument has been compelling, but I get stuff wrong all the time. Well, I have one final pitch for you, a pitch to commonsense.
Denying interpersonal comparisons is a kind of skepticism. There are all sorts of arguments for all sorts of forms of skepticism in philosophy. For example, some people think that we should be skeptics about induction- or making inferences about the future on the basis of past observations. These people point out that the principle of induction- that it is legitimate to make such inferences- is itself undefended and it’s not good enough to say that it must be true because it’s worked in the past, because that’s circular!
Generally speaking, when we run into a theoretical argument for skepticism in philosophy, we don’t allow it to stop actual everyday and scientific inquiry. We assume the philosophers will work it out someday, and even if they don’t, we likely keep going all the same.
We have a practically applicable method for making interpersonal comparisons of affective states (psychometrics) that lines up with commonsense ideas about how to measure a variable between instances, and how to validate that measure. Even if you believe that there are philosophical reasons to be skeptical of psychometrics as a strategy for interpersonal comparison, I propose we should keep using it in the interim, much as we keep assuming the future will resemble the past, other people have minds, there is an external world, and so on.