Economists rediscover the Likert Scale? A reply to "The Scientific Value of Numerical Measures of Human Feelings"
Brief overview
As the authors put it: “It examined the relationship between get-me-out-of-here “exit” behavior and satisfaction in four domains: housing, intimate partnerships, jobs, and health.”
Satisfaction was rated by Likert scale. Despite being, fundamentally, an exploration of whether Likert scales work, the paper curiously did not have the word “Likert” anywhere in it. People on Twitter found this paper very funny because the predictive power of Likert scales is widely accepted so it’s hard to see what the novel contribution of the paper is.
The authors were economists, and economists have often been leery of these scales, so some took it as a sign of the intellectual arrogance of economists vis-à-vis the other social sciences: not taking anything seriously unless they think their field invented it.
The paper presented itself as grappling with this traditional skepticism by economists towards Likert scales (the authors inexplicably call the output of Likert scales “feelings integers”), and trying to present hard data proving these scales work.
Some are comparing it to the infamous rediscovery of calculus by a group of diabetes researchers.
Preliminary thoughts
PNAS is a journal that I don’t take very seriously. Every time I hear about it, it’s getting in trouble for something. Plus they broke my heart with this paper when I was young and naive- this was before the replication crisis, and I genuinely thought it was going to change the world, I spearheaded some research on it, and…
Also the name, PNAS, when said out loud, sounds like…
AHEM
The scientific value of numerical measures of human feelings- a bold title, to say the least. It’s not uncommon to promise a fair bit more with your title than you can deliver, and I respect that. However, I would have thought that a paper with a title like this would be a review or meta-analysis in light of the enormous amount of work already done on this subject. Or if it was an experiment, I would have expected it to have some funky new methodology, maybe shedding light on how Likert scales work.
When I first heard that there was a brouhaha about economics and Likert scales I felt the terror of a man who has a year before Ph.D. submission and does not want to have to integrate any new material into his thesis. Really though, I shouldn’t have worried.
Twitter has its fun
Some people warned that maybe it wasn’t the simple silliness it looked like:
Here’s an attempt at making the case in detail:
In the main though, people weren’t convinced
Is the paper right to say that economists are skeptical of Likert scales? If so why?
The short answer is that yes, economists are skeptical of Likert scales, though far less than they used to be. If you ask them why, economists will point to the vagueness of the scales- why should it be clear that these quantifications mean anything? However, this doesn’t explain why economists in particular tend to reject them- such reasoning is available to other social scientists, but largely hasn’t troubled them, because there’s a mountain of evidence that, despite this vagueness, they do work.
So why do economists in particular distrust Likert scales? In my opinion, there are two fundamental historical reasons why- though I am no intellectual historian, so be warned.
The first has to do with technical arguments in measurement theory which suggest that while Likert scales are presented as giving cardinal information or ratio information, in reality, they may only give ordinal information. These arguments are more appealing to economists because of similar debates in the history of their discipline, and the often greater mathematical sophistication of economists compared to other social scientists. However, even were this argument right, it would only show that these measures can’t be trusted as cardinal measures; it doesn’t explain why they are rejected as ordinal indicators. If you didn’t understand that paragraph, don’t panic, it won’t be relevant to the rest of this essay.
The second reason has to do with a longstanding view among economists that it is impossible to compare utility between people- to perform interpersonal comparison- or to say that Bob gets more from an extra dollar than Sue gets or vice versa.
If Likert scales work though, they can be used on happiness, life satisfaction etc., and if Likert scales really can tell us whether Bob or Sue is happier, then interpersonal comparison of utility- or something near enough- is possible. But we all “know” that interpersonal comparison is impossible because that has been the orthodoxy in economics for many years, ergo Likert scales must not work.
Lionel Robbins wrote a book in which he argued that interpersonal comparison of utility, and also cardinal utility [economists have often conflated the two or treated them as tightly related] are either impossible or represent mere judgments of value. Vilfredo Pareto made similar arguments earlier.
Lionel Robbins’s book had a tremendous impact on the discipline. So great was the impact of his work that the questions of interpersonal comparison and cardinal utility were seen as settled in the negative by many (aside: it’s rarely a good sign when a discipline treats a philosophical debate like this as settled). The age of welfare economics built on cardinality and interpersonal comparison was over.
Arguably it should have been clear that something was up when von Neumann & Morgenstern published a perfectly good guide for formally constructing cardinal preferences from behavior just a few decades later, but the ordinal revolution largely brushed that off. See, for example, this paper, which as best as I can tell is nonsensical.
If von Neumann, one of the 20th century’s most important intellectuals and no slouch as an economist, wasn’t going to dampen the ardor for Robbins and Pareto’s ideas, mere psychology would do no such thing. Especially icky ‘soft’ psychology like that involving Likert scales.
That’s the (thin) intellectual history. Here’s my speculation on the thicker intellectual history. The great campaigners against interpersonal comparison- Pareto and Robbins- were of the political right. Pareto was basically fascist [and a great study in how a certain kind of rightwing libertarianism can lead to fascism], and Robbins was a sort of moderate libertarian.
The orthodoxy these two overturned- the old welfare economics, which admitted cardinality and the interpersonal comparison of utility- wasn’t exactly a hotbed of leftwing radicalism, but it contained resources within it that could be used for leftwing positions on taxes. For example, the idea that money has a declining marginal value, both considered on an individual level, and at the level of total aggregate interpersonal welfare, seems to give prima facie reason to favor progressive taxation.
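To make the declining-marginal-value point concrete, here is a minimal sketch. It assumes, purely for illustration, that everyone's utility is the logarithm of their income and that utilities can be added across people; the two incomes and the size of the transfer are made up. It also ignores any behavioral response to the transfer, so it is the bare prima facie argument and nothing more.

```python
import math

def log_utility(income):
    """Illustrative utility function with declining marginal value (an assumption)."""
    return math.log(income)

def total_utility(incomes):
    """Aggregate utility, assuming utilities are comparable and additive across people."""
    return sum(log_utility(y) for y in incomes)

# Hypothetical two-person economy; the figures are invented for illustration.
before = [20_000, 200_000]
transfer = 10_000
after = [before[0] + transfer, before[1] - transfer]

gain_to_poorer = log_utility(after[0]) - log_utility(before[0])
loss_to_richer = log_utility(before[1]) - log_utility(after[1])

print(f"gain to poorer person: {gain_to_poorer:.3f}")
print(f"loss to richer person: {loss_to_richer:.3f}")
print(f"change in aggregate:   {total_utility(after) - total_utility(before):.3f}")
```

Under these assumptions the same dollar amount is worth more to the poorer person than it costs the richer one, so the aggregate rises. Everything turns on the assumed utility function and on the comparability of utilities, which is exactly what Pareto and Robbins disputed.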
Pareto & Robbins were, I think, very afraid of an economics that could say “ahh yes, according to my calculations the optimal rate of tax on the rich is…” They wanted it to be a question for political philosophy, for vague notions of desert. Of course, they weren’t entirely wrong, there will never be a ‘scientific’ answer to the question ‘what is the optimal tax rate’- assumptions about values will always be needed, not just assumptions about facts. However, while we will never reach a conclusion about the optimal tax rate using positive science alone, positive welfare economics has a good bit more to say that is relevant to this question before we get into normative welfare economics than is commonly recognized. If we take psychometrics seriously, it may be possible to work out what system of tax rates maximizes aggregate or at least average happiness- that’s fascinating and important information and it would definitely be received as politically relevant.
Mirrlees, a Nobel prize-winning economist, and many coming after him, did exactly this kind of utilitarian calculation of the optimal tax rate. This may seem like a counter-example, but I think it was less threatening because it wasn’t presented as a hard empirical truth along the lines of “this tax rate will maximize aggregate happiness”. Instead, it was more like “if we play around with the idea that utility is proportional to the logarithm of income…”
Nowadays economists are softening on Likert scales anyway, and most are willing to accept that they provide at least ordinal information [that is to say we can at least say a 2 on a happiness scale is more than a 1, even if we can’t say 3 is twice as far from 1 as 2 is from 1]. There is definitely a surge of interest in psychometrically informed welfare economics. However, as the authors of this paper on “feelings integers” identify, there is a lingering disciplinary skepticism about Likert scales.
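For anyone who wants the ordinal/cardinal distinction spelled out, here is a small sketch. The two groups of responses and the "stretched" relabeling are hypothetical. The point is that if a scale carries only ordinal information, any order-preserving relabeling of the categories is equally legitimate, and comparisons of averages need not survive such a relabeling.

```python
def relabel(responses, mapping):
    """Replace each response category with the number a given labeling assigns to it."""
    return [mapping[r] for r in responses]

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical responses on a 1-5 happiness scale for two made-up groups.
group_a = [1, 1, 3, 5, 5]
group_b = [3, 3, 3, 4, 4]

# Two labelings of the same five categories. Both are strictly increasing,
# so both agree on which of any two responses is "happier".
equal_spacing = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
stretched_top = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}

for name, mapping in [("equal spacing", equal_spacing), ("stretched top", stretched_top)]:
    a = mean(relabel(group_a, mapping))
    b = mean(relabel(group_b, mapping))
    higher = "A" if a > b else "B"
    print(f"{name}: mean A = {a:.2f}, mean B = {b:.2f}, higher mean: group {higher}")
```

Which group has the higher average depends on the spacing between categories, which is exactly the information an ordinal-only scale does not supply. If the equal spacing is real, the scale is cardinal and the average means something.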
But while this paper is right that some economists are skeptical of Likert scales, it’s far from clear that what is needed to address this is one more set of empirical results. People have been correlating Likert scales with things for almost a century.
Are economists right to be skeptical of Likert scales?
Yes, if Likert scales just stood on their own, but all too often economists don’t grapple with the long history of trying to back up these scales by checking that they correlate with things.
I remember reading a rather bad NBER working paper that basically just said, in part, “how do you know these measurements of feelings aren’t just reports which signify nothing”.
Or as the authors of this paper put it:
Economists’ contrasting and long-standing mistrust may be due to a belief that data on human feelings carry little or no reliable predictive power. Milton Friedman—an early recipient of the Nobel Prize in Economics—suggested in a classic article on scientific methodology that the ability to predict should be the fundamental benchmark for the assessment of scientific success.
But of course psychologists (and sociologists, and others who use Likert scales) have their own strategies to make sure these scales measure something.
Validation is, roughly, the process of showing that a measure measures what it purports to measure by showing that it correlates with all the right things we would expect it to correlate with, were it an accurate measurement.
In my thesis I distinguish between three modes of validation (or criterion validation, for the psychometric geeks out there):
A) Biological validation- Showing that the measure correlates with the right biological factors. For example, there are good theoretical reasons to think that happiness should be negatively correlated with cortisol, ergo if our scale of happiness is negatively correlated with cortisol, that’s a good sign.
B) Contextual validation- Showing that our measure correlates with the right environmental features. For example, we would expect happiness to be lower in stressful environments, and higher in pleasant environments.
C) Behavioral validation- Perhaps most critical of all, and what this PNAS paper works on- showing that our measure correlates with the right aspects of the person’s behavior. For example, our measure of happiness should, presumably, be negatively correlated with suicide, positively correlated with smiling etc. etc.
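In practice all three modes come down to the same basic check: does the score move with the thing it ought to move with? Here is a minimal sketch of a behavioral validation check on synthetic data. Everything in it is an assumption made for illustration: the latent satisfaction variable, the 1-7 scale, and the rule linking satisfaction to an "exit" behavior. It is not the PNAS paper's method or data.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to keep the sketch stdlib-only."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)  # fixed seed so the synthetic sample is reproducible

likert_scores, exited = [], []
for _ in range(2000):
    latent = random.gauss(0, 1)  # hypothetical "true" satisfaction
    # Reported score: the latent value rounded and clipped onto a 1-7 scale.
    score = min(7, max(1, round(4 + 1.5 * latent)))
    # Assumed behavior: less satisfied people are more likely to leave.
    p_exit = 1 / (1 + math.exp(1 + 1.5 * latent))
    likert_scores.append(score)
    exited.append(1 if random.random() < p_exit else 0)

r = pearson(likert_scores, exited)
print(f"correlation between satisfaction score and exit: {r:.2f}")
```

If the score tracks the underlying state, the correlation with exit should come out negative, as it does by construction here. With real data the interesting question is whether the sign and rough size match what theory predicts, across many different criteria.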
Is the paper good after all?
There’s already a school of revisionism on this PNAS paper which is trying to argue “it’s good actually”. The idea goes that it’s trying to get economists, in language economists understand and accept, to engage with Likert scale style measurements. Yes it’s repeating prior work, but if that’s what it takes…
The main concern I have with this argument is that the paper presents itself as making a novel empirical contribution- not merely an explanatory contribution. That empirical contribution is the correlation of “get-out-of-there” behaviors with attitudes.
Firstly, it is not at all clear why “get-out-of-there” behaviors are particularly significant compared to other forms of behavioral validation of “integer-expressions of feelings”.
The authors write:
Our analysis draws on longitudinal data from three nations. The objective was to assess whether there is evidence of a ubiquitous connection between feelings integers and what might be termed get-me-out-of-here actions. Such actions, explained more fully below, are where individuals choose to leave their current setting (in whatever domain of life). These are of special interest to scientific researchers because get-me-out-of-here actions can be taken to be unambiguous signals of latent human dissatisfaction with the prior status quo.
But is it really clear that get-me-out-of-here actions are a better validator than other types of actions? A better proof that Likert scales actually measure attitudes? That’s far from clear to me.
Secondly, it’s not at all clear to me that their claim that “get-out-of-there” behaviors have not previously been often correlated with attitudes is true. In fact, less politely, I think it’s false.
For example, the ultimate “get out of there” behavior is the tragedy of suicide, and a number of studies have assessed the relationship between suicide and self-rated subjective-wellbeing at various levels. The authors cite this work.
Of relevance to an economist will be economic-choice get-out-of-there behaviors, such as those studied in work on customer satisfaction and job satisfaction. Again, the authors cite some of this work.
There are probably hundreds of thousands of studies, certainly tens of thousands, correlating Likert scales with behavior. ‘Get-out-of-there’ type behaviors are not, as far as I can tell, a particularly uncommon thing to measure Likert scales against.
The authors try to make a big deal out of the fact that the relationships are linear, but in my experience, most measures of feelings are linearly related to outcomes and predictors. Why? Well, one very reasonable interpretation- though not the only possible interpretation- is that Likert scales are linear in attitudes and feelings, which would imply that they do contain cardinal information after all.
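Here is a rough sketch of what checking for linearity might look like: fit a straight line to the exit rate in each response category and see how much of the variation the line accounts for. The exit rates below are invented for illustration and are not taken from the paper.

```python
# Made-up exit rates by satisfaction category (1 = least satisfied, 7 = most).
exit_rate = {1: 0.30, 2: 0.26, 3: 0.21, 4: 0.17, 5: 0.12, 6: 0.09, 7: 0.04}

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

xs = sorted(exit_rate)
ys = [exit_rate[x] for x in xs]
slope, intercept = fit_line(xs, ys)

# Share of the variation in exit rates that the straight line accounts for.
mean_y = sum(ys) / len(ys)
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"slope per scale point: {slope:.3f}")
print(f"R^2 of the linear fit: {r_squared:.3f}")
```

A near-perfect straight line is what you would expect if each step up the scale reflected an equal step in the underlying feeling and the outcome responded linearly to that feeling. It is also compatible with unequal steps and a nonlinear response that happen to cancel out, which is why linearity is suggestive of cardinality rather than proof of it.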
Of course one might argue that in presenting themselves as more theoretically novel than they really are, in order to convince economists Likert scales are okay after all, the authors are engaging in a kind of noble exaggeration. They aim to fool economists into accepting already established work through a bit of subterfuge. If so, judging by Twitter, it doesn’t seem to have worked.
The irony of it all is I’ll probably cite their paper, and the discussion around it in my thesis, just to make the point that evidence on these questions is as old as the hills.
One last jab
The authors write: “Within the economist’s rational-agent framework, it is typically taken as axiomatic that decision utility and experienced utility coincide.”
The theory that experienced utility and decision utility coincide is sometimes called psychological hedonism since it postulates that behavioral motivation and pleasure coincide perfectly. At least in my area of interest, welfare economics, it hasn’t been the orthodoxy since the 19th century.
My Ph.D. thesis
While we’re talking about Likert scales and economics…
The argument of my Ph.D. thesis is that psychometrics can solve some of the traditional “insoluble” problems of welfare economics. For example, the problem of interpersonal comparison. In relation to interpersonal comparison, I argue that something roughly like the following argument, combining the philosophical idea of functionalism about mind [weakened to an epistemic rather than an ontological thesis], and the psychological idea of psychometric measurement, can allow us to interpersonally compare welfare-like concepts such as happiness:
[The psychometric thesis] Construct validation shows us that measures of subjective well-being capture functional states that are analogous to concepts like “happiness”
[The epistemic functionalist thesis] It is reasonable to assume that if two agents are functionally alike, they are, in expectation, experientially alike and that if one agent is functionally ‘greater’ on a specific variable, they are, in expectation, experientially greater on that variable. For example, functional happiness is, in expectation, an unbiased estimator of experiential happiness.
Conclusion: Psychometrics can tell us whether, at least in expectation, one person has more of a welfare-like state such as happiness than another.
The overall picture I build of alternative welfare economics is this. Welfare economics tells us the effects a menu of possible policies will have on welfare-like states, such as happiness, life satisfaction, etc. It lays the information out there about which policies will maximize aggregates, how policies will affect the welfare-like states of the most vulnerable etc. etc. Then policymakers and the public will argue over the policies in light of the results and other considerations.
I argue this approach gets past a menu of ‘classical’ problems in welfare economics:
The interpersonal comparison problem (as discussed briefly above).
The problem of cardinality.
The question of how to make a contribution to welfare economics
Concluding thoughts on psychometrics
There are real, serious questions to be asked about whether psychometrics is good science. Joel Michell for example, a widely respected person where I live, is seen by some as having blown a hole through it a mile wide. Where economists do themselves a disservice though, is in assuming that Likert scales, as measures of mental state, were ever intended to be justified on their face. No! Psychologists have developed a whole philosophy of measurement, sometimes called the psychometric theory of measurement- construct validation, face validity, content validity, logical validity, test-retest reliability- these are the psychometric concepts that I would expect to see discussed in this paper. Instead the authors invent their own terminology from scratch, and scarcely refer to the existing literature.
For a good discussion on the “awareness gap” among economists on psychometrics see this.
An attempt at a sympathetic narrative about the piece
The authors of this piece have, on a charitable interpretation, run into a trap that I myself am deeply afraid of - looking like they’re trying to reinvent something when really they’re just trying to demonstrate its value.
My thesis runs the risk, I am aware, of being misperceived as just the argument ‘Hey, why don’t we take psychometrics more seriously in welfare economics- psychometrics is kind of cool’- an argument that would ignore the many people who have already made the point that psychometrics is cool and may have a role in welfare economics. Really my argument is “psychometrics can solve these three ancient problems”.
Maybe the authors are not so much reinventing the wheel as going slowly through a toy model to convince the skeptics.
But that’s the charitable interpretation, and as I said earlier, the authors aren’t making things easier for themselves by so blithely dismissing the prior literature on validating Likert scales and similar, or as they call it “Feelings integers”. They also claim far too much novelty for their work, if this is what they’re doing. Still, maybe that’s what they needed to do to get the paper published and convince the skeptics, and look, we all know top tier journal publishing is a weird game. More power to them.
Also, live and let live! As strange as it is, it’s quite a useful piece to cite in some ways, and I admire the authors’ gumption.
If my PhD thesis sounded interesting, check it out here: https://docs.google.com/document/d/17dw0_Ukp_98jmWBr8eyLkxSB1vXeLpi9YbodhQm3q2Y/edit
Or check out the summary/introduction here:
So do you think Likert scales might actually be cardinal? I've heard people whisper about research findings hinting at that, but never heard anyone actually make the case.