Towards a new methodology for evaluating economic welfare and assessing policy
This is the current precis and introduction to my Ph.D. thesis. Introducing an introduction seems superfluous, so I’ll be brief- it’s the outline of a different methodology for something like welfare economics- evaluating policy through economic means.
An ordinary language introduction to the thesis and why it matters
Welfare economics is the study of economic policies and economic arrangements affecting welfare, where welfare is understood as wellbeing or the degree to which a person’s life is going well for them.
Welfare economics is often used to evaluate policy, and in that role it has traditionally faced a problem: what to do when a policy, as policies often do, affects different people differently? How can we compare Oscar’s harms to Jennifer’s benefits? But even if we could collate all the different benefits and losses, and make them comparable to each other, how can we evaluate them against each other, let alone decide which gains and losses we should care about more? The enormous prestige of economists and the occasional taste for technocracy mean that many policymakers would like economists to be able to do this for them- to deliver a neutral ground truth on what the right policies are or at least which policies increase human welfare.
There are a number of obstacles. Firstly there is the issue of interpersonal comparison- what does it mean to say that Jennifer’s benefit from a state of affairs is greater than Oscar’s benefit from the same or another state of affairs? How could we discover this?
The second is the issue of magnitudes- sometimes called the problem of cardinality. How can we quantify benefits as if they were temperatures on a thermometer? Even if we restrict ourselves to one person, what does it mean to say that x was much worse for Oscar than y? Or that the difference between 25,000 dollars a year and 50,000 dollars a year matters at least as much to Oscar as the difference between 50,000 and 100,000 dollars a year?
Finally there is the problem of value judgements. Even if we could quantify all the benefits and harms Oscar and Jennifer face, and show that policy P does more good overall than policy Q, some people might still suggest that policy Q is superior. Let’s say policy Q, despite having less benefit overall, benefits Jennifer more. Some people might prefer policy Q if they feel that Jennifer is harder working, or if they feel that Jennifer is worse off and therefore should be prioritized for benefits, or if they feel that Jennifer has been historically oppressed and must be compensated. How can economics, a positive science, possibly choose between these value frameworks?
Right now, welfare economics has methods of resolving each of these problems in practice. However, the solutions used are subject to many critiques. To restrict myself to a selection of criticisms I agree with: 1. Current methods often unjustly favor the rich over the poor. 2. Current methods rely on psychologically implausible models. 3. Current methods make very strong assumptions about what it means to live a good life. 4. Current methods are in tension with a democratic approach to economic decision making.
Meanwhile, there have been efforts to psychologise welfare economics for a long time- to combine it with the burgeoning literature on wellbeing psychology. My argument in this thesis is that a particular way of psychologising welfare economics can solve- or at least has extremely good prospects for solving, all three of those problems I listed above- the problem of interpersonal comparison, the problem of cardinality and the problem of the proper role of values in applied economic science. It is also not subject -or at least it is much less subject- to the critiques of current welfare economics I outlined above. In other words, I argue that a certain type of psychological approach to welfare economics can solve many of the paradoxes and antinomies that welfare economics faces.
This approach consists in 1. Estimating the effects of policies on the level and distribution of various psychometrically measured forms of wellbeing. 2. Providing this information to policy makers and the public.
Addressing these issues matters, because arguably there is no single area of practice in the humanities or social sciences that has such a direct impact on our lives and well being as welfare economics, applied through the policies of treasuries, agencies, infrastructure funds, international institutions, non-government organizations and many others.
Introduction and precis
0. General considerations
This is a thesis about applied welfare economics -a phrase some might contend is, or ought to be, a tautology and others might say, in exasperation, is an oxymoron. In this thesis, applied welfare economics refers to welfare economics intended to directly shape or inform policy. Examples of what we are calling applied welfare economics include cost-benefit analysis, designing tax systems through optimal tax theory, designing institutions, planning intertemporal tradeoffs and the economic analysis of law. We use cost-benefit analysis almost exclusively for our examples, but, mutatis mutandis, we believe that the approach we sketch in this thesis could be applied to any of these, although showing this in detail would require additional work to deal with the subtleties of each area.
Many criticisms have been made of the various approaches that make up new welfare economics, but these criticisms have sometimes languished for want of an alternative paradigm. We aim to address this problem, vis a vis applied welfare economics. In this thesis, I claim to locate an alternative paradigm in already existing research- the study of subjective-well being applied in the context of economic policy. Such research already exists and has been proposed as an alternative basis for evaluating economic policy. However, my purpose is to defend the idea that a subjective-wellbeing oriented approach can address many of the needs we want welfare economics to address while at the same time offering a resolution to many of the traditional problems and limitations of welfare economics. I call this approach by various names including the psychometric approach and the subjective-wellbeing effect estimation (SWBEE) approach. For clarity, my proposition is not that this approach to welfare economics is the only possible approach that resolves the problems I consider, but only that our approach can well be applied to both the traditional uses, and the traditional difficulties, of welfare economics and is thus a promising paradigm. Key prior authors in this tradition include Ng, Alexandrova and Haybron.
I begin with some expository chapters
0.1. Chapter 1: An initial chapter on traditional criticisms of welfare economics
In this chapter, I will consider traditional criticisms of new welfare economics- especially in the area of applied welfare economics. These criticisms include its formalistic character, often weak conclusions, heroic assumptions and tendency to adopt implausible ethical standpoints in an attempt to avoid taking on any controversial ethical assumptions at all (e.g., the Kaldor-Hicks criterion).
The best way to understand many of the problems of welfare economics is to understand them as constituents of a broader problem- the problem of how to best weigh and balance interests when a policy affects different groups and their interests differently. In this thesis, we call this problem the problem of interpersonal aggregation. It is important not to confuse the problem of interpersonal aggregation with the problem of interpersonal comparison, although a solution to the interpersonal aggregation problem may require a solution to the interpersonal comparison problem. These problems are distinguished in our usage in the thesis as follows. The interpersonal aggregation problem is the problem about how to compare either overall goodness, or perhaps just the welfare-related portion of goodness when a menu of policies has costs and benefits accruing to different people. The interpersonal comparison problem -as we use the term in this thesis- is the problem of how to compare the magnitude of a mental state of some sort- perhaps a desire, emotion, “utility” (whatever that may be) or even a personality trait between two or more persons.
We argue that three separate problems, viz: 1) the interpersonal comparison problem, 2) the problem of cardinal measurement of mental states, and 3) the problem of the proper role of value judgements in welfare economics can all be seen as components of the interpersonal aggregation problem. Without comparability, cardinality and some answer to the question “what do we value and how much?” (or some principled way to evade the problem), coming to an overall evaluation of the effects of a policy with disparate effects on disparate people is all but impossible.
0.2. Chapter 2: A chapter on traditional approaches to the interpersonal aggregation problem
In this chapter I present a taxonomy of many different approaches to welfare economics, organized by their relation to the interpersonal aggregation problem. I argue that existing approaches tend to either suffer from limited scope, limited relevance to the problem of selecting economic policy, or a presumptuousness about ethics that is unlikely to be persuasive in a democratic and ethically pluralistic society, or some combination of these problems.
Having considered the problems with traditional approaches to applied welfare economics, we begin to lay out our alternative approach over the next two chapters.
0.3. Chapter 3: A chapter laying out our thesis and introducing the basic ideas of the psychometrics of wellbeing
We provide an explanation of standard psychometric techniques and concepts that will be relevant to this work. This includes an introduction to the concepts of validity (and validation) and reliability.
We briefly review the state of the art- philosophical and scientific, on psychometric measures of happiness. My general conclusion is that the science of subjective wellbeing is reasonably robust. As regards the philosophical foundations of psychometrics and subjective-wellbeing, while there is always room for improvement the study of the conceptual foundations of subjective-wellbeing and psychometrics is relatively advanced- at least in comparison to other areas in the human sciences. I express a general skepticism about the idea that science should “wait” for clarity in conceptual foundations- science and philosophy develop jointly, not sequentially, and the standard that sometimes seems to be applied to psychometrics by its enemies in economics- that it should not proceed until we have a full philosophical account of methodology- would not be persuasive in any other area of science [don’t forget to argue this in the chapter itself]
There are a variety of different welfare concepts in the philosophy of wellbeing. These include desire satisfaction, the balance of pleasure over pain and flourishing. I briefly review special philosophical and methodological questions in the psychometric measurement of each. I argue that psychometrics can evaluate each of these types of well being, so there is no need to be sectarian about ethics in advocating our methodology: a psychometric approach can engage with any of these.
0.4. Chapter four: Subjective-wellbeing effect estimation
I lay out my preferred approach to welfare economics. The goal of the applied welfare economist is not to make judgements about what policy is optimal but to provide information to decision-makers about the probable effects of policy on a variety of different metrics of subjective wellbeing. Not just the effects on aggregate/average subjective-well being, but also effects on the distribution of wellbeing. Application of the method will require judgment, especially in the initial stages as the canons of procedure are still being developed. Our thesis is mostly concerned with theoretical rationale rather than with providing an exact manual.
Interestingly, SWBEE provides a rationale for something like traditional (weighted) cost-benefit analysis as one method of pursuing the goal of subjective-wellbeing effect estimation. However it does so without requiring us to adopt the value judgments inherent in a specific social welfare function, or even a range of social-welfare functions. Instead of representing a value judgment as to the social worth of a marginal dollar going to an individual, elasticities represent a best attempt to estimate the probable effects of a marginal dollar on a particular form of psychometrically measured well being. Thus our approach can lead to a rationale for weighted cost-benefit analysis without the value judgements.
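To make the idea of elasticity-based weights concrete, here is a minimal sketch in Python. It assumes, purely for illustration, an isoelastic form in which the estimated wellbeing effect of a marginal dollar at income y is proportional to y^(-eta). The functional form, the value of eta, the reference income and the dollar figures are all hypothetical choices made for this example, not estimates from the thesis.

```python
def wellbeing_weight(income, reference_income, eta):
    """Estimated wellbeing effect of a marginal dollar at `income`,
    relative to a dollar at `reference_income` (assumed isoelastic form)."""
    return (income / reference_income) ** (-eta)


def weighted_net_benefit(impacts, reference_income, eta):
    """Sum dollar impacts, each scaled by its estimated wellbeing weight.

    `impacts` is a list of (income, dollar_gain_or_loss) pairs.
    """
    return sum(
        dollars * wellbeing_weight(income, reference_income, eta)
        for income, dollars in impacts
    )


# Hypothetical policy: a person on 25,000 gains 100 dollars,
# a person on 100,000 loses 150 dollars.
impacts = [(25_000, 100.0), (100_000, -150.0)]

unweighted = sum(dollars for _, dollars in impacts)
weighted = weighted_net_benefit(impacts, reference_income=50_000, eta=1.0)

print("unweighted net benefit (dollars):", unweighted)
print("weighted net benefit (reference-income dollar equivalents):", weighted)
```

With these made-up numbers the unweighted total is negative while the weighted total is positive. The point is only that the weights here are read as empirical estimates of wellbeing effects, which could in principle be revised by evidence, rather than as a statement of how much each person's dollar ought to count.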
I then proceed to address one problem in welfare economics per chapter which traditional welfare economics has struggled with, but which I believe a psychometric approach can resolve. Three of the four problems I look at can be seen as tributaries to a larger problem- the interpersonal aggregation problem or how to quantify gains and losses when different people or groups gain or lose. One other problem we consider is not connected to the interpersonal aggregation problem, at least not directly, but is interesting in its own right: that is the question of the best way to infer what people’s interests or desires are.
We’ll now go through each of these chapters in summary form:
0.5. Chapter five: The problem of interpersonal comparison
The first problem we consider is the problem of interpersonal comparison of mental states. This problem is a broader problem than the traditional problem of interpersonal utility comparison. Choose some mental attribute, say happiness, anger or desire. How can we compare X’s degree of this attribute to Y’s degree of this attribute? If we define a person’s good in a way that is tied, to at least some degree to their mental states, resolving this problem will likely be necessary to resolving the interpersonal aggregation problem. This problem has traditionally bedeviled economics, in the form of worries about the interpersonal comparison of utility. Given contemporary definitions of utility, this can be seen roughly as the problem of the interpersonal comparison of degrees of desires. Thus if the psychometric approach can resolve this, so much the better for psychometrics and so much the worse for the traditional methodology of welfare economics.
I argue that psychometrics, when combined with even moderate functionalism in the philosophy of mind, gives us a meaningful way to empirically compare degrees of states such as life satisfaction and happiness between people. Functionalism says that like functional states are associated with like mental states, and psychometrics shows us through the process of construct validation that like scores tend to be associated with like functional states. We can make this result appealing to even more people by weakening functionalism from a metaphysical to an epistemic principle. That is to move from saying like mental states are associated with like functional states to saying that it is reasonable to hold that mental states are like when functional states are like- a methodological rule I call epistemic functionalism. Accepting epistemic functionalism gives us reason to affirm that comparison of psychometric scores allows for interpersonal comparison of mental states.
If this argument is right then the comparison of subjective wellbeing type states in multiple people is no more complex (in very broad principle!!) than the comparison of the temperatures of different objects.
We can further supplement this combination of functionalism and construct validation in the psychometrics of subjective wellbeing with Lerner’s principle of equal ignorance. Our argument then becomes not so much that like functional states must be associated with like mental states, or even that it is reasonable to assume such, as that assuming this has the lowest expected error of any view. I explain why this is an improvement on traditional, non-psychometric applications of the equal ignorance principle to welfare economics in the chapter- in brief, because it allows us to rule out all possible sources of empirical evidence to the contrary, before declaring equality or difference in response to similar circumstances or allocations.
[Check on the above paragraph]
0.6. Chapter six: The problem of cardinality
Even if we could compare X and Y on their degree of happiness, or desire fulfillment, establishing whether one is greater or both are equal, could we quantify the difference between them? At least for many common ways of approaching social welfare, this is going to be of key importance to resolving the interpersonal aggregation problem. For example, if we think the sum or average of wellbeing matters, or even if we think a weighted sum or average is what matters, we are going to need to be able to add up utilities or subjective-well being.
Welfare economists have traditionally been troubled by the issue of the cardinality of welfare, with the advent of ordinalism being seen as a major innovation in consumer theory, but a major difficulty for welfare economics. The problem also arises in the context of the relationship between underlying wellbeing and wellbeing scores. Many have argued that it is possible to satisfactorily defend cardinalism even within a non-psychometric approach to welfare and/or utility, using revealed preferences- and this may or may not be so. However, my purpose in this chapter will be to argue that the problem can also be solved in the context of the psychometric approach to welfare economics. Thus it is not a stumbling block to this approach.
To see the problem in a psychological context, imagine a questionnaire like Cantril’s ladder. Participants are asked to rate their happiness on a scale from 0 to 10. Now suppose that we have three responses: 6, 8 and 10. We might therefore reason that the average happiness is an 8, but unless we have established cardinality this is impermissible. It might be, for example, that the gap between a 6 and an 8 is much larger than the gap between an 8 and a 10, so taking the arithmetical average of the responses won’t really tell us about the average of happiness as such.
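The worry can be illustrated with a short Python sketch. The mapping from scores to underlying happiness used here is entirely hypothetical: it simply stipulates that each step up to 8 is worth 2 units of happiness and each step above 8 is worth only 0.5, so that the gap between 6 and 8 is larger than the gap between 8 and 10.

```python
from statistics import mean

responses = [6, 8, 10]


def underlying_happiness(score):
    """Hypothetical non-linear mapping from ladder score to happiness.

    Steps up to a score of 8 are worth 2 units each; steps above 8 are
    worth 0.5 units each. This is an illustrative assumption only.
    """
    return 2.0 * min(score, 8) + 0.5 * max(score - 8, 0)


mean_score = mean(responses)
mean_happiness = mean([underlying_happiness(s) for s in responses])
happiness_at_mean_score = underlying_happiness(mean_score)

print("mean of the scores:", mean_score)
print("mean underlying happiness:", mean_happiness)
print("happiness of someone scoring the mean score:", happiness_at_mean_score)
```

Under this assumed mapping, the average of underlying happiness is lower than the happiness of a person who reports the average score, which is exactly the gap that a defence of cardinality has to close.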
I argue that while there are difficulties in establishing that psychometric measures are truly cardinal measures of wellbeing, the following considerations give us reasons for confidence:
1. There is some preliminary evidence that subjective wellbeing is linear in subjective well-being scores. This includes, but is not limited to the homoscedasticity of errors all across the scale in test-retest studies, and evidence given in studies of how respondents interpret these scales. Also notable in this regard is Kristoffersen’s point (? Query reference) that in essentially all cases the stats tend to come out the same way, whether we treat the scales as cardinal or ordinal.
2. If scores of measured subjective well being are not already linear in subjective-well being, there are many promising directions for future research on this topic that could uncover the real relationship between subjective-well being scales and underlying subjective-well being. Some of these ideas (A & B) are my own, or worked out between me and colleagues, others already exist in the literature. These ideas include:
Exploring the decision utilities associated with different levels on happiness scales in risk choice experiments.
Correlating the scale with different levels of Kahneman’s unbounded experience sampling based utilities (integrated over a period of time to give the participant’s average experience utility then compared with their response, say, to Cantril’s ladder).
Investigation based on the assumption of just noticeable differences.
Asking participants what the distance between various points on the scale is, in their perception.
Among several others.
3. Finally, sensitivity testing conducted by myself & Kieran Latty suggests that the controversy is unlikely to matter very much. That is to say, it is very likely that if we found the true relationship between happiness and happiness scales, and recalculated the average happiness of different groups accordingly, their order would not change by much.
Kieran and I have adjusted the distance between scores on the basis of various assumptions, ranked countries by the resulting “averages” of happiness, and found that the order of the list of countries by average happiness changes little or not at all. Some of these sensitivity tests assumed a very great distance from linearity: for example, treating each happiness score as corresponding to a degree of happiness equal to score^2, where 1 becomes 1, 2 becomes 4, 5 becomes 25, 10 becomes 100 and so on. In the most extreme cases, we considered the situation in which each ten was worth an arbitrarily large amount, and the situation in which each one was worth minus an arbitrarily large amount. We found that the rank order of countries was still largely preserved.
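The general shape of such a sensitivity test can be sketched in a few lines of Python. This is not the analysis we ran: the three "countries" and their responses are invented toy data, and the value 1e6 is just a stand-in for "an arbitrarily large amount". The sketch only shows the procedure of re-scoring responses under different assumed mappings and checking whether the ranking by average moves.

```python
from statistics import mean

# Invented toy responses on a 1 to 10 scale (not real survey data).
responses = {
    "Country A": [7, 8, 8, 9, 10],
    "Country B": [5, 6, 7, 7, 8],
    "Country C": [1, 3, 4, 5, 6],
}

# Candidate mappings from reported score to underlying happiness.
transforms = {
    "linear": lambda s: float(s),
    "squared": lambda s: float(s ** 2),
    "tens_huge": lambda s: 1e6 if s == 10 else float(s),
    "ones_hugely_negative": lambda s: -1e6 if s == 1 else float(s),
}


def ranking(data, transform):
    """Rank countries from highest to lowest average transformed score."""
    averages = {
        country: mean([transform(s) for s in scores])
        for country, scores in data.items()
    }
    return sorted(averages, key=averages.get, reverse=True)


rankings = {name: ranking(responses, f) for name, f in transforms.items()}
baseline = rankings["linear"]

# Names of the transformations under which the ranking differs from linear.
changed = [name for name, order in rankings.items() if order != baseline]

for name, order in rankings.items():
    print(f"{name:>22}: {order}")
print("rankings that differ from linear:", changed)
```

On this toy data the ordering is the same under every mapping, but that is a property of the invented numbers. With real data, one would compare the rankings with a rank correlation rather than an exact match, since a few adjacent countries may swap places.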
0.7. Chapter seven: The problem of the proper role of values in science
There has long been a problem about the proper role of values in science, and this problem has often troubled welfare economists in particular. This is linked to the overarching problem of interpersonal aggregation, because to make any headway in cases where what is good for one person diverges from what is good for another it will be necessary to make a judgment of value. For example we might establish that Susan’s subjective wellbeing is improved more by a policy than Bob’s loss of subjective wellbeing, but a judgment of value will be needed to decide whether that matters more (and even whether a particular form of subjective well being should matter at all). Thus welfare economics seems committed to making such value judgements, but one might worry that this is not the proper place of the scientist- this is our problem.
I suggest that the problem is most profitably divided in two. There is a concern that values may contaminate science, invalidating its empirical method and positive focus, and there is a concern that, in a pluralistic and democratic society, proceeding on the basis of any one set of values may be inappropriately sectarian. In this chapter we will be concerned solely with the problem of sectarianism rather than the problem of contamination, as we regard the contamination problem as a general problem within the specific context of an applied science which is intended to be action guiding for policy and thus not something we can give specific comment on in a thesis narrowly applied to welfare economics.
In the context of this thesis, we conceive of the problem of sectarianism in a way that is more practical than normative, although of course our work has normative implications. Our problem can be stated so: given that welfare economics is often paralysed by a lack of normative agreement, how may it continue in a way that garners broad social assent?
I argue that the best approach in the context of welfare economics for solving the problem of sectarianism is for welfare economics to see itself not as prescribing solutions but instead as providing ethically relevant information to inform decision makers or the democratic public. An approach to welfare economics grounded in the study of subjective-wellbeing is capable of performing this role because it can give us information that almost everyone cares about knowing, even if it will not form the whole basis for their decision making. Everyone or almost everyone wants to know about how policies affect subjective well being. It thus manages to be ethically engaged, without insisting on a specific set of political values, or a specific social welfare function.
Such an approach does not remove us from the necessity of appealing to values in welfare economics, but it gets us out of ethical sectarianism by making the ethical appeals much less controversial. All one must assent to now is that it is worth having information about how different policies will affect subjective-well being before we make a decision. Welfare economics is conceptualized not as making policy recommendations, but as providing information for decision makers and the democratic public. No specific social welfare function needs to be appealed to.
I argue that this kind of move is not so readily available to approaches in welfare economics outside the psychometric approach. Conceptualizing the results of investigations such as cost-benefit analysis -whether weighted or unweighted- as mere information is difficult, because it is unclear what that information is, stripped of a normative recommendation, and what basis there is for caring about that information. Approaches in new welfare economics, e.g. unweighted cost-benefit analysis, have attempted to avoid controversial value judgements through appeal to democracy. I find these attempts fail, largely because unweighted cost-benefit analysis is not, on examination, a democratic method. I give multiple reasons for thinking this, but at base our main contention is that these methods are more oligarchic than democratic.
[Consider cutting- ask Mike or Kieran for advice]
Weighted CBA based approaches might be more promising, but these tend to either turn out to be a special case of the psychometric approach, weighting by estimates of effect on subjective wellbeing, or beg the question “weighted on the basis of what?”. If we weight on the basis of one set of ideas about what is distributively appropriate, we have run aground on ethical sectarianism again.
One argument that our psychometric approach will nonetheless turn out to be ethically sectarian is that there are multiple theories of what the good life is, and no approach to subjective well being captures what all of those theories say matters. Thus even if we aim to just present information on the subjective-wellbeing effects of policy, we have necessarily made an ethically sectarian choice in choosing to focus on certain measures of subjective-wellbeing.
Our reply is fourfold. Firstly, we can estimate the effects of policy on a variety of different kinds of subjective-well being, not just one. Secondly, there is empirical evidence that most people value the kinds of things measured by measures of subjective-well being. Thirdly, all measures of subjective-well being, and probably all popular theories about what the good life is, are intercorrelated, meaning that to provide information on how policy affects one measure of subjective-wellbeing is to provide information on how policy affects all forms of subjective-well being. Fourthly, if desired, it is possible to survey the public on what they would like in measures of subjective well-being, as Alexandrova points out; thus it is possible to gain a kind of democratic assent.
0.8. Chapter eight: The problem of inferring well-being- revealed preferences or subjective wellbeing
How should we measure preference satisfaction? We consider two options: psychometric measures of preference satisfaction based on verbal self-report, or revealed preference measures of preference satisfaction like willingness to pay. The desire satisfaction theory of welfare has been seen as a strong suit of the revealed preferences approach- thus if the psychometric approach proves superior even here, this is a boon for our approach.
Although this debate over the right method to measure welfare is not directly connected to the interpersonal aggregation problem like the other three issues we have considered, it does share a conceptual link: when we are contemplating changes that make things better for everyone, it is often less necessary to know the details of what people want or feel. However when a change may divide the population, knowing exactly what people want and/or how they will feel becomes more important.
I argue that psychometric measures are superior to revealed preferences measures for inferring welfare. Revealed preferences are indeterminate, as there are multiple plausible ways of assigning cardinal utilities that contradict each other. For example, there is no guarantee that the implied elasticity of the marginal utility of income given by risk preferences and the same elasticity given by time preferences will be identical.
Generally, the revealed preferences account is formulated in terms of the representational theory of measurement, where measurement of utility is really a set of rules for assigning numbers to behaviors themselves. However, arguably the metaphysics of the representational theory of measurement make interpersonal comparison especially troublesome. This is because if our measurement is not the inference of an underlying quantity influencing behavior, but rather simply an assignment of numbers to behavior itself, interpersonal comparison will necessarily be a matter of a convention for comparing the behavior of two people. Any such extension of revealed preferences to allow interpersonal comparison using the representational theory of measurement will be, in a sense, arbitrary and not answerable to any deep facts in the world, just the formulator’s preferences. This raises the question of whether the revealed preferences approach can be separated from the representational theory of measurement.
It might also be nice to explore whether, in the empirical literature, revealed preferences fulfillment or degree of subjective wellbeing is a better correlate of objective measures of wellbeing (health, suicide etc.), although this may be out of scope for the thesis.
A final argument- participants seem to care a great deal about the subjective states measured by measures of subjective-wellbeing. How to compare this to how much they care about behavioral measures though? I note that the kind of preferences revealed preferences measures track are most often instrumental in some way- even if only in the trivial sense that people do not care about the objects they purchase in themselves, but rather what they do with them. On the other hand, people care non-instrumentally about many of the things- like happiness and life satisfaction- measured by psychometrics.
[You need to update the summary of the intrinsic desire argument here]
0.9. Chapter nine: Conclusory odds and ends
We briefly consider an objection that has been raised against many attempts to use happiness as a policy benchmark: won’t this lead to wireheading? We argue, among other defenses, that our approach is resistant to this worry, as it encourages the use of multiple indicators to feed into ethical debate, some of which (like eudaimonia) are very unlikely to respond positively to wireheading, if studied correctly.
We consider the argument sometimes made that any standard of evaluation other than unweighted cost-benefit analysis will lead to inefficiencies. For example, it is argued that valuing the happiness of the poor as much as the rich in mid level decision making effectively amounts to an increase in the progressivity of the tax rate.
We contemplate a methodological question regarding welfare economics- should we expect welfare economics to give us a single criterion for assessing policy, one that can be followed in all cases (with maybe a few extraordinary exceptions)? Should we expect any such criterion to exist, whether coming from welfare economics or not, or is agency decisionmaking best done through a series of vague “balancing tests” that draw attention to a number of factors, but do not quantify them against each other? We very briefly defend the latter view.
We urge an empirical character to future research in the philosophy of welfare economics. Experimental philosophy -surveying the folk about their philosophical beliefs- has previously been justified through a positive program (finding out more about what the folk believe) and through a negative program (destabilizing the very notion that there is any such thing as a clear folk position). In the philosophy of welfare economics, we propose a third rationale- a normative rationale, based on the right of the public to expect input into decision making procedures. The folk deserve to have their views on certain philosophical questions known because they deserve to have them taken seriously in making political decisions. Welfare economics is tremendously important to decision making in many contemporary democracies, but its details are often unknown to the public. Welfare economists, in turn, have little idea what the public would make of their often arcane processes were they made aware of them. Concern for democratic legitimacy alone should lead us to want a better sense of how the public feel about the specificities of processes like CBA.
We end with a consideration of various directions for future research. Our argument regarding functionalism and interpersonal comparison has implications for other-minds skepticism that need to be drawn out. We have given some ideas for how the cardinality problem could be attacked, but further theoretical and experimental research is necessary. The process we have described is only given at a high level of abstraction; extra work would be necessary to transform it into a practical manual. Extra work would also be needed to take what we have written primarily with cost-benefit analysis in mind, and turn it into a process for other areas, like optimal tax and the economic analysis of law.