John Wyclif’s Political Philosophy


1. Wyclif's Later Works

Government and the relation of divine justice to human law, both secular and ecclesiastical, figure as occasional themes throughout the treatises of the Summa de Ente. After receiving his doctorate in theology in 1373, his attention began to focus more completely on these topics, and his realism continued to undergird his thought at least through 1381, during which period he wrote the treatises that make up the second of his great Summae, the Summa Theologie. In late 1373, he began De Dominio Divino, which serves as a bridge from the later, formal theological treatises of the Summa de Ente to the political, social, and ecclesiological subject matter of the Summa Theologie. He began royal service during this period, participating in an embassy to Bruges for negotiations with papal envoys in 1374. Wyclif remained in the service of John of Gaunt for the rest of his life; the Duke protected him from the formal prosecution prompted by five bulls of papal condemnation in 1377. After being condemned for his views on the Eucharist at Oxford in 1381, Wyclif withdrew to Lutterworth, where he remained until his death in December 1384. Though still protected by John of Gaunt, he was no longer in active service after 1379.

During these tumultuous years, Wyclif wrote the ten treatises of the Summa Theologie: four on just human government, two on the structure and government of the church, one on scriptural hermeneutics, and three on specific problems afflicting the Church. Our interest lies in De Mandatis Divinis (1375–76), De Statu Innocencie (1376), and De Civili Dominio (1375–76), where he provides the theological foundation for the radical transformation of the church he prescribes in De Ecclesia (1378–79), De Potestate Pape (1379), and De Officio Regis (1379). Towards the end of his life, Wyclif summarized his entire theological vision in Trialogus (1382–83), reiterating the connections between his earlier philosophical works and later political treatises in a three-way dialogue written in language that would appeal to members of the royal court.

2. Dominium in Political Thought Before Wyclif

Dominium and its generally accepted translation, 'lordship', suggest the sovereignty exercised by one individual over another, but Roman law allowed for complexity in distinguishing between property ownership, its primary referent, and jurisdiction, governance, and political power. When twelfth-century canon lawyers resurrected Roman law as the foundation for the ascendant papal monarchy, it was common to distinguish between jurisdictive authority, secular power, and the use and possession of private property.[1] By the beginning of the fourteenth century, dominium largely connoted property ownership, though this usually entailed jurisdictive authority. Most political theorists agreed with Thomas Aquinas in saying that a civil lord who supposed that his jurisdictive authority arose from property ownership rather than from a constitution would be a tyrant (Summa Theologiae IaIIae, Q.56, a.5; Q.58, a.2). Given that the legal use of dominium referred to property ownership and not to the authority to govern, it seems odd that Wyclif used the term to do so much more. The reason may be found in the connection of Augustinian theology to theories of the justice of property ownership. As the papal monarchy developed, its theorists, such as Giles of Rome, found it useful to identify all earthly justice, including just property ownership, with the source of justice in creation.
2.1 Augustine

Augustine's De Civitate Dei was the basis for relating property ownership and secular justice to divine authority. Here the division between two classes of men is clear: some are members of the City of Man, motivated by love of self, while others are motivated by the love of God and a contempt for self, placing them in the City of God.[2] There is really only one true Lord in creation. Mastery of one man over another is the result of Original Sin and is therefore unnatural except in the case of paternity, which is founded on parental love for a child. Among members of the City of God, the relation of prince and subject is not political and does not entail the sort of mastery we see in the City of Man, but rather involves service and sacrifice, as exemplified by the parent/child relationship. Property ownership has been united to mastery in the City of Man because of Original Sin, whereby man turned away from God in the mistaken belief that he could make claims of exclusive ownership on created beings. This is not to say that Augustine thought that all private property relations are wrong; indeed, he is famous for having argued that all things belong to the just (De Civitate Dei 14, ch. 28). But people who own things are not de facto just. Those for whom ownership is not an end in itself but a means by which to do God's will are freed from the bondage of selfishness imposed by the Fall. They easily recognize the truth of the dictum that one should abstain from the possession of private things, or if one cannot do so, then at least from the love of property (Enarratio in Psalmum 132, ch.4).

Augustine's thought on the relation of ownership to political authority is open to interpretation. One can easily read him as arguing that the Church, as the Body of Christ and earthly instantiation of the City of God, can best exemplify loving lord/subject relations through its ecclesiastical structure, thereby justifying a top-down papal monarchy. Likewise, one can read him as having so separated secular political authority from the rule of love as to make political and ecclesiastical jurisdictive authority utterly distinct. Again, one could interpret Augustine's 'all things belong to the just' as meaning that the Church is the arbiter of all property ownership in virtue of being the Body of Christ and seat of all created justice, or one could argue that the Church should abandon all claims to property ownership, just as the Apostles abstained from the possession of private property. This ambiguity in interpretation was the source of some of the competing theories that influenced Wyclif's position.

2.2 Giles of Rome

During the conflict between Philip IV of France and Pope Boniface VIII in 1301, Giles of Rome wrote De Ecclesiastica Potestate, establishing the absolute secular superiority of the papacy. Giles' master Boniface VIII was responsible for the two famous Bulls, Clericis laicos (1296), which forbade the clergy to surrender church property to secular powers without papal approval, and Unam sanctam (1302), which declared that secular power is in the service of, and subject to, papal authority. De Ecclesiastica Potestate is an articulation of the concept of power underlying these two Bulls and arising from one of the two interpretations of Augustine described above. In it, Giles describes all power, spiritual and secular, as rooted in the papacy, likening its structure to a papal river from which smaller, secular streams branch out.
The source of this river, he continues, is the sea, which is God: "God is a kind of font and a kind of sea of force and power, from which sea all forces and all powers are derived like streams."[3] Not only is secular power reliant on papal authority; all property ownership, insofar as it is just, is similarly dependent on an ecclesiastical foundation. The key element in just secular power and property ownership, he continues, is grace: without God's will directly moving in creation through the sacraments of the Church, power and ownership are empty claims, devoid of justice. Although Giles did not explicitly call the combination of ownership and temporal power dominium, his uniting the two in a consistent, Augustinian fashion was sufficient for the next generation of Augustinian theorists.

2.3 The Franciscans and Their Opponents

Thirty years earlier, in Bonaventure's Apologia pauperum of 1269, the Franciscans had defined any property ownership, communal or individual, as inimical to the ideals of their Order. The Fall from paradise and the introduction of selfishness to human nature makes property ownership of any type, private or communal, an aberration. For the Franciscans, "all things belong to the just" only in the sense that "belonging" entails non-exclusive sharing (usus pauper), not ownership. Within three decades, the Franciscans were divided on this issue: one party, the Spirituals, demanded that the friars adopt usus pauper as their ideal of spiritual perfection, while the other, the Conventuals, argued for a more lenient interpretation of the Rule. The Spirituals, under the guidance of the philosopher Peter John Olivi and his follower Ubertino de Casale, outnumbered the Conventuals by century's end, and had become sufficiently vocal to attract the attention of the pope.[4] John XXII was deeply suspicious of the Spiritual Franciscans' arguments, perhaps fearing a reappearance of the communitarian Waldensian heresy. Private ownership, John argued, was not the result of Original Sin, but a gift from God that Adam enjoyed in Paradise and which the blessed still can enjoy, secure in the knowledge that their ownership is sanctioned by God's dominium. This argument was to have notable consequences. John's eventual controversy with the Spirituals' champion, William Ockham, led to the first important use of the concept of natural right. But for our analysis, the important thing is that iurisdictio and proprietas were united in the concept of dominium. Wyclif would make use of the Franciscans' arguments for apostolic poverty, as well as of John XXII's idea that divine dominium provides the basis for all human dominium, though in a way that would certainly have displeased both parties.[5]

By the 1350s, opponents of the Franciscans had broadened their range of criticism to question the legitimacy of the Order itself. Richard Fitzralph (d. 1360) wrote De Pauperie Salvatoris, a sustained examination of the Franciscans' claim to function without supervision by a diocesan bishop, in which he argues that if the friars rely on the justice of the owners of what they use, they are bound by the same laws that bind the owners. Thus, if the owners of what the friars use are ecclesiastical, it follows that the friars must obey ecclesiastical authority.[6] Fitzralph's position is important here because it argues that grace alone is the justification for any instance of dominium in creation, and that all just dominium ultimately relies on God's dominium. Both serve as cornerstones of Wyclif's position.
God's dominium is a natural consequence of the act of creating, and with it comes divine governance and conservation of created being. The rational beings in creation, angels and human beings, enjoy the loan of elements of God's created universe, but this is not a divine abdication of ultimate authority since everything is still directly subject to divine dominium. When the nature of the dominium lent to Adam changed with the Fall, the love defining our natural dominium was affected, but not eradicated. Men devised political dominium to regulate property relations, and although sin keeps them from recognizing the borrowed nature of any dominium, it does not preclude there being grace-justified property ownership. In some cases, God infuses the artificial property-relations that we call dominium with sufficient grace to make them generally equivalent to prelapsarian dominium. These grace-favored cases of human dominium do not replicate the authority of God's dominium, but can exhibit the love that characterizes it. Fitzralph's expression of the Augustinian papal position makes grace the deciding factor in ownership relations and ultimately in political authority, both of which had become nested in the term dominium. Wyclif's interpretation of the Augustinian position would stretch past arguments about papal authority and the friars, even past arguments between popes and kings, to touch the very nature of the church as Christ's earthly body. All of this begins, he would argue, with an understanding of God's dominium as the causal exemplar of created lordship.

3. Divine Dominium: Creating, Lending, and Grace

The relation of universal to particular defines Wyclif's conception of how God's dominium causes all instances of dominium in creation. Divine dominium is "the standard prior to and presupposition of all other dominium; if a creature has dominium over anything, God already has dominium over it, so any created dominium follows upon divine dominium" (De Dominio Divino I, ch. 3, p.16.18–22). This relation exceeds mere exemplarity, where human dominium only imitates God's dominium without divine causal determination. God's dominium has causal efficacy over all instances of human mastery such that no true created dominium is possible without direct participation in and constant reliance upon God's dominium. The instrument through which divine dominium moves is grace, which instills in human rulers an essential love defining their every ruling action. Thus, every case of just human dominium entails a constant reliance upon grace as the hallmark of its being an instantiation of God's universal dominium. God's dominium has six aspects, three identifiable with lordship's ruling element (creation, sustenance, and governance), and three that define lordship's proprietary nature (giving, receiving, and lending) (De Dominio Divino III, ch. 1, p.198.9).[7] The necessary precondition for an act of dominium is creation, of which no created being is capable. This makes God's dominium the only true instance of dominium and the source of all created instances of dominium. Because the Divine Ideas and their created correlates, the universals, are ontologically prior to particular created beings, God's dominium over universals is prior to His dominium over particulars. This means that God creates, sustains, and governs the human species prior to ruling over — and knowing — individual people. This led to questions about determinism that served as a starting point for many refutations of Wyclif's theology.
The second set of acts that define dominium — giving, receiving, and lending — provides the foundation for Wyclif's argument that all created dominium necessarily requires grace. God's giving of the divine essence in creating is the truest form of giving because God is giving of Himself through Himself, which no created being can do. Nor can any created being receive as God receives; God truly receives only from Himself through His giving. God gives up nothing in His giving, and acquires nothing in His receiving; creation is God's self-expression, an act in which the divine essence is neither decreased nor increased. The crucial act from the created standpoint is God's lending, for here there is real interaction between Lord and subjects. What human beings as conscious participants in God's lending relation can claim as their own is lent to them by divine authority, which they enjoy through grace. It is easy to confuse giving with lending because a lord who has only been "lent" a gift of God for use during his lifetime appears to have been "given" that gift. For us, most giving is translative in that it involves the giver's surrender of every connection to the gift, making it natural for us to suppose that God renounces His authority over what He gives us. In fact, God's giving is communicative, not translative: it does not involve surrender of the gift. Because all that God gives to creation will ultimately return to Him, it makes more sense to speak of God's giving as lending.

With any instance of lending, Wyclif explains, the lender seeks assurance that the borrower truly deserves what is to be lent. Human desert of the dominium they are lent is a matter of some complexity involving examination of the theological concept of grace. When a temporal lord lends to his subject according to the subject's worthiness, the subject's merit is commensurable with the lord's, and the mutual agreement defining the loan can be made according to the respective merit of each party. The merit that allows the subject desert of consideration for the loan is "condigna", i.e., grounded in the dignitas shared by lender and subject. Condign merit implies that the meritorious truly deserve the reward, requiring the giver to give it to the merited as something due, as when an Olympic athlete earns a gold medal by besting all her opponents. Such a loan is impossible between Creator and creature, because there is no way of placing a creature's merit on the same scale as God's perfect nature; all the creature has, including its worth, is from God, whereas God's perfection is per se. There is no way in which a creature can be considered to deserve anything from God in such a relation. Congruent merit obtains when the meritorious does not have the power to require anything of the giver. In instances of congruent merit, the goodness of the act does not require the giver to reward the agent, though it does provide sufficient cause for the reward to be given, as when one receives an Academy Award: although many of the nominees may deserve an Oscar, the winner receives it because something about her performance is somehow pleasing to the Academy. Still, Wyclif holds that "It is the invariable law of God that nobody is awarded blessedness unless they first deserve it" (De Dominio Divino III, ch. 4, p.229.18). We can move our wills to the good, and from this, Wyclif says, grace may — but need not — follow.
Thus, we merit congruently thanks to God's generosity towards a will in accord with His own. In effect, God lends merit. Wyclif's theology of grace is the key to understanding how his theory of human dominium relates to divine dominium, its causal paradigm. Man's lordship is at once ownership and jurisdictive mastery, but when a human lord governs, or gives, or receives, or lends, these acts are only just insofar as the lord recognizes that his authority is that of a steward: "Any rational creature is only improperly called a lord, and is rather a minister or steward of the supreme Lord, and whatever he has to distribute, he has purely by grace" (De Dominio Divino III, ch. 6, p.250.25–29). The essential characteristic of every instance of human dominium is the grace God lends to the individual lord, which itself is grounded in the grace of the Holy Spirit. The human lord appears to have proprietary and jurisdictive authority by virtue of his own excellence, but this authority is really only an instantiation of divine dominium; the lord is a grace-realized agent of God's lordship. This makes the human lord both master and servant; from the divine perspective, the lord is God's servant, but from the viewpoint of the subject, he is master. Wyclif is tireless in his emphasis on the illusory nature of this mastery; grace allows the human lord to recognize that he is, in fact, the servant of his subjects, ministering to them as a nurturing steward, not lording over them as would a powerful sovereign.

3.1 Natural Dominium

De Civili Dominio begins with the motto, "Civil justice presupposes divine justice; civil dominium presupposes natural dominium." Man's dominium is threefold — natural, civil, and evangelical — but comprehensible as an instantiation of the justice of God's dominium. As he moved into his general analysis of human dominium, Wyclif's thoughts turned to the most fundamental instance of God's loving governance, the Scriptural commandments. The foundation of all that is right (ius) in creation, he explains, is divine justice (iustitia), so we cannot begin to understand right and wrong in creation without understanding God's uncreated right. This was a significant departure from the Aristotelian position that unaided human reason is capable of justice, and Wyclif explicitly rejects any conception of justice that does not rely on uncreated right.[8] The laws of Scripture are the purest expression of uncreated right available to human eyes, he explains, and are most clearly expressed in the Ten Commandments of Exodus 20, and again in the two greatest commandments of Matthew 22:37–40. Wyclif's analysis of Christ's law of love and of the Ten Commandments proceeds directly from his disquisition on the relation of earthly justice to eternal right in De Mandatis Divinis. That Wyclif uses the same title Robert Grosseteste had used in his analysis of the decalogue is no accident; Wyclif's debt to Grosseteste's conceptions of sin, love of God, idolatry, and the substance of true faith is obvious throughout the treatise. In De Statu Innocencie, the innocence into which we were created before the Fall, he says, is the optimal condition for any rational being. In our prelapsarian state, our wills would have been in perfect concord with the divine will, so that all human action would be just, effortlessly aligned with the natural order of creation. In this condition, there would be no need for civil or criminal law, since we would understand what is right naturally.
This denial of the need for human law is of special import, for Wyclif later argues that the evangelical lord, or priest, as heir of Christ's restoration of the possibility of natural dominium, should never be concerned with such matters. In such a state, private property ownership was unknown. The natural dominium described in Genesis 1:26 is characterized by lack of selfishness, ownership, or any distinction between 'mine' and 'thine'. The true sense of Augustine's "All things belong to the just" is most fully apparent in the prelapsarian natural disposition to share in the use of creation while acting as faithful steward to its perfect lord. The Fall was brought about by the first sin, which Wyclif characterizes as a privation of God's right in man's soul. We are left with wills prone to value the physical, material world above spiritual concerns, and the unavoidable result is private property ownership. We no longer understand a given created good as a gift on loan from God, but can only see it in terms of our own self-interest, and the unfortunate result is civil dominium, an enslavement to material goods.

4. Types of Human Dominium

Wyclif's definition of civil dominium as "proprietary lordship in a viator over the goods of fortune fully according to human law" is centered not on legislative authority, but on the private property ownership enjoyed by the viator, or wayfarer, along life's path (De Civili Dominio III ch. 11, p.178.9–17).[9] This is because all civil dominium is based on the use of goods owned, which is the basis for all postlapsarian conceptions of justice (recall that for Wyclif, only God truly owns created things because creating a thing is necessary for owning it; hence, human beings are only lent created things and can use them justly, or unjustly if they appropriate them for themselves). Before the Fall, our use of created goods was communal, unencumbered by the complexity that follows upon selfishness. But now, Wyclif explains, there are three types of use: that directly consequent upon civil ownership, civil use without ownership, and evangelical use. The first two are natural results of the Fall, and the third is the result of Christ's Incarnation. Before the Incarnation, civil ownership and civil use were grounded in man-made laws designed primarily to regulate property ownership. These legal systems tended to have two general structures: they were either monarchies, as in most cases, or aristocratic polities. The harmony of the aristocratic polity is certainly preferable because it most resembles the state enjoyed before the Fall; the benevolent aristocracy, as evidenced in the time of the Biblical judges, would foster the contemplative life, communalism, and an absence of corruptible governmental apparatus. The most common species of civil dominium is monarchy, in which a chief executive power holds ultimate legislative authority. This centralized authority in one man is necessary to implement order; there is no real possibility that the many are capable of ruling on behalf of the many, given the prevalence of sin. The point of civil dominium is not, as with Aristotle, the sustenance of individual virtuous activity. Civil dominium is a phenomenon based on Original Sin, and is therefore unlikely to produce justice per se. If the government of Caesar is occasionally just, it is because it has accidentally realized divine justice.
But if civil dominium that is not grounded directly in divine dominium is incapable of sustained just governance, and if natural dominium is the instantiation of divine dominium for which man was created, how can any talk of just civil dominium be possible? To return to the opening dictum of De Civili Dominio, if natural dominium is free from private property ownership, how can civil dominium rely upon it in any way? Before resolving this problem, we will need to address evangelical dominium as yet another factor in Wyclif's conception of man's postlapsarian state.

4.1 Evangelical Dominium

Christ restores the possibility of gaining our lost natural dominium both through His apostolic poverty and His redemptive sacrifice as described in Holy Scripture. Because of Christ's sinless nature, He was the first man since Adam capable of exhibiting the purity of natural dominium. This Christ shared with His disciples, who were able to renounce all exclusive claims to created goods in a recreation of the communal caritas lost in the Fall (De Civili Dominio III, 4, p. 51.17–24). This poverty is not simply the state of not owning things; one can live sinfully as easily in squalor as one can in luxury. The apostolic poverty of the early Church is a spiritual state, not an economic rejection of civil dominium. The similarity between Wyclif's conception of spiritual poverty as the ideal state for Christians and the Franciscan ideal is noteworthy. Wyclif seems to make a case similar to the Spiritual Franciscans: Christ's life was exemplary for all Christians and Christ lived in apostolic poverty; therefore, all Christians ought follow His example, or at the least have that option open to them. Wyclif's consonance with the Franciscan tradition is also suggested in his use of Bonaventure's definition of apostolic poverty in the third book of De Civili Dominio, but Wyclif's motives are distinctly different from the Friars' (De Civili Dominio III, 8, pp. 119–120). While the Franciscans argued that their rule allowed them to regain the ownership-free purity enjoyed by the early Apostolic church, Wyclif contended that Christ's redemptive sacrifice enabled all Christians to regain natural dominium itself, not just its purity. This suggested that the Franciscan life was a pale imitation of true Christianity, an implication Wyclif's Franciscan colleagues were quick to point out. One of the first critics of Wyclif's dominium thought was William Woodford, O.F.M., who argued that Wyclif had gone too far in equating apostolic, spiritual poverty with prelapsarian purity. The extensive third book of De Civili Dominio is Wyclif's response to Franciscan critics like Woodford, and in it lie the seeds of the antifraternalism that would characterize his later writings. Wyclif describes apostolic poverty as a mode of having with love, comprehensible in terms of the individual's use of a thing for the greatest spiritual benefit. God alone can bring about the love instantiating divine dominium, making grace necessary for apostolic poverty. Because the church is founded not on the materially-based laws of man, but on the spiritually-grounded lex Christi, it must be absolutely free of property ownership, the better to realize the spiritual purity required by apostolic poverty. Any material riches that the church comes upon as "goods of fortune" must be distributed as alms for the poor, following the practice of Christ and the disciples, and the apostolic church.
This is the ideal to which the Church must aspire through the example of Christ, and some of the harshest invective in Wyclif's prose is directed against the Church's refusal to return to this apostolic state. The turning point in Church history was the Donation of Constantine, on the basis of which the Church claimed to have the civil dominium of a Caesar. Wyclif was vigorous in his condemnation of the Donation, and would likely have been pleased had he lived into the early fifteenth century, when Nicholas of Cusa argued persuasively that the document was a forgery.

4.2 Civil Dominium

Given the deleterious influence civil dominium has had on the evangelical dominium of Christ's law, it is difficult to imagine how Wyclif would set aside some civil lords as capable of instantiating divine justice. But apostolic poverty is not identical with an absence of property ownership; it is having with love. While the clergy as spiritual lords ought to follow Christ's example of material poverty, it does not follow that all ownership precludes love. God can certainly bestow grace on those whom He wills to be stewards of created goods. Wyclif envisions the just civil lord or king as the means by which the Church is relieved of its accumulated burden of property ownership. So long as the Church exists in postlapsarian society, it must be protected from thieves, heretics, and infidels. Certainly no evangelical lord ought to be concerned with such matters, given his higher responsibility for the welfare of Christian souls. As a result, the Church needs a guardian to ward off enemies while caring for its own well-being and administering alms to the poor. This allows Wyclif to describe just, grace-favored civil dominium as different in kind from the civil lordship predicated on materialistic human concerns: "It is right for God to have two vicars in His church, namely a king in temporal affairs, and a priest in spiritual. The king should strongly check rebellion, as did God in the Old Testament, while priests ought minister the precepts mildly, as did Christ, who was at once priest and king." When he raises conventional topics in political thought, like the particulars of just rule, the responsibilities of royal councillors to their king, the nature of just war, and royal jurisdiction in commerce, his advice is priestly: "[A] lord ought not treat his subjects in a way other than he would rationally wish to be treated in similar circumstances; the Christian lord should not desire subjects for love of dominating, but for the correction and spiritual improvement of his subjects, and so to the efficacy of the church" (De Officio Regis ch. 1, p. 13.4–8). The king ought provide few and just laws wisely and accurately administered, and live subject to these laws, since just law is more necessary for the community than the king. Also, the king should strive to protect the lower classes' claims on temporal goods in the interests of social order, for "nothing is more destructive in a kingdom in its political life than immoderately to deprive the lower classes of the goods of fortune" (De Officio Regis ch. 5, p. 96.9–27).[10] On occasion he discusses the king's need of reliable councillors, generally when discussing the king's need for sacerdotal advice in directing church reform, but he never mentions Parliament as a significant aspect of civil rule.
The most immediate concern of a civil lord living in an age when the Church is being poisoned by avarice should be the radical divestment of all ecclesiastical ownership. Wyclif is tireless in arguing for the king's right to take all land and goods, and indeed, even the buildings themselves, away from the Church. Should the clergy protest against royal divestment, threatening the king with excommunication or interdict, the king should proceed as a physician applies his lancet to an infected boil. No grace-favored civil lord will be disposed to save up the divested goods of the Church for his own enrichment, despite the obvious temptation. He will distribute the Church's ill-gotten lands and goods to the people. This, Wyclif explains, will be his continued responsibility even after the Church has been purged, for he is the Church's custodian as well as its protector.

The hereditary succession by which civil lordship passes from father to son is a problem for Wyclif. People cannot inherit the grace needed to ensure just ownership and jurisdiction. Primogeniture imperils grace-founded civil lordship, making lords prone to rule on behalf of their own familial interests rather than in the interests of their subjects. The only means by which Wyclif can envision hereditary succession operating is through spiritual filiation, in which a civil lord instructs a worthy successor. He suggests adoption as the basis for the spiritual primogeniture by which lordship is passed on, which would be preferable to general election, for Wyclif is clear about the impossibility of widespread recognition of grace in a potential civil lord: "It does not follow, if all the people want Peter to be their civil lord, that therefore it is just" (De Civili Dominio I, 18, p. 130.6). Central to his ecclesiology is the impossibility of determining the presence of grace in another's soul, which militates against identifying members of the elect with certainty, and therefore against excommunicating any of them from the Church, as well as ruling out popular election as a means of instituting just civil dominium. Grants in perpetuity, commonly employed by civil lords to guarantee the ongoing obligation of subjects in return for a gift of land or political authority, are as impossible as hereditary succession. A lord might reward someone with a grant while acting as God's steward, but he certainly cannot thereby make his subject's progeny deserve the gift.


Voting Methods


1. The Problem: Who Should be Elected?

Suppose that there is a group of 21 voters who need to make a decision about which of four candidates should be elected. Let the names of the candidates be \(A\), \(B\), \(C\) and \(D\). Your job, as a social planner, is to determine which of these 4 candidates should win the election given the opinions of all the voters. The first step is to elicit the voters' opinions about the candidates. Suppose that you ask each voter to rank the 4 candidates from best to worst (not allowing ties). The following table summarizes the voters' rankings of the candidates in this hypothetical election scenario.

Number of voters   Ranking
3   \(A\ B\ C\ D\)
5   \(A\ C\ B\ D\)
7   \(B\ D\ C\ A\)
6   \(C\ B\ D\ A\)

Read the table as follows: Each row represents a ranking for a group of voters in which candidates to the left are ranked higher. The numbers in the first column indicate the number of voters with that particular ranking. So, for example, the third row in the table indicates that 7 voters have the ranking \(B\ D\ C\ A\), which means that each of the 7 voters rank \(B\) first, \(D\) second, \(C\) third and \(A\) last.

Suppose that, as the social planner, you do not have any personal interest in the outcome of this election. Given the voters' expressed opinions, which candidate should win the election? Since the voters disagree about the ranking of the candidates, there is no obvious candidate that best represents the group's opinion. If there are only two candidates to choose from, there is a very straightforward answer: The winner should be the candidate or alternative that is supported by more than 50 percent of the voters (cf. the discussion below about May's Theorem in Section 4.2). However, if there are more than two candidates, as in the above example, the statement "the candidate that is supported by more than 50 percent of the voters" can be interpreted in different ways, leading to different ideas about who should win the election.

One candidate who, at first sight, seems to be a good choice to win the election is \(A\). Candidate \(A\) is ranked first by more voters than any other candidate (\(A\) is ranked first by 8 voters; \(B\) is ranked first by 7; \(C\) is ranked first by 6; and \(D\) is not ranked first by any of the voters). Of course, 13 people rank \(A\) last. So, while more voters rank \(A\) first than any other candidate, more than half of the voters rank \(A\) last. This suggests that \(A\) should not be elected. None of the voters rank \(D\) first. This fact alone does not rule out \(D\) as a possible winner of the election. However, note that every voter ranks candidate \(B\) above candidate \(D\). While this does not mean that \(B\) should necessarily win the election, it does suggest that \(D\) should not win the election. The choice, then, boils down to \(B\) and \(C\). It turns out that there are good arguments for each of \(B\) and \(C\) to be elected. The debate about which of \(B\) or \(C\) should be elected started in the 18th century as an argument between the two founding fathers of voting theory, Jean-Charles de Borda (1733–1799) and M.J.A.N. de Caritat, Marquis de Condorcet (1743–1794). For a history of voting theory as an academic discipline, including Condorcet's and Borda's writings, see McLean and Urken (1995). I sketch the intuitive arguments for the election of \(B\) and \(C\) below.

Candidate \(C\) should win.
Initially, this might seem like an odd choice since both \(A\) and \(B\) receive more first place votes than \(C\) (only 6 voters rank \(C\) first while 8 voters rank \(A\) first and 7 voters rank \(B\) first). However, note how the population would vote in the various two-way elections comparing \(C\) with each of the other candidates:

\(C\) vs. \(A\): 13 voters rank \(C\) above \(A\) and 8 rank \(A\) above \(C\), so \(C\) beats \(A\) 13–8.
\(C\) vs. \(B\): 11 voters rank \(C\) above \(B\) and 10 rank \(B\) above \(C\), so \(C\) beats \(B\) 11–10.
\(C\) vs. \(D\): 14 voters rank \(C\) above \(D\) and 7 rank \(D\) above \(C\), so \(C\) beats \(D\) 14–7.

Condorcet's idea is that \(C\) should be declared the winner since she beats every other candidate in a one-on-one election. A candidate with this property is called a Condorcet winner. We can similarly define a Condorcet loser. In fact, in the above example, candidate \(A\) is the Condorcet loser since she loses to every other candidate in a one-on-one election.

Candidate \(B\) should win.

Consider \(B\)'s performance in the one-on-one elections. Candidate \(B\) performs the same as \(C\) in a head-to-head election with \(A\), loses to \(C\) by only one vote and beats \(D\) in a landslide (everyone prefers \(B\) over \(D\)). Borda suggests that we should take into account all of these facts when determining which candidate best represents the overall group opinion. To do this, Borda assigns a score to each candidate that reflects how much support he or she has among the electorate. Then, the candidate with the largest score is declared the winner. One way to calculate the score for each candidate is as follows (I will give an alternative method, which is easier to use, in the next section): in each one-on-one election, a candidate receives one point for every voter that ranks her above her opponent, and a candidate's score is the total number of points she receives across all of her one-on-one elections. In this election scenario, the scores are \(B\): \(13 + 10 + 21 = 44\); \(C\): \(13 + 11 + 14 = 38\); \(A\): \(8 + 8 + 8 = 24\); and \(D\): \(13 + 0 + 7 = 20\). The candidate with the highest score (in this case, \(B\)) is the one who should be elected.

Both Condorcet and Borda suggest comparing candidates in one-on-one elections in order to determine the winner. While Condorcet tallies how many of the head-to-head races each candidate wins, Borda suggests that one should look at the margin of victory or loss. The debate about whether to elect the Condorcet winner or the Borda winner is not settled. Proponents of electing the Condorcet winner include Mathias Risse (2001, 2004, 2005) and Steven Brams (2008); proponents of electing the Borda winner include Donald Saari (2003, 2006) and Michael Dummett (1984). See Section 3.1.1 for further issues comparing the Condorcet and Borda winners. The take-away message from this discussion is that in many election scenarios with more than two candidates, there may not always be one obvious candidate that best reflects the overall group opinion. The remainder of this entry will discuss different methods, or procedures, that can be used to determine the winner(s) given a group of voters' opinions. Each of these methods is intended to be an answer to the following question: Given a group of people faced with some decision, how should a central authority combine the individual opinions so as to best reflect the "overall group opinion"? A complete analysis of this question would incorporate a number of different issues ranging from central topics in political philosophy about the nature of democracy and the "will of the people" to the psychology of decision making. In this article, I focus on one aspect of this question: the formal analysis of algorithms that aggregate the opinions of a group of voters (i.e., voting methods). Consult, for example, Riker 1982, Mackie 2003, and Christiano 2008 for a more comprehensive analysis of the above question, incorporating many of the issues raised in this article.

1.1 Notation

In this article, I will keep the formal details to a minimum; however, it is useful at this point to settle on some terminology. Let \(V\) and \(X\) be finite sets.
The elements of \(V\) are called voters and I will use lowercase letters \(i, j, k, \ldots\) or integers \(1, 2, 3, \ldots\) to denote them. The elements of \(X\) are called candidates, or alternatives, and I will use uppercase letters \(A, B, C, \ldots\) to denote them. Different voting methods require different types of information from the voters as input. The input requested from the voters are called ballots. One standard example of a ballot is a ranking of the set of candidates. Formally, a ranking of \(X\) is a relation \(P\) on \(X\), where \(Y\mathrel{P} Z\) means that "\(Y\) is ranked above \(Z\)," satisfying three constraints: (1) \(P\) is complete: any two distinct candidates are ranked (for all candidates \(Y\) and \(Z\), if \(Y\ne Z\), then either \(Y\mathrel{P} Z\) or \(Z\mathrel{P} Y\)); (2) \(P\) is transitive: if a candidate \(Y\) is ranked above a candidate \(W\) and \(W\) is ranked above a candidate \(Z\), then \(Y\) is ranked above \(Z\) (for all candidates \(Y, Z\), and \(W\), if \(Y\mathrel{P} W\) and \(W\mathrel{P} Z\), then \(Y\mathrel{P} Z\)); and (3) \(P\) is irreflexive: no candidate is ranked above itself (there is no candidate \(Y\) such that \(Y\mathrel{P} Y\)). For example, suppose that there are three candidates \(X =\{A, B, C\}\). Then, the six possible rankings of \(X\) are: \(A\ B\ C\), \(A\ C\ B\), \(B\ A\ C\), \(B\ C\ A\), \(C\ A\ B\), and \(C\ B\ A\).

I can now be more precise about the definition of a Condorcet winner (loser). Given a ranking from each voter, the majority relation orders the candidates in terms of how they perform in one-on-one elections. More precisely, for candidates \(Y\) and \(Z\), write \(Y \mathrel{>_M} Z\), provided that more voters rank candidate \(Y\) above candidate \(Z\) than the other way around. So, for the election scenario from Section 1, we have \(C \mathrel{>_M} B \mathrel{>_M} D \mathrel{>_M} A\). A candidate \(Y\) is called the Condorcet winner in an election scenario if \(Y\) is the maximum of the majority ordering \(>_M\) for that election scenario (that is, \(Y\) is the Condorcet winner if \(Y\mathrel{>_M} Z\) for all other candidates \(Z\)). The Condorcet loser is the candidate that is the minimum of the majority ordering.

Rankings are one type of ballot. In this article, we will see examples of other types of ballots, such as selecting a single candidate, selecting a subset of candidates or assigning grades to candidates. Given a set of ballots \(\mathcal{B}\), a profile for a set of voters specifies the ballot selected by each voter. Formally, a profile for a set of voters \(V=\{1,\ldots, n\}\) and a set of ballots \(\mathcal{B}\) is a sequence \(\mathbf{b}=(b_1,\ldots, b_n)\), where for each voter \(i\), \(b_i\) is the ballot from \(\mathcal{B}\) submitted by voter \(i\). A voting method is a function that assigns to each possible profile a group decision. The group decision may be a single candidate (the winning candidate), a set of candidates (when ties are allowed), or an ordering of the candidates (possibly allowing ties). Note that since a profile identifies the voter associated with each ballot, a voting method may take this information into account. This means that voting methods can be designed that select a winner (or winners) based only on the ballots of some subset of voters while ignoring all the other voters' ballots. An extreme example of this is the so-called Arrovian dictatorship for voter \(d\) that assigns to each profile the candidate ranked first by \(d\).
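To make these definitions concrete, here is a minimal Python sketch (illustrative code, not part of the original entry; the function names are my own) that computes the majority relation and finds the Condorcet winner and loser for the election scenario from Section 1. It represents the scenario as a mapping from each ranking to the number of voters who submitted it, anticipating the anonymized profiles discussed below.

```python
# The Section 1 scenario: each ranking (best to worst) maps to its voter count.
profile = {
    ("A", "B", "C", "D"): 3,
    ("A", "C", "B", "D"): 5,
    ("B", "D", "C", "A"): 7,
    ("C", "B", "D", "A"): 6,
}
candidates = {"A", "B", "C", "D"}

def majority_prefers(profile, y, z):
    """True if more voters rank y above z than z above y (i.e., y >_M z)."""
    y_over_z = sum(n for r, n in profile.items() if r.index(y) < r.index(z))
    z_over_y = sum(n for r, n in profile.items() if r.index(z) < r.index(y))
    return y_over_z > z_over_y

def condorcet_winner(profile, candidates):
    """Return the candidate that beats every other candidate one-on-one, or None."""
    for y in candidates:
        if all(majority_prefers(profile, y, z) for z in candidates if z != y):
            return y
    return None

def condorcet_loser(profile, candidates):
    """Return the candidate that loses to every other candidate one-on-one, or None."""
    for y in candidates:
        if all(majority_prefers(profile, z, y) for z in candidates if z != y):
            return y
    return None

print(condorcet_winner(profile, candidates))  # C
print(condorcet_loser(profile, candidates))   # A
```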
A natural way to rule out voting methods that privilege a particular voter in this way is to require that a voting method is anonymous: the group decision should depend only on the number of voters that chose each ballot. This means that if two profiles are permutations of each other, then a voting method that is anonymous must assign the same group decision to both profiles. When studying voting methods that are anonymous, it is convenient to assume the inputs are anonymized profiles. An anonymized profile for a set of ballots \(\mathcal{B}\) is a function from \(\mathcal{B}\) to the set of natural numbers \(\mathbb{N}\). The election scenario discussed in the previous section is an example of an anonymized profile (assuming that each ranking not displayed in the table is assigned the number 0). In the remainder of this article (unless otherwise specified), I will restrict attention to anonymized profiles.

I conclude this section with a few comments on the relationship between the ballots in a profile and the voters' opinions about the candidates. Two issues are important to keep in mind. First, the ballots used by a voting method are intended to reflect some aspect of the voters' opinions about the candidates. Voters may choose a ballot that best expresses their personal preference about the set of candidates or their judgements about the relative strengths of the candidates. A common assumption in the voting theory literature is that a ranking of the set of candidates expresses a voter's ordinal preference ordering over the set of candidates (see the entry on preferences, Hansson and Grüne-Yanoff 2009, for an extended discussion of issues surrounding the formal modeling of preferences). Other types of ballots represent information that cannot be inferred directly from a voter's ordinal preference ordering, for example, by describing the intensity of a preference for a particular candidate (see Section 2.3). Second, it is important to be precise about the type of considerations voters take into account when selecting a ballot. One approach is to assume that voters choose sincerely by selecting the ballot that best reflects their opinion about the different candidates. A second approach assumes that the voters choose strategically. In this case, a voter selects a ballot that she expects to lead to her most desired outcome given the information she has about how the other members of the group will vote. Strategic voting is an important topic in voting theory and social choice theory (see Taylor 2005 and Section 3.3 of List 2013 for a discussion and pointers to the literature), but in this article, unless otherwise stated, I assume that voters choose sincerely (cf. Section 4.1).

2. Examples of Voting Methods

A quick survey of elections held in different democratic societies throughout the world reveals a wide variety of voting methods. In this section, I discuss some of the key methods that have been analyzed in the voting theory literature. These methods may be of interest because they are widely used (e.g., Plurality Rule or Plurality Rule with Runoff) or because they are of theoretical interest (e.g., Dodgson's method). I start with the most widely used method:

Plurality Rule: Each voter selects one candidate (or none if voters can abstain), and the candidate(s) with the most votes win.

Plurality rule (also called First Past the Post) is a very simple method that is widely used despite its many problems. The most pervasive problem is the fact that plurality rule can elect a Condorcet loser.
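As a quick illustration (again a hypothetical sketch, reusing the profile representation from the snippet above and assuming that each sincere voter votes for her top-ranked candidate), plurality rule elects the Condorcet loser \(A\) in the scenario from Section 1:

```python
from collections import Counter

profile = {("A", "B", "C", "D"): 3, ("A", "C", "B", "D"): 5,
           ("B", "D", "C", "A"): 7, ("C", "B", "D", "A"): 6}

def plurality_winners(profile):
    """Tally first-place votes; the candidate(s) with the most votes win."""
    tally = Counter()
    for ranking, n in profile.items():
        tally[ranking[0]] += n  # a sincere voter votes for her top choice
    top = max(tally.values())
    return {c for c, v in tally.items() if v == top}

print(plurality_winners(profile))  # {'A'}: 8 votes, vs. 7 for B and 6 for C
```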
Borda (1784) observed this phenomenon in the 18th century; the election scenario from Section 1 provides an example. Candidate \(A\) is the Condorcet loser (both \(B\) and \(C\) beat candidate \(A\), 13–8); however, \(A\) is the plurality rule winner (assuming the voters vote for the candidate that they rank first). In fact, the plurality ranking (\(A\) is first with 8 votes, \(B\) is second with 7 votes and \(C\) is third with 6 votes) reverses the majority ordering \(C\mathrel{>_M} B\mathrel{>_M} A\). See Laslier 2012 for further criticisms of Plurality Rule and comparisons with other voting methods discussed in this article. One response to the above phenomenon is to require that candidates pass a certain threshold to be declared the winner.

Quota Rule: Suppose that \(q\), called the quota, is any number between 0 and 1. Each voter selects one candidate (or none if voters can abstain), and the winners are the candidates that receive at least \(q \times \# V\) votes, where \(\# V\) is the number of voters.

Majority Rule is a quota rule with \(q=0.5\) (a candidate is the strict or absolute majority winner if that candidate receives strictly more than \(0.5 \times \# V\) votes). Unanimity Rule is a quota rule with \(q=1\). An important problem with quota rules is that they do not identify a winner in every election scenario. For instance, in the above election scenario, there are no majority winners since none of the candidates are ranked first by more than 50% of the voters.

A criticism of both plurality and quota rules is that they severely limit what voters can express about their opinions of the candidates. In the remainder of this section, I discuss voting methods that use ballots that are more expressive than simply selecting a single candidate. Section 2.1 discusses voting methods that require voters to rank the alternatives. Section 2.2 discusses voting methods that require voters to assign grades to the alternatives (from some fixed set of grades). Finally, Section 2.3 discusses two voting methods in which the voters may have different levels of influence on the group decision. In this article, I focus on voting methods that either are familiar or help illustrate important ideas. Consult Brams and Fishburn 2002, Felsenthal 2012, and Nurmi 1987 for discussions of voting methods not covered in this article.

2.1 Ranking Methods: Scoring Rules and Multi-Stage Methods

The voting methods discussed in this section require the voters to rank the candidates (see Section 1.1 for the definition of a ranking). Providing a ranking of the candidates is much more expressive than simply selecting a single candidate. However, ranking all of the candidates can be very demanding, especially when there is a large number of them, since it can be difficult for voters to make distinctions between all the candidates. The most well-known example of a voting method that uses the voters' rankings is Borda Count:

Borda Count: Each voter provides a ranking of the candidates. Then, a score (the Borda score) is assigned to each candidate by a voter as follows: If there are \(n\) candidates, give \(n-1\) points to the candidate ranked first, \(n-2\) points to the candidate ranked second, …, 1 point to the candidate ranked second to last and 0 points to the candidate ranked last.
So, the Borda score of candidate \(A\), denoted \(\mathrm{BS}(A)\), is calculated as follows (where \(\#U\) denotes the number of elements in the set \(U\)):

\[\begin{align} \mathrm{BS}(A) =\ &(n-1)\times \#\{i \mid i \text{ ranks } A \text{ first}\}\\ &+ (n-2)\times \#\{i \mid i \text{ ranks } A \text{ second}\}\\ &+ \cdots\\ &+ 1\times \#\{i \mid i \text{ ranks } A \text{ second to last}\}\\ &+ 0\times \#\{i \mid i \text{ ranks } A \text{ last}\} \end{align}\]

The candidate with the highest Borda score wins. Recall the example discussed in the introduction to Section 1. For each alternative, the Borda scores can be calculated using the above method: \(\mathrm{BS}(A) = 24\), \(\mathrm{BS}(B) = 44\), \(\mathrm{BS}(C) = 38\), and \(\mathrm{BS}(D) = 20\), so \(B\) is the Borda Count winner.

Borda Count is an example of a scoring rule. A scoring rule is any method that calculates a score based on weights assigned to candidates according to where they fall in the voters' rankings. That is, a scoring rule for \(n\) candidates is defined as follows: Fix a sequence of numbers \((s_1, s_2, \ldots, s_n)\) where \(s_k\ge s_{k+1}\) for all \(k=1,\ldots, n-1\). For each \(k\), \(s_k\) is the score assigned to alternatives ranked in position \(k\). Then, the score for alternative \(A\), denoted \(Score(A)\), is calculated as follows:

\[Score(A) = \sum_{k=1}^{n} s_k\times \#\{i \mid i \text{ ranks } A \text{ in position } k\}\]

Borda count for \(n\) alternatives uses scores \((n-1, n-2, \ldots, 0)\) (call \(\mathrm{BS}(X)\) the Borda score for candidate \(X\)). Note that Plurality Rule can be viewed as a scoring rule that assigns 1 point to the first ranked candidate and 0 points to the other candidates. So, the plurality score of a candidate \(X\) is the number of voters that rank \(X\) first. Building on this idea, \(k\)-Approval Voting is a scoring method that gives 1 point to each candidate that is ranked in position \(k\) or higher, and 0 points to all other candidates. To illustrate \(k\)-Approval Voting, consider an election scenario in which the Condorcet winner \(A\) is not the unique winner under \(k\)-Approval for any of \(k=1,2,3\): none of these methods guarantees that the Condorcet winner is elected (whether \(A\) is elected using 1-Approval or 3-Approval depends on the tie-breaking mechanism that is used).
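Scoring rules are simple to implement. The following sketch (hypothetical code in the style of the earlier snippets) computes any scoring rule from a vector of scores, recovering Borda Count, Plurality, and \(k\)-Approval as special cases on the Section 1 profile:

```python
from collections import Counter

profile = {("A", "B", "C", "D"): 3, ("A", "C", "B", "D"): 5,
           ("B", "D", "C", "A"): 7, ("C", "B", "D", "A"): 6}

def scoring_rule(profile, scores):
    """Award scores[k] to a candidate ranked in position k (0-indexed);
    return the winner(s) together with the full tally."""
    tally = Counter()
    for ranking, n in profile.items():
        for pos, cand in enumerate(ranking):
            tally[cand] += n * scores[pos]
    top = max(tally.values())
    return {c for c, v in tally.items() if v == top}, dict(tally)

m = 4                                   # number of candidates
borda = list(range(m - 1, -1, -1))      # [3, 2, 1, 0]
plurality = [1] + [0] * (m - 1)         # [1, 0, 0, 0]
two_approval = [1, 1] + [0] * (m - 2)   # k-Approval with k = 2

print(scoring_rule(profile, borda))
# ({'B'}, {'A': 24, 'B': 44, 'C': 38, 'D': 20})
print(scoring_rule(profile, plurality))
# ({'A'}, {'A': 8, 'B': 7, 'C': 6, 'D': 0})
```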
A second way to make a voting method sensitive to more than the voters' top choice is to hold "multi-stage" elections. The idea is to successively remove candidates that perform poorly in the election until there is one candidate that is ranked first by more than 50% of the voters (i.e., there is a strict majority winner). The different stages can be actual "runoff" elections in which voters are asked to evaluate a reduced set of candidates; or they can be built into the way the winner is calculated by asking voters to submit rankings over the set of all candidates. The first example of a multi-stage method is used to elect the French president.

Plurality with Runoff: Start with a plurality vote to determine the top two candidates (the candidates ranked first and second according to their plurality scores). If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first place votes, then there is a runoff between the top two candidates (or more if there are ties). The candidate(s) with the most votes in the runoff elections is(are) declared the winner(s).

Rather than focusing on the top two candidates, one can also iteratively remove the candidate(s) with the fewest first-place votes:

The Hare Rule: The ballots are rankings of the candidates. If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first place votes, repeatedly delete the candidate or candidates that receive the fewest first-place votes (i.e., the candidate(s) with the lowest plurality score(s)). The first candidate to be ranked first by a strict majority of voters is declared the winner (if there is no such candidate, then the remaining candidate(s) are declared the winners).

The Hare Rule is also called Ranked-Choice Voting, Alternative Vote, and Instant Runoff. If there are only three candidates, then the above two voting methods are the same (removing the candidate with the lowest plurality score is the same as keeping the two candidates with the highest and second-highest plurality scores). The following example, with 19 voters, shows that they can select different winners when there are more than three candidates:

Number of voters   Ranking
7   \(A\ B\ C\ D\)
5   \(B\ C\ D\ A\)
3   \(C\ D\ A\ B\)
4   \(D\ C\ B\ A\)

Candidate \(A\) is the Plurality with Runoff winner: Candidates \(A\) and \(B\) are the top two candidates, being ranked first by 7 and 5 voters, respectively. In the runoff election (using the rankings from the above table), the groups voting for candidates \(C\) and \(D\) transfer their support to candidates \(A\) and \(B\), respectively, with \(A\) winning 10–9.

Candidate \(D\) is the Hare Rule winner: In the first round, candidate \(C\) is eliminated since she is only ranked first by 3 voters. This group's votes are transferred to \(D\), giving him 7 votes. This means that in the second round, candidate \(B\) is ranked first by the fewest voters (5 voters rank \(B\) first in the profile with candidate \(C\) removed), and so is eliminated. After the elimination of candidate \(B\), candidate \(D\) has a strict majority of the first-place votes: 12 voters ranking him first (note that in this round the group of 5 voters in the second row transfers all their votes to \(D\), since \(C\) was eliminated in an earlier round).

The core idea of multi-stage methods is to successively remove candidates that perform "poorly" in an election. For the Hare Rule, performing poorly is interpreted as receiving the fewest first-place votes. There are other ways to identify "poorly performing" candidates in an election scenario. For instance, the Coombs Rule successively removes candidates that are ranked last by the most voters (see Grofman and Feld 2004 for an overview of Coombs Rule).

Coombs Rule: The ballots are rankings of the candidates. If a candidate is ranked first by more than 50% of the voters, then that candidate is declared the winner. If there is no candidate with a strict majority of first place votes, repeatedly delete the candidate or candidates that receive the most last-place votes. The first candidate to be ranked first by a strict majority of voters is declared the winner (if there is no such candidate, then the remaining candidate(s) are declared the winners).

In the above example, candidate \(B\) wins the election using Coombs Rule. In the first round, \(A\), with 9 last-place votes, is eliminated. Then, candidate \(B\) receives 12 first-place votes, which is a strict majority, and so is declared the winner.

There is a technical issue that is important to keep in mind regarding the above definitions of the multi-stage voting methods. When identifying the poorly performing candidates in each round, there may be ties (i.e., there may be more than one candidate with the lowest plurality score or more than one candidate ranked last by the most voters). In the above definitions, I assume that all of the poorly performing candidates will be removed in each round.
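Here is an illustrative Python sketch of both rules (hypothetical code, following the assumption just stated that all tied poor performers are removed in a round), applied to the 19-voter example above:

```python
from collections import Counter

profile = {("A", "B", "C", "D"): 7, ("B", "C", "D", "A"): 5,
           ("C", "D", "A", "B"): 3, ("D", "C", "B", "A"): 4}

def first_place_tally(profile, remaining):
    """Count first-place votes among the remaining candidates."""
    tally = Counter({c: 0 for c in remaining})
    for ranking, n in profile.items():
        top = next(c for c in ranking if c in remaining)
        tally[top] += n
    return tally

def hare(profile):
    """Hare Rule: repeatedly delete the candidate(s) with the fewest
    first-place votes until some candidate has a strict majority."""
    total = sum(profile.values())
    remaining = {c for ranking in profile for c in ranking}
    while True:
        tally = first_place_tally(profile, remaining)
        majority = {c for c, v in tally.items() if v > total / 2}
        if majority:
            return majority
        losers = {c for c, v in tally.items() if v == min(tally.values())}
        if losers == remaining:   # everyone ties: remaining candidates win
            return remaining
        remaining -= losers

def coombs(profile):
    """Coombs Rule: like Hare, but delete the candidate(s) with the most
    last-place votes among the remaining candidates."""
    total = sum(profile.values())
    remaining = {c for ranking in profile for c in ranking}
    while True:
        tally = first_place_tally(profile, remaining)
        majority = {c for c, v in tally.items() if v > total / 2}
        if majority:
            return majority
        last = Counter({c: 0 for c in remaining})
        for ranking, n in profile.items():
            bottom = next(c for c in reversed(ranking) if c in remaining)
            last[bottom] += n
        losers = {c for c, v in last.items() if v == max(last.values())}
        if losers == remaining:
            return remaining
        remaining -= losers

print(hare(profile))    # {'D'}
print(coombs(profile))  # {'B'}
```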
An alternative approach would use a tie-breaking rule to select one of the poorly performing candidates to be removed at each round. 2.2 Voting by Grading The voting methods discussed in this section can be viewed as generalizations of scoring methods, such as Borda Count. In a scoring method, a voter’s ranking is an assignment of grades (e.g., "1st place", "2nd place", "3rd place", ... , "last place") to the candidates. Requiring voters to rank all the candidates means that (1) every candidate is assigned a grade, (2) there are the same number of possible grades as the number of candidates, and (3) different candidates must be assigned different grades. In this section, we drop assumptions (2) and (3), assuming a fixed number of grades for every set of candidates and allowing different candidates to be assigned the same grade. The first example gives voters the option to either select a candidate that they want to vote for (as in plurality rule) or to select a candidate that they want to vote against. Negative Voting: Each voter is allowed to choose one candidate to either vote for (giving the candidate one point) or to vote against (giving the candidate –1 point). The winner(s) is(are) the candidate(s) with the highest total number of points (i.e., the candidate with the greatest score, where the score is the total number of positive votes minus the total number of negative votes). Negative voting is tantamount to allowing the voters to support either a single candidate or all but one candidate (taking a point away from a candidate \(C\) is equivalent to giving one point to all candidates except \(C\)). That is, the voters are asked to choose a set of candidates that they support, where the choice is between sets consisting of a single candidate or sets consisting of all except one candidate. The next voting method generalizes this idea by allowing voters to choose any subset of candidates: Approval Voting: Each voter selects a subset of the candidates (where the empty set means the voter abstains), and the candidate(s) selected by the most voters win(s). If a candidate \(X\) is in the set of candidates selected by a voter, we say that the voter approves of candidate \(X\). Then, the approval winner is the candidate with the most approvals. Approval voting has been extensively discussed by Steven Brams and Peter Fishburn (Brams and Fishburn 2007; Brams 2008). See also the recent collection of articles devoted to approval voting (Laslier and Sanver 2010). Approval voting forces voters to think about the decision problem differently: They are asked to determine which candidates they approve of rather than selecting a single candidate to vote for or determining the relative ranking of the candidates. That is, the voters are asked which candidates are above a certain “threshold of acceptance”. Ranking a set of candidates and selecting the candidates that are approved are two different aspects of a voter’s overall opinion about the candidates. They are related but cannot be derived from each other. See Brams and Sanver 2009 for examples of voting methods that ask voters to both select a set of candidates that they approve and to (linearly) rank the candidates. Approval voting is a very flexible method. Recall the election scenario illustrating the \(k\)-Approval Voting methods: In this election scenario, \(k\)-Approval for \(k=1,2,3\) cannot guarantee that the Condorcet winner \(A\) is elected. The Approval ballot \((\{A\},\{B\}, \{A, C\})\) does elect the Condorcet winner. 
In fact, Brams (2008, Chapter 2) proves that if there is a unique Condorcet winner, then that candidate may be elected under approval voting (assuming that all voters vote sincerely; see Brams 2008, Chapter 2, for a discussion). Note that approval voting may also elect other candidates (perhaps even the Condorcet loser). Whether this flexibility of Approval Voting should be seen as a virtue or a vice is debated in Brams, Fishburn and Merrill 1988a, 1988b and Saari and van Newenhizen 1988a, 1988b. Approval Voting asks voters to express something about their intensity of preference for the candidates by assigning one of two grades: "Approve" or "Don’t Approve". Expanding on this idea, some voting methods assume that there is a fixed set of grades, or a grading language, that voters can assign to each candidate. See Chapters 7 and 8 of Balinski and Laraki 2010 for examples and a discussion of grading languages (cf. Morreau 2016). There are different ways to determine the winner(s) given a profile of ballots that assign grades to each candidate. The main approach is to calculate a "group" grade for each candidate, then select the candidate with the best overall group grade. In order to calculate a group grade for each candidate, it is convenient to use numbers for the grading language. Then, there are two natural ways to determine the group grade for a candidate: calculating the mean, or average, of the grades or calculating the median of the grades. Cumulative Voting: Each voter is asked to distribute a fixed number of points, say ten, among the candidates in any way they please. The candidate(s) with the most total points wins the election. Score Voting (also called Range Voting): The grades are a finite set of numbers. The ballots are an assignment of grades to the candidates. The candidate(s) with the largest average grade is declared the winner(s). Cumulative Voting and Score Voting are similar. The important difference is that Cumulative Voting requires that the sum of the grades assigned to the candidates by each voter is the same for all voters. The next procedure, proposed by Balinski and Laraki 2010 (cf. Bassett and Persky 1999 and the discussion of this method at rangevoting.org), selects the candidate(s) with the largest median grade rather than the largest mean grade. Majority Judgement: The grades are a finite set of numbers (cf. the above discussion of grading languages). The ballots are an assignment of grades to the candidates. The candidate(s) with the largest median grade is(are) declared the winner(s). See Balinski and Laraki 2007 and 2010 for further refinements of this voting method that use different methods for breaking ties when there are multiple candidates with the largest median grade. I conclude this section with an example that illustrates Score Voting and Majority Judgement. Suppose that there are 3 candidates \(\{A, B, C\}\), 5 grades \(\{0,1,2,3,4\}\) (with the assumption that the larger the number, the higher the grade), and 5 voters. The table below describes an election scenario. The candidates are listed in the first row. Each row describes an assignment of grades to a candidate by a set of voters. The bottom two rows give the mean and median grade for each candidate. Candidate \(A\) is the score voting winner with the greatest mean grade, and candidate \(B\) is the majority judgement winner with the greatest median grade. 
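Both aggregation rules are easy to state in code. Because the table from the example is not reproduced above, the sketch below uses a hypothetical grade profile, chosen only so that the mean winner and the median winner differ; the particular grades are an assumption, not the table from the text.

```python
from statistics import mean, median

# A hypothetical profile: grades (0-4) assigned to three candidates by
# five voters. These numbers are an illustration, not the table from the
# text; they are chosen so that the mean and median winners differ.
grades = {
    'A': [4, 4, 2, 1, 2],  # mean 2.6, median 2
    'B': [3, 3, 0, 3, 3],  # mean 2.4, median 3
    'C': [1, 2, 3, 4, 2],  # mean 2.4, median 2
}

def winners(grades, aggregate):
    """Return the candidate(s) whose aggregated group grade is largest."""
    best = max(aggregate(g) for g in grades.values())
    return [c for c, g in grades.items() if aggregate(g) == best]

print(winners(grades, mean))    # Score Voting winner: ['A']
print(winners(grades, median))  # Majority Judgement winner: ['B']
```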
There are two types of debates about the voting methods introduced in this section. The first concerns the choice of the grading language that voters use to evaluate the candidates. Consult Balinski and Laraki 2010 and Morreau 2016 for an extensive discussion of the types of considerations that influence the choice of a grading language. Brams and Potthoff 2015 argue that two grades, as in Approval Voting, are best to avoid certain paradoxical outcomes. To illustrate, note that, in the above example, if the candidates are ranked by the voters according to the grades that are assigned, then candidate \(C\) is the Condorcet winner (since 3 voters assign higher grades to \(C\) than to \(A\) or \(B\)). However, neither Score Voting nor Majority Judgement selects candidate \(C\). The second type of debate concerns the method used to calculate the group grade for each candidate (i.e., whether to use the mean as in Score Voting or the median as in Majority Judgement). One important issue is whether voters have an incentive to misrepresent their evaluations of the candidates. Consider the voter in the middle column that assigns the grade of 2 to \(A\), 0 to \(B\), and 3 to \(C\). Suppose that these grades represent the voter’s true evaluations of the candidates. If this voter increases the grade for \(C\) to 4 and decreases the grade for \(A\) to 1 (and the other voters do not change their grades), then the average grade for \(A\) becomes 2.4 and the average grade for \(C\) becomes 2.6, which better reflects the voter’s true evaluations of the candidates (and results in \(C\) being elected according to Score Voting). Thus, this voter has an incentive to misrepresent her grades. Note that the median grades for the candidates do not change after this voter changes her grades. Indeed, Balinski and Laraki (2010, chapter 10) argue that using the median to assign group grades to candidates encourages voters to submit grades that reflect their true evaluations of the candidates. The key idea of their argument is as follows: If a voter’s true grade matches the median grade for a candidate, then the voter does not have an incentive to assign a different grade. If a voter’s true grade is greater than the median grade for a candidate, then raising the grade will not change the candidate’s grade, and lowering the voter’s grade may result in the candidate receiving a grade that is lower than the voter’s true evaluation. Similarly, if a voter’s true grade is lower than the median grade for a candidate, then lowering the grade will not change the candidate’s grade, and raising the voter’s grade may result in the candidate receiving a grade that is higher than the voter’s true evaluation. Thus, if voters are focused on ensuring that the group grades for the candidates best reflect their true evaluations of the candidates, then voters do not have an incentive to misrepresent their grades. However, as pointed out in Felsenthal and Machover 2008 (Example 3.3), voters can manipulate the outcome of an election using Majority Judgement to ensure a preferred candidate is elected (cf. the discussion of strategic voting in Section 4.1 and Section 3.3 of List 2013). Suppose that the voter in the middle column assigns the grade of 4 to candidate \(A\), 0 to candidate \(B\), and 3 to candidate \(C\). 
Assuming the other voters do not change their grades, the majority judgement winner is now \(A\), whom the voter ranks higher than the original majority judgement winner \(B\). Consult Balinski and Laraki 2010, 2014 and Edelman 2012b for arguments in favor of electing candidates with the greatest median grade; and Felsenthal and Machover 2008, Gehrlein and Lepelley 2003, and Laslier 2011 for arguments against electing candidates with the greatest median grade. 2.3 Quadratic Voting and Liquid Democracy In this section, I briefly discuss two new approaches to voting that do not fit nicely into the categories of voting methods introduced in the previous sections. While both of these methods can be used to select representatives, such as a president, the primary application is a group of people voting directly on propositions, or referendums. Quadratic Voting: When more than 50% of the voters support an alternative, most voting methods will select that alternative. Indeed, when there are only two alternatives, such as when voting for or against a proposition, there are many arguments that identify majority rule as the best and most stable group decision method (May 1952; Maskin 1995). One well-known problem with always selecting the majority winner is the so-called tyranny of the majority. A complete discussion of this issue is beyond the scope of this article. The main problem from the point of view of the analysis of voting methods is that there may be situations in which a majority of the voters weakly support a proposition while there is a sizable minority of voters that have a strong preference against the proposition. One way of dealing with this problem is to increase the quota required to accept a proposition. However, this gives too much power to a small group of voters. For instance, with Unanimity Rule a single voter can block a proposal from being accepted. Arguably, a better solution is to use ballots that allow voters to express something about their intensity of preference for the alternatives. Setting aside issues about interpersonal comparisons of utility (see, for instance, Hausman 1995), this is the benefit of using the voting methods discussed in Section 2.2, such as Score Voting or Majority Judgement. These voting methods assume that there is a fixed set of grades that the voters use to express their intensity of preference. One challenge is finding an appropriate set of grades for a population of voters. Too few grades make it harder for a sizable minority with strong preferences to override the majority opinion, but too many grades make it easy for a vocal minority to overrule the majority opinion. Using ideas from mechanism design (Groves and Ledyard 1977; Hylland and Zeckhauser 1980), the economist E. Glen Weyl developed a voting method called Quadratic Voting that mitigates some of the above issues (Lalley and Weyl 2018a). The idea is to think of an election as a market (Posner and Weyl, 2018, Chapter 2). Each voter can purchase votes at a cost that is quadratic in the number of votes. For instance, a voter must pay $25 for 5 votes (either in favor of or against a proposition). After the election, the money collected is distributed on a pro rata basis to the voters. There are a variety of economic arguments that justify why voters should pay \(v^2\) to purchase \(v\) votes (Lalley and Weyl 2018b; Goeree and Zhang 2017). See Posner and Weyl 2015 and 2017 for further discussion and a vigorous defense of the use of Quadratic Voting in national elections. 
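To make the pricing rule concrete, here is a small sketch of the quadratic cost schedule (the unit price is an illustrative assumption). The point to notice is that the marginal cost of the \(v\)-th vote is \(v^2-(v-1)^2 = 2v-1\), so each additional vote is more expensive than the last.

```python
def cost(votes, unit_price=1):
    """Cost of casting `votes` votes under quadratic pricing."""
    return unit_price * votes ** 2

# The marginal cost of each additional vote grows linearly (1, 3, 5, ...),
# so buying a large number of votes quickly becomes prohibitively expensive.
for v in range(1, 6):
    print(v, cost(v), cost(v) - cost(v - 1))
# e.g., 5 votes cost 25 units, and the 5th vote alone costs 9 units
```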
Consult Laurence and Sher 2017 for two arguments against the use of Quadratic Voting. Both arguments are derived from the presence of wealth inequality. The first argument is that it is ambiguous whether a decision made using Quadratic Voting really outperforms a decision made using majority rule from the perspective of utilitarianism (see Driver 2014 and Sinnott-Armstrong 2019 for overviews of utilitarianism). The second argument is that any vote-buying mechanism will have a hard time meeting a legitimacy requirement, familiar from the theory of democratic institutions (cf. Peter 2017). Liquid Democracy: Using Quadratic Voting, the voters’ opinions may end up being weighted differently: Voters that purchase more of a voice have more influence over the election. There are other reasons why some voters’ opinions may have more weight than others when making a decision about some issue. For instance, a voter may have been elected to represent a constituency, or a voter may be recognized as an expert on the issue under consideration. An alternative approach to group decision making is direct democracy, in which every citizen is asked to vote on every political issue. Asking the citizens to vote on every issue faces a number of challenges, nicely explained by Green-Armytage (2015, pg. 191): Direct democracy without any option for representation is problematic. Even if it were possible for every citizen to learn everything they could possibly know about every political issue, people who did this would be able to do little else, and massive amounts of time would be wasted in duplicated effort. Or, if every citizen voted but most people did not take the time to learn about the issues, the results would be highly random and/or highly sensitive to overly simplistic public relations campaigns. Or, if only a few citizens voted, particular demographic and ideological groups would likely be under-represented. One way to deal with some of the problems raised in the above quote is to use proxy voting, in which voters can delegate their vote on some issues (Miller 1969). Liquid Democracy is a form of proxy voting in which voters can delegate their votes to other voters (ideally, to voters that are well-informed about the issue under consideration). What distinguishes Liquid Democracy from proxy voting is that proxies may further delegate the votes entrusted to them. For example, suppose that there is a vote to accept or reject a proposition. Each voter is given the option to delegate their vote to another voter, called a proxy. The proxies, in turn, are given the option to delegate their votes to yet another voter. The voters that decide not to transfer their votes cast a vote weighted by the number of voters who entrusted them as a proxy, either directly or indirectly, in addition to their own vote. While there has been some discussion of proxy voting in the political science literature (Miller 1969; Alger 2006; Green-Armytage 2015), most studies of Liquid Democracy can be found in the computer science literature. A notable exception is Blum and Zuber 2016, which justifies Liquid Democracy, understood as a procedure for democratic decision-making, within normative democratic theory. An overview of the origins of Liquid Democracy and pointers to other online discussions can be found in Behrens 2017. 
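The following sketch shows one way to compute the voting weights induced by a chain of delegations. The representation of the delegation graph and the convention that ballots caught in a delegation cycle are discarded are assumptions; the treatment of cycles is one of the issues studied in the formal literature discussed below.

```python
# A minimal sketch of vote-weight resolution in Liquid Democracy. Each
# voter either votes directly or delegates to exactly one proxy, and
# proxies may delegate further. Votes trapped in a delegation cycle are
# discarded here; this is just one convention among several.

def resolve_weights(voters, delegations):
    """Map each directly voting voter to the weight of their ballot."""
    weights = {v: 0 for v in voters if v not in delegations}
    for voter in voters:
        current, seen = voter, set()
        while current in delegations:  # follow the delegation chain
            if current in seen:        # a delegation cycle: the vote is lost
                current = None
                break
            seen.add(current)
            current = delegations[current]
        if current is not None:
            weights[current] += 1      # the chain ends at a direct voter
    return weights

voters = ['ann', 'bob', 'cat', 'dan', 'eve']
delegations = {'bob': 'ann', 'cat': 'bob', 'eve': 'dan'}
print(resolve_weights(voters, delegations))  # {'ann': 3, 'dan': 2}
```

Note that each direct voter's weight includes their own vote, so in this example ann casts a ballot worth three votes.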
Formal studies of Liquid Democracy have focused on: the possibility of delegation cycles and the relationship with the theory of judgement aggregation (Christoff and Grossi 2017); the rationality of delegating votes (Bloembergen, Grossi and Lackner 2018); the potential problems that arise when many voters delegate votes to only a few voters (Kahng et al. 2018; Gölz et al. 2018); and generalizations of Liquid Democracy beyond binary choices (Brill and Talmon 2018; Zhang and Zhou 2017). 2.4 Criteria for Comparing Voting Methods This section introduced different methods for making a group decision. One striking fact about the voting methods discussed in this section is that they can identify different winners given the same collection of ballots. This raises an important question: How should we compare the different voting methods? Can we argue that some voting methods are better than others? There are a number of different criteria that can be used to compare and contrast different voting methods: 3. Voting Paradoxes In this section, I introduce and discuss a number of voting paradoxes — i.e., anomalies that highlight problems with different voting methods. Consult Saari 1995 and Nurmi 1999 for penetrating analyses that explain the underlying mathematics behind the different voting paradoxes. 3.1 Condorcet’s Paradox A very common assumption is that a rational preference ordering must be transitive (i.e., if \(A\) is preferred to \(B\), and \(B\) is preferred to \(C\), then \(A\) must be preferred to \(C\)). See the entry on preferences (Hansson and Grüne-Yanoff 2009) for an extended discussion of the rationale behind this assumption. Indeed, if a voter’s preference ordering is not transitive, for instance, allowing for cycles (e.g., an ordering of \(A, B, C\) with \(A \succ B \succ C \succ A\), where \(X\succ Y\) means \(X\) is strictly preferred to \(Y\)), then there is no alternative that the voter can be said to actually support (for each alternative, there is another alternative that the voter strictly prefers). Many authors argue that voters with cyclic preference orderings have inconsistent opinions about the candidates and should be ignored by a voting method (in particular, Condorcet forcefully argued this point). A key observation of Condorcet (which has become known as the Condorcet Paradox) is that the majority ordering may have cycles (even when all the voters submit rankings of the alternatives). Condorcet’s original example was more complicated, but the following situation with three voters and three candidates illustrates the phenomenon: one voter ranks the candidates \(A \succ B \succ C\), one ranks them \(B \succ C \succ A\), and one ranks them \(C \succ A \succ B\). Note that we then have \(A >_M B\) (supported by voters 1 and 3), \(B >_M C\) (voters 1 and 2), and \(C >_M A\) (voters 2 and 3). That is, there is a majority cycle \(A>_M B >_M C >_M A\). This means that there is no Condorcet winner. This simple but fundamental observation has been extensively studied (Gehrlein 2006; Schwartz 2018). The Condorcet Paradox shows that there may not always be a Condorcet winner in an election. However, one natural requirement for a voting method is that if there is a Condorcet winner, then that candidate should be elected. Voting methods that satisfy this property are called Condorcet consistent. Many of the methods introduced above are not Condorcet consistent. I already presented an example showing that plurality rule is not Condorcet consistent (in fact, plurality rule may even elect the Condorcet loser). The example from Section 1 shows that Borda Count is not Condorcet consistent. In fact, this is an instance of a general phenomenon that Fishburn (1974) called Condorcet’s other paradox. 
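Detecting a Condorcet winner, or the absence of one, is mechanical. The sketch below computes pairwise majority comparisons and searches for a Condorcet winner, using the three-voter cyclic profile described above; the profile representation follows the earlier sketches.

```python
def majority_prefers(profile, x, y):
    """True if a strict majority of voters rank x above y."""
    margin = sum(n if ranking.index(x) < ranking.index(y) else -n
                 for n, ranking in profile)
    return margin > 0

def condorcet_winner(profile):
    """Return the Condorcet winner, or None if there is none."""
    candidates = profile[0][1]
    for x in candidates:
        if all(majority_prefers(profile, x, y)
               for y in candidates if y != x):
            return x
    return None  # e.g., when the majority ordering contains a cycle

cyclic_profile = [(1, ('A', 'B', 'C')),
                  (1, ('B', 'C', 'A')),
                  (1, ('C', 'A', 'B'))]
print(condorcet_winner(cyclic_profile))  # None: A >_M B >_M C >_M A
```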
Consider the following voting situation with 81 voters and three candidates from Condorcet 1785. The majority ordering is \(A >_M B >_M C\), so \(A\) is the Condorcet winner. Using the Borda rule (with scores 2, 1, and 0), we have \(\BS(A) = 101\), \(\BS(B) = 109\), and \(\BS(C) = 33\). So, candidate \(B\) is the Borda winner. Condorcet pointed out something more: The only way to elect candidate \(A\) using any scoring method is to assign more points to candidates ranked second than to candidates ranked first. Recall that a scoring method for 3 candidates fixes weights \(s_1\ge s_2\ge s_3\), where \(s_1\) points are assigned to candidates ranked 1st, \(s_2\) points are assigned to candidates ranked 2nd, and \(s_3\) points are assigned to candidates ranked last. To simplify the calculation, assume that candidates ranked last receive 0 points (i.e., \(s_3=0\)). Then, since \(A\) is ranked first by 31 voters and second by 39 voters, while \(B\) is ranked first by 39 voters and second by 31 voters, the scores assigned to candidates \(A\) and \(B\) are \(Score(A) = s_1 \times 31 + s_2 \times 39\) and \(Score(B) = s_1 \times 39 + s_2 \times 31\). So, in order for \(Score(A) > Score(B)\), we must have \((s_1 \times 31 + s_2 \times 39) > (s_1 \times 39 + s_2 \times 31)\), which implies that \(s_2 > s_1\). But, of course, it is counterintuitive to give more points for being ranked second than for being ranked first. Peter Fishburn generalized this example as follows: Theorem (Fishburn 1974). For all \(m\ge 3\), there is some voting situation with a Condorcet winner such that every scoring rule will have at least \(m-2\) candidates with a greater score than the Condorcet winner. So, no scoring rule is Condorcet consistent, but what about other methods? A number of voting methods were devised specifically to guarantee that a Condorcet winner will be elected, if one exists. The examples below give a flavor of different types of Condorcet consistent methods. (See Brams and Fishburn, 2002, and Fishburn, 1977, for more examples and a discussion of Condorcet consistent methods.) The last of these methods, Dodgson’s Method, was proposed by Charles Dodgson (better known by the pseudonym Lewis Carroll): it elects the candidate(s) that can be made a Condorcet winner by the fewest number of swaps of adjacent candidates in the voters’ rankings. Interestingly, this is an example of a procedure in which it is computationally difficult to compute the winner (that is, the problem of calculating the winner is NP-complete). See Bartholdi et al. 1989 for a discussion. These voting methods (and the other Condorcet consistent methods) guarantee that a Condorcet winner, if one exists, will be elected. But, should a Condorcet winner be elected? Many people argue that there is something amiss with a voting method that does not always elect a Condorcet winner (if one exists). The idea is that a Condorcet winner best reflects the overall group opinion and is stable in the sense that it will defeat any challenger in a one-on-one contest using Majority Rule. The most persuasive argument that the Condorcet winner should not always be elected comes from the work of Donald Saari (1995, 2001). Consider again Condorcet’s example of 81 voters. This is another example that shows that Borda’s method need not elect the Condorcet winner. The majority ordering is \(A >_M B >_M C\), while the ranking given by the Borda scores puts \(B\) first, \(A\) second, and \(C\) third. However, there is an argument that candidate \(B\) is the best choice for this electorate. Saari’s central observation is that the 81 voters can be divided into three groups: a first group of 30 voters evenly divided among the rankings \(A \succ B \succ C\), \(B \succ C \succ A\), and \(C \succ A \succ B\); a second group of 3 voters evenly divided among the rankings \(A \succ C \succ B\), \(C \succ B \succ A\), and \(B \succ A \succ C\); and a third group of 48 voters, 20 with the ranking \(A \succ B \succ C\) and 28 with the ranking \(B \succ A \succ C\). Groups 1 and 2 constitute majority cycles with the voters evenly distributed among the three possible rankings. Such profiles are called Condorcet components. These profiles form a perfect symmetry among the rankings. So, within each of these groups, it is natural to assume that the voters’ opinions cancel each other out; therefore, the decision should depend only on the voters in group 3. 
In group 3, candidate \(B\) is the clear winner. Balinski and Laraki (2010, pgs. 74–83) have an interesting spin on Saari’s argument. Let \(V\) be a ranking voting method (i.e., a voting method that requires voters to rank the alternatives). Say that \(V\) cancels properly if for all profiles \(\bP\), if \(V\) selects \(A\) as a winner in \(\bP\), then \(V\) selects \(A\) as a winner in any profile \(\bP+\bC\), where \(\bC\) is a Condorcet component and \(\bP+\bC\) is the profile that contains all the rankings from \(\bP\) and \(\bC\). Balinski and Laraki (2010, pg. 77) prove that there is no Condorcet consistent voting method that cancels properly. (See the discussion of the multiple districts paradox in Section 3.3 for a proof of a closely related result.) 3.2 Failures of Monotonicity A voting method is monotonic provided that receiving more support from the voters is always better for a candidate. There are different ways to make this idea precise (see Fishburn, 1982, Sanver and Zwicker, 2012, and Felsenthal and Tideman, 2013). For instance, moving up in the rankings should not adversely affect a candidate’s chances to win an election. It is easy to see that Plurality Rule is monotonic in this sense: The more voters that rank a candidate first, the better chance the candidate has to win. Surprisingly, there are voting methods that do not satisfy this natural property. The most well-known example is Plurality with Runoff. Consider the two scenarios below. Note that the only difference between them is the ranking of the fourth group of voters. This group of two voters ranks \(B\) above \(A\) above \(C\) in scenario 1 and swaps \(B\) and \(A\) in scenario 2 (so, \(A\) is now their top-ranked candidate; \(B\) is ranked second; and \(C\) is still ranked third). In scenario 1, candidates \(A\) and \(B\) both have a plurality score of 6, while candidate \(C\) has a plurality score of 5. So, \(A\) and \(B\) move on to the runoff election. Assuming the voters do not change their rankings, the 5 voters that rank \(C\) first transfer their support to candidate \(A\), giving her a total of 11 to win the runoff election. However, in scenario 2, even after moving up in the rankings of the fourth group (\(A\) is now ranked first by this group), candidate \(A\) does not win this election. In fact, by trying to give more support to the winner of the election in scenario 1, rather than solidifying \(A\)’s win, the last group’s least-preferred candidate ended up winning the election! The problem arises because in scenario 2, candidates \(A\) and \(B\) are swapped in the last group’s ranking. This means that \(A\)’s plurality score increases by 2 and \(B\)’s plurality score decreases by 2. As a consequence, \(A\) and \(C\) move on to the runoff election rather than \(A\) and \(B\). Candidate \(C\) wins the runoff election with 9 voters that rank \(C\) above \(A\) compared to 8 voters that rank \(A\) above \(C\). The above example is surprising since it shows that, when using Plurality with Runoff, it may not always be beneficial for a candidate to move up in some of the voters’ rankings. The other voting methods that violate monotonicity include the Coombs Rule, the Hare Rule, Dodgson’s Method and Nanson’s Method. See Felsenthal and Nurmi 2017 for further discussion of voting methods that are not monotonic. 3.3 Variable Population Paradoxes In this section, I discuss two related paradoxes that involve changes to the population of voters. 
No-Show Paradox: One way that a candidate may receive “more support” is to have more voters who support them show up to an election. Voting methods that do not satisfy this version of monotonicity are said to be susceptible to the no-show paradox (Fishburn and Brams 1983). Suppose that there are 3 candidates and 11 voters with the following rankings: In the first round, candidates \(A\) and \(C\) are both ranked first by 4 voters, while \(B\) is ranked first by only 3 voters. So, \(A\) and \(C\) move to the runoff round. In this round, the voters in the second column transfer their votes to candidate \(C\), so candidate \(C\) is the winner, beating \(A\) 7–4. Suppose that 2 voters in the first group do not show up to the election: In this election, candidate \(A\) has the lowest plurality score in the first round, so candidates \(B\) and \(C\) move to the runoff round. The first group’s votes are transferred to \(B\), so \(B\) is the winner, beating \(C\) 5–4. Since the 2 voters that did not show up to this election rank \(B\) above \(C\), they prefer the outcome of the second election, in which they did not participate! Plurality with Runoff is not the only voting method that is susceptible to the no-show paradox. The Coombs Rule, the Hare Rule and Majority Judgement (using the tie-breaking mechanism from Balinski and Laraki 2010) are all susceptible to the no-show paradox. It turns out that always electing a Condorcet winner, if one exists, makes a voting method susceptible to the above failure of monotonicity. Theorem (Moulin 1988). If there are four or more candidates, then every Condorcet consistent voting method is susceptible to the no-show paradox. See Perez 2001, Campbell and Kelly 2002, Jimeno et al. 2009, Duddy 2014, Brandt et al. 2017, 2019, and Nunez and Sanver 2017 for further discussions and generalizations of this result. Multiple Districts Paradox: Suppose that a population is divided into districts. If a candidate wins each of the districts, one would expect that candidate to win the election over the entire population of voters (assuming that the districts divide the set of voters into disjoint sets). This is certainly true for Plurality Rule: If a candidate is ranked first by the most voters in each of the districts, then that candidate will also be ranked first by the most voters over the entire population. Interestingly, this is not true for all voting methods (Fishburn and Brams 1983). The example below illustrates the paradox for the Coombs Rule. Candidate \(B\) wins both districts: Combining the two districts gives the following table: There are 15 total voters in the combined districts. None of the candidates is ranked first by 8 or more of the voters. Candidate \(C\) receives the most last-place votes, and so is eliminated in the first round. In the second round, candidate \(A\) beats candidate \(B\) by 1 vote (8 voters rank \(A\) above \(B\) and 7 voters rank \(B\) above \(A\)), and so is declared the winner. Thus, even though \(B\) wins both districts, candidate \(A\) wins the election when the districts are combined. The other voting methods that are susceptible to the multiple-districts paradox include Plurality with Runoff, the Hare Rule, and Majority Judgement. Note that these methods are also susceptible to the no-show paradox. As is the case with the no-show paradox, every Condorcet consistent voting method is susceptible to the multiple districts paradox (see Zwicker, 2016, Proposition 2.5). I sketch the proof of this from Zwicker 2016 (pg. 
40) since it adds to the discussion at the end of Section 3.1 about whether the Condorcet winner should be elected. Suppose that \(V\) is a voting method that always selects the Condorcet winner (if one exists) and that \(V\) is not susceptible to the multiple-districts paradox. This means that if a candidate \(X\) is among the winners according to \(V\) in each of two districts, then \(X\) must be among the winners according to \(V\) in the combined districts. Consider the following two districts. Note that in district 2 candidate \(B\) is the Condorcet winner, so \(B\) must be the only winner according to \(V\). In district 1, there are no Condorcet winners. If candidate \(B\) is among the winners according to \(V\) in district 1, then, in order to not be susceptible to the multiple districts paradox, \(B\) must be among the winners in the combined districts. However, in the combined districts, candidate \(A\) is the Condorcet winner, so \(A\) must be the unique winner according to \(V\). This is a contradiction, so \(B\) cannot be among the winners according to \(V\) in district 1. A similar argument shows that neither \(A\) nor \(C\) can be among the winners according to \(V\) in district 1, by swapping \(A\) and \(B\) in the first case and \(B\) with \(C\) in the second case in the rankings of the voters in district 2. Since \(V\) must assign at least one winner to every profile, this is a contradiction; and so, \(V\) is susceptible to the multiple districts paradox. One last comment about this paradox: It is an example of a more general phenomenon known as Simpson’s Paradox (Malinas and Bigelow 2009). See Saari (2001, Section 4.2) for a discussion of Simpson’s Paradox in the context of voting theory. 3.4 The Multiple Elections Paradox The paradox discussed in this section, first introduced by Brams, Kilgour and Zwicker (1998), has a somewhat different structure from the paradoxes discussed above. Voters are taking part in a referendum, where they are asked their opinion directly about various propositions (cf. the discussion of Quadratic Voting and Liquid Democracy in Section 2.3). So, voters must select either “yes” (Y) or “no” (N) for each proposition. Suppose that there are 13 voters who cast the following votes for the three propositions (so voters can cast one of eight possible votes): When the votes are tallied for each proposition separately, the outcome is N for each proposition (N wins 7–6 for all three propositions). Putting this information together, this means that NNN is the outcome of this election. However, there is no support for this outcome in this population of voters: no voter submitted the ballot NNN. This raises an important question about what outcome reflects the group opinion: Viewing each proposition separately, there is clear support for N on each proposition; however, there is no support for the entire package of N for all propositions. Brams et al. (1998, pg. 234) nicely summarize the issue as follows: The paradox does not just highlight problems of aggregation and packaging, however, but strikes at the core of social choice—both what it means and how to uncover it. In our view, the paradox shows there may be a clash between two different meanings of social choice, leaving unsettled the best way to uncover what this elusive quantity is. See Scarsini 1998, Lacy and Niou 2000, Xia et al. 2007, and Lang and Xia 2009 for further discussion of this paradox. 
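The proposition-by-proposition tallies are easy to check mechanically. Since the table of ballots is not reproduced above, the profile below is a reconstruction with the stated properties (13 voters, N winning each proposition 7–6, and no voter submitting NNN); the exact distribution of ballots is an assumption.

```python
from collections import Counter

# 13 ballots over three propositions ('Y' = yes, 'N' = no). This profile
# is a reconstruction consistent with the description in the text; the
# exact distribution of ballots is an assumption.
ballots = (['YNN'] * 3 + ['NYN'] * 3 + ['NNY'] * 3 +
           ['YYN', 'YNY', 'NYY', 'YYY'])

# Tally each proposition separately.
outcome = ''
for i in range(3):
    votes = Counter(ballot[i] for ballot in ballots)
    outcome += 'N' if votes['N'] > votes['Y'] else 'Y'

print(outcome)               # 'NNN': N wins each proposition 7-6
print(ballots.count('NNN'))  # 0: no voter submitted the ballot NNN
```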
A similar issue is raised by Anscombe’s paradox (Anscombe 1976), in which: It is possible for a majority of voters to be on the losing side of a majority of issues. This phenomenon is illustrated by the following example with five voters voting on three different issues (the voters either vote ‘yes’ or ‘no’ on the different issues). However, a majority of the voters (voters 1, 2 and 3) do not support the majority outcome on a majority of the issues (note that voter 1 does not support the majority outcome on issues 2 and 3; voter 2 does not support the majority outcome on issues 1 and 3; and voter 3 does not support the majority outcome on issues 1 and 2)! The issue is more interesting when the voters do not vote directly on the issues, but on candidates that take positions on the different issues. Suppose there are two candidates \(A\) and \(B\) who take the following positions on the three issues: Candidate \(A\) takes the majority position, agreeing with a majority of the voters on each issue, and candidate \(B\) takes the opposite, minority position. Under the natural assumption that voters will vote for the candidate who agrees with their position on a majority of the issues, candidate \(B\) will win the election (each of the voters 1, 2 and 3 agrees with \(B\) on two of the three issues, so \(B\) wins the election 3–2)! This version of the paradox is known as Ostrogorski’s Paradox (Ostrogorski 1902). See Kelly 1989; Rae and Daudt 1976; Wagner 1983, 1984; and Saari 2001, Section 4.6, for analyses of this paradox, and Pigozzi 2005 for the relationship with the judgement aggregation literature (List 2013, Section 5). 4. Topics in Voting Theory 4.1 Strategizing In the discussion above, I have assumed that voters select ballots sincerely. That is, the voters are simply trying to communicate their opinions about the candidates under the constraints of the chosen voting method. However, in many contexts, it makes sense to assume that voters choose strategically. One need only look to recent U.S. elections to see concrete examples of strategic voting. The most often cited example is the 2000 U.S. election: Many voters who ranked third-party candidate Ralph Nader first voted for their second choice (typically Al Gore). A detailed overview of the literature on strategic voting is beyond the scope of this article (see Taylor 2005 and Section 3.3 of List 2013 for discussions and pointers to the relevant literature; also see Poundstone 2008 for an entertaining and informative discussion of the occurrence of this phenomenon in many actual elections). I will explain the main issues, focusing on specific voting rules. There are two general types of manipulation that can be studied in the context of voting. The first is manipulation by a moderator or outside party that has the authority to set the agenda or select the voting method that will be used. So, the outcome of an election is not manipulated from within by unhappy voters, but, rather, it is controlled by an outside authority figure. To illustrate this type of control, consider a population with three voters whose rankings of four candidates are given in the table below: Note that everyone prefers candidate \(B\) over candidate \(D\). Nonetheless, a moderator can ask the right questions so that candidate \(D\) ends up being elected. The moderator proceeds as follows: First, ask the voters if they prefer candidate \(A\) or candidate \(B\). 
Since the voters prefer \(A\) to \(B\) by a margin of 2 to 1, the moderator declares that candidate \(B\) is no longer in the running. The moderator then asks voters to choose between candidate \(A\) and candidate \(C\). Candidate \(C\) wins this election 2–1, so candidate \(A\) is removed. Finally, in the last round the moderator asks voters to choose between candidates \(C\) and \(D\). Candidate \(D\) wins this election 2–1 and is declared the winner. A second type of manipulation focuses on how the voters themselves can manipulate the outcome of an election by misrepresenting their preferences. Consider the following two election scenarios with 5 voters and 4 candidates: The only difference between the two election scenarios is that the third voter changed the ranking of the bottom three candidates. In election scenario 1, the third voter has candidate \(A\) ranked first, then \(C\) ranked second, \(B\) ranked third and \(D\) ranked last. In election scenario 2, this voter still has \(A\) ranked first, but ranks \(B\) second, \(D\) third and \(C\) last. In election scenario 1, candidate \(C\) is the Borda Count winner (the Borda scores are \(\BS(A)=9, \BS(B)=5, \BS(C)=10\), and \(\BS(D)=6\)). In election scenario 2, candidate \(A\) is the Borda Count winner (the Borda scores are \(\BS(A)=9, \BS(B)=6, \BS(C)=8\), and \(\BS(D)=7\)). According to her ranking in election scenario 1, this voter prefers the outcome in election scenario 2 (candidate \(A\), the Borda winner in election scenario 2, is ranked above candidate \(C\), the Borda winner in election scenario 1). So, if we assume that election scenario 1 represents the “true” preferences of the electorate, it is in the interest of the third voter to misrepresent her preferences as in election scenario 2. This is an instance of a general result known as the Gibbard-Satterthwaite Theorem (Gibbard 1973; Satterthwaite 1975): Under natural assumptions, there is no voting method that guarantees that voters will choose their ballots sincerely (for a precise statement of this theorem, see Theorem 3.1.2 from Taylor 2005 or Section 3.3 of List 2013). 4.2 Characterization Results Much of the literature on voting theory (and, more generally, social choice theory) is focused on so-called axiomatic characterization results. The main goal is to characterize different voting methods in terms of abstract principles of collective decision making. See Pauly 2008 and Endriss 2011 for interesting discussions of axiomatic characterization results from a logician’s point-of-view. Consult List 2013 and Gaertner 2006 for introductions to the vast literature on axiomatic characterizations in social choice theory. In this article, I focus on a few key axioms and results and how they relate to the voting methods and paradoxes discussed above. I start with three core principles: Universal Domain (the voting method assigns at least one winner to every possible profile of ballots), Anonymity (permuting the ballots among the voters does not change the outcome of the election), and Neutrality (permuting the candidates in every voter’s ballot permutes the set of winners in the same way). These properties ensure that the outcome of an election depends only on the voters’ ballots, with all the voters and candidates being treated equally. Other properties are intended to rule out some of the paradoxes and anomalies discussed above. In Section 4.1, there is an example of a situation in which a candidate is elected, even though all the voters prefer a different candidate. The next principle rules out such situations: Unanimity (also called the Pareto Principle): If candidate \(A\) is ranked above candidate \(B\) by all voters, then candidate \(B\) should not win the election. These are natural properties to impose on any voting method. 
A surprising consequence of these properties is that they rule out another natural property that one may want to impose: Say that a voting method is resolute if the method always selects one winner (i.e., there are no ties). Suppose that \(V\) is a voting method that requires voters to rank the candidates and that there are at least 3 candidates and enough voters to form a Condorcet component (a profile generating a majority cycle with voters evenly distributed among the different rankings). First, consider the situation when there are exactly 3 candidates (in this case, we do not need to assume Unanimity). Divide the set of voters into three groups of size \(n\) and consider the Condorcet component in which the first group submits the ranking \(A \succ B \succ C\), the second group \(B \succ C \succ A\), and the third group \(C \succ A \succ B\). By Universal Domain and resoluteness, \(V\) must select exactly one of \(A\), \(B\), or \(C\) as the winner. Assume that \(V\) selects \(A\) as the winner (the argument when \(V\) selects the other candidates is similar). Now, consider the profile in which every voter swaps candidates \(A\) and \(B\) in their rankings: By Neutrality and Universal Domain, \(V\) must elect candidate \(B\) in this election scenario. Now, consider the profile in which every voter in the above election scenario swaps candidates \(B\) and \(C\): By Neutrality and Universal Domain, \(V\) must elect candidate \(C\) in this election scenario. Notice that this last election scenario can be generated by permuting the voters in the first election scenario (to generate the last election scenario from the first election scenario, move the first group of voters to the 2nd position, the 2nd group of voters to the 3rd position and the 3rd group of voters to the first position). But this contradicts Anonymity, since Anonymity requires \(V\) to elect the same candidate in the first and third election scenarios. To extend this result to more than 3 candidates, consider a profile in which candidates \(A\), \(B\), and \(C\) are all ranked above any other candidate and the restriction to these three candidates forms a Condorcet component. If \(V\) satisfies Unanimity, then no candidate except \(A\), \(B\) or \(C\) can be elected. Then, the above argument shows that \(V\) cannot satisfy Resoluteness, Universal Domain, Neutrality, and Anonymity. That is, there are no resolute voting methods that satisfy Universal Domain, Anonymity, Neutrality, and Unanimity for 3 or more candidates (note that I have assumed that the number of voters is a multiple of 3; see Moulin 1983 for the full proof). Section 3.2 discussed examples in which candidates end up losing an election as a result of more support from some of the voters. There are many ways to state properties that require a voting method to be monotonic. The following strong version (called Positive Responsiveness in the literature) is used to characterize majority rule when there are only two candidates: Positive Responsiveness: If candidate \(A\) is a winner or tied for the win and moves up in some of the voters’ rankings, then candidate \(A\) is the unique winner. I can now state the first characterization result. Note that in all of the examples discussed above, it is crucial that there are three or more candidates (for example, stating Condorcet’s paradox requires there to be three or more candidates). When there are only two candidates, or alternatives, Majority Rule (choose the alternative ranked first by more than 50% of the voters) can be singled out as “best”: Theorem (May 1952). 
A voting method for choosing between two candidates satisfies Neutrality, Anonymity, Unanimity and Positive Responsiveness if and only if the method is majority rule. See May 1952 for a precise statement of this theorem, and Asan and Sanver 2002, Maskin 1995, and Woeginger 2003 for alternative characterizations of majority rule. A key assumption in the proof of May’s theorem and subsequent results is the restriction to voting on two alternatives. When there are only two alternatives, the definition of a ballot can be simplified, since a ranking of two alternatives boils down to selecting the alternative that is ranked first. The above characterizations of Majority Rule work in a more general setting, since they also allow voters to abstain (which is ambiguous between not voting and being indifferent between the alternatives). So, if the alternatives are \(\{A,B\}\), then there are three possible ballots: selecting \(A\), selecting \(B\), or abstaining (which is treated as selecting both \(A\) and \(B\)). A natural question is whether there are May-style characterization theorems for more than two alternatives. A crucial issue is that rankings of more than two alternatives are much more informative than selecting an alternative or abstaining. By restricting the information required from a voter to selecting one of the alternatives or abstaining, Goodin and List 2006 prove that the axioms used in May’s Theorem characterize Plurality Rule when there are more than two alternatives. They also show that a minor modification of the axioms characterizes Approval Voting when voters are allowed to select more than one alternative. Note that focusing on voting methods that limit the information required from the voters to selecting one or more of the alternatives hides all the interesting phenomena discussed in the previous sections, such as the existence of a Condorcet paradox. Returning to the study of voting methods that require voters to rank the alternatives, the most important characterization result is Ken Arrow’s celebrated impossibility theorem (1963). Arrow showed that there is no social welfare function (a social welfare function maps the voters’ rankings (possibly allowing ties) to a single social ranking) satisfying Universal Domain, Unanimity, Non-Dictatorship (there is no voter \(d\) such that, for all profiles, if \(d\) ranks \(A\) above \(B\) in the profile, then the social ordering ranks \(A\) above \(B\)) and the following key property: Independence of Irrelevant Alternatives: The social ranking (higher, lower, or indifferent) of two candidates \(A\) and \(B\) depends only on the relative rankings of \(A\) and \(B\) for each voter. This means that if the voters’ rankings of two candidates \(A\) and \(B\) are the same in two different election scenarios, then the social rankings of \(A\) and \(B\) must be the same. This is a very strong property that has been extensively criticized (see Gaertner, 2006, for pointers to the relevant literature, and Cato, 2014, for a discussion of generalizations of this property). It is beyond the scope of this article to go into detail about the proof and the ramifications of Arrow’s theorem (see Morreau, 2014, for this discussion), but I note that many of the voting methods we have discussed do not satisfy the above property. A striking example of a voting method that does not satisfy Independence of Irrelevant Alternatives is Borda Count. 
Consider the following two election scenarios: Notice that the relative rankings of candidates \(A\), \(B\) and \(C\) are the same in both election scenarios. In election scenario 2, only the ranking of candidate \(X\), who is uniformly ranked last in election scenario 1, is changed. The ranking according to the Borda scores of the candidates in election scenario 1 puts \(A\) first with 15 points, \(B\) second with 14 points, \(C\) third with 13 points, and \(X\) last with 0 points. In election scenario 2, the ranking of \(A\), \(B\) and \(C\) is reversed: Candidate \(C\) is first with 13 points; candidate \(B\) is second with 12 points; candidate \(A\) is third with 11 points; and candidate \(X\) is last with 6 points. So, even though the relative rankings of candidates \(A\), \(B\) and \(C\) do not differ in the two election scenarios, the position of candidate \(X\) in the voters’ rankings reverses the Borda rankings of these candidates. In Section 3.3, it was noted that a number of methods (including all Condorcet consistent methods) are susceptible to the multiple districts paradox. An example of a method that is not susceptible to the multiple districts paradox is Plurality Rule: If a candidate receives the most first place votes in two different districts, then that candidate must receive the most first place votes in the combined districts. More generally, no scoring rule is susceptible to the multiple districts paradox. This property is called reinforcement: Reinforcement: Suppose that \(N_1\) and \(N_2\) are disjoint sets of voters facing the same set of candidates. Further, suppose that \(W_1\) is the set of winners for the population \(N_1\), and \(W_2\) is the set of winners for the population \(N_2\). If there is at least one candidate that wins both elections, then the winner(s) for the entire population (including voters from both \(N_1\) and \(N_2\)) is the set of candidates that are in both \(W_1\) and \(W_2\) (i.e., the set of winners for the entire population is \(W_1\cap W_2\)). The reinforcement property explicitly rules out the multiple-districts paradox (so, candidates that win all sub-elections are guaranteed to win the full election). In order to characterize all scoring rules, one additional technical property is needed: Continuity: Suppose that a group of voters \(N_1\) elects a candidate \(A\) and a disjoint group of voters \(N_2\) elects a different candidate \(B\). Then there must be some number \(m\) such that the population consisting of the subgroup \(N_2\) together with \(m\) copies of \(N_1\) will elect \(A\). We then have: Theorem (Young 1975). Suppose that \(V\) is a voting method that requires voters to rank the candidates. Then, \(V\) satisfies Anonymity, Neutrality, Reinforcement and Continuity if and only if the method is a scoring rule. See Merlin 2003 and Chebotarev and Shamis 1998 for surveys of other characterizations of scoring rules. Additional axioms single out Borda Count among all scoring methods (Young 1974; Gärdenfors 1973; Nitzan and Rubinstein 1981). In fact, Saari has argued that “any fault or paradox admitted by Borda’s method also must be admitted by all other positional voting methods” (Saari 1989, pg. 454). For example, it is often remarked that Borda Count (and all scoring rules) can be easily manipulated by the voters. 
Saari (1995, Section 5.3.1) shows that among all scoring rules Borda Count is the least susceptible to manipulation (in the sense that it has the fewest profiles where a small percentage of voters can manipulate the outcome). I have glossed over an important detail of Young’s characterization of scoring rules. Note that the reinforcement property refers to the behavior of a voting method on different populations of voters. To make this precise, the formal definition of a voting method must allow for domains that include profiles (i.e., sequences of ballots) of different lengths. To do this, it is convenient to assume that the domain of a voting method is a set of anonymous profiles: Given a set of ballots \(\mathcal{B}\), an anonymous profile is a function \(\pi:\mathcal{B}\rightarrow\mathbb{N}\). Let \(\Pi\) be the set of all anonymous profiles. A variable domain voting method assigns a non-empty set of candidates to each anonymous profile—i.e., it is a function \(V:\Pi\rightarrow \wp(X)-\emptyset\). Of course, this builds the property of Anonymity into the definition of a voting method. For this reason, Young (1975) does not need to state Anonymity as a characterizing property of scoring rules. Young’s axioms identify scoring rules out of the set of all functions defined on ballots that are rankings of candidates. In order to characterize the voting methods from Section 2.2, we need to change the set of ballots. For example, in order to characterize Approval Voting, the set of ballots \(\mathcal{B}\) is the set of non-empty subsets of the set of candidates—i.e., \(\mathcal{B}=\wp(X)-\emptyset\) (selecting the ballot \(X\) consisting of all candidates means that the voter abstains). Two additional axioms are needed to characterize Approval Voting: Faithfulness (if there is exactly one voter, then the winners are the candidates selected on that voter’s ballot) and Cancellation (if every candidate is selected by the same number of voters, then every candidate is a winner). We then have: Theorem (Fishburn 1978b; Alós-Ferrer 2006). A variable domain voting method where the ballots are non-empty sets of candidates is Approval Voting if and only if it satisfies Faithfulness, Cancellation, and Reinforcement. Note that Approval Voting satisfies Neutrality even though it is not listed as one of the characterizing properties in the above theorem. This is because Alós-Ferrer (2006) showed that Neutrality is a consequence of Faithfulness, Cancellation and Reinforcement. See Fishburn 1978a and Baigent and Xu 1991 for alternative characterizations of Approval Voting, and Xu 2010 for a survey of the characterizations of Approval Voting (cf. the characterization of Approval Voting from Goodin and List 2006). Myerson (1995) introduced a general framework for characterizing abstract scoring rules that includes Borda Count and Approval Voting as examples. The key idea is to think of a ballot, called a signal or a vote, as a function from candidates to a set \(\mathcal{V}\), where \(\mathcal{V}\) is a set of numbers. That is, the set of ballots is a subset of \(\mathcal{V}^X\) (the set of functions from \(X\) to \(\mathcal{V}\)). Then, an anonymous profile of signals assigns a score to each candidate by summing the numbers assigned to that candidate by each voter. This allows us to define voting methods by specifying the set of ballots. Myerson (1995) showed that an abstract voting rule is an abstract scoring rule if and only if it satisfies Reinforcement, Universal Domain (i.e., it is defined for all anonymous profiles), a version of the Neutrality property (adapted to the more abstract setting), and the Continuity property, which is called Overwhelming Majority in this setting. 
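Myerson's framework is simple to state computationally: a ballot assigns a number to each candidate, and a candidate's score is the sum of these numbers over all ballots. The sketch below shows Borda and Approval ballots as instances; the encoding of ballots as dictionaries is an assumption of the illustration.

```python
# Abstract scoring rules in Myerson's sense: each ballot assigns a number
# to every candidate, and a candidate's score is the sum over all ballots.

def abstract_score_winners(ballots, candidates):
    totals = {c: sum(ballot[c] for ballot in ballots) for c in candidates}
    best = max(totals.values())
    return [c for c in candidates if totals[c] == best]

candidates = ['A', 'B', 'C']

# A Borda ballot is a permutation of the scores (2, 1, 0); an Approval
# ballot assigns 1 to each approved candidate and 0 to the rest.
borda_ballots = [{'A': 2, 'B': 1, 'C': 0},
                 {'A': 0, 'B': 2, 'C': 1},
                 {'A': 2, 'B': 0, 'C': 1}]
approval_ballots = [{'A': 1, 'B': 1, 'C': 0},
                    {'A': 0, 'B': 1, 'C': 0}]

print(abstract_score_winners(borda_ballots, candidates))     # ['A']
print(abstract_score_winners(approval_ballots, candidates))  # ['B']
```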
Pivato (2013) generalizes this result, and Gaertner and Xu (2012) provide a related characterization result (using different properties). Pivato (2014) characterizes Formal Utilitarian and Range Voting within the class of abstract scoring rules, and Mace (2018) extends this approach to cover a wider class of grading voting methods (including Majority Judgement). 4.3 Voting to Track the Truth The voting methods discussed above have been judged on procedural grounds. This “proceduralist approach to collective decision making” is defined by Coleman and Ferejohn (1986, p. 7) as one that “identifies a set of ideals with which any collective decision-making procedure ought to comply. … [A] process of collective decision making would be more or less justifiable depending on the extent to which it satisfies them.” The authors add that a distinguishing feature of proceduralism is that “what justifies a [collective] decision-making procedure is strictly a necessary property of the procedure — one entailed by the definition of the procedure alone.” Indeed, the characterization theorems discussed in the previous section can be viewed as an implementation of this idea (cf. Riker 1982). The general view is to analyze voting methods in terms of “fairness criteria” that ensure that a given method is sensitive to all of the voters’ opinions in the right way. However, one may not be interested only in whether a collective decision was arrived at “in the right way,” but in whether or not the collective decision is correct. This epistemic approach to voting is nicely explained by Joshua Cohen (1986, p. 34): An epistemic interpretation of voting has three main elements: (1) an independent standard of correct decisions — that is, an account of justice or of the common good that is independent of current consensus and the outcome of votes; (2) a cognitive account of voting — that is, the view that voting expresses beliefs about what the correct policies are according to the independent standard, not personal preferences for policies; and (3) an account of decision making as a process of the adjustment of beliefs, adjustments that are undertaken in part in light of the evidence about the correct answer that is provided by the beliefs of others. Under this interpretation of voting, a given method is judged on how well it “tracks the truth” of some objective fact (the truth of which is independent of the method being used). A comprehensive comparison of these two approaches to voting touches on a number of issues surrounding the justification of democracy (cf. Christiano 2008); however, I will not focus on these broader issues here. Instead, I briefly discuss an analysis of Majority Rule that takes this epistemic approach. The most well-known analysis comes from the writings of Condorcet (1785). The following theorem, which is attributed to Condorcet and was first proved formally by Laplace, shows that if there are only two options, then majority rule is, in fact, the best procedure from an epistemic point of view. This is interesting because it also shows that a proceduralist analysis and an epistemic analysis both single out Majority Rule as the “best” voting method when there are only two candidates. Assume that there are \(n\) voters that have to decide between two alternatives. Exactly one of these alternatives is (objectively) “correct” or “better.” The typical example here is a jury deciding whether or not a defendant is guilty. 
The two assumptions of the Condorcet jury theorem are Independence (the voters’ judgments are probabilistically independent of one another, given the correct outcome) and Voter Competence (each voter has a probability greater than 1/2 of identifying the correct option). See Dietrich 2008 for a critical discussion of these two assumptions. The classic theorem is: Condorcet Jury Theorem. Suppose that Independence and Voter Competence are both satisfied. Then, as the group size increases, the probability that the majority chooses the correct option increases and converges to certainty. See Nitzan 2010 (part III) and Dietrich and Spiekermann 2013 for modern expositions of this theorem, and Goodin and Spiekermann 2018 for implications for the theory of democracy. Condorcet envisioned that the above argument could be adapted to voting situations with more than two alternatives. Young (1975, 1988, 1995) was the first to fully work out this idea (cf. List and Goodin 2001, who generalize the Condorcet Jury Theorem to more than two alternatives in a different framework). He showed (among other things) that the Borda Count can be viewed as the maximum likelihood estimator for identifying the best candidate. Conitzer and Sandholm (2005), Conitzer et al. (2009), Xia et al. (2010), and Xia (2016) take these ideas further by classifying different voting methods according to whether or not the methods can be viewed as a maximum likelihood estimator (for a noise model). The most general results along these lines can be found in Pivato 2013, which contains a series of results showing when voting methods can be interpreted as different kinds of statistical ‘estimators’. 4.4 Computational Social Choice One of the most active and exciting areas of research that is focused, in part, on the study of voting methods and voting paradoxes is computational social choice. This is an interdisciplinary research area that uses ideas and techniques from theoretical computer science and artificial intelligence to provide new perspectives on, and to ask new questions about, methods for making group decisions, and to apply voting methods in computational domains, such as recommendation systems, information retrieval, and crowdsourcing. It is beyond the scope of this article to survey this entire research area. Readers are encouraged to consult the Handbook of Computational Social Choice (Brandt et al. 2016) for an overview of this field (cf. also Endriss 2017). In the remainder of this section, I briefly highlight some work from this research area related to issues discussed in this article. Section 4.1 discussed election scenarios in which voters choose their ballots strategically and briefly introduced the Gibbard-Satterthwaite Theorem. This theorem shows that every voting method satisfying natural properties has profiles in which there is some voter, called a manipulator, who can achieve a better outcome by selecting a ballot that misrepresents her preferences. Importantly, in order to successfully manipulate an election, the manipulator must not only know which voting method is being used but also how the other members of society are voting. Although there is some debate about whether manipulation in this sense is in fact a problem (Dowding and van Hees 2008; Conitzer and Walsh 2016, Section 6.2), there is interest in mechanisms that incentivize voters to report their “truthful” preferences. In a seminal paper, Bartholdi et al. (1989) argue that the complexity of computing which ballot will lead to a preferred outcome for the manipulator may provide a barrier to voting insincerely. See Faliszewski and Procaccia 2010, Faliszewski et al. 2010, Walsh 2011, Brandt et al. 2013, and Conitzer and Walsh 2016 for surveys of the literature on this and related questions, such as the complexity of determining the winner given a voting method and the complexity of determining which voter or voters should be bribed to change their votes to achieve a given outcome. One of the most interesting lines of research in computational social choice is to use techniques and ideas from AI and theoretical computer science to design new voting methods. The main idea is to think of voting methods as solutions to an optimization problem. Consider the space of all rankings of the alternatives \(X\). Given a profile of rankings, the voting problem is to find an “optimal” group ranking (cf. the discussion of distance-based rationalizations of voting methods from Elkind et al. 2015). What counts as an “optimal” group ranking depends on assumptions about the type of decision that the group is making. One assumption is that the voters have real-valued utilities for each candidate, but are only able to report rankings of the alternatives (it is assumed that the rankings represent the utility functions). The voting problem is then to identify the candidate that maximizes the (expected) social welfare (the average of the voters’ utilities), given the partial information about the voters’ utilities—i.e., the profile of rankings of the candidates. See Pivato 2015 for a discussion of this approach to voting and Boutilier et al. 2015 for algorithms that solve different versions of this problem. A second assumption is that there is an objectively correct ranking of the alternatives and the voters’ rankings are noisy estimates of this ground truth. This way of thinking about the voting problem was introduced by Condorcet and discussed in Section 4.3. Procaccia et al. (2016) import ideas from the theory of error-correcting codes to develop an interesting new approach to aggregating rankings viewed as noisy estimates of some ground truth. 5. Concluding Remarks 5.1 From Theory to Practice As with any mathematical analysis of social phenomena, questions abound about the “real-life” implications of the theoretical analysis of the voting methods given above. The main question is whether the voting paradoxes are simply features of the formal framework used to represent an election scenario or formalizations of real-life phenomena. This raises a number of subtle issues about the use of mathematical modeling in the social sciences, many of which fall outside the scope of this article. I conclude with a brief discussion of two questions that shed some light on how one should interpret the above analysis. How likely is a Condorcet Paradox or any of the other voting paradoxes? There are two ways to approach this question. The first is to calculate the probability that a majority cycle will occur in an election scenario. There is a sizable literature devoted to analytically deriving the probability of a majority cycle occurring in election scenarios of varying sizes (see Gehrlein 2006 and Regenwetter et al. 2006 for overviews of this literature). The calculations depend on assumptions about the distribution of rankings among the voters. One distribution that is typically used is the so-called impartial culture, where each ranking is possible and occurs with equal probability. For example, if there are three candidates, and it is assumed that the voters’ ballots are rankings of the candidates, then each possible ranking occurs with probability 1/6.
Under this assumption, the probability of a majority cycle occurring has been calculated (see Gehrlein 2006 for details). Riker (1982, p. 122) has a table of the relevant calculations. Two observations about this data: First, as the number of candidates and voters increases, the probability of a majority cycle increases to certainty. Second, for a fixed number of candidates, the probability of a majority cycle still increases, though not necessarily to certainty (the number of voters is the independent variable here). For example, if there are five candidates and seven voters, then the probability of a majority cycle is 21.5 percent. This probability increases to 25.1 percent as the number of voters increases to infinity (keeping the number of candidates fixed) and to 100 percent as the number of candidates increases to infinity (keeping the number of voters fixed). Prima facie, this result suggests that we should expect to see instances of the Condorcet and related paradoxes in large elections. Of course, this interpretation takes it for granted that the impartial culture is a realistic assumption. Many authors have noted that the impartial culture is a significant idealization that almost certainly does not occur in real-life elections. Tsetlin et al. (2003) go even further, arguing that the impartial culture is a worst-case scenario in the sense that any deviation from it results in lower probabilities of a majority cycle (see Regenwetter et al. 2006 for a complete discussion of this issue, and List and Goodin 2001, Appendix 3, for a related result). A second way to argue that the above theoretical observations are robust is to find supporting empirical evidence. For instance, is there evidence that majority cycles have occurred in actual elections? While Riker (1982) offers a number of intriguing examples, the most comprehensive analysis of the empirical evidence for majority cycles is provided by Mackie (2003, especially Chapters 14 and 15). The conclusion is that, in striking contrast to the probabilistic analysis referenced above, majority cycles typically have not occurred in actual elections. However, this literature has not reached a consensus about this issue (cf. Riker 1982). The problem is that the available data typically do not include voters’ opinions about all pairwise comparisons of candidates, which is the information needed to determine whether there is a majority cycle. So, this information must be inferred (for example, by using statistical methods) from the given data. A related line of research focuses on the influence of factors such as polls (Reijngoud and Endriss 2012), social networks (Santoro and Beck 2017; Stirling 2016) and deliberation among the voters (List 2018) on the profiles of ballots that are actually realized in an election. For instance, List et al. 2013 provide evidence suggesting that deliberation reduces the probability of a Condorcet cycle occurring. How do the different voting methods compare in actual elections? In this article, I have analyzed voting methods under highly idealized assumptions. But, in the end, we are interested in a very practical question: Which method should a group adopt? Of course, any answer to this question will depend on many factors that go beyond the abstract analysis given above (cf. Edelman 2012a). An interesting line of research focuses on incorporating empirical evidence into the general theory of voting.
Evidence can come in the form of a computer simulation, a detailed analysis of a particular voting method in real-life elections (for example, see Brams 2008, Chapter 1, which analyzes Approval Voting in practice), or in situ experiments in which voters are asked to fill in additional ballots during an actual election (Laslier 2010, 2011). The most striking results can be found in the work of Michael Regenwetter and his colleagues. They have analyzed datasets from a variety of elections, showing that many of the usual voting methods that are considered irreconcilable (e.g., Plurality Rule, Borda Count and the Condorcet-consistent methods from Section 3.1.1) are, in fact, in perfect agreement. This suggests that the “theoretical literature may promote overly pessimistic views about the likelihood of consensus among consensus methods” (Regenwetter et al. 2009, p. 840). See Regenwetter et al. 2006 for an introduction to the methods used in these analyses and Regenwetter et al. 2009 for the current state of the art.
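As a small illustration of the computer-simulation route just mentioned, here is a sketch in Python (my own, not code from any of the studies cited) that estimates the probability of a majority cycle under the impartial culture by sampling random profiles; the number of voters and of trials are arbitrary choices:

```python
import itertools
import random

def has_majority_cycle(profile, candidates):
    """Check whether the majority relation of `profile` contains a cycle
    among three candidates.  `profile` is a list of rankings, each a
    tuple ordering the candidates from best to worst."""
    def majority_prefers(x, y):
        wins = sum(1 for r in profile if r.index(x) < r.index(y))
        return wins > len(profile) - wins
    return any(majority_prefers(x, y) and majority_prefers(y, z)
               and majority_prefers(z, x)
               for x, y, z in itertools.permutations(candidates, 3))

def estimate_cycle_probability(num_voters, num_candidates=3, trials=10_000):
    """Monte Carlo estimate of the probability of a majority cycle under
    the impartial culture: each voter draws a ranking uniformly at random."""
    candidates = list(range(num_candidates))
    rankings = list(itertools.permutations(candidates))
    hits = sum(1 for _ in range(trials)
               if has_majority_cycle(random.choices(rankings, k=num_voters),
                                     candidates))
    return hits / trials

# For three candidates and many voters the estimate should be close to
# the analytic limiting value of roughly 8.8 percent reported in the
# literature (see Gehrlein 2006).
print(estimate_cycle_probability(num_voters=101))
```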

Value Pluralism

1. Some Preliminary Clarifications 1.1 Foundational and Non-foundational Pluralism It is important to clarify the levels at which a moral theory might be pluralistic. Let us distinguish between two levels of pluralism: foundational and non-foundational. Foundational pluralism is the view that there are plural moral values at the most basic level—that is to say, there is no one value that subsumes all other values, no one property of goodness, and no overarching principle of action. Non-foundational pluralism is the view that there are plural values at the level of choice, but these apparently plural values can be understood in terms of their contribution to one more fundamental value.[3] Judith Jarvis Thomson, a foundational pluralist, argues that when we say that something is good we are never ascribing a property of goodness; rather, we are always saying that the thing in question is good in some way. If we say that a fountain pen is good we mean something different from when we say that a logic book is good, or a film is good. As Thomson puts it, all goodness is a goodness in a way. Thomson focusses her argument on Moore, who argues that when we say ‘x is good’ we do not mean ‘x is conducive to pleasure’, or ‘x is in accordance with a given set of rules’, nor do we mean anything else that is purely descriptive. As Moore points out, we can always query whether any purely descriptive property really is good—so he concludes that goodness is simple and unanalyzable.[4] Moore is thus a foundational monist: he thinks that there is one non-natural property of goodness, and that all good things are good in virtue of having this property. Thomson finds this preposterous. In Thomson’s own words: Moore says that the question he will be addressing himself to in what follows is the question ‘What is good?’, and he rightly thinks that we are going to need a bit of help in seeing exactly what question he is expressing in those words. He proposes to help us by drawing attention to a possible answer to the question he is expressing—that is, to something that would be an answer to it, whether or not it is the correct answer to it. Here is what he offers us: “Books are good.” Books are good? What would you mean if you said ‘Books are good’? Moore, however, goes placidly on: “though [that would be] an answer obviously false; for some books are very bad indeed”. Well some books are bad to read or to look at, some are bad for use in teaching philosophy, some are bad for children. What sense could be made of a person who said, “No, no. I meant that some books are just plain bad things”? (Thomson 1997, pp. 275–276) According to Thomson there is a fundamental plurality of ways of being good. We cannot reduce them to something they all have in common, or sensibly claim that there is a disjunctive property of goodness (such that goodness is ‘goodness in one of the various ways’): Thomson argues that such a property could not be an interesting one, as each disjunct is truly different from every other disjunct (Thomson 1997, p. 277). Thomson is thus a foundational pluralist—she does not think that there is any one property of value at the most basic level. W.D. Ross is a foundational pluralist in a rather complex way. Most straightforwardly, Ross thinks that there are several prima facie duties, and there is nothing that they all have in common: they are irreducibly plural. This is the aspect of Ross’s view that is referred to by the phrase ‘Ross-style pluralism’.
However, Ross also thinks that there are goods in the world (justice and pleasure, for example), and that these are good because of some property they share. Goodness and rightness are not reducible to one another, so Ross is a pluralist about types of value as well as about principles. Writers do not always make the distinction between foundational and other forms of pluralism, but as well as Thomson and Ross, at least Bernard Williams (1981), Charles Taylor (1982), Charles Larmore (1987), John Kekes (1993), Michael Stocker (1990 and 1997), David Wiggins (1997) and Christine Swanton (2001) are all committed to foundational pluralism. Non-foundational pluralism is less radical—it posits a plurality of bearers of value. In fact, almost everyone accepts that there are plural bearers of value. This is compatible with thinking that there is only one ultimate value. G.E. Moore (1903), Thomson’s target, is a foundational monist, but he accepts that there are non-foundational plural values. Moore thinks that there are many different bearers of value, but he thinks that there is one property of goodness, and that it is a simple non-natural property that bearers of value possess in varying degrees. Moore is clear that comparison between plural goods proceeds in terms of the amount of goodness they have. This is not to say that the amount of goodness is always a matter of simple addition. Moore thinks that there can be organic unities, where the amount of goodness contributed by a certain value will vary according to the combination of values it occurs in, such as love and friendship. Thus Moore’s view is pluralist at the level of ordinary choices, and that is not without interesting consequences. (I shall return to the issue of how a foundational monist like Moore can account for organic unities in section 2.1.) Mill, a classic utilitarian, could be and often has been interpreted as thinking that there are irreducibly different sorts of pleasure. Mill argues that there are higher and lower pleasures, and that the higher pleasures (pleasures of the intellect as opposed to the body) are superior, in that higher pleasures can outweigh lower pleasures regardless of the quantity of the latter. As Mill puts it: “It is quite compatible with the principle of utility to recognize the fact, that some kinds of pleasure are more desirable and more valuable than others.” (2002, p. 241). On the foundational pluralist interpretation of Mill, there is not one ultimate good, but two (at least): higher and lower pleasures. Mill goes on to give an account of what he means: If I am asked, what I mean by difference in quality in pleasures, or what makes one pleasure more valuable than another, merely as a pleasure, except its being greater in amount, there is but one possible answer. Of two pleasures, if there be one to which all or almost all who have experience of both give a decided preference, irrespective of any feeling of moral obligation to prefer it, that is the more desirable pleasure. (2002, p. 241). The passage is ambiguous: it is not clear what role the expert judges play in the theory. On the pluralist interpretation of this passage we must take Mill as intending the expert judges to serve as a purely heuristic device: thinking about what such people would prefer is a way of discovering which pleasures are higher and which are lower, but the respective values of the pleasures are independent of the judges’ judgment.
On a monist interpretation we must understand Mill as a preference utilitarian: the preferences of the judges determine value. On this interpretation there is one property of value (being preferred by expert judges) and many bearers of value (whatever the judges prefer).[5] Before moving on, it is worth noting that a theory might be foundationally monist in its account of what values there are, but not recommend that people attempt to think or make decisions on the basis of the supervalue. A distinction between decision procedures and criteria of right has become commonplace in moral philosophy. For example, a certain form of consequentialism has as its criterion of right action: act so as to maximize good consequences. This might invite the complaint that an agent who is constantly trying to maximize good consequences will often, in virtue of that fact, fail to do so. Sometimes concentrating too hard on the goal will make it less likely that the goal is achieved. A distinction between decision procedure and right action can provide a response—the consequentialist can say that the criterion of right action (act so as to maximize good consequences) is not intended as a decision procedure—the agent should use whichever decision procedure is most likely to result in success. If, then, there is some attraction or instrumental advantage from the point of view of a particular theory to thinking in pluralist terms, it is open to that theory to have a decision procedure that deals with apparently plural values, even if the theory is monist in every other way.[6] 1.2 A Purely Verbal Dispute? One final clarification about different understandings of pluralism ought to be made. There is an ambiguity between the name for a group of values and the name for one unitary value. There are really two problems here: distinguishing between the terms that refer to groups and the terms that refer to individuals (a merely linguistic problem) and defending the view that there really is a candidate for a unitary value (a metaphysical problem). The linguistic problem comes about because in natural language we may use a singular term as ‘shorthand’: conceptual analysis may reveal that surface grammar does not reflect the real nature of the concept. For example, we use the term ‘well-being’ as if it refers to one single thing, but it is not hard to see that it may not. ‘Well-being’ may be a term that we use to refer to a group of things such as pleasure, health, a sense of achievement and so on. A theory that tells us that well-being is the only value may only be nominally monist. The metaphysical question is more difficult, and concerns whether there are any genuinely unitary values at all. The metaphysical question is rather different for naturalist and non-naturalist accounts of value. On Moore’s non-naturalist account, goodness is a unitary property but it is not a natural property: it is not empirically available to us, but is known by a special faculty of intuition. It is very clear that Moore thinks that goodness is a genuinely unitary property: ‘Good’, then, if we mean by it that quality which we assert to belong to a thing, when we say that the thing is good, is incapable of any definition, in the most important sense of that word. The most important sense of ‘definition’ is that in which a definition states what are the parts which invariably compose a certain whole; and in this sense ‘good’ has no definition because it is simple and has no parts. (Moore, 1903, p. 9) The question of whether there could be such a thing is no easier or more difficult than any question about the existence of non-natural entities. The issue of whether the entity is genuinely unitary is not an especially difficult part of that question. By contrast, naturalist views do face a particular difficulty in giving an account of a value that is genuinely unitary. On the goods approach, for example, the claim must be that there is one good that is genuinely singular, not a composite of other goods. So for example, a monist hedonist must claim that pleasure really is just one thing. Pleasure is a concept we use to refer to something we take to be in the natural world, and conceptual analysis may or may not confirm that pleasure really is one thing. Perhaps, for example, we refer both to intellectual and sensual experiences as pleasure. Or, take another good often suggested by proponents of the goods approach to value, friendship. It seems highly unlikely that there is one thing that we call friendship, even if there are good reasons to use one umbrella concept to refer to all those different things. Many of the plausible candidates for the good seem plausible precisely because they are very broad terms. If a theory is to be properly monist, then, it must have an account of the good that is satisfactorily unitary. The problem applies to the deontological approach to value too. It is often relatively easy to determine whether a principle is really two or more principles in disguise—the presence of a conjunction or a disjunction, for example, is a clear giveaway. However, principles can contain terms that are unclear. Take, for example, a deontological theory that tells us to respect friendship. As mentioned previously, it is not clear whether there is one thing that is friendship or more than one, so it is not clear whether this is one principle about one thing, or one principle about several things, or whether it is really more than one principle. Questions about what makes individuals individuals and what the relationship is between parts and wholes have been discussed in the context of metaphysics, but these issues have not been much discussed in the literature on pluralism and monism in moral philosophy. However, these issues are implicit in discussions of the nature of well-being, friendship and pleasure, and in the literature on Kant’s categorical imperative, or on Aristotelian accounts of eudaimonia. Part of an investigation into the nature of these things is an investigation into whether there really is one thing or not.[7] The upshot of this brief discussion is that monists must be able to defend their claim that the value they cite is genuinely one value. There may be fewer monist theories than it first appears. Further, the monist must accept the implications of a genuinely monist view. As Ruth Chang points out (2015, p. 24), the simpler the monist’s account of the good is, the less likely it is that the monist will be able to give a good account of the various complexities in choice that seem an inevitable part of our experience of value. But on the other hand, if the monist starts to admit that the good is complex, the view gets closer and closer to being a pluralist view. However, the dispute between monists and pluralists is not merely verbal: there is no prima facie reason to think that there are no genuinely unitary properties, goods or principles. 2. The Attraction of Pluralism If values are plural, then choices between them will be complex.
Pluralists have pressed the point that choices are complex, and so we should not shy away from the hypothesis that values are plural. In brief, the attraction of pluralism is that it seems to allow for the complexity and conflict that is part of our moral experience. We do not experience our moral choices as simple additive puzzles. Pluralists have argued that there are incommensurabilities and discontinuities in value comparisons, value remainders (or residues) when choices are made, and complexities in appropriate responses to value. Recent empirical work confirms that our ethical experience is of apparently irreducibly plural values. (See Gill and Nichols, 2008.) 2.1 Discontinuities John Stuart Mill suggested that there are higher and lower pleasures (Mill, 2002, p. 241), the idea being that the values of higher and lower pleasures are measured on different scales. In other words, there are discontinuities in the measurement of value. As mentioned previously, it is unclear whether we should interpret Mill as a foundational pluralist, but the notion of higher and lower pleasures is a very useful one to illustrate the attraction of thinking that there are discontinuities in value. The distinction between higher and lower pleasures allows us to say that no amount of lower pleasures can outweigh some amount of higher pleasures. As Mill puts it, it is better to be an unhappy human being than a happy pig. In other words, the distinction allows us to say that there are discontinuities in value addition. As James Griffin (1986, p. 87) puts it: “We do seem, when informed, to rank a certain amount of life at a very high level above any amount of life at a very low level.” Griffin’s point is that there are discontinuities in the way we rank values, and this suggests that there are different values.[8] The phenomenon of discontinuities in our value rankings seems to support pluralism: if higher pleasures are not outweighed by lower pleasures, that suggests that they are not the same sort of thing. For if they were just the same sort of thing, there would seem to be no reason why lower pleasures should not eventually outweigh higher pleasures. The most extreme form of discontinuity is incommensurability or incomparability, when two values cannot be ranked at all. Pluralists differ on whether pluralism entails incommensurabilities, and on what incommensurability entails for the possibility of choice. Griffin denies that pluralism entails incommensurability (Griffin uses the term incomparability), whereas other pluralists embrace incommensurability, but deny that it entails that rational choice is impossible. Some pluralists accept that there are sometimes cases where incommensurability precludes rational choice. We shall return to these issues in Section 4. 2.2 Value Conflicts and Rational Regret Michael Stocker (1990), Bernard Williams (1973 and 1981) and others have argued that it can be rational to regret the outcome of a correct moral choice. That is, even when the right choice has been made, the rejected option can reasonably be regretted, and so the choice involves a genuine value conflict. This seems strange if the options are being compared in terms of a supervalue. How can we regret having chosen more rather than less of the same thing? Yet the phenomenon seems undeniable, and pluralism can explain it. If there are plural values, then one can rationally regret not having chosen something which, though less good, was different.
It is worth noting that the pluralist argument is not that all cases of value conflict point to pluralism. There may be conflicts because of ignorance, for example, or because of irrationality, and these do not require positing plural values. Stocker argues that there are (at least) two sorts of value conflict that require plural values. The first is conflict that involves choices between doing things at different times. Stocker argues that goods become different values in different temporal situations, and the monist cannot accommodate this thought. The other sort of case (which Williams also points to) is when there is a conflict between things that have different advantages and disadvantages. The better option may be better, but it does not ‘make up for’ the lesser option, because it isn’t the same sort of thing. Thus there is a remainder—a moral value that is lost in the choice, and that it is rational to regret. Both Martha Nussbaum (1986) and David Wiggins (1980) have argued for pluralism on the grounds that only pluralism can explain akrasia, or weakness of will. An agent is said to suffer from weakness of will when she knowingly chooses a less good option over a better one. On the face of it, this is a puzzling thing to do—why would someone knowingly do what they know to be worse? A pluralist has a plausible answer—when the choice is between two different sorts of value, the agent is preferring A to B, rather than preferring less of A to more of A. Wiggins explains the akratic choice by suggesting that the agent is ‘charmed’ by some aspect of the choice, and is swayed by that to choose what she knows to be worse overall (Wiggins 1980, p. 257). However, even Michael Stocker, the arch-pluralist, does not accept that this argument works. As Stocker points out, Wiggins is using a distinction between a cognitive and an affective element to the choice, and this distinction can explain akrasia on a monist account of value too. Imagine that a monist hedonist agent is faced with a choice between something that will give her more pleasure and something that will give her less pleasure. The cognitive aspect to the choice is clear—the agent knows that one option is more pleasurable than the other, and hence on her theory better. However, to say that the agent believes that more pleasure is better is not to say that she will always be attracted to the option that is most pleasurable. She may, on occasion, be attracted to the option that is more unusual or interesting. Hence she may act akratically because she was charmed by some aspect of the less good choice—and as Stocker says, there is no need to posit plural values to make sense of this—being charmed is not the same as valuing (Stocker 1990, p. 219). 2.3 Appropriate Responses to Value Another argument for pluralism starts from the observation that there are many and diverse appropriate responses to value. Christine Swanton (2003, ch. 2) and Elizabeth Anderson (1993) both take this line. As Swanton puts it: According to value centered monism, the rightness of moral responsiveness is determined entirely by degree or strength of value…I shall argue, on the contrary, that just how things are to be pursued, nurtured, respected, loved, preserved, protected, and so forth may often depend on further general features of those things, and their relations to other things, particularly the moral agent. (Swanton 2003, p. 41). The crucial thought is that there are various bases of moral responsiveness, and these bases are irreducibly plural.
A monist could argue that there are different appropriate responses to value, but the monist would have to explain why there are different appropriate responses to the same value. Swanton’s point is that the only explanation the monist has is that different degrees of value merit different responses. According to Swanton, this does not capture what is really going on when we appropriately honor or respect a value rather than promoting it. Anderson and Swanton both argue that the complexity of our responses to value can only be explained by a pluralistic theory. Elizabeth Anderson argues that it is a mistake to understand moral goods on the maximising model. She uses the example of parental love (Anderson 1997, p. 98). Parents should not see their love for their children as being directed towards an “aggregate child collective”. Such a view would entail that trade-offs were possible, that one child could be sacrificed for another. On Anderson’s view we can make rational choices between conflicting values without ranking values: “…choices concerning those goods or their continued existence do not generally require that we rank their values on a common scale and choose the more valuable good; they require that we give each good its due” (Anderson 1997, p. 104). 3. Monist Solutions I began the last section by saying that if foundational values are plural, then choices between them will be complex. It is clear that our choices are complex. However, it would be invalid to conclude from this that values are plural—the challenge for monists is to explain how they too can make sense of the complexity of our value choices. 3.1 Different Bearers of Value One way for monists to make sense of complexity in value choice is to point out that there are different bearers of value, and this makes a big difference to the experience of choice. (See Hurka 1996; Schaber 1999; Klocksiem 2011.) Here is the challenge to monism in Michael Stocker’s words (Stocker, 1990, p. 272): “[if monism is true] there is no ground for rational conflict because the better option lacks nothing that would be made good by the lesser.” In other words, there are no relevant differences between the better and worse options except that the better option is better. Thomas Hurka objects that there can be such differences. For example, in a choice between giving five units of pleasure to A and ten units to B, the best option (more pleasure for B) involves giving no pleasure at all to A. So there is something to rationally regret, namely, that A had no pleasure. The argument can be expanded to deal with all sorts of choice situations: in each situation, a monist can say something sensible about an unavoidable loss, a loss that really is a loss. If, of two options, one will contribute more basic value, the monist must obviously choose that one. But the lesser of the options may contribute value via pleasure, while the superior option contributes value via knowledge, and so there is a loss in choosing the option with the greater value contribution—a loss in pleasure—and it is rational for us to regret this. There is one difficulty with this answer. The loss described by Hurka is not a moral loss, and so the regret is not moral regret. In Hurka’s example, the relevant loss is that A does not get any pleasure. The agent doing the choosing may be rational to regret this if she cares about A, or even if she just feels sorry for A, but there has been no moral loss, as ‘pleasure for A’ as opposed to pleasure itself is not a moral value.
According to the view under consideration, pleasure itself is what matters morally, and so although A’s pleasure matters qua pleasure, the moral point of view takes B’s pleasure into account in just the same way, and there is nothing to regret, as there is more pleasure than there would otherwise have been. Stocker and Williams would surely insist that the point of their argument was not just that there is a loss, but that there is a moral loss. The monist cannot accommodate that point, as the monist can only consider the quantity of the value, not its distribution, and so we are at an impasse. However, the initial question was whether the monist has succeeded in explaining the phenomenon of ‘moral regret’, and perhaps Hurka has done that by positing a conflation of moral and non-moral regret in our experience. From our point of view, there is regret, and the monist can explain why that is without appealing to irrationality. On the other hand the monist cannot appeal to anything other than quantity of value in appraising the morality of the situation. So although Hurka is clearly right insofar as he is saying that a correct moral choice can be regretted for non-moral reasons, he can go no further than that. 3.2 Diminishing Marginal Value Another promising strategy that the monist can use in order to explain the complexity in our value choices is the appeal to ‘diminishing marginal value’. The value that is added to the sum by a source of value will tend to diminish after a certain point—this phenomenon is known as diminishing marginal value (or, sometimes, diminishing marginal utility). Mill’s higher and lower pleasures, which seem to be plural values, might be accommodated by the monist in this way. The monist makes sense of discontinuities in value by insisting on the distinction between sources of value, which are often ambiguously referred to as ‘values’, and the supervalue. Using a monist utilitarian account of value, we can distinguish between the non-evaluative description of options, the intermediate description, and the evaluative description. On this account, painting produces beauty, and beauty (which is not a value but the intermediate source of value) produces value. Similarly, reading a book produces knowledge, and gaining knowledge produces value. Now it should be clear how the monist can make sense of phenomena like higher and lower pleasures. The non-evaluative options (e.g., eating donuts) have diminishing marginal non-basic value. On top of that, the intermediate effect, or non-basic value (e.g., experiencing pleasure), can have a diminishing contribution to value. Varying diminishing marginal value in these cases is easily explained psychologically. It is just the way we are—we get less and less enjoyment from donuts as we eat more and more (at least in one sitting). However, we may well get the same amount of enjoyment from the tenth Johnny Cash song that we did from the first. In order to deal with the higher and lower pleasures case the monist will have to argue that pleasures themselves can have diminishing marginal utility—the monist can argue that gustatory pleasure gets boring after a while, and hence contributes less and less to the supervalue—well-being, or whatever it is.[9] This picture brings us back to the distinction between foundational and non-foundational pluralism.
Notice that the monist theories being imagined here are foundationally monist, because they claim that there is fundamentally one value, such as pleasure, and they are pluralist at the level of ordinary choice because they claim that there are intermediate values, such as knowledge and beauty, which are valuable because of the amount of pleasure they produce (or realize, or contain—the exact relationship will vary from theory to theory). 3.3 Theoretical Virtues The main advantage of pluralism is that it seems true to our experience of value. We experience values as plural, and pluralism tells us that values are indeed plural. The monist can respond, as we have seen, that there are ways to explain the apparent plurality of values without positing fundamentally plural values. Another, complementary strategy that the monist can pursue is to argue that monism has theoretical virtues that pluralism lacks. In general, it seems that theories should be as simple and coherent as possible, and that other things being equal, we should prefer a more coherent theory to a less coherent one. Thus, so long as monism can make sense of enough of our intuitive judgments about the nature of value, it is to be preferred to pluralism because it does better on the theoretical virtue of coherence. Another way to put this point is in terms of explanation. The monist can point out that the pluralist picture lacks explanatory depth. It seems that a list of values needs some further explanation: what makes these things values? (See Bradley, 2009, p. 16). The monist picture is superior, because the monist can provide an explanation for the value of the (non-foundational) plurality of values: these things are values because they contribute to well-being, or pleasure, or whatever the foundational monist value is. (See also the discussion of this in the entry on value theory). Patricia Marino argues against this strategy (2016). She argues that ‘systematicity’ (the idea that it is better to have fewer principles) is not a good argument in favour of monism. Marino points out that explanation in terms of fewer fundamental principles is not necessarily better explanation. If there are plural values, then the explanation that appeals to plural values is a better one, in the sense that it is the true one: it doesn’t deny the plurality of values (2016, pp. 124–125). Even if we could give a monist explanation without having to trade off against our pluralist intuitions, Marino argues, we have no particular reason to think that explanations appealing to fewer principles are superior. 3.4 Preference Satisfaction Views There is a different account of value that we ought to consider here: the view that value consists in preference or desire satisfaction. On this view, knowledge and pleasure and so on are valuable when they are desired, and if they are not desired anymore they are not valuable anymore. There is no need to appeal to complicated accounts of diminishing marginal utility: it is uncontroversial that we sometimes desire something and sometimes don’t. Thus complexities in choices are explained by complexities in our desires, and it is uncontroversial that our desires are complex. Imagine a one-person preference satisfaction account of value that says simply that what is valuable is what P desires. Apparently this view is foundationally monist: there is only one thing that confers value (being desired by P), yet at the non-foundational level there are many values (whatever P desires).
Let us say that P desires hot baths, donuts and knowledge. The structure of P’s desires is such that there is a complicated ranking of these things, which will vary from circumstance to circumstance. The ranking is not explained by the value of the objects; rather, her desire explains the ranking and determines the value of the objects. So it might be that P sometimes desires a hot bath and a donut equally, and cannot choose between them; it might be that sometimes she would choose knowledge over a hot bath and a donut, but sometimes she would choose a hot bath over knowledge. On James Griffin’s slightly more complex view, well-being consists in the fulfillment of informed desire, and Griffin points out that his view can explain discontinuities in value without having to appeal to diminishing marginal utility: there may well turn out to be cases in which, when informed, I want, say, a certain amount of one thing more than any amount of another, and not because the second thing cloys, and so adding to it merely produces diminishing marginal values. I may want it even though the second thing does not, with addition, lose its value; it may be that I think that no increase in that kind of value, even if constant and positive, can overtake a certain amount of this kind of value. (1986, p. 76). This version of foundational monism/normative pluralism escapes some of the problems that attend the goods approach. First, this view can account for deep complexities in choice. The plural goods that P is choosing between do not seem merely instrumental. Donuts are not good because they contribute to another value, and P does not desire donuts for any reason other than their donuty nature. On this view, if it is hard to choose between donuts and hot baths it is because of the intrinsic nature of the objects. The key here is that value is conferred by desire, not by contribution to another value. Second, this view can accommodate incomparabilities: if P desires a hot bath because of its hot bathy nature, and a donut because of its donuty nature, she may not be able to choose between them. However, it is not entirely clear that a view like Griffin’s is genuinely monist at the foundational level: the question arises, what is constraining the desires that qualify as value-conferring? If the answer is ‘nothing’, then the view seems genuinely monist, but is probably implausible. Unconstrained desire accounts of value seem implausible because our desires can be for all sorts of things—we may desire things that are bad for us, or we may desire things because of some mistake we have made. If the answer is that there is something constraining the desires that count as value-conferring, then of course the question is, ‘what?’ Is it the values of the things desired? A desire satisfaction view that restricts the qualifying desires must give an account of what restricts them, and obviously, the account may commit the view to foundational pluralism. Griffin addresses this question at the very beginning of his book on well-being (Griffin, 1986, ch. 2).[10] As he puts it, The danger is that desire accounts get plausible only by, in effect, ceasing to be desire accounts. We had to qualify desire with informed, and that gave prominence to the features or qualities of the objects of desire, and not to the mere existence of desire. (1986, p. 26).
Griffin’s account of the relationship between desire and value is subtle, and (partly because Griffin himself does not distinguish between foundational and normative pluralism) it is difficult to say whether his view is foundationally pluralist or not. Griffin argues that it is a mistake to see desire as a blind motivational force—we desire things that we perceive in a favorable light—we take them to have a desirability feature. When we try to explain what is involved in seeing things in a favorable light, we cannot, according to Griffin, separate understanding from desire: …we cannot, even in the case of a desirability feature such as accomplishment, separate understanding and desire. Once we see something as ‘accomplishment’, as ‘giving weight and substance to our lives’, there is no space left for desire to follow along in a secondary subordinate position. Desire is not blind. Understanding is not bloodless. Neither is the slave of the other. There is no priority. (1986, p. 30) This suggests that the view is indeed pluralist at the foundation—values are not defined entirely by desire, but partly by other features of the situation, and so at the most fundamental level there is more than one value-making feature. Griffin himself says that “the desire account is compatible with a strong form of pluralism about values” (p. 31). I shall not pursue further the question of whether or not Griffin is a foundational pluralist; my aim in this section has been to show, first, that monist preference satisfaction accounts of value may have more compelling ways of explaining complexities in value comparison than monist goods approaches, and, second, to point out that any constrained desire account may well actually be foundationally pluralist. As soon as something is introduced to constrain the desires that qualify as value-conferring, it looks as though another value is operating. 4. Pluralism and Rational Choice The big question facing pluralism is whether rational choices can be made between irreducibly plural values. Irreducible plurality appears to imply incommensurability—that is to say, that there is no common measure which can be used to compare two different values. (See the entry on incommensurable values.) Value incommensurability seems worrying: if values are incommensurable, then either we are forced into an ad hoc ranking, or we cannot rank the values at all. Neither of these is a very appealing option. However, pluralists reject this dilemma. Bernard Williams argues that it is a mistake to think that pluralism implies that comparisons are impossible. He says: There is one motive for reductivism that does not operate simply on the ethical, or on the non-ethical, but tends to reduce every consideration to one basic kind. This rests on an assumption about rationality, to the effect that two considerations cannot be rationally weighed against each other unless there is a common consideration in terms of which they can be compared. This assumption is at once very powerful and utterly baseless. Quite apart from the ethical, aesthetic considerations can be weighed against economic ones (for instance) without being an application of them, and without their both being an example of a third kind of consideration. (Williams 1985, p. 17) Making a similar point, Ruth Chang points out that incommensurability is often conflated with incomparability. She provides clear definitions of each: incommensurability is the lack of a common unit of value by which precise comparisons can be made.
Two items are incomparable if there is no possible relation of comparison, such as ‘better than’, or ‘as good as’ (1997, Introduction). Chang points out that incommensurability is often thought to entail incomparability, but it does not. Defenders of pluralism have used various strategies to show that it is possible to make rational choices between plural values. 4.1 Practical Wisdom The pluralist’s most common strategy in the face of worries about choices between incommensurable values is to appeal to practical wisdom—the faculty described by Aristotle—a faculty of judgment that the wise and virtuous person has, which enables him to see the right answer. Practical wisdom is not just a question of being able to see and collate the facts; it goes beyond that in some way—the wise person will see things that only a wise person could see. So plural values can be compared in that a wise person will ‘just see’ that one course of action rather than another is to be taken. This strategy is used (explicitly or implicitly) by McDowell (1979), Nagel (1979), Larmore (1987), Skorupski (1996), Anderson (1993 and 1997), Wiggins (1997 and 1998), Chappell (1998), and Swanton (2003). Here it is in Nagel’s words: Provided one has taken the process of practical justification as far as it will go in the course of arriving at the conflict, one may be able to proceed without further justification, but without irrationality either. What makes this possible is judgment—essentially the faculty Aristotle described as practical wisdom, which reveals itself over time in individual decisions rather than in the enunciation of general principles. (1979, p. 135) The main issue for this solution to the comparison problem is to come up with an account of what practical wisdom is. It is not easy to understand what sort of thing the faculty of judgment might be, or how it might work. Obviously pluralists who appeal to this strategy do not want to end up saying that the wise judge can see which of the options has more goodness, as that would constitute collapsing back into monism. So the pluralist has to maintain that the wise judge makes a judgment about what the right thing to do is without making any quantitative judgment. The danger is that the faculty seems entirely mysterious: it is a kind of magical vision, unrelated to our natural senses. As a solution to the comparison problem, the appeal to practical wisdom looks rather like a way of shifting the problem to another level. Thus the appeal to practical wisdom cannot be left at that. The pluralist owes more explanation of what is involved in practical wisdom. What follows below are various pluralists’ accounts of how choice between plural values is possible, and whether such choice is rational. 4.2 Super Scales One direction that pluralists have taken is to argue that although values are plural, there is nonetheless an available scale on which to rank them. This scale is not rationalized by something that the values have in common (that would be monism), but by something over and above the values, which is not itself a super-value. Williams sometimes writes as if this is his intention, as do Griffin (1986 and 1997), Stocker (1990), Chang (1997 and 2004), and Taylor (1982 and 1997). James Griffin (1986) develops this suggestion in his discussion of plural prudential values. According to Griffin, we do not need to have a super-value to have a super-scale. Griffin says: …it does not follow from there being no super-value that there is no super-scale. To think so would be to misunderstand how the notion of ‘quantity’ of well-being enters. It enters through ranking; quantitative differences are defined on qualitative ones. The quantity we are talking about is ‘prudential value’ defined on informed rankings. All that we need for the all-encompassing-scale is the possibility of ranking items on the basis of their nature. And we can, in fact, rank them in that way. We can work out trade-offs between different dimensions of pleasure or happiness. And when we do, we rank in a strong sense: not just choose one rather than the other, but regard it as worth more. That is the ultimate scale here: worth to one’s life. (Griffin 1986, p. 90) This passage is slightly hard to interpret (for more on why see my earlier discussion of Griffin in the section on preference satisfaction accounts). On one interpretation, Griffin is in fact espousing a sophisticated monism. The basic value is ‘worth to one’s life’, and though it is important to talk about non-basic values, such as the different dimensions of pleasure and happiness, they are ultimately judged in terms of their contribution to the worth of lives. The second possible interpretation takes seriously Griffin’s claim that worth to life is not a supervalue. On this interpretation, it is hard to see what worth to life is, if not a supervalue. Perhaps it is only a value that we should resort to when faced with incomparabilities. However, this interpretation invites the criticism that Griffin is introducing a non-moral value, perhaps prudential value, to arbitrate when moral values are incommensurable. In other words, we cannot decide between incommensurable values on moral grounds, so we should decide on prudential grounds. This seems reasonable when applied to incommensurabilities in aesthetic values. One might not be able to say whether Guernica is better than War and Peace, but one might choose to have Guernica displayed on the wall because it will impress one’s friends, or because it is worth more money, or even because one just enjoys it more. In the case of moral choices this is a less convincing strategy: it introduces a level of frivolity into morality that seems out of place. Stocker’s main strategy is to argue that values are plural, and comparisons are made, so it must be possible to make rational comparisons. He suggests that a “higher level synthesizing category” can explain how comparisons are made (1990, p. 172). According to Stocker these comparisons are not quantitative; they are evaluative: Suppose we are trying to choose between lying on a beach and discussing philosophy—or more particularly, between the pleasure of the former and the gain in understanding from the latter. To compare them we may invoke what might be called a higher-level synthesizing category. So, we may ask which will conduce to a more pleasing day, or to a day that is better spent. Once we have fixed upon the higher synthesizing category, we can often easily ask which option is better in regard to that category and judge which to choose on the basis of that. Even if it seems a mystery how we might ‘directly’ compare lying on the beach and discussing philosophy, it is a commonplace that we do compare them, e.g. in regard to their contribution to a pleasing day. (Stocker 1990, p. 72) Stocker claims that goodness is just the highest level synthesizing category, and that lower goods are constitutive means to the good.
Ruth Chang’s approach to comparisons of plural values is very similar (Chang 1997, Introduction, and 2004). Chang claims that comparisons can only be made in terms of a covering value—a more comprehensive value that has the plural values as parts. There is a problem in understanding quite what a ‘synthesizing category’ or ‘covering value’ is. How does the covering value determine the relative weightings of the constituent values? One possibility is that it does it by pure stipulation—as a martini just is a certain proportion of gin and vermouth. However, stipulation does not have the right sort of explanatory power. On the other hand, if a view is to remain pluralist, it must avoid conflating the super-scale with a super-value. Chang argues that her covering values are sufficiently unitary to provide a basis for comparison, and yet preserve the separateness of the other values. Chang’s argument goes as follows: the values at stake in a situation (for example, prudence and morality) cannot on their own determine how heavily they weigh in a particular choice situation—the values weigh differently depending on the circumstances of the choice. However, the values plus the circumstances cannot determine relevant weightings either—because (I am simplifying here) the internal circumstances of the choice will affect the weighting of the values differently depending on the external circumstances. To use Chang’s own example, when the values at stake are prudence and morality (specifically, the duty to help an innocent victim), and the circumstances include the fact that the victim is far away, the effect this circumstance will have on the weighting of the values depends on external circumstances, which fix what matters in the choice. So, as Chang puts it, “‘What matters’ must therefore have content beyond the values and the circumstances of the choice” (2004, p. 134).

Stocker is aware of the worry that appeal to something in terms of which comparisons can be made reduces the view to monism: Stocker insists that the synthesizing category (such as a good life) is not a unitary value—it is at most ‘nominal monism’ in my terminology. Stocker argues that it is a philosophical prejudice to think that rational judgment must be quantitative, and so he claims that he does not need to give an account of how we form and use the higher level synthesizing categories.

4.3 Basic Preferences

Another approach to the comparison problem appeals to basic preferences. Joseph Raz takes the line that we can explain choice between irreducibly plural goods by talking about basic preferences. Raz approaches the issue of incommensurability by talking about the nature of agency and rationality instead of about the nature of value. He distinguishes between two conceptions of human agency: the rationalist conception, and the classical conception. The rationalist conception corresponds to what we have called the stronger use of the term rational. According to the rationalist conception, reasons require action. The classical conception, by contrast, “regards reasons as rendering options eligible” (Raz 1999, p. 47). Raz favors the classical conception, which regards the will as something separate from desire:

The will is the ability to choose and perform intentional actions. We exercise our will when we endorse the verdict of reason that we must perform an action, and we do so, whether willingly, reluctantly, or regretting the need, etc.
According to the classical conception, however, the most typical exercise or manifestation of the will is in choosing among options that reason merely renders eligible. Commonly when we so choose, we do what we want, and we choose what we want, from among the eligible options. Sometimes speaking of wanting one option (or its consequences) in preference to the other eligible ones is out of place. When I choose one tin of soup from a row of identical tins in the shop, it would be wrong and misleading to say that I wanted that tin rather than, or in preference to, the others. Similarly, when faced with unpalatable but unavoidable and incommensurate options (as when financial need forces me to give up one or another of incommensurate goods), it would be incorrect to say that I want to give up the one I choose to give up. I do not want to do so. I have to, and I would equally have regretted the loss of either good. I simply choose to give up one of them. (Raz 1999, p. 48)

Raz’s view about the nature of agency is defended in great detail over the course of many articles, and those arguments cannot all be examined in detail here. What is crucial in the context of this discussion of pluralism is whether Raz gives us a satisfactory account of the weaker sense of rational. Raz’s solution to the problem of incommensurability hangs on the claim that it can be rational (in the weak sense) to choose A over B when there are no further reasons favouring A over B. We shall restrict ourselves to mentioning one objection to the view in the context of moral choices between plural goods. Though Raz’s account of choice may seem plausible in cases where we choose between non-moral values, it seems to do violence to the concept of morality. Consider one of Raz’s own examples, the choice between a banana and a pear. It may be that one has to choose between them, and there is no objective reason to choose one or the other. In this case, it seems Raz’s account of choice is plausible. If one feels like eating a banana, then in this case desire does provide a reason. As Raz puts it, “A want can never tip the balance of reasons in and of itself. Rather, our wants become relevant when reasons have run their course.” In the example where we choose between a banana and a pear, this sounds fine. However, if we apply it to a moral choice it seems a lot less plausible. Raz admits that “If of the options available to agents in typical situations of choice and decision, several are incommensurate, then reason can neither determine nor completely explain their choices or actions” (Raz 1999, p. 48). Thus many moral choices are not directed by reason but by a basic preference. It is not fair to call it a desire, because on Raz’s account we desire things for reasons—we take the object of our desire to be desirable. On Raz’s picture then, when reasons have run their course, we are choosing without reasons. It doesn’t matter hugely whether we call that ‘rational’ (it is not rational in the strong sense, but it is in the weak sense). What matters is whether this weak sense of rational is sufficient to satisfy our concept of moral choice as being objectively defensible. The problem is that choosing without reasons looks rather like plumping. Plumping may be an intelligible form of choice, but it is questionable whether it is a satisfactory account of moral choice.
4.4 Accepting Incomparability

One philosopher who is happy to accept that there may be situations where we just cannot make reasoned choices between plural values is Isaiah Berlin, who claimed that goods such as liberty and equality conflict at the fundamental level. Berlin is primarily concerned with political pluralism, and with defending political liberalism, but his views about incomparability have been very influential in discussions of moral pluralism. Bernard Williams (1981), Charles Larmore (1987), John Kekes (1993), Michael Stocker (1990 and 1997), and David Wiggins (1997) have all argued that there are at least some genuinely irresolvable conflicts between values, and that to expect a rational resolution is a mistake. For Williams this is part of a more general mistake made by contemporary moral philosophers—he thinks that philosophy tries to make ethics too easy, too much like arithmetic. Williams insists throughout his writings that ethics is a much more complex and multi-faceted beast than its treatment at the hands of moral philosophers would suggest, and so it is not surprising to him that there should be situations where values conflict irresolvably. Stocker (1990) discusses the nature of moral conflict at great length, and although he thinks that many apparent conflicts can be dissolved or are not serious, like Williams, he argues that much of contemporary philosophy’s demand for simplicity is mistaken. Stocker argues that ethics need not always be action guiding, that value is much more complex than Kantians and utilitarians would have us think, and that as the world is complicated we will inevitably face conflicts. Several pluralists have argued that accepting the inevitability of value conflicts does not result in a breakdown of moral argument, but rather the reverse. Kekes (1993), for example, claims that pluralism enables us to see that irresolvable disagreements are not due to wickedness on the part of our interlocutor, but may be due to the plural nature of values.

5. Conclusion

The battle lines in the debate between pluralism and monism are not always clear. In this entry I have outlined some of them, and discussed some of the main arguments. Pluralists need to be clear about whether they are foundational or non-foundational pluralists. Monists must defend their claim that there really is a unitary value. Much of the debate between pluralists and monists has focussed on the issue of whether the complexity of moral choice implies that values really are plural—a pattern emerges in which the monist claims to be able to explain the appearance of plurality away, and the pluralist insists that the appearance reflects a pluralist reality. Finally, pluralists must explain how comparisons between values are made, or defend the consequence that incommensurability is widespread.


Intrinsic vs. Extrinsic Value

1. What Has Intrinsic Value?

The question “What is intrinsic value?” is more fundamental than the question “What has intrinsic value?,” but historically these have been treated in reverse order. For a long time, philosophers appear to have thought that the notion of intrinsic value is itself sufficiently clear to allow them to go straight to the question of what should be said to have intrinsic value. Not even a potted history of what has been said on this matter can be attempted here, since the record is so rich. Rather, a few representative illustrations must suffice.

In his dialogue Protagoras, Plato [428–347 B.C.E.] maintains (through the character of Socrates, modeled after the real Socrates [470–399 B.C.E.], who was Plato’s teacher) that, when people condemn pleasure, they do so, not because they take pleasure to be bad as such, but because of the bad consequences they find pleasure often to have. For example, at one point Socrates says that the only reason why the pleasures of food and drink and sex seem to be evil is that they result in pain and deprive us of future pleasures (Plato, Protagoras, 353e). He concludes that pleasure is in fact good as such and pain bad, regardless of what their consequences may on occasion be. In the Timaeus, Plato seems quite pessimistic about these consequences, for he has Timaeus declare pleasure to be “the greatest incitement to evil” and pain to be something that “deters from good” (Plato, Timaeus, 69d). Plato does not think of pleasure as the “highest” good, however. In the Republic, Socrates states that there can be no “communion” between “extravagant” pleasure and virtue (Plato, Republic, 402e) and in the Philebus, where Philebus argues that pleasure is the highest good, Socrates argues against this, claiming that pleasure is better when accompanied by intelligence (Plato, Philebus, 60e). Many philosophers have followed Plato’s lead in declaring pleasure intrinsically good and pain intrinsically bad. Aristotle [384–322 B.C.E.], for example, himself a student of Plato’s, says at one point that all are agreed that pain is bad and to be avoided, either because it is bad “without qualification” or because it is in some way an “impediment” to us; he adds that pleasure, being the “contrary” of that which is to be avoided, is therefore necessarily a good (Aristotle, Nicomachean Ethics, 1153b). Over the course of the more than two thousand years since this was written, this view has been frequently endorsed.

Like Plato, Aristotle does not take pleasure and pain to be the only things that are intrinsically good and bad, although some have maintained that this is indeed the case. This more restrictive view, often called hedonism, has had proponents since the time of Epicurus [341–271 B.C.E.].[1] Perhaps the most thorough renditions of it are to be found in the works of Jeremy Bentham [1748–1832] and Henry Sidgwick [1838–1900] (see Bentham 1789, Sidgwick 1907); perhaps its most famous proponent is John Stuart Mill [1806–1873] (see Mill 1863). Most philosophers who have written on the question of what has intrinsic value have not been hedonists; like Plato and Aristotle, they have thought that something besides pleasure and pain has intrinsic value. One of the most comprehensive lists of intrinsic goods that anyone has suggested is that given by William Frankena (Frankena 1973, pp. 87–88):
life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom; beauty, harmony, proportion in objects contemplated; aesthetic experience; morally good dispositions or virtues; mutual affection, love, friendship, cooperation; just distribution of goods and evils; harmony and proportion in one’s own life; power and experiences of achievement; self-expression; freedom; peace, security; adventure and novelty; and good reputation, honor, esteem, etc. (Presumably a corresponding list of intrinsic evils could be provided.)

Almost any philosopher who has ever addressed the question of what has intrinsic value will find his or her answer represented in some way by one or more items on Frankena’s list. (Frankena himself notes that he does not explicitly include in his list the communion with and love and knowledge of God that certain philosophers believe to be the highest good, since he takes them to fall under the headings of “knowledge” and “love.”) One conspicuous omission from the list, however, is the increasingly popular view that certain environmental entities or qualities have intrinsic value (although Frankena may again assert that these are implicitly represented by one or more items already on the list). Some find intrinsic value, for example, in certain “natural” environments (wildernesses untouched by human hand); some find it in certain animal species; and so on.

Suppose that you were confronted with some proposed list of intrinsic goods. It would be natural to ask how you might assess the accuracy of the list. How can you tell whether something has intrinsic value or not? On one level, this is an epistemological question with which this article will not be concerned. (See the entry in this encyclopedia on moral epistemology.) On another level, however, this is a conceptual question, for we cannot be sure that something has intrinsic value unless we understand what it is for something to have intrinsic value.

2. What Is Intrinsic Value?

The concept of intrinsic value has been characterized above in terms of the value that something has “in itself,” or “for its own sake,” or “as such,” or “in its own right.” The custom has been not to distinguish between the meanings of these terms, but we will see that there is reason to think that there may in fact be more than one concept at issue here. For the moment, though, let us ignore this complication and focus on what it means to say that something is valuable for its own sake as opposed to being valuable for the sake of something else to which it is related in some way. Perhaps it is easiest to grasp this distinction by way of illustration.

Suppose that someone were to ask you whether it is good to help others in time of need. Unless you suspected some sort of trick, you would answer, “Yes, of course.” If this person were to go on to ask you why acting in this way is good, you might say that it is good to help others in time of need simply because it is good that their needs be satisfied. If you were then asked why it is good that people’s needs be satisfied, you might be puzzled. You might be inclined to say, “It just is.” Or you might accept the legitimacy of the question and say that it is good that people’s needs be satisfied because this brings them pleasure.
But then, of course, your interlocutor could ask once again, “What’s good about that?” Perhaps at this point you would answer, “It just is good that people be pleased,” and thus put an end to this line of questioning. Or perhaps you would again seek to explain the fact that it is good that people be pleased in terms of something else that you take to be good. At some point, though, you would have to put an end to the questions, not because you would have grown tired of them (though that is a distinct possibility), but because you would be forced to recognize that, if one thing derives its goodness from some other thing, which derives its goodness from yet a third thing, and so on, there must come a point at which you reach something whose goodness is not derivative in this way, something that “just is” good in its own right, something whose goodness is the source of, and thus explains, the goodness to be found in all the other things that precede it on the list. It is at this point that you will have arrived at intrinsic goodness (cf. Aristotle, Nicomachean Ethics, 1094a). That which is intrinsically good is nonderivatively good; it is good for its own sake. That which is not intrinsically good but extrinsically good is derivatively good; it is good, not (insofar as its extrinsic value is concerned) for its own sake, but for the sake of something else that is good and to which it is related in some way. Intrinsic value thus has a certain priority over extrinsic value. The latter is derivative from or reflective of the former and is to be explained in terms of the former. It is for this reason that philosophers have tended to focus on intrinsic value in particular.

The account just given of the distinction between intrinsic and extrinsic value is rough, but it should do as a start. Certain complications must be immediately acknowledged, though. First, there is the possibility, mentioned above, that the terms traditionally used to refer to intrinsic value in fact refer to more than one concept; again, this will be addressed later (in this section and the next). Another complication is that it may not in fact be accurate to say that whatever is intrinsically good is nonderivatively good; some intrinsic value may be derivative. This issue will be taken up (in Section 5) when the computation of intrinsic value is discussed; it may be safely ignored for now. Still another complication is this. It is almost universally acknowledged among philosophers that all value is “supervenient on,” or “grounded in,” certain nonevaluative features of the thing that has value. Roughly, what this means is that, if something has value, it will have this value in virtue of certain nonevaluative features that it has; its value can be attributed to these features. For example, the value of helping others in time of need might be attributed to the fact that such behavior has the feature of being causally related to certain pleasant experiences induced in those who receive the help. Suppose we accept this and accept also that the experiences in question are intrinsically good. In saying this, we are (barring the complication to be discussed in Section 5) taking the value of the experiences to be nonderivative. Nonetheless, we may well take this value, like all value, to be supervenient on, or grounded in, something. In this case, we would probably simply attribute the value of the experiences to their having the feature of being pleasant.
This brings out the subtle but important point that the question whether some value is derivative is distinct from the question whether it is supervenient. Even nonderivative value (value that something has in its own right; value that is, in some way, not attributable to the value of anything else) is usually understood to be supervenient on certain nonevaluative features of the thing that has value (and thus to be attributable, in a different way, to these features). To repeat: whatever is intrinsically good is (barring the complication to be discussed in Section 5) nonderivatively good. It would be a mistake, however, to affirm the converse of this and say that whatever is nonderivatively good is intrinsically good. As “intrinsic value” is traditionally understood, it refers to a particular way of being nonderivatively good; there are other ways in which something might be nonderivatively good. For example, suppose that your interlocutor were to ask you whether it is good to eat and drink in moderation and to exercise regularly. Again, you would say, “Yes, of course.” If asked why, you would say that this is because such behavior promotes health. If asked what is good about being healthy, you might cite something else whose goodness would explain the value of health, or you might simply say, “Being healthy just is a good way to be.” If the latter were your response, you would be indicating that you took health to be nonderivatively good in some way. In what way, though? Well, perhaps you would be thinking of health as intrinsically good. But perhaps not. Suppose that what you meant was that being healthy just is “good for” the person who is healthy (in the sense that it is in each person’s interest to be healthy), so that John’s being healthy is good for John, Jane’s being healthy is good for Jane, and so on. You would thereby be attributing a type of nonderivative interest-value to John’s being healthy, and yet it would be perfectly consistent for you to deny that John’s being healthy is intrinsically good. If John were a villain, you might well deny this. Indeed, you might want to insist that, in light of his villainy, his being healthy is intrinsically bad, even though you recognize that his being healthy is good for him. If you did say this, you would be indicating that you subscribe to the common view that intrinsic value is nonderivative value of some peculiarly moral sort.[2] Let us now see whether this still rough account of intrinsic value can be made more precise. One of the first writers to concern himself with the question of what exactly is at issue when we ascribe intrinsic value to something was G. E. Moore [1873–1958]. In his book Principia Ethica, Moore asks whether the concept of intrinsic value (or, more particularly, the concept of intrinsic goodness, upon which he tended to focus) is analyzable. In raising this question, he has a particular type of analysis in mind, one which consists in “breaking down” a concept into simpler component concepts. (One example of an analysis of this sort is the analysis of the concept of being a vixen in terms of the concepts of being a fox and being female.) His own answer to the question is that the concept of intrinsic goodness is not amenable to such analysis (Moore 1903, ch. 1). In place of analysis, Moore proposes a certain kind of thought-experiment in order both to come to understand the concept better and to reach a decision about what is intrinsically good. 
He advises us to consider what things are such that, if they existed by themselves “in absolute isolation,” we would judge their existence to be good; in this way, we will be better able to see what really accounts for the value that there is in our world. For example, if such a thought-experiment led you to conclude that all and only pleasure would be good in isolation, and all and only pain bad, you would be a hedonist.[3] Moore himself deems it incredible that anyone, thinking clearly, would reach this conclusion. He says that it involves our saying that a world in which only pleasure existed—a world without any knowledge, love, enjoyment of beauty, or moral qualities—is better than a world that contained all these things but in which there existed slightly less pleasure (Moore 1912, p. 102). Such a view he finds absurd. Regardless of the merits of this isolation test, it remains unclear exactly why Moore finds the concept of intrinsic goodness to be unanalyzable. At one point he attacks the view that it can be analyzed wholly in terms of “natural” concepts—the view, that is, that we can break down the concept of being intrinsically good into the simpler concepts of being A, being B, being C…, where these component concepts are all purely descriptive rather than evaluative. (One candidate that Moore discusses is this: for something to be intrinsically good is for it to be something that we desire to desire.) He argues that any such analysis is to be rejected, since it will always be intelligible to ask whether (and, presumably, to deny that) it is good that something be A, B, C,…, which would not be the case if the analysis were accurate (Moore 1903, pp. 15–16). Even if this argument is successful (a complicated matter about which there is considerable disagreement), it of course does not establish the more general claim that the concept of intrinsic goodness is not analyzable at all, since it leaves open the possibility that this concept is analyzable in terms of other concepts, some or all of which are not “natural” but evaluative. Moore apparently thinks that his objection works just as well where one or more of the component concepts A, B, C,…, is evaluative; but, again, many dispute the cogency of his argument. Indeed, several philosophers have proposed analyses of just this sort. For example, Roderick Chisholm [1916–1999] has argued that Moore’s own isolation test in fact provides the basis for an analysis of the concept of intrinsic value. He formulates a view according to which (to put matters roughly) to say that a state of affairs is intrinsically good or bad is to say that it is possible that its goodness or badness constitutes all the goodness or badness that there is in the world (Chisholm 1978). Eva Bodanszky and Earl Conee have attacked Chisholm’s proposal, showing that it is, in its details, unacceptable (Bodanszky and Conee 1981). However, the general idea that an intrinsically valuable state is one that could somehow account for all the value in the world is suggestive and promising; if it could be adequately formulated, it would reveal an important feature of intrinsic value that would help us better understand the concept. We will return to this point in Section 5. Rather than pursue such a line of thought, Chisholm himself responded (Chisholm 1981) in a different way to Bodanszky and Conee. 
He shifted from what may be called an ontological version of Moore’s isolation test—the attempt to understand the intrinsic value of a state in terms of the value that there would be if it were the only valuable state in existence—to an intentional version of that test—the attempt to understand the intrinsic value of a state in terms of the kind of attitude it would be fitting to have if one were to contemplate the valuable state as such, without reference to circumstances or consequences. This new analysis in fact reflects a general idea that has a rich history. Franz Brentano [1838–1917], C. D. Broad [1887–1971], W. D. Ross [1877–1971], and A. C. Ewing [1899–1973], among others, have claimed, in a more or less qualified way, that the concept of intrinsic goodness is analyzable in terms of the fittingness of some “pro” (i.e., positive) attitude (Brentano 1969, p. 18; Broad 1930, p. 283; Ross 1939, pp. 275–76; Ewing 1948, p. 152). Such an analysis, which has come to be called “the fitting attitude analysis” of value, is supported by the mundane observation that, instead of saying that something is good, we often say that it is valuable, which itself just means that it is fitting to value the thing in question. It would thus seem very natural to suppose that for something to be intrinsically good is simply for it to be such that it is fitting to value it for its own sake. (“Fitting” here is often understood to signify a particular kind of moral fittingness, in keeping with the idea that intrinsic value is a particular kind of moral value. The underlying point is that those who value for its own sake that which is intrinsically good thereby evince a kind of moral sensitivity.) Though undoubtedly attractive, this analysis can be and has been challenged. Brand Blanshard [1892–1987], for example, argues that the analysis is to be rejected because, if we ask why something is such that it is fitting to value it for its own sake, the answer is that this is the case precisely because the thing in question is intrinsically good; this answer indicates that the concept of intrinsic goodness is more fundamental than that of the fittingness of some pro attitude, which is inconsistent with analyzing the former in terms of the latter (Blanshard 1961, pp. 284–86). Ewing and others have resisted Blanshard’s argument, maintaining that what grounds and explains something’s being valuable is not its being good but rather its having whatever non-value property it is upon which its goodness supervenes; they claim that it is because of this underlying property that the thing in question is “both” good and valuable (Ewing 1948, pp. 157 and 172. Cf. Lemos 1994, p. 19). Thomas Scanlon calls such an account of the relation between valuableness, goodness, and underlying properties a buck-passing account, since it “passes the buck” of explaining why something is such that it is fitting to value it from its goodness to some property that underlies its goodness (Scanlon 1998, pp. 95 ff.). Whether such an account is acceptable has recently been the subject of intense debate. Many, like Scanlon, endorse passing the buck; some, like Blanshard, object to doing so. If such an account is acceptable, then Ewing’s analysis survives Blanshard’s challenge; but otherwise not. (Note that one might endorse passing the buck and yet reject Ewing’s analysis for some other reason. Hence a buck-passer may, but need not, accept the analysis. 
Indeed, there is reason to think that Moore himself is a buck-passer, even though he takes the concept of intrinsic goodness to be unanalyzable; cf. Olson 2006). Even if Blanshard’s argument succeeds and intrinsic goodness is not to be analyzed in terms of the fittingness of some pro attitude, it could still be that there is a strict correlation between something’s being intrinsically good and its being such that it is fitting to value it for its own sake; that is, it could still be both that (a) it is necessarily true that whatever is intrinsically good is such that it is fitting to value it for its own sake, and that (b) it is necessarily true that whatever it is fitting to value for its own sake is intrinsically good. If this were the case, it would reveal an important feature of intrinsic value, recognition of which would help us to improve our understanding of the concept. However, this thesis has also been challenged. Krister Bykvist has argued that what he calls solitary goods may constitute a counterexample to part (a) of the thesis (Bykvist 2009, pp. 4 ff.). Such (alleged) goods consist in states of affairs that entail that there is no one in a position to value them. Suppose, for example, that happiness is intrinsically good, and good in such a way that it is fitting to welcome it. Then, more particularly, the state of affairs of there being happy egrets is intrinsically good; so too, presumably, is the more complex state of affairs of there being happy egrets but no welcomers. The simpler state of affairs would appear to pose no problem for part (a) of the thesis, but the more complex state of affairs, which is an example of a solitary good, may pose a problem. For if to welcome a state of affairs entails that that state of affairs obtains, then welcoming the more complex state of affairs is logically impossible. Furthermore, if to welcome a state of affairs entails that one believes that that state of affairs obtains, then the pertinent belief regarding the more complex state of affairs would be necessarily false. In neither case would it seem plausible to say that welcoming the state of affairs is nonetheless fitting. Thus, unless this challenge can somehow be met, a proponent of the thesis must restrict the thesis to pro attitudes that are neither truth- nor belief-entailing, a restriction that might itself prove unwelcome, since it excludes a number of favorable responses to what is good (such as promoting what is good, or taking pleasure in what is good) to which proponents of the thesis have often appealed. As to part (b) of the thesis: some philosophers have argued that it can be fitting to value something for its own sake even if that thing is not intrinsically good. A relatively early version of this argument was again provided by Blanshard (1961, pp. 287 ff. Cf. Lemos 1994, p. 18). Recently the issue has been brought into stark relief by the following sort of thought-experiment. Imagine that an evil demon wants you to value him for his own sake and threatens to cause you severe suffering unless you do. It seems that you have good reason to do what he wants—it is appropriate or fitting to comply with his demand and value him for his own sake—even though he is clearly not intrinsically good (Rabinowicz and Rønnow-Rasmussen 2004, pp. 402 ff.). This issue, which has come to be known as “the wrong kind of reason problem,” has attracted a great deal of attention. Some have been persuaded that the challenge succeeds, while others have sought to undermine it. 
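The two halves of this correlation thesis have a simple logical shape, which a schematic formalization may help to fix. The notation below is an illustrative gloss of my own, not one used by the authors discussed: read IG(x) as “x is intrinsically good” and F(x) as “it is fitting to value x for its own sake.”

```latex
% Schematic rendering of the correlation thesis (an assumed gloss, not
% the authors' own notation); requires amsmath and amssymb.
\begin{align*}
\text{(a)}\quad & \Box\,\forall x\,\bigl(IG(x) \rightarrow F(x)\bigr)
  && \text{(whatever is intrinsically good is fittingly valued)}\\
\text{(b)}\quad & \Box\,\forall x\,\bigl(F(x) \rightarrow IG(x)\bigr)
  && \text{(whatever is fittingly valued is intrinsically good)}
\end{align*}
```

So read, Bykvist’s solitary goods are offered as possible counterexamples to (a): when the pro attitude is truth- or belief-entailing, there are (alleged) intrinsic goods that cannot fittingly be valued. The evil-demon case is offered as a possible counterexample to (b): it describes something it is (allegedly) fitting to value for its own sake that is not intrinsically good.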
One final cautionary note. It is apparent that some philosophers use the term “intrinsic value” and similar terms to express some concept other than the one just discussed. In particular, Immanuel Kant [1724–1804] is famous for saying that the only thing that is “good without qualification” is a good will, which is good not because of what it effects or accomplishes but “in itself” (Kant 1785, Ak. 1–3). This may seem to suggest that Kant ascribes (positive) intrinsic value only to a good will, declaring the value that anything else may possess merely extrinsic, in the senses of “intrinsic value” and “extrinsic value” discussed above. This suggestion is, if anything, reinforced when Kant immediately adds that a good will “is to be esteemed beyond comparison as far higher than anything it could ever bring about,” that it “shine[s] like a jewel for its own sake,” and that its “usefulness…can neither add to, nor subtract from, [its] value.” For here Kant may seem not only to be invoking the distinction between intrinsic and extrinsic value but also to be in agreement with Brentano et al. regarding the characterization of the former in terms of the fittingness of some attitude, namely, esteem. (The term “respect” is often used in place of “esteem” in such contexts.) Nonetheless, it becomes clear on further inspection that Kant is in fact discussing a concept quite different from that with which this article is concerned. A little later on he says that all rational beings, even those that lack a good will, have “absolute value”; such beings are “ends in themselves” that have a “dignity” or “intrinsic value” that is “above all price” (Kant 1785, Ak. 64 and 77). Such talk indicates that Kant believes that the sort of value that he ascribes to rational beings is one that they possess to an infinite degree. But then, if this were understood as a thesis about intrinsic value as we have been understanding this concept, the implication would seem to be that, since it contains rational beings, ours is the best of all possible worlds.[4] Yet this is a thesis that Kant, along with many others, explicitly rejects elsewhere (Kant, Lectures on Ethics). It seems best to understand Kant, and other philosophers who have since written in the same vein (cf. Anderson 1993), as being concerned not with the question of what intrinsic value rational beings have—in the sense of “intrinsic value” discussed above—but with the quite different question of how we ought to behave toward such creatures (cf. Bradley 2006).

3. Is There Such a Thing As Intrinsic Value At All?

In the history of philosophy, relatively few seem to have entertained doubts about the concept of intrinsic value. Much of the debate about intrinsic value has tended to be about what things actually do have such value. However, once questions about the concept itself were raised, doubts about its metaphysical implications, its moral significance, and even its very coherence began to appear. Consider, first, the metaphysics underlying ascriptions of intrinsic value. It seems safe to say that, before the twentieth century, most moral philosophers presupposed that the intrinsic goodness of something is a genuine property of that thing, one that is no less real than the properties (of being pleasant, of satisfying a need, or whatever) in virtue of which the thing in question is good. (Several dissented from this view, however.
Especially well known for their dissent are Thomas Hobbes [1588–1679], who believed the goodness or badness of something to be constituted by the desire or aversion that one may have regarding it, and David Hume [1711–1776], who similarly took all ascriptions of value to involve projections of one’s own sentiments onto whatever is said to have value. See Hobbes 1651, Hume 1739.) It was not until Moore argued that this view implies that intrinsic goodness, as a supervening property, is a very different sort of property (one that he called “nonnatural”) from those (which he called “natural”) upon which it supervenes, that doubts about the view proliferated. One of the first to raise such doubts and to press for a view quite different from the prevailing view was Axel Hägerström [1868–1939], who developed an account according to which ascriptions of value are neither true nor false (Hägerström 1953). This view has come to be called “noncognitivism.” The particular brand of noncognitivism proposed by Hägerström is usually called “emotivism,” since it holds (in a manner reminiscent of Hume) that ascriptions of value are in essence expressions of emotion. (For example, an emotivist of a particularly simple kind might claim that to say “A is good” is not to make a statement about A but to say something like “Hooray for A!”) This view was taken up by several philosophers, including most notably A. J. Ayer [1910–1989] and Charles L. Stevenson [1908–1979] (see Ayer 1946, Stevenson 1944). Other philosophers have since embraced other forms of noncognitivism. R. M. Hare [1919–2002], for example, advocated the theory of “prescriptivism” (according to which moral judgments, including judgments about goodness and badness, are not descriptive statements about the world but rather constitute a kind of command as to how we are to act; see Hare 1952) and Simon Blackburn and Allan Gibbard have since proposed yet other versions of noncognitivism (Blackburn 1984, Gibbard 1990). Hägerström characterized his own view as a type of “value-nihilism,” and many have followed suit in taking noncognitivism of all kinds to constitute a rejection of the very idea of intrinsic value. But this seems to be a mistake. We should distinguish questions about value from questions about evaluation. Questions about value fall into two main groups, conceptual (of the sort discussed in the last section) and substantive (of the sort discussed in the first section). Questions about evaluation have to do with what precisely is going on when we ascribe value to something. Cognitivists claim that our ascriptions of value constitute statements that are either true or false; noncognitivists deny this. But even noncognitivists must recognize that our ascriptions of value fall into two fundamental classes—ascriptions of intrinsic value and ascriptions of extrinsic value—and so they too must concern themselves with the very same conceptual and substantive questions about value as cognitivists address. It may be that noncognitivism dictates or rules out certain answers to these questions that cognitivism does not, but that is of course quite a different matter from rejecting the very idea of intrinsic value on metaphysical grounds. Another type of metaphysical challenge to intrinsic value stems from the theory of “pragmatism,” especially in the form advanced by John Dewey [1859–1952] (see Dewey 1922). 
According to the pragmatist, the world is constantly changing in such a way that the solution to one problem becomes the source of another, what is an end in one context is a means in another, and thus it is a mistake to seek or offer a timeless list of intrinsic goods and evils, of ends to be achieved or avoided for their own sakes. This theme has been elaborated by Monroe Beardsley, who attacks the very notion of intrinsic value (Beardsley 1965; cf. Conee 1982). Denying that the existence of something with extrinsic value presupposes the existence of something else with intrinsic value, Beardsley argues that all value is extrinsic. (In the course of his argument, Beardsley rejects the sort of “dialectical demonstration” of intrinsic value that was attempted in the last section, when an explanation of the derivative value of helping others was given in terms of some nonderivative value.) A quick response to Beardsley’s misgivings about intrinsic value would be to admit that it may well be that, the world being as complex as it is, nothing is such that its value is wholly intrinsic; perhaps whatever has intrinsic value also has extrinsic value, and of course many things that have extrinsic value will have no (or, at least, neutral) intrinsic value. Far from repudiating the notion of intrinsic value, though, this admission would confirm its legitimacy. But Beardsley would insist that this quick response misses the point of his attack, and that it really is the case, not just that whatever has value has extrinsic value, but also that nothing has intrinsic value. His argument for this view is based on the claim that the concept of intrinsic value is “inapplicable,” in that, even if something had such value, we could not know this and hence its having such value could play no role in our reasoning about value. But here Beardsley seems to be overreaching. Even if it were the case that we cannot know whether something has intrinsic value, this of course leaves open the question whether anything does have such value. And even if it could somehow be shown that nothing does have such value, this would still leave open the question whether something could have such value. If the answer to this last question is “yes,” then the legitimacy of the concept of intrinsic value is in fact confirmed rather than refuted. As has been noted, some philosophers do indeed doubt the legitimacy, the very coherence, of the concept of intrinsic value. Before we turn to a discussion of this issue, however, let us for the moment presume that the concept is coherent and address a different sort of doubt: the doubt that the concept has any great moral significance. Recall the suggestion, mentioned in the last section, that discussions of intrinsic value may have been compromised by a failure to distinguish certain concepts. This suggestion is at the heart of Christine Korsgaard’s “Two Distinctions in Goodness” (Korsgaard 1983). Korsgaard notes that “intrinsic value” has traditionally been contrasted with “instrumental value” (the value that something has in virtue of being a means to an end) and claims that this approach is misleading. 
She contends that “instrumental value” is to be contrasted with “final value,” that is, the value that something has as an end or for its own sake; however, “intrinsic value” (the value that something has in itself, that is, in virtue of its intrinsic, nonrelational properties) is to be contrasted with “extrinsic value” (the value that something has in virtue of its extrinsic, relational properties). (An example of a nonrelational property is the property of being round; an example of a relational property is the property of being loved.) As an illustration of final value, Korsgaard suggests that gorgeously enameled frying pans are, in virtue of the role they play in our lives, good for their own sakes. In like fashion, Beardsley wonders whether a rare stamp may be good for its own sake (Beardsley 1965); Shelly Kagan says that the pen that Abraham Lincoln used to sign the Emancipation Proclamation may well be good for its own sake (Kagan 1998); and others have offered similar examples (cf. Rabinowicz and Rønnow-Rasmussen 1999 and 2003). Notice that in each case the value being attributed to the object in question is (allegedly) had in virtue of some extrinsic property of the object. This puts the moral significance of intrinsic value into question, since (as is apparent from our discussion so far) it is with the notion of something’s being valuable for its own sake that philosophers have traditionally been, and continue to be, primarily concerned. There is an important corollary to drawing a distinction between intrinsic value and final value (and between extrinsic value and nonfinal value), and that is that, contrary to what Korsgaard herself initially says, it may be a mistake to contrast final value with instrumental value. If it is possible, as Korsgaard claims, that final value sometimes supervenes on extrinsic properties, then it might be possible that it sometimes supervenes in particular on the property of being a means to some other end. Indeed, Korsgaard herself suggests this when she says that “certain kinds of things, such as luxurious instruments, … are valued for their own sakes under the condition of their usefulness” (Korsgaard 1983, p. 185). Kagan also tentatively endorses this idea. If the idea is coherent, then we should in principle distinguish two kinds of instrumental value, one final and the other nonfinal.[5] If something A is a means to something else B and has instrumental value in virtue of this fact, such value will be nonfinal if it is merely derivative from or reflective of B’s value, whereas it will be final if it is nonderivative, that is, if it is a value that A has in its own right (due to the fact that it is a means to B), irrespective of any value that B may or may not have in its own right. Even if it is agreed that it is final value that is central to the concerns of moral philosophers, we should be careful in drawing the conclusion that intrinsic value is not central to their concerns. First, there is no necessity that the term “intrinsic value” be reserved for the value that something has in virtue of its intrinsic properties; presumably it has been used by many writers simply to refer to what Korsgaard calls final value, in which case the moral significance of (what is thus called) intrinsic value has of course not been thrown into doubt. 
Nonetheless, it should probably be conceded that “final value” is a more suitable term than “intrinsic value” to refer to the sort of value in question, since the latter term certainly does suggest value that supervenes on intrinsic properties. But here a second point can be made, and that is that, even if use of the term “intrinsic value” is restricted accordingly, it is arguable that, contrary to Korsgaard’s contention, all final value does after all supervene on intrinsic properties alone; if that were the case, there would seem to be no reason not to continue to use the term “intrinsic value” to refer to final value. Whether this is in fact the case depends in part on just what sort of thing can be valuable for its own sake—an issue to be taken up in the next section. In light of the matter just discussed, we must now decide what terminology to adopt. It is clear that moral philosophers since ancient times have been concerned with the distinction between the value that something has for its own sake (the sort of nonderivative value that Korsgaard calls “final value”) and the value that something has for the sake of something else to which it is related in some way. However, given the weight of tradition, it seems justifiable, perhaps even advisable, to continue, despite Korsgaard’s misgivings, to use the terms “intrinsic value” and “extrinsic value” to refer to these two types of value; if we do so, however, we should explicitly note that this practice is not itself intended to endorse, or reject, the view that intrinsic value supervenes on intrinsic properties alone. Let us now turn to doubts about the very coherence of the concept of intrinsic value, so understood. In Principia Ethica and elsewhere, Moore embraces the consequentialist view, mentioned above, that whether an action is morally right or wrong turns exclusively on whether its consequences are intrinsically better than those of its alternatives. Some philosophers have recently argued that ascribing intrinsic value to consequences in this way is fundamentally misconceived. Peter Geach, for example, argues that Moore makes a serious mistake when comparing “good” with “yellow.”[6] Moore says that both terms express unanalyzable concepts but are to be distinguished in that, whereas the latter refers to a natural property, the former refers to a nonnatural one. Geach contends that there is a mistaken assimilation underlying Moore’s remarks, since “good” in fact operates in a way quite unlike that of “yellow”—something that Moore wholly overlooks. This contention would appear to be confirmed by the observation that the phrase “x is a yellow bird” splits up logically (as Geach puts it) into the phrase “x is a bird and x is yellow,” whereas the phrase “x is a good singer” does not split up in the same way. Also, from “x is a yellow bird” and “a bird is an animal” we do not hesitate to infer “x is a yellow animal,” whereas no similar inference seems warranted in the case of “x is a good singer” and “a singer is a person.” On the basis of these observations Geach concludes that nothing can be good in the free-standing way that Moore alleges; rather, whatever is good is good relative to a certain kind. Judith Thomson has recently elaborated on Geach’s thesis (Thomson 1997). Although she does not unqualifiedly agree that whatever is good is good relative to a certain kind, she does claim that whatever is good is good in some way; nothing can be “just plain good,” as she believes Moore would have it. 
Philippa Foot, among others, has made a similar charge (Foot 1985). It is a charge that has been rebutted by Michael Zimmerman, who argues that Geach’s tests are less straightforward than they may seem and fail after all to reveal a significant distinction between the ways in which “good” and “yellow” operate (Zimmerman 2001, ch. 2). He argues further that Thomson mischaracterizes Moore’s conception of intrinsic value. According to Moore, he claims, what is intrinsically good is not “just plain good”; rather, it is good in a particular way, in keeping with Thomson’s thesis that all goodness is goodness in a way. He maintains that, for Moore and other proponents of intrinsic value, such value is a particular kind of moral value. Mahrad Almotahari and Adam Hosein have revived Geach’s challenge (Almotahari and Hosein 2015). They argue that if, contrary to Geach, “good” could be used predicatively, we would be able to use the term predicatively in sentences of the form ‘a is a good K’; but the linguistic evidence, they contend, indicates that we cannot do so (Almotahari and Hosein 2015, 1493–4).

4. What Sort of Thing Can Have Intrinsic Value?

Among those who do not doubt the coherence of the concept of intrinsic value there is considerable difference of opinion about what sort or sorts of entity can have such value. Moore does not explicitly address this issue, but his writings show him to have a liberal view on the matter. There are times when he talks of individual objects (e.g., books) as having intrinsic value, others when he talks of the consciousness of individual objects (or of their qualities) as having intrinsic value, others when he talks of the existence of individual objects as having intrinsic value, others when he talks of types of individual objects as having intrinsic value, and still others when he talks of states of individual objects as having intrinsic value. Moore would thus appear to be a “pluralist” concerning the bearers of intrinsic value. Others take a more conservative, “monistic” approach, according to which there is just one kind of bearer of intrinsic value. Consider, for example, Frankena’s long list of intrinsic goods, presented in Section 1 above: life, consciousness, etc. To what kind(s) of entity do such terms refer? Various answers have been given. Some (such as Panayot Butchvarov) claim that it is properties that are the bearers of intrinsic value (Butchvarov 1989, pp. 14–15). On this view, Frankena’s list implies that it is the properties of being alive, being conscious, and so on, that are intrinsically good. Others (such as Chisholm) claim that it is states of affairs that are the bearers of intrinsic value (Chisholm 1968–69, 1972, 1975). On this view, Frankena’s list implies that it is the states of affairs of someone (or something) being alive, someone being conscious, and so on, that are intrinsically good. Still others (such as Ross) claim that it is facts that are the bearers of intrinsic value (Ross 1930, pp. 112–13; cf. Lemos 1994, ch. 2). On this view, Frankena’s list implies that it is the facts that someone (or something) is alive, that someone is conscious, and so on, that are intrinsically good. (The difference between Chisholm’s and Ross’s views would seem to be this: whereas Chisholm would ascribe intrinsic value even to states of affairs, such as that of everyone being happy, that do not obtain, Ross would ascribe such value only to states of affairs that do obtain.)
Ontologists often divide entities into two fundamental classes, those that are abstract and those that are concrete. Unfortunately, there is no consensus on just how this distinction is to be drawn. Most philosophers would classify the sorts of entities just mentioned (properties, states of affairs, and facts) as abstract. So understood, the claim that intrinsic value is borne by such entities is to be distinguished from the claim that it is borne by certain other closely related entities that are often classified as concrete. For example, it has recently been suggested that it is tropes that have intrinsic value.[7] Tropes are supposed to be a sort of particularized property, a kind of property-instance (rather than simply a property). (Thus the particular whiteness of a particular piece of paper is to be distinguished, on this view, from the property of whiteness.) It has also been suggested that it is states, understood as a kind of instance of states of affairs, that have intrinsic value (cf. Zimmerman 2001, ch. 3). Those who make monistic proposals of the sort just mentioned are aware that intrinsic value is sometimes ascribed to kinds of entities different from those favored by their proposals. They claim that all such ascriptions can be reduced to, or translated into, ascriptions of intrinsic value of the sort they deem proper. Consider, for example, Korsgaard’s suggestion that a gorgeously enameled frying pan is good for its own sake. Ross would say that this cannot be the case. If there is any intrinsic value to be found here, it will, according to Ross, not reside in the pan itself but in the fact that it plays a certain role in our lives, or perhaps in the fact that something plays this role, or in the fact that something that plays this role exists. (Others would make other translations in the terms that they deem appropriate.) On the basis of this ascription of intrinsic value to some fact, Ross could go on to ascribe a kind of extrinsic value to the pan itself, in virtue of its relation to the fact in question. Whether reduction of this sort is acceptable has been a matter of considerable debate. Proponents of monism maintain that it introduces some much-needed order into the discussion of intrinsic value, clarifying just what is involved in the ascription of such value and simplifying the computation of such value—on which point, see the next section. (A corollary of some monistic approaches is that the value that something has for its own sake supervenes on the intrinsic properties of that thing, so that there is a perfect convergence of the two sorts of values that Korsgaard calls “final” and “intrinsic”. On this point, see the last section; Zimmerman 2001, ch. 3; Tucker 2016; and Tucker (forthcoming).) Opponents argue that reduction results in distortion and oversimplification; they maintain that, even if there is intrinsic value to be found in such a fact as that a gorgeously enameled frying pan plays a certain role in our lives, there may yet be intrinsic, and not merely extrinsic, value to be found in the pan itself and perhaps also in its existence (cf. Rabinowicz and Rønnow-Rasmussen 1999 and 2003). Some propose a compromise according to which the kind of intrinsic value that can sensibly be ascribed to individual objects like frying pans is not the same kind of intrinsic value that is the topic of this article and can sensibly be ascribed to items of the sort on Frankena’s list (cf. Bradley 2006). 
(See again the cautionary note in the final paragraph of Section 2 above.) 5. How Is Intrinsic Value to Be Computed? In our assessments of intrinsic value, we are often and understandably concerned not only with whether something is good or bad but with how good or bad it is. Arriving at an answer to the latter question is not straightforward. At least three problems threaten to undermine the computation of intrinsic value. First, there is the possibility that the relation of intrinsic betterness is not transitive (that is, the possibility that something A is intrinsically better than something else B, which is itself intrinsically better than some third thing C, and yet A is not intrinsically better than C). Despite the very natural assumption that this relation is transitive, it has been argued that it is not (Rachels 1998; Temkin 1987, 1997, 2012). Should this in fact be the case, it would seriously complicate comparisons, and hence assessments, of intrinsic value. Second, there is the possibility that certain values are incommensurate. For example, Ross at one point contends that it is impossible to compare the goodness of pleasure with that of virtue. Whereas he had suggested in The Right and the Good that pleasure and virtue could be measured on the same scale of goodness, in Foundations of Ethics he declares this to be impossible, since (he claims) it would imply that pleasure of a certain intensity, enjoyed by a sufficient number of people or for a sufficient time, would counterbalance virtue possessed or manifested only by a small number of people or only for a short time; and this he professes to be incredible (Ross 1939, p. 275). But there is some confusion here. In claiming that virtue and pleasure are incommensurate for the reason given, Ross presumably means that they cannot be measured on the same ratio scale. (A ratio scale is one with an arbitrary unit but a fixed zero point. Mass and length are standardly measured on ratio scales.) But incommensurability on a ratio scale does not imply incommensurability on every scale—an ordinal scale, for instance. (An ordinal scale is simply one that supplies an ordering for the quantity in question, such as the measurement of arm-strength that is provided by an arm-wrestling competition.) Ross’s remarks indicate that he in fact believes that virtue and pleasure are commensurate on an ordinal scale, since he appears to subscribe to the arch-puritanical view that any amount of virtue is intrinsically better than any amount of pleasure. This view is just one example of the thesis that some goods are “higher” than others, in the sense that any amount of the former is better than any amount of the latter. This thesis can be traced to the ancient Greeks (Plato, Philebus, 21a-e; Aristotle, Nicomachean Ethics, 1174a), and it has been endorsed by many philosophers since, perhaps most famously by Mill (Mill 1863, paras. 4 ff). Interest in the thesis has recently been revived by a set of intricate and intriguing puzzles, posed by Derek Parfit, concerning the relative values of low-quantity/high-quality goods and high-quantity/low-quality goods (Parfit 1984, Part IV). One response to these puzzles (eschewed by Parfit himself) is to adopt the thesis of the nontransitivity of intrinsic betterness. Another is to insist on the thesis that some goods are higher than others. 
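The "higher goods" thesis just mentioned can be given a concrete, if crude, model. The following sketch (in Python, with representations and numbers that are illustrative assumptions rather than anything Ross provides) treats each whole as a pair of quantities of virtue and pleasure and compares wholes lexicographically; the result is precisely an ordinal scale without a common unit, on which any amount of virtue outranks any amount of pleasure and yet every two wholes remain comparable:

```python
# A minimal model of the lexicographic ("higher goods") ordering sketched
# above. Each whole is summarized as a (virtue, pleasure) pair; the pairs
# and magnitudes are stand-ins for illustration, not Ross's own apparatus.

def intrinsically_better(a, b):
    """True if whole a outranks whole b: virtue is compared first,
    and pleasure is consulted only to break ties in virtue."""
    virtue_a, pleasure_a = a
    virtue_b, pleasure_b = b
    if virtue_a != virtue_b:
        return virtue_a > virtue_b
    return pleasure_a > pleasure_b

tiny_virtue = (1, 0)        # a little virtue, no pleasure
vast_pleasure = (0, 10**6)  # enormous pleasure, no virtue

# Ordinal comparability holds: the comparison always yields an answer.
print(intrinsically_better(tiny_virtue, vast_pleasure))  # True
# But no ratio scale underlies it: there is no quantity of pleasure
# that equals or outweighs any positive quantity of virtue.
```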
Such a response does not by itself solve the puzzles that Parfit raises, but, to the extent that it helps, it does so at the cost of once again complicating the computation of intrinsic value. To repeat: contrary to what Ross says, the thesis that some goods are higher than others implies that such goods are commensurate, and not that they are incommensurate. Some people do hold, however, that certain values really are incommensurate and thus cannot be compared on any meaningful scale. (Isaiah Berlin [1909–1997], for example, is often thought to have said this about the values of liberty and equality. Whether he is best interpreted in this way is debatable. See Berlin 1969.) This view constitutes a more radical threat to the computation of intrinsic value than does the view that intrinsic betterness is not transitive. The latter view presupposes at least some measure of commensurability. If A is better than B and B is better than C, then A is commensurate with B and B is commensurate with C; and even if it should turn out that A is not better than C, it may still be that A is commensurate with C, either because it is as good as C or because it is worse than C. But if A is incommensurate with B, then A is neither better than nor as good as nor worse than B. (Some claim, however, that the reverse does not hold and that, even if A is neither better than nor as good as nor worse than B, still A may be “on a par” with B and thus be roughly comparable with it. Cf. Chang 1997, 2002.) If such a case can arise, there is an obvious limit to the extent to which we can meaningfully say how good a certain complex whole is (here, “whole” is used to refer to whatever kind of entity may have intrinsic value); for, if such a whole comprises incommensurate goods A and B, then there will be no way of establishing just how good it is overall, even if there is a way of establishing how good it is with respect to each of A and B. There is a third, still more radical threat to the computation of intrinsic value. Quite apart from any concern with the commensurability of values, Moore famously claims that there is no easy formula for the determination of the intrinsic value of complex wholes because of the truth of what he calls the “principle of organic unities” (Moore 1903, p. 96). According to this principle, the intrinsic value of a whole must not be assumed to be the same as the sum of the intrinsic values of its parts (Moore 1903, p. 28). As an example of an organic unity, Moore gives the case of the consciousness of a beautiful object; he says that this has great intrinsic value, even though the consciousness as such and the beautiful object as such each have comparatively little, if any, intrinsic value. If the principle of organic unities is true, then there is scant hope of a systematic approach to the computation of intrinsic value. Although the principle explicitly rules out only summation as a method of computation, Moore’s remarks strongly suggest that there is no relation between the parts of a whole and the whole itself that holds in general and in terms of which the value of the latter can be computed by aggregating (whether by summation or by some other means) the values of the former. Moore’s position has been endorsed by many other philosophers. For example, Ross says that it is better that one person be good and happy and another bad and unhappy than that the former be good and unhappy and the latter bad and happy, and he takes this to be confirmation of Moore’s principle (Ross 1930, p. 72). 
Broad takes organic unities of the sort that Moore discusses to be just one instance of a more general phenomenon that he believes to be at work in many other situations, as when, for example, two tunes, each pleasing in its own right, make for a cacophonous combination (Broad 1985, p. 256). Others have furnished still further examples of organic unities (Chisholm 1986, ch. 7; Lemos 1994, chs. 3 and 4, and 1998; Hurka 1998). Was Moore the first to call attention to the phenomenon of organic unities in the context of intrinsic value? This is debatable. Despite the fact that he explicitly invoked what he called a “principle of summation” that would appear to be inconsistent with the principle of organic unities, Brentano appears nonetheless to have anticipated Moore’s principle in his discussion of Schadenfreude, that is, of malicious pleasure; he condemns such an attitude, even though he claims that pleasure as such is intrinsically good (Brentano 1969, p. 23 n). Certainly Chisholm takes Brentano to be an advocate of organic unities (Chisholm 1986, ch. 5), ascribing to him the view that there are many kinds of organic unity and building on what he takes to be Brentano’s insights (and, going further back in the history of philosophy, the insights of St. Thomas Aquinas [1225–1274] and others). Recently, a special spin has been put on the principle of organic unities by so-called “particularists.” Jonathan Dancy, for example, has claimed (in keeping with Korsgaard and others mentioned in Section 3 above) that something’s intrinsic value need not supervene on its intrinsic properties alone; in fact, the supervenience-base may be so open-ended that it resists generalization. The upshot, according to Dancy, is that the intrinsic value of something may vary from context to context; indeed, the variation may be so great that the thing’s value changes “polarity” from good to bad, or vice versa (Dancy 2000). This approach to value constitutes an endorsement of the principle of organic unities that is even more subversive of the computation of intrinsic value than Moore’s; for Moore holds that the intrinsic value of something is and must be constant, even if its contribution to the value of wholes of which it forms a part is not, whereas Dancy holds that variation can occur at both levels. Not everyone has accepted the principle of organic unities; some have held out hope for a more systematic approach to the computation of intrinsic value. However, even someone who is inclined to measure intrinsic value in terms of summation must acknowledge that there is a sense in which the principle of organic unities is obviously true. Consider some complex whole, W, that is composed of three goods, X, Y, and Z, which are wholly independent of one another. Suppose that we had a ratio scale on which to measure these goods, and that their values on this scale were 10, 20, and 30, respectively. We would expect someone who takes intrinsic value to be summative to declare the value of W to be (10 + 20 + 30 =) 60. But notice that, if X, Y, and Z are parts of W, then so too, presumably, are the combinations X-and-Y, X-and-Z, and Y-and-Z; the values of these combinations, computed in terms of summation, will be 30, 40, and 50, respectively. If the values of these parts of W were also taken into consideration when evaluating W, the value of W would balloon to 180. Clearly, this would be a distortion. 
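The arithmetic behind this distortion is easily reproduced. A minimal sketch, assuming the stipulated ratio-scale values of 10, 20, and 30 (the dictionary and variable names are purely illustrative):

```python
from itertools import combinations

# The three wholly independent goods and their stipulated ratio-scale values.
values = {"X": 10, "Y": 20, "Z": 30}

# Summing only the atomic goods gives the expected value for W.
atomic_total = sum(values.values())
print(atomic_total)  # 60

# But if the combinations X-and-Y, X-and-Z, and Y-and-Z also count as parts
# of W, each with its own summed value, the total balloons.
pair_totals = [sum(values[part] for part in pair)
               for pair in combinations(values, 2)]
print(pair_totals)                      # [30, 40, 50]
print(atomic_total + sum(pair_totals))  # 180, the distortion in question
```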
Someone who wishes to maintain that intrinsic value is summative must thus show not only how the various alleged examples of organic unities provided by Moore and others are to be reinterpreted, but also how, in the sort of case just sketched, it is only the values of X, Y, and Z, and not the values either of any combinations of these components or of any parts of these components, that are to be taken into account when evaluating W itself. In order to bring some semblance of manageability to the computation of intrinsic value, this is precisely what some writers, by appealing to the idea of “basic” intrinsic value, have tried to do. The general idea is this. In the sort of example just given, each of X, Y, and Z is to be construed as having basic intrinsic value; if any combinations or parts of X, Y, and Z have intrinsic value, this value is not basic; and the value of W is to be computed by appealing only to those parts of W that have basic intrinsic value. Gilbert Harman was one of the first explicitly to discuss basic intrinsic value when he pointed out the apparent need to invoke such value if we are to avoid distortions in our evaluations (Harman 1967). However, he offers no precise account of the concept of basic intrinsic value and ends his paper by saying that he can think of no way to show that nonbasic intrinsic value is to be computed in terms of the summation of basic intrinsic value. Several philosophers have since tried to do better. Many have argued that nonbasic intrinsic value cannot always be computed by summing basic intrinsic value. Suppose that states of affairs can bear intrinsic value. Let X be the state of affairs of John being pleased to a certain degree x, and Y be the state of affairs of Jane being displeased to a certain degree y, and suppose that X has a basic intrinsic value of 10 and Y a basic intrinsic value of −20. It seems reasonable to sum these values and attribute an intrinsic value of −10 to the conjunctive state of affairs X&Y. But what of the disjunctive state of affairs XvY or the negative state of affairs ~X? How are their intrinsic values to be computed? Summation seems to be a nonstarter in these cases. Nonetheless, attempts have been made even in such cases to show how the intrinsic value of a complex whole is to be computed in a nonsummative way in terms of the basic intrinsic values of simpler states, thus preserving the idea that basic intrinsic value is the key to the computation of all intrinsic value (Quinn 1974, Chisholm 1975, Oldfield 1977, Carlson 1997). (These attempts have generally been based on the assumption that states of affairs are the sole bearers of intrinsic value. Matters would be considerably more complicated if it turned out that entities of several different ontological categories could all have intrinsic value.) Suggestions as to how to compute nonbasic intrinsic value in terms of basic intrinsic value of course presuppose that there is such a thing as basic intrinsic value, but few have attempted to provide an account of what basic intrinsic value itself consists in. Fred Feldman is one of the few (Feldman 2000; cf. Feldman 1997, pp. 116–18). Subscribing to the view that only states of affairs bear intrinsic value, Feldman identifies several features that any state of affairs that has basic intrinsic value in particular must possess. He maintains, for example, that whatever has basic intrinsic value must have it to a determinate degree and that this value cannot be “defeated” by any Moorean organic unity. 
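The contrast between cases where summation works and cases where it gets no grip can be put schematically. In the sketch below, the state descriptions, numbers, and function names are illustrative assumptions, not Feldman's or Zimmerman's own apparatus; conjunctions of basic states sum, while disjunctive and negative states are left without any intrinsic value, in the spirit of the proposal discussed below:

```python
# Basic intrinsic values for simple states, following the example above.
basic = {
    "John is pleased to degree x": 10,
    "Jane is displeased to degree y": -20,
}

def conjunction_value(states):
    """Sum the basic intrinsic values of the conjoined states,
    as for X&Y in the text (10 + -20 = -10)."""
    return sum(basic[s] for s in states)

def state_value(kind, states):
    """Conjunctions sum; disjunctive and negative states are assigned
    no intrinsic value at all (None), rather than a number."""
    if kind == "conjunction":
        return conjunction_value(states)
    return None

print(state_value("conjunction", list(basic)))  # -10
print(state_value("disjunction", list(basic)))  # None: summation gets no grip
```

On such a scheme, Feldman's requirements that basic intrinsic value be had to a determinate degree and be undefeatable are what entitle the conjunctive case to a determinate sum.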
In this way, Feldman seeks to preserve the idea that intrinsic value is summative after all. He does not claim that all intrinsic value is to be computed by summing basic intrinsic value, but he does insist that the value of entire worlds is to be computed in this way. Despite the detail in which Feldman characterizes the concept of basic intrinsic value, he offers no strict analysis of it. Others have tried to supply such an analysis. For example, by noting that, even if it is true that only states have intrinsic value, it may yet be that not all states have intrinsic value, Zimmerman suggests (to put matters somewhat roughly) that basic intrinsic value is the intrinsic value had by states none of whose proper parts have intrinsic value (Zimmerman 2001, ch. 5). On this basis he argues that disjunctive and negative states in fact have no intrinsic value at all, and thereby seeks to show how all intrinsic value is to be computed in terms of summation after all. Two final points. First, we are now in a position to see why it was said above (in Section 2) that perhaps not all intrinsic value is nonderivative. If it is correct to distinguish between basic and nonbasic intrinsic value and also to compute the latter in terms of the former, then there is clearly a respectable sense in which nonbasic intrinsic value is derivative. Second, if states with basic intrinsic value account for all the value that there is in the world, support is found for Chisholm’s view (reported in Section 2) that some ontological version of Moore’s isolation test is acceptable. 6. What Is Extrinsic Value? At the beginning of this article, extrinsic value was said simply—too simply—to be value that is not intrinsic. Later, once intrinsic value had been characterized as nonderivative value of a certain, perhaps moral kind, extrinsic value was said more particularly to be derivative value of that same kind. That which is extrinsically good is good, not (insofar as its extrinsic value is concerned) for its own sake, but for the sake of something else to which it is related in some way. For example, the goodness of helping others in time of need is plausibly thought to be extrinsic (at least in part), being derivative (at least in part) from the goodness of something else, such as these people’s needs being satisfied, or their experiencing pleasure, to which helping them is related in some causal way. Two questions arise. The first is whether so-called extrinsic value is really a type of value at all. There would seem to be a sense in which it is not, for it does not add to or detract from the value in the world. Consider some long chain of derivation. Suppose that the extrinsic value of A can be traced to the intrinsic value of Z by way of B, C, D… Thus A is good (for example) because of B, which is good because of C, and so on, until we get to Y’s being good because of Z; when it comes to Z, however, we have something that is good, not because of something else, but “because of itself,” i.e., for its own sake. In this sort of case, the values of A, B, …, Y are all parasitic on the value of Z. It is Z’s value that contributes to the value there is in the world; A, B, …, Y contribute no value of their own. (As long as the value of Z is the only intrinsic value at stake, no change of value would be effected in or imparted to the world if a shorter route from A to Z were discovered, one that bypassed some letters in the middle of the alphabet.) Why talk of “extrinsic value” at all, then? 
The answer can only be that we just do say that certain things are good, and others bad, not for their own sake but for the sake of something else to which they are related in some way. To say that these things are good and bad only in a derivative sense, that their value is merely parasitic on or reflective of the value of something else, is one thing; to deny that they are good or bad in any respectable sense is quite another. The former claim is accurate; hence the latter would appear unwarranted. If we accept that talk of “extrinsic value” can be appropriate, however, a second question then arises: what sort of relation must obtain between A and Z if A is to be said to be good “because of” Z? It is not clear just what the answer to this question is. Philosophers have tended to focus on just one particular causal relation, the means-end relation. This is the relation at issue in the example given earlier: helping others is a means to their needs being satisfied, which is itself a means to their experiencing pleasure. The term most often used to refer to this type of extrinsic value is “instrumental value,” although there is some dispute as to just how this term is to be employed. (Remember also, from Section 3 above, that on some views “instrumental value” may refer to a type of intrinsic, or final, value.) Suppose that A is a means to Z, and that Z is intrinsically good. Should we therefore say that A is instrumentally good? What if A has another consequence, Y, and this consequence is intrinsically bad? What, especially, if the intrinsic badness of Y is greater than the intrinsic goodness of Z? Some would say that in such a case A is both instrumentally good (because of Z) and instrumentally bad (because of Y). Others would say that it is correct to say that A is instrumentally good only if all of A’s causal consequences that have intrinsic value are, taken as a whole, intrinsically good. Still others would say that whether something is instrumentally good depends not only on what it causes to happen but also on what it prevents from happening (cf. Bradley 1998). For example, if pain is intrinsically bad, and taking an aspirin puts a stop to your pain but causes nothing of any positive intrinsic value, some would say that taking the aspirin is instrumentally good despite its having no intrinsically good consequences. Many philosophers write as if instrumental value is the only type of extrinsic value, but that is a mistake. Suppose, for instance, that the results of a certain medical test indicate that the patient is in good health, and suppose that this patient’s having good health is intrinsically good. Then we may well want to say that the results are themselves (extrinsically) good. But notice that the results are of course not a means to good health; they are simply indicative of it. Or suppose that making your home available to a struggling artist while you spend a year abroad provides him with an opportunity he would otherwise not have to create some masterpieces, and suppose that either the process or the product of this creation would be intrinsically good. Then we may well want to say that your making your home available to him is (extrinsically) good because of the opportunity it provides him, even if he goes on to squander the opportunity and nothing good comes of it. Or suppose that someone’s appreciating the beauty of the Mona Lisa would be intrinsically good. 
Then we may well want to say that the painting itself has value in light of this fact, a kind of value that some have called “inherent value” (Lewis 1946, p. 391; cf. Frankena 1973, p. 82). (“Inherent value” may not be the most suitable term to use here, since it may well suggest intrinsic value, whereas the sort of value at issue is supposed to be a type of extrinsic value. The value attributed to the painting is one that it is said to have in virtue of its relation to something else that would supposedly be intrinsically good if it occurred, namely, the appreciation of its beauty.) Many other instances could be given of cases in which we are inclined to call something good in virtue of its relation to something else that is or would be intrinsically good, even though the relation in question is not a means-end relation. One final point. It is sometimes said that there can be no extrinsic value without intrinsic value. This thesis admits of several interpretations. First, it might mean that nothing can occur that is extrinsically good unless something else occurs that is intrinsically good, and that nothing can occur that is extrinsically bad unless something else occurs that is intrinsically bad. Second, it might mean that nothing can occur that is either extrinsically good or extrinsically bad unless something else occurs that is either intrinsically good or intrinsically bad. On both these interpretations, the thesis is dubious. Suppose that no one ever appreciates the beauty of Leonardo’s masterpiece, and that nothing else that is intrinsically either good or bad ever occurs; still his painting may be said to be inherently good. Or suppose that the aspirin prevents your pain from even starting, and hence inhibits the occurrence of something intrinsically bad, but nothing else that is intrinsically either good or bad ever occurs; still your taking the aspirin may be said to be instrumentally good. On a third interpretation, however, the thesis might be true. That interpretation is this: nothing can occur that is either extrinsically good or extrinsically neutral or extrinsically bad unless something else occurs that is either intrinsically good or intrinsically neutral or intrinsically bad. This would be trivially true if, as some maintain, the nonoccurrence of something intrinsically either good or bad entails the occurrence of something intrinsically neutral. But even if the thesis should turn out to be false on this third interpretation, too, it would nonetheless seem to be true on a fourth interpretation, according to which the concept of extrinsic value, in all its varieties, is to be understood in terms of the concept of intrinsic value.


The History of Utilitarianism


1. Precursors to the Classical Approach Though the first systematic account of utilitarianism was developed by Jeremy Bentham (1748–1832), the core insight motivating the theory occurred much earlier. That insight is that morally appropriate behavior will not harm others, but instead increase happiness or ‘utility.’ What is distinctive about utilitarianism is its approach in taking that insight and developing an account of moral evaluation and moral direction that expands on it. Early precursors to the Classical Utilitarians include the British Moralists, Cumberland, Shaftesbury, Hutcheson, Gay, and Hume. Of these, Francis Hutcheson (1694–1746) is explicitly utilitarian when it comes to action choice. Some of the earliest utilitarian thinkers were the ‘theological’ utilitarians such as Richard Cumberland (1631–1718) and John Gay (1699–1745). They believed that promoting human happiness was incumbent on us since it was approved by God. After enumerating the ways in which humans come under obligations (by perceiving the “natural consequences of things”, the obligation to be virtuous, our civil obligations that arise from laws, and obligations arising from “the authority of God”) John Gay writes: “…from the consideration of these four sorts of obligation…it is evident that a full and complete obligation which will extend to all cases, can only be that arising from the authority of God; because God only can in all cases make a man happy or miserable: and therefore, since we are always obliged to that conformity called virtue, it is evident that the immediate rule or criterion of it is the will of God” (R, 412). Gay held that since God wants the happiness of mankind, and since God's will gives us the criterion of virtue, “…the happiness of mankind may be said to be the criterion of virtue, but once removed” (R, 413). This view was combined with a view of human motivation with egoistic elements. A person's individual salvation, her eternal happiness, depended on conformity to God's will, as did virtue itself. Promoting human happiness and one's own coincided, but, given God's design, it was not an accidental coincidence. This approach to utilitarianism, however, is not theoretically clean in the sense that it isn't clear what essential work God does, at least in terms of normative ethics. God as the source of normativity is compatible with utilitarianism, but utilitarianism doesn't require this. Gay's influence on later writers, such as Hume, deserves note. It is in Gay's essay that some of the questions that concerned Hume on the nature of virtue are addressed. For example, Gay was curious about how to explain our practice of approbation and disapprobation of action and character. When we see an act that is vicious we disapprove of it. Further, we associate certain things with their effects, so that we form positive associations and negative associations that also underwrite our moral judgments. Of course, that we view happiness, including the happiness of others as a good, is due to God's design. This is a feature crucial to the theological approach, which would clearly be rejected by Hume in favor of a naturalistic view of human nature and a reliance on our sympathetic engagement with others, an approach anticipated by Shaftesbury (below). The theological approach to utilitarianism would be developed later by William Paley, for example, but the lack of any theoretical necessity in appealing to God would result in its diminishing appeal. 
Anthony Ashley Cooper, the 3rd Earl of Shaftesbury (1671–1713) is generally thought to have been one of the earliest ‘moral sense’ theorists, holding that we possess a kind of “inner eye” that allows us to make moral discriminations. This seems to have been an innate sense of right and wrong, or moral beauty and deformity. Again, aspects of this doctrine would be picked up by Francis Hutcheson and David Hume (1711–1776). Hume, of course, would clearly reject any robust realist implications. If the moral sense is like the other perceptual senses and enables us to pick up on properties out there in the universe around us, properties that exist independent of our perception of them, that are objective, then Hume clearly was not a moral sense theorist in this regard. But perception picks up on features of our environment that one could regard as having a contingent quality. There is one famous passage where Hume likens moral discrimination to the perception of secondary qualities, such as color. In modern terminology, these are response-dependent properties, and lack objectivity in the sense that they do not exist independent of our responses. This is radical. If an act is vicious, its viciousness is a matter of the human response (given a corrected perspective) to the act (or its perceived effects) and thus has a kind of contingency that seems unsettling, certainly unsettling to those who opted for the theological option. So, the view that it is part of our very nature to make moral discriminations is very much in Hume. Further — and what is relevant to the development of utilitarianism — the view of Shaftesbury that the virtuous person contributes to the good of the whole would figure into Hume's writings, though modified. It is the virtue that contributes to the good of the whole system, in the case of Hume's artificial virtues. Shaftesbury held that in judging someone virtuous or good in a moral sense we need to perceive that person's impact on the systems of which he or she is a part. Here it sometimes becomes difficult to disentangle egoistic versus utilitarian lines of thought in Shaftesbury. He clearly states that whatever guiding force there is has made nature such that it is “…the private interest and good of every one, to work towards the general good, which if a creature ceases to promote, he is actually so far wanting to himself, and ceases to promote his own happiness and welfare…” (R, 188). It is hard, sometimes, to discern the direction of the ‘because’ — if one should act to help others because it supports a system in which one's own happiness is more likely, then it really looks like a form of egoism. If one should help others because that's the right thing to do — and, fortunately, it also ends up promoting one's own interests, then that's more like utilitarianism, since the promotion of self-interest is a welcome effect but not what, all by itself, justifies one's character or actions. Further, to be virtuous a person must have certain psychological capacities — they must be able to reflect on character, for example, and represent to themselves the qualities in others that are either approved or disapproved of. …in this case alone it is we call any creature worthy or virtuous when it can have the notion of a public interest, and can attain the speculation or science of what is morally good or ill, admirable or blameable, right or wrong….we never say of….any mere beast, idiot, or changeling, though ever so good-natured, that he is worthy or virtuous. 
(Shaftesbury IVM; BKI, PII, sec. iii) Thus, animals are not objects of moral appraisal on the view, since they lack the necessary reflective capacities. Animals also lack the capacity for moral discrimination and would therefore seem to lack the moral sense. This raises some interesting questions. It would seem that the moral sense is a perception that something is the case. So it isn't merely a discriminatory sense that allows us to sort perceptions. It also has a propositional aspect, so that animals, which are not lacking in other senses, are lacking in this one. The virtuous person is one whose affections, motives, dispositions are of the right sort, not one whose behavior is simply of the right sort and who is able to reflect on goodness, and her own goodness [see Gill]. Similarly, the vicious person is one who exemplifies the wrong sorts of mental states, affections, and so forth. A person who harms others through no fault of his own “…because he has convulsive fits which make him strike and wound such as approach him” is not vicious since he has no desire to harm anyone and his bodily movements in this case are beyond his control. Shaftesbury approached moral evaluation via the virtues and vices. His utilitarian leanings are distinct from his moral sense approach, and his overall sentimentalism. However, this approach highlights the move away from egoistic views of human nature — a trend picked up by Hutcheson and Hume, and later adopted by Mill in criticism of Bentham's version of utilitarianism. For writers like Shaftesbury and Hutcheson the main contrast was with egoism rather than rationalism. Like Shaftesbury, Francis Hutcheson was very much interested in virtue evaluation. He also adopted the moral sense approach. However, in his writings we also see an emphasis on action choice and the importance of moral deliberation to action choice. Hutcheson, in An Inquiry Concerning Moral Good and Evil, fairly explicitly spelled out a utilitarian principle of action choice. (Joachim Hruschka (1991) notes, however, that it was Leibniz who first spelled out a utilitarian decision procedure.) ….In comparing the moral qualities of actions…we are led by our moral sense of virtue to judge thus; that in equal degrees of happiness, expected to proceed from the action, the virtue is in proportion to the number of persons to whom the happiness shall extend (and here the dignity, or moral importance of persons, may compensate numbers); and, in equal numbers, the virtue is the quantity of the happiness, or natural good; or that the virtue is in a compound ratio of the quantity of good, and number of enjoyers….so that that action is best, which procures the greatest happiness for the greatest numbers; and that worst, which, in like manner, occasions misery. (R, 283–4) Scarre notes that some hold the moral sense approach incompatible with this emphasis on the use of reason to determine what we ought to do; there is an opposition between just apprehending what's morally significant and a model in which we need to reason to figure out what morality demands of us. But Scarre notes these are not actually incompatible: The picture which emerges from Hutcheson's discussion is of a division of labor, in which the moral sense causes us to look with favor on actions which benefit others and disfavor those which harm them, while consequentialist reasoning determines a more precise ranking order of practical options in given situations. 
(Scarre, 53–54) Scarre then uses the example of telling a lie to illustrate: lying is harmful to the person to whom one lies, and so this is viewed with disfavor, in general. However, in a specific case, if a lie is necessary to achieve some notable good, consequentialist reasoning will lead us to favor the lying. But this example seems to put all the emphasis on a consideration of consequences in moral approval and disapproval. Stephen Darwall notes (1995, 216 ff.) that the moral sense is concerned with motives — we approve, for example, of the motive of benevolence, and the wider the scope the better. It is the motives rather than the consequences that are the objects of approval and disapproval. But inasmuch as the morally good person cares about what happens to others, and of course she will, she will rank order acts in terms of their effects on others, and reason is used in calculating effects. So there is no incompatibility at all. Hutcheson was committed to maximization, it seems. However, he insisted on a caveat — that “the dignity or moral importance of persons may compensate numbers.” He added a deontological constraint — that we have a duty to others in virtue of their personhood to accord them fundamental dignity regardless of the numbers of others whose happiness is to be affected by the action in question. Hume was heavily influenced by Hutcheson, who was one of his teachers. His system also incorporates insights made by Shaftesbury, though he certainly lacks Shaftesbury's confidence that virtue is its own reward. In terms of his place in the history of utilitarianism we should note two distinct effects his system had. Firstly, his account of the social utility of the artificial virtues influenced Bentham's thought on utility. Secondly, his account of the role sentiment played in moral judgment and commitment to moral norms influenced Mill's thoughts about the internal sanctions of morality. Mill would diverge from Bentham in developing the ‘altruistic’ approach to Utilitarianism (which is actually a misnomer, but more on that later). Bentham, in contrast to Mill, represented the egoistic branch — his theory of human nature reflected Hobbesian psychological egoism. 2. The Classical Approach The Classical Utilitarians, Bentham and Mill, were concerned with legal and social reform. If anything could be identified as the fundamental motivation behind the development of Classical Utilitarianism it would be the desire to see useless, corrupt laws and social practices changed. Accomplishing this goal required a normative ethical theory employed as a critical tool. What is the truth about what makes an action or a policy a morally good one, or morally right? But developing the theory itself was also influenced by strong views about what was wrong in their society. The conviction that, for example, some laws are bad resulted in analysis of why they were bad. And, for Jeremy Bentham, what made them bad was their lack of utility, their tendency to lead to unhappiness and misery without any compensating happiness. If a law or an action doesn't do any good, then it isn't any good. 2.1 Jeremy Bentham Jeremy Bentham (1748–1832) was influenced both by Hobbes' account of human nature and Hume's account of social utility. He famously held that humans were ruled by two sovereign masters — pleasure and pain. We seek pleasure and the avoidance of pain, they “…govern us in all we do, in all we say, in all we think…” (Bentham PML, 1). 
Yet he also promulgated the principle of utility as the standard of right action on the part of governments and individuals. Actions are approved when they are such as to promote happiness, or pleasure, and disapproved of when they have a tendency to cause unhappiness, or pain (PML). Combine this criterion of rightness with a view that we should be actively trying to promote overall happiness, and one has a serious incompatibility with psychological egoism. Thus, his apparent endorsement of Hobbesian psychological egoism created problems in understanding his moral theory since psychological egoism rules out acting to promote the overall well-being when that is incompatible with one's own. For the psychological egoist, that is not even a possibility. So, given ‘ought implies can’ it would follow that we are not obligated to act to promote overall well-being when that is incompatible with our own. This generates a serious tension in Bentham's thought, one that was drawn to his attention. He sometimes seemed to think that he could reconcile the two commitments empirically, that is, by noting that when people act to promote the good they are helping themselves, too. But this claim only serves to muddy the waters, since the standard understanding of psychological egoism — and Bentham's own statement of his view — identifies motives of action which are self-interested. Yet this seems, again, in conflict with his own specification of the method for making moral decisions which is not to focus on self-interest — indeed, the addition of extent as a parameter along which to measure pleasure produced distinguishes this approach from ethical egoism. Aware of the difficulty, in later years he seemed to pull back from a full-fledged commitment to psychological egoism, admitting that people do sometimes act benevolently — with the overall good of humanity in mind. Bentham also benefited from Hume's work, though in many ways their approaches to moral philosophy were completely different. Hume rejected the egoistic view of human nature. Hume also focused on character evaluation in his system. Actions are significant as evidence of character, but only have this derivative significance. In moral evaluation the main concern is that of character. Yet Bentham focused on act-evaluation. There was a tendency — remarked on by J. B. Schneewind (1990), for example — to move away from focus on character evaluation after Hume and towards act-evaluation. Recall that Bentham was enormously interested in social reform. Indeed, reflection on what was morally problematic about laws and policies influenced his thinking on utility as a standard. When one legislates, however, one is legislating in support of, or against, certain actions. Character — that is, a person's true character — is known, if known at all, only by that person. If one finds the opacity of the will thesis plausible then character, while theoretically very interesting, isn't a practical focus for legislation. Further, as Schneewind notes, there was an increasing sense that focus on character would actually be disruptive, socially, particularly if one's view was that a person who didn't agree with one on a moral issue was defective in terms of his or her character, as opposed to simply making a mistake reflected in action. But Bentham does take from Hume the view that utility is the measure of virtue — that is, utility more broadly construed than Hume's actual usage of the term. 
This is because Hume made a distinction between pleasure that the perception of virtue generates in the observer, and social utility, which consisted in a trait's having tangible benefits for society, any instance of which may or may not generate pleasure in the observer. But Bentham is not simply reformulating a Humean position — he's merely been influenced by Hume's arguments to see pleasure as a measure or standard of moral value. So, why not move from pleasurable responses to traits to pleasure as a kind of consequence which is good, and in relation to which, actions are morally right or wrong? Bentham, in making this move, avoids a problem for Hume. On Hume's view it seems that the response — corrected, to be sure — determines the trait's quality as a virtue or vice. But on Bentham's view the action (or trait) is morally good, right, virtuous in view of the consequences it generates, the pleasure or utility it produces, which could be completely independent of what our responses are to the trait. So, unless Hume endorses a kind of ideal observer test for virtue, it will be harder for him to account for how it is that people make mistakes in evaluations of virtue and vice. Bentham, on the other hand, can say that people may not respond to the action's good qualities — perhaps they don't perceive the good effects. But as long as there are these good effects which are, on balance, better than the effects of any alternative course of action, then the action is the right one. Rhetorically, anyway, one can see why this is an important move for Bentham to be able to make. He was a social reformer. He felt that people often had responses to certain actions — of pleasure or disgust — that did not reflect anything morally significant at all. Indeed, in his discussions of homosexuality, for example, he explicitly notes that ‘antipathy’ is not sufficient reason to legislate against a practice: The circumstances from which this antipathy may have taken its rise may be worth enquiring to…. One is the physical antipathy to the offence…. The act is to the highest degree odious and disgusting, that is, not to the man who does it, for he does it only because it gives him pleasure, but to one who thinks [?] of it. Be it so, but what is that to him? (Bentham OAO, v. 4, 94) Bentham then notes that people are prone to use their physical antipathy as a pretext to transition to moral antipathy, and the attending desire to punish the persons who offend their taste. This is illegitimate on his view for a variety of reasons, one of which is that to punish a person for violations of taste, or on the basis of prejudice, would result in runaway punishments, “…one should never know where to stop…” The prejudice in question can be dealt with by showing it “to be ill-grounded”. This reduces the antipathy to the act in question. This demonstrates an optimism in Bentham. If a pain can be demonstrated to be based on false beliefs then he believes that it can be altered or at the very least ‘assuaged and reduced’. This is distinct from the view that a pain or pleasure based on a false belief should be discounted. Bentham does not believe the latter. Thus Bentham's hedonism is a very straightforward hedonism. The one intrinsic good is pleasure, the bad is pain. We are to promote pleasure and act to reduce pain. 
When called upon to make a moral decision one measures an action's value with respect to pleasure and pain according to the following: intensity (how strong the pleasure or pain is), duration (how long it lasts), certainty (how likely the pleasure or pain is to be the result of the action), proximity (how close the sensation will be to performance of the action), fecundity (how likely it is to lead to further pleasures or pains), purity (how much intermixture there is with the other sensation). One also considers extent — the number of people affected by the action. Keeping track of all of these parameters can be complicated and time consuming. Bentham does not recommend that they figure into every act of moral deliberation because of the efficiency costs which need to be considered. Experience can guide us. We know that the pleasure of kicking someone is generally outweighed by the pain inflicted on that person, so such calculations when confronted with a temptation to kick someone are unnecessary. It is reasonable to judge it wrong on the basis of past experience or consensus. One can use ‘rules of thumb’ to guide action, but these rules are overridable when abiding by them would conflict with the promotion of the good. Bentham's view was surprising to many at the time at least in part because he viewed the moral quality of an action to be determined instrumentally. It isn't so much that there is a particular kind of action that is intrinsically wrong; actions that are wrong are wrong simply in virtue of their effects, thus, instrumentally wrong. This cut against the view that there are some actions that by their very nature are just wrong, regardless of their effects. Some may be wrong because they are ‘unnatural’ — and, again, Bentham would dismiss this as a legitimate criterion. Some may be wrong because they violate liberty, or autonomy. Again, Bentham would view liberty and autonomy as good — but good instrumentally, not intrinsically. Thus, any action deemed wrong due to a violation of autonomy is derivatively wrong on instrumental grounds as well. This is interesting in moral philosophy — as it is far removed from the Kantian approach to moral evaluation as well as from natural law approaches. It is also interesting in terms of political philosophy and social policy. On Bentham's view the law is not monolithic and immutable. Since effects of a given policy may change, the moral quality of the policy may change as well. Nancy Rosenblum noted that for Bentham one doesn't simply decide on good laws and leave it at that: “Lawmaking must be recognized as a continual process in response to diverse and changing desires that require adjustment” (Rosenblum 1978, 9). A law that is good at one point in time may be a bad law at some other point in time. Thus, lawmakers have to be sensitive to changing social circumstances. To be fair to Bentham's critics, of course, they are free to agree with him that this is the case in many situations, just not all — and that there is still a subset of laws that reflect the fact that some actions just are intrinsically wrong regardless of consequences. Bentham is in the much more difficult position of arguing that effects are all there are to moral evaluation of action and policy. 
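Bentham supplies the parameters but no official formula, so any arithmetic rendering is a reconstruction. Before turning to Mill, here is a minimal sketch, assuming (purely for illustration) that an episode's value is its signed intensity multiplied by its duration and discounted by its certainty, with extent captured by summing over everyone affected; proximity, fecundity, and purity are omitted for brevity:

```python
from dataclasses import dataclass

@dataclass
class Episode:
    """One expected pleasure (positive intensity) or pain (negative
    intensity) accruing to one person affected by the action."""
    intensity: float  # how strong the sensation is (signed)
    duration: float   # how long it lasts
    certainty: float  # how likely it is to follow the action (0 to 1)

def episode_value(e: Episode) -> float:
    # Magnitude as intensity times duration, discounted by certainty.
    return e.intensity * e.duration * e.certainty

def action_value(episodes: list[Episode]) -> float:
    # "Extent" (the number of people affected) enters simply as the
    # number of episodes summed over.
    return sum(episode_value(e) for e in episodes)

# The kicking example from the text: a brief pleasure for the kicker,
# a near-certain, longer-lasting pain for the person kicked.
kick = [
    Episode(intensity=2, duration=1, certainty=0.9),    # kicker's pleasure
    Episode(intensity=-8, duration=3, certainty=0.95),  # victim's pain
]
print(action_value(kick))  # 1.8 - 22.8 = -21.0, so the act scores badly
```

The point of the sketch is only that, once the parameters are scored, the calculation is mechanical; as the text notes, Bentham thinks experience usually spares us from actually performing it.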
2.2 John Stuart Mill John Stuart Mill (1806–1873) was a follower of Bentham, and, through most of his life, greatly admired Bentham's work even though he disagreed with some of Bentham's claims — particularly on the nature of ‘happiness.’ Bentham, recall, had held that there were no qualitative differences between pleasures, only quantitative ones. This left him open to a variety of criticisms. First, Bentham's hedonism was too egalitarian. Simple-minded pleasures, sensual pleasures, were just as good, at least intrinsically, as more sophisticated and complex pleasures. The pleasure of drinking a beer in front of the T.V. surely doesn't rate as highly as the pleasure one gets solving a complicated math problem, or reading a poem, or listening to Mozart. Second, Bentham's view that there were no qualitative differences in pleasures also left him open to the complaint that on his view human pleasures were of no more value than animal pleasures and, third, committed him to the corollary that the moral status of animals, tied to their sentience, was the same as that of humans. While harming a puppy and harming a person are both bad, however, most people had the view that harming the person was worse. Mill sought changes to the theory that could accommodate those sorts of intuitions. To this end, Mill's hedonism was influenced by perfectionist intuitions. There are some pleasures that are more fitting than others. Intellectual pleasures are of a higher, better, sort than the ones that are merely sensual, and that we share with animals. To some this seems to mean that Mill really wasn't a hedonistic utilitarian. His view of the good did radically depart from Bentham's view. However, as with Bentham, the good still consists in pleasure; it is still a psychological state. There is certainly that similarity. Further, the basic structures of the theories are the same (for more on this see Donner 1991). While it is true that Mill is more comfortable with notions like ‘rights’ this does not mean that he, in actuality, rejected utilitarianism. The rationale for all the rights he recognizes is utilitarian. Mill's ‘proof’ of the claim that intellectual pleasures are better in kind than others, though, is highly suspect. He doesn't attempt a mere appeal to raw intuition. Instead, he argues that those persons who have experienced both view the higher as better than the lower. Who would rather be a happy oyster, living an enormously long life, than a person living a normal life? Or, to use his most famous example — it is better to be Socrates ‘dissatisfied’ than a fool ‘satisfied.’ In this way Mill was able to solve a problem for utilitarianism. Mill also argued that the principle could be proven, using another rather notorious argument: The only proof capable of being given that an object is visible is that people actually see it…. In like manner, I apprehend, the sole evidence it is possible to produce that anything is desirable is that people do actually desire it. If the end which the utilitarian doctrine proposes to itself were not, in theory and in practice, acknowledged to be an end, nothing could ever convince any person that it was so. (Mill, U, 81) Mill then continues to argue that people desire happiness — the utilitarian end — and that the general happiness is “a good to the aggregate of all persons.” (81) G. E. Moore (1873–1958) criticized this as fallacious. 
He argued that it rested on an obvious ambiguity: Mill has made as naïve and artless a use of the naturalistic fallacy as anybody could desire. “Good”, he tells us, means “desirable”, and you can only find out what is desirable by seeking to find out what is actually desired…. The fact is that “desirable” does not mean “able to be desired” as “visible” means “able to be seen.” The desirable means simply what ought to be desired or deserves to be desired; just as the detestable means not what can be but what ought to be detested… (Moore, PE, 66–7) It should be noted, however, that Mill was offering this as an alternative to Bentham's view, which had itself been criticized as a ‘swine morality,’ locating the good in pleasure in a kind of indiscriminate way. The distinctions he makes strike many as intuitively plausible ones. Bentham, however, can accommodate many of the same intuitions within his system. This is because he notes that there are a variety of parameters along which we quantitatively measure pleasure — intensity and duration are just two of those. His complete list is the following: intensity, duration, certainty or uncertainty, propinquity or remoteness, fecundity, purity, and extent. Thus, what Mill calls the intellectual pleasures will score more highly than the sensual ones along several parameters, and this could give us reason to prefer those pleasures — but it is a quantitative not a qualitative reason, on Bentham's view. When a student decides to study for an exam rather than go to a party, for example, she is making the best decision even though she is sacrificing short term pleasure. That's because studying for the exam, Bentham could argue, scores higher in terms of the long-term pleasures that doing well in school leads to, as well as the fecundity of the pleasure in leading to yet other pleasures. However, Bentham will have to concede that the very happy oyster that lives a very long time could, in principle, have a better life than a normal human. Mill's version of utilitarianism differed from Bentham's also in that he placed weight on the effectiveness of internal sanctions — emotions like guilt and remorse which serve to regulate our actions. This is an off-shoot of the different view of human nature adopted by Mill. We are the sorts of beings that have social feelings, feelings for others, not just ourselves. We care about them, and when we perceive harms to them this causes painful experiences in us. When one perceives oneself to be the agent of that harm, the negative emotions are centered on the self. One feels guilt for what one has done, not for what one sees another doing. Like external forms of punishment, internal sanctions are instrumentally very important to appropriate action. Mill also held that natural features of human psychology, such as conscience and a sense of justice, underwrite motivation. The sense of justice, for example, results from very natural impulses. Part of this sense involves a desire to punish those who have harmed others, and this desire in turn “…is a spontaneous outgrowth from two sentiments, both in the highest degree natural…; the impulse of self-defense, and the feeling of sympathy.” (Chapter 5, Utilitarianism) Of course, he goes on, the justification must be a separate issue. The feeling is there naturally, but it is our ‘enlarged’ sense, our capacity to include the welfare of others into our considerations, and make intelligent decisions, that gives it the right normative force. 
Like Bentham, Mill sought to use utilitarianism to inform law and social policy. The aim of increasing happiness underlies his arguments for women's suffrage and free speech. We can be said to have certain rights, then — but those rights are underwritten by utility. If one can show that a purported right or duty is harmful, then one has shown that it is not genuine. One of Mill's most famous arguments to this effect can be found in his writing on women's suffrage when he discusses the ideal marriage of partners, noting that the ideal exists between individuals of “cultivated faculties” who influence each other equally. Improving the social status of women was important because they were capable of these cultivated faculties, and denying them access to education and other opportunities for development is forgoing a significant source of happiness. Further, the men who would deny women the opportunity for education, self-improvement, and political expression do so out of base motives, and the resulting pleasures are not ones that are of the best sort. Bentham and Mill both attacked social traditions that were justified by appeals to natural order. The correct appeal is to utility itself. Traditions often turned out to be “relics” of “barbarous” times, and appeals to nature as a form of justification were just ways to try to rationalize continued deference to those relics. In the latter part of the 20th century some writers criticized utilitarianism for its failure to accommodate virtue evaluation. However, though virtue is not the central normative concept in Mill's theory, it is an extremely important one. In Chapter 4 of Utilitarianism Mill noted … does the utilitarian doctrine deny that people desire virtue, or maintain that virtue is not a thing to be desired? The very reverse. It maintains not only that virtue is to be desired, but also that it is to be desired disinterestedly, for itself. Whatever may be the opinion of utilitarian moralists as to the original conditions by which virtue is made virtue … they not only place virtue at the very head of things which are good as a means to the ultimate end, but they also recognize as a psychological fact the possibility of its being, to the individual, a good in itself, without looking to any end beyond it; and hold, that the mind is not in a right state, not in a state conformable to Utility, not in the state most conducive to the general happiness, unless it does love virtue in this manner … In Utilitarianism Mill argues that virtue not only has instrumental value, but is constitutive of the good life. A person without virtue is morally lacking, and is not as able to promote the good. However, this view of virtue is somewhat complicated by rather cryptic remarks Mill makes about virtue in his A System of Logic in the section in which he discusses the “Art of Life.” There he seems to associate virtue with aesthetics, and morality is reserved for the sphere of ‘right’ or ‘duty’. Wendy Donner notes that separating virtue from right allows Mill to solve another problem for the theory: the demandingness problem (Donner 2011). This is the problem that holds that if we ought to maximize utility, if that is the right thing to do, then doing right requires enormous sacrifices (under actual conditions), and that requiring such sacrifices is too demanding. With duties, on Mill's view, it is important that we get compliance, and that justifies coercion. 
In the case of virtue, however, virtuous actions are those which it is “…for the general interest that they remain free.”

3. Henry Sidgwick

Henry Sidgwick's (1838–1900) The Methods of Ethics (1874) is one of the most well-known works in utilitarian moral philosophy, and deservedly so. It offers a defense of utilitarianism, though some writers (Schneewind 1977) have argued that it should not primarily be read as such. In The Methods Sidgwick is concerned with developing an account of “…the different methods of Ethics that I find implicit in our common moral reasoning…” These methods are egoism, intuition-based morality, and utilitarianism. On Sidgwick's view, utilitarianism is the more basic theory. A simple reliance on intuition, for example, cannot resolve fundamental conflicts between values or rules, such as Truth and Justice. In Sidgwick's words, “…we require some higher principle to decide the issue…” That will be utilitarianism. Further, the rules which seem to be a fundamental part of common sense morality are often vague and underdescribed, and applying them will actually require appeal to something theoretically more basic — again, utilitarianism. Yet further, absolute interpretations of rules seem highly counter-intuitive, and yet we need some justification for any exceptions — provided, again, by utilitarianism. Sidgwick thus provides a compelling case for the theoretical primacy of utilitarianism.

Like Bentham and Mill, Sidgwick was a British philosopher, and his views developed out of and in response to theirs. His Methods offers an engagement with the theory as it had been presented before him, an exploration of it and the main alternatives, as well as a defense. Sidgwick was also concerned with clarifying fundamental features of the theory, and in this respect his account has been enormously influential for later writers, not only for utilitarians and consequentialists generally, but for intuitionists as well. Sidgwick's thorough and penetrating discussion of the theory raised many of the concerns that have been developed by recent moral philosophers.

One extremely controversial feature of Sidgwick's views relates to his rejection of a publicity requirement for moral theory. He writes:

Thus, the Utilitarian conclusion, carefully stated, would seem to be this; that the opinion that secrecy may render an action right which would not otherwise be so should itself be kept comparatively secret; and similarly it seems expedient that the doctrine that esoteric morality is expedient should itself be kept esoteric. Or, if this concealment be difficult to maintain, it may be desirable that Common Sense should repudiate the doctrines which it is expedient to confine to an enlightened few. And thus a Utilitarian may reasonably desire, on Utilitarian principles, that some of his conclusions should be rejected by mankind generally; or even that the vulgar should keep aloof from his system as a whole, in so far as the inevitable indefiniteness and complexity of its calculations render it likely to lead to bad results in their hands. (490)

This accepts that utilitarianism may be self-effacing; that is, that it may be best if people do not believe it, even though it is true.
Further, it rendered the theory subject to Bernard Williams' (1995) criticism that the theory really simply reflected the colonial elitism of Sidgwick's time, that it was ‘Government House Utilitarianism.’ The elitism in his remarks may reflect a broader attitude, one in which the educated are considered better policy makers than the uneducated.

One issue raised in the above remarks is relevant to practical deliberation in general. To what extent should proponents of a given theory, or a given rule, or a given policy — or even proponents of a given one-off action — consider what they think people will actually do, as opposed to what they think those same people ought to do (under full and reasonable reflection, for example)? This is an example of something that comes up in the actualism/possibilism debate in accounts of practical deliberation. Extrapolating from the example used above, we have people who advocate telling the truth, or what they believe to be the truth, even if the effects are bad because the truth is somehow misused by others. On the other hand are those who recommend not telling the truth when it is predicted that the truth will be misused by others to achieve bad results. Of course, the truth ought not to be misused, and its misuse can be avoided and is not inevitable; still, the misuse is entirely predictable. Sidgwick seems to recommend that we follow the course that we predict will have the best outcome, taking as part of our calculations the data that others may fail in some way — either due to having bad desires, or simply to not being able to reason effectively. The worry Williams points to really isn't a worry specifically with utilitarianism (Driver 2011). Sidgwick would point out that if it is bad to hide the truth, because ‘Government House’ types, for example, typically engage in self-deceptive rationalizations of their policies (which seems entirely plausible), then one shouldn't do it. And of course, that heavily influences our intuitions.

Sidgwick raised issues that run much deeper into our basic understanding of utilitarianism. For example, the way earlier utilitarians characterized the principle of utility left open serious indeterminacies. The major one rests on the distinction between total and average utility. He raised the issue in the context of population growth and increasing utility levels by increasing numbers of people (or sentient beings):

Assuming, then, that the average happiness of human beings is a positive quantity, it seems clear that, supposing the average happiness enjoyed remains undiminished, Utilitarianism directs us to make the number enjoying it as great as possible. But if we foresee as possible that an increase in numbers will be accompanied by a decrease in average happiness or vice versa, a point arises which has not only never been formally noticed, but which seems to have been substantially overlooked by many Utilitarians. For if we take Utilitarianism to prescribe, as the ultimate end of action, happiness on the whole, and not any individual's happiness, unless considered as an element of the whole, it would follow that, if the additional population enjoy on the whole positive happiness, we ought to weigh the amount of happiness gained by the extra number against the amount lost by the remainder.
(415)

For Sidgwick, the conclusion on this issue is not simply to strive for greater average utility, but to increase population to the point where we maximize the product of the number of persons who are currently alive and the amount of average happiness. So it seems to be a hybrid, total-average view. (For example, a population of 100 persons each at happiness level 10 yields a product of 1,000, whereas a population of 200 each at level 6 yields 1,200; the directive favors the larger population even though its average happiness is lower.) This discussion also raised the issue of policy with respect to population growth, and both issues would be pursued in more detail by later writers, most notably Derek Parfit (1986).

4. Ideal Utilitarianism

G. E. Moore strongly disagreed with the hedonistic value theory adopted by the Classical Utilitarians. Moore agreed that we ought to promote the good, but believed that the good included far more than what could be reduced to pleasure. He was a pluralist, rather than a monist, regarding intrinsic value. For example, he believed that ‘beauty’ was an intrinsic good. A beautiful object had value independent of any pleasure it might generate in a viewer. Thus, Moore differed from Sidgwick, who regarded the good as consisting in some consciousness. Some objective states in the world are intrinsically good, and on Moore's view, beauty is just such a state. He used one of his more notorious thought experiments to make this point: he asked the reader to compare two worlds, one entirely beautiful, full of things which complement each other; the other a hideous, ugly world, filled with “everything that is most disgusting to us.” Further, there are no human beings, one imagines, around to appreciate or be disgusted by either world. The question then is, which of these worlds is better, which one's existence would be better than the other's? Of course, Moore believed it was clear that the beautiful world was better, even though no one was around to appreciate its beauty.

This emphasis on beauty was one facet of Moore's work that made him a darling of the Bloomsbury Group. If beauty was part of the good independently of its effects on the psychological states of others — independently, really, of how it affected others — then one needn't sacrifice morality on the altar of beauty any more. Following beauty is not a mere indulgence, but may even be a moral obligation. Though Moore himself certainly never applied his view to such cases, it does provide the resources for dealing with at least some of what the contemporary literature has dubbed ‘admirable immorality’ cases. Gauguin may have abandoned his wife and children, but it was to a beautiful end.

Moore's targets in arguing against hedonism were the earlier utilitarians who argued that the good was some state of consciousness such as pleasure. He actually waffled on this issue a bit, but always disagreed with hedonism in that even when he held that beauty all by itself was not an intrinsic good, he also held that for the appreciation of beauty to be a good the beauty must actually be there, in the world, and not be the result of illusion. Moore further criticized the view that pleasure itself was an intrinsic good, since it failed a kind of isolation test that he proposed for intrinsic value. If one compared an empty universe with a universe of sadists, the empty universe would strike one as better. This is true even though there is a good deal of pleasure, and no pain, in the universe of sadists. This would seem to indicate that what is necessary for the good is at least the absence of bad intentionality. The pleasures of sadists, in virtue of their desires to harm others, get discounted — they are not good, even though they are pleasures.
Note this radical departure from Bentham, who held that even malicious pleasure was intrinsically good, and that if nothing instrumentally bad attached to the pleasure, it was wholly good as well.

One of Moore's important contributions was to put forward an ‘organic unity’ or ‘organic whole’ view of value. The principle of organic unity is vague, and there is some disagreement about what Moore actually meant in presenting it. Moore states that ‘organic’ is used “…to denote the fact that a whole has an intrinsic value different in amount from the sum of the values of its parts.” (PE, 36) And, for Moore, that is all it is supposed to denote. So, for example, one cannot determine the value of a body by adding up the value of its parts. Some parts of the body may have value only in relation to the whole. An arm or a leg, for example, may have no value at all separated from the body, but have a great deal of value attached to the body, and may even increase the body's value. In the section of Principia Ethica on the Ideal, the principle of organic unity comes into play in noting that when persons experience pleasure through perception of something beautiful (which involves a positive emotion in the face of a recognition of an appropriate object — an emotive and cognitive set of elements), the experience of the beauty is better when the object of the experience, the beautiful object, actually exists. The idea was that experiencing beauty has a small positive value, and existence of beauty has a small positive value, but combining them has a great deal of value, more than the simple addition of the two small values (PE, 189 ff.); a toy numerical sketch of this non-additive structure is given at the end of this entry. Moore noted: “A true belief in the reality of an object greatly increases the value of many valuable wholes…” (199). This principle in Moore — particularly as applied to the significance of actual existence and value, or knowledge and value — provided utilitarians with tools to meet some significant challenges. For example, deluded happiness would be severely lacking on Moore's view, especially in comparison to happiness based on knowledge.

5. Conclusion

Since the early 20th century utilitarianism has undergone a variety of refinements. After the middle of the 20th century it became more common for writers to identify as ‘consequentialists’, since very few philosophers agree entirely with the view proposed by the Classical Utilitarians, particularly with respect to the hedonistic value theory. But the influence of the Classical Utilitarians has been profound — not only within moral philosophy, but within political philosophy and social policy. The question Bentham asked, “What use is it?”, is a cornerstone of policy formation. It is a completely secular, forward-looking question. The articulation and systematic development of this approach to policy formation is owed to the Classical Utilitarians.
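Moore's organic-unity principle is, at bottom, a claim about the non-additivity of value: the value of a whole is not a function of the summed values of its isolated parts. The short sketch below makes that structure explicit; the Python encoding and every number in it are invented for illustration, since Moore assigns no such quantities.

```python
# Toy model of Moore's organic unities: a whole's value is assigned directly,
# not computed by summing its parts. All numbers here are invented.

VALUES = {
    frozenset({"beauty exists"}): 1.0,                            # small good
    frozenset({"beauty is experienced"}): 1.0,                    # small good
    frozenset({"beauty exists", "beauty is experienced"}): 10.0,  # organic whole
}

def value(state: frozenset) -> float:
    # An organic unity is looked up as a whole first ...
    if state in VALUES:
        return VALUES[state]
    # ... and only mere aggregates are valued additively.
    return sum(VALUES.get(frozenset({part}), 0.0) for part in state)

whole = frozenset({"beauty exists", "beauty is experienced"})
parts_summed = sum(value(frozenset({part})) for part in whole)
print(value(whole), ">", parts_summed)  # 10.0 > 2.0: the whole exceeds its parts
```

The lookup-first design mirrors Moore's own point: the veridical experience of existing beauty must be valued as a unit, which is also why, on his view, deluded happiness falls so far short of happiness based on knowledge.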


The Medieval Problem of Universals

1. Introduction

The inherent problems with Plato’s original theory were recognized already by Plato himself. In his Parmenides Plato famously raised several difficulties, for which he apparently did not provide satisfactory answers. Aristotle (384–322 B.C.E.), with all due reverence to his teacher, consistently rejected Plato’s theory, and heavily criticized it throughout his own work (hence the famous saying, amicus Plato sed magis amica veritas).[1] Nevertheless, despite this explicit doctrinal conflict, Neo-Platonic philosophers, pagans (such as Plotinus, ca. 204–270, and Porphyry, ca. 234–305) and Christians (such as Augustine, 354–430, and Boethius, ca. 480–524) alike, observed a basic concordance between Plato’s and Aristotle’s approach, crediting Aristotle with an explanation of how the human mind acquires its universal concepts of particular things from experience, and Plato with an explanation of how the universal features of particular things are established by being modeled after their universal archetypes.[2] In any case, it was this general attitude toward the problem in late antiquity that set the stage for the ever more sophisticated medieval discussions.[3]

In these discussions, the concepts of the human mind, therefore, were regarded as posterior to the particular things represented by these concepts, and hence they were referred to as universalia post rem (‘universals after the thing’). The universal features of singular things, inherent in these things themselves, were referred to as universalia in re (‘universals in the thing’), answering to the universal exemplars in the divine mind, the universalia ante rem (‘universals before the thing’).[4] All of these (universal concepts, universal features of singular things, and their exemplars) are expressed and signified by means of some obviously universal signs, the universal (or common) terms of human languages. For example, the term ‘man’ in English is a universal term, because it is truly predicable of all men in one and the same sense, as opposed to the singular term ‘Socrates’, which in the same sense, i.e., when not used equivocally, is only predicable of one man (hence the need to add an ordinal number to the names of kings and popes of the same name).

Depending on which of these items (universal features of singular things, their universal concepts, or their universal names) they regarded as the primary, really existing universals, it is customary to classify medieval authors as realists, conceptualists, or nominalists, respectively. The realists are supposed to be those who assert the existence of real universals in and/or before particular things, the conceptualists those who allow universals only, or primarily, as concepts of the mind, whereas nominalists would be those who would acknowledge only, or primarily, universal words. But this rather crude classification does not adequately reflect the genuine, much more subtle differences of opinion between medieval thinkers. (No wonder one often finds in the secondary literature distinctions between “moderate” and “extreme” versions of these crudely defined positions.) In the first place, nearly all medieval thinkers agreed on the existence of universals before things in the form of divine ideas existing in the divine mind,[5] but all of them denied their existence in the form of the mind-independent, real, eternal entities originally posited by Plato.
Furthermore, medieval thinkers also agreed that particular things have certain features which the human mind is able to comprehend in a universal fashion and signify by means of universal terms. As we shall see, their disagreements concerned rather the types of relationships that hold between the particular things, their individual, yet universally comprehensible features, the universal concepts of the mind, and the universal terms of our languages, as well as the ontological status of, and distinctions between, the individualized features of the things and the universal concepts of the mind. Nevertheless, the distinction between “realism” and “nominalism”, especially when it is used to refer to the distinction between the radically different ways of doing philosophy and theology in late-medieval times, is quite justifiable, provided we clarify what really separated these ways, as I hope to do in the later sections of this article.

In this brief summary account, I will survey the problem both from a systematic and from a historical point of view. In the next section I will first motivate the problem by showing how naturally the questions concerning universals emerge if we consider how we come to know a universal claim, i.e., one that concerns a potentially infinite number of particulars of a given kind, in a simple geometrical demonstration. I will also briefly indicate why a naïve Platonic answer to these questions in terms of the theory of perfect Forms, however plausible it may seem at first, is inadequate. In the third section, I will briefly discuss how the specific medieval questions concerning universals emerged, especially in the context of answering Porphyry’s famous questions in his introduction to Aristotle’s Categories, which will naturally lead us to a discussion of Boethius’ Aristotelian answers to these questions in his second commentary on Porphyry in the fourth section. However, Boethius’ Aristotelian answers anticipated only one side of the medieval discussions: the mundane, philosophical theory of universals, in terms of Aristotelian abstractionism. The other important, Neo-Platonic, theological side of the issue, provided by Boethius and, most importantly, by St. Augustine, was for medieval thinkers the theory of ontologically primary universals as the creative archetypes of the divine mind, the Divine Ideas. Therefore, the fifth section is going to deal with the main ontological and epistemological problems generated by this theory, namely, the apparent conflict between divine simplicity and the multiplicity of divine ideas, on the one hand, and the tension between the Augustinian theory of divine illumination and Aristotelian abstractionism, on the other. Some details of the early medieval Boethian-Aristotelian approach to the problem and its combination with the Neo-Platonic Augustinian tradition before the influx of the newly recovered logical, metaphysical, and physical writings of Aristotle and their Arabic commentaries in the second half of the 12th century will be taken up in the sixth section, in connection with Abelard’s (1079–1142) discussion of Porphyry’s questions. The seventh section will discuss some details of the characteristic metaphysical approach to the problem in the 13th century, especially as it was shaped by the influence of Avicenna’s (980–1037) doctrine of common nature.
The eighth section outlines the most general features of the logical conceptual framework that served as the common background for the metaphysical disagreements among the authors of this period. I will argue that it is precisely this common logical-semantical framework that allows the grouping together of authors who endorse sometimes radically different metaphysics and epistemologies (not only in this period, but also much later, well into the early modern period) as belonging to what in later medieval philosophy came to be known as the “realist” via antiqua, the “old way” of doing philosophy and theology. By contrast, it was precisely the radically different logical-semantical approach initiated by William Ockham (ca. 1280–1350), and articulated and systematized most powerfully by Jean Buridan (ca. 1300–1358), that distinguished the “nominalist” via moderna, the “modern way” of doing philosophy and theology from the second half of the 14th century. The general, distinctive characteristics of this “modern way” will be discussed in the ninth section. Finally, the concluding tenth section will briefly indicate how the separation of the two viae, in addition to a number of extrinsic social factors, contributed to the disintegration of scholastic discourse, and thereby to the disappearance of the characteristically medieval problem of universals, as well as to the re-emergence of recognizably the same problem in different guises in early modern philosophy.

2. The Emergence of the Problem

It is easy to see how the problem of universals emerges if we consider a geometrical demonstration, for example, the demonstration of Thales’ theorem. According to the theorem, any triangle inscribed in a semicircle is a right triangle, as is shown in the following diagram:

[Diagram: a triangle ABD inscribed in a semicircle, with the right angle to be proved at vertex D and the center of the circle at C.]

Looking at this diagram, we can see that all we need to prove is that the angle at vertex D of triangle ABD is a right angle. The proof is easy once we realize that since the lines AC, DC, and BC are radii of the same circle, the triangles ACD and DCB are isosceles triangles, whence their base angles are equal. For then, if we denote the angles of ABD by the names of their vertices, this fact entails that D = A + B; and so, since A + B + D = 180°, it follows that 2A + 2B = 180°; therefore, A + B = 90°, that is, D = 90°, q.e.d.

Of course, from our point of view, the important thing about this demonstration is not so much the truth of its conclusion as the way it proves this conclusion. For the conclusion is a universal theorem, which has to concern all possible triangles inscribed in any possible semicircle whatsoever, not just the one inscribed in the semicircle in the figure above. Yet, apparently, in the demonstration above we were talking only about that triangle. So, how can we claim that whatever we managed to prove concerning that particular triangle will hold for all possible triangles?

If we take a closer look at the diagram, we can easily see the appeal of the Platonic answer to this question. For upon a closer look, it is clear that, despite appearances to the contrary, this demonstration cannot be about the triangle in this diagram. Indeed, in the demonstration we assumed that the lines AC, DC, and BC were all perfectly equal, straight lines. However, if we zoom in on the figure, we can clearly see that these lines are far from being equal; in fact, they are not even straight lines:

[Figure: a magnified detail of the same diagram, showing the drawn lines as jagged, unequal bands of pixels.]

The demonstration was certainly not about the collection of jagged black surfaces that we can see here.
Rather, the demonstration concerned something we did not see with our bodily eyes, but what we had in mind all along, understanding it to be a triangle, with perfectly straight edges, touching a perfect circle in three unextended points, which are all perfectly equidistant from the center of the circle. The figure we could see was only a convenient “reminder” of what we are supposed to have in mind when we want to prove that a certain property, namely, that it is a right triangle, has to belong to the object in our mind in virtue of what it is, namely, a triangle inscribed in a semicircle. Obviously, the conclusion applies perfectly only to the perfect triangle we had in mind, whereas it holds for the visible figure only insofar as, and to the extent that, this figure resembles the object we had in mind. But this figure fails to have this property precisely insofar as, and to the extent that, it falls short of the object in our mind. However, on the basis of this point it should also be clear that the conclusion does apply to this figure, and every other visible triangle inscribed in a semicircle as well, insofar as, and to the extent that, it manages to imitate the properties of the perfect object in our mind. Therefore, the Platonic answer to the question of what this demonstration was about, namely, that it was about a perfect, ideal triangle, which is invisible to the eyes, but is graspable by our understanding, at once provides us with an explanation of the possibility of universal, necessary knowledge. By knowing the properties of the Form or Idea, we know all its particulars, i.e., all the things that imitate it, insofar as they imitate or participate in it. So, the Form itself is a universal entity, a universal model of all its particulars; and since it is the knowledge of this universal entity that can enable us to know at once all its particulars, it is absolutely vital for us to know what it is, what it is like, and exactly how it is related to its particulars. However, obviously, all these questions presuppose that it is at all, namely, that such a universal entity exists. But the existence of such an entity seems to be rather precarious. Consider, for instance, the perfect triangle we were supposed to have in mind during the demonstration of Thales’ theorem. If it is a perfect triangle, it obviously has to have three sides, since a perfect triangle has to be a triangle, and nothing can be a triangle unless it has three sides. But of those three sides either at least two are equal or none, that is to say, the triangle in question has to be either isosceles or scalene (taking ‘isosceles’ broadly, including even equilateral triangles, for the sake of simplicity). However, since it is supposed to be the universal model of all triangles, and not only of isosceles triangles, this perfect triangle cannot be an isosceles, and for the same reason it cannot be a scalene triangle either. Therefore, such a universal triangle would have to have inconsistent properties, namely, both that it is either isosceles or scalene and that it is neither isosceles nor scalene. However, obviously nothing can have these properties at the same time, so nothing can be a universal triangle any more than a round square. So, apparently, no universal triangle can exist. But then, what was our demonstration about? 
Just a little while ago, we concluded that it could not be directly about any particular triangle (for it was not about the triangle in the figure, and it was even less about any other particular triangle not in the figure), and now we had to conclude that it could not be about a universal triangle either. But are there any further alternatives? It seems obvious that through this demonstration we do gain universal knowledge concerning all particulars. Yet it is also clear that we do not, indeed, we cannot gain this knowledge by examining all particulars, both because they are potentially infinite and because none of them perfectly satisfies the conditions stated in the demonstration. So, there must have been something wrong in our characterization of the universal, which compelled us to conclude that, in accordance with that characterization, universals could not exist. Therefore, we are left with a whole bundle of questions concerning the nature and characteristics of universals, questions that cannot be left unanswered if we want to know how universal, necessary knowledge is possible, if at all.

3. The Origin of the Specifically Medieval Problem of Universals

What we may justifiably call the first formulation of “the medieval problem of universals” (distinguishing it from the both logically and historically related ancient problems of Plato’s Theory of Forms) was precisely such a bundle of questions famously raised by Porphyry in his Isagoge, that is, his Introduction to Aristotle’s Categories. As he wrote:

(1) Since, Chrysaorius, to teach about Aristotle’s Categories it is necessary to know what genus and difference are, as well as species, property, and accident, and since reflection on these things is useful for giving definitions, and in general for matters pertaining to division and demonstration, therefore I shall give you a brief account and shall try in a few words, as in the manner of an introduction, to go over what our elders said about these things. I shall abstain from deeper enquiries and aim, as appropriate, at the simpler ones. (2) For example, I shall beg off saying anything about (a) whether genera and species are real or are situated in bare thoughts alone, (b) whether as real they are bodies or incorporeals, and (c) whether they are separated or in sensibles and have their reality in connection with them. Such business is profound, and requires another, greater investigation. Instead I shall now try to show how the ancients, the Peripatetics among them most of all, interpreted genus and species and the other matters before us in a more logical fashion. [Porphyry, Isagoge, in Spade 1994 (henceforth, Five Texts), p. 1.]

Even though in this way, by relegating them to a “greater investigation”, Porphyry left these questions unanswered, they certainly proved to be irresistible for his medieval Latin commentators, beginning with Boethius, who produced not just one, but two commentaries on Porphyry’s text; the first based on Marius Victorinus’s (fl. 4th c.) translation, and the second on his own.[6] In the course of his argument, Boethius makes it quite clear what sort of entity a universal would have to be. A universal must be common to several particulars (1) in its entirety, and not only in part; (2) simultaneously, and not in a temporal succession; and (3) it should constitute the substance of its particulars.[7] However, as Boethius argues, nothing in real existence can satisfy these conditions. The main points of his argument can be reconstructed as follows.
Anything that is common to many things in the required manner has to be simultaneously, and as a whole, in the substance of these many things. But these many things are several beings precisely because they are distinct from one another in their being, that is to say, the act of being of one is distinct from the act of being of the other. However, if the universal constitutes the substance of a particular, then it has to have the same act of being as the particular, because constituting the substance of something means precisely this, namely, sharing the act of being of the thing in question, as the thing’s substantial part. But the universal is supposed to constitute the substance of all of its distinct particulars, as a whole, at the same time. Therefore, the one act of being of the universal entity would have to be identical with all the distinct acts of being of its several particulars at the same time, which is impossible.[8] This argument, therefore, establishes that no one thing can be a universal in its being, that is to say, nothing can be both one being and common to many beings in such a manner that it shares its act of being with those many beings, constituting their substance. This can easily be visualized in the following diagram, where the tiny lightning bolts indicate the acts of being of the entities involved, namely, a woman, a man, and their universal humanity (the larger dotted figure).

[Diagram: a woman and a man, each marked with its own act of being, enclosed by a larger dotted figure representing their supposed universal humanity with a single act of being of its own.]

But then, Boethius goes on, we should perhaps say that the universal is not one being, but rather many beings, that is, [the collection of][9] those constituents of the individual essences of its particulars on account of which they all fall under the same universal predicable. For example, on this conception, the genus ‘animal’ would not be some one entity, a universal animality over and above the individual animals, yet somehow sharing its being with them all (since, as we have just seen, that is impossible), but rather [the collection of] the individual animalities of all animals. Boethius rejects this suggestion on the ground that whenever there are several generically similar entities, they have to have a genus; therefore, just as the individual animals had to have a genus, so too, their individual animalities would have to have another one. However, since the genus of animalities cannot be one entity, some ‘super-animality’ (for the same reason that the genus of animals could not be one entity, on the basis of the previous argument), it seems that the genus of animalities would have to be a number of further ‘super-animalities’. But then again, the same line of reasoning should apply to these ‘super-animalities’, giving rise to a number of ‘super-super-animalities’, and so on to infinity, which is absurd. Therefore, we cannot regard the genus as some real being even in the form of [a collection of] several distinct entities. Since similar reasonings would apply to the other Porphyrian predicables as well, no universal can exist in this way.

Now, a universal either exists in reality independently of a mind conceiving of it, or it only exists in the mind. If it exists in reality, then it either has to be one being or several beings. But since it cannot exist in reality in either of these two ways, Boethius concludes that it can only exist in the mind.[10] However, to complicate matters, it appears that a universal cannot exist in the mind either. For, as Boethius says, the universal existing in the mind is some universal understanding of some thing outside the mind.
But then this universal understanding is either disposed in the same way as the thing is, or differently. If it is disposed in the same way, then the thing also must be universal, and then we end up with the previous problem of a really existing universal. On the other hand, if it is disposed differently, then it is false, for “what is understood otherwise than the thing is is false” (Five Texts, Spade 1994, p. 23 (21)). But then, all universals in the understanding would have to be false representations of their objects; therefore, no universal knowledge would be possible, whereas our considerations started out precisely from the existence of such knowledge, as seems to be clear, e.g., in the case of geometrical knowledge.

4. Boethius’ Aristotelian Solution

Boethius’ solution of the problem stated in this form consists in the rejection of this last argument, by pointing out the ambiguity of the principle that “what is understood otherwise than the thing is is false”. For in one sense this principle states the obvious, namely, that an act of understanding that represents a thing to be otherwise than the thing is is false. This is precisely the reading of this principle that renders it plausible. However, in another sense this principle would state that an act of understanding which represents the thing in a manner which is different from the manner in which the thing exists is false. In this sense, then, the principle would state that if the mode of representation of the act of understanding is different from the mode of being of the thing, then the act of understanding is false. But this is far from plausible. In general, it is simply not true that a representation can be true or faithful only if the mode of representation matches the mode of being of the thing represented. For example, a written sentence is a true and faithful representation of a spoken sentence, although the written sentence is a visible, spatial sequence of characters, whereas the spoken sentence is an audible, temporal pattern of articulated sounds. So, what exists as an audible pattern of sounds is represented visually; that is, the mode of existence of the thing represented is radically different from the mode of its representation. In the same way, when particular things are represented by a universal act of thought, the things exist in a particular manner, while they are represented in a universal manner; still, this need not imply that the representation is false. But this is precisely the sense of the principle that the objection exploited. Therefore, since in this sense the principle can be rejected, the objection is not conclusive.[11]

However, it still needs to be shown that in the particular case of universal representation the mismatch between the mode of its representation and the mode of being of the thing represented does not in fact entail the falsity of the representation. This can easily be seen if we consider the fact that the falsity of an act of understanding consists in representing something to be in a way it is not. That is to say, properly speaking, it is only an act of judgment that can be false, by which we think something to be somehow. But a simple act of understanding, by which we simply understand something without thinking it to be somehow, that is, without attributing anything to it, cannot be false.
For example, I can be mistaken if I form in my mind the judgment that a man is running, whereby I conceive a man to be somehow, but if I simply think of a man without attributing either running or not running to him, I certainly cannot make a mistake as to how he is.[12] In the same way, I would be mistaken if I were to think that a triangle is neither isosceles nor scalene, but I am certainly not in error if I simply think of a triangle without thinking either that it is isosceles or that it is scalene. Indeed, it is precisely this possibility that allows me to form the universal mental representation, that is, the universal concept of all particular triangles, regardless of whether they are isosceles or scalene. For when I think of a triangle in general, then I certainly do not think of something that is a triangle and is neither isosceles nor scalene, for that is impossible, but I simply think of a triangle, not thinking that it is an isosceles and not thinking that it is a scalene triangle.

This is how the mind is able to separate in thought what are inseparable in real existence. Being either isosceles or scalene is inseparable from a triangle in real existence. For it is impossible for something to be a triangle, and yet not to be an isosceles and not to be a scalene triangle either. Still, it is not impossible for something to be thought to be a triangle and not to be thought to be an isosceles and not to be thought to be a scalene triangle either (although of course, it still has to be thought to be either-isosceles-or-scalene). This separation in thought of those things that cannot be separated in reality is the process of abstraction.[13] In general, by means of the process of abstraction, our mind (in particular, the faculty of our mind Aristotle calls the active intellect: nous poietikos in Greek, intellectus agens in Latin) is able to form universal representations of particular objects by disregarding what distinguishes them, and conceiving of them only in terms of those of their features in respect of which they do not differ from one another.

In this way, therefore, if universals are regarded as universal mental representations existing in the mind, then the contradictions emerging from the Platonic conception no longer pose a threat. On this Aristotelian conception, universals need not be thought of as somehow sharing their being with all their distinct particulars, for their being simply consists in their being thought of, or rather, in the particulars’ being thought of in a universal manner. This is what Boethius expresses by saying the following in his final replies to Porphyry’s questions:

… genera and species subsist in one way, but are understood in another. They are incorporeal, but subsist in sensibles, joined to sensibles. They are understood, however, as subsisting by themselves, and as not having their being in others. [Five Texts, Spade 1994, p. 25]

But then, if in this way, by positing universals in the mind, the most obvious inconsistencies of Plato’s doctrine can be avoided, no wonder that Plato’s “original” universals, the universal models which particulars try to imitate by their features, found their place, in accordance with the long-standing Neo-Platonic tradition, in the divine mind.[14] It is this tradition that explains Boethius’ cautious formulation of his conclusion concerning Aristotelianism pure and simple, as not providing us with the whole story.
As he writes: … Plato thinks that genera and species and the rest are not only understood as universals, but also exist and subsist apart from bodies. Aristotle, however, thinks that they are understood as incorporeal and universal, but subsist in sensibles. I did not regard it as appropriate to decide between their views. For that belongs to a higher philosophy. But we have carefully followed out Aristotle’s view here, not because we would recommend it the most, but because this book, [the Isagoge], is written about the Categories, of which Aristotle is the author. [Five Texts, Spade 1994, p. 25] 5. Platonic Forms as Divine Ideas Besides Boethius, the most important mediator between the Neo-Platonic philosophical tradition and the Christianity of the Medieval Latin West, pointing out also its theological implications, was St. Augustine. In a passage often quoted by medieval authors in their discussions of divine ideas, he writes as follows: … in Latin we can call the Ideas “forms” or “species”, in order to appear to translate word for word. But if we call them “reasons”, we depart to be sure from a proper translation — for reasons are called “logoi” in Greek, not Ideas — but nevertheless, whoever wants to use this word will not be in conflict with the fact. For Ideas are certain principal, stable and immutable forms or reasons of things. They are not themselves formed, and hence they are eternal and always stand in the same relations, and they are contained in the divine understanding. [Spade 1985, Other Internet Resources, p. 383][15] As we could see from Boethius’ solution, in this way, if Platonic Forms are not universal beings existing in a universal manner, but their universality is due to a universal manner of understanding, we can avoid the contradictions arising from the “naïve” Platonic conception. Nevertheless, placing universal ideas in the divine mind as the archetypes of creation, this conception can still do justice to the Platonic intuition that what accounts for the necessary, universal features of the ephemeral particulars of the visible world is the presence of some universal exemplars in the source of their being. It is precisely in virtue of having some insight into these exemplars themselves that we can have the basis of universal knowledge Plato was looking for. As St. Augustine continues: And although they neither arise nor perish, nevertheless everything that is able to arise and perish, and everything that does arise and perish, is said to be formed in accordance with them. Now it is denied that the soul can look upon them, unless it is a rational one, [and even then it can do so] only by that part of itself by which it surpasses [other things] — that is, by its mind and reason, as if by a certain “face”, or by an inner and intelligible “eye”. To be sure, not each and every rational soul in itself, but [only] the one that is holy and pure, that [is the one that] is claimed to be fit for such a vision, that is, the one that keeps that very eye, by which these things are seen, healthy and pure and fair and like the things it means to see. 
What devout man imbued with true religion, even though he is not yet able to see these things, nevertheless dares to deny, or for that matter fails to profess, that all things that exist, that is, whatever things are contained in their own genus with a certain nature of their own, so that they might exist, are begotten by God their author, and that by that same author everything that lives is alive, and that the entire safe preservation and the very order of things, by which changing things repeat their temporal courses according to a fixed regimen, are held together and governed by the laws of a supreme God? If this is established and granted, who dares to say that God has set up all things in an irrational manner? Now if it is not correct to say or believe this, it remains that all things are set up by reason, and a man not by the same reason as a horse — for that is absurd to suppose. Therefore, single things are created with their own reasons. But where are we to think these reasons exist, if not in the mind of the creator? For he did not look outside himself, to anything placed [there], in order to set up what he set up. To think that is sacrilege. But if these reasons of all things to be created and [already] created are contained in the divine mind, and [if] there cannot be anything in the divine mind that is not eternal and unchangeable, and [if] Plato calls these principal reasons of things “Ideas”, [then] not only are there Ideas but they are true, because they are eternal and [always] stay the same way, and [are] unchangeable. And whatever exists comes to exist, however it exists, by participation in them. But among the things set up by God, the rational soul surpasses all [others], and is closest to God when it is pure. And to the extent that it clings to God in charity, to that extent, drenched in a certain way and lit up by that intelligible light, it discerns these reasons, not by bodily eyes but by that principal [part] of it by which it surpasses [everything else], that is, by its intelligence. By this vision it becomes most blessed. These reasons, as was said, whether it is right to call them Ideas or forms or species or reasons, many are permitted to call [them] whatever they want, but [only] to a very few [is it permitted] to see what is true. [Spade 1985, Other Internet Resources, pp. 383–384]

Augustine’s conception, then, saves Plato’s original intuitions, yet without their inconsistencies, while it also combines his philosophical insights with Christianity. But, as a rule, a really intriguing solution of a philosophical problem usually gives rise to a number of further problems. This solution of the original problem with Plato’s Forms is no exception.

5.1 Divine Ideas and Divine Simplicity

First of all, it generates a particular ontological/theological problem concerning the relationship between God and His Ideas. For according to the traditional philosophical conception of divine perfection, God’s perfection demands that He is absolutely simple, without any composition of any sort of parts.[16] So, God and the divine mind are not related to one another as a man and his mind, namely as a substance to one of its several powers, but whatever powers God has He is. Furthermore, the Divine Ideas themselves cannot be regarded as being somehow the eternal products of the divine mind distinct from the divine mind, and thus from God Himself, for the only eternal being is God, and everything else is His creature.
Now, since the Ideas are not creatures, but the archetypes of creatures in God’s mind, they cannot be distinct from God. However, as is clear from the passage above, there are several Ideas, and there is only one God. So how can these several Ideas possibly be one and the same God? Augustine never explicitly raised the problem, but for example Aquinas, who (among others) did, provided the following rather intuitive solution for it (ST1, q. 15, a. 2). The Divine Ideas are in the Divine Mind as its objects, i.e., as the things understood. But the diversity of the objects of an act of understanding need not diversify the act itself (as when understanding the Pythagorean theorem, we understand both squares and triangles). Therefore, it is possible for the self-thinking divine essence to understand itself in a single act of understanding so perfectly that this act of understanding not only understands the divine essence as it is in itself, but also in respect of all possible ways in which it can be imperfectly participated by any finite creature. The cognition of the diversity of these diverse ways of participation accounts for the plurality of divine ideas. But since all these diverse ways are understood in a single eternal act of understanding, which is nothing but the act of divine being, and which in turn is again the divine essence itself, the multiplicity of ideas does not entail any corresponding multiplicity of the divine essence. To be sure, this solution may still give rise to the further questions as to what these diverse ways are, exactly how they are related to the divine essence, and how their diversity is compatible with the unity and simplicity of the ultimate object of divine thought, namely, the divine essence itself. In fact, these are questions that were raised and discussed in detail by authors such as Henry of Ghent (c. 1217–1293), Thomas of Sutton (ca. 1250–1315), Duns Scotus (c. 1266–1308), and others.[17]

5.2 Illuminationism vs. Abstractionism

Another major issue connected to the doctrine of divine ideas, as should also be clear from the previously quoted passage, was the bundle of epistemological questions involved in Augustine’s doctrine of divine illumination. The doctrine — according to which the human soul, especially “one that is holy and pure”, obtains a specific supernatural aid in its acts of understanding, by gaining a direct insight into the Divine Ideas themselves — received philosophical support in terms of a typically Platonic argument in Augustine’s De Libero Arbitrio.[18] The argument can be reconstructed as follows.

The Augustinian Argument for Illumination
1. I can come to know from experience only something that can be found in experience. [self-evident]
2. Absolute unity cannot be found in experience. [assumed]
3. Therefore, I cannot come to know absolute unity from experience. [1,2]
4. Whatever I know, but I cannot come to know from experience, I came to know from a source that is not in this world of experiences. [self-evident]
5. I know absolute unity. [assumed]
6. Therefore, I came to know absolute unity from a source that is not in this world of experiences. [3,4,5]

Proof of 2. Whatever can be found in experience is some material being, extended in space, and so it has to have a multitude of spatially distinct parts. Therefore, it is many in respect of those parts. But what is many in some respect is not one in that respect, and what is not one in some respect is not absolutely one.
Therefore, nothing can be found in experience that is absolutely one, that is, nothing in experience is an absolute unity.

Proof of 5. I know that whatever is given in experience has many parts (even if I may not be able to discern those parts by my senses), and so I know that it is not an absolute unity. But I can have this knowledge only if I know absolute unity, namely, something that is not many in any respect, not even in respect of its parts, for, in general, I can know that something is F in a certain respect, and not an F in some other respect, only if I know what it is for something to be an F without any qualification. (For example, I know that the two halves of a body, taken together, are not absolutely two, for taken one by one, they are not absolutely one, since they are also divisible into two halves, etc. But I can know this only because I know that for obtaining absolutely two things [and not just two multitudes of further things], I would have to have two things that in themselves are absolutely one.) Therefore, I know absolute unity.

It is important to notice here that this argument (crucially) assumes that the intellect is passive in acquiring its concepts. According to this assumption, the intellect merely receives the cognition of its objects as it finds them. By contrast, on the Aristotelian conception, the human mind actively processes the information it receives from experience through the senses. So by means of its faculty appropriately called the active or agent intellect, it is able to produce from a limited number of experiences a universal concept equally representing all possible particulars falling under that concept. In his commentary on Aristotle’s De Anima Aquinas insightfully remarks:

The reason why Aristotle came to postulate an active intellect was his rejection of Plato’s theory that the essences of sensible things existed apart from matter, in a state of actual intelligibility. For Plato there was clearly no need to posit an active intellect. But Aristotle, who regarded the essences of sensible things as existing in matter with only a potential intelligibility, had to invoke some abstractive principle in the mind itself to render these essences actually intelligible. [In De Anima, bk. 3, lc. 10]

On the basis of these and similar considerations, therefore, one may construct a rather plausible Aristotelian counterargument, which is designed to show that we need not necessarily gain our concept of absolute unity from a supernatural source, for it is possible for us to obtain it from experience by means of the active intellect. Of course, similar considerations should apply to other concepts as well.

An Aristotelian-Thomistic counterargument from abstraction
1. I know from experience everything whose concept my active intellect is able to abstract from experience. [self-evident]
2. But my active intellect is able to abstract from experience the concept of unity, since we all experience each singular thing as being one, distinct from another. [self-evident, common experience][19]
3. Therefore, I know unity from experience by abstraction. [1,2]
4. Whenever I know something from experience by abstraction, I know both the thing whose concept is abstracted and its limiting conditions from which its concept is abstracted. [self-evident]
5. Therefore, I know both unity and its limiting conditions from which its concept is abstracted. [3,4]
6. But whenever I know something and its limiting conditions, and I can conceive of it without its limiting conditions (and this is precisely what happens in abstraction), I can conceive of its absolute, unlimited realization. [self-evident]
7. Therefore, I can conceive of the absolute, unlimited realization of unity, based on the concept of unity I acquired from experience by abstraction. [5,6]
8. Therefore, it is not necessary for me to have a preliminary knowledge of absolute unity before all experience, from a source other than this world of experiences. [7]

To be sure, we should notice here that this argument does not falsify the doctrine of illumination. Provided it works, it only invalidates the Augustinian-Platonic argument for illumination. Furthermore, this is obviously not a sweeping, knock-down refutation of the idea that at least some of our concepts perhaps could not so simply be derived from experience by abstraction; in fact, in the particular case of unity, and in general, in connection with our transcendental notions (i.e., notions that apply in each Aristotelian category, so they transcend the limits of each one of them, such as the notions of being, unity, goodness, truth, etc.), even the otherwise consistently Aristotelian Aquinas would have a more complicated story to tell (see Klima 2000b). Nevertheless, although Aquinas would still leave some room for illumination in his epistemology, he would provide for illumination an entirely naturalistic interpretation, as far as the acquisition of our intellectual concepts of material things is concerned, by simply identifying it with the “intellectual light in us”, that is, the active intellect, which enables us to acquire these concepts from experience by abstraction.[20] Duns Scotus, who opposed Aquinas on so many other points, takes basically the same stance on this issue. Other medieval theologians, especially such prominent “Augustinians” as Bonaventure, Matthew of Aquasparta, or Henry of Ghent, would provide greater room for illumination in the form of a direct, specific, supernatural influence needed for human intellectual cognition in this life besides the general divine cooperation needed for the workings of our natural powers, in particular, the abstractive function of the active intellect.[21] But they would not regard illumination as supplanting, but rather as supplementing, intellectual abstraction.

As we could see, Augustine makes recognition of truth dependent on divine illumination, a sort of irradiation of the intelligible light of divine ideas, which is accessible only to the few who are “holy and pure”. But this seems to go against at least:
1. the experience that there are knowledgeable non-believers or pagans;
2. the Aristotelian insight that we can have infallible comprehension of the first principles of scientific demonstrations, for which we only need the intellectual concepts that we can acquire naturally, from experience by abstraction;[22] and
3. the philosophical-theological consideration that if human reason, man’s natural faculty for acquiring truth, were not sufficient for performing its natural function, then human nature would be naturally defective in its noblest part, precisely that in which it was created after the image of God.

In fact, these are only some of the problems explicitly raised and considered by medieval Augustinians, which prompted their ever more refined accounts of the role of illumination in human cognition. For example, Matthew of Aquasparta, recapitulating St. Bonaventure, writes as follows:
Bonaventure, writes as follows: Plato and his followers stated that the entire essence of cognition comes forth from the archetypal or intelligible world, and from the ideal reasons; and they stated that the eternal light contributes to certain cognition in its evidentness as the entire and sole reason for cognition, as Augustine in many places recites, in particular in bk. viii, c. 7 of The City of God: 'The light of minds for the cognition of everything is God himself, who created everything'. But this position is entirely mistaken. For although it appears to secure the way of wisdom, it destroys the way of knowledge. Furthermore, if that light were the entire and sole reason for cognition, then the cognition of things in the Word would not differ from their cognition in their proper kind, neither would the cognition of reason differ from the cognition of revelation, nor philosophical cognition from prophetic cognition, nor cognition by nature from cognition by grace. The other position was apparently that of Aristotle, who claimed that the entire essence of cognition is caused and comes from below, through the senses, memory, and experience, [working together] with the natural light of our active intellect, which abstracts the species from phantasms and makes them actually understood. And for this reason he did not claim that the eternal light is necessary for cognition; indeed, he never spoke about it. And this opinion of his is obvious in bk. 2 of the Posterior Analytics. […] But this position seems to be very deficient. For although it builds the way of knowledge, it totally destroys the way of wisdom. […] Therefore, I take it that one should maintain an intermediate position without prejudice, by stating that our cognition is caused both from below and from above, from external things as well as the ideal reasons. […] God has provided our mind with some intellectual light, by means of which it would abstract the species of objects from the sensibles, by purifying them and extracting their quiddities, which are the per se objects of the intellect. […] But this light is not sufficient, for it is defective, and is mixed with obscurity, unless it is joined and connected to the eternal light, which is the perfect and sufficient reason for cognition, and the intellect attains and somehow touches it by its upper part. However, the intellect attains that light or those eternal reasons as the reason for cognition not as the sole reason, for then, as has been said, cognition in the Word would not differ from cognition in proper kind, nor would the cognition of wisdom differ from the cognition of knowledge. Nor does it attain them as the entire reason, for then it would not need the species and similitudes of things; but this is false, for the Philosopher says, and experience teaches, that if someone loses a sense, then he loses that knowledge of things which comes from that sense. [DHCR, pp. 94–96] In this way, taking the intermediate position between Platonism and Aristotelianism pure and simple, Matthew interprets Augustine's Platonism as being compatible with the Aristotelian view, crediting the Aristotelian position with accounting for the specific empirical content of our intellectual concepts, while crediting the Platonic view with accounting for their certainty in grasping the natures of things. Still, it may not be quite clear exactly what the contribution of the eternal light is, or indeed whether it is necessary at all.
After all, if by abstraction we manage to gain those intellectual concepts that represent the natures of things, what else is needed to have a grasp of those natures? Henry of Ghent, in his detailed account of the issue, provides an interesting answer to this question. Henry first distinguishes the cognition of a true thing from the cognition of the truth of the thing. Since any really existing thing is truly what it is (even if it may on occasion appear to be something else), any cognition of any really existing thing is the cognition of a true thing. But cognition of a true thing may occur without the cognition of its truth, since the latter is the cognition that the thing adequately corresponds to its exemplar in the human or divine mind. For example, if I draw a circle and a cat sees it, then it sees the real, true thing as it is presented to it. Yet the cat is simply unable to judge whether it is a true circle in the sense that it really is what it is supposed to be, namely, a locus of points equidistant from a given point. By contrast, a human being is able to judge the truth of this thing, insofar as he or she would be able to tell that my drawing is not really and truly a circle, but is at best a good approximation of what a true circle would be. Now, in intellectual cognition, just as in the sensory cognition of things, when the intellect simply apprehends a true thing, it still does not have to judge the truth of the thing, even though it may have a true apprehension, adequately representing the thing. The cognition of the truth of the thing only occurs in a judgment, when the intellect judges the adequacy of the thing to its exemplar. But since a thing can be compared to two sorts of exemplar, namely, to the exemplar in the human mind and to the exemplar in the divine mind, the cognition of the truth of a thing is twofold, relative to these two exemplars. The exemplar of the human mind, according to Henry, is nothing but the Aristotelian abstract concept of the thing, whereby the thing is simply apprehended in a universal manner, and hence its truth is judged relative to this concept when the intellect judges whether or not the thing in question falls under this concept. As Henry writes: […] attending to the exemplar gained from the thing as the reason for its cognition in the cognizer, the truth of the thing can indeed be recognized, by forming a concept of the thing that conforms to that exemplar; and it is in this way that Aristotle asserted that man gains knowledge and cognition of the truth from purely natural sources about changeable natural things, and that this exemplar is acquired from things by means of the senses, as from the first principle of art and science. […] So, by means of the universal notion in us that we have acquired from the several species of animals we are able to realize concerning any thing that comes our way whether it is an animal or not, and by means of the specific notion of donkey we realize concerning any thing that comes our way whether it is a donkey or not. [HQO, a. 1, q. 2, fol. 5 E-F] But this sort of cognition of the truth of a thing, although it is intellectual, universal cognition, is far from being the infallible knowledge we are seeking. As Henry argues further: But by this sort of acquired exemplar in us we do not have the entirely certain and infallible cognition of truth.
Indeed, this is entirely impossible for three reasons, the first of which is taken from the thing from which this exemplar is abstracted, the second from the soul, in which this exemplar is received, and the third from the exemplar itself that is received in the soul about the thing. The first reason is that this exemplar, since it is abstracted from changeable things, has to share in the nature of changeability. Therefore, since physical things are more changeable than mathematical objects, this is why the Philosopher claimed that we have a greater certainty of knowledge about mathematical objects than about physical things by means of their universal species. And this is why Augustine, discussing this cause of the uncertainty of the knowledge of natural things in q. 9 of his Eighty-Three Different Questions, says that from the bodily senses one should not expect the pure truth [syncera veritas] … The second reason is that the human soul, since it is changeable and susceptible to error, cannot be rectified to save it from swerving into error by anything that is just as changeable as itself, or even more; therefore, any exemplar that it receives from natural things is necessarily just as changeable as itself, or even more, since it is of an inferior nature, whence it cannot rectify the soul so that it would persist in the infallible truth. … The third reason is that this sort of exemplar, since it is the intention and species of the sensible thing abstracted from the phantasm, is similar to the false as well as to the true [thing], so that on its account these cannot be distinguished. For it is by means of the same images of sensible things that in dreams and madness we judge these images to be the things, and in sane awareness we judge the things themselves. But the pure truth can only be perceived by discerning it from falsehood. Therefore, by means of such an exemplar it is impossible to have certain knowledge, and certain cognition of the truth. And so if we are to have certain knowledge of the truth, then we have to turn our mind away from the senses and sensible things, and from every intention, no matter how universal and abstracted from sensible things, to the unchangeable truth existing above the mind […]. [ibid., fol. 5. F] So, Henry first distinguished between the cognition of a true thing and the intellectual cognition of the truth of a thing, and then, concerning the cognition of the truth of the thing, he distinguished between the cognition of truth by means of a concept abstracted from the thing and “the pure truth” [veritas syncera vel liquida], which he says cannot be obtained by means of such abstracted concepts. But then the question naturally arises: what is this “pure truth”, and how can it be obtained, if at all? Since cognition of the pure truth involves comparison of objects not to their acquired exemplar in the human mind, but to their eternal exemplar in the divine mind, in the ideal case it would consist in some sort of direct insight into the divine ideas, enabling the person who has this access to see everything in its true form, as “God meant it to be”, and also see how it fails to live up to its idea due to its defects. So, it would be like the direct intuition of two objects, one sensible, another intelligible, on the basis of which one could also immediately judge how closely the former approaches the latter. 
But this sort of direct intuition of the divine ideas is only the share of angels and the souls of the blessed in beatific vision; it is generally not granted in this life, except in rare, miraculous cases, in rapture or prophetic vision. Therefore, if there is to be any non-miraculous recognition of this pure truth in this life, then it has to occur differently. Henry argues that even if we do not have a direct intuition of divine ideas as the objects cognized (whereby their particulars are recognized as more or less approximating them), we do have the cognition of the quiddities of things as the objects cognized by reason of some indirect cognition of their ideas. The reason for this, Henry says, is the following: …for our concept to be true by the pure truth, the soul, insofar as it is informed by it, has to be similar to the truth of the thing outside, since truth is a certain adequacy of the thing and the intellect. And so, as Augustine says in bk. 2 of On Free Choice of the Will, since the soul by itself is liable to slip from truth into falsity, whence by itself it is not informed by the truth of any thing, although it can be informed by it, but nothing can inform itself, for nothing can give what it does not have; therefore, it is necessary that it be informed of the pure truth of a thing by something else. But this cannot be done by the exemplar received from the thing itself, as has been shown earlier [in the previously quoted passage — GK]. It is necessary, therefore, that it be informed by the exemplar of the unchangeable truth, as Augustine intends in the same place. And this is why he says in On True Religion that just as by its truth are true those that are true, so too by its similitude are similar those that are similar. It is necessary, therefore, that the unchangeable truth impress itself into our concept, and that it transform our concept to its own character, and that in this way it inform our mind with the expressed truth of the thing by the same similitude that the thing itself has in the first truth. [HQO a. 1, q. 2, fol. 7, I] So, we cannot have the cognition of the pure truth of a thing in terms of the concept acquired from the thing, nor can we have it from a direct intuition of the divine exemplar; the only way we can have it, then, is that the acquired concept primarily impressed on our mind is further clarified, no longer by a similarity of the thing, but by the similarity of the divine exemplar itself. Henry's point seems to be that given that the external thing itself is already just a (more or less defective) copy of the exemplar, the (more or less defective) copy of this copy can only be improved by means of the original exemplar, just as a copy of a poor repro of some original picture can only be improved by retouching the copy not on the basis of the poor repro, but on the basis of the original. But since the external thing is fashioned after its divine idea, the "retouching" of the concept in terms of the original idea does yield a better representation of the thing; indeed, so much better that on the basis of this "retouched" concept we are even able to judge just how well the thing realizes its kind. For example, when I simply have the initial simple concept of circle abstracted from circular objects I have seen, that concept is good enough for me to tell circular objects apart from non-circular ones.
But with this simple, unanalyzed concept in mind, I may still not be able to say what a true circle is supposed to be, and accordingly, exactly how and to what extent the more or less circular objects I see meet or fail to meet this standard. However, when I come to understand that a circle is a locus of points equidistant from a given point, I will realize by means of a clear and distinct concept what it was that I originally conceived in a vague and confused manner in my original concept of circle.[23] To be sure, I do not come to this definition of circle by looking up to the heaven of Ideas; in fact, I may just be instructed about it by my geometry teacher. But what is not given to me by my geometry teacher is the understanding of the fact that what is expressed by the definition is indeed what I originally rather vaguely conceived by my concept abstracted from visible circles. This "flash" of understanding, when I realize that it is necessary for anything that truly matches the concept of a circle to be such as described in the definition, would be an instance of receiving illumination without any particular, miraculous revelation.[24] However, even if in this light Henry's distinctions between the two kinds of truths and the corresponding differences of concepts make good sense, and even if we accept that the concepts primarily acquired from sensible objects need to be further worked on in order to provide us with a true, clear understanding of the natures of things, it is not clear that this further work cannot be done by the natural faculties of our mind, assuming only the general influence of God in sustaining its natural operations, but without performing any direct and specific "retouching" of our concepts "from above". Using our previous analogy of the acquired concept as the copy of a poor repro of an original, we may say that if we have a number of different poor, fuzzy repros that are defective in a number of different ways, then in a long and complex process of collating them, we might still be able to discern the underlying pattern of the original, and thus produce a copy that is actually closer to the original than any of the direct repros, without ever being allowed a glimpse of the original. In fact, this was precisely the way Aristotelian theologians, such as Aquinas, interpreted Augustine's conception of illumination, reducing God's role to providing us with the intelligible light not by directly operating on any of our concepts in particular, but by providing the mind with "a certain likeness of the uncreated light, obtained through participation" (ST1, q. 84, a. 5c), namely, the agent intellect. Matthew of Aquasparta quite faithfully describes this view, associating it with the Aristotelian position he rejects: Some people engaged in "philosophizing" [quidam philosophantes] follow this position, although not entirely, when they assert that that light is the general cause of certain cognition, but is not attained, and its special influence is not necessary in natural cognition; but the light of the agent intellect is sufficient together with the species and similitudes of things abstracted and received from the things; for otherwise the operation of [our] nature would be rendered vacuous, our intellect would understand only by coincidence, and our cognition would not be natural, but supernatural.
And what Augustine says, namely, that everything is seen in and through that light, is not to be understood as if the intellect would somehow attain that light, nor as if that light would have some specific influence on it, but in such a way that the eternal God naturally endowed us with intellectual light, in which we naturally cognize and see all cognizable things that are within the scope of reason. [DHCR, p. 95] Although Matthew vehemently rejects this position as going against Augustine's original intention ("which is unacceptable, since he is a prominent teacher, whom catholic teachers and especially theologians ought to follow" — as Matthew says), this view, in ever more refined versions, gained more and more ground toward the end of the 13th century, adopted not only by Aquinas and his followers, but also by his major opponents, namely, Scotus and his followers.[25] Still, illuminationism and abstractionism were never treated by medieval thinkers as mutually exclusive alternatives. They rather served as the two poles of a balancing act in judging the respective roles of nature and direct divine intervention in human intellectual cognition.[26] Although Platonism definitely survived throughout the Middle Ages (and beyond), in the guise of the interconnected doctrines of divine ideas, participation, and illumination, there was a quite general Aristotelian consensus,[27] especially after Abelard's time, that the mundane universals of the species and genera of material beings exist as such in the human mind, as a result of the mind's abstracting from their individuating conditions. But consensus concerning this much by no means entailed unanimous agreement on exactly what the universals thus abstracted are, what it is for them to exist in the mind, how they are related to their particulars, what their real foundation in those particulars is, what their role is in the constitution of our universal knowledge, and how they contribute to the encoding and communication of this knowledge in the various human languages. For although the general Aristotelian stance towards universals successfully handles the inconsistencies quite obviously generated by a naïve Platonist ontology, it gives rise precisely to these further problems of its own. 6. Universals According to Abelard's Aristotelian Conception It was Abelard who first dealt with the problem of universals explicitly in this form. Having relatively easily disposed of putative universal forms as real entities corresponding to Boethius' definition, in his Logica Ingredientibus he concludes that given Aristotle's definition of universals in his On Interpretation as those things that can be predicated of several things, it is only universal words that can be regarded as really existing universals. However, since according to Aristotle's account in the same work, words are meaningful in virtue of signifying concepts in the mind, Abelard soon arrives at the questions of what these universal words signify, and in virtue of what they are universal. These questions open up a new chapter in the history of the problem of universals. For these questions add a new aspect to the bundle of the originally primarily ontological, epistemological, and theological questions constituting the problem, namely, a semantic aspect. On the Aristotelian conception of universals as universal predicables, there obviously are universals, namely, our universal words.
But the universality of our words is clearly not dependent on the physical qualities of our articulate sounds, or of the various written marks indicating them, but on their representative function. So, to give an account of the universality of our universal words, we have to be able to tell in virtue of what they have this universal representative function, that is to say, we have to be able to assign a common cause by the recognition of which, in terms of a common concept, we can give a common name to a potential infinity of individuals belonging to the same kind. But this common cause certainly cannot be a common thing in the way Boethius described universal things, for, as we have seen, the assumption of the existence of such a common thing leads to contradictions. To be sure, Abelard also provides a number of further arguments, dealing with several refinements of Boethius' characterization of universals proposed by his contemporaries, such as William of Champeaux, Bernard of Chartres, Clarembald of Arras, Jocelin of Soissons, and Walter of Mortagne – but I cannot go into those details here.[28] The point is that he refutes and rejects all these suggestions to save real universals either as common things, having their own real unity, or as collections of several things, having a merely collective unity. The gist of his arguments against the former view is that the universal thing on that view would have to have its own numerical unity, and therefore, since it constitutes the substance of all its singulars, all these singulars would have to be substantially one and the same thing, which would have to have all their contrary properties at the same time, which is impossible. The main thrust of his arguments against the collection-theory is that collections are arbitrary integral wholes of the individuals that make them up, so they simply do not fill the bill of the Porphyrian characterizations of the essential predicables, such as genera and species.[29] So, the common cause of the imposition of universal words cannot be any one thing, or a multitude of things; yet, being a common cause, it cannot be nothing. Therefore, this common cause, which Abelard calls the status[30] of those things to which it is common, is a cause, but it is a cause which is a non-thing. However strange this may sound, Abelard observes that sometimes we do assign causes which are not things. For example, when we say "The ship was wrecked because the pilot was absent", the cause that we assign, namely, that the pilot was absent, is not some thing; it is rather how things were, i.e., the way things were, which in this case we signify by the whole proposition "The pilot was absent".[31] From the point of view of understanding what Abelard's status are, it is significant that he assimilates the causal role of status as the common cause of imposition to causes that are signified by whole propositions. These significata of whole propositions, which in English we may refer to by using the corresponding "that-clauses" (as I did above, referring to the cause of the ship's wreck by the phrase "that the pilot was absent"), and in Latin by an accusative-with-infinitive construction, are what Abelard calls the dicta of propositions. These dicta, not being identifiable with any single thing, yet not being nothing, constitute an ontological realm that is completely different from that of ordinary things. But it is also in this realm that Abelard's common causes of imposition may find their place.
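One compact way to picture this realm is to treat the dictum as the value of an assignment taking each proposition to what it as a whole signifies; the notation below, including the assignment δ, is merely my own illustrative device, not anything found in Abelard:

\[
\delta(\text{'The pilot was absent'}) \;=\; \textit{that the pilot was absent}
\]

On this picture, the value of δ is neither an ordinary thing nor simply nothing; and it is among such values, rather than among things, that the common causes of imposition are to be sought.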
Abelard says that the common cause of imposition of a universal name has to be something in which things falling under that name agree. For example, the name 'man' (in the sense of 'human being', and not in the sense of 'male human being') is imposed on all humans on account of something in which all humans, as such, agree. But that in which all humans as such agree is that each one of them is a man, that is, each one agrees with all others in their being a man. So, it is their being human [esse hominem] that is the common cause Abelard was looking for, and this is what he calls the status of man. The status of man is not a thing; it is not any singular man, for obviously no singular man is common to all men, and it is not a universal man, for there is no such thing. But being a man is common in the required manner (i.e., it is something in which all humans agree), yet it is clearly not a thing. For let us consider the singular propositions 'Socrates is a man' [Socrates est homo], 'Plato is a man' [Plato est homo], etc. These signify their dicta, namely, Socrates's being a man [Socratem esse hominem], and Plato's being a man [Platonem esse hominem], etc. But then it is clear that if we abstract from the singular subjects and retain what is common to them all, we can get precisely the status in which all these subjects agree, namely, being a man [esse hominem]. So, the status, just like the dicta from which they can be obtained, constitute an ontological realm that is entirely different from that of ordinary things. Still, despite the fact that it clearly has something to do with abstraction, an activity of the mind, Abelard insists that a status is not a concept of our mind. The reason for his insistence is that the status, being the common cause of imposition of a common name, must be something real, the existence of which is not dependent on the activity of our minds. A status is there in the nature of things, regardless of whether we form a mental act whereby we recognize it or not. In fact, for Abelard, a status is an object of the divine mind, whereby God preconceives the state of his creation from eternity.[32] A concept, or mental image of our mind, however, exists as the object of our mind only insofar as our mind performs the mental act whereby it forms this object. But this object, again, is not a thing, any more than any other fictitious object of our minds is. However, what distinguishes the universal concept from a merely fictitious object of our mind is that the former corresponds to a status of really existing singular things, whereas the latter does not have anything corresponding to it. To be sure, there are a number of points left in obscurity by Abelard's discussion concerning the relationships of the items distinguished here. For example, Abelard says that we cannot conceive of the status. However, it seems that we can only signify by our words whatever we can conceive. Yet, Abelard insists that besides our concepts, our words must signify the status themselves.[33] A solution to the problem is only hinted at in Abelard's remark that the names can signify status, because "their inventor meant to impose them in accordance with certain natures or characteristics of things, even if he did not know how to think out the nature or characteristic of the thing" (Five Texts, Spade 1994, p. 46 (116)).
So, we may assume that although the inventor of the name does not know the status, his vague, "senses-bound" conception, from which he takes his word's signification, is directed at the status, as to that which he intends to signify.[34] However, Abelard does not work out this suggestion in any further detail. Again, it is unclear how the status is related to the individualized natures of the things that agree in the status. If the status is what the divine mind conceives of the singulars in abstraction from them, why couldn't the nature itself be conceived in the same way? – after all, the abstract nature would not have to be a thing any more than a status is, for its existence would not be real being, but merely its being conceived. Furthermore, it seems quite plausible that Abelard's status could be derived by abstraction from singular dicta with the same predicate, as suggested above. But dicta are the quite ordinary significata of our propositions, which Abelard never treats as epistemologically problematic, so why would the status, which we could apparently abstract from them, be accessible only to the divine mind? I'm not suggesting that Abelard could not provide acceptable and coherent answers to these and similar questions and problems.[35] But perhaps these problems also contributed to the fact that by the 13th century his doctrine of status was no longer in currency. Another historical factor that probably contributed to the waning of Abelard's theory was the influence of the newly translated Aristotelian writings, along with the Arabic commentaries, that flooded the Latin West in the second half of the 12th century. 7. Universal Natures in Singular Beings and in Singular Minds The most important influence in this period from our point of view came from Avicenna's doctrine distinguishing the absolute consideration of a universal nature from what applies to the same nature in the subject in which it exists. The distinction is neatly summarized in the following passage: Horsehood, to be sure, has a definition that does not demand universality. Rather it is that to which universality happens. Hence horsehood itself is nothing but horsehood only. For in itself it is neither many nor one, neither is it existent in these sensibles nor in the soul, neither is it any of these things potentially or actually in such a way that this is contained under the definition of horsehood. Rather [in itself it consists] of what is horsehood only.[36] In his little treatise On Being and Essence, Aquinas explains the distinction in greater detail in the following words: A nature, however, or essence … can be considered in two ways. First, we can consider it according to its proper notion, and this is its absolute consideration; and in this way nothing is true of it except what pertains to it as such; whence if anything else is attributed to it, that will yield a false attribution. … In the other way [an essence] is considered as it exists in this or that [individual]; and in this way something is predicated of it per accidens [non-essentially or coincidentally], on account of that in which it exists, as when we say that a man is white because Socrates is white, although this does not pertain to man as such. A nature considered in this way, however, has two sorts of existence. It exists in singulars on the one hand, and in the soul on the other, and from each of these [sorts of existence] it acquires accidents.
In the singulars, furthermore, the essence has several [acts of] existence according to the multiplicity of singulars. Nevertheless, if we consider the essence in the first, or absolute, sense, none of these pertain to it. For it is false to say that the essence of man, considered absolutely, has existence in this singular, because if existence in this singular pertained to man insofar as he is man, man would never exist, except as this singular. Similarly, if it pertained to man insofar as he is man not to exist in this singular, then the essence would never exist in the singular. But it is true to say that man, but not insofar as he is man, may be in this singular or in that one, or else in the soul. Therefore, the nature of man considered absolutely abstracts from every existence, though it does not exclude any. And the nature thus considered is what is predicated of each individual.[37] So, a common nature or essence according to its absolute consideration abstracts from all existence, both in the singulars and in the mind. Yet, and this is the important point, it is the same nature that informs both the singulars that have this nature and the minds conceiving of them in terms of this nature. To be sure, this sameness is not numerical sameness, and thus it does not yield numerically one nature. On the contrary, it is the sameness of several, numerically distinct realizations of the same information-content, just like the sameness of a book in its several copies. Just as there is no such thing as a universal book over and above the singular copies of the same book, so there is no such thing as a universal nature existing over and above the singular things of the same nature; still, just as it is true to say that the singular copies are copies of the same book, so it is true to say that these singulars are of the same nature. Indeed, this analogy also shows why this conception should be so appealing from the point of view of the original epistemological problem of the possibility of universal knowledge, without entailing the ontological problems of naïve Platonism. For just as we do not need to read all copies of the same book in order to know what we can find on the same page in the next copy (provided it is not a corrupt copy),[38] so we can know what may apply to all singulars of the same nature without having to experience them all. Still, we need not assume that we can have this knowledge only if we can somehow get in mysterious contact with the universal nature over and above the singulars; all we need is to learn how "to read" the singulars in our experience to discern the "common message", the universal nature, informing them all, uniformly, yet in their distinct singularity. (Note that "reading the singulars" is not a mere metaphor: this is precisely what geneticists are quite literally doing in the process of gene sequencing, for instance, in the human genome project.) Therefore, the same nature is not the same in the way that the same individual having this nature is the same as long as it exists. For that same nature, insofar as it is regarded as the same, does not even exist at all; it is said to be the same only insofar as it is recognizable as the same, if we disregard everything that distinguishes its instances in several singulars.
(Note here that whoever would want to deny such a recognizable sameness in and across several singulars would have to deny that he is able to recognize the same words or the same letters in various sentences; so such a person would not be able to read, write, or even to speak, or understand human speech. But then we shouldn't really worry about such a person in a philosophical debate.) However, at this point some further questions emerge. If this common nature is recognizably the same on account of disregarding its individuating conditions in the singulars, then isn't it the result of abstraction; and if so, isn't it in the abstractive mind as its object? But if it is, then how can Aquinas say that it abstracts both from being in the singulars and from being in the mind? Here we should carefully distinguish between what we can say about the same nature as such, and what we can say about the same nature on account of its conditions as it exists in this or that subject. Again, using our analogy, we can certainly consistently say that the same book in its first edition was 200 pages, whereas in the second only 100, because it was printed on larger pages, but the book itself, as such, is neither 200 nor 100 pages, although it can be either. In the same way, we can consistently say that the same nature as such is neither in the singulars nor in the mind, but of course it is only insofar as it is in the mind that it can be recognizably the same, on account of the mind's abstraction. Therefore, that it is abstract and is actually recognized as the same in its many instances is something that belongs to the same nature only on account of being conceived by the abstractive mind. This is the reason why the nature is called a universal concept, insofar as it is in the mind. Indeed, it is only under this aspect that it is properly called a universal. So, although that which is predicable of several singulars is nothing but the common nature as such, considered absolutely, still, that it is predicable pertains to the same nature only on account of being conceived by the abstractive intellect, insofar as it is a concept of the mind. At any rate, this is how Aquinas solves the paralogism that seems to arise from this account, according to which the true claims that Socrates is a man and man is a species would seem to entail the falsity that Socrates is a species. For if we say that in the proposition 'Socrates is a man' the predicate signifies human nature absolutely, but the same nature, on account of its abstract character, is a species, the false conclusion seems inevitable (Klima 1993a). However, since the common nature is not a species in its absolute consideration, but only insofar as it is in the mind, the conclusion does not follow. Indeed, this reasoning would be just as invalid as the one trying to prove that this book, pointing to the second edition which is actually 100 pages, is 200 pages, because the same book was 200 pages in its first edition. For just as its being 200 pages belongs to the same book only in its first edition, so its being a species belongs to human nature only as it exists in the mind.
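To fix ideas, the structure of the paralogism and of this solution can be set out schematically; the following schema is merely my own gloss on the foregoing, not notation found in Aquinas or in the medieval texts:

\[
\begin{array}{ll}
\text{(1) Socrates est homo} & [\textit{homo}\ \text{taken for human nature absolutely}]\\
\text{(2) Homo est species} & [\textit{homo}\ \text{taken for the nature as it is in the mind}]\\
\hline
\text{(3) Socrates est species} & [\text{blocked: the middle term has shifted}]
\end{array}
\]

Since on this analysis the middle term 'homo' does not stand for the same item in the two premises, the apparent syllogism equivocates on its middle term, which is why the false conclusion does not follow.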
So, to sum up, we have to distinguish here between the nature existing in this singular (such as the individualized human nature of Socrates, which is numerically one item, mind-independently existing in Socrates), the universal (such as the species of human nature existing only in the mind as its object, considered in abstraction from the individuating conditions it has in the singular humans), and the nature according to its absolute consideration (such as human nature considered in abstraction both from its existence in the singulars as its subjects and in the mind as its object). What establishes the distinction of these items is the difference of what can be truly said of them on account of the different conditions they have in this or that subject. What establishes the unity of these items, however, is that they are somehow the same nature existing and considered under different conditions. For the human nature in Socrates is numerically one, it is numerically distinct from the human nature in Plato, and it has real, mind-independent existence, which is in fact nothing but the existence of Socrates, i.e., Socrates' life. However, although the human nature in Socrates is a numerically distinct item from the human nature in Plato, insofar as it is human nature, it is formally, in fact specifically, the same nature, for it is human nature, and not another, specifically different, say, feline or canine nature. It is precisely this formal, specific, mind-independent sameness of these items (for, of course, this cat and that cat do not differ insofar as they are feline, regardless of whether there is anyone to recognize this) that allows the abstractive human mind to recognize this sameness by abstracting from those individuating conditions on account of which this individualized nature in this individual numerically differs from that individualized nature in that individual. Thus, insofar as the formally same nature is actually considered by a human mind in abstraction from these individualizing conditions, it is a universal, a species, an abstract object of a mental act whereby a human mind conceives of any individualized human nature without its individuating conditions. But, as we could see earlier, nothing can be a human nature existing without its individuating conditions, although any individualized human nature can be thought of without thinking of its necessarily conjoined individuating conditions (just as a triangular shape can be thought of without thinking of its necessarily conjoined conditions of being isosceles or being scalene). So for this universal concept to be is nothing but to be thought of, to be an object of the abstractive human mind. Finally, human nature in its absolute consideration is the same nature abstracted even from this being, i.e., even from being an object of the mind. Thus, in contrast to the same nature both as it exists in individuals and as it exists in the mind, neither existence, nor non-existence, nor unity, nor disunity or multiplicity belongs to it, as it is considered without any of these; indeed, it is considered without considering its being considered, for it is considered only in terms of what belongs to it on account of itself, not considering anything that has to belong to it on account of something else in which it can only be (i.e., whether in the mind or in reality).
So, the nature according to its absolute consideration has neither the numerical unity or multiplicity that it has as it exists in individuals, nor the formal unity that it has in the consideration of the mind (insofar as it is one species among many); rather, it has that formal unity which precedes even the recognition of this unity by the abstractive mind.[39] Nevertheless, even if Aquinas' solution of the paralogism works with these distinctions, and even if what he says about the existence and the unity vs. multiplicity of a common nature can be given a consistent interpretation, the emergence of the paralogism itself, the complexities involved in explaining it away, and the problems involved in providing this consistent interpretation all show the inherent difficulties of this account. The main difficulty is that of keeping track of what we are talking about when it becomes crucial to know what pertains to what on account of what; in general, when the conditions of identity and distinction of the items we are talking about become variable and occasionally rather unclear. Indeed, we can appreciate just how acute these difficulties may become if we survey the items that needed to be distinguished in what may be described as the common conceptual framework of the "realist" via antiqua, the "old way" of doing philosophy and theology before the emergence of the "modern way", the "nominalist" via moderna, which challenged some fundamental principles of the older framework, mostly as a result of the semantic innovations introduced by William Ockham. The survey of these items and the problems they generate will then allow us to see in greater detail the main motivation for Ockham's innovations. 8. Universals in the Via Antiqua In this framework, we have first of all the universal or common terms of spoken and written languages, which are common on account of being imposed upon universal concepts of the human mind. The concepts themselves are universal on account of being obtained by the activity of the abstractive human mind from experiences of singulars. But the process of concept formation also involves various stages. In the first place, the sensory information collected by the single senses is distinguished, synthesized, and collated by the higher sensory faculties of the common sense [sensus communis] and the so-called cogitative power [vis cogitativa], to be stored in sensory memory as phantasms, the sensory representations of singulars in their singularity. The active intellect [intellectus agens] uses this sensory information to extract its intelligible content and produce the intelligible species [species intelligibiles], the universal representations of several individuals in their various degrees of formal unity, disregarding their distinctive features and individuating conditions in the process of abstraction. The intelligible species are stored in the intellectual memory of the potential intellect [intellectus possibilis], which can then use them to form the corresponding concept in an act of thought, for example, in forming a judgment. The intelligible species and the concepts themselves, being formed by individual human minds, are individual in their being, insofar as they pertain to this or that human mind. However, since they are the result of abstraction, in their information content they are universal.
Now insofar as this universal information content is common to all minds that form these concepts at all, and is therefore a common intelligible content gained by these minds from their objects insofar as these are conceived in a universal manner, later scholastic thinkers refer to it as the objective concept [conceptus obiectivus], distinguishing it from the formal or subjective concepts [conceptus formales seu subiectivi], which are the individual acts of individual minds carrying this information (just as the individual copies of a book carry the information content of the book).[40] It is this objective concept that is identified as the universal of the human mind (distinguished from the universals of the divine mind), namely, a species, a genus, a difference, a property, or an accident. (Note that these are only the simple concepts. Complex concepts, such as those corresponding to complex terms and propositions, are the products of the potential intellect using these concepts in its further operations.) These universals, then, as the objective concepts of the mind, would be classified as beings of reason [entia rationis], the being of which consists in their being conceived (cf. Klima 1993b and Schmidt 1966). To be sure, they are not merely fictitious objects, for they are grounded in the nature of things insofar as they carry the universal information content abstracted from the singulars. But then again, the universal information content of the objective concept itself, considered not insofar as it is in the mind as its object, but in itself, disregarding whatever may carry it, is distinguished from its carriers both in the mind and in the ultimate objects of the mind, the singular things, as the nature of these things in its absolute consideration. However, the common nature as such cannot exist on its own any more than a book could exist without any copies of it or any minds conceiving of it. So, this common nature has real existence only in the singulars, informing them, and giving them their recognizably common characteristics. However, these common characteristics can be recognized as such only by a mind capable of abstracting the common nature from experiencing it in its really existing singular instances. But it is on account of the real existence of these individualized instances in the singulars that the common nature can truly be predicated of the singulars, as long as they are actually informed by these individualized instances. The items thus distinguished and their interconnections can be represented in a block-diagram. In such a diagram, dashed frames indicate that the items enclosed by them have a certain reduced ontological status, a "diminished" mode of being, while boxes partly sharing a side indicate the (possible) partial identities of the items they enclose.[41] The arrows pointing from the common term to the singulars, their individualized natures, and the items in the mind represent semantic relations, which I am going to explain later, in connection with Ockham's innovations. The rest of the arrows indicate the flow of information from the experience of singulars through the sensory faculties to the abstractive mind, and to the application of the universal information abstracted by the mind to further singular experiences in acts of judgment. Obviously, this is a rather complicated picture. However, its complexity itself should not be regarded as problematic, or even surprising, for that matter.
After all, this diagram merely summarizes, and distinguishes the main stages of, how the human mind processes the intelligible, universal information received from a multitude of singular experiences, and then again how it applies this information in classifying further experiences. This process may reasonably be expected to be complex, and should not be expected to involve fewer stages than, e.g., setting up, and retrieving information from, a computer database. What renders this picture more problematic is rather the difficulties involved in identifying and distinguishing these stages and the corresponding items. Further complications were generated by the variations in terminology among several authors, and by the different criteria of identity and distinctness they applied in introducing their various notions of identity and distinctness. In fact, many of the great debates of the authors working within this framework can be characterized precisely as disputes over the identity or distinctness of the items featured here, or over the very criteria of identifying or distinguishing them. For example, already Abelard raised the question whether the concept or mental image, which we may identify in the diagram as the objective concept of later authors, should be identified with the act of thought, which we may identify as the subjective concept, or perhaps with a further act of the mind, called formatio, namely, the potential intellect's act of forming the concept, using the intelligible species as the principle of its action. Such distinctions were later severely criticized by authors such as Peter John Olivi and others, who argued for the elimination of intelligible species and, in general, of any intermediaries between an act of the intellect and its ultimate objects, the singulars conceived in a universal manner.[42] Again, looking at the diagram on the side of the singulars, most 13th-century authors agreed that what accounts for the specific unity of several individuals of the same species, namely, their specific nature, should be something other than what accounts for their numerical distinctness, namely, their principle of individuation. However, one singular entity in a species of several co-specific individuals has to contain both the principle of the specific unity of these individuals and its own principle of individuation. Therefore, this singular entity, being a composite at least of its specific nature and its principle of individuation, has to be distinct from its specific nature. At any rate, this is the situation with material substances, whose principle of individuation was held to be their matter. However, on this reasoning, immaterial substances, such as angels, could not be regarded as numerically distinct on account of their matter, but only on account of their form. But since form is the principle of specific unity, difference in form causes specific diversity. Therefore, on this basis, any two angels had to be regarded as different in species.
This conclusion was explicitly drawn by Aquinas and others, but it was rejected by Augustinian theologians, and it was condemned in Paris in 1277.[43] So it is no wonder that authors such as Henry of Ghent and Duns Scotus worked out alternative accounts of individuation, introducing not only different principles of individuation, such as the Scotists' famous (or infamous) haecceity, but also different criteria of distinctness and identity, such as those grounding Henry of Ghent's intentional distinction, Scotus's formal distinction,[44] or, even later, Suarez' modal distinction.[45] But even further problems arose from considering the identity or distinctness of the individualized natures signified by several common terms in one and the same individual. From this point of view, the metaphysical debate over the real distinction of essence and existence is nothing but the issue of whether the individualized common nature signified by the definition of a thing is the same as the act of being signified by the verb 'is' in the same thing. In fact, the famous problem of the plurality vs. unity of substantial forms may also be regarded as a dispute over whether the common natures signified by the substantial predicates on the Porphyrian tree in the category of substance are distinct or the same in the same individual (cf. Callus 1967). Finally, and this appears to be the primary motivation for Ockham's innovations, there was the question whether one must regard all the individualized common natures signified in the same individual by several predicates in the ten Aristotelian categories as distinct from one another. For the affirmative answer would involve commitment to a virtually limitless multiplication of entities. Indeed, according to Ockham, the via antiqua conception would entail that a column is to the right by to-the-rightness, God is creating by creation, is good by goodness, just by justice, mighty by might, an accident inheres by inherence, a subject is subjected by subjection, the apt is apt by aptitude, a chimera is nothing by nothingness, someone blind is blind by blindness, a body is mobile by mobility, and so on for other, innumerable cases.[46] And this is nothing but "multiplying beings according to the multiplicity of terms… which, however, is erroneous and leads far away from the truth".[47] 9. Universals in the Via Moderna To be sure, as the very debates within the via antiqua framework concerning the identity or non-identity of the various items distinguished in that framework indicate, Ockham's charges are not quite justified.[48] After all, several via antiqua authors did allow the identification of the significata of terms belonging to various categories, so their "multiplication of beings" did not necessarily match the multiplicity of terms. Furthermore, since via antiqua authors also distinguished between various modes or senses of being, allowing various sorts of "diminished" kinds of being, such as beings of reason, their ontological commitments were certainly not as unambiguous as Ockham would have us believe in this passage. However, if we contrast the via antiqua framework diagrammed above with a corresponding schematic representation of the via moderna framework introduced by Ockham, we can immediately appreciate the point of Ockham's innovations. Without a doubt, it was the captivating simplicity of this picture, especially as compared with the complexity of the via antiqua picture, that was the major appeal of the Ockhamist approach.
There are fewer items in this framework, all on the same ontological footing, distinguished from one another in terms of the same unambiguous distinction: the numerical distinction between individual real entities. To be sure, there still are universals in this picture. But these universals are neither common natures "contracted" to individuals by some really or merely formally distinct principle of individuation, nor some universal objects of the mind existing in a "diminished" manner, as beings of reason. Ockham's universals, at least in his mature theory,[49] are just our common terms and our common concepts. Our common terms, which are just singular utterances or inscriptions, are common in virtue of being subordinated to our common concepts. Our common concepts, on the other hand, are just singular acts of our singular minds. Their universality consists simply in the universality of their representative function. For example, the common term 'man' is a spoken or written universal term of English, because it is subordinated to that concept of our minds by which we conceive of each man indifferently (see Klima 2011). It is this indifference in its representative function that enables the singular act of my mind to conceive of each man in a universal manner, and the same goes for the singular act of your mind. Accordingly, there is no need to assume that there is anything in the individual humans, distinct from these humans themselves, a common yet individualized nature, waiting to be abstracted by the mind. All we need to assume is that two humans are more similar to each other than either of them is to a brute animal, and all animals are more similar to each other than any of them is to a plant, etc., and that the mind, being able to recognize this similarity, is able to represent the humans by means of a common specific concept, the animals by means of a common generic concept, all living things by means of a more general generic concept, etc.[50] In this way, then, the common terms subordinated to these concepts need not signify some abstract common nature in the mind, and consequently its individualized instances in the singulars, for they directly signify the singulars themselves, just as they are directly conceived by the universally representative acts of the mind. So, what these common terms signify are just the singulars themselves, which are also the things referred to by these terms when they are used in propositions. Using the customary rendering of the medieval logical terminology, the things ultimately signified by a common term are its significata, while the things referred to by the same term when it is used in a proposition are its (personal) supposita.[51] Now if we compare the respective semantic frameworks of the two viae, we can see just how radically Ockham's innovations changed the character of the semantic relations connecting terms, concepts, and things. In both viae, common terms are subordinated to common concepts, and it is in virtue of this subordination that they ultimately signify what their concepts represent. In the via moderna, a concept is just an act of the mind representing singulars in a more or less indifferent manner, yielding a more or less universal signification for the term.
In the via antiqua, however, the act of the mind is just one item in a whole series of intermediary representations, distinguished in terms of their different functions in processing universal information, and connected by their common content, ultimately representing the common, yet individualized natures of their singulars.[52] Accordingly, a common term, expressing this common content, is primarily subordinated to the objective concept of the mind. But of course, this objective concept is only the common content of the singular representative acts of singular minds, their subjective concepts, formed by means of the intelligible species, abstracted by their active intellects. On the other hand, the objective concept, abstracting from all individuating conditions, expresses only what is common to all singulars, namely, their nature considered absolutely. But this absolutely considered nature is only the common content of what informs each singular of the same nature in its actual real existence. So, the term’s ultimate significata will have to be the individualized natures of the singulars. But these ultimate significata may still not be the singulars themselves, namely, when the things informed by these significata are not metaphysically simple. In the via moderna conception, therefore, the ultimate significata of a term are nothing but those singular things that can be the term’s supposita in various propositions, as a matter of semantics. By contrast, in the via antiqua conception, a term’s ultimate significata may or may not be the same things as the term’s (personal) supposita, depending on the constitution of these supposita, as a matter of metaphysics. The singulars will be the supposita of the term when it is used as the subject term of a proposition in which something is predicated about the things informed by these ultimate significata (in the case of metaphysically simple entities, the term’s significata and supposita coincide).[53] Nevertheless, despite the nominalists’ charges to the contrary, the via antiqua framework, as far as its semantic considerations are concerned, was no more committed to the real distinction of the significata and supposita of its common terms than the via moderna framework was. For if the semantic theory in itself had precluded the identification of these semantic values, then the question of possible identity of these values could not have been meaningfully raised in the first place. Furthermore, in that case such identifications would have been precluded as meaningless even when talking about metaphysically simple entities, such as angels and God, whereas the metaphysical simplicity of these entities was expressed precisely in terms of such identifications. But also in the mundane cases of the significata and supposita of concrete and abstract universal terms in the nine accidental categories, several via antiqua authors argued for the identification of these semantic values both within and across categories. First of all there was Aristotle’s authority for the claim that action and passion are the same motion,[54] so the significata of terms in these two categories could not be regarded as really distinct entities. But several authors also argued for the identification of relations with their foundations, that is to say, for the identity of the significata of relative terms with the significata of terms in the categories quantity and quality. 
(For example, on this conception, my equality in height to you would be just my height, provided you were of the same height, and not a distinct “equality-thing” somehow attached to my height, caused by our equal heights.)[55] By contrast, what makes the via moderna approach simpler is that it “automatically” achieves such identifications already on the basis of its semantic principles. Since in this approach the significata of concrete common terms are just the singulars directly represented by the corresponding concepts, the significata and (personal) supposita of terms are taken to be the same singulars from the beginning. So these common terms signify and supposit for the same things either absolutely, provided the term is absolute, or in relation to other singulars, provided the term is connotative. But even in the case of connotative terms, such as relative terms (in fact, all terms in the nine accidental categories, except for some abstract terms in the category quality, according to Ockham), we do not need to assume the existence of some mysterious relational entities informing singular substances. For example, the term ‘father’ need not be construed as signifying in me an inherent relation, my fatherhood, somehow connecting me to my son, and suppositing for me on that account in the context of a proposition; rather, it should merely be construed as signifying me in relation to my son, thereby suppositing for me in the context of a proposition, while connoting my son. 10. The Separation of the Viae, and the Breakdown of Scholastic Discourse in Late-Medieval Philosophy The appeal of the simplicity of the via moderna approach, especially as it was systematically articulated in the works of John Buridan and his students, had a tremendous impact on late-medieval philosophy and theology. To be sure, many late-medieval scholars, who were familiar with both ways, would have shared the sentiment expressed by the remark of Domingo Soto (1494–1560), who described himself as someone “born among nominalists and raised by realists”,[56] to the effect that whereas the realist doctrine of the via antiqua was more difficult to understand, still, the nominalist doctrine of the via moderna was more difficult to believe.[57] Nevertheless, the overall simplicity and internal consistency of the nominalist approach were undeniable, and the approach gathered a strong following by the 15th century in all major universities of Europe, old and newly established alike.[58] The resulting separation and the ensuing struggle of the medieval viae did not end with the victory of the one over the other. Instead, because the separation was primarily semantic in nature, it embroiled the parties in increasingly complicated ways of talking past each other, generating an ever growing dissatisfaction, even contempt, in a new, lay, humanist intelligentsia,[59] and it ended with the demise of the characteristically medieval conceptual frameworks of both viae in the late-medieval and early modern period. These developments, therefore, also put an end to the specifically medieval problem of universals. However, the increasingly rarefied late-medieval problem eventually vanished only to give way to several modern variants of recognizably the same problem, which keeps recurring in one form or another in contemporary philosophy as well.
Indeed, one may safely assert that as long as there is interest in the question of how a human language obviously abounding in universal terms can be meaningfully mapped onto a world of singulars, there is a problem of universals, regardless of the details of the particular conceptual framework in which the relevant questions are articulated. Clearly, in this sense, the problem of universals is itself a universal, the universal problem of accounting for the relationships between mind, language, and reality.


Church’s Type Theory


1. Syntax 1.1 Fundamental Ideas We start with an informal description of the fundamental ideas underlying the syntax of Church’s formulation of type theory. All entities have types, and if α and β are types, the type of functions from elements of type β to elements of type α is written as \((\alpha \beta)\). (This notation was introduced by Church, but some authors write \((\beta \rightarrow \alpha)\) instead of \((\alpha \beta)\). See, for example, Section 2 of the entry on type theory.) As noted by Schönfinkel (1924), functions of more than one argument can be represented in terms of functions of one argument when the values of these functions can themselves be functions. For example, if f is a function of two arguments, for each element x of the left domain of f there is a function g (depending on x) such that \(gy = fxy\) for each element y of the right domain of f. We may now write \(g = fx\), and regard f as a function of a single argument, whose value for any argument x in its domain is a function \(fx\), whose value for any argument y in its domain is fxy. For a more explicit example, consider the function + which carries any pair of natural numbers to their sum. We may denote this function by \(+_{((\sigma \sigma)\sigma)}\), where \(\sigma\) is the type of natural numbers. Given any number x, \([+_{((\sigma \sigma)\sigma)}x]\) is the function which, when applied to any number y, gives the value \([[+_{((\sigma \sigma)\sigma)}x]y]\), which is ordinarily abbreviated as \(x + y\). Thus \([+_{((\sigma \sigma)\sigma)}x]\) is the function of one argument which adds x to any number. When we think of \(+_{((\sigma \sigma)\sigma)}\) as a function of one argument, we see that it maps any number x to the function \([+_{((\sigma \sigma)\sigma)}x]\). More generally, if f is a function which maps n-tuples \(\langle w_{\beta},x_{\gamma},\ldots ,y_{\delta},z_{\tau}\rangle\) of elements of types \(\beta\), \(\gamma\), …, \(\delta\), \(\tau\), respectively, to elements of type α, we may assign to f the type \(((\ldots((\alpha \tau)\delta)\ldots \gamma)\beta)\). It is customary to use the convention of association to the left to omit parentheses, and write this type symbol simply as \((\alpha \tau \delta \ldots \gamma \beta)\). A set or property can be represented by a function (often called its characteristic function) which maps elements to truth values, so that an element is in the set, or has the property, in question iff the function representing the set or property maps that element to truth. When a statement is asserted, the speaker means that it is true, so that \(s x\) means that \(s x\) is true, which also expresses the assertions that s maps x to truth and that \(x \in s\). In other words, \(x \in s\) iff \(s x\). We take \({o}\) as the type symbol denoting the type of truth values, so we may speak of any function of type \(({o}\alpha)\) as a set of elements of type α. A function of type \((({o}\alpha)\beta)\) is a binary relation between elements of type β and elements of type α. For example, if \(\sigma\) is the type of the natural numbers, and \(<\) is the order relation between natural numbers, \(<\) has type \(({o}\sigma \sigma)\), and for all natural numbers \(x\) and \(y\), \({<}x y\) (which we ordinarily write as \(x < y\)) has the value truth iff x is less than y. Of course, \(<\) can also be regarded as the function which maps each natural number x to the set \(<x\) of all natural numbers y such that x is less than y.
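To see how these ideas look in a modern typed functional language, here is a minimal sketch in Haskell (the names plus, addThree, and evens are illustrative choices, not part of Church’s notation); it shows Schönfinkel’s device of currying and the representation of sets as characteristic functions:

  -- Currying: a two-argument function is a function returning a function.
  -- 'plus' corresponds to a constant of type ((sigma sigma)sigma) in
  -- Church's notation, with sigma read as the type of numbers.
  plus :: Integer -> Integer -> Integer
  plus x y = x + y

  -- Partial application: 'plus 3' is itself a function, Church's [+ x].
  addThree :: Integer -> Integer
  addThree = plus 3

  -- A set of elements of type a, represented by its characteristic
  -- function, i.e., a function of type (o alpha) in Church's notation.
  type Set a = a -> Bool

  evens :: Set Integer
  evens n = n `mod` 2 == 0

  -- Membership x ∈ s is just application s x.
  main :: IO ()
  main = print (addThree 4, evens 10, evens 7)   -- (7,True,False)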
Thus sets, properties, and relations may be regarded as particular kinds of functions. Church’s type theory is thus a logic of functions, and, in this sense, it is in the tradition of Frege’s Begriffsschrift. The opposite approach would be to reduce functions to relations, which was the approach taken by Whitehead and Russell (1927a) in the Principia Mathematica. Expressions which denote elements of type α are called wffs of type α. Thus, statements of type theory are wffs of type \({o}\). If \(\bA_{\alpha}\) is a wff of type α in which \(\bu_{\alpha \beta}\) is not free, the function \(\bu_{\alpha \beta}\) such that \(\forall \bv_{\beta}[\bu_{\alpha \beta}\bv_{\beta} = \bA_{\alpha}]\) is denoted by \([\lambda \bv_{\beta}\bA_{\alpha}]\). Thus \(\lambda \bv_{\beta}\) is a variable-binder, like \(\forall \bv_{\beta}\) or \(\exists \bv_{\beta}\) (but with a quite different meaning, of course); λ is known as an abstraction operator. \([\lambda \bv_{\beta}\bA_{\alpha}]\) denotes the function whose value on any argument \(\bv_{\beta}\) is \(\bA_{\alpha}\), where \(\bv_{\beta}\) may occur free in \(\bA_{\alpha}\). For example, \([\lambda n_{\sigma}[4\cdot n_{\sigma}+3]]\) denotes the function whose value on any natural number n is \(4\cdot n+3\). Hence, when we apply this function to the number 5 we obtain \([\lambda n_{\sigma}[4\cdot n_{\sigma}+3]]5 = 4\cdot 5+3 = 23\). We use \(\textsf{Sub}(\bB,\bv,\bA)\) as a notation for the result of substituting \(\bB\) for \(\bv\) in \(\bA\), and \(\textsf{SubFree}(\bB,\bv,\bA)\) as a notation for the result of substituting \(\bB\) for all free occurrences of \(\bv\) in \(\bA\). The process of replacing \([\lambda \bv_{\beta}\bA_{\alpha}]\bB_{\beta}\) by \(\textsf{SubFree}(\bB_{\beta},\bv_{\beta},\bA_{\alpha})\) (or vice-versa) is known as β-conversion, which is one form of λ-conversion. Of course, when \(\bA_{o}\) is a wff of type \({o}\), \([\lambda \bv_{\beta}\bA_{o}]\) denotes the set of all elements \(\bv_{\beta}\) (of type \(\beta)\) of which \(\bA_{o}\) is true; this set may also be denoted by \(\{\bv_{\beta}|\bA_{o}\}\). For example, \([\lambda x\ x<y]\) denotes the set of x such that x is less than y (as well as that property which a number x has if it is less than y). In familiar set-theoretic notation, \([\lambda x\ x<y]\) would be written \(\{x \mid x < y\}\). (By the Axiom of Extensionality for truth values, when \(\bC_{o}\) and \(\bD_{o}\) are of type \({o}\), \(\bC_{o} \equiv \bD_{o}\) is equivalent to \(\bC_{o} = \bD_{o}\).) Propositional connectives and quantifiers can be assigned types and can be denoted by constants of these types. The negation function maps truth values to truth values, so it has type \(({o}{o})\). Similarly, disjunction and conjunction (etc.) are binary functions from truth values to truth values, so they have type \(({o}{o}{o})\). The statement \(\forall \bx_{\alpha}\bA_{o}\) is true iff the set \([\lambda \bx_{\alpha}\bA_{o}]\) contains all elements of type α. A constant \(\Pi_{{o}({o}\alpha)}\) can be introduced (for each type symbol \(\alpha)\) to denote a property of sets: a set \(s_{{o}\alpha}\) has the property \(\Pi_{{o}({o}\alpha)}\) iff \(s_{{o}\alpha}\) contains all elements of type α. With this interpretation, \(\forall \bx_{\alpha}\bA_{o} \equiv \Pi_{{o}({o}\alpha)}[\lambda \bx_{\alpha}\bA_{o}]\) should be true, as well as \(\Pi_{{o}({o}\alpha)}s_{{o}\alpha} \equiv \forall \bx_{\alpha}[s_{{o}\alpha}\bx_{\alpha}]\) for any wff \(\bA_{o}\) and variable \(\bx_{\alpha}\). Since by λ-conversion we have \([\lambda \bx_{\alpha}\bA_{o}]\bx_{\alpha} \equiv \bA_{o}\), the latter equation can be written more simply as \(\Pi_{{o}({o}\alpha)}[\lambda \bx_{\alpha}\bA_{o}] \equiv \forall \bx_{\alpha}\bA_{o}\). Thus, \(\forall \bx_{\alpha}\) can be defined in terms of \(\Pi_{{o}({o}\alpha)}\), and λ is the only variable-binder that is needed.
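The definition of the universal quantifier via \(\Pi_{{o}({o}\alpha)}\) can be mimicked over a finite domain. In the Haskell sketch below (the names piOp and domain are illustrative, and the two-element domain stands in for a finite type α), β-conversion is just function application, and a Π-like operator tests whether a set contains every element of the domain:

  -- beta-conversion in action: (\n -> 4*n + 3) applied to 5 yields 23.
  example :: Integer
  example = (\n -> 4 * n + 3) 5

  -- A finite stand-in for a domain of type alpha.
  domain :: [Bool]
  domain = [False, True]

  -- Pi-like operator: piOp s holds iff the set s contains every element
  -- of the domain, mirroring the constant Pi of type (o(o alpha)).
  piOp :: (Bool -> Bool) -> Bool
  piOp s = all s domain

  -- The universal quantifier defined via Pi and lambda-abstraction:
  -- 'forall x. A' becomes piOp (\x -> A).
  everythingSelfIdentical :: Bool
  everythingSelfIdentical = piOp (\x -> x == x)   -- True

  main :: IO ()
  main = print (example, everythingSelfIdentical)  -- (23,True)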
1.2 Formulas Before we state the definition of a “formula”, a word of caution is in order. The reader may be accustomed to thinking of a formula as an expression which plays the role of an assertion in a formal language, and of a term as an expression which designates an object. Church’s terminology is somewhat different, and provides a uniform way of discussing expressions of many different types. What we call well-formed formula of type α (\(\textrm{wff}_{\alpha}\)) below would in more standard terminology be called term of type α, and then only certain terms, namely those with type \({o}\), would be called formulas. In any case, in this entry we have decided to stay with Church’s original terminology. Another remark concerns the use of some specific mathematical notation. In what follows, the entry distinguishes between the symbols \(\imath\), \(\iota_{(\alpha({o}\alpha))}\), and \(\atoi\). The first is the symbol used for the type of individuals; the second is the symbol used for a logical constant (see Section 1.2.1 below); the third is the symbol used as a variable-binding operator that represents the definite description “the” (see Section 1.3.4). The reader should not confuse them and check to see that the browser is displaying these symbols correctly. Type symbols are defined inductively as follows: \(\imath\) (the type of individuals) and \({o}\) (the type of truth values) are type symbols, and if α and β are type symbols, then \((\alpha \beta)\) is a type symbol. The primitive symbols are the improper symbols \([\), \(]\), and λ, a denumerable list of variables of each type, and the logical constants \(\nsim_{({o}{o})}\), \(\lor_{(({o}{o}){o})}\), \(\Pi_{({o}({o}\alpha))}\), and \(\iota_{(\alpha({o}\alpha))}\) (for each type symbol α), possibly together with additional nonlogical constants of various types. A formula is a finite sequence of primitive symbols. Certain formulas are called well-formed formulas (wffs). We write \(\textrm{wff}_{\alpha}\) as an abbreviation for wff of type α, and define this concept inductively as follows: (a) a primitive variable or constant of type α is a \(\textrm{wff}_{\alpha}\); (b) if \(\bA_{\alpha \beta}\) and \(\bB_{\beta}\) are wffs of the indicated types, then \([\bA_{\alpha \beta}\bB_{\beta}]\) is a \(\textrm{wff}_{\alpha}\); (c) if \(\bv_{\beta}\) is a variable of type β and \(\bA_{\alpha}\) is a \(\textrm{wff}_{\alpha}\), then \([\lambda \bv_{\beta}\bA_{\alpha}]\) is a \(\textrm{wff}_{(\alpha \beta)}\). Note, for example, that by (a) \(\nsim_{({o}{o})}\) is a wff\(_{({o}{o})}\), so by (b) if \(\bA_{o}\) is a wff\(_{o}\), then \([\nsim_{({o}{o})}\bA_{o}]\) is a wff\(_{o}\). Usually, the latter wff will simply be written as \(\nsim \bA\). It is often convenient to avoid parentheses, brackets and type symbols, and use conventions for omitting them. For formulas we use the convention of association to the left, and we may write \(\lor_{((oo)o)}\bA_{o} \bB_{o}\) instead of \([[\lor_{((oo)o)}\bA_{o}] \bB_{o}]\). For types the corresponding convention is likewise association to the left, and we may write \(ooo\) instead of \(((oo)o)\). Equality can be introduced by the definition: \([\bA_{\alpha} = \bB_{\alpha}]\) stands for \(\forall p_{{o}\alpha}[p_{{o}\alpha}\bA_{\alpha} \supset p_{{o}\alpha}\bB_{\alpha}]\). This definition is known as the Leibnizian definition of equality. It asserts that x and y are the same if y has every property that x has. Actually, Leibniz called his definition “the identity of indiscernibles” and gave it in the form of a biconditional: x and y are the same if x and y have exactly the same properties. It is not difficult to show that these two forms of the definition are logically equivalent. We now provide a few examples to illustrate how various assertions and concepts can be expressed in Church’s type theory. Example 1 To express the assertion that “Napoleon is charismatic” we introduce constants \(\const{Charismatic}_{{o}\imath}\) and \(\const{Napoleon}_{\imath}\), with the types indicated by their subscripts and the obvious meanings, and assert the wff \([\const{Charismatic}_{{o}\imath}\const{Napoleon}_{\imath}]\). If we wish to express the assertion that “Napoleon has all the properties of a great general”, we might consider interpreting this to mean that “Napoleon has all the properties of some great general”, but it seems more appropriate to interpret this statement as meaning that “Napoleon has all the properties which all great generals have”.
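Before continuing with this example, it is worth noting that the inductive definitions just given translate directly into algebraic data types. The following Haskell sketch is a simplification (constants and variables carry their types explicitly) that computes the type of a formula exactly when it is well formed; we return to the Napoleon example immediately afterwards:

  -- Type symbols: iota, o, and function types (alpha beta),
  -- where Fn a b is the type of functions from b to a.
  data Ty = Iota | O | Fn Ty Ty deriving (Eq, Show)

  -- Formulas: variables, constants, application, lambda-abstraction.
  data Tm = Var String Ty | Con String Ty | App Tm Tm | Lam String Ty Tm

  -- typeOf returns Just alpha exactly when the term is a wff of type alpha.
  typeOf :: Tm -> Maybe Ty
  typeOf (Var _ a)      = Just a                        -- clause (a)
  typeOf (Con _ a)      = Just a                        -- clause (a)
  typeOf (App f x)      = case (typeOf f, typeOf x) of
    (Just (Fn a b), Just b') | b == b' -> Just a        -- clause (b)
    _                                  -> Nothing
  typeOf (Lam _ b body) = Fn <$> typeOf body <*> pure b -- clause (c)

  -- Example: [~ A], where ~ has type (oo) and A has type o.
  main :: IO ()
  main = print (typeOf (App (Con "neg" (Fn O O)) (Var "A" O)))  -- Just O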
If the constant \(\const{GreatGeneral}_{{o}\imath}\) is added to the formal language, this can be expressed by the wff \(\forall p_{{o}\imath}[\forall g_{\imath}[\const{GreatGeneral}_{{o}\imath}g_{\imath} \supset p_{{o}\imath}g_{\imath}] \supset p_{{o}\imath}\const{Napoleon}_{\imath}]\). As an example of such a property, we note that the sentence “Napoleon’s soldiers admire him” can be expressed in a similar way by a wff which, by λ-conversion, is equivalent to the statement that one of the properties which Napoleon has is that of being admired by his soldiers; the property itself is expressed by a λ-term of type \(({o}\imath)\). Example 2 We illustrate some potential applications of type theory with the following fable. A rich and somewhat eccentric lady named Sheila has an ostrich and a cheetah as pets, and she wishes to take them from her hotel to her remote and almost inaccessible farm. Various portions of the trip may involve using elevators, boxcars, airplanes, trucks, very small boats, donkey carts, suspension bridges, etc., and she and the pets will not always be together. She knows that she must not permit the ostrich and the cheetah to be together when she is not with them. We consider how certain aspects of this problem can be formalized so that Sheila can use an automated reasoning system to help analyze the possibilities. There will be a set Moments of instants or intervals of time during the trip. She will start the trip at the location \(\const{Hotel}\) and moment \(\const{Start}\), and end it at the location \(\const{Farm}\) and moment \(\const{Finish}\). Moments will have type \(\tau\), and locations will have type \(\varrho\). A state will have type \(\sigma\) and will specify the location of Sheila, the ostrich, and the cheetah at a given moment. A plan will specify where the entities will be at each moment according to this plan. It will be a function from moments to states, and will have type \((\sigma \tau)\). The exact representation of states need not concern us, but there will be functions from states to locations called \(\const{LocationOfSheila}\), \(\const{LocationOfOstrich}\), and \(\const{LocationOfCheetah}\) which provide the indicated information. Thus, \(\const{LocationOfSheila}_{\varrho \sigma}[p_{\sigma \tau}t_{\tau}]\) will be the location of Sheila according to plan \(p_{\sigma \tau}\) at moment \(t_{\tau}\). The set \(\const{Proposals}_{{o}(\sigma \tau)}\) is the set of plans Sheila is considering. We define a plan p to be acceptable if, according to that plan, the group starts at the hotel, finishes at the farm, and whenever the ostrich and the cheetah are together, Sheila is there too. Formally, these conditions can be expressed by a wff \(\const{Acceptable}_{{o}(\sigma \tau)}\). We can express the assertion that Sheila has a way to accomplish her objective with the formula \(\exists p_{\sigma \tau}[\const{Proposals}_{{o}(\sigma \tau)}p_{\sigma \tau} \land \const{Acceptable}_{{o}(\sigma \tau)}p_{\sigma \tau}]\). Example 3 We now provide a mathematical example. Mathematical ideas can be expressed in type theory without introducing any new constants. An iterate of a function f from a set to itself is a function which applies f one or more times. For example, if \(g(x) = f(f(f(x)))\), then g is an iterate of f. \([\text{ITERATE+}_{{o}(\imath\imath)(\imath\imath)}f_{\imath\imath}g_{\imath\imath}]\) means that \(g_{\imath\imath}\) is an iterate of \(f_{\imath\imath}\). \(\text{ITERATE+}_{{o}(\imath\imath)(\imath\imath)}\) is defined (inductively) as \([\lambda f_{\imath\imath}\lambda g_{\imath\imath}\forall p_{{o}(\imath\imath)}[[p_{{o}(\imath\imath)}f_{\imath\imath} \land \forall j_{\imath\imath}[p_{{o}(\imath\imath)}j_{\imath\imath} \supset p_{{o}(\imath\imath)}[\lambda x_{\imath}f_{\imath\imath}[j_{\imath\imath}x_{\imath}]]]] \supset p_{{o}(\imath\imath)}g_{\imath\imath}]]\). Thus, g is an iterate of f if g is in every set p of functions which contains f and which contains the function \(\lambda x_{\imath}f_{\imath\imath}[j_{\imath\imath}x_{\imath}]\) (i.e., f composed with j) whenever it contains j. A fixed point of f is an element y such that \(f(y) = y\).
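Before stating the theorem about iterates and fixed points, we note that the fable of Example 2 can be prototyped directly in a typed functional language. In the Haskell sketch below, all concrete names and the three-field state representation are illustrative assumptions rather than the article’s formalization; plans are functions from moments to states, exactly as above:

  data Loc = Hotel | Farm | Elsewhere deriving (Eq, Show)
  type Moment = Int

  -- A state records the locations of Sheila, the ostrich, and the cheetah.
  data State = State { sheila, ostrich, cheetah :: Loc }

  -- A plan assigns a state to every moment (type (sigma tau) above).
  type Plan = Moment -> State

  start, finish :: Moment
  start  = 0
  finish = 3

  -- A plan is acceptable iff the group starts at the hotel, finishes at
  -- the farm, and whenever ostrich and cheetah are together, Sheila is
  -- with them.
  acceptable :: Plan -> Bool
  acceptable p =
       all ((== Hotel) . ($ p start))  [sheila, ostrich, cheetah]
    && all ((== Farm)  . ($ p finish)) [sheila, ostrich, cheetah]
    && and [ sheila (p t) == ostrich (p t)
           | t <- [start .. finish]
           , ostrich (p t) == cheetah (p t) ]

  -- Sheila's objective is met iff some proposed plan is acceptable.
  hasWay :: [Plan] -> Bool
  hasWay = any acceptable

We now return to Example 3.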
It can be proved that if some iterate of a function f has a unique fixed point, then f itself has a fixed point. This theorem can be expressed by the wff \(\forall f_{\imath\imath}[\exists g_{\imath\imath}[\text{ITERATE+}_{{o}(\imath\imath)(\imath\imath)}f_{\imath\imath}g_{\imath\imath} \land \exists_1 y_{\imath}[g_{\imath\imath}y_{\imath} = y_{\imath}]] \supset \exists y_{\imath}[f_{\imath\imath}y_{\imath} = y_{\imath}]]\). See Andrews et al. 1996 for a discussion of how this theorem, which is called THM15B, can be proved automatically. Example 4 An example from philosophy is Gödel’s variant of the ontological argument for the existence of God. This example illustrates two interesting aspects. Example 5 Suppose we omit the use of type symbols in the definitions of wffs. Then we can write the formula \(\lambda x\nsim[xx]\), which we shall call \(\textrm{R}\). It can be regarded as denoting the set of all sets x such that x is not in x. We may then consider the formula \([\textrm{R R}]\), which expresses the assertion that \(\textrm{R}\) is in itself. We can clearly prove \([\textrm{R R}] \equiv [[\lambda x\nsim [xx]] \textrm{R}]\), so by λ-conversion we can derive \([\textrm{R R}] \equiv\, \nsim[\textrm{R R}]\), which is a contradiction. This is Russell’s paradox. Russell’s discovery of this paradox (Russell 1903, 101–107) played a crucial role in the development of type theory. Of course, when type symbols are present, \(\textrm{R}\) is not well-formed, and the contradiction cannot be derived. 1.3 Axioms and Rules of Inference We start by listing the axioms for what we shall call elementary type theory. The theorems of elementary type theory are those theorems which can be derived, using the rules of inference, from Axioms (1)–\((6^{\alpha})\) (for all type symbols \(\alpha)\). We shall sometimes refer to elementary type theory as \(\cT\). It embodies the logic of propositional connectives, quantifiers, and λ-conversion in the context of type theory. To illustrate the rules and axioms introduced above, we give a short and trivial proof in \(\cT\). Following each wff of the proof, we indicate how it was inferred. (The proof is actually quite inefficient, since line 3 is not used later, and line 7 can be derived directly from line 5 without using line 6. The additional proof lines have been inserted to illustrate some relevant aspects. For the sake of readability, many brackets have been deleted from the formulas in this proof. The diligent reader should be able to restore them.) Note that (3) and (7) can be rewritten in more familiar quantifier notation; in particular, (7) can be written as \(\forall x_{\imath}[p_{o} \lor r_{{o}\imath}x_{\imath}] \supset [p_{o} \lor \forall x_{\imath}r_{{o}\imath}x_{\imath}]\), which we shall call \((7')\). We have thus derived a well known law of quantification theory. We illustrate one possible interpretation of the wff \((7')\) (which is closely related to Axiom 6) by considering a situation in which a rancher puts some horses in a corral and leaves for the night. Later, he cannot remember whether he closed the gate to the corral. While reflecting on the situation, he comes to a conclusion which can be expressed by \((7')\) if we take the horses to be the elements of type \(\imath\), interpret \(p_{o}\) to mean “the gate was closed”, and interpret \(r_{{o}\imath}\) so that \(r_{{o}\imath}x_{\imath}\) asserts “\(x_{\imath}\) left the corral”. With this interpretation, \((7')\) says: If it is true of every horse that the gate was closed or that the horse left the corral, then the gate was closed or every horse left the corral. To the axioms listed above we add the axioms below to obtain Church’s type theory. The axioms of boolean and functional extensionality are the following: Church did not include Axiom \(7^{{o}}\) in his list of axioms in Church 1940, but he mentioned the possibility of including it. Henkin did include it in Henkin 1950.
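Returning for a moment to Example 5, the way type symbols block Russell’s paradox can be observed in any typed functional language. In the Haskell fragment below (a minimal illustration), the analogue of \(\lambda x\nsim[xx]\) is rejected by the type checker for essentially the reason that \(\textrm{R}\) is not well-formed in Church’s type theory: no type can be its own argument type.

  -- The analogue of R, namely \x -> not (x x), does not typecheck: GHC
  -- reports an occurs-check error ("cannot construct the infinite type"),
  -- since x would need a type b identical to b -> Bool.
  --
  -- r = \x -> not (x x)   -- rejected by the type checker
  --
  -- Hence the self-application R R cannot even be written down, just as
  -- [R R] is not a wff once type symbols are present.
  main :: IO ()
  main = putStrLn "self-application is ruled out by the types"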
The expression \(\exists_1 \bx_{\alpha}\bA_{o}\) may be defined as standing for \(\exists \by_{\alpha}\forall \bx_{\alpha}[\bA_{o} \equiv \bx_{\alpha} = \by_{\alpha}]\). For example, \(\exists_1 x_{\alpha}[P_{{o}\alpha}x_{\alpha}]\) then stands for \(\exists y_{\alpha}\forall x_{\alpha}[P_{{o}\alpha}x_{\alpha} \equiv x_{\alpha} = y_{\alpha}]\). This asserts that there is a unique element which has the property \(P_{{o}\alpha}\). From this example we can see that in general, \(\exists_1\bx_{\alpha}\bA_{o}\) expresses the assertion that “there is a unique \(\bx_{\alpha}\) such that \(\bA_{o}\)”. When there is a unique such element \(\bx_{\alpha}\), it is convenient to have the notation \(\atoi\bx_{\alpha}\bA_{o}\) to represent the expression “the \(\bx_{\alpha}\) such that \(\bA_{o}\)”. Russell showed in Whitehead & Russell 1927b how to provide contextual definitions for such notations in his formulation of type theory. In Church’s type theory \(\atoi\bx_{\alpha}\bA_{o}\) is defined as \(\iota_{\alpha({o}\alpha)}[\lambda \bx_{\alpha}\bA_{o}]\). Thus, \(\atoi\) behaves like a variable-binding operator, but it is defined in terms of λ with the aid of the constant \(\iota_{\alpha({o}\alpha)}\). Thus, λ is still the only variable-binding operator that is needed. Since \(\bA_{o}\) describes \(\bx_{\alpha}\), \(\iota_{\alpha({o}\alpha)}\) is called a description operator. Associated with this notation is the following axiom: \((8^{\alpha})\ \exists_1 x_{\alpha}[p_{{o}\alpha}x_{\alpha}] \supset p_{{o}\alpha}[\iota_{\alpha({o}\alpha)}p_{{o}\alpha}]\). This says that when the set \(p_{{o}\alpha}\) has a unique member, then \(\iota_{\alpha({o}\alpha)}p_{{o}\alpha}\) is in \(p_{{o}\alpha}\), and therefore is that unique member. Thus, this axiom asserts that \(\iota_{\alpha({o}\alpha)}\) maps one-element sets to their unique members. If from certain hypotheses one can prove \(\exists_1 \bx_{\alpha}\bA_{o}\), then by using Axiom \(8^{\alpha}\) one can derive \([\lambda \bx_{\alpha}\bA_{o}][\atoi\bx_{\alpha}\bA_{o}]\), which can also be written as \(\textsf{SubFree}(\atoi\bx_{\alpha}\bA_{o},\bx_{\alpha},\bA_{o})\). We illustrate the usefulness of the description operator with a small example. Suppose we have formalized the theory of real numbers, and our theory has constants \(1_{\varrho}\) and \(\times_{\varrho \varrho \varrho}\) to represent the number 1 and the multiplication function, respectively. (Here \(\varrho\) is the type of real numbers.) To represent the multiplicative inverse function, we can define the wff \(\textrm{INV}_{\varrho \varrho}\) as \([\lambda z_{\varrho}\atoi x_{\varrho}[\times_{\varrho \varrho \varrho}z_{\varrho}x_{\varrho} = 1_{\varrho}]]\). Of course, in traditional mathematical notation we would not write the type symbols, and we would write \(\times_{\varrho \varrho \varrho}z_{\varrho}x_{\varrho}\) as \(z \times x\) and write \(\textrm{INV}_{\varrho \varrho}z\) as \(z^{-1}\). Thus \(z^{-1}\) is defined to be that x such that \(z \times x = 1\). When Z is provably not 0, we will be able to prove \(\exists_1 x_{\varrho}[\times_{\varrho \varrho \varrho} Z x_{\varrho} = 1_{\varrho}]\) and \(Z \times Z^{-1} = 1\), but if we cannot establish that Z is not 0, nothing significant about \(Z^{-1}\) will be provable. The Axiom of Choice can be expressed as follows in Church’s type theory: \((9^{\alpha})\ \exists x_{\alpha}[p_{{o}\alpha}x_{\alpha}] \supset p_{{o}\alpha}[\iota_{\alpha({o}\alpha)}p_{{o}\alpha}]\). \((9^{\alpha})\) says that the choice function \(\iota_{\alpha({o}\alpha)}\) chooses from every nonempty set \(p_{{o}\alpha}\) an element, designated as \(\iota_{\alpha({o}\alpha)}p_{{o}\alpha}\), of that set. When this form of the Axiom of Choice is included in the list of axioms, \(\iota_{\alpha({o}\alpha)}\) is called a selection operator instead of a description operator, and \(\atoi\bx_{\alpha} \bA_{o}\) means “an \(\bx_{\alpha}\) such that \(\bA_{o}\)” when there is some such element \(\bx_{\alpha}\). These selection operators have the same meaning as Hilbert’s \(\epsilon\)-operator (Hilbert 1928). However, we here provide one such operator for each type α.
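Over a finite, explicitly listed domain, the behavior demanded of \(\iota_{\alpha({o}\alpha)}\) by Axiom \(8^{\alpha}\), and by Axiom \(9^{\alpha}\) when it acts as a selection operator, is easy to emulate. The following Haskell sketch is only an illustration (the function names are ours, and Maybe marks the cases in which the axioms are silent):

  -- Description: if exactly one element of the domain satisfies p, return
  -- it; otherwise Axiom 8 says nothing, which we model with Nothing.
  theUnique :: [a] -> (a -> Bool) -> Maybe a
  theUnique dom p = case filter p dom of
    [x] -> Just x
    _   -> Nothing

  -- Selection: choose some element of every nonempty set, as in Axiom 9.
  choose :: [a] -> (a -> Bool) -> Maybe a
  choose dom p = case filter p dom of
    (x:_) -> Just x
    []    -> Nothing

  main :: IO ()
  main = do
    print (theUnique [0 .. 10 :: Int] (\x -> 2 * x == 6))  -- Just 3
    print (theUnique [0 .. 10 :: Int] even)                -- Nothing: not unique
    print (choose    [0 .. 10 :: Int] even)                -- Just 0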
It is natural to call \(\atoi\) a definite description operator in contexts where \(\atoi\bx_{\alpha}\bA_{o}\) means “the \(\bx_{\alpha}\) such that \(\bA_{o}\)”, and to call it an indefinite description operator in contexts where \(\atoi\bx_{\alpha}\bA_{o}\) means “an \(\bx_{\alpha}\) such that \(\bA_{o}\)”. Clearly the Axiom of Choice implies the Axiom of Descriptions, but sometimes formulations of type theory are used which include the Axiom of Descriptions, but not the Axiom of Choice. Another formulation of the Axiom of Choice simply asserts the existence of a choice function without explicitly naming it: \(\textrm{AC}^{\alpha}:\ \exists j_{\alpha({o}\alpha)}\forall p_{{o}\alpha}[\exists x_{\alpha}[p_{{o}\alpha}x_{\alpha}] \supset p_{{o}\alpha}[j_{\alpha({o}\alpha)}p_{{o}\alpha}]]\). Normally when one assumes the Axiom of Choice in type theory, one assumes it as an axiom schema, and asserts AC\(^{\alpha}\) for each type symbol α. A similar remark applies to the axioms for extensionality and description. However, modern proof systems for Church’s type theory, which are, e.g., based on resolution, do in fact avoid the addition of such axiom schemata for reasons as further explained in Sections 3.4 and 4 below. They work with more constrained, goal-directed proof rules instead. Before proceeding, we need to introduce some terminology. \(\cQ_0\) is an alternative formulation of Church’s type theory which will be described in Section 1.4 and is equivalent to the system described above using Axioms (1)–(8). A type symbol is propositional if the only symbols which occur in it are \({o}\) and parentheses. Yasuhara (1975) defined the relation “\(\ge\)” between types as the reflexive transitive closure of the minimal relation such that \((\alpha \beta) \ge \alpha\) and \((\alpha \beta) \ge \beta\). He established that the existence of choice functions for “higher” types (in the sense of this relation) entails the existence of choice functions for “lower” types; the converse is generally not the case. Büchi (1953) has shown that while the schemas expressing the Axiom of Choice and Zorn’s Lemma can be derived from each other, the relationships between the particular types involved are complex. One can define the natural numbers (and therefore other basic mathematical structures such as the real and complex numbers) in type theory, but to prove that they have the required properties (such as Peano’s Postulates), one needs an Axiom of Infinity. There are many viable possibilities for such an axiom, such as those discussed in Church 1940, section 57 of Church 1956, and section 60 of Andrews 2002. 1.4 A Formulation Based on Equality In Section 1.2.1, \(\nsim_{({o}{o})}, \lor_{(({o}{o}){o})}\), and the \(\Pi_{({o}({o}\alpha))}\)’s were taken as primitive constants, and the wffs \(\sfQ_{{o}\alpha \alpha}\) which denote equality relations at type α were defined in terms of these. We now present an alternative formulation \(\cQ_0\) of Church’s type theory in which there are primitive constants \(\sfQ_{{o}\alpha \alpha}\) denoting equality, and \(\nsim_{({o}{o})}, \lor_{(({o}{o}){o})}\), and the \(\Pi_{({o}({o}\alpha))}\)’s are defined in terms of the \(\sfQ_{{o}\alpha \alpha}\)’s. Tarski (1923) noted that in the context of higher-order logic, one can define propositional connectives in terms of logical equivalence and quantifiers. Quine (1956) showed how both quantifiers and connectives can be defined in terms of equality and the abstraction operator λ in the context of Church’s type theory. Henkin (1963) rediscovered these definitions, and developed a formulation of Church’s type theory based on equality in which he restricted attention to propositional types.
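The following Haskell sketch shows one such definition at work over the two-element domain of truth values. It anticipates the encoding of ordered pairs and conjunction given in the next paragraphs, and it is only an illustration: function equality, which is not computable in general, is here decided by enumerating the finite domains involved.

  bools :: [Bool]
  bools = [False, True]

  -- All functions of type o -> o and o -> o -> o, generated by truth table.
  funs1 :: [Bool -> Bool]
  funs1 = [\a -> if a then t else f | t <- bools, f <- bools]

  funs2 :: [Bool -> Bool -> Bool]
  funs2 = [\a -> if a then g else h | g <- funs1, h <- funs1]

  -- The ordered pair <x,y> encoded as the lambda-term \g -> g x y.
  pair :: Bool -> Bool -> ((Bool -> Bool -> Bool) -> Bool)
  pair x y g = g x y

  -- Equality of two encoded pairs, decided pointwise over funs2.
  pairEq :: ((Bool -> Bool -> Bool) -> Bool)
         -> ((Bool -> Bool -> Bool) -> Bool) -> Bool
  pairEq p q = all (\g -> p g == q g) funs2

  -- Conjunction defined from equality alone: x AND y iff <T,T> = <x,y>.
  andH :: Bool -> Bool -> Bool
  andH x y = pairEq (pair True True) (pair x y)

  main :: IO ()
  main = print [andH x y == (x && y) | x <- bools, y <- bools]  -- all True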
Andrews (1963) simplified the axioms for this system. \(\cQ_0\) is based on these ideas, and can be shown to be equivalent to a formulation of Church’s type theory using Axioms (1)–(8) of the preceding sections. This section thus provides an alternative to the material in the preceding Sections 1.2.1–1.3.4. More details about \(\cQ_0\) can be found in Andrews 2002. \(T_{o}\) denotes truth; it may be defined as \(\sfQ_{{o}{o}{o}} = \sfQ_{{o}{o}{o}}\), while \(F_{o}\) may be defined as \([\lambda x_{o}T_{o}] = [\lambda x_{o}x_{o}]\) and \(\Pi_{{o}({o}\alpha)}\) as \(\sfQ_{{o}({o}\alpha)({o}\alpha)}[\lambda x_{\alpha}T_{o}]\). The meaning of \(\Pi_{{o}({o}\alpha)}\) was discussed in Section 1.1. To see that this definition of \(\Pi_{{o}({o}\alpha)}\) is appropriate, note that \(\lambda x_{\alpha}T_{o}\) denotes the set of all elements of type α, and that \(\Pi_{{o}({o}\alpha)}s_{{o}\alpha}\) stands for \(\sfQ_{{o}({o}\alpha)({o}\alpha)}[\lambda x_{\alpha}T_{o}] s_{{o}\alpha}\), that is, for \([\lambda x_{\alpha}T_{o}] = s_{{o}\alpha}\). Therefore \(\Pi_{{o}({o}\alpha)}s_{{o}\alpha}\) asserts that \(s_{{o}\alpha}\) is the set of all elements of type α, so \(s_{{o}\alpha}\) contains all elements of type α. It can be seen that \(F_{o}\) can also be written as \(\forall x_{o}x_{o}\), which asserts that everything is true. This is false, so \(F_{o}\) denotes falsehood. The expression \(\lambda g_{{o}{o}{o}}[g_{{o}{o}{o}}x_{o}y_{o}]\) can be used to represent the ordered pair \(\langle x_{o},y_{o}\rangle\), and the conjunction \(x_{o} \land y_{o}\) is true iff \(x_{o}\) and \(y_{o}\) are both true, i.e., iff \(\langle T_{o},T_{o}\rangle = \langle x_{o},y_{o}\rangle\). Hence \(x_{o} \land y_{o}\) can be expressed by the formula \([\lambda g_{{o}{o}{o}}[g_{{o}{o}{o}}T_{o}T_{o}]] = [\lambda g_{{o}{o}{o}}[g_{{o}{o}{o}}x_{o}y_{o}]]\). Other propositional connectives and the existential quantifier are easily defined. By using \(\iota_{(\imath({o}\imath))}\), one can define description operators \(\iota_{\alpha({o}\alpha)}\) for all types α. \(\cQ_0\) has a single rule of inference. Rule R: From \(\bC\) and \(\bA_{\alpha} = \bB_{\alpha}\), to infer the result of replacing one occurrence of \(\bA_{\alpha}\) in \(\bC\) by an occurrence of \(\bB_{\alpha}\), provided that the occurrence of \(\bA_{\alpha}\) in \(\bC\) is not (an occurrence of a variable) immediately preceded by λ. The axioms for \(\cQ_0\) are the following: 2. Semantics It is natural to compare the semantics of type theory with the semantics of first-order logic, where the theorems are precisely the wffs which are valid in all interpretations. From an intuitive point of view, the natural interpretations of type theory are standard models, which are defined below. However, it is a consequence of Gödel’s Incompleteness Theorem (Gödel 1931) that axioms (1)–(9) do not suffice to derive all wffs which are valid in all standard models, and there is no consistent recursively axiomatized extension of these axioms which suffices for this purpose. Nevertheless, experience shows that these axioms are sufficient for most purposes, and Leon Henkin considered the problem of clarifying in what sense they are complete. The definitions and theorem below constitute Henkin’s (1950) solution to this problem, which is often referred to as general semantics or Henkin semantics. A frame is a collection \(\{\cD_{\alpha}\}_{\alpha}\) of nonempty domains (sets) \(\cD_{\alpha}\), one for each type symbol α, such that \(\cD_{o} = \{\sfT,\sfF\}\) (where \(\sfT\) represents truth and \(\sfF\) represents falsehood), and \(\cD_{\alpha \beta}\) is some collection of functions mapping \(\cD_{\beta}\) into \(\cD_{\alpha}\). The members of \(\cD_{\imath}\) are called individuals.
An interpretation \(\langle \{\cD_{\alpha}\}_{\alpha}, \frI\rangle\) consists of a frame and a function \(\frI\) which maps each constant C of type α to an appropriate element of \(\cD_{\alpha}\), which is called the denotation of C. The logical constants are given their standard denotations. An assignment of values in the frame \(\{\cD_{\alpha}\}_{\alpha}\) to variables is a function \(\phi\) such that \(\phi \bx_{\alpha} \in \cD_{\alpha}\) for each variable \(\bx_{\alpha}\). (Notation: The assignment \(\phi[a/x]\) maps the variable x to the value a and is identical with \(\phi\) for all other variables different from x.) An interpretation \(\cM = \langle \{\cD_{\alpha}\}_{\alpha}, \frI\rangle\) is a general model (aka Henkin model) iff there is a binary function \(\cV\) such that \(\cV_{\phi}\bA_{\alpha} \in \cD_{\alpha}\) for each assignment \(\phi\) and wff \(\bA_{\alpha}\), and the following conditions are satisfied for all assignments and all wffs: If an interpretation \(\cM\) is a general model, the function \(\cV\) is uniquely determined. \(\cV_{\phi}\bA_{\alpha}\) is called the value of \(\bA_{\alpha}\) in \(\cM\) with respect to \(\phi\). One can easily show that the following statements hold in all general models \(\cM\) for all assignments \(\phi\) and all wffs \(\bA\) and \(\bB\): The semantics of general models is thus as expected. However, there is a subtlety to note regarding the following condition for arbitrary types α: that \(\cV_{\phi}[\bA_{\alpha} = \bB_{\alpha}]\) is \(\sfT\) if and only if \(\cV_{\phi}\bA_{\alpha}\) is identical with \(\cV_{\phi}\bB_{\alpha}\). When the definitions of Section 1.2.1 are employed, where equality has been defined in terms of Leibniz’ principle, then this statement is not implied for all types α. It only holds if we additionally require that the domains \(\cD_{{o}\alpha}\) contain all the unit sets of objects of type α, or, alternatively, that the domains \(\cD_{{o}\alpha\alpha}\) contain the respective identity relations on objects of type α (which entails the former). The need for this additional requirement, which is not included in the original work of Henkin (1950), has been demonstrated in Andrews 1972a. When instead the alternative definitions of Section 1.4 are employed, then this requirement is obviously met due to the presence of the logical constants \(\sfQ_{{o}\alpha \alpha}\) in the signature, which by definition denote the respective identity relations on the objects of type α and therefore trivially ensure their existence in each general model \(\cM\). It is therefore a natural option to always assume primitive equality constants (for each type α) in a concrete choice of base system for Church’s type theory, just as realized in Andrews’ system \(\cQ_0\). An interpretation \(\langle \{\cD_{\alpha}\}_{\alpha}, \frI\rangle\) is a standard model iff for all α and \(\beta , \cD_{\alpha \beta}\) is the set of all functions from \(\cD_{\beta}\) into \(\cD_{\alpha}\). Clearly a standard model is a general model. We say that a wff \(\bA\) is valid in a model \(\cM\) iff \(\cV_{\phi}\bA = \sfT\) for every assignment \(\phi\) into \(\cM\). A model for a set \(\cH\) of wffs is a model in which each wff of \(\cH\) is valid. A wff \(\bA\) is valid in the general [standard] sense iff \(\bA\) is valid in every general [standard] model. Clearly a wff which is valid in the general sense is valid in the standard sense, but the converse of this statement is false. Completeness and Soundness Theorem (Henkin 1950): A wff is a theorem if and only if it is valid in the general sense. Not all frames belong to interpretations, and not all interpretations are general models.
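In a standard model the size of every domain is determined by the sizes of \(\cD_{\imath}\) and \(\cD_{o}\) alone, since \(\cD_{\alpha \beta}\) must contain all functions from \(\cD_{\beta}\) into \(\cD_{\alpha}\); a general model, by contrast, may interpret \(\cD_{\alpha \beta}\) as a proper subset of that function space. The following short Haskell sketch (illustrative only) computes these standard cardinalities for a finite domain of individuals:

  data Ty = Iota | O | Fn Ty Ty   -- Fn a b: the type of functions from b to a

  -- Cardinality of each domain in a standard model with n individuals.
  size :: Integer -> Ty -> Integer
  size _ O        = 2                       -- the truth values T and F
  size n Iota     = n
  size n (Fn a b) = size n a ^ size n b     -- all functions from D_b to D_a

  main :: IO ()
  main = do
    print (size 3 (Fn O Iota))          -- 2^3 = 8 sets of individuals
    print (size 3 (Fn O (Fn O Iota)))   -- 2^8 = 256 sets of such sets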
In order to be a general model, an interpretation must have a frame satisfying certain closure conditions which are discussed further in Andrews 1972b. Basically, in a general model every wff must have a value with respect to each assignment. A model is said to be finite iff its domain of individuals is finite. Every finite model for \(\cQ_0\) is standard (Andrews 2002, Theorem 5404), but every set of sentences of \(\cQ_0\) which has infinite models also has nonstandard models (Andrews 2002, Theorem 5506). An understanding of the distinction between standard and nonstandard models can clarify many phenomena. For example, it can be shown that there is a model \(\cM = \langle \{\cD_{\alpha}\}_{\alpha}, \frI\rangle\) in which \(\cD_{\imath}\) is infinite, and all the domains \(\cD_{\alpha}\) are countable. Thus \(\cD_{\imath}\) and \(\cD_{{o}\imath}\) are both countably infinite, so there must be a bijection h between them. However, Cantor’s Theorem (which is provable in type theory and therefore valid in all models) says that \(\cD_{\imath}\) has more subsets than members. This seemingly paradoxical situation is called Skolem’s Paradox. It can be resolved by looking carefully at Cantor’s Theorem, i.e., \(\nsim \exists g_{{o}\imath\imath}\forall f_{{o}\imath}\exists j_{\imath}[g_{{o}\imath\imath}j_{\imath} = f_{{o}\imath}]\), and considering what it means in a model. The theorem says that there is no function \(g \in \cD_{{o}\imath\imath}\) from \(\cD_{\imath}\) into \(\cD_{{o}\imath}\) which has every set \(f_{{o}\imath} \in \cD_{{o}\imath}\) in its range. The usual interpretation of the statement is that \(\cD_{{o}\imath}\) is bigger (in cardinality) than \(\cD_{\imath}\). However, what it actually means in this model is that h cannot be in \(\cD_{{o}\imath\imath}\). Of course, \(\cM\) must be nonstandard. While the Axiom of Choice is presumably true in all standard models, there is a nonstandard model for \(\cQ_0\) in which AC\(^{\imath}\) is false (Andrews 1972b). Thus, AC\(^{\imath}\) is not provable in \(\cQ_0\). Thus far, investigations of model theory for Church’s type theory have been far less extensive than for first-order logic. Nevertheless, there has been some work on methods of constructing nonstandard models of type theory and models in which various forms of extensionality fail, models for theories with arbitrary (possibly incomplete) sets of logical constants, and on developing general methods of establishing completeness of various systems of axioms with respect to various classes of models. Relevant papers include Andrews 1971, 1972a,b, and Henkin 1975. Further related work can be found in Benzmüller et al. 2004, Brown 2004, 2007, and Muskens 2007. 3. Metatheory 3.1 Lambda-Conversion The first three rules of inference in Section 1.3.1 are called rules of λ-conversion. If \(\bD\) and \(\bE\) are wffs, we write \(\bD \conv \bE\) to indicate that \(\bD\) can be converted to \(\bE\) by applications of these rules. This is an equivalence relation between wffs. A wff \(\bD\) is in β-normal form iff it has no well-formed parts of the form \([[\lambda \bx_{\alpha}\bB_{\beta}]\bA_{\alpha}]\). Every wff is convertible to one in β-normal form. Indeed, every sequence of contractions (applications of rule 2, combined as necessary with alphabetic changes of bound variables) of a wff is finite; obviously, if such a sequence cannot be extended, it terminates with a wff in β-normal form. (This is called the strong normalization theorem.)
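A leftmost-outermost β-normalizer for raw λ-terms can be written in a few lines. The Haskell sketch below is an illustration only: it assumes that bound variable names are chosen distinct from the free variables of substituted terms, so that the naive substitution never captures (a full implementation would rename bound variables). For simply typed terms, the strong normalization theorem just cited guarantees that the recursion terminates:

  data Term = V String | L String Term | A Term Term deriving (Eq, Show)

  -- subst x s t replaces free occurrences of x in t by s; assumes no capture.
  subst :: String -> Term -> Term -> Term
  subst x s (V y)   = if x == y then s else V y
  subst x s (L y b) = if x == y then L y b else L y (subst x s b)
  subst x s (A f a) = A (subst x s f) (subst x s a)

  -- One beta-contraction at the leftmost-outermost redex, if there is one.
  step :: Term -> Maybe Term
  step (A (L x b) a) = Just (subst x a b)
  step (A f a)       = case step f of
                         Just f' -> Just (A f' a)
                         Nothing -> A f <$> step a
  step (L x b)       = L x <$> step b
  step (V _)         = Nothing

  -- Repeated contraction; terminates on simply typed terms.
  normalize :: Term -> Term
  normalize t = maybe t normalize (step t)

  main :: IO ()
  main = print (normalize (A (L "n" (A (V "f") (V "n"))) (V "five")))
  -- prints A (V "f") (V "five"), the beta-normal form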
By the Church-Rosser Theorem, the β-normal form of a wff is unique modulo alphabetic changes of bound variables. For each wff \(\bA\) we denote by \({\downarrow}\bA\) the first wff (in some enumeration) in β-normal form such that \(\bA \conv {\downarrow} \bA\). Then \(\bD \conv \bE\) if and only if \({\downarrow} \bD = {\downarrow} \bE\). By using the Axiom of Extensionality one can obtain the following derived rule of inference: \(\eta\)-Contraction. Replace a well-formed part \([\lambda \by_{\beta}[\bB_{\alpha \beta}\by_{\beta}]]\) of a wff by \(\bB_{\alpha \beta}\), provided \(\by_{\beta}\) does not occur free in \(\bB_{\alpha \beta}\). This rule and its inverse (which is called \(\eta\)-Expansion) are sometimes used as additional rules of λ-conversion. See Church 1941, Stenlund 1972, Barendregt 1984, and Barendregt et al. 2013 for more information about λ-conversion. It is worth mentioning (again) that λ-abstraction replaces the need for comprehension axioms in Church’s type theory. 3.2 Higher-Order Unification The challenges in higher-order unification can only be outlined very briefly here. More details on the topic are given in Dowek 2001; its utilization in higher-order theorem provers is also discussed in Benzmüller & Miller 2014. Definition. A higher-order unifier for a pair \(\langle \bA,\bB\rangle\) of wffs is a substitution \(\theta\) for free occurrences of variables such that \(\theta \bA\) and \(\theta \bB\) have the same β-normal form. A higher-order unifier for a set of pairs of wffs is a unifier for each of the pairs in the set. Higher-order unification differs from first-order unification (Baader & Snyder 2001) in a number of important respects. In particular: However, an algorithm has been devised (Huet 1975, Jensen & Pietrzykowski 1976), called pre-unification, which will find a unifier for a set of pairs of wffs if one exists. The pre-unifiers computed by Huet’s procedure are substitutions that can reduce the original unification problem to one involving only so-called flex-flex unification pairs. Flex-flex pairs have variable head symbols in both terms to be unified and they are known to always have a solution. The concrete computation of these solutions can thus be postponed or omitted. Pre-unification is utilized in all the resolution based theorem provers mentioned in Section 4. Pattern unification refers to a small subset of unification problems, first studied by Miller (1991), whose identification has been important for the construction of practical systems. In a pattern unification problem every occurrence of an existentially quantified variable is applied to a list of arguments that are all distinct variables bound by either a λ-binder or a universal quantifier in the scope of the existential quantifier. Thus, existentially quantified variables cannot be applied to general terms, but only to a very restricted set of bound variables. Pattern unification, like first-order unification, is decidable, and most general unifiers exist for solvable problems. This is why pattern unification is preferably employed (when applicable) in some state-of-the-art theorem provers for Church’s type theory. 3.3 A Unifying Principle The Unifying Principle was introduced in Smullyan 1963 (see also Smullyan 1995) as a tool for deriving a number of basic metatheorems about first-order logic in a uniform way.
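Returning briefly to higher-order unification, a standard small example (with illustrative names) shows why most general unifiers need not exist: the pair \(\langle F a, a\rangle\), where F is a free function variable and a is a constant, is solved both by \(F := \lambda x. x\) and by \(F := \lambda x. a\), and neither of these substitutions is an instance of the other. By contrast, a pair such as \(\langle F x, g x\rangle\), in which the free variable F is applied only to the distinct bound variable x, is a pattern problem in Miller’s sense, and \(F := \lambda x. g x\) is its most general solution.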
The Unifying Principle was extended to elementary type theory by Andrews (1971) and to extensional type theory, that is, Henkin’s general semantics without description or choice, by Benzmüller, Brown and Kohlhase (2004). We outline these extensions in some more detail below. The Unifying Principle was extended to elementary type theory (the system \(\cT\) of Section 1.3.2) in Andrews 1971 by applying ideas in Takahashi 1967. This Unifying Principle for \(\cT\) has been used to establish cut-elimination for \(\cT\) in Andrews 1971 and completeness proofs for various systems of type theory in Huet 1973a, Kohlhase 1995, and Miller 1983. We first give a definition and then state the principle. Definition. A property \(\Gamma\) of finite sets of wffs\(_{o}\) is an abstract consistency property iff for all finite sets \(\cS\) of wffs\(_{o}\), the following properties hold (for all wffs A, B): Note that consistency is an abstract consistency property. Unifying Principle for \(\cT\). If \(\Gamma\) is an abstract consistency property and \(\Gamma(\cS)\), then \(\cS\) is consistent in \(\cT\). Here is a typical application of the Unifying Principle. Suppose there is a procedure \(\cM\) which can be used to refute sets of sentences, and we wish to show it is complete for \(\cT\). For any set of sentences, let \(\Gamma(\cS)\) mean that \(\cS\) is not refutable by \(\cM\), and show that \(\Gamma\) is an abstract consistency property. Now suppose that \(\bA\) is a theorem of \(\cT\). Then \(\{\nsim \bA\}\) is inconsistent in \(\cT\), so by the Unifying Principle not \(\Gamma(\{\nsim \bA\})\), so \(\{\nsim \bA\}\) is refutable by \(\cM\). Extensions of the above Unifying Principle towards Church’s type theory with general semantics have been studied since the mid-nineties. A primary motivation was to support (refutational) completeness investigations for the proof calculi underlying the emerging higher-order automated theorem provers (see Section 4 below). The initial focus was on a fragment of Church’s type theory, called extensional type theory, that includes the extensionality axioms, but excludes \(\iota_{(\alpha({o}\alpha))}\) and the axioms for it (description and choice were largely neglected in the automated theorem provers at the time). As before, a distinction has been made between extensional type theory with defined equality (as in Section 1.2.1, where equality is defined via Leibniz’ principle) and extensional type theory with primitive equality (e.g., system \(\cQ_0\) as in Section 1.4, or, alternatively, a system based on the logical constants \(\nsim_{({o}{o})}, \lor_{(({o}{o}){o})}\), and the \(\Pi_{({o}({o}\alpha))}\)’s as in Section 1.2.1, but with additional primitive logical constants \(=_{{o}\alpha\alpha}\) added). A first attempt towards a Unifying Principle for extensional type theory with primitive equality is presented in Kohlhase 1993. The conditions given there, which are still incomplete,[1] were subsequently modified and complemented as follows: 3.4 Cut-Elimination and Cut-Simulation Cut-elimination proofs (see also the SEP entry on proof theory) for Church’s type theory, which are often closely related to such proofs (Takahashi 1967, 1970; Prawitz 1968; Mints 1999) for other formulations of type theory, may be found in Andrews 1971, Dowek & Werner 2003, and Brown 2004. In Benzmüller et al.
2009 it is shown how certain wffs\(_{o}\), such as axioms of extensionality, descriptions, choice (see Sections 1.3.3 to 1.3.5), and induction, can be used to justify cuts in cut-free sequent calculi for elementary type theory. Moreover, the notions of cut-simulation and cut-strong axioms are introduced in this work, and the need for omitting defined equality and for eliminating cut-strong axioms such as extensionality, description, choice and induction in machine-oriented calculi (e.g., by replacing them with more constrained, goal-directed rules) in order to reduce cut-simulation effects is discussed as a major challenge for higher-order automated theorem proving. In other words, including cut-strong axioms in a machine-oriented proof calculus for Church’s type theory is essentially as bad as including a cut rule, since the cut rule can be mimicked by them. 3.5 Expansion Proofs An expansion proof is a generalization of the notion of a Herbrand expansion of a theorem of first-order logic; it provides a very elegant, concise, and nonredundant representation of the relationship between the theorem and a tautology which can be obtained from it by appropriate instantiations of quantifiers and which underlies various proofs of the theorem. Miller (1987) proved that a wff \(\bA\) is a theorem of elementary type theory if and only if \(\bA\) has an expansion proof. In Brown 2004 and 2007, this concept is generalized to that of an extensional expansion proof to obtain an analogous theorem involving type theory with extensionality. 3.6 The Decision Problem Since type theory includes first-order logic, it is no surprise that most systems of type theory are undecidable. However, one may look for solvable special cases of the decision problem. For example, the system \(\cQ_{0}^1\) obtained by adding to \(\cQ_0\) the additional axiom \(\forall x_{\imath}\forall y_{\imath}[x_{\imath}=y_{\imath}]\) is decidable. Although the system \(\cT\) of elementary type theory is analogous to first-order logic in certain respects, it is a considerably more complex language, and special cases of the decision problem for provability in \(\cT\) seem rather intractable for the most part. Information about some very special cases of this decision problem may be found in Andrews 1974, and we now summarize this. A wff of the form \(\exists \bx^1 \ldots \exists \bx^n [\bA=\bB]\) is a theorem of \(\cT\) iff there is a substitution \(\theta\) such that \(\theta \bA \conv \theta \bB\). In particular, \(\vdash \bA=\bB\) iff \(\bA \conv \bB\), which solves the decision problem for wffs of the form \([\bA=\bB]\). Naturally, the circumstance that only trivial equality formulas are provable in \(\cT\) changes drastically when axioms of extensionality are added to \(\cT\). \(\vdash \exists \bx_{\beta}[\bA=\bB]\) iff there is a wff \(\bE_{\beta}\) such that \(\vdash[\lambda \bx_{\beta}[\bA=\bB]]\bE_{\beta}\), but the decision problem for the class of wffs of the form \(\exists \bx_{\beta}[\bA=\bB]\) is unsolvable. A wff of the form \(\forall \bx^1 \ldots \forall \bx^n\bC\), where \(\bC\) is quantifier-free, is provable in \(\cT\) iff \({\downarrow} \bC\) is tautologous. On the other hand, the decision problem for wffs of the form \(\exists \bz\bC\), where \(\bC\) is quantifier-free, is unsolvable. (By contrast, the corresponding decision problem in first-order logic with function symbols is known to be solvable (Maslov 1967).)
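The decision procedure for wffs of the form \([\bA=\bB]\) just mentioned amounts to normalization. Reusing the Term type and the normalize function from the sketch in Section 3.1 above (and, as there, assuming consistently chosen bound-variable names in place of a proper test for alphabetic equivalence), it can be expressed in one line of Haskell:

  -- Provability of A = B in elementary type theory reduces to
  -- lambda-conversion: compare the beta-normal forms of both sides.
  provableEq :: Term -> Term -> Bool
  provableEq a b = normalize a == normalize b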
Since irrelevant or vacuous quantifiers can always be introduced, these results show that the only solvable classes of wffs of \(\cT\) in prenex normal form defined solely by the structure of the prefix are those in which no existential quantifiers occur. 4. Automation 4.1 Machine-Oriented Proof Calculi The development and improvement of machine-oriented proof calculi for Church’s type theory is still a challenging research topic. Compared, e.g., to the theoretical and practical maturity achieved in first-order automated theorem proving, the area is still in its infancy. Obviously, the challenges are also much bigger than in first-order logic. The far more expressive term language of Church’s type theory induces a larger, bushier, and more difficult to traverse proof search space than in first-order logic. Moreover, remember that unification, which constitutes a very important control and filter mechanism in first-order theorem proving, is undecidable (in general) in type theory; see Section 3.2. On the positive side, however, there is a chance to find significantly shorter proofs than in first-order logic. This is well illustrated with a small, concrete example in Boolos 1987. Clearly, much further progress is needed to increase the practical relevance of existing calculi for Church’s type theory and their implementations (see Section 4.3). It is planned that future editions of this article will further elaborate on machine-oriented proof calculi for Church’s type theory. For the time being, however, we provide only a selection of historical and more recent references for the interested reader (see also Section 5 below). 4.2 Early Proof Assistants Early computer systems for proving theorems of Church’s type theory (or extensions of it) include HOL (Gordon 1988; Gordon & Melham 1993), TPS (Andrews et al. 1996; Andrews & Brown 2006), Isabelle (Paulson 1988, 1990), PVS (Owre et al. 1996; Shankar 2001), IMPS (Farmer et al. 1993), HOL Light (Harrison 1996), OMEGA (Siekmann et al. 2006), and λClam (Richardson et al. 1998). See the Other Internet References section below for links to further information on these and other provers mentioned later. The majority of the above systems focused (at least initially) on interactive proof and provided rather limited support for additional proof automation. Full proof automation was pioneered, in particular, by the TPS project. Progress was made in the nineties, when other projects started similar activities or intensified existing ones. However, the resources invested and the achievements still lagged far behind those seen in first-order theorem proving. Significant progress was fostered only later, in particular, through the development of a commonly supported syntax for Church’s type theory, called TPTP THF (Sutcliffe & Benzmüller 2010), and the inclusion, from 2009 onwards, of a TPTP THF division in the yearly CASC competitions (a kind of world championship for automated theorem proving; see Sutcliffe 2016 for further details). 4.3 Automated Theorem Provers A selection of theorem provers for Church’s type theory is presented here. The focus is on systems that have successfully participated in TPTP THF CASC competitions in the past. The latest editions of most of the systems mentioned can be accessed online via the SystemOnTPTP infrastructure (Sutcliffe 2017). Nearly all of them produce verifiable proof certificates in the TPTP TSTP syntax.
Further details on the automation of Church’s type theory are given in Benzmüller & Miller 2014. The TPS prover (Andrews et al. 1996, Andrews & Brown 2006) can be used to prove theorems of elementary type theory or extensional type theory automatically, interactively, or semi-automatically. When searching for a proof automatically, TPS first searches for an expansion proof (Miller 1987) or an extensional expansion proof (Brown 2004, 2007) of the theorem. Part of this process involves searching for acceptable matings (Andrews 1981, Bishop 1999). The behavior of TPS is controlled by sets of flags, also called modes. A simple scheduling mechanism is employed in the latest versions of TPS to sequentially run about fifty modes for a limited amount of time. TPS was the winner of the first THF CASC competition in 2009. The LEO-II prover (Benzmüller et al. 2015) is the successor of LEO (Benzmüller & Kohlhase 1998b), which was hardwired with the OMEGA proof assistant (LEO stands for Logical Engine of OMEGA). The provers are based on the RUE-resolution calculi developed in Benzmüller 1999a,b. LEO was the first prover to implement calculus rules for extensionality to avoid cut-simulation effects. LEO-II inherits and adapts them, and provides additional calculus rules for description and choice. The prover, which internally collaborates with first-order provers (preferably E) and SAT solvers, has pioneered cooperative higher-order/first-order proof automation. Since the prover is often too weak to find a refutation among the steadily growing set of clauses on its own, some of the clauses in LEO-II’s search space attain a special status: they are first-order clauses modulo the application of an appropriate transformation function. Therefore, LEO-II progressively launches time-limited calls with these clauses to a first-order theorem prover, and when the first-order prover reports a refutation, LEO-II also terminates. Parts of these ideas were already implemented in the predecessor LEO. Communication between LEO-II and the cooperating first-order theorem provers uses the TPTP language and standards. LEO-II was the winner of the second THF CASC competition in 2010. The Satallax prover (Brown 2012) is based on a complete ground tableau calculus for Church’s type theory with choice (Backes & Brown 2011). An initial tableau branch is formed from the assumptions of a conjecture and the negation of its conclusion. From that point on, Satallax tries to determine the unsatisfiability or satisfiability of this branch. Satallax progressively generates higher-order formulas and corresponding propositional clauses. Satallax uses the SAT solver MiniSat as an engine to test the current set of propositional clauses for unsatisfiability. If the clauses are unsatisfiable, the original branch is unsatisfiable. Satallax provides calculus rules for extensionality, description and choice. If there are no quantifiers at function types, the generation of higher-order formulas and corresponding clauses may terminate. In that case, if MiniSat reports the final set of clauses as satisfiable, then the original set of higher-order formulas is satisfiable (by a standard model in which all types are interpreted as finite sets). Satallax was the winner of the THF CASC competition in 2011 and in each year since 2013. The Isabelle/HOL system (Nipkow, Wenzel, & Paulson 2002) was originally designed as an interactive prover. However, in order to ease user interaction, several automatic proof tactics have been added over the years.
By appropriately scheduling a subset of these proof tactics, some of which are quite powerful, Isabelle/HOL has since about 2011 also been turned into an automatic theorem prover for TPTP THF (and other TPTP syntax formats) that can be run from a command shell like other provers. The most powerful proof tactics that are scheduled by Isabelle/HOL include the Sledgehammer tool (Blanchette et al. 2013), which invokes a sequence of external first-order and higher-order theorem provers; the model finder Nitpick (Blanchette & Nipkow 2010); the equational reasoner simp; the untyped tableau prover blast; the combined simplifier and classical reasoners auto, force, and fast; and the best-first search procedure best. In contrast to all other automated theorem provers mentioned above, the TPTP incarnation of Isabelle/HOL does not yet output proof certificates. Isabelle/HOL was the winner of the THF CASC competition in 2012. The agsyHOL prover is based on a generic lazy narrowing proof search algorithm. Backtracking is employed and a comparably small search state is maintained. The prover outputs proof terms in sequent style which can be verified in the Agda system. coqATP implements the (non-inductive) part of the calculus of constructions (Bertot & Castéran 2004). The system outputs proof terms which are accepted as proofs (after the addition of a few definitions) by the Coq proof assistant. The prover employs axioms for functional extensionality, choice, and excluded middle. Boolean extensionality is not supported. In addition to axioms, a small library of basic lemmas is employed. The Leo-III prover implements a paramodulation calculus for Church’s type theory (Steen 2018). The system, which is a descendant of LEO and LEO-II, provides calculus rules for extensionality, description, and choice. Emphasis has been put on an efficient set of underlying data structures, on simplification routines, and on heuristic rewriting. In the tradition of its predecessors, Leo-III cooperates with first-order reasoning tools using translations to many-sorted first-order logic. The prover accepts every common TPTP syntax dialect and is thus very widely applicable. Recently, the prover has also been extended to natively support almost every normal higher-order modal logic. Zipperposition (Bentkamp et al. 2018) is a new and promising higher-order theorem prover which, at the current state of development, still works on a comparatively weak fragment of Church’s type theory, called lambda-free higher-order logic (a comprehension-free higher-order logic which nevertheless supports λ-notation). The system, which is based on superposition calculi, is developed bottom up, and it is being progressively extended towards stronger fragments of Church’s type theory and to support other relevant extensions such as datatypes, recursive functions, and arithmetic. Various so-called proof hammers, in the spirit of Isabelle’s Sledgehammer tool, have recently been developed and integrated with modern proof assistants. Prominent examples include HOL(y)Hammer (Kaliszyk & Urban 2015) for HOL Light and a similar hammer (Czajka & Kaliszyk 2018) for the proof assistant Coq. 4.4 (Counter-)Model Finding Support for finding finite models or countermodels for formulas of Church’s type theory was already implemented in the tableau-based prover HOT (Konrad 1998). Restricted (counter-)model finding capabilities are also implemented in the provers Satallax, LEO-II and Leo-III. 
The most advanced (finite) model finding support is currently realized in the systems Nitpick, Nunchaku and Refute. These tools have been integrated with the Isabelle proof assistant. Nitpick is also available as a standalone tool that accepts TPTP THF syntax. The systems are particularly valuable for exposing errors and misconceptions in problem encodings, and for revealing bugs in the THF theorem provers. 5. Applications 5.1 Semantics of Natural Language Church’s type theory plays an important role in the study of the formal semantics of natural language. Pioneering work on this was done by Richard Montague. See his papers “English as a formal language”, “Universal grammar”, and “The proper treatment of quantification in ordinary English”, which are reprinted in Montague 1974. A crucial component of Montague’s analysis of natural language is the definition of a tensed intensional logic (Montague 1974: 256), which is an enhancement of Church’s type theory. Montague Grammar had a huge impact, and has since been developed in many further directions, not least in Typelogical/Categorial Grammar. Further related work on intensional and higher-order modal logic is presented in Gallin 1975 and Muskens 2006. 5.2 Mathematics and Computer Science Proof assistants based on Church’s type theory, including Isabelle/HOL, HOL Light, HOL4, and PVS, have been successfully utilized in a broad range of applications in computer science and mathematics. Applications in computer science include the verification of hardware, software, and security protocols. A prominent example is the L4.verified project, in which Isabelle/HOL was used to formally prove that the seL4 operating system kernel implements an abstract, mathematical model specifying what the kernel is supposed to do (Klein et al. 2018). In mathematics, proof assistants have been applied for the development of libraries of mathematical theories and the verification of challenge theorems. An early example is the mathematical library that has been developed since the eighties in the TPS project. An exemplary list of theorems that were proved automatically with TPS is given in Andrews et al. 1996. A very prominent recent example is Hales’ Flyspeck project, in which HOL Light was employed to develop a formal proof of the Kepler conjecture (Hales et al. 2017). An example that strongly exploits automation support in Isabelle/HOL with Sledgehammer and Nitpick is presented in Benzmüller & Scott forthcoming. In this work, different axiom systems for category theory were explored and compared. A solid overview of past and ongoing formalization projects can be obtained by consulting sources such as Isabelle’s Archive of Formal Proofs, the Journal of Formalized Reasoning, or the THF entries in Sutcliffe’s TPTP problem library. Further improving proof automation within these proof assistants—based on proof hammering tools or on other forms of prover integration—is relevant for minimizing interaction effort in future applications.

Philosophical Concepts

The Revision Theory of Truth

1. Semiformal introduction Let's take a closer look at the sentence (1), given above: (1) is not true. (1) It will be useful to make the paradoxical reasoning explicit. First, suppose that (1) is not true. (2) It seems an intuitive principle concerning truth that, for any sentence p, we …

1. Semiformal introduction Let's take a closer look at the sentence (1), given above: (1) is not true. (1) It will be useful to make the paradoxical reasoning explicit. First, suppose that (1) is not true. (2) It seems an intuitive principle concerning truth that, for any sentence p, we have the so-called T-biconditional ‘p’ is true iff p. (3) (Here we are using ‘iff’ as an abbreviation for ‘if and only if’.) In particular, we should have ‘(1) is not true’ is true iff (1) is not true. (4) Thus, from (2) and (4), we get ‘(1) is not true’ is true. (5) Then we can apply the identity (1) = ‘(1) is not true’ (6) to conclude that (1) is true. This all shows that if (1) is not true, then (1) is true. Similarly, we can also argue that if (1) is true then (1) is not true. So (1) seems to be both true and not true: hence the paradox. As stated above, the three-valued approach to the paradox takes the liar sentence, (1), to be neither true nor false. Exactly how, or even whether, this move blocks the above reasoning is a matter for debate. The RTT is not designed to block reasoning of the above kind, but to model it, or most of it.[2] As stated above, the central idea is the idea of a revision process: a process by which we revise hypotheses about the truth-value of one or more sentences. Consider the reasoning regarding the liar sentence, (1), above. Suppose that we hypothesize that (1) is not true. Then, with an application of the relevant T-biconditional, we might revise our hypothesis as follows:
Hypothesis: (1) is not true.
T-biconditional: ‘(1) is not true’ is true iff (1) is not true.
Therefore: ‘(1) is not true’ is true.
Known identity: (1) = ‘(1) is not true’.
Conclusion: (1) is true.
New revised hypothesis: (1) is true.
We could continue the revision process by revising our hypothesis once again, as follows:
New hypothesis: (1) is true.
T-biconditional: ‘(1) is not true’ is true iff (1) is not true.
Therefore: ‘(1) is not true’ is not true.
Known identity: (1) = ‘(1) is not true’.
Conclusion: (1) is not true.
New new revised hypothesis: (1) is not true.
As the revision process continues, we flip back and forth between taking the liar sentence to be true and not true. Example 1.1 It is worth seeing how this kind of revision reasoning works in a case with several interconnected sentences. Let's apply the revision idea to the following three sentences:
(8) is true or (9) is true. (7)
(7) is true. (8)
(7) is not true. (9)
Informally, we might reason as follows. Either (7) is true or (7) is not true. Thus, either (8) is true or (9) is true. Thus, (7) is true. Thus (8) is true and (9) is not true, and (7) is still true. Iterating the process once again, we get (8) is true, (9) is not true, and (7) is true. More formally, consider any initial hypothesis, h0, about the truth values of (7), (8) and (9). Either h0 says that (7) is true or h0 says that (7) is not true. In either case, we can use the T-biconditional to generate our revised hypothesis h1: if h0 says that (7) is true, then h1 says that ‘(7) is true’ is true, i.e. that (8) is true; and if h0 says that (7) is not true, then h1 says that ‘(7) is not true’ is true, i.e. that (9) is true. So h1 says that either (8) is true or (9) is true. So h2 says that ‘(8) is true or (9) is true’ is true. In other words, h2 says that (7) is true. So no matter what hypothesis h0 we start with, two iterations of the revision process lead to a hypothesis that (7) is true. 
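This last claim is easy to check mechanically. Here is a minimal sketch (our own illustration, not part of the RTT literature; encoding hypotheses as Python dictionaries is an assumption of the sketch) that enumerates all eight initial hypotheses about (7), (8), and (9), applies the revision step twice, and confirms that (7) always comes out true:

```python
from itertools import product

# One revision step for Example 1.1, read off the T-biconditionals:
#   (7) says: (8) is true or (9) is true
#   (8) says: (7) is true
#   (9) says: (7) is not true
def revise(h):
    return {7: h[8] or h[9], 8: h[7], 9: not h[7]}

for values in product([True, False], repeat=3):
    h = dict(zip([7, 8, 9], values))   # an initial hypothesis h0
    h = revise(revise(h))              # two revision steps
    assert h[7] is True                # (7) is true, whatever h0 was
print("After two revisions, every initial hypothesis makes (7) true.")
```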
Similarly, three or more iterations of the revision process lead to the hypothesis that (7) is true, (8) is true and (9) is not true — regardless of our initial hypothesis. In Section 3, we will reconsider this example in a more formal context. One thing to note is that, in Example 1.1, the revision process yields stable truth values for all three sentences. The notion of a sentence stably true in all revision sequences will be a central notion for the RTT. The revision-theoretic treatment contrasts, in this case, with the three-valued approach: on most ways of implementing the three-valued idea, all three sentences, (7), (8) and (9), turn out to be neither true nor false.[3] In this case, the RTT arguably better captures the correct informal reasoning than does the three-valued approach: the RTT assigns to the sentences (7), (8) and (9) the truth-values that were assigned to them by the informal reasoning given at the beginning of the example. 2. Framing the problem 2.1 Truth languages The goal of the RTT is not to give a paradox-free account of truth. Rather, the goal of the RTT is to give an account of our often unstable and often paradoxical reasoning about truth. More specifically, the RTT seeks to give a two-valued account that assigns stable classical truth values to sentences when intuitive reasoning would assign stable classical truth values. We will present a formal semantics for a formal language: we want that language to have both a truth predicate and the resources to refer to its own sentences. Let us consider a first-order language L, with connectives &, ∨, and ¬, quantifiers ∀ and ∃, the equals sign =, variables, and some stock of names, function symbols and relation symbols. We will say that L is a truth language if it has a distinguished predicate T and quotation marks ‘ and ’, which will be used to form quote names: if A is a sentence of L, then ‘A’ is a name. Let SentL = {A : A is a sentence of L}. It will be useful to identify the T-free fragment of a truth language L: the first-order language L− that has the same names, function symbols and relation symbols as L, but lacks the unary predicate T. Since L− has the same names as L, including the same quote names, L− will have a quote name ‘A’ for every sentence A of L. Thus ∀xTx is not a sentence of L−, but ‘∀xTx’ is a name of L− and ∀x(x = ‘∀xTx’) is a sentence of L−. 2.2 Ground models Other than the truth predicate, we will assume that our language is interpreted classically. More precisely, let a ground model for L be a classical model M = <D, I > for L−, the T-free fragment of L, satisfying the following: (1) D is a nonempty domain of discourse; (2) I assigns to each name, function symbol and relation symbol of L− a suitable interpretation over D; (3) SentL ⊆ D; and (4) I(‘A’) = A for each sentence A of L. Clauses (1) and (2) simply specify what it is for M to be a classical model of the T-free fragment of L. Clauses (3) and (4) ensure that L, when interpreted, can talk about its own sentences. Given a ground model, we will consider the prospects of providing a satisfying interpretation of T. The most obvious desideratum is that the ground model, expanded to include an interpretation of T, satisfy Tarski's T-biconditionals, i.e., the biconditionals of the form T ‘A’ iff A for each A ∈ SentL. Some useful terminology: Given a ground model M for L and a name, function symbol or relation symbol X, we can think of I(X) as the interpretation or, to borrow a term from Gupta and Belnap, the signification of X. 
Gupta and Belnap characterize an expression's or concept's signification in a world w as “an abstract something that carries all the information about all the expression's [or concept's] extensional relations in w.” If we want to interpret Tx as ‘x is true’, then, given a ground model M, we would like to find an appropriate signification, or an appropriate range of significations, for T. 2.3 Three ground models We might try to assign to T a classical signification, by expanding M to a classical model M′ = <D′, I′ > for all of L, including T. Also recall that we want M′ to satisfy the T-biconditionals: for our immediate purposes, let us interpret these classically. Let us say that an expansion M′ of a ground model M is Tarskian iff M′ is a classical model and all of the T-biconditionals, interpreted classically, are true in M′. We would like to expand ground models to Tarskian models. We consider three ground models in order to assess our prospects for doing this. Ground model M1 Our first ground model is a formalization of Example 1.1, above. Suppose that L1 contains three non-quote names, α, β, and γ, and no predicates other than T. Let M1 = <D1, I1 > be as follows:
D1 = SentL1
I1(α) = Tβ ∨ Tγ
I1(β) = Tα
I1(γ) = ¬Tα
Ground model M2 Suppose that L2 contains one non-quote name, τ, and no predicates other than T. Let M2 = <D2, I2 > be as follows:
D2 = SentL2
I2(τ) = Tτ
Ground model M3 Suppose that L3 contains one non-quote name, λ, and no predicates other than T. Let M3 = <D3, I3 > be as follows:
D3 = SentL3
I3(λ) = ¬Tλ
Theorem 2.1 (1) M1 can be expanded to exactly one Tarskian model: in this model, the sentences (Tβ ∨ Tγ) and Tα are true, while the sentence ¬Tα is false. (2) M2 can be expanded to exactly two Tarskian models, in one of which the sentence Tτ is true and in the other of which the sentence Tτ is false. (3) M3 cannot be expanded to a Tarskian model. The proofs of (1) and (2) are beyond the scope of this article, but some remarks are in order. Re (1): The fact that M1 can be expanded to a Tarskian model is not surprising, given the reasoning in Example 1.1, above: any initial hypothesis about the truth values of the three sentences in question leads, after three iterations of the revision process, to a stable hypothesis that (Tβ ∨ Tγ) and Tα are true, while ¬Tα is false. The fact that M1 can be expanded to exactly one Tarskian model requires the so-called Transfer Theorem (Gupta and Belnap 1993, Theorem 2D.4). Remark: In the introductory remarks, above, we claim that there are consistent classical interpreted languages that refer to their own sentences and have their own truth predicates. Clause (1) of Theorem 2.1 delivers an example. Let M1′ be the unique Tarskian expansion of M1. Then the language L1, interpreted by M1′, is an interpreted language that has its own truth predicate satisfying the T-biconditionals classically understood, obeys the rules of standard classical logic, and has the ability to refer to each of its own sentences. Thus Tarski was not quite right in his view that any language that has its own truth predicate, obeys the rules of standard classical logic, and can refer to its own sentences must be inconsistent. Re (2): The only potentially problematic self-reference is in the sentence Tτ, the so-called truth teller, which says of itself that it is true. 
Informal reasoning suggests that the truth teller can consistently be assigned either classical truth value: if you assign it the value t then no paradox is produced, since the sentence now truly says of itself that it is true; and if you assign it the value f then no paradox is produced, since the sentence now falsely says of itself that it is true. Theorem 2.1 (2) formalizes this point, i.e., M2 can be expanded to one Tarskian model in which Tτ is true and one in which Tτ is false. The fact that M2 can be expanded to exactly two Tarskian models requires the Transfer Theorem, alluded to above. Note that the language L2, interpreted by either of these expansions, provides another example of an interpreted language that has its own truth predicate satisfying the T-biconditionals classically understood, obeys the rules of standard classical logic, and has the ability to refer to each of its own sentences. Proof of (3). Suppose that M3′ = <D3, I3′ > is a classical expansion of M3 to all of L3. Since M3′ is an expansion of M3, I3 and I3′ agree on all the names of L3. So I3′(λ) = I3(λ) = ¬Tλ = I3(‘¬Tλ’) = I3′(‘¬Tλ’). So the sentences Tλ and T ‘¬Tλ’ have the same truth value in M3′. So the T-biconditional T ‘¬Tλ’ ≡ ¬Tλ is false in M3′. Remark: The language L3 interpreted by the ground model M3 formalizes the liar's paradox, with the sentence ¬Tλ as the offending liar's sentence. Thus, despite Clauses (1) and (2) of Theorem 2.1, Clause (3) strongly suggests that in a semantics for languages capable of expressing their own truth concepts, T cannot, in general, have a classical signification; and the ‘iff’ in the T-biconditionals will not be read as the classical biconditional. We take these suggestions up in Section 4, below. 3. Basic notions of the RTT 3.1 Revision rules In Section 1, we informally sketched the central thought of the RTT, namely, that we can use the T-biconditionals to generate a revision rule — a rule for revising a hypothesis about the extension of the truth predicate. Here we will formalize this notion, and work through an example from Section 1. In general, let L be a truth language and M be a ground model for L. An hypothesis is a function h : D → {t, f}. An hypothesis will in effect be a hypothesized classical interpretation for T. Let's work with an example that combines the ground models M1 and M3. We will state the example formally, but reason in a semiformal way, to transition from one hypothesized extension of T to another. Example 3.1 Suppose that L contains four non-quote names, α, β, γ and λ and no predicates other than T. Also suppose that M = <D, I > is as follows:
D = SentL
I(α) = Tβ ∨ Tγ
I(β) = Tα
I(γ) = ¬Tα
I(λ) = ¬Tλ
It will be convenient to let
A be the sentence Tβ ∨ Tγ
B be the sentence Tα
C be the sentence ¬Tα
X be the sentence ¬Tλ
Thus:
D = SentL
I(α) = A
I(β) = B
I(γ) = C
I(λ) = X
Suppose that the hypothesis h0 hypothesizes that A is false, B is true, C is false and X is false. Thus
h0(A) = f
h0(B) = t
h0(C) = f
h0(X) = f
Now we will engage in some semiformal reasoning, on the basis of hypothesis h0. Among the four sentences, A, B, C and X, h0 puts only B in the extension of T. Thus, reasoning from h0, we conclude that
¬Tα since the referent of α is not in the extension of T
Tβ since the referent of β is in the extension of T
¬Tγ since the referent of γ is not in the extension of T
¬Tλ since the referent of λ is not in the extension of T. 
The T-biconditionals for the four sentences A, B, C and X are as follows:
(TA) A is true iff Tβ ∨ Tγ
(TB) B is true iff Tα
(TC) C is true iff ¬Tα
(TX) X is true iff ¬Tλ
Thus, reasoning from h0, we conclude that
A is true
B is not true
C is true
X is true
This produces our new hypothesis h1:
h1(A) = t
h1(B) = f
h1(C) = t
h1(X) = t
Let's revise our hypothesis once again. So now we will engage in some semiformal reasoning, on the basis of hypothesis h1. Hypothesis h1 puts A, C and X, but not B, in the extension of T. Thus, reasoning from h1, we conclude that
Tα since the referent of α is in the extension of T
¬Tβ since the referent of β is not in the extension of T
Tγ since the referent of γ is in the extension of T
Tλ since the referent of λ is in the extension of T
Recall the T-biconditionals for the four sentences A, B, C and X, given above. Reasoning from h1 and these T-biconditionals, we conclude that
A is true
B is true
C is not true
X is not true
This produces our new new hypothesis h2:
h2(A) = t
h2(B) = t
h2(C) = f
h2(X) = f □
Let's formalize the semiformal reasoning carried out in Example 3.1. First we hypothesized that certain sentences were, or were not, in the extension of T. Consider ordinary classical model theory. Suppose that our language has a predicate G and a name a, and that we have a model M = <D, I > which places the referent of a inside the extension of G: I(G)(I(a)) = t. Then we conclude, classically, that the sentence Ga is true in M. It will be useful to have some notation for the classical truth value of a sentence S in a classical model M. We will write ValM(S). In this case, ValM(Ga) = t. In Example 3.1, we did not start with a classical model of the whole language L, but only a classical model of the T-free fragment of L. But then we added a hypothesis, in order to get a classical model of all of L. Let's use the notation M + h for the classical model of all of L that you get when you extend M by assigning T an extension via the hypothesis h. Once you have assigned an extension to the predicate T, you can calculate the truth values of the various sentences of L. That is, for each sentence S of L, we can calculate ValM + h(S). In Example 3.1, we started with hypothesis h0 as follows:
h0(A) = f
h0(B) = t
h0(C) = f
h0(X) = f
Then we calculated as follows:
ValM+h0(Tα) = f
ValM+h0(Tβ) = t
ValM+h0(Tγ) = f
ValM+h0(Tλ) = f
And then we concluded as follows:
ValM+h0(A) = ValM+h0(Tβ ∨ Tγ) = t
ValM+h0(B) = ValM+h0(Tα) = f
ValM+h0(C) = ValM+h0(¬Tα) = t
ValM+h0(X) = ValM+h0(¬Tλ) = t
These conclusions generated our new hypothesis, h1:
h1(A) = t
h1(B) = f
h1(C) = t
h1(X) = t
Note that, in general, h1(S) = ValM+h0(S). We are now prepared to define the revision rule given by a ground model M = <D, I >. In general, given an hypothesis h, let M + h = <D, I′ > be the model of L which agrees with M on the T-free fragment of L, and which is such that I′(T) = h. So M + h is just a classical model for all of L. For any model M + h of all of L and any sentence A of L, let ValM+h(A) be the ordinary classical truth value of A in M + h. Definition 3.2 Suppose that L is a truth language and that M = <D, I > is a ground model for L. 
The revision rule, τM, is the function mapping hypotheses to hypotheses, as follows:
τM(h)(d) = t, if d ∈ D is a sentence of L and ValM+h(d) = t;
τM(h)(d) = f, otherwise.
The ‘otherwise’ clause tells us that if d is not a sentence of L, then, after one application of revision, we stick with the hypothesis that d is not true.[5] Note that, in Example 3.1, h1 = τM(h0) and h2 = τM(h1). We will often drop the subscripted ‘M’ when the context makes it clear which ground model is at issue. 3.2 Revision sequences Let's pick up Example 3.1 and see what happens when we iterate the application of the revision rule. Example 3.3 (Example 3.1 continued) Recall that L contains four non-quote names, α, β, γ and λ and no predicates other than T. Also recall that M = <D, I > is as follows:
D = SentL
I(α) = A = Tβ ∨ Tγ
I(β) = B = Tα
I(γ) = C = ¬Tα
I(λ) = X = ¬Tλ
The following table indicates what happens with repeated applications of the revision rule τM to the hypothesis h0 from Example 3.1. In this table, we will write τ instead of τM:
S    h0(S)   τ(h0)(S)   τ2(h0)(S)   τ3(h0)(S)   τ4(h0)(S)   …
A    f       t          t           t           t           …
B    t       f          t           t           t           …
C    f       t          f           f           f           …
X    f       t          f           t           f           …
So h0 generates a revision sequence (see Definition 3.7, below). And A and B are stably true in that revision sequence (see Definition 3.6, below), while C is stably false. The liar sentence X is, unsurprisingly, neither stably true nor stably false: the liar sentence is unstable. A similar calculation would show that A is stably true, regardless of the initial hypothesis: thus A is categorically true (see Definition 3.8). Before giving a precise definition of a revision sequence, we give an example where we would want to carry the revision process beyond the finite stages, h, τ1(h), τ2(h), τ3(h), and so on. Example 3.4 Suppose that L contains nonquote names α0, α1, α2, α3, …, and unary predicates G and T. Now we will specify a ground model M = <D, I > where the name α0 refers to some tautology, and where
the name α1 refers to the sentence Tα0
the name α2 refers to the sentence Tα1
the name α3 refers to the sentence Tα2
…
More formally, let A0 be the sentence Tα0 ∨ ¬Tα0, and for each n ≥ 0, let An+1 be the sentence Tαn. Thus A1 is the sentence Tα0, and A2 is the sentence Tα1, and A3 is the sentence Tα2, and so on. Our ground model M = <D, I > is as follows:
D = SentL
I(αn) = An
I(G)(A) = t iff A = An for some n
Thus, the extension of G is the following set of sentences: {A0, A1, A2, A3, … } = {(Tα0 ∨ ¬Tα0), Tα0, Tα1, Tα2, … }. Finally let B be the sentence ∀x(Gx ⊃ Tx). Let h be any hypothesis for which we have, for each natural number n, h(An) = h(B) = f. The following table indicates what happens with repeated applications of the revision rule τM to the hypothesis h. In this table, we will write τ instead of τM:
S    h(S)   τ(h)(S)   τ2(h)(S)   τ3(h)(S)   τ4(h)(S)   …
A0   f      t         t          t          t          …
A1   f      f         t          t          t          …
A2   f      f         f          t          t          …
A3   f      f         f          f          t          …
A4   f      f         f          f          f          …
B    f      f         f          f          f          …
At the 0th stage, each An is outside the hypothesized extension of T. But from the n+1st stage onwards, An is in the hypothesized extension of T. So, for each n, the sentence An is eventually stably hypothesized to be true. Despite this, there is no finite stage at which all the An's are hypothesized to be true: as a result the sentence B = ∀x(Gx ⊃ Tx) remains false at each finite stage. 
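Definition 3.2 is straightforward to put on a computer when only finitely many sentences are relevant. The following minimal sketch (our own illustration; True/False stand in for t/f) applies τM to the hypothesis h0 of Example 3.3 and reproduces the table above: A and B settle to true, C to false, while the liar X keeps flipping. (Example 3.4 is different: its hypotheses concern infinitely many sentences, and no finite run can take us to the ωth stage.)

```python
# One application of the revision rule tau_M of Definition 3.2,
# instantiated for Example 3.3 (the encoding is ours).
def tau(h):
    return {
        'A': h['B'] or h['C'],   # A is T(beta) v T(gamma); beta names B, gamma names C
        'B': h['A'],             # B is T(alpha); alpha names A
        'C': not h['A'],         # C is ~T(alpha)
        'X': not h['X'],         # X is the liar ~T(lambda); lambda names X
    }

h = {'A': False, 'B': True, 'C': False, 'X': False}   # the hypothesis h0
for stage in range(6):
    print(stage, {s: 't' if h[s] else 'f' for s in 'ABCX'})
    h = tau(h)   # move to the next stage of the revision sequence
```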
The failure of B to become true at any finite stage suggests extending the process as follows:
S    h(S)   τ(h)(S)   τ2(h)(S)   τ3(h)(S)   …   ω   ω+1   ω+2   …
A0   f      t         t          t          …   t   t     t     …
A1   f      f         t          t          …   t   t     t     …
A2   f      f         f          t          …   t   t     t     …
A3   f      f         f          f          …   t   t     t     …
A4   f      f         f          f          …   t   t     t     …
B    f      f         f          f          …   f   t     t     …
Thus, if we allow the revision process to proceed beyond the finite stages, then the sentence B = ∀x(Gx ⊃ Tx) is stably true from the ω+1st stage onwards. □ In Example 3.4, the intuitive verdict is that not only should each An receive a stable truth value of t, but so should the sentence B = ∀x(Gx ⊃ Tx). The only way to ensure this is to carry the revision process beyond the finite stages. So we will consider revision sequences that are very long: not only will a revision sequence have an nth stage for each finite number n, but an ηth stage for every ordinal number η. (The next paragraph is to help the reader unfamiliar with ordinal numbers.) One way to think of the ordinal numbers is as follows. Start with the finite natural numbers: 0, 1, 2, 3, … Add a number, ω, greater than all of these but not the immediate successor of any of them: 0, 1, 2, 3, …, ω And then take the successor of ω, its successor, and so on: 0, 1, 2, 3, …, ω, ω+1, ω+2, ω+3, … Then add a number ω+ω, or ω×2, greater than all of these (and again, not the immediate successor of any), and start over, reiterating this process over and over: 0, 1, 2, 3, …, ω, ω+1, ω+2, ω+3, …, ω×2, (ω×2)+1, (ω×2)+2, (ω×2)+3, …, ω×3, (ω×3)+1, (ω×3)+2, (ω×3)+3, … At the end of this, we add an ordinal number ω×ω, or ω²: 0, 1, 2, …, ω, ω+1, ω+2, …, ω×2, (ω×2)+1, …, ω×3, …, ω×4, …, ω×5, …, ω², ω²+1, … The ordinal numbers have the following structure: every ordinal number has an immediate successor, known as a successor ordinal; and for any infinitely ascending sequence of ordinal numbers, there is a limit ordinal which is greater than all the members of the sequence and which is not the immediate successor of any member of the sequence. Thus the following are successor ordinals: 5, 178, ω+12, (ω×5)+56, ω²+8; and the following are limit ordinals: ω, ω×2, ω², (ω²+ω), etc. Given a limit ordinal η, a sequence S of objects is an η-long sequence if there is an object Sδ for every ordinal δ < η. We will denote the class of ordinals as On. Any sequence S of objects is an On-long sequence if there is an object Sδ for every ordinal δ. When assessing whether a sentence receives a stable truth value, the RTT considers sequences of hypotheses of length On. So suppose that S is an On-long sequence of hypotheses, and let ζ and η range over ordinals. Clearly, in order for S to represent the revision process, we need the ζ+1st hypothesis to be generated from the ζth hypothesis by the revision rule. So we insist that Sζ+1 = τM(Sζ). But what should we do at a limit stage? That is, how should we set Sη(d) when η is a limit ordinal? Clearly any object that is stably true [false] up to that stage should be true [false] at that stage. Thus consider Example 3.4. The sentence A2, for example, is stably true up to the ωth stage; so we set A2 to be true at the ωth stage. For objects that do not stabilize up to that stage, Gupta and Belnap 1993 adopt a liberal policy: when constructing a revision sequence S, if the value of the object d ∈ D has not stabilized by the time you get to the limit stage η, then you can set Sη(d) to be whichever of t or f you like. Before we give the precise definition of a revision sequence, we continue with Example 3.3 to see an application of this idea. 
Example 3.5 (Example 3.3 continued) Recall that L contains four non-quote names, α, β, γ and λ and no predicates other than T. Also recall that M = <D, I > is as follows:
D = SentL
I(α) = A = Tβ ∨ Tγ
I(β) = B = Tα
I(γ) = C = ¬Tα
I(λ) = X = ¬Tλ
The following table indicates what happens with repeated applications of the revision rule τM to the hypothesis h0 from Example 3.1. For each ordinal η, we will indicate the ηth hypothesis by Sη (suppressing the index M on τ). Thus S0 = h0, S1 = τ(h0), S2 = τ2(h0), S3 = τ3(h0), and Sω, the ωth hypothesis, is determined in some way from the hypotheses leading up to it. So, starting with h0 from Example 3.3, our revision sequence begins as follows:
S   S0(S)   S1(S)   S2(S)   S3(S)   S4(S)   …
A   f       t       t       t       t       …
B   t       f       t       t       t       …
C   f       t       f       f       f       …
X   f       t       f       t       f       …
What happens at the ωth stage? A and B are stably true up to the ωth stage, and C is stably false up to the ωth stage. So at the ωth stage, we must have the following:
S   S0(S)   S1(S)   S2(S)   S3(S)   S4(S)   …   Sω(S)
A   f       t       t       t       t       …   t
B   t       f       t       t       t       …   t
C   f       t       f       f       f       …   f
X   f       t       f       t       f       …   ?
But the entry for Sω(X) can be either t or f. In other words, the initial hypothesis h0 generates at least two revision sequences. Every revision sequence S that has h0 as its initial hypothesis must have Sω(A) = t, Sω(B) = t, and Sω(C) = f. But there is some revision sequence S, with h0 as its initial hypothesis, and with Sω(X) = t; and there is some revision sequence S′, with h0 as its initial hypothesis, and with S′ω(X) = f. □ We are now ready to define the notion of a revision sequence: Definition 3.6 Suppose that L is a truth language, and that M = <D, I > is a ground model. Suppose that S is an On-long sequence of hypotheses. Then we say that d ∈ D is stably t [f] in S iff for some ordinal θ we have Sζ(d) = t [f], for every ordinal ζ ≥ θ. Suppose that S is an η-long sequence of hypotheses for some limit ordinal η. Then we say that d ∈ D is stably t [f] in S iff for some ordinal θ < η we have Sζ(d) = t [f], for every ordinal ζ such that ζ ≥ θ and ζ < η. If S is an On-long sequence of hypotheses and η is a limit ordinal, then S|η is the initial segment of S up to but not including η. Note that S|η is an η-long sequence of hypotheses. Definition 3.7 Suppose that L is a truth language, and that M = <D, I > is a ground model. Suppose that S is an On-long sequence of hypotheses. S is a revision sequence for M iff Sζ+1 = τM(Sζ), for each ζ ∈ On, and for each limit ordinal η and each d ∈ D, if d is stably t [f] in S|η, then Sη(d) = t [f]. Definition 3.8 Suppose that L is a truth language, and that M = <D, I > is a ground model. We say that the sentence A is categorically true [false] in M iff A is stably t [f] in every revision sequence for M. We say that A is categorical in M iff A is either categorically true or categorically false in M. We now illustrate these concepts with an example. The example will also illustrate a new concept to be defined afterwards. Example 3.9 Suppose that L is a truth language containing nonquote names β, α0, α1, α2, α3, …, and unary predicates G and T. Let B be the sentence Tβ ∨ ∀x∀y(Gx & ¬Tx & Gy & ¬Ty ⊃ x=y). Let A0 be the sentence ∃x(Gx & ¬Tx). And for each n ≥ 0, let An+1 be the sentence Tαn. Consider the following ground model M = <D, I >:
D = SentL
I(β) = B
I(αn) = An
I(G)(A) = t iff A = An for some n
Thus, the extension of G is the following set of sentences: {A0, A1, A2, A3, … } = {∃x(Gx & ¬Tx), Tα0, Tα1, Tα2, … }. Let h be any hypothesis for which h(B) = f and, for each natural number n, h(An) = f. 
And let S be a revision sequence whose initial hypothesis is h, i.e., S0 = h. The following table indicates some of the values of Sγ(C), for sentences C ∈ {B, A0, A1, A2, A3, … }. In the top row, we indicate only the ordinal number representing the stage in the revision process.
     0   1   2   3   …   ω   ω+1   ω+2   ω+3   …   ω×2   (ω×2)+1   (ω×2)+2   …
B    f   f   f   f   …   f   t     t     t     …   t     t         t         …
A0   f   t   t   t   …   t   f     t     t     …   t     f         t         …
A1   f   f   t   t   …   t   t     f     t     …   t     t         f         …
A2   f   f   f   t   …   t   t     t     f     …   t     t         t         …
A3   f   f   f   f   …   t   t     t     t     …   t     t         t         …
A4   f   f   f   f   …   t   t     t     t     …   t     t         t         …
It is worth contrasting the behaviour of the sentence B and the sentence A0. From the ω+1st stage on, B stabilizes as true. In fact, B is stably true in every revision sequence for M. Thus, B is categorically true in M. The sentence A0, however, never quite stabilizes: it is usually true, but within a few finite stages after a limit ordinal, the sentence A0 can be false. In these circumstances, we say that A0 is nearly stably true. (See Definition 3.10, below.) In fact, A0 is nearly stably true in every revision sequence for M. □ Example 3.9 illustrates not only the notion of stability in a revision sequence, but also that of near stability, which we define now: Definition 3.10 Suppose that L is a truth language, and that M = <D, I > is a ground model. Suppose that S is an On-long sequence of hypotheses. Then we say that d ∈ D is nearly stably t [f] in S iff for some ordinal θ we have: for every ζ ≥ θ, there is a natural number n such that, for every m ≥ n, Sζ+m(d) = t [f]. Gupta and Belnap 1993 characterize the difference between stability and near stability as follows: “Stability simpliciter requires an element [in our case a sentence] to settle down to a value x [in our case a truth value] after some initial fluctuations say up to [an ordinal η]… In contrast, near stability allows fluctuations after η also, but these fluctuations must be confined to finite regions just after limit ordinals” (p. 169). Gupta and Belnap 1993 introduce two theories of truth, T* and T#, based on stability and near stability. Theorems 3.12 and 3.13, below, illustrate an advantage of the system T#, i.e., the system based on near stability. Definition 3.11 Suppose that L is a truth language, and that M = <D, I > is a ground model. We say that a sentence A is valid in M by T* iff A is stably true in every revision sequence for M. And we say that a sentence A is valid in M by T# iff A is nearly stably true in every revision sequence for M. Theorem 3.12 Suppose that L is a truth language, and that M = <D, I > is a ground model. Then, for every sentence A of L, the following is valid in M by T#: T‘¬A’ ≡ ¬T‘A’. Theorem 3.13 There is a truth language L and a ground model M = <D, I > and a sentence A of L such that the following is not valid in M by T*: T ‘¬A’ ≡ ¬T ‘A’. Gupta and Belnap 1993, Section 6C, note similar advantages of T# over T*. For example, T# does, but T* does not, validate the following semantic principles: T ‘A & B’ ≡ T ‘A’ & T ‘B’ and T ‘A ∨ B’ ≡ T ‘A’ ∨ T ‘B’. Gupta and Belnap remain noncommittal about which of T# and T* (and a further alternative that they define, Tc) is preferable. 4. Interpreting the formalism The main formal notions of the RTT are the notion of a revision rule (Definition 3.2), i.e., a rule for revising hypotheses, and a revision sequence (Definition 3.7), a sequence of hypotheses generated in accordance with the appropriate revision rule. Using these notions, we can, given a ground model, specify when a sentence is stably, or nearly stably, true or false in a particular revision sequence. 
Thus we could define two theories of truth, T* and T#, based on stability and near stability. The final idea is that each of these theories delivers a verdict on which sentences of the language are categorically assertible, given a ground model. Note that we could use revision-theoretic notions to make rather fine-grained distinctions among sentences: Some sentences are unstable in every revision sequence; others are stable in every revision sequence, though stably true in some and stably false in others; and so on. Thus, we can use revision-theoretic ideas to give a fine-grained analysis of the status of various sentences, and of the relationships of various sentences to one another. Recall the suggestion made at the end of Section 2: In a semantics for languages capable of expressing their own truth concepts, T will not, in general, have a classical signification; and the ‘iff’ in the T-biconditionals will not be read as the classical biconditional. Gupta and Belnap fill out these suggestions in the following way. 4.1 The signification of T First, they suggest that the signification of T, given a ground model M, is the revision rule τM itself. As noted in the preceding paragraph, we can give a fine-grained analysis of sentences' statuses and interrelations on the basis of notions generated directly and naturally from the revision rule τM. Thus, τM is a good candidate for the signification of T, since it does seem to be “an abstract something that carries all the information about all [of T's] extensional relations” in M. (See Gupta and Belnap's characterization of an expression's signification, given in Section 2, above.) 4.2 The ‘iff’ in the T-biconditionals Gupta and Belnap's related suggestion concerning the ‘iff’ in the T-biconditionals is that, rather than being the classical biconditional, this ‘iff’ is the distinctive biconditional used to define a previously undefined concept. In 1993, Gupta and Belnap present the revision theory of truth as a special case of a revision theory of circularly defined concepts. Suppose that L is a language with a unary predicate F and a binary predicate R. Consider a new concept expressed by a predicate G, introduced through a definition like this: Gx =df ∀y(Ryx ⊃ Fy) ∨ ∃y(Ryx & Gy). Suppose that we start with a domain of discourse, D, and an interpretation of the predicate F and the relation symbol R. Gupta and Belnap's revision-theoretic treatment of concepts thus circularly introduced allows one to give categorical verdicts, for certain d ∈ D, about whether or not d satisfies G. Other objects will be unstable relative to G: we will be able categorically to assert neither that d satisfies G nor that d does not satisfy G. In the case of truth, Gupta and Belnap take the set of T-biconditionals of the form T ‘A’ =df A (10) together to give the definition of the concept of truth. It is their treatment of ‘=df’ (the ‘iff’ of definitional concept introduction), together with the T-biconditionals of the form (10), that determines the revision rule τM. 4.3 The paradoxical reasoning Recall the liar sentence, (1), from the beginning of this article: (1) is not true (1) In Section 1, we claimed that the RTT is designed to model, rather than block, the kind of paradoxical reasoning regarding (1). But we noted in footnote 2 that the RTT does avoid contradictions in these situations. There are two ways to see this. 
First, while the RTT does endorse the biconditional (1) is true iff (1) is not true, the relevant ‘iff’ is not the material biconditional, as explained above. Thus, it does not follow that both (1) is true and (1) is not true. Second, note that on no hypothesis can we conclude that both (1) is true and (1) is not true. If we keep it firmly in mind that revision-theoretical reasoning is hypothetical rather than categorical, then we will not infer any contradictions from the existence of a sentence such as (1), above. 4.4 The signification thesis Gupta and Belnap's suggestions, concerning the signification of T and the interpretation of the ‘iff’ in the T-biconditionals, dovetail nicely with two closely related intuitions articulated in Gupta & Belnap 1993. The first intuition, loosely expressed, is “that the T-biconditionals are analytic and fix the meaning of ‘true’” (p. 6). More tightly expressed, it becomes the “Signification Thesis” (p. 31): “The T-biconditionals fix the signification of truth in every world [where a world is represented by a ground model].”[6] Given the revision-theoretic treatment of the definitional ‘iff’, and given a ground model M, the T-biconditionals (10) do, as noted, fix the suggested signification of T, i.e., the revision rule τM. 4.5 The supervenience of semantics The second intuition is the supervenience of the signification of truth. This is a descendant of the supervenience of semantics proposed in M. Kremer 1988. The idea is simple: which sentences fall under the concept truth should be fixed by (1) the interpretation of the nonsemantic vocabulary, and (2) the empirical facts. In non-circular cases, this intuition is particularly strong: the standard interpretation of “snow” and “white” and the empirical fact that snow is white are enough to determine that the sentence “snow is white” falls under the concept truth. The supervenience of the signification of truth is the thesis that the signification of truth, whatever it is, is fixed by the ground model M. Clearly, the RTT satisfies this principle. It is worth seeing how a theory of truth might violate this principle. Consider the truth-teller sentence, i.e., the sentence that says of itself that it is true: (11) is true (11) As noted above, Kripke's three-valued semantics allows three truth values, true (t), false (f), and neither (n). Given a ground model M = <D, I > for a truth language L, the candidate interpretations of T are three-valued interpretations, i.e., functions h : D → { t, f, n }. Given a three-valued interpretation of T, and a scheme for evaluating the truth value of composite sentences in terms of their parts, we can specify a truth value ValM+h(A) = t, f or n, for every sentence A of L. The central theorem of the three-valued semantics is that, given any ground model M, there is a three-valued interpretation h of T so that, for every sentence A, we have ValM+h(T ‘A’) = ValM+h(A).[7] We will call such an interpretation of T an acceptable interpretation. Our point here is this: if there's a truth-teller, as in (11), then there is not only one acceptable interpretation of T; there are three: one according to which (11) is true, one according to which (11) is false, and one according to which (11) is neither. Thus, there is no single “correct” interpretation of T given a ground model M. So the three-valued semantics seems to violate the supervenience of semantics.[8] The RTT does not assign a truth value to the truth-teller, (11). 
Rather, it gives an analysis of the kind of reasoning that one might engage in with respect to the truth-teller: If we start with a hypothesis h according to which (11) is true, then upon revision (11) remains true. And if we start with a hypothesis h according to which (11) is not true, then upon revision (11) remains not true. And that is all that the concept of truth leaves us with. Given this behaviour of (11), the RTT tells us that (11) is neither categorically true nor categorically false, but this is quite different from a verdict that (11) is neither true nor false. 4.6 A nonsupervenient interpretation of the formalism We note an alternative interpretation of the revision-theoretic formalism. Yaqūb 1993 agrees with Gupta and Belnap that the T-biconditionals are definitional rather than material biconditionals, and that the concept of truth is therefore circular. But Yaqūb interprets this circularity in a distinctive way. He argues that, since the truth conditions of some sentences involve reference to truth in an essential, irreducible manner, these conditions can only obtain or fail in a world that already includes an extension of the truth predicate. Hence, in order for the revision process to determine an extension of the truth predicate, an initial extension of the predicate must be posited. This much follows from circularity and bivalence. (1993, 40) Like Gupta and Belnap, Yaqūb posits no privileged extension for T. And like Gupta and Belnap, he sees the revision sequences of extensions of T, each sequence generated by an initial hypothesized extension, as “capable of accommodating (and diagnosing) the various kinds of problematic and unproblematic sentences of the languages under consideration” (1993, 41). But, unlike Gupta and Belnap, he concludes from these considerations that “truth in a bivalent language is not supervenient” (1993, 39). He explains in a footnote: for truth to be supervenient, the truth status of each sentence must be “fully determined by nonsemantical facts”. Yaqūb does not explicitly use the notion of a concept's signification. But Yaqūb seems committed to the claim that the signification of T — i.e., that which determines the truth status of each sentence — is given by a particular revision sequence itself. And no revision sequence is determined by the nonsemantical facts, i.e., by the ground model, alone: a revision sequence is determined, at best, by a ground model and an initial hypothesis.[9] 5. Further issues 5.1 Three-valued semantics We have given only the barest exposition of the three-valued semantics, in our discussion of the supervenience of the signification of truth, above. Given a truth language L and a ground model M, we defined an acceptable three-valued interpretation of T as an interpretation h : D → { t, f, n } such that ValM+h(T‘A’) = ValM+h(A) for each sentence A of L. In general, given a ground model M, there are many acceptable interpretations of T. Suppose that each of these is indeed a truly acceptable interpretation. Then the three-valued semantics violates the supervenience of the signification of T. Suppose, on the other hand, that, for each ground model M, we can isolate a privileged acceptable interpretation as the correct interpretation of T. Gupta and Belnap present a number of considerations against the three-valued semantics, so conceived. (See Gupta & Belnap 1993, Chapter 3.) 
One principal argument is that the central theorem, i.e., that for each ground model there is an acceptable interpretation, only holds when the underlying language is expressively impoverished in certain ways: for example, the three-valued approach fails if the language has a connective ~ with the following truth table:
A   ~A
t   f
f   t
n   t
The only negation operator that the three-valued approach can handle has the following truth table:
A   ¬A
t   f
f   t
n   n
But consider the liar that says of itself that it is ‘not’ true, in this latter sense of ‘not’. Gupta and Belnap urge the claim that this sentence “ceases to be intuitively paradoxical” (1993, 100). The claimed advantage of the RTT is its ability to describe the behaviour of genuinely paradoxical sentences. The genuine liar is unstable under semantic evaluation: “No matter what we hypothesize its value to be, semantic evaluation refutes our hypothesis.” The three-valued semantics can only handle the “weak liar”, i.e., a sentence that only weakly negates itself, but that is not guaranteed to be paradoxical: “There are appearances of the liar here, but they deceive.” We've thus far reviewed two of Gupta and Belnap's complaints against three-valued approaches, and now we raise a third: in the three-valued theories, truth typically behaves like a nonclassical concept even when there's no vicious reference in the language. Without defining terms here, we note that one popular precisification of the three-valued approach is to take the correct interpretation of T to be that given by the ‘least fixed point’ of the ‘strong Kleene scheme’: putting aside details, this interpretation always assigns the truth value n to the sentence ∀x(Tx ∨ ¬Tx), even when the ground model allows no circular, let alone vicious, reference. Gupta and Belnap claim an advantage for the RTT: according to the revision-theoretic approach, they claim, truth always behaves like a classical concept when there is no vicious reference. Kremer 2010 challenges this claim by precisifying it as a formal claim against which particular revision theories (e.g. T* or T#, see Definition 3.11, above) and particular three-valued theories can be tested. As it turns out, on many three-valued theories, truth does in fact behave like a classical concept when there's no vicious reference: for example, the least fixed point of a natural variant of the supervaluation scheme always assigns T a classical interpretation in the absence of vicious reference. Granted, truth behaves like a classical concept when there's no vicious reference on Gupta and Belnap's theory T*; but, so Kremer argues, it does not on their theory T#. This discussion is taken up further by Wintein 2014. 5.2 Two values? A contrast presupposed by this entry is between allegedly two-valued theories, like the RTT, and allegedly three-valued or other many-valued rivals. One might think of the RTT itself as providing infinitely many semantic values, for example one value for every possible revision sequence. Or one could extract three semantic values for sentences: categorical truth, categorical falsehood, and uncategoricalness. In reply, it must be granted that the RTT generates many statuses available to sentences. Similarly, three-valued approaches also typically generate many statuses available to sentences. The claim of two-valuedness is not a claim about statuses available to sentences, but rather a claim about the truth values presupposed in the whole enterprise. 
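As a small computational gloss on the first complaint of Section 5.1 above: an acceptable three-valued interpretation must assign a liar sentence a value v that is a fixed point of the relevant negation, i.e., v = ¬v or v = ~v. The sketch below (our own toy encoding, not from the literature) checks the three values against the two truth tables given earlier; the weak negation ¬ has the fixed point n, while the exclusion negation ~ has none, which is why no acceptable interpretation exists once ~ is expressible:

```python
# Truth tables for the two negations of Section 5.1, as value maps.
weak_neg = {'t': 'f', 'f': 't', 'n': 'n'}   # the negation the approach can handle
excl_neg = {'t': 'f', 'f': 't', 'n': 't'}   # the problematic exclusion negation ~

# A liar built with negation neg needs a value v with neg(v) = v.
for name, neg in [('weak liar', weak_neg), ('strong liar', excl_neg)]:
    fixed_points = [v for v in 'tfn' if neg[v] == v]
    print(name, '->', fixed_points if fixed_points else 'no acceptable value')
```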
5.3 Amendments to the RTT We note three ways to amend the RTT. First, we might put constraints on which hypotheses are acceptable. For example, Gupta and Belnap 1993 introduce a theory, Tc, of truth based on consistent hypotheses: an hypothesis h is consistent iff the set {A : h(A) = t} is a complete consistent set of sentences. The relative merits of T*, T# and Tc are discussed in Gupta & Belnap 1993, Chapter 6. Second, we might adopt a more restrictive limit policy than Gupta and Belnap adopt. Recall the question asked in Section 3: How should we set Sη(d) when η is a limit ordinal? We gave a partial answer: any object that is stably true [false] up to that stage should be true [false] at that stage. We also noted that for an object d ∈ D that does not stabilize up to the stage η, Gupta and Belnap 1993 allow us to set Sη(d) as either t or f. In a similar context, Herzberger (1982a, 1982b) assigns the value f to the unstable objects. And Gupta originally suggested, in Gupta 1982, that unstable elements receive whatever value they received at the initial hypothesis S0. These first two ways of amending the RTT both, in effect, restrict the notion of a revision sequence, by putting constraints on which of our revision sequences really count as acceptable revision sequences. The constraints are, in some sense, local: the first constraint is achieved by putting restrictions on which hypotheses can be used, and the second constraint is achieved by putting restrictions on what happens at limit ordinals. A third option would be to put more global constraints on which putative revision sequences count as acceptable. Yaqūb 1993 suggests, in effect, a limit rule whereby acceptable verdicts on unstable sentences at some limit stage η depend on verdicts rendered at other limit stages. Yaqūb argues that these constraints allow us to avoid certain “artifacts”. For example, suppose that a ground model M = <D, I > has two independent liars, by having two names α and β, where I(α) = ¬Tα and I(β) = ¬Tβ. Yaqūb argues that it is a mere “artifact” of the revision semantics, naively presented, that there are revision sequences in which the sentence ¬Tα ≡ ¬Tβ is stably true, since the two liars are independent. His global constraints are developed to rule out such sequences. (See Chapuis 1996 for further discussion.) 5.4 Revision theory for circularly defined concepts As indicated in our discussion, in Section 4, of the ‘iff’ in the T-biconditionals, Gupta and Belnap present the RTT as a special case of a revision theory of circularly defined concepts. Let us reconsider the example from Section 4. Suppose that L is a language with a unary predicate F and a binary predicate R. Consider a new concept expressed by a predicate G, introduced through a definition, D, like this: Gx =df A(x,G), where A(x,G) is the formula ∀y(Ryx ⊃ Fy) ∨ ∃y(Ryx & Gy). In this context, a ground model is a classical model M = <D, I > of the language L: we start with a domain of discourse, D, and an interpretation of the predicate F and the relation symbol R. We would like to extend M to an interpretation of the language L + G. So, in this context, an hypothesis will be thought of as an hypothesized extension for the newly introduced concept G. Formally, a hypothesis is simply a function h : D → {t, f}. Given a hypothesis h, we take M+h to be the classical model M+h = <D, I′ >, where I′ interprets F and R in the same way as I, and where I′(G) = h. 
Given a hypothesized interpretation h of G, we generate a new interpretation of G as follows: an object d ∈ D is in the new extension of G just in case the defining formula A(x,G) is true of d in the model M+h. Formally, we use the ground model M and the definition D to define a revision rule, δD,M, mapping hypotheses to hypotheses, i.e., hypothetical interpretations of G to hypothetical interpretations of G. In particular, for any formula B with one free variable x, and d ∈ D, we can define the truth value ValM+h,d(B) in the standard way. Then, δD,M(h)(d) = ValM+h,d(A(x,G)). Given a revision rule δD,M, we can generalize the notion of a revision sequence, which is now a sequence of hypothetical extensions of G rather than T. We can generalize the notion of a sentence B being stably true, nearly stably true, etc., relative to a revision sequence. Gupta and Belnap introduce the systems S* and S#, analogous to T* and T#, as follows:[10] Definition 5.1 A sentence B is valid on the definition D in the ground model M in the system S* (notation M ⊨*,D B) iff B is stably true relative to each revision sequence for the revision rule δD,M. A sentence B is valid on the definition D in the ground model M in the system S# (notation M ⊨#,D B) iff B is nearly stably true relative to each revision sequence for the revision rule δD,M. A sentence B is valid on the definition D in the system S* (notation ⊨*,D B) iff for all classical ground models M, we have M ⊨*,D B. A sentence B is valid on the definition D in the system S# (notation ⊨#,D B) iff for all classical ground models M, we have M ⊨#,D B. One of Gupta and Belnap's principal open questions is whether there is a complete calculus for these systems: that is, whether, for each definition D, either of the following two sets of sentences is recursively axiomatizable: {B : ⊨*,D B} and {B : ⊨#,D B}. Kremer 1993 proves that the answer is no: he shows that there is a definition D such that each of these sets of sentences is of complexity at least Π¹₂, thereby putting a lower limit on the complexity of S* and S#. (Antonelli 1994a and 2002 shows that this is also an upper limit.) Kremer's proof exploits an intimate relationship between circular definitions understood revision-theoretically and circular definitions understood as inductive definitions: the theory of inductive definitions has been quite well understood for some time. In particular, Kremer proves that every inductively defined concept can be revision-theoretically defined. The expressive power and other aspects of the revision-theoretic treatment of circular definitions are the topic of much interesting work: see Welch 2001, Löwe 2001, Löwe and Welch 2001, and Kühnberger et al. 2005. 5.5 Axiomatic Theories of Truth and the Revision Theory The RTT is a clear example of a semantically motivated theory of truth. Quite a different tradition seeks to give a satisfying axiomatic theory of truth. Granted, we cannot retain all of classical logic and all of our intuitive principles regarding truth, especially if we allow vicious self-reference. But maybe we can arrive at satisfying axiom systems for truth that, for example, maintain consistency and classical logic, but give up only a little bit when it comes to our intuitive principles concerning truth, such as the T-biconditionals (interpreted classically); or maintain consistency and all of the T-biconditionals, but give up only a little bit of classical logic. 
Halbach 2011 comprehensively studies such axiomatic theories (mainly those that retain classical logic), and Horsten 2011 is in the same tradition. Both Chapter 14 of Halbach 2011 and Chapter 8 of Horsten 2011 study the relationship between the Friedman-Sheard theory FS and the revision semantics, with some interesting results. For more work on axiomatic systems and the RTT, see Horsten et al. 2012. Field 2008 makes an interesting contribution to axiomatic theorizing about truth, even though most of the positive work in the book consists of model building and is therefore semantics. In particular, Field is interested in producing a theory as close to classical logic as possible, which at the same time retains all T-biconditionals (the conditional itself will be nonclassical) and which at the same time can express, in some sense, the claim that such and such a sentence is defective. Field uses tools from multivalued logic, fixed-point semantics, and revision theory to build models showing, in effect, that a very attractive axiomatic system is consistent. Field’s construction is an intricate interplay between using fixed-point constructions for successively interpreting T, and revision sequences for successively interpreting the nonclassical conditional — the final interpretation being determined by a sort of super-revision-theoretic process. 5.6 Applications Given Gupta and Belnap's general revision-theoretic treatment of circular definitions (of which their treatment of truth is a special case), one would expect revision-theoretic ideas to be applied to other concepts. Antonelli 1994b applies these ideas to non-well-founded sets: a non-well-founded set X can be thought of as circular, since, for some X0, …, Xn we have X ∈ X0 ∈ … ∈ Xn ∈ X. Chapuis 2003 applies revision-theoretic ideas to rational decision making. Also, see Wang 2011 for a discussion of revision theory and abstract objects, and Asmus 2013 for a discussion of revision theory and vagueness. In the last decade, there has been increasing interest in bridging the gap between classic debates on the nature of truth — deflationism, the correspondence theory, minimalism, pragmatism, and so on — and formal work on truth, motivated by the liar's paradox. The RTT is tied to pro-sententialism by Belnap 2006; deflationism, by Yaqūb 2008; and minimalism, by Restall 2005. We must also mention Gupta 2006. In this work, Gupta argues that an experience provides the experiencer, not with a straightforward entitlement to a proposition, but rather with a hypothetical entitlement: as explicated in Berker 2011, if subject S has experience e and is entitled to hold view v (where S’s view is the totality of S’s concepts, conceptions, and beliefs), then S is entitled to believe a certain class of perceptual judgements, Γ(v). (Berker uses “propositions” instead of “perceptual judgements” in his formulation.) But this generates a problem: how is S entitled to hold a view? There seems to be a circular interdependence between entitlements to views and entitlements to perceptual judgements. Here, Gupta appeals to a general form of revision theory — generalizing beyond both the revision theory of truth and the revision theory of circularly defined concepts (Section 5.4, above) — to give an account of how “hypothetical perceptual entitlements could yield categorical entitlements” (Berker 2011).


The Pragmatic Theory of Truth


1. History of the Pragmatic Theory of Truth The history of the pragmatic theory of truth is tied to the history of classical American pragmatism. According to the standard account, C.S. Peirce gets credit for first proposing a pragmatic theory of truth, William James is responsible for popularizing the pragmatic theory, and John Dewey subsequently reframed truth in terms of warranted assertibility (for this reading of Dewey see Burgess & Burgess 2011: 4). More specifically, Peirce is associated with the idea that true beliefs are those that will withstand future scrutiny; James with the idea that true beliefs are dependable and useful; Dewey with the idea that truth is a property of well-verified claims (or “judgments”). 1.1 Peirce’s Pragmatic Theory of Truth The American philosopher, logician and scientist Charles Sanders Peirce (1839–1914) is generally recognized for first proposing a “pragmatic” theory of truth. Peirce’s pragmatic theory of truth is a byproduct of his pragmatic theory of meaning. In a frequently-quoted passage in “How to Make Our Ideas Clear” (1878), Peirce writes that, in order to pin down the meaning of a concept, we must: Consider what effects, which might conceivably have practical bearings, we conceive the object of our conception to have. Then, our conception of these effects is the whole of our conception of the object. (1878 [1986: 266]) The meaning of the concept of “truth” then boils down to the “practical bearings” of using this term: that is, of describing a belief as true. What, then, is the practical difference of describing a belief as “true” as opposed to any number of other positive attributes such as “creative”, “clever”, or “well-justified”? Peirce’s answer to this question is that true beliefs eventually gain general acceptance by withstanding future inquiry. (Inquiry, for Peirce, is the process that takes us from a state of doubt to a state of stable belief.) This gives us the pragmatic meaning of truth and leads Peirce to conclude, in another frequently-quoted passage, that: All the followers of science are fully persuaded that the processes of investigation, if only pushed far enough, will give one certain solution to every question to which they can be applied.…The opinion which is fated to be ultimately agreed to by all who investigate, is what we mean by the truth. (1878 [1986: 273]) Peirce realized that his reference to “fate” could be easily misinterpreted. In a less-frequently quoted footnote to this passage he writes that “fate” is not meant in a “superstitious” sense but rather as “that which is sure to come true, and can nohow be avoided” (1878 [1986: 273]). Over time Peirce moderated his position, referring less to fate and unanimous agreement and more to scientific investigation and general consensus (Misak 2004). The result is an account that views truth as what would be the result of scientific inquiry, if scientific inquiry were allowed to go on indefinitely. In 1901 Peirce writes that: Truth is that concordance of an abstract statement with the ideal limit towards which endless investigation would tend to bring scientific belief. (1901a [1935: 5.565]) Consequently, truth does not depend on actual unanimity or an actual end to inquiry: If Truth consists in satisfaction, it cannot be any actual satisfaction, but must be the satisfaction which would ultimately be found if the inquiry were pushed to its ultimate and indefeasible issue. 
(1908 [1935: 6.485], emphasis in original) As these references to inquiry and investigation make clear, Peirce’s concern is with how we come to have and hold the opinions we do. Some beliefs may in fact be very durable but would not stand up to inquiry and investigation (this is true of many cognitive biases, such as the Dunning-Kruger effect where people remain blissfully unaware of their own incompetence). For Peirce, a true belief is not simply one we will hold onto obstinately. Rather, a true belief is one that has held up, and will continue to hold up, to sustained inquiry. In the practical terms Peirce prefers, this means that to have a true belief is to have a belief that is dependable in the face of all future challenges. Moreover, to describe a belief as true is to point to this dependability, to signal the belief’s scientific bona fides, and to endorse it as a basis for action. By focusing on the practical dimension of having true beliefs, Peirce plays down the significance of more theoretical questions about the nature of truth. In particular, Peirce is skeptical that the correspondence theory of truth—roughly, the idea that true beliefs correspond to reality—has much of use to say about the concept of truth. The problem with the correspondence theory of truth, he argues, is that it is only “nominally” correct and hence “useless” (1906 [1998: 379, 380]) when it comes to describing truth’s practical value. In particular, the correspondence theory of truth sheds no light on what makes true beliefs valuable, the role of truth in the process of inquiry, or how best to go about discovering and defending true beliefs. For Peirce, the importance of truth rests not on a “transcendental” (1901a [1935: 5.572]) connection between beliefs on the one hand and reality on the other, but rather on the practical connection between doubt and belief, and the processes of inquiry that take us from the former to the latter: If by truth and falsity you mean something not definable in terms of doubt and belief in any way, then you are talking of entities of whose existence you can know nothing, and which Ockham’s razor would clean shave off. Your problems would be greatly simplified, if, instead of saying that you want to know the “Truth”, you were simply to say that you want to attain a state of belief unassailable by doubt. (1905 [1998: 336]) For Peirce, a true belief is one that is indefeasible and unassailable—and indefeasible and unassailable for all the right reasons: namely, because it will stand up to all further inquiry and investigation. In other words, if we were to reach a stage where we could no longer improve upon a belief, there is no point in withholding the title “true” from it. (Misak 2000: 101) 1.2 James’ Pragmatic Theory of Truth Peirce’s contemporary, the psychologist and philosopher William James (1842–1910), gets credit for popularizing the pragmatic theory of truth. In a series of popular lectures and articles, James offers an account of truth that, like Peirce’s, is grounded in the practical role played by the concept of truth. James, too, stresses that truth represents a kind of satisfaction: true beliefs are satisfying beliefs, in some sense. Unlike Peirce, however, James suggests that true beliefs can be satisfying short of being indefeasible and unassailable: short, that is, of how they would stand up to ongoing inquiry and investigation.
In the lectures published as Pragmatism: A New Name for Some Old Ways of Thinking (1907) James writes that: Ideas…become true just in so far as they help us get into satisfactory relation with other parts of our experience, to summarize them and get about among them by conceptual short-cuts instead of following the interminable succession of particular phenomena. (1907 [1975: 34]) True ideas, James suggests, are like tools: they make us more efficient by helping us do what needs to be done. James adds to the previous quote by making the connection between truth and utility explicit: Any idea upon which we can ride, so to speak; any idea that will carry us prosperously from any one part of our experience to any other part, linking things satisfactorily, working securely, simplifying, saving labor; is true for just so much, true in so far forth, true instrumentally. This is the ‘instrumental’ view of truth. (1907 [1975: 34]) While James, here, credits this view to John Dewey and F.C.S. Schiller, it is clearly a view he endorses as well. To understand truth, he argues, we must consider the pragmatic “cash-value” (1907 [1975: 97]) of having true beliefs and the practical difference of having true ideas. True beliefs, he suggests, are useful and dependable in ways that false beliefs are not: you can say of it then either that “it is useful because it is true” or that “it is true because it is useful”. Both these phrases mean exactly the same thing. (1907 [1975: 98]) Passages such as this have cemented James’ reputation for equating truth with mere utility (something along the lines of: “⟨p⟩ is true just in case it is useful to believe that p” [see Schmitt 1995: 78]). (James does offer the qualification “in the long run and on the whole of course” (1907 [1975: 106]) to indicate that truth is different from instant gratification, though he does not say how long the long run should be.) Such an account might be viewed as a watered-down version of Peirce’s account that substitutes “cash-value” or subjective satisfaction for indefeasibility and unassailability in the face of ongoing inquiry and investigation. Such an account might also be viewed as obviously wrong, given the undeniable existence of useless truths and useful falsehoods. In the early twentieth century Peirce’s writings were not yet widely available. As a result, the pragmatic theory of truth was frequently identified with James’ account—and, as we will see, many philosophers did view it as obviously wrong. James, in turn, accused his critics of willful misunderstanding: that because he wrote in an accessible, engaging style his critics “have boggled at every word they could boggle at, and refused to take the spirit rather than the letter of our discourse” (1909 [1975: 99]). However, it is also the case that James tends to overlook or intentionally blur—it is hard to say which—the distinction between (a) giving an account of true ideas and (b) giving an account of the concept of truth. This means that, while James’ theory might give a psychologically realistic account of why we care about the truth (true ideas help us get things done), his theory fails to shed much light on what exactly the concept of truth is or on what makes an idea true. And, in fact, James often seems to encourage this reading.
In the preface to The Meaning of Truth he doubles down by quoting many of his earlier claims and noting that “when the pragmatists speak of truth, they mean exclusively something about the ideas, namely their workableness” (1909 [1975: 6], emphasis added). James’ point seems to be this: from a practical standpoint, we use the concept of truth to signal our confidence in a particular idea or belief; a true belief is one that can be acted upon, that is dependable and that leads to predictable outcomes; any further speculation is a pointless distraction. What then about the concept of truth? It often seems that James understands the concept of truth in terms of verification: thus, “true is the name for whatever idea starts the verification-process, useful is the name for its completed function in experience” (1907 [1975: 98]). And, more generally: Truth for us is simply a collective name for verification-processes, just as health, wealth, strength, etc., are names for other processes connected with life, and also pursued because it pays to pursue them. (1907 [1975: 104]) James seems to claim that being verified is what makes an idea true, just as having a lot of money is what makes a person wealthy. To be true is to be verified: Truth happens to an idea. It becomes true, is made true by events. Its verity is in fact an event, a process: the process namely of its verifying itself, its veri-fication. Its validity is the process of its valid-ation. (1907 [1975: 97], emphasis in original) Like Peirce, James argues that a pragmatic account of truth is superior to a correspondence theory because it specifies, in concrete terms, what it means for an idea to correspond or “agree” with reality. For pragmatists, this agreement consists in being led “towards that reality and no other” in a way that yields “satisfaction as a result” (1909 [1975: 104]). By sometimes defining truth in terms of verification, and by unpacking the agreement of ideas and reality in pragmatic terms, James’ account attempts to both criticize and co-opt the correspondence theory of truth. It appears James wants to have his cake and eat it too. 1.3 Dewey’s Pragmatic Theory of Truth John Dewey (1859–1952), the third figure from the golden era of classical American pragmatism, had surprisingly little to say about the concept of truth, especially given his voluminous writings on other topics. On an anecdotal level, as many have observed, the index to his 527-page Logic: The Theory of Inquiry (1938 [2008]) has only one reference to “truth”, and that to a footnote mentioning Peirce. Otherwise the reader is advised to “See also assertibility”. At first glance, Dewey’s account of truth looks like a combination of Peirce and James. Like Peirce, Dewey emphasizes the connection between truth and rigorous scientific inquiry; like James, Dewey views truth as the verified result of past inquiry rather than as the anticipated result of inquiry proceeding into an indefinite future. For example, in 1911 he writes that: From the standpoint of scientific inquiry, truth indicates not just accepted beliefs, but beliefs accepted in virtue of a certain method.…To science, truth denotes verified beliefs, propositions that have emerged from a certain procedure of inquiry and testing.
By that I mean that if a scientific man were asked to point to samples of what he meant by truth, he would pick…beliefs which were the outcome of the best technique of inquiry available in some particular field; and he would do this no matter what his conception of the Nature of Truth. (1911 [2008: 28]) Furthermore, like both Peirce and James, Dewey charges correspondence theories of truth with being unnecessarily obscure because these theories depend on an abstract (and unverifiable) relationship between a proposition and how things “really are” (1911 [2008: 34]). Finally, Dewey also offers a pragmatic reinterpretation of the correspondence theory that operationalizes the idea of correspondence: Our definition of truth…uses correspondence as a mark of a meaning or proposition in exactly the same sense in which it is used everywhere else…as the parts of a machine correspond. (1911 [2008: 45]) Dewey has an expansive understanding of “science”. For Dewey, science emerges from and is continuous with everyday processes of trial and error—cooking and small-engine repair count as “scientific” on his account—which means he should not be taken too strictly when he equates truth with scientific verification. (Peirce and James also had expansive understandings of science.) Rather, Dewey’s point is that true propositions, when acted on, lead to the sort of predictable and dependable outcomes that are hallmarks of scientific verification, broadly construed. From a pragmatic standpoint, scientific verification boils down to the process of matching up expectations with outcomes, a process that gives us all the “correspondence” we could ask for. Dewey eventually came to believe that conventional philosophical terms such as “truth” and “knowledge” were burdened with so much baggage, and had become so fossilized, that it was difficult to grasp the practical role these terms had originally served. As a result, in his later writings Dewey largely avoids speaking of “truth” or “knowledge” while focusing instead on the functions played by these concepts. By his 1938 Logic: The Theory of Inquiry Dewey was speaking of “warranted assertibility” as the goal of inquiry, using this term in place of both “truth” and “knowledge” (1938 [2008: 15–16]). In 1941, in a response to Russell entitled “Propositions, Warranted Assertibility, and Truth”, he wrote that “warranted assertibility” is a “definition of the nature of knowledge in the honorific sense according to which only true beliefs are knowledge” (1941: 169). Here Dewey suggests that “warranted assertibility” is a better way of capturing the function of both knowledge and truth insofar as both are goals of inquiry. His point is that it makes little difference, pragmatically, whether we describe the goal of inquiry as “acquiring more knowledge”, “acquiring more truth”, or better yet, “making more warrantably assertible judgments”. Because it focuses on truth’s function as a goal of inquiry, Dewey’s pragmatic account of truth has some unconventional features. To begin with, Dewey reserves the term “true” only for claims that are the product of controlled inquiry. This means that claims are not true before they are verified but that, rather, it is the process of verification that makes them true: truth and falsity are properties only of that subject-matter which is the end, the close, of the inquiry by means of which it is reached. (1941: 176) Second, Dewey insists that only “judgments”—not “propositions”—are properly viewed as truth-bearers. 
For Dewey, “propositions” are the proposals and working hypotheses that are used, via a process of inquiry, to generate conclusions and verified judgments. As such, propositions may be more or less relevant to the inquiry at hand but they are not, strictly speaking, true or false (1941: 176). Rather, truth and falsity are reserved for “judgments” or “the settled outcome of inquiry” (1941: 175; 1938 [2008: 124]; Burke 1994): for claims, in other words, that are warrantedly assertible. Third, Dewey continues to argue that this pragmatic approach to truth is “the only one entitled to be called a correspondence theory of truth” (1941: 179), using terms nearly identical to those he used in 1911: My own view takes correspondence in the operational sense…of answering, as a key answers to conditions imposed by a lock, or as two correspondents “answer” each other; or, in general, as a reply is an adequate answer to a question or criticism; as, in short, a solution answers the requirements of a problem. (1941: 178) Thanks to Russell (e.g., 1941: Ch. XXIII) and others, by 1941 Dewey was aware of the problems facing pragmatic accounts of truth. In response, we see him turning to the language of “warranted assertibility”, drawing a distinction between “propositions” and “judgments”, and grounding the concept of truth (or warranted assertibility) in scientific inquiry (Thayer 1947; Burke 1994). These adjustments were designed to extend, clarify, and improve on Peirce’s and James’ accounts. Whether they did so is an open question. Certainly many, such as Quine, concluded that Dewey was only sidestepping important questions about truth: that Dewey’s strategy was “simply to avoid the truth predicate and limp along with warranted belief” (Quine 2008: 165). Peirce, James, and Dewey were not the only ones to propose or defend a pragmatic theory of truth in the nineteenth and early twentieth centuries. Others, such as F.C.S. Schiller (1864–1937), also put forward pragmatic theories (though Schiller’s view, which he called “humanism”, also attracted more than its share of critics, arguably for very good reasons). Pragmatic theories of truth also received the attention of prominent critics, including Russell (1909, 1910 [1994]), Moore (1908), and Lovejoy (1908a,b), among others. Several of these criticisms will be considered later; suffice it to say that pragmatic theories of truth soon came under pressure that led to revisions and several successor approaches over the next hundred-plus years. Historically, Peirce, James, and Dewey had the greatest influence in setting the parameters for what makes a theory of truth pragmatic—this despite the sometimes significant differences between their respective accounts, and despite the fact that, over time, they modified and clarified their positions in response to both criticism and over-enthusiastic praise. While this can make it difficult to pin down a single definition of what, historically, counted as a pragmatic theory of truth, there are some common themes that cut across each of their accounts. First, each account begins from a pragmatic analysis of the meaning of the truth predicate. On the assumption that describing a belief, claim, or judgment as “true” must make some kind of practical difference, each of these accounts attempts to describe what this difference is. Second, each account then connects truth specifically to processes of inquiry: to describe a claim as true is to say that it either has or will stand up to scrutiny.
Third, each account rejects correspondence theories of truth as overly abstract, “transcendental”, or metaphysical. Or, more accurately, each attempts to redefine correspondence in pragmatic terms, as the agreement between a claim and a predicted outcome. While the exact accounts offered by Peirce, James, and Dewey found few defenders—by the mid-twentieth century pragmatic theories of truth were largely dormant—these themes did set a trajectory for future versions of the pragmatic theory of truth. 2. Neo-Pragmatic Theories of Truth Pragmatic theories of truth enjoyed a resurgence in the last decades of the twentieth century. This resurgence was especially visible in debates between Hilary Putnam (1926–2016) and Richard Rorty (1931–2007), though broadly pragmatic ideas were defended by other philosophers as well (Bacon 2012: Ch. 4). (One example is Crispin Wright’s superassertibility theory (1992, 2001), which he claims is “as well equipped to express the aspiration for a developed pragmatist conception of truth as any other candidate” (2001: 781), though he does not accept the pragmatist label.) While these “neo-pragmatic” theories of truth sometimes resembled the classical pragmatic accounts of Peirce, James, or Dewey, they also differed significantly, often by framing the concept of truth in explicitly epistemic terms such as assertibility or by drawing on intervening developments in the field. At the outset, neo-pragmatism was motivated by a renewed dissatisfaction with correspondence theories of truth and the metaphysical frameworks supporting them. Some neo-pragmatic theories of truth grew out of a rejection of metaphysical realism (e.g., Putnam 1981; for background see Khlentzos 2016). If metaphysical realism cannot be supported then this undermines a necessary condition for the correspondence theory of truth: namely, that there be a mind-independent reality to which propositions correspond. Other neo-pragmatic approaches emerged from a rejection of representationalism: if knowledge is not the mind representing objective reality—if we cannot make clear sense of how the mind could be a “mirror of nature” to use Rorty’s (1979) term—then we are also well-advised to give up thinking of truth in realist, correspondence terms. Despite these similar starting points, neo-pragmatic theories took several different and evolving forms over the final decades of the twentieth century. At one extreme, some neo-pragmatic theories of truth seemed to endorse relativism about truth (whether and in what sense they did remains a point of contention). This view was closely associated with influential work by Richard Rorty (1982, 1991a,b). The rejection of representationalism and the correspondence theory of truth led to the conclusion that inquiry is best viewed as aiming at agreement or “solidarity”, not knowledge or truth as these terms are traditionally understood. This had the radical consequence of suggesting that truth is no more than “what our peers will, ceteris paribus, let us get away with saying” (Rorty 1979: 176; Rorty [2010a: 45] admits this phrase is provocative) or just “an expression of commendation” (Rorty 1991a: 23). Not surprisingly, many found this position deeply problematic since it appears to relativize truth to whatever one’s audience will accept (Baghramian 2004: 147). A related concern is that this position also seems to conflate truth with justification, suggesting that if a claim meets contextual standards of acceptability then it also counts as true (Gutting 2003).
Rorty, for one, often admitted as much, noting that he tended to “swing back and forth between trying to reduce truth to justification and propounding some form of minimalism about truth” (1998: 21). A possible response to the accusation of relativism is to claim that this neo-pragmatic approach does not aim to be a full-fledged theory of truth. Perhaps truth is actually a rather lightweight concept and does not need the heavy metaphysical lifting implied by putting forward a “theory”. If the goal is not to describe what truth is but rather to describe how “truth” is used, then these uses are fairly straightforward: among other things, to make generalizations (“everything you said is true”), to commend (“so true!”), and to caution (“what you said is justified, but it might not be true”) (Rorty 1998: 22; 2000: 4). None of these uses requires that we embark on a possibly fruitless hunt for the conditions that make a proposition true, or for a proper definition or theory of truth. If truth is “indefinable” (Rorty 2010b: 391) then this account cannot be a definition or theory of truth, relativist or otherwise. This approach differs in some noteworthy ways from earlier pragmatic accounts of truth. For one thing it is able to draw on, and draw parallels with, a range of well-developed non-correspondence theories of truth that begin (and sometimes end) by stressing the fundamental equivalence of “S is p” and “‘S is p’ is true”. These theories, including disquotationalism, deflationism, and minimalism, simply were not available to earlier pragmatists (though Peirce does at times discuss the underlying notions). Furthermore, while Peirce and Dewey, for example, were proponents of scientific inquiry and scientific processes of verification, on this neo-pragmatic approach science is no more objective or rational than other disciplines: as Rorty put it, “the only sense in which science is exemplary is that it is a model of human solidarity” (1991b: 39). Finally, on this approach Peirce, James, and Dewey simply did not go far enough: they failed to recognize the radical implications of their accounts of truth, or else failed to convey these implications adequately. In turn, much of the critical response to this kind of neo-pragmatism is that it goes too far by treating truth merely as a sign of commendation (plus a few other functions). In other words, this type of neo-pragmatism goes to unpragmatic extremes (e.g., Haack 1998; also the exchange in Rorty & Price 2010). A less extreme version of neo-pragmatism attempts to preserve truth’s objectivity and independence while still rejecting metaphysical realism. This version was most closely associated with Hilary Putnam, though Putnam’s views changed over time (see Hildebrand 2003 for an overview of Putnam’s evolution). While this approach frames truth in epistemic terms—primarily in terms of justification and verification—it amplifies these terms to ensure that truth is more than mere consensus. For example, this approach might identify “being true with being warrantedly assertible under ideal conditions” (Putnam 2012b: 220). More specifically, it might demand “that truth is independent of justification here and now, but not independent of all justification” (Putnam 1981: 56). Rather than play up assertibility before one’s peers or contemporaries, this neo-pragmatic approach frames truth in terms of ideal warranted assertibility: namely, warranted assertibility in the long run and before all audiences, or at least before all well-informed audiences.
Not only does this sound much less relativist but it also bears a strong resemblance to Peirce’s and Dewey’s accounts (though Putnam, for one, resisted the comparison: “my admiration for the classical pragmatists does not extend to any of the different theories of truth that Peirce, James, and Dewey advanced” [2012c: 70]). To repeat, this neo-pragmatic approach is designed to avoid the problems facing correspondence theories of truth while still preserving truth’s objectivity. In the 1980s this view was associated with Putnam’s broader program of “internal realism”: the idea that “what objects does the world consist of? is a question that it only makes sense to ask within a theory or description” (Putnam 1981: 49, emphasis in original). Internal realism was designed as an alternative to metaphysical realism that dispensed with achieving an external “God’s Eye Point of View” while still preserving truth’s objectivity, albeit internal to a given theory. (For additional criticisms of metaphysical realism see Khlentzos 2016.) In the mid-1990s Putnam’s views shifted toward what he called “natural realism” (1999; for a critical discussion of Putnam’s changing views see Wright 2000). This shift came about in part because of problems with defining truth in epistemic terms such as ideal warranted assertibility. One problem is that it is difficult to see how one can verify either what these ideal conditions are or whether they have been met: one might attempt to do so by taking an external “god’s eye view”, which would be inconsistent with internal realism, or one might come to this determination from within one’s current theory, which would be circular and relativistic. (As Putnam put it, “to talk of epistemically ‘ideal’ connections must either be understood outside the framework of internal realism or it too must be understood in a solipsistic manner” (2012d: 79–80).) Since neither option seems promising, this does not bode well for internal realism or for any account of truth closely associated with it. If internal realism cannot be sustained then a possible fallback position is “natural realism”—the view “that the objects of (normal ‘veridical’) perception are ‘external’ things, and, more generally, aspects of ‘external’ reality” (Putnam 1999: 10)—which leads to a reconciliation of sorts with the correspondence theory of truth. A natural realism suggests “that true empirical statements correspond to states of affairs that actually obtain” (Putnam 2012a: 97), though this does not commit one to a correspondence theory of truth across the board. Natural realism leaves open the possibility that not all true statements “correspond” to a state of affairs, and even those that do (such as empirical statements) do not always correspond in the same way (Putnam 2012c: 68–69; 2012a: 98). While not a ringing endorsement of the correspondence theory of truth, at least as traditionally understood, this neo-pragmatic approach is not a flat-out rejection either. Viewing truth in terms of ideal warranted assertibility has obvious pragmatic overtones of Peirce and Dewey. Viewing truth in terms of a commitment to natural realism is not so clearly pragmatic though some parallels still exist.
Because natural realism allows for different types of truth-conditions—some but not all statements are true in virtue of correspondence—it is compatible with the truth-aptness of normative discourse: just because ethical statements, for example, do not correspond in an obvious way to ethical states of affairs is no reason to deny that they can be true (Putnam 2002). In addition, like earlier pragmatic theories of truth, this neo-pragmatic approach redefines correspondence: in this case, by taking a pluralist approach to the correspondence relation itself (Goodman 2013). These two approaches—one tending toward relativism, the other tending toward realism—represented the two main currents in late twentieth century neo-pragmatism. Both approaches, at least initially, framed truth in terms of justification, verification, or assertibility, reflecting a debt to the earlier accounts of Peirce, James, and Dewey. Subsequently they evolved in opposite directions. The first approach, often associated with Rorty, flirts with relativism and implies that truth is not the important philosophical concept it has long been taken to be. Here, to take a neo-pragmatic stance toward truth is to recognize the relatively mundane functions this concept plays: to generalize, to commend, to caution, and not much else. To ask for more, to ask for something “beyond the here and now”, only commits us to “the banal thought that we might be wrong” (Rorty 2010a: 45). The second neo-pragmatic approach, generally associated with Putnam, attempts to preserve truth’s objectivity and the important role it plays across scientific, mathematical, ethical, and political discourse. This could mean simply “that truth is independent of justification here and now” or “that to call a statement of any kind…true is to say that it has the sort of correctness appropriate to the kind of statement it is” (2012a: 97–98). On this account truth points to standards of correctness more rigorous than simply what our peers will let us get away with saying. 3. Truth as a Norm of Inquiry and Assertion More recently—since roughly the turn of the twenty-first century—pragmatic theories of truth have focused on truth’s role as a norm of assertion or inquiry. These theories are sometimes referred to as “new pragmatic” theories to distinguish them from both classical and neo-pragmatic accounts (Misak 2007b; Hookway 2016). Like neo-pragmatic accounts, these theories often build on, or react to, positions besides the correspondence theory: for example, deflationary, minimal, and pluralistic theories of truth. Unlike some of the neo-pragmatic accounts discussed above, these theories give relativism a wide berth, avoid defining truth in terms of concepts such as warranted assertibility, and treat correspondence theories of truth with deep suspicion. On these accounts truth plays a unique and necessary role in assertoric discourse (Price 1998, 2003, 2011; Misak 2000, 2007a, 2015): without the concept of truth there would be no difference between making assertions and, to use Frank Ramsey’s nice phrase, “comparing notes” (1925 [1990: 247]). Instead, truth provides the “convenient friction” that “makes our individual opinions engage with one another” (Price 2003: 169) and “is internally related to inquiry, reasons, and evidence” (Misak 2000: 73). Like all pragmatic theories of truth, these “new” pragmatic accounts focus on the use and function of truth.
However, while classical pragmatists were responding primarily to the correspondence theory of truth, new pragmatic theories also respond to contemporary disquotational, deflationary, and minimal theories of truth (Misak 1998, 2007a). As a result, new pragmatic accounts aim to show that there is more to truth than its disquotational and generalizing function (for a dissenting view see Freedman 2006). Specifically, this “more” is that the concept of truth also functions as a norm that places clear expectations on speakers and their assertions. In asserting something to be true, speakers take on an obligation to specify the consequences of their assertion, to consider how their assertions can be verified, and to offer reasons in support of their claims: once we see that truth and assertion are intimately connected—once we see that to assert that p is true is to assert p—we can and must look to our practices of assertion and to the commitments incurred in them so as to say something more substantial about truth. (Misak 2007a: 70) Truth is not just a goal of inquiry, as Dewey claimed, but actually a norm of inquiry that sets expectations for how inquirers conduct themselves. More specifically, without the norm of truth assertoric discourse would be degraded almost beyond recognition. Without the norm of truth, speakers could be held accountable only for insincerely asserting things they don’t themselves believe (thus violating the norm of “subjective assertibility”) or for asserting things they don’t have enough evidence for (thus violating the norm of “personal warranted assertibility”) (Price 2003: 173–174). The norm of truth is a condition for genuine disagreement between people who speak sincerely and with, from their own perspective, good enough reasons. It provides the “friction” we need to treat disagreements as genuinely needing resolution: otherwise, “differences of opinion would simply slide past one another” (Price 2003: 180–181). In sum, the concept of truth plays an essential role in making assertoric discourse possible, ensuring that assertions come with obligations and that conflicting assertions get attention. Without truth, it is no longer clear to what degree assertions would still be assertions, as opposed to impromptu speculations or musings. (Correspondence theories should find little reason to object: they too can recognize that truth functions as a norm. Of course, correspondence theorists will want to add that truth also requires correspondence to reality, a step “new” pragmatists will resist taking.) It is important that this account of truth is not a definition or theory of truth, at least in the narrow sense of specifying necessary and sufficient conditions for a proposition being true. (That is, there is no proposal along the lines of “S is true iff…”; though see Brown (2015: 69) for a Deweyan definition of truth and Heney (2015) for a Peircean response.) As opposed to some versions of neo-pragmatism, which viewed truth as “indefinable” in part because of its supposed simplicity and transparency, this approach avoids definitions because the concept of truth is implicated in a complex range of assertoric practices. Instead, this approach offers something closer to a “pragmatic elucidation” of truth that gives “an account of the role the concept plays in practical endeavors” (Misak 2007a: 68; see also Wiggins 2002: 317). The proposal to treat truth as a norm of inquiry and assertion can be traced back to both classical and neo-pragmatist accounts.
In one respect, this account can be viewed as adding on to neo-pragmatic theories that reduce truth to justification or “personal warranted assertibility”. In this respect, these newer pragmatic accounts are a response to the problems facing neo-pragmatism. In another respect, new pragmatic accounts can be seen as a return to the insights of classical pragmatists updated for a contemporary audience. For example, while Peirce wrote of beliefs being “fated” to be agreed upon at the “ideal limit” of inquiry—conditions that to critics sounded metaphysical and unverifiable—a better approach is to treat true beliefs as those “that would withstand doubt, were we to inquire as far as we fruitfully could on the matter” (Misak 2000: 49). On this account, to say that a belief is true is shorthand for saying that it “gets things right” and “stands up and would continue to stand up to reasons and evidence” (Misak 2015: 263, 265). This pragmatic elucidation of the concept of truth attempts to capture both what speakers say and what they do when they describe a claim as true. In a narrow sense the meaning of truth—what speakers are saying when they use this word—is that true beliefs are indefeasible. However, in a broader sense the meaning of truth is also what speakers are doing when they use this word, with the proposal here that truth functions as a norm that is constitutive of assertoric discourse. As we have seen, pragmatic accounts of truth focus on the function the concept plays: specifically, the practical difference made by having and using the concept of truth. Early pragmatic accounts tended to analyze this function in terms of the practical implications of labeling a belief as true: depending on the version, to say that a belief is true is to signal one’s confidence, or that the belief is widely accepted, or that it has been scientifically verified, or that it would be assertible under ideal circumstances, among other possible implications. These earlier accounts focus on the function of truth in conversational contexts or in the context of ongoing inquiries. The newer pragmatic theories discussed in this section take a broader approach to truth’s function, addressing its role not just in conversations and inquiries but in making certain kinds of conversations and inquiries possible in the first place. By viewing truth as a norm of assertion and inquiry, these more recent pragmatic theories make the function of truth independent of what individual speakers might imply in specific contexts. Truth is not just what is assertible or verifiable (under either ideal or non-ideal circumstances), but sets objective expectations for making assertions and engaging in inquiry. Unlike neo-pragmatists such as Rorty and Putnam, new pragmatists such as Misak and Price argue that truth plays a role entirely distinct from justification or warranted assertibility. This means that, without the concept of truth and the norm it represents, assertoric discourse (and inquiry in general) would dwindle into mere “comparing notes”. 4. Common Features Pragmatic theories of truth have evolved to the point where a variety of different approaches are described as “pragmatic”. These theories often disagree significantly with each other, making it difficult either to define pragmatic theories of truth in a simple and straightforward manner or to specify the necessary conditions that a pragmatic theory of truth must meet.
As a result, one way to clarify what makes a theory of truth pragmatic is to say something about what pragmatic theories of truth are not. Given that pragmatic theories of truth have often been put forward in contrast to prevailing correspondence and other “substantive” theories of truth (Wyatt & Lynch 2016), this suggests a common commitment shared by the pragmatic theories described above. One way to differentiate pragmatic accounts from other theories of truth is to distinguish the several questions that have historically guided discussions of truth. While some have used decision trees to categorize different theories of truth (Lynch 2001a; Künne 2003), or have proposed family trees showing relations of influence and affinity (Haack 1978), another approach is to distinguish separate “projects” that examine different dimensions of the concept of truth (Kirkham 1992). (These projects also break into distinct subprojects; for a similar approach see Frapolli 1996.) On this last approach the first, “metaphysical”, project aims to identify the necessary and sufficient conditions for “what it is for a statement…to be true” (Kirkham 1992: 20; Wyatt & Lynch call this the “essence project” [2016: 324]). This project often takes the form of identifying what makes a statement true: e.g., correspondence to reality, or coherence with other beliefs, or the existence of a particular state of affairs. A second, “justification”, project attempts to specify “some characteristic, possessed by most true statements…by reference to which the probable truth or falsity of the statement can be judged” (Kirkham 1992: 20). This often takes the form of giving a criterion of truth that can be used to determine whether a given statement is true. Finally, the “speech-act” project addresses the question of “what are we doing when we make utterances” that “ascribe truth to some statement?” (Kirkham 1992: 28). Unfortunately, truth-theorists have not always been clear on which project they are pursuing, which can lead to confusion about what counts as a successful or complete theory of truth. It can also lead to truth-theorists talking past each other when they are pursuing distinct projects with different standards and criteria of success. In these terms, pragmatic theories of truth are best viewed as pursuing the speech-act and justification projects. As noted above, pragmatic accounts of truth have often focused on how the concept of truth is used and what speakers are doing when describing statements as true: depending on the version, speakers may be commending a statement, signaling its scientific reliability, or committing themselves to giving reasons in its support. Likewise, pragmatic theories often focus on the criteria by which truth can be judged: again, depending on the version, this may involve linking truth to verifiability, assertibility, usefulness, or long-term durability. With regard to the speech-act and justification projects pragmatic theories of truth seem to be on solid ground, offering plausible proposals for addressing these projects. They are on much less solid ground when viewed as addressing the metaphysical project. As we will see, it is difficult to defend the idea, for example, that utility, verifiability, or widespread acceptance is a necessary and sufficient condition for truth, or is what makes a statement true. This would suggest that the opposition between pragmatic and correspondence theories of truth is partly a result of their pursuing different projects.
From a pragmatic perspective, the problem with the correspondence theory is its pursuit of the metaphysical project that, as its name suggests, invites metaphysical speculation about the conditions which make sentences true—speculation that can distract from more central questions of how the truth predicate is used and how true beliefs are best recognized and acquired. (Pragmatic theories of truth are not alone in raising these concerns (David 2016).) From the standpoint of correspondence theories and other accounts that pursue the metaphysical project, pragmatic theories will likely seem incomplete, sidestepping the most important questions (Howat 2014). But from the standpoint of pragmatic theories, projects that pursue or prioritize the metaphysical project are deeply misguided and misleading. This supports the following truism: a common feature of pragmatic theories of truth is that they focus on the practical function that the concept of truth plays. Thus, whether truth functions as a norm of inquiry (Misak), a way of signaling widespread acceptance (Rorty), a sign of future dependability (Peirce), or a designation for the product of a process of inquiry (Dewey), among other things, pragmatic theories shed light on the concept of truth by examining the practices through which solutions to problems are framed, tested, asserted, and defended—and, ultimately, come to be called true. Pragmatic theories of truth can thus be viewed as making contributions to the speech-act and justification projects by focusing especially on the practices people engage in when they solve problems, make assertions, and conduct scientific inquiry. Of course, even though pragmatic theories of truth largely agree on which questions to address and in what order, this does not mean that they agree on the answers to these questions, or on how to best formulate the meaning and function of truth. Another common commitment of pragmatic theories of truth—besides prioritizing the speech-act and justification projects—is that they do not restrict truth to certain topics or types of inquiry. That is, regardless of whether the topic is descriptive or normative, scientific or ethical, pragmatists tend to view it as an opportunity for genuine inquiry that incorporates truth-apt assertions. The truth-aptness of ethical and normative statements is a notable feature across a range of pragmatic approaches, including Peirce’s (at least in some of his moods, e.g., 1901b [1958: 8.158]), Dewey’s theory of valuation (1939), Putnam’s questioning of the fact-value dichotomy (2002), and Misak’s claim that “moral beliefs must be in principle responsive to evidence and argument” (2000: 94; for a dissenting view see Frega 2013). This broadly cognitivist attitude—that normative statements are truth-apt—is related to how pragmatic theories of truth de-emphasize the metaphysical project. As a result, from a pragmatic standpoint one of the problems with the correspondence theory of truth is that it can undermine the truth-aptness of normative claims. If, as the correspondence theory proposes, a necessary condition for the truth of a normative claim is the existence of a normative fact to which it corresponds, and if the existence of normative facts is difficult to account for (normative facts seem ontologically distinct from garden-variety physical facts), then this does not bode well for the truth-aptness of normative claims or the point of posing, and inquiring into, normative questions (Lynch 2009).
If the correspondence theory of truth leads to skepticism about normative inquiry, then this is all the more reason, according to pragmatists, to sidestep the metaphysical project in favor of the speech-act and justification projects. As we have seen, pragmatic theories of truth take a variety of different forms. Despite these differences, and despite often being averse to being called a “theory”, pragmatic theories of truth do share some common features. To begin with, and unlike many theories of truth, these theories focus on the pragmatics of truth-talk: that is, they focus on how truth is used as an essential step toward an adequate understanding of the concept of truth (indeed, this comes close to being an oxymoron). More specifically, pragmatic theories look to how truth is used in epistemic contexts where people make assertions, conduct inquiries, solve problems, and act on their beliefs. By prioritizing the speech-act and justification projects, pragmatic theories of truth attempt to ground the concept of truth in epistemic practices as opposed to the abstract relations between truth-bearers (such as propositions or statements) and truth-makers (such as states of affairs) appealed to by correspondence theories (MacBride 2018). Pragmatic theories also recognize that truth can play a fundamental role in shaping inquiry and assertoric discourse—for example, by functioning as a norm of these practices—even when it is not explicitly mentioned. In this respect pragmatic theories are less austere than deflationary theories, which limit the use of truth to its generalizing and disquotational roles. And, finally, pragmatic theories of truth draw no limits, at least at the outset, to the types of statements, topics, and inquiries where truth may play a practical role. If it turns out that a given topic is not truth-apt, this is something that should be discovered as a characteristic of that subject matter, not something determined by having chosen one theory of truth or another (Capps 2017). 5. Critical Assessments Pragmatic theories of truth have faced several objections since first being proposed. Some of these objections can be rather narrow, challenging a specific pragmatic account but not pragmatic theories in general (this is the case with objections raised by other pragmatic accounts). This section will look at more general objections: either objections that are especially common and persistent, or objections that pose a challenge to the basic assumptions underlying pragmatic theories more broadly. 5.1 Three Classic Objections and Responses Some objections are as old as the pragmatic theory of truth itself. The following objections were raised in response to James’ account in particular. While James offered his own responses to many of these criticisms (see especially his 1909 [1975]), versions of these objections often apply to other and more recent pragmatic theories of truth (for further discussion see Haack 1976; Tiercelin 2014). One classic and influential line of criticism is that, if the pragmatic theory of truth equates truth with utility, this definition is (obviously!) refuted by the existence of useful but false beliefs, on the one hand, and by the existence of true but useless beliefs, on the other (Russell 1910 [1994] and Lovejoy 1908a,b).
In short, there seems to be a clear and obvious difference between describing a belief as true and describing it as useful: when we say that a belief is true, the thought we wish to convey is not the same thought as when we say that the belief furthers our purposes; thus “true” does not mean “furthering our purposes”. (Russell 1910 [1994: 98]) While this criticism is often aimed especially at James’ account of truth, it plausibly carries over to any pragmatic theory. So whether truth is defined in terms of utility, long-term durability or assertibility (etc.), it is still an open question whether a useful or durable or assertible belief is, in fact, really true. In other words, whatever concept a pragmatic theory uses to define truth, there is likely to be a difference between that concept and the concept of truth (e.g., Bacon 2014 questions the connection between truth and indefeasibility). A second and related criticism builds on the first. Perhaps utility, long-term durability, and assertibility (etc.) should be viewed not as definitions but rather as criteria of truth, as yardsticks for distinguishing true beliefs from false ones. This seems initially plausible and might even serve as a reasonable response to the first objection above. Falling back on an earlier distinction, this would mean that appeals to utility, long-term durability, and assertibility (etc.) are best seen as answers to the justification and not the metaphysical project. However, without some account of what truth is, or what the necessary and sufficient conditions for truth are, any attempt to offer criteria of truth is arguably incomplete: we cannot have criteria of truth without first knowing what truth is. If so, then the justification project relies on and presupposes a successful resolution to the metaphysical project, the latter cannot be sidestepped or bracketed, and any theory which attempts to do so will give at best a partial account of truth (Creighton 1908; Stebbing 1914). And a third objection builds on the second. Putting aside the question of whether pragmatic theories of truth adequately address the metaphysical project (or address it at all), there is also a problem with the criteria of truth they propose for addressing the justification project. Pragmatic theories of truth seem committed, in part, to bringing the concept of truth down to earth, to explaining truth in concrete, easily confirmable, terms rather than the abstract, metaphysical correspondence of propositions to truth-makers, for example. The problem is that assessing the usefulness (etc.) of a belief is no more clear-cut than assessing its truth: beliefs may be more or less useful, useful in different ways and for different purposes, or useful in the short- or long-run. Determining whether a belief is really useful is no easier, apparently, than determining whether it is really true: “it is so often harder to determine whether a belief is useful than whether it is true” (Russell 1910 [1994: 121]; also 1946: 817). Far from making the concept of truth more concrete, and the assessment of beliefs more straightforward, pragmatic theories of truth thus seem to leave the concept as opaque as ever. These three objections have been around long enough that pragmatists have, at various times, proposed a variety of responses. One response to the first objection, that there is a clear difference between utility (etc.) and truth, is to deny that pragmatic approaches are aiming to define the concept of truth in the first place. 
It has been argued that pragmatic theories are not about finding a word or concept that can substitute for truth but that they are, rather, focused on tracing the implications of using this concept in practical contexts. This is what Misak (2000, 2007a) calls a “pragmatic elucidation”. Noting that it is “pointless” to offer a definition of truth, she concludes that “we ought to attempt to get leverage on the concept, or a fix on it, by exploring its connections with practice” (2007a: 69; see also Wiggins 2002). It is even possible that James—the main target of Russell and others—would agree with this response. As with Peirce, it often seems that James’ complaint is not so much with the correspondence theory of truth per se as with the assumption that the correspondence theory, by itself, says much that is interesting or important about the concept of truth. (For charitable interpretations of what James was attempting to say see Ayer 1968, Chisholm 1992, Bybee 1984, Cormier 2001, 2011, and Perkins 1952; for a reading that emphasizes Peirce’s commitment to correspondence idioms see Atkins 2010.) This still leaves the second objection: that the metaphysical project of defining truth cannot be avoided by focusing instead on finding the criteria for truth (the “justification project”). To be sure, pragmatic theories of truth have often been framed as providing criteria for distinguishing true from false beliefs. The distinction between offering a definition as opposed to offering criteria would suggest that criteria are separate from, and largely inferior to, a definition of truth. However, one might question the underlying distinction: as Haack (1976) argues, the pragmatists’ view of meaning is such that a dichotomy between definitions and criteria would have been entirely unacceptable to them. (1976: 236) If meaning is related to use (as pragmatists generally claim) then explaining how a concept is used, and specifying criteria for recognizing that concept, may provide all one can reasonably expect from a theory of truth. Deflationists have often made a similar point though, as noted above, pragmatists tend to find deflationary accounts excessively austere. Even so, there is still the issue that pragmatic criteria of truth (whatever they are) do not provide useful insight into the concept of truth. If this concern is valid, then pragmatic criteria, ironically, fail the pragmatic test of making a difference to our understanding of truth. This objection has some merit: for example, if a pragmatic criterion of truth is that true beliefs will stand up to indefinite inquiry, then, while it is possible to have true beliefs, “we are never in a position to judge whether a belief is true or not” (Misak 2000: 57). In that case it is not clear what good it serves to have a pragmatic criterion of truth. Pragmatic theories of truth might try to sidestep this objection by stressing their commitment to both the justification and the speech-act project. While pragmatic approaches to the justification project spell out what truth means in conversational contexts—to call a statement true is to cite its usefulness, durability, etc.—pragmatic approaches to the speech-act project point to what speakers do in using the concept of truth. This has the benefit of showing how the concept of truth—operating as a norm of assertion, say—makes a real difference to our understanding of the conditions on assertoric discourse. Pragmatic theories of truth are, as a result, wise to pursue both the justification and the speech-act projects.
By themselves, pragmatic approaches to the justification project are likely to disappoint. These classic objections to the pragmatic theory of truth raise several important points. For one thing, they make it clear that pragmatic theories of truth, or at least some historically prominent versions of the theory, do a poor job if viewed as providing a strict definition of truth. As Russell and others noted, defining truth in terms of utility or similar terms is open to obvious counter-examples. This does not bode well for pragmatic attempts to address the metaphysical project. As a result, pragmatic theories of truth have often evolved by focusing on the justification and speech-act projects instead. This is not to say that each of the above objections has been met. It is still an open question whether the metaphysical project can be avoided as many pragmatic theories attempt to do (e.g., Fox 2008 argues that epistemic accounts such as Putnam’s fail to explain the value of truth as well as more traditional approaches do). It is also an open question whether, as they evolve in response to these objections, pragmatic theories of truth invite new lines of criticism.


Pluralist Theories of Truth


1. Alethic pluralism about truth: a plurality of properties 1.1 Strength The pluralist’s thesis that there are many ways of being true is typically construed as being tantamount to the claim that the number of truth properties is greater than one. However, this basic interpretation is compatible with both moderate and more radical precisifications. According to moderate pluralism, at least one way of being true among the multitude of others is universally shared:

(2) there is more than one way of being true, at least one of which is had by all true sentences.

According to strong pluralism, however, there is no such universal or common way of being true:

(3) there is more than one way of being true, and no way of being true is had by all true sentences.

Precisifying pluralism about truth in these two ways brings several consequences to the fore. Firstly, both versions of pluralism conflict with strong monism about truth: the claim that there is exactly one way of being true, which all true sentences share. Secondly, moderate—but not strong—pluralism is compatible with a moderate version of monism about truth:

(5) at least one way of being true is had by all true sentences.

(2) and (5) are compatible because (5) does not rule out the possibility that the truth property had by all true sentences might be one among the multitude of truth properties endorsed by the moderate pluralist (i.e., by someone who endorses (2)). Only strong pluralism in (3) entails the denial of the claim that all true sentences are true in the same way. Thus, moderate pluralists and moderate monists can in principle find common ground. 1.2 Related kinds of pluralism and neighboring views Not infrequently, pluralism about truth fails to be distinguished from various other theses about associated conceptual, pragmatic, linguistic, semantic, and normative phenomena. Each of these other theses involves attributing plurality to a different aspect of the analysandum (explanandum, definiendum, etc.). For instance, linguistically, one may maintain that there is a plurality of truth predicates (Wright 1992; Tappolet 1997; Lynch 2000; Pedersen 2006, 2010). Semantically, one may maintain that alethic terms like ‘true’ have multiple meanings (Pratt 1908; Tarski 1944; Kölbel 2008, 2013; Wright 2010). Cognitively or conceptually, one may maintain that there is a multiplicity of truth concepts or regimented ways of conceptualizing truth (Künne 2003; cf. Lynch 2006). Normatively, one might think that truth has a plurality of profiles (Ferrari 2016, 2018). These parameters or dimensions suggest that pluralism is itself not just a single, monolithic theory (see also Sher 1998; Wright 2013). Any fully developed version of pluralism about truth is likely to make definitive commitments about at least some of these other phenomena. (However, it hardly entails them; one can consistently be an alethic pluralist about truth, for instance, without necessarily having commitments to linguistic pluralism about truth predicates, or about concepts like fact or actuality.) Nonetheless, theses about these other phenomena should be distinguished from pluralism about truth, as understood here. Likewise, pluralism about truth must be distinguished from several neighboring views, such as subjectivism, contextualism, relativism, or even nihilism about truth. For example, one can maintain some form of subjectivism about truth while remaining agnostic about how many ways of being true there are. Or again, one can consistently maintain that there is exactly one way of being true, which is always and everywhere dependent on context. Nor is it inconsistent to be both a pluralist and an absolutist or other anti-relativist about truth. For example, one might argue that each of the different ways of being true holds absolutely if it holds at all (Wright 1992).
Alternatively, one might explicate a compatibilist view, in which there are at least two kinds of truth, absolute and relative truth (Joachim 1905), or deflationist and substantivist (Kölbel 2013). Such views would be, necessarily, pluralistic. Occasionally, pluralists have also been lumped together with various groups of so-called ‘nihilists’, ‘deniers’, and ‘cynics’, and even associated with an ‘anything goes’ approach to truth (Williams 2002). However, any version of pluralism is prima facie inconsistent with any view that denies truth properties, such as nihilism and certain forms of nominalism. 1.3 Alethic pluralism, inflationism, and deflationism The foregoing varieties of pluralism are consistent with various further analyses of pluralists’ ideas about truth. For instance, pluralists may—but need not—hold that truth properties are simply one-place properties, since commitments to truth’s being monadic are orthogonal to commitments to its being monistic. However, most pluralists converge on the idea that truth is a substantive property and take this idea as the point of departure for articulating their view. A property is substantive just in case there is more to its nature than what is given in our concept of the property. A paradigmatic example of a substantive property is the property of being water. There is more to the nature of water—being composed of H\(_2\)O, e.g.—than what is revealed in our concept of water (the colourless, odourless liquid that comes out of taps, fills lakes, etc.). The issue of substantiveness connects with one of the major issues in the truth debate: the rift between deflationary theories of truth and their inflationary counterparts (Horwich 1990; Edwards 2013b; Künne 2003; Sher 2016b; Wyatt 2016; Wyatt & Lynch 2016). A common way to understand the divide between deflationists and inflationists is in terms of the question whether or not truth is a substantive property. Inflationists endorse this idea, while deflationists reject it. More specifically, deflationists and inflationists can be seen as disagreeing over the following claim:

(6) being true is a matter of having some further property, the nature of which is not transparent in the concept of truth.

The inflationist accepts (6). According to her, it is not transparent in the concept of truth that being true is a matter of possessing some further property (cohering, corresponding, etc.). This makes truth a substantive property. The deflationist, on the other hand, rejects (6) because she is committed to the idea that everything there is to know about truth is transparent in the concept—which, on the deflationist’s view, is exhausted by the disquotational schema (‘\(p\)’ is true if, and only if, \(p\)), or some principle like it. Deflationists also tend to reject a further claim about truth’s explanatory role:

(7) truth is a property that can play an explanatory role.

Inflationists, on the other hand, typically accept both (6) and (7). Strong and moderate versions of pluralism are perhaps best understood as versions of a non-traditional inflationary theory (for an exception, see Beall 2013; for refinements, see Edwards 2012b and Ferrari & Moruzzi forthcoming). Pluralists side with inflationists on (6) and (7), and so, their views count as inflationary. Yet, traditional inflationary theories are also predominantly monistic. They differ about which property \(F\)—coherence, identity, superwarrant, correspondence, etc.—truth consists in, but concur that there is precisely one such property:

(8) there is exactly one property \(F\) in which truth consists, whatever the subject matter of the truth-bearer.

The monistic supposition in (8) is tantamount to the claim that there is but one way of being true. In opposing that claim, pluralism counts as non-traditional.
2. Motivating pluralism: the scope problem Pluralists’ rejection of (8) typically begins by rendering it as a claim about the invariant nature of truth across all regions of discourse (Acton 1935; Wright 1992, 1996; Lynch 2000, 2001; for more on domains see Edwards 2018b; Kim & Pedersen 2018; Wyatt 2013; Yu 2017). Thus rendered, the claim appears to be at odds with the following observation: theories of truth differ markedly in their plausibility across different regions of discourse. For example, some theories—such as correspondence theories—seem intuitively plausible when applied to truths about ladders, ladles, and other ordinary objects. However, those theories seem much less convincing when applied to truths about comedy, fashion, ethical mores, numbers, jurisprudential dictates, etc. Conversely, theories that seem intuitively plausible when applied to legal, comic, or mathematical truths—such as those suggesting that the nature of truth is coherence—seem less convincing when applied to truths about the empirical world. Pluralists typically take traditional inflationary theories of truth to be correct in analyzing truth in terms of some substantive property \(F\). Yet, the problem with their monistic suppositions lies with generalization: a given property \(F\) might be necessary and sufficient for explaining why sentences about a certain subject matter are true, but no single property is necessary and sufficient for explaining why \(p\) is true for all sentences \(p\), whatever their subject matter. Subsequently, those theories’ inability to generalize their explanatory scope beyond the select few regions of discourse for which they are intuitively plausible casts doubt on their candidate for \(F\). This problem has gone by various names, but has come to be known as ‘the scope problem’ (Lynch 2004b, 2009; cf. Sher 1998). Pluralists respond to the scope problem by first rejecting (8) and replacing it with:

(10) truth consists in several properties \(F_1 , \ldots ,F_n\), different ones of which may be apt for different regions of discourse.

With (10), pluralists contend that the nature of truth is not a single property \(F\) that is invariant across all regions of discourse; rather the true sentences in different regions of discourse may consist in different properties among the plurality \(F_1 , \ldots ,F_n\) that constitute truth’s nature. The idea that truth is grounded in various properties \(F_1 , \ldots ,F_n\) might be further introduced by way of analogy. Consider water. We ordinarily think and talk about something’s being water as if it were just one thing—able to exist in different states, but nevertheless consisting in just one property (H\(_2\)O). But it would be a mistake to legislate in advance that we should be monists about water, since the nature of water is now known to vary more than our intuitions would initially have it. The isotopic distribution of water allows for different molecular structures, including hydroxonium (H\(_3\)O), deuterium oxide (D\(_2\)O), and so-called ‘semi-heavy water’ (HDO). Or again, consider sugar, the nature of which includes glucose, fructose, lactose, cellulose, and other such carbohydrates. For the pluralist, so too might truth be grounded in a plurality of more basic properties. One reason to take pluralism about truth seriously, then, is that it provides a solution to the scope problem. In rejecting the ‘one-size-fits-all’ approach to truth, pluralists formulate a theory whose generality is guaranteed by accommodating the various properties \(F_1 , \ldots ,F_n\) by which true sentences come to be true in different regions of discourse. A second and related reason is that the view promises to be explanatory.
Variance in the nature of truth in turn explains why theories of truth perform unequally across various regions of discourse—i.e., why they are descriptively adequate and appropriate in certain regions of discourse, but not others. For pluralists, the existence of different kinds of truths is symptomatic of the non-uniform nature of truth itself. Subsequently, taxonomical differences among truths might be better understood by formulating descriptive models about how the nature of truth might vary between those taxa. 3. Prominent versions of pluralism 3.1 Platitude-based strategies Many pluralists have followed Wright (1992) in supposing that compliance with platitudes is what regiments and characterizes the behavior and content of truth-predicates. Given a corollary account of how differences in truth predicates relate to differences among truth properties, this supposition suggests a platitude-based strategy for positing many ways of being true. Generally, a strategy will be platitude-based if it is intended to show that a certain collection of platitudes \(p_1 , \ldots ,p_n\) suffices for understanding the analysandum or explanandum. By ‘platitude’, philosophers generally mean certain uncontroversial expressions about a given topic or domain. Beyond that, conceptions about what more something must be or have to count as platitudinous vary widely. A well-known version of platitude-based pluralism is discourse pluralism. The simplest versions of this view make the following four claims. Firstly, discourse exhibits natural divisions, and so can be stably divided into different regions \(D_1 , \ldots ,D_n\). Secondly, the platitudes subserving some \(D_i\) may differ from those subserving \(D_j\). Thirdly, for any pair \((D_i, D_j)\), compliance with different platitudes subserving each region of discourse can, in principle, result in numerically distinct truth predicates \((t_i, t_j)\). Finally, numerically distinct truth predicates designate different ways of being true. Discourse pluralism is frequently associated with Crispin Wright (1992, 1996, 2001), although others have held similar views (see, e.g., Putnam 1994: 515). Wright has argued that discourse pluralism is supported by what he calls ‘minimalism’. According to minimalism, compliance with both the disquotational schema and the operator schema, as well as other ‘parent’ platitudes, is both necessary and sufficient for some term \(t_i\) to qualify as expressing a concept worth regarding as TRUTH (1992: 34–5). Wright proposed that the parent platitudes, which basically serve as very superficial formal or syntactic constraints, fall into two subclasses: those connecting truth with assertion (‘transparency’), and those connecting truth with logical operations (‘embedding’). Any such term complying with these parent platitudes, regardless of region of discourse, counts as what Wright called a ‘lightweight’ or ‘minimal’ truth predicate. Yet, the establishment of some \(t\) as a minimal truth predicate is compatible, argued Wright, with the nature of truth consisting in different things in different domains (2001: 752).
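For reference, the two parent schemas just mentioned can be stated in their familiar forms (the formulations below are the standard ones rather than Wright’s exact wording):

\[
\text{(Disquotation)}\quad \text{‘}p\text{’ is true if, and only if, } p
\]

\[
\text{(Operator)}\quad \text{it is true that } p \text{ if, and only if, } p
\]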
Wright (2001) has also suggested that lightweight truth predicates tend to comply with five additional subclasses of platitudes, including those connecting truth with reality (‘correspondence’) and eternity (‘stability’), and those disconnecting truth from epistemic state (‘opacity’), justification (‘contrast’), and scalar degree (‘absoluteness’). The idea is that \(t\) may satisfy additional platitudes beyond these, and in doing so may increase its ‘weight’. For example, some \(t_i\) may be a more heavyweight truth predicate than \(t_j\) in virtue of satisfying platitudes which entail that truth be evidence-transcendent or that there be mind-independent truth-makers. Finally, differences in what constitutes truth in \(D_1 , \ldots ,D_n\) are tracked by differences in the weight of these predicates. In this way, Wright is able to accommodate the intuition that sentences about, e.g., macromolecules in biochemistry are amenable to realist truth in a way that sentences about distributive welfare in ethics may not be. Distinctions among truth predicates, according to the discourse pluralist, are due to more and less subtle differences among platitudes and principles with which they must comply. For example, assuming that accuracy of reflection is a matter of degree, predicates for truth and truthlikeness diverge because a candidate predicate may comply with (18) or else with (26) or (27); to accommodate both, two corollary platitudes must be included to make explicit that accurate reflection in the case of truth is necessarily maximal and that degrees of accuracy are not equivalent to degrees of truth. Indeed, it is not unusual for platitudes to presuppose certain attendant semantic or metaphysical views. For example, (28) requires anti-nominalist commitments, an ontological commitment to propositions, and commitments to the expression relation (translation relations, an account of synonymy, etc.). Discourse pluralists requiring predicates to comply with (28) in order to count as truth-predicates must therefore be prepared to accommodate other claims that go along with (28) as a package-deal. ‘Functionalism about truth’ names the thesis that truth is a functional kind. The most comprehensive and systematic development of a platitude-based version of functionalism comes from Michael Lynch, who has been at the forefront of ushering in pluralist themes and theses (see Lynch 1998, 2000, 2001, 2004c, 2005a, 2005b, 2006, 2009, 2012, 2013; Devlin 2003). Lynch has urged that we need to think about truth in terms of the ‘job’ or role, \(F\), that true sentences stake out in our discursive practices (2005a: 29). Initially, Lynch’s brand of functionalism attempted to implicitly define the denotation of ‘truth’ using the quasi-formal technique of Ramsification. The technique commences by treating ‘true’ as the theoretical term \(\tau\) issued by the theory \(T\) and targeted for implicit definition. Firstly, the platitudes and principles of the theory are amassed \((T: p_1 , \ldots ,p_n)\) so that the \(F\)-role can be specified holistically. Secondly, a certain subset \(A\) of essential platitudes \((p_i , \ldots ,p_k)\) must be extracted from \(T\) and then conjoined. Thirdly, following David Lewis, \(T\) is rewritten as \(T(\tau_1 , \ldots ,\tau_n ;\, o_1 , \ldots ,o_m)\) so as to isolate the \(\tau\)-terms from the non-theoretical (‘old, original, other’) \(o\)-terms. Fourthly, all instances of ‘true’ and other cognate or closely related \(\tau\)-terms are then replaced by subscripted variables \(x_1 , \ldots ,x_n\).
The resulting open sentence is prefixed with existential quantifiers to bind them. Next, the Ramsey sentence is embedded in a material biconditional; this allows functionalists to then specify the conditions under which a given truth-apt sentence \(p\) has a property that plays the \(F\)-role:

\(p\) has a property that plays the \(F\)-role if, and only if, \(\exists x_1 \cdots \exists x_n [T(x_1 , \ldots ,x_n ;\, o_1 , \ldots ,o_m) \wedge p\ \text{has}\ x_1]\)

where, say, the variable \(x_1\) is the one that replaced ‘true’. Having specified the conditions under which \(p\) has some property realizing \(F\), functionalists can then derive another material biconditional stating that \(p\) is true iff \(p\) has some property realizing the \(F\)-role. However, as Lynch (2004: 394) cautioned, biconditionals that specify necessary and sufficient conditions for \(p\) to be true still leave open questions about the ‘deep’ metaphysical nature of truth. Thus, given the choice, Lynch—following up on a suggestion from Pettit (1996: 886)—urged functionalists to identify truth, not with the properties realizing the \(F\)-role in a given region of discourse, but with the \(F\)-role itself. Doing so is one way to try to secure the ‘unity’ of truth (on the presumption that there is just one \(F\)-role). Hence, to say that truth is a functional kind \(F\) is to say that the \(\tau\)-term ‘truth’ denotes the property of having a property that plays the \(F\)-role, where the \(F\)-role is tantamount to the single unique second-order property of being \(F\). Accordingly, this theory proposes that something is true just in case it is \(F\). Two consequences are apparent. Firstly, the functionalist’s commitment to alethic properties realizing the \(F\)-role seems to be a commitment to a grounding thesis. This explains why Lynch’s version of alethic functionalism fits the pattern typical of inflationary theories of truth, which are committed to (6) and (7) above. Secondly, however, like most traditional inflationary theories, Lynch’s functionalism about truth appears to be monistic. Indeed, the functionalist commitment to identifying truth with and only with the unique property of being \(F\) seems to entail a commitment to strong alethic monism rather than pluralism (Wright 2005). Nonetheless, it is clear that Lynch’s version does emphasize that sentences can have the property of being \(F\) in different ways. The theory thus does a great deal to accommodate the intuitions that initially motivate the pluralist thesis that there is more than one way of being true, and to walk a fine line between monism and pluralism. For pluralists, this compromise may not be good enough, and critics of functionalism about truth have raised several concerns. One stumbling block for functionalist theories is a worry about epistemic circularity. As Wright (2010) observes, any technique for implicit definition, such as Ramsification, proceeds on the basis of explicit decisions that the platitudes and principles constitutive of the modified Ramsey sentence are themselves true, and making explicit decisions that they are true requires already knowing in advance what truth is. Lynch (2013a) notes that the problem is not peculiar to functionalism about truth, generalizing to virtually all approaches that attempt to fix the denotation of ‘true’ by appeal to implicit definition. Some might want to claim that it generalizes even further, namely to any theory of truth whatsoever. Another issue is that the \(F\)-role becomes disunified to the extent that \(T\) can accommodate substantially different platitudes and principles.
Recall that the individuation and identity conditions of the \(F\)-role—with which truth is identified—are determined holistically by the platitudes and principles constituting \(T\). So where \(T\) is constituted by expressions of the beliefs and commitments of ordinary folk, pluralists could try to show that these beliefs and commitments significantly differ across epistemic communities (see, e.g., Næss 1938a, b; Maffie 2002; Ulatowski 2017; Wyatt 2018). In that case, Ramsification over significantly different principles may yield implicit definitions of numerically distinct role properties \(F_1, F_2 , \ldots ,F_n\), each of which is a warranted claimant to being truth. 3.2 Correspondence pluralism The correspondence theory is often invoked as exemplary of traditional monistic theories of truth, and thus as a salient rival to pluralism about truth. Prima facie, however, the two are consistent. The most fundamental principle of any version of the correspondence theory (namely, that a sentence is true if, and only if, it corresponds to reality) specifies what truth consists in. Since it involves no covert commitment about how many ways of being true there are, it does not require denying that there is more than one (Wright & Pedersen 2010). In principle, there may be different ways of consisting in correspondence that yield different ways of being true. Subsequently, whether the two theories turn out to be genuine rivals depends on whether further commitments are made to explicitly rule out pluralism. Correspondence theorists have occasionally made proposals that combine their view with a version of pluralism. An early—although not fully developed—proposal of this kind was made by Henry Acton (1935: 191). Two recent proposals are noteworthy and have been developed in detail. Gila Sher (1998, 2004, 2005, 2013, 2015, 2016a) has picked up the project of expounding on the claim that sentences in domains like logic correspond to facts in a different way than do sentences in other domains, while Terence Horgan and colleagues (Horgan 2001; Horgan & Potrč 2000, 2006; Horgan & Timmons 2002; Horgan & Barnard 2006; Barnard & Horgan 2013) have elaborated a view that involves a defense of the claim that not all truths correspond to facts in the same way. For Sher, truth does not consist in different properties in different regions of discourse (e.g., superwarrant in macroeconomics, homomorphism in immunology, coherence in film studies, etc.). Rather, it always and everywhere consists in correspondence. Taking ‘correspondence’ to generally refer to an \(n\)-place relation \(R\), Sher advances a version of correspondence pluralism by countenancing different ‘forms’, or ways of corresponding. For example, whereas the physical form of correspondence involves a systematic relation between the content of physical sentences and the physical structure of the world, the logical form of correspondence involves a systematic relation between the logical structure of sentences and the formal structure of the world, while the moral form of correspondence involves a relation between the moral content of sentences and (arguably) the psychological or sociological structure of the world. Sher’s view can be regarded as a moderate form of pluralism. It combines the idea that truth is many with the idea that truth is one. Truth is many on Sher’s view because there are different forms of correspondence. These are different ways of being true. At the same time, truth is one because these different ways of being true are all forms of correspondence.
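Put schematically (the notation here is illustrative rather than Sher’s own):

\[
p \text{ is true} \iff R_{f(p)}(p, w)
\]

where \(w\) is the world and \(f(p)\) selects the form of correspondence (physical, logical, moral, etc.) appropriate to \(p\). Truth is one because every instance involves the correspondence relation \(R\); it is many because the form \(R_{f(p)}\) varies with the factors at work in \(p\).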
For Sher, a specific matrix of ‘factors’ determines the unique form of correspondence as well as the correspondence principles that govern our theorizing about them. Which factors are in play depends primarily on the satisfaction conditions of predicates. For example, the form of correspondence for logical truths of the form ‘everything is either self-identical or not self-identical’ is determined solely by the logical factor, which is reflected by the universality of the union of the set of self-identical things and its complement. Or again, consider the categorical sentences (33) ‘some humans are disadvantaged’ and (34) ‘some humans are vain’. Both (33) and (34) involve a logical factor, which is reflected in their standard form as I-statements (i.e., some \(S\) are \(P\)), as well as the satisfaction conditions of the existential quantifier and copula; a biological factor, which is reflected in the satisfaction conditions for the predicate ‘is human’; and a normative factor, which is reflected in the satisfaction conditions for the predicates ‘is disadvantaged’ and ‘is vain’. But whereas (34) involves a psychological factor, which is reflected in the satisfaction conditions for ‘is vain’, (33) does not. Also, (33) may involve a socioeconomic factor, which is reflected in the satisfaction conditions for ‘is disadvantaged’, whereas (34) does not. By focusing on subsentential factors instead of supersentential regions of discourse, Sher offers a more fine-grained way to individuate ways in which true sentences correspond. (Sher supposes that we cannot name the correspondent of a given true sentence since there is no single discrete hypostatized entity beyond the \(n\)-tuples of objects, properties and relations, functions, structures (complexes, configurations), etc. that already populate reality.) The upshot is a putative solution to problems of mixed discourse (see §4 below): the truth of sentences combining the vocabulary of (33) and (34) is determined by all of the above factors and is, despite the large overlap, a different kind of truth than that of either of the atomic sentences (33) and (34), according to Sher. For their part, Horgan and colleagues propose a twist on the correspondence theorist’s claim that truth consists in a correspondence relation \(R\) obtaining between a given truth-bearer and a fact. They propose that there are exactly two species of the relation \(R\): ‘direct’ (\(R_{dir}\)) and ‘indirect correspondence’ (\(R_{ind}\)), and thus exactly two ways of being true. For Horgan and colleagues, which species of \(R\)—and thus which way of being true—obtains will depend on the austerity of ontological commitments involved in assessing sentences; in turn, which commitments are involved depends on discursive context and operative semantic standards. For example, an austere ontology commits to only a single extant object: namely, the world (affectionately termed the ‘blobject’). Truths about the blobject, such as the truth that it is one, correspond to it directly. Truths about things other than the blobject correspond to them indirectly. For example, sentences such as ‘there are online universities’ may be true even if the extension of the predicate ‘university’ is—strictly speaking—empty or what is referred to by ‘online universities’ is not in the non-empty extension of ‘university’. In short, \(p\) is true\(_1\) iff \(p\) is \(R_{dir}\)-related to the blobject given contextually operative standards \(c_i, c_j , \ldots ,c_m\). Alternatively, \(p\) is true\(_2\) iff \(p\) is \(R_{ind}\)-related to non-blobject entities given contextually operative standards \(c_j, c_k , \ldots ,c_n\). So, truth always consists in correspondence.
But the two types of correspondence imply that there is more than one way of being true. 4. Objections to pluralism and responses 4.1 Ambiguity Some take pluralists to be committed to the thesis that ‘true’ is ambiguous: since the pluralist thinks that there is a range of alethically potent properties (correspondence, coherence, etc.), ‘true’ must be ambiguous between these different properties. This is thought to raise problems for pluralists. According to one objection, the pluralist appears caught in a grave dilemma. ‘True’ is either ambiguous or unambiguous. If it is ambiguous, then there is a spate of further problems awaiting (see §4.4–§4.6 below). If it is not, then there is only one meaning of ‘true’ and thus only one property designated by it; so pluralism is false. Friends of pluralism have tended to self-consciously distance themselves from the claim that ‘true’ is ambiguous (e.g., Wright 1996: 924, 2001; Lynch 2001, 2004b, 2005c). Generally, however, the issue of ambiguity for pluralism has not been well-analyzed. Yet, one response has been investigated in some detail. According to this response, the ambiguity of ‘true’ is simply to be taken as a datum. ‘True’ is de facto ambiguous (see, e.g., Schiller 1906; Pratt 1908; Kaufmann 1948; Lucas 1969; Kölbel 2002, 2008; Sher 2005; Wright 2010). Alfred Tarski, for instance, wrote: The word ‘true’, like other words from our everyday language, is certainly not unambiguous. […] We should reconcile ourselves with the fact that we are confronted, not with one concept, but with several different concepts which are denoted by one word; we should try to make these concepts as clear as possible (by means of definition, or of an axiomatic procedure, or in some other way); to avoid further confusion we should agree to use different terms for different concepts […]. (1944: 342, 355) If ‘true’ is ambiguous de facto, as some authors have suggested, then the ambiguity objection may turn out to be—again—not so much an objection to or disconfirmation of the theory as a datum about ‘truth’-talk in natural language that should be explained or explained away by theories of truth. In that case, pluralists seem no worse off—and possibly better—than any number of other truth theorists. A second possible line of response from pluralists is that their view is not necessarily inconsistent with a monistic account of either the meaning of ‘true’ or the concept TRUTH. After all, ‘true’ is ambiguous only if it can be assigned more than one meaning or semantic structure; and it has more than one meaning only if there is more than one stable conceptualization or concept TRUTH supporting each numerically distinct meaning. Yet, nothing about the claim that there is more than one way of being true entails, by itself, that there is more than one concept TRUTH. In principle, the nature of properties like being true—whether homomorphism, superassertibility, coherence, etc.—may outstrip the concept thereof, just as the nature of properties like being water—such as H\(_2\)O, H\(_3\)O, XYZ, etc.—may outstrip the concept WATER (see, e.g., Wright 1996, 2001; Alston 2002; Lynch 2001, 2005c, 2006). Nor is monism about truth necessarily inconsistent with semantic or conceptual pluralism. The supposition that TRUTH is both many and one (i.e., ‘moderate monism’) neither rules out the construction of multiple concepts or meanings thereof, nor rules out the proliferation of uses to express those concepts or meanings.
For example, suppose that the only way of being true turns out to be a structural relation \(R\) between reality and certain representations thereof. Such a case is consistent with the existence of competing conceptions of what \(R\) consists in: weak homomorphism, isomorphism, ‘seriously dyadic’ correspondence, a causal \(n\)-place correspondence relation, etc. A more sensitive conclusion, then, is just that the objection from ambiguity is an objection to conceptual or semantic pluralism, not to any alethic theory—pluralism or otherwise. 4.2 The scope problem as a pseudo-problem According to the so-called ‘Quine-Sainsbury objection’, pluralists’ postulation of ambiguity in metalinguistic alethic terms is not actually necessary, and thus not well-motivated. This is because taxonomical differences among kinds of truths in different domains can be accounted for simply by doing basic ontology in object-level languages. [E]ven if it is one thing for ‘this tree is an oak’ to be true, another thing for ‘burning live cats is cruel’ to be true, and yet another for ‘Buster Keaton is funnier than Charlie Chaplin’ to be true, this should not lead us to suppose that ‘true’ is ambiguous; for we get a better explanation of the differences by alluding to the differences between trees, cruelty, and humor. (Sainsbury 1996: 900; see also Quine 1960: 131) Generally, pluralists have not yet developed a response to the Quine-Sainsbury objection. And for some, this is because the real force of the Quine-Sainsbury objection lies in its exposure of the scope problem as a pseudo-problem (Dodd 2013; see also Asay 2018). Again, the idea is that traditional inflationary theories postulate some candidate for \(F\) but the applicability and plausibility of \(F\) differs across regions of discourse. No such theory handles the truths of moral, mathematical, comic, legal, etc. discourse equally well; and this suggests that these theories, by their monism, face limitations on their explanatory scope. Pluralism offers a non-deflationary solution. Yet, why think that these differences among domains mark an alethic difference in truth per se, rather than semantic or discursive differences among the sentences comprising those domains? There is more than one way to score a goal in soccer, for example (via a corner kick, a ricochet off the foot of an opposing player or the head of a teammate, an obstruction of the goalkeeper, etc.), but it is far from clear that this entails pluralism about the property of scoring a goal in soccer. (The analogy is due to an anonymous referee.) Pluralists have yet to adequately address this criticism (although see Blackburn 2013; Lynch 2013b, 2018; Wright 1998 for further discussion). 4.3 The criteria problem Pluralists who invoke platitude-based strategies bear the burden of articulating inclusion and exclusion criteria for determining which expressions do, or do not, count as members of the essential subset of platitudes upon which this strategy is based (Wright 2005). Candidates include: ordinariness, intuitiveness, uninformativeness, wide use or citation, uncontroversiality, a prioricity, analyticity, indefeasibility, incontrovertibility, and sundry others. But none has proven to be uniquely adequate, and there is nothing close to a consensus about which criteria to rely on. For instance, consider the following two conceptions.
One conception takes platitudes about \(x\) to be expressions that must be endorsed on pain of being linguistically incompetent with the application of the terms \(t_1 , \ldots ,t_n\) used to talk about \(x\) (Nolan 2009). However, this conception does not readily allow for disagreement: prima facie, it is not incoherent to think that two individuals, each of whom is competent with the application of \(t_1 (x), \ldots ,t_n (x)\), may differ as to whether some \(p\) must be endorsed or whether some expression is genuinely platitudinous. For instance, consider the platitude in (17), which connects being true with corresponding with reality. Being linguistically competent with terms for structural relations like correspondence does not force endorsement of claims that connect truth with correspondence; no one not already in the grip of the correspondence theory would suppose that they must endorse (17), and those who oppose it would certainly suppose otherwise. Further inadequacies beleaguer this conception. It makes no provision for degrees of either endorsement or linguistic incompetence. It makes no distinction between theoretical and non-theoretical terms, much less does it restrict \(t_1 (x), \ldots ,t_n (x)\) to non-theoretical terms. Nor does it require that platitudes themselves be true. On the one hand, this consequently leaves open the possibility that universally-endorsed but false or otherwise alethically defective expressions are included in the platitude-based analysis of ‘true’. An old platitude about whales, for example (one universally endorsed, on pain of linguistic incompetence, prior to whales being classified as cetaceans), was that they are big fish. The worry, then, is that the criteria may allow us to screen in certain ‘fish stories’ about truth. This would be a major problem for advocates of Ramsification and other forms of implicit definition, since those techniques work only on the presupposition that all input being Ramsified over or implicitly defined is itself true (Wright 2010). On the other hand, making explicit that platitudes must also be true seems to entail that they are genuine ‘truisms’ (Lynch 2005c), though discovering which ones are truly indefeasible is a further difficulty—one made more difficult by the possibility of error theories (e.g., Devlin 2003) suggesting that instances of the \(T\)-schema are universally false. Indeed, we are inclined to say that instances of the disquotational, equivalence, and operator schemas are surely candidates for being platitudinous if anything is; but to say that they must be endorsed on pain of being linguistically incompetent is to rule out a priori error theories about instances of the \(T\)-schema. A second, closely related conception is that platitudes are expressions which—in virtue of being banal, vacuous, elementary, or otherwise trivial—are acceptable by anyone who understands them (Horwich 1990). The interaction of banality or triviality with acceptance does rule out a wide variety of candidate expressions, however. For instance, claims that are acceptable by anyone who understands them may still be too substantive or informative to count as platitudinous, depending on what they countenance. Similarly, claims that are too ‘thin’ or neutral to vindicate any particular theory \(T\) may still be too substantive or informative to count as genuinely platitudinous on this conception (Wright 1999).
This is particularly so given that nothing about a conception of platitudes as ‘pretheoretical claims’ strictly entails that they reduce to mere banalities (Vision 2004). Nevertheless, criteria like banality or triviality plus acceptance might also screen in too few expressions (perhaps as few as one, such as a particular instance of the \(T\)-schema). Indeed, it is an open question whether any of the principles in (11)–(28) would count as platitudes on this conception. An alternative conception emphasizes that the criteria should instead be the interaction of informality, truth, a prioricity, or perhaps even analyticity (Wright 2001: 759). In particular, platitudes need not take the form of an identity claim, equational definition, or a material biconditional. At the extreme, expressions can be as colloquial as you please so long as they remain true a priori (or analytically). These latter criteria are commonly appealed to, but are also not without problems. Firstly, a common worry is whether there are any strictly analytic truths about truth, and, if there are, whether they can perform any serious theoretical work. Secondly, these latter criteria would exclude certain truths that are a posteriori but no less useful to a platitude-based strategist. 4.4 The instability challenge Another objection to pluralism is that it is an inherently unstable view: i.e., as soon as the view is formulated, simple reasoning renders it untenable (Pedersen 2006, 2010; see also Tappolet 1997, 2000; Wright 2012). This so-called instability challenge can be presented as follows. According to the moderate pluralist, there is more than one truth property \(F_1 , \ldots ,F_n\). Yet, given \(F_1 , \ldots ,F_n\), it seems we should recognize another truth property:

(38) a sentence \(p\) is \(F_U\) if, and only if, \(p\) is \(F_1 \vee \cdots \vee F_n\).

Observe that \(F_U\) is not merely some property possessed by every \(p\) which happens to have one of \(F_1 , \ldots ,F_n\). (The property of being a sentence is one such property, but it poses no trouble to the pluralist.) Rather, \(F_U\) must be an alethic property whose extension perfectly positively covaries with the combined extension of the pluralist truth properties \(F_1 , \ldots ,F_n\). And since nothing is required for the existence of this new property other than the truth properties already granted by the pluralist, (38) gives a necessary and sufficient condition for \(F_U\) to be had by some \(p\): a sentence \(p\) is \(F_U\) just in case \(p\) is \(F_1 \vee \cdots \vee F_n\). Thus, any sentence that is any of \(F_1 , \ldots ,F_n\) may be true in some more generic or universal way, \(F_U\). This suggests, at best, that strong pluralism is false, and moderate monism is true; and at worst, there seems to be something unstable, or self-refuting, about pluralism. Pluralists can make concessive or non-concessive responses to the instability challenge. A concessive response grants that such a truth property exists, but maintains that it poses no serious threat to pluralism. A non-concessive response is one intended to rebut the challenge, e.g., by rejecting the existence of a common or universal truth property. One way of trying to motivate this rejection of \(F_U\) is by attending to the distinction between sparse and abundant properties, and then demonstrating that alethic properties like truth must be sparse and additionally arguing that the would-be trouble-maker \(F_U\) is an abundant property. According to sparse property theorists, individuals must be unified by some qualitative similarity in order to share a property.
For example, all even numbers are qualitatively similar in that they share the property of being divisible by two without remainder. Now, consider a subset of very diverse properties \(G_1 , \ldots ,G_n\) possessed by an individual \(a\). Is there some further, single property of being \(G_1\), or …, or \(G_n\) that \(a\) has? Such a further property, were it to exist, would be highly disjunctive; and it may seem unclear what, if anything, individuals that were \(G_1\), or …, or \(G_n\) would have in common—other than being \(G_1\), or …, or \(G_n\). According to sparse property theorists, the lack of qualitative similarity means that this putative disjunctive property is not a property properly so-called. Abundant property theorists, on the other hand, deny that qualitative similarity is needed in order for a range of individuals to share a property. Properties can be as disjunctive as you like. Indeed, for any set \(A\) there is at least one property had by all members of \(A\)—namely, being a member of \(A\). And since there is a set of all things that have some disjunctive property, there is a property—abundantly construed—had by exactly those things. It thus seems difficult to deny the existence of \(F_U\) if the abundant conception of properties is adopted. So pluralists who want to give a non-concessive response to the metaphysical instability challenge may want to endorse the sparse conception (Pedersen 2006). This is because the lack of uniformity in the nature of truth across domains is underwritten by a lack of qualitative similarity between the different truth properties that apply to specific domains of discourse. The truth property \(F_U\) does not exist, because truth properties are to be thought of in accordance with the sparse conception. Even if the sparse conception fails to ground pluralists’ rejection of the existence of the universal truth property \(F_U\), a concessive response to the instability challenge is still available. Pluralists can make a strong case that the truth properties \(F_1 , \ldots ,F_n\) are more fundamental than the universal truth property \(F_U\) (Pedersen 2010). This is because \(F_U\) is metaphysically dependent on \(F_1 , \ldots ,F_n\), in the sense that a sentence has \(F_U\) in virtue of having one of \(F_1 , \ldots ,F_n\), and not vice versa. Hence, even if the pluralist commits to the existence of \(F_U\)—and hence, to moderate metaphysical monism—there is still a clear sense in which her view is distinctively more pluralist than monist. 4.5 Problems regarding mixed discourse The content of some atomic sentences seems to hail exclusively from a particular region of discourse. For instance, ‘lactose is a sugar’ concerns chemical reality, while ‘\(7 + 5 = 12\)’ is solely about the realm of numbers (and operations on these). Not all discourse is pure or exclusive, however; we often engage in so-called ‘mixed discourse’, in which contents from different regions of discourse are combined. For example, consider:

(39) causing pain is bad.

Mixed atomic sentences such as (39) are thought to pose problems for pluralists. (39) seems to implicate concepts from the physical domain (causation), the mental domain (pain), and the moral domain (badness) (Sher 2005: 321–22). Yet, if pluralism is correct, then in which way is (39) true? Is it true in the way appropriate to talk of the physical, the mental, or the moral? Is it true in neither of these ways, or in all of these three ways, or in some altogether different way?
The source of the problem may be the difficulty in classifying discursive content—a classificatory task that is an urgent one for pluralists. For it is unclear how they can maintain that regions of discourse \(D_1 , \ldots ,D_n\) partially determine the ways in which sentences can be true without a procedure for determining which region of discourse \(D_i\) a given \(p\) belongs to. One suggestion is that a mixed atomic sentence \(p\) belongs to no particular domain. Another is that it belongs to several (Wyatt 2013). Lynch (2005b: 340–41) suggested paraphrasing mixed atomic sentences as sentences that are classifiable as belonging to particular domains. For example, (39) might be paraphrased as a sentence (40) which, unlike (39), appears to be a pure atomic sentence belonging to the domain of morals. This proposal remains underdeveloped, however. It is not at all clear that (40) counts as a felicitous paraphrase of (39), and, more generally, unclear whether all mixed atomic sentences can be paraphrased such that they belong to just one domain without thereby altering their meaning, truth-conditions, or truth-values. Another possible solution addresses the problem head-on by questioning whether atomic sentences really are mixed, thereby denying the need for any such paraphrases. Consider, for instance, (41), the sentence ‘the Mona Lisa is beautiful’, and (42), a sentence predicating ‘is illegal’ of some item. Prima facie, what determines the domain-membership of (41) and (42) is the aesthetic and legal predicates ‘is beautiful’ and ‘is illegal’, respectively. It is an aesthetic matter whether the Mona Lisa is beautiful; this is because (41) is true in some way just in case the Mona Lisa falls in the extension of the aesthetic predicate ‘is beautiful’ (and mutatis mutandis for (42)). In the same way, we might take (39) to belong exclusively to the moral domain, given its moral predicate ‘is bad’. (This solution was presented in the first 2012 version of this entry; see Edwards 2018a for later, more detailed treatment.) It is crucial to the latter two proposals that any given mixed atomic sentence \(p\) has its domain membership essentially, since such membership is what determines the relevant kind of truth. Sher (2005, 2011) deals with the problem of mixed atomic sentences differently. On her view, the truth of a mixed atomic sentence is not accounted for by membership in some specific domain; rather, the ‘factors’ involved in the sentence determine a specific form of correspondence, and this specific form of correspondence is what accounts for the truth of \(p\). The details about which specific form of correspondence obtains is determined at the sub-sentential levels of reference, satisfaction, and fulfillment. For example, the form of correspondence that accounts for the truth of (39) obtains as a combination of the physical fulfillment of ‘the causing of \(x\)’, the mental reference of ‘pain’, and the moral satisfaction of ‘\(x\) is bad’ (2005: 328). No paraphrase is needed. Another related problem pertains to two or more sentences joined by one or more logical connectives, as in:

(43) \(7 + 5 = 12\) and killing innocent people is wrong.

Unlike atomic sentences, the mixing here takes place at the sentential rather than the sub-sentential level: (43) is a conjunction, which mixes the pure sentence ‘\(7 + 5 = 12\)’ with the pure sentence ‘killing innocent people is wrong’. (There are, of course, also mixed compounds that involve mixed atomic sentences.)
For many theorists, each conjunct seems to be true in a different way, if true at all: the first conjunct in whatever way is appropriate to arithmetic, and the second conjunct in whatever way is appropriate to moral theory. But then, how is the pluralist going to account for the truth of the conjunction (Tappolet 2000: 384)? Pluralists owe an answer to the question of which way, exactly, a conjunction is true when its conjuncts are true in different ways. Additional complications arise for pluralists who commit to facts being what make sentences true (e.g., Lynch 2001: 730), or other such truth-maker or -making theses. Prima facie, we would reasonably expect there to be different kinds of facts that make the conjuncts of (43) true, and which subsequently account for the differences in their different ways of being true. However, what fact or facts make the mixed compound true? Regarding (43), is it the mathematical fact, the moral fact, or some further kind of fact? On the one hand, the claims that mathematical or moral facts, respectively, make \(p\) true seem to betray the thought that both facts contribute equally to the truth of the mixed compound. On the other hand, the claim that some third ‘mixed’ kind of fact makes \(p\) true leaves the pluralist with the uneasy task of telling a rather alchemist story about fact-mixtures. Functionalists about truth (e.g., Lynch 2005b: 396–97) propose to deal with compounds by distinguishing between two kinds of realizers of the \(F\)-role. The first is an atomic realizer, such that an atomic proposition \(p\) is true iff \(p\) has a property that realizes the \(F\)-role. The second is a compound realizer, such that a compound \(q * r\) (where \(q\) and \(r\) may themselves be complex) is true iff it instantiates the truth-function associated with \(*\) and its components have properties that realize the \(F\)-role. The realizers for atomic sentences are properties like correspondence, coherence, and superwarrant. The realizer properties for compounds are special, in the sense that realizer properties for a given kind of compound are only had by compounds of that kind. Witness that each of these compound realizer properties requires any of its bearers to be an instance of a specific truth-function. Pure and mixed compounds are treated equally on this proposal: when true, they are true because they instantiate the truth-function for conjunction, having two or more conjuncts that have a property that realizes the \(F\)-role (and mutatis mutandis for disjunctions and material conditionals). However, this functionalist solution to the problem of mixed compounds relies heavily on that theory’s monism—i.e., its insistence that the single role property \(F\) is a universal truth property. This might leave one wondering whether a solution is readily available to someone who rejects the existence of such a property. One strategy is simply to identify the truth of conjunctions, disjunctions, and conditionals with the kind of properties specified by (44), (45), and (46), respectively (as opposed to taking them to be realizers of a single truth property). Thus, e.g., the truth of any conjunction simply is to be an instance of the truth-function for conjunction with conjuncts that have the property that plays the \(F\)-role for them (Kim & Pedersen 2018; Pedersen & Lynch 2018: Sect. 20.6.2.1). Another strategy is to try to use the resources of multi-valued logic.
For example, one can posit an ordered set of designated values for each way of being true \(F_1 , \ldots ,F_n\) (perhaps according to their status as ‘heavyweight’ or ‘lightweight’), and then take conjunction to be a minimizing operation and disjunction a maximizing one, i.e., \(v(p \wedge q) = \min\{v(p), v(q)\}\) and \(v(p \vee q) = \max\{v(p), v(q)\}\). As a result, each conjunction and disjunction—whether pure or mixed—will be either true in some way or false in some way straightforwardly determined by the values of the constituents. For example, consider the sentences:

(47) heat is mean molecular kinetic energy.

(48) manslaughter is a felony.

Suppose that (47) is true in virtue of corresponding to physical reality, while (48) is true in virtue of cohering with a body of law; and suppose further that correspondence \((F_1)\) is more ‘heavyweight’ than coherence \((F_2)\). Since conjunction is a minimizing operation and \(F_2 \lt F_1\), then ‘heat is mean molecular kinetic energy and manslaughter is a felony’ will be \(F_2\). Since disjunction is a maximizing operation, then ‘heat is mean molecular kinetic energy or manslaughter is a felony’ will be \(F_1\). The many-valued solution to the problem of mixed compounds just outlined is formally adequate because it determines a way that each compound is true. However, while interesting, the proposal needs to be substantially developed in several respects. For example, how is negation treated—are there several negations, one for each way of being true, or is there a single negation? Also, taking ‘heat is mean molecular kinetic energy and manslaughter is a felony’ to be true in the way appropriate to law betrays a thought that seems at least initially compelling, viz. that both conjuncts contribute to the truth of the conjunction. Alternatively, one could take mixed compounds to be true in some third way. However, this would leave the pluralist with the task of telling some story about how this third way of being true relates to the other two. Again, substantial work needs to be done. Edwards (2008) proposed another solution to the problem of mixed conjunctions, the main idea of which is to appeal to the following biconditional schema:

\(p \wedge q\) is true\(_k\) if, and only if, \(p\) is true\(_i\) and \(q\) is true\(_j\).

Edwards suggests that pluralists can answer the challenge that mixed conjunctions pose by reading the stated biconditional as having an order of determination: \(p \wedge q\) is true\(_k\) in virtue of \(p\)’s being true\(_i\) and \(q\)’s being true\(_j\), but not vice-versa. This, he maintains, explains what kind of truth a conjunction \(p \wedge q\) has when its conjuncts are true in different ways; for the conjunction is true\(_k\) in virtue of having conjuncts that are both true, where it is inessential whether the conjuncts are true in the same way. Truth\(_k\) is a further way of being true that depends on the conjuncts being true in some way without reducing to either of them. The property true\(_k\) is thus not a generic or universal truth property that applies to the conjuncts as well as the conjunction. As Cotnoir (2009) emphasizes, Edwards’ proposal provides too little information about the nature of true\(_k\). What little is provided makes transparent the commitment to true\(_k\)’s being a truth property had only by conjunctions, in which case it is unclear whether Edwards’s solution can generalize. In this regard, Edwards’ proposal is similar to Lynch’s functionalist proposal, which is committed to there being a specific realizer property for each type of logical compound.
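The many-valued proposal outlined above (conjunction as a minimizing operation, disjunction as a maximizing one) can be made concrete with a small toy model. The following is a minimal sketch in Python; the numeric encoding, the names F0–F2, and the particular ordering of ‘heavyweight’ over ‘lightweight’ values are illustrative assumptions, not part of the proposal itself:

F0 = 0   # a way of being false (non-designated)
F2 = 1   # 'lightweight' truth, e.g., cohering with a body of law
F1 = 2   # 'heavyweight' truth, e.g., corresponding to physical reality
DESIGNATED = {F1, F2}

def conj(v_p, v_q):
    # v(p and q) = min{v(p), v(q)}: conjunction is a minimizing operation
    return min(v_p, v_q)

def disj(v_p, v_q):
    # v(p or q) = max{v(p), v(q)}: disjunction is a maximizing operation
    return max(v_p, v_q)

# (47) 'heat is mean molecular kinetic energy' -- true by correspondence (F1)
# (48) 'manslaughter is a felony'              -- true by coherence (F2)
v47, v48 = F1, F2

assert conj(v47, v48) == F2   # the mixed conjunction is true in the way of coherence
assert disj(v47, v48) == F1   # the mixed disjunction is true in the way of correspondence

def preserves_designation(premise_values, conclusion_value):
    # On a given valuation, designation is preserved just in case the
    # conclusion is designated whenever every premise is; this is the
    # designated-value account of validity discussed below (Beall 2000).
    return (not all(v in DESIGNATED for v in premise_values)
            or conclusion_value in DESIGNATED)

The model also makes vivid the open questions noted above: it says nothing about how negation should be treated, and a mixed conjunction simply inherits the value of its ‘lightest’ conjunct.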
Mixed inferences—inferences involving truth-apt sentences from different domains—appear to be yet another problem for the pluralist (Tappolet 1997, 2000; Pedersen 2006). One can illustrate the problem by supposing, with the pluralist, that there are two ways of being true, one of which is predicated of the antecedent of a conditional and the other of its consequent. It can be left open in what way the conditional itself is true. Consider the following inference: (51) this cat is wet; (52) if this cat is wet, then this cat is funny; therefore, (53) this cat is funny. This inference would appear to be valid. However, it is not clear that pluralists can account for its validity by relying on the standard characterization of validity as necessary truth preservation from premises to conclusion. Given that the truth properties applicable to (51) and (52), respectively, are different, what truth property is preserved in the inference? The pluralist owes an explanation of how the thesis that there are many ways of being true can account for the validity of mixed inferences. Beall (2000) argued that the account of validity used in multi-valued logics gives pluralists the resources to deal with the problem of mixed inferences. For many-valued logics, validity is accounted for in terms of preservation of designated value, where designated values can be thought of as ways of being true, while non-designated values can be thought of as ways of being false. Adopting a designated-value account of validity, pluralists can simply take \(F_1 , \ldots ,F_n\) to be the relevant designated values and define an inference as valid just in case the conclusion is designated if each premise is designated (i.e., one of \(F_1 , \ldots ,F_n)\); a toy model of this account is sketched at the end of this discussion. On this account, the validity of (mixed) arguments whose premises and conclusion concern different regions of discourse is evaluable in terms of more than one of \(F_1 , \ldots ,F_n\); the validity of (pure) arguments whose premises and conclusion pertain to the same region of discourse is evaluable in terms of the same \(F_i\) (where \(1 \le i \le n)\). An immediate rejoinder is that the term ‘true’ in ‘ways of being true’ refers to a universal way of being true—i.e., being designated simpliciter (Tappolet 2000: 384). If so, then the multi-valued solution comes at the cost of inadvertently acknowledging a universal truth property. Of course, as noted, the existence of a universal truth property poses a threat only to strong pluralism.

4.5 The problem of generalization

Alethic terms are useful devices for generalizing. For instance, suppose we wish to state the law of excluded middle. A tedious way would be to produce a long—indeed, infinite—conjunction: (54) \((p_1 \vee \neg p_1) \wedge (p_2 \vee \neg p_2) \wedge (p_3 \vee \neg p_3) \wedge \cdots\). However, given the equivalence schema for propositions, there is a much shorter formula, which captures what (54) is meant to express by using ‘true’, but without loss of explanatory power (Horwich 1990: 4): (55) every proposition of the form \(\langle p \vee \neg p\rangle\) is true. Alethic terms are also useful devices for generalizing over what speakers say, as in (56) everything Socrates said is true. The utility of a generalization like (56) is not so much that it eliminates the need to rely on an infinite conjunction, but that it is ‘blind’ (i.e., made under partial ignorance of what was said). Pluralists seem to have difficulty accounting for truth’s use as a device for generalization. One response is to simply treat uses of ‘is true’ as elliptical for ‘is true in one way or another’. In doing so, pluralists account for generalization without sacrificing their pluralism. A possible drawback, however, is that it may commit pluralists to the claim that ‘true’ designates the disjunctive property of being \(F_1 \vee \cdots \vee F_n\).
Granting the existence of such a property gives pluralists a story to tell about generalizations like (55) and (56), but the response is a concessive one available only to moderate pluralists. However, as noted in §4.2.3, the existence of such a property is not a devastating blow to all pluralists, since the domain-specific truth properties \(F_1 , \ldots ,F_n\) remain explanatorily basic in relation to the property of being \(F_1 \vee \cdots \vee F_n\).
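The designated-value account of validity discussed under mixed inferences above can be given a similarly minimal sketch. Again the encoding, the particular many-valued conditional, and all names are illustrative assumptions rather than Beall's own apparatus; the check quantifies over every assignment of values, including the mixed ones.

from itertools import product

# Toy model of the designated-value account of validity: F1 and F2 are
# two 'ways of being true' (both designated); FALSE is undesignated.
F1, F2, FALSE = 2, 1, 0
VALUES = (F1, F2, FALSE)
DESIGNATED = {F1, F2}

def cond(p, q):
    # One simple many-valued conditional: undesignated just in case the
    # antecedent is designated and the consequent is not.
    if p in DESIGNATED and q not in DESIGNATED:
        return FALSE
    return q if q in DESIGNATED else F1

def modus_ponens_is_valid():
    # Valid iff on every assignment where all premises are designated,
    # the conclusion is designated too, including 'mixed' assignments
    # where v(p) and v(q) are designated in different ways.
    for vp, vq in product(VALUES, repeat=2):
        premises_designated = vp in DESIGNATED and cond(vp, vq) in DESIGNATED
        if premises_designated and vq not in DESIGNATED:
            return False
    return True

assert modus_ponens_is_valid()

On this toy model a mixed modus ponens preserves designation even when its premises are designated in different ways; Tappolet's rejoinder, noted above, is that being 'designated simpliciter' then looks like a universal truth property.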


The Identity Theory of Truth

1. Definition and Preliminary Exposition

Declarative sentences seem to take truth-values, for we say things like “The sentence ‘Socrates is wise’ is true”. But sentences are apparently not the only bearers of truth-values: for we also seem to allow that what such sentences express, or mean, may be true or false, saying such things as “What she said is true” or “It is true that Socrates is wise”. If, provisionally, we call the things that declarative sentences express, or mean, their contents—again provisionally, these will be such things as that Socrates is wise—then the identity theory of truth, in its most general form, states that (cf. Baldwin 1991: 35): (5) the true contents of declarative sentences are identical with facts. A fact is here to be thought of as, very generally, a way things are, or a way the world is. On this approach, the identity theory secures an intimate connection between language (what language expresses) and world. Of course there would in principle be theoretical room for a view that identified not the content of, say, the true declarative sentence “Socrates is wise”—let us assume from now on that this sentence is true—with the fact that Socrates is wise, but rather that sentence itself. But this is not a version of the theory that anyone has ever advanced, nor does it appear that it would be plausible to do so (see Candlish 1999b: 200–2; Künne 2003: 6). The early Wittgenstein does regard sentences as being themselves facts, but they are not identical with the facts that make them true. Alternatively, and using a different locution, one might say that, to continue with the same example, (6) the content of “Socrates is wise” is identical with the fact that Socrates is wise. The idea here is that (6) makes a connection between language and reality: on the left-hand side we have something expressed by a piece of language, and on the right-hand side we allude to a bit of reality. Now (6) might look truistic, and that status has indeed been claimed for the identity theory, at least in one of its manifestations. John McDowell has argued that what he calls true “thinkables” are identical with facts (1996: 27–8, 179–80). Thinkables are things like that Socrates is wise regarded as possible objects of thought. For we can think that Socrates is wise; and it can also be the case that Socrates is wise. So the idea is that what we can think can also be (identical with) what is the case. That identity, McDowell claims, is truistic. On this approach, one might prefer one’s identity theory to take the form (cf. Hornsby 1997: 2): (7) true thinkables are identical with facts. On this approach the identity theory explicitly aims to secure an intimate connection between mind (what we think) and world. A point which has perhaps been obscured in the literature on this topic, but which should be noticed, is that (7) asserts a relation of subordination: it says that true thinkables are a (proper or improper) subset of facts; it implicitly allows that there might be facts that are not identical with true thinkables. So (7) is not to be confounded with its converse, (8) facts are identical with true thinkables, which asserts the opposite subordination, and says that facts are a (proper or improper) subset of true thinkables, implicitly allowing, this time, that there might be true thinkables that are not identical with facts. (8) is therefore distinct from (7), and if (7) is controversial, (8) is equally or more so, but for reasons that are at least in part different. (8) denies the existence of facts that cannot be grasped in thought. But many philosophers hold it to be evident that there are, or at least could be, such facts—perhaps certain facts involving indefinable real numbers, for example, or in some other way going beyond the powers of human thought.
So (8) could be false; its status remains to be established; it can hardly be regarded as truistic. Accordingly, one might expect that an identity theorist who wished to affirm (7), and certainly anyone who wanted to say that (7) (or (6)) was truistic, would—at least qua identity theorist—steer clear of (8), and leave its status sub judice. In fact, however, a good number of identity theorists, both historical and contemporary, incorporate (8) as well as—or even instead of—(7) into their statement of the theory. Richard Cartwright, who published the first modern discussion of the theory in 1987, wrote that if one were formulating the theory, it would say “that every true proposition is a fact and every fact a true proposition” (1987: 74). McDowell states that “true thinkables already belong just as much to the world as to minds [i.e., (7)], and things that are the case already belong just as much to minds as to the world [i.e., (8)]. It should not even seem that we need to choose a direction in which to read the claim of identity” (2005: 84). Jennifer Hornsby takes the theory to state that true thinkables and facts coincide (1997: 2, 9, 17, 20)—they are the same set—so that she in effect identifies that theory with the conjunction of (7) and (8), as also, in effect, does Julian Dodd (2008a: passim). Now, (8) is certainly an interesting thesis that merits much more consideration than it has hitherto received (at least in the recent philosophical literature), and, as indicated, some expositions of the identity theory have as much invested in (8) as in (5) or (7): on this point see further §2 below. Nevertheless, it will make for clarity of discussion if we associate the identity theory of truth, more narrowly, with something along the lines of (5) or (7), and omit (8) from this particular discussion.[2] That will be the policy here. Whether or not (6) is truistic, both (5) and (7) involve technical or semi-technical vocabulary; moreover, they have been advanced as moves in a technical debate, namely one concerning the viability of the correspondence theory of truth. For these reasons it seems difficult to regard them as truisms (see Dodd 2008a: 179). What (5) and (7) mean, and which of them one will prefer as one’s statement of the identity theory of truth, if one is favorably disposed to that theory—one may of course be happy with both—will depend, among other things, on what exactly one thinks about the nature of such entities as that Socrates is wise. In order to get clear on this point, discussion of the identity theory has naturally been conducted in the context of the Fregean semantical hierarchy, which distinguishes between levels of language, sense, and reference. Frege recognized what he called “thoughts” (Gedanken) at the level of sense corresponding to (presented by) declarative sentences at the level of language. McDowell’s thinkables are meant to be Fregean thoughts: the change of terminology is intended to stress the fact that these entities are not thoughts in the sense of dated and perhaps spatially located individual occurrences (thinking events), but are abstract contents that are at least in principle available to be grasped by different thinkers at different times and places. So a Fregean identity theory of truth would regard both such entities as that Socrates is wise and, correlatively, facts as sense-level entities: this kind of identity theory will then state that true such entities are identical with facts.
This approach will naturally favor (7) as its expression of the identity theory. By contrast with Frege, Russell abjured the level of sense and (at least around 1903–4) recognized what, following Moore, he called “propositions” as worldly entities composed of objects and properties. A modern Russellian approach might adopt these propositions—or something like them: the details of Russell’s own conception are quite vague—as the referents of declarative sentences, and identity theorists who followed this line might prefer to take a particular reading of (5) as their slogan. So these Russellians would affirm something along the lines of: (9) true Russellian propositions are identical with facts, by contrast with the Fregean (10) true Fregean thoughts are identical with facts. This way of formulating the relevant identity claims has the advantage of suggesting that it would, at least in principle, be open to a theorist to combine (9) and (10) in a hybrid position that (i) departed from Russell and followed Frege by admitting both a level of Fregean sense and one of reference, and also, having admitted both levels to the semantic hierarchy, (ii) both located Fregean thoughts at the level of sense and located Russellian propositions at the level of reference. Sense being mode of presentation of reference, the idea would be that declarative sentences refer, via Fregean thoughts, to Russellian propositions (for this disposition, see Gaskin 2006: 203–20; 2008: 56–127). So someone adopting this hybrid approach would affirm both (9) and (10). Of course, the facts mentioned in (9) would be categorially different from the facts mentioned in (10), and one might choose to avoid confusion by distinguishing them terminologically, and perhaps also by privileging one set of facts, ontologically, over the other. If one wanted to follow this privileging strategy, one might say, for instance, that only reference-level facts were genuine facts, the relata of the identity relation at the level of sense being merely fact-like entities, not bona fide facts. That would be to give the combination of (9) and (10) a Russellian spin. Alternatively, someone who took the hybrid line might prefer to give it a Fregean spin, saying that the entities with which true Fregean thoughts were identical were the genuine facts, and that the corresponding entities at the level of reference that true Russellian propositions were identical with were not facts as such, but fact-like correlates of the genuine facts. Without more detail, of course, these privileging strategies leave the status of the entities they are treating as merely fact-like unclear; and, as far as the Fregean version of the identity theory goes, commentators who identify facts with sense-level Fregean thoughts usually, as we shall see, repudiate reference-level Russellian propositions altogether, rather than merely downgrading their ontological status, and so affirm (10) but reject (9). We shall return to these issues in §4 below.

2. Historical Background

The expression “the identity theory of truth” was first used—or, at any rate, first used in the relevant sense—by Stewart Candlish in an article on F. H. Bradley published in 1989. But the general idea of the theory had been in the air during the 1980s: for example, in a discussion first published in 1985, concerning John Mackie’s theory of truth, McDowell criticized that theory for making truth consist in “a relation of correspondence (rather than identity) between how things are and how things are represented as being” (1985 [1998: 137 n.
21]). The implication is that identity would be the right way to conceive the given relation. And versions of the identity theory go back at least to Bradley (see, e.g., Bradley 1914: 112–13; for further discussion and references, see Candlish 1989; 1995; 1999b: 209–12; T. Baldwin 1991: 36–40), and to the founding fathers of the analytic tradition (Sullivan 2005: 56–7 n. 4). The theory can be found in G. E. Moore’s “The Nature of Judgment” (1899), and in the entry he wrote on “Truth” for J. Baldwin’s Dictionary of Philosophy and Psychology (1902–3; reprinted Moore 1993: 4–8, 20–1; see T. Baldwin 1991: 40–3). Russell embraced the identity theory at least during the period of his 1904 discussions of Meinong (see, e.g., 1973: 75), possibly also in his The Principles of Mathematics of 1903, and for a few years after these publications as well (see T. Baldwin 1991: 44–8; Candlish 1999a: 234; 1999b: 206–9). Frege has a statement of the theory in his 1918–19 essay “The Thought”, and may have held it earlier (Frege 1918–19: 74 [1977: 25]; see Hornsby 1997: 4–6; Milne 2010: 467–8). Wittgenstein’s Tractatus (1922) is usually held to propound a correspondence rather than an identity theory of truth; however, this is questionable. In the Tractatus, declarative sentences (Sätze) are said to be facts (arrangements of names), and states of affairs (Sachlagen, Sachverhalte, Tatsachen) are also said to be facts (arrangements of objects). If the Tractatus is taken to put forward a correspondence theory of truth, then presumably the idea is that a sentence will be true just if there is an appropriate relation of correspondence (an isomorphism) between sentence and state of affairs. However, the problem with this interpretation is that, in the Tractatus, a relation of isomorphism between a sentence and reality is generally conceived as a condition of the meaningfulness of that sentence, not specifically of its truth. False sentences, as well as true, are isomorphic with states of affairs—only, in their case the states of affairs do not obtain. For Wittgenstein, states of affairs may either obtain or fail to obtain—both possibilities are, in general, available to them.[3] Correlatively, it has been suggested that the Tractatus contains two different conceptions of fact, a factive and a non-factive one. According to the former conception, facts necessarily obtain or are the case; according to the latter, facts may fail to obtain or not be the case. This non-factive conception has been discerned at Tractatus 1.2–1.21, and at 2.1 (see Johnston 2013: 382). Given that, in the Tractatus, states of affairs (and perhaps facts) have two poles—obtaining or being the case, and non-obtaining or not being the case—it seems to follow that, while Wittgenstein is committed to a correspondence theory of meaning, his theory of truth must be (some version of) an identity theory, along the lines of: a declarative sentence is true just if what it is semantically correlated with is identical with an obtaining state of affairs (a factive fact). (Identity theorists normally presuppose the factive conception of facts, so that “factive” is redundant in the phrase “factive facts”, and that is the policy which will be followed here.)
Though a bipolar conception of facts (if indeed Wittgenstein has it) may seem odd, the bipolar conception of states of affairs (which, it is generally agreed, he does have) seems quite natural: here the identity theorist says that a true proposition is identical with an obtaining state of affairs (see Candlish & Damnjanovic 2018: 271–2). Peter Sullivan has suggested a different way of imputing an identity theory to the Tractarian Wittgenstein (2005: 58–9). His idea is that Wittgenstein’s simple objects are to be identified with Fregean senses, and that in effect the Tractatus contains an identity theory along the lines of (7) or (10). Sullivan’s ground for treating Tractarian objects as senses is that, like bona fide Fregean senses, they are transparent: they cannot be grasped in different ways. An apparent difficulty with this view is that there is plausibly more to Fregean sense than just the property of transparency: after all, Russell also attached the property of transparency to his basic objects, but it has not been suggested that Russellian basic objects are really senses, and the suggestion would seem to have little going for it (partly, though not only, because Russell himself disavowed the whole idea of Fregean sense). The orthodox position, which will be presupposed here, is that the Tractarian Wittgenstein, like Russell, finds no use for a level of Fregean sense, so that his semantical hierarchy consists exclusively of levels of language and reference, with nothing of a mediatory or similar nature located between these levels. (Wittgenstein does appeal to the concepts of sense and reference in the Tractatus, but it is generally agreed that they do not figure in a Fregean way, according to which both names and sentences, for example, have both sense and reference; for Wittgenstein, by contrast, sentences have sense but not reference, whereas names have reference but not sense.)

3. Motivation

What motivates the identity theory of truth? It can be viewed as a response to difficulties that seem to accrue to at least some versions of the correspondence theory (cf. Dodd 2008a: 120, 124). The correspondence theory of truth holds that truth consists in a relation of correspondence between something linguistic or quasi-linguistic, on the one hand, and something worldly on the other. Generally, the items on the worldly end of the relation are taken to be facts or (obtaining) states of affairs. For many purposes these two latter kinds of entity (facts, obtaining states of affairs) are assimilated to one another, and that strategy will be followed here. The exact nature of the correspondence theory will then depend on what the other relatum is taken to be. The items mentioned so far make available three distinct versions of the correspondence theory, depending on whether this relatum is taken to consist of declarative sentences, Fregean thoughts, or Russellian propositions. Modern correspondence theorists make a distinction between truth-bearers, which would typically fall under one of these three classifications, and truth-makers,[4] the worldly entities making truth-bearers true, when they are true. If these latter entities are facts, then true declarative sentences, Fregean thoughts, or Russellian propositions—whichever of these one selects as the relata of the correspondence relation on the language side of the language–world divide—correspond to facts in the sense that facts are what make those sentences, thoughts, or propositions true, when they are true.
(Henceforth we shall normally speak simply of thoughts and propositions, understanding these to be Fregean thoughts and Russellian propositions respectively, unless otherwise specified.) That, according to the correspondence theorist (and the identity theorist can agree so far), immediately gives us a constraint on the shape of worldly facts. Take our sample sentence “Socrates is wise”, and recall that this sentence is here assumed to be true. At the level of reference we encounter the object Socrates and (assuming realism about properties)[5] the property of wisdom. Both of these may be taken to be entities in the world, but it is plausible that neither amounts to a fact: neither amounts to a plausible truth-maker for the sentence “Socrates is wise”, or for its expressed thought, or for its expressed proposition. That is because the man Socrates, just as such, and the property of wisdom, just as such, are not, so the argument goes, propositionally structured, either jointly or severally, and so do not amount to enough to make it true that Socrates is wise (cf. D. Armstrong 1997: 115–16; Dodd 2008a: 7; Hofweber 2016: 288). Even if we add in further universals, such as the relation of instantiation, and indeed the instantiation of instantiation to any degree, the basic point seems to be unaffected. In fact it can plausibly be maintained (although some commentators disagree; Merricks 2007: ch. 1, passim, and pp. 82, 117, 168; Asay 2013: 63–4; Jago 2018: passim, e.g., pp. 73, 84, 185, 218, 250, though cf. p. 161) that the man Socrates, just as such, is not even competent to make it true that Socrates exists; for that we need the existence of the man Socrates. Hence, it would appear that, if there are to be truth-makers in the world, they will have to be structured, syntactically or quasi-syntactically, in the same general way as declarative sentences, thoughts, and propositions. For convenience we can refer to structure in this general sense as “propositional structure”: the point then is that neither Socrates, nor the property of wisdom, nor (if we want to adduce it) the relation of instantiation is, just as such, propositionally structured. Following this line of argument through, we reach the conclusion that nothing short of full-blown, propositionally structured entities like the fact that Socrates is wise will be competent to make the sentence “Socrates is wise”, or the thought or proposition expressed by that sentence, true. (A question that arises here is whether tropes might be able to provide a “thinner” alternative to such ontologically “rich” entities as the fact that Socrates is wise. One problem that seems to confront any such strategy is that of making the proposed alternative a genuine one, that is, of construing the relevant tropes in such a way that they do not simply collapse into, or ontologically depend on, entities of the relatively rich form that Socrates is wise. For discussion see Dodd 2008a: 7–9.) The question facing the correspondence theorist is now: if such propositionally structured entities are truth-makers, are they truth-makers for sentences, thoughts, or propositions? It is at this point that the identity theorist finds the correspondence theory unsatisfactory. Consider first the suggestion that the worldly fact that Socrates is wise is the truth-maker for the reference-level proposition that Socrates is wise (see, e.g., Jago 2018: 72–3, and passim). There surely are such facts as the fact that Socrates is wise: we talk about such things all the time. 
The problem would seem to be not with the existence of such facts, but rather with the relation of correspondence which is said by the version of the correspondence theory that we are currently considering to obtain between the fact that Socrates is wise and the proposition that Socrates is wise. As emerges from this way of expressing the difficulty, there seems to be no linguistic difference between the way we talk about propositions and the way we talk about facts, when these entities are specified by “that” clauses. That suggests that facts just are true propositions. If that is right, then the relation between facts and true propositions is not one of correspondence—which, as Frege famously observed (Frege 1918–19: 60 [1977: 3]; cf. Künne 2003: 8; Milne 2010: 467–8), implies the distinctness of the relata—but identity. This line of argument can be strengthened by noting the following point about explanation. Correspondence theorists have typically wanted the relation of correspondence to explain truth: they have usually wanted to say that it is because the proposition that Socrates is wise corresponds to a fact that it is true, and because the proposition that Socrates is foolish—or rather: It is not the case that Socrates is wise (after all, his merely being foolish is not enough to guarantee that he is not wise, for he might, like James I and VI, be both wise and foolish)—does not correspond to a fact that it is false. But the distance between the true proposition that Socrates is wise and the fact that Socrates is wise seems to be too small to provide for explanatory leverage. Indeed the identity theorist’s claim is that there is no distance at all. Suppose we ask: Why is the proposition that Socrates is wise true? If we reply by saying that it is true because it is a fact that Socrates is wise, we seem to have explained nothing, but merely repeated ourselves (cf. Strawson 1971: 197; Anscombe 2000: 8; Rasmussen 2014: 39–43). So correspondence apparently gives way to identity as the relation which must hold or fail to hold between a proposition and a state of affairs if the proposition is to be true or false: the proposition is true just if it is identical with an obtaining state of affairs and false if it is not (cf. Horwich 1998: 106). And it would seem that, if the identity theorist is right about this disposition, explanatory pretensions will have to be abandoned: for while it will be correct to say that a proposition is true just if it is identical with a fact, false otherwise, it is hard to see that much of substance has thereby been said about truth (cf. Hornsby 1997: 2; Dodd 2008a: 135). It might be replied here that there are circumstances in which we tolerate statements of the form “A because B” when an appropriate identity—perhaps even identity of sense, or reference, or both—obtains between “A” and “B”. For example, we say things like “He is your first cousin because he is a child of a sibling of one of your parents” (Künne 2003: 155). But here it is plausible that there is a definitional connection between left-hand side and right-hand side, which seems not to hold of “The proposition that Socrates is wise is true because it is a fact that Socrates is wise”. In the latter case there is surely no question of definition; rather, we are supposed, according to the correspondence theorist, to have an example of metaphysical explanation, and that is just what, according to the identity theorist, we do not have.
After all, the identity theorist will insist, it seems obvious that the relation, whatever it is, between the proposition that Socrates is wise and the fact that Socrates is wise must, given that the proposition is true, be an extremely close one: what could this relation be? If the identity theorist is right that the relation cannot be one of metaphysical explanation (in either direction), then it looks as though it will be hard to resist the insinuation of the linguistic data that the relation is one of identity. It is for this reason that identity theorists sometimes insist that their position should not be defined in terms of an identity between truth-bearer and truth-maker: that way of expressing the theory looks too much in thrall to correspondence theorists’ talk (cf. Candlish 1999b: 200–1, 213). For the identity theorist, to speak of both truth-makers and truth-bearers would imply that the things allegedly doing the truth-making were distinct from the things that were made true. But, since in the identity theorist’s view there are no truth-makers distinct from truth-bearers, if the latter are conceived as propositions, and since nothing can make itself true, it follows that there are no truth-makers simpliciter, only truth-bearers. It seems to follow, too, that it would be ill-advised to attack the identity theory by pointing out that some (or all) truths lack truth-makers (so Merricks 2007: 181): so long as truths are taken to be propositions, that is exactly what identity theorists themselves say. From the identity theorist’s point of view, truth-maker theory looks very much like an exercise in splitting the level of reference in half and then finding a bogus match between the two halves (see McDowell 1998: 137 n. 21; Gaskin 2006: 203; 2008: 119–27). For example, when David Armstrong remarks that “What is needed is something in the world which ensures that a is F, some truth-maker or ontological ground for a’s being F. What can this be except the state of affairs of a’s being F?” (1991: 190), the identity theorist is likely to retort that a’s being F, which according to Armstrong “ensures” that a is F, just is the entity (whatever it is) that a is F. The identity theorist maps conceptual connections that we draw between the notions of proposition, truth, falsity, state of affairs, and fact. These connections look trivial, when spelt out—of course, an identity theorist will counter that to go further would be to fall into error—so that to speak of an identity theory can readily appear too grand (McDowell 2005: 83; 2007: 352. But cf. David 2002: 126). So much for the thesis that facts are truth-makers and propositions truth-bearers; an exactly parallel argument applies to the version of the correspondence theory that treats facts as truth-makers and thoughts as truth-bearers. Consider now the suggestion that obtaining states of affairs, as the correspondence theorist conceives them, make declarative sentences (as opposed to propositions) true (cf. Horwich 1998: 106–7). In this case there appears to be no threat of triviality of the sort that apparently plagued the previous version of the correspondence theory, because states of affairs like that Socrates is wise are genuinely distinct from linguistic items such as the sentence “Socrates is wise”. To that extent friends of the identity theory need not jib at the suggestion that such sentences have worldly truth-makers, if that is how the relation of correspondence is being glossed.
But they might question the appropriateness of the gloss. For, they might point out, it does not seem possible, without falsification, to draw detailed links between sentences and bits of the world. After all, different sentences in the same or different languages can “correspond” to the same bit of the world, and these different sentences might have very different (numbers of) components. The English sentence “There are cows” contains three words: are there then three bits in the world corresponding to this sentence, and making it true? (cf. Neale 2001: 177). The sentence “Cows exist” contains only two words, but would not the correspondence theorist want to say that it was made true by the same chunk of reality? And when we take other languages into account, there seems in principle to be no reason to privilege any particular number and say that a sentence corresponding to the relevant segment of reality must contain that number of words: why might there not, in principle, be sentences of actual or possible languages such that, for any \(n \ge 1\), there existed a sentence comprising \(n\) words and meaning the same as the English “There are cows”? (In fact, is English not already such a language? Just prefix and then iterate ad lib. a vacuous operator like “Really”.) In a nutshell, then, the identity theorist’s case against the correspondence theory is that, when the truth-making relation is conceived as originating in a worldly fact (or similar) and having as its other relatum a true sentence, the claim that this relation is one of correspondence cannot be made out; if, on the other hand, the relevant relation targets a proposition (or thought), then that relation must be held to be one of identity, not correspondence.

4. Identity, Sense, and Reference

Identity theorists are agreed that, in the case of any particular relevant identity, a fact will constitute the worldly relatum of the relation, but there is significant disagreement among them on the question of what the item on the other end of the relation is—whether a thought or a proposition (or both). As we have seen, there are three possible positions here: (i) one which places the identity relation exclusively between true thoughts and facts, (ii) one which places it exclusively between true propositions and facts, and (iii) a hybrid position which allows identities of both sorts (identities obtaining at the level of sense will of course be quite distinct from identities obtaining at the level of reference). Which of these positions an identity theorist adopts will depend on wider metaphysical and linguistic considerations that are strictly extraneous to the identity theory as such. Identity theorists who favor (i) generally do so because they want to have nothing to do with propositions as such. That is to say, such theorists eschew propositions as reference-level entities: of course the word “proposition” may be, and sometimes is, applied to Fregean thoughts at the level of sense, rather than to Russellian propositions at the level of reference. For example, Hornsby (1997: 2–3) uses “proposition” and “thinkable” interchangeably. So far, this terminological policy might be considered neutral with respect to the location of propositions and thinkables in the Fregean semantic hierarchy: that is to say, if one encounters a theorist who talks about “thinkables” and “propositions”, even identifying them, one does not, just so far, know where in the semantic hierarchy this theorist places these entities.
In particular, we cannot assume, unless we are specifically told so, that our theorist locates either propositions or thinkables at the level of sense. After all, someone who houses propositions at the level of reference holds that these reference-level entities are thinkable, in the sense that they are graspable in thought (perhaps via thoughts at the level of sense). But they are not thinkables if this latter word is taken, as it is by McDowell and Hornsby, to be a technical term referring to entities at the level of sense. For clarity the policy here will be to continue to apply the word “proposition” exclusively to Russellian propositions at the level of reference. Such propositions, it is plausible to suppose, can be grasped in thought, but by definition they are not thoughts or thinkables, where these two latter terms have, respectively, their Fregean and McDowellian meanings. It is worth noting that this point, though superficially a merely terminological one, engages significantly with the interface between the philosophies of language and mind that was touched on in the opening paragraph. Anyone who holds that reference-level propositions can, in the ordinary sense, be thought—are thinkable—is likely to be unsatisfied with any terminology that seems to limit the domain of the thinkable and of what is thought to the level of sense (On this point see further below in this section, and Gaskin 2020: 101–2). Usually, as has been noted, identity theorists who favor (i) above have this preference because they repudiate propositions as that term is being employed here: that is, they repudiate propositionally structured reference-level entities. There are several reasons why such identity theorists feel uncomfortable with propositions when these are understood to be reference-level entities. There is a fear that such propositions, if they existed, would have to be construed as truth-makers; and identity theorists, as we have seen, want to have nothing to do with truth-makers (Dodd 2008a: 112). That fear could perhaps be defused if facts were also located at the level of reference for true propositions to be identical with. This move would take us to an identity theory in the style of (ii) or (iii) above. Another reason for suspicion of reference-level propositions is that commentators often follow Russell in his post-1904 aversion specifically to false objectives, that is, to false propositions in re (Russell 1966: 152; Cartwright 1987: 79–84). Such entities are often regarded as too absurd to take seriously as components of reality (so T. Baldwin 1991: 46; Dodd 1995: 163; 1996; 2008a: 66–70, 113–14, 162–6). More especially, it has been argued that false propositions in re could not be unities, that the price of unifying a proposition at the level of reference would be to make it true: if this point were correct it would arguably constitute a reductio ad absurdum of the whole idea of reference-level propositions, since it is plausible to suppose that if there cannot be false reference-level propositions, there cannot be true ones either (see Dodd 2008a: 165). If, on the other hand, one is happy with the existence of propositions in re or reference-level propositions, both true and false,[6] one is likely to favor an identity theory in the style of (ii) or (iii). 
And, once one has got as far as jettisoning (i) and deciding between (ii) and (iii), there must surely be a good case for adopting (iii): for if one has admitted propositionally structured entities both at the level of sense (as senses of declarative sentences) and at the level of reference (propositions), there seems no good reason not to be maximally liberal in allowing identities between entities of these two types and, respectively, sense- and reference-level kinds of fact (or fact-like entities). Against what was suggested above about Frege (§2), it has been objected that Frege could not have held an identity theory of truth (Baldwin 1991: 43); the idea here is that, even if he had acknowledged states of affairs as bona fide elements of reality, Frege could not have identified true thoughts with them on pain of confusing the levels of sense and reference. As far as the exegetical issue is concerned, the objection might be said to overlook the possibility that Frege identified true thoughts with facts construed as sense-level entities, rather than with states of affairs taken as reference-level entities; and, as we have noted, Frege does indeed appear to have done just this (see Dodd & Hornsby 1992). Still, the objection raises an important theoretical issue. It would surely be a serious confusion to try to construct an identity across the categorial division separating sense and reference, in particular to attempt to identify true Fregean thoughts with reference-level facts or states of affairs.[7] It has been suggested that McDowell and Hornsby are guilty of this confusion;[8] they have each rejected the charge,[9] insisting that, for them, facts are not reference-level entities, but are, like Fregean thoughts, sense-level entities.[10] But, if one adheres to the Fregean version of the identity theory ((i) above), which identifies true thoughts with facts located at the level of sense, and admits no correlative identity, in addition, connecting true propositions located at the level of reference with facts or fact-like entities also located at that level, it looks as though one faces a difficult dilemma. At what level in the semantical hierarchy is the world to be placed? Suppose first one puts it at the level of reference (this appears to be Dodd’s favored view: see 2008a: 180–1, and passim). In that case the world will contain no facts or propositions, but just objects and properties hanging loose in splendid isolation from one another, a dispensation which looks like a version of Kantian transcendental idealism. (Simply insisting that the properties include not merely monadic but also polyadic ones, such as the relation of instantiation, will not in itself solve the problem: we will still just have a bunch of separate objects, properties, and relations.) If there are no true propositions—no facts—or even false propositions to be found at the level of reference, but if also, notwithstanding that absence, the world is located there, the objects it contains will, it seems, have to be conceived as bare objects, not as things of certain sorts. Some philosophers of a nominalistic bias might be happy with this upshot; but the problem is how to make sense of the idea of a bare object—that is, an object not characterized by any properties. (Properties not instantiated by any objects, by contrast, will not be problematic, at least not for a realist.) 
So suppose, on the other hand, that one places the world at the level of sense, on the grounds that the world is composed of facts, and that that is where facts are located. This ontological dispensation is explicitly embraced by McDowell (1996: 179). The problem with this way out of the dilemma would seem to be that, since Fregean senses are constitutively modes of presentation of referents, the strategy under current consideration would take the world to be made up of modes of presentation—but of what? Of objects and properties? These are certainly reference-level entities, but if they are presented by items in the realm of sense, which is being identified on this approach with the world, then again, as on the first horn of the dilemma, they would appear to be condemned to an existence at the level of reference in splendid isolation from one another, rather than in propositionally structured combinations, so that once more we would seem to be committed to a form of Kantian transcendental idealism (see Suhm, Wagemann, & Wessels 2000: 32; Sullivan 2005: 59–61; Gaskin 2006: 199–203). Both ways out of the dilemma appear to have this unattractive consequence. The only difference between those ways concerns where exactly in the semantic hierarchy we locate the world; but it is plausible that that issue, in itself, is or ought to be of less concern to metaphysicians than the requirement to avoid divorcing objects from the properties that make those objects things of certain sorts; and both ways out of the dilemma appear to flout this requirement. To respect the requirement, we need to nest reference-level objects and properties in propositions, or proposition-like structures, also located at the level of reference. And then some of these structured reference-level entities—the true or obtaining ones—will, it seems, be facts, or at least fact-like. Furthermore, once one acknowledges the existence of facts, or fact-like entities, existing at the level of sense, it seems in any case impossible to prevent the automatic generation of facts, or fact-like entities, residing at the level of reference. For sense is mode of presentation of reference. So we need reference-level facts or fact-like entities to be what sense-level facts or fact-like entities present. One has to decide how to treat these variously housed fact-like entities theoretically. If one were to insist that the sense-level fact-like entities were the genuine and only facts, the corresponding reference-level entities would be no better than fact-like, and contrariwise. But, regardless of whether the propositionally structured entities automatically generated in this way by sense-level propositionally structured entities are to be thought of as proper facts or merely as fact-like entities, it would seem perverse not to identify the world with these entities.[11] For to insist on continuing to identify the world with sense-level rather than reference-level propositionally structured entities would seem to fly in the face of a requirement to regard the world as maximally objective and maximally non-perspectival. McDowell himself hopes to avert any charge of embracing an unacceptable idealism consequent on his location of the world at the level of sense by relying on the point that senses present their references directly, not descriptively, so that reference is, as it were, contained in sense (1996: 179–80).
To this it might be objected that the requirement of maximal objectivity forces an identification of the world with the contained, not the containing, entities in this scenario, which in turn seems to force the upshot—if the threat of Kantian transcendental idealism is really to be obviated—that the contained entities be propositionally structured as such, that is, as contained entities, and not simply in virtue of being contained in propositionally structured containing entities. (For a different objection to McDowell, see Sullivan 2005: 60 n. 6.)

5. Difficulties with the Theory and Possible Solutions

5.1 The modal problem

G. E. Moore drew attention to a point that might look (and has been held to be) problematic for the identity theory (Moore 1953: 308; Fine 1982: 46–7; Künne 2003: 9–10). The proposition that Socrates is wise exists in all possible worlds where Socrates and the property of wisdom exist, but in some of those worlds this proposition is true and in others it is false. The fact that Socrates is wise, by contrast, only exists in those worlds where the proposition both exists and is true. So it would seem that the proposition that Socrates is wise cannot be identical with the fact that Socrates is wise. They have different modal properties, and so by the principle of the indiscernibility of identicals they cannot be identical. Note, first, that this problem, if it is a problem, has nothing especially to do with the identity theory of truth or with facts. It seems to arise already for true propositions and propositions taken simpliciter before ever we get to the topic of facts. That is, one might think that the proposition that Socrates is wise is identical with the true proposition that Socrates is wise (assuming, as we are doing, that this proposition is true); but we then face the objection that the proposition taken simpliciter and the true proposition differ in their modal properties, since (as one might suppose) the true proposition that Socrates is wise does not exist at worlds where the proposition that Socrates is wise is false, but the proposition taken simpliciter does. Indeed the problem, if it is a problem, is still more general, and purported solutions to it go back at least to the Middle Ages (when it was discussed in connection with Duns Scotus’ formal distinction; see Gaskin 2002 [with references to further relevant literature]). Suppose that Socrates is a cantankerous old curmudgeon. Now grumpy Socrates, one would think, is identical with Socrates. But in some other possible worlds Socrates is of a sunny and genial disposition. So it would seem that Socrates cannot be identical with grumpy Socrates after all, because in these other possible worlds, while Socrates goes on existing, grumpy Socrates does not exist—or so one might argue. Can the identity theorist deal with this problem, and if so how? Here is one suggestion. Suppose we hold, staying with grumpy Socrates for a moment, that, against the assumption made at the end of the last paragraph, grumpy Socrates does in fact exist in worlds where Socrates has a sunny disposition. The basis for this move would be the thought that, after all, grumpy Socrates is identical with Socrates, and Socrates exists in these other worlds. So grumpy Socrates exists in those worlds too; it is just that he is not grumpy in those worlds. (Suppose Socrates is very grumpy; suppose in fact that grumpiness is so deeply ingrained in his character that worlds in which he is genial are quite far away.
Someone surveying the array of possible worlds, starting from the actual world and moving out in circles, and stumbling at long last upon a world with a pleasant Socrates in it, might register the discovery by exclaiming, with relief, “Oh look! Grumpy Socrates is not grumpy over here!”.) Similarly, one might contend, the true proposition, and fact, that Socrates is wise goes on existing in the worlds where Socrates is not wise, because the true proposition, and fact, that Socrates is wise just is the proposition that Socrates is wise, and that proposition goes on existing in these other worlds, but in those worlds that true proposition, and fact, is not a true proposition, or a fact. (In Scotist terms one might say that the proposition that Socrates is wise and the fact that Socrates is wise are really identical but formally distinct.) This solution was, in outline, proposed by Richard Cartwright in his 1987 discussion of the identity theory (Cartwright 1987: 76–8; cf. David 2002: 128–9; Dodd 2008a: 86–8; Candlish & Damnjanovic 2018: 265–6). According to Cartwright, the true proposition, and fact, that there are subways in Boston exists in other possible worlds where Boston does not have subways, even though in those worlds that fact would not be a fact. (Compare: grumpy Socrates exists in worlds where Socrates is genial and sunny, but he is not grumpy there.) So even in worlds where it is not a fact that Boston has subways, that fact, namely the fact that Boston has subways, continues to exist. Cartwright embellishes his solution with two controversial points. First, he draws on Kripke’s distinction between rigid and non-rigid designation, suggesting that his solution can be described by saying that the expression “The fact that Boston has subways” is a non-rigid designator. But it is plausible that that expression goes on referring to, or being satisfied by (depending on how exactly one wants to set up the semantics of definite descriptions: see Gaskin 2008: 56–81), the fact that Boston has subways in possible worlds where Boston does not have subways; it is just that, though that fact exists in those worlds, it is not a fact there. But that upshot does not appear to derogate from the rigidity of the expression in question. Secondly, Cartwright allows for a true reading of “The fact that there are subways in Boston might not have been the fact that there are subways in Boston”. But it is arguable that we should say that this sentence is just false (David 2002: 129). The fact that there are subways in Boston would still have gone on being the same fact in worlds where Boston has no subways, namely the fact that there are subways in Boston; it is just that in those worlds this fact would not have been a fact. You might say: in that world the fact that there are subways in Boston would not be correctly described as a fact, but in talking about that world we are talking about it from the point of view of our world, and in our world it is a fact. (Similarly with grumpy Socrates.) Now, an objector may want to press the following point against the above purported solution to the difficulty. Consider again the fact that Socrates is wise. Surely, it might be said, it is more natural to maintain that that fact does not exist in a possible world where Socrates is not wise, rather than that it exists there all right, but is not a fact. After all, imagine a conversation about a world in which Socrates is not wise and suppose that Speaker A claims that Socrates is indeed wise in that world.
Speaker B might counter with “No, sorry, you’re wrong: there is no such fact in that world; the purported fact that Socrates is wise simply does not exist in that world”. It might seem odd to insist that B is not allowed to say this and must say instead “Yes, you’re right that there is such a fact in that world, namely the fact that Socrates is wise, but in that world that fact is not a fact”. How might the identity theorist respond to this objection? One possible strategy would be to make a distinction between fact and factuality, as follows. Factuality, one might say, is a reification of facts. Once you have a fact, you also get, as an ontological spin-off, the factuality of that fact. The fact, being a proposition, exists at all possible worlds where the proposition exists, though in some of these worlds it may not be a fact: it will not be a fact in worlds where the proposition is false. The factuality of that fact, by contrast, only exists at those worlds where the fact is a fact—where the proposition is true. So factuality is a bit like a trope. Compare grumpy Socrates again. Grumpy Socrates, the identity theorist might contend, exists at all worlds where Socrates exists, though at some of those worlds he is not grumpy. But Socrates’ grumpiness—that particular trope—exists only at worlds where Socrates is grumpy. That seems to obviate the problem, because the suggestion being canvassed here is that grumpy Socrates is identical not with Socrates’ grumpiness—so that the fact that these two entities have different modal properties need embarrass no one—but rather with Socrates. Similarly, the suggestion is that the proposition that Socrates is wise is identical not with the factuality of the fact that Socrates is wise, but just with that fact. So the identity theorist would accommodate the objector’s point by insisting that facts exist at possible worlds where their factualities do not exist (a toy model of this disposition is sketched at the end of this entry). The reader may be wondering why this problem was ever raised against the identity theory of truth in the first place. After all, the identity theorist does not say that propositions simpliciter are identical with facts, but that true propositions are identical with facts, and now true propositions and facts surely have exactly the same modal properties: for regardless how things are with the sheer proposition that Socrates is wise, at any rate the true proposition that Socrates is wise must surely be thought to exist at the same worlds as the fact that Socrates is wise, whatever those worlds are. However, as against this quick way with the purported problem, there stands the intuition, mentioned and exploited above, that the true proposition that Socrates is wise is identical with the proposition that Socrates is wise. So long as that intuition is in play, the problem does indeed seem to arise—for true propositions, in the first instance, and then for facts by transitivity of identity. But the identity theorist will maintain that, as explained, the problem has a satisfactory solution.

5.2 The “right fact” problem

Candlish, following Cartwright, has urged that the identity theory of truth is faced with the difficulty of getting hold of the “right fact” (Cartwright 1987: 74–5; Candlish 1999a: 238–9; 1999b: 202–4). Consider a version of the identity theory that states: (11) the proposition that p is true if and only if the proposition that p is identical with some fact. Candlish’s objection is now that (11) does not specify which fact has to be identical with the proposition for the proposition to be true.
“But what the identity theory requires is not that a true proposition be identical with some fact or other, it is that it be identical with the right fact” (1999b: 203). In another paper Candlish puts the matter like this: “But after all, any proposition might be identical with some fact or other (and there are reasons identified in the Tractatus for supposing that all propositions are themselves facts), and so all might be true. What the identity theory needs to capture is the idea that it is by virtue of being identical with the appropriate fact that a proposition is true” (1999a: 239). The reference to the Tractatus is suggestive. Of course, it might be objected that the Tractatus does not have propositions in the sense of that word figuring here: that is, it does not recognize Russellian propositions (propositions at the level of reference). Nor indeed does it appear to recognize Fregean thoughts. In the Tractatus, as we have noted (§2), declarative sentences (Sätze) are facts (arrangements of names), and states of affairs (Sachlagen, Sachverhalte, Tatsachen) are also facts (arrangements of objects). Even so, Candlish’s allusion to the Tractatus reminds us that propositions (in our sense) are Tractarian inasmuch as they are structured arrangements of entities, namely objects and properties. (Correlatively, thoughts are structured arrangements of senses.) False propositions (and false thoughts) will equally be arrangements of objects and properties (respectively, senses). So the difficulty that Cartwright and Candlish have identified can be put like this. Plausibly any proposition, whether or not it is true, is identical with some fact or other given that a proposition is an arrangement of entities of the appropriate sort. But if propositions just are facts, then every proposition is identical with some fact—at the very least, with itself—whether it is true or false. So the right-to-left direction of (11) looks incorrect. J. C. Beall (2000) attempts to dissolve this problem on the identity theorist’s behalf by invoking the principle of the indiscernibility of identicals. His proposal works as follows. If we ask, in respect of (11), what the “right” fact is, it seems that we can answer that the “right” fact must at least have the property of being identical with the proposition that p, and the indiscernibility principle then guarantees that there is only one such fact. This proposal is open to an obvious retort. Suppose that the proposition that p is false. That proposition will still be identical with itself, and if we are saying (in Wittgensteinian spirit) that propositions are facts, then that proposition will be identical with at least one fact, namely itself. So it will satisfy the right-hand side of (11), its falsity notwithstanding. But reflection on this retort suggests a patch-up to Beall’s proposal: why not say that the right fact is the fact that p? We would then be able to gloss (11) with: (12) the proposition that p is true if and only if (a) it is a fact that p, and (b) the proposition that p is identical with the fact that p. Falsity, it seems, now no longer presents a difficulty, because if it is false that p then it is not a fact that p, so that (a) fails, and there is no appropriate candidate for the proposition that p to be identical with.[13] Notice that, in view of the considerations already aired in connection with the modal problem (§5.1 above), caution is here required. Suppose that it is true that p in the actual world, but false in some other possible world.
According to the strategy that we have been considering on the identity theorist’s behalf, it would be wrong to say that, in the possible world where it is false that p, there is no such fact as the fact that p. The strategy has it that there is indeed such a fact, because it is (in the actual world) a fact that p, and that fact, and the true proposition that p, go on existing in the possible world where it is false that p; it is just that that fact is not a fact in that possible world. But (12), the identity theorist will maintain, deals with this subtlety. In the possible world we are considering, where it is false that p, though the fact that p exists, it is not a fact that p, so (a) fails, and there is accordingly no risk of our getting hold of the “wrong” fact. Note also that if a Wittgensteinian line is adopted, while the (false) proposition that p will admittedly be identical with a fact—at the very least with itself—it will be possible, given the failure of (a), for the identity theorist to contend with a clear conscience that that fact is the wrong fact, which does not suffice to render the proposition true.

5.3 The “slingshot” problem

If the notorious “slingshot” argument worked, it would pose a problem for the identity theory of truth. The argument exists in a number of different, though related, forms, and this is not the place to explore all of these in detail.[14] Here we shall look briefly at one of the simplest and most familiar versions of the argument, namely Davidson’s. This version of the argument aims to show that if true declarative sentences refer to anything (for example to propositions or facts), then they all refer to the same thing (to the “Great Proposition”, or to the “Great Fact”). This upshot would be unacceptable to an identity theorist of a Russellian cast, who thinks that declarative sentences refer to propositions, and that true such propositions are identical with facts: any such theorist is naturally going to want to insist that the propositions referred to by different declarative sentences are, at least in general, distinct from one another, and likewise that the facts with which distinct true propositions are identical are also distinct from one another. Davidson expresses the problem that the slingshot argument purportedly throws up as follows:

The difficulty follows upon making two reasonable assumptions: that logically equivalent singular terms have the same reference; and that a singular term does not change its reference if a contained singular term is replaced by another with the same reference. But now suppose that “R” and “S” abbreviate any two sentences alike in truth value. (1984: 19)

He then argues that the following four sentences have the same reference:

(13) R
(14) \(\hat{z} (z\! =\! z \amp R) = \hat{z} (z\! =\! z)\)
(15) \(\hat{z} (z\! =\! z \amp S) = \hat{z} (z\! =\! z)\)
(16) S

(The hat over a variable symbolizes the description operator: so “\(\hat{z}\)” means the \(z\) such that …) This is because (13) and (14) are logically equivalent, as are (15) and (16), while the only difference between (14) and (15) is that (14) contains the expression (Davidson calls it a “singular term”) “\(\hat{z} (z\! =\! z \amp R)\)” whereas (15) contains “\(\hat{z} (z\! =\! z \amp S)\)”, and these refer to the same thing if S and R are alike in truth value. Hence

any two sentences have the same reference if they have the same truth value. (1984: 19)

The difficulty with this argument, as a number of writers have pointed out (see, e.g., Yourgrau 1987; Gaskin 1997: 153 n. 17; Künne 2003: 133–41), and the place where the identity theorist is likely to raise a cavil, lies in the first assumption on which it depends.
Davidson calls this assumption “reasonable”, but it has been widely questioned. It states “that logically equivalent singular terms have the same reference”. But intuitively, the ideas of logical equivalence and reference seem to be quite distinct, indeed to have, as such, little to do with one another, so that it would be odd if there were some a priori reason why the assumption had to hold. And it is not difficult to think of apparent counterexamples: the sentence “It is raining” is logically equivalent to the sentence “It is raining and (either Pluto is larger than Mercury or it is not the case that Pluto is larger than Mercury)”, but the latter sentence seems to carry a referential payload that the former does not. Of course, if declarative sentences refer to truth-values, as Frege thought, then the two sentences will indeed be co-referential, but to assume that sentences refer to truth-values would be question-begging in the context of an argument designed to establish that all true sentences refer to the same thing.

5.4 The congruence problem

A further objection to the identity theory, going back to an observation of Strawson’s, takes its cue from the point that canonical names of propositions and of facts are often not straightforwardly congruent with one another: they are often not intersubstitutable salva congruitate (or, if they are, they may not be intersubstitutable salva veritate) (Strawson 1971: 196; cf. Künne 2003: 10–12). For example, we say that propositions are true, not that they obtain, whereas we say that facts obtain, not that they are true. How serious is this point? The objection in effect presupposes that for two expressions to be co-referential, or satisfied by one and the same thing, they must be syntactically congruent, have the same truth-value potential, and match in terms of general contextual suitability. The assumption of the syntactic congruence of co-referential expressions is controversial, and it may be possible for the identity theorist simply to deny it (see Gaskin 2008: 106–10, for argument on the point, with references to further literature; cf. Dodd 2008a: 83–6). Whether co-referential expressions must be syntactically congruent depends on one’s conception of reference, a matter that cannot be further pursued here (for discussion see Gaskin 2008: ch. 2; 2020: chs. 3–5). There has been a good deal of discussion in the literature concerning the question whether an identification of facts with true propositions is undermined not specifically by phenomena of syntactic incongruence but rather by failure of relevant intersubstitutions to preserve truth-values (see, e.g., King 2007: ch. 5; King in King, Soames, & Speaks 2014: 64–70, 201–8; Hofweber 2016: 215–23; Candlish & Damnjanovic 2018: 264). The discussion has focused on examples like the following:

(17) Daniel remembers the fact that this is a leap year.
(18) Daniel remembers the true proposition that this is a leap year.
(19) Daniel regrets the fact that this is a leap year.
(20) Daniel regrets the true proposition that this is a leap year.

The problem here is said to be that the substitution of “true proposition” for “fact” or vice versa generates different readings (in particular, readings with different truth-values). Suppose Daniel has to memorize a list of true propositions, of which one is the proposition that this is a leap year. Then it is contended that we can easily imagine a scenario in which (17) and (18) differ in truth-value.
Another way of putting the same point might be to say that (17) is equivalent to

(21) Daniel remembers that this is a leap year,

but that (18) is not equivalent to (21), because—so the argument goes—(18) but not (21) would be true if Daniel had memorized his list of true propositions without realizing that they were true. Similar differences can be argued to apply, mutatis mutandis, to (19) and (20). Can the identity theorist deal with this difficulty? In the first place one might suggest that the alleged mismatch between (17) and (18) is less clear than the objector claims. (17) surely does have a reading like the one that is said to be appropriate for (18). Suppose Daniel has to memorize a list of facts. (17) could then diverge in truth-value from

(22) Daniel knows that this is a leap year.

For there is a reading of (17) on which, notwithstanding (17)’s truth, (22) is false: this is the reading on which Daniel has indeed memorized a list of facts, but without necessarily realizing that the things he is memorizing are facts. He has memorized the relevant fact (that this is a leap year), we might say, but not as a fact. That is parallel to the reading of (18) according to which Daniel has memorized the true proposition that this is a leap year, but not as a true proposition. The identity theorist might then aver that, perhaps surprisingly, the same point actually applies to the simple (21), on the grounds that this sentence can mean that Daniel remembers the propositional object that this is a leap year (from a list of such objects, say, that he has been asked to memorize), with no implication that he remembers it either as a proposition or as a fact. So, according to this response, the transparent reading of (18)—which has Daniel remember the propositional object, namely that this is a leap year, but not necessarily remember it as a fact, or even as the propositional object that this is a leap year (he remembers it under some other mode of presentation)—is also available for (17) and for (21). What about the opaque reading of either (17) or (21), which implies that Daniel knows for a fact that this is a leap year—is that reading available for (18) too? The identity theorist might maintain that this reading is indeed available, and then explain why we tend not to use sentences like (18) in the relevant sense, preferring sentences of the form of (17) or (21), on the basis of the relative technicality of the vocabulary of (18). The idea would be that it is just an accident of language that we prefer either (17) or (21) to (18) where what is in question is the sense that implies that Daniel has propositional knowledge that this is a leap year (is acquainted with that fact as a fact), as opposed to having mere acquaintance, under some mode of presentation or other, with the propositional object which happens to be (the fact) that this is a leap year. And if we ask why we prefer (17) or (21) to

(23) Daniel remembers the proposition that this is a leap year,

then the answer will be the Gricean one that (23) conveys less information than (17) or (21), under the reading of these two sentences that we are usually interested in, according to which Daniel remembers the relevant fact as a fact, for (23) is compatible with the falsity of “This is a leap year”. Hence to use (23) in a situation where one was in a position to use (17) or (21) would carry a misleading conversational implicature. That, at any rate, is one possible line for the identity theorist to take.
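The two readings in play might be regimented as follows (an illustrative schematization, not the objector’s or the identity theorist’s own formalism), using a three-place relation whose third slot is a mode of presentation \(m\):

\[
\begin{aligned}
\textit{transparent:}\quad & \exists m\; \mathrm{Remembers}(\textrm{Daniel},\ \textrm{the fact that this is a leap year},\ m)\\
\textit{opaque:}\quad & \mathrm{Remembers}(\textrm{Daniel},\ \textrm{the fact that this is a leap year},\ \textit{as a fact})
\end{aligned}
\]

On the transparent reading Daniel is related to the propositional object under some mode of presentation or other; on the opaque reading the mode of presentation is fixed as the factive one, which is why that reading carries the implication of knowledge.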
(It is worth noting here that, if the identity theorist is right about this, it will follow that the “know that” construction will be subject to an ambiguity similar to the one affecting the “remember that” construction, given that remembering is a special case of knowing. That is: “A knows that p” will mean either “A is acquainted with the fact that p, and is acquainted with it as a fact” or merely “A is acquainted with the fact that p, but not necessarily with it as such—either as a fact or even as a propositional object”.)

5.5 The individuation problem

It might appear that we individuate propositions more finely than facts: for example, one might argue that the fact that Hesperus is bright is the same fact as the fact that Phosphorus is bright, but that the propositions in question are different (see on this point Künne 2003: 10–12; Candlish & Damnjanovic 2018: 266–7). The identity theorist has a number of strategies in response to this objection. One would be simply to deny it, and maintain that facts are individuated as finely as propositions: if one is a supporter of the Fregean version of the identity theory, this is likely to be one’s response (see, e.g., Dodd 2008a: 90–3). Alternatively, one might respond by saying that, if there is a good point hereabouts, at best it tells only against the Fregean and Russellian versions of the identity theory, not against the hybrid version. The identity theory in the hybrid version can agree that we sometimes think of facts as extensional, reference-level entities and sometimes also individuate propositions or proposition-like entities intensionally. Arguably, these twin points do indeed tell against either a strict Fregean or a strict Russellian version of the identity theory: they tell against the strict Fregean position because, as well as individuating facts intensionally, we also, sometimes, individuate facts extensionally; and they tell against the strict Russellian position because, as well as individuating facts extensionally, we also, sometimes, individuate facts intensionally. But it is plausible that the hybrid version of the identity theory is not touched by the objection, because that version of the theory accommodates propositionally structured and factual entities at both levels of sense and reference, though different sorts of these entities at these different levels—either propositions at the level of sense and correlative proposition-like entities at the level of reference or vice versa, and similarly, mutatis mutandis, for facts and fact-like entities. It will follow, then, for this version of the identity theory, that Fregean thoughts and Russellian propositions are available, if true, to be identical with the factual entities of the appropriate level (sense and reference, respectively), and the individuation problem will not then, it seems, arise. Propositions or propositionally structured entities will be individuated just as finely as we want them to be individuated, and at each level of resolution there will be facts or fact-like entities, individuated to the same resolution, for them to be identical with, if true.[15]
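The contrast driving the objection can be displayed schematically with the Hesperus/Phosphorus case (an illustrative regimentation on the hybrid theorist’s behalf, taking Russellian items to be built from worldly objects and properties, and Fregean items from senses):

\[
\begin{aligned}
\textit{reference level:}\quad & \langle \textrm{Hesperus},\ \textrm{brightness}\rangle = \langle \textrm{Phosphorus},\ \textrm{brightness}\rangle \quad \text{(since Hesperus = Phosphorus)}\\
\textit{sense level:}\quad & \textrm{the thought that Hesperus is bright} \neq \textrm{the thought that Phosphorus is bright}
\end{aligned}
\]

On the hybrid view, a true coarse-grained item at the level of reference is identical with a correspondingly coarse-grained fact, and a true fine-grained thought at the level of sense with a correspondingly fine-grained fact-like entity, so neither grain of individuation is left without a factual partner.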


Deflationism About Truth


1. Central Themes in Deflationism

1.1 The Equivalence Schema

While deflationism can be developed in different ways, it is possible to isolate some central themes emphasized by most philosophers who think of themselves as deflationists. These shared themes pertain to endorsing a kind of metaphysical parsimony and positing a “deflated” role for what we can call the alethic locutions (most centrally, the expressions ‘true’ and ‘false’) in the instances of what is often called truth-talk. In this section, we will isolate three of these themes. The first, and perhaps most overarching, one has already been mentioned: According to deflationists, there is some strong equivalence between a statement like ‘snow is white’ and a statement like “‘snow is white’ is true,” and this is all that can significantly be said about that application of the notion of truth. We may capture this idea more generally with the help of a schema, what is sometimes called the equivalence schema:

(ES) \(\langle p\rangle\) is true if and only if \(p\).

In this schema, the angle brackets indicate an appropriate name-forming or nominalizing device, e.g., quotation marks or ‘the proposition that …’, and the occurrences of ‘\(p\)’ are replaced with matching declarative sentences to yield instances of the schema. The equivalence schema is often associated with the formal work of Alfred Tarski (1935 [1956], 1944), which introduced the schema,

(T) \(X\) is true if and only if \(p\).

In the instances of schema (T) (sometimes called “Convention (T)”), the ‘\(X\)’ gets filled in with a name of the sentence that goes in for the ‘\(p\)’, making (T) a version of (ES). Tarski considered (T) to provide a criterion of adequacy for any theory of truth, thereby allowing that there could be more to say about truth than what the instances of the schema cover. Given that, together with the fact that he took the instances of (T) to be contingent, his theory does not qualify as deflationary. By contrast with the Tarskian perspective on (T)/(ES), we can formulate the central theme of deflationism under consideration as the view, roughly, that the instances of (some version of) this schema do capture everything significant that can be said about applications of the notion of truth; in a slogan, the instances of the schema exhaust the notion of truth. Approaches which depart from deflationism don’t disagree that (ES) tells us something about truth; what they (with Tarski) deny is that it is exhaustive, that it tells us the whole truth about truth. Since such approaches add substantive explanations of why the instances of the equivalence schema hold, they are now often called inflationary approaches to truth. Inflationism is the general approach shared by such traditional views as the correspondence theory of truth, coherence theory of truth, pragmatic theory of truth, identity theory of truth, and primitivist theory of truth. These theories all share a collection of connected assumptions about the alethic locutions, the concept of truth, and the property of truth. Inflationary theories all assume that the expression ‘is true’ is a descriptive predicate, expressing an explanatory concept of truth, which determines a substantive property of truth. From that shared set of presuppositions, the various traditional inflationary theories then diverge from one another by providing different accounts of the assumed truth property. On inflationary views, the nature of the truth property explains why the instances of (ES) hold.
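The contingency point can be illustrated (an illustrative gloss, not Tarski’s own example). Since (T) concerns sentences, its instances depend on what the named sentence means:

\[
\text{‘Snow is white’ is true if and only if snow is white.}
\]

Had the sentence ‘Snow is white’ meant that grass is red, this biconditional would have been false; read as being about the sentence, then, the instance holds only contingently. This is one reason the propositionalist readings introduced below are taken to assert something stronger.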
Deflationary views, by contrast, reject some if not all of the standard assumptions that lead to inflationary theories, resisting at least their move to positing any substantive truth property. Instead, deflationists offer a different understanding of both the concept of truth and the functioning of the alethic locutions. A deflationist will take the instances of (ES) to be “conceptually basic and explanatorily fundamental” (Horwich 1998a, 21, n. 4; 50), or to be direct consequences of how the expression ‘true’ operates (cf. Quine 1970 [1986], Brandom 1988, and Field 1994a). It is important to notice that even among deflationists the equivalence schema may be interpreted in different ways, and this is one way to distinguish different versions of deflationism from one another. One question about (ES) concerns the issue of what instances of the schema are assumed to be about (equivalently: to what the names in instances of (ES) are assumed to refer). According to one view, the instances of this schema are about sentences, where a name for a sentence can be formulated simply by putting quotation marks around it. In other words, for those who hold what might be called a sententialist version of deflationism, the equivalence schema has instances like (1):

(1) ‘Brutus killed Caesar’ is true if and only if Brutus killed Caesar.

To make this explicit, we might say that, according to sententialist deflationism, the equivalence schema is:

(ES-sent) ‘\(p\)’ is true if and only if \(p\).

Notice that in this schema, the angle-brackets of (ES) have been replaced by quotation marks. According to those who hold what might be called a propositionalist version of deflationism, by contrast, instances of the equivalence schema are about propositions, where names of propositions are, or can be taken to be, expressions of the form ‘the proposition that \(p\)’, where ‘\(p\)’ is filled in with a declarative sentence. For the propositionalist, in other words, instances of the equivalence schema are properly interpreted not as being about sentences but instead as being about propositions, i.e., as biconditionals like (2) rather than (1):

(2) The proposition that Brutus killed Caesar is true if and only if Brutus killed Caesar.

To make this explicit, we might say that, according to propositionalist deflationism, the equivalence schema is:

(ES-prop) The proposition that \(p\) is true if and only if \(p\).

Interpreting the equivalence schema as (ES-sent) rather than as (ES-prop), or vice versa, thus yields different versions of deflationism, sententialist and propositionalist versions, respectively. Another aspect that different readings of (ES) can vary across concerns the nature of the equivalence that its instances assert. On one view, the right-hand side and the left-hand side of such instances are synonymous or analytically equivalent. Thus, for sententialists who endorse this level of equivalence, (1) asserts that “‘Brutus killed Caesar’ is true” means just what ‘Brutus killed Caesar’ means; while for propositionalists who endorse analytic equivalence, (2) asserts that ‘the proposition that Brutus killed Caesar is true’ means the same as ‘Brutus killed Caesar’. A second view is that the right-hand and left-hand sides of claims such as (1) and (2) are not synonymous but are nonetheless necessarily equivalent; this view maintains that the two sides of each equivalence stand or fall together in every possible world, despite having different meanings. And a third possible view is that claims such as (1) and (2) assert only a material equivalence; this view interprets the ‘if and only if’ in both (1) and (2) as simply the biconditional of classical logic.
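The three strengths of equivalence can be set out schematically (a rough regimentation; the box operator is one natural way of rendering necessary equivalence):

\[
\begin{aligned}
\textit{analytic:}\quad & \ulcorner\langle p\rangle \text{ is true}\urcorner \text{ and } \ulcorner p\urcorner \text{ are synonymous}\\
\textit{necessary:}\quad & \Box(\langle p\rangle \text{ is true} \leftrightarrow p)\\
\textit{material:}\quad & \langle p\rangle \text{ is true} \leftrightarrow p
\end{aligned}
\]

Each strength entails the next: synonymy guarantees necessary equivalence, and necessary equivalence guarantees material equivalence, but not conversely.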
This tripartite distinction between analytic, necessary, and material equivalence, when combined with the distinction between sententialism and propositionalism, yields six different possible (although not exhaustive) readings of the instances of (ES):

\[
\begin{array}{l|cc}
 & \textbf{sententialist} & \textbf{propositionalist}\\
\hline
\text{analytic} & \mathbf{A} & \mathbf{B}\\
\text{necessary} & \mathbf{C} & \mathbf{D}\\
\text{material} & \mathbf{E} & \mathbf{F}
\end{array}
\]

While different versions of deflationism can be correlated to some extent with different positions in this chart, some chart positions have also been occupied by more than one version of deflationism. The labels ‘redundancy theory’, ‘disappearance theory’ and ‘no-truth theory’ have been applied to analytic versions of deflationism: positions \(\mathbf{A}\) or \(\mathbf{B}\). But there is a sense in which position \(\mathbf{A}\) is also occupied by versions of what is called “disquotationalism” (although the most prominent disquotationalists tend to be leery of the notions of analyticity or synonymy), and what is called “prosententialism” also posits an equivalence of what is said with the left- and right-hand sides of the instances of (ES). The latter version of deflationism, however, does this without making the left-hand sides about sentences named via quotation or about propositions understood as abstract entities. No deflationist has offered an account occupying position \(\mathbf{C}\), \(\mathbf{E}\), or \(\mathbf{F}\) (although the explicit inspiration some disquotationalists have found in Tarski’s work and his deployment of material equivalence might misleadingly suggest position \(\mathbf{E}\)). Paul Horwich (1998a) uses the label ‘minimalism’ for a version of propositionalist deflationism that takes the instances of (ES-prop) to involve a necessary equivalence, thereby occupying position \(\mathbf{D}\). To a large extent, philosophers prefer one or another (or none) of the positions in the chart on the basis of their views from other parts of philosophy, typically their views about the philosophy of language and metaphysics.

1.2 The Property of Truth

The second theme we will discuss focuses on the fact that when we say, for example, that the proposition that Brutus killed Caesar is true, we seem to be attributing a property to that proposition, namely, the property of being true. Deflationists are typically wary of that claim, insisting either that there is no property of being true at all, or that, if there is one, it is of a certain kind, often called “thin” or “insubstantial”. The suggestion that there is no truth property at all is advanced by some philosophers in the deflationary camp; we will look at some examples below. What makes this position difficult to sustain is that ‘is true’ is, grammatically speaking, a predicate much like ‘is metal’. If one assumes that grammatical predicates such as ‘is metal’ express properties, then, prima facie, the same would seem to go for ‘is true’. This point is not decisive, however. For one thing, it might be possible to distinguish the grammatical form of claims containing ‘is true’ from their logical form; at the level of logical form, it might be, as prosententialists maintain, that ‘is true’ is not a predicate. For another, nominalists about properties have developed ways of thinking about grammatical predicates according to which these expressions don’t express properties at all. A deflationist might appeal, perhaps selectively, to such proposals, in order to say that ‘is true’, while a predicate, does not express a property.
Whatever the ultimate fate of these attempts to say that there is no property of truth may be, a suggestion among certain deflationists has been to concede that there is a truth property but to deny that it is a property of a certain kind; in particular, to deny that it is (as we will say) a substantive property. To illustrate the general idea, consider (3) and (4):

(3) Caracas is the capital of Venezuela.
(4) The earth revolves around the sun.

Do the propositions that these sentences express share a property of being true? Well, in one intuitive sense they do: Since they both are true, we might infer that they both have the property of being true. From this point of view, there is a truth property: It is simply the property that all true propositions have. On the other hand, when we say that two things share a property of Fness, we often mean more than simply that they are both \(F\). We often mean that two things that are \(F\) have some underlying nature in common, for example, that there is a common explanation as to why they are both \(F\). It is this second claim that deflationists have in mind when they say that truth is not a substantive property. Thus, in the case of our example, what, if anything, explains the truth of (3) is that Caracas is the capital of Venezuela, and what explains this is the political history of Venezuela. On the other hand, what, if anything, explains the truth of (4) is that the earth revolves around the sun, and what explains this is the physical nature of the solar system. The physical nature of the solar system, however, has nothing to do with the political history of Venezuela (or if it does, the connections are completely accidental!) and to that extent there is no shared explanation as to why (3) and (4) are both true. Therefore, in this substantive sense, they have no property in common. It will help to bring out the contrast being invoked here if we consider two properties distinct from a supposed property of being true: the property of being a game and the property of being a mammal. Consider the games of chess and catch. Do both of these have the property of being a game? Well, in one sense, they do: they are both games that people can play. On the other hand, however, there is no common explanation as to why each counts as a game (cf. Wittgenstein 1953, §66). We might then say that being a game is not a substantive property and mean just this. But now compare the property of being a mammal. If two things are mammals, they have the property of being a mammal, but in addition there is some common explanation as to why they are both mammals – both are descended from the same family of creatures, say. According to one development of deflationism, the property of being true is more like the property of being a game than it is like the property of being a mammal. The comparisons between being true, being a game, and being a mammal are suggestive, but they still do not nail down exactly what it means to say that truth is not a substantive property. The contemporary literature on deflationism contains several different approaches to the idea. One such approach, which we will consider in detail in Section 4.1, involves denying that truth plays an explanatory role. Another approach, pursuing an analogy between being true and existing, describes truth as a “logical property” (for example, Field 1992, 322; Horwich 1998a, 37; Künne 2003, 91).
A further approach appeals to David Lewis’s (1983, 1986) view that, while every set of entities underwrites a property, there is a distinction between sparse, or natural, properties and more motley or disjointed abundant properties. On this approach, a deflationist might say that there is an abundant property of being true rather than a sparse one (cf. Edwards 2013, Asay 2014, Kukla and Winsberg 2015, and Armour-Garb forthcoming). A different metaphysical idea may be to appeal to the contemporary discussion of grounding and the distinction between groundable and ungroundable properties. In this context, a groundable property is one that is capable of being grounded in some other property, whether or not it is in fact grounded; an ungroundable property is a property that is not groundable (see Dasgupta 2015, 2016 and Rabin 2020). From this point of view, a deflationist might say that being true is an ungroundable property. Hence it is unlike ordinary, sparse/natural properties, such as being iron, which are both capable of being grounded and are grounded, and it is also unlike fundamental physical properties, such as being a lepton, which are capable of being grounded (in some other possible world) but are not (actually) grounded. We will not try to decide here which of these different views of properties is correct but simply note that deflationists who want to claim that there is a truth property, just not a substantive one, have options for explaining what this means.

1.3 The Utility of the Concept of Truth

In light of the two central ideas discussed so far – the idea that the equivalence schema is exhaustive of the notion of truth and the idea that there is no substantive truth property – you might wonder why we have a concept of truth in the first place. Contrast the concept of a mammal: a natural suggestion is that we have that concept because it allows us to think and talk about mammals and to develop theories of them. For deflationism, however, as we have just seen, being true is completely different from being a mammal; why then do we have a concept of truth? (An analogous question might be asked about the word ‘true’, i.e., why we have the word ‘true’ and related words in our language at all. In the following discussion we will not discriminate between questions about the concept of truth and questions about the word ‘true’ and will move back and forth between them.) The question of why we have the concept of truth allows us to introduce a third central theme in deflationism, which is an emphasis not merely on the property of truth but on the concept of truth, or, equivalently for present purposes, on the word ‘true’ (cf. Leeds 1978). Far from supposing that there is no point in having the concept of truth, deflationists are usually at pains to point out that anyone who has the concept of truth is in possession of a very useful concept indeed; in particular, anyone who has this concept is in a position to express generalizations that would otherwise require non-standard logical devices, such as sentential variables and quantifiers for them. Suppose, for example, that Jones for whatever reason decides that Smith is an infallible guide to the nature of reality. We might then say that Jones believes everything that Smith says. To say this much, however, is not to capture the content of Jones’s belief.
In order to do that we need some way of generalizing on the embedded sentence positions in a claim like:

(5) If Smith says that birds are dinosaurs, then birds are dinosaurs.

To generalize on the relationship indicated in (5), beyond just what Smith says about birds to anything she might say, what we want to do is generalize on the embedded occurrences of ‘birds are dinosaurs’. So, we need a (declarative) sentential variable, ‘\(p\)’, and a universal quantifier governing it. What we want is a way of capturing something along the lines of

(6) For all \(p\), if Smith says that \(p\), then \(p\).

The problem is that we cannot formulate this in English with our most familiar way of generalizing because the ‘\(p\)’ in the consequent is in a sentence-in-use position, rather than a mentioned or nominalized context (as it is in the antecedent), meaning that this formal variable cannot be replaced with a familiar English object-variable expression, e.g., ‘it’. This is where the concept of truth comes in. What we do in order to generalize in the way under consideration is employ the truth predicate with an object variable to produce the sentence,

(7) For every \(x\), if Smith says \(x\), then \(x\) is true.

Re-rendering the quasi-formal (7) into natural language yields,

(8) Everything is such that, if Smith says it, then it is true.

Or, to put the same thing more colloquially:

(9) Everything Smith says is true.

The equivalence schema (ES-prop) allows us to use (7) (and therefore (9)) to express what it would otherwise require the unstatable (6) to express. For, on the basis of the schema, there is always an equivalence between whatever goes in for a sentence-in-use occurrence of the variable ‘\(p\)’ and a context in which that filling of the sentential variable is nominalized. This reveals how the truth predicate can be used to provide a surrogate for sentential variables, simulating this non-standard logical device while still deploying the standard object variables already available in ordinary language (‘it’) and the usual object quantifiers (‘everything’) that govern them. This is how the use of the truth predicate in (9) gives us the content of Jones’s belief. And the important point for deflationists is that we could not have stated the content of this belief unless we had the concept of truth (the expression ‘true’). In fact, for most deflationists, it is this feature of the concept of truth – its role in the formation of these sorts of generalizations – that explains why we have a concept of truth at all. This is, as it is often put, the raison d’être of the concept of truth (cf. Field 1994a and Horwich 1998a).

2. History of Deflationism

According to Michael Dummett (1959 [1978]), deflationism originates with Gottlob Frege, as expressed in this famous passage:

It is … worthy of notice that the sentence ‘I smell the scent of violets’ has just the same content as the sentence ‘It is true that I smell the scent of violets’. So it seems, then, that nothing is added to the thought by my ascribing to it the property of truth. (Frege 1918, 6)

This passage suggests that Frege embraces a deflationary view in position \(\mathbf{B}\) (in the chart above), namely, an analytic propositionalist version of deflationism. But this interpretation of his view is not so clear. As Scott Soames (1999, 21ff) points out, Frege (ibid.) distinguishes what we will call “opaque” truth ascriptions, like ‘My conjecture is true’, from transparent truth-ascriptions, like the one mentioned in the quote from Frege. Unlike with transparent cases, in opaque instances, one cannot simply strip ‘is true’ away and obtain an equivalent sentence, since the result is not even a sentence at all.
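The transparent/opaque contrast can be put schematically (an illustrative regimentation using a sentential variable, anticipating Ramsey’s move below):

\[
\begin{aligned}
\textit{transparent:}\quad & \text{It is true that } p \;\equiv\; p\\
\textit{opaque:}\quad & \text{My conjecture is true} \;\equiv\; \exists p\,(\text{my conjecture} = \langle p\rangle \wedge p)
\end{aligned}
\]

In the opaque case there is no embedded sentence available to be “unstripped”, so eliminating ‘is true’ requires quantification into sentence position of just the sort discussed in Section 1.3.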
Frank Ramsey is the first philosopher to have suggested a position like \(\mathbf{B}\) (although he does not really accept propositions as abstract entities (see Ramsey 1927 (34–5) and 1929 (7)), despite sometimes talking in terms of propositions):

Truth and falsity are ascribed primarily to propositions. The proposition to which they are ascribed may be either explicitly given or described. Suppose first that it is explicitly given; then it is evident that ‘It is true that Caesar was murdered’ means no more than that Caesar was murdered, and ‘It is false that Caesar was murdered’ means no more than Caesar was not murdered. …. In the second case in which the proposition is described and not given explicitly we have perhaps more of a problem, for we get statements from which we cannot in ordinary language eliminate the words ‘true’ or ‘false’. Thus if I say ‘He is always right’, I mean that the propositions he asserts are always true, and there does not seem to be any way of expressing this without using the word ‘true’. But suppose we put it thus ‘For all \(p\), if he asserts \(p\), \(p\) is true’, then we see that the propositional function \(p\) is true is simply the same as \(p\), as e.g. its value ‘Caesar was murdered is true’ is the same as ‘Caesar was murdered’. (Ramsey 1927, 38–9)

On Ramsey’s redundancy theory (as it is often called), the truth operator, ‘it is true that’, adds no content when prefixed to a sentence, meaning that in the instances of what we can think of as the truth-operator version of (ES)—call it (ES-op)—the left- and right-hand sides are meaning-equivalent. But Ramsey extends his redundancy theory beyond just the transparent instances of truth-talk, maintaining that the truth predicate is, in principle, eliminable even in opaque ascriptions of the form ‘\(B\) is true’ (which he (1929, 15, n. 7) explains in terms of sentential variables via a formula along the lines of ‘\(\exists p\) (\(p \amp B\) is a belief that \(p\))’) and in explicitly quantificational instances, like ‘Everything Einstein said is true’ (explained as above). As the above quote illustrates, Ramsey recognizes that in truth ascriptions like these the truth predicate fills a grammatical need, which keeps us from eliminating it altogether, but he held that even in these cases it contributes no content to anything said using it. A.J. Ayer endorses a view similar to Ramsey’s. The following quote shows that he embraces a meaning equivalence between the two sides of the instances of both the sentential (position \(\mathbf{A}\)) and something like (since, despite his use of the expression ‘proposition’ to mean sentence, he also considers instances of truth-talk involving the prefix ‘it is true that’, which could be read as employing ‘that’-clauses) the propositional (position \(\mathbf{B}\)) version of (ES).

[I]t is evident that in a sentence of the form “\(p\) is true” or “it is true that \(p\)” the reference to truth never adds anything to the sense. If I say that it is true that Shakespeare wrote Hamlet, or that the proposition “Shakespeare wrote Hamlet” is true, I am saying no more than that Shakespeare wrote Hamlet. Similarly, if I say that it is false that Shakespeare wrote the Iliad, I am saying no more than that Shakespeare did not write the Iliad. And this shows that the words ‘true’ and ‘false’ are not used to stand for anything, but function in the sentence merely as assertion and negation signs. That is to say, truth and falsehood are not genuine concepts.
Consequently, there can be no logical problem concerning the nature of truth. (Ayer 1935, 28. Cf. Ayer 1936 [1952, 89])

Ludwig Wittgenstein, under Ramsey’s influence, makes claims with strong affinities to deflationism in his later work. We can see a suggestion of an endorsement of deflationary positions \(\mathbf{A}\) or \(\mathbf{B}\) in his (1953, §136) statement that “\(p\) is true \(= p\)” and “\(p\) is false = not-\(p\)”, indicating that ascribing truth (or falsity) to a statement just amounts to asserting that very proposition (or its negation). Wittgenstein also expresses this kind of view in manuscripts from the 1930s, where he claims, “What he says is true = Things are as he says” and “[t]he word ‘true’ is used in contexts such as ‘What he says is true’, but that says the same thing as ‘He says “\(p\)”, and \(p\) is the case’” (Wittgenstein 1934 [1974, 123] and 1937 [2005, 61], respectively). Peter Strawson’s views on truth emerge most fully in his 1950 debate with J.L. Austin. In keeping with deflationary position \(\mathbf{B}\), Strawson (1950, 145–7) maintains that an utterance of ‘It is true that \(p\)’ just makes the same statement as an utterance of ‘\(p\)’. However, in Strawson 1949 and 1950, he further endorses a performative view, according to which an utterance of a sentence like ‘That is true’ mainly functions to do something beyond mere re-assertion. This represents a shift to an account of what the expression ‘true’ does, from traditional accounts of what truth is, or even accounts of what ‘true’ means. Another figure briefly mentioned above who looms large in the development of deflationism is Alfred Tarski, with his (1935 [1956] and 1944) identification of a precise criterion of adequacy for any formal definition of truth: its implying all of the instances of what is sometimes called “Convention (T)” or “the (T)-schema”,

(T) \(X\) is true if and only if \(p\).

To explain this schema a bit more precisely, in its instances the ‘\(X\)’ gets replaced by a name of a sentence from the object-language for which the truth predicate is being defined, and the ‘\(p\)’ gets replaced by a sentence that is a translation of that sentence into the meta-language in which the truth predicate is being defined. For Tarski, the ‘if and only if’ deployed in any instance of (T) expresses just a material equivalence, putting his view at position \(\mathbf{E}\) in the chart from Section 1.1. Although this means that Tarski is not a deflationist himself (cf. Field 1994a, Ketland 1999, and Patterson 2012), there is no denying the influence that his work and its promotion of the (T)-schema have had on deflationism. Indeed, some early deflationists, such as W.V.O. Quine and Stephen Leeds, are quite explicit about taking inspiration from Tarski’s work in developing their “disquotational” views, as is Horwich in his initial discussion of deflationism. Even critics of deflationism have linked it with Tarski: Hilary Putnam (1983b, 1985) identifies deflationists as theorists who “refer to the work of Alfred Tarski and to the semantical conception of truth” and who take Tarski’s work “as a solution to the philosophical problem of truth”. The first fully developed deflationary view is the one that Quine (1970 [1986, 10–2]) presents. Given his skepticism about the existence of propositions, Quine takes sentences to be the primary entities to which ‘is true’ may be applied, making the instances of (ES-sent) the equivalences that he accepts.
He defines a category of sentence that he dubs “eternal”, viz., sentence types that have all their indexical/contextual factors specified, the tokens of which always have the same truth-values. It is for these sentences that Quine offers his disquotational view. As he (ibid., 12) puts it, This cancellatory force of the truth predicate is explicit in Tarski’s paradigm: ‘Snow is white’ is true if and only if snow is white. Quotation marks make all the difference between talking about words and talking about snow. The quotation is a name of a sentence that contains the name, namely ‘snow’, of snow. By calling the sentence true, we call snow white. The truth predicate is a device of disquotation. As this quote suggests, Quine sees Tarski’s formal work on defining truth predicates for formalized languages and his criterion of adequacy for doing so as underwriting a disquotational analysis of the truth predicate. This makes Quine’s view a different kind of position-\(\mathbf{A}\) account, since he takes the left-hand side of each instance of (ES-sent) to be, as we will put it (since Quine rejects the whole idea of meaning and meaning equivalence), something like a mere syntactic variant of the right-hand side. This also means that Quine’s version of deflationism departs from inflationism by rejecting the latter’s presupposition that truth predicates function to describe the entities they get applied to, the way that other predicates, such as ‘is metal’, do. Quine also emphasizes the importance of the truth predicate’s role as a means for expressing the kinds of otherwise inexpressible generalizations discussed in Section 1.3. As he (1992, 80–1) explains it, The truth predicate proves invaluable when we want to generalize along a dimension that cannot be swept out by a general term … The harder sort of generalization is illustrated by generalization on the clause ‘time flies’ in ‘If time flies then time flies’…. We could not generalize as in ‘All men are mortal’ because ‘time flies’ is not, like ‘Socrates’, a name of one of a range of objects (men) over which to generalize. We cleared this obstacle by semantic ascent: by ascending to a level where there were indeed objects over which to generalize, namely linguistic objects, sentences. So, if we want to generalize on embedded sentence-positions within some sentences, “we ascend to talk of truth and sentences” (Quine 1970 [1986, 11]). This maneuver allows us to “affirm some infinite lot of sentences that we can demarcate only by talking about the sentences” (ibid., 12). Leeds (1978) (following Quine) makes it clear how the truth predicate is crucial for extending the expressive power of a language, despite the triviality that disquotationalism suggests for the transparent instances of truth-talk. He (ibid., 121) emphasizes the logical role of the truth predicate in the expression of certain kinds of generalizations that would otherwise be inexpressible in natural language. Leeds, like Quine, notes that a central utility of the truth predicate, in virtue of its yielding every instance of (ES-sent), is the simulation of quantification into sentence-positions. But, unlike Quine, Leeds glosses this logical role in terms of expressing potentially infinite conjunctions (for universal generalization) or potentially infinite disjunctions (for existential generalization). The truth predicate allows us to use the ordinary devices of first-order logic in ways that provide surrogates for the non-standard logical devices this would otherwise require. 
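Leeds’s gloss can be spelled out with the running example (an illustrative expansion, not a quotation from Leeds): the generalization (9), ‘Everything Smith says is true’, goes proxy for a potentially infinite conjunction of conditionals whose conjuncts, once disquoted, mention no truth property at all:

\[
(\text{Smith says ‘birds are dinosaurs’} \rightarrow \text{birds are dinosaurs}) \wedge (\text{Smith says ‘snow is white’} \rightarrow \text{snow is white}) \wedge \ldots
\]

Correspondingly, an existential generalization such as ‘Something Smith says is true’ would go proxy for the potentially infinite disjunction of the conjunctions ‘Smith says \(s\) and \(s\)’.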
Leeds is also clear about accepting the consequences of deflationism, that is, of taking the logically expressive role of the truth predicate to exhaust its function. In particular, he points out that there is no need to think that truth plays any sort of explanatory role. We will return to this point in Section 4.1. Dorothy Grover, Joseph Camp, and Nuel Belnap (1975) develop a different variety of deflationism that they call a “prosentential theory”. This theory descends principally from Ramsey’s views. In fact, Ramsey (1929, 10) made what is probably the earliest use of the term ‘pro-sentence’ in his account of the purpose of truth-talk. Prosentences are explained as the sentence-level analog of pronouns. As in the case of pronouns, prosentences inherit their content anaphorically from other linguistic items, in this case from some sentence typically called the prosentence’s “anaphoric antecedent” (although it need not actually occur before the prosentence). As Grover, et al. develop this idea, this content inheritance can happen in two ways. The most basic one is called “lazy” anaphora. Here the prosentence could simply be replaced with a repetition of its antecedent, as in the sort of case that Strawson emphasized, where one says “That is true” after someone else has made an assertion. According to Grover, et al., this instance of truth-talk is a prosentence that inherits its content anaphorically from the other speaker’s utterance, so that the two speakers assert the same thing. As a result, Grover, et al. would take the instances of (ES) to express meaning equivalences, but since they (ibid., 113–5) do not take the instances of truth-talk on the left-hand sides of these equivalences to say anything about any named entities, they would not read (ES) as either (ES-sent) or (ES-prop) on their standard interpretations. So, while their prosententialism is similar to views in position \(\mathbf{A}\) or in position \(\mathbf{B}\) in the chart above, it is also somewhat different from both. Grover, et al.’s project is to develop the theory “that ‘true’ can be thought of always as part of a prosentence” (ibid., 83). They explain that ‘it is true’ and ‘that is true’ are generally available prosentences that can go into any sentence-position. They consider these expressions to be “atomic” in the sense of not being susceptible to a subject-predicate analysis giving the ‘that’ or ‘it’ separate references (ibid., 91). Both of these prosentences can function in the “lazy” way, and Grover, et al. claim (ibid., 91–2, 114) that ‘it is true’ can also operate as a quantificational prosentence (i.e., a sentential variable), for example, in a re-rendering of a sentence like (9) in terms of a “long-form” equivalent claim, such as

(11) For each proposition, if Smith says that it is true, then it is true.

One immediate concern that this version of prosententialism faces pertains to what one might call the “paraphrastic gymnastics” that it requires. For example, a sentence like ‘It is true that humans are causing climate change’ is said to have for its underlying logical form the same form as ‘Humans are causing climate change. That is true’ (ibid., 94). As a result, when one utters an instance of truth-talk of the form ‘It is true that \(p\)’, one states the content of the sentence that goes in for ‘\(p\)’ twice. In cases of quotation, like “‘Birds are dinosaurs’ is true”, Grover, et al. offer the following rendering, ‘Consider: Birds are dinosaurs. That is true’ (ibid., 103).
But taking this as the underlying form of quotational instances of truth-talk requires rejecting the standard view that putting quotation marks around linguistic items forms names of those items. These issues raise concerns regarding the adequacy of this version of prosententialism.

3. The Varieties of Contemporary Deflationism

In this section, we explain the details of three prominent, contemporary accounts and indicate some concerns peculiar to each.

3.1 Minimalism

Minimalism is the version of deflationism that diverges the least from inflationism because it accepts many of the standard inflationary presuppositions, including that ‘is true’ is a predicate used to describe entities as having (or lacking) a truth property. What makes minimalism a version of deflationism is its denial of inflationism’s final assumption, namely, that the property expressed by the truth predicate has a substantive nature. Drawing inspiration from Leeds (1978), Horwich (1982, 182) actually coins the term ‘deflationism’ while describing “the deflationary redundancy theory which denies the existence of surplus meaning and contends that Tarski’s schema [“\(p\)” is true iff \(p\)] is quite sufficient to capture the concept.” Minimalism, Horwich’s mature deflationary position (1998a [First Edition, 1990]), adds to this earlier view. In particular, Horwich (ibid., 37, 125, 142) comes to embrace the idea that ‘is true’ does express a property, but it is merely a “logical property” (cf. Field 1992), rather than any substantive or naturalistic property of truth with an analyzable underlying nature (Horwich 1998a, 2, 38, 120–1). On the basis of natural language considerations, Horwich (ibid., 2–3, 39–40) holds that propositions are what the alethic locutions describe directly. Any other entities that we can properly call true are so only derivatively, on the basis of having some relation to true propositions (ibid., 100–1 and Horwich 1998b, 82–5). This seems to position Horwich well with respect to explaining the instances of truth-talk that cause problems for Quine and Leeds, e.g., those about beliefs and theories. Regarding truth applied directly to propositions, however, Horwich (1998a, 2–3) still explicitly endorses the thesis that Leeds emphasizes about the utility of the truth predicate (and, Horwich adds, the concept it expresses), namely, that it “exists solely for the sake of a certain logical need”. While Horwich (ibid., 138–9) goes so far as to claim that the concept of truth has a “non-descriptive” function, he does not follow Quine and Leeds all the way to their rejection of the assumption that the alethic predicates function to describe truth-bearers. Rather, his (ibid., 31–3, 37) point of agreement with them is that the main function of the truth predicate is its role in providing a means for generalizing on embedded sentence positions, rather than some role in the indication of specifically truth-involving states of affairs. Even so, Horwich (ibid., 38–40) still contends that the instances of truth-talk do describe propositions, in the sense that they make statements about them, and they do so by attributing a property to those propositions.
The version of (ES) that Horwich (1998a, 6) makes the basis of his theory is what he also calls “the equivalence schema”,

(E) It is true that \(p\) if and only if \(p\).

Since he takes truth-talk to involve describing propositions with a predicate, Horwich considers ‘it is true that \(p\)’ to be just a trivial variant of ‘The proposition that \(p\) is true’, meaning that his (E) is a version of (ES-prop) rather than of Ramsey’s (ES-op). He also employs the notation ‘\(\langle p\rangle\)’ as shorthand specifically for ‘the proposition that \(p\)’, generating a further rendering of his equivalence schema (ibid., 10) that we can clearly recognize as a version of (ES-prop), namely

(E) \(\langle p\rangle\) is true if and only if \(p\).

Horwich considers the instances of (E) to constitute the axioms of both an account of the property of truth and an account of the concept of truth, i.e., what is meant by the word ‘true’ (ibid., 136). According to minimalism, the instances of (E) are explanatorily fundamental, which Horwich suggests is a reason for taking them to be necessary (ibid., 21, n. 4). This, combined with his view that the equivalence schema applies to propositions, places his minimalism in position \(\mathbf{D}\) in the chart given in Section 1.1. The instances of (ES-prop) are thus explanatory of the functioning of the truth predicate (of its role as a de-nominalizer of ‘that’-clauses (ibid., 5)), rather than being explained by that functioning (as the analogous equivalences are for both disquotationalism and prosententialism). Moreover, Horwich (ibid., 50, 138) claims that they are also conceptually basic and a priori. He (ibid., 27–30, 33, 112) denies that truth admits of any sort of explicit definition or reductive analysis in terms of other concepts, such as reference or predicate-satisfaction. In fact, Horwich (ibid., 10–1, 111–2, 115–6) holds that these other semantic notions should both be given their own, infinitely axiomatized, minimalist accounts, which would then clarify the non-reductive nature of the intuitive connections between them and the notion of truth. Horwich (ibid., 27–30) maintains that the infinite axiomatic nature of minimalism is unavoidable. He (ibid., 25) rejects the possibility of a finite formulation of minimalism via the use of substitutional quantification. On the usual understanding of this non-standard type of quantification, the quantifiers govern variables that serve to mark places in linguistic strings, indicating that either all or some of the elements of an associated substitution class of linguistic items of a particular category can be substituted in for the variables. Since it is possible for the variables so governed to take sentences as their substitution items, this allows for a type of quantification governing sentence positions in complex sentences. Using this sort of sentential substitutional quantification, the thought is, one can formulate a finite general principle that expresses Horwich’s account of truth as follows:

(GT) \(\forall x\) (\(x\) is true if and only if \(\Sigma p\) (\(x = \langle p\rangle\) and \(p\))),

where ‘\(\Sigma\)’ is the existential substitutional quantifier. (GT) is formally equivalent to the formulation that Marian David (1994, 100) presents as disquotationalism’s definition of ‘true sentence’, here formulated for propositions instead.
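To see how (GT) is meant to work, consider how it unpacks for a particular proposition (an illustrative reduction, on the usual reading of the substitutional quantifier, and assuming propositions are individuated finely enough that \(\langle\)snow is white\(\rangle = \langle p\rangle\) only for the substituend ‘snow is white’):

\[
\langle \text{snow is white}\rangle \text{ is true} \;\leftrightarrow\; \Sigma p\,(\langle \text{snow is white}\rangle = \langle p\rangle \text{ and } p) \;\leftrightarrow\; \text{snow is white.}
\]

The worry Horwich presses is already visible here: saying what the middle formula expresses seems to require saying that some substitution instance comes out true, which reintroduces the very notion being defined.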
Horwich’s main reason for rejecting the proposed finite formulation of minimalism, (GT), is that an account of substitutional quantifiers seems (contra David 1994, 98–9) to require an appeal to truth (since the quantifiers are explained as expressing that at least one or that every item in the associated substitution class yields a true sentence when substituted in for the governed variables), generating circularity concerns (Horwich 1998a, 25–6). Moreover, on Horwich’s (ibid., 4, n. 1; cf. 25, 32–3) understanding, the point of the truth predicate is to provide a surrogate for substitutional quantification and sentence-variables in natural language, so as “to achieve the effect of generalizing substitutionally over sentences … but by means of ordinary [quantifiers and] variables (i.e., pronouns), which range over objects” (italics original). Horwich maintains that the infinite “list-like” nature of minimalism poses no problem for the view’s adequacy with respect to explaining all of our uses of the truth predicate, and the bulk of Horwich 1998a attempts to establish just that. However, Anil Gupta (1993a, 365) has pointed out that minimalism’s infinite axiomatization in terms of the instances of (E) for every (non-paradox-inducing) proposition makes it maximally ideologically complex, in virtue of involving every other concept. (Moreover, the overtly “fragmented” nature of the theory also makes it particularly vulnerable to the Generalization Problem that Gupta has raised, which we discuss in Section 4.5, below.) Christopher Hill (2002) attempts to deal with some of the problems that Horwich’s view faces, by presenting a view that he takes to be a newer version of minimalism, replacing Horwich’s equivalence schema with a universally quantified formula, employing a kind of substitutional quantification to provide a finite definition of ‘true thought (proposition)’. Hill’s (ibid., 22) formulation of his account,

\(\forall x\) (\(x\) is a true thought if and only if \(\Sigma p\) (\(x\) = the thought that \(p\), and \(p\))),

is formally similar to the formulation of minimalism in terms of (GT) that Horwich rejects, but to avoid the circularity concerns driving that rejection, Hill’s (ibid., 18–22) idea is to offer introduction and elimination rules in the style of Gerhard Gentzen (1935 [1969]) as a means for defining the substitutional quantifiers. Horwich (1998a, 26) rejects even this inference-rule sort of approach, but he directs his critique against defining linguistic substitutional quantification this way. Hill takes his substitutional quantifiers to apply to thoughts (propositions) instead of sentences. But serious concerns have been raised regarding the coherence of this non-linguistic notion of substitutional quantification (cf. David 2006, Gupta 2006b, Simmons 2006). As a result, it is unclear that Hill’s account is an improvement on Horwich’s version of minimalism.

3.2 Disquotationalism

Like minimalism, disquotationalism agrees with inflationary accounts of truth that the alethic locutions function as predicates, at least logically speaking. However, as we explained in discussing Quine’s view in Section 2, disquotationalism diverges from inflationary views (and minimalism) at their shared assumption that these (alethic) predicates serve to describe the entities picked out by the expressions with which they are combined, specifically as having or lacking a certain property. Although Quine’s disquotationalism is inspired by Tarski’s recursive method for defining a truth predicate, that method is not what Quine’s view emphasizes.
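The contrast between the two Tarskian inheritances might be illustrated as follows (a textbook-style sketch, not Tarski’s own formulation): the recursive method defines truth via compositional clauses on sentence structure, such as

\[
\begin{aligned}
&\mathrm{True}(\ulcorner \neg A\urcorner) \leftrightarrow \neg\,\mathrm{True}(\ulcorner A\urcorner)\\
&\mathrm{True}(\ulcorner A \wedge B\urcorner) \leftrightarrow \mathrm{True}(\ulcorner A\urcorner) \wedge \mathrm{True}(\ulcorner B\urcorner),
\end{aligned}
\]

with the (T)-biconditionals emerging as theorems, whereas (T)-schema disquotationalism, discussed next, takes the biconditionals themselves as the basic “partial definitions” of truth.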
Field’s contemporary disquotationalism further departs from that aspect of Tarski’s work by looking directly to the instances of the (T)-schema that the recursive method must generate in order to satisfy Tarski’s criterion of material adequacy. Tarski himself (1944, 344–5) suggests at one point that each instance of (T) could be considered a “partial definition” of truth and considers (but ultimately rejects; see Section 4.5) the thesis that a logical conjunction of all of these partial definitions amounts to a general definition of truth (for the language that the sentences belonged to). Generalizing slightly from Tarski, we can call this alternative approach “(T)-schema disquotationalism”, in contrast with the Tarski-inspired approach that David (1994, 110–1) calls “recursive disquotationalism”. Field (1987, 1994a) develops a version of (T)-schema disquotationalism that he calls “pure disquotational truth”, focusing specifically on the instances of his preferred version of (ES), the “disquotational schema” (Field 1994a, 258),

(T/ES-sent) ‘\(p\)’ is true if and only if \(p\).

Similar to the “single principle” formulation, (GT), rejected by Horwich (but endorsed by Hill), Field (ibid., 267) allows that one could take a “generalized” version of (T/ES-sent), prefixed with a universal substitutional quantifier, ‘\(\Pi\)’, as having axiomatic status, or one could incorporate schematic sentence variables directly into one’s theorizing language and reason directly with (T/ES-sent) as a schema (cf. ibid., 259). Either way, in setting out his version of deflationism, Field (ibid., 250), in contrast with Horwich, does not take the instances of his version of (ES) as fundamental but instead as following from the functioning of the truth predicate. On Field’s reading of (T/ES-sent), the use of the truth predicate on the left-hand side of an instance does not add any cognitive content beyond that which the mentioned utterance has (for the speaker) on its own when used (as on the right-hand-side of (T/ES-sent)). As a result, each instance of (T/ES-sent) “holds of conceptual necessity, that is, by virtue of the cognitive equivalence of the left and right hand sides” (ibid., 258). This places Field’s deflationism also in position \(\mathbf{A}\) in the chart from Section 1.1. Following Leeds and Quine, Field (1999, 533–4) sees the central utility of a purely disquotational truth predicate to be providing for the expression of certain “fertile generalizations” that cannot be made without using the truth predicate but which do not really involve the notion of truth. Field (1994a, 264) notes that the truth predicate plays “an important logical role: it allows us to formulate certain infinite conjunctions and disjunctions that can’t be formulated otherwise [n. 17: at least in a language that does not contain substitutional quantifiers]”. Field’s disquotationalism addresses some of the worries that arose for earlier versions of this variety of deflationism, due to their connections with Tarski’s method of defining truth predicates. It also explains how to apply a disquotational truth predicate to ambiguous and indexical utterances, thereby going beyond Quine’s (1970 [1986]) insistence on taking eternal sentences as the subjects of the instances of (ES-sent) (cf. Field 1994a, 278–81). So, Field’s view addresses some of the concerns that David (1994, 130–66) raises for disquotationalism. However, an abiding concern about this variety of deflationism is that it is an account of truth as applied specifically to sentences.
This opens the door to a version of the complaint that Strawson (1950) makes against Austin’s account of truth, that it is not one’s act of stating [here: the sentence one utters] but what thereby gets stated that is the target of a truth ascription. William Alston (1996, 14) makes a similar point. While disquotationalists do not worry much about this, this scope restriction might strike others as problematic because it raises questions about how we are to understand truth applied to beliefs or judgments, something that Hill (2002) worries about. Field (1978) treats beliefs as mental states relating thinkers to sentences (of a language of thought). But David (1994, 172–7) raises worries for applying disquotationalism to beliefs, even in the context of an account like Field’s. The view that we believe sentences remains highly controversial, but it is one that, it seems, a Field-style disquotationalist must endorse. Similarly, such disquotationalists must take scientific theories to consist of sets of sentences, in order for truth to be applicable to them. This too runs up against Strawson’s complaint because it suggests that one could not state the same theory in a different language. These sorts of concerns continue to press for disquotationalists.

3.3 Prosententialism

As emerges from the discussion of Grover, et al. (1975) in Section 2, prosententialism is the form of deflationism that contrasts the most with inflationism, rejecting even the latter’s initial assumption that the alethic locutions function as predicates. Partly in response to the difficulties confronting Grover, et al.’s prosentential account, Robert Brandom (1988 and 1994) has developed a variation on their view with an important modification. In place of taking the underlying logic of ‘true’ as having this expression occur only as a non-separable component of the semantically atomic prosentential expressions, ‘that is true’ and ‘it is true’, Brandom treats ‘is true’ as a separable prosentence-forming operator. “It applies to a term that is a sentence nominalization or that refers to or picks out a sentence tokening. It yields a prosentence that has that tokening as its anaphoric antecedent” (Brandom 1994, 305). In this way, Brandom’s account avoids most of the paraphrase concerns that Grover, et al.’s prosententialism faces, while still maintaining prosententialism’s rejection of the contention that the alethic locutions function predicatively. As a consequence of his operator approach, Brandom gives quantificational uses of prosentences a slightly different analysis. He (re)expands instances of truth-talk like the following, “back” into longer forms, such as and explains only the second ‘it’ as involved in a prosentence. The first ‘it’ in (8*) and (11) still functions as a pronoun, anaphorically linked to a set of noun phrases (sentence nominalizations) supplying objects (sentence tokenings) as a domain being quantified over with standard (as opposed to sentential or “propositional”) quantifiers (ibid., 302). Brandom presents a highly flexible view that takes ‘is true’ as a general “denominalizing” device that applies to singular terms formed from the nominalization of sentences broadly, not just to pronouns that indicate them.
A sentence like ‘It is true that humans are causing climate change’, considered via a re-rendering as ‘That humans are causing climate change is true’, is already a prosentence on his view, as is a quote-name case like “‘Birds are dinosaurs’ is true”, and an opaque instance of truth-talk like ‘Goldbach’s Conjecture is true’. In this way, Brandom offers a univocal and broader prosentential account, according to which, “[i]n each use, a prosentence will have an anaphoric antecedent that determines a class of admissible substituends for the prosentence (in the lazy case, a singleton). This class of substituends determines the significance of the prosentence associated with it” (ibid.). As a result, Brandom can accept both (ES-sent) and (ES-prop) – the latter understood as involving no commitment to propositions as entities – on readings closer to their standard interpretations, taking the instances of both to express meaning equivalences. Brandom’s account thus seems to be located in both position \(\mathbf{A}\) and position \(\mathbf{B}\) in the chart from Section 1.1, although, as with any prosententialist view, it still denies that the instances of (ES) say anything about either sentences or propositions. Despite its greater flexibility, however, Brandom’s account still faces the central worry confronting prosentential views, namely that truth-talk really does seem predicative, and not just in its surface grammatical form but in our inferential practices with it as well. In arguing for the superiority of his view over that of Grover, et al., Brandom states that “[t]he account of truth talk should bear the weight of … divergence of logical from grammatical form only if no similarly adequate account can be constructed that lacks this feature” (ibid., 304). One might find it plausible to extend this principle beyond grammatical form, to behavior in inferences as well. This is an abiding concern for attempts to resist inflationism by rejecting its initial assumption, namely, that the alethic locutions function as predicates.

4. Objections to Deflationism

In the remainder of this article, we consider a number of objections to deflationism. These are by no means the only objections that have been advanced against the approach, but they seem to be particularly obvious and important ones.

4.1 The Explanatory Role of Truth

The first objection starts from the observation that (a) in certain contexts an appeal to the notion of truth appears to have an explanatory role and (b) deflationism seems to be inconsistent with that appearance. Some of the contexts in which truth seems to have an explanatory role involve philosophical projects, such as the theory of meaning (which we will consider below) or explaining the nature of knowledge. In these cases, the notion of explanation at issue is not so much causal as it is conceptual (see Armour-Garb and Woodbridge forthcoming, for more on this). But the notion of truth seems also sometimes to play a causal explanatory role, especially with regard to explaining various kinds of success – mainly the success of scientific theories/method (cf. Putnam 1978 and Boyd 1983) and of people’s behavior (cf. Putnam 1978 and Field 1987), but also the kind of success involved in learning from others (Field 1972). The causal-explanatory role that the notion of truth appears to play in accounts of these various kinds of success has seemed to many philosophers to constitute a major problem for deflationism.
For example, Putnam (1978, 20–1, 38) claims, “the notions of ‘truth’ and ‘reference’ have a causal-explanatory role in … an explanation of the behavior of scientists and the success of science”, and “the notion of truth can be used in causal explanations – the success of a man’s behavior may, after all, depend on the fact that certain of his beliefs are true – and the formal logic of ‘true’ [the feature emphasized by deflationism] is not all there is to the notion of truth”. While a few early arguments against deflationism focus on the role of truth in explanations of the success of science (see Williams 1986 and Fine 1984a, 1984b for deflationary responses to Putnam and Boyd on this), according to Field (1994a, 271), “the most serious worry about deflationism is that it can’t make sense of the explanatory role of truth conditions: e.g., their role in explaining behavior, or their role in explaining the extent to which behavior is successful”. While few theorists endorse the thesis that explanations of behavior in general need to appeal to the notion of truth (even a pre-deflationary Field (1987, 84–5) rejects this, but see Devitt 1997, 325–330, for an opposing position), explanations of the latter, i.e., of behavioral success, still typically proceed in terms of an appeal to truth. This poses a prima facie challenge to deflationary views. To illustrate the problem, consider the role of the truth-value of an individual’s belief in whether that person succeeds in satisfying her desires. Let us suppose that Mary wants to get to a party, and she believes that it is being held at 1001 Northside Avenue. If her belief is true, then, other things being equal, she is likely to get to the party and get what she wants. But suppose that her belief is false, and the party is in fact being held at 1001 Southside Avenue. Then it would be more likely, other things being equal, that she won’t get what she wants. In an example of this sort, the truth of her belief seems to be playing a particular role in explaining why she gets what she wants. Assuming that Mary’s belief is true, and she gets to the party, it might seem natural to say that the latter success occurs because her belief is true, which might seem to pose a problem for deflationists. However, truth-involving explanations of particular instances of success like this don’t really pose a genuine problem. This is because if we are told the specific content of the relevant belief, it is possible to replace the apparently explanatory claim that the belief is true with an equivalent claim that does not appeal to truth. In Mary’s particular case, we could replace i) the claim that she believes that the party is being held at 1001 Northside Avenue, and her belief is true, with ii) the claim that she believes that the party is being held at 1001 Northside Avenue, and the party \(is\) being held at 1001 Northside Avenue. A deflationist can claim that the appeal to truth in the explanation of Mary’s success just provides an expressive convenience (including, perhaps, the convenience of expressing what would otherwise require an infinite disjunction (of conjunctions like ii)), by saying just that what Mary believed was true, if one did not know exactly which belief Mary acted on) (cf. Horwich 1998a, 22–3, 44–6). 
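The replacement move just described can be put schematically. The following is a sketch, with ‘\(B_{Mary}\)’ abbreviating ‘Mary believes’ and ‘\(N\)’ abbreviating ‘the party is being held at 1001 Northside Avenue’; the abbreviations are ours, introduced only for illustration:

\[
\underbrace{B_{Mary}\langle N\rangle \wedge \langle N\rangle \text{ is true}}_{\text{claim i)}}
\quad\Longleftrightarrow\quad
\underbrace{B_{Mary}\langle N\rangle \wedge N}_{\text{claim ii)}}
\]

The equivalence is licensed by the relevant instance of (ES-prop), namely that \(\langle N\rangle\) is true if and only if \(N\). Nothing on the right-hand side mentions truth, which is the deflationist’s point: in such particular cases the truth predicate is doing merely expressive, not explanatory, work.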
While deflationists seem to be able to account for appeals to truth in explanations of particular instances of success, the explanatory-role challenge to deflationism also cites the explanatory role that an appeal to truth appears to play in explaining the phenomenon of behavioral success more generally. An explanation of this sort might take the following form:

[1] People act (in general) in such a way that their goals will be obtained (as well as possible in the given situation), or in such a way that their expectations will not be frustrated, … if their beliefs are true.
[2] Many beliefs [people have about how to attain their goals] are true.
[3] So, as a consequence of [1] and [2], people have a tendency to attain certain kinds of goals. (Putnam 1978, 101)

The generality of [1] in this explanation seems to cover more cases than any definite list of actual beliefs that someone has could include. Moreover, the fact that [1] supports counterfactuals by applying to whatever one might possibly believe (about attaining goals) suggests that it is a law-like generalization. If the truth predicate played a fundamental role in the expression of an explanatory law, then deflationism would seem to be unsustainable. A standard deflationary response to this line of reasoning involves rejecting the thesis that [1] is a law, seeing it (and truth-involving claims like it) instead as functioning similarly to how the claim ‘What Mary believes is true’ functions in an explanation of her particular instance of behavioral success, just expressing an even more indefinite, and thus potentially infinite claim. The latter is what makes a claim like [1] seem like an explanatory law, but even considering this indefiniteness, the standard deflationary account of [1] claims that the function of the appeal to the notion of truth there is still just to express a kind of generalization. One way to bring out this response is to note that, similar to the deflationary “infinite disjunction” account of the claim ‘What Mary believes is true’, generalizations of the kind offered in [1] entail infinite conjunctions of their instances, which are claims that can be formulated without appeal to truth. For example, in the case of explaining someone, \(A\), accomplishing their goal of getting to a party, deflationists typically claim that the role of citing possession of a true belief is really just to express an infinite conjunction with something like the following form:

If \(A\) believes that the party is at 1001 Northside Avenue, and the party is at 1001 Northside Avenue, then \(A\) will get what they want; and if \(A\) believes that the party is at 1001 Southside Avenue, and the party is at 1001 Southside Avenue, then \(A\) will get what they want; and if \(A\) believes that the party is at 17 Elm Street, and the party is at 17 Elm Street, then \(A\) will get what they want; … and so on.

The equivalence schema (ES) allows one to capture this infinite conjunction (of conditionals) in a finite way. For, on the basis of the schema, one can reformulate the infinite conjunction as:

If \(A\) believes that the party is at 1001 Northside Avenue, and that the party is at 1001 Northside Avenue is true, then \(A\) will get what they want; and if \(A\) believes that the party is at 1001 Southside Avenue, and that the party is at 1001 Southside Avenue is true, then \(A\) will get what they want; and if \(A\) believes that the party is at 17 Elm Street, and that the party is at 17 Elm Street is true, then \(A\) will get what they want; … and so on.
In turn, this (ES)-reformulated infinite conjunction can be expressed as a finite statement with a universal quantifier ranging over propositions:

For every proposition \(x\), if what \(A\) believes \(= x\), and \(x\) is true, then \(A\) will get what they want, other things being equal.

The important point for a deflationist is that one could not express the infinite conjunction regarding the agent’s beliefs and behavioral success unless one had the concept of truth. But deflationists also claim that this is all that the notion of truth is doing here and in similar explanations (cf. Leeds 1978, 1995; Williams 1986, Horwich 1998a). How successful is this standard deflationary response? There are several critiques in the literature. Some (e.g., Damnjanovic 2005) argue that there is no distinction in the first place between appearing in a causal-explanatory generalization and being a causal-explanatory property. After all, suppose it is a true generalization that metal objects conduct electricity. That would normally be taken as sufficient to show that being metal is a causal-explanatory property that one can cite in explaining why something conducts electricity. But isn’t this a counter, then, to deflationism’s thesis that, assuming there is a property of truth at all, it is at most an insubstantial one? If a property is a causal or explanatory property, after all, it is hard to view it as insubstantial. The reasoning at issue here may be presented conveniently by expanding on the general argument considered above and proceeding from an apparently true causal generalization to the falsity of deflationism (ibid.):

P1. If a person \(A\) has true beliefs, they will get what they want, other things being equal.
C1. Therefore, if \(A\) has beliefs with the property of being true, \(A\) will get what they want, other things being equal.
C2. Therefore, the property of being true appears in a causal-explanatory generalization.
C3. Therefore, the property of being true is a causal-explanatory property.
C4. Therefore, deflationism is false.

Can a deflationist apply the standard deflationary response to this argument? Doing so would seem to involve rejecting the inference from C2 to C3. After all, the standard reply would say that the role that the appeal to truth plays in P1, the apparent causal generalization, is simply its generalizing role of expressing a potentially infinite, disjointed conjunction of unrelated causal connections (cf. Leeds 1995). So, applying this deflationary response basically hinges on the plausibility of rejecting the initial assumption that there is no distinction between appearing in a causal-explanatory generalization and being a causal-explanatory property. It is worth noting two other responses beyond the standard one that a deflationist might make to the reasoning just set out. The first option is to deny the step from P1 to C1. This inference involves the explicit introduction of the property of being true, and, as we have seen, some deflationists deny that there is a truth property at all (cf. Quine 1970 [1986], Grover, et al. 1975, Leeds 1978, Brandom 1994). But, as we noted above, the idea that there is no truth property may be difficult to sustain given the apparent fact that ‘is true’ functions grammatically as a predicate. The second option is to deny the final step from C3 to C4 and concede that there is a sense in which truth is a causal-explanatory property and yet say that it is still not a substantive property (cf. Damnjanovic 2005).
For example, some philosophers (e.g., Friedman 1974, van Fraassen 1980, Kitcher 1989, Jackson and Pettit 1990) have offered different understandings of scientific explanation and causal explanation, according to which being a causal and explanatory property might not conflict with being insubstantial (perhaps by being an abundant or ungroundable property). This might be enough to sustain a deflationary position. The standard deflationary response to the explanatory-role challenge has also met with criticisms focused on providing explanations of certain “higher-level” phenomena. Philip Kitcher (2002, 355–60) concludes that Horwich’s (1998a, 22–3) application of the standard response, in his account of how the notion of truth functions in explanations of behavioral success, misses the more systematic role that truth plays in explaining patterns of successful behavior, such as when means-ends beliefs flow from a representational device, like a map. Chase Wrenn (2011) agrees with Kitcher that deflationists need to explain systematic as opposed to just singular success, but against Kitcher he argues that deflationists are actually better off than inflationists on this front. Will Gamester (2018, 1252–5) raises a different “higher-level factor” challenge, one based on the putative inability of the standard deflationary account of the role of truth in explanations of behavioral success to distinguish between coincidental and non-coincidental success. Gamester (ibid., 1256–7) claims that an inflationist could mark and account for the difference between the two kinds of success with an explanation that appeals to the notion of truth. But it is not clear that a deflationist cannot also avail herself of a version of this truth-involving explanation, taking it just as the way of expressing in natural language what one might formally express with sentential variables and quantifiers (cf. Ramsey 1927, 1929; Prior 1971, Wrenn 2021, and Armour-Garb and Woodbridge forthcoming).

4.2 Propositions Versus Sentences

We noted earlier that deflationism can be presented in either a sententialist version or a propositionalist version. Some philosophers have suggested, however, that the choice between these two versions constitutes a dilemma for deflationism (Jackson, Oppy, and Smith 1994). The objection is that if deflationism is construed in accordance with propositionalism, then it is trivial, but if it is construed in accordance with sententialism, it is false. To illustrate the dilemma, consider the following claim:

(12) ‘snow is white’ is true if and only if snow is white.

Now, does ‘snow is white’ in (12) refer to a sentence or a proposition? If, on the one hand, we take (12) to be about a sentence, then, assuming (12) can be interpreted as making a necessary claim, it is false. On the face of it, after all, it takes a lot more than snow’s being white for it to be the case that ‘snow is white’ is true. In order for ‘snow is white’ to be true, it must be the case not only that snow is white, it must, in addition, be the case that ‘snow is white’ means that snow is white. But this is a fact about language that (12) ignores. On the other hand, suppose we take ‘snow is white’ in (12) to denote the proposition that snow is white. Then the approach looks to be trivial, since the proposition that snow is white is defined as being the one that is true just in case snow is white. Thus, deflationism faces the dilemma of being false or trivial. One response for the deflationist is to remain with the propositionalist version of their doctrine and accept its triviality.
A trivial doctrine, after all, at least has the advantage of being true. A second response is to resist the suggestion that propositionalist deflationism is trivial. For one thing, the triviality here does not have its source in the concept of truth, but rather in the concept of a proposition. Moreover, even if we agree that the proposition that snow is white is defined as the one that is true if and only if snow is white, this still leaves open whether truth is a substantive property of that proposition; as such it leaves open whether deflationism or inflationism is correct. A third response to this dilemma is to accept that deflationism applies inter alia to sentences, but to argue (following Field 1994a) that the sentences to which it applies must be interpreted sentences, i.e., sentences which already have meaning attached to them. While it takes more than snow being white to make the sentence ‘snow is white’ true, when we think of it as divorced from its meaning, that is not so clear when we treat it as having the meaning it in fact has.

4.3 Correspondence

It is often said to be a platitude that true statements correspond to the facts. The so-called “correspondence theory of truth” is built around this intuition and tries to explain the notion of truth by appealing to the notions of correspondence and fact. But even if one does not build one’s approach to truth around this intuition, many philosophers regard it as a condition of adequacy on any approach that it accommodate this correspondence intuition. It is often claimed, however, that deflationism has trouble meeting this adequacy condition. One way to bring out the problem here is by focusing on a particular articulation of the correspondence intuition, one favored by deflationists themselves (e.g., Horwich 1998a). According to this way of spelling it out, the intuition that a certain sentence or proposition “corresponds to the facts” is the intuition that the sentence or proposition is true because of how the world is; that is, the truth of the proposition is explained by some fact, which is usually external to the proposition itself. We might express this by saying that someone who endorses the correspondence intuition so understood would endorse:

(6) The proposition that snow is white is true because snow is white.

The problem with (6) is that, when we combine it with deflationism – or at least with a necessary version of that approach – we can derive something that is plainly false. Anyone who assumes that the instances of the equivalence schema are necessary would clearly be committed to the necessary truth of:

(7) The proposition that snow is white is true if and only if snow is white.

And, since (7) is a necessary truth, under that assumption, it is very plausible to suppose that (6) and (7) together entail:

(8) Snow is white because snow is white.

But (8) is clearly false. The reason is that the ‘because’ in (6) and (8) is a causal or explanatory relation, and plausibly such relations must obtain between distinct relata. But the relata in (8) are (obviously) not distinct. Hence, (8) is false, and this means that the conjunction of (6) and (7) must be false, and that deflationism is inconsistent with the correspondence intuition. To borrow a phrase of Mark Johnston’s (1989) – who mounts a similar argument in a different context – we might say that if deflationism is true, then what seems to be a perfectly good explanation in (6) goes missing; if deflationism is true, after all, then (6) is equivalent to (8), and (8) is not an explanation of anything. One way a deflationist might attempt to respond to this objection is by providing a different articulation of the correspondence intuition.
For example, one might point out that the connection between the proposition that snow is white being true and snow’s being white is not a contingent connection and suggest that this rules out (6) as a successful articulation of the correspondence intuition. That intuition (one might continue) is more plausibly given voice by

(6*) The sentence ‘snow is white’ is true because snow is white.

However, when (6*) is conjoined with (7), one cannot derive the problematic (8), and thus, one might think, the objection from correspondence might be avoided. Now, certainly this is a possible suggestion; the problem with it, however, is that a deflationist who thinks that (6*) is true is most plausibly construed as holding a sententialist, rather than a propositionalist, version of deflationism. A sententialist version of deflationism will supply a version of (7), viz.:

(7*) The sentence ‘snow is white’ is true if and only if snow is white,

which, at least if it is interpreted as a necessary (or analytic) truth, will conspire with (6*) to yield (8). And we are back where we started. Another response would be to object that ‘because’ creates an opaque context – that is, the kind of context within which one cannot substitute co-referring expressions and preserve truth. However, for this to work, ‘because’ must create an opaque context of the right kind. In general, we can distinguish two kinds of opaque context: intensional contexts, which allow the substitution of necessarily co-referring expressions but not contingently co-referring expressions; and hyperintensional contexts, which do not even allow the substitution of necessarily co-referring expressions. If the inference from (6) and (7) to (8) is to be successfully blocked, it is necessary that ‘because’ creates a hyperintensional context. A proponent of the correspondence objection might try to argue that while ‘because’ creates an intensional context, it does not create a hyperintensional context. But since a hyperintensional reading of ‘because’ has become standard fare, this approach remains open to a deflationist and is not an ad hoc fix. A final, and most radical, response would be to reject the correspondence intuition outright. This response is not as drastic as it sounds. In particular, deflationists do not have to say that someone who says ‘the proposition that snow is white corresponds to the facts’ is speaking falsely. Deflationists might do better by saying that such a person is simply using a picturesque or ornate way of saying that the proposition is true, where truth is understood in accordance with deflationism. Indeed, a deflationist can even agree that, for certain rhetorical or conversational purposes, it might be more effective to use talk of “correspondence to the facts”. Nevertheless, it is important to see that this response does involve a burden, since it involves rejecting a condition of adequacy that many regard as binding.

4.4 Truth-Value Gaps

According to some metaethicists (moral non-cognitivists or expressivists), moral claims – such as the injunction that one ought to return people’s phone calls – are neither true nor false. The same situation holds, according to some philosophers of language, for claims that presuppose the existence of something which does not in fact exist, such as the claim that the present King of France is bald; for sentences that are vague, such as ‘These grains of sand constitute a heap’; and for sentences that are paradoxical, such as those that arise in connection with the Liar Paradox. Let us call this thesis the gap, since it finds a gap in the class of sentences between those that are true and those that are false.
The deflationary approach to truth has seemed to be inconsistent with the gap, and this has been thought by some (e.g., Dummett 1959 [1978, 4] and Holton 2000) to be an objection. The reason for the apparent inconsistency flows from a natural way to extend the deflationary approach from truth to falsity. The most natural thing for a deflationist to do is to introduce a falsity schema like:

(F-sent) ‘\(p\)’ is false if and only if \({\sim}p\).

Following Holton (1993, 2000), we consider (F-sent) to be the relevant schema for falsity, rather than some propositional schema, since the standard understanding of a gappy sentence is as one that does not express a proposition (cf. Jackson, et al. 1994). With a schema like (F-sent) in hand, deflationists could say things about falsity similar to what they say about truth: (F-sent) exhausts the notion of falsity, there is no substantive property of falsity, the utility of the concept of falsity is just a matter of facilitating the expression of certain generalizations, etc. However, there is a seeming incompatibility between (F-sent) and the gap. Suppose, for reductio, that ‘S’ is a sentence that is neither true nor false. In that case, it is not the case that ‘S’ is true, and it is not the case that ‘S’ is false. But then, by (ES-sent) and (F-sent), we can infer that it is not the case that S, and it is not the case that not-S; in short: \({\sim}\)S and \({\sim}{\sim}\)S, which is a classical contradiction. Clearly, then, we must give up one of these things. But which one can we give up consistently with deflationism? In the context of ethical non-cognitivism, one possible response to the apparent dilemma is to distinguish between a deflationary account of truth and a deflationary account of truth-aptitude (cf. Jackson, et al. 1994). By accepting an inflationary account of the latter, one can claim that ethical statements fail the robust criteria of “truth-aptitude” (reidentified in terms of expression of belief), even if a deflationary view of truth still allows the application of the truth predicate to them, via instances of (ES). In the case of vagueness, one might adopt epistemicism about it and claim that vague sentences actually have truth-values, we just can’t know them (cf. Williamson 1994. For an alternative, see Field 1994b). With respect to the Liar Paradox, the apparent conflict between deflationism and the gap has led some (e.g., Simmons 1999) to conclude that deflationism is hobbled with respect to dealing with the problem, since most prominent approaches to doing so, stemming from the work of Saul Kripke (1975), involve an appeal to truth-value gaps. One alternative strategy a deflationist might pursue in attempting to resolve the Liar is to offer a non-classical logic. Field 2008 adopts this approach and restricts the law of the excluded middle. JC Beall (2002) combines truth-value gaps with Kleene logic (see the entry on many-valued logic) and makes use of both weak and strong negation. Armour-Garb and Beall (2001, 2003) argue that deflationists can and should be dialetheists and accept that some truthbearers are both true and not true (see also, Woodbridge 2005, 152–3, on adopting a paraconsistent logic that remains “quasi-classical”). By contrast, Armour-Garb and Woodbridge (2013, 2015) develop a version of the “meaningless strategy” with respect to the Liar (based on Grover 1977), which they claim a deflationist can use to dissolve that paradox and semantic pathology more generally, without accepting genuine truth-value gaps or giving up classical logic.
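The reductio just rehearsed can be laid out step by step. This is a reconstruction of the reasoning reported above, assuming classical logic throughout:

\[
\begin{array}{lll}
1. & {\sim}(\text{‘}S\text{’ is true}) \wedge {\sim}(\text{‘}S\text{’ is false}) & \text{the gap, assumed for reductio}\\
2. & \text{‘}S\text{’ is true} \leftrightarrow S & \text{(ES-sent)}\\
3. & \text{‘}S\text{’ is false} \leftrightarrow {\sim}S & \text{(F-sent)}\\
4. & {\sim}S & \text{from 1 and 2}\\
5. & {\sim}{\sim}S & \text{from 1 and 3}\\
6. & S \wedge {\sim}S & \text{from 4 and 5, by double negation elimination}
\end{array}
\]

Which step to block is exactly what divides the responses canvassed above: the epistemicist about vagueness denies that the relevant sentences are gappy (step 1), the non-classical approaches restrict the logic used in steps 4 through 6, and the dialetheist accepts the contradictory conclusion for some cases.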
4.5 The Generalization Problem

Since deflationists place such heavy emphasis on the role of the concept of truth in expressing generalizations, it seems somewhat ironic that certain versions of deflationism have been criticized for being incapable of accounting for generalizations involving truth (Gupta 1993a, 1993b; Field 1994a, 2008; Horwich 1998a (137–8), 2001; Halbach 1999 and 2011 (57–9); Soames 1999, Armour-Garb 2004, 2010, 2011). The “Generalization Problem” (henceforth, GP) captures the worry that a deflationary account of truth is inadequate for explaining our commitments to general facts we express with certain uses of ‘true’. This raises the question of whether and, if so, how, deflationary accounts earn the right to endorse such generalizations. Although Tarski (1935 [1956]) places great importance on the instances of his (T)-schema, he comes to recognize that those instances do not provide a fully adequate way of characterizing truth. Moreover, even when the instances of (T) are taken as theorems, Tarski (ibid.) points out that taken all together they are insufficient for proving a ‘true’-involving generalization like

(A) Every sentence of the form ‘if \(p\), then \(p\)’ is true,

since the collection of the instances of (T) is \(\omega\)-incomplete (where a theory, \(\theta\), is \(\omega\)-incomplete if \(\theta\) can prove every instance of an open formula ‘\(Fx\)’ but cannot prove the universal generalization, ‘\(\forall xFx\)’). We arrive at a related problem when we combine a reliance on the instances of some version of (ES) with Quine’s view about the functioning and utility of the truth predicate. He (1992, 81) considers the purpose of (A) to be to express a generalization over sentences like the following:

(B) If it is raining, then it is raining.
(C) If snow is white, then snow is white.

Quine points out that we want to be able to generalize on the embedded sentences in those conditionals, by semantically ascending, abstracting logical form, and deriving (A). But, as Tarski (ibid.) notes, this feat cannot be achieved, given only a commitment to (the instances of) (T). From (T) and (A), we can prove (B) and (C) but, given the finitude of deduction, when equipped only with the instances of (T), we cannot prove (A). As a consequence of the Compactness Theorem of first-order logic, anything provable from the totality of the instances of (T) is provable from just finitely many of them, so any theory that takes the totality of the instances of (T) to characterize truth will be unable to prove any generalization like (A). To address the question of why we need to be able to prove these truth-involving generalizations, suppose that we accept a proposition like \(\langle\)Every proposition of the form \(\langle\)if \(p\), then \(p\rangle\) is true\(\rangle\). Call this proposition “\(\beta\)”. Now take ‘\(\Gamma\)’ to stand for the collection of propositions that are the instances of \(\beta\). Horwich (2001) maintains that an account of the meaning of ‘true’ will be adequate only if it aids in explaining why we accept the members of \(\Gamma\), where such explanations amount to proofs of those propositions by, among other things, employing an explanatory premise that does not explicitly concern the truth predicate. So, one reason it is important to be able to prove a ‘true’-involving generalization is because this is a condition of adequacy for an account of the meaning of that term. One might argue that anyone who grasps the concept of truth, and that of the relevant conditional, should be said to know \(\beta\).
But if a given account of truth, together with an account of the conditional (along, perhaps, with an account of other logical notions), does not entail \(\beta\), then it does not provide an acceptable account of truth. Here is another reason for thinking that generalizations like \(\beta\) must be proved. A theory of the meaning of ‘true’ should explain our acceptance of propositions like \(\beta\), which, as Gupta (1993a) and Hill (2002) emphasize, should be knowable a priori by anyone who possesses the concept of truth (and who grasps the relevant logical concepts). But if such a proposition can be known a priori on the basis of a grasp of the concept of truth (and of the relevant logical concepts), then a theory that purports to specify the meaning of ‘true’ should be able to explain our acceptance of that proposition. But if an account of the meaning of ‘true’ is going to do this, it must be possible to derive the proposition from one or more of the clauses that constitute our grasp of the concept of truth. This creates a problem for a Horwichian minimalist. Let us suppose that \(\beta\) is one of the general propositions that must be provable. Restricted to the resources available through Horwich’s minimalism, we can show that \(\beta\) cannot be derived. If a Horwichian minimalist could derive \(\beta\), it would have to be derived from the instances of (E). But there cannot be a valid derivation of a universal generalization from a set of particular propositions unless that set is inconsistent. Since, according to Horwich (1998a), the set of instances of (E) that make up his theory of truth is consistent, it follows that there cannot be a derivation of \(\beta\) from the instances of (E). This is a purely logical point. As such, considerations of pure logic dictate that our acceptance of \(\beta\) cannot be explained by Horwich’s account of truth. Since Horwich takes all instances of the propositional version of (T) (i.e., (ES-prop)) as axioms, he can prove each of those instances. But, as we have seen, restricted to the instances of the equivalence schema, he cannot prove the generalization, \(\beta\), i.e., \(\langle\)Every proposition of the form \(\langle\)if \(p\) then \(p\rangle\) is true\(\rangle\). Some deflationists respond to the GP by using a version of (GT) to formulate their approach:

(GT) \(\forall x\,(x\) is true \(\leftrightarrow \Sigma p\,((x = \langle p\rangle) \wedge p))\)

In this context, there are two things to notice about (GT). First, it is not a schema but a universally quantified formula. For this reason, it is possible to derive a generalization like \(\beta\) from it. Second, the existential quantifier, ‘\(\Sigma\)’, in (GT) must be a higher-order quantifier (see the entry on second-order and higher-order logic) that quantifies into sentential positions. We mentioned above an approach that takes this quantifier as a substitutional one, where the substitution class consists of sentences. We also mentioned Hill’s (2002) alternative version that takes the substitution class to be the set of all propositions. Künne (2003) suggests a different approach that takes ‘\(\Sigma\)’ to be an objectual (domain and values) quantifier ranging over propositions. However, parallel to Horwich’s rejection of (GT) discussed in Section 3.1, all of these approaches have drawn criticism on the grounds that the use of higher-order quantifiers to define truth is circular (cf. Platts 1980, McGrath 2000), and may get the extension of the concept of truth wrong (cf. Sosa 1993).
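To illustrate the first point about (GT), here is a sketch of how a generalization like \(\beta\) might be derived from it; the \(\Sigma\)-introduction step is our reconstruction and presupposes whatever account of the higher-order quantifier one ultimately adopts:

\[
\begin{array}{ll}
1. & x = \langle\text{if } q\text{, then } q\rangle \quad \text{(suppose } x \text{ is of the relevant form)}\\
2. & \text{if } q\text{, then } q \quad \text{(logical truth)}\\
3. & \Sigma p\,((x = \langle p\rangle) \wedge p) \quad \text{(from 1 and 2, by } \Sigma\text{-introduction)}\\
4. & x \text{ is true} \quad \text{(from 3, by (GT))}\\
5. & \beta \quad \text{(from 1–4, generalizing on the arbitrary } x\text{)}
\end{array}
\]

Since (GT) is a single quantified axiom rather than an infinite list of instances, nothing in this sketch runs afoul of the purely logical point noted above, that no consistent set of particular propositions entails a universal generalization.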
An alternative deflationist approach to the GP attempts to show that, despite appearances, certain deflationary theories do have the resources to derive the relevant generalizations. Field (1994a, 2001a), for example, suggests that we allow reasoning with schemas directly and proposes rules that would allow the derivation of generalizations. Horwich (1998a, 2001) suggests a more informal approach according to which we are justified in deriving \(\beta\) since an informal inspection of a derivation of some instance of \(\beta\) shows us that we could derive any instance of it. For replies to Horwich, see Armour-Garb 2004, 2010, 2011; Gupta 1993a, 1993b; and Soames 1999. For responses to Armour-Garb’s attack on Horwich 2001, see Oms 2019 and Cieśliński 2018.

4.6 Conservativeness

An ideal theory of truth will be both consistent (e.g., avoid the Liar Paradox) and adequate (e.g., allow us to derive all the essential laws of truth, such as those at issue in the Generalization Problem). Yet it has recently been argued that even if deflationists can provide a consistent theory of truth and avoid the GP, they still cannot provide an adequate theory. This argument turns on the notion of a conservative extension of a theory. Informally, a conservative extension of a theory is one that does not allow us to prove anything that could not be proved from the original, unextended theory. More formally, and applied to theories of truth, a truth theory, \(Tr\), is conservative over some theory \(T\) formulated in language \(L\) if and only if for every sentence \(\phi\) of \(L\) in which the truth predicate does not occur, if \(Tr \cup T \vdash \phi\), then \(T \vdash \phi\) (where ‘\(\vdash\)’ represents provability). Certain truth theories are conservative over arithmetic – e.g., theories that implicitly define truth using only the instances of some version of (ES) – and certain truth theories are not – e.g., Tarski’s (1935 [1956], 1944) compositional theory. Specifically, adding a truth theory of the latter sort allows us to prove that arithmetic is consistent, something that we cannot do if we are confined to arithmetic itself. It has been argued (a) that conservative truth theories are inadequate and (b) that deflationists are committed to conservative truth theories. (See Shapiro 1998 and Ketland 1999; Horsten 1995 provides an earlier version of this argument.) We will explain the arguments for (a) below but to get a flavor of the arguments for (b), consider Shapiro’s rhetorical question: “How thin can the notion of arithmetic truth be, if by invoking it we can learn more about the natural numbers?” Shapiro is surely right to press deflationists on their frequent claims that truth is “thin” or “insubstantial”. It might also be a worry for deflationists if any adequate truth theory allowed us to derive non-logical truths, if they endorse the thesis that truth is merely a “logical property”. On the other hand, deflationists themselves insist that truth is an expressively useful device, and so they cannot be faulted for promoting a theory of truth that allows us to say more about matters not involving truth. To see an argument for (a), consider a Gödel sentence, \(G\), formulated within the language of Peano Arithmetic (henceforth, PA). \(G\) is not a theorem of PA if PA is consistent (cf. the entry on Gödel’s incompleteness theorems). But \(G\) becomes a theorem when PA is expanded by adding certain plausible principles that appear to govern a truth predicate.
Thus, the resultant theory of arithmetical truth is strong enough to prove \(G\) and appears therefore to be non-conservative over arithmetic. If, as has been argued by a number of theorists, any adequate account of truth will be non-conservative over a base theory, then deflationists appear to be in trouble. Understood in this way, the “Conservativeness Argument” (henceforth, CA) is a variant of the objection considered in Section 4.1, claiming that truth plays an explanatory role that deflationism cannot accommodate. There are several deflationary responses to the CA. Field (1999) argues that the worries arising from the claim that deflationists are in violation of explanatory conservativeness are unfounded. He (ibid., 537) appeals to the expressive role of the truth predicate and maintains that deflationists are committed to a form of “explanatory conservativeness” only insofar as there are no explanations in which the truth predicate is not playing its generalizing role. As a result, he (ibid.) notes that “any use of ‘true’ in explanations which derives solely from its role as a device of generalization should be perfectly acceptable”. For responses to Field, see Horsten 2011 (61) and Halbach 2011 (315–6). Responding to the CA, Daniel Waxman (2017) identifies two readings of ‘conservativeness’, one semantic and the other syntactic, which correspond to two conceptions of arithmetic. On the first conception, arithmetic is understood categorically as given by the standard model. On the second conception, arithmetic is understood axiomatically and is captured by the acceptance of some first-order theory, such as PA. Waxman argues that deflationism can be conservative given either conception, so that the CA does not go through. Julien Murzi and Lorenzo Rossi (2018) argue that Waxman’s attempt at marrying deflationism with conservativeness – his “conservative deflationism” – is unsuccessful. They (ibid.) reject the adoption of this view on the assumption that one’s conception of arithmetic is axiomatic, claiming, in effect, that a deflationist’s commitment to a conservative conception of truth is misguided (cf. Halbach 2011, Horsten 2011, Cieśliński 2015, and Galinon 2015). Jody Azzouni (1999) defends the “first-order deflationist”, viz., a deflationist who endorses what Waxman (ibid.) calls “the axiomatic conception of arithmetic” and whose subsequent understanding cannot rule out the eligibility of non-standard models. Azzouni accepts the need to prove certain ‘true’-involving generalizations, but he maintains that there are some generalizations that are about truths that a first-order deflationist need not prove. He further contends that if one does extend her theory of truth in a way that allows her to establish these generalizations, she should not expect her theory to be conservative, nor should she continue describing it as a deflationary view of truth. For a response to Azzouni (ibid.), see Waxman (2017, 453). In line with Field’s response to the CA, Lavinia Picollo and Thomas Schindler (2020) argue that the conservativeness constraint imposed by Horsten 1995, Shapiro 1998, Ketland 1999, and others is not a reasonable requirement to impose on deflationary accounts. They contend that the insistence on conservativeness arises from making too much of the metaphor of “insubstantiality” and that it fails to see what the function of the truth predicate really amounts to.
Their leading idea is that, from a deflationist’s perspective, the logico-linguistic function of the truth predicate is to simulate sentential and predicate quantification in a first-order setting (cf. Horwich 1998a, 4, n. 1). They maintain that, for a deflationist, in conjunction with first-order quantifiers, the truth predicate has the same function as sentential and predicate quantifiers. So, we should not expect the deflationist’s truth theory to conservatively extend its base theory.

4.7 Normativity

It is commonly said that our beliefs and assertions aim at truth, or present things as being true, and that truth is therefore a norm of assertion and belief. This putative fact about truth and assertion in particular has been seen to suggest that deflationism must be false (cf. Wright 1992 and Bar-On and Simmons 2007). However, the felt incompatibility between normativity and deflationism is difficult to make precise. The first thing to note is that there is certainly a sense in which deflationism is consistent with the idea that truth is a norm of assertion. To illustrate this, notice (as we saw in examining truth’s putative explanatory role) that we can obtain an intuitive understanding of this idea without mentioning truth at all, so long as we focus on a particular case. Suppose that for whatever reason Mary sincerely believes that snow is green, has good evidence for this belief, and on the basis of this belief and evidence asserts that snow is green. We might say that there is a norm of assertion that implies that Mary is still open to criticism in this case. After all, since snow is not green, there must be something incorrect or defective about Mary’s assertion (and similarly for her belief). It is this incorrectness or defectiveness that the idea that truth is a norm of assertion (and belief) is trying to capture. To arrive at a general statement of the norm that lies behind this particular case, consider that here, what we recognize is

(13) If someone asserts that snow is green, and snow is not green, then that person is open to criticism.

To generalize on this, what we want to do is generalize on the positions occupied by ‘snow is green’ and express something along the lines of

(14) For all \(p\), if someone asserts that \(p\), and \({\sim}p\), then that person is open to criticism.

The problem of providing a general statement like (14) is the same issue first raised in Section 1.3, and the solution by now should be familiar. To state the norm in general we would need to be able to do something we seem unable to do in ordinary language, namely, employ sentential variables and quantifiers for them. But this is where the notion of truth comes in. (ES) gives us its contraposition,

(ES-con) \(\langle p\rangle\) is not true if and only if \({\sim}p\).

Reading ‘\(\langle p\rangle\)’ as ‘that \(p\)’, we can reformulate (14) as

(15) For all \(p\), if someone asserts that \(p\), and that \(p\) is not true, then that person is open to criticism.

But since the variable ‘\(p\)’ occurs only in nominalized contexts in (15), we can replace it with an object variable, ‘\(x\)’, and bind this with an ordinary objectual quantifier, to get

(16) For all \(x\), if someone asserts \(x\), and \(x\) is not true, then that person is open to criticism.

Or, to put it as some philosophers might:

(17) Truth is a norm of assertion.

In short, then, deflationists need not deny that we appeal to the notion of truth to express a norm of assertion; on the contrary, the concept of truth seems required to state that very generalization. If deflationists can account for the fact that we must apply the notion of truth to express a norm of assertion, then does normativity pose any problem for deflationism? Crispin Wright (1992, 15–23) argues that it does, claiming that deflationism is inherently unstable because there is a distinctive norm for assertoric practice that goes beyond the norms for warranted assertibility – that the norms of truth and warranted assertibility are potentially extensionally divergent.
This separate norm of truth, he claims, is already implicit just in acceptance of the instances of (ES). He points out that not having warrant to assert some sentence does not yield having warrant to assert its negation. However, because (ES) gives us (ES-con), we have, in each instance, an inference (going from right to left) from the sentence mentioned not being true to the negation of the sentence. But the instance of (ES) for the negation of any sentence,

‘\({\sim}p\)’ is true if and only if \({\sim}p\),

takes us (again, going from right to left) from the negated sentence to an ascription of truth to that negated sentence. Thus, some sentence not being true does yield that the negation of the sentence is true, in contrast with warranted assertibility. This difference, Wright (ibid., 18) claims, reveals that, by deflationism’s own lights, the truth predicate expresses a distinct norm governing assertion, which is incompatible with the deflationary contention “that ‘true’ is only grammatically a predicate whose role is not to attribute a substantial characteristic”. Rejecting Wright’s argument for the instability of deflationism, Ian Rumfitt (1995, 103) notes that if we add the ideas of denying something and of having warrant for doing so (“anti-warrant”) to Wright’s characterization of deflationism, this would make ‘is not true’ simply a device of rejection governed by the norm that “[t]he predicate ‘is not true’ may be applied to any sentence for which one has an anti-warrant”. But then truth-talk’s behavior with negation would not have to be seen as indicating that it marks a distinct norm beyond justified assertibility and justifiable deniability, which would be perfectly compatible with deflationism. Field (1994a, 264–5) offers a deflationary response to Wright’s challenge (as well as to a similar objection regarding normativity from Putnam (1983a, 279–80)), pointing again to the generalizing role of the truth predicate in such normative desires as one to utter only true sentences or one to have only true beliefs. Field agrees with Wright that truth-talk expresses a norm beyond warranted assertibility, but he (1994a, 265) also maintains that “there is no difficulty in desiring that all one’s beliefs be disquotationally true; and not only can each of us desire such things, there can be a general practice of badgering others into having such desires”. Horwich (1996, 879–80) argues that Wright’s rejection of deflationism does not follow from showing that one can use the truth predicate to express a norm beyond warranted assertibility. Like Field, Horwich claims that Wright missed the point that, in the expression of such a norm, the truth predicate is just playing its generalizing role. For other objections to deflationism based on truth’s normative role, see Price 1998, 2003 and McGrath 2003.

4.8 Inflationist Deflationism?

Another objection to deflationism begins by drawing attention to a little-known doctrine about truth that G.E. Moore held at the beginning of the 20th century. Richard Cartwright (1987, 73) describes the view as follows: “a true proposition is one that has a certain simple unanalyzable property, and a false proposition is one that lacks the property”. This doctrine about truth is to be understood as the analogue of the doctrine that Moore held about goodness, namely that goodness is a simple, unanalyzable quality. The potential problem that this Moorean view about truth presents for deflationism might best be expressed in the form of a question: What is the difference between the Moorean view and deflationism?
One might reply that, according to deflationary theories, the concept of truth has an important logical role, i.e., expressing certain generalizations, whereas the concept of goodness does not. However, this doesn’t really answer our question. For one thing, it isn’t clear that Moore’s notion of truth does not also capture generalizations, since it too will yield all of the instances of (ES). For another, the idea that the concept of truth plays an important logical role doesn’t distinguish the metaphysics of deflationary conceptions from the metaphysics of the Moorean view, and it is the metaphysics of the matter that the present objection really brings into focus. Alternatively, one might suggest that the distinction between truth according to Moore’s view and deflationary conceptions of truth is the distinction between having a simple unanalyzable nature, and not having any underlying nature at all. But what is that distinction? It is certainly not obvious that there is any distinction between having a nature about which nothing can be said and having no nature at all. How might a deflationist respond to this alleged problem? The key move will be to focus on the property of being true. For the Moorean, this property is a simple unanalyzable one. But deflationists need not be committed to this. As we have seen, some deflationists think that there is no truth property at all. And even among deflationists who accept that there is some insubstantial truth property, it is not clear that this is the sort of property that the Moorean has in mind. To say that a property is unanalyzable suggests that the property is a fundamental property. One might understand this in something like the sense that Lewis proposes, i.e., as a property that is sparse and perfectly natural. Or one might understand a fundamental property as one that is groundable but not grounded in anything. But deflationists need not understand a purported property of being true in either of these ways. As noted in Section 1.2, they may think of it as an abundant property rather than a sparse one, or as one that is ungroundable. In this way, there are options available for deflationists who want to distinguish themselves from the Moorean view of truth.


The Correspondence Theory of Truth

1. History of the Correspondence Theory The correspondence theory is often traced back to Aristotle’s well-known definition of truth (Metaphysics 1011b25): “To say of what is that it is not, or of what is not that it is, is false, while to say of what is that it is, and of what is not that it is not, is true”—but virtually identical formulations can be found in Plato (Cratylus 385b2, Sophist 263b). It is noteworthy that this definition does not highlight the basic correspondence intuition. Although it does allude to a relation (saying something of something) to reality (what is), the relation is not made very explicit, and there is no specification of what on the part of reality is responsible for the truth of a saying. As such, the definition offers a muted, relatively minimal version of a correspondence theory. (For this reason it has also been claimed as a precursor of deflationary theories of truth.) Aristotle sounds much more like a genuine correspondence theorist in the Categories (12b11, 14b14), where he talks of underlying things that make statements true and implies that these things (pragmata) are logically structured situations or facts (viz., his sitting and his not sitting are said to underlie the statements “He is sitting” and “He is not sitting”, respectively). Most influential is Aristotle’s claim in De Interpretatione (16a3) that thoughts are “likenesses” (homoiomata) of things. Although he nowhere defines truth in terms of a thought’s likeness to a thing or fact, it is clear that such a definition would fit well into his overall philosophy of mind. (Cf. Crivelli 2004; Szaif 2006.) 1.1 Metaphysical and Semantic Versions In medieval authors we find a division between “metaphysical” and “semantic” versions of the correspondence theory. The former are indebted to the truth-as-likeness theme suggested by Aristotle’s overall views, the latter are modeled on Aristotle’s more austere definition from Metaphysics 1011b25. The metaphysical version presented by Thomas Aquinas is the best known: “Veritas est adaequatio rei et intellectus” (Truth is the equation of thing and intellect), which he restates as: “A judgment is said to be true when it conforms to the external reality”. He tends to use “conformitas” and “adaequatio”, but also uses “correspondentia”, giving the latter a more generic sense (De Veritate, Q.1, A.1-3; cf. Summa Theologiae, Q.16). Aquinas credits the Neoplatonist Isaac Israeli with this definition, but there is no such definition in Isaac. Correspondence formulations can be traced back to the Academic skeptic Carneades, 2nd century B.C., whom Sextus Empiricus (Adversus Mathematicos, vii, 168) reports as having taught that a presentation “is true when it is in accord (symphonos) with the object presented, and false when it is in discord with it”. Similar accounts can be found in various early commentators on Plato and Aristotle (cf. Künne 2003, chap. 3.1), including some Neoplatonists: Proklos (In Tim., II 287, 1) speaks of truth as the agreement or adjustment (epharmoge) between knower and the known. Philoponus (In Cat., 81, 25-34) emphasizes that truth is neither in the things or states of affairs (pragmata) themselves, nor in the statement itself, but lies in the agreement between the two. He gives the simile of the fitting shoe, the fit consisting in a relation between shoe and foot, not to be found in either one by itself. 
Note that his emphasis on the relation as opposed to its relata is laudable but potentially misleading, because x’s truth (its being true) is not to be identified with a relation, R, between x and y, but with a general relational property of x, taking the form (∃y)(xRy & Fy). Further early correspondence formulations can be found in Avicenna (Metaphysica, 1.8-9) and Averroes (Tahafut, 103, 302). They were introduced to the scholastics by William of Auxerre, who may have been the intended recipient of Aquinas’ mistaken attribution (cf. Boehner 1958; Wolenski 1994). Aquinas’ balanced formula “equation of thing and intellect” is intended to leave room for the idea that “true” can be applied not only to thoughts and judgments but also to things or persons (e.g. a true friend). Aquinas explains that a thought is said to be true because it conforms to reality, whereas a thing or person is said to be true because it conforms to a thought (a friend is true insofar as, and because, she conforms to our, or God’s, conception of what a friend ought to be). Medieval theologians regarded both judgment-truth and thing/person-truth as somehow flowing from, or grounded in, the deepest truth which, according to the Bible, is God: “I am the way and the truth and the life” (John 14, 6). Their attempts to integrate this Biblical passage with more ordinary thinking involving truth gave rise to deep metaphysico-theological reflections. The notion of thing/person-truth, which thus played a very important role in medieval thinking, is disregarded by modern and contemporary analytic philosophers but survives to some extent in existentialist and continental philosophy. Medieval authors who prefer a semantic version of the correspondence theory often use a peculiarly truncated formula to render Aristotle’s definition: A (mental) sentence is true if and only if, as it signifies, so it is (sicut significat, ita est). This emphasizes the semantic relation of signification while remaining maximally elusive about what the “it” is that is signified by a true sentence and de-emphasizing the correspondence relation (putting it into the little words “as” and “so”). Foreshadowing a favorite approach of the 20th century, medieval semanticists like Ockham (Summa Logicae, II) and Buridan (Sophismata, II) give exhaustive lists of different truth-conditional clauses for sentences of different grammatical categories. They refrain from associating true sentences in general with items from a single ontological category. (Cf. Moody 1953; McCord Adams 1987; Perler 2006.) Authors of the modern period generally convey the impression that the correspondence theory of truth is far too obvious to merit much, or any, discussion. Brief statements of some version or other can be found in almost all major writers; see e.g.: Descartes 1639, AT II 597; Spinoza, Ethics, axiom vi; Locke, Essay, 4.5.1; Leibniz, New Essays, 4.5.2; Hume, Treatise, 3.1.1; and Kant 1787, B82. Berkeley, who does not seem to offer any account of truth, is a potentially significant exception. Due to the influence of Thomism, metaphysical versions of the theory are much more popular with the moderns than semantic versions. 
But since the moderns generally subscribe to a representational theory of the mind (the theory of ideas), they would seem to be ultimately committed to spelling out relations like correspondence or conformity in terms of a psycho-semantic representation relation holding between ideas, or sentential sequences of ideas (Locke’s “mental propositions”), and appropriate portions of reality, thereby effecting a merger between metaphysical and semantic versions of the correspondence theory. 1.2 Object-Based and Fact-Based Versions It is helpful to distinguish between “object-based” and “fact-based” versions of correspondence theories, depending on whether the corresponding portion of reality is said to be an object or a fact (cf. Künne 2003, chap. 3). Traditional versions of object-based theories assumed that the truth-bearing items (usually taken to be judgments) have subject-predicate structure. An object-based definition of truth might look like this: A judgment is true if and only if its predicate corresponds to its object (i.e., to the object referred to by the subject term of the judgment). Note that this actually involves two relations to an object: (i) a reference relation, holding between the subject term of the judgment and the object the judgment is about (its object); and (ii) a correspondence relation, holding between the predicate term of the judgment and a property of the object. Owing to its reliance on the subject-predicate structure of truth-bearing items, the account suffers from an inherent limitation: it does not cover truthbearers that lack subject-predicate structure (e.g. conditionals, disjunctions), and it is not clear how the account might be extended to cover them. The problem is obvious and serious; it was nevertheless simply ignored in most writings. Object-based correspondence was the norm until relatively recently; it became the norm through Plato’s pivotal engagement with the problem of falsehood, which was apparently notorious at the time. In a number of dialogues, Plato comes up against an argument, advanced by various Sophists, to the effect that false judgment is impossible—roughly: To judge falsely is to judge what is not. But one cannot judge what is not, for it is not there to be judged. To judge something that is not is to judge nothing, hence, not to judge at all. Therefore, false judgment is impossible. (Cf. Euthydemus 283e-288a; Cratylus 429c-e; Republic 478a-c; Theaetetus 188d-190e.) Plato has no good answer to this patent absurdity until the Sophist (236d-264b), where he finally confronts the issue at length. The key step in his solution is the analysis of truthbearers as structured complexes. A simple sentence, such as “Theaetetus sits.”, though simple as a sentence, is still a complex whole consisting of words of different kinds—a name (onoma) and a verb (rhema)—having different functions. By weaving together verbs with names the speaker does not just name a number of things, but accomplishes something: meaningful speech (logos) expressive of the interweaving of ideas (eidon symploken). The simple sentence is true when Theaetetus, the person named by the name, is in the state of sitting, ascribed to him through the verb, and false when Theaetetus is not in that state but in another one (cf. 261c-263d; see Denyer 1991; Szaif 1998). Only things that are show up in this account: in the case of falsehood, the ascribed state still is, but it is a state different from the one Theaetetus is in. 
The account is extended from speech to thought and belief via Plato’s well known thesis that “thought is speech that occurs without voice, inside the soul in conversation with itself” (263e)—the historical origin of the language-of-thought hypothesis. The account does not take into consideration sentences that contain a name of something that is not (“Pegasus flies”), thus bequeathing to posterity a residual problem that would become more notorious than the problem of falsehood. Aristotle, in De Interpretatione, adopts Plato’s account without much ado—indeed, the beginning of De Interpretatione reads like a direct continuation of the passages from the Sophist mentioned above. He emphasizes that truth and falsehood have to do with combination and separation (cf. De Int. 16a10; in De Anima 430a25, he says: “where the alternative of true and false applies, there we always find a sort of combining of objects of thought in a quasi-unity”). Unlike Plato, Aristotle feels the need to characterize simple affirmative and negative statements (predications) separately—translating rather more literally than is usual: “An affirmation is a predication of something toward something, a negation is a predication of something away from something” (De Int. 17a25). This characterization reappears early in the Prior Analytics (24a). It thus seems fair to say that the subject-predicate analysis of simple declarative sentences—the most basic feature of Aristotelian term logic which was to reign supreme for many centuries—had its origin in Plato’s response to a sophistical argument against the possibility of falsehood. One may note that Aristotle’s famous definition of truth (see Section 1) actually begins with the definition of falsehood. Fact-based correspondence theories became prominent only in the 20th century, though one can find remarks in Aristotle that fit this approach (see Section 1)—somewhat surprisingly in light of his repeated emphasis on subject-predicate structure wherever truth and falsehood are concerned. Fact-based theories do not presuppose that the truth-bearing items have subject-predicate structure; indeed, they can be stated without any explicit reference to the structure of truth-bearing items. The approach thus embodies an alternative response to the problem of falsehood, a response that may claim to extricate the theory of truth from the limitations imposed on it through the presupposition of subject-predicate structure inherited from the response to the problem of falsehood favored by Plato, Aristotle, and the medieval and modern tradition. The now classical formulation of a fact-based correspondence theory was foreshadowed by Hume (Treatise, 3.1.1) and Mill (Logic, 1.5.1). It appears in its canonical form early in the 20th century in Moore (1910-11, chap. 15) and Russell: “Thus a belief is true when there is a corresponding fact, and is false when there is no corresponding fact” (1912, p. 129; cf. also his 1905, 1906, 1910, and 1913). The self-conscious emphasis on facts as the corresponding portions of reality—and a more serious concern with problems raised by falsehood—distinguishes this version from its foreshadowings. Russell and Moore’s forceful advocacy of truth as correspondence to a fact was, at the time, an integral part of their defense of metaphysical realism. Somewhat ironically, their formulations are indebted to their idealist opponents, F. H. Bradley (1883, chaps. 1&2), and H. H. 
Joachim (1906); the latter, an early advocate of the competing coherence theory, had set up a correspondence-to-fact account of truth as the main target of his attack on realism. Later, Wittgenstein (1921) and Russell (1918) developed “logical atomism”, which introduces an important modification of the fact-based correspondence approach (see below, Section 7.1). Further modifications of the correspondence theory, bringing a return to more overtly semantic and broadly object-based versions, were influenced by Tarski’s (1935) technical work on truth (cf. Field 1972, Popper 1972). 2. Truthbearers, Truthmakers, Truth 2.1 Truthbearers Correspondence theories of truth have been given for beliefs, thoughts, ideas, judgments, statements, assertions, utterances, sentences, and propositions. It has become customary to talk of truthbearers whenever one wants to stay neutral between these choices. 2.2 Truthmakers Talk of truthmakers serves a function similar, but correlative, to talk of truthbearers. A truthmaker is anything that makes some truthbearer true. Different versions of the correspondence theory will have different, and often competing, views about what sort of items true truthbearers correspond to (facts, states of affairs, events, things, tropes, properties). It is convenient to talk of truthmakers whenever one wants to stay neutral between these choices. 2.3 Truth The abstract noun “truth” has various uses. (a) It can be used to refer to the general relational property otherwise referred to as being true; though the latter label would be more perspicuous, it is rarely used, even in philosophical discussions. (b) The noun “truth” can be used to refer to the concept that “picks out” the property and is expressed in English by the adjective “true”. Some authors do not distinguish between concept and property; others do, or should: an account of the concept might differ significantly from an account of the property. To mention just one example, one might maintain, with some plausibility, that an account of the concept ought to succumb to the liar paradox (see the entry on the liar paradox), otherwise it wouldn’t be an adequate account of our concept of truth; this idea is considerably less plausible in the case of the property. Any proposed “definition of truth” might be intended as a definition of the property or of the concept or both; its author may or may not be alive to the difference. (c) The noun “truth” can be used, finally, to refer to some set of true truthbearers (possibly unknown), as in: “The truth is out there”, and: “The truth about this matter will never be known”. 3. Simple Versions of the Correspondence Theory The traditional centerpiece of any correspondence theory is a definition of truth. Nowadays, a correspondence definition is most likely intended as a “real definition”, i.e., as a definition of the property, which does not commit its advocate to the claim that the definition provides a synonym for the term “true”. Most correspondence theorists would consider it implausible and unnecessarily bold to maintain that “true” means the same as “corresponds with a fact”. 
Some simple forms of correspondence definitions of truth should be distinguished (“iff” means “if and only if”; the variable, “x”, ranges over whatever truthbearers are taken as primary; the notion of correspondence might be replaced by various related notions): (1) x is true iff x corresponds to some fact; x is false iff x does not correspond to any fact. (2) x is true iff x corresponds to some state of affairs that obtains; x is false iff x corresponds to some state of affairs that does not obtain. Both forms invoke portions of reality—facts/states of affairs—that are typically denoted by that-clauses or by sentential gerundives, viz. the fact/state of affairs that snow is white, or the fact/state of affairs of snow’s being white. (2)’s definition of falsehood is committed to there being (existing) entities of this sort that nevertheless fail to obtain, such as snow’s being green. (1)’s definition of falsehood is not so committed: to say that a fact does not obtain means, at best, that there is no such fact, that no such fact exists. It should be noted that this terminology is not standardized: some authors use “state of affairs” much like “fact” is used here (e.g. Armstrong 1997). The question whether non-obtaining beings of the relevant sort are to be accepted is the substantive issue behind such terminological variations. The difference between (2) and (1) is akin to the difference between Platonism about properties (embraces uninstantiated properties) and Aristotelianism about properties (rejects uninstantiated properties). Advocates of (2) hold that facts are states of affairs that obtain, i.e., they hold that their account of truth is in effect an analysis of (1)’s account of truth. So disagreement turns largely on the treatment of falsehood, which (1) simply identifies with the absence of truth. The following points might be made for preferring (2) over (1): (a) Form (2) does not imply that things outside the category of truthbearers (tables, dogs) are false just because they don’t correspond to any facts. One might think this “flaw” of (1) is easily repaired: just put an explicit specification of the desired category of truthbearers into both sides of (1). However, some worry that truthbearer categories, e.g. declarative sentences or propositions, cannot be defined without invoking truth and falsehood, which would make the resultant definition implicitly circular. (b) Form (2) allows for items within the category of truthbearers that are neither true nor false, i.e., it allows for the failure of bivalence. Some, though not all, will regard this as a significant advantage. (c) If the primary truthbearers are sentences or mental states, then states of affairs could be their meanings or contents, and the correspondence relation in (2) could be understood accordingly, as the relation of representation, signification, meaning, or having-as-content. Facts, on the other hand, cannot be identified with the meanings or contents of sentences or mental states, on pain of the absurd consequence that false sentences and beliefs have no meaning or content. (d) Take a truth of the form ‘p or q’, where ‘p’ is true and ‘q’ false. What are the constituents of the corresponding fact? Since ‘q’ is false, they cannot both be facts (cf. Russell 1906-07, p. 47f.). Form (2) allows that the fact corresponding to ‘p or q’ is an obtaining disjunctive state of affairs composed of a state of affairs that obtains and a state of affairs that does not obtain. 
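For readers who find symbols helpful, forms (1) and (2) can be rendered schematically, in the style of the formula used earlier for relational properties (a sketch only; “Fact”, “SOA”, “Obtains”, and “Corr” are labels introduced here purely for illustration, not the entry’s official notation):

(1*) x is true iff (∃f)(Fact(f) & Corr(x,f)); x is false iff ¬(∃f)(Fact(f) & Corr(x,f)).
(2*) x is true iff (∃s)(SOA(s) & Corr(x,s) & Obtains(s)); x is false iff (∃s)(SOA(s) & Corr(x,s) & ¬Obtains(s)).

Note how (2*)’s falsehood clause, unlike (1*)’s, existentially quantifies over states of affairs that fail to obtain; this makes the ontological commitment at issue fully explicit.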
The main point in favor of (1) over (2) is that (1) is not committed to counting non-obtaining states of affairs, like the state of affairs that snow is green, as constituents of reality. (One might observe that, strictly speaking, (1) and (2), being biconditionals, are not ontologically committed to anything. Their respective commitments to facts and states of affairs arise only when they are combined with claims to the effect that there is something that is true and something that is false. The discussion assumes some such claims as given.) Both forms, (1) and (2), should be distinguished from: (3) x is true iff x corresponds to some fact that exists; x is false iff x corresponds to some fact that does not exist, which is a confused version of (1), or a confused version of (2), or, if unconfused, signals commitment to Meinongianism, i.e., the thesis that there are things/facts that do not exist. The lure of (3) stems from the desire to offer more than a purely negative correspondence account of falsehood while avoiding commitment to non-obtaining states of affairs. Moore at times succumbs to (3)’s temptations (1910-11, pp. 267 & 269, but see p. 277). It can also be found in the 1961 translation of Wittgenstein (1921, 4.25), who uses “state of affairs” (Sachverhalt) to refer to (atomic) facts. The translation has Wittgenstein saying that an elementary proposition is false, when the corresponding state of affairs (atomic fact) does not exist—but the German original of the same passage looks rather like a version of (2). Somewhat ironically, a definition of form (3) reintroduces Plato’s problem of falsehood into a fact-based correspondence theory, i.e., into a theory of the sort that was supposed to provide an alternative solution to that very problem (see Section 1.2). A fourth simple form of correspondence definition was popular for a time (cf. Russell 1918, secs. 1 & 3; Broad 1933, IV.2.23; Austin 1950, fn. 23), but seems to have fallen out of favor: (4) x is true iff x corresponds (agrees) with some fact; x is false iff x mis-corresponds (disagrees) with some fact. This formulation attempts to avoid (2)’s commitment to non-obtaining states of affairs and (3)’s commitment to non-existent facts by invoking the relation of mis-correspondence, or disagreement, to account for falsehood. It differs from (1) in that it attempts to keep items outside the intended category of x’s from being false: supposedly, tables and dogs cannot mis-correspond with a fact. Main worries about (4) are: (a) its invocation of an additional, potentially mysterious, relation, which (b) seems difficult to tame: Which fact is the one that mis-corresponds with a given falsehood? and: What keeps a truth, which by definition corresponds with some fact, from also mis-corresponding with some other fact, i.e., from being a falsehood as well? In the following, I will treat definitions (1) and (2) as paradigmatic; moreover, since advocates of (2) agree that obtaining states of affairs are facts, it is often convenient to condense the correspondence theory into the simpler formula provided by (1), “truth is correspondence to a fact”, at least as long as one is not particularly concerned with issues raised by falsehood. 4. Arguments for the Correspondence Theory The main positive argument given by advocates of the correspondence theory of truth is its obviousness. 
Descartes: “I have never had any doubts about truth, because it seems a notion so transcendentally clear that nobody can be ignorant of it...the word ‘truth’, in the strict sense, denotes the conformity of thought with its object” (1639, AT II 597). Even philosophers whose overall views may well lead one to expect otherwise tend to agree. Kant: “The nominal definition of truth, that it is the agreement of [a cognition] with its object, is assumed as granted” (1787, B82). William James: “Truth, as any dictionary will tell you, is a property of certain of our ideas. It means their ‘agreement’, as falsity means their disagreement, with ‘reality’” (1907, p. 96). Indeed, The Oxford English Dictionary tells us: “Truth, n. Conformity with fact; agreement with reality”. In view of its claimed obviousness, it would seem interesting to learn how popular the correspondence theory actually is. There are some empirical data. The PhilPapers Survey (conducted in 2009; cf. Bourget and Chalmers 2014), more specifically, the part of the survey targeting all regular faculty members in 99 leading departments of philosophy, reports the following responses to the question: “Truth: correspondence, deflationary, or epistemic?” Accept or lean toward: correspondence 50.8%; deflationary 24.8%; other 17.5%; epistemic 6.9%. The data suggest that correspondence-type theories may enjoy a weak majority among professional philosophers and that the opposition is divided. This fits with the observation that typically, discussions of the nature of truth take some version of the correspondence theory as the default view, the view to be criticized or to be defended against criticism. Historically, the correspondence theory, usually in an object-based version, was taken for granted, so much so that it did not acquire this name until comparatively recently, and explicit arguments for the view are very hard to find. Since the (comparatively recent) arrival of apparently competing approaches, correspondence theorists have developed negative arguments, defending their view against objections and attacking (sometimes ridiculing) competing views. 5. Objections to the Correspondence Theory Objection 1: Definitions like (1) or (2) are too narrow. Although they apply to truths from some domains of discourse, e.g., the domain of science, they fail for others, e.g. the domain of morality: there are no moral facts. The objection recognizes moral truths, but rejects the idea that reality contains moral facts for moral truths to correspond to. Logic provides another example of a domain that has been “flagged” in this way. The logical positivists recognized logical truths but rejected logical facts. Their intellectual ancestor, Hume, had already given two definitions of “true”, one for logical truths, broadly conceived, the other for non-logical truths: “Truth or falsehood consists in an agreement or disagreement either to the real relations of ideas, or to real existence and matter of fact” (Hume, Treatise, 3.1.1, cf. 2.3.10; see also Locke, Essay, 4.5.6, for a similarly two-pronged account but in terms of object-based correspondence). 
There are four possible responses to objections of this sort: (a) Noncognitivism, which says that, despite appearances to the contrary, claims from the flagged domain are not truth-evaluable to begin with, e.g., moral claims are commands or expressions of emotions disguised as truthbearers; (b) Error theory, which says that all claims from the flagged domain are false; (c) Reductionism, which says that truths from the flagged domain correspond to facts of a different domain regarded as unproblematic, e.g., moral truths correspond to social-behavioral facts, logical truths correspond to facts about linguistic conventions; and (d) Standing firm, i.e., embracing facts of the flagged domain. The objection in effect maintains that there are different brands of truth (of the property being true, not just different brands of truths) for different domains. On the face of it, this conflicts with the observation that there are many obviously valid arguments combining premises from flagged and unflagged domains. The observation is widely regarded as refuting non-cognitivism, once the most popular (concessive) response to the objection. In connection with this objection, one should take note of the recently developed “multiple realizability” view of truth, according to which truth is not to be identified with correspondence to fact but can be realized by correspondence to fact for truthbearers of some domains of discourse and by other properties for truthbearers of other domains of discourse, including “flagged” domains. Though it retains important elements of the correspondence theory, this view does not, strictly speaking, offer a response to the objection on behalf of the correspondence theory and should be regarded as one of its competitors (see below, Section 8.2). Objection 2: Correspondence theories are too obvious. They are trivial, vacuous, trading in mere platitudes. Locutions from the “corresponds to the facts”-family are used regularly in everyday language as idiomatic substitutes for “true”. Such common turns of phrase should not be taken to indicate commitment to a correspondence theory in any serious sense. Definitions like (1) or (2) merely condense some trivial idioms into handy formulas; they don’t deserve the grand label “theory”: there is no theoretical weight behind them (cf. Woozley 1949, chap. 6; Davidson 1969; Blackburn 1984, chap. 7.1). In response, one could point out: (a) Definitions like (1) or (2) are “mini-theories”—mini-theories are quite common in philosophy—and it is not at all obvious that they are vacuous merely because they are modeled on common usage. (b) There are correspondence theories that go beyond these definitions. (c) The complaint implies that definitions like (1) and/or (2) are generally accepted and are, moreover, so shallow that they are compatible with any deeper theory of truth. This makes it rather difficult to explain why some thinkers emphatically reject all correspondence formulations. (d) The objection implies that the correspondence of S’s belief with a fact could be said to consist in, e.g., the belief’s coherence with S’s overall belief system. This is wildly implausible, even on the most shallow understanding of “correspondence” and “fact”. Objection 3: Correspondence theories are too obscure. Objections of this sort, which are the most common, protest that the central notions of a correspondence theory carry unacceptable commitments and/or cannot be accounted for in any respectable manner. 
The objections can be divided into objections primarily aimed at the correspondence relation and its relatives (3.C1, 3.C2), and objections primarily aimed at the notions of fact or state of affairs (3.F1, 3.F2): 3.C1: The correspondence relation must be some sort of resemblance relation. But truthbearers do not resemble anything in the world except other truthbearers—echoing Berkeley’s “an idea can be like nothing but an idea”. 3.C2: The correspondence relation is very mysterious: it seems to reach into the most distant regions of space (faster than light?) and time (past and future). How could such a relation possibly be accounted for within a naturalistic framework? What physical relation could it possibly be? 3.F1: Given the great variety of complex truthbearers, a correspondence theory will be committed to all sorts of complex “funny facts” that are ontologically disreputable. Negative, disjunctive, conditional, universal, probabilistic, subjunctive, and counterfactual facts have all given cause for complaint on this score. 3.F2: All facts, even the most simple ones, are disreputable. Fact-talk, being wedded to that-clauses, is entirely parasitic on truth-talk. Facts are too much like truthbearers. Facts are fictions, spurious sentence-like slices of reality, “projected from true sentences for the sake of correspondence” (Quine 1987, p. 213; cf. Strawson 1950). 6. Correspondence as Isomorphism Some correspondence theories of truth are two-liner mini-theories, consisting of little more than a specific version of (1) or (2). Normally, one would expect a bit more, even from a philosophical theory (though mini-theories are quite common in philosophy). One would expect a correspondence theory to go beyond a mere definition like (1) or (2) and discharge a triple task: it should tell us about the workings of the correspondence relation, about the nature of facts, and about the conditions that determine which truthbearers correspond to which facts. One can approach this by considering some general principles a correspondence theorist might want to add to the central principle in order to flesh out her theory. The first such principle says that the correspondence relation must not collapse into identity—“It takes two to make a truth” (Austin 1950, p. 118): Nonidentity: No truth is identical with a fact correspondence to which is sufficient for its being a truth. It would be much simpler to say that no truth is identical with a fact. However, some authors, e.g. Wittgenstein 1921, hold that a proposition (Satz, his truthbearer) is itself a fact, though not the same fact as the one that makes the proposition true (see also King 2007). Nonidentity is usually taken for granted by correspondence theorists as constitutive of the very idea of a correspondence theory—authors who advance contrary arguments to the effect that correspondence must collapse into identity regard their arguments as objections to any form of correspondence theory (cf. Moore 1901/02, Frege 1918-19, p. 60). Concerning the correspondence relation, two aspects can be distinguished: correspondence as correlation and correspondence as isomorphism (cf. Pitcher 1964; Kirkham 1992, chap. 4). Pertaining to the first aspect, familiar from mathematical contexts, a correspondence theorist is likely to adopt claim (a), and some may in addition adopt claim (b), of: Correlation: (a) Every truth corresponds to exactly one fact; (b) Different truths correspond to different facts. Together, (a) and (b) say that correspondence is a one-one relation. 
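In the same schematic style used above (with “Corr” again an illustrative label), the two parts of Correlation come to:

(a*) (∀x)(x is true → (∃f)(Corr(x,f) & (∀g)(Corr(x,g) → g = f)))
(b*) (∀x)(∀y)((x is true & y is true & x ≠ y) → ¬(∃f)(Corr(x,f) & Corr(y,f)))

Restricted to truths, (a*) makes correspondence functional and (b*) makes it injective; together they make it one-one.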
This seems needlessly strong, and it is not easy to find real-life correspondence theorists who explicitly embrace part (b): Why shouldn’t different truths correspond to the same fact, as long as they are not too different? Explicit commitment to (a) is also quite rare. However, correspondence theorists tend to move comfortably from talk about a given truth to talk about the fact it corresponds to—a move that signals commitment to (a). Correlation does not imply anything about the inner nature of the corresponding items. Contrast this with correspondence as isomorphism, which requires the corresponding items to have the same, or sufficiently similar, constituent structure. This aspect of correspondence, which is more prominent (and more notorious) than the previous one, is also much more difficult to make precise. Let us say, roughly, that a correspondence theorist may want to add a claim to her theory committing her to something like the following: Structure: If an item of kind K corresponds to a certain fact, then they have the same or sufficiently similar structure: the overall correspondence between a true K and a fact is a matter of part-wise correspondences, i.e. of their having corresponding constituents in corresponding places in the same structure, or in sufficiently similar structures. The basic idea is that truthbearers and facts are both complex structured entities: truthbearers are composed of (other truthbearers and ultimately of) words, or concepts; facts are composed of (other facts or states of affairs and ultimately of) things, properties, and relations. The aim is to show how the correspondence relation is generated from underlying relations between the ultimate constituents of truthbearers, on the one hand, and the ultimate constituents of their corresponding facts, on the other. One part of the project will be concerned with these correspondence-generating relations: it will lead into a theory that addresses the question how simple words, or concepts, can be about things, properties, and relations; i.e., it will merge with semantics or psycho-semantics (depending on what the truthbearers are taken to be). The other part of the project, the specifically ontological part, will have to provide identity criteria for facts and explain how their simple constituents combine into complex wholes. Putting all this together should yield an account of the conditions determining which truthbearers correspond to which facts. Correlation and Structure reflect distinct aspects of correspondence. One might want to endorse the former without the latter, though it is hard to see how one could endorse the latter without embracing at least part (a) of the former. The isomorphism approach offers an answer to objection 3.C1. Although the truth that the cat is on the mat does not resemble the cat or the mat (the truth doesn’t meow or smell, etc.), it does resemble the fact that the cat is on the mat. This is not a qualitative resemblance; it is a more abstract, structural resemblance. The approach also puts objection 3.C2 in some perspective. The correspondence relation is supposed to reduce to underlying relations between words, or concepts, and reality. Consequently, a correspondence theory is little more than a spin-off from semantics and/or psycho-semantics, i.e. the theory of intentionality construed as incorporating a representational theory of the mind (cf. Fodor 1989). 
This reminds us that, as a relation, correspondence is no more—but also no less—mysterious than semantic relations in general. Such relations have some curious features, and they raise a host of puzzles and difficult questions—most notoriously: Can they be explained in terms of natural (causal) relations, or do they have to be regarded as irreducibly non-natural aspects of reality? Some philosophers have claimed that semantic relations are too mysterious to be taken seriously, usually on the grounds that they are not explainable in naturalistic terms. But one should bear in mind that this is a very general and extremely radical attack on semantics as a whole, on the very idea that words and concepts can be about things. The common practice of aiming this attack specifically at the correspondence theory seems misleading. As far as the intelligibility of the correspondence relation is concerned, the correspondence theory will stand, or fall, with the general theory of reference and intentionality. It should be noted, though, that these points concerning objections 3.C1 and 3.C2 are not independent of one’s views about the nature of the primary truthbearers. If truthbearers are taken to be sentences of an ordinary language (or an idealized version thereof), or if they are taken to be mental representations (sentences of the language of thought), the above points hold without qualification: correspondence will be a semantic or psycho-semantic relation. If, on the other hand, the primary truthbearers are taken to be propositions, there is a complication: if, as on the Russellian view discussed below, true propositions just are facts, their truth cannot consist in correspondence to further facts. But Russellians don’t usually renounce the correspondence theory entirely. Though they have no room for (1) from Section 3, when applied to propositions as truthbearers, correspondence will enter into their account of truth for sentences, public or mental. The account will take the form of Section 3’s (2), applied to categories of truthbearers other than propositions, where Russellian propositions show up on the right-hand side in the guise of states of affairs that obtain or fail to obtain. Commitment to states of affairs in addition to propositions is sometimes regarded with scorn, as a gratuitous ontological duplication. But Russellians are not committed to states of affairs in addition to propositions, for propositions, on their view, must already be states of affairs. This conclusion is well nigh inevitable, once true propositions have been identified with facts. If a true proposition is a fact, then a false proposition that might have been true would have been a fact, if it had been true. So, a (contingent) false proposition must be the same kind of being as a fact, only not a fact—an unfact; but that just is a non-obtaining state of affairs under a different name. Russellian propositions are states of affairs: the false ones are states of affairs that do not obtain, and the true ones are states of affairs that do obtain. The Russellian view of propositions is popular nowadays. Somewhat curiously, contemporary Russellians hardly ever refer to propositions as facts or states of affairs. This is because they are much concerned with understanding belief, belief attributions, and the semantics of sentences. In such contexts, it is more natural to talk proposition-language than state-of-affairs-language. It feels odd (wrong) to say that someone believes a state of affairs, or that states of affairs are true or false. 
For that matter, it also feels odd (wrong) to say that some propositions are facts, that facts are true, and that propositions obtain or fail to obtain. Nevertheless, all of this must be the literal truth, according to the Russellians. They have to claim that “proposition” and “state of affairs”, much like “evening star” and “morning star”, are different names for the same things—they come with different associations and are at home in somewhat different linguistic environments, which accounts for the felt oddness when one name is transported to the other’s environment. Returning to the isomorphism approach in general, on a strict or naïve implementation of this approach, correspondence will be a one-one relation between truths and corresponding facts, which leaves the approach vulnerable to objections against funny facts (3.F1): each true truthbearer, no matter how complex, will be assigned a matching fact. Moreover, since a strict implementation of isomorphism assigns corresponding entities to all (relevant) constituents of truthbearers, complex facts will contain objects corresponding to the logical constants (“not”, “or”, “if-then”, etc.), and these “logical objects” will have to be regarded as constituents of the world. Many philosophers have found it hard to believe in the existence of all these funny facts and funny quasi-logical objects. The isomorphism approach has never been advocated in a fully naïve form, assigning corresponding objects to each and every wrinkle of our verbal or mental utterings. Instead, proponents try to isolate the “relevant” constituents of truthbearers through meaning analysis, aiming to uncover the logical form, or deep structure, behind ordinary language and thought. This deep structure might then be expressed in an ideal language (typically, the language of predicate logic), whose syntactic structure is designed to mirror perfectly the ontological structure of reality. The resulting view—correspondence as isomorphism between properly analyzed truthbearers and facts—avoids assigning strange objects to such phrases as “the average husband”, “the sake of”, and “the present king of France”; but the view remains committed to logically complex facts and to logical objects corresponding to the logical constants. Austin (1950) rejects the isomorphism approach on the grounds that it projects the structure of our language onto the world. On his version of the correspondence theory (a more elaborated variant of (4) applied to statements), a statement as a whole is correlated to a state of affairs by arbitrary linguistic conventions without mirroring the inner structure of its correlate (cf. also Vision 2004). This approach appears vulnerable to the objection that it avoids funny facts at the price of neglecting systematicity. Language does not provide separate linguistic conventions for each statement: that would require too vast a number of conventions. Rather, it seems that the truth-values of statements are systematically determined, via a relatively small set of conventions, by the semantic values (relations to reality) of their simpler constituents. Recognition of this systematicity is built right into the isomorphism approach. Critics frequently echo Austin’s “projection”-complaint, 3.F2, that a traditional correspondence theory commits “the error of reading back into the world the features of language” (Austin 1950, p. 155; cf. also, e.g., Rorty 1981). 
At bottom, this is a pessimistic stance: if there is a prima facie structural resemblance between a mode of speech or thought and some ontological category, it is inferred, pessimistically, that the ontological category is an illusion, a matter of us projecting the structure of our language or thought into the world. Advocates of traditional correspondence theories can be seen as taking the opposite stance: unless there are specific reasons to the contrary, they are prepared to assume, optimistically, that the structure of our language and/or thought reflects genuine ontological categories, that the structure of our language and/or thought is, at least to a significant extent, the way it is because of the structure of the world. 7. Modified Versions of the Correspondence Theory 7.1 Logical Atomism Wittgenstein (1921) and Russell (1918) propose modified fact-based correspondence accounts of truth as part of their program of logical atomism. Such accounts proceed in two stages. At the first stage, the basic truth-definition, say (1) from Section 3, is restricted to a special subclass of truthbearers, the so-called elementary or atomic truthbearers, whose truth is said to consist in their correspondence to (atomic) facts: if x is elementary, then x is true iff x corresponds to some (atomic) fact. This restricted definition serves as the base-clause for truth-conditional recursion-clauses given at the second stage, at which the truth-values of non-elementary, or molecular, truthbearers are explained recursively in terms of their logical structure and the truth-values of their simpler constituents. For example: a sentence of the form ‘not-p’ is true iff ‘p’ is false; a sentence of the form ‘p and q’ is true iff ‘p’ is true and ‘q’ is true; a sentence of the form ‘p or q’ is true iff ‘p’ is true or ‘q’ is true, etc. These recursive clauses (called “truth conditions”) can be reapplied until the truth of a non-elementary, molecular sentence of arbitrary complexity is reduced to the truth or falsehood of its elementary, atomic constituents. Logical atomism exploits the familiar rules, enshrined in the truth-tables, for evaluating complex formulas on the basis of their simpler constituents. These rules can be understood in two different ways: (a) as tracing the ontological relations between complex facts and constituent simpler facts, or (b) as tracing logico-semantic relations, exhibiting how the truth-values of complex sentences can be explained in terms of their logical relations to simpler constituent sentences together with the correspondence and non-correspondence of simple, elementary sentences to atomic facts. Logical atomism takes option (b). Logical atomism is designed to go with the ontological view that the world is the totality of atomic facts (cf. Wittgenstein 1921, 2.04); thus accommodating objection 3.F1 by doing without funny facts: atomic facts are all the facts there are—although real-life atomists tend to allow conjunctive facts, regarding them as mere aggregates of atomic facts. An elementary truth is true because it corresponds to an atomic fact: correspondence is still isomorphism, but it holds exclusively between elementary truths and atomic facts. There is no match between truths and facts at the level of non-elementary, molecular truths; e.g., ‘p’, ‘p or q’, and ‘p or r’ might all be true merely because ‘p’ corresponds to a fact. The trick for avoiding logically complex facts lies in not assigning any entities to the logical constants. 
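The two-stage strategy is, in effect, a recursive algorithm, and a toy model may make its shape vivid. The following sketch is merely illustrative, not anyone’s official theory; the sample sentences and the names atomic_facts and is_true are invented for the example. The “world” contains only atomic facts, elementary sentences are true iff they match one, and the connectives receive recursion clauses but no corresponding entities:

# Toy model of logical atomism's two-stage truth definition (illustrative only).
# Stage 1 (base clause): an elementary sentence is true iff it corresponds to an
# atomic fact; correspondence is modeled here as simple set membership.
# Stage 2 (recursion): molecular sentences are evaluated from their logical
# structure; "not", "and", "or" are assigned no entity in the "world".

atomic_facts = {"snow is white", "grass is green"}

def is_true(sentence):
    """Evaluate a string (elementary) or a tuple ('not', s), ('and', s1, s2), ('or', s1, s2)."""
    if isinstance(sentence, str):
        return sentence in atomic_facts      # truth = correspondence to an atomic fact
    op = sentence[0]
    if op == "not":
        return not is_true(sentence[1])      # 'not-p' is true iff 'p' is false
    if op == "and":
        return is_true(sentence[1]) and is_true(sentence[2])
    if op == "or":
        return is_true(sentence[1]) or is_true(sentence[2])
    raise ValueError(f"unknown connective: {op}")

# 'p', 'p or q', and 'p or r' can all be true merely because 'p' corresponds to a fact:
assert is_true("snow is white")
assert is_true(("or", "snow is white", "pigs fly"))
assert is_true(("or", "snow is white", "grass is blue"))
# A true negation is matched by no fact in this model (cf. problem (d) below):
assert is_true(("not", "pigs fly"))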
Logical complexity, so the idea goes, belongs to the structure of language and/or thought; it is not a feature of the world. This is expressed by Wittgenstein in an often quoted passage (1921, 4.0312): “My fundamental idea is that the ‘logical constants’ are not representatives; that there can be no representatives of the logic of facts”; and also by Russell (1918, p. 209f.): “You must not look about the real world for an object which you can call ‘or’, and say ‘Now look at this. This is ‘or’’”. Though accounts of this sort are naturally classified as versions of the correspondence theory, it should be noted that they are strictly speaking in conflict with the basic forms presented in Section 3. According to logical atomism, it is not the case that for every truth there is a corresponding fact. It is, however, still the case that the being true of every truth is explained in terms of correspondence to a fact (or non-correspondence to any fact) together with (in the case of molecular truths) logical notions detailing the logical structure of complex truthbearers. Logical atomism attempts to avoid commitment to logically complex, funny facts via structural analysis of truthbearers. It should not be confused with a superficially similar account maintaining that molecular facts are ultimately constituted by atomic facts. The latter account would admit complex facts, offering an ontological analysis of their structure, and would thus be compatible with the basic forms presented in Section 3, because it would be compatible with the claim that for every truth there is a corresponding fact. (For more on classical logical atomism, see Wisdom 1931-1933, Urmson 1953, and the entries on Russell's logical atomism and Wittgenstein's logical atomism in this encyclopedia.) While Wittgenstein and Russell seem to have held that the constituents of atomic facts are to be determined on the basis of a priori considerations, Armstrong (1997, 2004) advocates an a posteriori form of logical atomism. On his view, atomic facts are composed of particulars and simple universals (properties and relations). The latter are objective features of the world that ground the objective resemblances between particulars and explain their causal powers. Accordingly, what particulars and universals there are will have to be determined on the basis of total science. Problems: Logical atomism is not easy to sustain and has rarely been held in a pure form. Among its difficulties are the following: (a) What, exactly, are the elementary truthbearers? How are they determined? (b) There are molecular truthbearers, such as subjunctives and counterfactuals, that tend to provoke the funny-fact objection but cannot be handled by simple truth-conditional clauses, because their truth-values do not seem to be determined by the truth-values of their elementary constituents. (c) Are there universal facts corresponding to true universal generalizations? Wittgenstein (1921) disapproves of universal facts; apparently, he wants to re-analyze universal generalizations as infinite conjunctions of their instances. Russell (1918) and Armstrong (1997, 2004) reject this analysis; they admit universal facts. (d) Negative truths are the most notorious problem case, because they clash with an appealing principle, the “truthmaker principle” (cf. Section 8.5), which says that for every truth there must be something in the world that makes it true, i.e., every true truthbearer must have a truthmaker. Suppose ‘p’ is elementary. 
On the account given above, ‘not-p’ is true iff ‘p’ is false iff ‘p’ does not correspond to any fact; hence, ‘not-p’, if true, is not made true by any fact: it does not seem to have a truthmaker. Russell finds himself driven to admit negative facts, regarded by many as paradigmatically disreputable portions of reality. Wittgenstein sometimes talks of atomic facts that do not exist and calls their very nonexistence a negative fact (cf. 1921, 2.06)—but this is hardly an atomic fact itself. Armstrong (1997, chap. 8.7; 2004, chaps. 5-6) holds that negative truths are made true by a second-order “totality fact” which says of all the (positive) first-order facts that they are all the first-order facts. Atomism and the Russellian view of propositions (see Section 6). By the time Russell advocated logical atomism (around 1918), he had given up on what is now referred to as the Russellian conception of propositions (which he and G. E. Moore held around 1903). But Russellian propositions are popular nowadays. Note that logical atomism is not for the friends of Russellian propositions. The argument is straightforward. We have logically complex beliefs some of which are true. According to the friends of Russellian propositions, the contents of our beliefs are Russellian propositions, and the contents of our true beliefs are true Russellian propositions. Since true Russellian propositions are facts, there must be at least as many complex facts as there are true beliefs with complex contents (and at least as many complex states of affairs as there are true or false beliefs with complex contents). Atomism may work for sentences, public or mental, and for Fregean propositions; but not for Russellian propositions. Logical atomism is designed to address objections to funny facts (3.F1). It is not designed to address objections to facts in general (3.F2). Here logical atomists will respond by defending (atomic) facts. According to one defense, facts are needed because mere objects are not sufficiently articulated to serve as truthmakers. If a were the sole truthmaker of ‘a is F’, then the latter should imply ‘a is G’, for any ‘G’. So the truthmaker for ‘a is F’ needs at least to involve a and Fness. But since Fness is a universal, it could be instantiated in another object, b, hence the mere existence of a and Fness is not sufficient for making true the claim ‘a is F’: a and Fness need to be tied together in the fact of a’s being F. Armstrong (1997) and Olson (1987) also maintain that facts are needed to make sense of the tie that binds particular objects to universals. In this context it is usually emphasized that facts do not supervene on, hence, are not reducible to, their constituents. Facts are entities over and above the particulars and universals of which they are composed: a’s loving b and b’s loving a are not the same fact even though they have the very same constituents. Another defense of facts, surprisingly rare, would point out that many facts are observable: one can see that the cat is on the mat; and this is different from seeing the cat, or the mat, or both. The objection that many facts are not observable would invite the rejoinder that many objects are not observable either. (See Austin 1961, Vendler 1967, chap. 5, and Vision 2004, chap. 3, for more discussion of anti-fact arguments; see also the entry on facts in this encyclopedia.) 
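The first of these defenses trades on a small derivation, which can be set out step by step (an informal reconstruction, not the entry’s own formulation; ‘□’ abbreviates ‘necessarily’ and ‘E!a’ abbreviates ‘a exists’):

1. Suppose a alone makes ‘a is F’ true, i.e., □(E!a → ‘a is F’ is true).
2. The condition in step 1 nowhere involves Fness; only a’s existence matters.
3. By parity of reasoning, a alone would equally make ‘a is G’ true, for any ‘G’: □(E!a → ‘a is G’ is true). Absurd.
4. So the truthmaker of ‘a is F’ must involve Fness as well as a.
5. But a and Fness could both exist while only some other object b instantiates Fness.
6. So a and Fness must be tied together in the truthmaker: the fact of a’s being F.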
Some atomists propose an atomistic version of definition (1), but without facts, because they regard facts as slices of reality too suspiciously sentence-like to be taken with full ontological seriousness. Instead, they propose events and/or objects-plus-tropes (a.k.a. modes, particularized qualities, moments) as the corresponding portions of reality. It is claimed that these items are more “thingy” than facts but still sufficiently articulated—and sufficiently abundant—to serve as adequate truthmakers (cf. Mulligan, Simons, and Smith 1984). 7.2 Logical “Subatomism” Logical atomism aims at getting by without logically complex truthmakers by restricting definitions like (1) or (2) from Section 3 to elementary truthbearers and accounting for the truth-values of molecular truthbearers recursively in terms of their logical structure and atomic truthmakers (atomic facts, events, objects-plus-tropes). More radical modifications of the correspondence theory push the recursive strategy even further, entirely discarding definitions like (1) or (2), and hence the need for atomic truthmakers, by going, as it were, “subatomic”. Such accounts analyze truthbearers, e.g., sentences, into their subsentential constituents and dissolve the relation of correspondence into appropriate semantic subrelations: names refer to, or denote, objects; predicates (open sentences) apply to, or are satisfied by, objects. Satisfaction of complex predicates can be handled recursively in terms of logical structure and satisfaction of simpler constituent predicates: an object o satisfies ‘x is not F’ iff o does not satisfy ‘x is F’; o satisfies ‘x is F or x is G’ iff o satisfies ‘x is F’ or o satisfies ‘x is G’; and so on. These recursions are anchored in a base-clause addressing the satisfaction of primitive predicates: an object o satisfies ‘x is F’ iff o instantiates the property expressed by ‘F’. Some would prefer a more nominalistic base-clause for satisfaction, hoping to get by without seriously invoking properties. Truth for singular sentences, consisting of a name and an arbitrarily complex predicate, is defined thus: A singular sentence is true iff the object denoted by the name satisfies the predicate. Logical machinery provided by Tarski (1935) can be used to turn this simplified sketch into a more general definition of truth—a definition that handles sentences containing relational predicates and quantifiers and covers molecular sentences as well. Whether Tarski’s own definition of truth can be regarded as a correspondence definition, even in this modified sense, is under debate (cf. Popper 1972; Field 1972, 1986; Kirkham 1992, chaps. 5-6; Soames 1999; Künne 2003, chap. 4; Patterson 2008). Subatomism constitutes a return to (broadly) object-based correspondence. Since it promises to avoid facts and all similarly articulated, sentence-like slices of reality, correspondence theorists who take seriously objection 3.F2 favor this approach: not even elementary truthbearers are assigned any matching truthmakers. The correspondence relation itself has given way to two semantic relations between constituents of truthbearers and objects: reference (or denotation) and satisfaction—relations central to any semantic theory. Some advocates envision causal accounts of reference and satisfaction (cf. Field 1972; Devitt 1982, 1984; Schmitt 1995; Kirkham 1992, chaps. 5-6). It turns out that relational predicates require talk of satisfaction by ordered sequences of objects. 
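As with atomism above, a toy sketch may help fix ideas. It is again purely illustrative: reference, extension, satisfies, and is_true are invented names, and relational predicates with their sequences of objects are omitted for brevity. Truth is defined via reference and satisfaction alone, and no fact-like entity appears anywhere in the model:

# Toy model of the subatomic (satisfaction-based) strategy (illustrative only).
# Reference: names denote objects. Satisfaction: objects satisfy predicates.
# The correspondence relation has dissolved into these two semantic subrelations.

reference = {"Theaetetus": "theaetetus", "Socrates": "socrates"}
extension = {"sits": {"theaetetus"}, "flies": set()}   # objects satisfying each primitive predicate

def satisfies(obj, pred):
    """Recursive satisfaction: pred is a string (primitive) or ('not', p) / ('or', p1, p2)."""
    if isinstance(pred, str):
        return obj in extension[pred]        # base clause for primitive predicates
    op = pred[0]
    if op == "not":
        return not satisfies(obj, pred[1])
    if op == "or":
        return satisfies(obj, pred[1]) or satisfies(obj, pred[2])
    raise ValueError(f"unknown predicate form: {op}")

def is_true(name, pred):
    """A singular sentence is true iff the object denoted by the name satisfies the predicate."""
    return satisfies(reference[name], pred)

assert is_true("Theaetetus", "sits")
assert is_true("Socrates", ("not", "flies"))           # no negative fact is invoked
assert is_true("Theaetetus", ("or", "sits", "flies"))  # no disjunctive fact is invoked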
Davidson (1969, 1977) maintains that satisfaction by sequences is all that remains of the traditional idea of correspondence to facts; he regards reference and satisfaction as “theoretical constructs” not in need of causal, or any, explanation. Problems: (a) The subatomistic approach accounts for the truth-values of molecular truthbearers in the same way as the atomistic approach; consequently, molecular truthbearers that are not truth-functional still pose the same problems as in atomism. (b) Belief attributions and modal claims pose special problems; e.g., it seems that “believes” is a relational predicate, so that “John believes that snow is white” is true iff “believes” is satisfied by John and the object denoted by “that snow is white”; but the latter appears to be a proposition or state of affairs, which threatens to let in through the back-door the very sentence-like slices of reality the subatomic approach was supposed to avoid, thus undermining the motivation for going subatomic. (c) The phenomenon of referential indeterminacy threatens to undermine the idea that the truth-values of elementary truthbearers are always determined by the denotation and/or satisfaction of their constituents; e.g., pre-relativistic uses of the term “mass” are plausibly taken to lack determinate reference (referring determinately neither to relativistic mass nor to rest mass); yet a claim like “The mass of the earth is greater than the mass of the moon” seems to be determinately true even when made by Newton (cf. Field 1973). Problems for both versions of modified correspondence theories: (a) It is not known whether an entirely general recursive definition of truth, one that covers all truthbearers, can be made available. This depends on unresolved issues concerning the extent to which truthbearers are amenable to the kind of structural analyses that are presupposed by the recursive clauses. The more an account of truth wants to exploit the internal structure of truthbearers, the more it will be hostage to the (limited) availability of appropriate structural analyses of the relevant truthbearers. (b) Any account of truth employing a recursive framework may be virtually committed to taking sentences (maybe sentences of the language of thought) as primary truthbearers. After all, the recursive clauses rely heavily on what appears to be the logico-syntactic structure of truthbearers, and it is unclear whether anything but sentences can plausibly be said to possess that kind of structure. But the thesis that sentences of any sort are to be regarded as the primary truthbearers is contentious. Whether propositions can meaningfully be said to have an analogous (albeit non-linguistic) structure is under debate (cf. Russell 1913, King 2007). (c) If clauses like “‘p or q’ is true iff ‘p’ is true or ‘q’ is true” are to be used in a recursive account of our notion of truth, as opposed to some other notion, it has to be presupposed that ‘or’ expresses disjunction: one cannot define “or” and “true” at the same time. To avoid circularity, a modified correspondence theory (be it atomic or subatomic) must hold that the logical connectives can be understood without reference to correspondence truth. 7.3 Relocating Correspondence Definitions like (1) and (2) from Section 3 assume, naturally, that truthbearers are true because they, the truthbearers themselves, correspond to facts. There are however views that reject this natural assumption. 
They propose to account for the truth of truthbearers of certain kinds, propositions, not by way of their correspondence to facts, but by way of the correspondence to facts of other items, the ones that have propositions as their contents. Consider the state of believing that p (or the activity of judging that p). The state (the activity) is not, strictly speaking, true or false; rather, what is true or false is its content, the proposition that p. Nevertheless, on the present view, it is the state of believing that p that corresponds or fails to correspond to a fact. So truth/falsehood for propositions can be defined in the following manner: x is a true/false proposition iff there is a belief state B such that x is the content of B and B corresponds/fails to correspond to a fact. Such a modification of fact-based correspondence can be found in Moore (1927, p. 83) and Armstrong (1973, 4.iv & 9). It can be adapted to atomistic (Armstrong) and subatomistic views, and to views on which sentences (of the language of thought) are the primary bearers of truth and falsehood. However, by taking the content-carrying states as the primary corresponders, it entails that there are no truths/falsehoods that are not believed by someone. Most advocates of propositions as primary bearers of truth and falsehood will regard this as a serious weakness, holding that there are very many true and false propositions that are not believed, or even entertained, by anyone. Armstrong (1973) combines the view with an instrumentalist attitude towards propositions, on which propositions are mere abstractions from mental states and should not be taken seriously, ontologically speaking. 8. The Correspondence Theory and Its Competitors 8.1 Traditional Competitors Against the traditional competitors—coherentist, pragmatist, and verificationist and other epistemic theories of truth—correspondence theorists raise two main sorts of objections. First, such accounts tend to lead into relativism. Take, e.g., a coherentist account of truth. Since it is possible that ‘p’ coheres with the belief system of S while ‘not-p’ coheres with the belief system of S*, the coherentist account seems to imply, absurdly, that contradictories, ‘p’ and ‘not-p’, could both be true. To avoid embracing contradictions, coherentists often commit themselves (if only covertly) to the objectionable relativistic view that ‘p’ is true-for-S and ‘not-p’ is true-for-S*. Second, the accounts tend to lead into some form of idealism or anti-realism, e.g., it is possible for the belief that p to cohere with someone’s belief system, even though it is not a fact that p; also, it is possible for it to be a fact that p, even if no one believes that p at all or if the belief does not cohere with anyone’s belief system. Cases of this sort are frequently cited as counterexamples to coherentist accounts of truth. Dedicated coherentists tend to reject such counterexamples, insisting that they are not possible after all. Since it is hard to see why they would not be possible, unless its being a fact that p were determined by the belief’s coherence with other beliefs, this reaction commits them to the anti-realist view that the facts are (largely) determined by what we believe. This offers a bare outline of the overall shape the debates tend to take. For more on the correspondence theory vs. its traditional competitors see, e.g., Vision 1988; Kirkham 1992, chaps. 3, 7-8; Schmitt 1995; Künne 2003, chap. 7; and essays in Lynch 2001. 
Walker 1989 is a book-length discussion of coherence theories of truth. See also the entries on pragmatism, relativism, and the coherence theory of truth in this encyclopedia. 8.2 Pluralism The correspondence theory is sometimes accused of overreaching itself: it does apply, so the objection goes, to truths from some domains of discourse, e.g., scientific discourse and/or discourse about everyday midsized physical things, but not to truths from various other domains of discourse, e.g., ethical and/or aesthetic discourse (see the first objection in Section 5 above). Alethic pluralism grows out of this objection, maintaining that truth is constituted by different properties for true propositions from different domains of discourse: by correspondence to fact for true propositions from the domain of scientific or everyday discourse about physical things; by some epistemic property, such as coherence or superassertibility, for true propositions from the domain of ethical and aesthetic discourse; and maybe by still other properties for other domains of discourse. This suggests a position on which the term “true” is multiply ambiguous, expressing different properties when applied to propositions from different domains. However, contemporary pluralists reject this problematic idea, maintaining instead that truth is “multiply realizable”. That is, the term “true” is univocal, it expresses one concept or property, truth (being true), but one that can be realized by or manifested in different properties (correspondence to fact, coherence or superassertibility, and maybe others) for true propositions from different domains of discourse. Truth itself is not to be identified with any of its realizing properties. Instead, it is characterized, quasi-axiomatically, by a set of alleged “platitudes”, including, according to Crispin Wright’s (1999) version, “transparency” (to assert is to present as true), “contrast” (a proposition may be true without being justified, and v.v.), “timelessness” (if a proposition is ever true, then it always is), “absoluteness” (there is no such thing as a proposition being more or less true), and others. Though it contains the correspondence theory as one ingredient, alethic pluralism is nevertheless a genuine competitor, for it rejects the thesis that truth is correspondence to reality. Moreover, it equally contains competitors of the correspondence theory as further ingredients. Alethic pluralism in its contemporary form is a relatively young position. It was inaugurated by Crispin Wright (1992; see also 1999) and was later developed into a somewhat different form by Lynch (2009). Critical discussion is still at a relatively nascent stage (but see Vision 2004, chap. 4, for extended discussion of Wright). It will likely focus on two main problem areas. First, it seems difficult to sort propositions into distinct kinds according to the subject matter they are about. Take, e.g., the proposition that killing is morally wrong, or the proposition that immoral acts happen in space-time. What are they about? Intuitively, their subject matter is mixed, belonging to the physical domain, the biological domain, and the domain of ethical discourse. It is hard to see how pluralism can account for the truth of such mixed propositions, belonging to more than one domain of discourse: What will be the realizing property? Second, pluralists are expected to explain how the platitudes can be “converted” into an account of truth itself.
Lynch (2009) proposes to construe truth as a functional property, defined in terms of a complex functional role which is given by the conjunction of the platitudes (somewhat analogous to the way in which functionalists in the philosophy of mind construe mental states as functional states, specified in terms of their functional roles—though in their case the relevant functional roles are causal roles, which is not a feasible option when it comes to the truth-role). Here the main issue will be to determine (a) whether such an account really works, when the technical details are laid out, and (b) whether it is plausible to claim that properties as different as correspondence to a fact, on the one hand, and coherence or superassertibility, on the other, can be said to play one and the same role—a claim that seems required by the thesis that these different properties all realize the same property, being true. For more on pluralism, see e.g. the essays in Monnoyer (2007) and in Pedersen & Wright (2013); and the entry on pluralist theories of truth in this encyclopedia. 8.3 The Identity Theory of Truth According to the identity theory of truth, true propositions do not correspond to facts, they are facts: the true proposition that snow is white = the fact that snow is white. This non-traditional competitor of the correspondence theory threatens to collapse the correspondence relation into identity. (See Moore 1901-02; and Dodd 2000 for a book-length defense of this theory and discussion contrasting it with the correspondence theory; and see the entry on the identity theory of truth in this encyclopedia.) In response, a correspondence theorist will point out: (a) The identity theory is defensible only for propositions as truthbearers, and only for propositions construed in a certain way, namely as having objects and properties as constituents rather than ideas or concepts of objects and properties; that is, for Russellian propositions. Hence, there will be ample room (and need) for correspondence accounts of truth for other types of truthbearers, including propositions, if they are construed as constituted, partly or wholly, of concepts of objects and properties. (b) The identity theory is committed to the unacceptable consequence that facts are true. (c) The identity theory rests on the assumption that that-clauses always denote propositions, so that the that-clause in “the fact that snow is white” denotes the proposition that snow is white. The assumption can be questioned. That-clauses can be understood as ambiguous names, sometimes denoting propositions and sometimes denoting facts. The descriptive phrases “the proposition…” and “the fact…” can be regarded as serving to disambiguate the succeeding ambiguous that-clauses—much like the descriptive phrases in “the philosopher Socrates” and “the soccer-player Socrates” serve to disambiguate the ambiguous name “Socrates” (cf. David 2002).
A correspondence-type formulation like (5) “Snow is white” is true iff it corresponds to the fact that snow is white, is to be deflated to (6) “Snow is white” is true iff snow is white, which, according to deflationists, says all there is to be said about the truth of “Snow is white”, without superfluous embellishments (cf. Quine 1987, p. 213). Correspondence theorists protest that (6) cannot lead to anything deserving to be regarded as an account of truth. It is concerned with only one particular sentence (“Snow is white”), and it resists generalization. (6) is a substitution instance of the schema (7) “p” is true iff p, which does not actually say anything itself (it is not truth-evaluable) and cannot be turned into a genuine generalization about truth, because of its essential reliance on the schematic letter “p”, a mere placeholder. The attempt to turn (7) into a generalization produces nonsense along the lines of “For every x, “x” is true iff x”, or requires invocation of truth: “Every substitution instance of the schema ““p” is true iff p” is true”. Moreover, no genuine generalizations about truth can be accounted for on the basis of (7). Correspondence definitions, on the other hand, do yield genuine generalizations about truth. Note that definitions like (1) and (2) in Section 3 employ ordinary objectual variables (not mere schematic placeholders); the definitions are easily turned into genuine generalizations by prefixing the quantifier phrase “For every x”, which is customarily omitted in formulations intended as definitions. It should be noted that the deflationist’s starting point, (5), which lends itself to deflating excisions, actually misrepresents the correspondence theory. According to (5), corresponding to the fact that snow is white is sufficient and necessary for “Snow is white” to be true. Yet, according to (1) and (2), it is sufficient but not necessary: “Snow is white” will be true as long as it corresponds to some fact or other. The genuine article, (1) or (2), is not as easily deflated as the impostor (5). The debate turns crucially on the question whether anything deserving to be called an “account” or “theory” of truth ought to take the form of a genuine generalization (and ought to be able to account for genuine generalizations involving truth). Correspondence theorists tend to regard this as a (minimal) requirement. Deflationists argue that truth is a shallow (sometimes “logical”) notion—a notion that has no serious explanatory role to play: as such it does not require a full-fledged account, a real theory, that would have to take the form of a genuine generalization. There is now a substantial body of literature on truth-deflationism in general and its relation to the correspondence theory in particular; the following is a small selection: Quine 1970, 1987; Devitt 1984; Field 1986; Horwich 1990 & 1998 (2nd ed.); Kirkham 1992; Gupta 1993; David 1994, 2008; Schmitt 1995; Künne 2003, chap. 4; Rami 2009. Relevant essays are contained in Blackburn and Simmons 1999; Schantz 2002; Armour-Garb and Beall 2005; and Wright and Pedersen 2010. See also the entry on the deflationary theory of truth in this encyclopedia. 8.5 Truthmaker Theory This approach centers on the truthmaker or truthmaking principle: Every truth has a truthmaker; or alternatively: For every truth there is something that makes it true. The principle is usually understood as an expression of a realist attitude, emphasizing the crucial contribution the world makes to the truth of a proposition.
Advocates tend to treat truthmaker theory primarily as a guide to ontology, asking: To entities of what ontological categories are we committed as truthmakers of the propositions we accept as true? Most advocates maintain that propositions of different logical types can be made true by items from different ontological categories: e.g., propositions of some types are made true by facts, others just by individual things, others by events, others by tropes (cf., e.g. Armstrong 1997). This is claimed as a significant improvement over traditional correspondence theories which are understood—correctly in most but by no means all cases—to be committed to all truthmakers belonging to a single ontological category (albeit disagreeing about which category that is). All advocates of truthmaker theory maintain that the truthmaking relation is not one-one but many-many: some truths are made true by more than one truthmaker; some truthmakers make true more than one truth. This is also claimed as a significant improvement over traditional correspondence theories which are often portrayed as committed to correspondence being a one-one relation. This portrayal is only partly justified. While it is fairly easy to find real-life correspondence theorists committing themselves to the view that each truth corresponds to exactly one fact (at least by implication, talking about the corresponding fact), it is difficult to find real-life correspondence theorists committing themselves to the view that only one truth can correspond to a given fact (but see Moore 1910-11, p. 256). A truthmaker theory may be presented as a competitor to the correspondence theory or as a version of the correspondence theory. This depends considerably on how narrowly or broadly one construes “correspondence theory”, i.e. on terminological issues. Some advocates would agree with Dummett (1959, p. 14) who said that, although “we have nowadays abandoned the correspondence theory of truth”, it nevertheless “expresses one important feature of the concept of truth…: that a statement is true only if there is something in the world in virtue of which it is true”. Other advocates would follow Armstrong who tends to present his truthmaker theory as a liberal form of correspondence theory; indeed, he seems committed to the view that the truth of a (contingent) elementary proposition consists in its correspondence with some (atomic) fact (cf. Armstrong 1997; 2004, pp. 22-3, 48-50). It is not easy to find a substantive difference between truthmaker theory and various brands of the sort of modified correspondence theory treated above under the heading “Logical Atomism” (see Section 7.1). Logical atomists, such as Russell (1918) and Wittgenstein (1921), will hold that the truth or falsehood of every truth-value bearer can be explained in terms of (can be derived from) logical relations between truth-value bearers, by way of the recursive clauses, together with the base clauses, i.e., the correspondence and non-correspondence of elementary truth-value bearers with facts. This recursive strategy could be pursued with the aim to reject the truthmaker principle: not all truths have truthmakers, only elementary truths have truthmakers (here understood as corresponding atomic facts). 
But it could also be pursued—and this seems to have been Russell’s intention at the time—with the aim to secure the truthmaker principle, even though the simple correspondence definition has been abandoned: not every truth corresponds to a fact, only elementary truths do, but every truth has a truthmaker; where the recursive clauses are supposed to show how truthmaking without correspondence, but grounded in correspondence, comes about. There is one straightforward difference between truthmaker theory and most correspondence theories. The latter are designed to answer the question “What is truth?”. Simple (unmodified) correspondence theories center on a biconditional, such as “x is true iff x corresponds to a fact”, intended to convey a definition of truth (at least a “real definition” which does not commit them to the claim that the term “true” is synonymous with “corresponds to a fact”—especially nowadays most correspondence theorists would consider such a claim to be implausibly and unnecessarily bold). Modified correspondence theories also aim at providing a definition of truth, though in their case the definition will be considerably more complex, owing to the recursive character of the account. Truthmaker theory, on the other hand, centers on the truthmaker principle: For every truth there is something that makes it true. Though this principle will deliver the biconditional “x is true iff something makes x true” (since “something makes x true” trivially implies “x is true”), this does not yield a promising candidate for a definition of truth: defining truth in terms of truthmaking would appear to be circular. Unlike most correspondence theories, truthmaker theory is not equipped, and usually not designed, to answer the question “What is truth?”—at least not if one expects the answer to take the form of a feasible candidate for a definition of truth. There is a growing body of literature on truthmaker theory; see for example: Russell 1918; Mulligan, Simons, and Smith 1984; Fox 1987; Armstrong 1997, 2004; Merricks 2007; and the essays in Beebee and Dodd 2005; Monnoyer 2007; and in Lowe and Rami 2009. See also the entry on truthmakers in this encyclopedia. 9. More Objections to the Correspondence Theory Two final objections to the correspondence theory deserve separate mention. 9.1 The Big Fact Inspired by an allegedly similar argument of Frege’s, Davidson (1969) argues that the correspondence theory is bankrupt because it cannot avoid the consequence that all true sentences correspond to the same fact: the Big Fact. The argument is based on two crucial assumptions: (i) Logically equivalent sentences can be substituted salva veritate in the context ‘the fact that...’; and (ii) If two singular terms denoting the same thing can be substituted for each other in a given sentence salva veritate, they can still be so substituted if that sentence is embedded within the context ‘the fact that...’. In the version below, the relevant singular terms will be the following: ‘(the x such that x = Diogenes & p)’ and ‘(the x such that x = Diogenes & q)’. Now, assume that a given sentence, s, corresponds to the fact that p; and assume that ‘p’ and ‘q’ are sentences with the same truth-value.
We have:
(a) s corresponds to the fact that p,
which, by (i), implies
(b) s corresponds to the fact that (the x such that x = Diogenes & p) = (the x such that x = Diogenes),
which, by (ii), implies
(c) s corresponds to the fact that (the x such that x = Diogenes & q) = (the x such that x = Diogenes),
which, by (i), implies
(d) s corresponds to the fact that q.
Since the only restriction on ‘q’ was that it have the same truth-value as ‘p’, it would follow that any sentence s that corresponds to any fact corresponds to every fact; so that all true sentences correspond to the same facts, thereby proving the emptiness of the correspondence theory—the conclusion of the argument is taken as tantamount to the conclusion that every true sentence corresponds to the totality of all the facts, i.e., the Big Fact, i.e., the world as a whole. This argument belongs to a type now called “slingshot arguments” (because a giant opponent is brought down by a single small weapon, allegedly). The first versions of this type of argument were given by Church (1943) and Gödel (1944); it was later adapted by Quine (1953, 1960) in his crusade against quantified modal logic. Davidson is offering yet another adaptation, this time involving the expression “corresponds to the fact that”. The argument has been criticized repeatedly. Critics point to the two questionable assumptions on which it relies, (i) and (ii). It is far from obvious why a correspondence theorist should be tempted by either one of them. Opposition to assumption (i) rests on the view that expressibility by logically equivalent sentences may be a necessary, but is not a sufficient condition for fact identity. Opposition to assumption (ii) rests on the observation that the (alleged) singular terms used in the argument are definite descriptions: their status as genuine singular terms is in doubt, and it is well-known that they behave rather differently than proper names for which assumption (ii) is probably valid (cf. Follesdal 1966/2004; Olson 1987; Künne 2003; and especially the extended discussion and criticism in Neale 2001.)
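For reference, the equivalence that drives the steps invoking assumption (i) can be displayed explicitly. This is a standard reconstruction of the slingshot, not a quotation from Davidson; the iota notation for definite descriptions is supplied here for illustration:

\[
p \;\leftrightarrow\; \iota x\,(x = \text{Diogenes} \wedge p) = \iota x\,(x = \text{Diogenes})
\]

and likewise for ‘q’. Since ‘p’ and ‘q’ have the same truth-value, the two description terms denote the same object, which is what the substitution licensed by assumption (ii) exploits.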

The Coherence Theory of Truth

1. Versions of the Coherence Theory of Truth The coherence theory of truth has several versions. These versions differ on two major issues. Different versions of the theory give different accounts of the coherence relation. Different varieties of the theory also give various accounts of the set (or sets) of propositions with which true propositions cohere. (Such a set will be called a specified set.) According to some early versions of the coherence theory, the coherence relation is simply consistency. On this view, to say that a proposition coheres with a specified set of propositions is to say that the proposition is consistent with the set. This account of coherence is unsatisfactory for the following reason. Consider two propositions which do not belong to a specified set. These propositions could both be consistent with a specified set and yet be inconsistent with each other. If coherence is consistency, the coherence theorist would have to claim that both propositions are true, but this is impossible. A more plausible version of the coherence theory states that the coherence relation is some form of entailment. Entailment can be understood here as strict logical entailment, or entailment in some looser sense. According to this version, a proposition coheres with a set of propositions if and only if it is entailed by members of the set. Another more plausible version of the theory, held for example in Bradley (1914), is that coherence is mutual explanatory support between propositions. The second point on which coherence theorists (coherentists, for short) differ is the constitution of the specified set of propositions. Coherentists generally agree that the specified set consists of propositions believed or held to be true. They differ on the questions of who believes the propositions and when. At one extreme, coherence theorists can hold that the specified set of propositions is the largest consistent set of propositions currently believed by actual people. For such a version of the theory, see Young (1995). According to a moderate position, the specified set consists of those propositions which will be believed when people like us (with finite cognitive capacities) have reached some limit of inquiry. For such a coherence theory, see Putnam (1981). At the other extreme, coherence theorists can maintain that the specified set contains the propositions which would be believed by an omniscient being. Some idealists seem to accept this account of the specified set. If the specified set is a set actually believed, or even a set which would be believed by people like us at some limit of inquiry, coherentism involves the rejection of realism about truth. Realism about truth involves acceptance of the principle of bivalence (according to which every proposition is either true or false) and the principle of transcendence (which says that a proposition may be true even though it cannot be known to be true). Coherentists who do not believe that the specified set is the set of propositions believed by an omniscient being are committed to rejection of the principle of bivalence since it is not the case that for every proposition either it or a contrary proposition coheres with the specified set. They reject the principle of transcendence since, if a proposition coheres with a set of beliefs, it can be known to cohere with the set. 2. Arguments for Coherence Theories of Truth Two principal lines of argument have led philosophers to adopt a coherence theory of truth. 
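The objection to coherence-as-consistency above can be made concrete with a small computation. The following sketch, in Python, brute-forces propositional consistency over truth-value assignments; the encoding and the example sentences are invented for illustration and carry no theoretical weight:

from itertools import product

# Formulas: ('atom', name) | ('not', f) | ('and', f, g) | ('or', f, g)
def holds(f, v):
    if f[0] == 'atom':
        return v[f[1]]
    if f[0] == 'not':
        return not holds(f[1], v)
    if f[0] == 'and':
        return holds(f[1], v) and holds(f[2], v)
    return holds(f[1], v) or holds(f[2], v)

def atoms(f, acc=None):
    acc = set() if acc is None else acc
    if f[0] == 'atom':
        acc.add(f[1])
    else:
        for sub in f[1:]:
            atoms(sub, acc)
    return acc

def consistent(formulas):
    # True iff some assignment of truth-values makes every formula true.
    letters = sorted(set().union(*(atoms(f) for f in formulas)))
    return any(all(holds(f, dict(zip(letters, vals))) for f in formulas)
               for vals in product([True, False], repeat=len(letters)))

specified = [('atom', 'r')]          # a toy specified set
p = ('atom', 'q')
not_p = ('not', ('atom', 'q'))

print(consistent(specified + [p]))       # True: p is consistent with the set
print(consistent(specified + [not_p]))   # True: so is not-p
print(consistent([p, not_p]))            # False: yet they cannot both be true

If coherence were mere consistency, both p and not-p would cohere with the specified set, so both would have to count as true; that impossibility is exactly what the objection points to.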
Early advocates of coherence theories were persuaded by reflection on metaphysical questions. More recently, epistemological and semantic considerations have been the basis for coherence theories. 2.1 The Metaphysical Route to Coherentism Early versions of the coherence theory were associated with idealism. Walker (1989) attributes coherentism to Spinoza, Kant, Fichte and Hegel. Certainly a coherence theory was adopted by a number of British Idealists in the last years of the nineteenth century and the first decades of the twentieth. See, for example, Bradley (1914). Idealists are led to a coherence theory of truth by their metaphysical position. Advocates of the correspondence theory believe that a belief is (at least most of the time) ontologically distinct from the objective conditions which make the belief true. Idealists do not believe that there is an ontological distinction between beliefs and what makes beliefs true. From the idealists’ perspective, reality is something like a collection of beliefs. Consequently, a belief cannot be true because it corresponds to something which is not a belief. Instead, the truth of a belief can only consist in its coherence with other beliefs. A coherence theory of truth which results from idealism usually leads to the view that truth comes in degrees. A belief is true to the degree that it coheres with other beliefs. Since idealists do not recognize an ontological distinction between beliefs and what makes them true, distinguishing between versions of the coherence theory of truth adopted by idealists and an identity theory of truth can be difficult. The article on Bradley in this Encyclopedia (Candlish 2006) argues that Bradley had an identity theory, not a coherence theory. In recent years metaphysical arguments for coherentism have found few advocates. This is due to the fact that idealism is not widely held. 2.2 Epistemological Routes to Coherentism Blanshard (1939, ch. XXVI) argues that a coherence theory of justification leads to a coherence theory of truth. His argument runs as follows. Someone might hold that coherence with a set of beliefs is the test of truth but that truth consists in correspondence to objective facts. If, however, truth consists in correspondence to objective facts, coherence with a set of beliefs will not be a test of truth. This is the case since there is no guarantee that a perfectly coherent set of beliefs matches objective reality. Since coherence with a set of beliefs is a test of truth, truth cannot consist in correspondence. Blanshard’s argument has been criticised by, for example, Rescher (1973). Blanshard’s argument depends on the claim that coherence with a set of beliefs is the test of truth. Understood in one sense, this claim is plausible enough. Blanshard, however, has to understand this claim in a very strong sense: coherence with a set of beliefs is an infallible test of truth. If coherence with a set of beliefs is simply a good but fallible test of truth, as Rescher suggests, the argument fails. The “falling apart” of truth and justification to which Blanshard refers is to be expected if coherence is only a fallible test of truth. Another epistemological argument for coherentism is based on the view that we cannot “get outside” our set of beliefs and compare propositions to objective facts. A version of this argument was advanced by some logical positivists including Hempel (1935) and Neurath (1983). This argument, like Blanshard’s, depends on a coherence theory of justification.
The argument infers from such a theory that we can only know that a proposition coheres with a set of beliefs. We can never know that a proposition corresponds to reality. This argument is subject to at least two criticisms. For a start, it depends on a coherence theory of justification, and is vulnerable to any objections to this theory. More importantly, a coherence theory of truth does not follow from the premisses. We cannot infer from the fact that a proposition cannot be known to correspond to reality that it does not correspond to reality. Even if correspondence theorists admit that we can only know which propositions cohere with our beliefs, they can still hold that truth consists in correspondence. If correspondence theorists adopt this position, they accept that there may be truths which cannot be known. Alternatively, they can argue, as does Davidson (1986), that the coherence of a proposition with a set of beliefs is a good indication that the proposition corresponds to objective facts and that we can know that propositions correspond. Coherence theorists need to argue that propositions cannot correspond to objective facts, not merely that they cannot be known to correspond. In order to do this, the foregoing argument for coherentism must be supplemented. One way to supplement the argument would be to argue as follows. As noted above, the correspondence and coherence theories have differing views about the nature of truth conditions. One way to decide which account of truth conditions is correct is to pay attention to the process by which propositions are assigned truth conditions. Coherence theorists can argue that the truth conditions of a proposition are the conditions under which speakers make a practice of asserting it. Coherentists can then maintain that speakers can only make a practice of asserting a proposition under conditions the speakers are able to recognise as justifying the proposition. Now the (supposed) inability of speakers to “get outside” of their beliefs is significant. Coherentists can argue that the only conditions speakers can recognise as justifying a proposition are the conditions under which it coheres with their beliefs. When the speakers make a practice of asserting the proposition under these conditions, these conditions become the proposition’s truth conditions. For an argument of this sort see Young (1995). 3. Criticisms of Coherence Theories of Truth Any coherence theory of truth faces two principal challenges. The first may be called the specification objection. The second is the transcendence objection. 3.1 The Specification Objection According to the specification objection, coherence theorists have no way to identify the specified set of propositions without contradicting their position. This objection originates in Russell (1907). Opponents of the coherence theory can argue as follows. The proposition (1) “Jane Austen was hanged for murder” coheres with some set of propositions. (2) “Jane Austen died in her bed” coheres with another set of propositions. No one supposes that the first of these propositions is true, in spite of the fact that it coheres with a set of propositions. The specification objection charges that coherence theorists have no grounds for saying that (1) is false and (2) true. Some responses to the specification problem are unsuccessful. One could say that we have grounds for saying that (1) is false and (2) is true because the latter coheres with propositions which correspond to the facts.
Coherentists cannot, however, adopt this response without contradicting their position. Sometimes coherence theorists maintain that the specified system is the most comprehensive system, but this is not the basis of a successful response to the specification problem. Coherentists can only, unless they are to compromise their position, define comprehensiveness in terms of the size of a system. Coherentists cannot, for example, talk about the most comprehensive system composed of propositions which correspond to reality. There is no reason, however, why two or more systems cannot be equally large. Other criteria of the specified system, to which coherentists frequently appeal, are similarly unable to solve the specification problem. These criteria include simplicity, empirical adequacy and others. Again, there seems to be no reason why two or more systems cannot equally meet these criteria. Although some responses to Russell’s version of the specification objection are unsuccessful, the objection is unable to refute the coherence theory. Coherentists do not believe that the truth of a proposition consists in coherence with any arbitrarily chosen set of propositions. Rather, they hold that truth consists in coherence with a set of beliefs, or with a set of propositions held to be true. No one actually believes the set of propositions with which (1) coheres. Coherence theorists conclude that they can hold that (1) is false without contradicting themselves. A more sophisticated version of the specification objection has been advanced by Walker (1989); for a discussion, see Wright (1995). Walker argues as follows. In responding to Russell’s version of the specification objection, coherentists claim that some set of propositions, call it S, is believed. They are committed to the truth of (3) “S is believed.” The question of what it is for (3) to be true then arises. Coherence theorists might answer this question by saying that “‘S is believed’ is believed” is true. If they give this answer, they are apparently off on an infinite regress, and they will never say what it is for a proposition to be true. Their plight is worsened by the fact that arbitrarily chosen sets of propositions can include propositions about what is believed. So, for example, there will be a set which contains “Jane Austen was hanged for murder,” “‘Jane Austen was hanged for murder’ is believed,” and so on. The only way to stop the regress seems to be to say that the truth conditions of (3) consist in the objective fact that S is believed. If, however, coherence theorists adopt this position, they seem to contradict their own position by accepting that the truth conditions of some proposition consist in facts, not in propositions in a set of beliefs. There is some doubt about whether Walker’s version of the specification objection succeeds. Coherence theorists can reply to Walker by saying that nothing in their position is inconsistent with the view that there is a set of propositions which is believed. Even though this objective fact obtains, the truth conditions of propositions, including propositions about which sets of propositions are believed, are the conditions under which they cohere with a set of propositions. For a defence of the coherence theory against Walker’s version of the specification objection, see Young (2001). A coherence theory of truth gives rise to a regress, but it is not a vicious regress and the correspondence theory faces a similar regress.
If we say that p is true if and only if it coheres with a specified set of propositions, we may be asked about the truth conditions of “p coheres with a specified set.” Plainly, this is the start of a regress, but not one to worry about. It is just what one would expect, given that the coherence theory states that it gives an account of the truth conditions of all propositions. The correspondence theory faces a similar benign regress. The correspondence theory states that a proposition is true if and only if it corresponds to certain objective conditions. The proposition “p corresponds to certain objective conditions” is also true if and only if it corresponds to certain objective conditions, and so on. 3.2 The Transcendence Objection The transcendence objection charges that a coherence theory of truth is unable to account for the fact that some propositions are true which cohere with no set of beliefs. According to this objection, truth transcends any set of beliefs. Someone might argue, for example, that the proposition “Jane Austen wrote ten sentences on November 17th, 1807” is either true or false. If it is false, some other proposition about how many sentences Austen wrote that day is true. No proposition, however, about precisely how many sentences Austen wrote coheres with any set of beliefs and we may safely assume that none will ever cohere with a set of beliefs. Opponents of the coherence theory will conclude that there is at least one true proposition which does not cohere with any set of beliefs. Some versions of the coherence theory are immune to the transcendence objection. A version which holds that truth is coherence with the beliefs of an omniscient being is proof against the objection. Every truth coheres with the set of beliefs of an omniscient being. All other versions of the theory, however, have to cope with the objection, including the view that truth is coherence with a set of propositions believed at the limit of inquiry. Even at the limit of inquiry, finite creatures will not be able to decide every question, and truth may transcend what coheres with their beliefs. Coherence theorists can defend their position against the transcendence objection by maintaining that the objection begs the question. Those who present the objection assume, generally without argument, that it is possible that some proposition be true even though it does not cohere with any set of beliefs. This is precisely what coherence theorists deny. Coherence theorists have arguments for believing that truth cannot transcend what coheres with some set of beliefs. Their opponents need to take issue with these arguments rather than simply assert that truth can transcend what coheres with a specified system. 3.3 The Logic Objection Russell (1912) presented a third classic objection to the coherence theory of truth. According to this objection, any talk about coherence presupposes the truth of the laws of logic. For example, Russell argues, to say that two propositions cohere with each other is to presuppose the truth of the law of non-contradiction. In this case, coherentism has no account of the truth of the law of non-contradiction. If, however, the coherence theorist holds that the truth of the law of non-contradiction depends on its coherence with a system of beliefs, and the law were supposed to be false, then propositions could neither cohere nor fail to cohere. In this case, the coherence theory of truth completely breaks down since propositions cannot cohere with each other.
Coherentists have a plausible response to this objection. They may hold that the law of non-contradiction, like any other truth, is true because it coheres with a system of beliefs. In particular, the law of non-contradiction is supported by the belief that, for example, communication and reasoning would be impossible unless every system of beliefs contains something like the law of non-contradiction (and the belief that communication and reasoning are possible). It is true that, as Russell says, if the law is supposed not to cohere with a system of beliefs, then propositions can neither cohere nor fail to cohere. However, coherence theorists may hold, they do not suppose the law of non-contradiction to be false. On the contrary, they are likely to hold that any coherent set of beliefs must include the law of non-contradiction or a similar law. 4. New Objections to Coherentism Paul Thagard is the author of the first of two recent new arguments against the coherence theory. Thagard states his argument as follows: “if there is a world independent of representations of it, as historical evidence suggests, then the aim of representation should be to describe the world, not just to relate to other representations. My argument does not refute the coherence theory, but shows that it implausibly gives minds too large a place in constituting truth.” (Thagard 2007: 29–30) Thagard’s argument seems to be that if there is a mind-independent world, then our representations are representations of the world. (He says representations “should be” of the world, but the argument is invalid with the addition of the auxiliary verb.) The world existed before humans and our representations, including our propositional representations. (So history and, Thagard would likely say, our best science tells us.) Therefore, representations, including propositional representations, are representations of a mind-independent world. The second sentence of the passage just quoted suggests that the only way that coherentists can reject this argument is to adopt some sort of idealism. That is, they can only reject the minor premiss of the argument as reconstructed. Otherwise they are committed to saying that propositions represent the world and, Thagard seems to suggest, this is to say that propositions have the sort of truth-conditions posited by a correspondence theory. So the coherence theory is false. In reply to this argument, coherentists can deny that propositions are representations of a mind-independent world. To say that a proposition is true is to say that it is supported by a specified system of propositions. So, the coherentist can say, propositions are representations of systems of beliefs, not representations of a mind-independent world. To assert a proposition is to assert that it is entailed by a system of beliefs. The coherentist holds that even if there is a mind-independent world, it does not follow that “the point” of representations is to represent this world. If coherentists have been led to their position by an epistemological route, they believe that we cannot “get outside” our system of beliefs. If we cannot get outside of our system of beliefs, then it is hard to see how we can be said to represent a mind-independent reality. Colin McGinn has proposed the other new objection to coherentism. He argues (McGinn 2002: 195) that coherence theorists are committed to idealism. Like Thagard, he takes idealism to be obviously false, so the argument is a reductio. McGinn’s argument runs as follows.
Coherentists are committed to the view that, for example, ‘Snow falls from the sky’ is true iff the belief that snow falls from the sky coheres with other beliefs. Now it follows from this and the redundancy biconditional (p is true iff p) that snow falls from the sky iff the belief that snow falls from the sky coheres with other beliefs. It appears then that the coherence theorist is committed to the view that snow could not fall from the sky unless the belief that snow falls from the sky coheres with other beliefs. From this it follows that how things are depends on what is believed about them. This seems strange to McGinn since he thinks, reasonably, that snow could fall from the sky even if there were no beliefs about snow, or anything else. The linking of how things are and how they are believed to be leads McGinn to say that coherentists are committed to idealism, this being the view that how things are is mind-dependent. Coherentists have a response to this objection. McGinn’s argument works because he takes it that the redundancy biconditional means something like “p is true because p”. Only if redundancy biconditionals are understood in this way does McGinn’s argument go through. McGinn needs to be talking about what makes “Snow falls from the sky” true for his reductio to work. Otherwise, coherentists who reject his argument cannot be charged with idealism. He assumes, in a way that a coherence theorist can regard as question-begging, that the truth-maker of the sentence in question is an objective way the world is. Coherentists deny that any sentences are made true by objective conditions. In particular, they hold that the falling of snow from the sky does not make “Snow falls from the sky” true. Coherentists hold that it, like any other sentence, is true because it coheres with a system of beliefs. So coherentists appear to have a plausible defence against McGinn’s objection.
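McGinn’s reconstructed argument can be displayed as a two-premise derivation. Writing Bel(p) for the belief that p (our abbreviation, for display only):

\[
\begin{array}{ll}
1. & \text{`}p\text{' is true} \leftrightarrow Bel(p) \text{ coheres with other beliefs} \quad \text{(coherentism)}\\
2. & p \leftrightarrow \text{`}p\text{' is true} \quad \text{(redundancy biconditional)}\\
3. & p \leftrightarrow Bel(p) \text{ coheres with other beliefs} \quad \text{(from 1 and 2)}
\end{array}
\]

As the reply above notes, the reductio needs line 3 to be read as a claim about what makes ‘p’ true; read as a mere material biconditional, it does not yet yield idealism.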

Axiomatic Theories of Truth

1. Motivations There have been many attempts to define truth in terms of correspondence, coherence or other notions. However, it is far from clear that truth is a definable notion. In formal settings satisfying certain natural conditions, Tarski’s theorem on the undefinability of the truth predicate shows that a definition of a truth predicate requires resources that go beyond those of the formal language for which truth is going to be defined. In these cases definitional approaches to truth have to fail. By contrast, the axiomatic approach does not presuppose that truth can be defined. Instead, a formal language is expanded by a new primitive predicate for truth or satisfaction, and axioms for that predicate are then laid down. This approach by itself does not preclude the possibility that the truth predicate is definable, although in many cases it can be shown that the truth predicate is not definable. In semantic theories of truth (e.g., Tarski 1935, Kripke 1975), in contrast, a truth predicate is defined for a language, the so-called object language. This definition is carried out in a metalanguage or metatheory, which is typically taken to include set theory or at least another strong theory or expressively rich interpreted language. Tarski’s theorem on the undefinability of the truth predicate shows that, given certain general assumptions, the resources of the metalanguage or metatheory must go beyond the resources of the object-language. So semantic approaches usually necessitate the use of a metalanguage that is more powerful than the object-language for which it provides a semantics. As with other formal deductive systems, axiomatic theories of truth can be presented within very weak logical frameworks. These frameworks require very few resources, and in particular, avoid the need for a strong metalanguage and metatheory. Formal work on axiomatic theories of truth has helped to shed some light on semantic theories of truth. For instance, it has yielded information on what is required of a metalanguage that is sufficient for defining a truth predicate. Semantic theories of truth, in turn, provide one with the theoretical tools needed for investigating models of axiomatic theories of truth and with motivations for certain axiomatic theories. Thus axiomatic and semantic approaches to truth are intertwined. This entry outlines the most popular axiomatic theories of truth and mentions some of the formal results that have been obtained concerning them. We give only hints as to their philosophical applications. 1.1 Truth, properties and sets Theories of truth and predication are closely related to theories of properties and property attribution. To say that an open formula \(\phi(x)\) is true of an individual \(a\) seems equivalent (in some sense) to the claim that \(a\) has the property of being such that \(\phi\) (this property is signified by the open formula). For example, one might say that ‘\(x\) is a poor philosopher’ is true of Tom instead of saying that Tom has the property of being a poor philosopher. Quantification over definable properties can then be mimicked in a language with a truth predicate by quantifying over formulas. Instead of saying, for instance, that \(a\) and \(b\) have exactly the same properties, one says that exactly the same formulas are true of \(a\) and \(b\). The reduction of properties to truth works also to some extent for sets of individuals. 
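The mimicry just described can be displayed schematically. Writing \(\phi(x)\) for an open formula with \(x\) free, the claim that \(a\) and \(b\) have exactly the same properties becomes (the notation here is ours, for illustration):

\[
\forall \phi\, \bigl(\phi(x) \text{ is true of } a \;\leftrightarrow\; \phi(x) \text{ is true of } b\bigr)
\]

where the quantifier ranges over (codes of) formulas rather than over properties themselves.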
There are also reductions in the other direction: Tarski (1935) has shown that certain second-order existence assumptions (e.g., comprehension axioms) may be utilized to define truth (see the entry on Tarski’s definition of truth). The mathematical analysis of axiomatic theories of truth and second-order systems has exhibited many equivalences between these second-order existence assumptions and truth-theoretic assumptions. These results show exactly what is required for defining a truth predicate that satisfies certain axioms, thereby sharpening Tarski’s insights into definability of truth. In particular, proof-theoretic equivalences described in Section 3.3 below make explicit to what extent a metalanguage (or rather metatheory) has to be richer than the object language in order to be able to define a truth predicate. The equivalence between second-order theories and truth theories also has bearing on traditional metaphysical topics. The reductions of second-order theories (i.e., theories of properties or sets) to axiomatic theories of truth may be conceived as forms of reductive nominalism, for they replace existence assumptions for sets or properties (e.g., comprehension axioms) by ontologically innocuous assumptions, in the present case by assumptions on the behaviour of the truth predicate. 1.2 Truth and reflection According to Gödel’s incompleteness theorems, the statement that Peano Arithmetic (PA) is consistent, in its guise as a number-theoretic statement (given the technique of Gödel numbering), cannot be derived in PA itself. But PA can be strengthened by adding this consistency statement or by stronger axioms. In particular, axioms partially expressing the soundness of PA can be added. These are known as reflection principles. An example of a reflection principle for PA would be the set of sentences \(Bew_{PA}(\ulcorner \phi \urcorner) \rightarrow \phi\) where \(\phi\) is a formula of the language of arithmetic, \(\ulcorner \phi \urcorner\) a name for \(\phi\) and \(Bew_{PA}(x)\) is the standard provability predicate for PA (‘\(Bew\)’ was introduced by Gödel and is short for the German word ‘beweisbar’, that is, ‘provable’). The process of adding reflection principles can be iterated: one can add, for example, a reflection principle R for PA to PA; this results in a new theory PA+R. Then one adds the reflection principle for the system PA+R to the theory PA+R. This process can be continued into the transfinite (see Feferman 1962 and Franzén 2004). The reflection principles express—at least partially—the soundness of the system. The most natural and full expression of the soundness of a system involves the truth predicate and is known as the Global Reflection Principle (see Kreisel and Lévy 1968). The Global Reflection Principle for a formal system S states that all sentences provable in S are true: \(\forall x\,(Bew_{S}(x) \rightarrow Tx)\), where \(T\) is the truth predicate and \(Bew_{S}(x)\) expresses provability of sentences in the system S (we omit discussion here of the problems of defining \(Bew_{S}(x)\)). The truth predicate has to satisfy certain principles; otherwise the global reflection principle would be vacuous. Thus not only the global reflection principle but also axioms for truth have to be added. If a natural theory of truth like T(PA) below is added, however, it is no longer necessary to postulate the global reflection principle explicitly, as theories like T(PA) prove already the global reflection principle for PA.
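The iteration of reflection principles described above can be written down schematically. The stage notation \(PA_n\) is supplied here for illustration and is not standard terminology from the text:

\[
PA_0 = PA, \qquad PA_{n+1} = PA_n + \{\,Bew_{PA_n}(\ulcorner \phi \urcorner) \rightarrow \phi : \phi \text{ a sentence of arithmetic}\,\}
\]

with the process continuing into the transfinite along the lines of Feferman (1962).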
One may therefore view truth theories as reflection principles as they prove soundness statements and add the resources to express these statements. Thus instead of iterating reflection principles that are formulated entirely in the language of arithmetic, one can add by iteration new truth predicates and correspondingly new axioms for the new truth predicates. Thereby one might hope to make explicit all the assumptions that are implicit in the acceptance of a theory like PA. The resulting theory is called the reflective closure of the initial theory. Feferman (1991) has proposed the use of a single truth predicate and a single theory (KF), rather than a hierarchy of predicates and theories, in order to explicate the reflective closure of PA and other theories. (KF is discussed further in Section 4.4 below.) The relation of truth theories and (iterated) reflection principles also became prominent in the discussion of truth-theoretic deflationism (see Tennant 2002 and the follow-up discussion). 1.3 Truth-theoretic deflationism Many proponents of deflationist theories of truth have chosen to treat truth as a primitive notion and to axiomatize it, often using some version of the \(T\)-sentences as axioms. \(T\)-sentences are equivalences of the form \(T\ulcorner \phi \urcorner \leftrightarrow \phi\), where \(T\) is the truth predicate, \(\phi\) is a sentence and \(\ulcorner \phi \urcorner\) is a name for the sentence \(\phi\). (More refined axioms have also been discussed by deflationists.) At first glance at least, the axiomatic approach seems much less ‘deflationary’ than those more traditional theories which rely on a definition of truth in terms of correspondence or the like. If truth can be explicitly defined, it can be eliminated, whereas an axiomatized notion of truth may and often does come with commitments that go beyond that of the base theory. If truth does not have any explanatory force, as some deflationists claim, the axioms for truth should not allow us to prove any new theorems that do not involve the truth predicate. Accordingly, Horsten (1995), Shapiro (1998) and Ketland (1999) have suggested that a deflationary axiomatization of truth should be at least conservative. The new axioms for truth are conservative if they do not imply any additional sentences (free of occurrences of the truth-predicate) that aren’t already provable without the truth axioms. Thus a non-conservative theory of truth adds new non-semantic content to a theory and has genuine explanatory power, contrary to many deflationist views. Certain natural theories of truth, however, fail to be conservative (see Section 3.3 below, Field 1999 and Shapiro 2002 for further discussion). According to many deflationists, truth serves merely the purpose of expressing infinite conjunctions. It is plain that not all infinite conjunctions can be expressed because there are uncountably many (non-equivalent) infinite conjunctions over a countable language. Since the language with an added truth predicate has only countably many formulas, not every infinite conjunction can be expressed by a different finite formula. The formal work on axiomatic theories of truth has helped to specify exactly which infinite conjunctions can be expressed with a truth predicate. Feferman (1991) provides a proof-theoretic analysis of a fairly strong system. (Again, this will be explained in the discussion about KF in Section 4.4 below.)
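The conservativeness requirement just stated can be put schematically as follows (the notation is ours): where \(B\) is the base theory and \(B^T\) is \(B\) together with the truth axioms,

\[
B^T \vdash \phi \;\Longrightarrow\; B \vdash \phi \qquad \text{for every sentence } \phi \text{ not containing } T.
\]

A non-conservative truth theory, by contrast, proves some \(T\)-free sentence that \(B\) alone does not prove.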
The base theory 2.1 The choice of the base theory In most axiomatic theories, truth is conceived as a predicate of objects. There is an extensive philosophical discussion on the category of objects to which truth applies: propositions conceived as objects that are independent of any language, types and tokens of sentences and utterances, thoughts, and many other objects have been proposed. Since the structure of sentences considered as types is relatively clear, sentence types have often been used as the objects that can be true. In many cases there is no need to make very specific metaphysical commitments, because only certain modest assumptions on the structure of these objects are required, independently of whether they are finally taken to be syntactic objects, propositions or something else. The theory that describes the properties of the objects to which truth can be attributed is called the base theory. The formulation of the base theory does not involve the truth predicate or any specific truth-theoretic assumptions. The base theory could describe the structure of sentences, propositions and the like, so that notions like the negation of such an object can then be used in the formulation of the truth-theoretic axioms. In many axiomatic truth theories, truth is taken as a predicate applying to the Gödel numbers of sentences. Peano arithmetic has proved to be a versatile theory of objects to which truth is applied, mainly because adding truth-theoretic axioms to Peano arithmetic yields interesting systems and because Peano arithmetic is equivalent to many straightforward theories of syntax and even theories of propositions. However, other base theories have been considered as well, including formal syntax theories and set theories. Of course, we can also investigate theories which result by adding the truth-theoretic axioms to much stronger theories like set theory. Usually there is no chance of proving the consistency of set theory plus further truth-theoretic axioms, because the consistency of set theory itself cannot be established without assumptions transcending set theory. In many cases not even relative consistency proofs are feasible. However, if adding certain truth-theoretic axioms to PA yields a consistent theory, it seems at least plausible that adding analogous axioms to set theory will not lead to an inconsistency. Therefore, the hope is that research on theories of truth over PA will give some indication of what will happen when we extend stronger theories with axioms for the truth predicate. However, Fujimoto (2012) has shown that some axiomatic truth theories over set theory differ from their counterparts over Peano arithmetic in some aspects. 2.2 Notational conventions For the sake of definiteness we assume that the language of arithmetic has exactly \(\neg , \wedge\) and \(\vee\) as connectives and \(\forall\) and \(\exists\) as quantifiers. It has as individual constants only the symbol 0 for zero; its only function symbol is the unary successor symbol \(S\); addition and multiplication are expressed by predicate symbols. Therefore the only closed terms of the language of arithmetic are the numerals \(0, S(0), S(S(0)), S(S(S(0))), \ldots\). The language of arithmetic does not contain the unary predicate symbol \(T\), so let \(\mathcal{L}_T\) be the language of arithmetic augmented by the new unary predicate symbol \(T\) for truth.
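To make the idea of truth as a predicate of Gödel numbers concrete, here is a minimal sketch of one possible coding; the symbol table and the prime-power scheme are illustrative choices, not a canonical encoding.

# A toy Goedel numbering: a string of symbols is coded as a product of prime powers.
# The symbol table below is an arbitrary illustrative choice.
SYMBOLS = {'0': 1, 'S': 2, '(': 3, ')': 4, '=': 5}

def first_primes(n):
    # First n primes by trial division; adequate for short expressions.
    found = []
    candidate = 2
    while len(found) < n:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
        candidate += 1
    return found

def goedel_number(expression):
    # Code the symbol at position i as the i-th prime raised to that symbol's code.
    number = 1
    for p, symbol in zip(first_primes(len(expression)), expression):
        number *= p ** SYMBOLS[symbol]
    return number

print(goedel_number('S(0)=S(0)'))  # this sentence receives a unique number

Distinct expressions receive distinct numbers by unique prime factorization, which is all the truth axioms below require of the coding.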
If \(\phi\) is a sentence of \(\mathcal{L}_T\), \(\ulcorner \phi \urcorner\) is a name for \(\phi\) in the language \(\mathcal{L}_T\); formally speaking, it is the numeral of the Gödel number of \(\phi\). In general, Greek letters like \(\phi\) and \(\psi\) are variables of the metalanguage, that is, the language used for talking about theories of truth and the language in which this entry is written (i.e., English enriched by some symbols). \(\phi\) and \(\psi\) range over formulas of the formal language \(\mathcal{L}_T\). In what follows, we use small capital letters like \({\scriptsize A}, {\scriptsize B},\ldots\) as variables in \(\mathcal{L}_T\) ranging over sentences (or their Gödel numbers, to be precise). Thus \(\forall{\scriptsize A}(\ldots{\scriptsize A}\ldots)\) stands for \(\forall x(Sent_T (x) \rightarrow \ldots x\ldots)\), where \(Sent_T (x)\) expresses in the language of arithmetic that \(x\) is a sentence of the language of arithmetic extended by the predicate symbol \(T\). The syntactical operations of forming a conjunction of two sentences and similar operations can be expressed in the language of arithmetic. Since the language of arithmetic does not contain any function symbol apart from the symbol for successor, these operations must be expressed by suitable predicate expressions. Thus one can say in the language \(\mathcal{L}_T\) that a negation of a sentence of \(\mathcal{L}_T\) is true if and only if the sentence itself is not true. We would write this as \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\). The square brackets indicate that the operation of forming the negation of \({\scriptsize A}\) is expressed in the language of arithmetic. Since the language of arithmetic does not contain a function symbol representing the function that sends sentences to their negations, appropriate paraphrases involving predicates must be given. Thus, for instance, the expression \(\forall{\scriptsize A}\forall{\scriptsize B}(T[{\scriptsize A} \wedge{\scriptsize B}] \leftrightarrow (T{\scriptsize A} \wedge T{\scriptsize B}))\) is a single sentence of the language \(\mathcal{L}_T\) saying that a conjunction of sentences of \(\mathcal{L}_T\) is true if and only if both sentences are true. In contrast, \(T\ulcorner \phi \wedge \psi \urcorner \leftrightarrow (T\ulcorner \phi \urcorner \wedge T\ulcorner \psi \urcorner)\) is only a schema. That is, it stands for the set of all sentences that are obtained from the above expression by substituting sentences of \(\mathcal{L}_T\) for the Greek letters \(\phi\) and \(\psi\). The single sentence \(\forall{\scriptsize A}\forall{\scriptsize B}(T[{\scriptsize A} \wedge{\scriptsize B}] \leftrightarrow (T{\scriptsize A} \wedge T{\scriptsize B}))\) implies all sentences which are instances of the schema, but the instances of the schema do not imply the single universally quantified sentence. In general, the quantified versions are stronger than the corresponding schemata. 3. Typed theories of truth In typed theories of truth, only the truth of sentences not containing the same truth predicate is provable, thus avoiding the paradoxes by observing Tarski’s distinction between object and metalanguage. 3.1 Definable truth predicates Certain truth predicates can be defined within the language of arithmetic. Predicates suitable as truth predicates for sublanguages of the language of arithmetic can be defined within the language of arithmetic, as long as the quantificational complexity of the formulas in the sublanguage is restricted. In particular, there is a formula \(Tr_0 (x)\) that expresses that \(x\) is a true atomic sentence of the language of arithmetic, that is, a sentence of the form \(n=k\), where \(k\) and \(n\) are identical numerals.
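The check behind such a predicate \(Tr_0\) is computationally trivial. The sketch below (using an ad hoc string representation of numerals, purely for illustration) implements exactly the clause just stated: an equation between numerals is true when the two numerals are identical.

def numeral(n):
    # Build the numeral S(S(...S(0)...)) denoting the natural number n.
    term = '0'
    for _ in range(n):
        term = 'S(' + term + ')'
    return term

def tr0(sentence):
    # An atomic sentence 'n=k' is true iff the numerals n and k are identical.
    left, _, right = sentence.partition('=')
    return left == right

print(tr0(numeral(2) + '=' + numeral(2)))  # True
print(tr0(numeral(2) + '=' + numeral(3)))  # False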
For further information on partial truth predicates see, for instance, Hájek and Pudlák (1993), Kaye (1991) and Takeuti (1987). The definable truth predicates are truly redundant, because they are expressible in PA; therefore there is no need to introduce them axiomatically. None of the truth predicates in the following is definable in the language of arithmetic; they are therefore not redundant, at least in the sense that they are not definable. 3.2 The \(T\)-sentences The typed \(T\)-sentences are all equivalences of the form \(T\ulcorner \phi \urcorner \leftrightarrow \phi\), where \(\phi\) is a sentence not containing the truth predicate. Tarski (1935) called any theory proving these equivalences ‘materially adequate’. Tarski (1935) criticised an axiomatization of truth relying only on the \(T\)-sentences, not because he aimed at a definition rather than an axiomatization of truth, but because such a theory seemed too weak. Thus although the theory is materially adequate, Tarski thought that the \(T\)-sentences are deductively too weak. He observed, in particular, that the \(T\)-sentences do not prove the principle of completeness, that is, the sentence \(\forall{\scriptsize A}(T{\scriptsize A} \vee T[\neg{\scriptsize A}])\), where the quantifier \(\forall{\scriptsize A}\) is restricted to sentences not containing \(T\). Theories of truth based on the \(T\)-sentences, and their formal properties, have also recently been a focus of interest in the context of so-called deflationary theories of truth. The \(T\)-sentences \(T\ulcorner \phi \urcorner \leftrightarrow \phi\) (where \(\phi\) does not contain \(T\)) are not conservative over first-order logic with identity, that is, they prove a sentence not containing \(T\) that is not logically valid. For the \(T\)-sentences prove that the sentences \(0=0\) and \(\neg 0=0\) are different and that therefore at least two objects exist. In other words, the \(T\)-sentences are not conservative over the empty base theory. If the \(T\)-sentences are added to PA, the resulting theory is conservative over PA. This means that the theory does not prove \(T\)-free sentences that are not already provable in PA. This result even holds if, in addition to the \(T\)-sentences, all induction axioms containing the truth predicate are added. This may be shown by appealing to the Compactness Theorem. In the form outlined above, \(T\)-sentences express the equivalence between \(T\ulcorner \phi \urcorner\) and \(\phi\) only when \(\phi\) is a sentence. In order to capture the equivalence for properties (\(x\) has property P iff ‘P’ is true of \(x\)), one must generalise the \(T\)-sentences. The resulting principles are usually referred to as the uniform \(T\)-sentences and are formalised by the equivalences \(\forall x(T\ulcorner \phi(\underline{x})\urcorner \leftrightarrow \phi(x))\) for each open formula \(\phi(v)\) with at most \(v\) free in \(\phi\). Underlining the variable indicates that it is bound from the outside. More precisely, \(\ulcorner \phi(\underline{x})\urcorner\) stands for the result of replacing the variable \(v\) in \(\ulcorner \phi(v)\urcorner\) by the numeral of \(x\). 3.3 Compositional truth As was observed already by Tarski (1935), certain desirable generalizations don’t follow from the \(T\)-sentences. For instance, together with reasonable base theories they don’t imply that a conjunction is true if both conjuncts are true.
In order to obtain systems that also prove universally quantified truth-theoretic principles, one can turn the inductive clauses of Tarski’s definition of truth into axioms. In the following axioms, \(AtomSent_{PA}({\scriptsize A})\) expresses that \({\scriptsize A}\) is an atomic sentence of the language of arithmetic, and \(Sent_{PA}({\scriptsize A})\) expresses that \({\scriptsize A}\) is a sentence of the language of arithmetic:

1. \(\forall{\scriptsize A}(AtomSent_{PA}({\scriptsize A}) \rightarrow (T{\scriptsize A} \leftrightarrow Tr_0({\scriptsize A})))\)
2. \(\forall{\scriptsize A}(Sent_{PA}({\scriptsize A}) \rightarrow (T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A}))\)
3. \(\forall{\scriptsize A}\forall{\scriptsize B}(Sent_{PA}({\scriptsize A} \wedge{\scriptsize B}) \rightarrow (T[{\scriptsize A} \wedge{\scriptsize B}] \leftrightarrow (T{\scriptsize A} \wedge T{\scriptsize B})))\)
4. \(\forall{\scriptsize A}\forall{\scriptsize B}(Sent_{PA}({\scriptsize A} \vee{\scriptsize B}) \rightarrow (T[{\scriptsize A} \vee{\scriptsize B}] \leftrightarrow (T{\scriptsize A} \vee T{\scriptsize B})))\)
5. \(\forall{\scriptsize A}(Sent_{PA}(\forall v{\scriptsize A}) \rightarrow (T[\forall v{\scriptsize A}] \leftrightarrow \forall x\,T[{\scriptsize A}(\underline{x})]))\)
6. \(\forall{\scriptsize A}(Sent_{PA}(\exists v{\scriptsize A}) \rightarrow (T[\exists v{\scriptsize A}] \leftrightarrow \exists x\,T[{\scriptsize A}(\underline{x})]))\)

Axiom 1 says that an atomic sentence of the language of Peano arithmetic is true if and only if it is true according to the arithmetical truth predicate for this language (\(Tr_0\) was defined in Section 3.1). Axioms 2–6 claim that truth commutes with all connectives and quantifiers. Axiom 5 says that a universally quantified sentence of the language of arithmetic is true if and only if all its numerical instances are true. \(Sent_{PA}(\forall v{\scriptsize A})\) says that \({\scriptsize A}(v)\) is a formula with at most \(v\) free (because \(\forall v{\scriptsize A}(v)\) is a sentence). If these axioms are to be formulated for a language like set theory that lacks names for all objects, then axioms 5 and 6 require the use of a satisfaction relation rather than a unary truth predicate. Axioms in the style of 1–6 above played a central role in Donald Davidson’s theory of meaning and in several deflationist approaches to truth. The theory given by all axioms of PA and Axioms 1–6, but with induction only for \(T\)-free formulae, is conservative over PA, that is, it doesn’t prove any new \(T\)-free theorems that are not already provable in PA. However, not all models of PA can be expanded to models of PA + axioms 1–6. This follows from a result due to Lachlan (1981). Kotlarski, Krajewski, and Lachlan (1981) proved the conservativeness of a theory very similar to PA + axioms 1–6 by model-theoretic means. Although several authors claimed that this result is also finitarily provable, no such proof was available until Enayat & Visser (2015) and Leigh (2015). Moreover, the theory given by PA + axioms 1–6 is relatively interpretable in PA. However, this result is sensitive to the choice of the base theory: it fails for finitely axiomatized theories (Heck 2015, Nicolai 2016). These proof-theoretic results have been used extensively in the discussion of truth-theoretic deflationism (see Cieśliński 2017). Of course PA + axioms 1–6 is restrictive insofar as it does not contain the induction axioms in the language with the truth predicate. There are various labels for the system that is obtained by adding all induction axioms involving the truth predicate to the system PA + axioms 1–6: T(PA), CT, PA(S) or PA + ‘there is a full inductive satisfaction class’. This theory is no longer conservative over its base theory PA. For instance, it proves the soundness theorem or global reflection principle for PA, that is, the claim that all sentences provable in PA are true. The global reflection principle for PA in turn implies the consistency of PA, which is not provable in pure PA by Gödel’s Second Incompleteness Theorem. Thus T(PA) is not conservative over PA. T(PA) is much stronger than the mere consistency statement for PA: T(PA) is equivalent to the second-order system ACA of arithmetical comprehension (see Takeuti 1987 and Feferman 1991). More precisely, T(PA) and ACA are intertranslatable in a way that preserves all arithmetical sentences.
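The compositional clauses can be read as a recursive evaluation procedure. The sketch below is a toy illustration only, not an axiomatization: genuine arithmetical truth is not computable, so the universal quantifier is checked over a finite initial segment of the numbers, an assumption flagged in the code. The representation of formulas as nested tuples is likewise an illustrative choice.

# Terms: ('num', n), ('var', 'v'). Formulas: ('eq', s, t), ('neg', A),
# ('and', A, B), ('all', 'v', A).
BOUND = 1000  # universal quantifiers are checked only up to this bound (a toy cut)

def value(term, env):
    return term[1] if term[0] == 'num' else env[term[1]]

def true(phi, env=None):
    env = env or {}
    op = phi[0]
    if op == 'eq':   # an equation is true iff both terms have the same value
        return value(phi[1], env) == value(phi[2], env)
    if op == 'neg':  # truth commutes with negation
        return not true(phi[1], env)
    if op == 'and':  # truth commutes with conjunction
        return true(phi[1], env) and true(phi[2], env)
    if op == 'all':  # a universal sentence is true iff all (bounded) instances are
        return all(true(phi[2], {**env, phi[1]: n}) for n in range(BOUND))

print(true(('all', 'v', ('eq', ('var', 'v'), ('var', 'v')))))  # True: for all v, v = v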
ACA is given by the axioms of PA with full induction in the second-order language and the following comprehension principle: \(\exists X\forall x(x \in X \leftrightarrow \phi(x))\), where \(\phi(x)\) is any formula (in which \(x\) may or may not be free) that does not contain any second-order quantifiers, but possibly free second-order variables. In T(PA), quantification over sets can be defined as quantification over formulas with one free variable, and membership as the truth of the formula as applied to a number. As the global reflection principle entails formal consistency, the conservativeness result for PA + axioms 1–6 implies that the global reflection principle for Peano arithmetic is not derivable in the typed compositional theory without expanding the induction axioms. In fact, this theory proves neither the statement that all logical validities are true (global reflection for pure first-order logic) nor that all the Peano axioms of arithmetic are true. Perhaps surprisingly, of these two unprovable statements it is the former that is the stronger. The latter can be added as an axiom and the theory remains conservative over PA (Enayat and Visser 2015, Leigh 2015). In contrast, over PA + axioms 1–6, the global reflection principle for first-order logic is equivalent to global reflection for Peano arithmetic (Cieśliński 2010), and these two theories have the same arithmetic consequences as adding the axiom of induction for bounded (\(\Delta_0\)) formulas containing the truth predicate (Wcisło and Łełyk 2017). The transition from PA to T(PA) can be imagined as an act of reflection on the truth of \(\mathcal{L}\)-sentences in PA. Similarly, the step from the typed \(T\)-sentences to the compositional axioms is also tied to a reflection principle, specifically the uniform reflection principle over the typed uniform \(T\)-sentences. This is the collection of sentences \(\forall x(Bew_{S}(\ulcorner \phi(\underline{x})\urcorner) \rightarrow \phi(x))\), where \(\phi\) ranges over formulas in \(\mathcal{L}_T\) with one free variable and S is the theory of the typed uniform \(T\)-sentences. Uniform reflection exactly captures the difference between the two theories: the reflection principle is both derivable in T(PA) and suffices to derive the six compositional axioms (Halbach 2001). Moreover, the equivalence extends to iterations of uniform reflection, in that for any ordinal \(\alpha\), \(1 + \alpha\) iterations of uniform reflection over the typed \(T\)-sentences coincide with T(PA) extended by transfinite induction up to the ordinal \(\varepsilon_{\alpha}\), namely the \(\alpha\)-th ordinal \(\kappa\) with the property that \(\omega^{\kappa} = \kappa\) (Leigh 2016). Much stronger fragments of second-order arithmetic can be interpreted by type-free truth systems, that is, by theories of truth that prove not only the truth of arithmetical sentences but also the truth of sentences of the language \(\mathcal{L}_T\) with the truth predicate; see Section 4 below. 3.4 Hierarchical theories The above mentioned theories of truth can be iterated by introducing indexed truth predicates. One adds to the language of PA truth predicates indexed by ordinals (or ordinal notations), or one adds a binary truth predicate that applies to ordinal notations and sentences.
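One way of making the translation behind this equivalence explicit (the notation \(X_{\phi}\) for the ‘set’ defined by \(\phi\) is ours): in T(PA) the formula \(\phi\) itself plays the role of a set term, and membership is read as truth of the corresponding instance,

\[ t \in X_{\phi} \;:\leftrightarrow\; T\ulcorner \phi(\underline{t})\urcorner. \]

Under this reading, the comprehension instance \(\exists X\forall x(x \in X \leftrightarrow \phi(x))\) for arithmetical \(\phi\) turns into \(\forall x(T\ulcorner \phi(\underline{x})\urcorner \leftrightarrow \phi(x))\), a uniform \(T\)-sentence that T(PA) proves for each arithmetical \(\phi\).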
In this respect the hierarchical approach does not fit the framework outlined in Section 2, because the language does not feature a single unary truth predicate applying to sentences but rather many unary truth predicates or a single binary truth predicate (or even a single unary truth predicate applying to pairs of ordinal notations and sentences). In such a language an axiomatization of Tarski’s hierarchy of truth predicates can be formulated. On the proof-theoretic side, iterating truth theories in the style of T(PA) corresponds to iterating elementary comprehension, that is, to iterating ACA. The system of iterated truth theories corresponds to the system of ramified analysis (see Feferman 1991). Visser (1989) has studied non-wellfounded hierarchies of languages and axiomatizations thereof. If one adds to PA the \(T\)-sentences \(T_n\ulcorner \phi \urcorner \leftrightarrow \phi\), where \(\phi\) contains only truth predicates \(T_k\) with \(k\gt n\), a theory is obtained that does not have a standard (\(\omega\)-)model. 4. Type-free truth The truth predicates in natural languages do not come with any overt type restriction. Therefore typed theories of truth (axiomatic as well as semantic theories) have been thought to be inadequate for analysing the truth predicate of natural language, although recently hierarchical theories have been advocated by Glanzberg (2015) and others. This is one motive for investigating type-free theories of truth, that is, systems of truth that allow one to prove the truth of sentences involving the truth predicate. Some type-free theories of truth have much higher expressive power than the typed theories that have been surveyed in the previous section (at least as long as indexed truth predicates are avoided). Therefore type-free theories of truth are much more powerful tools in the reduction of other theories (for instance, second-order ones). 4.1 Type-free \(T\)-sentences The set of all \(T\)-sentences \(T\ulcorner \phi \urcorner \leftrightarrow \phi\), where \(\phi\) is any sentence of the language \(\mathcal{L}_T\), that is, where \(\phi\) may contain \(T\), is inconsistent with PA (or any theory that proves the diagonal lemma) because of the Liar paradox. Therefore one might try to drop from the set of all \(T\)-sentences only those that lead to an inconsistency. In other words, one may consider maximal consistent sets of \(T\)-sentences. McGee (1992) showed that there are uncountably many maximal sets of \(T\)-sentences that are consistent with PA. So the strategy does not lead to a single theory. Even worse, given an arithmetical sentence (i.e., a sentence not containing \(T\)) that can neither be proved nor disproved in PA, one can find a consistent \(T\)-sentence that decides this sentence (McGee 1992). This implies that many consistent sets of \(T\)-sentences prove false arithmetical statements. Thus the strategy of dropping just the \(T\)-sentences that yield an inconsistency is doomed. A set of \(T\)-sentences that does not imply any false arithmetical statement may be obtained by allowing only those \(\phi\) in \(T\)-sentences \(T\ulcorner \phi \urcorner \leftrightarrow \phi\) that contain \(T\) only positively, that is, in the scope of an even number of negation symbols. Like the typed theory in Section 3.2, this theory does not prove certain generalizations, but it proves the same \(T\)-free sentences as the strong type-free compositional Kripke–Feferman theory below (Halbach 2009).
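The positivity restriction is a purely syntactic condition and easy to check mechanically. A small sketch, with formulas again as nested tuples (an illustrative representation; implication, which would also flip polarity, is omitted from the toy language):

def t_positive(phi, negations=0):
    # Every occurrence of the truth predicate T must lie in the scope of an
    # even number of negation symbols.
    op = phi[0]
    if op == 'T':
        return negations % 2 == 0
    if op == 'neg':
        return t_positive(phi[1], negations + 1)
    if op in ('and', 'or'):
        return t_positive(phi[1], negations) and t_positive(phi[2], negations)
    if op in ('all', 'ex'):
        return t_positive(phi[2], negations)
    return True  # atomic arithmetical formulas contain no T

A = ('eq', ('num', 0), ('num', 0))
print(t_positive(('T', A)))                    # True: T occurs positively
print(t_positive(('neg', ('T', A))))           # False: one negation above T
print(t_positive(('neg', ('neg', ('T', A)))))  # True: two negations above T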
Schindler (2015) obtained a deductively very strong truth theory based on stratified disquotational principles. 4.2 Compositionality Besides the disquotational feature of truth, one would also like to capture the compositional features of truth and generalize the axioms of typed compositional truth to the type-free case. To this end, axioms or rules concerning the truth of atomic sentences with the truth predicate will have to be added, and the restriction to \(T\)-free sentences in the compositional axioms will have to be lifted. In order to treat truth like other predicates, one will add the axiom \(\forall{\scriptsize A}(T[T{\scriptsize A}] \leftrightarrow T{\scriptsize A})\) (where \(\forall{\scriptsize A}\) ranges over all sentences). If the type restriction of the typed compositional axiom for negation is removed, the axiom \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) is obtained. However, the axioms \(\forall{\scriptsize A}(T[T{\scriptsize A}] \leftrightarrow T{\scriptsize A})\) and \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) are jointly inconsistent over weak theories of syntax, so one of them has to be given up. If \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) is retained, one will have to find weaker axioms or rules for truth iteration, but truth remains a classical concept in the sense that \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) implies the law of excluded middle (for any sentence, either the sentence itself or its negation is true) and the law of noncontradiction (for no sentence are both the sentence itself and its negation true). If, in contrast, \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) is rejected and \(\forall{\scriptsize A}(T[T{\scriptsize A}] \leftrightarrow T{\scriptsize A})\) retained, then it will become provable either that some sentences are true together with their negations or that for some sentences neither they nor their negations are true; thus systems of non-classical truth are obtained, although the systems themselves are still formulated in classical logic. In the next two sections we survey the most prominent system of each kind. 4.3 The Friedman–Sheard theory and revision semantics The system FS, named after Friedman and Sheard (1987), retains the negation axiom \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\). The further compositional axioms are obtained by lifting the type restrictions of the typed compositional axioms. In addition, FS contains two rules of inference: if \(\phi\) is a theorem, one may infer \(T\ulcorner \phi \urcorner\); and conversely, if \(T\ulcorner \phi \urcorner\) is a theorem, one may infer \(\phi\). It follows from results due to McGee (1985) that FS is \(\omega\)-inconsistent, that is, FS proves \(\exists x\neg \phi(x)\) but also proves \(\phi\)(0), \(\phi\)(1), \(\phi\)(2), … for some formula \(\phi(x)\) of \(\mathcal{L}_T\). The arithmetical theorems of FS, however, are all correct. In FS one can define all finite levels of the classical Tarskian hierarchy, but FS isn’t strong enough to allow one to recover any of its transfinite levels. Indeed, Halbach (1994) determined its proof-theoretic strength to be precisely that of the theory of ramified truth for all finite levels (i.e., finitely iterated T(PA); see Section 3.4) or, equivalently, the theory of ramified analysis for all finite levels.
If either of the two rules is dropped while the other is kept, FS retains its proof-theoretic strength (Sheard 2001). It is a virtue of FS that it is thoroughly classical: it is formulated in classical logic; if a sentence is provably true in FS, then the sentence itself is provable in FS; and conversely, if a sentence is provable, then it is also provably true. Its drawback is its \(\omega\)-inconsistency. FS may be seen as an axiomatization of rule-of-revision semantics for all finite levels (see the entry on the revision theory of truth). 4.4 The Kripke–Feferman theory The Kripke–Feferman theory retains the truth iteration axiom \(\forall{\scriptsize A}(T[T{\scriptsize A}] \leftrightarrow T{\scriptsize A})\), but the notion of truth axiomatized is no longer classical, because the negation axiom \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) is dropped. The semantical construction captured by this theory is a generalization of the Tarskian typed inductive definition of truth captured by T(PA). In the generalized definition one starts with the true atomic sentences of the arithmetical language and then declares true the complex sentences depending on whether their components are true or not. For instance, as in the typed case, if \(\phi\) and \(\psi\) are true, their conjunction \(\phi \wedge \psi\) will be true as well. In the case of the quantified sentences, their truth value is determined by the truth values of their instances (one could render the quantifier clauses purely compositional by using a satisfaction predicate); for instance, a universally quantified sentence will be declared true if and only if all its instances are true. One can now extend this inductive definition of truth to the language \(\mathcal{L}_T\) by declaring a sentence of the form \(T\ulcorner \phi \urcorner\) true if \(\phi\) is already true. Moreover, one will declare \(\neg T\ulcorner \phi \urcorner\) true if \(\neg \phi\) is true. By making this idea precise, one obtains a variant of Kripke’s (1975) theory of truth with the so-called Strong Kleene valuation scheme (see the entry on many-valued logic). If axiomatized, this leads to the system known as KF (‘Kripke–Feferman’), of which several variants appear in the literature. Apart from the truth-theoretic axioms, KF comprises all axioms of PA and all induction axioms involving the truth predicate. The system is credited to Feferman on the basis of two lectures for the Association of Symbolic Logic, one in 1979 and the second in 1983, as well as subsequent manuscripts. Feferman published his version of the system under the label Ref(PA) (‘weak reflective closure of PA’) only in 1991, after several other versions of KF had already appeared in print (e.g., Reinhardt 1986 and Cantini 1989, who both refer to this unpublished work by Feferman). KF itself is formulated in classical logic, but it describes a non-classical notion of truth. For instance, one can prove \(T\ulcorner L\urcorner \leftrightarrow T\ulcorner\neg L\urcorner\) if \(L\) is the Liar sentence. Thus KF proves that either both the Liar sentence and its negation are true or that neither is true. So the notion of truth is either paraconsistent (some sentence is true together with its negation) or paracomplete (for some sentence, neither it nor its negation is true). Some authors have augmented KF with an axiom ruling out truth-value gluts, which makes KF sound for Kripke’s model construction, because Kripke had ruled out truth-value gluts.
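A miniature of this construction can be run mechanically. The sketch below (the sentence pool, names and representation are all illustrative choices) starts with empty extension and anti-extension and repeatedly applies the Strong Kleene evaluation until nothing changes; grounded sentences get declared true, while the Liar never receives a value.

# Least Strong Kleene fixed point over a tiny finite pool of sentences.
# ('T', name) refers to the sentence with that name, which allows
# self-reference such as the Liar.
SENTENCES = {
    'zero': ('eq', 0, 0),            # 0 = 0
    'tzero': ('T', 'zero'),          # "0 = 0" is true
    'liar': ('neg', ('T', 'liar')),  # the Liar: "I am not true"
}

def sk(phi, ext, anti):
    # Three-valued Strong Kleene evaluation: True, False, or None (no value).
    op = phi[0]
    if op == 'eq':
        return phi[1] == phi[2]
    if op == 'T':
        return True if phi[1] in ext else (False if phi[1] in anti else None)
    if op == 'neg':
        v = sk(phi[1], ext, anti)
        return None if v is None else not v

def least_fixed_point(sentences):
    ext, anti = set(), set()
    while True:
        new_ext = {n for n, s in sentences.items() if sk(s, ext, anti) is True}
        new_anti = {n for n, s in sentences.items() if sk(s, ext, anti) is False}
        if (new_ext, new_anti) == (ext, anti):
            return ext, anti
        ext, anti = new_ext, new_anti

print(least_fixed_point(SENTENCES))  # 'zero' and 'tzero' come out true; 'liar' stays undefined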
Feferman (1991) showed that KF is proof-theoretically equivalent to the theory of ramified analysis through all levels below \(\varepsilon_0\), the limit of the sequence \(\omega , \omega^{\omega}, \omega^{\omega^{ \omega} },\ldots\), or to a theory of ramified truth through the same ordinals. This result shows that in KF exactly \(\varepsilon_0\) many levels of the classical Tarskian hierarchy in its axiomatized form can be recovered. Thus KF is far stronger than FS, let alone T(PA). Feferman (1991) also devised a strengthening of KF that is as strong as full predicative analysis, that is, ramified analysis or ramified truth up to the ordinal \(\Gamma_0\). Just as with the typed truth predicate, the theory KF (more precisely, a common variant of it) can be obtained via an act of reflection on a system of untyped \(T\)-sentences. The system of \(T\)-sentences in question is the extension of the uniform positive untyped \(T\)-sentences by a primitive falsity predicate: the theory features two unary predicates \(T\) and \(F\) and the axioms \(\forall x(T\ulcorner \phi(\underline{x})\urcorner \leftrightarrow \phi(x))\) and \(\forall x(F\ulcorner \phi(\underline{x})\urcorner \leftrightarrow \phi'(x))\) for every formula \(\phi(v)\) positive in both \(T\) and \(F\), where \(\phi '\) represents the De Morgan dual of \(\phi\) (exchanging \(T\) for \(F\) and vice versa). From an application of uniform reflection over this disquotational theory, the truth axioms for the corresponding two-predicate version of KF are derivable (Horsten and Leigh 2016). The converse also holds, as does the generalisation to finite and transfinite iterations of reflection (Leigh 2017). 4.5 Capturing the minimal fixed point As remarked above, if KF proves \(T\ulcorner \phi \urcorner\) for some sentence \(\phi\), then \(\phi\) holds in all Kripke fixed point models. In particular, there are \(2^{\aleph_0}\) fixed points that form models of the internal theory of KF. Thus from the perspective of KF, the least fixed point (from which Kripke’s theory is defined) is not singled out. Burgess (2014) provides an expansion of KF, named \(\mu\)KF, that attempts to capture the minimal Kripkean fixed point. KF is expanded by additional axioms that express that the internal theory of KF is the smallest class closed under the defining axioms for Kripkean truth. This can be formulated as a single axiom schema stating, for each open formula \(\phi\): if \(\phi\) satisfies the same axioms of KF as the predicate \(T\), then \(\phi\) holds of every true sentence. From a proof-theoretic perspective, \(\mu\)KF is significantly stronger than KF. The single axiom schema expressing the minimality of the truth predicate allows one to embed into \(\mu\)KF the system ID\(_1\) of one arithmetical inductive definition, an impredicative theory. While intuitively plausible, \(\mu\)KF suffers the same expressive incompleteness as KF: since the minimal Kripkean fixed point forms a complete \(\Pi^{1}_1\) set and the internal theory of \(\mu\)KF remains recursively enumerable, there are standard models of the theory in which the interpretation of the truth predicate is not the minimal fixed point. At present a thorough analysis of the models of \(\mu\)KF is lacking. 4.6 Axiomatisations of Kripke’s theory with supervaluations KF is intended to be an axiomatization of Kripke’s (1975) semantical theory. This theory is based on partial logic with the Strong Kleene evaluation scheme. In Strong Kleene logic not every sentence \(\phi \vee \neg \phi\) is a theorem; in particular, this disjunction is not true if \(\phi\) lacks a truth value.
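The contrast with the supervaluational scheme can be seen in miniature. The bare-bones sketch below ignores the restriction to consistent extensions of a given partial interpretation, and its representation of sentences is an illustrative choice: a sentence counts as supertrue when it comes out true under every classical way of completing the interpretation of \(T\). The Liar itself then receives no value, yet \(L \vee \neg L\) comes out supertrue, whereas under Strong Kleene the same disjunction simply lacks a value.

from itertools import product

def classical(phi, total):
    # Classical evaluation once every sentence name has a definite truth value.
    op = phi[0]
    if op == 'T':
        return total[phi[1]]
    if op == 'neg':
        return not classical(phi[1], total)
    if op == 'or':
        return classical(phi[1], total) or classical(phi[2], total)

def supertrue(phi, names):
    # Supervaluation: true under every total assignment to the sentence names.
    return all(classical(phi, dict(zip(names, vals)))
               for vals in product([True, False], repeat=len(names)))

LIAR = ('neg', ('T', 'liar'))  # the Liar: "I am not true"
print(supertrue(LIAR, ['liar']))                         # False: the Liar is not supertrue
print(supertrue(('or', LIAR, ('neg', LIAR)), ['liar']))  # True: L or not-L is supertrue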
Consequently \(T\ulcorner L\vee \neg L\urcorner\) (where \(L\) is the Liar sentence) is not a theorem of KF, and its negation is even provable. Cantini (1990) has proposed a system VF that is inspired by the supervaluations scheme. In VF all classical tautologies are provably true, and \(T\ulcorner L \vee \neg L\urcorner\), for instance, is a theorem of VF. VF can be formulated in \(\mathcal{L}_T\) and uses classical logic. It is no longer a compositional theory of truth, for the following is not a theorem of VF: \(\forall{\scriptsize A}\forall{\scriptsize B}(T[{\scriptsize A} \vee{\scriptsize B}] \leftrightarrow (T{\scriptsize A} \vee T{\scriptsize B}))\). Not only is this principle inconsistent with the other axioms of VF, it does not fit the supervaluationist model, for it implies \(T\ulcorner L\urcorner \vee T\ulcorner \neg L\urcorner\), which of course is not correct, because according to the intended semantics neither the Liar sentence nor its negation is true: both lack a truth value. Extending a result due to Friedman and Sheard (1987), Cantini showed that VF is much stronger than KF: VF is proof-theoretically equivalent to the theory ID\(_1\) of non-iterated inductive definitions, which is not predicative. 5. Non-classical approaches to self-reference The theories of truth discussed thus far are all axiomatized in classical logic. Some authors have also looked into axiomatic theories of truth based on non-classical logic (see, for example, Field 2008, Halbach and Horsten 2006, Leigh and Rathjen 2012). There are a number of reasons why a logic weaker than classical logic may be preferred. The most obvious is that by weakening the logic, some collections of axioms of truth that were previously inconsistent become consistent. Another common reason is that the axiomatic theory in question intends to capture a particular non-classical semantics of truth, for which a classical background theory may prove unsound. 5.1 The truth predicate in intuitionistic logic The inconsistency of the \(T\)-sentences does not rely on classical reasoning: they are also inconsistent over much weaker logics such as minimal logic and partial logic. However, classical logic does play a role in restricting the free use of principles of truth. For instance, over a classical base theory, the compositional axiom for implication (\(\rightarrow\)) is equivalent to the principle of completeness, \(\forall{\scriptsize A}(T[{\scriptsize A}] \vee T[\neg{\scriptsize A}])\). If the logic under the truth predicate is classical, completeness is equivalent to the compositional axiom for disjunction. Without the law of excluded middle, FS can be formulated as a fully compositional theory while not proving the truth-completeness principle (Leigh & Rathjen 2012). In addition, classical logic has an effect on attempts to combine compositional and self-applicable axioms of truth. If, for example, one drops from FS the axiom of truth-consistency (the left-to-right direction of the negation axiom \(\forall{\scriptsize A}(T[\neg{\scriptsize A}] \leftrightarrow \neg T{\scriptsize A})\) of Section 4.3) as well as the law of excluded middle for the truth predicate, it is possible to consistently add the truth-iteration axiom \(\forall{\scriptsize A}(T[{\scriptsize A}] \rightarrow T[T{\scriptsize A}])\). The resulting theory still bears a strong resemblance to FS, in that the constructive version of the rule-of-revision semantics for all finite levels provides a natural model of the theory, and the two theories share the same \(\Pi^{0}_2\) consequences (Leigh & Rathjen 2012; Leigh 2013).
This result should be contrasted with KF which, if formulated without the law of excluded middle, remains maximally consistent with respect to its choice of truth axioms but is a conservative extension of Heyting arithmetic. 5.2 Axiomatising Kripke’s theory Kripke’s (1975) theory in its different guises is based on partial logic. In order to obtain models for a theory in classical logic, the extension of the truth predicate in the partial model is used again as the extension of truth in the classical model. In the classical model, false sentences and those without a truth value in the partial model are declared not true. KF is sound with respect to these classical models and thus incorporates two distinct logics. The first is the ‘internal’ logic of statements under the truth predicate, and is formulated with the Strong Kleene valuation schema. The second is the ‘external’ logic, which is full classical logic. An effect of formulating KF in classical logic is that the theory cannot be consistently closed under the truth-introduction rule: if \(\phi\) is a theorem of KF, so is \(T\ulcorner \phi \urcorner\). A second effect of classical logic is that the law of excluded middle holds for the Liar sentence. Neither the Liar sentence nor its negation obtains a truth value in Kripke’s theory, so the disjunction of the two is not valid. The upshot is that KF, if viewed as an axiomatisation of Kripke’s theory, is not sound with respect to its intended semantics. For this reason Halbach and Horsten (2006) and Horsten (2011) explore an axiomatization of Kripke’s theory with partial logic as inner and outer logic. Their suggestion, a theory labelled PKF (‘partial KF’), can be axiomatised as a Gentzen-style two-sided sequent calculus based on Strong Kleene logic (see the entry on many-valued logic). PKF is formed by adding to this calculus the Peano–Dedekind axioms of arithmetic, including full induction, and the compositional and truth-iteration rules for the truth predicate as prescribed by Kripke’s theory. The result is a theory of truth that is sound with respect to Kripke’s theory. Halbach and Horsten show that this axiomatization of Kripke’s theory is significantly weaker than its classical cousin KF. The result demonstrates that restricting logic only for sentences with the truth predicate can also hamper the derivation of truth-free theorems.

Time Travel and Modern Physics


1. A Botched Suicide You are very depressed. You are suicidally depressed. You have a gun. But you do not quite have the courage to point the gun at yourself and kill yourself in this way. If only someone else would kill you, that would be a good thing. But you can't really ask someone to kill you. That wouldn't be fair. You decide that if you remain this depressed and you find a time machine, you will travel back in time to just about now, and kill your earlier self. That would be good. In that way you even would get rid of the depressing time you will spend between now and when you would get into that time machine. You start to muse about the coherence of this idea, when something amazing happens. Out of nowhere you suddenly see someone coming towards you with a gun pointed at you. In fact he looks very much like you, except that he is bleeding badly from his left eye, and can barely stand up straight. You are at peace. You look straight at him, calmly. He shoots. You feel a searing pain in your left eye. Your mind is in chaos, you stagger around and accidentally enter a strange looking cubicle. You drift off into unconsciousness. After a while, you can not tell how long, you drift back into consciousness and stagger out of the cubicle. You see someone in the distance looking at you calmly and fixedly. You realize that it is your younger self. He looks straight at you. You are in terrible pain. You have to end this, you have to kill him, really kill him once and for all. You shoot him, but your eyesight is so bad that your aim is off. You do not kill him, you merely damage his left eye. He staggers off. You fall to the ground in agony, and decide to study the paradoxes of time travel more seriously. 2. Why Do Time Travel Suicides Get Botched? The standard worry about time travel is that it allows one to go back and kill one's younger self and thereby create paradox. More generally it allows for people or objects to travel back in time and to cause events in the past that are inconsistent with what in fact happened. (See e.g., Gödel 1949, Earman 1972, Malament 1985a&b, Horwich 1987.) A stone-walling response to this worry is that by logic indeed inconsistent events can not both happen. Thus in fact all such schemes to create paradox are logically bound to fail. So what's the worry? Well, one worry is the question as to why such schemes always fail. Doesn't the necessity of such failures put prima facie unusual and unexpected constraints on the actions of people, or objects, that have traveled in time? Don't we have good reason to believe that there are no such constraints (in our world) and thus that there is no time travel (in our world)? We will later return to the issue of the palatability of such constraints, but first we want to discuss an argument that no constraints are imposed by time travel. 3. Topology and Constraints Wheeler and Feynman (1949) were the first to claim that the fact that nature is continuous could be used to argue that causal influences from later events to earlier events, as are made possible by time travel, will not lead to paradox without the need for any constraints. Maudlin (1990) showed how to make their argument precise and more general, and argued that nonetheless it was not completely general. Imagine the following set-up. We start off having a camera with a black and white film ready to take a picture of whatever comes out of the time machine. An object, in fact a developed film, comes out of the time machine. We photograph it, and develop the film. 
The developed film is subsequently put in the time machine, and set to come out of the time machine at the time the picture is taken. This surely will create a paradox: the developed film will have the opposite distribution of black, white, and shades of gray, from the object that comes out of the time machine. For developed black and white films (i.e. negatives) have the opposite shades of gray from the objects they are pictures of. But since the object that comes out of the time machine is the developed film itself, we surely have a paradox. However, it does not take much thought to realize that there is no paradox here. What will happen is that a uniformly gray picture will emerge, which produces a developed film that has exactly the same uniform shade of gray. No matter what the sensitivity of the film is, as long as the brightness of the developed film depends in a continuous manner on the brightness of the object being photographed, there will be a shade of gray that, when photographed, will produce exactly the same shade of gray on the developed film. This is the essence of Wheeler and Feynman's idea. Let us first be a bit more precise and then a bit more general. For simplicity let us suppose that the film is always a uniform shade of gray (i.e. at any time the shade of gray does not vary by location on the film). The possible shades of gray of the film can then be represented by the (real) numbers from 0, representing pure black, to 1, representing pure white. Let us now distinguish various stages in the chronological order of the life of the film. In stage S1 the film is young; it has just been placed in the camera and is ready to be exposed. It is then exposed to the object that comes out of the time machine. (That object in fact is a later stage of the film itself.) By the time we come to stage S2 of the life of the film, it has been developed and is about to enter the time machine. Stage S3 occurs just after it exits the time machine and just before it is photographed. Stage S4 occurs after it has been photographed and before it starts fading away. Let us assume that the film starts out in stage S1 in some uniform shade of gray, and that the only significant change in the shade of gray of the film occurs between stages S1 and S2. During that period it acquires a shade of gray that depends on the shade of gray of the object that was photographed. That is, the shade of gray that the film acquires at stage S2 depends on the shade of gray it has at stage S3. The influence of the shade of gray of the film at stage S3, on the shade of gray of the film at stage S2, can be represented as a mapping, or function, from the real numbers between 0 and 1 (inclusive), to the real numbers between 0 and 1 (inclusive). Let us suppose that the process of photography is such that if one imagines varying the shade of gray of an object in a smooth, continuous manner then the shade of gray of the developed picture of that object will also vary in a smooth, continuous manner. This implies that the function in question will be a continuous function. Now any continuous function from the real numbers between 0 and 1 (inclusive) to the real numbers between 0 and 1 (inclusive) must map at least one number to itself. One can quickly convince oneself of this by graphing such functions. For one will quickly see that any continuous function f from [0,1] to [0,1] must intersect the line x=y somewhere, and thus there must be at least one point x such that f(x)=x.
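This intermediate value argument can be made concrete with a few lines of code. The sketch below (the particular response curve f is an arbitrary stand-in for whatever continuous film response the camera implements) runs bisection on g(x) = f(x) - x, which is non-negative at 0 and non-positive at 1, and so pins down a point x with f(x) = x:

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    # Find x with f(x) = x for a continuous f mapping [lo, hi] into itself,
    # by bisection on g(x) = f(x) - x: g(lo) >= 0 and g(hi) <= 0, so g
    # changes sign somewhere in between.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# An arbitrary continuous "film response": dark objects photograph light, and vice versa.
f = lambda x: (1 - x) ** 2
x = fixed_point(f)
print(x, f(x))  # both approximately 0.3820: the shade that photographs to itself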
Such points are called fixed points of the function. Now let us think about what such a fixed point represents. It represents a shade of gray such that, when photographed, it will produce a developed film with exactly that same shade of gray. The existence of such a fixed point implies a solution to the apparent paradox. Let us now be more general and allow color photography. One can represent each possible color of an object (of uniform color) by the proportions of blue, green and red that make up that color. (This is why television screens can produce all possible colors.) Thus one can represent all possible colors of an object by three points on three orthogonal lines x, y and z, that is to say, by a point in a three-dimensional cube. This cube is also known as the ‘Cartesian product’ of the three line segments. Now, one can also show that any continuous map from such a cube to itself must have at least one fixed point. So color photography can not be used to create time travel paradoxes either! Even more generally, consider some system P which, as in the above example, has the following life. It starts in some state S1, it interacts with an object that comes out of a time machine (which happens to be its older self), it travels back in time, it interacts with some object (which happens to be its younger self), and finally it grows old and dies. Let us assume that the set of possible states of P can be represented by a Cartesian product of n closed intervals of the reals, i.e., let us assume that the topology of the state-space of P is isomorphic to a finite Cartesian product of closed intervals of the reals. Let us further assume that the development of P in time, and the dependence of that development on the state of objects that it interacts with, is continuous. Then, by a well-known fixed point theorem in topology (see e.g., Hocking and Young 1961, p 273), no matter what the nature of the interaction is, and no matter what the initial state of the object is, there will be at least one state S3 of the older system (as it emerges from the time travel machine) that will influence the initial state S1 of the younger system (when it encounters the older system) so that, as the younger system becomes older, it develops exactly into state S3. Thus without imposing any constraints on the initial state S1 of the system P, we have shown that there will always be perfectly ordinary, non-paradoxical, solutions, in which everything that happens, happens according to the usual laws of development. Of course, there is looped causation, hence presumably also looped explanation, but what do you expect if there is looped time? Unfortunately, for the fan of time travel, a little reflection suggests that there are systems for which the needed fixed point theorem does not hold. Imagine, for instance, that we have a dial that can only rotate in a plane. We are going to put the dial in the time machine. Indeed we have decided that if we see the later stage of the dial come out of the time machine set at angle x, then we will set the dial to x+90, and throw it into the time machine. Now it seems we have a paradox, since the mapping that consists of a rotation of all points in a circular state-space by 90 degrees does not have a fixed point. And why wouldn't some state-spaces have the topology of a circle? However, we have so far not used another continuity assumption which is also a reasonable assumption. 
So far we have only made the following demand: the state the dial is in at stage S2 must be a continuous function of the state of the dial at stage S3. But the state of the dial at stage S2 is arrived at by taking the state of the dial at stage S1, and rotating it over some angle. It is not merely the case that the effect of the interaction, namely the state of the dial at stage S2, should be a continuous function of the cause, namely the state of the dial at stage S3. It is additionally the case that the path taken to get there, the way the dial is rotated between stages S1 and S2, must be a continuous function of the state at stage S3. And, rather surprisingly, it turns out that this can not be done. Let us illustrate what the problem is before going on to a more general demonstration that there must be a fixed point solution in the dial case. Forget time travel for the moment. Suppose that you and I each have a watch with a single dial, neither of which is running. My watch is set at 12. You are going to announce what your watch is set at. My task is going to be to adjust my watch to yours no matter what announcement you make. And my actions should have a continuous (single valued) dependence on the time that you announce. Surprisingly, this is not possible! For instance, suppose that if you announce “12”, then I achieve that setting on my watch by doing nothing. Now imagine slowly and continuously increasing the announced times, starting at 12. By continuity, I must achieve each of those settings by rotating my dial to the right. If at some point I switch and achieve the announced goal by a rotation of my dial to the left, I will have introduced a discontinuity in my actions, a discontinuity in the actions that I take as a function of the announced angle. So I will be forced, by continuity, to achieve every announcement by rotating the dial to the right. But this rotation to the right will have to be abruptly discontinued as the announcements grow larger and I eventually approach 12 again, since I achieved 12 by not rotating the dial at all. So there will be a discontinuity at 12 at the latest. In general, continuity of my actions as a function of announced times can not be maintained throughout if I am to be able to replicate all possible settings. Another way to see the problem is that one can similarly reason that, as one starts with 12, and imagines continuously making the announced times earlier, one will be forced, by continuity, to achieve the announced times by rotating the dial to the left. But the conclusions drawn from the assumption of continuous increases and the assumption of continuous decreases are inconsistent. So we have an inconsistency following from the assumption of continuity and the assumption that I always manage to set my watch to your watch. So a dial developing according to a continuous dynamics from a given initial state can not be set up so as to react to a second dial, with which it interacts, in such a way that it is guaranteed to always end up set at the same angle as the second dial. Similarly, it can not be set up so that it is guaranteed to always end up set at 90 degrees to the setting of the second dial. All of this has nothing to do with time travel. However, the impossibility of such set-ups is what prevents us from enacting the rotation by 90 degrees that would create paradox in the time travel setting. Let us now give the positive result that with such dials there will always be fixed point solutions, as long as the dynamics is continuous.
Let us call the state of the dial before it interacts with its older self the initial state of the dial, and the state of the dial after it emerges from the time machine the final state of the dial. We can represent the possible initial and final states of the dial by the angles x and y that the dial can point at initially and finally. The set of possible initial plus final states thus forms a torus. (See figure 1.) Figure 1 Suppose that the dial starts at angle I. The combination of the initial angle I that the dial is at before it encounters its older self with the set of all possible final angles that the dial can have when it emerges from the time machine is represented by the circle I on the torus (see figure 1). Given any possible angle of the emerging dial, the dial initially at angle I will develop to some other angle. One can picture this development by rotating each point on I in the horizontal direction by the relevant amount. Since the rotation has to depend continuously on the angle of the emerging dial, ring I will, during this development, deform into some loop L on the torus. Loop L thus represents the angle x that the dial is at when it is thrown into the time machine, given that it started at angle I and then encountered a dial (its older self) which was at angle y when it emerged from the time machine. We therefore have consistency if x=y for some x and y on loop L. Now, let loop C be the loop which consists of all the points on the torus for which x=y. Ring I intersects C at point <i,i>. Obviously any continuous deformation of I must still intersect C somewhere. So L must intersect C somewhere, say at <j,j>. But that means that no matter how the development of the dial starting at I depends on the angle of the emerging dial, there will be some angle for the emerging dial such that the dial will develop exactly into that angle (by the time it enters the time machine) under the influence of that emerging dial. This is so no matter what angle one starts with, and no matter how the development depends on the angle of the emerging dial. Thus even for a circular state-space there are no constraints needed other than continuity. Unfortunately there are state-spaces that escape even this argument. Consider for instance a pointer that can be set to all values between 0 and 1, where 0 and 1 are not possible values. That is, suppose that we have a state-space that is isomorphic to an open set of real numbers. Now suppose that we have a machine that sets the pointer to half the value that the pointer is set at when it emerges from the time machine. Figure 2 Suppose the pointer starts at value I. As before, we can represent the combination of this initial position and all possible final positions by the line I. Under the influence of the pointer coming out of the time machine, the pointer value will develop to a value that equals half the final value that it encountered. We can represent this development as the continuous deformation of line I into line L, which is indicated by the arrows in Figure 2. This development is fully continuous. Points <x,y> on line I represent the initial position x=I of the (young) pointer, and the position y of the older pointer as it emerges from the time machine. Points <x,y> on line L represent the position x that the younger pointer should develop into, given that it encountered the older pointer emerging from the time machine set at position y.
Since the pointer is designed to develop to half the value of the pointer that it encounters, the line L corresponds to x = y/2. We have consistency if there is some point such that it develops into that point, if it encounters that point. Thus we have consistency if there is some point <x,y> on line L such that x=y. However, there is no such point: line L and the line C of points with x=y do not intersect. Thus there is no consistent solution, despite the fact that the dynamics is fully continuous. Of course if 0 were a possible value, L and C would intersect at 0. This is surprising and strange: adding one point to the set of possible values of a quantity here makes the difference between paradox and peace. One might be tempted to just add the extra point to the state-space in order to avoid problems. After all, one might say, surely no measurements could ever tell us whether the set of possible values includes that exact point or not. Unfortunately there can be good theoretical reasons for supposing that some quantity has a state-space that is open: the set of all possible speeds of massive objects in special relativity surely is an open set, since it includes all speeds up to, but not including, the speed of light. Quantities whose possible values are unbounded also lead to counterexamples to the presented fixed point argument. And it is not obvious to us why one should exclude such possibilities. So the argument that no constraints are needed is not fully general. An interesting question of course is: exactly for which state-spaces must there be such fixed points? We do not know the general answer. (But see Kutach 2003 for more on this issue.) 4. The General Possibility of Time Travel in General Relativity Time travel has recently been discussed quite extensively in the context of general relativity. Time travel can occur in general relativistic models in which one has closed time-like curves (CTC's). A timelike curve is simply a space-time trajectory such that the speed of light is never equalled or exceeded along this trajectory. Time-like curves thus represent the possible trajectories of ordinary objects. If there were time-like curves which were closed (formed a loop), then travelling along such a curve one would never exceed the speed of light, and yet after a certain amount of (proper) time one would return to a point in space-time that one previously visited. Or, by staying close to such a CTC, one could come arbitrarily close to a point in space-time that one previously visited. General relativity, in a straightforward sense, allows time travel: there appear to be many space-times compatible with the fundamental equations of General Relativity in which there are CTC's. Space-time, for instance, could have a Minkowski metric everywhere, and yet have CTC's everywhere by having the temporal dimension (topologically) rolled up as a circle. Or, one can have wormhole connections between different parts of space-time which allow one to enter ‘mouth A’ of such a wormhole connection, travel through the wormhole, exit the wormhole at ‘mouth B’ and re-enter ‘mouth A’ again. Or, one can have space-times which topologically are R^4, and yet have CTC's due to the ‘tilting’ of light cones (Gödel space-times, Taub-NUT space-times, etc.). General relativity thus appears to provide ample opportunity for time travel. Note that just because there are CTC's in a space-time, this does not mean that one can get from any point in the space-time to any other point by following some future directed timelike curve.
4. The General Possibility of Time Travel in General Relativity

Time travel has recently been discussed quite extensively in the context of general relativity. Time travel can occur in general relativistic models in which one has closed timelike curves (CTCs). A timelike curve is simply a space-time trajectory such that the speed of light is never equalled or exceeded along this trajectory. Timelike curves thus represent the possible trajectories of ordinary objects. If there were timelike curves which were closed (formed a loop), then travelling along such a curve one would never exceed the speed of light, and yet after a certain amount of (proper) time one would return to a point in space-time that one previously visited. Or, by staying close to such a CTC, one could come arbitrarily close to a point in space-time that one previously visited. General relativity, in a straightforward sense, allows time travel: there appear to be many space-times compatible with the fundamental equations of general relativity in which there are CTCs. Space-time, for instance, could have a Minkowski metric everywhere, and yet have CTCs everywhere by having the temporal dimension (topologically) rolled up as a circle. Or, one can have wormhole connections between different parts of space-time which allow one to enter ‘mouth A’ of such a wormhole connection, travel through the wormhole, exit the wormhole at ‘mouth B’, and re-enter ‘mouth A’ again. Or, one can have space-times which topologically are R^4, and yet have CTCs due to the ‘tilting’ of light cones (Gödel space-times, Taub-NUT space-times, etc.). General relativity thus appears to provide ample opportunity for time travel. Note that just because there are CTCs in a space-time, this does not mean that one can get from any point in the space-time to any other point by following some future directed timelike curve. In many space-times in which there are CTCs, such CTCs do not occur all over space-time: some parts of space-time can have CTCs while other parts do not. Let us call the part of a space-time that has CTCs the “time travel region” of that space-time, and the rest of that space-time the “normal region”. More precisely, the time travel region consists of all the space-time points p such that there exists a (non-zero length) timelike curve that starts at p and returns to p. Now let us start examining space-times with CTCs a bit more closely for potential problems.

5. Two Toy Models

In order to get a feeling for the sorts of implications that closed timelike curves can have, it may be useful to consider two simple models. In space-times with closed timelike curves the traditional initial value problem cannot be framed in the usual way, for it presupposes the existence of Cauchy surfaces, and if there are CTCs then no Cauchy surface exists. (A Cauchy surface is a spacelike surface such that every inextendible timelike curve crosses it exactly once. One normally specifies initial conditions by giving the conditions on such a surface.) Nonetheless, if the topological complexities of the manifold are appropriately localized, we can come quite close. Let us call an edgeless spacelike surface S a quasi-Cauchy surface if it divides the rest of the manifold into two parts such that (a) every point in the manifold can be connected by a timelike curve to S, and (b) any timelike curve which connects a point in one region to a point in the other region intersects S exactly once. It is obvious that a quasi-Cauchy surface must lie entirely in the normal region of the space-time: if any point p of S were in the time travel region, then any timelike curve which intersects p could be extended to a timelike curve which intersects S near p again. In extreme cases of time travel, a model may have no normal region at all (e.g., Minkowski space-time rolled up like a cylinder in a timelike direction), in which case our usual notions of temporal precedence will not apply. But temporal anomalies like wormholes (and time machines) can be sufficiently localized to permit the existence of quasi-Cauchy surfaces. Given a timelike orientation, a quasi-Cauchy surface unproblematically divides the manifold into its past (i.e., all points that can be reached by past-directed timelike curves from S) and its future (ditto mutatis mutandis). If the whole past of S is in the normal region of the manifold, then S is a partial Cauchy surface: every inextendible timelike curve which exists to the past of S intersects S exactly once, but (if there is time travel in the future) not every inextendible timelike curve which exists to the future of S intersects S.
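For reference, the definitions just given can be compressed into symbols; this is our paraphrase, not notation from the original. The time travel region of a space-time M is

\[ T \;=\; \{\, p \in M \;:\; \text{there is a closed timelike curve through } p \,\}, \]

and the normal region is \(M \setminus T\); a partial Cauchy surface is then an edgeless spacelike surface S whose entire past lies in the normal region and which every inextendible timelike curve to its past crosses exactly once.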
Now we can ask a particularly clear question: consider a manifold which contains a time travel region, but also has a partial Cauchy surface S, such that all of the temporal funny business is to the future of S. If all you could see were S and its past, you would not know that the space-time had any time travel at all. The question is: are there any constraints on the sort of data which can be put on S and continued to a global solution of the dynamics which are different from the constraints (if any) on the data which can be put on a Cauchy surface in a simply connected manifold and continued to a global solution? If there is time travel to our future, might we be able to tell this now, because of some implied oddity in the arrangement of present things? It is not at all surprising that there might be constraints on the data which can be put on a locally spacelike surface which passes through the time travel region: after all, we never think we can freely specify what happens on a spacelike surface and also on another such surface to its future, and a surface that passes through a time travel region in effect lies partly to its own future. But if there were particular constraints on data on a partial Cauchy surface, then we would apparently have to rule out some sorts of otherwise acceptable states on S if there is to be time travel to the future of S. We then might be able to establish that there will be no time travel in the future by simple inspection of the present state of the universe. As we will see, there is reason to suspect that such constraints on the partial Cauchy surface are non-generic. But we are getting ahead of ourselves: first let's consider the effect of time travel on a very simple dynamics. The simplest possible example is the Newtonian theory of perfectly elastic collisions among equally massive particles in one spatial dimension. The space-time is two-dimensional, so we can represent it initially as the Euclidean plane, and the dynamics is completely specified by two conditions. When particles are traveling freely, their world lines are straight lines in the space-time, and when two particles collide, they exchange momenta, so the collision looks like an ‘X’ in space-time, with each particle changing its momentum at the impact.[1] The dynamics is purely local, in that one can check that a set of world-lines constitutes a model of the dynamics by checking that the dynamics is obeyed in every arbitrarily small region. It is also trivial to generate solutions from arbitrary initial data if there are no CTCs: given the initial positions and momenta of a set of particles, one simply draws a straight line from each particle in the appropriate direction and continues it indefinitely. Once all the lines are drawn, the worldline of each particle can be traced from collision to collision. The boundary value problem for this dynamics is obviously well-posed: any set of data at an instant yields a unique global solution, constructed by the method sketched above. What happens if we change the topology of the space-time by hand to produce CTCs? The simplest way to do this is depicted in figure 3: we cut and paste the space-time so it is no longer simply connected by identifying the line L− with the line L+. Particles “going in” to L+ from below “emerge” from L−, and particles “going in” to L− from below “emerge” from L+.

Figure 3: Inserting CTCs by Cut and Paste

How is the boundary-value problem changed by this alteration in the space-time? Before the cut and paste, we can put arbitrary data on the simultaneity slice S and continue it to a unique solution. After the change in topology, S is no longer a Cauchy surface, since a CTC will never intersect it, but it is a partial Cauchy surface. So we can ask two questions. First, can arbitrary data on S always be continued to a global solution? Second, is that solution unique? If the answer to the first question is no, then we have a backward-temporal constraint: the existence of the region with CTCs places constraints on what can happen on S even though that region lies completely to the future of S.
If the answer to the second question is no, then we have an odd sort of indeterminism: the complete physical state on S does not determine the physical state in the future, even though the local dynamics is perfectly deterministic and even though there is no other past edge to the space-time region in S's future (i.e., there is nowhere else for boundary values to come from which could influence the state of the region). In this case the answer to the first question is yes and to the second is no: there are no constraints on the data which can be put on S, but those data are always consistent with an infinitude of different global solutions. The easy way to see that there always is a solution is to construct the minimal solution in the following way. Start drawing straight lines from S as required by the initial data. If a line hits L− from the bottom, just continue it coming out of the top of L+ in the appropriate place, and if a line hits L+ from the bottom, continue it emerging from L− at the appropriate place. Figure 4 represents the minimal solution for a single particle which enters the time-travel region from the left.

Figure 4: The Minimal Solution

The particle ‘travels back in time’ three times. It is obvious that this minimal solution is a global solution, since the particle always travels inertially.
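The minimal-solution recipe is easy to mechanize. Here is a small sketch (our code; the slit geometry and all numbers are assumptions chosen to mimic figures 3 and 4) that counts how many times a free, rightward-moving particle ‘travels back in time’:

```python
def trace_minimal(x_start, v, a, b, t0, t1):
    """Trace one free particle through the cut-and-paste region.

    Assumed geometry (ours, mimicking figure 3): L+ is the segment
    t = t1, a < x < b, and L- is the segment t = t0, a < x < b.
    A worldline hitting L+ from below re-emerges at the same x on L-.
    Returns (number of backward time travels, exit position at t1)."""
    t, x, hops = t0, x_start, 0
    while True:
        x_at_top = x + v * (t1 - t)      # free (inertial) motion
        if a < x_at_top < b:             # hits L+ -> reappear on L-
            t, x, hops = t0, x_at_top, hops + 1
        else:                            # misses the slit: exits the region
            return hops, x_at_top

# With these (made-up) numbers the particle loops back three times,
# as in the minimal solution of figure 4:
print(trace_minimal(x_start=-0.9, v=1.0, a=0.0, b=2.5, t0=0.0, t1=1.0))
# -> (3, 3.1)
```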
But the same initial state on S is also consistent with other global solutions. The new requirement imposed by the topology is just that the data going into L+ from the bottom match the data coming out of L− from the top, and that the data going into L− from the bottom match the data coming out of L+ from the top. So we can add any number of vertical lines connecting L− and L+ to a solution and still have a solution. For example, adding a few such lines to the minimal solution yields the following.

Figure 5: A Non-Minimal Solution

The particle now collides with itself twice: first before it reaches L+ for the first time, and again shortly before it exits the CTC region. From the particle's point of view, it is traveling to the right at a constant speed until it hits an older version of itself and comes to rest. It remains at rest until it is hit from the right by a younger version of itself, and then continues moving off, and the same process repeats later. It is clear that this is a global model of the dynamics, and that any number of distinct models could be generated by varying the number and placement of vertical lines. Knowing the data on S, then, gives us only incomplete information about how things will go for the particle. We know that the particle will enter the CTC region and will reach L+; we know that it will be the only particle in the universe; we know exactly where and with what speed it will exit the CTC region. But we cannot determine how many collisions the particle will undergo (if any), nor how long (in proper time) it will stay in the CTC region. If the particle were a clock, we could not predict what time it would indicate when exiting the region. Furthermore, the dynamics gives us no handle on what to think of the various possibilities: there are no probabilities assigned to the various distinct possible outcomes. Changing the topology has changed the mathematics of the situation in two ways, which tend to pull in opposite directions. On the one hand, S is no longer a Cauchy surface, so it is perhaps not surprising that data on S do not suffice to fix a unique global solution. But on the other hand, there is an added constraint: data “coming out” of L− must exactly match data “going in” to L+, even though what comes out of L− helps to determine what goes into L+. This added consistency constraint tends to cut down on solutions, although in this case the additional constraint is more than outweighed by the freedom to consider various sorts of data on L− and L+. The fact that the extra freedom outweighs the extra constraint also points up one unexpected way that the supposed paradoxes of time travel may be overcome. Let's try to set up a paradoxical situation using the little closed time loop above. If we send a single particle into the loop from the left and do nothing else, we know exactly where it will exit the right side of the time travel region. Now suppose we station someone at the other side of the region with the following charge: if the particle should come out on the right side, the person is to do something to prevent the particle from going in on the left in the first place. In fact, this is quite easy to do: if we send a particle in from the right, it seems that it can exit on the left and deflect the incoming left-hand particle. Carrying on our reflection in this way, we further realize that if the particle comes out on the right, we might as well send it back in order to deflect itself from entering in the first place. So all we really need to do is the following: set up a perfectly reflecting particle mirror on the right-hand side of the time travel region, and launch the particle from the left so that, if nothing interferes with it, it will just barely hit L+. Our paradox is now apparently complete. If, on the one hand, nothing interferes with the particle, it will enter the time-travel region on the left, exit on the right, be reflected from the mirror, re-enter from the right, and come out on the left to prevent itself from ever entering. So if it enters, it gets deflected and never enters. On the other hand, if it never enters, then nothing goes in on the left, so nothing comes out on the right, so nothing is reflected back, and there is nothing to deflect it from entering. So if it doesn't enter, then there is nothing to deflect it and it enters. If it enters, then it is deflected and doesn't enter; if it doesn't enter, then there is nothing to deflect it and it enters: paradox complete. But at least one solution to the supposed paradox is easy to construct: just follow the recipe for constructing the minimal solution, continuing the initial trajectory of the particle (reflecting it off the mirror in the obvious way), and then read off the number and trajectories of the particles from the resulting diagram. We get the result shown in figure 6.

Figure 6: Resolving the “Paradox”

As we can see, the particle approaching from the left never reaches L+: it is deflected first by a particle which emerges from L−. But it is not deflected by itself, as the paradox suggests; it is deflected by another particle. Indeed, there are now four particles in the diagram: the original particle and three particles which are confined to closed timelike curves. It is not the leftmost particle which is reflected by the mirror, nor even the particle which deflects the leftmost particle; it is another particle altogether.
The paradox gets its traction from an incorrect presupposition: that if there is only one particle in the world at S, then there is only one particle which could participate in an interaction in the time travel region, so that the single particle would have to interact with its earlier (or later) self. But there is no telling what might come out of L−: the only requirement is that whatever comes out must match what goes in at L+. So if you go to the trouble of constructing a working time machine, you should be prepared for a different kind of disappointment when you attempt to go back and kill yourself: you may be prevented from entering the machine in the first place by some completely unpredictable entity which emerges from it. And once again a peculiar sort of indeterminism appears: if there are many self-consistent things which could prevent you from entering, there is no telling which is even likely to materialize. So when the freedom to put data on L− outweighs the constraint that the same data go into L+, instead of paradox we get an embarrassment of riches: many solutions consistent with the data on S. To see a case where the constraint “outweighs” the freedom, we need to construct a very particular, and frankly artificial, dynamics and topology. Consider the space of all linear dynamics for a scalar field on a lattice. (The lattice can be thought of as a simple discrete space-time.) We will depict the space-time lattice as a directed graph. There is to be a scalar field defined at every node of the graph, whose value at a given node depends linearly on the values of the field at nodes which have arrows leading to it. Each edge of the graph can be assigned a weighting factor which determines how much the field at the input node contributes to the field at the output node. If we name the nodes by the letters a, b, c, etc., and the edges by their endpoints in the obvious way, then we can label the weighting factors by the edges they are associated with in an equally obvious way. Suppose that the graph of the space-time lattice is acyclic, as in figure 7. (A graph is acyclic if one cannot travel in the direction of the arrows and go in a loop.)

Figure 7: An Acyclic Lattice

It is easy to regard a set of nodes as the analog of a Cauchy surface, e.g., the set {a, b, c}, and it is obvious that if arbitrary data are put on those nodes, the data will generate a unique solution in the future.[2] If the value of the field at node a is 3 and at node b is 7, then its value at node d will be 3Wad and its value at node e will be 3Wae + 7Wbe. By varying the weighting factors we can adjust the dynamics, but in an acyclic graph the future evolution of the field will always be unique. Let us now again artificially alter the topology of the lattice to admit CTCs, so that the graph now is cyclic. One of the simplest such graphs is depicted in figure 8: there are now paths which lead from z back to itself, e.g., z to y to z.

Figure 8: Time Travel on a Lattice

Can we now put arbitrary data on v and w, and continue that data to a global solution? Will the solution be unique? In the generic case, there will be a solution and the solution will be unique. The equations for the value of the field at x, y, and z are:

x = vWvx + zWzx
y = wWwy + zWzy
z = xWxz + yWyz.

Solving these equations for z yields

z = (vWvx + zWzx)Wxz + (wWwy + zWzy)Wyz,

or

z = (vWvxWxz + wWwyWyz)/(1 − WzxWxz − WzyWyz),

which gives a unique value for z in the generic case.
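This little calculation is easy to play with directly. The following sketch (our code, with made-up weighting factors) solves the cyclic-lattice equations and flags the singular choice of weights discussed next:

```python
import numpy as np

def solve_lattice(v, w, Wvx, Wwy, Wzx, Wzy, Wxz, Wyz):
    """Solve x = v*Wvx + z*Wzx, y = w*Wwy + z*Wzy, z = x*Wxz + y*Wyz
    for the field values at x, y, z given boundary data v, w."""
    denom = 1.0 - Wzx * Wxz - Wzy * Wyz
    if np.isclose(denom, 0.0):
        raise ValueError("singular dynamics: data at v and w are constrained")
    z = (v * Wvx * Wxz + w * Wwy * Wyz) / denom
    x = v * Wvx + z * Wzx
    y = w * Wwy + z * Wzy
    return x, y, z

# Generic (made-up) weights: arbitrary data at v, w extend uniquely.
print(solve_lattice(v=3.0, w=7.0, Wvx=1.0, Wwy=1.0,
                    Wzx=0.5, Wzy=0.5, Wxz=0.5, Wyz=0.5))

# Fine-tuned weights with Wzx*Wxz + Wzy*Wyz = 1: no solution exists
# for generic data -- the constraint case discussed in the text.
try:
    solve_lattice(v=3.0, w=7.0, Wvx=1.0, Wwy=1.0,
                  Wzx=1.0, Wzy=1.0, Wxz=0.5, Wyz=0.5)
except ValueError as err:
    print(err)
```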
But looking at the space of all possible dynamics for this lattice (i.e., the space of all possible weighting factors), we find a singularity in the case where 1 − WzxWxz − WzyWyz = 0. If we choose weighting factors in just this way, then arbitrary data at v and w cannot be continued to a global solution. Indeed, if the scalar field is everywhere non-negative, then this particular choice of dynamics puts ironclad constraints on the value of the field at v and w: the field there must be zero (assuming Wvx and Wwy to be non-zero), and similarly all nodes in their past must have field value zero. If the field can take negative values, then the values at v and w must be so chosen that vWvxWxz = −wWwyWyz. In either case, the field values at v and w are severely constrained by the existence of the CTC region, even though these nodes lie completely to the past of that region. It is this sort of constraint which we find to be unlike anything which appears in standard physics. Our toy models suggest three things. The first is that it may be impossible to prove in complete generality that arbitrary data on a partial Cauchy surface can always be continued to a global solution: our artificial case provides an example where it cannot. The second is that such odd constraints are not likely to be generic: we had to delicately fine-tune the dynamics to get a problem. The third is that the opposite problem, namely data on a partial Cauchy surface being consistent with many different global solutions, is likely to be generic: we did not have to do any fine-tuning to get this result. And this leads to a peculiar sort of indeterminism: the entire state on S does not determine what will happen in the future, even though the local dynamics is deterministic and there are no other “edges” to space-time from which data could influence the result. What happens in the time travel region is constrained but not determined by what happens on S, and the dynamics does not even supply any probabilities for the various possibilities. The example of the photographic negative discussed in section 3, then, seems likely to be unusual, for in that case there is a unique fixed point for the dynamics, and the set-up plus the dynamical laws determine the outcome. In the generic case one would rather expect multiple fixed points, with no room for anything to influence, even probabilistically, which one would be realized. It is ironic that time travel should lead generically not to contradictions or to constraints (in the normal region) but to underdetermination of what happens in the time travel region by what happens everywhere else (an underdetermination tied neither to a probabilistic dynamics nor to a free edge to space-time). The traditional objection to time travel is that it leads to contradictions: there is no consistent way to complete an arbitrarily constructed story about how the time traveler intends to act. Instead, though, it appears that the problem is underdetermination: the story can be consistently completed in many different ways.

6. Remarks and Limitations on the Toy Models

The two toy models presented above have the virtue of being mathematically tractable, but they involve certain simplifications and potential problems that lead to trouble if one tries to make them more complicated. Working through these difficulties will help highlight the conditions we have made use of. Consider a slight modification of the first simple model, proposed to us by Adam Elga.
Let the particles have an electric charge, which produces forces according to Coulomb's law, and set up a situation like that depicted in figure 9.

Figure 9: Set-up for Elga's Paradox

The dotted line indicates the path the particle will follow if no forces act upon it. The point labeled P is the left edge of the time-travel region; the two labels are a reminder that the point at the bottom and the point at the top are one and the same. Elga's paradox is as follows: if no force acts on the particle, then it will enter the time-travel region. But if it enters the time-travel region, and hence reappears along the bottom edge, then its later self will interact electrically with its earlier self, and the earlier self will be deflected away from the time-travel region. It is easy to set up the case so that the deflection will be enough to keep the particle from ever entering the time-travel region in the first place. (For instance, let the momentum of the incoming particle towards the time travel region be very small. The mere existence of an identically charged particle inside the time travel region will then be sufficient to deflect the incoming particle so that it never reaches L+.) But, of course, if the particle never enters the region at all, then it will not be there to deflect itself… One might suspect that some complicated collection of charged particles in the time-travel region can save the day, as it did with our mirror-reflection problem above. But (unless there are infinitely many such particles) this can't work, as conservation of particle number and of linear momentum show. Suppose that some finite collection of particles emerges from L− and supplies the repulsive electric force needed to deflect the incoming particle. Then exactly the same collection of particles must be “absorbed” at L+. So at all times after L+, the only particle in the world is the incoming particle, which has now been deflected away from its original trajectory. The deflection, though, means that the linear momentum of the particle has changed from what it was before L−. But that is impossible, by conservation of linear momentum. No matter how the incoming particle interacts with particles in the time-travel region, or how those particles interact with each other, total linear momentum is conserved by the interaction. And whatever net linear momentum the time-travelling particles have when they emerge from L−, that much linear momentum must be absorbed at L+. So the momentum of the incoming particle can't be changed by the interaction: the particle can't have been deflected.
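The bookkeeping behind this argument fits in one line (our notation, not the original's): if the n time-travelling particles emerge from L− carrying momenta p_i, and consistency demands that the very same collection in the very same state be absorbed at L+, then

\[ p_{\mathrm{in}}^{\mathrm{after}} \;=\; p_{\mathrm{in}}^{\mathrm{before}} + \sum_{i=1}^{n} p_i^{L^-} - \sum_{i=1}^{n} p_i^{L^+} \;=\; p_{\mathrm{in}}^{\mathrm{before}}, \]

since whatever is emitted at L− is exactly what is absorbed at L+, so the two sums cancel.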
(One could imagine trying to create a sort of “S” curve in the trajectory of the incoming particle, first bending to the left and then to the right, which leaves its final momentum equal to its initial momentum but moves it over in space so that it misses L+. However, if the force at issue is repulsive, then the bending back to the right can't be done. In the mirror example above, the path of the incoming particle can be changed without violating the conservation of momentum because at the end of the process momentum has been transferred to the mirror.) How does Elga's example escape our analysis? Why can't a continuity principle guarantee the existence of a solution here? The continuity assumption breaks down because of two features of the example: the concentration of the electric charge on a point particle, and the way we have treated (or, more accurately, failed to treat) the point P, the edge of L+ (and L−). We have assumed that a point particle either hits L+, and then emerges from L−, or else misses L+ and sails on into the region of space-time above it. This means that there are only two possibilities for the charge on the incoming particle: either it is transported whole back in time or it avoids time travel altogether. Let's see how it alters the situation to imagine the charge itself to be continuously divisible. Suppose that, instead of being concentrated at a point, the incoming object is a little stick, with electric charge distributed evenly across it (figure 10).

Figure 10: Elga's Paradox with a Charged Bar

Once again, we set things up so that if there are no forces on the bar, it will be completely absorbed at L+. But we now postulate that if the bar should hit the point P, it will fracture: the part of it that hits L+ will be sent back in time and the rest will continue on above L+. So continuity of a sort is restored: we no longer have just the two possibilities of the whole charge being sent back or none of it; we have a continuum of degrees of charge in between. It is not hard to see that the restoration of continuity restores the existence of a consistent solution. If no charge is sent back through time, then the bar is not deflected and all of it hits L+ (and hence is sent back through time). If all the charge is sent back through time, then the incoming bar is deflected to an extent that it misses L+ completely, and so no charge is sent back. But if just the right amount of charge is sent back through time, then the bar will be only partially deflected, deflected so that it hits the edge point P, and is split into a bit that goes back and a bit that does not, with the bit that goes back being just the right amount of charge to produce just that deflection (figure 11).

Figure 11: Solution to Elga's Paradox with a Charged Bar

Our problem about conservation of momentum is also solved: the piece of the bar that does not time travel has lower momentum to the right at the end than it had initially, but the piece that does time travel has a higher momentum (due to the Coulomb forces), and everything balances out. Is it cheating to model the charged particle as a bar that can fracture? What if we insist that the particle is truly a point particle, and hence that its time travel is an all-or-nothing affair? In that case, we now have to worry about a question we have not yet confronted: what happens if our point particle hits exactly at the point P on the diagram? Does it time-travel or not? Confronting this question requires us to face up to a feature of the rather cheap way we implemented time travel in our toy models by cut-and-paste. The way we rejiggered the space-time structure had a rather severe consequence: the resulting space-time is no longer a manifold; the topological structure at the point P is different from the topological structure elsewhere. Mathematical physicists simply don't deal with such structures: the usual procedure is to eliminate the offending point from the space-time and thus restore the manifold structure. In this case, that would leave a bare singularity at point P, an open edge to space-time into which anything could disappear and out of which, for all the physics tells us, anything could emerge. In particular, if we insist that our particle is a point particle, then if its trajectory should happen to intersect P it will simply disappear. What could cause the extremely fortuitous result that the trajectory strikes precisely at P?
The emergence of some other charged particle, with just the right charge and trajectory, from P (on L−). And we are no longer bound by any conservation laws: the bare singularity can both swallow and produce whatever mass or charge or momentum we like. So if we insist on point particles, then we have to take account of the singularity, and that again saves the day. Consideration of these (slightly more complicated) toy models does not replace the proving of theorems, of course. But they do serve to illustrate the sorts of considerations that necessarily come into play when trying to spell out the physics of time travel in full detail. Let us now discuss some results regarding slightly more realistic models that have been discussed in the physics literature.

7. Slightly More Realistic Models of Time Travel

Echeverria, Klinkhammer, and Thorne (1991) considered the case of a single hard spherical ball in three dimensions that can go through a single time travel wormhole so as to collide with its younger self.

Figure 12

The threat of paradox in this case arises in the following form. There are initial trajectories (starting in the non-time-travel region of space-time) for the ball such that, if such a trajectory is continued into the time travel region on the assumption that the ball does not undergo a collision prior to entering mouth 1 of the wormhole, the ball will exit mouth 2 so as to collide with its earlier self prior to its entry into mouth 1, in such a way as to prevent its earlier self from entering mouth 1. Thus it seems that the ball will enter mouth 1 if and only if it does not enter mouth 1. Of course, the Wheeler-Feynman strategy is to look for a ‘glancing blow’ solution: a collision which will produce exactly the (small) deviation in trajectory of the earlier ball that produces exactly that collision. Are there always such solutions?[3] Echeverria, Klinkhammer, and Thorne found a large class of initial trajectories that have consistent ‘glancing blow’ continuations, and found none that do not (though their search was not completely general). They did not produce a rigorous proof that every initial trajectory has a consistent continuation, but they made it seem very plausible. That is to say, they have made it very plausible that, in the billiard ball wormhole case, the time travel structure of such a wormhole space-time does not result in constraints on states on spacelike surfaces in the non-time-travel region. In fact, as one might expect from our discussion in the previous section, they found the opposite problem from that of inconsistency: they found underdetermination. For a large class of initial trajectories there are multiple different consistent ‘glancing blow’ continuations of that trajectory (many of which involve multiple wormhole traversals). For example, if one initially has a ball that is traveling on a trajectory aimed straight between the two mouths, then one obvious solution is that the ball passes between the two mouths and never time travels. But another solution is that the younger ball gets knocked into mouth 1 exactly so as to come out of mouth 2 and produce that collision. Echeverria et al. do not note the possibility (which we pointed out in the previous section) of the existence of additional balls in the time travel region. We conjecture (but have no proof) that for every initial trajectory of the ball there are some, and generically many, multiple-ball continuations.
Friedman et al. (1990) examined the case of source-free, non-self-interacting scalar fields traveling through such a time travel wormhole and found that no constraints on initial conditions in the non-time-travel region are imposed by the existence of such time travel wormholes. In general there appear to be no known counterexamples to the claim that in ‘somewhat realistic’ time-travel space-times with a partial Cauchy surface there are no constraints imposed on the state on such a partial Cauchy surface by the existence of CTCs. (See, e.g., Friedman and Morris 1991, Thorne 1994, and Earman 1995; in the Other Internet Resources, see Earman, Smeenk, and Wüthrich 2003.) How about the issue of constraints in the time travel region T? Prima facie, constraints in such a region would not appear to be surprising. But one might still expect that there should be no constraints on states on a spacelike surface, provided one keeps the surface ‘small enough’. In the physics literature the following question has been asked: for any point p in T, and any spacelike surface S that includes p, is there a neighborhood E of p in S such that any solution on E can be extended to a solution on the whole space-time? With respect to this question, there are some simple models in which one has this kind of extendibility of local solutions to global ones, and some simple models in which one does not, with no clear general pattern. The technical mathematical problems are amplified by the more conceptual problem of what it might mean to say that one could create a situation which forces the creation of closed timelike curves. (See, e.g., Yurtsever 1990, Friedman et al. 1990, Novikov 1992, Earman 1995, and Earman, Smeenk, and Wüthrich 2009; in the Other Internet Resources, see Earman, Smeenk, and Wüthrich 2003.) What are we to think of all of this?

8. Even If There Are Constraints, So What?

Since it is not obvious that one can rid oneself of all constraints in realistic models, let us examine the argument that time travel is implausible, and that we should think it unlikely to exist in our world, insofar as it implies such constraints. The argument goes something like the following. In order to satisfy such constraints one needs some pre-established divine harmony between the global (time travel) structure of space-time and the distribution of particles and fields on spacelike surfaces in it. But it is not plausible that the actual world, or any world even remotely like ours, is constructed with divine harmony as part of the plan. In fact, one might argue, we have empirical evidence that conditions in any spatial region can vary quite arbitrarily. So we have evidence that such constraints, whatever they are, do not in fact exist in our world. So we have evidence that there are no closed timelike lines in our world or in any world remotely like it. We will now examine this argument in more detail by presenting four possible responses, with counterresponses. Response 1. There is nothing implausible or new about such constraints. For instance, if the universe is spatially closed, there has to be enough matter to produce the needed curvature, and this puts constraints on the matter distribution on a spacelike hypersurface. Thus global space-time structure can quite unproblematically constrain matter distributions on spacelike hypersurfaces in it. Moreover, we have no realistic idea what these constraints look like, so we can hardly be said to have evidence that they do not obtain.
Counterresponse 1. Of course there are constraining relations between the global structure of space-time and the matter in it: the Einstein equations relate the curvature of the manifold to the matter distribution in it. But what is so strange and implausible about the constraints imposed by the existence of closed timelike curves is that these constraints in essence have nothing to do with the Einstein equations. When investigating such constraints one typically treats the particles and/or fields in question as test particles and/or fields in a given space-time, i.e., they are assumed not to affect the metric of space-time in any way. In typical space-times without closed timelike curves this means that one has, in essence, complete freedom of matter distribution on a spacelike hypersurface. (See response 2 for more discussion of this issue.) The constraints imposed by the possibility of time travel have a quite different origin and are implausible. In the ordinary case there is a causal interaction between matter and space-time that results in relations between the global structure of space-time and the matter distribution in it. In the time travel case there is no such causal story to be told: there simply has to be some pre-established harmony between the global space-time structure and the matter distribution on some spacelike surfaces. This is implausible. Response 2. Constraints upon matter distributions are nothing new. For instance, Maxwell's equations constrain the electric field E on an initial surface to be related to the (simultaneous) charge density distribution ρ by the equation ρ = div(E). (If we assume that the E field is generated solely by the charge distribution, this condition amounts to requiring that the E field at any point in space simply be the one generated by the charge distribution according to Coulomb's inverse square law of electrostatics.) This is not implausible divine harmony. Such constraints can hold as a matter of physical law. Moreover, if we had inferred from the apparent free variation of conditions on spatial regions that there could be no such constraints, we would have mistakenly inferred that ρ = div(E) could not be a law of nature.
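The local character of this kind of constraint is easy to exhibit numerically. Here is a small sketch (our code, with an invented field chosen purely for illustration) that checks ρ = div(E) point by point on a grid:

```python
import numpy as np

# Check the Gauss-law constraint rho = div(E) pointwise on a 2-D grid.
# The field below is invented for illustration: E = (x, y) has
# divergence 2 everywhere, so it satisfies the constraint only for
# the uniform charge density rho = 2.
n, h = 64, 1.0 / 64
x, y = np.meshgrid(np.arange(n) * h, np.arange(n) * h, indexing="ij")
Ex, Ey = x, y                     # illustrative electric field
rho = 2.0 * np.ones((n, n))       # the matching charge density

div_E = np.gradient(Ex, h, axis=0) + np.gradient(Ey, h, axis=1)
print(np.allclose(div_E, rho))    # True: the constraint holds locally,
                                  # checkable cell by cell, with no
                                  # reference to global structure
```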
Counterresponse 2. The constraints imposed by the existence of closed timelike lines are of quite a different character from the constraint imposed by ρ = div(E). The constraints imposed by ρ = div(E) on the state on a spacelike hypersurface are (i) local constraints (i.e., to check whether the constraint holds in a region you just need to see whether it holds at each point in the region), (ii) quite independent of the global space-time structure, (iii) quite independent of how the spacelike surface in question is embedded in a given space-time, and (iv) very simply and generally stateable. The consistency constraints imposed by the existence of closed timelike curves, on the other hand, (i) are not local, (ii) are dependent on the global structure of space-time, (iii) depend on the location of the spacelike surface in question in a given space-time, and (iv) appear not to be simply stateable other than as the demand that the state on that spacelike surface, embedded in such and such a way in a given space-time, not lead to inconsistency. On some views of laws (e.g., David Lewis's view) this plausibly implies that such constraints, even if they hold, could not possibly be laws. But even if one does not accept such a view of laws, one could claim that the bizarre features of such constraints imply that it is implausible that such constraints hold in our world or in any world remotely like ours. Response 3. It would be strange if there were constraints in the non-time-travel region. It is not strange if there are constraints in the time travel region; they should be explained in terms of the strange, self-interactive character of time travel regions. In such a region there are timelike trajectories from points to themselves. Thus the state at such a point will, in a sense, interact with itself. It is a well-known fact that systems that interact with themselves will develop into an equilibrium state, if there is such an equilibrium state, or else will develop towards some singularity. Normally, of course, self-interaction isn't true instantaneous self-interaction, but consists of a feedback mechanism that takes time. But in time travel regions something like true instantaneous self-interaction occurs. This explains why constraints on states occur in such time travel regions: the states ‘ab initio’ have to be ‘equilibrium states’. Indeed, in a way this also provides some picture of why indeterminism occurs in time travel regions: at the onset of self-interaction, states can fork into different equipossible equilibrium states. Counterresponse 3. This is explanation by woolly analogy. It all goes to show that time travel leads to such bizarre consequences that it is unlikely that it occurs in a world remotely like ours. Response 4. All of the previous discussion completely misses the point. So far we have been taking the space-time structure as given, and have asked whether a given time travel space-time structure imposes constraints on states on (parts of) spacelike surfaces. However, space-time and matter interact. Suppose that one is in a space-time with closed timelike lines, such that certain counterfactual distributions of matter on some neighborhood of a point p are ruled out if one holds that space-time structure fixed. One might then ask: “Why does the actual state near p in fact satisfy these constraints? By what divine luck or plan is this local state compatible with the global space-time structure? What if conditions near p had been slightly different?” And one might take it that the lack of normal answers to these questions indicates that it is very implausible that our world, or any world remotely like it, is such a time travel universe. However, the proper response to these questions is the following. There are no constraints in any significant sense. If they hold, they hold as a matter of accidental fact, not of law. There is no more explanation of them possible than there is of any contingent fact. Had conditions in a neighborhood of p been otherwise, the global structure of space-time would have been different. So what? The only question relevant to the issue of constraints is whether an arbitrary state on an arbitrary spatial surface S can always be embedded into a space-time such that that state on S consistently extends to a solution on the entire space-time. But we know the answer to that question.
A well-known theorem in general relativity says the following: any initial data set on a three-dimensional manifold S with positive definite metric has a unique embedding into a maximal space-time in which S is a Cauchy surface (see, e.g., Geroch and Horowitz 1979, p. 284 for more detail); i.e., there is a unique largest space-time which has S as a Cauchy surface and contains a consistent evolution of the initial value data on S. Now, since S is a Cauchy surface, this space-time does not have closed timelike curves. But it may have extensions (in which S is not a Cauchy surface) which include closed timelike curves; indeed it may be that any maximal extension of it would include closed timelike curves. (This appears to be the case for extensions of states on certain surfaces of Taub-NUT space-times. See Earman, Smeenk, and Wüthrich 2003 in the Other Internet Resources.) But these extensions, of course, will be consistent. So, properly speaking, there are no constraints on states on spacelike surfaces. Nonetheless the space-time in which these are embedded may or may not include closed timelike curves. Counterresponse 4. This, in essence, is the stonewalling answer which we indicated at the beginning of section 2. However, whether or not you call the constraints imposed by a given space-time on distributions of matter on certain spacelike surfaces ‘genuine constraints’, whether or not they can be considered lawlike, and whether or not they need to be explained, the existence of such constraints can still be used to argue that time travel worlds are so bizarre that it is implausible that our world, or any world remotely like ours, is a time travel world. Suppose that one is in a time travel world. Suppose that, given the global space-time structure of this world, there are constraints imposed upon, say, the state of motion of a ball on some spacelike surface when it is treated as a test particle, i.e., when it is assumed that the ball does not affect the metric properties of the space-time it is in. (There is lots of other matter that, via the Einstein equation, corresponds exactly to the curvature that there is everywhere in this time travel world.) Now a real ball of course does have some effect on the metric of the space-time it is in. But let us consider a ball that is so small that its effect on the metric is negligible. Presumably it will still be the case that certain states of this ball on that spacelike surface are not compatible with the global time travel structure of this universe. This means that the actual distribution of matter on such a spacelike surface can be extended into a space-time with closed timelike lines, but that certain counterfactual distributions of matter on this spacelike surface cannot be extended into the same space-time. But note that the changes made in the matter distribution (when going from the actual to the counterfactual distribution) do not in any non-negligible way affect the metric properties of the space-time. Thus the reason why the global time travel properties of the counterfactual space-time have to be significantly different from those of the actual space-time is not that there are problems with metric singularities or alterations in the metric that force significant global changes when we go to the counterfactual matter distribution. The reason the counterfactual space-time has to be different is that in the counterfactual world the ball's initial state of motion starting on the spacelike surface could not ‘meet up’ in a consistent way with its earlier self (could not be consistently extended) if we were to let the global structure of the counterfactual space-time be the same as that of the actual space-time.
Now, it is not bizarre or implausible that there is a counterfactual dependence of manifold structure, even of its topology, on matter distributions on spacelike surfaces. For instance, certain matter distributions may lead to singularities, others may not. We may indeed in some sense have causal power over the topology of the space-time we live in. But this power normally comes via the Einstein equations. It is bizarre to think that there could be a counterfactual dependence of global space-time structure on the arrangement of certain tiny bits of matter on some spacelike surface, where changes in that arrangement by assumption do not affect the metric anywhere in space-time in any significant way. It is implausible that we live in such a world, or that a world even remotely like ours is like that. Let us illustrate this argument in a different way by assuming that wormhole time travel imposes constraints upon the states of people prior to such time travel, where the people have so little mass/energy that they have negligible effect, via the Einstein equation, on the local metric properties of space-time. Do you think it more plausible that we live in a world where wormhole time travel occurs, but only when people's states are such that these local states happen to combine with time travel in such a way that nobody ever succeeds in killing their younger self, or do you think it more plausible that we are not in a wormhole time travel world?[4]

9. Quantum Mechanics to the Rescue?

There has been a particularly clear treatment of time travel in the context of quantum mechanics by David Deutsch (see Deutsch 1991, and Deutsch and Lockwood 1994) in which it is claimed that quantum mechanical considerations show that time travel never imposes any constraints on the pre-time-travel state of systems. The essence of this account is as follows. A quantum system starts in state S1, interacts with its older self, is in state S2 after the interaction, time travels while developing into state S3, then interacts with its younger self, and ends in state S4 (see figure 13).

Figure 13

Deutsch assumes that the possible states of this system are the mixed states, i.e., that they are represented by the density matrices over the Hilbert space of that system. Deutsch then shows that for any initial state S1, any unitary interaction between the older and younger self, and any unitary development during time travel, there is a consistent solution, i.e., there is at least one pair of states S2 and S3 such that when S1 interacts with S3 it will change to state S2, and S2 will then develop into S3. The states S2, S3, and S4 will typically not be pure states, i.e., they will be non-trivial mixed states, even if S1 is pure. In order to understand how this leads to interpretational problems, let us give an example. Consider a system that has a two-dimensional Hilbert space with the states |+⟩ and |−⟩ as a basis. Let us suppose that when state |+⟩ of the young system encounters state |+⟩ of the older system, they interact and the young system develops into state |−⟩ while the old system remains in state |+⟩. In obvious notation: |+⟩1|+⟩3 develops into |−⟩2|+⟩4. Similarly, suppose that |+⟩1|−⟩3 develops into |+⟩2|−⟩4, that |−⟩1|+⟩3 develops into |+⟩2|+⟩4, and that |−⟩1|−⟩3 develops into |−⟩2|−⟩4. Let us furthermore assume that there is no development of the state of the system during time travel, i.e., that |+⟩2 develops into |+⟩3 and that |−⟩2 develops into |−⟩3.
Now, if the only possible states of the system were |+⟩ and |−⟩ (i.e., if there were no superpositions or mixtures of these states), then there would be a constraint on initial states: initial state |+⟩1 is impossible. For if |+⟩1 interacts with |+⟩3, then it will develop into |−⟩2, which, during time travel, will develop into |−⟩3, which is inconsistent with the assumed state |+⟩3. Similarly, if |+⟩1 interacts with |−⟩3, it will develop into |+⟩2, which will then develop into |+⟩3, which is also inconsistent. Thus the system cannot start in state |+⟩1. But, says Deutsch, in quantum mechanics such a system can also be in any mixture of the states |+⟩ and |−⟩. Suppose that the older system, prior to the interaction, is in a state S3 which is an equal mixture of 50% |+⟩3 and 50% |−⟩3. Then the younger system during the interaction will develop into a mixture of 50% |−⟩2 and 50% |+⟩2, which will then develop into a mixture of 50% |−⟩3 and 50% |+⟩3, which is consistent! More generally, Deutsch uses a fixed point theorem to show that no matter what the unitary development during interaction is, and no matter what the unitary development during time travel is, for any state S1 there is always a state S3 (which typically is not a pure state) which causes S1 to develop into a state S2 which develops into that state S3. Thus quantum mechanics comes to the rescue: it shows in all generality that no constraints on initial states are needed! One might wonder why Deutsch appeals to mixed states: will superpositions of the states |+⟩ and |−⟩ not suffice? Unfortunately such an idea does not work. Suppose again that the initial state is |+⟩1. One might suggest that if state S3 is 1/√2 |+⟩3 + 1/√2 |−⟩3, one will obtain a consistent development. For one might think that when initial state |+⟩1 encounters the superposition 1/√2 |+⟩3 + 1/√2 |−⟩3, it will develop into the superposition 1/√2 |−⟩2 + 1/√2 |+⟩2, and that this in turn will develop into 1/√2 |−⟩3 + 1/√2 |+⟩3, as desired. However, this is not correct. For initial state |+⟩1, when it encounters 1/√2 |+⟩3 + 1/√2 |−⟩3, will develop into the entangled state 1/√2 |−⟩2|+⟩4 + 1/√2 |+⟩2|−⟩4. In so far as one can speak of the state of the young system after this interaction, it is the mixture of 50% |−⟩2 and 50% |+⟩2, not the superposition 1/√2 |−⟩2 + 1/√2 |+⟩2. So Deutsch does need his recourse to mixed states. This clarification of why Deutsch needs his mixtures does, however, indicate a serious worry about the simplifications that are part of Deutsch's account. After the interaction the old and young systems will (typically) be in an entangled state. Although for purposes of a measurement on one of the two systems one can say that this system is in a mixed state, one cannot represent the full state of the two systems by specifying the mixed state of each separate part, for there are correlations between observables of the two systems that are not represented by these two mixed states, but are represented in the joint entangled state. But if there really is an entangled state of the old and young systems directly after the interaction, how is one to represent the subsequent development of this entangled state? Will the state of the younger system remain entangled with the state of the older system as the younger system time travels and the older system moves on into the future? On what spacelike surfaces are we to imagine this total entangled state to be? At this point it becomes clear that there is no obvious and simple way to extend elementary non-relativistic quantum mechanics to space-times with closed timelike curves.
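Deutsch's consistency condition for this toy interaction can be verified directly with a few lines of linear algebra. The following sketch is ours, not Deutsch's: the basis convention |+⟩ = (1,0), |−⟩ = (0,1), the ordering of tensor factors (young system first), and the function names are all assumptions made for illustration.

```python
import numpy as np

# The toy unitary of the text: it swaps |+,+> and |-,+> and leaves
# |+,-> and |-,-> alone ('young' system is the first tensor factor).
U = np.eye(4)
U[[0, 2]] = U[[2, 0]]

def step(S3, S1):
    """One application of Deutsch's map: interact via U, trace out the
    older system, and apply the (here trivial) time-travel development."""
    joint = U @ np.kron(S1, S3) @ U.conj().T
    # partial trace over the second (older) subsystem
    S2 = np.trace(joint.reshape(2, 2, 2, 2), axis1=1, axis2=3)
    return S2

S1 = np.array([[1.0, 0.0], [0.0, 0.0]])  # pure initial state |+><+|
S3 = np.eye(2) / 2                        # the maximally mixed state
print(np.allclose(step(S3, S1), S3))      # True: a consistent fixed point
print(np.allclose(step(S1, S1), S1))      # False: pure |+>3 is inconsistent,
                                          # the paradox of the text
```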
There have been more sophisticated approaches than Deutsch's to time travel, using technical machinery from quantum field theory and differentiable manifolds (see, e.g., Friedman et al. 1991; Earman, Smeenk, and Wüthrich 2003 in the Other Internet Resources; and references therein). But out of such approaches no results anywhere near as clear and interesting as Deutsch's have been forthcoming. How does Deutsch avoid these complications? Deutsch assumes a mixed state S3 of the older system prior to the interaction with the younger system. He lets it interact with a younger system in an arbitrary pure state S1. After this interaction there is an entangled state S′ of the two systems. Deutsch computes the mixed state S2 of the younger system which is implied by this entangled state S′. His demand for consistency then is just that this mixed state S2 develops into the mixed state S3. Now it is not at all clear that this is a legitimate way to simplify the problem of time travel in quantum mechanics. But even if we grant him this simplification, there is a problem: how are we to understand these mixtures? If we take an ignorance interpretation of mixtures, we run into trouble. For suppose we assume that in each individual case each older system is either in state |+⟩3 or in state |−⟩3 prior to the interaction. Then we regain our paradox. Deutsch instead recommends the following many-worlds picture of mixtures. Suppose we start with state |+⟩1 in all worlds. In some of the many worlds the older system will be in the |+⟩3 state (let us call them A-worlds), and in some worlds, B-worlds, it will be in the |−⟩3 state. Thus in A-worlds, after the interaction, we will have state |−⟩2, and in B-worlds we will have state |+⟩2. During time travel the |−⟩2 state will remain the same, i.e., turn into state |−⟩3, but the systems in question will travel from A-worlds to B-worlds. Similarly the |+⟩2 states will travel from the B-worlds to the A-worlds, thus preserving consistency. Now, whatever one thinks of the merits of many-worlds interpretations, and of this understanding of them applied to mixtures, in the end one does not obtain genuine time travel in Deutsch's account. The systems in question travel from one time in one world to another time in another world, but no system travels to an earlier time in the same world. (This is so at least in the normal sense of the word ‘world’, the sense one means when, for instance, one says “there was, and will be, only one Elvis Presley in this world.”) Thus, even if it were a reasonable view, it is not quite as interesting as it may have initially seemed.

10. Conclusions

What remains of the killing-your-earlier-self paradox in general relativistic time travel worlds is the fact that in some cases the states on edgeless spacelike surfaces are ‘overconstrained’, so that one has less than the usual freedom in specifying conditions on such a surface, given the time-travel structure, and in some cases such states are ‘underconstrained’, so that states on edgeless spacelike surfaces do not determine what happens elsewhere in the way that they usually do, given the time-travel structure. There can also be mixtures of those two types of cases. The extent to which states are overconstrained and/or underconstrained in realistic models is as yet unclear, though it would be very surprising if neither obtained.
The extant literature has primarily focused on the problem of overconstraint, since overconstraint is often regarded either as a metaphysical obstacle to the possibility of time travel, or as an epistemological obstacle to the plausibility of time travel in our world. While it is true that our world would be quite different from the way we normally think it is if states were overconstrained, underconstraint seems at least as bizarre as overconstraint. Nonetheless, neither directly rules out the possibility of time travel. If time travel entailed contradictions, then the issue would be settled. And indeed, most of the stories employing time travel in popular culture are logically incoherent: one cannot “change” the past to be different from what it was, since the past (like the present and the future) only occurs once. But if the only requirement demanded is logical coherence, then it seems all too easy. A clever author can devise a coherent time-travel scenario in which everything happens just once and in a consistent way. This is just too cheap: logical coherence is a very weak condition, and many things we take to be metaphysically impossible are logically coherent. For example, it involves no logical contradiction to suppose that water is not molecular, but if both chemistry and Kripke are right it is a metaphysical impossibility. We have been interested not in logical possibility but in physical possibility. But even so, our conditions have been relatively weak: we have asked only whether time travel is consistent with the universal validity of certain fundamental physical laws and with the notion that the physical state on a surface prior to the time travel region be unconstrained. It is perfectly possible that the physical laws obey this condition, but still that time travel is not metaphysically possible because of the nature of time itself. Consider an analogy. Aristotle believed that water is homoiomerous and infinitely divisible: any bit of water could be subdivided, in principle, into smaller bits of water. Aristotle's view contains no logical contradiction. It was certainly consistent with Aristotle's conception of water that it be homoiomerous, so this was, for him, a conceptual possibility. But if chemistry is right, Aristotle was wrong both about what water is like and about what is possible for it. Water can't be infinitely divided, even though no logical or conceptual analysis would reveal that. Similarly, even if all of our consistency conditions can be met, it does not follow that time travel is physically possible, only that some specific physical considerations cannot rule it out. The only serious proof of the possibility of time travel would be a demonstration of its actuality. For if we agree that there is no actual time travel in our universe, the supposition that there might have been involves postulating a substantial difference from actuality, a difference unlike in kind from anything we could know if not firsthand. It is unclear to us exactly what the content of ‘possible’ would be if one were to either maintain or deny the possibility of time travel in these circumstances, unless one merely meant that the possibility is not ruled out by some delineated set of constraints. As the example of Aristotle's theory of water shows, conceptual and logical ‘possibility’ do not entail possibility in a full-blooded sense. What exactly such a full-blooded sense would be in the case of time travel, and whether one could have reason to believe it to obtain, remain to us obscure.

🧠 0
❤️ 0
🔥 0
🧩 0
🕳️ 0
Philosophical Concepts

Thermodynamic Asymmetry in Time

1. Thermodynamic Time Asymmetry: A Brief Guide First developed in Sadi Carnot’s Reflections on the Motive Power of Fire 1824, the science of classical thermodynamics is intimately associated with the industrial revolution. Most of the results responsible for the science originated from the practice of engineers trying to improve steam …

1. Thermodynamic Time Asymmetry: A Brief Guide First developed in Sadi Carnot’s Reflections on the Motive Power of Fire 1824, the science of classical thermodynamics is intimately associated with the industrial revolution. Most of the results responsible for the science originated from the practice of engineers trying to improve steam engines. Originating in France and England in the late eighteenth and early nineteenth centuries, the science quickly spread throughout Europe. By the mid-nineteenth century, Rudolf Clausius in Germany and William Thomson (later Lord Kelvin) in England had developed the theory in great detail. Once developed, its scope grew from steam engines and the like to arguably all macroscopic processes. Thermodynamics is a “phenomenal” science. That means that its variables range over macroscopic parameters such as temperature, pressure and volume. These are properties that hold at equilibrium, i.e., when the values of the macroscopic variables remain approximately stable. Whether the microphysics underlying these variables are motive atoms in the void or an imponderable fluid is largely irrelevant to this science. The developers of the theory both prided themselves on this fact and at the same time worried about it. Clausius, for instance, was one of the first to speculate that heat consisted solely of the motion of particles (without an ether), for it made the equivalence of heat with mechanical work less surprising. However, as was common, he kept his “ontological” beliefs separate from his official statement of the principles of thermodynamics because he didn’t wish to (in his words) “taint” the latter with the speculative character of the former.[1] A treatment of thermodynamics naturally begins with the statements it takes to be laws of nature. These laws are founded upon observations of relationships between particular macroscopic parameters and they are justified by the fact they are empirically adequate. No further justification of these laws is to be found—at this stage—from the details of microphysics. Rather, stable, counterfactual-supporting generalizations about macroscopic features are enshrined as law. The typical textbook treatment of thermodynamics describes some basic concepts, states the laws in a more or less rough way and then proceeds to derive the concepts of temperature and entropy and the various thermodynamic equations of state. It is worth remarking, however, that in the last fifty years the subject has been presented with a degree of mathematical rigor not previously achieved. Originating from the early axiomatization by Carathéodory in 1909, the development of “rational thermodynamics” has clarified the concepts and logic of classical thermodynamics to a degree not generally appreciated. There now exist many quite different, mathematically exact approaches to thermodynamics, each starting with different primitive kinds and/or observational regularities as axioms. (For a popular presentation of a recent axiomatization, see Lieb and Yngvason 2000.) In the traditional approach classical thermodynamics has two laws, the First and Second Laws.[2] The First Law expresses the conservation of energy and is founded upon the impossibility of creating a machine that can create energy. The law uses the concept of the internal energy of a system, \(U\), which is a function of the system’s macroscopic variables, e.g., temperature, volume. 
For thermally isolated (adiabatic) systems—think of systems such as coffee in a thermos—the law states that this function, \(U\), is such that the work \(W\) delivered to a system’s surroundings is compensated by a loss of internal energy, i.e., \(dW = -dU\). When James Joule and others showed that mechanical work and heat were interconvertible, consistency with the principle of energy conservation demanded that heat, \(Q\), considered as a different form of energy, be taken into account. For non-isolated systems we extend the law as \(dQ = dU + dW\), where \(dQ\) is the differential of the amount of heat added to the system (in a reversible manner). The conservation of energy tells us nothing about temporally asymmetric behavior. It doesn’t follow from the First Law that interacting systems quickly tend to approach equilibrium, and once achieved, never leave this state. It is perfectly consistent with the First Law that systems in equilibrium leave equilibrium. In particular, no limitations are placed on transforming energy from one form into another, so the Law permits the possibility of machines that remove heat from their environment and turn it into work (a so-called perpetual mobile of the second kind). To rule out such machines, and more generally, to capture the amazingly general temporally asymmetric behavior we find, another law is needed. Although Carnot was the first to state it, the formulations of Kelvin and Clausius are standard: Kelvin’s version is essentially the same as the version arrived at by both Carnot and Planck, whereas Clausius’ version differs from these in a few ways.[3] Clausius’ version transparently rules out anti-thermodynamic behavior such as a hot iron bar extracting heat from a neighboring cold iron bar. The cool bar cannot give up a quantity of heat to the warmer bar (without something else happening). Kelvin’s statement is perhaps less obvious. It originates in an observation about steam engines, namely, that heat energy is a “poor” grade of energy. Consider a gas-filled cylinder with a frictionless piston holding the gas down at one end. If we put a flame under the cylinder, the gas will expand and the piston can perform work, e.g., it might move a ball. However, we can never convert the heat energy straight into work without some other effect occurring. In this case, the gas occupies a larger volume. In 1854, Clausius introduced the notion of the “equivalence value” of a transformation, a concept that is the ancestor of the modern day concept of entropy. Later in 1865 Clausius coined the term “entropy” for a similar concept (the word derives from the Greek word for transformation). The entropy of a state \(A\), \(S(A)\) is defined as the integral \(S(A) = \int^{A}_{O} dQ/T\) over a reversible transformation, where \(O\) is some arbitrary fixed state. For \(A\) to have an entropy, the transformation from \(O\) to \(A\) must be quasi-static, i.e., a succession of equilibrium states. Continuity considerations then imply that the initial and final states \(O\) and \(A\) must also be equilibrium states. Put in the language of entropy, the Second Law states that in a transformation from equilibrium state \(A\) to equilibrium state \(B\), the inequality \(S(B) - S(A)\) is greater than or equal to the \(\int^{A}_{B} dQ/T\). Loosely put, for realistic systems, this implies that in the spontaneous evolution of a thermally closed system the entropy can never decrease and that it attains its maximum value at equilibrium. 
We are invited to think of the Second Law as driving the system to its new, higher entropy equilibrium state. With the Second Law thermodynamics is able to characterize an extraordinary range of phenomena under one simple law. Remarkably, whether they are gases filling their available volumes, iron bars in contact coming to the same temperature, vinegar and oil separating, or milk mixing in your coffee, they all have an observable property in common: their entropy increases. Coupled with the First Law, the Second Law is remarkably powerful. It appears that all classical thermodynamical behavior can be derived from these two simple statements (O. Penrose 1970). The above sketch represents the conventional way of describing thermodynamics and its Second Law. Let me mention a few questions that it raises. First, what is the precise location of the time-asymmetry? Almost all commentators claim that it lay in the Second Law. If Uffink (2001) and Brown and Uffink (2001) are correct, however, then this “static” Second Law does not encode any time asymmetry at all. It is, after all, simply a relation between a few variables at equilibrium. While that may be right, there is no question that thermodynamics, if not its Second Law, makes time-asymmetric claims. The spontaneous movement from non-equilibrium to equilibrium happens and is assumed throughout the field. The only question is whether it must be regarded as a separate assumption (perhaps demanding its own name) or can somehow be derived from existing principles. It’s also worth remarking that many other principles of thermodynamics are time-asymmetric, e.g., the classical heat equation. Second, what is the scope of the Second Law? There are two issues here. First, does it apply to the universe as a whole, so that we can say the universe’s entropy is increasing, or does it only apply to select sub-systems of the universe? (See Uffink 2001 for an interesting historical discussion of this topic.) Many philosophers and physicists have balked at the idea that the universe itself has an entropy. As one might expect, those in the grip of an operationalist philosophy are especially prone to deny that the universe as a whole has an entropy. Second, what sub-systems of the universe does it govern? Are the principles of thermodynamics responsible for generalizations about black holes? The field of black hole thermodynamics assumes it is (see the section on black hole thermodynamics in the entry on singularities and black holes, for discussion and references), although not all are convinced (Dougherty & Callender forthcoming). What about the micro-realm? Third, how are these laws framed in a relativistic universe? They were developed in the nineteenth century with a classical spacetime background in mind. How do we write the theory in a modern formulation? Surprisingly, the issue is as much conceptual as technical. The correct (special) relativistic transformation rules for thermodynamic quantities are controversial. Do Lorentz boosted gases appear hotter or colder in the new inertial frame? Albert Einstein himself answered the question about the gas differently throughout his life! With all the current activity of physicists being focused on the thermodynamics of black holes in general relativity and quantum gravity, it is amusing to note that special relativistic thermodynamics is still a field with many open questions, both physically and philosophically (see Earman 1981 and Liu 1994). 
Fourth, another important question concerns the reduction of thermodynamic concepts such as entropy to their mechanical, or statistical mechanical, basis. As even a cursory glance at statistical mechanics reveals, there are many candidates for the statistical mechanical entropy, each the center of a different program in the foundations of the field. Surprisingly, there is no consensus as to which entropy is best suited to be the reduction basis of the thermodynamic entropy (see, for example, Sklar 1993; Callender 1999; Lavis 2005; Frigg 2008; Robertson forthcoming). Consequently, there is little agreement about what grounds the Second Law in statistical mechanics. Despite the worthiness of all of these issues, this article focuses on two distinct problems associated with the direction of time. 2. The Problem of the Direction of Time I The first “problem of the direction of time” is: what accounts for the time asymmetry of thermodynamics? Thermodynamics is not a fundamental physical science. Hence it must inherit its massive time asymmetry from the microworld. But where? In virtue of what, fundamentally, is thermodynamics time asymmetric? The puzzle is usually said to arise due to fundamental physics being time symmetric, or more precisely, time reversal invariant. (A theory is time reversal invariant, loosely speaking, if its laws don’t care about the direction of time.) No asymmetry in, no asymmetry out; therefore there is a puzzle over where the asymmetry enters. However, even if fundamental physics is time asymmetric one can and should still demand an answer to the question of what accounts for thermodynamics time asymmetry. The answer could be non-trivial because the time asymmetry of fundamental physics may have nothing to do with the time asymmetry of thermodynamics. This situation actually appears to be the case, as weak interactions between quarks and leptons can violate time symmetry yet these violations don’t appear to be responsible for thermodynamic behavior. Historically the problem arose in a wonderful series of debates and arguments between the great physicist Ludwig Boltzmann and some of his contemporaries, notably, Johann Loschmidt, Ernst Zermelo and Edward Culverwell. Boltzmann was one of the founders and most influential developers of the field of statistical mechanics, as well as (later in life) a philosopher. While seeking a mechanical underpinning of the Second Law, he discovered a particularly ingenious explanation for why systems tend toward equilibrium. Ignoring historical details (Brush 1976, Frigg & Werndl 2011, Sklar 1993, Uffink 2006), here is the core idea loosely reconstructed from Boltzmann’s later writings. Consider an isolated gas of \(N\) particles in a box, where \(N\) is large enough to make the system macroscopic \((N \approx 10^{23}+)\). For the sake of familiarity we will work with classical mechanics. We can characterize the gas by the coordinates and momenta \(x_{in}, p_{in}\) of each of its particles and represent the whole system by a point \(X = (q,p)\) in a \(6N\)-dimensional phase space known as \(\Gamma\), where \(q = (q_1 \ldots q_{3N})\) and \(p = (p_1 \ldots p_{3N})\). Boltzmann’s great insight was to see that the thermodynamic entropy arguably “reduced” to the volume in \(\Gamma\) picked out by the macroscopic parameters of the system. The key ingredient is partitioning \(\Gamma\) into compartments, such that all of the microstates \(X\) in a compartment are macroscopically (and thus thermodynamically) indistinguishable. 
To each macrostate \(M\), there corresponds a volume of \(\Gamma\), \(\lvert\Gamma_M\rvert\), whose size will depend on the macrostate in question. For combinatorial reasons, almost all of \(\Gamma\) corresponds to a state of thermal equilibrium. There are simply many more ways to be distributed with uniform temperature and pressure than ways to be distributed with nonuniform temperature and pressure. There is a vast numerical imbalance in \(\Gamma\) between the states in thermal equilibrium and the states in thermal nonequilibrium. We now introduce Boltzmann’s famous formula (up to an additive constant) for what we might call the “Boltzmann entropy” \(S_B\): \[ S_B (M(X)) = k \log \lvert\Gamma_M\rvert \] where \(\lvert\Gamma_M\rvert\) is the volume in \(\Gamma\) associated with the macrostate \(M\), \(X\) is the microstate of the system, and \(k\) is Boltzmann’s constant. \(S_B\) provides a relative measure of the amount of \(\Gamma\) corresponding to each \(M\). Given the mentioned asymmetry in \(\Gamma\), almost all microstates realizing non-equilibrium macrostates are such that their entropy value is overwhelmingly likely to increase with time. When the constraints are released on systems initially confined to small sections of \(\Gamma\), typical systems will evolve into larger compartments. Since the new equilibrium distribution occupies almost all of the newly available phase space, nearly all of the microstates originating in the smaller volume will tend toward equilibrium. Except for those incredibly rare microstates conspiring to stay in small compartments, microstates will evolve in such a way as to have \(S_B\) increase. Substantial questions can be raised about the details of this approach. What justifies, for instance, the standard probability measure on \(\Gamma\)? Nonetheless, the Boltzmannian explanation seems to offer a plausible and powerful framework for understanding why the entropy of systems tends to increase with time. (For further explanation and discussion see Bricmont 1995; Frigg 2008, 2009; Goldstein 2001; Hemmo & Shenker 2012; Klein 1973; Lavis 2005; Lebowitz 1993; Uffink 2006.) Trouble looms over this explanation of time asymmetry (see Brown, Myrvold, & Uffink 2009). Before Boltzmann explained entropy increase as described above, he proposed a now notorious “proof” known as the “\(H\)-theorem” to the effect that entropy must always increase. Loschmidt 1876/1877 and Zermelo 1896 launched objections to the \(H\)-theorem. If we take as premises classical mechanical dynamics, they pointed out, it’s impossible to get any function of the classical state to monotonically increase. Loschmidt focused on the time reversal invariance of the classical dynamics and Zermelo on its recurrence property (roughly, that a bounded system, left to itself, will eventually return arbitrarily close to its initial state, for any given initial state). They were right: time reversal means that for every entropy-increasing solution to the classical equations there is a mirror entropy-decreasing solution; and recurrence means that every solution will at some point have its entropy decrease if we wait long enough. Some time asymmetric ingredient that had not been properly announced had been smuggled into the theorem. The reader can find this story in many textbooks and in many references cited above. An objection in their spirit (specifically, Loschmidt’s) can also be advanced against Boltzmann’s later view sketched above. 
Loosely put, because the classical equations of motion are time reversal invariant, nothing in the original explanation necessarily referred to the direction of time (see Hurley 1986). Although we just stated the Boltzmannian account of entropy increase in terms of entropy increasing into the future, the explanation can be turned around and made for the past temporal direction as well. Given a gas in a box that is in a nonequilibrium state, the vast majority of microstates that are antecedents of the dynamical evolution leading to the present macrostate correspond to a macrostate with higher entropy than the present one. Therefore, not only is it highly likely that typical microstates corresponding to a nonequilibrium state will evolve to higher entropy states, but it is also highly likely that they evolved from higher entropy states. Concisely put, the problem is that given a nonequilibrium state at time \(t_2\), it is overwhelmingly likely that but that due to the reversibility of the dynamics it is also overwhelmingly likely that where \(t_1 \lt t_2 \lt t_3\). However, transitions described by (2) do not seem to occur; or phrased more carefully, not both (1) and (2) occur. However we choose to use the terms “earlier” and “later”, clearly entropy doesn’t increase in both temporal directions. For ease of exposition let us dub (2) the culprit. The traditional problem is not merely that nomologically possible (anti-thermodynamic) behavior does not occur when it could. That is not straightforwardly a problem: all sorts of nomologically allowed processes do not occur. Rather, the problem is that statistical mechanics seems to make a prediction that is falsified, and that is a problem according to anyone’s theory of confirmation. Many solutions to this problem have been proposed. Generally speaking, there are two ways to solve the problem: eliminate transitions of type (2) either with special boundary conditions or with laws of nature. The former method works if we assume that earlier states of the universe are of comparatively low-entropy and that (relatively) later states are not also low-entropy states. There are no high-to-low-entropy processes simply because earlier entropy was very low. Alternatively, the latter method works if we can somehow restrict the domain of physically possible worlds to those admitting only low-to-high transitions. The laws of nature are the straightjacket on what we deem physically possible. Since we need to eliminate transitions of type (2) while keeping those of type (1) (or vice versa), a necessary condition of the laws doing this job is that they be time reversal noninvariant. Our choice of strategy boils down to either assuming temporally asymmetric boundary conditions or of adding (or changing to or restricting to) time reversal noninvariant laws of nature that make entropy increase likely. Many approaches to this problem have thought to avoid this dilemma, but a little analysis of any proposed “third way” arguably proves this to be false. Motivations for restrictions of type (2) transitions originate in both philosophy and in particular physical theories. The rest of this section describes some of the wide range of views found on the issue. 2.1 Past Hypothesis Without proclaiming the laws of nature time asymmetric, there is no way to eliminate as impossible transitions (2) in favor of (1). Nevertheless, appealing to temporally asymmetric boundary conditions allows us to describe a world wherein (1) but not (2) occur. 
A cosmological hypothesis claiming that in the very distant past entropy was much lower will work. Boltzmann, as well as many of this century’s greatest scientists, e.g., Einstein, Richard Feynman, and Erwin Schroedinger, saw that this hypothesis is necessary given our (mostly) time asymmetric laws. (Boltzmann, however, explained this low-entropy condition by treating the observable universe as a natural statistical fluctuation away from equilibrium in a vastly larger universe.) Earlier states do not have higher entropy than present states because we make the cosmological posit that the universe began in an extremely tiny section of its available phase space. Albert (2000) calls this the “Past Hypothesis” and argues that it solves both this problem of the direction of time and also the one to be discussed below. Note that classical mechanics is also compatible with a “Future Hypothesis”: the claim that entropy is very low in the distant future. The restriction to “distant” is needed, for if the near future were of low-entropy, we would not expect the thermodynamic behavior that we see—see Cocke 1967, Price 1996, and Schulman 1997 for discussion of two-time boundary conditions. The Past Hypothesis offers an elegant solution to the problem of the direction of time. However, there are some concerns. First, some find it incredible that (e.g.) gases everywhere for all time should expand through their available volumes due to special initial conditions. The common cause of these events is viewed as itself monstrously unlikely. Expressing this feeling, R. Penrose (1989) estimates that the probability, given the standard measure on phase space, of the universe starting in the requisite state is astronomically small. In response, one may hold that the Past Hypothesis is lawlike. If so, then the probability for this state, if such exists, is one! Even if one doesn’t go down this path, one may have other problems with claiming that the initial condition of the universe needs further explanation. See Callender 2004a,b for such a view and Price 1996, 2004 for the contrary position. Second, another persistent line of criticism might be labeled the “subsystem” worry. It’s consistent with the Past Hypothesis, after all, that none of the subsystems on Earth ever display thermodynamically asymmetric behavior. How exactly does the global entropy increase of the universe imply local entropy increase among the subsystems (which, after all, is what causes us to posit the Second Law in the first place)? See Winsberg 2004 for this objection and Callender 2011a, Frisch 2010, and North 2011 for discussion. Third, what exactly does the Past Hypothesis say in the context of our best and most recent physics? While not denying that temporally asymmetric boundary conditions are needed to solve the problem, Earman (2006) is very critical of the Past Hypothesis, concluding that it isn’t even coherent enough to be false. The main problem Earman sees is that we cannot state the Past Hypothesis in the language of general relativity. Callender (2010, 2011b) and Wallace (2010) discuss the related question of stating the Past Hypothesis when self-gravitation is included. One may also consider the question in the context of quantum theory (see Wallace 2013). 
2.2 Electromagnetism If we place an isolated concentrated homogeneous gas in the middle of a large empty volume, we would expect the particles to spread out in an expanding sphere about the center of the gas, much as waves of radiation spread out from concentrated charge sources. It is therefore tempting to think that there is a relationship between the thermodynamic and electromagnetic arrows of time. In a debate in 1909, Albert Einstein and Walther Ritz apparently disagreed about the nature of this relationship, although the exact points of dispute remain a bit unclear. The common story told is that Ritz took the position that the asymmetry of radiation had to be judged lawlike and that the thermodynamic asymmetry could be derived from this law. Einstein’s position is instead that “irreversibility is exclusively based on reasons of probability” (Ritz and Einstein 1909, English translation from Zeh 1989: 13). It is unclear whether Einstein meant probability plus the right boundary conditions, or simply probability alone. In any case, Ritz is said to believe that the radiation arrow causes the thermodynamic one, whereas Einstein is said to hold something closer to the opposite position. The real story is far more complicated, as Ritz had a particle-based ontology in mind as well as many additional considerations (see Frisch and Pietsch 2016 for subtleties of the actual historical debate). If this common tale is correct—and there is reason to think it isn’t the full story—then it seems that Einstein must be closer to being correct than Ritz. Ritz’ position appears implausible if only because it implies gases composed of neutral particles will not tend to spread out. That aside, Einstein’s position is attractive if we concentrate on the wave asymmetry mentioned above. Using Popper 1956’s famous mechanical wave example as an analogy, throwing a rock into a pond so that waves on the surface spread out into the future requires every bit the conspiracy that is needed for waves to converge on a point in order to eject a rock from the bottom. However, here it does seem clear that one process is favored thermodynamically and the other disfavored once we have a thermodynamic arrow in hand. Given a solution to the thermodynamic arrow, impulses directed toward the center of a pond such as to eject a rock are unlikely, whereas a rock triggering spherical waves diverging from the point of impact are likely. Here the radiation arrow seems plausibly connected to and perhaps even derivable from the thermodynamic arrow. The main interesting difference is that Popper’s time-reversed pond seems approximately attainable whereas anti-thermodynamic processes seem more absolutely forbidden (or at least dramatically harder to engine, requiring a so-called Maxwell Demon). If the wave asymmetry were the only electromagnetic arrow, then the above sketch would plausibly capture the core connection between the thermodynamic and electromagnetic arrows of time. We would have reason to think that whatever causes the thermodynamic arrow also is responsible for the electromagnetic arrow. That may ultimately be correct. However, it’s too early to conclude that, for electromagnetism is chock full of arrows of time besides the wave asymmetry. Maxwell’s equations are well-known to include both “advanced” and “retarded” solutions. 
The retarded solution \[ \phi_{\text{ret}}(r,t) = \int dr' \rho\frac{(r', t- \frac{\lvert r'-r\rvert}{c})}{\lvert r'-r\rvert} \] gives the field amplitude \(\phi_{\text{ret}}\) at \(r,t\) by finding the source density \(r\) at \(r'\) at earlier times. The advanced solution \[ \phi_{\text{adv}}(r,t) = \int dr' \rho\frac{(r', t+ \frac{\lvert r'-r\rvert}{c})}{\lvert r'-r\rvert} \] gives the field amplitude in terms of the source density at \(r'\) at later times. Physicists routinely discard the advanced solutions for reasons of “causality”. It is not so clear thermodynamic considerations are behind this rejection of solutions, an asymmetry made all the harder to see given the freedom electromagnetism has to rewrite retarded fields in terms of advanced fields and outgoing sourceless radiation (and vice versa). Electromagnetism is also said to be allow emissions and not absorptions. Accelerating charges are also damped and not anti-damped by the field. With so many arrows besides the wave asymmetry—emission/absorption, in/out, retarded/advanced, damped/anti-damped—it’s premature to say that the thermodynamic arrow is the one arrow to rule them all. Most agree that the wave asymmetry is ultimately “thermodynamic” but after that matters are contested. For further discussion of these controversial points, see the articles/chapters by Allori 2015; Arntzenius 1994; Atkinson 2006; Earman 2011; Frisch 2000, 2006; Frisch and Pietsch 2016; North 2003; Price 1996, 2006; Rohrlich 2006; and Zeh 1989. 2.3 Cosmology Cosmology presents us with a number of apparently temporally asymmetric mechanisms. The most obvious one is the inexorable expansion of the universe. The spatial scale factor \(a(t)\), which we might conceive roughly as the radius of the universe (it gives the distance between co-moving observers), is increasing. The universe seems to be uniformly expanding relative to our local frame. Since this temporal asymmetry occupies a rather unique status it is natural to wonder whether it might be the “master” arrow. The cosmologist Thomas Gold 1962 proposed just this. Believing that entropy values covary with the size of the universe, Gold asserts that at the maximum radius the thermodynamic arrow will “flip” due to the re-contraction. However, as Richard Tolman 1934 has shown in some detail, a universe filled with non-relativistic particles will not suffer entropy increase due to expansion, nor will an expanding universe uniformly filled with blackbody radiation increase its entropy either. Interestingly, Tolman demonstrated that more realistic universes containing both matter and radiation will change their entropy contents. Coupled with expansion, various processes will contribute to entropy increase, e.g., energy will flow from the “hot” radiation to the “cool” matter. So long as the relaxation time of these processes is larger than the expansion time scale, they should generate entropy. We thus have a purely cosmological method of entropy generation. Others (e.g., Davies 1994) have thought inflation provides a kind of entropy-increasing behavior—again, given the sort of matter content we have in our universe. The inflationary model is an alternative of sorts to the standard big bang model, although by now it is so well entrenched in the cosmology community that it really deserves the tag “standard”. In this scenario, the universe is very early in a quantum state called a “false vacuum”, a state with a very high energy density and negative pressure. 
Gravity acts like Einstein’s cosmological constant, so that it is repulsive rather than attractive. Under this force the universe enters a period of exponential inflation, with geometry resembling de Sitter space. When this period ends any initial inhomogeneities will have been smoothed to insignificance. At this point ordinary stellar evolution begins. Loosely associating gravitational homogeneity with low-entropy and inhomogeneity with higher entropy, inflation is arguably a source of a low entropy “initial” condition. There are other proposed sources of cosmological entropy generation, but these should suffice to give the reader a flavor of the idea. We shall not be concerned with evaluating these scenarios in any detail. Rather, our concern is about how these proposals explain time’s arrow. In particular, how do they square with our earlier claim that the issue boils down to either assuming temporally asymmetric boundary conditions or of adding time reversal non-invariant laws of nature? The answer is not always clear, owing in part to the fact that the separation between laws of nature and boundary conditions is especially slippery in the science of cosmology. Advocates of the cosmological explanation of time’s arrow typically see themselves as explaining the origin of the needed low-entropy cosmological condition. Some explicitly state that special initial conditions are needed for the thermodynamic arrow, but differ with the conventional “statistical” school in deducing the origin of these initial conditions. Earlier low-entropy conditions are not viewed as the boundary conditions of the spacetime. They came about, according to the cosmological schools, about a second or more after the big bang. But when the universe is the size of a small particle, a second or more is enough time for some kind of cosmological mechanism to bring about our low-entropy “initial” condition. What cosmologists (primarily) differ about is the precise nature of this mechanism. Once the mechanism creates the “initial” low-entropy we have the same sort of explanation of the thermodynamic asymmetry as discussed in the previous section. Because the proposed mechanisms are supposed to make the special initial conditions inevitable or at least highly probable, this maneuver seems like the alleged “third way” mentioned above. The central question about this type of explanation, as far as we’re concerned, is this: Is the existence of the low “initial” state a consequence of the laws of nature alone or the laws plus boundary conditions? In other words, first, does the proposed mechanism produce low-entropy states given any initial condition, and second, is it a consequence of the laws alone or a consequence of the laws plus initial conditions? We want to know whether our question has merely been shifted back a step, whether the explanation is a disguised appeal to special initial conditions. Though we cannot here answer the question in general, we can say that the two mechanisms mentioned are not lawlike in nature. Expansion fails on two counts. There are boundary conditions in expanding universes that do not lead to an entropy gradient, i.e., conditions without the right matter-radiation content, and there are boundary conditions that do not lead to expansion in which entropy nonetheless increases, e.g., matter-filled Friedmann models that do not expand. Inflation fails at least on the second count. Despite advertising, arbitrary initial conditions will not give rise to an inflationary period. 
Furthermore, it’s not clear that inflationary periods will give rise to thermodynamic asymmetries (Price 1996: ch. 2). The cosmological scenarios do not seem to make the thermodynamic asymmetries a result of nomic necessity. The cosmological hypotheses may be true, and in some sense, they may even explain the low-entropy initial state. But they do not appear to provide an explanation of the thermodynamic asymmetry that makes it nomologically necessary or even likely. Another way to see the point is to consider the question of whether the thermodynamic arrow would “flip” if (say) the universe started to contract. Gold, as we said above, asserts that at the maximum radius the thermodynamic arrow must “flip” due to the re-contraction. Not positing a thermodynamic flip while maintaining that entropy values covary with the radius of the universe is clearly inconsistent—it is what Price (1996) calls the fallacy of a “temporal double standard”. Gold does not commit this fallacy, and so he claims that the entropy must decrease if ever the universe started to re-contract. However, as Albert writes, there are plainly locations in the phase space of the world from which … the world’s radius will inexorably head up and the world’s entropy will inexorably head down. (2000: 90) Since that is the case, it doesn’t follow from law that the thermodynamic arrow will flip during re-contraction; therefore, without changing the fundamental laws, the Gold mechanism cannot explain the thermodynamic arrow in the sense we want. From these considerations we can understand the basic dilemma that runs throughout Price (1995, 1996): either we explain the earlier low-entropy condition Gold-style or it is inexplicable by time-symmetric physics. Because there is no net asymmetry in a Gold universe, we might paraphrase Price’s conclusion in a more disturbing manner as the claim that the (local) thermodynamic arrow is explicable just in case (globally) there isn’t one. However, notice that this remark leaves open the idea that the laws governing expansion or inflation are not time reversal invariant. (For more on Price’s basic dilemma, see Callender 1998 and Price 1995.) Finally, it’s important to remember that this dilemma and the need for a Past Hypothesis are dependent upon a particular physical set-up. Can we explain the thermodynamic arrow without invoking a Past Hypothesis? Inspired by the idea of eternal spontaneous inflation, Carroll and Chen (2004, Other Internet Resources) describe a model in which new baby universes (or “pocket universes”) are repeatedly born from existing universes. Each birth increases the overall entropy of the multiverse although within each baby universe we have our familiar thermodynamic asymmetry. The crucial assumption in this model – one also found in the gravitational theory of Barbour, Koslowski, and Mercati (2014) – is that entropy is unbound. It can be arbitrarily high. With this assumption and in these models, one can do without a past Hypothesis. For discussion, see Goldstein, Tumulka, & Zanghi 2016 and Lazarovici and Reichert 2020. 2.4 Quantum Cosmology Quantum cosmology, it is often said, is the theory of the universe’s initial conditions. Presumably this entails that its posits are to be regarded as lawlike. Because theories are typically understood as containing a set of laws, quantum cosmologists apparently assume that the distinction between laws and initial conditions is fluid. Particular initial conditions will be said to obtain as a matter of law. 
Hawking writes, for example, we shall not have a complete model of the universe until we can say more about the boundary conditions than that they must be whatever would produce what we observe, (1987: 163). Combining such aspirations with the observation that thermodynamics requires special boundary conditions leads quite naturally to the thought that “the second law becomes a selection principle for the boundary conditions of the universe [for quantum cosmology]” (Laflamme 1994: 358). In other words, if one is to have a theory of initial conditions, it would certainly be desirable to deduce initial conditions that will lead to the thermodynamic arrow. This is precisely what many quantum cosmologists have sought. (This should be contrasted with the arrows of time discussed in semiclassical quantum gravity, for example, the idea that quantum scattering processes in systems with black holes violate the CPT theorem.) Since quantum cosmology is currently very speculative, it might be premature to start worrying about what it says about time’s arrow. Nevertheless, there has been a substantial amount of debate on this issue (see Haliwell et al. 1994). 2.5 Causation Penrose and Percival (1962) propose a general causal principle to handle our problem. The principle states that the effects of interactions happen after those interactions but not before. Similar to Reichenbach’s principle of the common cause, they suggest what they dub the Law of Conditional Independence, namely, that “If A and B are two disjoint 4-regions, and C is any 4-region which divides the union of the pasts of A and B into two parts, one containing A and the other containing B, then A and B are conditionally independent given c. That is, Pr(a&b/c) = Pr(a/c) × Pr(b/c), for all a,b.” (Penrose and Percival 1962, p. 611). Here c is an event that is a common cause that screens off the correlation between events in A and B. In terms of statistical mechanics, this law would have the effect of making the phase space density associated with a system at a time determined by earlier events but not later events. This would more or less directly preclude the temporal "parity of reasoning" motivated transitions assumed in the problem of the direction of time, transition of type (2). To achieve this, the Law of Conditional Independence must be time asymmetric, which it is, and it must be a kind of fundamental principle that restricts the lawlike correlation otherwise allowed. After all, if we assume that the laws of nature are time reversal invariant, then there is no asymmetry between pre- and post-interaction correlations. Price 1996 (chapter 5) and Sklar 1993 hold that this nomic restriction is unwarranted or unexplanatory. There is the sense that the causal asymmetry should come out of more basic physics, not be baked into this physics. Horwich 1987 is an example of someone trying to derive what he calls the fork asymmetry, which is similar to the Law of Conditional Independence, from more basic assumptions. A recent contribution that has some affinities with the Penrose and Percival move can be found in Myrvold 2020. 2.6 Time Itself Some philosophers have sought an answer to the problem of time’s arrow by claiming that time itself is directed. They do not mean time is asymmetric in the sense intended by advocates of the tensed theory of time. Their proposals are firmly rooted in the idea that time and space are properly represented on a four-dimensional manifold. 
The main idea is that the asymmetries in time indicate something about the nature of time itself. Christensen (1993) argues that this is the most economical response to our problem since it posits nothing besides time as the common cause of the asymmetries, and we already believe in time. A proposal similar to Christensen’s is Weingard’s “time-ordering field” (1977). Weingard’s speculative thesis is that spacetime is temporally oriented by a “time potential”, a timelike vector field that at every spacetime point directs a vector into its future light cone. In other words, supposing our spacetime is temporally orientable, Weingard wants to actually orient it. The main virtue of this is that it provides a time sense everywhere, even in spacetimes containing closed timelike curves (so long as they’re temporally orientable). As he shows, any explication of the “earlier than” relation in terms of some other physical relation will have trouble providing a consistent description of time direction in such spacetimes. Another virtue of the idea is that it is in principle capable of explaining all the temporal asymmetries. If coupled to the various asymmetries in time, it would be the “master arrow” responsible for the arrows of interest. As Sklar (1985) notes, Weingard’s proposal makes the past-future asymmetry very much like the up-down asymmetry. As the up-down asymmetry was reduced to the existence of a gravitational potential—and not an asymmetry of space itself—so the past-future asymmetry would reduce to the time potential—and not an asymmetry of time itself. Of course, if one thinks of the gravitational metric field as part of spacetime, there is a sense in which the reduction of the up-down asymmetry really was a reduction to a spacetime asymmetry. And if the metric field is conceived as part of spacetime—which is itself a huge source of contention in philosophy of physics—it is natural to think of Weingard’s time-ordering field as also part of spacetime. Thus his proposal shares a lot in common with Christensen’s suggestion. This sort of proposal has been criticized by Sklar on methodological grounds. Sklar claims that scientists would not accept such an explanation (1985: 111–2). One might point out, however, that many scientists did believe in analogues of the time-ordering field as possible causes of the CP violations.[4] The time-ordering field, if it exists, would be an unseen (except through its effects) common cause of strikingly ubiquitous phenomena. Scientists routinely accept such explanations. To find a problem with the time-ordering field we need not invoke methodological scruples; instead we can simply ask whether it does the job asked of it. Is there a mechanism that will couple the time-ordering field to thermodynamic phenomena? Weingard says the time potential field needs to be suitably coupled (1977: 130) to the non-accidental asymmetric processes, but neither he nor Christensen elaborate on how this is to be accomplished. Until this is addressed satisfactorily, this speculative idea must be considered interesting yet embryonic. For more recent work in this vein, see Maudlin 2002. 2.7 Interventionism When explaining time’s arrow many philosophers and physicists have focused their attention upon the unimpeachable fact that real systems are open systems that are subjected to interactions of various sorts. Thermodynamic systems cannot be truly isolated. To take the most obvious example, we can not shield a system from the influence of gravity. 
At best, we can move systems to locations feeling less and less gravitational force, but we can never completely decouple a system from the gravitational field. Not only do we ignore the weak gravitational force when doing classical thermodynamics, but we also ignore less exotic matters, such as the walls in the standard gas in a box scenario. We can do this because the time it takes for a gas to reach equilibrium with itself is vastly shorter than the time it takes the gas plus walls system to reach equilibrium. For this reason we typically discount the effects of the box walls on the gas. In this approximation many have thought there lies a possible solution to the problem of the direction of time. Indeed, many have thought herein lies a solution that does not change the laws of classical mechanics and does not allow for the nomological possibility of anti-thermodynamic behavior. In other words, advocates of this view seem to believe it embodies a third way. Blatt 1959; Reichenbach 1956; Redhead and Ridderbos 1998, and to some extent, Horwich 1987 are a few works charmed by this idea. The idea is to take advantage of what a random perturbation of the representative phase point would do to the evolution of a system. Given our Boltzmannian setup, there is a tremendous asymmetry in phase space between the volumes of points leading to equilibrium and of points leading away from equilibrium. If the representative point of a system were knocked about randomly, then due to this asymmetry, it would be very probable that the system at any given time be on a trajectory leading toward equilibrium. Thus, if it could be argued that the earlier treatment of the statistical mechanics of ideal systems ignored a random perturber in the environment of the system, then one would seem to have a solution to our problems. Even if the perturbation were weak it would still have the desired effect. The weak “random” previously ignored knocking of the environment is is claimed to be the cause of the approach to equilibrium. Prima facie, this answer to the problem escapes the appeal to special initial conditions and the appeal to new laws. But only prima facie. A number of criticisms have been leveled against this maneuver. One that seems on the mark is the observation that if classical mechanics is to be a universal theory, then the environment must be governed by the laws of classical mechanics as well. The environment is not some mechanism outside the governance of physical law, after all, and when we treat it too, the “deus ex machina”—the random perturber—disappears. If we treat the gas-plus-the-container walls as a classical system, it is still governed by time-reversible laws that will cause the same problem as we met with the gas alone. At this point one sometimes sees the response that this combined system of gas plus walls has a neglected environment too, and so on, and so on, until we get to the entire universe. It is then questioned whether we have a right to expect laws to apply universally (Reichenbach 1956: 81ff). Or the point is made that we cannot write down the Hamiltonian for all the interactions a real system suffers, and so there will always be something “outside” what is governed by the time-reversible Hamiltonian. Both of these points rely, one suspects, on an underlying instrumentalism about the laws of nature. Our problem only arises if we assume or pretend that the world literally is the way the theory says; dropping this assumption naturally “solves” the problem. 
Rather than further address these responses, let us turn to the claim that this maneuver need not modify the laws of classical mechanics. If one does not make the radical proclamation that physical law does not govern the environment, then it is easy to see that whatever law describes the perturber’s behavior, it cannot be the laws of classical mechanics \(if\) the environment is to do the job required of it. A time-reversal noninvariant law, in contrast to the time symmetric laws of classical mechanics, must govern the external perturber. Otherwise we can in principle subject the whole system, environment plus system of interest, to a Loschmidt reversal. The system’s velocities will reverse, as will the velocities of the millions of tiny perturbers. “Miraculously”, as if there were a conspiracy between the reversed system and the millions of “anti-perturbers”, the whole system will return to a time reverse of its original state. What is more, this reversal will be just as likely as the original process if the laws are time reversal invariant. A minimal criterion of adequacy, therefore, is that the random perturbers be time reversal noninvariant. But the laws of classical mechanics are time reversal invariant. Consequently, if this “solution” is to succeed, it must exercise new laws and modify or supplement classical mechanics. (Since the perturbations need to be genuinely random and not merely unpredictable, and since classical mechanics is deterministic, the same sort of argument could be run with indeterminism instead of irreversibility. See Price 2002 for a diagnosis of why people have made this mistake, and also for an argument objecting to interventionism for offering a “redundant” physical mechanism responsible for entropy increase.)[5] 2.8 Quantum Mechanics To the best of our knowledge our world is fundamentally quantum mechanical, not classical mechanical. Does this change the situation? “Maybe” is perhaps the best answer. Not surprisingly, answers to the question are affected by one’s interpretation of quantum mechanics. Quantum mechanics suffers from the notorious measurement problem, a problem which demands one or another interpretation of the quantum formalism. These interpretations fall broadly into two types, depending on their view of the unitary evolution of the quantum state (e.g., evolution according to the Schroedinger equation): they either say that there is something more than the quantum state, or that the unitary evolution is not entirely correct. The former are called “no-collapse” interpretations while the latter are dubbed “collapse” interpretations. This is not the place to go into the details of these interpretations, but we can still sketch the outlines of the picture painted by quantum mechanics (for more see Albert 1992). Modulo some philosophical concerns about the meaning of time reversal (Albert 2000; Earman 2002), the equation governing the unitary evolution of the quantum state is time reversal invariant. For interpretations that add something to quantum mechanics, this typically means that the resulting theory is time reversal invariant too (since it would be odd or even inconsistent to have one part of the theory invariant and the other part not). Since the resulting theory is time reversal invariant, it is possible to generate the problem of the direction of time just as we did with classical mechanics. While many details are altered in the change from classical to no-collapse quantum mechanics, the logical geography seems to remain the same. 
Collapse interpretations are more interesting with respect to our topic. Collapses interrupt or outright replace the unitary evolution of the quantum state. To date, they have always done so in a time reversal noninvariant manner. The resulting theory, therefore, is not time reversal invariant. This fact offers a potential escape from our problem: the transitions of type (2) in our above statement of the problem may not be lawful. And this has led many thinkers throughout the century to believe that collapses somehow explain the thermodynamic time asymmetry. Mostly these postulated methods fail to provide what we want. We think gases relax to equilibrium even when they’re not measured by Bohrian observers or Wignerian conscious beings. This complaint is, admittedly, not independent of more general complaints about the adequacy of these interpretations. But perhaps because of these controversial features they have not been pushed very far in explaining thermodynamics. More satisfactory collapse theories exist, however. One, due to Ghirardi, Rimini, and Weber, commonly known as GRW, can describe collapses in a closed system—no dubious appeal to observers outside the quantum system is required. Albert (1992, 2000) has extensively investigated the impact GRW would have on statistical mechanics and thermodynamics. GRW would ground a temporally asymmetric probabilistic tendency for systems to evolve toward equilibrium. Anti-thermodynamic behavior is not impossible according to this theory. Instead it is tremendously unlikely. The innovation of the theory lies in the fact that although entropy is overwhelmingly likely to increase toward the future, it is not also overwhelmingly likely to increase toward the past (because there are no dynamic backwards transition probabilities provided by the theory). So the theory does not suffer from a problem of the direction of time as stated above. This does not mean, however, that it removes the need for something like the Past Hypothesis. GRW is capable of explaining why, given a present nonequilibrium state, later states should have higher entropy; and it can do this without also implying that earlier states have higher entropy too. But it does not explain how the universe ever got into a nonequilibrium state in the first place. As indicated before, some are not sure what would explain this fact, if anything, or whether it’s something we should even aspire to explain. The principal virtue GRW would bring to the situation, Albert thinks, is that it would solve or bypass various troubles involving the nature of probabilities in statistical mechanics. The same type of benefit, plus arguably others, come from a recent proposal by Chen (forthcoming). Chen suggests that we adopt a position known as density matrix realism to help understand time’s arrow. Instead of regarding the wavefunction as the basic ontology of quantum theory, we take the quantum state to be represented by an impure density matrix. When we express the Past Hypothesis in terms of a density matrix, a number of virtues appear, including greater harmony between the probabilities of statistical mechanics and quantum mechanics. It may be that interpretations of quantum mechanics that are not like GRW can possess some of the same benefits that GRW brings. More detailed discussion of the impact quantum mechanics has on our problem can be found in Albert 2000, North 2002, Price 2002 and Chen forthcoming. 
But if our superficial review is correct, we can say that quantum mechanics will not obviate our need for a Past Hypothesis though it may well solve at least one problem related to the direction of time. 2.9 Lawlike Initial Conditions? Finally, let’s return to a point made in passing about the status of the Past Hypothesis. Without some new physics that eliminates or explains the Past Hypothesis, or some satisfactory “third way”, it seems we are left with a bald posit of special initial conditions. One can question whether there really is anything unsatisfactory about this (Sklar 1993; Callender 2004b). But perhaps we were wrong in the first place to think of the Past Hypothesis as a contingent boundary condition. The question “why these special initial conditions?” would be answered with “it’s physically impossible for them to be otherwise”, which is always a conversation stopper. Indeed, Feynman (1965: 116) speaks this way when explaining the statistical version of the second law. Absent a particular understanding of laws of nature, there is perhaps not much to say about the issue. But given particular conceptions of lawhood, it is clear that various judgments about this issue follow naturally—as we will see momentarily. However, let’s acknowledge that this may be to get matters backwards. It might be said that we first ought to find out whether the boundary conditions are lawlike, and then devise a theory of law appropriate to the answer. To decide whether or not the boundary conditions are lawlike based merely on current philosophical theories of law is to prejudge the issue. Perhaps this objection is really evidence of the feeling that settling the issue based on one’s conception of lawhood seems a bit unsatisfying. It is hard to deny this. Even so, it is illuminating to have a brief look at the relationships between some conceptions of lawhood and the topic of special initial conditions. For discussion and references on laws of nature, please refer to the entry on that topic. For instance, if one agrees with John Stuart Mill that from the laws one should be able to deduce everything and one considers the thermodynamic part of that “everything”, then the special initial condition will be needed for such a deduction. The modern heir of this conception of lawhood, the one associated with Frank Ramsey and David Lewis (see Loewer 1996), sees laws as the axioms of the simplest, most powerful, consistent deductive system possible. It is likely that the specification of a special initial condition would emerge as an axiom in such a system, for such a constraint may well make the laws much more powerful than they otherwise would be. We should not expect the naïve regularity view of laws to follow suit, however. On this sort of account, roughly, if \(B\)s always follow \(A\)s, then it is a law of nature that \(A\) causes \(B\). To avoid finding laws everywhere, however, this account needs to assume that \(A\)s and \(B\)s are instantiated plenty of times. But the initial conditions occur only once. For more robust realist conceptions of law, it’s difficult to predict whether the special initial conditions will emerge as lawlike. Necessitarian accounts like Pargetter’s (1984) maintain that it is a law that \(P\) in our world iff \(P\) obtains at every possible world joined to ours by a nomic accessibility relation. 
Without more specific information about the nature of the accessibility relations and the worlds to which we’re related, one can only guess whether all of the worlds related to ours have the same special initial conditions. Nevertheless, some realist theories offer apparently prohibitive criteria, so they are able to make negative judgments. For instance, “universalist” theories associated with David Armstrong say that laws are relations between universals. Yet a constraint on initial conditions cannot in any natural way be put in this form; hence it would seem the universalist theory would not count this constraint as lawlike. Philosophical opinion is certainly divided. The problem is that a lawlike boundary condition lacks many of the features we ordinarily attribute to laws, e.g., multiple instances, governing temporal evolution, etc., yet different accounts of laws focus on different subsets of these features. When we turn to the issue at hand, what we find is the disagreement we expect.

3. The Problem of the Direction of Time II

Life is filled with temporal asymmetries. This directedness is one of the most general features of the world we inhabit. We can break this general tendency down into a few more specific temporal arrows: among them, the thermodynamic arrow, the arrows of causation, counterfactual dependence, and explanation, the epistemological arrow (we know far more about the past than the future), and the psychological arrow. No such list is meant to be exhaustive or especially clean. Temporal asymmetries are everywhere. We age and die. Punchlines are at the ends of jokes. Propensities and dispositions and reproductive fitness are all future-directed. We prefer rags-to-riches stories to riches-to-rags stories. Obviously there are connections amongst many of these arrows. Some authors have explicitly or implicitly proposed various “dependency charts” that are supposed to explain which of the above arrows depend on which for their existence. Horwich (1987) argues for an explanatory relationship wherein the counterfactual arrow depends on the causal arrow, which depends on the arrow of explanation, which depends on the epistemological arrow. Lewis (1979), by contrast, thinks an alleged over-determination of traces grounds the asymmetry of counterfactuals and that this in turn grounds the rest. Suhler and Callender (2011) ground the psychological arrow on the causal and knowledge asymmetries. The chart one judges most appropriate will depend, to a large degree, upon one’s general philosophical stance on many large topics. Which dependency chart is the correct one is not our concern here. Rather, the second “problem of the direction of time” asks: do any (all?) of these arrows ultimately hold in virtue of the thermodynamic arrow of time (or what grounds it)? Sklar (1985) provides useful examples to have in mind. Consider the up-down asymmetry. It plausibly reduces to the local gravitational gradient. Astronauts on the moon think down is the direction toward the center of the moon, not wherever it was when they left Earth. By contrast, there is (probably) merely a correlation between the left-right asymmetry (say, in snail shells) and parity violations in high-energy particle physics. The second problem asks whether any of the above temporal asymmetries are to the thermodynamic arrow as the up-down asymmetry is to the local gravitational gradient. Of course, we don’t expect anything quite so straightforward. Sklar describes an experiment where iron dust inserted in the ear sacs of fish causes the fish to swim upside down when a magnet is held over the tank, presumably altering their sense of up and down.
But as Jos Uffink remarked to me, going inside a refrigerator doesn’t cause us to remember the future. The connections, if any, are bound to be subtle.

3.1 The Thermodynamic Reduction

Inspired by Boltzmann’s attempts in this regard, many philosophers have sought such reductions, either partial or total. Grünbaum (1973) and Smart (1967) develop entropic accounts of the knowledge asymmetry. Lewis (1979) suspects the asymmetry of traces is linked to the thermodynamic arrow but provides no specifics. Dowe (1992), like a few others, ties the direction of causation to the entropy gradient. And some have also tied the psychological arrow to this gradient (for a discussion see Kroes 1985). Perhaps the most ambitious attempts at grounding many arrows all at once can be found in Reichenbach 1956, Horwich 1987, and Albert 2000, 2015. Each of these books offers possible thermodynamic explanations for the causal and epistemic arrows, as well as many subsidiary arrows. A straightforward reduction of these arrows to entropy is probably not in the cards (Earman 1974; Horwich 1987). Consider the epistemic arrow of time. The traditional entropic account claimed that because we know there are many more entropy-increasing rather than entropy-decreasing systems in the world (or our part of it), we can infer when we see a low-entropy system that it was preceded and caused by an interaction with something outside the system. To take the canonical example, imagine you are walking on the beach and come across a footprint in the sand. You can infer that earlier someone walked by (in contrast to it arising as a random fluctuation). In other words, you infer, due to its high order, that it was caused by something previously also of high (or higher) order, i.e., someone walking. However, the entropic account faces some severe challenges. First, do footprints on beaches have well-defined thermodynamic entropies? To describe the example we switched from low entropy to high order, but the association between entropy and our ordinary concept of order is tenuous at best and usually completely misleading. (To appreciate this, just consider what happens to your salad dressing after it is left undisturbed. Order increases when the oil and vinegar separate, yet entropy increases too.) To describe the range of systems about which we have knowledge, the account needs something broader than the thermodynamic entropy. But what? Reichenbach is forced to move to a notion of quasi-entropy, losing the reduction in the process. Second, the entropic account doesn’t license the inference to a human being walking on the beach. All it tells you is that the grains of sand in the footprint previously interacted with their environment, which barely scratches the surface of our ability to tell detailed stories about what happened in the past. Third, even if we entertain a broader understanding of entropy, it still doesn’t always work. Consider Earman’s (1974) example of a bomb destroying a city. From the destruction we may infer that a bomb went off; yet the bombed city does not have lower entropy than its surroundings or even any type of intuitively higher order than its surroundings.
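The statistical reasoning at work earlier in this entry (entropy increase as overwhelmingly probable rather than exceptionless) can be made vivid with a toy simulation. The sketch below uses the classic Ehrenfest two-box model; it is offered only as an illustration of the statistical version of the second law, not as a rendering of GRW dynamics or of anything in the cited literature, and the model choice and every parameter value are illustrative assumptions of my own.

    import random
    from math import lgamma

    # Ehrenfest two-box toy model: N labelled molecules, and at each step one
    # randomly chosen molecule hops between box A and box B. The Boltzmann
    # entropy of the macrostate "n molecules in box A" is the log of the
    # number of microstates compatible with it, i.e., log C(N, n).

    def log_multiplicity(N, n):
        # log of the binomial coefficient C(N, n)
        return lgamma(N + 1) - lgamma(n + 1) - lgamma(N - n + 1)

    def simulate(N=1000, steps=5000, seed=0):
        rng = random.Random(seed)
        n = N  # start far from equilibrium: every molecule in box A
        history = [n]
        for _ in range(steps):
            # the chosen molecule sits in box A with probability n/N
            if rng.random() < n / N:
                n -= 1  # it hops from A to B
            else:
                n += 1  # it hops from B to A
            history.append(n)
        return history

    history = simulate()
    for t in (0, 100, 1000, 5000):
        print(t, history[t], round(log_multiplicity(1000, history[t]), 1))
    # Entropy climbs rapidly toward its maximum (n near N/2) and thereafter
    # merely fluctuates; small dips occur, but large anti-thermodynamic
    # excursions are tremendously unlikely. Note also that the dynamics are
    # statistically the same run in reverse, which is one way of seeing why
    # a Past Hypothesis (here, the low-entropy starting state fed in by
    # hand) still has to be posited rather than derived.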


The Experience and Perception of Time

1. What is ‘the perception of time’?

The very expression ‘the perception of time’ invites objection. Insofar as time is something different from events, we do not perceive time as such, but changes or events in time. But, arguably, we do not perceive events only, but also their temporal relations. So, just as it is natural to say that we perceive spatial distances and other relations between objects (I see the dragonfly as hovering above the surface of the water), it seems natural to talk of perceiving one event following another (the thunderclap as following the flash of lightning), though even here there is a difficulty. For what we perceive, we perceive as present—as going on right now. Can we perceive a relation between two events without also perceiving the events themselves? If not, then it seems we perceive both events as present, in which case we must perceive them as simultaneous, and so not as successive after all. There is then a paradox in the notion of perceiving an event as occurring after another, though one that perhaps admits of a straightforward solution. When we perceive B as coming after A, we have, surely, ceased to perceive A. In which case, A is merely an item in our memory. Now if we wanted to construe ‘perceive’ narrowly, excluding any element of memory, then we would have to say that we do not, after all, perceive B as following A. But in this article, we shall construe ‘perceive’ more broadly, to include a wide range of experiences of time that essentially involve the senses. In this wide sense, we perceive a variety of temporal aspects of the world. We shall begin by enumerating these, and then consider accounts of how such perception is possible.

2. Kinds of temporal experience

There are a number of what Ernst Pöppel (1978) calls ‘elementary time experiences’, or fundamental aspects of our experience of time. Among these we may list the experience of (i) duration; (ii) non-simultaneity; (iii) order; (iv) past and present; (v) change, including the passage of time. It might be thought that experience of non-simultaneity is the same as experience of time order, but it appears that, when two events occur very close together in time, we can be aware that they occur at different times without being able to say which one came first (see Hirsh and Sherrick 1961). We might also think that perception of order was itself explicable in terms of our experience of the distinction between past and present. There will certainly be links here, but it is a contentious question whether the experience of tense—that is, experiencing an event as past or present—is more fundamental than the experience of order, or vice versa, or whether indeed there is such a thing as the experience of tense at all. This issue is taken up below. Finally, we should expect to see links between the perception of time order and the perception of motion if the latter simply involves perception of the order of the different spatial positions of an object. This is another contentious issue that is taken up below.

3. Duration

One of the earliest, and most famous, discussions of the nature and experience of time occurs in the autobiographical Confessions of St Augustine. Augustine was born in Numidia (now Algeria) in 354 AD, held chairs in rhetoric at Carthage and Milan, and became Bishop of Hippo in 395. He died in 430. As a young adult, he had rejected Christianity, but was finally converted at the age of 32.
Book XI of the Confessions contains a long and fascinating exploration of time and its relation to God. During the course of it Augustine raises the following conundrum: when we say that an event or interval of time is short or long, what is it that is being described as of short or long duration? It cannot be what is past, since that has ceased to be, and what is non-existent cannot presently have any properties, such as being long. But neither can it be what is present, for the present has no duration. (For the reason why the present must be regarded as durationless, see the section on the specious present, below.) In any case, while an event is still going on, its duration cannot be assessed. Augustine’s answer to this riddle is that what we are measuring, when we measure the duration of an event or interval of time, is in the memory. From this he derives the radical conclusion that past and future exist only in the mind. While not following Augustine all the way to the mind-dependence of other times, we can concede that the perception of temporal duration is crucially bound up with memory. It is some feature of our memory of the event (and perhaps specifically our memory of the beginning and end of the event) that allows us to form a belief about its duration. This process need not be described, as Augustine describes it, as a matter of measuring something wholly in the mind. Arguably, at least, we are measuring the event or interval itself, a mind-independent item, but doing so by means of some psychological process. Whatever the process in question is, it seems likely that it is intimately connected with what William Friedman (1990) calls ‘time memory’: that is, memory of when some particular event occurred. That there is a close connection here is entailed by the plausible suggestion that we infer (albeit subconsciously) the duration of an event, once it has ceased, from information about how long ago the beginning of that event occurred. That is, information that is metrical in nature (e.g. ‘the burst of sound was very brief’) is derived from tensed information, concerning how far in the past something occurred. The question is how we acquire this tensed information. It may be direct or indirect, a contrast we can illustrate by two models of time memory described by Friedman. He calls the first the strength model of time memory. If there is such a thing as a memory trace that persists over time, then we could judge the age of a memory (and therefore how long ago the event remembered occurred) from the strength of the trace. The longer ago the event, the weaker the trace. This provides a simple and direct means of assessing the duration of an event. Unfortunately, the strength model comes into conflict with a very familiar feature of our experience: that some memories of recent events may fade more quickly than memories of more distant events, especially when those distant events were very salient ones (visiting a rarely seen and frightening relative when one was a child, for instance). A contrasting account of time memory is the inference model. According to this, the time of an event is not simply read off from some aspect of the memory of it, but is inferred from information about relations between the event in question and other events whose date or time is known. The inference model may be plausible enough when we are dealing with distant events, but rather less so for much more recent ones.
In addition, the model posits a rather complex cognitive operation that is unlikely to occur in non-human animals, such as the rat. Rats, however, are rather good at measuring time over short intervals of up to a minute, as demonstrated by instrumental conditioning experiments involving the ‘free operant procedure’. In this, a given response (such as depressing a lever) will delay the occurrence of an electric shock by a fixed period of time, such as 40 seconds, described as the R-S (response-shock) interval. Eventually, the rate of responding tracks the R-S interval, so that the probability of responding increases rapidly as the end of the interval approaches. (See Mackintosh 1983 for a discussion of this and related experiments.) It is hard to avoid the inference here that the mere passage of time itself is acting as a conditioned stimulus: that the rats, to put it in more anthropocentric terms, are successfully estimating intervals of time. In this case, the strength model seems more appropriate than the inference model.

4. The specious present

The term ‘specious present’ was first introduced by the psychologist E.R. Clay, but the best known characterisation of it was due to William James, widely regarded as one of the founders of modern psychology. He lived from 1842 to 1910, and was professor both of psychology and of philosophy at Harvard. His definition of the specious present goes as follows: ‘the prototype of all conceived times is the specious present, the short duration of which we are immediately and incessantly sensible’ (James 1890). How long is this specious present? Elsewhere in the same work, James asserts ‘We are constantly aware of a certain duration—the specious present—varying from a few seconds to probably not more than a minute, and this duration (with its content perceived as having one part earlier and another part later) is the original intuition of time.’ This surprising variation in the length of the specious present makes one suspect that more than one definition is hidden in James’ rather vague characterisation. There are two sources of ambiguity here. One is over whether ‘the specious present’ refers to the object of the experience, namely a duration in time, or the way in which that object is presented to us. The second is over how we should interpret ‘immediately sensible’. James’ words suggest that the specious present is the duration itself, picked out as the object of a certain kind of experience. But ‘immediately sensible’ admits of a number of disambiguations. So we could define the specious present as:
(1) the span of short-term memory;
(2) the duration which is perceived, not as duration, but as instantaneous;
(3) the duration which is directly perceived, i.e., not through the mediation of memory;
(4) the duration which is perceived both as present and as extended in time.
If James means the first of these, that would certainly explain his suggestion that it could last up to a minute. But this does not seem to have much to do specifically with the experience of presentness, since we can certainly hold something in the short-term memory and yet recognise it as past. James may be thinking of cases where we are listening to a sentence: if we did not somehow hold all the words in our conscious mind, we would not understand the sentence as a whole. But it is clear that the words are not experienced as simultaneous, for then the result would be an unintelligible jumble of sounds. (2) is illustrated by the familiar fact that some movements are so fast that we see them as a blur, such as when we look at a fan. What is in fact taking place at different times is presented as happening in an instant. But this is not standardly what is meant by the specious present.
(3) is a construal that is found in the literature (see, e.g., Kelly 2005), but it is not obvious that that is what James had in mind, since James is concerned with the phenomenology of time perception, and whether or not an experience constitutes a direct or indirect perception of an interval does not seem to be a phenomenological matter. (Besides which, as Kelly points out, we might think it odd to suppose that past parts of the interval could be directly perceived.) That leaves us with (4): a duration which is perceived both as present and as temporally extended. This present of experience is ‘specious’ in that, unlike the objective present (if there is such a thing — see The metaphysics of time perception below) it is an interval and not a durationless instant. The real or objective present must be durationless for, as Augustine argued, in an interval of any duration, there are earlier and later parts. So if any part of that interval is present, there will be another part that is past or future. But is it possible to perceive something as extended and as present? If we hear a short phrase of music, we seem to hear the phrase as present, and yet — because it is a phrase rather than a single chord — we also hear the notes as successive, and therefore as extending over an interval. If this does not seem entirely convincing, consider the perception of motion. As Broad (1923) puts it, ‘to see a second hand moving is quite a different thing from “seeing” that an hour hand has moved.’ It is not that we see the current position of the second hand and remember where it was a second ago: we just see the motion. That leads to the following argument:
(1) What we perceive, we perceive as present.
(2) We perceive motion.
(3) Motion occurs over an interval.
Therefore: what we perceive as present occurs over an interval.
Still, there is more than an air of paradox about this. If successive parts of the motion (or musical phrase, or whatever change we perceive) are perceived as present, then surely they are perceived as simultaneous. But if they are perceived as simultaneous, then the motion will simply be a blur, as it is in cases where it is too fast to perceive as motion. The fact that we do see it as motion, rather than as a blur, suggests that we do not see the successive parts of it as simultaneous, and so do not see them as present. But then how do we explain the distinction to which Broad directs our attention? One way out of this impasse is to suggest that two quite distinct processes are going on in the perception of motion (and other kinds of change). One is the perception of successive states as successive, for example the different positions of the second hand. The other is the perception of pure movement. This second perception, which may involve a more primitive system than the first, does not contain as part the recognition of earlier and later elements. (Le Poidevin 2007, Chapter 5.) Alternatively, we might attempt to explain the phenomena of temporal experience without appeal to the notion of the specious present at all (see Arstila 2018).

5. Past, present and the passage of time

The previous section indicated the importance of distinguishing between perceiving the present and perceiving something as present. We may perceive as present items that are past. Indeed, given the finite speed of the transmission of both light and sound (and the finite speed of transmission of information from receptors to brain), it seems that we only ever perceive what is past. However, this does not by itself tell us what it is to perceive something as present, rather than as past.
Nor does it explain the most striking feature of our experience as-of the present: that it is constantly changing. The passage (or apparent passage) of time is its most striking feature, and any account of our perception of time must account for this aspect of our experience. Here is one attempt to do so. The first problem is to explain why our temporal experience is limited in a way in which our spatial experience is not. We can perceive objects that stand in a variety of spatial relations to us: near, far, to the left or right, up or down, etc. Our experience is not limited to the immediate vicinity (although of course our experience is spatially limited to the extent that sufficiently distant objects are invisible to us). But, although we perceive the past, we do not perceive it as past, but as present. Moreover, our experience does not only appear to be temporally limited, it is so: we do not perceive the future, and we do not continue to perceive transient events long after information from them has reached our senses. Now, there is a very simple answer to the question why we do not perceive the future, and it is a causal one. Briefly, causes always precede their effects; perception is a causal process, in that to perceive something is to be causally affected by it; therefore we can only perceive earlier events, never later ones. So one temporal boundary of our experience is explained; what of the other? There seems no logical reason why we should not directly experience the distant past. We could appeal to the principle that there can be no action at a temporal distance, so that something distantly past can only causally affect us via more proximate events. But this is an inadequate justification. We can only perceive a spatially distant tree by virtue of its effects on items in our vicinity (light reflected off the tree impinging on our retinas), but this is not seen by those who espouse a direct realist theory of perception as incompatible with their position. We still see the tree, they say, not some more immediate object. Perhaps then we should look for a different strategy, such as the following one, which appeals to biological considerations. To be effective agents in the world, we must represent accurately what is currently going on: to be constantly out of date in our beliefs while going about our activities would be to face pretty immediate extinction. Now we are fortunate in that, although we only perceive the past, it is, in most cases, the very recent past, since the transmission of light and sound, though finite, is extremely rapid. Moreover, although things change, they do so, again in most cases, at a rate that is vastly slower than the rate at which information from external objects travels to us. So when we form beliefs about what is going on in the world, they are largely accurate ones. (See Butterfield 1984 for a more detailed account along these lines.) But, incoming information having been registered, it needs to move into the memory to make way for more up-to-date information. For, although things may change slowly relative to the speed of light or of sound, they do change, and we cannot afford to be simultaneously processing conflicting information. So our effectiveness as agents depends on our not continuing to experience a transient state of affairs (rather in the manner of a slow motion film) once information from it has been absorbed.
Evolution has ensured that we do not experience anything other than the very recent past (except when we are looking at the heavens). To perceive something as present is simply to perceive it: we do not need to postulate some extra item in our experience that is ‘the experience of presentness.’ It follows that there can be no ‘perception of pastness’. In addition, if pastness were something we could perceive, then we would perceive everything in this way, since every event is past by the time we perceive it. But even if we never perceive anything as past (at the same time as perceiving the event in question) we could intelligibly talk more widely of the experience of pastness: the experience we get when something comes to an end. And it has been suggested that memories—more specifically, episodic memories, those of our experiences of past events—are accompanied by a feeling of pastness (see Russell 1921). The problem that this suggestion is supposed to solve is that an episodic memory is simply a memory of an event: it represents the event simpliciter, rather than the fact that the event is past. So we need to postulate something else which alerts us to the fact that the event remembered is past. An alternative account, and one which does not appeal to any phenomenological aspects of memory, is that memories dispose us to form past-tensed beliefs, and it is by virtue of this that they represent an event as past. We have, then, a candidate explanation for our experience of being located at a particular moment in time, the (specious) present. And as the content of that experience is constantly changing, so that position in time shifts. But there is still a further puzzle. Change in our experience is not the same thing as experience of change. We want to know, not just what it is to perceive one event after another, but also what it is to perceive an event as occurring after another. Only then will we understand our experience of the passage of time. We turn, then, to the perception of time order.

6. Time order

How do we perceive precedence amongst events? A temptingly simple answer is that the perception of precedence is just a sensation caused by instances of precedence, just as a sensation of red is caused by instances of redness. Hugh Mellor (1998), who considers this line, rejects it for the following reason. If this were the correct explanation, then we could not distinguish between x being earlier than y, and x being later than y, for whenever there is an instance of one relation, there is also an instance of the other. But plainly we are able to distinguish the two cases, so it cannot simply be a matter of perceiving a relation, but something to do with our perception of the relata. But mere perception of the relata cannot be all there is to perceiving precedence. Consider again Broad’s point about the second hand and the hour hand. We first perceive the hour hand in one position, say pointing to 3 o’clock, and later we perceive it in a different position, pointing to half-past 3. So I have two perceptions, one later than the other. I may also be aware of the temporal relationship of the two positions of the hand. Nevertheless, I do not perceive that relationship, in that I do not see the hand moving. In contrast, I do see the second hand move from one position to another: I see the successive positions as successive. Mellor’s proposal is that I perceive x precede y by virtue of the fact that my perception of x causally affects my perception of y.
As I see the second hand in one position, I have in my short-term memory an image (or information in some form) of its immediately previous position, and this image affects my current perception. The result is a perception of movement. The perceived order of different positions need not necessarily be the same as the actual temporal order of those positions, but it will be the same as the causal order of the perceptions of them. Since causes always precede their effects, the temporal order perceived entails a corresponding temporal order in the perceptions. Dainton (2001) has objected to this that, if the account were right, we should not be able to remember perceiving precedence, since we only remember what we can genuinely perceive. But there seems no reason to insist that, just because perception of precedence may involve short-term memory, it thereby fails to count as genuine perception. There is a further disanalogy between perception of colour and perception of time order. What is perceived in the case of colour is something that has a definite spatio-temporal location. The relation of precedence, in contrast, is not something that has any obvious location. But causes do have locations, so the perception of precedence is rather harder to reconcile with the causal theory of perception than the perception of colour (Le Poidevin 2004, 2007). In effect, Mellor’s idea is that the brain represents time by means of time: that temporally ordered events are represented by similarly temporally ordered experiences. This would make the representation of time unique. (For example, the brain does not represent spatially separated objects by means of spatially separated perceptions, or orange things by orange perceptions.) But why should time be unique in this respect? In other media, time can be represented spatially (as in cartoons, graphs, and analogue clocks) or numerically (as in calendars and digital clocks). So perhaps the brain can represent time by other means. One reason to suppose that it must have other means at its disposal is that time needs to be represented in memory (I recall both that a was earlier than b, and also the experience of seeing a occur before b) and intention (I intend to F after I G), but there is no obvious way in which Mellor’s ‘representation of time by time’ account can be extended to these. On Mellor’s model, the mechanism by which time-order is perceived is sensitive to the time at which perceptions occur, but indifferent to their content (what the perceptions are of). Daniel Dennett (1991) proposes a different model, on which the process is time-independent, but content-sensitive. For example, the brain may infer the temporal order of events by seeing which sequence makes sense of the causal order of those events. One of the advantages of Dennett’s model is that it can account for the rather puzzling cases of ‘backwards time referral’, where perceived order does not follow the order of perceptions. (See Dennett 1991 for a discussion of these cases, and also Roache 1999 for an attempt to reconcile them with Mellor’s account.)

7. The metaphysics of time perception

In giving an account of the various aspects of time perception, we inevitably make use of concepts that we take to have an objective counterpart in the world: the past, temporal order, causation, change, the passage of time and so on.
But one of the most important lessons of philosophy, for many writers, is that there may be a gap, perhaps even a gulf, between our representation of the world and the world itself, even on a quite abstract level. (It would be fair to add that, for other writers, this is precisely not the lesson philosophy teaches.) Philosophy of time is no exception to this. Indeed, it is interesting to note how many philosophers have taken the view that, despite appearances, time, or some aspect of time, is unreal. In this final section, we will take a look at how three metaphysical debates concerning the nature of the world interact with accounts of time perception. The first debate concerns the reality of tense, that is, our division of time into past, present and future. Is time really divided in this way? Does what is present slip further and further into the past? Or does this picture merely reflect our perspective on a reality in which there is no uniquely privileged moment, the present, but simply an ordered series of moments? A-theorists say that our ordinary picture of the world as tensed reflects the world as it really is: the passage of time is an objective fact. B-theorists deny this. (The terms A-theory and B-theory derive from McTaggart’s (1908) distinction between two ways in which events can be ordered in time: either as an A-series, that is, in terms of whether they are past, present or future; or as a B-series, that is, according to whether they are earlier than, later than, or simultaneous with other events.) For B-theorists, the only objective temporal facts concern relations of precedence and simultaneity between events. (I ignore here the complications introduced by the Special Theory of Relativity, since B-theory—and perhaps A-theory also—can be reformulated in terms which are compatible with the Special Theory.) B-theorists do not deny that our tensed beliefs, such as the belief that a cold front is now passing, or that Sally’s wedding was two years ago, may be true, but they assert that what makes such beliefs true are not facts about the pastness, presentness or futurity of events, but tenseless facts concerning precedence and simultaneity (see Mellor 1998, Oaklander and Smith 1994). On one version of the B-theory, for example, my belief that there is a cold front now passing is true because the passing of the front is simultaneous with my forming the belief. Now one very serious challenge to the tenseless theorist is to explain why, if time does not pass in reality, it appears to do so. What, in B-theoretic terms, is the basis for our experience as-of the passage of time? The accounts we considered above, first of the temporal restrictions on our experience, and secondly of our experience of time order, did not explicitly appeal to tensed, or A-theoretic, notions. The facts we did appeal to look like purely B-theoretic ones: that causes are always earlier than their effects, that things typically change slowly in relation to the speed of transmission of light and sound, that our information-processing capacities are limited, and that there can be causal connections between memories and experiences. So it may be that the tenseless theorist can discharge the obligation to explain why time seems to pass. But two doubts remain. First, perhaps the A-theorist can produce a simpler explanation of our experience.
Second, it may turn out that supposedly B-series facts are dependent upon A-series ones, so that, for example, a and b are simultaneous by virtue of the fact that both are present. What is clear, though, is that there is no direct argument from experience to the A-theory, since the present of experience, being temporally extended and concerning the past, is very different from the objective present postulated by the A-theory. Further, it cannot be taken for granted that the objective passage of time would explain whatever it is that the experience as-of time’s passage is supposed to amount to. (See Prosser 2005, 2007, 2012, 2016, 2018.) The second metaphysical issue that has a crucial bearing on time perception is connected with the A/B-theory dispute, and that is the debate between presentists and eternalists. Presentists hold that only the present exists (for an articulation of various kinds of presentism, and the challenges they face, see Bourne 2006), whereas eternalists grant equal reality to all times. The two debates, A- versus B-theory and presentism versus eternalism, do not map precisely onto each other. Arguably, B-theory is committed to eternalism, but A-theorists need not endorse presentism (though Bourne argues that they should). How might this be connected to perception? According to the indirect (or, as it is sometimes called, representative) theory of perception, we perceive external objects only by perceiving some intermediate object, a sense datum. According to the direct theory, in contrast, perception of external objects involves no such intermediary. Now, external objects are at varying distances from us, and, as noted above, since light and sound travel at finite speeds, that means that the state of objects that we perceive will necessarily lie in the past. In the case of stars, where the distances are very considerable, the time gap between light leaving the star and our perceiving it may be one of many years. The presentist holds that past states, events and objects are no longer real. But if all that we perceive in the external world is past, then it seems that the objects of our perception (or at least the states of those objects that we perceive) are unreal. It is hard to reconcile this with the direct theory of perception. On the face of it, therefore, presentists are committed to the indirect theory of perception. (See Power 2010a, 2010b, 2018, Le Poidevin 2015b.) The third and final metaphysical issue that we will discuss in the context of time perception concerns causal asymmetry. The account of our sense of being located at a time which we considered under Past, present and the passage of time rested on the assumption that causation is asymmetric. Later events, it was suggested, cannot affect earlier ones, as a matter of mind-independent fact, and this is why we do not perceive the future, only the past. But attempts to explain the basis of causal asymmetry, in terms, for example, of counterfactual dependence, or in probabilistic terms, are notoriously problematic. One moral we might draw from the difficulties of reducing causal asymmetry to other asymmetries is that causal asymmetry is primitive, and so irreducible. Another is that the search for a mind-independent account is mistaken. Perhaps causation is intrinsically symmetric, but some feature of our psychological constitution and relation to the world makes causation appear asymmetric. This causal perspectivalism is the line taken by Huw Price (1996).
That causal asymmetry should be explained in part by our psychological constitution, in a way analogous to our understanding of secondary qualities such as colour, is a radical reversal of our ordinary assumptions, but then our ordinary understanding of a number of apparently objective features of the world—tense, absolute simultaneity—has met with similarly radical challenges. Now, if causal asymmetry is mind-dependent in this way, then we cannot appeal to it in accounting for our experience of temporal asymmetry—the difference between past and future. Further, it is not at all clear that perspectivalism can account for the perception of time order. The mechanism suggested by Mellor (see Time order) exploited the asymmetry of causation: it is the fact that the perception of A causally influences the perception of B, but not vice versa, that gives rise to the perception of A’s being followed by B. We can represent this schematically as follows (where the arrow stands for an asymmetric causal relation):

P(A) → P(B) → P(A<B)

But if there is no objective asymmetry, then what is the explanation? Of course, we can still define causal order in terms of a causal betweenness relation, and we can say that the perceived order follows the objective causal order of the perceptions, in this sense: on the one hand, where A is perceived as being followed by B, the perception of B is always causally between the perception of A and the perception of A’s being followed by B (the dash represents a symmetric causal relation):

P(A) – P(B) – P(A<B)

On the other hand, where B is perceived as being followed by A, the perception of A is always causally between the perception of B and the perception of B’s being followed by A:

P(B) – P(A) – P(B<A)

But what, on the causal perspectivalist view, would rule out the following case?

P(B<A) – P(A) – P(B) – P(A<B)

For such a case would satisfy the above constraints. But it is a case in which A is perceived by an observer both as following, and as being followed by, B, and we know that such a case never occurs in experience. ‘Is perceived by x as followed by’ is an asymmetric relation (assuming we are dealing with a single sense modality), and so one that can be grounded in the causal relation only if the causal relation is itself asymmetric. Now if perspectivalism cannot meet the challenge to explain why, when B is perceived as following A, A is never perceived by the same observer as following B, it seems that our experience of time order, insofar as it has a causal explanation, requires causation to be objectively asymmetric. One strategy the causal perspectivalist could adopt (indeed, the only one available) is to explain the asymmetric principle above in terms of some objective non-causal asymmetry. Price, for example, allows an objective thermodynamic asymmetry, in that an ordered series of states of the universe will exhibit what he calls a thermodynamic gradient: entropy will be lower at one end of the series than at the other. We should resist the temptation to say that entropy increases, for that would be like asserting that a road goes uphill rather than downhill without conceding the perspectival nature of descriptions like ‘uphill’. Could such a thermodynamic asymmetry explain the perception of time order? That is a question for the reader to ponder.
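The structure of the betweenness worry above can be made fully explicit. The following is a minimal sketch of my own (the labels follow the schemas in the text; nothing here is drawn from Price or Mellor) showing that merely symmetric causal betweenness fails to exclude the problematic case:

    # Model a causal chain that has only symmetric betweenness structure
    # as a Python list of perception labels.

    def causally_between(chain, x, y, z):
        # True iff y lies between x and z on the undirected chain;
        # neither direction along the chain is privileged.
        i, j, k = chain.index(x), chain.index(y), chain.index(z)
        return i < j < k or k < j < i

    # The problematic case from the text:
    chain = ["P(B<A)", "P(A)", "P(B)", "P(A<B)"]

    # Constraint for perceiving A as followed by B:
    # P(B) is causally between P(A) and P(A<B).
    print(causally_between(chain, "P(A)", "P(B)", "P(A<B)"))  # True

    # Constraint for perceiving B as followed by A:
    # P(A) is causally between P(B) and P(B<A).
    print(causally_between(chain, "P(B)", "P(A)", "P(B<A)"))  # True

    # One and the same chain satisfies both constraints, so betweenness
    # alone cannot exclude an observer perceiving A as followed by B and
    # also B as followed by A -- the case that never occurs in experience.

With an objectively asymmetric causal relation, the second disjunct (k < j < i) would be dropped, and the problematic chain would then violate one of the two constraints; that is precisely the resource the perspectivalist, as characterised above, lacks.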


Epistemological Problems of Testimony

1. Reductionism and Non-Reductionism

Consider this scenario: Your friend testifies to you that your favorite team won last night’s game (= p). Because you know that your friend is a highly reliable sports reporter, and because you have no reason to doubt what she says on this occasion, you believe what you are told. In this case, your belief that p is clearly justified. Now, contrast that scenario with this one: You run into a stranger whom you have never met and they tell you that your favorite team won last night’s game (= p). Even though you don’t know if this person often speaks the truth, you also don’t have any good reason to doubt what they are telling you. Thus, you decide to believe what you are told. Whether or not your belief that p is justified in this case is a lot less clear. Thinking about the difference between cases like these helps motivate the debate about the following question:

First Big Question: Is testimony a basic source of justification, or can testimonial justification be reduced to a combination of other epistemic sources?

Those who defend answers to this question tend to endorse one of three main positions: Reductionism, Non-Reductionism, and Hybrid Views.

1.1 Reductionism

Reductionists maintain that in order to acquire testimonial justification, one must have positive reasons for thinking that the speaker in question is a reliable testifier. More specifically, Reductionists endorse

Positive Reasons: A hearer is justified in believing what a speaker says if, and only if, they (a) have positive reasons for thinking that the speaker’s testimony is reliable, where these reasons are not themselves ultimately based on testimony, and (b) do not have any undefeated defeaters[4] that indicate that the speaker’s testimony is false or unlikely to be true.

Reductionist views trace at least as far back as David Hume (1740, 1748)—see Traiger (1993, 2010), Faulkner (1998), Root (2001), Fogelin (2005), van Cleve (2006), Gelfert (2010), and Shieber (2015) for more on Hume’s view in particular. More recently, other Reductionist views have been defended by E. Fricker (1987, 1994, 1995, 2002, 2006a, 2006b), Adler (1994, 2002), Lyons (1997), Lipton (1998), Shogenji (2006), Sutton (2007), Malmgren (2006), and Kenyon (2013). One of the primary motivations for Reductionism stems from concerns having to do with gullibility; that is, many Reductionists maintain that if we could justifiably accept a speaker’s testimony without having positive reasons for thinking that they typically speak the truth, then we would be justified in accepting testimony in cases in which doing so would be clearly irresponsible. So, for example, if a hearer does not need positive reasons for thinking that the speaker’s testimony is reliable, then one could be justified in believing the say-so of a random blogger on an arbitrary website so long as one did not have any reasons for doubting the testimony in question. Now, while all Reductionists endorse Positive Reasons, there is disagreement over exactly how this thesis should be understood. For this reason, Reductionists fall into one of two camps: Global Reductionists and Local Reductionists. According to Global Reductionism, in order to be justified in accepting a speaker’s testimony, you need to have positive reasons for believing that testimony is generally reliable, i.e., that accepting the reports of others is a reliable way of forming true beliefs. For instance, suppose that your friend tells you that he got a puppy.
Global Reductionists maintain that you are only justified in accepting this report if you have positive reasons that support inferences like the following:
(1) My friend said that he got a puppy.
(2) When people have told me things in the past, what they said usually turned out to be true.
(3) Therefore, what my friend told me on this occasion is likely to be true, i.e., he probably got a puppy.
It is in this sense that Global Reductionists think that testimonial justification can be reduced to a combination of perceptual, memorial, and inferential justification. That is, testimonial justification can be reduced to a combination of other epistemic sources because it only involves you (i) perceiving that the speaker made an utterance, (ii) remembering that when people have told you things in the past, they turned out to be right most of the time, and (iii) inferring on this basis that what you were told on this occasion is likely to be true. Historically, Global Reductionism has been saddled with three objections. First, opponents have argued that any attempt to acquire non-testimonially based reasons for thinking that testimony is generally reliable will either be viciously circular or else involve an insurmountable regress. For instance, in order to know that people generally speak the truth, I might need to rely on Bill’s testimony to confirm that what Alice said was true. But in order to know that Bill can be trusted, I might need to rely on Carly to confirm that he usually says true things. But to ensure that Carly typically speaks the truth, I will either need to rely on Alice or Bill to confirm this for me (hence the vicious circle), or else I will need to rely on a fourth person like Donald (and hence the regress will continue). Thus, because there is no good way to acquire the non-testimonially based reasons in question, Global Reductionism problematically entails that we are rarely (if ever) justified in accepting what people tell us. See Coady (1992) for this worry, and see Wright (2016a, 2019) for an importantly different kind of circularity worry for all Reductionist views. Second, and relatedly, opponents have argued that in order to acquire non-testimonially based reasons for thinking that testimony is generally reliable, we would need to be exposed to loads and loads of facts that correspond to the things that we receive testimony about, i.e., in order to check if testimony about history, medicine, biology, etc., is generally reliable, we would need to have confirmed many of these facts for ourselves. However, most (if not all) of us simply lack the time and resources to confirm such things. Thus, Global Reductionism seems to problematically entail that we are rarely (if ever) justified in accepting what other people tell us. See, e.g., Coady (1992). To see the third worry with Global Reductionism, notice that Global Reductionists treat testimony as if it is a unified, homogeneous category, i.e., according to Global Reductionists, testimony in general can be a more or less reliable source of knowledge. The problem here is that we frequently receive testimony about wildly different topics, e.g., quantum mechanics, politics, one’s own music preferences, etc. And clearly testimony about some of these things is highly reliable (e.g., all of your friends are probably very good at speaking truly about what kinds of music they like), whereas testimony about other topics is less so (e.g., if your friends are like mine, then at least a few of them are probably a lot less reliable at speaking truly about politics).
Thus, contra Global Reductionism, it is a mistake to treat testimony as a unified source of knowledge; that is, instead of thinking about testimony in general, we should think about the various categories of testimony in particular, e.g., categories differentiated by subject matter. For it is only when we think of testimony as being disaggregated in this way that it makes sense to ask whether receiving testimony about a particular category is a reliable source of knowledge. See, e.g., E. Fricker (1994). According to Local Reductionism, in order to be justified in accepting a speaker’s testimony, the hearer needs to have non-testimonially based reasons for thinking that the speaker in question is a reliable testifier on this occasion (as opposed to having positive reasons for thinking that testimony in general is reliable). For instance, suppose your friend tells you that he got a puppy and that you make the following inference:
(1) My friend said that he got a puppy.
(2) My friend is a reliable testifier about this sort of thing on this occasion.
Therefore, (3) my friend got a puppy.
Local Reductionists maintain that you are only justified in accepting what you are told on this occasion if you have non-testimonially based reasons that support (1) and (2). For instance, perhaps you know that your friend usually speaks the truth about these sorts of things because you have known them for a long time. Or perhaps it is because you know that, generally speaking, anyone who takes the time to talk to you about their pets is probably telling the truth. Or perhaps it is because you know that when you ask people about their pets in this kind of context, it is highly likely that you will get an honest answer. Regardless of how these non-testimonially based reasons are acquired, it is in this sense that Local Reductionists also think that testimonial justification can be reduced to a combination of perceptual, memorial, and inferential justification, i.e., testimonial justification only consists in you perceiving that the speaker made an utterance and then inferring on this basis that what the speaker said on this occasion is likely to be true. Local Reductionists are well positioned to avoid the problems that plague Global Reductionism. This is because they are not committed to the claim that testimony is a unified category, i.e., instead of thinking about the reliability of testimony in general, we only need to think about the reliability of each piece of testimony that we are offered on a given occasion. Moreover, Local Reductionists do not maintain that in order to be justified in accepting a speaker’s say-so, one needs positive reasons for thinking that testimony in general is a reliable source of knowledge. Thus, even if you lack the resources to confirm that most people generally speak the truth, you can still have non-testimonially based reasons for thinking that what the speaker said is likely to be true on this occasion. For instance, if your relationship is long enough, you can come to know that your friend has a great track record of saying true things about getting new pets, since anytime they say this, you can just go over to their place and see their new puppy for yourself. And because you don’t need to rely on the testimony of a third party to acquire these positive reasons, there is no worry of running into the kinds of vicious circles or insurmountable regresses that Global Reductionists need to explain away. Historically, though, there are at least three problems that cause trouble for Local Reductionists.
First, opponents have objected to Local Reductionism on the grounds that it problematically excludes young children (e.g., 3-year-olds) from justifiably accepting what their parents tell them. For if Local Reductionism is true, then in order to be justified in accepting a parent’s testimony, a young child would need non-testimonially based reasons for thinking that this parent is a reliable testifier. But youngsters simply lack the worldly experience to have good reasons for thinking that their parents’ reports are usually true, i.e., they have not been around long enough to have confirmed enough of these reports for themselves. Thus, Local Reductionism problematically precludes young children from being able to learn from the say-so of their parents. See, e.g., Audi (1997), and see also Harris (2002), Harris and Corriveau (2011) and Koenig and Harris (2007) for empirical results about children accepting the testimony of others. (Note: This objection poses a worry for Global Reductionists as well.) Second, opponents have objected to Local Reductionism on the grounds that we can be justified in believing a speaker, S’s, testimony that p even if we lack the relevant non-testimonially based reasons to support the inference from “S said that p” to “p”. (See, e.g., Webb [1994: 263–264], Strawson [1994: 25], Schmitt [1999: 360] and Lackey [2008: 180].) For instance, suppose you arrive in a new country and spot someone on the street. And suppose that you approach this person and ask them for directions. Now, if that person tells you that your hotel is three blocks down the road, then it seems like you are justified in accepting their testimony that this is the case. But Local Reductionism cannot accommodate this result. For insofar as the only thing that justifies your belief is your inference from “This person said that my hotel is three blocks down the road” to “My hotel is three blocks down the road”, then since you know next to nothing about this stranger, and since you also know very little about whether anyone in this area is likely to answer this sort of question honestly, it is hard to see how your non-testimonially based reasons for accepting this person’s testimony are strong enough to justify you in believing what you are told on this occasion. (But see, e.g., Kenyon [2013], who defends Local Reductionism from this worry by arguing that even if a hearer knows very little about the speaker in question, they can still appeal to other contextual information to support their inference.) Third, others have argued that given the current results in social psychology, there is good reason to reject Local Reductionism on the grounds that it makes testimonial justification too hard to come by. The worry here is that the evidence from social psychology suggests that humans are not very good at determining when a particular instance of testimony is false or unlikely to be true. Thus, insofar as Local Reductionists maintain that hearers need to be good at monitoring for these signs of falsehood and unreliability in order to have positive reasons for thinking that a particular instance of testimony is worth accepting, Local Reductionism problematically entails that we have way less testimonial justification than we previously thought. See Michaelian (2010, 2013) and Shieber (2012, 2015) for more on this style of objection, and see Sperber (2013) and Harris et al. (2018) for empirical arguments to the contrary.
(Note: This objection is not meant to target just Local Reductionism, but Reductionist views more generally.) Reductionists have offered responses to all of the worries mentioned above. For instance, see Owen (1987), Sobel (1987), and Alvin Goldman (1999: Ch. 4) for a Bayesian analysis of how a hearer can acquire positive reasons for accepting a speaker’s testimony (a minimal numerical sketch of this sort of analysis appears at the end of this entry). See also E. Fricker (1995), Lipton (1998, 2007), Schiffer (2003), and Malmgren (2006) for more on how hearers can acquire these positive reasons via inference to the best explanation. And for more on debates surrounding Reductionism in general, see Faulkner (2000), Elgin (2002), Lackey (2005a, 2006), Goldberg and Henderson (2006), Kenyon (2013) and Graham (2018). Whether or not these responses succeed remains an open question.

1.2 Non-Reductionism

According to Non-Reductionists, Positive Reasons is false, i.e., we don’t need positive reasons for thinking that a speaker’s testimony is reliable in order to be justified in believing what we are told. Instead, we have a defeasible but presumptive right to believe what people tell us. More specifically, Non-Reductionists endorse

Presumptive Right: A hearer is justified (or warranted[5]) in believing what a speaker says if they do not have an undefeated defeater that indicates that the speaker’s testimony is false or unlikely to be true.

(Some Non-Reductionists (e.g., Goldberg & Henderson 2006) maintain that in addition to simply lacking any relevant undefeated defeaters, the hearer must also be counterfactually sensitive to, or on the lookout for, the presence of defeaters in their environment.) Non-Reductionism traces at least as far back as Thomas Reid (IE [1983, 94–95])—see Wolterstorff (2001) for more on Reid’s view. More recently, various versions of Non-Reductionism have been defended by Austin (1946 [1979]), Welbourne (1979, 1981, 1986, 1994), Evans (1982), A. Ross (1986), Hardwig (1985, 1991), Coady (1992, 1994), Burge (1993, 1997, 2013), Plantinga (1993), Stevenson (1993), Webb (1993), Dummett (1994), Foley (1994), McDowell (1994), Strawson (1994), Williamson (1996, 2000), Millgram (1997), Alvin Goldman (1999), Schmitt (1999), Insole (2000), Owens (2000), Rysiew (2000), Weiner (2003), Graham (2006a), Sosa (2006), McMyler (2011) and Baker and Clark (2018). See also Audi (1997, 1998, 2004, 2006), who defends Non-Reductionism about testimonial knowledge but not about testimonial justification. One motivation for Non-Reductionism stems from the desire to avoid the problems associated with the various forms of Reductionism, e.g., if hearers are not required to have positive reasons for thinking that the speaker’s testimony is reliable on this occasion, testimonial knowledge will not be too hard to acquire. Another motivation (see Reid IE [1983, 94–95]) is rooted in the following idea: Whatever reason we have for thinking that perception is a basic source of justification, we have an analogous reason for thinking that testimony is a basic source of justification too. For instance, we can rely on a speaker’s testimony unless we have a good reason not to because humans are endowed—perhaps by God or just by nature—with the disposition to (a) tell the truth, (b) believe what they are told, and (c) have some sense of when a speaker is not to be trusted. However, because Non-Reductionists reject Positive Reasons, opponents have objected to the view on the grounds that it permits hearers to be irrationally gullible.
For instance, recall the case in which you read a bit of testimony from an anonymous blogger on an arbitrary website (i.e., E. Fricker 2002). Or consider this situation: While on your way home from work you see a group of aliens from another planet drop a notebook written in what appears to be English. Upon reading the notebook, you see that the aliens seem to have testified that hungry tigers have eaten some of their friends (i.e., Lackey 2008: 168–169). While these cases are different in certain respects, they are related by the fact that while you do not have any defeaters that indicate that the testimony in question is false or unlikely to be true, you also do not have any positive reasons for accepting what the speaker says. Opponents of Non-Reductionism argue that because it would be irrational for you to accept either of these reports, these cases show that Non-Reductionism is false and that in order to be justified in believing what a speaker says, you really do need positive reasons for thinking that the speaker’s testimony is likely to be true. 1.3 Hybrid Views Finally, some epistemologists reject both Reductionism and Non-Reductionism in favor of various hybrid views. The primary motivation for these hybrid views is to capture what seems promising about the Reductionist and Non-Reductionist approaches while also avoiding the objections discussed above. For instance, instead of endorsing Reductionism and requiring that all hearers must possess strong, non-testimonially based positive reasons for thinking that the speaker in question is reliable, one might opt for a qualified hybrid view according to which (a) adults need to possess these positive reasons but (b) youngsters in the developmental phase do not, i.e., children are justified in believing a speaker’s testimony so long as they do not have any reason not to. One upshot of this hybrid view is that unlike standard versions of Reductionism, it is possible for young children to be justified in believing what their parents tell them. See, e.g., E. Fricker (1995). Or, one might opt for a hybrid view according to which the hearer and the speaker both have an important role to play in the hearer’s ability to acquire testimonial justification, i.e., it takes two to tango, so to speak. For instance, perhaps a hearer does need to possess at least some non-testimonially based reasons for thinking that the speaker in question is a reliable testifier on this occasion. But insofar as the hearer’s inference from “S said that p” to “p” is not the only thing that justifies the hearer’s belief, these reasons do not need to be nearly as strong as standard Reductionists have made them out to be; that is, so long as the hearer’s non-testimonially based reasons render it not irrational to rely on the speaker’s say-so, this is good enough. And this is because, in addition to the hearer having these weaker kinds of positive reasons, the speaker in question needs to actually be a reliable reporter. The hope here is that by requiring contributions from both the speaker and the hearer, all of the worries associated with standard versions of Reductionism and Non-Reductionism can be avoided. For instance, by requiring that the hearer have only these weaker kinds of positive reasons, this hybrid view can explain how young children can acquire testimonial justification while also avoiding the worries associated with gullibility. See, e.g., Lackey (2008). And for defenses of other hybrid views, see E. Fricker (2006b), Faulkner (2000), Lehrer (2006), and Pritchard (2006).
Whether any of these hybrid views will ultimately succeed is still very much an open debate. However, opponents have worried that at least some of these accounts either run into the same objections that plagued standard versions of Reductionism and Non-Reductionism, or that they incur entirely new problems of their own, e.g., Insole (2000), Weiner (2003) and Lackey (2008). 2. Knowledge Transmission and Generation Consider this scenario: Gretchen knows that the bakery is closed. If Gretchen tells you that this is the case, and if all goes well, then it is uncontroversial that you can acquire testimonial knowledge that the bakery is closed too. Now, contrast that scenario with this one: Gretchen does not know that the bakery is closed (perhaps because she simply lacks any justification for believing this). Nevertheless, she testifies to you that the bakery is closed anyway. If you come to believe that the bakery is closed on the basis of Gretchen’s testimony, and if the bakery really is closed, then is it possible for your belief to amount to knowledge? Depending on how the details are filled in, things are much more controversial in this second scenario. The controversy centers on the following question: Second Big Question: Can testimony generate knowledge, or can it merely transmit it? Otherwise put, can a hearer acquire testimonial knowledge that p from a speaker who does not know that p themselves? Before moving on, two clarification points are in order. First, while much of the debate about the Transmission View has centered on whether testimony can only transmit knowledge, there is also some debate about whether testimony can transmit justification. (See, e.g., Audi [1997], who maintains that while testimony can generate justification, it can only transmit knowledge. See also Wright 2016a for a recent discussion of other views according to which testimony transmits knowledge but generates justification). Second, debates about knowledge transmission bear on debates about the Inheritance View (Section 3.1.2) and on the Individualism vs. Anti-Individualism debate (Section 4). 2.1 The Transmission View According to the Transmission View, testimonial knowledge can only be transmitted from a speaker to a hearer. Here is one (but not the only) way of formulating this view in terms of necessity and sufficiency: TV-S: For every speaker, A, and hearer, B, if A knows that p, B comes to believe that p on the basis of A’s testimony and B has no undefeated defeaters for believing that p, then B comes to know that p too. (See Austin 1946 [1979]; Welbourne 1979, 1981, 1986, 1994; Evans 1982; E. Fricker 1987; Coady 1992; McDowell 1994; Adler 1996, 2002; Owens 2000, 2006; Burge 1993; Williamson 1996, 2000; and Audi 1997). TV-N: For every speaker, A, and hearer, B, B knows that p on the basis of A’s testimony only if A knows that p too. (See Welbourne 1979, 1981, 1986, 1994; Hardwig 1985, 1991; A. Ross 1986; Burge 1993, 1997; Plantinga 1993; Williamson 1996, 2000; Audi 1997, 1998, 2006; Owens 2000, 2006; Reynolds 2002; Adler 2002; Faulkner 2006; Schmitt 2006).
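Schematically, and only as an informal rendering rather than any author’s official formulation, the two theses can be put as follows, writing \(K_X\,p\) for “X knows that p” and \(T_{A \to B}\,p\) for “B comes to believe that p on the basis of A’s testimony”:

\[
\textbf{TV-S:}\quad \bigl(K_A\,p \;\wedge\; T_{A \to B}\,p \;\wedge\; B \text{ has no undefeated defeaters for } p\bigr) \;\rightarrow\; K_B\,p
\]

\[
\textbf{TV-N:}\quad \bigl(T_{A \to B}\,p \;\wedge\; K_B\,p\bigr) \;\rightarrow\; K_A\,p
\]

Contraposing TV-N makes the target of the objections below vivid: if A does not know that p, then B cannot come to know that p on the basis of A’s testimony.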
One of the main motivations for the Transmission View comes from an alleged analogy between testimony and memory: Just as I cannot acquire memorial knowledge that p today if I did not know that p at some earlier point in time, I cannot acquire testimonial knowledge that p from you today if you do not know that p yourself. (But see Barnett 2015 for a recent discussion of the important differences between memory and testimony, and see Lackey 2005b for why memory can generate knowledge.) Despite the intuitive and theoretical appeal, the Transmission View has been challenged in a variety of ways. 2.2 The Generation View Opponents have raised two importantly different kinds of arguments against TV-N. First, suppose that there is a creationist teacher, Stella, who does not believe, and thus fails to know, that Homo sapiens evolved from Homo erectus (= p). That is, while Stella has read the relevant textbooks on evolutionary theory, her creationist commitments prevent her from believing that p is true. Now, suppose that during one of her biology lessons Stella tells her fourth-grade students that p, and suppose that her students come to believe that p on the basis of Stella’s testimony. The argument here is that the fourth graders can come to know that p on the basis of Stella’s testimony even though Stella herself does not believe, and thus does not know, that p is true. Thus, TV-N is false, i.e., testimonial knowledge can be generated from a speaker who lacks the knowledge in question. (This Creationist Teacher case comes from Lackey (2008). Other school teacher cases have been discussed in Graham (2006a) and Carter and Nickel (2014). Goldberg (2005) and Pelling (2013) also give cases in which a speaker’s belief is unsafe and does not amount to knowledge even though the hearer’s belief does).[6] While this first case involved a speaker who did not know that p because they did not believe it, the second type of objection to TV-N involves a speaker who does not know that p because they are not justified in believing it. For instance, consider Persia, who is a persistent believer in the following sense: Persia goes to her eye doctor, Eyal, who tells her that the eye drops she was just given will make her vision unreliable for the next three hours. While Eyal is a highly reliable testifier, he is wrong on this occasion, i.e., for some strange reason, the drops did not have this side-effect on Persia. However, while Persia has no reason to distrust Eyal, she ignores him on this occasion, walks out of his office, and sees a badger in the parking lot. Because Persia is a persistent believer, she forms the true belief that there is a badger in the parking lot despite Eyal’s (misleading) testimony about the unreliability of her visual faculties. Later that day Persia runs into her friend, Fred, and tells him that there was a badger in the parking lot (= p). The argument here is that Eyal’s testimony constitutes an undefeated defeater that defeats Persia’s justification for believing that p. However, since Fred is completely unaware that Persia has the defeater, and because he has positive reasons for thinking that his friend is a reliable testifier, he does come to know that p on the basis of Persia’s say-so. Thus, TV-N is false. (This Persistent Believer case comes from Lackey [2008]. It is worth noting that this case purports to show that testimonial justification can also be generated, i.e., Fred can acquire testimonial justification for believing p via Persia’s testimony even though Persia was not justified in believing p herself). In addition to targeting TV-N, opponents of the Transmission View have also targeted TV-S.
Consider, for instance, Quinn, who is so infatuated with his friend, Kevin, that he is compulsively trusting, i.e., Quinn believes anything that Kevin says, regardless of how outrageous Kevin’s claim may be. One day Kevin testifies to Quinn that he is moving to Brooklyn (= p). Kevin is being truthful, and he has terrific evidence that p is true (he is the one who is moving, after all). Unsurprisingly, Quinn believes what Kevin says. However, Quinn would also have believed Kevin even if he had massive amounts of evidence that Kevin was lying, or joking, or whatever. Opponents argue that while Kevin knows that p, Quinn does not, i.e., because of his compulsively trusting nature, Quinn’s attitude is insensitive to counterevidence in a way that precludes his belief from amounting to knowledge. Thus, TV-S is false. (This Compulsively Trusting case comes from Lackey 2008. See also Graham 2000b). Much of the recent work on whether testimony generates or transmits knowledge concerns carefully distinguishing between different versions of TV-N and TV-S, and arguing that while some versions may face the problems mentioned here, others do not. See, e.g., Wright (2016a). 3. Testimony and Evidence Consider this scenario: Your friend testifies to you that the taco truck is open. Because you know that your friend is almost always right about this kind of thing, and because you have no reason to doubt what they are telling you on this occasion, you believe what you are told. While it is uncontroversial that your belief is justified in this case, scenarios like this one have generated lots of debate about the following question: Third Big Question: When a hearer is justified in believing that p on the basis of a speaker’s testimony, is the hearer’s belief justified by evidence? And if the hearer’s belief is justified by evidence, where does this evidence come from? 3.1 Evidential Views Some epistemologists maintain that our testimonially based beliefs are justified by evidence. However, there is disagreement about where exactly this evidence comes from. On the one hand, some maintain that this evidence must be supplied by the hearer. On the other hand, some maintain that this evidence must be supplied by the speaker. Let us consider these two views in turn. As we saw in Section 1, Reductionists maintain that because a hearer must have positive reasons for accepting a speaker’s testimony, testimonial justification can be reduced to a combination of other epistemic resources that the hearer possesses, i.e., the hearer’s memorial, perceptual, and inferential capacities. For this reason, Reductionists can maintain that a hearer’s testimonially based beliefs are justified by evidence, where this evidence comes from the hearer’s inferences, i.e., inferences from the premise that the speaker said that p, to the conclusion that p is true. However, as we also saw in Section 1, Reductionists face a number of difficult challenges. For this reason, those who are sympathetic to an evidential approach to testimonial justification have offered an alternative account of how our testimonially based beliefs are justified. Instead of thinking about testimonial justification in terms of the evidence that a hearer possesses, this alternative account takes the hearer’s belief to be justified by evidence that is supplied by the speaker.
More specifically, consider The Inheritance View:[7] If a hearer acquires testimonial justification for believing that p on the basis of a speaker’s testimony, then the hearer’s belief that p is justified by whatever evidence is justifying the speaker’s belief that p. (See, e.g., Burge 1993, 1997;[8] McDowell 1994; Owens 2000, 2006; Schmitt 2006; Faulkner 2011; and Wright 2015, 2016b, 2016c, 2019[9]). (It is worth noting that while this debate about evidence and justification is importantly different from the debate between Reductionists and Non-Reductionists, some of the biggest proponents of the Inheritance View also endorse Non-Reductionism, e.g., Burge 1993, 1997.) To begin to get a handle on the Inheritance View, suppose that you are justified in believing that the taco truck is busy because your friend just told you so. And suppose that your friend’s belief is justified by some excellent evidence, i.e., they are standing in front of the truck and can see the long lineup. According to the Inheritance View, the evidence that justifies your belief comes from, or is based on, the very same evidence that justifies your friend’s belief, i.e., your belief is based on your friend’s perception of a huge group of people waiting to order tacos. Or, consider this example from David Owens (2006: 120): Suppose that you are justified in believing that some math theorem, T, is true because you just proved it yourself on the basis of some impeccable a priori reasoning. If you testify to me that T is true such that I come to acquire testimonial justification for believing that this is the case, then according to the Inheritance View, my belief is also based on your impeccable a priori reasoning.[10] Now, while many epistemologists are sympathetic to the idea that your testimonially based beliefs are justified by evidence, they disagree that the evidence in question is literally inherited from the speaker. Here are two reasons why. The first objection starts with the observation that a hearer can acquire testimonial justification for believing p even though the speaker’s evidence does not justify them in believing p. For instance, suppose that after an eye exam your optometrist tells you that your eyes will be dilated for a few hours and that your visual faculties will be unreliable during this time. Suppose also that as you are walking home it appears to you that there is a small puppy playing fetch in a field (= p). Thus, because you completely and irrationally ignore what your doctor said, you believe that p. Finally, suppose that unbeknownst to you, your doctor was a bit off: the effects of the eye medication have worn off, and your eyes are now functioning in a highly reliable way. Here it seems like your total evidence does not justify you in believing p. After all, given what your doctor said, you ought to think that your vision is still unreliable, i.e., your doctor’s testimony provides you with a defeater that makes it irrational for you to believe that what you are looking at is a small puppy (as opposed to, say, a really big kitten or an average sized raccoon). But, suppose that you decide to call and tell me that p anyway. Insofar as your visual faculties are actually working great, and insofar as I have no reason to think that your vision is screwed up, it does seem like I can acquire testimonial justification for believing that p on the basis of your say-so. And herein lies the problem.
For if the Inheritance View is true, then I could not acquire testimonial justification on the basis of what you told me. After all, if your total evidence does not justify you in believing p, and if my belief is literally based on the evidence that you have, then I could not be justified in believing p either. But since I do seem to acquire testimonial justification for believing that p in this case, the Inheritance View is false. (This objection comes from Lackey’s [2008] Persistent Believer case. Graham (2006b) gives a similar objection, and Pelling (2013) offers a case in which a hearer seems to acquire testimonial justification from a speaker who has no good reason to believe what they say, but does so anyway on the basis of an irrational hunch.) To see the second problem with the Inheritance View, notice that a hearer can receive testimony from multiple speakers who each have excellent evidence for believing that p, but where their evidence conflicts in an important sense. For instance, suppose that two detectives are investigating who stole the curry from Sonya’s restaurant. And suppose that the first detective, Dell, has excellent evidence that justifies him in believing that Steph is the only one who committed the crime. Thus, Dell infers that there is exactly one culprit. Moreover, suppose that the second detective, Doris, has excellent evidence that justifies her in believing that Seth is the only one who committed the crime. Thus, Doris also infers that there is exactly one culprit. Now, suppose that while Dell does testify to you that there is exactly one thief, he does not fill you in on the evidence that he has for thinking this. And suppose that while Doris also tells you that there is exactly one thief, she does not fill you in on the evidence that she has for thinking this either. Even so, it seems like you are clearly justified in believing that there is exactly one culprit on the basis of what these detectives have told you. However—and herein lies the problem—if the Inheritance View is true, then it is hard to see how you could be justified in believing this. After all, you have inherited Dell’s evidence for believing that there is exactly one culprit (i.e., his evidence for thinking that Steph is guilty), and you have also inherited Doris’ evidence for thinking that there is exactly one culprit (i.e., her evidence for thinking that Seth is guilty). But taken together, your combined body of evidence conflicts in the sense that it does not justify you in thinking that there is exactly one thief. Thus, the Inheritance View is false. See Leonard (2018).[11] 3.2 Non-Evidential Views Instead of further developing these evidential views, some epistemologists maintain that our testimonially based beliefs are not justified by evidence. More specifically, some argue that testimonial justification should be understood in terms of non-evidential assurances, while others contend that it should be understood in terms of the reliability of the processes that produced the belief in question. Let us consider both of these positions in turn. According to proponents of the Assurance View (also called the Interpersonal View), the problem with all of the theories discussed above is that they do not appreciate the epistemological significance of the interpersonal relationship that obtains between a speaker and their audience in a testimonial exchange.
More specifically, consider The Assurance View: Because of the interpersonal relationship that obtains in a testimonial exchange, if a hearer acquires testimonial justification for believing that p on the basis of a speaker’s say-so, then the hearer’s belief is justified, at least in part,[12] by the speaker’s assurance, where this assurance is non-evidential in nature. (A. Ross 1986; Hinchman 2005, 2014; Moran 2005, 2018; Faulkner 2007, 2011; Zagzebski 2012; and McMyler 2011). In order to get a handle on this view, two things need unpacking. First, how should we understand the nature of the interpersonal relationship that is said to obtain in a testimonial exchange? And second, why is testimonial justification non-evidential in nature? Let us consider these questions in turn. First, proponents of the Assurance View maintain that the speech act of telling is key to understanding the relationship that a speaker has with their audience. This is because when a speaker tells their audience that p is true, they are doing much more than merely uttering p. Rather, they are inviting their audience to trust them that p is true; that is, they are assuring, or guaranteeing, their audience that p is the case. More specifically, in order for a hearer to acquire testimonial justification, the speaker must tell them that p is true, where telling is understood along the following lines: Telling: S tells A that p iff (i) A recognizes that S, in asserting that p, intends: (ii) that A gain access to an epistemic reason to believe that p, (iii) that A recognize S’s (ii)-intention, and (iv) that A gain access to the epistemic reason to believe that p as a direct result of A’s recognition of S’s (ii)-intention (Hinchman 2005: 567). The idea is that when your friend testifies to you that the ice cream shop is open (= p), they are not merely uttering something; rather, they are telling you that p. And by telling you that p, they are thereby assuring you that this really is the case.[13] Thus, when your friend tells you that p, i.e., when conditions (i)–(iv) are satisfied, they have established an important, interpersonal relationship with you, and you alone. This is because you are the only one who has been assured by your friend that p is true. It is in this sense, then, that proponents of the Assurance View maintain that there is an important interpersonal relationship that obtains between a speaker and their audience. This brings us to the second key question about the Assurance View: Even if testimony should be understood in terms of the speech act of telling, why does this mean that testimonial justification cannot be understood in terms of evidence? The idea here is that when your friend tells you that p, they are assuring you that p is true, and that this assurance is what is justifying your belief. Moreover—and this is the key—these assurances are non-evidential in nature.
Here is one way that proponents of the Assurance View have argued for this claim: a piece of evidence, e, counts in favor of a proposition, p, regardless of what anyone intends (e.g., my fingerprint at the ice cream shop is evidence that I was there regardless of whether I wanted to leave the print behind); but a speaker’s assurance that p only counts in favor of p because they intended it to, i.e., a speaker cannot unintentionally assure you of anything; thus, the assurances that justify your testimonially based beliefs are non-evidential in nature.[14] It is for this reason, then, that proponents of the Assurance View maintain that testimonial justification cannot be understood in terms of evidence. However, the Assurance View is not without problems of its own. One objection is that it is unclear how these non-evidential assurances can actually justify one’s belief. For instance, suppose that once again your friend tells you that the ice cream shop is open (= p). But suppose that unbeknownst to both of you, Evelyn is eavesdropping on the conversation. Thus, while your friend does not issue Evelyn an assurance (namely because they do not intend for her to believe what they say and thus fail to satisfy conditions (i)–(iv) in Telling), Evelyn clearly hears what your friend says. Finally, suppose that you and Evelyn are equally reliable consumers of testimony, that both of you have the same background information about your friend, and that neither of you has any reason to doubt what your friend says on this occasion. The key question here is this: Insofar as you and Evelyn both believe that p because of what your friend said, epistemically speaking, is there any sense in which your belief is better off than Evelyn’s? Given the details of the case, it is hard to see what the important difference could be. Thus—and herein lies the problem—even though you were issued an assurance and Evelyn was not, the assurance in question seems epistemically superfluous, i.e., it makes no difference to the epistemic status of one’s belief. Thus, proponents of the Assurance View must explain how assurances can justify one’s beliefs, given that they seem epistemically inert. (This case comes from Lackey 2008. Owens 2006 and Schmitt 2010 raise similar worries). A second problem is that in order to make the case that testimonial justification is non-evidential in nature, proponents of the Assurance View have over-cognized what is involved in a testimonial exchange. To see why, notice that Telling requires that the speaker and the hearer both have the cognitive capacity to recognize that other people have higher-order mental states, i.e., both parties must be cognitively capable of recognizing that people have mental states about other people’s mental states. For instance, in order for you to satisfy all of the conditions in Telling, you must believe (that your friend intends [that you believe (that your friend is intending {that you acquire an epistemic reason for belief because you recognize that your friend is intending to offer one})]). But decades of literature in developmental psychology suggest that for neuro-typical children, the ability to recognize that people have higher-order mental states is not acquired until around age five or six. Moreover, this literature also suggests that for people with autism, the ability to do this is not acquired until much later in life, if it is acquired at all.
Thus, insofar as young children and people with autism can acquire testimonial justification from their parents, say, then the Assurance View should be rejected on the grounds that it problematically excludes these people from acquiring something of epistemic importance. See Leonard (2016). Testimonial Reliabilists also deny that our testimonially based beliefs are justified by evidence. But instead of claiming that they are justified by non-evidential assurances, the idea is that: Testimonial Reliabilism:[15] A hearer’s testimonial justification consists in the reliability of the processes involved in the production of the hearer’s testimonially based belief. (See, e.g., Graham 2000a, 2000b, 2006a;[16] Goldberg 2010a; and Sosa 2010). To get a better handle on this view, suppose that your friend tells you that the concert starts in an hour and that you thereby acquire testimonial justification for believing that this is the case. In very broad strokes, Testimonial Reliabilists can explain the nature of your justification as follows: When it comes to concerts, your friend testifies truly almost all of the time; moreover, you are great at differentiating cases in which your friend is speaking honestly from cases in which she is trying to deceive you; thus, you have testimonial justification in this case because the processes involved in the production and consumption of the testimony in question are highly reliable. It is worth noting that there are at least two important processes involved in a testimonial exchange. First, there are the processes involved in the production of the speaker’s testimony, i.e., the processes that are relevant to the likelihood that the testifier speaks the truth. Second, there are the processes involved in the hearer’s consumption of the testimony, i.e., the processes involved in the hearer being able to monitor for signs that what the speaker says is false or unlikely to be true. For this reason, Testimonial Reliabilism can be developed in a number of importantly different ways. For instance, one could opt for a view according to which a hearer’s testimonial justification for believing that p is only a matter of the reliability of the processes involved in the production of the speaker’s say-so. Or, one could opt for a view according to which testimonial justification only amounts to the reliability of the processes involved in the hearer’s consumption of the speaker’s testimony. Or, one could also opt for a view according to which all of the relevant processes matter. See Graham (2000a, 2000b, 2006a), Goldberg (2010a), and Sosa (2010) for recent defenses of Testimonial Reliabilism, and see Section 4 for additional versions of this view as well. Testimonial Reliabilism is motivated by the considerations that support Reliabilist theories of justification more generally, as well as its ability to avoid the problems that plague the views discussed above. Nevertheless, opponents have argued that Testimonial Reliabilism faces at least two problems of its own. First, insofar as there are at least two processes involved in a testimonial exchange, Testimonial Reliabilists are faced with the substantial challenge of specifying which of these processes are relevant to the hearer’s testimonial justification, i.e., Testimonial Reliabilists must give an account of which processes are relevant here, and they must do so in a way that captures every instance in which a hearer intuitively acquires testimonial justification from a speaker.
(See Wright 2019, who argues that this is not merely an instance of the generality problem that poses a worry for Reliabilist views of justification more generally). Second, consider cases that involve one hearer and two sources of information. For instance, suppose that Rebecca, who is in fact a reliable testifier, tells you that traffic on I405 is bad. And suppose also that Umar, who is in fact an unreliable testifier, tells you that traffic on I90 is all clear. Finally, suppose that you do not have any reason to prefer one source of information over the other, i.e., for all you know, Rebecca and Umar are equally reliable testifiers. Now, consider the versions of Testimonial Reliabilism according to which the processes that are relevant to the acquisition of testimonial justification are those that are involved in the speaker’s production of the testimony in question, as well as the hearer’s ability to discern when the speaker is being sincere. It seems that these Testimonial Reliabilists are committed to giving an asymmetric verdict in this case; that is, because the processes involved in the production of your belief based on Rebecca’s testimony are reliable, and because the processes involved in the production of your belief based on Umar’s testimony are not, this version of Testimonial Reliabilism is committed to the claim that while you do have testimonial justification for believing that the traffic on I405 is bad, you do not have testimonial justification for believing that I90 is all clear. However, opponents have argued that this verdict is highly counterintuitive. After all, how could you possibly be justified in believing Rebecca’s testimony but not Umar’s, given that you have no reason to think that the former is in any way better than the latter? Thus, this version of Testimonial Reliabilism should be rejected. See Barnett (2015). 3.3 Hybrid Views We have seen that the evidential and non-evidential views discussed above offer very different takes on how our testimonially based beliefs are justified. We have also seen that while these views have their advantages, they face some serious problems as well. Consequently, some epistemologists have argued that testimonial justification cannot be explained in a unified way. Instead, the strategy has been to offer hybrid views that combine various components of the accounts discussed above. For instance, some have tried to combine Reductionist and Reliabilist insights such that testimonial justification consists partly in the hearer’s evidence for accepting the speaker’s testimony, and partly in terms of the speaker’s and hearer’s reliability at producing and consuming testimony respectively, e.g., Lackey (2008). Others have tried to combine insights from Reductionism, Reliabilism and the Inheritance View such that a hearer’s belief can be justified by their own evidence for accepting what the speaker says, or by the reliability of the speaker’s testimony, or by inheriting the evidence that is possessed by the speaker, e.g., Wright (2019). (For other hybrid views, see Gerken 2013 and Faulkner 2000). Much of the recent work on testimonial justification concerns whether these hybrid views ultimately succeed, or whether they run into problems of their own. 4. Individualism and Anti-Individualism Consider the Fourth Big Question: Should testimonial justification be understood individualistically, or anti-individualistically?
Some epistemologists endorse Individualism: A complete account of testimonial justification can be given by appealing to features that only have to do with the hearer. Other epistemologists endorse Anti-Individualism: A complete account of testimonial justification cannot be given by only appealing to features that have to do with the hearer. For instance, according to some Anti-Individualists, acquiring testimonial justification involves features having to do with both the hearer and the speaker. And according to other Anti-Individualists, acquiring testimonial justification involves features having to do with both the hearer and the other speakers in the hearer’s local environment. For various defenses of Anti-Individualism, see, e.g., Graham (2000b), Lackey (2008), Goldberg (2010a), Kallestrup and Pritchard (2012), Gerken (2013), Pritchard (2015), and Palermos (forthcoming). (Note: In formulating these two views, I am being deliberately open-ended about how the “features” in question should be understood. As we will see below, this is because the debate between Individualists and Anti-Individualists cuts across the other debates about testimonial justification that we have explored above. Consequently, different members of each camp will want to give importantly different takes on what these features amount to.) 4.1 Individualism Suppose that Amanda tells Scott that the roller rink is open (= p) and that Scott thereby acquires testimonial justification for believing that p. To get a grip on one version of Individualism, recall the Reductionist views discussed in Section 1.1. According to Reductionists, testimonial justification consists in an inference that the hearer makes, i.e., the hearer’s inference from the claim that (a) the speaker said that p to the conclusion that (b) p is true. Thus, Reductionists are Individualists in the following sense: they maintain that whether or not a hearer acquires testimonial justification for believing p depends entirely on features having to do with the hearer, where these features include, e.g., the hearer’s perception of the speaker uttering p, the hearer remembering that testimony is generally reliable, and the hearer inferring on these grounds that what the speaker said on this occasion is likely to be true. To see a second version of Individualism, recall our discussion of Testimonial Reliabilism in Section 3.2.2. According to some (but certainly not all) Testimonial Reliabilists, testimonial justification should be understood Individualistically because it consists only in the reliability of the cognitive processes that are internal to the hearer, i.e., the cognitive processes that take place exclusively in the mind of the hearer herself. See Alvin Goldman (1979, 1986) and Alston (1994, 1995). While we have seen a variety of problems for both of these views above, it is worth considering one challenge to this individualistic version of Testimonial Reliabilism in particular. Doing so will not only help shed light on why some Testimonial Reliabilists opt for an anti-individualistic view, it will also help illustrate how the debate about Individualism and Anti-Individualism cuts across the other debates we have considered above. To begin, consider these two cases from Goldberg (2010a): Good: Wilma has known Fred for a long time; she knows that he is a highly reliable speaker. So when Fred tells her that Barney has been at the stonecutters’ conference all day, Wilma believes him.
(Fred appeared to her as sincere and competent as he normally does, and she found nothing remiss with the testimony.) In point of fact, Fred spoke from knowledge. Bad: Wilma has known Fred for a long time; she knows that he is a highly reliable speaker. So when Fred tells her that Barney has been at the stonecutters’ conference all day, Wilma believes him. (Fred appeared to her as sincere and competent as he normally does, and she found nothing remiss with the testimony.) However, in this case, Fred did not speak from knowledge. Instead, he was just making up a story about Barney, having had ulterior motives in getting Wilma to believe this story. (Fred has never done this before; it is out of his normally reliable character to do such a thing.) Even so, Fred’s speech contribution struck Wilma here, as in the good scenario, as sincere and competent; and she was not epistemically remiss in reaching this verdict… As luck would have it, though, Barney was in fact at the conference all day (though Fred, of course, did not know this). Contrasting these two cases motivates the following line of thought: It seems like Wilma knows that Barney was at the stonecutters’ conference (= p) in Good but not in Bad. It also seems like the cognitive processes that are internal to Wilma are the same across both cases. Thus, insofar as justification is what turns an unGettiered, true belief into knowledge, and insofar as Wilma’s unGettiered, true belief that p amounts to knowledge in Good but not in Bad, the cognitive processes involved in the acquisition of testimonial justification cannot just be the ones that are internal to Wilma. Thus, Testimonial Reliabilists should not endorse Individualism. See Goldberg (2010a) for this argument. 4.2 Anti-Individualism Contrasting the Good and Bad cases has motivated some Testimonial Reliabilists to endorse one version of Anti-Individualism. The core idea here is that insofar as testimonial justification should be understood in terms of the cognitive processes implicated in the production of the hearer’s belief that p, the relevant processes must include both (a) the processes involved in the production of the speaker’s testimony and (b) the processes involved in the hearer’s consumption of what the speaker said. For instance, the cognitive processes internal to Wilma were highly reliable in both Good and Bad, e.g., in both cases she was equally good at monitoring for signs that Fred was being insincere. However, the processes internal to Fred that were implicated in his utterance that p were reliable in Good (i.e., Fred spoke from knowledge) but unreliable in Bad (i.e., Fred uttered that p in an attempt to be deceptive). Thus, by giving an account of testimonial justification that requires both the speaker and hearer to be reliable producers and consumers of testimony respectively, Testimonial Reliabilists who endorse this Anti-Individualistic approach can explain why Wilma’s belief seems better off in Good than it is in Bad. (Goldberg [2010a] defends Anti-Individualism on these grounds, and Graham (2000b) and Lackey (2008) also defend Anti-Individualistic views by requiring that in order for a hearer to acquire testimonial justification, not only does the hearer need to be a reliable consumer of testimony, the speaker needs to be a reliable testifier as well.
Finally, Kallestrup and Pritchard (2012), Gerken (2013), Pritchard (2015), and Palermos (forthcoming) have recently defended versions of Anti-Individualism according to which the testifiers in the hearer’s local environment need to be reliable in order for the hearer to acquire testimonial knowledge from the particular speaker in question). To see a second and importantly different version of Anti-Individualism, recall the Inheritance View from Section 3.1.2. On this view, when a hearer acquires testimonial justification for believing p, this is because they literally inherit the justification that the speaker has for believing p. Thus, proponents of the Inheritance View are Anti-Individualists in the following sense: they maintain that whether or not a hearer acquires testimonial justification for believing p crucially depends on features having to do with the speaker, i.e., whether the speaker has any justification for the hearer to inherit. Whether or not either of these Anti-Individualistic approaches will ultimately succeed is a topic of current debate. Before moving on, it is worth noting that while we have been focusing on testimonial justification, similar debates between Individualists and Anti-Individualists can be had about testimonial knowledge. While many epistemologists endorse Individualism (Anti-Individualism) about both justification and knowledge, one need not do so. For instance, Audi (1997) endorses Reductionism about justification and the Transmission View about knowledge. On this picture, then, Individualism is true with respect to justification because whether or not a hearer acquires testimonial justification depends solely on the inferences that they make. However, Anti-Individualism is true with respect to knowledge because in order for a hearer to acquire testimonial knowledge that p, the speaker must also know that p. Keeping these distinctions in mind further illustrates how the debate between Individualists and Anti-Individualists cuts across so many of the other debates we have seen above. 5. Authoritative Testimony Here is a conversation that we might have: You: This plant is Pacific Poison Oak. Don’t touch it! Me: How do you know that? You: Suneet told me. He lives in this area and knows a little bit about plants. And here is another: You: This plant is Pacific Poison Oak. Don’t touch it! Me: How do you know that? You: Margae told me. She has a PhD in plant biology and studies this plant in particular. In both cases you have acquired testimonial knowledge. But in the second case it seems like your belief is better off, epistemically speaking. This is because in the first case your belief is based on the testimony of a layman who is somewhat knowledgeable about the topic at hand, whereas in the second case your belief is based on the testimony of an epistemic authority (or, someone who is both your epistemic superior and an expert about the domain in question). (See Zagzebski 2012; Jäger 2016; Croce 2018; and Constantin & Grundmann 2020 for more on how the notion of an epistemic authority should be understood.) But how exactly should the difference between epistemic authorities and everyone else be accounted for? Broadly speaking, those working on the epistemology of authoritative testimony endorse one of two accounts: Preemptive Accounts and Non-Preemptive Accounts.
Those who endorse a Preemptive Account of authoritative testimony accept Preemption: The fact that an authority… [testifies] that p is a reason for me to believe that p which replaces my other reasons relevant to p and is not simply added to them. (Zagzebski 2012: 107) The key idea here is that when you get testimony from an authority that p, the authority’s testimony is now the only reason that you have for believing p, i.e., any other reasons you may have had are now preempted in the sense that they no longer count for or against p. Proponents of the Preemptive Account, then, explain the difference between authoritative and non-authoritative testimony as follows: Authoritative testimony can provide you with a preemptive reason for belief, whereas non-authoritative testimony cannot. For defenses of various versions of the Preemptive Account, see Zagzebski (2012, 2014, 2016), Keren (2007, 2014a, 2014b), Croce (2018) and Constantin and Grundmann (2020). See Anderson (2014), Dougherty (2014), Jäger (2016), Dormandy (2018), and Lackey (2018a) for some worries about this view. Those who endorse a Non-Preemptive Account of authoritative testimony argue that Preemption has wildly unintuitive consequences, e.g., if Preemption is true, then you can be justified in believing your pastor (who is otherwise reliable) when he tells you that women are inherently inferior to men (see, e.g., Lackey 2018a). Instead of thinking about authoritative testimony as providing preemptive reasons for belief, proponents of the Non-Preemptive Account take an authority’s testimony that p to provide a very strong reason to believe that p, where this reason is to be added to, or combined with, all of the other reasons that you have related to the proposition in question. See Dormandy (2018) and Lackey (2018a) for defenses of Non-Preemptive Accounts. For related debates about testimony and expertise, see Hardwig’s (1985) seminal paper on expert testimony in general, Alvin Goldman’s (2001) paper on determining which experts to trust when there is disagreement amongst them, and Goldberg’s (2009) paper that links issues in epistemology and philosophy of language by discussing how expert testimony bears on the semantics of technical terms. See also Kitcher (1993), Walton (1997), Brewer (1998) and Golanski (2001) for a discussion of expert testimony in the scientific setting, and for discussion of expert testimony in a legal setting, see Wells and Olson (2003).
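The contrast can be made vivid with a toy credence model; this is only an illustration, and none of the authors above puts things this way. Suppose an authority’s testimony that p carries credence \(q\), and your own prior reasons bearing on p are summarized by credence \(r\). A Non-Preemptive account lets both inputs count, for instance by weighted pooling, whereas Preemption discards \(r\):

\[
\text{Non-Preemptive:}\quad P_{\text{new}}(p) = \lambda\,q + (1 - \lambda)\,r, \;\; 0 < \lambda < 1 \qquad\qquad \text{Preemptive:}\quad P_{\text{new}}(p) = q
\]

On the preemptive scheme your prior reasons make no difference to your new credence, which is exactly the feature that Non-Preemptive theorists find objectionable in cases like Lackey’s pastor example.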
6. Group Testimony While much attention has been paid to issues surrounding individual testimony, i.e., cases in which one speaker tells someone that p is true, recently epistemologists have started exploring a number of related questions regarding group testimony, i.e., cases in which a group testifies to someone that p is true. Here is one case that motivates this line of research. Population Commission: Consider the UN Population Commission that was established by the Economic and Social Council of the United Nations in 1946. The Commission was designed to assist the council by arranging studies and advising the council on population issues, trends, developments, policies, and so on. It is also charged with monitoring the implementation of policies designed by the United Nations to regulate population and to provide recommendations to the council and United Nations as a whole. The commission is composed of 47 members with a representative from almost every country in the United Nations. In 2002, the Commission released a report entitled Charting the Progress of Populations that provides information on 12 socio-economic indicators, including total population, maternal mortality, infant mortality, and so on. (Tollefsen 2007: 300–301) There are three things to notice here. First, consider a particular claim in the Charting the Progress of Populations report. For instance, let p be the claim that while the population in North America has risen, the population in Central America has stayed the same, and the population in South America has declined. At the time the report was released, no single member of the UN Population Commission believed p. That is, none of the committee members were aware that p was true until the report was released and they read it for themselves. Second, and relatedly, before the report was released, none of the committee members had any evidence, or justification, for believing p. That is, while some members might have justifiably believed that the population in North America was on the rise, and while others might have justifiably believed that the population in South America was on the decline, and while others still might have justifiably believed that the population in Central America had stayed the same, given the way in which the labor was divided amongst the researchers, i.e., given that none of them had communicated their findings to one another, nobody had justification for thinking that p itself was true until after the report came out. Third, and finally, the UN Commission did seem to testify that p, i.e., their report did contain the group’s testimony about the population changes in the Americas. (Of course, this is not the only case that motivates the need for an epistemology of group testimony. Wikipedia, for instance, presents a number of interesting questions about what it would take for a group to testify, and when and why we should accept what a group says. See, e.g., Tollefsen 2009; Wray 2009; and Fallis 2008. Cases involving testimony from scientific groups also raise similar issues. See, e.g., Hardwig 1985 and Faulkner 2018). Cases like this give rise to at least five important questions. First, consider How should we understand the relationship between a group’s testimony that p and the testimony of the group’s individual members? On the one hand, Summativists maintain that a group’s testimony that p should be understood in terms of the testimony of some (or most, or all) of its members. On the other hand, Non-Summativists maintain that it is possible for a group to testify that p even if none of its members do. (See Tollefsen (2007) and Lackey (2014) for defenses of different Non-Summative positions). Relatedly, Deflationists maintain that a group’s testimony that p can be reduced to some individuals’ testimony that p (regardless of whether those individuals are members of the group or mere spokesmen), whereas Inflationists maintain that a group itself can be a testifier. (See Tollefsen (2007) for a defense of the latter, and see Lackey (2014) for a deflationary account of the epistemology of group testimony and Lackey (2018a) for an inflationary account of the nature of group assertion). Second, consider Under what conditions is a hearer justified in believing a group’s testimony that p? The debate surrounding this question is analogous to the Reductionist/Non-Reductionist debate about individual testimony in Section 1. See Tollefsen (2007) for a defense of a reductionist view.
Third, consider If you are justified in believing that p on the basis of a group’s testimony, is your belief justified by evidence? The debate surrounding this question is analogous to the debates about individual testimony discussed in Section 3. For instance, suppose that you are justified in believing that p on the basis of a group’s testimony that p. Miranda Fricker (2012) defends an Assurance View according to which your belief is justified by the group’s assurance that p (but see Faulkner (2018) for a criticism of this view). Lackey (2014) defends a reliabilist account according to which your belief is justified by the reliability (or truth conduciveness) of the group’s statement that p (but see Faulkner (2018) for a criticism of this view too). Finally, Faulkner (2018) defends a qualified Inheritance View according to which your belief that p can be justified by the justification that the group has (or at least has access to). Fourth, consider Can group testimony generate knowledge, or can it merely transmit it? The debate surrounding this question is analogous to the debates about individual testimony in Section 2. On the one hand, Faulkner (2018) defends a qualified Transmission View according to which you can only acquire testimonial knowledge and justification from a group’s testimony that p if that group has, or at least has access to, a body of justification that supports p. On the other hand, Lackey (2014) defends a view that is compatible with a group’s testimony generating knowledge and justification. Fifth, and finally, consider What, if anything, does a group’s testimony that p entail about that group’s knowledge (and thus belief) that p? More specifically, suppose that a group testifies that p and that you come to know that p on this basis. Does the fact that you acquired testimonial knowledge in this case entail that groups themselves can be knowers (and thus believers)? On the one hand, John Hardwig (1985) argues for a positive answer here. That is, Hardwig argues that if we acknowledge that groups can testify, we should also acknowledge that groups themselves can be knowers, and thus believers too (see also Lackey (2016) for an argument to the effect that groups can possess justified beliefs). On the other hand, Faulkner (2018) argues against this line of thought and suggests that even if groups can testify, this does not entail that they possess any mental states. Of course, there is much more work that can, and should, be done on the epistemological significance of receiving testimony from groups. 7. The Nature of Testimony Itself Until now we have been operating with an intuitive but inexact notion of what counts as testimony, i.e., for the most part, we have just been looking at cases in which speakers say stuff. But how exactly should the speech act of testimony be understood? That is, how should testimony be individuated from the other things that one can do with their words? One answer is that testimony should simply be identified with assertion, i.e., one testifies that p if, and only if, one asserts that p. (E. Fricker (1987) and Sosa (1994) offer passing remarks in defense of this position). But while it is widely accepted that one must assert that p in order to testify that p, there is much debate about whether asserting that p is sufficient for testifying that p.
(See Goldberg 2010b, though, who argues that asserting that p is not even necessary for testifying that p, and see the entry on Assertion for more about how this speech act should be understood). For instance, in addition to asserting that p, one influential account maintains that in order to testify that p, the following conditions must also be met: Testimony: S testifies by making some statement that p if and only if: (T1) S’s stating that p is evidence that p and is offered as evidence that p. (T2) S has the relevant competence, authority, or credentials to state truly that p. (T3) S’s statement that p is relevant to some disputed or unresolved question (which may or may not be whether p) and is directed to those who are in need of evidence on the matter. (Coady 1992: 42). However, opponents have objected to each of T1–T3. Here is just one example. Some have rejected T1 on the grounds that one can testify that p even though the testimony itself does not provide the hearer with any evidence that p is true, e.g., if I tell you that humans spontaneously combust all the time, and insofar as you know that I am wildly unreliable about this issue, it seems like I have testified to you even though my testimony provides no evidence whatsoever for the proposition in question. (See E. Fricker (1995) and Lackey (2008). See Lackey (2008: Ch. 1) for a discussion of other problems with this view). In light of worries like these, many authors have offered alternative takes on how testimony should be characterized. For instance, E. Fricker (1995: 396–7) argues that testimony should just be understood in a very general sense, with “no restrictions either on the subject matter, or on the speaker’s epistemic relation to it.” (See also Audi (1997) and Sosa (1991) for views in this ballpark). And, as we saw in Section 3.2.1, proponents of the Assurance View understand testimony in terms of Telling. Graham (1997: 227) offers a different account of testimony based on conveying information, i.e., a speaker, S, testifies that p if, and only if, (i) S’s stating that p is offered as evidence that p, (ii) S intends that his audience believe that he has the relevant competence, authority, or credentials to state truly that p, and (iii) S’s statement that p is believed by S to be relevant to some question that he believes is disputed or unresolved (which may or may not be whether p) and is directed at those whom he believes to be in need of evidence on the matter. (J. Ross (1975) and Elgin (2002) also offer accounts that crucially hinge on the speaker’s statement purporting to convey information). And Lackey (2008: 30–32) offers a disjunctive account of testimony according to which we need to distinguish between speaker testimony and hearer testimony as follows. Speaker Testimony: S s-testifies that p by performing an act of communication a if and only if, in performing a, S reasonably intends to convey the information that p (in part) in virtue of a’s communicable content. Hearer Testimony: S h-testifies that p by making an act of communication a if and only if H, S’s hearer, reasonably takes a as conveying the information that p (in part) in virtue of a’s communicable content. One upshot of this disjunctive account is that it captures the sense in which testimony is often an intentional act performed by the speaker, as well as the sense in which testimony is a source of knowledge and justified belief regardless of what the speaker intended to say.
Regardless of how testimony itself should be understood, all of these authors agree that it is possible to learn from the testimony of others. As we have seen, though, explaining how it is that we can learn from what other people tell us has proven to be a difficult task.


Tarski’s Truth Definitions


1. The 1933 programme and the semantic conception In the late 1920s Alfred Tarski embarked on a project to give rigorous definitions for notions useful in scientific methodology. In 1933 he published (in Polish) his analysis of the notion of a true sentence. This long paper undertook two tasks: first to say what should count as a satisfactory definition of ‘true sentence’ for a given formal language, and second to show that there do exist satisfactory definitions of ‘true sentence’ for a range of formal languages. We begin with the first task; Section 2 will consider the second. We say that a language is fully interpreted if all its sentences have meanings that make them either true or false. All the languages that Tarski considered in the 1933 paper were fully interpreted, with one exception described in Section 2.2 below. This was the main difference between the 1933 definition and the later model-theoretic definition of 1956, which we shall examine in Section 3. Tarski described several conditions that a satisfactory definition of truth should meet. 1.1 Object language and metalanguage If the language under discussion (the object language) is \(L\), then the definition should be given in another language known as the metalanguage, call it \(M\). The metalanguage should contain a copy of the object language (so that anything one can say in \(L\) can be said in \(M\) too), and \(M\) should also be able to talk about the sentences of \(L\) and their syntax. Finally Tarski allowed \(M\) to contain notions from set theory, and a 1-ary predicate symbol True with the intended reading ‘is a true sentence of \(L\)’. The main purpose of the metalanguage was to formalise what was being said about the object language, and so Tarski also required that the metalanguage should carry with it a set of axioms expressing everything that one needs to assume for purposes of defining and justifying the truth definition. The truth definition itself was to be a definition of True in terms of the other expressions of the metalanguage. So the definition was to be in terms of syntax, set theory and the notions expressible in \(L\), but not semantic notions like ‘denote’ or ‘mean’ (unless the object language happened to contain these notions). Tarski assumed, in the manner of his time, that the object language \(L\) and the metalanguage \(M\) would be languages of some kind of higher order logic. Today it is more usual to take some kind of informal set theory as one’s metalanguage; this would affect a few details of Tarski’s paper but not its main thrust. Also today it is usual to define syntax in set-theoretic terms, so that for example a string of letters becomes a sequence. In fact one must use a set-theoretic syntax if one wants to work with an object language that has uncountably many symbols, as model theorists have done freely for over half a century now. 1.2 Formal correctness The definition of True should be ‘formally correct’. This means that it should be a sentence of the form For all \(x\), True\((x)\) if and only if \(\phi(x)\), where True never occurs in \(\phi\); or failing this, that the definition should be provably equivalent to a sentence of this form. The equivalence must be provable using axioms of the metalanguage that don’t contain True. Definitions of the kind displayed above are usually called explicit, though Tarski in 1933 called them normal. 1.3 Material adequacy The definition should be ‘materially adequate’ (trafny – a better translation would be ‘accurate’). 
This means that the objects satisfying \(\phi\) should be exactly the objects that we would intuitively count as being true sentences of \(L\), and that this fact should be provable from the axioms of the metalanguage. At first sight this is a paradoxical requirement: if we can prove what Tarski asks for, just from the axioms of the metalanguage, then we must already have a materially adequate formalisation of ‘true sentence of \(L\)’ within the metalanguage, suggesting an infinite regress. In fact Tarski escapes the paradox by using (in general) infinitely many sentences of \(M\) to express truth, namely all the sentences of the form True\((s)\) if and only if \(\psi\), whenever \(s\) is the name of a sentence \(S\) of \(L\) and \(\psi\) is the copy of \(S\) in the metalanguage. So the technical problem is to find a single formula \(\phi\) that allows us to deduce all these sentences from the axioms of \(M\); this formula \(\phi\) will serve to give the explicit definition of True. Tarski’s own name for this criterion of material adequacy was Convention T. More generally his name for his approach to defining truth, using this criterion, was the semantic conception of truth. As Tarski himself emphasised, Convention \(T\) rapidly leads to the liar paradox if the language \(L\) has enough resources to talk about its own semantics. (See the entry on the revision theory of truth.) Tarski’s own conclusion was that a truth definition for a language \(L\) has to be given in a metalanguage which is essentially stronger than \(L\). There is a consequence for the foundations of mathematics. First-order Zermelo-Fraenkel set theory is widely regarded as the standard of mathematical correctness, in the sense that a proof is correct if and only if it can be formalised as a formal proof in set theory. We would like to be able to give a truth definition for set theory; but by Tarski’s result this truth definition can’t be given in set theory itself. The usual solution is to give the truth definition informally in English. But there are a number of ways of giving limited formal truth definitions for set theory. For example Azriel Levy showed that for every natural number \(n\) there is a \(\Sigma_n\) formula that is satisfied by all and only the set-theoretic names of true \(\Sigma_n\) sentences of set theory. The definition of \(\Sigma_n\) is too technical to give here, but three points are worth making. First, every sentence of set theory is provably equivalent to a \(\Sigma_n\) sentence for any large enough \(n\). Second, the class of \(\Sigma_n\) formulas is closed under adding existential quantifiers at the beginning, but not under adding universal quantifiers. Third, the class is not closed under negation; this is how Levy escapes Tarski’s paradox. (See the entry on set theory.) Essentially the same devices allow Jaakko Hintikka to give an internal truth definition for his independence friendly logic; this logic shares the second and third properties of Levy’s classes of formulas. 2. Some kinds of truth definition on the 1933 pattern In his 1933 paper Tarski went on to show that many fully interpreted formal languages do have a truth definition that satisfies his conditions. He gave four examples in that paper. One was a trivial definition for a finite language; it simply listed the finitely many true sentences. One was a definition by quantifier elimination; see Section 2.2 below.
The remaining two, for different classes of language, were examples of what people today think of as the standard Tarski truth definition; they are forerunners of the 1956 model-theoretic definition. 2.1 The standard truth definitions The two standard truth definitions are at first glance not definitions of truth at all, but definitions of a more complicated relation involving assignments \(a\) of objects to variables: the assignment \(a\) satisfies the formula \(F\) (where the symbol ‘\(F\)’ is a placeholder for a name of a particular formula of the object language). In fact satisfaction reduces to truth in this sense: \(a\) satisfies the formula \(F\) if and only if taking each free variable in \(F\) as a name of the object assigned to it by \(a\) makes the formula \(F\) into a true sentence. So it follows that our intuitions about when a sentence is true can guide our intuitions about when an assignment satisfies a formula. But none of this can enter into the formal definition of truth, because ‘taking a variable as a name of an object’ is a semantic notion, and Tarski’s truth definition has to be built only on notions from syntax and set theory (together with those in the object language); recall Section 1.1. In fact Tarski’s reduction goes in the other direction: if the formula \(F\) has no free variables, then to say that \(F\) is true is to say that every assignment satisfies it. The reason why Tarski defines satisfaction directly, and then deduces a definition of truth, is that satisfaction obeys recursive conditions in the following sense: if \(F\) is a compound formula, then to know which assignments satisfy \(F\), it’s enough to know which assignments satisfy the immediate constituents of \(F\). Here are two typical examples: the assignment \(a\) satisfies ‘\(F_1\) and \(F_2\)’ if and only if \(a\) satisfies \(F_1\) and \(a\) satisfies \(F_2\); and \(a\) satisfies ‘For all \(x\), \(G\)’ if and only if every assignment that agrees with \(a\) except perhaps at the variable \(x\) satisfies \(G\). We have to use a different approach for atomic formulas. But for these, at least assuming for simplicity that \(L\) has no function symbols, we can use the metalanguage copies \(\#(R)\) of the predicate symbols \(R\) of the object language. Thus: the assignment \(a\) satisfies ‘\(R(v_1, \ldots, v_n)\)’ if and only if \(\#(R)(a(v_1), \ldots, a(v_n))\). (Warning: the expression \(\#\) is in the metametalanguage, not in the metalanguage \(M\). We may or may not be able to find a formula of \(M\) that expresses \(\#\) for predicate symbols; it depends on exactly what the language \(L\) is.) Subject to the mild reservation in the next paragraph, Tarski’s definition of satisfaction is compositional, meaning that the class of assignments which satisfy a compound formula \(F\) is determined solely by (1) the syntactic rule used to construct \(F\) from its immediate constituents and (2) the classes of assignments that satisfy these immediate constituents. (This is sometimes phrased loosely as: satisfaction is defined recursively. But this formulation misses the central point, that (1) and (2) don’t contain any syntactic information about the immediate constituents.) Compositionality explains why Tarski switched from truth to satisfaction. You can’t define whether ‘For all \(x\), \(G\)’ is true in terms of whether \(G\) is true, because in general \(G\) has a free variable \(x\) and so it isn’t either true or false. The reservation is that Tarski’s definition of satisfaction in the 1933 paper doesn’t in fact mention the class of assignments that satisfy a formula \(F\). Instead, as we saw, he defines the relation ‘\(a\) satisfies \(F\)’, which determines what that class is. This is probably the main reason why some people (including Tarski himself in conversation, as reported by Barbara Partee) have preferred not to describe the 1933 definition as compositional.
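Before returning to the compositional ‘class format’ below, it may help to see the recursion at work computationally. The following is a minimal sketch with Python standing in for an informal set-theoretic metalanguage; the tuple encoding of formulas, the names satisfies and true_sentence, and the sample domain and predicates are all supplied here for illustration and are not from Tarski’s paper.

DOMAIN = {0, 1, 2, 3}

# Metalanguage 'copies' #(R) of the object-language predicate symbols.
PRED = {
    'Even': lambda x: x % 2 == 0,
    'Less': lambda x, y: x < y,
}

def satisfies(a, F):
    # a is an assignment: a dict sending variable names to objects in DOMAIN.
    # F is a formula, encoded as a nested tuple.
    op = F[0]
    if op == 'atom':                  # ('atom', R, v1, ..., vn)
        R, args = F[1], F[2:]
        return PRED[R](*(a[v] for v in args))
    if op == 'not':                   # ('not', G)
        return not satisfies(a, F[1])
    if op == 'and':                   # ('and', G1, G2)
        return satisfies(a, F[1]) and satisfies(a, F[2])
    if op == 'forall':                # ('forall', x, G)
        _, x, G = F
        # a satisfies 'for all x, G' iff every assignment agreeing with a
        # except perhaps at x satisfies G.
        return all(satisfies({**a, x: d}, G) for d in DOMAIN)
    raise ValueError('unknown operator: %r' % (op,))

def true_sentence(F):
    # A sentence (no free variables) is true iff every assignment satisfies
    # it; since its value ignores the assignment, one assignment suffices.
    return satisfies({}, F)

print(satisfies({'x': 1, 'y': 3}, ('atom', 'Less', 'x', 'y')))   # True
print(true_sentence(('forall', 'x',
      ('not', ('and', ('atom', 'Even', 'x'),
               ('not', ('atom', 'Even', 'x')))))))               # True

Note that the clauses for ‘and’, ‘not’ and ‘forall’ use no syntactic information about the subformulas beyond which assignments satisfy them, which is the compositionality point just made.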
But the class format, which is compositional on any reckoning, does appear in an early variant of the truth definition in Tarski’s paper of 1931 on definable sets of real numbers. Tarski had a good reason for preferring the format ‘\(a\) satisfies \(F\)’ in his 1933 paper, namely that it allowed him to reduce the set-theoretic requirements of the truth definition. In sections 4 and 5 of the 1933 paper he spelled out these requirements carefully. The name ‘compositional(ity)’ first appears in papers of Putnam in 1960 (published 1975) and Katz and Fodor in 1963 on natural language semantics. In talking about compositionality, we have moved to thinking of Tarski’s definition as a semantics, i.e. a way of assigning ‘meanings’ to formulas. (Here we take the meaning of a sentence to be its truth value.) Compositionality means essentially that the meanings assigned to formulas give at least enough information to determine the truth values of sentences containing them. One can ask conversely whether Tarski’s semantics provides only as much information as we need about each formula, in order to reach the truth values of sentences. If the answer is yes, we say that the semantics is fully abstract (for truth). One can show fairly easily, for any of the standard languages of logic, that Tarski’s definition of satisfaction is in fact fully abstract. As it stands, Tarski’s definition of satisfaction is not an explicit definition, because satisfaction for one formula is defined in terms of satisfaction for other formulas. So to show that it is formally correct, we need a way of converting it to an explicit definition. One way to do this is as follows, using either higher order logic or set theory. Suppose we write \(S\) for a binary relation between assignments and formulas. We say that \(S\) is a satisfaction relation if for every formula \(G\), \(S\) meets the conditions laid down for satisfaction of \(G\) by Tarski’s definition. For example, if \(G\) is ‘\(G_1\) and \(G_2\)’, \(S\) should satisfy the following condition for every assignment \(a\): \(S(a, G)\) if and only if both \(S(a, G_1)\) and \(S(a, G_2)\). We can define ‘satisfaction relation’ formally, using the recursive clauses and the conditions for atomic formulas in Tarski’s recursive definition. Now we prove, by induction on the complexity of formulas, that there is exactly one satisfaction relation \(S\). (There are some technical subtleties, but it can be done.) Finally we define \(a\) satisfies \(F\) if and only if: there is a satisfaction relation \(S\) such that \(S(a,F)\). It is then a technical exercise to show that this definition of satisfaction is materially adequate. Actually one must first write out the counterpart of Convention \(T\) for satisfaction of formulas, but I leave this to the reader. 2.2 The truth definition by quantifier elimination The remaining truth definition in Tarski’s 1933 paper – the third in order of appearance – is really a bundle of related truth definitions, all for the same object language \(L\) but in different interpretations. The quantifiers of \(L\) are assumed to range over a particular class, call it \(A\); in fact they are second order quantifiers, so that really they range over the collection of subclasses of \(A\). The class \(A\) is not named explicitly in the object language, and thus one can give separate truth definitions for different values of \(A\), as Tarski proceeds to do.
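Before following Tarski through the details, it may help to see concretely what ‘separate truth definitions for different values of \(A\)’ comes to when \(A\) is finite. The following brute-force sketch is purely illustrative (the encoding and the names holds, subclasses, empty, atom and two_atoms are supplied here, not drawn from Tarski or Skolem): it evaluates sentences whose quantified variables range over the subclasses of a finite domain. The quantifier-elimination method described next replaces this exponential search by counting, which is why the verdict will turn out to depend only on how many elements \(A\) has.

from itertools import combinations

def subclasses(A):
    # All subclasses of the finite domain A.
    A = sorted(A)
    return [frozenset(c) for r in range(len(A) + 1)
            for c in combinations(A, r)]

def holds(env, A, F):
    # env maps class variables to subclasses of A; F is a nested tuple.
    op = F[0]
    if op == 'subseteq': return env[F[1]] <= env[F[2]]
    if op == 'eq':       return env[F[1]] == env[F[2]]
    if op == 'not':      return not holds(env, A, F[1])
    if op == 'and':      return holds(env, A, F[1]) and holds(env, A, F[2])
    if op == 'or':       return holds(env, A, F[1]) or holds(env, A, F[2])
    if op in ('exists', 'forall'):
        _, v, G = F
        values = (holds({**env, v: z}, A, G) for z in subclasses(A))
        return any(values) if op == 'exists' else all(values)
    raise ValueError(op)

def empty(v):
    # v is empty iff v is included in every subclass.
    return ('forall', '_u', ('subseteq', v, '_u'))

def atom(v):
    # v is a singleton: nonempty, and every subclass of v is v or empty.
    return ('and', ('not', empty(v)),
            ('forall', '_w', ('or', ('not', ('subseteq', '_w', v)),
                              ('or', ('eq', '_w', v), empty('_w')))))

# 'There are two distinct atoms': correct in A iff A has at least 2 elements.
two_atoms = ('exists', 'z', ('exists', 'w',
             ('and', ('and', atom('z'), atom('w')),
              ('not', ('eq', 'z', 'w')))))

for n in range(4):
    print(n, holds({}, set(range(n)), two_atoms))  # False, False, True, True

Already in this toy setting the printed verdict depends only on the size of the domain, which is exactly what the lemma below exploits.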
So for this section of the paper, Tarski allows one and the same sentence to be given different interpretations; this is the exception to the general claim that his object language sentences are fully interpreted. But Tarski stays on the straight and narrow: he talks about ‘truth’ only in the special case where \(A\) is the class of all individuals. For other values of \(A\), he speaks not of ‘truth’ but of ‘correctness in the domain \(A\)’. These truth or correctness definitions don’t fall out of a definition of satisfaction. In fact they go by a much less direct route, which Tarski describes as a ‘purely accidental’ possibility that relies on the ‘specific peculiarities’ of the particular object language. It may be helpful to give a few more of the technical details than Tarski does, in a more familiar notation than Tarski’s, in order to show what is involved. Tarski refers his readers to a paper of Thoralf Skolem in 1919 for the technicalities. One can think of the language \(L\) as the first-order language with predicate symbols \(\subseteq\) and =. The language is interpreted as talking about the subclasses of the class \(A\). In this language we can define, for each natural number \(k\), sentences expressing ‘There are exactly \(k\) elements in \(A\)’ and formulas expressing ‘There are exactly \(k\) elements that are in \(v_1\) and not in \(v_2\)’ (and similarly for other such combinations of variables). Now we aim to prove: Lemma. Every formula \(F\) of \(L\) is equivalent to (i.e. is satisfied by exactly the same assignments as) some boolean combination of sentences of the form ‘There are exactly \(k\) elements in \(A\)’ and formulas of the form ‘There are exactly \(k\) elements that are in \(v_1\), not in \(v_2\), not in \(v_3\) and in \(v_4\)’ (or any other combination of this type, using only variables free in \(F\)). The proof is by induction on the complexity of formulas. For atomic formulas it is easy. For boolean combinations of formulas it is easy, since a boolean combination of boolean combinations is again a boolean combination. For formulas beginning with \(\forall\), we take the negation. This leaves just one case that involves any work, namely the case of a formula beginning with an existential quantifier. By induction hypothesis we can replace the part after the quantifier by a boolean combination of formulas of the kinds stated. So a typical case might be: \(\exists z\) (there are exactly two elements that are in \(z\) and \(x\) and not in \(y\)). This holds if and only if there are at least two elements that are in \(x\) and not in \(y\). We can write this in turn as: ‘The number of elements in \(x\) and not in \(y\) is not 0 and is not 1’, which is a boolean combination of allowed formulas. The general proof is very similar but more complicated. When the lemma has been proved, we look at what it says about a sentence. Since the sentence has no free variables, the lemma tells us that it is equivalent to a boolean combination of statements saying that \(A\) has a given finite number of elements. So if we know how many elements \(A\) has, we can immediately calculate whether the sentence is ‘correct in the domain \(A\)’. One more step and we are home. As we prove the lemma, we should gather up any facts that can be stated in \(L\), are true in every domain, and are needed for proving the lemma. For example we shall almost certainly need the sentence saying that \(\subseteq\) is transitive. Write \(T\) for the set of all these sentences. (In Tarski’s presentation \(T\) vanishes, since he is using higher order logic and the required statements about classes become theorems of logic.) Thus we reach, for example: Theorem.
If the domain \(A\) is infinite, then a sentence \(S\) of the language \(L\) is correct in \(A\) if and only if \(S\) is deducible from \(T\) and the sentences saying that the number of elements of \(A\) is not any finite number. The class of all individuals is infinite (Tarski asserts), so the theorem applies when \(A\) is this class. And in this case Tarski has no inhibitions about saying not just ‘correct in \(A\)’ but ‘true’; so we have our truth definition. The method we have described revolves almost entirely around removing existential quantifiers from the beginnings of formulas; so it is known as the method of quantifier elimination. It is not as far as you might think from the two standard definitions. In all cases Tarski assigns to each formula, by induction on the complexity of formulas, a description of the class of assignments that satisfy the formula. In the two previous truth definitions this class is described directly; in the quantifier elimination case it is described in terms of a boolean combination of formulas of a simple kind. At around the same time as he was writing the 1933 paper, Tarski gave a truth definition by quantifier elimination for the first-order language of the field of real numbers. In his 1931 paper it appears only as an interesting way of characterising the set of relations definable by formulas. Later he gave a fuller account, emphasising that his method provided not just a truth definition but an algorithm for determining which sentences about the real numbers are true and which are false. 3. The 1956 definition and its offspring In 1933 Tarski assumed that the formal languages that he was dealing with had two kinds of symbol (apart from punctuation), namely constants and variables. The constants included logical constants, but also any other terms of fixed meaning. The variables had no independent meaning and were simply part of the apparatus of quantification. Model theory by contrast works with three levels of symbol. There are the logical constants (\(=\), \(\neg\) and \(\&\), for example), the variables (as before), and between these a middle group of symbols which have no fixed meaning but get a meaning through being applied to a particular structure. The symbols of this middle group include the nonlogical constants of the language, such as relation symbols, function symbols and constant individual symbols. They also include the quantifier symbols \(\forall\) and \(\exists\), since we need to refer to the structure to see what set they range over. This type of three-level language corresponds to mathematical usage; for example we write the addition operation of an abelian group as +, and this symbol stands for different functions in different groups. So one has to work a little to apply the 1933 definition to model-theoretic languages. There are basically two approaches: (1) Take one structure \(A\) at a time, and regard the nonlogical constants as constants, interpreted in \(A\). (2) Regard the nonlogical constants as variables, and use the 1933 definition to describe when a sentence is satisfied by an assignment of the ingredients of a structure \(A\) to these variables. There are problems with both these approaches, as Tarski himself describes in several places. The chief problem with (1) is that in model theory we very frequently want to use the same language in connection with two or more different structures – for example when we are defining elementary embeddings between structures (see the entry on first-order model theory).
The problem with (2) is more abstract: it is disruptive and bad practice to talk of formulas with free variables being ‘true’. (We saw in Section 2.2 how Tarski avoided talking about truth in connection with sentences that have varying interpretations.) What Tarski did in practice, from the appearance of his textbook in 1936 to the late 1940s, was to use a version of (2) and simply avoid talking about model-theoretic sentences being true in structures; instead he gave an indirect definition of what it is for a structure to be a ‘model of’ a sentence, and apologised that strictly this was an abuse of language. (Chapter VI of Tarski 1994 still contains relics of this old approach.) By the late 1940s it had become clear that a direct model-theoretic truth definition was needed. Tarski and colleagues experimented with several ways of casting it. The version we use today is based on that published by Tarski and Robert Vaught in 1956. See the entry on classical logic for an exposition. The right way to think of the model-theoretic definition is that we have sentences whose truth value varies according to the situation where they are used. So the nonlogical constants are not variables; they are definite descriptions whose reference depends on the context. Likewise the quantifiers have this indexical feature, that the domain over which they range depends on the context of use. In this spirit one can add other kinds of indexing. For example a Kripke structure is an indexed family of structures, with a relation on the index set; these structures and their close relatives are fundamental for the semantics of modal, temporal and intuitionist logic. Already in the 1950s model theorists were interested in formal languages that include kinds of expression different from anything in Tarski’s 1933 paper. Extending the truth definition to infinitary logics was no problem at all. Nor was there any serious problem about most of the generalised quantifiers proposed at the time. For example there is a quantifier \(Qxy\) with the intended meaning: \(QxyF(x,y)\) if and only if there is an infinite set \(X\) of elements such that for all \(a\) and \(b\) in \(X, F(a,b)\). This definition itself shows at once how the required clause in the truth definition should go. In 1961 Leon Henkin pointed out two sorts of model-theoretic language that didn’t immediately have a truth definition of Tarski’s kind. The first had infinite strings of quantifiers: \(\forall v_1 \exists v_2 \forall v_3 \exists v_4 \ldots F(v_1, v_2, v_3, v_4, \ldots)\). The second had quantifiers that are not linearly ordered. For ease of writing I use Hintikka’s later notation for these: \(\forall v_1 \exists v_2 \forall v_3 (\exists v_4 / \forall v_1)\, F(v_1, v_2, v_3, v_4)\). Here the slash after \(\exists v_4\) means that this quantifier is outside the scope of the earlier quantifier \(\forall v_1\) (and also outside that of the earlier existential quantifier). Henkin pointed out that in both cases one could give a natural semantics in terms of Skolem functions. For example the second sentence can be paraphrased as \(\exists f \exists g\, \forall v_1 \forall v_3\, F(v_1, f(v_1), v_3, g(v_3))\), which has a straightforward Tarski truth condition in second order logic. Hintikka then observed that one can read the Skolem functions as winning strategies in a game, as in the entry on logic and games. In this way one can build up a compositional semantics, by assigning to each formula a game. A sentence is true if and only if the player Myself (in Hintikka’s nomenclature) has a winning strategy for the game assigned to the sentence. This game semantics agrees with Tarski’s on conventional first-order sentences.
But it is far from fully abstract; probably one should think of it as an operational semantics, describing how a sentence is verified rather than whether it is true. The problem of giving a Tarski-style semantics for Henkin’s two languages turned out to be different in the two cases. With the first, the problem is that the syntax of the language is not well-founded: there is an infinite descending sequence of subformulas as one strips off the quantifiers one by one. Hence there is no hope of giving a definition of satisfaction by recursion on the complexity of formulas. The remedy is to note that the explicit form of Tarski’s truth definition in Section 2.1 above didn’t require a recursive definition; it needed only that the conditions on the satisfaction relation \(S\) pin it down uniquely. For Henkin’s first style of language this is still true, though the reason is no longer the well-foundedness of the syntax. For Henkin’s second style of language, at least in Hintikka’s notation (see the entry on independence friendly logic), the syntax is well-founded, but the displacement of the quantifier scopes means that the usual quantifier clauses in the definition of satisfaction no longer work. To get a compositional and fully abstract semantics, one has to ask not what assignments of variables satisfy a formula, but what sets of assignments satisfy the formula ‘uniformly’, where ‘uniformly’ means ‘independent of assignments to certain variables, as shown by the slashes on quantifiers inside the formula’. (Further details of revisions of Tarski’s truth definition along these lines are in the entry on dependence logic.) Henkin’s second example is of more than theoretical interest, because clashes between the semantic and the syntactic scope of quantifiers occur very often in natural languages.
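To see Henkin’s Skolem-function reading at work, here is a minimal brute-force sketch over a small finite domain; the name branching_true, the domain D and the example matrices are supplied here for illustration. It counts the branching sentence displayed above as true exactly when there are functions \(f\) and \(g\) with \(F(v_1, f(v_1), v_3, g(v_3))\) for all \(v_1\) and \(v_3\), so that \(v_2\) depends only on \(v_1\) and \(v_4\) only on \(v_3\).

from itertools import product

D = range(3)

def branching_true(F):
    # True iff there are Skolem functions f, g : D -> D such that
    # F(v1, f(v1), v3, g(v3)) holds for all v1, v3 in D. Each candidate
    # function is encoded as a tuple t, with t[i] its value at i.
    return any(
        all(F(v1, f[v1], v3, g[v3]) for v1 in D for v3 in D)
        for f in product(D, repeat=len(D))
        for g in product(D, repeat=len(D))
    )

# 'v2 = v1 and v4 = v3': witnessed by taking f and g to be the identity.
print(branching_true(lambda v1, v2, v3, v4: v2 == v1 and v4 == v3))  # True
# 'v4 = v1' fails: v4 may depend only on v3, never on v1.
print(branching_true(lambda v1, v2, v3, v4: v4 == v1))               # False

Allowing \(g\) to depend on both \(v_1\) and \(v_3\) would restore the ordinary linearly ordered reading; the slash notation marks exactly this difference.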


Supervenience in Ethics


1. Theorizing Ethical Supervenience Many philosophers hope to make significant arguments about ethics using ethical supervenience as a premise. However, there are many distinct ethical supervenience theses that philosophers might be interested in. Understanding the differences between these theses can help to clarify which of them deserve our allegiance. It is also important because different supervenience theses will support quite different arguments about ethics. To begin, it is worth briefly characterizing certain core features of supervenience relations, as they are now standardly understood in metaphysics (see, e.g., the entry on supervenience). Supervenience relations are typically understood as relations between pairs of classes of properties. Consider the claim that a certain class of properties—the A-properties—cannot vary without the B-properties also varying. In this claim, we can call the A-properties the supervening properties, and the B-properties the subvening or base properties. Supervenience relations are covariance relations that have three logical features: they are reflexive, transitive, and non-symmetric. The claim that supervenience is reflexive means that every set of properties supervenes on itself: for any class of properties A, there can be no difference in the A-properties without a difference in the A-properties. The claim that supervenience is transitive means that if the A-properties supervene on the B-properties, and the B-properties supervene on the C-properties, then the A-properties supervene on the C-properties. The claim that supervenience is non-symmetric means that supervenience is compatible with either symmetry (A supervenes on B and B supervenes on A; as in the case of the ethical and itself) or asymmetry (A supervenes on B but B does not supervene on A; as may be the case between the biological and the microphysical). These claims reflect how use of the word ‘supervenience’ has come to be usefully regimented in contemporary metaphysics. It is worth emphasizing this point, because there is a significant history of the word being used in ways that depart from this contemporary orthodoxy. For example, for a time it was quite common both in metaphysics and in ethics for ‘supervenience’ to be used to mark an asymmetrical dependence relation. Such uses are, however, inconsistent with the contemporary regimentation. This is a point about terminological clarity, not a substantive barrier to discussing such asymmetric relations. For example, one could name the asymmetric relation that holds when A supervenes on B but B does not supervene on A. Or one could name the relation that holds when the supervenience of A on B is accompanied by an adequate explanation. One influential variant of the latter sort of explanatory relation has been dubbed ‘superdupervenience’ (Horgan 1993, 566). More recently, many philosophers have suggested that a certain asymmetric dependence relation—grounding—is of central importance to our metaphysical theorizing. (For discussion, see the entry on metaphysical grounding.) Given the standard contemporary regimentation, however, supervenience claims state a certain pattern of covariation between classes of properties; they do not purport to explain that pattern, as a grounding or superdupervenience thesis would (compare DePaul 1987). This point is crucial to several arguments from ethical supervenience, as we will see below.
These clarifying remarks put us in a position to introduce four central questions that can be used to develop alternative supervenience theses: What does the ethical supervene on? What is the logical structure of the supervenience relation? What is the modal strength of that relation? And is the relevant supervenience thesis ontological or ascriptive? The next four subsections consider these questions in turn. Before turning to these questions, it is worth briefly highlighting a different issue: which class of supervening properties to focus on? A survey of the literature provides a variety of suggestions: relevant supervening properties are characterized as ethical, moral, evaluative, or normative. The nature of each of these categories, and the relationship between them, are both controversial. For example, some philosophers will question the normative authority of morality, while others will think of normativity as a very broad tent, including any rule- or convention-governed activity, such as chess or etiquette. This entry will not explore these interesting issues (see Baker 2017 for discussion). Instead, it will provisionally assume that the significance of supervenience is similar for each of these classes of properties. For the sake of uniformity, the entry will focus on ethical properties throughout. 1.1 What does the ethical supervene on? Somewhat surprisingly, the idea of ethical supervenience can be made to seem plausible despite the fact that it is difficult to provide a characterization of what the ethical supervenes on that is at once uncontroversial and theoretically interesting (see Section 5.4 for further discussion of this point). This section briefly sketches the options for characterizing what the ethical supervenes on, and some difficulties that these options face. The thesis used to introduce supervenience above—Initial—suggested that the ethical supervenes on the natural properties. This is the most common way of characterizing ethical supervenience in the literature. However, there are at least two difficulties with this idea. The first difficulty is ambiguity: the term ‘natural’ has been characterized in wildly varying terms in metaethics (see the introductory section of the entry on moral non-naturalism for a brief survey of characterizations of the natural; see McPherson 2015, §3–4 for one constructive proposal). The second difficulty is that on many conceptions of the natural there will be counterexamples to Initial. For example, many philosophers want to contrast natural properties with supernatural properties. Even if we assume that there are no actually instantiated supernatural properties, we might allow that such entities are possible. But this might in turn seem to suggest that two possible states of affairs could be naturalistically identical, but ethically different. For example, they might be different because of ethically significant interactions between supernatural beings (Klagge 1984, 374–5; for some complications see McPherson 2015, 134–5). This sort of worry might lead one to reject as misguided the common assumption that the ethical supervenes on the natural; instead, one might propose that the ethical supervenes on the non-ethical. This might seem promising: the point of the embezzling bank manager case might seem to be that there would need to be some non-ethical difference between cases—natural or not—in order for there to be an ethical difference in the bank manager’s actions. However, there is an important worry about this way of characterizing the supervenience base (compare Sturgeon 2009, 70–72), which can be brought out briefly by example. Some philosophers are sympathetic to ambitious reductive hypotheses about ethics.
On one such hypothesis, the ethical property of goodness is just identical to the property of pleasantness. Because identicals have all of the same properties, this would entail that pleasantness is an ethical property. Some philosophers also think that certain experiential or “phenomenal” properties, such as pleasantness, are metaphysically fundamental, such that two possible circumstances could differ only in how much pleasantness they contained. Together, these points entail the conclusion that two worlds could differ from each other solely in an ethical respect: how much goodness/pleasantness they include. This is inconsistent with the supervenience of the ethical on the non-ethical, but it is not clear that we should be prepared to dismiss out of hand the assumptions that generate this conclusion. This might in turn lead us to think that there can at least be reasonable controversy concerning the supervenience of the ethical on the non-ethical. One can avoid this problem by proposing that the ethical supervenes on the distribution of all of the properties. But this formulation purchases plausibility at the price of triviality. Ethical differences are differences, so there can obviously be no ethical difference without some difference. In light of its triviality, this sort of supervenience thesis fails to identify anything in ethical supervenience that is of philosophical interest. An influential alternative way of characterizing what the ethical supervenes on begins with a distinction in language. Some philosophers think that we can intuitively distinguish broadly evaluative predicates (like ‘is right’, ‘is good’, ‘is virtuous’, etc.) from descriptive predicates (like ‘is round’, ‘is accelerating’, ‘is a badger’, etc.). We can then ask about the relationship between the properties that are picked out by these two sets of predicates. Frank Jackson has argued that this allows us to state an ethical supervenience thesis: there is no possible difference that can be stated using evaluative predicates between states that are identical with respect to all properties picked out by descriptive predicates (1998, 118–125). Jackson’s proposal seemingly avoids triviality, because evaluative and descriptive predicates appear to be distinct. However, the detour through language faces significant challenges. One challenge concerns the expressive power of a language like ours: if it is limited, then there seemingly might be ethical differences between states of affairs that are not correlated with descriptive differences expressible in a language like ours (for related worries, see Sturgeon 2009, 73–79). A second challenge questions whether the distinction between description and evaluation is characteristically a distinction in the semantic properties of predicates, as Jackson assumes. On one contrasting view, evaluation might instead characteristically be a pragmatic property of whole speech acts (see Väyrynen 2013b for extended defense of this idea for the case of “thick” evaluation). In the face of these difficulties, some philosophers have sought to develop accounts of the class of properties subvening the ethical that are substantive enough for ethical supervenience to do dialectical work, but that avoid some of the difficulties just sketched. For example, it has been proposed that the ethical supervenes on the disjunctive class of non-ethical or descriptive properties (Ridge 2007).
In the context of discussing arguments concerning supervenience and non-naturalism, it has been proposed that the ethical supervenes on the set of properties that are not ethical properties as those are understood by the non-naturalist (McPherson 2012). There is a cross-cutting distinction that may be important for our thinking about the supervenience of the ethical. Most properties are repeatable, in the sense that they can be possessed by distinct possible individuals. But some properties are not repeatable. For example, the property of being identical to Emad Atiq is not repeatable: it can only be borne by a single individual, across modal space. It appears plausible that the ethical properties supervene on a set of repeatable properties (Atiq forthcoming). As this brief survey makes clear, it is not obvious how to characterize what the ethical supervenes on, in a way that makes an ethical supervenience thesis both plausible and theoretically interesting. Now that the difficulties here have been made clear (especially by Sturgeon 2009), this is an important potential locus for future research. The following discussion largely sets aside these debates, speaking of the supervenience of the ethical properties on the base properties, where ‘base’ serves as a placeholder for a more illuminating characterization of the class of properties that subvene the ethical. 1.2 The structure of ethical supervenience There are many possible structures of covariation that have been called supervenience theses in the metaphysics literature. For our purposes, it will be convenient to distinguish four of the most influential formulations. (The literature on supervenience contains several other variations; see the entry on supervenience for an excellent introduction, from which this entry adopts some of the formulations below. That entry also has very helpful discussion of the contrast between supervenience and certain other metaphysical relations with which it is often associated. The contrast between supervenience and the closely-related notion of entailment, discussed in section 3.2 of the entry on supervenience, is especially germane to the topic of this subsection.) One important structural distinction concerns whether a thesis makes claims about the properties of individuals (individual supervenience theses), or is cast in terms of the character of whole possible worlds (global supervenience theses). The ethical properties globally supervene on the base properties just in case: Global Every pair of possible worlds that has exactly the same world-wide pattern of distribution of base properties also has exactly the same world-wide pattern of distribution of ethical properties (cf. the entry on supervenience). Individual supervenience theses are so-called because they explicitly state patterns of instantiation of properties by individuals (rather than across whole possible worlds). There are two prominent sorts of individual supervenience theses in the literature. The ethical properties weakly supervene on the base properties just in case: Weak Necessarily, if anything x has some ethical property F, then there is at least one base property G such that x has G, and everything that has G has F (cf. the entry on supervenience). The ethical properties strongly supervene on the base properties just in case: Strong Necessarily, if anything x has some ethical property F, then there is at least one base property G such that x has G, and necessarily everything that has G has F (cf. the entry on supervenience).
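It can help to display Weak and Strong in the notation of quantified modal logic. The following regimentation is supplied here for illustration (it is standard in the supervenience literature, but the symbols \(\mathcal{E}\) for the class of ethical properties and \(\mathcal{B}\) for the class of base properties are not the entry’s own):

\[ \text{Weak:}\quad \Box\, \forall x\, \forall F \in \mathcal{E}\, \bigl( Fx \rightarrow \exists G \in \mathcal{B}\, ( Gx \wedge \forall y\, ( Gy \rightarrow Fy ) ) \bigr) \]

\[ \text{Strong:}\quad \Box\, \forall x\, \forall F \in \mathcal{E}\, \bigl( Fx \rightarrow \exists G \in \mathcal{B}\, ( Gx \wedge \Box\, \forall y\, ( Gy \rightarrow Fy ) ) \bigr) \]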
The crucial difference between Strong and Weak supervenience is the second necessity operator in Strong. An example will make the difference here vivid: weak ethical supervenience is compatible with it being a brute fact that there are both “utilitarian” possible worlds where rightness covaries uniformly with happiness maximization, and “Kantian” possible worlds, where rightness covaries uniformly with satisfying the categorical imperative. By contrast, strong supervenience denies this possibility. It is generally agreed that strong supervenience entails global supervenience and weak supervenience; there is considerable controversy about whether global supervenience entails strong supervenience (see §4.3 of the entry on supervenience). Consider another important individual ethical supervenience relation, inspired by Brian McLaughlin (1995, 24) but stated less technically: Strong Intuitive If two possible entities are alike in all base respects, they are alike in all ethical respects. If we interpret ‘possible’ here as representing metaphysical modality, both McLaughlin and Jaegwon Kim (1993, 81) note that the Strong and Strong Intuitive supervenience relations are equivalent. However, Section 2 below will show that if we reinterpret the modalities involved, these theses will no longer be equivalent. 1.3 The modal strength of ethical supervenience So far this entry has talked freely of necessity, possibility, and possible worlds. However, one can use such talk to discuss importantly different modal standards: for example, philosophers talk of logical necessity, conceptual necessity, metaphysical necessity, nomic necessity, and normative necessity. The aim of this section is to briefly orient readers to each of these notions. To begin, consider some examples: (1) All bachelors are bachelors. (2) All bachelors are unmarried. (3) Nothing travels faster than the speed of light. (4) Gold has atomic number 79. (5) Pain is bad. On one traditional gloss, a sentence is logically necessary if it would remain true given any uniform and grammatically legitimate reinterpretation of the non-logical expressions of that sentence. Sentence (1) is a promising example: the only non-logical word in (1) is ‘bachelor’, and any uniform and grammatically appropriate interpretation of ‘bachelor’ in (1) will result in a true sentence. (For more on logical truths, see the entry on logical truth. Section 1.1 of that entry discusses the alleged modal force of logical truths.) By contrast, (2) is not a logical truth: one could easily hold fixed its logical structure, but vary the meaning of ‘bachelor’ or ‘unmarried’ and thereby produce a false sentence. However, (2) is a promising candidate to be conceptually necessary. On one gloss, a sentence is conceptually necessary (or “analytically true”) if it is true solely in virtue of the meanings or concepts involved in the sentence. Sentence (2) is a traditional example. If ‘bachelor’ means unmarried male, then the meaning of the sentence suffices to explain why it is true. (The notion of analyticity is famously controversial; for discussion, see the entry on the analytic-synthetic distinction.) Two notes are relevant here. First, some philosophers will talk of ‘logical’ necessity or supervenience as a way of discussing what this entry is calling conceptual necessity or supervenience. Here, as elsewhere, it is important to keep track of what exactly an author intends to express by their terms. Second, some proponents of analytic truth will nonetheless reject the idea of a distinct conceptual modality (e.g. Jackson 1998, Ch. 3).
Such philosophers can, however, capture importantly related phenomena by discussing modal claims formulated in terms of sentences and their intensions. Next consider (3): this does not seem to be true simply because of the concepts it expresses. Rather, if it is true, it seems to reflect an important law of nature: a deep and non-accidental pattern in our universe. Some philosophers think that such laws underwrite a distinctive sort of modality: a proposition is nomically necessary just in case its falsity is incompatible with the laws of nature. On this view, (3) is nomically necessarily true, because it follows from the laws governing the speed of light. Now consider (4). It is commonly thought that (4) is necessarily true. For example: a substance composed overwhelmingly of atoms that do not contain 79 protons in their nuclei could not be gold. But (4) does not on its face look like a conceptual truth: it was a substantive discovery that there were protons at all, let alone how many protons an atom of gold characteristically possesses. Further, (4) does not seem like it reflects a law of nature in the way that (3) does: rather, (4) seems to follow immediately from facts about what it is to be gold. Examples like (4) thus purport to give us an initial grasp on metaphysical modality as distinct from the other modalities considered thus far. Still more controversial is the notion of normative necessity (Fine 2002, Rosen 2020). One way of understanding this idea appeals to an analogy with nomic modality. We can think of nomically necessary facts as those which follow from facts about the laws of nature. For example, the nomic impossibility of something traveling faster than light is a direct consequence of it being a law of nature that nothing can travel faster than light. Someone might similarly claim that there are fundamental normative laws or principles. Suppose that (5) stated one of those laws. Then the normative impossibility of a state’s being good just because it is painful could be understood as expressing a consequence of that underlying normative law. There is enormous controversy about each of these alleged varieties of modality. For each of the logical, conceptual, nomic, metaphysical and normative flavors of modality, some philosophers have raised important challenges to whether that flavor of modality is well-regimented, theoretically useful, or genuinely distinct from others on the list. This entry will not enter seriously into those debates. (For discussion of some of the issues, see the entry on varieties of modality.) If we instead provisionally assume that each of these notions is legitimate, this will put us in a position to ask (in Section 2, below): what is the modal strength of the supervenience thesis that we should accept? 1.4 Ontological and ascriptive supervenience The ethical supervenience theses discussed thus far are ontological: they propose various covariance relationships between ethical properties and certain other properties. However, James Klagge (1988) has helpfully regimented an important alternative way of understanding ethical supervenience. Call two circumstances that a thinker believes to be identical in all base respects apparently base-identical. Now consider the following claim: Ascriptive Anyone who treats apparently base-identical circumstances as ethically different from each other thereby makes a mistake.
Unlike the supervenience theses encountered so far, Ascriptive is fundamentally a claim about ethical judgments: it is a claim that someone who makes a certain pair of such judgments thereby makes a mistake. Klagge usefully dubs claims like this ascriptive supervenience theses. A fully informative ascriptive supervenience thesis would explain how we should understand the mistake claimed by Ascriptive. There are several possibilities, of which four are worth emphasizing. The claimed mistake could be alethic, consisting in having made at least one judgment with a false content. Or it might be epistemic, consisting in making at least one epistemically unjustified judgment. It could be conceptual, consisting in judging in a way that is inconsistent with the meanings of ethical words. Finally, it might be characterized as ethical, consisting in making a judgment in a way that is vicious or ethically objectionable. (Note that the relevant judgment might be mistaken in more than one of these ways.) Because ascriptive supervenience theses are about judgments rather than relations between classes of properties, they are quite different from the ontological supervenience theses we have considered thus far. One way to bring this out is to notice that one could potentially accept Ascriptive without thereby having any views about whether there are ethical properties. On the other hand, there are interesting connections between certain ascriptive and ontological supervenience theses. For example, anyone who accepts Strong Intuitive seems to be committed to a version of Ascriptive, with an alethic gloss on ‘mistake’. This entry began with the suggestion that it is plausible that the ethical supervenes. This section has aimed to clarify some of our options for understanding that idea. The various interpretive options we have explored together suggest a dizzying space of possible ethical supervenience theses. This in turn raises a pressing question: which of these theses (if any) best articulate the plausibility and significance that philosophers have often taken ethical supervenience to have? One thing that might help to answer this question is to consider the arguments that we can give for supervenience: these arguments might favor some of these theses over others. 2. Arguments for Ethical Supervenience It is common for philosophers to endorse ethical supervenience without much argument (an important exception is Smith 2004; for critical discussion of a variety of the arguments that have been offered, see Roberts 2018, 10–18). Part of the reason for this is that ethical supervenience is taken to be both obvious and uncontroversial. (Rosen 2020 calls it “The least controversial thesis in metaethics”.) Further, ethical supervenience is often claimed or assumed to be an obvious conceptual truth, doubts about which are supposed to reveal conceptual incompetence. The discussion just completed, however, suggests reason to worry about this assumption: there is not one ethical supervenience thesis but instead a complex variety of such theses. It is far from clear that we should accept all of these theses, and it is a substantive question how to assess each of them. Given that supervenience claims are modal claims, those seeking to evaluate supervenience claims might begin by considering the general question of how we can know modal facts (see the entry on the epistemology of modality). This section sets aside this broad question. Instead, it begins by setting out a general strategy for arguing for ethical supervenience.
It then explores the implications of that strategy for the controversies introduced in the previous section. The general argumentative strategy has two elements. The first element defends ethical supervenience as a plausible generalization from cases. Thus, consider our orienting case of the embezzling bank manager. This case provides us with a specific ethical supervenience thesis: it suggests that the ethical quality of the manager’s action cannot vary without something else varying as well (compare Horgan and Timmons 1992, 226 on specific supervenience facts). Next, notice that there is nothing special in this respect about the bank manager case: we can identify specific supervenience facts about anything from genocide to insulting your neighbor’s hat. Each such fact is constituted by an interesting necessary connection between ethical properties and some base properties. It is theoretically unattractive to rest satisfied with a long list of such necessary connections. Instead, we should look for a single thesis that unifies all of these specific theses into a single pattern. This pattern can be captured by a general ethical supervenience thesis such as Initial (compare McPherson 2012, 211). The second element of the general strategy for arguing for ethical supervenience emphasizes the independent credibility of such a general supervenience thesis. This element takes inspiration from a comment by Henry Sidgwick: In the variety of coexistent physical facts we find an accidental or arbitrary element in which we have to acquiesce…. But within the range of our cognitions of right and wrong, it will be generally agreed that we cannot admit a similar unexplained variation. (1907, 209) It is plausible to interpret Sidgwick as suggesting that although we seek explanatory power when we develop our account of the physical world, we need to be prepared to admit brute contingency: the possibility that our best theories or explanations include claims like “and these just happened to be the initial conditions”, or (to be anachronistic) “it is a brute fact that the quantum wave function collapsed this way”. By contrast, we cannot admit the analogous idea that it is a brute contingent fact that a certain ethical property just happens to covary with the base properties that are instantiated around here. Because of their modal scope, ethical supervenience theses reflect this ban on brute ethical contingency (compare also Shafer-Landau 2003, 78; Smith 2004, 225). The two parts of the strategy complement each other: The first part of the strategy defends general ethical supervenience on the basis of unification, which is a familiar and domain-general theoretical virtue. The second part of the strategy suggests that we have further reasons to accept such a general thesis that stem from a feature of our understanding of the ethical domain specifically. While Initial is a general supervenience thesis, it is silent on many of the issues broached in Section 1. The next task is thus to extend the strategy just introduced to discuss those issues. Before doing so, it is important to emphasize that many of the options considered in that section are compatible: for example, supervenience on the natural properties entails supervenience on all of the properties. Because of this, an argument for the former thesis is not an argument against the latter thesis. 
Because stronger ethical supervenience theses are potentially both more illuminating and more dialectically significant, this section will focus on examining competing cases concerning what the strongest well-supported ethical supervenience thesis is. The general strategy just canvassed has two stages: the first stage carefully examines cases, and the second appeals to our more general understanding of the ethical. Both parts of the strategy can be useful in addressing the question of what the ethical supervenes on. For example, Section 1.1 appealed to possible cases involving supernatural beings as part of an argument against the idea that the ethical supervenes on the natural. In terms of the first part of the strategy, this suggests that once we make salient the possibility of supernatural beings, ethical supervenience theses that posit a naturalistic base become more doubtful. In terms of the second part of the strategy, the same cases fit nicely with the Sidgwickian thesis: if an ethical claim were true in part because of some supernatural truth, it would thereby not be brutely true. As noted in Section 1.1, characterizing what the ethical supervenes on is an open challenge. This merely illustrates how the strategy can be applied to make progress on that challenge. The general strategy can also be applied to the structural question: for example, Section 1.2 noted that weak supervenience is compatible with the idea that a utilitarian ethical principle is a fundamental truth in some possible worlds, but is false in others. Strong ethical supervenience, by contrast, is incompatible with this idea. Many philosophers believe that the fundamental ethical principles could not vary contingently in this way, because this would again threaten to entail that some fundamental ethical truths are brute contingencies. If correct, this supports the idea that ethical supervenience is a strong supervenience thesis. On the other hand, assessing whether ethical supervenience is strong or global (or both) might require adjudicating live metaphysical controversies concerning the relationship between strong and global supervenience (for discussion of these controversies, see section 4.3.1 of the entry on supervenience). What about the modality of ethical supervenience? One might think of this question as seeking to clarify what sort of non-contingency the Sidgwickian commitment requires. If we distinguish logical from conceptual necessity, it is easy to see that the logical supervenience of the ethical is a non-starter. The truth of ‘pain is bad’, e.g., is not secured simply by the logical vocabulary and the syntax of the sentence, in the way that the truth of ‘all bachelors are bachelors’ seemingly is. The most common view in the literature is that the supervenience of the ethical is a conceptual truth. Here we cannot simply adapt the general strategy used so far, since neither the cases nor the inference to the best explanation from those cases seems to settle the matter. Consider three reasons to think that ethical supervenience is a conceptual truth. First, to adapt R. M. Hare’s canonical example (1952, §5.2), if I mentioned to you that one possible act was right, and another wrong, despite these acts being exactly alike in all other respects, your initial reaction would be puzzlement, and if I persisted in my view upon interrogation, you might start to worry that I was simply confused or misusing words. 
Second, the crucial cases used to support supervenience—like the embezzling banker case—seem to involve conceivability reasoning: we are asked to consider two circumstances that are identical in all base respects, and notice that we cannot make sense of the idea that they differ in ethical respects. Some philosophers find it natural to think that conceivability reasoning first and foremost reveals facts about conceptual possibility and necessity. This can be bolstered by a third (much more controversial) thought. Conceivability reasoning appears to be a priori. But if such reasoning fundamentally concerned the world rather than our concepts, then we would seemingly have a priori access to substantive facts about the world, which many philosophers have found deeply mysterious. Each of the sorts of reasons just offered is controversial. Consider three examples of this controversy. First, it is controversial whether the sorts of puzzlement reactions identified by Hare must signal conceptual confusion or misuse (Kramer 2009; Harrison 2013). For example, perhaps we take ethical supervenience claims to be so obvious that when someone appears to deny them, we are inclined to treat conceptual confusion or difference as a charitable hypothesis. One potential piece of evidence for this is that when denial of ethical supervenience is based upon reasoned arguments, such as those mentioned in Section 5 below, a diagnosis of conceptual confusion or difference arguably becomes less plausible. Second, philosophers unafraid of the ‘synthetic a priori’ can reject the inference from conceivability reasoning to conceptual status. It is notable here that a great deal of work in contemporary metaphysics appeals to something like conceivability reasoning to argue directly for claims about the nature of reality. Third, the very notion of conceptual truth is hotly contested: many philosophers have become convinced that there is no notion of conceptual truth that is both coherent and philosophically interesting (for discussion, see the entry on the analytic-synthetic distinction). Set aside these challenges for the moment, and consider how we should interpret the idea that ethical supervenience is a conceptual truth. We saw above that there is some support for thinking that ethical supervenience is a strong supervenience thesis. But combining this idea with the idea that the modality of supervenience is conceptual leads to complications. To see the issue, recall the schema for Strong Supervenience: Strong Necessarily, if anything x has some ethical property F, then there is at least one base property G such that x has G, and necessarily everything that has G has F. Suppose we interpret the claim that ethical supervenience is conceptual by replacing ‘Necessarily’ in the schema with ‘it is a conceptual truth that’. The result is: Strong Conceptual It is a conceptual truth that if anything x has some ethical property F, then there is some base property G such that x has G, and it is a conceptual truth that everything that has G also has F. One central problem with Strong Conceptual is that it claims that for every instantiated ethical property, there is a base property such that: it is a conceptual truth that anything that has this base property also has the ethical property. And this consequence will seem defensible only on certain very controversial views about ethics and conceptual analysis.
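It may help to display the two schemas in modal notation (a regimentation in the style of Kim's familiar schemas; '$\Box$' marks the unspecified necessity of Strong, '$\Box_c$' abbreviates 'it is a conceptual truth that', and '$\mathcal{E}$' and '$\mathcal{B}$' are my labels for the sets of ethical and base properties):

Strong: \[ \Box\,\forall x\,\forall F \in \mathcal{E}\,[Fx \rightarrow \exists G \in \mathcal{B}\,(Gx \wedge \Box\,\forall y\,(Gy \rightarrow Fy))] \]

Strong Conceptual: \[ \Box_c\,\forall x\,\forall F \in \mathcal{E}\,[Fx \rightarrow \exists G \in \mathcal{B}\,(Gx \wedge \Box_c\,\forall y\,(Gy \rightarrow Fy))] \]

The problem just noted is located in the inner occurrence of '$\Box_c$': for each instantiated ethical property $F$, some base property $G$ must conceptually suffice for $F$, and that is exactly the kind of base-to-ethical conceptual entailment that most views in ethics deny.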
The implausibility of Strong Conceptual may explain why two of the most influential philosophers who discussed supervenience in ethics—R. M. Hare (1984, 4) and Simon Blackburn (cf. 1985, 134, and the contrast between ‘supervenience’ and ‘necessity’ in 1984, 183–4)—seemed to accept something like weak but not strong conceptual supervenience of the ethical. However, as noted above, it appears that we have reason to accept something stronger than weak ethical supervenience (Shoemaker 1987, 440–1; for dissent see Miller 2017). It is thus worth considering alternatives that capture that strength without succumbing to the difficulties facing Strong Conceptual. One way to avoid the problem is to interpret the first necessity operator in Strong as conceptual, while leaving the second operator as metaphysical: Strong Mixed It is a conceptual truth that if anything x has some ethical property F, then there is some base property G such that x has G, and it is metaphysically necessary that everything that has G also has F (compare Dreier 1992, 15). This avoids the implausible implications that Strong Conceptual has: Strong Mixed says only that it is a conceptual truth that a certain base property (we may not know which) covaries with each ethical property. Note that Strong Mixed is only one possible mixed-modality supervenience thesis: one could reinterpret either necessity operator to produce one of a wide variety of possible mixed ethical supervenience theses. For example, the second necessity operator could be interpreted as normative (rather than metaphysical) necessity. Such mixed-modality theses have not yet been seriously explored. Another option is to offer a conceptual version of the Strong Intuitive supervenience thesis mentioned in Section 1.2: Intuitive Conceptual If two conceptually possible entities are alike in all base respects, they are alike in all ethical respects. Because it does not posit known relations between specific ethical and base properties, Intuitive Conceptual does not face the difficulties of Strong Conceptual. Intuitive Conceptual also has an advantage over Strong Mixed: the latter commits one to metaphysical as well as conceptual modality. Intuitive Conceptual is a plausible option for philosophers who take there to be a stronger alternative to weak ethical supervenience, but who are suspicious of the notion of metaphysical modality. Among philosophers who reject the idea that ethical supervenience is a conceptual truth, many will insist that the supervenience of the ethical is at least metaphysically necessary. Most such philosophers appear happy to accept the strong metaphysical supervenience of the ethical. Such philosophers might defend the metaphysical supervenience of the ethical by applying the general strategy suggested at the beginning of this section, while rejecting the case for thinking this strategy has specifically conceptual implications. Other philosophers will reject the idea that we should begin with the sorts of judgments about cases that drove the general strategy. They can instead argue that the metaphysical supervenience of the ethical is supported as an abstract consequence of the best overall empirical theory concerning ethical facts (e.g. Sturgeon 2009, 61). Other philosophers reject the conceptual and metaphysical supervenience of the ethical, but claim that the ethical supervenes nomically or normatively.
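For comparison, the two alternatives can be displayed in the same style of notation (again a sketch; '$\Box_c$' and '$\Box_m$' mark conceptual and metaphysical necessity, and '$\mathcal{E}$' and '$\mathcal{B}$' are my labels for the sets of ethical and base properties):

Strong Mixed: \[ \Box_c\,\forall x\,\forall F \in \mathcal{E}\,[Fx \rightarrow \exists G \in \mathcal{B}\,(Gx \wedge \Box_m\,\forall y\,(Gy \rightarrow Fy))] \]

Intuitive Conceptual (with the variables ranging over all conceptually possible entities): \[ \forall x\,\forall y\,[\forall G \in \mathcal{B}\,(Gx \leftrightarrow Gy) \rightarrow \forall F \in \mathcal{E}\,(Fx \leftrightarrow Fy)] \]

In Strong Mixed the base-to-ethical link is governed only by '$\Box_m$', so no particular conceptual entailment from base to ethical properties is claimed; Intuitive Conceptual posits no property-by-property links at all. With these displays in hand, return to the philosophers just mentioned, who hold that the ethical supervenes only nomically or normatively.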
In general, such supervenience theses are too weak to support the sorts of arguments from ethical supervenience that philosophers have made. Because of this, arguments for these theses will be discussed in Section 5.4, which concerns doubts about ethical supervenience. Finally, how should we decide between ontological and ascriptive supervenience theses? Proponents of ascriptive supervenience take on the obligation of making precise the sort of mistake that ‘supervenience-violators’ are allegedly making, and defending the idea that this is a mistake. The most prominent approach takes the mistake to be conceptual, which involves commitments similar to those taken on by defenders of the conceptual supervenience theses just discussed. One reason to focus on ascriptive supervenience theses is that some philosophers deny that our ethical thought and talk commits us to the existence of ethical facts and properties. Such philosophers can still grant that if we interpret supervenience in an ascriptive way, it provides important insights into ethics. Further, philosophers who accept that there are ethical facts and properties can also accept ascriptive supervenience theses about ethical thought. Indeed, if we understand Ascriptive as a conceptual claim, then together with realism it could provide the basis for accepting a conceptual-strength ethical supervenience thesis. This means that ascriptive ethical supervenience theses have the potential to be a point of significant common ground between philosophers with widely differing views about the nature of ethical thought and talk. And this might make them especially dialectically powerful in arguments that appeal to ethical supervenience. 3. Arguments from Ethical Supervenience This section examines arguments in and about ethics that philosophers have made which appeal centrally to ethical supervenience as a premise. The bulk of the section discusses the most influential supervenience arguments in ethics, which have concerned realism and reduction, before considering the significance of ethical supervenience for the epistemology of ethics, and for debates about the existence of ethical principles. 3.1 Arguments against realism The earliest influential discussions of what we now call supervenience in ethics focused on its significance for substantive ethical investigation. Henry Sidgwick draws from it what he takes to be a “practical rule of some value” for such investigation (1907, 208–9). And G. E. Moore (1922) used the idea as part of his attempt to explain the idea of intrinsic value. Given that Moore and Sidgwick were both ethical realists, it is perhaps striking that the most influential philosophical use of ethical supervenience has been in arguments against ethical realism. In his argument for error theory, J. L. Mackie briefly claims that supervenience makes trouble for the realist. His quick argument can usefully serve as a prelude to the more detailed discussion to come. Mackie suggests that we think that actions have their ethical properties because they have some natural features. For example, we think a certain action wrong because it is cruel. He denies that this ‘because’ references a conceptual entailment, and thinks this raises two questions: (1) what sort of relation is the connection being referred to? And (2) how do we come to know that actions stand in this relation? (1977, 41). As it stands, Mackie’s questions serve more as a research agenda than an argument (for important recent discussion, see Olson 2014, §5.1). 
It appears plausible that realists should aim to have something illuminating to say both about the nature of the relation between the ethical and base properties, and about a credible epistemology for how we come to know such relations. But Mackie’s questions do not yet constitute an argument that realists cannot achieve these aims. Simon Blackburn developed a more substantial supervenience argument against realism. The details of Blackburn’s various presentations of his argument (1971, 1984, and 1985) are complex and raise difficult interpretive questions; the reconstruction that follows is a rather free interpretation of Blackburn (1984, 183–4; for sympathetic discussion, see Mabrito 2005 and Mitchell 2017). The argument starts with two claims: (1) it is a conceptual truth that the ethical supervenes on the base properties, so that there cannot be two base-identical acts of which one is wrong and the other is not; and (2) no specific naturalistic description of an action conceptually entails an ethical description of that action. Now consider an act of happiness-maximizing promise-breaking. It follows from (2) that it is conceptually possible that the world is base-identical to the actual world, and this act is wrong, but it is also conceptually possible that the world is base-identical to the actual world, and this act is not wrong. But from (1), we can notice that it is not conceptually possible that there are two base-identical acts, one of which is wrong and one of which is not. This combination is supposed to be difficult for the realist to explain. For (2) seems to show that there is no conceptual link between ethical concepts like ‘wrong’ and any one of our naturalistic concepts. And if ethical concepts function to pick out properties (as the realist claims), then given this conceptual separation, it seems that we should be able to identify conceptual possibilities by arbitrarily “mixing and matching” distributions of naturalistic and ethical properties. Ethical supervenience precisely functions to limit such mixing and matching. Consider four possible ways that the realist might reply. First, the realist could seek to debunk the challenge. For example, she might do this by denying that the ethical supervenes with conceptual necessity (see the previous section for discussion). Or she might reject the supervenience of the ethical on the natural (see Section 1.1), and challenge Blackburn to identify a supervenience base for which the argument remains potent. Second, the realist might seek to explain the pattern of individual conceptual possibility without conceptual co-possibility. For example, if it were a conceptual truth that ethical properties were natural properties, then this would explain the pattern just described (Dreier 1992, 20). An analogy may help to make this vivid: it might be a conceptual truth that physical properties are natural properties (compare Kim 2011). But which total naturalistic patterns in the world the physical properties covary with is arguably an empirical question. One might take these examples to illustrate a general reply: the pattern is not puzzling, because it simply reflects the limitation of our conceptually-based insight into reality (Shafer-Landau 2003, 86). Third, some realists are prepared to claim more ambitiously that we can give a conceptual analysis of rightness in base terms (e.g. Jackson 1998, Ch. 5). Such philosophers can thereby deny (2), cutting the argument off at the knees. (Dreier 1992, 17–18 suggests that Blackburn’s argument simply begs the question against this sort of reductive realist.) Such realists take on the burden of rejecting the most famous argument in metaethics: G. E. Moore’s “open question argument” (1903, Ch. 1).
However, it is a hotly contested question what—if any—probative value this argument has (for discussion, see section 2 of the entry on moral non-naturalism). A fourth reply would be to shrug off the alleged explanatory challenge. However puzzling the combination of the features described by (1) and (2) may be, they are consistent features of a concept. This means that we could choose to introduce a concept that exemplified those features. It might thus be suggested that Blackburn’s argument shows only that we have chosen to do so with our ethical concepts (compare Olson 2014, 89–90). One might reply to this last point that it is precisely this choice that needs to be explained. Blackburn argues that the non-cognitivist has a smooth functionalist explanation for why our ethical thought and talk includes the ban on mixed worlds (see Section 3.3 below for discussion), while for the realist, this might just be an unexplained peculiarity of our choice of concepts. 3.2 Arguments against non-reductive realism As was just noted, a certain kind of reductive naturalist seems to have an easy reply to Blackburn’s argument. In light of this, it is perhaps unsurprising that several philosophers have argued that ethical supervenience theses support reductionist forms of ethical realism against non-reductive forms. Consider a few important variants of such arguments. The first is a simplified version of arguments due to Frank Jackson (1998, Ch. 5; see also related arguments by Brown 2011 and Streumer 2017, Chs. 2–3). The argument has two steps. The first step is an argument that if the ethical properties strongly (or globally) metaphysically supervene on the base properties, then there is no metaphysically possible ethical difference between states that does not have a correlated base difference between the same states. If we make some liberal assumptions about property types, this entails in turn that for every ethical property there is a base property that is necessarily coextensive with it. The second step of the argument is the claim that necessarily coextensive properties are identical. Brown offers a nice motivation for this thesis: we should commit ourselves to the existence of a property only insofar as it can do explanatory work, and the only way for a property to do explanatory work is for it to distinguish metaphysical possibilities (2011, 213). If we assume that identity is sufficient for reduction, these two steps together entail the reduction of the ethical. While both steps of the argument are controversial, the second step has come in for especially heavy fire. (For a careful discussion of the dialectic, see Suikkanen 2010; for an ingenious argument against Jackson that identity with descriptive properties is compatible with ethical non-naturalism, see Dunaway 2017.) One important general basis for doubt is that many contemporary philosophers question whether modality constitutes the fundamental explanatory currency of metaphysics, as Jackson and Brown seem to presuppose (for an especially influential challenge, see Fine 1994; for an especially radical challenge, see Sider 2011, Ch. 12). The argument for reduction from metaphysical supervenience can, however, be prosecuted within frameworks that reject Jackson’s and Brown’s core assumptions. Consider two examples.
First, one might deny that necessary coextension entails identity, but nonetheless argue that the best explanation of ethical supervenience is a grounding relation that suffices to ensure that ethical properties are identical to some of the base properties (Bader 2017). Second, you might deny that reduction requires identity. Of course, identifying non-obvious identities is a powerful model of reduction. For example, a standard way of characterizing the physicalistic reduction of heat is that the heat in a volume of gas is identical to the mean molecular kinetic energy of that volume of gas, which is a physical property. However, there is no consensus concerning how to understand reduction as a metaphysical relation (for a taste of the controversy, see McPherson 2015, §3, and the entry on scientific reduction and the discussion of reduction in the entry on David Lewis). The core idea at stake in debates over reduction is that commitment to the existence of the reduced properties should constitute no ontological commitment “over and above” commitment to the reducing properties. Some philosophers have sought to spell out this idea by appealing to essence rather than to identity. Consider an essentialist account of reduction (cf. Rosen 2017b, 163), on which the A properties reduce to the B-properties just in case: (i) it is necessary and sufficient for each A property to be instantiated that some B property is instantiated; and (ii) these modal facts follow from the essences of the A-properties. The idea is that if what it is to be each A property entails that the A-properties are uniquely realized by the B-properties, this amounts to a kind of reducibility of the A-properties. Consider an example: one might take oneself to have offered a reduction of the number one, in claiming that: what it is to be the number one is just to be the successor of zero. One important contrast with the identity conception is that on the essentialist conception, successful reductions reveal metaphysical structure. Thus, one might say in our example that the number one is ‘built out of’ the number zero and the successor function. On an influential essentialist account of metaphysical modality, all necessities are to be explained by facts about the essences of things. Ralph Wedgwood (2007) and Gideon Rosen (2020) argue that on this sort of view, the strong metaphysical supervenience of the ethical would entail that the ethical possibilities are fully explained by the essences of the base entities. Interestingly, both Rosen and Wedgwood reject this reductive conclusion. Wedgwood argues that some necessary truths (including ethical supervenience theses) can be explained by certain contingent truths, together with facts about essences, and that this sort of explanation does not have reductive implications (2007, §9.3; for critical discussion of this response, see McPherson 2009, Sec 3, and especially Schmitt and Schroeder 2011). Rosen responds by rejecting the strong metaphysical supervenience of the ethical (see Section 5.3 below). 3.3 Supervenience and anti-realism As Section 3.1 explained, supervenience arguments were initially used by Mackie and Blackburn to raise doubts about ethical realism. Indeed, it has been widely assumed that the realist faces a challenge here that the anti-realist does not. The issues here are complicated, and it will be helpful to consider common varieties of ethical anti-realism separately. First, consider ethical nihilism, the thesis that there are no ethical properties. 
The ethical nihilist might seem to have an easy time explaining the metaphysical supervenience of the ethical: if there are no ethical properties, there are, trivially, no ethical differences. And if there are no ethical differences, there are no ethical differences without base differences. This line of reasoning is too quick as it stands. Supervenience is a modal claim, so contingent ethical nihilism—the thesis that there are no actually instantiated ethical properties—cannot explain ethical supervenience. Indeed, as Christian Coons (2011) has shown, it is possible to use supervenience to construct an interesting argument against contingent nihilism. A crucial question here is: what is the modality of the supervenience thesis to be accounted for? If the supervenience thesis we need to explain is conceptual, then even the truth of non-contingent nihilism—the thesis that it is metaphysically impossible for ethical properties to be instantiated—would not do the relevant explanatory work. Only the thesis that the instantiation of ethical properties is conceptually impossible would suffice. (Note that the nihilist might be able to adapt one of the realist replies to Blackburn discussed in Section 3.1, but in this case it would not be easier for the nihilist to explain supervenience than it is for the realist who adopts the same reply.) The nihilist imagined above does not question the assumption that ordinary ethical thought and talk commits us to ontological claims. Other ethical anti-realists, however, will deny this assumption (for discussion, see the entries on moral anti-realism and moral cognitivism vs. non-cognitivism). Consider two examples of such views. First, hermeneutic fictionalists about ethical thought and talk argue that such thought and talk is to be understood as a form of pretense or fictional discourse (see Kalderon 2005 for discussion and defense). It will be natural for the hermeneutic fictionalist to reject ordinary ethical supervenience claims as misleading. However, fictionalists will presumably still need to account for the considerations that lead other philosophers to accept ethical supervenience claims. The issues concerning ethical fictionalism and supervenience are comparatively unexplored; see (Nolan, Restall, and West 2005, 325–327) for important preliminary discussion. Second (and much more influentially), some non-cognitivists about ethical thought and talk deny that our ethical claims express beliefs about the ethical nature of the world, suggesting instead that they express desire-like mental states. Such a view may make ontological supervenience claims about ethics appear misleading at best. More interesting is the question of what non-cognitivists can say about the sort of ascriptive supervenience thesis discussed in Section 1.4: Ascriptive Anyone who treats apparently base-identical circumstances as ethically different from each other thereby makes a mistake. This thesis is an alleged correctness constraint on ethical thought and talk. Prominent philosophers in the non-cognitivist tradition (broadly understood) have characteristically claimed that their views enabled them to explain theses like Ascriptive. Consider a representative sample of these explanations. R. M. Hare claims that ascriptive supervenience holds because a significant part of the function of moralizing is to teach others our ethical standards, and the only way to do that is to get our audience to see the recognizable pattern that we are prescribing that they follow (1952, 134).
According to Simon Blackburn, the presumption of ascriptive supervenience is required by the idea that our ethical attitudes are supposed to be practical guides to decision-making (1984, 186). According to Allan Gibbard (2003, Ch. 5), ascriptive supervenience for ethical thought is explained by a consistency norm on planning states. Critics of non-cognitivism (e.g. Zangwill 1997, 110–11; Sturgeon 2009) have challenged the rationales offered by Hare and Blackburn. Suppose that we grant that consistency is useful, given the various functions of ethical discourse. It is unclear why this usefulness should force on us a conceptual truth about moral discourse. Further, it is arguable that all that is required for these practical purposes is consistency within worlds that are very similar to the actual world. So the idea that such consistency is required over every possible world (as seems to be the case for ethical supervenience) demands considerably more than the practical considerations require. Gibbard’s rationale has faced related criticism: why must planners be committed to consistency in the sweeping way that Gibbard envisions (Chrisman 2005, 411–12; Sturgeon 2009, 84–87)? If these critics are right, it is not clear that the non-cognitivist has an especially compelling explanation of ethical supervenience. And if the non-cognitivist lacks such an explanation, this will complicate her efforts to claim that explaining ethical supervenience is a dialectical advantage over cognitivism. It is also worth bearing in mind that the details of which ethical supervenience thesis we need to explain can affect how promising the non-cognitivist explanations will be. For an important illustration of this point, see (Atiq 2019). A further complication arises from the fact that leading contemporary heirs of non-cognitivism (such as Blackburn and Gibbard) have abandoned anti-realism. Instead, they have adopted what Simon Blackburn (e.g. 1993) has dubbed the ‘quasi-realist’ program. This involves the claim that one can, while beginning with the non-cognitivist’s framework, “earn the right” to realist-sounding claims about ethical truth and objectivity (for further discussion see the section on noncognitivism in the entry on moral anti-realism). Now consider an ontological supervenience claim: that there can be no difference in ethical properties without a difference in base properties. The quasi-realist program can seem to commit the quasi-realist to accepting this claim. Dreier (2015) argues that this leads to a further challenge to the non-cognitivist: even if she can explain ascriptive supervenience, it is not clear that she can explain ontological supervenience. If this is the case, the most influential contemporary non-cognitivists may find that supervenience is a dialectical burden rather than a benefit. 3.4 Supervenience and moral epistemology So far, this entry has focused on the significance of supervenience for claims about the nature of ethical thought, talk, and metaphysics. However, influential early discussions of this sort of thesis seemed to have something else in mind. For example, Section 2 above quoted an evocative passage from Henry Sidgwick. But Sidgwick’s point was not to argue about the metaphysics of ethics. Rather, he was proposing a supervenience-like idea as an epistemological corrective to ad hoc special pleading in one’s ethical reasoning (1907, 209).
The mere fact of supervenience could not play this sort of role: after all, the supervenience of the ethical is compatible with the idea that everyone ought always to do what I want them to do. However, Sidgwick points to an important idea: that we expect there to be a rational explanation for any ethical fact. One ambitious way of developing this idea has been suggested by Nick Zangwill (2006). According to Zangwill, a central conceptual constraint on ethical reasoning is the “because constraint”: when we judge something to be wrong (or to have another ethical property), we are committed to its having this property because it has some other property. Zangwill claims that this principle “either is, or explains” ethical supervenience (2006, 273). And Zangwill goes on to argue that this constraint has striking epistemological implications: he claims that it entails that our only epistemic access to facts about the distribution of ethical properties is by knowing about the distribution of base properties, and knowing ethical principles that link the presence of base properties to ethical properties. He then argues that our knowledge of these ethical principles could itself only be a priori (2006, 276). If Zangwill is right about this, then the a priori character of moral epistemology can be derived from claims about the supervenience of the ethical. One worry about this argument is that it might overgeneralize. The “because” structure seems to be shared by other normative domains: it would be very odd to claim that a particular chess move was winning, or that a particular action was illegal, without being committed to there being some general explanation, in terms of the rules of chess or the relevant laws, of this particular fact. But our knowledge of the law and the rules of chess is empirical. So one might wonder what precisely prevents our knowledge of ethical principles from being empirical as well. 3.5 Supervenience and the existence of ethical principles One traditional assumption about ethics is that our ethical obligations can be expressed by general ethical principles. This assumption has recently been challenged by ethical particularists, who claim that our ethical reasons and obligations cannot be codified into principles. Supervenience might seem to be relevant to this debate. For as Section 3.2 above showed, some philosophers argue that the strong metaphysical supervenience of the ethical entails that for every ethical property, there will be a base property that is necessarily coextensive with it. Focusing on wrongness, this in turn has the apparent consequence that there is a base property B such that: Entailment It is metaphysically necessary that an action is wrong just in case that action is B. One might think that Entailment just is the schema for an ethical principle concerning wrongness: for example, if we substitute ‘fails to maximize happiness’ for ‘is B’, we seem to get a clear statement of a utilitarian ethical principle. And this in turn might seem to cast doubt on the coherence of particularism. This reasoning, however, is too quick. To see this, note that supervenience itself in no way guarantees that B will be some elegant base property like failing to maximize happiness. B might instead be enormously complicated: at the limit, supervenience is compatible with B simply being a disjunction of an infinitely long list of complete base specifications of various possible worlds. Call an instance of Entailment with such a base a gruesome entailment.
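To make the gruesome limiting case vivid, it can be written out schematically (a sketch; '$D_{w_i}$' is my label for a complete base specification associated with possible world $w_i$, and '$\Box_m$' marks metaphysical necessity):

\[ \Box_m\,\forall x\,[\mathrm{Wrong}\,x \leftrightarrow (D_{w_1}x \vee D_{w_2}x \vee D_{w_3}x \vee \cdots)] \]

Supervenience guarantees at most that some instance of Entailment is true; it is silent on whether the true instance involves an elegant condition like failing to maximize happiness or an infinite disjunction of this sort.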
It is not clear that such entailments constitute principles that are incompatible with particularism. One reason to think that they do not is that genuine ethical principles arguably have explanatory power. Margaret Little argues that the “radical over-specificity” of gruesome entailments renders them non-explanatory, and hence inapt to be principles (2000, 286). Another reason to doubt that gruesome entailments are principles is that we ordinarily assume that ethical principles would be usable by agents (Dancy 2004, 87–8), but a gruesome “principle” is clearly not. (For a relevant argument that the true instance of Entailment could not be gruesome, because it would need to be learnable by ordinary speakers, see Jackson, Pettit, and Smith 2000). 4. Metaphysical Supervenience and Ethical Realism The Blackburn-inspired argument against ethical realism relies crucially on the assumption that ethical supervenience is a conceptual truth. For thesis (2) was crucial to that argument: 2. No specific naturalistic description of an action conceptually entails an ethical description…. While many find (2) plausible, fewer would be prepared to accept a purely metaphysical version of this thesis, such as: 2*. No base way a world could be metaphysically necessitates that world being a certain ethical way. This is precisely because thesis (2*) is inconsistent with the strong metaphysical supervenience of the ethical, which very many philosophers accept. This means that a purely metaphysical variant of Blackburn’s argument will not be plausible. This does not mean, however, that treating ethical supervenience as a non-conceptual truth renders it dialectically inert. This section considers the significance of metaphysical supervenience for ethical realism: does it pose a challenge to ethical realism? If so, how can we best understand this challenge? And what resources do different sorts of ethical realist have to meet the challenge? To focus our discussion, assume this metaphysical variant of Strong Intuitive (cf. Rosen 2020): Intuitive Metaphysical If two metaphysically possible entities are alike in all base respects, they are alike in all ethical respects. Intuitive Metaphysical might pose a challenge to the ethical realist in light of one of at least two background ideas. First, some philosophers have argued that there are no necessary connections between “distinct existences,” a claim that is sometimes called Hume’s dictum. If Hume’s dictum is correct, then the ethical realist will be committed to the ethical not being distinct in the relevant sense from what it supervenes on. The metaphysical use of Hume’s dictum faces at least two formidable challenges. The first is to clarify the dictum in such a way that it is both interesting and a plausible candidate for truth. To see this, note that many non-identical properties are necessarily connected: for example, a surface’s being scarlet entails that it is red, but being scarlet is not identical to being red. Red and scarlet, then, must not count as distinct in the sense relevant to a plausible form of the dictum. This raises the question: what does distinctness amount to? If we use necessary connection as a criterion, then Hume’s dictum turns out to be a trivial way of tracking this way of using the word ‘distinct’. Second, Hume’s dictum is usually defended on directly intuitive grounds. 
This raises a deep methodological question: if we notice a conflict between Hume’s dictum and another intuitively plausible claim, why should we retain Hume’s dictum and jettison the other claim? (For helpful discussion of Hume’s dictum, see Wilson 2010.) Consider a second way of developing a challenge to the ethical realist, inspired by the Sidgwickian motivation for accepting ethical supervenience, introduced in Section 2. According to this motivation, we should accept an ethical supervenience thesis because doing so rules out the implausible hypothesis of brute ethical contingency. Intuitive Metaphysical clearly satisfies this motivation: it permits no brutely contingent ethical variation. However, suppose that it were not possible to explain why the ethical properties supervene on the base properties. Then the very thesis that we used to explain why there was no brute ethical contingency would turn out to be something arguably even more peculiar. It would be a metaphysically necessary connection that nonetheless has what Sidgwick might call an “arbitrary element in which we have to acquiesce”; in a slogan: a brute necessity. A natural way of thinking about the significance of brute necessity begins with the assumption that we are entitled to a default combinatorial assumption about modality: that for any pair of properties F and G, it is possible that something is both F and G, that something is F but not G, that something is G but not F, and that something is neither F nor G. The next step is to suggest that this default assumption can be defeated. Consider red and scarlet: on one view, to be red just is to be scarlet or crimson or cherry red or… The thesis that this is what it is to be red, if true, would provide a straightforward explanation of why the combinatorial assumption is defeated here: it is not possible for something to be scarlet but not red precisely because of what it is to be red. Where we take there to be no such explanation, however, we should be loath to accept an alleged necessary connection (cf. McPherson 2012; for a similar idea in a different context, compare Levine and Trogdon 2009). Call this constraint on our metaphysical theorizing anti-brutalism. Both Hume’s dictum and anti-brutalism put us in a position to pose a conditional challenge to the ethical realist. If the realist thinks that the ethical properties are distinct from the base properties, they must reject either metaphysical supervenience or Hume’s dictum. And if they think the supervenience of the ethical is a brute necessity, they need to explain why such brutalism is not objectionable. Different variants of ethical realism have different resources available to address this challenge. The remainder of this section examines some of these resources. 4.1 Reductive explanations of ethical supervenience As Section 3.2 explained, some philosophers have argued that the supervenience of the ethical entails that the ethical can be reduced. These arguments are quite controversial, but it is perhaps less controversial that a successful reduction of the ethical properties would suffice to explain the metaphysical supervenience of the ethical. Consider first a reductive account that identifies the ethical properties with some natural or supernatural property.
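Before turning to that account, the combinatorial default just described can be displayed compactly (my regimentation; '$\Diamond$' marks metaphysical possibility): for any pair of properties $F$ and $G$, the default is that all four combinations are possible:

\[ \Diamond\,\exists x\,(Fx \wedge Gx), \quad \Diamond\,\exists x\,(Fx \wedge \neg Gx), \quad \Diamond\,\exists x\,(\neg Fx \wedge Gx), \quad \Diamond\,\exists x\,(\neg Fx \wedge \neg Gx) \]

In the color case the second combination fails for $F$ = being scarlet and $G$ = being red, and the disjunctive account of red explains why: given what it is to be red, nothing could be scarlet without being red. Anti-brutalism demands some such explanation wherever a combination is ruled out. Now return to the reductive account just introduced.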
Assuming that natural and supernatural properties are among the base properties, the supervenience of rightness on the base properties would be easily explained on this view: because rightness is identical to a base property, on this view, there clearly cannot be a difference in rightness without some difference in base properties. If essentialist explanations are legitimate, essentialist reduction again appears to be a straightforward way of explaining the supervenience of the ethical. Part of the idea of essence is that necessarily, nothing can survive the loss of one of its essential properties. So if rightness had an essentialist real definition purely in terms of base properties, then it would be clear why there could be no difference in rightness without a difference in base properties. In light of this, neither Hume’s dictum nor anti-brutalism appear to cast doubt on either sort of reductive theory, for both theories are able to explain supervenience, and hence avoid commitment to a brute necessary connection between the ethical properties and the base properties. Terence Horgan and Mark Timmons claim that even if the ethical realist endorses reduction, they face a further explanatory burden before they can fully explain supervenience: “Even if goodness, for instance, is identical to some specific natural property, there remains the task of explaining why this natural property, rather than any other one(s), counts as the correct referent of the term ‘goodness’” (1992, 230; emphasis in original). This is a fair explanatory demand, if we interpret it as the familiar challenge to provide a plausible theory of reference for ethical terms (a demand that Horgan and Timmons have pressed incisively). However this challenge does not appear to have anything distinctive to do with supervenience. Either the reductive naturalistic realist can explain the reference of ‘wrong,’ in which case she can also explain supervenience, or she cannot explain the reference of ‘wrong,’ in which case her view is implausible for reasons that have nothing to do with supervenience. 4.2 Functionalist explanations of ethical supervenience One influential account of metaphysical structure, especially in the philosophy of mind, has been functionalism. Here is a simplified toy example of a functional analysis: any system that takes some money as an input, and reliably produces a candy as an output, thereby counts as a candy machine. On this account, the kind candy machine is individuated by input-output relations. A functional kind is any kind that can be individuated in this way. Because functional kinds are not individuated by the nature of the stuff that realizes the functional relations, they are often claimed to be paradigmatically friendly to multiple realization. Thus, given my characterization of candy machines, such a machine could be realized by a structure composed of metal or of plastic or perhaps even of spooky supernatural stuff. In light of this possibility of multiple realization, the relationship of functionalism to reduction is controversial: many philosophers have taken multiple realizability to constitute a barrier to reduction, but others disagree. (See the entries on functionalism and multiple realization for useful discussion). Now consider a version of ethical realism that takes ethical properties to be functional properties. Such a view, like the reductionist view, appears well-placed to explain the metaphysical supervenience of the ethical. 
This is because functional properties necessarily covary with the class of properties that are their possible realizers. If, for example, every complex property that could realize a candy machine is a natural property, then there could be no “candy machine difference” without a naturalistic difference. Similarly, if ethical properties are functional properties that could only be realized by certain of the base properties, then the supervenience of the ethical on the base properties would be smoothly explained. 4.3 Grounding explanations of ethical supervenience The strategies for explaining ethical supervenience discussed in the preceding two sections are useful to reductionist and functionalist ethical realists. However, many contemporary ethical realists reject both functionalism and reductionism about ethical properties. Most strikingly, several contemporary ethical realists are non-naturalists, claiming that the ethical properties are a distinct and irreducible class of properties (see the entry on moral non-naturalism for discussion). Several philosophers have argued that ethical supervenience poses a distinctive problem for the non-naturalist (Dreier 1992, 2019; Ridge 2007; McPherson 2012; Väyrynen 2017). So it is worth asking what metaphysical resources non-naturalists might have for explaining the supervenience of the ethical. A salient place to begin is with the grounding relation. As was noted in Section 1, grounding has recently been theorized as an asymmetrical explanatory metaphysical relationship (for an introduction to grounding, see the entry on metaphysical grounding; for a useful discussion of relevant issues in the context of ethics, see Väyrynen 2013a). It is thus natural to ask whether the non-naturalist could explain the supervenience of the ethical on the base properties by appealing to the fact that certain facts about the instantiation of the base properties fully ground all facts about the instantiation of the ethical properties. A natural question at this point concerns why such a grounding relationship holds. An influential answer is that all grounding facts are themselves explained in essentialist terms (Fine 1994; Rosen 2010). As Section 4.1 suggested, these essentialist explanations can appear to have reductionist implications. If so, essentialist explanations are no help to the non-naturalist. Stephanie Leary has offered an ingenious proposal within the essentialist framework: she posits a class of “hybrid” properties, whose essences entail (i) that they are instantiated just in case certain base properties are instantiated, and (ii) that ethical properties are instantiated whenever they are instantiated, and argues that these relations do not suffice for essentialist reduction of the ethical (Leary 2017; for critical discussion see Faraci 2017 and Toppinen 2018). A recently influential alternative to the essentialist account of grounding proposes that we can explain the grounding of the ethical in terms of metaphysical laws. Here is the basic idea. One class of ethical facts consists of facts which state the instantiation of some ethical property. An example of such an ethical instantiation fact would be: Alice’s current state is intrinsically bad. One explanation of why the ethical supervenes is that such facts are always grounded in certain base facts, such as: Alice is currently in pain. The proponent of law-mediated ethical grounding denies that the latter base fact provides a complete grounding explanation for the former ethical fact.
Rather, a complete grounding explanation takes this form: a base fact (e.g. Alice is currently in pain), together with an ethical law (e.g. pain grounds badness), fully grounds an ethical instantiation fact (e.g. Alice’s current state is intrinsically bad). Suppose that, necessarily, every possible ethical instantiation fact is grounded by the combination of a base fact and an ethical law, as in this example. Then, (i) this would provide a complete explanation for supervenience: this grounding structure would explain why the instantiation of ethical properties must covary with the instantiation of base properties. And (ii) this might look like a promising explanation on behalf of the non-naturalist, since the ethical laws could be metaphysically fundamental ethical entities. If ethical laws such as the one mentioned here are metaphysically fundamental, then one might think that this would secure non-naturalism (for this reason, Gideon Rosen calls such metaphysically fundamental laws ‘Moorean connections’ (2010, §13)). The appeal to fundamental laws may seem to raise the same concerns that a brute supervenience relation did, however: why is there a metaphysical law linking these distinct properties? The contrast with essentialist explanations is striking: in the latter case, facts about the natures of the related properties explain the links between them. However, some have argued that metaphysical grounding relations are either commonly, or even universally, law-mediated (e.g. Kment 2014, §6.2.3; Wilsch 2015). For a taste of the currently flowering literature on the explanatory role of ethical laws or principles, see (Eliot 2014; Scanlon 2014, Ch. 2; Schroeder 2014; Skarsaune 2015, §7; Rosen 2017a; 2017c; Berker forthcoming; and Morton forthcoming). This brief sketch of possible types of metaphysical explanations of supervenience barely scratches the surface. Among the many other options, replies grounded in appeals to tropes or universals have garnered explicit attention (Ridge 2007; Suikkanen 2010). As with the appeal to grounding, a central question about such strategies is whether they constitute genuine explanatory progress, or whether they simply explain one necessity by appealing to some further brute necessity. 4.4 Analytic and conceptual explanations of ethical supervenience This and the next subsection consider attempts to explain the metaphysical supervenience of the ethical by appealing to conceptual or ethical premises. The first such strategy appeals to analytic or conceptual truths. Suppose that an ethical realist accepts the popular view that ethical supervenience is an analytic truth. She might put her view this way: Analytic It is an analytic truth that: if two metaphysically possible entities are alike in all base respects, they are alike in all ethical respects. The core idea is that the truth of Analytic explains the truth of the supervenience thesis that it embeds (Intuitive Metaphysical). On this account, the ethical and the base properties covary because it is definitional of ‘ethical’ that nothing could count as an ethical property unless it covaried in this way. This strategy claims to meet the bruteness challenge: the necessary connection is explained by the way a property would have to be in order to be what we talk about when we talk about ethical properties (cf. Stratton-Lake and Hooker 2006). Consider three brief worries about this strategy.
The first is that on some influential contemporary accounts of analyticity, analyticity does not guarantee truth. For example, one account of analyticity is that for a sentence ‘S’ to be analytic in a language L is for competence with L to dispose a speaker to accept ‘S’. And some philosophers (e.g. Eklund 2002) have argued that there are inconsistent sets of sentences that satisfy this condition. If this is right, Intuitive Metaphysical’s being analytic in English would not guarantee its being true. The second worry is broadly intuitive. Analytic alone does not appear to guarantee that the supervenience of the ethical follows from the other aspects of the nature of ethical properties. And this suggests that, for all Analytic says, we can conceive of ethical* properties, which have every feature characteristic of ethical properties, except that they do not supervene. But this may lead us to wonder: why give the ethical properties the role in our lives that we do, and ignore the ethical* properties, just because they do not supervene? (For a related point, see the end of Mabrito 2005.) The third worry is that even if the truth of Analytic entails the truth of Intuitive Metaphysical, it nonetheless arguably does nothing to explain why the supervenience relationship holds. Consider an analogy: suppose that the infallible oracle tells you that a certain ethical supervenience thesis holds. This testimony does nothing to explain why that supervenience thesis holds (compare McPherson 2012, 221–222, and Dreier 2015, 2019). Like the oracle’s testimony, one might think that learning the truth of Analytic would simply reinforce our confidence in the very thesis (Intuitive Metaphysical) that we were hoping to explain. My exposition of these three worries (like the rest of this entry thus far) has followed the common practice of lumping together the notions of analytic truth and conceptual truth. Terence Cuneo and Russ Shafer-Landau (2014) have argued that distinguishing these two notions permits them to develop an attractive form of moral realism, and also enables them to explain the supervenience of the moral properties. They distinguish analytic and conceptual truth as follows: for a sentence to be analytically true is for it to be true in virtue of the meanings of the terms that constitute it. By contrast, for a proposition to be a conceptual truth is for it to be true wholly in virtue of the essences of its constituent concepts (ibid., 410–11). Concepts, in turn, are to be understood as abstract non-mental objects. One has a propositional thought in virtue of being appropriately related to some of these objects. Cuneo and Shafer-Landau then offer what they call a ‘reversal argument’, which entails that some conceptual truths about morality are ‘fact-makers’: that is, some of the facts about the distribution of moral properties are grounded in facts about moral concepts (ibid., 418–421). This puts them in a position to avoid the complaint that I just made about Analytic: on their view, conceptual truths really do metaphysically explain (some) of the relations between the moral and the base properties. They then propose that such connections quite generally explain the supervenience of the moral. It is worth emphasizing the commitments of this ingenious proposal. Consider one central issue. Cuneo and Shafer-Landau argue for the existence of several substantive-seeming conceptual truths about morality. As they admit, their view is quite heterodox in virtue of this. 
However, they nowhere claim that all necessary moral truths can be explained as conceptual truths. That, of course, would be a much stronger claim, and much harder to motivate. Yet Intuitive Metaphysical is a quite general modal covariance thesis, and in light of this, only the stronger claim would suffice to explain its truth. 4.5 Ethical explanations of ethical supervenience Several philosophers have suggested that we can offer ethical explanations of the supervenience relation (Kramer 2009, Ch. 10; Olson 2014, §5.1; Scanlon 2014, 38ff; other philosophers, such as Dworkin 1996 and Blackburn 1998, 311, also appear committed to this idea; for discussion see Tiefensee 2014). For example, one might think that the dictum ‘treat like cases alike!’ is an ethical requirement of ethical reasoning. Or one might think that all ethical truths are grounded in certain fundamental ethical truths that are relational: for example, a fundamental truth might be that it is wrong to torture someone purely for fun. This truth states a relationship between ethical and non-ethical properties. If all ethical facts are explained by such fundamental ethical truths, then these truths could seemingly explain why there are supervenience relations between ethical and base properties. One worry about this strategy is that one might take a mark of ethical realism to be commitment to a truthmaker thesis, according to which ethical truths are metaphysically explained by (or grounded in) the patterns of instantiation of ethical properties. The ethical explanation strategy seems to invert this intuitive order of explanation, by having the distribution of ethical properties explained by ethical truths. Suppose that we rejected this idea in an especially radical way, insisting instead on the reverse order of metaphysical explanation everywhere. The nature of every property, we might say, is wholly grounded in some relevant subset of the true propositions. Provided that we can recover the idea of metaphysical explanation within this framework, we will be able to isolate the set of propositions that state metaphysically unexplained necessary connections. And it is natural to think that the brute necessities worry could be expressed within this framework as objecting to accepting such propositions. The problem is that fundamental normative principles, as invoked in the ‘ethical explanation’ strategy, would seem to be of exactly the objectionable sort. 5. Arguments against Ethical Supervenience, or its Significance As the preceding sections have shown, philosophers have tried to extract a number of striking conclusions using ethical supervenience as a premise. Part of the motivation for these attempts is that ethical supervenience is widely assumed to be a powerful dialectical weapon, such that if your view is incompatible with ethical supervenience, it is in trouble. This section considers challenges to this status. 5.1 Arguments against supervenience from thick ethical concepts It is now common to distinguish thick ethical concepts—like courage—from thin ethical concepts—like ought or good (for an introduction to thick ethical concepts, see Roberts 2017). Courage seems like an ethical concept: we expect each other to treat courage as a virtue and not a vice. However, competent use of thick ethical concepts seems to require recognition that only certain sorts of grounds make an ascription of such a concept apt.
To adapt Monty Python’s example, it seems conceptually inapt to say that Sir Robin was courageous in light of running away from battle, even if we think that is what he ought to have done. Jonathan Dancy (1995, 278–9) and Debbie Roberts (2018) have suggested that attention to thick ethical concepts casts doubt on ethical supervenience. The core idea is this: it is true that there are no thin ethical differences between otherwise identical circumstances. However, it is suggested that sometimes the thin ethical properties of an action or event are best explained by citing thick ethical properties. And it is claimed that it is not at all clear that these thick ethical properties can always be explained in purely base terms (see especially Roberts 2017a). A natural objection to this strategy is to point out that the supervenience of the thick on the base properties is, if anything, far more plausible than the supervenience of the thin. For example, it is very hard to believe that two possible worlds could be wholly base-identical, but be such that Doris’s action is brave in the first world, but not brave in the second. 5.2 Arguments against the epistemic credentials of ethical supervenience Section 2 noted that there are few extended defenses of ethical supervenience. This might suggest that the evidence for the supervenience is overwhelming. However, it might instead be a sign that supervenience is a dogma, accepted without adequate critical examination. This section briefly explains two challenges to the epistemic credentials of ethical supervenience. Joseph Raz briefly suggests that the supervenience of the ethical does not purport to explain much. And he suggests that this explanatory poverty gives us reason to doubt whether the ethical supervenes. According to Raz, ethical supervenience neither provides more specific theses that allow us to concretely explain the ethical features of reality, nor guarantees that we can find such explanatory theses (2000, 54–5). If we assume that we should accept only those theoretical claims that do substantial explanatory work, then this casts doubt on ethical supervenience as a theoretical claim. Section 2 suggested a different explanatory case for supervenience than the one Raz considers: general ethical supervenience theses serve to explain the host of specific ethical supervenience facts that we notice. These facts are perhaps not themselves explanatory. But they may seem difficult to intelligibly deny, at least pending a developed moral epistemology that might adjudicate their epistemic credentials. Alison Hills (2009) argues that we can undermine the case for ethical supervenience by granting that in many cases ethical difference without naturalistic difference seems inconceivable, and arguing that we should not take inconceivability here to be a good guide to impossibility. She suggests that the appearance of inconceivability may be grounded in our unwillingness to engage in certain distasteful imaginative exercises. Hills bolsters this case by arguing that if we consider a controversial and low-stakes case—say, whether a certain lie made with benevolent motives is permissible—we are able to conceive of such a lie being either permissible or impermissible. But, she suggests, if we can conceive of it as being permissible, and as being impermissible, we have shown that we are able to conceive of two ethically inconsistent possible worlds. 
Further, this low-stakes case is easier to conceive of than the possibility of Hitler being a moral paragon, and Hills suggests that this supports the idea that conceivability is grounded in our willingness to imagine certain possibilities, for we presumably have a stronger desire to avoid imagining Hitler as a moral paragon than we do to avoid imagining the lower-stakes case. 5.3 Arguments against the strong metaphysical supervenience of the ethical Section 1.3 showed that one of the crucial choice-points in theorizing ethical supervenience is the strength of the modality of the supervenience relation (conceptual? metaphysical? etc.). And Section 3 and Section 4 showed that the claim that the ethical supervenes with conceptual or metaphysical necessity is the starting point for several influential arguments. Gideon Rosen (2020) develops a view of the modal strength of ethical supervenience that is intended to be strong enough to accommodate the intuitive appearances, while weak enough to be dialectically inert. The heart of Rosen’s challenge is an argument that we can characterize and clearly regiment a notion of normative necessity, which falls short of metaphysical necessity (i.e. at least some normative necessities are metaphysically contingent), while still being quite strong, in the sense that in any counterfactual where one considers how things would be if we altered some non-normative fact, we hold fixed the normative necessities. Rosen proposes that normative necessity is the appropriate modality for ethical supervenience. If he is correct about this, most of the arguments from supervenience discussed so far would fail, as they tend to require ethical supervenience to have either metaphysical or conceptual strength. Even with this alternative clearly stated, the strong metaphysical supervenience of the ethical may seem especially plausible. But with his account of normative necessity in hand, Rosen can make two points: (i) when we consider possibilities that violate the strong metaphysical supervenience of the ethical, we are considering very distant possibilities, where our modal judgments may not be particularly trustworthy, and (ii) our judgments of metaphysical impossibility of these scenarios might be explained by implicit confusion derived from the fact that while these scenarios may be metaphysically possible, they are normatively impossible. By rejecting strong metaphysical supervenience, Rosen must reject the Sidgwickian explanatory idea suggested in Section 2: that ethical supervenience reflects a commitment to rejecting brute ethical contingency. One worry about Rosen’s strategy is that by embracing such contingency one permits an especially objectionable form of moral luck (Dreier 2019). On Rosen’s view, there may be a world that is relevantly non-ethically identical to this one in which my counterpart is ethically quite different: in the extreme case, it raises the specter that the specific loving attitudes that I bear towards my child might have been evil, or even just a matter of utter ethical indifference. But it is hard to believe that I am lucky that the very attitudes that I possess count as commendable rather than awful. (See Lange 2018 for another important challenge to Rosen’s argument). Anandi Hattiangadi (2018) offers a conceivability argument against the idea that the ethical supervenes with conceptual or metaphysical necessity. The core idea is this. Mutually inconsistent ethical principles each appear to be perfectly conceivable.
And in general, conceivability is a good guide to possibility. But if utilitarianism and Kantianism, say, are both true in some possible world otherwise like ours, then the supervenience of the ethical fails. One worry for Hattiangadi’s argument is that there seems to be a straightforward way to contextualize the relevant conceivability judgments. Consider an analogy. I cannot remember the atomic number of plutonium. So it is conceivable to me that plutonium atoms have any of a fairly wide range of numbers of protons. But I do not think that it is possible both that one plutonium atom has 100 protons, and that some other possible plutonium atom has 110 protons. If any plutonium atom has 100 protons, they all do. (This stems from my empirically-derived belief that number of protons is essential to the nature of plutonium). Similarly, I can entertain the possibility that utilitarianism is true, or that it is false. But what is hard to wrap one’s head around is the idea that there might be worlds just like this one in all base respects, which vary with respect to whether utilitarianism is true.


Philosophy of Statistical Mechanics


1. The Aims of Statistical Mechanics (SM) Statistical Mechanics (SM) is the third pillar of modern physics, next to quantum theory and relativity theory. Its aim is to account for the macroscopic behaviour of physical systems in terms of dynamical laws governing the microscopic constituents of these systems and the probabilistic assumptions made about them. One aspect of that behaviour is the focal point of SM: equilibrium. Much of SM investigates questions concerning equilibrium, and philosophical discussions about SM focus on the foundational assumptions that are employed in answers to these questions. Let us illustrate the core questions concerning equilibrium with a standard example. Consider a gas confined to the left half of a container with a dividing wall (see Figure 1a). The gas is in equilibrium and there is no manifest change in any of its macro properties like pressure, temperature, and volume. Now you suddenly remove the dividing wall (see Figure 1b), and, as a result, the gas starts spreading through the entire available volume. The gas is now no longer in equilibrium (see Figure 1c). The spreading of the gas comes to an end when the entire available space is filled evenly (see Figure 1d). At this point, the gas has reached a new equilibrium. Since the process of spreading culminates in a new equilibrium, this process is an approach to equilibrium. A key characteristic of the approach to equilibrium is that it seems to be irreversible: systems move from non-equilibrium to equilibrium, but not vice versa; gases spread to fill the container evenly, but they do not spontaneously concentrate in the left half of the container. Since an irreversible approach to equilibrium is often associated with thermodynamics, this is referred to as thermodynamic behaviour. Characterising the state of equilibrium and accounting for why, and how, a system approaches equilibrium is the core task for SM. Sometimes these two problems are assigned to separate theories (or separate parts of a larger theory), which are then referred to as equilibrium SM and non-equilibrium SM, respectively. While equilibrium occupies centre stage, SM of course also deals with other issues such as phase transitions, the entropy costs of computation, and the process of mixing substances, and in philosophical contexts SM has also been employed to shed light on the nature of the direction of time, the interpretation of probabilities in deterministic theories, the state of the universe shortly after the big bang, and the possibility of knowledge about the past. We will touch on all these below, but in keeping with the centrality of equilibrium in SM, the bulk of this entry is concerned with an analysis of the conceptual underpinnings of both equilibrium and non-equilibrium SM. Sometimes the aim of SM is said to be to provide a reduction of the laws of thermodynamics (TD): the laws of TD provide a correct description of the macroscopic behaviour of systems and the aim of SM is to account for these laws in microscopic terms. We avoid this way of framing the aims of SM. Both the nature of reduction itself, and the question whether SM can provide a reduction of TD (in some specifiable sense), are matters of controversy, and we will come back to them in Section 7.5. 2. The Theoretical Landscape of SM Philosophical discussions in SM face an immediate difficulty. Philosophical projects in many areas of physics can take an accepted theory and its formalism as their point of departure.
Philosophical discussions of quantum mechanics, for instance, can begin with the Hilbert space formulation of the theory and develop their arguments with reference to it. The situation in SM is different. Unlike theories such as quantum mechanics, SM has not yet found a generally accepted theoretical framework or a canonical formalism. What we encounter in SM is a plurality of different approaches and schools of thought, each with its own mathematical apparatus and foundational assumptions. For this reason, a review of the philosophy of SM cannot simply start with a statement of the theory’s basic principles and then move on to different interpretations of the theory. Our task is to first classify different approaches and then discuss how each works; a further question then concerns the relation between them. Classifying and labelling approaches raises its own issues, and different routes are possible. However, SM’s theoretical plurality notwithstanding, most of the approaches one finds in it can be brought under one of three broad theoretical umbrellas. These are known as “Boltzmannian SM” (BSM), the “Boltzmann Equation” (BE), and “Gibbsian SM” (GSM). The label “BSM” is somewhat unfortunate because it might suggest that Boltzmann only (or primarily) championed this particular approach, whereas he has in fact contributed to the development of many different theoretical positions (for an overview of his contributions to SM see the entry on Boltzmann’s work in statistical physics; for detailed discussions see Cercignani (1998), Darrigol (2018), and Uffink (2007)). These labels have, however, become customary and so we stick with “BSM” despite its historical infelicity. We will now discuss the theoretical backdrop against which these positions are formulated, namely dynamical systems, and then introduce the positions in §4, §5, and §6, respectively. Extensive synoptic discussion of SM can also be found in Frigg (2008b), Shenker (2017a, 2017b), Sklar (1993), and Uffink (2007). 3. Dynamical Systems Before delving into the discussion of SM, some attention needs to be paid to the “M” in SM. The mechanical background theory against which SM is formulated can be either classical mechanics or quantum mechanics, resulting in either classical SM or quantum SM. Foundational debates are by and large conducted in the context of classical SM. We follow this practice in the current entry, but we briefly draw attention to problems and issues that occur when moving from a classical to a quantum framework (§4.8). From the point of view of classical mechanics, the systems of interest in SM have the structure of a dynamical system, a triple \((X,\) \(\phi,\) \(\mu).\) \(X\) is the state space of the system (and from a mathematical point of view is a set). In the case of a gas with \(n\) molecules this space has \(6n\) dimensions: three coordinates specifying the position and three coordinates specifying the momentum of each molecule. \(\phi\) is the time evolution function, which specifies how a system’s state changes over time, and we write \(\phi_{t}(x)\) to denote the state into which \(x\) evolves after time \(t\). If the dynamics of the system is specified by an equation of motion like Newton’s or Hamilton’s, then \(\phi\) is the solution of that equation. If we let time evolve, \(\phi_{t}(x)\) draws a “line” in \(X\) that represents the time evolution of a system that was initially in state \(x\); this “line” is called a trajectory.
Finally, \(\mu\) is a measure on \(X\), roughly, a means of saying how large a part of \(X\) is. This is illustrated schematically in Figure 2. For a more extensive introductory discussion of dynamical systems see the entry on the ergodic hierarchy, section on dynamical systems, and for mathematical discussions see, for instance, Arnold and Avez (1967 [1968]) and Katok and Hasselblatt (1995). It is standard to assume that \(\phi\) is deterministic, meaning that every state \(x\) has exactly one past and exactly one future, or, in geometrical terms, that trajectories cannot intersect (for a discussion of determinism see Earman (1986)). The systems studied in BSM are such that the volume of “blobs” in the state space is conserved: if we follow the time evolution of a “blob” in state space, this blob can change its shape but not its volume. From a mathematical point of view, this amounts to saying that the dynamics is measure-preserving: \(\mu(A) = \mu(\phi_{t}(A))\) for all subsets \(A\) of \(X\) and for all times \(t\). Systems in SM are often assumed to be governed by Hamilton’s equations of motion, and it is a consequence of Liouville’s theorem that the time evolution of a Hamiltonian system is measure-preserving. 4. Boltzmannian Statistical Mechanics (BSM) In the current debate, “BSM” denotes a family of positions that take as their starting point the approach that was first introduced by Boltzmann in his 1877 paper and then presented in a streamlined manner by Ehrenfest and Ehrenfest-Afanassjewa in their 1911 [1959] review. In this section we discuss different contemporary articulations of BSM along with the challenges they face. 4.1 The Framework of BSM To articulate the framework of BSM, we distinguish between micro-states and macro-states; for a discussion of this framework see, for instance, Albert (2000), Frigg (2008b), Goldstein (2001), and Sklar (1993). The micro-state of a system at time \(t\) is the state \(x \in X\) in which the system is at time \(t\). This state specifies the exact mechanical state of every micro-constituent of the system. As we have seen in the previous section, in the case of a gas \(x\) specifies the positions and momenta of every molecule in the gas. Intuitively, the macro-state \(M\) of a system at time \(t\) specifies the macro-constitution of the system at \(t\) in terms of variables like volume, temperature and other properties measurable, loosely speaking, at human scales, although, as we will see in Section 4.8, reference to thermodynamic variables in this context must be taken with a grain of salt. The configurations shown in Figure 1 are macro-states in this sense. The core posit of BSM is that macro-states supervene on micro-states, meaning that any change in the system’s macro-state must be accompanied by a change in the system’s micro-state: every micro-state \(x\) has exactly one corresponding macro-state \(M\). This rules out that, say, the pressure of a gas can change while the positions and momenta of each of its molecules remain the same (see entry on supervenience). Let \(M(x)\) be the unique macro-state that corresponds to micro-state \(x\). The correspondence between micro-states and macro-states typically is not one-to-one and macro-states are multiply realisable. If, for instance, we swap the positions and momenta of two molecules, the gas’ macro-state does not change. It is therefore natural to group together all micro-states \(x\) that correspond to the same macro-state \(M\): the set \(X_{M}\) of all such micro-states is the macro-region of \(M\).
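A toy example may help fix these notions (our illustration, not part of the entry; all numbers are arbitrary choices). Let the “gas” consist of \(n\) labelled particles, each of which is either in the left or the right half of a box, so that a micro-state is an assignment of halves to particles and a macro-state is fixed by the number of particles in the left half. The macro-regions can then be counted explicitly, making both multiple realisability and the very different sizes of macro-regions vivid:

```python
from math import comb

n = 10  # toy system: n labelled particles, each in the left (L) or right (R) half

# A micro-state is an n-tuple of L/R assignments, so there are 2**n micro-states.
# Macro-state M_k: exactly k particles in the left half. Its macro-region X_{M_k}
# contains comb(n, k) micro-states (multiple realisability: swapping two
# particles changes the micro-state but not the macro-state).
total = 0
for k in range(n + 1):
    size = comb(n, k)
    total += size
    print(f"M_{k:2d}: macro-region contains {size:4d} micro-states")

# The macro-regions partition the state space: every micro-state lies in
# exactly one of them, and their sizes add up to 2**n.
print("sum of macro-region sizes:", total, "= 2**n =", 2**n)
```

With the measure taken to be simple counting, the Boltzmann entropy introduced in the next paragraph is just the logarithm of these region sizes, and the balanced macro-state \(M_{5}\) visibly has the largest macro-region.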
Now consider a complete set of macro-states (i.e., a set that contains every macro-state that the system can be in), and assume that there are exactly \(m\) such states. This complete set is \(\{ M_{1},\ldots,M_{m}\}\). It is then the case that the corresponding set of macro-regions, \(\{ X_{M_{1}},\ldots,X_{M_{m}}\}\), forms a partition of \(X\), meaning that the elements of the set do not overlap and jointly cover \(X\). This is illustrated in Figure 3. The figure also indicates that if the system under study is a gas, then the macro-states correspond to different states of the gas we have seen in Section 1. Specifically, one of the macro-states corresponds to the initial state of the gas, and another one corresponds to its final equilibrium state. This raises two fundamental questions that occupy centre stage in discussions about BSM. First, what are macro-states and how is the equilibrium state identified? That is, where do we get the set \(\{M_{1},\ldots,M_{m}\}\) from and how do we single out one member of the set as the equilibrium macro-state? Second, as already illustrated in Figure 3, an approach to equilibrium takes place if the time evolution of the system is such that a micro-state \(x\) in a non-equilibrium macro-region evolves such that \(\phi_{t}(x)\) lies in the equilibrium macro-region at a later point in time. Ideally one would want this to happen for all \(x\) in any non-equilibrium macro-region, because this would mean that all non-equilibrium states would eventually approach equilibrium. The question now is whether this is indeed the case, and, if not, what “portion” of states evolves differently. Before turning to these questions, let us introduce the Boltzmann entropy \(S_{B}\), which is a property of a macro-state defined through the measure of the macro-state’s macro-region: \(S_{B}(M_{i}) = k \log\mu(X_{M_{i}})\) for all \(i = 1,\ldots, m\), where \(k\) is the so-called Boltzmann constant. Since the logarithm is a monotonic function, the larger the measure \(\mu\) of a macro-region, the larger the entropy of the corresponding macro-state. This framework is the backbone of positions that self-identify as “Boltzmannian”. Differences appear in how the elements of this framework are articulated and in how difficulties are resolved. 4.2 Defining Equilibrium: Boltzmann’s Combinatorial Argument An influential way of defining equilibrium goes back to Boltzmann (1877); for contemporary discussion of the argument see, for instance, Albert (2000), Frigg (2008b), and Uffink (2007). The approach first focusses on the state space of one particle of the system, which in the case of a gas has six dimensions (three for the particle’s positions in each spatial dimension and a further three for the corresponding momenta). We then introduce a grid on this space—an operation known as coarse-graining—and say that two particles have the same coarse-grained micro-state if they are in the same grid cell. The state of the entire gas is then represented by an arrangement, a specification of \(n\) points on this space (one for each particle in the gas). But for the gas’ macro-properties it is irrelevant which particle is in which state, meaning that the gas’ macro-state must be unaffected by a permutation of the particles. All that the macro-state depends on is the distribution of particles, a specification of how many particles are in each grid cell.
The core idea of the approach is to determine how many arrangements are compatible with a given distribution, and to define the equilibrium state as the one for which this number is maximal. Making the strong (and unrealistic) assumption that the particles in the gas are non-interacting (which also means that they never collide) and that the energy of the gas is preserved, Boltzmann offered a solution to this problem and showed that the distribution for which the number of arrangements is maximal is the so-called discrete Maxwell-Boltzmann distribution \(n_{i} = \alpha e^{- \beta E_{i}}\), where \(n_{i}\) is the number of particles in cell \(i\) of the coarse-graining, \(E_{i}\) is the energy of a particle in that cell, and \(\alpha\) and \(\beta\) are constants that depend on the number of particles and the temperature of the system (Tolman 1938 [1979]: Ch. 4). From a mathematical point of view, deriving this distribution is a problem in combinatorics, which is why the approach is now known as the combinatorial argument. As Paul and Tatiana Ehrenfest pointed out in their 1911 [1959] review, the mathematical structure of the argument also shows that if we now return to the state space \(X\) of the entire system (which, recall, has \(6n\) dimensions), the macro-region of the equilibrium state thus defined is the largest of all macro-regions. Hence, the equilibrium macro-state is the macro-state with the largest macro-region. In contemporary discussions this is customarily glossed as the equilibrium macro-state not only being larger than any other macro-state, but as being enormously larger and in fact taking up most of \(X\) (see, for instance, Goldstein 2001). However, as Lavis (2008) points out, the formalism only shows that the equilibrium macro-region is larger than any other macro-region and it is not a general truism that it takes up most of the state space; there are in fact systems in which the non-equilibrium macro-regions taken together are larger than the equilibrium macro-region. Since, as we have seen, the Boltzmann entropy is a monotonic function of the measure of a macro-region, this implies that the equilibrium macro-state is also the macro-state with the largest Boltzmann entropy, and the approach to equilibrium is a process that can be characterised by an increase of entropy. Two questions arise: first, is this a tenable general definition of equilibrium, and, second, how does it explain the approach to equilibrium? As regards the first question, Uffink (2007) highlights that the combinatorial argument assumes particles to be non-interacting. The result can therefore be seen as a good approximation for dilute gases, but it fails to describe (even approximately) interacting systems like liquids and solids. But important applications of SM are to systems that are not dilute gases and so this is a significant limitation. Furthermore, from a conceptual point of view, the problem is that a definition of equilibrium in terms of the number of arrangements compatible with a distribution makes no contact with the thermodynamic notion of equilibrium, where equilibrium is defined as the state to which an isolated system converges when left to itself (Werndl & Frigg 2015b). Finally, this definition of equilibrium is completely disconnected from the system’s dynamics, which has the odd consequence that it would still provide an equilibrium state even if the system’s time evolution were the identity function (and hence nothing ever changed and no approach to equilibrium took place).
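The combinatorial argument can be made tangible with a brute-force toy computation (ours, not Boltzmann’s own calculation; the cell energies, particle number, and total energy are arbitrary illustrative choices). For a handful of non-interacting particles distributed over four energy cells with fixed total energy, the script enumerates all distributions, counts the arrangements \(n!/(n_{1}!\cdots n_{k}!)\) compatible with each, and prints the distribution for which this number is maximal:

```python
from math import factorial
from itertools import product

def arrangements(dist):
    """Number of arrangements (assignments of labelled particles to cells)
    compatible with the distribution (n_1, ..., n_k): n! / (n_1! ... n_k!)."""
    w = factorial(sum(dist))
    for n_i in dist:
        w //= factorial(n_i)
    return w

# Toy gas: 10 particles over cells with energies 0, 1, 2, 3 (arbitrary units),
# total energy fixed at 10. All numbers are illustrative.
n_particles, energies, e_total = 10, [0, 1, 2, 3], 10

best = None
for dist in product(range(n_particles + 1), repeat=len(energies)):
    if sum(dist) != n_particles:
        continue  # wrong particle number
    if sum(n_i * e for n_i, e in zip(dist, energies)) != e_total:
        continue  # wrong total energy
    w = arrangements(dist)
    if best is None or w > best[1]:
        best = (dist, w)

print("distribution with the most arrangements:", best[0])
print("number of compatible arrangements:", best[1])
```

For these numbers the maximising distribution is (4, 3, 2, 1): the occupation numbers fall off with cell energy, in the qualitative manner the discrete Maxwell-Boltzmann distribution leads one to expect.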
And even if one were to set thermodynamics aside, there is nothing truly macro about the definition, which in fact directly constructs a macro-region without ever specifying a macro-state. A further problem (still as regards the first question) is the justification of coarse-graining. The combinatorial argument does not get off the ground without coarse-grained micro-states, and so the question is what legitimises the use of such states. The problem is accentuated by the facts that the procedure only works for a particular kind of coarse-graining (namely if the grid is parallel to the position and momentum axes) and that the grid cannot be eliminated by taking a limit which lets the grid size tend toward zero. A number of justificatory strategies have been proposed but none is entirely satisfactory. A similar problem arises with coarse-graining in Gibbsian SM, and we refer the reader to Section 6.5 for a discussion. As regards the second question, the combinatorial argument itself is silent about why and how systems approach equilibrium and additional ingredients must be added to the account to provide such an explanation. Before discussing some of these ingredients (which is the topic of much of the remainder of this section), let us discuss two challenges that every explanation of the approach to equilibrium must address: the reversibility problem and the recurrence problem. 4.3 Two Challenges: Reversibility and Recurrence In Section 3 we have seen that at bottom the physical systems of BSM have the structure of a dynamical system \((X,\) \(\phi,\) \(\mu)\) where \(\phi\) is deterministic and measure preserving. Systems of this kind have two features that pose a challenge for an understanding of the approach to equilibrium. The first feature is what is known as time-reversal invariance. Intuitively you can think of the time-reversal of a process as what you get when you play a movie of a process backwards. The dynamics of a system is time-reversal invariant if every process that is allowed to happen in one direction of time is also allowed to happen in the reverse direction of time. That is, for every process that is allowed by the theory it is the case that if you capture the process in a movie, then the process that you see when you play the movie backwards is also allowed by the theory; for detailed and more technical discussions see, for instance, Earman (2002), Malament (2004), Roberts (2022), and Uffink (2001). Hamiltonian systems are time-reversal invariant and so the most common systems studied in SM have this property. A look at Figure 3 makes the consequences of this for an understanding of the approach to equilibrium clear. We consider a system whose micro-state initially lies in a non-equilibrium macro-region and then evolves into a micro-state that lies in the equilibrium macro-region. Obviously, this process ought to be allowed by the theory. But this means that the reverse process—a process that starts in the equilibrium macro-region and moves back into the initial non-equilibrium macro-region—must be allowed too. In Section 1 we have seen that the approach to equilibrium is expected to be irreversible, prohibiting systems like gases from spontaneously leaving equilibrium and evolving into a non-equilibrium state. But we are now faced with a contradiction: if the dynamics of the system is time-reversal invariant, then the approach to equilibrium cannot be irreversible because the evolution from the equilibrium state to a non-equilibrium state is allowed.
This observation is known as Loschmidt’s reversibility objection because it was first put forward by Loschmidt (1876); for a historical discussion of this objection, see Darrigol (2021). The second feature that poses a challenge is Poincaré recurrence. The systems of interest in BSM are both measure-preserving and spatially bounded: they are gases in a box, liquids in a container and crystals on a laboratory table. This means that the system’s micro-state can only access a finite region in \(X\). Poincaré showed that dynamical systems of this kind must, at some point, return arbitrarily close to their initial state, and, indeed, do so infinitely many times. The time that it takes the system to return close to its initial condition is called the recurrence time. Like time-reversal invariance, Poincaré recurrence contradicts the supposed irreversibility of the approach to equilibrium: it implies that systems will return to non-equilibrium states at some point. One just has to wait long enough. This is known as Zermelo’s recurrence objection because it was first put forward by Zermelo (1896); for a historical discussion see Uffink (2007). Any explanation of the approach to equilibrium has to address these two objections. 4.4 The Ergodic Approach A classical explanation of the approach to equilibrium is given within ergodic theory. A system is ergodic iff, in the long run (i.e., in the limit of time \(t \rightarrow \infty\)), for almost all initial conditions it is the case that the fraction of time that the system’s trajectory spends in a region \(R\) of \(X\) is equal to the fraction that \(R\) occupies in \(X\) (Arnold & Avez 1967 [1968]). For instance, if \(\mu(R)/\mu(X) = 1/3,\) then an ergodic system will, in the long run, spend 1/3 of its time in \(R\) (for a more extensive discussion of ergodicity see the entry on the ergodic hierarchy). In Section 4.2 we have seen that if the equilibrium macro-region is constructed with the combinatorial argument, then it occupies the largest portion of \(X\). If we now also assume that the system is ergodic, it follows immediately that the system spends the largest portion of time in equilibrium. This is then often given a probabilistic gloss by associating the time that a system spends in a certain part of \(X\) with the probability of finding the system in that part of \(X\), and so we get that we are overwhelmingly likely to find the system in equilibrium; for a discussion of this approach to probabilities see Frigg (2010) and references therein. The ergodic approach faces a number of problems. First, being ergodic is a stringent condition that many systems fail to meet. This is a problem because among those systems are many to which SM is successfully applied. For instance, in a solid the molecules oscillate around fixed positions in a lattice, and as a result the phase point of the system can only access a small part of the energy hypersurface (Uffink 2007: 1017). The Kac Ring model and a system of anharmonic oscillators behave thermodynamically but fail to be ergodic (Bricmont 2001). And even the ideal gas—supposedly the paradigm system of SM—is not ergodic (Uffink 1996b: 381). But if core systems of SM are not ergodic, then ergodicity cannot provide an explanation for the approach to equilibrium, at least not one that is applicable across the board (Earman & Rédei 1996; van Lith 2001).
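Ergodicity is easy to visualise numerically. The following sketch (ours; the rotation angle and the region are arbitrary choices) uses rotation of the circle by an irrational angle, one of the simplest ergodic systems, and shows that the long-run fraction of time a trajectory spends in a region approaches the region’s measure, whatever the initial condition:

```python
import math

def rotate(x, alpha):
    """Rotation of the circle [0, 1) by alpha: measure-preserving, and
    ergodic whenever alpha is irrational."""
    return (x + alpha) % 1.0

alpha = math.sqrt(2) - 1        # an irrational rotation angle
r_lo, r_hi = 0.0, 1.0 / 3.0     # region R with mu(R) = 1/3
steps = 100_000

for x0 in [0.1, 0.5, 0.9]:      # several different initial conditions
    x, time_in_r = x0, 0
    for _ in range(steps):
        if r_lo <= x < r_hi:
            time_in_r += 1
        x = rotate(x, alpha)
    print(f"x0 = {x0}: fraction of time in R = {time_in_r / steps:.4f}")
```

Each run prints a value close to \(\mu(R) = 1/3\) regardless of the starting point; for a non-ergodic system (a rational rotation, for instance) the time fraction would instead depend on the initial condition.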
Attempts have been made to improve the situation through the notion of epsilon-ergodicity, where a system is epsilon-ergodic if it is ergodic only on a subset \(Y \subset X\) where \(\mu(Y) \geq 1 - \varepsilon\), for a small positive real number \(\varepsilon\) (Vranas 1998). While this approach deals successfully with some systems (Frigg & Werndl 2011), it is still not universally applicable and hence remains silent about large classes of SM systems. The ergodic approach accommodates Loschmidt’s and Zermelo’s objections by rejecting the requirement of strict irreversibility. The approach insists that systems can, and actually do, move away from equilibrium. What SM should explain is not strict irreversibility, but the fact that systems spend most of the time in equilibrium. The ergodic approach does this by construction, and only allows for brief and infrequent episodes of non-thermodynamic behaviour (when the system moves out of equilibrium). This response is in line with Callender (2001) who argues that we should not take thermodynamics “too seriously” and see its strictly irreversible approach to equilibrium as an idealisation that is not empirically accurate because physical systems turn out to exhibit equilibrium fluctuations. A more technical worry is what is known as the measure zero problem. As we have seen, ergodicity says that “almost all initial conditions” are such that the fraction of time spent in \(R\) is equal to the fraction \(R\) occupies in \(X\). In technical terms this means that the set of initial conditions for which this is not the case has measure zero (with respect to \(\mu\)). Intuitively this would seem to suggest that these conditions are negligible. However, as Sklar (1993: 182–88) points out, sets of measure zero can be rather large (remember that the set of rational numbers has measure zero in the real numbers), and the problem is to justify why a set of measure zero really is negligible. 4.5 Typicality An alternative account explains the approach to equilibrium in terms of typicality. Intuitively something is typical if it happens in the “vast majority” of cases: typical lottery tickets are blanks, and in a typical series of a thousand coin tosses the ratio of the number of heads and the number of tails is approximately one. The leading idea of a typicality-based account of SM is to show that thermodynamic behaviour is typical and is therefore to be expected. The typicality account comes in different versions, which disagree on how exactly typicality reasoning is put to use; different versions have been formulated, among others, by Goldstein (2001), Goldstein and Lebowitz (2004), Goldstein, Lebowitz, Tumulka, and Zanghì (2006), Lebowitz (1993a, 1993b), and Volchan (2007). In its paradigmatic version, the account builds on the observation (discussed in Section 4.2) that the equilibrium macro-region is so large that \(X\) consists almost entirely of equilibrium micro-states, which means that equilibrium micro-states are typical in \(X\). The account submits that, for this reason, a system that starts its time-evolution in a non-equilibrium state can simply not avoid evolving into a typical state—i.e., an equilibrium state—and staying there for a very long time, which explains the approach to equilibrium.
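The claim that equilibrium micro-states utterly dominate the state space can be checked directly in the left/right toy model used earlier (our illustration; the particle numbers and the 5% tolerance are arbitrary). With all \(2^{n}\) micro-states weighted equally, the fraction of micro-states in which the two halves of the box are occupied to within five percentage points of half-and-half, i.e., the “equilibrium-like” ones, tends to one as \(n\) grows:

```python
from math import comb

def typical_fraction(n, eps=0.05):
    """Fraction of the 2**n equally weighted micro-states of n particles
    (each in the left or right half of a box) in which the share of
    particles in the left half lies within eps of 1/2."""
    lo, hi = (0.5 - eps) * n, (0.5 + eps) * n
    count = sum(comb(n, k) for k in range(n + 1) if lo <= k <= hi)
    return count / 2**n

for n in [10, 100, 1000, 10000]:
    print(f"n = {n:5d}: fraction of near-balanced micro-states = "
          f"{typical_fraction(n):.6f}")
```

The fraction climbs from roughly 0.25 at n = 10 toward 1 as n grows; for a gas with particle numbers of the order of \(10^{23}\) the dominance of equilibrium-like micro-states is correspondingly overwhelming, which is the intuition the typicality account trades on.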
Frigg (2009, 2011) and Uffink (2007) argue that from the point of view of dynamical systems theory this is unjustified because there is no reason to assume that micro-states in an atypical set have to evolve into a typical set without there being any further dynamical assumptions in place. To get around this problem Frigg and Werndl (2012) formulate a version of the approach that takes the dynamics of the system into account. Lazarovici and Reichert (2015) disagree that such additions are necessary. For further discussions of the use of typicality in SM, see Badino (2020), Bricmont (2022), Chibbaro, Rondoni and Vulpiani (2022), Crane and Wilhelm (2020), Goldstein (2012), Hemmo and Shenker (2015), Luczak (2016), Maudlin (2020), Reichert (forthcoming), and Wilhelm (2022). As far as Loschmidt’s and Zermelo’s objections are concerned, the typicality approach has to make the same move as the ergodic approach and reject strict irreversibility as a requirement. 4.6 The Mentaculus and the Past-Hypothesis An altogether different approach has been formulated by Albert (2000). This approach focusses on the internal structure of macro-regions and aims to explain the approach to equilibrium by showing that the probability for a system in a non-equilibrium macro-state to evolve toward a macro-state of higher Boltzmann entropy is high. The basis for this discussion is the so-called statistical postulate. Consider a particular macro-state \(M\) with macro-region \(X_{M}\) and assume that the system is in macro-state \(M\). The postulate then says that for any subset \(A\) of \(X_{M}\) the probability of finding the system’s micro-state in \(A\) is \(\mu(A)/\mu(X_{M})\). We can now separate the micro-states in \(X_{M}\) into those that evolve into a higher entropy macro-state and those that move toward macro-states of lower entropy. Let’s call these sets \(X_{M}^{+}\) and \(X_{M}^{-}\). The statistical postulate then says that the probability of a system in \(M\) evolving toward a higher entropy macro-state is \(\mu(X_{M}^{+})/\mu(X_{M})\). For it to be likely that a system approaches equilibrium this probability would have to be high. It now turns out that for purely mathematical reasons, if the system is highly likely to evolve toward a macro-state of higher entropy, then it is also highly likely to have evolved into the current macro-state \(M\) from a macro-state of high entropy. In other words, if the entropy is highly likely to increase in the future, it is also highly likely to have decreased in the past. Albert suggests solving this problem by regarding the entire universe as the system being studied and then conditionalizing on the Past-Hypothesis, which is the assumption that the world first came into being in “whatever particular low-entropy highly condensed big-bang sort of macrocondition it is that the normal inferential procedures of cosmology will eventually present to us” (2000: 96). Let \(M_{p}\) be the past state, the state in which the world first came into being according to the Past-Hypothesis, and let \(I_{t} = \phi_{t}(X_{M_{p}}) \cap X_{M}\) be the intersection of the time-evolved macro-region of the past state and the current macro-state. The probability of a high-entropy future is then \(\mu(I_{t} \cap X_{M}^{+})/\mu(I_{t})\). If we further assume that “abnormal” states with low-entropy futures are scattered all over \(X_{M}\), then a high-entropy future can be highly likely without a high-entropy past also being highly likely.
This approach to SM is based on three core elements: the deterministic time evolution of the system given by \(\phi_{t}\), the Past-Hypothesis, and the statistical postulate. Together they result in the assignment of a probability to propositions about the history of a system. Albert (2015) calls this assignment the Mentaculus. Albert regards the Mentaculus not only as an account of thermodynamic phenomena, but as the backbone of a complete scientific theory of the universe because the Mentaculus assigns probabilities to propositions in all sciences. This raises all kinds of issues about the nature of laws, reduction, and the status of the special sciences, which are discussed, for instance, in Frisch (2011), Hemmo and Shenker (2021) and Myrvold and others (2016). Like the ergodic approach, the Mentaculus must accommodate Loschmidt’s and Zermelo’s objections by rejecting the requirement of strict irreversibility. Higher to lower entropy transitions are still allowed, but they are rendered unlikely, and recurrence can be tamed by noting that the recurrence time for a typical SM system is larger than the age of the universe, which means that we won’t observe recurrence (Bricmont 1995; Callender 1999). Yet, this amounts to admitting that entropy increase is not universal and that the formalism is compatible with there being periods of decreasing entropy at some later point in the history of the universe. A crucial ingredient of the Mentaculus is the Past-Hypothesis. The idea of grounding thermodynamic behaviour in a cosmic low-entropy past can be traced back to Boltzmann (Uffink 2007: 990) and has since been advocated by prominent physicists like Feynman (1965: Ch. 5) and R. Penrose (2004: Ch. 27). This raises two questions: first, can the Past-Hypothesis be given a precise formulation that serves the purpose of SM, and, second, what status does the Past-Hypothesis have and does the fact that the universe started in this particular state require an explanation? As regards the first question, Earman has cast the damning verdict that the Past-Hypothesis is “not even false” (2006) because in cosmologies described in general relativity there is no well-defined sense in which the Boltzmann entropy has a low value. A further problem is that in the Mentaculus the Boltzmann entropy is a global quantity characterising the entire universe. But, as Winsberg points out, the fact that this quantity is low does not imply that the entropy of a particular small subsystem of interest is also low, and, worse, just because the overall entropy of the universe increases it need not be the case that the entropy in a small subsystem also increases (2004a). The source of these difficulties is that the Mentaculus takes the entire universe to be the relevant system and so one might try to get around them by reverting to where we started: laboratory systems like gases in boxes. One can then take the past state simply to be the state in which such a gas is prepared at the beginning of a process (say in the left half of the container). This leads to the so-called branch systems approach, because a system is seen as “branching off” from the rest of the universe when it is isolated from its environment and prepared in a non-equilibrium state (Davies 1974; Sklar 1993: 318–32). Albert (2000) dismisses this option for a number of reasons, chief among them that it is not clear why one should regard the statistical postulate as valid for such a state (see Winsberg (2004b) for a discussion).
As regards the second question, Chen (forthcoming), Goldstein (2001), and Loewer (2001) argue that the Past-Hypothesis has the status of a fundamental law of nature. Albert seems to regard it as something like a Kantian regulative principle in that its truth must be assumed in order to make knowledge of the past possible at all. By contrast, Callender, Price, and Wald regard the Past-Hypothesis as a contingent matter of fact, but they disagree on whether this fact stands in need of an explanation. Price (1996, 2004) argues that it does because the crucial question in SM is not why entropy increases, but rather why it ever got to be low in the first place. Callender (1998, 2004a, 2004b) disagrees: the Past-Hypothesis simply specifies initial conditions of a process, and initial conditions are not the kind of thing that needs to be explained (see also Sklar (1993: 309–18)). Parker (2005) argues that conditionalising on the initial state of the universe does not have the explanatory power to explain irreversible behaviour. Baras and Shenker (2020) and Farr (2022) analyse the notion of explanation that is involved in this debate and argue that different questions are in play that require different answers. 4.7 The Long-Run Residence Time Account The long-run residence time account offers a different perspective both on the definition of equilibrium and the approach to it (Werndl & Frigg 2015a, 2015b). Rather than first defining equilibrium through combinatorial considerations (as in §4.2) and then asking why systems approach equilibrium thus defined (as do the accounts discussed in §§4.4–4.6), the long-run residence time account defines equilibrium through thermodynamic behaviour. The account begins by characterising the macro-states in the set \(\{ M_{1},\ldots,M_{m}\}\) in purely macroscopic terms, i.e., through thermodynamic variables like pressure and temperature, and then identifies the state in which a system resides most of the time as the equilibrium state: among the \(M_{i}\), the equilibrium macro-state is by definition the state in which a system spends most of its time in the long run (which gives the account its name). This definition requires no assumption about the size of the equilibrium macro-region, but one can then show that it is a property of the equilibrium macro-state that its macro-region is large. This result is fully general in that it does not depend on assumptions like particles being non-interacting (which makes it applicable to all systems including liquids and solids), and it does not depend on combinatorial considerations at the micro-level. The approach to equilibrium is built into the definition in the sense that if there is no macro-state in which the system spends most of its time, then the system simply has no equilibrium. This raises the question of the circumstances under which an equilibrium exists. The account answers this question by providing a general existence theorem which furnishes criteria for the existence of an equilibrium state (Werndl & Frigg forthcoming-b). Intuitively, the existence theorem says that there is an equilibrium just in case the system’s state space is split up into invariant regions on which the motion is ergodic and the equilibrium macro-state is largest in size relative to the other macro-states on each such region.
Like the accounts previously discussed, the long-run residence time account accommodates Loschmidt’s and Zermelo’s objections by rejecting the requirement of strict irreversibility: it insists that being in equilibrium most of the time is as much as one can reasonably ask for because actual physical systems show equilibrium fluctuations and equilibrium is not the dead and immovable state that thermodynamics says it is. 4.8 Problems and Limitations BSM enjoys great popularity in foundational debates due to its clear and intuitive theoretical structure. Nevertheless, BSM faces a number of problems and limitations. The first problem is that BSM only deals with closed systems that evolve under their own internal dynamics. As we will see in Section 6, GSM successfully deals with systems that can exchange energy and even particles with their environments, and systems of this kind play an important role in SM. Those who think that SM only deals with the entire universe can set this problem aside because the universe (arguably) is a closed system. However, those who think that the objects of study in SM are laboratory-size systems like gases and crystals will have to address the issue of how BSM can accommodate interactions between systems and their environments, which is a largely ignored problem. A second problem is that even though macro-states are ubiquitous in discussions about BSM, little attention is paid to a precise articulation of what these states are. There is loose talk about how a system looks from a macroscopic perspective, or there is a vague appeal to thermodynamic variables. However, by the lights of thermodynamics, variables like pressure and temperature are defined only in equilibrium and it remains unclear how non-equilibrium states, and with them the approach to equilibrium, should be characterised in terms of thermodynamic variables. Frigg and Werndl (forthcoming-a) suggest solving this problem by defining macro-states in terms of local field-variables, but the issue needs further attention. A third problem is that current formulations of BSM are closely tied to deterministic classical systems (§3). Some versions of BSM can be formulated based on classical stochastic systems (Werndl & Frigg 2017). But the crucial question is whether, and if so how, a quantum version of BSM can be formulated (for a discussion see the entry on quantum mechanics). Dizadji-Bahmani (2011) discusses how a result due to Linden and others (2009) can be used to construct an argument for the conclusion that an arbitrarily small subsystem of a large quantum system typically tends toward equilibrium. Chen (forthcoming) formulates a quantum version of the Mentaculus, which he calls the Wentaculus (see also his 2022). Goldstein, Lebowitz, Tumulka, and Zanghì (2020) describe a quantum analogue of the Boltzmann entropy and argue that the Boltzmannian conception of equilibrium is vindicated also in quantum mechanics by recent work on thermalization of closed quantum systems. These early steps have not yet resulted in a comprehensive and widely accepted formulation of a quantum version of BSM, and the formulation of such a version remains an understudied topic. Albert (2000: Ch. 7) suggested that the spontaneous collapses of the so-called GRW theory (for an introduction see the entry on collapse theories), a particular approach to quantum mechanics, could be responsible for the emergence of thermodynamic irreversibility.
Te Vrugt, Tóth and Wittkowski (2021) put this proposal to the test in computer simulations and found that for initial conditions leading to anti-thermodynamic behaviour GRW collapses do not lead to thermodynamic behaviour and that therefore the GRW theory does not induce irreversible behaviour. Finally, there is no way around recognising that BSM is mostly used in foundational debates, but it is GSM that is the practitioner’s workhorse. When physicists have to carry out calculations and solve problems, they usually turn to GSM which offers user-friendly strategies that are absent in BSM. So either BSM has to be extended with practical prescriptions, or it has to be connected to GSM so that it can benefit from its computational methods (for a discussion of the latter option see §6.7). 5. The Boltzmann Equation A different approach to the problem is taken by Boltzmann in his famous (1872 [1966 Brush translation]) paper, which contains two results that are now known as the Boltzmann Equation and the H-theorem. As before, consider a gas, now described through a distribution function \(f_{t}(\vec{v})\), which specifies what fraction of molecules in the gas has a certain velocity \(\vec{v}\) at time \(t\). This distribution can change over time, and Boltzmann’s aim was to show that as time passes this distribution function changes so that it approximates the Maxwell-Boltzmann distribution, which, as we have seen in Section 4.2, is the equilibrium distribution for a gas. To this end, Boltzmann derived an equation describing the time evolution of \(f_{t}(\vec{v})\). The derivation assumes that the gas consists of particles of diameter \(D\) that interact like hard spheres (i.e., they interact only when they collide); that all collisions are elastic (i.e., no energy is lost); that the number of particles is so large that their distribution, which in reality is discrete, can be well approximated by a continuous and differentiable function \(f_{t}(\vec{v})\); and that the density of the gas is so low that only two-particle collisions play a role in the evolution of \(f_{t}(\vec{v})\). The crucial assumption in the argument is the so-called “Stosszahlansatz”, which specifies how many collisions of a certain type take place in a certain interval of time (the German “Stosszahlansatz” literally means something like “collision number assumption”). Assume the gas has \(N\) molecules per unit volume and the molecules are equally distributed in space. The type of collisions we are focussing on is the one between a particle with velocity \(\vec{v}_{1}\) and one with velocity \(\vec{v}_{2}\), and we want to know the number \(N(\vec{v}_{1}, \vec{v}_{2})\) of such collisions during a small interval of time \(\Delta t\). To solve this problem, we begin by focussing on one molecule with \(\vec{v}_{1}\). The relative velocity of this molecule and a molecule moving with \(\vec{v}_{2}\) is \(\vec{v}_{2} - \vec{v}_{1}\) and the absolute value of that relative velocity is \(\left\| \vec{v}_{2} - \vec{v}_{1} \right\|\). Molecules of diameter \(D\) only collide if their centres come closer than \(D\). So let us look at a cylinder with radius \(D\) and height \(\left\| \vec{v}_{2} - \vec{v}_{1} \right\|\Delta t\), which is the volume in space in which molecules with velocity \(\vec{v}_{2}\) would collide with our molecule during \(\Delta t\).
The volume of this cylinder is \(\pi D^{2}\left\| \vec{v}_{2} - \vec{v}_{1} \right\|\Delta t\). If we now make the strong assumption that the initial velocities of colliding particles are independent, it follows that the number of molecules with velocity \(\vec{v}_{2}\) in a unit volume of the gas at time \(t\) is \(Nf_{t}(\vec{v}_{2})\), and hence the number of such molecules in our cylinder is \(Nf_{t}(\vec{v}_{2})\,\pi D^{2}\left\| \vec{v}_{2} - \vec{v}_{1} \right\|\Delta t\). This is the number of collisions that the molecule we are focussing on can be expected to undergo during \(\Delta t\). But there is nothing special about this molecule, and we are interested in the number of all collisions between particles with velocities \(\vec{v}_{1}\) and \(\vec{v}_{2}\). To get to that number, note that the number of molecules with velocity \(\vec{v}_{1}\) in a unit volume of gas at time \(t\) is \(Nf_{t}(\vec{v}_{1})\). That is, there are \(Nf_{t}(\vec{v}_{1})\) molecules like the one we were focussing on. It is then clear that the total number of collisions can be expected to be the product of the number of collisions for each molecule with \(\vec{v}_{1}\) times the number of molecules with \(\vec{v}_{1}\): \(N(\vec{v}_{1}, \vec{v}_{2}) = N^{2}f_{t}(\vec{v}_{1})f_{t}(\vec{v}_{2})\,\pi D^{2}\left\| \vec{v}_{2} - \vec{v}_{1} \right\|\Delta t\). This is the Stosszahlansatz. For ease of presentation, we have made the mathematical simplification of treating \(f_{t}(\vec{v})\) as a fraction rather than as a density in our discussion of the Stosszahlansatz; for a statement of the Stosszahlansatz for densities see, for instance, Uffink (2007). Based on the Stosszahlansatz, Boltzmann derived what is now known as the Boltzmann Equation, in which \(\vec{v}_{1}^{*}\) and \(\vec{v}_{2}^{*}\) are the velocities of the particles after the collision and in which the integration is over the space of the box that contains the gas. This is a so-called integro-differential equation. The details of this equation need not concern us (and the mathematics of such equations is rather tricky). What matters is the overall structure, which says that the way the density \(f_{t}(\vec{v})\) changes over time depends on the difference of the products of the densities of the incoming and of the outgoing particles. Boltzmann then introduced the quantity \(H\lbrack f_{t}(\vec{v})\rbrack = \int f_{t}(\vec{v})\log f_{t}(\vec{v})\, d\vec{v}\), and proved that \(H\) decreases monotonically in time, and that \(H\) is stationary (i.e., \(dH\lbrack f_{t}(\vec{v}) \rbrack/dt = 0\)) iff \(f_{t}(\vec{v})\) is the Maxwell-Boltzmann distribution. These two results are the H-Theorem. The definition of \(H\) bears formal similarities both to the expression of the Boltzmann entropy in the combinatorial argument (§4.2) and, as we will see, to the Gibbs entropy (§6.3); in fact \(H\) looks like a negative entropy. For this reason the H-theorem is often paraphrased as showing that entropy increases monotonically until the system reaches the equilibrium distribution, which would provide a justification of thermodynamic behaviour based on purely mechanical assumptions. Indeed, in his 1872 paper, Boltzmann himself regarded it as a rigorous general proof of the Second Law of thermodynamics (Uffink 2007: 965; Klein 1973: 73). The crucial conceptual questions at this point are: what exactly did Boltzmann prove with the H-theorem? Under which conditions is the Boltzmann Equation valid? And what role do the assumptions, in particular the Stosszahlansatz, play in deriving it? The discussion of these questions started four years after the paper was published, when Loschmidt put forward his reversibility objection (§4.3). This objection implies that \(H\) must be able to increase as well as decrease.
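How deterministic, invertible, and exactly recurrent dynamics can nevertheless show Boltzmann-style relaxation for typical initial conditions is nicely illustrated by the Kac ring model mentioned in Section 4.4. The sketch below is our own implementation of that standard teaching model (the site number, marker density, and step counts are arbitrary choices): balls of colour +1 or −1 sit on n ring sites, every ball moves one site clockwise per time step, and a ball flips its colour when it crosses one of a fixed, randomly chosen set of marked edges. The colour imbalance, which plays the role of an H-like quantity, typically decays, even though after 2n steps the dynamics restores every ball to its initial colour:

```python
import random

random.seed(1)
n, marker_density, steps = 2000, 0.01, 400

# Quenched disorder: a fixed set of marked edges (edge i joins site i to i+1).
marked = [random.random() < marker_density for _ in range(n)]
colours = [1] * n  # far-from-equilibrium start: every ball coloured +1

def step(colours):
    """One deterministic time step: the ball on site i-1 moves to site i,
    flipping its colour if edge i-1 is marked."""
    return [-colours[i - 1] if marked[i - 1] else colours[i - 1]
            for i in range(n)]

for t in range(steps + 1):
    if t % 50 == 0:
        print(f"t = {t:3d}: colour imbalance = {sum(colours) / n:+.3f}")
    colours = step(colours)

# The dynamics is exactly recurrent: after 2n steps every ball has crossed
# every edge exactly twice, so the initial state returns.
colours = [1] * n
for _ in range(2 * n):
    colours = step(colours)
print("state restored after 2n steps:", colours == [1] * n)
```

A Stosszahlansatz-style calculation predicts that the imbalance decays roughly like \((1 - 2p)^{t}\) for marker density \(p\), and the simulation tracks this for times short compared to the recurrence time; the model thereby shows in miniature how molecular-chaos-type assumptions yield apparently irreversible behaviour from reversible, recurrent dynamics.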
Boltzmann’s own response to Loschmidt’s challenge and the question of the scope of the H-theorem is a matter of much debate; for discussions see, for instance, Brown, Myrvold, and Uffink (2009), Cercignani (1998), Brush (1976), and Uffink (2007). We cannot pursue this matter here, but the gist of Boltzmann’s reply would seem to have been that he admitted that there exist initial states for which \(H\) increases, but that these rarely, if ever, occur in nature. This leads to what is now known as a statistical reading of the H-theorem: the H-theorem shows entropy increase to be likely rather than universal. A century later, Lanford published a string of papers (1973, 1975, 1976, 1981) culminating in what is now known as Lanford’s theorem, which provides rigorous results concerning the validity of the Boltzmann Equation. Lanford’s starting point is the question whether, and if so in what sense, the Boltzmann equation is consistent with the underlying Hamiltonian dynamics. To this end, note that every point \(x\) in the state space \(X\) of a gas has a distribution \(f_{x}(\vec{r}, \vec{v})\) associated with it, where \(\vec{r}\) and \(\vec{v}\) are, respectively, the location and velocity of one particle (recall from §3 that \(X\) contains the position and momenta of all molecules). For a finite number of particles \(f_{x}(\vec{r}, \vec{v})\) is not continuous, let alone differentiable. So as a first step, Lanford developed a way to obtain a differentiable distribution function \(f^{(x)}(\vec{r}, \vec{v})\), which involves taking the so-called Boltzmann-Grad limit. He then evolved this distribution forward in time both under the fundamental Hamiltonian dynamics, which yields \(f_{\text{Ht}}^{(x)}(\vec{r}, \vec{v})\), and under the Boltzmann Equation, which yields \(f_{\text{Bt}}^{(x)}(\vec{r}, \vec{v})\). Lanford’s theorem compares these two distributions and essentially says that for most points \(x\) in \(X\), \(f_{\text{Ht}}^{(x)}(\vec{r}, \vec{v})\) and \(f_{\text{Bt}}^{(x)}(\vec{r}, \vec{v})\) are close to each other for times in the interval \(\left\lbrack 0, t^{*} \right\rbrack,\) where \(t^{*}\) is a cut-off time (where “most” is judged by the so-called microcanonical measure on the phase space; for discussion of this measure see §6.1). For rigorous statements and further discussions of the theorem see Ardourel (2017), Uffink and Valente (2015), and Valente (2014). Lanford’s theorem is a remarkable achievement because it shows that a statistical and approximate version of the Boltzmann Equation can be derived from Hamiltonian mechanics and most initial conditions in the Boltzmann-Grad limit for a finite amount of time. In this sense it can be seen as a vindication of Boltzmann’s statistical version of the H-theorem. At the same time the theorem also highlights the limitations of the approach. The relevant distributions are close to each other only up to time \(t^{*}\), and it turns out that \(t^{*}\) is roughly two fifths of the mean time a particle moves freely between two collisions. But this is a very short time! During the interval \(\left\lbrack 0, t^{*} \right\rbrack\), which for a gas like air at room temperature is in the order of microseconds, on average 40% of the molecules in the gas will have been involved in one collision and the other 60% will have moved freely.
This is patently too short to understand macroscopic phenomena like the one that we described at the beginning of this article, which take place on a longer timescale and will involve many collisions for all particles. And like Boltzmann’s original results, Lanford’s theorem also depends on strong assumptions, in particular a measure-theoretic version of the Stosszahlansatz (cf. Uffink & Valente 2015). Finally, one of the main conceptual problems concerning Lanford’s theorem is where the apparent irreversibility comes from. Various opinions have been expressed on this issue. Lanford himself first argued that irreversibility results from passing to the Boltzmann-Grad limit (Lanford 1975: 110), but later changed his mind and argued that the Stosszahlansatz for incoming collision points is responsible for the irreversible behaviour (1976, 1981). Cercignani, Illner, and Pulvirenti (1994) and Cercignani (2008) claim that irreversibility arises as a consequence of assuming a hard-sphere dynamics. Valente (2014) and Uffink and Valente (2015) argue that there is no genuine irreversibility in the theorem because the theorem is time-reversal invariant. For further discussions on the role of irreversibility in Lanford’s theorem, see also Lebowitz (1983), Spohn (1980, 1991), and Weaver (2021, 2022). 6. Gibbsian Statistical Mechanics (GSM) Gibbsian Statistical Mechanics (GSM) is an umbrella term covering a number of positions that take Gibbs (1902 [1981]) as their point of departure. In this section, we introduce the framework and discuss different articulations of it along with the issues they face. 6.1 The Framework of GSM Like BSM, GSM departs from the dynamical system \((X,\) \(\phi,\) \(\mu)\) introduced in Section 3 (although, as we will see below, it readily generalises to quantum mechanics). But this is where the commonalities end. Rather than partitioning \(X\) into macro-regions, GSM puts a probability density function \(\rho(x)\) on \(X\), often referred to as a “distribution”. This distribution evolves under the dynamics of the system through the law \(\rho_{t}(x) = \rho_{0}(\phi_{- t}(x))\), where \(\rho_{0}\) is the distribution at the initial time \(t_{0}\) and \(\phi_{- t}(x)\) is the micro-state that evolves into \(x\) during \(t\). A distribution is called stationary if it does not change over time, i.e., \(\rho_{t}(x)= \rho_{0}(x)\) for all \(t\). If the distribution is stationary, Gibbs says that the system is in “statistical equilibrium”. At the macro-level, a system is characterised by macro-variables, which are functions \(f:X\rightarrow \mathbb{R}\), where \(\mathbb{R}\) is the set of real numbers. With the exception of entropy and temperature (to which we turn below), GSM takes all physical quantities to be represented by such functions. The so-called phase average of \(f\) is \(\langle f\rangle = \int_{X}f(x)\rho_{t}(x)\,dx\). The question now is how to interpret this formalism. The standard interpretation is in terms of what is known as an ensemble. An ensemble is an infinite collection of systems of the same kind that differ in their state. Crucially, this is a collection of copies of the entire system and not a collection of molecules. For this reason, Schrödinger characterised an ensemble as a collection of “mental copies of the one system under consideration” (1952 [1989: 3]). Hence the members of an ensemble do not interact with each other; an ensemble is not a physical object; and ensembles have no spatiotemporal existence.
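Before turning to how the formalism is interpreted in terms of probabilities and measurement, the ensemble picture can be made concrete with a small sampling sketch (ours, not Gibbs’; the temperature and particle mass are arbitrary choices, and we rely on the standard fact that under the canonical distribution each velocity component of an ideal-gas particle is an independent Gaussian with variance \(kT/m\)). Drawing members at random from such an ensemble and averaging a macro-variable over the draws estimates its phase average; for the kinetic energy of a single particle the estimate converges on the familiar equipartition value \((3/2)kT\):

```python
import math
import random

random.seed(0)
k_b = 1.380649e-23   # Boltzmann constant in J/K
temp = 300.0         # temperature in K
mass = 6.6e-26       # particle mass in kg (roughly an argon atom)

def kinetic_energy_of_random_member():
    """Draw one ensemble member: under the canonical distribution each
    velocity component of an ideal-gas particle is an independent
    Gaussian with variance kT/m."""
    sigma = math.sqrt(k_b * temp / mass)
    vx, vy, vz = (random.gauss(0.0, sigma) for _ in range(3))
    return 0.5 * mass * (vx**2 + vy**2 + vz**2)

n = 100_000  # number of ensemble members drawn
estimate = sum(kinetic_energy_of_random_member() for _ in range(n)) / n
print(f"Monte Carlo phase average of kinetic energy: {estimate:.4e} J")
print(f"analytic phase average (3/2) k T:            {1.5 * k_b * temp:.4e} J")
```

The question pursued in what follows is what, if anything, entitles us to equate such phase averages with the outcome of a measurement performed on a single system.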
The distribution can then be interpreted as specifying “how many” systems in the ensemble have their state in a certain region \(R\) of \(X\) at time \(t\). More precisely, \(\rho_{t}(x)\) is interpreted as giving the probability of finding a system in \(R\) at \(t\) when drawing a system randomly from the ensemble, in much the same way in which one draws a ball from an urn: \(p_{t}(R) = \int_{R} \rho_{t}(x)\, dx\). What is the right distribution for a given physical situation? Gibbs discusses this problem at length and formulates three distributions which are still used today: the microcanonical distribution for isolated systems, the canonical distribution for systems with fluctuating energy, and the grand-canonical distribution for systems with both fluctuating energy and fluctuating particle number. For a discussion of the formal aspects of these distributions see, for instance, Tolman (1938 [1979]), and for philosophical discussions see Davey (2008, 2009) and Myrvold (2016). Gibbs’ statistical equilibrium is a condition on an ensemble being in equilibrium, which is different from an individual system being in equilibrium (as introduced in §1). The question is how the two relate, and what an experimenter who measures a physical quantity on a system observes. A standard answer one finds in SM textbooks appeals to the averaging principle: when measuring the quantity \(f\) on a system in thermal equilibrium, the observed equilibrium value of the property is the ensemble average \(\langle f\rangle\) of an ensemble in ensemble-equilibrium. The practice of applying this principle is often called phase averaging. One of the core challenges for GSM is to justify this principle. 6.2 Equilibrium: Why Does Phase Averaging Work? The standard justification of phase averaging that one finds in many textbooks is based on the notion of ergodicity that we have already encountered in Section 4.4. In the current context, we consider the infinite time average \(f^{*}\) of the function \(f\), defined as \(f^{*} = \lim_{T \rightarrow \infty} \frac{1}{T} \int_{0}^{T} f(\phi_{t}(x))\, dt\) for an initial state \(x\). It is a mathematical fact that ergodicity as defined earlier is equivalent to it being the case that \(f^{*} = \langle f \rangle\) for almost all initial states. This is said to provide a justification for phase averaging as follows. Assume we carry out a measurement of the physical quantity represented by \(f\). It will take some time to carry out the measurement, and so what the measurement device registers is the time average over the duration of the measurement. Since the time needed to make the measurement is long compared to the time scale on which typical molecular processes take place, the measured result is approximately equal to the infinite time average \(f^{*}\). By ergodicity, \(f^{*}\) is equal to \(\langle f\rangle\), which justifies the averaging principle. This argument fails for several reasons (Malament & Zabell 1980; Sklar 1993: 176–9). First, from the fact that measurements take time it does not follow that what is measured are time averages, and even if one could argue that measurement devices output time averages, these would be finite time averages, and equating finite time averages with infinite time averages is problematic because the two can assume very different values even if the duration of the finite measurement is very long. Second, this account makes a mystery of how we observe change. As we have seen in Section 1, we do observe how systems approach equilibrium, and in doing so we observe macro-variables changing their values.
If measurements produced infinite time averages, then no change would ever be observed because these averages are constant. Third, as we already noted earlier, ergodicity is a stringent condition and many systems to which SM is successfully applied are not ergodic (Earman & Rédei 1996), which makes equating time averages and phase averages wrong. A number of approaches have been designed to either solve or circumvent these problems. Malament and Zabell (1980) suggest a method of justifying phase averaging that still invokes ergodicity but avoids an appeal to time averages. Vranas (1998) offers a reformulation of this argument for systems that are epsilon-ergodic (see §4.4). This accounts for systems that are “almost” ergodic, but remains silent about systems that are far from being ergodic. Khinchin (1949) restricts attention to systems with a large number of degrees of freedom and so-called sum functions (i.e., functions that are a sum of one-particle functions), and shows that for such systems \(f^{*} = \langle f\rangle\) holds on the largest part of \(X\); for a discussion of this approach see Batterman (1998) and Badino (2006). However, as Khinchin himself notes, the focus on sum functions is too restrictive to cover realistic systems, and the approach also has to revert to the implausible posit that observations yield infinite time averages. This led to a research programme now known as the “thermodynamic limit”, aiming to prove “Khinchin-like” results under more realistic assumptions. Classic statements are Ruelle (1969, 2004); for a survey and further references see Uffink (2007: 1020–8). A different approach to the problem insists that one should take the status of \(\rho(x)\) as a probability seriously and seek a justification of averaging in statistical terms. In this vein, Wallace (2015) insists that the quantitative content of statistical mechanics is exhausted by the statistics of observables (their expectation values, variances, and so on), and McCoy (2020) submits that \(\rho(x)\) is the complete physical state of an individual statistical mechanical system. Such a view renounces the association of measurement outcomes with phase averages and insists that measurements are “an instantaneous act, like taking a snapshot” (O. Penrose 1970: 17–18): if a measurement of the quantity associated with \(f\) is performed on a system at time \(t\) and the system’s micro-state at time \(t\) is \(x(t)\), then the measurement outcome at time \(t\) will be \(f(x(t))\). An obvious consequence of this definition is that measurements at different times can have different outcomes, and the values of macro-variables can change over time. One can then look at how these values change over time. One way of doing this is to look at fluctuations away from the average: \(\Delta(t) = f(x(t)) - \langle f \rangle\), where \(\Delta(t)\) is the fluctuation away from the average at time \(t\). One can then expect that the outcome of a measurement will be \(\langle f\rangle\) if fluctuations turn out to be small and infrequent. Although this would not seem to be the received textbook position, something like it can be identified in some texts, for instance Hill (1956 [1987]) and Schrödinger (1952 [1989]). A precise articulation will have to use \(\rho\) to calculate the probability of fluctuations of a certain size, and this requires the system to meet stringent dynamical conditions, namely either the masking condition or the f-independence condition (Frigg & Werndl 2021).
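To get a feel for why fluctuations away from \(\langle f\rangle\) can be expected to be small for macroscopic systems, consider the following minimal sketch in Python. It is an illustrative toy in the spirit of Khinchin’s sum functions, not a realistic gas model; the choice of observable and all parameter values are assumptions made for the example.

```python
import numpy as np

# Toy illustration: for a "sum function" F built from many independent
# molecular degrees of freedom, the relative fluctuation about the
# average shrinks like 1/sqrt(N). Here g is the kinetic energy of a 1-d
# Maxwellian velocity in units with m = k_B*T = 1 (an illustrative
# assumption, not a realistic model of a gas).
rng = np.random.default_rng(0)
for N in [100, 10_000, 1_000_000]:
    # 100 independent "snapshots" of a system of N molecules
    F = np.array([0.5 * (rng.normal(size=N) ** 2).sum() for _ in range(100)])
    print(N, F.std() / F.mean())   # relative fluctuation, approx sqrt(2/N)
```

For \(N\) of the order of Avogadro’s number the relative spread becomes entirely negligible, which is the intuition behind treating \(\langle f\rangle\) as the expected measurement outcome for macroscopic sum functions.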
6.3 GSM and Approach to Equilibrium As discussed so far, GSM is an equilibrium theory, and this is also how it is mostly used in applications. Nevertheless, a comprehensive theory of SM must also account for the approach to equilibrium. To discuss the approach to equilibrium, it is common to introduce the Gibbs entropy \(S_{G}(\rho) = -k_{B} \int_{X} \rho(x) \log \rho(x)\, dx\), where \(k_{B}\) is the Boltzmann constant. The Gibbs entropy is a property of an ensemble characterised by a distribution \(\rho\). One might then try to characterise the approach to equilibrium as a process in which \(S_{G}\) increases monotonically to finally reach a maximum in equilibrium. But this idea is undercut immediately by a mathematical theorem saying that \(S_{G}\) is a constant of motion: \(S_{G}(\rho_{t}) = S_{G}(\rho_{0})\) for all times \(t\). So not only does \(S_{G}\) fail to increase monotonically; it does not change at all! This precludes a characterisation of the approach to equilibrium in terms of increasing Gibbs entropy. Hence, either such a characterisation has to be abandoned, or the formalism has to be modified to allow \(S_{G}\) to increase. A second problem is a consequence of the Gibbsian definition of statistical equilibrium. As we have seen in §6.1, a system is in statistical equilibrium if \(\rho\) is stationary. A system away from equilibrium would then have to be associated with a non-stationary distribution and eventually evolve into the stationary equilibrium distribution. But this is mathematically impossible. It is a consequence of the formalism of GSM that a distribution that is stationary at some point in time has to be stationary at all times (past and future), and that a distribution that is non-stationary at some point in time will always be non-stationary. So an ensemble cannot evolve from a non-stationary distribution to a stationary one. This requires either a change in the definition of equilibrium, or a change in the formalism that would allow distributions to change in the requisite way. In what follows we discuss the main attempts to address these problems. For alternative approaches that we cannot cover here see Frigg (2008b: 166–68) and references therein. 6.4 Coarse-Graining Gibbs was aware of the problems with the approach to equilibrium and proposed coarse-graining as a solution (Gibbs 1902 [1981]: Ch. 12). This notion has since been endorsed by many practitioners (see, for instance, Farquhar 1964 and O. Penrose 1970). We have already encountered coarse-graining in §4.2. Its use here is different, though, because we are now putting a grid on the full state space \(X\) and not just on the one-particle space. One can then define a coarse-grained density \(\bar{\rho}\) by saying that at every point \(x\) in \(X\) the value of \(\bar{\rho}\) is the average of \(\rho\) over the grid cell in which \(x\) lies. The advantage of coarse-graining is that the coarse-grained distribution is not subject to the same limitations as the original distribution. Specifically, let us call the Gibbs entropy that is calculated with the coarse-grained distribution the coarse-grained Gibbs entropy. It turns out that the coarse-grained Gibbs entropy is not a constant of motion, and it is possible for this entropy to increase. This re-opens the avenue of understanding the approach to equilibrium in terms of an increase of entropy. It is also possible for the coarse-grained distribution to evolve so that it is spread out evenly over the entire available space and thereby comes to look like a micro-canonical equilibrium distribution.
Such a distribution is also known as the quasi-equilibrium distribution (Blatt 1959; Ridderbos 2002). Coarse-graining raises two questions. First, the coarse-grained entropy can increase and the system can approach a coarse-grained equilibrium, but under what circumstances will it actually do so? Second, is it legitimate to replace standard equilibrium by quasi-equilibrium? As regards the first question, the standard answer (which also goes back to Gibbs) is that the system has to be mixing. Intuitively speaking, a system is mixing if every subset of \(X\) ends up being spread out evenly over the entire state space in the long run (for a more detailed account of mixing see the entry on the ergodic hierarchy). The problem is that mixing is a very demanding condition. In fact, being mixing implies being ergodic (because mixing is strictly stronger than ergodicity). As we have already noted, many relevant systems are not ergodic, and hence a fortiori not mixing. Even if a system is mixing, the mixed state is only achieved in the limit \(t \rightarrow \infty\), but real physical systems reach equilibrium in finite time (indeed, in most cases rather quickly). As regards the second question, the first point to note is that a tacit shift has occurred: Gibbs initially defined equilibrium through stationarity, while the above argument defines it through uniformity. This needs further justification, but in principle there would seem to be nothing to stop us from redefining equilibrium in this way. The motivation for adopting quasi-equilibrium is that \(\bar{\rho}\) and \(\rho\) are empirically indistinguishable. If the size of the grid is below the measurement precision, no measurement will be able to tell the difference between the two, and phase averages calculated with the two distributions agree. Hence there is no reason to prefer \(\rho\) to \(\bar{\rho}\). This premise has been challenged. Blatt (1959) and Ridderbos and Redhead (1998) argue that it is wrong because the spin-echo experiment (Hahn 1950) makes it possible to discriminate empirically between \(\rho\) and \(\bar{\rho}\). The significance of this experiment remains controversial, with some authors insisting that it invalidates the coarse-graining approach (Ridderbos 2002) and others insisting that coarse-graining can still be defended (Ainsworth 2005; Lavis 2004; Robertson 2020). For further discussion see Myrvold (2020b). 6.5 Interventionism The approaches we have discussed so far assume that systems are isolated. This is an idealising assumption because real physical systems are not perfectly isolated from their environment. This is the starting point for the interventionist programme, which is based on the idea that real systems are constantly subject to outside perturbations, and that it is exactly these perturbations that drive the system into equilibrium. In other words, it is these interventions from outside the system that are responsible for its approach to equilibrium, which is what earns the position the name interventionism. This position was formulated by Blatt (1959) and further developed by Ridderbos and Redhead (1998). The key insight behind the approach is that the two challenges introduced in Section 6.3 vanish once the system is not assumed to be isolated: the entropy can increase, and a non-stationary distribution can be pushed toward a distribution that is stationary in the future.
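The first of these claims can be illustrated in a deliberately simple setting. The following sketch uses a two-dimensional Gaussian ensemble with a measure-preserving rotation standing in for Hamiltonian flow and isotropic Gaussian kicks standing in for the environment; these are illustrative assumptions, not a model of a real gas.

```python
import numpy as np

# Interventionist intuition in miniature: under a measure-preserving
# rotation the fine-grained entropy of a Gaussian ensemble is exactly
# constant, but adding small isotropic Gaussian kicks ("the environment")
# makes it grow. No equilibrium is reached in this toy; the point is only
# that perturbations break the constancy of the fine-grained entropy.
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = 1e-2 * np.eye(2)                     # covariance of the environmental kicks

def gaussian_entropy(cov):
    # differential entropy of a 2-d Gaussian (units with k_B = 1)
    return 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(cov))

cov_isolated = np.diag([1.0, 0.01])      # a strongly concentrated ensemble
cov_perturbed = np.diag([1.0, 0.01])
for t in range(6):
    print(10 * t, round(gaussian_entropy(cov_isolated), 3),
          round(gaussian_entropy(cov_perturbed), 3))
    for _ in range(10):
        cov_isolated = R @ cov_isolated @ R.T
        cov_perturbed = R @ cov_perturbed @ R.T + Q
```

The isolated column stays fixed while the perturbed column rises, which is the behaviour the interventionist points to.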
This approach accepts that isolated systems do not approach equilibrium, and critics question whether this is plausible. If one places a gas like the one we discussed in Section 1 somewhere in interstellar space where it is isolated from outside influences, will it really sit there confined to the left half of the container and not spread? And even if this were the case, would adding just any environment resolve the issue? Interventionists sometimes seem to suggest that this is the case, but in an unqualified form this claim cannot be right. Environments can be of very different kinds, and there is no general theorem that says that any environment drives a system to equilibrium. Indeed, there are reasons to assume that there is no such theorem because while environments do influence systems, they need not drive them to equilibrium. So it remains an unresolved question under what conditions environments drive systems to equilibrium. Another challenge for interventionism is that one is always free to consider a larger system, consisting of our original system plus its environment. For instance, we can consider the “gas + box” system. This system would then also approach equilibrium because of outside influences, and we can then again form an even larger system. So we get into a regress that only ends once the system under study is the entire universe. But the universe has no environment that could serve as a source of perturbations, which, so the criticism goes, shows that the programme fails. Whether one sees this criticism as decisive depends on one’s views of laws of nature. The argument relies on the premise that the underlying theory is a universal theory, i.e., one that applies to everything that there is without restrictions. The reader can find an extensive discussion in the entry on laws of nature. At this point we just note that while universality is widely held, some have argued against it because laws are always tested in highly artificial situations, and claiming that they equally apply outside these settings involves a problematic inductive leap; see for instance Cartwright (1999) for a discussion of such a view. If true, this undercuts the above argument against interventionism. 6.6 The Epistemic Account The epistemic account urges a radical reconceptualization of SM. The account goes back to Tolman (1938 [1979]) and has been brought to prominence by Jaynes in a string of publications between 1955 and 1980, most of which are gathered in Jaynes (1983). On this approach, SM is about our knowledge of the world and not about the world itself, and the probability distributions in GSM represent our state of knowledge about a system and not some matter of fact. The centrepiece of this interpretation is the fact that the Gibbs entropy is formally identical to the Shannon entropy in information theory, which is a measure of the lack of information about a system: the higher the entropy, the less we know (for a discussion of the Shannon entropy see the entry on information, §4.2). The Gibbs entropy can therefore be seen as quantifying our lack of information about a system. This has the advantage that ensembles are no longer needed in the statement of GSM. On the epistemic account, there is only one system, the one on which we are performing our experiments, and \(\rho\) describes what we know about it.
This also offers a natural criterion for identifying equilibrium distributions: they are the distributions with the highest entropy consistent with the external constraints on the system, because such distributions are the least committal. This explains why we expect equilibrium to be associated with maximum entropy. This is known as Jaynes’ maximum entropy principle (MEP). MEP has been the subject of much controversy, and, to date, there is no consensus on its significance, or even its cogency. For discussions see, for instance, Denbigh and Denbigh (1985), Howson and Urbach (2006), Lavis (1977), Lavis and Milligan (1985), Seidenfeld (1986), Shimony (1985), Uffink (1995, 1996a), and Williamson (2010). The epistemic approach also assumes that experimental outcomes correspond to phase averages, but as we have seen, this is a problematic assumption (§6.1). A further concern is that the system’s own dynamics plays no role in the epistemic approach. This is problematic because if the dynamics has invariant quantities, a system cannot access certain parts of the state space even though \(\rho\) may assign a non-zero probability to them (Sklar 1993: 193–4). The epistemic account’s explanation of the approach to equilibrium relies on making repeated measurements and conditionalizing on each measurement result; for a discussion see Sklar (1993: 255–257). This gets around the problem that the Gibbs entropy is constant, because the value assignments now depend not only on the system’s internal dynamics but also on the action of an experimenter. The problem with this solution is that, depending on how exactly the calculations are done, either the entropy increase fails to be monotonic (indeed entropy decreases are possible) or the entropy curve becomes dependent on the sequence of instants of time chosen to carry out measurements (Lavis & Milligan 1985). However, the most fundamental worry about the epistemic approach is that it fails to realise the fundamental aim of SM, namely to explain how and why processes in nature take place, since these processes cannot possibly depend on what we know about them. Surely, so the argument goes, the boiling of kettles or the spreading of gases has something to do with how the molecules constituting these systems behave and not with what we happen (or fail) to know about them (Redhead 1995; Albert 2000; Loewer 2001). For further discussions of the epistemic approach see Anta (forthcoming-a, forthcoming-b), Shenker (2020), and Uffink (2011). 6.7 The Relation between GSM and BSM A pressing and yet understudied question in the philosophy of SM concerns the relation between GSM and BSM. GSM provides the tools and methods to carry out a wide range of equilibrium calculations, and it is the approach predominantly used by practitioners in the field. Without it, the discipline of SM would not be able to operate (Wallace 2020). BSM is conceptually neat and is preferred by philosophers when they give foundational accounts of SM. So what we are facing is a schism whereby the day-to-day work of physicists is done in one framework while foundational accounts and explanations are given in another (Anta 2021a). This would not be worrisome if the frameworks were equivalent, or at least inter-translatable in a relatively clear way. As the discussion in the previous sections has made clear, this is not the case. And what is more, in some contexts the formalisms do not even give empirically equivalent predictions (Werndl & Frigg 2020b).
This raises the question of how exactly the two approaches are related. Lavis (2005) proposes a reconciliation of the two frameworks through giving up on the binary property of the system either being or not being in equilibrium, which should be replaced by the continuous property of commonness. Wallace (2020) argues that GSM is a more general framework in which the Boltzmannian approach may be understood as a special case. Frigg and Werndl suggest that BSM is a fundamental theory and GSM is an effective theory that offers means to calculate values defined in BSM (Frigg & Werndl 2019; Werndl & Frigg 2020a). Goldstein (2019) plays down the difference between them and argues that the conflict is not as great as often imagined. Finally, Goldstein, Lebowitz, Tumulka, and Zanghì (2020) compare the Boltzmann entropy and the Gibbs entropy and argue that the two notions yield the same (leading order) values for the entropy of a macroscopic system in thermal equilibrium. 7. Further Issues So far we have focussed on the questions that arise in the articulation of the theory itself. In this section we discuss some further issues that arise in connection with SM, explicitly excluding a discussion of the direction of time and other temporal asymmetries, which have their own entry in this encyclopedia (see the entry on thermodynamic asymmetry in time). 7.1 The Interpretation of SM Probabilities How to interpret probabilities is a problem with a long philosophical tradition (for a survey of different views see the entry on interpretations of probability). Since SM introduces probabilities, there is a question of how these probabilities should be interpreted. This problem is particularly pressing in SM because, as we have seen, the underlying mechanical laws are deterministic. This is not a problem so long as the probabilities are interpreted epistemically, as in Jaynes’ account (§6.6). But, as we have seen, a subjective interpretation seems to clash with the realist intuition that SM is a physical theory that tells us how things are independently of what we happen to know about them. This requires probabilities to be objective. Approaches to SM that rely on ergodic theory tend to interpret probabilities as time averages, which is natural because ergodicity provides such averages. However, long-run time averages are not a good indicator of how a system behaves because, as we have seen, they are constant and so do not indicate how a system behaves out of equilibrium. Furthermore, interpreting long-run time averages as probabilities is motivated by the fact that these averages seem to be close cousins of long-run relative frequencies. But this association is problematic for a number of reasons (Emch 2005; Guttmann 1999; van Lith 2003; von Plato 1981, 1982, 1988, 1994). An alternative is to interpret SM probabilities as propensities, but many regard this as problematic because propensities would ultimately seem to be incompatible with a deterministic underlying micro theory (Clark 2001). Loewer (2001) suggested that we interpret SM probabilities as Humean objective chances in Lewis’ sense (1980) because the Mentaculus (see §4.6) is a best system in Lewis’ sense. Frigg (2008a) identifies some problems with this interpretation, and Frigg and Hoefer (2015) formulate an alternative Humean account that is designed to overcome these issues.
For further discussion of Humean chances in SM, see Beisbart (2014), Dardashti, Glynn, Thébault, and Frisch (2014), Hemmo and Shenker (2022), Hoefer (2019), and Myrvold (2016, 2021). 7.2 Maxwell’s Demon and the Entropy Costs of Computation Consider the following scenario, which originates in a letter that Maxwell wrote in 1867 (see Knott 1911). Recall the vessel with a partition wall that we encountered in Section 1, but vary the setup slightly: rather than having one side empty, the two sides of the vessel are filled with gases of different temperatures. Additionally, there is now a shutter in the wall which is operated by a demon. The demon carefully observes all the molecules. Whenever a particle on the cooler side moves towards the shutter, the demon checks its velocity, and if the velocity of the particle is greater than the mean velocity of the particles on the hotter side of the vessel, he opens the shutter and lets the particle pass through to the hotter side. The net effect of the demon’s actions is that the hotter gas becomes even hotter and the colder gas becomes even colder. This means that there is a heat transfer from the cooler to the hotter gas without any work being done, because the heat transfer is solely due to the demon’s skill and intelligence in sorting the molecules. Yet, according to the Second Law of thermodynamics, this sort of heat transfer is not allowed. So we arrive at the conclusion that the demon’s actions result in a violation of the Second Law of thermodynamics. Maxwell interpreted this scenario as a thought experiment showing that the Second Law of thermodynamics is not an exceptionless law and that it has only “statistical certainty” (see Knott 1911; Hemmo & Shenker 2010). Maxwell’s demon has given rise to a vast literature, some of it in prestigious physics journals. Much of this literature has focused on exorcising the demon, i.e., on showing that a demon would not be physically possible. Broadly speaking, there are two approaches. The first approach is commonly attributed to Szilard (1929 [1990]), but goes back also to von Neumann (1932 [1955]) and Brillouin (1951 [1990]). The core idea of this approach is that gaining information that allows us to distinguish between \(n\) equally likely states comes at a necessary minimum cost in thermodynamic entropy of \(k \log(n)\), which is the entropy dissipated by the system that gains information. Since the demon has to gain information to decide whether to open the shutter, the Second Law of thermodynamics is not violated. The second approach is based on what is now called Landauer’s principle, which states that in erasing information that can discern between \(n\) states, a minimum thermodynamic entropy of \(k \log(n)\) is dissipated (Landauer 1961 [1990]). Proponents of the principle argue that because a demon has to erase information on memory devices, Landauer’s principle prohibits a violation of the Second Law of thermodynamics. In two influential articles, Earman and Norton (1998, 1999) lament that, from the point of view of philosophy of science, the literature on exorcising the demon lacks rigour and reflection on what the goals of the enterprise are, and that the demon has been discussed from various different perspectives, often leading to confusion. Earman and Norton argue that the appeal to information theory has not resulted in a decisive exorcism of Maxwell’s demon. They pose a dilemma for the proponent of an information theoretic exorcism of Maxwell’s demon.
Either the combined system of the vessel and the demon is already assumed to be subject to the Second Law of thermodynamics, in which case it is trivial that the demon will fail. Or, if this is not assumed, then proponents of the information theoretic exorcism have to supply new physical principles that guarantee the failure of the demon, and they have to give independent grounds for these principles. Yet, in Earman and Norton’s view, such independent grounds have not been convincingly established. Bub (2001) and Bennett (2003) responded to Earman and Norton that if one assumes that the demon is subject to the Second Law of thermodynamics, the merit of Landauer’s principle is that it shows where the thermodynamic costs arise. Norton (2005, 2017) replies that no precise general principle has been stated that says how erasure and the merging of computational paths necessarily lead to an increase in thermodynamic entropy. He concludes that the literature on Landauer’s principle is too fragile and too tied to a few specific examples to sustain general claims about the failure of Maxwell’s demon. Maroney (2005) argues that thermodynamic entropy and information-theoretic entropy are conceptually different, and that hence, in general, Landauer’s principle fails. The discussions around Maxwell’s demon are now so extensive that they defy documentation in an introductory survey of SM. Classical papers on the matter are collected in Leff and Rex (1990). For more recent discussion see, for instance, Anta (2021b), Hemmo and Shenker (2012, 2019), Ladyman and Robertson (2013, 2014), Leff and Rex (1994), Myrvold (forthcoming), Norton (2013), and references therein. 7.3 The Gibbs Paradox So far, we have considered how one gas evolves. Now let’s look at what happens when we mix two gases. Again, consider a container with a partition wall in the middle, but now imagine that there are two different gases on the left and on the right (for instance helium and hydrogen), where both gases have the same temperature. We now remove the partition, and the gases start to spread and mix. If we then calculate the entropy of the initial and the final state of the two gases, we find that the entropy of the mixture is greater than the entropy of the gases in their initial compartments. This is the result that we expect. The paradox arises from the fact that the calculations do not depend on the fact that the gases are different: if we assume that we have air of the same temperature on both sides of the barrier, the calculations still yield an increase in entropy when the barrier is removed. This seems wrong because it would imply that the entropy of a gas depends on its history and cannot be a function of its thermodynamic state alone (as thermodynamics requires). This is known as the Gibbs Paradox. The standard textbook resolution of the paradox is that classical SM gets the entropy wrong because it counts states that differ only by a permutation of two indistinguishable particles as distinct, which is a mistake (Huang 1963). So the problem is rooted in the notion of individuality, which is seen as inherent to classical mechanics. Therefore, so the argument goes, the problem is resolved by quantum mechanics, which treats indistinguishable particles in the right way. This argument raises a number of questions concerning the nature of individuality in classical and quantum mechanics, the way of counting states in both the Boltzmann and the Gibbs approach, and the relation of SM to thermodynamics.
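The entropy bookkeeping behind the paradox is elementary and can be checked directly. The following back-of-the-envelope calculation uses the standard ideal-gas result for isothermal expansion; the one-mole amounts are illustrative choices, not anything fixed by the discussion above.

```python
import math

# Mixing entropy for two distinct ideal gases, n moles each, at equal
# temperature and pressure in the two halves of a vessel: removing the
# partition lets each gas expand into twice its volume, so
#     delta_S = 2 * n * R * ln(2).
# Naively applying the same calculation to identical gases on both sides
# yields the very same (spurious) increase -- the Gibbs paradox.
R = 8.314462618        # molar gas constant, J/(mol*K)
n = 1.0                # moles per compartment (illustrative value)
delta_S = 2 * n * R * math.log(2)
print(delta_S)         # about 11.5 J/K
```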
Classical discussions include Denbigh and Denbigh (1985: Ch. 4), Denbigh and Redhead (1989), Jaynes (1992), Landé (1965), Rosen (1964), and van Kampen (1984). For more recent discussions, see, for instance, Huggett (1999), Saunders (2006), and Wills (forthcoming), as well as the contributions to Dieks and Saunders (2018) and references therein. 7.4 SM Beyond Physics Increasingly, the methods of SM are used to address problems outside physics. Costantini and Garibaldi (2004) present a generalised version of the Ehrenfest flea model and show that it can be used to describe a wide class of stochastic processes, including problems in population genetics and macroeconomics. Colombo and Palacios (2021) discuss the application of the free energy principle in biology. The most prolific applications of SM methods outside physics are in economics and finance, where an entire field is named after them, namely econophysics. For discussions of different aspects of econophysics see Jhun, Palacios, and Weatherall (2018), Kutner et al. (2019), Rickles (2007, 2011), Schinckus (2018), Thébault, Bradley, and Reutlinger (2017), and Voit (2005).
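For readers who want to see it concretely, the Ehrenfest flea model mentioned above can be simulated in a few lines. This is a minimal sketch; the numbers of fleas and steps are arbitrary choices made for the example.

```python
import random

# The Ehrenfest "flea" (urn) model in minimal form: N fleas sit on two
# dogs; at each step a uniformly chosen flea jumps to the other dog. The
# count on dog A relaxes towards N/2 and then fluctuates around it -- a
# stock illustration of statistical equilibration.
random.seed(42)
N, on_A = 1000, 1000              # start with all fleas on dog A
for step in range(1, 5001):
    if random.randrange(N) < on_A:    # the chosen flea was on dog A
        on_A -= 1
    else:
        on_A += 1
    if step % 1000 == 0:
        print(step, on_A)             # settles near N/2 = 500
```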

Boltzmann’s Work in Statistical Physics

1. Introduction 1.1 Popular perceptions of Boltzmann Boltzmann's work met with mixed reactions during his lifetime, and continues to do so even today. It may be worthwhile, therefore, to devote a few remarks to the perception and reception of his work. Boltzmann is often portrayed as a staunch defender of the atomic view of matter, at a time when the dominant opinion in the German-speaking physics community, led by influential authors like Mach and Ostwald, disapproved of this view. Indeed, the story goes, in the late nineteenth century any attempt at all to search for a hypothetical, microphysical underpinning of macroscopic phenomena was regarded as suspicious. Further, serious criticism of his work was raised by Loschmidt and Zermelo. Various passages in Boltzmann's writing, especially in the late 1890s, complain that his work was hardly noticed (he entitled one article "On some of my lesser-known papers on gas theory and their relation to the same" (1879b)), speak of a "hostile attitude" (1898a, v) towards gas theory, and of his awareness of "being a powerless individual struggling against the currents of the time" (ibid.). Thus, the myth has arisen that Boltzmann was ignored or resisted by his contemporaries.[1] Sometimes, his suicide in 1906 is attributed to the injustice he thus suffered. The fact that his death occurred just at the dawn of the definitive victory of the atomic view in the works of Einstein, Smoluchowski, Perrin et al. adds a further touch of drama to this picture. As a matter of fact, Boltzmann was widely known and well respected as a theoretical physicist. In 1888 he was offered (but declined, after a curious sequence of negotiations) a most prestigious chair in Berlin. Later, several universities (Vienna, Munich, Leipzig) competed to get him appointed, sometimes putting the salaries of several professorships together in their effort (Lindley 2001). He was elected to membership or honorary membership in many academies (cf. Höflechner 1994, 192), received honorary doctorates, and was also awarded various medals. In short, there is no factual evidence for the claim that Boltzmann was ignored or suffered any unusual lack of recognition from his contemporaries. His suicide seems to have been due to factors in his personal life (depressions and decline of health) rather than to any academic matters. 1.2 Debates and controversies Boltzmann was involved in various disputes. But this is not to say that he was the innocent victim of hostilities. In many cases he took the initiative by launching a polemic attack on his colleagues. I will focus below on the most important disputes: with Mach and Ostwald on the reality of atoms, and with colleagues who criticized Boltzmann's own work in the form of the famous reversibility objection (Loschmidt) and the recurrence objection (Zermelo). For a wider sketch of how contemporary scientists took positions in the debate on the topics of mechanism and irreversibility I refer to van Strien (2013). Ostwald and Mach clearly resisted the atomic view of matter (although for different reasons). Boltzmann certainly defended and promoted this view. But he was not the naive realist or unabashed believer in the existence of atoms that the more popular literature has made of him. Instead, he stressed from the 1880s onwards that the atomic view yielded at best an analogy, or a picture or model of reality (cf. de Regt 1999).
In his debate with Mach he advocated (1897c, 1897d) this approach as a useful or economical way to understand the thermal behavior of gases. This means that his views were quite compatible with Mach's views on the goal of science.[2] What divided them was more a strategic issue. Boltzmann claimed that no approach in natural science that avoids hypotheses completely could ever succeed. He argued that those who reject the atomic hypothesis in favor of a continuum view of matter were guilty of adopting hypotheses too. Ultimately, the choice between such views should depend on their fruitfulness, and here Boltzmann had no doubt that the atomic hypothesis would be more successful.[3] In the case of Ostwald, and his 'energetics', Boltzmann did become involved in a more heated dispute at a meeting in Lübeck in 1895. Roughly speaking, energetics presented a conception of nature that took energy as the most fundamental physical entity, and thus represented physical processes as transformations of various forms of energy. It resisted attempts to comprehend energy, or these transformations, in terms of mechanical pictures. It has been suggested that in the 1890s "the adherents of energetics reigned supreme in the German school and even throughout Europe" (Dugas 1959, 82). But this is surely a great exaggeration. It seems closer to the truth to say that energetics represented a rather small (but vocal) minority in the physics community, which put forward a seemingly attractive conception of natural science and, being promoted in the mid-90s by reputable scientists, could no longer be dismissed as the work of amateurs (cf. Deltete 1999). The 1895 gathering of the Naturforscherversammlung in Lübeck (the annual meeting of physicists, chemists, biologists and physicians) was scheduled to devote special sessions to the state of the art of energetics. Boltzmann, who was a member of the programme committee, had already shown interest in the development of energetics in private correspondence with Ostwald. Georg Helm was asked to prepare a report, and at Boltzmann's own suggestion, Ostwald also contributed a lecture. All agreed that the meeting should follow the "British style", i.e., manuscripts would be circulated beforehand and there would be ample room for discussion, following the example of the British Association for the Advancement of Science meeting that Boltzmann had attended the previous year. Both Helm and Ostwald apparently anticipated that they would have the opportunity to discuss their views on energetics in an open-minded atmosphere. But at the meeting Boltzmann surprised them with devastating criticism. According to those who were present, Boltzmann was the clear winner of the debate.[4] Yet the energeticists experienced the confrontation as an ambush for which they had not been prepared (Höflechner 1994, I, 169). Nevertheless, Boltzmann and Ostwald remained friends, and in 1902 Ostwald made a great effort to persuade his home university in Leipzig to appoint Boltzmann (cf. Blackmore 1995, 61–65). Neither is there any hostile attitude in the famous 'reversibility objection' by Loschmidt in 1875. Loschmidt was Boltzmann's former teacher and later colleague at the University of Vienna, and a life-long friend. He had no philosophical reservations against the existence of atoms at all. (Indeed, he is best known for his estimate of their size.)
Rather, his main objection was against the prediction by Maxwell and Boltzmann that a gas column in thermal equilibrium in a gravitational field has the same temperature at all heights. His now famous reversibility objection arose in his attempts to undermine this prediction. Whether Boltzmann succeeded in refuting the objection or not is still a matter of dispute, as we shall see below (section 4.1). Zermelo's opposition had a quite different background. When he put forward the recurrence objection in 1896, he was an assistant to Planck in Berlin. And like his mentor, he did not favor the mechanical underpinning of thermal phenomena. Yet his 1896 paper (Zermelo 1896a) is by no means hostile. It presents a careful logical argument that leads him to a dilemma: thermodynamics with its Second Law on the one hand and gas theory (in the form in which Zermelo understood it) on the other cannot both be literally true. By contrast, it is Boltzmann's (1896b) reaction to Zermelo, drenched in sarcasm and bitterness, which (if anything) may have led to hostile feelings between these two authors. In any case, the tone of Zermelo's (1896b) is considerably sharper. Still, Zermelo maintained a keen, yet critical, interest in gas theory and statistical physics, and subsequently played an important role in making Gibbs' work known in Germany. In fact, I think that Boltzmann's rather aggressive reactions to Zermelo and Ostwald should be compared to other polemical exchanges in which he was involved, and sometimes initiated himself (e.g. against Clausius, Tait, Planck, and Bertrand — not to mention his essay on Schopenhauer). It seems to me that Boltzmann enjoyed polemics, and the use of sharp language for rhetorical effect.[5] Boltzmann's complaints in 1896–1898 about a hostile environment are, I think, partly explained by his love of polemical exaggeration, and partly by his mental depression in that period (see Höflechner 1994, 198–202 for details). Certainly, the debates with Ostwald and Zermelo might well have contributed to this personal crisis. But it would be wrong to interpret Boltzmann's plaintive moods as evidence that his critics were, in fact, hostile. Even today, commentators on Boltzmann's works are divided in their opinion. Some praise them as brilliant and exceptionally clear. Often one finds passages suggesting he possessed all the right answers all along the way — or at least in his later writings, while his critics were simply prejudiced, confused or misguided (von Plato, Lebowitz, Kac, Bricmont, Goldstein). Others (the Ehrenfests, Klein, Truesdell) have emphasized that Boltzmann's work is not always clear and that he often failed to indicate crucial assumptions or important changes in his position, while friendly critics helped him in clarifying and developing his views. Fans and critics of Boltzmann's work alike agree that he pioneered many of the approaches currently used in statistical physics, but also that he did not leave behind a unified coherent theory. His scientific papers, collected in the Wissenschaftliche Abhandlungen, contain more than 100 papers on statistical physics alone. Some of these papers are forbiddingly long, full of tedious calculations, and lack a clear coherent structure. Sometimes, vital assumptions, or even a complete change of approach, are stated only somewhere tucked away between the calculations, or on the very last page.
Even Maxwell, who might have been in the best position to appreciate Boltzmann's work, expressed his difficulty with Boltzmann's long-windedness (in a letter to Tait, August 1873; see Garber, Brush, and Everett 1995, 123).[6] But not all of his prose is cumbersome and heavy-going. Boltzmann at his best could be witty, passionate and a delight to read. He excelled in such qualities in much of his popular work and some of his polemical articles. 1.3 Boltzmann's relevance for the foundations of statistical physics The foundations of statistical physics may today be characterized as a battlefield between a dozen or so different schools, each firmly dug into their own trenches, e.g., ergodic theory, coarse-graining, the approaches of Markovianism, interventionism, BBGKY, or those of Jaynes, Prigogine, etc. Still, many of the protagonists of these schools, regardless of their disagreements, frequently express their debt to ideas first formulated by Boltzmann. Even those who consider the concept of ensembles the most important tool of statistical physics, and claim Gibbs rather than Boltzmann as their champion, have been reminded that Boltzmann introduced ensembles long before Gibbs. And those who advocate Boltzmann while rejecting ergodic theory may similarly be reminded that the latter theory too originated with Boltzmann himself. It appears, therefore, that Boltzmann is the father of many approaches, even if these approaches are presently seen as conflicting with each other. This is due to the fact that during his forty years of work on the subject, Boltzmann pursued many lines of thought. Typically, he would follow a particular train of thought that he regarded as promising and fruitful, only to discard it in the next paper for another one, and then pick it up again years later. This meandering approach is of course not unusual among theoretical physicists, but it makes it hard to pin down Boltzmann on a particular set of rock-bottom assumptions that would reveal his true colors in the modern debate on the foundations of statistical physics. The Ehrenfests (1912), in their famous Encyclopedia article, set themselves the task of constructing a more or less coherent framework out of Boltzmann's legacy. But their presentation of Boltzmann was, as is rather well known, not historically adequate. Without going into a more detailed description of the landscape of the battlefield of the foundations of statistical physics, or a sketch of the various positions occupied, it might be useful to mention only the roughest of distinctions. I use the term 'statistical physics' as a deliberately vague term that includes at least two more sharply distinguished theories: the kinetic theory of gases and statistical mechanics proper. The first theory aims to explain the properties of gases by assuming that they consist of a very large number of molecules in rapid motion. (The term 'kinetic' is meant to underline the vital importance of motion here, and to distinguish the approach from older static molecular gas models.) During the 1860s probability considerations were imported into this theory. The aim then became to characterize the properties of gases, in particular in thermal equilibrium, in terms of probabilities of various molecular states. This is what the Ehrenfests call "kineto-statistics of the molecule". Here, molecular states, in particular their velocities, are regarded as stochastic variables, and probabilities are attached to such molecular states of motion.
These probabilities themselves are conceived of as mechanical properties of the state of the total gas system. Either they represent the relative number of molecules with a particular state, or the relative time during which a molecule has that state. In the course of time a transition was made to what the Ehrenfests called "kineto-statistics of the gas model", or what is nowadays known as statistical mechanics. In this latter approach, probabilities are not attached to the state of a molecule but to the state of the entire gas system. Thus, the state of the gas, instead of determining the probability distribution, now itself becomes a stochastic variable. A merit of this latter approach is that interactions between molecules can be taken into account. Indeed, the approach is not restricted to gases, but is also applicable to liquids or solids. The price to be paid, however, is that the probabilities themselves become more abstract. Since probabilities are attributed to the mechanical states of the total system, they are no longer determined by such mechanical states. Instead, in statistical mechanics, the probabilities are usually determined by means of an 'ensemble', i.e., a fictitious collection of replicas of the system in question. It is not easy to pinpoint this transition in the course of history, except to say that Maxwell's work in the 1860s definitely belongs to the first category, and Gibbs' book of 1902 to the second. Boltzmann's own works fall somewhere in the middle. His earlier contributions clearly belong to the kinetic theory of gases (although his 1868 paper already applies probability to an entire gas system), while his work of 1877 is usually seen as belonging to statistical mechanics. However, Boltzmann himself never indicated a clear distinction between these two different theories, and any attempt to draw a demarcation at an exact location in his work seems somewhat arbitrary. From a conceptual point of view, the transition from kinetic gas theory to statistical mechanics poses two main foundational questions. On what grounds do we choose a particular ensemble, or the probability distribution characterizing the system? Gibbs did not enter into a systematic discussion of this problem, but only discussed special cases of equilibrium ensembles (i.e., canonical, micro-canonical etc.). A second problem is how to relate the ensemble-based probabilities to the probabilities obtained in the earlier kinetic approach for a single gas model. The Ehrenfests' (1912) paper was the first to recognize these questions, and to provide a partial answer: assuming a certain hypothesis of Boltzmann's, which they dubbed the ergodic hypothesis, they pointed out that for an isolated system the micro-canonical distribution is the unique stationary probability distribution. Hence, if one demands that an ensemble of isolated systems describing thermal equilibrium must be represented by a stationary distribution, the only choice for this purpose is the micro-canonical one. Similarly, they pointed out that under the ergodic hypothesis infinite time averages and ensemble averages are identical. This, then, would provide the desired link between the probabilities of the older kinetic gas theory and those of statistical mechanics, at least in equilibrium and in the infinite time limit. Yet the Ehrenfests simultaneously expressed strong doubts about the validity of the ergodic hypothesis.
These doubts were soon substantiated when in 1913 Rosenthal and Plancherel proved that the hypothesis was untenable for realistic gas models. The Ehrenfests' reconstruction of Boltzmann's work thus gave a prominent role to the ergodic hypothesis, suggesting that it played a fundamental and lasting role in his thinking. Although this view indeed produces a more coherent picture of his multifaceted work, it is certainly not historically correct. Boltzmann himself also had grave doubts about this hypothesis, and expressly avoided it whenever he could, in particular in his two great papers of 1872 and 1877b. Since the Ehrenfests, many other authors have presented accounts of Boltzmann's work. Particularly important are Klein (1973) and Brush (1976). Still, much confusion remains about what exactly his approach to statistical physics was, and how it developed. For a more elaborate attempt to sketch the general landscape, and Boltzmann's work in particular, I refer to Uffink (2007). 1.4 A concise chronography of Boltzmann's writings Roughly speaking, one may divide Boltzmann's work into four periods. The period 1866–1871 is more or less his formative period. In his first paper (1866), Boltzmann set himself the problem of deriving the full second law from mechanics. The notion of probability does not appear in this paper. The following papers, from 1868 and 1871, were written after Boltzmann had read Maxwell's work of 1860 and 1867. Following Maxwell's example, they deal with the characterization of a gas in thermal equilibrium in terms of a probability distribution. Even then, he was set on obtaining more general results, and extended the discussion to cases where the gas is subject to a static external force and might consist of poly-atomic molecules. He regularly switched between different conceptions of probability: sometimes this referred to a time average, sometimes a particle average or, in an exceptional paper (1871b), an ensemble average. The main result of those papers is that, given the so-called Stoßzahlansatz (SZA), i.e., an assumption about the number of collisions (or a closely analogous assumption), the Maxwellian distribution function is stationary, and thus an appropriate candidate for the equilibrium state. In some cases Boltzmann also argued it was the unique such state. However, in this period he also presented a completely different method, which did not rely on the SZA but rather on the ergodic hypothesis. This approach led to a new form of the distribution function that, in the limit \(N \rightarrow \infty\), reduces to the Maxwellian form. In the same period, he also introduced the concept of ensembles, but this concept would not play a prominent role in his thinking until the 1880s. The next period is that of 1872–1878, in which he wrote his two most famous papers: (1872) (Weitere Studien) and (1877b) (Über die Beziehung). The 1872 paper contained the Boltzmann equation and the H-theorem. Boltzmann claimed that the H-theorem provided the desired theorem from mechanics corresponding to the second law. However, this claim came under a serious objection due to Loschmidt's criticism of 1876. The objection was simply that no purely mechanical theorem could ever produce a time-asymmetrical result. Boltzmann's response to this objection will be summarized later.
The result was, however, that Boltzmann rethought the basis of his approach and in 1877b produced a conceptually very different analysis of equilibrium, of evolutions towards equilibrium, and of the role of probability theory, which might be called the combinatorial argument. The distribution function, which formerly represented the probability distribution, was now conceived of as a stochastic variable (nowadays called a macrostate) subject to a probability distribution. That probability distribution was now determined by the size of the volume in phase space corresponding to all the microstates giving rise to the same macrostate (essentially given by calculating all permutations of the particles in a given macrostate). Equilibrium was now conceived of as the most probable macrostate instead of a stationary macrostate. The evolution towards equilibrium could then be reformulated as an evolution from less probable to more probable states. Even though all commentators agree on the importance of these two papers, there is still disagreement about what Boltzmann's claims actually were, and whether he succeeded in (or indeed even attempted) avoiding the reversibility objection in this new combinatorial argument, whether he intended or succeeded in proving that most evolutions go from less probable to more probable states, and whether or not he (implicitly) relied on the ergodic hypothesis in these works. I shall comment on these issues in due course. (See Uffink (2007) for a more detailed overview.) The third period is taken up by the papers Boltzmann wrote during the 1880s, which have attracted much less attention. During this period, he abandoned the combinatorial argument, and went back to an approach that relied on a combination of the ergodic hypothesis and the use of ensembles. For a while Boltzmann worked on an application of this approach to Helmholtz's concept of monocyclic systems. However, after finding that this concept did not always provide the desired thermodynamical analogies, he abandoned the topic again. Next, in the 1890s the reversibility problem resurfaced, this time in a debate in the columns of Nature. This time Boltzmann chose an entirely different line of counterargument than in his debate with Loschmidt. A few years later, Zermelo presented another objection, now called the recurrence objection. The same period also saw the publication of the two volumes of his Lectures on Gas Theory. In this book, he takes the hypothesis of molecular disorder (a close relative of the SZA) as the basis of his approach. The combinatorial argument is only discussed as an aside, and the ergodic hypothesis is not mentioned at all. His last paper is an Encyclopedia article, written with Nabl, presenting a survey of kinetic theory. 2. The Stoßzahlansatz and the ergodic hypothesis Boltzmann's first paper (1866) in statistical physics aimed to reduce the second law to mechanics. Within the next two years he became acquainted with Maxwell's papers on gas theory of 1860 and 1867, which introduced probability notions into the description of the gas. Maxwell had studied specific mechanical models for a gas (as a system of hard spheres (1860) or of point particles exerting a mutual force on each other inversely proportional to the fifth power of their distance), and characterized the state of such a gas by means of a probability distribution \(f\) over the various values of the molecular velocities \(\vec{v}\).
For Maxwell, the probability \(f(\vec{v})d^3\vec{v}\) denoted the relative number of particles in the gas with a velocity between \(\vec{v}\) and \(\vec{v} + d^3\vec{v}\). In particular, he had argued that the state of equilibrium is characterized by the so-called Maxwell distribution function \(f(\vec{v}) = A e^{-\vec{v}^{2}/B}\) (1), where \(A\) is a normalization constant and \(B\) is proportional to the absolute temperature. The argument that Maxwell had given in 1860 to single out this distribution relied on the fact that this is the only probability distribution that is both spherically symmetric and factorizes into functions of the orthogonal components \(v_x, v_y, v_z\) separately. In 1867, however, he replaced these desiderata with the more natural requirement that the equilibrium distribution should be stationary, i.e., it should not change shape as a result of the continual collisions between the particles. This called for a more elaborate argument, involving a detailed consideration of the collisions between particles. The crucial assumption in this argument is what is now known as the SZA. Roughly speaking, it states that the number of particle pairs, \(dN(\vec{v}_1, \vec{v}_2)\), with initial velocities between \(\vec{v}_1\) and \(\vec{v}_1 + d^3\vec{v}_1\) and between \(\vec{v}_2\) and \(\vec{v}_2 + d^3\vec{v}_2\) respectively, which are about to collide in a time span \(dt\), is proportional to \(f(\vec{v}_1)\, f(\vec{v}_2)\, d^3\vec{v}_1\, d^3\vec{v}_2\), where the proportionality constant depends on the geometry of the collision and the relative velocity. For Maxwell, and for Boltzmann later, this assumption seemed almost self-evident. One ought to note, however, that by choosing the initial, rather than the final, velocities of the collision, the assumption introduced an explicit time-asymmetric element. This, however, was not noticed until 1895. Maxwell showed that, under the SZA, the distribution (1) is indeed stationary. He also argued, but much less convincingly, that it should be the only stationary distribution. In his (1868), Boltzmann set out to apply this argument to a variety of other models (including gases in a static external force field). However, Boltzmann started out with a somewhat different interpretation of probability in mind than Maxwell. For him, \(f(\vec{v})d^3\vec{v}\) is introduced firstly as the relative time during which a (given) particle has a velocity between \(\vec{v}\) and \(\vec{v} + d^3\vec{v}\) (WA I, 50). But, in the same breath, he identifies this with the relative number of particles with this velocity. This equivocation between different meanings of probability returned again and again in Boltzmann's writing.[7] Either way, of course, whether we average over time or particles, probabilities are defined here in strictly mechanical terms, and are therefore objective properties of the gas. Yet apart from this striking difference in interpretation, the first section of the paper is a straightforward continuation of the ideas Maxwell had developed in 1867. In particular, the main role is always played by the SZA, or a version of that assumption suitably modified for the case discussed. But in the last section of the paper he suddenly shifts course. He now focuses on a general Hamiltonian system, i.e., a system of \(N\) material points with an arbitrary interaction potential. The state of this system may be represented as a phase point \(x = (\vec{p}_1,\ldots,\vec{p}_N,\vec{q}_1,\ldots,\vec{q}_N)\) in the mechanical phase space \(\Gamma\). By the Hamiltonian equations of motion, this point evolves in time, and thus describes a trajectory \(x_t\).
In his (1868), Boltzmann set out to apply Maxwell's argument to a variety of other models (including gases in a static external force field). However, Boltzmann started out with a somewhat different interpretation of probability in mind than Maxwell. For him, \(f(\vec{v})d^3\vec{v}\) is introduced firstly as the relative time during which a (given) particle has a velocity between \(\vec{v}\) and \(\vec{v} + d^3\vec{v}\) (WA I, 50). But, in the same breath, he identifies this with the relative number of particles with this velocity. This equivocation between different meanings of probability returned again and again in Boltzmann's writing.[7] Either way, of course, whether we average over time or particles, probabilities are defined here in strictly mechanical terms, and are therefore objective properties of the gas. Yet apart from this striking difference in interpretation, the first section of the paper is a straightforward continuation of the ideas Maxwell had developed in his 1867 paper. In particular, the main role is played by the SZA, or a version of that assumption suitably modified for the case discussed. But in the last section of the paper he suddenly shifts course. He now focuses on a general Hamiltonian system, i.e., a system of N material points with an arbitrary interaction potential. The state of this system may be represented as a phase point \(x = (\vec{p}_1,\ldots,\vec{p}_N,\vec{q}_1,\ldots,\vec{q}_N)\) in the mechanical phase space \(\Gamma\). By the Hamiltonian equations of motion, this point evolves in time, and thus describes a trajectory \(x_t\). This trajectory is constrained to lie on a given energy hypersurface \(H(x) = E\), where \(H(x)\) denotes the Hamiltonian function. Now consider an arbitrary probability density \(\rho(x)\) over this phase space. He shows, by (what is now known as) Liouville's theorem, that \(\rho\) remains constant along a trajectory, i.e., \(\rho(x_0) = \rho(x_t)\). Assuming now for simplicity that all points in a given energy hypersurface lie on a single trajectory, the probability must be constant over the energy hypersurface. In other words, the only stationary probability distribution with fixed total energy is the microcanonical distribution \[ \rho_{mc}(x) = \frac{\delta(H(x) - E)}{\int_{\Gamma} \delta(H(x') - E)\, dx'} \qquad (3) \] where \(\delta\) is Dirac's delta function. By integrating this expression over all momenta but one, and dividing this by the integral of \(\rho_{mc}\) over all momenta, Boltzmann obtained the marginal probability density \(\rho_{mc}(\vec{p}_1 \mid \vec{q}_1,\ldots,\vec{q}_N)\) for particle 1's momentum, conditionalized on the particle positions \(\vec{q}_1,\ldots,\vec{q}_N\). He then showed that this marginal probability distribution tends to the Maxwell distribution when the number of particles tends to infinity.
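This limiting result can be checked directly in the simplest case, an ideal gas, where the energy hypersurface reduces to a sphere \(\sum_i p_i^2/2m = E\) in momentum space. The sketch below (an illustration under that simplifying assumption, not Boltzmann's own calculation) samples the microcanonical distribution by drawing points uniformly on that sphere and inspects the marginal distribution of a single momentum coordinate; its kurtosis approaches the Gaussian (Maxwellian) value 3 as \(N\) grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def microcanonical_momenta(N, E=1.0, m=1.0, samples=10000):
    """Uniform samples on the ideal-gas energy sphere sum_i p_i^2/(2m) = E
    for N momentum degrees of freedom: normalize Gaussian vectors."""
    p = rng.normal(size=(samples, N))
    return p * (np.sqrt(2 * m * E) / np.linalg.norm(p, axis=1, keepdims=True))

for N in (3, 10, 100, 1000):
    p1 = microcanonical_momenta(N)[:, 0]      # marginal of one momentum coordinate
    k = np.mean(p1**4) / np.mean(p1**2)**2    # exactly 3N/(N+2): -> 3 as N grows
    print(f"N = {N:4d}: kurtosis of p_1 = {k:.3f}")
```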
Some comments on this result. First, the difference between the approach relying on the ergodic hypothesis and that relying on the SZA is rather striking. Instead of concentrating on a specific gas model, Boltzmann here assumes a much more general model with an arbitrary interaction potential \(V(\vec{q}_1,\ldots,\vec{q}_N)\). Moreover, the probability density \(\rho\) is defined over phase space, instead of the space of molecular velocities. This is the first occasion where probability considerations are applied to the state of the mechanical system as a whole, instead of its individual particles. If the transition between kinetic gas theory and statistical mechanics may be identified with this caesura (as argued by the Ehrenfests and by Klein), it would seem that the transition had already been made right here in 1868, rather than only in 1877. But of course, for Boltzmann the transition did not involve a major conceptual move, thanks to his conception of probability as a relative time. Thus, the probability of a particular state of the total system is still identified with the fraction of time in which that state is occupied by the system. In other words, he had no need for ensembles or non-mechanical probabilistic assumptions in this paper. However, note that the equivocation between relative times and relative numbers, which was relatively harmless in the first section of the 1868 paper, is no longer possible in the interpretation of \(\rho\). The probability \(\rho_{mc}(\vec{p}_1 \mid \vec{q}_1,\ldots,\vec{q}_N) d^3\vec{p}_1\) gives us the relative time that the total system is in a state for which particle 1 has a momentum between \(\vec{p}_1\) and \(\vec{p}_1 + d^3\vec{p}_1\), for given values of all positions. There is no route back to infer that this has anything to do with the relative number of particles with this momentum. Second, and more importantly, these results open up a perspective of great generality. They suggest that the probability of the molecular velocities for an isolated system in a stationary state will always assume the Maxwellian form as the number of particles tends to infinity. Notably, this argument completely dispenses with any particular assumption about collisions, or other details of the mechanical model involved, apart from the assumption that it is Hamiltonian. Indeed it need not even represent a gas. Third, and most importantly, the main weakness of the present result is its assumption that the trajectory actually visits all points on the energy hypersurface. This is what the Ehrenfests called the ergodic hypothesis.[8] Boltzmann returned to this issue on the final page of the paper (WA I, 96). He notes there that exceptions to his theorem might occur if the microscopic variables did not, in the course of time, take on all values compatible with the conservation of energy; for example, this would be the case when the trajectory is periodic. However, Boltzmann observed, such cases would be immediately destroyed by the slightest disturbance from outside, e.g., by the interaction of a single external atom. He argued that these exceptions would thus only provide cases of unstable equilibrium. Still, Boltzmann must have felt unsatisfied with his own argument. According to an editorial footnote in the collection of his scientific papers (WA I, 96), Boltzmann's personal copy of the paper contains a hand-written remark in the margin stating that the point was still dubious and that it had not been proven that, even including interaction with an external atom, the trajectory would traverse all points on the energy hypersurface. 2.1 Doubts about the ergodic hypothesis However, his doubts were still not laid to rest. His next paper on gas theory (1871a) returns to the study of a detailed mechanical gas model, this time consisting of polyatomic molecules, and explicitly avoids any reliance on the ergodic hypothesis. And when he did return to the ergodic hypothesis in (1871b), it was with much more caution. Indeed, it is here that he first described the worrying assumption as a hypothesis, formulated as follows: "The great irregularity of the thermal motion and the multitude of forces that act on a body make it probable that its atoms, due to the motion we call heat, traverse all positions and velocities which are compatible with the principle of [conservation of] energy." (WA I, 284) Note that Boltzmann formulates this hypothesis for an arbitrary body, i.e., it is not restricted to gases. He also emphasizes, at the end of the paper, that "the proof that this hypothesis is fulfilled for thermal bodies, or even is fulfillable, has not been provided" (WA I, 287). There is major confusion among modern commentators about the role and status of the ergodic hypothesis in Boltzmann's thinking. Indeed, the question has often been raised how Boltzmann could ever have believed that a trajectory traverses all points on the energy hypersurface, since, as the Ehrenfests conjectured in 1911, and as Plancherel and Rosenthal showed almost immediately afterwards in 1913, this is mathematically impossible when the energy hypersurface has a dimension larger than 1. It is a fact that both (1868) [WA I, 96] and (1871b) [WA I, 284] mention external disturbances as an ingredient in the motivation for the ergodic hypothesis. This might be taken as evidence for "interventionism", i.e., the viewpoint that such external influences are crucial in the explanation of thermal phenomena (see Blatt 1959, Ridderbos & Redhead 1998). Yet even though Boltzmann clearly expressed the thought that these disturbances might help to motivate the ergodic hypothesis, he never took the idea very seriously.
The marginal note in the 1868 paper mentioned above indicated that, even if the system is disturbed, there is still no easy proof of the ergodic hypothesis, and all his further investigations concerning this hypothesis assume a system that is either completely isolated from its environment or at most acted upon by a static external force. Thus, interventionism did not play a significant role in his thinking.[9] It has also been suggested, in view of Boltzmann's later habit of discretizing continuous variables, that he somehow thought of the energy hypersurface as a discrete manifold containing only finitely many discrete cells (Gallavotti 1994). In this reading, obviously, the mathematical no-go theorems of Rosenthal and Plancherel no longer apply. Now it is definitely true that Boltzmann developed a preference for discretizing continuous variables, and would later apply this procedure more and more (although usually adding that this was purely for purposes of illustration and easier understanding). However, there is no evidence in the (1868) and (1871b) papers that Boltzmann implicitly assumed a discrete structure of mechanical phase space or the energy hypersurface. Instead, the context of his (1871b) makes clear enough how he intended the hypothesis, as has already been argued by Brush (1976). Immediately preceding the section in which the hypothesis is introduced, Boltzmann discusses trajectories for a simple example: a two-dimensional harmonic oscillator with potential \(V(x,y) = ax^2 + by^2\). For this system, the configuration point \((x, y)\) moves through the surface of a rectangle (see also Cercignani 1998, 148). He then notes that if \(a/b\) is rational (actually: if \(\sqrt{a/b}\) is rational) this motion is periodic. However, if this value is irrational, the trajectory will, in the course of time, traverse "allmählich die ganze Fläche" ("gradually the whole surface") of the rectangle (WA I, 271). He says in this case that \(x\) and \(y\) are independent, since for each value of \(x\) an infinity of values for \(y\) in any interval in its range are possible. The very fact that Boltzmann considers intervals for the values of \(x\) and \(y\) of arbitrarily small sizes, and stressed the distinction between rational and irrational values of the ratio \(a/b\), indicates that he did not silently presuppose that phase space was essentially discrete, where those distinctions would make no sense. Now clearly, in modern language, one should say in the second case that the trajectory lies densely in the surface, but not that it traverses all points. Boltzmann did not possess this language. In fact, he could not have been aware of Cantor's insight that the continuum contains more than a countable infinity of points. Thus, the correct statement that, in the case that \(\sqrt{a/b}\) is irrational, the trajectory will traverse, for each value of \(x\), an infinity of values of \(y\) within any interval however small, could easily have led him to believe (incorrectly) that all values of \(x\) and \(y\) are traversed in the course of time. It thus seems eminently plausible, given that this discussion immediately precedes the formulation of the ergodic hypothesis, that the intended reading of the ergodic hypothesis is really what the Ehrenfests dubbed the quasi-ergodic hypothesis: the assumption that the trajectory lies densely (i.e., passes arbitrarily close to every point) on the energy hypersurface.[10]
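Boltzmann's oscillator example is easy to reproduce. The sketch below (a modern illustration, not in the original) follows the configuration point \((x(t), y(t)) = (\cos t, \cos(\sqrt{a/b}\, t))\) and counts how many cells of a coarse grid on the rectangle are ever visited: for a rational frequency ratio the curve is a closed Lissajous figure covering only a small fraction of cells, while for an irrational ratio the visited fraction creeps towards 1 as the integration time grows.

```python
import numpy as np

def visited_fraction(freq_ratio, t_max=40000.0, dt=0.01, grid=50):
    """Fraction of grid cells on the rectangle [-1, 1]^2 visited by the
    oscillator configuration (x, y) = (cos t, cos(freq_ratio * t))."""
    t = np.arange(0.0, t_max, dt)
    ix = np.minimum(((np.cos(t) + 1) / 2 * grid).astype(int), grid - 1)
    iy = np.minimum(((np.cos(freq_ratio * t) + 1) / 2 * grid).astype(int), grid - 1)
    visited = np.zeros((grid, grid), dtype=bool)
    visited[ix, iy] = True
    return visited.mean()

print("sqrt(a/b) = 2 (rational):    ", visited_fraction(2.0))          # closed curve
print("sqrt(a/b) = sqrt(2) (irrat.):", visited_fraction(np.sqrt(2)))   # fills densely
```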
The quasi-ergodic hypothesis is not mathematically impossible in higher-dimensional phase spaces. However, the quasi-ergodic hypothesis does not entail the desired conclusion that the only stationary probability distribution over the energy surface is the microcanonical one. One might then still conjecture that if the system is quasi-ergodic, the only continuous stationary distribution is the microcanonical one. But even this fails in general (Nemytskii and Stepanov 1960). Nevertheless, Boltzmann remained skeptical about the validity of his hypothesis. For this reason, he attempted to explore different routes to his goal of characterizing thermal equilibrium in mechanics. Indeed, both the preceding (1871a) and his next paper (1871c) present alternative arguments, with the explicit recommendation that they avoid hypotheses. In fact, he did not return to this hypothesis until the 1880s (stimulated by Maxwell's 1879 review of the last section of Boltzmann's 1868 paper). At that time, perhaps feeling fortified by Maxwell's authority, he would express much more confidence in the ergodic hypothesis (see Section 5). So what role did the ergodic hypothesis play? It seems that Boltzmann regarded the ergodic hypothesis as a special dynamical assumption that may or may not be true, depending on the nature of the system, and perhaps also on its initial state. Its role was simply to help derive a result of great generality: for any system for which the hypothesis is true, its unique equilibrium state is characterized by the microcanonical distribution (3), from which a form of the Maxwell distribution may be recovered in the limit \(N \rightarrow \infty\), regardless of any details of the inter-particle interactions, or indeed whether the system represented is a gas, fluid, solid or any other thermal body. Note also that the microcanonical distribution immediately implies that the probability of finding the system in any region on the energy hypersurface is proportional to the size of that region (as measured by the microcanonical measure). This idea would resurface in his 1877 combinatorial argument, although then without the context of characterizing thermal equilibrium. The Ehrenfests have suggested that the ergodic hypothesis played a much more fundamental role. In particular, they have pointed out that if the hypothesis is true, averaging over an (infinitely) long time would be identical to phase averaging with the microcanonical distribution. Thus, they suggested that Boltzmann relied on the ergodic hypothesis in order to equate time averages and phase averages, or in other words, to equate two meanings of probability (relative time and relative volume in phase space). There is, however, no evidence that Boltzmann ever followed this line of reasoning. He simply never gave any justification for equating time averages, particle averages, or phase averages at all. Presumably, he thought nothing much depended on this issue and that it was a matter of taste. 3. The H-theorem and the reversibility objection 3.1 1872: The Boltzmann equation and H-theorem In 1872 Boltzmann published one of his most important papers, its long title often abbreviated as Weitere Studien (Further studies). It was aimed at something completely new, namely at showing that, whatever the initial state of a gas system, it would always tend to evolve to equilibrium. Thus, this paper is the first work to deal with non-equilibrium theory.
The paper contained two celebrated results nowadays known as the Boltzmann equation and the H-theorem. The latter result was the basis of Boltzmann's renewed claim to have obtained a general theorem corresponding to the second law. This paper has been studied and commented upon by numerous authors, and an integral translation of the text has been provided by Brush (1966). Thus, for present purposes, a succinct summary of the main points might have been sufficient. However, there is still dispute among modern commentators about its actual content. The issue at stake is the question whether the results obtained in this paper are presented as necessary consequences of the mechanical equations of motion, or whether Boltzmann explicitly acknowledged that they would allow for exceptions. Klein has written: "I can find no indication in his 1872 memoir that Boltzmann conceived of possible exceptions to the H-theorem, as he later called it." (Klein 1973, 73) Klein argues that Boltzmann only came to acknowledge the existence of such exceptions thanks to Loschmidt's critique in 1877. An opposite opinion is expressed by von Plato (1994). He argues that, already in 1872, Boltzmann was well aware that his H-theorem had exceptions, and thus "already had a full hand against his future critics". Indeed, von Plato states that "… contrary to a widely held opinion, Boltzmann is not in 1872 claiming that the Second Law and the Maxwellian distribution are necessary consequences of kinetic theory." (von Plato 1994, 81) It might be of some interest to try and settle this dispute. The Weitere Studien starts with an appraisal of the role of probability theory in the context of gas theory. The number of particles in a gas is so enormous, and their movements are so swift, that we can observe nothing but average values. The determination of averages is the province of probability calculus. Therefore, "the problems of the mechanical theory of heat are really problems in probability calculus" (WA I, 317). But, Boltzmann says, it would be a mistake to believe that the theory of heat would therefore contain uncertainties. He emphasizes that one should not confuse incompletely proven assertions with rigorously derived theorems of probability theory. The latter are necessary consequences of their premisses, as in any other theory. They will be confirmed by experience as soon as one has observed a sufficiently large number of cases. This last condition, however, should be no significant problem in the theory of heat because of the enormous number of molecules in macroscopic bodies. Yet, in this context, one has to make doubly sure that one proceeds with the utmost rigor. Thus, the message expressed in the opening pages of this paper seems clear enough: the results Boltzmann is about to derive are advertised as doubly checked and utterly rigorous. Of course, their relationship with experience might be less secure, since any probability statement is only reproduced in observations by sufficiently large numbers of independent data. Thus, Boltzmann would have allowed for exceptions in the relationship between theory and observation, but not in the relation between premisses and conclusion.
He continues by saying what he means by probability, and repeats the equivocation between a fraction of time and a relative number of particles that we have seen earlier in the 1868 paper: "If one wants […] to build up an exact theory […] it is before all necessary to determine the probabilities of the various states that one and the same molecule assumes in the course of a very long time, and that occur simultaneously for different molecules. That is, one must calculate how the number of those molecules whose states lie between certain limits relates to the total number of molecules" (WA I, 317). This equivocation is not vicious, however. For most of the paper the intended meaning of probability is the relative number of molecules with a particular molecular state. Only at the final stages of his paper (WA I, 400) does the time-average interpretation of probability (suddenly) recur. Boltzmann says that both he and Maxwell had attempted the determination of these probabilities for a gas system but without reaching a complete solution. Yet, on closer inspection, "it seems not so unlikely that these probabilities can be derived on the basis of the equations of motion alone…" (WA I, 317). Indeed, he announces, he has solved this problem for gases whose molecules consist of an arbitrary number of atoms. His aim is to prove that whatever the initial state in such a system of gas molecules, it must inevitably approach the state characterized by the Maxwell distribution (WA I, 320). The next section specializes to the simplest case of monatomic gases and also provides a more complete specification of the problem he aims to solve. The gas molecules are modelled as hard spheres, contained in a fixed vessel with perfectly elastic walls (WA I, 320). Boltzmann represents the state of the gas by a time-dependent distribution function \(f_t(\vec{v})\) which gives us, at each time \(t\), the relative number of molecules with velocity \(\vec{v}\).[11] He also states three more special assumptions, referred to below as conditions (a)–(c). After a few well-known manipulations, the result from these assumptions is an integro-differential equation (the Boltzmann equation) that determines the evolution of the distribution function \(f_t(\vec{v})\) from any given initial form. There are also a few unstated assumptions that go into the derivation of this equation. First, the number of molecules must be large enough so that the (discrete) distribution of their velocities can be well approximated by a continuous and differentiable function \(f\). Secondly, \(f\) changes under the effect of binary collisions only. This means that the density of the gas should be low (so that three-particle collisions can be ignored) but not too low (otherwise collisions would be too infrequent to change \(f\) at all). (The modern procedure to put these requirements in a mathematically precise form is that of taking the so-called Boltzmann-Grad limit.) A final ingredient is that all the above assumptions are not only valid at an instant but remain true in the course of time. The H-theorem.
Assuming that the Boltzmann equation is valid for all times, one can prove without difficulty the "H-theorem": the quantity \(H\) (which Boltzmann in this paper actually denotes as \(E\)), defined as \[ H[f_t] := \int f_t(\vec{v}) \ln f_t(\vec{v})\, d^3\vec{v}, \] decreases monotonically in time, i.e., \[ \frac{dH[f_t]}{dt} \leq 0, \] as well as its stationarity for the Maxwell distribution, i.e., \[ \frac{dH[f_t]}{dt} = 0 \quad \text{if and only if } f_t \text{ is the Maxwell distribution (1).} \] Boltzmann concludes this section of the paper as follows: "It has thus been rigorously proved that whatever may have been the initial distribution of kinetic energy, in the course of time it must necessarily approach the form found by Maxwell. […] This [proof] actually gains much in significance because of its applicability to the theory of multi-atomic gas molecules. There too, one can prove for a certain quantity \(E\) that, because of the molecular motion, this quantity can only decrease or in the limiting case remain constant. Thus, one may prove that, because of the atomic movement in systems consisting of arbitrarily many material points, there always exists a quantity which, due to these atomic movements, cannot increase, and this quantity agrees, up to a constant factor, exactly with the value that I found in [Boltzmann 1871c] for the well-known integral \(\int dQ/T\). This provides an analytical proof of the Second Law in a way completely different from those attempted so far. Up till now, one has attempted to prove that \(\int dQ/T = 0\) for reversible (umkehrbaren) cyclic[12] processes, which however does not prove that for an irreversible cyclic process, which is the only one that occurs in nature, it is always negative; the reversible process being merely an idealization, which can be approached more or less but never perfectly. Here, however, we immediately reach the result that \(\int dQ/T\) is in general negative and zero only in a limit case…" (WA I, 345) Thus, as in his 1866 paper, Boltzmann claims to have a rigorous, analytical and general proof of the Second Law.
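The monotonic decrease of \(H\) can be illustrated numerically. The sketch below is an illustration only: it replaces Boltzmann's hard-sphere dynamics with the random energy- and momentum-conserving pair collisions used in the earlier sketch, starts from a far-from-Maxwellian velocity sample, and tracks a histogram estimate of \(H\) for one velocity component. Up to sampling noise, the estimate decreases towards the Maxwellian value and then stays there.

```python
import numpy as np

rng = np.random.default_rng(2)

def H_estimate(vx, bins=60):
    """Histogram estimate of H[f] = integral of f ln f dv, one velocity component."""
    f, edges = np.histogram(vx, bins=bins, density=True)
    dv = edges[1] - edges[0]
    f = f[f > 0]
    return float(np.sum(f * np.log(f)) * dv)

N = 20000
v = rng.uniform(-1.0, 1.0, size=(N, 3))       # far from Maxwellian
for sweep in range(8):
    print(f"sweep {sweep}: H = {H_estimate(v[:, 0]):+.4f}")
    for _ in range(2 * N):                    # random momentum/energy-conserving collisions
        i, j = rng.choice(N, size=2, replace=False)
        g, vcm = v[i] - v[j], 0.5 * (v[i] + v[j])
        n = rng.normal(size=3)
        n /= np.linalg.norm(n)
        half_g = 0.5 * np.linalg.norm(g) * n
        v[i], v[j] = vcm + half_g, vcm - half_g
```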
3.2 Remarks and problems 1. As we have seen, the H-theorem formed the basis of a renewed claim by Boltzmann to have obtained a theorem corresponding to the second law, at least for gases. A main difference with his previous (1866) claim is that he now strongly emphasized the role of probability calculus in his derivation. Even so, it will be noted that his conception of probability is still a fully mechanical one. Thus, there is no conflict between his claims that, on the one hand, "the problems of the mechanical theory of heat are really problems in probability calculus" and, on the other hand, that the probabilities themselves are "derived on the basis of the equations of motion alone". Indeed, it seems to me that Boltzmann's emphasis on the crucial role of probability is only intended to convey that probability theory provides a particularly useful and appropriate language for discussing mechanical problems in gas theory. There is no indication in this paper yet that probability theory could play a role by furnishing assumptions of a non-mechanical nature, i.e., independent of the equations of motion. However, see Badino (2006, Other Internet Resources) for a very different point of view. 2. Note that Boltzmann stresses the generality, rigor and "analyticity" of his proof. He put no emphasis on the special assumptions that go into the argument. Indeed, the Stoßzahlansatz, commonly identified as the key assumption responsible for the time-asymmetry of the H-theorem, is announced as follows: "The determination [of the number of collisions] can only be obtained in a truly tedious manner, by consideration of the relative velocities of both particles. But since this consideration has, apart from its tediousness, not the slightest difficulty, nor any special interest, and because the result is so simple that one might almost say it is self-evident, I will only state this result." (WA I, 323) This is not an announcement that would alert his readers to the crucial role of this assumption. It thus seems natural that Boltzmann's contemporaries understood him as claiming that the H-theorem followed necessarily from the dynamics of the mechanical gas model. Indeed this is exactly how Boltzmann's claims were understood. For example, the recommendation written in 1888 for his membership of the Prussian Academy of Sciences mentions as Boltzmann's main feat that he had proven that, whatever its initial state, a gas must necessarily approach the Maxwellian distribution (Kirsten and Körber 1975, 109). Is there then no evidence at all for von Plato's reading of the paper? Von Plato quotes a passage from Section II, where Boltzmann repeats the previous analysis by assuming that energy can take on only discrete values, and replacing all integrals by sums. He recovers, of course, the same conclusion, but now adds a side remark, which touches upon the case of non-uniform gases: "Whatever may have been the initial distribution of states, there is one and only one distribution which will be approached in the course of time. […] This statement has been proved for the case that the distribution of states was already initially uniform. It must also be valid when this is not the case, i.e. when the molecules are initially distributed in such a way that in the course of time they mix among themselves more and more, so that after a very long time the distribution of states becomes uniform. This will always be the case, with the exception of very special cases, e.g., when all molecules were initially situated along a straight line, and were reflected by the walls onto this line." (WA I, 358) True enough, Boltzmann in the above quote indicates that there are exceptions. But he mentions them only in connection with an extension of his results to the case when the gas is not initially uniform, i.e., when condition (b) above is dropped. There can be no doubt that, under the assumption of conditions (a)–(c), Boltzmann claimed rigorous validity of the H-theorem. 3. Note that Boltzmann misconstrues, or perhaps understates, the significance of his results. Both the Boltzmann equation and the H-theorem refer to a body of gas in a fixed container that evolves in complete isolation from its environment. There is no question of heat being exchanged by the gas during a process, let alone in an irreversible cyclic process. His comparison with Clausius' integral \(\int dQ/T\) (i.e., \(\oint \delta Q/T\) in modern notation) is therefore really completely out of place. The true import of Boltzmann's results is rather that they provide a generalization of the entropy concept to non-equilibrium states, and a claim that this non-equilibrium entropy \(-kH\) increases monotonically as the isolated gas evolves from non-equilibrium towards an equilibrium state. The relationship with the second law is, therefore, indirect.
On the one hand, Boltzmann proves much more than was required, since the second law does not speak of non-equilibrium entropy, nor of monotonic increase; on the other hand, he also proves less, since he does not consider more general adiabatic processes. 3.3 1877: The reversibility objection According to Klein (1973), Boltzmann seems to have been satisfied with his treatments of 1871 and 1872 and turned his attention to other matters for a couple of years. He did come back to gas theory in 1875 to discuss an extension of the Boltzmann equation to gases subjected to external forces. But this paper does not present any fundamental change of thought. However, the 1875 paper did contain a result which, two years later, led to a debate with Loschmidt. It showed that a gas in equilibrium in an external force field (such as the earth's gravity) should have a uniform temperature, and therefore the same average kinetic energy at all heights. This conclusion conflicted with the intuition that rising molecules must do work against the gravitational field, and pay for this by having a lower kinetic energy at greater heights. Now Boltzmann (1875) was not the first to reach the contrary result, and Loschmidt was not the first to challenge it: Maxwell and Guthrie entered into a debate on the very same topic in 1873. But their main point of contention need not concern us very much. The discussion between Loschmidt and Boltzmann is important for quite another issue, which Loschmidt only introduced as a side remark: "By the way, one should be careful about the claim that in a system in which the so-called stationary state has been achieved, starting from an arbitrary initial state, this average state can remain intact for all times. […] Indeed, if in the above case [i.e. starting in a state where one particle is moving, and all the others lie still on the bottom], after a time τ which is long enough to obtain the stationary state, one suddenly assumes that the velocities of all atoms are reversed, we would obtain an initial state that would appear to have the same character as the stationary state. For a fairly long time this would be appropriate, but gradually the stationary state would deteriorate, and after passage of the time τ we would inevitably return to our original state: only one atom has absorbed all kinetic energy of the system […], while all other molecules lie still on the bottom of the container. Obviously, in every arbitrary system the course of events must become retrograde when the velocities of all its elements are reversed." (Loschmidt 1876, 139) Putting the point in more modern terms, the laws of (Hamiltonian) mechanics are such that for every solution one can construct another solution by reversing all velocities and replacing \(t\) by \(-t\). Since \(H[f]\) is invariant under the velocity reversal, it follows that if \(H[f]\) decreases for the first solution, it will increase for the second. Accordingly, the reversibility objection is that the H-theorem cannot be a general theorem for all mechanical evolutions of the gas.
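Loschmidt's point can be made vivid with a system for which the reversal can be carried out exactly: non-interacting particles bouncing between reflecting walls (an illustrative choice of mine; Loschmidt's own example involved interacting atoms). Starting with all particles in the left half of a box, the gas spreads out; reversing every velocity at time \(T\) makes the evolution retrace itself, and at time \(2T\) the initial "improbable" state reappears.

```python
import numpy as np

rng = np.random.default_rng(3)

def evolve(x, v, t):
    """Free particles on [0, 1] with elastic walls: the exact solution is a
    triangle wave of x + v*t. Returns new positions and current velocities
    (the sign flips after an odd number of reflections)."""
    y = np.mod(x + v * t, 2.0)
    return np.where(y > 1.0, 2.0 - y, y), np.where(y > 1.0, -v, v)

N, T = 100000, 37.0
x0 = rng.uniform(0.0, 0.5, size=N)            # all particles in the left half
v0 = rng.normal(size=N)

xT, vT = evolve(x0, v0, T)
xB, _ = evolve(xT, -vT, T)                    # Loschmidt: reverse all velocities
print("left-half fraction at t=0 :", np.mean(x0 < 0.5))   # 1.0
print("left-half fraction at t=T :", np.mean(xT < 0.5))   # ~0.5 ('equilibrium')
print("left-half fraction at t=2T:", np.mean(xB < 0.5))   # back to 1.0 (up to rounding)
```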
Boltzmann's response (1877a). Boltzmann's responses to the reversibility objection are not easy to make sense of, and varied in the course of time. In his immediate response to Loschmidt he acknowledges that certain initial states of the gas would lead to an increase of the \(H\) function, and hence to a violation of the H-theorem. The crux of his rebuttal was that such initial states were extremely improbable, and could hence safely be ignored. This argument shows that Boltzmann was already implicitly embarking on an approach that differed from the context of the 1872 paper. Recall that that paper used the concept of probability only in the guise of a distribution function, giving the probability of molecular velocities. There was no such thing in that paper as the probability of a state of the gas as a whole. This conceptual shift would become more explicit in Boltzmann's next paper (1877b). This rebuttal of Loschmidt is far from satisfactory. Any reasonable probability assignment to gas states is presumably invariant under the velocity reversal of the molecules. If an initial state leading to an increase of \(H\) is to be ignored on account of its small probability, one ought to assume the same for the state from which it was constructed by velocity reversal. In other words, any non-equilibrium state would have to be ignored. But that in effect saves the H-theorem by restricting it to those cases where it is trivially true, i.e., where \(H\) is constant. The true source of the reversibility problem was only identified by Burbury (1894a) and Bryan (1894), who pointed out that the Stoßzahlansatz itself already contained a time-asymmetric assumption. Indeed, if we replace the SZA by the assumption that the number of collisions is proportional to the product \(f(\vec{v}'_1) f(\vec{v}'_2)\) for the velocities \(\vec{v}'_1, \vec{v}'_2\) after the collision, we would obtain, by a similar reasoning, \(dH/dt \ge 0\). The question now is, of course, why we would prefer one assumption over the other, without falling into some kind of double standard. One thing is certain: any such preference cannot be obtained from mechanics and probability theory alone. 4. 1877b: The combinatorial argument Boltzmann begins the paper by stating that his goal is to elucidate the relationship between the Second Law and probability calculus. He notes he has repeatedly emphasized that the Second Law is related to probability calculus. In particular, he points out that the 1872 paper confirmed this relationship by showing that a certain quantity [i.e. \(H\)] can only decrease, and must therefore obtain its minimum value in the state of thermal equilibrium. Yet this connection of the Second Law with probability theory became even more apparent in his previous paper (1877a). Boltzmann states that he will now solve the problem mentioned in that paper, of calculating the probabilities of various distributions of state by determining the ratio of their numbers. He also announces that, when a system starts in an improbable state, it will always evolve towards more probable states, until it reaches the most probable state, i.e. that of thermal equilibrium. When this is applied to the Second Law, he says, "we can identify that quantity which is usually called entropy, with the probability of the state in question." And: "According to the present interpretation, [the Second Law] states nothing else but that the probability of the total state of a composite system always increases" (WA II, pp. 165–6). Exactly how all this is meant, he says, will become clear later in the article. Succinctly, and rephrased in modern terms, the argument is as follows. Apart from \(\Gamma\), the mechanical phase space containing the possible states \(x\) for the total gas system, we consider the so-called \(\mu\)-space, i.e., the state space of a single molecule. For monatomic gases, this space is just a six-dimensional space with \((\vec{p}, \vec{q})\) as coordinates.
With each state \(x\) is associated a collection of \(N\) points in \(\mu\)-space. We now partition \(\mu\) into \(m\) disjoint cells: \(\mu = \omega_1 \cup \ldots \cup \omega_m\). These cells are taken to be rectangular in the position and momentum coordinates and of equal size. Further, it is assumed we can characterize each cell in \(\mu\) by a molecular energy \(\epsilon_i\). For each \(x\), henceforth also called the microstate, we define the macrostate (Boltzmann's term was Komplexion) as \(Z := (n_1,\ldots,n_m)\), where \(n_i\) is the number of particles that have their molecular state in cell \(\omega_i\). The relation between macro- and microstate is obviously non-unique, since many different microstates, e.g., obtained by permuting the molecules, lead to the same macrostate. One may associate with every given macrostate \(Z_0\) the corresponding set of microstates: \[ A_{Z_0} := \{ x \in \Gamma : Z(x) = Z_0 \}. \] The volume \(\lvert A_{Z_0} \rvert\) of this set is proportional to the number of permutations that lead to this macrostate. Boltzmann proposes the problem of determining for which macrostate \(Z\) the volume \(\lvert A_Z \rvert\) is maximal, under the constraints of a given total number of particles and a given total energy: \[ \sum_{i=1}^{m} n_i = N, \qquad \sum_{i=1}^{m} n_i \epsilon_i = E. \] This problem can easily be solved with the Lagrange multiplier technique. Under the Stirling approximation for \(n_i \gg 1\) one finds \[ n_i \propto e^{-\lambda \epsilon_i}, \] which is a discrete version of the Maxwell distribution. Moreover, the volume of the corresponding set in \(\Gamma\) is related to a discrete approximation of the H-function. Indeed, one finds \[ \ln \lvert A_Z \rvert \approx -N \sum_{i=1}^{m} \frac{n_i}{N} \ln \frac{n_i}{N} + \text{const.} \] In other words, if we take \(-kNH\) as the entropy of a macrostate, it is also proportional to the logarithm of the volume of the corresponding region in phase space. Boltzmann also refers to these volumes as the "probability" of the macrostate. He therefore now expresses the second law as a tendency to evolve towards ever more probable macrostates.
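For small numbers the combinatorial argument can be carried out exactly, without the Stirling approximation. The sketch below is an illustration only; the values of \(N\), the cell energies, and \(E\) are arbitrary choices of mine. It enumerates all macrostates compatible with the constraints, picks the one with the largest permutation count, and checks both that its occupation numbers decay roughly geometrically, as \(n_i \propto e^{-\lambda\epsilon_i}\) predicts, and that \(\ln\lvert A_Z\rvert\) is close to \(-N\sum_i (n_i/N)\ln(n_i/N)\).

```python
import numpy as np
from math import lgamma

def log_multinomial(ns):
    """log of N!/(n_1! ... n_m!), the number of microstates (permutations)
    realizing the macrostate Z = (n_1, ..., n_m)."""
    return lgamma(sum(ns) + 1) - sum(lgamma(n + 1) for n in ns)

def macrostates(N, energies, E):
    """All occupation vectors with sum(n) = N and sum(n * eps) = E."""
    def rec(prefix, left, e_left):
        i = len(prefix)
        if i == len(energies) - 1:
            if left * energies[i] == e_left:
                yield prefix + (left,)
            return
        for n in range(left + 1):
            rest = e_left - n * energies[i]
            if rest >= 0:
                yield from rec(prefix + (n,), left - n, rest)
    yield from rec((), N, E)

N, energies, E = 60, (0, 1, 2, 3), 60
best = max(macrostates(N, energies, E), key=log_multinomial)
print("most probable macrostate:", best)
# Near-constant successive ratios indicate n_i ~ exp(-lambda * eps_i):
print("successive ratios:", [round(best[i + 1] / best[i], 3) for i in range(3)])
# Stirling check: ln|A_Z| should be close to -N sum (n_i/N) ln(n_i/N)
p = np.array(best) / N
print(round(log_multinomial(best), 2), "vs", round(-N * np.sum(p * np.log(p)), 2))
```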
4.1 Remarks and problems 1. No dynamical assumption is made; i.e., it is not relevant to the argument whether or how the particles collide. It might seem that this makes the present argument more general than the previous one. Indeed, Boltzmann suggests at the end of the paper that the same argument might be applicable also to dense gases and even to solids. However, it should be noticed that the assumption that the total energy can be expressed in the form \(E = \sum_i n_i\epsilon_i\) means that the energy of each particle depends only on the cell in which it is located, and not on the state of the other particles. This can only be maintained, independently of the number \(N\), if there is no interaction at all between the particles. The validity of the argument is thus really restricted to ideal gases. 2. The procedure of dividing \(\mu\)-space into cells is essential here. Indeed, the whole prospect of using combinatorics would disappear if we did not adopt a partition. But the choice to take all cells equal in size in position and momentum variables is not quite self-evident, as Boltzmann himself shows. In fact, before he develops the argument above, his paper first discusses an analysis in which the particles are characterized by their energy instead of position and momentum. This leads him to carve up \(\mu\)-space into cells of equal size in energy. He then shows that this analysis fails to reproduce the desired Maxwell distribution as the most probable state. This failure is remedied by taking equally sized cells in position and momentum variables. The latter choice is apparently "right", in the sense that it leads to the desired result. However, since the choice clearly cannot be relegated to a matter of convention, it leaves open the question of its justification. 3. A crucial new ingredient in the argument is the distinction between micro- and macrostates. Note in particular that, where in the previous work the distribution function \(f\) was identified with a probability (namely of a molecular state), in the present paper it, or its discrete analogue \(Z\), is a description of the macrostate of the gas. Probabilities are not assigned to the particles, but to the macrostate of the gas as a whole. According to Klein (1973, 84), this conceptual transition in 1877b marks the birth of statistical mechanics. While this view is not completely correct (as we have seen, Boltzmann 1868 already applied probability to the total gas), it is true that 1877b is the first occasion where Boltzmann identifies the probability of a gas state with its relative volume in phase space, rather than its relative time of duration. Another novelty is that Boltzmann has changed his concept of equilibrium. Whereas previously the essential characteristic of an equilibrium state was always that it is stationary, in Boltzmann's new view it is conceived as the macrostate (i.e., a region in phase space) that can be realized in the largest number of ways. As a result, an equilibrium state need not be stationary: in the course of time, the system may fluctuate in and out of equilibrium. 4. But what about evolutions? Perhaps the most important issue is the question what exactly the relation is of the 1877b paper to Loschmidt's objection and Boltzmann's reply to it (1877a). That reply can be read as an announcement of two subjects of further investigation: "From the relative numbers of the various distributions of state, one might even be able to calculate their probabilities. This could lead to an interesting method of determining thermal equilibrium" (WA II, p. 121). This is a problem about equilibrium. The second announcement was Boltzmann's remark: "The case is completely analogous for the Second Law" (WA II, p. 121). Because there are so very many more uniform than non-uniform distributions, it should be extraordinarily improbable that a system should evolve from a uniform distribution of states to a non-uniform distribution of states. This is a problem about evolution. In other words, one would like to see that something like the statistical H-theorem actually holds. Boltzmann's (1877b) is widely read as a follow-up to these announcements. Indeed, Boltzmann repeats the first quote above in the introduction of the paper (WA II, p. 165), indicating that he will address this problem. And so he does, extensively. Yet he also states: "Our main goal is not to linger on a discussion of thermal equilibrium, but to investigate the relations of probability with the Second Law of thermodynamics" (WA II, p. 166). Thus, the main goal of 1877b is apparently to address the problem concerning evolutions and to show how they relate to the Second Law. Indeed, this is what one would naturally expect, since the reversibility objection is, after all, a problem concerned with evolutions. Even so, a remarkable fact is that the 1877b paper hardly ever touches its self-professed "main goal" at all. For a sketch of how different commentators on Boltzmann's (1877b) view his attitude on this question, I refer to Uffink (2007).
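The statistical reading of the approach to equilibrium (evolution towards ever more probable macrostates, with fluctuations around the most probable one) can be illustrated with a toy stochastic model, the Ehrenfest urn model; this is a later pedagogical device, not something Boltzmann himself used. N numbered balls sit in two urns; at each step a uniformly chosen ball changes urn. The macrostate, the occupation number of one urn, drifts from the extreme initial value towards the most probable value N/2 and then fluctuates around it:

```python
import numpy as np

rng = np.random.default_rng(4)

N, steps = 100, 4000
n_left = N                          # extreme initial macrostate: all balls in one urn
trace = np.empty(steps, dtype=int)
for t in range(steps):
    # move a uniformly chosen ball to the other urn
    n_left += -1 if rng.random() < n_left / N else 1
    trace[t] = n_left

print("initial n_left:", N)
print("late-time mean:", trace[-2000:].mean())   # drifts to N/2 = 50 ...
print("late-time std :", trace[-2000:].std())    # ... and fluctuates (~ sqrt(N)/2)
```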
To sum up this discussion of Boltzmann's answer to the reversibility objection: it seems that on all of the above readings of his two 1877 papers, the lacuna between what Boltzmann had achieved and what he needed to do to answer Loschmidt satisfactorily – i.e. to address the issue of the evolution of distributions of state and to prove that non-uniform distributions tend, in some statistical sense, to uniform ones, or to prove any other reformulation of the H-theorem – remains striking. 5. Some later work 5.1 Return of the ergodic hypothesis As we have seen, the 1877 papers introduced some conceptual shifts in Boltzmann's approach. Accordingly, this year is frequently seen as a watershed in Boltzmann's thinking. Concurrent with that view, one would expect his subsequent work to build on his new insights and turn away from the themes and assumptions of his earlier papers. Actually, Boltzmann's subsequent work in gas theory in the next decade and a half was predominantly concerned with technical applications of his 1872 Boltzmann equation, in particular to gas diffusion and gas friction. And when he did touch on fundamental aspects of the theory, he returned to the issues and themes raised in his 1868–1871 papers, in particular the ergodic hypothesis and the use of ensembles. This step was again triggered by a paper of Maxwell's, this time one that must have pleased Boltzmann very much, since it was called "On Boltzmann's theorem" (Maxwell 1879) and dealt with the theorem discussed in the last section of Boltzmann's (1868). Maxwell pointed out that this theorem does not rely on any collision assumption. But he also made some pertinent observations along the way. He is critical of Boltzmann's ergodic hypothesis, pointing out that "it is manifest that there are cases in which this does not take place" (Maxwell 1879, 694). Apparently, Maxwell had not noticed that Boltzmann's later papers had also expressed similar doubts. He rejected Boltzmann's time-average view of probability and instead preferred to interpret \(\rho\) as an ensemble density. Further, he states that any claim that the distribution function obtained was the unique stationary distribution "remained to be investigated" (Maxwell 1879, 722). Maxwell's paper seems to have revived Boltzmann's interest in the ergodic hypothesis, which he had been avoiding for a decade. This renewed confidence is expressed, for example, in Boltzmann (1887): "Under all purely mechanical systems, for which equations exist that are analogous to the so-called second law of the mechanical theory of heat, those which I and Maxwell have investigated … seem to me to be by far the most important. … It is likely that thermal bodies in general are of this kind [i.e., they obey the ergodic hypothesis]." However, he does not return to this conviction in later work. His Lectures on Gas Theory (1896, 1898), for example, does not even mention the ergodic hypothesis. 5.2 Return of the reversibility objection The first occasion on which Boltzmann returned to the reversibility objection is (1887b). This paper delves into a discussion between Tait and Burbury about the approach to equilibrium for a system consisting of gas particles of two different kinds. The details of the debate need not concern us, except to note that Tait raised the reversibility objection to show that, taking any evolution approaching equilibrium, one may construct, by reversal of the velocities, another evolution moving away from equilibrium.
At this point Boltzmann entered the discussion: "I remark only that the objection of Mr. Tait regarding the reversal of the direction of all velocities, after the special state [i.e., equilibrium] has been reached, […] has already been refuted in my [(1877a)]. If one starts with an arbitrary non-special state, one will get […] to the special state (of course, perhaps after a very long time). When one reverses the directions of all velocities in this initial state, then, going backwards, one will not (or perhaps only during some time) reach states that are even further removed from the special state; instead, in this case too, one will eventually again reach the special state." (WA III, 304) This reply to the reversibility objection uses an entirely different strategy from that of his (1877a). Here, Boltzmann does not exclude the reversed motions on account of their vanishing probability, but rather argues that, sooner or later, they too will reach the equilibrium state. Note how much Boltzmann's strategy has shifted: whereas previously the idea was that a gas system should approach equilibrium because of the H-theorem, his idea is now, apparently, that regardless of the behavior of \(H\) as a function of time, there are independent reasons for assuming that the system approaches equilibrium. Boltzmann's contentions may of course very well be true. But they do not follow from the H-theorem, or by ignoring its exceptions, and would have to be proven otherwise.

Philosophical Concepts

Spinoza’s Theory of Attributes

1. Attributes in the Ethics Before discussing the theory of attributes in the Ethics, it will be helpful to keep in mind a rudimentary sketch of the general structure of Spinoza’s ontology:[2] There is only an infinite substance (1P14), that is, there are no created substances. The infinite substance consists …

1. Attributes in the Ethics Before discussing the theory of attributes in the Ethics, it will be helpful to keep in mind a rudimentary sketch of the general structure of Spinoza’s ontology:[2] There is only an infinite substance (1P14), that is, there are no created substances. The infinite substance consists of infinite attributes (1D6). Every mode, be it finite or infinite, must be conceived through an attribute (1D5, 1P10Schol, 2P6 and 2P6Dem). Finally, what other philosophers consider to be “created substances,” such as my mind (as well as my body), are finite modes for Spinoza (1P11).[3] 1.1 What are Attributes? Spinoza is not the first to furnish his metaphysics with attributes; in this he is following a very long tradition. He is, though, mostly influenced by Descartes, and in some ways is trying to stay close to Descartes’ notion of “attribute.” It therefore will be useful to look back and get a sense of what Descartes had in mind, and thus get a preliminary grasp (which will be revised) of what Spinoza means by “attribute.” Descartes states in the Principles of Philosophy that attributes are the essence of a thing, so the essence of mind is thought or thinking, and the essence of body is to be extended (Principles, I, §53, CSM, I, p. 210, AT 25). To see why this is so, it is worth revisiting the First and Second Meditations, even if very briefly. Let us begin with body and Extension. To understand the essence of body, we can look to the famous wax example in Meditation Two (CSM, II, pp. 20–21, AT 30–32). While sitting by the fireplace, Descartes inspects a piece of wax and asks himself what he knows of the wax. He begins by listing all the sensory properties of the wax: it is white, has a certain smell, makes a certain sound when one raps it with one’s finger, is hard, and has a certain taste. After listing all its sensory properties, he then places the piece of wax by the fire and sees how it loses all those properties: it changes color, smell, texture, taste, etc. Descartes concludes, among other things, that the essence of the wax, insofar as it is a body, is that it is extended in length, breadth, and depth, since that is the only thing that remains constant about the wax. In this respect, the piece of wax is no different from any other body—that is, its essence is to be extended. Extension, then, according to Descartes, is the essence of body. In the Meditations we also, famously, come to recognize our own essence as thinking things. We realize this by recognizing that we cannot doubt that we are doubting while doubting. Furthermore, we realize that doubting in this sense is no different from understanding, affirming, denying, willing, unwilling, imagining, and having sense perceptions (seeming to see, etc.) (CSM, II, p. 19, AT 28). Descartes then reaches the conclusion that the essence of the mind is Thought. For these reasons, Descartes claims that Thought and Extension are the principal attributes of mind and body and that they are “really distinct”, that is, they exist independently of one another.[4] It is important to note that for Descartes, any created substance has only one principal attribute, as opposed to God, who has infinite attributes. Spinoza adopts some aspects of the Cartesian set-up while rejecting others. He agrees that Thought and Extension are attributes (2P1, 2P2) and are related to essences (1D4).
He agrees they are “really distinct” from each other (1P10Schol).[5] Furthermore, he agrees that “mind” has to be conceived through Thought, and “body” through Extension. (2P5 and its demonstration make the case with regard to ideas and Thought; 2D1 establishes it for bodies and Extension. This is also made very clear in 3P2 and its demonstration.) However, he does not agree that they are attributes of created substances, since he rejects the possibility of created substances altogether (1P6Cor., 1P8Schol1, 1P14). One way to understand Spinoza is to see how he can hold both Thought and Extension (and other attributes, if there are others) to be divine attributes or attributes of one and the same (infinite) substance.[6] 1.2 Definition of Attribute Spinoza defines the term “attribute” thus: “By attribute I understand what the intellect perceives of substance as constituting its essence” (1D4). This definition is reminiscent of Descartes’ notion of attributes as it appears in the Principles of Philosophy insofar as attributes are related to the essence (or essences) of substance. However, as many have noticed, it is not clear from the definition alone what exactly Spinoza means. There are several, by now famous, ambiguities in the definition.[7] These, together with the different interpretative options, are discussed in Section 1.8. 1.3 Real Distinction Spinoza makes a very important claim about attributes in the Scholium to Proposition 10 of Part One: “…although two attributes may be conceived to be really distinct (i.e., one may be conceived without the aid of the other), we still cannot infer from that that they constitute two beings, or two different substances.” Spinoza here is explaining something about the relationship among attributes—one may be conceived without the aid of the other—and about the relation of the attributes to the substance, namely, that conceiving attributes independently is not evidence of the existence of independent substances. To understand why this scholium is so important, it is helpful to recall Descartes’ definition of a “real distinction.” In the Principles of Philosophy, Descartes says: “Strictly speaking, a real distinction exists only between two or more substances; and we can perceive that two substances are really distinct simply from the fact that we can clearly and distinctly understand one apart from the other” (Principles, I, §60, CSM, I, p. 213, AT 28). For Descartes, this anchors the strict epistemological and ontological separation between mind and body. One of the things we learn from going through the Meditations is that we are capable of clearly and distinctly perceiving ourselves without a body—the cogito in the Second Meditation—and we clearly and distinctly perceive body without thinking in the Fifth Meditation. (Of course, in retrospect, we realize that we already did this in a sense with the wax as well.) Descartes thus concludes that mind and body are really distinct, that is, one can exist without the other. One important implication of this distinction is that it allows for a fully mechanistic explanation of the physical world. To explain the interaction between two bodies requires alluding only to their physical properties (size, shape and motion), without the need to have recourse to any Aristotelian explanation involving final causes. Making room for mechanistic explanations, that is, for the New Science, was one of Descartes’ chief motivations for writing the Meditations. Spinoza preserves this aspect of Cartesian doctrine (cf.
appendix to Part One of the Ethics and discussion in Section 1.3.1). Having separated the mind so sharply from the body, Descartes is left with having to explain their evident unity. More specifically, he is burdened with trying to explain how two really distinct substances seem to interact causally. Their causal interaction seems problematic because, according to Descartes, each substance is independent; the infinite substance depends on nothing but itself (Principles, I, §51, CSM, I, p. 210, AT 24), while created substances depend on nothing but God for their existence (Principles, I, §52, CSM I, p. 210, AT 25). If distinct substances interact causally then they seem to depend on one another, and this would go against their nature qua substances. This is why the union of the mind and body is a thorny issue for Descartes, and was and continues to be a source of much debate (cf., for example, Hoffman, 1986). For some, a version of this problem translates into Spinoza’s metaphysics (cf. Section 1.9.4). The issue of the nature of the “real distinction” for Spinoza is discussed in the subsequent section. For Descartes, then, there is the epistemological claim that perceiving Thought does not involve perceiving Extension and vice versa. Each is explanatorily independent from the other (although not from God). Spinoza adopts this aspect of Cartesian philosophy and holds, as well, that there is what Della Rocca calls “a conceptual barrier” between Thought and Extension, as Spinoza states in the scholium “i.e., one may be conceived without the aid of the other” (Della Rocca, 1996, 9–17). Spinoza holds Thought and Extension to be explanatorily self-contained. Physical changes are to be understood in terms of other physical items, and ideas are to be understood in terms of other ideas. What is ruled out is what can be called “cross-attribute explanations”: for example, explaining the movement of my hand by my desire to move my hand. According to Spinoza, the movement of my hand is to be explained purely physically by alluding to other bodies and their motions, while my desire is to be explained by other desires and ideas. Spinoza makes this very clear in 3P2, its demonstration and scholium: 3P2: The body cannot determine the mind to thinking, and the mind cannot determine the body to motion, to rest, or to anything else (if there is anything else). Dem.: All modes of thinking have God for a cause, insofar as he is a thinking thing, and not insofar as he is explained by another attribute (by 2P6). So what determines the mind to thinking is a mode of thinking and not of extension, that is (by 2D1), it is not the body. This was the first thing. Next, the motion and rest of a body must arise from another body…whatever arises in the body must have arisen from God insofar as he is considered to be affected by some mode of extension, and not insofar as he is considered to be affected by some mode of thinking (also 2P6), that is, it cannot arise from the mind, which (by 2P11) is a mode of thinking. This was the second point. Therefore, the body cannot determine the mind, and so on, q.e.d. Although this is reminiscent of Descartes in some respects, there is, of course, one crucial difference. For Descartes the fact that one can conceive Thought distinctly from Extension is evidence for the existence of two substances—mind and body.
For Spinoza, this is not the case, and this is the point he is making in this central proposition (1P10), namely, that although two attributes may be conceived independently—one without the other—this does not imply that there are two substances existing separately. For Spinoza there is only one substance with infinite attributes, and although each attribute is conceived independently from the others, they are, nonetheless, all attributes of one and the same substance. It is possible then to conceive, think, or completely explain the entire universe, or everything that exists, under each one of the attributes. That is, we can give a complete physical description of everything that exists, or alternatively explain, describe, or conceive everything as ideas or thought. Being able to explain the entire universe under the attribute of Extension is what allows Spinoza to preserve Descartes’ effort of providing room for progress in the New Science (cf. Appendix to Part One). Spinoza and Descartes agree about the epistemological separation between Thought and Extension, but not about the ontological one. Descartes calls the distinction between attributes of the same substance, and between a given attribute and its substance, a “rational distinction” (Principles, I, §62, CSM, I, p. 214, AT 30), and so, insofar as Thought and Extension belong to the same substance for Spinoza, they would be, in Descartes’ terminology, rationally distinct.[8] Spinoza, however, says that they are “really distinct.” How exactly to understand the “reality” of the distinction among the attributes is a crucial interpretative matter and is discussed in Sections 1.8.1–1.8.2. 1.4 The Identification of Attributes with Substance Another claim that has to be taken into account in an analysis of Spinoza’s view on attributes is that God is his attributes: 1P4Dem: “Therefore, there is nothing outside the intellect through which a number of things can be distinguished from one another except substance, or what is the same (by 1D4), their attributes, and their affections” (italics added), 1P19: “God is eternal, or [sive] all God’s attributes are eternal,” 1P20Cor.: “It follows second, that God, or [sive] all of God’s attributes, are immutable.” Some might consider 1P29Schol to be making an identity claim as well: “But by Natura Naturata I understand whatever follows from the necessity of God’s nature, or [sive] from any of God’s attributes…” Spinoza in these places seems to be claiming that there is an identification of the substance with its attributes. However, this identification can be understood in several ways and in various degrees of strictness. How one reads this claim depends on other considerations discussed in Section 1.9.3. 1.5 Extension as a Divine Attribute One of the important things that Spinoza does in the first two parts of the Ethics is to establish Extension as a divine attribute (elements of this view are evident already in KV I/25). Although Spinoza adopts many important aspects of Cartesian metaphysics, he collapses the divide between the infinite and created substances. This means that principal attributes that were at the “created substance” level in the Cartesian set-up are “moved up”, so to speak, to the infinite substance level for Spinoza. One of these attributes is, of course, Extension. Spinoza has to explain to a resistant audience how Extension can be considered a divine attribute. The important steps that will allow Spinoza to claim that Extension can be an attribute of God are the following.
He defines God as a substance consisting of infinite attributes (1D6).[9] He shows that substances cannot share attributes (1P5), that every substance is infinite (1P8), that a single substance can have several attributes (1P10Schol), and that an infinite substance exists (1P11). With an eye towards specifically establishing Extension as a divine attribute, he claims in 1P12: “No attribute of a substance can be truly conceived from which it follows that the substance can be divided.” In 1P13, he states: “A substance which is absolutely infinite is indivisible,” and in the corollary, he makes the point especially clear with respect to Extension: “From these propositions it follows that no substance, and consequently no corporeal substance, insofar as it is a substance, is divisible.” In 1P14, he establishes that there is only one substance (or rather, that there are no created substances). Finally, in 1P15 he claims: “Whatever is, is in God, and nothing can be or be conceived without God.” With this, the stage is set for Extension being a divine attribute, or applicable to God, if in fact it is a genuine attribute (which is established only in Part Two). Spinoza is aware that this will be received with great resistance. The possible objection he imagines is that since Extension is divisible by its very nature, then, if Extension were an attribute of God, God would be divisible. God, of course, cannot be divisible, for then he would not be infinite. In the Scholium to 1P15 he shows the contradictions that ensue if one holds Extension to be by its very nature divisible. In answer to possible objectors holding traditional views, it is important for him to show that Extension does not imply divisibility. Moreover, he has just shown that there is only one substance, which is indivisible (1P12 and 1P13), and so whatever attributes it has, none of them can imply divisibility in the only substance. Spinoza then shows that if Extension is an attribute, it is applicable to God, and there is no danger of that implying any real division in the substance. One important result of this is that what appear to be individuated bodies cannot be really individuated in the Cartesian sense of implying real distinction and the existence of multiple substances. Rather, what appear to be individuated bodies are only modes of substance under the attribute of Extension.[10] Only in Part Two does Spinoza show that Extension (as well as Thought) is in fact an attribute of God: “Thought is an attribute of God, or [sive] God is a thinking thing” (2P1) and “Extension is an attribute of God, or [sive] God is an extended thing” (2P2). 1.6 The 2P7 Doctrine A very important characteristic regarding attributes is established in 2P7 and its scholium, which is sometimes referred to in the literature as the “parallelism doctrine.” However, as will be discussed in Section 1.9.2, this nomenclature is laden with a significant amount of interpretative bias, and the term is nowhere to be found in the Ethics itself. It is thus advisable to steer clear of it and simply refer to it as “the 2P7 Doctrine.” 2P7 states: “The order and connection of ideas is the same as the order and connection of things” (“ordo, & connexio idearum idem est, ac ordo & connexio rerum”). 
Spinoza explains this proposition in the scholium: For example, a circle existing in Nature and the idea of the existing circle, which is also in God, are one and the same thing, which is explained through different attributes…Therefore, whether we conceive Nature under the attribute of extension, or under the attribute of thought, or under any attribute, we shall find one and the same order, or one and the same connection of causes, that is, that the same things follow one another. Spinoza is claiming here that a mode X under the attribute of Thought is one and the same as mode X under any other attribute Y. A good way to get some intuitive sense of this is to see how it works with respect to ourselves. Under the attribute of Thought, I am a finite mode—an idea or mind. Under the attribute of Extension, I am a finite mode, that is, a body. The claim in 2P7 and its scholium is that my mind (a mode of Thought) and my body (a mode of Extension) are one and the same. This is the case for all modes. Furthermore, whatever causal relation my body, say, bears to other modes of Extension, my mind will bear to the other modes of Thought. Understanding this doctrine and its implications in more depth depends, probably more than for any other doctrine, on how one construes other central elements of Spinoza’s theory of attributes (e.g., the number of attributes). In Section 1.9.2 different directions of interpretation are considered regarding 2P7 and its scholium. 1.7 The Two Known Attributes Spinoza famously claims that we, human minds, know only two attributes—Thought and Extension. This can be seen as arising from the axioms in Part Two: 2A2: “Man thinks,” 2A4: “We feel a certain body is affected in many ways,” 2A5: “We neither feel nor perceive any singular things, except bodies and modes of thinking,” as well as 2P13: “The object of the idea constituting the human mind is the body, or [sive] a certain mode of extension which actually exists, and nothing else” [italics added] (this is true already in KV, 1/27). In Letter 64 Spinoza tries to explain why we can perceive only these two attributes, and he does so by referring back to 2P13, claiming in the letter: “Therefore, the mind’s power of understanding extends only as far as that which this idea of the body contains within itself, or which follows therefrom. Now this idea of the body involves and expresses no other attributes of God than extension and thought.” Although some have found this line of argumentation unsatisfying (e.g., Bennett, 1984, 78–79), it is worth noting that Spinoza here is relying on axioms. 1.8 Ambiguities and Interpretative Directions The attempt to understand Spinoza’s doctrine regarding the attributes has traditionally led interpreters in two main directions, although others have been proposed (e.g., Lennon, 2005, 12–30; Shein, 2009).[11] The first is what is known as the “subjective” interpretation, which follows Hegel and is given its paradigmatic expression by Wolfson. More recently, Michael Della Rocca has been advocating a more idealistic interpretation of the attributes, which shares certain important features with the subjectivist camp. The other, which has become the standard, is the “objective” interpretation. 
These two principal avenues stem from some important ambiguities in the definition of “attribute”: “By attribute I understand what the intellect perceives of substance as constituting its essence” (1D4).[12] The first term that is ambiguous is “intellect,” since it can refer either to the finite intellect or to the infinite one (cf. diagram in Section 1). The second important ambiguity lies in the Latin term tanquam, since it can mean either “as if, but not in fact,” or “as in fact.” The definition can therefore be read either as stating that attributes are what the intellect perceives of substance as constituting its actual essence, or as stating that attributes are what the intellect perceives only as if constituting the essence, though they do not in fact constitute it. The subjectivists accordingly claim that attributes are what the finite intellect perceives of substance as if constituting its essence. The objectivists, by and large, instead claim that it is the infinite intellect that perceives the attributes as in fact constituting the essence of substance. In the following sections the different interpretative options are explained grosso modo. The ways in which the different interpretative avenues affect other Spinozistic doctrines are discussed in Sections 1.9.1–1.9.4. As is well known, Hegel, in various respects, considered himself to be modifying Spinoza’s doctrine (“to be a follower of Spinoza is the essential commencement of all philosophy”), and his interpretation of Spinoza was extremely influential.[13] In his Lectures on the History of Philosophy, Hegel says that what has utmost reality for Spinoza is the absolute (or the infinite substance) and that everything else (finite modes, in particular) is a way of negating this absolute. He goes on to explain that the understanding [or “intellect”] grasps the reality of substance through attributes, but “it is only reality in view of the understanding.” He stresses that understanding in terms of attributes is due to the nature of the understanding and not to the nature of the absolute (or the infinite substance) as such. It is clear that he considers the understanding to be the understanding of finite minds, because he goes on to explain that Spinoza’s claim that there are “infinite attributes” has to be interpreted as “infinite in character” and not in number, and that there are only the two attributes known to finite minds—Thought and Extension. What is referred to in the literature as the subjectivist reading, following Hegel, holds that the intellect perceiving the attributes is the finite intellect and that the attributes are projections of the finite mind onto the infinite substance, which it cannot fully comprehend. In other words, according to the subjectivist interpretation, the definition of attribute states that attributes are what the finite intellect perceives of substance as if (but not in fact) constituting its essence. In contrast, the objectivist reading takes the intellect in question to be the infinite one, and the tanquam to mean “as in fact,” and so it reads the definition as claiming that attributes are what the infinite intellect perceives of substance as (in fact) constituting its essence. Wolfson summarizes the difference between the two positions thus: According to the former interpretation [subjectivism], to be perceived by the mind means to be invented by the mind, for of themselves the attributes have no independent existence at all but are identical with the essence of the substance. 
According to the latter interpretation [objectivism], to be perceived by the mind means only to be discovered by the mind, for even of themselves the attributes have independent existence in the essence of substance (Wolfson, 1934, 146). One of the motivations behind Wolfson’s view is that he considers Spinoza to be the last of the medieval Jewish rationalists, and, in keeping with that tradition, Spinoza locates all multiplicity not in the infinite substance (God), but rather in the human mind. That is, the fact that God has multiple attributes is explained not by his having multiple essences, natures, or aspects, but rather by the nature of the human mind. This is based on the conviction that God’s true nature is simple and any multiplicity is merely apparent but not real. It is because of the limitations of the finite mind that it attributes multiplicity to the infinite substance, when in reality the infinite substance is simple. In this view there is a gap between the attributes and the infinite substance. The infinite substance as it is in itself, so to speak, is unknowable to the finite mind. With respect to the “real distinction,” the distinction between the attributes in this view is grounded in the different ways the finite mind has of conceiving the infinite substance. That is, the distinction between the attributes is not based on the nature of the infinite substance itself, but it reveals, in a way, something about the nature of finite perception. It is in these terms that the “reality” of the distinction is to be understood, i.e., as if but not in fact. Two main objections have been brought against the subjectivist interpretation. These are considered by most commentators to be forceful enough to warrant rejecting subjectivism as a serious contender for a satisfying interpretation of Spinoza’s theory of attributes. The first objection to subjectivism is that finite minds can never have true knowledge of God, but only knowledge “as if”; all knowledge is rendered illusory. The reason for this is quite clear. In the subjectivist interpretation the attributes are projections of the finite mind; therefore the finite mind can never come to know the infinite substance as it is in itself. This seems to contradict Spinoza’s claim that the finite mind can have adequate, that is, perfect knowledge of God’s essence (2P47). The second objection is that this interpretation seems irreconcilable with those places in the text where Spinoza identifies the attributes and God (cf. 1P4, 1P19 and 1P20Cor.). Again, as projections of the finite intellect, the attributes do not properly pertain to the substance, and therefore cannot be identical to it. For these reasons, among others, the subjective interpretation (understood in these terms) has fallen out of favor.[14] Michael Della Rocca, however, has more recently been advocating the view that attributes (and diversity more generally) are mind-dependent yet not illusory. He thus aims to overcome some of the traditional objections to subjectivism (primarily the “illusory knowledge” objections) while insisting on the mind-dependent status of attributes. He takes the mind-dependent nature of diversity (be it of attributes or modes) to be an inevitable consequence of Spinoza’s adoption of the Principle of Sufficient Reason (cf. Della Rocca, 2012). In light of these kinds of criticisms of the subjectivist interpretation, commentators have turned towards what are known as “objectivist” accounts. 
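Before examining those accounts in detail, the two readings of 1D4 traced above can be set out side by side. The following schematic table merely summarizes the foregoing discussion; the labels and layout are ours, not a formulation found in Spinoza or the commentators:

```latex
% A schematic summary of the two readings of 1D4 discussed above.
% The labels and layout are illustrative, not Spinoza's own.
\begin{tabular}{lll}
Reading & ``Intellect'' in 1D4 & Force of \emph{tanquam} \\
\hline
Subjectivist & finite intellect   & ``as if, but not in fact'' \\
Objectivist  & infinite intellect & ``as in fact'' \\
\end{tabular}
```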
Although the details of these accounts are quite diverse, there are a few key elements they share—all related to the fact that they do not wish to be subjectivist. The first of these characteristics is that they hold that knowledge in the system cannot be illusory. That is, knowledge through attributes must yield true, or adequate, knowledge. One way to do this is to claim that it is the infinite intellect that perceives the attributes, and so knowledge through attributes is the kind of knowledge the infinite intellect has, and therefore is not illusory (e.g., Bennett, 1984, 147; Delahunty, 1985, 116; Della Rocca, 1996, 157; Haserot, 1972, 32–35). Therefore, the tanquam in the definition is to be read “as in fact” and not “as if.” As opposed to subjectivism, which does not emphasize the “reality” of the distinction between the attributes, or rather, does not ground the distinction in the nature of the infinite substance, objectivist interpretations place ontological weight on the “real distinction” between the attributes. In other words, for the multiplicity to have a certain reality and not be illusory, it must somehow be grounded not in the perceiver but in the thing perceived, namely, the infinite substance. The danger of this kind of interpretation is that if the distinction is stressed too strongly, the unity of the substance is lost. If the infinite substance has “really distinct” attributes, and this distinction is grounded in, say, distinct natures or essences of the infinite substance, then there has to be an explanation of how a multiplicity of natures or essences can be united to form one substance. (This issue is addressed in further detail in Section 1.9.4, as it emerges in the discussion of the nature of the union of mind and body.) Any interpretation of Spinoza must characterize the relation between any given attribute and the substance. As mentioned, in the subjectivist account there is a problematic gap between the substance and any given attribute. The alternative is to deny this gap. For example, Bennett claims the following: I think that here [Ep. 9] he is saying that substance differs from attribute only by the difference between a substance and an adjectival presentation of the very same content. If we look for how that which is extended (substance) differs from extension (attribute), we find that it consists only in the notion of that which has… extension or thought or whatever; and that, Spinoza thinks, adds nothing to the conceptual content of extension, but merely marks something about how the content is logically structured. As I did in §12.7, he is rejecting the view that a property bearer is an item whose nature qualifies it to have properties, in favour of the view that the notion of a property bearer, of a thing which…, is a bit of formal apparatus, something which organizes conceptual content without adding to it. According to this view, there is an emptiness about the difference between substance and attribute (Bennett, 1984, 62–63). Although Bennett claims there is an emptiness about the distinction between the two, he does not consider it an absolute identity either. He finds an identity claim to be irreconcilable with the claim that attributes are really distinct. Della Rocca has suggested intentionality as a way of denying the gap, treating “… is extended” and “… is thinking” as referentially opaque contexts. In other words, what is being picked out by the infinite intellect in either instance is the same, but the way in which it is picked out is different. 
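The logical shape of this appeal to opacity can be sketched with the standard schema for intensional contexts. The notation below is illustrative only and is not drawn from Spinoza or from Della Rocca's text:

```latex
% Referential opacity, schematically (illustrative notation only).
% In a transparent context, identity licenses substitution of co-referring terms:
\[ a = b, \quad F(a) \;\vdash\; F(b) \]
% In an opaque context C (such as ``is conceived as ...''), substitution fails:
\[ a = b, \quad C(F(a)) \;\not\vdash\; C(F(b)) \]
% Applied here: that which is extended and that which thinks are one and the same
% substance, yet conceiving it as extended is not conceiving it as thinking.
```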
Yet another way of denying the gap is to claim, along with Descartes, that the distinction between an attribute and the substance is only a rational distinction. That is, in reality there is no distinction, but in the finite mind we can separate, contra natura, the attribute from the substance. In other words, the finite mind can abstract the attribute from the substance, but in reality they are not separated. This type of view must then be supplemented by an account of what is meant by the “real distinction” among the attributes. 1.9 Implications of the Various Readings on Other Spinozistic Doctrines 1.9.1 The Number of Attributes Although Spinoza claims that there are infinite attributes, a question arises as to how many there are, because “infinity” may not necessarily refer to numeric infinity.[15] Bennett, among others, has made the case that infinity in early modern philosophy means totality (Bennett, 1984, 75–79). Spinoza’s claim, then, that the infinite substance has infinite attributes can be understood as the claim that the infinite substance has all the attributes there are to be had.[16] This is consistent with there being, say, only the two known attributes. There are sections in the text, on the other hand, that seem to suggest that infinity means a numerical infinity, and thus that the infinite substance has as attributes Thought and Extension, as well as infinitely many other unknown attributes. The places used as evidence by those wishing to claim there are more than two attributes are the following: 1D6: By God I understand a being absolutely infinite, that is, a substance consisting of an infinity of attributes, of which each one expresses an eternal and infinite essence. Exp.: I say absolutely infinite, not infinite in its own kind; for if something is only infinite in its own kind, we can deny infinite attributes of it; but if something is absolutely infinite, whatever expresses essence and involves no negation pertains to its essence. 2P7Schol: Therefore whether we conceive Nature under the attribute of Extension, or the attribute of Thought, or any other attribute, we shall find one and the same order, or one and the same connection of causes, that is the same things follow one another. Letter 56: To your [Hugo Boxel] question as to whether I have as clear an idea of God as of a triangle, I reply in the affirmative. But if you ask me whether I have as clear a mental image of God as of a triangle, I reply in the negative. We cannot imagine God, but we can apprehend him by the intellect. Here it should also be observed that I do not claim to have complete knowledge of God, but that I do understand some of his attributes—not indeed all of them, or the greater part—and it is certain that my ignorance of very many attributes does not prevent me from having knowledge of some of them. When I was studying Euclid’s Elements, I understood early on that the three angles of a triangle are equal to two right angles, and I clearly perceived this property of a triangle although I was ignorant of many others. This issue can be linked to the previous discussion regarding the ambiguities in the definition of attribute, although this is not always done. If one holds that it is the infinite intellect that is doing the relevant perceiving, there seems to be no reason to limit the number of attributes it perceives. 
Conversely, it might be claimed that if the infinite intellect perceives only two attributes, there must be a sufficient reason why there are only two, and why they are Thought and Extension and not other attributes. If, on the other hand, one holds that it is the finite intellect that conceives the attributes, and it conceives only Thought and Extension, then these are the only two attributes there are. In the literature, however, this line of reasoning is not always followed, and examples can be found of interpreters who hold that it is the infinite intellect that does the perceiving, but that there need not be more than two attributes (Bennett, 1984, 75–76). At the same time, there are interpreters who claim it is the finite intellect that perceives the attributes while holding that there are infinitely many attributes (Wolfson, 1934, 226). How many attributes there are affects how one reads other central doctrines in Spinoza’s metaphysics, such as 2P7 and its scholium, to which we turn next. 1.9.2 The Structure of 2P7 and Its Scholium A crucial role in Spinoza’s system is played by 2P7 and its scholium, since they lay the ground for solving, or rather dissolving, the mind–body problem. Therefore, the understanding of the nature of the union of mind and body depends on one’s interpretation of Spinoza’s theory of attributes, and of 2P7 and its scholium in particular. (For a discussion of the issues regarding the union of Mind and Body, see Section 1.9.4.) The interpretation of the metaphysical structure of what is expressed in 2P7 and its scholium is affected greatly by the number of attributes one believes there are in Spinoza’s system and by how one understands the relation between the attributes and the substance. The general description of 2P7 and its scholium is given above in Section 1.6. 2P7 and its scholium can be understood in very different ways. In what follows, three types of interpretive directions are described. This is not meant to be exhaustive by any means, but it will provide a sense of the kinds of options that have been offered by commentators. Let us begin with the simplest option. If one holds that there are only two attributes, Thought and Extension, the metaphysical structure of 2P7 and its scholium is quite straightforward. Every mode under the attribute of Thought is associated with a mode in Extension, and vice versa, and the relations between modes in one attribute are mirrored in the other. Those who hold this kind of view must, of course, provide a convincing argument to the effect that there are only two attributes. However, if one takes there to be more than two attributes, the structure gets quite a bit more complex. One option that has been advanced is that Thought is a special attribute and encompasses ideas of all the modes in all the other attributes (cf., for example, Curley, 1969, 146; and more recently, Melamed, 2009, Chapters 1–2). Thought turns out to be “special” in this kind of interpretation because there are many more modes (ideas), or facets of modes, in Thought than there are under any other attribute. Another way of expressing this is by saying that 2P7 is not a biconditional: the requirement of an associated mode goes in only one direction, from any mode in any attribute to a mode in Thought. The burden on this type of view is that it must account for the favoring of Thought over the other attributes, and perhaps also for the relation between all the non-Thought modes in the other attributes. 
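The difference between the two structural readings described so far can be made vivid with a schematic rendering. The function-style notation below is purely illustrative and is not found in Spinoza:

```latex
% Two structural readings of 2P7 (illustrative notation only).
% (1) Two-attribute reading: a structure-preserving, one-to-one correspondence
%     between the modes of Extension (E) and the modes of Thought (T):
\[ \varphi : \mathrm{Modes}(E) \to \mathrm{Modes}(T), \qquad
   x \text{ causes } y \iff \varphi(x) \text{ causes } \varphi(y) \]
% (2) ``Thought is special'' reading: every mode m of every attribute A has an
%     idea of it in Thought, but the requirement runs in one direction only:
\[ \forall A \;\, \forall m \in \mathrm{Modes}(A) \;\, \exists\, i_A(m) \in \mathrm{Modes}(T) \]
```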
Another option (or class of options) that is available is to claim that attributes come in pairs: an object-like attribute coupled with a thought-like attribute (Curley entertains this option as well; Curley, 1969, 146). Under this type of interpretation we would get Thought and Extension following the structure of the first alternative, that is, each idea in Thought is associated with (that is, is one and the same as) a mode in Extension. Taking there to be more than just two attributes, we also get a Thought-x coupled with an Extension-x, in which each idea-x is one and the same as a body-x under Extension-x, a Thought-y coupled with an Extension-y, and so on. Letter 66 provides some support for this view. This kind of account has to be supplemented, of course, with an account of the relations among these thought-like/extension-like pairs of attributes. 1.9.3 The Identification of Substance and Attributes As mentioned earlier, Spinoza identifies God, or the infinite substance, with the attributes (1P4, 1P19 and 1P20Cor.). The nature of this identification is also affected by one’s interpretative stance regarding the attributes. The traditional subjectivist view, since it claims that the attributes are a projection of the finite mind onto the substance, cannot hold this identification to be strict. Objectivist views, which stress the distinctness of the attributes, also cannot accept these claims literally (e.g., Bennett, 1984, 64; Curley, 1988, 13; Gueroult, 1968, 50). The reason for this is as follows: if the substance is strictly identical to any one of its attributes, then the attributes will be identical to each other (by transitivity), and therefore no longer distinct, as Spinoza claims they are. Different objectivist interpretations address this issue differently. Curley, for example, holds that the identity is not one that holds between any given attribute and the substance, but rather between the totality of the attributes and the substance (Curley, 1988, 30). Bennett, on the other hand, believes Spinoza is simply overstating the case (Bennett, 1984, 64). The identity can be understood strictly if the attributes are taken to be only rationally distinct from the substance, that is, identical with it in reality, while the distinction between the attributes themselves is taken to be only epistemological and not ontological. 1.9.4 The Union of Mind and Body Another doctrine that is heavily affected by how one understands the attributes is the union of mind and body. For Descartes, the issue was how to unite two really distinct created substances—minds with bodies. Descartes’ reply is that God unites these two substances, and we have the tools by which to recognize that we are united in this way, namely, sensory experience (Meditation Six). Spinoza, of course, cannot appeal to God as creator to unite minds and bodies, since what is being united are not created substances but finite modes. The possible problem can be articulated as follows: How can Spinoza claim, on the one hand, that there are modes of really distinct attributes, e.g., my mind and my body, and therefore that there is a real distinction between my mind and my body, and, on the other, claim in the Scholium to 2P7 that my mind and body are one and the same? This problem, which arises for the objectivist interpretations, has been addressed in a variety of ways. It is worth noting that the tension does not present itself for subjectivist views, since they do not claim that the “real distinction” between the attributes has ontological weight. 
That is, there are no two things that have to be united. Commentators wishing to stress the “distinctness” of the attributes find themselves having to explain the sense in which Spinoza can mean that the mind and the body are “one and the same.” A common strategy among commentators has been to appeal to a structure that is attribute-neutral in order to account for the unity. To understand this issue better, it is useful to consider a few examples. One important example is Bennett, who claims that the unity is to be understood as a unity of properties, but not of the modes themselves: …his [Spinoza’s] thesis about the identity of physical and mental particulars is really about the identity of properties. He cannot be saying that physical P1 = mental M1; that is impossible because they belong to different attributes. His thesis is rather that if P1 is systematically linked with M1, then P1 is extension-and-F for some differentia F such that M1 is thought-and-F. What it takes for an extended world to contain my body is exactly what it takes for a thinking world to contain my mind (Bennett, 1984, 141). That is, Bennett thinks that there is some trans-attribute feature (what he calls “differentia F”) such that it can be added to Extension to get extended-F, and added to Thought to get thinking-F. Bennett admits that nothing like this is found anywhere in the text, but he believes that in this way we can make sense of Spinoza’s holding that the attributes are “really distinct” from each other and, at the same time, that thinking-F and extended-F are one and the same. Della Rocca, while holding a view different from that of Bennett regarding Spinoza’s theory of attributes, also finds himself having to account in some way for the unity of mind and body, and so he suggests that modes are said to be numerically identical when they share all of their neutral properties, where “neutral properties” are those properties which do not involve being thought under any particular attribute. These are contrasted with “intentional properties,” which are attribute-dependent, such as “being of a certain volume.” As an example of a neutral property, Della Rocca offers “having five immediate effects.” He then claims that if modes share all of their neutral properties, they are identical (that is, one and the same). Therefore, since my mind and my body share all of their neutral properties, they are identical (Della Rocca, 1996, 133–38). The final example to be considered is Gueroult’s interpretation. Gueroult, in order to account for the professed identity between modes of different attributes in 2P7 and its scholium, considers 1P28, which states: Every singular thing, or any thing which is finite and has a determinate existence, can neither exist nor be determined to produce an effect unless it is determined to exist and produce an effect by another cause, which is also finite and has a determinate existence; and again, this cause also can neither exist nor be determined to produce an effect unless it is determined to exist and produce an effect by another, which is also finite and has a determinate existence, and so on, to infinity. To explain this proposition, Gueroult draws a distinction between “modes of substance” and “modes of attributes.” The claim is that 1P28 treats only modes of substance, not modes of attributes, and is therefore unique. In other words, the identity is then understood in reference to “modes of substance” and not “modes of attributes” (Gueroult, 1968, 338–39). 
Again, we see an attribute-independent structure—the chain of modes of substance—that is meant to account for the “one and the sameness” of modes of different attributes. It has been pointed out, however, that this type of solution is not without serious problems (Shein, 2009). Briefly, the issue is as follows: the main reason for rejecting the subjectivist view is that, in that type of interpretation, God, as he is in himself, remains unknowable, and this conflicts with Spinoza’s view that adequate knowledge is possible. However, as Spinoza makes clear in 1P10Schol, nature must be conceived under attributes. In light of this, an attribute-independent structure, by its very nature as “attribute-independent,” is unknowable as well. Therefore, in this view, the union, or the nature of the identity between mind and body, is in principle unknowable, and, in that respect, the view does not provide any advantage over subjectivist ones. An alternative, mentioned above, is to deny the gap between the attributes and the substance by claiming that, along with Descartes, Spinoza holds there to be a rational distinction between them, that is, that in reality they are identical (Shein, 2009). This avoids the kind of problems raised for the subjectivist view, since, in this interpretation, to know the attributes is to know the substance. Since in this view the attributes are only rationally distinct from the substance, the “real distinction” between the attributes that Spinoza asserts in 1P10Schol is understood as being only an epistemological claim, as he states in the text—“i.e., one may be conceived without the other” (1P10Schol). That is, it does not carry the additional ontological weight that the objectivists give it. This avoids having to impose onto the Spinozistic system an attribute-independent structure, which does not seem to fit with his epistemology, in order to account for the unity. 2. Attributes in the Short Treatise In the Short Treatise Spinoza develops ideas that will come to full articulation later on in the Ethics, such as the idea that, strictly speaking, there are only two attributes through which we can properly come to have knowledge of God—Thought and Extension. However, unlike in the Ethics, he does not simply dismiss the more traditional attributes, such as omnipotence, eternality, immutability, and infinity. To maintain some sense of these traditional divine attributes, Spinoza explains that they are not attributes strictly speaking, but rather propria of God. This is first stated clearly in the first footnote to Chapter III (“How God is the Cause of All Things”): The following are called Propria because they are nothing but Adjectives which cannot be understood without their Substantives. I.e., without them God would indeed not be God; but still, he is not God through them, for they do not make known anything substantial, and it is only through what is substantial that God exists. Spinoza, then, is distinguishing between that which gives us knowledge of God, or better, that through which God can be known—Thought and Extension—and that which can be said of God adjectivally but gives us no knowledge—what he terms propria. This is explained most explicitly in Chapter VII of the Short Treatise. The difference Spinoza wishes to draw is that although these traditional divine attributes can be said of God, they do not teach us anything about what God is really like. 
An analysis of these traditional attributes (propria) shows them either to be said of God when considering all of the attributes or to be only modes of attributes. For example, Spinoza claims that when statements such as “God is one,” “eternal,” and “immutable” are made of God, they are said “in consideration of all his attributes.” On the other hand, something like “omniscience” is only a mode of an attribute, since it is said of God only when he is conceived through, or considered under, the attribute of Thought. That is, only when God is thought of as a thinking thing can he be said to be omniscient. Similarly, when God is said to be “omnipresent,” it is only when he is conceived through Extension. In the Ethics, though, Spinoza does away with talk of propria and does not really accord them any status as such. 3. Conclusion With the collapse of the divide between created substances and the infinite substance, attributes play a new role for Spinoza; traditional divine attributes are eliminated, while attributes traditionally associated with created substances (Extension in particular) are attributed to the infinite substance. Furthermore, with the elimination of this divide and the establishment of the infinite substance as the only substance, Spinoza intends the attributes to account for variety in the substance without jeopardizing its unity. All interpreters and readers of Spinoza are forced to wrestle with making sense of this double role, since it sits at the very core of his metaphysics. It is vital to realize that this endeavor is necessarily and beautifully linked to other fundamental aspects of Spinoza’s metaphysics, such as the “real distinction” between the attributes, the proclaimed identity of the substance and its attributes, the nature of the conceiving intellect in the definition of ‘attribute’, the nature of this intellect’s conceptions (illusory or not), the number of attributes, the structure of 2P7 and its scholium, and finally the nature of the union of mind and body. These interconnections are a reflection of the fully systematic nature of Spinoza’s metaphysics.
