Second-order and Higher-order Logic
Second-order logic has a subtle role in the philosophy of mathematics. It is stronger than first order logic in that it incorporates “for all properties” into the syntax, while first order logic can only say “for all elements”. At the same time it is arguably weaker than set theory in that its quantifiers range over one limited domain at a time, while set theory has the universalist approach in that its quantifiers range over all possible domains.
This stronger-than-first-order-logic/weaker-than-set-theory duality is the source of lively debate, not least because set theory is usually construed as based on first order logic. How can second-order logic be at the same time stronger and weaker? To make the situation even more complex, it was suggested early on that in order that the strength of second-order logic can be exploited to its full extent and in order that “for all properties” can be given an exact interpretation, a first order set-theoretical background has to be assumed.
This seemed to undermine the claimed strength of second-order logic as well as its role as the primary foundation of mathematics. It also seemed to attach second-order logic to aspects of set theory which second-order logic might have wanted to bypass: the higher infinite, the independence results, and the difficulties in finding new convincing axioms. Setting aside philosophical questions, it is undeniable and manifested by a continued stream of interesting results, that second-order logic is part and parcel of a logician’s toolbox.
1. Introduction
Second-order logic[1] was introduced by Frege in his Begriffsschrift (1879) who also coined the term “second order” (“zweiter Ordnung”) in (1884: §53). It was widely used in logic until the 1930s, when set theory started to take over as a foundation of mathematics. It is difficult to say exactly why this happened, but set theory has certain simplicity in being based on one single binary predicate \(x\in y\), compared to second- and higher-order logics, including type theory. Discussion among philosophers of the merits of second-order logic especially in comparison to set theory continues to this day (Shapiro 1991). Meanwhile second-order logic has become a standard tool of computer science logic, finite model theory in particular. Central questions of theoretical computer science, such as the P = NP? question, can be seen as questions about second-order logic in the finite context.
Ultimately second-order logic is just another formal language. It can be given the standard treatment making syntax, semantics and proof theory exact by working in a mathematical metatheory which we choose to be set theory. It would be possible to use second-order logic itself as metatheory, but it would be more complicated, simply because second-order logic is less developed than set theory.
The distinction between first order logic and second-order logic is easiest to understand in elementary mathematics. Let us consider number theory. The objects of our study are the natural numbers 0, 1, 2,… and their arithmetic. With first order logic we can formulate statements about number theory by using atomic expressions \(x = y,\) \(x+y = z\) and \(x\times y = z\) combined with the propositional operations \(\land,\) \(\neg,\) \(\lor,\) \(\to\) and the quantifiers \(\forall x\) and \(\exists x\). Here the variables x, y, z,… are thought to range over the natural numbers. With second-order logic we can express more: we have variables x, y, z,… as in first order logic for numbers. In addition we have variables X, Y, Z,… for properties of numbers and relations between numbers as well as quantifiers \(\forall X\) and \(\exists X\) for these variables. We have the atomic expressions of first order logic and also new atomic expressions of the form \(X(y_1,\ldots, y_n)\).
An interesting and much studied question arises: what properties of natural numbers can be expressed in first order logic and what can be expressed in second-order logic? The question is equally interesting if “expressed” is replaced by “proved”, once a suitable proof system is given in both cases. We can replace natural numbers by some other mathematical context, for example real numbers, complex numbers, and so on.
With just the constant \(0\) and the unary function \(n \mapsto n^+\) (where \(n^+\) means \(n+1\)) we can express in second-order logic the Induction Axiom
of natural numbers. This, together with the axioms \(\forall x\,\neg(x^+=0)\) and \(\forall x\,\forall y(x^+=y^+\to x=y)\) characterizes the natural numbers with their successor operation up to isomorphism. In first order logic any theory which has a countably infinite model has also an uncountable model (by the Upward Löwenheim Skolem Theorem). Hence (\ref{ind}) cannot be expressed in first order logic. Another typical second-order expression is the Completeness Axiom of the linear order \(\le\) of the real numbers:
This, together with the axioms of ordered fields characterizes the ordered field of real numbers up to isomorphism. In first order logic any countable theory which has an infinite model has also a countable model (by the Downward Löwenheim Skolem Theorem). Hence (\ref{comp}) cannot be expressed in first order logic.
To early logicians it seemed natural to use second-order logic on a par with first order logic, as if there were not much difference. This included Russell, Löwenheim, Hilbert, and Zermelo. Essentially only Skolem emphasized the difference (see Moore 1988 for the emergence of first order logic). In 1929 Gödel proved his Completeness Theorem and the next year his Incompleteness Theorem. It became gradually obvious that second-order logic is very different from first order logic. One of the first indicators was the incompleteness phenomenon. Gödel showed that any effective axiomatization of number theory is incomplete. On the other hand, there was a simple finite categorical—hence complete (§10)—axiomatization of the structure \((\oN,\) \(+,\) \(\times)\) in second-order logic (see also the discussion related to (\ref{ind})). This showed that there cannot be such a complete axiomatization of second-order logic as there was for first order logic. What became known in the case of first order logic as Gödel’s Completeness Theorem simply cannot hold for second-order logic.
The situation changed somewhat when Henkin proved the Completeness Theorem for second-order logic with respect to so-called general models (§9). Now it became possible to use second-order logic in the same way as first order logic, if only one keeps in mind that the semantics is based on general models. The ability of second-order logic to characterize mathematical structures up to isomorphism does not extend to general models (except in the form of internal categoricity—see §10). Any second-order theory with an infinite general model has general models of all infinite cardinalities, and therefore cannot be categorical in the context of general models. We shall discuss general models and the related Henkin models in §9.
2. The Syntax of Second-Order Logic
A vocabulary in second-order logic is just as a vocabulary in first order logic, that is, a set L of relation, function and constant symbols. Each relation and function symbol has an arity, which is a positive natural number.
Second-order logic has several kinds of variables. It has individual variables denoted by lower case letters \(x, y, z, \ldots\) possibly with subscripts. It has property and relation variables denoted by upper case letters \(X,Y,Z,\ldots\) possibly with subscripts. Finally, it has function variables denoted by upper case letters \(F, G, H, \ldots\) possibly with subscripts. Sometimes function variables are omitted as, after all, functions are special kinds of relations. Each relation and function variable has an arity, which is a positive natural number.
It is noteworthy that although we have property variables we do not have variables for properties of properties. Such variables would be part of the formalism of third order logic, see §12.
The terms of second-order logic are defined recursively as follows: Constant symbols and individual variables are terms. If \(t_1,\ldots,t_n\) are terms, U is an n-ary function symbol and F is an n-ary function variable, then \(U(t_1,\ldots,t_n)\) and \(F(t_1,\ldots,t_n)\) are terms. Note that terms denote individuals, not relations or properties. Thus X alone is not a term but x is.
The atomic formulas of second-order logic are defined from terms as follows: If t and \(t'\) are terms, then \(t = t'\) is an atomic formula. If R is an n-ary relation symbol and \(t_1,\ldots,t_n\) are terms, then \(R(t_1,\ldots,t_n)\) is an atomic formula. If X is an n-ary relation variable, then also \(X(t_1,\ldots,t_n)\) is an atomic formula. The intuitive meaning of \(X(t_1,\ldots,t_n)\) is that the elements \(t_1,\ldots, t_n\) are in the relation X or are predicated by X. We do not allow atomic formulas of the form \(X = Y\), although they would have an obvious meaning. We achieve the same effect by using quantifiers.
It is interesting to note that in second order logic we can actually define the identity \(t=t'\) as \(\forall X(X(t)\leftrightarrow X(t'))\) and prove the familiar axioms of identity from properties of the implication.
An important special case is monadic second-order logic where no function variables are allowed and the relation variables are required to be monadic (a.k.a. unary), i.e., of arity one.
We did not take \(X=Y\) as an atomic formula (although we could have) but having introduced the quantifiers we can use
as a substitute for \(X=Y\). The advantage of taking \(X=Y\) as a basic atomic formula would be that using (\ref{XY}) brings along extra quantifiers. Sometimes it is interesting to minimize the number of quantifiers in a formula. Also, (\ref{XY}) gives the identity \(X=Y\) an extensional flavour in contrast to a possibly different intensional construal.
The concepts of a free and bound occurrence of a variable in a formula are defined in the usual way. A formula is called a sentence if it has no free variables. The concept of a term being free for a variable in a formula is defined as in first order logic.
5. The Infamous Power of Second-Order Logic
11. Logics Between First and Second Order
First order logic and second-order logic are in a sense two opposite extremes. There are many logics between them i.e., logics that extend properly first order logic, and are properly contained in second-order logic. One example is the extension of first order logic by the generalized quantifier known as the Henkin quantifier:
which has the meaning
The extension \(L(H)\) of first order logic by the Henkin quantifier is almost the same as second-order logic: they have the same \(\Delta\)-extension (Krynicki & Lachlan 1979).[19]
The equicardinality or Härtig quantifier
has the meaning “there are as many elements x satisfying \(\phi(x,z_1,\ldots,z_n)\) as there are elements y satisfying \(\psi(y,z_1,\ldots,z_n)\)”, i.e.,
The extension \(L(I)\) of first order logic by the Härtig quantifier is weaker[20] than second-order logic as its Löwenheim number can be \(<2^\omega\), but if \(V=L\), then it is \(\Delta\)-equivalent with second-order logic (Väänänen 1980).
Here are some other generalized quantifiers that are clearly second-order definable:
In weak second-order logic we have no function variables and the relation variables range over finite relations only. The resulting logic is in many ways similar to the extension of first order logic by the generalized quantifier \(Q_0\). Another way to limit the power of second-order logic is the following: Suppose \(\psi\) is a first order formula with an n-ary relation variable X and no non-logical symbols. For infinite A, we define \(\ma\models_s Q_{\psi}X\phi\) if and only if there is a relation \(P\subseteq M^n\) such that \(\ma\models_{s(P/X)}\phi\land\psi\). The quantifier \(Q_{\psi}\) allows second-order quantification over relations that satisfies \(\psi\). If \(\psi\) is \(\forall x(x=x)\) we obtain the usual second-order quantifier which we denote \(Q_{II}\). If moreover \( n=1\) we obtain the monadic second-order quantifier which we denote \(Q_{mon}\). If \(n=1\) and \(\psi\) says that X is a singleton, we obtain the usual first order quantifier which we now denote \(Q_I\). Finally, if \(\psi\) says X is the graph of a permutation of the model, we denote this quantifier by \(Q_{1-1}\). Rather surprisingly, it turns out that with the right concept of interpretability the quantifiers \(Q_I,Q_{mon},Q_{1-1}\) and \(Q_{II}\) are, up to biinterpretability, the only second-order quantifiers of the form \(Q_{\psi}\) (Shelah 1973).
13. Foundations of Mathematics
Mathematics can be based on set theory. This means that mathematical objects are construed as sets and their properties are derived from the axioms of set theory. The intuitive informal picture behind set theory is that there is a cumulative hierarchy \(V_\alpha\) (\(\alpha\) an ordinal) of sets and the axioms are intended to describe the basic properties of such sets. This kind of set theoretical foundation of mathematics has become widely accepted among working mathematicians. We refer to the SEP entry on set theory for more details. Alternatives to set theory are at least category theory, constructive mathematics, and second-order logic.
If one tries to follow as a logician the language mathematicians use in their research, one cannot help seeing that they use second-order concepts very freely. For example, the Induction Axiom is without doubt used by mathematicians in the second-order form depicted in (\ref{ind}). In their natural language mathematicians do not hesitate to use higher order concepts, and indeed do not always even (need to) know the difference.
Second-order logic as a foundation for mathematics focuses on mathematical structures rather than mathematical “objects”, as set theory does. In principle every structure has its own “foundation” based on the special features of the structure. The role of second-order logic is to provide a common framework for this. The intuitive informal picture behind second-order logic on a structure is that there is a domain A of individuals and then separate domains for all n-ary relations for all \(n\in\oN\) and for all n-ary functions for all \(n\in\oN\), associated with A. In comparison to the informal picture of set theory, we have only the domains of subsets, relations and functions, not iterations such as sets of sets, sets of sets of sets and so on, as in set theory. Therefore the intuitive picture behind second-order logic is simpler than the picture behind set theory. In particular, there is no transfinite iteration over all ordinals. Most structures \(\ma\) used in mathematics have a second-order characterization \(\theta_\ma\), see §10. Proving (second order) properties \(\phi\) of such structures means deriving \(\phi\) from \(\theta_\ma\). Opinions differ as to whether this derivation should be a syntactic (formal) derivation using some (incomplete) axiom system of second-order logic or whether it should be a semantic derivation (of the form “Every model of the assumptions is a model of the conclusion”). For a working mathematician there is not much difference and probably a semantic derivation is usually favored.
Suppose a mathematician wants to convince someone about the truth of Bolzano’s Theorem “If a continuous function on the reals has negative values as well as positive values inside an interval, then it has a root in that interval”. They would start with the ordered field of real numbers. It would be (thought to be) clear what this means because the second-order axiomatization of the field is categorical. Then they would take an arbitrary continuous function f on this field such that it has values of opposite sign inside a fixed interval \((a,b)\). It would be clear what this means because second-order logic has variables for functions. Let \(c,d\in (a,b)\) such that \(f(c)<0\) and \(f(d)>0\). Without loss of generality, \(c<d\). Let \(X=\{e\in(a,b) : f(e)<0\}\). Since we have relation variables for subsets of the domain, we can think of X simply as a value of such a relation variable. It is obvious that X exists, but a priori we could have imposed contradictory conditions on X (e.g., \(X=\{e : e\notin X\}\)) and then we should not be able to claim that it exists. However, in this case the Comprehension Axiom Schema implies that X exists. Clearly, \(X\ne\emptyset\) and X is bounded from above by d. One of the second-order axioms characterizing the field of real numbers is that every non-empty set which is bounded from above has a supremum. Thus we can let z to be the supremum of X. Now it follows from basic properties of continuous functions that \(f(z)=0\).
It would be difficult to tell in the above proof whether it was a syntactical derivation from the axioms or a semantic argument. On the surface it looks like a semantic argument. But every step can be derived from the axioms, so it could be considered a shorthand version of a syntactic derivation.
Syntactic derivations in second-order logic based on the Comprehension Axiom Schema and Axioms of Choice are very much like syntactic derivations in set theory. In neither case would a working mathematicians write all the details of the argument but would resort to shorthand notation. In both cases—second-order logic and set theory—we can interpret the proofs as shorthand versions of syntactic formal proofs or as semantic proofs. First order logic satisfies the Completeness Theorem with the help of which semantic proofs can be turned into syntactic formal proofs. Since the Completeness Theorem uses the concept of a first order structure, sentences proved from the axioms are true in all models of the axioms, both countable models and non-standard models. Are they true in the intuitive model of cumulative hierarchy of sets? By the Reflection Theorem such sentences are true in the class of all sets, but since the intuitive model is informal, we cannot prove that the class of all sets is the same as the intuitive model.
On the other hand, in order for the Completeness Theorem to be applicable, a semantic argument has to be such that it works whatever model of set theory we are using. This means that a semantic argument in set theory should not use properties of individual models of set theory unless they are universal first order properties of all models of ZFC. In practice this means that if a property such as \(CH\), \(\Diamond\), \(2^{\aleph_0}=\aleph_2\), and so on, is used, it is mentioned separately as an assumption. Non-first order properties of models cannot be used at all, or else the Completeness Theorem cannot be used. Still, sometimes non-first order properties are used. For example, we may have an argument that every countable model of ZFC satisfies \(\phi\). But then, by the Downward Löwenheim-Skolem Theorem, every model of ZFC satisfies \(\phi\). Thus we have been able to eliminate the non-first order assumption of the countability of the model.
Second-order logic satisfies the Completeness Theorem if Henkin models are used. Thus semantic arguments in second-order logic can be turned into syntactic formal proofs, but as with set theory the semantic argument has to be valid in all Henkin models, not only in all of the full Henkin models. Thus, as in set theory, the semantic argument cannot use properties of the Henkin models that are not shared by all Henkin models, In particular, fullness (see §9.1) cannot be assumed because in general models it is not a second-order property. What happens to categoricity when fullness is abandoned? We still have internal categoricity, so we can use categoricity in proofs as long as we make sure we work inside one model. One certain way to make sure of this is that we base everything on the axioms, the Comprehension Axiom Schema and the Axioms of Choice.
When second-order logic is used as a foundation for mathematics, a situation may emerge in which third order logic is needed. For example, a topology is usually defined as a family of sets and is therefore a third order object. It would be well in the spirit of second-order logic to simply include third (or higher) order logic when needed. Naturally, the Comprehension Axiom Schema and the Axioms of Choice have to be then assumed for third order quantifiers.
We have devoted a fair amount of space in this entry to the power of second-order logic to characterize structures (see §7.1). Putting it briefly, if a structure has mathematical interest, it is second-order characterizable. Structures constructed by means of the Axiom of Choice may escape being second-order characterizable. But what is the power of second-order logic to demonstrate the existence of structures? The axioms, including the Comprehension Axioms and the Axioms of Choice, can be satisfied in a one element domain. So we cannot prove from the axioms that there are any structures of any size \(>1\). This undermines Quine’s characterization of second-order logic as “set theory in sheep’s clothing” (1970, section heading in chapter 5), meant probably to suggest that the ontological commitments of second-order logic are on the same level as those of set theory.
Perhaps the philosophy of second-order logic is that if a structure can be characterized, it exists, echoing Hilbert’s formalist philosophy. However, even in that sense we need to know that the characterizing sentence \(\theta_\ma\) is consistent. A trivial approach is to observe that \(\ma\) itself certainly satisfies \(\theta_\ma\), but this clearly begs the question. Still, this is how Hilbert argues for the consistency of his axioms for the real numbers (1900). If \(\ma=(\oN,+,\cdot)\), an Axiom of Infinity saying that that a proper subset of the domain has the same cardinality as the entire domain, is what is needed. In combination with the Comprehension Axiom Schema the existence of \(\ma\) follows. For \((\oR,+,\cdot,0,1)\) this is not enough, we need a stronger Axiom of Infinity, for example the existence of a dense order which satisfies the Completeness Axiom: Every bounded non-empty set has a supremum. It seems inevitable that when we move from a structure to a bigger structure we need to make a largeness assumption. Such assumptions are called Large Domain Assumptions in Väänänen (2012). In set theory this aspect is taken care of at the outset by assuming the Power Set Axiom and the Replacement Axioms which together make sure that there are large enough sets. In second-order logic we have to assume new and new Large Domain Assumptions, as we move along. This can be seen as a drawback, having to make new assumptions all the time. On the other hand it can be seen as an asset, not having to make too ambitious assumptions before they are really needed.
15. Second-Order Set Theory
We have up to now treated set theory (ZFC) as a first order theory. However, when Zermelo (1930) introduced the axioms which constitute the modern ZFC axiom system, he formulated the axioms in second-order logic. In particular, his Separation Axiom is
and the Replacement Axiom is
Second-order ZFC, \(\ZFC^2\), is simply the received first order ZFC with the Separation Schema replaced by the above single Separation Axiom, and the Replacement Schema replaced by the above single Replacement Axiom. Accordingly, \(\ZFC^2\) is a finite axiom system. Zermelo proved that the models of \(\ZFC^2\) are, up to isomorphism, of the form \((V_\kappa,\in)\), where \(\kappa\) is (strongly) inaccessible \(>\omega\).
Kreisel (1967) has pointed out that second-order set theory in a sense decides the CH, i.e., decides whether it is true or not, even if we do not know which way the decision goes. More exactly \(\ZFC^2\models CH\) or \(\ZFC^2\models \neg CH\), because \(CH\) is true if and only if \(V_\kappa\models CH\) for inaccessible \(\kappa\), i.e., if and only if \(\ZFC^2\models CH\). Of course we can express CH in first order set theory, too, so the situation is not really different from first order set theory. Many set theorists think that the concept of set is definitive enough to decide eventually also CH even if ZFC does not decide it. Likewise, we may argue that the concept of second-order semantics is definitive enough to decide CH even if the current axioms of second-order logic cannot do it.
The question arises, why do we not use second-order set theory \(\ZFC^2\) as the metatheory of second-order logic, as recommended in Shapiro (1991). In fact we could use it. However, the question might rise, what is the semantics of our metatheory? In principle such questions can lead to an infinite regress. By using first order set theory as the metatheory, the question about the semantics of the metatheory would simply be the question about the semantics of first order logic. We have pointed out in §6 that semantics of first order logic is absolute relative to ZFC. This gives some assurance that we need not continue asking, what the metatheory is.