Chapter 3

Fundamental Considerations in Set

In this chapter we continue to pursue an understanding of sets. We begin by examining how to combine sets in various ways to get new sets. To that end, products and coproducts are introduced, and then more complex limits and colimits, with the aim of conveying a sense of their universal properties. The chapter ends with some additional interesting constructions in Set.

3.1   Products and coproducts

This section introduces two concepts that are likely to be familiar, although perhaps not by their category-theoretic names: product and coproduct. Each is an example of a large class of ideas that exist far beyond the realm of sets (see Section 6.1.1).

3.1.1   Products

Definition Let X and Y be sets. The product of X and Y, denoted X × Y, is defined as the set of ordered pairs (x, y), where xX and yY. Symbolically,

X×Y={ (x,y)|xX, yY }.

There are two natural projection functions, π1 : X × YX and π2: X × YY.


Example (Grid of dots). Let X = {1, 2, 3, 4, 5, 6} and Y = {♣, ♢, ♡, ♠}. Then we can draw X × Y as a 6 by 4 grid of dots, and the projections as projections


Application A traditional (Mendelian) way to predict the genotype of offspring based on the genotype of its parents is by the use of Punnett squares. If F is the set of possible genotypes for the female parent, and M is the set of possible genotypes of the male parent, then F × M is drawn as a square, called a Punnett square, in which every combination is drawn.


How many elements does the set {a, b, c, d} × {1, 2, 3} have?

Application Suppose we are conducting experiments about the mechanical properties of materials, as in Application For each material sample we will produce multiple data points in the set ⌜extension⌝ × ⌜force⌝ ≅ ℝ × ℝ.

Remark It is possible to take the product of more than two sets as well. For example, if A, B, and C are sets, then A × B × C is the set of triples

A×B×C{ (a,b,c)|aA,bB,cC }.

This kind of generality is useful in understanding multiple dimensions, e.g., what physicists mean by ten-dimensional space. It comes under the heading of limits (see Section 6.1.3).

Example Let ℝ be the set of real numbers. By ℝ2 we mean ℝ × ℝ. Similarly, for any n ∈ ℕ, we define ℝn to be the product of n copies of ℝ.

According to Penrose [35], Aristotle seems to have conceived of space as something like S ≔ ℝ3 and of time as something like T ≔ ℝ. Space-time, had he conceived of it, would probably have been S × T ≅ ℝ4. He, of course, did not have access to this kind of abstraction, which was probably due to Descartes. (The product X × Y is often called Cartesian product, in his honor.)


Let ℤ denote the set of integers, and let +: ℤ × ℤ → ℤ denote the addition function and ·: ℤ × ℤ → ℤ denote the multiplication function. Which of the following diagrams commute?

a. art

b. art

c. art   Universal property for products

A universal property is an abstract quality that characterizes a given construction. For example, the following proposition says that the product construction is characterized as possessing a certain quality.

Proposition (Universal property for product). Let X and Y be sets. For any set A and functions f : AX and g : AY, there exists a unique function AX × Y such that the following diagram commutes:


We say this function is induced by f and g, and we denote it

f,g:AX×Y,   where   f,g(a)=(f(a),g(a)).

That is, we have π1 ○ 〈f, g〉 = f and π2 ○ 〈f, g〉 = g, andf, gis the only function for which that is so.

Proof. Suppose given f, g as in the proposition statement. To provide a function : AX × Y is equivalent to providing an element (a) ∈ X × Y for each aA. We need such a function = 〈f, g〉, for which π1 ○ 〈f, g〉 = f and π2 ○ 〈f, g〉 = g. An element of X × Y is an ordered pair (x, y), and we can use 〈f, g〉(a) = (x, y) if and only if x = π1(x, y) = f(a) and y = π2(x, y) = g(a). So it is necessary and sufficient to define


for all aA.

Example (Grid of dots, continued). It is important that the reader sees the universal property for products as completely intuitive.

Recall that if X and Y are sets, say, of cardinalities |X| = m and |Y | = n respectively, then X × Y is an m × n grid of dots, and it comes with two canonical projections Xπ1X×Yπ2Y. These allow us to extract from every grid element zX × Y its column π1(z) ∈ X and its row π2(z) ∈ Y.

Suppose that each person in a classroom picks an element of X and an element of Y. Thus we have functions f : CX and g : CY. But is not picking a column and a row the same thing as picking an element in the grid? The two functions f and g induce a unique function CX × Y. How does this function CX × Y compare with the original functions f and g? The commutative diagram (3.2) sums up the connection.

Example Let ℝ be the set of real numbers, and let 0 ∈ ℝ be the origin. As in Notation, it is represented by a function z: {☺} → ℝ, with z(☺) = 0. Thus we can draw functions


The universal property for products guarantees a function 〈z, z〉: {☺} → ℝ × ℝ, which represents the origin in (0, 0) ∈ ℝ2.


For every set A there is some relationship between the following three sets:

HomSet(A,X),   HomSet(A,Y),   and   HomSet(A,X×Y).

What is it?

Hint: This problem is somewhat recursive in that you will use products in your formula.


a. Let X and Y be sets. Construct the swap map s: X × YY × X using only the universal property for products. If π1: X × YX, π2: X × YY, p1: Y × XY, and p2: Y × XX are the projection functions, write s in terms of the symbols π1, π2, p1, p2, ○, and 〈 , 〉.

b. Can you prove that s is an isomorphism using only the universal property for products?

Example Suppose given sets X, X′, Y, Y′ and functions m: XX′ and n: YY′. We can use the universal property for products to construct a function s: X × YX′ × Y′.

The universal property (Proposition says that to get a function from any set A to X′ × Y′, we need two functions, namely, some f : AX′ and some g : AY′. Here we want to use AX × Y.

What we have readily available are the two projections π1: X × YX and π2: X × YY. But we also have m: XX′ and n: YY′. Composing, we set fmπ1 and gnπ2.


The dotted arrow is often called the product of m: XX′ and n: YY′. Here it is denoted 〈f, g〉, but f and g were not given variables. Since writing 〈mπ1, nπ2〉 is clunky notation, we instead denote this function

m×n: X×YX×Y.   Ologging products

Given two objects c, d in an olog, there is a canonical label 〈〈c × d〉〉 for their product c × d, written in terms of the labels 〈〈c〉〉 and 〈〈d〉〉. Namely,

〈〈c×d〉〉a pair (x,y), where x is 〈〈c〉〉 and y is 〈〈d〉〉.

The projections cc × dd can be labeled “yields, as x,” and “yields, as y,” respectively.

Suppose that e is another object, and p: ec and q: ed are two arrows. By the universal property for products (Proposition, p and q induce a unique arrow ec × d, making the evident diagrams commute. This arrow can be labeled

yields, insofar as it 〈〈p〉〉〈〈c〉〉 and 〈〈q〉〉〈〈d〉〉,.

Example Every car owner owns at least one car, but there is no obvious function ⌜a car owner⌝ → ⌜a car⌝ because he or she may own more than one. One good choice would be the car that the person drives most often, which can be called his or her primary car. Also, given a person and a car, an economist could ask how much utility the person would get out of the car. From all this we can put together the following olog involving products:


The composite map OV tells us the utility a car owner gets out of their primary car.

3.1.2   Coproducts

We can characterize the coproduct of two sets with its own universal property.

Definition Let X and Y be sets. The coproduct of X and Y, denoted XY, is defined as the disjoint union of X and Y, i.e., the set for which an element is either an element of X or an element of Y. If something is an element of both X and Y, then we include both copies, and distinguish between them, in XY. See Example

There are two natural inclusion functions, i1: XXY and i2: YXY.


Example The coproduct of X ≔ {a, b, c, d} and Y ≔ {1, 2, 3} is

XY{ a,b,c,d,1,2,3 }.

The coproduct of X and itself is

XX{ a1,b1,c1,d1,a2,b2,c2,d2 }.

The names of the elements in XY are not so important. What is important are the inclusion maps i1, i2 from (3.3), which ensure that we know where each element of XY came from.

Example (Airplane seats).



Would you say that ⌜a phone⌝ is the coproduct of ⌜a cell phone⌝ and ⌜a landline phone⌝?

Example (Disjoint union of dots). Below, X and Y are sets, having six and four elements respectively, and XY is their coproduct, which has ten elements.

art   Universal property for coproducts

Proposition (Universal property for coproduct). Let X and Y be sets. For any set A and functions f : XA and g : YA, there exists a unique function XYA such that the following diagram commutes:


We say this function is induced by f and g, and we denote it1


That is, we have {fgi1=f and {fgi2=g , and {fg is the only function for which that is so.

Proof. Suppose given f, g as in the proposition statement. To provide a function : XYA is equivalent to providing an element f(m) ∈ A for each mXY. We need such a function ={fg such that {fgi1=f and {fgi2=g . But each element mXY is either of the form i1x or i2y and cannot be of both forms. So we assign

{fg (m)={f(x)  if m=i1x,g(y) if m=i2y,    (3.6)

This assignment is necessary and sufficient to make all relevant diagrams commute.


Any time behavior is determined by cases, there is a coproduct involved.


Let f : ℤ → ℕ be the function defined by

f(n)={nif n0,nif n<0.

a. What is the standard name for f?

b. In the terminology of Proposition, what are A, X, Y, and XY ?

Application (Piecewise defined curves). In science, curves are often defined or considered piecewise. For example, in testing the mechanical properties of a material, we might be interested in various regions of deformation, such as elastic, plastic, or post-fracture. These are three intervals on which the material displays different kinds of properties.

For real numbers ab ∈ ℝ, let [a, b] ≔ {x ∈ ℝ | axb} denote the closed interval. Given a function [a, b]f and a function [c, d]g, the universal property for coproducts implies that they extend uniquely to a function [a, b] ⊔ [c, d] → ℝ, which will appear as a piecewise defined curve,

{fg(x)={f(x)if x[a, b],g(x)if x[c, d].

Often we are given a curve on [a, b] and another on [b, c], where the two curves agree at the point b. This situation is described by pushouts, which are mild generalizations of coproducts (see Section 3.3.2).

Example (Airplane seats, continued). The universal property for coproducts says the following. Any time we have a function XA and a function YA, we get a unique function XYA. For example, every economy-class seat in an airplane and every first-class seat in an airplane is actually in a particular airplane. Every economy-class seat has a price, as does every first-class seat.


The universal property for coproducts formalizes the following intuitively obvious fact:

If we know how economy-class seats are priced and we know how first-class seats are priced, and if we know that every seat is either economy class or first class, then we automatically know how all seats are priced.

To say it another way (and using the other induced map),

If we keep track of which airplane every economy-class seat is in and we keep track of which airplane every first-class seat is in, and if we know that every seat is either economy class or first class, then we require no additional tracking for any airplane seat whatsoever.


Write the universal property for coproduct, in terms of a relationship between the following three sets:

HomSet(X,A),   HomSet(Y,A),   and   HomSet(XY,A).



To assign an A value to each element of XY, you can delegate responsibility: have one person assign an A value to each element of X, and have another person assign an A value to each element of Y. One function is equivalent to two.

Example In the following olog the types A and B are disjoint, so the coproduct C = AB is just the union.


Example In the following olog A and B are not disjoint, so care must be taken to differentiate common elements.


Since ducks can both swim and fly, each duck is found twice in C, once labeled “A”, a flyer, and once labeled “B”, a swimmer. The types A and B are kept disjoint in C, which justifies the name disjoint union.


Following Section, devise a naming system for coproducts, the inclusions, and the universal maps. Try it out by making an olog (involving coproducts) that discusses the idea that both a .wav file and an .mp3 file can be played on a modern computer. Be careful that your arrows are valid (see Section

3.2   Finite limits in Set

This section discusses limits of variously shaped diagrams of sets. This is made more precise in Section 6.1.3, which discusses arbitrary limits in arbitrary categories.

3.2.1   Pullbacks

Definition (Pullback). Suppose given the following diagram of sets and functions:


Its fiber product is the set

X×ZY{ (x,z,y)|f(x)=z=g(y) }.

There are obvious projections π1: X ×Z YX and π2 : X ×Z YY (e.g., π2(x, z, y) = y). The following diagram commutes:


Given the setup of diagram (3.8), we define a pullback of X and Y over Z to be any set W for which we have an isomorphism WX×ZY. The corner symbol ⌟ in diagram (3.9) indicates that X ×Z Y is a pullback.


Let X, Y, Z be as drawn and f : XZ and g : YZ the indicated functions.


What is the fiber product of the diagram XfZgY?


a. Draw a set X with five elements and a set Y with three elements. Color each element of X and each element of Y red, blue, or green,2 and do so in a random-looking way. Considering your coloring of X as a function XC, where C = {red, blue, green}, and similarly obtaining a function YC, draw the fiber product X ×C Y.

b. The universal property for products guarantees a function X ×C YX × Y, which will be an injection. This means that the drawing you made of the fiber product can be embedded into the 5 × 3 grid. Draw the grid and indicate this subset.


a. Let X = {1, 2, 3, 4, 5} and Y = {a, b, c}. The fiber product is shown in part (b).



Note that inside the set of X × Y = 15 possible (x, y) pairs is the set of pairs that agree on color—this is X ×C Y. The grid X × Y is not drawn, but it includes the drawn dots, X ×C YX × Y, as well as eight nondrawn dots such as (3, a), which “couldn’t agree on a color.”

Remark Some may prefer to denote the fiber product in (3.8) by f × Z g rather than X ×Z Y. The former is mathematically better notation, but human-readability is often enhanced by the latter, which is also more common in the literature. We use whichever is more convenient.


Let f : XZ and g : YZ be functions.

a. Suppose that Y = ∅; what can you say about X ×Z Y?

b. Suppose now that Y is any set but that Z has exactly one element; what can you say about X ×Z Y?


Let S = ℝ3, T = ℝ, and think of them as (Aristotelian) space and time, with the origin in S × T given by the center of mass of MIT at the time of its founding. Let Y = S × T, and let g1 : YS be one projection and g2 : YT the other projection. Let X = {☺} be a set with one element, and let f1 : XS and f2 : XT be given by the origin in both cases.

a. What are the fiber products W1 and W2:


b. Interpret these sets in terms of the center of mass of MIT at the time of its founding.   Using pullbacks to define new ideas from old

The fiber product of a diagram can serve to define a new concept. For example, olog (3.13) defines what it means for a cell phone to have a bad battery, in terms of the length of time for which it remains charged. Being explicit reduces the chance of misunderstandings between different groups of people. This can be useful in situations like audits and those in which one is trying to reuse or understand data gathered by others.

Example Consider the following two ologs. The one on the right is the pullback of the one on the left.


Check from Definition that the label “a customer that is wealthy and loyal” is fair and straightforward as a label for the fiber product A = B ×D C, given the labels on B, C, and D.

Remark Note that in diagram (3.11) the upper left box in the pullback could have been (noncanonically named) ⌜a good customer⌝. If it were taken to be the fiber product, then the author would be effectively defining a good customer to be one that is wealthy and loyal.


For each of the following, an author has proposed that the right-hand diagram is a pullback. Do you think their labels are appropriate or misleading; that is, is the label in the upper left box of the pullback reasonable given the rest of the olog, or is it suspect in some way?








Consider your olog from Exercise Are any of the commutative squares in it actually pullback squares?

Definition (Preimage). Let f : XY be a function and yY an element. The preimage of y under f, denoted f−1(y), is the subset f−1(y) ≔ {xX | f(x) = y}. If Y′ ⊆ Y is any subset, the preimage of Yunder f, denoted f−1(Y′), is the subset f−1(Y′) = {xX | f(x) ∈ Y′}.


Let f : XY be a function and yY an element. Draw a pullback diagram in which the fiber product is isomorphic to the preimage f−1(y).


Consider the function f : ℕ → ℕ, where f(n) = n + 3. Let A = {i ∈ ℕ | i ⩾ 7}, and let g : A → ℕ be the inclusion, e.g., g(17) = 17. What is the pullback of the following diagram?


Proposition (Universal property for pullback). Suppose given the diagram of sets and functions as below:


For any set A and the following commutative solid-arrow diagram (i.e., functions f : AX and g : AY such that tf = ug), there is a unique function AX ×Z Y such that the diagram commutes:



a. Create an olog whose underlying shape is a commutative square. Now add the fiber product so that the shape is the same as that of diagram (3.12).

b. Use your result to devise English labels to the object X ×Z Y, to the projections π1, π2, and to the dotted map Af,gZX×ZY, such that these labels are as canonical as possible.   Pasting diagrams for pullback

Consider the following diagram, which includes a left-hand square, a right-hand square, and a big rectangle:


The right-hand square has a corner symbol indicating that B′ ≅ B ×C C′ is a pullback. But the corner symbol in the leftmost corner is ambiguous; it might be indicating that the left-hand square is a pullback, or it might be indicating that the big rectangle is a pullback. It turns out not to be ambiguous because the left-hand square is a pullback if and only if the big rectangle is. This is the content of the following proposition.

Proposition Consider the diagram:


where B′ ≅ B ×C Cis a pullback. Then there is an isomorphism A ×B B′ ≅ A ×C C′. In other words, there is an isomorphism


Proof. We first provide a map ϕ: A×B(B×CC′) → A×CC′. An element of A×B(B×CC′) is of the form (a, b, (b, c, c′)) such that f(a) = b, g(b) = c and k(c′) = c. But this implies that gf(a) = c = k(c′) so we put ϕ(a, b, (b, c, c′)) ≔ (a, c, c′) ∈ A ×C C′. Now we provide a proposed inverse, ψ : A ×C C′ → A ×B (B ×C C′). Given (a, c, c′) with gf(a) = c = k(c′), let b = f(a) and note that (b, c, c′) is an element of B ×C C′. So we can define ψ(a, c, c′) = (a, b, (b, c, c′)). It is easy to see that ϕ and ψ are inverse.

Proposition can be useful in authoring ologs. For example, the type ⌜a cell phone that has a bad battery⌝ is vague, but we can lay out precisely what it means using pullbacks:


The category-theoretic fact described here says that since AB ×D C and CD ×F E, it follows that AB ×F E. That is, we can deduce the definition “a cell phone that has a bad battery is defined as a cell phone that has a battery which remains charged for less than one hour.”


a. Create an olog that defines two people to be “of approximately the same height” if and only if their height difference is less than half an inch, using a pullback. Your olog can include the box ⌜a real number x such that −.5 < x < .5⌝.

b. In the same olog, use pullbacks to make a box for those people whose height is approximately the same as a person named “Mary Quite Contrary.”


Consider the following diagrams. In the left-hand one, both squares commute.


Let W = X ×Z Y and W′ = X′ ×Z Y′ be fiber products, and form the right-hand diagram. Use the universal property for fiber products to construct a map WW′ such that all squares commute.


We redraw the right-hand diagram, with arrows labeled and a new dotted arrow:


The commutativity of the right, back, and bottom squares can be written equationally as


Therefore, the universal property for pullbacks ( allows us to form the desired map WW′ as 〈ga, ebZ

3.2.2   Spans, experiments, and matrices

Definition Given sets A and B, a span on A and B is a set R together with functions f : RA and g : RB.


Application Think of A and B as observables and R as a set of experiments performed on these two variables.

For example, let’s rename variables and say that T is the set of possible temperatures of a gas in a fixed container and that P is the set of possible pressures of the gas, so we have the span TfEgP. We perform 1,000 experiments in which we change and record the temperature, and we simultaneously record the pressure. The results might look like this:

Experiment E




1 100 72
2 100 73
3 100 72
4 200 140
5 200 138
6 200 141

Definition Let A, B, and C be sets, and let AfRgB and BfRgC be spans. Their composite span is given by the fiber product R ×B R′ as in this diagram:


Application Let’s look back at the lab’s experiment in Application, which resulted in a span TfEgP. Suppose we notice that something looks a little wrong. The pressure should be linear with the temperature but it does not appear to be. We hypothesize that the volume of the container is increasing under pressure. We look up this container online and see that experiments have been done to measure the volume as the interior pressure changes. That data gives us a span PfEgV.

The composite of our lab’s span with the online data span yields a span TE″ → V, where E″ ≔ E ×P E′. What information does this span give us? In explaining it, one might say, “whenever an experiment e in our lab yielded the same pressure as the online experiment e′ recorded, we called that a data point e″. Every data point has an associated temperature (from our lab) and an associated volume (from the online experiment). This is the best we can do.”

The information we get this way might be seen by some as unscientific, but it certainly is the kind of information people use in business and in everyday life calculation—we get data from multiple sources and put it together. Moreover, it is scientific in the sense that it is reproducible. The way we obtained our T-V data is completely transparent.

We can relate spans to matrices of natural numbers, and see a natural categorification of matrix addition and matrix multiplication. If the spans come from experiments, as in Applications and, the matrices will look like huge but sparse matrices. Let’s go through that.

Let A and B be sets, and let ARB be a span. By the universal property for products, we have a unique map RpA×B.

We make a matrix of natural numbers out of this data as follows. The set of rows is A, the set of columns is B. For elements aA and bB, the (a, b) entry is the cardinality of its preimage, |p−1(a, b)|, i.e., the number of elements in R that are sent by p to (a, b).

Suppose we are given two (A, B) spans, i.e., ARB and AR′ → B; we might think of these has having the same dimensions, i.e., they are both |A| × |B| matrices. We can take the disjoint union RR′, and by the universal property for coproducts we have a unique span ARR′ → B making the requisite diagram commute.3 The matrix corresponding to this new span will be the sum of the matrices corresponding to the two previous spans out of which it was made.

Given a span ARB and a span BSC, the composite span can be formed as in Definition It will correspond to the usual multiplication of an |A| × |B| matrix by a |B| × |C| matrix, resulting in a |A| × |C| matrix.


Let A = {1, 2} and B = {1, 2, 3}, and consider the span AfRgB given by the table



f : A

g : B

1 1 2
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 2 3
8 2 3

So R = 8.

a. What is the matrix corresponding to this span?

b. If R′ ⊆ A×B is a subset, with corresponding span AfRgB given by projections, what can you say about the numbers in the corresponding matrix?

Construction Given a span AfRgB, one can draw a bipartite graph with each element of A drawn as a dot on the left, each element of B drawn as a dot on the right, and each element rR drawn as an arrow connecting vertex f(r) on the left to vertex g(r) on the right.


a. Draw the bipartite graph (as in Construction corresponding to the span TfEgP in Application (assuming the ellipses are vacuous, i.e., assuming that |E| = 6).

b. Now make up your own span PfEgV (with |E′| ⩾ 2), and write it out in database form as in Application and in bipartite graph form.

c. Draw the composite span TE ×P E′ → V as a bipartite graph.

d. Describe in words how the composite span graph (for TE ×P E′ → V ) relates to the graphs of its factors (TEP and PE′ → V ).

3.2.3   Equalizers and terminal objects

Definition Suppose given two parallel arrows


The equalizer of f and g is the set

Eq(f,g){ xX|f(x)=g(x)}

which is a subset of X. Writing p: Eq(f, g) → X for the inclusion, we have a commutative diagram


with fp = gp.

Example Suppose one has designed an experiment to test a theoretical prediction. The question is, When does the theory match the experiment? The answer is given by the equalizer of the following diagram:


The equalizer is the set of all inputs for which the theory and the experiment yield the same output.


Create an olog that uses equalizers in a reasonably interesting way. Alternatively, use an equalizer to specify those published authors who have published exactly one paper. Hint: Find a function from authors to papers; then find another.


Find a universal property enjoyed by the equalizer of two arrows f : XY and g : XY, and present it in the style of Propositions,, and


a. A terminal set is a set S such that for every set X, there exists a unique function XS. Find a terminal set.

b. Do you think that the notion terminal set belongs here in Section 3.2, i.e., in the same world as products, pullbacks, and equalizers? Why? Another way to ask this is, If products, pullbacks, and equalizers are all limits, then what do limits have in common?


a. Let S = {☺}. Then S is a terminal set. So is S = {43}. This was the content of Exercise, part (a).

b. The notion of a terminal set does fit well into Section 3.2 because it has a similar kind of universal property. Namely, for any other set S′ that might fill the position of S, there is a unique map S′ → S. See Section 6.1.3.

3.3   Finite colimits in Set

This section parallels Section 3.2. I introduce several types of finite colimits to give the reader some intuition about them without formally defining colimits yet.

3.3.1   Background: equivalence relations

Definition (Equivalence relations and equivalence classes). Let X be a set, and consider the product X × X, as in Definition An equivalence relation on X is a subset RX × X satisfying the following properties for all x, y, zX:

Reflexivity: (x, x) ∈ R;

Symmetry: (x, y) ∈ R if and only if (y, x) ∈ R;

Transitivity: If (x, y) ∈ R and (y, z) ∈ R, then (x, z) ∈ R.

If R is an equivalence relation, we often write xR y, or simply xy, to mean (x, y) ∈ R. For convenience we may refer to the equivalence relation by the symbol ∼, saying that ∼ is an equivalence relation on X.

An equivalence class of ∼ is a subset AX such that

  • A is nonempty, A ≠ ∅;
  • if xA, then yA if and only if xy.

Suppose that ∼ is an equivalence relation on X. The quotient of X by ∼, denoted X/ ∼, is the set of equivalence classes of ∼. By definition, for any element xX, there is exactly one equivalence class A such that xA. Thus we can define a function Q: XX/∼, called the quotient function, sending each element xX to the equivalence class containing it. Note that for any yX/∼, there is some xX with Q(x) = y; we call x a representative of the equivalence class y.

Example Let ℤ denote the set of integers. Define a relation R ⊆ ℤ × ℤ by

R={(x,y)|n such that x+7n=y}.

Then R is an equivalence relation because x + 7 * 0 = x (reflexivity); x + 7 * n = y if and only if y + 7 * (−n) = x (symmetry); and x + 7n = y and y + 7m = z together imply that x + 7(m + n) = z (transitivity).

An example equivalence class A ⊆ ℤ for this relation is A = {…, −12, −5, 2, 9, …}.


Let X be the set of people on earth. Define a binary relation RX × X on X as follows. For a pair (x, y) of people, put (x, y) ∈ R if x cares what happens to y. Justify your answers to the following questions:

a. Is this relation reflexive?

b. Is it symmetric?

c. Is it transitive?

d. What if “cares what happens to” is replaced with “has shaken hands with”. Is this relation reflexive, symmetric, transitive?

Example (Partitions). An equivalence relation on a set X can be thought of as a way of partitioning X. A partition of X consists of a set I, called the set of parts, and for every element iI, the selection of a subset XiX such that two properties hold:

  • Every element xX is in some part (i.e., for all xX, there exists iI such that xXi).
  • No element can be found in two different parts (i.e., if xXi and xXj, then i = j).

Given a partition of X, we define an equivalence relation ∼ on X by putting xx′ if x and x′ are in the same part (i.e., if there exists iI such that x, x′ ∈ Xi). The parts become the equivalence classes of this relation. Conversely, given an equivalence relation, one makes a partition on X by taking I to be the set of equivalence classes and, for each iI, letting Xi be the elements in that equivalence class.


Let X and B be sets, and let f : XB be a function. Define a subset RfX × X by


a. Let f : ℝ → ℝ be given by the cosine function, f(x) = cos(x), and let Rf ⊆ ℝ × ℝ be the relation as defined. Find x, y ∈ ℝ such that xy, but (x, y) ∈ Rf.

b. Is Rf an equivalence relation, for any f?

c. Are all equivalence relations on X obtainable in this way (as Rf for some function having domain X)?

d. Does this viewpoint on equivalence classes relate to that of Example


Take a set I of sets. That is, suppose I is a set and that for each element iI, you are given a set Xi. For every two elements i, jI, say that ij if Xi and Xj are isomorphic. Is this relation an equivalence relation on I?

Any relation can be enlarged to an equivalence relation with minimal alteration.

Proposition (Generating equivalence relations). Let X be a set and RX × X any subset. There exists a relation SX × X such that

  • S is an equivalence relation;
  • RS;
  • for any equivalence relation Ssuch that RS′, we have SS′.

The relation Sis called the equivalence relation generated by R.

Proof. Let LR be the set of all equivalence relations on X that contain R. In other words, each element LR is an equivalence relation, so we have RX × X. The set LR is nonempty because X × XX × X is an equivalence relation containing R. Let S denote the set of pairs (x1, x2) ∈ X × X that appear in every element of LR, that is, S=LR. Note that RS by definition. We need only show that S is an equivalence relation.

Clearly, S is reflexive, because each LR is. If (x, y) ∈ S, then (x, y) ∈ for all LR. But since each is an equivalence relation, (y, x) ∈ too, so (y, x) ∈ S. This shows that S is symmetric. The proof that it is transitive is similar: if (x, y) ∈ S and (y, z) ∈ S, then they are both in each , which puts (x, z) in each , which puts it in S.


Consider the set ℝ of real numbers. Draw the coordinate plane ℝ × ℝ, and give it coordinates x and y. A binary relation on ℝ is a subset S ⊆ ℝ × ℝ, which can be graphed as a set of points in the (x, y) plane.

a. Draw the relation {(x, y) | y = x2}.

b. Draw the relation {(x, y) | yx2}.

c. Let S0 be the equivalence relation on ℝ generated (in the sense of Proposition by the empty set. Draw S0 as a subset of the plane.

d. Consider the equivalence relation S1 generated by {(1, 2), (1, 3)}. Draw S1 in the plane. Highlight the equivalence class containing (1, 2).

e. The reflexivity property and the symmetry property (from Definition have pleasing visualizations in ℝ × ℝ; what are they?

f. Can you think of a heuristic for visualizing the transitivity property?


Let X be a set, and consider the empty relation R = ∅ ⊆ X × X.

a. What is the equivalence relation ∼ generated by R (called the trivial equivalence relation on X)?

b. Is the quotient function XX/∼ always an isomorphism?


a. It is the smallest reflexive relation R ≔ {(x, x) | xX}.

b. Yes. We have xy if and only if x = y, so each equivalence class contains precisely one element.


Consider the binary relation R = {(n, n + 1) | n ∈ ℤ} ⊆ ℤ × ℤ.

a. What is the equivalence relation ∼ generated by R?

b. How many equivalence classes are there?


Suppose N is a network (system of nodes and edges). Let X be the nodes of the network, and let RX × X denote the relation such that (x, y) ∈ R iff there exists an edge connecting x to y.4

a. What is the equivalence relation ∼ generated by R?

b. What is the quotient X/∼?


a. Node x is equivalent to node y if and only if one can get from x to y by moving along some finite number of edges (including no edges if x = y). In other words, if nodes are street addresses in a city, and each edge is like a street, then two addresses are equivalent if a pedestrian can get from one to the other.

b. It is called the set of “connected components” of the network. Think of a connected component as an island within the network. A pedestrian can get from everywhere on the island to everywhere else on the island but cannot get off the island.

Remark Let X be a set and RX × X a relation. The proof of Proposition has the benefit of working even if |X| ⩾ ∞, but it has the cost that it is not very intuitive nor useful in practice when X is finite. The intuitive way to think about the idea of equivalence relation generated by R is as follows:

  1. First add to R what is demanded by reflexivity, R1R ∪ {(x, x) | xX}.
  2. To the result, add what is demanded by symmetry, R2R1 ∪{(x, y) | (y, x) ∈ R1}.
  3. Finally, to the result, add what is demanded by transitivity,

    S=R2{(x,z)|(x,y)R2,and (y,z)R2}.

Then S is an equivalence relation, the smallest one containing R.

3.3.2   Pushouts

Equivalence relations are used to define pushouts.

Definition (Pushout). Suppose given the following diagram of sets and functions:


Its fiber sum, denoted XW Y, is defined as the quotient of XWY by the equivalence relation ∼ generated by wf(w) and wg(w) for all wW.

XWY(XWY)/,  where wW, wf(w),and wg(w).

There are obvious functions i1 : XXW Y and i2 : YXW Y, called inclusions.5 The following diagram commutes:


Given the setup of diagram (3.15), we define a pushout of X and Y over W to be any set Z for which we have an isomorphism XW YZ. The corner symbol ⌜ in diagram (3.16) indicates that XW Y is a pushout.

Example Let X = {x ∈ ℝ | 0 ⩽ x ⩽ 1} be the set of numbers between 0 and 1, inclusive, and let Y = {y ∈ ℝ | 1 ⩽ y ⩽ 2} be the set of numbers between 1 and 2, inclusive. We can form XY, but it has two copies of 1. This is weird, so we use pushouts; let W = {1}. Then the pushout XfWgY, where f and g are the inclusions (1 ↦ 1), is XW Y ≅ {z ∈ ℝ | 0 ⩽ z ⩽ 2}, as desired.


Example (Pushout). In ologs (3.17) and (3.18) right-hand diagram is a pushout of the left-hand diagram. The new object, D, is the union of B and C, but instances of A are equated to their B and C aspects.


In diagram (3.17), the two arrows in the left-hand olog are inclusions: its author considers every cell in the shoulder to be both part of the arm and part of the torso. The pushout is then the union. In olog (3.17), the shoulder is seen as part of the arm and part of the torso. When taking the union of these two parts, we do not want to double-count cells in the shoulder (as would be done in the coproduct BC; see Example Thus we create a new type A for cells in the shoulder, which are considered the same whether viewed as cells in the arm or cells in the torso. In general, if one wishes to take two things and glue them together, with A as the glue and B and C as the two things to be glued, the result is the pushout BA C. (A nice image of this can be seen in the setting of topological spaces; see Example


In olog (3.18), if every mathematics course is simply “too hard,” then when reading off a list of courses, each math course may either be read aloud or simply be read as “too hard.” To form D we begin by taking the union of B and C, and then we consider everything in A to be the same whether one looks at it as a course or as the phrase “too hard.” The math courses are all blurred together as one thing. Thus we see that the power to equate different things can be exercised with pushouts.


Let W, X, Y be as drawn and f : WX and g : WY the indicated functions.


The pushout of the diagram XfWgY is a set P. Write the cardinality |P | of P (see Definition


Suppose that W = ∅; what can you say about XW Z?


Let W ≔ ℕ = {0, 1, 2, …} denote the set of natural numbers, let X = ℤ denote the set of integers, and let Y = {☺} denote a one-element set. Define f : WX by f(w) = −(w + 1), and define g : WY to be the unique map. Describe the set XW Y.


Let i: RX × X be an equivalence relation (see Example for notation). Composing with the projections π1, π2 : X × XX, we have two maps, π1i, : RX and π2i: RX.

a. Consider the pushout XR X of the diagram


How should one think about XR X? That is, before we defined pushouts, we went through some work to define something we can now call XR X—what was it?

b. If i: RX × X is not assumed to be an equivalence relation, we can still define this pushout. Is there a relationship between the pushout Xπ1iRπ2iX and the equivalence relation generated by RX × X?

Proposition (Universal property for pushout). Suppose given the following diagram of sets and functions:


The pushout, XW Y together with the inclusions i1 and i2, satisfies the following property. For any set A and commutative solid arrow diagram (i.e., functions f : XA and g : YA such that ft = gu),


there exists a unique arrow w{ fg : XW YA making everything commute,

f=w{ fgi1  and  g=w{ fgi2.

3.3.3   Other finite colimits

Definition (Coequalizer). Suppose given two parallel arrows


The coequalizer of f and g is the commutative diagram


where we define

Coeq(f,g)Y/, where f(x)g(x) for all xX,

and q is the quotient function q : YY /∼.


Let X = ℝ be the set of real numbers. What is the coequalizer of the two maps XX given by xx and x ↦ (x + 1) respectively?


Find a universal property enjoyed by the coequalizer of two arrows.


An initial set is a set S such that for every set A, there exists a unique function SA.

a. Find an initial set.

b. Do you think that the notion initial set belongs here in Section 3.3, i.e., in the same world as coproducts, pushouts, and coequalizers? Why? Another way to ask this is, If coproducts, pushouts, and coequalizers are all colimits, what do colimits have in common?


a. Let S = ∅. Then S is the initial set. This was the content of Exercise part (b).

b. The notion of an initial set does fit well into Section 3.3 because it has a similar kind of universal property. Namely, for any other set S′ that might fill the position of S, there is a unique map SS′. See Section 6.1.3.

3.4   Other notions in Set

This section discusses some additional notions in the category Set.

3.4.1   Retractions

Definition Suppose given a function f : XY and a function g : YX such that gf = idX. In this case we call f a retract section and we call g a retract projection.


Create an olog that includes sets X and Y and functions f : XY and g : YX such that gf = idX, but such that fg ≠ idY ; that is, such that f is a retract section but not an isomorphism.

3.4.2   Currying

Currying is the idea that when a function takes many inputs, we can input them one at a time or all at once. For example, consider the function that takes a material M and an extension E and returns the force transmitted through material M when it is pulled to extension E. This is a function e: ⌜a material⌝ × ⌜an extension⌝ → ⌜a force⌝. This function takes two inputs at once, but it is convenient to curry the second input. Recall that HomSet(⌜an extension⌝, ⌜a force⌝) is the set of theoretical force-extension curves. Currying transforms e into a function

e:a materialHomSet(an extension,a force).

This is a more convenient way to package the same information: each material M has a force-extension curve e′(M). This will be made precise in Proposition

Notation Let A and B be sets. We sometimes denote by BA the set of functions from A to B,



For a finite set A, let |A| ∈ ℕ denote the cardinality of (number of elements in) A. If A and B are both finite (including the possibility that one or both are empty), is it always true that |BA| = |B||A|?

Proposition (Currying). Let A denote a set. For any sets X, Y there is a bijection


Proof. Suppose given f : X × AY. Define ϕ(f): XY A as follows: for any xX, let ϕ(f)(x): AY be defined as follows: for any aA, let ϕ(f)(x)(a) ≔ f(x, a).

We now construct the inverse, ψ : HomSet(X, YA) → HomSet(X × A, Y). Suppose given g : XY A. Define ψ(g): X × AY as follows: for any pair (x, a) ∈ X × A let ψ(g)(x, a) ≔ g(x)(a).

Then for any f ∈ HomSet(X × A, Y), we have ψϕ(f)(x, a) = ϕ(f)(x)(a) = f(x, a), and for any g ∈ HomSet(X, YA), we have ϕψ(g)(x)(a) = ψ(g)(x, a) = g(x)(a). Thus we see that ϕ is an isomorphism as desired.


Let X = {1, 2}, A = {a, b}, and Y = {x, y}.

a. Write three distinct elements of L ≔ HomSet(X × A, Y).

b. Write all the elements of M ≔ HomSet(A, Y).

c. For each of the three elements L you chose in part (a), write the corresponding function ϕ(): XM guaranteed by Proposition


Let A and B be sets. We defined BA ≔ HomSet(A, B), so we can write the identity function as idBA : HomSet(A, B) → BA. Proposition, make the substitutions X = HomSet(A, B), Y = B, and A = A. Consider the function


obtained as the inverse of (3.22). We have a canonical element idBA in the domain of ϕ−1. We can apply the function ϕ−1 and obtain an element ev = ϕ−1(idBA) ∈ HomSet(HomSet(A, B) × A, B), which is itself a function,


a. Describe the function ev in terms of how it operates on elements in its domain.

b. Why might one be tempted to denote this function ev?


a. An element in HomSet(A, B) × A is a pair (f, a), where f : AB is a function and aA is an element. Applying ev to (f, a) returns f(a), an element of B as desired.

b. One might be tempted because they are the first two letters of the word evaluate—we evaluate the function f on the input a.

If n ∈ ℕ is a natural number, recall from (2.4) that there is a set n = {1, 2, …, n}. If A is a set, we often make the abbreviation



Example said that ℝ2 is an abbreviation for ℝ × ℝ, but (3.24) says that ℝ2 is an abbreviation for ℝ2 = HomSet(2, ℝ). Use Exercise, Exercise, and the fact that 1+1=2, to prove that these are isomorphic, ℝ2 ≅ ℝ × ℝ.

(The answer to Exercise was A = {☺}; i.e., HomSet({☺}, X) ≅ X for all X. The answer to Exercise was HomSet(XY, A) HomSet(X, A) × HomSet(Y, A).)

3.4.3   Arithmetic of sets

Proposition summarizes some properties of products, coproducts, and exponentials, and shows them all in a familiar light, namely, that of elementary school arithmetic. In fact, one can think of the natural numbers as literally being the isomorphism classes of finite sets—that is what they are used for in counting.

Consider the standard procedure for counting the elements of a set S, say, cows in a field. One points to an element in S and simultaneously says “one”, points to another element in S and simultaneously says “two”, and so on until finished. By pointing at a cow as you speak a number, you are drawing an imaginary line between the number and the cow. In other words, this procedure amounts to nothing more than creating an isomorphism (one-to-one mapping) between S and some set {1, 2, 3, …, n}.

Again, the natural numbers are the isomorphism classes of finite sets. Their behavior, i.e., the arithmetic of natural numbers, reflects the behavior of sets. For example, the fact that multiplication distributes over addition is a fact about grids of dots, as in Example The following proposition lays out such arithmetic properties of sets.

This proposition denotes the coproduct of two sets A and B by the notation A + B rather than AB. It is a reasonable notation in general, and one that is often used.

Proposition The following isomorphisms exist for any sets A, B, and C (except for one caveat; see Exercise

  • A + 0A
  • A + BB + A
  • (A + B) + CA + (B + C)
  • A × 00
  • A × 1A
  • A × BB × A
  • (A × B) × CA × (B × C)
  • A × (B + C) ≅ (A × B) + (A × C)
  • A01
  • A1A
  • 0A0
  • 1A1
  • AB + CAB × AC
  • (AB)CAB×C
  • (A × B)CAC × BC


Everything in Proposition is true except in one case, namely, that of


In this case we get conflicting answers, because for any set A, including A = ∅ = 0, we have claimed both that A01 and that 0A0.

What is the correct answer for 00, based on the definitions of 0 and 1, given in (2.4), and of AB, given in (3.21)?


HomSet(∅, ∅) has one element, so 001.


It is also true of natural numbers that if a, b ∈ ℕ and ab = 0, then either a = 0 or b = 0. Is the analogous statement true of all sets?

Proposition is in some sense about isomorphisms. It says that understanding isomorphisms of finite sets reduces to understanding natural numbers. But note that there is much more going on in Set than isomorphisms; in particular, there are functions that are not invertible.

In grade school you probably never saw anything that looked like this:


And yet in Exercise we found a function ev : BA × AB that exists for any sets A, B. This function ev is not an isomorphism, so it somehow does not show up as an equation of natural numbers. But it still has important meaning.6 In terms of mere number, it looks like we are being told of an important function 5755, which is bizarre. The issue here is precisely the one confronted in Exercise


Explain why there is a canonical function 53 × 35, but not a canonical function 5755.


It is true that a set is isomorphic to any other set with the same number of elements, but do not be fooled into thinking that the study of sets reduces to the study of numbers. Functions that are not isomorphisms cannot be captured within the framework of numbers.

3.4.4   Subobjects and characteristic functions

Definition For any set B, define the power-set of B, denoted ℙ(B), to be the set of subsets of B.


a. How many elements does ℙ(∅) have?

b. How many elements does ℙ({☺}) have?

c. How many elements does ℙ({1, 2, 3, 4, 5, 6}) have?

d. Why it be named “power-set”?


a. |ℙ(∅)| = 1.

b. |ℙ({☺})| = 2.

c. |ℙ({1, 2, 3, 4, 5, 6})| = 64.

d. For any finite set X, we find that |ℙ(X)| = 2|X|, i.e., 2 to the power |X|.   Simplicial complexes

Definition Let V be a set, let ℙ(V) be its power-set. Since each element x ∈ ℙ(V) is a subset xU, we can make sense of the expression xx′ for x, x′ ∈ ℙ(V). A subset X ⊆ ℙ(V) is called downward-closed if for every uX and every u′ ⊆ u, we have u′ ∈ X. We say that X contains all atoms if for every vV, the singleton set {v} is an element of X.

A simplicial complex is a pair (V, X), where V is a set and X ⊆ ℙ(V) is a downward-closed subset that contains all atoms. The elements of X are called simplices (singular: simplex). Any subset uV has a cardinality |u|, so we have a function X → ℕ sending each simplex to its cardinality. The set of simplices with cardinality n + 1 is denoted Xn, and each element xXn is called an n-simplex.7 Since X contains all atoms (subsets of cardinality 1), we have an isomorphism X0V, and we may also call the 0-simplices vertices. We sometimes call the 1-simplices edges.8

Since X0V, a simplicial complex (V, X) may simply be denoted X.

Example Let n ∈ ℕ be a natural number, and let V = n + 1. Define the n-simplex, denoted ∆n, to be the simplicial complex ℙ(V) ⊆ ℙ(V), i.e., the whole power-set, which indeed is downward-closed and contains all atoms.

We can draw a simplicial complex X by first putting all the vertices on the page as dots. Then for every xX1, we see that x = {v, v′} consists of two vertices, and we draw an edge connecting v and v′. For every yX2 we see that y = {w, w′, w″} consists of three vertices, and we draw a (filed-in) triangle connecting them. All three edges will be drawn too, because X is assumed to be downward-closed.

The 0-simplex ∆0, the 1-simplex ∆1, the 2-simplex ∆2, and the 3-simplex ∆3 are drawn here:


The n-simplices for various n ∈ ℕ are not the only simplicial complexes. In general, a simplicial complex is a union, or gluing together of simplices in a prescribed manner. For example, consider the simplicial complex X with vertices X0 = {1, 2, 3, 4}, edges X1 = {{1, 2}, {2, 3}, {2, 4}}, and no higher simplices X2 = X3 = ⋯ = ∅. We might draw X as follows:



Let X be the following simplicial complex, so that X0 = {A, B, …, M}.


In this case X1 consists of elements like {A, B} and {D, K}, but not {D, J}.

Write X2, X3, and X4. Hint: The drawing of X is supposed to indicate that X3 should have one element.


The 2-simplex ∆2 is drawn as a filled-in triangle with vertices V = {1, 2, 3}. There is a simplicial complex, often denoted 2, that would be drawn as an empty triangle with the same set of vertices.

a. Draw ∆2 and 2 side by side and make clear the difference.

b. Write X = 2 as a simplicial complex. In other words, what are the elements of the sets X0, X1, X2, X3, …?   Subobject classifier

Given a subset AX, we can decide for every element of X whether it is in A or not. This is a true/false question for X.

Definition We define the subobject classifier for Set, denoted Ω, to be the set Ω ≔ {True, False}, together with the function {☺} → Ω sending the unique element to True.

Proposition Let X be a set. There is an isomorphism


Proof. Given a function f : X → Ω, let ϕ(f) = {xX | f(x) = True} ⊆ X. We now construct a function ψ : ℙ(X) → HomSet(X, Ω) to serve as the inverse of ϕ. Given a subset AX, we define

ψ(A):XΩ  by ψ(i)(x)={ Trueif xA,Falseif xA.(3.25)

One checks easily that ϕ and ψ are mutually inverse.


A function X to Ω = {True, False} is like a roll call. We are interested in the subset that calls out True.

Definition (Characteristic function). Given a subset AX, we define its characteristic function of A in X to be the function ψ(A): X → Ω, from (3.25).

Let X be any set, and let ℙ(X) be its power-set. By Proposition there is a bijection between ℙ(X) and ΩX. Since Ω has cardinality 2, the cardinality of ℙ(X) is 2|X|, which explains the correct answer to Exercise


Let f : X → Ω denote the characteristic function of some subset AX, and define A′ = XA to be its complement, i.e., A′ = {xX | xA}.

a. What is the characteristic function of A′ ⊆ X?

b. Can you phrase it in terms of f and some function Ω → Ω?

3.4.5   Surjections, injections

The classical definition of injections and surjections, given in Definition involves elements. But a more robust notion involves functions; it is given in Proposition

Definition Let f : XY be a function.

  • We say that f is injective if for all x, x′ ∈ X with f(x) = f(x′), we have x = x′.
  • We say that f is surjective if for all yY, there exists some xX such that f(x) = y.
  • We say that f is bijective if it is both injective and surjective.

We sometimes denote an injective function XY, a surjective function XY, and a bijective function X Y (see Proposition


a. Is the function f : ℤ → ℕ, given by f(n) = n2, injective, surjective, or neither?

b. Is the function g : ℕ → ℕ, given by g(n) = n2, injective, surjective, or neither?

c. Is the function h: ℤ → ℕ, given by h(n) = |n| (the absolute value), injective, surjective, or neither?

d. Is the function i: ℤ → ℤ, given by i(n) = −n, injective, surjective, or neither?


Let f : XY and g : YZ be functions.

a. Show that if f and g are injections, then so is gf.

b. Show that if f and g are both surjections, then so is gf.

c. Show that if gf is an injection, then so is f.

d. Show that if gf is a surjection, then so is g.


a. Let x, x′ ∈ X and suppose that gf(x) = gf(x′). Then g(f(x)) = g(f(x′)), so the injectivity of g implies that f(x) = f(x′); the injectivity of f implies that x = x′.

b. Let zZ be an element. The surjectivity of g implies that there is some yY with g(y) = z; the surjectivity of f implies that there is some xX with f(x) = y.

c. Let x, x′ ∈ X and suppose that f(x) = f(x′). Because g is a function, gf(x) = gf(x′), and now the injectivity of gf implies that x = x′.

d. Let zZ be an element. The surjectivity of gf implies that there is some xX with gf(x) = z. But then we have found yf(x) ∈ Y with g(y) = z.

Proposition A function f : XY is bijective if and only if it is an isomorphism.

Proof. Suppose that f is bijective; we define an inverse g : YX. For each yY, the preimage f−1(y) ⊆ X is a set with exactly one element. Indeed, it has at least one element because f is surjective, and it has at most one element because f is injective. Define g(y) to be the unique element of f−1(y). It is easy to see that f and g are mutually inverse.

Note that for every set X, the identity function idX : XX is bijective. Suppose now that f is an isomorphism, and let g be its inverse. The composition gf = idX is injective, and the composition fg = idY is surjective, so f is injective and surjective by Exercise

Proposition Let m, n ∈ ℕ be natural numbers. Then mn if and only if there exists an injection mn.

Sketch of proof. If mn, then there is an inclusion {1, 2, …, m} → {1, 2, …, n}. Suppose now that we are given an injection f : mn; we assume that m > n and derive a contradiction. If m > n, then n + 1 ⩽ m, and we have already shown that there exists an injection g : n + 1m. Composing, we have an injection hgf : n + 1n by Exercise One can show by induction on n that this is impossible.

Corollary Let m, n ∈ ℕ be natural numbers. Then m = n if and only if there exists an isomorphism f : m n.

Proof. If m = n, then the identity idm : mn is an isomorphism.

On the other hand, if we have an isomorphism f : m n, then both it and its inverse are injective by Proposition Thus mn and nm by Proposition, which implies m = n.

Definition (Monomorphisms, epimorphisms). Let f : XY be a function.

We say that f is a monomorphism if for all sets A and pairs of functions g, g′ : AX,


if fg = fg′, then g = g′.

We say that f is an epimorphism if for all sets B and pairs of functions h, h′ : YB,


if hf = h′ ○ f, then h = h′.

Proposition Let f : XY be a function. Then f is injective if and only if it is a monomorphism; f is surjective if and only if it is an epimorphism.

Proof. We use notation as in Definition

If f is a monomorphism, it is clearly injective by putting A = {☺}. Suppose that f injective, and let g, g′ : AX be functions such that fg = fg′, but suppose for contradiction that gg′. Then there is some element aA such g(a) ≠ g′(a) ∈ X. But by injectivity f(g(a)) ≠ f(g′(a)), contradicting the fact that fg = fg′.

Suppose that f : XY is an epimorphism, and choose some y0Y (noting that if Y is empty, then the claim is vacuously true). Let B = Ω, and let h: Y → Ω denote the characteristic function of the subset {y0} ⊆ Y, and let h′ : Y → Ω denote the characteristic function of ∅ ⊆ Y. Note that h(y) = h′(y) for all yy0. Then since f is an epimorphism and hh′, we must have hfh′ ○ f, so there exists xX with h(f(x)) ≠ h′(f(x)), which implies that f(x) = y0. This proves that f is surjective.

Finally, suppose that f is surjective, and let h, h′ : YB be functions with hf = h′ ○ f. For any yY, there exists some xX with f(x) = y, so h(y) = h(f(x)) = h′(f(x)) = h′(y). This proves that f is an epimorphism.

Proposition Let g : AY be a monomorphism. Then for any function f : XY, the left-hand map g′ : X ×Y AX in the diagram


is a monomorphism.

Proof. To show that g′ is a monomorphism, we take an arbitrary set B and two maps m, n: BX ×Y A such that g′ ○ m = g′ ○ n, denoting that function pg′ ○ m: BX. Now let q = f′ ○ m and r = f′ ○ n. The diagram looks like this:


We have that


But we assumed that g is a monomorphism, so this implies that q = r. By the universal property for pullbacks, Proposition, we have m = n = 〈q, pY : BX ×Y A.

Example Suppose an olog has a fiber product square


such that g is intended to be a monomorphism and f is any map.9 In this case, there are labeling systems for f′, g′, and X ×Y A. Namely,

  • “is” is an appropriate label for g and g′;
  • the label for f is an appropriate label for f′;
  • 〈〈X ×Y A〉〉 ≔ “〈〈X〉〉, which 〈〈f〉〉 〈〈A〉〉” is an appropriate label for X ×Y A.

To give an explicit example,


Corollary Let i: AX be a monomorphism, and let True: {☺} → Ω be the subobject classifier (see Definition Then there is a fiber product square of the form


Proof. Let X′ ⊆ X denote the image of i, and let f : X → Ω denote the characteristic function of X′ ⊆ X, given by Proposition Then it is easy to check that diagram (3.26) is a pullback.


Consider the subobject classifier Ω = {True, False}, the singleton {☺}, and the map {☺} True Ω from Definition In diagram (3.26), in the spirit of Example, devise a label for Ω, a label for {☺}, and a label for True. Given a subobject AX, both labeled, devise a label for f, a label for i, and a label for f′ such that the English smoothly fits the mathematics.


Show, in analogy to Proposition, that pushouts preserve epimorphisms.

3.4.6   Multisets, relative sets, and set-indexed sets

In this section we prepare to consider categories other than Set by looking at some categories related to Set.   Multisets

Consider the set X of words in a given document. If WC(X) is the word count of the document, we do not generally have WC(X) = |X|. The reason is that a set cannot contain the same element more than once, so words like the might be undercounted in |X|. A multiset X consists of a set of names, NX, and each name is assigned a multiplicity, i.e., a positive finite number of times it is to be counted. For example, the multiset X =(The, man, just, ate, and, ate, and, ate) has names NX = {The, man, just, ate, and}, and these names have multiplicity 1, 1, 1, 3, 2 respectively.

But if X and Y are multisets, what is the appropriate type of mapping from X to Y? Since every set can be cast as a multiset (in which each element has multiplicity 1), let’s restrict ourselves to notions of mapping that agree with the usual one on sets. That is, if multisets X and Y happen to be ordinary sets, then our mappings XY should just be functions.

In order to define what I believe is the appropriate notion of mapping of multisets, it is useful to take a step back from this definition. The role of the natural numbers in multisets is to count the number of occurrences of each element. The point perhaps is not the number, but the set of occurrences it counts. Each occurrence has a name, so we have a function from occurrences to names. The fact that every name has multiplicity at least 1 means that this function is surjective. So I suggest the following definition of multisets and mappings.

Definition A multiset is a sequence X ≔ (Oc, N, π), where Oc and N are sets and π : OcN is a surjective function. We refer to Oc as the set of occurrences in X, to N as the set of names in X, and to π as the naming function for X. Given a name xN, let π−1(x) ⊆ Oc be the preimage; the number of elements in π−1(x) is called the multiplicity of x.

Suppose that X = (Oc, N, π) and X′ = (Oc′, N′, π′) are multisets. A mapping from X to Y, denoted f : XY, consists of a pair (f1, f0) such that f1 : OcOc′ and f0 : NN′ are functions and such that the following diagram commutes:



Suppose that a pseudo-multiset is defined to be almost the same as a multiset, except that π is not required to be surjective.

a. Write a pseudo-multiset that is not a multiset.

b. Describe the difference between the two notions (multiset vs. pseudo-multiset) in terms of multiplicities.


Consider the multisets X = (a, a, b, c) and Y = (d, d, e, e, e).

a. Write each of them in the form (Oc, N, π), as in Definition

b. In terms of the same definition, how many mappings XY are there?

c. If we were to remove the restriction that diagram (3.27) must commute, how many mappings XY would there be?   Relative sets

Continuing with ideas from multisets, let’s suppose that we have a fixed set N of names that we want to keep once and for all. Whenever someone discusses a set, each of its elements must have a name in N. And whenever someone discusses a mapping, it must preserve the naming. For example, if N is the set of English words, then every document consists of a set {1, 2, 3, …, n} mapping to N (e.g., 1 ↦ Continuing, 2 ↦ with, 3 ↦ ideas, …). A mapping from document A to document B would send each word found somewhere in A to the same word found somewhere in B. This notion is defined in the following definition.

Definition (Relative set). Let N be a set. A relative set over N, or simply a set over N, is a pair (E, π) such that E is a set and π : EN is a function. A mapping of relative sets over N, denoted f : (E, π) → (E′, π′), is a function f : EE′ such that the following triangle commutes, i.e., π = π′ ○ f:



Given sets X, Y, Z and functions f : XY and g : YZ, we can compose them to get a function XZ. If N is a set, if (X, p), (Y, q), and (Z, r) are relative sets over N, and if f : (X, p) → (Y, q) and g : (Y, q) → (Z, r) are mappings of relative sets, is there a reasonable notion of composition such that we get a mapping of relative sets(X, p) → (Z, r)? Hint: Draw diagrams.


a. Let {☺} denote a set with one element. What is the difference between sets relative to N ≔ {☺} and simply sets?

b. Describe the sets relative to ∅. How many are there?   Indexed sets

Let A be a set. Suppose we want to assign to each element aA a set Sa. This is called an A-indexed set. In category theory we are always interested in the legal mappings between two different objects of the same sort of structure, so we need a notion of A-indexed mappings.

Example Let C be a set of classrooms. For each cC, let Pc denote the set of people in classroom c, and let Sc denote the set of seats (chairs) in classroom c. Then P and S are C-indexed sets. The appropriate kind of mapping between them respects the indices. That is, a mapping of C-indexed sets PS should, for each classroom cC, be a function PcSc.10

Definition Let A be a set. An A-indexed set is a collection of sets Sa, one for each element aA; for now we denote this (Sa)aA. Each element aA is called an index. If (Sa')aA is another A-indexed set, an A-indexed function from (Sa)aA to (Sa')aA, denoted


is a collection of functions fa : SaSa', one for each element aA.


Let {☺} denote a one-element set. What are {☺}-indexed sets and {☺}-indexed functions?


There is a strong relationship between A-indexed sets and relative sets over A. What is it?


1We are using a two-line symbol, which is a bit unusual. A certain function XYA is being denoted by the symbol {fg, called case notation. The reasoning for this will be clear from the proof, especially (3.6).

2You may use shadings rather than coloring, if you prefer.

3The following diagram commutes:


4The meaning of iff is “if and only if.” In this case we are saying that the pair (x, y) is in R if and only if there exists an arrow connecting x and y.

5Note that the term inclusion is not too good because it seems to suggest that i1 and i2 are injective (see Definition and this is not always the case. The reason we use inclusion terminology is to be consistent with the terminology of coproducts. The functions i1 and i2 are sometimes called coprojections.

6Roughly, the existence of ev : 53 × 35 says that given a dot in a 5 × 5 × 5 grid of dots, and given one of the three axes, one can tell the coordinate of that dot along that axis.

7It seems anomalous that the set of subsets with cardinality 2 is denoted X1, and so on. But this is standard convention because it fits with the standard notion of dimension: each element of X1 corresponds to a two-dimensional shape, and more generally, each element of Xn is n-dimensional.

8The reason I write X0V rather than X0 = V is that X0 is the set of one-element subsets of V. So if V = {a, b, c}, then X0 = {{a}, {b}, {c}}. This is really just pedantry.

9Of course, this diagram is symmetrical, so the same ideas hold if f is a monomorphism and g is any map.

10If we wanted to allow people from any classroom to choose a chair from just any classroom, category theory would tell us to reconsider P and S as sets, forgetting their C-indices. See Section