Chapter 7

Categories at Work

The reader should now have an understanding of the basic notions of category theory: categories, functors, natural transformations, and universal properties. As well, we have discussed many sources of examples: orders, graphs, monoids, and databases. This chapter begins with the notion of adjoint functors (also known as adjunctions), which are like dictionaries translating back and forth between different categories.

7.1   Adjoint functors

How far can we take this dictionary analogy?

In the common understanding of dictionaries, we assume that two languages (say, French and English) are equally expressive and that a good dictionary will assist in an even exchange of ideas. But in category theory we often have two categories that are not on the same conceptual level. This is most clear in the case of free-forgetful adjunctions. Section 7.1.1 explores the sense in which each adjunction provides a dictionary between two categories that are not necessarily on an equal footing, so to speak.

7.1.1   Discussion and definition

Consider the category of monoids and the category of sets. A monoid (M, e, ⋆) is a set with a unit element and a multiplication formula that is associative. A set is just a set. A dictionary between Mon and Set should not be required to set up an even exchange but rather an exchange that is appropriate to the structures at hand. It will be in the form of two functors, denoted L: SetMon and R: MonSet. So we can translate back and forth, but to say what kind of exchange is appropriate will require more work.

An extended analogy will introduce the subject. A one-year-old can make repeatable noises, and an adult can make repeatable noises. One might say, “After all, talking is nothing but making repeatable noises.” But the adult’s repeatable noises are called words, they form sentences, and those sentences can cause nuclear wars. There is something more in adult language than simply repeatable sounds. In the same vein, a game of tennis can be viewed in terms of physics, the movement of trillions of atoms, but in so doing one won’t see the game aspect. So we have here something analogous to two categories here: {repeated noises} and {meaningful words}. We are looking for adjoint functors to serve as the appropriate sort of dictionary.

To translate baby talk into adult language we would make every repeated noise a kind of word, thereby granting it meaning. We do not know what a given repeated noise should mean, but we give it a slot in our conceptual space while always pondering, “I wonder what she means by Koh….” On the other hand, to translate from meaningful words to repeatable noises is easy. We just hear the word as a repeated noise, which is how the baby probably hears it.

Adjoint functors often come in the form of “free” and “forgetful.” Here we freely add Koh to our conceptual space without having any idea how it adheres to the rest of the child’s noises or feelings. But it does not act like a sound to us, it acts like a word; we do not know what it means, but we figure it means something. Conversely, the translation going the other way is “forgetful,” forgetting the meaning of the words and just hearing them as sounds. The baby hears our words and accepts them as mere sounds, not knowing that there is anything extra to get.

Sets are like the babies in the story: they are simple objects full of unconnected dots. Monoids are like the adults, forming words and performing actions. In the monoid each element means something and combines with other elements in certain ways. There are many different sets and many different monoids, just as there are many babies and many adults, but there are differences in how they interact, so we put them in different categories.

Applying free functor L: SetMon to a set X makes every element xX a word, and these words can be strung together to form more complex words. (Section 4.1.1.12 discussed the free monoid functor L.) Since a set such as X carries no information about the meaning or structure of its various elements, the free monoid F(X) does not relate different words in any way. To apply the forgetful functor R: MonSet to a monoid, even a structured one, is to simply forget that its elements are anything but mere elements of a set. It sends a monoid (M, e, ⋆) to the set M.

Definition 7.1.1.1. Let B and A be categories.1 An adjunction between B and A is a pair of functors

L:BAandR:AB

together with a natural isomorphism2 whose component for any objects AOb(A) and BOb(B)) is

αB,A:HOMA(L(B),A)HOMB(B,R(A)).(7.1)

This isomorphism is called the adjunction isomorphism for the (L, R) adjunction, and for any morphism f : L(B) → A in A, we refer to αB,A(f): BR(A) as the adjunct of f.3

The functor L is called the left adjoint and the functor R is called the right adjoint. We may say that L is the left adjoint of R or that R is the right adjoint of L.4 We often denote this setup

L:BA:R(7.2)

Proposition 7.1.1.2. Let L: SetMon be the functor sending X ∈ Ob(Set) to the free monoid L(X) ≔ (List(X), [ ], ++), as in Definition 4.1.1.15. Let R: MonSet be the functor sending each monoid M ≔ (M, e, ⋆) to its underlying set R(M) ≔ M. Then L is left adjoint to R.

Proof. This is precisely the content of Proposition 4.1.4.9.

Example 7.1.1.3. We need to ground the discussion in some concrete mathematics. In Proposition 7.1.1.2 we provided an adjunction between sets and monoids. A set X gets transformed into a monoid by considering lists in X; a monoid M gets transformed into a set by forgetting the multiplication law. So we have a functor for translating each way,

L:SetMon,  R:MonSet,

but an adjunction is more than that: it includes a guarantee about the relationship between these two functors. What is the relationship between L and R? Consider an arbitrary monoid M = (M, e, ⋆).

If we want to pick out three elements of the set M, that is the same thing as giving a function {a, b, c} → M. But that function exists in the category of sets; in fact it is an element of HomSet({a, b, c}, M). But since M = R(M) is the underlying set of the monoid, we can view the current paragraph in the light of adjunction (7.1) by saying the set

HomSet({a,b,c},R(M)).

classifies all the ways to choose three elements out of the underlying set of monoid M. It was constructed completely from within the context of sets and functions.

Now, what does (7.1) mean? The equation

HomMon(L({a,b,c}),M)HomSet({a,b,c},R(M))

tells us that somehow we can classify all the ways to choose three elements from M, while staying in the context of monoids and monoid homomorphisms. In fact, it tells us how to do so, namely, as HomMon(List({1, 2, 3}), M). Exercise 7.1.1.4 looks at that. The answer can be extracted from the proof of Proposition 4.1.4.9.

Exercise 7.1.1.4.

Let X = {a, b, c}, and let M = (ℕ, 1, *) be the multiplicative monoid of natural numbers (see Example 4.1.3.2). Let g : X → ℕ be the function given by g(a) = 7, g(b) = 2, g(c) = 2, and let βX,M : HomSet(X, R(M)) → HomMon(L(X), M) be as in the proof of Proposition 4.1.4.9.

Consider the list [b, b, a, c] ∈ L(X). What is βX,M(g)([b, b, a, c])?

Let us look once more at the adjunction between adults and babies. Using the notation of Definition 7.1.1.1, A is the adult category of meaningful words, and B is the baby category of repeated noises. The left adjoint turns every repeated sound into a meaningful word (having free meaning), and the right adjoint forgets the meaning of any word and considers it merely as a sound.

At the risk of taking this simple analogy too far, let’s look at the heart of the issue: how to conceive of the isomorphism (7.1) of hom-sets. Once we have freely given a slot to each of the baby’s repeated sounds, we try to find a mapping from the lexicon L(B) of these new words to the adult lexicon A of meaningful words; these are mappings in the adult category A of the form L(B) → A. And (stretching it) the baby tries to find a mapping (which we might see as emulation) from her set B of repeatable sounds to the set R(A) of the sounds the adult seems to repeat. If there were a global system for making these transformations, that would establish (7.1) and hence the adjunction.

Note that the directionality of the adjunction makes a difference. If L:BA is left adjoint to R:AB, there is no reason to think that L is also a right adjoint. In the case of babies and adults, we see that it would make little sense to look for a mapping in the category of meaningful words from the adult lexicon to the wordifications of baby sounds HomA(A,L(B)), because there is unlikely to be a good candidate for most of the words. That is, to which of the child’s repeated noises would we assign the concept “weekday”?

Again, this is simply an analogy and should not be taken to seriously. The next example shows mathematically that the directionality of an adjunction is not arbitrary.

Example 7.1.1.5. Let L: SetMon and R: MonSet be the free and forgetful functors from Proposition 7.1.1.2. We know that L is left adjoint to R; however L is not right adjoint to R. In other words, we can show that the necessary natural isomorphism cannot exist.

Let X = {a, b}, and let M = 1 be the trivial monoid. Then the necessary natural isomorphism would need to give a bijection

HomMon(M,L(X))?HomSet({1},X).

But the left-hand side has one element, because M is the initial object in Mon (see Example 6.1.3.7), whereas the right-hand side has two elements. Therefore, no isomorphism can exist.

Example 7.1.1.6. Preorders have underlying sets, giving rise to a functor U : PrOSet. The functor U has both a left adjoint and a right adjoint. The left adjoint of U is D : SetPrO, sending a set X to the discrete preorder on X (the preorder with underlying set X, having the fewest possible ⩽’s). The right adjoint of U is I : SetPrO, sending a set X to the indiscrete preorder on X (the preorder with underlying set X, having the most possible ⩽’s). See Example 4.4.4.5.

Exercise 7.1.1.7.

Let U : GrphSet denote the functor sending a graph to its underlying set of vertices. This functor has both a left and a right adjoint.

a. What functor SetGrph is the left adjoint of U?

b. What functor SetGrph is the right adjoint of U?

Example 7.1.1.8. Here are some other adjunctions:

  • Ob: CatSet has a left adjoint Disc: SetCat given by the discrete category.
  • Ob: CatSet has a right adjoint Ind: SetCat given by the indiscrete category.
  • The underlying graph functor CatGrph has a left adjoint GrphCat given by the free category.
  • The inclusion GrpMon has a right adjoint MonCoreGrp, called the core, that sends a monoid to its subgroup of invertible elements.
  • The functor PrOGrph, given by drawing edges for ⩽’s, has a left adjoint given by existence of paths.
  • The forgetful functor from partial orders to preorders has a left adjoint given by quotienting out the cliques (see Exercise 4.4.1.15).
  • Given a set A, the functor (− × A): SetSet has a right adjoint Hom(A, −) (this was called currying in Section 3.4.2).

Exercise 7.1.1.9.

Let 1 denote the terminal category. There is a unique functor !: Set1.

a. Does ! have a left adjoint? If so, what is it; if not, why not?

b. Does ! have a right adjoint? If so, what is it; if not, why not?

Exercise 7.1.1.10.

The discrete category functor Disc: SetCat has a left adjoint p: CatSet. In this exercise you will work out how to unpack this idea and begin to deduce how p must behave.

a. For an arbitrary object X ∈ Ob(Set) and an arbitrary object COb(Cat), write the adjunction in the style of (7.2), appropriately filling in all the variables (e.g., decide whether B = Cat or B = Set, etc.).

b. For X and C as in part (a), write the adjunction isomorphism in the style of (7.1), appropriately filling in all the variables.

c. Let C be the free category on the graph G

art

and let X = {1, 2, 3}. How many elements does the set HomCat(C,Disc(X)) have?

d. What can you do to an arbitrary category COb(Cat) to make a set p(C) such that the adjunction isomorphism holds? That is, how does the functor p: CatSet behave on objects?

The following proposition says that all adjoints to a given functor are isomorphic to each other.

Proposition 7.1.1.11. Let C and D be categories, let F:CD be a functor, and let G,G:DC also be functors. If both G and G′ are right adjoint (resp. if both are left adjoint) to F, then there is a natural isomorphism ϕ:GG.

Proof. Suppose that both G and G′ are right adjoint to F (the case of G and G′ being left adjoint is similarly proved). We first give a formula for the components of ϕ: GG′ and its inverse ψ : G′ → G. Given an object dOb(D), we use c = G(d) to obtain two natural isomorphisms, one from each adjunction:

HomC(G(d),G(d))HomD(F(G(d)),d)HomC(G(d),G(d)).

The identity morphism idG(d) is then sent to some morphism G(d) → G′(d), which we take to be the component ϕd. Similarly, we use c′ = G′(d) to obtain two natural isomorphisms, one from each adjunction:

HomC(G(d),G(d))HomD(F(G(d)),d)HomC(G(d),G(d)).

Again, the identity element idG′(d) is sent to some morphism G′(d) → G(d), which we take to be the d-component ψd. The naturality of the adjunction isomorphisms implies that ϕ and ψ are natural transformations, and it is straightforward to check that they are mutually inverse.

7.1.1.12   Quantifiers as adjoints

One of the simplest places where adjoints show up is between preimages and the logical quantifiers ∃ and ∀, ideas first discussed in Notation 2.1.1.1. The setting in which to discuss this is that of sets and their power preorders. That is, if X is a set, then recall from Section 4.4.2 that the power-set ℙ(X) has a natural ordering by inclusion of subsets.

Given a function f : XY and a subset VY the preimage is f−1(V) ≔ {xX | f(x) ∈ V}. If V′ ⊆ V, then f−1(V′) ⊆ f−1(V), so in fact f−1 : ℙ(Y) → ℙ(X) can be considered a functor (where of course we are thinking of preorders as categories). The quantifiers ∃ and ∀ appear as adjoints of f−1.

Let’s begin with the left adjoint of f−1 : ℙ(Y) → ℙ(X). It is a functor Lf : ℙ(X) → ℙ(Y). Choose an object UX in ℙ(X). It turns out that

Lf(U)={yY|xf1(y) such that xU}.

And the right adjoint Rf : ℙ(X) → ℙ(Y), when applied to U, is

Rf(U)={yY|xf1(y),xU}.

In fact, the functor Lf is generally denoted ∃f : ℙ(X) → ℙ(Y), and Rf is generally denoted ∀f : ℙ(X) → ℙ(Y).

art

The next example shows why this notation is apt.

Example 7.1.1.13. In logic or computer science the quantifiers ∃ and ∀ are used to ask whether any or all elements of a set have a certain property. For example, one may have a set U of natural numbers and want to know whether any or all are even or odd. Let Y = {even, odd}, and let

p:Y

be the function that assigns to each natural number its parity (even or odd). Because the elements of ℙ(ℕ) and ℙ(Y) are ordered by inclusion of subsets, we can construe these orders as categories (by Proposition 5.2.1.13). What is new is that we have adjunctions between these categories:

art

Given a subset U ⊆ ℕ, i.e., an object U ∈ Ob(ℙ(ℕ)), we investigate the objects ∃p(U), ∀p(U). These are both subsets of {even, odd}. The set ∃p(U) includes the element even if there exists an even number in U; it includes the element odd if there exists an odd number in U. Similarly, the set ∀p(U) includes the element even if every even number is in U, and it includes odd if every odd number is in U.

Let’s use the definition of adjunction to ask whether every element of U ⊆ ℕ is even. Let V = {even} ⊆ Y. Then f−1(V) ⊆ ℕ is the set of even numbers, and there is a morphism Uf−1(V) in the preorder ℙ(ℕ) if and only if every element of U is even. Therefore, the adjunction isomorphism Homℙ(ℕ)(U, f−1(V)) ≅ Homℙ(Y)(∃pU, V) says that ∃pU ⊆ {even} if and only if every element of U is even.

Exercise 7.1.1.14.

The national scout jamboree is a gathering of Boy Scouts from troops across the United States. Let S be the set of Boy Scouts in the U.S., and let T be the set of Boy Scout troops in the U.S. Let t: ST be the function that assigns to each Boy Scout his troop. Let US be the set of Boy Scouts in attendance at this year’s jamboree.

a. What is the meaning of the object ∃tU

b. What is the meaning of the object ∀tU?

Exercise 7.1.1.15.

Let X be an arbitrary set and UX a subset.

a. Find a set Y and a function f : XY such that ∃fU tells you whether U is nonempty.

b. What is the meaning of ∀fU for your choice of Y and f?

In fact, the idea of quantifiers as adjoints is part of a larger story. Suppose we think of elements of a set X as bins, or storage areas. An element of ℙ(X) can be construed as an injection UX, i.e., an assignment of a bin to each element of U, with at most one element of U in each bin. Relaxing the injectivity restriction, we may consider arbitrary sets U and assignments UX of a bin to each element uU. Given a function f : XY , we can generalize ∃f and ∀f to functors denoted Σf and Πf, which will parameterize disjoint unions and products (respectively) over yY . This is discussed in Section 7.1.4.

7.1.2   Universal concepts in terms of adjoints

This section explores how universal concepts, i.e., initial objects and terminal objects, colimits and limits, are easily phrased in the language of adjoint functors. We say that a functor F:CD is a left adjoint or has a right adjoint if there exists a functor G:DC such that F is a left adjoint of G. Proposition 7.1.1.11 showed that if F is a left adjoint of some functor G, then it is isomorphic to every other left adjoint of G, and G is isomorphic to every other right adjoint of F.

Example 7.1.2.1. Let C be a category and t:C1¯ the unique functor to the terminal category. Then t has a right adjoint if and only if C has a terminal object, and t has a left adjoint if and only if C has an initial object. The proofs are dual, so let’s focus on the first.

The functor t has a right adjoint R:1¯C if and only if for every object cOb(C) there is an isomorphism

HomC(c,r)Hom1¯(t(c),1),

where r = R(1). But Hom1(t(c), 1) has one element. Thus t has a right adjoint iff HomC(c,r) has one element for each cOb(C). This is the definition of r being a terminal object.

When colimits and limits were defined in Definitions 6.1.3.31 and 6.1.3.20, it was for individual I-shaped diagrams X:IC. Using adjoints we can define the limit of every I-shaped diagram in C at once.

Let t: I1 denote the unique functor to the terminal category. Suppose given an object cOb(C), represented by the functor c:1¯C. Then ct:IC is the constant functor at c, sending each object in I to the same C-object, c, and every morphism in I to idc. Thus composing with t induces a functor CFun(1¯,C)Fun(I,C), denoted Δt:CFun(I,C). It sends each object c to the associated constant functor ct.

Suppose we want to take the colimit or limit of X. We are given an object X of Fun(I,C), and we want back an object of C. We could hope, and it turns out to be true, that the adjoints of Δt are the limit and colimit. Indeed, let Σt:Fun(I,C)C denote the left adjoint of Δt, and let Πt:Fun(I,C)C denote the right adjoint of Δt. Then Σt is the functor that takes colimits, and Πt is the functor that takes limits.

A generalization of colimits and limits is given in Section 7.1.4. But for now, let’s consider a concrete example.

Example 7.1.2.2. Let C=Set, and let I = 3. The category Fun(3, Set) is the category of {1, 2, 3}-indexed sets, e.g., (ℤ, ℕ, ℤ) ∈ Ob(Fun(3, Set)) is an object of it. We will obtain the limit, i.e., the product of these three sets 3Set using adjoints.

In fact, the limit will be right adjoint to a functor Δt: Set → Fun(3, Set), defined as follows. Given a set c ∈ Ob(Set), represented by a functor c: 1Set, and define Δt(c) to be the composite ct: 3Set; it is the constant functor. That is, Δt(c): 3Set is the {1, 2, 3}-indexed set (c, c, c).

To say that Δt has a right adjoint called Πt : Fun(3, Set) → Set and that Πt takes limits should mean that the definition of right adjoint provides the formula that yields the appropriate limit. Fix a functor D : 3Set, so D(1), D(2), and D(3) are sets. We know from Example 6.1.3.25 that the limit, lim D, of D is supposed to be the product D(1) × D(2) × D(3). For example, if D = (ℤ, ℕ, ℤ), then lim D = ℤ × ℕ × ℤ. How does this fact arise in the definition of adjoint?

The definition of Πt being the right adjoint to Δt says that for any c ∈ Ob(Set) and D ∈ Fun(3, Set), there is a natural isomorphism of sets,

αc,D:HomFun( 3¯,Set)(Δt(c),D)HomSet(c,Πt(D)).(7.3)

The domain of αc,D has elements f ∈ HomFun(3,Set)t(c), D) that look like the left-hand drawing, but having these three maps is equivalent to having the right-hand diagram:

art

The isomorphism αc,D in (7.3) says that choosing the three functions f(1), f(2), f(3) is the same thing as choosing a function c → Πt(D). This is basically the universal property for limits: there is a unique function : cD(1) × D(2) × D(3), so this product is isomorphic to Πt. I have not given a formal proof here but hopefully enough for the interested reader to work it out.

7.1.3   Preservation of colimits or limits

One useful fact about adjunctions is that left adjoints preserve all colimits, and right adjoints preserve all limits.

Proposition 7.1.3.1. Let L:BA :R be an adjunction. For any indexing category I and functor D : IB, if D has a colimit in B, then there is a unique isomorphism

L(colim D)colim(LD).

Similarly, for any I ∈ Ob(Cat) and functor D:IA, if D has a limit in A, then there is a unique isomorphism

R(limD)lim(RD).

Proof. The proof is simple if one knows the Yoneda lemma (Section 7.2.1.14). See Mac Lane [29] for details.

Example 7.1.3.2. Since Ob: CatSet is both a left adjoint and a right adjoint, it must preserve both limits and colimits. This means that if one wants to know the set of objects in the fiber product of some categories, one can simply take the fiber product of the set of objects in those categories,

Ob(A×CB)Ob(A)×Ob(C)Ob(B).

While the right-hand side might look daunting, it is just a fiber product in Set, which is quite understandable (see Definition 3.2.1.1).

This is greatly simplifying. If one thinks through what defines a limit in Cat, one encounters notions of slice categories and terminal objects in them. These slice categories are in Cat so they involve several categories and functors, and it is difficult for a beginner. Knowing that the objects are given by a simple fiber product makes the search for limits in Cat much simpler.

For example, if [n] is the linear order category of length n, then [n] × [m] has (n + 1)(m + 1) objects because [n] has n + 1 objects and [m] has m + 1 objects.

Example 7.1.3.3. The path preorder functor L: GrphPrO given by existence of paths (see Exercise 5.1.2.13) is left adjoint to the functor R: PrOGrph given by replacing ⩽’s by arrows. This means that L preserves colimits. So taking the union of graphs G and H results in a graph whose path poset L(GH) is the union of the path posets of G and H. But this is not so for products, i.e., we do not expect to have an isomorphism L(G × H) ≅? L(G) × L(H).

As an example, let art. Then L(G) = L(H) = [1], the linear order of length 1. But the product G × H in Grph looks like the graph

art

Its preorder L(G × H) does not have (a, a) ⩽ (a, b), whereas this is the case in the preorder L(G) × L(H). So L(G × H) ≇ L(G) × L(H). The left adjoint preservers all colimits, but not necessarily limits.

7.1.4   Data migration

As we saw in Sections 5.2.2 and 5.2.2.6, a database schema is a category C, and an instance is a functor I:CSet.

Notation 7.1.4.1. Let C be a category. The category Fun(C,Set) of functors from C to Set, i.e., the category of instances on C, is denoted CSet.

This section discusses what happens to the resulting instances when different schemas are connected by a functor, say, F:CD. It turns out that three adjoint functors emerge: ΔF:DSetCSet, ΣF:CSetDSet, and ΠF:CSetDSet, where ΔF is adjoint to both of them:

ΣF:CSetDSet :ΔFΔF:DSetCSet :ΠF.

Interestingly, many of the basic database operations are captured by these three functors. For example, ΔF handles the job of duplicating or deleting tables as well as duplicating or deleting columns in a single table. The functor ΣF handles taking unions, and the functor ΠF handles joining tables together, matching columns, or selecting the rows with certain properties (e.g., everyone whose first name is Mary).

This section is challenging, and it can be safely skipped, resuming at Section 7.2. For those who want to pursue it, there is an open source implementation of these ideas and more, called FQL,5 which stands for functorial query language (not to be confused with Facebook query language).

7.1.4.2   Pullback: Δ

Given a functor F:CD and a functor I:DSet, we can compose them to get a functor IF:CSet. In other words, the presence of F provides a way to convert D-instances into C-instances. In fact, this conversion is functorial, meaning that a morphism of D-instances α: II′ is sent to a morphism of C-instances. This can be seen by whiskering (see Definition 5.3.2.16):

art

We denote the resulting functor ΔF:DSetCSet and call it pullback along F .

An example of this was given in Example 5.3.2.15, which showed how a monoid homomorphism F:MM could add functionality to a finite state machine. More generally, we can use pullbacks to reorganize data, copying and deleting tables and columns.

Remark 7.1.4.3. Given a functor F:CD, which we think of as a schema translation, the functor ΔF:DSetCSet goes the opposite way. The reasoning is simple to explain (we are composing functors) but something about it often seems strange at first. The rough idea of this contravariance is captured by the role-reversal in the following slogan:

Slogan 7.1.4.4.

If I get my information from you, then your information becomes my information.

Consider the following functor F:CD:6

art

Recall how to read schemas. In schema C there are leaf tables SSN, First, Last, Salary, which represent different kinds of basic data. More interestingly, there are two fact tables. The first is called T1, and it relates SSN, First, and Last. The second is called T2, and it relates First, Last, and Salary.

The functor F:CD relates C to a schema D which has a single fact table relating all four attributes: SSN, First, Last, and Salary. We are interested in ΔF:DSetCSet. Suppose given the following database instance I:DSet on D:

art

art

How does one get the instance ΔF(I):CSet? The formula was given: compose I with F . In terms of tables, it is like duplicating table T as T1 and T2 but deleting a column from each in accordance with the definition of C in (7.4). Here is the result, ΔF (I), in table form:

art

Exercise 7.1.4.5.

Consider the schemas

art

and the functor F : [1] → [2] given by sending 0 ↦ 0 and 1 ↦ 2.

a. How many possibilities are there for F(f)?

b. Suppose I : [2] → Set is given by the following tables:

art

Write the two tables associated to the [1]-instance ΔF(I): [1] → Set.

7.1.4.6   Left pushforward: Σ

Let F:CD be a functor. The functor ΔF:DSetCSet has a left adjoint, ΣF:CSetDSet. The rough idea is that ΣF performs parameterized colimits. Given an instance I:CSet, we get an instance on D that acts as follows. For each object dOb(D), the set ΣF(I)(d) is the colimit (think of union) of some diagram in C.

Left pushforwards (also known as left Kan extensions) are discussed at length in Spivak [38]; here we examine some examples from that paper.

Example 7.1.4.7. We again use the functor F:CD from (7.4):

art

We apply the left pushforward ΣF:CSetDSet to the following instance I:CSet:

art

The functor F:CD sends both tables T1 and T2 to table T. Applying ΣF takes what was in T1 and T2 and puts the union in T. The result, ΣFI:DSet, is as follows:

art

art

As one can see, no set salary information for any data comes from table T1, nor does any set SSN information come form table T2. But the definition of adjoint, given in Definition 7.1.1.1, yields the universal response: freely add new variables that take the place of missing information. It turns out that this idea already has a name in logic, Skolem variables, and a name in database theory, labeled nulls.

Exercise 7.1.4.8.

Consider the functor F : 32 given by the sequence (1, 2, 2).

a. Write an instance I : 3Set.

b. Given the description “ΣF performs a parameterized colimit,” make an educated guess about what ΣF(I): 2Set is. Give your answer in the form of two sets that are made up from the three sets you already wrote.

Here is the actual formula for computing left pushforwards. Suppose that F:CD is a functor, and let I:CSet be a set-valued functor on C. Then ΣF(I):DSet is defined as follows. Given an object dOb(D), we first form the comma category (see Definition 6.2.4.1) for the cospan

CFDd1¯

and denote it (Fd). There is a canonical projection functor π:(Fd)C, which we can compose with I:CSet to obtain a functor (Fd) → Set. We are ready to define ΣF(I)(d) to be its colimit,

ΣF(I)(d)colim(Fd) Iπ.

ΣF(I):DSet has been defined on objects dOb(D). Morphisms are treated here only briefly; see Spivak [38] for details. Given a morphism g : dd′, there is an induced functor (Fg): (Fd) → (Fd′) and a commutative diagram of categories:

art

By the universal property for colimits, this induces the required function

colim(Fd) IπΣF(I)(g)colim(Fd) Iπ.

7.1.4.9   Right pushforward: Π

Let F:CD be a functor. Section 7.1.4.6 explained that the functor ΔF:DSetCSet has a left adjoint. The present section explains that ΔF has a right adjoint, ΠF:CSetDSet as well. The rough idea is that ΠF performs parameterized limits. Given an instance I:CSet, we get an instance on D that acts as follows. For each object dOb(D), the set ΠF(I)(d) is the limit (think of fiber product) of some diagram in C.

Right pushforwards (also known as right Kan extensions) are discussed at length in Spivak [38]; here we look at some examples from that paper.

Example 7.1.4.10. We again use the functor F:CD from (7.4) and Example 7.1.4.7. We apply the right pushforward ΠF to instance I:CSet from that example.7

The instance ΠF(I) puts data in all five tables in D. In T it puts pairs (t1, t2), where t1 is a row in T1, and t2 is a row in T2, for which the first and last names agree. It copies the leaf tables exactly, so they are not displayed here; the following is the table T for ΠF(I):

T

ID SSN First Last Salary
T1-002T2-A104

122-988

Sue

Smith

$300

T1-003T2-A101

198-877

Alice

Jones

$100

From T1 and T2 there are only two ways to match first and last names.

Exercise 7.1.4.11.

Consider the functor F : 32 given by the sequence (1, 2, 2).

a. Write an instance I : 3Set.

b. Given the description “ΠF performs a parameterized limit,” make an educated guess about what ΠF(I): 2Set is. Give your answer in the form of two sets that are made up from the three sets you already wrote down.

Here is the actual formula for computing right pushforwards. Suppose that F:CD is a functor, and let I:CSet be a set-valued functor on C. Then ΠF(I):DSet is defined as follows. Given an object dOb(D), we first form the comma category (see Definition 6.2.4.1) for the cospan

1¯dDFC

and denote it (dF). There is a canonical projection functor π:(dF)C, which we can compose with I:CSet to obtain a functor (dF) → Set. We are ready to define ΠF(I)(d) to be its limit,

ΠF(I)(d)lim(dF)Iπ.

ΠF(I):DSet has been defined on objects dOb(D), and morphisms are treated only briefly; see Spivak [38] for details. Given a morphism g : dd′, there is an induced functor (gF) : (d′ ↓ F) → (dF) and a commutative diagram of categories:

art

By the universal property for limits, this induces the required function

lim(dF)IπΠF(I)(g)lim(dF)Iπ.

Proposition 7.1.4.12. Left adjoints are closed under composition, as are right adjoints. That is, given adjunctions,

CRLDRL

their composite is also an adjunction:

CRRLL.

Proof. This is a straightforward calculation. For any objects cOb(C) and e ∈ Ob(E) we have adjunction isomorphisms:

Hom(L(L(c)),e)HomD(L(c),R(e))HomC(c,R(R(e)))

whose composite is the required adjunction isomorphism. It is natural in our choice of objects c and e.

Example 7.1.4.13 (Currying via Δ, Σ, Π). This example shows how currying (as in Sections 3.4.2 and 7.1.1.8) arises out of a certain combination of data migration functors.

Let A, B, and C be sets. Consider the unique functor a: A1 and consider B and C as functors 1¯BSet and 1¯CSet respectively.

art

Note that 1SetSet, and we elide the difference.

We know that Σa is left adjoint to Δa and that Δa is left adjoint to Πa, so by Proposition 7.1.4.12, the composite Σa ○ Δa is left adjoint to ΠaΔa. The goal is to see currying arise out of the adjunction isomorphism

HomSet(ΣaΔa(B),C)HomSet(B,ΠaΔa(C)).(7.5)

By definition, Δa(B): ASet assigns to each element aA the set B. Since ΣA takes disjoint unions, we have a bijection

Σa(Δa(B))=(aAB)A×B.

Similarly, Δa(C): ASet assigns to each element aA the set C. Since ΠA takes products, we have a bijection

Πa(Δa(C))=(aAC)CA.

The currying isomorphism HomSet(A × B, C) ≅ HomSet(B, CA) falls out of (7.5).

7.2   Categories of functors

For any two categories C and D,8 Section 5.3.2.1 discussed the category Fun(C,D) of functors and natural transformations between them. This section discusses functor categories a bit more and gives some important applications in mathematics (sheaves) that extend to the real world.

7.2.1   Set-valued functors

Let C be a category. We have been denoted by CSet the functor category Fun(C,Set). Here is a nice result about these categories.

Proposition 7.2.1.1. Let C be a category. The category CSet is closed under colimits and limits. That is, for any category I and functor D:ICSet, both the limit and the colimit of D exist in CSet.

Sketch of proof. We rely on the fact that the category Set is complete and cocomplete (see Remark 6.1.3.37), i.e., that it has all limits and colimits (see Theorems 6.1.3.28 and 6.1.3.35 for constructions). Let J be an indexing category and D:JCSet a functor. For each object cOb(C), we have a functor Dc:JSet defined by Dc(j) = D(j)(c). Define a functor L:CSet by L(c) = limJ Dc, and note that for each f : cc′ in C there is an induced function L(f): L(c) → L(c′). One can check that L is a limit of J, because it satisfies the relevant universal property.

The dual proof holds for colimits.

Application 7.2.1.2. When taking in data about a scientific subject, one often finds that how one thinks about the problem changes over time. We understand this phenomenon in the language of databases in terms of a series of schemas C1,C2,,Cn+1, perhaps indexed chronologically. The problem is that previously-collected data is held in what may be outdated schemas, and we want to work with it in our current understanding. By finding appropriate functors between these schemas, or possibly with the help of auxiliary schemas, we can make a chain of categories and functors

C1F1D1G11H1C2F2D2G22H2GnnHnCn+1.

We can then use the data migration functors ΔF, ΠG, and ΣH to move data from category C1 to category Cn+1 using projections, joins, and unions in any combination. Theorems about sequences of Δ’s, Π’s, and Σ’s can help us understand how such a transformation will behave, before we spend the resources to enact it.

Exercise 7.2.1.3.

By Proposition 7.2.1.1, the category CSet is closed under taking colimits and limits. By Exercises 6.1.3.24 and 6.1.3.34, this means in particular, that CSet has an initial object and a terminal object.

a. Let AOb(CSet) be the initial object, considered as a functor A:CSet. For any cOb(C), what is the set A(c)?

b. Let ZOb(CSet) be the terminal object, considered as a functor Z:CSet. For any cOb(C), what is the set Z(c)?

Proposition 7.2.1.1 says that we can add or multiply database instances together. In fact, database instances on C form a topos, which means that just about every consideration we made for sets holds for instances on any schema. Perhaps the simplest schema is art, on which the relevant topos art is indeed equivalent to Set. But schemas can be arbitrarily complex categories, and it is impressive that all these set-theoretic notions make sense in such generality. Here is a table that compares these domains:

Dictionary between Set and CSet

Concept in Set Concept in CSet
Set Object in CSet
Function Morphism in CSet
Element Representable functor
Empty set Initial object
Natural numbers Natural numbers object
Image Image
(Co)limits (Co)limits
Exponential objects Exponential objects
“Familiar” arithmetic “Familiar” arithmetic
Power-sets 2X Power objects ΩX
Characteristic functions Characteristic morphisms
Surjections, injections Epimorphisms, monomorphisms

Thus elements of a set are akin to representable functors in CSet, which are defined in Section 7.2.1.6. We briefly discuss monomorphisms and epimorphisms first in general (Definition 7.2.1.4) and then in CSet (Proposition 7.2.1.5).

Definition 7.2.1.4 (Monomorphism, epimorphism). Let S be a category, and let f : XY be a morphism. We say that f is a monomorphism if it has the following property. For all objects AOb(S) and morphisms g, g′ : AX in S,

art

if fg = fg′, then g = g′.

We say that f : XY is an epimorphism if it has the following property. For all objects BOb(S) and morphisms h, h′ : YB in S,

art

if hf = h′ ○ f, then h = h′.

In the category of sets, monomorphisms are the same as injections, and epimorphisms are the same as surjections (see Proposition 3.4.5.8). The same is true in CSet: one can check table by table that a morphism of instances is mono or epi.

Proposition 7.2.1.5. Let C be a category, let X,Y:CSet be objects in CSet, and let f : XY be a morphism in CSet. Then f is a monomorphism (resp. an epimorphism) if and only if for every object cOb(C), the function f(c): X(c) → Y(c) is injective (resp. surjective).

Sketch of proof. We first show that if f is mono (resp. epi), then so is f(c), for all cOb(C). Considering c as a functor c:1¯C, this result follows from the fact that Δc preserves limits and colimits, hence monos and epis.

We now check that if f(c) is mono for all cOb(C), then f is mono. Suppose that g, g′ : AX are morphisms in CSet such that fg = fg′. Then for every c, we have fg(c) = fg′(c), which implies by hypothesis that g(c) = g′(c). But the morphisms in CSet are natural transformations, and if two natural transformations g, g′ have the same components, then they are the same.

A similar argument works to show the analogous result for epimorphisms.

7.2.1.6   Representable functors

Given a category C, there are certain functors CSet that come with the package, i.e., that are not arbitrary from a mathematical perspective as database instances usually are. In fact, there is a certain instance corresponding to each object in C. So if C is a database schema, then for every table cOb(C) there is a certain database instance associated to it. These instances, i.e., set-valued functors, are called representable functors (see Definition 7.2.1.7). The idea is that if a database schema is a conceptual layout of types (e.g., as an olog), then each type c has an instance associated to it, standing for “the generic thing of type c with all its generic attributes.”

Definition 7.2.1.7. Let C be a category, and let cOb(C) be an object. The functor HomC(c,):CSet, sending dOb(C) to the set HomC(c,d) and acting similarly on morphisms dd′, is said to be represented by c. If a functor F:CSet is isomorphic to HomC(c,), we say that F is a representable functor. To shorten notation we sometimes write

YcHomC(c,).

Example 7.2.1.8. Given a category C and an object cOb(C), we get a representable functor Yc. If we think of C as a database schema and c as a table, then what does the representable functor Yc:CSet look like in terms of databases? It turns out that the following procedure will generate it.

Begin by writing a new row, say, “☺,” in the ID column of table c. For each foreign key column f : cc′, add a row in the ID column of table c′ called “f(☺)” and record that result, “f(☺),” in the f column of table c. Repeat as follows: for each table d, identify all rows r that have a blank cell in column g : de. Add a new row called “g(r)” to table e and record that result, “g(r),” in the (r, g) cell of table d.

Here is a concrete example. Let C be the following schema:

art

Then YB:CSet is given by “morphisms from B to −,” i.e., it is the following instance:

art

To create YB we began with a single element in table B and followed the arrows, putting new entries wherever they were required. One might call this the schematically implied reference spread or SIRS of the element ☺ in table B. Notice that the table at A is empty, because there are no morphisms BA in C.

Representable functors Yc yield database instances that are as free as possible, subject to having the initial row ☺ in table c. We saw this before (as Skolem variables) when studying the left pushforward Σ. Indeed, suppose cOb(C) is an object represented by the functor c:1¯C. A database instance on 1 is the same thing as a set X. The left pushforward Σc(X) has the same kinds of Skolem variables as Yc does. In fact, if X = {☺} is a one-element set, then we get the representable functor

YcΣc{}.

Exercise 7.2.1.9.

Consider the schema for graphs,

art

a. Write the representable functor YAr : GrInSet as two tables.

b. Write the representable functor YVe as two tables.

Solution 7.2.1.9.

a. This was done in Exercise 5.3.3.7, although not with the most natural names. Here we rewrite YAr = HomGrIn(Ar, −) as

art

b. Here is YVe = HomGrIn(Ve, −) with “natural names”:

art

(The left-hand table is empty because there are no morphisms VeAr in GrIn.)

Exercise 7.2.1.10.

Consider the loop schema

art

Express the representable functor Ys:LoopSet in table form.

Solution 7.2.1.10.

We have Ys=HomLoop(s,):LoopSet. On objects, of which there is only Ob(Loop)={s}, we have Ys(s) = {fn | n ∈ ℕ}. The morphism f : ss acts on Ys(s) by composing. Here is Ys in table form:

s

ID f
f(☺)
f(☺) f2(☺)
f2(☺) f3(☺)
f3(☺) f4(☺)
f4(☺) f5(☺)

Let B be a box in an olog, say, ⌜a person⌝, and recall that an aspect of B is an outgoing arrow, such as a personhas as height in inchesan integer. The following slogan explains representable functors in those terms.

Slogan 7.2.1.11.

The functor represented by ⌜a person⌝ simply leaves a placeholder, likeperson’s name hereorperson’s height here〉, for every aspect of ⌜a person⌝. In general, there is a representable functor for every type in an olog. The representable functor for type T simply encapsulates the most generic or abstract example of type T, by leaving a placeholder for each of its attributes.

Exercise 7.2.1.12.

Recall from Definition 7.2.1.7 that a functor F:CSet is said to be represented by c if there is a natural isomorphism FHomC(c,).

a. There is a functor Ob: CatSet (see Exercise 5.1.2.41) sending a category C to its set Ob(C) of objects, and sending a functor to its on-objects part. This functor is representable by some category. Name a category A that represents Ob.

b. There is a functor Hom: CatSet (see Exercise 5.1.2.42) sending a category C to the set HomC of all morphisms in C and sending a functor to its on-morphisms part. This functor is representable by a category. Name a category B that represents Hom.

Exercise 7.2.1.13.

Let C be a category, let c,cOb(C) be objects, and let Yc,Yc:CSet be the associated representable functors. Given f : cc′, we want to construct a morphism Yf : YcYc in Fun(CSet). Of course, Yf is supposed to be a natural transformation, so we need to provide a component (Yf)d for every object dOb(C).

a. What must the domain and codomain of (Yf)d be? (Simplify your answer using Definition 7.2.1.7.)

b. Can you make sense of the statement, “Define (Yf)d by precomposition”?

c. If h: de is a morphism in C, draw the naturality square for Yf. Does it commute?

Solution 7.2.1.13.

a. We have (Yf)d : Yc(d) → Yc(d). But by definition, this is (Yf)d:HomC(c,d)HomC(c,d).

b. Given an element gHomC(c,d), we can precompose with f to get a morphism cfcgd, so let’s define (Yf)d(g) = gf.

c. The naturality square is as follows

art

and it commutes because, for any element gYc(d), the composition cfcgdhe is associative. More explicitly, going down then right we have (Yf)d(g) = gf and Yc(h)(gf) = h ○ (gf). Going right then down we have Yc(h)(g) = hg and (Yf)e(hg) = (hg) ○ f. To reiterate, the associativity of composition in C insures that this square commutes.

7.2.1.14   Yoneda’s lemma

One of the most powerful tools in category theory is Yoneda’s lemma. It is often considered by students to be quite abstract, but grounding it in databases may help.

Suppose that I:CSet is an arbitrary database instance, let cOb(C) be an object, and let f : cc′ be any outgoing arrow. Because I is a functor, we know that for every row rI(c) in table c, a value has been recorded in the f column. The value in the (r, f) cell refers to some row in table c′. That is, each row in table c induces SIRS throughout the database as freely as possible (see Example 7.2.1.8). The instance Yc consists entirely of a single row ☺ in table c and its SIRS. The idea is that for any row rI(c) in arbitrary instance I, there exists a unique map YcI sending ☺ to r.

Proposition 7.2.1.15 (Yoneda’s lemma, part 1). Let C be a category, cOb(C) an object, and I:CSet a set-valued functor. There is a natural bijection

HomCSet(Yc,I)I(c).

Proof. See Mac Lane [29].

Example 7.2.1.16. Consider the category C drawn as follows:

art

There are two representable functors, YChild and YMother. The former, YChild:CSet, is shown here:

art

The representable functor YChild is the freest instance possible, starting with one element in the Child table and satisfying the constraints. The latter, YMother is the freest instance possible, starting with one element in the Mother table and satisfying the constraints. Since mother○firstChild=idMother, this instance has just one row in each table:

art

Here is an arbitrary instance I:CSet:

art

Yoneda’s lemma (7.2.1.15) is about the set of natural transformations YChildI. Recall from Definition 5.3.1.2 that a search for natural transformations can get tedious. Yoneda’s lemma makes the calculation quite trivial. In this case there are exactly four such natural transformations, HomCSet(YChild,I)I(Child)4¯, and they are completely determined by where ☺ goes. In some sense the symbol ☺ in YChild represents childness in this database.

Exercise 7.2.1.17.

Consider the schema C and instance I:CSet from Example 7.2.1.16. Let YChild be the representable functor, and write (☺ ↦ Amy) for the unique natural transformation YChildI sending ☺ to Amy, and so on.

a. What is (☺ ↦ Amy)Child(firstChild(mother(☺)))?9

b. What is (☺ ↦ Bob)Child(firstChild(mother(☺)))?

c. What is (☺ ↦ Carl)Child(firstChild(mother(☺)))?

d. What is (☺ ↦ Amy)Mother(mother(☺))?

e. In parts (a)–(d), what information does the first subscript (Child, Child, Child, Mother) give you about the answer?

Section 7.2.1.6 showed that a representable functor CSet is a mathematically generated database instance for an abstract thing of type TOb(C). It creates placeholders for every attribute that things of type T are supposed to have.

Slogan 7.2.1.18.

Yoneda’s lemma says the following. Specifying an actual thing of type T is the same as filling in all placeholders found in the generic thing of type T.

Yoneda’s lemma is considered by many category theorists to be the most important tool in the subject. While its power is probably unclear to students whose sole background in category theory comes from this book, Yoneda’s lemma is indeed extremely useful for reasoning. It allows us to move the notion of functor application into the realm of morphisms between functors (i.e., morphisms in CSet, which are natural transformations). This keeps everything in one place—it is all in the morphisms—and thus more interoperable.

Example 7.2.1.19. Example 4.1.1.27 discussed the cyclic monoid M generated by the symbol Q and subject to the relation Q7 = Q4, depicted as

art

Here is the mathematical foundation for this picture. Since M is a category with one object, ▲, there is a unique representable functor (up to isomorphism) YY:MSet. Any functor MSet can be thought of as a set with an M action (see Section 5.2.1.1). In the case of Y , the required set is

Y()=HomM(,){Q0,Q1,Q2,Q3,Q4,Q5,Q6},

and the action is pretty straightforward (it is called the principal action). For example, art. We might say that (7.6) is a picture of this principal action of M.

However, we can go one step further. Given the functor Y:MSet, we can take its category of elements, MY (see Section 6.2.2). The category MY has objects Y(▲) ∈ Ob(Set), i.e., the set of dots in (7.6), and it has a unique morphism QiQj for every path of length ⩽ 6 from Qi to Qj in that picture. So the drawing of M in (7.6) is actually the category of elements of M’s unique representable functor.

Exercise 7.2.1.20.

Let C be a category, let cOb(C) be an object, and let IOb(CSet) be in instance of C. Consider c also as a functor c:1¯C and recall the pullback functor Δc:CSetSet and its left adjoint Σc:SetCSet (see Section 7.1.4).

a. What is the set Δc(I)?

b. What is HomSet({☺}, Δc(I))?

c. What is HomCSet(Σc({}),I)?

d. How does Σc({☺}) compare to Yc, the functor represented by c, as objects in CSet?

Proposition 7.2.1.21 (Yoneda’s lemma, part 2). Let C be a category. The assignment cYc from Proposition 7.2.1.15 extends to a functor Y:CopCSet, and this functor is fully faithful.

In particular, if c,cOb(C) are objects and there is an isomorphism YcYc in CSet, then there is an isomorphism ccin C.

Proof. See Mac Lane [29].

Exercise 7.2.1.22.

The distributive law for addition of natural numbers says c × (a + b) = c × a + c × b. Following is a proof of the distributive law using category-theoretic reasoning. Annotate anything shown in red with a justification for why it is true.

Proposition (Distributive law). For any natural numbers a, b, c ∈ ℕ, the distributive law holds:

c(a+b)=ca+cb.

Sketch of proof. To finish, justify things shown in red.

Let A, B, C be finite sets, and let X be another finite set.

HomSet(C × (A + B), X) HomSet(A + B, XC)
HomSet(A, XC) × HomSet(B, XC)
HomSet(C × A, X) × HomSet(C × B, X)
HomSet((C × A) + (C × B), X).

By the appropriate application of Yoneda’s lemma, we see that there is an isomorphism

C×(A+B)(C×A)+(C×B)

in Fin. The result about natural numbers follows.

7.2.1.23   The subobject classifier ΩOb(CSet)

If C is a category, then the functor category CSet is a special kind of category, called a topos. Note that when C=1¯ is the terminal category, then we have an isomorphism 1SetSet, so the category of sets is a special case of a topos. What is interesting about toposes (or topoi) is that they generalize many properties of Set. This short section investigates only one such property, namely, that CSet has a subobject classifier, denoted ΩOb(CSet). In the case C=1¯ the subobject classifier is {True, False} ∈ Ob(Set) (see Definition 3.4.4.9).

As usual, we consider the matter of subobject classifiers by grounding the discussion in terms of databases. The analogue of {True, False} for an arbitrary database can be quite complex—it encodes the whole story of relational database instances for that schema.

Definition 7.2.1.24. Let C be a category, let CSet denote its category of instances, and let 1COb(CSet) denote the terminal object. A subobject classifier for CSet is an object ΩCOb(CSet) and a morphism t:1CΩC with the following property. For any monomorphism f : IJ in CSet, there exists a unique morphism char(f):JΩC such that the following diagram is a pullback in CSet:

art

That is, for any instance J there is a bijection

HomCSet(J,Ω){IOb(CSet)|IJ}.

In terms of databases, what this means is that for every schema C, there is some special instance ΩCOb(CSet) that somehow classifies subinstances of anything. When the schema is the terminal category, C=1¯, instances are sets and according to Definition 3.4.4.9 the subobject classifier is Ω1 = {True, False}. One might think that the subobject classifier for CSet should just consist of a two-element set table by table, i.e., that for every cOb(C), we should have ΩC=?{True,False}, but this is not correct.

In fact, for any object cOb(C), there is a way to figure out what ΩC(c) has to be. We know by Yoneda’s lemma (Proposition 7.2.1.15) that ΩC(c)=HomCSet(Yc,ΩC), where Yc is the functor represented by c. There is a bijection between HomCSet(Yc,ΩC) and the set of subinstances of Yc. Thus we have

ΩC(c)={IOb(CSet)|IYc}.(7.7)

How should ΩC:CSet behave on morphisms? By Exercise 7.2.1.13, each morphism f : cd in C induces a morphism Yf : YdYc, and the map ΩC(f):ΩC(c)ΩC(d) sends a subinstance AYc to the pullback

art

That is, ΩC(f)(A)=Yf1(A).

We have now fully described ΩC as a functor, but the description is very abstract. Here is an example of a subobject classifier.

Example 7.2.1.25. Consider the following category C[3]:

art

To write ΩC, we need to understand the representable functors YcOb(CSet), for c = 0, 1, 2, 3, as well as their subobjects. Here is Y0 as an instance:

art

What are the subinstances of this? There is the empty subinstance ∅ ⊆ Y0 and the identity subinstance Y0Y0. But there are three more as well. Note that if we want to keep the ☺ row of table 0, then we have to keep everything. But if we throw away the ☺ row of table 0, we can still keep the rest and get a subinstance. If we want to keep the after_1(☺) row of table 1, then we have to keep its images in tables 2 and 3. But we could throw away both the ☺ row of table 0 and the after_1(☺) row of table 1 and still keep the rest. And so on. In other words, there are five subobjects of Y0, i.e., elements of ΩC(0), but they are hard to name. We arbitrarily name them by ΩC(0){yes, wait 1, wait 2, wait 3, never}.

The same analysis holds for the other tables of ΩC. For example, we denote the three subinstances of Y2 by ΩC(2)={yes, wait 1, never}. In sum, the database instance ΩC is:

art

The morphism 1ΩC picks out the yes row of every table.

Now that we have constructed ΩCOb(CSet), we are ready to use it. What makes ΩC special is that for any instance X:CSet, the subinstances if X are in one-to-one correspondence with the instance morphisms XΩC. Consider the following arbitrary instance X, where the blue rows denote a subinstance AX.

art

This blue subinstance AX corresponds to a natural transformation char(A):XΩC. That is, for each cOb(C), all the rows in the c table of X are sent to the rows in the c table of ΩC, as they would be for any natural transformation. The way char(A) works is as follows. For each table i and row xX(i), find the first column f in which the entry is blue (i.e., f(x) ∈ A), and send x to the corresponding element of ΩC(i). For example, char(A)(0) sends a1 to wait 2 and sends a4 to never, and char(A)(2) sends c1 to yes and sends c2 to never.

Exercise 7.2.1.26.

a. Write the blue subinstance AX shown in (7.9) as an instance of C, i.e., as four tables.

b. This subinstance AX corresponds to a map char(A):XΩC. For all cOb(C), we have a function (c):X(c)ΩC(c). With c = 1, write out (1):X(1)ΩC(1).

Exercise 7.2.1.27.

Let Loop be the loop schema

art

a. What is the subobject classifier ΩLoopOb(LoopSet)? (Write it out in table form.)

b. In Exercise 7.2.1.10 you computed the representable functor Ys. How does ΩLoop compare to Ys?

c. Consider the discrete dynamical system X and its subset WX:

art

What is the morphism char(W):XΩLoop that corresponds to this subobject?

Exercise 7.2.1.28.

Let art be the indexing category for graphs.

a. Write the subobject classifier ΩGrIn ∈ Ob(GrInSet) in tabular form, i.e., as two tables.

b. Draw ΩGrIn as a graph.

c. Let G be the following graph and G′ ⊆ G the blue part.

art

Write G ∈ Ob(GrInSet) in tabular form.

d. Write the components of the natural transformation char(G′): G → ΩGrIn.

7.2.2   Database instances in other categories

So far we have focused on the category CSet=Fun(C,Set) of set-valued functors CSet for arbitrary categories, or database schemas, C. What if we allow the target category Set to change?

7.2.2.1   Representations of groups

The classical mathematical subject of representation theory is the study of Fun(G, Vect), where G is a group and Vect is the category of vector spaces (over, say, ℝ). Every such functor F : GVect is called a representation of G. Since G is a category with one object ▲, the functor F provides a single vector space V = F (▲) together with an action of G on it.

We can think of this in terms of databases if we have a presentation of G in terms of generators and relations. The schema corresponding to G has one table, and this table has a column for each generator (see Section 4.1.3). Giving a representation F is the same as giving an instance on the schema, with some properties that stem from the fact that the target category is Vect rather than Set. There are many possibilities for expressing such data.

One possibility is if we could draw V , say, if V were one-, two-, or three-dimensional. If so, let P be the chosen picture of V , e.g., P is the standard drawing of a Cartesian coordinate plane V = ℝ2. Then every column of the table would consist entirely of the picture P instead of a set of rows. Touching a point in the ID column ℝ2 would result in a point being drawn in the ℝ2 corresponding to the other column, in accordance with the G action. Each column would, of course, respect addition and scalar multiplication.

Another possibility is to use the fact that there is a functor U : VectSet, so the instance F : GVect could be converted to an ordinary instance UF : GSet. We would have an ordinary set of rows. This set would generally be infinite, but it would be structured by addition and scalar multiplication. For example, assuming V is finite-dimensional, one could find a few rows that generated the rest.

A third possibility is to use monads, which would allow the table to have only as many rows as V has dimensions. This yields a considerable saving of space. See Section 7.3. In all these possibilities, the usual tabulated format of databases has been slightly altered to accommodate the extra information in a vector space.

7.2.2.2   Representations of quivers

Representation theory also studies representations of quivers. A quiver is just the free category (see Example 5.1.2.33) on a graph. If P is a graph with free category P, then a representation of the quiver P is a functor F:PVect. Such a representation consists of a vector space at each vertex of P and a linear transformation for each arrow. All the discussion in Section 7.2.2.1 works in this setting, except that there is more than one table.

7.2.2.3   Other target categories

One can imagine the value of using target categories other than Set or Vect for databases.

Application 7.2.2.4. Geographic data consists of maps of the earth together with various functions on it. For example, for any point on the earth one may want to know the average of temperatures recorded in the past ten years or the precise temperature at this moment. Earth can be considered as a topological space, E. Similarly, temperatures on earth reside on a continuum, say, the space T of real numbers [−100, 200]. Thus the temperature record is a continuous function ET .

Other records such as precipitation, population density, elevation, and so on, can all be considered as continuous functions from E to some space. Agencies like the U.S. Geological Survey hold databases of such information. By modeling them on functors CTop, they may be able to employ mathematical tools such as persistent homology (see Weinberger [44]) to find interesting invariants of the data.

Application 7.2.2.5. Application 7.2.2.4 discussed using topological database instances to model geographical data. Other scientific disciplines could use the same kind of tool. For example, in studying the mechanics of materials, one may want to consider the material as a topological space M and measure values such as energy as a continuous map ME. Such observations could be modeled by databases with target category Top or Vect rather than Set.

7.2.3   Sheaves

Let X be a topological space (see Example 5.2.3.1), such as a sphere. Section 7.2.2.3 discussed continuous functions out of X and their use in science (e.g., recording temperatures on the earth as a continuous map X → [−100, 200]). Sheaves allow us to consider the local-global nature of such maps, taking into account reparable discrepancies in data-gathering tools.

Application 7.2.3.1. Suppose that X is the topological space corresponding to the earth, and let region mean an open subset UX. Suppose that we cover X with 10,000 regions U1, U2, …, U10000, such that some of the regions overlap in a nonempty subregion (e.g., U5U9 ≠ ∅). For each i, j, let Ui,j = UiUj.

For each region UiX, we have a temperature-recording device, which gives a function Ti : Ui → [−100, 200]. If UiUj ≠ ∅, then two different recording devices give us temperature data for the intersection Ui,j. Suppose we find that they do not give precisely the same data but that there is a translation formula between their results. For example, Ti might register 3 warmer than Tj registers, throughout the region UiUj.

Roughly speaking, a consistent system of translation formulas is called a sheaf. It does not demand a universal true temperature function but only a consistent translation system between them.

Definitions 7.2.3.2 and 7.2.3.5 make the notion of sheaf precise, but it is developed slowly at first.

For every region U, we can record the value of some function (say, temperature) throughout U. Although this record might consist of a mountain of data (a temperature for each point in U), it can be thought of as one thing. That is, it is one element in the set of “value assignments throughout U”. A sheaf holds the set of “value assignments throughout U” for each region U as well as how a “value assignment throughout U” restricts to a “value assignment throughout V ” for any subset VU.

Definition 7.2.3.2. Let X be a topological space, let Open(X) denote its partial order of open sets, and let Open(X)op be the opposite category. A presheaf on X is a functor O:Open(X)opSet. For every open set UX, we refer to the set O(U) as the set of value assignments throughout U of O. If VU is an open subset, it corresponds to an arrow in Open(X), and applying the functor O yields a function called the restriction map from U to V and denoted ρV,U:O(U)O(V). Given aO(U), we may denote ρV,U(a) by a|V; it is called the restriction of a to V.

The category of presheaves on X is simply Open(X)opSet (see Definition 5.3.3.1).

Exercise 7.2.3.3.

a. Find four overlapping open subsets that cover the square X ≔ [0, 3] × [0, 3] ⊆ ℝ2. Write a label for each open set as well as a label for each overlap (two-fold, three-fold, etc.). You now have labeled n open sets. What is your n?

b. Draw the preorder Open(X). For each of the n open sets, draw a dot with the appropriate label. Then draw an arrow from one dot to another when the first refers to an open subset of the second. This is Open(X).

c. Make up and write formulas R1 : X → ℝ and R2 : X → ℝ with R1(x) ⩽ R2(x) for all xX, expressing a range of temperatures R1(p) ⩽ Temp(p) ⩽ R2(p) that an imaginary experiment shows can exist at each point p in the square. What is the temperature range at p = (2, 1) ∈ X?

d. Make a presheaf O:Open(X)opSet as follows. For each of your open sets, say, A ∈ Open(X), put

O(A){Temp:A|aA, R1(a)Temp(a)R2(a)}.

Call one of your n open sets A. What is O(A)? Then choose some A′ ⊆ A; what is O(A), and what is the restriction map ρA,A:O(A)O(A) in this case? Do you like the name “value assignment throughout A” for an element of O(A)?

Before moving to a definition of sheaves, we need to clarify the notion of covering. Suppose that U is a region and V1, …, Vn are subregions (i.e., for each 1 ⩽ in, we have ViU). Then we say that the Vi collectively cover U if every point in U is in Vi for some i. Another way to say this is that the natural function ⊔iViU is surjective.

Example 7.2.3.4. Let X = ℝ be the space of real numbers, and define the following open subsets: U = (5, 10), V1 = (5, 7), V2 = (6, 9), V3 = (8, 10).10 Then V1, V2, V3 collectively cover of U. It has overlaps V12 = V1V2 = (6, 7), V13 = V1V3 = Ø, V23 = V2V3 = (8, 9).

Given a presheaf O:Open(X)opSet, we have sets and functions as in the following diagram

art

A presheaf O on X tells us what value assignments throughout U can exist for each U. Suppose we have a value assignment a1O(V1) throughout V1 and another value assignment a2O(V2) throughout V2, and suppose they agree as value assignments throughout V1V2, i.e., a1|V1V2=a2|V1V2. In this case we should have a unique value assignment bO(V1V2) throughout V1V2 that agrees on the V1 part with a1 and agrees on the V2 part with a2; i.e., b|V1=a1 and b|U2=a2. The condition that such equations hold for every covering is the sheaf condition.

For example, the elements of O(U) might be functions h : U → ℝ, each of which we imagine as a curve defined on the interval U = (5, 10). The sheaf condition says that if one is given a curve-snippet over (5, 7), a curve-snippet over (6, 9), and a curve snippet over (8, 10), and these all agree on overlap intervals (6, 7) and (8, 9), then they can be put together to form a curve over all of U.

Definition 7.2.3.5. Let X be a topological space, let Open(X) be its partial order of open sets, and let O:Open(X)opSet be a presheaf. Given an open set UX and a cover V1, …, Vn of U, the following condition is called the sheaf condition for that cover.

Sheaf condition Given a sequence a1, …, an, where each aiO(Vi) is a value assignment throughout Vi, suppose that for all i, j, we have ai|ViVj=aj|ViVj; then there is a unique value assignment bO(U) such that b|Vi=ai.

The presheaf O is called a sheaf if it satisfies the sheaf condition for every cover.

Remark 7.2.3.6. Application 7.2.3.1 said that sheaves help us patch together information from different sources. Even if different temperature-recording devices Ti and Tj registered different temperatures on an overlapping region UiUj, they could be patched together if given a consistent translation system between their results. What is actually needed is a set of isomorphisms

pi,j:Ti|Ui,jTj|Ui,j

that translate between them, and that these pi,j’s act in concert with one another. This (when precisely defined) is called descent data. The way it interacts with the definition of sheaf given in Definitions 7.2.3.2 and 7.2.3.5 is buried in the restriction maps ρ for the overlaps as subsets Ui,jUi and Ui,jUj (see Grothendieck and Raynaud [18] for details).

Application 7.2.3.7. Consider outer space as a topological space X. Different amateur astronomers record observations of what they see in X on a given night. Let C = [390, 700] denote the set of wavelengths in the visible light spectrum (written in nanometers). Given an open subset UX, let O(U) denote the set of functions UC. The presheaf O satisfies the sheaf condition; this is the taken-for-granted fact that we can patch together different observations of space.

Figure 7.1 (see page 377) shows three views of the night sky. Given a telescope position to obtain the first view, one moves the telescope right and a little down to obtain the second, and one moves it down and left to obtain the third. These are value assignments a1O(V1),a2O(V2), and a3O(V3), throughout subsets V1, V2, V3X (respectively). These subsets V1, V2, V3 cover some (strangely shaped) subset UX. Because the restriction of a1 to V1V2 is equal to the restriction of a2 to V1V2, and so on, the sheaf condition says that these three value assignments glue together to form a single value assignment throughout U, as shown in Figure 7.2 (see page 378).

Exercise 7.2.3.8.

Find an application of sheaves in your own domain of expertise.

Application 7.2.3.9. Suppose we have a sheaf for temperatures on earth. For every region U, we have a set of theoretically possible temperature assignments throughout U. For example, we may know that if it is warm in Texas, warm in Arkansas, and warm in Kansas, then it cannot be cold in Oklahoma. With such a sheaf O in hand, one can use facts about the temperature in one region U to predict the temperature in another region V.

The mathematics is as follows. Suppose given regions U, VX and a subset AO(U) corresponding to what we know about the temperature assignment throughout U. We take the following fiber product:

art

The image of the top composite im((ρU,X)1(A)O(V)) is a subset of O(V) telling us which temperature assignments are possible throughout V, given our knowledge A about the temperature throughout U.

We can imagine the same type of prediction systems for other domains as well, such as the energy of various parts of a material.

Example 7.2.3.10. Exercises 5.2.4.3 and 5.2.4.4 discussed the idea of laws being dictated or respected throughout a jurisdiction. If X is earth, to every jurisdiction UX we assign the set O(U) of laws that are dictated to hold throughout U. Given a law on U and a law on V, we can see if they amount to the same law on UV. For example, on U a law might say, “no hunting near rivers” and on V a law might say, “no hunting in public areas.” It happens that on UV all public areas are near rivers, and vice versa, so the laws agree there. These laws patch together to form a single rule about hunting that is enforced throughout the union UV, respected by all jurisdictions within it.

7.2.3.11   Sheaf of ologged concepts

Definition 7.2.3.5 defines what should be called a sheaf of sets. We can discuss sheaves of groups or even sheaves of categories. Here is an application of the latter.

Recall the notion of simplicial complexes (see Section 3.4.4.3). They look like this:

art

Given such a simplicial complex X, we can imagine each vertex vX0 as an entity with a worldview (e.g., a person) and each simplex as the common worldview shared by its vertices. To model this, we assign to each vertex vX an olog O(v), corresponding to the worldview held by that entity, and to each simplex uXn, we assign an olog O(u) corresponding to a common ground worldview. Recall that X is a subset of ℙ(X0); it is a preorder and its elements (the simplices) are ordered by inclusion. If u, v are simplices with uv, then we want a map of ologs (i.e., a schema morphism) O(v)O(u). In this way the model says that any idea shared among the people in v is shared among the people in u. Thus we have a functor O:XSch (where we forget the distinction between ologs and databases for notational convenience).

To every simplicial complex (indeed every ordered set) one can associate a topological space; in fact, we have a functor Alx : PrOTop, called the Alexandrov functor. Applying Alx(Xop), we have a space denoted X. One can visualize X as X, but the open sets include unions of simplices. There is a unique sheaf of categories on X that behaves like O on simplices of X.

Example 7.2.3.12. Imagine two groups of people G1 and G2 each making observations about the world. Suppose there is some overlap H = G1G2. Then it may happen that there is a conversation including G1 and G2, and both groups are talking about something (though using different words). H says, “You guys are talking about the same things, you just use different words.” In this case there is an observation being made throughout G1G2 that agrees with both those on G1 and those on G2.

7.2.3.13   Time

One can use sheaves to model objects in time; Goguen [17] gave an approach to this. For an approach that more closely fits the flow of this book, let C be a database schema. The lifespan of information about the world is generally finite; that is, what was true yesterday is not always the case today. Thus we can associate to each interval U of time the information that we deem to hold throughout U. This is sometimes called the valid time of the data.

If data is valid throughout U and we have a subset VU, then of course it is valid throughout V. And the sheaf condition holds too. If some information is valid throughout U, and some other information is valid throughout U′, and if these two things restrict to the same information on the overlap UV, then they can be glued together to form information that is valid throughout the union UV.

So we can model information change over time by using a sheaf of C-sets on the topological space ℝ. In other words, for every time interval, we give an C-instance whose information is valid throughout that time interval. Definition 7.2.3.5 only defined sheaves with values in Set; we are now generalizing to sheaves in CSet. Namely we consider functors Open(ℝ) → CSet satisfying the same sheaf condition.

Example 7.2.3.14. Consider a hospital in which babies are born. In our scenario, mothers enter the hospital, babies are born, mothers and babies leave the hospital. Let C be the schema

art

Consider the eight-hour intervals

Shift1(Jan 1, 00:0008:00),Shift2(Jan 1, 04:0012:00),Shift3(Jan 1, 08:0016:00).

The nurses take shifts of eight hours, overlapping with their predecessors by four hours, and they record in the database only patients that were there throughout their shift or throughout any overlapping shift. Here is the schema:

art

Whether or not this implementation of the sheaf semantics is most useful in practice is certainly debatable. But something like this could easily be useful as a semantics, i.e., a way of thinking about, the temporal nature of data.

7.3   Monads

Monads would probably not have been invented without category theory, but they have been useful in understanding algebraic theories, calculating invariants of topological spaces, and embedding nonfunctional operations into functional programming languages. We mainly discuss monads in terms of how they can help one make explicit a given modeling context and in so doing allow one to simplify the language used in such models. We use databases to give concrete examples.

Much of the following material on monads is taken from Spivak [40].

7.3.1   Monads formalize context

Monads can formalize assumptions about the way one does business throughout a domain. For example, suppose we want to consider functions that are not required to return a value for all inputs. These are not valid functions as defined in Section 2.1.2 (because they are not total), but in math classes one wants to speak of f(x)=1x and g(x) = tan(x) as though they were functions ℝ → ℝ, so that they can be composed without constantly paying attention to domains.

Functions that are not required to be defined throughout their domain are called partial functions. We all know how they should work, so we need a way to make it mathematically legal. Monads, and the Kleisli categories to which they give rise, provide us with a way to do so. In particular, we will be able to formally discuss the composition 1xtan(x).

Here we are drawing arrows between sets as though we were talking about total functions, but there is an implicit context in which we are actually talking about partial functions. Monads allow us to write maps between sets in the functional way while holding the underlying context. What makes them useful is that the notion of context we are using here is made formal.

Example 7.3.1.1 (Partial functions). Partial functions can be modeled by ordinary functions if we add a special “no answer” element to the codomain. That is, the set of partial functions AB is in one-to-one correspondence with the set of ordinary functions AB ⊔ {☺}. For example, suppose we want to model the partial function fp(x)1x21: in this way; we would use the total function ft : ℝ → ℝ ⊔ {☺} defined as:

f(x){1x21if x1 and x1,if x=1,if x=1.

An ordinary function g : AB can be considered a partial function because we can compose it with the inclusion

BB{}.(7.11)

to get AB ⊔ {☺}.

But how do we compose two partial functions written in this way? Suppose f : AB ⊔ {☺} and g : BC ⊔ {☺} are functions. First form a new function

gg{}:B{}C{}{},

then compose to get (g′ ○ f) : AC ⊔ {☺} ⊔ {☺}, and finally send both ☺’s to the same element by composing with

C{}{}C{}.(7.12)

How should one think about composing partial functions gf? Every element aA is sent by f either to an element bB or to “no answer.” If it has an answer f(a) ∈ B, then this again is sent by g either to an element g(f(a)) ∈ C or to “no answer.” We get a partial function AC by sending a to g(f(a)) if possible or to “no answer” if it gets stopped along the way.

This monad is sometimes called the maybe monad in computer science, because a partial function f : AB takes every element of A and may output just an element of B or may output nothing; more succinctly, it outputs a “maybe B.”

Exercise 7.3.1.2.

a. Let f : ℤ → ℤ ⊔ {☺} be the partial function given by f(n)=1n2n. Calculate the following: f(−3), f(−2), f(−1), f(0), f(1), and f(2).

b. Let g : ℤ → ℤ ⊔ {☺} be the partial function given by

g(n)={n23if n1,if n<1

Write fg(n) for −3 ⩽ n ⩽ 2.

Application 7.3.1.3. Experiments are supposed to be performed objectively, but suppose we imagine that changing the person who performs the experiment, say, in psychology, may change the outcome. Let A be the set of experimenters, let X be the parameter space for the experimental variables (e.g., X = Age × Income), and let Y be the observation space (e.g., Y = propensity for violence). We want to think of such an experiment as telling us about a function f : XY (how age and income affect propensity for violence). However, we may want to make some of the context explicit by including information about who performed the experiment. That is, we are really finding a function f : X × AY.

Given a set P of persons, the experimenter wants to know the age and income of each, i.e., a function PX. However, it may be the case that even ascertaining this basic information, which is achieved merely by asking each person these questions, is subject to which experimenter in A is doing the asking. Then we again want to consider the experimenter as part of the equation, replacing the function PX with a function P × AX. In such a case, we can use a monad to hide the fact that everything in sight is assumed to be influenced by A. In other words, we want to announce, once and for all, the modeling context—that every observable is possibly influenced by the observer—so that it can recede into the background.

We return to this in Examples 7.3.2.6 and 7.3.3.4.

7.3.2   Definition and examples

What aspects of Example 7.3.1.1 are about monads, and what aspects are about partial functions in particular? Monads are structures involving a functor and a couple of natural transformations. Roughly speaking, the functor for partial functors was BB ⊔ {☺}, and the natural transformations were given in (7.11) and (7.12). This section gives the definition of monads and a few examples. We return to consider about how monads formalize context in Section 7.3.3.

Definition 7.3.2.1 (Monad). A monad on Set is defined as follows: One announces some constituents (A. functor, B. unit map, C. multiplication map) and shows that they conform to some laws (1. unit laws, 2. associativity law). Specifically, one announces

A. a functor T : SetSet,

B. a natural transformation η : idSetT,

C. a natural transformation μ : TTT.

We sometimes refer to the functor T as though it were the whole monad; we call η the unit map and μ the multiplication map. One must then show that the following monad laws hold:

  1. The following diagrams of functors SetSet commute:

    art

  2. The following diagram of functors SetSet commutes:

    art

Example 7.3.2.2 (List monad). We now go through Definition 7.3.2.1 using the List monad. The first step is to give a functor List: SetSet, which was done in Example 5.1.2.20. Recall that if X = {p, q, r}, then List(X) includes the empty list [ ], singleton lists such as [p], and any other list of elements in X such as [p, p, r, q, p]. Given a function f : XY, one obtains a function List(f) : List(X) → List(Y) by entrywise application of f, as in Exercise 5.1.2.22.

As a monad, the functor List comes with two natural transformations, a unit map η and a multiplication map μ. Given a set X, the unit map ηX : X → List(X) returns singleton lists as follows:

art

Given a set X, the multiplication map μX : List(List(X)) → List(X) concatenates lists of lists as follows:

art

The naturality of η and μ means that these maps work appropriately well under entrywise application of a function f : XY. Finally, the three monad laws from Definition 7.3.2.1 can be exemplified as follows:

art

art

Exercise 7.3.2.3.

Let ℙ : SetSet be the power-set functor, so that given a function f : XY, the function ℙ(f) : ℙ(X) → ℙ(Y) is given by taking images.

a. Make sense of the statement, “With η defined by singleton subsets and with μ defined by union, T(,η,μ) is a monad.”

b. With X = {a, b}, write the function ηX as a two-row, two-column table.

c. With X = {a, b}, write the function μX as a sixteen-row, two-column table (you can stop after five rows if you fully understand it).

d. Check that you believe the monad laws from Definition 7.3.2.1.

Solution 7.3.2.3.

a. The statement suggests that the components of η : idSet → ℙ can be defined using the concept of singleton subsets and that the components of μ : ℙ ○ ℙ → ℙ can be defined using the concept of union. Given a set X ∈ Ob(Set), we need a function ηX : X → ℙ(X), meaning that for every element xX, we need a subset of X. The statement suggests we send x to the singleton subset {x} ⊆ X. The statement also suggests that we obtain μX : ℙ(ℙ(X)) → ℙ(X) by sending a set of subsets to their union. For example, if X = {1, 2, 3, 4, 5}, then an element T ∈ ℙ(ℙ(X)) might look like {{1, 2}, Ø, {1, 3, 5}}; the union of these subsets is μX(T) = {1, 2, 3, 5}, a subset of X. It is not hard to check that the given η and μ are natural transformations. The statement now asserts that the power-set functor ℙ, together with these natural transformations, forms a monad.

b.)

ηX

X

ℙ(X)

a

{a}

b

{b}

c.)

μX

ℙ(ℙ(X))

ℙ(X)

Ø

Ø

{Ø}

Ø

{{a}}

{a}

{{b}}

{b}

{{a, b}}

{a, b}

{Ø, {a}}

{a}

{Ø, {b}}

{b}

{Ø, {a, b}}

{a, b}

{{a}, {b}}

{a, b}

{{a, {a, b}}}

{a, b}

{{b}, {a, b}}

{a, b}

{Ø, {a}, {b}}

{a, b}

{Ø, {a}, {a, b}}

{a, b}

{Ø, {b}, {a, b}}

{a, b}

{{a}, {b}, {a, b}}

{a, b}

{Ø, {a}, {b}, {a, b}}

{a, b}

d. The monad laws hold. One says that if we take all the singleton subsets of X and union them, we get X. Another says that if we take the singleton set consisting of the whole set X and union it, we get X. The last says that the union of unions is a union.

Example 7.3.2.4 (Partial functions as a monad). Here is the monad for partial functions, as discussed in Example 7.3.1.1. The functor T : SetSet sends a set X to the set X ⊔ {☺}. Clearly, given a function f : XY, there is an induced function (f ⊔ {☺}) : (X ⊔ {☺}) → (Y ⊔ {☺}), so this is a functor. The natural transformation η : id → T is given on a set X by the component function

ηX:XX{}

that includes XX ⊔ {☺}. Finally, the natural transformation μ : TTT is given on a set X by the component function

μX:X{}{}X{}

that collapses both copies of ☺.

Exercise 7.3.2.5.

Let E be a set with elements refered to as exceptions. We imagine exceptions as warnings like “overflow!” or “division by zero!” and we imagine that a function f : XY outputs either a value or one of these exceptions. Let T : SetSet be the functor XXE. Follow Example 7.3.2.4 and find a unit map η and a multiplication map μ for which (T, η, μ) is a monad.

Example 7.3.2.6. Fix a set A. Let T : SetSet be the functor given by T(X) = XA = HomSet(A, X); this is a functor. For a set X and an element xX, let cx : AX be the constant-x function, cx(a) = x for all aA. Define ηX : XT(X) to be given by the constant-x function, xcx.

Now we have to specify a natural transformation μ : TTT, i.e., for each X ∈ Ob(Set), we need to provide an X-component function

μX:(XA)AXA.

By currying (see Example 7.1.1.8), this is equivalent to providing a function (XA)A × AX. For any Y ∈ Ob(Set), we have an evaluation function (see Exercise 3.4.2.5) ev : YA × AY. We use it twice and find the desired function:

(XA)A×A   ev×idA   XA×A  ev   X.

Remark 7.3.2.7. Monads can be defined on categories other than Set. In fact, for any category C, one can take Definition 7.3.2.1 and replace every occurrence of Set with C and obtain the definition for monads on C. We have actually seen a monad (Paths, η, μ) on the category Grph of graphs before, namely, in Examples 5.3.1.15 and 5.3.1.16. That is, Paths: GrphGrph, which sends a graph to its paths-graph is the functor part. The unit map η includes a graph into its paths-graph using the observation that every arrow is a path of length 1. And the multiplication map μ concatenates paths of paths. The Kleisli category of this monad (see Definition 7.3.3.1) is used, e.g., in (5.17), to define morphisms of database schemas.

7.3.3   Kleisli category of a monad

We are on our way to understanding how monads are used in computer science and how they may be useful for formalizing methodological context. There is only one more stop along the way, called the Kleisli category of a monad. For example, when we apply this Kleisli construction to the partial functions monad (Example 7.3.2.4), we obtain the category of partial functions (see Example 7.3.3.2). When we apply the Kleisli construction to the monad XXA of Example 7.3.2.6 we get the psychological experiment example (Application 7.3.1.3) completed in Example 7.3.3.4.

Definition 7.3.3.1. Let T=(T,η,μ) be a monad on Set. Form a new category, called the Kleisli category for T, denoted Kls(T), with sets as objects, Ob(Kls(T))Ob(Set), and with

HomKls(T)(X,Y)HomSet(X,T(Y))

for sets X, Y. The identity morphism idX : XX in Kls(T) is given by η : XT(X) in Set. The composition of morphisms f : XY and g : YZ in Kls(T) is given as follows. Writing them as functions, we have f : XT(Y) and g : YT(Z). The first step is to apply the functor T to g, giving T(g) : T(Y) → T(T(Z)). Then compose with f to get T(g) ○ f : XT(T(Z)). Finally, compose with μZ : T(T(Z)) → T(Z) to get the required function XT(Z):

art

The associativity of this composition formula follows from the associativity law for monads.

Example 7.3.3.2. Recall the monad T for partial functions, T(X) = X ⊔ {☺}, from Example 7.3.2.4. The Kleisli category Kls(T) has sets as objects, but a morphism f : XY means a function XY ⊔ {☺}, i.e., a partial function. Given another morphism g : YZ, the composition formula in Kls(T) ensures that gf : XZ has the appropriate behavior.

Note how this monad allows us to make explicit a context in which all functions are assumed partial and then hide this context from our notation.

Remark 7.3.3.3. For any monad T=(T,η,μ) on Set, there is a functor i:SetKls(T), given as follows. On objects we have Ob(Kls(T))=Ob(Set), so take i = idOb(Set). Given a morphism f : XY in Set, we need a morphism i(f) : XY in Kls(T), i.e., a function i(f) : XT(Y). We assign i(f) to be the composite XfYηT(Y). The functoriality of this mapping follows from the unit law for monads.

Example 7.3.3.4. In this example we return to the setting laid out in Application 7.3.1.3, where we had a set A of experimenters and assumed that the person doing the experiment might affect the outcome. We use the monad T=(T,η,μ) from Example 7.3.2.6 and hope that Kls(T) will conform to the understanding of how to manage the effect of the experimenter on data.

The objects of Kls(T) are ordinary sets, but a map f : XY in Kls(T) is a function XYA. By currying, this is the same as a function X × AY, as desired. To compose f with g : YZ in Kls(T), we follow the formula from (7.13). It turns out to be equivalent to the following. We have a function X × AY and a function Y × AZ. Multiplying by idA, we have a function X × AY × A, and we can now compose to get X × AZ.

What does this say in terms of experimenters affecting data gathering? It says that if we work within Kls(T), then we may assume that the experimenter is being taken into account; all proposed functions XY are actually functions A × XY. The natural way to compose these experiments is that we only consider the data from one experiment to feed into another if the experimenter is the same in both experiments.11

Exercise 7.3.3.5.

Exercise 7.3.2.3 discussed the power-set monad T=(,η,μ).

a. Can you find a way to relate the morphisms in Kls(T) to relations? That is, given a morphism f : AB in Kls(T), is there a natural way to associate to it a relation RA × B?

b. How does the composition formula in Kls(T) relate to the composition of relations given in Definition 3.2.2.3?12

Solution 7.3.3.5.

a. A morphism AB in Kls(T) is a function f : A → ℙ(B) in Set. From such a function we need to obtain a binary relation, i.e., a subset RA × B. Recall that for any set X (e.g., X = B or X = A × B), we can identify the subsets of X with the functions X → Ω = {True, False}, using the characteristic function as in Definition 3.4.4.12. In other words, we have a bijection

(X)HomSet(X,Ω).

By currying, we get an isomorphism

HomSet(A,(B))HomSet(A,HomSet(B,Ω))HomSet(A×B,Ω)(A×B).

In other words, we can identify the function f : A → ℙ(B) with an element of ℙ(A × B), i.e., with a subset RA × B, i.e., with a relation.

A more down-to-earth way to specify how f : A → ℙ(B) gives rise to a binary relation RA × B is as follows. We ask, given (a, b) ∈ A × B, when is it in R? We see that f(a) ∈ ℙ(B) is a subset, so the answer is that we put (a, b) ∈ R if bf(a). This gives the desired relation.

b. It is the same.

Exercise 7.3.3.6.

(Challenge) Let T=(,η,μ) be the power-set monad. The category Kls(T) is closed under binary products, i.e., every pair of objects A,BOb(Kls(T)) has a product in Kls(T). What is the product of A = {1, 2, 3} and B = {a, b}, and what are the projections?

Solution 7.3.3.6.

The product of A and B in Kls(T) is A × B = {1, 2, 3, a, b}, which coincidentally would be their coproduct in Set. The projection maps are functions (A)π1{1,2,3,a,b}π2(B); we use the obvious maps, e.g., π1(3) = {3} and π1(a) = Ø. The question did not ask for the universal property, but we specify it anyway. Given f : X → ℙ(A) and g : X → ℙ(B), we take 〈f, g〉: X → ℙ(AB} to be given by union.

Exercise 7.3.3.7.

(Challenge.) Let T=(,η,μ) be the power-set monad. The category Kls(T) is closed under binary coproducts, i.e., every pair of objects A,BOb(Kls(T)) has a coproduct in Kls(T). What is the coproduct of A = {1, 2, 3} and B = {a, b}?

Example 7.3.3.8. Let A be any preorder. We speak of A throughout this example as though it were the linear order given by time; however, the mathematics works for any A ∈ Ob(PrO).

There is a monad T=(T,η,μ) that captures the idea that a function f : XY occurs in the context of time in the following sense: The output of f is determined not only by the element xX on which it is applied but also by the time at which it was applied to x; and the output of f occurs at another time, which is not before the time of input.

The functor part of the monad is given on Y ∈ Ob(Set) by

T(Y)={p:AA×Y| if p(a)=(a,y) then aa}.

The unit ηY : YT(Y) sends y to the function a ↦ (a, y). The multiplication map μY : T(T(Y)) → T(Y) is as follows. Suppose given p : AA × T(Y) in T(T(Y)). Then μY (p) : AA × Y is given on aA as follows. Suppose p(a) = (a′, p′), where p′ : AA × Y. Then we assign μY (p)(a) = p′(a′) ∈ A × Y.

Given two sets X, Y, what is the meaning of a morphism XY in the Kleisli category Kls(T), i.e., a function f : XT(Y)? Note that T(Y) ⊆ HomSet(A, A × Y), and composing with f, we have a function X → HomSet(A, A × Y), which can be curried to a function f : A × XA × Y. So we have an isomorphism

HomKls(T)(X,Y){fHomSet(A×X,A×Y)| if f(a,x)=(a,y) then aa}.

The right-hand set could be characterized as time-sensitive functions f : XY for which the output arrives after the input.

Remark 7.3.3.9. One of the most important monads in computer science is the state monad. It is used when one wants to allow a program to mutate state variables (e.g., in the program

if x ⩽ 4, then xx + 1 else Print “done”

x is a state variable). The state monad is a special case of the monad discussed in Example 7.3.3.8. Given any set A, the usual state monad of type A is obtained by giving A the indiscrete preorder (see Example 4.4.4.5). More explicitly, it is a monad with functor part

X(A×X)A

(see Example 7.3.5.3).

Example 7.3.3.10. We reconsider Figure 1.1 reproduced as Figure 7.3.

art
Figure 7.3 An olog whose arrows do not denote functions. It should be interpreted using a monad.

It looks like an olog, and all ologs are database schemas (see Section 4.5.2.15). But how is “analyzed by a person yields” a function? For it to be a function, there must be only one hypothesis corresponding to a given observation. The very name of this arrow belies the fact that it is an invalid aspect in the sense of Section 2.3.2.1, because given an observation, there may be more than one hypothesis yielded, corresponding to which person is doing the observing. In fact, all the arrows in this figure correspond to some hidden context involving people: the prediction is dependent on who analyzes the hypothesis, the specification of an experiment is dependent on who is motivated to specify it, and experiments may result in different observations by different observers.

Without monads, the model of science proposed by this olog would be difficult to believe in. But by choosing a monad we can make explicit (and then hide from discourse) the implicit assumption that “this is all dependent on which human is doing the science.” The choice of monad is an additional modeling choice. Do we want to incorporate the partial order of time? Do we want the scientist to be modified by each function (i.e., the person is changed when analyzing an observation to yield a hypothesis)? These are all interesting possibilities.

One reasonable choice would be to use the state monad of type A, where A is the set of scientific models. This implies the following context. Every morphism f : XY in the Kleisli category of this monad is really a morphism f : X × AY × A; while ostensibly giving a map from X to Y, it is influenced by the scientific model under which it is performed, and its outcome yields a new scientific model.

Reading the olog in this context might look like this:

A hypothesis (in the presence of a scientific model) analyzed by a person produces a prediction (in the presence of a scientific model), which motivates the specification of an experiment (in the presence of a scientific model), which when executed results in an observation (in the presence of a scientific model), which analyzed by a person yields a hypothesis (in the presence of a scientific model).

The parenthetical statements can be removed if we assume them to be always there, which can be done using the preceding monad.

7.3.3.11   Relaxing functionality constraint for ologs

Section 2.3.2 said that every arrow in an olog has to be English-readable as a sentence, and it has to correspond to a function. For example, the arrow

art

makes for a readable sentence, but it does not correspond to a function because a person may have no children or more than one child. We call an olog in which every arrow corresponds to a function (the only option proposed so far in this book) a functional olog. Requiring that ologs be functional comes with advantages and disadvantages. The main advantage is that creating a functional olog requires more conceptual clarity, and this has benefits for the olog creator as well as for anyone to whom he tries to explain the situation. The main disadvantage is that creating a functional olog takes more time, and the olog takes up more space on the page.

In the context of the power-set monad (see Exercise 7.3.2.3), a morphism f : XY between sets X and Y, as objects in Kls(ℙ), becomes a binary relation on X and Y rather than a function (see Exercise 7.3.3.5). So in that context, the arrow in (7.14) becomes valid. An olog in which arrows correspond to mere binary relations rather than functions might be called a relational olog.

7.3.4   Monads in databases

This section discusses how to record data in the presence of a monad. The idea is quite simple. Given a schema (category) C, an ordinary instance is a functor I:CSet. But if T=(T,η,μ) is a monad, then a Kleisli T-instance on C is a functor J:CKls(T). Such a functor associates to every object cOb(C) a set J(c), and to every arrow f : cc′ in C a morphism J(f) : J(c) → J(c′) in Kls(T). How does this look in terms of tables?

Recall that to represent an ordinary database instance I:CSet, we use a tabular format in which every object cOb(C) is displayed as a table including one ID column and one additional column for each arrow f : cc′ emanating from c. The cells in the ID column of table c contain the elements of the set I(c), and the cells in the f column contain elements of the set I(c′).

To represent a Kleisli database instance J:CKls(T) is similar; we again use a tabular format in which every object cOb(C) is displayed as a table including one ID column and one additional column for each arrow f : cc′ emanating from c. The cells in the ID column of table c again contain the elements of the set J(c); however the cells in the f column do not contain elements of J(c′), but T-values in J(c′), i.e., elements of T(J(c′)).

Example 7.3.4.1. Let T=(T,η,μ) be the monad for partial functions (see Example 7.3.1.1). Given any schema C, we can represent a Kleisli T-instance I:CKls(T) in tabular format. For every object cOb(C) we have a set I(c) of rows, and given a column f : cc′, applying f to a row either produces a value in I(c′) or fails to produce a value; this is the essence of partial functions. We might denote the absence of a value using ☺.

Consider the schema indexing graphs

art

As discussed in Section 5.2.1.21, an ordinary instance on C represents a graph:

art

A Kleisli T-instance on C represents graphs in which edges can fail to have a source vertex, fail to have a target vertex, or both:

art

The context of these tables is that of partial functions, so we do not need a reference for ☺ in the vertex table. Mathematically, the morphism J(src) : J(Arrow) → J(Vertex) in Kls(T) needs to be a function J(Arrow) → J(Vertex) ⊔ {☺}, and it is.

7.3.4.2   Probability distributions

Let [0, 1] ⊆ ℝ denote the set of real numbers between 0 and 1. Let X be a set and p : X → [0, 1] a function. We say that p is a finitary probability distribution on X if there exists a finite subset WX such that

wWp(w)=1,(7.15)

and such that p(x) > 0 if and only if xW. Note that the subset W is unique if it exists; we call it the support of p and denote it Supp(p).

For any set X, let Dist(X) denote the set of finitary probability distributions on X. It is easy to check that given a function f : XY, one obtains a function Dist(f) : Dist(X) → Dist(Y) by Dist(f)(y) = Σf(x)=yp(x). Thus we can consider Dist : SetSet as a functor, and in fact the functor part of a monad. Its unit η : XDist(X) is given by the Kronecker delta function xδx, where δx(x) = 1 and δx(x′) = 0 for x′ ≠ x. Its multiplication μ : Dist(Dist(X)) → Dist(X) is given by weighted sum: given a finitary probability distribution w : Dist(X) → [0, 1] and xX, put μ(w)(x) = ΣpSupp(w) w(p)p(x).

Example 7.3.4.3 (Markov chains). Let Loop be the loop schema

art

as in Example 4.5.2.10. A Dist-instance on Loop is equivalent to a time-homogeneous Markov chain. To be explicit, a functor δ : LoopKls(Dist) assigns to the unique object s ∈ Ob(Loop) a set S = δ(s), called the state space, and to f : ss a function δ(f) : SDist(S), which sends each element xS to some probability distribution on elements of S. For example, the left-hand table δ (having states δ(s) = {a, b, c, d}) corresponds to the right-hand Markov matrix M:

art

As one might hope, for any natural number n ∈ ℕ, the map fn : SS in Kls(Dist) corresponds to the matrix Mn, which sends an element sS to its probable location after n iterations of the transition map, fn(s) ∈ Dist(S).

Application 7.3.4.4. Every star emits a spectrum of light, which can be understood as a distribution on the electromagnetic spectrum. Given an object B on earth, different parts of B will absorb radiation at different rates. Thus B produces a function from the electromagnetic spectrum to distributions of energy absorption. In the context of the probability distributions monad, we can record data on the schema

star    emits    wavelengths  absorbed by B  energies

The composition formula for Kleisli categories is the desired one: to each star we associate the weighted sum of energy absorption rates over the set of wavelengths emitted by the star.

7.3.5   Monads and adjunctions

There is a strong connection between monads and adjunctions: every adjunction creates a monad, and every monad comes from an adjunction. For example, the List monad (Example 7.3.2.2) comes from the free forgetful adjunction between sets and monoids

SetUFMon

(see Proposition 7.1.1.2). That is, for any set X, the free monoid on X is

F(X)=(List(X),[],++),

and the underlying set of that monoid is U(F(X)) = List(X). So the List functor is given by UF : SetSet. But a monad is more than a functor; it includes a unit map η and a multiplication map μ (see Definition 7.3.2.1). Luckily, the unit η and multiplication μ drop out of the adjunction too. First, we discuss the unit and counit of an adjunction.

Definition 7.3.5.1. Let C and D be categories, and let L:CD and R:DC be functors with adjunction isomorphism

αc,d:HomD(L(c),d)    HomC(c,R(d))

for any objects cOb(C) and dOb(D) (see Definition 7.1.1.1). The unit η:idCRL (resp. the counit ϵ:LRidD) of the adjunction is a natural transformation defined as follows.

Given an object cOb(C), we apply α to idL(c) : L(c) → L(c) to get the c component

ηc:cRL(c)

of η. Similarly given an object dOb(D) we apply α−1 to idR(d) : R(d) → R(d) to get the d component

ϵd:LR(d)d.

One checks that these components are natural.

Later we see how to use the unit and counit of any adjunction to make a monad. We first walk through the process in Example 7.3.5.2.

Example 7.3.5.2. Consider the adjunction SetUFMon between sets and monoids. Let T = UF : SetSet; this will be the functor part of the monad, and we have seen that T = List. The unit of the adjunction, η : idSetUF is precisely the unit of the monad: for any set X ∈ Ob(Set) the component ηX : X → List(X) is the function that takes xX to the singleton list [x] ∈ List(X). The monad also has a multiplication map μX : T(T(X)) → T(X), which amounts to concatenating a list of lists. This function comes about using the counit ϵ, as follows

TT=UFUF  idUϵidF  UF=T.

The general procedure for extracting a monad from an adjunction is analogous to the process shown in Example 7.3.5.2. Given any adjunction

CRLD,

we define T=RL:CC, we define η:idCT to be the unit of the adjunction (as in Definition 7.3.5.1), and we define μ:TTT to be the natural transformation idRϵ ⋄ idL : RLRLRL, obtained by applying the counit ϵ:LRidD.

This procedure produces monads on arbitrary categories C, whereas the definition of monad (Definition 7.3.2.1) considers only the case C=Set. However, Definition 7.3.2.1 can be generalized to arbitrary categories C by simply replacing every occurrence of the string Set with the string C. Similarly, the definition of Kleisli categories (Definition 7.3.3.1) considers only the case C=Set, but again the generalization to arbitrary categories C is straightforward.

Example 7.3.5.3. Let A ∈ Ob(Set) be a set, and recall the currying adjunction

Set  YYA    XX×A  Set,

discussed briefly in Example 7.1.1.8. The corresponding monad StA is typically called the state monad of type A in programming language theory. Given a set X, we have

StA(X)=(A×X)A.

In the Kleisli category Kls(StA) a morphism from X to Y is a function of the form X → (A × Y)A, but this can be curried to a function A × XA × Y.

As discussed in Remark 7.3.3.9, this monad is related to holding onto an internal state variable of type A. Under the state monad StA, every morphism written XY, when viewed as a function, takes as input not only an element of X, but also the current state aA, and it produces as output not only an element of Y, but also an updated state.

Computer scientists in programming language theory have found monads very useful (Moggi [33]). In much the same way, monads on Set might be useful in databases (see Section 7.3.4). Another, totally different way to use monads in databases is by using a mapping between schemas to produce in each one an internal model of the other. That is, for any functor F:CD, i.e., mapping of database schemas, the adjunction (ΣF, ΔF) produces a monad on CSet, and the adjunction (ΔF, ΠF) produces a monad on DSet. If one interprets the List monad as producing in Set an internal model of the category Mon of monoids, one can similarly interpret these monads on CSet and DSet as producing internal models of each within the other.

7.4   Operads

This section briefly introduces operads, which are generalizations of categories. They often are useful for speaking about self-similarity of structure. For example, we use operads to model agents made up of smaller agents, or materials made up of smaller materials. This association with self-similarity is not really inherent in the definition, but it tends to emerge in thinking about many operads used in practice.

Let me begin with a warning.

Warning 7.4.0.4. My use of the term operad is not entirely standard and conflicts with widespread usage. The more common term for what I am calling an operad is colored operad or symmetric multicategory. An operad classically is a multicategory with one object, and a colored operad is a multicategory with possibly many objects (one for each “color”). The term multicategory stems from the fact that the morphisms in a multicategory have many, rather than one, domain object. One reason I prefer not to use the term multicategory is that there is nothing really “multi” about the multicategory itself, only its morphisms. Further, I do not see enough reason to differentiate, given that the term multicategory seems rather clunky and the term operad seems rather sleek. I hope my break with standard terminology does not cause confusion.

This introduction to operads is quite short; see Leinster [25] for an excellent treatment. Operads are also related to monoidal categories, a subject that is not elaborated in this book to discuss, but which was briefly mentioned when discussing topological enrichment in Example 5.2.3.3. Many of the following operads are actually monoidal categories in disguise.

7.4.1   Definition and classical examples

An operad is like a category in that it has objects, morphisms, and a composition formula, and it obeys an identity law and an associativity law. The difference is that each morphism f in an operad can have many inputs (and one output):

art

The description of composition in an operad is a bit more complicated than for a category, because it involves much more variable indexing; however, the idea is straightforward. Figure ?? shows morphisms being composed. Note that S and T disappear from the composition, but this is analogous to the way the middle object disappears from the composition of morphisms in a category

art

Here is the definition, taken from Spivak [41]. Skip to Example 7.4.1.3 if the definition gets too difficult.

Definition 7.4.1.1. An operad O is defined as follows: One announces some constituents (A. objects, B. morphisms, C. identities, D. compositions) and shows that they conform to some laws (1. identity law, 2. associativity law). Specifically, one announces

A. a collection Ob(O), each element of which is called an object of O;

B. for each object yOb(O), finite set n ∈ Ob(Fin), and n-indexed set of objects x:nOb(O), a set On(x;y)Ob(Set); its elements are called morphisms from x to y in O;

C. for every object xOb(O), a specified morphism, denoted idxO1(x;x) and called the identity morphism on x.

D. Let s : mn be a morphism in Fin. Let zOb(O) be an object, let y:nOb(O) be an n-indexed set of objects, and let x:mOb(O) be an m-indexed set of objects. For each element in, write mis−1(i) for the pre-image of s under i, and write xi=x|mi:miOb(O) for the restriction of x to mi. Then one announces a function

:On(y;z)×inOmi(xi;y(i))Om(x;z),(7.17)

called the composition formula.

Given an n-indexed set of objects x:nOb(O) and an object yOb(O), we sometimes abuse notation and denote the set of morphisms from x to y by O(x1,,xn;y).13 We may write HomO(x1,,xn;y), in place of O(x1,,xn;y), when convenient. We can denote a morphism ϕOn(x;y) by ϕ : xy or by ϕ : (x1, …, xn) → y; we say that each xi is a domain object of ϕ and that y is the codomain object of ϕ. We use infix notation for the composition formula, e.g., ψ ○ (ϕ1, …, ϕn).

One must then show that the following operad laws hold:

  1. For every x1,,xn,yOb(O) and every morphism ϕ : (x1, …, xn) → y, we have

    ϕ(idx1,...,idxn)=ϕ and idyϕ=ϕ.

  2. Let msntp be composable morphisms in Fin. Let zOb(O) be an object, let y:pOb(O), x:nOb(O), and w:mOb(O) respectively be a p-indexed, n-indexed, and m-indexed set of objects. For each ip, write ni = t−1(i) for the pre-image and xi:niOb(O) for the restriction. Similarly, for each kn, write mk = s−1(k) and wk:mkOb(O); for each ip, write mi,− = (ts)−1(i) and wi,:mi,Ob(O); for each jni, write mi,js−1(j) and wi,j:mi,jOb(O). Then the following diagram commutes:

    art

Remark 7.4.1.2. This remark considers the abuse of notation in Definition 7.4.1.1 and how it relates to an action of a symmetric group on each morphism set in the definition of operad. We follow the notation of Definition 7.4.1.1, especially the use of subscripts in the composition formula.

Suppose that O is an operad, zOb(O) is an object, y:nOb(O) is an n-indexed set of objects, and ϕ : yz is a morphism. If we linearly order n, enabling us to write ϕ : (y(1), …, y(|n|)) → z, then changing the linear ordering amounts to finding an isomorphism of finite sets σ:mn, where |m| = |n|. Let x = yσ, and for each in, note that mi = σ−1({i}) = {σ−1(i)}, so xi=x|σ1(i)=y(i). Taking idxiOmi(xi;y(i)) for each in, and using the identity law, we find that the composition formula induces a bijection On(y;z)Om(x;z), which we might denote

σ:O(y(1),y(2),,y(n);z)O(y(σ(1)),y(σ(2)),,y(σ(n));z).(7.18)

In other words, the permutation group Aut(n) acts on the set On of n-ary morphisms by permuting the order of the domain objects Ob(O)n.

Throughout this book, we allow this abuse of notation and speak of morphisms ϕ : (y1, y2, …, yn) → z for a natural number n ∈ ℕ, without mentioning the abuse inherent in choosing an order, as long as it is clear that permuting the order of indices would not change anything up to the canonical isomorphism of (7.18).

Example 7.4.1.3 (Little squares operad). An operad commonly used in mathematics is called the little n-cubes operad. We will focus on n = 2 and talk about the little squares operad O. Here the set of objects has only one element, denoted by a square, Ob(O)={}. For a natural number n ∈ ℕ, a morphism f : (▫, ▫, …, ▫) → ▫ is a positioning of n nonoverlapping squares inside of a square. Figure 7.5 shows a morphism (X1, X2, X3) → Y, where X1 = X2 = X3 = Y = ▫.

The composition formula says that given a positioning of small squares inside a large square, and given a positioning of tiny squares inside each of those small squares, we get a positioning of tiny squares inside a large square. See Figure 7.6.

Example 7.4.1.3 exemplifies the kind of self-similarity mentioned on page 362.

Exercise 7.4.1.4.

Consider an operad O like the little squares operad from Example 7.4.1.3, except with three objects: square, circle, equilateral triangle. A morphism is again a nonoverlapping positioning of shapes inside a shape.

a. Draw an example of a morphism f from two circles and a square to a triangle.

b. Find three other morphisms that compose into f, and draw the composite.

Solution 7.4.1.4.

a.

art

b.

art

Example 7.4.1.5. Let Sets denote the operad defined as follows. As objects we put Ob(Sets) = Ob(Set). For a natural number n ∈ ℕ and sets X1, …, Xn, Y, put

HomSets(X1,,Xn;Y)HomSet(X1××Xn,Y).

Given functions f1:(X1,1××X1,m1)Y1 through fn:(Xn,1××Xn,mn)Yn and a function Y1 × ⋯ × YnZ, the universal property provides a unique function of the form (X1,1××Xn,mn)Z, giving rise to the composition formula in Sets.

7.4.1.6   Operads: functors and algebras

If operads are like categories, then we can define things like functors and call them operad functors.

Warning 7.4.1.7. What is called an operad functor in Definition 7.4.1.8 is usually called an operad morphism. I think the terminology clash between morphisms of operads and morphisms in an operad is confusing. It is similar to what would occur in regular category theory (see Chapter 5) if we replaced the term functor with the term category morphism.

Definition 7.4.1.8. Let O and O be operads. An operad functor from O to O, denoted F:OO, is defined as follows. One announces some constituents (A. on-objects part, B. on-morphisms part) and shows that they conform to some laws (1. preservation of identities, 2. preservation of composition). Specifically, one announces

A. a function Ob(F):Ob(O)Ob(O), sometimes denoted simply F:Ob(O)Ob(O);

B. for each object yOb(O), finite set n ∈ Ob(Fin), and n-indexed set of objects x:nOb(O), a function

Fn:On(x;y)On(Fx;Fy).

One must then show that the following operad functor laws hold:

  1. For each object xOb(O), the equation F(idx) = idFx holds.
  2. Let s : mn be a morphism in Fin. Let zOb(O) be an object, let y:nOb(O) be an n-indexed set of objects, and let x:mOb(O) be an m-indexed set of objects. Then, with notation as in Definition 7.4.1.1, the following diagram of sets commutes:

    art

We denote the category of operads and operad functors Oprd.

Exercise 7.4.1.9.

Let O denote the little squares operad from Example 7.4.1.3, and let O denote the little shapes operad you constructed in Exercise 7.4.1.4.

a. Can you find an operad functor F:OO?

b. Is it possible to find an operad functor G:OO?

Definition 7.4.1.10 (Operad algebra). Let O be an operad, and let Sets be the operad from Example 7.4.1.5. An algebra on O is an operad functor A:OSets.

Remark 7.4.1.11. Every category can be construed as an operad (there is a functor CatOprd), one in which every morphism is unary. That is, given a category C, one makes an operad O with Ob(O)Ob(C) and with

HomO(x1,,xn;y)={HomC(x1,y)if n=1,if n1.

Throughout the book a connection is made between database schemas and categories (see Section 5.2.2), under which a schema C is construed as a category presentation, i.e., by generators and relations. Similarly, it is possible to discuss operad presentations O, again by generators and relations. Under this analogy, an instance CSet of the database (see Section 5.2.2.6) corresponds to an algebra OSets of the operad.

7.4.2   Applications of operads and their algebras

Hierarchical structures seem to be well modeled by operads. A hierarchical structure often has basic building blocks and instructions for how they can be put together into larger building blocks. Describing such structures using operads and their algebras allows one to make appropriate distinctions between different types of thinking, which may otherwise be blurred. For example, the abstract building instructions should be encoded in the operad, whereas the concrete building blocks should be encoded in the algebra. Morphisms of algebras are high-level understandings of how building blocks of very different types (such as materials versus numbers) can occupy the same place in the structure and be compared.

We get a general flavor of these ideas in the following examples.

Application 7.4.2.1. Every material is composed of constituent materials, arranged in certain patterns. (In case the material is pure, we consider the material to consist of itself as the sole constituent.) Each of these constituent materials is itself an arrangement of constituent materials. Thus a kind of self-similarity can be modeled with operads.

For example, a tendon is made of collagen fibers that are assembled in series and then in parallel, in a specific way. Each collagen fiber is made of collagen fibrils that are again assembled in series and then in parallel, with slightly different specifications. We can continue, perhaps indefinitely. Going a bit further, each collagen fibril is made up of tropocollagen collagen molecules, which are twisted ropes of collagen molecules, and so on.14

Here is how operads might be employed. We want the same operad to model all three of the following: actual materials, theoretical materials, and functional properties. That is, we want more than one algebra on the same operad.

The operad O should abstractly model the structure but not the substance being structured. Imagine that each of the shapes, say a triangle, in Figure (7.7) is a placeholder that indicates “your triangular material here.” Each morphism represents a construction of a material out of parts.

Application 7.4.2.2. Suppose we have chosen an operad O to model the structure of materials. Say each object of O corresponds to a certain quality of material, and each morphism corresponds to an arrangement of various qualities to form a new quality. An algebra A:OSets on O requires us to choose what substances will fill in for these qualities. For every object xOb(O), we want a set A(x) that will be the set of materials with that quality. For every arrangement, i.e., morphism, f : (x1, …, xn) → y, and every choice a1A(x1), …, anA(xn) of materials, we need to understand what material a′ = A(f)(a1, …, an) ∈ A(y) will emerge when materials a1, …, an are arranged in accordance with f.

There may be more than one interesting algebra on O. Suppose that B:OSets is an algebra of strengths rather than of materials. For each object xOb(O), which represents some quality, we let B(x) be the set of possible strengths that something of quality x can have. Then for each arrangement, i.e., morphism, f : (x1, …, xn) → y, and every choice b1B(x1), …, bnB(xn) of strengths, we need to understand what strength b′ = B(f)(b1, …, bn) ∈ B(y) will emerge when strengths b1, …, bn are arranged in accordance with f.

Finally, a morphism of algebras S : AB would consist of a coherent system for assigning to each material aA(X) of a given quality x a specific strength S(a) ∈ B(X), in such a way that morphisms behave appropriately. One can use the language of operads and algebras to state a very precise goal for the field of material mechanics.

Exercise 7.4.2.3.

Consider again the little squares operad O from Example 7.4.1.3. Suppose we want to use this operad to describe photographic mosaics.

a. Devise an algebra P:OSets that sends the square to the set M of all photos that can be pasted into that square. What does P do on morphisms in O?

b. Devise an algebra C:OSets that sends each square to the set of all colors (visible frequencies of light). In other words, C(▫) is the set of colors, not the set of ways to color the square. What does C do on morphisms in O. Hint: Use some kind of averaging scheme for the morphisms.

c. Guess: If someone were to appropriately define morphisms of O-algebras (something akin to natural transformations between functors OSets), do you think there would be some morphism of algebras PC?

7.4.2.4   Relations and wiring diagrams

Example 7.4.2.5. Here we describe an operad of relations, denoted R. The objects are sets, Ob(R)=Ob(Set). A morphism f : (X1, X2, …, Xn) → Y in R is a relation

RX1×X2××Xn×Y.(7.20)

We use a composition formula similar to that in Definition 3.2.2.3. Namely, to compose relations R1, …, Rn with S, we first form a fiber product, denoted FP:

art

We have an induced function FP(in¯jmi¯Xi,j)×Z, and its image is the subset we take to be the composite: S(R1,,Rn)(in¯jmi¯Xi,j)×Z. This gives a composition formula, for which the associativity and identity laws hold, so we indeed have an operad R.

Application 7.4.2.6. Suppose we are trying to model life in the following way. We define an entity as a set of available experiences. We also want to be able to put entities together to form a superentity, so we have a notion of morphism f : (X1, …, Xn) → Y defined as a relation, as in (7.20).

The idea is that the morphism f is a way of translating between the experiences available to the subentities and the experiences available to the superentity. The superentity Y consists of some available experiences, like “hunger” ∈ Y. The subentities Xi each have their own set of available experiences, like “U88fh” ∈ X2. The relation RX1 × … × Xn × Y provides a way to translate between them. It says that when X1 is experiencing “acidic” and X2 is experiencing “U88fh,” and so on, this is the same as Y experiencing “hunger.”

The operad R from Example 7.4.2.5 becomes useful as a language for discussing issues in this domain.

Example 7.4.2.7. Let R be the operad of relations from Example 7.4.2.5, and recall that Ob(R)=Ob(Set). Consider the algebra S:RSets given by S(X) = ℙ(X) for XOb(R). Given a morphism R ⊆ ∏i Xi × Y and subsets XiXi, we have a subset iXiiXi. We take the fiber product

art

and the image of FPY is a subset of Y, as needed. We will continue with Application 7.4.2.8 using this algebra.

Application 7.4.2.8. Following Application 7.4.2.6 we can use Example 7.4.2.7 as a model of survival. Each entity Y survives only for a subset of the phenomena that it can experience. Under this interpretation, the algebra from Example 7.4.2.7 defines survival of an entity as the survival of all parts.

Suppose that we understand how the experiences of a superentity Y relate to those of subentities X1, …, Xn in the sense that we have a morphism f : (X1, …, Xn) → Y in R. In the language of Application 7.4.2.6, we have a translation between the set of experiences available across the sub-entities and the set of experiences available to the superentity. Our algebra postulates that the superentity will survive exactly those experiences for which each subentity survives.

Another way to phrase this, rather than in terms of survival, would be in terms of allowance. A bureaucracy consists of a set of smaller bureaucracies, each of which allows certain requests to pass; the whole bureaucracy allows a request to pass if and only if, when the request is translated into the perspective of each subbureaucracy, it is allowed to pass there.

Exercise 7.4.2.9.

Define the following six sets, A = B = M = C = N = Z = ℤ, and consider them as objects A,B,M,C,N,ZOb(R).

a. How would you encode the relations

ab=m2, c2=m3, m+n=z

as a 2-ary morphism R1 : (A, B) → M, a 1-ary morphism R2 : (C) → N, and a 2-ary morphism S : (M, N) → Z in the operad R?

b. What is the domain and codomain of the composite S ○ (R1, R2)?

c. Write the composite S ○ (R1, R2) as a relation.

Example 7.4.2.10. This example discusses wiring diagrams. This operad is denoted W (see [41]). An object of W is just a finite set, Ob(W)=Ob(Fin), elements of which are called wires. A morphism in W is shown in Figure 7.8 (see page 382) and is formalized as follows. Given objects C1, …, Cn, and D, a morphism (C1, …, Cn) → D is a commutative diagram of sets

art

such that p and q are jointly surjective.

Composition of morphisms is easily understood in graphic form: Given wiring diagrams inside of wiring diagrams, we can throw away the intermediary circles. In terms of sets, we first take the pushout PO:

art

and then take the composition to be the image of (in¯jmi¯Ci,j)EPO.

Exercise 7.4.2.11.

Let C1 = {a, b, m}, C2 = {c, n}, C3 = {m, n, z}, let C = C1C2C3, and let D = {a, c, z}.

a. Suppose we draw C1, C2, and C3 as follows:

art

Follow those examples to draw D.

b. What set G and functions CpGqD in (7.21) correspond to this picture?

art

Solution 7.4.2.11.

a. We can draw D = {a, c, z} as follows:

art

b. Here G = {a, b, m, c, n, z}. The functions CpGqD are given in the following tables:

art

Example 7.4.2.12. Let’s continue with the operad W of wiring diagrams, and try to form an algebra on it. Taking R to be the operad of relations as described in Example 7.4.2.5, there is an operad functor Q:WR. It assigns to each COb(W) the set COb(R)=Ob(Set). To a morphism G : (C1, …, Cn) → D as in (7.21) it assigns the relation

G(in¯Ci)×D.

The idea is that to an entity defined as having a bunch of cables carrying integers, a phenomenon is the same thing as a choice of integer on each cable. A wiring diagram translates between phenomena experienced locally and phenomena experienced globally.

Now recall the algebra S:RSet from Example 7.4.2.7. We can compose with Q to get QSQ:WSet.

Exercise 7.4.2.13.

Consider the wiring diagrams operad W from Example 7.4.2.10. Let’s continue with Exercise 7.4.2.11 so that “everything,” i.e., C1, C2, C3, D, G, i, and j, are as in that exercise. By Example 7.4.2.12 we have an algebra Q:WSet.

a. What might we mean by saying that the following picture represents an element q1Q′(C1)?

art

b. Suppose we have the following elements q1Q′(C1), q2Q′(C2), and q3Q′(C3):

art

Given the wiring diagram G : (C1, C2, C3) → D pictured here,

art

what is G(q1, q2, q3) ∈ Q′(D)?

Application 7.4.2.14. In cognitive neuroscience or in industrial economics, it may be that we want to understand the behavior of an entity such as a mind, a society, or a business in terms of its structure. Knowing the connection pattern (connectome, supply chain) of subentities should help us understand how big changes are generated from small ones.

Application 7.4.2.15. In [36], Radul and Sussman discuss propagator networks. Their implementation can presumably be understood in terms of wiring diagrams and their algebra of relations.

art
Figure 7.1 Three overlapping views of the night sky. Source: NASA, ESA, Digitized Sky Survey Consortium.
art
Figure 7.2 The three overlapping views have been glued together into one coherent view.
art
Figure 7.4 The composition of morphisms f1 and f2 with g.
art
Figure 7.5 A morphism (X1, X2, X3) → Y in an operad with only one object, ▫.
art
Figure 7.6 A morphism (X1, X2, X3) → Y and morphisms (W1,1, W1,2) → X1, (W2,1, W2,2, W2,3) → X2, and (W3,1) → X3, each of which is a positioning of squares inside a square. The composition formula is given by scaling and positioning the squares to give (W1,1, W1,2, W2,1, W2,2, W2,3, W3,1) → Y.
art
Figure 7.7 A morphism expressing the construction of a material from smaller materials.
art
Figure 7.8 Morphisms in a wiring diagram operad W. Composition of wiring diagrams is given by substitution.

__________________

1Throughout this definition, notice that B’s come before A’s, especially in (7.1), which might be confusing. It was a stylistic choice to match with the Babies and Adults discussion.

2The natural isomorphism α (see Proposition 5.3.2.12) is between two functors Bop×ASet, namely, the functor (B,A)HomA(L(B),A) and the functor (B,A)HomB(B,R(A)).

3Conversely, for any g : BR(A) in B, we refer to αB,A1(g):L(B)A as the adjunct of g.

4The left adjoint does not have to be called L, nor does the right adjoint have to be called R, of course.

5FQL is available on the Internet. See http://categoricaldata.net/fql.html.

6This example was taken from Spivak [38].

7Repeated for convenience,

art

I:CSet is

art

8Technically C has to be small but, as mentioned in Remark 5.1.1.2), we do not worry about that distinction in this book.

9There is a lot of clutter here. Note that “firstChild(mother(☺))” is a row in the Child table of YChild. Assuming that the math follows the meaning, if ☺ points to Amy, where should firstChild(mother(☺)) point?

10Parentheses are used to denote open intervals of real numbers. For example, (6, 9) denotes the set {x ∈ ℝ | 6 < x < 9}.

11This requirement is somewhat stringent, but it can be mitigated in a variety of ways. One such way would be to model the ability to hand off the experimental results to another person, who would then carry them forward. This could be done by defining a preorder structure on A to model who can hand off to whom (see Example 7.3.3.8).

12Actually, Definition 3.2.2.3 is about composing spans, but a relation RA × B is a kind of span, RA × B.

13There are three abuses of notation when writing O(x1,,xn;y). First, it confuses the set n ∈ Ob(Fin) with its cardinality |n| ∈ ℕ. But rather than writing O(x1,,x|n|;y), it would be more consistent to write O(x(1),,x(|n|);y) because we have assigned subscripts another meaning in part D. But even this notation unfoundedly suggests that the set n has been endowed with a linear ordering, which it has not. This may be seen as a more serious abuse, but see Remark 7.4.1.2.

14Thanks to Professor Sandra Shefelbine for explaining the hierarchical nature of collagen to me. Any errors are my own.