
Down and Dirty with Semantic Set-theoretic Types (a tutorial) v0.4


Andrew M. Kent <pnwamk@gmail.com>

Last updated: Monday, November 22nd, 2021

    1 Introduction

      1.1 Prerequisites

      1.2 Example Grammar and Function Definitions

    2 Set-theoretic Types: An Overview

      2.1 Subtyping

      2.2 Semantic Subtyping

      2.3 Deciding Inhabitation, Normal Forms

        2.3.1 Types in Disjunctive Normal Form

        2.3.2 Partitioning Types

    3 Set-theoretic Type Representation

      3.1 Types as Data Structures

        3.1.1 Top and Bottom Type Representation

        3.1.2 Type Operations

      3.2 Base DNF Representation

        3.2.1 Base DNF Operations

      3.3 Product and Function DNFs

        3.3.1 Binary Decision Diagrams

        3.3.2 Types as BDDs?

        3.3.3 Types as Lazy BDDs!

        3.3.4 Lazy BDD Operations

      3.4 Parsing and Example Types

    4 Implementing Semantic Subtyping

      4.1 Deciding Type Inhabitation

        4.1.1 Product Type Inhabitation

        4.1.2 Function Type Inhabitation

    5 Other Key Type-level Functions

      5.1 Product Projection

        5.1.1 Implementing Product Projection

      5.2 Function Domain

        5.2.1 Implementing Function Domain

      5.3 Function Application

        5.3.1 Implementing Function Application

    6 Strategies for Testing

    Bibliography

1 Introduction

This is a “living” document: please submit bug reports and pull requests if you spot a problem! https://github.com/pnwamk/sst-tutorial/

This is an informal tutorial designed to:

Most of the "pseudo code" in this tutorial was generated directly from a functioning Redex model  (Klein et al. 2012) to greatly reduce the chance of typos or other subtle bugs in its presentation.

We would like to thank Giuseppe Castagna for the time he spent writing the wonderful, detailed manuscript Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers)  (Castagna 2013) which we used to initially understand the implementation details we will discuss, as well as Alain Frisch, Giuseppe Castagna, and Véronique Benzaken for their detailed journal article Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types  (Frisch et al. 2008) which contains thorough mathematical and technical descriptions of semantic subtyping. Without those works this tutorial would simply not be possible! We highly recommend perusing those works as well for readers who are interested in this material.

1.1 Prerequisites

We assume the reader has some mathematical maturity and is familiar with terminology commonly used by computer scientists for describing programming languages and types. In particular, we assume basic familiarity with:

1.2 Example Grammar and Function Definitions

To be clear about how our term and function definitions should be read, we start by briefly examining a simple, well-understood domain: Peano natural numbers.

Here is a grammar for Peano naturals and booleans:

image

Figure 1: a context-free grammar for natural numbers and booleans

This grammar states that a natural number is either image (zero) or image (i.e. the successor of some natural number p), and that a boolean must be either image or image.

We can then define the function image to be addition by structural recursion on the first argument:

image

Figure 2: addition for natural numbers

Example usages:

image

image

image

image will be defined to be a function that returns a image indicating whether the first argument is strictly less than the second:

image

Figure 3: less than for natural numbers

Example usages:

image

image

image

image will be defined to be a function that returns a image indicating whether the first argument is strictly greater than the second:

image

Figure 4: greater than for natural numbers

Example usages:

image

image

image
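
For readers who prefer executable notation, here is one possible transliteration of these figures into OCaml (a sketch of our own; the names nat, plus, lt, and gt are ours, and defining gt in terms of lt is a shortcut rather than a transcription of figure 4):

    type nat = Zero | Succ of nat

    (* figure 2: addition, by structural recursion on the first argument *)
    let rec plus (m : nat) (n : nat) : nat =
      match m with
      | Zero    -> n
      | Succ m' -> Succ (plus m' n)

    (* figure 3: strictly-less-than *)
    let rec lt (m : nat) (n : nat) : bool =
      match m, n with
      | _, Zero          -> false
      | Zero, Succ _     -> true
      | Succ m', Succ n' -> lt m' n'

    (* figure 4: strictly-greater-than (here defined via lt) *)
    let gt (m : nat) (n : nat) : bool = lt n m

    (* example usages: 1 + 2 = 3, 1 < 2, 2 > 1 *)
    let one = Succ Zero
    let two = Succ one
    let () = assert (plus one two = Succ two)
    let () = assert (lt one two && gt two one)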

Things to note about these definitions (and about definitions in general):

Important Note: We will "overload" certain convenient symbols during the course of our discussion since we can always distinguish them based on the types of arguments they are given. e.g. we will define several "union" operations, all of which will use the standard set-theoretic symbol image. But when we see image and image, for example, we will know the former usage of image is the version explicitly defined to operate on image’s and the latter is the version explicitly defined to operate on image’s. Like the function definitions above, all function definitions will have clear signatures describing the kinds of arguments they operate on.

2 Set-theoretic Types: An Overview

Set-theoretic types are a flexible and natural way for describing sets of values, featuring intuitive "logical combinators" in addition to traditional types for creating detailed specifications.

image

Figure 5: set-theoretic types

As figure 5 illustrates, languages with set-theoretic types feature (at least some of) the following logical type constructors:

Additionally, we may specify "specific top types", one for each kind of atomic type, denoting all values of that particular kind:

image

Figure 6: specific top type definitions

Set-theoretic types frequently appear in type systems which reason about dynamically typed languages (e.g. TypeScript, Flow, Typed Racket, Typed Clojure, Julia), but some statically typed languages use them as well (e.g. CDuce, Pony).

2.1 Subtyping

With set-theoretic types, the programmer (and the type system) must be able to reason about how types that are not equivalent relate. i.e., even though image is not the same type as image, is it the case that a value of type image will necessarily also be a value of type image? In other words, does image hold (i.e. is τ a subtype of σ)?

For example, consider the following subtyping question:

image

Clearly the two types are not equal... but we can also see that any pair whose first element is either an integer or a string and whose second element is a string (i.e. the type on the left-hand side) is indeed either a pair with an integer and a string or a pair with a string and a string (i.e. the type on the right-hand side). As a programmer then we might reasonably expect that anywhere a image is expected, we could provide a value of type image and things should work just fine.
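
Written out explicitly (using Int and Str for the integer and string base types and × for products, notation of our own choosing), the question above is whether:

    (Int ∨ Str) × Str   ≤   (Int × Str) ∨ (Str × Str)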

Unfortunately, many (most?) systems that feature set-theoretic types use sound but incomplete reasoning to determine subtyping. This is because most type systems reason about subtyping via syntactic inference rules:

image

These rules allow us to conclude the statement below the line if we can show that the statement(s) above the line hold. Upsides to using a system built from rules like this include (1) the rules can often directly be translated into efficient code and (2) we can generally examine each rule individually and decide if the antecedents necessarily imply the consequent (i.e. determine if the rule is valid). The downside is that such rules are incomplete for set-theoretic types: it is impossible to derive all valid subtyping judgments, e.g. we cannot conclude image is a subtype of image even though it is true.

For a complete treatment of subtyping for set-theoretic types, a semantic (instead of a syntactic) notion of subtyping is required.

At the time of writing this tutorial, CDuce may be the only example of an in-use language with a type system which features set-theoretic types and complete subtyping. This is not surprising since its developers are also the researchers that have pioneered the approaches we will discuss.

2.2 Semantic Subtyping

Instead of using a syntactic approach to reason about subtyping, we will use a semantic approach: types will simply denote sets of values in the language in the expected way.

Our description here omits many interesting subtleties and details about why this approach more or less "just works"; Frisch et al. (2008) thoroughly discuss this topic and much more and should be consulted if the reader is so inclined.

With our types merely denoting sets of values, subtyping can be determined by deciding type inhabitation (figure 7).

image

Figure 7: subtyping/inhabitation equivalence

In other words, "is a particular type inhabited" is really the only question we have to be able to answer since asking image is the same as asking if image is uninhabited (i.e. does it denote the empty set?).

2.3 Deciding Inhabitation, Normal Forms

To efficiently decide type inhabitation for set-theoretic types we leverage some of the same strategies used to decide boolean satisfiability:

2.3.1 Types in Disjunctive Normal Form

In addition to using DNF, it will be helpful to impose some additional structure on our normal form for types. To illustrate, first let us note that a DNF boolean formula image:

image

can be reorganized slightly to "group" the positive and negative atoms in each conjunction:

image

We then observe that image can be described as a set of pairs, with one pair image for each clause in the original disjunction, where image is the set of positive atoms in the clause and image is the set of negated atoms in the clause:

image
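
As a small concrete (boolean) example of this encoding (an illustration of our own, not from the original figures):

    (x ∧ y ∧ ¬z) ∨ (¬x ∧ w)   becomes   { ({x, y}, {z}), ({w}, {x}) }

i.e. one (P, N) pair per clause, with P holding that clause's positive atoms and N its negated atoms.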

Similarly, any type image can be converted into a DNF, i.e. a set of pairs image, where for each clause image, image contains the positive atoms (written a) which are either a base type (image), a product type (image), or function type (image), and image contains the negated atoms:

image

2.3.2 Partitioning Types

In addition to being able to convert any type into DNF, for any type image there exist three specialized types image, image, and image which contain only atoms of the same kind such that:

image

By representing a type in this way, we can efficiently divide types into non-overlapping segments which can each have their own DNF representation.

i.e., image is a type whose atoms are all base types:

image

image is a DNF type whose atoms are all product types:

image

and image is a DNF type whose atoms are all function types:

image

To illustrate what this partitioning looks like in practice, here are a few very simple types and their equivalent "partitioned" representation:

image image image image image image

This technique for partitioning types into separate non-overlapping DNFs—which will inform our strategy for actually representing types as data structures—will make type inhabitation inquiries easier to implement since we’re specializing our representation to describe only the interesting, non-trivial clauses in a type. We summarize this discussion’s key takeaway in figure 8 for reference.

image

Figure 8: canonical form for representing types

3 Set-theoretic Type Representation

In Set-theoretic Types: An Overview we determined that

In this section we focus on the latter issue: type representation (since how we represent types impacts how our algorithms decide type inhabitation). We will introduce several data structures, defining for each the binary operators union ("∪"), intersection ("∩"), and difference ("\") and the unary operator complement ("¬").

3.1 Types as Data Structures

In figure 8 we noted a type can be conveniently deconstructed into three partitions, allowing us to reason separately about the base type (image), product type (image), and function type (image) portion of a type:

image

Our representation of types will exactly mirror this structure.

image

Figure 9: internal type representation

As figure 9 indicates, our internal representation of types is a 3-tuple:

The specific top types used in figure 8 are implicit in our representation, i.e. we know what kind of type-information each field is responsible for so we need not explicitly keep around image, image, and image in our partitioned representation.

The grammar and meaning for image will be given in Base DNF Representation and for image and image will be given in Product and Function DNFs.

3.1.1 Top and Bottom Type Representation

The representation of the "top type" image—the type that denotes all values—is written image and is defined by placing the top image, image, and image in each respective field (figure 10).

image

Figure 10: top type representation

This mirrors the previous "partitioned" version of image we showed earlier:

image

The representation of the "bottom type" image—the type that denotes no values—is written image and is defined by placing the bottom image, image, and image in each respective field (figure 11).

image

Figure 11: bottom type representation

Again, this mirrors the previous "partitioned" version of image we showed earlier:

image

Figure 12 describes how we represent the specific top types image, image, and image as data structures.

image

Figure 12: specific top type representations

3.1.2 Type Operations

Binary operations on types benefit from our partitioned design: each is defined pointwise in the natural way on each partition; type complement is defined in terms of type difference, subtracting the negated type from the top type (figure 13).

image

Figure 13: internal type operations
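
To make the shape of this representation and its operations concrete, here is a sketch of our own in OCaml (all names are ours; the three component representations are kept abstract here and are filled in over the next sections):

    (* the three-field representation from figure 9 *)
    type ('b, 'p, 'f) ty = { base : 'b; prods : 'p; arrows : 'f }

    (* what we need from each partition's representation *)
    type 'a ops =
      { top   : 'a;                  (* that partition's "everything" *)
        bot   : 'a;                  (* that partition's "nothing"    *)
        union : 'a -> 'a -> 'a;
        inter : 'a -> 'a -> 'a;
        diff  : 'a -> 'a -> 'a }

    (* binary operations are pointwise on the three fields (figure 13) *)
    let lift op_b op_p op_f t1 t2 =
      { base   = op_b t1.base   t2.base;
        prods  = op_p t1.prods  t2.prods;
        arrows = op_f t1.arrows t2.arrows }

    let ty_ops (b : 'b ops) (p : 'p ops) (f : 'f ops) =
      let any_ty = { base = b.top; prods = p.top; arrows = f.top } in  (* figure 10 *)
      let bot_ty = { base = b.bot; prods = p.bot; arrows = f.bot } in  (* figure 11 *)
      (* figure 12's specific top types are built the same way, e.g. the top
         product type is { base = b.bot; prods = p.top; arrows = f.bot } *)
      let union = lift b.union p.union f.union in
      let inter = lift b.inter p.inter f.inter in
      let diff  = lift b.diff  p.diff  f.diff  in
      let neg t = diff any_ty t in     (* complement = difference from top *)
      any_ty, bot_ty, union, inter, diff, neg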

3.2 Base DNF Representation

We now examine how a DNF type with only base type atoms can be efficiently represented (i.e. the base portion image of a type described in figure 8 and the image field in our representation of types described in figure 9).

Although any type can be represented by some DNF type, in the case of base types things can be simplified even further! Any DNF type image whose atoms are all base types is equivalent to either a union of base types or the negation of such a union.

To see why this is the case, it may be helpful to recall that the base types are pairwise disjoint (i.e. no value inhabits more than one base type), to note that this is obviously true for image, image, and any single base type image or negated base type image, and then to examine the definitions of the base type operations presented in figure 15, noting how the representation is naturally maintained.

Because any DNF of base types can be represented by a set of base types (i.e. the elements in the union) and a polarity (i.e. is the union negated or not), we represent the base portion of a type image using a tuple with these two pieces of information (figure 14).

image

Figure 14: internal base type representation

The first field is the polarity flag (either image for a union or image for a negated union) and the second field is the set of base types image in the union.

The top base type (i.e. the type which denotes all base type values) is a negated empty set image (i.e. it is not the case that this type contains no base values) and the bottom base type (the type which denotes no base type values) is a positive empty set image (i.e. it is the case that this type contains no base values).
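
Here is a sketch of this representation in OCaml (our own rendering; the particular base types listed are illustrative stand-ins):

    module Base = struct
      type t = Int | Str | True | False
      let compare = compare
    end
    module BaseSet = Set.Make (Base)

    (* a polarity flag together with a finite set of base types (figure 14) *)
    type base_dnf =
      | Pos of BaseSet.t    (* the union of the listed base types      *)
      | Neg of BaseSet.t    (* everything except the listed base types *)

    let top_base : base_dnf = Neg BaseSet.empty   (* all base-type values *)
    let bot_base : base_dnf = Pos BaseSet.empty   (* no base-type values  *)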

3.2.1 Base DNF Operations

Operations on these base type representations boil down to selecting the appropriate set-theoretic operation to combine the sets based on the polarities (figure 15).

image

Figure 15: internal base DNF operations

Base type negation is not shown (because it is not used anywhere in this model), but would simply require "flipping" the polarity flag (i.e. the first field in the tuple).
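
Continuing the sketch, the operations reduce to picking the right finite-set operation based on the two polarity flags; each case below can be checked with ordinary set reasoning (the comments note the identities being used):

    let base_union (b1 : base_dnf) (b2 : base_dnf) : base_dnf =
      match b1, b2 with
      | Pos s1, Pos s2 -> Pos (BaseSet.union s1 s2)
      | Pos s1, Neg s2 -> Neg (BaseSet.diff s2 s1)   (* s1 ∪ ¬s2 = ¬(s2 \ s1) *)
      | Neg s1, Pos s2 -> Neg (BaseSet.diff s1 s2)
      | Neg s1, Neg s2 -> Neg (BaseSet.inter s1 s2)  (* ¬s1 ∪ ¬s2 = ¬(s1 ∩ s2) *)

    let base_inter (b1 : base_dnf) (b2 : base_dnf) : base_dnf =
      match b1, b2 with
      | Pos s1, Pos s2 -> Pos (BaseSet.inter s1 s2)
      | Pos s1, Neg s2 -> Pos (BaseSet.diff s1 s2)   (* s1 ∩ ¬s2 = s1 \ s2 *)
      | Neg s1, Pos s2 -> Pos (BaseSet.diff s2 s1)
      | Neg s1, Neg s2 -> Neg (BaseSet.union s1 s2)  (* ¬s1 ∩ ¬s2 = ¬(s1 ∪ s2) *)

    let base_diff (b1 : base_dnf) (b2 : base_dnf) : base_dnf =
      match b1, b2 with
      | Pos s1, Pos s2 -> Pos (BaseSet.diff s1 s2)
      | Pos s1, Neg s2 -> Pos (BaseSet.inter s1 s2)  (* s1 \ ¬s2 = s1 ∩ s2 *)
      | Neg s1, Pos s2 -> Neg (BaseSet.union s1 s2)  (* ¬s1 \ s2 = ¬(s1 ∪ s2) *)
      | Neg s1, Neg s2 -> Pos (BaseSet.diff s2 s1)   (* ¬s1 \ ¬s2 = s2 \ s1 *)

    (* negation (unused in the model, as noted) just flips the polarity flag *)
    let base_neg = function Pos s -> Neg s | Neg s -> Pos s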

3.3 Product and Function DNFs

In order to efficiently represent a DNF type with only product or function type atoms (i.e. the image and image portions of a type described in figure 8 and the image and image fields in our type representation described in figure 9) we will use a binary decision diagram (BDD).

First we include a brief review of how BDDs work, then we discuss how they can be used effectively to represent our product/function DNF types.

3.3.1 Binary Decision Diagrams

A binary decision diagram (BDD) is a tree-like data structure which provides a convenient way to represent sets or relations.

For example, the boolean formula image has truth table:

image     image     image     image

  0         0         0         1
  0         0         1         0
  0         1         0         0
  0         1         1         1
  1         0         0         0
  1         0         1         0
  1         1         0         1
  1         1         1         1

(The first three columns are the formula's variables; the right-most column is the formula's value.)

and can be represented with the following BDD:

image

Each node in the tree contains a boolean variable. A node’s left subtree (the blue edge) describes the "residual formula" for when that variable is true. The node’s right subtree (the red edge) describes the "residual formula" when that variable is false. We invite the reader to compare the truth table and corresponding BDD until they are convinced they indeed represent the same boolean formula. It may be useful to observe that the leaves in the BDD correspond to the right-most column in the truth table.
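
A minimal sketch of this reading in OCaml (our own illustration, not part of the tutorial's model): each node holds a variable, the first subtree is taken when the variable is true, and the second when it is false.

    type bexp_bdd =
      | Leaf of bool
      | Choice of string * bexp_bdd * bexp_bdd   (* variable, when-true, when-false *)

    (* evaluate a BDD under an assignment of variables to booleans *)
    let rec eval (env : string -> bool) (b : bexp_bdd) : bool =
      match b with
      | Leaf v -> v
      | Choice (x, when_true, when_false) ->
        if env x then eval env when_true else eval env when_false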

3.3.2 Types as BDDs?

BDDs can also naturally encode set-theoretic types (in our case, DNF product or function types). Each node has a function/product type associated with it; henceforth we will call this associated type the atom of the node. A node’s left sub-tree (the blue edge) describes the "residual type" for when the atom is included in the overall type. A node’s right sub-tree (the red edge) describes the "residual type" for when the atom’s negation is included in the overall type.

For example, here we have encoded the type image:

image

and here is image:

image

Basically, each path in the tree represents a clause in the overall DNF, so the overall type is the union of all the possibly inhabited paths (i.e. paths that end in image).

In other words, for an arbitrary (type) BDD

image

we would interpret the meaning of image (written〚image〛) as follows:

image

where image is interpreted as image and image as image.

There is, however, a well-known problem with BDDs we will want to avoid: repeatedly unioning trees can lead to significant increases in size. This is particularly frustrating because—as we have previously noted—our primary concern algorithmically is deciding type inhabitation and taking the union of two types will have no interesting impact with respect to inhabitation. This is because the union of two types is uninhabited only when both the individual types themselves are already uninhabited.

3.3.3 Types as Lazy BDDs!

Because there is no interesting impact on inhabitation when computing unions, we can use "lazy" BDDs—whose unions only fully expand when computing type intersection or difference (i.e. operations that can have an interesting impact on inhabitation)—to represent our function/product DNF types.

Nodes in lazy BDDs have—in addition to the left and right subtrees described before—a "middle" subtree (the grey edge) which assumes nothing about its node’s atom.

In other words, for an arbitrary lazy (type) BDD

image

we would interpret the meaning of image (written〚image〛) as follows:

image

where image is interpreted as image and image as image.

image

Figure 16: lazy function/product type BDDs

Figure 16 describes in detail our representation for the DNF function/product portions of a type as lazy BDDs. Note that

Although not explicit in the grammar, these trees are constructed using an ordering on atoms that we leave up to the reader to implement (note that this implies types, BDDs, etc must all also have an ordering defined since these data structures are mutually dependent). A simple lexicographic ordering would work... a fast (possibly non-deterministic?) hash code-based ordering should also work... the choice is yours. The ordering—written image and the like—will be called upon frequently in function definitions for BDDs in the next section. The ordering allows us to have consistent representations for BDDs with similar elements.

To make some definitions more clear, "accessor functions" which extract the various fields of a node are defined in figure 17.

image

Figure 17: accessors for node fields

Finally, we use a "smart constructor"—defined in figure 18—to perform some obvious simplifications when constructing a BDD node. We use an implicit syntax for the smart constructor (i.e. it looks identical to constructing a normal node), so whenever we construct a node (except of course on the right-hand side of the definition in figure 18) we are using the smart constructor.

image

Figure 18: simplifying constructor for BDD nodes
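
Here is a sketch of this representation in OCaml (names are ours). The smart constructor below performs two simplifications that follow directly from reading a node as (atom ∧ left) ∨ middle ∨ (¬atom ∧ right); figure 18 may simplify further, so treat this as a minimal version:

    type 'a bdd =
      | Top                                       (* denotes everything *)
      | Bot                                       (* denotes nothing    *)
      | Node of 'a * 'a bdd * 'a bdd * 'a bdd     (* atom, left, middle, right *)

    (* accessors in the spirit of figure 17 *)
    let atom  = function Node (a, _, _, _) -> a | _ -> invalid_arg "atom"
    let left  = function Node (_, l, _, _) -> l | _ -> invalid_arg "left"
    let mid   = function Node (_, _, m, _) -> m | _ -> invalid_arg "mid"
    let right = function Node (_, _, _, r) -> r | _ -> invalid_arg "right"

    (* a "smart constructor" performing two obvious simplifications *)
    let node a l m r =
      if m = Top then Top                    (* ... ∨ ⊤ ∨ ...        is ⊤ *)
      else if l = r && m = Bot then l        (* (a ∧ l) ∨ (¬a ∧ l)   is l *)
      else Node (a, l, m, r)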

3.3.4 Lazy BDD Operations

The operations on lazy BDDs can be understood by again considering how we logically interpret a BDD:

image

Also, recall that BDD binary operations will only ever be used on two BDDs with atoms of the same kind.

Figure 19 describes BDD union, i.e. logical "or".

image

Figure 19: BDD union

Figure 20 describes BDD intersection, i.e. logical "and".

image

Figure 20: BDD intersection

Figure 21 describes BDD difference.

image

Figure 21: BDD difference

Figure 22 describes BDD complement, i.e. logical "not".

image

Figure 22: BDD complement
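
Continuing the sketch, here is one way to implement these operations (our own derivation from the node reading (a ∧ left) ∨ middle ∨ (¬a ∧ right); figures 19-22 are the authoritative definitions). Union stays "lazy" by parking the other BDD in a middle edge, while intersection and difference force the middle edges apart. We use OCaml's polymorphic compare for the atom ordering discussed earlier:

    let rec union b1 b2 =
      match b1, b2 with
      | Top, _ | _, Top -> Top
      | Bot, b | b, Bot -> b
      | Node (a1, l1, m1, r1), Node (a2, l2, m2, r2) ->
        (match compare a1 a2 with
         | 0            -> node a1 (union l1 l2) (union m1 m2) (union r1 r2)
         | c when c < 0 -> node a1 l1 (union m1 b2) r1   (* lazily absorb b2 *)
         | _            -> node a2 l2 (union m2 b1) r2)

    let rec inter b1 b2 =
      match b1, b2 with
      | Bot, _ | _, Bot -> Bot
      | Top, b | b, Top -> b
      | Node (a1, l1, m1, r1), Node (a2, l2, m2, r2) ->
        (match compare a1 a2 with
         | 0 -> node a1 (inter (union l1 m1) (union l2 m2)) Bot
                        (inter (union r1 m1) (union r2 m2))
         | c when c < 0 -> node a1 (inter (union l1 m1) b2) Bot (inter (union r1 m1) b2)
         | _            -> node a2 (inter b1 (union l2 m2)) Bot (inter b1 (union r2 m2)))

    let rec diff b1 b2 =
      match b1, b2 with
      | Bot, _ | _, Top -> Bot
      | b, Bot -> b
      | Top, Node (a, l, m, r) ->
        node a (diff Top (union l m)) Bot (diff Top (union r m))
      | Node (a1, l1, m1, r1), Node (a2, l2, m2, r2) ->
        (match compare a1 a2 with
         | 0 -> node a1 (diff (union l1 m1) (union l2 m2)) Bot
                        (diff (union r1 m1) (union r2 m2))
         | c when c < 0 -> node a1 (diff (union l1 m1) b2) Bot (diff (union r1 m1) b2)
         | _            -> node a2 (diff b1 (union l2 m2)) Bot (diff b1 (union r2 m2)))

    (* complement, again read directly off of the node interpretation *)
    let rec neg = function
      | Top -> Bot
      | Bot -> Top
      | Node (a, l, m, r) -> node a (neg (union l m)) Bot (neg (union r m))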

3.4 Parsing and Example Types

Figure 23 defines a function that converts the user-friendly types shown in figure 5 into the internal representation we have just finished describing:

image

Figure 23: type parsing function

Here are a few simple examples of what types look like once parsed:

image

image

image

image

4 Implementing Semantic Subtyping

Because we are working with set-theoretic types, we are free to define subtyping purely in terms of type inhabitation (see figure 7), which is precisely what we do (figure 24).

image

Figure 24: (semantic) subtyping, defined in terms of type emptiness

In the remainder of this section we examine how to decide type inhabitation using the data structures introduced in Set-theoretic Type Representation.

4.1 Deciding Type Inhabitation

A DNF type is uninhabited exactly when each clause in the overall disjunction is uninhabited. With our DNF types partitioned into base, product, and function segments (see figure 8):

image

we simply need ways to check if the base component (image), product component (image), and function component (image) each are empty.

image

Figure 25: type emptiness predicate

As figure 25 suggests, the representation of the base type portion is simple enough that we can pattern match on it directly to check if it is empty (recall that image is the bottom/empty image).

For deciding if the product and function components—which are represented with lazy BDDs (see previous discussion in Types as Lazy BDDs!)—are empty, we rely on helper functions image and image which are defined in Product Type Inhabitation and Function Type Inhabitation respectively.
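
A sketch tying these pieces together (assuming the record type, base_dnf, and bot_base from the earlier sketches; the two BDD emptiness checks are the functions sketched in the next two subsections, passed in here as parameters):

    (* the base portion is empty exactly when it is the bottom representation *)
    let base_is_empty (b : base_dnf) : bool = (b = bot_base)

    (* a type is uninhabited iff all three partitions are (figure 25) *)
    let ty_is_empty ~prods_empty ~arrows_empty t =
      base_is_empty t.base && prods_empty t.prods && arrows_empty t.arrows

    (* subtyping via inhabitation (figure 24): t1 ≤ t2 iff t1 \ t2 is empty *)
    let subtype ~diff ~is_empty t1 t2 = is_empty (diff t1 t2)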

In these sections, we will use non-terminals image and image to represent a collection of atoms (see figure 26).

image

Figure 26: sets of atoms

4.1.1 Product Type Inhabitation

To decide if the product portion of a type is uninhabited, we recall (from Partitioning Types) that it is a union of conjunctive clauses, each of which can be described by a pair of sets (P,N), where P contains the positive product types and N the negated product types:

image

For image to be uninhabited, each such clause must be uninhabited. Checking that a given clause (P,N) is uninhabited occurs in two steps:

  1. accumulate the positive type information in P into a single product image (i.e. fold over the products in P, accumulating their pairwise intersection in a single product),

  2. check that for each N′ ⊆ N the following holds:

image

The first step is justified because pairs are covariant in their fields, i.e. if something is a image and a image then it is also a image.

The second step is more complicated. To understand it, first note that if we know something is a product of some sort and also that it is of type image, then either it is a product whose first field is image or it is a product whose second field is image (i.e. logically we are applying De Morgan's law). And for the clause to be uninhabited, it must be uninhabited for both possibilities (again, intuitively an "or" is false only when all of its elements are false). So by exploring each subset image and verifying that either the left-hand side of the product is empty once the negated left-hand sides in N′ are subtracted, or the right-hand side is empty once the negated right-hand sides in the complement (i.e. image) are subtracted, we explore all possible combinations of negated first and second fields from N and ensure each possible combination is indeed uninhabited.

We describe an algorithm to perform these computations in figure 27. The function image walks over each path in the product BDD accumulating the positive information in a product image and the negative information in the set N. Then at each non-trivial leaf in the BDD, we call the helper function image which searches the space of possible negation combinations ensuring that for each possibility the pair ends up being uninhabited.

image

Figure 27: functions for checking if a product BDD is uninhabited

Note that image is designed with a "short-circuiting" behavior, i.e. as we are exploring each possible combination of negations, if a negated field we are considering would negate the corresponding positive field (e.g. image) then we can stop searching for emptiness on that side, otherwise we subtract that negated type from the corresponding field and we keep searching the remaining possible negations checking for emptiness. If we reach the base case when N is the empty set, then we have failed to show the product is empty and we return false. (Note that image checks for emptiness before calling image, otherwise we would need to check image and image for emptiness in the base case of image).
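
Here is a sketch of that algorithm over the 'a bdd representation from earlier, parameterized over the type-level operations it needs; the helper names phi and theta are ours, standing in for the functions of figure 27:

    let prod_bdd_empty ~any ~inter ~diff ~empty bdd =
      (* s ≤ t exactly when s \ t is empty *)
      let subtype s t = empty (diff s t) in
      (* explore every way of splitting the negated products in n between the
         two fields, short-circuiting when one side is fully negated away *)
      let rec theta s1 s2 n =
        match n with
        | [] -> false                            (* no split worked: inhabited *)
        | (t1, t2) :: rest ->
          (subtype s1 t1 || theta (diff s1 t1) s2 rest)
          && (subtype s2 t2 || theta s1 (diff s2 t2) rest)
      in
      (* walk every path, accumulating positive info in (s1, s2) and negated
         products in n; every path must turn out to be uninhabited *)
      let rec phi s1 s2 n b =
        match b with
        | Bot -> true
        | Top -> empty s1 || empty s2 || theta s1 s2 n
        | Node ((t1, t2), l, m, r) ->
          phi (inter s1 t1) (inter s2 t2) n l
          && phi s1 s2 n m
          && phi s1 s2 ((t1, t2) :: n) r
      in
      phi any any [] bdd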

4.1.2 Function Type Inhabitation

Just like with products, to show that the function portion of a type is uninhabited we show that each clause in the DNF—

image

i.e. each pair (P,N) where P contains the positive function types and N the negated function types of the clause, represents an uninhabited function type. To do this, we show that for each clause (P,N) there exists a (image) image such that

image

(i.e. image is in the domain of the function) and that for each image,

image

Basically we are verifying that for each possible set of arrows P′ which must handle a value of type image (i.e. the left-hand check fails), those arrows must map the value to image (the right-hand check), which is a contradiction since we know this function is not of type image and therefore this function type is uninhabited.

image

Figure 28: functions for checking if a function BDD is uninhabited

We implement this algorithm with the function image defined in figure 28. It walks each path in a function BDD accumulating the domain along the way and collecting the negated function types in the variable image. At the non-trivial leaves of the BDD, it calls image with each function type image image until it finds a contradiction (i.e. an arrow that satisfies the above described equation) or runs out of negated function types.

image is the function which explores each set of arrows P′ ⊆ P checking that one of the two clauses in the above noted disjunction is true. Note that in the initial call from image we negate the original image: this is because although we are interested in image, the equivalent "contrapositive" statement image is more convenient to accumulatively check as we iterate through the function types in image.

In the base case of image when P has been exhausted, the function checks that either the arrows not in P′ could have handled the value of (the original) type image (i.e. is image now empty), otherwise it checks if the value we mapped the input to must be a subtype of (the original) type image (i.e. is image now empty).

In the case where image has not been exhausted, we examine the first arrow image in image and check two cases: one for when that arrow is not in image (i.e. when it is in image) and one for when it is in image.

The first clause in the conjunction of the non-empty P case is for when image is not in image. It first checks if the set of arrows we’re not considering (i.e. P \ P′) would handle a value of type image (i.e. image), and if not it remembers that image is not in P′ by subtracting image from image for the recursive call which keeps searching.

The second clause in the conjunction is for when image is in image. As we noted, instead of checking image (resembling the original mathematical description above), it turns out to be more convenient to check the contrapositive statement image (recall that image was actually negated originally when image was called). First we check if having image in image means we would indeed map a value of type image to a value of type image (i.e. the image check). If so we are done, otherwise we recur while remembering that image is in P′ by adding image to image (i.e. "subtracting" negated image from the negated image we are accumulating by using intersection).
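
Here is a sketch of that algorithm in the same parameterized style (atoms here are (domain, range) pairs, and the names walk and theta are ours, standing in for the functions of figure 28):

    let fun_bdd_empty ~any ~union ~inter ~diff ~empty bdd =
      let bot = diff any any in                       (* the empty type  *)
      let neg t = diff any t in                       (* type complement *)
      let subtype s t = empty (diff s t) in
      (* theta explores each P′ ⊆ p: t accumulates the part of the argument
         type not yet covered by arrows kept out of P′, and s accumulates the
         negated result type intersected with the ranges of arrows in P′ *)
      let rec theta t s p =
        match p with
        | [] -> empty t || empty s
        | (t1, s1) :: rest ->
          (empty (diff t t1) || theta (diff t t1) s rest)        (* arrow not in P′ *)
          && (empty (inter s s1) || theta t (inter s s1) rest)   (* arrow in P′     *)
      in
      (* walk every path: dom is the union of the positive arrows' domains,
         p the positive arrows, n the negated arrows *)
      let rec walk dom p n b =
        match b with
        | Bot -> true
        | Top ->
          (* some negated arrow t -> s must yield a contradiction *)
          List.exists (fun (t, s) -> subtype t dom && theta t (neg s) p) n
        | Node (((t1, _) as a), l, m, r) ->
          walk (union dom t1) (a :: p) n l
          && walk dom p n m
          && walk dom p (a :: n) r
      in
      walk bot [] [] bdd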

5 Other Key Type-level Functions

In addition to being able to decide type inhabitation, we need to be able to semantically calculate types for the following situations:

  1. projection from a product,

  2. a function’s domain, and

  3. the result of function application.

5.1 Product Projection

In a language with syntactic types, calculating the type of the first or second projection of a pair simply involves matching on the product type and extracting the first or second field. In a language with semantic types, however, we could be dealing with a complex pair type which uses numerous set-theoretic constructors and we can no longer simply pattern match to determine the types of its projections. Instead, we must reason semantically about the fields.

To begin, first note that if a type is a subtype of image (i.e. it is indeed a pair), we can focus on the product portion of the type:

image

Projecting the field image from image (where image ∈ {1,2}) involves unioning each positive type for field image in the DNF intersected with each possible combination of negations for that field:

image

This calculation follows the same line of reasoning involved with deciding product type inhabitation (see Product Type Inhabitation), i.e. it considers each logical clause in the DNF of the type and unions them.

Actually that equation is sound but a little too coarse: it only considers the type of field image and thus may include some impossible cases where the other field would have been image. In other words, if j is an index and j ≠ i (i.e. it is the index of the other field), then as we’re calculating the projection of i, we’ll want to "skip" any cases N′ where the following is true:

image

i.e. cases where the other field j is uninhabited. If we incorporate that subtlety, our inner loop will end up containing a conditional statement:

image

5.1.1 Implementing Product Projection

As was suggested by our use of index variables i and j in the previous section’s discussion, we implement product projection as a single function indexed by some image {1,2} and use image (defined in figure 29) to return the appropriate type in non-empty clauses.

image

Figure 29: function for selecting a type during product projection

Because projection can fail, we have the function image as the "public interface" to projection. image performs important preliminary checks (i.e. is this type actually a product?) before extracting the product portion of the type and passing it to the "internal" function image where the real work begins.

image

Figure 30: functions for projecting from a product type

image walks the BDD, accumulating for each path (i.e. each clause in the DNF) the positive type information for each field in variables image and image respectively. Along the way, if either image or image are empty we can ignore that path. Otherwise at non-trivial leaves we call the helper function image which traverses the possible combinations of negations, calculating and unioning the type of field image for each possibility.
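
A sketch of the projection walk in the same style (figure 29's field selection becomes the little select helper; the public wrapper that first checks the type is a subtype of the top product type is omitted here, and all names are ours):

    let project_bdd ~any ~bot ~union ~inter ~diff ~empty i bdd =
      let select s1 s2 = if i = 1 then s1 else s2 in     (* figure 29 *)
      (* union the possibilities for field i over every way of distributing
         the negated products in n across the two fields, skipping any
         combination that makes a field empty (such pairs are uninhabited) *)
      let rec proj_theta s1 s2 n =
        match n with
        | [] -> select s1 s2
        | (t1, t2) :: rest ->
          let s1' = diff s1 t1 and s2' = diff s2 t2 in
          union
            (if empty s1' then bot else proj_theta s1' s2 rest)
            (if empty s2' then bot else proj_theta s1 s2' rest)
      in
      (* walk every path, accumulating positives in (s1, s2), negatives in n *)
      let rec walk s1 s2 n b =
        match b with
        | Bot -> bot
        | Top -> if empty s1 || empty s2 then bot else proj_theta s1 s2 n
        | Node ((t1, t2), l, m, r) ->
          union (walk (inter s1 t1) (inter s2 t2) n l)
                (union (walk s1 s2 n m) (walk s1 s2 ((t1, t2) :: n) r))
      in
      walk any any [] bdd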

5.2 Function Domain

Similar to product projection, deciding the domain of a function in a language with set-theoretic types cannot be done using simple pattern matching; we must reason about the domain of a function type potentially constructed with intersections and/or unions.

To do this, first note that for an intersection of arrows, the domain is equivalent to the union of each of the domains (i.e. the function can accept any value that any of its arrows accepts):

image

Second, note that for a union of arrows, the domain is equivalent to the intersection of each of the domains (i.e. the function can only accept values that each of the arrows can accept):

image

With those two points in mind, we can deduce that the domain of an arbitrary function type

image

is the following intersection of unions:

image

5.2.1 Implementing Function Domain

We perform those domain calculations with the functions defined in figure 31.

image

Figure 31: domain calculation for function types

image first checks if the type is indeed a function (i.e. is it a subtype of image); if so, it calls image with the function portion of the type (image) to begin traversing the BDD, calculating the intersection of the unions of the respective domains.
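
A sketch of the BDD walk in the same parameterized style (our own rendering of figure 31; the wrapper that first checks the type is a function is omitted). One choice to note: a path ending in the bottom BDD describes no functions, so in this sketch it places no constraint on the domain and contributes the top type to the intersection.

    let domain_bdd ~any ~bot ~union ~inter bdd =
      (* dom accumulates the union of the positive arrows' domains on a path *)
      let rec walk dom b =
        match b with
        | Bot -> any        (* an impossible clause places no constraint *)
        | Top -> dom        (* this clause accepts exactly dom           *)
        | Node ((t1, _), l, m, r) ->
          inter (walk (union dom t1) l)        (* positive arrow: grow dom  *)
                (inter (walk dom m) (walk dom r))  (* negated arrows: no change *)
      in
      walk bot bdd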

5.3 Function Application

When applying an arbitrary function to a value, we must be able to determine the type of the result. If the application is simple, e.g. a function of type image applied to an argument of type image, calculating the result is trivial (image). However, when we are dealing with an arbitrarily complicated function type which could contain set-theoretic connectives, deciding the return type is a little more complicated. As we did in the previous section, let us again reason separately about how we might apply intersections and unions of function types to guide our intuition.

In order to apply a union of function types, the argument type image of course would have to be in the domain of each function (see the discussion in the previous section). The result type of the application would then be the union of the ranges:

image

This corresponds to the logical observation that if we know P and that either P implies Q or P implies R, then we can conclude that either Q or R holds.

When applying an intersection of function types, the result type is the combination (i.e. intersection) of each applicable arrow’s range. This more or less corresponds to the logical observation that if we know P and that both P implies Q and P implies R, then we can conclude that Q and R hold.

Combining these lines of reasoning we can deduce that when considering a function type

image

being applied to an argument of type image, we first verify that image is in the domain of image (i.e. using image) and then calculate the result type of the application as follows:

image

Basically, we traverse each clause in the DNF of the function type (i.e. each pair (P,N)) unioning the results. In each clause (P,N), we consider each possible set of arrows P′ in P and if that set would necessarily have to handle a value of type image. For those sets P′ that would necessarily handle the input, we intersect their arrow’s result types (otherwise we ignore it by returning image for that clause). This reasoning resembles that which was required to decide function type inhabitation (see Function Type Inhabitation), i.e. both are considering which combinations of arrows necessarily need to be considered to perform the relevant calculation.

5.3.1 Implementing Function Application

image

Figure 32: function application result type calculations

Figure 32 describes the functions which calculate the result type for function application. image first ensures that the alleged function type is indeed a function with the appropriate domain before calling image to calculate the result type of the application. image then traverses the BDD combining recursive results via union. As it traverses down a BDD node’s left edge (i.e. when a function type is a member of a set P) it makes two recursive calls: one for when that arrow is in P′ (where we intersect the arrow’s range image with the result type accumulator image) and one for when it is not in P′ (where we subtract image from the argument type parameter image to track if the arrows in P \ P′ can handle the argument type). At non-trivial leaves where image is not empty (i.e. when we’re considering a set of arrows P′ which necessarily would need to handle the argument) we return the accumulated range type (image) for that set of arrows. Note that we can "short-circuit" the calculation when either of the accumulators (image and image) are empty, which is important to keeping the complexity of this calculation reasonable.
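
A sketch of the worker described above, in the same parameterized style (our own rendering of figure 32's traversal; the public entry point, which checks that the type is a function whose domain covers the argument, is omitted):

    let apply_bdd ~any ~bot ~union ~inter ~diff ~empty arg bdd =
      let rec walk arg res b =
        match b with
        | Bot -> bot
        | Top ->
          (* if the arrows we set aside do not cover arg, the arrows we kept
             must handle it, so their accumulated range res is a possible
             result; otherwise this combination contributes nothing *)
          if empty arg then bot else res
        | Node ((t1, s1), l, m, r) ->
          (* positive arrow: either we count on it (intersect its range into
             res) or we do not (subtract its domain from arg); short-circuit
             when an accumulator becomes empty *)
          let kept =
            let res' = inter res s1 in
            if empty res' then bot else walk arg res' l
          in
          let skipped =
            let arg' = diff arg t1 in
            if empty arg' then bot else walk arg' res l
          in
          union (union kept skipped) (union (walk arg res m) (walk arg res r))
      in
      walk arg any bdd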

6 Strategies for Testing

For testing an implementation of the data structures and algorithms described in this tutorial there are some convenient properties we can leverage:

With these properties in mind, in addition to simple hand-written "unit tests" we can use a tool such as QuickCheck (Claessen and Hughes 2000) or redex-check (Fetscher et al. 2015) to generate random types and verify that our implementation respects well-known set-theoretic properties. Additionally, for algorithms which have both a naive and an efficient description, we can write both implementations and feed them random input while checking that their outputs are always equivalent.
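
For instance, a property-based harness might look like the following sketch (the random type generator gen and the bundle of operations under test are assumed to be supplied by the implementation; the properties themselves are standard set-theoretic identities):

    type 'ty ops_under_test =
      { union : 'ty -> 'ty -> 'ty; inter : 'ty -> 'ty -> 'ty;
        diff  : 'ty -> 'ty -> 'ty; neg   : 'ty -> 'ty;
        empty : 'ty -> bool;       subtype : 'ty -> 'ty -> bool }

    let run_properties (gen : unit -> 'ty) (o : 'ty ops_under_test) =
      let equiv t1 t2 = o.subtype t1 t2 && o.subtype t2 t1 in
      let check name prop =
        for _i = 1 to 1000 do
          let t1 = gen () and t2 = gen () in
          if not (prop t1 t2) then failwith ("property failed: " ^ name)
        done
      in
      check "t1 <= t1 ∪ t2"       (fun t1 t2 -> o.subtype t1 (o.union t1 t2));
      check "t1 ∩ t2 <= t1"       (fun t1 t2 -> o.subtype (o.inter t1 t2) t1);
      check "(t1 \\ t2) ∩ t2 = ∅" (fun t1 t2 -> o.empty (o.inter (o.diff t1 t2) t2));
      check "De Morgan"           (fun t1 t2 ->
        equiv (o.neg (o.union t1 t2)) (o.inter (o.neg t1) (o.neg t2)));
      check "double negation"     (fun t1 _ -> equiv (o.neg (o.neg t1)) t1)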

The model we used to generate the pseudo code in this tutorial uses these testing strategies. At various points this approach helped us discover several subtle bugs in our implementation that simpler hand-written unit tests had not exposed.

Bibliography

Giuseppe Castagna. Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers). 2013. Unpublished manuscript, periodically updated.

Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proc. The 5th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2000.

Burke Fetscher, Koen Claessen, Michał Pałka, John Hughes, and Robert Bruce Findler. Making Random Judgments: Automatically Generating Well-Typed Terms from the Definition of a Type-System. In Proc. ESOP, 2015.

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types. Journal of the ACM 55(19), 2008.

Casey Klein, John Clements, Christos Dimoulas, Carl Eastlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon Rafkind, Sam Tobin-Hochstadt, and Robert Bruce Findler. Run Your Research: On the Effectiveness of Lightweight Mechanization. In Proc. The 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2012.