Down and Dirty with Semantic Set-theoretic Types (a tutorial) v0.4

Most of the "pseudo code" in this tutorial was generated directly from a functioning redex model (Klein et al. 2012) to greatly reduce the chance for typos or other subtle bugs in their presentation.

We would like to thank Giuseppe Castagna for the time he spent writing the wonderful, detailed manuscript Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers) (Castagna 2013) which we used to initially understand the implementation details we will discuss, as well as Alain Frisch, Giuseppe Castagna, and Véronique Benzaken for their detailed journal article Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types (Frisch et al. 2008) which contains thorough mathematical and technical descriptions of semantic subtyping. Without those works this tutorial would simply not be possible! We highly recommend perusing those works as well for readers who are interested in this material.

1.1 Prerequisites

We assume the reader has some mathematical maturity and is familiar with terminology commonly used by computer scientists for describing programming languages and types. In particular, we assume basic familiarity with:

how types are generally used to describe programs,
basic set-theoretic concepts,
context-free grammars (CFGs), and
defining functions via pattern matching.

1.2 Example Grammar and Function Definitions

To be clear about how our term and function definitions should be read we start by briefly examining a simple well-understood domain: Peano natural numbers.

Here is a grammar for Peano naturals and booleans:

Figure 1: a context-free grammar for natural numbers and booleans

This grammar states that a natural number is either (zero) or (i.e. the successor of some natural number p), and that a boolean must be either or .

We can then define the function to be addition by structural recursion on the first argument:

Figure 2: addition for natural numbers

Example usages:

will be defined to be a function that returns a indicating whether the first argument is strictly less than the second:

Figure 3: less than for natural numbers

Example usages:

will be defined to be a function that returns a indicating whether the first argument is strictly greater-than the second:

Figure 4: greater than for natural numbers

Example usages:

Things to note about these definitions (and about definitions in general):

The first line is the signature of the function and tells us how many arguments the function takes in addition to the types the inputs and outputs will have. e.g., takes two natural numbers and returns a natural number;
Underscores that appear in the signature of a function are simply indicators for where arguments will go for functions with non-standard usage syntax. e.g., does not use underscores because we use standard function application syntax to talk about its application (e.g. ), but and use underscores to indicate they will be used as infix operators (i.e. we will write inbetween the two arguments instead of at the front like );
Underscores used within the definition of the function (e.g. the second and third clauses for ) are "wild-card" patterns and will match any input;
The order of function clauses matters! i.e. when determining which clause of a function applies to a particular input, the function tests each clause in order to see if the input matches the pattern(s);
In addition to pattern matching against the arguments themselves, "where clauses" also act as "pattern matching constraints" when present, e.g. ’s first clause matches on any naturals and only where equals .
When the same non-terminal variable appears in two places in a pattern, that pattern matches only if the same term appears in both places. Note: this does not apply to function signatures, where non-terminals only indicate what kind of terms are accepted/returned.

Important Note: We will "overload" certain convenient symbols during the course of our discussion since we can always distinguish them based on the types of arguments they are given. e.g. we will define several "union" operations, all of which will use the standard set-theoretic symbol . But when we see and , for example, we will know the former usage of is the version explicitly defined to operate on ’s and the latter is the version explicitly defined to operate on ’s. Like the function definitions above, all function definitions will have clear signatures describing the kinds of arguments they operate on.

2 Set-theoretic Types: An Overview

Set-theoretic types are a flexible and natural way for describing sets of values, featuring intuitive "logical combinators" in addition to traditional types for creating detailed specifications.

Figure 5: set-theoretic types

As figure 5 illustrates, languages with set-theoretic types feature (at least some of) the following logical type constructors:

is the union of types and , describing values of type or of type ;
is the intersection of types and , describing values of both type and ;
is the complement (or negation) of type , describing values not of type ;
is the type describing all possible values; and
is the type describing no values (i.e. ).

Additionally, we may specify "specific top types", which for each kind of atomic type denotes all values of that particular kind:

is the type that denotes all pairs,
is the type that denotes all function, and
is the type that denotes all base values (i.e. integers, strings, and booleans).

Figure 6: specific top type definitions

Set theoretic types frequently appear in type systems which reason about dynamically typed languages (e.g. TypeScript, Flow, Typed Racket, Typed Clojure, Julia), but some statically typed languages use them as well (e.g. CDuce, Pony).

2.1 Subtyping

With set-theoretic types, the programmer (and the type system) must be able to reason about how types that are not equivalent relate. i.e., even though is not the same type as , is it the case that a value of type will necessarily also be a value of type ? In other words, does hold (i.e. is τ a subtype of σ)?

For example, consider the following subtyping question:

Clearly the two types are not equal... but we can also see that any pair whose first element is either an integer or a string and whose second element is a string (i.e. the type on the left-hand side) is indeed either a pair with an integer and a string or a pair with a string and a string (i.e. the type on the right-hand side). As a programmer then we might reasonably expect that anywhere a is expected, we could provide a value of type and things should work just fine.

Unfortunately, many (most?) systems that feature set-theoretic types use sound but incomplete reasoning to determine subtyping. This is because most type systems reason about subtyping via syntactic inference rules:

These rules allow us to conclude the statement below the line if we can show that the statement(s) above the line hold. Upsides to using a system built from rules like this include (1) the rules can often directly be translated into efficient code and (2) we can generally examine each rule individually and decide if the antecedants necessarily imply the consequent (i.e. determine if the rule is valid). The downside is that such rules are incomplete for set-theoretic types: it is impossible to derive all valid subtyping judgments, e.g. we cannot conclude is a subtype of even though it is true.

For a complete treatment of subtyping for set-theoretic types, a semantic (instead of a syntactic) notion of subtyping is required.

At the time of writing this tutorial, CDuce may be the only example of an in-use language with a type system which features set-theoretic types and complete subtyping. This is not surprising since its developers are also the researchers that have pioneered the approaches we will discuss.

2.2 Semantic Subtyping

Instead of using a syntactic approach to reason about subtyping, we will instead us a semantic approach: types will simply denote sets of values in the language in the expected way.

denotes singleton set ;
denotes singleton set ;
denotes the set of integers;
denotes the set of strings;
denotes the set of pairs whose first element is a value in and whose second element is a value in (i.e. the cartesian product of and );
denotes the set of functions which can be applied to a value in and will return a value from (if they return);
denotes the union of the sets denoted by and ;
denotes the intersection of the sets denoted by and ;
denotes the complement of the set denoted by ;
denotes the set of all values; and
denotes the empty set.

Our description here omits many interesting subtleties and details about why this approach more or less "just works"; Frisch et al. (2008) thoroughly discuss this topic and much more and should be consulted if the reader is so inclined.

With our types merely denoting sets of values, subtyping can be determined by deciding type inhabitation (figure 7).

Figure 7: subtyping/inhabitation equivalence

In other words, "is a particular type inhabited" is really the only question we have to be able to answer since asking is the same as asking if is uninhabited (i.e. does it denote the empty set?).

2.3 Deciding Inhabitation, Normal Forms

To efficiently decide type inhabitation for set-theoretic types we leverage some of the same strategies used to decide boolean satisfiability:

types are kept in disjunctive normal form (DNF), and
special data structures are used to efficiently represent DNF types.

2.3.1 Types in Disjunctive Normal Form

In addition to using DNF, it will be helpful to impose some additional structure to our normal form for types. To illustrate, first let us note that a DNF boolean formula :

can be reorganized slightly to "group" the positive and negative atoms in each conjunction:

We then observe that can be described as a set of pairs, with one pair for each clause in the original disjunction, where is the set of positive atoms in the clause and is the set of negated atoms in the clause:

Similarly, any type can be converted into a DNF, i.e. a set of pairs , where for each clause , contains the positive atoms (written a) which are either a base type (), a product type (), or function type (), and contains the negated atoms:

2.3.2 Partitioning Types

In addition to being able to convert any type into DNF, for any type there exists three specialized types , , and which contain only atoms of the same kind such that:

By representing a type in this way, we can efficiently divide types into non-overlapping segments which can each have their own DNF representation.

i.e., is a type whose atoms are all base types:

is a DNF type whose atoms are all product types:

and is a DNF type whose atoms are all function types:

To illustrate what this partitioning looks like in practice, here are a few very simple types and their equivalent "partitioned" representation:

This technique for partitioning types into separate non-overlapping DNFs—which will inform our strategy for actually representing types as data structures—will make type inhabitation inquiries easier to implement since we’re specializing our representation to describe only the interesting, non-trivial clauses in a type. We summarize this discussion’s key takeaway in figure 8 for reference.

Figure 8: canonical form for representing types

3 Set-theoretic Type Representation

In Set-theoretic Types: An Overview we determined that

many type-related inquiries for set-theoretic types can be reduced to deciding type inhabitation (see Semantic Subtyping), and that because of this
a partitioned DNF representation (summarized in figure 8) may be useful.

In this section we focus on the latter issue: type representation (since how we represent types impacts how our algorithms decide type inhabitation). We will introduce several data structures, defining for each the binary operators union ("∪"), intersection ("∩"), and difference ("\") and the unary operator complement ("¬").

3.1 Types as Data Structures

In figure 8 we noted a type can be conveniently deconstructed into three partitions, allowing us to reason separately about the base type (), product type (), and function type () portion of a type:

Our representation of types will exactly mirror this structure.

Figure 9: internal type representation

As figure 9 indicates, our internal representation of types is a 3-tuple:

(the first field) is the portion which contains base type information, corresponding to ;
(the second field) is the portion corresponding to product type information, corresponding to ; and
(the third field) is the portion corresponding to function type information, corresponding to ;

The specific top types used in figure 8 are implicit in our representation, i.e. we know what kind of type-information each field is responsible for so we need not explicitly keep around , , and in our partitioned representation.

The grammar and meaning for will be given in Base DNF Representation and for and will be given in Product and Function DNFs.

3.1.1 Top and Bottom Type Representation

The representation of the "top type" —the type that denotes all values—is written and is defined by placing the top , , and in each respective field (figure 10).

Figure 10: top type representation

This mirrors the previous "partitioned" version of we showed earlier:

The representation of the "bottom type" —the type that denotes no values—is written and is defined by placing the bottom , , and in each respective field (figure 11).

Figure 11: bottom type representation

Again, this mirrors the previous "partitioned" version of we showed earlier:

Figure 12 describes how we represent the specific top types , , and as data structures.

Figure 12: specific top type representations

3.1.2 Type Operations

Binary operations on types benefit from our partitioned design: each is defined pointwise in the natural way on each partition; type complement is defined in terms of type difference, subtracting the negated type from the top type (figure 13).

Figure 13: internal type operations

3.2 Base DNF Representation

We now examine how a DNF type with only base type atoms can be efficiently represented (i.e. the base portion of a type described in figure 8 and the field in our representation of types described in figure 9).

Although any type can be represented by some DNF type, in the case of base types things can be simplified even further! Any DNF type whose atoms are all base types is equivalent to either

a union of base types , or
a negated union of base types .

To see why this is the case, it may be helpful to recall that each base type is disjoint (i.e. no values inhabit more than one base type), note that this is obviously true for , , and any a single base type or negated base type , and then examine the definitions of base type operations presented in figure 15 and note how the representation is naturally maintained.

Because any DNF of base types can be represented by a set of base types (i.e. the elements in the union) and a polarity (i.e. is the union negated or not), we represent the base portion of a type using a tuple with these two pieces of information (figure 14).

Figure 14: internal base type representation

The first field is the polarity flag (either for a union or for a negated union) and the second field is the set of base types in the union.

The top base type (i.e. the type which denotes all base type values) is a negated empty set (i.e. it is not the case that this type contains no base values) and the bottom base type (the type which denotes no base type values) is a positive empty set (i.e. it is the case that this type contains no base values).

3.2.1 Base DNF Operations

Operations on these base type representations boil down to selecting the appropriate set-theoretic operation to combine the sets based on the polarities (figure 15).

Figure 15: internal base DNF operations

Base type negation is not shown (because it is not used anywhere in this model), but would simply require "flipping" the polarity flag (i.e. the first field in the tuple).

3.3 Product and Function DNFs

In order to efficiently represent a DNF type with only product or function type atoms (i.e. the and portions of a type described in figure 8 and the and fields in our type representation described in figure 9) we will use a binary decision diagram (BDD).

First we include a brief review of how BDDs work, then we discuss how they can be used effectively to represent our product/function DNF types.

3.3.1 Binary Decision Diagrams

A binary decision diagram (BDD) is a tree-like data structure which provides a convenient way to represent sets or relations.

For example, the boolean formula has truth table:

0

0

0

1
0

0

1

0
0

1

0

0
0

1

1

1
1

0

0

0
1

0

1

0
1

1

0

1
1

1

1

1

and can be represented with the following BDD:

Each node in the tree contains a boolean variable. A node’s left subtree (the blue edge) describes the "residual formula" for when that variable is true. The node’s right subtree (the red edge) describes the "residual formula" when that variable is false. We invite the reader to compare the truth table and corresponding BDD until they are convinced they indeed represent the same boolean formula. It may be useful to observe that the leaves in the BDD correspond to the right-most column in the truth table.

3.3.2 Types as BDDs?

BDDs can also naturally encode set-theoretic types (in our case, DNF product or function types). Each node has a function/product type associated with it; henceforth we will call this associated type the atom of the node. A node’s left sub-tree (the blue edge) describes the "residual type" for when the atom is included in the overall type. A node’s right sub-tree (the red edge) describes the "residual type" for when the atom’s negation is included in the overall type.

For example, here we have encoded the type :

and here is :

Basically, each path in the tree represents a clause in the overall DNF, so the overall type is the union of all the possibly inhabited paths (i.e. paths that end in ).

In other words, for an arbitrary (type) BDD

we would interpret the meaning of (written〚〛) as follows:

where is interpreted as and as .

There is, however, a well-known problem with BDDs we will want to avoid: repeatedly unioning trees can lead to significant increases in size. This is particularly frustrating because—as we have previously noted—our primary concern algorithmically is deciding type inhabitation and taking the union of two types will have no interesting impact with respect to inhabitation. This is because the union of two types is uninhabited only when both the individual types themselves are already uninhabited.

3.3.3 Types as Lazy BDDs!

Because there is no interesting impact on inhabitation when computing unions, we can use "lazy" BDDs—whose unions only fully expand when computing type intersection or difference (i.e. operations that can have an interesting impact on inhabitation)—to represent our function/product DNF types.

Nodes in lazy BDDs have—in addition to the left and right subtrees described before—a "middle" subtree (the grey edge) which assumes nothing about its node’s atom.

In other words, for an arbitrary lazy (type) BDD

we would interpret the meaning of (written〚〛) as follows:

where is interpreted as and as .

Figure 16: lazy function/product type BDDs

Figure 16 describes in detail our representation for the DNF function/product portions of a type as lazy BDDs. Note that

describes a lazy BDD of either functions or products and is useful for describing functions that are parametric w.r.t. which kind of atom they contain;
and are the leaves in our BDDs, interpreted as and respectively;
describes a BDD node: a 4-tuple with an atom and three sub-trees (whose meaning are described earlier in this section);
an atom () is either a product or a function type—a given BDD will only contain atoms of one kind or the other; and
and simply allow us to be more specific and describe what kind of atoms a particular contains.

Although not explicit in the grammar, these trees are constructed using an ordering on atoms that we leave up to the reader to implement (note that this implies types, BDDs, etc must all also have an ordering defined since these data structures are mutually dependent). A simple lexicographic ordering would work... a fast (possibly non-deterministic?) hash code-based ordering should also work... the choice is yours. The ordering—written and the like—will be called upon frequently in function definitions for BDDs in the next section. The ordering allows us to have consistent representations for BDDs with similar elements.

To make some definitions more clear, "accessor functions" which extract the various fields of a node are defined in figure 17.

Figure 17: accessors for node fields

Finally, we use a "smart constructor"—defined in figure 18—to perform some obvious simplifications when constructing a BDD node. We use an implicit syntax for the smart constructor (i.e. it looks identical to constructing a normal node), so whenever we construct a node (except of course on the right-hand side of the definition in figure 18) we are using the smart constructor.

Figure 18: simplifying constructor for BDD nodes

3.3.4 Lazy BDD Operations

The operations on lazy BDDs can be understood by again considering how we logically interpret a BDD:

Also, recall that BDD binary operations will only ever be used on two BDDs with atoms of the same kind.

Figure 19 describes BDD union, i.e. logical "or".

Figure 19: BDD union

Figure 20 describes BDD intersection, i.e. logical "and".

Figure 20: BDD intersection

Figure 21 describes BDD difference.

Figure 21: BDD difference

Figure 22 describes BDD complement, i.e. logical "not".

Figure 22: BDD complement

3.4 Parsing and Example Types

Figure 23 defines a function that converts the user-friendly types shown in figure 5 into the internal representation we have just finished describing:

Figure 23: type parsing function

Here are a few simple examples of what types look like once parsed:

4 Implementing Semantic Subtyping

Because we are working with set-theoretic types, we are free to define subtyping purely in terms of type inhabitation (see figure 7), which is precisely what we do (figure 24).

Figure 24: (semantic) subtyping, defined in terms of type emptiness

In the remainder of this section we examine how to decide type inhabitation using the data structures introduced in Set-theoretic Type Representation.

4.1 Deciding Type Inhabitation

A DNF type is uninhabited exactly when each clause in the overall disjunction is uninhabited. With our DNF types partitioned into base, product, and function segments (see figure 8):

we simply need ways to check if the base component (), product component (), and function component () each are empty.

Figure 25: type emptiness predicate

As figure 25 suggests, the representation of the base type portion is simple enough that we can pattern match on it directly to check if it is empty (recall that is the bottom/empty ).

For deciding if the product and function components—which are represented with lazy BDDs (see previous discussion in Types as Lazy BDDs!)—are empty, we rely on helper functions and which are defined in Product Type Inhabitation and Function Type Inhabitation respectively.

In these sections, we will use non-terminals and to represent a collection of atoms (see figure 26).

Figure 26: sets of atoms

4.1.1 Product Type Inhabitation

To decide if the product portion of a type is uninhabited, we recall (from Partitioning Types) that it is a union of conjunctive clauses, each of which can be described with a pair of two sets (P,N), where P contains the positive product types and N the negated product types:

For to be uninhabited, each such clause must be uninhabited. Checking that a given clause (P,N) is uninhabited occurs in two steps:

accumulate the positive type information in P into a single product (i.e. fold over the products in P, accumulating their pairwise intersection in a single product),
check that for each N′ ⊆ N the following holds:

The first step is justified because pairs are covariant in their fields, i.e. if something is a and a then it is also a .

The second step is more complicated. To understand, let us first note that if we know something is a product of some sort and also that it is of type , then either it is a product whose first field is or it is a product whose second field is (i.e. logically we are applying DeMorgan’s law). And for the clause to be uninhabited, it must be uninhabited for both possibilities (again, intuitively an "or" is false only when all its elements are false). So by exploring each subset and verifying that either the left-hand side of the product is empty with the negated left-hand sides in N or the right-hand side is empty for the negated right-hand sides in the complement (i.e. ), we are exploring all possible combinations of negated first and second fields from N and ensuring each possible combination is indeed uninhabited.

We describe an algorithm to perform these computations in figure 27. The function walks over each path in the product BDD accumulating the positive information in a product and the negative information in the set N. Then at each non-trivial leaf in the BDD, we call the helper function which searches the space of possible negation combinations ensuring that for each possibility the pair ends up being uninhabited.

Figure 27: functions for checking if a product BDD is uninhabited

Note that is designed with a "short-circuiting" behavior, i.e. as we are exploring each possible combination of negations, if a negated field we are considering would negate the corresponding positive field (e.g. ) then we can stop searching for emptiness on that side, otherwise we subtract that negated type from the corresponding field and we keep searching the remaining possible negations checking for emptiness. If we reach the base case when N is the empty set, then we have failed to show the product is empty and we return false. (Note that checks for emptiness before calling , otherwise we would need to check and for emptiness in the base case of ).

4.1.2 Function Type Inhabitation

Just like with products, to show that the function portion of a type is uninhabited we show that each clause in the DNF—

—i.e. each pair (P,N) where P contains the positive function types and N the negated function types of the clause, represents an uninhabited function type. To do this, we show that for each clause (P,N) there exists a () such that

(i.e. is in the domain of the function) and that for each ,

Basically we are verifying that for each possible set of arrows P′ which must handle a value of type (i.e. the left-hand check fails), those arrows must map the value to (the right-hand check), which is a contradiction since we know this function is not of type and therefore this function type is uninhabited.

Figure 28: functions for checking if a function BDD is uninhabited

We implement this algorithm with the function defined in figure 28. It walks each path in a function BDD accumulating the domain along the way and collecting the negated function types in the variable . At the non-trivial leaves of the BDD, it calls with each function type until it finds a contradiction (i.e. an arrow that satisfies the above described equation) or runs out of negated function types.

is the function which explores each set of arrows P′ ⊆ P checking that one of the two clauses in the above noted disjunction is true. Note that in the initial call from we negate the original : this is because although we are interested in , the equivalent "contrapositive" statement is more convenient to accumulatively check as we iterate through the function types in .

In the base case of when P has been exhausted, the function checks that either the arrows not in P′ could have handled the value of (the original) type (i.e. is now empty), otherwise it checks if the value we mapped the input to must be a subtype of (the original) type (i.e. is now empty).

In the case where has not been exhausted, we examine the first arrow in and check two cases: one for when that arrow is not in (i.e. when it is in ) and one for when it is in .

The first clause in the conjunction of the non-empty P case is for when is not in . It first checks if the set of arrows we’re not considering (i.e. P \ P′) would handle a value of type (i.e. ), and if not it remembers that is not in P′ by subtracting from for the recursive call which keeps searching.

The second clause in the conjunction is for when is in . As we noted, instead of checking (resembling the original mathematical description above), it turns out to be more convenient to check the contrapositive statement (recall that was actually negated originally when was called). First we check if having in means we would indeed map a value of type to a value of type (i.e. the check). If so we are done, otherwise we recur while remembering that is in P′ by adding to (i.e. "subtracting" negated from the negated we are accumulating by using intersection).

5 Other Key Type-level Functions

In addition to being able to decide type inhabitation, we need to be able to semantically calculate types for the following situations:

projection from a product,
a function’s domain, and
the result of function application.

5.1 Product Projection

In a language with syntactic types, calculating the type of the first or second projection of a pair simply involves matching on the product type and extracting the first or second field. In a language with semantic types, however, we could be dealing with a complex pair type which uses numerous set-theoretic constructors and we can no longer simply pattern match to determine the types of its projections. Instead, we must reason semantically about the fields.

To begin, first note that if a type is a subtype of (i.e. it is indeed a pair), we can focus on the product portion of the type:

Projecting the field from (where ∈ {1,2}) involves unioning each positive type for field in the DNF intersected with each possible combination of negations for that field:

This calculation follows the same line of reasoning involved with deciding product type inhabitation (see Product Type Inhabitation), i.e. it considers each logical clause in the DNF of the type and unions them.

Actually that equation is sound but a little too coarse: it only considers the type of field and thus may include some impossible cases where the other field would have been . In other words, if j is an index and j ≠ i (i.e. it is the index of the other field), then as we’re calculating the projection of i, we’ll want to "skip" any cases N′ where the following is true:

i.e. cases where the other field j is uninhabited. If we incorporate that subtlety, our inner loop will end up containing a conditional statement:

5.1.1 Implementing Product Projection

As was suggested by our use of index variables i and j in the previous section’s discussion, we implement product projection as a single function indexed by some ∈ {1,2} and use (defined in figure 29) to return the appropriate type in non-empty clauses.

Figure 29: function for selecting a type during product projection

Because projection can fail, we have the function as the "public interface" to projection. performs important preliminary checks (i.e. is this type actually a product?) before extracting the product portion of the type and passing it to the "internal" function where the real work begins.

Figure 30: functions for projecting from a product type

walks the BDD, accumulating for each path (i.e. each clause in the DNF) the positive type information for each field in variables and respectively. Along the way, if either or are empty we can ignore that path. Otherwise at non-trivial leaves we call the helper function which traverses the possible combinations of negations, calculating and unioning the type of field for each possibility.

5.2 Function Domain

Similar to product projection, deciding the domain of a function in a language with set-theoretic types cannot be done using simple pattern matching; we must reason about the domain of a function type potentially constructed with intersections and/or unions.

To do this, first note that for an intersection of arrows, the domain is equivalent to the union of each of the domains (i.e. the function can accept any value any of the various arrows can collectively accept):

Second, note that for a union of arrows, the domain is equivalent to the intersection of each of the domains (i.e. the function can only accept values that each of the arrows can accept):

With those two points in mind, we can deduce that the domain of an arbitrary function type

is the following intersection of unions:

5.2.1 Implementing Function Domain

We perform those domain calculations with the functions defined in figure 31.

Figure 31: domain calculation for function types

first checks if the type is indeed a function (i.e. is it a subtype of ), if so it then calls with the function portion of the type () to begin traversing the BDD calculating the intersection of the union of the respective domains.

5.3 Function Application

When applying an arbitrary function to a value, we must be able to determine the type of the result. If the application is simple, e.g. a function of type applied to an argument of type , calculating the result is trivial (). However, when we are dealing with an arbitrarily complicated function type which could contain set-theoretic connectives, deciding the return type is a little more complicated. As we did in the previous section, let us again reason separately about how we might apply intersections and unions of function types to guide our intuition.

In order to apply a union of function types, the argument type of course would have to be in the domain of each function (see the discussion in the previous section). The result type of the application would then be the union of the ranges:

This corresponds to the logical observation that if we know P and that either P implies Q or P implies R, then we can conclude that either Q or R holds.

When applying an intersection of function types, the result type is the combination (i.e. intersection) of each applicable arrow’s range. This more or less corresponds to the logical observation that if we know P and that both P implies Q and P implies R, then we can conclude that Q and R hold.

Combining these lines of reasoning we can deduce that when considering a function type

being applied to an argument of type , we first verify that is in the domain of (i.e. using ) and then calculate the result type of the application as follows:

Basically, we traverse each clause in the DNF of the function type (i.e. each pair (P,N)) unioning the results. In each clause (P,N), we consider each possible set of arrows P′ in P and if that set would necessarily have to handle a value of type . For those sets P′ that would necessarily handle the input, we intersect their arrow’s result types (otherwise we ignore it by returning for that clause). This reasoning resembles that which was required to decide function type inhabitation (see Function Type Inhabitation), i.e. both are considering which combinations of arrows necessarily need to be considered to perform the relevant calculation.

5.3.1 Implementing Function Application

Figure 32: function application result type calculations

Figure 32 describes the functions which calculate the result type for function application. first ensures that the alleged function type is indeed a function with the appropriate domain before calling to calculate the result type of the application. then traverses the BDD combining recursive results via union. As it traverses down a BDD node’s left edge (i.e. when a function type is a member of a set P) it makes two recursive calls: one for when that arrow is in P′ (where we intersect the arrow’s range with the result type accumulator ) and one for when it is not in P′ (where we subtract from the argument type parameter to track if the arrows in P \ P′ can handle the argument type). At non-trivial leaves where is not empty (i.e. when we’re considering a set of arrows P′ which necessarily would need to handle the argument) we return the accumulated range type () for that set of arrows. Note that we can "short-circuit" the calculation when either of the accumulators ( and ) are empty, which is important to keeping the complexity of this calculation reasonable.

6 Strategies for Testing

For testing an implementation of the data structures and algorithms described in this tutorial there are some convenient properties we can leverage:

(1) any type generated by the grammar of types is a valid type;
(2) since these types logically correspond to sets, we can create tests based on well-known set properties and ensure our types behave equivalently; and
(3) we have "naive", inefficient mathematical descriptions of many of the algorithms in addition to more efficient algorithms which purport to perform the same calculation.

With these properties in mind, in addition to writing simple "unit tests" that are written entirely by hand we can use a tool such as QuickCheck (Claessen and Hughes 2000) or redex-check (Fetscher et al. 2015) to generate random types and verify our implementation respects well-known set properties. Additionally, we can write two implementations of algorithms which have both a naive and efficient description and feed them random input while checking that their outputs are always equivalent.

The model we used to generate the pseudo code in this tutorial uses these testing strategies. This approach helped us discover several subtle bugs in our implementation at various points that simpler hand-written unit tests had not yet exposed.

Bibliography

Giuseppe Castagna. Covariance and Contravariance: a fresh look at an old issue (a primer in advanced type systems for learning functional programmers). 2013. Unpublished manuscript, periodically updated.

Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Proc. The 5th ACM SIGPLAN International Conference on Functional Programming (ICFP), 2000.

Burke Fetscher, Koen Claessen, Michał Pałka, John Hughes, and Robert Bruce Findler. Making Random Judgments: Automatically Generating Well-Typed Terms from the Definition of a Type-System. In Proc. ESOP, 2015.

Alain Frisch, Giuseppe Castagna, and Véronique Benzaken. Semantic subtyping: Dealing set-theoretically with function, union, intersection, and negation types. Journal of the ACM 55(19), 2008.

Casey Klein, John Clements, Christos Dimoulas, Carl Eastlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon Rafkind, Sam Tobin-Hochstadt, and Robert Bruce Findler. Run Your Research: On the Effectiveness of Lightweight Mechanization. In Proc. The 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2012.

1	Introduction
2	Set-theoretic Types: An Overview
3	Set-theoretic Type Representation
4	Implementing Semantic Subtyping
5	Other Key Type-level Functions
6	Strategies for Testing
	Bibliography