Some Changes in Typed Racket v6.8

:: Typed Racket, Performance

The more things change, the more they stay the same… at least for these updates! =)

TL;DR Typed Racket has a whole bunch of new parts under the hood for v6.8—hopefully as a user you can’t tell (unless of course it’s because your program type checks faster)!

If you’re a little curious about “how the sausage is made” (or what new ingredients they contain…?) read on!

Motivation

Typed Racket works hard to solve a problem whose difficulty is right up there with SAT: type checking idiomatic Racket code. Because of this, we do not expect it to be the fastest kid on the block as far as type checkers are concerned… but we also don’t expect it to be unbearably slow either! And while Typed Racket has been blessed with contributions from some extremely talented individuals, it has nonetheless accumulated some technical (performance) debt over the years.

Some of my recent research (adding refinement types to Typed Racket) has the potential to slow down type checking for some programs. Because of this, I thought it prudent to pay off some of this debt before merging my work into Typed Racket proper. The upcoming (Typed) Racket release (v6.8) contains several performance improvements that hopefully help programmers with shorter type checking times. If you’re a Typed Racket user it would be great to hear if there are any noticeable changes (good or bad) for your compile times!

Summary

Basically we’ve tried to improve the type checker in following ways:

  1. reduce the amount of data kept around while type checking,

  2. reduce unreasonable dynamic dispatch,

  3. more efficiently represent some of our common base types,

  4. and perform less unnecessary logical reasoning.

Some of these things feel like no-brainers in retrospect… but that’s what can happen in large, complicated projects that a lot of people contribute to over a decade: it’s easy for little things that can add up to slip by!

Interning: Less is More

Relevant commit(s): 8e7f390

Since Typed Racket works with logical propositions about syntactic objects and their types, we can generate a large number of instances of the same values during type checking if we’re not careful. To mitigate this, interning (i.e. storing only one copy of each unique value) was adopted years ago for the internal representations of types and propositions. Unfortunately this ended up being done in such a way that the unique instances of types and propositions were kept around for the entirety of type checking. This essentially represented a memory leak in the type checker which created an undue burden for the type checking of some programs.

Once we identified this issue (and saw how much memory it was consuming for some programs) it was pretty easy to fix:

First, we decided that interning, while valuable for some values, does not pay its way universally. For example, while it does reduce memory usage and provide for faster equality checks, it makes (re)constructing the values (like when performing substitution) more expensive (i.e. we have to build a value and then check a hash table to fetch its interned representation). In the end we ended up using interning for union types (since they can be large and more complicated than other types) and logical propositions only.

Secondly—and this is important to consider when implementing something like interning!—we switched to data structures with weak references to intern values. This means that interned items, once they no longer are being used by the type checker, can be garbage collected like any other value. In Racket this can be accomplished by using a weak-hash table (which weakly stores keys) and either an ephemeron or weak-box for storing the value:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
(define intern-table (make-weak-hash))

(define (get-interned-copy x)
  (define b (hash-ref intern-table x #f))
  (define val (and b (weak-box-value b #f)))
  (cond
    [val val]
    [else
     (hash-set! intern-table x (make-weak-box x))
     x]))

Generics for Constant Time Dispatch

Relevant commit(s): 24c64e9 8e7f390

In a dynamically typed language, dispatching between a set of possible values is usually done either directly with a sequence of conditionals (i.e. if-elseif-…-else) or with a syntactic form that compiles to such a sequence (e.g. Racket’s match):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
(struct Point2d (x y) #:transparent)
(struct Point3d (x y z) #:transparent)

(define (distance p1 p2)
  (match* (p1 p2)
    [((Point2d x1 y1) (Point2d x2 y2))
     (sqrt (+ (expt (- x2 x1) 2)
              (expt (- y2 y1) 2)))]
    [((Point3d x1 y1 z1) (Point3d x2 y2 z2))
     (sqrt (+ (expt (- x2 x1) 2)
              (expt (- y2 y1) 2)
              (expt (- z2 z1) 2)))]
    [(_ _) (error 'distance "invalid point(s): ~a ~a" p1 p2)]))

Unfortunately, the number of tests required for dynamic dispatch in large functions can be non-trivial (n/2 checks on average for n clauses that are equally likely)! Sometimes the encountered values are concentrated enough that simply placing the common cases at the beginning of the sequence of tests is enough, but in many cases we encountered in the implementation of Typed Racket this simply was not the case.

We decided to reduce the amount of dynamic dispatch in Typed Racket by using Racket’s generics:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
(require racket/generic
         rackunit)

(define-generics mappable
  (fmap mappable f))

(struct Point2d (x y) #:transparent
  #:methods gen:mappable
  [(define (fmap p f)
     (match p
       [(Point2d x y) (Point2d (f x) (f y))]))])

(define (add1* x)
  (cond
    [(number? x) (add1 x)]
    [(mappable? x) (fmap x add1*)]
    [else x]))

(check-equal? (fmap (Point2d 1 3) add1*)
              (Point2d 2 4))

In this example we have a mappable interface that a struct can implement, which describes how to generically map over and rebuild structs of this type. This information is then stored in the meta-information for the Point2d struct and calling fmap on a Point2d then dispatches in constant time to Point2d’s implementation of fmap instead of potentially traversing a large number of clauses in a large match expression.

We basically used this same strategy to implement various forms of substitution in Typed Racket to avoid costly dynamic dispatch.

Bit Fields for Base Types

Relevant commit(s): 4bfb101

A cornerstone of Typed Racket’s type system is the Union type (e.g. (U Number String)), which describes the sets of possible values people naturally think of when reasoning about untyped programs.

Subtyping and other operations for union types can be costly (often quadratic if we’re working with two union types!), since we’re basically performing point-wise set operations that must individually consider each element of the union. Fortunately, since the set of base types in Typed Racket is closed (i.e. they are fixed a priori — no new ones are added during typechecking) we can represent unions of base types efficiently using a bit field. Not only does this mean the physical representation of these union types is significantly smaller (we can fit all our base types into two 64-bit words) but we can perform constant time set operations on bit fields which suffices for many set operations required by the type checker.

For example, subtyping between two unions of bases is now a simple subset check with bitwise operations:

1
2
(define (bits-subtype? bits1 bits2)
  (zero? (bitwise-and bits1 (bitwise-not bits2))))

Work Smarter, Not Harder (Logically)

Relevant commit(s): be155fa

This last point is simple but important… and I’m going to be perhaps a little too concise for the average reader, so it may only be useful as a note to future Typed Racket developers who are intimately familiar with the type system. You have been warned!

In a type checker that has to consider logical propositions about the terms in the program the size of the data you’re reasoning with can grow exponentially! Wherever possible during type checking we want to propagate our type-related expectations down the AST so we’re reasoning locally as often as possible.

Previous work on Typed Racket had done this with the types but not the expected propositions =(

Basically, this meant that while type checking simple programs that looked like the this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
(: parse : (Any -> Symbol))
(define (parse v)
  (match v
    [`(VARREF ,(? symbol? x)) 'VARREF]
    [`(Lambda ,(app baz fs) ,e ...) 'Lambda]
    [`(CaseLambda ,lc ...) 'CaseLambda]
    [`(If ,(app baz cond) ,(app baz then) ,(app baz else)) 'If]
    [`(Begin ,e ...) 'Begin]
    [`(Begin0 ,(app baz e1) ,e ...) 'Begin0]
    [`(LetValues (,lvs ...) ,e ...) 'LetValues]
    [`(LetrecValues (,lvs ...) ,e ...) 'LetrecValues]
    ;; lost more clauses
    ...))

Typed Racket was reasoning about every possible logical combination of propositions that could be learned at each point during the match even though we don’t care in most cases (and recall that large matches simply expand into even larger ASTs of nested if and let expressions — that’s a lot of disjunctions to reason about!).

After fixing the type checker to push down the expected logical information, programs such as the above match began type checking in far more reasonable amounts of time (i.e. taking time that appeared to be linear w.r.t. the number of clauses instead of exponential).