Friday, January 12, 2007

A compiler optimization

Here are the times to run the demos/raytracer benchmark in recent Factor versions:
Version    Time (seconds)
0.85       43
0.86       51
0.87       51
0.88       29

I'm still not sure what's behind the regression from 0.85 to 0.86; maybe I can reclaim that time too.

The performance gain comes from improved type inference. The compiler is now able to remove overflow checks on loop index variables if it can prove that they are bounded above by a fixnum. The raytracer benefits from this because it calls vector arithmetic words with arrays as arguments, and the length of an array is always a fixnum.
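As a hypothetical illustration (my example, not code from the raytracer), consider a word that sums the elements of a sequence:

: sum-seq ( seq -- n ) 0 swap [ + ] each ;

If the compiler knows the argument is an array, the loop index inside each is bounded above by the array's length, which is always a fixnum, so the index increment can compile without an overflow check; the + on the elements, of course, still needs its checks.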

Another, less impressive gain shows up for a similar reason when reading a 3 MB file into a string:

Version    Time (seconds)
0.87       1.8
0.88       1

The code to read a file into a string:
"/Users/slava/repos/Factor/boot.image.ppc"  4 1024 * 1024 *  [ stream-copy ] keep >string drop

I/O performance still sucks, as you can see. I omitted the 0.85 and 0.86 results because they are the same as 0.87.

I will now explain where the performance gain comes from.

The running example used in the description below is the following short snippet:
dup type 0 eq? [ dup 5 < [ 1+ ] when ] when

The compiler here is able to remove both the type dispatch and the overflow check on the 1+, transforming it from a full-fledged call to a polymorphic word into a single instruction. The example is contrived, but it is the simplest form of a very realistic case: a loop counter incremented in a recursive word and bounded above by a fixnum such as an array length.
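Concretely, once all the steps below have run, the compiler treats the snippet as if it had been written like this (an illustration of the net effect, not actual compiler output):

dup type 0 eq? [ dup 5 fixnum< [ 1 fixnum+fast ] when ] when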

The type inferencer maintains two forms of state: the association between values and their inferred classes on the current control flow path, and a map of constraints, where a constraint is a logical statement of the form "the value X has class Y". In the constraint map, each key/value pair acts as an implication: if the constraint in the key becomes true, the inferencer assumes that the constraint in the value becomes true as well.

In the above example, type inference proceeds as follows. When the inferencer sees the call to type, it adds a bunch of constraint pairs to the map of constraints:
(the output of 'type' is equal to 0) -> (the input is a fixnum)
(the output of 'type' is equal to 1) -> (the input is a bignum)
...

Then, the inferencer sees the eq? word, and notices that one of the inputs is a literal. This forces a new constraint pair to be added to the constraint map:
(the output of 'eq?' is equal to t) -> (the first input of 'eq?' is equal to 0)

Next, the inferencer proceeds to infer types inside the when conditional. Upon entering the body of the conditional, the inferencer makes an assumption:
(the top of the stack is t)

Upon making this assumption, it checks the constraint map for any additional facts that this assumption may imply. As it turns out, the constraints set up previously yield some information. It was recorded that the assumption that the top of the stack is true implies that (the first input of 'eq?' is equal to 0), which implies that (the input to 'type' is a fixnum).
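Spelled out as a chain (recalling that the first input of 'eq?' is precisely the output of 'type'):

(the top of the stack is t)
-> (the output of 'type' is equal to 0)
-> (the input of 'type' is a fixnum)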

So inside the branch, the type inferencer now knows that the top of the stack is a fixnum. The call to < is inlined, the type dispatch is removed, and the call is replaced with fixnum<.

This is where the fun begins. If fixnum< returns true, then the first input is less than the second. This is completely obvious. However, the second input must be less than or equal to the most positive fixnum. So by transitivity, the first input must be strictly less than the most positive fixnum. The type inferencer knows this, and the fixnum< word defines a constraint to this effect.
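In symbols: if a < b and b is at most the most positive fixnum, then a is strictly less than the most positive fixnum, and therefore a + 1 is at most the most positive fixnum. In other words, incrementing a cannot overflow.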

There is a new type in Factor now:
PREDICATE: fixnum small 1+ fixnum? ;
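In other words, an object is small if incrementing it still yields a fixnum; the predicate literally computes 1+ and checks the result. A quick check at the listener (assuming a most-positive-fixnum word naming the largest fixnum is available):

5 small? .
t
most-positive-fixnum small? .
f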

The fixnum< word defines a constraint of the form:
(the output of 'fixnum<' is t) -> (the first input is small)

Once again, we have a branch, inside which the type inferencer assumes the top of the stack is t; by the earlier implication, this forces it to assume the top of the stack is also small.

At this point, the call to 1+ is reached, and the definition of the polymorphic 1+ is inlined and specialized to the case of a fixnum input. This replaces the call to 1+ with 1 fixnum+.

The type inference is almost done. However, there is still one thing to do. An optimizer hook for the fixnum+ word notices that one of its inputs is small and the other is 1, and replaces the call to fixnum+ with fixnum+fast, which has no overflow check and compiles to a single instruction.
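To recap, the increment goes through three forms (illustrative, not actual compiler listings):

1+              ! polymorphic: type dispatch plus overflow check
1 fixnum+       ! specialized to fixnums: overflow check remains
1 fixnum+fast   ! input known to be small: a single add instruction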

Here is an example of very pretty stack code defining the type inference rule for <:
! if a is less than b and b is a fixnum, then a is small
\ < [
    [
        small     0 `input  class,
        fixnum    0 `input  class,
        fixnum    1 `input  class,
        general-t 0 `output class,
    ] set-constraints
] "constraints" set-word-prop


It is worth noting that as long as your integers still have arbitrary precision, none of these optimizations would be any easier with a static type system. I'm sure GHC does something very similar to this.
