Tuesday, December 05, 2006

Factor 0.87 approaching

Factor 0.85 introduced code GC, and 0.86 introduced the local roots refactoring. Both changes had their fair share of bugs. I think 0.87 shakes most of them out. To find them, I used a simple looping regression test:
: foo \ = reload foo ; foo

This reloads the source file defining the = word, which in turn causes most of the library to be recompiled. The word then calls itself, so the cycle repeats forever. Eventually the code heap fills up and code GC is performed.

Factor 0.86 would crash pretty quickly with this test because of problems with local root recording. To track these down, I modified the runtime to trigger a GC after every heap allocation. Since a GC always moves objects, this reliably breaks any code that does not record local roots properly. At first this crashed Factor within a few seconds. Once I had worked those issues out, I hit a new one: Doug Coleman reported that this test crashed Factor on Windows after a few hours. I managed to reproduce it, and the cause was surprising: if the last block in the code heap was allocated rather than free, the free list would be corrupted, and every block from the last free block onwards would be marked free. Oops!
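The post doesn't show the runtime code involved, so what follows is only a hypothetical sketch in C of threading a free list through a heap of blocks. Every name in it (`block_t`, `build_free_list`, `free_space`) is invented for illustration, and it ignores details such as coalescing adjacent free blocks. The point it illustrates is the invariant the bug violated: the free list must end at the last free block even when the heap's final block is allocated, so no allocated block is ever reported as free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a code heap: a contiguous run of blocks, each
   either free or allocated. This is not the Factor runtime's layout. */
typedef enum { B_FREE, B_ALLOCATED } block_status;

typedef struct {
    block_status status;
    size_t size;     /* block size, in arbitrary units */
    int next_free;   /* index of the next free block, or -1 */
} block_t;

/* Thread every free block into a singly linked free list and return the
   index of its head, or -1 if no block is free. Allocated blocks are
   skipped and never modified, including an allocated block at the very
   end of the heap. */
static int build_free_list(block_t *blocks, int count)
{
    int head = -1;
    int prev_free = -1;

    for (int i = 0; i < count; i++) {
        if (blocks[i].status != B_FREE)
            continue;                    /* leave allocated blocks alone */
        blocks[i].next_free = -1;        /* provisional end of the list */
        if (prev_free < 0)
            head = i;
        else
            blocks[prev_free].next_free = i;
        prev_free = i;
    }
    /* The list already ends at the last free block found; nothing after
       it is assumed to be free. */
    return head;
}

/* Walk the free list, checking that it only reaches free blocks, and
   return the total amount of free space. */
static size_t free_space(const block_t *blocks, int head)
{
    size_t total = 0;
    for (int i = head; i >= 0; i = blocks[i].next_free) {
        assert(blocks[i].status == B_FREE);
        total += blocks[i].size;
    }
    return total;
}
```

For a heap laid out as allocated, free, free, allocated, the list runs from block 1 to block 2 and stops there, and the trailing allocated block keeps its status.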

I don't claim Factor is bug-free, but I think 0.87 is a lot more stable than 0.86. Every time I make a major change to memory management, it takes a few point releases to shake the bugs out; the last time this happened was with generational GC, sometime in the 0.7x series. Overall I've had virtually no "Heisenbugs" that were impossible to reproduce, unlike my experience with jEdit. In fact, most of the Factor bugs I've encountered fall into one of a few categories:
  • Insufficient coverage of corner cases: this describes the code GC bug above.
  • Portability breakage: code that works on one platform but breaks on another because some API call doesn't behave as one would expect.
  • Insufficient validation: some runtime primitives and low-level words did not validate their inputs thoroughly enough, and would create inconsistent objects and/or cause some form of corruption when handed bad values.

Doug Coleman has continued enhancing his random compiler tester in order to catch the last type of bug. When he first started the project I was somewhat skeptical; however, as the tester has grown more sophisticated, it has uncovered some real bugs.

One thing I have noticed is that I rarely run into runtime type errors or stack balancing errors. Logic errors certainly account for the bulk of my bugs. However, I have found that compiler-checked stack comments are nice to have. I am still considering adding mandatory static stack effects, and optional static typing, at some point in the distant future, but I don't have any concrete plans yet.
