Wednesday, June 14, 2006

GC bug

I found and fixed a bug in the garbage collector today. It seems to have been introduced when quotations became arrays. I triggered it with the following steps:
  • Running 'tests' in the UI
  • Redefining a core word
  • Recompiling everything
  • Invoking full-gc

At this point the runtime crashed with a memory corruption error. Further investigation revealed that some code was holding on to the address of an object which had apparently been moved by the GC. Several hours later, I uncovered the suspect code.

When a callback is called, the current interpreter state is saved in a stack_state struct, and these structs are chained into a linked list. The saved state includes a copy of all stacks and the currently executing quotation, because each nested callback runs with its own isolated data and call stacks. When leaving the callback, the topmost entry in the linked list is removed, and the saved state is restored.

The top-level stack_state struct does not contain a valid saved current quotation, since there was no current quotation when the top-level struct was created. The garbage collector should therefore not treat the saved quotation there as a root, and should not copy that object. However, the test for this case was wrong: it ignored the saved current quotation in the first stack_state, not the last. If callbacks were never used, the first and last elements of the linked list coincided and there was no problem. But when the garbage collector was invoked from within a nested callback, the saved quotation pointer that needed updating was skipped, and the interpreter would continue executing at an address which was no longer valid.
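To make the mistake concrete, here is a minimal sketch in C. It is not the actual runtime code. Only the stack_state name comes from the runtime; the field names, the toy relocation struct standing in for the copying collector, and both scanning functions are made up for illustration. It also assumes that "first" means the newest entry at the head of the list, and that the top-level entry is the one with no successor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a heap object; the real layout differs. */
typedef struct object { int payload; } object;

/* One saved interpreter state per callback nesting level (fields assumed). */
typedef struct stack_state {
    object *saved_quot;        /* saved current quotation; not valid in the top-level entry */
    struct stack_state *next;  /* next older entry; NULL marks the top-level entry */
} stack_state;

/* Toy "copying collector": a single object moves from one address to another. */
typedef struct relocation { object *from; object *to; } relocation;

/* Update a root if it points at the object that moved. */
static void copy_root(object **root, const relocation *r) {
    if (*root == r->from) *root = r->to;
}

/* Intended behavior: skip only the LAST (top-level) entry. */
static void collect_stack_state_roots(stack_state *newest, const relocation *r) {
    assert(newest != NULL);
    for (stack_state *s = newest; s != NULL; s = s->next) {
        if (s->next == NULL) continue;  /* top-level: no valid quotation */
        copy_root(&s->saved_quot, r);
    }
}

/* The mistake described above: skip the FIRST (newest) entry instead. */
static void collect_stack_state_roots_buggy(stack_state *newest, const relocation *r) {
    assert(newest != NULL);
    for (stack_state *s = newest; s != NULL; s = s->next) {
        if (s == newest) continue;      /* wrong entry skipped */
        copy_root(&s->saved_quot, r);
    }
}

/* Returns 1 if a nested callback's saved quotation follows the moved object. */
static int nested_quot_updated(void (*scan)(stack_state *, const relocation *)) {
    object old_copy = { 42 }, new_copy = { 42 };
    stack_state top_level = { NULL, NULL };
    stack_state callback = { &old_copy, &top_level };
    relocation r = { &old_copy, &new_copy };
    scan(&callback, &r);
    return callback.saved_quot == &new_copy;
}
```

With a single entry in the list, both variants skip the same struct, which is why the problem stayed hidden until a collection ran inside a nested callback. With two entries, nested_quot_updated returns 1 for the corrected scan and 0 for the buggy one, leaving the callback's saved quotation pointing at the old address.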
