Wednesday, May 14, 2008

Improved destructors merged into core; working on SSL binding

I don't have anything earth-shattering to blog about at this stage, so this post is mostly just a disorganized progress report for the last week or so.

Last September, Doug Coleman implemented a destructors abstraction for Factor. At the start of this year, I implemented a generic resource disposal protocol. I have now combined these two concepts into a single vocabulary and moved it into the core.

Over the last few months, destructors have proved their worth over and over again when dealing with C libraries. Doug's original post describes them in great detail.

Here I will skim over the general concept, together with an overview of the new word names.

Suppose you need to call a C function which takes an array of three structures. The array itself can be allocated in the Factor heap as a byte array. The structures, however, must be allocated in the malloc heap, because the garbage collector doesn't know about pointers inside byte arrays. So we can start by writing the following code:
FUNCTION: int our_c_function ( SHAPE* shapes ) ;

"SHAPE" malloc-object
"SHAPE" malloc-object
"SHAPE" malloc-object
3array >c-void*-array
our_c_function 0 < [ "Error return from C function" throw ] when

This is useless because it leaks the three malloc'd structures every time it is called.

Let's keep the array around and free the structures when we're done:
"SHAPE" malloc-object
"SHAPE" malloc-object
"SHAPE" malloc-object
3array [
>c-void*-array our_c_function
0 < [ "Error return from C function" throw ] when
] keep
[ free ] each

This is still problematic. If one of the malloc-object calls fails, or if we throw the exception ourselves, the [ free ] each line never runs and the blocks allocated so far are leaked. The solution is to use destructors:
[
"SHAPE" malloc-object &free
"SHAPE" malloc-object &free
"SHAPE" malloc-object &free
3array >c-void*-array our_c_function
0 < [ "Error return from C function" throw ] when
] with-destructors

The &free word schedules an allocated block of memory to be deallocated when the with-destructors word returns, whether or not an error was thrown.

The more general word is &dispose, which can take any object implementing the generic resource disposal protocol. This is where destructors and the disposal protocol meet. The with-disposal combinator is now just a shorthand for with-destructors:
[ X ] with-disposal == [ &dispose X ] with-destructors
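
As a rough sketch of how an object can take part in this protocol, the snippet below defines a made-up demo-resource tuple whose dispose method simply records its name in a vector. The tuple, its slots, and the run-demo word are all invented for illustration, and the vocabulary names in the USING: line are assumptions based on current Factor releases. It is not code from the Factor core.

```factor
USING: accessors continuations destructors kernel sequences ;
IN: scratchpad

! Hypothetical resource type, invented for this sketch.
! "log" is a vector that records which resources were disposed.
TUPLE: demo-resource name log ;

! Take part in the disposal protocol: note that cleanup ran.
M: demo-resource dispose [ name>> ] [ log>> ] bi push ;

! Register a resource with &dispose, then throw. The error
! escapes with-destructors, but only after dispose has run.
: run-demo ( -- log )
    V{ } clone dup
    [
        [
            "shape" swap demo-resource boa &dispose drop
            "boom" throw
        ] with-destructors
    ] [ 2drop ] recover ;
```

Calling run-demo should leave a vector containing "shape" on the stack, showing that dispose ran even though the quotation threw.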

Recent language changes, such as destructors, the new slot accessors, inheritance, and the thread refactoring, have really cleaned up the Windows I/O code in particular. It used to be absolutely hideous, with too much stack shuffling and repetition, but once my next set of patches is pushed out, it will be pretty close to exemplary Win32 API usage, and much cleaner than anything you'd write in C. I think that's a pretty good accomplishment, given that Win32 is designed around C. I will blog about the Windows I/O code as soon as I finish up with the next set of changes.

What motivated moving destructors into the core was my work on OpenSSL bindings. The OpenSSL API is pretty complex, especially so when doing non-blocking I/O, and I've had to overhaul the Unix I/O code to be able to plug it in. While overhauling the code I noticed a few places where my error handling logic was convoluted, and even found a resource leak, so I used destructors to clean them up.

I will blog about the new io.sockets.secure vocabulary when the OpenSSL binding is done. For now, I can show off OpenSSL checksums support. A few weeks ago I removed some duplication between our CRC32, MD5, SHA1 and SHA2 implementations by implementing a generic checksum protocol. Since then Doug also added an Adler32 implementation. The idea behind the protocol is that you have three words, checksum-bytes, checksum-stream, and checksum-file, each one taking some kind of object and a singleton representing the checksum algorithm. For example,
"factor.image" md5 checksum-file .
=> B{ 177 55 229 87 205 67 139 188 136 33 111 38 3 0 32 23 }

However, if you try the Factor MD5 implementation you will notice it is rather slow. While it will become faster over time, OpenSSL already has a fast implementation. Thanks to the checksum protocol and some FFI fun, you can use it:
"factor.image" openssl-md5 checksum-file .
=> B{ 177 55 229 87 205 67 139 188 136 33 111 38 3 0 32 23 }
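
Because both singletons implement the same protocol, they can be swapped freely with the other checksum words too. Here is a minimal sketch that runs the pure-Factor and OpenSSL implementations over the same bytes with checksum-bytes and compares the results. The vocabulary names in the USING: line are assumed from current Factor releases, and the OpenSSL half needs the OpenSSL library to be installed.

```factor
USING: byte-arrays checksums checksums.md5 checksums.openssl
kernel prettyprint ;

! Hash the same input with both implementations of the protocol.
! Each call returns the digest as a byte array, so = compares
! the two digests element by element.
"hello" >byte-array
[ md5 checksum-bytes ] [ openssl-md5 checksum-bytes ] bi = .
```

Since both compute MD5, the two digests are expected to be equal, just as the two checksum-file results above match.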

There is also openssl-sha1. In fact, any checksum supported by OpenSSL's EVP_* functions can be used, as follows:
"factor.image" "SHA256" <openssl-checksum> checksum-file

The code is in checksums.openssl.
