Thursday, August 16, 2007

Parallel map and each

To ensure that Factor 0.90 is robust and stable, I'm setting up a "compile farm" for testing Factor on a variety of machines; this involves bootstrapping, loading all modules, running unit tests, and benchmarks.

Additionally, onn Mac OS X, the .app deployment tool is tested in an automated fashion. The deploy.app tool spawns a separate Factor process to ensure that deployment is done in a clean image, then blocks until this process completes. Since I'm on a quad CPU machine, I can save a lot of time by deploying all modules at the same time. I do this using Chris Double's extra/concurrency library. What I want to do is spawn a bunch of threads, then wait until all these threads complete. Each thread calls deploy.app to Factor process, then waits for this process to comoplete. As suggested by Chris, I can use concurrency futures for this.

The idea behind futures is simple. You call future with a quotation yielding a value; this spawns a process to compute the value. Then at a later time, you call ?future which waits until the value is ready, and returns this value.

We can use futures to write a parallel-map combinator:
: parallel-map ( seq quot -- newquot )
[ curry future ] curry map [ ?future ] map ;

This spawns a thread to compute each value, then waits until all values have been computed, collecting them into a new sequence.

We can test this out:
  [ { 1 2 3 4 } [ 1000 sleep 1 + ] parallel-map ] time .
1003 ms run / 0 ms GC time
{ 2 3 4 5 }

Four threads are spawned, and each one takes a second to complete. Compare with an ordinary map:
  [ { 1 2 3 4 } [ 1000 sleep 1 + ] map ] time
4001 ms run / 0 ms GC time
{ 2 3 4 5 }

(Incidentally, the reason the run time does not come out exactly at 1000 and 4000 ms, respectively, is not because a simple iteration over 4 elements takes 3 ms in Factor -- it doesn't -- but because the sleep word has poor resolution. This will be fixed at some point.)

In my specific use-case, I don't care about a return value, I just want to spawn a bunch of threads and wait until they're all done. So I can implement parallel-each in terms of parallel-map:
: parallel-each ( seq quot -- )
[ f ] compose parallel-map drop ;

We simply modify the quotation to yield a null value, then we parallel map this quotation across the sequence and drop the results (which should be a sequence consisting entirely of fs).

Now my automated deployment tester is easy to write:

{
"automata.ui"
"boids.ui"
"bunny"
"color-picker"
"gesture-logger"
"golden-section"
"hello-ui"
"lsys.ui"
"maze"
"nehe"
"tetris"
} [
"deploy-log/" over append <file-writer>
[ deploy.app ] with-stream
] parallel-each

The parallel-map and parallel-each combinators are now in extra/concurrency.

In this case, each thread simply spawns another OS process then waits for I/O to complete, so Factor's simple co-operative scheduler has no problem parallelizing the work. For compute-intensive tasks, this approach is not useful, since Factor cannot take advantage of pre-emptive threading or multiple CPUs.

1 comment:

Anonymous said...

Very interesting article!
this language is so cool
I love what you're doing with it.