Monday, May 14, 2007

Introducing the profiler

Factor now has a simple call-counting profiler. Usage is simple. You must first enable it, which recompiles all words:
USE: profiler
enable-profiler

Then you can use a combinator to profile quotations:
( scratchpad ) [ "hello " "world" append ] profile

Then you can view results:
( scratchpad ) profile.
profiling 1
append 1
call 1
> 2
< 2
+ 2
check-copy 2
length 4
(copy) 13

Here we see that very few actual words were called; append is compiled, and so much inlining and open-coding occurs that pretty much the only indication of any work being done is the (copy) word being invoked once per character.

Another example is reversing a string:
( scratchpad ) [ "hello world" reverse ] profile
( scratchpad ) profile.
profiling 1
> 1
< 1
+ 1
like 1
reverse 1
call 1
check-copy 1
frange? 3
column? 3
groups? 3
- 11
circular? 11
<= 12
(copy) 12
gb? 14
reversed? 14
length 18
1- 22
nth-unsafe 22
>fixnum 33

Here we see more work is done. Type checks, generic arithmetic, and so on. This is because of how reverse is implemented; it creates a virtual reversal sequence then converts this into a sequence of the same type as the input.

Of course, the profiler is a lot more useful when applied to non-trivial computations. Here the simple list produced by profile. gets rather large, so various other words, such as vocab-profile., usage-profile. and vocabs-profile. come in handy. All these words are documented in the help system, along with a general article describing the profiler itself. Grab Factor from darcs and enter "profiling" help in the UI listener.

The profiler has a long history. I believe even before CFactor got a native compiler, it had a call counting profiler for a little while. When the compiler came along, I removed it because it didn't work for compiled code. In Factor 0.89, I re-implemented it, intending to add support for compiled profiling before releasing 0.89, however I felt this would delay the release too much, so I left it undocumented and not fully functional.

Last week, I was under the impression that accurate counting was too much overhead anyway, especially with compiled code, so I scrapped the profiler and re-implemented it as a sampling profiler which took measurements every 10ms. Unfortunately, sampling was not as accurate as I would like, and sampling compiled code was complex; it would require writing code for every CPU/OS combination that Factor runs on to get the program counter from a suspended context. Not to mention this didn't work right on Mac OS X. So I went back to accurate call counting, and implemented support for profiling compiled code by compiling a simple prolog which increments a word's counter at the beginning of every word. This is why you must call enable-profiling which recompiles all words first. Except for pathological cases such as Fibonacci, the overhead is not that bad, and the implementation is simple.

So the journey is complete and Factor has a profiler now, which supports both interpreted and compiled code. Even though the profiler is pretty simple (you basically get a flat call count, not a call graph) I've already used it to optimize a few things. I will use it some more to optimize the compiler; now that enable-profiler recompiles all words, I have even more incentive to speed up compile times.

No comments: