Wednesday, December 13, 2006

Factor module system considered harmful?

Eduardo Cavazos doesn't like the current Factor module system, and has a proposal for a replacement. In a nutshell, he wants to unify the concept of a module with a vocabulary, and remove the conceptual overhead that goes with the distinction. I agree with most of his ideas. In response to an e-mail from Eduardo, I wrote a lengthy post to the Factor-talk mailing list which discusses his module system idea and suggests some additions and improvements that would fix even more shortcomings of the current module system. I've reproduced the post below:
Hi Eduardo,

The more I think about it the more I like your idea. It would solve some
problems I perceive with the current setup. The first one, I think, is
the one you're focusing on, but the others strike me as easy
consequences of the design:

1) Too many concepts. Vocabularies, modules, source files... USING:,
REQUIRES:, IN:, PROVIDE:. The module system feels hacked on to an
existing core which has a notion of vocabularies, and this is exactly
what happened. I implemented the module system initially as a half-hour
hack to make it easier to load libraries from contrib/ without manually
loading dependencies. It did this job well, but now we both want to use
it to /structure code/, and later on to edit code in the UI, which is
something else entirely.

2) Suppose you have a source file where you write,

IN: foo
: a ;
: b a ;

You load the source file into Factor. Then you add a new word definition
: c b ; but you add it _before_ the definition of b. The file still
loads, however, because the definition of b is present in the vocabulary
when c is parsed.

But if you start Factor with a clean image, the source file won't load
(and bottom-up ordering of definitions is a good thing; in CL,
automatically interning symbols in the wrong package can be a source of
problems).

--

The new module system can fix the problem (which is that the modified
source file could load without an error in the first place) as follows.
You start by marking words in a module with a special flag before
reloading the module; then as each definition is read, the word's flag
is reset. If a definition refers to a word with the flag still set, a
parse-time error is raised. When the module is finished loading, the
module system can also check if any words which have the flag set are
still referenced by words in other modules; if so, a message can be
printed instructing the programmer to either reintroduce the word, or
refactor those modules, etc.
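
The flag-and-check scheme can be modelled outside Factor. What follows
is only a rough Python sketch, not a proposed implementation: the
dictionary layout and the names reload_module and StaleReference are
made up for illustration, and a definition is reduced to a name plus the
list of words it calls.

```python
class StaleReference(Exception):
    """Raised when a definition refers to a word whose flag is still set."""


def reload_module(dictionary, definitions):
    """Reload a module given as an ordered list of (name, body) pairs.

    `dictionary` maps a word name to {"body": [...], "obsolete": bool}.
    This layout is an assumption made for the sketch.
    """
    # Flag every word already in the module before reading the source.
    for word in dictionary.values():
        word["obsolete"] = True

    for name, body in definitions:
        for ref in body:
            entry = dictionary.get(ref)
            # An unknown word and a still-flagged word are both errors.
            if entry is None or entry["obsolete"]:
                raise StaleReference(
                    f"{name} refers to {ref} before its definition")
        # Reading a definition resets the word's flag.
        dictionary[name] = {"body": list(body), "obsolete": False}


# First load: a, then b.
vocab = {}
reload_module(vocab, [("a", []), ("b", ["a"])])

# Edited file: c was added before b. The old b is still in the
# dictionary, but its flag is set, so the reload is rejected.
try:
    reload_module(vocab, [("a", []), ("c", ["b"]), ("b", ["a"])])
    reload_failed = False
except StaleReference:
    reload_failed = True
```

With this check the edited file fails in the running image the same way
it would fail in a clean one.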

3) Suppose you have a source file X.factor where you write,

IN: x
: a ;
: b a ;

And another file Y.factor,

USING: x ;
IN: y
: c b ;

You load X.factor and Y.factor, then you decide to move the b word from
the x vocabulary to the y vocabulary. So you move the word and reload
the two files, but the definition of b in x is still in the dictionary
and this can cause confusion when testing the code, or if y has a USING:
x ; after the IN: y.

--

The new module system can fix this, again, by using 'obsolete' flags on
words. After loading a module, any words which are marked as being old
have not been redefined by the updated source file, and thus they can be
forgotten.
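
As a rough illustration in Python (a model only; the function name
reload_and_forget and the use of plain sets for vocabularies are
assumptions made for the sketch), forgetting stale words after a reload
might look like this:

```python
def reload_and_forget(vocab_words, new_names):
    """Reload one vocabulary and forget words the source no longer defines.

    `vocab_words` is the set of word names currently in the vocabulary.
    `new_names` lists the names defined by the updated source file.
    Returns the set of names that were forgotten.
    """
    obsolete = set(vocab_words)       # flag everything before loading
    for name in new_names:
        obsolete.discard(name)        # a (re)definition clears the flag
        vocab_words.add(name)
    # Anything still flagged was not redefined, so it is forgotten.
    vocab_words.difference_update(obsolete)
    return obsolete


# Before the edit: x defines a and b, y defines c.
x = {"a", "b"}
y = {"c"}

# After moving b from X.factor to Y.factor and reloading both files:
forgotten_x = reload_and_forget(x, ["a"])
forgotten_y = reload_and_forget(y, ["b", "c"])
```

Here the stale b is dropped from x, so only the definition in y remains
in the dictionary.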

4) Source files are allowed to define words in arbitrary vocabularies.
By unifying the notion of a source file and a vocabulary, you completely
do away with this 'feature'.

--

Redefining words in other vocabularies is not a problem in itself, but
it is something I want to move away from. Right now for example you
cannot load the Cocoa and X11 UI backends at the same time, and use the
Cocoa instance while running Factory, because the two UI backends will
redefine the same DEFER:'d words.

Running the Cocoa UI and Factory at the same time is a contrived use case,
but with a structure editor you want to load code before editing it, and
not being able to load certain modules is totally unacceptable. So
instead, I want to restructure the UI code, as well as I/O and compiler
backends, so instead of redefining words it uses generic words,
dispatching on a singleton tuple representing the "backend" in question.
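
To show the shape of that design outside Factor, here is a small Python
sketch using functools.singledispatch. The backend classes, the
open_window function and the strings it returns are all invented for
illustration; nothing here is the actual UI code.

```python
from functools import singledispatch


class CocoaBackend:
    """Hypothetical singleton standing in for the Cocoa UI backend."""


class X11Backend:
    """Hypothetical singleton standing in for the X11 UI backend."""


COCOA = CocoaBackend()
X11 = X11Backend()


@singledispatch
def open_window(backend, title):
    # The generic word: defined once, never redefined by a backend.
    raise NotImplementedError(f"no open_window method for {backend!r}")


@open_window.register
def _(backend: CocoaBackend, title):
    # Method contributed by the (hypothetical) Cocoa backend.
    return f"cocoa window: {title}"


@open_window.register
def _(backend: X11Backend, title):
    # Method contributed by the (hypothetical) X11 backend.
    return f"x11 window: {title}"
```

Both backends can be loaded at once because each only adds a method to
the generic function; calling open_window(COCOA, "Listener") and
open_window(X11, "Listener") picks the method from the backend object.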

If the module system enforced such a design, it would motivate me to do
this refactoring faster :)

Now I'm completely against a language which enforces *arbitrary* design
decisions, because it requires bending over backwards to implement
certain things; for example, something that only supports
message-passing OOP is too restrictive in my opinion. But enforcing
design decisions which demonstrably lead to code which is cleaner and
easier to write is a good thing, for example memory safety and garbage
collection. (I'm not going to get into whether I think static typing
enforces arbitrary design decisions or improves code quality, and I
don't want any replies to this thread to turn into a dynamic -vs- static
discussion, please, since this has nothing to do with the module
system.)

5) A continuation of the theme in 1/2/3/4. I've run into this problem
from time to time. Suppose you have a file a.factor,

IN: jelly

: toast ... ;

And another file, b.factor,

IN: jelly

: toast ... ;

You forgot that a 'toast' word already exists, and redefined it with a
different meaning! Then you test some caller of 'toast' in a.factor only
to discover it doesn't work. Of course unless you decide to put the
words in a.factor and b.factor in different vocabularies, the new module
system as Eduardo describes it won't solve this issue outright, since
the naive programmer could simply concatenate a.factor and b.factor into
one:

IN: jelly

: toast ... ;

... words using toast #1

! Oops, didn't notice this is a redefinition...
: toast ... ;

... words using toast #2

However, just like referring to a word with the 'obsolete' bit still set
could be a parse time error, /redefining/ a word with the 'obsolete' bit
/not/ set could be an error too, unless the old definition was a DEFER:.
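
A rough Python model of that rule follows. It is a sketch under
assumptions: the names load_definitions and Redefinition are made up,
and each definition is reduced to a name plus a kind, where "deferred"
stands in for a DEFER: declaration.

```python
class Redefinition(Exception):
    """Raised when a word is defined twice within a single load."""


def load_definitions(dictionary, definitions):
    """Load an ordered list of (name, kind) pairs into `dictionary`.

    `kind` is "word" or "deferred"; both labels are assumptions.
    """
    # Flag everything from a previous load as obsolete.
    for word in dictionary.values():
        word["obsolete"] = True

    for name, kind in definitions:
        old = dictionary.get(name)
        # A cleared flag means the word was already defined during
        # this load; only a deferred placeholder may be replaced.
        if (old is not None and not old["obsolete"]
                and old["kind"] != "deferred"):
            raise Redefinition(f"{name} is already defined in this load")
        dictionary[name] = {"kind": kind, "obsolete": False}


# The concatenated file from above: toast is defined twice.
jelly = {}
try:
    load_definitions(
        jelly, [("toast", "word"), ("butter", "word"), ("toast", "word")])
    caught = False
except Redefinition:
    caught = True
```

Reloading the same file later is still fine, because every flag is set
again before the file is read.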

6) Documentation will be simplified. Right now we have several parallel
conceptual 'trees' in Factor:

- vocabularies contain words
- modules contain source files which contain definitions (where
  definitions includes words, methods, etc)
- the documentation for the core library is rooted at a 'handbook' page
  which links to multiple articles, each one documenting a major
  subsystem of Factor, but many such top-level articles don't really
  correspond to modules and vocabularies

If modules and vocabularies were unified, I could redo the core
documentation in a more structured way, with each vocabulary/module
having a main documentation page, containing as subsections more
articles, word documentation, and so on. The handbook could still start
with an 'entry points' page with links to the most commonly needed
topics, much like the 'cookbook' now.

Note that all this could be done without drawing a distinction between
'loading' and 'reloading' from the viewpoint of the user, and even more
importantly it would not require extra bookkeeping beyond tracking
loaded modules and words (for instance, implementing the first 3 items
can be done with the current setup, by recording a list of definitions
loaded from every source file in a new data structure, but maintaining
this would be an extra burden).

7) What should be done with unsafe words in -internals vocabularies is
an open question. I'm against having private words and such, because I
believe any word should be interactively testable. One way to get around
this is to have private words, but also have another parsing word
FRIEND: which is like USING: but makes private words available. But this
would complicate the implementation (and description!) of the vocabulary
search path. Instead, it might be best to simply split off -internals
vocabs into separate files and treat them like normal
modules/vocabularies.

Let me know what you guys think,

Slava
