Saturday, December 08, 2007

Factor-powered pastebin with syntax highlighting for 100+ languages

I'm proud to announce a major new version of extra/webapps/pastebin. I've deployed the new pastebin on an experimental HTTP server. I hope it doesn't crash by the time you read this. The major new feature is syntax highlighting support.

For the last two weeks or so I've been working on a Factor port of jEdit's syntax highlighting engine. For those who have never used jEdit, syntax highlighting rules are specified in XML files. Each rule matches either a literal string or a regular expression. Factor already had an XML parser which works pretty well, but regexps were missing.
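To make the rule-driven idea concrete, here is a minimal Python sketch (not the Factor port, and not jEdit's code). It loads rules from an XML snippet and matches each one either literally or as a regular expression. The element names echo jEdit's mode files, but the schema below is a simplified stand-in invented for illustration, and the function names load_rules and tokenize are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified rule file. Real jEdit mode files
# carry many more rule kinds and attributes than this.
RULES_XML = """
<RULES>
  <SEQ TYPE="OPERATOR">+</SEQ>
  <SEQ TYPE="OPERATOR">=</SEQ>
  <SEQ_REGEXP TYPE="DIGIT">[0-9]+</SEQ_REGEXP>
  <SEQ_REGEXP TYPE="IDENT">[A-Za-z_]+</SEQ_REGEXP>
</RULES>
"""

def load_rules(xml_text):
    """Return a list of (token_type, compiled_pattern) in file order."""
    rules = []
    for elem in ET.fromstring(xml_text.strip()):
        text = elem.text or ""
        if elem.tag == "SEQ":
            pattern = re.escape(text)      # literal rule
        elif elem.tag == "SEQ_REGEXP":
            pattern = text                 # regular expression rule
        else:
            continue                       # ignore anything else in this sketch
        rules.append((elem.get("TYPE"), re.compile(pattern)))
    return rules

def tokenize(line, rules):
    """Split a line into (token_type, text) pairs; the first matching rule wins."""
    tokens = []
    pos = 0
    plain_start = 0                        # start of the pending unmatched run
    while pos < len(line):
        for token_type, pattern in rules:
            m = pattern.match(line, pos)
            if m and m.end() > pos:
                if plain_start < pos:
                    tokens.append(("PLAIN", line[plain_start:pos]))
                tokens.append((token_type, m.group()))
                pos = m.end()
                plain_start = pos
                break
        else:
            pos += 1                       # no rule matched here; keep scanning
    if plain_start < len(line):
        tokens.append(("PLAIN", line[plain_start:]))
    return tokens
```

Calling tokenize("x = 42 + y", load_rules(RULES_XML)) yields a flat list of typed tokens that a pastebin could wrap in styled HTML spans. Trying rules in file order is an assumption of this sketch, chosen for simplicity.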

I've been working with Doug to implement a new regexp library; he will blog about it in the next few days. Suffice it to say that it is coming along very well, and the code is very concise and readable, as is the norm with Factor. We're using Chris Double's parser-combinators library to parse the regexp itself, and then we construct parser combinators from the parsed regexp.
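The general technique of turning a regexp into combinators can be sketched in a few lines of Python. This is an illustration of the idea only, under loud assumptions: it is not the Factor regexp library and does not use the parser-combinators API. It supports just literals, ".", "*", "|" and parentheses, and every name in it (lit, seq, alt, star, compile_regexp, matches) is hypothetical. Each combinator is a function that takes a string and a position and yields every position where a match could end.

```python
def lit(c):
    def m(s, i):
        if i < len(s) and s[i] == c:
            yield i + 1
    return m

def any_char():
    def m(s, i):
        if i < len(s):
            yield i + 1
    return m

def empty():
    def m(s, i):
        yield i
    return m

def seq(a, b):
    def m(s, i):
        for j in a(s, i):
            yield from b(s, j)
    return m

def alt(a, b):
    def m(s, i):
        yield from a(s, i)
        yield from b(s, i)
    return m

def star(a):
    def m(s, i):
        for j in a(s, i):
            if j > i:              # skip empty matches to avoid looping forever
                yield from m(s, j)
        yield i                    # zero repetitions
    return m

def compile_regexp(pattern):
    """Recursive-descent parse of the pattern, building combinators as it goes."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def advance():
        nonlocal pos
        pos += 1

    def alternation():             # alternation := concat ('|' concat)*
        left = concat()
        while peek() == "|":
            advance()
            left = alt(left, concat())
        return left

    def concat():                  # concat := repeat*
        result = empty()
        while peek() is not None and peek() not in "|)":
            result = seq(result, repeat())
        return result

    def repeat():                  # repeat := atom '*'*
        node = atom()
        while peek() == "*":
            advance()
            node = star(node)
        return node

    def atom():                    # atom := '(' alternation ')' | '.' | char
        c = peek()
        advance()
        if c == "(":
            node = alternation()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if c == ".":
            return any_char()
        return lit(c)

    matcher = alternation()
    if pos != len(pattern):
        raise ValueError("unexpected character at position %d" % pos)
    return matcher

def matches(pattern, text):
    """True if the whole of text matches the pattern."""
    return any(end == len(text) for end in compile_regexp(pattern)(text, 0))
```

For example, matches("a(b|c)*d", "abcbd") succeeds while matches("a(b|c)*d", "abxd") does not. Yielding all candidate end positions gives backtracking for free, at the cost of poor worst-case performance.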

There is quite a bit of history behind the jEdit syntax highlighting engine. jEdit 1.2, released in late 1998, was the first release to support syntax highlighting. It featured a small number of hand-coded "token markers" -- simple incremental parsers -- all based on the original JavaTokenMarker contributed by Tal Davidson.

Around the time of jEdit 1.5 in 1999, Mike Dillon began developing a jEdit plugin named "XMode". This plugin implemented a generic, rule-driven token marker which read mode descriptions from XML files. XMode eventually matured to the point where it could replace the formerly hand-coded token markers.

With the release of jEdit 2.4, I merged XMode into the core and eliminated the old hand-coded token markers.

As you can guess from the age of the code, many design decisions date back to Java 1.1, a long-forgotten time when Java VMs with JIT compilers were relatively uncommon, object allocation was expensive, and heap space was tight. As a result, the parser is basically procedural spaghetti code with lots of mutable state, the sort that gives Haskell programmers nightmares. So while the code is ugly and the parser design is archaic, the huge advantage is that there is a large suite of contributed modes ready to go.

I expected it would be a pain to port to Factor, but this hasn't been the case. Even though I kept the original design, warts and all, the resulting Factor code is pretty decent. I intend to refactor it to use a more modern design and take advantage of Factor's abstraction features instead of just representing parser state with a bunch of integer and boolean-valued variables. I also need to optimize it more.

Because XMode is an incremental parser, it would be easy to cook up an editor gadget with support for syntax highlighting in the UI. I'll probably do that at some point. Then it will be easy for a contributor to build a text editor in Factor, and we will have come full circle. :-)
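Why incremental parsing suits an editor can be shown with a small Python sketch. It is a toy built on assumptions, not XMode's actual data structures: the only cross-line state is whether a line ends inside a /* ... */ comment, and the names scan_line and IncrementalHighlighter are hypothetical. Each line stores the state at its end, so after an edit the highlighter rescans from the edited line and stops as soon as a line's end state comes out the same as before.

```python
def scan_line(line, in_comment):
    """Tokenize one line, given whether it starts inside a block comment.

    Returns (tokens, in_comment_at_end_of_line).
    """
    tokens = []
    pos = 0
    while pos < len(line):
        if not in_comment:
            start = line.find("/*", pos)
            if start == -1:
                tokens.append(("PLAIN", line[pos:]))
                break
            if start > pos:
                tokens.append(("PLAIN", line[pos:start]))
            in_comment = True
            comment_start, search_from = start, start + 2
        else:
            comment_start, search_from = pos, pos
        end = line.find("*/", search_from)
        if end == -1:
            tokens.append(("COMMENT", line[comment_start:]))
            break
        tokens.append(("COMMENT", line[comment_start:end + 2]))
        in_comment = False
        pos = end + 2
    return tokens, in_comment

class IncrementalHighlighter:
    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [None] * len(self.lines)
        self.end_state = [None] * len(self.lines)   # None means "not scanned yet"
        self.rescan(0)

    def rescan(self, start):
        """Rescan from line `start`; return how many lines were scanned."""
        state = self.end_state[start - 1] if start > 0 else False
        scanned = 0
        for i in range(start, len(self.lines)):
            tokens, state = scan_line(self.lines[i], state)
            scanned += 1
            unchanged = (state == self.end_state[i])
            self.tokens[i], self.end_state[i] = tokens, state
            if unchanged:
                break          # later lines would start in the same state as before
        return scanned

    def replace_line(self, index, text):
        self.lines[index] = text
        return self.rescan(index)
```

An edit that does not open or close a comment touches only its own line, while removing the opening of a comment forces a rescan down to the line where the old and new states agree again. An editor gadget would redraw just those lines.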

Being able to implement a syntax highlighting pastebin in Factor -- one that manages to reuse jEdit mode files no less -- is a milestone. It means we can do XML, regular expressions, and web applications, not to mention the various minor bits and pieces which support these components, such as date/time support, various parsers, and so on. We have different people working on different libraries, and everyone is able to understand and reuse other people's work. It only took a handful of contributors to build all this infrastructure.

3 comments:

Anonymous said...

Instead of a full text editor in Factor, how about a universal text editor "engine" or library (and then make the text editor)? As in, how Emacs can simulate Vi with Emacs Lisp, except the "engine" isn't a full text editor itself. It would deal only with text display and keyboard commands (and perhaps a few other things), with the rest of a given editor (like spell checking, code highlighting, plugins, GUI, etc...) being normal Factor code, or DLL files linked to it. It could simulate Emacs, Vi, Xywrite, Archy, Notepad, etc... The only default setting in the engine would be, perhaps, automatic conversion from Mac/Unix/Dos text format to native for display. Emacs Lisp would probably serve as a good inspiration. Factor likely has most of the words that would make up such a "toolkit", rendering an "engine" unnecessary... Just a bit of baloney for the day :).

Slava Pestov said...

Anonymous: personally I'm happy with using jEdit as my editor, I was tossing out some ideas for others. Your idea is good. Even if we don't end up with an editor, building the various pieces such as spell check and so on would be a good exercise.

Hiren said...

Hey Slava, it's working great. Thank you.

Hiren