Sunday, August 30, 2009

TeX++: An Object Oriented TeX Implementation

I have been thinking for a while now about writing in plain TeX. There is just something appealing about it that I can't describe. However, being one of these newfangled young programmers, I have some reservations about how TeX is implemented. For one thing, its memory limitations are quite... well, alarming. Similarly, my knee-jerk reaction is that any font used should support Unicode, via UTF-8 or some other Unicode encoding...

At any rate, I have also independently been playing around (just playing, mind you) with some toy computer algebra system code. It doesn't have much ability yet, but again, I'm just playing! The system is written in C++ for efficiency and for its object orientation. Ideally it would be nice to have a system that reads in TeX-formatted input and spits out TeX-formatted answers, so the results could be copied and pasted straight into the document in question.
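The "spits out TeX-formatted answers" half of that wish is the easy half, and it can be sketched with a small expression tree whose nodes print themselves as TeX. This is a hypothetical Python sketch, not the C++ toy system itself; the class names Num, Var, Add, Mul, Div, Pow and the method to_tex are invented for illustration.

```python
from dataclasses import dataclass


class Expr:
    """Base class for a toy symbolic expression (hypothetical names)."""
    def to_tex(self) -> str:
        raise NotImplementedError


@dataclass
class Num(Expr):
    value: int
    def to_tex(self) -> str:
        return str(self.value)


@dataclass
class Var(Expr):
    name: str
    def to_tex(self) -> str:
        return self.name


@dataclass
class Add(Expr):
    left: Expr
    right: Expr
    def to_tex(self) -> str:
        return f"{self.left.to_tex()} + {self.right.to_tex()}"


def _group(e: Expr) -> str:
    # Parenthesize sums so operator precedence survives in the TeX output.
    return f"\\left({e.to_tex()}\\right)" if isinstance(e, Add) else e.to_tex()


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr
    def to_tex(self) -> str:
        return f"{_group(self.left)} \\cdot {_group(self.right)}"


@dataclass
class Div(Expr):
    num: Expr
    den: Expr
    def to_tex(self) -> str:
        # \frac already groups numerator and denominator, so no parentheses.
        return f"\\frac{{{self.num.to_tex()}}}{{{self.den.to_tex()}}}"


@dataclass
class Pow(Expr):
    base: Expr
    exp: Expr
    def to_tex(self) -> str:
        b = self.base.to_tex()
        if not isinstance(self.base, (Num, Var)):
            b = f"\\left({b}\\right)"  # only atoms may be raised without parentheses
        return f"{b}^{{{self.exp.to_tex()}}}"
```

For example, Div(Add(Pow(Var("x"), Num(2)), Num(1)), Var("y")).to_tex() yields \frac{x^{2} + 1}{y}, ready to paste into a document. The reading half, parsing TeX input back into such a tree, is where a real TeX parser would be needed.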

This would require writing a TeX parser from scratch (which is bad!). Well, while stumbling around looking at TeX resources, I found this one implementation:

  • Kasper Peeters, TeX++, A modern and free C++ implementation of TeX based on CommonTeX

I honestly know very little about TeX, and even less about parsing systems (ANTLR? Yacc? Bison? These are all names I recognize, but not as programs I have used). It seems like there would need to be a Unicode-aware analog of METAFONT/MetaPost; maybe MetaPost is up to the task, but I don't know. It also seems somewhat limited as far as memory is concerned.

Perhaps it is unrealistic for me to be so concerned about such limits, since I am unlikely ever to reach them, and if I ever did, it would be possible to modify TeX Live to handle it.

It seems this is more a random post than a review (sorry!), but I'm just thinking out loud. It'd be nice to implement TeX by hand, just as a learning experience, I guess. There is a C implementation, obtained by using the p2c program to convert the original Pascal source code into modern C:

  • Waldek Hebisch, TeX in C, alpha version of TeX converted to C.

I personally don't mind the MetaPost/METAFONT syntax, nor do I mind the TeX syntax. (Personally, I love them both! I know there are young mathematicians and physicists, used to MS Word, who think learning TeX is like learning Latin; I can't help but feel overwhelming pity for these souls, since they don't know what TeX is capable of!) It's just their limitations and their output targets that bother me.

So, as I see it, anyone who wants to "reinvent the wheel" (so to speak) would probably have to do the following:

  1. study the TeXbook (fortunately, it is freely available online, along with the necessary macros);
  2. study the MetaPost/METAFONT implementation;
  3. learn how to work with one of these fancy lexer and parser generators and their corresponding semantic actions (flex/bison/yacc/ANTLR/etc.);
  4. work with Unicode encodings (I think UTF-8 or UTF-16 would be good, with UTF-32 as an option?);
  5. work with PDF and PostScript (and, for backwards compatibility, DVI) output;
  6. then, finally, the actual implementation, which is itself a one-billion-step process...
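On step 3, it is worth knowing that TeX's own input reader is not a fixed grammar: every character has a category code (0 to 15, as numbered in the TeXbook), and the table can be changed mid-document with \catcode. Below is a minimal Python sketch of such a table-driven reader, including TeX's three reading states (N for a new line, M for mid-line, S for skipping blanks). It assumes a simplified model: no ^^ notation, no trailing-space stripping, and a newline character standing in for TeX's end-of-line character. The names Token, tokenize, and default_catcodes are hypothetical, chosen for illustration.

```python
import string
from dataclasses import dataclass

# Category codes as numbered in The TeXbook.
(ESCAPE, BEGIN_GROUP, END_GROUP, MATH_SHIFT, ALIGN_TAB, END_OF_LINE, PARAMETER,
 SUPERSCRIPT, SUBSCRIPT, IGNORED, SPACE, LETTER, OTHER, ACTIVE, COMMENT,
 INVALID) = range(16)


def default_catcodes() -> dict[str, int]:
    """Roughly plain TeX's table; unlisted characters count as OTHER."""
    table = {"\\": ESCAPE, "{": BEGIN_GROUP, "}": END_GROUP, "$": MATH_SHIFT,
             "&": ALIGN_TAB, "\n": END_OF_LINE, "#": PARAMETER,
             "^": SUPERSCRIPT, "_": SUBSCRIPT, " ": SPACE, "\t": SPACE,
             "~": ACTIVE, "%": COMMENT, "\x00": IGNORED, "\x7f": INVALID}
    table.update({c: LETTER for c in string.ascii_letters})
    return table


@dataclass
class Token:
    kind: str     # "cs" (control sequence), "char", or "active"
    text: str     # control-sequence name, or the character itself
    catcode: int  # category code when read; -1 for control sequences


def tokenize(src: str, catcodes: dict[str, int]) -> list[Token]:
    """Simplified TeX reader: looks up the catcode table on every character."""
    def cat(ch: str) -> int:
        return catcodes.get(ch, OTHER)

    tokens: list[Token] = []
    state, i, n = "N", 0, len(src)
    while i < n:
        ch = src[i]
        c = cat(ch)
        i += 1
        if c == ESCAPE:
            if i < n and cat(src[i]) == LETTER:      # control word: \ + letters
                j = i
                while j < n and cat(src[j]) == LETTER:
                    j += 1
                name, i, state = src[i:j], j, "S"    # blanks after it are skipped
            elif i < n:                              # control symbol: \ + one char
                name, i = src[i], i + 1
                if cat(name) == SPACE:
                    state = "S"
                elif cat(name) == END_OF_LINE:
                    state = "N"                      # the line is used up
                else:
                    state = "M"
            else:
                name, state = "", "M"                # lone escape at end of input
            tokens.append(Token("cs", name, -1))
        elif c == SPACE:
            if state == "M":                         # a run of blanks -> one space
                tokens.append(Token("char", " ", SPACE))
                state = "S"
        elif c == END_OF_LINE:
            if state == "N":                         # empty line -> \par
                tokens.append(Token("cs", "par", -1))
            elif state == "M":                       # line end mid-line -> space
                tokens.append(Token("char", " ", SPACE))
            state = "N"
        elif c == COMMENT:                           # drop rest of line and its end
            while i < n and cat(src[i]) != END_OF_LINE:
                i += 1
            i, state = min(i + 1, n), "N"
        elif c == IGNORED:
            continue
        elif c == INVALID:
            raise ValueError(f"invalid character {ch!r}")
        else:                                        # {, }, $, letters, others, ~, ...
            kind = "active" if c == ACTIVE else "char"
            tokens.append(Token(kind, ch, c))
            state = "M"
    return tokens
```

With the default table, \a@b reads as the control word a followed by the characters @ and b; after setting the catcode of @ to LETTER (what \makeatletter does in LaTeX), the same input reads as the single control word a@b. Because the table can change while the document is being read, a fixed flex-generated lexer is an awkward fit for TeX.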

I'll add any addenda or links I see fit below, in their respective sections.

TeXbook Related

So far nothing...

MetaPost/METAFONT Related

I found this article, which looks interesting.

Its introduction beautifully sets up the problem facing this part of the toolchain, namely getting METAFONT-style fonts into a widely usable format:

Already relatively long ago it was recognized that the design of METAFONT is insufficiently extensible. In 1989, at the TUG meeting, Hobby announced the beginning of work on METAPOST, a program for the generation of a set of EPS (encapsulated PostScript) files instead of a bitmap font; in 1990, the first version of METAPOST was running. In the same year, Yanai and Berry [23] considered modifying METAFONT in order to output PostScript Type 3 [7] fonts.

Type 3 fonts can be legitimately used with TeX. Actually, bitmap fonts are always implemented as Type 3 fonts by dvi-to-PostScript drivers. Recently, Bzyl [15] has put a lot of effort into the revival of Type 3 fonts in the TeX world. Nevertheless, Type 3 fonts have never become as popular as Type 1 fonts, and probably they never will. One cannot install Type 3 fonts under Windows, MacOS, or X Window, although there are no serious reasons for that: it would suffice to include a PostScript interpreter into an operating system, which is not an unthinkable enterprise. But the commercial world is ruled by its own iffy rights... Anyway, in order to preserve the compatibility with the surrounding world, one should rather think about Type 1 than Type 3 fonts.

Alas! The issue of converting automatically METAFONT sources to Type 1 format turned out to be more difficult than one could expect (cf. [18, 19, 20, 21, 22]) and after nearly twenty years since the birth of TeX no programming tool for generating Type 1 fonts has appeared. As a consequence there is a glaring scarcity of fonts created by the TeX community.

The MetaType1 package was developed as a response to that bitter situation. Whether it can be classified as good, the future will reveal. So far, MetaType1 helped us to prepare a replica of a Polish font designed in the second decade of the twentieth century, Antykwa Półtawskiego [16]. It also proved useful in improving some freely available families of fonts [17].

The METAFONTbook is also available online (along with its macros).

There's nothing like RTFM:

Lexer/Tokenizer Related

I found several free references:

UTF Related

So far nothing...
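In the meantime, the reading side of step 4 amounts to turning raw bytes into Unicode code points before any TeX-style tokenizing can happen. Below is a hand-rolled UTF-8 decoder in Python, written out only to show the bit layout (in practice bytes.decode("utf-8") does the same job); decode_utf8 is a hypothetical name. Following RFC 3629, it rejects overlong encodings, UTF-16 surrogate code points, and values above U+10FFFF.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 bytes into a list of Unicode code points (strict)."""
    out: list[int] = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # The lead byte tells us the sequence length and its payload bits.
        if b0 < 0x80:                      # 0xxxxxxx: plain ASCII
            cp, length, minimum = b0, 1, 0
        elif 0xC0 <= b0 < 0xE0:            # 110xxxxx + 1 continuation byte
            cp, length, minimum = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 < 0xF0:            # 1110xxxx + 2 continuation bytes
            cp, length, minimum = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 < 0xF8:            # 11110xxx + 3 continuation bytes
            cp, length, minimum = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, length):
            b = data[i + k]
            if (b & 0xC0) != 0x80:         # continuation bytes look like 10xxxxxx
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)    # append 6 more payload bits
        # Reject overlong forms, surrogates, and values past U+10FFFF.
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")
        out.append(cp)
        i += length
    return out
```

For example, the bytes C3 A9 decode to the single code point U+00E9 (é), while the overlong pair C0 AF, an old trick for smuggling in a "/", is rejected.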

PDF, PostScript, DVI Libraries Related

So far nothing...
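In the meantime, DVI is the simplest of the output formats from step 5, and a page that contains only rules needs no fonts at all. Below is a minimal Python sketch that writes a one-page DVI file holding a single black square, following the preamble, page (bop ... eop), postamble, and post_post layout described in dvitype.web. The helper minimal_dvi is hypothetical, and the output is intended (not guaranteed) to be accepted by dvitype.

```python
import struct

# Opcodes from the DVI format description in dvitype.web.
SET_RULE, BOP, EOP, DOWN4, PRE, POST, POST_POST = 132, 139, 140, 160, 247, 248, 249
NUM, DEN, MAG = 25_400_000, 473_628_672, 1000  # TeX's usual units: 1 DVI unit = 1 sp
SP_PER_PT = 65536


def u1(x: int) -> bytes: return struct.pack(">B", x)  # unsigned 1 byte
def u2(x: int) -> bytes: return struct.pack(">H", x)  # unsigned 2 bytes, big-endian
def s4(x: int) -> bytes: return struct.pack(">i", x)  # signed 4 bytes, big-endian


def minimal_dvi(rule_pt: int = 10, comment: bytes = b"TeX++ sketch") -> bytes:
    """Build a one-page DVI file whose only content is a square black rule."""
    side = rule_pt * SP_PER_PT
    out = bytearray()
    # Preamble: pre i[1] num[4] den[4] mag[4] k[1] x[k]
    out += u1(PRE) + u1(2) + s4(NUM) + s4(DEN) + s4(MAG) + u1(len(comment)) + comment
    # Page: bop c0..c9[4 each] p[4]; p = -1 because there is no previous page.
    bop_offset = len(out)
    out += u1(BOP) + s4(1) + s4(0) * 9 + s4(-1)
    out += u1(DOWN4) + s4(side)                  # move down so the rule is below the origin
    out += u1(SET_RULE) + s4(side) + s4(side)    # set_rule a[4] (height) b[4] (width)
    out += u1(EOP)
    # Postamble: post p[4] num[4] den[4] mag[4] l[4] u[4] t[2] s[2]
    post_offset = len(out)
    out += u1(POST) + s4(bop_offset) + s4(NUM) + s4(DEN) + s4(MAG)
    out += s4(side) + s4(side) + u2(0) + u2(1)   # max height+depth, max width, stack 0, 1 page
    # post_post q[4] i[1], then at least four 223 bytes, padded to a multiple of 4.
    out += u1(POST_POST) + s4(post_offset) + u1(2)
    out += b"\xdf" * 4
    while len(out) % 4:
        out += b"\xdf"
    return bytes(out)
```

To try it, write the bytes out with open("square.dvi", "wb").write(minimal_dvi()) and inspect the file with dvitype square.dvi. A DVI reader starts from the end of the file: it skips the 223 padding, follows the pointer q back to the postamble, and from there follows the back-pointers to each bop, which is why every offset above has to be exact.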

Implementing TeX from Scratch

One might want to look at the WEB source for TeX (tex.web), in particular the typeset version you get by running weave tex.web and then pdftex tex.tex on the command line. It documents the inner workings of the TeX program in detail!
