Wednesday, February 07, 2007

Unicode

Over at deinde, Danny Zacharias has a post about publishers accepting Unicode in submissions. Since it is directed against publishers, Eisenbrauns (in the person of Jim Eisenbraun) chose to respond. I quote:

Your comments on the virtues of Unicode and OpenType are well placed. However, your comments on publishers being unwilling to move forward are not. Almost all publishers in this field would love to move directly to Unicode and OpenType. The problem is not that publishers are slow to embrace these standards: the problem is that publishing software lags behind. Yes, InDesign handles Unicode quite well. However, Adobe created InDesign first of all to serve the general (consumer) magazine market, not the academic publishing market. In academic publishing, much more is required than getting the font/glyphs right. One has to index; create cross-references; generate tables of contents; include footnotes. Until the latest version of InDesign (CS 2), it could not do footnotes! Now, in the CS 2 revision, it will do footnotes. But it still cannot do cross-references. It cannot do more than one index (without doing so manually; suppose you want more than a Scripture index? or an author index?). (Plug-ins are available from third-party commercial companies to do these things, but they are expensive and very very slow.) In fact, InDesign suffers from what I would call software piggishness: it's very fat. An example: the same file in the older publishing software that Eisenbrauns has used for 15+ years if recreated in InDesign will require a file size that is roughly 10-20x the disk space. That by itself is not much of an issue any longer, but the fact that the software is very slow and unwieldy (in comparison) simply raises costs significantly. The list of things that InDesign will do is long.

In short, biblical studies publishers would very rapidly embrace OpenType/Unicode--and we all will. We believe that it is the future (at least I do). But the creation of true professional publishing software lags the development of this encoding standard. We're quite frustrated with this fact. But I would remind us all that academic publishing is less than 1% of the general publishing market. We feed on the scraps that fall from the table of big business. -- Jim Eisenbraun, Eisenbrauns

I encourage you to read the whole post.

1 comment:

Jonadab said...

There are very good reasons why existing software does not all jump to support Unicode right away. From the point of view of a programmer, Unicode does some truly heinous things. For starters, it redefines such extremely basic concepts as what constitutes a character -- not in the way that a new character set does (that just redefines the _list_ of characters, and possibly how many bytes they all take up) but at the conceptual level. That's one thing if you're writing new software, but if you're updating old software, it's quite another. Quite aside from the data structure issues, which are substantial, there are major logical issues. Every place your old code looks at a character, it has to be determined whether the code now needs to look at a codepoint, a grapheme, or what. If your software deals heavily with characters, as publishing software surely must, it's a nightmare probably best dealt with by scrapping the existing codebase entirely and writing a new version totally from scratch. If that sounds like a lot of extra work, you're beginning to get the idea. Additionally, Unicode support is easy to implement incorrectly in various ways, so significant bugginess is a likely outcome.
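To make the codepoint-versus-grapheme distinction above concrete, here is a minimal Python sketch. It is only an illustration, not how any publishing package works: the helper name `describe` is made up for this example, and the "perceived character" count is a rough approximation that folds combining marks into the preceding base letter. Real grapheme segmentation follows more detailed Unicode rules that this sketch does not implement.

```python
import unicodedata


def describe(text: str) -> dict:
    """Count one string three ways (hypothetical helper for illustration)."""
    return {
        # Storage size when encoded as UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # Number of Unicode code points (what len() reports in Python 3).
        "code_points": len(text),
        # Approximation of user-perceived characters: skip every code point
        # whose general category is a Mark (Mn, Mc, Me). This is NOT full
        # grapheme cluster segmentation, only a rough stand-in.
        "approx_graphemes": sum(
            1 for ch in text if not unicodedata.category(ch).startswith("M")
        ),
    }


# "e" followed by a combining acute accent, versus the precomposed letter.
decomposed = "e\u0301"
precomposed = "\u00e9"

# Pointed Hebrew "shalom": shin, shin dot, qamats, lamed, vav, holam, final mem.
shalom = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"

if __name__ == "__main__":
    for label, sample in [
        ("decomposed", decomposed),
        ("precomposed", precomposed),
        ("shalom", shalom),
    ]:
        print(label, describe(sample))
    # The two spellings of the accented "e" differ as raw strings,
    # but compare equal after normalization to NFC.
    print(decomposed == precomposed)
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

The two spellings of the accented letter look identical on the page but have different lengths and do not compare equal until normalized, and the pointed Hebrew word is four letters to a reader but seven code points to the program. Every place old code assumed "one character is one unit" has to decide which of these counts it actually meant.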

Traditionally, software publishers receive a lot of demands for what features the users of their software want -- more features than they can actually implement. So they triage these requests by estimating how much programmer time is required for each one, and whether it's worth that amount of time based on how badly users want it. For the effort required to add Unicode support, they can probably implement 200+ other features, some of them quite major, and all of them things that various users would like to have.

Indeed, what surprises me is the amount of software that *does* support Unicode already. Apparently, although it's a whole lot of work, programmers really *like* to work on Unicode support. Maybe because it's a challenge.