Scholarship and Digital Texts

Wednesday, May 14th, 2008

In both history and literature, we study the past through surviving writings. Our many stately scholarly conclusions sprout from the fertile soil of critical editions, which provide textual history, variant readings, linguistic and structural analysis, and relevant comparisons.

Digital critical editions surpass their print counterparts in the depth of interconnected information that can be expressed and the breadth of audience that can quickly find accurate information. More importantly, scholarly communities too small to warrant the typesetting costs of print critical apparatus could easily create such texts with the aid of appropriate software. Rather: all of this would be true, except that producing a digital critical edition is currently technically difficult and viewing one is less than satisfactory.

Where are we?

The TEI Guidelines have set good reference points for the character encoding, semantic tagging, and other technical requirements for saving archival-quality digital texts. The standard ensures that these texts are saved in an open format readable by all, and that they will remain readable long into the future.

But I would like to suggest that we move beyond seeing TEI as synonymous with digital texts and consider it instead simply a storage protocol. Then we face two interesting tasks: How can these texts best be created? How can they best be displayed?

Midwifing digital-born critical editions

TEI is superior to other standards because it represents data about a text semantically, rather than simply by visual formatting. A Word document may visually suggest to a human that some blocks of text are titles, translations, notes, etc.; but to a computer it is simply a series of distances, font sizes, and other purely decorative touches. This is problematic because such file formats may change and render old files unreadable, and also because the computer does not understand the structure of the text and cannot answer any meaningful questions about it.

TEI texts, on the other hand, use XML to mark the semantic properties of the text and can thus be operated on in useful ways. But the standard includes all the extensibility of XML itself, so scholars who want to produce such texts are quickly instructed to learn the details of XML, doctype declarations, and character encoding. Unsurprisingly, the scholars who do original textual scholarship and those who create digital texts are generally different groups.
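To make "operated on in useful ways" concrete, here is a minimal sketch in Python, assuming a tiny hand-written fragment that borrows a few TEI element names (head, l, app, lem, rdg, note). The fragment is invented for illustration; it is not taken from any real edition and is not a complete, valid TEI document (it has no header, for instance). The function name summarize is likewise hypothetical. The point is only that, because the markup says what each piece of text is, a program can ask for the title, the variant readings, or the notes directly.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the prefix "tei" is just a local label for queries.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text for illustration only (not a full TEI document).
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <head>A Sample Poem</head>
      <l n="1">The first line of <app><lem wit="#A">verse</lem><rdg wit="#B">song</rdg></app></l>
      <l n="2">The second line<note type="gloss">An editorial gloss.</note></l>
    </body>
  </text>
</TEI>
"""


def summarize(xml_text: str) -> dict:
    """Collect the title, variant readings, and notes from a TEI-like fragment."""
    root = ET.fromstring(xml_text)
    head = root.find(".//tei:head", TEI_NS)

    variants = []
    for app in root.iterfind(".//tei:app", TEI_NS):
        lem = app.find("tei:lem", TEI_NS)
        # Each reading is paired with the witness that carries it.
        readings = [(r.get("wit"), r.text) for r in app.findall("tei:rdg", TEI_NS)]
        variants.append({"lemma": lem.text, "readings": readings})

    notes = [n.text for n in root.iterfind(".//tei:note", TEI_NS)]
    return {"title": head.text, "variants": variants, "notes": notes}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

No comparable query is possible against a document that records only font sizes and indents, since nothing in such a file says which run of text is the title or which is a variant reading.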

We would never say to museum staff: “we’ll be saving your exhibit in a relational database, so here is a SQL tutorial.” We do the hard work and then hand them a lovely application like Omeka. Similarly, if we want to get scholars creating new digital-first critical editions, we need to stop pretending that someday everyone will know XML and do the hard work of building useful software for creating semantically tagged texts.

Screenshot of Critex

Critex is my in-development tool for doing just this. It is a Cocoa-based application for creating critical editions that can then be exported to rich text, PDF, HTML, or TEI XML. It eliminates all the unnecessary formatting options available in most word processors and instead includes features of use to textual scholars. It will eventually include multiple footnote series, different formatting options for critical apparatus, and a database for tracking editions, glosses, and word usage. At the moment it is somewhat pre-alpha, but I am always looking for suggestions or programmers who would like to help.

Typesetting digital critical editions

Let’s just all agree: there is nothing lovelier than well-set critical apparatus. We’ve all had a crush on a book (maybe an edition of Milton) with big margins, marginal notes, two columns of footnotes, all set in a beautiful humanist face with kerning and ligatures.

I want to see if we can claim that same beauty (and usability) back for online presentation.

Digital critical editions are usually displayed with each set of notes in a separate frame and appropriate links connecting them. Perhaps the best texts I’ve seen come from a group working at Oxford, which has produced “Old English Literature: A Hypertext Course Pack.”

As a way of exploring possible formats for displaying critical editions, let’s compare their “Ælfric’s Life of St Edmund” with my version. I have reformatted the linked notes into floating notes that display themselves appropriately when the relevant text is visible. This is only an experiment, and I’ve just spent a few minutes entering a few paragraphs of the text, but I wonder what a reader’s experience is like on this sort of page, and how else we might improve the look and feel of online critical editions.
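One way such a page could be generated from semantically tagged text is sketched below in Python. This is not how the experimental page described above was built; it is only an illustration under stated assumptions. The sample paragraph is invented, the function name paragraph_to_html is hypothetical, and so are the class name "note" and the data-for attribute. The sketch turns each inline note into an HTML aside placed beside its paragraph, so that a stylesheet (not shown) could float it in the margin, where it scrolls into view along with the text it annotates.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree spells namespaced tags as "{namespace}name".
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample paragraph with one inline note (illustration only).
SAMPLE_P = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">A first clause'
    "<note>Gloss on the first clause.</note>, and a second clause.</p>"
)


def paragraph_to_html(xml_text: str) -> str:
    """Render one paragraph, lifting inline notes out into <aside> elements."""
    p = ET.fromstring(xml_text)
    body = [escape(p.text or "")]  # text before the first child element
    asides = []
    for child in p:
        if child.tag == TEI + "note":
            n = len(asides) + 1
            # Leave a numbered marker where the note was anchored.
            body.append(f'<sup id="ref{n}">{n}</sup>')
            # "note" and data-for are hypothetical hooks for a stylesheet.
            asides.append(
                f'<aside class="note" data-for="ref{n}">'
                f"{escape(child.text or '')}</aside>"
            )
        else:
            # Any other inline element is flattened to its plain text.
            body.append(escape("".join(child.itertext())))
        body.append(escape(child.tail or ""))  # text following this child
    # Asides come first so a "float: right" rule can set them beside the text.
    return "".join(asides) + "<p>" + "".join(body) + "</p>"


if __name__ == "__main__":
    print(paragraph_to_html(SAMPLE_P))
```

Keeping each note next to its paragraph in the output, rather than in a separate frame, is the design choice being explored here: the note needs no link to follow, because it is already on screen whenever its text is.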

New ways of storing and organizing text demand new models of writing and reading that are accessible even to the technically disinclined. I hope we will take up this rather plain topic among the many exciting visualization and digitization topics at THATCamp.