Scholarship and Digital Texts

Wednesday, May 14th, 2008 | Adam Solove

In both history and literature, we study the past through surviving writings. Our many stately scholarly conclusions sprout from the fertile soil of critical editions, which provide textual history, variant readings, linguistic and structural analysis, and relevant comparisons.

Digital critical editions surpass their print counterparts in the depth of interconnected information they can express and in the breadth of the audience that can quickly find accurate information in them. More importantly, scholarly communities too small to justify the typesetting costs of a printed critical apparatus could easily create such texts with the aid of appropriate software. Or rather, all of this would be true, except that producing a digital critical edition is currently technically difficult and viewing one is less than satisfying.

Where are we?

The TEI Guidelines have set good reference points for the character encoding, semantic tagging, and other technical requirements of archival-quality digital texts. The standard ensures that these texts are saved in an open format readable by all, and that they will remain readable long into the future.

But I would like to suggest that we move beyond seeing TEI as synonymous with digital texts and instead consider it simply a storage protocol. Then we face two interesting tasks: How can these texts best be created? And how can they best be displayed?

Midwifing digital-born critical editions

TEI is superior to many other formats because it represents data about a text semantically rather than by visual formatting alone. A Word document may visually suggest to a human that some blocks of text are titles, translations, or notes; but to a computer it is simply a series of distances, font sizes, and other purely decorative touches. This is problematic both because such file formats may change and render old files unreadable, and because the computer does not understand the structure of the text and so cannot answer any meaningful questions about it.

TEI texts, on the other hand, use XML to mark the semantic properties of the text, so software can search, compare, and transform them in useful ways. But the standard inherits all the extensibility (and complexity) of XML itself, so scholars who want to produce such texts are quickly told to learn the details of XML, doctype declarations, and character encoding. Unsurprisingly, the scholars who do original textual scholarship and those who create digital texts are generally different groups.
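To make the contrast with a Word file concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The TEI fragment is hypothetical (placeholder text, made-up witness sigla #A and #B), but <app>, <lem>, <rdg>, and <note> are the TEI elements for apparatus entries and notes, and http://www.tei-c.org/ns/1.0 is the TEI namespace. The helper names variant_readings and notes_of_type are invented for illustration. Because the markup records what each piece of text is, a few lines of code can answer questions like "what does witness B read here?"

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; every TEI element lives in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, hypothetical TEI fragment: placeholder text with one apparatus
# entry (a lemma from witness A, a variant from witness B) and one note.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>This is a <app><lem wit="#A">sample</lem><rdg wit="#B">example</rdg></app>
        sentence.<note type="gloss">a hypothetical gloss</note></p>
    </body>
  </text>
</TEI>"""


def variant_readings(root):
    """Return (lemma, [(witness, reading), ...]) for every <app> entry."""
    results = []
    for app in root.iterfind(".//tei:app", TEI_NS):
        lem = app.find("tei:lem", TEI_NS)
        readings = [(rdg.get("wit"), rdg.text)
                    for rdg in app.findall("tei:rdg", TEI_NS)]
        results.append((lem.text if lem is not None else None, readings))
    return results


def notes_of_type(root, note_type):
    """Return the text of every <note> whose type attribute matches."""
    return [note.text for note in root.iterfind(".//tei:note", TEI_NS)
            if note.get("type") == note_type]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(variant_readings(root))
    print(notes_of_type(root, "gloss"))
```

A Word file carries none of this structure, so the same questions could only be answered by guessing from fonts and spacing.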

We would never say to museum staff: “we’ll be saving your exhibit in a relational database, so here is a SQL tutorial.” We do the hard work and then hand them a lovely application like Omeka. Similarly, if we want scholars creating new digital-first critical editions, we need to stop pretending that someday everyone will know XML, and instead do the hard work of building useful software for creating semantically tagged texts.

Screenshot of Critex

Critex is my in-development tool for doing just this. It is a Cocoa-based application for creating critical editions that can then be exported to rich text, PDF, HTML, or TEI XML. It drops the formatting options of most word processors that textual scholars don’t need and instead includes features that are useful to them. It will eventually include multiple series of footnotes, different formatting options for the critical apparatus, and a database for tracking editions, glosses, and word usage. At the moment it is somewhat pre-alpha, but I am always looking for suggestions or for programmers who would like to help.

Typesetting digital critical editions

Let’s all just agree: there is nothing lovelier than a well-set critical apparatus. We’ve all had a crush on a book (maybe an edition of Milton) with big margins, marginal notes, and two columns of footnotes, all set in a beautiful humanist typeface with kerning and ligatures.

I want to see if we can claim that same beauty (and usability) back for online presentation.

Digital critical editions are usually displayed with each set of notes in a separate frame and links connecting them. Perhaps the best texts I’ve seen come from a group working at Oxford, which has produced “Old English Literature: A Hypertext Course Pack.”

As a way of exploring possible formats for displaying critical editions, let’s compare their “Ælfric’s Life of St Edmund” with my version. I have turned the linked notes into floating notes that appear beside the text whenever the passage they annotate is visible. This is only an experiment, and I’ve spent just a few minutes entering a few paragraphs of the text, but I wonder what a reader’s experience is like on this sort of page, and how else we might improve the look and feel of online critical editions.
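The post doesn’t say how its floating notes are positioned. One common approach to margin notes, assumed here and not taken from the author’s page, is to show only the notes whose anchors fall inside the visible part of the text, place each one level with its anchor, and push a note down when it would overlap the one above it. The minimal Python sketch below shows just that layout logic (a browser version would presumably do the same in JavaScript); the Note class, the function name, and the gap spacing are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Note:
    anchor_y: float  # vertical position of the annotated word in the text column
    height: float    # rendered height of the note box


def layout_visible_notes(notes, view_top, view_bottom, gap=4.0):
    """Return (note, y) pairs for the notes whose anchors are on screen.

    Each note is placed level with its anchor; if that would overlap the
    previously placed note, it is pushed down to just below that note.
    """
    visible = sorted(
        (n for n in notes if view_top <= n.anchor_y <= view_bottom),
        key=lambda n: n.anchor_y,
    )
    placed = []
    next_free_y = float("-inf")  # first y position not covered by a placed note
    for note in visible:
        y = max(note.anchor_y, next_free_y)
        placed.append((note, y))
        next_free_y = y + note.height + gap
    return placed


if __name__ == "__main__":
    notes = [Note(100.0, 30.0), Note(110.0, 20.0), Note(500.0, 10.0)]
    for note, y in layout_visible_notes(notes, view_top=50.0, view_bottom=300.0):
        print(note.anchor_y, y)
```

Re-running the layout whenever the reader scrolls keeps each note next to the passage it annotates, which is roughly the effect a print edition gets from side notes.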

New ways of storing and organizing text demand new models of writing and reading that are accessible even to the technically disinclined. I hope we will take up this rather plain topic amid the many exciting visualization and digitization topics at THATCamp.

8 Responses to “Scholarship and Digital Texts”

  1. Ben Brumfield Says:

    Would you be willing to demo your alpha in a session? I’d love to see it, since my own project is also an alpha-stage system for producing critical editions of manuscripts, but I’m dissatisfied with my visual presentation.

    Are you interested in demoing at a joint session?

  2. “We need to stop pretending that someday everyone will know XML” « Public Historian Says:

    […] of all the participants, basic info about the conference, and a blog (the quote above comes from this post about TEI and digital critical editions). For those of us who can’t attend, this is a great […]

  3. Matthew Gaventa Says:

    I’m really interested in this. I work on an online encyclopedia project (Encyclopedia Virginia). The state encyclopedia community has been trying to figure out best practices for digitally born content and has been stuck between two competing goals: first, adopting commonly used schemas so that our content will be as readable as possible by the technologies of tomorrow; and second, doing so in a way that lets less technically inclined editors comfortably use, edit, and publish content.

    EV is probably more techie-oriented than most state encyclopedia projects out there; we’ve adopted TEI as an encoding standard for our content, which is all digitally born. It was critically important to us (we’re archivists and librarians at heart) to create content that would remain “readable long into the future.” But we certainly struggle now with a workflow that inevitably means a lot of hand-encoding, and I don’t think I’m putting words in anyone’s mouth to suggest that this kind of approach is fairly daunting (and appropriately so) to a lot of other digitally born encyclopedias. A long way of saying: I think there’s a lot of interest in finding ways to make that process friendlier and more accessible to users, and I’m curious to hear what you have to offer.

  4. Ben Brumfield Says:

    Matthew,

    I think that Adam and I would both be interested in discussing this. We’re each trying to make software that’s both rigorous and user-friendly, and would love to explore interface options with you.

  5. Laura Mandell Says:

    Adam, Ben, and Matthew:

    I have been working with a computer scientist to develop something that does TEI encoding for people — could we talk about having them help you instead? Let’s discuss at THATCamp.

    Laura Mandell

  6. THATCamp » Blog Archive » Collaborative annotation Says:

    […] is to take a digital edition of a text (possibly a TEI file exported from an application like Adam’s) and allow a class of undergraduates (or graduate students) to write all over it, producing shared […]

  7. THATCamp » Blog Archive » Visualization and Interface for Variorum and Critical Editions, Text and Video Says:

    […] working on seeks to bring together several existing standards and tools some of which–(TEI) standards for markup of digital texts, collaborative annotation, and timelines–have been proposed for discussion. I would like to […]

  8. Non-exhaustive list of topics covered at THATCamps | ThatCamp Paris 2010 Says:

    […] thatcamp.org/2008/05/scholarship-and-digital-texts/ […]