Crowdsourcing Transcriptions

Friday, May 23rd, 2008

The other idea I would love to talk about is the idea of distributed document transcription as I explain it in my blog post: Archival Transcriptions: for the public, by the public.  While I do love what reCaptcha does at the word level and Footnote.com does with locations, names and dates – I still think there is a place for a centralized web-based system where digitized documents can be uploaded and then transcribed & verified by volunteers. I think this would be especially powerful for smaller archives.

I would love to hash out this idea with others as well as learn what other projects like this might already exist.

Visualizing Aggregated Data

Friday, May 23rd, 2008

I would love to discuss ideas for visualizing aggregated data.

My personal focus has been on descriptive data about archival record groups and manuscript collections – with a stress on subject terms, quantity of materials (think total linear feet) and physical location of the materials.  I worked on a prototype visualization tool called ArchivesZ – but I have also seen many other inspirations for alternate approaches.

General topics I would like to include:

  • leveraging standard markup, such as EAD (Encoded Archival Description), to support aggregation of information about collections both within and across institutions
  • the challenge of non-standard subject terms
  • the coolest visualizations we think could be adapted to this type of data (my current obsession being the TimeRiver as exemplified by the NY Times’ box office revenue visualization)
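As a rough illustration of the first two topics, here is a minimal Python sketch of aggregating linear feet by subject term across finding aids. It is not how ArchivesZ works. The two sample records are made up, the XML is assumed to use un-namespaced EAD element names (`archdesc`, `physdesc`, `extent`, `controlaccess`, `subject`), and the helper names `linear_feet` and `aggregate_by_subject` are hypothetical. Lowercasing and trimming the subject terms is only a token gesture toward the non-standard term problem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two tiny, invented finding aids (assumption: un-namespaced EAD elements).
SAMPLE_EADS = [
    """<ead><archdesc level="collection">
         <did><unittitle>Hypothetical Family Papers</unittitle>
              <physdesc><extent>12.5 linear feet</extent></physdesc></did>
         <controlaccess><subject>Agriculture</subject>
                        <subject>Railroads</subject></controlaccess>
       </archdesc></ead>""",
    """<ead><archdesc level="collection">
         <did><unittitle>Hypothetical Company Records</unittitle>
              <physdesc><extent>3 linear feet</extent></physdesc></did>
         <controlaccess><subject> railroads </subject>
                        <subject>Labor</subject></controlaccess>
       </archdesc></ead>""",
]


def linear_feet(extent_text):
    """Return the leading number of an extent such as '12.5 linear feet'.

    Returns 0.0 when the text is missing or does not start with a number.
    """
    try:
        return float(extent_text.split()[0])
    except (AttributeError, IndexError, ValueError):
        return 0.0


def aggregate_by_subject(ead_documents):
    """Sum each collection's linear feet under every subject term it carries."""
    totals = defaultdict(float)
    for doc in ead_documents:
        root = ET.fromstring(doc)
        feet = linear_feet(root.findtext(".//physdesc/extent"))
        for subject in root.iterfind(".//controlaccess/subject"):
            # Crude normalization: trim whitespace and ignore case.
            term = (subject.text or "").strip().lower()
            if term:
                totals[term] += feet
    return dict(totals)


totals = aggregate_by_subject(SAMPLE_EADS)
for term, feet in sorted(totals.items()):
    print(f"{term}: {feet} linear feet")
```

Note that a collection with several subject terms is counted once under each term, so the per-subject totals will add up to more than the total holdings. Whether that is the right choice is exactly the kind of thing worth discussing.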

I think that these ideas could coordinate well with what Laura mentioned in An archive aggregator.

2 Ideas

Friday, May 23rd, 2008

I am already committed to talking about historical visualizations and event standards with my buddy Jerm. But I’d also really like to see/attend/crash two additional panels:

1) Something on management: Project management, organizational management, staff management, financial management, resource management. Digital humanities work has put a lot of us in the position of managing fairly large “businesses”—work for which our graduate work emphatically did NOT prepare us. I’d love the chance to discuss management challenges and strategies with other campers. A group therapy session, if you will.

2) Something on funding, and more specifically on sustainability, both for digital humanities projects and digital humanities organizations. Dan, Mills, and I talked about sustainability on the last Digital Campus, but there’s a lot more to be said. It’s a huge issue both for us and for our funders, and one around which there’s a lot of miscommunication and misunderstanding. It would be nice to start a dialog at THATCamp.

Anyone interested in either of these ideas, please chime in with comments.

Challenges to Historical Visualization: the Need for an Event Standard

Friday, May 23rd, 2008

Hello, THATCampers! Jeremy and I have been thinking about doing something on the genres and challenges of historical visualization in the digital realm.  We’d like to take a catholic view of visualization, considering everything from simple timelines to rich visual reconstructions such as Rome Reborn.  The former have gotten a pretty bad rap over the years, but as historians, we personally tend to be just as skeptical of the latter.  We’ll tell you why at Camp 😉

One of the things we’d like to discuss in particular is what we see as one of the primary roadblocks facing quality historical visualizations of all kinds: the fact that there aren’t any good or widely accepted standards for describing and marking up historical events. Digital historians have managed to do a lot with maps and documents, places and artifacts, because there are good and well established metadata standards for describing these units of historical analysis (e.g. longitude, latitude, KML, MARC, OAI, etc.).  But we don’t have anything comparable for marking up happenings, which are at least as important as place and stuff to historical discourse.  There are, however, several contenders, including HEML, Microformats (hCard, hCalendar, Geo), and iCal, and we’d like to bounce these around to see if any stand out or can be made/hacked to do the job. At the very least, we’d like to start a conversation and encourage smart people to start thinking about just what a useful event standard would look like.
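To make one of those contenders concrete, here is a minimal Python sketch that renders a single event as hCalendar markup with an embedded Geo microformat. The `HistoricalEvent` record and the `to_hcalendar` function are hypothetical names invented for this illustration, and the coordinates are approximate. The class names (`vevent`, `summary`, `dtstart`, `dtend`, `location`, `geo`, `latitude`, `longitude`) come from the microformats themselves. This is one possible encoding, not a proposal for what the standard should be.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class HistoricalEvent:
    """Hypothetical record type, invented for this sketch."""
    summary: str
    start: str        # ISO 8601 date, e.g. "1863-07-01"
    end: str          # exclusive end date, following iCalendar's DTEND convention
    place: str
    latitude: float
    longitude: float


def to_hcalendar(event: HistoricalEvent) -> str:
    """Render the event as an hCalendar 'vevent' with a nested 'geo' block."""
    parts = [
        '<div class="vevent">',
        f'  <span class="summary">{escape(event.summary)}</span>',
        f'  <abbr class="dtstart" title="{escape(event.start)}">{escape(event.start)}</abbr>',
        f'  <abbr class="dtend" title="{escape(event.end)}">{escape(event.end)}</abbr>',
        f'  <span class="location">{escape(event.place)}</span>',
        '  <span class="geo">',
        f'    <span class="latitude">{event.latitude}</span>,',
        f'    <span class="longitude">{event.longitude}</span>',
        '  </span>',
        '</div>',
    ]
    return "\n".join(parts)


gettysburg = HistoricalEvent(
    summary="Battle of Gettysburg",
    start="1863-07-01",
    end="1863-07-04",   # exclusive: the battle ran July 1-3
    place="Gettysburg, Pennsylvania",
    latitude=39.81,     # approximate
    longitude=-77.23,   # approximate
)
markup = to_hcalendar(gettysburg)
print(markup)
```

Even this toy example shows some of the gaps: there is no obvious slot for uncertain or disputed dates, for participants, or for relationships between events, which is where something like HEML might have more to offer.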

We were thinking Tom might introduce the session with some thoughts on historical visualizations in general and on timelines (and their persistent audience popularity) in particular.  Jeremy could then introduce the more specific (i.e. meaty/practical/useful) topic of event standards and demonstrate a proof-of-concept for implementing various Microformats for creating maps, timelines, and other visualizations with ads from the Virginia Runaway Slave database. To round out the session it would be great if we could find a couple campers with more experience working in Second Life, gaming, or 3-D reconstruction to join us to share their thoughts on the role (or lack thereof) of time-centered and other standards in more immersive visualizations. Finally, we’re totally open to suggestions from campers who would like to take the session in another direction altogether. Jump on board!

Tom and Jeremy

Text mining and visualizations

Friday, May 23rd, 2008

I’d like to second Laura’s suggestion about a session on textual visualizations and document archives.  I’d like to add to that the issue of text mining.  I’ll soon be beginning a text mining project using a couple of collections of nineteenth-century American documents: the Valley of the Shadow Archive and the complete run of the Daily Dispatch (Richmond, Virginia, paper) during the Civil War.  I emphasize the word beginning.  I’m really suggesting this panel as a humble supplicant.  I know that Bill’s done some work on text mining (and I’m already indebted to him for his very thoughtful blog posts on the subject), and that Dan, Jeremy, and Sean Takats are beginning a major text mining project at CHNM.  I was hoping that those of you who have been thinking about and have some expertise in text mining would be interested in such a session.  A particular issue I’d be interested in brainstorming about is methods to use text mining for analysis, i.e. producing visualizations drawn from thousands, maybe tens or even hundreds of thousands, of documents that shed new light on historical questions.

I’ve been doing a bit of thinking about using text mining for analyzing the Valley Archive.  Compared to large online archives like Google Books and the American Periodical Series, the Valley Archive is modest in size.  But as a curated collection it does offer something that I think is likely to be useful: a number of pre-existing axes that offer what I expect will be some analytically interesting opportunities to contrast different caches of documents.  One obvious point of contrast is northern vs. southern documents.  Another axis would involve the nature of the document, from public documents (any published source), to very private documents (e.g. diaries), to semi-private (e.g. letters).  Being able to throw up visualizations in six quadrants (well, that’s not the right word, but I can tell you that “sextrants” definitely isn’t right) and compare language used in northern vs. southern public writings, southern public vs. southern semi-private vs. southern very private documents, etc., might immediately offer what I’m hoping will be interesting and useful interpretative possibilities.  What are the differences between the way northerners and southerners write about “nation”?  How (if at all) does that change over the course of the war?  Do we see convergence or divergence between the sections/nations?  Is there any sectional difference in the language involving or surrounding death?  How does that change over time?  And what are the differences in the language around “death” in public vs. private documents?
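A very simple version of that comparison can be sketched in a few lines of Python: count how often a term appears per thousand words in each (section, document type) group. The three sample documents below are invented, the field names `section` and `kind` are assumptions rather than the Valley Archive's actual metadata, and `term_rate_by_group` is a hypothetical helper. Real work would need stemming, collocations, and a time dimension, none of which this attempts.

```python
import re
from collections import Counter

# Invented sample documents; "section" and "kind" are assumed metadata fields.
DOCUMENTS = [
    {"section": "north", "kind": "public",
     "text": "The nation must endure and the nation will endure forever"},
    {"section": "south", "kind": "public",
     "text": "Our nation is young"},
    {"section": "south", "kind": "private",
     "text": "We buried him today and I cannot write of it"},
]


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def term_rate_by_group(documents, term):
    """Return occurrences of `term` per 1,000 tokens for each (section, kind)."""
    hits = Counter()
    totals = Counter()
    for doc in documents:
        group = (doc["section"], doc["kind"])
        tokens = tokenize(doc["text"])
        totals[group] += len(tokens)
        hits[group] += tokens.count(term)
    # Skip empty groups to avoid dividing by zero.
    return {g: 1000 * hits[g] / totals[g] for g in totals if totals[g]}


rates = term_rate_by_group(DOCUMENTS, "nation")
for group, rate in sorted(rates.items()):
    print(group, round(rate, 1))
```

Normalizing by group size matters here because the six groups will be very uneven: there are far more words of newspaper text than of diary text, so raw counts would mostly reflect how much of each kind of document survives.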

I was really interested in Adam’s post on “Scholarship and Digital Texts.”  Perhaps we should have a pair of sessions on texts, one focused on issues of deep markup (XML, TEI, etc.) and innovative presentation of particular documents (making the most of micro collections like a critical edition) and another focused on issues of using text mining and visualizations to tackle and make use of the abundance of digitized documents now available (macro collections).