Building a F/OSS DH Infrastructure

Friday, May 2nd, 2008

Back in November, I was brought in as the first DH lead developer for the TAMU College of Liberal Arts. A growing number of faculty want to do something with digital humanities, but the college didn’t have anyone who could act as an interface between the faculty and the technologies, and it wanted to leverage the open source community. The easiest way to produce open source is to pay someone.

I’m a single person. The faculty are many. It is obvious that I can’t build fully customized projects for each faculty member. I don’t have time. So I’ve set out to build an infrastructure that allows us to reuse as much code as possible between projects, even to the point of running projects on the same application layer but with different data, different processing, and different presentations/skins.

This project is still in its infancy. I’m slowly building a project development infrastructure, working through the design, and planning the milestones that will result in software releases. Ultimately, we hope to have a sandbox for faculty to play in where they can link together various actions and widgets, building presentations from data without having to write code.

We’re leveraging as much open source as we can. We believe in using open protocols and building everything so it’s interoperable with other systems. We want people to be able to explore a collection of documents presented by our system and file relevant documents away in Zotero. We want faculty to be able to work easily with collections of source documents presented in Omeka. We want to build on whatever comes out of Project Bamboo.

The resulting system isn’t something we expect a faculty member to install on their spare machine and run. We’re building something that should be installed in a central location by a knowledgeable UNIX system administrator (but without the need for a DBA or other specialized employees) so everyone can use it.

At the moment, I can count five or six bibliography projects here at TAMU. Every one of them stores different data in different formats, and none of them can interoperate. If I want to tie together papers written about Shakespeare by science fiction authors from California, I have to write quite a bit of custom code to screen-scrape the World Shakespeare Bibliography (WSB) and correlate what I find there with what I can download in the database dump of the Internet Speculative Fiction Database. The alternative is to do all of that by hand: finding out who is an SF writer from CA and then searching for each of them in the WSB.
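To make the "custom code" concrete, here is a minimal sketch of the kind of one-off correlation involved. It assumes both sources have already been reduced (by scraping or from the dump) to simple Python records; the record shapes, the field names, and the sample entries are all hypothetical, and treating every Internet Speculative Fiction Database author as an SF writer and "from California" as a birthplace match are simplifying assumptions.

```python
# Hypothetical, simplified records. Neither the World Shakespeare
# Bibliography nor the ISFDB dump actually looks like this.
wsb_records = [
    {"title": "Prospero Among the Stars", "author": "Doe, Jane", "subject": "Shakespeare"},
    {"title": "Hamlet in Orbit", "author": "Roe, Richard", "subject": "Shakespeare"},
    {"title": "Notes on Milton", "author": "Doe, Jane", "subject": "Milton"},
]
isfdb_authors = [
    {"name": "Jane Doe", "birthplace": "Los Angeles, California, USA"},
    {"name": "Richard Roe", "birthplace": "Boston, Massachusetts, USA"},
]


def normalize(name):
    """Turn 'Last, First' or 'First Last' into a lowercase 'first last' key."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.lower().split())


def shakespeare_papers_by_ca_sf_authors(papers, authors):
    """Join the two sources on a normalized author name (a fragile key)."""
    ca_authors = {
        normalize(a["name"])
        for a in authors
        if "california" in a["birthplace"].lower()
    }
    return [
        p["title"]
        for p in papers
        if p["subject"] == "Shakespeare" and normalize(p["author"]) in ca_authors
    ]


print(shakespeare_papers_by_ca_sf_authors(wsb_records, isfdb_authors))
```

Note that everything hinges on the name-matching heuristic in `normalize`; two authors with the same name, or one author listed under different spellings, would silently break the join. That brittleness is the cost of sources that don't share identifiers.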

I want a standard RDF language for storing bibliography data (as an example of what can be stored). I want a standard way to reference source artifacts that support particular statements of fact in an RDF knowledge base. I want to be able to ask the computer a question and get back an answer with the references to back up that answer.
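As a rough illustration of "an answer with the references to back it up", the sketch below stores each statement as a subject/predicate/object triple plus the source artifact that supports it, then answers the Shakespeare question by pattern matching and collects the sources of every statement it used. It is plain Python, not an RDF library; in real RDF this provenance would more likely be carried by named graphs or reification. The `dcterms:` predicate names borrow Dublin Core term names for readability, while the `ex:` predicates, the `src:` identifiers, and the data are made up.

```python
from collections import namedtuple

# A statement of fact plus the (hypothetical) source artifact that supports it.
Statement = namedtuple("Statement", "subject predicate obj source")

kb = [
    Statement("work:1", "dcterms:title", "Prospero Among the Stars", "src:wsb-entry-1"),
    Statement("work:1", "dcterms:creator", "person:jdoe", "src:wsb-entry-1"),
    Statement("work:1", "dcterms:subject", "Shakespeare", "src:wsb-entry-1"),
    Statement("person:jdoe", "ex:genre", "science fiction", "src:isfdb-author-1"),
    Statement("person:jdoe", "ex:region", "California", "src:isfdb-author-1"),
]


def match(kb, subject=None, predicate=None, obj=None):
    """Return statements matching the pattern; None acts as a wildcard."""
    return [
        s for s in kb
        if (subject is None or s.subject == subject)
        and (predicate is None or s.predicate == predicate)
        and (obj is None or s.obj == obj)
    ]


def answer_with_references(kb, topic, genre, region):
    """Find works about `topic` by creators in `genre` from `region`,
    mapping each work to the sources of every statement used to find it."""
    results = {}
    for about in match(kb, predicate="dcterms:subject", obj=topic):
        work = about.subject
        for created in match(kb, subject=work, predicate="dcterms:creator"):
            person = created.obj
            in_genre = match(kb, subject=person, predicate="ex:genre", obj=genre)
            in_region = match(kb, subject=person, predicate="ex:region", obj=region)
            if in_genre and in_region:
                used = [about, created, in_genre[0], in_region[0]]
                results[work] = sorted({s.source for s in used})
    return results


print(answer_with_references(kb, "Shakespeare", "science fiction", "California"))
```

The point of keeping `source` on every statement is that the query doesn't have to know anything about provenance up front: whatever statements it touches, their supporting references come along with the answer.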

There are a lot of problems that we aren’t solving with this system, but I’m only one person, and there are a lot of faculty. This system can support a wide range of data-driven projects that allow for interactive exploration of the data. It’s extensible, and the openness allows others to build tools that we haven’t planned for.

This is what I’d like to talk about at THATCamp and perhaps get feedback (am I crazy?), pointers to useful resources (is there an “academic grade” bibliography RDF vocabulary already?), and some help.

5 Responses to “Building a F/OSS DH Infrastructure”

  1. asolove Says:

    This sounds very interesting. My first thought is to make sure you know about MIT’s SIMILE, as PiggyBank (simile.mit.edu/wiki/Piggy_Bank) has a number of scrapers that might help with getting data into such a system, and Exhibit (simile.mit.edu/exhibit/) has a couple of examples of displaying such data (and even hosts a live reftex->json converter for this purpose). I too would love to know what work is being done with bibliography, as many fields have very precise additional requirements.

  2. Ben Brumfield Says:

    I’d love to talk with you about a totally different aspect of your work: what your experience has been as a technical expert to humanities faculty. How much agency do you have? How effectively do you feel the individual faculty projects have been using your skills?

  3. patrickgmj Says:

    It hasn’t been officially released yet, but there is a Bibliography Ontology that’s designed to be very robust and extensible when needed. I’d probably start here: Bibliographic Ontology Specification.

    It sounds like you’d also need a vocabulary (or vocabularies) to link up the statements of fact to the references. I’m very curious about what kinds of facts you anticipate needing to work with, and what vocabularies you are using or are thinking about bringing in.

    Last, if you haven’t already seen it, I’d suggest checking out DBpedia. They’ve scraped Wikipedia info and put it into RDF. That might be something to think about incorporating, or at least to check out!
    BTW, generally speaking, Semantic Web/RDF stuff is exactly the kind of stuff I’d like to talk about at THATCamp, too!

  4. jgsmith Says:

    I have a fair amount of agency in my position. My position description leaves seventy percent of my time for me to manage as I see fit. I don’t report to any particular faculty member who has a DH project. I’m part of the college-level information technology team.

    Any DH projects to which I commit a significant amount of time must pass review by the college’s DH Program advisory committee. A faculty member can’t walk into my office and demand that I spend several hours programming for them.

    Part of my job description is to produce open source and provide an infrastructure that can support a large number of DH projects. We’re trying to avoid ad hoc development that only supports one project, so no one project can dictate what I need to provide. This also means that almost all DH projects that were off the ground before I started in November are not expecting programming support from me.

    If I see that I’m unable to do my job because of faculty demands, I have several ways of pushing back while giving faculty a way to have their request vetted by a peer-review process. I also have office hours during which anyone can drop by and talk about anything. It’s a bit of an experiment for us, but it seems to be going well so far.

    A lot of my support for individual projects has been as a consultant helping faculty understand the technology so they can write their grant applications, conference proposals, etc. We’re still in the bootstrapping stage for the position and the Program. We hope to have a few project proposals come through over the summer and early fall. These would be the first to have full support from my position.

    My use of a bibliography was purely for illustration, though it is relevant to what we’re doing. I am also interested in genealogy because that field has done a lot of work with relationships between people. Another example of a question I want to be able to ask someday is (assuming a database of O’Reilly authors), “who are the grandchildren of the author of _Programming Collective Intelligence_?” and perhaps, “where do they live now?” displayed on a Google Maps mash-up (perhaps to see if they prefer the coasts).
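The grandchildren question is essentially a two-step traversal of a parent-to-child relation, followed by a lookup of where each result lives. A minimal sketch, assuming the relations have already been pulled out of some knowledge base into dictionaries; every identifier and place below is a hypothetical placeholder and does not describe any real person.

```python
# Hypothetical "has child" relation; placeholders only.
children_of = {
    "person:author": ["person:child-a", "person:child-b"],
    "person:child-a": ["person:grandchild-1"],
    "person:child-b": ["person:grandchild-2", "person:grandchild-3"],
}

# Hypothetical current residences, the input a map mash-up would need.
lives_in = {
    "person:grandchild-1": "Seattle, WA",
    "person:grandchild-2": "Boston, MA",
}


def descendants_at_depth(person, depth, children_of):
    """Follow the child relation `depth` times; depth=2 gives grandchildren."""
    generation = [person]
    for _ in range(depth):
        generation = [
            child
            for parent in generation
            for child in children_of.get(parent, [])
        ]
    return generation


def where_do_they_live(people, lives_in):
    """Pair each person with a known residence; unknown ones map to None."""
    return {p: lives_in.get(p) for p in people}


grandkids = descendants_at_depth("person:author", 2, children_of)
print(grandkids)
print(where_do_they_live(grandkids, lives_in))
```

Keeping `depth` as a parameter means the same function answers "children" (1) or "great-grandchildren" (3); a residence of `None` marks someone who couldn't be placed on the map.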

    Simile Exhibit does look nice. I think I’ve seen it before, but it’s been a while. Definitely something I’ll be making available to faculty — it might provide the basis for a faceted browsing widget.
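The core of a faceted browsing widget is small: given the current filter selections, count how many of the remaining items have each value of a facet, so the user can see which choices narrow the collection. A minimal sketch, assuming items are flat records loosely modeled on Exhibit-style JSON items with a `label`; the `facet_counts` helper and the sample items are hypothetical and say nothing about how Exhibit itself is implemented.

```python
from collections import Counter

# Hypothetical items, each a flat record with a label.
items = [
    {"label": "Work A", "type": "Article", "year": 1999},
    {"label": "Work B", "type": "Book", "year": 1999},
    {"label": "Work C", "type": "Article", "year": 2004},
]


def facet_counts(items, facet, filters=None):
    """Count values of `facet` among items matching every filter exactly."""
    filters = filters or {}
    selected = [
        item for item in items
        if all(item.get(key) == value for key, value in filters.items())
    ]
    return Counter(item[facet] for item in selected if facet in item)


print(facet_counts(items, "type"))
print(facet_counts(items, "year", {"type": "Article"}))
```

Recomputing every facet against the current filters after each click is what makes the browsing "faceted": selecting `Article` immediately updates the counts shown for `year`.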

  5. Non-exhaustive list of topics discussed at the other THATCamps | ThatCamp Paris 2010 Says:

    […] Building a F/OSS DH Infrastructure (THATCamp 2008): project interoperability using RDF, particularly bibliographic projects (taken as examples), use of open norms/standards, open source within a university; […]