Beyond citation and search

Wednesday, May 28th, 2008 | Douglas Knox

My original THATCamp proposal mentioned some playing around I have done under the inspiration of Bill Turkel’s mapping of libraries that hold copies of William Cronon’s Nature’s Metropolis.

I adapted Bill’s Python scripts to screen-scrape WorldCat holdings records for a number of thematically related publications, and then wrote a script to create some PDF maps using the GMT software (Generic Mapping Tools) rather than Google Maps. I now have a couple hundred publications in a MySQL database and have set up a small project using the Django development framework to browse the information, although I haven’t incorporated mapping into the framework. This is a personal hobby project at the moment, and I have not yet had as much time as I hoped to develop it further.
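
The scripts themselves aren’t reproduced in this post, but the step from scraped holdings to a GMT map can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code described above: it assumes the scrape has already produced hypothetical (title, library, longitude, latitude) tuples, collapses them into one row per location with a count of holdings, and writes whitespace-separated text with longitude and latitude in the first two columns, which is the plain-text table layout GMT’s plotting commands (psxy in the releases of this period) read.

```python
from collections import Counter


def holdings_to_gmt_rows(holdings):
    """Collapse holdings into one 'lon lat count' row per distinct location.

    `holdings` is an iterable of (title, library, lon, lat) tuples. This shape
    is a hypothetical stand-in for whatever a Worldcat scrape actually returns.
    """
    counts = Counter((lon, lat) for _title, _library, lon, lat in holdings)
    # Longitude first, then latitude: GMT's default x/y order for map data.
    return [f"{lon:.4f} {lat:.4f} {n}" for (lon, lat), n in sorted(counts.items())]


def write_gmt_table(holdings, path):
    """Write the rows to a plain-text file for a GMT plotting command to read."""
    with open(path, "w", encoding="utf-8") as out:
        for row in holdings_to_gmt_rows(holdings):
            out.write(row + "\n")


if __name__ == "__main__":
    # Illustrative placeholder records, not real holdings data.
    sample = [
        ("Book X", "Library A", -87.6000, 41.8800),
        ("Book Y", "Library A", -87.6000, 41.8800),
        ("Book X", "Library B", -93.2600, 44.9800),
    ]
    for row in holdings_to_gmt_rows(sample):
        print(row)
```

Counting per location rather than per record keeps the table small and gives a plotting command a value it could use for symbol size or color, if its options are set that way.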

Although I could demo GMT and Django at a basic level (I’m no expert) if there is any interest in them, the larger issues I’m interested in discussing concern aggregation and visualization to support inference. Some clear connections here are Laura Mandell’s Archive Aggregator, Jeanne Kramer-Smyth’s Visualizing Aggregated Data, Tom Scheinfeldt’s Challenges to Historical Visualization, and Karin Dalziel’s Search and Digital Projects.

It’s exciting to see how much is going on with visualization and with data linking, and how swiftly the barriers to entry are dropping. We need experiments of many kinds, including free-form play, to figure out what the tools are good for. Without limiting the experiments, I’m interested in thinking about how it is that a visualization or data pattern can come to mean something and support some kind of inference. That’s different from search.

Within the area of bibliographic data of various kinds, what starts out as “metadata” (created with a particular context of search and discovery, description and access in mind) can take on an additional, provisional role as primary data. When I search a library catalog or build a timeline visualization in Zotero, the result can be a means of discovering particular items, and it may not need to be anything more than that. But it can also be a direct provocation to interpretation, if I see a pattern and if there is some historical hypothesis that could explain that pattern. How can we think more clearly about when such inferences are warranted, and what information researchers might need to evaluate them better? And what would it take to develop the standards and tools that would better enable this kind of exploitation of existing data? I would like our library catalog searches not to return twenty results at a time with a “next” button, but to offer the full set of results in a single standard form that can be downloaded or piped directly to whatever other tools we might have for further processing.
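
As a rough illustration of “the full set of results in a single standard form,” here is a minimal sketch that assumes CSV as the standard form and a hypothetical set of record fields (title, author, year, publisher). No catalog is known to expose this function; it only shows the idea of serializing every record in one pass, with a header row, so the output can be saved or piped to another program instead of paged through twenty at a time.

```python
import csv
import io
import sys

# Hypothetical record fields; a real catalog would define its own.
FIELDS = ["title", "author", "year", "publisher"]


def result_set_to_csv(records):
    """Serialize an entire result set (not one page of it) as CSV text.

    Missing fields become empty cells; fields not listed in FIELDS are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative placeholder records, not real catalog data.
    results = [
        {"title": "Example Title One", "author": "Author, A.", "year": 1991, "publisher": "Press P"},
        {"title": "Example Title Two", "author": "Author, B.", "year": 1995},
    ]
    # Writing to standard output lets the whole set be piped to another tool.
    sys.stdout.write(result_set_to_csv(results))
```

CSV is only one candidate format; the point of the sketch is that the entire result set comes out at once, in a shape that spreadsheets, scripts, or a timeline tool could read without any paging.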

In addition to this kind of discussion, I’m very much interested in digital civic engagement, sustainability and project management, the demos and tools discussions, and the RDF-related presentations. I posted a comment on Tom’s “event standards” post that I think bridges the point made above and my novice curiosity about RDF.

There is such a range of interesting stuff here! Enough to go all week and not run out. I’m looking forward to meeting you all.

3 Responses to “Beyond citation and search”

  1. Adam Solove Says:

    Douglas, I would love to add GMT and Django to the Dork Shorts (thatcamp.org/2008/05/dork-shorts/) session. These are two things I’m very interested in but know almost nothing about.

  2. Karin Dalziel Says:

    I too would love to hear more about Generic Mapping Tools.

    I have a slideshow of some innovative search/browsing/mapping/time interfaces I have collected (a tiny, tiny subset of what’s out there), and I would be willing to blaze through them, Pecha Kucha style, if anyone is interested.

  3. Karin Dalziel Says:

    Oh, and I *love* your idea of returning results from the library catalog as a downloadable subset of data.