Monday, February 3, 2014

TEI in the wild: EpiDoc

First of all, I would like to say that it wasn't easy to find a project that a) interested me and b) I could understand. For example, I found something on the TEI Wikipedia page that sounded really intriguing: a 4.5 million word corpus of Esperanto! But when I actually went to the page, it was all... well, in Esperanto. D'oh. So much for that one.

I did find something neat, though. EpiDoc is an "international, collaborative effort that provides guidelines and tools for encoding scholarly and educational editions of ancient documents". In other words, it establishes guidelines, and provides tools, for representing ancient documents online. It builds on the TEI standard, focusing on the "history and materiality" of the text. What I found interesting was that, just as we discussed in class, the project avoids encoding the appearance of the texts (the shapes of the letters and other markings), instead recording what the texts say and how they relate to one another, as judged by the "human editor".
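To make that distinction concrete, here is a tiny contrast of my own (an illustration I made up, not an excerpt from the EpiDoc Guidelines). The first line records only how a word looks on the stone; the second records the editor's judgment about what it is:

    <!-- Appearance only: the word is carved in larger letters -->
    <hi rend="larger">CAESAR</hi>

    <!-- EpiDoc-style semantic markup: the editor identifies a person -->
    <persName ref="#caesar">CAESAR</persName>

It's the second kind of markup that makes it possible to search and link across texts, rather than just reproduce how they look.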

I found the website to be quite extensive. It explains that XML is very useful for encoding ancient texts because, for example, you can tag missing pieces as such and then search only the text that is not tagged as "missing". They also explain why they use TEI: it enables scholars to collaborate and share, as well as create their own specialized markup languages for very specific uses.
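Here is a rough sketch of what that looks like (my own toy example with made-up text, using standard TEI elements, not code copied from the EpiDoc site). The <gap/> element marks text that is physically missing, <supplied> marks text the editor has restored, and a query can then filter on those tags:

    <!-- One line of a hypothetical inscription -->
    <ab>
      <supplied reason="lost">Imp</supplied>eratori
      <gap reason="lost" quantity="6" unit="character"/>
      Augusto
    </ab>

    <!-- An XPath query such as //ab//text()[not(ancestor::supplied)]
         returns only the characters that actually survive on the stone -->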

The authors of the project have published articles about encoding ancient texts, including one that specifically discusses EpiDoc. They also maintain an extensive bibliography of scholarly articles on digital epigraphy and text encoding, as well as examples of websites that use EpiDoc and similarly themed websites that don't. The EpiDoc Community makes its code and tools available to everyone - no registration or anything required. I was able to look at the encoding of an ancient Roman marble fragment: only a few letters remain that can be fully read; the rest are broken off or simply missing, and the markup accounts for that by tagging "gaps" and even encoding the reason they are there: "lost" or "damage". Pretty cool, I think.
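For anyone curious, that gap markup works roughly like this. This is my own simplified sketch in the spirit of that fragment, not the published file; the reason values "lost" and "damage" are the ones I saw in the encoding, and <unclear> is the standard TEI tag for letters that are only partly legible:

    <ab>
      <!-- text broken off the edge of the stone; length unknown -->
      <gap reason="lost" extent="unknown" unit="character"/>
      <unclear>VS</unclear> POMPEIVS
      <!-- roughly three letters obliterated by surface damage -->
      <gap reason="damage" quantity="3" unit="character"/>
      F
      <!-- one or more whole lines missing below -->
      <gap reason="lost" extent="unknown" unit="line"/>
    </ab>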

2 comments:

  1. I agree, that is pretty cool. I got lost in the EpiDoc website for a few minutes there. Very interesting stuff. Thanks for posting that. It reminded me of what Alan was saying in class this morning about coding text you don't understand or can't read; he gave the example of the Voynich manuscript (unless I'm confusing my potential hoaxes).

    I thought you might find this article interesting: http://www.jhu.edu/~jhumag/0903web/code.html

    It's an article about a scholar from Johns Hopkins who is translating the Enuma Elish into English and then taking it a step further by coding it in C++. I would say check out the project if you're interested in ancient history, Near Eastern studies, or religious studies (as well as coding, obviously). I think it's very interesting how texts from thousands of years ago - texts that have, at least in part, survived in their original form - are being used in new ways, digitized, and researched from new angles.

  2. Thanks, Kevin! This is a fascinating read, for sure. I really liked that the project is also being taken a step further and printed on a 3D printer, so that "lay-folk" like myself could actually hold a replica of a tablet. Again, my brain wanders off to educational uses - in conjunction with books and websites, what if students had access to replicas of the primary sources! Breathtaking, I think.
