Thursday, February 6, 2014

The ARTFL Project

Thought I'd share the Project for American and French Research on the Treasury of the French Language (ARTFL), run by the University of Chicago. Simply put, the Trésor de la Langue Française and its 150 million dictionary words have been converted to a TEI-conformant encoding scheme. The team recently finished "another round of text and metadata corrections based on user submissions from [their] 'Report Error' interface" (artfl-project.uchicago.edu).

The project is important not just to lexicographers but to many other humanists and social scientists as well. The idea behind the project, indeed its mandate, was to create a database as versatile as possible that would be easily accessible to the research community. The collection has considerable historical and linguistic importance, and there is much scholarly work still to be done on it. It is perhaps the nearest thing in existence to a definitive collection of French words, drawing on poetry, literature, the sciences, mathematics, and more.

But unlike the digitization project in my last post, this project is not accessible to the public without a "very modest" $500 subscription. The website boasts that 363 research institutions and universities currently subscribe to ARTFL, including the University of Toronto (which pays $250 annually for its subscription). To me, that number seems low when you consider how many higher education institutions there are worldwide (Times Higher Education ranks the top 500 schools in the world alone), not to mention the research institutions, museums, and similar organizations that could benefit from access to this collection. Members of the public at large are at an even greater financial disadvantage.

Though their restrictions on access disappoint me, they do provide some examples of how they encode their text data. They've adopted a multilevel hierarchy for their main textual objects (i.e. documents). For the purposes of this blog, it may spark ideas about how a major project uses division tags, paragraph tags, sentence tags, and header tags. There are quite a few code examples to use as a basis for digital humanities TEI coding projects, especially dictionary projects. Unfortunately, the full schema is restricted to internal use only. But they do state that they have "decided that [their] internal data representation should leverage existing standards" rather than creating new ones (ARTFL Project, University of Chicago).
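Since ARTFL's own schema isn't public, here is a rough sketch of what a multilevel hierarchy like the one described above can look like in generic TEI P5, built with Python's standard `xml.etree.ElementTree` module. This is an assumption-laden illustration, not ARTFL's actual encoding: the `build_document` helper and its input shape (sections made of paragraphs made of sentences) are hypothetical, and only the standard TEI element names (`teiHeader`, `div`, `head`, `p`, `s`) come from the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; registered as the default so output reads <TEI ...>.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Qualify a tag name with the TEI namespace (Clark notation)."""
    return f"{{{TEI_NS}}}{tag}"


def build_document(title, sections):
    """Hypothetical helper: build a minimal TEI tree.

    `sections` is a list of (heading, paragraphs) pairs, where each
    paragraph is a list of sentence strings. This input shape is an
    assumption for illustration, not ARTFL's data model.
    """
    root = ET.Element(tei("TEI"))

    # Header level: metadata about the document itself.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = "Unpublished sketch."
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = "Born-digital example."

    # Text level: body > div > p > s, one element per level of the hierarchy.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for n, (heading, paragraphs) in enumerate(sections, start=1):
        div = ET.SubElement(body, tei("div"), {"type": "section", "n": str(n)})
        ET.SubElement(div, tei("head")).text = heading
        for sentences in paragraphs:
            p = ET.SubElement(div, tei("p"))
            for sentence in sentences:
                ET.SubElement(p, tei("s")).text = sentence
    return root


if __name__ == "__main__":
    doc = build_document(
        "Sample entry",
        [("Définition", [["Première phrase.", "Deuxième phrase."], ["Troisième phrase."]])],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping each level (division, paragraph, sentence) as its own element is what makes queries like "every sentence in section 2" straightforward, e.g. with `doc.findall(".//tei:div[@n='2']//tei:s", {"tei": TEI_NS})`.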

This project is constantly being edited and expanded, and it will be interesting to keep an eye on how it grows over the years. Perhaps one day access will be unrestricted.
