Update: More technical information can be found on Jeff's blog announcing the project.
Decoder Ring is a web-based, collaborative language analysis tool designed for academic research of textual content. It features:
- Abstracted, flexible, powerful data model
- Sustainable, low cost, open source framework
- Project- and group-based to facilitate collaboration
- Tools for gathering, importing, browsing, and exporting large data sets
- Automated and extensible reporting tools
If you'd like to learn more, including how to gain access or get a demo, please contact Jeff Beeman.


Background
Ever since I was introduced to the concept of situated meaning in James Paul Gee's "What Video Games Have to Teach Us About Learning and Literacy," I've been fascinated by the idea of language as a malleable, contextual, wildly variable human construction. In particular, I'm drawn to the idea of "specialist" language - those words, phrases, and ways of speaking whose usage waxes and wanes as we transition between various professional domains. The term "professional" isn't quite right here, though - it's really any domain in which a level of expertise, or specialization, is valued. Academic language is a commonly discussed form of situated language.
This interest in situated meaning has led me to pay a lot of attention to its prevalence in gaming culture. In particular, MMOs have a strong tendency to foster their own language and way of speaking. While playing games like World of Warcraft, Lord of the Rings Online, EVE Online, and others, I've wished for a tool that could help me capture and analyze the vast amounts of data streaming through virtual worlds in the form of chat messages, and outside virtual worlds in the form of message board posts, wiki pages, blogs, and other resources.
The research question and objective
As a programmer, I tend to see most problems I'm presented with as solvable, at least in part, by technology and, more specifically, code. While I'm not intimately familiar with the intricacies of how various researchers have collected and analyzed similar datasets in the past, I've met with and observed enough of them at work to understand that most use a fairly manual process. Furthermore, from my experience as a "techie" for a large university, I also get the feeling many researchers spend (make that "waste") a lot of time inputting, visualizing, and generally moving around (and around? and around... and around) large amounts of data.
I believed that a computer-based tool could be created to help facilitate this process at several levels. A strong toolset suited to the purpose of performing language analysis would, ideally:
- Get out of the researchers' way and allow them to focus on the data and what it means;
- Provide mechanisms to parse, import, search, catalog, and navigate large, normalized data sets;
- Facilitate collaboration among researchers, technical consultants, and their peers;
- Provide tools for gathering demographic information about subjects whose texts are being researched;
- Include at least rudimentary tools for automated discovery and reporting of basic language features and usage patterns;
- Have several "layers" of depth to the data (taxonomy-, time-, and demographic-based).
Decoder Ring lowers technical and situational barriers to performing this type of research.
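To make the idea of "layers" and rudimentary automated reporting a bit more concrete, here is a minimal sketch in Python. It is not Decoder Ring's actual data model or code: the Message record, its field names, the term_frequencies helper, and the sample chat lines are all made up for illustration. It only shows how a text tagged with taxonomy, time, and demographic information could be filtered and counted.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Message:
    """One captured text. Hypothetical record, not Decoder Ring's schema."""
    text: str
    timestamp: datetime                                # time layer
    tags: set = field(default_factory=set)             # taxonomy layer
    demographics: dict = field(default_factory=dict)   # demographic layer


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequencies(messages, tag=None, **demographics):
    """Count word usage, optionally restricted to one taxonomy tag
    and/or to authors matching the given demographic values."""
    counts = Counter()
    for message in messages:
        if tag is not None and tag not in message.tags:
            continue
        if any(message.demographics.get(key) != value
               for key, value in demographics.items()):
            continue
        counts.update(tokenize(message.text))
    return counts


# Invented sample data, standing in for imported chat logs and forum posts.
messages = [
    Message("LFG need tank and healer", datetime(2010, 1, 5, 20, 15),
            {"guild-chat"}, {"experience": "veteran"}),
    Message("what does LFG mean?", datetime(2010, 1, 5, 20, 16),
            {"guild-chat"}, {"experience": "novice"}),
    Message("Tank pulled aggro again", datetime(2010, 1, 6, 21, 0),
            {"forum"}, {"experience": "veteran"}),
]

veteran_terms = term_frequencies(messages, experience="veteran")
guild_chat_terms = term_frequencies(messages, tag="guild-chat")
print(veteran_terms.most_common(3))
```

Comparing counts like veteran_terms and guild_chat_terms is about the simplest possible report on specialist vocabulary; a real tool would need far more (normalization, storage, time-window filtering on the timestamp field), which this sketch leaves out.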