TEI and XML for Humanists

My original blog post is here.

How and why do humanists use programming? XML (Extensible Markup Language) is an accessible but robust syntax for making texts machine-readable, and the Text Encoding Initiative (TEI) has developed standards for describing texts through XML. These markup syntaxes, in conjunction with XSLT (Extensible Stylesheet Language Transformations), allow the humanist to make primary source materials accessible to new kinds of scholars who use technology to see connections in large datasets. Examples of scholarship using these technologies include the Swinburne Archives, the London Lives project, and the Electronic Enlightenment project; all three are built on encoding platforms that originate in XML to provide relational data among interconnected documents, often in visual form. This presentation, a report from the 2013 DHOxSS (Digital Humanities at Oxford Summer School), will contain a hands-on component to introduce faculty to the concept of TEI/XML for projects in the humanities. Resources will be available at cerosia.org (search for “innovations” or “DHOxSS”).

Today, we’ll learn a little about what XML and TEI are good for, look at some examples, and then I’ll ask you to try your hand at basic structural markup on a piece of literature. Keep in mind that you can create your own schemas and doctypes, so the sky really is the limit. Of course, if your markup is not standardized, your work may not be legible to others. But since we’re only going to be peeping into the abyss, we won’t worry too much about whether your documents are valid or draw accurately on a specific standard. Feel free to make up some tags!
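To give a feel for what “basic structural markup” looks like and why it makes a text machine-readable, here is a minimal sketch in Python. It assumes a simplified, TEI-like fragment: `lg` (line group) and `l` (line) are real TEI element names, but the wrapper is stripped down, the TEI namespace and header are left out, and the helper name `lines_by_stanza` is made up for illustration. The stanza is the opening of Blake’s “The Tyger.”

```python
import xml.etree.ElementTree as ET

# A simplified, TEI-like fragment (assumption: no TEI namespace or header,
# so this would not validate against the full TEI schema).
SAMPLE = """<text>
  <body>
    <lg type="stanza" n="1">
      <l n="1">Tyger Tyger, burning bright,</l>
      <l n="2">In the forests of the night;</l>
      <l n="3">What immortal hand or eye,</l>
      <l n="4">Could frame thy fearful symmetry?</l>
    </lg>
  </body>
</text>"""


def lines_by_stanza(xml_text):
    """Return a dict mapping each stanza's n attribute to its list of lines."""
    root = ET.fromstring(xml_text)
    stanzas = {}
    # Every <lg> is a line group; each <l> child is one verse line.
    for group in root.iter("lg"):
        stanzas[group.get("n")] = [line.text for line in group.findall("l")]
    return stanzas


if __name__ == "__main__":
    for number, lines in lines_by_stanza(SAMPLE).items():
        print(f"Stanza {number} has {len(lines)} lines")
        for line in lines:
            print("  " + line)
```

Because the structure is explicit, a program can answer questions such as “how many lines are in each stanza?” without guessing from line breaks. The same holds for tags you invent yourself, as long as you use them consistently.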


A Very Gentle Introduction to TEI

Getting Started Using TEI (Oxford)

TEI Structures (Oxford)

TEI by Example

TEI Handout, Poetry Edition (UVa)

Samples that I’m working on (The Tatler No.238; Mary Hays)

Sample highly marked-up Swinburne poem, “On the Cliffs” (should open in browser; if not, right-click, save as, and open with WordPad, Notepad, or the equivalent)

Sample visualization created using the encoded “On the Cliffs”, by John Walsh

Presentation on “‘Quivering web of living thought’: Conceptual Networks in Swinburne’s Songs of the Springtides” (see slides 16-20 for thematic encoding)

TEI Template (UVa) (copy the template code into a new document in Notepad, WordPad, or an equivalent program; we’ll use this to play!)

Visualizing Literary Texts

My original blog post is available here.

Ever wonder how web-based tools and text-based analysis intersect? Come find out how to analyze literature (and text more broadly understood) using a variety of online tools that have minimal learning curves. I will introduce you to a manageable number of such tools (Voyant, the Mandala browser, ManyEyes, and others), and then we will experiment with them as a group and independently. This demonstration and workshop will be useful for pedagogical and scholarly purposes.

I encourage you to bring a sample text or corpus you want to work with during the session; it should be a .TXT file, a .ZIP collection of texts, an .XML file, or, in some cases, a URL that points to a text you want to work with. I will also bring a selection of texts for us to draw on. Online materials will be located at http://cerosia.org (search for keyword “innovations”). This session derives from material covered in the 2012 University of Victoria Digital Humanities Summer Institute.

Coursepack: Online tools for literary analysis

Plaintext and ZIPped corpora

Links to electronic text collections:

Bamboo DiRT (Digital Research Tools)

Bamboo DiRT is a tool, service, and collection registry of digital research tools for scholarly use. Developed by Project Bamboo, Bamboo DiRT is an evolution of Lisa Spiro’s DiRT wiki and makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

TAPoR (Text Analysis Portal) [TAPoR2 Test Environment: try this link if the first doesn’t work]

TAPoR is a gateway to the tools used in sophisticated text analysis and retrieval. Browse tools by type or tag, search and use tools, read and create tool reviews, contribute and advertise your own tools.

Voyant Tools

Voyant is a web-based text analysis environment. It is designed to be user-friendly, flexible, and powerful. Voyant is part of Hermeneuti.ca, a collaborative project to develop and theorize text analysis tools and text analysis rhetoric. This section of the Hermeneuti.ca web site provides information and documentation for users and developers of Voyeur. Note: The original name of the environment was “Voyeur,” which was recently changed given the connotations of the word. You might see these names used interchangeably. You can also get to Voyant Tools via TAPoR.

IBM ManyEyes

View, discuss, and create data sets and visualizations of data sets using a variety of filters including pie charts, scatterplots, bubble charts, treemaps, word clouds, phrase nets, and more.

Google Ngram Viewer

Read more about the Google Ngram Viewer here. See some sample uses of the Ngram Viewer here. Try it yourself!

Zotero timelines

Make a timeline from your Zotero collections to visualize your research.

Juxta (Collation Software for Scholars): http://juxtacommons.org

Sample visualizations:

Relative word frequencies of five gothic novels (Voyant)
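If you are curious what a “relative word frequency” actually measures, here is a minimal sketch in Python. It is not Voyant’s own code or method, just the basic arithmetic under simple assumptions: the text is lowercased, words are runs of letters and apostrophes, and a word’s relative frequency is its count divided by the total number of words. The function name `relative_frequencies` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def relative_frequencies(text, words):
    """Return each requested word's share of all word tokens in the text."""
    # Simple tokenizer (an assumption): lowercase, letters and apostrophes only.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Guard against empty input so we never divide by zero.
    return {w: (counts[w.lower()] / total if total else 0.0) for w in words}


if __name__ == "__main__":
    sample = "The castle was dark. The night was darker."
    for word, share in relative_frequencies(sample, ["the", "castle", "ghost"]).items():
        print(f"{word}: {share:.3f}")
```

Dividing by the total word count is what lets you compare texts of different lengths, such as a short novella and a three-volume novel, on the same chart.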

