Introduction to Digital Humanities

RELI/ENGL 39, Fall 2015, University of the Pacific

Utilizing Voyant in the Digital Humanities

Voyant is an extremely useful and clever tool for the digital humanities… especially when you’re looking at vocabulary. However, when you’re looking at something other than vocabulary and word frequencies within the content, it’s pretty much useless.

When you first enter Voyant, there’s a colorful word cloud that visually depicts how often each word appears within the dataset, and you can remove the more common words like “the” or “and” by going to the “Stopwords” options. Then the more interesting words, the ones that actually show the point of the content, appear.

[Screenshot: word cloud after removing stopwords]

Clearly, these words are much more interesting than “the” and “and”. Not to say those words are unimportant or anything, but, well… you get the point. Looking at the word cloud as a whole, the content of the dataset seems really interesting. I mean, look at all those cool words: “death,” “tortures,” “shall,” “martyrdom,” et cetera, et cetera. This suggests that the content is a lot more complex than “and” or “the” would imply.
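For anyone curious what a word cloud is counting underneath the pretty colors, here is a minimal sketch in Python. It is not Voyant's actual code: the stopword list, the `tokenize` and `top_words` helpers, and the sample sentence are all made up for illustration, and Voyant's real stopword list is much longer.

```python
import re
from collections import Counter

# A tiny, illustrative stopword list (an assumption, not Voyant's list).
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "was", "he"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words that are not stopwords."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    return counts.most_common(n)

# Made-up sample sentence, not taken from the course dataset.
sample = "The martyr chose death, and the death of the martyr was a martyrdom."
print(top_words(sample, n=2))
```

A word cloud then just draws the words from a list like this one, with the most frequent words drawn largest.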

Moving on from the word cloud (difficult, right? There are so many pretty colors), you can see that there are a lot more tools for examining the vocabulary of the dataset. The summary shows how many documents are in the dataset, which of those documents are the longest and shortest, which have the highest vocabulary density, and how frequent the words are. The corpus reader, just to the right of the word cloud and summary, shows the content of the dataset in its natural form and highlights any words you select.
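Vocabulary density is usually understood as the number of distinct words divided by the total number of words. Assuming that is roughly what the summary reports, a small Python sketch of the calculation might look like this (the `vocabulary_density` function and the sample phrase are hypothetical):

```python
import re

def vocabulary_density(text):
    """Ratio of distinct word forms to total words (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Made-up sample: 5 words in total, 3 of them distinct.
print(vocabulary_density("men and beasts and men"))
```

A document that keeps repeating the same few words scores low, and one that rarely repeats itself scores close to 1.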

Now, possibly the neatest thing about Voyant is the “Words in the Entire Corpus” tool, as it shows you the most frequent words (which can also be filtered with the “Stopwords” option) and lets you compare the frequencies of particular words throughout the dataset.

Here, I compared the words “men” and “beasts”, just because they seem pretty opposite in definition, and it’d be neat to see how many times they’re used in the same document. What I found was that there was always a notable difference between the two word frequencies in each document (except in documents 10, scilitan, and 12, readme, in which neither word appears at all). Where “men” was used multiple times within a document, “beasts” would appear quite infrequently, if at all, and where “beasts” was used generously, “men” would seldom appear.
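The same kind of comparison can be sketched outside Voyant. The snippet below is only an illustration: the document names and texts are invented stand-ins rather than the actual corpus, `term_counts` is a hypothetical helper, and it reports raw counts rather than whatever scaling Voyant applies.

```python
import re
from collections import Counter

def term_counts(documents, terms):
    """Map each document name to {term: raw count} for the given terms."""
    table = {}
    for name, text in documents.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        table[name] = {term: counts[term] for term in terms}
    return table

# Invented stand-in documents, not the course dataset.
docs = {
    "doc_a": "The men spoke, and the men were led away.",
    "doc_b": "He was thrown to the beasts; the beasts did not touch him.",
    "doc_c": "A short note with neither word.",
}

for name, row in term_counts(docs, ["men", "beasts"]).items():
    print(name, row)
```

Reading the rows side by side shows the same pattern described above: a document can lean heavily on one word while the other barely appears.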

Interesting, right? It kind of makes you wonder what these words were being used for. And that’s exactly the problem with Voyant.

It’s undeniable that Voyant has its uses, but it doesn’t quite have a knack for showing the context in which a word appears unless you search through the whole “Corpus Reader” tool to find it. You just don’t know if, in the documents provided, men are being called beasts instead of men, or if the texts really are referring to men. Sure, the Corpus Reader can help with that, but it can be pretty tedious to search through the whole thing for two words that repeat over one hundred times just to see how they are used in context. So it really does seem like you would have to read the texts themselves to see how a word is used, instead of clicking on the nice, pretty words provided in the “Cirrus” tool to find out just what the texts are about.
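One common way around this kind of tedium is a keyword-in-context (KWIC) listing, which prints every occurrence of a word with a few words on either side. Here is a minimal sketch in Python; the `kwic` function, its `width` parameter, and the sample sentence are all assumptions made up for illustration.

```python
import re

def kwic(text, keyword, width=4):
    """Return each occurrence of keyword with up to `width` words per side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Bracket the keyword so it stands out in the listing.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines

# Made-up sample sentence, not taken from the course dataset.
sample = "They were thrown to the beasts, but the beasts would not harm the men."
for line in kwic(sample, "beasts", width=3):
    print(line)
```

With a listing like this, checking whether “beasts” means animals or is being used as an insult for men becomes a matter of skimming short lines instead of rereading whole documents.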

4 Comments

  1. I definitely agree that Voyant doesn’t give you much context as far as the terms it shows you go. I like your example of “beasts” vs. “men” a lot – my assumption would be that when “beasts” appears a lot and “men” doesn’t, and vice versa, then the words are substituting for each other, but with Voyant’s capabilities it’s hard to tell. So you hit the nail on the head there. It’s a useful tool, but not practical for all applications of reading and exploring documents.

  2. I found it interesting how you compared “men” to “beasts” using the Word Trends tool, and at the same time, I agree that Voyant is not your go-to tool for examining content closely. In that sense, I think the Corpus Reader tool that you referred to could definitely use some improvement. I, personally, would prefer it to clearly separate the texts and condense the amount of text shown around the specific word you’re searching for.

  3. I liked that you mentioned that it is “pretty much useless” when it comes to analyzing documents for things other than vocabulary. It does make me wonder if there are other applications or tools that analyze other things in a text, like maybe figures of speech.

  4. I enjoyed how you broke down and explained some of the parts of Voyant and let the reader know which ones interested you the most. I also like how you acknowledged that Voyant does have its issues and proceeded to point out the ones that you saw or could foresee.
