RELI/ENGL 39, Fall 2015, University of the Pacific

Experimenting with Voyant – A Study in Word Counts and Pretty Colors

Upon dropping the “test corpus” file into the Voyant system, I found myself embarking on a journey into the inner workings of vocabulary quite unlike any I have taken before. For one, Voyant makes words much more colorful than the text in a book might, so that was understandably exciting. My fondness for aesthetics aside, Voyant proved to be quite the useful tool for unraveling word usage and trends in documents, especially ones I myself am unfamiliar with.

[Image: word cloud]

The first thing I was drawn to was the word cloud – or more specifically, the word “said” in that jumble of colorful letters. (This was after filtering the cloud, of course, because let’s be honest, nobody wants to count how many times the word “and” is used in a document. That would be as tedious as counting how many times the average teenager says “like” in one conversation.) Looking at “said”, obviously we can tell there’s a lot of talking in these documents. 312 instances of it, to be exact. But that doesn’t tell us much otherwise, unless we compare “said” with some of the other words that crop up frequently.
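
For the curious, the kind of counting behind a filtered word cloud can be sketched in a few lines of Python. This is only a rough illustration, not how Voyant actually works under the hood: the stopword list, the `word_counts` function, and the sample sentence are all made up for the example, and Voyant's own stopword list is surely much longer.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list (an assumption for this sketch).
STOPWORDS = {"and", "the", "a", "of", "to", "in", "it", "is", "was"}

def word_counts(text, stopwords=STOPWORDS):
    """Count lowercase words in text, skipping anything in stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# A made-up sentence, not taken from the "test corpus" documents.
sample = "The judge said yes, and the prisoner said no, and the crowd said nothing."
counts = word_counts(sample)

# The most frequent word left after filtering, with its count.
print(counts.most_common(1))
```

With the filler words out of the way, “said” floats to the top of this toy example, much as it did in the real word cloud.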

This brings me to the other batch of lovely colors Voyant has to offer – the graphs of word trends. Upon clicking on “said” in the word cloud, Voyant generated a chart for me that detailed how often “said” appeared in each of the “test corpus” documents. It cropped up the most in the document titled “scillitan”, and the second most in “justin-et-al.” This doesn’t tell me much as someone unfamiliar with the context of these texts, so in order to enlighten myself, I compared “said” with another word, “god.”

[Image: god vs said chart]

When I politely asked Voyant to show me the frequencies of both words, it generated the graph at the left. Though the trends in “said” and “god” don’t entirely match up, “god” seems to peak in both “scillitan” and “justin-et-al,” matching the peaks of the word “said.” Clearly this, along with the word cloud, points toward the documents being of religious origin. I think it’s also safe to infer that the texts where “said” and “god” appear together most often involve some kind of religious speeches. Upon examining some other common words in the corpus list, such as “death,” “tortures,” and most obviously “martyrdom,” I can conclude that the people most likely giving those speeches were martyrs, perhaps at the ends of their lives, perhaps trying to inspire the people of the faith they were dying for.
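
A comparison like this one can also be approximated by hand. The Python sketch below counts two words in each document of a small corpus and reports where each word peaks. It rests on a few assumptions: the document names and texts are invented stand-ins, the helper names `count_word` and `compare_words` are hypothetical, and it uses raw counts, which may not be the measure Voyant plots in its trends chart.

```python
import re

def count_word(text, word):
    """Count how many times word appears in text, ignoring case."""
    return re.findall(r"[a-z']+", text.lower()).count(word)

def compare_words(corpus, words):
    """Map each document name to the raw count of each word of interest."""
    return {
        name: {w: count_word(text, w) for w in words}
        for name, text in corpus.items()
    }

# Invented stand-in documents, not the real "test corpus" texts.
corpus = {
    "doc_a": "God said this. He said that.",
    "doc_b": "They walked home quietly.",
}

trends = compare_words(corpus, ["said", "god"])

# For each word, find the document where it appears most often.
peaks = {w: max(trends, key=lambda name: trends[name][w]) for w in ["said", "god"]}
print(trends)
print(peaks)
```

When both words peak in the same document, as they do in this toy corpus, that is the same pattern the chart showed for “scillitan” and “justin-et-al.” It still says nothing about who is speaking or why.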

Overall, my venture into Voyant’s database gave me a bit of context for the “test corpus” documents where I had none previously, and it granted me a peek at how certain words and vocabulary come into play within the texts, giving them their due emphasis. However, beyond that, it didn’t really teach me how those words were used in their specific contexts, which could be problematic if I were researching these documents. I would still have to read the documents themselves to understand how those words came into play, who said them and for what purpose, and all that fun jazz. So essentially, Voyant is useful for getting a basic overview of a document, and deciding whether that document would be beneficial to read, but apart from that, it doesn’t give much to go off of as far as content. Regardless, I can’t say it’s not fun to play with.

2 Comments

  1. m_tran41

    I totally agree that Voyant does not give us all the info that we need, but more of a general idea of a document and whether it would be helpful. I also agree that it is fun to use, and the pretty colors help too.

  2. k_elliott3

    I definitely agree that Voyant is pretty much useless with regard to the context of the words. There’s only so much you can guess about a story when you have a few words lined up together, so yeah, it’s probably not the best tool for research when it comes to the details of the document. However, when you need a general topic and a place to start searching, Voyant would be particularly useful.
