Voyant is an extremely useful and clever tool for the digital humanities… especially when you're looking at vocabulary. However, when you're looking at anything other than vocabulary and word frequencies, it's pretty much useless.
When you first open Voyant, there's a colorful word cloud that visually depicts how often each word appears in the dataset, and one can remove the more common words like “the” or “and” through the “Stopwords” option. Then the more interesting words, the ones that better show what the content is about, come to the surface.
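For anyone curious what the word cloud is counting underneath, here's a rough Python sketch of the same idea: split the text into words, drop the stopwords, and count what's left. The stopword list and the sample sentence are made up for illustration; Voyant uses its own, much longer stopword lists and its own tokenizer, so its numbers won't match this exactly.

```python
import re
from collections import Counter

# A tiny, made-up stopword list; Voyant's built-in lists are much longer.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "it", "his", "he"}

def word_frequencies(text, stopwords=STOPWORDS):
    """Count lowercase word tokens in `text`, skipping anything in `stopwords`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Invented sample sentence, not taken from the actual corpus.
sample = "The martyr faced the beasts and the tortures, and he did not fear death."
freq = word_frequencies(sample)
print(freq.most_common(5))
```

The biggest words in a word cloud are just the top entries of a count like this one.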
Clearly, these words are much more interesting than “the” and “and”. Not to say those words are unimportant or anything, but, well… you get the point. Looking at the word cloud as a whole, the content of the dataset seems really interesting. I mean, look at all those cool words: “death,” “tortures,” “shall,” “martyrdom,” et cetera, et cetera. This obviously suggests that the content is a lot more complex than “and” or “the” would imply.
Moving on from the word cloud (difficult, right? There are so many pretty colors), one can see that there are many more tools for examining the vocabulary of the dataset. The Summary shows how many documents are in the dataset, the longest and shortest of those documents, the documents with the highest vocabulary density, and the most frequent words. The Corpus Reader, just to the right of the word cloud and the Summary, shows the text of the dataset in its original form and lets you select words to be highlighted.
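Vocabulary density, as I understand Voyant's Summary, is the number of distinct words divided by the total number of words in a document, so a document that keeps repeating itself scores low. Here's a minimal sketch under that assumption, using two made-up documents and a simple regex tokenizer of my own (not Voyant's):

```python
import re

def vocabulary_density(text):
    """Ratio of distinct word forms (types) to total words (tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # avoid dividing by zero on an empty document
    return len(set(tokens)) / len(tokens)

# Hypothetical documents, not from the real corpus.
docs = {
    "doc_a": "the beasts and the men and the beasts",
    "doc_b": "martyrdom shall come to those who endure",
}

# List documents from highest to lowest density, like the Summary does.
ranked = sorted(docs, key=lambda name: vocabulary_density(docs[name]), reverse=True)
for name in ranked:
    print(name, round(vocabulary_density(docs[name]), 3))
```

Here doc_a repeats itself (4 distinct words out of 8, so 0.5), while every word in doc_b is different (7 out of 7, so 1.0).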
Now, possibly the neatest thing about Voyant is the “Words in the Entire Corpus” tool, as it shows you the most common word frequencies (which can also be filtered with the “Stopwords” option) and lets you compare the frequencies of particular words across the dataset.
Here, I compared the words “men” and “beasts”, just because they seem pretty opposite in meaning, and it'd be neat to see how often they're used in the same document. What I found was that there was always a notable difference between the two words' frequencies in each document (except for “10) scilitan” and “12) readme”, where neither word appears at all). While “men” would be used multiple times within a document, “beasts” would appear rarely, if at all, and if “beasts” was used heavily in a document, “men” would seldom appear.
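The comparison above boils down to a small table of counts per document. A sketch of that table in Python, assuming each document is a plain string; the documents are invented stand-ins rather than the real corpus, and the function name is my own:

```python
import re
from collections import Counter

def term_counts_by_document(docs, terms):
    """Return {document name: {term: count}} for each requested term."""
    result = {}
    for name, text in docs.items():
        # Whole-word tokens, lowercased, so "Men" and "men" count together.
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        result[name] = {term: counts[term] for term in terms}
    return result

# Hypothetical documents standing in for the real dataset.
docs = {
    "doc_a": "The men prayed, and the men were led out.",
    "doc_b": "The beasts were loosed; the beasts would not touch her.",
    "doc_c": "A short note with neither word.",
}

table = term_counts_by_document(docs, ["men", "beasts"])
for name, row in table.items():
    print(name, row)
```

Because it matches whole words, “women” would not be counted as “men” in this sketch.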
Interesting, right? It kind of makes you wonder what these words were being used for. And that’s exactly the problem with Voyant.
It's undeniable that Voyant has its uses, but it doesn't have much of a knack for showing the context in which a word appears, short of searching through the whole “Corpus Reader” tool. You just don't know whether, in these documents, men are being called beasts, or whether the texts really are talking about men. Sure, the Corpus Reader can help with that, but it can be pretty tedious to search through the whole thing for two words that repeat over one hundred times just to see how they are used in context. So it really does seem like you would have to read the texts themselves to see how a word is used, instead of clicking on the nice, pretty words in the “Cirrus” tool to find out what the texts are about.
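The kind of check described above is what a keyword-in-context (KWIC) listing, or concordance, does: it prints every occurrence of a word with a few neighboring words on each side, so you can skim usage without reading everything. A rough Python sketch, assuming plain whitespace-separated text; the function name, the `width` parameter, and the sample sentence are all my own inventions, not part of Voyant:

```python
import re

def keyword_in_context(text, keyword, width=4):
    """List each occurrence of `keyword` with up to `width` words on either side."""
    words = text.split()
    # Compare lowercase words with surrounding punctuation stripped.
    normalized = [re.sub(r"[^a-z']", "", w.lower()) for w in words]
    lines = []
    for i, w in enumerate(normalized):
        if w == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Bracket the matched word as it appears in the original text.
            lines.append(f"{left} [{words[i]}] {right}")
    return lines

# Invented sample sentence, not from the actual documents.
sample = "They threw the men to the beasts, but the beasts lay down before the men."
for line in keyword_in_context(sample, "beasts", width=3):
    print(line)
```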