RELI/ENGL 39, Fall 2015, University of the Pacific

Tag: voyant

Voyant in Researching Philosophy and Legal Files

Group members: a_colombo, k_elliott3, a_rocha3, Kyle C, p_drake

Our given website to examine was a philosophical collection of sorts. The site’s author, Kieran Healy, had compiled a smorgasbord of citations into one giant, graphical web, all references to philosophy journals and articles written by people who most likely understood what they were talking about better than we did. From what we gathered, the research question was simply what the philosophical community was chitchatting about, and who was doing the chitchatting. Hence, the website was geared toward an audience whose heads were much higher up in the philosophical clouds than ours. When we dropped the website URL into Voyant, it became obvious that Healy was primarily focused on one thing in his writing: the graph he’d made, as demonstrated by the word cloud below.

philosophy word cloud thing

Apart from admiring all the colorful dots and crisscrossed lines of Healy’s graph, there wasn’t much for us to glean from the website, so we started branching our discussion off into other areas of study that Voyant might be useful for. We focused particularly on researching legal cases, and how Voyant would make it exceptionally easy to sort through legal files, precedents, and other related documents to find correlations between cases. If we wanted to compare different cases of domestic abuse, for example, we would simply have to plop a handful of files into Voyant to find the ones that would help us the most. So if you ever need to write a research paper about legal proceedings, perhaps Voyant is a good place to start.

DH projects/websites: how did they make that? especially







What is the research question?

What is the dataset?

What is the method/tool?

Who/what is the audience?

Evaluate — does it work?


Voyant: Words in the Clouds (September 3rd)

Voyant is an interesting piece of software that, technically speaking, works very poorly as a web-based tool. I find its uses and applications interesting and vast, yet the volume of data it processes is so large that when I played with particular tools, specifically the “sunburst” tool, my internet browser would stop responding and I’d have to start the whole process anew. Relating this back to modularity, I would find Voyant more useful as a downloadable standalone application because, in my experience, web-based tools are far more susceptible to failures like the ones I experienced, and locally hosted applications don’t react in such an “if I’m going down I’m taking you with me” sort of manner. When one program fails without taking down everything else I’m doing across my emails, this blog post, and my research, the end user is less frustrated. If Voyant crashed and I had to restart my entire computer every time it did, I would lose efficiency and be more frustrated. Additionally, all of the tools would function without the use of plug-ins in a standalone version. I was unable to get “lava” or “mandala” to work because I was missing some unspecified web plug-in. The picture looked like I was missing something from Adobe Flash, but looking at my available in-browser plug-ins, I’m not missing anything crucial, and I don’t expect Voyant to use pointedly specific plug-ins without telling the user what they are.


Alt text is cool right?

I would frame this and put it on my wall. Maybe give a poster of it to a favorite high school Lit teacher.

Now moving on to why Voy-aunt (which doesn’t rhyme with buoyant, but with savant) is a particularly interesting tool to utilize in humanities research. I used the Shakespeare texts, and I found myself playing with the visual aspects of the program, such as “bubblelines,” which visualizes the words you input in a very nice, almost artwork-like fashion. What caught my eye using this tool was that, out of the words “good,” “shall,” “lord,” “come,” “sir,” and “love,” Shakespeare’s Comedy of Errors only uses the word “sir” throughout its text, which sets it apart as the singular, though still visually appealing, monochromatic line among a series of more psychedelic ones.


Next I used the “knots” tool, which, unlike “bubblelines” or “cirrus,” gave me no usable information. The ability to change the “angles” and “tangles” with no relevant correspondence to the data makes this tool seem very questionable.

Okay fine, my artwork at 12.

My artwork as a 6-year-old, or classic Microsoft screensaver?

And when I was clicking around in it, this message popped up, raising more than a few questions while actually providing me with more interesting subject matter than what the tool generated.

Why is "good" bold?

Who is this for? Why is it here? Why is my only option to say “OK” after it rants to me?

Some practical applications of this software would be comparing two translations of a text (let’s say Shakespeare again) to see exactly how the wording changes between the slight variations in text. Or we could take, say, the First Folio and run it against Folger’s modern translations to see how the language has or has not changed over time, and how similar or completely different what we’re reading now is compared to the original texts. You could do the same with various translations of the Bible and compare word clouds to see if one favors particular words over other synonyms, and why that is. A short comment on the word clouds: looking through the media library (sharing this blog means sharing the libraries too, if you noticed), I see how the cloud generated different shapes, patterns, and colors for the same data (credit for the clouds below goes to whoever uploaded them).

download | The dataset provided in Figure 1 is provided from the Test Corpus 2 files. | word cloud | Screen Shot 2015-09-02 at 12.54.24 PM | Cirrus

Five different visualizations of word clouds or “Cirrus” for Dr. S’s test corpus.

A brief note on the shared media library: it is interesting to see what data is associated with each word cloud, such as file name. The diversity in the naming across these five clouds is more than I would expect.

Returning to practical applications of Voyant, I think it can be used for many purposes other than finding commonalities within a corpus, though the visualizations seem to be most grand when they are accessing a large body of work. I could see myself using Voyant in many “this wasn’t made to do that but okay, I guess it works” kind of ways, such as:

  • Running a personal journal through Voyant and analyzing the recurring themes, people, and places mentioned in the text to better understand how I got to where I am today.
  • Running the data of a series of lists, such as the lyrics to the Billboard Top 40 songs of any given day, to see a visual representation of what words you would probably hear if you turned your radio on. Another example would be using a list of ingredients for each menu item of a restaurant to see what their most used item is, and using that information to gain insight into how they may make their recipes.
  • Analyzing the code of a program through Voyant to see how often a certain function is used.
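
To make the list ideas above a little more concrete, here is a minimal sketch of the kind of counting that sits behind a word cloud. It is not how Voyant itself works (I don’t know its internals); the function name `top_words`, the tiny stop-word list, and the menu data are all made up for illustration.

```python
import re
from collections import Counter

# A tiny, made-up stop list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "with"}

def top_words(documents, n=3):
    """Return the n most frequent non-stop words across a list of strings."""
    counts = Counter()
    for text in documents:
        # Lowercase everything so "Garlic" and "garlic" count as one word.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical menu: one ingredient list per dish.
menu = [
    "tomato, basil, garlic, olive oil",
    "garlic, butter, bread",
    "tomato, garlic, onion, olive oil",
]

print(top_words(menu, 1))  # [('garlic', 3)]
```

The same function would work on song lyrics or journal entries; only the list of strings changes.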

If Voyant were slightly more powerful and could search short key phrases (Name Surname, places that aren’t one word like Los Angeles or New York, or just common word combinations or descriptors like chocolate milk or tired student), I think it would become exponentially more useful. I do not believe that the program accounts for aspects of the upload that are not actually “part” of the text, such as the Project Gutenberg disclaimers at the start of each text in the Shakespeare upload. Since it leaves that information in, it skews the data slightly past what you are actually analyzing, and a system that allowed you to choose which parts of the uploaded document function as text to analyze and which function as non-academic information would take Voyant one step further. Additionally, if it were able to count plurals and their singular forms as one set of data (or at least had a setting to count both as one countable object), this tool would be able to offer better analysis when comparing two subjects, which may currently be missing information because it’s reading and comparing “love vs. hate” as opposed to “love/s vs. hate/s”.
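
The three wishes above (phrases, skipping disclaimers, merging plurals) can be sketched outside of Voyant in a few lines. This is only a rough illustration under some loud assumptions: that the uploaded file marks its real text with lines beginning `*** START OF` and `*** END OF` (older Gutenberg files may not), and that dropping a trailing “s” is a good enough way to merge plurals. All function names here are hypothetical.

```python
import re
from collections import Counter

def strip_disclaimer(text):
    """Keep only the lines between the '*** START OF' and '*** END OF' markers.

    Assumption: the file uses those marker lines. If either one is
    missing, the text is returned unchanged.
    """
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("*** START OF")), None)
    end = next((i for i, ln in enumerate(lines) if ln.startswith("*** END OF")), None)
    if start is None or end is None or end <= start:
        return text
    return "\n".join(lines[start + 1:end])

def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())

def merge_plural(word):
    """Naive rule: drop one trailing 's' so 'loves' counts as 'love'.

    It will mangle words like 'this'; a real lemmatizer would do better.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def phrase_counts(words, size=2):
    """Count every run of `size` consecutive words as one phrase."""
    grams = zip(*(words[i:] for i in range(size)))
    return Counter(" ".join(g) for g in grams)

# Made-up sample standing in for an uploaded file.
sample = """Header added by the archive
*** START OF THE SAMPLE TEXT ***
She loves New York. He hates New York, but love wins.
*** END OF THE SAMPLE TEXT ***
License text"""

body = tokens(strip_disclaimer(sample))
print(phrase_counts(body)["new york"])                   # 2
print(Counter(merge_plural(w) for w in body)["love"])    # 2
```

With the disclaimer stripped, “license” and “header” never reach the counts, and “loves” and “love” land in the same bucket.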

Now as I round out this blog post, I would like to offer up a Cirrus of my own and some other analytics that I made to visualize all of the blog posts posted so far (including this one up to this point in the text) and how frequently some words are used.

This is us. We sure like talking about Voyant huh? Many words are repeated alongside their plurals too.

Collectively, we used a total of 1,267 unique words, said “Voyant” a total of 75 times, “Cirrus” a total of 7 times, and “fun” a total of 3 times, though two of those were from one person, so they really liked using Voyant. To the one person who posted their blog while I was making and analyzing the above Cirrus, I’m sorry I couldn’t include you! Adding text to the corpus reader after initializing the program now suddenly seems like a useful feature too.
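
For anyone curious how tallies like these could be reproduced by hand, here is a small sketch that counts unique words across a set of posts and breaks a few chosen words down by post. The two posts are invented stand-ins, not our real blog entries, and `corpus_stats` is a hypothetical helper rather than anything Voyant provides.

```python
import re
from collections import Counter

def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())

def corpus_stats(posts, terms):
    """Return (number of unique words, {term: {post name: count}}).

    posts maps a post name to its text; terms must be lowercase.
    """
    vocabulary = set()
    per_term = {t: {} for t in terms}
    for name, text in posts.items():
        counts = Counter(tokens(text))
        vocabulary.update(counts)
        for t in terms:
            if counts[t]:
                per_term[t][name] = counts[t]
    return len(vocabulary), per_term

# Made-up posts for illustration only.
posts = {
    "post_a": "Voyant is fun. Cirrus is fun too.",
    "post_b": "Voyant crashed, but Voyant is still fun.",
}

unique, per_term = corpus_stats(posts, ["voyant", "fun"])
print(unique)            # 8
print(per_term["fun"])   # {'post_a': 2, 'post_b': 1}
```

Breaking the counts down by post is what makes it possible to notice that one person accounts for most uses of a word.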