Introduction to Digital Humanities

RELI/ENGL 39, Fall 2015, University of the Pacific


Maps with CartoDB and Tableau

Our unit in Intro DH right now is on mapping.  In class we’ll be working on creating maps with Palladio.  We also had a preliminary introduction to data, tables, and maps by experimenting with Google Fusion Tables.  In preparation for class, I imported a data set consisting of a list of images from the Cushman Archive into a few different tools to experiment.

Here is the map of the data in a Google Fusion map:

This is Miriam Posner’s version of the data. She downloaded the data from the Cushman archives site, restricted the dates slightly, and cleaned it up.  This data went straight into Google’s Fusion Tables as is.  The map shows the locations of the objects photographed.  One dot for every photograph.  Locations are longitude-latitude geocoordinates.

Then I tried CartoDB. I’d never used it before, but it’s fairly user-friendly for anyone willing to spend some time just playing around and seeing what works and what doesn’t.  The first thing I discovered was that CartoDB (unlike Fusion Tables) does not like geocoordinates in one field.  In the Cushman dataset, the longitude and latitude were together in one field.  But in CartoDB, longitude and latitude must be disaggregated.  So to create the following map in CartoDB I first followed the instructions in their FAQ to create separate columns for longitude and latitude.  Then I had fun playing with their map options.
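For anyone who would rather split the coordinates before uploading, here is a minimal sketch in Python. It is not the procedure from the CartoDB FAQ. It assumes the combined field is named "geocoordinates" and holds text like "41.8781, -87.6298" (latitude first, comma-separated); both the field name and the format are assumptions for illustration, so check them against the actual file.

```python
import csv
import io


def split_coordinates(rows, field="geocoordinates"):
    """Yield a copy of each row with separate 'latitude' and 'longitude' keys.

    The field name and the "lat, lon" layout are assumptions, not taken
    from the Cushman data itself.
    """
    for row in rows:
        out = dict(row)
        raw = (out.pop(field, "") or "").strip()
        try:
            lat_text, lon_text = raw.split(",")
            out["latitude"] = float(lat_text)
            out["longitude"] = float(lon_text)
        except ValueError:
            # Empty or malformed value: keep the row, leave the location blank.
            out["latitude"] = None
            out["longitude"] = None
        yield out


# Hypothetical two-row sample standing in for the real CSV export.
sample = io.StringIO(
    "title,geocoordinates\n"
    'Street scene,"41.8781, -87.6298"\n'
    "No location,\n"
)
rows = list(split_coordinates(csv.DictReader(sample)))
```

Rows with a missing or unparseable value are kept with blank coordinates rather than dropped, so it stays easy to see how many photographs could not be placed.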

This is just a plain map, but with the locations color coded by the primary genre of each photograph (direct link to CartoDB map):

This one shows the photographs over time (go to the direct link to CartoDB map, because on the embedded map below, the legend blocks the slider):

Then I decided I wanted to see if I could map based on states or cities (for example, summing the number of photographs in a certain state, and color-coding or sizing the dots on the map based on the number of photographs from that city or state).  So I used the same process to disaggregate cities and states as I used to disaggregate longitude/latitude; I just changed the field names. I noted, though, that for some reason, trying to geocode by the city led to some incorrect locations. If you zoom out in the map below, you’ll see that some of the photographs of objects in Atlanta, Georgia, have been placed in the Caucasus, in Georgia and Armenia. This map represents many efforts to clean the data through automation, simply telling CartoDB again and again to geocode the cities or states. That didn’t work well.
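Both ideas in this paragraph can be sketched outside of CartoDB: tallying photographs per state, and building a place string that spells out city, state, and country so that "Georgia" is less likely to be read as the country. The field names ("city", "state") and the default country are assumptions for illustration, and I can't promise that any particular geocoder would resolve the qualified string correctly.

```python
from collections import Counter


def qualified_place(city, state, country="USA"):
    """Join the non-empty parts into one query string, e.g. 'Atlanta, Georgia, USA'."""
    parts = [p.strip() for p in (city, state, country) if p and p.strip()]
    return ", ".join(parts)


def photos_per_state(records):
    """Count records per state, skipping rows where the state is blank."""
    return Counter(r["state"] for r in records if r.get("state"))


# Hypothetical records standing in for rows of the dataset.
records = [
    {"city": "Atlanta", "state": "Georgia"},
    {"city": "Atlanta", "state": "Georgia"},
    {"city": "Chicago", "state": "Illinois"},
    {"city": "", "state": ""},
]
counts = photos_per_state(records)
queries = [qualified_place(r["city"], r["state"]) for r in records]
```

The default country would be wrong for the photographs taken in Mexico, so a real version would need a country column (or a lookup from state to country) rather than a single default.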

I also couldn’t figure out a good way to visualize density — the number of photographs from each state, for example. So I downloaded my new dataset from CartoDB as a csv file and then imported it into Tableau (Desktop 9.0). By dragging and dropping the “state” field onto the workspace, I quickly created a map showing all the states where photographs in the collection had been taken:
Screen Shot 2015-10-28 at 4.28.39 PM

Then I dragged and dropped Topical Subject Heading 1 (under the Dimensions list on the left in Tableau) onto my map, and I dragged and dropped the “Number of Records” Measure (under the Measures list on the left in Tableau), and I got a series of maps, one for each of the subjects listed in the TSH1 field:
Screen Shot 2015-10-28 at 4.29.29 PM

Note that Tableau kindly tells you how many entries it was unable to map!  (the ## unknown in the lower right).

Below I’ve summed the number of records (no genre, topical subject, etc.) for each state. For this, it’s better to use the graded color option than the stepped color option.  If you have just five steps or stages of color, it looks like most of the states have the same number of images, when in fact the counts are more varied.  The graded color (used below) shows the variations better.

Screen Shot 2015-10-28 at 4.35.24 PM
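Here is a small sketch of why five color steps can flatten the picture. It assumes the steps are equal-width bins between the smallest and largest count, which is my guess at how a stepped palette works rather than a description of Tableau's internals, and the counts are made up.

```python
def stepped(value, lo, hi, steps=5):
    """Return the index (0..steps-1) of the equal-width bin that value falls in."""
    if hi == lo:
        return 0
    index = int((value - lo) / (hi - lo) * steps)
    return min(index, steps - 1)  # the maximum value belongs to the last bin


def graded(value, lo, hi):
    """Return where value sits between lo and hi, as a number from 0.0 to 1.0."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


# Hypothetical per-state counts with one very large value.
counts = {"A": 12, "B": 40, "C": 95, "D": 150, "E": 1200}
lo, hi = min(counts.values()), max(counts.values())
step_of = {k: stepped(v, lo, hi) for k, v in counts.items()}
shade_of = {k: round(graded(v, lo, hi), 3) for k, v in counts.items()}
```

With these made-up numbers, four of the five states land in the same step, while the graded scale gives each one a different shade. One state with far more photographs than the rest is enough to cause this.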

This map also shows that the location information for photographs from Mexico was not interpreted properly by Tableau.  Sonora (for which there is data) is not highlighted.

 

Then I decided hey, why not a bubble map of locations, so here we go.  Same data as above map, but I selected a different kind of visualization (called “Packed Bubbles” in Tableau).

Screen Shot 2015-10-28 at 4.35.39 PM

When I hovered on some of the bubbles, I could easily see the messy data in Tableau.  Ciudad Juarez is one of the cities/states that got mangled during import, probably due to the accent:

Screen Shot 2015-10-28 at 4.35.56 PM
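If the accent really is the culprit, one common way it happens is that UTF-8 text gets read as if it were Latin-1. The sketch below simulates that mix-up and reverses it. That this is what happened during my import is an assumption; I have not confirmed which encodings were involved.

```python
def mangle(text):
    """Simulate reading UTF-8 bytes as if they were Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo that specific mix-up; return the input unchanged if it does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "Ciudad Juárez"
broken = mangle(original)   # the accented letter becomes two stray characters
fixed = repair(broken)      # back to the original spelling
```

The repair only reverses this one kind of damage. Text that was mangled some other way, or that was never mangled at all, comes back unchanged.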

Finally, a simple map with circles corresponding to the number of photographs from that location. (Again clearly showing that the info from Mexico is not visible.  In fact, 348 items seem not to be mapped.)

Screen Shot 2015-10-28 at 4.36.33 PM

Obviously the next step would be to clean the data, probably using OpenRefine (formerly Google Refine), and then reload.

Many, many thanks to Indiana University for making the Charles Cushman Photograph collection data available and so well-structured and detailed. Many thanks also to Miriam Posner for cleaning the data and providing tutorials for all of us to use!

Digital pedagogy and student knowledge production

The past two weeks in my Introduction to Digital Humanities course, students have been using the open-source content management system Omeka to create online exhibits related to the early Christian text, the Martyrdom of Perpetua and Felicitas.

I was astounded by their accomplishments.  The students raised thoughtful questions about the text, found items online related to Perpetua and Felicitas to use/curate/re-mix, and then created thoughtful exhibits on different topics in groups.

None of them knew much, if anything, about early Christianity. (I think one student has taken a class with me before.)  None of them had used Omeka before.  Few of them would have considered themselves proficient in digital technology before taking the class.

Here’s what they created.  In two weeks. And I’m super proud of them.

Here’s what we did:

  • We read and discussed the text together.
  • They all registered on our joint Omeka site, and we created a list of questions and themes that would drive our work.
  • Each student then went home and found three items online related to Perpetua and Felicitas or any of the themes and questions we brainstormed. (They watched out for the licensing of items to be sure they could reuse and republish them.)
  • In class each person added one item to the Omeka site, and we talked about metadata, licensing, classification
  • We revised, revised, revised; in groups, each student added two more items
  • We grouped the Items into Collections (which required discussion about *how* to group Items)
  • Then in small groups, students created Exhibits based on key themes we had been discussing.  Each group created an Exhibit; each student a page within the exhibit.

What made it work?

  • Before even starting with Omeka, we read about cultural heritage issues and digitization, licensing, metadata, and classification — all issues they had to apply when doing their work
  • Lots and lots of in class time for students to work
  • Collaboration!  Students all contributed items to Omeka, and then they each could use any other student’s items to create their exhibits; we had a much more diverse pool of resources by collaborating in this way
  • Peer evaluating: students reviewed each other’s work
  • The great attitude and generosity of the students: they completely immersed themselves in it.
  • The Omeka CMS forced students to think about licensing, sourcing, classification, etc., as they were adding and creating content.

The writing and documentation in these exhibits exceeded my expectations, and also exceeded what I usually see in student papers and projects.  Some of this is due to the fact that I have quite a few English majors, who are really good at writing, interpreting, documenting.   I also was pleasantly surprised by the level of insight from students who were not formally trained in early Christian history.  They connected items about suicide and noble death, as well as myths about the sacrifice of virgins; they found WWII photos of Carthage.

Are there some claims in these exhibits that I would hope someone more steeped in early Christian history would modify, nuance, frame differently?  Sure.  And not all items are as well sourced or documented as others.  We also did not as a class do a good job of conforming all of our metadata to set standards (date standards, consistent subjects according to Dublin Core or Library of Congress subject categories, etc.).  We tried, but it was a lot of data wrangling for an introductory class.  And honestly, I was satisfied that they wrestled with these issues and were as consistent as we were.

So in sum, for undergraduate work, I was pleased with the results, and am happy to share them with you.

My digital future

This fall, as I have been trying to finish up my book project, Monks and Their Children, I have been asked more than once:  What’s your next project?   When I start describing copticscriptorium.org, I frequently get the reply:  no, I mean your real project, your next book.  My internal response was always twofold:  the snarky, “What, bringing the study of an entire language into the 21st century is not enough?” and the desperate, “I am not sure I have another monograph in me.”  And as the fall wore on, and 2014 became 2015, I became more and more convinced of the authenticity of those sentiments:  that digital scholarship in early Christian studies and late antiquity is still not regarded as being as legitimate as print monographs and articles, and that indeed I had no interest in writing another monograph.  It’s not that I thought I couldn’t write another book, but that I just had no desire to spend another decade on a long-form argument.  I was more interested in digital writing and digital scholarship that could be read or used by a community more quickly.  And in tighter, more focused arguments in essay form.

I also began chafing more and more at the conservatism of the field.  The definitions of “real” scholarship, the structural sexism that colleagues like Ellen Muehlberger and Kelly Baker were documenting in academia, and the perception of Egypt and Coptic as marginal areas of study.  That conservatism stoked my rebellious fires further; I was not going to force myself to come up with a book project just because that was what one “did” as an active scholar.

And then I saw the CFP for the Debates in the Digital Humanities Series.  It’s a call for essays, not monographs, but like Augustine hearing the child chant, “Tolle lege,” I had an epiphany:  I damn well had a third book in me. I just hadn’t put the pieces together.

In fact, I have two projects in mind:  both are examinations of the field of early Christianity as it intersects (or does not) with Digital Humanities.  Both are political and historiographical.

The book (as yet untitled) is about early Christian studies (especially Coptic and other “Eastern” traditions and manuscript collections), cultural heritage, and digitization.  Planned chapters are:

  1. Digitizing the Dead and Dismembered.  About the material legacy of the colonial dismemberment of archives, the limitations of existing DH standards and technologies (e.g., the TEI, the Unicode character set, etc.) to account for these archives, and how these standards, technologies, and practices must transform.  The Coptic language and the White Monastery/Monastery of Shenoute manuscript repository will be the primary source examples, but there should be other examples from Syriac and Arabic.
  2. Can the Colonial Archive Speak? Orientalist Nostalgia, Technological Utopianism, and the Limits of the Digital.  This chapter will look at the practice of constructing digital editions and digital libraries and (building on the issues discussed in the previous chapter) explore the premise that digitization can “recover” an original dismembered archive such as the White Monastery’s repository.  To what extent can digitization recover and reconstruct lost libraries?  What are the political and ethical obligations of Western libraries to digitize manuscripts from Egypt and the wider Middle East?  Does digitization transcend or reify colonial archaeological and archival practices?  This chapter focuses on the concepts of the archive and library and voice.  [HT to Andrew Jacobs for inspiring the chapter title.]
  3. Ownership, Open Access, and Orientalism.  About the benefits, consequences, and dangers of the open access paradigm for digitizing eastern Christian manuscript collections.  Will look at the history of theft of physical text objects from monasteries by Western scholars and will ask whether open access digitization is cultural repatriation or digital colonization.  Will look at a number of complexities:  a) the layers and levels of digitization (metadata, text, images); b) the spectrum of openness and privacy possible; and c) the different constituencies involved in asking the question:  whose heritage is this?  who owns/owned the text?  Church, local monastery, “the world” (as world heritage), American/European scholars who have privileged access to some of these texts already in their libraries or on their computers. Will explicitly draw on insights from indigenous cultural heritage studies related to digitization and digital repatriation.
  4. Transparency and Overexposure:  Digital Media and Online Scholarship in Debates about Artifact Provenance.  This chapter will examine the extent to which blogs and social media have changed the conversation about the provenance of text-bearing objects we study, and the ethical responsibilities of researchers.  Will also look at the risks of online debates, and suggest ways to have constructive conversations moving forward.  With special attention to the intersections of status (who’s online and who’s not?) and gender.
  5. The Digital Humanities as Cultural Capital: Implications for Biblical and Religious Studies.  Why our field needs to stop treating digital scholarship as derivative or less rigorous, the implications for us being so conservative about digital scholarship as a field, and how Biblical and Religious Studies can contribute to DH as a discipline (not just in content but in concept, in theory, in its very understanding of itself as a discipline or field, in other words, why DH needs Biblical and Religious studies).
  6. Desirable but maybe a stretch:  War and the Western Savior Complex:  Looks at the rhetoric of crisis and loss (especially in the context of the early 21st c. wars and revolutions in the Middle East) around saving texts, artifacts, and traditions.  What does it mean for scholars from Europe and America who are not the policy makers in their countries but are nonetheless citizens of them to be making pleas for the preservation of antiquities and/or cultural traditions that are endangered in part because of the actions of our governments?  (There is also a conflation of ancient traditions and modern Eastern Christian peoples in scholarship and the media; see Johnson’s JAAR article “‘He Made the Dry Bones Live.’”)

The other project will be digital historiography:  using digital and computational methods to crunch Journal of Early Christian Studies (and hopefully its precursor the Second Century?) to look at trends in the field, especially with respect to gender.  Who is publishing, what are we publishing on?  Who is citing whom?  Who is reviewing whom?  How has that changed (or not) over the decades?  This may be one or two essays, not a book.  And it is inspired in part by Ellen Muehlberger’s work micro-blogging statistics on gender in biblical studies book reviews.  I’m taking the Topic Modeling course at DHSI this summer and will think more about how that or other methods (concordance text analysis, network analysis, etc.) will support this project.

I hope to publish all of this in digital form, including the monograph on cultural heritage and cultural capital.

So that’s my digital future.  Of course, first I need to get a couple of other things out the door.  And of course Coptic Scriptorium continues.  But when you ask me what my next book is about, there you go.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.