RELI/ENGL 39, Fall 2015, University of the Pacific


March 2014 Coptic SCRIPTORIUM Release notes

Coptic SCRIPTORIUM is pleased to announce a new release of data and an update on our project.  Please visit our site at (backup at

We’ve released several new corpora:
-two fragments of Shenoute’s Acephalous Work #22 (aka A22, from Canons Vol. 3)
-two letters of Besa (to Aphthonia and to Thieving Nuns)
-chapters 1-6 of the Sahidic Gospel of Mark (based on Warren Wells’ Sahidica New Testament)

These corpora include:
⁃    visualizations and annotations of diplomatic manuscript transcriptions (except for Mark)
⁃    visualizations and annotations of the normalized text
⁃    annotations of the English translation (except for some A22 material)
⁃    part-of-speech annotations (which can be searched)
⁃    search and visualization capabilities for normalized text, Coptic morphemes, and bound groups in most of the corpora
⁃    Language of origin annotations (Greek, Hebrew, Latin) in most corpora (which can be searched)
⁃    TEI XML files of the texts in the corpora, which validate to the EpiDoc subset

We’ve also:
⁃    Updated the documentation about our part-of-speech tag set and tagging script.  (If you’re interested at all in Coptic linguistics, please do read about our tag set.)
⁃    Provided some example queries for our search and visualization tool (ANNIS); just click on a query and ANNIS will open and run it
⁃    Updated our Frequently Asked Questions document
⁃    Released an update to the Apophthegmata Patrum corpus to incorporate some of the new technologies described above
⁃    Improved the automation of normalizing text, annotating it for part of speech, annotating language of origin, and annotating word segmentation (bound groups vs. morphemes, etc.)

We would love to hear from you if you use our site; we think it will be useful for people teaching Coptic as well as those conducting research.  Please email your feedback directly to either of us.

The improvements in automation also mean we would love to work with you if you have digitized Coptic texts that you would like to be able to search or annotate, if there are texts you would like to digitize, or if you would like to annotate existing texts in our corpus in new ways.  We are ready to scale up!

Thanks for all of your support.  This project is designed for the use of the entire Coptological community, as well as folks in Linguistics, Classics, and related fields.

January 2014 Coptic SCRIPTORIUM release notes

We’ve released some additional TEI XML files for our SCRIPTORIUM corpora at (backup site

  • All the TEI files have been lightly annotated with linguistic annotations.
  • The metadata has been updated to provide more information about the repositories and manuscript fragments.
  • There are now TEI downloads for every file in our public ANNIS database.
  • All TEI files conform to the EpiDoc TEI XML subset and validate to the EpiDoc schema.
  • The files are licensed under a CC-BY 3.0 license, which allows unrestricted reuse and remixing as long as the source is credited (Coptic SCRIPTORIUM).  Linguistic annotations were made possible by the sharing of resources from Dr. Tito Orlandi and the CMCL (Corpus dei Manoscritti Copti Letterari); please credit them as well.

We welcome your feedback on the TEI XML.  We hope to release more texts in the corpora later this winter or in early spring.
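
As a rough illustration of what linguistic annotations in a TEI file make possible, the sketch below reads word elements from a small TEI fragment using only Python's standard library. The sample markup, the use of a `type` attribute to hold a part-of-speech tag, and the tag values are all assumptions made for this example; the actual SCRIPTORIUM files may encode annotations differently, so inspect a downloaded file before adapting it. Only the TEI namespace URI is standard.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The TEI namespace is standard; everything in SAMPLE is hypothetical.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Placeholder fragment: invented word forms and invented tags, stored in
# an assumed `type` attribute on each <w> element.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><ab>
    <w type="N">word1</w>
    <w type="V">word2</w>
    <w type="N">word3</w>
  </ab></body></text>
</TEI>"""


def words_with_tags(tei_xml):
    """Return (word form, tag) pairs for every <w> element, in document order."""
    root = ET.fromstring(tei_xml)
    return [(w.text or "", w.get("type")) for w in root.iterfind(".//tei:w", TEI_NS)]


def tag_counts(tei_xml):
    """Count how often each tag value occurs in the fragment."""
    return Counter(tag for _, tag in words_with_tags(tei_xml))


if __name__ == "__main__":
    for form, tag in words_with_tags(SAMPLE):
        print(form, tag)
    print(dict(tag_counts(SAMPLE)))
```

To try this on a real download, read the file's contents into a string and pass it to `words_with_tags`, adjusting the element and attribute names to match what the file actually contains.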


SBL presentation on Digital Technologies to find and study biblical references in Coptic literature

The slides from my 2013 Society of Biblical Literature presentation are now available on and are referenced on Coptic SCRIPTORIUM’s Zotero Group Library page.

Searching for Scripture: Digital Tools for Detecting and Studying the Re-use of Biblical Texts in Coptic Literature (Caroline T. Schroeder, Amir Zeldes)


Some of our most important biblical manuscripts and extra-canonical early Christian literature survive in the Coptic language. Coptic writers are also some of our most important sources for early scriptural quotation and exegesis. This presentation will introduce the prototype for a new online platform for digital and computational research in Coptic, and demonstrate its potential for the detection and analysis of “text-reuse” (quotations from, citations and re-workings of, and allusions to prior texts). The prototype platform will include tools for formatting digital Coptic text as well as a digital corpus of select texts (most specifically the writings of Shenoute of Atripe, who is known for both his biblical citations and his biblical style of writing). It will allow searching for patterns of shared vocabulary with biblical texts as well as for grammatical and syntactical information useful for stylistic analyses. Both the potential uses and limitations of implicit methodologies will be discussed.
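
One simple way to picture the shared-vocabulary searching described above is to compare two tokenized passages for the words and word sequences they have in common. The following is a minimal sketch under stated assumptions, not the method used by the prototype platform: the function names are invented for illustration, and the English placeholder tokens stand in for Coptic text that would first need to be normalized and segmented.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(source_tokens, target_tokens, n=3):
    """Return the n-grams found in both token lists, sorted for stable output."""
    return sorted(ngrams(source_tokens, n) & ngrams(target_tokens, n))


def vocabulary_overlap(source_tokens, target_tokens):
    """Jaccard overlap: shared distinct words divided by all distinct words."""
    a, b = set(source_tokens), set(target_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical placeholder passages, already tokenized on whitespace.
    source = "in the beginning was the word".split()
    target = "as it is written in the beginning".split()

    print(shared_ngrams(source, target, n=3))
    print(round(vocabulary_overlap(source, target), 3))
```

Exact n-gram matching of this kind finds verbatim quotations but misses paraphrase and allusion, which is one reason grammatical and stylistic information is useful alongside vocabulary.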

New grant funded for Coptic digital studies

The German Federal Ministry of Education and Research (BMBF) has approved Dr. Amir Zeldes’ (Humboldt University) proposal for a young researcher group on Digital Humanities at HU Berlin, starting early next year. The project is called KOMeT (Korpuslinguistische Methoden für eHumanities mit TEI), and aims to apply corpus linguistics methods to ancient texts encoded in TEI XML, focusing initially on richly annotated corpora of Sahidic Coptic. Dissertations within the group will be mentored by Frank Kammerzell, Anke Lüdeling, Laurent Romary and myself.

The group will cooperate with the SCRIPTORIUM project that Dr. Zeldes and I presented at our workshop in Berlin in May.

(The text of this announcement is taken from Amir Zeldes’.)