
F1423 Digital Texts and Multicultural Studies (Cultural Heritage Sp. Track)

Boschetti Federico

The course illustrates the complete workflow that leads from out-of-copyright bilingual printed editions to digital editions that the students analyze linguistically and annotate.


The first part of the course covers techniques for acquiring digital texts from printed editions through Optical Character Recognition (OCR). Page images are scanned by the teacher or downloaded from online repositories such as archive.org. Open source OCR engines and tools developed at the ILC-CNR of Pisa in partnership with the Perseus Project of Boston are described and used by the students during the labs. The examples suggested by the teacher are mainly based on Greek texts with English translation, because polytonic Greek poses interesting challenges, but no prior knowledge of the language is required. Students can base their midterm projects on short texts in other languages, provided that either the original or the translation is in Greek, Latin, English, German, French, Spanish, Italian or Venetian and that they have a good command of the other language. Typical examples are the Venetian and Italian translations of the first book of the Iliad by Casanova, or some epigrams of the Anthologia Palatina in Greek with a translation in a modern language; text and translation can also both be in modern languages, on topics selected by the students and discussed with the teacher. The principles of the Text Encoding Initiative (TEI) guidelines are illustrated, and students are asked to provide their texts with minimal metadata (author, title, edition, etc.), layout annotation (division into paragraphs, separation between text and critical apparatus, etc.) and anchors linking the source to the related translation.
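As a rough illustration of the encoding requirements listed above, the following Python sketch builds a minimal TEI-style document with a header (title, author, edition), paragraph divisions, and anchors that link each translated paragraph to its source paragraph. It is not the course's actual tooling: the function names, the "src"/"trg" identifier scheme, the div types and the placeholder texts are all assumptions made for the example, and a real project would follow the TEI guidelines in more detail.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_ID = f"{{{XML_NS}}}id"
XML_LANG = f"{{{XML_NS}}}lang"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def add_div(body, div_type, lang, prefix, paragraphs, corresp_prefix=None):
    """Add one <div> whose paragraphs each start with an <anchor>.

    The identifier scheme (prefix-1, prefix-2, ...) is a hypothetical
    convention chosen for this sketch.
    """
    div = ET.SubElement(body, tei("div"), {"type": div_type, XML_LANG: lang})
    for i, para in enumerate(paragraphs, start=1):
        p = ET.SubElement(div, tei("p"))
        attrs = {XML_ID: f"{prefix}-{i}"}
        if corresp_prefix is not None:
            # Point the translation anchor at the matching source anchor.
            attrs["corresp"] = f"#{corresp_prefix}-{i}"
        anchor = ET.SubElement(p, tei("anchor"), attrs)
        anchor.tail = para  # the paragraph text follows the empty anchor
    return div


def build_tei(title, author, edition, source, translation,
              source_lang="grc", translation_lang="en"):
    """Build a minimal document: metadata header plus two aligned divisions."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    ET.SubElement(title_stmt, tei("author")).text = author
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Student project."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("bibl")).text = edition
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    add_div(body, "original", source_lang, "src", source)
    add_div(body, "translation", translation_lang, "trg", translation,
            corresp_prefix="src")
    return root


if __name__ == "__main__":
    doc = build_tei(
        "Placeholder title", "Placeholder author", "Placeholder edition",
        ["First source paragraph.", "Second source paragraph."],
        ["First translated paragraph.", "Second translated paragraph."],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Placing the anchors at paragraph level is only one possible choice; anchors could equally be set per verse or per sentence, which determines how finely source and translation can later be shown side by side.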


The second part of the course presents tools for linguistic analysis, such as lemmatization, morphological analysis and syntactic parsing. During the labs, students run automated linguistic analyses on their own texts. Perseids, the suite of tools for editing and annotating texts developed at the Perseus Project, is illustrated and tested with the students. Finally, the features of Aporia, the text retrieval system developed at the ILC-CNR, are described, and the texts selected by the students, enriched with linguistic analyses, are uploaded to the platform. The texts are displayed in parallel, at the level of granularity determined by the distance between the anchors that link source and translation, and students add historical, stylistic and linguistic annotations to them, focusing in particular on the main differences between the original text and its translation that are due to cultural reasons.
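The idea of parallel display driven by anchors can be sketched in a few lines of Python. This is only an illustrative model, not the data structure used by Aporia or Perseids: the Segment and Row classes, the parallel_rows function and the placeholder texts are hypothetical names introduced for the example. Each source segment is paired with the translation segment whose anchor points back to it, and each pair can carry notes in one of the three annotation categories mentioned above.

```python
from dataclasses import dataclass, field
from typing import Optional

# The three kinds of annotation named in the course description.
CATEGORIES = {"historical", "stylistic", "linguistic"}


@dataclass
class Segment:
    anchor: str                    # identifier of the anchor that opens the segment
    text: str                      # text between this anchor and the next one
    corresp: Optional[str] = None  # source anchor this segment translates, if any


@dataclass
class Row:
    source: Segment
    translation: Optional[Segment]  # None when no translation segment is linked
    notes: list = field(default_factory=list)

    def annotate(self, category, note):
        """Attach a note, rejecting categories outside the three above."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.notes.append((category, note))


def parallel_rows(source, translation):
    """Pair each source segment with the translation segment linked to it."""
    by_source_anchor = {t.corresp: t for t in translation if t.corresp}
    return [Row(s, by_source_anchor.get(s.anchor)) for s in source]


if __name__ == "__main__":
    source = [Segment("src-1", "First source segment."),
              Segment("src-2", "Second source segment.")]
    translation = [Segment("trg-1", "First translated segment.", corresp="src-1")]
    rows = parallel_rows(source, translation)
    rows[0].annotate("stylistic", "Placeholder note on a difference in register.")
    for row in rows:
        target = row.translation.text if row.translation else "(no aligned segment)"
        print(f"{row.source.text} | {target} | {row.notes}")
```

The fewer anchors a text contains, the longer each segment becomes, so the spacing of the anchors directly sets how coarse or fine the side-by-side view is.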

 

Learning outcomes of the course
Students will learn to manage the complete digitization workflow for multilingual texts and to
annotate texts with particular attention to the cultural differences between the original work
and its translation.