Friday, November 15, 2013

On Pure Text Corpora

Stephan Seidlmayer makes an important point about text corpora in this article:
Für die Grammatik sucht man sich „reine“ Corpora zusammen, Gruppen von Texten, von denen man meint, dass sie dieselbe Sprache sprechen.

For grammar, one assembles "pure" corpora, groups of texts that one believes to be in the same language.
But such pure corpora are illusory:
Die Auswahl „reiner“ Corpora ist, indem sie auf einer petitio principii fußt, zirkulär.
The selection of "pure" corpora is circular, since it rests on a petitio principii: the texts are chosen because they are already assumed to share a language. What we find in Egyptian texts is a range of usage that changes over time. We also find various registers of language that authors slide between.

To speak or write in an archaic form of the language or in an antique style conveys something different than doing so in a contemporary style. The same thing happens when one uses classical Arabic instead of colloquial Arabic, or King James English (or even imitation King James English) rather than a more contemporary idiom. This is not to say that using archaic Egyptian necessarily meant the same thing to an Egyptian that using archaic English means to a modern English speaker, but in both languages the shift in register carries different connotations than contemporary speech does. A master of the language can use such things to great effect. Unfortunately, not all writers are masters of the language.

Speaking of text corpora, one of my favorite selections of texts was used by E. A. W. Budge in his book Egyptian Language. The text examples come from either the Pyramid of Unas or Papyrus D'Orbiney. The texts are about a thousand years apart. It would be a little like learning English with examples from Beowulf and Jane Austen. While there may not be such a thing as a pure text corpus, there are reasons we usually do not put those particular texts together.