Automatic Lemmatization of Old English Class III Strong Verbs (L-Y) with ALOEV3

Roberto Torre Alonso


This article presents ALOEV3, a lemmatizer based on Morphological Generation that performs type-based automatic lemmatization of Old English Class III strong verbs beginning with the letters L–Y. The lemmatizer applies the inflectional, derivational and morpho-phonological alternation rules characteristic of this class. The generated form types are checked against the two most reputable Old English corpora, namely the Dictionary of Old English Corpus and The York-Toronto-Helsinki Parsed Corpus of Old English Prose, to validate their attestation and assign the corresponding lemma. Results show that 97 percent of the validated forms are successfully assigned a single lemma. The remaining inflectional forms (38 out of 1,256) show competition between two lemmas, which implies that, despite the high accuracy of the lemmatizer, contextual, token-based analysis is still needed for disambiguation. However, the research shows that competition occurs only in a limited set of lemma pairs and their derivatives. Although the research focuses on a single strong verb class, it confirms that automatic lemmatization can contribute to the field of Old English lexicography, either by lemmatizing attested inflectional form types or by highlighting areas for manual revision.
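
The generate-and-validate procedure summarized above can be illustrated with a minimal Python sketch. It is not the ALOEV3 implementation: the names `GENERATED`, `ATTESTED` and `lemmatize_types` are hypothetical, the generation step is replaced by hand-listed principal parts of two Class III verbs, the corpus lookup is replaced by a small set of strings, and the competing pair uses artificial placeholders rather than real lemmas.

```python
from collections import defaultdict

# Toy stand-in for the generation step: each lemma maps to form types that
# inflectional rules might produce. Only principal parts are listed here.
GENERATED = {
    "singan": {"singan", "sang", "sungon", "sungen"},
    "windan": {"windan", "wand", "wundon", "wunden"},
    # Artificial placeholders (not real verbs) that share one form type.
    "lemma_a": {"form_x", "form_y"},
    "lemma_b": {"form_x", "form_z"},
}

# Toy stand-in for the set of form types attested in the corpora.
ATTESTED = {"sang", "sungon", "wand", "wunden", "form_x"}


def lemmatize_types(generated, attested):
    """Split attested form types into single-lemma and competing cases."""
    candidates = defaultdict(set)
    for lemma, forms in generated.items():
        # Keep only generated forms that are validated by attestation.
        for form in forms & attested:
            candidates[form].add(lemma)
    single = {f: next(iter(ls)) for f, ls in candidates.items() if len(ls) == 1}
    competing = {f: sorted(ls) for f, ls in candidates.items() if len(ls) > 1}
    return single, competing


single, competing = lemmatize_types(GENERATED, ATTESTED)
# Share of validated form types that receive exactly one lemma.
share = len(single) / (len(single) + len(competing))
```

Form types left in `competing` are the ones that would be passed on to contextual, token-based disambiguation. With the figures reported in the abstract, the same ratio is (1,256 - 38) / 1,256, or roughly 97 percent.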


Old English; automatic lemmatization; strong verb; morphology; Natural Language Generation; Morphological Generation


