Automatic Lemmatization of Old English Class III Strong Verbs (L-Y) with ALOEV3

Roberto Torre Alonso

Abstract


This article presents ALOEV3, a lemmatizer based on Morphological Generation that performs type-based automatic lemmatization of Old English Class III strong verbs beginning with the letters L–Y. The lemmatizer generates form-types by applying the inflectional, derivational and morpho-phonological alternation rules characteristic of this class. The generated form-types are then checked against the two most authoritative Old English corpora, the Dictionary of Old English Corpus and the York-Toronto-Helsinki Parsed Corpus of Old English Prose, to validate their attestation and assign the corresponding lemma. Results show that 97 percent of the validated forms (1,218 out of 1,256) are assigned a single lemma. The remaining 38 inflectional forms show competition between two lemmas. This implies that, despite the lemmatizer's high accuracy, contextual, token-based analysis is still needed for disambiguation. However, the research shows that competition occurs only in a limited set of lemma pairs and their derivatives. Although the research focuses on a single strong verb class, it confirms that automatic lemmatization can contribute to Old English lexicography, either by lemmatizing attested inflectional form-types or by highlighting areas for manual revision.
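The generate-and-validate procedure summarized above can be illustrated with a minimal Python sketch. It is not ALOEV3's implementation: the verb entries, principal-part stems, ending sets and the attested-form set are hypothetical, heavily simplified stand-ins. No corpus is queried, and ablaut is listed per verb rather than derived by rule. The sketch generates form-types for each lemma, keeps only those found in the attested set, and then separates forms with a single candidate lemma from forms where two lemmas compete.

```python
from collections import defaultdict

# HYPOTHETICAL toy data, not ALOEV3's rule base. Each lemma maps to a prefix
# and four principal-part stems: present, past singular, past plural, past
# participle. Ablaut is listed per verb here instead of being derived by rule.
VERBS = {
    "singan": ("", ("sing", "sang", "sung", "sung")),
    "gesingan": ("ge", ("sing", "sang", "sung", "sung")),
    "windan": ("", ("wind", "wand", "wund", "wund")),
}

# A small, simplified subset of endings, indexed by principal-part stem.
ENDINGS = (
    ("an", "e", "est", "eþ", "aþ", "ende", "enne"),  # present system
    ("",),                                           # past 1st/3rd sg.
    ("on", "e", "en"),                               # past pl., 2nd sg., subj.
    ("en",),                                         # past participle
)


def generate(lemma):
    """Return the set of form-types generated for one lemma."""
    prefix, stems = VERBS[lemma]
    forms = {prefix + stem + ending
             for stem, endings in zip(stems, ENDINGS)
             for ending in endings}
    # Past participles of unprefixed verbs typically also take ge-.
    if not prefix:
        forms.add("ge" + stems[3] + "en")
    return forms


def lemmatize(attested):
    """Map each attested, generated form to the set of lemmas producing it."""
    candidates = defaultdict(set)
    for lemma in VERBS:
        for form in generate(lemma) & attested:
            candidates[form].add(lemma)
    return dict(candidates)


def summarize(candidates):
    """Split forms into single-lemma and competing assignments."""
    unique = {f: next(iter(ls)) for f, ls in candidates.items() if len(ls) == 1}
    competing = {f: sorted(ls) for f, ls in candidates.items() if len(ls) > 1}
    share = len(unique) / len(candidates) if candidates else 0.0
    return unique, competing, share


# HYPOTHETICAL stand-in for corpus-attested forms; "hus" is not a verb form,
# so it is never generated and never lemmatized.
ATTESTED = {"singan", "sang", "sungon", "gesungen",
            "windan", "wundon", "gewunden", "hus"}

if __name__ == "__main__":
    unique, competing, share = summarize(lemmatize(ATTESTED))
    print(competing)       # {'gesungen': ['gesingan', 'singan']}
    print(f"{share:.3f}")  # 0.857
```

In this toy data the participle gesungen is generated both by singan and by a ge- prefixed entry, which mirrors the kind of lemma-pair competition described in the abstract without claiming that these are the pairs ALOEV3 actually found. Applied to the abstract's own figures, (1,256 − 38) / 1,256 ≈ 0.970, which matches the reported 97 percent.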


Keywords


Old English; automatic lemmatization; strong verb; morphology; Natural Language Generation; Morphological Generation

References


The Holy Bible Translated from the Latin Vulgate (Douay-Rheims Version). 1899. London: Tan Books and Publishers.

Biber, D., S. Conrad and R. Reppen. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.

Bosworth, J. and T.N. Toller. 1973 (1898). An Anglo-Saxon Dictionary. Oxford: Oxford University Press.

Brunner, K. 1965. Altenglische Grammatik, nach der Angelsächsischen Grammatik von Eduard Sievers. Berlin: Max Niemeyer.

Campbell, A. 1987 (1959). Old English Grammar. Oxford: Oxford University Press.

Clark Hall, J.R. 1996. A Concise Anglo-Saxon Dictionary. Supplement by Herbert D. Merritt. Toronto: University of Toronto Press.

Ferrés, D., A. AbuRa’ed, and H. Saggion. 2017. “Spanish morphological generation with wide-coverage lexicons and decision trees”. Procesamiento del Lenguaje Natural 58: 109-116.

Forcada, M.L., M. Ginestí-Rosell, J. Nordfalk, J. O’Regan, S. Ortiz-Rojas, J.A. Pérez-Ortiz, F. Sánchez-Martínez, G. Ramírez-Sánchez and F. Tyers. 2011. “Apertium: A free/open-source platform for rule-based machine translation”. Machine Translation 25 (2): 127-144.

García Fernández, L. 2020. Lemmatising Old English on a Relational Database. Preterite-Present, Contracted, Anomalous and Strong VII Verbs. Munich: Utzverlag.

García García, L. 2019. “The basic valency orientation of Old English and the causative ja- formation: A synchronic and diachronic approach”. English Language and Linguistics 24 (1): 153-177. doi: 10.1017/S1360674318000345.

García García, L. and E. Ruiz Narbona. 2021. “Lability in Old English verbs: Chronological and textual distribution”. Anglia: Journal of English Philology 139 (2): 283-326. doi: 10.1515/ang-2021-0022.

Healey, A., ed., with J. Price and X. Xiang. 2009. The Dictionary of Old English Web Corpus. Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.

Healey, A., ed. 2018. The Dictionary of Old English: A to I. Dictionary of Old English Project, Centre for Medieval Studies, University of Toronto.

Hogg, R. and R.D. Fulk. 2011. A Grammar of Old English. Oxford: Wiley-Blackwell.

Hostetter, A.K. n.d. The Old English Narrative Poetry Project: Beowulf. Accessed September 2. https://oldenglishpoetry.camden.rutgers.edu/beowulf/

Kastovsky, D. 1992. “Semantics and vocabulary”. The Cambridge History of the English Language I: The Beginnings to 1066. Ed. R. Hogg. Cambridge: Cambridge University Press. 290-408.

Khemakhem, A., B. Gargouri, A. Ben Hamadou and G. Francopoulo. 2015. “ISO standard modeling of a large Arabic dictionary”. Journal of Natural Language Engineering 22 (6): 849-879.

Krygier, M. 1994. The Disintegration of the English Strong Verb System. Frankfurt am Main: Peter Lang.

Levin, S.R. 1964. “A reclassification of the Old English strong verbs”. Language 40 (2): 156-161.

Manjavacas, E., A. Kádár and M. Kestemont. 2019. “Improving lemmatization of non-standard languages with joint learning”. Proceedings of NAACL-HLT 2019. Eds. J. Burstein, C. Doran and T. Solorio. Minneapolis: ACL. 1493-1503.

Martín Arista, J. 2012a. “Lexical database, derivational map and 3D representation”. RESLA-Revista Española de Lingüística Aplicada (Extra 1): 119-144.

Martín Arista, J. 2012b. “The Old English prefix ge-: A panchronic reappraisal”. Australian Journal of Linguistics 32 (4): 411-433. doi: 10.1080/07268602.2012.744264.

Martín Arista, J. 2013a. “Recursivity, derivational depth and the search for Old English lexical primes”. Studia Neophilologica 85 (1): 1-21. doi: 10.1080/00393274.2013.771829.

Martín Arista, J. 2013b. Nerthus. Lexical database of Old English: From word-formation to meaning construction. Research Seminar, School of English, University of Sheffield.

Martín Arista, J., ed. 2016. NerthusV3. Online Lexical Database of Old English. Nerthus Project, Universidad de La Rioja. www.nerthusproject.com.

Martín Arista, J. 2017a. “El paradigma derivativo del inglés antiguo”. Onomázein 37: 144-169.

Martín Arista, J. 2017b. The design and implementation of a pilot parallel corpus of Old English. Paper presented at the SHELL Session of the 2017 International Medieval Conference, University of Leeds, July 4.

Martín Arista, J. 2018. “The semantic poles of Old English. Toward the 3D representation of complex polysemy”. Digital Scholarship in the Humanities 33 (1): 96-111.

Martín Arista, J. 2019. “Another look at Old English zero derivation and alternations”. ATLANTIS 41 (1): 163-182.

Martín Arista, J. 2020a. “Old English rejoice verbs. Derivation, grammatical behaviour and class membership”. POETICA 93: 133-153.

Martín Arista, J. 2020b. “Further remarks on the deflexion and grammaticalization of the Old English past participle with habban”. International Journal of English Studies 20 (1): 51-71.

Martín Arista, J. 2021a. “The syntax and semantics of the Old English predicative construction”. Language Change and Linguistic Theory in the 21st Century. Eds. N. Lavidas and K. Nikiforidou. Leiden: Brill. Forthcoming.

Martín Arista, J. 2021b. “Word alignment in a parallel corpus of Old English prose. From asymmetry to inter-syntactic annotation”. Corpora in Translation Research: Recent Advances and Applications. Eds. J. Lavid-López, C. Maíz-Arévalo and J.R. Zamorano. Amsterdam: John Benjamins. 76-100.

Martín Arista, J. and A.E. Ojanguren López. 2018. Doing electronic lexicography of Old English with a knowledge-base. Workshop delivered at the Consolidated Library of Anglo-Saxon Poetry (CLASP) Project, Faculty of English Language and Literature of the University of Oxford.

Martín Arista, J., S. Domínguez Barragán, L. García Fernández, E. Ruiz Narbona, R. Torre Alonso and R. Vea Escarza, comp. 2021. ParCorOEv2. An Open Access Annotated Parallel Corpus Old English-English. Nerthus Project, Universidad de La Rioja. www.nerthusproject.com.

Mateo Mendaza, R. 2014. “The Old English adjectival affixes ful- and -ful: a text-based account on productivity”. NOWELE-North-Western European Language Evolution 67 (1): 77-94. doi: 10.1075/nowele.67.1.

Mateo Mendaza, R. 2015. “Matching productivity indexes and diachronic evolution. The Old English affixes ful-, -isc, -cund and -ful”. Canadian Journal of Linguistics 60 (1): 1-24.

Mateo Mendaza, R. 2016a. “The search for Old English semantic primes: The case of HAPPEN”. Nordic Journal of English Studies 15: 71-99.

Mateo Mendaza, R. 2016b. “The Old English exponent for the semantic prime MOVE”. Australian Journal of Linguistics 34 (4): 542-559. doi: 10.1080/07268602.2016.1169976.

von Mengden, F. 2011. “Ablaut or transfixation? On the Old English strong verbs”. More than Words: English Lexicography and Lexicology Past and Present. Eds. R. Bauer and U. Krischke. Frankfurt am Main: Peter Lang. 123-139.

Metola Rodríguez, D. 2015. Lemmatisation of Old English strong verbs on a lexical database. Unpublished PhD thesis. University of La Rioja, Spain.

Metola Rodríguez, D. 2017. “Strong verb lemmas from a corpus of Old English. Advances and issues”. Revista de Lingüística y Lenguas Aplicadas 12: 65-76.

Novo Urraca, C. 2015. “Old English deadjectival paradigms. Productivity and recursivity”. NOWELE-North-Western European Language Evolution 68 (1): 61-80.

Novo Urraca, C. 2016a. “Old English suffixation. Content and transposition”. English Studies 97: 638-655.

Novo Urraca, C. 2016b. “Morphological relatedness and the typology of adjectival formation in Old English”. Studia Neophilologica 88 (1): 43-55.

Novo Urraca, C. and A.E. Ojanguren López. 2018. “Lemmatising treebanks. Corpus annotation with knowledge bases”. RAEL: Revista electrónica de Lingüística Aplicada 17 (1): 99-120.

Oflazer, K. and M. Saraçlar, eds. 2018. Turkish Natural Language Processing. Cham: Springer. doi: 10.1007/978-3-319-90165-7.

Ojanguren López, A.E. 2020. “The semantics and syntax of Old English end verbs”. ATLANTIS 42 (1): 163-188.

Pintzuk, S. and L. Plug. 2001. The York-Helsinki Parsed Corpus of Old English Poetry. Department of Language and Linguistic Science, University of York.

Reiter, E. and R. Dale. 1997. “Building applied natural language generation systems”. Natural Language Engineering 3 (1): 57-87. doi: 10.1017/S1351324997001502.

Rissanen, M., M. Kytö, L. Kahlas-Tarkka, M. Kilpiö, S. Nevanlinna, I. Taavitsainen, T. Nevalainen, and H. Raumolin-Brunberg, comp. 1991. The Helsinki Corpus of English Texts. Department of English, University of Helsinki.

Riyeff, J. 2017. The Old English Rule of Saint Benedict with Related Old English Texts. Collegeville: Liturgical Press.

Tapsai, C., H. Unger and P. Meesad. 2021. Thai Natural Language Processing. Cham: Springer.

Taylor, A., A. Warner, S. Pintzuk and F. Beths. 2003. The York-Toronto-Helsinki Parsed Corpus of Old English Prose. York: University of York.

Tío Sáenz, M. 2019. The lemmatisation of Old English weak verbs on a relational database. Unpublished PhD thesis. University of La Rioja, Spain.

Vea Escarza, R. 2013. “Old English adjectival affixation. Structure and function”. Studia Anglica Posnaniensia 48 (2-3): 5-25.

Vea Escarza, R. 2016a. “Recursivity and inheritance in the formation of Old English nouns and adjectives”. Studia Neophilologica 88: 1-23. doi: 10.1080/00393274.2015.1049830.

Vea Escarza, R. 2016b. “Old English affixation. A structural-functional analysis”. Nordic Journal of English Studies 15 (1): 101-119.

Vea Escarza, R. 2018. “Las funciones y categorías de los nombres y adjetivos afijados del inglés antiguo”. Onomázein 41: 208-226.




DOI: https://doi.org/10.18172/jes.5324

Copyright (c) 2022 Roberto Torre Alonso

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

© Universidad de La Rioja, 2013

ISSN 1576-6357

EISSN 1695-4300