AUTOMATIC LEMMATIZATION OF OLD ENGLISH CLASS III STRONG VERBS (L-Y) WITH ALOEV3

This article presents ALOEV3, a lemmatizer based on Morphological Generation that allows for the type-based automatic lemmatization of Old English Class III strong verbs beginning with the letters L–Y. The lemmatizer operates on the basis of the inflectional, derivational and morpho-phonological alternation rules characteristic of this class. The generated form-types are checked against the two most reputed Old English corpora, namely the Dictionary of Old English Corpus and The York-Toronto-Helsinki Parsed Corpus of Old English Prose to validate their attestations and assign the corresponding lemma. Results show that 97 percent of the validated forms are successfully assigned a single lemma. The remaining inflectional forms (38 out of 1,256) show competition between two lemmas, which implies that despite the high level of accuracy of the lemmatizer, contextual, token-based analysis is still needed for disambiguation. However, the research shows that competition only occurs in a limited set of lemma pairs and their derivatives. Although the research focuses on but one strong verb class, it confirms that exploring the avenues of automatic lemmatization will contribute to the field of Old English lexicography by either lemmatizing attested inflectional form types or by highlighting areas for manual revision.


AIMS AND RELEVANCE
This article explores the limits and potential of automating the lemmatization process for Old English (OE) Class III strong verbs beginning with the letters L-Y. The aim of this study has been to develop ALOEV3, a tool that performs the morphological generation (MG) of the paradigms of the selected verbs and assesses the attestation of the generated forms in the major OE corpora. Four partial goals have been targeted: (i) to develop a morphological generator that can turn out OE inflectional forms consistent with the morphological paradigms described in the grammars of this historical language; (ii) to implement ALOEV3 with sets of rules that guide the MG of Class III strong verb forms (both simplex and complex), taking morpho-phonological and spelling alternation variants into consideration; (iii) to automatically assess the attestation of the generated forms in the authoritative OE corpora, Healey et al.'s (2009) The Dictionary of Old English Corpus (DOEC) and Taylor et al.'s (2003) The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE); and (iv) to link the attested forms with their corresponding lemma.
Lemmatization has been one of the targets of Natural Language Generation (NLG) theory, defined by Reiter and Dale (1997: 1) as "[…] the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information". Indeed, according to Manjavacas et al. (2019: 1493), "lemmatization is considered to be solved for analytic and resource-rich languages such as English". To mention just a few, NLG has been applied to different modern languages, including Spanish (Forcada et al. 2011), Arabic (Khemakhem et al. 2015), Turkish (Oflazer and Saraçlar 2018) and Thai (Tapsai et al. 2021).
However, lemmatization is still a pending task for low-resource languages (including historical languages like OE) for a number of reasons, including overlaps of graphic forms resulting from morphological and phonological developments as well as from diatopic variation, on the one hand, and the relatively limited amount of textual data, on the other. The surviving OE word stock is reduced to a few million words stored in defective, fragmentary corpora. The major corpora compiling OE written records are the DOEC, comprising 3,000,000 words, and the YCOE, which includes about 1,500,000 words. Further, several smaller corpora, including Rissanen et al.'s (1991) The Helsinki Corpus of English Texts (300,000 words) and Pintzuk and Plug's (2001) The York-Helsinki Parsed Corpus of Old English Poetry (70,000 words), are at the researcher's disposal. However, these corpora have not yet been lemmatized, although the YCOE and the York-Helsinki corpus do provide linguistic metadata, including morphological tagging and syntactic parsing. Martín Arista et al.'s (2021) An open access annotated parallel corpus Old English-English constitutes the most recent addition to the repository of OE corpora. It currently files 110,000 word tokens, which, unlike the corpora cited above, are provided with glosses and translations and are fully lemmatized.
Most importantly, the lack of a lemmatized corpus prevents lemma-form associations from being established. Thus, the development of automatized text taggers, which need abundant linguistic (meta)data for machine training, is heavily restricted. As Manjavacas et al. (2019: 1493) explain, "for languages with higher surface variation, lemmatization plays a crucial role as a preprocessing step for downstream tasks such as topic modelling, stylometry and information retrieval." Even if some editions of the texts include glossaries, there is considerable divergence in their structure and lemma selections both internally and intratextually. This is also the case with dictionaries. Bosworth and Toller's (1973 [1898]) An Anglo-Saxon Dictionary and Clark Hall's (1996) A Concise Anglo-Saxon Dictionary show several inconsistencies and differences in headword or lemma selections. The Dictionary of Old English, by Healey et al. (2018), does offer a rather systematic lemma selection process in addition to including lists of attested forms linked to each lemma. However, it has only been completed up to the letter I at this point.
Given all these limitations, this article explores the potential of the MG of a set of OE Class III strong verbs as a means to contribute to the lemmatization of OE corpora. Ferrés et al. (2017: 110) define MG as 'the task of producing the appropriate inflected form of a lemma in a given textual context and according to some morphological features.' Figure 1 shows an example of MG with regard to the OE verb swelgan 'to swallow', which, inflected for person (third), number (singular), tense (present) and mood (indicative), generates the form swilhþ 'he/she swallows.'
With regard to the methodological approach adopted, this article is type-based rather than token-based, which entails the methodological decision of lemmatizing in two steps. Firstly, a type, that is to say, an abstraction from all the attestations of an inflection, is lemmatized. Then, in a second step, which necessarily involves disambiguation, all the attestations are lemmatized in their respective contexts. This article takes the first of these steps.
The outline of the article is as follows. Section 2 describes previous research on the automatic lemmatization of the OE corpora. Section 3 discusses the subclassification of Class III verbs and limits the scope of the research. Section 4 describes the methodological underpinnings. Section 5 discusses the results of the research and elaborates on the limits of automation. Finally, Section 6 summarizes the most relevant findings, presents some conclusions, and offers paths of research yet to be explored.

PREVIOUS RESEARCH ON OE (SEMI)-AUTOMATIC LEMMATIZATION
Several works have recently been published which engage in the semi-automatic lemmatization of OE corpora. Novo Urraca and Ojanguren López (2018) have successfully incorporated lemma assignment into the YCOE syntactic treebanks. Metola Rodríguez (2015; 2017) has tackled the lemmatization of strong verbs; Tío Sáenz (2019) has dealt with weak verbs; and García Fernández (2020) has engaged in the identification of lemmas for preterite-present, anomalous, and contracted verbs. These authors base their research on the knowledge base The Grid (Martín Arista 2013b), while adopting different approaches and scopes that will be discussed in turn. Among other data, The Grid includes a dictionary of Old English (Nerthus), a secondary source indexing dictionary (Freya) and indexed versions of the DOEC and the YCOE.
As for semi-automatic verb lemmatization, Metola Rodríguez (2015) and Tío Sáenz (2019) follow a similar method and design sets of specific query strings (QS), guided by the specificities of the strong and weak verbal morphology. Their goal is to detect text strings in the DOEC and YCOE indexes compatible with the hypothetical inflectional forms of the corresponding verb classes. However, their research is limited by the possibilities of the search tools included in the storing database program. For example, some QS make use of wildcard symbols to replace strings of text (e.g. (*) replaces an unspecified number of characters), which results in the retrieval of a considerable number of undesired results. This obliges the researchers to develop filters and also increases the amount of manual revision needed to rule such results out. To mention just a few, undesired results obtained from searches of the paradigm of beadan 'to command' include the forms beada 'counselor' and beadas 'tables'.
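The over-retrieval problem can be illustrated with a minimal sketch. The query string `bead*` and the four-item index are hypothetical stand-ins, not the actual QS or indexes used in the works cited; only the forms beadan, beada and beadas come from the discussion above.

```python
from fnmatch import fnmatchcase

# Hypothetical excerpt of a corpus index (illustrative only).
index = ["beadan", "beada", "beadas", "bindan"]

# Hypothetical query string: the stem followed by a wildcard that
# stands for an unspecified number of characters.
query = "bead*"

# Every index entry compatible with the wildcard pattern is retrieved,
# whether or not it is a verbal form.
hits = [form for form in index if fnmatchcase(form, query)]
print(hits)
```

The wildcard retrieves the nominal forms beada and beadas alongside the verbal form, which is why filters and manual revision are needed afterwards.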
While they adopt similar search strategies, Metola Rodríguez (2015) and Tío Sáenz (2019) differ in the scope of their research in two ways. For one, Metola Rodríguez (2015; 2017) does not take participial forms into consideration, whereas Tío Sáenz (2019) acknowledges the verbal nature of past and present participial adjectives and develops a particular QS to detect them. For another, Metola Rodríguez (2015; 2017) compares his data with the DOE, while Tío Sáenz (2019) additionally checks her results against the YCOE. This allows her to lemmatize 6,300 inflectional forms of verbs I-Y not included in the DOE. At the same time, Tío Sáenz (2019: 544) admits the need for further validation of these forms with the corresponding fragments from the DOEC. As for accuracy, Metola Rodríguez (2017: 73) claims that, before manual revision, his method allows the validation of 80% of the proposed forms when compared to the inflectional forms provided by the DOE.
By contrast, García Fernández (2020) develops a system closer in nature to MG. She compiles a list of inflectional forms of simplex verbs based upon and attested in various grammars, including Brunner (1965), Campbell (1987 [1959]), and Hogg and Fulk (2011). To each of those forms, García Fernández (2020) attaches the different spelling variants of the prefixes listed by Kastovsky (1992) to develop complex inflectional forms. The question that arises from this approach is the direct relation that García Fernández (2020) assumes between the simplex and the complex forms of the verbs. If complex forms are derived on the basis of the spelling of the attested simplexes, the lack of a given simplex form or a change to its spelling prevents the identification in the corpora of the corresponding derivative. To illustrate this question, consider the case of beþearfst 'you have need', which can be neither identified nor lemmatized in the paradigm of beþurfan because a form þearfst is not listed as an inflectional form of the verb þurfan 'to need'. The attested forms of þurfan include ðearf, ðearfende, ðearft, ðorfeð, ðorfað, ðorfaeð, ðorfte, ðorfton, ðuran, ðurfe, ðurfon, ðurfu, ðyrfe, þearf, þearfende, þearft, þorfende, þorfonde, þorfte, þorfton, þurfan, þurfe, þurfende, þurfon, þurfu, þurfun, and þyrfe (García Fernández 2020: 133). However, the form is attested in the DOEC as shown in (1).
(1) Ic saede drihtne god min eart þu forþon goda minra þu ne beþearfst
'I have said to the Lord, thou art my God, for thou hast no need of my goods.' (Douay-Rheims 1971 [1899]: 586)
The reviewed works constitute, to the best of my knowledge, the only available research aimed at the automatic verb lemmatization of the OE corpora. Against the described scenario, Section 3 limits the scope of this research.

SCOPE
Considering previous research and the current state of publication of the DOE, the scope of this study is limited to the letters L-Y, whose inflections have not been provided yet. This section delves into the features of Class III that allow the identification of form patterns upon which automatic MG processes can be developed.
The classification of OE verbs into seven classes based on the ablaut patterns of the verbal paradigms constitutes, according to von Mengden (2011), an almost undiscussed description of the grammar of this language. As for Class III, the taxonomy given in Figure 2 is generally accepted.

[Figure 2. Column headings: Infinitive, Preterite 1]
Nevertheless, the state of affairs displayed in Figure 2 constitutes an oversimplified representation of the various types of OE Class III verbs. To pave the way for later discussion, it is necessary to offer a closer description of Class III subtypes. Levin (1964) and Campbell (1987 [1959]) distinguish five Class III subclasses, as seen in Figure 3.
Figure 3. Levin's (1964) Classification of OE Class III Strong Verbs.
Class III verbs are characterized by presenting a single vowel and a consonant cluster in the stem. The phonological differences in the first consonant of the cluster justify the first three subclasses given in Figure 3. Thus, Class IIIa verbs have a nasal sound after the vowel (closed to <i> from Proto-Germanic <e>); Class IIIb verbs present a liquid <l> after the vowel, which remains unchanged; and Class IIIc verbs display a liquid <r> which triggers breaking in the stem vowel. Class IIId includes the aorist present verbs spurnan 'spurn' and murnan 'mourn', and Class IIIe comprises a group of verbs which were originally Class V. Class IIIe verbs display a syllabic pattern <e+C> (where C is neither a nasal nor a liquid). Their infinitives were extended with a dental suffix, as is the case with stregdan 'strew' and feohtan 'fight' (Campbell 1987 [1959]: 303).
Considering that the ablaut series described by Levin (1964) are identical for subclasses b, c, and d, but for the infinitive vowel, Krygier (1994) advances a three-subclass taxonomy. He postulates that Class IIIa verbs have a nasal sound after the vowel and display the ablaut pattern iNC - a/oNC - uNC - uNC; Class IIIb verbs display a liquid consonant <l, r> and the ablaut pattern elC/eorC/urC - eaLC - uLC - oLC; and Class IIIc includes those originally Class V verbs described by Campbell (1987 [1959]).
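The three-way taxonomy lends itself to a simple heuristic, sketched below. This is an illustration rather than ALOEV3's own procedure: the function name, the vowel inventory and the decision to treat every remaining verb as Class IIIc are assumptions, and the sketch only handles simplex infinitives in -an.

```python
import re

# Assumed vowel inventory for the purposes of this sketch.
VOWELS = "aeiouyæ"

# Onset consonants, stem vowel(s), consonant cluster, infinitive ending -an.
STEM = re.compile(rf"^[^{VOWELS}]*([{VOWELS}]+)([^{VOWELS}]+)an$")


def krygier_subclass(infinitive: str) -> str:
    """Guess the subclass from the first consonant after the stem vowel."""
    match = STEM.match(infinitive)
    if match is None:
        raise ValueError(f"not a simplex infinitive in -an: {infinitive}")
    first_consonant = match.group(2)[0]
    if first_consonant in "mn":   # nasal after the vowel
        return "IIIa"
    if first_consonant in "lr":   # liquid after the vowel
        return "IIIb"
    return "IIIc"                 # assumed fallback: neither nasal nor liquid


for verb in ["singan", "swelgan", "sneorcan", "murnan", "feohtan", "stregdan"]:
    print(verb, krygier_subclass(verb))
```

A heuristic of this kind relies on spelling alone, so verbs whose cluster has been altered by later sound changes would still need to be classified by hand.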

METHOD
This section unfolds the methodological underpinnings of this research. There are two basic processes, namely, the generation of plausible inflectional forms for each of the selected verb paradigms and the automatic identification of the generated forms in the authoritative corpora. Given the limitations in the available data described in Section 1, ALOEV3 generates a large number of forms that cannot be attested in the corpora.
To start with, ALOEV3 generates complete verb paradigms on the basis of attested inflectional patterns. The paradigm of sneorcan 'shrivel' is given in (2). The inflectional paradigm in (2) does not show well-attested changes involving vocalic alternation, syncopation, assimilation, Verner's law, or the simplification of consonant clusters. It provides, however, a basic schema upon which MG rules can be implemented to account for variation in (i) the inflectional endings, including the assimilation and simplification of consonants, the weakening of vowels in unaccented syllables as well as diatopic spelling varieties and (ii) the stem, where i-mutation and Verner's law operate.
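A basic schema of this kind can be sketched as a mapping from stems to endings. The sketch is an illustration, not ALOEV3's code: the stems of sneorcan are derived here from the Class IIIb ablaut pattern cited in Section 3 (eorC - eaLC - uLC - oLC), the endings are the standard strong verb endings found in the grammars, and the cell labels and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stems:
    present: str  # infinitive and present stem
    pret1: str    # preterite 1st/3rd person singular
    pret2: str    # preterite 2nd person singular and plural
    pptc: str     # past participle


# Standard endings per stem; no formal variation is modelled at this stage.
ENDINGS = {
    "present": {
        "inf": "an", "pres.ind.1sg": "e", "pres.ind.2sg": "est",
        "pres.ind.3sg": "eþ", "pres.ind.pl": "aþ", "pres.subj.sg": "e",
        "pres.subj.pl": "en", "imp.sg": "", "imp.pl": "aþ",
        "pres.ptc": "ende",
    },
    "pret1": {"pret.ind.1/3sg": ""},
    "pret2": {"pret.ind.2sg": "e", "pret.ind.pl": "on",
              "pret.subj.sg": "e", "pret.subj.pl": "en"},
    "pptc": {"past.ptc": "en"},
}


def basic_paradigm(stems: Stems) -> dict:
    """Combine each stem with its endings into one flat paradigm."""
    paradigm = {}
    for stem_name, cells in ENDINGS.items():
        stem = getattr(stems, stem_name)
        for cell, ending in cells.items():
            paradigm[cell] = stem + ending
    return paradigm


# Stems assumed from the IIIb ablaut pattern, not taken from (2).
sneorcan = basic_paradigm(Stems("sneorc", "snearc", "snurc", "snorc"))
for cell, form in sneorcan.items():
    print(f"{cell:16} {form}")
```

The flat output is the input on which the inflection, mutation and derivation rules described in the following sections would operate.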
Given the high degree of formal variation in OE, the MG of inflectional forms has been addressed in three stages, namely inflection (4.1), mutation (4.2) and derivation (4.3). To round off, section (4.4) describes the automatized attestation process.

MG OF CLASS III VERBS: INFLECTIONAL FORMS
ALOEV3 has been implemented with MG rules to generate inflectional endings subject to formal variation. For the development of the rules given in Figure 4 below, I draw on Campbell (1987: 299-300).
Each of the rules in Figure 4 addresses a specific process of formal variation. Several processes may operate upon the same inflectional form. Thus, rules I_#1, I_#2 and I_#3 apply to the 1st person singular of the present indicative forms. MG processes can be recurrent or non-recurrent. A non-recurrent MG process implies the application of an MG rule which puts an end to the MG process. In recurrent MG processes, a form generated by a specific MG rule re-enters the system for further rules to operate upon it. More precisely, the forms generated by I_#4 are processed again and ALOEV3 checks whether I_#5, I_#7, I_#9 or I_#10 need to be applied. Rules I_#29 through I_#31 deserve further comment. These rules generate present and past participle forms which, in OE, can be inflected following the weak and strong adjectival paradigms. In line with Campbell (1987 [1959]: 266-272), rules I_#32-I_#75 have been generated for ALOEV3 to attach the endings ø; -ne; -es; -um; -e; -ra; -u; -re; -a; -an; and -ena to the generated participial forms.
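The difference between recurrent and non-recurrent MG processes can be sketched with a small worklist. The two rules below (syncope of the ending -eþ and assimilation of -dþ to -t) are illustrative stand-ins chosen for brevity; they are not the rules I_#1 to I_#75 of Figure 4, and the names `Rule` and `generate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    apply: Callable[[str], Optional[str]]  # None when the rule does not fire
    recurrent: bool                        # does the output re-enter the system?


def syncope(form: str) -> Optional[str]:
    # Illustrative rule: -eþ > -þ
    return form[:-2] + "þ" if form.endswith("eþ") else None


def assimilation(form: str) -> Optional[str]:
    # Illustrative rule: -dþ > -t
    return form[:-2] + "t" if form.endswith("dþ") else None


RULES = [
    Rule("syncope", syncope, recurrent=True),
    Rule("assimilation", assimilation, recurrent=False),
]


def generate(base: str) -> set:
    """Apply every rule; only outputs of recurrent rules are processed again."""
    forms = {base}
    queue = [base]
    while queue:
        current = queue.pop()
        for rule in RULES:
            new_form = rule.apply(current)
            if new_form is None or new_form in forms:
                continue
            forms.add(new_form)
            if rule.recurrent:
                queue.append(new_form)
    return forms


print(sorted(generate("windeþ")))
```

Here the syncopated form re-enters the system, so assimilation can operate on it, whereas the output of the non-recurrent rule ends the process for that form.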

MG OF CLASS III VERBS: STEM MUTATION
Three sets of rules have been designed to account for vocalic and consonantal changes taking place in the verb stem. Such changes include i-mutation, Verner's law, and the simplification of consonantal clusters, among others. Evidence supporting these rules is provided by Krygier (1994: 44-49) and Campbell (1987 [1959]: 310-312). These rule sets are class-specific and only operate on the MG inflectional forms belonging in the corresponding sub-class. Figures 5, 6,
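Two stem changes of this kind can be sketched as string rewrites. The rules are modelled on forms discussed in Section 5 (sungen alongside swungen, and stincþ and swincst from stingan and swingan); the regular expressions and function names are assumptions made for illustration and do not reproduce ALOEV3's rule sets.

```python
import re


def drop_w_before_u(form: str) -> str:
    # Assumed rule: w is lost between a consonant and u (swungen > sungen).
    return re.sub(r"(?<=[^aeiouyæ])wu", "u", form)


def devoice_g(form: str) -> str:
    # Assumed rule: g > c before the endings -st and -þ/-ð (stingþ > stincþ).
    return re.sub(r"g(?=(st|þ|ð)$)", "c", form)


def stem_variants(form: str) -> set:
    """Return the input form together with any mutated variants."""
    return {form, drop_w_before_u(form), devoice_g(form)}


print(sorted(stem_variants("swungen")))
print(sorted(stem_variants("stingþ")))
print(sorted(stem_variants("swingst")))
```

Keeping the unmutated input in the output set means that both spellings remain available for the attestation check.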

MG OF CLASS III VERBS: DERIVATION
The application of the inflection and mutation rule sets described in sections 4.1 and 4.2 allows for the generation of full paradigms of the simplexes of the verbs under analysis. This section addresses the generation of their complex counterparts.
Complex forms of the selected verb lemmas and of the MG inflectional forms are generated by attaching the set of L-Y prefixes described by Metola Rodríguez (2015) and García Fernández (2020) to each of the MG forms. In spite of lying outside the alphabetical scope of this research, the prefix ge- has also been included in the inventory, given its salient role in the formation of past participle forms. The automation of the process precludes the development of target-specific rules that guide the attachment of prefixes to particular sets of forms. Consequently, the application of the derivational MG rules results in the MG of complete ge-prefixed paradigms. OE prefixes are also subject to spelling variation, which is the reason why a rule has been designed for each prefix formal variant. As the aim of this article is to advance in the lemmatization of OE, these prefixes have been classified into canonical (lemma) and non-canonical forms. (3) shows the canonical form and non-canonical forms (in brackets) of the selected prefixes.
Summarizing, ALOEV3 has been designed to generate standardized Class III verb paradigms. Rules have been implemented in the system to account for formal variation and word formation processes. As a result, a pool of potential OE Class III verb forms is developed. Each of the forms is provided with a lemma tag, as shown by Figure 1 above.
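The attachment of prefix variants and the assignment of the canonical lemma tag can be sketched as follows. Only two prefixes are included, with the variants gi- and met- inferred from the forms gisunge and metsunge discussed in this article; the dictionary layout and the function name are hypothetical, and the full inventory in (3) is not reproduced.

```python
# Canonical prefix -> spelling variants (the canonical form comes first).
# Partial, illustrative inventory only.
PREFIXES = {
    "ge": ["ge", "gi"],
    "med": ["med", "met"],
}


def derive(simplex_lemma: str, simplex_forms: list) -> list:
    """Return (complex form, complex lemma) pairs for every prefix variant."""
    pairs = []
    for canonical, variants in PREFIXES.items():
        # The lemma tag always uses the canonical spelling of the prefix.
        complex_lemma = canonical + simplex_lemma
        for variant in variants:
            for form in simplex_forms:
                pairs.append((variant + form, complex_lemma))
    return pairs


for form, lemma in derive("singan", ["sunge"]):
    print(form, "->", lemma)
```

Because every variant is attached to every form, whole prefixed paradigms are produced, which matches the untargeted behaviour described above.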
Before moving on to the next stage, MG forms must be reduced to types. In the MG process, homographic, formally ambiguous forms have been generated both within and across paradigms. Consequently, intra-paradigmatic homographs must be ruled out. To do so, ALOEV3 searches each paradigm for duplicated MG forms and reduces them to one single occurrence, as shown by Figure 8. At the same time, the process described in Figure 8 allows for the presence of formally ambiguous forms in different paradigms. This is the case of the forms metsunge (medsingan) and metsunge (medswingan). The identification in the corpora of these ambiguous forms will give rise to instances of lemma competition. The implications hereby derived will be discussed in section 5.
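The reduction to types, and the cross-paradigm ambiguity it deliberately leaves in place, can be sketched in a few lines. The toy paradigms are illustrative: apart from metsunge, the listed forms are placeholders, and the function names are hypothetical.

```python
from collections import defaultdict


def reduce_to_types(paradigm_forms: list) -> list:
    """Keep one occurrence of each form within a paradigm (order preserved)."""
    return list(dict.fromkeys(paradigm_forms))


def competing_types(paradigms: dict) -> dict:
    """Map each form generated in more than one paradigm to its lemmas."""
    lemmas_by_form = defaultdict(set)
    for lemma, forms in paradigms.items():
        for form in reduce_to_types(forms):
            lemmas_by_form[form].add(lemma)
    return {
        form: sorted(lemmas)
        for form, lemmas in lemmas_by_form.items()
        if len(lemmas) > 1
    }


# Toy paradigms: metsunge is generated twice within medsingan
# and once more in the paradigm of medswingan.
paradigms = {
    "medsingan": ["metsunge", "metsunge", "medsungen"],
    "medswingan": ["metsunge", "medswungen"],
}

print(reduce_to_types(paradigms["medsingan"]))
print(competing_types(paradigms))
```

Duplicates disappear inside each paradigm, while the homograph shared by the two lemmas survives and is flagged as a case of lemma competition.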

ASSESSING THE ATTESTATION OF FORMS IN THE CORPORA
The final part of the method tackles the identification of the MG forms in the selected corpora. Such identification is automatically carried out by means of three interrelated databases: (i) ALOEV3_DB, a database filing ALOEV3's generated form types; (ii) DOEC_DB, which stores the indexed version of the DOEC (Healey et al. 2009) retrieved from Martín Arista's (2013b) The Grid; and (iii) YCOE_DB, a database filing those words that are labelled with a verbal part-of-speech (POS) tag in the YCOE (Taylor et al. 2003). A list of POS tags and their meaning is given in Appendix 1.
ALOEV3_DB includes a field for the MG form (Inflectional form), a field for the lemma from which the form has been generated (Class III Lemma), a field to check attestation in the DOEC (DOEC attestation), a field to check attestation in the YCOE (YCOE attestation), and a field for the YCOE POS (Tag summary) for forms attested in the YCOE. DOEC_DB incorporates a field for the indexed form in the DOEC (ConcTerm), a field for the text before the concorded term (Prefield), and a field for the text following the concorded term (Postfield). YCOE_DB presents a field for the inflectional form in the YCOE (YCOE_verbal_form) and a field for the different verbal POS tags assigned to such form (YCOE_verbal_tag). Figure 9 offers a view of limpe in each of the databases. The three databases are related to one another through the fields Inflectional form, ConcTerm, and YCOE_verbal_form. Consequently, if there is a spelling coincidence between the form in Inflectional form and ConcTerm and/or the YCOE_verbal_form, the corresponding DOEC_attestation and/or YCOE_attestation fields (YES) are activated. Such is the case of limpe in Figure 9, whose attestation is confirmed in both corpora. Figure 10 shows these relationships.
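The relationship between the three databases amounts to an exact spelling match on the key fields. The sketch below uses plain Python structures instead of a database program; the field names follow the description above, whereas the sample records, the placeholder tag for limpe, the unattested placeholder form and the value "NO" for inactive fields are assumptions.

```python
# ALOEV3_DB: generated form types with their lemma tags.
aloev3_db = [
    {"Inflectional form": "limpe", "Class III Lemma": "limpan"},
    # Placeholder string standing in for an unattested MG form.
    {"Inflectional form": "xxlimpe", "Class III Lemma": "limpan"},
]

# DOEC_DB: the ConcTerm field of the indexed corpus (sample entries).
doec_concterms = {"limpe", "gelimpe"}

# YCOE_DB: YCOE_verbal_form -> YCOE_verbal_tag values.
# "TAG" is a placeholder; the tag of limpe is not given in the text.
ycoe_verbal = {"limpe": ["TAG"], "ðerscan": ["VB"]}


def check_attestation(record: dict) -> dict:
    """Activate the attestation fields on an exact spelling coincidence."""
    form = record["Inflectional form"]
    tags = ycoe_verbal.get(form, [])
    return {
        **record,
        "DOEC attestation": "YES" if form in doec_concterms else "NO",
        "YCOE attestation": "YES" if tags else "NO",
        "Tag summary": ", ".join(tags),
    }


for row in map(check_attestation, aloev3_db):
    print(row)
```

Since matching is purely orthographic, a form is validated whenever its spelling occurs in the index, regardless of the word class of the token behind it; this is the source of the ambiguity discussed in Section 5.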

RESULTS AND DISCUSSION
ALOEV3 has generated 1,101,555 form types, which, after being filed in ALOEV3_DB, are compared with the inflectional forms filed in DOEC_DB and YCOE_DB. On the quantitative side, this comparison identifies 571 types in the DOEC belonging to 150 distinct lemmas (Appendix 2a); 653 in both the DOEC and the YCOE corresponding to 121 different lemmas (Appendix 2b); and 31 types from 14 lemmas which are only attested in the YCOE (Appendix 2c). The identified types are grouped by lemma, which may or may not be itself attested. When the generated form is identified in the YCOE, the POS tag is provided alongside. (4) offers an account of the forms of þerscan 'to strike, beat' identified in the corpora.

YCOE: ðerscan (VB).
All in all, 1,256 form types have been identified in the corpora, corresponding to 192 distinct lemmas. On the qualitative side, 1,218 forms are assigned a distinct lemma. This means that the MG lemmatizer reaches 96.97% efficiency when assigning a single lemma to Class III strong verb inflectional forms, an improvement of about 17 percentage points with respect to the 80% reported for Metola Rodríguez's (2017) approach. The remaining 38 forms are duplications which show competition among lemmas. The competing lemmas and inflectional forms are given in (5).
(5) gesingan ~ geswingan: gesungaen, gesunge, gesungen, gesungena, gesungene, gesungenne, gesungenre, gesungenum, gisunge; medrinnan ~ murnan: mearn; medsingan ~ medswingan: metsunge; stincan ~ stingan: stincð, stincþ; singan ~ swingan: sunge, sungen, sungene, sungenne; swincan ~ swingan: swincst, swincð.
As seen in (5), most of the lemma competition occurrences involve the lemmas singan 'to sing' and swingan 'to beat, strike' and their derivatives. Overlapping cases mostly occur in the preterite and past participle forms. Campbell (1987 [1959]: 310) attests the occurrence of a past participle sungen as well as a form swungen in the Martyrology, which justifies the implementation of the rule -w- > -u- > ø when followed by -u- in the lemmatizer. The other cases in conflict involve the devoicing of /g/ into /k/ through assimilation processes, thus giving way to homophonic and homographic forms for the verbs stincan ~ stingan and swincan ~ swingan. The last case in competition corresponds to the inflectional form mearn, which is assigned both to the prefixed lemma medrinnan (through the syncopated form of the prefix, metathesis of rinnan into irnan and the retracted form of the preterite instead of the most common -ea- diphthong) and to the aorist present murnan. The complexity of the processes justifying the association of mearn with the lemma medrinnan suggests it is not a correct association. Indeed, a quick search in the DOEC reveals that there are only six occurrences of this inflectional form, four of which appear in Beowulf. In all cases, the verb at stake is murnan 'to care, be anxious, mourn'. Take (6) as illustration.
'Beowulf made himself ready with noble armor, he didn't mourn for his life' (Hostetter n.d.)
This and other examples justify the need for additional manual revision of the results. A fully automated method cannot be put forward for two main reasons: the limited scope of the research and the type-based approach adopted.
As for the scope of the research, only Class III strong verbs are analysed in this article. OE morphological features and spelling variation lead to the development of formally ambiguous forms, not only within a single paradigm of a given lexical class, as shown in (2) above, but also across paradigms and lexical classes. As an example, consider the form gelimpe, which ALOEV3 generates in the paradigm of gelimpan. This form, however, can also be associated with the nominal paradigm of the neuter noun gelimp 'an event, accident'. In such cases, disambiguation and lemma assignment must come from token-based lemmatization.
The second reason is that type-based lemmatization is context independent and, consequently, semantic features and syntagmatic relationships cannot be taken into account. ALOEV3 assigns a lemma to a form-type, but disregards the number of occurrences as well as the context(s) in which such form is attested. However, type-based lemmatization is useful when it comes to identifying forms in conflict and areas of formal overlapping.
In the case of gelimpe, there are 58 occurrences attested in the DOEC. The contextual analysis of these forms will prove whether the form-lemma association established by ALOEV3 between gelimpe and gelimpan holds true in each context. As illustration, (7) shows a confirmed association.
(7) [BenR 027200 (11.36.1)] […], butan hit faerlice swa gelimpe, […]
'[…] unless it happen by chance […]' (Riyeff 2017: 67)
Once the association is confirmed, the lemma as well as any other relevant inflectional information can be incorporated by the lexicographer into a lemmatized corpus such as ParCorOEv2. The combination of both tools represents a remarkable advance in the process of lemmatizing the existing OE corpora. ALOEV3 establishes a wide net of relations between forms and lemmas, while the contextual analysis provided by ParCorOEv2 can confirm the accuracy of the association. When multiple lemmas are associated with a given form type, the lexicographer is provided with a closed set of lemmas to choose from, which guides the manual revision and lemma selection processes, thus enhancing the overall efficiency of the method.
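The figures reported in this section can be cross-checked against the list in (5). The sketch below assumes that sungenne belongs with the pair singan ~ swingan and that each competing type is counted once per lemma; under that reading, the 38 duplications correspond to 19 types with two candidate lemmas each. The variable names are hypothetical.

```python
# Competing lemma pairs and their shared forms, transcribed from (5).
competition = {
    ("gesingan", "geswingan"): [
        "gesungaen", "gesunge", "gesungen", "gesungena", "gesungene",
        "gesungenne", "gesungenre", "gesungenum", "gisunge",
    ],
    ("medrinnan", "murnan"): ["mearn"],
    ("medsingan", "medswingan"): ["metsunge"],
    ("stincan", "stingan"): ["stincð", "stincþ"],
    ("singan", "swingan"): ["sunge", "sungen", "sungene", "sungenne"],
    ("swincan", "swingan"): ["swincst", "swincð"],
}

TOTAL_IDENTIFIED = 1256  # form types identified in the corpora

# Distinct competing types, and form-lemma assignments they give rise to.
competing_types = sum(len(forms) for forms in competition.values())
competing_assignments = sum(
    len(pair) * len(forms) for pair, forms in competition.items()
)

single_lemma = TOTAL_IDENTIFIED - competing_assignments
efficiency = round(100 * single_lemma / TOTAL_IDENTIFIED, 2)

print(competing_types)        # 19
print(competing_assignments)  # 38
print(single_lemma)           # 1218
print(efficiency)             # 96.97
```

The tally reproduces the 1,218 single-lemma forms and the 96.97% rate given above, which suggests that the reported totals count form-lemma associations rather than bare spellings.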

CONCLUSIONS
NLG greatly depends on the existence of large, lemmatized corpora. Given that the extant OE corpora are not lemmatized, there is a great gap to bridge before considering OE as a candidate language to conduct NLG research on an extensive basis. This article has presented a way to fill this gap, by exploring and checking the possibilities of carrying out a systematic and automatized type-based lemmatization process of the OE corpora. A semi-automatic process of lemmatization has been proposed, rooted in the theoretical framework of MG. The research has been guided by three main aims, namely automation, validation, and accuracy. Regarding automation, several issues have arisen, especially regarding the spelling and morphological features of OE. Nevertheless, the article has proven that type-based lemmatization can be largely automatized, thus speeding up and systematizing the lemmatization of OE. As for validation, the proposed method generates plausible OE word forms and compares them with the major OE corpora. This constitutes a step forward with respect to previous research on OE morphology based on dictionaries. As for accuracy, the inferential approach followed here results in the identification of specific forms in the corpora which are mostly assigned a single lemma. Thus, the amount of manual revision is greatly reduced.
In this exploration of non-manual lemmatization, the limits of automation in the current state of affairs have also been reached. While type-based lemmatization can be automatized to a large extent, the spelling and morphological properties of OE prevent the MG analysis from systematizing and implementing rules to account for all the spelling variants found in the corpora and from assigning a distinctive lemma to all the validated forms. However, as research advances through the strong verb classes, light will be shed upon this matter, either by making the disambiguation of homographic forms possible or, at least, by highlighting those instances that require contextual analysis.
This article opens two avenues of research. The first is the completion of the analysis of Class III strong verbs. There are two major groups of verbs that have been disregarded. First, only the letters L-Y have been considered. Second, this article has dealt with simplex and single prefixed verbs. This means that cases of recursive prefixation like inbestingan 'to penetrate' or onawinnan 'to fight against' have been left out. The second avenue involves extending this method to other strong verb classes and later to other variable lexical categories.