Issues of Homonyms/Homoforms Automatic Recognition in Applied Linguistics

Akerke Meirbekova; Anar Fazylzhanova; Aiman Zhanabekova; Aigul Amirbekova; Gulnara Talgatkyzy

doi:10.18172/cif.6716

Authors

Akerke Meirbekova A. Baitursynov Institute of Linguistics https://orcid.org/0009-0000-2998-1349
Anar Fazylzhanova A. Baitursynov Institute of Linguistics
Aiman Zhanabekova A. Baitursynov Institute of Linguistics
Aigul Amirbekova A. Baitursynov Institute of Linguistics
Gulnara Talgatkyzy A. Baitursynov Institute of Linguistics

Keywords:

linguistic corpus, morphological analyzer, algorithm, homographic pairs, formal features

Abstract

The phenomenon of homonymy creates additional difficulties for automatic text recognition and calls for more complex processing algorithms and methods. This research addresses the problems of automatic homonym recognition in Kazakh, Russian, English, Turkish, and Tatar. The study identified the main problems that arise in the automatic detection and identification of homonyms/homoforms: the lack of a clear sentence structure (Russian, Turkish), the considerable morphological diversity of the language (a large number of grammatical categories and a high level of affixation), contextual ambiguity, and the insufficient development of the theoretical questions in the study of homonymy. The most effective methods for distinguishing homonyms in national corpora were considered for Russian (maximum-entropy Markov model), English (language-model embeddings), Turkish (hybrid method), and Tatar (a method for removing homonymy from a homonym-tagged corpus). The present study approaches the problem from the perspective of morphological and syntactic analysis; the semantic and contextual aspects require more detailed study.

