Issues of automatic recognition of homonyms/homoforms in applied linguistics
DOI:
https://doi.org/10.18172/cif.6716
Keywords:
linguistic corpus, morphological analyzer, algorithm, homographic pairs, formal features
Abstract
Homonymy creates additional difficulties for automatic text recognition and calls for more complex algorithms and processing methods. This research addresses the problems of automatic homonym recognition in Kazakh, Russian, English, Turkish, and Tatar. The study identified the main problems that arise in the automatic detection and identification of homonyms/homoforms: the lack of a clear sentence structure (Russian, Turkish), the significant morphological diversity of the language (a large number of grammatical categories and a high level of affixation), contextual ambiguity, and the insufficient development of theoretical issues in the study of homonymy. The most effective methods for distinguishing homonyms in national corpora were considered for Russian (maximum entropy Markov model), English (language model embeddings), Turkish (hybrid method), and Tatar (method of removing homonyms from a homonym-tagged corpus). The present study approaches the problem from the perspective of morphological and syntactic analysis; the semantic and contextual aspects require more detailed study.
License
Copyright 2026 Akerke Meirbekova, Anar Fazylzhanova, Aiman Zhanabekova, Aigul Amirbekova, Gulnara Talgatkyzy

This work is licensed under a Creative Commons Attribution 4.0 International license.
The author retains all rights to the article and grants the journal the right of first publication; the journal's authorization is not required to disseminate the article once it has been published. Once the publisher's version has been published, the author is obliged to refer to it in versions archived in personal or institutional repositories.
The article will be published under a Creative Commons Attribution license, which allows third parties to use the published material provided that the authorship of the work and its first publication in this journal are acknowledged.
Authors are encouraged to archive the publisher's version in institutional repositories.





