Analysing corpus-based criterial conjunctions for automatic proficiency classification

Ángeles Zarco-Tejada, Carmen Noya Gallardo, Mª Carmen Merino Ferradá, Isabel Calderón López


The linguistic profiling of L2 learning texts can serve as a model for the automatic proficiency assessment of new texts. Proficiency levels, however, are distinguished by many different linguistic features, among which the use of cohesive devices can be a criterial element for level distinctions, whether in the number of conjunctions used (quantitative), in their type and variety (qualitative), or in both. We have carried out such an analysis on a subgroup of the CLEC (CEFR-levelled English Corpus) using Coh-Metrix, a tool that computes cohesion and coherence metrics for written and spoken texts. Our results suggest that automatic proficiency level assessment needs a deeper examination of conjunctions, one that relies on the analysis of conjunction types and conjunction variety together with an analysis of lexical choice. A variable based on familiarity ranks could help to predict proficiency-oriented cohesion levels.
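
The distinction between quantitative and qualitative conjunction measures can be illustrated with a minimal Python sketch. It is not the Coh-Metrix implementation or the procedure used in the study: the function name `conjunction_profile`, the four type labels and the small word lists are hypothetical choices made only for this illustration, and a real analysis would need a much fuller inventory and proper tokenisation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny inventory grouped by conjunction type.
# This is NOT the Coh-Metrix connective list; it only illustrates the idea.
CONJUNCTIONS = {
    "additive": {"and", "also", "moreover", "furthermore"},
    "adversative": {"but", "however", "although", "nevertheless"},
    "causal": {"because", "so", "therefore", "thus"},
    "temporal": {"then", "when", "after", "before"},
}


def conjunction_profile(text):
    """Return simple quantitative and qualitative conjunction measures."""
    # Naive tokenisation: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Map each listed word to its (assumed) conjunction type.
    lookup = {word: ctype for ctype, words in CONJUNCTIONS.items() for word in words}
    hits = [tok for tok in tokens if tok in lookup]
    return {
        "tokens": len(tokens),
        # Quantitative: conjunctions per 1,000 words.
        "incidence_per_1000": 1000 * len(hits) / len(tokens) if tokens else 0.0,
        # Qualitative: how many different conjunctions appear, and of which types.
        "distinct_conjunctions": len(set(hits)),
        "per_type": dict(Counter(lookup[tok] for tok in hits)),
    }


if __name__ == "__main__":
    sample = "I was tired but I went out because my friend called, and then we ate."
    print(conjunction_profile(sample))
```

Two texts with the same incidence score can differ in the number of distinct conjunctions and in their distribution across types, which is the kind of difference a purely quantitative index would miss. The sketch does not model the familiarity-rank variable mentioned above.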


Cohesion; language assessment; corpus linguistics; L2 English learning texts; linguistic profiling; Coh-Metrix.




Copyright (c) 2016 Ángeles Zarco-Tejada, Carmen Noya Gallardo, Mª Carmen Merino Ferradá, Isabel Calderón López

This work is licensed under a Creative Commons Attribution 4.0 International License.

© Universidad de La Rioja, 2013

ISSN 1576-6357

EISSN 1695-4300