Evaluating the Reliability of Automatic Analysis of Noun Phrase Complexity in Learners' Written Production

Ichika Yamaguchi
Tokyo University of Foreign Studies, Tokyo, Japan


Syntactic complexity is one of the key aspects reflecting language development in second language writing. Early research on syntactic complexity focused primarily on clausal complexity, operationalized by indices such as the mean length of T-unit (MLTU). Since the 2010s, however, phrasal complexity, and noun phrase elaboration in particular, has received increasing emphasis and active investigation. Nevertheless, few studies have automatically analyzed the developmental features of noun phrases in learners’ written production by making effective use of natural language processing technology. Although noun phrases can in principle be extracted automatically through constituency parsing, characteristics of writing by less advanced learners, such as grammatical errors and improper punctuation, could reduce the accuracy and reliability of such automatic analyses. It is therefore important to examine whether these automatic tools are applicable to the analysis of learner language. To examine the accuracy and reliability of automatic analyses of written production by learners at different proficiency levels, this study compares manual and automated analyses of essays in the ICNALE (International Corpus Network of Asian Learners of English). Based on this quantitative comparison, the study demonstrates how the accuracy of automated noun phrase analysis is affected by learners’ proficiency levels. Furthermore, through both quantitative and qualitative analyses, it identifies which specific features of learners’ production cause a decrease in accuracy. The findings will inform future research proposing developmental indices that better reflect learners’ development of noun phrase structure.
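The abstract does not specify the parser, the tree format, or the criterion used to match automated against manual noun phrase analyses. As a minimal sketch, assuming Penn Treebank-style bracketed parser output and exact token-span matching (both assumptions, not necessarily the study's method), extracting NP spans and scoring them against a manual annotation could look like the following. The helper names `np_spans` and `precision_recall_f1` and the two example parses are hypothetical illustrations, not data from the study.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) word offsets, end exclusive


def np_spans(tree: str) -> Set[Span]:
    """Return the word spans of all NP nodes in a PTB-style bracketed tree.

    Assumption: plain "NP" labels (no function tags such as "NP-SBJ").
    """
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans: Set[Span] = set()
    stack: List[Tuple[str, int]] = []  # open nodes: (label, start word index)
    word_idx = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            if tokens[i + 1] == "(":
                # Unlabeled root bracket, as in "( (S ...))"
                stack.append(("", word_idx))
                i += 1
            else:
                stack.append((tokens[i + 1], word_idx))
                i += 2
        elif tok == ")":
            label, start = stack.pop()
            if label == "NP":
                spans.add((start, word_idx))
            i += 1
        else:
            # A leaf word: advance the word counter
            word_idx += 1
            i += 1
    return spans


def precision_recall_f1(pred: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-span precision, recall and F1 of predicted vs. manual NPs."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical manual analysis: the PP modifies "students" (complex NP).
gold_tree = ("(S (NP (DT The) (JJ new) (NN policy)) (VP (VBZ affects) "
             "(NP (NP (NNS students)) (PP (IN in) (NP (NNP Tokyo))))) (. .))")
# Hypothetical automatic parse: the PP is attached to the VP instead.
auto_tree = ("(S (NP (DT The) (JJ new) (NN policy)) (VP (VBZ affects) "
             "(NP (NNS students)) (PP (IN in) (NP (NNP Tokyo)))) (. .))")

p, r, f = precision_recall_f1(np_spans(auto_tree), np_spans(gold_tree))
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=1.00 R=0.75 F1=0.86
```

The toy example shows why attachment decisions matter for noun phrase complexity: a single misattached prepositional phrase removes the complex NP "students in Tokyo" entirely, lowering recall even though every simple NP is found.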


syntactic complexity, noun phrase complexity, learner corpus research, natural language processing, writing development

International Joint Conference of APLX, ETRA40, and TESPA 2023