First Corpus-based Quantitative Exploration of Verb-Frame Distribution in Chinese Textbooks


Principal investigator: Prof LIU Meichun (Department of Linguistics and Translation)

Photo 1: The most frequent frame was either SELF MOTION or COMMUNICATION (relevant to Tier 1 sensory–motor).

Extensive reading is one of the golden rules for helping early learners acquire a language effectively. Books suited to a child’s age and language level can arouse interest in learning the language and encourage further improvement. According to usage-based accounts of language acquisition, textbooks serve as a vital source of language input. In choosing suitable texts for learners, the most frequently occurring words are expected to be acquired in the lower grades and are regarded as easier. However, few studies have analysed textbook difficulty from a cognitive semantic perspective on language development.

Professor LIU Meichun of CityU’s Department of Linguistics and Translation has led a research project on text evaluation with Professor John LEE Sie-yuen from the same department. The findings were summarised in the paper “Verb frame distribution and text difficulty: A corpus-based analysis of verb frames in Chinese textbooks”, published in the International Journal of Applied Linguistics. Going beyond word-frequency-based studies, they introduced semantic types of verbs (verb frames) as a salient feature of a text and argued that verb frames may offer a useful reference for measuring text difficulty. The study treats textbook grades as difficulty levels and examines whether the diversity and distribution of verb frames change with grade-based difficulty.

Based on a corpus of nine sets of primary school Chinese textbooks, the study examined verb-frame diversity and distribution trends across difficulty levels. It utilised frequent verbs from 13 Archi-frames distributed across 107 Basic frames, involving over 1,800 different verbs with over 52,000 total occurrences in the corpus. The Archi-frames were grouped into three tiers of gradually increasing complexity (the sensory-motor, the representational, and the abstract tiers), with reference to the frame definitions and representative verbs in the Mandarin VerbNet database.
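To illustrate how a tier-based tally of this kind might be computed, the short Python sketch below counts the share of each tier among the frame labels of a text. It is a minimal sketch, not the authors' procedure: the mapping covers only the Archi-frames named in the photo captions of this article, the function name tier_shares is hypothetical, and the two sample lists are invented toy data rather than corpus figures.

```python
from collections import Counter

# Tier assignments for the Archi-frames named in this article's captions only.
# The full 13-frame mapping used in the study is not reproduced here.
FRAME_TIER = {
    "SELF MOTION": 1,
    "COMMUNICATION": 1,
    "TEMPORAL PROCEEDING OF ACTIVITY": 3,
    "SOCIAL INTERACTION": 3,
    "CREATE": 3,
}


def tier_shares(frame_tokens):
    """Return {tier: proportion} for a list of Archi-frame labels.

    Labels missing from FRAME_TIER are skipped (an assumption of this sketch).
    """
    counts = Counter(FRAME_TIER[f] for f in frame_tokens if f in FRAME_TIER)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tier: n / total for tier, n in sorted(counts.items())}


# Invented toy annotations for one lower-level and one higher-level text.
lower_level = ["SELF MOTION", "SELF MOTION", "COMMUNICATION", "CREATE"]
higher_level = [
    "SELF MOTION",
    "SOCIAL INTERACTION",
    "CREATE",
    "TEMPORAL PROCEEDING OF ACTIVITY",
]

print(tier_shares(lower_level))   # {1: 0.75, 3: 0.25}
print(tier_shares(higher_level))  # {1: 0.25, 3: 0.75}
```

Comparing such shares across difficulty levels is one simple way to see whether a given tier becomes more or less prominent as texts get harder.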

Photo 2: Tier-3 (abstract domains) Archi-frames TEMPORAL PROCEEDING OF ACTIVITY, SOCIAL INTERACTION, and CREATE displayed an upward trend as the difficulty level increased.

The corpus contains over 2,475 articles with 33,000 sentences and around 962,000 characters. The average sentence length increases at a decreasing rate as the difficulty level rises. A Spearman correlation analysis of verb-frame diversity and difficulty level showed that the diversity of verb frames was highly correlated with grade-based difficulty levels. In line with cognitive semantics, verb frames are taken to be semantic categories that represent different situations or scenes. The more abstract, image-schematic categorisation points to a higher level of experience based on further structuring of basic experiential domains. The tier-based frame distinction is also closely connected with the cognitive skills corresponding to different complexity levels. As the difficulty level increases, the share of first-tier frames decreases whereas the share of third-tier frames increases.
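For readers unfamiliar with the Spearman analysis mentioned above, the following Python sketch shows how a rank correlation between difficulty level and verb-frame diversity could be computed. It is a sketch under stated assumptions: diversity is taken here to mean the number of distinct frames per level, the six levels and their diversity values are invented for illustration, and the helper names are hypothetical. None of the numbers come from the study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy data: six hypothetical difficulty levels and the number of
# distinct verb frames observed at each level.
levels = [1, 2, 3, 4, 5, 6]
diversity = [12, 15, 14, 18, 21, 25]

rho = spearman(levels, diversity)
print(round(rho, 3))  # 0.943
```

A value of rho close to 1 indicates that diversity rises almost monotonically with level. In practice a library routine such as scipy.stats.spearmanr, which also reports a p-value, would normally be preferred over a hand-written version.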

This study is the first corpus-based quantitative exploration of verb-frame distribution in Chinese language textbooks. Prof Liu and her team believe that it offers an empirically valid and feasible measure for improving the current state of the art in text difficulty assessment. The study contributes to research on semantic frames in language applications. It provides a pilot model for exploring semantic features related to verb frames quantitatively, going beyond the commonly recognised lexical and syntactic features. It also links frame-based semantic theory with tier-based developmental theory to provide a cognitively valid model of semantic assessment. Lastly, it presents a statistically verifiable and linguistically explainable model based on verb frames, which bear psycholinguistic relevance to the processing of eventive information. The findings suggest that average verb-frame diversity can serve as a simple, preliminary index of text difficulty, given its strong statistical correlation with difficulty level. Ultimately, the study paves the way for further empirical studies on semantic measures of text difficulty.





Achievement and publication

Liu, M, Zhang, Z & Lee, JSY 2023, ‘Verb frame distribution and text difficulty: A corpus-based analysis of verb frames in Chinese textbooks’, International Journal of Applied Linguistics