Corpus of Native Youth English (CONYE23)

This research project was managed by British Council China for the Chinese Basic Education Curriculum and Teaching Material Research Center (BECTMRC). The project sought to identify:

1. the most commonly used medium- to high-frequency, age-appropriate language chunks presented in the updated 2021 New National English Curriculum (NNEC), which covers Grades 3 to 9, by comparing them with the age-appropriate lexical chunks commonly used by native-speaking children of a similar age in the UK.

2. prominent gaps in high-frequency language within the NNEC that could be addressed in future revisions of teaching materials.

The principal researcher, James Thomas, created a corpus of native-speaking children's output, that is, the language the children themselves produce. Large samples of language input written for the NNEC age group (i.e. ages 9 to 15) were also collected, because children's language output is strongly influenced by the language they encounter in written texts. The project outputs are an online database and a 'book of chunks' (available with and without metadata).

The research outputs will enable Chinese curriculum designers and materials writers for Grades 3 to 9 to verify their intuitions about the language used by similarly aged native-speaking children in the UK, and to enhance the NNEC word lists with vocabulary items that will bring their teaching materials into closer alignment with native youth English.

What's in the Project Database:

Word List

3,101 words listed in alphabetical order

Bigrams

542,594 bigrams made of nouns, verbs, adjectives, adverbs, and prepositions
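The bigram count above implies a filter on word class. As a minimal sketch of what such a filter might look like (not the project's actual pipeline, which this page does not describe), the Python below assumes sentences are already tokenised and tagged with Universal POS tags, where NOUN, VERB, ADJ, ADV and ADP (prepositions) stand in for the five classes named. It counts adjacent word pairs in which both words belong to those classes. The function name `count_bigrams` and the sample sentences are hypothetical.

```python
from collections import Counter

# Assumption: the five word classes named above, written as Universal POS tags
# (ADP covers prepositions).
ALLOWED = {"NOUN", "VERB", "ADJ", "ADV", "ADP"}


def count_bigrams(tagged_sentences):
    """Count adjacent word pairs whose two tokens both carry an ALLOWED tag.

    tagged_sentences: a list of sentences, each a list of (word, tag) tuples.
    Pairs never span a sentence boundary.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        # zip pairs each token with the token that immediately follows it
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 in ALLOWED and t2 in ALLOWED:
                counts[(w1.lower(), w2.lower())] += 1
    return counts


# Hypothetical pre-tagged sample sentences.
sample = [
    [("I", "PRON"), ("played", "VERB"), ("in", "ADP"), ("the", "DET"), ("park", "NOUN")],
    [("We", "PRON"), ("played", "VERB"), ("in", "ADP"), ("school", "NOUN")],
]

print(count_bigrams(sample).most_common(2))
# → [(('played', 'in'), 2), (('in', 'school'), 1)]
```

Note that "the park" is not counted because "the" is a determiner; a real corpus pipeline might instead skip function words and pair the remaining content words, which would produce different counts.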
Collocations

286,834 collocations in grammatical relationships

Grammar Patterns

5,958 grammar patterns

Chunks of NNEC Vocabulary

38,448 chunks of NNEC vocabulary based on syntagms, presented with metadata

Sentences

Over 1,200,000 sentences of different lengths and genres

How to use the Corpus Database:

Check out the User Guide

The user guide helps you get the most out of the corpus database.

To see the User Guide, open the corpus database and click the “User Guide” item at the top right of the main menu bar.

User Guide for China Corpus Database

The full project report is also available.