5 releases (breaking)

Version | Released |
---|---|
0.5.0 | Oct 27, 2024 |
0.4.0 | Oct 26, 2024 |
0.3.0 | Oct 8, 2023 |
0.2.0 | Mar 26, 2023 |
0.1.0 | Mar 26, 2023 |
TOCFL
The Test of Chinese as a Foreign Language (TOCFL) (Chinese: 華語文能力測驗; pinyin: Huáyǔwén Nénglì Cèyàn) is a standardized test of Taiwanese Mandarin language proficiency for non-native speakers, including foreign students. Many vocabulary lists are available online, but a lot of them are incomplete, outdated, or behind paywalls.
This repo provides a dataset based on the following files, which are linked from the official TOCFL website:
coct.naer.edu.tw/download/tech_report
Vocabulary
Taiwan Chinese Language Proficiency Benchmark Vocabulary List_111-11-14.xlsx
The vocabulary list is particularly useful: it gives frequencies for both written and spoken usage. It also provides pinyin, which distinguishes entries that are written with the same characters but differ in meaning or pronunciation.
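To illustrate why the pinyin column matters, here is a minimal sketch of how such rows could be indexed so that two readings of the same characters stay separate. Everything in it is an assumption made for illustration: the `Entry` struct, its field names, and the `build_index`, `lookup`, and `sample_entries` helpers are hypothetical and are not this crate's API, and the frequency numbers are made-up placeholders rather than values from the dataset.

```rust
use std::collections::HashMap;

/// One row of a vocabulary list.
/// Hypothetical shape for illustration only, not this crate's actual types.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    word: String,
    pinyin: String,
    /// Frequency in written usage (placeholder unit).
    written_freq: u32,
    /// Frequency in spoken usage (placeholder unit).
    spoken_freq: u32,
}

/// Index entries by (word, pinyin) so that entries sharing the same
/// characters but differing in pronunciation do not overwrite each other.
fn build_index(entries: Vec<Entry>) -> HashMap<(String, String), Entry> {
    entries
        .into_iter()
        .map(|e| ((e.word.clone(), e.pinyin.clone()), e))
        .collect()
}

/// Look up one specific reading of a word.
fn lookup<'a>(
    index: &'a HashMap<(String, String), Entry>,
    word: &str,
    pinyin: &str,
) -> Option<&'a Entry> {
    index.get(&(word.to_string(), pinyin.to_string()))
}

/// Two readings of the same character. The frequencies are invented
/// placeholders, NOT values taken from the official list.
fn sample_entries() -> Vec<Entry> {
    vec![
        Entry {
            word: "還".to_string(),
            pinyin: "hái".to_string(),
            written_freq: 10,
            spoken_freq: 20,
        },
        Entry {
            word: "還".to_string(),
            pinyin: "huán".to_string(),
            written_freq: 3,
            spoken_freq: 1,
        },
    ]
}
```

With this layout, `lookup(&index, "還", "hái")` and `lookup(&index, "還", "huán")` return different entries. Keying on the characters alone would silently keep only one of the two readings.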
Characters
Taiwan Chinese Language Proficiency Benchmark Chinese Character List_111-09-20.xlsx
Other
https://github.com/tomcumming/tocfl-word-list also provides TOCFL lists, but they seem to be incomplete (or outdated), and the source used to compile them is not entirely clear.