nightly bin+lib duguo

A Rocket web-app designed to facilitate learning how to read Chinese

1 unstable release

0.1.0 Feb 6, 2021

MIT license

115KB
2K SLoC

Rust 1.5K SLoC // 0.1% comments Tera 334 SLoC // 0.0% comments JavaScript 313 SLoC // 0.1% comments

DuGuo

DuGuo is under construction and I will be prototyping some ideas. If you're interested in contributing, let me know and I'd be happy to collaborate!

docs: 0.2.0 License: AGPL

DuGuo demo gif

Overview

DuGuo is an open-source web application that allows users to read Chinese text in an interactive learning environment. The main features include:

  • Phonetic support (Pinyin + Zhuyin) and phrase lookup via CC-CEDICT
  • Phrase tokenization via spaCy
  • Text-to-speech via the SpeechSynthesis API
  • Transposition between Simplified + Traditional Chinese text
  • Other ideas are TBD: view and contribute in the Issues tab!

This app is designed in particular for L2 (second-language) learners, though hopefully it is useful for all levels of Chinese learning!

Tech Stack

The app has 2 microservices:

  1. A web server written in Rust using Rocket
  2. An NLP tokenization service written in Python primarily using spaCy's Chinese module (which builds on top of jieba)
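
To give a feel for what the tokenization step produces, here is a minimal sketch in Rust. It is not what spaCy or jieba do (they combine dictionaries with statistical models); it swaps in forward maximum matching, a much simpler dictionary-only baseline. The function name `tokenize`, its parameters, and the tiny dictionary in the usage note are all assumptions made for illustration, not part of DuGuo's code.

```rust
use std::collections::HashSet;

/// Segment `text` by greedily taking the longest dictionary phrase at each
/// position (forward maximum matching). Characters not covered by any phrase
/// become single-character tokens. `max_len` caps the phrase length in chars.
/// Hypothetical helper for illustration only; DuGuo delegates this to spaCy.
pub fn tokenize(text: &str, dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    // Work on chars, not bytes: each Chinese character is multi-byte in UTF-8.
    let chars: Vec<char> = text.chars().collect();
    let mut tokens: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        // Longest candidate that fits in the remaining text (at least 1 char).
        let mut len = max_len.max(1).min(chars.len() - i);
        // Shrink the candidate until it is in the dictionary or is one char.
        while len > 1 {
            let candidate: String = chars[i..i + len].iter().collect();
            if dict.contains(candidate.as_str()) {
                break;
            }
            len -= 1;
        }
        tokens.push(chars[i..i + len].iter().collect());
        i += len;
    }
    tokens
}
```

With a dictionary containing 中国 and 中国人 and a `max_len` of 4, the text 我是中国人 is split into 我, 是, and 中国人: the longer phrase wins over the shorter one. A real segmenter resolves ambiguous cases far better, which is why the project relies on spaCy instead.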

For data persistence, MongoDB and Redis are used.

Tokenized words are looked up in CC-CEDICT, which is generously available under a Creative Commons license. Radical information (for saved vocab) is sourced from this web API and can be quickly accessed using the accompanying Hemiola Chinese Character Browser.
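
For readers unfamiliar with CC-CEDICT, each dictionary line has the form `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`, and comment lines start with `#`. The following is a minimal sketch of parsing one such line in Rust. The struct `CedictEntry`, its field names, and the function `parse_cedict_line` are hypothetical and are not taken from DuGuo's source.

```rust
/// One parsed CC-CEDICT entry. Field names are illustrative assumptions,
/// not DuGuo's actual schema.
#[derive(Debug, PartialEq)]
pub struct CedictEntry {
    pub traditional: String,
    pub simplified: String,
    pub pinyin: String,
    pub definitions: Vec<String>,
}

/// Parse a line shaped like `Traditional Simplified [pin1 yin1] /gloss/gloss/`.
/// Returns `None` for comments, blank lines, and lines that do not fit the shape.
pub fn parse_cedict_line(line: &str) -> Option<CedictEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // Headwords come before the first '[', pinyin sits inside the brackets.
    let (heads, rest) = line.split_once('[')?;
    let (pinyin, glosses) = rest.split_once(']')?;

    let mut words = heads.split_whitespace();
    let traditional = words.next()?.to_string();
    let simplified = words.next()?.to_string();

    // Glosses are separated by '/', with a leading and trailing slash.
    let definitions: Vec<String> = glosses
        .split('/')
        .map(str::trim)
        .filter(|g| !g.is_empty())
        .map(String::from)
        .collect();
    if definitions.is_empty() {
        return None;
    }

    Some(CedictEntry {
        traditional,
        simplified,
        pinyin: pinyin.trim().to_string(),
        definitions,
    })
}
```

Given an illustrative line such as `中國 中国 [Zhong1 guo2] /China/Middle Kingdom/`, the function returns an entry holding both script forms, the numbered-tone pinyin, and the two glosses. Because every entry carries the Traditional and Simplified forms side by side, the same data can also support switching between the two scripts.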

Motivation

Learning Chinese as a second language is hard for many reasons. To start, Chinese characters are logographic whereas English letters are alphabetic, which necessitates a fundamentally different approach to phrase memorization. Additionally, phrase pronunciation requires learning a technical phonetic notation (e.g. pinyin) which is rarely used by natives and virtually nonexistent in practice.

While there are many more nuanced approaches to Chinese learning (e.g. the HSK framework), one simplified view is that there are 3 levels of Chinese reading mastery:

  1. Almost entirely pinyin-dependent (for beginners and L2 learners that can speak but can't read, like myself...)
  2. Some pinyin needed (roughly grade-school level for native Chinese speakers)
  3. Almost no pinyin needed (adult level - phrases are either memorized or able to be intuited based on the context)

Below are images to provide a visual reference. While for natives the jump from tier 1 to 3 is trivial, for L2 learners it can feel insurmountable!

  1. A beginner-level Chinese textbook with pinyin included for all words ('Tier 1').
  2. An intermediate-level Chinese textbook with pinyin for some words ('Tier 2'). In practice, this is grade-school level for natives!
  3. A native-level article from a Chinese newspaper ('Tier 3'). No pinyin is used at all, since natives don't really need it!

Contextual Learning

Contextual learning is arguably the best way to learn a language. People remember things that are linked to experiences or assorted significant pieces of information. For natives, learning Chinese is essential. For L2 learners, however, finding the urgency to learn is uniquely difficult without an external driving force (e.g. living in a Chinese-speaking country).

Barring the ability to live in a foreign country, DuGuo hopes to offer the next-best thing by allowing users to pick what they want to read (improving contextual relevance) and saving contextual references for "learned" phrases (adding contextual triggers).

Other Existing Tools

There are several existing tools that provide similar functionality, including (but not limited to): Zhongwen Chrome Extension, Purple Culture Pinyin Converter, Du Chinese (mobile), mdbg.net, Hànzì Analyzer, pin1yin1, etc.

The main differentiators DuGuo hopes to provide with this project are improved UX, progress persistence (via accounts), document difficulty scoring (in progress), and Duey! Ultimately this is provided as an additional tool to help users learn Chinese, so definitely use the combination of tools that best supplements your learning experience.

Acknowledgement

This project was adapted from Martin Kess's previous CS6460 final project, the Chinese Reading Machine (中文读机). He provided the starter code (in Python Flask) and a strong existing framework to build on. The images for Duey came from Dzaky Taufik (his Upwork linked here). 感谢 and 大家加油!

Duey! Confused Duey? Surprised Duey Worried Duey :-( Happy Duey!

Dependencies

~41–57MB
~1M SLoC