Asynchronously scrape the ArchChinese dictionary to generate Anki flashcards
pip install ankichinese
playwright install
ankichinese
-h, --help Show help message and exit
--export, -x {anki, csv, update} Export mode (default: anki)
anki: Generate new AnkiChinese deck
csv: Generate CSV file
update: Update existing deck
--input, -i INPUT Input file with characters to scrape (default: input.txt)
--output, -o OUTPUT Name of output file (do not include extension)
(default: ankichinese_output)
--definitions, -def NUM Number of definitions to scrape per character (default: 5)
--examples, -ex NUM Number of example words to scrape per character (default: 5)
--requests-at-once, -r NUM Maximum number of requests at once (default: 10)
--requests-per-second, -rs NUM Maximum number of requests per second (default: 5)
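The two request limits can be pictured with a minimal sketch that uses only the standard library. This is an illustration of the idea, not AnkiChinese's actual code (the project lists Aiometer for this job); `run_limited` and `fake_scrape` are hypothetical names, and the defaults simply mirror the option defaults above.

```python
import asyncio
import functools
import time


async def run_limited(jobs, max_at_once=10, max_per_second=5):
    """Run zero-argument coroutine functions under two limits (illustrative)."""
    semaphore = asyncio.Semaphore(max_at_once)  # caps jobs in flight
    interval = 1.0 / max_per_second             # minimum gap between job starts
    start_lock = asyncio.Lock()
    next_start = time.monotonic()

    async def worker(job):
        nonlocal next_start
        async with semaphore:
            # Reserve the next free start slot, then wait until it arrives.
            async with start_lock:
                now = time.monotonic()
                delay = next_start - now
                next_start = max(now, next_start) + interval
            if delay > 0:
                await asyncio.sleep(delay)
            return await job()

    # gather() returns results in the same order as the jobs were given.
    return await asyncio.gather(*(worker(job) for job in jobs))


async def fake_scrape(char):
    """Hypothetical stand-in for one dictionary request."""
    await asyncio.sleep(0.01)
    return char


if __name__ == "__main__":
    jobs = [functools.partial(fake_scrape, c) for c in "你好吗"]
    print(asyncio.run(run_limited(jobs, max_at_once=2, max_per_second=50)))
```

Lowering either number trades speed for a lighter load on the dictionary site: `max_at_once` bounds how many requests are open together, while `max_per_second` spaces out when new ones may begin.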
How to create an entirely new Anki deck with the name ankichinese_output.apkg
in the current directory using custom AnkiChinese styling.
- Create `input.txt` with the characters you want to scrape.
- Run `ankichinese -x anki`.
- Open Anki and import `ankichinese_output.apkg`.
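If the characters you want are buried in a longer text, a small helper can collect them before you run the steps above. The sketch below is not part of AnkiChinese: `unique_hanzi` and `write_input_file` are hypothetical names, and writing one character per line is an assumption, since the expected layout of `input.txt` is not specified here.

```python
from pathlib import Path


def unique_hanzi(text):
    """Return the distinct Chinese characters in text, in first-seen order."""
    seen = {}
    for ch in text:
        # Only the basic CJK Unified Ideographs block is checked here.
        if "\u4e00" <= ch <= "\u9fff":
            seen.setdefault(ch, None)
    return list(seen)


def write_input_file(text, path="input.txt"):
    """Write the characters one per line (assumed layout) and return them."""
    chars = unique_hanzi(text)
    Path(path).write_text("\n".join(chars) + "\n", encoding="utf-8")
    return chars


if __name__ == "__main__":
    print(write_input_file("你好，你好吗？ Hello!"))
```

Punctuation, Latin letters, and repeated characters are dropped, so each character appears in the file only once.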
Updating is Easy!
Just run `ankichinese -x anki` again with new characters in `input.txt` and import the new `ankichinese_output.apkg` file into Anki. Anki will automatically update the existing deck without losing progress.
- Create `input.txt` with the characters you want to scrape (can be the same as the existing deck).
- Run `ankichinese -x update`.
- Choose deck and model of cards to update. AnkiChinese will search for and overwrite any fields with the same names as the following.
Field Name | Description |
---|---|
Hanzi | Simplified character (REQUIRED) |
Traditional | Traditional form |
Definition | Meaning of character |
Pinyin | Most common pinyin |
Pinyin 2 | Other possible pinyin |
Words | Example words |
Formation | Origin / mnemonic for character |
HSK | Hanyu Shuiping Kaoshi level |
Audio | Audio file name (required for audio) |
- Import the new `ankichinese_audio.apkg` file into Anki. This will import the audio files (and create an empty deck that can be deleted).
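The field-matching rule in the update steps above can be pictured with plain dictionaries. This is only a sketch of the behaviour as described, not AnkiChinese's implementation: `update_note` is a hypothetical helper, and notes are modelled as `dict` objects rather than real Anki database rows.

```python
# Field names taken from the table above.
SCRAPED_FIELDS = (
    "Hanzi", "Traditional", "Definition", "Pinyin", "Pinyin 2",
    "Words", "Formation", "HSK", "Audio",
)


def update_note(note, scraped):
    """Return a copy of note with same-named fields overwritten (illustrative)."""
    if "Hanzi" not in note:
        # Hanzi is the one required field.
        raise KeyError("note has no 'Hanzi' field")
    updated = dict(note)
    for name in SCRAPED_FIELDS:
        # Only fields the note already has are touched; nothing is added.
        if name in updated and name in scraped:
            updated[name] = scraped[name]
    return updated


if __name__ == "__main__":
    note = {"Hanzi": "好", "Definition": "old text", "Notes": "my own notes"}
    scraped = {"Hanzi": "好", "Definition": "good", "Pinyin": "hǎo"}
    print(update_note(note, scraped))
```

In this model, fields with other names (such as the hypothetical `Notes` field) are left alone, and a scraped field that the note type lacks (here `Pinyin`) is simply skipped.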
- Asynchronous I/O: Asyncio
- Limit concurrency: Aiometer
- Web automation and HTML interaction: Playwright
- Anki deck generation: Genanki
- Anki database access: AnkiPandas
- Progress bars: tqdm
- HTML parsing and scraping: Beautiful Soup
- Data manipulation: Pandas
- GUI: Tkinter
Character information: ArchChinese
Stroke order diagrams:
- Online stroke order diagrams: Hanzi Writer
- Offline stroke order font: Reinaert Albrecht
Chinese audio:
- Standard tones: Yoyo Chinese
- Neutral tones: Purple Culture