This repository is archived and will not receive future updates or support.
As of February 2022, this software still works, provided you install the correct dependencies.
All code in this repository is made available under the MIT license, so please feel free to fork / copy / modify it for your own use according to the license terms. Thank you to the contributors who have helped improve glossika-to-anki over the years.
The original README follows below.
Generate Anki decks from Glossika PDFs and audio files
`glossika-to-anki` is a set of Python 3 scripts that generate Anki flashcards from the PDFs and audio of the Glossika language program.
`glossika-to-anki` provides three main utilities:
- `glossika_extract_pdf.py` - Generate a TSV file of English and target-language phrases from Glossika PDFs.
- `glossika_split_audio.py` - Split GMS-C audio files into individual mp3s, one per phrase.
- `generate_anki.py` - Create an Anki deck by combining each phrase and its corresponding audio into a separate Anki note / card.
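To illustrate the idea behind the first utility, the sketch below shells out to pdftotext (passing `-` so the text goes to stdout) and pulls English/target pairs out of the text with a regular expression before writing them to a TSV. This is not the code in `glossika_extract_pdf.py`: the `EN:`/`ZS:` line layout matched by `PAIR_RE`, the two-column TSV, and the helper names are hypothetical and chosen only for illustration. The real Glossika PDF layout differs, so the pattern would need to be adapted.

```python
from __future__ import annotations

import csv
import re
import subprocess
from pathlib import Path

# HYPOTHETICAL layout, for illustration only: an English line labelled "EN:"
# followed by a target-language line labelled with a two-letter code ("ZS:").
PAIR_RE = re.compile(r"^EN:\s*(?P<en>.+)\n[A-Z]{2}:\s*(?P<target>.+)$", re.MULTILINE)


def pdf_to_text(pdf_path: Path) -> str:
    """Convert a PDF to UTF-8 text with pdftotext ("-" writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
        encoding="utf-8",
    )
    return result.stdout


def extract_pairs(text: str) -> list[tuple[str, str]]:
    """Return (english, target) pairs matched by PAIR_RE, in document order."""
    return [(m["en"].strip(), m["target"].strip()) for m in PAIR_RE.finditer(text)]


def write_tsv(pairs: list[tuple[str, str]], tsv_path: Path) -> None:
    """Write one tab-separated (english, target) row per phrase."""
    with tsv_path.open("w", encoding="utf-8", newline="") as f:
        csv.writer(f, delimiter="\t", lineterminator="\n").writerows(pairs)
```

A call might look like `write_tsv(extract_pairs(pdf_to_text(Path("course.pdf"))), Path("phrases.tsv"))`, where both file names are placeholders.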
Requirements:
- Python 3
  - On Windows, run the installer from the Python website.
  - On macOS, install Python with `brew install python`; on Linux, use `apt install python3`.
- pdftotext - Converts the Glossika PDFs to text so that the phrases can be extracted with regex.
  - On Windows, download the Xpdf tools and copy `pdftotext.exe` to a folder on your PATH (e.g. the Python folder). If you installed Python with the Windows installer, that folder is usually under `C:\Program Files` or `C:\Users\your_name\AppData\Local`. You can also run `where python` from the command prompt to find where the Python executable is located.
  - On macOS, run `brew install poppler` (it includes pdftotext); on Linux, run `apt install poppler-utils`.
- mp3splt - Splits the GMS-C files into individual files on the silence between sentences.
  - On Windows, download mp3splt from the project homepage and add it to your PATH.
  - On macOS, run `brew install mp3splt`; on Linux, run `apt install mp3splt`.
- genanki - Generates the Anki decks. Install it with `pip install genanki` (or `pip3 install genanki`).
- v2 Glossika PDFs and GMS-C mp3 files
  - The PDF script only works with the v2 Glossika PDFs. These files are searchable and have a blue box around each phrase. If your PDFs are larger than 20 MB, you have the older version, which is not supported.
  - The audio file names should look like `ENZS-F1-GMS-C-0001.mp3`. The `ENZS` prefix is for Mandarin and varies by language.
  - Note: The Glossika PDFs and audio files are difficult to find, since Glossika recently discontinued them and launched a subscription service. I've heard that some people have had luck contacting Glossika's support team to purchase the older PDF courses, but your mileage may vary.
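Before running the scripts, it can help to confirm that the tools above are reachable and that the audio files are named as expected. The sketch below is a hypothetical helper, not part of this repository. It uses `shutil.which` (which also tries the PATHEXT extensions on Windows, so `pdftotext.exe` is found as `pdftotext`) and checks that genanki is importable. The file-name pattern is inferred from the single example `ENZS-F1-GMS-C-0001.mp3`, so other courses may use names it rejects.

```python
from __future__ import annotations

import importlib.util
import re
import shutil
from pathlib import Path

# Inferred from the one known example, ENZS-F1-GMS-C-0001.mp3: a four-letter
# language prefix, an "F<number>" segment, the GMS-C marker, a 4-digit index.
GMS_C_NAME = re.compile(r"^(?P<lang>[A-Z]{4})-(?P<part>F\d+)-GMS-C-(?P<index>\d{4})\.mp3$")


def missing_requirements() -> list[str]:
    """Names of external tools or Python modules that cannot be found."""
    # shutil.which also tries PATHEXT extensions on Windows (pdftotext.exe).
    missing = [tool for tool in ("pdftotext", "mp3splt") if shutil.which(tool) is None]
    if importlib.util.find_spec("genanki") is None:
        missing.append("genanki")
    return missing


def parse_audio_name(name: str) -> tuple[str, str, int] | None:
    """Split a GMS-C file name into (language prefix, part, index), or None."""
    m = GMS_C_NAME.match(name)
    if m is None:
        return None
    return m["lang"], m["part"], int(m["index"])


def unexpected_audio_files(folder: Path) -> list[str]:
    """mp3 files in `folder` whose names do not match the expected pattern."""
    return sorted(p.name for p in folder.glob("*.mp3") if parse_audio_name(p.name) is None)


if __name__ == "__main__":
    for name in missing_requirements():
        print(f"Missing requirement: {name}")
```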
Usage:
Clone or download the repository.
git clone git@github.com:emesterhazy/glossika-to-anki.git
cd glossika-to-anki/glossika-to-anki
Run each script and follow the prompts to copy the Glossika files into the source folder that is created.
python glossika_extract_pdf.py
python glossika_split_audio.py
python generate_anki.py
Import your new Anki deck!
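The final step pairs each phrase with its audio clip and packages the result. A minimal sketch of how that could be done with genanki follows; it is not the code in `generate_anki.py`. It assumes the TSV has two columns (English, target) in the same order as the numbered audio clips. The model and deck IDs, field names, card template, and helper names are placeholders chosen for illustration. genanki is imported inside `write_deck` so the pairing logic works without it installed.

```python
import csv
from pathlib import Path


def read_tsv(tsv_path: Path):
    """Read (english, target) rows from a two-column TSV (assumed layout)."""
    with tsv_path.open(encoding="utf-8", newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f, delimiter="\t") if len(row) >= 2]


def pair_rows_with_audio(rows, audio_files):
    """Attach one audio clip to each row, matching them up by file-name order.

    Assumes the n-th TSV row corresponds to the n-th clip when sorted by name,
    which holds for zero-padded names such as ENZS-F1-GMS-C-0001.mp3.
    """
    clips = sorted(audio_files, key=lambda p: Path(p).name)
    if len(clips) != len(rows):
        raise ValueError(f"{len(rows)} phrases but {len(clips)} audio clips")
    return [(en, target, clip) for (en, target), clip in zip(rows, clips)]


def write_deck(triples, apkg_path: Path, deck_name: str = "Glossika") -> None:
    """Build an .apkg with one note per (english, target, clip) triple."""
    import genanki  # imported here so the helpers above work without it

    # Placeholder IDs: genanki expects each model/deck ID to be picked once
    # (e.g. at random) and then hardcoded so it stays stable between runs.
    model = genanki.Model(
        1607392319,
        "Glossika phrase (sketch)",
        fields=[{"name": "English"}, {"name": "Target"}, {"name": "Audio"}],
        templates=[{
            "name": "English to target",
            "qfmt": "{{English}}",
            "afmt": "{{FrontSide}}<hr id=answer>{{Target}}<br>{{Audio}}",
        }],
    )
    deck = genanki.Deck(2059400110, deck_name)
    for en, target, clip in triples:
        # [sound:...] refers to the bare file name of a bundled media file.
        audio = f"[sound:{Path(clip).name}]"
        deck.add_note(genanki.Note(model=model, fields=[en, target, audio]))
    package = genanki.Package(deck)
    package.media_files = [str(clip) for _, _, clip in triples]
    package.write_to_file(str(apkg_path))
```

A call might look like `write_deck(pair_rows_with_audio(read_tsv(Path("phrases.tsv")), Path("audio").glob("*.mp3")), Path("glossika.apkg"))`, where the file and folder names are hypothetical.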
Limitations:
- Only the v2 Glossika PDFs are supported, not the older non-searchable PDFs. Copy protection must be removed from a PDF before sentences can be extracted from it.
- No support for extracting IPA.
Pull requests are welcome. If you would like to add support for the v1 Glossika PDFs or make changes that require new dependencies please open an issue first to discuss.