- The Kiwix team needs a GPT-like platform that can be used in remote areas.
- It should be trained on custom data (flat files / ZIM files).
- Knowledge / tokenized data should be available offline and transferable from one system to another.
- No GPU should be required to use the GPT (CPU only, mid-size).
- Build a GPT-like platform that can perform Q/A offline.
- It should be trained on specific data (it can extend already existing knowledge).
- Training can use flat files, ZIM files, JSON (Q/K/V pattern), etc.
- It should be deployable standalone on a small to mid-level system to get results.
- Source ticket: kiwix/overview#93
- Use a pre-built model for the tokenizer
- Read the flat file
- Tokenize the data
- Create the config for an MLM (masked language modeling) / NSP (next sentence prediction) model for training
- Train
- Store knowledge / tokens in flat files (easy to share)
- Test
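
The data-preparation steps above (read a flat file, tokenize, prepare MLM examples, store tokens in a shareable flat file) could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual implementation. The regex tokenizer is only a stand-in for the pre-built tokenizer, and the function names, the JSON Lines output format, and the 15% masking rate are all assumptions made for the example. Model configuration and training are not covered.

```python
import json
import random
import re
from pathlib import Path

MASK = "[MASK]"  # placeholder token in the style of BERT-like MLM (assumption)


def read_flat_file(path):
    """Read a UTF-8 text file and return its non-empty, stripped lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def tokenize(text):
    """Stand-in tokenizer (assumption): lowercased words and punctuation marks.

    A real setup would load a pre-built subword tokenizer instead.
    """
    return re.findall(r"\w+|[^\w\s]", text.lower())


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace each token with MASK with probability mask_prob.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None everywhere else.
    """
    rng = rng or random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


def build_records(lines, mask_prob=0.15, seed=0):
    """Turn raw text lines into MLM training records (plain dicts)."""
    rng = random.Random(seed)
    records = []
    for line in lines:
        tokens = tokenize(line)
        masked, labels = mask_tokens(tokens, mask_prob, rng)
        records.append({"tokens": tokens, "masked": masked, "labels": labels})
    return records


def save_jsonl(records, path):
    """Store records as JSON Lines, one record per line, easy to copy around."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Load records written by save_jsonl."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


if __name__ == "__main__":
    # "corpus.txt" and "tokens.jsonl" are hypothetical file names.
    records = build_records(read_flat_file("corpus.txt"))
    save_jsonl(records, "tokens.jsonl")
    print(f"stored {len(records)} records")
```

Storing one JSON record per line keeps the output a plain flat file that can be copied between offline systems and read back without any extra dependencies.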