- 📍Table of Contents
- 📖 Project Introduction
- 🗺️ Technical Architecture
- ✨ Technical Report
- 📆 Update Notes
- 🛠️ Usage Guide
- 📋 Project Code Structure
- ☕ Project Members (listed in no particular order)
- 💖 Special Thanks
- License
- Star History
This project, "The God of Cookery," is inspired by the renowned film of the same name starring the comedy master Stephen Chow. Its goal is to provide cooking advice and recipe recommendations through artificial intelligence, helping users improve their cooking skills and lowering the barrier to cooking, in the spirit of the movie's message: "With heart, anyone can become a god of cookery."

The core of the application is the InternLM dialogue model, fine-tuned on the XiaChuFang Recipe Corpus of 1,520,327 Chinese recipes. The model is hosted on ModelScope, and the application is deployed on OpenXLab. Special thanks to the ModelScope (Moda) community for providing free space for model hosting, and to OpenXLab for providing the deployment environment and GPU resources.

Please note that the answers given by this application are for reference only and should not be treated as the actual steps of a real recipe. Because of the "hallucination" tendency of large language models, some generated recipes may be unreasonable or even harmful; please treat them with caution and do not follow them blindly.
The project is built on the open-source internlm-chat-7b models from the Shanghai AI Laboratory (both the first and second generations). We fine-tuned the model on the XiaChuFang Recipe Corpus of 1,520,327 Chinese recipes using XTuner with LoRA, producing the shishen2_full model. After fine-tuning, the model was combined with a vector database through LangChain to enable Retrieval-Augmented Generation (RAG), improving retrieval-grounded answers. The application supports multimodal (voice, text, image) question-answering dialogue, and the user-facing frontend is implemented with Streamlit.
Upon receiving a user request, the application loads the models (the voice model, the text-to-image model, and the fine-tuned dialogue model) and processes the user's text or voice input. If the RAG switch is off, it calls the fine-tuned dialogue model directly to generate a reply, formats the result, uses the Stable Diffusion model to generate an illustrative image, and returns everything to the user. If the RAG switch is on, it first searches the vector database through LangChain, feeds the retrieved results into the fine-tuned dialogue model to generate a reply, formats the result, calls the Stable Diffusion model to generate an image, and returns the result to the user.
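The sketch below illustrates this dispatch logic in simplified form. The function parameters and helper names are illustrative stand-ins, not the actual interfaces used in app.py:

from typing import Callable, Tuple

def handle_request(
    query: str,
    use_rag: bool,
    chat: Callable[[str], str],        # fine-tuned dialogue model
    retrieve: Callable[[str], str],    # vector-database search (RAG)
    draw: Callable[[str], bytes],      # Stable Diffusion text-to-image
) -> Tuple[str, bytes]:
    # Illustrative sketch of the flow described above, not the real app.py code.
    if use_rag:
        # RAG on: retrieve related recipes first, then answer with that context.
        context = retrieve(query)
        answer = chat(f"Context:\n{context}\n\nQuestion: {query}")
    else:
        # RAG off: query the fine-tuned dialogue model directly.
        answer = chat(query)
    answer = answer.strip()    # stand-in for the result-formatting step
    image = draw(answer)       # generate an illustrative image for the reply
    return answer, image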
Access the technical report and explanatory videos through the following links:
Section Name | Document Author | Technical Lead |
---|---|---|
General Overview | zzd2001, chg001, zhanghui-china | zhanghui-china |
Voice Recognition | zzd001 | sole fish |
Text-to-Image | Fang Yuliang | Fang Yuliang |
RAG | zzd2001 | Charles, Yue Zhengmeng |
Model Fine-Tuning | zzd2001 | chg001, zzd2001, zhanghui-china |
Web UI | Fang Yuliang | Fang Yuliang |
- Coming Soon...
  - RAG system based on llama-index and HyQE
  - Speech output
  - Support for other LLMs
- [2024.4.21] HyQE RAG system with LangChain, proposed by team member @Yue Zhengmeng, merged into the main branch
- [2024.3.20] Updated README
- [2024.3.19] Integrated documentation into the docs directory
- [2024.3.9] Based on the RAG module (faiss) by team member @Yue Zhengmeng, integrated into the text2image branch and released the fourth phase of the second-generation application on OpenXLab A100 (Click to try it out) and OpenXLab A10 (Click to try it out)
- [2024.3.3] Based on the paraformer voice input module by team member @sole fish, integrated into the text2image branch and released the third phase of the second-generation application on OpenXLab A100. Click to try it out (link deprecated)
- [2024.2.24] Based on the RAG module (Chroma) by team member @Charles, integrated into the text2image branch and released the second phase of the second-generation application on OpenXLab A100. Click to try it out (link deprecated)
- [2024.2.22] Based on the text-to-image module by team member @Fang Yuliang and the whisper voice input module by @sole fish, integrated into the text2image branch and released the first phase of the second-generation application (InternLM2-chat-7B as the base model) on OpenXLab A100. Click to try it out (link deprecated)
- [2024.1.30] Released the model and app fine-tuned on the full 1.5-million-recipe dataset based on InternLM-chat-7B (fine-tuned on InternStudio with A100 1/4 x 2, 40 GB memory, from 1.25 15:46 to 1.30 12:25, a total of 4 days 20 hours 39 minutes) by team member @zhanghui-china
- [2024.1.28] Released the model and app fine-tuned on a slice of the 1.5-million-recipe dataset based on InternLM-chat-7B (fine-tuned on WSL + Ubuntu 22.04 + RTX 4090 with 24 GB memory, from 1.26 18:40 to 1.28 13:46, a total of 1 day 19 hours 6 minutes) by team member @zhanghui-china
Download the 1.5 million XiaChuFang fine-tuning dataset: Download Link (password: 8489)
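For reference, XTuner organizes single-turn fine-tuning data as conversation records. A hypothetical recipe sample is shown below; the field contents are invented for illustration, so check the downloaded dataset for the exact schema:

# One hypothetical training record in XTuner's single-turn conversation format.
# The real XiaChuFang records may differ; inspect the downloaded file to confirm.
sample = {
    "conversation": [
        {
            "system": "You are an experienced chef who answers cooking questions.",
            "input": "How do I make sour and spicy fish?",
            "output": "1. Clean and slice the fish... 2. Stir-fry the pickled vegetables...",
        }
    ]
}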
- Set up a Python virtual environment:
conda create -n cook python=3.10 -y
conda activate cook
- Clone the repository:
git clone https://github.com/SmartFlowAI/TheGodOfCookery.git
cd ./TheGodOfCookery
- Install PyTorch and other dependencies:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
Note: Choose the CUDA version according to your own CUDA installation, typically 11.8 or 12.1.
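Before moving on, you can optionally confirm that the CUDA build of PyTorch was installed correctly:

# Optional sanity check: verify that PyTorch can see the GPU.
import torch

print(torch.__version__)                    # installed PyTorch version
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # e.g. the A100 or RTX 4090 used for fine-tuning
else:
    print("CUDA not available; re-check the pytorch-cuda version you installed")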
- Train the first-generation 7b model using xtuner 0.1.9, fine-tune on internlm-chat-7b.
- Train the second-generation 7b model using xtuner 0.1.13, fine-tune on internlm2-chat-7b.
- Train the second-generation 1.8b model using xtuner 0.1.15.dev0, fine-tune on internlm2-chat-1.8b.
Fine-tuning method:
xtuner train ${YOUR_CONFIG} --deepspeed deepspeed_zero2
--deepspeed indicates using DeepSpeed to optimize the training process. XTuner integrates several strategies, including ZeRO-1, ZeRO-2, and ZeRO-3. If you wish to disable this feature, simply remove this parameter.
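The ${YOUR_CONFIG} file is usually a copy of one of XTuner's stock example configs with a few variables changed near the top. The names below follow the stock configs and are shown only as a hedged illustration; your config may differ:

# Typical edits near the top of a copied XTuner example config (placeholder paths).
pretrained_model_name_or_path = '/path/to/internlm2-chat-7b'  # base model to fine-tune
data_path = '/path/to/xiachufang_dataset.json'                # recipe fine-tuning corpus
max_length = 2048         # maximum tokenized sample length
batch_size = 1            # per-device batch size
accumulative_counts = 16  # gradient accumulation steps
max_epochs = 3            # number of training epochs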
Convert the saved .pth model (if using DeepSpeed, this will be a directory) into a LoRA model:
export MKL_SERVICE_FORCE_INTEL=1
xtuner convert pth_to_hf ${YOUR_CONFIG} ${PTH} ${LoRA_PATH}
Merge the LoRA model into the HuggingFace model:
xtuner convert merge ${Base_PATH} ${LoRA_PATH} ${SAVE_PATH}
Chat with the merged model:
xtuner chat ${SAVE_PATH} [optional arguments]
Arguments:
- --prompt-template: Use 'internlm_chat' for the first-generation model and 'internlm2_chat' for the second-generation models.
- --system: Specify the dialogue system identifier.
- --bits {4,8,None}: Specify the quantization bits of the LLM. The default is fp16.
- --no-streamer: Disable the streamer.
- --top-p: 0.8 is recommended for second-generation models.
- --temperature: 0.8 is recommended for second-generation models.
- --repetition-penalty: 1.002 is recommended for the second-generation 7b model and 1.17 for the 1.8b model; no need to specify for the first-generation model.
- For more information, run xtuner chat -h.
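For example, a chat session with a merged second-generation 7b model could be launched as follows (the model path is a placeholder; the flag values follow the recommendations above):

xtuner chat ./merged_model --prompt-template internlm2_chat --top-p 0.8 --temperature 0.8 --repetition-penalty 1.002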
Second-phase dialogue effects (text + image dialogue):
Demo access addresses: A100 A10
First-phase dialogue effects (text-only dialogue):
Demo examples
- ModelScope First-Generation 7b Model
- ModelScope Second-Generation 7b Model
- ModelScope Second-Generation 1.8b Model
- OpenXLab First-Generation 7b Model
- OpenXLab Second-Generation 7b Model
Example code for model interaction:
import torch
from modelscope import AutoTokenizer, AutoModelForCausalLM
from tools.transformers.interface import GenerationConfig, generate_interactive

# Relative model path on ModelScope; e.g. the second-generation fine-tuned model is zhanghuiATchina/zhangxiaobai_shishen2_full
model_name_or_path = "zhanghuiATchina/zhangxiaobai_shishen2_full"
max_length = 2048  # example value for the maximum generation length

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='auto')
model = model.eval()

messages = []
# Sampling parameters recommended for the second-generation model
generation_config = GenerationConfig(max_length=max_length, top_p=0.8, temperature=0.8, repetition_penalty=1.002)

# Multi-turn chat: history carries the previous turns into the next call
response, history = model.chat(tokenizer, "Hello", history=[])
print(response)
response, history = model.chat(tokenizer, "How to make sour and spicy fish", history=history)
print(response)
Second Phase
Project Directory
|---assets # Image directory, generated assets are also temporarily stored here, planning to move to other directories in the future
| |---robot.png # Dialogue robot icon
| |---user.png # Dialogue user icon
| |---shishen.png # Project icon (Main contributor: Liu Guanglei)
|
|---config # Configuration file directory (Main contributor: Fang Yuliang)
| |---__init__.py # Initialization script
| |---config.py # Configuration script
|
|---docs # Documentation directory
| |---tech_report.md # Technical report
| |---Introduce_x.x.pdf # Project introduction PPT
|
|---eval # RAG module evaluation directory
|
|---food_icon # Ingredient icon directory
| |---*.png # Icons for various ingredients
|
|---gen_image # Text-to-Image directory (Main contributor: Fang Yuliang)
| |---__init__.py # Initialization script
| |---sd_gen_image.py # Text-to-Image module using Stable Diffusion
| |---zhipu_ai_image.py # Text-to-Image module using Zhipu AI
|
|---images # Cache images generated by the text-to-image model
|
|---rag_langchain # Second-generation RAG code directory (Main contributor: Yue Zhengmeng)
| |---chroma_db # Chroma database directory
| | |- chroma.sqlite3 # Chroma database file
| |---data # Directory of Recipe Datasets
| | |- tran_dataset_1000.json # Test recipe dataset with only 1,000 entries
| |---faiss_index # FAISS database directory
| | |- index.faiss
| | |- index.pkl
| |---retrieve # Retriever save directory
| | |- bm25retriever.pkl # Serialized BM25 retriever
| |---CookMasterLLM.py # LLM wrapped for LangChain
| |---create_db_json.py # Create vector database script
| |---HyQEContextualCompressionRetriever.py # HyQE retriever
| |---interface.py # RAG module interface
| |---README.md # RAG module description
|
|---speech # Paraformer voice recognition directory (Main contributor: sole fish)
| |---__init__.py # Initialization script
| |---utils.py # Voice recognition processing script
|
|---app.py # Web Demo main script
|---cli_demo.py # Model testing script
|---convert_t2s.py # Traditional to Simplified Chinese conversion tool (Main contributor: Bin Bin)
|---download.py # Model download script
|---parse_cur_response.py # Output formatting tool (Main contributor: Bin Bin)
|---start.py # streamlit start script
|---web_demo.py # Web Demo start script
|---requirements.txt # System dependency packages (please use pip install -r requirements.txt for installation)
|---README.md # This document
Name | Organization | Contribution | Remarks |
---|---|---|---|
Zhang Xiaobai | Graduated from Nanjing University, Data Engineer at a company | Project planning, testing, and miscellaneous tasks | Huawei Cloud HCDE (formerly Huawei Cloud MVP), Top 10 Huawei Cloud Community Bloggers in 2020, Outstanding Ascend Community Developer in 2022, Outstanding Huawei Cloud Community Moderator in 2022, MindSpore Evangelist, Excellent DataWhale Learner |
sole fish | PhD student at the University of Chinese Academy of Sciences | Voice input module | |
Charles | Bachelor's degree from Tongji University, currently applying for master's | RAG module (based on Chroma) for the first generation | |
Yue Zhengmeng | Bachelor's degree from Shanghai Ocean University, currently applying for master's | RAG module (based on faiss & Chroma) for the second generation | |
Bin Bin | Bachelor's degree from East China Normal University, Algorithm Developer at a company | Formatting output | |
Fang Yuliang | Graduated from Nanjing University, Algorithm Engineer at a company | Text-to-Image module, configuration tools | |
Liu Guanglei | - | Icon design, frontend optimization | |
Xuanyuan | Master's student at Nanjing University | Document preparation, dataset, model fine-tuning | |
Hong Cheng | Main maintainer of minisora | Resource integration and suggestions on future development | |
usamimeri | Undergraduate at Xiamen University | Initial exploration of the llama-index framework | |
We would like to extend our gratitude to the Shanghai Artificial Intelligence Laboratory for organizing the Shusheng·Puyu Practical Camp event~~~
We are deeply grateful for the computational support provided by OpenXLab for project deployment~~~
A heartfelt thank you to Puyu Assistant for their support of the project~~~
This project is licensed under the Apache License 2.0.