diff --git a/_contents/S0-L25.md b/_contents/S0-L25.md index 54f04080..82d1a773 100755 --- a/_contents/S0-L25.md +++ b/_contents/S0-L25.md @@ -1,6 +1,6 @@ --- layout: post -title: LLM Tooling II +title: LLM Tooling II fine tuning lecture: lectureVersion: next extraContent: @@ -19,15 +19,10 @@ In this session, our readings cover: ## Required Readings: -### Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models." -arXiv preprint arXiv:2203.06904 (2022).[4] - - ### Recent Large Language Models Reshaping the Open-Source Arena + https://deci.ai/blog/list-of-large-language-models-in-open-source/ + The release of Meta’s Llama model and the subsequent release of Llama 2 in 2023 kickstarted an explosion of open-source language models, with better and more innovative models being released on what seems like a daily basis. With new open-source models being released on a daily basis, here we dove into the ocean of open-source possibilities to curate a select list of the most intriguing and influential models making waves in recent months, inlcuding Qwen1.5/ Yi/ Smaug/ Mixtral-8x7B-v0.1/ DBRX/ SOLAR-10.7B-v1.0 / Tulu 2 / WizardLM/ Starling 7B/ OLMo-7B/ Gemma and DeciLM-7B. - + Plus the newly avaiable DBRX model https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm @@ -37,17 +32,17 @@ arXiv preprint arXiv:2203.06904 (2022).[4] ## More readings -### Must know tools for training/finetuning LLM's - -1. Torchtune - Build on top of Pytorch, for training and finetuning LLM's. Uses yaml based configs for easily running experiments. Github - https://lnkd.in/ghu6wx9r -2. axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github - https://lnkd.in/gYpisva9 -3. LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github - https://lnkd.in/gKisgXms -4. Maxtext - Jax based library for training LLM's on Google TPU's with configs for models like Gemma, Mistral and LLama2 etc. Github - https://lnkd.in/gjeHvZF4 ### Instruction Tuning for Large Language Models: A Survey + https://arxiv.org/abs/2308.10792 + Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang + This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research. Project page: this http URL +### Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models." +arXiv preprint arXiv:2203.06904 (2022).[4] + + + ### QLoRA: Efficient Finetuning of Quantized LLMs + Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters~(LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information theoretically optimal for normally distributed weights (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimziers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training. @@ -58,6 +53,11 @@ We present QLoRA, an efficient finetuning approach that reduces memory usage eno + https://arxiv.org/abs/2106.09685 + An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at this https URL. +### Must know tools for training/finetuning LLM's - +1. Torchtune - Build on top of Pytorch, for training and finetuning LLM's. Uses yaml based configs for easily running experiments. Github - https://lnkd.in/ghu6wx9r +2. axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github - https://lnkd.in/gYpisva9 +3. LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github - https://lnkd.in/gKisgXms +4. Maxtext - Jax based library for training LLM's on Google TPU's with configs for models like Gemma, Mistral and LLama2 etc. Github - https://lnkd.in/gjeHvZF4 ### Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models