From 5bfb2d06b3022deda539d94ec40b47de0ea70b12 Mon Sep 17 00:00:00 2001
From: naimzieraupro <149715137+naimzieraupro@users.noreply.github.com>
Date: Mon, 21 Oct 2024 14:45:55 +0200
Subject: [PATCH] InstructLab Knowledge

Signed-off-by: Naim Zierau
---
 knowledge/technology/attribution.txt |   5 ++
 knowledge/technology/qna.yaml        | 130 +++++++++++++++++++++++++++
 2 files changed, 135 insertions(+)
 create mode 100644 knowledge/technology/attribution.txt
 create mode 100644 knowledge/technology/qna.yaml

diff --git a/knowledge/technology/attribution.txt b/knowledge/technology/attribution.txt
new file mode 100644
index 000000000..01c95c146
--- /dev/null
+++ b/knowledge/technology/attribution.txt
@@ -0,0 +1,5 @@
+Title of work: Contributing knowledge to the open source Granite models and LLMs using the InstructLab UI
+Link to work: https://developer-stage.dc4.usva.ibm.com/tutorials/awb-contributing-llm-granite-instructlab-ui/
+Revision: Feeding information about InstructLab into an LLM
+License of the work: A hands-on guide
+Creator names: IBM
diff --git a/knowledge/technology/qna.yaml b/knowledge/technology/qna.yaml
new file mode 100644
index 000000000..1f16e29d2
--- /dev/null
+++ b/knowledge/technology/qna.yaml
@@ -0,0 +1,130 @@
+created_by: naimzieraupro
+version: 3
+domain: IBM Website on InstructLab
+document_outline: >-
+  We fine-tune a model to have advanced knowledge of how to set up and run
+  InstructLab.
+seed_examples:
+  - context: >-
+      InstructLab is an open-source project that focuses on improving Large
+      Language Models (LLMs) by enabling community contributions. It addresses
+      challenges like the need for specialized skills and extensive computing
+      resources by offering a user-friendly interface. The project facilitates
+      collaborative fine-tuning of LLMs, allowing developers and non-developers
+      alike to contribute new knowledge or skills without dealing with the
+      complexities of YAML structures or GitHub processes.
+    questions_and_answers:
+      - question: What is the primary goal of InstructLab?
+        answer: >-
+          The primary goal of InstructLab is to improve Large Language Models
+          (LLMs) through community contributions, making the process of
+          fine-tuning and knowledge addition more accessible and collaborative.
+      - question: How does InstructLab reduce complexity in fine-tuning LLMs?
+        answer: >-
+          InstructLab reduces complexity by providing a user-friendly interface
+          that eliminates the need for manually handling YAML structures or
+          navigating GitHub pull requests, making it easier for a broader range
+          of users to contribute.
+      - question: What type of contributors can benefit from using InstructLab?
+        answer: >-
+          Both developers and non-developers can benefit from using InstructLab,
+          as it allows them to contribute to LLMs without requiring extensive
+          technical expertise in YAML or GitHub.
+  - context: >-
+      The "lab" in InstructLab stands for Large-Scale Alignment for ChatBots, a
+      method used to ensure that LLMs are fine-tuned effectively with
+      user-contributed knowledge and skills. This alignment is achieved through
+      a process of generating synthetic data and creating taxonomies that help
+      the models understand and categorize information better. LAB is designed
+      to make models more efficient and accurate in handling specific tasks.
+    questions_and_answers:
+      - question: What does LAB stand for in InstructLab?
+        answer: >-
+          LAB stands for Large-Scale Alignment for ChatBots, which is the method
+          used to align LLMs with user-contributed knowledge and skills.
+      - question: How does the LAB method enhance the training of LLMs?
+        answer: >-
+          The LAB method enhances LLM training by using synthetic data
+          generation and taxonomies to fine-tune models, ensuring they better
+          understand and perform specific tasks.
+      - question: What role do taxonomies play in the LAB method?
+        answer: >-
+          Taxonomies play a crucial role in the LAB method by organizing
+          knowledge and skills into structured categories, making it easier for
+          LLMs to align with the intended contributions and tasks.
+  - context: >-
+      The InstructLab User Interface (UI) simplifies the process of contributing
+      to LLMs by providing an intuitive platform for adding knowledge or skills.
+      Users can focus on the content of their contributions without worrying
+      about technical aspects like YAML formatting or validation rules. This
+      feature is particularly beneficial for users who are unfamiliar with
+      GitHub processes or complex coding tasks.
+    questions_and_answers:
+      - question: How does the InstructLab UI simplify the contribution process?
+        answer: >-
+          The InstructLab UI simplifies the contribution process by providing an
+          intuitive interface, allowing users to focus on their knowledge or
+          skill contributions without handling technical tasks like YAML
+          formatting or validation rules.
+      - question: What type of users does the InstructLab UI cater to?
+        answer: >-
+          The InstructLab UI caters to a wide range of users, including those
+          who may not be familiar with tools like GitHub or YAML, as well as
+          more technically skilled developers.
+      - question: What is one of the main benefits of using the InstructLab UI?
+        answer: >-
+          One of the main benefits of using the InstructLab UI is that it removes
+          the complexity of manually managing YAML structures, making it easier
+          for users to contribute knowledge and skills to the taxonomy
+          repository.
+  - context: >-
+      InstructLab allows users to fine-tune open-source models like the IBM
+      Granite and Merlinite models by contributing new knowledge. The process
+      involves creating a markdown file with new information, adding it to the
+      taxonomy, and generating synthetic data for training. Once the model is
+      fine-tuned, users can verify its performance by asking questions based on
+      the new knowledge contributed.
+    questions_and_answers:
+      - question: Which models can be fine-tuned using InstructLab?
+        answer: >-
+          InstructLab allows users to fine-tune open-source models such as the
+          IBM Granite model and the Merlinite model, which is a derivative of
+          Mistral-7B.
+      - question: What is required to fine-tune a model in InstructLab?
+        answer: >-
+          To fine-tune a model in InstructLab, users need to create a markdown
+          file with new knowledge, add it to the taxonomy, generate synthetic
+          data, and train the model with the updated information.
+      - question: How can users verify that the model has been successfully fine-tuned?
+        answer: >-
+          Users can verify that the model has been successfully fine-tuned by
+          chatting with it and asking questions related to the new knowledge.
+          The improved responses indicate successful training.
+  - context: >-
+      InstructLab operates as a community-based project where contributors can
+      share their knowledge or skills to enhance open-source LLMs. The
+      contributions are reviewed and periodically released on Hugging Face. By
+      fostering a collaborative environment, InstructLab ensures that LLMs are
+      continuously evolving with new, relevant information contributed by a
+      diverse set of users.
+    questions_and_answers:
+      - question: How does InstructLab ensure continuous improvement of LLMs?
+        answer: >-
+          InstructLab ensures continuous improvement of LLMs by leveraging
+          community contributions, which are periodically reviewed and released
+          on platforms like Hugging Face.
+      - question: Where are the updated models from InstructLab shared?
+        answer: >-
+          The updated models from InstructLab are shared on Hugging Face as part
+          of a regular release cycle, ensuring that the latest improvements are
+          made accessible to the public.
+      - question: What is the significance of community contributions in InstructLab?
+        answer: >-
+          Community contributions are vital to InstructLab as they allow a
+          diverse range of users to provide new knowledge and skills, ensuring
+          that LLMs are updated with relevant, high-quality information.
+document:
+  repo: https://github.com/naimzieraupro/taxonomy-knowledge-docs
+  commit: 09806c17232de70888f508e8a80466afe2269f90
+  patterns:
+    - ilab-20241021T124347506.md
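The qna.yaml added by this patch has a fixed shape: top-level metadata, a list of seed_examples (each a context plus question/answer pairs), and a document block pointing at the source markdown in a separate repository. Below is a minimal, stdlib-only Python sketch of a pre-submission check over that shape. It assumes the file has already been parsed into a dict. The limits it enforces (version 3, at least five seed examples, exactly three Q&A pairs per example) are assumptions based on the version 3 knowledge schema, not values stated in this patch, so confirm them against the taxonomy repository's schema. The helper name check_knowledge_qna is hypothetical and not part of InstructLab.

```python
from typing import Any

# Assumed limits for a version 3 knowledge qna.yaml (hypothetical constants;
# confirm against the taxonomy repository's schema before relying on them).
EXPECTED_VERSION = 3
MIN_SEED_EXAMPLES = 5
QA_PAIRS_PER_EXAMPLE = 3


def check_knowledge_qna(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed knowledge qna.yaml.

    Hypothetical helper for illustration; it is not part of InstructLab.
    """
    problems: list[str] = []

    # Top-level metadata.
    if doc.get("version") != EXPECTED_VERSION:
        problems.append(f"version should be {EXPECTED_VERSION}")
    for key in ("created_by", "domain", "document_outline"):
        if not str(doc.get(key) or "").strip():
            problems.append(f"missing or empty '{key}'")

    # Seed examples: each needs a context and a fixed number of Q&A pairs.
    examples = doc.get("seed_examples") or []
    if len(examples) < MIN_SEED_EXAMPLES:
        problems.append(
            f"expected at least {MIN_SEED_EXAMPLES} seed_examples, found {len(examples)}"
        )

    for i, example in enumerate(examples, start=1):
        if not str(example.get("context") or "").strip():
            problems.append(f"seed_examples[{i}]: empty context")
        pairs = example.get("questions_and_answers") or []
        if len(pairs) != QA_PAIRS_PER_EXAMPLE:
            problems.append(
                f"seed_examples[{i}]: expected {QA_PAIRS_PER_EXAMPLE} Q&A pairs, "
                f"found {len(pairs)}"
            )
        for j, pair in enumerate(pairs, start=1):
            question = str(pair.get("question") or "")
            answer = str(pair.get("answer") or "")
            if not question.strip():
                problems.append(f"seed_examples[{i}] pair {j}: empty question")
            elif question != question.strip():
                # Flags stray padding, e.g. a leading space inside a quoted question.
                problems.append(
                    f"seed_examples[{i}] pair {j}: question has surrounding whitespace"
                )
            if not answer.strip():
                problems.append(f"seed_examples[{i}] pair {j}: empty answer")

    # The document block must point at the source markdown.
    document = doc.get("document") or {}
    for key in ("repo", "commit", "patterns"):
        if not document.get(key):
            problems.append(f"document: missing '{key}'")

    return problems


if __name__ == "__main__":
    # A deliberately incomplete sample built from values in the patch above.
    sample = {
        "created_by": "naimzieraupro",
        "version": 3,
        "domain": "IBM Website on InstructLab",
        "document_outline": "Setting up and running InstructLab.",
        "seed_examples": [
            {
                "context": "InstructLab is an open-source project for improving LLMs.",
                "questions_and_answers": [
                    {
                        "question": " How does the InstructLab UI simplify the contribution process?",
                        "answer": "Through an intuitive interface.",
                    }
                ],
            }
        ],
        "document": {
            "repo": "https://github.com/naimzieraupro/taxonomy-knowledge-docs",
            "commit": "09806c17232de70888f508e8a80466afe2269f90",
            "patterns": ["ilab-20241021T124347506.md"],
        },
    }
    for problem in check_knowledge_qna(sample):
        print(problem)
```

In practice the dict would come from parsing the file, for example with PyYAML's `yaml.safe_load`. Working on the parsed structure keeps the sketch free of third-party dependencies.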