patch(errors): fix all warnings and errors in build log. fixes #35 (#34)

Signed-off-by: Laura Santamaria <[email protected]>
instructlab · Jan 6, 2025 · 4adf2fa
1 parent a45c143
commit 4adf2fa
Showing 10 changed files with 134 additions and 110 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1 +1,9 @@
 venv*
+
+# uv
+uv.lock
+pyproject.toml
+.python-version
+
+# pycharm
+.idea
diff --git a/docs/adding-data-to-model/creating_new_knowledge_or_skills.md b/docs/adding-data-to-model/creating_new_knowledge_or_skills.md
@@ -162,7 +162,7 @@ ilab model train
 
 ⏳ This step can potentially take **several hours** to complete depending on your computing resources. Please stop `ilab model chat` and `ilab model serve` first to free resources.
 
-When running multi phase training evaluation is run on each phase, we will tell you which checkpoint in this folder performs the best.
+When running multiphase training evaluation is run on each phase, we will tell you which checkpoint in this folder performs the best.
 
 #### Train the model locally on an M-series Mac or on Linux using the full pipeline
 
@@ -237,7 +237,7 @@ On a Mac `ilab model train` outputs a brand-new model that is saved in the `<mod
 
 #### Train the model locally with GPU acceleration
 
-Training has support for GPU acceleration with Nvidia CUDA or AMD ROCm. Please see [the GPU acceleration documentation](./docs/gpu-acceleration.md) for more details. At present, hardware acceleration requires a data center GPU or high-end consumer GPU with at least 18 GB free memory.
+Training has support for GPU acceleration with Nvidia CUDA or AMD ROCm. Please see [the GPU acceleration documentation](https://github.com/instructlab/instructlab/blob/main/docs/gpu-acceleration.md) for more details. At present, hardware acceleration requires a data center GPU or high-end consumer GPU with at least 18 GB free memory.
 
 ```shell
 ilab model train --pipeline accelerated --device cuda --data-path <path-to-sdg-data>
@@ -251,9 +251,9 @@ ilab model train --pipeline full --device cpu --data-path ~/.local/share/instruc
 
 This version of `ilab model train` outputs brand-new models that can be served in the `~/.local/share/instructlab/checkpoints` directory.  These models can be run through `ilab model evaluate` to choose the best one.
 
-#### Train the model locally with multi-phase training and GPU acceleration
+#### Train the model locally with multiphase training and GPU acceleration
 
-`ilab model train` supports multi-phase training. This results in the following workflow:
+`ilab model train` supports multiphase training. This results in the following workflow:
 
 1. We train the model on knowledge
 2. Evaluate the trained model to find the best checkpoint
@@ -264,13 +264,13 @@ This version of `ilab model train` outputs brand-new models that can be served i
 ilab model train --strategy lab-multiphase --phased-phase1-data <knowledge train messages jsonl> --phased-phase2-data <skills train messages jsonl> -y
 ```
 
-This command takes in two `.jsonl` files from your `datasets` directory, one is the knowledge jsonl and the other is a skills jsonl. The `-y` flag skips an interactive prompt asking the user if they are sure they want to run multi-phase training.
+This command takes in two `.jsonl` files from your `datasets` directory, one is the knowledge jsonl and the other is a skills jsonl. The `-y` flag skips an interactive prompt asking the user if they are sure they want to run multiphase training.
 
 ⏳ This command may take 3 or more hours depending on the size of the data and number of training epochs you run.
 
 #### Train the model in the cloud
 
-Follow the instructions in [Training](./notebooks/README.md).
+Follow the instructions in [Training](https://github.com/instructlab/instructlab/blob/main/notebooks/README.md).
 
 ⏳ Approximate amount of time taken on each platform:
 
@@ -452,12 +452,12 @@ argument to specify your new model:
    ilab model chat -m <New model path>
    ```
 
-   If you are interested in optimizing the quality of the model's responses, please see [`TROUBLESHOOTING.md`](./TROUBLESHOOTING.md#model-fine-tuning-and-response-optimization)
+   If you are interested in optimizing the quality of the model's responses, please see [`TROUBLESHOOTING.md`](https://github.com/instructlab/instructlab/blob/main/TROUBLESHOOTING.md#model-fine-tuning-and-response-optimization)
 
 ## 🎁 Submit your new knowledge or skills
 
 Of course, the final step is, if you've improved the model, to open a pull-request in the [taxonomy repository](https://github.com/instructlab/taxonomy) that includes the files (e.g. `qna.yaml`) with your improved data.
 
 ## 📬 Contributing
 
-Check out our [contributing](CONTRIBUTING/CONTRIBUTING.md) guide to learn how to contribute.
+Check out our [contributing](../community/CONTRIBUTING.md) guide to learn how to contribute.
diff --git a/docs/community/CONTRIBUTING.md b/docs/community/CONTRIBUTING.md
@@ -66,7 +66,7 @@ Once you've created a pull request (PR), maintainers will review your code and m
 * Write detailed commit messages
 * Break large changes into a logical series of smaller patches, which are easy to understand individually and combine to solve a broader issue
 
-For a list of the maintainers and triagers, see the [MAINTAINERS.md](MAINTAINERS.md) page.
+For a list of the maintainers and triagers, see the [MAINTAINERS.md](https://github.com/instructlab/community/blob/main/MAINTAINERS.md) page.
 
 ### Proposing new features
 
@@ -93,7 +93,7 @@ Distributed under the [Apache License, Version 2.0](http://www.apache.org/licens
 
 SPDX-License-Identifier: [Apache-2.0](https://spdx.org/licenses/Apache-2.0)
 
-If you would like to see the detailed LICENSE click [here](LICENSE).
+If you would like to see the detailed LICENSE click [here](../LICENSE).
 
 ### Developer Certificate of Origin (DCO)
 
@@ -118,7 +118,7 @@ We automatically verify that all commit messages contain a `Signed-off-by:` line
 
 There are a number of tools that make it easier for developers to manage DCO signoffs.
 
-* DCO command line tool, which let's you do a single signoff for an entire repo ( <https://github.com/coderanger/dco> )
+* DCO command line tool, which lets you do a single signoff for an entire repo ( <https://github.com/coderanger/dco> )
 * GitHub UI integrations for adding the signoff automatically ( <https://github.com/scottrigby/dco-gh-ui> )
 * Chrome - <https://chrome.google.com/webstore/detail/dco-github-ui/onhgmjhnaeipfgacbglaphlmllkpoijo>
 * Firefox - <https://addons.mozilla.org/en-US/firefox/addon/scott-rigby/?src=search>
@@ -133,9 +133,9 @@ The following resources include additional information about each repository, su
 
 ### ilab CLI tool additional resources
 
-* [`ilab` CLI tool README.md](https://github.com/instructlab/instructlab/blob/main/README.md#). This resources provides information about the `ilab` CLI tool, including an overview, getting started, training the model, submitting a pull request, etc.
+* [`ilab` CLI tool README.md](https://github.com/instructlab/instructlab/blob/main/README.md#). This resource provides information about the `ilab` CLI tool, including an overview, getting started, training the model, submitting a pull request, etc.
 
-* [`ilab` CLI tool CONTRIBUTING.md](https://github.com/instructlab/instructlab/blob/main/CONTRIBUTING/CONTRIBUTING.md). This resources provides information about contributing to the `ilab` CLI tool repository, reporting bugs, testing, coding styles, etc.
+* [`ilab` CLI tool CONTRIBUTING.md](https://github.com/instructlab/instructlab/blob/main/CONTRIBUTING/CONTRIBUTING.md). This resource provides information about contributing to the `ilab` CLI tool repository, reporting bugs, testing, coding styles, etc.
 
 ### Taxonomy additional resources
 

diff --git a/docs/community/FAQ.md b/docs/community/FAQ.md
@@ -86,7 +86,7 @@ InstructLab is driven by taxonomies and works by empowering users to add new [_s
 
 ### What are the goals of the InstructLab project?
 
-The goal on the InstructLab project is to emocratize contributions to AI and LLMs. There are two approaches to achieving this goal in our community:
+The goal on the InstructLab project is to democratize contributions to AI and LLMs. There are two approaches to achieving this goal in our community:
 
 * Enabling collaborative contribution to a large language model (LLM) through [the project's _taxonomy_ repository](https://github.com/instructlab/taxonomy). When users contribute to this repository, the project resynthesizes its open source training data. Our community Granite-based model is then retrained, ensuring that community contributions are integrated while enriching the model’s capabilities over time.
 
@@ -118,6 +118,12 @@ When contributors write an addition to the existing taxonomy, make a pull reques
 
 Contributions to the InstructLab project include fine-tuning Granite-7b, an open-source licensed LLM. Contributors have direct access to the model they are improving through [Hugging Face](https://huggingface.co/instructlab).
 
+### What is Merlinite-7b?
+
+Merlinite-7b is a Mistral-7b derivative model fine-tuned with the LAB (**L**arge-scale **A**lignment for chat**B**ots) method using Mixtral-8x7b-Instruct as a teacher model.
+
+More information about the Merlinite-7b can be found on the [Hugging Face project page](https://huggingface.co/instructlab/merlinite-7b-lab).
+
 ### What is Granite-7-lab?
 
 Granite-7b-lab is a model that was built from scratch by IBM and fine tuned with the LAB (**L**arge-scale **A**lignment for chat**B**ots) method.

diff --git a/docs/taxonomy/index.md b/docs/taxonomy/index.md
@@ -66,11 +66,11 @@ domain. Maintainers can decide to change the names of the existing branches or t
    knowledge --> knowledge/miscellaneous_unknown
    knowledge --> knowledge/science
    knowledge --> knowledge/technology
-   knowledge/science --> animals --> birds --> black_capped_chickadee --> black_capped_chikadee-a & black_capped_chikadee-q
+   knowledge/science --> animals --> birds --> black_capped_chickadee --> black_capped_chickadee-a & black_capped_chickadee-q
    knowledge/science --> astronomy --> constellations --> phoenix --> phoenix-a & phoenix-q
 
-   black_capped_chikadee-a{attribution.txt}
-   black_capped_chikadee-q{qna.yaml}
+   black_capped_chickadee-a{attribution.txt}
+   black_capped_chickadee-q{qna.yaml}
    phoenix-a{attribution.txt}
    phoenix-q{qna.yaml}
    classDef na fill:#EEE
@@ -108,7 +108,7 @@ This taxonomy repository will be used as the seed to synthesize the training dat
 
 By contributing your skills and knowledge to this repository, you will see your changes built into an LLM within days of your contribution rather than months or years! If you are working with a model and notice its knowledge or ability lacking, you can correct it by contributing knowledge or skills and check if it's improved after your changes are built.
 
-While public contributions are welcome to help drive community progress, you can also fork this repository under [the Apache License, Version 2.0](LICENSE), add your own internal skills, and train your own models internally. However, you might need your own access to significant compute infrastructure to perform sufficient retraining.
+While public contributions are welcome to help drive community progress, you can also fork this repository under [the Apache License, Version 2.0](../LICENSE), add your own internal skills, and train your own models internally. However, you might need your own access to significant compute infrastructure to perform sufficient retraining.
 
 ## Ways to Contribute
 
@@ -121,10 +121,10 @@ For more information, see the [Ways of contributing to the taxonomy repository](
 
 ## How to contribute skills and knowledge
 
-To contribute to this repo, you'll use the *Fork and Pull* model common in many open source repositories. You can add your skills and knowledge to the taxonomy in multiple ways; for additional information on how to make a contribution, see the [Documentation on contributing](CONTRIBUTING.md). You can also use the following guides to help with contributing:
+To contribute to this repo, you'll use the *Fork and Pull* model common in many open source repositories. You can add your skills and knowledge to the taxonomy in multiple ways; for additional information on how to make a contribution, see the [Documentation on contributing](../community/CONTRIBUTING.md). You can also use the following guides to help with contributing:
 
-- Contributing using the [GitHub webpage UI](docs/contributing_via_GH_UI.md).
-- Contributing knowledge to the taxonomy in the [Knowledge contribution guidelines](docs/knowledge-contribution-guide.md).
+- Contributing using the [GitHub webpage UI](https://github.com/instructlab/taxonomy/blob/main/docs/contributing_via_GH_UI.md).
+- Contributing knowledge to the taxonomy in the [Knowledge contribution guidelines](../taxonomy/knowledge/guide.md).
 
 ### Why should I contribute?
 

diff --git a/docs/user-interface/knowledge_contributions.md b/docs/user-interface/knowledge_contributions.md
@@ -8,7 +8,7 @@ The UI Simplifies the process for Skills & Knowledge contributions by:
 
 * Minimising risk of human error when writing YAML by using the web form. 
 
-* Directly submit a github pull request with a press of a button.
+* Directly submit a GitHub pull request with a press of a button.
 
 When the form is filled out, you also are given the option to download the YAML and attribution files to your local machine, and to view the form in its original YAML structure before submission.
 
@@ -17,11 +17,11 @@ You can view all your submissions on the dashboard page.
 !!! warning
     Even when running the UI locally, you must be logged in via github to successfully submit your Knowledge and Skills contributions. You can still fill out the form, and download the YAML and attribution files.
 
-For tips on writing Skills & Knowledge contributions, please visit the documentation under the [Taxonomy](/taxonomy/) heading.
+For tips on writing Skills & Knowledge contributions, please visit the documentation under the [Taxonomy](../taxonomy/index.md) heading.
 
 ## Knowledge Contributions
 
-Firstly you will need to find a source document for your knowledge. Accepted sources can be found [here](/taxonomy/knowledge/guide).
+Firstly you will need to find a source document for your knowledge. Accepted sources can be found [here](../taxonomy/knowledge/guide.md).
 
 Navigate to the Contribute section of the sidebar and click Knowledge. Here you will see the form to contribute Knowledge to the open-source taxonomy tree.
 
@@ -51,13 +51,13 @@ Here you will begin filling out your QNA examples that represent the knowledge y
 
 ### Document Information
 
-You must prepare a markdown file version of the document you wish to use for the knowledge submission. By dragging and dropping the markdown file into the box, and clicking the submit files button, a forked version of the taxonomy repository will be automatically created on your github profile. 
+You must prepare a markdown file version of the document you wish to use for the knowledge submission. By dragging and dropping the markdown file into the box, and clicking the submit files button, a forked version of the taxonomy repository will be automatically created on your GitHub profile. 
 
 ![UI Knowledge Document Information](../images/user-interface/ui_knowledge_document_info.png)
 
-![Forked Repository Showcase](../images//user-interface/ui_knowledge_repo_created.png)
+![Forked Repository Showcase](../images/user-interface/ui_knowledge_repo_created.png)
 
-If you've already uploaded the markdown file to your github, you can switch to manually adding the document, and entering the `commit sha`.
+If you've already uploaded the markdown file to your GitHub, you can switch to manually adding the document, and entering the `commit sha`.
 
 ![UI Knowledge Document Manual Information](../images/user-interface/ui_knowledge_document_manual_info.png)
 
@@ -77,4 +77,4 @@ Once you have submitted a Skills or Knowledge Contribution, you can view it on y
 
 ![UI Dashboard With Contribution](../images/user-interface/ui_dashboard_with_submission.png)
 
-[Next Steps](/user-interface/skills_contributions/){: .md-button .md-button--primary }
+[Next Steps](skills_contributions.md){: .md-button .md-button--primary }
diff --git a/docs/user-interface/playground_chat.md b/docs/user-interface/playground_chat.md
@@ -4,28 +4,28 @@ description: Steps to set up the playground to chat with a model
 logo: images/ilab_dog.png
 ---
 
-To run with a locally run model, make sure that iLab model serve is running in a seperate terminal. If you are unsure on how to do this, please visit the [Intro to serve and chat](/getting-started/serve_and_chat/) section of this document.
+To run with a locally run model, make sure that iLab model serve is running in a separate terminal. If you are unsure on how to do this, please visit the [Intro to serve and chat](../getting-started/serve_and_chat.md) section of this document.
 
-If you go to `Playground > Chat` by using the side navigation bar, you can interact with the merlinite and granite models. 
+If you go to `Playground > Chat` by using the side navigation bar, you can interact with the Merlinite and Granite models. 
 
 ![UI No Model Response](../images/user-interface/ui_no_model_response.png)
 
-If you are running the ui within a dev environment, the model won't reply because a granite/merinite model endpoint hasn't been given. In this case, we will create a new custom model endpoint, using our locally hosted quantised model.
+If you are running the ui within a dev environment, the model won't reply because a Granite/Merinite model endpoint hasn't been given. In this case, we will create a new custom model endpoint, using our locally hosted quantised model.
 
 To add a custom model endpoint, go to `Playground > Custom Model Endpoints` and press the `Add Endpoint` button on the right side. 
 
-You will have 3 fields to fill out
+You will have 3 fields to fill out:
 
 * The URL, where your customised model is hosted, if hosting locally, the URL would be `http://127.0.0.1:8000/`
 
 * The Model Name, `merlinite-7b-lab-Q4_K_M.gguf`
 
-* API Key, you may put any text in here; in this case I've used`randomCharacters`. If you are setting up an API key, please provide the key in this section.
+* API Key, you may put any text in here; in this case I've used `randomCharacters`. If you are setting up an API key, please provide the key in this section.
 
 ![UI Custom Model Endpoint](../images/user-interface/ui_custom_model_endpoint.png)
 
 Go back to the playground chat, select newly added model and chat.
 
 ![UI Model Response](../images/user-interface/ui_model_response.png)
 
-[Next Steps](/user-interface/knowledge_contributions/){: .md-button .md-button--primary }
+[Next Steps](knowledge_contributions.md){: .md-button .md-button--primary }
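
Several of the link changes in this diff follow one pattern: site-absolute targets such as `/taxonomy/` or `/user-interface/skills_contributions/` are replaced with relative paths to `.md` files. A minimal sketch of a check for that pattern is shown below. It assumes Python is available. The name `find_absolute_links` and the regular expression are illustrative only and are not part of the repository or its build tooling. It only looks at inline links and images, so reference-style links would need separate handling.

```python
import re

# Matches inline Markdown links and images whose target starts with a single "/",
# e.g. [Taxonomy](/taxonomy/). Protocol-relative targets ("//host/...") are skipped.
ABSOLUTE_LINK = re.compile(r"!?\[([^\]]*)\]\((/(?!/)[^)\s]*)\)")


def find_absolute_links(markdown_text):
    """Return (line_number, link_text, target) for each site-absolute link."""
    hits = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for match in ABSOLUTE_LINK.finditer(line):
            hits.append((line_number, match.group(1), match.group(2)))
    return hits


# Sample lines taken from the removed and added sides of this diff.
sample = (
    "please visit the documentation under the [Taxonomy](/taxonomy/) heading.\n"
    "Accepted sources can be found [here](../taxonomy/knowledge/guide.md).\n"
    "[Next Steps](/user-interface/skills_contributions/){: .md-button .md-button--primary }\n"
)

for line_number, text, target in find_absolute_links(sample):
    print(f"line {line_number}: [{text}]({target})")
```

Run against the sample, the sketch reports the two site-absolute links and ignores the relative `.md` link, which is the form this commit moves toward.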