patch(errors): fix all warnings and errors in build log. fixes #35 (#34)
Signed-off-by: Laura Santamaria <[email protected]>
nimbinatus authored Jan 6, 2025
1 parent a45c143 commit 4adf2fa
Showing 10 changed files with 134 additions and 110 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -1 +1,9 @@
venv*

# uv
uv.lock
pyproject.toml
.python-version

# pycharm
.idea
16 changes: 8 additions & 8 deletions docs/adding-data-to-model/creating_new_knowledge_or_skills.md
@@ -162,7 +162,7 @@ ilab model train
⏳ This step can potentially take **several hours** to complete depending on your computing resources. Please stop `ilab model chat` and `ilab model serve` first to free resources.
When running multi phase training evaluation is run on each phase, we will tell you which checkpoint in this folder performs the best.
When running multiphase training evaluation is run on each phase, we will tell you which checkpoint in this folder performs the best.
#### Train the model locally on an M-series Mac or on Linux using the full pipeline
@@ -237,7 +237,7 @@ On a Mac `ilab model train` outputs a brand-new model that is saved in the `<mod
#### Train the model locally with GPU acceleration
Training has support for GPU acceleration with Nvidia CUDA or AMD ROCm. Please see [the GPU acceleration documentation](./docs/gpu-acceleration.md) for more details. At present, hardware acceleration requires a data center GPU or high-end consumer GPU with at least 18 GB free memory.
Training has support for GPU acceleration with Nvidia CUDA or AMD ROCm. Please see [the GPU acceleration documentation](https://github.com/instructlab/instructlab/blob/main/docs/gpu-acceleration.md) for more details. At present, hardware acceleration requires a data center GPU or high-end consumer GPU with at least 18 GB free memory.
```shell
ilab model train --pipeline accelerated --device cuda --data-path <path-to-sdg-data>
@@ -251,9 +251,9 @@ ilab model train --pipeline full --device cpu --data-path ~/.local/share/instruc
This version of `ilab model train` outputs brand-new models that can be served in the `~/.local/share/instructlab/checkpoints` directory. These models can be run through `ilab model evaluate` to choose the best one.
#### Train the model locally with multi-phase training and GPU acceleration
#### Train the model locally with multiphase training and GPU acceleration
`ilab model train` supports multi-phase training. This results in the following workflow:
`ilab model train` supports multiphase training. This results in the following workflow:
1. We train the model on knowledge
2. Evaluate the trained model to find the best checkpoint
@@ -264,13 +264,13 @@ This version of `ilab model train` outputs brand-new models that can be served i
ilab model train --strategy lab-multiphase --phased-phase1-data <knowledge train messages jsonl> --phased-phase2-data <skills train messages jsonl> -y
```
This command takes in two `.jsonl` files from your `datasets` directory, one is the knowledge jsonl and the other is a skills jsonl. The `-y` flag skips an interactive prompt asking the user if they are sure they want to run multi-phase training.
This command takes in two `.jsonl` files from your `datasets` directory, one is the knowledge jsonl and the other is a skills jsonl. The `-y` flag skips an interactive prompt asking the user if they are sure they want to run multiphase training.
⏳ This command may take 3 or more hours depending on the size of the data and number of training epochs you run.
#### Train the model in the cloud
Follow the instructions in [Training](./notebooks/README.md).
Follow the instructions in [Training](https://github.com/instructlab/instructlab/blob/main/notebooks/README.md).
⏳ Approximate amount of time taken on each platform:
@@ -452,12 +452,12 @@ argument to specify your new model:
ilab model chat -m <New model path>
```
If you are interested in optimizing the quality of the model's responses, please see [`TROUBLESHOOTING.md`](./TROUBLESHOOTING.md#model-fine-tuning-and-response-optimization)
If you are interested in optimizing the quality of the model's responses, please see [`TROUBLESHOOTING.md`](https://github.com/instructlab/instructlab/blob/main/TROUBLESHOOTING.md#model-fine-tuning-and-response-optimization)

## 🎁 Submit your new knowledge or skills

Of course, the final step is, if you've improved the model, to open a pull-request in the [taxonomy repository](https://github.com/instructlab/taxonomy) that includes the files (e.g. `qna.yaml`) with your improved data.
## 📬 Contributing
Check out our [contributing](CONTRIBUTING/CONTRIBUTING.md) guide to learn how to contribute.
Check out our [contributing](../community/CONTRIBUTING.md) guide to learn how to contribute.
10 changes: 5 additions & 5 deletions docs/community/CONTRIBUTING.md
@@ -66,7 +66,7 @@ Once you've created a pull request (PR), maintainers will review your code and m
* Write detailed commit messages
* Break large changes into a logical series of smaller patches, which are easy to understand individually and combine to solve a broader issue

For a list of the maintainers and triagers, see the [MAINTAINERS.md](MAINTAINERS.md) page.
For a list of the maintainers and triagers, see the [MAINTAINERS.md](https://github.com/instructlab/community/blob/main/MAINTAINERS.md) page.

### Proposing new features

@@ -93,7 +93,7 @@ Distributed under the [Apache License, Version 2.0](http://www.apache.org/licens

SPDX-License-Identifier: [Apache-2.0](https://spdx.org/licenses/Apache-2.0)

If you would like to see the detailed LICENSE click [here](LICENSE).
If you would like to see the detailed LICENSE click [here](../LICENSE).

### Developer Certificate of Origin (DCO)

@@ -118,7 +118,7 @@ We automatically verify that all commit messages contain a `Signed-off-by:` line

There are a number of tools that make it easier for developers to manage DCO signoffs.

* DCO command line tool, which let's you do a single signoff for an entire repo ( <https://github.com/coderanger/dco> )
* DCO command line tool, which lets you do a single signoff for an entire repo ( <https://github.com/coderanger/dco> )
* GitHub UI integrations for adding the signoff automatically ( <https://github.com/scottrigby/dco-gh-ui> )
* Chrome - <https://chrome.google.com/webstore/detail/dco-github-ui/onhgmjhnaeipfgacbglaphlmllkpoijo>
* Firefox - <https://addons.mozilla.org/en-US/firefox/addon/scott-rigby/?src=search>
@@ -133,9 +133,9 @@ The following resources include additional information about each repository, su

### ilab CLI tool additional resources

* [`ilab` CLI tool README.md](https://github.com/instructlab/instructlab/blob/main/README.md#). This resources provides information about the `ilab` CLI tool, including an overview, getting started, training the model, submitting a pull request, etc.
* [`ilab` CLI tool README.md](https://github.com/instructlab/instructlab/blob/main/README.md#). This resource provides information about the `ilab` CLI tool, including an overview, getting started, training the model, submitting a pull request, etc.

* [`ilab` CLI tool CONTRIBUTING.md](https://github.com/instructlab/instructlab/blob/main/CONTRIBUTING/CONTRIBUTING.md). This resources provides information about contributing to the `ilab` CLI tool repository, reporting bugs, testing, coding styles, etc.
* [`ilab` CLI tool CONTRIBUTING.md](https://github.com/instructlab/instructlab/blob/main/CONTRIBUTING/CONTRIBUTING.md). This resource provides information about contributing to the `ilab` CLI tool repository, reporting bugs, testing, coding styles, etc.

### Taxonomy additional resources

8 changes: 7 additions & 1 deletion docs/community/FAQ.md
@@ -86,7 +86,7 @@ InstructLab is driven by taxonomies and works by empowering users to add new [_s

### What are the goals of the InstructLab project?

The goal on the InstructLab project is to emocratize contributions to AI and LLMs. There are two approaches to achieving this goal in our community:
The goal on the InstructLab project is to democratize contributions to AI and LLMs. There are two approaches to achieving this goal in our community:

* Enabling collaborative contribution to a large language model (LLM) through [the project's _taxonomy_ repository](https://github.com/instructlab/taxonomy). When users contribute to this repository, the project resynthesizes its open source training data. Our community Granite-based model is then retrained, ensuring that community contributions are integrated while enriching the model’s capabilities over time.

@@ -118,6 +118,12 @@ When contributors write an addition to the existing taxonomy, make a pull reques

Contributions to the InstructLab project include fine-tuning Granite-7b, an open-source licensed LLM. Contributors have direct access to the model they are improving through [Hugging Face](https://huggingface.co/instructlab).

### What is Merlinite-7b?

Merlinite-7b is a Mistral-7b derivative model fine-tuned with the LAB (**L**arge-scale **A**lignment for chat**B**ots) method using Mixtral-8x7b-Instruct as a teacher model.

More information about the Merlinite-7b can be found on the [Hugging Face project page](https://huggingface.co/instructlab/merlinite-7b-lab).

### What is Granite-7-lab?

Granite-7b-lab is a model that was built from scratch by IBM and fine tuned with the LAB (**L**arge-scale **A**lignment for chat**B**ots) method.
14 changes: 7 additions & 7 deletions docs/taxonomy/index.md
@@ -66,11 +66,11 @@ domain. Maintainers can decide to change the names of the existing branches or t
knowledge --> knowledge/miscellaneous_unknown
knowledge --> knowledge/science
knowledge --> knowledge/technology
knowledge/science --> animals --> birds --> black_capped_chickadee --> black_capped_chikadee-a & black_capped_chikadee-q
knowledge/science --> animals --> birds --> black_capped_chickadee --> black_capped_chickadee-a & black_capped_chickadee-q
knowledge/science --> astronomy --> constellations --> phoenix --> phoenix-a & phoenix-q
black_capped_chikadee-a{attribution.txt}
black_capped_chikadee-q{qna.yaml}
black_capped_chickadee-a{attribution.txt}
black_capped_chickadee-q{qna.yaml}
phoenix-a{attribution.txt}
phoenix-q{qna.yaml}
classDef na fill:#EEE
@@ -108,7 +108,7 @@ This taxonomy repository will be used as the seed to synthesize the training dat

By contributing your skills and knowledge to this repository, you will see your changes built into an LLM within days of your contribution rather than months or years! If you are working with a model and notice its knowledge or ability lacking, you can correct it by contributing knowledge or skills and check if it's improved after your changes are built.

While public contributions are welcome to help drive community progress, you can also fork this repository under [the Apache License, Version 2.0](LICENSE), add your own internal skills, and train your own models internally. However, you might need your own access to significant compute infrastructure to perform sufficient retraining.
While public contributions are welcome to help drive community progress, you can also fork this repository under [the Apache License, Version 2.0](../LICENSE), add your own internal skills, and train your own models internally. However, you might need your own access to significant compute infrastructure to perform sufficient retraining.

## Ways to Contribute

@@ -121,10 +121,10 @@ For more information, see the [Ways of contributing to the taxonomy repository](

## How to contribute skills and knowledge

To contribute to this repo, you'll use the *Fork and Pull* model common in many open source repositories. You can add your skills and knowledge to the taxonomy in multiple ways; for additional information on how to make a contribution, see the [Documentation on contributing](CONTRIBUTING.md). You can also use the following guides to help with contributing:
To contribute to this repo, you'll use the *Fork and Pull* model common in many open source repositories. You can add your skills and knowledge to the taxonomy in multiple ways; for additional information on how to make a contribution, see the [Documentation on contributing](../community/CONTRIBUTING.md). You can also use the following guides to help with contributing:

- Contributing using the [GitHub webpage UI](docs/contributing_via_GH_UI.md).
- Contributing knowledge to the taxonomy in the [Knowledge contribution guidelines](docs/knowledge-contribution-guide.md).
- Contributing using the [GitHub webpage UI](https://github.com/instructlab/taxonomy/blob/main/docs/contributing_via_GH_UI.md).
- Contributing knowledge to the taxonomy in the [Knowledge contribution guidelines](../taxonomy/knowledge/guide.md).

### Why should I contribute?

14 changes: 7 additions & 7 deletions docs/user-interface/knowledge_contributions.md
@@ -8,7 +8,7 @@ The UI Simplifies the process for Skills & Knowledge contributions by:

* Minimising risk of human error when writing YAML by using the web form.

* Directly submit a github pull request with a press of a button.
* Directly submit a GitHub pull request with a press of a button.

When the form is filled out, you also are given the option to download the YAML and attribution files to your local machine, and to view the form in its original YAML structure before submission.

@@ -17,11 +17,11 @@ You can view all your submissions on the dashboard page.
!!! warning
Even when running the UI locally, you must be logged in via github to successfully submit your Knowledge and Skills contributions. You can still fill out the form, and download the YAML and attribution files.

For tips on writing Skills & Knowledge contributions, please visit the documentation under the [Taxonomy](/taxonomy/) heading.
For tips on writing Skills & Knowledge contributions, please visit the documentation under the [Taxonomy](../taxonomy/index.md) heading.

## Knowledge Contributions

Firstly you will need to find a source document for your knowledge. Accepted sources can be found [here](/taxonomy/knowledge/guide).
Firstly you will need to find a source document for your knowledge. Accepted sources can be found [here](../taxonomy/knowledge/guide.md).

Navigate to the Contribute section of the sidebar and click Knowledge. Here you will see the form to contribute Knowledge to the open-source taxonomy tree.

@@ -51,13 +51,13 @@ Here you will begin filling out your QNA examples that represent the knowledge y

### Document Information

You must prepare a markdown file version of the document you wish to use for the knowledge submission. By dragging and dropping the markdown file into the box, and clicking the submit files button, a forked version of the taxonomy repository will be automatically created on your github profile.
You must prepare a markdown file version of the document you wish to use for the knowledge submission. By dragging and dropping the markdown file into the box, and clicking the submit files button, a forked version of the taxonomy repository will be automatically created on your GitHub profile.

![UI Knowledge Document Information](../images/user-interface/ui_knowledge_document_info.png)

![Forked Repository Showcase](../images//user-interface/ui_knowledge_repo_created.png)
![Forked Repository Showcase](../images/user-interface/ui_knowledge_repo_created.png)

If you've already uploaded the markdown file to your github, you can switch to manually adding the document, and entering the `commit sha`.
If you've already uploaded the markdown file to your GitHub, you can switch to manually adding the document, and entering the `commit sha`.

![UI Knowledge Document Manual Information](../images/user-interface/ui_knowledge_document_manual_info.png)

@@ -77,4 +77,4 @@ Once you have submitted a Skills or Knowledge Contribution, you can view it on y

![UI Dashboard With Contribution](../images/user-interface/ui_dashboard_with_submission.png)

[Next Steps](/user-interface/skills_contributions/){: .md-button .md-button--primary }
[Next Steps](skills_contributions.md){: .md-button .md-button--primary }
12 changes: 6 additions & 6 deletions docs/user-interface/playground_chat.md
@@ -4,28 +4,28 @@ description: Steps to set up the playground to chat with a model
logo: images/ilab_dog.png
---

To run with a locally run model, make sure that iLab model serve is running in a seperate terminal. If you are unsure on how to do this, please visit the [Intro to serve and chat](/getting-started/serve_and_chat/) section of this document.
To run with a locally run model, make sure that iLab model serve is running in a separate terminal. If you are unsure on how to do this, please visit the [Intro to serve and chat](../getting-started/serve_and_chat.md) section of this document.

If you go to `Playground > Chat` by using the side navigation bar, you can interact with the merlinite and granite models.
If you go to `Playground > Chat` by using the side navigation bar, you can interact with the Merlinite and Granite models.

![UI No Model Response](../images/user-interface/ui_no_model_response.png)

If you are running the ui within a dev environment, the model won't reply because a granite/merinite model endpoint hasn't been given. In this case, we will create a new custom model endpoint, using our locally hosted quantised model.
If you are running the ui within a dev environment, the model won't reply because a Granite/Merinite model endpoint hasn't been given. In this case, we will create a new custom model endpoint, using our locally hosted quantised model.

To add a custom model endpoint, go to `Playground > Custom Model Endpoints` and press the `Add Endpoint` button on the right side.

You will have 3 fields to fill out
You will have 3 fields to fill out:

* The URL, where your customised model is hosted, if hosting locally, the URL would be `http://127.0.0.1:8000/`

* The Model Name, `merlinite-7b-lab-Q4_K_M.gguf`

* API Key, you may put any text in here; in this case I've used`randomCharacters`. If you are setting up an API key, please provide the key in this section.
* API Key, you may put any text in here; in this case I've used `randomCharacters`. If you are setting up an API key, please provide the key in this section.

![UI Custom Model Endpoint](../images/user-interface/ui_custom_model_endpoint.png)

Go back to the playground chat, select newly added model and chat.

![UI Model Response](../images/user-interface/ui_model_response.png)

[Next Steps](/user-interface/knowledge_contributions/){: .md-button .md-button--primary }
[Next Steps](knowledge_contributions.md){: .md-button .md-button--primary }
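
Many of the changes in this commit rewrite link targets: site-absolute paths such as `/taxonomy/` become relative paths to `.md` files, and links into other repositories become full `https://github.com/...` URLs. The sketch below is not part of the commit; it is a hypothetical helper (`find_links` and `classify_link` are names invented here) showing one way to list the inline links in a Markdown string and sort them into those categories before a docs build. It assumes simple inline `[text](target)` links and does not handle reference-style links or targets containing parentheses.

```python
import re

# Inline Markdown links and images: [text](target) or ![alt](target).
# Assumption: the target contains no whitespace or ")" characters.
LINK_RE = re.compile(r"!?\[[^\]]*\]\(([^)\s]+)[^)]*\)")

# A URI scheme such as "https:" or "mailto:" at the start of the target.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def classify_link(target: str) -> str:
    """Label a link target as external, site-absolute, anchor, or relative."""
    if SCHEME_RE.match(target):
        return "external"
    if target.startswith("/"):
        return "site-absolute"
    if target.startswith("#"):
        return "anchor"
    return "relative"


def find_links(markdown: str) -> list[tuple[str, str]]:
    """Return (target, label) pairs for every inline link in the text."""
    return [(target, classify_link(target)) for target in LINK_RE.findall(markdown)]


if __name__ == "__main__":
    sample = (
        "See the [Taxonomy](/taxonomy/) heading and the "
        "[guide](../taxonomy/knowledge/guide.md)."
    )
    for target, label in find_links(sample):
        print(f"{label:14} {target}")
```

Run over the files touched here, a check like this would flag the pre-commit targets `/taxonomy/` and `/user-interface/skills_contributions/` as site-absolute, which is the form this commit replaces with relative paths.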