
Commit

adding content to llm_explore
Andrew Sheet committed Oct 14, 2024
1 parent d7a04b1 commit 6b676f6
Showing 1 changed file with 20 additions and 1 deletion.
21 changes: 20 additions & 1 deletion content/modules/ROOT/pages/60_llm_explore.adoc
@@ -1,15 +1,29 @@
# What is a Large Language Model?

A Large Language Model (LLM) is an instance of a foundation model. Foundation models are pre-trained on large amounts of unlabeled data using self-supervised learning. This means that the model learns from patterns in the data in a way that produces generalizable and adaptable output. LLMs are foundation models applied specifically to text and text-like content, such as code.

Large language models are trained on large datasets of text, such as books, articles, and conversations. These datasets can be extremely large, on the order of petabytes. Training is the process of teaching the LLM to understand and generate language; it uses algorithms to learn patterns in the data and predict what comes next. ~https://www.ibm.com/topics/large-language-models[1]~ Training an LLM on your own data can help ensure that it gives answers appropriate to your domain.

The term 'large' in LLM refers to the number of parameters in the model. These parameters are variables that the model uses to make predictions. The higher the number of parameters, the more detailed and nuanced the AI's understanding of language can be. However, training such models requires considerable computational resources and specialized expertise. ~https://www.run.ai/guides/machine-learning-engineering/llm-training[2]~
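To put parameter counts in perspective: stored in 16-bit precision, each parameter occupies two bytes, so the 80-million-parameter flan-t5-small model used later in this module takes roughly 80M × 2 bytes ≈ 160 MB for its weights, while a 3-billion-parameter model such as granite-3b-code-base needs about 6 GB before accounting for activations and KV cache.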

There are many different types of LLMs for different use cases. Be sure to choose the appropriate one for your specific use case.

# Explore LLMs

In the https://github.com/redhat-ai-services/ai-accelerator[ai-accelerator project], there is an example of an LLM. Let's look at the https://github.com/redhat-ai-services/ai-accelerator/tree/main/tenants/ai-example/single-model-serving-tgis[single-model-serving-tgis] example.

This inference service uses the https://huggingface.co/google/flan-t5-small[flan-t5-small] model.

FLAN-T5 is a Large Language Model open sourced by Google under the Apache license at the end of 2022. We are using the small variant, which has 80 million parameters. FLAN-T5 models combine the pretrained T5 (Text-to-Text Transfer Transformer) model with the FLAN (Finetuning Language Models) collection to fine-tune the model on multiple tasks.

The model was uploaded to MinIO S3 storage automatically when we ran the bootstrap script. The inference service uses the _TGIS Standalone ServingRuntime for KServe_ and is _**not**_ using a GPU.
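As a rough sketch of how these pieces fit together, a KServe InferenceService points at a model in S3 storage and names the runtime that should serve it. The name, model format, and bucket path below are illustrative assumptions, not the repository's actual values; check the manifests under tenants/ai-example/single-model-serving-tgis for the real definitions.

[source,yaml]
----
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: flan-t5-small          # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch          # assumption; check the actual manifest
      runtime: tgis-runtime    # must match the ServingRuntime's name
      storageUri: s3://models/flan-t5-small/   # illustrative bucket path
      resources:
        limits:
          cpu: "2"             # CPU only; no nvidia.com/gpu request here
          memory: 8Gi
----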

Take a look at the InferenceService and the ServingRuntime resource in your _**Demo**_ cluster.

Now let's take a look at the https://github.com/redhat-ai-services/ai-accelerator/tree/main/tenants/ai-example/single-model-serving-vllm[single-model-serving-vllm] example. This inference service uses IBM's https://huggingface.co/ibm-granite/granite-3b-code-base[granite-3b-code-base] model.

The Granite-3B-Code-Base-2K is a decoder-only code model designed for code generative tasks (e.g., code generation, code explanation, code fixing). It was trained from scratch with a two-phase training strategy. In phase 1, the model was trained on 4 trillion tokens sourced from 116 programming languages, ensuring a comprehensive understanding of programming languages and syntax. In phase 2, the model was trained on 500 billion tokens with a carefully designed mixture of high-quality data from code and natural-language domains to improve its ability to reason and follow instructions.

Prominent enterprise use cases of LLMs in software engineering productivity include code generation, code explanation, code fixing, generating unit tests, generating documentation, addressing technical debt, vulnerability detection, code translation, and more. All Granite Code Base models, including this 3B-parameter model, can handle these tasks, as they were trained on a large amount of code data from 116 programming languages.
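Since this model is served on a GPU, its InferenceService additionally requests a GPU and tolerates the taint on the GPU nodes (taints are covered in the next section). Again, this is a hedged sketch with illustrative names and values, not the repository's exact manifest:

[source,yaml]
----
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: granite-3b-code-base   # illustrative name
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch          # assumption; check the actual manifest
      runtime: vllm-runtime    # must match the vLLM ServingRuntime's name
      storageUri: s3://models/granite-3b-code-base/   # illustrative path
      resources:
        limits:
          nvidia.com/gpu: "1"  # schedules the predictor onto a GPU node
    tolerations:
      - key: nvidia.com/gpu    # assumption; must match the GPU node taint
        operator: Exists
        effect: NoSchedule
----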

The InferenceService uses a vLLM ServingRuntime, which can be found https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/serving-runtimes/vllm_runtime/vllm-runtime.yaml[here].
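In outline, a ServingRuntime declares the model formats it can serve and the container that serves them; KServe mounts the model from the InferenceService's storageUri into that container. The image, arguments, and port below are illustrative assumptions; see the linked vllm-runtime.yaml for the actual definition.

[source,yaml]
----
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime           # illustrative name
spec:
  supportedModelFormats:
    - name: pytorch            # assumption; check the linked YAML
      autoSelect: true
  containers:
    - name: kserve-container
      image: vllm/vllm-openai:latest   # illustrative image and tag
      args:
        - --model=/mnt/models  # KServe mounts the model at this path
        - --port=8080
      ports:
        - containerPort: 8080
          protocol: TCP
----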

### Nodes and Taints
@@ -62,4 +76,9 @@ After exploring the GPU Node details, open RHOAI and launch new workbench and ru
- tenants/ai-example/single-model-serving-tgis/test
- tenants/ai-example/single-model-serving-vllm/test
These are very simple tests to make sure that the InferenceService is working. View the logs of the inference service pod while you test.
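As an illustration only (not the repository's actual test), a minimal smoke test against the vLLM service's OpenAI-compatible endpoint could be expressed as a Kubernetes Job that posts a completion request; the Job name, service URL, and model name below are hypothetical:

[source,yaml]
----
apiVersion: batch/v1
kind: Job
metadata:
  name: vllm-smoke-test        # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: curl
          image: curlimages/curl:latest
          args:
            - -sS
            - -X
            - POST
            - -H
            - "Content-Type: application/json"
            - -d
            - '{"model": "/mnt/models", "prompt": "def hello():", "max_tokens": 20}'
            # hypothetical in-cluster predictor URL; use your actual endpoint
            - http://granite-3b-code-base-predictor:8080/v1/completions
----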


### References
1. https://www.ibm.com/topics/large-language-models[]
2. https://www.run.ai/guides/machine-learning-engineering/llm-training[]
