diff --git a/docs/docs/integrations/chat/groq.ipynb b/docs/docs/integrations/chat/groq.ipynb index 01b1549a82d1b..556a9f208f42c 100644 --- a/docs/docs/integrations/chat/groq.ipynb +++ b/docs/docs/integrations/chat/groq.ipynb @@ -2,298 +2,259 @@ "cells": [ { "cell_type": "raw", - "metadata": { - "vscode": { - "languageId": "raw" - } - }, + "id": "afaf8039", + "metadata": {}, "source": [ "---\n", "sidebar_label: Groq\n", - "keywords: [chatgroq]\n", "---" ] }, { "cell_type": "markdown", + "id": "e49f1e0d", "metadata": {}, "source": [ - "# Groq\n", + "# ChatGroq\n", + "\n", + "This will help you get started with Groq [chat models](../../concepts.mdx#chat-models). For detailed documentation of all ChatGroq features and configurations, head to the [API reference](https://api.python.langchain.com/en/latest/chat_models/langchain_groq.chat_models.ChatGroq.html). For a list of all Groq models, visit this [link](https://console.groq.com/docs/models).\n", + "\n", + "## Overview\n", + "### Integration details\n", "\n", - "LangChain supports integration with [Groq](https://groq.com/) chat models. 
Groq specializes in fast AI inference.\n", + "| Class | Package | Local | Serializable | [JS support](https://js.langchain.com/v0.2/docs/integrations/chat/groq) | Package downloads | Package latest |\n", + "| :--- | :--- | :---: | :---: | :---: | :---: | :---: |\n", + "| [ChatGroq](https://api.python.langchain.com/en/latest/chat_models/langchain_groq.chat_models.ChatGroq.html) | [langchain-groq](https://api.python.langchain.com/en/latest/groq_api_reference.html) | ❌ | beta | ✅ | ![PyPI - Downloads](https://img.shields.io/pypi/dm/langchain-groq?style=flat-square&label=%20) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-groq?style=flat-square&label=%20) |\n", "\n", - "To get started, you'll first need to install the langchain-groq package:" + "### Model features\n", + "| [Tool calling](../../how_to/tool_calling.ipynb) | [Structured output](../../how_to/structured_output.ipynb) | JSON mode | [Image input](../../how_to/multimodal_inputs.ipynb) | Audio input | Video input | [Token-level streaming](../../how_to/chat_streaming.ipynb) | Native async | [Token usage](../../how_to/chat_token_usage_tracking.ipynb) | [Logprobs](../../how_to/logprobs.ipynb) |\n", + "| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |\n", + "| ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | \n", + "\n", + "## Setup\n", + "\n", + "To access Groq models you'll need to create a Groq account, get an API key, and install the `langchain-groq` integration package.\n", + "\n", + "### Credentials\n", + "\n", + "Head to the [Groq console](https://console.groq.com/keys) to sign up to Groq and generate an API key. 
Once you've done this set the GROQ_API_KEY environment variable:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, + "id": "433e8d2b-9519-4b49-b2c4-7ab65b046c94", "metadata": {}, "outputs": [], "source": [ - "%pip install -qU langchain-groq" + "import getpass\n", + "import os\n", + "\n", + "os.environ[\"GROQ_API_KEY\"] = getpass.getpass(\"Enter your Groq API key: \")" ] }, { "cell_type": "markdown", + "id": "72ee0c4b-9764-423a-9dbf-95129e185210", "metadata": {}, "source": [ - "Request an [API key](https://wow.groq.com) and set it as an environment variable:\n", - "\n", - "```bash\n", - "export GROQ_API_KEY=\n", - "```\n", - "\n", - "Alternatively, you may configure the API key when you initialize ChatGroq.\n", - "\n", - "Here's an example of it in action:" + "If you want to get automated tracing of your model calls you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting below:" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 2, + "id": "a15d341e-3e26-4ca3-830b-5aab30ed66de", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "AIMessage(content=\"Low latency is crucial for Large Language Models (LLMs) because it directly impacts the user experience, model performance, and overall efficiency. Here are some reasons why low latency is essential for LLMs:\\n\\n1. **Real-time Interaction**: LLMs are often used in applications that require real-time interaction, such as chatbots, virtual assistants, and language translation. Low latency ensures that the model responds quickly to user input, providing a seamless and engaging experience.\\n2. **Conversational Flow**: In conversational AI, latency can disrupt the natural flow of conversation. Low latency helps maintain a smooth conversation, allowing users to respond quickly and naturally, without feeling like they're waiting for the model to catch up.\\n3. 
**Model Performance**: High latency can lead to increased error rates, as the model may struggle to keep up with the input pace. Low latency enables the model to process information more efficiently, resulting in better accuracy and performance.\\n4. **Scalability**: As the number of users and requests increases, low latency becomes even more critical. It allows the model to handle a higher volume of requests without sacrificing performance, making it more scalable and efficient.\\n5. **Resource Utilization**: Low latency can reduce the computational resources required to process requests. By minimizing latency, you can optimize resource allocation, reduce costs, and improve overall system efficiency.\\n6. **User Experience**: High latency can lead to frustration, abandonment, and a poor user experience. Low latency ensures that users receive timely responses, which is essential for building trust and satisfaction.\\n7. **Competitive Advantage**: In applications like customer service or language translation, low latency can be a key differentiator. It can provide a competitive advantage by offering a faster and more responsive experience, setting your application apart from others.\\n8. **Edge Computing**: With the increasing adoption of edge computing, low latency is critical for processing data closer to the user. This reduces latency even further, enabling real-time processing and analysis of data.\\n9. **Real-time Analytics**: Low latency enables real-time analytics and insights, which are essential for applications like sentiment analysis, trend detection, and anomaly detection.\\n10. **Future-Proofing**: As LLMs continue to evolve and become more complex, low latency will become even more critical. 
By prioritizing low latency now, you'll be better prepared to handle the demands of future LLM applications.\\n\\nIn summary, low latency is vital for LLMs because it ensures a seamless user experience, improves model performance, and enables efficient resource utilization. By prioritizing low latency, you can build more effective, scalable, and efficient LLM applications that meet the demands of real-time interaction and processing.\", response_metadata={'token_usage': {'completion_tokens': 541, 'prompt_tokens': 33, 'total_tokens': 574, 'completion_time': 1.499777658, 'prompt_time': 0.008344704, 'queue_time': None, 'total_time': 1.508122362}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_87cbfbbc4d', 'finish_reason': 'stop', 'logprobs': None}, id='run-49dad960-ace8-4cd7-90b3-2db99ecbfa44-0')" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "from langchain_core.prompts import ChatPromptTemplate\n", - "from langchain_groq import ChatGroq\n", - "\n", - "chat = ChatGroq(\n", - " temperature=0,\n", - " model=\"llama3-70b-8192\",\n", - " # api_key=\"\" # Optional if not set as an environment variable\n", - ")\n", - "\n", - "system = \"You are a helpful assistant.\"\n", - "human = \"{text}\"\n", - "prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n", - "\n", - "chain = prompt | chat\n", - "chain.invoke({\"text\": \"Explain the importance of low latency for LLMs.\"})" + "# os.environ[\"LANGSMITH_API_KEY\"] = getpass.getpass(\"Enter your LangSmith API key: \")\n", + "# os.environ[\"LANGSMITH_TRACING\"] = \"true\"" ] }, { "cell_type": "markdown", + "id": "0730d6a1-c893-4840-9817-5e5251676d5d", "metadata": {}, "source": [ - "You can view the available models [here](https://console.groq.com/docs/models).\n", - "\n", - "## Tool calling\n", - "\n", - "Groq chat models support [tool calling](/docs/how_to/tool_calling) to generate output matching a 
specific schema. The model may choose to call multiple tools or the same tool multiple times if appropriate.\n", + "### Installation\n", "\n", - "Here's an example:" + "The LangChain Groq integration lives in the `langchain-groq` package:" ] }, { "cell_type": "code", - "execution_count": 10, + "execution_count": 3, + "id": "652d6238-1f87-422a-b135-f5abbb8652fc", "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "[{'name': 'get_current_weather',\n", - " 'args': {'location': 'San Francisco', 'unit': 'Celsius'},\n", - " 'id': 'call_pydj'},\n", - " {'name': 'get_current_weather',\n", - " 'args': {'location': 'Tokyo', 'unit': 'Celsius'},\n", - " 'id': 'call_jgq3'}]" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m24.1.2\u001b[0m\n", + "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n", + "Note: you may need to restart the kernel to use updated packages.\n" + ] } ], "source": [ - "from typing import Optional\n", - "\n", - "from langchain_core.tools import tool\n", - "\n", - "\n", - "@tool\n", - "def get_current_weather(location: str, unit: Optional[str]):\n", - " \"\"\"Get the current weather in a given location\"\"\"\n", - " return \"Cloudy with a chance of rain.\"\n", - "\n", - "\n", - "tool_model = chat.bind_tools([get_current_weather], tool_choice=\"auto\")\n", - "\n", - "res = tool_model.invoke(\"What is the weather like in San Francisco and Tokyo?\")\n", - "\n", - "res.tool_calls" + "%pip install -qU langchain-groq" ] }, { "cell_type": "markdown", + "id": "a38cde65-254d-4219-a441-068766c0d4b5", "metadata": {}, "source": [ - "### 
`.with_structured_output()`\n", + "## Instantiation\n", "\n", - "You can also use the convenience [`.with_structured_output()`](/docs/how_to/structured_output/#the-with_structured_output-method) method to coerce `ChatGroq` into returning a structured output.\n", - "Here is an example:" + "Now we can instantiate our model object and generate chat completions:" ] }, { "cell_type": "code", - "execution_count": 11, + "execution_count": 4, + "id": "cb09c344-1836-4e0c-acf8-11d13ac1dbae", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Joke(setup='Why did the cat join a band?', punchline='Because it wanted to be the purr-cussionist!', rating=None)" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "from langchain_core.pydantic_v1 import BaseModel, Field\n", - "\n", - "\n", - "class Joke(BaseModel):\n", - " \"\"\"Joke to tell user.\"\"\"\n", - "\n", - " setup: str = Field(description=\"The setup of the joke\")\n", - " punchline: str = Field(description=\"The punchline to the joke\")\n", - " rating: Optional[int] = Field(description=\"How funny the joke is, from 1 to 10\")\n", - "\n", - "\n", - "structured_llm = chat.with_structured_output(Joke)\n", + "from langchain_groq import ChatGroq\n", "\n", - "structured_llm.invoke(\"Tell me a joke about cats\")" + "llm = ChatGroq(\n", + " model=\"mixtral-8x7b-32768\",\n", + " temperature=0,\n", + " max_tokens=None,\n", + " timeout=None,\n", + " max_retries=2,\n", + " # other params...\n", + ")" ] }, { "cell_type": "markdown", + "id": "2b4f3e15", "metadata": {}, "source": [ - "Behind the scenes, this takes advantage of the above tool calling functionality.\n", - "\n", - "## Async" + "## Invocation" ] }, { "cell_type": "code", - "execution_count": 12, - "metadata": {}, + "execution_count": 5, + "id": "62e0dbc3", + "metadata": { + "tags": [] + }, "outputs": [ { "data": { "text/plain": [ - "AIMessage(content='Here is a limerick about the 
sun:\\n\\nThere once was a sun in the sky,\\nWhose warmth and light caught the eye,\\nIt shone bright and bold,\\nWith a fiery gold,\\nAnd brought life to all, as it flew by.', response_metadata={'token_usage': {'completion_tokens': 51, 'prompt_tokens': 18, 'total_tokens': 69, 'completion_time': 0.144614022, 'prompt_time': 0.00585394, 'queue_time': None, 'total_time': 0.150467962}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_2f30b0b571', 'finish_reason': 'stop', 'logprobs': None}, id='run-e42340ba-f0ad-4b54-af61-8308d8ec8256-0')" + "AIMessage(content='I enjoy programming. (The French translation is: \"J\\'aime programmer.\")\\n\\nNote: I chose to translate \"I love programming\" as \"J\\'aime programmer\" instead of \"Je suis amoureux de programmer\" because the latter has a romantic connotation that is not present in the original English sentence.', response_metadata={'token_usage': {'completion_tokens': 73, 'prompt_tokens': 31, 'total_tokens': 104, 'completion_time': 0.1140625, 'prompt_time': 0.003352463, 'queue_time': None, 'total_time': 0.117414963}, 'model_name': 'mixtral-8x7b-32768', 'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop', 'logprobs': None}, id='run-64433c19-eadf-42fc-801e-3071e3c40160-0', usage_metadata={'input_tokens': 31, 'output_tokens': 73, 'total_tokens': 104})" ] }, - "execution_count": 12, + "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "chat = ChatGroq(temperature=0, model=\"llama3-70b-8192\")\n", - "prompt = ChatPromptTemplate.from_messages([(\"human\", \"Write a Limerick about {topic}\")])\n", - "chain = prompt | chat\n", - "await chain.ainvoke({\"topic\": \"The Sun\"})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Streaming" + "messages = [\n", + " (\n", + " \"system\",\n", + " \"You are a helpful assistant that translates English to French. 
Translate the user sentence.\",\n", + " ),\n", + " (\"human\", \"I love programming.\"),\n", + "]\n", + "ai_msg = llm.invoke(messages)\n", + "ai_msg" ] }, { "cell_type": "code", - "execution_count": 13, + "execution_count": 6, + "id": "d86145b3-bfef-46e8-b227-4dda5c9c2705", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Silvery glow bright\n", - "Luna's gentle light shines down\n", - "Midnight's gentle queen" + "I enjoy programming. (The French translation is: \"J'aime programmer.\")\n", + "\n", + "Note: I chose to translate \"I love programming\" as \"J'aime programmer\" instead of \"Je suis amoureux de programmer\" because the latter has a romantic connotation that is not present in the original English sentence.\n" ] } ], "source": [ - "chat = ChatGroq(temperature=0, model=\"llama3-70b-8192\")\n", - "prompt = ChatPromptTemplate.from_messages([(\"human\", \"Write a haiku about {topic}\")])\n", - "chain = prompt | chat\n", - "for chunk in chain.stream({\"topic\": \"The Moon\"}):\n", - " print(chunk.content, end=\"\", flush=True)" + "print(ai_msg.content)" ] }, { "cell_type": "markdown", + "id": "18e2bfc0-7e78-4528-a73f-499ac150dca8", "metadata": {}, "source": [ - "## Passing custom parameters\n", + "## Chaining\n", "\n", - "You can pass other Groq-specific parameters using the `model_kwargs` argument on initialization. Here's an example of enabling JSON mode:" + "We can [chain](../../how_to/sequence.ipynb) our model with a prompt template like so:" ] }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 7, + "id": "e197d1d7-a070-4c96-9f8a-a0e86d046e0b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "AIMessage(content='{ \"response\": \"That\\'s a tough question! There are eight species of bears found in the world, and each one is unique and amazing in its own way. However, if I had to pick one, I\\'d say the giant panda is a popular favorite among many people. 
Who can resist those adorable black and white markings?\", \"followup_question\": \"Would you like to know more about the giant panda\\'s habitat and diet?\" }', response_metadata={'token_usage': {'completion_tokens': 89, 'prompt_tokens': 50, 'total_tokens': 139, 'completion_time': 0.249032839, 'prompt_time': 0.011134497, 'queue_time': None, 'total_time': 0.260167336}, 'model_name': 'llama3-70b-8192', 'system_fingerprint': 'fp_2f30b0b571', 'finish_reason': 'stop', 'logprobs': None}, id='run-558ce67e-8c63-43fe-a48f-6ecf181bc922-0')" + "AIMessage(content='That\\'s great! I can help you translate English phrases related to programming into German.\\n\\n\"I love programming\" can be translated as \"Ich liebe Programmieren\" in German.\\n\\nHere are some more programming-related phrases translated into German:\\n\\n* \"Programming language\" = \"Programmiersprache\"\\n* \"Code\" = \"Code\"\\n* \"Variable\" = \"Variable\"\\n* \"Function\" = \"Funktion\"\\n* \"Array\" = \"Array\"\\n* \"Object-oriented programming\" = \"Objektorientierte Programmierung\"\\n* \"Algorithm\" = \"Algorithmus\"\\n* \"Data structure\" = \"Datenstruktur\"\\n* \"Debugging\" = \"Fehlersuche\"\\n* \"Compile\" = \"Kompilieren\"\\n* \"Link\" = \"Verknüpfen\"\\n* \"Run\" = \"Ausführen\"\\n* \"Test\" = \"Testen\"\\n* \"Deploy\" = \"Bereitstellen\"\\n* \"Version control\" = \"Versionskontrolle\"\\n* \"Open source\" = \"Open Source\"\\n* \"Software development\" = \"Softwareentwicklung\"\\n* \"Agile methodology\" = \"Agile Methodik\"\\n* \"DevOps\" = \"DevOps\"\\n* \"Cloud computing\" = \"Cloud Computing\"\\n\\nI hope this helps! 
Let me know if you have any other questions or if you need further translations.', response_metadata={'token_usage': {'completion_tokens': 331, 'prompt_tokens': 25, 'total_tokens': 356, 'completion_time': 0.520006542, 'prompt_time': 0.00250165, 'queue_time': None, 'total_time': 0.522508192}, 'model_name': 'mixtral-8x7b-32768', 'system_fingerprint': 'fp_c5f20b5bb1', 'finish_reason': 'stop', 'logprobs': None}, id='run-74207fb7-85d3-417d-b2b9-621116b75d41-0', usage_metadata={'input_tokens': 25, 'output_tokens': 331, 'total_tokens': 356})" ] }, - "execution_count": 15, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "chat = ChatGroq(\n", - " model=\"llama3-70b-8192\", model_kwargs={\"response_format\": {\"type\": \"json_object\"}}\n", - ")\n", - "\n", - "system = \"\"\"\n", - "You are a helpful assistant.\n", - "Always respond with a JSON object with two string keys: \"response\" and \"followup_question\".\n", - "\"\"\"\n", - "human = \"{question}\"\n", - "prompt = ChatPromptTemplate.from_messages([(\"system\", system), (\"human\", human)])\n", + "from langchain_core.prompts import ChatPromptTemplate\n", "\n", - "chain = prompt | chat\n", + "prompt = ChatPromptTemplate.from_messages(\n", + " [\n", + " (\n", + " \"system\",\n", + " \"You are a helpful assistant that translates {input_language} to {output_language}.\",\n", + " ),\n", + " (\"human\", \"{input}\"),\n", + " ]\n", + ")\n", "\n", - "chain.invoke({\"question\": \"what bear is best?\"})" + "chain = prompt | llm\n", + "chain.invoke(\n", + " {\n", + " \"input_language\": \"English\",\n", + " \"output_language\": \"German\",\n", + " \"input\": \"I love programming.\",\n", + " }\n", + ")" ] }, { - "cell_type": "code", - "execution_count": null, + "cell_type": "markdown", + "id": "3a5bb5ca-c3ae-4a58-be67-2cd18574b9a3", "metadata": {}, - "outputs": [], - "source": [] + "source": [ + "## API reference\n", + "\n", + "For detailed documentation of all ChatGroq features and 
configurations head to the API reference: https://api.python.langchain.com/en/latest/chat_models/langchain_groq.chat_models.ChatGroq.html" + ] } ], "metadata": { "kernelspec": { - "display_name": ".venv", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -307,9 +268,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.5" + "version": "3.11.9" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 5 } diff --git a/libs/partners/groq/langchain_groq/chat_models.py b/libs/partners/groq/langchain_groq/chat_models.py index 661ca4b5919ed..c7bed62bb803e 100644 --- a/libs/partners/groq/langchain_groq/chat_models.py +++ b/libs/partners/groq/langchain_groq/chat_models.py @@ -95,13 +95,6 @@ class ChatGroq(BaseChatModel): Any parameters that are valid to be passed to the groq.create call can be passed in, even if not explicitly saved on this class. - Example: - .. code-block:: python - - from langchain_groq import ChatGroq - - model = ChatGroq(model_name="mixtral-8x7b-32768") - Setup: Install ``langchain-groq`` and set environment variable ``GROQ_API_KEY``. @@ -143,12 +136,12 @@ class ChatGroq(BaseChatModel): from langchain_groq import ChatGroq - model = ChatGroq( + llm = ChatGroq( model="mixtral-8x7b-32768", temperature=0.0, max_retries=2, # other params... - ) + ) Invoke: .. code-block:: python @@ -158,7 +151,7 @@ class ChatGroq(BaseChatModel): sentence to French."), ("human", "I love programming."), ] - model.invoke(messages) + llm.invoke(messages) .. code-block:: python @@ -175,7 +168,7 @@ class ChatGroq(BaseChatModel): Stream: .. code-block:: python - for chunk in model.stream(messages): + for chunk in llm.stream(messages): print(chunk) .. code-block:: python @@ -192,7 +185,7 @@ class ChatGroq(BaseChatModel): .. 
code-block:: python - stream = model.stream(messages) + stream = llm.stream(messages) full = next(stream) for chunk in stream: full += chunk @@ -215,7 +208,7 @@ class ChatGroq(BaseChatModel): Async: .. code-block:: python - await model.ainvoke(messages) + await llm.ainvoke(messages) .. code-block:: python @@ -247,7 +240,7 @@ class GetPopulation(BaseModel): location: str = Field(..., description="The city and state, e.g. San Francisco, CA") - model_with_tools = model.bind_tools([GetWeather, GetPopulation]) + model_with_tools = llm.bind_tools([GetWeather, GetPopulation]) ai_msg = model_with_tools.invoke("What is the population of NY?") ai_msg.tool_calls @@ -274,7 +267,7 @@ class Joke(BaseModel): rating: Optional[int] = Field(description="How funny the joke is, from 1 to 10") - structured_model = model.with_structured_output(Joke) + structured_model = llm.with_structured_output(Joke) structured_model.invoke("Tell me a joke about cats") .. code-block:: python @@ -287,7 +280,7 @@ class Joke(BaseModel): Response metadata .. code-block:: python - ai_msg = model.invoke(messages) + ai_msg = llm.invoke(messages) ai_msg.response_metadata .. code-block:: python
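The notebook cells added above compose a prompt with the model via `chain = prompt | llm`. As a rough illustration of what that `|` operator does — this is a deliberately simplified stand-in, not LangChain's actual `Runnable` implementation, which also handles batching, streaming, and async — the piping pattern can be sketched in plain Python:

```python
# Minimal sketch of LCEL-style piping (assumption: simplified stand-ins,
# not the real langchain_core.runnables classes).

class Runnable:
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a step that feeds a's output into b.
        return Runnable(lambda value: other.invoke(self.invoke(value)))


# Stand-in "prompt template": formats the input dict into (role, content) messages,
# mirroring the ChatPromptTemplate cell in the notebook.
prompt = Runnable(
    lambda inputs: [
        (
            "system",
            f"You are a helpful assistant that translates "
            f"{inputs['input_language']} to {inputs['output_language']}.",
        ),
        ("human", inputs["input"]),
    ]
)

# Stand-in "model": echoes the human message uppercased (no API call).
llm = Runnable(lambda messages: messages[-1][1].upper())

chain = prompt | llm
print(chain.invoke({
    "input_language": "English",
    "output_language": "German",
    "input": "I love programming.",
}))  # -> I LOVE PROGRAMMING.
```

The key point is only that `|` composes "invoke" steps left to right; swapping the stand-in `llm` for a real `ChatGroq` instance is what the notebook's chaining cell does.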
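The updated docstring also demonstrates `llm.with_structured_output(Joke)`. Conceptually — and this is an assumption-laden mock, not the library's actual code path — the wrapper has the model emit the schema's fields as JSON (via tool calling, as the docs note) and then coerces that JSON into the target type:

```python
# Conceptual mock of with_structured_output: parse tool-call JSON into a
# typed object. Uses a stdlib dataclass in place of the Pydantic model,
# and a canned payload in place of a real model response.
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Joke:
    """Joke to tell user (mirrors the Joke schema in the docstring)."""
    setup: str
    punchline: str
    rating: Optional[int] = None


def parse_structured(raw: str) -> Joke:
    # The model is instructed (via the tool schema) to emit its arguments
    # as JSON; the wrapper validates and coerces them into the target type.
    return Joke(**json.loads(raw))


raw_tool_args = (
    '{"setup": "Why did the cat join a band?",'
    ' "punchline": "To be the purr-cussionist!"}'
)
joke = parse_structured(raw_tool_args)
print(joke.setup)  # -> Why did the cat join a band?
```

Fields the model omits (here `rating`) fall back to their defaults, which is why the notebook's `Joke(...)` outputs show `rating=None`.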