From 53a17feb08ba602843335a9515b3ad3c4d3586bd Mon Sep 17 00:00:00 2001 From: Gabriele Venturi Date: Thu, 8 Jun 2023 02:23:27 +0200 Subject: [PATCH] feat: add support for Falcon LLM At the moment, however, the Falcon LLM seems to hallucinate quite often, we should support custom prompts for each LLM --- Notebooks/Getting Started.ipynb | 2 +- README.md | 5 +++- docs/API/llms.md | 8 +++++ docs/pai_cli.md | 26 ++++++++++++----- pai/__main__.py | 52 ++++++++++++++++++++++----------- pandasai/llm/falcon.py | 51 ++++++++++++++++++++++++++++++++ pandasai/llm/open_assistant.py | 7 ++--- tests/llms/test_falcon.py | 17 +++++++++++ 8 files changed, 138 insertions(+), 30 deletions(-) create mode 100644 pandasai/llm/falcon.py create mode 100644 tests/llms/test_falcon.py diff --git a/Notebooks/Getting Started.ipynb b/Notebooks/Getting Started.ipynb index 51f39be9e..b51dc9f2a 100644 --- a/Notebooks/Getting Started.ipynb +++ b/Notebooks/Getting Started.ipynb @@ -1 +1 @@ -{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[{"file_id":"1XwQuNfgLtgcGd2yWMzBRXTR0bZ5Vjm1C","timestamp":1685076114938},{"file_id":"https://github.com/amjadraza/learn-it/blob/master/Getting_Started_pandas_ai.ipynb","timestamp":1684888879263}]},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["#[PANDASAI](https://github.com/gventuri/pandas-ai) : A generative AI capabilities to Pandas Dataframes\n","\n"],"metadata":{"id":"8OnF6qsbDG1t"}},{"cell_type":"markdown","source":["## Supported Models\n","\n","The Pandasai API currently supports several models, and it is continuously being developed with the possibility of adding more models in the future. The supported models include:\n","\n","1. ChatGPT by OpenAI\n","2. StarCoder by Huggingface\n","3. Azure ChatGPT API\n","4. OpenAI Assistant\n","5. 
Google PaLM\n","\n","These models can be utilized in a conversational form, allowing users to interact with them effectively.\n","\n","To facilitate the usage of these models, this tutorial provides guidance on running them using Google Colab. Google Colab is a platform that enables straightforward onboarding and provides a Google Colab Notebook, which simplifies the process of getting started.\n","\n","By leveraging the provided Google Colab Notebook, users can easily run the supported models within a conversational context, enhancing the overall interactive experience and enabling efficient data analysis.\n","\n","Please note that as the Pandasai API evolves, additional models may be incorporated, expanding the range of options available to users."],"metadata":{"id":"zYGici1BqvD4"}},{"cell_type":"markdown","source":["## Learning Objectives:\n","This tutorial aims to help you achieve the following learning objectives:\n","\n","1. Install the pandasai library.\n","2. Set up the API_TOKEN using either the OpenAI platform or the Hugging Face platform.\n","3. Explore and run basic functionalities provided by pandasai.\n","4. Execute examples using pandasai with predefined prompts for experimentation.\n","\n","Please note that while the OpenAI platform is not free, it is still reasonably priced for running a few queries. On the other hand, the StartCoder model by Hugging Face is available for personal use at no cost.\n"],"metadata":{"id":"mFPp6trsrauF"}},{"cell_type":"markdown","source":["## Prerequisites\n","Before proceeding with this tutorial, make sure you meet the following prerequisites:\n","\n","1. Basic understanding of Python, Pandas, and APIs for generative models.\n","2. Obtain API tokens from the openai platform and/or Hugging Face, depending on the models you intend to use.\n","3. 
Demonstrated eagerness to learn and develop your skills.\n","\n","Having a foundational knowledge of Python, Pandas, and generative models APIs will be beneficial for understanding the concepts covered in this tutorial. Additionally, acquiring the necessary API tokens from the respective platforms will enable you to interact with the models effectively.\n","\n","It's important to approach this tutorial with a positive attitude and a willingness to learn and improve your skills. Through active participation and a growth mindset, you will be able to make the most of this learning opportunity."],"metadata":{"id":"yJjtSdTirwi1"}},{"cell_type":"markdown","source":["# Getting Started\n","\n","Installing `Pandasai` is pretty striaght forward using `pip`. The recent releases are hosted on [Pnadasai Pypi](https://pypi.org/project/pandasai/) page. Always check the version, you are going to install. Make sure, check out the [Issues](https://github.com/gventuri/pandas-ai/issues) page of its GitHub Reporistry for any problems you face. "],"metadata":{"id":"-VQE9nGwDaiG"}},{"cell_type":"markdown","source":[],"metadata":{"id":"MtzpjcogPC4z"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"qtqSnCgwDDCg"},"outputs":[],"source":["!pip install --upgrade pandas pandasai"]},{"cell_type":"markdown","source":["Now we import the dependencies:"],"metadata":{"id":"DrrlU4QqDmxl"}},{"cell_type":"markdown","source":["## Generate OPENAI API Token\n","\n","Users are required to generate `YOUR_API_TOKEN`. Follow below simple steps to generate your API_TOKEN with \n","[openai](https://platform.openai.com/overview).\n","\n","1. Go to https://openai.com/api/ and signup with your email address or connect your Google Account.\n","2. Go to View API Keys on left side of your Personal Account Settings\n","3. Select Create new Secret key\n","\n","> The API access to openai is a paid service. You have to set up billing. 
\n",">Read the [Pricing](https://platform.openai.com/docs/quickstart/pricing) information before experimenting."],"metadata":{"id":"8S1QClHBfcWV"}},{"cell_type":"markdown","source":["## Generate HUGGING FACE PLATFORM API Token\n","\n","It will take around 2 mins to generate API token.\n","\n","Users are required to generate `YOUR_API_TOKEN`. Follow below simple steps to generate your API_TOKEN with [Hugging Face](https://huggingface.co/).\n","\n","1. Go to https://huggingface.co/ and signup with your email address or connect your Google Account.\n","2. Go to https://huggingface.co/settings/tokens and generate User Access Token.\n","3. Copy and Save securely for Personal Use\n","\n","> Hugging Face API keys are FREE for personal / educational use."],"metadata":{"id":"iAtFdufpPN5F"}},{"cell_type":"code","source":["# Import basis libraries\n","import pandas as pd\n","import pandasai as pdai\n","from pandasai import PandasAI\n","from pandasai.llm.openai import OpenAI\n","from pandasai.llm.open_assistant import OpenAssistant\n","from pandasai.llm.starcoder import Starcoder"],"metadata":{"id":"ohPMkz4zDqxz"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Define List of Models\n","models = {\n"," \"OpenAI\": OpenAI,\n"," \"Starcoder\": Starcoder,\n"," \"Open-Assistant\": OpenAssistant\n","}\n"],"metadata":{"id":"QtLlsddlRSAO"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["#@title Select Model to Run\n","model_to_run = 'OpenAI' #@param [\"OpenAI\", \"Starcoder\", \"Open-Assistant\"]\n","print(f\"Enter API for {model_to_run} platform\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sGElcw0NRxeK","executionInfo":{"status":"ok","timestamp":1684891932779,"user_tz":-600,"elapsed":307,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"}},"outputId":"2b02aef7-23db-4bda-8445-d1b97dd9a58c"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Enter API for OpenAI 
platform\n"]}]},{"cell_type":"code","source":["# Enter API Key\n","API_KEY = '' #@param {type:\"string\"}"],"metadata":{"id":"DS1QdmQJgKm9"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["We create a dataframe using pandas:"],"metadata":{"id":"JAO_SzlyDr5x"}},{"cell_type":"code","source":["df = pd.DataFrame({\n"," \"country\": [\"United States\", \"United Kingdom\", \"France\", \"Germany\", \"Italy\", \"Spain\", \"Canada\", \"Australia\", \"Japan\", \"China\"],\n"," \"gdp\": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],\n"," \"happiness_index\": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]\n","})"],"metadata":{"id":"nyRo07nqD08l"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["After loading the Dataframe and input of Prompt, now we are ready to run the Prompt with given dataframe and query. As a first setp, we instantiate the llm model with selected option. In order to understand, what is going under the hood, we set `verbose=True`. 
Let us run now!"],"metadata":{"id":"aeTOQettEfWT"}},{"cell_type":"code","source":["# Model Initialisation\n","llm = models[model_to_run](api_token=API_KEY)\n","pandas_ai = PandasAI(llm, conversational=False, verbose=True)\n"],"metadata":{"id":"HZnx0V18EkzS"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Enter Prompt related to data or Select from Pre-defined for demo purposes.\n","prompt = 'Plot the histogram of countries showing for each the gpd, using different colors for each bar' #@param [ \"What is the relation between GDP and Happines Index\", \"Plot the histogram of countries showing for each the gpd, using different colors for each bar\", \"GDP of Top 5 Happiest Countries?\"] {allow-input: true}\n"],"metadata":{"id":"BQs22IO-Ti_q"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["response = pandas_ai.run(df, prompt=prompt,\n"," is_conversational_answer=False)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"m7_erVstXW9V","executionInfo":{"status":"ok","timestamp":1684891977617,"user_tz":-600,"elapsed":11398,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"}},"outputId":"7fad4db4-4758-4813-b8f0-bb619bd86a24"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Running PandasAI with openai LLM...\n","\n","Code generated:\n","```\n","import matplotlib.pyplot as plt\n","\n","colors = ['red', 'blue', 'green', 'orange', 'purple', 'pink', 'brown', 'gray', 'olive', 'cyan']\n","plt.bar(df['country'], df['gdp'], color=colors)\n","plt.xticks(rotation=90)\n","plt.xlabel('Country')\n","plt.ylabel('GDP')\n","plt.title('GDP by Country')\n","plt.show()\n","```\n","\n","Code running:\n","```\n","colors = ['red', 'blue', 'green', 'orange', 'purple', 'pink', 'brown',\n"," 'gray', 'olive', 'cyan']\n","plt.bar(df['country'], df['gdp'], color=colors)\n","plt.xticks(rotation=90)\n","plt.xlabel('Country')\n","plt.ylabel('GDP')\n","plt.title('GDP by 
Country')\n","plt.show()\n","```\n"]},{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}},{"output_type":"stream","name":"stdout","text":["Answer: \n"]}]},{"cell_type":"markdown","source":["As you can see from above experiments, having `verbose=True`, we can see the code generated by LLMs API and then this code is run to produce and answer on complete dataset. "],"metadata":{"id":"3w4BLXuVY4jO"}},{"cell_type":"markdown","source":["## Play Around\n","\n","Users can play around various questions about the tiny dataset we generated above. Using this notebook, you can select between Model Options and ask questions based on your problem."],"metadata":{"id":"2A44RePXXldV"}},{"cell_type":"markdown","source":["## More Examples\n","\n","In above section, we showed a little demo with small dataframe. Usually data is stored in `.csv` , .=`.xlsx` and other formats. `Pandasai` treats any data uploaded as Pandas dataframe and proceed accordingly. \n","\n","In this section, we include some of the [examples](https://github.com/gventuri/pandas-ai/tree/main/examples) shipped with `pandasai` reporsitory.\n"],"metadata":{"id":"iWH56NfdiioM"}},{"cell_type":"code","source":["#Loading CSV file \n","import pandas as pd\n","from pandasai import PandasAI\n","from pandasai.llm.openai import OpenAI\n","file_path ='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)"],"metadata":{"id":"WnP-JPZzcFrJ"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["llm = OpenAI(api_token=API_KEY)\n","pandas_ai = PandasAI(llm, verbose=True)"],"metadata":{"id":"gupY58GRcFa4"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# Enter Prompt related to data or Select from Pre-defined for demo purposes.\n","\n","prompt = 'Generate bar Plot of Loans Paid by Men & Women' #@param [ \"How many loans are from Women that have been paid off?\", \"Generate bar Plot of Loans Paid by Men & Women\"] {allow-input: 
true}"],"metadata":{"id":"FOK-uaPBcFOq"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["response = pandas_ai.run(df, prompt = prompt)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"GXeX1QghiC0F","outputId":"30842d6a-af1b-4a52-d0b2-03bc10008f0e","executionInfo":{"status":"ok","timestamp":1684893610216,"user_tz":-600,"elapsed":36186,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"}}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Running PandasAI with openai LLM...\n","\n","Code generated:\n","```\n","import pandas as pd\n","import matplotlib.pyplot as plt\n","\n","# Load the dataframe\n","df = pd.read_csv('loan_data.csv')\n","\n","# Filter the dataframe to only include paid off loans\n","paid_off_df = df[df['loan_status'] == 'PAIDOFF']\n","\n","# Group the dataframe by gender and loan status\n","gender_grouped_df = paid_off_df.groupby(['Gender', 'loan_status']).count()['Loan_ID']\n","\n","# Create a bar plot\n","gender_grouped_df.plot(kind='bar')\n","\n","# Set the title and axis labels\n","plt.title('Loans Paid by Men & Women')\n","plt.xlabel('Gender, Loan Status')\n","plt.ylabel('Number of Loans')\n","\n","# Show the plot\n","plt.show()\n","```\n","\n","Code running:\n","```\n","paid_off_df = df[df['loan_status'] == 'PAIDOFF']\n","gender_grouped_df = paid_off_df.groupby(['Gender', 'loan_status']).count()[\n"," 'Loan_ID']\n","gender_grouped_df.plot(kind='bar')\n","plt.title('Loans Paid by Men & Women')\n","plt.xlabel('Gender, Loan Status')\n","plt.ylabel('Number of Loans')\n","plt.show()\n","```\n"]},{"output_type":"display_data","data":{"text/plain":["
"],"image/png":"\n"},"metadata":{}},{"output_type":"stream","name":"stdout","text":["Answer: \n","Conversational answer: Sure, I can help you with that! To generate a bar plot of loans paid by men and women, we can use a data visualization tool like Matplotlib or Seaborn in Python. We'll need to have a dataset that includes information on the loan amounts paid by both men and women. Once we have that, we can create a bar plot with two bars - one for men and one for women - and the height of each bar will represent the total loan amount paid by that gender. This will give us a visual representation of the difference in loan payments between men and women.\n"]}]},{"cell_type":"code","source":["#Let us run the code ourselves and find out the results matches or not?\n","import pandas as pd\n","file_path ='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)\n","num_loans = len(df[(df['Gender'] == 'male') & (df['loan_status'] == 'PAIDOFF')]\n"," )\n","print(num_loans)"],"metadata":{"id":"J8K29pWolDbI","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1684892330884,"user_tz":-600,"elapsed":326,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"}},"outputId":"29a49e10-8786-438b-f3fb-d123386e5a2a"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["247\n"]}]},{"cell_type":"code","source":["import pandas as pd\n","file_path ='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)\n","num_loans = len(df[(df['Gender'] == 'female') & (df['loan_status'] == 'PAIDOFF')]\n"," )\n","print(num_loans)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"WgUMtfAOe0t4","executionInfo":{"status":"ok","timestamp":1684893561284,"user_tz":-600,"elapsed":352,"user":{"displayName":"Amjad 
Raza","userId":"06768084019046926999"}},"outputId":"26b23a27-11b5-4a1b-f724-b5cd587af1d8"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["53\n"]}]},{"cell_type":"markdown","source":["The answers generated by Pandas AI and manually are same. However, it may not be a case always. "],"metadata":{"id":"UbeJXL3wkmXE"}},{"cell_type":"markdown","source":["\n","# Remarks\n","In this concise tutorial, we have explored the Pandasai Library and gained insight into its higher-level architecture. This library offers a convenient solution for individuals to inquire about their data without the need to train in-house Large Language Models (LLMs) on company data. While there are various potential applications for this tool, it's important to remain mindful that the generated code by LLMs may occasionally produce unexpected outputs.\n","\n","It's worth emphasizing that Pandas AI is an actively developed project, indicating ongoing efforts to improve and enhance its functionality. With dedicated contributors, exciting features are being developed and added to this library.\n","\n","> **Stay tuned! There's more to come in the world of Pandas AI as it evolves and continues to empower users in their data exploration and analysis endeavors.**\n"],"metadata":{"id":"AQ11XG3JaOH3"}},{"cell_type":"code","source":[],"metadata":{"id":"0NrbEwgFXswe"},"execution_count":null,"outputs":[]}]} \ No newline at end of file +{"cells":[{"attachments":{},"cell_type":"markdown","metadata":{"id":"8OnF6qsbDG1t"},"source":["#[PANDASAI](https://github.com/gventuri/pandas-ai) : A generative AI capabilities to Pandas Dataframes\n","\n"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"zYGici1BqvD4"},"source":["## Supported Models\n","\n","The Pandasai API currently supports several models, and it is continuously being developed with the possibility of adding more models in the future. The supported models include:\n","\n","1. OpenAI\n","2. 
StarCoder by Huggingface\n","3. Falcon by Huggingface\n","4. Azure ChatGPT API\n","5. OpenAssistant\n","6. Google PaLM\n","\n","These models can be utilized in a conversational form, allowing users to interact with them effectively.\n","\n","To facilitate the usage of these models, this tutorial provides guidance on running them using Google Colab. Google Colab is a platform that enables straightforward onboarding and provides a Google Colab Notebook, which simplifies the process of getting started.\n","\n","By leveraging the provided Google Colab Notebook, users can easily run the supported models within a conversational context, enhancing the overall interactive experience and enabling efficient data analysis.\n","\n","Please note that as the Pandasai API evolves, additional models may be incorporated, expanding the range of options available to users."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"mFPp6trsrauF"},"source":["## Learning Objectives:\n","This tutorial aims to help you achieve the following learning objectives:\n","\n","1. Install the pandasai library.\n","2. Set up the API_TOKEN using either the OpenAI platform or the Hugging Face platform.\n","3. Explore and run basic functionalities provided by pandasai.\n","4. Execute examples using pandasai with predefined prompts for experimentation.\n","\n","Please note that while the OpenAI platform is not free, it is still reasonably priced for running a few queries. On the other hand, the StartCoder model by Hugging Face is available for personal use at no cost.\n"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"yJjtSdTirwi1"},"source":["## Prerequisites\n","Before proceeding with this tutorial, make sure you meet the following prerequisites:\n","\n","1. Basic understanding of Python, Pandas, and APIs for generative models.\n","2. Obtain API tokens from the openai platform and/or Hugging Face, depending on the models you intend to use.\n","3. 
Demonstrated eagerness to learn and develop your skills.\n","\n","Having a foundational knowledge of Python, Pandas, and generative models APIs will be beneficial for understanding the concepts covered in this tutorial. Additionally, acquiring the necessary API tokens from the respective platforms will enable you to interact with the models effectively.\n","\n","It's important to approach this tutorial with a positive attitude and a willingness to learn and improve your skills. Through active participation and a growth mindset, you will be able to make the most of this learning opportunity."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"-VQE9nGwDaiG"},"source":["# Getting Started\n","\n","Installing `Pandasai` is pretty striaght forward using `pip`. The recent releases are hosted on [Pnadasai Pypi](https://pypi.org/project/pandasai/) page. Always check the version, you are going to install. Make sure, check out the [Issues](https://github.com/gventuri/pandas-ai/issues) page of its GitHub Reporistry for any problems you face. "]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"MtzpjcogPC4z"},"source":[]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qtqSnCgwDDCg"},"outputs":[],"source":["!pip install --upgrade pandas pandasai"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"DrrlU4QqDmxl"},"source":["Now we import the dependencies:"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"8S1QClHBfcWV"},"source":["## Generate OPENAI API Token\n","\n","Users are required to generate `YOUR_API_TOKEN`. Follow below simple steps to generate your API_TOKEN with \n","[openai](https://platform.openai.com/overview).\n","\n","1. Go to https://openai.com/api/ and signup with your email address or connect your Google Account.\n","2. Go to View API Keys on left side of your Personal Account Settings\n","3. Select Create new Secret key\n","\n","> The API access to openai is a paid service. You have to set up billing. 
\n",">Read the [Pricing](https://platform.openai.com/docs/quickstart/pricing) information before experimenting."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"iAtFdufpPN5F"},"source":["## Generate HUGGING FACE PLATFORM API Token\n","\n","It will take around 2 mins to generate API token.\n","\n","Users are required to generate `YOUR_API_TOKEN`. Follow below simple steps to generate your API_TOKEN with [Hugging Face](https://huggingface.co/).\n","\n","1. Go to https://huggingface.co/ and signup with your email address or connect your Google Account.\n","2. Go to https://huggingface.co/settings/tokens and generate User Access Token.\n","3. Copy and Save securely for Personal Use\n","\n","> Hugging Face API keys are FREE for personal / educational use."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"ohPMkz4zDqxz"},"outputs":[],"source":["# Import basis libraries\n","import pandas as pd\n","from pandasai import PandasAI\n","from pandasai.llm.openai import OpenAI\n","from pandasai.llm.open_assistant import OpenAssistant\n","from pandasai.llm.starcoder import Starcoder"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"QtLlsddlRSAO"},"outputs":[],"source":["# Define List of Models\n","models = {\n"," \"OpenAI\": OpenAI,\n"," \"Starcoder\": Starcoder,\n"," \"Open-Assistant\": OpenAssistant\n","}\n"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":307,"status":"ok","timestamp":1684891932779,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"},"user_tz":-600},"id":"sGElcw0NRxeK","outputId":"2b02aef7-23db-4bda-8445-d1b97dd9a58c"},"outputs":[{"name":"stdout","output_type":"stream","text":["Enter API for OpenAI platform\n"]}],"source":["#@title Select Model to Run\n","model_to_run = 'OpenAI' #@param [\"OpenAI\", \"Starcoder\", \"Open-Assistant\"]\n","print(f\"Enter API for {model_to_run} 
platform\")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"DS1QdmQJgKm9"},"outputs":[],"source":["# Enter API Key\n","API_KEY = '' #@param {type:\"string\"}"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"JAO_SzlyDr5x"},"source":["We create a dataframe using pandas:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"nyRo07nqD08l"},"outputs":[],"source":["df = pd.DataFrame({\n"," \"country\": [\"United States\", \"United Kingdom\", \"France\", \"Germany\", \"Italy\", \"Spain\", \"Canada\", \"Australia\", \"Japan\", \"China\"],\n"," \"gdp\": [21400000, 2940000, 2830000, 3870000, 2160000, 1350000, 1780000, 1320000, 516000, 14000000],\n"," \"happiness_index\": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0]\n","})"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"aeTOQettEfWT"},"source":["After loading the Dataframe and input of Prompt, now we are ready to run the Prompt with given dataframe and query. As a first setp, we instantiate the llm model with selected option. In order to understand, what is going under the hood, we set `verbose=True`. 
Let us run now!"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"HZnx0V18EkzS"},"outputs":[],"source":["# Model Initialisation\n","llm = models[model_to_run](api_token=API_KEY)\n","pandas_ai = PandasAI(llm, conversational=False, verbose=True)\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"BQs22IO-Ti_q"},"outputs":[],"source":["# Enter Prompt related to data or Select from Pre-defined for demo purposes.\n","prompt = 'Plot the histogram of countries showing for each the gpd, using different colors for each bar' #@param [ \"What is the relation between GDP and Happines Index\", \"Plot the histogram of countries showing for each the gpd, using different colors for each bar\", \"GDP of Top 5 Happiest Countries?\"] {allow-input: true}\n"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"executionInfo":{"elapsed":11398,"status":"ok","timestamp":1684891977617,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"},"user_tz":-600},"id":"m7_erVstXW9V","outputId":"7fad4db4-4758-4813-b8f0-bb619bd86a24"},"outputs":[{"name":"stdout","output_type":"stream","text":["Running PandasAI with openai LLM...\n","\n","Code generated:\n","```\n","import matplotlib.pyplot as plt\n","\n","colors = ['red', 'blue', 'green', 'orange', 'purple', 'pink', 'brown', 'gray', 'olive', 'cyan']\n","plt.bar(df['country'], df['gdp'], color=colors)\n","plt.xticks(rotation=90)\n","plt.xlabel('Country')\n","plt.ylabel('GDP')\n","plt.title('GDP by Country')\n","plt.show()\n","```\n","\n","Code running:\n","```\n","colors = ['red', 'blue', 'green', 'orange', 'purple', 'pink', 'brown',\n"," 'gray', 'olive', 'cyan']\n","plt.bar(df['country'], df['gdp'], color=colors)\n","plt.xticks(rotation=90)\n","plt.xlabel('Country')\n","plt.ylabel('GDP')\n","plt.title('GDP by Country')\n","plt.show()\n","```\n"]},{"data":{"image/png":"","text/plain":["
"]},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":["Answer: \n"]}],"source":["response = pandas_ai.run(df, prompt=prompt,\n"," is_conversational_answer=False)"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"3w4BLXuVY4jO"},"source":["As you can see from above experiments, having `verbose=True`, we can see the code generated by LLMs API and then this code is run to produce and answer on complete dataset. "]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"2A44RePXXldV"},"source":["## Play Around\n","\n","Users can play around various questions about the tiny dataset we generated above. Using this notebook, you can select between Model Options and ask questions based on your problem."]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"iWH56NfdiioM"},"source":["## More Examples\n","\n","In above section, we showed a little demo with small dataframe. Usually data is stored in `.csv` , .=`.xlsx` and other formats. `Pandasai` treats any data uploaded as Pandas dataframe and proceed accordingly. 
\n","\n","In this section, we include some of the [examples](https://github.com/gventuri/pandas-ai/tree/main/examples) shipped with `pandasai` reporsitory.\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"WnP-JPZzcFrJ"},"outputs":[],"source":["#Loading CSV file \n","import pandas as pd\n","from pandasai import PandasAI\n","from pandasai.llm.openai import OpenAI\n","file_path ='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gupY58GRcFa4"},"outputs":[],"source":["llm = OpenAI(api_token=API_KEY)\n","pandas_ai = PandasAI(llm, verbose=True)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"FOK-uaPBcFOq"},"outputs":[],"source":["# Enter Prompt related to data or Select from Pre-defined for demo purposes.\n","\n","prompt = 'Generate bar Plot of Loans Paid by Men & Women' #@param [ \"How many loans are from Women that have been paid off?\", \"Generate bar Plot of Loans Paid by Men & Women\"] {allow-input: true}"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"executionInfo":{"elapsed":36186,"status":"ok","timestamp":1684893610216,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"},"user_tz":-600},"id":"GXeX1QghiC0F","outputId":"30842d6a-af1b-4a52-d0b2-03bc10008f0e"},"outputs":[{"name":"stdout","output_type":"stream","text":["Running PandasAI with openai LLM...\n","\n","Code generated:\n","```\n","import pandas as pd\n","import matplotlib.pyplot as plt\n","\n","# Load the dataframe\n","df = pd.read_csv('loan_data.csv')\n","\n","# Filter the dataframe to only include paid off loans\n","paid_off_df = df[df['loan_status'] == 'PAIDOFF']\n","\n","# Group the dataframe by gender and loan status\n","gender_grouped_df = paid_off_df.groupby(['Gender', 'loan_status']).count()['Loan_ID']\n","\n","# Create a bar 
plot\n","gender_grouped_df.plot(kind='bar')\n","\n","# Set the title and axis labels\n","plt.title('Loans Paid by Men & Women')\n","plt.xlabel('Gender, Loan Status')\n","plt.ylabel('Number of Loans')\n","\n","# Show the plot\n","plt.show()\n","```\n","\n","Code running:\n","```\n","paid_off_df = df[df['loan_status'] == 'PAIDOFF']\n","gender_grouped_df = paid_off_df.groupby(['Gender', 'loan_status']).count()[\n"," 'Loan_ID']\n","gender_grouped_df.plot(kind='bar')\n","plt.title('Loans Paid by Men & Women')\n","plt.xlabel('Gender, Loan Status')\n","plt.ylabel('Number of Loans')\n","plt.show()\n","```\n"]},{"data":{"image/png":"","text/plain":["
"]},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":["Answer: \n","Conversational answer: Sure, I can help you with that! To generate a bar plot of loans paid by men and women, we can use a data visualization tool like Matplotlib or Seaborn in Python. We'll need to have a dataset that includes information on the loan amounts paid by both men and women. Once we have that, we can create a bar plot with two bars - one for men and one for women - and the height of each bar will represent the total loan amount paid by that gender. This will give us a visual representation of the difference in loan payments between men and women.\n"]}],"source":["response = pandas_ai.run(df, prompt = prompt)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":326,"status":"ok","timestamp":1684892330884,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"},"user_tz":-600},"id":"J8K29pWolDbI","outputId":"29a49e10-8786-438b-f3fb-d123386e5a2a"},"outputs":[{"name":"stdout","output_type":"stream","text":["247\n"]}],"source":["#Let us run the code ourselves and find out the results matches or not?\n","import pandas as pd\n","file_path ='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)\n","num_loans = len(df[(df['Gender'] == 'male') & (df['loan_status'] == 'PAIDOFF')]\n"," )\n","print(num_loans)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"elapsed":352,"status":"ok","timestamp":1684893561284,"user":{"displayName":"Amjad Raza","userId":"06768084019046926999"},"user_tz":-600},"id":"WgUMtfAOe0t4","outputId":"26b23a27-11b5-4a1b-f724-b5cd587af1d8"},"outputs":[{"name":"stdout","output_type":"stream","text":["53\n"]}],"source":["import pandas as pd\n","file_path 
='https://raw.githubusercontent.com/gventuri/pandas-ai/main/examples/data/Loan%20payments%20data.csv'\n","df = pd.read_csv(file_path)\n","num_loans = len(df[(df['Gender'] == 'female') & (df['loan_status'] == 'PAIDOFF')]\n"," )\n","print(num_loans)"]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"UbeJXL3wkmXE"},"source":["The answers generated by PandasAI and manually are same. However, it may not be a case always. "]},{"attachments":{},"cell_type":"markdown","metadata":{"id":"AQ11XG3JaOH3"},"source":["\n","# Remarks\n","In this concise tutorial, we have explored the Pandasai Library and gained insight into its higher-level architecture. This library offers a convenient solution for individuals to inquire about their data without the need to train in-house Large Language Models (LLMs) on company data. While there are various potential applications for this tool, it's important to remain mindful that the generated code by LLMs may occasionally produce unexpected outputs.\n","\n","It's worth emphasizing that Pandas AI is an actively developed project, indicating ongoing efforts to improve and enhance its functionality. With dedicated contributors, exciting features are being developed and added to this library.\n","\n","> **Stay tuned! There's more to come in the world of Pandas AI as it evolves and continues to empower users in their data exploration and analysis endeavors.**\n"]}],"metadata":{"colab":{"provenance":[{"file_id":"1XwQuNfgLtgcGd2yWMzBRXTR0bZ5Vjm1C","timestamp":1685076114938},{"file_id":"https://github.com/amjadraza/learn-it/blob/master/Getting_Started_pandas_ai.ipynb","timestamp":1684888879263}]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"name":"python"}},"nbformat":4,"nbformat_minor":0}
diff --git a/README.md b/README.md
index 3b1fc9af4..c72d84342 100644
--- a/README.md
+++ b/README.md
@@ -163,7 +163,7 @@ Options:
 
 - **-d, --dataset**: The file path to the dataset.
 - **-t, --token**: Your HuggingFace or OpenAI API token, if no token provided pai will pull from the `.env` file.
-- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, `starcoder`, or Google `palm`.
+- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, `starcoder`, `falcon`, `azure-openai` or `google-palm`.
 - **-p, --prompt**: Prompt that PandasAI will run.
 
 To view a full list of available options and their descriptions, run the following command:
@@ -205,6 +205,9 @@ llm = OpenAI(api_token="YOUR_API_KEY")
 
 # Starcoder
 llm = Starcoder(api_token="YOUR_HF_API_KEY")
+
+# Falcon
+llm = Falcon(api_token="YOUR_HF_API_KEY")
 ```
 
 ## License
diff --git a/docs/API/llms.md b/docs/API/llms.md
index 0495754f5..01632a92e 100644
--- a/docs/API/llms.md
+++ b/docs/API/llms.md
@@ -33,6 +33,14 @@ Starcoder wrapper extended through Base HuggingFace Class
     options:
       show_root_heading: true
 
+### Falcon
+
+Falcon wrapper extended through Base HuggingFace Class
+
+::: pandasai.llm.falcon
+    options:
+      show_root_heading: true
+
 ### Azure OpenAI
 
 OpenAI API through Azure Platform wrapper
diff --git a/docs/pai_cli.md b/docs/pai_cli.md
index fa07b37d4..d2952491d 100644
--- a/docs/pai_cli.md
+++ b/docs/pai_cli.md
@@ -1,35 +1,47 @@
 # Command-Line Tool
+
 [Pai] is the command line tool designed to provide a convenient way to interact with `pandasai` through a command line interface (CLI).
 In order to access the CLI tool, make sure to create a virtualenv for testing purpose and to install project dependencies in your local virtual environment using `pip` by running the following command:
+
 ```
 pip install -e .
 ```
+
 Alternatively, you can use `poetry` to create and activate the virtual environment by running the following command:
+
 ```
 poetry shell
 ```
+
 Inside the activated virtual environment, install the project dependencies by running the following command:
+
 ```
 poetry install
 ```
 
-By following these steps, you will now have the necessary environment to access the CLI tool.
+By following these steps, you will now have the necessary environment to access the CLI tool.
 
 ```
 pai [OPTIONS]
 ```
+
 Options:
+
 - **-d, --dataset**: The file path to the dataset.
 - **-t, --token**: Your HuggingFace or OpenAI API token, if no token provided pai will pull from the `.env` file.
-- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, or `starcoder`.
+- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, `starcoder`, `falcon`, `azure-openai` or `google-palm`.
 - **-p, --prompt**: Prompt that PandasAI will run.
 
 To view a full list of available options and their descriptions, run the following command:
+
 ```
 pai --help
 ```
->For example,
->```
->pai -d "~/pandasai/example/data/Loan payments data.csv" -m "openai" -p "How many loans are from men and have been paid off?"
->```
->Should result in the same output as the `from_csv.py` example.
+
+> For example,
+>
+> ```
+> pai -d "~/pandasai/example/data/Loan payments data.csv" -m "openai" -p "How many loans are from men and have been paid off?"
+> ```
+>
+> Should result in the same output as the `from_csv.py` example.
diff --git a/pai/__main__.py b/pai/__main__.py
index 1cb1632ed..4ddd5dca8 100644
--- a/pai/__main__.py
+++ b/pai/__main__.py
@@ -10,7 +10,7 @@
 
 - **-d, --dataset**: The file path to the dataset.
 - **-t, --token**: Your HuggingFace or OpenAI API token, if no token provided pai will pull from the `.env` file.
-- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, or `starcoder`.
+- **-m, --model**: Choice of LLM, either `openai`, `open-assistant`, `starcoder`, `falcon`, `azure-openai` or `google-palm`.
 - **-p, --prompt**: Prompt that PandasAI will run.
 
 To view a full list of available options and their descriptions, run the following command:
@@ -31,20 +31,35 @@
 """
 
 import os
+
 import click
 import pandas as pd
+
 from pandasai import PandasAI
-from pandasai.llm.openai import OpenAI
+from pandasai.llm.google_palm import GooglePalm
 from pandasai.llm.open_assistant import OpenAssistant
+from pandasai.llm.openai import OpenAI
 from pandasai.llm.starcoder import Starcoder
-from pandasai.llm.google_palm import GooglePalm
+
 
 @click.command()
-@click.option('-d', '--dataset', type=str, required=True, help='The dataset to use.')
-@click.option('-t', '--token', type=str, required=False, default=None, help='The API token to use.')
-@click.option('-m', '--model', type=click.Choice(['openai', 'open-assistant', 'starcoder', 'palm']),
-              required=True, help='The type of model to use.')
-@click.option('-p', '--prompt', type=str, required=True, help='The prompt to use.')
+@click.option("-d", "--dataset", type=str, required=True, help="The dataset to use.")
+@click.option(
+    "-t",
+    "--token",
+    type=str,
+    required=False,
+    default=None,
+    help="The API token to use.",
+)
+@click.option(
+    "-m",
+    "--model",
+    type=click.Choice(["openai", "open-assistant", "starcoder", "falcon", "palm"]),
+    required=True,
+    help="The type of model to use.",
+)
+@click.option("-p", "--prompt", type=str, required=True, help="The prompt to use.")
 def main(dataset: str, token: str, model: str, prompt: str) -> None:
     """Main logic for the command line interface tool."""
 
@@ -80,31 +95,34 @@ def main(dataset: str, token: str, model: str, prompt: str) -> None:
             ".xml": pd.read_xml,
         }
         if ext in file_format:
-            df = file_format[ext](dataset) # pylint: disable=C0103
+            df = file_format[ext](dataset)  # pylint: disable=C0103
         else:
             print("Unsupported file format.")
             return
 
-    except Exception as e: # pylint: disable=W0718 disable=C0103
+    except Exception as e:  # pylint: disable=W0718 disable=C0103
         print(e)
         return
 
     if model == "openai":
-        llm = OpenAI(api_token = token)
+        llm = OpenAI(api_token=token)
 
     elif model == "open-assistant":
-        llm = OpenAssistant(api_token = token)
+        llm = OpenAssistant(api_token=token)
+
+    elif model == "starcoder":
+        llm = Starcoder(api_token=token)
 
-    elif model == 'starcoder':
-        llm = Starcoder(api_token = token)
+    elif model == "falcon":
+        llm = Starcoder(api_token=token)
 
-    elif model == 'palm':
-        llm = GooglePalm(api_key = token)
+    elif model == "palm":
+        llm = GooglePalm(api_key=token)
 
     try:
         pandas_ai = PandasAI(llm, verbose=True)
         response = pandas_ai(df, prompt)
         print(response)
 
-    except Exception as e: # pylint: disable=W0718 disable=C0103
+    except Exception as e:  # pylint: disable=W0718 disable=C0103
         print(e)
diff --git a/pandasai/llm/falcon.py b/pandasai/llm/falcon.py
new file mode 100644
index 000000000..5c9342393
--- /dev/null
+++ b/pandasai/llm/falcon.py
@@ -0,0 +1,51 @@
+""" Falcon LLM
+This module is to run the Falcon API hosted and maintained by HuggingFace.co.
+To generate HF_TOKEN go to https://huggingface.co/settings/tokens after creating Account
+on the platform.
+
+Example:
+    Use below example to call Falcon Model
+
+    >>> from pandasai.llm.falcon import Falcon
+"""
+
+
+import os
+from typing import Optional
+
+from dotenv import load_dotenv
+
+from ..exceptions import APIKeyNotFoundError
+from .base import HuggingFaceLLM
+
+load_dotenv()
+
+
+class Falcon(HuggingFaceLLM):
+
+    """Falcon LLM API
+
+    A base HuggingFaceLLM class is extended to use Falcon model.
+
+    """
+
+    api_token: str
+    _api_url: str = (
+        "https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct"
+    )
+    _max_retries: int = 5
+
+    def __init__(self, api_token: Optional[str] = None):
+        """
+        __init__ method of Falcon Class
+        Args:
+            api_token (str): API token from Huggingface platform
+        """
+
+        self.api_token = api_token or os.getenv("HUGGINGFACE_API_KEY") or None
+        if self.api_token is None:
+            raise APIKeyNotFoundError("HuggingFace Hub API key is required")
+
+    @property
+    def type(self) -> str:
+        return "falcon"
diff --git a/pandasai/llm/open_assistant.py b/pandasai/llm/open_assistant.py
index 96c98bd22..35336f2ed 100644
--- a/pandasai/llm/open_assistant.py
+++ b/pandasai/llm/open_assistant.py
@@ -5,7 +5,7 @@
 the platform.
 
 Example:
-    Use below example to call Starcoder Model
+    Use below example to call OpenAssistant Model
 
     >>> from pandasai.llm.open_assistant import OpenAssistant
 """
@@ -23,8 +23,8 @@ class OpenAssistant(HuggingFaceLLM):
 
     """Open Assistant LLM
 
-    A base HuggingFaceLLM class is extended to use OpenAssistant Model via its API.
-    Currently `oasst-sft-1-pythia-12b` is supported via this module.
+    A base HuggingFaceLLM class is extended to use OpenAssistant Model via its API.
+    Currently `oasst-sft-1-pythia-12b` is supported via this module.
     """
 
     api_token: str
@@ -35,7 +35,6 @@ class OpenAssistant(HuggingFaceLLM):
     _max_retries: int = 10
 
     def __init__(self, api_token: Optional[str] = None):
-
         """
         __init__ method of OpenAssistant Calss
 
diff --git a/tests/llms/test_falcon.py b/tests/llms/test_falcon.py
new file mode 100644
index 000000000..4b1d70b0f
--- /dev/null
+++ b/tests/llms/test_falcon.py
@@ -0,0 +1,17 @@
+"""Unit tests for the falcon LLM class"""
+
+import pytest
+
+from pandasai.exceptions import APIKeyNotFoundError
+from pandasai.llm.falcon import Falcon
+
+
+class TestFalconLLM:
+    """Unit tests for the Falcon LLM class"""
+
+    def test_type(self):
+        assert Falcon(api_token="test").type == "falcon"
+
+    def test_init(self):
+        with pytest.raises(APIKeyNotFoundError):
+            Falcon()