feat[Agent]: add agent conversation code #584

ArslanSaleem · 2023-09-21T15:11:51Z

Summary by CodeRabbit

New Feature: Introduced the Agent class in pandasai.agent module to enhance conversational experience with PandasAI. It includes methods for chat interactions, generating clarification questions, starting a new conversation, and explaining code solutions.
New Feature: Added Memory class in pandasai.helpers.memory module to store conversation history with a configurable memory size.
New Feature: Created ClarificationQuestionPrompt and ExplainPrompt classes in pandasai.prompts module to generate prompts for clarification questions and code explanations.
Test: Added unit tests for the Agent class covering various scenarios and functionalities.
Documentation: Updated examples.md with examples of how to use the new Agent class for interactive and context-aware conversations.

coderabbitai · 2023-09-21T15:11:56Z

Walkthrough

This update introduces the Agent class in the pandasai library, enhancing the conversational experience with the AI assistant. It includes methods for chat interactions, generating clarification questions, starting new conversations, and explaining code solutions. The changes also include unit tests for the Agent class and examples demonstrating its usage.

Changes

File(s)	Summary
`examples/agent.py`, `docs/examples.md`	Demonstrates how to use the `Agent` class for interactive conversations with the AI assistant.
`pandasai/agent/__init__.py`, `pandasai/agent/response.py`	Introduces the `Agent` class with methods for chat interactions, clarification questions, starting new conversations, and explaining code solutions.
`pandasai/helpers/memory.py`	Adds a memory size limit to the `Memory` class to control the number of stored messages.
`pandasai/prompts/clarification_questions_prompt.py`, `pandasai/prompts/explain_prompt.py`	Adds new classes for generating prompts for clarification questions and code explanations.
`pandasai/smart_datalake/__init__.py`	Updates the `chat` method to accept an additional parameter and adds new properties.
`tests/test_agent.py`	Adds unit tests for the `Agent` class covering various scenarios.

"With each line of code, we hop and twirl, 🐇💻

Crafting an AI world, where conversations unfurl. 🗨️🌍

Questions clarified, answers explained, 🧐📚

In this digital burrow, knowledge is gained!" 🎉🥕

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Mention @coderabbitai in any review comment for bot assistance.
Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.

coderabbitai

Review Status

Actionable comments generated: 10

Commits

Files that changed from the base of the PR and between 6a3879e and 70b0da8.

Files selected for processing (6)

examples/agent.py (1 hunks)
pandasai/init.py (2 hunks)
pandasai/agent/init.py (1 hunks)
pandasai/helpers/memory.py (1 hunks)
pandasai/prompts/clarification_questions_prompt.py (1 hunks)
pandasai/smart_datalake/init.py (3 hunks)

Files skipped from review due to trivial changes (1)

pandasai/init.py

Additional comments (Suppressed): 5

pandasai/helpers/memory.py (1)

11-13: The max_messages parameter in the __init__ method has a default value of sys.maxsize. This could potentially consume a lot of memory if not handled properly. Consider setting a reasonable default limit or ensure that this is well-documented so users are aware of the implications.

pandasai/agent/__init__.py (2)

30-31: The argument descriptions in the docstring are placeholders (_description_). Please provide meaningful descriptions for dfs and memory_size.

26-27: The default value for memory_size is set to 1. Depending on the expected size of the conversations, this might be too small and could lead to loss of conversation history. Consider setting a larger default or make sure that the callers are aware of this limit.

pandasai/smart_datalake/__init__.py (2)

258-263: The chat method now includes an optional parameter start_conversation. Ensure that this change doesn't break any existing calls to the chat method. Also, verify if the start_conversation parameter is being used correctly within the method.

657-658: A new property dfs has been added. Make sure that this property is being set correctly in the class and is being used appropriately where needed.

coderabbitai · 2023-09-21T15:14:27Z

pandasai/helpers/memory.py

    _messages: list
+    _max_messages: int


The _max_messages attribute is not defined in the class scope, which might lead to confusion and potential errors. It's better to define it within the __init__ method.

pandasai/helpers/memory.py

pandasai/prompts/clarification_questions_prompt.py

coderabbitai · 2023-09-21T15:14:27Z

examples/agent.py

+import pandas as pd
+from pandasai.agent import Agent
+
+from pandasai.llm.openai import OpenAI
+
+employees_data = {
+    "EmployeeID": [1, 2, 3, 4, 5],
+    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
+    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
+}
+
+salaries_data = {
+    "EmployeeID": [1, 2, 3, 4, 5],
+    "Salary": [5000, 6000, 4500, 7000, 5500],
+}
+
+employees_df = pd.DataFrame(employees_data)
+salaries_df = pd.DataFrame(salaries_data)
+
+
+llm = OpenAI("OPEN_API")
+dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10)
+response = dl.chat("Who gets paid the most?")
+print(response)
+response = dl.clarification_questions()
+
+response = dl.chat("Which department does he belongs to?")
+print(response)


The code looks good overall. It imports necessary modules, creates dataframes from dictionaries, initializes the OpenAI and Agent classes, and uses the chat and clarification_questions methods of the Agent class.

However, there are a few points that need to be clarified or refined:

The OPEN_API string passed to the OpenAI class on line 21 seems like a placeholder. Please ensure it is replaced with an actual API key or configuration.

The memory_size argument in the Agent initialization on line 22 is set to 10. This means the agent will only remember the last 10 messages. Depending on the use case, this might be too small and could lead to loss of context in longer conversations.

The clarification_questions method is called on line 25, but its return value is not used or printed. If the intention is to display the clarification questions, consider printing the response.

On line 27, the chat question "Which department does he belongs to?" assumes a male employee. If the highest paid employee is female, this question may not yield the expected results. Consider using a gender-neutral term or referring to the employee by name if available.

Here's how you might address these points:

employees_df = pd.DataFrame(employees_data) salaries_df = pd.DataFrame(salaries_data) - 21: llm = OpenAI("OPEN_API") + 21: llm = OpenAI("<Your OpenAI API Key>") dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10) response = dl.chat("Who gets paid the most?") print(response) - 25: response = dl.clarification_questions() + 25: response = dl.clarification_questions() + 26: print(response) - 28: response = dl.chat("Which department does he belongs to?") + 28: response = dl.chat("Which department does the highest paid employee belong to?") print(response)

pandasai/agent/__init__.py

coderabbitai · 2023-09-21T15:14:28Z

pandasai/agent/__init__.py

+        try:
+            prompt = self._get_clarification_prompt()
+            result = self._lake.llm.generate_code(prompt)
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to get your clarification questions, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


In the clarification_questions method, exceptions are caught and a string message is returned. This might be confusing for the caller as they would expect a list of questions in normal operation. Consider raising the exception again after logging or handling it, instead of returning a string.

coderabbitai

Review Status

Actionable comments generated: 6

Commits

Files that changed from the base of the PR and between 70b0da8 and 1b51727.

Files selected for processing (3)

examples/agent.py (1 hunks)
pandasai/agent/init.py (1 hunks)
tests/test_agent.py (1 hunks)

Files skipped from review due to trivial changes (1)

tests/test_agent.py

Additional comments (Suppressed): 2

examples/agent.py (2)

1-28: The Agent class is instantiated with a list of dataframes, a configuration dictionary containing an instance of OpenAI, and a memory size. The chat method is used to interact with the agent and print responses, while the clarification_questions method generates potential follow-up queries. Ensure that the OPEN_API key passed to the OpenAI constructor is a valid OpenAI API key.

21-22: Ensure that the llm object (an instance of OpenAI) is correctly initialized with a valid API key before being passed into the Agent constructor. Also, verify that the memory_size parameter in the Agent constructor is set appropriately considering the application's memory constraints and usage requirements.

coderabbitai · 2023-09-22T09:10:54Z

pandasai/agent/__init__.py

+
+    _memory: Memory
+    _lake: SmartDatalake = None
+    logger: Logger = None


The logger attribute is initialized as None but it's not checked for None before being used in line 38. This could potentially lead to a NoneType error if the logger is not passed during initialization and later accessed.

- logger: Logger = None + logger: Optional[Logger] = None

coderabbitai · 2023-09-22T09:10:54Z

pandasai/agent/__init__.py

+        self,
+        dfs: Union[DataFrameType, List[DataFrameType]],
+        config: Optional[Union[Config, dict]] = None,
+        logger: Logger = None,
+        memory_size=1,
+    ):


The memory_size parameter in the constructor lacks type hinting, which can lead to confusion about what type of argument should be passed. It's also missing from the docstring. Add an integer type hint and update the docstring accordingly.

- memory_size=1, + memory_size: int = 1,

pandasai/agent/__init__.py

coderabbitai · 2023-09-22T09:10:54Z

pandasai/agent/__init__.py

+    def chat(self, query: str):
+        """
+        Simulate a chat interaction with the assistant on Dataframe.
+        """
+        self._memory.add(query, True)
+        conversation = self._get_conversation()
+        result = self._lake.chat(query, start_conversation=conversation)
+        self._memory.add(result, False)
+        return result


The chat method does not handle exceptions. If an exception occurs during the execution of the self._lake.chat() method, it will propagate up the call stack and could potentially crash the application. Consider adding a try/except block to handle exceptions gracefully.

coderabbitai · 2023-09-22T09:10:54Z

pandasai/agent/__init__.py

+    def clarification_questions(self):
+        """
+        Generate and return up to three clarification questions based on a given prompt.
+        """
+        try:
+            prompt = self._get_clarification_prompt()
+            result = self._lake.llm.generate_code(prompt)
+            questions = json.loads(result)
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to get your clarification questions, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


While it's good that you're handling exceptions in the clarification_questions method, returning a string with the error message might not be the best approach. It would be more appropriate to log the error and raise the exception again after logging, so that the caller can decide how to handle it.

coderabbitai · 2023-09-22T09:10:54Z

pandasai/agent/__init__.py

+    def start_new_conversation(self):
+        """
+        Clears the previous conversation
+        """
+        self._memory.clear()


The start_new_conversation method clears the memory but does not return any confirmation or status. It would be helpful to return a status message or boolean value indicating whether the operation was successful.

coderabbitai

Review Status

Actionable comments generated: 13

Commits

Files that changed from the base of the PR and between 1b51727 and f715035.

Files selected for processing (7)

examples/agent.py (1 hunks)
pandasai/agent/init.py (1 hunks)
pandasai/agent/response.py (1 hunks)
pandasai/helpers/memory.py (1 hunks)
pandasai/prompts/explain_prompt.py (1 hunks)
pandasai/smart_datalake/init.py (4 hunks)
tests/test_agent.py (1 hunks)

Files skipped from review due to trivial changes (2)

examples/agent.py
pandasai/prompts/explain_prompt.py

Additional comments (Suppressed): 6

pandasai/smart_datalake/__init__.py (4)

258-263: The chat method now accepts an additional optional parameter start_conversation. Ensure that all calls to this function throughout the codebase have been updated to handle this new parameter. Also, verify if the default behavior when start_conversation is not provided aligns with the expected functionality.

313-314: The start_conversation value is being added to default_values dictionary. Make sure that the downstream functions or methods that use default_values are capable of handling this new key-value pair without causing any unexpected behavior or errors.

633-634: The setter for last_code_generated has been changed from setting self._code_manager._last_code_generated to self._last_code_generated. This could potentially break the code if there are other parts of the codebase that rely on self._code_manager._last_code_generated. Please ensure that this change is propagated throughout the codebase and doesn't cause any unintended side effects.

656-658: A new property dfs has been added to return self._dfs. This seems like a good practice as it encapsulates the internal state of the object and provides a controlled way to access it. However, please ensure that the returned object is not mutable or if it is, that mutating it won't cause any issues in the rest of your application.

pandasai/helpers/memory.py (2)

1-24: The new Memory class now includes a _max_messages attribute to limit the number of stored messages. The add method has been updated to delete the oldest two messages when this limit is exceeded. This change could potentially impact any code that relies on the Memory class, especially if it expects an unlimited number of messages to be stored. Please verify that all parts of the codebase that use the Memory class are compatible with this change.

11-13: The constructor of the Memory class now accepts an optional parameter max_messages, which defaults to sys.maxsize. This allows users to specify a maximum number of messages to store in memory. If not specified, the default behavior is to store as many messages as possible, limited only by system constraints. This is a good practice as it provides flexibility for different use cases while maintaining backward compatibility.

coderabbitai · 2023-09-22T15:02:19Z

pandasai/agent/response.py

+    def __init__(
+        self, success: bool = True, questions: List[str] = None, message: str = ""
+    ):


The default value for questions is None, which could lead to issues if you try to perform list operations on it. It's better to use an empty list as the default value.

- def __init__(self, success: bool = True, questions: List[str] = None, message: str = "") + def __init__(self, success: bool = True, questions: List[str] = [], message: str = "")

coderabbitai · 2023-09-22T15:02:19Z

pandasai/agent/response.py

+    @property
+    def message(self) -> List[str]:
+        return self._message


The return type hint for the message property is incorrect. It should be str, not List[str].

- def message(self) -> List[str]: + def message(self) -> str:

tests/test_agent.py

coderabbitai · 2023-09-22T15:02:20Z

tests/test_agent.py

+    def test_chat_memory_rollup(self, sample_df, config):
+        agent = Agent(sample_df, config, memory_size=1)
+        agent._lake.chat = Mock()
+        agent._lake.chat.return_value = "United States has the highest gdp"
+
+        # Test the chat function
+        agent.chat("Which country has the highest gdp?")
+
+        memory = agent._memory.all()
+        assert len(memory) == 2
+        assert memory[0]["message"] == "Which country has the highest gdp?"
+        assert memory[1]["message"] == "United States has the highest gdp"
+
+        # Add another conversation
+        agent._lake.chat.return_value = "United Kingdom has the second highest gdp"
+        agent.chat("Which country has the second highest gdp?")
+
+        memory = agent._memory.all()
+        assert len(memory) == 2
+        assert memory[0]["message"] == "Which country has the second highest gdp?"
+        assert memory[1]["message"] == "United Kingdom has the second highest gdp"


The test_chat_memory_rollup method tests the memory rollup feature when the memory size limit is reached. However, it only tests the case where the memory size limit is 1. Consider adding more test cases with different memory size limits to ensure the rollup feature works as expected in all scenarios.

tests/test_agent.py

pandasai/helpers/memory.py

coderabbitai · 2023-09-22T15:02:20Z

pandasai/agent/__init__.py

+    _memory: Memory
+    _lake: SmartDatalake = None
+    logger: Logger = None
+


The class variables _memory, _lake, and logger are declared but not initialized in the class scope. This could lead to confusion as it might imply that these are static class variables, which they are not. It would be better to declare these variables inside the __init__ method.

class Agent: """ Agent class to improve the conversational experience in PandasAI """ - 20: _memory: Memory - 21: _lake: SmartDatalake = None - 22: logger: Logger = None + 20: def __init__( + 21: self, + 22: dfs: Union[DataFrameType, List[DataFrameType]], + 23: config: Optional[Union[Config, dict]] = None, + 24: logger: Logger = None, + 25: memory_size: int = 1, + 26: ): + 27: """ + 28: Args: + 29: df (Union[SmartDataframe, SmartDatalake]): _description_ + 30: memory_size (int, optional): _description_. Defaults to 1. + 31: """ + 32: + 33: if not isinstance(dfs, list): + 34: dfs = [dfs] + 35: + 36: self._lake = SmartDatalake(dfs, config, logger) + 37: self.logger = self._lake.logger + 38: # For the conversation multiple the memory size by 2 + 39: self._memory = Memory(memory_size * 2)

pandasai/agent/__init__.py

coderabbitai · 2023-09-22T15:02:21Z

pandasai/agent/__init__.py

+        try:
+            prompt = ExplainPrompt()
+            prompt.set_var("code", self._lake.last_code_executed)
+            response = self._lake.llm.call(prompt)
+            self.logger.log(
+                f"""Explaination:  {response}
+                """
+            )
+            return response
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to explain, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


Similar to the previous points, consider raising a custom exception instead of returning an error message string.

- 131: except Exception as exception: - 132: return ( - 133: "Unfortunately, I was not able to explain, " - 134: "because of the following error:\n" - 135: f"\n{exception}\n" - 136: ) + 131: except Exception as exception: + 132: raise ExplanationException("Unable to generate explanation") from exception

coderabbitai

Review Status

Actionable comments generated: 3

Commits

Files that changed from the base of the PR and between f715035 and 2da890a.

Files selected for processing (2)

examples/agent.py (1 hunks)
pandasai/agent/init.py (1 hunks)

Files skipped from review due to trivial changes (1)

examples/agent.py

Additional comments (Suppressed): 1

pandasai/agent/__init__.py (1)

33-34: The docstring for the __init__ method seems to be incomplete or incorrect. The argument df is not present in the function signature, and the arguments dfs, config, and logger are not described. Please update the docstring to accurately reflect the function's parameters and their descriptions.

coderabbitai · 2023-09-22T15:07:50Z

pandasai/agent/__init__.py

+    _memory: Memory
+    _lake: SmartDatalake = None
+    logger: Optional[Logger] = None


The class variables _memory, _lake, and logger are declared outside of the __init__ method. This is not a common practice in Python and can lead to confusion. It's better to declare these variables inside the __init__ method.

- _memory: Memory - _lake: SmartDatalake = None - logger: Optional[Logger] = None + def __init__( + self, + dfs: Union[DataFrameType, List[DataFrameType]], + config: Optional[Union[Config, dict]] = None, + logger: Optional[Logger] = None, + memory_size: int = 1, + ): + self._memory = None + self._lake = None + self.logger = logger

pandasai/agent/__init__.py

coderabbitai · 2023-09-22T15:07:50Z

pandasai/agent/__init__.py

+    def start_new_conversation(self) -> True:
+        """
+        Clears the previous conversation
+        """
+
+        self._memory.clear()
+        return True


The start_new_conversation method always returns True. If the purpose of this return value is to indicate success, consider removing it. In Python, it's common to assume that a method succeeded if it didn't raise an exception.

coderabbitai · 2023-09-22T15:12:09Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

coderabbitai · 2023-09-22T15:27:57Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

codecov-commenter · 2023-09-22T15:30:16Z

Codecov Report

❗ No coverage uploaded for pull request base (feature/v1.3@6a3879e). Click here to learn what that means.
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@               Coverage Diff               @@
##             feature/v1.3     #584   +/-   ##
===============================================
  Coverage                ?   84.07%           
===============================================
  Files                   ?       57           
  Lines                   ?     2788           
  Branches                ?        0           
===============================================
  Hits                    ?     2344           
  Misses                  ?      444           
  Partials                ?        0

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

gventuri · 2023-09-22T15:34:57Z

examples/agent.py

@@ -0,0 +1,41 @@
+import pandas as pd
+from pandasai.agent import Agent


We can change it to:
from pandasaiimport Agent

gventuri · 2023-09-22T15:36:20Z

pandasai/agent/__init__.py

+
+    _memory: Memory
+    _lake: SmartDatalake = None
+    logger: Optional[Logger] = None


Let's make logger private: _logger

gventuri · 2023-09-22T15:39:07Z

pandasai/agent/__init__.py

+    Agent class to improve the conversational experience in PandasAI
+    """
+
+    _memory: Memory


Here we should rely on the memory from the SmartDatalake instead, since we have access to it.

gventuri · 2023-09-22T15:42:00Z

pandasai/agent/__init__.py

+        # For the conversation multiple the memory size by 2
+        self._memory = Memory(memory_size * 2)
+
+    def _get_conversation(self):


We already have such a method in the memory.py file. What's the advantage or replicating it instead of using the one from the Memory?

It's important trying to reuse the code as much as possible to grant maintainability over time!

gventuri · 2023-09-22T15:49:57Z

pandasai/agent/__init__.py

+            self._memory.add(query, True)
+            conversation = self._get_conversation()
+            result = self._lake.chat(
+                query, output_type=output_type, start_conversation=conversation


If we use the memory from the SmartDataframe, we don't need to pass the start_conversation, I suggest we remove this.

gventuri · 2023-09-22T15:58:09Z

pandasai/helpers/memory.py

        self._messages = []
+        self._max_messages = max_messages


No need for this (see below)

gventuri · 2023-09-22T15:58:28Z

pandasai/helpers/memory.py


    def add(self, message: str, is_user: bool):
        self._messages.append({"message": message, "is_user": is_user})

+        # Delete two entry because of the conversation
+        if len(self._messages) > self._max_messages:


No need for this, the get_conversation method of memory will automatically limit based on the limit variable that is passed.

pandasai/prompts/explain_prompt.py

gventuri · 2023-09-22T16:00:48Z

pandasai/prompts/explain_prompt.py

+</Code
+
+Can you explain briefly for non technical person on how you came up with code 
+without explaining pandas library?


Let's try with "Explain how you came up with code for non-technical people without mentioning technical details or mentioning the libraries used".

gventuri · 2023-09-22T16:01:32Z

pandasai/smart_datalake/__init__.py

+        self,
+        query: str,
+        output_type: Optional[str] = None,
+        start_conversation: Optional[str] = None,


Let's remove this

@gventuri Currently the get_conversation method is called with default which has limit=1 means returning the last message only. We need to then pass the memory size to SmartLake constructor to do so or let's use memory for that.

coderabbitai

Review Status

Actionable comments generated: 10

Commits

Files that changed from the base of the PR and between 2da890a and 9025f4e.

Files selected for processing (6)

examples/agent.py (1 hunks)
pandasai/agent/init.py (1 hunks)
pandasai/prompts/clarification_questions_prompt.py (1 hunks)
pandasai/prompts/explain_prompt.py (1 hunks)
pandasai/smart_datalake/init.py (4 hunks)
tests/test_agent.py (1 hunks)

Files skipped from review due to trivial changes (2)

pandasai/prompts/explain_prompt.py
tests/test_agent.py

Additional comments (Suppressed): 3

examples/agent.py (1)

1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the Agent class from the pandasai.agent module, which is part of the PandasAI library. The code creates two pandas DataFrames, initializes an instance of the OpenAI class, and creates an Agent object. Then it shows how to chat with the agent, get clarification questions, and explain how the chat response is generated.

However, there are a few points that need to be verified or refined:

In line 21, the OpenAI key is hardcoded as "OPEN_API_KEY". This could potentially be a placeholder, but if it's not, it's a security issue. API keys should never be hardcoded in the source code. They should be stored securely and accessed through environment variables or secure key management systems.

In line 22, the Agent class is initialized with a memory size of 10. Ensure that this value is appropriate for your use case. If the number of messages exceeds this limit, the oldest messages will be removed according to the PR summary.

The code doesn't handle any exceptions. While it's mentioned in the instructions that arguments to internal function calls are pre-sanitized and do not require additional error handling, it would still be good practice to add some basic error handling around the main operations (like chatting with the agent, getting clarification questions, etc.) to catch and log any unexpected errors.

pandasai/smart_datalake/__init__.py (2)

258-263: The chat method has been updated to include a new optional parameter start_conversation. Ensure that this change doesn't break any existing calls to the chat method in your codebase. If start_conversation is used, it should be a string representing the start of a conversation.

633-634: The setter for last_code_generated has been changed to set self._last_code_generated instead of self._code_manager._last_code_generated. Ensure that this change is intentional and that it doesn't affect other parts of your code that rely on last_code_generated.

pandasai/prompts/clarification_questions_prompt.py

pandasai/smart_datalake/__init__.py

pandasai/agent/__init__.py

coderabbitai · 2023-09-22T19:32:56Z

pandasai/agent/__init__.py

+        except Exception as exception:
+            raise exception


In the clarification_questions method, exceptions are caught and immediately re-raised. This is redundant and can be removed, as uncaught exceptions will automatically propagate up the call stack.

@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

pandasai/agent/__init__.py

coderabbitai

Review Status

Actionable comments generated: 0

Commits

Files that changed from the base of the PR and between 9025f4e and d1b8e61.

Files selected for processing (1)

examples/agent.py (1 hunks)

Additional comments (Suppressed): 3

examples/agent.py (3)

1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the Agent class from the pandasai library, including how to initialize an instance of the class, chat with the agent, get clarification questions, and explain how a response is generated.

However, there are a few points that need verification or refinement:

1-2: Ensure that the Agent class has been correctly implemented in the pandasai module.

6-15: The data used for demonstration purposes is hardcoded. While this is acceptable for an example script, in a real-world scenario, you would likely want to load this data from a file or database.
21: The OpenAI key is hardcoded into the script. This could pose a security risk if the script is shared or made public. Consider loading sensitive information like API keys from environment variables or secure files.

22: The Agent class is initialized with two DataFrames, a configuration dictionary containing an instance of the OpenAI class, and a memory size. Ensure that the Agent class's constructor is designed to handle these parameters correctly.

25, 30, 36: These lines call methods (chat, clarification_questions, explain) on the Agent instance. Verify that these methods have been correctly implemented in the Agent class.

Overall, the code appears to be well-structured and easy to understand. However, it's important to verify the implementation of the Agent class and its methods to ensure they work as expected.

coderabbitai · 2023-09-22T20:30:18Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

coderabbitai · 2023-09-22T20:33:35Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 27 minutes and 14 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

coderabbitai · 2023-09-22T20:39:48Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 21 minutes and 1 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

gventuri

Great changes, thanks a lot! We are almost there!

gventuri · 2023-09-22T23:02:42Z

pandasai/agent/__init__.py

+        except Exception as exception:
+            raise exception


@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!

gventuri · 2023-09-22T23:08:19Z

pandasai/prompts/clarification_questions_prompt.py

+
+<dataframe>
+{dataframes}
+</dataframe>


{dataframes} only should be enough, the set_var method in prompts already takes care of wrapping each one individually.

gventuri · 2023-09-22T23:09:56Z

pandasai/prompts/clarification_questions_prompt.py

+"""
+
+    def __init__(self, dataframes, conversation):
+        self.set_var("dataframes", dataframes)


def set_var(self, var, value): if var == "dfs": self._args["dataframes"] = self._generate_dataframes(value) self._args[var] = value

This method from Prompt automatically configures the dfs to be used in the prompt, but it requires a dfs key.
Let's change this to self.set_var("dfs", dataframes)

coderabbitai

Review Status

Actionable comments generated: 7

Commits

Files that changed from the base of the PR and between d1b8e61 and 2e4c902.

Files selected for processing (5)

pandasai/agent/init.py (1 hunks)
pandasai/helpers/memory.py (2 hunks)
pandasai/prompts/clarification_questions_prompt.py (1 hunks)
pandasai/prompts/explain_prompt.py (1 hunks)
pandasai/smart_datalake/init.py (3 hunks)

Additional comments (Suppressed): 6

pandasai/helpers/memory.py (1)

26-31: The get_conversation method now uses the memory_size as the default limit if no limit is provided. This change aligns with the introduction of the memory_size attribute and provides more flexibility in retrieving conversation history.
pandasai/smart_datalake/__init__.py (3)
305-311: The new hunk seems to be missing the assignment of the generate_python_code_instruction variable. This could lead to a NameError later in the code if this variable is referenced. Please verify if this is intentional or an oversight.
-                 generate_python_code_instruction = self._get_prompt(
+                 self.generate_python_code_instruction = self._get_prompt(
                      "generate_python_code",
                      default_prompt=GeneratePythonCodePrompt,
624-627: The setter for last_code_generated has been changed from setting the value on _code_manager._last_code_generated to _last_code_generated. Ensure that this change does not break any dependencies on the old behavior, especially if other parts of the code are expecting the value to be set on _code_manager.

645-655: Two new properties, dfs and memory, have been added. These seem to be simple getters for the _dfs and _memory attributes. If these attributes were previously accessed directly, ensure that all such instances have been updated to use the new property methods.
pandasai/agent/__init__.py (2)

28-31: The docstring for the __init__ method seems to be outdated or incorrect. The arguments mentioned in the docstring do not match the actual arguments of the method. Please update the docstring to reflect the correct arguments and their descriptions.

66-67: The code is returning only the first three clarification questions. If there are more than three, the rest will be ignored. Make sure this is the intended behavior. If all questions should be returned, you can simply return questions without slicing it.

coderabbitai · 2023-09-23T07:19:10Z

pandasai/prompts/clarification_questions_prompt.py

+    """Prompt to get clarification questions"""
+
+    text: str = """
+You are provided with the following pandas DataFrames:
+
+{dataframes}
+
+<conversation>
+{conversation}
+</conversation>
+
+Based on the conversation, are there any clarification questions 
+that a senior data scientist would ask? These are questions for non technical people, 
+only ask for questions they could ask given low tech expertise and 
+no knowledge about how the dataframes are structured.
+
+Return the JSON array of the clarification questions. 
+
+If there is no clarification question, return an empty array.
+
+Json:
+"""
+
+    def __init__(self, dataframes, conversation):


The ClarificationQuestionPrompt class is well defined and follows the SOLID principles. It extends from the base Prompt class and sets up its specific prompt text and variables in the constructor. However, it would be beneficial to add type hints for the parameters dataframes and conversation in the __init__ method to improve readability and maintainability.

- def __init__(self, dataframes, conversation): + def __init__(self, dataframes: pd.DataFrame, conversation: str):

coderabbitai · 2023-09-23T07:19:10Z

pandasai/prompts/explain_prompt.py

+""" Prompt to explain solution generated
+The previous conversation we had
+
+<Conversation>
+{conversation}
+</Conversation>
+
+Based on the last conversation you generated the following code:
+
+<Code>
+{code}
+</Code
+
+Explain how you came up with code for non-technical people without 
+mentioning technical details or mentioning the libraries used?
+
+"""


The docstring at the top of the file seems to be a copy of the prompt text. It's not providing any useful information about the module or its contents. Consider replacing it with a more informative docstring that describes the purpose and functionality of the module.

coderabbitai · 2023-09-23T07:19:10Z

pandasai/prompts/explain_prompt.py

+
+
+class ExplainPrompt(Prompt):
+    """Prompt to get clarification questions"""


The class docstring is misleading. This class is for generating explanation prompts, not clarification questions. Please correct it to avoid confusion.

- """Prompt to get clarification questions""" + """Prompt to generate explanation for the code"""

pandasai/prompts/explain_prompt.py

coderabbitai · 2023-09-23T07:19:10Z

pandasai/helpers/memory.py

+    def __init__(self, memory_size: int = 1):
        self._messages = []
+        self._memory_size = memory_size


The __init__ method now accepts an optional parameter memory_size which defaults to 1. This is used to limit the number of stored messages in memory. However, there's no logic implemented yet to enforce this limit when adding new messages. Consider adding a check in the add method to remove the oldest message(s) when the limit is exceeded.

def add(self, message: str, is_user: bool): self._messages.append({"message": message, "is_user": is_user}) + while len(self._messages) > self._memory_size: + self._messages.pop(0)

coderabbitai · 2023-09-23T07:19:11Z

pandasai/agent/__init__.py

+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to explain, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


Error handling in these methods only returns a string with the error message. It might be better to log the exception and re-raise it after logging. This way, the caller of the method can decide how to handle the exception. Also, this will help in debugging as the stack trace will be available in the logs.

- return ( - "Unfortunately, I was not able to get your answers, " - "because of the following error:\n" - f"\n{exception}\n" - ) + self._logger.log( + "Unfortunately, I was not able to get your answers, " + "because of the following error:\n" + f"\n{exception}\n" + ) + raise ... - return ( - "Unfortunately, I was not able to explain, " - "because of the following error:\n" - f"\n{exception}\n" - ) + self._logger.log( + "Unfortunately, I was not able to explain, " - "because of the following error:\n" - f"\n{exception}\n" + ) + raise

coderabbitai

Review Status

Actionable comments generated: 5

Commits

Files that changed from the base of the PR and between 2e4c902 and 7a554a5.

Files selected for processing (2)

docs/examples.md (1 hunks)
pandasai/agent/init.py (1 hunks)

Additional comments (Suppressed): 1

docs/examples.md (1)

210-222: The new section "Working with Agent" provides a clear and concise overview of the key features of the Agent class. It explains how context retention, clarification questions, and explanations can enhance user interactions with the chat agent. This is a good addition to the documentation as it helps users understand how to use the Agent class effectively.

docs/examples.md

coderabbitai · 2023-09-23T09:26:04Z

pandasai/agent/__init__.py

+        if not isinstance(dfs, list):
+            dfs = [dfs]
+


The code is assuming that if dfs is not a list, it should be converted into a list. This might lead to unexpected behavior if dfs is of an unsupported type. It would be better to explicitly check for supported types and raise an error if an unsupported type is provided.

- if not isinstance(dfs, list): - dfs = [dfs] + if isinstance(dfs, DataFrameType): + dfs = [dfs] + elif not isinstance(dfs, list): + raise TypeError("dfs must be a DataFrameType or a list of DataFrameType")

coderabbitai · 2023-09-23T09:26:04Z

pandasai/agent/__init__.py

+            result = self._lake.chat(query, output_type=output_type)
+            return result
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to get your answers, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


The exception handling here is too broad, which can make debugging difficult because it hides the details of what went wrong. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.

coderabbitai · 2023-09-23T09:26:05Z

pandasai/agent/__init__.py

+        prompt = ClarificationQuestionPrompt(
+            self._lake.dfs, self._lake._memory.get_conversation()
+        )
+
+        result = self._lake.llm.call(prompt)
+        self._logger.log(
+            f"""Clarification Questions:  {result}
+            """
+        )
+        questions: list[str] = json.loads(result)
+        return questions[:3]


The result from self._lake.llm.call(prompt) is directly loaded as JSON and returned. There's no error handling in case the result is not valid JSON, which could cause the function to crash. Consider adding error handling around json.loads(result).

coderabbitai · 2023-09-23T09:26:05Z

pandasai/agent/__init__.py

+        try:
+            prompt = ExplainPrompt(
+                self._lake._memory.get_conversation(),
+                self._lake.last_code_executed,
+            )
+            response = self._lake.llm.call(prompt)
+            self._logger.log(
+                f"""Explaination:  {response}
+                """
+            )
+            return response
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to explain, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


Similar to the previous comment, the exception handling here is too broad. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.

gventuri · 2023-09-23T09:50:02Z

pandasai/prompts/clarification_questions_prompt.py

+
+<dataframe>
+{dataframe}
+</dataframe>


Let's also change this to

{dataframes}

for consistency!

pandasai/smart_datalake/__init__.py

coderabbitai

Review Status

Actionable comments generated: 2

Commits

Files that changed from the base of the PR and between 7a554a5 and f7e4d98.

Files selected for processing (2)

pandasai/prompts/clarification_questions_prompt.py (1 hunks)
pandasai/prompts/explain_prompt.py (1 hunks)

Additional comments (Suppressed): 2

pandasai/prompts/clarification_questions_prompt.py (1)

48-49: The set_var method is used here but it's not defined in this class or imported from elsewhere. Ensure that this method is properly defined and accessible within this class.

pandasai/prompts/explain_prompt.py (1)

42-44: The set_var method is not defined in this class or its parent class Prompt. Ensure that this method exists and works as expected. If it's intended to set instance variables, you can directly assign the values like self.conversation = conversation and self.code = code.

pandasai/prompts/clarification_questions_prompt.py

coderabbitai · 2023-09-24T14:59:14Z

pandasai/prompts/explain_prompt.py

+    text: str = """
+The previous conversation we had
+
+<Conversation>
+{conversation}
+</Conversation>
+
+Based on the last conversation you generated the following code:
+
+<Code>
+{code}
+</Code
+
+Explain how you came up with code for non-technical people without 
+mentioning technical details or mentioning the libraries used?
+
+"""


The text attribute in the ExplainPrompt class is a class variable, which means it's shared across all instances of this class. If you modify it in one instance, it will affect all other instances. This could lead to unexpected behavior if multiple instances of ExplainPrompt are used concurrently. Consider moving this attribute to the instance level (inside __init__) to avoid potential issues.

gventuri · 2023-09-24T22:15:55Z

@ArslanSaleem thanks a lot for the great job. Merging!

feat[Agent]: add agent conversation code

70b0da8

coderabbitai bot reviewed Sep 21, 2023

View reviewed changes

feat[Agent]: add test cases for the agent class

1b51727

coderabbitai bot reviewed Sep 22, 2023

View reviewed changes

ArslanSaleem added 2 commits September 22, 2023 15:08

feat: add explain method

70244c3

feat: Add Explain functionality in the agent

f715035

coderabbitai bot reviewed Sep 22, 2023

View reviewed changes

fix: refactor types

2da890a

coderabbitai bot reviewed Sep 22, 2023

View reviewed changes

chore: fix typings

6736c44

chore: improve prompt add conversation

cdeec68

ArslanSaleem requested a review from gventuri September 22, 2023 15:44

gventuri reviewed Sep 22, 2023

View reviewed changes

refactor: remove memory from the agent class

9025f4e

coderabbitai bot reviewed Sep 22, 2023

View reviewed changes

refactor: import of Agent class in example

d1b8e61

coderabbitai bot reviewed Sep 22, 2023

View reviewed changes

refactor: memory to return conversation according to size

49d8720

refactor: remove leftover property

b92fb39

fix: prompt comment

7f17af8

gventuri reviewed Sep 22, 2023

View reviewed changes

fix: redundant try catch

2e4c902

coderabbitai bot reviewed Sep 23, 2023

View reviewed changes

chore: improve docstring and add example in documentation

7a554a5

coderabbitai bot reviewed Sep 23, 2023

View reviewed changes

gventuri reviewed Sep 23, 2023

View reviewed changes

fix: Comment in clarification prompts and add dtyps to the constructors

f7e4d98

coderabbitai bot reviewed Sep 24, 2023

View reviewed changes

gventuri merged commit dd6934d into feature/v1.3 Sep 24, 2023
9 checks passed

		@@ -0,0 +1,41 @@
		import pandas as pd
		from pandasai.agent import Agent



		class ExplainPrompt(Prompt):
		"""Prompt to get clarification questions"""

feat[Agent]: add agent conversation code #584

feat[Agent]: add agent conversation code #584

Conversation

ArslanSaleem commented Sep 21, 2023 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Sep 21, 2023 • edited Loading

Walkthrough

Changes

Chat with CodeRabbit Bot (@coderabbitai)

Pause Incremental Reviews

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 21, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 21, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 21, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

codecov-commenter commented Sep 22, 2023 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ArslanSaleem Sep 22, 2023 • edited Loading

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

coderabbitai bot commented Sep 22, 2023

ArslanSaleem commented Sep 21, 2023 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Sep 21, 2023 •

edited

Loading

Chat with CodeRabbit Bot (`@coderabbitai`)

codecov-commenter commented Sep 22, 2023 •

edited

Loading

ArslanSaleem Sep 22, 2023 •

edited

Loading

coderabbitai bot Sep 22, 2023 •

edited

Loading