Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat[Agent]: add agent conversation code #584

Merged
merged 15 commits into from
Sep 24, 2023

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Sep 21, 2023

Summary by CodeRabbit

  • New Feature: Introduced the Agent class in pandasai.agent module to enhance conversational experience with PandasAI. It includes methods for chat interactions, generating clarification questions, starting a new conversation, and explaining code solutions.
  • New Feature: Added Memory class in pandasai.helpers.memory module to store conversation history with a configurable memory size.
  • New Feature: Created ClarificationQuestionPrompt and ExplainPrompt classes in pandasai.prompts module to generate prompts for clarification questions and code explanations.
  • Test: Added unit tests for the Agent class covering various scenarios and functionalities.
  • Documentation: Updated examples.md with examples of how to use the new Agent class for interactive and context-aware conversations.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 21, 2023

Walkthrough

This update introduces the Agent class in the pandasai library, enhancing the conversational experience with the AI assistant. It includes methods for chat interactions, generating clarification questions, starting new conversations, and explaining code solutions. The changes also include unit tests for the Agent class and examples demonstrating its usage.

Changes

File(s) Summary
examples/agent.py,
docs/examples.md
Demonstrates how to use the Agent class for interactive conversations with the AI assistant.
pandasai/agent/__init__.py,
pandasai/agent/response.py
Introduces the Agent class with methods for chat interactions, clarification questions, starting new conversations, and explaining code solutions.
pandasai/helpers/memory.py Adds a memory size limit to the Memory class to control the number of stored messages.
pandasai/prompts/clarification_questions_prompt.py,
pandasai/prompts/explain_prompt.py
Adds new classes for generating prompts for clarification questions and code explanations.
pandasai/smart_datalake/__init__.py Updates the chat method to accept an additional parameter and adds new properties.
tests/test_agent.py Adds unit tests for the Agent class covering various scenarios.

"With each line of code, we hop and twirl, 🐇💻

Crafting an AI world, where conversations unfurl. 🗨️🌍

Questions clarified, answers explained, 🧐📚

In this digital burrow, knowledge is gained!" 🎉🥕


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 10

Commits Files that changed from the base of the PR and between 6a3879e and 70b0da8.
Files selected for processing (6)
  • examples/agent.py (1 hunks)
  • pandasai/init.py (2 hunks)
  • pandasai/agent/init.py (1 hunks)
  • pandasai/helpers/memory.py (1 hunks)
  • pandasai/prompts/clarification_questions_prompt.py (1 hunks)
  • pandasai/smart_datalake/init.py (3 hunks)
Files skipped from review due to trivial changes (1)
  • pandasai/init.py
Additional comments (Suppressed): 5
pandasai/helpers/memory.py (1)
  • 11-13: The max_messages parameter in the __init__ method has a default value of sys.maxsize. This could potentially consume a lot of memory if not handled properly. Consider setting a reasonable default limit or ensure that this is well-documented so users are aware of the implications.
pandasai/agent/__init__.py (2)
  • 30-31: The argument descriptions in the docstring are placeholders (_description_). Please provide meaningful descriptions for dfs and memory_size.

  • 26-27: The default value for memory_size is set to 1. Depending on the expected size of the conversations, this might be too small and could lead to loss of conversation history. Consider setting a larger default or make sure that the callers are aware of this limit.

pandasai/smart_datalake/__init__.py (2)
  • 258-263: The chat method now includes an optional parameter start_conversation. Ensure that this change doesn't break any existing calls to the chat method. Also, verify if the start_conversation parameter is being used correctly within the method.

  • 657-658: A new property dfs has been added. Make sure that this property is being set correctly in the class and is being used appropriately where needed.

Comment on lines 8 to 9
_messages: list
_max_messages: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _max_messages attribute is not defined in the class scope, which might lead to confusion and potential errors. It's better to define it within the __init__ method.

pandasai/helpers/memory.py Outdated Show resolved Hide resolved
pandasai/prompts/clarification_questions_prompt.py Outdated Show resolved Hide resolved
pandasai/prompts/clarification_questions_prompt.py Outdated Show resolved Hide resolved
Comment on lines 1 to 28
import pandas as pd
from pandasai.agent import Agent

from pandasai.llm.openai import OpenAI

employees_data = {
"EmployeeID": [1, 2, 3, 4, 5],
"Name": ["John", "Emma", "Liam", "Olivia", "William"],
"Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
"EmployeeID": [1, 2, 3, 4, 5],
"Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


llm = OpenAI("OPEN_API")
dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10)
response = dl.chat("Who gets paid the most?")
print(response)
response = dl.clarification_questions()

response = dl.chat("Which department does he belongs to?")
print(response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good overall. It imports necessary modules, creates dataframes from dictionaries, initializes the OpenAI and Agent classes, and uses the chat and clarification_questions methods of the Agent class.

However, there are a few points that need to be clarified or refined:

  1. The OPEN_API string passed to the OpenAI class on line 21 seems like a placeholder. Please ensure it is replaced with an actual API key or configuration.
  2. The memory_size argument in the Agent initialization on line 22 is set to 10. This means the agent will only remember the last 10 messages. Depending on the use case, this might be too small and could lead to loss of context in longer conversations.
  3. The clarification_questions method is called on line 25, but its return value is not used or printed. If the intention is to display the clarification questions, consider printing the response.
  4. On line 27, the chat question "Which department does he belongs to?" assumes a male employee. If the highest paid employee is female, this question may not yield the expected results. Consider using a gender-neutral term or referring to the employee by name if available.

Here's how you might address these points:

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


- 21: llm = OpenAI("OPEN_API")
+ 21: llm = OpenAI("<Your OpenAI API Key>")
dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10)
response = dl.chat("Who gets paid the most?")
print(response)
- 25: response = dl.clarification_questions()
+ 25: response = dl.clarification_questions()
+ 26: print(response)

- 28: response = dl.chat("Which department does he belongs to?")
+ 28: response = dl.chat("Which department does the highest paid employee belong to?")
print(response)

pandasai/agent/__init__.py Show resolved Hide resolved
Comment on lines 77 to 85
try:
prompt = self._get_clarification_prompt()
result = self._lake.llm.generate_code(prompt)
except Exception as exception:
return (
"Unfortunately, I was not able to get your clarification questions, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the clarification_questions method, exceptions are caught and a string message is returned. This might be confusing for the caller as they would expect a list of questions in normal operation. Consider raising the exception again after logging or handling it, instead of returning a string.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 6

Commits Files that changed from the base of the PR and between 70b0da8 and 1b51727.
Files selected for processing (3)
  • examples/agent.py (1 hunks)
  • pandasai/agent/init.py (1 hunks)
  • tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • tests/test_agent.py
Additional comments (Suppressed): 2
examples/agent.py (2)
  • 1-28: The Agent class is instantiated with a list of dataframes, a configuration dictionary containing an instance of OpenAI, and a memory size. The chat method is used to interact with the agent and print responses, while the clarification_questions method generates potential follow-up queries. Ensure that the OPEN_API key passed to the OpenAI constructor is a valid OpenAI API key.

  • 21-22: Ensure that the llm object (an instance of OpenAI) is correctly initialized with a valid API key before being passed into the Agent constructor. Also, verify that the memory_size parameter in the Agent constructor is set appropriately considering the application's memory constraints and usage requirements.


_memory: Memory
_lake: SmartDatalake = None
logger: Logger = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger attribute is initialized as None but it's not checked for None before being used in line 38. This could potentially lead to a NoneType error if the logger is not passed during initialization and later accessed.

-    logger: Logger = None
+    logger: Optional[Logger] = None

Comment on lines 22 to 27
self,
dfs: Union[DataFrameType, List[DataFrameType]],
config: Optional[Union[Config, dict]] = None,
logger: Logger = None,
memory_size=1,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory_size parameter in the constructor lacks type hinting, which can lead to confusion about what type of argument should be passed. It's also missing from the docstring. Add an integer type hint and update the docstring accordingly.

-        memory_size=1,
+        memory_size: int = 1,

pandasai/agent/__init__.py Show resolved Hide resolved
Comment on lines 54 to 62
def chat(self, query: str):
"""
Simulate a chat interaction with the assistant on Dataframe.
"""
self._memory.add(query, True)
conversation = self._get_conversation()
result = self._lake.chat(query, start_conversation=conversation)
self._memory.add(result, False)
return result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The chat method does not handle exceptions. If an exception occurs during the execution of the self._lake.chat() method, it will propagate up the call stack and could potentially crash the application. Consider adding a try/except block to handle exceptions gracefully.

Comment on lines 73 to 86
def clarification_questions(self):
"""
Generate and return up to three clarification questions based on a given prompt.
"""
try:
prompt = self._get_clarification_prompt()
result = self._lake.llm.generate_code(prompt)
questions = json.loads(result)
except Exception as exception:
return (
"Unfortunately, I was not able to get your clarification questions, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's good that you're handling exceptions in the clarification_questions method, returning a string with the error message might not be the best approach. It would be more appropriate to log the error and raise the exception again after logging, so that the caller can decide how to handle it.

Comment on lines 90 to 94
def start_new_conversation(self):
"""
Clears the previous conversation
"""
self._memory.clear()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start_new_conversation method clears the memory but does not return any confirmation or status. It would be helpful to return a status message or boolean value indicating whether the operation was successful.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 13

Commits Files that changed from the base of the PR and between 1b51727 and f715035.
Files selected for processing (7)
  • examples/agent.py (1 hunks)
  • pandasai/agent/init.py (1 hunks)
  • pandasai/agent/response.py (1 hunks)
  • pandasai/helpers/memory.py (1 hunks)
  • pandasai/prompts/explain_prompt.py (1 hunks)
  • pandasai/smart_datalake/init.py (4 hunks)
  • tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • examples/agent.py
  • pandasai/prompts/explain_prompt.py
Additional comments (Suppressed): 6
pandasai/smart_datalake/__init__.py (4)
  • 258-263: The chat method now accepts an additional optional parameter start_conversation. Ensure that all calls to this function throughout the codebase have been updated to handle this new parameter. Also, verify if the default behavior when start_conversation is not provided aligns with the expected functionality.

  • 313-314: The start_conversation value is being added to default_values dictionary. Make sure that the downstream functions or methods that use default_values are capable of handling this new key-value pair without causing any unexpected behavior or errors.

  • 633-634: The setter for last_code_generated has been changed from setting self._code_manager._last_code_generated to self._last_code_generated. This could potentially break the code if there are other parts of the codebase that rely on self._code_manager._last_code_generated. Please ensure that this change is propagated throughout the codebase and doesn't cause any unintended side effects.

  • 656-658: A new property dfs has been added to return self._dfs. This seems like a good practice as it encapsulates the internal state of the object and provides a controlled way to access it. However, please ensure that the returned object is not mutable or if it is, that mutating it won't cause any issues in the rest of your application.

pandasai/helpers/memory.py (2)
  • 1-24: The new Memory class now includes a _max_messages attribute to limit the number of stored messages. The add method has been updated to delete the oldest two messages when this limit is exceeded. This change could potentially impact any code that relies on the Memory class, especially if it expects an unlimited number of messages to be stored. Please verify that all parts of the codebase that use the Memory class are compatible with this change.

  • 11-13: The constructor of the Memory class now accepts an optional parameter max_messages, which defaults to sys.maxsize. This allows users to specify a maximum number of messages to store in memory. If not specified, the default behavior is to store as many messages as possible, limited only by system constraints. This is a good practice as it provides flexibility for different use cases while maintaining backward compatibility.

Comment on lines 10 to 12
def __init__(
self, success: bool = True, questions: List[str] = None, message: str = ""
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value for questions is None, which could lead to issues if you try to perform list operations on it. It's better to use an empty list as the default value.

-    def __init__(self, success: bool = True, questions: List[str] = None, message: str = "")
+    def __init__(self, success: bool = True, questions: List[str] = [], message: str = "")

Comment on lines 26 to 28
@property
def message(self) -> List[str]:
return self._message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type hint for the message property is incorrect. It should be str, not List[str].

-    def message(self) -> List[str]:
+    def message(self) -> str:

tests/test_agent.py Show resolved Hide resolved
tests/test_agent.py Show resolved Hide resolved
Comment on lines 108 to 128
def test_chat_memory_rollup(self, sample_df, config):
agent = Agent(sample_df, config, memory_size=1)
agent._lake.chat = Mock()
agent._lake.chat.return_value = "United States has the highest gdp"

# Test the chat function
agent.chat("Which country has the highest gdp?")

memory = agent._memory.all()
assert len(memory) == 2
assert memory[0]["message"] == "Which country has the highest gdp?"
assert memory[1]["message"] == "United States has the highest gdp"

# Add another conversation
agent._lake.chat.return_value = "United Kingdom has the second highest gdp"
agent.chat("Which country has the second highest gdp?")

memory = agent._memory.all()
assert len(memory) == 2
assert memory[0]["message"] == "Which country has the second highest gdp?"
assert memory[1]["message"] == "United Kingdom has the second highest gdp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_chat_memory_rollup method tests the memory rollup feature when the memory size limit is reached. However, it only tests the case where the memory size limit is 1. Consider adding more test cases with different memory size limits to ensure the rollup feature works as expected in all scenarios.

tests/test_agent.py Show resolved Hide resolved
pandasai/helpers/memory.py Outdated Show resolved Hide resolved
Comment on lines 20 to 23
_memory: Memory
_lake: SmartDatalake = None
logger: Logger = None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class variables _memory, _lake, and logger are declared but not initialized in the class scope. This could lead to confusion as it might imply that these are static class variables, which they are not. It would be better to declare these variables inside the __init__ method.

class Agent:
    """
    Agent class to improve the conversational experience in PandasAI
    """

- 20:     _memory: Memory
- 21:     _lake: SmartDatalake = None
- 22:     logger: Logger = None
+ 20:     def __init__(
+ 21:         self,
+ 22:         dfs: Union[DataFrameType, List[DataFrameType]],
+ 23:         config: Optional[Union[Config, dict]] = None,
+ 24:         logger: Logger = None,
+ 25:         memory_size: int = 1,
+ 26:     ):
+ 27:         """
+ 28:         Args:
+ 29:             df (Union[SmartDataframe, SmartDatalake]): _description_
+ 30:             memory_size (int, optional): _description_. Defaults to 1.
+ 31:         """
+ 32: 
+ 33:         if not isinstance(dfs, list):
+ 34:             dfs = [dfs]
+ 35: 
+ 36:         self._lake = SmartDatalake(dfs, config, logger)
+ 37:         self.logger = self._lake.logger
+ 38:         # For the conversation multiple the memory size by 2
+ 39:         self._memory = Memory(memory_size * 2)

pandasai/agent/__init__.py Outdated Show resolved Hide resolved
Comment on lines 122 to 136
try:
prompt = ExplainPrompt()
prompt.set_var("code", self._lake.last_code_executed)
response = self._lake.llm.call(prompt)
self.logger.log(
f"""Explaination: {response}
"""
)
return response
except Exception as exception:
return (
"Unfortunately, I was not able to explain, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous points, consider raising a custom exception instead of returning an error message string.

- 131:         except Exception as exception:
- 132:             return (
- 133:                 "Unfortunately, I was not able to explain, "
- 134:                 "because of the following error:\n"
- 135:                 f"\n{exception}\n"
- 136:             )
+ 131:         except Exception as exception:
+ 132:             raise ExplanationException("Unable to generate explanation") from exception

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Commits Files that changed from the base of the PR and between f715035 and 2da890a.
Files selected for processing (2)
  • examples/agent.py (1 hunks)
  • pandasai/agent/init.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • examples/agent.py
Additional comments (Suppressed): 1
pandasai/agent/__init__.py (1)
  • 33-34: The docstring for the __init__ method seems to be incomplete or incorrect. The argument df is not present in the function signature, and the arguments dfs, config, and logger are not described. Please update the docstring to accurately reflect the function's parameters and their descriptions.

Comment on lines 20 to 22
_memory: Memory
_lake: SmartDatalake = None
logger: Optional[Logger] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class variables _memory, _lake, and logger are declared outside of the __init__ method. This is not a common practice in Python and can lead to confusion. It's better to declare these variables inside the __init__ method.

-    _memory: Memory
-    _lake: SmartDatalake = None
-    logger: Optional[Logger] = None

+    def __init__(
+        self,
+        dfs: Union[DataFrameType, List[DataFrameType]],
+        config: Optional[Union[Config, dict]] = None,
+        logger: Optional[Logger] = None,
+        memory_size: int = 1,
+    ):
+        self._memory = None
+        self._lake = None
+        self.logger = logger

pandasai/agent/__init__.py Outdated Show resolved Hide resolved
Comment on lines 110 to 116
def start_new_conversation(self) -> True:
"""
Clears the previous conversation
"""

self._memory.clear()
return True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The start_new_conversation method always returns True. If the purpose of this return value is to indicate success, consider removing it. In Python, it's common to assume that a method succeeded if it didn't raise an exception.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2023

Codecov Report

❗ No coverage uploaded for pull request base (feature/v1.3@6a3879e). Click here to learn what that means.
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@               Coverage Diff               @@
##             feature/v1.3     #584   +/-   ##
===============================================
  Coverage                ?   84.07%           
===============================================
  Files                   ?       57           
  Lines                   ?     2788           
  Branches                ?        0           
===============================================
  Hits                    ?     2344           
  Misses                  ?      444           
  Partials                ?        0           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -0,0 +1,41 @@
import pandas as pd
from pandasai.agent import Agent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change it to:
from pandasaiimport Agent


_memory: Memory
_lake: SmartDatalake = None
logger: Optional[Logger] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make logger private: _logger

Agent class to improve the conversational experience in PandasAI
"""

_memory: Memory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should rely on the memory from the SmartDatalake instead, since we have access to it.

# For the conversation multiple the memory size by 2
self._memory = Memory(memory_size * 2)

def _get_conversation(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have such a method in the memory.py file. What's the advantage or replicating it instead of using the one from the Memory?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important trying to reuse the code as much as possible to grant maintainability over time!

self._memory.add(query, True)
conversation = self._get_conversation()
result = self._lake.chat(
query, output_type=output_type, start_conversation=conversation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the memory from the SmartDataframe, we don't need to pass the start_conversation, I suggest we remove this.

self._messages = []
self._max_messages = max_messages
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this (see below)


def add(self, message: str, is_user: bool):
self._messages.append({"message": message, "is_user": is_user})

# Delete two entry because of the conversation
if len(self._messages) > self._max_messages:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this, the get_conversation method of memory will automatically limit based on the limit variable that is passed.

pandasai/prompts/explain_prompt.py Show resolved Hide resolved
</Code

Can you explain briefly for non technical person on how you came up with code
without explaining pandas library?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try with "Explain how you came up with code for non-technical people without mentioning technical details or mentioning the libraries used".

self,
query: str,
output_type: Optional[str] = None,
start_conversation: Optional[str] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this

Copy link
Collaborator Author

@ArslanSaleem ArslanSaleem Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gventuri Currently the get_conversation method is called with default which has limit=1 means returning the last message only. We need to then pass the memory size to SmartLake constructor to do so or let's use memory for that.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 10

Commits Files that changed from the base of the PR and between 2da890a and 9025f4e.
Files selected for processing (6)
  • examples/agent.py (1 hunks)
  • pandasai/agent/init.py (1 hunks)
  • pandasai/prompts/clarification_questions_prompt.py (1 hunks)
  • pandasai/prompts/explain_prompt.py (1 hunks)
  • pandasai/smart_datalake/init.py (4 hunks)
  • tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • pandasai/prompts/explain_prompt.py
  • tests/test_agent.py
Additional comments (Suppressed): 3
examples/agent.py (1)
  • 1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the Agent class from the pandasai.agent module, which is part of the PandasAI library. The code creates two pandas DataFrames, initializes an instance of the OpenAI class, and creates an Agent object. Then it shows how to chat with the agent, get clarification questions, and explain how the chat response is generated.

However, there are a few points that need to be verified or refined:

  1. In line 21, the OpenAI key is hardcoded as "OPEN_API_KEY". This could potentially be a placeholder, but if it's not, it's a security issue. API keys should never be hardcoded in the source code. They should be stored securely and accessed through environment variables or secure key management systems.
  2. In line 22, the Agent class is initialized with a memory size of 10. Ensure that this value is appropriate for your use case. If the number of messages exceeds this limit, the oldest messages will be removed according to the PR summary.
  3. The code doesn't handle any exceptions. While it's mentioned in the instructions that arguments to internal function calls are pre-sanitized and do not require additional error handling, it would still be good practice to add some basic error handling around the main operations (like chatting with the agent, getting clarification questions, etc.) to catch and log any unexpected errors.
pandasai/smart_datalake/__init__.py (2)
  • 258-263: The chat method has been updated to include a new optional parameter start_conversation. Ensure that this change doesn't break any existing calls to the chat method in your codebase. If start_conversation is used, it should be a string representing the start of a conversation.

  • 633-634: The setter for last_code_generated has been changed to set self._last_code_generated instead of self._code_manager._last_code_generated. Ensure that this change is intentional and that it doesn't affect other parts of your code that rely on last_code_generated.

pandasai/prompts/clarification_questions_prompt.py Outdated Show resolved Hide resolved
pandasai/prompts/clarification_questions_prompt.py Outdated Show resolved Hide resolved
pandasai/smart_datalake/__init__.py Outdated Show resolved Hide resolved
pandasai/smart_datalake/__init__.py Show resolved Hide resolved
pandasai/agent/__init__.py Outdated Show resolved Hide resolved
pandasai/agent/__init__.py Outdated Show resolved Hide resolved
Comment on lines 77 to 78
except Exception as exception:
raise exception
Copy link
Contributor

@coderabbitai coderabbitai bot Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the clarification_questions method, exceptions are caught and immediately re-raised. This is redundant and can be removed, as uncaught exceptions will automatically propagate up the call stack.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

pandasai/agent/__init__.py Show resolved Hide resolved
pandasai/agent/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 0

Commits Files that changed from the base of the PR and between 9025f4e and d1b8e61.
Files selected for processing (1)
  • examples/agent.py (1 hunks)
Additional comments (Suppressed): 3
examples/agent.py (3)
  • 1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the Agent class from the pandasai library, including how to initialize an instance of the class, chat with the agent, get clarification questions, and explain how a response is generated.

However, there are a few points that need verification or refinement:

  • 1-2: Ensure that the Agent class has been correctly implemented in the pandasai module.

  • 6-15: The data used for demonstration purposes is hardcoded. While this is acceptable for an example script, in a real-world scenario, you would likely want to load this data from a file or database.
    21: The OpenAI key is hardcoded into the script. This could pose a security risk if the script is shared or made public. Consider loading sensitive information like API keys from environment variables or secure files.

22: The Agent class is initialized with two DataFrames, a configuration dictionary containing an instance of the OpenAI class, and a memory size. Ensure that the Agent class's constructor is designed to handle these parameters correctly.

25, 30, 36: These lines call methods (chat, clarification_questions, explain) on the Agent instance. Verify that these methods have been correctly implemented in the Agent class.

Overall, the code appears to be well-structured and easy to understand. However, it's important to verify the implementation of the Agent class and its methods to ensure they work as expected.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 27 minutes and 14 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 21 minutes and 1 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

Copy link
Collaborator

@gventuri gventuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great changes, thanks a lot! We are almost there!

Comment on lines 77 to 78
except Exception as exception:
raise exception
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!


<dataframe>
{dataframes}
</dataframe>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{dataframes} only should be enough, the set_var method in prompts already takes care of wrapping each one individually.

"""

def __init__(self, dataframes, conversation):
self.set_var("dataframes", dataframes)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def set_var(self, var, value):
        if var == "dfs":
            self._args["dataframes"] = self._generate_dataframes(value)
        self._args[var] = value

This method from Prompt automatically configures the dfs to be used in the prompt, but it requires a dfs key.
Let's change this to self.set_var("dfs", dataframes)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 7

Commits Files that changed from the base of the PR and between d1b8e61 and 2e4c902.
Files selected for processing (5)
  • pandasai/agent/init.py (1 hunks)
  • pandasai/helpers/memory.py (2 hunks)
  • pandasai/prompts/clarification_questions_prompt.py (1 hunks)
  • pandasai/prompts/explain_prompt.py (1 hunks)
  • pandasai/smart_datalake/init.py (3 hunks)
Additional comments (Suppressed): 6
pandasai/helpers/memory.py (1)
  • 26-31: The get_conversation method now uses the memory_size as the default limit if no limit is provided. This change aligns with the introduction of the memory_size attribute and provides more flexibility in retrieving conversation history.
pandasai/smart_datalake/__init__.py (3)
  • 305-311: The new hunk seems to be missing the assignment of the generate_python_code_instruction variable. This could lead to a NameError later in the code if this variable is referenced. Please verify if this is intentional or an oversight.
-                 generate_python_code_instruction = self._get_prompt(
+                 self.generate_python_code_instruction = self._get_prompt(
                      "generate_python_code",
                      default_prompt=GeneratePythonCodePrompt,
  • 624-627: The setter for last_code_generated has been changed from setting the value on _code_manager._last_code_generated to _last_code_generated. Ensure that this change does not break any dependencies on the old behavior, especially if other parts of the code are expecting the value to be set on _code_manager.

  • 645-655: Two new properties, dfs and memory, have been added. These seem to be simple getters for the _dfs and _memory attributes. If these attributes were previously accessed directly, ensure that all such instances have been updated to use the new property methods.

pandasai/agent/__init__.py (2)
  • 28-31: The docstring for the __init__ method seems to be outdated or incorrect. The arguments mentioned in the docstring do not match the actual arguments of the method. Please update the docstring to reflect the correct arguments and their descriptions.

  • 66-67: The code is returning only the first three clarification questions. If there are more than three, the rest will be ignored. Make sure this is the intended behavior. If all questions should be returned, you can simply return questions without slicing it.

Comment on lines 24 to 47
"""Prompt to get clarification questions"""

text: str = """
You are provided with the following pandas DataFrames:

{dataframes}

<conversation>
{conversation}
</conversation>

Based on the conversation, are there any clarification questions
that a senior data scientist would ask? These are questions for non technical people,
only ask for questions they could ask given low tech expertise and
no knowledge about how the dataframes are structured.

Return the JSON array of the clarification questions.

If there is no clarification question, return an empty array.

Json:
"""

def __init__(self, dataframes, conversation):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ClarificationQuestionPrompt class is well defined and follows the SOLID principles. It extends from the base Prompt class and sets up its specific prompt text and variables in the constructor. However, it would be beneficial to add type hints for the parameters dataframes and conversation in the __init__ method to improve readability and maintainability.

-    def __init__(self, dataframes, conversation):
+    def __init__(self, dataframes: pd.DataFrame, conversation: str):

Comment on lines 1 to 17
""" Prompt to explain solution generated
The previous conversation we had

<Conversation>
{conversation}
</Conversation>

Based on the last conversation you generated the following code:

<Code>
{code}
</Code

Explain how you came up with code for non-technical people without
mentioning technical details or mentioning the libraries used?

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring at the top of the file seems to be a copy of the prompt text. It's not providing any useful information about the module or its contents. Consider replacing it with a more informative docstring that describes the purpose and functionality of the module.



class ExplainPrompt(Prompt):
"""Prompt to get clarification questions"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class docstring is misleading. This class is for generating explanation prompts, not clarification questions. Please correct it to avoid confusion.

-     """Prompt to get clarification questions"""
+     """Prompt to generate explanation for the code"""

pandasai/prompts/explain_prompt.py Outdated Show resolved Hide resolved
Comment on lines +10 to +12
def __init__(self, memory_size: int = 1):
self._messages = []
self._memory_size = memory_size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The __init__ method now accepts an optional parameter memory_size which defaults to 1. This is used to limit the number of stored messages in memory. However, there's no logic implemented yet to enforce this limit when adding new messages. Consider adding a check in the add method to remove the oldest message(s) when the limit is exceeded.

     def add(self, message: str, is_user: bool):
         self._messages.append({"message": message, "is_user": is_user})
+         while len(self._messages) > self._memory_size:
+             self._messages.pop(0)

Comment on lines +90 to +95
except Exception as exception:
return (
"Unfortunately, I was not able to explain, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling in these methods only returns a string with the error message. It might be better to log the exception and re-raise it after logging. This way, the caller of the method can decide how to handle the exception. Also, this will help in debugging as the stack trace will be available in the logs.

-            return (
-                "Unfortunately, I was not able to get your answers, "
-                "because of the following error:\n"
-                f"\n{exception}\n"
-            )
+            self._logger.log(
+                "Unfortunately, I was not able to get your answers, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )
+            raise

...

-            return (
-                "Unfortunately, I was not able to explain, "
-                "because of the following error:\n"
-                f"\n{exception}\n"
-            )
+            self._logger.log(
+                "Unfortunately, I was not able to explain, "
-                "because of the following error:\n"
-                f"\n{exception}\n"
+            )
+            raise

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 5

Commits Files that changed from the base of the PR and between 2e4c902 and 7a554a5.
Files selected for processing (2)
  • docs/examples.md (1 hunks)
  • pandasai/agent/init.py (1 hunks)
Additional comments (Suppressed): 1
docs/examples.md (1)
  • 210-222: The new section "Working with Agent" provides a clear and concise overview of the key features of the Agent class. It explains how context retention, clarification questions, and explanations can enhance user interactions with the chat agent. This is a good addition to the documentation as it helps users understand how to use the Agent class effectively.

docs/examples.md Show resolved Hide resolved
Comment on lines +35 to +37
if not isinstance(dfs, list):
dfs = [dfs]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is assuming that if dfs is not a list, it should be converted into a list. This might lead to unexpected behavior if dfs is of an unsupported type. It would be better to explicitly check for supported types and raise an error if an unsupported type is provided.

-        if not isinstance(dfs, list):
-            dfs = [dfs]
+        if isinstance(dfs, DataFrameType):
+            dfs = [dfs]
+        elif not isinstance(dfs, list):
+            raise TypeError("dfs must be a DataFrameType or a list of DataFrameType")

Comment on lines +46 to +53
result = self._lake.chat(query, output_type=output_type)
return result
except Exception as exception:
return (
"Unfortunately, I was not able to get your answers, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling here is too broad, which can make debugging difficult because it hides the details of what went wrong. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.

Comment on lines +59 to +69
prompt = ClarificationQuestionPrompt(
self._lake.dfs, self._lake._memory.get_conversation()
)

result = self._lake.llm.call(prompt)
self._logger.log(
f"""Clarification Questions: {result}
"""
)
questions: list[str] = json.loads(result)
return questions[:3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result from self._lake.llm.call(prompt) is directly loaded as JSON and returned. There's no error handling in case the result is not valid JSON, which could cause the function to crash. Consider adding error handling around json.loads(result).

Comment on lines +81 to +97
try:
prompt = ExplainPrompt(
self._lake._memory.get_conversation(),
self._lake.last_code_executed,
)
response = self._lake.llm.call(prompt)
self._logger.log(
f"""Explaination: {response}
"""
)
return response
except Exception as exception:
return (
"Unfortunately, I was not able to explain, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous comment, the exception handling here is too broad. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.


<dataframe>
{dataframe}
</dataframe>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also change this to

{dataframes}

for consistency!

pandasai/smart_datalake/__init__.py Show resolved Hide resolved
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Commits Files that changed from the base of the PR and between 7a554a5 and f7e4d98.
Files selected for processing (2)
  • pandasai/prompts/clarification_questions_prompt.py (1 hunks)
  • pandasai/prompts/explain_prompt.py (1 hunks)
Additional comments (Suppressed): 2
pandasai/prompts/clarification_questions_prompt.py (1)
  • 48-49: The set_var method is used here but it's not defined in this class or imported from elsewhere. Ensure that this method is properly defined and accessible within this class.
pandasai/prompts/explain_prompt.py (1)
  • 42-44: The set_var method is not defined in this class or its parent class Prompt. Ensure that this method exists and works as expected. If it's intended to set instance variables, you can directly assign the values like self.conversation = conversation and self.code = code.

Comment on lines +24 to +40
text: str = """
The previous conversation we had

<Conversation>
{conversation}
</Conversation>

Based on the last conversation you generated the following code:

<Code>
{code}
</Code

Explain how you came up with code for non-technical people without
mentioning technical details or mentioning the libraries used?

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text attribute in the ExplainPrompt class is a class variable, which means it's shared across all instances of this class. If you modify it in one instance, it will affect all other instances. This could lead to unexpected behavior if multiple instances of ExplainPrompt are used concurrently. Consider moving this attribute to the instance level (inside __init__) to avoid potential issues.

@gventuri
Copy link
Collaborator

@ArslanSaleem thanks a lot for the great job. Merging!

@gventuri gventuri merged commit dd6934d into feature/v1.3 Sep 24, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants