-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[Agent]: add agent conversation code #584
Conversation
WalkthroughThis update introduces the Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 10
Files selected for processing (6)
- examples/agent.py (1 hunks)
- pandasai/init.py (2 hunks)
- pandasai/agent/init.py (1 hunks)
- pandasai/helpers/memory.py (1 hunks)
- pandasai/prompts/clarification_questions_prompt.py (1 hunks)
- pandasai/smart_datalake/init.py (3 hunks)
Files skipped from review due to trivial changes (1)
- pandasai/init.py
Additional comments (Suppressed): 5
pandasai/helpers/memory.py (1)
- 11-13: The
max_messages
parameter in the__init__
method has a default value ofsys.maxsize
. This could potentially consume a lot of memory if not handled properly. Consider setting a reasonable default limit or ensure that this is well-documented so users are aware of the implications.pandasai/agent/__init__.py (2)
30-31: The argument descriptions in the docstring are placeholders (
_description_
). Please provide meaningful descriptions fordfs
andmemory_size
.26-27: The default value for
memory_size
is set to 1. Depending on the expected size of the conversations, this might be too small and could lead to loss of conversation history. Consider setting a larger default or make sure that the callers are aware of this limit.pandasai/smart_datalake/__init__.py (2)
258-263: The
chat
method now includes an optional parameterstart_conversation
. Ensure that this change doesn't break any existing calls to thechat
method. Also, verify if thestart_conversation
parameter is being used correctly within the method.657-658: A new property
dfs
has been added. Make sure that this property is being set correctly in the class and is being used appropriately where needed.
pandasai/helpers/memory.py
Outdated
_messages: list | ||
_max_messages: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _max_messages
attribute is not defined in the class scope, which might lead to confusion and potential errors. It's better to define it within the __init__
method.
examples/agent.py
Outdated
import pandas as pd | ||
from pandasai.agent import Agent | ||
|
||
from pandasai.llm.openai import OpenAI | ||
|
||
employees_data = { | ||
"EmployeeID": [1, 2, 3, 4, 5], | ||
"Name": ["John", "Emma", "Liam", "Olivia", "William"], | ||
"Department": ["HR", "Sales", "IT", "Marketing", "Finance"], | ||
} | ||
|
||
salaries_data = { | ||
"EmployeeID": [1, 2, 3, 4, 5], | ||
"Salary": [5000, 6000, 4500, 7000, 5500], | ||
} | ||
|
||
employees_df = pd.DataFrame(employees_data) | ||
salaries_df = pd.DataFrame(salaries_data) | ||
|
||
|
||
llm = OpenAI("OPEN_API") | ||
dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10) | ||
response = dl.chat("Who gets paid the most?") | ||
print(response) | ||
response = dl.clarification_questions() | ||
|
||
response = dl.chat("Which department does he belongs to?") | ||
print(response) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good overall. It imports necessary modules, creates dataframes from dictionaries, initializes the OpenAI
and Agent
classes, and uses the chat
and clarification_questions
methods of the Agent
class.
However, there are a few points that need to be clarified or refined:
- The
OPEN_API
string passed to theOpenAI
class on line 21 seems like a placeholder. Please ensure it is replaced with an actual API key or configuration. - The
memory_size
argument in theAgent
initialization on line 22 is set to 10. This means the agent will only remember the last 10 messages. Depending on the use case, this might be too small and could lead to loss of context in longer conversations. - The
clarification_questions
method is called on line 25, but its return value is not used or printed. If the intention is to display the clarification questions, consider printing the response. - On line 27, the chat question "Which department does he belongs to?" assumes a male employee. If the highest paid employee is female, this question may not yield the expected results. Consider using a gender-neutral term or referring to the employee by name if available.
Here's how you might address these points:
employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)
- 21: llm = OpenAI("OPEN_API")
+ 21: llm = OpenAI("<Your OpenAI API Key>")
dl = Agent([employees_df, salaries_df], config={"llm": llm}, memory_size=10)
response = dl.chat("Who gets paid the most?")
print(response)
- 25: response = dl.clarification_questions()
+ 25: response = dl.clarification_questions()
+ 26: print(response)
- 28: response = dl.chat("Which department does he belongs to?")
+ 28: response = dl.chat("Which department does the highest paid employee belong to?")
print(response)
pandasai/agent/__init__.py
Outdated
try: | ||
prompt = self._get_clarification_prompt() | ||
result = self._lake.llm.generate_code(prompt) | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to get your clarification questions, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the clarification_questions
method, exceptions are caught and a string message is returned. This might be confusing for the caller as they would expect a list of questions in normal operation. Consider raising the exception again after logging or handling it, instead of returning a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 6
Files selected for processing (3)
- examples/agent.py (1 hunks)
- pandasai/agent/init.py (1 hunks)
- tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (1)
- tests/test_agent.py
Additional comments (Suppressed): 2
examples/agent.py (2)
1-28: The
Agent
class is instantiated with a list of dataframes, a configuration dictionary containing an instance ofOpenAI
, and a memory size. Thechat
method is used to interact with the agent and print responses, while theclarification_questions
method generates potential follow-up queries. Ensure that theOPEN_API
key passed to theOpenAI
constructor is a valid OpenAI API key.21-22: Ensure that the
llm
object (an instance ofOpenAI
) is correctly initialized with a valid API key before being passed into theAgent
constructor. Also, verify that thememory_size
parameter in theAgent
constructor is set appropriately considering the application's memory constraints and usage requirements.
pandasai/agent/__init__.py
Outdated
|
||
_memory: Memory | ||
_lake: SmartDatalake = None | ||
logger: Logger = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger
attribute is initialized as None
but it's not checked for None
before being used in line 38. This could potentially lead to a NoneType
error if the logger is not passed during initialization and later accessed.
- logger: Logger = None
+ logger: Optional[Logger] = None
pandasai/agent/__init__.py
Outdated
self, | ||
dfs: Union[DataFrameType, List[DataFrameType]], | ||
config: Optional[Union[Config, dict]] = None, | ||
logger: Logger = None, | ||
memory_size=1, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory_size
parameter in the constructor lacks type hinting, which can lead to confusion about what type of argument should be passed. It's also missing from the docstring. Add an integer type hint and update the docstring accordingly.
- memory_size=1,
+ memory_size: int = 1,
pandasai/agent/__init__.py
Outdated
def chat(self, query: str): | ||
""" | ||
Simulate a chat interaction with the assistant on Dataframe. | ||
""" | ||
self._memory.add(query, True) | ||
conversation = self._get_conversation() | ||
result = self._lake.chat(query, start_conversation=conversation) | ||
self._memory.add(result, False) | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The chat
method does not handle exceptions. If an exception occurs during the execution of the self._lake.chat()
method, it will propagate up the call stack and could potentially crash the application. Consider adding a try/except block to handle exceptions gracefully.
pandasai/agent/__init__.py
Outdated
def clarification_questions(self): | ||
""" | ||
Generate and return up to three clarification questions based on a given prompt. | ||
""" | ||
try: | ||
prompt = self._get_clarification_prompt() | ||
result = self._lake.llm.generate_code(prompt) | ||
questions = json.loads(result) | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to get your clarification questions, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's good that you're handling exceptions in the clarification_questions
method, returning a string with the error message might not be the best approach. It would be more appropriate to log the error and raise the exception again after logging, so that the caller can decide how to handle it.
pandasai/agent/__init__.py
Outdated
def start_new_conversation(self): | ||
""" | ||
Clears the previous conversation | ||
""" | ||
self._memory.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start_new_conversation
method clears the memory but does not return any confirmation or status. It would be helpful to return a status message or boolean value indicating whether the operation was successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 13
Files selected for processing (7)
- examples/agent.py (1 hunks)
- pandasai/agent/init.py (1 hunks)
- pandasai/agent/response.py (1 hunks)
- pandasai/helpers/memory.py (1 hunks)
- pandasai/prompts/explain_prompt.py (1 hunks)
- pandasai/smart_datalake/init.py (4 hunks)
- tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (2)
- examples/agent.py
- pandasai/prompts/explain_prompt.py
Additional comments (Suppressed): 6
pandasai/smart_datalake/__init__.py (4)
258-263: The
chat
method now accepts an additional optional parameterstart_conversation
. Ensure that all calls to this function throughout the codebase have been updated to handle this new parameter. Also, verify if the default behavior whenstart_conversation
is not provided aligns with the expected functionality.313-314: The
start_conversation
value is being added todefault_values
dictionary. Make sure that the downstream functions or methods that usedefault_values
are capable of handling this new key-value pair without causing any unexpected behavior or errors.633-634: The setter for
last_code_generated
has been changed from settingself._code_manager._last_code_generated
toself._last_code_generated
. This could potentially break the code if there are other parts of the codebase that rely onself._code_manager._last_code_generated
. Please ensure that this change is propagated throughout the codebase and doesn't cause any unintended side effects.656-658: A new property
dfs
has been added to returnself._dfs
. This seems like a good practice as it encapsulates the internal state of the object and provides a controlled way to access it. However, please ensure that the returned object is not mutable or if it is, that mutating it won't cause any issues in the rest of your application.pandasai/helpers/memory.py (2)
1-24: The new
Memory
class now includes a_max_messages
attribute to limit the number of stored messages. Theadd
method has been updated to delete the oldest two messages when this limit is exceeded. This change could potentially impact any code that relies on theMemory
class, especially if it expects an unlimited number of messages to be stored. Please verify that all parts of the codebase that use theMemory
class are compatible with this change.11-13: The constructor of the
Memory
class now accepts an optional parametermax_messages
, which defaults tosys.maxsize
. This allows users to specify a maximum number of messages to store in memory. If not specified, the default behavior is to store as many messages as possible, limited only by system constraints. This is a good practice as it provides flexibility for different use cases while maintaining backward compatibility.
pandasai/agent/response.py
Outdated
def __init__( | ||
self, success: bool = True, questions: List[str] = None, message: str = "" | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for questions
is None
, which could lead to issues if you try to perform list operations on it. It's better to use an empty list as the default value.
- def __init__(self, success: bool = True, questions: List[str] = None, message: str = "")
+ def __init__(self, success: bool = True, questions: List[str] = [], message: str = "")
pandasai/agent/response.py
Outdated
@property | ||
def message(self) -> List[str]: | ||
return self._message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type hint for the message
property is incorrect. It should be str
, not List[str]
.
- def message(self) -> List[str]:
+ def message(self) -> str:
tests/test_agent.py
Outdated
def test_chat_memory_rollup(self, sample_df, config): | ||
agent = Agent(sample_df, config, memory_size=1) | ||
agent._lake.chat = Mock() | ||
agent._lake.chat.return_value = "United States has the highest gdp" | ||
|
||
# Test the chat function | ||
agent.chat("Which country has the highest gdp?") | ||
|
||
memory = agent._memory.all() | ||
assert len(memory) == 2 | ||
assert memory[0]["message"] == "Which country has the highest gdp?" | ||
assert memory[1]["message"] == "United States has the highest gdp" | ||
|
||
# Add another conversation | ||
agent._lake.chat.return_value = "United Kingdom has the second highest gdp" | ||
agent.chat("Which country has the second highest gdp?") | ||
|
||
memory = agent._memory.all() | ||
assert len(memory) == 2 | ||
assert memory[0]["message"] == "Which country has the second highest gdp?" | ||
assert memory[1]["message"] == "United Kingdom has the second highest gdp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test_chat_memory_rollup
method tests the memory rollup feature when the memory size limit is reached. However, it only tests the case where the memory size limit is 1. Consider adding more test cases with different memory size limits to ensure the rollup feature works as expected in all scenarios.
pandasai/agent/__init__.py
Outdated
_memory: Memory | ||
_lake: SmartDatalake = None | ||
logger: Logger = None | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class variables _memory
, _lake
, and logger
are declared but not initialized in the class scope. This could lead to confusion as it might imply that these are static class variables, which they are not. It would be better to declare these variables inside the __init__
method.
class Agent:
"""
Agent class to improve the conversational experience in PandasAI
"""
- 20: _memory: Memory
- 21: _lake: SmartDatalake = None
- 22: logger: Logger = None
+ 20: def __init__(
+ 21: self,
+ 22: dfs: Union[DataFrameType, List[DataFrameType]],
+ 23: config: Optional[Union[Config, dict]] = None,
+ 24: logger: Logger = None,
+ 25: memory_size: int = 1,
+ 26: ):
+ 27: """
+ 28: Args:
+ 29: df (Union[SmartDataframe, SmartDatalake]): _description_
+ 30: memory_size (int, optional): _description_. Defaults to 1.
+ 31: """
+ 32:
+ 33: if not isinstance(dfs, list):
+ 34: dfs = [dfs]
+ 35:
+ 36: self._lake = SmartDatalake(dfs, config, logger)
+ 37: self.logger = self._lake.logger
+ 38: # For the conversation multiple the memory size by 2
+ 39: self._memory = Memory(memory_size * 2)
pandasai/agent/__init__.py
Outdated
try: | ||
prompt = ExplainPrompt() | ||
prompt.set_var("code", self._lake.last_code_executed) | ||
response = self._lake.llm.call(prompt) | ||
self.logger.log( | ||
f"""Explaination: {response} | ||
""" | ||
) | ||
return response | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to explain, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous points, consider raising a custom exception instead of returning an error message string.
- 131: except Exception as exception:
- 132: return (
- 133: "Unfortunately, I was not able to explain, "
- 134: "because of the following error:\n"
- 135: f"\n{exception}\n"
- 136: )
+ 131: except Exception as exception:
+ 132: raise ExplanationException("Unable to generate explanation") from exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Files selected for processing (2)
- examples/agent.py (1 hunks)
- pandasai/agent/init.py (1 hunks)
Files skipped from review due to trivial changes (1)
- examples/agent.py
Additional comments (Suppressed): 1
pandasai/agent/__init__.py (1)
- 33-34: The docstring for the
__init__
method seems to be incomplete or incorrect. The argumentdf
is not present in the function signature, and the argumentsdfs
,config
, andlogger
are not described. Please update the docstring to accurately reflect the function's parameters and their descriptions.
pandasai/agent/__init__.py
Outdated
_memory: Memory | ||
_lake: SmartDatalake = None | ||
logger: Optional[Logger] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class variables _memory
, _lake
, and logger
are declared outside of the __init__
method. This is not a common practice in Python and can lead to confusion. It's better to declare these variables inside the __init__
method.
- _memory: Memory
- _lake: SmartDatalake = None
- logger: Optional[Logger] = None
+ def __init__(
+ self,
+ dfs: Union[DataFrameType, List[DataFrameType]],
+ config: Optional[Union[Config, dict]] = None,
+ logger: Optional[Logger] = None,
+ memory_size: int = 1,
+ ):
+ self._memory = None
+ self._lake = None
+ self.logger = logger
pandasai/agent/__init__.py
Outdated
def start_new_conversation(self) -> True: | ||
""" | ||
Clears the previous conversation | ||
""" | ||
|
||
self._memory.clear() | ||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start_new_conversation
method always returns True
. If the purpose of this return value is to indicate success, consider removing it. In Python, it's common to assume that a method succeeded if it didn't raise an exception.
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## feature/v1.3 #584 +/- ##
===============================================
Coverage ? 84.07%
===============================================
Files ? 57
Lines ? 2788
Branches ? 0
===============================================
Hits ? 2344
Misses ? 444
Partials ? 0 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
examples/agent.py
Outdated
@@ -0,0 +1,41 @@ | |||
import pandas as pd | |||
from pandasai.agent import Agent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can change it to:
from pandasaiimport Agent
pandasai/agent/__init__.py
Outdated
|
||
_memory: Memory | ||
_lake: SmartDatalake = None | ||
logger: Optional[Logger] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make logger private: _logger
pandasai/agent/__init__.py
Outdated
Agent class to improve the conversational experience in PandasAI | ||
""" | ||
|
||
_memory: Memory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should rely on the memory from the SmartDatalake
instead, since we have access to it.
pandasai/agent/__init__.py
Outdated
# For the conversation multiple the memory size by 2 | ||
self._memory = Memory(memory_size * 2) | ||
|
||
def _get_conversation(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have such a method in the memory.py
file. What's the advantage or replicating it instead of using the one from the Memory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important trying to reuse the code as much as possible to grant maintainability over time!
pandasai/agent/__init__.py
Outdated
self._memory.add(query, True) | ||
conversation = self._get_conversation() | ||
result = self._lake.chat( | ||
query, output_type=output_type, start_conversation=conversation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the memory from the SmartDataframe
, we don't need to pass the start_conversation, I suggest we remove this.
pandasai/helpers/memory.py
Outdated
self._messages = [] | ||
self._max_messages = max_messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this (see below)
pandasai/helpers/memory.py
Outdated
|
||
def add(self, message: str, is_user: bool): | ||
self._messages.append({"message": message, "is_user": is_user}) | ||
|
||
# Delete two entry because of the conversation | ||
if len(self._messages) > self._max_messages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this, the get_conversation
method of memory will automatically limit based on the limit
variable that is passed.
pandasai/prompts/explain_prompt.py
Outdated
</Code | ||
|
||
Can you explain briefly for non technical person on how you came up with code | ||
without explaining pandas library? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try with "Explain how you came up with code for non-technical people without mentioning technical details or mentioning the libraries used".
pandasai/smart_datalake/__init__.py
Outdated
self, | ||
query: str, | ||
output_type: Optional[str] = None, | ||
start_conversation: Optional[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gventuri Currently the get_conversation method is called with default which has limit=1 means returning the last message only. We need to then pass the memory size to SmartLake constructor to do so or let's use memory for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 10
Files selected for processing (6)
- examples/agent.py (1 hunks)
- pandasai/agent/init.py (1 hunks)
- pandasai/prompts/clarification_questions_prompt.py (1 hunks)
- pandasai/prompts/explain_prompt.py (1 hunks)
- pandasai/smart_datalake/init.py (4 hunks)
- tests/test_agent.py (1 hunks)
Files skipped from review due to trivial changes (2)
- pandasai/prompts/explain_prompt.py
- tests/test_agent.py
Additional comments (Suppressed): 3
examples/agent.py (1)
- 1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the
Agent
class from thepandasai.agent
module, which is part of the PandasAI library. The code creates two pandas DataFrames, initializes an instance of theOpenAI
class, and creates anAgent
object. Then it shows how to chat with the agent, get clarification questions, and explain how the chat response is generated.However, there are a few points that need to be verified or refined:
- In line 21, the OpenAI key is hardcoded as "OPEN_API_KEY". This could potentially be a placeholder, but if it's not, it's a security issue. API keys should never be hardcoded in the source code. They should be stored securely and accessed through environment variables or secure key management systems.
- In line 22, the
Agent
class is initialized with a memory size of 10. Ensure that this value is appropriate for your use case. If the number of messages exceeds this limit, the oldest messages will be removed according to the PR summary.- The code doesn't handle any exceptions. While it's mentioned in the instructions that arguments to internal function calls are pre-sanitized and do not require additional error handling, it would still be good practice to add some basic error handling around the main operations (like chatting with the agent, getting clarification questions, etc.) to catch and log any unexpected errors.
pandasai/smart_datalake/__init__.py (2)
258-263: The
chat
method has been updated to include a new optional parameterstart_conversation
. Ensure that this change doesn't break any existing calls to thechat
method in your codebase. Ifstart_conversation
is used, it should be a string representing the start of a conversation.633-634: The setter for
last_code_generated
has been changed to setself._last_code_generated
instead ofself._code_manager._last_code_generated
. Ensure that this change is intentional and that it doesn't affect other parts of your code that rely onlast_code_generated
.
pandasai/agent/__init__.py
Outdated
except Exception as exception: | ||
raise exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the clarification_questions
method, exceptions are caught and immediately re-raised. This is redundant and can be removed, as uncaught exceptions will automatically propagate up the call stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Files selected for processing (1)
- examples/agent.py (1 hunks)
Additional comments (Suppressed): 3
examples/agent.py (3)
- 1-37: The new hunk of code seems to be well-written and logically sound. It demonstrates the usage of the
Agent
class from thepandasai
library, including how to initialize an instance of the class, chat with the agent, get clarification questions, and explain how a response is generated.However, there are a few points that need verification or refinement:
1-2: Ensure that the
Agent
class has been correctly implemented in thepandasai
module.6-15: The data used for demonstration purposes is hardcoded. While this is acceptable for an example script, in a real-world scenario, you would likely want to load this data from a file or database.
21: The OpenAI key is hardcoded into the script. This could pose a security risk if the script is shared or made public. Consider loading sensitive information like API keys from environment variables or secure files.22: The
Agent
class is initialized with two DataFrames, a configuration dictionary containing an instance of theOpenAI
class, and a memory size. Ensure that theAgent
class's constructor is designed to handle these parameters correctly.25, 30, 36: These lines call methods (
chat
,clarification_questions
,explain
) on theAgent
instance. Verify that these methods have been correctly implemented in theAgent
class.Overall, the code appears to be well-structured and easy to understand. However, it's important to verify the implementation of the
Agent
class and its methods to ensure they work as expected.
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 29 minutes and 59 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 27 minutes and 14 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 21 minutes and 1 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great changes, thanks a lot! We are almost there!
pandasai/agent/__init__.py
Outdated
except Exception as exception: | ||
raise exception |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArslanSaleem agree, I suggest we remove try/catch wrapper. In the future we might add a retry mechanism!
|
||
<dataframe> | ||
{dataframes} | ||
</dataframe> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{dataframes}
only should be enough, the set_var
method in prompts already takes care of wrapping each one individually.
""" | ||
|
||
def __init__(self, dataframes, conversation): | ||
self.set_var("dataframes", dataframes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def set_var(self, var, value):
if var == "dfs":
self._args["dataframes"] = self._generate_dataframes(value)
self._args[var] = value
This method from Prompt
automatically configures the dfs to be used in the prompt, but it requires a dfs
key.
Let's change this to self.set_var("dfs", dataframes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 7
Files selected for processing (5)
- pandasai/agent/init.py (1 hunks)
- pandasai/helpers/memory.py (2 hunks)
- pandasai/prompts/clarification_questions_prompt.py (1 hunks)
- pandasai/prompts/explain_prompt.py (1 hunks)
- pandasai/smart_datalake/init.py (3 hunks)
Additional comments (Suppressed): 6
pandasai/helpers/memory.py (1)
- 26-31: The
get_conversation
method now uses thememory_size
as the default limit if no limit is provided. This change aligns with the introduction of thememory_size
attribute and provides more flexibility in retrieving conversation history.pandasai/smart_datalake/__init__.py (3)
- 305-311: The new hunk seems to be missing the assignment of the
generate_python_code_instruction
variable. This could lead to a NameError later in the code if this variable is referenced. Please verify if this is intentional or an oversight.- generate_python_code_instruction = self._get_prompt( + self.generate_python_code_instruction = self._get_prompt( "generate_python_code", default_prompt=GeneratePythonCodePrompt,
624-627: The setter for
last_code_generated
has been changed from setting the value on_code_manager._last_code_generated
to_last_code_generated
. Ensure that this change does not break any dependencies on the old behavior, especially if other parts of the code are expecting the value to be set on_code_manager
.645-655: Two new properties,
dfs
andmemory
, have been added. These seem to be simple getters for the_dfs
and_memory
attributes. If these attributes were previously accessed directly, ensure that all such instances have been updated to use the new property methods.pandasai/agent/__init__.py (2)
28-31: The docstring for the
__init__
method seems to be outdated or incorrect. The arguments mentioned in the docstring do not match the actual arguments of the method. Please update the docstring to reflect the correct arguments and their descriptions.66-67: The code is returning only the first three clarification questions. If there are more than three, the rest will be ignored. Make sure this is the intended behavior. If all questions should be returned, you can simply return
questions
without slicing it.
"""Prompt to get clarification questions""" | ||
|
||
text: str = """ | ||
You are provided with the following pandas DataFrames: | ||
|
||
{dataframes} | ||
|
||
<conversation> | ||
{conversation} | ||
</conversation> | ||
|
||
Based on the conversation, are there any clarification questions | ||
that a senior data scientist would ask? These are questions for non technical people, | ||
only ask for questions they could ask given low tech expertise and | ||
no knowledge about how the dataframes are structured. | ||
|
||
Return the JSON array of the clarification questions. | ||
|
||
If there is no clarification question, return an empty array. | ||
|
||
Json: | ||
""" | ||
|
||
def __init__(self, dataframes, conversation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClarificationQuestionPrompt
class is well defined and follows the SOLID principles. It extends from the base Prompt
class and sets up its specific prompt text and variables in the constructor. However, it would be beneficial to add type hints for the parameters dataframes
and conversation
in the __init__
method to improve readability and maintainability.
- def __init__(self, dataframes, conversation):
+ def __init__(self, dataframes: pd.DataFrame, conversation: str):
pandasai/prompts/explain_prompt.py
Outdated
""" Prompt to explain solution generated | ||
The previous conversation we had | ||
|
||
<Conversation> | ||
{conversation} | ||
</Conversation> | ||
|
||
Based on the last conversation you generated the following code: | ||
|
||
<Code> | ||
{code} | ||
</Code | ||
|
||
Explain how you came up with code for non-technical people without | ||
mentioning technical details or mentioning the libraries used? | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring at the top of the file seems to be a copy of the prompt text. It's not providing any useful information about the module or its contents. Consider replacing it with a more informative docstring that describes the purpose and functionality of the module.
pandasai/prompts/explain_prompt.py
Outdated
|
||
|
||
class ExplainPrompt(Prompt): | ||
"""Prompt to get clarification questions""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class docstring is misleading. This class is for generating explanation prompts, not clarification questions. Please correct it to avoid confusion.
- """Prompt to get clarification questions"""
+ """Prompt to generate explanation for the code"""
def __init__(self, memory_size: int = 1): | ||
self._messages = [] | ||
self._memory_size = memory_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The __init__
method now accepts an optional parameter memory_size
which defaults to 1. This is used to limit the number of stored messages in memory. However, there's no logic implemented yet to enforce this limit when adding new messages. Consider adding a check in the add
method to remove the oldest message(s) when the limit is exceeded.
def add(self, message: str, is_user: bool):
self._messages.append({"message": message, "is_user": is_user})
+ while len(self._messages) > self._memory_size:
+ self._messages.pop(0)
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to explain, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error handling in these methods only returns a string with the error message. It might be better to log the exception and re-raise it after logging. This way, the caller of the method can decide how to handle the exception. Also, this will help in debugging as the stack trace will be available in the logs.
- return (
- "Unfortunately, I was not able to get your answers, "
- "because of the following error:\n"
- f"\n{exception}\n"
- )
+ self._logger.log(
+ "Unfortunately, I was not able to get your answers, "
+ "because of the following error:\n"
+ f"\n{exception}\n"
+ )
+ raise
...
- return (
- "Unfortunately, I was not able to explain, "
- "because of the following error:\n"
- f"\n{exception}\n"
- )
+ self._logger.log(
+ "Unfortunately, I was not able to explain, "
- "because of the following error:\n"
- f"\n{exception}\n"
+ )
+ raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 5
Files selected for processing (2)
- docs/examples.md (1 hunks)
- pandasai/agent/init.py (1 hunks)
Additional comments (Suppressed): 1
docs/examples.md (1)
- 210-222: The new section "Working with Agent" provides a clear and concise overview of the key features of the
Agent
class. It explains how context retention, clarification questions, and explanations can enhance user interactions with the chat agent. This is a good addition to the documentation as it helps users understand how to use theAgent
class effectively.
if not isinstance(dfs, list): | ||
dfs = [dfs] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is assuming that if dfs
is not a list, it should be converted into a list. This might lead to unexpected behavior if dfs
is of an unsupported type. It would be better to explicitly check for supported types and raise an error if an unsupported type is provided.
- if not isinstance(dfs, list):
- dfs = [dfs]
+ if isinstance(dfs, DataFrameType):
+ dfs = [dfs]
+ elif not isinstance(dfs, list):
+ raise TypeError("dfs must be a DataFrameType or a list of DataFrameType")
result = self._lake.chat(query, output_type=output_type) | ||
return result | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to get your answers, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception handling here is too broad, which can make debugging difficult because it hides the details of what went wrong. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.
prompt = ClarificationQuestionPrompt( | ||
self._lake.dfs, self._lake._memory.get_conversation() | ||
) | ||
|
||
result = self._lake.llm.call(prompt) | ||
self._logger.log( | ||
f"""Clarification Questions: {result} | ||
""" | ||
) | ||
questions: list[str] = json.loads(result) | ||
return questions[:3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result from self._lake.llm.call(prompt)
is directly loaded as JSON and returned. There's no error handling in case the result is not valid JSON, which could cause the function to crash. Consider adding error handling around json.loads(result)
.
try: | ||
prompt = ExplainPrompt( | ||
self._lake._memory.get_conversation(), | ||
self._lake.last_code_executed, | ||
) | ||
response = self._lake.llm.call(prompt) | ||
self._logger.log( | ||
f"""Explaination: {response} | ||
""" | ||
) | ||
return response | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to explain, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous comment, the exception handling here is too broad. Consider catching specific exceptions that you expect might occur during the execution of this block. If you want to catch all exceptions, at least log the full traceback to help with debugging.
|
||
<dataframe> | ||
{dataframe} | ||
</dataframe> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also change this to
{dataframes}
for consistency!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Files selected for processing (2)
- pandasai/prompts/clarification_questions_prompt.py (1 hunks)
- pandasai/prompts/explain_prompt.py (1 hunks)
Additional comments (Suppressed): 2
pandasai/prompts/clarification_questions_prompt.py (1)
- 48-49: The
set_var
method is used here but it's not defined in this class or imported from elsewhere. Ensure that this method is properly defined and accessible within this class.pandasai/prompts/explain_prompt.py (1)
- 42-44: The
set_var
method is not defined in this class or its parent classPrompt
. Ensure that this method exists and works as expected. If it's intended to set instance variables, you can directly assign the values likeself.conversation = conversation
andself.code = code
.
text: str = """ | ||
The previous conversation we had | ||
|
||
<Conversation> | ||
{conversation} | ||
</Conversation> | ||
|
||
Based on the last conversation you generated the following code: | ||
|
||
<Code> | ||
{code} | ||
</Code | ||
|
||
Explain how you came up with code for non-technical people without | ||
mentioning technical details or mentioning the libraries used? | ||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text
attribute in the ExplainPrompt
class is a class variable, which means it's shared across all instances of this class. If you modify it in one instance, it will affect all other instances. This could lead to unexpected behavior if multiple instances of ExplainPrompt
are used concurrently. Consider moving this attribute to the instance level (inside __init__
) to avoid potential issues.
@ArslanSaleem thanks a lot for the great job. Merging! |
Summary by CodeRabbit
Agent
class inpandasai.agent
module to enhance conversational experience with PandasAI. It includes methods for chat interactions, generating clarification questions, starting a new conversation, and explaining code solutions.Memory
class inpandasai.helpers.memory
module to store conversation history with a configurable memory size.ClarificationQuestionPrompt
andExplainPrompt
classes inpandasai.prompts
module to generate prompts for clarification questions and code explanations.Agent
class covering various scenarios and functionalities.examples.md
with examples of how to use the newAgent
class for interactive and context-aware conversations.