-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(RephraseQuery): rephrase user query to get more accurate responses #592
Conversation
WalkthroughThis update introduces a new feature to rephrase queries for better model responses. It includes changes in the agent and prompt classes, adding validation methods and error handling. The test suite has been expanded to cover these new functionalities. Documentation is updated to guide users on how to use the new feature. Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 3
Files selected for processing (5)
- pandasai/agent/init.py (5 hunks)
- pandasai/prompts/base.py (1 hunks)
- pandasai/prompts/clarification_questions_prompt.py (2 hunks)
- pandasai/prompts/rephase_query_prompt.py (1 hunks)
- tests/test_agent.py (4 hunks)
Files skipped from review due to trivial changes (1)
- pandasai/prompts/base.py
Additional comments (Suppressed): 6
pandasai/prompts/clarification_questions_prompt.py (1)
- 18-21: The import of
json
module is new in this hunk. Ensure that it's used appropriately and securely throughout the codebase.pandasai/agent/__init__.py (1)
- 123-139: The
rephrase_query
method is a new addition to theAgent
class. It uses the_call_llm_with_prompt
method to call the LLM with aRephraseQueryPrompt
. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.tests/test_agent.py (4)
4-11: The new imports
ClarificationQuestionPrompt
andExplainPrompt
frompandasai.prompts
are introduced. Ensure that these modules exist and are accessible in the project structure.57-84: A new fixture
agent
is introduced which creates an instance ofAgent
usingsample_df
andconfig
. This fixture is used in subsequent tests, replacing the previous method of creating anAgent
instance within each test. This change improves code reusability and readability.150-156: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [150-178]
The
test_explain
function has been updated to use the newagent
fixture instead of creating a newAgent
instance within the test. This change improves code reusability and readability.
- 180-275: New test cases have been added to test the retry mechanism implemented in
_call_llm_with_prompt()
. These tests cover various scenarios such as success on first call, exceeding max retries, success after one or two failures, disabling retries, and adjusting the max retries configuration. Additionally, there are tests for the validation of output fromClarificationQuestionPrompt
. These additions significantly improve the coverage of edge cases and error conditions in the test suite.
def validate(self, output) -> bool: | ||
try: | ||
json_data = json.loads(output) | ||
return isinstance(json_data, List) | ||
except Exception: | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation method checks if the output can be parsed as JSON and if the parsed data is a list. However, it doesn't handle exceptions properly. When an exception occurs, it simply raises it again which is redundant. Instead, it should return False
or provide a more specific error message.
- except Exception:
- raise
+ except json.JSONDecodeError:
+ return False
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 18 minutes and 7 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Rate Limit ExceededYou have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 7 minutes and 38 seconds before requesting another review. To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR. |
Codecov Report
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. @@ Coverage Diff @@
## feature/v1.3 #592 +/- ##
===============================================
Coverage ? 84.94%
===============================================
Files ? 60
Lines ? 2910
Branches ? 0
===============================================
Hits ? 2472
Misses ? 438
Partials ? 0 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@ArslanSaleem great job, I'd only add a mention of this method in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 4
Files selected for processing (5)
- docs/getting-started.md (1 hunks)
- pandasai/agent/init.py (5 hunks)
- pandasai/prompts/clarification_questions_prompt.py (2 hunks)
- pandasai/prompts/rephase_query_prompt.py (1 hunks)
- tests/test_agent.py (4 hunks)
Files skipped from review due to trivial changes (1)
- docs/getting-started.md
Additional comments (Suppressed): 5
pandasai/prompts/clarification_questions_prompt.py (2)
- 18-21: The import statement for
json
is new and it's used in thevalidate
method. However, there seems to be an unnecessary change in indentation for the import statements on lines 19-21. This could potentially cause a syntax error. Please verify if this was intended.- from typing import List - import pandas as pd - from .base import Prompt + from typing import List + import pandas as pd + from .base import Prompt
- 54-59: The
validate
method has been added to validate the output of the prompt. It tries to parse the output as JSON and checks if it's a list. If the parsing fails, it returns False. This is a good practice for ensuring the validity of the output. However, it might be beneficial to add some logging or error messaging when the validation fails (i.e., when ajson.JSONDecodeError
is caught) to aid in debugging.tests/test_agent.py (3)
57-84: The test
test_constructor
has been updated to include a check for data isolation between multiple instances of theAgent
class. This is a good practice as it ensures that each instance of theAgent
class operates independently and does not interfere with the data of other instances.150-153: The signature of the
test_explain
method has been changed from accepting individual parameters to accepting anAgent
object. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.180-279: New tests have been added to verify the functionality of the
_call_llm_with_prompt
method, including successful calls, exceeding max retries, and validation of output. These tests are crucial for ensuring the robustness of the error handling and retry mechanism implemented in the_call_llm_with_prompt
method.
def _call_llm_with_prompt(self, prompt: Prompt): | ||
""" | ||
Call LLM with prompt using error handling to retry based on config | ||
Args: | ||
prompt (Prompt): Prompt to pass to LLM's | ||
""" | ||
retry_count = 0 | ||
while retry_count < self._lake.config.max_retries: | ||
try: | ||
result: str = self._lake.llm.call(prompt) | ||
if prompt.validate(result): | ||
return result | ||
else: | ||
raise Exception("Response validation failed!") | ||
except Exception: | ||
if ( | ||
not self._lake.use_error_correction_framework | ||
or retry_count >= self._lake.config.max_retries - 1 | ||
): | ||
raise | ||
retry_count += 1 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _call_llm_with_prompt
method is a good addition for handling retries and error management. However, it's important to note that the exception handling here is quite broad. It catches all exceptions without distinguishing between different types of errors. This could potentially hide unexpected issues and make debugging more difficult. Consider refining the exception handling to be more specific or at least log the exception details before retrying.
def rephrase_query(self, query: str): | ||
try: | ||
prompt = RephraseQueryPrompt( | ||
query, self._lake.dfs, self._lake._memory.get_conversation() | ||
) | ||
response = self._call_llm_with_prompt(prompt) | ||
self._logger.log( | ||
f"""Rephrased Response: {response} | ||
""" | ||
) | ||
return response | ||
except Exception as exception: | ||
return ( | ||
"Unfortunately, I was not able to repharse query, " | ||
"because of the following error:\n" | ||
f"\n{exception}\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rephrase_query
method is a new feature that rephrases a given query using the LLM. The implementation looks correct and follows the same pattern as other methods in the class. However, similar to the previous comment, the exception handling is too broad and could hide unexpected issues. Consider refining the exception handling to be more specific or at least log the exception details when an error occurs.
text: str = """ | ||
You are provided with the following pandas DataFrames: | ||
|
||
{dataframes} | ||
{conversation} | ||
Return the rephrased sentence of "{query}” in order to obtain more accurate and | ||
comprehensive responses without any explanations. If something from the original | ||
query is ambiguous, please clarify it in the rephrased query, making assumptions, | ||
if necessary. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same text is repeated twice in the class RephraseQueryPrompt
. This repetition could be avoided by defining the text once and reusing it. This would make the code more maintainable and easier to modify in the future.
- text: str = """
- You are provided with the following pandas DataFrames:
-
- {dataframes}
- {conversation}
- Return the rephrased sentence of "{query}” in order to obtain more accurate and
- comprehensive responses without any explanations. If something from the original
- query is ambiguous, please clarify it in the rephrased query, making assumptions,
- if necessary.
- """
-
- conversation_text: str = """
- And based on our conversation:
-
- <conversation>
- {conversation}
- </conversation>
- """
+ PROMPT_TEXT: str = """
+ You are provided with the following pandas DataFrames:
+
+ {dataframes}
+ {conversation}
+ Return the rephrased sentence of "{query}” in order to obtain more accurate and
+ comprehensive responses without any explanations. If something from the original
+ query is ambiguous, please clarify it in the rephrased query, making assumptions,
+ if necessary.
+ """
+
+ CONVERSATION_TEXT: str = """
+ And based on our conversation:
+
+ <conversation>
+ {conversation}
+ </conversation>
+ """
def __init__(self, query: str, dataframes: List[pd.DataFrame], conversation: str): | ||
conversation_content = ( | ||
self.conversation_text.format(conversation=conversation) | ||
if conversation | ||
else "" | ||
) | ||
self.set_var("conversation", conversation_content) | ||
self.set_var("query", query) | ||
self.set_var("dfs", dataframes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no validation for the input parameters query
, dataframes
, and conversation
. It's a good practice to validate function arguments to ensure they meet certain conditions before proceeding with the function execution. For instance, you could check if query
is a non-empty string, dataframes
is a list of pandas DataFrames, and conversation
is a string.
Summary by CodeRabbit
New Feature:
rephrase_query
method in theAgent
class.Improvement:
Agent
class with a new internal method_call_llm_with_prompt()
to handle Language Model calls with prompts and error retries.Test:
Agent
class, including tests for the newrephrase_query
method and the_call_llm_with_prompt()
method.Documentation:
rephrase_query
feature for improved model responses.