Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(RephraseQuery): rephrase user query to get more accurate responses #592

Merged
merged 26 commits into from
Sep 25, 2023

Conversation

ArslanSaleem
Copy link
Collaborator

@ArslanSaleem ArslanSaleem commented Sep 25, 2023

Summary by CodeRabbit

New Feature:

  • Added a new feature to rephrase queries for better model responses. This can be used via the rephrase_query method in the Agent class.

Improvement:

  • Enhanced the Agent class with a new internal method _call_llm_with_prompt() to handle Language Model calls with prompts and error retries.

Test:

  • Expanded test coverage with multiple new test cases for the Agent class, including tests for the new rephrase_query method and the _call_llm_with_prompt() method.

Documentation:

  • Updated the "Getting Started" guide with a new section on how to use the rephrase_query feature for improved model responses.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 25, 2023

Walkthrough

This update introduces a new feature to rephrase queries for better model responses. It includes changes in the agent and prompt classes, adding validation methods and error handling. The test suite has been expanded to cover these new functionalities. Documentation is updated to guide users on how to use the new feature.

Changes

File(s) Summary
pandasai/agent/__init__.py Introduced _call_llm_with_prompt() method for error handling and retries. Added rephrase_query() method for query rephrasing.
pandasai/prompts/base.py,
pandasai/prompts/clarification_questions_prompt.py,
pandasai/prompts/rephase_query_prompt.py
Added validate method in base class and subclasses for output validation. Introduced RephraseQueryPrompt class for query rephrasing.
tests/test_agent.py Expanded test suite to cover new functionalities including multiple instances of Agent, LLM calls with prompts, and output validation.
docs/getting-started.md Updated documentation with a new section on how to use the rephrase_query method.

🐇💻

"In the land of code where the shadows lie,
A rabbit hopped forth under the cloudless sky.
With each line written, and each test passed,
Our software's strength grows unsurpassed.
Now queries rephrase with a clever twist,
Ensuring no detail will be missed.
So let's celebrate this update, oh so grand,
Crafted by the coderabbit's hand!" 🎉🥕


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 3

Commits Files that changed from the base of the PR and between dd6934d and 2d87306.
Files selected for processing (5)
  • pandasai/agent/init.py (5 hunks)
  • pandasai/prompts/base.py (1 hunks)
  • pandasai/prompts/clarification_questions_prompt.py (2 hunks)
  • pandasai/prompts/rephase_query_prompt.py (1 hunks)
  • tests/test_agent.py (4 hunks)
Files skipped from review due to trivial changes (1)
  • pandasai/prompts/base.py
Additional comments (Suppressed): 6
pandasai/prompts/clarification_questions_prompt.py (1)
  • 18-21: The import of json module is new in this hunk. Ensure that it's used appropriately and securely throughout the codebase.
pandasai/agent/__init__.py (1)
  • 123-139: The rephrase_query method is a new addition to the Agent class. It uses the _call_llm_with_prompt method to call the LLM with a RephraseQueryPrompt. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.
tests/test_agent.py (4)
  • 4-11: The new imports ClarificationQuestionPrompt and ExplainPrompt from pandasai.prompts are introduced. Ensure that these modules exist and are accessible in the project structure.

  • 57-84: A new fixture agent is introduced which creates an instance of Agent using sample_df and config. This fixture is used in subsequent tests, replacing the previous method of creating an Agent instance within each test. This change improves code reusability and readability.

  • 150-156: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [150-178]

The test_explain function has been updated to use the new agent fixture instead of creating a new Agent instance within the test. This change improves code reusability and readability.

  • 180-275: New test cases have been added to test the retry mechanism implemented in _call_llm_with_prompt(). These tests cover various scenarios such as success on first call, exceeding max retries, success after one or two failures, disabling retries, and adjusting the max retries configuration. Additionally, there are tests for the validation of output from ClarificationQuestionPrompt. These additions significantly improve the coverage of edge cases and error conditions in the test suite.

Comment on lines 52 to 57
def validate(self, output) -> bool:
try:
json_data = json.loads(output)
return isinstance(json_data, List)
except Exception:
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation method checks if the output can be parsed as JSON and if the parsed data is a list. However, it doesn't handle exceptions properly. When an exception occurs, it simply raises it again which is redundant. Instead, it should return False or provide a more specific error message.

-        except Exception:
-            raise
+        except json.JSONDecodeError:
+            return False

pandasai/prompts/rephase_query_prompt.py Outdated Show resolved Hide resolved
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 25, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 18 minutes and 7 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 25, 2023

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 7 minutes and 38 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2023

Codecov Report

❗ No coverage uploaded for pull request base (feature/v1.3@09872ac). Click here to learn what that means.
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@               Coverage Diff               @@
##             feature/v1.3     #592   +/-   ##
===============================================
  Coverage                ?   84.94%           
===============================================
  Files                   ?       60           
  Lines                   ?     2910           
  Branches                ?        0           
===============================================
  Hits                    ?     2472           
  Misses                  ?      438           
  Partials                ?        0           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gventuri
Copy link
Collaborator

@ArslanSaleem great job, I'd only add a mention of this method in the Agent section on getting-started.md. What do you think?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 4

Commits Files that changed from the base of the PR and between 2d87306 and 0c3c997.
Files selected for processing (5)
  • docs/getting-started.md (1 hunks)
  • pandasai/agent/init.py (5 hunks)
  • pandasai/prompts/clarification_questions_prompt.py (2 hunks)
  • pandasai/prompts/rephase_query_prompt.py (1 hunks)
  • tests/test_agent.py (4 hunks)
Files skipped from review due to trivial changes (1)
  • docs/getting-started.md
Additional comments (Suppressed): 5
pandasai/prompts/clarification_questions_prompt.py (2)
  • 18-21: The import statement for json is new and it's used in the validate method. However, there seems to be an unnecessary change in indentation for the import statements on lines 19-21. This could potentially cause a syntax error. Please verify if this was intended.
-  from typing import List
-  import pandas as pd
-  from .base import Prompt
+ from typing import List
+ import pandas as pd
+ from .base import Prompt
  • 54-59: The validate method has been added to validate the output of the prompt. It tries to parse the output as JSON and checks if it's a list. If the parsing fails, it returns False. This is a good practice for ensuring the validity of the output. However, it might be beneficial to add some logging or error messaging when the validation fails (i.e., when a json.JSONDecodeError is caught) to aid in debugging.
tests/test_agent.py (3)
  • 57-84: The test test_constructor has been updated to include a check for data isolation between multiple instances of the Agent class. This is a good practice as it ensures that each instance of the Agent class operates independently and does not interfere with the data of other instances.

  • 150-153: The signature of the test_explain method has been changed from accepting individual parameters to accepting an Agent object. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 180-279: New tests have been added to verify the functionality of the _call_llm_with_prompt method, including successful calls, exceeding max retries, and validation of output. These tests are crucial for ensuring the robustness of the error handling and retry mechanism implemented in the _call_llm_with_prompt method.

Comment on lines +43 to +64
def _call_llm_with_prompt(self, prompt: Prompt):
"""
Call LLM with prompt using error handling to retry based on config
Args:
prompt (Prompt): Prompt to pass to LLM's
"""
retry_count = 0
while retry_count < self._lake.config.max_retries:
try:
result: str = self._lake.llm.call(prompt)
if prompt.validate(result):
return result
else:
raise Exception("Response validation failed!")
except Exception:
if (
not self._lake.use_error_correction_framework
or retry_count >= self._lake.config.max_retries - 1
):
raise
retry_count += 1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _call_llm_with_prompt method is a good addition for handling retries and error management. However, it's important to note that the exception handling here is quite broad. It catches all exceptions without distinguishing between different types of errors. This could potentially hide unexpected issues and make debugging more difficult. Consider refining the exception handling to be more specific or at least log the exception details before retrying.

Comment on lines +123 to +139
def rephrase_query(self, query: str):
try:
prompt = RephraseQueryPrompt(
query, self._lake.dfs, self._lake._memory.get_conversation()
)
response = self._call_llm_with_prompt(prompt)
self._logger.log(
f"""Rephrased Response: {response}
"""
)
return response
except Exception as exception:
return (
"Unfortunately, I was not able to repharse query, "
"because of the following error:\n"
f"\n{exception}\n"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rephrase_query method is a new feature that rephrases a given query using the LLM. The implementation looks correct and follows the same pattern as other methods in the class. However, similar to the previous comment, the exception handling is too broad and could hide unexpected issues. Consider refining the exception handling to be more specific or at least log the exception details when an error occurs.

Comment on lines +21 to +30
text: str = """
You are provided with the following pandas DataFrames:

{dataframes}
{conversation}
Return the rephrased sentence of "{query}” in order to obtain more accurate and
comprehensive responses without any explanations. If something from the original
query is ambiguous, please clarify it in the rephrased query, making assumptions,
if necessary.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same text is repeated twice in the class RephraseQueryPrompt. This repetition could be avoided by defining the text once and reusing it. This would make the code more maintainable and easier to modify in the future.

-    text: str = """
-    You are provided with the following pandas DataFrames:
-    
-    {dataframes}
-    {conversation}
-    Return the rephrased sentence of "{query}” in order to obtain more accurate and 
-    comprehensive responses without any explanations. If something from the original
-    query is ambiguous, please clarify it in the rephrased query, making assumptions,
-    if necessary.
-    """
-
-    conversation_text: str = """
-    And based on our conversation:
-    
-    <conversation>
-    {conversation}
-    </conversation>
-    """
+    PROMPT_TEXT: str = """
+    You are provided with the following pandas DataFrames:
+    
+    {dataframes}
+    {conversation}
+    Return the rephrased sentence of "{query}” in order to obtain more accurate and 
+    comprehensive responses without any explanations. If something from the original
+    query is ambiguous, please clarify it in the rephrased query, making assumptions,
+    if necessary.
+    """
+
+    CONVERSATION_TEXT: str = """
+    And based on our conversation:
+    
+    <conversation>
+    {conversation}
+    </conversation>
+    """

Comment on lines +40 to +48
def __init__(self, query: str, dataframes: List[pd.DataFrame], conversation: str):
conversation_content = (
self.conversation_text.format(conversation=conversation)
if conversation
else ""
)
self.set_var("conversation", conversation_content)
self.set_var("query", query)
self.set_var("dfs", dataframes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no validation for the input parameters query, dataframes, and conversation. It's a good practice to validate function arguments to ensure they meet certain conditions before proceeding with the function execution. For instance, you could check if query is a non-empty string, dataframes is a list of pandas DataFrames, and conversation is a string.

@gventuri gventuri changed the title chore(ErrorHandling): retry query multiple times based on the max retries feat(RephraseQuery): rephrase user query to get more accurate responses Sep 25, 2023
@gventuri gventuri merged commit 722ca3e into feature/v1.3 Sep 25, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants