feat(RephraseQuery): rephrase user query to get more accurate responses #592

ArslanSaleem · 2023-09-25T11:37:12Z

Summary by CodeRabbit

New Feature:

Added a new feature to rephrase queries for better model responses. This can be used via the rephrase_query method in the Agent class.

Improvement:

Enhanced the Agent class with a new internal method _call_llm_with_prompt() to handle Language Model calls with prompts and error retries.

Test:

Expanded test coverage with multiple new test cases for the Agent class, including tests for the new rephrase_query method and the _call_llm_with_prompt() method.

Documentation:

Updated the "Getting Started" guide with a new section on how to use the rephrase_query feature for improved model responses.

coderabbitai · 2023-09-25T11:37:17Z

Walkthrough

This update introduces a new feature to rephrase queries for better model responses. It includes changes in the agent and prompt classes, adding validation methods and error handling. The test suite has been expanded to cover these new functionalities. Documentation is updated to guide users on how to use the new feature.

Changes

File(s)	Summary
`pandasai/agent/__init__.py`	Introduced `_call_llm_with_prompt()` method for error handling and retries. Added `rephrase_query()` method for query rephrasing.
`pandasai/prompts/base.py`, `pandasai/prompts/clarification_questions_prompt.py`, `pandasai/prompts/rephase_query_prompt.py`	Added `validate` method in base class and subclasses for output validation. Introduced `RephraseQueryPrompt` class for query rephrasing.
`tests/test_agent.py`	Expanded test suite to cover new functionalities including multiple instances of `Agent`, LLM calls with prompts, and output validation.
`docs/getting-started.md`	Updated documentation with a new section on how to use the `rephrase_query` method.

🐇💻

"In the land of code where the shadows lie,
A rabbit hopped forth under the cloudless sky.
With each line written, and each test passed,
Our software's strength grows unsurpassed.
Now queries rephrase with a clever twist,
Ensuring no detail will be missed.
So let's celebrate this update, oh so grand,
Crafted by the coderabbit's hand!" 🎉🥕

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

Mention @coderabbitai in any review comment for bot assistance.
Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.

coderabbitai

Review Status

Actionable comments generated: 3

Commits

Files that changed from the base of the PR and between dd6934d and 2d87306.

Files selected for processing (5)

pandasai/agent/__init__.py (5 hunks)
pandasai/prompts/base.py (1 hunks)
pandasai/prompts/clarification_questions_prompt.py (2 hunks)
pandasai/prompts/rephase_query_prompt.py (1 hunks)
tests/test_agent.py (4 hunks)

Files skipped from review due to trivial changes (1)

pandasai/prompts/base.py

Additional comments (Suppressed): 6

pandasai/prompts/clarification_questions_prompt.py (1)

18-21: The import of json module is new in this hunk. Ensure that it's used appropriately and securely throughout the codebase.

pandasai/agent/__init__.py (1)

123-139: The rephrase_query method is a new addition to the Agent class. It uses the _call_llm_with_prompt method to call the LLM with a RephraseQueryPrompt. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

tests/test_agent.py (4)

4-11: The new imports ClarificationQuestionPrompt and ExplainPrompt from pandasai.prompts are introduced. Ensure that these modules exist and are accessible in the project structure.

57-84: A new fixture agent is introduced which creates an instance of Agent using sample_df and config. This fixture is used in subsequent tests, replacing the previous method of creating an Agent instance within each test. This change improves code reusability and readability.

150-156: > Note: This review was outside of the patch, so it was mapped to the patch with the greatest overlap. Original lines [150-178]

The test_explain function has been updated to use the new agent fixture instead of creating a new Agent instance within the test. This change improves code reusability and readability.

180-275: New test cases have been added to test the retry mechanism implemented in _call_llm_with_prompt(). These tests cover various scenarios such as success on first call, exceeding max retries, success after one or two failures, disabling retries, and adjusting the max retries configuration. Additionally, there are tests for the validation of output from ClarificationQuestionPrompt. These additions significantly improve the coverage of edge cases and error conditions in the test suite.
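The retry scenarios listed above (success after failures, exceeding max retries) can be exercised with `unittest.mock.Mock` and its `side_effect` argument. The sketch below is only an illustration: `call_until_valid` is a hypothetical stand-in for the retry helper, not the code in `tests/test_agent.py` or in the `Agent` class.

```python
from unittest.mock import Mock


def call_until_valid(llm_call, max_retries):
    """Hypothetical stand-in: retry llm_call, re-raising on the last attempt."""
    for attempt in range(max_retries):
        try:
            return llm_call()
        except Exception:
            if attempt == max_retries - 1:
                raise


def test_success_after_two_failures():
    # A list side_effect raises the exceptions in order, then returns the value.
    llm_call = Mock(side_effect=[RuntimeError("1"), RuntimeError("2"), "answer"])
    assert call_until_valid(llm_call, max_retries=3) == "answer"
    assert llm_call.call_count == 3


def test_raises_when_max_retries_exceeded():
    # A single exception as side_effect is raised on every call.
    llm_call = Mock(side_effect=RuntimeError("always fails"))
    try:
        call_until_valid(llm_call, max_retries=2)
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected RuntimeError")
    assert llm_call.call_count == 2
```

Asserting on `call_count` is what distinguishes "retried the expected number of times" from "happened to return the right value".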

coderabbitai · 2023-09-25T11:40:00Z

pandasai/prompts/clarification_questions_prompt.py

+    def validate(self, output) -> bool:
+        try:
+            json_data = json.loads(output)
+            return isinstance(json_data, List)
+        except Exception:
+            raise


This validation method checks if the output can be parsed as JSON and if the parsed data is a list. However, it doesn't handle exceptions properly. When an exception occurs, it simply raises it again which is redundant. Instead, it should return False or provide a more specific error message.

-        except Exception:
-            raise
+        except json.JSONDecodeError:
+            return False
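A standalone sketch of the suggested behaviour, written as a plain function rather than a method so it can run on its own. The name `validate_list_output` is hypothetical, and catching `TypeError` for non-string input is an added assumption that the suggestion above does not mention.

```python
import json


def validate_list_output(output) -> bool:
    """Return True only if `output` parses as JSON and the result is a list."""
    try:
        parsed = json.loads(output)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or input that is not str/bytes (assumption),
        # is treated as a failed validation instead of propagating.
        return False
    return isinstance(parsed, list)
```

With this shape the caller decides what a failed validation means (for example, retrying), instead of having to handle a parsing exception.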

pandasai/prompts/rephase_query_prompt.py

coderabbitai · 2023-09-25T11:40:44Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 18 minutes and 7 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

coderabbitai · 2023-09-25T11:51:13Z

Rate Limit Exceeded

You have exceeded the limit for the number of files or commits that can be reviewed per hour. Please wait 7 minutes and 38 seconds before requesting another review.

To re-trigger a review, convert this PR to a draft and then mark it as ready for review again. Alternatively, you can push a new commit to this PR.

codecov-commenter · 2023-09-25T11:53:45Z

Codecov Report

❗ No coverage uploaded for pull request base (feature/v1.3@09872ac).
The diff coverage is n/a.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@               Coverage Diff               @@
##             feature/v1.3     #592   +/-   ##
===============================================
  Coverage                ?   84.94%           
===============================================
  Files                   ?       60           
  Lines                   ?     2910           
  Branches                ?        0           
===============================================
  Hits                    ?     2472           
  Misses                  ?      438           
  Partials                ?        0

gventuri · 2023-09-25T12:27:51Z

@ArslanSaleem great job, I'd only add a mention of this method in the Agent section on getting-started.md. What do you think?

coderabbitai

Review Status

Actionable comments generated: 4

Commits

Files that changed from the base of the PR and between 2d87306 and 0c3c997.

Files selected for processing (5)

docs/getting-started.md (1 hunks)
pandasai/agent/__init__.py (5 hunks)
pandasai/prompts/clarification_questions_prompt.py (2 hunks)
pandasai/prompts/rephase_query_prompt.py (1 hunks)
tests/test_agent.py (4 hunks)

Files skipped from review due to trivial changes (1)

docs/getting-started.md

Additional comments (Suppressed): 5

pandasai/prompts/clarification_questions_prompt.py (2)

18-21: The import statement for json is new and it's used in the validate method. However, there seems to be an unnecessary change in indentation for the import statements on lines 19-21. This could potentially cause a syntax error. Please verify if this was intended.

-  from typing import List
-  import pandas as pd
-  from .base import Prompt
+ from typing import List
+ import pandas as pd
+ from .base import Prompt

54-59: The validate method has been added to validate the output of the prompt. It tries to parse the output as JSON and checks if it's a list. If the parsing fails, it returns False. This is a good practice for ensuring the validity of the output. However, it might be beneficial to add some logging or error messaging when the validation fails (i.e., when a json.JSONDecodeError is caught) to aid in debugging.

tests/test_agent.py (3)

57-84: The test test_constructor has been updated to include a check for data isolation between multiple instances of the Agent class. This is a good practice as it ensures that each instance of the Agent class operates independently and does not interfere with the data of other instances.

150-153: The signature of the test_explain method has been changed from accepting individual parameters to accepting an Agent object. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

180-279: New tests have been added to verify the functionality of the _call_llm_with_prompt method, including successful calls, exceeding max retries, and validation of output. These tests are crucial for ensuring the robustness of the error handling and retry mechanism implemented in the _call_llm_with_prompt method.

coderabbitai · 2023-09-25T13:57:33Z

pandasai/agent/__init__.py

+    def _call_llm_with_prompt(self, prompt: Prompt):
+        """
+        Call LLM with prompt using error handling to retry based on config
+        Args:
+            prompt (Prompt): Prompt to pass to LLM's
+        """
+        retry_count = 0
+        while retry_count < self._lake.config.max_retries:
+            try:
+                result: str = self._lake.llm.call(prompt)
+                if prompt.validate(result):
+                    return result
+                else:
+                    raise Exception("Response validation failed!")
+            except Exception:
+                if (
+                    not self._lake.use_error_correction_framework
+                    or retry_count >= self._lake.config.max_retries - 1
+                ):
+                    raise
+                retry_count += 1
+


The _call_llm_with_prompt method is a good addition for handling retries and error management. However, it's important to note that the exception handling here is quite broad. It catches all exceptions without distinguishing between different types of errors. This could potentially hide unexpected issues and make debugging more difficult. Consider refining the exception handling to be more specific or at least log the exception details before retrying.
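One way to act on the "at least log the exception details" part of this comment is sketched below. It is a standalone approximation, not the `Agent` method: `call_with_retries`, `ValidationError`, and the `retry_enabled` flag are hypothetical names standing in for `self._lake.llm.call`, the generic `Exception`, and `use_error_correction_framework`. The thread does not say which exception types the LLM client raises, so the catch stays broad here and only the logging is added.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


class ValidationError(Exception):
    """Hypothetical error type for a response that fails validation."""


def call_with_retries(
    call: Callable[[], str],
    validate: Callable[[str], bool],
    max_retries: int = 3,
    retry_enabled: bool = True,
) -> str:
    """Call `call` until `validate` accepts the result or attempts run out."""
    attempts = max_retries if retry_enabled else 1
    for attempt in range(1, attempts + 1):
        try:
            result = call()
            if validate(result):
                return result
            raise ValidationError("Response validation failed!")
        except Exception as exc:
            # Log every failure so retried errors are not silently dropped.
            logger.warning(
                "LLM call attempt %d/%d failed: %r", attempt, attempts, exc
            )
            if attempt == attempts:
                raise
    # Only reachable when no attempt was made at all.
    raise ValueError("max_retries must be at least 1")
```

A dedicated exception type for validation failures also lets callers tell "the model answered badly" apart from transport or API errors.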

coderabbitai · 2023-09-25T13:57:33Z

pandasai/agent/__init__.py

+    def rephrase_query(self, query: str):
+        try:
+            prompt = RephraseQueryPrompt(
+                query, self._lake.dfs, self._lake._memory.get_conversation()
+            )
+            response = self._call_llm_with_prompt(prompt)
+            self._logger.log(
+                f"""Rephrased Response:  {response}
+                """
+            )
+            return response
+        except Exception as exception:
+            return (
+                "Unfortunately, I was not able to repharse query, "
+                "because of the following error:\n"
+                f"\n{exception}\n"
+            )


The rephrase_query method is a new feature that rephrases a given query using the LLM. The implementation looks correct and follows the same pattern as other methods in the class. However, similar to the previous comment, the exception handling is too broad and could hide unexpected issues. Consider refining the exception handling to be more specific or at least log the exception details when an error occurs.
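A minimal sketch of the logging suggestion, assuming the standard `logging` module rather than the project's own `self._logger`. The function `rephrase_or_explain` and its `rephrase` callable are hypothetical stand-ins for the method and the LLM call.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def rephrase_or_explain(query: str, rephrase: Callable[[str], str]) -> str:
    """Return the rephrased query, or a readable error message on failure."""
    try:
        return rephrase(query)
    except Exception as exc:
        # logger.exception records the message together with the traceback,
        # so the failure stays visible even though a string is returned.
        logger.exception("Rephrasing failed for query %r", query)
        return (
            "Unfortunately, I was not able to rephrase the query, "
            "because of the following error:\n"
            f"\n{exc}\n"
        )
```

The user-facing behaviour is unchanged (a string comes back either way); the difference is that the underlying error is recorded for debugging.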

coderabbitai · 2023-09-25T13:57:33Z

pandasai/prompts/rephase_query_prompt.py

+    text: str = """
+You are provided with the following pandas DataFrames:
+
+{dataframes}
+{conversation}
+Return the rephrased sentence of "{query}” in order to obtain more accurate and 
+comprehensive responses without any explanations. If something from the original
+query is ambiguous, please clarify it in the rephrased query, making assumptions,
+if necessary.
+"""


The same text is repeated twice in the class RephraseQueryPrompt. This repetition could be avoided by defining the text once and reusing it. This would make the code more maintainable and easier to modify in the future.

-    text: str = """
-You are provided with the following pandas DataFrames:
-
-{dataframes}
-{conversation}
-Return the rephrased sentence of "{query}” in order to obtain more accurate and
-comprehensive responses without any explanations. If something from the original
-query is ambiguous, please clarify it in the rephrased query, making assumptions,
-if necessary.
-"""
-
-    conversation_text: str = """
-And based on our conversation:
-
-<conversation>
-{conversation}
-</conversation>
-"""
+    PROMPT_TEXT: str = """
+You are provided with the following pandas DataFrames:
+
+{dataframes}
+{conversation}
+Return the rephrased sentence of "{query}” in order to obtain more accurate and
+comprehensive responses without any explanations. If something from the original
+query is ambiguous, please clarify it in the rephrased query, making assumptions,
+if necessary.
+"""
+
+    CONVERSATION_TEXT: str = """
+And based on our conversation:
+
+<conversation>
+{conversation}
+</conversation>
+"""

coderabbitai · 2023-09-25T13:57:33Z

pandasai/prompts/rephase_query_prompt.py

+    def __init__(self, query: str, dataframes: List[pd.DataFrame], conversation: str):
+        conversation_content = (
+            self.conversation_text.format(conversation=conversation)
+            if conversation
+            else ""
+        )
+        self.set_var("conversation", conversation_content)
+        self.set_var("query", query)
+        self.set_var("dfs", dataframes)


There's no validation for the input parameters query, dataframes, and conversation. It's a good practice to validate function arguments to ensure they meet certain conditions before proceeding with the function execution. For instance, you could check if query is a non-empty string, dataframes is a list of pandas DataFrames, and conversation is a string.
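The checks described in this comment could look roughly like the sketch below. `check_prompt_inputs` is a hypothetical helper, not part of the PR. To keep the example free of third-party imports, the expected frame type is passed in as `frame_type`; in the real constructor that would presumably be `pd.DataFrame`. Accepting `None` for `conversation` is also an assumption.

```python
from typing import Any, Optional, Sequence


def check_prompt_inputs(
    query: Any,
    dataframes: Any,
    conversation: Optional[Any],
    frame_type: type = object,
) -> None:
    """Raise ValueError/TypeError if the prompt arguments look wrong."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not isinstance(dataframes, (list, tuple)):
        raise TypeError("dataframes must be a list or tuple")
    for index, frame in enumerate(dataframes):
        if not isinstance(frame, frame_type):
            raise TypeError(
                f"dataframes[{index}] is {type(frame).__name__}, "
                f"expected {frame_type.__name__}"
            )
    # Assumption: an absent conversation may be passed as None or "".
    if conversation is not None and not isinstance(conversation, str):
        raise TypeError("conversation must be a string or None")
```

Calling such a helper at the top of `__init__` would surface a bad argument immediately, rather than later when the prompt template is formatted.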

ArslanSaleem and others added 21 commits September 21, 2023 20:10

feat[Agent]: add agent conversation code

70b0da8

feat[Agent]: add test cases for the agent class

1b51727

feat: add explain method

70244c3

feat: Add Explain functionality in the agent

f715035

fix: refactor types

2da890a

chore: fix typings

6736c44

chore: improve prompt add conversation

cdeec68

refactor: remove memory from the agent class

9025f4e

refactor: import of Agent class in example

d1b8e61

refactor: memory to return conversation according to size

49d8720

refactor: remove leftover property

b92fb39

fix: prompt comment

7f17af8

fix: redundant try catch

2e4c902

chore: improve docstring and add example in documentation

7a554a5

fix: Comment in clarification prompts and add dtyps to the constructors

f7e4d98

feat(RephraseQuery): rephrase user query to get more accurate responses

21f5bd8

Merge branch 'feature/v1.3' into agent_rephrase_query

393f2f2

chore(agent): add max retries on queries

adfc86a

feat: improve the prompt to also add information about ambiguous parts

bf9667b

feat[retry_wrapper]: add basic wrapper for error handling and add pro…

cccee44

…mpt validators

Merge branch 'agent_rephrase_query' into retry_wrapper

2d87306

coderabbitai bot reviewed Sep 25, 2023

Merge branch 'agent_rephrase_query' into retry_wrapper

995b90d

refactor(validation): return False from the validator in case of failure

6fa9c1d

Merge branch 'feature/v1.3' into agent_rephrase_query

30765ac

ArslanSaleem added 2 commits September 25, 2023 18:47

fix(RephraseQuery): remove conversation from the prompt if empty

e9b2342

Merge branch 'agent_rephrase_query' into retry_wrapper

0c3c997

coderabbitai bot reviewed Sep 25, 2023

gventuri changed the title ~~chore(ErrorHandling): retry query multiple times based on the max retries~~ feat(RephraseQuery): rephrase user query to get more accurate responses Sep 25, 2023

gventuri merged commit 722ca3e into feature/v1.3 Sep 25, 2023
9 checks passed
