feat(QueryTracker): track query execution cycle #630

Merged
merged 11 commits on Oct 13, 2023

Conversation

@ArslanSaleem ArslanSaleem commented Oct 10, 2023

Summary by CodeRabbit

  • New Feature: Introduced QueryExecTracker class in pandasai.helpers.query_exec_tracker for tracking the execution of queries, maintaining execution steps, and generating a summary of the execution.
  • New Feature: Added APILogger class in pandasai.custom_loggers.api_logger for sending log messages to a server.
  • Refactor: Updated pandasai.smart_datalake to use QueryExecTracker for tracking query execution, replacing the previous timer-based tracking.
  • Test: Added comprehensive tests for QueryExecTracker in tests/test_query_tracker.py to ensure correct tracking of query execution.
  • Chore: Updated examples in examples/custom_logger.py and examples/using_pandasai_log_server.py to demonstrate the usage of the new APILogger and QueryExecTracker classes.


coderabbitai bot commented Oct 10, 2023

Walkthrough

The changes introduce a QueryExecTracker class for tracking query execution and logging. The Agent class now uses this tracker, and the APILogger class sends logs to a server. Test coverage for the new tracker is added. A custom logger example and a log server configuration are also included.

Changes

  • pandasai/helpers/query_exec_tracker.py: introduced the QueryExecTracker class for tracking query execution and logging.
  • pandasai/smart_datalake/__init__.py: updated the Agent class to use QueryExecTracker for execution tracking and logging.
  • pandasai/custom_loggers/api_logger.py: added the APILogger class for sending logs to a server.
  • tests/test_query_tracker.py: added test coverage for QueryExecTracker.
  • examples/custom_logger.py: added an example of using a custom logger with an AI agent.
  • pandasai/schemas/df_config.py: added a log_server field to the Config class for log server configuration.

🐇
"In the land of code, where logic is king,
A rabbit hopped in, changes to bring.
With a tracker in hand, and a logger so bright,
He danced through the code, from morning till night.
Now queries are tracked, and logs are sent far,
In the world of AI, we've raised the bar!" 🌟


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.
  • Alternatively, comment @coderabbitai pause (as an issue or review comment) to halt the bot's ongoing reviews, and comment @coderabbitai resume to start them again.


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 7

Commits: files that changed from the base of the PR and between 3813663 and 45672fe.
Files selected for processing (7)
  • examples/custom_logger.py (1 hunks)
  • pandasai/custom_loggers/api_logger.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_codemanager.py (1 hunks)
  • tests/test_query_tracker.py (1 hunks)
  • tests/test_smartdatalake.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • tests/test_codemanager.py
  • tests/test_smartdatalake.py
Additional comments (Suppressed): 8
pandasai/smart_datalake/__init__.py (8)
  • 20-29: The time module has been removed from the imports. Ensure that this does not affect other parts of the code that might be using it. Also, the QueryExecTracker class has been imported from pandasai.helpers.query_exec_tracker. Verify that this class is implemented correctly and that it is available in the specified location.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of the query. If the execution time is now being tracked elsewhere, ensure that it is being done correctly.

  • 276-293: The QueryExecTracker instance is being created and used to track the execution of the query. Ensure that the QueryExecTracker class is implemented correctly and that it provides the necessary functionality for tracking the execution of the query.

  • 299-325: The QueryExecTracker instance is being used to execute functions and track their execution time. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 342-349: The QueryExecTracker instance is being used to execute the execute_code method of the _code_manager instance. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 364-369: The QueryExecTracker instance is being used to execute the _retry_run_code method. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 375-392: The QueryExecTracker instance is being used to add steps to the execution tracking. Ensure that the add_step method of the QueryExecTracker class is implemented correctly and that it provides the necessary functionality for tracking the execution steps.

  • 398-413: The QueryExecTracker instance is being used to get the summary of the execution tracking and the execution time. Ensure that the get_summary and get_execution_time methods of the QueryExecTracker class are implemented correctly and that they provide accurate information.

Comment on lines 1 to 32
import pandas as pd
from pandasai import Agent
from pandasai.custom_loggers.api_logger import APILogger

from pandasai.llm.openai import OpenAI

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


llm = OpenAI("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")
agent = Agent(
    [employees_df, salaries_df],
    config={"llm": llm, "enable_cache": True},
    memory_size=10,
    logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
)

# Chat with the agent
response = agent.chat("Who gets paid the most?")
print(response)

The code looks good overall. However, it's not a good practice to hardcode sensitive information like API keys directly in the code. Consider using environment variables or a secure configuration file to store these values. This will also make your code more flexible as you can change the values without modifying the code.

- llm = OpenAI("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")
+ import os
+ llm = OpenAI(os.getenv("OPENAI_KEY"))

- logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
+ logger=APILogger(os.getenv("SERVER_URL"), os.getenv("USER_ID"), os.getenv("API_KEY")),

Comment on lines 6 to 28
class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None
    _user_id: str = None

    def __init__(self, server_url: str, user_id: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url
        self._user_id = user_id

    def log(self, message: str, level: int = logging.INFO):
        try:
            log_data = {
                # TODO - Remove user id from the API
                "user_id": self._user_id,
                "api_key_id": self._api_key,
                "json_log": message,
            }
            response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
            if response.status_code != 200:
                raise Exception(response.text)
        except Exception:
            pass

The APILogger class extends the Logger class but does not call the parent's constructor in its __init__ method. This could lead to issues if the parent class Logger has important initializations in its constructor. Also, the log method silently passes any exceptions, which could make debugging difficult if there are issues with the logging. Consider logging the exception or re-raising it after handling.

    def __init__(self, server_url: str, user_id: str, api_key: str):
+        super().__init__()
        self._api_key = api_key
        self._server_url = server_url
        self._user_id = user_id
...
        except Exception as e:
+            super().error(f"Failed to log message: {e}")
            pass

Comment on lines 16 to 28
    def log(self, message: str, level: int = logging.INFO):
        try:
            log_data = {
                # TODO - Remove user id from the API
                "user_id": self._user_id,
                "api_key_id": self._api_key,
                "json_log": message,
            }
            response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
            if response.status_code != 200:
                raise Exception(response.text)
        except Exception:
            pass

The log method sends a POST request to the server every time it is called. This could be a performance issue if the method is called frequently. Consider using a buffer to store log messages and send them in batches.
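One way to reduce the request volume is to buffer log entries and flush them periodically. A minimal sketch, assuming a hypothetical batch endpoint (/api/log/add_batch) and a size-based flush policy; the real server route and flush strategy would need to match the backend:

import logging

import requests


class BufferedAPILogger:
    """Collects log payloads and posts them to the server in batches."""

    def __init__(self, server_url: str, api_key: str, batch_size: int = 20):
        self._server_url = server_url
        self._api_key = api_key
        self._batch_size = batch_size
        self._buffer = []

    def log(self, message, level: int = logging.INFO):
        self._buffer.append({"api_key_id": self._api_key, "json_log": message})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        try:
            # Hypothetical batch endpoint; the production server may expose a different route
            requests.post(f"{self._server_url}/api/log/add_batch", json={"logs": self._buffer})
        finally:
            self._buffer = []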

Comment on lines 14 to 204
        tracker._steps = []
        # Add the step to the tracker
        tracker.add_step(step)

        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        assert tracker._steps[0] == step

    def test_execute_func_success(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_return_value = Mock()
        mock_return_value.to_string = Mock()
        mock_return_value.to_string.return_value = "Mock Result"

        mock_func = Mock()
        mock_func.return_value = mock_return_value

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_get_prompt")

        # Check if the result is as expected
        assert result.to_string() == "Mock Result"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Generate Prompt"
        assert step["success"] is True

    def test_execute_func_failure(self, tracker: QueryExecTracker):
        # Create a mock function that raises an exception
        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        # Execute the mock function using execute_func and expect an exception
        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="custom_tag")

    def test_format_response_dataframe(
        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
    ):
        # Create a sample ResponseType for a dataframe
        response = {"type": "dataframe", "value": sample_df}

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "dataframe"
        assert len(formatted_response["value"]["headers"]) == 3
        assert len(formatted_response["value"]["rows"]) == 10

    def test_format_response_other_type(self, tracker: QueryExecTracker):
        # Create a sample ResponseType for a non-dataframe response
        response = {
            "type": "other_type",
            "value": "SomeValue",
        }

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "other_type"
        assert formatted_response["value"] == "SomeValue"

    def test_get_summary(self, tracker: QueryExecTracker):
        # Execute a mock function to generate some steps and response
        def mock_function(*args, **kwargs):
            return "Mock Result"

        tracker.execute_func(mock_function, tag="custom_tag")

        # Get the summary
        summary = tracker.get_summary()

        # Check if the summary contains the expected keys
        assert "query_info" in summary
        assert "dataframes" in summary
        assert "steps" in summary
        assert "response" in summary
        assert "execution_time" in summary

    def test_get_execution_time(self, tracker: QueryExecTracker):
        def mock_function(*args, **kwargs):
            time.sleep(1)
            return "Mock Result"

        # Sleep for a while to simulate execution time
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        print("Type", execution_time)

        # Check if the execution time is approximately 1 second
        self.assert_almost_equal(execution_time, 1.0, delta=0.1)

The test cases are well written and cover all the methods in the QueryExecTracker class. However, there are a few areas that could be improved:

  1. The assert_almost_equal method (lines 80-87) is a custom implementation of a function that already exists in the unittest module. It would be better to use the built-in function to reduce code complexity and potential errors.

  2. The test_execute_func_failure test (lines 135-143) could be improved by checking the specific exception message. This would ensure that the function fails for the expected reason.

  3. The test_get_execution_time test (lines 189-204) uses a time.sleep call to simulate execution time. This could potentially slow down the test suite. Consider using mocking to simulate the passage of time instead.

Here are the suggested changes:

-    # Define a custom assert_almost_equal function
-    def assert_almost_equal(self, first, second, places=None, msg=None, delta=None):
-        if delta is not None:
-            if abs(float(first) - float(second)) > delta:
-                raise AssertionError(msg or f"{first} != {second} within {delta}")
-        else:
-            assert round(abs(float(second) - float(first)), places) == 0, (
-                msg or f"{first} != {second} within {places} places"
-            )

+    from unittest import TestCase
+    assert_almost_equal = TestCase().assertAlmostEqual

-    # Execute the mock function using execute_func and expect an exception
-    with pytest.raises(Exception):
-        tracker.execute_func(mock_function, tag="custom_tag")

+    # Execute the mock function using execute_func and expect a specific exception
+    with pytest.raises(Exception) as e:
+        tracker.execute_func(mock_function, tag="custom_tag")
+    assert str(e.value) == "Mock Exception", "Unexpected exception message"

-    def mock_function(*args, **kwargs):
-        time.sleep(1)
-        return "Mock Result"
-
-    # Sleep for a while to simulate execution time
-    with patch("time.time", return_value=0):
-        tracker.execute_func(mock_function, tag="cache_hit")

+    # Mock time.sleep to simulate execution time without actually delaying the test
+    with patch("time.sleep", new=lambda x: None):
+        tracker.execute_func(mock_function, tag="cache_hit")


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 12

Commits: files that changed from the base of the PR and between 45672fe and 7c396f7.
Files selected for processing (3)
  • examples/custom_logger.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 11
examples/custom_logger.py (3)
  • 1-32: The Agent class is being initialized with a logger instance of APILogger. Ensure that the server at "SERVER-URL" is set up to receive and handle log messages correctly. Also, verify that the "USER-ID" and "API-KEY" are valid and have the necessary permissions to send log messages to the server.

  • 22-28: The OpenAI class is being initialized with "Your-API-Key". Make sure to replace "Your-API-Key" with a valid API key. Also, ensure that the Agent class is compatible with the OpenAI class and can handle any exceptions or errors that may be thrown by it.

  • 31-32: The chat method is being called on the Agent instance. Ensure that the Agent class has a chat method and that it can handle the string "Who gets paid the most?" as an argument. Also, verify that the chat method returns a response that can be printed.

pandasai/smart_datalake/__init__.py (8)
  • 20-29: The import of time module has been removed and QueryExecTracker from pandasai.helpers.query_exec_tracker has been added. Ensure that the QueryExecTracker class is correctly implemented and that the removal of time does not affect other parts of the code.

  • 190-195: The _start_timer method has been removed. This change is acceptable if the timing functionality is now handled by the QueryExecTracker class. Verify that this is the case.

  • 276-292: The QueryExecTracker instance is created and dataframes are added to it. This is a good practice as it encapsulates the tracking logic in a separate class, improving modularity and maintainability.

  • 298-324: The QueryExecTracker's execute_func method is used to execute functions and track their execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 341-348: The QueryExecTracker's execute_func method is used to execute the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 363-368: The QueryExecTracker's execute_func method is used to retry running the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 374-395: The QueryExecTracker's add_step method is used to add steps to the tracker. This is a good practice as it allows for more granular tracking of the execution process.

  • 396-412: The QueryExecTracker's get_summary and get_execution_time methods are used to log the summary and execution time. This is a good practice as it allows for more granular tracking of the execution process.

Comment on lines +94 to +103
        # Create a sample step
        step = {"type": "CustomStep", "description": "This is a custom step."}

        tracker._steps = []
        # Add the step to the tracker
        tracker.add_step(step)

        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        assert tracker._steps[0] == step

The test test_add_step assumes that the step will always be added at the end of the _steps list. This might not be the case if the implementation of add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list, not necessarily at the end.
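For example, a membership check keeps the test valid even if add_step later changes where it stores the step (a sketch against the current fixture):

        # Position-independent assertion
        assert step in tracker._steps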

Comment on lines +105 to +118
    def test_format_response_dataframe(
        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
    ):
        # Create a sample ResponseType for a dataframe
        response = {"type": "dataframe", "value": sample_df}

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "dataframe"
        assert len(formatted_response["value"]["headers"]) == 3
        assert len(formatted_response["value"]["rows"]) == 10


The test test_format_response_dataframe assumes that the dataframe will always have 10 rows. This might not be the case if the dataframe changes in the future. Consider making the test more flexible by checking the actual number of rows in the dataframe.
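A sketch of assertions derived from the fixture itself rather than hard-coded counts, assuming _format_response keeps mirroring the dataframe's columns and rows as it does today:

        assert formatted_response["value"]["headers"] == sample_df.columns.tolist()
        assert len(formatted_response["value"]["rows"]) == len(sample_df)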

Comment on lines +133 to +149
    def test_get_summary(self, tracker: QueryExecTracker):
        # Execute a mock function to generate some steps and response
        def mock_function(*args, **kwargs):
            return "Mock Result"

        tracker.execute_func(mock_function, tag="custom_tag")

        # Get the summary
        summary = tracker.get_summary()

        # Check if the summary contains the expected keys
        assert "query_info" in summary
        assert "dataframes" in summary
        assert "steps" in summary
        assert "response" in summary
        assert "execution_time" in summary


The test test_get_summary assumes that the summary will always contain the keys "query_info", "dataframes", "steps", "response", and "execution_time". This might not be the case if the implementation of get_summary changes in the future. Consider making the test more flexible by checking if the keys are in the summary, not necessarily all of them.
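One way to express that, sketched as a subset check over only the keys the test actually relies on:

        expected_keys = {"query_info", "steps", "execution_time"}
        assert expected_keys.issubset(summary.keys())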

Comment on lines +150 to +163
    def test_get_execution_time(self, tracker: QueryExecTracker):
        def mock_function(*args, **kwargs):
            time.sleep(1)
            return "Mock Result"

        # Sleep for a while to simulate execution time
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time assumes that the execution time will always be approximately 1 second. This might not be the case if the implementation of execute_func or the system's performance changes in the future. Consider making the test more flexible by checking if the execution time is within a reasonable range.

Comment on lines 165 to 186
    def test_execute_func_success(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_return_value = Mock()
        mock_return_value.to_string = Mock()
        mock_return_value.to_string.return_value = "Mock Result"

        mock_func = Mock()
        mock_func.return_value = mock_return_value

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_get_prompt")

        # Check if the result is as expected
        assert result.to_string() == "Mock Result"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Generate Prompt"
        assert step["success"] is True


The test test_execute_func_success assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Generate Prompt". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.
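A looser variant might assert membership and allow any of the known step types (the set below is illustrative, not exhaustive):

        assert step in tracker._steps
        assert step["type"] in {"Generate Prompt", "Generate Code", "Cache Hit"}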

Comment on lines 196 to 212
        tracker._steps = []

        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="cache_hit")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Cache Hit"
        assert step["success"] is True


The test test_execute_func_cache_hit assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Cache Hit". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 213 to 230
    def test_execute_func_generate_code(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="generate_code")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Generate Code"
        assert step["success"] is True

The test test_execute_func_generate_code assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Generate Code". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 232 to 249
    def test_execute_func_re_rerun_code(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_retry_run_code")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Retry Code Generation"
        assert step["success"] is True

The test test_execute_func_re_rerun_code assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Retry Code Generation". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 251 to 270
    def test_execute_func_execute_code_success(
        self, sample_df: pd.DataFrame, tracker: QueryExecTracker
    ):
        tracker._steps = []

        mock_func = Mock()
        mock_func.return_value = {"type": "dataframe", "value": sample_df}

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="execute_code")

        # Check if the result is as expected
        assert result["type"] == "dataframe"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "result" in step
        assert step["type"] == "Code Execution"
        assert step["success"] is True


The test test_execute_func_execute_code_success assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Code Execution". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 271 to 285
    def test_execute_func_execute_code_fail(
        self, sample_df: pd.DataFrame, tracker: QueryExecTracker
    ):
        tracker._steps = []

        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="execute_code")

        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Code Execution"
        assert step["success"] is False

The test test_execute_func_execute_code_fail assumes that the function will always raise an exception and that the type of the step will always be "Code Execution". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the function raises an exception under certain conditions and if the type of the step is one of the possible types.


codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #630 (5fcf2d9) into main (93996fb) will increase coverage by 0.44%.
Report is 7 commits behind head on main.
The diff coverage is 97.61%.


@@            Coverage Diff             @@
##             main     #630      +/-   ##
==========================================
+ Coverage   84.29%   84.73%   +0.44%     
==========================================
  Files          65       66       +1     
  Lines        3012     3099      +87     
==========================================
+ Hits         2539     2626      +87     
  Misses        473      473              
Files Coverage Δ
pandasai/helpers/query_exec_tracker.py 98.50% <98.50%> (ø)
pandasai/smart_datalake/__init__.py 93.33% <94.11%> (+0.12%) ⬆️

... and 7 files with indirect coverage changes



@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 6

Commits: files that changed from the base of the PR and between 7c396f7 and c957373.
Files selected for processing (1)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)

Comment on lines 62 to 102
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if func_name not in exec_steps:
                return result

            step_data = self._generate_exec_step(func_name, result)

            step_data["type"] = exec_steps[func_name]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[func_name],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise


The execute_func method is doing too much. It's executing a function, timing it, generating step data, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve readability and maintainability.
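One possible decomposition, sketched under the assumption that exec_steps and _generate_exec_step keep their current shape; the helper name _record_step is hypothetical:

    def execute_func(self, function, *args, **kwargs) -> Any:
        tag = kwargs.pop("tag", function.__name__)
        start_time = time.time()
        try:
            result = function(*args, **kwargs)
            self._record_step(tag, result, success=True, start_time=start_time)
            return result
        except Exception:
            self._record_step(tag, None, success=False, start_time=start_time)
            raise

    def _record_step(self, tag, result, success, start_time):
        # Only tags listed in exec_steps are recorded
        if tag not in exec_steps:
            return
        step_data = self._generate_exec_step(tag, result) if success else {}
        step_data.update(
            {
                "type": exec_steps[tag],
                "success": success,
                "execution_time": time.time() - start_time,
            }
        )
        self._steps.append(step_data)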

Comment on lines 73 to 75
        func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

        try:

The tag keyword argument is used to override the function name in the tracking data. This could lead to confusion if the tag does not accurately represent the function being executed. Consider removing this feature or ensuring that the tag is always representative of the function.

Comment on lines 92 to 101
        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[func_name],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise

The exception handling here is too broad. Catching all exceptions and simply re-raising them after logging the failure does not provide much value. Consider catching specific exceptions that you can handle in a meaningful way.

Comment on lines 103 to 127
    def _generate_exec_step(self, func_name: str, result: Any) -> dict:
        """
        Extracts and Generates result
        Args:
            func_name (str): function name that is executed
            result (Any): function output response

        Returns:
            dict: dictionary with information about the function execution
        """
        if (
            func_name == "cache_hit"
            or func_name == "generate_code"
            or func_name == "_retry_run_code"
        ):
            return {"code_generated": result}
        elif func_name == "_get_prompt":
            return {
                "prompt_class": result.__class__.__name__,
                "generated_prompt": result.to_string(),
            }
        elif func_name == "execute_code":
            self._response = self._format_response(result)
            return {"result": result}


The _generate_exec_step method uses a series of if statements to determine what data to return based on the function name. This is not very scalable or maintainable. Consider using a dictionary mapping function names to handler functions, or redesigning this part of the system to avoid the need for this method.
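A sketch of the dispatch-table approach; the helper _handle_execute_code is hypothetical and exists only to keep the side effect on self._response out of the table:

    def _generate_exec_step(self, func_name: str, result: Any) -> dict:
        handlers = {
            "cache_hit": lambda r: {"code_generated": r},
            "generate_code": lambda r: {"code_generated": r},
            "_retry_run_code": lambda r: {"code_generated": r},
            "_get_prompt": lambda r: {
                "prompt_class": r.__class__.__name__,
                "generated_prompt": r.to_string(),
            },
            "execute_code": self._handle_execute_code,
        }
        handler = handlers.get(func_name)
        return handler(result) if handler else {}

    def _handle_execute_code(self, result: Any) -> dict:
        self._response = self._format_response(result)
        return {"result": result}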

Comment on lines 156 to 160
        summary = self.get_summary_dict()
        try:
            return json.dumps(summary, indent=4)
        except Exception:
            return summary

The get_summary method attempts to return a JSON string representation of the summary, but falls back to returning the raw dictionary if this fails. This could lead to inconsistent return types. Consider always returning the raw dictionary and letting the caller decide how to serialize it.
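A sketch of the dictionary-only variant, assuming get_summary_dict keeps its current behaviour; serialization moves to the caller:

    def get_summary(self) -> dict:
        # Always return the raw dictionary; callers serialize if they need JSON
        return self.get_summary_dict()

Call sites that need JSON can then do json.dumps(tracker.get_summary(), indent=4, default=str), with default=str guarding against non-serializable values.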

@ArslanSaleem ArslanSaleem marked this pull request as draft October 10, 2023 14:26
@ArslanSaleem ArslanSaleem marked this pull request as ready for review October 10, 2023 16:47

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 8

Commits: files that changed from the base of the PR and between c957373 and f32f2a1.
Files selected for processing (2)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 1
pandasai/helpers/query_exec_tracker.py (1)
  • 33-39: The conversation_id is being converted to a string. If it's not a string, this could lead to unexpected behavior. Verify the type of conversation_id before passing it to the QueryExecTracker.

Comment on lines +85 to +92
        tracker._dataframes = []
        tracker.add_dataframes([smart_dataframe])

        # Check if the dataframe was added correctly
        assert len(tracker._dataframes) == 1
        assert len(tracker._dataframes[0]["headers"]) == 3
        assert len(tracker._dataframes[0]["rows"]) == 5


The test test_add_dataframes assumes that the dataframe will always have 3 headers and 5 rows. This might not be the case for different dataframes. Consider making the test more generic to handle different dataframes.

Comment on lines +156 to +163
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time assumes that the execution time will always be approximately 1 second. This might not be the case for different functions or systems. Consider making the test more generic to handle different execution times.

Comment on lines +189 to +195
        # Create a mock function that raises an exception
        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="custom_tag")


The test test_execute_func_failure does not check if the step was added correctly to the tracker after the function raises an exception. Consider adding assertions to check the step in the tracker.
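A sketch of those assertions, using a tag the tracker recognises (e.g. "execute_code") so that the failed call is still recorded as a step:

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="execute_code")

        # The failure should still be recorded as an unsuccessful step
        assert len(tracker._steps) == 1
        assert tracker._steps[0]["success"] is False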

Comment on lines +280 to +291

        mock_func = Mock()
        mock_func.side_effect = Exception("Mock Exception")
        mock_func.__name__ = "execute_code"

        with pytest.raises(Exception):
            tracker.execute_func(mock_func)

        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Code Execution"
        assert step["success"] is False

The test test_execute_func_execute_code_fail does not check if the error message in the step is correct after the function raises an exception. Consider adding assertions to check the error message in the step.

Comment on lines +20 to +24
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None

The class variables _query_info, _dataframes, _response, _steps, and _start_time should be instance variables. Class variables are shared across all instances of the class, which is not the desired behavior here. Each QueryExecTracker instance should have its own state.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None

Comment on lines 113 to 127
        if (
            func_name == "cache_hit"
            or func_name == "generate_code"
            or func_name == "_retry_run_code"
        ):
            return {"code_generated": result}
        elif func_name == "_get_prompt":
            return {
                "prompt_class": result.__class__.__name__,
                "generated_prompt": result.to_string(),
            }
        elif func_name == "execute_code":
            self._response = self._format_response(result)
            return {"result": result}


The _generate_exec_step method is using a series of if statements to determine the function name. This could be simplified and made more maintainable by using a dictionary mapping function names to handlers.

Comment on lines +128 to +148
    def _format_response(self, result: ResponseType) -> ResponseType:
        """
        Format output response
        Args:
            result (ResponseType): response returned after execution

        Returns:
            ResponseType: formatted response output
        """
        formatted_result = {}
        if result["type"] == "dataframe":
            formatted_result = {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
            return formatted_result
        else:
            return result

The _format_response method is only formatting the response if the type is "dataframe". If the type is not "dataframe", it returns the result as is. This could lead to inconsistent response formats. Consider handling other types as well.
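A sketch of a fallback branch that keeps the output shape uniform for non-dataframe results (stringifying the value is an assumption about what the log server can accept):

    def _format_response(self, result: ResponseType) -> ResponseType:
        if result["type"] == "dataframe":
            return {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
        # Fallback: normalise every other type to a plain string value
        return {"type": result["type"], "value": str(result["value"])}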


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 6

Commits: files that changed from the base of the PR and between f32f2a1 and 5fcf2d9.
Files selected for processing (4)
  • examples/custom_logger.py (1 hunks)
  • pandasai/custom_loggers/api_logger.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
Additional comments (Suppressed): 8
examples/custom_logger.py (1)
  • 1-36: The new hunk demonstrates the usage of the APILogger class. It creates an instance of Agent with APILogger as the logger. The chat method is then called on the Agent instance, and the response is printed. The code seems to be logically correct and follows best practices. However, ensure that the "SERVER-URL" and "API-KEY" are replaced with actual values when using this code. Also, ensure that the "OPEN-API-KEY" is replaced with your actual OpenAI API key.
pandasai/smart_datalake/__init__.py (7)
  • 25-29: The QueryExecTracker class is imported but the time module is no longer imported. This suggests that the QueryExecTracker class is being used to track the execution time of queries, which was previously done using the time module. Ensure that the QueryExecTracker class provides the same functionality as the time module and that it is used correctly throughout the code.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of queries. Ensure that the removal of this method does not affect the functionality of the code and that the execution time of queries is still being tracked correctly.

  • 282-288: A QueryExecTracker object is created and dataframes are added to it. This object is used to track the execution of queries. Ensure that the QueryExecTracker object is used correctly throughout the code and that it provides the same functionality as the previous method of tracking query execution.

  • 366-368: The execute_func method of the QueryExecTracker object is used to execute functions while tracking their execution time. Ensure that this method is used correctly and that it provides the same functionality as the previous method of executing functions.

  • 377-391: Steps are added to the QueryExecTracker object. These steps represent the different stages of the query execution process. Ensure that these steps are added correctly and that they accurately represent the query execution process.

  • 407-407: The get_execution_time method of the QueryExecTracker object is used to get the execution time of the query. Ensure that this method provides the same functionality as the previous method of getting the execution time.

  • 415-415: The execute_func method of the QueryExecTracker object is used to parse the result of the query, and the get_summary method is used to get a summary of the query execution. Ensure that these methods are used correctly and that they provide the same functionality as the previous methods of parsing the result and getting a summary.

Comment on lines 1 to 29
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "api_key_id": self._api_key,
                    "json_log": message,
                }
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")

The APILogger class is a good example of a custom logger that sends log messages to a server. However, there are a few areas that could be improved:

  1. Error handling: In the log method, you're catching all exceptions and printing them. This is not a good practice because it can make debugging difficult. Instead, consider logging the error message and re-throwing the exception, or handling specific exceptions that you expect might occur.

  2. Security: You're sending the API key in the body of the POST request. This could potentially expose the API key to anyone who has access to the logs or network traffic. Consider sending the API key in the headers of the request instead.

  3. Performance: The log method sends a POST request every time it's called. This could potentially slow down your application if you're logging a lot of messages. Consider batching log messages and sending them in one request, or using a background thread to send log messages.

Here's how you might address these issues:

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            logging.error(f"Exception in APILogger: {e}")
            raise

Comment on lines 21 to 27
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None
    _success: bool = False


The class variables are shared across all instances of the class. This could lead to unexpected behavior if multiple instances of QueryExecTracker are created. These should be instance variables, defined inside the __init__ method.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
-    _success: bool = False
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None
+        self._success: bool = False
+        ...

Comment on lines +64 to +105
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
        tag = kwargs.pop("tag", function.__name__)

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise


The execute_func method is doing too much. It's executing a function, timing it, handling exceptions, and adding steps. Consider breaking this down into smaller methods for better maintainability and readability.

Comment on lines +81 to +93
            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

The if tag not in exec_steps: check could lead to silent failures if a tag is misspelled or not included in exec_steps. Consider raising an exception or logging a warning if the tag is not found.
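One way to surface the problem instead of skipping it silently, sketched with the standard logging module (the tracker currently has no logger of its own, so that choice is an assumption):

            if tag not in exec_steps:
                # Warn so that misspelled or unregistered tags are visible in the logs
                logging.warning("QueryExecTracker: unrecognised tag %r; step not recorded", tag)
                return result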

Comment on lines +95 to +104
        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise

When an exception occurs, the step is added with a success value of False, but the _success attribute of the QueryExecTracker instance is not updated. This could lead to inconsistencies in the reported success status.

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
+            self._success = False
            raise

Comment on lines +155 to +169
    def get_summary(self) -> dict:
        """
        Returns the summary in json to steps involved in execution of track
        Returns:
            dict: summary json
        """
        execution_time = time.time() - self._start_time
        return {
            "query_info": self._query_info,
            "dataframes": self._dataframes,
            "steps": self._steps,
            "response": self._response,
            "execution_time": execution_time,
            "success": self._success,
        }

The get_summary method does not update the _success attribute before returning the summary. This could lead to inconsistencies if the success property setter is used after the summary is generated.

    def get_summary(self) -> dict:
        """
        Returns the summary in json to steps involved in execution of track
        Returns:
            dict: summary json
        """
        execution_time = time.time() - self._start_time
+        self._success = all(step.get('success', False) for step in self._steps)
        return {
            "query_info": self._query_info,
            "dataframes": self._dataframes,
            "steps": self._steps,
            "response": self._response,
            "execution_time": execution_time,
            "success": self._success,
        }


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 1

Commits: files that changed from the base of the PR and between 5fcf2d9 and 1797f62.
Files selected for processing (1)
  • pandasai/custom_loggers/api_logger.py (1 hunks)

Comment on lines 1 to 29
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")

The APILogger class extends the Logger class but does not call the parent class's constructor. This could lead to issues if the parent class's constructor has important initializations. Also, the log method does not handle the case when the message is a string. It only handles the case when message is a dictionary. This could lead to unexpected behavior if a string message is passed. Lastly, the log method catches all exceptions and prints them, but it does not re-raise them. This could hide errors and make debugging difficult. Consider re-raising the exception after logging it.

class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
+        super().__init__()
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)
+            else:
+                log_data = {
+                    "text_log": message,
+                }
+                headers = {"Authorization": f"Bearer {self._api_key}"}
+                response = requests.post(
+                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
+                )
+                if response.status_code != 200:
+                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")
+            raise


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 8

Commits: files that changed from the base of the PR and between 1797f62 and abd6b25.
Files selected for processing (5)
  • examples/using_pandasai_log_server.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/schemas/df_config.py (3 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_query_tracker.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • examples/using_pandasai_log_server.py
Additional comments (Suppressed): 10
pandasai/schemas/df_config.py (2)
  • 8-14: The LogServerConfig class has been added. It's a TypedDict that expects server_url and api_key as keys. Ensure that the server URL and API key are being properly validated and secured elsewhere in the code, as they are sensitive data.

  • 34-34: The log_server attribute has been added to the Config class. It's of type LogServerConfig. Ensure that all instances of the Config class in the codebase are updated to include this new attribute if necessary.

-    llm: Any = None
+    llm: Any = None
+    log_server: LogServerConfig = None
pandasai/smart_datalake/__init__.py (8)
  • 20-29: The time module has been removed and replaced with the QueryExecTracker class for tracking execution time. Ensure that the QueryExecTracker class provides the same or better functionality as the time module. Also, verify that the QueryExecTracker class has been thoroughly tested and is reliable.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of a query. This functionality appears to have been replaced by the QueryExecTracker class. Ensure that the QueryExecTracker class provides the same or better functionality as the _start_timer method.

  • 282-288: The QueryExecTracker object is created with several parameters. Ensure that these parameters are always available and correctly set when this method is called. Also, verify that the QueryExecTracker class handles these parameters correctly.

  • 300-326: The QueryExecTracker class is used to execute functions and track their execution time. This is a significant change from the previous implementation, which called the functions directly (a minimal usage sketch follows this list). Ensure that the QueryExecTracker class correctly executes the functions and accurately tracks their execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 343-350: The QueryExecTracker class is used to execute the execute_code method of the _code_manager object. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 365-370: The QueryExecTracker class is used to execute the _retry_run_code method. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 379-393: The QueryExecTracker class is used to add steps to the tracker. Ensure that the QueryExecTracker class correctly adds these steps and that the steps are accurately represented in the tracker.

  • 409-417: The QueryExecTracker class is used to get the execution time and parse the result. Ensure that the QueryExecTracker class correctly performs these operations and accurately tracks the execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
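
To make two of the bullets above concrete, here is a hedged reconstruction of the described LogServerConfig schema. The exact definition in pandasai/schemas/df_config.py may differ (field optionality, defaults, and the surrounding Config model are assumptions):

from typing import TypedDict

class LogServerConfig(TypedDict):
    server_url: str
    api_key: str

config = {
    "log_server": LogServerConfig(
        server_url="https://logs.example.com",  # placeholder URL, not a real endpoint
        api_key="YOUR_API_KEY",  # never hard-code real keys
    )
}

And a minimal usage sketch of the execute_func wrapping pattern referred to in the smart_datalake bullets. The constructor arguments mirror the __init__ signature shown further down in this review; the tag value and the stand-in function are illustrative assumptions, not the datalake's real pipeline:

from pandasai.helpers.query_exec_tracker import QueryExecTracker

tracker = QueryExecTracker(
    conversation_id="conv-1",
    query="What is the sum of the gdp column?",
    instance="SmartDatalake",
    output_type="string",
)

def generate_code(prompt: str) -> str:
    # Stand-in for the LLM call the real pipeline would make.
    return "result = df['gdp'].sum()"

# Instead of calling generate_code(prompt) directly, the call is wrapped;
# the tag decides which exec_steps entry the timing is recorded under
# (unknown tags simply return the result without recording a step).
code = tracker.execute_func(generate_code, "sum the gdp column", tag="generate_code")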

Comment on lines +24 to +31
_query_info: str = {}
_dataframes: List = []
_response: ResponseType = {}
_steps: List = []
_start_time = None
_success: bool = False
_server_config: dict = None


The class-level attributes are shared among all instances of the class: because the mutable defaults (lists and dicts) are defined on the class itself, state written through one QueryExecTracker instance can leak into every other instance, and concurrent use only makes the resulting inconsistencies worse. Consider moving these variables to the __init__ method to make them instance variables.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
-    _success: bool = False
-    _server_config: dict = None
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+        server_config: Union[dict, None] = None,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None
+        self._success: bool = False
+        self._server_config: dict = server_config

Comment on lines +70 to +110
def execute_func(self, function, *args, **kwargs) -> Any:
"""
Tracks function executions, calculates execution time and prepare data
Args:
function (function): Function that is to be executed

Returns:
Any: Response return after function execution
"""
start_time = time.time()

# Get the tag from kwargs if provided, or use the function name as the default
tag = kwargs.pop("tag", function.__name__)

try:
result = function(*args, **kwargs)

execution_time = time.time() - start_time
if tag not in exec_steps:
return result

step_data = self._generate_exec_step(tag, result)

step_data["type"] = exec_steps[tag]
step_data["success"] = True
step_data["execution_time"] = execution_time

self._steps.append(step_data)

return result

except Exception:
execution_time = time.time() - start_time
self._steps.append(
{
"type": exec_steps[tag],
"success": False,
"execution_time": execution_time,
}
)
raise

The execute_func method is doing too much. It's executing a function, timing it, generating execution steps, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve maintainability and readability.
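
One possible decomposition, offered only as a hedged sketch: the helper names _time_call and _record_step are invented for illustration, and the exec_steps filtering and _generate_exec_step logic from the PR are deliberately omitted. The point is only that timing, step recording, and error propagation can live in separate, individually testable helpers.

import time
from typing import Any, Callable, Dict, List, Optional, Tuple

class QueryExecTrackerSketch:
    """Illustrative only: timing, step recording, and error handling split apart."""

    def __init__(self) -> None:
        self._steps: List[Dict[str, Any]] = []

    def _time_call(self, function: Callable, *args: Any, **kwargs: Any) -> Tuple[Any, float, Optional[Exception]]:
        # Run the function and report (result, elapsed seconds, error or None).
        start = time.time()
        try:
            return function(*args, **kwargs), time.time() - start, None
        except Exception as exc:  # broad catch mirrors the PR's behaviour
            return None, time.time() - start, exc

    def _record_step(self, tag: str, execution_time: float, success: bool, result: Any = None) -> None:
        step: Dict[str, Any] = {"type": tag, "success": success, "execution_time": execution_time}
        if success:
            step["result"] = result
        self._steps.append(step)

    def execute_func(self, function: Callable, *args: Any, **kwargs: Any) -> Any:
        tag = kwargs.pop("tag", function.__name__)
        result, execution_time, error = self._time_call(function, *args, **kwargs)
        self._record_step(tag, execution_time, success=error is None, result=result)
        if error is not None:
            raise error
        return result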

raise Exception(response.text)

except Exception as e:
print(f"Exception in APILogger: {e}")

Avoid using print statements for error logging. Consider using the logging module, which provides more flexibility and control over log levels, formats, and destinations.

-            print(f"Exception in APILogger: {e}")
+            logging.error(f"Exception in APILogger: {e}")
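
If the project prefers per-module loggers over calls on the root logger, a small variation (names assumed, not part of the suggested change) would be:

import logging

# A module-level logger lets APILogger's messages be filtered by name
# through the standard logging configuration.
logger = logging.getLogger(__name__)

def report_failure(exc: Exception) -> None:
    # Log the exception at ERROR level; the caller is expected to re-raise.
    logger.error("Exception in APILogger: %s", exc)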

Comment on lines +216 to +221
def success(self):
return self._success

@success.setter
def success(self, value):
self._success = value

The success property setter is not validating the input. Consider adding a check to ensure that value is a boolean to prevent invalid data types from being assigned.

+    @success.setter
+    def success(self, value):
+        if not isinstance(value, bool):
+            raise ValueError("Success value must be a boolean.")
+        self._success = value

Comment on lines +82 to +93
def test_add_dataframes(
self, smart_dataframe: SmartDataframe, tracker: QueryExecTracker
):
# Add the dataframe to the tracker
tracker._dataframes = []
tracker.add_dataframes([smart_dataframe])

# Check if the dataframe was added correctly
assert len(tracker._dataframes) == 1
assert len(tracker._dataframes[0]["headers"]) == 3
assert len(tracker._dataframes[0]["rows"]) == 5


The test test_add_dataframes is not comprehensive. It only checks how many dataframes were tracked and how many headers and rows the entry contains. It would be better to also check the actual content of the dataframe to ensure it was added correctly.

+        # Check if the dataframe was added correctly
+        assert tracker._dataframes[0]["dataframe"].equals(smart_dataframe)

Comment on lines +152 to +164
def mock_function(*args, **kwargs):
time.sleep(1)
return "Mock Result"

# Sleep for a while to simulate execution time
with patch("time.time", return_value=0):
tracker.execute_func(mock_function, tag="cache_hit")

# Get the execution time
execution_time = tracker.get_execution_time()

# Check if the execution time is approximately 1 second
assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time is not reliable. It assumes that the execution time of the mock_function is always 1 second. This might not be the case in different environments or under different loads. Consider mocking the time.sleep function instead of assuming its behavior.

-        def mock_function(*args, **kwargs):
-            time.sleep(1)
-            return "Mock Result"
-
-        # Sleep for a while to simulate execution time
-        with patch("time.time", return_value=0):
-            tracker.execute_func(mock_function, tag="cache_hit")
+        with patch("time.sleep") as mock_sleep:
+            def mock_function(*args, **kwargs):
+                time.sleep(1)
+                return "Mock Result"
+
+            # Sleep for a while to simulate execution time
+            with patch("time.time", return_value=0):
+                tracker.execute_func(mock_function, tag="cache_hit")
+            mock_sleep.assert_called_once_with(1)
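
If flakiness remains a concern, another option (not part of the suggested change) is to drive the clock deterministically instead of sleeping at all. The assertion below assumes the tracker reads time.time() exactly once at the start and once at the end of the wrapped call, which is what the execute_func shown earlier in this review does:

import itertools
from unittest.mock import patch

def test_get_execution_time_deterministic(tracker):
    def mock_function(*args, **kwargs):
        return "Mock Result"

    # Each call to time.time() returns 0, 1, 2, ... so the wrapped call
    # appears to take exactly one second without any real sleeping.
    fake_clock = itertools.count()
    with patch("time.time", side_effect=lambda: next(fake_clock)):
        tracker.execute_func(mock_function, tag="cache_hit")

    # Holds under the assumption above and provided "cache_hit" is a tracked tag.
    assert tracker._steps[-1]["execution_time"] == 1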

Comment on lines +294 to +326
def test_publish_method_with_server_key(self, tracker: QueryExecTracker):
# Define a mock summary function
def mock_get_summary():
return "Test summary data"

# Mock the server_config
tracker._server_config = {
"server_url": "http://custom-server",
"api_key": "custom-api-key",
}

# Set the get_summary method to your mock
tracker.get_summary = mock_get_summary

# Mock the requests.post method
mock_response = MagicMock()
mock_response.status_code = 200
type(mock_response).text = "Response text"
# Mock the requests.post method
with patch("requests.post", return_value=mock_response) as mock_post:
# Call the publish method
result = tracker.publish()

# Check that requests.post was called with the expected parameters
mock_post.assert_called_with(
"http://custom-server/api/log/add",
json={"json_log": "Test summary data"},
headers={"Authorization": "Bearer custom-api-key"},
)

# Check the result
assert result is None # The function should return None


The test test_publish_method_with_server_key is not comprehensive. It only checks if the requests.post method was called with the correct parameters. It would be better to also check the response of the requests.post method to ensure it behaves as expected.

+        # Check the response
+        assert mock_response.status_code == 200
+        assert mock_response.text == "Response text"

Comment on lines +352 to +381
def test_publish_method_with_os_env(self, tracker: QueryExecTracker):
# Define a mock summary function
def mock_get_summary():
return "Test summary data"

# Define a mock environment for testing
os.environ["LOGGING_SERVER_URL"] = "http://test-server"
os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"

# Set the get_summary method to your mock
tracker.get_summary = mock_get_summary

# Mock the requests.post method
mock_response = MagicMock()
mock_response.status_code = 200
type(mock_response).text = "Response text"
# Mock the requests.post method
with patch("requests.post", return_value=mock_response) as mock_post:
# Call the publish method
result = tracker.publish()

# Check that requests.post was called with the expected parameters
mock_post.assert_called_with(
"http://test-server/api/log/add",
json={"json_log": "Test summary data"},
headers={"Authorization": "Bearer test-api-key"},
)

# Check the result
assert result is None # The function should return None

The test test_publish_method_with_os_env is not cleaning up the environment variables it sets. This could potentially affect other tests. Consider using unittest.mock.patch.dict to temporarily set the environment variables for this test.

-        # Define a mock environment for testing
-        os.environ["LOGGING_SERVER_URL"] = "http://test-server"
-        os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"
+        # Define a mock environment for testing
+        mock_env = {
+            "LOGGING_SERVER_URL": "http://test-server",
+            "LOGGING_SERVER_API_KEY": "test-api-key",
+        }
+        with patch.dict(os.environ, mock_env):
+            # Set the get_summary method to your mock
+            tracker.get_summary = mock_get_summary
+
+            # Mock the requests.post method
+            mock_response = MagicMock()
+            mock_response.status_code = 200
+            type(mock_response).text = "Response text"
+            # Mock the requests.post method
+            with patch("requests.post", return_value=mock_response) as mock_post:
+                # Call the publish method
+                result = tracker.publish()

@gventuri gventuri changed the base branch from main to release/v1.4 October 13, 2023 23:06
coderabbitai bot commented Oct 13, 2023

CodeRabbit review skipped

By default, CodeRabbit only reviews PRs on the default branch. If you wish to have PRs reviewed on additional branches, you can configure that under the repository settings in the UI.

@gventuri gventuri merged commit 269a867 into release/v1.4 Oct 13, 2023
9 checks passed