feat(QueryTracker): track query execution cycle #630

Merged
merged 11 commits on Oct 13, 2023

Conversation

@ArslanSaleem ArslanSaleem commented Oct 10, 2023

Summary by CodeRabbit

  • New Feature: Introduced QueryExecTracker class in pandasai.helpers.query_exec_tracker for tracking the execution of queries, maintaining execution steps, and generating a summary of the execution.
  • New Feature: Added APILogger class in pandasai.custom_loggers.api_logger for sending log messages to a server.
  • Refactor: Updated pandasai.smart_datalake to use QueryExecTracker for tracking query execution, replacing the previous timer-based tracking.
  • Test: Added comprehensive tests for QueryExecTracker in tests/test_query_tracker.py to ensure correct tracking of query execution.
  • Chore: Updated examples in examples/custom_logger.py and examples/using_pandasai_log_server.py to demonstrate the usage of the new APILogger and QueryExecTracker classes.


coderabbitai bot commented Oct 10, 2023

Walkthrough

The changes introduce a QueryExecTracker class for tracking query execution and logging. The Agent class now uses this tracker, and the APILogger class sends logs to a server. Test coverage for the new tracker is added. A custom logger example and a log server configuration are also included.

Changes

  • pandasai/helpers/query_exec_tracker.py: introduced the QueryExecTracker class for tracking query execution and logging.
  • pandasai/smart_datalake/__init__.py: updated the Agent class to use QueryExecTracker for execution tracking and logging.
  • pandasai/custom_loggers/api_logger.py: added the APILogger class for sending logs to a server.
  • tests/test_query_tracker.py: added test coverage for QueryExecTracker.
  • examples/custom_logger.py: added an example of using a custom logger with an AI agent.
  • pandasai/schemas/df_config.py: added a log_server field to the Config class for log server configuration.

🐇
"In the land of code, where logic is king,
A rabbit hopped in, changes to bring.
With a tracker in hand, and a logger so bright,
He danced through the code, from morning till night.
Now queries are tracked, and logs are sent far,
In the world of AI, we've raised the bar!" 🌟


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • Mention @coderabbitai in any review comment for bot assistance.
  • Note: Review comments are made on code diffs or files, not on the PR overview.

Pause Incremental Reviews

  • Insert @coderabbitai: ignore in the PR description to halt the bot's ongoing reviews. Remove the line to resume.
  • Alternatively, comment @coderabbitai pause (as an issue or review comment) to halt the bot's ongoing reviews, and comment @coderabbitai resume to start them again.


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 7

Commits: files that changed from the base of the PR and between 3813663 and 45672fe.
Files selected for processing (7)
  • examples/custom_logger.py (1 hunks)
  • pandasai/custom_loggers/api_logger.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_codemanager.py (1 hunks)
  • tests/test_query_tracker.py (1 hunks)
  • tests/test_smartdatalake.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • tests/test_codemanager.py
  • tests/test_smartdatalake.py
Additional comments (Suppressed): 8
pandasai/smart_datalake/__init__.py (8)
  • 20-29: The time module has been removed from the imports. Ensure that this does not affect other parts of the code that might be using it. Also, the QueryExecTracker class has been imported from pandasai.helpers.query_exec_tracker. Verify that this class is implemented correctly and that it is available in the specified location.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of the query. If the execution time is now being tracked elsewhere, ensure that it is being done correctly.

  • 276-293: The QueryExecTracker instance is being created and used to track the execution of the query. Ensure that the QueryExecTracker class is implemented correctly and that it provides the necessary functionality for tracking the execution of the query.

  • 299-325: The QueryExecTracker instance is being used to execute functions and track their execution time. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 342-349: The QueryExecTracker instance is being used to execute the execute_code method of the _code_manager instance. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 364-369: The QueryExecTracker instance is being used to execute the _retry_run_code method. Ensure that the execute_func method of the QueryExecTracker class is implemented correctly and that it provides accurate timing information.

  • 375-392: The QueryExecTracker instance is being used to add steps to the execution tracking. Ensure that the add_step method of the QueryExecTracker class is implemented correctly and that it provides the necessary functionality for tracking the execution steps.

  • 398-413: The QueryExecTracker instance is being used to get the summary of the execution tracking and the execution time. Ensure that the get_summary and get_execution_time methods of the QueryExecTracker class are implemented correctly and that they provide accurate information.

Comment on lines 1 to 32
import pandas as pd
from pandasai import Agent
from pandasai.custom_loggers.api_logger import APILogger

from pandasai.llm.openai import OpenAI

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


llm = OpenAI("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")
agent = Agent(
    [employees_df, salaries_df],
    config={"llm": llm, "enable_cache": True},
    memory_size=10,
    logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
)

# Chat with the agent
response = agent.chat("Who gets paid the most?")
print(response)

The code looks good overall. However, it's not a good practice to hardcode sensitive information like API keys directly in the code. Consider using environment variables or a secure configuration file to store these values. This will also make your code more flexible as you can change the values without modifying the code.

- llm = OpenAI("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")
+ import os
+ llm = OpenAI(os.getenv("OPENAI_KEY"))

- logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
+ logger=APILogger(os.getenv("SERVER_URL"), os.getenv("USER_ID"), os.getenv("API_KEY")),

Comment on lines 6 to 28
class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None
    _user_id: str = None

    def __init__(self, server_url: str, user_id: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url
        self._user_id = user_id

    def log(self, message: str, level: int = logging.INFO):
        try:
            log_data = {
                # TODO - Remove user id from the API
                "user_id": self._user_id,
                "api_key_id": self._api_key,
                "json_log": message,
            }
            response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
            if response.status_code != 200:
                raise Exception(response.text)
        except Exception:
            pass

The APILogger class extends the Logger class but does not call the parent's constructor in its __init__ method. This could lead to issues if the parent class Logger has important initializations in its constructor. Also, the log method silently passes any exceptions, which could make debugging difficult if there are issues with the logging. Consider logging the exception or re-raising it after handling.

    def __init__(self, server_url: str, user_id: str, api_key: str):
+        super().__init__()
        self._api_key = api_key
        self._server_url = server_url
        self._user_id = user_id
...
        except Exception as e:
+            super().error(f"Failed to log message: {e}")
            pass

Comment on lines 16 to 28
    def log(self, message: str, level: int = logging.INFO):
        try:
            log_data = {
                # TODO - Remove user id from the API
                "user_id": self._user_id,
                "api_key_id": self._api_key,
                "json_log": message,
            }
            response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
            if response.status_code != 200:
                raise Exception(response.text)
        except Exception:
            pass

The log method sends a POST request to the server every time it is called. This could be a performance issue if the method is called frequently. Consider using a buffer to store log messages and send them in batches.
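One way to reduce the request volume is to buffer log entries and flush them periodically. A minimal sketch, assuming a hypothetical batch endpoint (/api/log/add_batch) and a size-based flush policy; the real server route and flush strategy would need to match the backend:

import logging

import requests


class BufferedAPILogger:
    """Collects log payloads and posts them to the server in batches."""

    def __init__(self, server_url: str, api_key: str, batch_size: int = 20):
        self._server_url = server_url
        self._api_key = api_key
        self._batch_size = batch_size
        self._buffer = []

    def log(self, message, level: int = logging.INFO):
        self._buffer.append({"api_key_id": self._api_key, "json_log": message})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        try:
            # Hypothetical batch endpoint; the production server may expose a different route
            requests.post(f"{self._server_url}/api/log/add_batch", json={"logs": self._buffer})
        finally:
            self._buffer = []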

Comment on lines 14 to 204
        tracker._steps = []
        # Add the step to the tracker
        tracker.add_step(step)

        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        assert tracker._steps[0] == step

    def test_execute_func_success(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_return_value = Mock()
        mock_return_value.to_string = Mock()
        mock_return_value.to_string.return_value = "Mock Result"

        mock_func = Mock()
        mock_func.return_value = mock_return_value

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_get_prompt")

        # Check if the result is as expected
        assert result.to_string() == "Mock Result"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Generate Prompt"
        assert step["success"] is True

    def test_execute_func_failure(self, tracker: QueryExecTracker):
        # Create a mock function that raises an exception
        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        # Execute the mock function using execute_func and expect an exception
        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="custom_tag")

    def test_format_response_dataframe(
        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
    ):
        # Create a sample ResponseType for a dataframe
        response = {"type": "dataframe", "value": sample_df}

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "dataframe"
        assert len(formatted_response["value"]["headers"]) == 3
        assert len(formatted_response["value"]["rows"]) == 10

    def test_format_response_other_type(self, tracker: QueryExecTracker):
        # Create a sample ResponseType for a non-dataframe response
        response = {
            "type": "other_type",
            "value": "SomeValue",
        }

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "other_type"
        assert formatted_response["value"] == "SomeValue"

    def test_get_summary(self, tracker: QueryExecTracker):
        # Execute a mock function to generate some steps and response
        def mock_function(*args, **kwargs):
            return "Mock Result"

        tracker.execute_func(mock_function, tag="custom_tag")

        # Get the summary
        summary = tracker.get_summary()

        # Check if the summary contains the expected keys
        assert "query_info" in summary
        assert "dataframes" in summary
        assert "steps" in summary
        assert "response" in summary
        assert "execution_time" in summary

    def test_get_execution_time(self, tracker: QueryExecTracker):
        def mock_function(*args, **kwargs):
            time.sleep(1)
            return "Mock Result"

        # Sleep for a while to simulate execution time
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        print("Type", execution_time)

        # Check if the execution time is approximately 1 second
        self.assert_almost_equal(execution_time, 1.0, delta=0.1)

The test cases are well written and cover all the methods in the QueryExecTracker class. However, there are a few areas that could be improved:

  1. The assert_almost_equal method (lines 80-87) is a custom implementation of a function that already exists in the unittest module. It would be better to use the built-in function to reduce code complexity and potential errors.

  2. The test_execute_func_failure test (lines 135-143) could be improved by checking the specific exception message. This would ensure that the function fails for the expected reason.

  3. The test_get_execution_time test (lines 189-204) uses a time.sleep call to simulate execution time. This could potentially slow down the test suite. Consider using mocking to simulate the passage of time instead.

Here are the suggested changes:

-    # Define a custom assert_almost_equal function
-    def assert_almost_equal(self, first, second, places=None, msg=None, delta=None):
-        if delta is not None:
-            if abs(float(first) - float(second)) > delta:
-                raise AssertionError(msg or f"{first} != {second} within {delta}")
-        else:
-            assert round(abs(float(second) - float(first)), places) == 0, (
-                msg or f"{first} != {second} within {places} places"
-            )

+    from unittest import TestCase
+    assert_almost_equal = TestCase().assertAlmostEqual

-    # Execute the mock function using execute_func and expect an exception
-    with pytest.raises(Exception):
-        tracker.execute_func(mock_function, tag="custom_tag")

+    # Execute the mock function using execute_func and expect a specific exception
+    with pytest.raises(Exception) as e:
+        tracker.execute_func(mock_function, tag="custom_tag")
+    assert str(e.value) == "Mock Exception", "Unexpected exception message"

-    def mock_function(*args, **kwargs):
-        time.sleep(1)
-        return "Mock Result"
-
-    # Sleep for a while to simulate execution time
-    with patch("time.time", return_value=0):
-        tracker.execute_func(mock_function, tag="cache_hit")

+    # Mock time.sleep to simulate execution time without actually delaying the test
+    with patch("time.sleep", new=lambda x: None):
+        tracker.execute_func(mock_function, tag="cache_hit")


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 12

Commits: files that changed from the base of the PR and between 45672fe and 7c396f7.
Files selected for processing (3)
  • examples/custom_logger.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 11
examples/custom_logger.py (3)
  • 1-32: The Agent class is being initialized with a logger instance of APILogger. Ensure that the server at "SERVER-URL" is set up to receive and handle log messages correctly. Also, verify that the "USER-ID" and "API-KEY" are valid and have the necessary permissions to send log messages to the server.

  • 22-28: The OpenAI class is being initialized with "Your-API-Key". Make sure to replace "Your-API-Key" with a valid API key. Also, ensure that the Agent class is compatible with the OpenAI class and can handle any exceptions or errors that may be thrown by it.

  • 31-32: The chat method is being called on the Agent instance. Ensure that the Agent class has a chat method and that it can handle the string "Who gets paid the most?" as an argument. Also, verify that the chat method returns a response that can be printed.

pandasai/smart_datalake/__init__.py (8)
  • 20-29: The import of time module has been removed and QueryExecTracker from pandasai.helpers.query_exec_tracker has been added. Ensure that the QueryExecTracker class is correctly implemented and that the removal of time does not affect other parts of the code.

  • 190-195: The _start_timer method has been removed. This change is acceptable if the timing functionality is now handled by the QueryExecTracker class. Verify that this is the case.

  • 276-292: The QueryExecTracker instance is created and dataframes are added to it. This is a good practice as it encapsulates the tracking logic in a separate class, improving modularity and maintainability.

  • 298-324: The QueryExecTracker's execute_func method is used to execute functions and track their execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 341-348: The QueryExecTracker's execute_func method is used to execute the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 363-368: The QueryExecTracker's execute_func method is used to retry running the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.

  • 374-395: The QueryExecTracker's add_step method is used to add steps to the tracker. This is a good practice as it allows for more granular tracking of the execution process.

  • 396-412: The QueryExecTracker's get_summary and get_execution_time methods are used to log the summary and execution time. This is a good practice as it allows for more granular tracking of the execution process.

Comment on lines +94 to +103
        # Create a sample step
        step = {"type": "CustomStep", "description": "This is a custom step."}

        tracker._steps = []
        # Add the step to the tracker
        tracker.add_step(step)

        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        assert tracker._steps[0] == step

The test test_add_step assumes that the step will always be added at the end of the _steps list. This might not be the case if the implementation of add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list, not necessarily at the end.
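For example, a membership check keeps the test valid even if add_step later changes where it stores the step (a sketch against the current fixture):

        # Position-independent assertion
        assert step in tracker._steps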

Comment on lines +105 to +118
    def test_format_response_dataframe(
        self, tracker: QueryExecTracker, sample_df: pd.DataFrame
    ):
        # Create a sample ResponseType for a dataframe
        response = {"type": "dataframe", "value": sample_df}

        # Format the response using _format_response
        formatted_response = tracker._format_response(response)

        # Check if the response is formatted correctly
        assert formatted_response["type"] == "dataframe"
        assert len(formatted_response["value"]["headers"]) == 3
        assert len(formatted_response["value"]["rows"]) == 10


The test test_format_response_dataframe assumes that the dataframe will always have 10 rows. This might not be the case if the dataframe changes in the future. Consider making the test more flexible by checking the actual number of rows in the dataframe.
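A sketch of assertions derived from the fixture itself rather than hard-coded counts, assuming _format_response keeps mirroring the dataframe's columns and rows as it does today:

        assert formatted_response["value"]["headers"] == sample_df.columns.tolist()
        assert len(formatted_response["value"]["rows"]) == len(sample_df)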

Comment on lines +133 to +149
    def test_get_summary(self, tracker: QueryExecTracker):
        # Execute a mock function to generate some steps and response
        def mock_function(*args, **kwargs):
            return "Mock Result"

        tracker.execute_func(mock_function, tag="custom_tag")

        # Get the summary
        summary = tracker.get_summary()

        # Check if the summary contains the expected keys
        assert "query_info" in summary
        assert "dataframes" in summary
        assert "steps" in summary
        assert "response" in summary
        assert "execution_time" in summary


The test test_get_summary assumes that the summary will always contain the keys "query_info", "dataframes", "steps", "response", and "execution_time". This might not be the case if the implementation of get_summary changes in the future. Consider making the test more flexible by checking if the keys are in the summary, not necessarily all of them.
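One way to express that, sketched as a subset check over only the keys the test actually relies on:

        expected_keys = {"query_info", "steps", "execution_time"}
        assert expected_keys.issubset(summary.keys())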

Comment on lines +150 to +163
    def test_get_execution_time(self, tracker: QueryExecTracker):
        def mock_function(*args, **kwargs):
            time.sleep(1)
            return "Mock Result"

        # Sleep for a while to simulate execution time
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time assumes that the execution time will always be approximately 1 second. This might not be the case if the implementation of execute_func or the system's performance changes in the future. Consider making the test more flexible by checking if the execution time is within a reasonable range.

Comment on lines 165 to 186
    def test_execute_func_success(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_return_value = Mock()
        mock_return_value.to_string = Mock()
        mock_return_value.to_string.return_value = "Mock Result"

        mock_func = Mock()
        mock_func.return_value = mock_return_value

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_get_prompt")

        # Check if the result is as expected
        assert result.to_string() == "Mock Result"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Generate Prompt"
        assert step["success"] is True


The test test_execute_func_success assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Generate Prompt". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.
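A looser variant might assert membership and allow any of the known step types (the set below is illustrative, not exhaustive):

        assert step in tracker._steps
        assert step["type"] in {"Generate Prompt", "Generate Code", "Cache Hit"}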

Comment on lines 196 to 212
        tracker._steps = []

        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="cache_hit")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Cache Hit"
        assert step["success"] is True


The test test_execute_func_cache_hit assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Cache Hit". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 213 to 230
    def test_execute_func_generate_code(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="generate_code")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Generate Code"
        assert step["success"] is True

The test test_execute_func_generate_code assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Generate Code". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 232 to 249
    def test_execute_func_re_rerun_code(self, tracker: QueryExecTracker):
        tracker._steps = []

        # Create a mock function
        mock_func = Mock()
        mock_func.return_value = "code"

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="_retry_run_code")

        # Check if the result is as expected
        assert result == "code"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "code_generated" in step
        assert step["type"] == "Retry Code Generation"
        assert step["success"] is True

The test test_execute_func_re_rerun_code assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Retry Code Generation". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 251 to 270
    def test_execute_func_execute_code_success(
        self, sample_df: pd.DataFrame, tracker: QueryExecTracker
    ):
        tracker._steps = []

        mock_func = Mock()
        mock_func.return_value = {"type": "dataframe", "value": sample_df}

        # Execute the mock function using execute_func
        result = tracker.execute_func(mock_func, tag="execute_code")

        # Check if the result is as expected
        assert result["type"] == "dataframe"
        # Check if the step was added correctly
        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert "result" in step
        assert step["type"] == "Code Execution"
        assert step["success"] is True


The test test_execute_func_execute_code_success assumes that the step will always be added at the end of the _steps list and that the type of the step will always be "Code Execution". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the step is in the _steps list and if the type of the step is one of the possible types.

Comment on lines 271 to 285
    def test_execute_func_execute_code_fail(
        self, sample_df: pd.DataFrame, tracker: QueryExecTracker
    ):
        tracker._steps = []

        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="execute_code")

        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Code Execution"
        assert step["success"] is False

The test test_execute_func_execute_code_fail assumes that the function will always raise an exception and that the type of the step will always be "Code Execution". This might not be the case if the implementation of execute_func or add_step changes in the future. Consider making the test more flexible by checking if the function raises an exception under certain conditions and if the type of the step is one of the possible types.


codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #630 (5fcf2d9) into main (93996fb) will increase coverage by 0.44%.
Report is 7 commits behind head on main.
The diff coverage is 97.61%.


@@            Coverage Diff             @@
##             main     #630      +/-   ##
==========================================
+ Coverage   84.29%   84.73%   +0.44%     
==========================================
  Files          65       66       +1     
  Lines        3012     3099      +87     
==========================================
+ Hits         2539     2626      +87     
  Misses        473      473              
Files Coverage Δ
pandasai/helpers/query_exec_tracker.py 98.50% <98.50%> (ø)
pandasai/smart_datalake/__init__.py 93.33% <94.11%> (+0.12%) ⬆️

... and 7 files with indirect coverage changes



@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 6

Commits: files that changed from the base of the PR and between 7c396f7 and c957373.
Files selected for processing (1)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)

Comment on lines 62 to 102
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if func_name not in exec_steps:
                return result

            step_data = self._generate_exec_step(func_name, result)

            step_data["type"] = exec_steps[func_name]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[func_name],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise


The execute_func method is doing too much. It's executing a function, timing it, generating step data, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve readability and maintainability.
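One possible decomposition, sketched under the assumption that exec_steps and _generate_exec_step keep their current shape; the helper name _record_step is hypothetical:

    def execute_func(self, function, *args, **kwargs) -> Any:
        tag = kwargs.pop("tag", function.__name__)
        start_time = time.time()
        try:
            result = function(*args, **kwargs)
            self._record_step(tag, result, success=True, start_time=start_time)
            return result
        except Exception:
            self._record_step(tag, None, success=False, start_time=start_time)
            raise

    def _record_step(self, tag, result, success, start_time):
        # Only tags listed in exec_steps are recorded
        if tag not in exec_steps:
            return
        step_data = self._generate_exec_step(tag, result) if success else {}
        step_data.update(
            {
                "type": exec_steps[tag],
                "success": success,
                "execution_time": time.time() - start_time,
            }
        )
        self._steps.append(step_data)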

Comment on lines 73 to 75
        func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

        try:

The tag keyword argument is used to override the function name in the tracking data. This could lead to confusion if the tag does not accurately represent the function being executed. Consider removing this feature or ensuring that the tag is always representative of the function.

Comment on lines 92 to 101
        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[func_name],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise

The exception handling here is too broad. Catching all exceptions and simply re-raising them after logging the failure does not provide much value. Consider catching specific exceptions that you can handle in a meaningful way.

Comment on lines 103 to 127
    def _generate_exec_step(self, func_name: str, result: Any) -> dict:
        """
        Extracts and Generates result
        Args:
            func_name (str): function name that is executed
            result (Any): function output response

        Returns:
            dict: dictionary with information about the function execution
        """
        if (
            func_name == "cache_hit"
            or func_name == "generate_code"
            or func_name == "_retry_run_code"
        ):
            return {"code_generated": result}
        elif func_name == "_get_prompt":
            return {
                "prompt_class": result.__class__.__name__,
                "generated_prompt": result.to_string(),
            }
        elif func_name == "execute_code":
            self._response = self._format_response(result)
            return {"result": result}


The _generate_exec_step method uses a series of if statements to determine what data to return based on the function name. This is not very scalable or maintainable. Consider using a dictionary mapping function names to handler functions, or redesigning this part of the system to avoid the need for this method.
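A sketch of the dispatch-table approach; the helper _handle_execute_code is hypothetical and exists only to keep the side effect on self._response out of the table:

    def _generate_exec_step(self, func_name: str, result: Any) -> dict:
        handlers = {
            "cache_hit": lambda r: {"code_generated": r},
            "generate_code": lambda r: {"code_generated": r},
            "_retry_run_code": lambda r: {"code_generated": r},
            "_get_prompt": lambda r: {
                "prompt_class": r.__class__.__name__,
                "generated_prompt": r.to_string(),
            },
            "execute_code": self._handle_execute_code,
        }
        handler = handlers.get(func_name)
        return handler(result) if handler else {}

    def _handle_execute_code(self, result: Any) -> dict:
        self._response = self._format_response(result)
        return {"result": result}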

Comment on lines 156 to 160
        summary = self.get_summary_dict()
        try:
            return json.dumps(summary, indent=4)
        except Exception:
            return summary

The get_summary method attempts to return a JSON string representation of the summary, but falls back to returning the raw dictionary if this fails. This could lead to inconsistent return types. Consider always returning the raw dictionary and letting the caller decide how to serialize it.
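A sketch of the dictionary-only variant, assuming get_summary_dict keeps its current behaviour; serialization moves to the caller:

    def get_summary(self) -> dict:
        # Always return the raw dictionary; callers serialize if they need JSON
        return self.get_summary_dict()

Call sites that need JSON can then do json.dumps(tracker.get_summary(), indent=4, default=str), with default=str guarding against non-serializable values.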

@ArslanSaleem ArslanSaleem marked this pull request as draft October 10, 2023 14:26
@ArslanSaleem ArslanSaleem marked this pull request as ready for review October 10, 2023 16:47

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 8

Commits: files that changed from the base of the PR and between c957373 and f32f2a1.
Files selected for processing (2)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 1
pandasai/helpers/query_exec_tracker.py (1)
  • 33-39: The conversation_id is being converted to a string. If it's not a string, this could lead to unexpected behavior. Verify the type of conversation_id before passing it to the QueryExecTracker.

Comment on lines +85 to +92
        tracker._dataframes = []
        tracker.add_dataframes([smart_dataframe])

        # Check if the dataframe was added correctly
        assert len(tracker._dataframes) == 1
        assert len(tracker._dataframes[0]["headers"]) == 3
        assert len(tracker._dataframes[0]["rows"]) == 5


The test test_add_dataframes assumes that the dataframe will always have 3 headers and 5 rows. This might not be the case for different dataframes. Consider making the test more generic to handle different dataframes.

Comment on lines +156 to +163
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time assumes that the execution time will always be approximately 1 second. This might not be the case for different functions or systems. Consider making the test more generic to handle different execution times.

Comment on lines +189 to +195
        # Create a mock function that raises an exception
        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="custom_tag")


The test test_execute_func_failure does not check if the step was added correctly to the tracker after the function raises an exception. Consider adding assertions to check the step in the tracker.
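A sketch of those assertions, using a tag the tracker recognises (e.g. "execute_code") so that the failed call is still recorded as a step:

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="execute_code")

        # The failure should still be recorded as an unsuccessful step
        assert len(tracker._steps) == 1
        assert tracker._steps[0]["success"] is False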

Comment on lines +280 to +291

        mock_func = Mock()
        mock_func.side_effect = Exception("Mock Exception")
        mock_func.__name__ = "execute_code"

        with pytest.raises(Exception):
            tracker.execute_func(mock_func)

        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Code Execution"
        assert step["success"] is False

The test test_execute_func_execute_code_fail does not check if the error message in the step is correct after the function raises an exception. Consider adding assertions to check the error message in the step.

Comment on lines +20 to +24
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None

The class variables _query_info, _dataframes, _response, _steps, and _start_time should be instance variables. Class variables are shared across all instances of the class, which is not the desired behavior here. Each QueryExecTracker instance should have its own state.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None

Comment on lines 113 to 127
        if (
            func_name == "cache_hit"
            or func_name == "generate_code"
            or func_name == "_retry_run_code"
        ):
            return {"code_generated": result}
        elif func_name == "_get_prompt":
            return {
                "prompt_class": result.__class__.__name__,
                "generated_prompt": result.to_string(),
            }
        elif func_name == "execute_code":
            self._response = self._format_response(result)
            return {"result": result}


The _generate_exec_step method is using a series of if statements to determine the function name. This could be simplified and made more maintainable by using a dictionary mapping function names to handlers.

Comment on lines +128 to +148
    def _format_response(self, result: ResponseType) -> ResponseType:
        """
        Format output response
        Args:
            result (ResponseType): response returned after execution

        Returns:
            ResponseType: formatted response output
        """
        formatted_result = {}
        if result["type"] == "dataframe":
            formatted_result = {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
            return formatted_result
        else:
            return result

The _format_response method is only formatting the response if the type is "dataframe". If the type is not "dataframe", it returns the result as is. This could lead to inconsistent response formats. Consider handling other types as well.
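A sketch of a fallback branch that keeps the output shape uniform for non-dataframe results (stringifying the value is an assumption about what the log server can accept):

    def _format_response(self, result: ResponseType) -> ResponseType:
        if result["type"] == "dataframe":
            return {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
        # Fallback: normalise every other type to a plain string value
        return {"type": result["type"], "value": str(result["value"])}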


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 6

Commits: files that changed from the base of the PR and between f32f2a1 and 5fcf2d9.
Files selected for processing (4)
  • examples/custom_logger.py (1 hunks)
  • pandasai/custom_loggers/api_logger.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
Additional comments (Suppressed): 8
examples/custom_logger.py (1)
  • 1-36: The new hunk demonstrates the usage of the APILogger class. It creates an instance of Agent with APILogger as the logger. The chat method is then called on the Agent instance, and the response is printed. The code seems to be logically correct and follows best practices. However, ensure that the "SERVER-URL" and "API-KEY" are replaced with actual values when using this code. Also, ensure that the "OPEN-API-KEY" is replaced with your actual OpenAI API key.
pandasai/smart_datalake/__init__.py (7)
  • 25-29: The QueryExecTracker class is imported but the time module is no longer imported. This suggests that the QueryExecTracker class is being used to track the execution time of queries, which was previously done using the time module. Ensure that the QueryExecTracker class provides the same functionality as the time module and that it is used correctly throughout the code.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of queries. Ensure that the removal of this method does not affect the functionality of the code and that the execution time of queries is still being tracked correctly.

  • 282-288: A QueryExecTracker object is created and dataframes are added to it. This object is used to track the execution of queries. Ensure that the QueryExecTracker object is used correctly throughout the code and that it provides the same functionality as the previous method of tracking query execution.

  • 366-368: The execute_func method of the QueryExecTracker object is used to execute functions while tracking their execution time. Ensure that this method is used correctly and that it provides the same functionality as the previous method of executing functions.

  • 377-391: Steps are added to the QueryExecTracker object. These steps represent the different stages of the query execution process. Ensure that these steps are added correctly and that they accurately represent the query execution process.

  • 407-407: The get_execution_time method of the QueryExecTracker object is used to get the execution time of the query. Ensure that this method provides the same functionality as the previous method of getting the execution time.

  • 415-415: The execute_func method of the QueryExecTracker object is used to parse the result of the query, and the get_summary method is used to get a summary of the query execution. Ensure that these methods are used correctly and that they provide the same functionality as the previous methods of parsing the result and getting a summary.

Comment on lines 1 to 29
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "api_key_id": self._api_key,
                    "json_log": message,
                }
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")

The APILogger class is a good example of a custom logger that sends log messages to a server. However, there are a few areas that could be improved:

  1. Error handling: In the log method, you're catching all exceptions and printing them. This is not a good practice because it can make debugging difficult. Instead, consider logging the error message and re-throwing the exception, or handling specific exceptions that you expect might occur.

  2. Security: You're sending the API key in the body of the POST request. This could potentially expose the API key to anyone who has access to the logs or network traffic. Consider sending the API key in the headers of the request instead.

  3. Performance: The log method sends a POST request every time it's called. This could potentially slow down your application if you're logging a lot of messages. Consider batching log messages and sending them in one request, or using a background thread to send log messages.

Here's how you might address these issues:

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            logging.error(f"Exception in APILogger: {e}")
            raise

Comment on lines 21 to 27
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None
    _success: bool = False


The class variables are shared across all instances of the class. This could lead to unexpected behavior if multiple instances of QueryExecTracker are created. These should be instance variables, defined inside the __init__ method.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
-    _success: bool = False
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None
+        self._success: bool = False
+        ...

Comment on lines +64 to +105
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
        tag = kwargs.pop("tag", function.__name__)

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise


The execute_func method is doing too much. It's executing a function, timing it, handling exceptions, and adding steps. Consider breaking this down into smaller methods for better maintainability and readability.

Comment on lines +81 to +93
            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

The if tag not in exec_steps: check could lead to silent failures if a tag is misspelled or not included in exec_steps. Consider raising an exception or logging a warning if the tag is not found.
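One way to surface the problem instead of skipping it silently, sketched with the standard logging module (the tracker currently has no logger of its own, so that choice is an assumption):

            if tag not in exec_steps:
                # Warn so that misspelled or unregistered tags are visible in the logs
                logging.warning("QueryExecTracker: unrecognised tag %r; step not recorded", tag)
                return result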

Comment on lines +95 to +104
        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise

When an exception occurs, the step is added with a success value of False, but the _success attribute of the QueryExecTracker instance is not updated. This could lead to inconsistencies in the reported success status.

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
+            self._success = False
            raise

Comment on lines +155 to +169
    def get_summary(self) -> dict:
        """
        Returns the summary in json to steps involved in execution of track
        Returns:
            dict: summary json
        """
        execution_time = time.time() - self._start_time
        return {
            "query_info": self._query_info,
            "dataframes": self._dataframes,
            "steps": self._steps,
            "response": self._response,
            "execution_time": execution_time,
            "success": self._success,
        }

The get_summary method does not update the _success attribute before returning the summary. This could lead to inconsistencies if the success property setter is used after the summary is generated.

    def get_summary(self) -> dict:
        """
        Returns the summary in json to steps involved in execution of track
        Returns:
            dict: summary json
        """
        execution_time = time.time() - self._start_time
+        self._success = all(step.get('success', False) for step in self._steps)
        return {
            "query_info": self._query_info,
            "dataframes": self._dataframes,
            "steps": self._steps,
            "response": self._response,
            "execution_time": execution_time,
            "success": self._success,
        }


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 1

Commits: files that changed from the base of the PR and between 5fcf2d9 and 1797f62.
Files selected for processing (1)
  • pandasai/custom_loggers/api_logger.py (1 hunks)

Comment on lines 1 to 29
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")

The APILogger class extends the Logger class but does not call the parent class's constructor. This could lead to issues if the parent class's constructor has important initializations. Also, the log method does not handle the case when the message is a string. It only handles the case when message is a dictionary. This could lead to unexpected behavior if a string message is passed. Lastly, the log method catches all exceptions and prints them, but it does not re-raise them. This could hide errors and make debugging difficult. Consider re-raising the exception after logging it.

class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
+        super().__init__()
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)
+            else:
+                log_data = {
+                    "text_log": message,
+                }
+                headers = {"Authorization": f"Bearer {self._api_key}"}
+                response = requests.post(
+                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
+                )
+                if response.status_code != 200:
+                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")
+            raise


@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 8

Commits: files that changed from the base of the PR and between 1797f62 and abd6b25.
Files selected for processing (5)
  • examples/using_pandasai_log_server.py (1 hunks)
  • pandasai/helpers/query_exec_tracker.py (1 hunks)
  • pandasai/schemas/df_config.py (3 hunks)
  • pandasai/smart_datalake/__init__.py (7 hunks)
  • tests/test_query_tracker.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • examples/using_pandasai_log_server.py
Additional comments (Suppressed): 10
pandasai/schemas/df_config.py (2)
  • 8-14: The LogServerConfig class has been added. It's a TypedDict that expects server_url and api_key as keys. Ensure that the server URL and API key are being properly validated and secured elsewhere in the code, as they are sensitive data.

  • 34-34: The log_server attribute has been added to the Config class. It's of type LogServerConfig. Ensure that all instances of the Config class in the codebase are updated to include this new attribute if necessary.

-    llm: Any = None
+    llm: Any = None
+    log_server: LogServerConfig = None
pandasai/smart_datalake/__init__.py (8)
  • 20-29: The time module has been removed and replaced with the QueryExecTracker class for tracking execution time. Ensure that the QueryExecTracker class provides the same or better functionality as the time module. Also, verify that the QueryExecTracker class has been thoroughly tested and is reliable.

  • 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of a query. This functionality appears to have been replaced by the QueryExecTracker class. Ensure that the QueryExecTracker class provides the same or better functionality as the _start_timer method.

  • 282-288: The QueryExecTracker object is created with several parameters. Ensure that these parameters are always available and correctly set when this method is called. Also, verify that the QueryExecTracker class handles these parameters correctly.

  • 300-326: The QueryExecTracker class is used to execute functions and track their execution time. This is a significant change from the previous implementation, which called the functions directly (a minimal usage sketch follows this list). Ensure that the QueryExecTracker class correctly executes the functions and accurately tracks their execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 343-350: The QueryExecTracker class is used to execute the execute_code method of the _code_manager object. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 365-370: The QueryExecTracker class is used to execute the _retry_run_code method. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.

  • 379-393: The QueryExecTracker class is used to add steps to the tracker. Ensure that the QueryExecTracker class correctly adds these steps and that the steps are accurately represented in the tracker.

  • 409-417: The QueryExecTracker class is used to get the execution time and parse the result. Ensure that the QueryExecTracker class correctly performs these operations and accurately tracks the execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
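
To make two of the bullets above concrete, here is a hedged reconstruction of the described LogServerConfig schema. The exact definition in pandasai/schemas/df_config.py may differ (field optionality, defaults, and the surrounding Config model are assumptions):

from typing import TypedDict

class LogServerConfig(TypedDict):
    server_url: str
    api_key: str

config = {
    "log_server": LogServerConfig(
        server_url="https://logs.example.com",  # placeholder URL, not a real endpoint
        api_key="YOUR_API_KEY",  # never hard-code real keys
    )
}

And a minimal usage sketch of the execute_func wrapping pattern referred to in the smart_datalake bullets. The constructor arguments mirror the __init__ signature shown further down in this review; the tag value and the stand-in function are illustrative assumptions, not the datalake's real pipeline:

from pandasai.helpers.query_exec_tracker import QueryExecTracker

tracker = QueryExecTracker(
    conversation_id="conv-1",
    query="What is the sum of the gdp column?",
    instance="SmartDatalake",
    output_type="string",
)

def generate_code(prompt: str) -> str:
    # Stand-in for the LLM call the real pipeline would make.
    return "result = df['gdp'].sum()"

# Instead of calling generate_code(prompt) directly, the call is wrapped;
# the tag decides which exec_steps entry the timing is recorded under
# (unknown tags simply return the result without recording a step).
code = tracker.execute_func(generate_code, "sum the gdp column", tag="generate_code")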

Comment on lines +24 to +31
_query_info: str = {}
_dataframes: List = []
_response: ResponseType = {}
_steps: List = []
_start_time = None
_success: bool = False
_server_config: dict = None


The class-level attributes are shared among all instances of the class: because the mutable defaults (lists and dicts) are defined on the class itself, state written through one QueryExecTracker instance can leak into every other instance, and concurrent use only makes the resulting inconsistencies worse. Consider moving these variables to the __init__ method to make them instance variables.

-    _query_info: str = {}
-    _dataframes: List = []
-    _response: ResponseType = {}
-    _steps: List = []
-    _start_time = None
-    _success: bool = False
-    _server_config: dict = None
+    def __init__(
+        self,
+        conversation_id: str,
+        query: str,
+        instance: str,
+        output_type: str,
+        server_config: Union[dict, None] = None,
+    ) -> None:
+        self._query_info: str = {}
+        self._dataframes: List = []
+        self._response: ResponseType = {}
+        self._steps: List = []
+        self._start_time = None
+        self._success: bool = False
+        self._server_config: dict = server_config

Comment on lines +70 to +110
def execute_func(self, function, *args, **kwargs) -> Any:
"""
Tracks function executions, calculates execution time and prepare data
Args:
function (function): Function that is to be executed

Returns:
Any: Response return after function execution
"""
start_time = time.time()

# Get the tag from kwargs if provided, or use the function name as the default
tag = kwargs.pop("tag", function.__name__)

try:
result = function(*args, **kwargs)

execution_time = time.time() - start_time
if tag not in exec_steps:
return result

step_data = self._generate_exec_step(tag, result)

step_data["type"] = exec_steps[tag]
step_data["success"] = True
step_data["execution_time"] = execution_time

self._steps.append(step_data)

return result

except Exception:
execution_time = time.time() - start_time
self._steps.append(
{
"type": exec_steps[tag],
"success": False,
"execution_time": execution_time,
}
)
raise

The execute_func method is doing too much. It's executing a function, timing it, generating execution steps, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve maintainability and readability.
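
One possible decomposition, offered only as a hedged sketch: the helper names _time_call and _record_step are invented for illustration, and the exec_steps filtering and _generate_exec_step logic from the PR are deliberately omitted. The point is only that timing, step recording, and error propagation can live in separate, individually testable helpers.

import time
from typing import Any, Callable, Dict, List, Optional, Tuple

class QueryExecTrackerSketch:
    """Illustrative only: timing, step recording, and error handling split apart."""

    def __init__(self) -> None:
        self._steps: List[Dict[str, Any]] = []

    def _time_call(self, function: Callable, *args: Any, **kwargs: Any) -> Tuple[Any, float, Optional[Exception]]:
        # Run the function and report (result, elapsed seconds, error or None).
        start = time.time()
        try:
            return function(*args, **kwargs), time.time() - start, None
        except Exception as exc:  # broad catch mirrors the PR's behaviour
            return None, time.time() - start, exc

    def _record_step(self, tag: str, execution_time: float, success: bool, result: Any = None) -> None:
        step: Dict[str, Any] = {"type": tag, "success": success, "execution_time": execution_time}
        if success:
            step["result"] = result
        self._steps.append(step)

    def execute_func(self, function: Callable, *args: Any, **kwargs: Any) -> Any:
        tag = kwargs.pop("tag", function.__name__)
        result, execution_time, error = self._time_call(function, *args, **kwargs)
        self._record_step(tag, execution_time, success=error is None, result=result)
        if error is not None:
            raise error
        return result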

raise Exception(response.text)

except Exception as e:
print(f"Exception in APILogger: {e}")

Avoid using print statements for error logging. Consider using the logging module, which provides more flexibility and control over log levels, formats, and destinations.

-            print(f"Exception in APILogger: {e}")
+            logging.error(f"Exception in APILogger: {e}")
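
If the project prefers per-module loggers over calls on the root logger, a small variation (names assumed, not part of the suggested change) would be:

import logging

# A module-level logger lets APILogger's messages be filtered by name
# through the standard logging configuration.
logger = logging.getLogger(__name__)

def report_failure(exc: Exception) -> None:
    # Log the exception at ERROR level; the caller is expected to re-raise.
    logger.error("Exception in APILogger: %s", exc)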

Comment on lines +216 to +221
def success(self):
return self._success

@success.setter
def success(self, value):
self._success = value

The success property setter is not validating the input. Consider adding a check to ensure that value is a boolean to prevent invalid data types from being assigned.

+    @success.setter
+    def success(self, value):
+        if not isinstance(value, bool):
+            raise ValueError("Success value must be a boolean.")
+        self._success = value

Comment on lines +82 to +93
def test_add_dataframes(
self, smart_dataframe: SmartDataframe, tracker: QueryExecTracker
):
# Add the dataframe to the tracker
tracker._dataframes = []
tracker.add_dataframes([smart_dataframe])

# Check if the dataframe was added correctly
assert len(tracker._dataframes) == 1
assert len(tracker._dataframes[0]["headers"]) == 3
assert len(tracker._dataframes[0]["rows"]) == 5


The test test_add_dataframes is not comprehensive. It only checks how many dataframes were tracked and how many headers and rows the entry contains. It would be better to also check the actual content of the dataframe to ensure it was added correctly.

+        # Check if the dataframe was added correctly
+        assert tracker._dataframes[0]["dataframe"].equals(smart_dataframe)

Comment on lines +152 to +164
def mock_function(*args, **kwargs):
time.sleep(1)
return "Mock Result"

# Sleep for a while to simulate execution time
with patch("time.time", return_value=0):
tracker.execute_func(mock_function, tag="cache_hit")

# Get the execution time
execution_time = tracker.get_execution_time()

# Check if the execution time is approximately 1 second
assert_almost_equal(execution_time, 1.0, delta=0.3)

The test test_get_execution_time is not reliable. It assumes that the execution time of the mock_function is always 1 second. This might not be the case in different environments or under different loads. Consider mocking the time.sleep function instead of assuming its behavior.

-        def mock_function(*args, **kwargs):
-            time.sleep(1)
-            return "Mock Result"
-
-        # Sleep for a while to simulate execution time
-        with patch("time.time", return_value=0):
-            tracker.execute_func(mock_function, tag="cache_hit")
+        with patch("time.sleep") as mock_sleep:
+            def mock_function(*args, **kwargs):
+                time.sleep(1)
+                return "Mock Result"
+
+            # Sleep for a while to simulate execution time
+            with patch("time.time", return_value=0):
+                tracker.execute_func(mock_function, tag="cache_hit")
+            mock_sleep.assert_called_once_with(1)
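
If flakiness remains a concern, another option (not part of the suggested change) is to drive the clock deterministically instead of sleeping at all. The assertion below assumes the tracker reads time.time() exactly once at the start and once at the end of the wrapped call, which is what the execute_func shown earlier in this review does:

import itertools
from unittest.mock import patch

def test_get_execution_time_deterministic(tracker):
    def mock_function(*args, **kwargs):
        return "Mock Result"

    # Each call to time.time() returns 0, 1, 2, ... so the wrapped call
    # appears to take exactly one second without any real sleeping.
    fake_clock = itertools.count()
    with patch("time.time", side_effect=lambda: next(fake_clock)):
        tracker.execute_func(mock_function, tag="cache_hit")

    # Holds under the assumption above and provided "cache_hit" is a tracked tag.
    assert tracker._steps[-1]["execution_time"] == 1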

Comment on lines +294 to +326
def test_publish_method_with_server_key(self, tracker: QueryExecTracker):
# Define a mock summary function
def mock_get_summary():
return "Test summary data"

# Mock the server_config
tracker._server_config = {
"server_url": "http://custom-server",
"api_key": "custom-api-key",
}

# Set the get_summary method to your mock
tracker.get_summary = mock_get_summary

# Mock the requests.post method
mock_response = MagicMock()
mock_response.status_code = 200
type(mock_response).text = "Response text"
# Mock the requests.post method
with patch("requests.post", return_value=mock_response) as mock_post:
# Call the publish method
result = tracker.publish()

# Check that requests.post was called with the expected parameters
mock_post.assert_called_with(
"http://custom-server/api/log/add",
json={"json_log": "Test summary data"},
headers={"Authorization": "Bearer custom-api-key"},
)

# Check the result
assert result is None # The function should return None


The test test_publish_method_with_server_key is not comprehensive. It only checks if the requests.post method was called with the correct parameters. It would be better to also check the response of the requests.post method to ensure it behaves as expected.

+        # Check the response
+        assert mock_response.status_code == 200
+        assert mock_response.text == "Response text"

Comment on lines +352 to +381
def test_publish_method_with_os_env(self, tracker: QueryExecTracker):
# Define a mock summary function
def mock_get_summary():
return "Test summary data"

# Define a mock environment for testing
os.environ["LOGGING_SERVER_URL"] = "http://test-server"
os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"

# Set the get_summary method to your mock
tracker.get_summary = mock_get_summary

# Mock the requests.post method
mock_response = MagicMock()
mock_response.status_code = 200
type(mock_response).text = "Response text"
# Mock the requests.post method
with patch("requests.post", return_value=mock_response) as mock_post:
# Call the publish method
result = tracker.publish()

# Check that requests.post was called with the expected parameters
mock_post.assert_called_with(
"http://test-server/api/log/add",
json={"json_log": "Test summary data"},
headers={"Authorization": "Bearer test-api-key"},
)

# Check the result
assert result is None # The function should return None

The test test_publish_method_with_os_env is not cleaning up the environment variables it sets. This could potentially affect other tests. Consider using unittest.mock.patch.dict to temporarily set the environment variables for this test.

-        # Define a mock environment for testing
-        os.environ["LOGGING_SERVER_URL"] = "http://test-server"
-        os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"
+        # Define a mock environment for testing
+        mock_env = {
+            "LOGGING_SERVER_URL": "http://test-server",
+            "LOGGING_SERVER_API_KEY": "test-api-key",
+        }
+        with patch.dict(os.environ, mock_env):
+            # Set the get_summary method to your mock
+            tracker.get_summary = mock_get_summary
+
+            # Mock the requests.post method
+            mock_response = MagicMock()
+            mock_response.status_code = 200
+            type(mock_response).text = "Response text"
+            # Mock the requests.post method
+            with patch("requests.post", return_value=mock_response) as mock_post:
+                # Call the publish method
+                result = tracker.publish()

@gventuri gventuri changed the base branch from main to release/v1.4 October 13, 2023 23:06
coderabbitai bot commented Oct 13, 2023

CodeRabbit review skipped

By default, CodeRabbit only reviews PRs on the default branch. If you wish to have PRs reviewed on additional branches, you can configure that under the repository settings in the UI.

@gventuri gventuri merged commit 269a867 into release/v1.4 Oct 13, 2023
9 checks passed