feat(QueryTracker): track query execution cycle #630
Conversation
Review Status
Actionable comments generated: 7
Files selected for processing (7)
- examples/custom_logger.py (1 hunks)
- pandasai/custom_loggers/api_logger.py (1 hunks)
- pandasai/helpers/query_exec_tracker.py (1 hunks)
- pandasai/smart_datalake/__init__.py (7 hunks)
- tests/test_codemanager.py (1 hunks)
- tests/test_query_tracker.py (1 hunks)
- tests/test_smartdatalake.py (1 hunks)
Files skipped from review due to trivial changes (2)
- tests/test_codemanager.py
- tests/test_smartdatalake.py
Additional comments (Suppressed): 8
pandasai/smart_datalake/__init__.py (8)
- 20-29: The `time` module has been removed from the imports. Ensure that this does not affect other parts of the code that might be using it. Also, the `QueryExecTracker` class has been imported from `pandasai.helpers.query_exec_tracker`. Verify that this class is implemented correctly and that it is available in the specified location.
- 193-195: The `_start_timer` method has been removed. This method was previously used to start a timer for tracking the execution time of the query. If the execution time is now being tracked elsewhere, ensure that it is being done correctly.
- 276-293: The `QueryExecTracker` instance is being created and used to track the execution of the query. Ensure that the `QueryExecTracker` class is implemented correctly and that it provides the necessary functionality for tracking the execution of the query.
- 299-325: The `QueryExecTracker` instance is being used to execute functions and track their execution time. Ensure that the `execute_func` method of the `QueryExecTracker` class is implemented correctly and that it provides accurate timing information.
- 342-349: The `QueryExecTracker` instance is being used to execute the `execute_code` method of the `_code_manager` instance. Ensure that the `execute_func` method of the `QueryExecTracker` class is implemented correctly and that it provides accurate timing information.
- 364-369: The `QueryExecTracker` instance is being used to execute the `_retry_run_code` method. Ensure that the `execute_func` method of the `QueryExecTracker` class is implemented correctly and that it provides accurate timing information.
- 375-392: The `QueryExecTracker` instance is being used to add steps to the execution tracking. Ensure that the `add_step` method of the `QueryExecTracker` class is implemented correctly and that it provides the necessary functionality for tracking the execution steps.
- 398-413: The `QueryExecTracker` instance is being used to get the summary of the execution tracking and the execution time. Ensure that the `get_summary` and `get_execution_time` methods of the `QueryExecTracker` class are implemented correctly and that they provide accurate information.
examples/custom_logger.py (Outdated)
```python
import pandas as pd
from pandasai import Agent
from pandasai.custom_loggers.api_logger import APILogger
from pandasai.llm.openai import OpenAI

employees_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
    "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
}

salaries_data = {
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

llm = OpenAI("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")
agent = Agent(
    [employees_df, salaries_df],
    config={"llm": llm, "enable_cache": True},
    memory_size=10,
    logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
)

# Chat with the agent
response = agent.chat("Who gets paid the most?")
print(response)
```
The code looks good overall. However, it's not a good practice to hardcode sensitive information like API keys directly in the code. Consider using environment variables or a secure configuration file to store these values. This will also make your code more flexible as you can change the values without modifying the code.
```diff
- llm = OpenAI("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")
+ import os
+ llm = OpenAI(os.getenv("OPENAI_KEY"))

- logger=APILogger("SERVER-URL", "USER-ID", "API-KEY"),
+ logger=APILogger(os.getenv("SERVER_URL"), os.getenv("USER_ID"), os.getenv("API_KEY")),
```
```python
class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None
    _user_id: str = None

    def __init__(self, server_url: str, user_id: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url
        self._user_id = user_id

    def log(self, message: str, level: int = logging.INFO):
        try:
            log_data = {
                # TODO - Remove user id from the API
                "user_id": self._user_id,
                "api_key_id": self._api_key,
                "json_log": message,
            }
            response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
            if response.status_code != 200:
                raise Exception(response.text)
        except Exception:
            pass
```
The `APILogger` class extends the `Logger` class but does not call the parent's constructor in its `__init__` method. This could lead to issues if the parent class `Logger` has important initializations in its constructor. Also, the `log` method silently passes any exceptions, which could make debugging difficult if there are issues with the logging. Consider logging the exception or re-raising it after handling.
```diff
  def __init__(self, server_url: str, user_id: str, api_key: str):
+     super().__init__()
      self._api_key = api_key
      self._server_url = server_url
      self._user_id = user_id
  ...
  except Exception as e:
+     super().error(f"Failed to log message: {e}")
      pass
```
```python
def log(self, message: str, level: int = logging.INFO):
    try:
        log_data = {
            # TODO - Remove user id from the API
            "user_id": self._user_id,
            "api_key_id": self._api_key,
            "json_log": message,
        }
        response = requests.post(f"{self._server_url}/api/log/add", json=log_data)
        if response.status_code != 200:
            raise Exception(response.text)
    except Exception:
        pass
```
The `log` method sends a POST request to the server every time it is called. This could be a performance issue if the method is called frequently. Consider using a buffer to store log messages and send them in batches.
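A buffered variant could look like the following sketch. The class name, batch size, and the `{"logs": batch}` payload shape are assumptions for illustration, not part of the PR; stdlib `urllib` is used instead of `requests` so the example is self-contained:

```python
import json
import logging
import urllib.request


class BufferedAPILogger:
    """Illustrative sketch: buffer log entries and POST them in batches."""

    def __init__(self, server_url: str, batch_size: int = 20):
        self._server_url = server_url
        self._batch_size = batch_size
        self._buffer = []

    def log(self, message: str, level: int = logging.INFO):
        self._buffer.append({"json_log": message, "level": level})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        # Swap the buffer out first so a slow request can't grow it further
        batch, self._buffer = self._buffer, []
        try:
            # One request for the whole batch instead of one per message
            req = urllib.request.Request(
                f"{self._server_url}/api/log/add",
                data=json.dumps({"logs": batch}).encode(),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(req)
        except Exception:
            # Mirrors the original logger's silent-failure behavior
            pass
```

A logger like this should also flush on shutdown (for example via `atexit`) so trailing messages are not lost.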
tests/test_query_tracker.py (Outdated)
```python
    tracker._steps = []
    # Add the step to the tracker
    tracker.add_step(step)

    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    assert tracker._steps[0] == step

def test_execute_func_success(self, tracker: QueryExecTracker):
    tracker._steps = []

    # Create a mock function
    mock_return_value = Mock()
    mock_return_value.to_string = Mock()
    mock_return_value.to_string.return_value = "Mock Result"

    mock_func = Mock()
    mock_func.return_value = mock_return_value

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="_get_prompt")

    # Check if the result is as expected
    assert result.to_string() == "Mock Result"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert step["type"] == "Generate Prompt"
    assert step["success"] is True

def test_execute_func_failure(self, tracker: QueryExecTracker):
    # Create a mock function that raises an exception
    def mock_function(*args, **kwargs):
        raise Exception("Mock Exception")

    # Execute the mock function using execute_func and expect an exception
    with pytest.raises(Exception):
        tracker.execute_func(mock_function, tag="custom_tag")

def test_format_response_dataframe(
    self, tracker: QueryExecTracker, sample_df: pd.DataFrame
):
    # Create a sample ResponseType for a dataframe
    response = {"type": "dataframe", "value": sample_df}

    # Format the response using _format_response
    formatted_response = tracker._format_response(response)

    # Check if the response is formatted correctly
    assert formatted_response["type"] == "dataframe"
    assert len(formatted_response["value"]["headers"]) == 3
    assert len(formatted_response["value"]["rows"]) == 10

def test_format_response_other_type(self, tracker: QueryExecTracker):
    # Create a sample ResponseType for a non-dataframe response
    response = {
        "type": "other_type",
        "value": "SomeValue",
    }

    # Format the response using _format_response
    formatted_response = tracker._format_response(response)

    # Check if the response is formatted correctly
    assert formatted_response["type"] == "other_type"
    assert formatted_response["value"] == "SomeValue"

def test_get_summary(self, tracker: QueryExecTracker):
    # Execute a mock function to generate some steps and response
    def mock_function(*args, **kwargs):
        return "Mock Result"

    tracker.execute_func(mock_function, tag="custom_tag")

    # Get the summary
    summary = tracker.get_summary()

    # Check if the summary contains the expected keys
    assert "query_info" in summary
    assert "dataframes" in summary
    assert "steps" in summary
    assert "response" in summary
    assert "execution_time" in summary

def test_get_execution_time(self, tracker: QueryExecTracker):
    def mock_function(*args, **kwargs):
        time.sleep(1)
        return "Mock Result"

    # Sleep for a while to simulate execution time
    with patch("time.time", return_value=0):
        tracker.execute_func(mock_function, tag="cache_hit")

    # Get the execution time
    execution_time = tracker.get_execution_time()

    print("Type", execution_time)

    # Check if the execution time is approximately 1 second
    self.assert_almost_equal(execution_time, 1.0, delta=0.1)
```
The test cases are well written and cover all the methods in the `QueryExecTracker` class. However, there are a few areas that could be improved:

- The `assert_almost_equal` method (lines 80-87) is a custom implementation of a function that already exists in the `unittest` module. It would be better to use the built-in function to reduce code complexity and potential errors.
- The `test_execute_func_failure` test (lines 135-143) could be improved by checking the specific exception message. This would ensure that the function fails for the expected reason.
- The `test_get_execution_time` test (lines 189-204) uses a `time.sleep` call to simulate execution time. This could potentially slow down the test suite. Consider using mocking to simulate the passage of time instead.
Here are the suggested changes:

```diff
- # Define a custom assert_almost_equal function
- def assert_almost_equal(self, first, second, places=None, msg=None, delta=None):
-     if delta is not None:
-         if abs(float(first) - float(second)) > delta:
-             raise AssertionError(msg or f"{first} != {second} within {delta}")
-     else:
-         assert round(abs(float(second) - float(first)), places) == 0, (
-             msg or f"{first} != {second} within {places} places"
-         )
+ from unittest import TestCase
+ assert_almost_equal = TestCase().assertAlmostEqual
```

```diff
- # Execute the mock function using execute_func and expect an exception
- with pytest.raises(Exception):
-     tracker.execute_func(mock_function, tag="custom_tag")
+ # Execute the mock function using execute_func and expect a specific exception
+ with pytest.raises(Exception) as e:
+     tracker.execute_func(mock_function, tag="custom_tag")
+ assert str(e.value) == "Mock Exception", "Unexpected exception message"
```

```diff
- def mock_function(*args, **kwargs):
-     time.sleep(1)
-     return "Mock Result"
-
- # Sleep for a while to simulate execution time
- with patch("time.time", return_value=0):
-     tracker.execute_func(mock_function, tag="cache_hit")
+ # Mock time.sleep to simulate execution time without actually delaying the test
+ with patch("time.sleep", new=lambda x: None):
+     tracker.execute_func(mock_function, tag="cache_hit")
```
Review Status
Actionable comments generated: 12
Files selected for processing (3)
- examples/custom_logger.py (1 hunks)
- pandasai/smart_datalake/__init__.py (7 hunks)
- tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 11
examples/custom_logger.py (3)
- 1-32: The `Agent` class is being initialized with a logger instance of `APILogger`. Ensure that the server at "SERVER-URL" is set up to receive and handle log messages correctly. Also, verify that the "USER-ID" and "API-KEY" are valid and have the necessary permissions to send log messages to the server.
- 22-28: The `OpenAI` class is being initialized with "Your-API-Key". Make sure to replace "Your-API-Key" with a valid API key. Also, ensure that the `Agent` class is compatible with the `OpenAI` class and can handle any exceptions or errors that may be thrown by it.
- 31-32: The `chat` method is being called on the `Agent` instance. Ensure that the `Agent` class has a `chat` method and that it can handle the string "Who gets paid the most?" as an argument. Also, verify that the `chat` method returns a response that can be printed.

pandasai/smart_datalake/__init__.py (8)
- 20-29: The import of the `time` module has been removed and `QueryExecTracker` from `pandasai.helpers.query_exec_tracker` has been added. Ensure that the `QueryExecTracker` class is correctly implemented and that the removal of `time` does not affect other parts of the code.
- 190-195: The `_start_timer` method has been removed. This change is acceptable if the timing functionality is now handled by the `QueryExecTracker` class. Verify that this is the case.
- 276-292: The `QueryExecTracker` instance is created and dataframes are added to it. This is a good practice as it encapsulates the tracking logic in a separate class, improving modularity and maintainability.
- 298-324: The `QueryExecTracker`'s `execute_func` method is used to execute functions and track their execution time. This is a good practice as it allows for more granular tracking of execution times.
- 341-348: The `QueryExecTracker`'s `execute_func` method is used to execute the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.
- 363-368: The `QueryExecTracker`'s `execute_func` method is used to retry running the code and track its execution time. This is a good practice as it allows for more granular tracking of execution times.
- 374-395: The `QueryExecTracker`'s `add_step` method is used to add steps to the tracker. This is a good practice as it allows for more granular tracking of the execution process.
- 396-412: The `QueryExecTracker`'s `get_summary` and `get_execution_time` methods are used to log the summary and execution time. This is a good practice as it allows for more granular tracking of the execution process.
```python
# Create a sample step
step = {"type": "CustomStep", "description": "This is a custom step."}

tracker._steps = []
# Add the step to the tracker
tracker.add_step(step)

# Check if the step was added correctly
assert len(tracker._steps) == 1
assert tracker._steps[0] == step
```
The test `test_add_step` assumes that the step will always be added at the end of the `_steps` list. This might not be the case if the implementation of `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list, not necessarily at the end.
```python
def test_format_response_dataframe(
    self, tracker: QueryExecTracker, sample_df: pd.DataFrame
):
    # Create a sample ResponseType for a dataframe
    response = {"type": "dataframe", "value": sample_df}

    # Format the response using _format_response
    formatted_response = tracker._format_response(response)

    # Check if the response is formatted correctly
    assert formatted_response["type"] == "dataframe"
    assert len(formatted_response["value"]["headers"]) == 3
    assert len(formatted_response["value"]["rows"]) == 10
```
The test `test_format_response_dataframe` assumes that the dataframe will always have 10 rows. This might not be the case if the dataframe changes in the future. Consider making the test more flexible by checking the actual number of rows in the dataframe.
```python
def test_get_summary(self, tracker: QueryExecTracker):
    # Execute a mock function to generate some steps and response
    def mock_function(*args, **kwargs):
        return "Mock Result"

    tracker.execute_func(mock_function, tag="custom_tag")

    # Get the summary
    summary = tracker.get_summary()

    # Check if the summary contains the expected keys
    assert "query_info" in summary
    assert "dataframes" in summary
    assert "steps" in summary
    assert "response" in summary
    assert "execution_time" in summary
```
The test `test_get_summary` assumes that the summary will always contain the keys "query_info", "dataframes", "steps", "response", and "execution_time". This might not be the case if the implementation of `get_summary` changes in the future. Consider making the test more flexible by checking only the keys that are part of the method's contract, rather than requiring all of them.
```python
def test_get_execution_time(self, tracker: QueryExecTracker):
    def mock_function(*args, **kwargs):
        time.sleep(1)
        return "Mock Result"

    # Sleep for a while to simulate execution time
    with patch("time.time", return_value=0):
        tracker.execute_func(mock_function, tag="cache_hit")

    # Get the execution time
    execution_time = tracker.get_execution_time()

    # Check if the execution time is approximately 1 second
    assert_almost_equal(execution_time, 1.0, delta=0.3)
```
The test `test_get_execution_time` assumes that the execution time will always be approximately 1 second. This might not be the case if the implementation of `execute_func` or the system's performance changes in the future. Consider making the test more flexible by checking if the execution time is within a reasonable range.
tests/test_query_tracker.py (Outdated)
```python
def test_execute_func_success(self, tracker: QueryExecTracker):
    tracker._steps = []

    # Create a mock function
    mock_return_value = Mock()
    mock_return_value.to_string = Mock()
    mock_return_value.to_string.return_value = "Mock Result"

    mock_func = Mock()
    mock_func.return_value = mock_return_value

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="_get_prompt")

    # Check if the result is as expected
    assert result.to_string() == "Mock Result"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert step["type"] == "Generate Prompt"
    assert step["success"] is True
```
The test `test_execute_func_success` assumes that the step will always be added at the end of the `_steps` list and that the type of the step will always be "Generate Prompt". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list and if the type of the step is one of the possible types.
tests/test_query_tracker.py (Outdated)
```python
    tracker._steps = []

    mock_func = Mock()
    mock_func.return_value = "code"

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="cache_hit")

    # Check if the result is as expected
    assert result == "code"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert "code_generated" in step
    assert step["type"] == "Cache Hit"
    assert step["success"] is True
```
The test `test_execute_func_cache_hit` assumes that the step will always be added at the end of the `_steps` list and that the type of the step will always be "Cache Hit". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list and if the type of the step is one of the possible types.
tests/test_query_tracker.py (Outdated)
```python
def test_execute_func_generate_code(self, tracker: QueryExecTracker):
    tracker._steps = []

    # Create a mock function
    mock_func = Mock()
    mock_func.return_value = "code"

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="generate_code")

    # Check if the result is as expected
    assert result == "code"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert "code_generated" in step
    assert step["type"] == "Generate Code"
    assert step["success"] is True
```
The test `test_execute_func_generate_code` assumes that the step will always be added at the end of the `_steps` list and that the type of the step will always be "Generate Code". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list and if the type of the step is one of the possible types.
tests/test_query_tracker.py (Outdated)
```python
def test_execute_func_re_rerun_code(self, tracker: QueryExecTracker):
    tracker._steps = []

    # Create a mock function
    mock_func = Mock()
    mock_func.return_value = "code"

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="_retry_run_code")

    # Check if the result is as expected
    assert result == "code"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert "code_generated" in step
    assert step["type"] == "Retry Code Generation"
    assert step["success"] is True
```
The test `test_execute_func_re_rerun_code` assumes that the step will always be added at the end of the `_steps` list and that the type of the step will always be "Retry Code Generation". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list and if the type of the step is one of the possible types.
tests/test_query_tracker.py (Outdated)
```python
def test_execute_func_execute_code_success(
    self, sample_df: pd.DataFrame, tracker: QueryExecTracker
):
    tracker._steps = []

    mock_func = Mock()
    mock_func.return_value = {"type": "dataframe", "value": sample_df}

    # Execute the mock function using execute_func
    result = tracker.execute_func(mock_func, tag="execute_code")

    # Check if the result is as expected
    assert result["type"] == "dataframe"
    # Check if the step was added correctly
    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert "result" in step
    assert step["type"] == "Code Execution"
    assert step["success"] is True
```
The test `test_execute_func_execute_code_success` assumes that the step will always be added at the end of the `_steps` list and that the type of the step will always be "Code Execution". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the step is in the `_steps` list and if the type of the step is one of the possible types.
tests/test_query_tracker.py (Outdated)
```python
def test_execute_func_execute_code_fail(
    self, sample_df: pd.DataFrame, tracker: QueryExecTracker
):
    tracker._steps = []

    def mock_function(*args, **kwargs):
        raise Exception("Mock Exception")

    with pytest.raises(Exception):
        tracker.execute_func(mock_function, tag="execute_code")

    assert len(tracker._steps) == 1
    step = tracker._steps[0]
    assert step["type"] == "Code Execution"
    assert step["success"] is False
```
The test `test_execute_func_execute_code_fail` assumes that the function will always raise an exception and that the type of the step will always be "Code Execution". This might not be the case if the implementation of `execute_func` or `add_step` changes in the future. Consider making the test more flexible by checking if the function raises an exception under certain conditions and if the type of the step is one of the possible types.
Codecov Report

```
@@            Coverage Diff             @@
##             main     #630      +/-   ##
==========================================
+ Coverage   84.29%   84.73%   +0.44%
==========================================
  Files          65       66       +1
  Lines        3012     3099      +87
==========================================
+ Hits         2539     2626      +87
  Misses        473      473
==========================================
```

... and 7 files with indirect coverage changes
```python
def execute_func(self, function, *args, **kwargs) -> Any:
    """
    Tracks function executions, calculates execution time and prepares data

    Args:
        function (function): Function that is to be executed

    Returns:
        Any: Response returned after function execution
    """
    start_time = time.time()

    func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

    try:
        result = function(*args, **kwargs)

        execution_time = time.time() - start_time
        if func_name not in exec_steps:
            return result

        step_data = self._generate_exec_step(func_name, result)

        step_data["type"] = exec_steps[func_name]
        step_data["success"] = True
        step_data["execution_time"] = execution_time

        self._steps.append(step_data)

        return result

    except Exception:
        execution_time = time.time() - start_time
        self._steps.append(
            {
                "type": exec_steps[func_name],
                "success": False,
                "execution_time": execution_time,
            }
        )
        raise
```
The `execute_func` method is doing too much. It's executing a function, timing it, generating step data, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve readability and maintainability.
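One way such a split could look, as a hedged sketch: the helper names `_track_success`/`_track_failure` and the trimmed `exec_steps` table below are illustrative, not the PR's code, and this version pops `tag` so it is not forwarded to the wrapped function.

```python
import time
from typing import Any, Callable

# Illustrative subset of the tag-to-step-type table
exec_steps = {"generate_code": "Generate Code"}


class QueryExecTrackerSketch:
    """Sketch of execute_func split into smaller, focused helpers."""

    def __init__(self):
        self._steps = []

    def execute_func(self, function: Callable, *args, **kwargs) -> Any:
        func_name = kwargs.pop("tag", function.__name__)
        start_time = time.time()
        try:
            result = function(*args, **kwargs)
        except Exception:
            self._track_failure(func_name, time.time() - start_time)
            raise
        self._track_success(func_name, result, time.time() - start_time)
        return result

    def _track_success(self, func_name: str, result: Any, elapsed: float) -> None:
        if func_name not in exec_steps:
            return
        self._steps.append(
            {
                "type": exec_steps[func_name],
                "success": True,
                "execution_time": elapsed,
                "result": result,
            }
        )

    def _track_failure(self, func_name: str, elapsed: float) -> None:
        if func_name not in exec_steps:
            return
        self._steps.append(
            {
                "type": exec_steps[func_name],
                "success": False,
                "execution_time": elapsed,
            }
        )
```

Each helper now has a single responsibility, and the success and failure recording paths can be tested in isolation.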
```python
func_name = kwargs["tag"] if "tag" in kwargs else function.__name__

try:
```
The `tag` keyword argument is used to override the function name in the tracking data. This could lead to confusion if the `tag` does not accurately represent the function being executed. Consider removing this feature or ensuring that the `tag` is always representative of the function.
```python
except Exception:
    execution_time = time.time() - start_time
    self._steps.append(
        {
            "type": exec_steps[func_name],
            "success": False,
            "execution_time": execution_time,
        }
    )
    raise
```
The exception handling here is too broad. Catching all exceptions and simply re-raising them after logging the failure does not provide much value. Consider catching specific exceptions that you can handle in a meaningful way.
```python
def _generate_exec_step(self, func_name: str, result: Any) -> dict:
    """
    Extracts and generates result

    Args:
        func_name (str): function name that is executed
        result (Any): function output response

    Returns:
        dict: dictionary with information about the function execution
    """
    if (
        func_name == "cache_hit"
        or func_name == "generate_code"
        or func_name == "_retry_run_code"
    ):
        return {"code_generated": result}
    elif func_name == "_get_prompt":
        return {
            "prompt_class": result.__class__.__name__,
            "generated_prompt": result.to_string(),
        }
    elif func_name == "execute_code":
        self._response = self._format_response(result)
        return {"result": result}
```
The `_generate_exec_step` method uses a series of `if` statements to determine what data to return based on the function name. This is not very scalable or maintainable. Consider using a dictionary mapping function names to handler functions, or redesigning this part of the system to avoid the need for this method.
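A dispatch-table version of that idea, as a sketch: the handler names and `STEP_HANDLERS` table are illustrative, though the tags match those used in the PR's tests.

```python
from typing import Any, Callable, Dict


def _step_code_generated(tracker, result: Any) -> dict:
    return {"code_generated": result}


def _step_prompt(tracker, result: Any) -> dict:
    return {
        "prompt_class": result.__class__.__name__,
        "generated_prompt": result.to_string(),
    }


def _step_execute_code(tracker, result: Any) -> dict:
    # The real tracker also stores a formatted response on itself here
    return {"result": result}


# Map each tag to its handler; supporting a new tag is one dict entry
STEP_HANDLERS: Dict[str, Callable] = {
    "cache_hit": _step_code_generated,
    "generate_code": _step_code_generated,
    "_retry_run_code": _step_code_generated,
    "_get_prompt": _step_prompt,
    "execute_code": _step_execute_code,
}


def generate_exec_step(tracker, func_name: str, result: Any) -> dict:
    handler = STEP_HANDLERS.get(func_name)
    return handler(tracker, result) if handler else {}
```

Unknown tags fall through to an empty dict instead of implicitly returning `None`, which also removes the original method's missing-`else` edge case.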
```python
summary = self.get_summary_dict()
try:
    return json.dumps(summary, indent=4)
except Exception:
    return summary
```
The get_summary method attempts to return a JSON string representation of the summary, but falls back to returning the raw dictionary if this fails. This could lead to inconsistent return types. Consider always returning the raw dictionary and letting the caller decide how to serialize it.
Review Status
Actionable comments generated: 8
Files selected for processing (2)
- pandasai/helpers/query_exec_tracker.py (1 hunks)
- tests/test_query_tracker.py (1 hunks)
Additional comments (Suppressed): 1
pandasai/helpers/query_exec_tracker.py (1)
- 33-39: The conversation_id is being converted to a string. If it's not a string, this could lead to unexpected behavior. Verify the type of conversation_id before passing it to the QueryExecTracker.
        tracker._dataframes = []
        tracker.add_dataframes([smart_dataframe])

        # Check if the dataframe was added correctly
        assert len(tracker._dataframes) == 1
        assert len(tracker._dataframes[0]["headers"]) == 3
        assert len(tracker._dataframes[0]["rows"]) == 5
The test test_add_dataframes assumes that the dataframe will always have 3 headers and 5 rows. This might not be the case for different dataframes. Consider making the test more generic to handle different dataframes.
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)
The test test_get_execution_time assumes that the execution time will always be approximately 1 second. This might not be the case for different functions or systems. Consider making the test more generic to handle different execution times.
        # Create a mock function that raises an exception
        def mock_function(*args, **kwargs):
            raise Exception("Mock Exception")

        with pytest.raises(Exception):
            tracker.execute_func(mock_function, tag="custom_tag")
The test test_execute_func_failure does not check if the step was added correctly to the tracker after the function raises an exception. Consider adding assertions to check the step in the tracker.
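A sketch of the missing assertions, assuming the tracker records a step dict with type and success keys (the Tracker stand-in below is minimal, not the PR's class):

```python
class Tracker:
    """Minimal stand-in for QueryExecTracker."""

    def __init__(self):
        self._steps = []

    def execute_func(self, function, *args, **kwargs):
        tag = kwargs.pop("tag", function.__name__)
        try:
            return function(*args, **kwargs)
        except Exception:
            # Record the failed step before re-raising.
            self._steps.append({"type": tag, "success": False})
            raise


def test_execute_func_failure_records_step():
    tracker = Tracker()

    def mock_function(*args, **kwargs):
        raise Exception("Mock Exception")

    raised = False
    try:
        tracker.execute_func(mock_function, tag="custom_tag")
    except Exception:
        raised = True

    # The failure should leave an audit trail, not just propagate.
    assert raised
    assert len(tracker._steps) == 1
    assert tracker._steps[0]["type"] == "custom_tag"
    assert tracker._steps[0]["success"] is False
```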
        mock_func = Mock()
        mock_func.side_effect = Exception("Mock Exception")
        mock_func.__name__ = "execute_code"

        with pytest.raises(Exception):
            tracker.execute_func(mock_func)

        assert len(tracker._steps) == 1
        step = tracker._steps[0]
        assert step["type"] == "Code Execution"
        assert step["success"] is False
The test test_execute_func_execute_code_fail does not check if the error message in the step is correct after the function raises an exception. Consider adding assertions to check the error message in the step.
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None
The class variables _query_info, _dataframes, _response, _steps, and _start_time should be instance variables. Class variables are shared across all instances of the class, which is not the desired behavior here. Each QueryExecTracker instance should have its own state.
- _query_info: str = {}
- _dataframes: List = []
- _response: ResponseType = {}
- _steps: List = []
- _start_time = None
+ def __init__(
+ self,
+ conversation_id: str,
+ query: str,
+ instance: str,
+ output_type: str,
+ ) -> None:
+ self._query_info: str = {}
+ self._dataframes: List = []
+ self._response: ResponseType = {}
+ self._steps: List = []
+ self._start_time = None
        if (
            func_name == "cache_hit"
            or func_name == "generate_code"
            or func_name == "_retry_run_code"
        ):
            return {"code_generated": result}
        elif func_name == "_get_prompt":
            return {
                "prompt_class": result.__class__.__name__,
                "generated_prompt": result.to_string(),
            }
        elif func_name == "execute_code":
            self._response = self._format_response(result)
            return {"result": result}
The _generate_exec_step method is using a series of if statements keyed on the function name. This could be simplified and made more maintainable by using a dictionary mapping function names to handlers.
    def _format_response(self, result: ResponseType) -> ResponseType:
        """
        Format output response
        Args:
            result (ResponseType): response returned after execution

        Returns:
            ResponseType: formatted response output
        """
        formatted_result = {}
        if result["type"] == "dataframe":
            formatted_result = {
                "type": result["type"],
                "value": {
                    "headers": result["value"].columns.tolist(),
                    "rows": result["value"].values.tolist(),
                },
            }
            return formatted_result
        else:
            return result
The _format_response method only formats the response if the type is "dataframe". If the type is not "dataframe", it returns the result as is. This could lead to inconsistent response formats. Consider handling other types as well.
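A hedged sketch of normalizing the non-dataframe types too, so every consumer can rely on the same two keys being present; which other types exist is an assumption, not confirmed by the PR:

```python
import pandas as pd


def format_response(result):
    """Normalize every response into {"type": ..., "value": ...} (sketch)."""
    if result["type"] == "dataframe":
        df = result["value"]
        return {
            "type": "dataframe",
            "value": {
                "headers": df.columns.tolist(),
                "rows": df.values.tolist(),
            },
        }
    # Non-dataframe results pass through, coerced to the same two-key
    # shape: "type" and "value" are always present, "value" may be None.
    return {"type": result["type"], "value": result.get("value")}
```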
Review Status
Actionable comments generated: 6
Files selected for processing (4)
- examples/custom_logger.py (1 hunks)
- pandasai/custom_loggers/api_logger.py (1 hunks)
- pandasai/helpers/query_exec_tracker.py (1 hunks)
- pandasai/smart_datalake/__init__.py (7 hunks)
Additional comments (Suppressed): 8
examples/custom_logger.py (1)
- 1-36: The new hunk demonstrates the usage of the APILogger class. It creates an instance of Agent with APILogger as the logger. The chat method is then called on the Agent instance, and the response is printed. The code seems to be logically correct and follows best practices. However, ensure that the "SERVER-URL" and "API-KEY" are replaced with actual values when using this code. Also, ensure that the "OPEN-API-KEY" is replaced with your actual OpenAI API key.
pandasai/smart_datalake/__init__.py (7)
- 25-29: The QueryExecTracker class is imported but the time module is no longer imported. This suggests that the QueryExecTracker class is being used to track the execution time of queries, which was previously done using the time module. Ensure that the QueryExecTracker class provides the same functionality as the time module and that it is used correctly throughout the code.
- 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of queries. Ensure that the removal of this method does not affect the functionality of the code and that the execution time of queries is still being tracked correctly.
- 282-288: A QueryExecTracker object is created and dataframes are added to it. This object is used to track the execution of queries. Ensure that the QueryExecTracker object is used correctly throughout the code and that it provides the same functionality as the previous method of tracking query execution.
- 366-368: The execute_func method of the QueryExecTracker object is used to execute functions while tracking their execution time. Ensure that this method is used correctly and that it provides the same functionality as the previous method of executing functions.
- 377-391: Steps are added to the QueryExecTracker object. These steps represent the different stages of the query execution process. Ensure that these steps are added correctly and that they accurately represent the query execution process.
- 407-407: The get_execution_time method of the QueryExecTracker object is used to get the execution time of the query. Ensure that this method provides the same functionality as the previous method of getting the execution time.
- 415-415: The execute_func method of the QueryExecTracker object is used to parse the result of the query, and the get_summary method is used to get a summary of the query execution. Ensure that these methods are used correctly and that they provide the same functionality as the previous methods of parsing the result and getting a summary.
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "api_key_id": self._api_key,
                    "json_log": message,
                }
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")
The APILogger class is a good example of a custom logger that sends log messages to a server. However, there are a few areas that could be improved:
- Error handling: In the log method, you're catching all exceptions and printing them. This is not a good practice because it can make debugging difficult. Instead, consider logging the error message and re-throwing the exception, or handling specific exceptions that you expect might occur.
- Security: You're sending the API key in the body of the POST request. This could potentially expose the API key to anyone who has access to the logs or network traffic. Consider sending the API key in the headers of the request instead.
- Performance: The log method sends a POST request every time it's called. This could potentially slow down your application if you're logging a lot of messages. Consider batching log messages and sending them in one request, or using a background thread to send log messages.
Here's how you might address these issues:
def log(self, message: Union[str, dict], level: int = logging.INFO):
try:
if isinstance(message, dict):
log_data = {
"json_log": message,
}
headers = {"Authorization": f"Bearer {self._api_key}"}
response = requests.post(
f"{self._server_url}/api/log/add", json=log_data, headers=headers
)
if response.status_code != 200:
raise Exception(response.text)
except Exception as e:
logging.error(f"Exception in APILogger: {e}")
raise
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None
    _success: bool = False
The class variables are shared across all instances of the class. This could lead to unexpected behavior if multiple instances of QueryExecTracker are created. These should be instance variables, defined inside the __init__ method.
- _query_info: str = {}
- _dataframes: List = []
- _response: ResponseType = {}
- _steps: List = []
- _start_time = None
- _success: bool = False
+ def __init__(
+ self,
+ conversation_id: str,
+ query: str,
+ instance: str,
+ output_type: str,
+ ) -> None:
+ self._query_info: str = {}
+ self._dataframes: List = []
+ self._response: ResponseType = {}
+ self._steps: List = []
+ self._start_time = None
+ self._success: bool = False
+ ...
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
        tag = kwargs.pop("tag", function.__name__)

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise
The execute_func method is doing too much. It's executing a function, timing it, handling exceptions, and adding steps. Consider breaking this down into smaller methods for better maintainability and readability.
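A sketch of one possible decomposition; the helper names _record_success and _record_failure are illustrative, not part of the PR:

```python
import time


class Tracker:
    """Sketch of a decomposed execute_func, not the PR's implementation."""

    def __init__(self):
        self._steps = []

    def execute_func(self, function, *args, **kwargs):
        # The public method keeps only the control flow ...
        tag = kwargs.pop("tag", function.__name__)
        start = time.time()
        try:
            result = function(*args, **kwargs)
        except Exception:
            self._record_failure(tag, time.time() - start)
            raise
        self._record_success(tag, result, time.time() - start)
        return result

    def _record_success(self, tag, result, elapsed):
        # ... while step bookkeeping lives in small, testable helpers.
        self._steps.append(
            {"type": tag, "success": True, "execution_time": elapsed}
        )

    def _record_failure(self, tag, elapsed):
        self._steps.append(
            {"type": tag, "success": False, "execution_time": elapsed}
        )
```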
            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result
The if tag not in exec_steps: check could lead to silent failures if a tag is misspelled or not included in exec_steps. Consider raising an exception or logging a warning if the tag is not found.
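A minimal sketch of surfacing the unknown tag instead of silently skipping it; warnings.warn is chosen for illustration, and the exec_steps contents below are an abbreviated assumption, not the full mapping from the PR:

```python
import warnings

# Abbreviated stand-in for the tracker's exec_steps mapping.
exec_steps = {"cache_hit": "Cache Hit", "generate_code": "Generate Code"}


def check_tag(tag):
    """Return whether the tag is tracked, warning loudly when it is not."""
    if tag not in exec_steps:
        warnings.warn(
            f"Unknown tracking tag {tag!r}; step will not be recorded"
        )
        return False
    return True
```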
        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise
When an exception occurs, the step is added with a success value of False, but the _success attribute of the QueryExecTracker instance is not updated. This could lead to inconsistencies in the reported success status.
except Exception:
execution_time = time.time() - start_time
self._steps.append(
{
"type": exec_steps[tag],
"success": False,
"execution_time": execution_time,
}
)
+ self._success = False
raise
    def get_summary(self) -> dict:
        """
        Returns the summary in json to steps involved in execution of track
        Returns:
            dict: summary json
        """
        execution_time = time.time() - self._start_time
        return {
            "query_info": self._query_info,
            "dataframes": self._dataframes,
            "steps": self._steps,
            "response": self._response,
            "execution_time": execution_time,
            "success": self._success,
        }
The get_summary method does not update the _success attribute before returning the summary. This could lead to inconsistencies if the success property setter is used after the summary is generated.
def get_summary(self) -> dict:
"""
Returns the summary in json to steps involved in execution of track
Returns:
dict: summary json
"""
execution_time = time.time() - self._start_time
+ self._success = all(step.get('success', False) for step in self._steps)
return {
"query_info": self._query_info,
"dataframes": self._dataframes,
"steps": self._steps,
"response": self._response,
"execution_time": execution_time,
"success": self._success,
}
import logging
from typing import Union
import requests
from pandasai.helpers.logger import Logger


class APILogger(Logger):
    _api_key: str = None
    _server_url: str = None

    def __init__(self, server_url: str, api_key: str):
        self._api_key = api_key
        self._server_url = server_url

    def log(self, message: Union[str, dict], level: int = logging.INFO):
        try:
            if isinstance(message, dict):
                log_data = {
                    "json_log": message,
                }
                headers = {"Authorization": f"Bearer {self._api_key}"}
                response = requests.post(
                    f"{self._server_url}/api/log/add", json=log_data, headers=headers
                )
                if response.status_code != 200:
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")
The APILogger class extends the Logger class but does not call the parent class's constructor. This could lead to issues if the parent class's constructor has important initializations. Also, the log method does not handle the case when the message is a string; it only handles the case when message is a dictionary. This could lead to unexpected behavior if a string message is passed. Lastly, the log method catches all exceptions and prints them, but it does not re-raise them. This could hide errors and make debugging difficult. Consider re-raising the exception after logging it.
class APILogger(Logger):
_api_key: str = None
_server_url: str = None
def __init__(self, server_url: str, api_key: str):
+ super().__init__()
self._api_key = api_key
self._server_url = server_url
def log(self, message: Union[str, dict], level: int = logging.INFO):
try:
if isinstance(message, dict):
log_data = {
"json_log": message,
}
headers = {"Authorization": f"Bearer {self._api_key}"}
response = requests.post(
f"{self._server_url}/api/log/add", json=log_data, headers=headers
)
if response.status_code != 200:
raise Exception(response.text)
+ else:
+ log_data = {
+ "text_log": message,
+ }
+ headers = {"Authorization": f"Bearer {self._api_key}"}
+ response = requests.post(
+ f"{self._server_url}/api/log/add", json=log_data, headers=headers
+ )
+ if response.status_code != 200:
+ raise Exception(response.text)
except Exception as e:
print(f"Exception in APILogger: {e}")
+ raise
Review Status
Actionable comments generated: 8
Files selected for processing (5)
- examples/using_pandasai_log_server.py (1 hunks)
- pandasai/helpers/query_exec_tracker.py (1 hunks)
- pandasai/schemas/df_config.py (3 hunks)
- pandasai/smart_datalake/__init__.py (7 hunks)
- tests/test_query_tracker.py (1 hunks)
Files skipped from review due to trivial changes (1)
- examples/using_pandasai_log_server.py
Additional comments (Suppressed): 10
pandasai/schemas/df_config.py (2)
- 8-14: The LogServerConfig class has been added. It's a TypedDict that expects server_url and api_key as keys. Ensure that the server URL and API key are being properly validated and secured elsewhere in the code, as they are sensitive data.
- 34-34: The log_server attribute has been added to the Config class. It's of type LogServerConfig. Ensure that all instances of the Config class in the codebase are updated to include this new attribute if necessary.
  - llm: Any = None
  + llm: Any = None
  + log_server: LogServerConfig = None
pandasai/smart_datalake/__init__.py (8)
- 20-29: The time module has been removed and replaced with the QueryExecTracker class for tracking execution time. Ensure that the QueryExecTracker class provides the same or better functionality as the time module. Also, verify that the QueryExecTracker class has been thoroughly tested and is reliable.
- 193-195: The _start_timer method has been removed. This method was previously used to start a timer for tracking the execution time of a query. This functionality appears to have been replaced by the QueryExecTracker class. Ensure that the QueryExecTracker class provides the same or better functionality as the _start_timer method.
- 282-288: The QueryExecTracker object is created with several parameters. Ensure that these parameters are always available and correctly set when this method is called. Also, verify that the QueryExecTracker class handles these parameters correctly.
- 300-326: The QueryExecTracker class is used to execute functions and track their execution time. This is a significant change from the previous implementation, which directly called the functions. Ensure that the QueryExecTracker class correctly executes the functions and accurately tracks their execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
- 343-350: The QueryExecTracker class is used to execute the execute_code method of the _code_manager object. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
- 365-370: The QueryExecTracker class is used to execute the _retry_run_code method. Ensure that the QueryExecTracker class correctly executes this method and accurately tracks its execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
- 379-393: The QueryExecTracker class is used to add steps to the tracker. Ensure that the QueryExecTracker class correctly adds these steps and that the steps are accurately represented in the tracker.
- 409-417: The QueryExecTracker class is used to get the execution time and parse the result. Ensure that the QueryExecTracker class correctly performs these operations and accurately tracks the execution time. Also, verify that the QueryExecTracker class handles errors and exceptions correctly.
    _query_info: str = {}
    _dataframes: List = []
    _response: ResponseType = {}
    _steps: List = []
    _start_time = None
    _success: bool = False
    _server_config: dict = None
The class variables are shared among all instances of the class. This could lead to data races and inconsistencies if multiple instances of QueryExecTracker are used concurrently. Consider moving these variables to the __init__ method to make them instance variables.
- _query_info: str = {}
- _dataframes: List = []
- _response: ResponseType = {}
- _steps: List = []
- _start_time = None
- _success: bool = False
- _server_config: dict = None
+ def __init__(
+ self,
+ conversation_id: str,
+ query: str,
+ instance: str,
+ output_type: str,
+ server_config: Union[dict, None] = None,
+ ) -> None:
+ self._query_info: str = {}
+ self._dataframes: List = []
+ self._response: ResponseType = {}
+ self._steps: List = []
+ self._start_time = None
+ self._success: bool = False
+ self._server_config: dict = server_config
    def execute_func(self, function, *args, **kwargs) -> Any:
        """
        Tracks function executions, calculates execution time and prepare data
        Args:
            function (function): Function that is to be executed

        Returns:
            Any: Response return after function execution
        """
        start_time = time.time()

        # Get the tag from kwargs if provided, or use the function name as the default
        tag = kwargs.pop("tag", function.__name__)

        try:
            result = function(*args, **kwargs)

            execution_time = time.time() - start_time
            if tag not in exec_steps:
                return result

            step_data = self._generate_exec_step(tag, result)

            step_data["type"] = exec_steps[tag]
            step_data["success"] = True
            step_data["execution_time"] = execution_time

            self._steps.append(step_data)

            return result

        except Exception:
            execution_time = time.time() - start_time
            self._steps.append(
                {
                    "type": exec_steps[tag],
                    "success": False,
                    "execution_time": execution_time,
                }
            )
            raise
The execute_func method is doing too much. It's executing a function, timing it, generating execution steps, and handling exceptions. Consider breaking this method into smaller, more focused methods to improve maintainability and readability.
                    raise Exception(response.text)

        except Exception as e:
            print(f"Exception in APILogger: {e}")
Avoid using print statements for error logging. Consider using a logging library, which provides more flexibility and control over log levels, formats, and destinations.
- print(f"Exception in APILogger: {e}")
+ logging.error(f"Exception in APILogger: {e}")
    def success(self):
        return self._success

    @success.setter
    def success(self, value):
        self._success = value
The success property setter is not validating the input. Consider adding a check to ensure that value is a boolean, to prevent invalid data types from being assigned.
+ @success.setter
+ def success(self, value):
+ if not isinstance(value, bool):
+ raise ValueError("Success value must be a boolean.")
+ self._success = value
    def test_add_dataframes(
        self, smart_dataframe: SmartDataframe, tracker: QueryExecTracker
    ):
        # Add the dataframe to the tracker
        tracker._dataframes = []
        tracker.add_dataframes([smart_dataframe])

        # Check if the dataframe was added correctly
        assert len(tracker._dataframes) == 1
        assert len(tracker._dataframes[0]["headers"]) == 3
        assert len(tracker._dataframes[0]["rows"]) == 5
The test test_add_dataframes is not comprehensive. It only checks the length of the dataframe and the number of headers and rows. It would be better to also check the actual content of the dataframe to ensure it was added correctly.
+ # Check if the dataframe was added correctly
+ assert tracker._dataframes[0]["dataframe"].equals(smart_dataframe)
        def mock_function(*args, **kwargs):
            time.sleep(1)
            return "Mock Result"

        # Sleep for a while to simulate execution time
        with patch("time.time", return_value=0):
            tracker.execute_func(mock_function, tag="cache_hit")

        # Get the execution time
        execution_time = tracker.get_execution_time()

        # Check if the execution time is approximately 1 second
        assert_almost_equal(execution_time, 1.0, delta=0.3)
The test test_get_execution_time is not reliable. It assumes that the execution time of the mock_function is always 1 second. This might not be the case in different environments or under different loads. Consider mocking the time.sleep function instead of assuming its behavior.
- def mock_function(*args, **kwargs):
- time.sleep(1)
- return "Mock Result"
-
- # Sleep for a while to simulate execution time
- with patch("time.time", return_value=0):
- tracker.execute_func(mock_function, tag="cache_hit")
+ with patch("time.sleep") as mock_sleep:
+ def mock_function(*args, **kwargs):
+ time.sleep(1)
+ return "Mock Result"
+
+ # Sleep for a while to simulate execution time
+ with patch("time.time", return_value=0):
+ tracker.execute_func(mock_function, tag="cache_hit")
+ mock_sleep.assert_called_once_with(1)
    def test_publish_method_with_server_key(self, tracker: QueryExecTracker):
        # Define a mock summary function
        def mock_get_summary():
            return "Test summary data"

        # Mock the server_config
        tracker._server_config = {
            "server_url": "http://custom-server",
            "api_key": "custom-api-key",
        }

        # Set the get_summary method to your mock
        tracker.get_summary = mock_get_summary

        # Mock the requests.post method
        mock_response = MagicMock()
        mock_response.status_code = 200
        type(mock_response).text = "Response text"
        with patch("requests.post", return_value=mock_response) as mock_post:
            # Call the publish method
            result = tracker.publish()

        # Check that requests.post was called with the expected parameters
        mock_post.assert_called_with(
            "http://custom-server/api/log/add",
            json={"json_log": "Test summary data"},
            headers={"Authorization": "Bearer custom-api-key"},
        )

        # Check the result
        assert result is None  # The function should return None
The test test_publish_method_with_server_key is not comprehensive. It only checks if the requests.post method was called with the correct parameters. It would be better to also check the response of the requests.post method to ensure it behaves as expected.
+ # Check the response
+ assert mock_response.status_code == 200
+ assert mock_response.text == "Response text"
    def test_publish_method_with_os_env(self, tracker: QueryExecTracker):
        # Define a mock summary function
        def mock_get_summary():
            return "Test summary data"

        # Define a mock environment for testing
        os.environ["LOGGING_SERVER_URL"] = "http://test-server"
        os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"

        # Set the get_summary method to your mock
        tracker.get_summary = mock_get_summary

        # Mock the requests.post method
        mock_response = MagicMock()
        mock_response.status_code = 200
        type(mock_response).text = "Response text"
        with patch("requests.post", return_value=mock_response) as mock_post:
            # Call the publish method
            result = tracker.publish()

        # Check that requests.post was called with the expected parameters
        mock_post.assert_called_with(
            "http://test-server/api/log/add",
            json={"json_log": "Test summary data"},
            headers={"Authorization": "Bearer test-api-key"},
        )

        # Check the result
        assert result is None  # The function should return None
The test `test_publish_method_with_os_env` does not clean up the environment variables it sets. This could potentially affect other tests. Consider using `unittest.mock.patch.dict` to temporarily set the environment variables for this test.
-    # Define a mock environment for testing
-    os.environ["LOGGING_SERVER_URL"] = "http://test-server"
-    os.environ["LOGGING_SERVER_API_KEY"] = "test-api-key"
+    # Define a mock environment for testing
+    mock_env = {
+        "LOGGING_SERVER_URL": "http://test-server",
+        "LOGGING_SERVER_API_KEY": "test-api-key",
+    }
+    with patch.dict(os.environ, mock_env):
+        # Set the get_summary method to your mock
+        tracker.get_summary = mock_get_summary
+
+        # Mock the requests.post method
+        mock_response = MagicMock()
+        mock_response.status_code = 200
+        type(mock_response).text = "Response text"
+        with patch("requests.post", return_value=mock_response) as mock_post:
+            # Call the publish method
+            result = tracker.publish()
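For reference, a minimal sketch of why `patch.dict` addresses the cleanup concern — it restores `os.environ` to its prior state when the context exits. The variable name mirrors the test above; nothing else here comes from the codebase:

```python
import os
from unittest.mock import patch

# Start from a known state for the illustration.
os.environ.pop("LOGGING_SERVER_URL", None)

with patch.dict(os.environ, {"LOGGING_SERVER_URL": "http://test-server"}):
    # Inside the context, code under test sees the patched value.
    inside = os.environ["LOGGING_SERVER_URL"]

# On exit, patch.dict restores the original mapping, so later tests
# are not affected by leftover environment variables.
restored = "LOGGING_SERVER_URL" not in os.environ
```

This is why the suggested diff nests the whole body of the test inside the `with patch.dict(...)` block: the patched variables are guaranteed to be visible for the duration of the test and gone afterwards, even if an assertion fails.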
CodeRabbit review skipped: By default, CodeRabbit only reviews PRs on the default branch. If you wish to have PRs reviewed on additional branches, you can configure that under the repository settings in the UI.
Summary by CodeRabbit

- New `QueryExecTracker` class in `pandasai.helpers.query_exec_tracker` for tracking the execution of queries, maintaining execution steps, and generating a summary of the execution.
- New `APILogger` class in `pandasai.custom_loggers.api_logger` for sending log messages to a server.
- Updated `pandasai.smart_datalake` to use `QueryExecTracker` for tracking query execution, replacing the previous timer-based tracking.
- Added tests for `QueryExecTracker` in `tests/test_query_tracker.py` to ensure correct tracking of query execution.
- Added `examples/custom_logger.py` and `examples/using_pandasai_log_server.py` to demonstrate the usage of the new `APILogger` and `QueryExecTracker` classes.