feat : Airtable connector Support #635
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main     #635      +/-  ##
==========================================
- Coverage   84.43%   83.59%   -0.84%
==========================================
  Files          65       67       +2
  Lines        3019     3188     +169
==========================================
+ Hits         2549     2665     +116
- Misses        470      523      +53
```

... and 10 files with indirect coverage changes
Review Status

Actionable comments generated: 9

Files selected for processing (5)
- pandasai/connectors/__init__.py (2 hunks)
- pandasai/connectors/airtable.py (1 hunks)
- pandasai/connectors/base.py (1 hunks)
- pandasai/helpers/openai_info.py (2 hunks)
- tests/connectors/test_airtable.py (1 hunks)

Files skipped from review due to trivial changes (1)
- pandasai/connectors/__init__.py
Additional comments (Suppressed): 2

pandasai/connectors/base.py (1)
- 23-30: The `AirtableConnectorConfig` class is introduced to hold the configuration for the Airtable connector. Ensure that the `token` and `baseID` are securely stored and not logged or exposed in any way. Also, consider adding input validation or type checking for these fields to prevent misuse or misconfiguration.

pandasai/helpers/openai_info.py (1)
- 45-53: No changes have been made to the function signature or the docstring of `get_openai_token_cost_for_model()`. Ensure that the function's usage elsewhere in the codebase is still valid.
```python
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._root_url, self.root_url)
```
The test `test_constructor_and_properties` only checks whether the `_config` and `_root_url` properties are set correctly. It would be beneficial to also verify that other properties like `token`, `baseID`, and `table` are correctly set on the `AirtableConnector` instance.
tests/connectors/test_airtable.py
Outdated
```python
    ],
    "offset": "itrowYGFfoBEIob3C/recsxnHUagIce7nB2",
}
print(expected_data_json)
```
The `print` statement seems to be a leftover from debugging and should be removed to keep the test output clean.

```diff
- print(expected_data_json)
```
@Tanmaypatil123 this is a leftover!
Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.
tests/connectors/test_airtable.py
Outdated
```python
self.connector.head = Mock(return_value=expected_data)
head_data = self.connector.head()

self.assertEqual(head_data.equals(expected_data), True)
self.assertLessEqual(len(head_data), 5)
```
Similar to the `test_execute` method, in the `test_head` method you are mocking the `head` method of the connector and then calling it. This doesn't test the actual `head` implementation. Instead, you should mock the `requests` library method that `head` uses to make the API call, and then check whether `head` returns the expected data.
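As a sketch of what this suggests, one can patch the `requests` call the connector makes internally instead of replacing the method under test. The `MiniAirtableConnector` below is a hypothetical stand-in for the PR's connector, kept tiny so the mocking pattern itself is what the example shows:

```python
from unittest.mock import patch

import pandas as pd
import requests


class MiniAirtableConnector:
    """Hypothetical stand-in for the PR's AirtableConnector."""

    def __init__(self, api_key, base_id, table):
        self._url = f"https://api.airtable.com/v0/{base_id}/{table}"
        self._api_key = api_key

    def head(self):
        # The real implementation under test: it performs the HTTP call itself.
        response = requests.get(
            self._url, headers={"Authorization": f"Bearer {self._api_key}"}
        )
        records = response.json()["records"]
        return pd.DataFrame([r["fields"] for r in records]).head()


# Patch requests.get, not connector.head, so head()'s own logic is exercised.
@patch("requests.get")
def test_head_parses_api_response(mock_get):
    mock_get.return_value.status_code = 200
    mock_get.return_value.json.return_value = {
        "records": [
            {"id": "rec1", "fields": {"Name": "Launch", "Status": "Done"}},
            {"id": "rec2", "fields": {"Name": "Research", "Status": "In progress"}},
        ]
    }
    head_data = MiniAirtableConnector("fake-key", "appXYZ", "tasks").head()
    assert len(head_data) == 2
    assert list(head_data.columns) == ["Name", "Status"]


test_head_parses_api_response()
```

Because the mock replaces only the HTTP boundary, the record-to-DataFrame logic inside `head()` is actually executed, which is what the review asks for.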
pandasai/connectors/airtable.py
Outdated
```python
def _init_connection(self, config: BaseConnectorConfig):
    """
    make connection to database
    """
    config = config.dict()
    _session = requests.Session()
    _session.headers = {"Authorization": f"Bearer {config['token']}"}
    url = f"{self._root_url}{config['baseID']}/{config['table']}"
    response = _session.head(url=url)
    if response.status_code == 200:
        self._session = _session
    else:
        raise ValueError(
            f"""Failed to connect to Airtable.
            Status code: {response.status_code},
            message: {response.text}"""
        )
```
The `_init_connection` method is making a `HEAD` request to the Airtable API to check the connection. This is not a common practice and might lead to unnecessary network traffic. Consider removing this check or replacing it with a more lightweight method, if available in the Airtable API.
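If the connection check is kept, a lighter variant is to request a single record via Airtable's `maxRecords` query parameter rather than issuing a separate HEAD call. The sketch below is illustrative only: the function and argument names are assumptions, and it presumes the endpoint honors `maxRecords`:

```python
import requests


def check_airtable_connection(root_url, base_id, table, api_key, timeout=10):
    """Sketch: validate credentials by fetching at most one record."""
    url = f"{root_url}{base_id}/{table}"
    response = requests.get(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        params={"maxRecords": 1},  # keep the response payload minimal
        timeout=timeout,
    )
    if response.status_code != 200:
        raise ValueError(
            f"Failed to connect to Airtable. "
            f"Status code: {response.status_code}, message: {response.text}"
        )
    return response
```

This also exercises the same GET path that `execute` will use, so a passing check is more meaningful than a HEAD probe.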
pandasai/connectors/airtable.py
Outdated
```python
def head(self):
    """
    Return the head of the table that
    the connector is connected to.

    Returns:
        DataFrameType: The head of the data source
        that the connector is connected to.
    """
    return self._response.head()
```
The `head` method is returning the head of the `_response` DataFrame. However, it's not checking whether `_response` is `None` or empty. This could lead to an `AttributeError` if `head` is called before `execute`. Consider adding a check for this.
```python
def head(self):
    """
    Return the head of the table that
    the connector is connected to.

    Returns:
        DataFrameType: The head of the data source
        that the connector is connected to.
    """
    if self._response is not None:
        return self._response.head()
    else:
        raise ValueError("No data available. Execute the connector first.")
```
Review Status

Actionable comments generated: 2

Files selected for processing (1)
- tests/connectors/test_airtable.py (1 hunks)

Additional comments (Suppressed): 1

tests/connectors/test_airtable.py (1)
- 11-13: The `dict()` method is used on the `AirtableConnectorConfig` instance, but it's not clear if this method is defined in the `AirtableConnectorConfig` class. If it's not defined, this will raise an `AttributeError`. Please verify.
tests/connectors/test_airtable.py
Outdated
```python
self.connector.execute = Mock(return_value=expected_data)
execute_data = self.connector.execute()
self.assertEqual(execute_data.equals(expected_data), True)
```
The `execute` method is mocked to return `expected_data`, and then it's called and the returned data is compared with `expected_data`. This test will always pass and doesn't test the actual functionality of the `execute` method. Consider mocking the `requests` library's methods instead, and then test whether `execute` returns the correct data.
tests/connectors/test_airtable.py
Outdated
```python
self.connector.head = Mock(return_value=expected_data)
head_data = self.connector.head()
```
Similar to the `execute` method, the `head` method is mocked to return `expected_data`, and then it's called and the returned data is compared with `expected_data`. This test will always pass and doesn't test the actual functionality of the `head` method. Consider mocking the `requests` library's methods instead, and then test whether `head` returns the correct data.
@Tanmaypatil123 mocking the method that you are testing is a bad practice, as it always returns the value you are forcing it to return.
Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.
Let's try to increase the test coverage and let's add an example (from_airtable.py) and also let's add it on the connectors.md file
```python
def __init__(
    self,
    config: Optional[Union[AirtableConnectorConfig, dict]] = None,
```
Let's add a cache mechanism similar to the other connectors!
@Tanmaypatil123 check out yahoo finance connector for references
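The pattern the reviewers point at could be sketched as a small time-based cache. The names below are illustrative (this is not the yahoo finance connector's actual code), and `to_pickle` stands in for the parquet format the PR ends up using, to keep the sketch dependency-free:

```python
import os
import time

import pandas as pd


class TimedCache:
    """Illustrative cache: reuse a saved DataFrame until it expires."""

    def __init__(self, cache_dir, cache_interval=600):
        self._cache_dir = cache_dir
        self._cache_interval = cache_interval  # seconds

    def _path(self, table):
        return os.path.join(self._cache_dir, f"{table}_data.pkl")

    def load(self, table):
        path = self._path(table)
        if not os.path.exists(path):
            return None
        # Expired entries are deleted and treated as a miss.
        if os.path.getmtime(path) < time.time() - self._cache_interval:
            os.remove(path)
            return None
        return pd.read_pickle(path)

    def save(self, table, df):
        os.makedirs(self._cache_dir, exist_ok=True)
        df.to_pickle(self._path(table))
```

A connector's `execute()` would then check `load()` first and only hit the Airtable API on a cache miss, saving the fresh result with `save()`.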
pandasai/connectors/airtable.py
Outdated
```python
elif not config:
    airtable_env_vars = {
        "token": "AIRTABLE_AUTH_TOKEN",
        "baseID": "AIRTABLE_BASE_ID",
```
Could we make `baseID` snake case (`base_id`)?
pandasai/connectors/airtable.py
Outdated
```python
    make connection to database
    """
    config = config.dict()
    _session = requests.Session()
```
Why is `_session` underscored?
```python
            message: {_response.text}"""
        )
    return self._response
```
Let's also add a `def fallback_name`, similarly to all the other connectors.

Example:

```python
@property
def fallback_name(self):
    """
    Return the fallback name of the connector.

    Returns:
        str: The fallback name of the connector.
    """
    return self._config.table
```
pandasai/connectors/airtable.py
Outdated
```python
        DataFrameType: The head of the data source
        that the connector is connected to.
    """
    return self._response.head()
```
The `head` method should be independent. What happens if `_response` hasn't been defined yet?
pandasai/connectors/airtable.py
Outdated
```python
elif not config:
    airtable_env_vars = {
        "token": "AIRTABLE_AUTH_TOKEN",
```
Let's rename it to `api_key` to be consistent with the variable names in the LLMs.
pandasai/connectors/base.py
Outdated
```python
    Connector configuration for Airtable data.
    """

    token: str
```
api_key
pandasai/connectors/base.py
Outdated
""" | ||
|
||
token: str | ||
baseID: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`base_id`

Additionally, we should fall back to the env variables for base id and token, similarly to what we've done in the other connectors (i.e. mysql).

Btw, did you try it? It doesn't seem to work for me.
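The env-var fallback could look like the sketch below; `populate_config_from_env` is an illustrative helper, not pandasai's actual one, and the variable names follow the ones discussed in this thread:

```python
import os

AIRTABLE_ENV_VARS = {
    "api_key": "AIRTABLE_API_TOKEN",
    "base_id": "AIRTABLE_BASE_ID",
    "table": "AIRTABLE_TABLE_NAME",
}


def populate_config_from_env(config, env_vars):
    """For each missing config key, fall back to the matching env variable."""
    populated = dict(config or {})
    for key, env_name in env_vars.items():
        if not populated.get(key):
            value = os.getenv(env_name)
            if value is None:
                raise ValueError(
                    f"Config key '{key}' not provided and env var '{env_name}' not set"
                )
            populated[key] = value
    return populated
```

Explicit config values win; environment variables only fill the gaps, matching the behavior the reviewer describes for the mysql connector.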
Review Status
Actionable comments generated: 11
Files selected for processing (5)
- docs/connectors.md (1 hunks)
- examples/from_airtable.py (1 hunks)
- pandasai/connectors/airtable.py (1 hunks)
- pandasai/connectors/base.py (1 hunks)
- tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 8

pandasai/connectors/base.py (2)
- 23-30: The new `AirtableConnectorConfig` class is introduced to handle the configuration for the Airtable connector. Ensure that the `api_key`, `base_id`, and `database` attributes are correctly used in the `AirtableConnector` class and that the `api_key` and `base_id` are properly secured and not exposed in any logs or error messages.
- 33-35: The `SQLBaseConnectorConfig` class is unchanged. No action required.

examples/from_airtable.py (4)
- 6-16: The `AirtableConnector` is initialized with a configuration dictionary. Ensure that the `api_key`, `table`, and `base_id` values are replaced with actual values before running the code. Also, the `where` clause is optional and can be used to filter the data. If not needed, it can be removed.
- 19-19: The `OpenAI` object is initialized with an API key. Make sure to replace `"OPENAI_API_KEY"` with your actual OpenAI API key.
- 20-20: The `SmartDataframe` is initialized with the `AirtableConnector` and a configuration dictionary. The `llm` key in the configuration dictionary is set to the `OpenAI` object. This is a good practice as it allows for easy configuration of different components.
- 22-22: The `chat` method is called on the `SmartDataframe` object with a query string. This is a good practice as it allows for easy querying of the data. However, ensure that the query string is relevant to your data.

docs/connectors.md (2)
- 187-189: The new hunk starts with a continuation of the previous section. Ensure that the context is clear for readers.
- 193-194: The introduction of the Airtable connector is clear and concise, explaining its purpose and usage.
tests/connectors/test_airtable.py
Outdated
```python
@patch("requests.get")
def test_columns_count_property(self, mock_request_get):
    # Test columns_count property
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    rows_count = self.connector.columns_count
    self.assertEqual(rows_count, 3)
```
In the `test_columns_count_property` method, you are testing the `columns_count` property. This is a good practice. However, consider adding a test case for when the JSON response does not contain the expected data, to ensure your code handles such scenarios correctly.
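Such a negative test might look like the sketch below, where `records_to_dataframe` is a hypothetical stand-in for the connector's preprocessing step; the malformed payload imitates an error body instead of the expected `records` key:

```python
import pandas as pd


def records_to_dataframe(payload):
    """Hypothetical stand-in for the connector's preprocess step."""
    if "records" not in payload:
        raise ValueError("Unexpected Airtable response: missing 'records' key")
    return pd.DataFrame([r.get("fields", {}) for r in payload["records"]])


def test_malformed_payload_raises():
    # A payload without "records" should fail loudly, not crash later
    # with an opaque KeyError deep inside pandas.
    try:
        records_to_dataframe({"error": {"type": "AUTHENTICATION_REQUIRED"}})
    except ValueError as exc:
        assert "records" in str(exc)
    else:
        raise AssertionError("expected ValueError for a payload without 'records'")


test_malformed_payload_raises()
```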
```python
from pandasai.connectors import AirtableConnector
from pandasai import SmartDataframe


airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "=", "In progress"]
        ]
    }
)

df = SmartDataframe(airtable_connectors)

df.chat("How many rows are there in data ?")
```
The code snippet demonstrates the usage of the Airtable connector. It's clear and easy to understand. However, the `api_key`, `table`, and `base_id` are hardcoded. In a real-world scenario, these should be stored securely and not exposed in the code.

```diff
 airtable_connectors = AirtableConnector(
     config={
-        "api_key": "AIRTABLE_API_TOKEN",
-        "table": "AIRTABLE_TABLE_NAME",
-        "base_id": "AIRTABLE_BASE_ID",
+        "api_key": os.getenv("AIRTABLE_API_TOKEN"),
+        "table": os.getenv("AIRTABLE_TABLE_NAME"),
+        "base_id": os.getenv("AIRTABLE_BASE_ID"),
         "where": [
             # this is optional and filters the data to
             # reduce the size of the dataframe
             ["Status", "=", "In progress"]
         ]
     }
 )
```

This change uses the `os.getenv` function to retrieve the values from environment variables, which is a common practice for handling sensitive data like API keys.
pandasai/connectors/airtable.py
Outdated
```python
if config["api_key"] and config["base_id"] and config["table"]:
    config = AirtableConnectorConfig(**config)
```
The code assumes that if `config` is a dictionary, it will always contain the keys `"api_key"`, `"base_id"`, and `"table"`. If any of these keys are missing, a `KeyError` will be raised. Consider adding error handling to check for the presence of these keys before accessing them.

```diff
- if config["api_key"] and config["base_id"] and config["table"]:
+ if "api_key" in config and "base_id" in config and "table" in config:
```
pandasai/connectors/airtable.py
Outdated
```python
    url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
    self.logger.log(
        """
        Connected to Airtable.
        """
    )
else:
    raise ValueError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
```
The code does not handle potential exceptions that could be raised by the `requests.head` call. Consider adding a try-except block to handle potential `requests.exceptions.RequestException` exceptions.

```diff
- response = requests.head(
-     url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
- )
+ try:
+     response = requests.head(
+         url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")
```
pandasai/connectors/airtable.py
Outdated
```python
response = requests.get(
    url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
)
if response.status_code == 200:
    data = response.json()
    data = self.preprocess(data=data)
    self._save_cache(data)
else:
    raise ValueError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
```
The code does not handle potential exceptions that could be raised by the `requests.get` call. Consider adding a try-except block to handle potential `requests.exceptions.RequestException` exceptions.

```diff
- response = requests.get(
-     url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
- )
+ try:
+     response = requests.get(
+         url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")
```
pandasai/connectors/airtable.py
Outdated
```python
response = requests.get(
    url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
)
if response.status_code == 200:
    data = response.json()
    data = self.preprocess(data=data)
else:
    raise ValueError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
```
The code does not handle potential exceptions that could be raised by the `requests.get` call. Consider adding a try-except block to handle potential `requests.exceptions.RequestException` exceptions.

```diff
- response = requests.get(
-     url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
- )
+ try:
+     response = requests.get(
+         url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")
```
Review Status
Actionable comments generated: 10
Files selected for processing (5)
- docs/connectors.md (1 hunks)
- examples/from_airtable.py (1 hunks)
- pandasai/connectors/airtable.py (1 hunks)
- pandasai/connectors/base.py (1 hunks)
- pandasai/exceptions.py (1 hunks)
Additional comments (Suppressed): 5

pandasai/exceptions.py (2)
- 8-16: The new `InvalidRequestError` exception is introduced to handle unsuccessful requests. Ensure that this exception is properly caught and handled in the code where requests are made. Also, the docstring for the exception could be improved to provide more information about when and why this exception might be raised.
- 18-20: The `APIKeyNotFoundError` exception has been moved down in the code. Ensure that this does not affect any imports or usage of this exception in other parts of the codebase.

pandasai/connectors/base.py (1)
- 23-33: The `AirtableConnectorConfig` class is well defined with appropriate attributes. However, ensure that the `api_key`, `base_id`, and `database` are properly validated before use to prevent potential security issues or errors.

docs/connectors.md (1)
- 187-189: The new hunk starts with a code snippet that seems to be a continuation from a previous section. Ensure that this transition is smooth and the context is clear to the reader.

pandasai/connectors/airtable.py (1)
- 149-163: The code makes a GET request to the Airtable API to fetch data. However, it does not handle potential network errors that might occur during the request.
examples/from_airtable.py
Outdated
```python
airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "==", "In progress"]
        ],
    }
```
The `AirtableConnector` is initialized with a hardcoded configuration. This might not be an issue if this is just an example, but in a production environment, sensitive data like `api_key` and `base_id` should be stored securely and not hardcoded. Consider using environment variables or a secure configuration management system.
```python
    }
)

llm = OpenAI("OPENAI_API_KEY")
```
The `OpenAI` instance is initialized with a hardcoded API key. As with the `AirtableConnector`, this is a security concern. API keys should be stored securely and not hardcoded.
```python
llm = OpenAI("OPENAI_API_KEY")
df = SmartDataframe(airtable_connectors, config={"llm": llm})

response = df.chat("How many rows are there in data ?")
```
The `chat` method is called on the `SmartDataframe` object without any error handling. If an error occurs during the execution of this method, it could cause the program to crash. Consider wrapping this call in a try/except block to handle potential exceptions.
```diff
@@ -20,6 +20,17 @@ class BaseConnectorConfig(BaseModel):
     where: list[list[str]] = None
```
The `where` parameter is not used in the `AirtableConnectorConfig` class. If it's not needed, consider removing it to avoid confusion.
```python
class SQLBaseConnectorConfig(BaseConnectorConfig):
    """
    Base Connector configuration.
```
The `SQLBaseConnectorConfig` class seems to be duplicated. If it's not intended, consider removing it to maintain code cleanliness.
docs/connectors.md
Outdated
The Airtable connector allows you to connect to Airtable Projects Tables, by simply passing the `base_id`, `api_key` and `table_name` of the table you want to analyze.

To use the Airtable connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object:

```python
from pandasai.connectors import AirtableConnector
from pandasai import SmartDataframe


airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "==", "In progress"]
        ]
    }
)

df = SmartDataframe(airtable_connectors)

df.chat("How many rows are there in data ?")
```
The new hunk introduces the Airtable connector and provides a code snippet demonstrating its usage. The code seems to be correct and well-explained. However, the `api_key`, `table`, and `base_id` are hardcoded strings. It's recommended to use environment variables or secure storage for sensitive data like API keys.

```diff
 airtable_connectors = AirtableConnector(
     config={
-        "api_key": "AIRTABLE_API_TOKEN",
-        "table": "AIRTABLE_TABLE_NAME",
-        "base_id": "AIRTABLE_BASE_ID",
+        "api_key": os.getenv("AIRTABLE_API_TOKEN"),
+        "table": os.getenv("AIRTABLE_TABLE_NAME"),
+        "base_id": os.getenv("AIRTABLE_BASE_ID"),
         "where": [
             # this is optional and filters the data to
             # reduce the size of the dataframe
             ["Status", "==", "In progress"]
         ]
     }
 )
```

Don't forget to import the `os` module at the beginning of your script.
```python
    url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
    self.logger.log(
        """
        Connected to Airtable.
        """
    )
else:
    raise InvalidRequestError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
```
The code makes a HEAD request to the Airtable API to check the connection. However, it does not handle potential network errors that could occur during the request, such as a timeout or a connection error. It would be better to wrap the request in a try/except block and handle these potential errors.
Review Status

Actionable comments generated: 5

Files selected for processing (2)
- pandasai/connectors/airtable.py (1 hunks)
- pandasai/connectors/base.py (1 hunks)

Additional comments (Suppressed): 2

pandasai/connectors/base.py (2)
- 23-30: The `AirtableConnectorConfig` class has been added to handle the configuration for the Airtable connector. Ensure that the `api_key`, `base_id`, and `database` attributes are correctly used in the `AirtableConnector` class and that the `api_key` and `base_id` are properly secured and not exposed in any logs or error messages.
- 33-35: The `SQLBaseConnectorConfig` class has not been modified in this hunk. Ensure that any changes to this class in other hunks do not affect its current functionality.
pandasai/connectors/airtable.py
Outdated
```python
cache_path = self._get_cache_path()
if not os.path.exists(cache_path):
    return None

# If the file is older than the cache interval, delete it.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
    if self.logger:
        self.logger.log(f"Deleting expired cached data from {cache_path}")
    os.remove(cache_path)
    return None
```
The cache invalidation logic is based on the modification time of the cache file. However, this approach might not work as expected if the system time is changed. A more reliable approach would be to store the cache creation time within the cache file itself and use that for invalidation.
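One way to do that, sketched with a pickled `created_at`/`data` payload rather than the PR's bare parquet file, so the creation time travels with the data instead of relying on filesystem metadata (the helper names are illustrative):

```python
import os
import pickle
import time

import pandas as pd


def save_cache(path, df):
    # Store the creation time inside the payload itself, so expiry
    # does not depend on the file's mtime.
    with open(path, "wb") as fh:
        pickle.dump({"created_at": time.time(), "data": df}, fh)


def load_cache(path, max_age_seconds=600):
    if not os.path.exists(path):
        return None
    with open(path, "rb") as fh:
        payload = pickle.load(fh)
    # Expired entries are deleted and treated as a miss.
    if time.time() - payload["created_at"] > max_age_seconds:
        os.remove(path)
        return None
    return payload["data"]
```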
pandasai/connectors/airtable.py
Outdated
```python
url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
response = requests.get(
    url=url,
    headers={"Authorization": f"Bearer {self._config.api_key}"},
)
if response.status_code == 200:
    data = response.json()
    data = self.preprocess(data=data)
    self._save_cache(data)
else:
    raise InvalidRequestError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
```
The code does not handle potential network errors that might occur during the `requests.get` call. It would be better to wrap this call in a try-except block and handle potential network errors.

```diff
- response = requests.get(
-     url=url,
-     headers={"Authorization": f"Bearer {self._config.api_key}"},
- )
- if response.status_code == 200:
-     data = response.json()
-     data = self.preprocess(data=data)
-     self._save_cache(data)
- else:
-     raise InvalidRequestError(
-         f"""Failed to connect to Airtable.
-         Status code: {response.status_code},
-         message: {response.text}"""
-     )
+ try:
+     response = requests.get(
+         url=url,
+         headers={"Authorization": f"Bearer {self._config.api_key}"},
+     )
+     response.raise_for_status()
+     data = response.json()
+     data = self.preprocess(data=data)
+     self._save_cache(data)
+ except requests.exceptions.RequestException as e:
+     raise InvalidRequestError(
+         f"""Failed to connect to Airtable.
+         Error: {str(e)}"""
+     )
```
pandasai/connectors/airtable.py
Outdated
```python
if self._config.where:
    for i in self._config.where:
        filter_string = f"{i[0]} {i[1]} '{i[2]}'"
        df = df.query(filter_string)
```
The code assumes that the 'where' attribute in the config is a list of conditions. However, it does not validate if this is the case. If 'where' is not a list or does not contain valid conditions, the code will fail at runtime. It would be better to add a check to ensure 'where' is a list and contains valid conditions.
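A small validation pass could catch that early. The operator allow-list below is an assumption, mirroring the `["Status", "=", "In progress"]` triples used in this PR's examples; the function name is illustrative:

```python
ALLOWED_OPERATORS = {"=", "==", "!=", "<", "<=", ">", ">="}


def validate_where(where):
    """Raise early unless 'where' is a list of [column, operator, value] triples."""
    if where is None:
        return
    if not isinstance(where, list):
        raise ValueError("'where' must be a list of conditions")
    for condition in where:
        if not (isinstance(condition, (list, tuple)) and len(condition) == 3):
            raise ValueError(
                f"Each condition needs [column, operator, value], got: {condition!r}"
            )
        column, operator, _value = condition
        if not isinstance(column, str) or operator not in ALLOWED_OPERATORS:
            raise ValueError(f"Invalid condition: {condition!r}")
```

Calling this once in `__init__` (or at the top of `execute`) turns a confusing runtime failure inside `df.query` into a clear configuration error.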
def __init__(
    self,
    config: Optional[Union[AirtableConnectorConfig, dict]] = None,
    cache_interval: int = 600,
):
The `cache_interval` parameter is not documented in the function docstring. Please add a description for it.
if isinstance(config, dict):
    if "api_key" in config and "base_id" in config and "table" in config:
        config = AirtableConnectorConfig(**config)
    else:
        raise KeyError(
            "Please specify all api_key,table,base_id properly in config ."
        )
The check for the presence of "api_key", "base_id", and "table" in the config dictionary verifies only that the keys exist, not that their values are usable. A key mapped to an empty value will pass the check and fail later. Consider also validating that each value is non-empty.
elif not config:
    airtable_env_vars = {
        "api_key": "AIRTABLE_API_TOKEN",
        "base_id": "AIRTABLE_BASE_ID",
        "table": "AIRTABLE_TABLE_NAME",
    }
    config = AirtableConnectorConfig(
        **self._populate_config_from_env(config, airtable_env_vars)
    )
The `config` parameter is not used in the `_populate_config_from_env` method call. If it's not needed, consider removing it.
cache_dir = os.path.join(os.getcwd(), "")
try:
    cache_dir = os.path.join((find_project_root()), "cache")
except ValueError:
    cache_dir = os.path.join(os.getcwd(), "cache")
return os.path.join(cache_dir, f"{self._config.table}_data.parquet")
The `_get_cache_path` method does not handle potential `os` exceptions. Consider adding a try-except block to handle possible file system errors.
pandasai/connectors/airtable.py
Outdated
""" | ||
Feteches data from airtable server through | ||
API and converts it to DataFrame. | ||
""" | ||
url = f"{self._root_url}{self._config.base_id}/{self._config.table}" | ||
params = {} | ||
|
||
if self._config.where: | ||
params["filterByFormula"] = self.build_formula() | ||
response = requests.get( | ||
url=url, | ||
headers={"Authorization": f"Bearer {self._config.api_key}"}, | ||
params=params, | ||
) | ||
if response.status_code == 200: | ||
data = response.json() | ||
data = self.preprocess(data=data) | ||
self._save_cache(data) | ||
else: | ||
raise InvalidRequestError( | ||
f"""Failed to connect to Airtable. | ||
Status code: {response.status_code}, | ||
message: {response.text}""" | ||
) |
The `fetch_data` method does not handle potential `requests` exceptions. Consider adding a try-except block to handle possible network errors or timeouts.
df = pd.DataFrame(records)
return df

def head(self):
The result should be cached using `@cache`; for reference, check `sql.py`.
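The suggestion can be sketched with `functools.cache`, which memoizes per instance (the cache is keyed on `self`), so the expensive fetch runs only once per connector:

```python
from functools import cache

class Connector:
    fetches = 0  # counts how often the underlying fetch runs

    @cache
    def head(self):
        Connector.fetches += 1
        return [1, 2, 3, 4, 5]  # stand-in for the fetched rows

c = Connector()
c.head()
c.head()  # served from the cache; no second fetch
```

One trade-off to note: `functools.cache` on a method keeps a reference to the instance alive for the lifetime of the cache.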
pandasai/connectors/airtable.py
Outdated
    DataFrameType: The head of the data source
    that the connector is connected to.
    """
    return self.fetch_data().head()
Fetch only a few records here, say the first 5 rows, instead of the whole table.
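Airtable's list-records endpoint accepts a `maxRecords` query parameter, so `head()` can ask the server for just a few rows instead of downloading everything. A sketch with an injectable `getter` (defaulting to `requests.get`) so it can be exercised without a network call; the function name is illustrative:

```python
def head_records(base_id, table, api_key, n=5, getter=None):
    """Fetch only the first `n` records of an Airtable table."""
    if getter is None:  # fall back to requests.get at call time
        import requests
        getter = requests.get
    url = f"https://api.airtable.com/v0/{base_id}/{table}"
    response = getter(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        params={"maxRecords": n},  # server-side row limit
    )
    response.raise_for_status()
    return response.json().get("records", [])
```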
    int: The number of rows in the data source that the connector is
    connected to.
    """
    data = self.execute()
Keep `rows_count` as a property and store the value on the instance so we don't have to fetch the data again and again.
pandasai/connectors/airtable.py
Outdated
    int: The number of columns in the data source that the connector is
    connected to.
    """
    data = self.execute()
Same as `rows_count`: avoid recomputing on every call. For this one, use the `head()` result instead.
pandasai/connectors/airtable.py
Outdated
    int: The hash code that is unique to the columns of the data source
    that the connector is connected to.
    """
    data = self.execute()
The hash should also include the `where` queries. Otherwise it will return wrong data from the cache.
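A sketch of a hash that mixes the filter conditions into the key alongside the column names, so differently filtered queries get distinct cache entries (the function name and key layout are illustrative):

```python
import hashlib

def column_hash(columns, where=None):
    """Hash column names plus any `where` conditions."""
    key = "|".join(columns)
    if where:
        key += "|" + "|".join(f"{c}{op}{v}" for c, op, v in where)
    return hashlib.sha256(key.encode()).hexdigest()

unfiltered = column_hash(["id", "name"])
filtered = column_hash(["id", "name"], where=[("name", "=", "quarter")])
# Different filters now produce different cache keys.
```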
pandasai/connectors/airtable.py
Outdated
Returns:
    DataFrameType: The result of the connector.
"""
return self.fetch_data()
You have cached the data but never use it. You can check for the cache like:
cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
return pd.read_parquet(cached)
Review Status
Actionable comments generated: 10
Files selected for processing (3)
- docs/connectors.md (1 hunks)
- examples/from_airtable.py (1 hunks)
- pandasai/connectors/airtable.py (1 hunks)
Files skipped from review due to trivial changes (1)
- docs/connectors.md
Additional comments (Suppressed): 3
examples/from_airtable.py (3)
- 6-16: The `AirtableConnector` is initialized with a configuration dictionary. Ensure that the `api_key`, `table`, and `base_id` values are replaced with actual values before running the script. Also, the `where` clause is optional and can be used to filter the data. If not needed, it can be removed.
- 19-19: The `OpenAI` class is initialized with an API key. Make sure to replace `"OPENAI_API_KEY"` with your actual OpenAI API key.
- 22-22: The `chat` method is called on the `SmartDataframe` object. Ensure that the question passed to the `chat` method is relevant to your data.
elif not config:
    airtable_env_vars = {
        "api_key": "AIRTABLE_API_TOKEN",
        "base_id": "AIRTABLE_BASE_ID",
        "table": "AIRTABLE_TABLE_NAME",
    }
    config = AirtableConnectorConfig(
        **self._populate_config_from_env(config, airtable_env_vars)
    )
The code is not handling the case where the `config` is not a dictionary and is not `None`. This could lead to unexpected behavior if a user passes an invalid configuration. Consider adding an `else` clause to handle this case.
config = config.dict()
url = f"{self._root_url}{config['base_id']}/{config['table']}"
response = requests.head(
    url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
    self.logger.log(
        """
        Connected to Airtable.
        """
    )
else:
    raise InvalidRequestError(
        f"""Failed to connect to Airtable.
        Status code: {response.status_code},
        message: {response.text}"""
    )
The `_init_connection` method is making a HEAD request to the Airtable API to check the connection. This is a good practice, but it would be better to handle more specific HTTP status codes. For example, a 401 status code means that the API key is invalid, and a 404 status code means that the base or table does not exist. Handling these specific cases can provide more informative error messages to the user.
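A sketch of such a mapping (the messages are illustrative; the 401/403/404/429 semantics are standard HTTP):

```python
def describe_airtable_error(status_code, body=""):
    """Translate common Airtable HTTP errors into actionable messages."""
    messages = {
        401: "Invalid or expired API key.",
        403: "The API key lacks permission for this base or table.",
        404: "Base or table not found - check base_id and the table name.",
        429: "Rate limit exceeded - retry after a short delay.",
    }
    return messages.get(status_code, f"Unexpected error {status_code}: {body}")
```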
# If the file is older than the cache interval, delete it.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
    if self.logger:
        self.logger.log(f"Deleting expired cached data from {cache_path}")
    os.remove(cache_path)
    return None
The cache invalidation logic is based on the modification time of the cache file. This could lead to unexpected behavior if the cache file is manually modified. Consider using a separate metadata file or a database to store the cache creation time.
pandasai/connectors/airtable.py
Outdated
cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
    return pd.read_parquet(cached)

if isinstance(self.instance, pd.DataFrame):
    return self.instance
else:
    self.instance = self.fetch_data()
    return self.instance
The `execute` method is checking if `self.instance` is a DataFrame and returning it if it is. This could lead to unexpected behavior if `self.instance` is modified elsewhere in the code. Consider always fetching the data from the server or the cache in the `execute` method.
pandasai/connectors/airtable.py
Outdated
def fetch_data(self):
    """
    Fetches data from airtable server through
    API and converts it to DataFrame.
    """
    url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
    params = {}
    if self._config.where:
        params["filterByFormula"] = self.build_formula()
    response = requests.get(
        url=url,
        headers={"Authorization": f"Bearer {self._config.api_key}"},
        params=params,
    )
    if response.status_code == 200:
        data = response.json()
        data = self.preprocess(data=data)
        self._save_cache(data)
    else:
        raise InvalidRequestError(
            f"""Failed to connect to Airtable.
            Status code: {response.status_code},
            message: {response.text}"""
        )
The `fetch_data` method is making a GET request to the Airtable API to fetch the data. This is a good practice, but it would be better to handle more specific HTTP status codes, similar to the `_init_connection` method.
pandasai/connectors/airtable.py
Outdated
records = [
    {"id": record["id"], **record["fields"]} for record in data["records"]
]

df = pd.DataFrame(records)
return df
The `preprocess` method is transforming the JSON response from the Airtable API into a DataFrame. This is a good practice, but it assumes that every record in the response has an `id` field and a `fields` field. Consider adding error handling for cases where these fields are missing.
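A defensive version of the flattening step, using `dict.get` so a record missing `id` or `fields` does not raise `KeyError` (a sketch, not the PR's exact code):

```python
def flatten_records(data):
    """Flatten Airtable records into plain row dicts, tolerating missing keys."""
    rows = []
    for record in data.get("records", []):
        row = {"id": record.get("id")}
        row.update(record.get("fields", {}))  # empty dict when fields are absent
        rows.append(row)
    return rows

sample = {"records": [{"id": "rec1", "fields": {"name": "a"}}, {"id": "rec2"}]}
flatten_records(sample)  # the second record still yields a row, with only its id
```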
def head(self):
    """
    Return the head of the table that
    the connector is connected to.

    Returns :
        DataFrameType: The head of the data source
        that the connector is connected to.
    """
The `head` method is decorated with the `cache` decorator from the `functools` module, so the result is computed once and reused for subsequent calls. Note that because `head` takes no arguments besides `self`, the cache stores one result per instance and also keeps a reference to that instance alive; consider whether the decorator is worth this trade-off.
pandasai/connectors/airtable.py
Outdated
@cached_property
def rows_count(self):
    """
    Return the number of rows in the data source that the connector is
    connected to.

    Returns:
        int: The number of rows in the data source that the connector is
        connected to.
    """
    if self._rows_count is not None:
        return self._rows_count
    data = self.execute()
    self._rows_count = len(data)
    return self._rows_count

@cached_property
def columns_count(self):
    """
    Return the number of columns in the data source that the connector is
    connected to.

    Returns:
        int: The number of columns in the data source that the connector is
        connected to.
    """
    if self._columns_count is not None:
        return self._columns_count
    data = self.execute()
    self._columns_count = len(data.columns)
    return self._columns_count
The `rows_count` and `columns_count` methods are decorated with the `cached_property` decorator from the `functools` module. This is a good practice for properties that are expensive to compute. However, these methods are also storing their results in instance variables (`_rows_count` and `_columns_count`). This is redundant and could lead to inconsistencies if these instance variables are modified elsewhere in the code. Consider removing the instance variables and relying solely on the `cached_property` decorator.
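The pattern without the redundant instance variables, relying on `functools.cached_property` alone (it stores the computed value in the instance `__dict__` on first access, so later accesses skip the method entirely):

```python
from functools import cached_property

class Source:
    fetches = 0  # counts how often the expensive call runs

    def _execute(self):
        Source.fetches += 1
        return [[1, 2], [3, 4], [5, 6]]  # stand-in for the fetched DataFrame

    @cached_property
    def rows_count(self):
        return len(self._execute())

s = Source()
s.rows_count  # triggers one fetch
s.rows_count  # served from the instance cache; no second fetch
```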
def column_hash(self):
    """
    Return the hash code that is unique to the columns of the data source
    that the connector is connected to.

    Returns:
        int: The hash code that is unique to the columns of the data source
        that the connector is connected to.
    """
The `column_hash` property is computing a hash of the column names and the filter formula. This is a good practice for caching purposes. However, it assumes that `self.instance` is a DataFrame. Consider adding error handling for cases where `self.instance` is not a DataFrame.
pandasai/connectors/airtable.py
Outdated
@@ -225,8 +246,11 @@ def columns_count(self):
    int: The number of columns in the data source that the connector is
    connected to.
    """
    if self._columns_count is not None:
        return self._columns_count
    data = self.execute()
For the column count, use `head()` here instead: it is also cached with `@cache` and carries less data.
pandasai/connectors/airtable.py
Outdated
if cached:
    return pd.read_parquet(cached)

if isinstance(self.instance, pd.DataFrame):
This `self.instance` is not needed because we are already caching the data.
pandasai/connectors/airtable.py
Outdated
class AirtableConnector(BaseConnector):
    """
    Airtable connector for retrieving record data.
    """

    _rows_count: int = None
    _columns_count: int = None
    instance = None
These attributes are extra because we already cache the dataframe.
pandasai/connectors/airtable.py
Outdated
@@ -200,9 +213,14 @@ def head(self):
    DataFrameType: The head of the data source
    that the connector is connected to.
    """
    return self.fetch_data().head()
    # return self.fetch_data().head()
    if isinstance(self.instance, pd.DataFrame):
Let's remove this instance check; for `head()`, call the API with a record-limit parameter of 5.
Review Status
Actionable comments generated: 10
Files selected for processing (2)
- pandasai/connectors/airtable.py (1 hunks)
- tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 1
pandasai/connectors/airtable.py (1)
- 39-47: The code checks if the `config` is `None` and, if so, tries to populate it from environment variables. However, the `_populate_config_from_env` method is not defined in this class or its parent class. Please ensure that this method is implemented and correctly populates the configuration from environment variables.
tests/connectors/test_airtable.py
Outdated
@patch("requests.get") | ||
def test_columns_count_property(self, mock_request_get): | ||
# Test columns_count property | ||
mock_request_get.return_value.json.return_value = json.loads( | ||
self.expected_data_json | ||
) | ||
mock_request_get.return_value.status_code = 200 | ||
rows_count = self.connector.columns_count | ||
self.assertEqual(rows_count, 4) |
In the `test_columns_count_property` method, you are testing the `columns_count` property of the `AirtableConnector` class. This is a good practice. However, you are not asserting that the `requests.get` method was called with the correct arguments. Consider adding an assertion to check that the correct URL and headers were passed to the `requests.get` method.
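The missing assertion can be added with `unittest.mock`; a self-contained sketch with an injected getter standing in for `requests.get` (the URL, header values, and function names are illustrative):

```python
from unittest.mock import MagicMock

def columns_count(getter, url, api_key):
    """Toy fetcher whose outgoing call we want to verify."""
    response = getter(url, headers={"Authorization": f"Bearer {api_key}"})
    return len(response.json()["records"][0]["fields"])

mock_get = MagicMock()
mock_get.return_value.json.return_value = {
    "records": [{"id": "rec1", "fields": {"a": 1, "b": 2}}]
}
assert columns_count(mock_get, "https://api.airtable.com/v0/app/tbl", "key") == 2

# The assertion the review asks for: the exact URL and headers were used.
mock_get.assert_called_once_with(
    "https://api.airtable.com/v0/app/tbl",
    headers={"Authorization": "Bearer key"},
)
```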
def __init__(
    self,
    config: Optional[Union[AirtableConnectorConfig, dict]] = None,
    cache_interval: int = 600,
):
    if isinstance(config, dict):
        if "api_key" in config and "base_id" in config and "table" in config:
            config = AirtableConnectorConfig(**config)
        else:
            raise KeyError(
                "Please specify all api_key,table,base_id properly in config ."
            )
The `__init__` method checks if the `config` parameter is a dictionary and if it contains the keys `api_key`, `base_id`, and `table`. If not, it raises a `KeyError`. This could be improved by providing a more specific error message for each missing key. This way, the user will know exactly which key is missing from the configuration.
- if "api_key" in config and "base_id" in config and "table" in config:
- config = AirtableConnectorConfig(**config)
- else:
- raise KeyError(
- "Please specify all api_key,table,base_id properly in config ."
- )
+ missing_keys = [key for key in ["api_key", "base_id", "table"] if key not in config]
+ if missing_keys:
+ raise KeyError(f"Missing keys in config: {', '.join(missing_keys)}")
+ config = AirtableConnectorConfig(**config)
def _init_connection(self, config: BaseConnectorConfig):
    """
    make connection to database
    """
    config = config.dict()
    url = f"{self._root_url}{config['base_id']}/{config['table']}"
    response = requests.head(
        url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
    )
    if response.status_code == 200:
        self.logger.log(
            """
            Connected to Airtable.
            """
        )
    else:
        raise InvalidRequestError(
            f"""Failed to connect to Airtable.
            Status code: {response.status_code},
            message: {response.text}"""
        )
The `_init_connection` method sends a `HEAD` request to the Airtable API to check the connection. If the status code is not 200, it raises an `InvalidRequestError`. However, the error message could be improved by including the URL that failed to connect. This would help in debugging connection issues.
- f"""Failed to connect to Airtable.
- Status code: {response.status_code},
- message: {response.text}"""
+ f"""Failed to connect to Airtable at {url}.
+ Status code: {response.status_code},
+ message: {response.text}"""
def _cached(self, include_additional_filters: bool = False):
    """
    Returns the cached Airtable data if it exists and
    is not older than the cache interval.

    Returns :
        DataFrame | None : The cached data if
        it exists and is not older than the cache
        interval, None otherwise.
    """
    cache_path = self._get_cache_path(include_additional_filters)
    if not os.path.exists(cache_path):
        return None

    # If the file is older than the cache interval, delete it.
    if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
        if self.logger:
            self.logger.log(f"Deleting expired cached data from {cache_path}")
        os.remove(cache_path)
        return None
The `_cached` method checks if the cache file exists and if it's older than the cache interval. If it's older, it deletes the file. However, it doesn't handle the case where the file deletion fails. It would be better to handle this case and log an error message.
- os.remove(cache_path)
+ try:
+ os.remove(cache_path)
+ except OSError as e:
+ if self.logger:
+ self.logger.log(f"Failed to delete expired cache file {cache_path}: {str(e)}")
pandasai/connectors/airtable.py
Outdated
data = []
while True:
    response = self._request_api(params=params)

    if response.status_code == 200:
        res = response.json()
        data.append(res)
        if len(res["records"]) < 100:
            break
    else:
        raise InvalidRequestError(
            f"""Failed to connect to Airtable.
            Status code: {response.status_code},
            message: {response.text}"""
        )
The `_fetch_data` method sends a `GET` request to the Airtable API to fetch data. If the status code is not 200, it raises an `InvalidRequestError`. Similar to the `_init_connection` method, the error message could be improved by including the URL that failed to fetch data.
- f"""Failed to connect to Airtable.
- Status code: {response.status_code},
- message: {response.text}"""
+ f"""Failed to fetch data from Airtable at {url}.
+ Status code: {response.status_code},
+ message: {response.text}"""
pandasai/connectors/airtable.py
Outdated
def _preprocess(self, data):
    """
    Preprocesses Json response data
    To prepare dataframe correctly.
    """
    columns = set()
    data_dict_list = []
    for item in data:
        for entry in item["records"]:
            data_dict = {"id": entry["id"], "createdTime": entry["createdTime"]}
            for field_name, field_value in entry["fields"].items():
                data_dict[field_name] = field_value
                columns.add(field_name)
            data_dict_list.append(data_dict)

    df = pd.DataFrame(data_dict_list)
    return df
The `_preprocess` method processes the JSON response data from the Airtable API and converts it to a DataFrame. However, it doesn't handle the case where the `fields` key is missing in the `entry` dictionary. This could lead to a `KeyError`. It would be better to handle this case and set a default value for `fields`.
- for field_name, field_value in entry["fields"].items():
+ for field_name, field_value in entry.get("fields", {}).items():
pandasai/connectors/airtable.py
Outdated
""" | ||
Preprocesses Json response data | ||
To prepare dataframe correctly. | ||
""" | ||
records = [ |
Let's move this logic into the `_fetch_data` function to get rid of the three nested loops.
Review Status
Actionable comments generated: 5
Files selected for processing (2)
- pandasai/connectors/airtable.py (1 hunks)
- tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 8
tests/connectors/test_airtable.py (1)
- 57-58: The `fallback_name` property is tested twice in this test suite (also at lines 80-83). Consider removing one to avoid redundancy:
  - def test_fallback_name(self):
  -     self.assertEqual(self.connector.fallback_name, self.config["table"])

pandasai/connectors/airtable.py (7)
- 39-47: If `config` is not provided, the code attempts to populate it from environment variables. This is a good practice for sensitive data like API keys. However, the `_populate_config_from_env` method is not shown in this hunk. Ensure that it correctly retrieves and validates the environment variables.
- 58-74: The `_init_connection` method sends a HEAD request to the Airtable API to check the connection. If the response status code is not 200, it raises an `InvalidRequestError`. This is a good practice for validating the connection and providing a clear error message if the connection fails. However, consider adding a retry mechanism for transient network errors.
- 104-109: The `_cached` method checks if the cache file exists and if it's older than the cache interval. If it's older, it deletes the file. This is a good practice for managing cache files. However, consider handling potential exceptions when deleting the file.
- 123-127: The `_save_cache` method saves the DataFrame to a Parquet file. This is a good practice for caching data. However, consider handling potential exceptions when saving the file.
- 192-215: The `_fetch_data` method fetches data from the Airtable API and converts it to a DataFrame. It uses pagination to retrieve all records. This is a good practice for handling large datasets. However, consider adding a limit to the number of requests to prevent potential infinite loops.
- 228-234: The `head` method retrieves the first 5 records from the Airtable API. This is a good practice for previewing data. However, consider handling potential exceptions when making the request.
- 262-266: The `rows_count` and `columns_count` methods return the number of rows and columns in the data source, respectively. They use the `@cached_property` decorator to cache the results. This is a good practice for optimizing performance. However, consider handling potential exceptions when executing the connector.
@patch("requests.get") | ||
def test_execute(self, mock_request_get): | ||
mock_request_get.return_value.json.return_value = json.loads( | ||
self.expected_data_json | ||
) | ||
mock_request_get.return_value.status_code = 200 | ||
execute_data = self.connector.execute() | ||
self.assertEqual(type(execute_data), pd.DataFrame) | ||
self.assertEqual(len(execute_data), 3) |
The `test_execute` method tests the `execute` method of the `AirtableConnector` class. However, it only checks the type and length of the returned data. It would be beneficial to also verify the content of the returned DataFrame to ensure it matches the expected data.
    self,
    config: Optional[Union[AirtableConnectorConfig, dict]] = None,
    cache_interval: int = 600,
):
    if isinstance(config, dict):
        if "api_key" in config and "base_id" in config and "table" in config:
            config = AirtableConnectorConfig(**config)
        else:
            raise KeyError(
                "Please specify all api_key,table,base_id properly in config ."
            )
The `config` parameter is optional and can be either a dictionary or an `AirtableConnectorConfig` object. If it's a dictionary, it must contain the keys `api_key`, `base_id`, and `table`. If these keys are not present, a `KeyError` is raised. Consider providing a more detailed error message to help the user understand what each key represents.
- raise KeyError("Please specify all api_key,table,base_id properly in config .")
+ raise KeyError("Please specify all api_key, table, and base_id properly in the config. api_key is your Airtable API key, base_id is the ID of the base you are connecting to, and table is the name of the table within the base.")
    self,
    config: Optional[Union[AirtableConnectorConfig, dict]] = None,
    cache_interval: int = 600,
):
The `cache_interval` parameter is not documented in the function docstring. Please add a description for it.
if "api_key" in config and "base_id" in config and "table" in config: | ||
config = AirtableConnectorConfig(**config) | ||
else: | ||
raise KeyError( | ||
"Please specify all api_key,table,base_id properly in config ." | ||
) |
The membership check for "api_key", "base_id", and "table" only verifies that the keys exist. If any of these keys has a value that evaluates to `False` (like an empty string), the check will pass, but the subsequent code may fail. Consider also validating the values, for example with `config.get(key)`.
url = f"{self._root_url}{config['base_id']}/{config['table']}" | ||
response = requests.head( | ||
url=url, headers={"Authorization": f"Bearer {config['api_key']}"} | ||
) | ||
if response.status_code == 200: | ||
self.logger.log( | ||
""" | ||
Connected to Airtable. | ||
""" | ||
) | ||
else: | ||
raise InvalidRequestError( | ||
f"""Failed to connect to Airtable. | ||
Status code: {response.status_code}, | ||
message: {response.text}""" | ||
) |
The connection initialization does not handle potential network errors that could occur when making the request. Consider adding a try-except block to handle exceptions like `requests.exceptions.RequestException`.
cache_dir = os.path.join(os.getcwd(), "")
try:
    cache_dir = os.path.join((find_project_root()), "cache")
except ValueError:
    cache_dir = os.path.join(os.getcwd(), "cache")
The code is trying to find the project root and if it fails, it defaults to the current working directory. This could lead to inconsistent behavior depending on where the script is run from. Consider making the cache directory a configurable option.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
    if self.logger:
        self.logger.log(f"Deleting expired cached data from {cache_path}")
    os.remove(cache_path)
    return None
The cache invalidation strategy is based on the modification time of the cache file. This could lead to issues if the system clock is changed or if the file is manually modified. Consider using a more robust cache invalidation strategy, like storing the cache creation time in the file itself or in a separate metadata file.
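One way to decouple expiry from filesystem mtimes is to store the creation time inside the cache entry itself. A minimal sketch (using JSON for illustration; the connector actually caches Parquet, so in practice the timestamp would go in a sidecar metadata file):

```python
import json
import os
import tempfile
import time

def write_cache(path: str, payload: dict) -> None:
    """Store the creation time inside the cache file so expiry does not
    depend on filesystem mtimes, which can be altered externally."""
    with open(path, "w") as fh:
        json.dump({"created_at": time.time(), "payload": payload}, fh)

def read_cache(path: str, max_age_seconds: float):
    """Return the cached payload, or None if missing or expired."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        entry = json.load(fh)
    if time.time() - entry["created_at"] > max_age_seconds:
        os.remove(path)  # expired entries are discarded eagerly
        return None
    return entry["payload"]
```

Because the timestamp travels with the data, copying or touching the file no longer resets its age.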
cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
    return pd.read_parquet(cached)

if isinstance(self._instance, pd.DataFrame):
    return self._instance
else:
    self._instance = self._fetch_data()
The `execute` method does not handle potential errors that could occur when fetching data from the API. Consider adding error handling to the `_fetch_data` method or to the `execute` method itself.
data = []
while True:
    response = self._request_api(params=params)

    if response.status_code != 200:
        raise InvalidRequestError(
            f"Failed to connect to Airtable. "
            f"Status code: {response.status_code}, "
            f"message: {response.text}"
        )

    res = response.json()
    records = res.get("records", [])
    data.extend({"id": record["id"], **record["fields"]} for record in records)

    if len(records) < 100 or "offset" not in res:
        break

    if "offset" in res:
        params["offset"] = res["offset"]

return pd.DataFrame(data)
The data fetching loop does not have a maximum iteration count, which could lead to an infinite loop if the API continually returns an "offset" in the response. Consider adding a maximum iteration count or a timeout to prevent potential infinite loops.
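A bounded version of the pagination loop might look like the following sketch; `request_page` stands in for the connector's `_request_api`, and the page shape mirrors Airtable's `records`/`offset` response:

```python
def fetch_all_records(request_page, max_pages: int = 1000) -> list:
    """Follow Airtable-style offset pagination, but never loop forever.

    `request_page(params)` must return a dict like
    {"records": [{"id": ..., "fields": {...}}, ...], "offset": "..."}.
    """
    params, data = {}, []
    for _ in range(max_pages):
        page = request_page(params)
        records = page.get("records", [])
        data.extend({"id": r["id"], **r["fields"]} for r in records)
        offset = page.get("offset")
        # A short page or a missing offset means we reached the end.
        if not offset or len(records) < 100:
            break
        params["offset"] = offset
    else:
        # The loop exhausted max_pages without breaking: bail out loudly
        # instead of spinning on a server that always returns an offset.
        raise RuntimeError(f"Pagination did not terminate within {max_pages} pages")
    return data
```

The `for`/`else` turns the implicit "trust the server to stop" into an explicit, testable failure mode.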
data = self._request_api(params={"maxRecords": 5})
return pd.DataFrame(
    [
        {"id": record["id"], **record["fields"]}
        for record in data.json()["records"]
    ]
)
The `head` method does not handle potential errors that could occur when making the API request. Consider adding error handling to this method.
data = self.head()
self._columns_count = len(data.columns)
return self._columns_count
The `rows_count` and `columns_count` methods execute the connector or fetch the head of the data, respectively, even if the count is already stored. Consider checking whether the count is already cached before executing the connector or fetching the head.
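A memoization sketch for those properties (the class and `fetch` callable are illustrative stand-ins, not the connector's real API):

```python
class RecordCounter:
    """Cache row/column counts so repeated property access does not refetch.

    `fetch` is any callable returning a (rows, columns) pair.
    """
    def __init__(self, fetch):
        self._fetch = fetch
        self._rows_count = None
        self._columns_count = None

    @property
    def rows_count(self) -> int:
        if self._rows_count is None:
            # One fetch fills both caches, so the sibling property is free.
            self._rows_count, self._columns_count = self._fetch()
        return self._rows_count

    @property
    def columns_count(self) -> int:
        if self._columns_count is None:
            self._rows_count, self._columns_count = self._fetch()
        return self._columns_count
```

With this shape, accessing either property after the first fetch costs nothing.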
self._instance = self.execute()
columns_str = "|".join(self._instance.columns)
columns_str += "WHERE" + self._build_formula()
return hashlib.sha256(columns_str.encode("utf-8")).hexdigest()
The `column_hash` property does not handle potential errors that could occur when executing the connector. Consider adding error handling to this property.
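A defensive variant of the hash computation could fall back to a formula-only key when fetching the columns fails (a sketch; the fallback policy is an assumption for illustration, not the PR's behavior):

```python
import hashlib

def column_hash(fetch_columns, formula: str) -> str:
    """Build a deterministic cache key from the column names plus the filter
    formula; if fetching columns fails, hash the formula alone instead of
    propagating the error."""
    try:
        columns = fetch_columns()
    except Exception:
        columns = []  # degrade gracefully rather than crash the cache layer
    key = "|".join(columns) + "WHERE" + formula
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Whether to degrade or to re-raise is a design choice; the important part is that the failure mode is deliberate rather than accidental.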
…Tanmaypatil123/pandas-ai into feat/support-airtable-connector
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._root_url, self.root_url)
self.assertEqual(self.connector._cache_interval, 600)
The test `test_constructor_and_properties` checks the internal state of the `AirtableConnector` object. This is generally not a good practice, as it makes the test fragile and tightly coupled to the implementation. Instead, consider testing the behavior of the object; for example, test that the `execute` method returns the expected results for a given configuration.
@patch("requests.get")
def test_execute(self, mock_request_get):
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    execute_data = self.connector.execute()
    self.assertEqual(type(execute_data), pd.DataFrame)
    self.assertEqual(len(execute_data), 3)
In the `test_execute` method, you are mocking `requests.get` and setting the return value to a predefined JSON response. This is a good practice for isolating the unit under test. However, you should also test the case where the request fails (e.g., the status code is not 200), to ensure that your code handles errors correctly.
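A failure-case test can be sketched against a stand-in fetcher like this (the `FakeResponse` and `fetch_records` names are illustrative simplifications, not the connector's real API):

```python
import unittest

class FakeResponse:
    """Minimal stand-in for a requests.Response."""
    def __init__(self, status_code, text=""):
        self.status_code = status_code
        self.text = text

def fetch_records(get):
    """Simplified fetch logic: raise when the API does not return HTTP 200."""
    response = get("https://api.airtable.com/v0/base/table")
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to connect to Airtable. Status code: {response.status_code}"
        )
    return response

class TestExecuteFailure(unittest.TestCase):
    def test_execute_raises_on_http_error(self):
        # Simulate a 401 from Airtable and assert the error surfaces.
        with self.assertRaises(RuntimeError):
            fetch_records(lambda url: FakeResponse(401, "unauthorized"))

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExecuteFailure)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In the real suite the same shape would be `mock_request_get.return_value.status_code = 401` followed by `assertRaises(InvalidRequestError)`.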
@patch("requests.get")
def test_head(self, mock_request_get):
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    execute_data = self.connector.head()
    self.assertEqual(type(execute_data), pd.DataFrame)
    self.assertLessEqual(len(execute_data), 5)
Similar to the `test_execute` method, the `test_head` method should also test the case where the request fails.
@patch("requests.get")
def test_rows_count_property(self, mock_request_get):
    # Test rows_count property
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    rows_count = self.connector.rows_count
    self.assertEqual(rows_count, 3)
In the `test_rows_count_property` method, you are testing the `rows_count` property of the `AirtableConnector` object. This is a good practice, as it verifies that the property returns the expected value. However, you should also test the case where the request fails.
@patch("requests.get")
def test_columns_count_property(self, mock_request_get):
    # Test columns_count property
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    columns_count = self.connector.columns_count
    self.assertEqual(columns_count, 3)
In the `test_columns_count_property` method, similar to the `test_rows_count_property` method, you should also test the case where the request fails.
def test_build_formula_method(self):
    formula = self.connector._build_formula()
    expected_formula = "AND(Status='In progress')"
    self.assertEqual(formula, expected_formula)
The `test_build_formula_method` method tests the `_build_formula` method of the `AirtableConnector` object. This is a good practice, as it verifies that the method returns the expected value. However, you should also test the case where the `where` parameter is not provided in the configuration, to ensure that your code handles that case correctly.
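A sketch of the formula builder together with the empty-`where` case it should handle (the tuple-based filter shape here is an assumption for illustration, not the connector's exact config format):

```python
def build_formula(where=None) -> str:
    """Turn a list of (column, operator, value) filters into an Airtable
    filterByFormula-style string; empty or absent input yields an empty formula."""
    if not where:
        return ""  # no filters: do not emit a dangling AND()
    clauses = [f"{col}{op}'{val}'" for col, op, val in where]
    return f"AND({','.join(clauses)})"
```

A test for the missing-`where` case then reduces to asserting that the result is the empty string rather than a malformed formula.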
@patch("requests.get")
def test_column_hash(self, mock_request_get):
    mock_request_get.return_value.json.return_value = json.loads(
        self.expected_data_json
    )
    mock_request_get.return_value.status_code = 200
    returned_hash = self.connector.column_hash
    self.assertEqual(
        returned_hash,
        "e4cdc9402a0831fb549d7fdeaaa089b61aeaf61e14b8a044bc027219b2db941e"
    )
In the `test_column_hash` method, similar to the `test_rows_count_property` and `test_columns_count_property` methods, you should also test the case where the request fails.
Implemented the Airtable connector with the help of the official docs. The required parameters follow Airtable's new authentication policy: an authorization `token` is used for authentication, and a `base_id` and `table_name` are required to connect and load data.

Summary by CodeRabbit

- Introduced the `AirtableConnector` to the `pandasai` library, enabling users to connect to Airtable databases and retrieve data directly into a Pandas DataFrame. This feature enhances data accessibility and manipulation.
- Added caching to the `AirtableConnector` to improve performance by storing and reusing data.
- Updated the `connectors.md` file to include instructions and examples on how to use the new `AirtableConnector`.
- Added unit tests for the `AirtableConnector` to ensure its functionality and reliability.
- Added handling for the `InvalidRequestError` and `APIKeyNotFoundError` exceptions.