feat: Airtable connector support #635
@@ -0,0 +1,23 @@
from pandasai.connectors import AirtableConnector
from pandasai.llm import OpenAI
from pandasai import SmartDataframe

airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "==", "In progress"]
        ],
    }
)

llm = OpenAI("OPENAI_API_KEY")

df = SmartDataframe(airtable_connectors, config={"llm": llm})

response = df.chat("How many rows are there in data?")

print(response)
@@ -0,0 +1,232 @@
"""
Airtable connectors are used to connect Airtable records.
"""

from .base import AirtableConnectorConfig, BaseConnector, BaseConnectorConfig
from typing import Union, Optional
import requests
import pandas as pd
import os
from ..helpers.path import find_project_root
import time
import hashlib
from ..exceptions import InvalidRequestError


class AirtableConnector(BaseConnector):
    """
    Airtable connector to retrieve record data.
    """

    def __init__(
        self,
        config: Optional[Union[AirtableConnectorConfig, dict]] = None,
        cache_interval: int = 600,
    ):

Review comment: Let's add a cache mechanism similar to the other connectors!

Review comment: @Tanmaypatil123 check out the yahoo finance connector for reference.
        if isinstance(config, dict):
            if "api_key" in config and "base_id" in config and "table" in config:
                config = AirtableConnectorConfig(**config)
            else:
                raise KeyError(
                    "Please specify all api_key,table,base_id properly in config ."
                )
Review comment (lines 31 to 37): The check for the presence of "api_key", "base_id", and "table" in the config dictionary is not robust. If any of these keys have a value that evaluates to `False`, the check still passes.

Review comment (lines 26 to 37): Suggested change:

- if "api_key" in config and "base_id" in config and "table" in config:
-     config = AirtableConnectorConfig(**config)
- else:
-     raise KeyError(
-         "Please specify all api_key,table,base_id properly in config ."
-     )
+ missing_keys = [key for key in ["api_key", "base_id", "table"] if key not in config]
+ if missing_keys:
+     raise KeyError(f"Missing keys in config: {', '.join(missing_keys)}")
+ config = AirtableConnectorConfig(**config)

Review comment (lines 27 to 37): Suggested change:

- raise KeyError("Please specify all api_key,table,base_id properly in config .")
+ raise KeyError("Please specify all api_key, table, and base_id properly in the config. api_key is your Airtable API key, base_id is the ID of the base you are connecting to, and table is the name of the table within the base.")
        elif not config:
            airtable_env_vars = {
                "api_key": "AIRTABLE_API_TOKEN",
                "base_id": "AIRTABLE_BASE_ID",
                "table": "AIRTABLE_TABLE_NAME",
            }
            config = AirtableConnectorConfig(
                **self._populate_config_from_env(config, airtable_env_vars)
            )
Review comment (lines 38 to 47): The code is not handling the case where the environment variables are missing, in which case `AirtableConnectorConfig` would be built from empty values.
        self._root_url: str = "https://api.airtable.com/v0/"
        self._cache_interval = cache_interval

        super().__init__(config)

    def _init_connection(self, config: BaseConnectorConfig):
        """
        Make a connection to the Airtable base.
        """
        config = config.dict()
        url = f"{self._root_url}{config['base_id']}/{config['table']}"
        response = requests.head(
            url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
        )
        if response.status_code == 200:
            self.logger.log(
                """
                Connected to Airtable.
                """
            )
        else:
            raise InvalidRequestError(
                f"""Failed to connect to Airtable.
                Status code: {response.status_code},
                message: {response.text}"""
            )
Review comment (lines 61 to 74): The code makes a HEAD request to the Airtable API to check the connection. However, it does not handle potential network errors that could occur during the request, such as a timeout or a connection error. It would be better to wrap the request in a try/except block and handle these potential errors.

Review comment (lines 54 to 74): Suggested change:

- f"""Failed to connect to Airtable.
- Status code: {response.status_code},
- message: {response.text}"""
+ f"""Failed to connect to Airtable at {url}.
+ Status code: {response.status_code},
+ message: {response.text}"""

Review comment (lines 59 to 74): The connection initialization does not handle potential network errors that could occur when making the request. Consider adding a try-except block to handle exceptions like `requests.exceptions.RequestException`.
|
||
def _get_cache_path(self, include_additional_filters: bool = False): | ||
""" | ||
Return the path of the cache file. | ||
|
||
Returns : | ||
str : The path of the cache file. | ||
""" | ||
cache_dir = os.path.join(os.getcwd(), "") | ||
try: | ||
cache_dir = os.path.join((find_project_root()), "cache") | ||
except ValueError: | ||
cache_dir = os.path.join(os.getcwd(), "cache") | ||
Comment on lines
+83
to
+87
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code is trying to find the project root and if it fails, it defaults to the current working directory. This could lead to inconsistent behavior depending on where the script is run from. Consider making the cache directory a configurable option. |
||
return os.path.join(cache_dir, f"{self._config.table}_data.parquet") | ||
Comment on lines
+83
to
+88
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The |
||
    def _cached(self):
        """
        Returns the cached Airtable data if it exists and
        is not older than the cache interval.

        Returns :
            DataFrame | None : The cached data if
            it exists and is not older than the cache
            interval, None otherwise.
        """
        cache_path = self._get_cache_path()
        if not os.path.exists(cache_path):
            return None

        # If the file is older than the cache interval, delete it.
        if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
            if self.logger:
                self.logger.log(f"Deleting expired cached data from {cache_path}")
            os.remove(cache_path)
            return None
Review comment: The cache invalidation logic is based on the modification time of the cache file. However, this approach might not work as expected if the system time is changed or if the cache file is manually modified. A more reliable approach would be to store the cache creation time within the cache file itself, or in a separate metadata file, and use that for invalidation.
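The sidecar-metadata approach the reviewers suggest can be sketched as follows. This is a minimal illustration, not part of the PR: `save_with_timestamp` and `load_if_fresh` are hypothetical helper names, and the payload is treated as opaque bytes rather than a parquet DataFrame.

```python
import json
import os
import time
from typing import Optional


def save_with_timestamp(cache_path: str, payload: bytes) -> None:
    """Write the cache file plus a JSON sidecar recording when it was created."""
    with open(cache_path, "wb") as f:
        f.write(payload)
    with open(cache_path + ".meta.json", "w") as f:
        json.dump({"created_at": time.time()}, f)


def load_if_fresh(cache_path: str, max_age_seconds: int = 600) -> Optional[bytes]:
    """Return the cached bytes only if the sidecar says they are fresh enough."""
    meta_path = cache_path + ".meta.json"
    if not (os.path.exists(cache_path) and os.path.exists(meta_path)):
        return None
    with open(meta_path) as f:
        created_at = json.load(f)["created_at"]
    if time.time() - created_at > max_age_seconds:
        # Expired: remove both the cache and its metadata.
        os.remove(cache_path)
        os.remove(meta_path)
        return None
    with open(cache_path, "rb") as f:
        return f.read()
```

Because the creation time lives in the sidecar rather than in the filesystem mtime, manually touching the cache file no longer extends its lifetime.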
|
||
if self.logger: | ||
self.logger.log(f"Loading cached data from {cache_path}") | ||
|
||
return cache_path | ||
|
||
    def _save_cache(self, df):
        """
        Save the given DataFrame to the cache.

        Args:
            df (DataFrame): The DataFrame to save to the cache.
        """
        filename = self._get_cache_path(
            include_additional_filters=self._additional_filters is not None
            and len(self._additional_filters) > 0
        )
        df.to_parquet(filename)

Review comment (lines 123 to 127): The cache file name does not take into account the `where` filters, so differently filtered fetches would share the same cache file.
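One way to act on that note is to fold a stable hash of the `where` filters into the cache file name, so each distinct filter set gets its own cache file. A sketch under the assumption that filters are JSON-serializable lists, as in the PR's examples; `cache_path_for` is a hypothetical helper, not the PR's API:

```python
import hashlib
import json
from typing import Optional


def cache_path_for(table: str, where: Optional[list]) -> str:
    """Build a cache file name that is distinct per table AND per filter set."""
    suffix = ""
    if where:
        # Stable serialization so the same filters always map to the same file.
        filters_key = json.dumps(where, sort_keys=True)
        suffix = "_" + hashlib.sha256(filters_key.encode("utf-8")).hexdigest()[:8]
    return f"{table}_data{suffix}.parquet"
```

An unfiltered fetch keeps the PR's original `{table}_data.parquet` name, so existing caches remain valid.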
    @property
    def fallback_name(self):
        """
        Returns the fallback table name of the connector.

        Returns :
            str : The fallback table name of the connector.
        """
        return self._config.table

    def execute(self):
        """
        Execute the connector and return the result.

        Returns:
            DataFrameType: The result of the connector.
        """
        return self.fetch_data()

Review comment: You have cached it but never used it. You can check for the cache like: `cached = self._cached()` or `self._cached(include_additional_filters=True)`.
    def fetch_data(self):
        """
        Fetches data from the Airtable server through the
        API and converts it to a DataFrame.
        """
        url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
        response = requests.get(
            url=url,
            headers={"Authorization": f"Bearer {self._config.api_key}"},
        )
        if response.status_code == 200:
            data = response.json()
            data = self.preprocess(data=data)
            self._save_cache(data)
        else:
            raise InvalidRequestError(
                f"""Failed to connect to Airtable.
                Status code: {response.status_code},
                message: {response.text}"""
            )
Review comment: The code does not handle potential network errors that might occur during the requests.get call. It would be better to wrap this call in a try-except block and handle potential network errors. Suggested change:

- response = requests.get(
-     url=url,
-     headers={"Authorization": f"Bearer {self._config.api_key}"},
- )
- if response.status_code == 200:
-     data = response.json()
-     data = self.preprocess(data=data)
-     self._save_cache(data)
- else:
-     raise InvalidRequestError(
-         f"""Failed to connect to Airtable.
-         Status code: {response.status_code},
-         message: {response.text}"""
-     )
+ try:
+     response = requests.get(
+         url=url,
+         headers={"Authorization": f"Bearer {self._config.api_key}"},
+     )
+     response.raise_for_status()
+     data = response.json()
+     data = self.preprocess(data=data)
+     self._save_cache(data)
+ except requests.exceptions.RequestException as e:
+     raise InvalidRequestError(
+         f"""Failed to connect to Airtable.
+         Error: {str(e)}"""
+     )
        return data
    def preprocess(self, data):
        """
        Preprocesses JSON response data
        to prepare the DataFrame correctly.
        """
        records = [
            {"id": record["id"], **record["fields"]} for record in data["records"]
        ]

        df = pd.DataFrame(records)

        if self._config.where:
            for i in self._config.where:
                filter_string = f"{i[0]} {i[1]} '{i[2]}'"
                df = df.query(filter_string)

        return df

Review comment: The code assumes that the 'where' attribute in the config is a list of conditions. However, it does not validate if this is the case. If 'where' is not a list or does not contain valid conditions, the code will fail at runtime. It would be better to add a check to ensure 'where' is a list and contains valid conditions.

Review comment: Let's also add an example.
    def head(self):
        """
        Return the head of the table that
        the connector is connected to.

        Returns :
            DataFrameType: The head of the data source
            that the connector is connected to.
        """
        return self.fetch_data().head()

Review comment: It should be cached using `@cache`; for reference, check sql.py.

Review comment: Here, fetch only specific records, say the first 5 or a random 5 rows, instead of all.
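The reviewer's suggestion can be sketched by asking the server for a handful of records instead of downloading the whole table: Airtable's list-records endpoint accepts a `maxRecords` query parameter. This is an illustrative sketch, not the PR's implementation; `fetch_head` and `records_to_frame` are hypothetical names, and the record-flattening mirrors the PR's `preprocess`.

```python
import pandas as pd
import requests


def records_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten Airtable's {"records": [{"id": ..., "fields": {...}}]} payload."""
    records = [{"id": rec["id"], **rec["fields"]} for rec in payload["records"]]
    return pd.DataFrame(records)


def fetch_head(base_id: str, table: str, api_key: str, n: int = 5) -> pd.DataFrame:
    """Fetch at most n records so head() does not pull the entire table."""
    response = requests.get(
        f"https://api.airtable.com/v0/{base_id}/{table}",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"maxRecords": n},  # server-side limit on returned records
    )
    response.raise_for_status()
    return records_to_frame(response.json())
```

Limiting the fetch server-side also makes the `@cache`-decorated `head()` the reviewers ask for cheap enough to use for column counts.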
    @property
    def rows_count(self):
        """
        Return the number of rows in the data source that the connector is
        connected to.

        Returns:
            int: The number of rows in the data source that the connector is
            connected to.
        """
        data = self.execute()
        return len(data)

Review comment: Keep the result of rows_count and store it on the instance so we don't have to call it again and again.
    @property
    def columns_count(self):
        """
        Return the number of columns in the data source that the connector is
        connected to.

        Returns:
            int: The number of columns in the data source that the connector is
            connected to.
        """
        data = self.execute()
        return len(data.columns)

Review comment: Same as rows_count, avoid calling it again and again. For this, use the head function instead.

Review comment: Here, for the column count, use head(), as it is also cached using `@cache` and will have less data.
    @property
    def column_hash(self):
        """
        Return the hash code that is unique to the columns of the data source
        that the connector is connected to.

        Returns:
            int: The hash code that is unique to the columns of the data source
            that the connector is connected to.
        """
        data = self.execute()
        columns_str = "|".join(data.columns)
        return hashlib.sha256(columns_str.encode("utf-8")).hexdigest()

Review comment: Should also add the where queries as well. Otherwise it will return wrong data from the cache.
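Folding the `where` filters into the hash, as that comment suggests, could look like the sketch below. For illustration this is written as a free function rather than the PR's property, and it assumes the filters are JSON-serializable lists:

```python
import hashlib
import json
from typing import Optional


def column_hash(columns: list, where: Optional[list]) -> str:
    """Hash the column names together with any where-filters, so two
    differently filtered views of the same table get distinct cache keys."""
    key = "|".join(columns)
    if where:
        key += "|" + json.dumps(where, sort_keys=True)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

With the filters included, a cached result for `Status == "Done"` can no longer be served for a `Status == "In progress"` query.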
@@ -20,6 +20,16 @@ class BaseConnectorConfig(BaseModel):
    where: list[list[str]] = None


class AirtableConnectorConfig(BaseConnectorConfig):
    """
    Connector configuration for Airtable data.
    """

    api_key: str
    base_id: str
    database: str = "airtable_data"


class SQLBaseConnectorConfig(BaseConnectorConfig):
    """
    Base Connector configuration.
Review comment: The new hunk introduces the Airtable connector and provides a code snippet demonstrating its usage. The code seems to be correct and well-explained. However, the api_key, table, and base_id are hardcoded strings. It's recommended to use environment variables or secure storage for sensitive data like API keys. Don't forget to import the os module at the beginning of your script.