
feat : Airtable connector Support #635

Merged

Conversation

Tanmaypatil123
Contributor

@Tanmaypatil123 Tanmaypatil123 commented Oct 10, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.

Implemented the Airtable connector with the help of the official docs.
The required parameters follow Airtable's new authentication policy: an authorization token is used for authentication, and a base ID and table name are required to connect and load data.

Summary by CodeRabbit

  • New Feature: Introduced the AirtableConnector to the pandasai library, enabling users to connect to Airtable databases and retrieve data directly into a Pandas DataFrame. This feature enhances data accessibility and manipulation.
  • New Feature: Added caching functionality to the AirtableConnector to improve performance by storing and reusing data.
  • Documentation: Updated the connectors.md file to include instructions and examples on how to use the new AirtableConnector.
  • Test: Added a comprehensive suite of unit tests for the AirtableConnector to ensure its functionality and reliability.
  • New Feature: Enhanced error handling with the introduction of InvalidRequestError and APIKeyNotFoundError exceptions.

@coderabbitai
Contributor

coderabbitai bot commented Oct 10, 2023

Walkthrough

This pull request introduces the AirtableConnector class to the pandasai library, enabling users to connect to Airtable databases, fetch data, and perform operations on the data. The connector uses the requests library for HTTP requests and returns results as a Pandas DataFrame. The PR also includes unit tests for the new class and updates the documentation with usage examples.

Changes

File Summary
pandasai/connectors/... Introduced AirtableConnector and AirtableConnectorConfig classes for connecting to Airtable databases, fetching data, and performing operations.
tests/connectors/... Added unit tests for the AirtableConnector class, verifying the functionality of its methods and properties.
docs/connectors.md Updated documentation with information about the new Airtable connector and usage examples.
examples/from_airtable.py Added an example script demonstrating how to use the AirtableConnector class.
pandasai/exceptions.py Introduced new exception classes for improved error handling.

"In the land of code, changes are afoot,
A new connector, isn't that good? 🎉
With Airtable in sight, and data to boot,
Our pandasai library, gets a new recruit! 🐼📚"



@codecov-commenter

codecov-commenter commented Oct 10, 2023

Codecov Report

Merging #635 (f6955eb) into main (720811f) will decrease coverage by 0.84%.
Report is 5 commits behind head on main.
The diff coverage is 66.66%.


@@            Coverage Diff             @@
##             main     #635      +/-   ##
==========================================
- Coverage   84.43%   83.59%   -0.84%     
==========================================
  Files          65       67       +2     
  Lines        3019     3188     +169     
==========================================
+ Hits         2549     2665     +116     
- Misses        470      523      +53     
Files Coverage Δ
pandasai/connectors/__init__.py 100.00% <100.00%> (ø)
pandasai/connectors/base.py 87.64% <100.00%> (+0.58%) ⬆️
pandasai/exceptions.py 88.88% <100.00%> (+4.27%) ⬆️
pandasai/helpers/openai_info.py 80.00% <ø> (ø)
pandasai/connectors/airtable.py 65.00% <65.00%> (ø)

... and 10 files with indirect coverage changes


Contributor

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 9

Files that changed from the base of the PR between 720811f and 197740b.
Files selected for processing (5)
  • pandasai/connectors/__init__.py (2 hunks)
  • pandasai/connectors/airtable.py (1 hunks)
  • pandasai/connectors/base.py (1 hunks)
  • pandasai/helpers/openai_info.py (2 hunks)
  • tests/connectors/test_airtable.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • pandasai/connectors/__init__.py
Additional comments (Suppressed): 2
pandasai/connectors/base.py (1)
  • 23-30: The AirtableConnectorConfig class is introduced to hold the configuration for the Airtable connector. Ensure that the token and baseID are securely stored and not logged or exposed in any way. Also, consider adding input validation or type checking for these fields to prevent misuse or misconfiguration.
pandasai/helpers/openai_info.py (1)
  • 45-53: No changes have been made to the function signature or the docstring of get_openai_token_cost_for_model(). Ensure that the function's usage elsewhere in the codebase is still valid.
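The input validation the bot asks for can be sketched without pydantic (the real `AirtableConnectorConfig` is a pydantic model; this plain-Python stand-in is only illustrative, including the masked `repr` that keeps the token out of logs):

```python
class AirtableConfigSketch:
    """Illustrative config with basic validation and a masked repr,
    so the token is neither accepted empty nor leaked into logs."""

    def __init__(self, token: str, baseID: str, table: str):
        fields = {"token": token, "baseID": baseID, "table": table}
        for name, value in fields.items():
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{name} must be a non-empty string")
        self.token = token
        self.baseID = baseID
        self.table = table

    def __repr__(self):
        # Mask the credential so tracebacks and logs never expose it.
        return (
            f"AirtableConfigSketch(token='***', "
            f"baseID={self.baseID!r}, table={self.table!r})"
        )
```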

Comment on lines +19 to +20
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._root_url, self.root_url)
Contributor

The test test_constructor_and_properties only checks if the _config and _root_url properties are set correctly. It would be beneficial to also verify if other properties like token, baseID, and table are correctly set in the AirtableConnector instance.

],
"offset": "itrowYGFfoBEIob3C/recsxnHUagIce7nB2",
}
print(expected_data_json)
Contributor

@coderabbitai coderabbitai bot Oct 10, 2023

The print statement seems to be a leftover from debugging and should be removed to keep the test output clean.

-        print(expected_data_json)

Collaborator

@Tanmaypatil123 this is a leftover!

Contributor

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Comment on lines 80 to 84
self.connector.head = Mock(return_value=expected_data)
head_data = self.connector.head()

self.assertEqual(head_data.equals(expected_data), True)
self.assertLessEqual(len(head_data), 5)
Contributor

Similar to the test_execute method, in the test_head method, you are mocking the head method of the connector and then calling it. This doesn't test the actual head method implementation. Instead, you should mock the requests library's method that head uses to make the API call, and then check if head returns the expected data.
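The pattern the bot describes can be sketched as follows. The connector class here is a simplified stand-in, not pandasai's actual implementation; the point is that only the `requests.get` call is patched, so the real parsing logic is exercised:

```python
import unittest
from unittest.mock import Mock, patch

import requests


class MiniAirtableConnector:
    """Simplified stand-in that fetches records over HTTP."""

    def __init__(self, api_key, base_id, table):
        self._url = f"https://api.airtable.com/v0/{base_id}/{table}"
        self._headers = {"Authorization": f"Bearer {api_key}"}

    def execute(self):
        response = requests.get(self._url, headers=self._headers)
        return [record["fields"] for record in response.json()["records"]]


class TestExecute(unittest.TestCase):
    @patch("requests.get")
    def test_execute_parses_records(self, mock_get):
        # Patch the HTTP layer, not the method under test, so the
        # record-parsing code in execute() actually runs.
        mock_get.return_value = Mock(
            status_code=200,
            json=Mock(return_value={"records": [{"fields": {"Name": "Report"}}]}),
        )
        connector = MiniAirtableConnector("key", "app123", "tasks")
        self.assertEqual(connector.execute(), [{"Name": "Report"}])
        mock_get.assert_called_once()
```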

pandasai/helpers/openai_info.py (comment resolved)
pandasai/connectors/airtable.py (comment outdated, resolved)
Comment on lines 38 to 54
    def _init_connection(self, config: BaseConnectorConfig):
        """
        make connection to database
        """
        config = config.dict()
        _session = requests.Session()
        _session.headers = {"Authorization": f"Bearer {config['token']}"}
        url = f"{self._root_url}{config['baseID']}/{config['table']}"
        response = _session.head(url=url)
        if response.status_code == 200:
            self._session = _session
        else:
            raise ValueError(
                f"""Failed to connect to Airtable.
                Status code: {response.status_code},
                message: {response.text}"""
            )
Contributor

The _init_connection method is making a HEAD request to the Airtable API to check the connection. This is not a common practice and might lead to unnecessary network traffic. Consider removing this check or replacing it with a more lightweight method, if available in the Airtable API.

    def _init_connection(self, config: BaseConnectorConfig):
        """
        make connection to database
        """
        config = config.dict()
        _session = requests.Session()
        _session.headers = {"Authorization": f"Bearer {config['token']}"}
        url = f"{self._root_url}{config['baseID']}/{config['table']}"
        response = _session.head(url=url)
        if response.status_code == 200:
            self._session = _session
        else:
            raise ValueError(
                f"""Failed to connect to Airtable. 
                Status code: {response.status_code}, 
                message: {response.text}"""
            )

pandasai/connectors/airtable.py (comment outdated, resolved)
Comment on lines 83 to 92
    def head(self):
        """
        Return the head of the table that
        the connector is connected to.

        Returns:
            DataFrameType: The head of the data source
            that the connector is connected to.
        """
        return self._response.head()
Contributor

The head method is returning the head of the _response DataFrame. However, it's not checking if _response is None or empty. This could lead to an AttributeError if head is called before execute. Consider adding a check for this.

    def head(self):
        """
        Return the head of the table that
        the connector is connected to.

        Returns:
            DataFrameType: The head of the data source
            that the connector is connected to.
        """
        if self._response is not None:
            return self._response.head()
        else:
            raise ValueError("No data available. Execute the connector first.")

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Files that changed from the base of the PR between 197740b and 826aa02.
Files selected for processing (1)
  • tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 1
tests/connectors/test_airtable.py (1)
  • 11-13: The dict() method is used on the AirtableConnectorConfig instance, but it's not clear if this method is defined in the AirtableConnectorConfig class. If it's not defined, this will raise an AttributeError. Please verify.

Comment on lines 48 to 50
self.connector.execute = Mock(return_value=expected_data)
execute_data = self.connector.execute()
self.assertEqual(execute_data.equals(expected_data), True)
Contributor

The execute method is mocked to return expected_data, and then it's called and the returned data is compared with expected_data. This test will always pass and doesn't test the actual functionality of the execute method. Consider mocking the requests library's methods instead, and then test if execute returns the correct data.

Comment on lines 79 to 81
self.connector.head = Mock(return_value=expected_data)
head_data = self.connector.head()

Contributor

@coderabbitai coderabbitai bot Oct 10, 2023

Similar to the execute method, the head method is mocked to return expected_data, and then it's called and the returned data is compared with expected_data. This test will always pass and doesn't test the actual functionality of the head method. Consider mocking the requests library's methods instead, and then test if head returns the correct data.

Collaborator

@Tanmaypatil123 mocking the method that you are testing is a bad practice, as it always returns the value you are forcing it to return.

Contributor

Looks like humans are talking. I'll be quiet for now. If you need me again, just tag @coderabbitai in a new comment.

Collaborator

@gventuri gventuri left a comment

Let's try to increase the test coverage, add an example (from_airtable.py), and also add it to the connectors.md file.


    def __init__(
        self,
        config: Optional[Union[AirtableConnectorConfig, dict]] = None,
Collaborator

Let's add a cache mechanism similar to the other connectors!

Collaborator

@Tanmaypatil123 check out yahoo finance connector for references
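As a rough sketch of the kind of file-based cache being requested (file layout, JSON format, and interval are illustrative assumptions, not the actual pandasai or Yahoo Finance connector implementation):

```python
import json
import os
import tempfile
import time


class FileCache:
    """Illustrative file cache: reuse stored JSON if younger than interval."""

    def __init__(self, cache_dir, interval_seconds=600):
        self._dir = cache_dir
        self._interval = interval_seconds
        os.makedirs(cache_dir, exist_ok=True)

    def _path(self, name):
        return os.path.join(self._dir, f"{name}.json")

    def get(self, name, fetch):
        """Return cached data when fresh; otherwise call fetch() and store it."""
        path = self._path(name)
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < self._interval:
            with open(path) as f:
                return json.load(f)
        data = fetch()
        with open(path, "w") as f:
            json.dump(data, f)
        return data


# Usage: the second call is served from disk, so fetch() runs only once.
cache = FileCache(tempfile.mkdtemp())
calls = []
fetch = lambda: calls.append(1) or {"rows": 3}
first = cache.get("table", fetch)
second = cache.get("table", fetch)
```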

elif not config:
airtable_env_vars = {
"token": "AIRTABLE_AUTH_TOKEN",
"baseID": "AIRTABLE_BASE_ID",
Collaborator

Could we make baseID snake case (base_id)?

make connection to database
"""
config = config.dict()
_session = requests.Session()
Collaborator

Why is _session underscored?

message: {_response.text}"""
)
return self._response

Collaborator

Let's also add a def fallback_name, similarly to all the other connectors.

Example:

    @property
    def fallback_name(self):
        """
        Return the fallback name of the connector.

        Returns:
            str: The fallback name of the connector.
        """
        return self._config.table

            DataFrameType: The head of the data source
            that the connector is connected to.
        """
        return self._response.head()
Collaborator

The head method should be independent. What happens if a _response hasn't been defined yet?


elif not config:
airtable_env_vars = {
"token": "AIRTABLE_AUTH_TOKEN",
Collaborator

Let's rename to api_key to be consistent with the variable names in the LLMs.

Connector configuration for Airtable data.
"""

token: str
Collaborator

api_key

"""

token: str
baseID: str
Collaborator

base_id

@gventuri
Collaborator

Additionally:

  • the config should accept a param "where" similar to all the other connectors
  • we should either handle pagination or see how many records we can get max (check out the maxRecords param)
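Airtable's list-records endpoint pages results via an `offset` token returned with each page (and accepts a `maxRecords` cap). A dependency-free sketch of the loop, with the HTTP call abstracted into a `get_page` callable so it can stand in for a real `requests.get` with the offset passed as a query parameter:

```python
def fetch_all_records(get_page, max_records=None):
    """Follow Airtable-style pagination: each page may carry an 'offset'
    token that must be echoed back to retrieve the next page.

    get_page(offset) -> dict with "records" and, while more pages
    remain, an "offset" key.
    """
    records, offset = [], None
    while True:
        page = get_page(offset)
        records.extend(page["records"])
        if max_records is not None and len(records) >= max_records:
            return records[:max_records]
        offset = page.get("offset")
        if offset is None:
            return records


# Fake two-page API for illustration.
pages = {
    None: {"records": [{"id": "rec1"}, {"id": "rec2"}], "offset": "tok"},
    "tok": {"records": [{"id": "rec3"}]},
}
all_records = fetch_all_records(lambda off: pages[off])
```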

@gventuri
Collaborator

Also, we should fall back to the env variables for base id and token, similarly to what we've done in the other connectors (i.e. mysql).
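A minimal sketch of that env-variable fallback (the variable and key names here are assumptions for illustration, not necessarily the final names used in the PR):

```python
import os


def load_airtable_config(config=None):
    """Fill missing connector settings from environment variables,
    mirroring the fallback pattern used by the other connectors."""
    config = dict(config or {})
    env_fallbacks = {
        "api_key": "AIRTABLE_API_KEY",
        "base_id": "AIRTABLE_BASE_ID",
        "table": "AIRTABLE_TABLE_NAME",
    }
    for key, env_var in env_fallbacks.items():
        if not config.get(key):
            config[key] = os.environ.get(env_var)
    missing = [k for k in env_fallbacks if not config.get(k)]
    if missing:
        raise ValueError(f"Missing Airtable settings: {', '.join(missing)}")
    return config
```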

@gventuri
Collaborator

Btw did you try it? It doesn't seem to work for me.

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 11

Files that changed from the base of the PR between 826aa02 and cd78117.
Files selected for processing (5)
  • docs/connectors.md (1 hunks)
  • examples/from_airtable.py (1 hunks)
  • pandasai/connectors/airtable.py (1 hunks)
  • pandasai/connectors/base.py (1 hunks)
  • tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 8
pandasai/connectors/base.py (2)
  • 23-30: The new AirtableConnectorConfig class is introduced to handle the configuration for the Airtable connector. Ensure that the api_key, base_id, and database attributes are correctly used in the AirtableConnector class and that the api_key and base_id are properly secured and not exposed in any logs or error messages.

  • 33-35: The SQLBaseConnectorConfig class is unchanged. No action required.

examples/from_airtable.py (4)
  • 6-16: The AirtableConnector is initialized with a configuration dictionary. Ensure that the api_key, table, and base_id values are replaced with actual values before running the code. Also, the where clause is optional and can be used to filter the data. If not needed, it can be removed.

  • 19-19: The OpenAI object is initialized with an API key. Make sure to replace "OPENAI_API_KEY" with your actual OpenAI API key.

  • 20-20: The SmartDataframe is initialized with the AirtableConnector and a configuration dictionary. The llm key in the configuration dictionary is set to the OpenAI object. This is a good practice as it allows for easy configuration of different components.

  • 22-22: The chat method is called on the SmartDataframe object with a query string. This is a good practice as it allows for easy querying of the data. However, ensure that the query string is relevant to your data.

docs/connectors.md (2)
  • 187-189: The new hunk starts with a continuation of the previous section. Ensure that the context is clear for readers.

  • 193-194: The introduction of the Airtable connector is clear and concise, explaining its purpose and usage.

Comment on lines 95 to 103
    @patch("requests.get")
    def test_columns_count_property(self, mock_request_get):
        # Test columns_count property
        mock_request_get.return_value.json.return_value = json.loads(
            self.expected_data_json
        )
        mock_request_get.return_value.status_code = 200
        rows_count = self.connector.columns_count
        self.assertEqual(rows_count, 3)
Contributor

In the test_columns_count_property method, you are testing the columns_count property. This is a good practice. However, consider adding a test case for when the JSON response does not contain the expected data to ensure your code handles such scenarios correctly.
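A sketch of the defensive case the bot suggests, using a hypothetical parsing helper (not pandasai's actual code) that tolerates missing or empty `records`:

```python
def columns_count(payload):
    """Hypothetical helper: count columns from an Airtable-style payload,
    returning 0 instead of raising when 'records' is absent or empty."""
    records = payload.get("records") or []
    if not records:
        return 0
    return len(records[0].get("fields", {}))


# Well-formed and malformed payloads the suggested test would cover.
good = {"records": [{"fields": {"Name": "a", "Status": "x", "Due": "y"}}]}
empty = {"records": []}
missing = {}
```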

Comment on lines +198 to +218
from pandasai.connectors import AirtableConnector
from pandasai import SmartDataframe


airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "=", "In progress"]
        ]
    }
)

df = SmartDataframe(airtable_connectors)

df.chat("How many rows are there in data ?")
Contributor

The code snippet demonstrates the usage of the Airtable connector. It's clear and easy to understand. However, the api_key, table, and base_id are hardcoded. In a real-world scenario, these should be stored securely and not exposed in the code.

airtable_connectors = AirtableConnector(
    config={
-        "api_key": "AIRTABLE_API_TOKEN",
-        "table":"AIRTABLE_TABLE_NAME",
-        "base_id":"AIRTABLE_BASE_ID",
+        "api_key": os.getenv("AIRTABLE_API_TOKEN"),
+        "table": os.getenv("AIRTABLE_TABLE_NAME"),
+        "base_id": os.getenv("AIRTABLE_BASE_ID"),
        "where" : [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status" ,"=","In progress"]
        ]
    }
)

This change uses the os.getenv function to retrieve the values from environment variables, which is a common practice for handling sensitive data like API keys.

Comment on lines 26 to 27
if config["api_key"] and config["base_id"] and config["table"]:
config = AirtableConnectorConfig(**config)
Contributor

The code assumes that if config is a dictionary, it will always contain the keys "api_key", "base_id", and "table". If any of these keys are missing, a KeyError will be raised. Consider adding error handling to check for the presence of these keys before accessing them.

- if config["api_key"] and config["base_id"] and config["table"]:
+ if "api_key" in config and "base_id" in config and "table" in config:

Comment on lines 51 to 64
url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
self.logger.log(
"""
Connected to Airtable.
"""
)
else:
raise ValueError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The code does not handle potential exceptions that could be raised by the requests.head call. Consider adding a try-except block to handle potential requests.exceptions.RequestException exceptions.

- response = requests.head(
-     url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
- )
+ try:
+     response = requests.head(
+         url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")

Comment on lines 137 to 149
response = requests.get(
url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
)
if response.status_code == 200:
data = response.json()
data = self.preprocess(data=data)
self._save_cache(data)
else:
raise ValueError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The code does not handle potential exceptions that could be raised by the requests.get call. Consider adding a try-except block to handle potential requests.exceptions.RequestException exceptions.

- response = requests.get(
-     url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
- )
+ try:
+     response = requests.get(
+         url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")

Comment on lines 172 to 183
response = requests.get(
url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
)
if response.status_code == 200:
data = response.json()
data = self.preprocess(data=data)
else:
raise ValueError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The code does not handle potential exceptions that could be raised by the requests.get call. Consider adding a try-except block to handle potential requests.exceptions.RequestException exceptions.

- response = requests.get(
-     url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
- )
+ try:
+     response = requests.get(
+         url=url, headers={"Authorization": f"Bearer {self._config.api_key}"}
+     )
+ except requests.exceptions.RequestException as e:
+     raise ValueError(f"Failed to connect to Airtable: {str(e)}")

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 10

Files that changed from the base of the PR between cd78117 and 1ec1dd5.
Files selected for processing (5)
  • docs/connectors.md (1 hunks)
  • examples/from_airtable.py (1 hunks)
  • pandasai/connectors/airtable.py (1 hunks)
  • pandasai/connectors/base.py (1 hunks)
  • pandasai/exceptions.py (1 hunks)
Additional comments (Suppressed): 5
pandasai/exceptions.py (2)
  • 8-16: The new InvalidRequestError exception is introduced to handle unsuccessful requests. Ensure that this exception is properly caught and handled in the code where requests are made. Also, the docstring for the exception could be improved to provide more information about when and why this exception might be raised.

  • 18-20: The APIKeyNotFoundError exception has been moved down in the code. Ensure that this does not affect any imports or usage of this exception in other parts of the codebase.

pandasai/connectors/base.py (1)
  • 23-33: The AirtableConnectorConfig class is well defined with appropriate attributes. However, ensure that the api_key, base_id, and database are properly validated before use to prevent potential security issues or errors.
docs/connectors.md (1)
  • 187-189: The new hunk starts with a code snippet that seems to be a continuation from a previous section. Ensure that this transition is smooth and the context is clear to the reader.
pandasai/connectors/airtable.py (1)
  • 149-163: The code makes a GET request to the Airtable API to fetch data. However, it does not handle potential network errors raised during the request.

Comment on lines 6 to 16
airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "==", "In progress"]
        ],
    }
Contributor

The AirtableConnector is initialized with a hardcoded configuration. This might not be an issue if this is just an example, but in a production environment, sensitive data like api_key and base_id should be stored securely and not hardcoded. Consider using environment variables or a secure configuration management system.

}
)

llm = OpenAI("OPENAI_API_KEY")
Contributor

The OpenAI instance is initialized with a hardcoded API key. As with the AirtableConnector, this is a security concern. API keys should be stored securely and not hardcoded.

llm = OpenAI("OPENAI_API_KEY")
df = SmartDataframe(airtable_connectors, config={"llm": llm})

response = df.chat("How many rows are there in data ?")
Contributor

The chat method is called on the SmartDataframe object without any error handling. If an error occurs during the execution of this method, it could cause the program to crash. Consider wrapping this call in a try/except block to handle potential exceptions.

@@ -20,6 +20,17 @@ class BaseConnectorConfig(BaseModel):
where: list[list[str]] = None
Contributor

The where parameter is not used in the AirtableConnectorConfig class. If it's not needed, consider removing it to avoid confusion.

Comment on lines 34 to 36
class SQLBaseConnectorConfig(BaseConnectorConfig):
"""
Base Connector configuration.
Contributor

The SQLBaseConnectorConfig class seems to be duplicated. If it's not intended, consider removing it to maintain code cleanliness.

Comment on lines 193 to 218
The Airtable connector allows you to connect to Airtable Projects Tables, by simply passing the `base_id`, `api_key`, and `table_name` of the table you want to analyze.

To use the Airtable connector, you only need to import it into your Python code and pass it to a `SmartDataframe` or `SmartDatalake` object:

```python
from pandasai.connectors import AirtableConnector
from pandasai import SmartDataframe


airtable_connectors = AirtableConnector(
    config={
        "api_key": "AIRTABLE_API_TOKEN",
        "table": "AIRTABLE_TABLE_NAME",
        "base_id": "AIRTABLE_BASE_ID",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status", "==", "In progress"]
        ]
    }
)

df = SmartDataframe(airtable_connectors)

df.chat("How many rows are there in data ?")
```
Contributor

The new hunk introduces the Airtable connector and provides a code snippet demonstrating its usage. The code seems to be correct and well-explained. However, the api_key, table, and base_id are hardcoded strings. It's recommended to use environment variables or secure storage for sensitive data like API keys.

airtable_connectors = AirtableConnector(
    config={
-        "api_key": "AIRTABLE_API_TOKEN",
-        "table":"AIRTABLE_TABLE_NAME",
-        "base_id":"AIRTABLE_BASE_ID",
+        "api_key": os.getenv("AIRTABLE_API_TOKEN"),
+        "table": os.getenv("AIRTABLE_TABLE_NAME"),
+        "base_id": os.getenv("AIRTABLE_BASE_ID"),
        "where" : [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["Status" ,"==","In progress"]
        ]
    }
)

Don't forget to import the os module at the beginning of your script.

Comment on lines +56 to +69
url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
self.logger.log(
"""
Connected to Airtable.
"""
)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The code makes a HEAD request to the Airtable API to check the connection. However, it does not handle potential network errors that could occur during the request, such as a timeout or a connection error. It would be better to wrap the request in a try/except block and handle these potential errors.

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Commits Files that changed from the base of the PR and between 1ec1dd5 and 2171296.
Files selected for processing (2)
  • pandasai/connectors/airtable.py (1 hunks)
  • pandasai/connectors/base.py (1 hunks)
Additional comments (Suppressed): 2
pandasai/connectors/base.py (2)
  • 23-30: The AirtableConnectorConfig class has been added to handle the configuration for the Airtable connector. Ensure that the api_key, base_id, and database attributes are correctly used in the AirtableConnector class and that the api_key and base_id are properly secured and not exposed in any logs or error messages.

  • 33-35: The SQLBaseConnectorConfig class has not been modified in this hunk. Ensure that any changes to this class in other hunks do not affect its current functionality.

Comment on lines 95 to 104
cache_path = self._get_cache_path()
if not os.path.exists(cache_path):
return None

# If the file is older than 1 day , delete it.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
if self.logger:
self.logger.log(f"Deleting expired cached data from {cache_path}")
os.remove(cache_path)
return None
Contributor

The cache invalidation logic is based on the modification time of the cache file. However, this approach might not work as expected if the system time is changed. A more reliable approach would be to store the cache creation time within the cache file itself and use that for invalidation.
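One way to store the creation time inside the cache artifact itself, sketched with a JSON file for brevity (the connector actually uses Parquet; the function names here are illustrative):

```python
import json
import os
import time

def save_cache(path, payload):
    # Record the creation time inside the cache file so invalidation
    # does not depend on the filesystem's modification time.
    with open(path, "w") as f:
        json.dump({"created_at": time.time(), "payload": payload}, f)

def load_cache(path, max_age_seconds):
    """Return the cached payload, or None if missing or expired."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        blob = json.load(f)
    if time.time() - blob["created_at"] > max_age_seconds:
        os.remove(path)  # expired: drop it, as the connector does
        return None
    return blob["payload"]
```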

Comment on lines 148 to 162
url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
response = requests.get(
url=url,
headers={"Authorization": f"Bearer {self._config.api_key}"},
)
if response.status_code == 200:
data = response.json()
data = self.preprocess(data=data)
self._save_cache(data)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The code does not handle potential network errors that might occur during the requests.get call. It would be better to wrap this call in a try-except block and handle potential network errors.

-        response = requests.get(
-            url=url,
-            headers={"Authorization": f"Bearer {self._config.api_key}"},
-        )
-        if response.status_code == 200:
-            data = response.json()
-            data = self.preprocess(data=data)
-            self._save_cache(data)
-        else:
-            raise InvalidRequestError(
-                f"""Failed to connect to Airtable. 
-                    Status code: {response.status_code}, 
-                    message: {response.text}"""
-            )
+        try:
+            response = requests.get(
+                url=url,
+                headers={"Authorization": f"Bearer {self._config.api_key}"},
+            )
+            response.raise_for_status()
+            data = response.json()
+            data = self.preprocess(data=data)
+            self._save_cache(data)
+        except requests.exceptions.RequestException as e:
+            raise InvalidRequestError(
+                f"""Failed to connect to Airtable. 
+                    Error: {str(e)}"""
+            )

Comment on lines 176 to 179
if self._config.where:
for i in self._config.where:
filter_string = f"{i[0]} {i[1]} '{i[2]}'"
df = df.query(filter_string)
Contributor

The code assumes that the 'where' attribute in the config is a list of conditions. However, it does not validate if this is the case. If 'where' is not a list or does not contain valid conditions, the code will fail at runtime. It would be better to add a check to ensure 'where' is a list and contains valid conditions.
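A validation helper along the suggested lines might look like this sketch; the accepted operator set is an assumption matching what pandas `DataFrame.query` comparisons support:

```python
VALID_OPERATORS = {"==", "!=", ">", ">=", "<", "<="}

def validate_where(where):
    """Check that `where` is a list of [column, operator, value] triples
    before any pandas query strings are built from it."""
    if not isinstance(where, list):
        raise ValueError("`where` must be a list of conditions")
    for cond in where:
        if not (isinstance(cond, (list, tuple)) and len(cond) == 3):
            raise ValueError(f"each condition must be [column, op, value], got {cond!r}")
        if cond[1] not in VALID_OPERATORS:
            raise ValueError(f"unsupported operator: {cond[1]!r}")
    return True
```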

Contributor
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Commits Files that changed from the base of the PR and between 2171296 and a1f25b8.
Files selected for processing (1)
  • pandasai/connectors/airtable.py (1 hunks)

Comment on lines +21 to +25
def __init__(
self,
config: Optional[Union[AirtableConnectorConfig, dict]] = None,
cache_interval: int = 600,
):
Contributor

The cache_interval parameter is not documented in the function docstring. Please add a description for it.

Comment on lines +26 to +32
if isinstance(config, dict):
if "api_key" in config and "base_id" in config and "table" in config:
config = AirtableConnectorConfig(**config)
else:
raise KeyError(
"Please specify all api_key,table,base_id properly in config ."
)
Contributor

The check only verifies that "api_key", "base_id", and "table" exist as keys in the config dictionary; it does not validate their values. A key present with a value that evaluates to False (like an empty string) would pass the check and fail later at request time. Consider validating the values in addition to key presence.

Comment on lines +34 to +42
elif not config:
airtable_env_vars = {
"api_key": "AIRTABLE_API_TOKEN",
"base_id": "AIRTABLE_BASE_ID",
"table": "AIRTABLE_TABLE_NAME",
}
config = AirtableConnectorConfig(
**self._populate_config_from_env(config, airtable_env_vars)
)
Contributor

The config parameter is not used in the _populate_config_from_env method call. If it's not needed, consider removing it.

Comment on lines +78 to +83
cache_dir = os.path.join(os.getcwd(), "")
try:
cache_dir = os.path.join((find_project_root()), "cache")
except ValueError:
cache_dir = os.path.join(os.getcwd(), "cache")
return os.path.join(cache_dir, f"{self._config.table}_data.parquet")
Contributor

The _get_cache_path method does not handle potential os exceptions. Consider adding a try-except block to handle possible file system errors.

Comment on lines 156 to 179
"""
Feteches data from airtable server through
API and converts it to DataFrame.
"""
url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
params = {}

if self._config.where:
params["filterByFormula"] = self.build_formula()
response = requests.get(
url=url,
headers={"Authorization": f"Bearer {self._config.api_key}"},
params=params,
)
if response.status_code == 200:
data = response.json()
data = self.preprocess(data=data)
self._save_cache(data)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The fetch_data method does not handle potential requests exceptions. Consider adding a try-except block to handle possible network errors or timeouts.
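Separately from the error handling, the `build_formula` referenced in this hunk can be sketched as follows. This assumes Airtable's formula syntax with `{Field}` references and single-quoted values, and that each `where` entry is a `[column, operator, value]` triple; escaping of quotes inside values is intentionally omitted.

```python
def build_formula(where):
    """Build an Airtable filterByFormula string from [column, op, value]
    triples, AND-ing multiple conditions together."""
    clauses = [f"{{{col}}}{op}'{val}'" for col, op, val in where]
    if len(clauses) == 1:
        return clauses[0]
    return "AND(" + ",".join(clauses) + ")"
```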

df = pd.DataFrame(records)
return df

def head(self):
Collaborator

It should be cached using @cache; for reference, check sql.py.

DatFrameType: The head of the data source
that the conector is connected to .
"""
return self.fetch_data().head()
Collaborator

Here, fetch only a few records, say the first 5 rows, instead of all of them.

int: The number of rows in the data source that the connector is
connected to.
"""
data = self.execute()
Collaborator

Keep rows_count as a property and store the result on the instance so we don't have to call the API again and again.

int: The number of columns in the data source that the connector is
connected to.
"""
data = self.execute()
Collaborator

Same as rows_count: avoid calling again and again. For this, use the head function instead.

int: The hash code that is unique to the columns of the data source
that the connector is connected to.
"""
data = self.execute()
Collaborator

The where queries should be included in the hash as well. Otherwise it will return wrong data from the cache.

Returns:
DataFrameType: The result of the connector.
"""
return self.fetch_data()
Collaborator

You have cached it but never used it. You can check for the cache like:

cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
    return pd.read_parquet(cached)

Contributor
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 10

Commits Files that changed from the base of the PR and between a1f25b8 and 3f8f55f.
Files selected for processing (3)
  • docs/connectors.md (1 hunks)
  • examples/from_airtable.py (1 hunks)
  • pandasai/connectors/airtable.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • docs/connectors.md
Additional comments (Suppressed): 3
examples/from_airtable.py (3)
  • 6-16: The AirtableConnector is initialized with a configuration dictionary. Ensure that the api_key, table, and base_id values are replaced with actual values before running the script. Also, the where clause is optional and can be used to filter the data. If not needed, it can be removed.

  • 19-19: The OpenAI class is initialized with an API key. Make sure to replace "OPENAI_API_KEY" with your actual OpenAI API key.

  • 22-22: The chat method is called on the SmartDataframe object. Ensure that the question passed to the chat method is relevant to your data.

Comment on lines +38 to +47

elif not config:
airtable_env_vars = {
"api_key": "AIRTABLE_API_TOKEN",
"base_id": "AIRTABLE_BASE_ID",
"table": "AIRTABLE_TABLE_NAME",
}
config = AirtableConnectorConfig(
**self._populate_config_from_env(config, airtable_env_vars)
)
Contributor

The code is not handling the case where the config is not a dictionary and is not None. This could lead to unexpected behavior if a user passes an invalid configuration. Consider adding an else clause to handle this case.

Comment on lines +58 to +74
config = config.dict()
url = f"{self._root_url}{config['base_id']}/{config['table']}"
response = requests.head(
url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
self.logger.log(
"""
Connected to Airtable.
"""
)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The _init_connection method is making a HEAD request to the Airtable API to check the connection. This is a good practice, but it would be better to handle more specific HTTP status codes. For example, a 401 status code means that the API key is invalid, and a 404 status code means that the base or table does not exist. Handling these specific cases can provide more informative error messages to the user.
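A sketch of the suggested status-code mapping, using a stand-in exception class and messages based on standard HTTP semantics (the exact codes Airtable returns for each failure mode are an assumption):

```python
class AirtableRequestError(Exception):
    """Stand-in exception for this sketch."""

_STATUS_MESSAGES = {
    401: "invalid or expired API key",
    403: "token lacks permission for this base",
    404: "base or table not found",
    429: "rate limited, retry later",
}

def raise_for_airtable_status(status_code, body=""):
    """Turn common HTTP status codes into specific error messages."""
    if 200 <= status_code < 300:
        return
    detail = _STATUS_MESSAGES.get(status_code, body or "unexpected error")
    raise AirtableRequestError(f"Airtable request failed ({status_code}): {detail}")
```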

Comment on lines +104 to +109
# If the file is older than 1 day , delete it.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
if self.logger:
self.logger.log(f"Deleting expired cached data from {cache_path}")
os.remove(cache_path)
return None
Contributor

The cache invalidation logic is based on the modification time of the cache file. This could lead to unexpected behavior if the cache file is manually modified. Consider using a separate metadata file or a database to store the cache creation time.

Comment on lines 146 to 154
cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
return pd.read_parquet(cached)

if isinstance(self.instance, pd.DataFrame):
return self.instance
else:
self.instance = self.fetch_data()
return self.instance
Contributor

The execute method is checking if self.instance is a DataFrame and returning it if it is. This could lead to unexpected behavior if self.instance is modified elsewhere in the code. Consider always fetching the data from the server or the cache in the execute method.

Comment on lines 167 to 191

def fetch_data(self):
"""
Feteches data from airtable server through
API and converts it to DataFrame.
"""
url = f"{self._root_url}{self._config.base_id}/{self._config.table}"
params = {}
if self._config.where:
params["filterByFormula"] = self.build_formula()
response = requests.get(
url=url,
headers={"Authorization": f"Bearer {self._config.api_key}"},
params=params,
)
if response.status_code == 200:
data = response.json()
data = self.preprocess(data=data)
self._save_cache(data)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The fetch_data method is making a GET request to the Airtable API to fetch the data. This is a good practice, but it would be better to handle more specific HTTP status codes, similar to the _init_connection method.

Comment on lines 199 to 204
records = [
{"id": record["id"], **record["fields"]} for record in data["records"]
]

df = pd.DataFrame(records)
return df
Contributor

The preprocess method is transforming the JSON response from the Airtable API into a DataFrame. This is a good practice, but it assumes that every record in the response has an id field and a fields field. Consider adding error handling for cases where these fields are missing.

Comment on lines +207 to +215
def head(self):
"""
Return the head of the table that
the connector is connected to.

Returns :
DatFrameType: The head of the data source
that the conector is connected to .
"""
Contributor

The head method is decorated with the cache decorator from the functools module, so the result is computed once and reused on subsequent calls (the method's only argument is self, so each instance gets a single cached result). Note that this keeps a reference to the instance alive for the lifetime of the cache; functools.cached_property may be a better fit for a zero-argument method.

Comment on lines 223 to 253
@cached_property
def rows_count(self):
"""
Return the number of rows in the data source that the connector is
connected to.

Returns:
int: The number of rows in the data source that the connector is
connected to.
"""
if self._rows_count is not None:
return self._rows_count
data = self.execute()
self._rows_count = len(data)
return self._rows_count

@cached_property
def columns_count(self):
"""
Return the number of columns in the data source that the connector is
connected to.

Returns:
int: The number of columns in the data source that the connector is
connected to.
"""
if self._columns_count is not None:
return self._columns_count
data = self.execute()
self._columns_count = len(data.columns)
return self._columns_count
Contributor

The rows_count and columns_count methods are decorated with the cached_property decorator from the functools module. This is a good practice for properties that are expensive to compute. However, these methods are also storing their results in instance variables (_rows_count and _columns_count). This is redundant and could lead to inconsistencies if these instance variables are modified elsewhere in the code. Consider removing the instance variables and relying solely on the cached_property decorator.
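A minimal sketch of relying on `cached_property` alone, without the shadow instance variables; the `Connector` class and `_execute` method here are illustrative stand-ins for the real connector:

```python
from functools import cached_property

class Connector:
    """Illustrative connector relying on cached_property alone."""

    def _execute(self):
        # Stands in for the expensive Airtable fetch.
        return [{"id": 1}, {"id": 2}, {"id": 3}]

    @cached_property
    def rows_count(self):
        # Computed once per instance; no _rows_count shadow variable needed.
        return len(self._execute())
```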

Comment on lines +256 to +264
def column_hash(self):
"""
Return the hash code that is unique to the columns of the data source
that the connector is connected to.

Returns:
int: The hash code that is unique to the columns of the data source
that the connector is connected to.
"""
Contributor

The column_hash property is computing a hash of the column names and the filter formula. This is a good practice for caching purposes. However, it assumes that self.instance is a DataFrame. Consider adding error handling for cases where self.instance is not a DataFrame.

@@ -225,8 +246,11 @@ def columns_count(self):
int: The number of columns in the data source that the connector is
connected to.
"""
if self._columns_count is not None:
return self._columns_count
data = self.execute()
Collaborator

Here, for the column count, use head(), since it is also cached using @cache and it will have less data.

if cached:
    return pd.read_parquet(cached)

if isinstance(self.instance, pd.DataFrame):
Collaborator

This self.instance is not needed because we are already caching it.



class AirtableConnector(BaseConnector):
"""
Airtable connector to retrieving record data.
"""

_rows_count: int = None
_columns_count: int = None
instance = None
Collaborator

These are redundant because we already cache the dataframe.

@@ -200,9 +213,14 @@ def head(self):
DatFrameType: The head of the data source
that the conector is connected to .
"""
return self.fetch_data().head()
# return self.fetch_data().head()
if isinstance(self.instance, pd.DataFrame):
Collaborator

Let's remove this instance check; for head, call the API with a limit of 5 records.
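A sketch of a `head` that asks the API for only a few records, using Airtable's documented `maxRecords`/`pageSize` list-records query parameters; the request function is injected (standing in for `requests.get`) so the sketch is self-contained:

```python
def fetch_head(get, root_url, base_id, table, api_key, n=5):
    """Fetch only the first n records of a table.

    `get` stands in for requests.get.
    """
    response = get(
        f"{root_url}{base_id}/{table}",
        headers={"Authorization": f"Bearer {api_key}"},
        params={"maxRecords": n, "pageSize": n},
    )
    return response.json()["records"]
```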

Contributor
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 10

Commits Files that changed from the base of the PR and between 3f8f55f and 8c0e07b.
Files selected for processing (2)
  • pandasai/connectors/airtable.py (1 hunks)
  • tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 1
pandasai/connectors/airtable.py (1)
  • 39-47: The code checks if the config is None and if so, it tries to populate it from environment variables. However, the _populate_config_from_env method is not defined in this class or its parent class. Please ensure that this method is implemented and correctly populates the configuration from environment variables.

Comment on lines 95 to 103
@patch("requests.get")
def test_columns_count_property(self, mock_request_get):
# Test columns_count property
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
rows_count = self.connector.columns_count
self.assertEqual(rows_count, 4)
Contributor

In the test_columns_count_property method, you are testing the columns_count property of the AirtableConnector class. This is a good practice. However, you are not asserting that the requests.get method was called with the correct arguments. Consider adding an assertion to check that the correct URL and headers were passed to the requests.get method.

Comment on lines +26 to +37
def __init__(
self,
config: Optional[Union[AirtableConnectorConfig, dict]] = None,
cache_interval: int = 600,
):
if isinstance(config, dict):
if "api_key" in config and "base_id" in config and "table" in config:
config = AirtableConnectorConfig(**config)
else:
raise KeyError(
"Please specify all api_key,table,base_id properly in config ."
)
Contributor

The __init__ method checks if the config parameter is a dictionary and if it contains the keys api_key, base_id, and table. If not, it raises a KeyError. This could be improved by providing a more specific error message for each missing key. This way, the user will know exactly which key is missing from the configuration.

-            if "api_key" in config and "base_id" in config and "table" in config:
-                config = AirtableConnectorConfig(**config)
-            else:
-                raise KeyError(
-                    "Please specify all api_key,table,base_id properly in config ."
-                )
+            missing_keys = [key for key in ["api_key", "base_id", "table"] if key not in config]
+            if missing_keys:
+                raise KeyError(f"Missing keys in config: {', '.join(missing_keys)}")
+            config = AirtableConnectorConfig(**config)

Comment on lines +54 to +74
def _init_connection(self, config: BaseConnectorConfig):
"""
make connection to database
"""
config = config.dict()
url = f"{self._root_url}{config['base_id']}/{config['table']}"
response = requests.head(
url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
self.logger.log(
"""
Connected to Airtable.
"""
)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The _init_connection method sends a HEAD request to the Airtable API to check the connection. If the status code is not 200, it raises an InvalidRequestError. However, the error message could be improved by including the URL that failed to connect. This would help in debugging connection issues.

-                f"""Failed to connect to Airtable. 
-                    Status code: {response.status_code}, 
-                    message: {response.text}"""
+                f"""Failed to connect to Airtable at {url}. 
+                    Status code: {response.status_code}, 
+                    message: {response.text}"""

Comment on lines +90 to +109
def _cached(self, include_additional_filters: bool = False):
"""
Returns the cached Airtable data if it exists and
is not older than the cache interval.

Returns :
DataFrame | None : The cached data if
it exists and is not older than the cache
interval, None otherwise.
"""
cache_path = self._get_cache_path(include_additional_filters)
if not os.path.exists(cache_path):
return None

# If the file is older than 1 day , delete it.
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
if self.logger:
self.logger.log(f"Deleting expired cached data from {cache_path}")
os.remove(cache_path)
return None
Contributor

The _cached method checks if the cache file exists and if it's older than the cache interval. If it's older, it deletes the file. However, it doesn't handle the case where the file deletion fails. It would be better to handle this case and log an error message.

-            os.remove(cache_path)
+            try:
+                os.remove(cache_path)
+            except OSError as e:
+                if self.logger:
+                    self.logger.log(f"Failed to delete expired cache file {cache_path}: {str(e)}")

Comment on lines 192 to 206
data = []
while True:
response = self._request_api(params=params)

if response.status_code == 200:
res = response.json()
data.append(res)
if len(res["records"]) < 100:
break
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)
Contributor

The _fetch_data method sends a GET request to the Airtable API to fetch data. If the status code is not 200, it raises an InvalidRequestError. Similar to the _init_connection method, the error message could be improved by including the URL that failed to fetch data.

-                f"""Failed to connect to Airtable. 
-                    Status code: {response.status_code}, 
-                    message: {response.text}"""
+                f"""Failed to fetch data from Airtable at {url}. 
+                    Status code: {response.status_code}, 
+                    message: {response.text}"""

Comment on lines 214 to 230
def _preprocess(self, data):
"""
Preprocesses Json response data
To prepare dataframe correctly.
"""
columns = set()
data_dict_list = []
for item in data:
for entry in item["records"]:
data_dict = {"id": entry["id"], "createdTime": entry["createdTime"]}
for field_name, field_value in entry["fields"].items():
data_dict[field_name] = field_value
columns.add(field_name)
data_dict_list.append(data_dict)

df = pd.DataFrame(data_dict_list)
return df
Contributor

The _preprocess method processes the JSON response data from the Airtable API and converts it to a DataFrame. However, it doesn't handle the case where the fields key is missing in the entry dictionary. This could lead to a KeyError. It would be better to handle this case and set a default value for fields.

-                for field_name, field_value in entry["fields"].items():
+                for field_name, field_value in entry.get("fields", {}).items():

"""
Preprocesses Json response data
To prepare dataframe correctly.
"""
records = [
Collaborator

Let's add this logic to the _fetch_data function to get rid of the three nested loops.
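A single-pass flattening along these lines might look like the sketch below (stdlib-only; `flatten_records` is an illustrative name, and building the DataFrame from the resulting row dicts is left to the caller):

```python
def flatten_records(pages):
    """Flatten Airtable JSON pages into flat row dicts in one pass,
    tolerating records that lack a 'fields' key."""
    rows = []
    for page in pages:
        for entry in page.get("records", []):
            row = {"id": entry["id"], "createdTime": entry.get("createdTime")}
            row.update(entry.get("fields", {}))
            rows.append(row)
    return rows
```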

Contributor
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Commits Files that changed from the base of the PR and between 8c0e07b and f6955eb.
Files selected for processing (2)
  • pandasai/connectors/airtable.py (1 hunks)
  • tests/connectors/test_airtable.py (1 hunks)
Additional comments (Suppressed): 8
tests/connectors/test_airtable.py (1)
  • 57-58: The fallback_name property is tested twice in this test suite (also at lines 80-83). Consider removing one to avoid redundancy.
-    def test_fallback_name(self):
-        self.assertEqual(self.connector.fallback_name, self.config["table"])
pandasai/connectors/airtable.py (7)
  • 39-47: If config is not provided, the code attempts to populate it from environment variables. This is a good practice for sensitive data like API keys. However, the _populate_config_from_env method is not shown in this hunk. Ensure that it correctly retrieves and validates the environment variables.

  • 58-74: The _init_connection method sends a HEAD request to the Airtable API to check the connection. If the response status code is not 200, it raises an InvalidRequestError. This is a good practice for validating the connection and providing a clear error message if the connection fails. However, consider adding a retry mechanism for transient network errors.

  • 104-109: The _cached method checks if the cache file exists and if it's older than the cache interval. If it's older, it deletes the file. This is a good practice for managing cache files. However, consider handling potential exceptions when deleting the file.

  • 123-127: The _save_cache method saves the DataFrame to a Parquet file. This is a good practice for caching data. However, consider handling potential exceptions when saving the file.

  • 192-215: The _fetch_data method fetches data from the Airtable API and converts it to a DataFrame. It uses pagination to retrieve all records. This is a good practice for handling large datasets. However, consider adding a limit to the number of requests to prevent potential infinite loops.

  • 228-234: The head method retrieves the first 5 records from the Airtable API. This is a good practice for previewing data. However, consider handling potential exceptions when making the request.

  • 262-266: The rows_count and columns_count methods return the number of rows and columns in the data source, respectively. They use the @cached_property decorator to cache the results. This is a good practice for optimizing performance. However, consider handling potential exceptions when executing the connector.
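The bounded pagination suggested above (following Airtable's `offset` continuation token) can be sketched as follows; `fetch_page` is an injected stand-in for the HTTP call, and the hard cap on iterations guards against infinite loops:

```python
def fetch_all_records(fetch_page, page_size=100, max_pages=1000):
    """Collect every record by following the `offset` token,
    with a hard cap on the number of requests."""
    records, offset = [], None
    for _ in range(max_pages):
        page = fetch_page(offset=offset, page_size=page_size)
        records.extend(page["records"])
        offset = page.get("offset")
        if offset is None:  # no continuation token: last page reached
            return records
    raise RuntimeError(f"gave up after {max_pages} pages")
```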

Comment on lines +60 to +68
@patch("requests.get")
def test_execute(self, mock_request_get):
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
execute_data = self.connector.execute()
self.assertEqual(type(execute_data), pd.DataFrame)
self.assertEqual(len(execute_data), 3)
Contributor

The test_execute method tests the execute method of the AirtableConnector class. However, it only checks the type and length of the returned data. It would be beneficial to also verify the content of the returned DataFrame to ensure it matches the expected data.

Comment on lines +27 to +37
self,
config: Optional[Union[AirtableConnectorConfig, dict]] = None,
cache_interval: int = 600,
):
if isinstance(config, dict):
if "api_key" in config and "base_id" in config and "table" in config:
config = AirtableConnectorConfig(**config)
else:
raise KeyError(
"Please specify all api_key,table,base_id properly in config ."
)
Contributor

The config parameter is optional and can be either a dictionary or an AirtableConnectorConfig object. If it's a dictionary, it must contain the keys api_key, base_id, and table. If these keys are not present, a KeyError is raised. Consider providing a more detailed error message to help the user understand what each key represents.

- raise KeyError("Please specify all api_key,table,base_id properly in config .")
+ raise KeyError("Please specify all api_key, table, and base_id properly in the config. api_key is your Airtable API key, base_id is the ID of the base you are connecting to, and table is the name of the table within the base.")

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 11

Commits: files changed from the base of the PR, between f6955eb and b2fe3a7.
Files selected for processing (1)
  • pandasai/connectors/airtable.py (1 hunks)

Comment on lines +27 to +30
self,
config: Optional[Union[AirtableConnectorConfig, dict]] = None,
cache_interval: int = 600,
):

The cache_interval parameter is not documented in the function docstring. Please add a description for it.

Comment on lines +32 to +37
if "api_key" in config and "base_id" in config and "table" in config:
config = AirtableConnectorConfig(**config)
else:
raise KeyError(
"Please specify all api_key,table,base_id properly in config ."
)

The check for the presence of "api_key", "base_id", and "table" in the config dictionary is not robust. If any of these keys have a value that evaluates to False (like an empty string), the check will pass, but the subsequent code may fail. Consider using dict.get() to check for the presence of these keys.
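A minimal sketch of that stricter check (`validate_config` is a hypothetical helper, not part of the PR), rejecting both missing keys and empty values:

```python
REQUIRED_KEYS = ("api_key", "base_id", "table")

def validate_config(config: dict) -> dict:
    # dict.get() returns None for missing keys; `not` also rejects empty strings.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(
            f"Missing or empty config keys: {', '.join(missing)}. "
            "api_key, base_id, and table are all required."
        )
    return config
```

With this, a config like `{"api_key": "", "base_id": "b", "table": "t"}` fails up front with a named key instead of surfacing later in the request code.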

Comment on lines +59 to +74
url = f"{self._root_url}{config['base_id']}/{config['table']}"
response = requests.head(
url=url, headers={"Authorization": f"Bearer {config['api_key']}"}
)
if response.status_code == 200:
self.logger.log(
"""
Connected to Airtable.
"""
)
else:
raise InvalidRequestError(
f"""Failed to connect to Airtable.
Status code: {response.status_code},
message: {response.text}"""
)

The connection initialization does not handle potential network errors that could occur when making the request. Consider adding a try-except block to handle exceptions like requests.exceptions.RequestException.
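A sketch of that guard, collapsing all `requests` exceptions into a builtin `ConnectionError` for brevity (the PR's own `InvalidRequestError` would slot in the same way; the timeout value is an assumption, since the PR sends none):

```python
import requests

def check_connection(root_url: str, base_id: str, table: str, api_key: str) -> None:
    url = f"{root_url}{base_id}/{table}"
    try:
        response = requests.head(
            url,
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=10,  # assumed; the PR's request has no timeout
        )
    except requests.exceptions.RequestException as exc:
        # DNS failures, refused connections, and timeouts all land here.
        raise ConnectionError(f"Failed to reach Airtable: {exc}") from exc
    if response.status_code != 200:
        raise ConnectionError(
            f"Failed to connect to Airtable. "
            f"Status code: {response.status_code}, message: {response.text}"
        )
```

Without the `try/except`, a dropped network raises a raw `requests` traceback from deep inside the constructor instead of a single descriptive error.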

Comment on lines +83 to +87
cache_dir = os.path.join(os.getcwd(), "")
try:
cache_dir = os.path.join((find_project_root()), "cache")
except ValueError:
cache_dir = os.path.join(os.getcwd(), "cache")

The code is trying to find the project root and if it fails, it defaults to the current working directory. This could lead to inconsistent behavior depending on where the script is run from. Consider making the cache directory a configurable option.
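One way to make that configurable, sketched with a hypothetical `cache_dir` parameter (an explicit setting wins; the current-directory fallback stays as the default):

```python
import os
from typing import Optional

def resolve_cache_dir(cache_dir: Optional[str] = None) -> str:
    # An explicit cache_dir takes priority; otherwise fall back to ./cache.
    if cache_dir is None:
        cache_dir = os.path.join(os.getcwd(), "cache")
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir
```

Callers who run the connector from varying working directories can then pin the cache to one stable location.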

Comment on lines +105 to +109
if os.path.getmtime(cache_path) < time.time() - self._cache_interval:
if self.logger:
self.logger.log(f"Deleting expired cached data from {cache_path}")
os.remove(cache_path)
return None

The cache invalidation strategy is based on the modification time of the cache file. This could lead to issues if the system clock is changed or if the file is manually modified. Consider using a more robust cache invalidation strategy, like storing the cache creation time in the file itself or in a separate metadata file.
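A sketch of the sidecar-metadata approach (the `.meta.json` naming and JSON layout are assumptions, not part of the PR):

```python
import json
import os
import time

def write_cache_meta(cache_path: str) -> None:
    # Record creation time next to the parquet file instead of trusting mtime.
    with open(cache_path + ".meta.json", "w") as f:
        json.dump({"created_at": time.time()}, f)

def cache_expired(cache_path: str, interval: int) -> bool:
    meta_path = cache_path + ".meta.json"
    if not os.path.exists(meta_path):
        return True  # no metadata: treat the cache as stale
    with open(meta_path) as f:
        created_at = json.load(f).get("created_at", 0)
    return (time.time() - created_at) > interval
```

Because the timestamp travels with the cache rather than living in filesystem metadata, copying or touching the parquet file no longer silently extends its lifetime.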

Comment on lines +146 to +154
cached = self._cached() or self._cached(include_additional_filters=True)
if cached:
return pd.read_parquet(cached)

if isinstance(self._instance, pd.DataFrame):
return self._instance
else:
self._instance = self._fetch_data()


The execute method does not handle potential errors that could occur when fetching data from the API. Consider adding error handling to the _fetch_data method or to the execute method itself.

Comment on lines +192 to +213
data = []
while True:
response = self._request_api(params=params)

if response.status_code != 200:
raise InvalidRequestError(
f"Failed to connect to Airtable. "
f"Status code: {response.status_code}, "
f"message: {response.text}"
)

res = response.json()
records = res.get("records", [])
data.extend({"id": record["id"], **record["fields"]} for record in records)

if len(records) < 100 or "offset" not in res:
break

if "offset" in res:
params["offset"] = res["offset"]

return pd.DataFrame(data)

The data fetching loop does not have a maximum iteration count, which could lead to an infinite loop if the API continually returns an "offset" in the response. Consider adding a maximum iteration count or a timeout to prevent potential infinite loops.
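A sketch of the same loop with a hard page cap; `MAX_PAGES` and the injectable `request_page` callable are illustrative, not the connector's actual interface:

```python
MAX_PAGES = 1000  # assumed cap; tune to the largest base you expect

def fetch_all(request_page) -> list:
    # request_page(params) -> dict mimicking one Airtable API response page.
    data, params = [], {}
    for _ in range(MAX_PAGES):
        res = request_page(params)
        records = res.get("records", [])
        data.extend({"id": r["id"], **r["fields"]} for r in records)
        if len(records) < 100 or "offset" not in res:
            break
        params["offset"] = res["offset"]
    else:
        # Loop exhausted without breaking: the API kept returning offsets.
        raise RuntimeError(f"Pagination did not terminate within {MAX_PAGES} pages")
    return data
```

The `for`/`else` turns a silent infinite loop into a loud, bounded failure while leaving the normal early-exit path untouched.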

Comment on lines +225 to +231
data = self._request_api(params={"maxRecords": 5})
return pd.DataFrame(
[
{"id": record["id"], **record["fields"]}
for record in data.json()["records"]
]
)

The head method does not handle potential errors that could occur when making the API request. Consider adding error handling to this method.

Comment on lines +261 to +263
data = self.head()
self._columns_count = len(data.columns)
return self._columns_count

The rows_count and columns_count methods execute the connector or fetch the head of the data, respectively, even if the count is already stored. Consider checking if the count is already stored before executing the connector or fetching the head.
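A minimal sketch of that memoization (class and attribute names are assumed; `fetch` stands in for the connector's execute/head call):

```python
import pandas as pd

class CountsMixin:
    def __init__(self, fetch):
        self._fetch = fetch          # callable returning a DataFrame
        self._rows_count = None
        self._columns_count = None

    @property
    def rows_count(self) -> int:
        if self._rows_count is None:  # fetch only on first access
            self._rows_count = len(self._fetch())
        return self._rows_count

    @property
    def columns_count(self) -> int:
        if self._columns_count is None:
            self._columns_count = len(self._fetch().columns)
        return self._columns_count
```

Repeated property reads then cost nothing, instead of re-executing the connector each time.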

Comment on lines +276 to +279
self._instance = self.execute()
columns_str = "|".join(self._instance.columns)
columns_str += "WHERE" + self._build_formula()
return hashlib.sha256(columns_str.encode("utf-8")).hexdigest()

The column_hash property does not handle potential errors that could occur when executing the connector. Consider adding error handling to this property.

@coderabbitai coderabbitai bot left a comment


Review Status

Actionable comments generated: 7

Commits: files changed from the base of the PR, between b2fe3a7 and 88c419d.
Files selected for processing (1)
  • tests/connectors/test_airtable.py (1 hunks)

Comment on lines +56 to +59
self.assertEqual(self.connector._config, self.config)
self.assertEqual(self.connector._root_url, self.root_url)
self.assertEqual(self.connector._cache_interval, 600)


The test_constructor_and_properties test inspects the internal state of the AirtableConnector object. This is generally discouraged: it makes the test fragile and tightly couples it to the implementation. Prefer testing observable behavior instead, for example that the execute method returns the expected results for a given configuration.

Comment on lines +63 to +72
@patch("requests.get")
def test_execute(self, mock_request_get):
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
execute_data = self.connector.execute()
self.assertEqual(type(execute_data), pd.DataFrame)
self.assertEqual(len(execute_data), 3)


In the test_execute method, you are mocking the requests.get method and setting the return value to a predefined JSON response. This is a good practice for isolating the unit under test. However, you should also test the case where the request fails (e.g., the status code is not 200). This will help ensure that your code handles errors correctly.
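A sketch of the missing negative-path test. To keep the snippet self-contained, `execute` is a minimal stand-in for the connector's request-and-check step, and `InvalidRequestError` is redefined locally (in the real suite it would be imported from pandasai and `self.connector.execute()` called instead):

```python
import unittest
from unittest.mock import patch

import requests

class InvalidRequestError(Exception):
    """Local stand-in for pandasai's exception of the same name."""

def execute():
    # Minimal stand-in for the connector's request + status-code check.
    response = requests.get("https://api.airtable.com/v0/appXXX/tableXXX")
    if response.status_code != 200:
        raise InvalidRequestError(
            f"Status code: {response.status_code}, message: {response.text}"
        )
    return response.json()

class TestExecuteFailure(unittest.TestCase):
    @patch("requests.get")
    def test_execute_raises_on_http_error(self, mock_get):
        mock_get.return_value.status_code = 401
        mock_get.return_value.text = "Unauthorized"
        with self.assertRaises(InvalidRequestError):
            execute()
```

The same pattern, with different status codes, covers the head, rows_count, and columns_count failure paths the later comments mention.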

Comment on lines +73 to +81
@patch("requests.get")
def test_head(self, mock_request_get):
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
execute_data = self.connector.head()
self.assertEqual(type(execute_data), pd.DataFrame)
self.assertLessEqual(len(execute_data), 5)

Similar to the test_execute method, the test_head method should also test the case where the request fails.

Comment on lines +88 to +96
@patch("requests.get")
def test_rows_count_property(self, mock_request_get):
# Test rows_count property
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
rows_count = self.connector.rows_count
self.assertEqual(rows_count, 3)

In the test_rows_count_property method, you are testing the rows_count property of the AirtableConnector object. This is a good practice as it verifies that the property returns the expected value. However, you should also test the case where the request fails.

Comment on lines +98 to +106
@patch("requests.get")
def test_columns_count_property(self, mock_request_get):
# Test columns_count property
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
columns_count = self.connector.columns_count
self.assertEqual(columns_count, 3)

In the test_columns_count_property method, similar to the test_rows_count_property method, you should also test the case where the request fails.

Comment on lines +108 to +111
def test_build_formula_method(self):
formula = self.connector._build_formula()
expected_formula = "AND(Status='In progress')"
self.assertEqual(formula,expected_formula)

The test_build_formula_method method is testing the _build_formula method of the AirtableConnector object. This is a good practice as it verifies that the method returns the expected value. However, you should also test the case where the where parameter is not provided in the configuration. This will help ensure that your code handles this case correctly.
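A self-contained sketch of both cases; `build_formula` reimplements the behavior implied by the expected output above, and the empty-string result for a missing `where` is an assumption about the desired behavior, not the PR's current code:

```python
from typing import Optional

def build_formula(where: Optional[list] = None) -> str:
    # `where` is a list of [column, operator, value] triples, as in the test config.
    if not where:
        return ""  # assumed: no filterByFormula is sent when unfiltered
    clauses = [f"{column}='{value}'" for column, _op, value in where]
    return "AND(" + ",".join(clauses) + ")"

assert build_formula([["Status", "=", "In progress"]]) == "AND(Status='In progress')"
assert build_formula() == ""  # the case the current suite never exercises
```

Adding the unfiltered assertion to the suite pins down what `_build_formula` should return when no `where` clause is configured.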

Comment on lines +113 to +123
@patch("requests.get")
def test_column_hash(self,mock_request_get):
mock_request_get.return_value.json.return_value = json.loads(
self.expected_data_json
)
mock_request_get.return_value.status_code = 200
returned_hash = self.connector.column_hash
self.assertEqual(
returned_hash,
"e4cdc9402a0831fb549d7fdeaaa089b61aeaf61e14b8a044bc027219b2db941e"
)

In the test_column_hash method, similar to the test_rows_count_property and test_columns_count_property methods, you should also test the case where the request fails.

@gventuri gventuri merged commit cec0a61 into Sinaptik-AI:main Oct 18, 2023
9 checks passed