pandasai can't use training data. #1457

Open
YBlood134 opened this issue Dec 9, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@YBlood134

System Info

OS version: Windows 11
Python version: 3.9

🐛 Describe the bug

Hello! I have tried to train PandasAI in two ways: instruction training and Q/A training. I followed the tutorial at https://docs.pandas-ai.com/train exactly, but it didn't work. Below is a screenshot of the INFO output.
image
Besides, I can't see anything about training on this website:
image


dosubot bot commented Dec 9, 2024

Hey @YBlood134! I'm here to help you with PandasAI. I can assist you in solving bugs, answering questions, and becoming a contributor. Let's tackle this issue together!

I found two similar open issues that might help you resolve the problem with PandasAI not using the training data:

  1. Pandas Agent unable to use the trained data: This issue suggests ensuring that your PandasAI API key is correctly set as an environment variable, enabling verbose logging to get detailed execution information, checking logs and errors, and verifying the configuration of your vector store. It also emphasizes the correct instantiation of the vector store and agent [1].

  2. Unable to train using interface: This issue recommends setting the PandasAI API Key as an environment variable, using the train method on the Agent for specific instructions or Q/A pairs, ensuring the vector store is instantiated correctly with the API key, clearing previous conversation history if needed, and using the explain method to understand the solution process [2].

These steps should help you troubleshoot and resolve the issue with PandasAI not utilizing the training data.

To continue talking to Dosu, mention @dosu.



@dosubot dosubot bot added the bug Something isn't working label Dec 9, 2024
@YBlood134
Author

I have made the changes according to the instructions in both articles, but it still doesn't work and the message still appears.
@dosu


dosubot bot commented Dec 9, 2024

To address the issue with PandasAI not using the training data, ensure the following steps are correctly implemented:

  1. API Key Configuration: Double-check that your PandasAI API key is set as an environment variable. This is crucial for accessing the vector store:

    import os
    os.environ["PANDASAI_API_KEY"] = "YOUR_PANDASAI_API_KEY"
  2. Vector Store Setup: Ensure that a vector store is correctly instantiated and passed to the Agent. If you're using a local vector store like ChromaDB, make sure it's properly configured:

    from pandasai import Agent
    from pandasai.ee.vectorstores import ChromaDB
    
    vector_store = ChromaDB()
    agent = Agent("data.csv", vectorstore=vector_store)
  3. Training Method: Use the train method on the Agent for both instructions and Q/A training. Ensure that both queries and codes are provided for Q/A training:

    agent.train(queries=["Your query"], codes=["Your code"])
  4. Verbose Logging: Enable verbose logging to capture detailed execution information, which can help diagnose issues:

    import logging

    from pandasai.helpers.logger import Logger

    # `logging` must be imported for the `logging.INFO` level constant below
    logger = Logger(save_logs=True, verbose=True)
    logger.log("Starting data processing", level=logging.INFO)
  5. Error Correction and Retries: Ensure that the use_error_correction_framework and max_retries options are enabled in your configuration to handle potential errors:

    {
      "use_error_correction_framework": true,
      "max_retries": 3
    }

If these steps are correctly followed and the issue persists, consider checking the logs for any specific error messages that might indicate what is going wrong. This can provide more insight into the problem [1][2].
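
Before calling `agent.train`, the first and third steps above can be checked with a small pre-flight helper. This is only a sketch using the standard library: `preflight` is a hypothetical name, not part of PandasAI, and it assumes that each entry in `queries` is meant to pair with exactly one entry in `codes`.

```python
import os


def preflight(queries, codes, key_name="PANDASAI_API_KEY"):
    """Hypothetical helper: list problems to fix before calling agent.train()."""
    problems = []
    # Step 1: the API key should be visible to the current process.
    if not os.environ.get(key_name):
        problems.append(f"environment variable {key_name} is not set")
    # Step 3 (assumption): Q/A training expects one code snippet per query.
    if len(queries) != len(codes):
        problems.append(
            f"queries ({len(queries)}) and codes ({len(codes)}) differ in length"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight(["Your query"], ["Your code"]):
        print("pre-flight:", problem)
```

An empty list means neither check found a problem; it does not confirm that the key itself is valid or that training will succeed.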


@YBlood134
Author


C:\Users\22935\PycharmProjects\learnPy.venv\Scripts\python.exe C:\Users\22935\PycharmProjects\learnPy\from_sql_with_llama\bamboollm.py
2024-12-09 14:05:27.756 INFO pandasai.helpers.logger: Persisting Agent Training data in C:\Users\22935\PycharmProjects\learnPy\from_sql_with_llama\chromadb
2024-12-09 14:05:27.761 INFO pandasai.helpers.logger: Successfully initialized collection pandasai
Traceback (most recent call last):
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 90, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 207, in _validate_and_prepare_add_request
    validate_insert_record_set(record_set=add_records)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\types.py", line 225, in validate_insert_record_set
    _validate_record_set_length_consistency(record_set)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\types.py", line 252, in _validate_record_set_length_consistency
    raise ValueError(f"Unequal lengths for fields: {error_str}")
ValueError: Unequal lengths for fields: ids: 40, documents: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\22935\PycharmProjects\learnPy\from_sql_with_llama\bamboollm.py", line 72, in <module>
    agent.train(docs="only show the sql query for the question")
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\pandasai\agent\base.py", line 371, in train
    self._vectorstore.add_docs(docs)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\pandasai\ee\vectorstores\chroma.py", line 130, in add_docs
    self._docs_collection.add(
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\Collection.py", line 81, in add
    add_request = self._validate_and_prepare_add_request(
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 93, in wrapper
    raise type(e)(msg).with_traceback(e.__traceback__)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 90, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\models\CollectionCommon.py", line 207, in _validate_and_prepare_add_request
    validate_insert_record_set(record_set=add_records)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\types.py", line 225, in validate_insert_record_set
    _validate_record_set_length_consistency(record_set)
  File "C:\Users\22935\PycharmProjects\learnPy.venv\lib\site-packages\chromadb\api\types.py", line 252, in _validate_record_set_length_consistency
    raise ValueError(f"Unequal lengths for fields: {error_str}")
ValueError: Unequal lengths for fields: ids: 40, documents: 1 in add.

Process finished with exit code 1
@dosu


dosubot bot commented Dec 9, 2024

The error "Unequal lengths for fields: ids: 40, documents: 1" indicates that there is a mismatch between the number of IDs and the number of documents being added to the ChromaDB vector store. This typically happens when the add_docs method is called with a single document but multiple IDs are generated or provided.

To resolve this issue, ensure that the docs parameter in the train method is correctly formatted. The docs parameter should be a string, as shown in the example below:

agent.train(docs="Only return loans information about the past 10 years")

If you are passing a single string, make sure that the add_docs method is not expecting multiple documents. If you need to add multiple documents, ensure that the number of documents matches the number of IDs provided or generated [1].

Check your implementation to ensure that the docs parameter is correctly set as a single string or adjust the logic to handle multiple documents if needed.
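
One detail in the traceback may be worth checking: the string passed as `docs` is exactly 40 characters long, which matches `ids: 40`. The sketch below does not call PandasAI or ChromaDB. It assumes, without confirmation in this thread, that IDs are generated by iterating over `docs`, so that a bare string yields one ID per character. `make_ids` is a hypothetical stand-in for that step.

```python
import uuid

# The exact string from the failing call in the traceback.
doc = "only show the sql query for the question"


def make_ids(docs):
    """Hypothetical stand-in: one ID per element yielded by iterating docs."""
    return [f"{uuid.uuid4()}-docs" for _ in docs]


ids_from_string = make_ids(doc)   # iterating a str yields its characters
ids_from_list = make_ids([doc])   # iterating a list yields its single element

print(len(doc), len(ids_from_string), len(ids_from_list))  # prints: 40 40 1
```

If that assumption holds, wrapping the instruction in a list, for example `agent.train(docs=["only show the sql query for the question"])`, might give one ID for one document. This has not been verified against the installed PandasAI version.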
