datasets query error #8211

Closed
5 tasks done
soulzzz opened this issue Sep 10, 2024 · 8 comments · Fixed by #8217
Labels
🐞 bug Something isn't working

Comments

@soulzzz

soulzzz commented Sep 10, 2024

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.8.0

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

After upgrading, some of my knowledge datasets can no longer be queried.
I also tried creating a new dataset, and the same error occurred:
1. Create a new dataset and upload one file.
2. Run a hit test.

2024-09-10 09:18:55,802.802 INFO [Thread-218 (process_request_thread)] [_internal.py:97] - 10.8.8.141 - - [10/Sep/2024 09:18:55] "OPTIONS /console/api/datasets/beb7e2e4-fa97-46fc-93c9-26612c3858f0/hit-testing HTTP/1.1" 200 -
2024-09-10 09:18:55,845.845 ERROR [Thread-219 (process_request_thread)] [hit_testing.py:82] - Hit testing failed.
Traceback (most recent call last):
  File "/home/sky/Second/dify/api/controllers/console/datasets/hit_testing.py", line 55, in post
    response = HitTestingService.retrieve(
  File "/home/sky/Second/dify/api/services/hit_testing_service.py", line 38, in retrieve
    all_documents = RetrievalService.retrieve(
  File "/home/sky/Second/dify/api/core/rag/datasource/retrieval_service.py", line 100, in retrieve
    raise Exception(exception_message)
Exception: Error during query: [{'locations': [{'column': 22885, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_beb7e2e4_fa97_46fc_93c9_26612c3858f0_Node".', 'path': None}]
2024-09-10 09:18:55,845.845 ERROR [Thread-219 (process_request_thread)] [app.py:838] - Exception on /console/api/datasets/beb7e2e4-fa97-46fc-93c9-26612c3858f0/hit-testing [POST]
Traceback (most recent call last):
  File "/home/sky/Second/dify/api/controllers/console/datasets/hit_testing.py", line 55, in post
    response = HitTestingService.retrieve(
  File "/home/sky/Second/dify/api/services/hit_testing_service.py", line 38, in retrieve
    all_documents = RetrievalService.retrieve(
  File "/home/sky/Second/dify/api/core/rag/datasource/retrieval_service.py", line 100, in retrieve
    raise Exception(exception_message)
Exception: Error during query: [{'locations': [{'column': 22885, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_beb7e2e4_fa97_46fc_93c9_26612c3858f0_Node".', 'path': None}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sky/anaconda3/envs/dify/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/sky/anaconda3/envs/dify/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/sky/anaconda3/envs/dify/lib/python3.10/site-packages/flask_restful/__init__.py", line 489, in wrapper
    resp = resource(*args, **kwargs)
  File "/home/sky/anaconda3/envs/dify/lib/python3.10/site-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
  File "/home/sky/anaconda3/envs/dify/lib/python3.10/site-packages/flask_restful/__init__.py", line 604, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/home/sky/Second/dify/api/controllers/console/setup.py", line 65, in decorated
    return view(*args, **kwargs)
  File "/home/sky/Second/dify/api/libs/login.py", line 93, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/home/sky/Second/dify/api/controllers/console/wraps.py", line 22, in decorated
    return view(*args, **kwargs)
  File "/home/sky/Second/dify/api/controllers/console/datasets/hit_testing.py", line 83, in post
    raise InternalServerError(str(e))
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: Error during query: [{'locations': [{'column': 22885, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_beb7e2e4_fa97_46fc_93c9_26612c3858f0_Node".', 'path': None}]
2024-09-10 09:18:55,846.846 INFO [Thread-219 (process_request_thread)] [_internal.py:97] - 10.8.8.141 - - [10/Sep/2024 09:18:55] "POST /console/api/datasets/beb7e2e4-fa97-46fc-93c9-26612c3858f0/hit-testing HTTP/1.1" 500 -
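
In the log above, the type name in the error appears to embed the dataset ID from the request URL, with hyphens replaced by underscores. The following is a minimal sketch for pulling the field name and dataset ID out of such a message, assuming that naming pattern holds. The helper name `parse_field_error` is hypothetical and is not part of Dify or Weaviate.

```python
import re

# Pattern for the GraphQL error text seen in the log above.
_FIELD_ERROR = re.compile(
    r'Cannot query field "(?P<field>[^"]+)" on type "(?P<type>[^"]+)"'
)


def parse_field_error(message: str):
    """Return (field, type_name, dataset_id), or None if the message does not match.

    Assumption: the type name looks like Vector_index_<uuid with underscores>_Node,
    as in the log above. dataset_id is None when the type name has another shape.
    """
    match = _FIELD_ERROR.search(message)
    if match is None:
        return None
    type_name = match.group("type")
    dataset_id = None
    prefix, suffix = "Vector_index_", "_Node"
    if type_name.startswith(prefix) and type_name.endswith(suffix):
        # Strip the fixed prefix/suffix and restore the hyphens of the UUID.
        dataset_id = type_name[len(prefix):-len(suffix)].replace("_", "-")
    return match.group("field"), type_name, dataset_id


message = (
    'Cannot query field "page" on type '
    '"Vector_index_beb7e2e4_fa97_46fc_93c9_26612c3858f0_Node".'
)
print(parse_field_error(message))
```

Running this against messages collected from the logs can help identify which datasets are affected without opening each one in the console.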

✔️ Expected Behavior

The query succeeds.

❌ Actual Behavior

The query fails.

@dosubot dosubot bot added the 🐞 bug Something isn't working label Sep 10, 2024

dosubot bot commented Sep 10, 2024

It looks like you're encountering a "Cannot query field" error when trying to perform hit testing on your datasets. This issue is often caused by querying a field that is not defined in the dataset fields.

Here are some common causes and solutions for this error:

  1. Field Not Defined: Ensure that the field you are querying, such as page, is defined in the dataset fields. If it is not, you need to add it to the dataset schema.

  2. Typographical Errors: Double-check for any typos in the field name. For example, querying page instead of the correct field name will cause this error.

  3. Incorrect Nesting: If the field is nested, ensure that the query correctly references the nested structure.

Here is an example of correctly defined fields in dataset_query_detail_fields:

dataset_query_detail_fields = {
    "id": fields.String,
    "content": fields.String,
    "source": fields.String,
    "source_app_id": fields.String,
    "created_by_role": fields.String,
    "created_by": fields.String,
    "created_at": TimestampField,
}

Ensure that the field page (or any other field you are querying) is correctly defined and referenced in the dataset fields [1].

Additionally, there are known issues or limitations with the hit-testing endpoint in the current version of Dify, such as:

  1. Dataset Not Found: If the dataset specified by dataset_id does not exist, a NotFound error is raised.
  2. No Permission: If the current user does not have permission to access the dataset, a Forbidden error is raised.
  3. Index Not Initialized: If the index for the dataset is not initialized, a DatasetNotInitializedError is raised.
  4. Provider Token Not Initialized: If the provider token is not initialized, a ProviderNotInitializeError is raised.
  5. Quota Exceeded: If the provider quota is exceeded, a ProviderQuotaExceededError is raised.
  6. Model Not Supported: If the model currently does not support the requested operation, a ProviderModelCurrentlyNotSupportError is raised.
  7. No Embedding or Reranking Model: If no embedding model or reranking model is available, a ProviderNotInitializeError is raised with a specific message.
  8. Invoke Error: If there is an error during the invocation of the model, a CompletionRequestError is raised.
  9. General Errors: Any other exceptions are logged and an InternalServerError is raised [2].

By addressing these potential issues, you should be able to resolve the error and successfully query your datasets.


@AAEE86
Contributor

AAEE86 commented Sep 10, 2024

Error during query: [{'locations': [{'column': 161, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_ed9112e3_6883_4e92_bb39_13f56f4dbe18_Node".', 'path': None}]; Error during query: [{'locations': [{'column': 17089, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_ed9112e3_6883_4e92_bb39_13f56f4dbe18_Node".', 'path': None}]

@Howe829
Contributor

Howe829 commented Sep 10, 2024

I'll put some related issues here #8117 #8175

@AAEE86
Contributor

AAEE86 commented Sep 10, 2024

Error during query: [{'locations': [{'column': 22888, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_ed9112e3_6883_4e92_bb39_13f56f4dbe18_Node".', 'path': None}]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/app/api/.venv/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/__init__.py", line 489, in wrapper
    resp = resource(*args, **kwargs)
  File "/app/api/.venv/lib/python3.10/site-packages/flask/views.py", line 110, in view
    return current_app.ensure_sync(self.dispatch_request)(**kwargs)  # type: ignore[no-any-return]
  File "/app/api/.venv/lib/python3.10/site-packages/flask_restful/__init__.py", line 604, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/app/api/controllers/console/setup.py", line 65, in decorated
    return view(*args, **kwargs)
  File "/app/api/libs/login.py", line 93, in decorated_view
    return current_app.ensure_sync(func)(*args, **kwargs)
  File "/app/api/controllers/console/wraps.py", line 22, in decorated
    return view(*args, **kwargs)
  File "/app/api/controllers/console/datasets/hit_testing.py", line 83, in post
    raise InternalServerError(str(e))
werkzeug.exceptions.InternalServerError: 500 Internal Server Error: Error during query: [{'locations': [{'column': 150, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_ed9112e3_6883_4e92_bb39_13f56f4dbe18_Node".', 'path': None}];
Error during query: [{'locations': [{'column': 22888, 'line': 1}], 'message': 'Cannot query field "page" on type "Vector_index_ed9112e3_6883_4e92_bb39_13f56f4dbe18_Node".', 'path': None}]

@crazywoola
Member

I think this is introduced in this pr #7749

@crazywoola crazywoola self-assigned this Sep 10, 2024
@Howe829
Contributor

Howe829 commented Sep 10, 2024

I think this is introduced in this pr #7749

No wonder. When I tried a PDF file, hit testing worked well, but a TXT file failed.

@soulzzz
Author

soulzzz commented Sep 10, 2024

I think this is introduced in this pr #7749

I think this is the root cause of the problem. I'm using Weaviate as my default DB.

@Howe829
Contributor

Howe829 commented Sep 10, 2024

attributes = ['doc_id', 'dataset_id', 'document_id', 'doc_hash', 'page']

Removing 'page' from attributes solves the problem.
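
The suggestion above drops `page` for every dataset. A more selective variant would request only the attributes that the index class actually defines. This is a rough sketch under stated assumptions, not the fix merged for this issue: the function `queryable_attributes`, the `schema_properties` argument, and the two example schemas are hypothetical, and how the property list is obtained from the vector store is left open.

```python
def queryable_attributes(requested, schema_properties):
    """Keep only the requested attributes that exist on the index class, preserving order."""
    known = set(schema_properties)
    return [name for name in requested if name in known]


# Attribute list quoted in the comment above.
attributes = ["doc_id", "dataset_id", "document_id", "doc_hash", "page"]

# Hypothetical schema of an index built from a TXT file: no "page" property.
txt_schema = ["text", "doc_id", "dataset_id", "document_id", "doc_hash"]
# Hypothetical schema of an index built from a PDF file: "page" is present.
pdf_schema = txt_schema + ["page"]

print(queryable_attributes(attributes, txt_schema))
print(queryable_attributes(attributes, pdf_schema))
```

With this approach, indexes that carry page metadata would still return it, while indexes without the property would no longer trigger the "Cannot query field" error.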
