When split_pdf_page is enabled, only the last set of API requests succeeds. #467

Open
issj6 opened this issue Oct 14, 2024 · 1 comment


issj6 commented Oct 14, 2024

Describe the bug
When I set split_pdf_page=True and split_pdf_concurrency_level=15, every split request except the last one fails.
For example, if the PDF is divided into 10 sets, the log reports:
ERROR: Failed to send request for page 1
...
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
...
WARNING: Failed to partition set #9, its elements will be omitted in the final result.
INFO: Successfully partitioned set #10, elements added to the final result.

To Reproduce
code:

import os

import requests
import unstructured_client
from unstructured_client.models import operations, shared

os.environ["UNSTRUCTURED_API_KEY"] = "EMPTY"
os.environ["UNSTRUCTURED_API_URL"] = ""

# Pass a requests.Session as the underlying HTTP client.
requests_client = requests.Session()
client = unstructured_client.UnstructuredClient(
    api_key_auth=os.getenv("UNSTRUCTURED_API_KEY"),
    server_url=os.getenv("UNSTRUCTURED_API_URL"),
    client=requests_client,
)

filename = "./test_pdf.pdf"

# Read the PDF up front so the file handle is closed before the request is sent.
with open(filename, "rb") as file:
    content = file.read()

req = operations.PartitionRequest(
    partition_parameters=shared.PartitionParameters(
        files=shared.Files(
            content=content,
            file_name=filename,
        ),
        strategy=shared.Strategy.HI_RES,
        split_pdf_page=True,
        split_pdf_concurrency_level=15,
        chunking_strategy=shared.ChunkingStrategy("by_title"),
    )
)

try:
    res = client.general.partition(req)
    element_dicts = [element for element in res.elements]

    print(element_dicts)
    for e in element_dicts:
        print(e["text"])
except Exception as e:
    print(e)

Console Information:

INFO: Preparing to split document for partition.
INFO: Concurrency level set to 15
INFO: Splitting pages 1 to 23 (23 total)
INFO: Determined optimal split size of 2 pages.
INFO: Partitioning 11 files with 2 page(s) each.
INFO: Partitioning 1 file with 1 page(s).
INFO: Partitioning set #1 (pages 1-2).
INFO: Partitioning set #2 (pages 3-4).
INFO: Partitioning set #3 (pages 5-6).
INFO: Partitioning set #4 (pages 7-8).
INFO: Partitioning set #5 (pages 9-10).
INFO: Partitioning set #6 (pages 11-12).
INFO: Partitioning set #7 (pages 13-14).
INFO: Partitioning set #8 (pages 15-16).
INFO: Partitioning set #9 (pages 17-18).
INFO: Partitioning set #10 (pages 19-20).
INFO: Partitioning set #11 (pages 21-22).
INFO: Partitioning set #12 (pages 23-23).
ERROR: Failed to send request for page 1
ERROR: Failed to send request for page 3
ERROR: Failed to send request for page 5
ERROR: Failed to send request for page 7
ERROR: Failed to send request for page 9
ERROR: Failed to send request for page 11
ERROR: Failed to send request for page 13
ERROR: Failed to send request for page 15
ERROR: Failed to send request for page 17
ERROR: Failed to send request for page 19
ERROR: Failed to send request for page 21
WARNING: Failed to partition set #1, its elements will be omitted in the final result.
WARNING: Failed to partition set #2, its elements will be omitted in the final result.
WARNING: Failed to partition set #3, its elements will be omitted in the final result.
WARNING: Failed to partition set #4, its elements will be omitted in the final result.
WARNING: Failed to partition set #5, its elements will be omitted in the final result.
WARNING: Failed to partition set #6, its elements will be omitted in the final result.
WARNING: Failed to partition set #7, its elements will be omitted in the final result.
WARNING: Failed to partition set #8, its elements will be omitted in the final result.
WARNING: Failed to partition set #9, its elements will be omitted in the final result.
WARNING: Failed to partition set #10, its elements will be omitted in the final result.
WARNING: Failed to partition set #11, its elements will be omitted in the final result.
INFO: Successfully partitioned set #12, elements added to the final result.
INFO: Successfully partitioned the document.
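
For reference, the set boundaries in the log above are consistent with cutting the 23 pages into fixed chunks of 2 pages. The sketch below only reproduces that arithmetic. The function name `page_sets` is hypothetical and this is not the client's actual splitting code. It may help when matching the page numbers in the ERROR lines (1, 3, ..., 21) to the first page of sets #1 to #11.

```python
def page_sets(total_pages: int, split_size: int) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (first, last) page ranges of at most split_size pages."""
    return [
        (start, min(start + split_size - 1, total_pages))
        for start in range(1, total_pages + 1, split_size)
    ]


# Values taken from the log: 23 pages, split size of 2 pages.
sets = page_sets(23, 2)
for number, (first, last) in enumerate(sets, start=1):
    print(f"set #{number} (pages {first}-{last})")
```

With these inputs the sketch yields 12 sets, the last of which covers only page 23, which is the one set the log reports as successfully partitioned.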

sam-ayo commented Oct 23, 2024

I get the same error.
