Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redshift Data Paginator misleading documentation for NextToken when setting MaxItems #4125

Open
samuelwithey opened this issue May 13, 2024 · 2 comments
Labels
bug This issue is a confirmed bug. p2 This is a standard priority issue pagination service-api This issue is caused by the service API, not the SDK implementation.

Comments

@samuelwithey
Copy link

Describe the bug

Identified an issue with AWS Paginators, specifically using the GetStatementResult paginator from the AWS SDK for Python (Boto3) RedshiftDataAPIService. The primary concern is related to the expected behaviour of pagination when setting the MaxItems parameter in the PaginationConfig dictionary. Contrary to the AWS documentation expectations, providing a value for MaxItems in PaginationConfig does not result in multiple items representing pages in the PageIterator object when the total number of items is less than the available items and a NextToken is not returned to resume pagination.

Related issue:

Expected Behavior

In the GetStatementResult documentation, it states a PaginationConfig dict can be provide to control pagination, where MaxItems limits the total number of items to return, and if the total number of items available is more than the value specified in max-items then a NextToken will be provided in the output that you can use to resume pagination.

Current Behavior

The description of MaxItems and StartingToken does not match the observed behaviour. When setting the MaxItems to 1000, the expected behaviour is for the returned number of records to be the same as the set MaxItems, and if the returned number of items is less than the total number of items available, a NextToken will be provided in the output. From the code snippet below, a NextToken is not returned and is not present in the response syntax in RedshiftDataAPIService.Paginator.GetStatementResult. We can verify this by comparing TotalNumRows to the length of result['Records'] and viewing the response dictionary keys.

In the linked issue, it specifies using PageSize in the pagination config and using a position token, however PageSize is not a valid parameter and position or marker are not returned.

Error: Error during pagination: PageSize parameter is not supported for the pagination interface for this operation.

Reproduction Steps

result: Dict[str, Any] = {"Records": []}
paginator = client.get_paginator("get_statement_result")
pagination_config: Dict[str, Any] = {'MaxItems': 1000}
print(f"PaginationConfig: {pagination_config}")
page_iterator = paginator.paginate(
    Id=statement_id,
    PaginationConfig=pagination_config,
)
for page in page_iterator:
    print(f"Page Keys: {page.keys()}")
    if "ColumnMetadata" not in result:
        result["ColumnMetadata"] = page["ColumnMetadata"]
    result["Records"].extend(page["Records"])
    if "NextToken" in page:
        result["NextToken"] = page["NextToken"]
    print(f"Page TotalNumRows: {page['TotalNumRows']}")
print(f"Response Record Length: {len(result['Records'])}")
return result
PaginationConfig: {'MaxItems': 1000}
Page Keys: dict_keys(['ColumnMetadata', 'Records', 'TotalNumRows', 'ResponseMetadata'])
Page TotalNumRows: 104897
Response Record Length: 1000

Possible Solution

When applying the dir() function to the iterator or looking at the code for PageIterator, property resume_token is present. Using the resume_token for StartingToken in PaginationConfig for PageIterator will resume pagination. The new PageIterator will include another value for the resume_token property if there are more items available.

def get_statement_result(statement_id: str, next_token: str) -> Dict[str, Any]:
    result: Dict[str, Any] = {"Records": []}
    while True:
        paginator = client.get_paginator("get_statement_result")
        pagination_config: Dict[str, Any] = {"MaxItems": 1000}
        if next_token:
            pagination_config["StartingToken"] = next_token
        paginator = paginator.paginate(Id=statement_id, PaginationConfig=pagination_config)
        page_iter = iter(paginator)
        page = next(page_iter)
        if "ColumnMetadata" not in result:
            result["ColumnMetadata"] = page["ColumnMetadata"]
        result["Records"].extend(page["Records"])
        previous_token = next_token
        next_token = paginator.resume_token
        if not next_token or previous_token == next_token:
            break
    return result

Additional Information/Context

No response

SDK version used

Python 3.10, boto3 1.28.78

Environment details (OS name and version, etc.)

MacOS Sonoma 14.4.1

@samuelwithey samuelwithey added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels May 13, 2024
@tim-finnigan tim-finnigan self-assigned this May 20, 2024
@tim-finnigan
Copy link
Contributor

Thanks for reporting this issue. Here is where the paginator is modeled in Botocore (which Boto3 uses).

Note that it is missing this line: "limit_key": "MaxResults", which other paginator models have. I believe this is causing the behavior to not be aligned with what is documented here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/paginators.html.

I created a new issue (aws/aws-sdk#751) to track this in our cross-SDK repository going forward, since paginators are used across SDKs. I also reached out to the Redshift Data team regarding this and will share any updates in the new issue I created. Thanks again for bringing this to our attention.

@tim-finnigan tim-finnigan removed the needs-triage This issue or PR still needs to be triaged. label May 20, 2024
Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@tim-finnigan tim-finnigan added service-api This issue is caused by the service API, not the SDK implementation. pagination p2 This is a standard priority issue labels Oct 30, 2024
@tim-finnigan tim-finnigan removed their assignment Oct 30, 2024
@tim-finnigan tim-finnigan reopened this Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a confirmed bug. p2 This is a standard priority issue pagination service-api This issue is caused by the service API, not the SDK implementation.
Projects
None yet
Development

No branches or pull requests

2 participants