View first page before entire document is loaded - support range header #419

joepio · 2019-06-26T12:39:34Z

Before you start - checklist

I have read documentation in README
I have checked sample and test suites to see real life basic implementation
I have checked if this question is not already asked

What are you trying to achieve? Please describe.

In our project (issue, demo), I'd like to load only the pages that I'm viewing, and render the first page before the entire document is loaded.

From my understanding, PDF.js supports Range headers and the react-pdf API describes that it's possible to include a PDFDataRangeTransport object in the file property. I fail to see what to do to actually send these Range headers, though!

Describe solutions you've tried

Check if the source PDF is optimized for the web
Check if the hosting service supports HTTP Range headers
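
For anyone repeating the second check, here is a minimal sketch of one way to probe a host for Range support from Python. The helper names `supports_byte_ranges` and `probe` are hypothetical, and the sketch assumes that a server honoring `Range: bytes=0-0` answers with status 206 and a `Content-Range` header.

```python
import urllib.request
from typing import Mapping


def supports_byte_ranges(status_code: int, headers: Mapping[str, str]) -> bool:
    """Interpret the response to a `Range: bytes=0-0` probe request."""
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in headers.items()}
    # A server that honored the range answers 206 with a Content-Range header;
    # a plain 200 means it ignored the Range header and sent the whole file.
    return status_code == 206 and lowered.get("content-range", "").startswith("bytes ")


def probe(url: str) -> bool:
    """Request only the first byte of `url` and report whether that was honored."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request) as response:
        return supports_byte_ranges(response.status, dict(response.headers.items()))
```

Calling `probe("https://example.com/some.pdf")` (a placeholder URL) from a script only tests the server. A browser additionally needs the relevant CORS headers exposed for cross-origin files.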

Environment

Chrome 75
MacOS 10.14.5
React-PDF 4.0.5
React-scripts 3.0.1
React 16.8.6


wojtekmaj · 2019-07-04T02:29:09Z

Hi,
Yeah, PDFDataRangeTransport should be supported, since React-PDF just passes it to pdf.js and does not do much else with it. I found this topic on creating PDFDataRangeTransport objects.

It seems like the easiest way to get the behavior you want is to simply pass a URL as the file prop. This should work just fine: https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#range

joepio · 2019-07-04T07:51:31Z

Thanks for the reply (and the awesome library, for that matter) @wojtekmaj!

Unfortunately, I do pass the URL as a file prop (source, demo), but it only renders after the entire document has been fetched.

Also, the request for the PDF file does not appear to have any Range headers.

Perhaps this conditional is never actually true if I pass a string?

    // File is PDFDataRangeTransport
    if (file instanceof PDFDataRangeTransport) {
      return { range: file };
    }

joepio · 2019-08-05T11:36:20Z

According to PDF.js developers, PDF.js does not support gzip encoding of range responses, so it needs to be set explicitly. According to the PDF.js docs, you can set custom headers. Since Document passes the options object to PDFjs.getDocument, this should work:

<Document
  options={{
    httpHeaders: {
      'Accept-Encoding': 'Identity',
    }
  }}
  file={"https://example.com/some.pdf"}
/>

However, it does not, so I'm still investigating what is going on. It seems likely that it's a pdf.js issue.
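
A small sketch of why compressed responses and byte ranges mix badly, offered as an illustration rather than a description of pdf.js internals: the viewer asks for byte offsets into the PDF itself, but once a body is gzip-encoded, the same offsets into the compressed stream yield unrelated bytes. The sample data and the `slice_inclusive` helper are made up for this example.

```python
import gzip


def slice_inclusive(data: bytes, start: int, end: int) -> bytes:
    """Return bytes start..end inclusive, mirroring HTTP byte-range semantics."""
    return data[start : end + 1]


# Stand-in for a PDF file: 16 KiB of deterministic bytes.
original = bytes(range(256)) * 64
compressed = gzip.compress(original)

# What a viewer wants when it asks for "bytes=100-199" of the document...
wanted = slice_inclusive(original, 100, 199)
# ...versus what the same offsets select inside the gzip-encoded body.
from_compressed = slice_inclusive(compressed, 100, 199)

print(wanted == from_compressed)
```

The comparison is expected to be false, which is why serving the file with `Content-Encoding: identity` keeps the offsets meaningful.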

angel-langdon · 2022-01-28T10:15:18Z

@joepio Did you manage to solve this issue?

I am using FastAPI (Python) as the backend and I have not managed to solve it.

I tried passing this sample URL (a 2 GB PDF) to <Document/>: https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf
and it loads the first page immediately.

joepio · 2022-01-28T10:31:07Z

@angel-langdon never managed to get it working, unfortunately...

angel-langdon · 2022-01-30T13:56:16Z

@joepio Well, I finally managed to do it. It was failing because our backend implementation was not compatible with pdf.js.

Frontend component

interface MemoizedDocumentProps {
  url: string;
  children: JSX.Element | null;
}

const MemoizedDocument = memo((props: MemoizedDocumentProps) => {
  const file = useMemo(
    () => ({ url: props.url }),
    [props.url]
  );
  return (
    <Document
      file={file}
    >
      {props.children}
    </Document>
  );
});

Backend implementation (in Python)

import os
from typing import BinaryIO

from fastapi import FastAPI, HTTPException, Request, status
from fastapi.responses import StreamingResponse


def send_bytes_range_requests(
    file_obj: BinaryIO, start: int, end: int, chunk_size: int = 10_000
):
    """Send a file in chunks using Range Requests specification RFC7233

    `start` and `end` parameters are inclusive due to specification
    """
    with file_obj as f:
        f.seek(start)
        while (pos := f.tell()) <= end:
            read_size = min(chunk_size, end + 1 - pos)
            yield f.read(read_size)


def _get_range_header(range_header: str, file_size: int) -> tuple[int, int]:
    def _invalid_range():
        return HTTPException(
            status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE,
            detail=f"Invalid request range (Range:{range_header!r})",
        )

    try:
        h = range_header.replace("bytes=", "").split("-")
        start = int(h[0]) if h[0] != "" else 0
        end = int(h[1]) if h[1] != "" else file_size - 1
    except ValueError:
        raise _invalid_range()

    if start > end or start < 0 or end > file_size - 1:
        raise _invalid_range()
    return start, end


def range_requests_response(
    request: Request, file_path: str, content_type: str
):
    """Returns StreamingResponse using Range Requests of a given file"""

    file_size = os.stat(file_path).st_size
    range_header = request.headers.get("range")

    headers = {
        "content-type": content_type,
        "accept-ranges": "bytes",
        "content-encoding": "identity",
        "content-length": str(file_size),
        "access-control-expose-headers": (
            "content-type, accept-ranges, content-length, "
            "content-range, content-encoding"
        ),
    }
    start = 0
    end = file_size - 1
    status_code = status.HTTP_200_OK

    if range_header is not None:
        start, end = _get_range_header(range_header, file_size)
        size = end - start + 1
        headers["content-length"] = str(size)
        headers["content-range"] = f"bytes {start}-{end}/{file_size}"
        status_code = status.HTTP_206_PARTIAL_CONTENT

    return StreamingResponse(
        send_bytes_range_requests(open(file_path, mode="rb"), start, end),
        headers=headers,
        status_code=status_code,
    )


app = FastAPI()


@app.get("/video")
def get_video(request: Request):
    return range_requests_response(
        request, file_path="path_to_my_video.mp4", content_type="video/mp4"
    )

I would strongly recommend reading the Range Requests RFC (https://datatracker.ietf.org/doc/html/rfc7233) to understand everything; there are a few gotchas.
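
Two of those gotchas can be sketched in code. In RFC 7233, `bytes=-500` is a suffix range meaning the last 500 bytes (not bytes 0 to 500), and an end position past the end of the file is clamped to the last byte rather than rejected. The function below is a hypothetical helper, not part of FastAPI or pdf.js, and it only handles a single range. It returns `None` for anything it cannot satisfy and leaves the choice of error response to the caller.

```python
import re
from typing import Optional, Tuple

# Matches "bytes=<first>-<last>" where either number may be omitted.
_SINGLE_RANGE = re.compile(r"bytes=([0-9]*)-([0-9]*)")


def parse_single_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Return an inclusive (start, end) pair, or None if the range is unusable."""
    match = _SINGLE_RANGE.fullmatch(header.strip())
    if match is None:
        return None  # malformed, or a multi-range request (out of scope here)
    first, last = match.groups()
    if first == "" and last == "":
        return None  # "bytes=-" carries no positions at all
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        suffix_length = int(last)
        if suffix_length == 0 or file_size == 0:
            return None
        return max(file_size - suffix_length, 0), file_size - 1
    start = int(first)
    # A missing or oversized end position means "through the last byte".
    end = min(int(last), file_size - 1) if last else file_size - 1
    if start >= file_size or start > end:
        return None
    return start, end
```

For a 1000-byte file, `bytes=-200` maps to `(800, 999)` and `bytes=900-2000` maps to `(900, 999)`.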

duriann · 2022-03-01T10:39:56Z

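@joepio Well, I finally managed to do it; it was failing because of case-sensitive headers. Also, you need to specify all of these headers for it to work:

    headers = {
        "Content-Type": "application/pdf",
        "Accept-Ranges": "bytes",
        "Content-Encoding": "identity",
        "Access-Control-Expose-Headers": (
            "Accept-Ranges, Content-Length, Content-Range"
        ),
        "Content-Length": str(end - start + 1),
        "Content-Range": f"bytes {start}-{end}/{file_size}",
    }

I would strongly recommend reading the Range Requests RFC https://datatracker.ietf.org/doc/html/rfc7233 to understand everything; there are a few gotchas.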

Hello, did you write it like this?

<Document
  options={{
    httpHeaders: {
      'Content-Type': 'application/pdf',
      'Accept-Ranges': 'bytes',
      'Content-Encoding': 'identity',
      'Access-Control-Expose-Headers': 'Accept-Ranges , Content-Length, Content-Range',
      'Content-Length': '1000000',
      'Content-Range': 'bytes 0 - 999999 / 1000000',
    },
  }}
  file={'https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf'}
/>

I found that it doesn't work for me.
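
A note on where these headers live, with a hedged sketch: `Content-Range`, `Content-Length` and `Accept-Ranges` are response headers that the server sends back, whereas `httpHeaders` in the pdf.js options sets headers on the outgoing request, so supplying them from the client would not be expected to change what the server returns. The hypothetical helper below shows how a server would compute the values for one inclusive byte range.

```python
from typing import Dict


def partial_content_headers(start: int, end: int, file_size: int) -> Dict[str, str]:
    """Build the range-related response headers for a 206 Partial Content reply."""
    if not 0 <= start <= end < file_size:
        raise ValueError("range must satisfy 0 <= start <= end < file_size")
    return {
        "Accept-Ranges": "bytes",
        # Both ends are inclusive, hence the + 1.
        "Content-Length": str(end - start + 1),
        # Format: "bytes <start>-<end>/<total size>".
        "Content-Range": f"bytes {start}-{end}/{file_size}",
    }


headers = partial_content_headers(0, 999_999, 1_000_000)
print(headers["Content-Range"])
```

For the first million bytes of a million-byte file this gives a `Content-Range` of `bytes 0-999999/1000000` and a `Content-Length` of `1000000`.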

angel-langdon · 2022-03-01T12:35:08Z

@bolosea I don't know whether non-English characters are valid; see my updated answer for full details.

duriann · 2022-03-03T08:06:29Z

@bolosea I don't know whether non-English characters are valid; see my updated answer for full details.

Thanks for your reply, but I don't have a backend. I found that the URL https://s3.amazonaws.com/pdftron/downloads/pl/2gb-sample-file.pdf works as expected in the pdf.js example project.
What's happening?
The weird thing is that it doesn't work when I download the pdf.js source code and run it with pdf.getDocument. I want to cry, QAQ

github-actions · 2022-06-06T00:01:17Z

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this issue will be closed in 14 days.

github-actions · 2022-06-20T00:01:29Z

This issue was closed because it has been stalled for 14 days with no activity.

wojtekmaj added the question Further information is requested label Jul 4, 2019

wojtekmaj self-assigned this Jul 4, 2019

joepio mentioned this issue Jul 31, 2019

Support gzip in range request / Explicitly set accept-encoding: identity mozilla/pdf.js#11027

Closed

github-actions bot added the stale label Jun 6, 2022

github-actions bot closed this as completed Jun 20, 2022

angel-langdon mentioned this issue Jun 30, 2022

Progressive rendering not working mozilla/pdf.js#9609

Closed
