Streaming uncompressed data (with direct access) #773

Closed
rgaudin opened this issue May 23, 2024 · 7 comments · Fixed by #778

@rgaudin
Member

rgaudin commented May 23, 2024

What we do on Android for videos doesn't seem possible directly. It's a combination of tricks that are missing on iOS (at least I did not find them easily).
First, AssetFileDescriptor lets us mimic an asset (a clearly bounded piece of data) from a file path, an offset and a size.
Then, there is the integration between Android MediaPlayer and the WebChromeClient renderer (with some hacks to transition smoothly).

I found some other people asking about AVPlayer and WKWebView integration, but without answers. Maybe the wording or the keywords were wrong…

Despite this, I had some success with an easy tweak to the code: streaming the data to the client. Currently, we fetch the requested content from libzim, store it in a variable and put it on the response. On the renderer side, this content is read (sometimes not completely) and used.

This is OK for relatively small content, or maybe even for large content that the client consumes entirely (RAM will be required), but in the video use case we know that the client won't even try to read the whole thing and will display it piece by piece.

Doing so is as easy as repeatedly calling urlSchemeTask.didReceive(additionalData)

let response = HTTPURLResponse(
    url: url,
    statusCode: statusCode,
    httpVersion: "HTTP/1.1",
    headerFields: headers)
urlSchemeTask.didReceive(response!)

...

// 0..<nbStreams also works when nbStreams is 0, where 1...nbStreams would trap at runtime
for _ in 0..<nbStreams {
    partEnd = partStart + streamThreshold
    content = ZimFileService.shared.getURLContent(url: url, start: partStart, end: partEnd)
    urlSchemeTask.didReceive(content!.data)
    partStart = partEnd
}
// send the remainder that does not fill a whole chunk
if finalBytes > 0 {
    content = ZimFileService.shared.getURLContent(url: url, start: partStart, end: partStart + finalBytes)
    urlSchemeTask.didReceive(content!.data)
}
urlSchemeTask.didFinish()

This is very efficient in keeping RAM usage under control on very large videos.

I don't know exactly how it works internally but simply looping on writing 2MB chunks does the trick so I suppose renderer-reading is synced somehow.
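The snippet above elides how nbStreams and finalBytes are derived. A minimal sketch of that arithmetic, assuming the total content size is known up front and that ranges are half-open (start included, end excluded); chunkRanges is a hypothetical helper, not part of the app or of libzim:

```swift
import Foundation

/// Splits `totalSize` bytes into consecutive half-open ranges of at most
/// `chunkSize` bytes. The last range holds the remainder, if any.
func chunkRanges(totalSize: Int, chunkSize: Int) -> [Range<Int>] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    guard totalSize > 0 else { return [] }
    // stride stops before totalSize, so no empty trailing range is produced.
    return stride(from: 0, to: totalSize, by: chunkSize).map { start in
        start..<min(start + chunkSize, totalSize)
    }
}
```

With a 2 MB threshold, a 5 MB entry yields two full ranges plus a 1 MB tail, which corresponds to nbStreams = 2 and finalBytes = 1 MB in the loop above.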


Another improvement, independent from this one, is reading video files directly from the filesystem. Leveraging item.getDirectAccessInformation(), which returns the ZIM path on the filesystem and the offset at which the content starts, we can easily read the video data from it (we already know its size).

WARN ⚠️: We can't pass the file handle directly to the web view because FileHandle has no size parameter, so it would not stop reading at the end of the content. The streaming experiment above shows we might not need this, but we could still reimplement a FileHandle that stops after a defined size.
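One way to picture a reader that "stops after a defined size" is a thin wrapper around FileHandle that tracks how many bytes are left. This is only a sketch under assumptions: BoundedFileReader is a hypothetical type, and the path, offset and size would come from the direct access information and the entry size.

```swift
import Foundation

/// Reads the window [offset, offset + size) of a file in chunks.
/// Hypothetical helper, not part of Foundation or libzim.
struct BoundedFileReader {
    let handle: FileHandle
    private(set) var remaining: Int

    init(url: URL, offset: UInt64, size: Int) throws {
        let handle = try FileHandle(forReadingFrom: url)
        try handle.seek(toOffset: offset)
        self.handle = handle
        self.remaining = size
    }

    /// Returns the next chunk, or nil once `size` bytes have been delivered
    /// (or the file ended early).
    mutating func nextChunk(maxLength: Int) throws -> Data? {
        guard remaining > 0 else { return nil }
        let wanted = min(maxLength, remaining)
        guard let data = try handle.read(upToCount: wanted), !data.isEmpty else {
            return nil
        }
        remaining -= data.count
        return data
    }

    func close() throws { try handle.close() }
}
```

Each chunk returned by nextChunk could then be handed to urlSchemeTask.didReceive, so the bound is enforced by the reader rather than by the web view.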

WARN ⚠️: getDirectAccessInformation only works on raw (uncompressed) entries which is something that's decided at ZIM-write time.

In my experiment, I used it on non-text/ MIME types because I know that libzim currently only compresses text/ types. Downloads (unhandled formats, as you call them) would similarly benefit from it, I suppose.

In a real implementation, we might check whether the entry is compressed (does libzim tell us this?) or use a fallback in case the function returns empty data (it doesn't fail…).
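The fallback idea could look roughly like the following. Everything here is an assumption for illustration: DirectAccessInfo and ContentSource are hypothetical types, and treating an empty path as "no direct access" is a guess based on the behaviour described above, not a documented libzim contract.

```swift
import Foundation

/// Hypothetical stand-in for what a direct-access lookup might return.
struct DirectAccessInfo {
    let path: String
    let offset: UInt64
}

/// Where the entry's bytes should be read from.
enum ContentSource: Equatable {
    case directFile(path: String, offset: UInt64)
    case libzim
}

/// Picks direct file reading when usable information is available,
/// and falls back to reading through libzim otherwise.
func chooseSource(directAccess: DirectAccessInfo?) -> ContentSource {
    guard let info = directAccess, !info.path.isEmpty else {
        return .libzim
    }
    return .directFile(path: info.path, offset: info.offset)
}
```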

On whether we should use it or not, I don't know.

@mgautierfr, what do you think of using getDirectAccessInformation() and reading from filesystem instead of reading from the libzim? Is it worth the separate implementation code? What about other non-compressed content like PDF?

@kelson42 kelson42 added this to the 3.4.0 milestone May 23, 2024
@kelson42
Contributor

kelson42 commented May 23, 2024

@BPerlakiH Any chance you can implement this for video files and get a chance to fix #744?

@BPerlakiH
Collaborator

@kelson42 @rgaudin I think this is a very good direction, I also had a look at the:

urlSchemeTask.didReceive(data)

to be used on partial chunks, I just need to wrap that into some nicer error handling (as theoretically reading any given chunk can fail).
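A possible shape for that error handling, sketched without WebKit so it can run anywhere: DataSink is a hypothetical protocol that mirrors the didReceive(_:), didFinish() and didFailWithError(_:) calls of WKURLSchemeTask, and readChunk is an assumed closure that returns nil at the end of the content and throws when a read fails.

```swift
import Foundation

/// Mirrors the subset of WKURLSchemeTask used for streaming data.
/// Hypothetical protocol, introduced so the loop can be tested in isolation.
protocol DataSink {
    func didReceive(_ data: Data)
    func didFinish()
    func didFailWithError(_ error: Error)
}

/// Streams chunks produced by `readChunk` into `sink`.
/// Returns true if the stream finished normally, false if a read failed.
@discardableResult
func stream(into sink: DataSink, readChunk: () throws -> Data?) -> Bool {
    do {
        while let chunk = try readChunk() {
            sink.didReceive(chunk)
        }
        sink.didFinish()
        return true
    } catch {
        // Report the failure once and stop; didFinish is not called.
        sink.didFailWithError(error)
        return false
    }
}
```

Keeping the loop behind a small protocol is a design choice for testability; in the app, the real task object would be passed in directly or through a thin adapter.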

I also had a look at AVPlayer earlier, which can be started with AVAsset/AVPlayerItem. Unfortunately it does not support webm directly at this stage.
It's also possible to have our own AVAssetReader, but it doesn't get close enough to file reading, so I couldn't find a way to inject our ZIM file reading mechanism somewhere "in between".

I am setting up a PR for this reading optimisation as a standalone improvement for video files (without the HTTP range requests).

@mgautierfr
Member

@mgautierfr, what do you think of using getDirectAccessInformation() and reading from filesystem instead of reading from the libzim? Is it worth the separate implementation code?

I can't really speak to the technical difficulties of implementing that with "Apple technologies". But getDirectAccessInformation is there to allow user code to bypass libzim and read the content directly by reopening the file, seeking and reading (mmap is also a solution).
So I would say yes.
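For the mmap variant mentioned above, Foundation can request a memory-mapped read through Data's reading options. A minimal sketch, assuming the offset and size are already known; mappedSlice is a hypothetical helper, and whether mapping a multi-gigabyte ZIM file is practical on iOS is not evaluated here.

```swift
import Foundation

/// Maps the file at `url` and returns the window [offset, offset + size),
/// clamped to the end of the file.
func mappedSlice(of url: URL, offset: Int, size: Int) throws -> Data {
    let whole = try Data(contentsOf: url, options: .alwaysMapped)
    let end = min(offset + size, whole.count)
    guard offset >= 0, offset < end else { return Data() }
    // Note: the returned slice keeps the original indices, so its
    // startIndex is `offset`, not 0.
    return whole[offset..<end]
}
```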

What about other non-compressed content like PDF?

getDirectAccessInformation works equally for any non-compressed content (if the content is not split between two file parts). I'm not sure it's worth it, as PDF content is pretty small compared to video, but it would work too.

@kelson42 kelson42 removed their assignment May 27, 2024
@BPerlakiH
Collaborator

@mgautierfr I have found an issue related to this in libzim 9.2.0, please check whether you can reproduce it:
openzim/libzim#886

@kelson42
Contributor

@BPerlakiH I thought we decided to do the read operation directly, without using libzim?!

@BPerlakiH BPerlakiH linked a pull request May 29, 2024 that will close this issue
@BPerlakiH
Collaborator

BPerlakiH commented May 29, 2024

I've created a PR for this, currently it is in draft but can be tested, and reviewed, to see if it makes sense:
#778

@BPerlakiH BPerlakiH changed the title from "Streaming data" to "Streaming uncompressed data (with direct access)" Jun 1, 2024
@BPerlakiH
Collaborator

As discussed, I am narrowing this issue down to uncompressed data (with direct access); the follow-up ticket for compressed data is here:
#784
