Should the PWP document refer to the full HTTP Payload? #14

iherman · 2017-12-04T17:53:43Z

No description provided.

iherman · 2017-12-04T17:55:26Z

It has been resolved (closing issue #6) that the document would include:

> MAY contain the request & response HTTP payloads for each resource

Question is whether the full HTTP Payload should be referred to, or whether we should specify the entries of interest based on the HTTP standard.

More about the discussion in a separate comment (#6 (comment)) containing an IRC dump.

iherman · 2017-12-04T17:56:49Z

For reference: a collection of the HTTP Header fields that have been defined over the years:

https://github.com/dret/webconcepts/blob/master/MD/headers.md

There are 170 of them...

baldurbjarnason · 2017-12-04T19:21:08Z

It would be helpful if you outlined how subsetting HTTP would benefit the specification process, UAs, and tool makers. And it's not clear what you mean by specifying the entries of interest. Does it mean subsetting the HTTP you are allowed to store in a PWP? Or, is it a UA requirement ('you must support these when listed in a PWP')?

Without more details I find it difficult to see why this could be a good idea. As is, without a clearly stated benefit, I have a lot of questions and concerns about what this could mean (listed below, some of which might not be relevant if I've misunderstood what you're proposing).

- By the same reasoning, the PWP specification should also subset CSS, HTML, and JS, picking the selectors, properties, elements, and APIs that are of interest to publications. If we accept the argument that a standard vital for interoperability with the regular web (the foundation of the web stack) is too complex for PWPs and needs to be subsetted, then we need to explain why that exact same argument shouldn't be applied to the rest of the stack.
- The task of going through all of those HTTP standards and deciding which of them are important to PWPs and which are not is sizeable and error-prone. We could miss one that turns out to be an important edge case.
- How would this deal with the fact that new HTTP headers are standardised regularly? Doesn't this lock us into an intensive revision process where we have to regularly go back to the standard and update it to reflect innovations in HTTP?
- I'm not convinced this would simplify the archive format, as the complexity of the format depends on the semantics and structure of an HTTP header and not on the number of entries stored.
- I'm doubtful it would save much on file size.
- It might make creating PWPs harder, as this would require more processing and you couldn't just store the request and response headers as is.
- User agents could always ignore HTTP headers they don't support, much like they do already.
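
To illustrate the last point above, here is a minimal Python sketch. It assumes a hypothetical archive entry that keeps the response headers exactly as captured, and a hypothetical per-UA allow-list called `SUPPORTED`; neither name comes from any PWP draft.

```python
# Hypothetical sketch: the archive keeps every header as captured,
# and each UA decides on its own which ones it acts on.

# Assumed per-UA allow-list, lowercased because HTTP field names are case-insensitive.
SUPPORTED = {"content-type", "cache-control", "etag"}


def usable_headers(stored: dict[str, str], supported: set[str] = SUPPORTED) -> dict[str, str]:
    """Return only the stored headers this UA understands; the rest are ignored."""
    return {name: value for name, value in stored.items() if name.lower() in supported}


# Example archive entry (made-up values, including a header the UA does not know).
stored = {
    "Content-Type": "text/html; charset=utf-8",
    "ETag": '"abc123"',
    "X-Future-Header": "something standardised later",
}
print(usable_headers(stored))
```

The archive itself is untouched by the filtering, so a different UA with a longer allow-list could still use the headers this one skipped.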

I see why you would want a format that let you not store HTTP requests and responses at all. That would limit compatibility with the regular web but could simplify creation and workflow substantially. The EPUB ecosystem, for example, would probably find that to be an acceptable trade-off.

Overall, I don't see the benefit of subsetting HTTP. It could end up being a lot of additional work, both for the specification process and for implementors of UAs and tooling.

dauwhe · 2017-12-04T19:51:37Z

Could someone describe how a PWP might store and use such headers? Are we creating HTTP servers inside the package?

baldurbjarnason · 2017-12-04T21:00:46Z

@dauwhe Well, for starters, we do know that all UAs are going to be HTTP clients or be implemented using an HTTP client (like a WebView). So it's a little bit clearer how the UA side might use these headers when they are available. As far as the resources in the PWP are concerned, the point is that they shouldn't have to be aware of their PWPness, otherwise we risk losing interoperability with regular WPs.

The likely scenarios as far as I understand it (which may well be wrong):

The PWP is created from a WP that is already hosted on a web server by storing the HTTP request and response headers used to access the WP in the first place. In many cases I'd imagine that these archives are created on the fly from a regular WP during the acquisition process so they can store things like authentication tokens and locale.
Then it's up to the UA implementor to decide how they want to use this information:
- Some of them might be using a browser engine that supports the format natively so the response headers automatically get processed into the browser cache as if it were an HTTP request. This could be the case if we use the web packaging spec and it gets implemented by browsers. I know the AMP team is really keen on web packaging with signing as a solution to some of the problems they are facing with the AMP CDN. And the people behind both Electron and Node are considering it for app and library distribution.
- Some of them might present the resources via a server that's internal to the UA, again basically replaying the relevant headers for each response to the WebView.
- Some of them might implement HTTP request/response semantics as API calls. This would help there.
- Some of them might use an internal service worker and store the PWP's resources with response headers using the service worker cache API, which supports storing complete responses.

And for many of these situations you need the request headers to be able to decide if the response headers are relevant (auth tokens being a classic example). In general, deciding on which HTTP headers to support and which to ignore is likely to be very implementation-specific. I don't see how we can decide that for them in advance.
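
The role of the request headers can be sketched as follows. This is only an illustration under stated assumptions: `StoredExchange` and `replay` are hypothetical names, and matching on `Authorization` alone is a simplification of whatever rule a real UA would apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredExchange:
    """One archived request/response pair (hypothetical structure)."""
    url: str
    request_headers: dict[str, str]
    response_headers: dict[str, str]
    body: bytes


def replay(exchange: StoredExchange, url: str,
           request_headers: dict[str, str]) -> Optional[StoredExchange]:
    """Return the stored exchange only if the new request carries the same
    credentials as the request that produced it; otherwise return None so
    the caller knows it has to fetch the resource again."""
    if url != exchange.url:
        return None
    # Simplification: header names are compared exactly; a real client
    # would normalise case before looking them up.
    stored_auth = exchange.request_headers.get("Authorization")
    if stored_auth is not None and request_headers.get("Authorization") != stored_auth:
        return None
    return exchange


# Made-up example values.
entry = StoredExchange(
    url="https://example.org/book/chapter1.html",
    request_headers={"Authorization": "Bearer token-a"},
    response_headers={"Content-Type": "text/html"},
    body=b"<p>Chapter 1</p>",
)
```

Without the stored request headers there would be nothing to compare against, so the UA could not tell whether the archived response was meant for the current user.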

It would also make unpacking the PWP at another web location more reliable as you'd be able to preserve things like security headers, encoding, etc at the new location (which is something that services like AWS S3 support, for example).

Another reason to support storing HTTP headers would be to store credentials, e.g. storing a JWT token or signed cookie that lets the UA update PWP resources with their latest versions, even if they are behind a paywall.

None of this rules out the possibility of creating PWPs that don't store any HTTP headers at all, although in that case we risk being back in the EPUB situation of limited interoperability with the rest of the web stack.

I think there's a valid argument to be made for a PWP format that does not store HTTP headers in any way to get rid of the requirement of having to use a web server to create the archive. That would simplify authorship and some aspects of distribution and be more compatible with the pre-existing EPUB ecosystem. And, as outlined above, I also think there's a valid argument for allowing the storage of HTTP headers to maximise interoperability with regular WP and the web stack in general.

I'm a bit sceptical that we could accommodate both in a single archive format, but that's a separate issue.

What I'm not convinced of is the need for the specification to define a subset of HTTP headers that can be stored. Unless there's a compelling reason to do so, that feels like unnecessary additional complexity both for the specification process and for UA/tooling implementors.

(Anyway, I apologise for dominating this thread. I'll back off now. Plenty of other work waiting for me 😊)

dauwhe · 2017-12-04T21:07:43Z

> I think there's a valid argument to be made for a PWP format that does not store HTTP headers in any way to get rid of the requirement of having to use a web server to create the archive. That would simplify authorship and some aspects of distribution and be more compatible with the pre-existing EPUB ecosystem. And, as outlined above, I also think there's a valid argument for allowing the storage of HTTP headers to maximise interoperability with regular WP and the web stack in general.
>
> I'm a bit sceptical that we could accommodate both in a single archive format, but that's a separate issue.

You might have just defined the difference between PWP and EPUB4 for us. Baldur saves Christmas!

HadrienGardeur · 2017-12-05T14:06:57Z

First of all, I'd like to add another big +1 for all the things that @baldurbjarnason just said.

Based on some of the discussions that we've had on other issues as well (with @iherman and @lrosenthol), I believe that the following scenario is the most likely to happen:

- PWP will use Web Packaging (or something similar to it) as its default packaging format, which will cover our requirement for HTTP payloads (and work much more nicely with the Web than EPUB)
- EPUB 4 will continue to be ZIP-based to remain compatible with the rest of the EPUB ecosystem

From a reading system developer's perspective, we're already using HTTP extensively behind the scenes for EPUB 2.x/3.0 support.

Both versions of Readium rely on an internal HTTP server to serve resources contained in a package, and while we barely customize the HTTP headers that we're using (mostly Cache-Control and ETag to improve performance through caching), it would be fairly easy to go beyond that and include headers specified in the package itself.
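
As a rough illustration of "going beyond that", the following Python sketch merges a reading system's own caching headers with headers stored in the package. It is not Readium's code: `DEFAULT_HEADERS`, `response_headers`, and the rule that packaged headers win over defaults are all assumptions made for the example.

```python
# Hypothetical sketch of how an internal server could combine its own
# caching headers with headers specified in the package itself.

# Assumed default set by the reading system (not taken from Readium).
DEFAULT_HEADERS = {"cache-control": "no-cache"}


def response_headers(packaged: dict[str, str], etag: str) -> dict[str, str]:
    """Build the headers for one resource: start from the reading system's
    defaults and its computed ETag, then let packaged headers override them."""
    headers = dict(DEFAULT_HEADERS)
    headers["etag"] = etag
    # Field names are lowercased so that "ETag" and "etag" collide as intended.
    headers.update({name.lower(): value for name, value in packaged.items()})
    return headers


# Made-up example: the package pins its own caching policy and a security header.
print(response_headers(
    {"Cache-Control": "max-age=3600", "Content-Security-Policy": "default-src 'self'"},
    etag='"v1"',
))
```

Letting the packaged value win is one possible policy; a reading system could equally decide that its own caching headers always take precedence.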

In order to support PWP and offline reading for a WP, we'll need to act as a proxy as well (unlike EPUB where we can roll out an internal server, we'll have to intercept network requests, a much more difficult task).
This will be quite challenging across all platforms, but a generic Service Worker + full HTTP payloads might be our best chance of implementing such features.

llemeurfr added the topic:google-web-packaging label (issues related to the Google Web Packaging format, https://github.com/WICG/webpackage) on Apr 12, 2019
