Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add HTTP spec #508

Merged
merged 38 commits into from
Jun 13, 2024
Merged
Changes from 3 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
bc1aa59
add HTTP spec
marten-seemann Jan 22, 2023
1f075f6
2nd attempt for server auth
marten-seemann Jan 29, 2023
12f86b8
require client to authenticate the server when doing client auth
marten-seemann Jan 29, 2023
146c09a
better motivation for libp2p+HTTP (#515)
marten-seemann Feb 14, 2023
5398f5d
fix a few typos
marten-seemann Feb 14, 2023
b6c1bc2
http: use .well-known/libp2p.json for configuration
marten-seemann Mar 2, 2023
8a57943
http: nest libp2p.json config to allow for future configuration
marten-seemann Mar 2, 2023
d506145
Merge pull request #529 from libp2p/http-well-known-configuration
MarcoPolo Jun 1, 2023
946f516
Reformat the spec from the Point of View of an implementer
MarcoPolo Jul 7, 2023
3681472
Add link
MarcoPolo Jul 7, 2023
dd5d07c
Merge comments
MarcoPolo Jul 10, 2023
46d1857
Merge pull request #556 from libp2p/marco/http-update
MarcoPolo Jul 10, 2023
ebe612c
Add note about how this is just one possible auth mechanism
MarcoPolo Jul 10, 2023
7e5a077
Add lidel to interest group
MarcoPolo Jul 14, 2023
db2b3b5
Update http/README.md
MarcoPolo Jul 17, 2023
6319458
Formatting
MarcoPolo Jul 17, 2023
c7c9c43
Add thomas
MarcoPolo Jul 17, 2023
454e25c
Use metadata map and call it protocols
MarcoPolo Jul 17, 2023
a25267b
Add mermaid diagrom for HTTP semantics vs transport
MarcoPolo Jul 17, 2023
3014b22
Grammar fixes
MarcoPolo Jul 17, 2023
f96359b
Lidel suggestions
MarcoPolo Jul 17, 2023
1e87960
Define where the libp2p-token will be
MarcoPolo Jul 17, 2023
d0f0d93
Grammar fix
MarcoPolo Jul 17, 2023
8fbd64a
Specify IX vs NX in auth scheme
MarcoPolo Jul 19, 2023
71415b0
Add SNI and HTTP_libp2p_token to Noise extensions
MarcoPolo Jul 19, 2023
4a03bb0
Reword Namespace section a bit
MarcoPolo Aug 2, 2023
877899d
Remove SNI and token from extensions
MarcoPolo Aug 2, 2023
dc71f2c
Define the multiaddr URI
MarcoPolo Aug 24, 2023
d8850aa
update protocol name for IPFS gateway
marten-seemann Oct 4, 2023
78e8ca1
Be clear about no pipelining
MarcoPolo Mar 14, 2024
d30efda
Use SHOULD instead of MUST
MarcoPolo Mar 18, 2024
8628b5a
Update RFC for connection: close
MarcoPolo Apr 3, 2024
3c0ac40
Rename well-known
MarcoPolo Apr 3, 2024
75bc635
Add sentence on why POST and other mappings
MarcoPolo Apr 3, 2024
f95e4db
Sukun's review comments
MarcoPolo Apr 15, 2024
e3eb9dc
Small typo fixes
MarcoPolo Apr 15, 2024
95ffe6d
Update to http-path
MarcoPolo Jun 3, 2024
8f44d00
Merge pull request #568 from libp2p/marco/multiaddr-scheme
MarcoPolo Jun 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
159 changes: 159 additions & 0 deletions http/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,159 @@
# libp2p + HTTP: the spec

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|--------------------------|--------|-----------------|
| 1A | Working Draft | Active | r0, 2023-01-23 |

Authors: [@marten-seemann]

Interest Group: [@MarcoPolo]

[@marten-seemann]: https://github.com/marten-seemann
[@MarcoPolo]: https://github.com/MarcoPolo

MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
## Introduction

This document defines how libp2p nodes can offer a HTTP endpoint next to (or instead of) their full libp2p node. Services can be offered both via traditional libp2p protocols and via HTTP, allowing a wide variety of nodes to access these services. Crucially, this for the first time, allows browsers to access libp2p services without spinning up a Web{Socket, Transport, RTC} connection first. It also allows interacting with libp2p services from environments where plain HTTP is the only option, e.g. curl from the command line, and certain cloud edge workers and lambdas.

At the same time, nodes that are already connected via a libp2p connection, will be able to (re)use this connection to issue the same kind of requests, without dialing a dedicated HTTP connection.

Any protocol that follows request-response semantics can easily be mapped onto HTTP (mapping protocols that don’t follow a request-response flow can be more challenging). Protocols are encouraged to follow best practices for building REST APIs. Once a mapping has been defined, a single implementation can be used to serve both traditional libp2p as well as libp2p-HTTP clients.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Addressing

Nodes may advertise HTTP multiaddresses to signal support for libp2p over HTTP. An address might look like this: `/ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/p2p/<peer id>` (for HTTP/1.1 and HTTP/2), or `/ip4/1.2.3.4/udp/443/quic/sni/example.com/http/p2p/<peer id>` (for HTTP/3).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Nodes MUST use HTTPS (i.e. they MUST NOT use unencrypted HTTP). It is RECOMMENDED to use HTTP/2 and HTTP/3, but the protocols also work over HTTP/1.1.

Note that the peer ID in this address is 1. optional and 2. advisory and not (necessarily) verified during the HTTP handshake (depending on the HTTP client). If and when desired, clients can cryptographically verify the peer ID once the HTTP connection has been established, see [Authentication] for details on peer authentication.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Nodes can also link to a specific resource directly, similar to how a URL includes a path. This will require us to resolve [https://github.com/multiformats/multiaddr/issues/63](https://github.com/multiformats/multiaddr/issues/63) first. For example, the URL of a specific CID might be: `/ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/<my data transfer protocol>/{/path/to/<cid>}`.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Namespace

libp2p does not squat the global namespace. By convention, all libp2p services are located at a well-known URL: `http://example.com/.well-known/libp2p/<service name>/<path (optional)>`.
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved

Putting the service name into the URL allows for future extensibility. It is easy to define new protocols, and the replace existing protocols by newer versions.

Applications MAY expose services under different URIs. For example, an application might decide to generate nicer-looking (and probably more SEO-friendly) URLs, and map paths under `[https://example.com/dht/](https://example.com/dht/)` to `https://example.com/.well-known/libp2p/kad-dht-v1/`.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved

### Service Names

Traditionally, libp2p protocols have used path-like protocol identifiers, e.g. `/libp2p/autonat/1.0.0`. Due to the use of `/`s, this doesn’t work well with the naming convention defined above.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

Protocols that wish to use the libp2p request-response mechanism MUST define a service name that is a valid URI component (according to RFC 8820).

In practice, this isn’t expected cause too much friction, since current libp2p protocols were not designed to use the request-reponse mechanism, and will need to make arrangements to support it anyway (e.g. define how requests and responses are serialized).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

### Privacy Properties

This leads to some very desirable properties:

1. It is possible to run libp2p alongside a normal HTTP web service, i.e. on the same domain and port, without having to worry about collisions.
1. As an on-path observer only sees SNI and ALPN, this effectively hides the fact that a client is establishing a connection in order to speak libp2p.
2. Since authentication is flexible (see below), this enables servers to
1. require authentication to (some) paths below `.well-known/libp2p`, and to enforce ACLs
2. stealth mode: return 404 for paths below `.well-known/libp2p`, *unless* the client has already authenticated itself, thereby hiding the fact that it runs a libp2p server, even if probed explicitly

## Certificates

libp2p doesn’t prescribe how nodes obtain the TLS certificate to secure the HTTPS connection. Since browsers are expected to connect to the node, the certificate’s trust chain must end in the browser’s trust store.

This is somewhat tricky in a p2p context, as nodes might not have a (sub)domain, which for many CAs is a requirement to obtain a certificate. Specifically, Let’s Encrypt doesn’t support IP certificates at the moment. ZeroSSL does, however, this requires setting up a (free) account.

To speed of server authentication, a node MAY include the libp2p TLS extension in its certificate. Note that this is currently not possible when using Let’s Encrypt, since the libp2p TLS extension is not whitelisted by LE. Not every HTTP client will have access to the TLS certificate (for example, browsers usually don’t expose an API for that), but if an HTTP client does, it SHOULD use that information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any CA that does allow this extension? (I don't think so)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there isn't any, as far as I'm aware. Once we have a strong use case and a spec, we could start reaching out to CAs to see if it's possible to be included in their whitelist.


## Authentication

Traditionally, libp2p was built on the assumption that both peers authenticate each other during the libp2p handshake. libp2p+HTTP acknowledges that this isn’t always possible, or even desirable, and that different use cases call for different authentication modes. For example, a server might offer a certain set of services to any client, like a HTTP webserver does.

### Server Authentication

Since HTTP requests are independent from each other (they are not bound to a single connection, and when using HTTP/1.1, will actually use different connections), the server needs to authenticate itself on every single request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add a section to allows for some signature caching? e.g.

Clients MAY reuse their libp2p-server-auth value for a short period of time but no more than 1 day. This allows the server to cache the signature and avoid resigning. Which may be helpful in certain high frequency protocols.

I'm not sure if this is actually an optimization (How slow is the memory access vs this signature?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe? On the other hand, if you're willing to do a TLS handshake for every HTTP/1.1 request, maybe you don't really care about a signature. Of course this will look different for a HTTP/2 or HTTP/3 connection...
When caching, you'd have to be very careful about HTTP redirects.

There's also the risk that this signature scheme negates the benefits of using a CDN: Unless the libp2p key is at the edge, every request has to go back to the origin to perform this signature.

This comment was marked as outdated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these things together mean the responses for the same thing sent to different users will have different header, and will not be reusable across users. Effectively making it impossible to benefit from HTTP caching across users.

Is that so? HTTP headers shouldn't have any effect on cacheability, unless you include them in the Vary header.

For example, DHT lookup for a peer records would hit the backend server every time, even tho you could safely cache results for a while at the edge CDN.

The first question I'd ask is if it makes sense to request authentication in that case. I imagine that almost all requests we handle won't need peer authentication, since we're dealing with either signed or content-addressed data anyway.


As browsers don’t expose an API to access details of the TLS certificate used, nor allow any access to the (an exporter to) the TLS master secret, server authentication is a bit more contrived than one might initially expect.

To request the server to authenticate, the client sets the `libp2p-server-auth` HTTP header to a randomly generated ASCII string of at least 10 (and a maximum of 100) characters. The server signs the following string using its host key:
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

```
"libp2p-server-auth:" || the value of the libp2p-server-auth header || "libp2p-server-domain:" || the domain (including subdomains)
```

It then sets the following two HTTP headers on the response:

1. `libp2p-server-pubkey`: its public key (from the libp2p key pair)
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
2. `libp2p-server-auth-signature`: the signature derived as described above

When requesting server authentication, the client MUST check that these two header fields are present, and MUST check the signature. It MUST NOT process the response if either one of these checks fails
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

### Client Authentication
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

When an unauthenticated client tries to access a resource that requires authentication, the server SHOULD use a [401 HTTP status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/401). The client MAY then authenticate itself using the protocol described below, and then retry the request.

Support for client authentication is an optional feature. It is expected that only a subsection of clients will implement it. For example, browser simply retrieving a few (elements of) web pages from IPFS probably won’t have any need to even generate a libp2p identity in the first place.

The protocol defined here takes 2 RTTs to authenticate the client. It is designed to be stateless on the server side. In the first round-trip, the client obtains a (pseudo-) random value from the server, which it then signs with its host key and sends back to the server, which then issues an authentication token (acting somewhat like a cookie) which can be included on future requests.

The service name is `client-auth`. For the first step, the client sends a GET request to this HTTP endpoint. As described in [server-authentication], the client MUST authenticate the server in this step. The server responds with at least 8 and up to 1024 bytes of pseudorandom data:

```json
{
"random": <multibase-encoded random bytes>,
"signature": <multibase-encoded signature>
}
```

In order to keep this exchange stateless, the server SHOULD 1. include the current timestamp or an expiry data and 2. a signature in that data. This allows it to check in step 2 that it actually generated that data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused. Are you saying that the server needs to include the expiry data and their signature as part of the random field? Are you suggesting they put the useful state in the field and encrypt it? And that's the random field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this basically offloads that state to the client, in a way that the client can't manipulate.

The client MUST check that the signature obtained in the JSON response is correct and was generated using the same key that the server used to authenticate itself.

The client signs the data received in step 1, and sends a POST request with the following JSON object to the server:

```json
{
"data": <the random bytes received from the server, multibase encoded>,
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved
"peer-id": <peer ID, in string reprensentation>,
marten-seemann marked this conversation as resolved.
Show resolved Hide resolved
"signature": <multibase-encoded signature>
}
```

The server verifies the signature and issues an authentication token. In order to allow stateless operation, at the very minimum, the authentication token SHOULD contain the peer ID. It SHOULD also contain an expiry date and it MAY be bound to the client’s IP address. The token is sent in the response body.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

The client uses the auth token on requests that require client authentication, by setting the `libp2p-auth-token` HTTP header.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Mapping to libp2p Streams

libp2p services whose service is specified as request-response protocols can use a single protocol implementation to make the service available over HTTP as well as on top of libp2p streams.

The libp2p protocol identifier is `/http1.1`. After negotiating this protocol using multistream-select, nodes treat the stream as a HTTP/1.1 stream for a single HTTP request (i.e. nodes MUST NOT use request pipelining).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

## Outlook: Interaction with Intermediaries

One of the advantages of running HTTP is that there’s widely deployed caching infrastructure (CDNs). Content-addressed data is infinitely cacheable. Assuming a properly design data transfer protocol, retrieval for CIDs could be cached by the CDN and made available via a POP (geographically) close to the user, dramatically reducing retrieval latencies.

Services SHOULD specify the caching properties (if any), and set the appropriate cache headers (according to RFC 9111).
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

CDNs can also be used to increase censorship resistance, since the CDN effectively hides the IP address of the origin server. With the upcoming introduction of ECHO (Encrypted ClientHello) in TLS, all that an on-path observer will be able to see is that a client is establishing a connection to a certain CDN, but not to which domain name.

The level of delegation between the origin node and the CDN can be adjusted. In the simplest configuration, the origin node is the only node that holds the libp2p private key, thus requests to the `server-auth` protocol would be forwarded from the CDN to the origin server. In a more advanced configuration, it would be possible to move the private key to a worker on the edge of the CDN, and perform the signing operation there (thereby reducing the request latency for `server-auth` requests).

## FAQ

### Why not gRPC?

This would be the perfect fit, allowing both request-response schemes as well as variations with multiple requests and multiple responses. However, it’s not possible to use gRPC from the browser.
MarcoPolo marked this conversation as resolved.
Show resolved Hide resolved

### Why tie ourselves to HTTP when mapping onto libp2p? Can’t we have a more general serialization format?

We could, but rolling our own serialization comes with some costs. First of all, we’d have define how HTTP request and response header, bodies, trailers are serialized onto the wire. Most likely, we’d define a Protobuf for that. Second, once we add more features to that format, they would need to be back-ported to HTTP, so that nodes that only speak HTTP can make use of them as well.

It’s just simpler to commit to HTTP.

### Why not use HTTP/3 for the libp2p mapping?

I’d love to! This would allow us to use HTTP header compression using QPACK, and a binary format instead of a text-based one. However, HTTP/3 requires the peers to exchange HTTP/3 SETTINGS frames first, and it’s not immediately obvious when / how this would be done in libp2p. It’s also not clear how easy it would be to use HTTP/3 in JavaScript.

The good news is that once we’ve come up with a solution for these two problems, it will be rather easy to add support for HTTP/3: nodes will just offer `/http3` in addition (and one day, instead of) `/http1.1`, and nodes that support can hit that endpoint. Nothing in the implementation of the protocols will need to change, since protocols only deal with (deserialized) HTTP requests and responses.

### Can I run QUIC, WebTransport and an HTTP/3 server on the same IP and port?

Yes, once [https://github.com/libp2p/specs/issues/507](https://github.com/libp2p/specs/issues/507) is resolved.