The FLEDGE proposal calls for k-anonymity thresholds on network
updates to interest groups. A browser should not request interest group
updates unless there are at least
Interest groups are one use case for k-anonymity thresholds. FLEDGE also
calls for thresholds on the renderUrl that won an auction, and there are
other Privacy Sandbox APIs, such as Shared Storage, that may impose
k-anonymity thresholds. Other browser features, like start_url parameters
for progressive web apps, might benefit from k-anonymity thresholds as well.
Given the variety of use cases, we intend to implement a k-anonymity server
that's quite general. Let's define an object as the browser-stored state
(e.g. an interest group) that we wish to have a k-anonymity threshold for.
We'll design the server to operate on integers s = Hash(object); that is,
every object will be hashed consistently across browsers. On the
server each hash will map to a single set, where that set contains all the
browsers that have told the server they have the object as local browser-state.
To support different use cases, with possibly different server-side behavior,
we'll define a type for each set.
The diagram below shows the write path, which we call Join, from a browser
to the server. The browser has an identifier or token, b, that is used by
the server for counting purposes. The browser hashes the object, computing
a set hash s. It sends these parameters, along with the type of the set
(e.g. "interest group"), t, to the server: Join(b, t, s).
The server will store this membership and apply a type-defined TTL that's
configured on the server. When the TTL expires, the server will stop
considering the browser b as part of (t, s) unless the browser sends
another Join request that resets the TTL for its membership.
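The Join write path and its TTL behavior can be illustrated with a minimal in-memory sketch. This is not the production server: the class name KAnonStore, the per-type TTL table, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import defaultdict


class KAnonStore:
    """Toy in-memory membership store keyed by typed set (t, s). Hypothetical."""

    def __init__(self, ttl_seconds_by_type, clock=time.time):
        # TTLs are configured per set type on the server side.
        self.ttl_seconds_by_type = ttl_seconds_by_type
        self.clock = clock
        # (t, s) -> {b: expiry timestamp}
        self.members = defaultdict(dict)

    def join(self, b, t, s):
        """Record browser b in (t, s), or reset the TTL if it is already there."""
        ttl = self.ttl_seconds_by_type[t]
        self.members[(t, s)][b] = self.clock() + ttl

    def cardinality(self, t, s):
        """Count distinct identifiers in (t, s) whose TTL has not expired."""
        now = self.clock()
        entries = self.members.get((t, s), {})
        return sum(1 for expiry in entries.values() if expiry > now)
```

Keying the inner dictionary by b means a repeated Join from the same identifier only refreshes the expiry, so it never adds to the count.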
The server also needs a read endpoint to expose back to the browser whether
a particular set is above the k-anonymity threshold. We call this endpoint
Query, and it takes a type t and the set s to check: Query(t, s).
It returns a boolean that is true if the set has met the k-anonymity threshold.
The list of sets above the threshold, each represented as (t, s) and used to
serve Query requests, is updated periodically with recent Join requests.
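One way to picture the read path is a snapshot that is rebuilt periodically and consulted by Query. The sketch below is an assumption-laden illustration: QueryIndex, rebuild, and the rule that a count of at least k meets the threshold are hypothetical names and choices, and the noise that the design adds for differential privacy is deliberately left out.

```python
class QueryIndex:
    """Toy snapshot of the typed sets currently above the threshold. Hypothetical."""

    def __init__(self, k):
        self.k = k
        self.above_threshold = frozenset()

    def rebuild(self, distinct_counts):
        """Replace the snapshot.

        distinct_counts maps (t, s) to the number of distinct browser
        identifiers currently joined to that typed set.
        """
        self.above_threshold = frozenset(
            key for key, count in distinct_counts.items() if count >= self.k
        )

    def query(self, t, s):
        """Return True if (t, s) was above the threshold at the last rebuild."""
        return (t, s) in self.above_threshold
```

Because Query reads only the snapshot, a set that crosses the threshold between rebuilds is not reported until the next periodic update.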
A given typed set
The browser will periodically call Query for its local objects and update
one bit per-object, is_kanon, with the result. This bit can be used by the
browser to enforce thresholds. For example, the browser could only request
network updates for interest groups where is_kanon == true or only render
an ad if the stored renderUrl has the bit set.
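The browser-side bookkeeping described above might look like the following sketch. Everything here is illustrative: LocalObject, set_hash, and refresh_is_kanon are hypothetical names, the query argument stands in for a call to the Query endpoint, and the truncated SHA-256 is only an assumed stand-in for Hash(object), which the text does not specify.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LocalObject:
    """Browser-stored state that needs a k-anonymity check. Hypothetical."""
    set_type: str
    key: str
    is_kanon: bool = False


def set_hash(key: str) -> int:
    # Assumed hash: first 8 bytes of SHA-256, read as an integer.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def refresh_is_kanon(objects, query):
    """Update the is_kanon bit of each object from query(t, s)."""
    for obj in objects:
        obj.is_kanon = bool(query(obj.set_type, set_hash(obj.key)))


def eligible_for_update(objects):
    """Only objects with the bit set may trigger network updates."""
    return [obj for obj in objects if obj.is_kanon]
```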
This simple design implements the business functionality of the server, but there are many other things we want to do to ensure the server protects the privacy of Chrome users.
We recognize the sensitivity of the information being sent to this server: the set hashes might represent browsing behavior, whether an interest group or otherwise. These are highly sensitive, and we don't want the server, or someone interacting with the server, to be able to link set hashes back to individual users.
In designing this server we're taking an iterative, privacy-focused approach. Our initial design contains robust privacy protections that are outlined in more detail below. Over time we plan to further strengthen privacy protections as research areas advance and new technologies and tools become available.
For requests to the k-anonymity server that are intended to be anonymous, such
as the Join and Query requests described above, we plan to use an Oblivious
HTTP relay so Google is blind to the IP addresses of end users. The payloads
for Join and Query requests contain set hashes, which we consider sensitive
browsing behavior, so we'd like Google to know as little as possible about
user requests.
Oblivious HTTP is well suited to this use case, where we have small, anonymous,
stateless requests and the server that knows the browsing data, i.e. the set
hashes, does not need to be privy to identifying information, including the IP
address, of the user making the request. To implement Oblivious HTTP we're
engaging a third-party company to operate a relay resource on our behalf.
Chrome browsers, when making Join or Query requests, send to this relay, in
the body of an HTTP POST message, an encrypted payload for the k-anonymity
server.
The payload is a Binary HTTP message that has been encrypted using Hybrid
Public Key Encryption. Chrome will encrypt the message using keys fetched
directly from the k-anonymity server (on the Google domain, not through the
relay). The relay will forward the request to a gateway that will run on
Google servers near the k-anonymity server. The relay is therefore oblivious
to the content of the request but privy to the requestor's IP address, and
the k-anonymity server (and gateway) are oblivious to the requestor but privy
to the request content.
This design splits two kinds of data between two entities: user-identifying data that we can't remove by other means (such as the IP address), and the content of the user's request. No single entity sees both, and we expect a contractual relationship that ensures these entities don't collude to share request data.
We evaluated several other proxying designs that could avoid Google becoming
privy to user IP addresses. Among the other options we considered, Oblivious
HTTP stood out as offering a solution with the most coverage for Chrome users
in restrictive network environments, such as those with existing forward
proxies required to access the internet. As a normal, small POST request
to the relay, Oblivious HTTP traffic is expected to work in more restricted
environments than, say, HTTP CONNECT or MASQUE proxy requests.
Next, we're taking a conservative approach to the browser identifier, b,
that is sent with Join requests. Even the operator of the k-anonymity server
shouldn't be able to identify unique users calling Join. If the operator
cheats, and examines the database stored by Join, we're protecting users
by only sending to Join a value b with entropy limited to
With our IP blindness principles, we have no way of distinguishing users that
share a b when they call Join(b, t, s). When we count the cardinality of a
set on the server, each distinct b will be counted only once, even if
multiple users join the same set with identical values of b. This means
that our cardinality calculation may undercount and that we can only count
up to a limit of
Join will be part of the Chromium open source codebase, and can enforce on
the user's device that b is not longer than
Abusive or malicious writes to Join can undermine the k-anonymity thresholds,
misleading browsers into thinking they are members of a k-anonymous interest
group when, in fact, the other members of the group are not real. It is
important that we protect Join against malicious write traffic, and,
to maintain privacy, that we do this in an anonymous way.
To protect this endpoint we will use Trust Tokens. Every write to Join will
require that a one-time-use Trust Token be attached to the request, and
tokens will be bound to a specific low-entropy identifier, b. Each browser
will be issued tokens with its assigned b, and it can spend those tokens as
it wishes to make Join calls to the server.
We will operate a Trust Token issuer specific to this server and these
tokens; we'll call this issuer Sign. In our current proposal, Sign will
require, at least initially for desktop Chrome, that the user be signed-in
to Chrome with a Google Account. Requiring sign-in lets us rate limit the
number of tokens issued to a given user, assign each user a stable value
for b, and prevent naive abuse of Join by anonymous users.
Even though the user is signed-in, and Google Account credentials are
used to issue Trust Tokens, the Trust Tokens received by Join cannot be
linked back to the Google Account they were issued to. The Trust Token
issuer can, however, learn which users join a large number of interest
groups. To guard against this, we're exploring options that include having
the client request tokens at a constant rate and discard unused tokens.
Query is a read-only API, so it doesn't have the same abuse concerns as
Join. We won't require Trust Tokens, or a Google Account, for a browser
to call Query.
We're working to ensure that the data this server exposes through Query,
that is, the set of k-anonymous hashes, meets a quantifiable level of
differential privacy and does not reveal information about what sets a
non-malicious user may have joined. In addition, we want to bound false
negatives and false positives within the set of k-anonymous hashes because
false positives pose a privacy risk and false negatives limit the utility
of FLEDGE.
To build on the privacy protections we are implementing today, there are a few different areas we're researching that could offer even better privacy to Chrome users. None of these approaches are ready for production today, but we commit to continue investing in research, prototyping, and testing in these areas.
The Query endpoint to this server receives sensitive information in the form
of set hashes whose k-anonymity the browser wants to check. If Query requests
are batched, with multiple set hashes in a single request, then that request
contains cross-site data known to be from a single browser. Private
information retrieval (PIR) is a technique that could allow the server to
process Query requests, either batched or unbatched, without the server
knowing which set hashes are being queried. We're exploring both single-party
PIR, which currently has a lot of network and computational overhead, and
multi-party PIR, which has less overhead but the additional complexity of
operating two non-colluding servers with consistent copies of the dataset.
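To make the two-server idea concrete, here is the textbook XOR-based two-server PIR scheme over a bit vector. It is only an illustration of the general technique and is not claimed to be the scheme under evaluation; the function names and the modeling of the dataset as a list of bits (one bit per position, with the mapping from set hashes to positions left out) are assumptions.

```python
import secrets


def server_answer(bits, selection):
    """Run by each server: XOR of the database bits at the selected positions."""
    acc = 0
    for i, selected in enumerate(selection):
        if selected:
            acc ^= bits[i]
    return acc


def make_queries(n, index):
    """Run by the client: two selection vectors that differ only at `index`.

    Each vector on its own is uniformly random, so neither server learns
    the index unless the two servers compare their queries.
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def pir_lookup(bits_server1, bits_server2, index):
    """Recover bits[index] from two servers holding identical copies."""
    q1, q2 = make_queries(len(bits_server1), index)
    a1 = server_answer(bits_server1, q1)
    a2 = server_answer(bits_server2, q2)
    # Every position except `index` is selected in both or neither query,
    # so it cancels out of the XOR.
    return a1 ^ a2
```

The sketch also shows why consistent copies matter: if the two servers' databases differ at any position selected by both queries, the cancellation fails and the client can recover the wrong bit.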
We're researching and testing a privacy improvement to low-entropy browser
identifiers. The identifier b is constant for a given browser, which allows
some inferences to be made by the Join server, in spite of collisions
between users. To improve the privacy of this scheme and increase the
accuracy of our cardinality calculations, while maintaining our ability to
prevent abusive traffic, we're developing a new token scheme that we're
calling Anonymous Counting Tokens.
Anonymous Counting Tokens will allow a token issuer (the Sign server)
to issue tokens to a client that are associated with a value that the
client provides. The issuer will be able to ensure that the client
can obtain only a single token for a given value without knowing
the value the client is requesting a token for, i.e. the tokens are
blind.
The Sign server will use Anonymous Counting Tokens to issue tokens signed
for a single set hash, in contrast to the low-entropy identifier design
where tokens are signed for a value derived from a first party identity the
Sign server is given by the client. Each user will be able to request
only a single token for each set hash and the Join server will verify, in
what will become a Join(t, s) call, that the token was issued for the set
hash s that the client is attempting to join. It may be surprising that
this behavior is even possible without the token issuer learning anything
about the set hashes joined by a particular user. A forthcoming paper will
describe the cryptographic technique that makes this possible.
To support validity periods for the Anonymous Counting Tokens where each client can get one token per value per period, we will have a registration mechanism which will enable clients to refresh their parameters per TTL period and obtain a fresh token for each period.
As with low-entropy identifiers, Sign will require with Anonymous Counting
Tokens that the server have a first party identity for the user (a Google
Account or other trusted identity provider) before it issues tokens.
We are exploring approaches to transition components of this server to run in trusted execution environments (TEEs) with open source code. TEEs implemented by chip manufacturers and cloud providers could allow the browser to verify the server code executing matches the open source project and offer encryption of the server's RAM while in-use, protecting some of the server's data from insider access. Combined with thoughtful key management, TEEs could offer an opportunity to increase the privacy of other server functions like counting cardinalities and persisting state.
Our initial reliance on Google Accounts to issue Trust Tokens that authenticate writes is necessary partly because we don't have other methods of authenticating a Chrome browser to a server. Some platforms, like Android with SafetyNet attestations, can assure the server that a request originates from a legitimate device and client application. Desktop Chrome, however, runs on many different platforms with varying degrees of platform-level security. In the future we hope to develop methods of attesting to our server that requests are from a legitimate instance of desktop Chrome without necessarily requiring the user to be signed-in to their Google Account.
The choices we are making here to protect user privacy impact the behavior of the server, and the behavior of FLEDGE within Chrome.
By using low-entropy identifiers that intentionally collide among browsers
we by design undercount the cardinality of a given set hash. This has the
potential to require more than k browsers to join a set before it is marked
k-anonymous. Even if b
, then the set won't be marked k-anonymous. We will
mitigate this by choosing a uniform distribution of b identifiers across
browsers. Over time we hope to migrate from low-entropy identifiers to the
anonymous token scheme, which does not undercount cardinality in the same way.
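The size of the undercount can be estimated with a standard occupancy calculation. If n browsers each hold an identifier drawn uniformly from m possible values, the expected number of distinct identifiers is m(1 - (1 - 1/m)^n). The sketch below evaluates that formula; m is a hypothetical parameter standing in for the size of the identifier space, and the values used with it are illustrative only.

```python
def count_distinct(identifiers):
    """What the server can count: each distinct b only once."""
    return len(set(identifiers))


def expected_distinct(n, m):
    """Expected distinct identifiers when n browsers draw b uniformly from m values.

    Each particular value is missed by all n browsers with probability
    (1 - 1/m)**n, so it is present with probability 1 - (1 - 1/m)**n.
    """
    return m * (1.0 - (1.0 - 1.0 / m) ** n)
```

For example, with a hypothetical m = 256 and n = 50 browsers in a set, expected_distinct(50, 256) is about 45.5, so a threshold of 50 would on average need noticeably more than 50 real browsers. A uniform distribution of b keeps this gap as small as the identifier space allows.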
To ensure differential privacy of the output data from this
server, i.e. the set of k-anonymous set hashes, we must
limit how frequently we update the data. We must also add noise to the
membership of a given set hash in the output. These restrictions mean that an
interest group will not be marked k-anonymous immediately after the
To prevent abuse of the Join API, we are only allowing writes from users
that are signed-in to Chrome. Developers can still use FLEDGE for users
that aren't signed-in to Google within the Chrome browser, including making
calls to Query to check k-anonymity thresholds. However, we recognize that
the addition of those users to interest groups won't contribute to counts that
make a set hash k-anonymous. We recognize that this may bias the system against
interest groups that might be more popular with signed-out users. Over time
we expect to reduce, or otherwise eliminate, this potential bias by adding
support for device attestation or other approaches to device-level trust.
The Trust Token issuer, Sign, will enforce limits on token issuance to
each Google Account. Tokens are one-time-use, so these limits will restrict
the number of Join calls a browser can make in a given period of time.
This doesn't necessarily limit the number of interest groups the browser can
join locally, only the number it will be considered a part of when computing
k-anonymity on the server. This limit will be per-user, and the browser can
make decisions about which interest groups to spend its tokens on and make
Join calls for.
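The interaction between issuance limits and the browser's choice of which groups to join can be sketched as follows. This models only the counting: real Trust Tokens are blind and are not modeled here, and TokenBudget, the period argument, and the priority-order selection in choose_joins are hypothetical choices made for illustration.

```python
class TokenBudget:
    """Toy per-account issuance limit for one issuer. Hypothetical."""

    def __init__(self, limit_per_period):
        self.limit_per_period = limit_per_period
        self.issued = {}  # (account, period) -> tokens issued so far

    def issue(self, account, period):
        """Return True and count a token if the account is under its limit."""
        key = (account, period)
        used = self.issued.get(key, 0)
        if used >= self.limit_per_period:
            return False
        self.issued[key] = used + 1
        return True


def choose_joins(interest_groups, tokens_available):
    """Browser-side choice: spend tokens on the first groups in priority order.

    Groups beyond the budget stay joined locally but are not reported
    to the server, so they do not count toward k-anonymity.
    """
    return list(interest_groups[:tokens_available])
```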