Bucket Shadowing

Bucket Shadowing allows the Sync Gateway to serve an existing Couchbase Server bucket, making the contents of that bucket syncable with mobile clients. Strictly speaking, the gateway does not serve the bucket directly: it manages its own "shadow" bucket that contains the same documents plus the extra revision-history metadata it needs. (It can't store that metadata directly in the original bucket, because your Couchbase app already writes to those documents and the changes would conflict.)

Bucket shadowing is in effect a second kind of sync, one that operates between your app bucket and the gateway's shadow bucket. Every time your app changes a document, the gateway detects the change and copies it into its own bucket as a new revision of the version-tracked document. Conversely, every time a mobile client revises a gateway document, the document's current revision is saved to your app bucket.

Configuration

We assume you already have a Couchbase Server with a bucket whose contents you want to make syncable.

You'll need another Couchbase bucket to act as the Sync Gateway's shadow. This doesn't have to be on the same server, although that's the most convenient way to do it. The two servers just have to be mutually reachable.

Configure the Sync Gateway as per the existing documentation. Then in your JSON configuration add a new property called shadow to the database configuration object; its value must be an object with properties server and bucket, representing the location of the app bucket to shadow:

"shadow": {
    "server": "http://localhost:8091",
    "bucket": "myapp"
}

NOTE: If you're running a cluster of multiple Sync Gateways serving the same database, add the shadow property to only one gateway's configuration. Otherwise multiple tasks will simultaneously try to copy the same documents to and from the app bucket, which will result in collisions.

You may also want to add the key "Shadow" to the top-level configuration's "log" property, to get logging output from the shadowing task.
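As a rough illustration of the settings above, the following Python sketch builds a configuration containing the shadow and "log" properties and checks that only one gateway in a cluster has shadowing enabled. Only the "shadow" object and the "Shadow" log key come from the text above; the "databases" wrapper, the database name "db", the gateway bucket name "sync_gateway", and both helper functions are assumptions made for the example.

```python
import json

def build_config(shadow_enabled):
    """Return a gateway config dict, with or without the shadow property."""
    db = {
        "server": "http://localhost:8091",  # assumed: server holding the gateway's bucket
        "bucket": "sync_gateway",           # assumed: name of the gateway's own bucket
    }
    if shadow_enabled:
        # Location of the app bucket to shadow, as described above.
        db["shadow"] = {"server": "http://localhost:8091", "bucket": "myapp"}
    # "Shadow" in the top-level log list turns on logging for the shadowing task.
    return {"log": ["Shadow"], "databases": {"db": db}}

def shadowing_nodes(configs):
    """Return the indexes of the configs that declare a shadow property."""
    return [
        i for i, cfg in enumerate(configs)
        if any("shadow" in db for db in cfg.get("databases", {}).values())
    ]

# A three-node cluster: only the first gateway runs the shadowing task.
cluster = [build_config(True), build_config(False), build_config(False)]
assert len(shadowing_nodes(cluster)) == 1, "exactly one gateway should shadow"
print(json.dumps(cluster[0], indent=2))
```

A check like this could run in a deployment script before the gateways are started, so that a second shadowing node is caught before it causes collisions.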

When you start the gateway, it will run through the app bucket's history (its tap feed) copying any new or changed documents into the gateway database. Depending on how large the bucket is, this may take a while. (Unfortunately there's no way to bypass this on subsequent launches of the gateway, due to limitations of the tap feed implementation.)

If you shut down the gateway (or it crashes), and changes are subsequently made to the app bucket, the gateway will find and apply those changes when it next starts up. However, the reverse situation doesn't work yet: if the app bucket becomes unavailable while the gateway is running, changes made to the gateway's database won't get propagated to the app bucket when it comes back. (Hopefully we can fix this before GA.)

Q & A

Q: Doesn't this double the storage required?
A: Yes, unfortunately. (Actually it's a little more than doubling because of the extra revision-history metadata.) In the future we may be able to avoid storing a copy of the document body in the gateway.

Q: Does the bucket used by the gateway have to be on the same Couchbase server as the app bucket?
A: No. In fact, there's probably a performance benefit to having them on separate servers, because the gateway's traffic won't be putting a load on the main server. (You could view the gateway as being a type of caching proxy for mobile clients.)

Q: What happens if the app updates a doc in the bucket at the same time that a mobile client pushes a change to it?
A: In the gateway's database you get a conflict, just as if two clients had changed the document. Both revisions exist, and one will be (arbitrarily) picked as the default. The default revision will be copied back to the app bucket. When a client resolves the conflict by adding or deleting revisions, the resolved revision will be copied to the app bucket.

OLD DESIGN NOTES — 18 Oct 2013

This is a third proposed solution to the problem of letting Couchbase Server apps coexist with the Sync Gateway (the dreaded issue #7). I think it's better, and probably easier to implement, than either of the two others we've talked about.

The Big Idea

Rather than try to share a bucket with a Couchbase Server app, have the Sync Gateway use a separate bucket. The gateway's bucket will operate pretty much as it does today, but the gateway will also watch the app's bucket (using Tap or XDCR) and apply the changes to its own copies of the documents. Mobile clients will replicate from/to the gateway's bucket as they already do. The gateway can propagate changes made by mobile clients back to the corresponding documents in the app bucket.

Details are below, in the appropriately-named "More Details" section.

Benefits

This should work with existing Couchbase 2.x servers; no need to request new features.
Only minor slowdown of the app bucket: just the overhead of sending the Tap notifications.
Doesn't mess with the original bucket by adding funny metadata fields. Doesn't even require write access to the bucket, if the app doesn't need mobile changes propagated back to it.
The Couchbase client code in the app continues to talk directly to Couchbase Server without having to know about the gateway at all.
We can subset the original bucket if desired, by applying a fast filter to the change notifications from Tap; that way the gateway only has to scale to the number of documents that are relevant to it, which might be a small fraction of what's in the app.
It's very easy to detect changes made by the app and incorporate them into the document's revision history.
We can be one-way if desired, never propagating any changes back to the app bucket. Or we can propagate only certain changes; the app can be in charge of accepting them, e.g. via a custom REST endpoint it runs.
The change-watching process can be abstracted enough to make it a plug-in, and it could even read from some other type of server or data source entirely, like SAP. The only requirements are that it needs to provide a change feed and a document change-id property (like a CAS).
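The plug-in idea in the last point can be sketched as a small interface. This is only one possible shape, not the gateway's actual design: the names Change, ChangeSource, ListSource, and drain are invented for the example, and the only things taken from the text are the two requirements, a change feed and a per-document change ID that plays the role of a CAS.

```python
from dataclasses import dataclass
from typing import Iterator, Protocol

@dataclass
class Change:
    doc_id: str
    body: dict
    change_id: int  # plays the role of a CAS: differs whenever the doc changes

class ChangeSource(Protocol):
    """Anything that can provide a change feed starting after a given position."""
    def changes(self, since: int) -> Iterator[Change]: ...

class ListSource:
    """Toy in-memory source; a real plug-in would wrap Tap or another system."""
    def __init__(self, changes):
        self._changes = list(changes)

    def changes(self, since):
        # The position in the list stands in for the feed's sequence counter.
        for seq, change in enumerate(self._changes, start=1):
            if seq > since:
                yield change

def drain(source: ChangeSource, since: int = 0):
    """Collect the IDs of all documents changed after position `since`."""
    return [change.doc_id for change in source.changes(since)]

source = ListSource([Change("a", {"n": 1}, 101), Change("b", {"n": 2}, 102)])
print(drain(source))           # ['a', 'b']
print(drain(source, since=1))  # ['b']
```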

Drawbacks

Requires an extra bucket.
Duplicates the document bodies. (We could avoid this, but it would make the algorithm more complex, slow down document access, and add overhead to the app bucket.)
Requires a task to monitor the app bucket's Tap feed. This can be done within a gateway process, but there should be only one such task, so if there are multiple gateway nodes they'd have to agree on which one runs it (with failover, etc.).
Introduces some latency before changes made by the app show up to mobile clients; it should be very little, but it depends on whether the Tap feed and incoming-changes handlers can keep up.

More Details

Setup

Create a new empty bucket for the gateway's use (as today).
Create a JSON config file including the URL of the gateway bucket (as today).
Add a new property to the config: the URL of the app bucket.
Start the gateway.

Startup

(Written from the gateway's perspective)

When opening a database, read a lastTapCount property from a special key in its bucket. This is the last Tap count that's been processed from the app bucket. If missing, it defaults to zero.
Create a Tap feed, including backfill, starting from the saved count.

Note that on the first run of a new database, the Tap backfill will cause the database to be populated with the current contents of the app bucket as the Incoming Changes algorithm (q.v.) runs.
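The startup behavior as proposed here can be modeled with plain Python data structures. This is a sketch of the design note only: the feed is a list of dicts, the "count" field and the key name "_shadow:checkpoint" are assumptions, and a dict stands in for the gateway bucket.

```python
CHECKPOINT_KEY = "_shadow:checkpoint"  # hypothetical name for the "special key"

def load_last_tap_count(gateway_bucket):
    """Return the saved lastTapCount, defaulting to zero when it is missing."""
    checkpoint = gateway_bucket.get(CHECKPOINT_KEY) or {}
    return checkpoint.get("lastTapCount", 0)

def events_to_process(app_feed, gateway_bucket):
    """Return the feed events after the saved count.

    With a count of zero this is the full backfill, which is what populates
    a brand-new gateway database from the app bucket.
    """
    start = load_last_tap_count(gateway_bucket)
    return [event for event in app_feed if event["count"] > start]

feed = [{"count": 1, "id": "a"}, {"count": 2, "id": "b"}, {"count": 3, "id": "c"}]

# First run: no checkpoint yet, so everything is backfilled.
print([e["id"] for e in events_to_process(feed, {})])  # ['a', 'b', 'c']

# Later run: resume after the last processed count.
saved = {CHECKPOINT_KEY: {"lastTapCount": 2}}
print([e["id"] for e in events_to_process(feed, saved)])  # ['c']
```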

Incoming Changes

When a Tap notification arrives from the app bucket:

1. Optional: test it against an app-provided quick filter. For example, we might only pay attention to docs whose ID starts with "mobile:". (This filter is going to be run at high volume so it probably shouldn't use JavaScript. Maybe just a regex match on the ID.)
2. Get the doc with the matching ID in the gateway bucket (the "gateway doc"). If there is none, start with a JSON object initialized to an empty state.
3. Compare the upstreamCAS property of the gateway doc to the CAS value in the Tap notification; if they're equal, ignore the notification (it must be a duplicate).
4. Get the upstreamRevID property of the gateway doc, if there is one; this will be considered the parent revision ID.
5. Insert the document body from the Tap notification as a new revision with the given parent ID, as though it were coming in from a push replication. This will run it through the sync function, assign channels, push it to _changes listeners, etc. (Question: If there's a CAS conflict with the insert, do I need to go back to step 3 instead of just retrying the stuff in step 5?)
6. Update the gateway document's upstreamCAS and upstreamRevID properties as part of saving the new revision.
7. Update the bucket's lastTapCount value.
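The steps above can be sketched in Python with a dict standing in for the gateway bucket. This is a simplified model under several assumptions: the notification is a dict with "id", "cas", "body", and "count" fields, the revision ID format and the "revisions" layout are invented for the example, the checkpoint key name is hypothetical, and the sync function, channel assignment, and CAS-conflict retry are left out.

```python
import re

CHECKPOINT_KEY = "_shadow:checkpoint"   # hypothetical key holding lastTapCount
MOBILE_ID = re.compile(r"^mobile:")     # the example quick filter from the steps

def apply_tap_notification(gateway_bucket, note, id_filter=MOBILE_ID):
    """Apply one notification; return the new revision ID, or None if ignored."""
    # Step 1 (optional): cheap filter on the document ID.
    if id_filter is not None and not id_filter.match(note["id"]):
        return None
    # Step 2: fetch the gateway doc, or start from an empty state.
    doc = gateway_bucket.get(note["id"]) or {
        "revisions": {}, "upstreamCAS": None, "upstreamRevID": None,
    }
    # Step 3: same CAS as last time means a duplicate notification.
    if doc["upstreamCAS"] == note["cas"]:
        return None
    # Step 4: the last revision copied from upstream becomes the parent.
    parent = doc["upstreamRevID"]
    # Step 5: insert the body as a new child revision. A real gateway would
    # run the sync function here; this sketch only records the revision.
    generation = int(parent.split("-")[0]) + 1 if parent else 1
    rev_id = f"{generation}-{note['cas']:x}"  # invented rev ID format
    doc["revisions"][rev_id] = {"parent": parent, "body": note["body"]}
    # Step 6: remember which upstream version this revision corresponds to.
    doc["upstreamCAS"] = note["cas"]
    doc["upstreamRevID"] = rev_id
    gateway_bucket[note["id"]] = doc
    # Step 7: advance the checkpoint (in this sketch, only for applied changes).
    gateway_bucket[CHECKPOINT_KEY] = {"lastTapCount": note["count"]}
    return rev_id

bucket = {}
first = {"id": "mobile:1", "cas": 10, "body": {"v": 1}, "count": 1}
print(apply_tap_notification(bucket, first))  # 1-a
print(apply_tap_notification(bucket, first))  # None (duplicate CAS)
```

Keeping upstreamCAS and upstreamRevID on the gateway doc is what lets the duplicate check and the parent lookup work without consulting the app bucket again.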

Outgoing Doc Changes

When a document is updated by a replication client or the gateway REST API (but not by a Tap notification):

Optional: test it against an app-provided filter to see if it should be copied back to the app server.
Check whether the document's default/winning revision's ID is the same as upstreamRevID. If so, there's nothing more to do.
Store the default revision's body to the app bucket, but only if the doc in the app bucket still has a CAS that matches the upstreamCAS. Otherwise, get the app document again and go back to the previous step. (Note: This could be a direct Put to the app bucket, or it could go through a web-hook in the app server.)
Update the upstreamCAS and upstreamRevID in the gateway doc to correspond to the revision sent to the app bucket. (Note: There are probably race conditions here I haven't considered yet.)
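A minimal Python sketch of the write-back path follows, using a toy app bucket with per-document CAS values. The AppBucket class, the "winningRevID" field, and the three return values are assumptions made for the example. The function makes a single attempt: on a CAS mismatch it returns "stale" and leaves it to the caller to fetch the app document again and repeat the check, as the third step describes. The optional filter and the web-hook variant are not modeled.

```python
class AppBucket:
    """Toy stand-in for the app bucket: doc_id -> (cas, body)."""
    def __init__(self):
        self._docs = {}
        self._next_cas = 1

    def get(self, doc_id):
        """Return (cas, body); a missing doc has CAS 0 and no body."""
        return self._docs.get(doc_id, (0, None))

    def set(self, doc_id, body):
        """Unconditional write, as the app itself would do; returns the new CAS."""
        cas = self._next_cas
        self._next_cas += 1
        self._docs[doc_id] = (cas, body)
        return cas

    def cas_write(self, doc_id, body, expected_cas):
        """Write only if the stored CAS still matches; return new CAS or None."""
        current_cas, _ = self.get(doc_id)
        if current_cas != expected_cas:
            return None
        return self.set(doc_id, body)

def push_winning_revision(gateway_doc, app_bucket, doc_id):
    """Try once to copy the winning revision to the app bucket."""
    winner = gateway_doc["winningRevID"]  # hypothetical field name
    # Nothing to do if the app bucket already holds this revision.
    if winner == gateway_doc["upstreamRevID"]:
        return "unchanged"
    body = gateway_doc["revisions"][winner]["body"]
    new_cas = app_bucket.cas_write(doc_id, body, gateway_doc["upstreamCAS"])
    if new_cas is None:
        # The app changed the doc in the meantime; the caller must re-read it.
        return "stale"
    # Record which revision and CAS the app bucket now holds.
    gateway_doc["upstreamCAS"] = new_cas
    gateway_doc["upstreamRevID"] = winner
    return "written"

app = AppBucket()
cas = app.set("doc", {"v": 1})
gw = {
    "winningRevID": "2-x", "upstreamRevID": "1-a", "upstreamCAS": cas,
    "revisions": {"1-a": {"body": {"v": 1}}, "2-x": {"body": {"v": 2}}},
}
print(push_winning_revision(gw, app, "doc"))  # written
print(push_winning_revision(gw, app, "doc"))  # unchanged
```

Returning "stale" instead of retrying blindly avoids overwriting a change the app made between the read and the write, which is the case the CAS comparison exists to catch.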

Questions

~~Is Tap or XDCR a better protocol to use?~~ Use Tap.
Do we need to store our own copies of the doc bodies, or is the metadata sufficient?
Do we need to store the entire revision tree, every change made in the original bucket, or can we collapse them?
Do we have to be careful about waiting for persistence of changes from the original bucket? (If so, the upcoming UPR features will help.)
If we do write changes back to the original bucket, what happens when there are conflicts?
