Initial draft of search API. #2868

Open
wants to merge 2 commits into
base: master
168 changes: 168 additions & 0 deletions src/sandstorm/indexer.capnp
@@ -0,0 +1,168 @@
# Sandstorm - Personal Cloud Sandbox
# Copyright (c) 2017 Sandstorm Development Group, Inc. and contributors
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@0xcae98639575b2b35;

$import "/capnp/c++.capnp".namespace("sandstorm");

using Grain = import "grain.capnp";

interface IndexingSession(Metadata) extends(Grain.UiSession) {
# This is a UiView session type, created by calling UiView.newSession().
#
# Sandstorm requests a session of this type when it wants to index a grain for search purposes.
# Sandstorm will attempt to index all grains, but grains which do not implement IndexingSession
# cannot be indexed.
#
# Indexing sessions, like any other sessions, *must* pay attention to the UserInfo passed to
# `newSession()`; only content which is visible to that user can be indexed.
#
# Note that, as an optimization, Sandstorm may start out assuming that all users see exactly
# the same content, and may do all indexing as an anonymous user, perhaps with no permissions.
# However, if the app logs an ActivityEvent that specifies that it requires specific permissions
# or is visible only to certain users, Sandstorm uses that as a hint to index the path associated
Collaborator

Does this mean that Sandstorm tries to index any grain that calls SessionContext.activity()?

# with that event separately, specifically for the users who can see that event. Hence, if your app
# has content that is visible to some users but not others, and that needs to be indexed, it
# should be sure to log appropriate activity events.

indexAll @0 (indexer :GrainIndexer(Metadata));
# Asks the app to iterate through the grain's entire text contents, pushing it all to the given
# indexer capability.

indexPaths @1 (paths :List(Text), indexer :GrainIndexer(Metadata));
# Asks the app to index a specific list of paths. If a path in the list doesn't exist, the app
# should call `indexer.index()` with a null `content` for that path.
}
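
To make the `indexAll` / `indexPaths` contract concrete, here is a minimal sketch in Python rather than Cap'n Proto. `NotesGrain` and `RecordingIndexer` are hypothetical stand-ins, not part of this PR, and content is modeled as a plain dict with only a `body` key. The per-user visibility check required by `newSession()` is left out.

```python
from typing import Dict, List, Optional, Tuple


class RecordingIndexer:
    """Hypothetical stand-in for GrainIndexer: records every index() call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[dict]]] = []

    def index(self, path: str, content: Optional[dict]) -> None:
        self.calls.append((path, content))


class NotesGrain:
    """Hypothetical app that stores its documents as path -> body text."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def index_all(self, indexer: RecordingIndexer) -> None:
        # Push the grain's entire text contents to the indexer.
        for path, body in self.docs.items():
            indexer.index(path, {"body": body})

    def index_paths(self, paths: List[str], indexer: RecordingIndexer) -> None:
        # A path that no longer exists is reported with null (None) content.
        for path in paths:
            body = self.docs.get(path)
            indexer.index(path, None if body is None else {"body": body})


grain = NotesGrain({"/a": "hello", "/b": "world"})
indexer = RecordingIndexer()
grain.index_paths(["/a", "/gone"], indexer)
```

Here `/gone` is reported with `None`, which is how the indexer would learn that the path no longer exists.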

interface GrainIndexer(Metadata) {
# Capability used to index the content of a grain.
#
# This is a one-way capability. Although GrainIndexer is implemented by the indexer and is called
Collaborator

I only vaguely understand what you're intending to protect against by making this a "one-way" capability. Should we be worried that capabilities could be smuggled through the metadata field of IndexableContent?

# by arbitrary content grains, Sandstorm ensures that information cannot leak from the indexer to
# the content grains by implementing a one-way message queue in between. This ensures that bugs
# (or malicious backdoors) in the indexer cannot be exploited by an app to learn about other
# information in the index.
#
# One-way communication is enforced via a message queue: When a content grain calls index(),
# Sandstorm places the call parameters in a queue and returns immediately. Later on, Sandstorm
# takes calls from the queue and actually delivers them to the indexer. This means that not only
# is the indexer unable to return data to the caller, but the caller also cannot find out how
# long the call takes to process or whether it threw an exception.
#
# If data passed to `index()` contains capabilities, Sandstorm seals those capabilities such that
# the indexer cannot communicate to the outside world through them. In the case of `UiView`
# capabilities, the indexer receives a sealed `UiView` which can only be used in limited ways:
# * It can be `save()`ed.
# * It can be passed to `offer()`, causing the grain to open in the user's UI. However, Sandstorm
# implements this such that revoking the user's access to the indexer grain does not cause them
# to lose access to capabilities obtained through it, since that wouldn't make sense for this
# use case.
# * Its `getViewInfo()` method can be called. However, the results will be served from cache
# without invoking the underlying grain.

index @0 (path :Text, content :IndexableContent(Metadata));
# Add content of the given path to the index.
#
# A null value for `content` indicates that the path doesn't exist.
#
# TODO(someday): When Cap'n Proto supports bulk-transfer methods with flow control, mark this
# method as such. This will cause the Cap'n Proto implementation to pretend the method has
# completed (resolving the promise) as soon as it's appropriate for the caller to make another
# call. For now, apps should pretend this is already in place, and make only one call at a
# time -- the current performance penalty in doing so probably isn't a big deal.
}
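
The one-way queue described in the `GrainIndexer` comments could look roughly like the following Python sketch. `OneWayQueue` and `FlakyIndexer` are hypothetical names, and this is an assumption about the shape of the mechanism, not Sandstorm's implementation. Capability sealing is not modeled.

```python
from collections import deque


class FlakyIndexer:
    """Hypothetical indexer that fails on one path and tries to return data."""

    def __init__(self):
        self.seen = []

    def index(self, path, content):
        if path == "/bad":
            raise ValueError("indexer bug")
        self.seen.append(path)
        return "secret from the index"  # must never reach the content grain


class OneWayQueue:
    """Sits between content grains and the indexer; data flows one way only."""

    def __init__(self, indexer):
        self._indexer = indexer
        self._pending = deque()

    def index(self, path, content):
        # Called by the content grain: enqueue and return immediately,
        # so the caller learns nothing about timing or outcome.
        self._pending.append((path, content))
        return None

    def drain(self):
        # Called later by the platform: deliver queued calls to the indexer,
        # discarding both return values and exceptions.
        while self._pending:
            path, content = self._pending.popleft()
            try:
                self._indexer.index(path, content)
            except Exception:
                pass  # deliberately not reported back to the caller


indexer = FlakyIndexer()
queue = OneWayQueue(indexer)
replies = [queue.index(p, {"body": "text"}) for p in ("/a", "/bad", "/b")]
queue.drain()
```

Every call returns `None` before the indexer has run, and the failure on `/bad` is swallowed during `drain()` instead of surfacing to the grain.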

struct IndexableContent(Metadata) {
body @0 :Text;
# Freeform natural-language body text. Will be tokenized by the search indexer.

links @1 :List(Grain.UiView);
# Other grains that can be accessed through this content.

threadPath @2 :Text;
# Optional path of the thread this item belongs to, if any. See `ActivityEvent.thread`.

metadata @3 :Metadata;
# Metadata for search operators, e.g. "subject:", "author:", etc. Each app can define its own
# metadata format.
#
# TODO(someday): Spec out how this works. Not needed for MVP of search.
}
Collaborator

It's not obvious to me how an indexer would be able to use "generic" operators like this while treating the metadata as completely opaque.

If the plan is to not implement this right away anyway, is there a good reason to even include this method? We can always add more later...


# ========================================================================================
# Indexer implementation

interface IndexerSession extends(Grain.UiSession) {
# A session type specifically implemented by the indexer app. The app's UiView's newSession()
# supports this session type for the purpose of the platform informing the app of new content
# to index.
#
# Each Sandstorm server runs a single indexer app (chosen by the admin), but creates at least
# one indexer grain per user. This ensures that private data from multiple users are not mixed
# and stored in the same place.
#
# As an optimization, if Sandstorm observes a situation where one or more grains containing a
# large amount of indexable data are accessible to a large number of users, it may create a
# shared indexer grain to handle that data. Then, when any user in the group performs a search,
# the query is sent both to their own private index and to the shared index, and the results are
# merged. This way, the data in the shared grains need only be indexed in the shared index,
# rather than be separately indexed in every user's private index. This is sometimes called
# "multi-tier" indexing.

indexGrain @0 (info :GrainInfo) -> (indexer :GrainIndexer);
# Begin a complete index of the given grain. Returns an indexer capability to which grain content
# should be pushed. If any information about this grain (as identified by the grain ID) already
# exists in the index, it should be deleted.
#
# The grain capability given here (`info.cap`) is sealed -- the only things the indexer can do
# with it are save it and offer() it to the user via the search UI.

updateGrain @1 (info :GrainInfo) -> (indexer :GrainIndexer);
# Begin a partial index of the given grain. Specific paths passed to the indexer should be
# replaced with new content, but other paths should be left alone. A call to `indexer.index()`
# with a null `content` means that that specific entry should be deleted.

struct GrainInfo {
cap @0 :Grain.UiView;
id @1 :Text;
title @2 :Text;

# TODO(now): If the grain is not directly in the user's capability store but was found by
# traversing through e.g. a collection, we need some info about said collection so that:
# 1. It can be displayed to the user to tell them where this grain came from.
# 2. If the user opens this grain, Sandstorm actually needs to traverse the path through
# other grains on-demand in order to get the target grain to land in the user's capability
# store. It can do this by opening new IndexingSessions on each intermediate grain and
# fetching just the desired path in order to get the capabilities, perhaps. Or maybe
# the sealed UiView capabilities passed to the indexer actually encapsulate this
# information, and the source app is required to properly revoke the previously-indexed
# capability if the item is deleted.
# Also need to think about what happens if the grain is accessible by multiple paths.

# TODO(someday): Metadata schema?
}

# TODO(now): How do searches happen? It would be nice if the indexer could implement its own UI,
# but this runs into complications when we start doing multi-tier search: it's important that
# information cannot leak from a personal index to a group or public index (although
# information flow in the opposite direction is fine). We could have the shared index tiers
# report (via one-way communications) search results back to the personal indexer in an
# arbitrary format, which it would then merge with its own results. However, we would then not
# be able to make the search query box itself be part of the app, since the app could leak
# info by appending it to the queries. Maybe this is fine because there are a few reasons
# it would be better to keep the query box inside the shell (e.g. to avoid starting up the
# grain just to display the box), but how do "advanced searches" work?
}
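
For discussion, a toy Python model of the indexer side: `indexGrain` replaces everything known about a grain, `updateGrain` touches only the paths it is given (null content deletes), and a multi-tier search merges a private and a shared index. `ToyIndex`, `GrainWriter`, and `merged_search` are hypothetical names, grains are identified by ID only, and the search step is purely an assumption, since how searches happen is still an open TODO above.

```python
class GrainWriter:
    """Hypothetical stand-in for the GrainIndexer returned to the platform."""

    def __init__(self, entries):
        self._entries = entries

    def index(self, path, content):
        if content is None:
            self._entries.pop(path, None)  # null content: delete this entry
        else:
            self._entries[path] = content["body"]


class ToyIndex:
    """In-memory index: grain ID -> {path -> body text}."""

    def __init__(self):
        self._grains = {}

    def index_grain(self, grain_id):
        # Complete re-index: discard anything previously stored for the grain.
        self._grains[grain_id] = {}
        return GrainWriter(self._grains[grain_id])

    def update_grain(self, grain_id):
        # Partial index: paths not mentioned by the caller are left alone.
        return GrainWriter(self._grains.setdefault(grain_id, {}))

    def search(self, word):
        return sorted(
            (grain_id, path)
            for grain_id, entries in self._grains.items()
            for path, body in entries.items()
            if word in body.lower().split()
        )


def merged_search(private, shared, word):
    # The same query goes to both tiers; results are merged and de-duplicated.
    return sorted(set(private.search(word)) | set(shared.search(word)))


private, shared = ToyIndex(), ToyIndex()
writer = private.index_grain("g1")
writer.index("/a", {"body": "hello world"})
writer.index("/b", {"body": "hello again"})
private.update_grain("g1").index("/a", None)  # delete only /a
shared.index_grain("g2").index("/wiki", {"body": "Hello team"})
results = merged_search(private, shared, "hello")
```

In this sketch the merge happens outside both indexes, which matches the concern in the TODO that nothing should flow from a personal index into a shared tier.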