Merge pull request #35 from juttle/update-api-walkthrough
Limit gmail adapter walkthrough to gmail-specific stuff.
Mark Stemm committed Mar 9, 2016
2 parents 0b492c5 + 0250e24 commit d9d3913
Showing 3 changed files with 70 additions and 251 deletions.
110 changes: 70 additions & 40 deletions docs/adapter_impl_notes.md
@@ -1,39 +1,23 @@
# Adapter Implementation Notes

This page talks about implementation details of writing adapters. If you want to write your own adapter for your own backend, use this document as a guide.
This page describes the implementation details of the Gmail adapter. Use it alongside the more general [adapter API guide](https://github.com/juttle/juttle/blob/master/docs/adapters/adapter_api.md) for details on how to write a new adapter.

For this document, we'll use the Gmail adapter as an example. In this case, the backend consists of a set of email messages. Each message contains a timestamp (when the message was received) and fields (email headers such as `From:`, `To:`, `Subject:`, the message body, etc). The [Gmail API](https://www.npmjs.com/package/googleapis) supports [search expressions](https://support.google.com/mail/answer/7190?hl=en) that select messages based on date, headers, or a full-text search on the message contents, and the ability to read and write messages.
## Module Initialization

For reads, the adapter's job is to interpret the options included in the juttle `read` command into a set of matching email messages, construct json objects representing those messages, and pass them to the Juttle Runtime by calling `emit()`. For writes, the adapter's job is to take the output of programs and "save" that output to the backend by sending emails.
The initialization function exported by
[lib/index.js](../lib/index.js) takes a `config` argument containing
the configuration object for the adapter. For the gmail adapter, this
contains the client credentials for the application and the OAuth
token to access the mailbox. The initialization function calls
`authorize()` to initialize the Gmail API with the credentials and
token and passes the result to the Read and Write modules.

More sophisticated adapters work together with the juttle optimizer to push aggregation operations directly into the backend. For example, to count the number of messages in a given time period you could simply fetch all the messages and have the juttle program perform the counting. However it would be more efficient to count the number of messages via Gmail APIs and simply return the count instead.

This document describes details on module loading and configuration. There are separate documents that discuss the details of the [read gmail](./read.md) and [write gmail](./write.md) procs.

## The `JuttleAdapterAPI` global

An adapter needs to access functions and objects from the Juttle runtime. All of these should be accessed via a global object, [JuttleAdapterAPI](https://github.com/juttle/juttle/blob/master/lib/adapters/api.js). [This page](https://github.com/juttle/juttle/blob/master/docs/adapters/adapter_api.md) describes the API object and how to use it in more detail.

## Javascript Modules, Classes and Methods

The Gmail adapter implements a javascript module in [index.js](../index.js). It requires the main module in [lib/index.js](../lib/index.js) via:

```Javascript
module.exports = require('./lib/');
```

When the adapter is loaded, the CLI/juttle-engine performs a `require` of the module (i.e. the top-level directory containing `index.js`). `lib/index.js` in turn `require()`s `read.js` and `write.js`, which contain the implementation of the read and write classes, respectively.

The main function exported by `lib/index.js` takes a `config` argument containing the configuration object for the adapter, and returns an object with `name`, `read`, and `write` attributes. The value for the `name` attribute is `gmail`, corresponding to the `read gmail`/`write gmail` proc in juttle programs. The value for `read` is a class inheriting from `AdapterRead`, which performs the read work of the adapter. The value for `write` is a class inheriting from `AdapterWrite`, which performs the write work of the adapter.

Here's a slightly simplified version of the exported function from `lib/index.js`:

```Javascript
var Read = require('./read');
var Write = require('./write');
Here's the relevant section of `lib/index.js`:

```JavaScript
var GmailAdapter = function(config) {

...
var auth = authorize(config['client-credentials'],
config['oauth2-token']);

@@ -46,22 +30,68 @@ var GmailAdapter = function(config) {
write: Write.write
};
};

module.exports = GmailAdapter;
```

## Configuration
## Read and `read gmail`

The Gmail adapter needs application client credentials as well as an OAuth2 token to use the Gmail API. These items are provided in the config object passed to the `GmailAdapter` function exported by the module.
### Timerange and Filtering Expression

The configuration is saved in the juttle [configuration file](https://github.com/juttle/juttle/blob/master/docs/reference/cli.md#configuration). Within the configuration object, the module name (in this case `juttle-gmail-adapter`) is used to select the portion of the configuration to pass to the module's function. That is, given a configuration file:
The Gmail API supports date-based searches via `before:` and `after:`. However, the arguments to `before:` and `after:` can only be dates, while the `-from`/`-to` options to `read` have greater (sub-second) precision. So when fetching messages, the adapter widens the range to whole days: it rounds `-from` down to the beginning of its day and `-to` up to the end of its day. Afterward, the adapter compares the actual message receipt time (in the `internalDate` field) against the original `-from`/`-to` and only returns matching messages from `read()`.
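
The two-step approach described above (a coarse day-level query followed by an exact filter) can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the helper names `gmailDate`, `dateClause`, and `inRange` are hypothetical, the sketch works purely in UTC and ignores timezone handling, and it assumes `-from` is inclusive and `-to` is exclusive.

```javascript
// Hypothetical helpers for illustration; these names are not from the adapter source.
var DAY_MS = 24 * 60 * 60 * 1000;

// Format a Date as YYYY/MM/DD, the date form used with Gmail's before:/after: operators.
function gmailDate(d) {
    var pad = function(n) { return (n < 10 ? '0' : '') + n; };
    return d.getUTCFullYear() + '/' + pad(d.getUTCMonth() + 1) + '/' + pad(d.getUTCDate());
}

// Build a day-granularity clause that covers at least [from, to).
function dateClause(from, to) {
    // Round -from down to the start of its (UTC) day.
    var startOfFromDay = new Date(Math.floor(from.getTime() / DAY_MS) * DAY_MS);
    // Round -to up to the end of its (UTC) day, i.e. the start of the next day.
    var endOfToDay = new Date(Math.floor(to.getTime() / DAY_MS) * DAY_MS + DAY_MS);
    return 'after:' + gmailDate(startOfFromDay) + ' before:' + gmailDate(endOfToDay);
}

// Exact check applied to each fetched message. The Gmail API reports
// internalDate as a string holding milliseconds since the epoch.
function inRange(message, from, to) {
    var received = Number(message.internalDate);
    return received >= from.getTime() && received < to.getTime();
}
```

The coarse clause only needs to avoid excluding messages that belong in the range; the exact `inRange` check is what enforces the sub-second bounds.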

Field matches in search expressions are interpreted as message header matches for a limited set of headers:

* `from`
* `to`
* `subject`
* `cc`
* `bcc`

The following comparison operators are supported:

* `~`, `=~` (wildcard operator). This is because Gmail's header searches match on substrings and do not perform exact matches.
* `!~` (wildcard negation).

These header matches are pushed into the Gmail API search expression. Logical operators such as `AND`, `OR`, and `NOT` join terms in the expression, and parentheses can be used for logical grouping and nesting.

Full-text search is supported by the Gmail API, so any full-text searches are passed through to the search expression.

If a filter expression refers to other fields or uses other operators, the adapter returns an error.
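
As a rough sketch of how such a filter could be turned into a Gmail search string, the function below walks a small expression tree. The tree shape (`type`, `field`, `op`, `value`, `left`, `right`, `expr`) is a simplified, hypothetical stand-in and not Juttle's real filter AST. The Gmail spellings used here (`header:(value)`, a leading `-` for negation, `OR`, and parentheses) reflect my reading of Gmail's search syntax and should be checked against its documentation.

```javascript
// Headers that the adapter allows in field matches (from the list above).
var SEARCHABLE_HEADERS = ['from', 'to', 'subject', 'cc', 'bcc'];

// Hypothetical translator over a simplified expression tree.
function toSearch(node) {
    switch (node.type) {
        case 'match':
            if (SEARCHABLE_HEADERS.indexOf(node.field) === -1) {
                throw new Error('unsupported field: ' + node.field);
            }
            if (node.op === '~' || node.op === '=~') {
                return node.field + ':(' + node.value + ')';
            }
            if (node.op === '!~') {
                return '-' + node.field + ':(' + node.value + ')';
            }
            throw new Error('unsupported operator: ' + node.op);
        case 'text':
            // Full-text terms are passed through as a quoted phrase.
            return '"' + node.value + '"';
        case 'and':
            // Gmail treats space-separated terms as a conjunction.
            return '(' + toSearch(node.left) + ' ' + toSearch(node.right) + ')';
        case 'or':
            return '(' + toSearch(node.left) + ' OR ' + toSearch(node.right) + ')';
        case 'not':
            return '-(' + toSearch(node.expr) + ')';
        default:
            throw new Error('unsupported expression: ' + node.type);
    }
}
```

Throwing on anything outside the supported fields and operators mirrors the behavior described above, where the adapter returns an error instead of silently dropping part of the filter.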

### Constructing Points

After selecting the matching messages in `read()`, the messages must be converted to points. The Gmail adapter supports points with meaningful times, so the `time` field must be of type `JuttleMoment`, and is converted from the `internalDate` field of the message.

The adapter includes the following fields in each point:

* `time`
* `id`: the message-id from the message
* `snippet`: a short summary of the message
* `from`: the from header in the message
* `to`: the to header in the message
* `subject`: the subject header in the message
* `cc`: the cc: header in the message (if present)
* `bcc`: the bcc: header in the message (if present)

The goal is to provide meaningful fields that may be useful in a variety of juttle programs, without simply passing the entire message to the program.

Once the set of points is built, it is returned from `read()` and passed to the juttle program.
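
The conversion from a message to a point could look something like the sketch below. It assumes the Gmail API message shape (`id`, `snippet`, `internalDate` as a string of epoch milliseconds, and `payload.headers` as an array of `{name, value}` pairs). The function name `messageToPoint` is hypothetical, a plain `Date` stands in for `JuttleMoment`, and using the API's `id` for the point's `id` field is an assumption.

```javascript
// Headers copied into each point, keyed by lowercase header name.
var POINT_HEADERS = ['from', 'to', 'subject', 'cc', 'bcc'];

// Hypothetical converter; not taken from the adapter source.
function messageToPoint(message) {
    var point = {
        // A plain Date is used here as a stand-in for a JuttleMoment.
        time: new Date(Number(message.internalDate)),
        id: message.id,
        snippet: message.snippet
    };

    var headers = (message.payload && message.payload.headers) || [];
    headers.forEach(function(header) {
        var name = header.name.toLowerCase();
        // Only the selected headers are copied; absent ones (e.g. cc, bcc) are left out.
        if (POINT_HEADERS.indexOf(name) !== -1) {
            point[name] = header.value;
        }
    });

    return point;
}
```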

## Write and `write gmail`

### Program Output

The Gmail Adapter buffers points in memory until either the program has completed or a configurable batch size has been reached. This means that program output may be split across multiple messages.

When a batch of points is sent, the adapter constructs emails containing the JSON points and sends those emails using the Gmail API. It does so using asynchronous functions (specifically [Bluebird Promise](http://bluebirdjs.com) chains) that do not block the node.js event loop.

### `write` handling

`write()` maintains a queue of points. Calls to `write()` simply append to the queue. If a `-limit` was specified in `write gmail`, when the queue of points reaches the limit, a message is sent (asynchronously by creating a promise, see below).

### `eof()` handling

`eof()` should return a promise that resolves when all output has been written. In the case of the Gmail Adapter, the promise resolves when all points have been packaged in email messages and sent. Each time a message is sent via `write()`, the promise for that send is chained onto the base promise held in the `writePromise` instance variable. The chained promise is returned when `eof()` is called, and resolves when all emails have been sent.
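
The queue-and-chain pattern described in the two sections above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses native promises instead of Bluebird, the `GmailWriter` constructor, its `flush()` helper, and the injected `sendBatch` function are hypothetical, and only the `writePromise` name comes from the text.

```javascript
// Hypothetical writer; sendBatch(points) is assumed to return a promise
// that resolves once an email containing those points has been sent.
function GmailWriter(limit, sendBatch) {
    this.limit = limit;                    // batch size from -limit (falsy means no limit)
    this.sendBatch = sendBatch;
    this.queue = [];                       // points waiting to be sent
    this.writePromise = Promise.resolve(); // base promise that sends are chained onto
}

// Move the queued points into one send and chain it onto writePromise.
GmailWriter.prototype.flush = function() {
    if (this.queue.length === 0) { return; }
    var batch = this.queue;
    var send = this.sendBatch;
    this.queue = [];
    this.writePromise = this.writePromise.then(function() {
        return send(batch);
    });
};

// write() only appends; a send is started once the limit is reached.
GmailWriter.prototype.write = function(points) {
    this.queue = this.queue.concat(points);
    if (this.limit && this.queue.length >= this.limit) {
        this.flush();
    }
};

// eof() sends whatever is left and returns the chained promise.
GmailWriter.prototype.eof = function() {
    this.flush();
    return this.writePromise;
};
```

Chaining each send with `.then()` serializes the emails, which keeps ordering simple; a real implementation might instead let sends run concurrently and wait on all of them.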

```
{
"adapters": {
"twitter": {...},
"gmail": {...}
}
}
```
The object below `gmail` will be passed to the module's main function.

154 changes: 0 additions & 154 deletions docs/read.md

This file was deleted.
