forked from rstudio/apis-plumber
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path05-execution-model.Rmd
93 lines (53 loc) · 10.4 KB
/
05-execution-model.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
# Runtime {#runtime}
## Execution Model {#execution-model}
When you `plumb()` a file, Plumber calls `source()` on that file which will evaluate any top-level code that you have defined.
```{r, echo=FALSE, results='asis'}
include_r("files/apis/05-01-scope.R")
```
If you call `plumb()` on this file, the `counter` variable will be created and will live in the environment created for this API. However, the endpoint defined will not be evaluated until it is invoked in response to an incoming request. Because the endpoint uses `<<-`, the "double-assignment" operator, it mutates the `counter` variable that was previously defined when the file was `plumb()`d. This technique allows all endpoints and filters to share some data defined at the top-level of your API.
## Environments
By default, when you create a new Plumber router (which happens implicitly when you call `plumb()` on a file), a new environment is created especially for this router. It is in this environment that all expressions will be evaluated and all endpoints invoked.
This can become important if you consider mounting routers onto one another. In that case, you may expect that they would be able to share state via their environment, but that will not work by default. If you're [creating your routers programmatically](#programmatic-usage), then you can specify an environment when initializing your Plumber router using the `envir` parameter. This is the environment in which:
- A decorated R script, if provided, will be `source()`d.
- All expressions will be evaluated.
- All endpoint and filter functions will be executed.
It is important to be aware that subrouters, by default, would each have their own environment. If you want multiple Plumber routers to share an environment, you will need to provide a single, shared environment when you create the routers.
## Performance & Request Processing
R is a single-threaded programming language, meaning that it can only do one task at a time. This is still true when serving APIs using Plumber, so if you have a single endpoint that takes two seconds to generate a response, then every time that endpoint is requested, your R process will be unable to respond to any additional incoming requests for those two seconds.
Incoming HTTP requests are serviced in the order in which they appeared, but if requests are coming in more quickly than they can be processed by the API, a backlog of requests will accrue. The common solutions to this problem are to do either or both of:
1. Keep your API performant. All filters and endpoints should complete very quickly and any long-running or complicated tasks should be done outside of the API process.
2. Run multiple R processes to redundantly host a single Plumber API and load-balance incoming requests between all available processes. See the [hosting section](#hosting) for details on which hosting environments support this feature.
## Managing State
Often, Plumber APIs will require coordination of some state. This state may need to be shared between multiple endpoints in the same API (e.g. a counter that increments every time an endpoint is invoked). Alternatively, it could be information that needs to be persisted across requests from a single client (e.g. storing a preference or setting for some user). Lastly, it might require coordinating between multiple Plumber processes running independently behind a load-balancer. Each of these scenarios have unique properties that determine which solution might be appropriate.
As previously discussed, R is single-threaded. Therefore it's important that you consider the fact that you may eventually need multiple R processes running in parallel to handle the incoming traffic of your API. While this may not seem important initially, you may thank yourself later for designing a "horizontally scalable" API (or one that can be scaled by adding more R processes in parallel).
The key to building a horizontally scalable API is to ensure that each Plumber process is "stateless," meaning that any persistent state lives outside of the Plumber process. In any of the hosting environments that exist today, it is not guaranteed that two subsequent requests from a single client will be served by the same process. Thus it's never safe to assume that information stored in-memory will be available between requests for a horizontally scaled app. Below are a few options to consider to coordinate state for a Plumber API.
### In-Memory
As shown previous in the [Execution Model section](#execution-model), it is possible to share state using the environment associated with the Plumber router.
```{r, echo=FALSE, results='asis'}
include_r("files/apis/05-01-scope.R")
```
This is the one approach presented that does not allow your Plumber process to be stateless. The approach is sufficient for coordinating state within a single process, but as you scale your API by adding processes, this state will no longer be coordinated between them.
Therefore this approach can be effective for "read-only" data -- such as if you were to load a single, large dataset into memory when the API starts, then allow all filters and endpoints to reference that dataset moving forward -- but it will not allow you to share state across multiple processes as you scale. If you want to build a scalable, stateless application, you should avoid relying on the in-memory R environment to coordinate state between the pieces of your API.
### File System
Writing to files on disk is often the next most obvious choice for storing state. Plumber APIs could modify a data frame then use `write.csv()` to save that data to disk, or use `writeLines()` to append some new data to an existing file. These approaches enable your R process to be stateless, but are not always resilient to concurrency issues. For instance, if you've horizontally scaled your API to five R processes and two go to `write.csv()` simultaneously, you will either see one process's data get immediately overwritten by the other's, or -- even worse -- you may end up with a corrupted CSV file which can't be read. Unless otherwise stated, it's safe to assume that any R function that writes data to disk is not resilient to concurrency contention, so you should not rely on the filesystem to coordinate shared state for any more than a single R process running concurrently.
It's also important to ask whether or not the [hosting platform](#hosting) you'll be using supports persistent storage on disk. For instance, Docker may insulate your R process from your hardware and not allow you to write outside of your container. RStudio Connect, too, will provision a new directory every time you deploy an updated version of your API which will discard any data you had written to disk up to that point. So if you're considering writing your state to disk long-term, be sure that your hosting environment supports persistent on-disk storage and that you've considered the concurrency implications of your code.
### Cookies {#state-cookies}
HTTP cookies are a convention that allow web servers to send some state to a client with the expectation that the client would then include that state in future requests. See the [Setting Cookies section](#setting-cookies) for details on how to leverage cookies in Plumber.
// TODO: graphic
All modern web browsers support cookies (unless configured not to) and many other clients do, as well, though some clients require additional configuration in order to do so. If you're confident that the intended clients for your API support cookies then you could consider storing some state in cookies. This approach mitigates concerns about horizontal scalability, as the state is written to each client independently and then included in subsequent requests from that client. This also minimizes the infrastructure requirements for hosting your Plumber APIs since you don't need to setup a system capable of storing all of this state; instead, you've commissioned your clients to store their own state.
One issue with maintaining state in cookies is that their size should be kept to a minimum. Clients impose restrictions differently, but you should not plan to store more than 4kB of information in a cookie. And realize that whatever information gets placed in the cookie must be retransmitted by the client with *every* request. This can significantly increase the size of each HTTP request that your clients make.
The most notable concern when considering using cookies to store state is that since your clients are responsible for storing and sending their state, you cannot expect that the state has not been tampered with. Thus, while it may be acceptable to store user preferences like `preferredColor="blue"`, you should *not* store authentication information like `userID=1493`, since the user could trivially change that cookie to another user's ID to impersonate them.
If you'd like to use cookies to store information with guarantees that the user cannot either read or modify the state, see the [Encrypted Cookies section](#encrypted-cookies)).
### External Data Store
The final option to consider when coordinating state for an API is leveraging an external data store. This could be a relational database (like MySQL or Amazon RedShift), a non-relational database (like MongoDB), or an transactional data store like Redis.
One important consideration for any of these options is to ensure that they are "transactional," meaning that two Plumber processes trying to write at the same time won't overwrite one another. If you're interested in pursuing this option you should see [db.rstudio.com](http://db.rstudio.com/) or looks at [some](http://shiny.rstudio.com/articles/overview.html) of the [resources](http://shiny.rstudio.com/articles/sql-injections.html) [put together](http://shiny.rstudio.com/articles/pool-basics.html) for Shiny as pertains to dealing with databases in a web-accessible R platform.
## Exit Handlers
It may be useful to define a function that you want to run as your API is closing -- for instance, if you have a pool of database connections that need to be cleaned up when your Plumber process is being terminated. You can use the `exit` hook to define such a handler.
```r
pr <- plumb("plumber.R")
pr$registerHook("exit", function(){
print("Bye bye!")
})
pr$run()
```
When you interrupt the API (for instance by pressing the `Escape` key or `Ctrl+C`) you'll see `Bye bye!` printed to the console. You can even register multiple `exit` hooks and they'll be run in the order in which they were registered.