Skip to content

Commit

Permalink
add error handling and draft for PokiDB
Browse files Browse the repository at this point in the history
  • Loading branch information
Paul Okstad committed May 31, 2018
1 parent b7cae48 commit 171ada9
Show file tree
Hide file tree
Showing 3 changed files with 359 additions and 10 deletions.
211 changes: 211 additions & 0 deletions _drafts/pokidb-couchdb-reimagined.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,211 @@
---
layout: post
title: PokiDB: CouchDB Reimagined
tags: [software, couchdb, databases, golang]
---

CouchDB has been one of my favorite databases, but it's also been one of my biggest
disappointments.

To understand my initial optimism, check out my Quora answer to
["What is the essence of CouchDB"](https://www.quora.com/What-is-the-essence-of-CouchDB):

> CouchDB is of the web.
>
> Damien Katz (original creator of CouchDB) said in an interview that CouchDB
> felt like something in the natural world that he discovered rather than
> invented. The creation and continued evolution of CouchDB is something that
> has a life of its own.
>
> Everything in CouchDB is designed around being scalable. This means not
> supporting single server actions that are slower than O(log(n)). Everything is
> stored in B-Trees that are easy to append data to and fast to look up and
> indexes are updated incrementally on each access. (Note: The benefits of
> scaling will become more evident as BigCouch, a.k.a. IBM Cloudant's CouchDB,
> is merged into Apache CouchDB 2.0 to support multi-server CouchDB.)
>
> Everything in CouchDB is web-accessible by being RESTful and JSON-based. This
> means that it can be accessed from browser side Javascript. This is so true
> that the management tool is actually a web app (Futon/Fauxton) written in
> purely HTML and Javascript. It also means that you don't need special database
> drivers to access CouchDB. If you have a library for handling HTTP requests,
> then you can access CouchDB. Stuck behind an HTTP proxy? You can still access
> your database.
>
> Probably the biggest thing that makes CouchDB what it is, is master-master
> replication. CouchDB is a peer-to-peer app. It allows data to propagate
> virally the same way that rumors spread via word of mouth. The full potential
> of CouchDB has yet to be tapped into. A Javascript clone of CouchDB, called
> PouchDB, was created to be compatible with the same replication API. This
> allows a user to pull their data off the server and store it locally in the
> browser. Mobile libraries Couchbase Lite and Cloudant Sync are also compatible
> with the replication API so that data can be pulled and pushed from mobile
> devices. Couchbase Server uses the replication API as a gateway between their
> non-HTTP-based database and the HTTP world.
>
> CouchDB's entire API is a web API. So if you really wanted to, you could take
> any server side environment (e.g. Node, Python, Perl, Go, Java, Ruby, etc...)
> and emulate the same API to behave the same way and function with existing
> CouchDB servers. Theoretically, you could take an existing datastore (e.g.
> Google App Engine, Amazon Dynamo, MongoDB, MySQL) and put a "CouchDB layer" on
> top of it to make it compatible with other CouchDB servers. That is really
> cool.
Through my experience with both CouchDB and Go programming, I've come to the conclusion that things could be done differently... and by differently I mean better.

## Pitfalls

That was written back at the end of 2014. Since then, I've learned some lessons about Couch.

### Couch Apps

Couch Apps were one of the big promises that attracted many initial users, but it was
[secretly deprecated](http://mail-archives.apache.org/mod_mbox/couchdb-user/201702.mbox/%[email protected]%3E).
Read through the mailing lists and you'll see recommendations to avoid using the `_show` and
`_list` transformations that were [hyped so well in the CouchDB: Definitive Guide] (
http://guide.couchdb.org/editions/1/en/show.html). Some CouchApp users became concerned and
voiced frustration:

- [the future of couchapps](http://mail-archives.apache.org/mod_mbox/couchdb-user/201505.mbox/%3cCADdZc95CgPfjh74CSS=_0M5qy4M2-Kw9H3EJ4ZK50Y2cPBBW1A@mail.gmail.com%3e)
- [how couchapps fit into the couchdb story](http://mail-archives.apache.org/mod_mbox/couchdb-user/201505.mbox/%3cCAPMhwa4mc=RMSPHcabSB0gPL6LaZi0MbDSEnosgeW_0achXEqQ@mail.gmail.com%3e)

At the end of the day, the core CouchDB team recommended using dedicated web server code to
access and present data from CouchDB. That's all fine and dandy, but then why are we using this
HTTP based protocol that is easily accessible from web browsers (but not the most performant for
use from a backend service)? That leads me into the next issue...

### JSON HTTP API

While the JSON API was bleeding edge at the time, it has begun to show its age. HTTP 1.1 doesn't
expose the performance benefits of HTTP2, such as multiplexing requests, proactively pushing
responses, and reducing overhead.

Better yet, gRPC could allow for bi-directional streams between CouchDB and a client. The ability
to push streams of data between peers would be awesome possum. Even better yet: use protobuf
as the message format to reduce the overhead of repeitive field definitions.

### No Schema

One of the first things people enjoy about CouchDB is the ability to dump JSON messages with
any arbitrary structure. It's nice because you add new document types without having to update
a database schema. In theory, this utopian freedom seems liberating, but in practice it's frustrating.

My primary concern with no schemas is human error. As developers, we need feedback that the
way we are interacting with the database is valid. With no schemas, there is a lack of feedback.
While coordinating with other developers, or even just working on a solo project, having a way to
guarantee that types of data in a database fits a understood pattern is critical.

In my mind, there is no doubt that some enforcement of schema's is necessary, but the more
important question is: how much? We, as developers, want freedom to insert whatever we want
into the database. Over time though, we start to recognize patterns and understand where that
freedom is not wanted. Ideally, a good CouchDB schema enforcement would rely on the
designation of required and optional fields. Something like JSON schemas could accomplish this.
Maybe JSON schemas are overkill. Protobuf definitions are much simpler and allow for deprecating
fields over time.

### Erlang

Erlang has always been hyped as the special sauce of CouchDB. "It's designed for distributed
systems" is what we always hear, and it's true. Erlang shines in distributed systems. The problem
though, has been the fact that CouchDB historically was a single node system that scaled via an
HTTP based protocol. The Erlang VM was only leveraged for a single node (not sure if that's the
case for CouchDB 2.0).

Perhaps the biggest issue with Erlang has been the aversion of CouchDB users to dive into the
code base. If the project were reimagined today, it would almost certainly be written in an easier to
onboard language like Go or Elixir. Looking at many of the newer database projects, there has
been a surge in new codebases based on Go.

There is also the strange quirk that view and transform functions are written in Javascript. The
CouchDB community has strong crossover ties to the Javascript peeps. Take a look at PouchDB,
and Mustache.js. It probably would have made the most sense to write CouchDB in Node.js given
the community that ended up embracing CouchDB the strongest. Max Ogden, one of the most
influential CouchDB hackers, ended up writing a new database, [Dat](https://datproject.org), in
Node.js that was heavily inspired by CouchDB.

### Big Couch (a.k.a. CouchDB 2.0)

The current state of CouchDB has been the result of a major refactoring/merging
of code from IBM's acquired Cloudant service. This signaled a new milestone in
CouchDB's lifetime. It was no longer simply a single node product, but now a
very complex system designed for enterprise usage.

## Poki: CouchApps Reimagined

Poki is a new kind of database with a modest goal: become the database/webserver of choice for small independently owned websites. The goal of this tool is not to become the infrastructure of large corporations. It is designed to enable simple solutions for communities that wish to be self reliant. Self reliance means not depending on a spyware infested social media company to provide your services.

The core design principles of Poki:

- Poki Apps are the core feature, everything else supports their purpose
- Database and app design is centered around user data
- Pluggable architecture that allows for sharing of apps and submodules
- A single language for all components: i.e. server, apps, and submodules
- Single process design - all pluggable components operate in the same process space

The database improves upon CouchDB in the following ways:

- Pure Go - written in pure Go to leverage all tooling benefits and improve onboarding
- Library first - designed for easily embedding in other Go apps, similar to BoltDB
- Authentication/View/Show/Validate/Filter functions are implemented in the same process via:
- Closures - for experienced Go devs - simliar to BoltDB's transaction model
- Compiled Go plugins - for non-devs - allow easy sharing of Poki Apps within community without writing code
- Multiple pluggable interfaces based upon core Go API
- Protobuf based gRPC interface
- JSON based HTTP2 interface
- Replication protocol
- available for reuse through libraries - e.g. allows writing adapters that can perform backups

### Poki: Go Interface Draft

How should this interface look to a Go dev embedding this into a program? Some initial thoughts:

#### Documents

A document is a collection of bytes with the following attributes:

- ID (database unique) - user provided (or database generated) UUID
- Revision - a crypto hash of the data
- content type - is this JSON, PNG, HTML, JSON, Protobuf? Useful to serve resources.

```go
package poki

type ContentType string

type Document struct {
ID string
Rev []byte
Data []byte
Type ContentType
}
```

#### Databases

A database is a sequential collection of revisions. It is possible to iterate over every single document revision in order. This is what makes replication possible: a time series log of documents.

```go
db, err := NewServer("/home/poki").Get("users")
If err != nil { return err }

err = db.ForEachRev(func(d poki.Doc) error {
fmt.Printf("Doc: %+v", d)
return nil
})
```

You can also iterate over every document (which implicitly only iterates over the last revision for each ID) with `ForEachDoc`.

#### Servers are composed of DB's

A server is essentially a collection of databases and the access controls to them. Servers control access to databases through the use of a pluggable authentication modules.

```go
type Auther interface {
Authenticate() (UserContext, error)
}

srv := NewServer("/home/poki")
srv.SetAuth(
```

16 changes: 6 additions & 10 deletions _layouts/list.html
Original file line number Diff line number Diff line change
@@ -1,16 +1,12 @@
---
layout: base
---
<ul>
{% for post in site.posts %}
<li>
<a href="{{ post.url }}">{{ post.title }}</a>
<p>{{ post.date | date: "%B %-d, %Y" }}</p>
<p>
{{ post.excerpt | remove: '<p>' | remove: '</p>' }}
</p>
<p>{% for tag in post.tags %}{{ tag }}{% unless forloop.last %} • {% endunless %}{% endfor %}{% if forloop.last %}<br/>{% endif %}</p>
</li>
<div>
<h2 class="h2-underline"><a href="{{ post.url }}">{{ post.title }}</a></h2>
<p>{{ post.date | date: "%B %-d, %Y" }}</p>
<p>{{ post.excerpt | remove: '<p>' | remove: '</p>' }}</p>
<p>{% for tag in post.tags %}{{ tag }}{% unless forloop.last %} • {% endunless %}{% endfor %}{% if forloop.last %}<br/>{% endif %}</p>
</div>
{% endfor %}
</ul>
{% include footer.html %}
142 changes: 142 additions & 0 deletions _posts/2018-02-25-simple-golang-strategies.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
---
layout: post
title: Simple Golang Strategies for Repetitive Error Handling
tags: [golang, error handling]
---
Error handling in Golang can sometimes suck. You may sometimes start to feel a bit of pain in your wrist from all of the `if err != nil {` you've been typing. Even if you have that code handled by an efficient macro, you're going to end up with a file that looks like this:

```go
if err := setStuffUp(); err != nil {
return err
}
if err := doSumptin(); err != nil {
return err
}
val, err := doDis()
if err != nil {
return err
}
if err := doDat(); err != nil {
return err
}
```

How are we going to reduce the error handling boilerplate without introducing silly constructs like functors and monads that don't scale well in non-generics Go?

## Helper Functions

The simplest solution to reducing error boilerplate is to use helper functions. The simplest example is logging an error if it occurs:

```go
func logIfErr(err error) {
if err != nil {
log.Printf("error occurred: %s", err)
}
}
```

This is a great helper function for things like `defer logIfErr(f.Close())` where we don't care about returning the error so much but we still would like to report that it's happening. This is very useful when using the `errcheck` linter.

## Table Pattern

Sometimes we are executing repetitive code where each unit has the potential to return an error:

```go
var err error
readContents := struct {
f1 []byte
f2 []byte
f3 []byte
}{}
readContents.f1, err = ioutil.ReadAll(f1)
if err != nil {
return fmt.Errorf("could not read f1 due to: %s", err)
}
readContents.f2, err = ioutil.ReadAll(f2)
if err != nil {
return fmt.Errorf("could not read f2 due to: %s", err)
}
readContents.f3, err = ioutil.ReadAll(f3)
if err != nil {
return fmt.Errorf("could not read f3 due to: %s", err)
}
```
*[See above snippet on Go Playground](https://play.golang.org/p/rwPaEjGSSbL)*

Because the contents of each read operation is being assigned to a unique field of readContents, we aren't able to generalize this pattern as much as we want without using reflection (ew yuck). We also can't put each of these operations into a function without having to repetively handling each error in both the helper function and the caller. This is where we recognize that the repetitive calls fit the very similar table driven test pattern:

```go
var err error
readContents := struct {
f1 []byte
f2 []byte
f3 []byte
}{}
for _, readOp := range []struct {
contents *[]byte
source io.Reader
}{
{
contents: &readContents.f1,
source: f1,
},
{
contents: &readContents.f2,
source: f2,
},
{
contents: &readContents.f3,
source: f3,
},
} {
*readOp.contents, err = ioutil.ReadAll(readOp.source)
if err != nil {
return fmt.Errorf("could not read due to: %s", err)
}
}
return nil
```
*[See above snippet on Go Playground](https://play.golang.org/p/cUXh4ABv8uT)*

## Panic & Recover

An alternative to the table driven approach is to use helper functions that panic.

An important Go idiom is that panics should always be recovered from so that the user of a library or program is never exposed to an unhandled panic.
Another Go idiom is to name functions that panic with a `must` prefix. This communicates that this function must succeed, or a panic will be fired.

```go
readContents := struct {
f1 []byte
f2 []byte
f3 []byte
}{}
mustRead := func(dst *[]byte, src io.Reader) {
var err error
*dst, err = ioutil.ReadAll(src)
if err != nil {
panic(fmt.Errorf("could not read due to: %s", err))
}
}
err := func() (err error) {
defer func() {
e := recover()

if e == nil {
// no panic error
return
}
if e, ok := e.(error); ok {
// err panic occurred
err = e
}
}()
mustRead(&readContents.f1, f1)
mustRead(&readContents.f2, f2)
mustRead(&readContents.f3, f3)
return
}
```
*[See the above snippet on the Go Playground](https://play.golang.org/p/h362GbAJCZa)*

Note the usage of a named return, so that the deferred statement may modify the return value. Make sure to use a linting tool (e.g. govet) to identify shadowing of the `err` variable.

0 comments on commit 171ada9

Please sign in to comment.