fix line endings and dprint fmt
simonsan committed Aug 17, 2023
1 parent 2ddfd97 commit 73cbec0
Showing 1 changed file with 94 additions and 43 deletions.
137 changes: 94 additions & 43 deletions src/functional/lenses.md
@@ -13,13 +13,14 @@ simpler.

To explain the relevant parts of the concept, the Serde API will be used as an
example, as it is one that is difficult for many to understand from simply
the API documentation.

In the process, different specific patterns, called Optics, will be covered.
These are _The Iso_, _The Poly Iso_, and _The Prism_.

## An API Example: Serde

Trying to understand the way _Serde_ works by only reading the API is a
challenge, especially the first time.
Consider the `Deserializer` trait, implemented by any library
which parses a new data format:
@@ -62,25 +63,41 @@ pub trait Visitor<'de>: Sized {
}
```

There is a lot of type erasure going on here, with multiple levels of associated
types being passed back and forth.

What is the big picture? Why not just have the `Visitor` return the pieces the
caller needs in a streaming API, and call it a day? Why all the extra pieces?

One way to understand it is to look at a concept from functional languages
called _optics_.

This is a way to compose behavior and properties that is designed to
facilitate patterns common to Rust: failure, type transformation, etc.[^1]

The Rust language does not have very good support for these directly.
However, they appear in the design of the language itself, and their concepts
can help to understand some of Rust's APIs.
As a result, this attempts to explain the concepts with the way Rust does it.

This will perhaps shed light on what those APIs are achieving: specific
properties of composability.

## Basic Optics

### The Iso

The Iso is a value transformer between two types. It is extremely simple, but a
conceptually important building block.

As an example, suppose that we have a custom hash table structure used as a
concordance for a document.[^2] It uses strings for keys (words) and a list of
indexes for values (file offsets, for instance).

A key feature is the ability to serialize this format to disk.
A "quick and dirty" approach would be to implement a conversion to and
from a string in JSON format. (Errors are ignored for the time being; they
will be handled later.)

To write it in a normal form expected by functional language users:

@@ -91,7 +108,8 @@ case class ConcordanceSerDe {
}
```

The Iso is thus a pair of functions which convert values of different types:
`serialize` and `deserialize`.

A straightforward implementation:

@@ -116,17 +134,21 @@ impl ConcordanceSerde {
}
```
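
The defining property of an Iso is that the two functions undo each other. A minimal, std-only sketch of that round trip follows. The `Concordance` layout, the `BTreeMap` backing, and the toy `word=1,2,3` line format (used in place of JSON to avoid external crates) are all assumptions made for illustration, not the implementation above.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the concordance: word -> file offsets.
#[derive(Debug, Default, PartialEq)]
struct Concordance {
    entries: BTreeMap<String, Vec<usize>>,
}

/// The Iso: a pair of functions between `Concordance` and `String`.
struct ConcordanceSerde;

impl ConcordanceSerde {
    /// Writes one `word=1,2,3` line per entry (toy format, not JSON).
    fn serialize(value: &Concordance) -> String {
        value
            .entries
            .iter()
            .map(|(word, offsets)| {
                let offsets: Vec<String> = offsets.iter().map(|o| o.to_string()).collect();
                format!("{}={}", word, offsets.join(","))
            })
            .collect::<Vec<String>>()
            .join("\n")
    }

    /// Inverse of `serialize`. Malformed pieces are silently skipped,
    /// because errors are ignored at this stage.
    fn deserialize(input: &str) -> Concordance {
        let mut entries: BTreeMap<String, Vec<usize>> = BTreeMap::new();
        for line in input.lines() {
            if let Some((word, offsets)) = line.split_once('=') {
                let offsets: Vec<usize> = offsets
                    .split(',')
                    .filter_map(|o| o.parse::<usize>().ok())
                    .collect();
                entries.insert(word.to_string(), offsets);
            }
        }
        Concordance { entries }
    }
}
```

Calling `ConcordanceSerde::deserialize` on the output of `ConcordanceSerde::serialize` should give back an equal `Concordance`, provided no word contains `=` or a newline (a limitation of the toy format only).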

This may seem rather silly. In Rust, this type of behavior is typically done
with traits. After all, the standard library has `FromStr` and `ToString` in it!

But that is where our next subject comes in: Poly Isos.

### Poly Isos

The previous example was simply converting between values of two fixed types.
This next block builds upon it with generics, and is more interesting.

It is a Poly Iso. This allows an operation to be generic over any type while
returning a single type.

This brings us closer to parsing. Consider what a basic parser would do,
ignoring error cases. Again, here it is in normal form:

```text
case class Serde[T] {
@@ -137,7 +159,8 @@ case class Serde[T] {

Here we have our first generic, the type `T` being converted.

In Rust, this could be implemented with a pair of traits in the standard library:
`FromStr` and `ToString`. The Rust version even handles errors:

```rust,ignore
pub trait FromStr: Sized {
@@ -151,9 +174,11 @@ pub trait ToString {
}
```
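
To make the "generic over any type" part concrete, here is a small sketch. `Point` and `round_trip` are hypothetical names introduced only for illustration; `FromStr`, `ToString`, and `Display` are the standard library traits.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical example type: a 2D point written as `x,y`.
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl fmt::Display for Point {
    // Implementing `Display` provides `ToString` through a blanket impl.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{},{}", self.x, self.y)
    }
}

impl FromStr for Point {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let (x, y) = s
            .split_once(',')
            .ok_or_else(|| format!("missing comma in {:?}", s))?;
        let x = x.trim().parse::<i32>().map_err(|e| format!("bad x: {}", e))?;
        let y = y.trim().parse::<i32>().map_err(|e| format!("bad y: {}", e))?;
        Ok(Point { x, y })
    }
}

/// One function, generic over every `T` that supplies both directions.
fn round_trip<T>(value: &T) -> Result<T, T::Err>
where
    T: FromStr + ToString,
{
    value.to_string().parse::<T>()
}
```

The same `round_trip` works for `Point` and for built-in types such as `i32`, which is the sense in which the pair of traits is polymorphic.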

Unlike the Iso, the Poly Iso allows application of multiple types, and returns
them generically. This is what you would want for a basic string parser.

At first glance, this seems like a good option for writing a parser.
Let's see it in action:

```rust,ignore
use anyhow;
@@ -186,13 +211,22 @@ fn main() {

That seems quite logical. However, there are two problems with this.

First, `to_string` is not a very good way to explain "this is JSON." Every type
would need to agree on a JSON representation, and many of the types in the Rust
standard library already don't.
Using this is a poor fit. This can easily be resolved with our own trait.
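
Such a trait might look like the following sketch. `ToJson` and `to_json` are hypothetical names, not part of any library, and the string escaping is deliberately incomplete.

```rust
/// Hypothetical trait: states explicitly that the output is JSON.
trait ToJson {
    fn to_json(&self) -> String;
}

impl ToJson for i64 {
    fn to_json(&self) -> String {
        self.to_string()
    }
}

impl ToJson for bool {
    fn to_json(&self) -> String {
        self.to_string()
    }
}

impl ToJson for String {
    fn to_json(&self) -> String {
        // Escapes only quotes and backslashes; a real encoder must also
        // handle control characters.
        let mut out = String::from("\"");
        for c in self.chars() {
            match c {
                '"' => out.push_str("\\\""),
                '\\' => out.push_str("\\\\"),
                other => out.push(other),
            }
        }
        out.push('"');
        out
    }
}

impl<T: ToJson> ToJson for Vec<T> {
    fn to_json(&self) -> String {
        let items: Vec<String> = self.iter().map(|item| item.to_json()).collect();
        format!("[{}]", items.join(","))
    }
}
```

Note how `String::from("hi").to_string()` yields `hi`, while `to_json` yields `"hi"` with the quotes JSON requires. Every impl here is still written by hand.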

But there is a second, subtler problem: scaling.

When every type writes `to_string` by hand, this works. But if every single
person who wants their type to be serializable has to write a bunch of code,
possibly with different JSON libraries, to do it themselves, it will turn into
a mess very quickly!

The answer is one of Serde's two key innovations: an independent data model to
represent Rust data in structures common to data serialization languages.
The result is that it can use Rust's code generation abilities to create an
intermediary conversion type it calls a `Visitor`.

This means, in normal form (again, skipping error handling for simplicity):

@@ -208,7 +242,8 @@ case class Visitor[T] {
}
```

The result is one Poly Iso and one Iso (respectively).
Both of these can be implemented with traits:

```rust
trait Serde {
@@ -223,7 +258,9 @@ trait Visitor {
}
```

Because there is a uniform set of rules to transform Rust structures to the
independent form, it is even possible to have code generation create the
`Visitor` associated with type `T`:

```rust,ignore
#[derive(Default, Serde)] // the "Serde" derive creates the trait impl block
@@ -238,7 +275,6 @@ generate_visitor!(TestStruct);

Or do they?


```rust,ignore
fn main() {
let a = TestStruct { a: 5, b: "hello".to_string() };
@@ -249,13 +285,17 @@ fn main() {
}
```

It turns out that the conversion isn't symmetric after all! On paper it is, but
with the auto-generated code the name of the actual type necessary to convert
all the way from `String` is hidden. We'd need some kind of
`generated_visitor_for!` macro to obtain the type name.

It's wonky, but it works... until we get to the elephant in the room.

The only format currently supported is JSON. How would we support more formats?

The current design requires _completely re-writing all of the code generation
and creating a new Serde trait_. That is quite terrible and not extensible at all!

In order to solve that, we need something more powerful.

@@ -270,14 +310,21 @@ case class Serde[T, F] {
}
```

This construct is called a Prism. It is "one level higher" in generics than Poly
Isos (in this case, the "intersecting" type F is the key).

Unfortunately, because `Visitor` is a trait (since each incarnation requires its
own custom code), this would require a kind of generic type boundary that Rust
does not support.

Fortunately, we still have that `Visitor` type from before.
What is the `Visitor` doing? It is attempting to allow each data structure to
define the way it is itself parsed.

Well, what if we could add one more interface for the generic format?
Then the `Visitor` is just an implementation detail, and it would "bridge" the
two APIs.

In normal form:

@@ -298,13 +345,19 @@ case class SerdeFormat[T, V] {
}
```

And what do you know, a pair of Poly Isos at the bottom which can be implemented
as traits!

Thus we have the Serde API:

1. Each type to be serialized implements `Deserialize` or `Serialize`, equivalent
   to the `Serde` class.
1. They get a type (well two, one for each direction) implementing the `Visitor`
   trait, which is usually (but not always) produced by macro-generated code.
   This contains the logic to construct or destruct between the data type and the
   format of the Serde data model.
1. The type implementing the `Deserializer` trait handles all details specific
   to the format, being "driven by" the `Visitor`.

This splitting and Rust type erasure is really to achieve a Prism through indirection.
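
The three layers can be sketched in a heavily reduced, std-only form. This is not Serde's actual API: the real traits carry a `'de` lifetime, an associated error type, and many more `visit_*` and `deserialize_*` methods. The two toy formats (`JsonScalar` and `Tagged`) and the `String` error type are assumptions made for illustration.

```rust
/// Receives values expressed in the (tiny) format-independent data model.
trait Visitor: Sized {
    type Value;
    fn visit_i64(self, _v: i64) -> Result<Self::Value, String> {
        Err("unexpected integer".to_string())
    }
    fn visit_str(self, _v: &str) -> Result<Self::Value, String> {
        Err("unexpected string".to_string())
    }
}

/// Bottom layer: implemented once per data format.
trait Deserializer {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, String>;
}

/// Top layer: implemented once per data type.
trait Deserialize: Sized {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String>;
}

// Per-type code: a visitor that only accepts integers.
struct I64Visitor;
impl Visitor for I64Visitor {
    type Value = i64;
    fn visit_i64(self, v: i64) -> Result<Self::Value, String> {
        Ok(v)
    }
}
impl Deserialize for i64 {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String> {
        deserializer.deserialize_any(I64Visitor)
    }
}

// Per-type code: a visitor that only accepts strings.
struct StringVisitor;
impl Visitor for StringVisitor {
    type Value = String;
    fn visit_str(self, v: &str) -> Result<Self::Value, String> {
        Ok(v.to_owned())
    }
}
impl Deserialize for String {
    fn deserialize<D: Deserializer>(deserializer: D) -> Result<Self, String> {
        deserializer.deserialize_any(StringVisitor)
    }
}

// Per-format code: bare integers or double-quoted strings.
struct JsonScalar<'a>(&'a str);
impl<'a> Deserializer for JsonScalar<'a> {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        let text = self.0.trim();
        match text.strip_prefix('"').and_then(|t| t.strip_suffix('"')) {
            Some(inner) => visitor.visit_str(inner),
            None => visitor.visit_i64(text.parse::<i64>().map_err(|e| e.to_string())?),
        }
    }
}

// Per-format code: `i:42` for integers, `s:text` for strings.
struct Tagged<'a>(&'a str);
impl<'a> Deserializer for Tagged<'a> {
    fn deserialize_any<V: Visitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.0.split_once(':') {
            Some(("i", rest)) => visitor.visit_i64(rest.parse::<i64>().map_err(|e| e.to_string())?),
            Some(("s", rest)) => visitor.visit_str(rest),
            _ => Err(format!("unrecognized input: {:?}", self.0)),
        }
    }
}
```

Two types and two formats give four working combinations, such as `<i64 as Deserialize>::deserialize(Tagged("i:42"))`, without any code written per pair. Adding a third format means one more `Deserializer` impl and no changes to the types.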

@@ -373,7 +426,7 @@ How does actual Serde deserialize a bit of JSON into `struct Concordance` from e

For our very simple structure above, the expected pattern would be:

1. Begin visiting a map (_Serde_'s equivalent to `HashMap` or JSON's dictionary).
1. Visit a string key called "keys".
1. Begin visiting a map value.
1. For each item, visit a string key then an integer value.
@@ -393,7 +446,7 @@ reflection of each type based on the type itself.
Rust does not support that, so every single type would need to have its own
code written based on its fields and their properties.

_Serde_ solves this usability challenge with a derive macro:

```rust,ignore
use serde::Deserialize;
@@ -419,8 +472,7 @@ The `deserialize` code will then create a `Visitor` which will have its calls
If everything goes well, eventually that `Visitor` will construct a value
corresponding to the type being parsed and return it.

For a complete example, see the [_Serde_ documentation](https://serde.rs/deserialize-struct.html).

The result is that types to be deserialized only implement the "top layer" of
the API, and file formats only need to implement the "bottom layer".
@@ -442,8 +494,7 @@ But it may also need procedural macros to create bridges for its generics.
- [luminance](https://github.com/phaazon/luminance-rs) is a crate for drawing
computer graphics that uses similar API design, including procedural macros to
create full prisms for buffers of different pixel types that remain generic
- [An Article about Lenses in Scala](https://web.archive.org/web/20221128185849/https://medium.com/zyseme-technology/functional-references-lens-and-other-optics-in-scala-e5f7e2fdafe)
that is very readable even without Scala expertise.
- [Paper: Profunctor Optics: Modular Data
Accessors](https://web.archive.org/web/20220701102832/https://arxiv.org/ftp/arxiv/papers/1703/1703.10857.pdf)