Standards #4

Open
wants to merge 4 commits into
base: main

kozross
Member

kozross commented Dec 17, 2024

Closes #1. This is the thread for feedback on these.

@itsfarseen
Collaborator

LANGUAGE pragmata / Justification

The phrase “signposting”

Is this extension something that needs ‘signposting’, in that it involves additional thought or context that wouldn’t be needed otherwise?

It’s used in the following contexts:

Having this (DeriveTraversable) globally enabled makes this inconsistency slightly less visible, and due to Traversable’s lawfulness, is completely safe. While this merely provides a derivation for a lawful Traversable, rather than the lawful Traversable, this is still useful and requires no signposting.

While technically, only DerivingStrategies would be sufficient for our requirements, since DerivingVia is not mandatory and is clearly signposted, while having no effects beyond its use sites, we enable it globally for its usefulness.

Enabling MultiParamTypeClasses globally is practically a necessity given all of these, and is clear enough that it doesn’t need signposting.

StandaloneDeriving, while not being needed often, is quite useful when using via-derivations with complex constraints, such as those driven by type families, or for GADTs. This can pose some syntactic difficulties (especially with via derivations), but the extension is not problematic in and of itself, as it doesn’t really change how the language works, and is self-signposting.

When you say something needs signposting, do you mean we need to put a comment explaining why we are using it at the use-sites?

What do you mean by "signposted" in contrast to "needs signposting"?

What do you mean by self-signposting?

“self telegraphing”

TypeApplications is so widely used that it would likely be enabled everywhere anyway. Modern Haskell APIs now prefer to take type arguments directly, rather than passing a Proxy or an undefined (as was done before). Furthermore, TypeApplications is self-telegraphing, and poses no problems for us, as we consider type arguments and their order to be part of the API, and require explicit foralls and signatures.

What is “self telegraphing”?

DerivingVia, DerivingStrategies

DerivingVia provides two benefits. Firstly, it implies DerivingStrategies, which is good practice to use (and in fact, is required by this document): this avoids ambiguities between different derivations, and makes the intent of a derivation clear on immediate reading. This reduces the amount of non-local information about derivation priorities. Secondly, DerivingVia enables considerable savings in boilerplate in combination with other extensions that we enable either directly or by implication.
While technically, only DerivingStrategies would be sufficient for our requirements, since DerivingVia is not mandatory and is clearly signposted, while having no effects beyond its use sites, we enable it globally for its usefulness.

When you say “DerivingStrategies is required by this document”, is it because of -Werror -Wmissing-deriving-strategies?

“Since DerivingVia is not mandatory ..”

Do you mean in contrast with DerivingStrategies?

“.. and is clearly signposted, ..”

“ .. while having no effects beyond its use sites, we enable it globally for its usefulness.”

Do you mean “we enable it because it’s useful and its effects are limited to its use sites”?

NoFieldSelectors

DuplicateRecordFields and NoFieldSelectors are part of the constellation of extensions designed to address the deficiencies of Haskell2010 records. In particular, the first of these allows defining two records in the same scope with identically-named fields:

-- Won't work without DuplicateRecordFields
data Foo = Foo {
    bar :: Int,
    baz :: Int
    }

data Bar = Bar {
    bar :: Int,
    baz :: Int
    }

The second of these allows us to completely replace field accessors with optics (as described in the 'Records' section) without losing construction syntax, while also avoiding the many pitfalls record field accessors have.

What pitfalls are you referring to?

I am not very well versed in this area, but I’ve felt that optics is a fairly heavy dependency. I thought that it’s primarily used in application code that deals with large structured data like an API response that corresponds to a JSON structure.

Is the Haskell built-in record access really that bad to make us use optics from the get go?

Answered by the Records section of the document.

EmptyCase

EmptyCase resolves an inconsistency in Haskell2010, as the report allows us to define an empty data type (that is, one with no constructors), but not pattern match on it. This should really be in the language, and enabling this globally resolves a strange inconsistency in the language at no real cost.

How does pattern matching on an empty data type work?

TypeFamilies, UndecidableInstances

TypeFamilies and UndecidableInstances are enabled as part of our records solution. These are enabled globally for the same reason DuplicateRecordFields is: we'd need them almost everywhere anyway.

Is this because the optics library needs them?

Records

Field selectors 'clash' with local bindings of the same name. This is even worse when you deal with module re-exports or qualified imports; this problem can even arise within a module which imports no clashes, even if the types would give an unambiguous solution.

Ah I’ve felt this pain so many times. I think this is why people use some kind of a prefix, like in runFoo or getFoo or fooField

Stability of interface with field selectors is basically impossible if you want to change representations. Record patterns help, but their interactions with many record-related extensions are brittle and confusing, if they work at all.

Do you mean, we shouldn’t have to rewrite too much code when we split a large record into smaller nested records?

.. when all we want to do is use records in a way even the C programming language doesn’t create complications for.

Amen.

..

foo .^ bar ..^ baz ^? quux %!~ jazz

Although the operators are consistent, and their meaning is possible to 'read off' with practice, to someone less-trained, this reads like line noise.

Yes! Thanks for recognizing this : )

Furthermore, this is entirely unnecessary: each of these operators has a corresponding 'wordy' version (view for .^, preview for ^?, etc). These are much clearer in their use, and also give an indication of what's happening that's much more readable to someone not deeply familiar with optics.

This will make my life a lot easier, as a beginner to optics.

Versioning

A project MUST use the [PVP](https://pvp.haskell.org/). Three, and only three, version numbers MUST be used: a major version and two minor versions. The first version MUST be 1.0.0.

Are we going to increment the version number as each PR is merged, or are we going to do that only for the public releases?

let versus where

For function-local definitions, if we require re-use of bindings outside of the function's arguments, or, in the case of type class methods, type class instance type variable bindings, let MUST be used. Otherwise, if the definition needs to be used in other where-bindings, or if the definition is a function, where MUST be used. Otherwise, let MUST be used.

This is a bit hard to follow. Is the following correct?

  1. If bindings outside function args needed || type class instance type variables are needed

    then use let

    else use where

  2. If the definition is a function || needs to be used in other where definitions

    then use where

    else use let

Which of these two rules has priority?

What do you mean by “bindings outside of function args”? Do you mean if we use any top level bindings, it must be a let? If yes, why?

When you say “type class instance type variable bindings”, do you mean a in instance Foo (Maybe a)? Or do you mean associated types in a type class?

Could you add a few examples to demonstrate these rules?

Rule 2 makes complete sense.

Other

Lists SHOULD NOT be field values of types; this extends to Strings. Instead, Vectors (Texts) SHOULD be used, unless a more appropriate structure exists. We allow exceptions for newtypes over lists or Strings.

We exempt newtypes, as they don't actually change the runtime representation of either type (and are useful for avoiding orphan instances).

The justification here is not clear.

We are avoiding String/List due to it being a linked list and thus inefficient, right? How does a newtype over it make things better?

Does having List/String inside a data type prevent certain GHC optimizations, which are not affected if we wrap it in a newtype?

@kozross
Member Author

kozross commented Jan 16, 2025

@itsfarseen - thanks for the feedback! I will address these in turn.

Signposting and telegraphing

'Signposting' and 'telegraphing' both mean 'having to indicate that something is going on behind the scenes'. Usually, we use this in the context of 'having to do stuff outside the code to show something is happening'. A classic example of something requiring signposting by us is PolyKinds: it is very easy to have code stop (or start) compiling with changes to PolyKinds being on or off. Thus, if we use it, we have to watch out for this in a way that's hard to notice in the code itself: we don't change kind signatures for PolyKinds, just the inference algorithm.

For a counter-example, consider DerivingVia, as when you use the extension (rather than just enable it), you have to write things differently. Consider the difference between:

newtype Foo = Foo Word64
    deriving newtype (Eq, Ord)

versus

newtype Foo = Foo Word64
    deriving (Eq, Ord) via Word64

Here, we see a difference every time it is used. I will rewrite this section to be more clear.

DerivingStrategies and being mandatory

DerivingStrategies is indeed essentially mandatory, because we have -Wmissing-deriving-strategies on. Even just writing deriving stock (Show) cannot be done if we don't have DerivingStrategies enabled, and thus, having to enable it on a per-module basis would just be line noise. Furthermore, we can clearly see its use and meaning, as this is different to regular, non-explicit-strategy, derivations. I'm happy to reword it if that would be clearer.
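
As a minimal sketch of what explicit strategies look like in practice (the `Meters` type is made up for illustration and is not part of the standards document), each deriving clause names how its instances are produced:

```haskell
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# OPTIONS_GHC -Wmissing-deriving-strategies #-}

import Data.Monoid (Sum (..))

-- Hypothetical type, used only to illustrate the three strategies.
newtype Meters = Meters Int
  -- stock: the compiler's built-in derivation, so Show mentions the constructor
  deriving stock (Show)
  -- newtype: reuse the instances of the underlying Int
  deriving newtype (Eq, Ord)
  -- via: borrow the instances of another type with the same representation
  deriving (Semigroup, Monoid) via (Sum Int)
```

With the warning enabled, a bare `deriving (Show)` here would be flagged, whereas each clause above states its strategy at the use site.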

How to use empty data types

Empty data types are essentially meant to be 'exciting' as Conor McBride calls them: it should be absolutely impossible to ever have one, and they're often used as 'contradiction proof objects'. However, you cannot, without EmptyCase, exhaustively match on them. This is an odd omission, even though it comes up rarely, and having EmptyCase on allows us to be consistent.
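
A small sketch of what this looks like, assuming a made-up empty type `Impossible` (the names here are illustrative only):

```haskell
{-# LANGUAGE EmptyCase #-}

-- A hypothetical type with no constructors: no (terminating) value of it exists.
data Impossible

-- With EmptyCase, we can match exhaustively on it using zero alternatives.
-- Since there are no cases to handle, we may promise any result type.
absurdly :: Impossible -> a
absurdly x = case x of {}

-- Usage: the Left branch can never happen, and the types let us say so.
fromRightOnly :: Either Impossible a -> a
fromRightOnly = either absurdly id
```

Without EmptyCase, the `case x of {}` expression is a syntax error, so `absurdly` would have to be written with `error` or `undefined` instead.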

Why TypeFamilies and UndecidableInstances are required

TypeFamilies and UndecidableInstances are indeed needed for record optics. The first of these is required because any given field of a record might have a different optic type, and thus we must associate the appropriate type with the appropriate field by name. The second is required because multi-parameter type classes, even with functional dependencies, almost always fall afoul of the Paterson conditions, despite actually being completely decidable. In order to have unambiguous field instances, we need to have three parameters: a type of record, a label (as a Symbol) and a field type, with the first two determining the last, thus running afoul of this problem.
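
A rough sketch of the shape of the problem, using a hypothetical class `HasFieldGetter` with plain getter functions rather than real optics (this is not the class the optics library defines, just an illustration of the three-parameter pattern):

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- Hypothetical class: the label and the record type determine the field type.
class HasFieldGetter (label :: Symbol) (record :: Type) (field :: Type)
  | label record -> field where
  getFieldOf :: record -> field

data Person = Person String Int

instance HasFieldGetter "name" Person String where
  getFieldOf (Person n _) = n

instance HasFieldGetter "age" Person Int where
  getFieldOf (Person _ a) = a

-- A wrapper that forwards field lookups to the record inside it.
newtype Audited r = Audited r

-- The field type here is only determined through the instance context,
-- so GHC rejects this instance unless UndecidableInstances is enabled,
-- even though resolution clearly terminates.
instance HasFieldGetter label r field => HasFieldGetter label (Audited r) field where
  getFieldOf (Audited r) = getFieldOf @label r
```

Here `getFieldOf @"name" (Audited (Person "Ada" 36))` resolves through the forwarding instance to the `Person` one.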

How we will use PVP version numbers

Version numbers will only change for Hackage releases. We'll keep main stable, and have WIP stuff on a separate branch, probably staging.

Records

When I refer to 'interface stability', the idea here is that a record's representation (how its data is laid out in memory) and its API (how we can manipulate this data by reading and modification) should be kept cleanly separated, such that we can change the representation without having to change the API to go with it. In Haskell 2010, a record's representation is its API: changing the first forces us to also change the second. Optics allow us to separate the two; one of the changes this makes possible, as you noted, is splitting up a record into smaller records. However, it can go much further than this. For example, consider if we wanted to make a type representing ASCII text. Initially, let's say we chose the representation as a pair of [Word8] and Int (for its length so we don't have to keep recomputing it):

data ASCIIText = ASCIIText (Int, [Word8])

We then provided two optics to work with it, which form (a key part of) its API:

  1. A Getter for the length, called len (as we don't want the length to be arbitrarily changeable); and
  2. An IndexedFold for the actual bytes, which we call contents; if modified, it would also adjust the length appropriately.

Now, suppose later, we realized that our representation was horribly inefficient, and decided to change the representation to ByteArray:

newtype ASCIIText = ASCIIText ByteArray

We can keep the exact same optics in place, with the exact same functionality, without anyone having to change anything downstream from us. This is not difficult with optics, but practically impossible with Haskell 2010 records without a lot of work.

let versus where

The 'let' versus 'where' rules could do with some rewriting. Basically, the problem tends to come up when we have functions being bound to names, rather than non-function values. This is because we can write either this:

foo x = let f y = doSomething x y
   in f someOtherThing

or this:

foo x = f someOtherThing
   where
      f :: ...
      f y = doSomething x y

However, if we want to re-use a previous let binding, we can't use both:

-- this works
foo x = let y = doSomething x
            f z = doSomeOtherThing y z
         in f aRandomOtherThing
         
-- but this doesn't, because the where can't 'see' y
foo x = let y = doSomething x 
          in f aRandomOtherThing
   where
      f :: ... 
      f z = doSomeOtherThing y z -- y is free here

Essentially, the rules boil down to this:

  1. Use let for non-function values, unless these values are needed for multiple where binds, in which case where is fine.
  2. Use where for functions, unless those functions need let-bound values (of any kind), in which case let is fine.

Thus, option 2 from your list is the right interpretation. I'll rewrite this whole section to be clearer. What I meant with the 'type class instances' mention I have no idea, and will remove it.
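
To make the two rules concrete, here is a small self-contained sketch (the function names are invented for this example, and the final wording in the document may differ):

```haskell
-- Rule 1: a non-function value is bound with let.
scaledSum :: Int -> [Int] -> Int
scaledSum k xs =
  let total = sum xs
   in k * total

-- Rule 2: a local function is bound with where, with a signature.
countAbove :: Int -> [Int] -> Int
countAbove limit xs = length (filter isAbove xs)
  where
    isAbove :: Int -> Bool
    isAbove x = x > limit

-- Exception to rule 2: the helper needs the let-bound 'mean',
-- which a where clause could not see, so the helper is let-bound too.
countAboveMean :: [Int] -> Int
countAboveMean xs =
  let mean = sum xs `div` max 1 (length xs)
      isAbove x = x > mean
   in length (filter isAbove xs)
```

In `countAbove`, the helper only needs the function's arguments, so `where` works; in `countAboveMean`, moving `isAbove` into a `where` would leave `mean` out of scope.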

List/String newtypes

I think that was a holdover from a previous version that no longer makes sense here. I'll remove it.
