One framework to begin with, which pulls together a lot of interesting ideas here, is the Boom hierarchy.
It is a family of structures generated by the presence/absence of 4 properties, giving 2⁴ = 16 structures.
Write about how they can be classified as { monoids, semigroups, trees, non-empty trees }
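As a placeholder until the diagram exists, here is a minimal Haskell sketch (type and function names are mine, not standard) of the four classic Boom members as successive quotients of unstructured trees; the monoid vs. semigroup / non-empty axis corresponds to keeping or dropping the empty element.

```haskell
import Data.List (sort)
import qualified Data.Set as Set

-- No laws imposed: binary trees with an empty tree and singleton leaves.
data Tree a = Empty | Leaf a | Fork (Tree a) (Tree a)

-- Imposing associativity on Fork gives lists; adding commutativity gives
-- bags (modelled here as sorted lists); adding idempotency gives sets.
toList :: Tree a -> [a]
toList Empty      = []
toList (Leaf a)   = [a]
toList (Fork l r) = toList l ++ toList r

toBag :: Ord a => Tree a -> [a]
toBag = sort . toList

toSet :: Ord a => Tree a -> Set.Set a
toSet = Set.fromList . toList
```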
TODO: Find out what the best structures to characterize the trees and their variants (idempotent, non-empty) are.
TODO: Insert a diagram on this here.
Also, show how some are forgetful and some are free.
Talk about the idea of adjunctions.
Probably not the right place to expand on Galois theory, but a broad-strokes overview might help here. Just keep the bits salient for computation; I can then point to resources that elaborate on it.
Abstract interpretation; the Noson Yanofsky / Rohit Parikh work.
Discuss the free vs. forgetful distinction in category theory.
Talk about why ++ is a free structure while Last or +, * are not.
Discuss the practical implications of having a free structure in programming.
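A minimal Haskell sketch of the practical payoff (the helper names are mine): because lists are the free monoid, a single generic traversal can be reinterpreted in any monoid later, whereas a non-free structure like + has already collapsed the information.

```haskell
import Data.Monoid (Product (..), Sum (..))

-- Because ([a], ++, []) is the free monoid on a, a homomorphism out of it
-- is determined entirely by where the generators (singletons) go:
hom :: Monoid m => (a -> m) -> [a] -> m
hom f = foldr (\a acc -> f a <> acc) mempty   -- i.e. foldMap

-- One specification, many interpretations: pick the target monoid later.
total, prod :: [Int] -> Int
total = getSum     . hom Sum       -- interpret in (+, 0)
prod  = getProduct . hom Product   -- interpret in (*, 1)
```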
Discuss the properties: associativity, commutativity, distributivity, idempotency, reflexivity, symmetry, transitivity.
TODO: Describe how logic can be used to capture the various enumerative combinatorics of these structures.
Quantifying structures using logical operators and another approach using generating functions.
TODO: Tie in groupoidification, combinatorial species, generatingfunctionology, orthogonal polynomials, analytic functors, and the twelvefold way once you investigate them more deeply. Keep only the contextually relevant details when describing them: we are using them to understand computation and how they relate to program derivation.
These are nice structures. They can be thought of as giving a principled way to track the evolution of an entity along multiple paths. By the nature of their axiomatization, each stage in this evolution is tracked so that the commonalities between the branches of evolution converge to common points and then diverge out again from there. This is the process interpretation of lattices. Under the structure interpretation, the best way to understand them is as closures and kernels of the problem domain.
TODO: I have to understand the idea of closures/kernels, implication/conjunction, and their counterparts in algebraic geometry and how it connects up with the composition of the lattice.
There are at least two dualities in play here:
Intersection / Union duality
Chain / Antichain duality
TODO: Describe how they are related to partial orders.
How closure and kernel operators on these structures generate the transitive closure and transitive reduction. What does this mean in terms of logic?
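As a concrete anchor for the closure half, here is a sketch I'm supplying (not from any source): transitive closure of a relation computed as a fixpoint, which exhibits the three closure-operator laws (inflationary, monotone, idempotent).

```haskell
import qualified Data.Set as Set

type Rel a = Set.Set (a, a)

-- Transitive closure as a closure operator: r is contained in the result,
-- the map is monotone, and applying it twice changes nothing.
transClose :: Ord a => Rel a -> Rel a
transClose r
  | r' == r   = r
  | otherwise = transClose r'
  where
    r' = Set.union r $ Set.fromList
           [ (x, z) | (x, y)  <- Set.toList r
                    , (y', z) <- Set.toList r
                    , y == y' ]
```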
What role does the lattice play?
How did the idea of lattices emerge?
Dualgruppen idea
Link with how splitting fields and group extensions idea gave birth to Galois theory
TODO: Investigate why Peirce was after grounding some kind of duality using lattice-like structures.
TODO: Investigate how it connects with Schröder's work.
TODO: Link with the history papers here.
TODO: Link with the work of Dedekind and how he leveraged the idea of ideals/filters and identified the concept of PID.
Ideals/Filters
Link to articles like that of Sowa and ISKO
TODO: Think about how lattices are connected to quantifiers and quantifier elimination.
TODO: Talk about how they are related to sheaves/fibres/sections/stacks etc.
Talk about how adjunctions between lattices give you closure/kernel operators, and how this is mirrored by monads and comonads.
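A toy instance of this, supplied by me as an assumption about where the exposition could go: the Galois connection (*k) ⊣ (`div` k) on the naturals, one of whose composites is a kernel (interior) operator.

```haskell
-- For k > 0 on (Nat, <=):  n * k <= m  <=>  n <= m `div` k,
-- so (*k) is left adjoint to (`div` k). The composite below is a kernel
-- operator: deflationary, monotone, idempotent. The other composite,
-- (n * k) `div` k, is the identity.
kernelOp :: Int -> Int -> Int
kernelOp k m = (m `div` k) * k   -- round m down to a multiple of k

-- kernelOp 4 10 == 8;  kernelOp 4 8 == 8 (idempotent);  kernelOp 4 m <= m
```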
How lattices connect up with the idea of eigenvalue decomposition and extended Euclidean algorithm
Briefly mention the connection between Fourier transforms, Bézout's theorem, the Euclidean GCD, singular value decomposition, Krohn-Rhodes theory, eigenvalue decomposition, etc.
https://twitter.com/LdaFCosta/status/1454520031132459008 https://sridharramesh.github.io/HowSridharThinks/math/BezoutEtc.html
With formal concept analysis, you get a way to partition the problem space so that the components are linearly independent. You attempt to coordinatize the space so that your axes, like eigenvectors, span the whole space; when you decompose it into subcomponents, the intertwining is reduced, and what you get out are orthogonal components which you can linearly recombine to recover the original space. They are in some sense the necessary and sufficient concepts to describe the space parsimoniously: an equivalence class with maximal information, i.e. the least amount of redundancy (entropy?) needed to describe the original space. It also prevents a skewed coordinate system, which might cause two things to overlap on one idea or, conversely, one thing to appear as two ideas.
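A minimal sketch of the derivation operators (representation and names are mine): the two maps form a Galois connection between object sets and attribute sets, and their composite is the closure operator whose fixpoints are the formal concepts.

```haskell
import Data.List (nub)

-- A formal context is an incidence relation between objects and attributes.
type Context obj attr = [(obj, attr)]

-- Attributes shared by every object in the given set.
attrsOf :: (Eq obj, Eq attr) => Context obj attr -> [obj] -> [attr]
attrsOf ctx objs =
  nub [ a | (_, a) <- ctx, all (\o -> (o, a) `elem` ctx) objs ]

-- Objects possessing every attribute in the given set.
objsOf :: (Eq obj, Eq attr) => Context obj attr -> [attr] -> [obj]
objsOf ctx attrs =
  nub [ o | (o, _) <- ctx, all (\a -> (o, a) `elem` ctx) attrs ]

-- The composite is a closure operator; its fixpoints are the extents of
-- the formal concepts, which assemble into the concept lattice.
closeObjs :: (Eq obj, Eq attr) => Context obj attr -> [obj] -> [obj]
closeObjs ctx = objsOf ctx . attrsOf ctx
```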
Mention the duality of structure and process from Anna Sfard's paper and how it connects with the idea of right and left folds having time/space tradeoffs.
Eigenvalues describing the variance in a distribution
Covariance/correlation of distributions
Lattice homomorphisms
Sets ~ Distributive? lattices
Ordered sets ~ Matrices
Multisets ~ Modular? lattices
Lists ~ Hypermatrices
One of my realizations lately has been that matrices model functions: https://twitter.com/prathyvsh/status/1459936336182452233 So I think matrices represent ordered sets in the Boom hierarchy, because you have a certain sense of direction, with a domain mapping to a codomain. With lattices, what you are representing are the multisets/sets, which can have different overlaps/elisions within them. I will gladly elaborate on this if you have any doubts. But I think I
Monotonicity/continuity
Discuss the two schools: Eindhoven / Oxford.
Talk about their differences.
Talk about the general approach in a schematic manner once you investigate it.
TODO: Read that Backhouse paper and investigate how Eindhoven style relates to Oxford style
Stalactite vs. Stalagmite distinction in quotients vs. subgroups
Injectivity vs. Surjectivity
Two aspects of Homomorphism
How this is also reflected in the canonical decomposition of a function and in formal concept analysis style disjunction
Richard Bird Oxford, 16th and 17th December 1986
A calculus used for deriving efficient solutions to certain kinds of computational problems. A set of laws, lemmas, and theorems in the same sense as integral calculus. It does not stand for a formal system with axioms and inference rules. Semantic and foundational issues are not touched upon.
The theory of expression trees is waiting to be organized.
Program specification / transformation
General laws which enable programs to be calculated.
Richard Bird has been dissatisfied with the lack of penetrating results, such as those of integral calculus that scientists and engineers use daily.
Only two players: Richard Bird and Lambert Meertens
Dijkstra, Gries, Backhouse: program calculation with invariant assertions.
The difference with the Eindhoven school is one of style, not of objective.
The Eindhoven school uses imperative notation, uses predicate calculus as the main tool, and presents many of the results using arrays.
Bird-Meertens: functional notation, a specialized functional calculus, and lists considered as a more basic data structure than arrays.
First Lecture
Notation suggested by David Turner
Second Lecture
First example of a calculational proof and some further notation
Third Lecture
Problems about segments of lists
Fourth Lecture
Problems about partitions of lists
Fifth Lecture
More material and further examples
Lists: ordered and homogeneous
[] ['a'] ['a','p','p','l','e']
a, b, c, … denote elements of lists
x, y, z, … denote lists
xs, ys, zs, … denote lists of lists
#[a₁, a₂, …, aₙ] = n
++ :: [α] × [α] → [α]
Associativity: (x ++ y) ++ z = x ++ (y ++ z)
Identity: x ++ [] = [] ++ x = x
# (x ++ y) = # x + # y { #, ++ distribution }
f * [a₁, a₂, …, aₙ] = [f a₁, f a₂, …, f aₙ]
f * ( x ++ y ) = ( f * x ) ++ ( f * y ) { *, ++ distributivity }
(f • g) * = (f * ) • ( g * ) { *, • dist }
( f * )⁻¹ = ( f⁻¹ * ) { *, ⁻¹ comm }
Let ⊕ :: (α × β) → γ
(a ⊕) :: β → γ | (a ⊕) b = a ⊕ b
(⊕ b) :: α → γ | (⊕ b) a = a ⊕ b
(f *) is the map-f function
(+ 1) is the successor function
(++ [a]) is the append-[a] function
Function application is left-associative and has highest precedence
f x y + 3 = ((f x) y) + 3
◁ :: (α → Bool) × [α] → [α]
p ◁ x is the sublist of elements of x satisfying p
even ◁ [1 .. 10] = [2, 4, 6, 8, 10]
p ◁ ( x ++ y) = (p ◁ x) ++ (p ◁ y) {◁, ++ dist }
(p ◁) • (p ◁) = (p ◁) { ◁ idem }
(p ◁) • (q ◁) = (q ◁) • (p ◁) { ◁ comm } (for total functions)
(p ◁) • (f *) = (f *) • ((p • f) ◁) { ◁, • comm }
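These laws transcribe directly to Haskell, reading f* as map f, p◁ as filter p, and • as (.). A quick property sketch, assuming the QuickCheck library and instantiating at Int with succ and even (my choices):

```haskell
import Test.QuickCheck

prop_filterDist :: [Int] -> [Int] -> Bool
prop_filterDist x y =                      -- { ◁, ++ dist }
  filter even (x ++ y) == filter even x ++ filter even y

prop_filterIdem :: [Int] -> Bool
prop_filterIdem x =                        -- { ◁ idem }
  (filter even . filter even) x == filter even x

prop_filterComm :: [Int] -> Bool
prop_filterComm x =                        -- { ◁ comm }
  (filter even . filter (> 0)) x == (filter (> 0) . filter even) x

prop_filterMapComm :: [Int] -> Bool
prop_filterMapComm x =                     -- { ◁, • comm }
  (filter even . map succ) x == (map succ . filter (even . succ)) x

main :: IO ()
main = mapM_ quickCheck
  [ property prop_filterDist, property prop_filterIdem
  , property prop_filterComm, property prop_filterMapComm ]
```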
8 Reduce
Borrowed from APL
An operator which takes an operator:
/ :: (α × α → α) × [α] → α
⊕ / [a₁, a₂, …, aₙ] = a₁ ⊕ a₂ ⊕ … ⊕ aₙ
Only defined if ⊕ is associative.
Laws
⊕ / [a] = a { / singletons }
⊕ / (x ++ y) = (⊕ / x) ⊕ (⊕ / y) { / dist }
If ⊕ has an identity element e, then ⊕ / [] = e; otherwise, ⊕ / [] is not defined.
⊕ / y = ⊕ / ( [] ++ y ) = (⊕ / [] ) ⊕ (⊕ / y) = e ⊕ (⊕/y)
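In Haskell terms (a sketch of the correspondence, not from the notes): ⊕/ on non-empty lists is foldr1, and an identity element extends it to all lists as foldr.

```haskell
-- Reduce without an identity: partial, undefined on [].
reduce :: (a -> a -> a) -> [a] -> a
reduce op = foldr1 op

-- Reduce with identity e: total, provided e really is the identity of op.
reduceE :: (a -> a -> a) -> a -> [a] -> a
reduceE op e = foldr op e
```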
9 Examples
sum = + /
product = × /
n! = × / [1 .. n]
flatten = ++ /
flatten [[1, 2], [], [2, 3]] = [1, 2, 2, 3]
min = ↓ /
max = ↑ /
head = << /
last = >> /
all p = (∧ /) • (p *)
some p = (∨ /) • (p *)
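The same examples in Haskell (a sketch; primed names avoid Prelude clashes, and << / >> are rendered as const and flip const since those symbols are taken):

```haskell
sum', product' :: Num a => [a] -> a
sum'     = foldr1 (+)
product' = foldr1 (*)

flatten :: [[a]] -> [a]
flatten = foldr1 (++)            -- with identity []: foldr (++) []

min', max' :: Ord a => [a] -> a
min' = foldr1 min
max' = foldr1 max

head', last' :: [a] -> a
head' = foldr1 const             -- a << b = a
last' = foldr1 (flip const)      -- a >> b = b

all', some' :: (a -> Bool) -> [a] -> Bool
all'  p = foldr1 (&&) . map p
some' p = foldr1 (||) . map p
```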
10 Promotion Lemmas
Generalize the distribution laws of map, filter, reduce
(f *) • (++ /) = (++ /) • ((f *) *) { * promotion }
Mapping f over a flattened list is the same as flattening after mapping (f *) over the list of lists.
f * (++ / [x₁, x₂, …, xₙ])
= f * (x₁ ++ x₂ ++ … ++ xₙ)
= (f * x₁) ++ (f * x₂) ++ … ++ (f * xₙ)
= ++ / [f * x₁, f * x₂, …, f * xₙ]
= ++ / ((f *) * [x₁, x₂, …, xₙ])
Rather than flattening first and then mapping, promote the map into each component list and then flatten the result.
(p ◁) • (++ /) = (++ /) • ((p ◁) *) { ◁ promotion }
(⊕ /) • (++ /) = (⊕ /) • ((⊕ /) *) { / promotion }
11 Homomorphisms
A function that maps the list structure ([], ++) onto another structure with identity e and associative operator ⊕:
h [] = e
h (x ++ y) = h x ⊕ h y
Equivalently: h • (++ /) = (⊕ /) • (h *)
12 Homomorphism Lemma
h is a homomorphism iff h = (⊕ / ) • (f *) for some ⊕ and f.
Proof
Suppose h = (⊕ /) • (f *). Then
h • (++ /)
= (⊕ /) • (f *) • (++ /) { hypothesis }
= (⊕ /) • (++ /) • ((f *) *) { * promotion }
= (⊕ /) • ((⊕ /) *) • ((f *) *) { / promotion }
= (⊕ /) • (((⊕ /) • (f *)) *) { *, • dist }
= (⊕ /) • (h *) { hypothesis }
Second, define □ a = [a]
so (++ /) • (□ *) = id
Now
h = h • (++ /) • (□ *) { since (++ /) • (□ *) = id }
= (⊕ /) • (h *) • (□ *) { h is a homomorphism }
= (⊕ /) • (f *) { *, • dist }
where f = h • □ Hence h = (⊕ /) • (f *) for suitable ⊕ and f.
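A concrete instance of the lemma (my example, not from the notes): length is a homomorphism onto (+, 0), since length [] = 0 and length (x ++ y) = length x + length y, and it factors as (⊕/) • (f*) with f = const 1.

```haskell
-- length as (⊕/) • (f*), with ⊕ = (+), e = 0, f a = 1.
length' :: [a] -> Int
length' = foldr (+) 0 . map (const 1)
```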
Filter is a homomorphism:
(p ◁) = (++ /) • (f_p *)
where f_p a = [a], if p a
            = [], otherwise
K is the K combinator from combinatory logic.
sort = (merge /) • (□ *)
reverse = (++~ /) • (□ *) where a ⊕~ b = b ⊕ a
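Both factorizations transcribe to Haskell (a sketch; □ is (:[]) and ++~ is flip (++)):

```haskell
-- sort = (merge/) • (□*): wrap each element as a singleton, then reduce
-- with merge, which is associative with unit [].
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

sort' :: Ord a => [a] -> [a]
sort' = foldr merge [] . map (:[])

-- reverse = (++~/) • (□*): reduce with the converse of ++.
reverse' :: [a] -> [a]
reverse' = foldr (flip (++)) [] . map (:[])
```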
a ⊕ b = h(h⁻¹ a ++ h⁻¹ b)
Then h (x ++ y) = h x ⊕ h y
Try and solve a problem by looking for a homomorphism
Text = [Char]
Line = [Char \ {NL}]
unlines :: [Line]⁺ → Text
unlines = ⊕ / where x ⊕ y = x ++ [NL] ++ y
lines is an injective function
lines :: Text → [Line]
lines • unlines = id
Problem: give a constructive definition of lines
Since lines is an injective function (intuitively at least)
lines = (⊗ / ) • (f *)
Direct calculation yields:
f a = [[], []], if a = NL
    = [[a]], otherwise
(xs ++ [x]) ⊗ ([y] ++ ys) = xs ++ [x ++ y] ++ ys
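Transcribed to Haskell (a sketch; glue is my name for ⊗, and it assumes non-empty fragment lists, which map f always produces):

```haskell
-- (xs ++ [x]) ⊗ ([y] ++ ys) = xs ++ [x ++ y] ++ ys
glue :: [String] -> [String] -> [String]
glue xs ys = init xs ++ [last xs ++ head ys] ++ tail ys

-- lines = (⊗/) • (f*); the seed [[]] is the unit of glue.
lines' :: String -> [String]
lines' = foldr glue [[]] . map f
  where
    f '\n' = [[], []]   -- NL becomes a break
    f c    = [[c]]      -- any other char becomes a one-char fragment

-- lines' "ab\ncd" == ["ab", "cd"]
```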
Word = [Char \ {SP, NL}]⁺
Para = [Line⁺]⁺
unwords :: [Word]⁺ → Line
unwords = ⊕_SP / where x ⊕_SP y = x ++ [SP] ++ y
words :: Line → [Word]
words = ((≠ []) ◁) • (⊗ /) • (f_SP *)
unparas :: [Para]⁺ → [Line]
unparas = ⊕_[] / where x ⊕_[] y = x ++ [[]] ++ y
paras :: [Line] → [Para]
paras = ((≠ []) ◁) • (⊗ /) • (f_[] *)
countlines = # • lines
countwords = # • (++ / ) • (words * ) • lines
countparas = # • paras • lines
normalise :: Text → Text
normalise = unparse • parse
parse = ((words *) *) • paras • lines
unparse = unlines • unparas • ((unwords *) *)
Unparse is correct because
(f • g)⁻¹ = g⁻¹ • f⁻¹
(f *)⁻¹ = (f⁻¹ *)
←/- is right-reduce
-/→ is left-reduce
(⊕ ←/- e) [a₁, a₂, …, aₙ] = a₁ ⊕ (a₂ ⊕ (… (aₙ ⊕ e)))
The ⊕ need not be associative and doesn't need to have a unit.
←/- :: ((α × β → β) × β) → [α] → β
-/→ :: ((β × α → β) × β) → [α] → β
(⊕ -/→ e) [a₁, a₂, …, aₙ] = (((e ⊕ a₁) ⊕ a₂) ⊕ …) ⊕ aₙ
Why do we need more reductions? Because they are implementations of the fold function and are closer to what can be achieved by machines.
(f *) = (⊕ ←/- []) where a ⊕ x = [f a] ++ x
(⊕ -/→ e) = (⊕~ ←/- e) • reverse
Example:
fact n = × / [1 .. n]
fact n = (× -/→ 1) [1 .. n] "going up"
fact n = (× ←/- 1) [1 .. n]
= (×~ -/→ 1) [n, n-1 .. 1]
= (× -/→ 1) [n, n-1 .. 1] "going down"
Every homomorphism can be written as a left-reduction or as a right-reduction.
(⊙ /) • (f *) = (⊕ ←/- e) = (⊗ -/→ e)
where a ⊕ b = f a ⊙ b
      a ⊗ b = a ⊙ f b
and e is the identity element of ⊙.
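In Haskell (a sketch of the specialisation above): the map is fused into either reduction, and all three forms agree when e is the identity of ⊙.

```haskell
-- Homomorphism (⊙/) • (f*) as a right-reduction: a ⊕ b = f a ⊙ b.
homAsFoldr :: (b -> b -> b) -> b -> (a -> b) -> [a] -> b
homAsFoldr odot e f = foldr (\a b -> f a `odot` b) e

-- The same homomorphism as a left-reduction: a ⊗ b = a ⊙ f b.
homAsFoldl :: (b -> b -> b) -> b -> (a -> b) -> [a] -> b
homAsFoldl odot e f = foldl (\b a -> b `odot` f a) e
```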
(:) :: α × [α] → [α]
a : x = [a] ++ x
x ++ y = (: ←/- y) x
Either : or ++ can be taken as primitive, but unlike ++, every list can be expressed/constructed in terms of [] and : in exactly one way.
Cons has a canonical form, ++ has many ways in which the same thing can be expressed.
(⊕ ←/- e) [] = e
(⊕ ←/- e) ( [a] ++ x ) = a ⊕ (⊕ ←/- e) x
We can say f = (⊕ ←/- e) is the solution of:
f [] = e
f ([a] ++ x) = a ⊕ f x
Progress of computation is recursive.
Similarly,
(⊕ -/→ e) [] = e
(⊕ -/→ e) (x ++ [a]) = (⊕ -/→ e) x ⊕ a
But also:
(⊕ -/→ e) [] = e
(⊕ -/→ e) ([a] ++ x) = (⊕ -/→ (e ⊕ a)) x
So f = (⊕ -/→ e) is the solution of:
f x = g e x
g e [] = e
g e ([a] ++ x) = g (e ⊕ a) x
So left-reductions have the advantage that they give an iterative notion of programming. They are immediately expressible in terms of imperative notions:
y := (⊕ -/→ e) x ⟹ y := e; for a ← x do y := y ⊕ a
In a functional programming language, ←/- can be more time-efficient than -/→, while -/→ can be more space-efficient than ←/-.
Recall a << b = a and a >> b = b.
Consider (<< ←/- e) [1, 2, 3].
(⊕ ←/- e) ([a] ++ x) = a ⊕ (⊕ ←/- e) x
(⊕ -/→ e) ([a] ++ x) = (⊕ -/→ (e ⊕ a)) x
(<< ←/- e) [1, 2, 3]
= 1 << (<< ←/- e) [2, 3] (←/-.2)
= 1 (<<.1)
This can terminate after one step.
(>> -/→ e) [3, 2, 1]
= (>> -/→ (e >> 3)) [2, 1] (-/→.2)
= (>> -/→ 3) [2, 1] (>>.1)
= (>> -/→ (3 >> 2)) [1] (-/→.2)
= (>> -/→ 2) [1] (>>.1)
= (>> -/→ (2 >> 1)) [] (-/→.2)
= (>> -/→ 1) [] (>>.1)
= 1 (-/→.1)
The whole list must be traversed.
(+ ←/- 0) [1, 2, 3]
= 1 + (+ ←/- 0) [2, 3]
= 1 + (2 + (+ ←/- 0) [3])
= 1 + (2 + (3 + (+ ←/- 0) []))
= 1 + (2 + (3 + 0))
= 6
Linear space. Same size as the list that we started with.
(+ -/→ 0) [1, 2, 3]
= (+ -/→ (0 + 1)) [2, 3] { we can evaluate the accumulator now, keeping the expression constant-sized }
= (+ -/→ 1) [2, 3]
= (+ -/→ (1 + 2)) [3]
= (+ -/→ 3) [3]
= (+ -/→ (3 + 3)) []
= (+ -/→ 6) []
= 6
Constant space.
Conclusion
Use (⊕ -/→ e) when ⊕ is strict, in the sense that it requires the evaluation of both arguments to return the result, e.g. +, ×, ↑, ↓.
Use (⊕ ←/- e) when ⊕ is non-strict, that is, when it does not always demand the complete evaluation of both the left and right arguments to return the result, e.g. ∧, ∨, <<.
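The same advice in Haskell terms (a sketch): a strict left fold for strict operators, foldr for operators lazy in their second argument.

```haskell
import Data.List (foldl')

-- Strict ⊕: accumulate eagerly, running in constant space.
total :: [Int] -> Int
total = foldl' (+) 0

-- Lazy ⊕: foldr lets (||) stop at the first True, even on long lists.
anyTrue :: [Bool] -> Bool
anyTrue = foldr (||) False
```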