(** * References: Typing Mutable References *)
(** Up to this point, we have considered a variety of _pure_
language features, including functional abstraction, basic types
such as numbers and booleans, and structured types such as records
and variants. These features form the backbone of most
programming languages -- including purely functional languages
such as Haskell and "mostly functional" languages such as ML, as
well as imperative languages such as C and object-oriented
languages such as Java, C[#], and Scala.
However, most practical languages also include various _impure_
features that cannot be described in the simple semantic framework
we have used so far. In particular, besides just yielding
results, computation in these languages may assign to mutable
variables (reference cells, arrays, mutable record fields, etc.);
perform input and output to files, displays, or network
connections; make non-local transfers of control via exceptions,
jumps, or continuations; engage in inter-process synchronization
and communication; and so on. In the literature on programming
languages, such "side effects" of computation are collectively
referred to as _computational effects_.
In this chapter, we'll see how one sort of computational effect --
mutable references -- can be added to the calculi we have studied.
The main extension will be dealing explicitly with a _store_ (or
_heap_) and _pointers_ that name store locations. This extension
is fairly straightforward to define; the most interesting part is
the refinement we need to make to the statement of the type
preservation theorem. *)
Require Import Coq.Arith.Arith.
Require Import Coq.omega.Omega.
Require Import Coq.Lists.List.
Import ListNotations.
Require Import Maps.
Require Import Smallstep.
(* ################################################################# *)
(** * Definitions *)
(** Pretty much every programming language provides some form of
assignment operation that changes the contents of a previously
allocated piece of storage. (Coq's internal language Gallina is a
rare exception!)
In some languages -- notably ML and its relatives -- the
mechanisms for name-binding and those for assignment are kept
separate. We can have a variable [x] whose _value_ is the number
[5], or we can have a variable [y] whose value is a
_reference_ (or _pointer_) to a mutable cell whose current
contents is [5]. These are different things, and the difference
is visible to the programmer. We can add [x] to another number,
but not assign to it. We can use [y] to assign a new value to the
cell that it points to (by writing [y:=84]), but we cannot use [y]
directly as an argument to an operation like [+]. Instead, we
must explicitly _dereference_ it, writing [!y] to obtain its
current contents.
In most other languages -- in particular, in all members of the C
family, including Java -- _every_ variable name refers to a
mutable cell, and the operation of dereferencing a variable to
obtain its current contents is implicit.
For purposes of formal study, it is useful to keep these
mechanisms separate. The development in this chapter will closely
follow ML's model. Applying the lessons learned here to C-like
languages is a straightforward matter of collapsing some
distinctions and rendering some operations such as dereferencing
implicit instead of explicit. *)
(* ################################################################# *)
(** * Syntax *)
(** In this chapter, we study adding mutable references to the
simply-typed lambda calculus with natural numbers. *)
Module STLCRef.
(** The basic operations on references are _allocation_,
_dereferencing_, and _assignment_.
- To allocate a reference, we use the [ref] operator, providing
an initial value for the new cell. For example, [ref 5]
creates a new cell containing the value [5], and reduces to
a reference to that cell.
- To read the current value of this cell, we use the
dereferencing operator [!]; for example, [!(ref 5)] reduces
to [5].
- To change the value stored in a cell, we use the assignment
operator. If [r] is a reference, [r := 7] will store the
value [7] in the cell referenced by [r]. *)
(* ----------------------------------------------------------------- *)
(** *** Types *)
(** We start with the simply typed lambda calculus over the
natural numbers. Besides the base natural number type and arrow
types, we need to add two more types to deal with
references. First, we need the _unit type_, which we will use as
the result type of an assignment operation. We then add
_reference types_. *)
(** If [T] is a type, then [Ref T] is the type of references to
cells holding values of type [T].
      T ::= Nat
          | Unit
          | T -> T
          | Ref T
*)
Inductive ty : Type :=
| TNat : ty
| TUnit : ty
| TArrow : ty -> ty -> ty
| TRef : ty -> ty.
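
(** As a small illustration, here are two concrete reference types
    written with these constructors: [Ref Nat], the type of references
    to cells holding numbers, and [Ref (Nat->Nat)], the type of
    references to cells holding functions on numbers.  (The module and
    definition names are ours, chosen only for this sketch.) *)

Module TySketch.

Example ref_nat : ty := TRef TNat.

Example ref_nat_to_nat : ty := TRef (TArrow TNat TNat).

End TySketch.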
(* ----------------------------------------------------------------- *)
(** *** Terms *)
(** Besides variables, abstractions, applications,
natural-number-related terms, and [unit], we need four more sorts
of terms in order to handle mutable references:
      t ::= ...              Terms
          | ref t              allocation
          | !t                 dereference
          | t := t             assignment
          | l                  location
*)
Inductive tm : Type :=
(* STLC with numbers: *)
| tvar : id -> tm
| tapp : tm -> tm -> tm
| tabs : id -> ty -> tm -> tm
| tnat : nat -> tm
| tsucc : tm -> tm
| tpred : tm -> tm
| tmult : tm -> tm -> tm
| tif0 : tm -> tm -> tm -> tm
(* New terms: *)
| tunit : tm
| tref : tm -> tm
| tderef : tm -> tm
| tassign : tm -> tm -> tm
| tloc : nat -> tm.
(** Intuitively:
- [ref t] (formally, [tref t]) allocates a new reference cell
with the value [t] and reduces to the location of the newly
allocated cell;
- [!t] (formally, [tderef t]) reduces to the contents of the
cell referenced by [t];
- [t1 := t2] (formally, [tassign t1 t2]) assigns [t2] to the
cell referenced by [t1]; and
- [l] (formally, [tloc l]) is a reference to the cell at
location [l]. We'll discuss locations later. *)
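
(** For instance, the informal terms [ref 5], [!(ref 5)], and [r := 7]
    might be written formally as follows.  This is only a sketch: we
    take [r] to be the variable [Id 1], an arbitrary choice, and the
    module and definition names are ours. *)

Module TmSketch.

Example ref_five : tm := tref (tnat 5).

Example deref_ref_five : tm := tderef (tref (tnat 5)).

Example assign_seven : tm := tassign (tvar (Id 1)) (tnat 7).

End TmSketch.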
(** In informal examples, we'll also freely use the extensions
of the STLC developed in the [MoreStlc] chapter; however, to keep
the proofs small, we won't bother formalizing them again here. (It
would be easy to do so, since there are no very interesting
interactions between those features and references.) *)
(* ----------------------------------------------------------------- *)
(** *** Typing (Preview) *)
(** Informally, the typing rules for allocation, dereferencing, and
assignment will look like this:
                           Gamma |- t1 : T1
                       ------------------------                (T_Ref)
                       Gamma |- ref t1 : Ref T1

                        Gamma |- t1 : Ref T11
                        ---------------------                  (T_Deref)
                          Gamma |- !t1 : T11

                        Gamma |- t1 : Ref T11
                          Gamma |- t2 : T11
                       ------------------------                (T_Assign)
                       Gamma |- t1 := t2 : Unit
The rule for locations will require a bit more machinery, and this
will motivate some changes to the other rules; we'll come back to
this later. *)
(* ----------------------------------------------------------------- *)
(** *** Values and Substitution *)
(** Besides abstractions and numbers, we have two new types of values:
the unit value, and locations. *)
Inductive value : tm -> Prop :=
| v_abs : forall x T t,
value (tabs x T t)
| v_nat : forall n,
value (tnat n)
| v_unit :
value tunit
| v_loc : forall l,
value (tloc l).
Hint Constructors value.
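
(** As a quick sanity check on this definition (a sketch; the names
    are ours): a bare location is a value, whereas an allocation such
    as [ref 5] is not, since no constructor of [value] applies to
    [tref]. *)

Module ValueSketch.

Example loc_is_value : value (tloc 3).
Proof. apply v_loc. Qed.

Example ref_is_not_value : ~ value (tref (tnat 5)).
Proof. intros H. inversion H. Qed.

End ValueSketch.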
(** Extending substitution to handle the new syntax of terms is
straightforward. *)
Fixpoint subst (x:id) (s:tm) (t:tm) : tm :=
match t with
| tvar x' =>
if beq_id x x' then s else t
| tapp t1 t2 =>
tapp (subst x s t1) (subst x s t2)
| tabs x' T t1 =>
if beq_id x x' then t else tabs x' T (subst x s t1)
| tnat n =>
t
| tsucc t1 =>
tsucc (subst x s t1)
| tpred t1 =>
tpred (subst x s t1)
| tmult t1 t2 =>
tmult (subst x s t1) (subst x s t2)
| tif0 t1 t2 t3 =>
tif0 (subst x s t1) (subst x s t2) (subst x s t3)
| tunit =>
t
| tref t1 =>
tref (subst x s t1)
| tderef t1 =>
tderef (subst x s t1)
| tassign t1 t2 =>
tassign (subst x s t1) (subst x s t2)
| tloc _ =>
t
end.
Notation "'[' x ':=' s ']' t" := (subst x s t) (at level 20).
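
(** A few instances of substitution on the new term forms, as a
    sketch.  We call [subst] directly rather than using the bracket
    notation, and the variable [Id 0] and the example names are
    arbitrary choices of ours.  The three examples show substitution
    passing through [tderef], stopping at a binder for the same
    variable, and leaving a location untouched. *)

Module SubstSketch.

Example subst_under_deref :
  subst (Id 0) (tnat 5) (tderef (tvar (Id 0))) = tderef (tnat 5).
Proof. reflexivity. Qed.

Example subst_shadowed :
  subst (Id 0) (tnat 5) (tabs (Id 0) TNat (tvar (Id 0)))
  = tabs (Id 0) TNat (tvar (Id 0)).
Proof. reflexivity. Qed.

Example subst_loc :
  subst (Id 0) (tnat 5) (tloc 2) = tloc 2.
Proof. reflexivity. Qed.

End SubstSketch.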
(* ################################################################# *)
(** * Pragmatics *)
(* ================================================================= *)
(** ** Side Effects and Sequencing *)
(** The fact that we've chosen the result of an assignment
expression to be the trivial value [unit] allows a nice
abbreviation for _sequencing_. For example, we can write
r:=succ(!r); !r
as an abbreviation for
(\x:Unit. !r) (r:=succ(!r)).
This has the effect of reducing two expressions in order and
returning the value of the second. Restricting the type of the
first expression to [Unit] helps the typechecker to catch some
silly errors by permitting us to throw away the first value only
if it is really guaranteed to be trivial.
Notice that, if the second expression is also an assignment, then
the type of the whole sequence will be [Unit], so we can validly
place it to the left of another [;] to build longer sequences of
assignments:
r:=succ(!r); r:=succ(!r); r:=succ(!r); r:=succ(!r); !r
*)
(** Formally, we introduce sequencing as a _derived form_
[tseq] that expands into an abstraction and an application. *)
Definition tseq t1 t2 :=
tapp (tabs (Id 0) TUnit t2) t1.
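
(** To see the abbreviation at work, here is the informal term
    [r:=succ(!r); !r] with [r] taken to be the variable [Id 1] (an
    arbitrary choice for this sketch; the example name is ours).
    Since [tseq] always binds [Id 0], the expansion behaves like
    sequencing only when [Id 0] does not occur free in the second
    term, which is the case here. *)

Module SeqSketch.

Example seq_expands :
  tseq (tassign (tvar (Id 1)) (tsucc (tderef (tvar (Id 1)))))
       (tderef (tvar (Id 1)))
  = tapp (tabs (Id 0) TUnit (tderef (tvar (Id 1))))
         (tassign (tvar (Id 1)) (tsucc (tderef (tvar (Id 1))))).
Proof. reflexivity. Qed.

End SeqSketch.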
(* ================================================================= *)
(** ** References and Aliasing *)
(** It is important to bear in mind the difference between the
_reference_ that is bound to some variable [r] and the _cell_
in the store that is pointed to by this reference.
If we make a copy of [r], for example by binding its value to
another variable [s], what gets copied is only the _reference_,
not the contents of the cell itself.
For example, after reducing
let r = ref 5 in
let s = r in
s := 82;
(!r)+1
the cell referenced by [r] will contain the value [82], while the
result of the whole expression will be [83]. The references [r]
and [s] are said to be _aliases_ for the same cell.
The possibility of aliasing can make programs with references
quite tricky to reason about. For example, the expression
r := 5; r := !s
assigns [5] to [r] and then immediately overwrites it with [s]'s
current value; this has exactly the same effect as the single
assignment
r := !s
_unless_ we happen to do it in a context where [r] and [s] are
aliases for the same cell! *)
(* ================================================================= *)
(** ** Shared State *)
(** Of course, aliasing is also a large part of what makes references
useful. In particular, it allows us to set up "implicit
communication channels" -- shared state -- between different parts
of a program. For example, suppose we define a reference cell and
two functions that manipulate its contents:
let c = ref 0 in
let incc = \_:Unit. (c := succ (!c); !c) in
let decc = \_:Unit. (c := pred (!c); !c) in
...
*)
(** Note that, since their argument types are [Unit], the
arguments to the abstractions in the definitions of [incc] and
[decc] are not providing any useful information to the bodies of
these functions (using the wildcard [_] as the name of the bound
variable is a reminder of this). Instead, the purpose of these
abstractions is to "slow down" the execution of the function
bodies. Since function abstractions are values, the two [let]s are
executed simply by binding these functions to the names [incc] and
[decc], rather than by actually incrementing or decrementing [c].
Later, each call to one of these functions results in its body
being executed once and performing the appropriate mutation on
[c]. Such functions are often called _thunks_.
In the context of these declarations, calling [incc] results in
changes to [c] that can be observed by calling [decc]. For
example, if we replace the [...] with [(incc unit; incc unit; decc
unit)], the result of the whole program will be [1]. *)
(* ================================================================= *)
(** ** Objects *)
(** We can go a step further and write a _function_ that creates [c],
[incc], and [decc], packages [incc] and [decc] together into a
record, and returns this record:
      newcounter =
          \_:Unit.
             let c = ref 0 in
             let incc = \_:Unit. (c := succ (!c); !c) in
             let decc = \_:Unit. (c := pred (!c); !c) in
             {i=incc, d=decc}
*)
(** Now, each time we call [newcounter], we get a new record of
functions that share access to the same storage cell [c]. The
caller of [newcounter] can't get at this storage cell directly,
but can affect it indirectly by calling the two functions. In
other words, we've created a simple form of _object_.
let c1 = newcounter unit in
let c2 = newcounter unit in
// Note that we've allocated two separate storage cells now!
let r1 = c1.i unit in
let r2 = c2.i unit in
r2 // yields 1, not 2!
*)
(** **** Exercise: 1 star (store_draw) *)
(** Draw (on paper) the contents of the store at the point in
execution where the first two [let]s have finished and the third
one is about to begin. *)
(* FILL IN HERE *)
(** [] *)
(* ================================================================= *)
(** ** References to Compound Types *)
(** A reference cell need not contain just a number: the primitives
we've defined above allow us to create references to values of any
type, including functions. For example, we can use references to
functions to give an (inefficient) implementation of arrays
of numbers, as follows. Write [NatArray] for the type
[Ref (Nat->Nat)].
Recall the [equal] function from the [MoreStlc] chapter:
      equal =
        fix
          (\eq:Nat->Nat->Bool.
             \m:Nat. \n:Nat.
               if m=0 then iszero n
               else if n=0 then false
               else eq (pred m) (pred n))
To build a new array, we allocate a reference cell and fill
it with a function that, when given an index, always returns [0].
newarray = \_:Unit. ref (\n:Nat.0)
To look up an element of an array, we simply apply
the function to the desired index.
lookup = \a:NatArray. \n:Nat. (!a) n
The interesting part of the encoding is the [update] function. It
takes an array, an index, and a new value to be stored at that index, and
does its job by creating (and storing in the reference) a new function
that, when it is asked for the value at this very index, returns the new
value that was given to [update], while on all other indices it passes the
lookup to the function that was previously stored in the reference.
      update = \a:NatArray. \m:Nat. \v:Nat.
                   let oldf = !a in
                   a := (\n:Nat. if equal m n then v else oldf n)
References to values containing other references can also be very
useful, allowing us to define data structures such as mutable
lists and trees. *)
(** **** Exercise: 2 stars, recommended (compact_update) *)
(** If we defined [update] more compactly like this
update = \a:NatArray. \m:Nat. \v:Nat.
a := (\n:Nat. if equal m n then v else (!a) n)
would it behave the same? *)
(* FILL IN HERE *)
(** [] *)
(* ================================================================= *)
(** ** Null References *)
(** There is one final significant difference between our
references and C-style mutable variables: in C-like languages,
variables holding pointers into the heap may sometimes have the
value [NULL]. Dereferencing such a "null pointer" is an error,
and results either in a clean exception (Java and C[#]) or in
arbitrary and possibly insecure behavior (C and relatives like
C++). Null pointers cause significant trouble in C-like
languages: the fact that any pointer might be null means that any
dereference operation in the program can potentially fail.
Even in ML-like languages, there are occasionally situations where
we may or may not have a valid pointer in our hands. Fortunately,
there is no need to extend the basic mechanisms of references to
represent such situations: the sum types introduced in the
[MoreStlc] chapter already give us what we need.
First, we can use sums to build an analog of the [option] types
introduced in the [Lists] chapter. Define [Option T] to be an
abbreviation for [Unit + T].
Then a "nullable reference to a [T]" is simply an element of the
type [Option (Ref T)]. *)
(* ================================================================= *)
(** ** Garbage Collection *)
(** A last issue that we should mention before we move on with
formalizing references is storage _de_-allocation. We have not
provided any primitives for freeing reference cells when they are
no longer needed. Instead, like many modern languages (including
ML and Java) we rely on the run-time system to perform _garbage
collection_, automatically identifying and reusing cells that can
no longer be reached by the program.
This is _not_ just a question of taste in language design: it is
extremely difficult to achieve type safety in the presence of an
explicit deallocation operation. One reason for this is the
familiar _dangling reference_ problem: we allocate a cell holding
a number, save a reference to it in some data structure, use it
for a while, then deallocate it and allocate a new cell holding a
boolean, possibly reusing the same storage. Now we can have two
names for the same storage cell -- one with type [Ref Nat] and the
other with type [Ref Bool]. *)
(** **** Exercise: 1 star (type_safety_violation) *)
(** Show how this can lead to a violation of type safety. *)
(* FILL IN HERE *)
(** [] *)
(* ################################################################# *)
(** * Operational Semantics *)
(* ================================================================= *)
(** ** Locations *)
(** The most subtle aspect of the treatment of references
appears when we consider how to formalize their operational
behavior. One way to see why is to ask, "What should be the
_values_ of type [Ref T]?" The crucial observation that we need
to take into account is that reducing a [ref] operator should
_do_ something -- namely, allocate some storage -- and the result
of the operation should be a reference to this storage.
What, then, is a reference?
The run-time store in most programming-language implementations is
essentially just a big array of bytes. The run-time system keeps
track of which parts of this array are currently in use; when we
need to allocate a new reference cell, we allocate a large enough
segment from the free region of the store (4 bytes for integer
cells, 8 bytes for cells storing [Float]s, etc.), record somewhere
that it is being used, and return the index (typically, a 32- or
64-bit integer) of the start of the newly allocated region. These
indices are references.
For present purposes, there is no need to be quite so concrete.
We can think of the store as an array of _values_, rather than an
array of bytes, abstracting away from the different sizes of the
run-time representations of different values. A reference, then,
is simply an index into the store. (If we like, we can even
abstract away from the fact that these indices are numbers, but
for purposes of formalization in Coq it is convenient to use
numbers.) We use the word _location_ instead of _reference_ or
_pointer_ to emphasize this abstract quality.
Treating locations abstractly in this way will prevent us from
modeling the _pointer arithmetic_ found in low-level languages
such as C. This limitation is intentional. While pointer
arithmetic is occasionally very useful, especially for
implementing low-level services such as garbage collectors, it
cannot be tracked by most type systems: knowing that location [n]
in the store contains a [float] doesn't tell us anything useful
about the type of location [n+4]. In C, pointer arithmetic is a
notorious source of type-safety violations. *)
(* ================================================================= *)
(** ** Stores *)
(** Recall that, in the small-step operational semantics for
IMP, the step relation needed to carry along an auxiliary state in
addition to the program being executed. In the same way, once we
have added reference cells to the STLC, our step relation must
carry along a store to keep track of the contents of reference
cells.
We could re-use the same functional representation we used for
states in IMP, but for carrying out the proofs in this chapter it
is actually more convenient to represent a store simply as a
_list_ of values. (The reason we didn't use this representation
before is that, in IMP, a program could modify any location at any
time, so states had to be ready to map _any_ variable to a value.
However, in the STLC with references, the only way to create a
reference cell is with [tref t1], which puts the value of [t1]
in a new reference cell and reduces to the location of the newly
created reference cell. When reducing such an expression, we can
just add a new reference cell to the end of the list representing
the store.) *)
Definition store := list tm.
(** We use [store_lookup n st] to retrieve the value of the reference
cell at location [n] in the store [st]. Note that we must give a
default value to [nth] in case we try looking up an index which is
too large. (In fact, we will never actually do this, but proving
that we don't will require a bit of work.) *)
Definition store_lookup (n:nat) (st:store) :=
nth n st tunit.
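
(** A small sketch of [store_lookup] on a concrete two-cell store (the
    store contents and example names are ours).  An in-range index
    returns the stored value; an out-of-range index falls back to the
    default [tunit] supplied to [nth]. *)

Module LookupSketch.

Example lookup_in_range :
  store_lookup 1 (tnat 3 :: tnat 4 :: nil) = tnat 4.
Proof. reflexivity. Qed.

Example lookup_out_of_range :
  store_lookup 5 (tnat 3 :: tnat 4 :: nil) = tunit.
Proof. reflexivity. Qed.

End LookupSketch.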
(** To update the store, we use the [replace] function, which replaces
the contents of a cell at a particular index. *)
Fixpoint replace {A:Type} (n:nat) (x:A) (l:list A) : list A :=
  match l with
  | nil    => nil
  | h :: t =>
    match n with
    | O    => x :: t
    | S n' => h :: replace n' x t
    end
  end.
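
(** Two concrete uses of [replace], as a sketch (the lists and example
    names are ours).  Replacing at an in-range index changes exactly
    that cell; replacing at an out-of-range index leaves the list
    unchanged, so [replace] never extends a store. *)

Module ReplaceSketch.

Example replace_in_range :
  replace 1 (tnat 9) (tnat 0 :: tnat 1 :: tnat 2 :: nil)
  = tnat 0 :: tnat 9 :: tnat 2 :: nil.
Proof. reflexivity. Qed.

Example replace_out_of_range :
  replace 5 (tnat 9) (tnat 0 :: tnat 1 :: nil)
  = tnat 0 :: tnat 1 :: nil.
Proof. reflexivity. Qed.

End ReplaceSketch.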
(** As might be expected, we will also need some technical
lemmas about [replace]; they are straightforward to prove. *)
Lemma replace_nil : forall A n (x:A),
replace n x nil = nil.
Proof.
destruct n; auto.
Qed.
Lemma length_replace : forall A n x (l:list A),
length (replace n x l) = length l.
Proof with auto.
intros A n x l. generalize dependent n.
induction l; intros n.
destruct n...
destruct n...
simpl. rewrite IHl...
Qed.
Lemma lookup_replace_eq : forall l t st,
l < length st ->
store_lookup l (replace l t st) = t.
Proof with auto.
intros l t st.
unfold store_lookup.
generalize dependent l.
induction st as [|t' st']; intros l Hlen.
- (* st = [] *)
inversion Hlen.
- (* st = t' :: st' *)
destruct l; simpl...
apply IHst'. simpl in Hlen. omega.
Qed.
Lemma lookup_replace_neq : forall l1 l2 t st,
l1 <> l2 ->
store_lookup l1 (replace l2 t st) = store_lookup l1 st.
Proof with auto.
unfold store_lookup.
induction l1 as [|l1']; intros l2 t st Hneq.
- (* l1 = 0 *)
destruct st.
+ (* st = [] *) rewrite replace_nil...
+ (* st = _ :: _ *) destruct l2... contradict Hneq...
- (* l1 = S l1' *)
destruct st as [|t2 st2].
+ (* st = [] *) destruct l2...
+ (* st = t2 :: st2 *)
destruct l2...
simpl; apply IHl1'...
Qed.
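
(** The two lookup lemmas can be tried out on a concrete store, as a
    sketch (the store and example names are ours).  After updating
    cell [1], looking up cell [1] yields the new value, while looking
    up cell [0] gives the same answer as before the update. *)

Module ReplaceLemmaSketch.

Example lookup_updated_cell :
  store_lookup 1 (replace 1 (tnat 9) (tnat 0 :: tnat 1 :: nil)) = tnat 9.
Proof. apply lookup_replace_eq. simpl. omega. Qed.

Example lookup_other_cell :
  store_lookup 0 (replace 1 (tnat 9) (tnat 0 :: tnat 1 :: nil))
  = store_lookup 0 (tnat 0 :: tnat 1 :: nil).
Proof. apply lookup_replace_neq. discriminate. Qed.

End ReplaceLemmaSketch.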
(* ================================================================= *)
(** ** Reduction *)
(** Next, we need to extend the operational semantics to take
stores into account. Since the result of reducing an expression
will in general depend on the contents of the store in which it is
reduced, the reduction rules should take not just a term but
also a store as argument. Furthermore, since the reduction of a
term can cause side effects on the store, and these may affect the
reduction of other terms in the future, the reduction rules need
to return a new store. Thus, the shape of the single-step
reduction relation needs to change from [t ==> t'] to [t / st ==> t' /
st'], where [st] and [st'] are the starting and ending states of
the store.
To carry through this change, we first need to augment all of our
existing reduction rules with stores:
                               value v2
                -------------------------------------- (ST_AppAbs)
                (\x:T.t12) v2 / st ==> [x:=v2]t12 / st

                        t1 / st ==> t1' / st'
                     --------------------------- (ST_App1)
                     t1 t2 / st ==> t1' t2 / st'

                  value v1     t2 / st ==> t2' / st'
                  ---------------------------------- (ST_App2)
                     v1 t2 / st ==> v1 t2' / st'
Note that the first rule here returns the store unchanged, since
function application, in itself, has no side effects. The other
two rules simply propagate side effects from premise to
conclusion.
Now, the result of reducing a [ref] expression will be a fresh
location; this is why we included locations in the syntax of terms
and in the set of values. It is crucial to note that making this
extension to the syntax of terms does not mean that we intend
_programmers_ to write terms involving explicit, concrete locations:
such terms will arise only as intermediate results during reduction.
This may seem odd, but it follows naturally from our design decision
to represent the result of every reduction step by a modified _term_.
If we had chosen a more "machine-like" model, e.g., with an explicit
stack to contain values of bound identifiers, then the idea of adding
locations to the set of allowed values might seem more obvious.
In terms of this expanded syntax, we can state reduction rules
for the new constructs that manipulate locations and the store.
First, to reduce a dereferencing expression [!t1], we begin by
reducing [t1] until it becomes a value:
t1 / st ==> t1' / st'
----------------------- (ST_Deref)
!t1 / st ==> !t1' / st'
Once [t1] has finished reducing, we should have an expression of
the form [!l], where [l] is some location. (A term that attempts
to dereference any other sort of value, such as a function or
[unit], is erroneous, as is a term that tries to dereference a
location that is greater than or equal to the size [|st|] of the
currently allocated store; the reduction rules simply get stuck in this
case. The type-safety properties established below assure us
that well-typed terms will never misbehave in this way.)
l < |st|
---------------------------------- (ST_DerefLoc)
!(loc l) / st ==> lookup l st / st
Next, to reduce an assignment expression [t1:=t2], we must first
reduce [t1] until it becomes a value (a location), and then
reduce [t2] until it becomes a value (of any sort):
                        t1 / st ==> t1' / st'
                 ----------------------------------- (ST_Assign1)
                 t1 := t2 / st ==> t1' := t2 / st'

                        t2 / st ==> t2' / st'
                  --------------------------------- (ST_Assign2)
                  v1 := t2 / st ==> v1 := t2' / st'
Once we have finished with [t1] and [t2], we have an expression of
the form [l:=v2], which we execute by updating the store to make
location [l] contain [v2]:
l < |st|
------------------------------------- (ST_Assign)
loc l := v2 / st ==> unit / [l:=v2]st
The notation [[l:=v2]st] means "the store that maps [l] to [v2]
and maps all other locations to the same thing as [st]."  Note
that the term resulting from this reduction step is just [unit];
the interesting result is the updated store.
Finally, to reduce an expression of the form [ref t1], we first
reduce [t1] until it becomes a value:
t1 / st ==> t1' / st'
----------------------------- (ST_Ref)
ref t1 / st ==> ref t1' / st'
Then, to reduce the [ref] itself, we choose a fresh location at
the end of the current store -- i.e., location [|st|] -- and yield
a new store that extends [st] with the new value [v1].
-------------------------------- (ST_RefValue)
ref v1 / st ==> loc |st| / st,v1
The value resulting from this step is the newly allocated location
itself. (Formally, [st,v1] means [st ++ v1::nil] -- i.e., to add
a new reference cell to the store, we append it to the end.)
Note that these reduction rules do not perform any kind of
garbage collection: we simply allow the store to keep growing
without bound as reduction proceeds. This does not affect the
correctness of the results of reduction (after all, the
definition of "garbage" is precisely parts of the store that are
no longer reachable and so cannot play any further role in
reduction), but it means that a naive implementation of our
evaluator might run out of memory where a more sophisticated
evaluator would be able to continue by reusing locations whose
contents have become garbage.
Here are the rules again, formally: *)
Reserved Notation "t1 '/' st1 '==>' t2 '/' st2"
(at level 40, st1 at level 39, t2 at level 39).
Import ListNotations.
Inductive step : tm * store -> tm * store -> Prop :=
| ST_AppAbs : forall x T t12 v2 st,
value v2 ->
tapp (tabs x T t12) v2 / st ==> [x:=v2]t12 / st
| ST_App1 : forall t1 t1' t2 st st',
t1 / st ==> t1' / st' ->
tapp t1 t2 / st ==> tapp t1' t2 / st'
| ST_App2 : forall v1 t2 t2' st st',
value v1 ->
t2 / st ==> t2' / st' ->
tapp v1 t2 / st ==> tapp v1 t2' / st'
| ST_SuccNat : forall n st,
tsucc (tnat n) / st ==> tnat (S n) / st
| ST_Succ : forall t1 t1' st st',
t1 / st ==> t1' / st' ->
tsucc t1 / st ==> tsucc t1' / st'
| ST_PredNat : forall n st,
tpred (tnat n) / st ==> tnat (pred n) / st
| ST_Pred : forall t1 t1' st st',
t1 / st ==> t1' / st' ->
tpred t1 / st ==> tpred t1' / st'
| ST_MultNats : forall n1 n2 st,
tmult (tnat n1) (tnat n2) / st ==> tnat (mult n1 n2) / st
| ST_Mult1 : forall t1 t2 t1' st st',
t1 / st ==> t1' / st' ->
tmult t1 t2 / st ==> tmult t1' t2 / st'
| ST_Mult2 : forall v1 t2 t2' st st',
value v1 ->
t2 / st ==> t2' / st' ->
tmult v1 t2 / st ==> tmult v1 t2' / st'
| ST_If0 : forall t1 t1' t2 t3 st st',
t1 / st ==> t1' / st' ->
tif0 t1 t2 t3 / st ==> tif0 t1' t2 t3 / st'
| ST_If0_Zero : forall t2 t3 st,
tif0 (tnat 0) t2 t3 / st ==> t2 / st
| ST_If0_Nonzero : forall n t2 t3 st,
tif0 (tnat (S n)) t2 t3 / st ==> t3 / st
| ST_RefValue : forall v1 st,
value v1 ->
tref v1 / st ==> tloc (length st) / (st ++ v1::nil)
| ST_Ref : forall t1 t1' st st',
t1 / st ==> t1' / st' ->
tref t1 / st ==> tref t1' / st'
| ST_DerefLoc : forall st l,
l < length st ->
tderef (tloc l) / st ==> store_lookup l st / st
| ST_Deref : forall t1 t1' st st',
t1 / st ==> t1' / st' ->
tderef t1 / st ==> tderef t1' / st'
| ST_Assign : forall v2 l st,
value v2 ->
l < length st ->
tassign (tloc l) v2 / st ==> tunit / replace l v2 st
| ST_Assign1 : forall t1 t1' t2 st st',
t1 / st ==> t1' / st' ->
tassign t1 t2 / st ==> tassign t1' t2 / st'
| ST_Assign2 : forall v1 t2 t2' st st',
value v1 ->
t2 / st ==> t2' / st' ->
tassign v1 t2 / st ==> tassign v1 t2' / st'
where "t1 '/' st1 '==>' t2 '/' st2" := (step (t1,st1) (t2,st2)).
(** One slightly ugly point should be noted here: In the [ST_RefValue]
rule, we extend the store by writing [st ++ v1::nil] rather than
the more natural [st ++ [v1]]. The reason for this is that the
notation we've defined for substitution uses square brackets,
which clash with the standard library's notation for lists. *)
Hint Constructors step.
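(** To make the store-manipulating rules concrete, here are two small
    single-step examples.  They are illustrative sketches: the example
    names are not part of the original development.  Each proof
    instantiates the rule explicitly, so that [length nil],
    [nil ++ _], and [replace] are simply computed when the rule's
    conclusion is compared with the goal. *)
Example step_ref_value_example :
  tref (tnat 5) / nil ==> tloc 0 / (tnat 5 :: nil).
Proof.
  (* Allocation: the fresh location is [length nil], i.e., [0]. *)
  apply (ST_RefValue (tnat 5) nil).
  apply v_nat.
Qed.
Example step_assign_example :
  tassign (tloc 0) (tnat 7) / (tnat 5 :: nil) ==> tunit / (tnat 7 :: nil).
Proof.
  (* Assignment: the term becomes [tunit]; cell [0] is overwritten. *)
  apply (ST_Assign (tnat 7) 0 (tnat 5 :: nil)).
  - apply v_nat.
  - simpl. apply le_n.
Qed.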
Definition multistep := (multi step).
Notation "t1 '/' st '==>*' t2 '/' st'" :=
(multistep (t1,st) (t2,st'))
(at level 40, st at level 39, t2 at level 39).
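(** A short multi-step example may help to see how the store is
    threaded through a reduction sequence.  This is only a sketch (the
    example name is ours): allocating a cell holding [5] and
    immediately dereferencing it yields [5] and leaves a one-element
    store behind.  It assumes the usual [multi_step] and [multi_refl]
    constructors of [multi]. *)
Example multistep_ref_deref_example :
  tderef (tref (tnat 5)) / nil ==>* tnat 5 / (tnat 5 :: nil).
Proof.
  eapply multi_step.
  - (* First step: reduce under [tderef], allocating location [0]. *)
    apply ST_Deref. apply ST_RefValue. apply v_nat.
  - eapply multi_step.
    + (* Second step: look up location [0] in the extended store. *)
      apply ST_DerefLoc. simpl. apply le_n.
    + (* Nothing left to do once the lookup has been computed. *)
      compute. apply multi_refl.
Qed.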
(* ################################################################# *)
(** * Typing *)
(** The contexts assigning types to free variables are exactly the
same as for the STLC: partial maps from identifiers to types. *)
Definition context := partial_map ty.
(* ================================================================= *)
(** ** Store typings *)
(** Having extended our syntax and reduction rules to accommodate
references, our last job is to write down typing rules for the new
constructs (and, of course, to check that these rules are sound!).
Naturally, the key question is, "What is the type of a location?"
First of all, notice that this question doesn't arise when
typechecking terms that programmers actually
write. Concrete location constants arise only in terms that are
the intermediate results of reduction; they are not in the
language that programmers write. So we only need to determine the
type of a location when we're in the middle of a reduction
sequence, e.g., trying to apply the progress or preservation
lemmas. Thus, even though we normally think of typing as a
_static_ program property, it makes sense for the typing of
locations to depend on the _dynamic_ progress of the program too.
As a first try, note that when we reduce a term containing
concrete locations, the type of the result depends on the contents
of the store that we start with. For example, if we reduce the
term [!(loc 1)] in the store [[unit, unit]], the result is [unit];
if we reduce the same term in the store [[unit, \x:Unit.x]], the
result is [\x:Unit.x]. With respect to the former store, the
location [1] has type [Unit], and with respect to the latter it
has type [Unit->Unit]. This observation leads us immediately to a
first attempt at a typing rule for locations:
Gamma |- lookup l st : T1
----------------------------
Gamma |- loc l : Ref T1
That is, to find the type of a location [l], we look up the
current contents of [l] in the store and calculate the type [T1]
of the contents. The type of the location is then [Ref T1].
Having begun in this way, we need to go a little further to reach a
consistent state. In effect, by making the type of a term depend on
the store, we have changed the typing relation from a three-place
relation (between contexts, terms, and types) to a four-place relation
(between contexts, _stores_, terms, and types). Since the store is,
intuitively, part of the context in which we calculate the type of a
term, let's write this four-place relation with the store to the left
of the turnstile: [Gamma; st |- t : T]. Our rule for typing
locations now has the form
Gamma; st |- lookup l st : T1
--------------------------------
Gamma; st |- loc l : Ref T1
and all the rest of the typing rules in the system are extended
similarly with stores. (The other rules do not need to do anything
interesting with their stores -- just pass them from premise to
conclusion.)
However, this rule will not quite do. For one thing, typechecking
is rather inefficient, since calculating the type of a location [l]
involves calculating the type of the current contents [v] of [l]. If
[l] appears many times in a term [t], we will re-calculate the type of
[v] many times in the course of constructing a typing derivation for
[t]. Worse, if [v] itself contains locations, then we will have to
recalculate _their_ types each time they appear. Worse yet, the
proposed typing rule for locations may not allow us to derive
anything at all, if the store contains a _cycle_. For example,
there is no finite typing derivation for the location [0] with respect
to this store:
[\x:Nat. (!(loc 1)) x, \x:Nat. (!(loc 0)) x]
*)
(** **** Exercise: 2 stars (cyclic_store) *)
(** Can you find a term whose reduction will create this particular
cyclic store? *)
(** [] *)
(** These problems arise from the fact that our proposed
typing rule for locations requires us to recalculate the type of a
location every time we mention it in a term. But this,
intuitively, should not be necessary. After all, when a location
is first created, we know the type of the initial value that we
are storing into it. Suppose we are willing to enforce the
invariant that the type of the value contained in a given location
_never changes_; that is, although we may later store other values
into this location, those other values will always have the same
type as the initial one. In other words, we always have in mind a
single, definite type for every location in the store, which is
fixed when the location is allocated. Then these intended types
can be collected together as a _store typing_ -- a finite function
mapping locations to types.
As with the other type systems we've seen, this conservative typing
restriction on allowed updates means that we will rule out as
ill-typed some programs that could reduce perfectly well without
getting stuck.
Just as we did for stores, we will represent a store typing simply
as a list of types: the type at index [i] records the type of the
values that we expect to be stored in cell [i]. *)
Definition store_ty := list ty.
(** The [store_Tlookup] function retrieves the type at a particular
index. *)
Definition store_Tlookup (n:nat) (ST:store_ty) :=
nth n ST TUnit.
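(** Two quick sanity checks on [store_Tlookup] (illustrative only; the
    example names are ours).  Note that an out-of-range index silently
    yields the default [TUnit], which is why any rule that consults a
    store typing also needs an explicit bounds check on the
    location. *)
Example store_Tlookup_in_bounds :
  store_Tlookup 1 (TUnit :: TArrow TUnit TUnit :: nil) = TArrow TUnit TUnit.
Proof. reflexivity. Qed.
Example store_Tlookup_out_of_bounds :
  store_Tlookup 2 (TUnit :: TArrow TUnit TUnit :: nil) = TUnit.
Proof. reflexivity. Qed.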
(** Suppose we are given a store typing [ST] describing the store
[st] in which some term [t] will be reduced. Then we can use
[ST] to calculate the type of the result of [t] without ever
looking directly at [st]. For example, if [ST] is [[Unit,
Unit->Unit]], then we can immediately infer that [!(loc 1)] has
type [Unit->Unit]. More generally, the typing rule for locations
can be reformulated in terms of store typings like this:
               l < |ST|
-------------------------------------
Gamma; ST |- loc l : Ref (lookup l ST)
That is, as long as [l] is a valid location, we can compute the
type of [l] just by looking it up in [ST]. Typing is again a
four-place relation, but it is parameterized on a store _typing_
rather than a concrete store. The rest of the typing rules are
analogously augmented with store typings. *)
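(** As a rough illustration of the reformulated rule, the type assigned
    to a location can be computed from the store typing alone.  This
    is a sketch only: [loc_ty_sketch] is a hypothetical helper, not
    part of the development, and the actual typing relation is an
    inductive definition rather than a function.  Here [nth_error]
    from the standard list library plays the role of the bounds check
    [l < |ST|]. *)
Definition loc_ty_sketch (l : nat) (ST : store_ty) : option ty :=
  match nth_error ST l with
  | Some T => Some (TRef T)  (* [l] is in bounds: [loc l : Ref T] *)
  | None => None             (* [l] is out of bounds: no type *)
  end.
Example loc_ty_sketch_valid :
  loc_ty_sketch 1 (TUnit :: TArrow TUnit TUnit :: nil)
  = Some (TRef (TArrow TUnit TUnit)).
Proof. reflexivity. Qed.
Example loc_ty_sketch_invalid :
  loc_ty_sketch 2 (TUnit :: TArrow TUnit TUnit :: nil) = None.
Proof. reflexivity. Qed.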
(* ================================================================= *)
(** ** The Typing Relation *)
(** We can now formalize the typing relation for the STLC with
references. Here, again, are the rules we're adding to the base
STLC (with numbers and [Unit]): *)
(**
               l < |ST|
--------------------------------------              (T_Loc)
Gamma; ST |- loc l : Ref (lookup l ST)
Gamma; ST |- t1 : T1
---------------------------- (T_Ref)