-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy paththoughts.txt
421 lines (293 loc) · 17.8 KB
/
thoughts.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
Identify all properties that are "externally visible"/exported. These properties are ineligible for renaming.
Examples include exported variable, public properties of exported classes, global variables, values passed to
unknown/blackbox places like DOM methods, proxy/reflection, dynamic property accesses (unless the set of keys is known).
e.g. properties of "Foo" are ineligible in the following:
window.foo = Foo;
unknownFn(Foo);
Foo[unknownKey] = 1;
If two entities, A and B, share a property name ("prop1"), which is eligible for renaming on both entities,
and it can be proven that "prop1" is never accessed on a value that could be an A OR a B, then A's "prop1"
and B's "prop1" can be renamed distinctly e.g. to "A$prop1" and "B$prop1".
https://github.com/google/closure-compiler/wiki/Type-Based-Property-Renaming
http://closuretools.blogspot.com/2011/01/property-by-any-other-name-part-1.html
http://closuretools.blogspot.com/2011/01/property-by-any-other-name-part-2.html
http://closuretools.blogspot.com/2011/01/property-by-any-other-name-part-3.html
http://closuretools.blogspot.com/2012/09/which-compilation-level-is-right-for-me.html
Simple version: for all static property accesses (e.g. "obj.prop") and declarations (e.g. "obj = {prop:1}"),
determine if their object is invalidated (public/exported/dynamic). If none of the objects that have a given
property are invalidated, then all references to the property can be conflated/renamed.
Track 'objects':
- Created by object literals
- Need to track:
* Can flow into variables through assignment/declaration e.g. `a = {}` or `const b = {}`
Then need to track variable usage, including all aliasing
* Literal can be used directly in expressions e.g. `({a: 1}).a` or `callFn({a:1})`
* In all cases, should refer to same 'object'.
* Need to record all references to object and its properties.
- Properties are declared by writes/declarations e.g. 'a' in `const v1 = {a:...}` or `v2.a = ...`
- The properties of an object are the set of all properties that are written to the object, even
they occur after a particular read.
- If the object is accessed dynamically or in an unknown way, we pessimistically assume that it is
modified in any/every way, and invalidate if for optimisation. e.g. for object in variable 'a':
`a[someExpression] = ...` or `fnCall(a)` or `window.v = a`
- When an expression could be a number of different objects they are conflated e.g.
obj1 obj2 obj1 ∪ obj2
vvvvv vvvvv v
if (...) { a = {...}; } else { a = {...}; } a;
For each function:
- Track object invalidations and property reads/writes.
* If the object in question can be resolved to a locally defined object, these invalidations/reads/writes
can be directly attributed to the correct 'object'.
* Else, a stub object is created to record this function's interactions with the object.
e.g. `a.foo` and `a.foo = ...` are respectively recorded as a read and write to property 'foo' on some
object referenced by 'a'.
Similarly, `a().foo` and `a().foo = ...` are recorded as a read/write to property 'foo' on some object
returned by the function 'a'.
* The info collected in each function is then iteratively reconciled until a fixed point is reached:
# Stub objects from variables are matched against, and conflated with, variables from parent functions
that may have been captured by the inner function.
# Stub objects from function calls are matched against, and conflated with, objects returned from the
called function.
Once a fixed point is reached, we should
* Else, a stub object is created to record this function's interactions with the object. e.g. for non-local
variable 'a': `a.foo` and `a.foo = ...` are respectively recorded as a read and write to property 'foo'
on some object referenced by 'a'.
Similarly,
Invalidations can have dependencies e.g. `foo(a)`: 'a' is invalidated by the call only if 'foo' invalidates its first param
Collections are opaque bags of values - we track the types that go in, and the output type is the union of the inputs.
We don't care about the specific values that are in a collection at a point in time, or which specific item is outputted.
insert item and retrieve item e.g.
array: push, index with [i] etc
Map: set, get
Object literals are a special kind of collection where we attempt to know the set of possible keys and info about their
mapped values. This is not possible if the object is accessed dynamically using computed properties, in which case we
can't track its properties or their values, but we can track the output type (union of inputs).
If a collection is invalidated (e.g. assigned to unknown variable, passed to unknown function), then we can't conclude
anything about it, and any value entering it is invalidated.
TODO:
catch and throw
destructuring
Compute maybe reaching defs.
For each ref, its type is the union/conflation of types of maybe reaching defs.
Type of def is union/conflation of types in its RHS
A prop access should only force a conflation merge of the conflation constituents that contain the property. More similar to AmbiguateProperties.
The only properties that must share the same name are those that are:
(1) in the intersection of a conflation's constituent's properties; and
(2) accessed as a property of the conflation
Need to consider sub/super types as well, for classes and object/function prototypes.
For now, accessing the prototype of an entity should invalidate it, and any entity assigned to a prototype should also be invalidated.
if function is invalidated, invalidate its output
two versions of function analysis:
1. "owned" analysis of the code within the function. Used to do renaming of the function's code.
2. "other" analysis used to compute side effects and return value at call site
for the code within a function, we need to track the union/join of its possible values.
e.g.
function foo(a) {
a.prop;
}
foo({prop: 1});
foo({prop: 2});
the property ref in the function should be recorded on the union of the two object literals.
after each call, join function's saved lattices with the final lattices after the call
at call time, gather external dependencies of function and pass them to it (params and non-local vars)
for each function create placeholder types for each param and the return type.
run type inference on te function to discover the relationships between inputs and outputs
e.g. that a function returns its first parameter. Then each call can substitute the types of its arguments
for the function's param placeholders to find the return type for those particular args.
Then for the function as a whole, the type of each external dependency within its body for the purpose of renaming
is the union the types the external dependency had for each call.
simplify the call graph by:
for each function:
for each non-function return type:
if function depends on return type and function and type ae not already directly connected:
add edge from function to return type
remove all function-function edges
function analysis should yield the return type in terms of the params or constant/static/local objects.
this template must be in terms of calls, property accesses, unions, and references, to params.
this template can then hydrated by the caller, passing in arguments to be substituted into the template
visit non recursive paths(no cycles). then propagate that return value through the cyclic paths until fixed point
backwards flow, from leafs inwards
convert return type into a finite state machine of transforms/accesses on the params.
this can then be executed against the arguments at call time.
e.g.
function foo(arg) {
if (cond1) {
return foo(arg).prop1;
} else if (cond2) {
return foo(arg).prop2;
} else {
return arg.prop3;
}
}
foo could return the value of a any property chain in arg consisting of "prop3" followed by 0 or more "prop1" or "prop2".
e.g. arg.prop3.prop1.prop1 or arg.prop3 or arg.prop2.prop1.prop2
start at each leaf (should be a param placeholder), try to match. need to find all matchings, so don't
stop after first match, keep going until no matched possible. at a fork (>1 out edges), fork execution and match all branches
for writes to params,union the prop assignments for the param at each return site..
at a call, return "call" type and record the cfg node under the call. Then, when the 'call' type is resolved,
the call site cfg nodes can be added to the dataflow analysis work queue, and a re-analysis computed.
this should propagate the new side effects and return type forwards.
maybe have separate analyses for global and each function, so its easier to "call" them
todo: computed props aren't really visited/analysed
allow each out edge to be traversed n times per path, where n is the in-degree of the node
don't add outdated dependants to call queue. When we pop and process a call, remove outdated dependednts
create DAG by compressing SCCs. Then use (presumable existing) algorithms to process DAG in parallel.
One 'lower' nodes have been processed, they never need to be touched again, and their resources can be freed/reused
if a function invalidates all of its arguments, we don't need to
TODO: store list of calls with "invalidated return types", which may open up simplifications/work skipping.
Current best typescript.js output size: 8,089,763 bytes
before:
8,089,763 bytes
after:
8,089,729 bytes
Sub and super type relation graph. Built-ins are represented as invalid objects (so their props aren't changed).
Instances of these built-ins (such as number literals) are their own objects that are a subtype of the built-in.
Most vars/props in a Lattice are probably from parent scopes/CFG nodes. Therefore, large part of each
lattice is likely static...????? Can we reduce the amount we clone/hash/eq-check this data, similar
to how fn_assignments were pulled out of lattices because they are static.
Currently, I think all previously encountered vars, even those in now-inaccessible scopes, are carried over
to all future lattices.
in one of the initial analysis (such as fn finding/template building), gather statistics and use them to preallocate
various structures. e.g. each ObjectLit gets its own ObjectId and entry in objects_map, so their count can be used to
pre-allocate object_map. Each fn gets an entry in function_map and FnId and static data
allocate space for one lattice/node-annotation per CFG node?
also pre-alloc workQueue?
When constructing call templates, identify patterns in functions (e.g. side-effect free, invariant return type)
and use these to simplify call resolution.
Identify common step patterns and only create them once e.g. all implicit returns generate the steps:
StoreRValue(Some(NullOrVoid))
Return
We appear to inconsistently rename TSC, potentially breaking it. Search typescript.js for:
var failedLookupLocations = [];
var state =
RValues cannot be compared if they are not primitives. Previously i thought vars
could be equalted are deduplicated, but if they occur at different parts in the
program, they may read different values e.g.
obj.prop = 1;
obj.prop; (a)
obj.prop = false;
obj.prop; (b)
(a) and (b) read different values
if a static RValue (i.e. ObjectId, FnId) is invalidated multiple times, just keep the first invalidation.
When returning a function that captures local variables, create a closure that stores those
instead of Option<Pointer>, use a wrapper enum/struct that encodes data about the pointer's origin.
e.g. that its from a function argument. Then when certain operations occur, their effects can be recorded
e.g. that a function's argument was written to
its not safe to substitute None for a union which reports invalid without first invalidating the union.
this is because only one constituent need to be invalid for the union to report its invalid, but
replacing it with None may loose the info that the other constituents were in, and possibly accessed from,
a union with an invalid constituent.
also check that when we invalidate a union, we don't bail early if it reports to be invalid, as, again,
not all constituents might be invalid.
https://hardekbc.github.io/files/dewey15parallel.pdf
5.1 Configuration Focus
the stack concept look interesting for 'call objects' concept i had
No capture:
Standard fn with params, return type, side effects etc.
if a function local var is captured, create a slot for it for each fn call.
struct Closure {
func: FnId,
env: FxHashMap<VarId, Capture>
}
function f1() {
let variable = { prop1: 1 };
function get() {
return variable;
}
f2(get);
variable.prop2;
return variable;
}
function f2(func) {
let obj = get();
obj.prop2 = 2;
}
f1().prop2;
TODO:
Accessing a VarId that has nothing assigned to it should return NullOrVoid. To get
a VarId, a declaration for the var must exist, and for a reference to be resolved
to it, it must be in scope. Declared but un-initialized vars have value undefined.
^^^^^^^^^think this is wrong
when processing functions after the global dataflow, the value of captured variables should be the union
of their values at any point in the program. re-initialize for each fn processed. A fn may override
the value.
Only process called/invalid functions. The effects of called functions are obviously used, while we
can't know if the effects of an invalid fn are observed, so must consider them as well.
if a node has no steps, we can skip it in the dataflow analysis.
Rather than join its inputs into its output, only for its successors to potentially merge its output
with their other inputs, we should try to move straight from its predecessors to its successors.
deduplicate functions by their steps
more eagerly swap from invalid pointer to None
maybe remove None var assignments when crossing fn boundaries (such as when assembling call args).
this assumes that undeclared vars have value None
optimise nested union steps (by flattening)?
similar to how fn assignments are handled, don't copy/propagate the functions input lattice throughout
its dataflow analysis. instead, reference a static copy and only store modifications
identify built in methods
record parameter modifications by param position, then apply instead of calling
only copy over properties that a function references
don't intern leaf lattices (as they will not be accessed again?)
more eagerly replace fully invalid pointers with None e.g. when merging prop assignments, retrieving prop/var assignments
from simple name resolution, we can determine what simple functions call, but not what calls them
function isNumber(x) {
return typeof x === "number";
}
unassigned (but declared) vars should have value NullOrVoid
prove which captured vars are written to by call. vars are assumed to be written to unless proven otherwise.
if a call may write to a captured var/arg, assign RESOLVING_CALL to it
fn vars should not be treated as captured vars?
When calling a fn, if a param is not used in the template, record that the fn was called with the value but
don't store it in the Call object for caching call resolutions
Identify vars whose value is always invalidated before it is reassigned. e.g.
foo[someExpr]
obj[someExpr] = foo
unresolved = foo
unresolved.prop = foo
unresolved(foo)
unresolved.prop(foo)
function f(foo) {
arguments[i]
}
new foo()
new thing(foo)
yield foo
tag`{foo}`
Clarify handling of invalid things.
1. We don't know the type of an entity (i.e. None). It might have any property.
2. We don't know which properties an object has or their values (i.e. invalid_objects.contains(obj) ) e.g. because it was passed to an unknown function
Are 1 and 2 the same?
Eagerly replace invalid objects with None?
Immediately invalidate object that is in union with invalid entity?
Identify places where the result of an unresolved call is Immediately invalidated, in which case we might not need to resolve the call
Need to invalidate function to fix react-dom. A var's only use is in a function that invalidates it, but we can't track
any calls to that function, so it doesn't capture the var and invalidate it.
invalidate function:
return value is invalidated
params are unioned with None
identity fn?
invalidating a fn after we have analysed it?
analyse functions called from global scope
analyze uncalled, but "public" functions. These might be called by external code
skip all other functions, as presumably they are dead
All trackable props come from a specific node in the AST
Union find /conflate props while analysing
just need to disambiguate and determine which properties are escaped (can't be renamed)
Start with all property accesses as referencing distinct properties. Then conflate
those that access the same property.
obj1.propA // prop1
obj2.propA // prop2
obj1.propB // prop3
obj2.propB // prop4
(obj1 || obj2).propB // prop3 = prop4
good functions that only call other good functions and don't touch external vars can be computed ahead of time
if we come to resolve a call, replace it wth a call with its invalid inputs replaced with None
Analysis goals:
- Determine if two property accesses refer to the same underlying property (for property renaming)
- Determine if a property refers to an (unmodified) built-in property (e.g. for simplifying known method calls e.g. Map.get())
This also included implicit methods e.g. ` "" + a ` calls a's toString() method. We would like to know if this has been
overwritten or is still the default (and thus optimisable).
Analyse functions using placeholders for their parameters (including implicit ones such as 'this'/captured vars).
Also use placeholders for return types.
Then, when calling functions, merge the arguments with their formal parameters.
each prop in code gets id
merge ids when they can point to the same thing