add metering for deserialization #524
Benchmark for 5c2aca5
* use the coercion function `C[(<t>,*) <: (<t'>,*)]((<v>,*))` to understand the decoded values at the expected type.
* use the coercion function `v : t ~> v' : t'` to understand the decoded values at the expected type.

Note on implementation:
So I guess this is admitting defeat on having an abstract cost model?
I can see that the formulation using inference rules makes this hard.
Would it still be hard if we specified the decoder as a monadic function, returning a cumulative cost and optional value, making the backtracking and skips explicit?
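To make the suggestion concrete, here is a minimal Rust sketch, assuming a hypothetical `Metered<T>` type (not part of the `candid` crate): every step reports its cumulative cost, the value is `None` on failure, and a hypothetical `decode_opt` helper makes backtracking explicit by falling back to `null` while keeping the cost already spent. The header and backtracking costs are plain parameters, not values taken from the spec.

```rust
/// Hypothetical result of one decoding step: the cumulative cost is always
/// reported, while `value` is `None` when decoding or coercion fails.
#[derive(Debug, Clone, PartialEq)]
pub struct Metered<T> {
    pub cost: u64,
    pub value: Option<T>,
}

impl<T> Metered<T> {
    /// Monadic `return`: a value that cost nothing to produce.
    pub fn pure(value: T) -> Self {
        Metered { cost: 0, value: Some(value) }
    }

    /// A failed step that still consumed `cost` units (e.g. a skip).
    pub fn fail(cost: u64) -> Self {
        Metered { cost, value: None }
    }

    /// Add a fixed cost to this step.
    pub fn charge(mut self, cost: u64) -> Self {
        self.cost += cost;
        self
    }

    /// Monadic bind: run `f` on success; the cost spent so far is kept either way.
    pub fn and_then<U>(self, f: impl FnOnce(T) -> Metered<U>) -> Metered<U> {
        match self.value {
            Some(v) => {
                let next = f(v);
                Metered { cost: self.cost + next.cost, value: next.value }
            }
            None => Metered { cost: self.cost, value: None },
        }
    }
}

/// Hypothetical `opt` rule with explicit backtracking: if the inner decoder
/// fails, restore state (paying `backtrack_cost`) and yield `null` (`None`).
pub fn decode_opt<T>(
    header_cost: u64,
    backtrack_cost: u64,
    inner: impl FnOnce() -> Metered<T>,
) -> Metered<Option<T>> {
    let attempt = inner();
    match attempt.value {
        Some(v) => Metered { cost: header_cost + attempt.cost, value: Some(Some(v)) },
        None => Metered {
            cost: header_cost + attempt.cost + backtrack_cost,
            value: Some(None),
        },
    }
}
```

Because a failed step still reports the cost spent before it failed, work done ahead of a skip or backtrack is counted in the total instead of disappearing.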
Or would that be too ugly or prescriptive?
I think it's possible to have an abstract cost model by assigning weights for each derivation rule. But I see several problems with this approach:
- It forces the implementation to strictly follow the derivation rules;
- Types in the host language are not completely captured by the model. For example, `[Nat8]` and `Blob` in Motoko, or `Vec<(K, V)>` and `HashMap<K,V>` in Rust, map to the same Candid type but have different costs in the host language. Even with the same `record` type, it can be more expensive to construct a record in Rust than in Motoko (or vice versa);
- Even if we have a cost model, it would be hard to come up with a threshold in the spec. It's application-dependent (canister endpoints and inter-canister calls may need different bounds, and stable memory won't need any limit). Plus there is always a gap between the cost model and the real implementation.
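A rough illustration of the "weights per derivation rule" idea, assuming a hypothetical `Rule` enum and placeholder weights (none of these come from the spec): the abstract cost is the sum of the weights over the derivation tree. The derivation for a `vec nat8` value gets the same cost whether the host type is `[Nat8]` or `Blob`, which is exactly the host-language gap raised in the point about host types.

```rust
/// Hypothetical derivation-rule labels for a tiny fragment of the decoder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    Null,
    Nat8,
    OptSome,
    OptNone,
    Vec,
    Record,
    SkipField,
}

/// Illustrative weights; the numbers are placeholders, not from the spec.
pub fn weight(rule: Rule) -> u64 {
    match rule {
        Rule::Null | Rule::Nat8 => 1,
        Rule::OptSome | Rule::OptNone => 2,
        Rule::Vec | Rule::Record => 2,
        Rule::SkipField => 5,
    }
}

/// A derivation is a tree of rule applications.
pub struct Derivation {
    pub rule: Rule,
    pub premises: Vec<Derivation>,
}

impl Derivation {
    pub fn leaf(rule: Rule) -> Self {
        Derivation { rule, premises: Vec::new() }
    }

    pub fn node(rule: Rule, premises: Vec<Derivation>) -> Self {
        Derivation { rule, premises }
    }

    /// Abstract cost = sum of rule weights over the whole derivation tree.
    pub fn cost(&self) -> u64 {
        weight(self.rule) + self.premises.iter().map(Derivation::cost).sum::<u64>()
    }
}
```

The sketch also shows the first concern: the cost is only meaningful if the implementation walks exactly these rules.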
Looking at it from a different angle, this is a resource problem originating from a specific platform and a specific implementation. Given infinite resources, the Candid spec doesn't need to reject these large payloads. Just as we don't specify stack depth in the spec, we can leave the concrete cost metering to the implementation, so that it better matches the platform and language runtime.
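Leaving metering to the implementation could look roughly like this: a hypothetical `Meter` (an illustrative name, not the `candid` crate's API) that the decoder charges as it goes, with the limit picked by the caller, e.g. a finite bound for canister endpoints and no limit for stable memory.

```rust
use std::fmt;

/// Error returned when a decode would exceed its budget.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BudgetExceeded {
    pub limit: u64,
    pub attempted: u64,
}

impl fmt::Display for BudgetExceeded {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "decoding cost {} exceeds limit {}", self.attempted, self.limit)
    }
}

/// Hypothetical per-call meter: the limit is chosen by the caller, not the spec.
#[derive(Debug)]
pub struct Meter {
    limit: Option<u64>, // None means unlimited (e.g. stable memory)
    spent: u64,
}

impl Meter {
    pub fn with_limit(limit: u64) -> Self {
        Meter { limit: Some(limit), spent: 0 }
    }

    pub fn unlimited() -> Self {
        Meter { limit: None, spent: 0 }
    }

    pub fn spent(&self) -> u64 {
        self.spent
    }

    /// Charge `cost` units; reject (without recording) a charge that would pass the limit.
    pub fn charge(&mut self, cost: u64) -> Result<(), BudgetExceeded> {
        let attempted = self.spent.saturating_add(cost);
        if let Some(limit) = self.limit {
            if attempted > limit {
                return Err(BudgetExceeded { limit, attempted });
            }
        }
        self.spent = attempted;
        Ok(())
    }
}
```

Keeping the limit in the caller's hands is what lets endpoints, inter-canister calls and stable memory use different bounds without the spec fixing a number.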
rust/candid/src/de.rs
/// C : <val> -> <constype> -> nat
/// C(null : opt <datatype>) = 2
/// C(?v : opt <datatype>) = 2 + C(v : <datatype>)
/// C(?v : opt <datatype'>) = 2 + C(v : <datatype>) * 50 + 10 // when v cannot be converted to <datatype'>
Where do 50 and 10 come from?
These are somewhat arbitrary numbers. 50x is the multiplier applied when we skip values in the decoder, and 10 accounts for the cost of restoring state when backtracking.
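Putting the `opt` rules and these two constants together, a minimal sketch, assuming a hypothetical two-type universe (`Ty`, `Val`) and placeholder leaf costs that the doc comment does not specify:

```rust
/// Hypothetical miniature type/value universe, just enough for the `opt` rules.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Nat8,
    Text,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Nat8(u8),
    Text(String),
}

pub const SKIP_FACTOR: u64 = 50; // multiplier for values that are skipped
pub const BACKTRACK_COST: u64 = 10; // cost of restoring decoder state

/// Placeholder leaf costs C(v : t); these are assumptions, not from the spec.
pub fn leaf_cost(v: &Val) -> u64 {
    match v {
        Val::Nat8(_) => 1,
        Val::Text(s) => 1 + s.len() as u64,
    }
}

/// Whether `v` can be converted to the expected type `t`.
fn converts(v: &Val, t: &Ty) -> bool {
    matches!((v, t), (Val::Nat8(_), Ty::Nat8) | (Val::Text(_), Ty::Text))
}

/// Cost of an `opt` payload decoded at expected type `opt expected`.
pub fn opt_cost(payload: Option<&Val>, expected: &Ty) -> u64 {
    match payload {
        // C(null : opt t) = 2
        None => 2,
        // C(?v : opt t) = 2 + C(v : t)
        Some(v) if converts(v, expected) => 2 + leaf_cost(v),
        // Not convertible: v is skipped (50x), then we backtrack to null (+10).
        Some(v) => 2 + leaf_cost(v) * SKIP_FACTOR + BACKTRACK_COST,
    }
}
```

For example, a `?"abc"` received where `opt nat8` is expected costs 2 + 4 × 50 + 10 = 212 under these placeholder leaf costs, versus 3 for a matching `?7`.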
rust/candid/src/de.rs
/// C(?v : opt <datatype>) = 2 + C(v : <datatype>)
/// C(?v : opt <datatype'>) = 2 + C(v : <datatype>) * 50 + 10 // when v cannot be converted to <datatype'>
/// C(v^N : vec <datatype>) = 2 + 3 * N + sum_i C(v[i] : <datatype>)
/// C(kv* : record {<fieldtype>*}) = 2 + sum_skipped_i C(kv : <fieldtype>*[skipped_i]) * 50 + sum_expected_i C(kv : <fieldtype>*[expected_i])
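The `vec` rule can be read in isolation. A small sketch, where the hypothetical `vec_cost` takes the per-element cost `C(. : <datatype>)` as a closure so it does not depend on any concrete element type:

```rust
/// C(v^N : vec t) = 2 + 3 * N + sum_i C(v[i] : t)
/// `elem_cost` stands in for C(. : t) and is supplied by the caller.
pub fn vec_cost<T>(elems: &[T], elem_cost: impl Fn(&T) -> u64) -> u64 {
    let n = elems.len() as u64;
    2 + 3 * n + elems.iter().map(|e| elem_cost(e)).sum::<u64>()
}
```

With a unit element cost, a 1,000-byte `vec nat8` costs 2 + 3 × 1000 + 1000 = 4002, so the fixed per-element overhead of 3 dominates the element cost itself.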
How do you know which fields are skipped/expected without having the expected type as an argument of `C(.)`?
Yeah, ideally it should be `C(v : t ~> t')`, but that is also verbose. Maybe I can define just `C(v : t)`, and say that skipping values costs 50x more.
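To illustrate why the expected type is needed, here is a sketch of the `record` rule in the `C(v : t ~> t')` style, assuming hypothetical inputs: the wire fields as a map from field id (Candid field ids are `u32` hashes) to an already-computed cost `C(v_i : t_i)`, plus the field ids of the expected record type `t'`. Fields missing from `t'` are the skipped ones and pay the 50x factor.

```rust
use std::collections::BTreeMap;

pub const SKIP_FACTOR: u64 = 50;

/// Cost of a record value decoded against an expected record type:
/// 2 + sum over expected fields of C + sum over skipped fields of C * 50.
/// `fields`: wire field id -> C(v_i : t_i); `expected`: field ids of t'.
pub fn record_cost(fields: &BTreeMap<u32, u64>, expected: &[u32]) -> u64 {
    let mut total: u64 = 2;
    for (id, &c) in fields {
        if expected.contains(id) {
            total += c; // field is kept and decoded at its expected type
        } else {
            total += c * SKIP_FACTOR; // field is absent from t', so it is skipped
        }
    }
    total
}
```

With only `C(v : t)` the same record would always cost 2 + 3 + 4 + 1 in the test case; passing `t'` is what distinguishes the skipped field.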
Added some comments. Maybe we could discuss a bit tomorrow, or @mraszyk can fill me in on how we got here.
Discussed offline