-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trie.get()/Trie.find() does not return data, but the data is present. #427
Comments
Environment: |
Thanks for reporting. Could you try testing to see if the If the There's a public function called Could you experiment with that test within your tests (ideally after after change)? If it's violated at some point, then we know what change to the Alternatively, if you post your entire test script, I could try to insert the uses of |
As I consider the possible causes, I have an additional check I'd make of the code, if I could: Namely, that there are no mutations to the objects in the Or, if they are mutated, then the mutated fields do not affect their hash or equality relationships. (If this is violated, it would explain the behavior you have now). |
@matthewhammer do you mean the keys, or the values? The keys here appear to be Blobs, thus immutable. |
I mean the keys. The values are not relevant for any of the structural invariants of the Trie. It is the hash of the keys that matters, where the bits of the hash determine the Trie's shape. (The trie uses these bits as needed, depending on its total size; so for If the key were to mutate, the placement in the tree would be "wrong" and would not help when locating it based on its (new) hash value. It's also important (but not relevant for this issue) that the hash be "mostly injective", sending different keys to different hashes, almost always. There is a little tolerance for collisions, but it's "small" (like eight per hash value).
Why are they using the |
returns :
|
If a Canister (e.g. a token) with a lot of data encounters this problem, how do I fix it? |
Possible causes. |
I think in this case, the system will still report an error as not enough cycle. |
What happens when you have more than 8 keys with the same hash value? |
A hash value has 4 bytes? (I think that's a little small) So, what is the safe capacity that a Trie can store? |
Yes, especially as the Hash functions used are rather … naive, so it is likely that collisions can be actively provoked by an attacker. It’s already bad enough if performance degrades in that case (and that alone might be reason enough not to use a hash-based data structure in code under threat), but silent functional breakage is of course intolerable, and would be very embarrassing, to put it mildly… |
Testing this code import TrieMap "mo:base/TrieMap";
import Iter "mo:base/Iter";
import Hash "mo:base/Hash";
import Array "mo:base/Array";
import Debug "mo:base/Debug";
let vs = Array.map([1,2,3,4,5,6,7,8,9,10], func (n : Nat) : Nat = n * 2**32);
type T = TrieMap.TrieMap<Nat,Nat>;
let t : T = TrieMap.fromEntries(
Iter.map<Nat,(Nat,Nat)>(vs.vals(), func (n) = (n,n)),
func (n1 : Nat, n2 : Nat) : Bool = n1 == n2,
func (n: Nat) : Hash.Hash = 0 // simulate hash collisions
);
for (n in vs.vals()) {
if (t.get(n) == ?n) {
Debug.print(debug_show n # " ok")
} else {
Debug.print(debug_show n # " not ok")
}
} I get an internal assertion
so it seems that the |
It should not be caused by Cycles (But I'm really not sure). The stress test also triggers this issue when there are enough Cycles in the canister. |
It would be helpful to see the full source code of the stress test canister. In particular, what part of the |
The When it reaches that point, the current implementation gives an admittedly confusing "error message", in the form of that assertion failure you found. But, if you understand the behavior, you shouldn't be "worried", you should just pick a real hash function. Even the "naive In any case, this PR from 1.5 years ago was meant to fix that "error message", but it started some discussion that was never resolved, I guess: #250
Probably true, but not relevant, so let's not just be needlessly negative here. Thanks. (Also, feel free to open a PR with some less naive contribution at any point.) |
I agree that 4 bytes may be too small in the limit, yes. For the test case above (under 1k items) it should be more than plenty. Even for a million entries, or hundreds of millions, it should still be perfectly fine, assuming the hash function is reasonable. (Why would you think otherwise? Or, at what larger size do you want to store data?) FWIW, the Motoko Playground has used the It would be interesting to test that |
I think this code is using Blob.hash which is crc32 hash (unless the implementation is broken). Right @nomeata? |
Yes, CRC32, so rather trivial to find collisions and, for example, register with a service with usernames that collide. |
"Trivial"? You assume that the attacker controls the entire input string to the hash, right? That does not seem to be the case here, though, or in any case where there's even a "salt" value hidden by the canister (something simple to do if one was worried, though not implemented here). Perhaps even with those measures, it could be possible to find collisions using user-controlled inputs. Even so, who cares? Having collisions is only "bad" if the original data is lost, and hashes are totally confused as identities or “one-way secrets” where the hash is supposed to conceal info. That’s not relevant here, AFAIK, and usually not what people need from a hash function to use it for a collection which tolerates some collisions.
Again, nothing is breaking silently when there is a collision. As for eschewing hashing in favor of comparison-based collections. I can see that working in many cases, but not all. For the Playground, comparing saved project entries means comparing the content of files as huge strings, whose prefixes are not very unique (import base, blah blah). If an adversary wanted, they could save lots of programs that all share the most common prefixes, making everything slower. Between that and the extra rebalancing costs of comparison-based structures, I worry about advocating comparison-based structures for all use cases. It's certainly not a silver bullet. I think devs need to consider each structure and how it will be accessed in their Dapp, and who has control over its keys and insertions of them. Sometimes, hashing is better, and perhaps in other cases, one would really want comparison instead. Each situation is different, and requires its own consideration. |
Just in case my plea above got buried in the discussion, I want to point at it again. I can offer much more effective assistance if you share your test canister code, or at least the parts that build or changes Seeing the accesses that work or do not is also instructive, but that's not where the structure is being corrupted. If for some reason you cannot share the code. Please re-run the code that builds the Trie with test data and use If they are merely uses of |
Developers who are worried are not the ones at risk. It's those that are not worried that we have to worry about. And I have seen plenty of real world Motoko code that's live on the IC that uses Maps (HashMap mostly, I believe, luckily, although I should check) with keys being use-chosen blobs (principals for example). That's why this worries me so much (less now that I saw it traps instead of silently becoming invalid, but is that true for all usage patterns?) |
We are trying to build test code that reproduces this issue. The Trie API that has been used in the code includes,
This is all. Other symptoms of stress testing in case of issue.
I think that Trie.put has an issue when refactoring the tree structure. Or, Trie.get/find is at some critical value when routing based on hash (e.g. in the case of equality at comparison). |
If the hash is 4 bytes in the discrete case, the probability of one collision is 1/(2**32) = 2.3E-10. |
We reproduced the issue in test canister po4px-oiaaa-aaaak-acubq-cai Testing:
the Trie is broken when putting item (txid = blob "\00\00\00\0b\0e}\a7\b0\12\1d\9a|2J$\ea\04\8d\ddaS\ae\ed\f6M\ad\a0\e0\ab@\f0m") code:
|
It is recommended to do stress tests on all modules of the base library, and to issue special documents to tell developers under what scenarios to use. Give developers a basis for making decisions. |
Just reading through the code and trying to understand how it works. Should Line 464 in add126e
|
Ah, and since you mention Lines 981 to 988 in add126e
|
Thanks for the repro! I've managed to create a variant of your code with some additional methods to retrieve the log from your canister (method There are also methods to view the log as candid or Motoko values (but it is large), in case someone want to extract a non-IC repro. https://m7sm4-2iaaa-aaaab-qabra-cai.raw.ic0.app/?tag=2343906604 After calling
@matthewhammer you should be able to experiment with this code by just using upgrades to add code without have to refetch the data each time. The log is stored in a stable variable. This is the modified canister at the link: /**
* Module : Trie Testing
* Author : ICLighthouse Team
*/
import Trie "mo:base/Trie";
import Principal "mo:base/Principal";
import Blob "mo:base/Blob";
import Array "mo:base/Array";
import Nat "mo:base/Nat";
import Nat32 "mo:base/Nat32";
import Time "mo:base/Time";
import Option "mo:base/Option";
import Iter "mo:base/Iter";
import List "mo:base/List";
import Hash "mo:base/Hash";
// Canister-id is po4px-oiaaa-aaaak-acubq-cai
// If you want to redeploy canister, you need to write the trieLog data in manually.
// Testing:
// dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reset
// dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reproduce '(false)'
shared(installMsg) actor class ICDexPair() = this {
type Trie<K,V> = Trie.Trie<K,V>;
type Key<K> = Trie.Key<K>;
let MAX_LEAF_SIZE = 8;
var error : Text = "";
func debugPrint(t : Text) : () {
error #= t;
};
/// Checks the invariants of the trie structure, including the placement of keys at trie paths
func checkValid<K, V>(t : Trie<K, V>, enforceNormal : Bool) : Bool {
func rec(t : Trie<K, V>, bitpos : ?Hash.Hash, bits : Hash.Hash, mask : Hash.Hash) : Bool {
switch t {
case (#empty) {
switch bitpos {
case null { true };
case (?_) {
not enforceNormal };
}
};
case (#leaf(l)) {
let len = List.size(l.keyvals);
((len <= MAX_LEAF_SIZE) or (not enforceNormal) or
(do { debugPrint("len > MAX_LEAF_SIZE" # debug_show len); false })
)
and
(len == l.size
or
(do { debugPrint("len != "); false }))
and
( List.all(
l.keyvals,
func ((k : Key<K>, v : V)) : Bool {
((k.hash & mask) == bits)
or
(do {
debugPrint("\nmalformed hash!:\n");
debugPrint(debug_show k.hash);
debugPrint("\n (key hash) != (path bits): \n");
debugPrint(debug_show bits);
debugPrint("\nmask : ");
debugPrint(debug_show(mask));
debugPrint("\n");
false
})
}
)
or
(do { debugPrint("one or more hashes are malformed"); false })
)
};
case (#branch(b)) {
let bitpos1 =
switch bitpos {
case null {Nat32.fromNat(0)};
case (?bp) {Nat32.fromNat(Nat32.toNat(bp) + 1)}
};
let mask1 = mask | (Nat32.fromNat(1) << bitpos1);
let bits1 = bits | (Nat32.fromNat(1) << bitpos1);
let sum = Trie.size(b.left) + Trie.size(b.right);
(b.size == sum
or (do { debugPrint("malformed size"); false })
)
and
rec(b.left, ?bitpos1, bits, mask1)
and
rec(b.right, ?bitpos1, bits1, mask1)
};
}
};
rec(t, null, 0, 0)
};
// Types
type Txid = Blob;
type BalanceChange = {
#DebitRecord: Nat;
#CreditRecord: Nat;
#NoChange;
};
type TradingOrder = {
account: Blob;
icrc1Account: ?{owner: Principal; subaccount: ?Blob; };
txid: Blob;
orderType: { #LMT; #FOK; #FAK; #MKT; };
orderPrice: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
time: Time.Time;
expiration: Time.Time;
toids: [Nat];
remaining: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
refund: (token0: Nat, token1: Nat, toid: Nat);
filled: [{counterparty: Blob; token0Value: BalanceChange; token1Value: BalanceChange; time: Time.Time }];
status: { #Todo; #Pending; #Closed; #Cancelled; };
gas : { gas0: Nat; gas1: Nat; };
fee : { fee0: Nat; fee1: Nat; };
index : Nat;
nonce: Nat;
data: ?Blob;
};
// Variables
private stable var index : Nat = 0;
private stable var icdex_orders : Trie.Trie<Txid, TradingOrder> = Trie.empty();
// The trieLog data comes from the canister where the issue was found and has been pre-stored here.
// The data can be queried by `testLog '(null)'`
private stable var trieLog = List.nil<(Txid, Text, Bool)>(); // (Txid, Method, IsValid)
private func keyb(t: Blob) : Trie.Key<Blob> { return { key = t; hash = Blob.hash(t) }; };
private func buildItem(_txid: Blob) : TradingOrder{
let tradingOrder: TradingOrder = {
account = Blob.fromArray([]);
icrc1Account = ?{owner = Principal.fromText("aaaaa-aa"); subaccount = null; };
txid = _txid;
orderType = #LMT;
orderPrice = { quantity = #Sell(50000); price = 10000; };
time = Time.now();
expiration = Time.now() + 10000*1000000000;
remaining = { quantity = #Sell(50000); price = 10000; };
toids = [];
filled = [];
status = #Todo;
refund = (0, 0, 0);
gas = {gas0=0; gas1=0};
fee = {fee0=0; fee1=0};
index = index;
nonce = index;
data = null;
};
index += 1;
return tradingOrder;
};
private stable var step : Nat = 0;
public func reset() : async Bool{
icdex_orders := Trie.empty();
step := 0;
return Trie.isValid(icdex_orders, true);
};
public func reproduce(enStep: Bool) : async (Txid, Text, Bool, Text){ // Returns (Txid, Method, IsValid)
error := "";
var resTxid : Blob = Blob.fromArray([]);
var resMethod : Text = "";
var resTrieValid : Bool = true;
var _step : Nat = 0;
label test for ((txid, method, isValid) in Array.reverse(List.toArray(trieLog)).vals()){
if (_step == step){
step += 1;
if (method == "put"){
icdex_orders := Trie.put(icdex_orders, keyb(txid), Blob.equal, buildItem(txid)).0;
}else if (method == "remove"){
icdex_orders := Trie.remove(icdex_orders, keyb(txid), Blob.equal).0;
}else if (method == "filter"){
icdex_orders := Trie.filter(icdex_orders, func (k: Txid, v: TradingOrder):Bool{ true });
};
if (not(enStep) and not(checkValid(icdex_orders, true))){
resTxid := txid;
resMethod := method;
resTrieValid := false;
break test;
}else if (enStep){
resTxid := txid;
resMethod := method;
resTrieValid := checkValid(icdex_orders, true);
break test;
};
};
_step += 1;
};
return (resTxid, resMethod, resTrieValid, error);
};
public query func testLog(_isValid: ?Bool) : async [(Txid, Text, Bool)]{
switch(_isValid){
case(?(isValid)){
let list = List.filter(trieLog, func (t:(Txid, Text, Bool)): Bool{
t.2 == isValid
});
return List.toArray(list);
};
case(_){
return List.toArray(trieLog);
};
};
};
public query func testTrieValid() : async Bool{
return Trie.isValid(icdex_orders, true);
};
public func setToRemoteLog() : async [(Txid,Text, Bool)] {
let remote = actor "po4px-oiaaa-aaaak-acubq-cai" : actor {
testLog : query ?Bool -> async [(Txid,Text, Bool)];
};
let rlog = await remote.testLog(null);
trieLog := List.fromArray(rlog);
rlog;
};
public func testLogText() : async Text {
debug_show trieLog;
}
}; |
Yes, I totally agree. Thank you for spotting this invariant violation! It's a little subtle (unless one really internalizes the invariant about paths and bits, as you did). Thanks again. To fix that code, it should only trim the case where there's a binary node and both of its subtrees are empty. In that case, there is no key below (in a subtree) that gets misaligned. Update:
Thanks for catching this. It doesn't appear relevant to this specific issue, but could may well be a separate bug of its own. I'll look at it more closely after we resolve the buggy behavior here. |
The Trie should never become invalid, no matter how the API is used. That it did become invalid here was a result of the logical error in However, even if the Trie is structured correctly at all times (as it should be), its hash function (independent of the Trie itself) could still be attacked, and I'm beginning to agree that's an issue that is worrying, and probably means that a comparison-based structure is the safest choice in most cases (or all cases). FWIW, Timo Hanke has been following a related internal discussion that this one seeded. He wrote this today, and said I could share here:
So maybe every use of HashMap or Trie should plan to migrate to a Motoko-based BTree, though none is in Relatedly, here's a candidate BTree in Motoko to consider. |
Trie is a good data structure, hopefully it will be fixed soon. 4 bytes length hash is safe for most use cases, such as using Principal or salted hash as key. RBTree is also a kind of B-Tree, any comment on it? |
If you haven't yet, it may be instructive to check out the data structure comparisons by @chenyan-dfinity here:
The RBTree could have better performance as a BTree, for a few reasons:
|
Thanks for the support. I also like the As its main author from many years ago, I apologize for the current poor situation regarding its unit tests, and the bugs you found in these lesser-used operations ( The backstory for this structure is that it was one of the first modules I wrote in Motoko, and I even wrote it initially before Motoko had variant types! (I initially encoded them with the other types it had at the time). At some point along the way, it was important to add unit tests but I was probably busy doing something else, and it didn't happen. Again, sorry for that. FWIW, this PR finally adds unit tests for My conclusion after doing those tests is that the The unit tests I wrote demonstrate a pattern that could work for other operations, but not all. I need to think more about how to best test the binary operations like merging and joining. |
In my test example above, the Tire.put operation also breaks the Trie structure.
Destruction occurred here
|
By the way, the underlying data structure will be less efficient to implement in motoko, is it possible to write it at Prim? |
That sounds exciting! Can it be made public? |
@chenyan-dfinity Can we make this repo public? Apologies, I didn't realize that was internal only! |
This might be ok™: The It might be clearer to not use |
If there is canister in production that uses |
We had this issue during a stress test. orders used the Trie module and had a total of 798 records.
Only this data was encountered unreadable, most of the other data can be read normally.
canister-id: ynbvm-hiaaa-aaaak-adbqa-cai
code:
execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieFilter
returns:
code:
execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieMapFilter
returns:
code:
execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieGet
returns:
It is also
code:
execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieFind
returns:
It is also
The text was updated successfully, but these errors were encountered: