Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trie.get()/Trie.find() does not return data, but the data is present. #427

Closed
iclighthouse opened this issue Nov 16, 2022 · 40 comments · Fixed by #431
Closed

Trie.get()/Trie.find() does not return data, but the data is present. #427

iclighthouse opened this issue Nov 16, 2022 · 40 comments · Fixed by #431

Comments

@iclighthouse
Copy link

iclighthouse commented Nov 16, 2022

We had this issue during a stress test. orders used the Trie module and had a total of 798 records.

Only this data was encountered unreadable, most of the other data can be read normally.

canister-id: ynbvm-hiaaa-aaaak-adbqa-cai

  1. Can get the data using Trie.filter().
    code:
private func keyb(t: Blob) : Trie.Key<Blob> { return { key = t; hash = Blob.hash(t) }; };

public query func testTrieFilter() : async TrieList<Txid, TradingOrder>{
        let _txid : Text = "000000c28ca1e305eeaeaccdd069e91a45f9668d6a3407e31a5c0db7c9372b28"; // blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+("
        var txid : Blob = Blob.fromArray([]);
        switch(Hex.decode(_txid)){
            case(#ok(r)){ txid := Blob.fromArray(r); };
            case(_){};
        };
        var trie = icdex_orders;
        let page = 1;
        let size = 10;
        trie := Trie.filter<Txid, TradingOrder>(trie, func (k:Txid, v:TradingOrder): Bool{ 
            k == txid; 
        });
        return trieItems<Txid, TradingOrder>(trie, page, size);
    };

execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieFilter
returns:

(
  record {
    total = 1 : nat;
    data = vec {
      record {
        blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+(";
        record {
          fee = record { fee0 = 0 : nat; fee1 = 0 : nat };
          gas = record { gas0 = 0 : nat; gas1 = 0 : nat };
          status = variant { Pending };
          toids = vec {};
          data = null;
          time = 1_668_476_725_633_900_968 : int;
          txid = blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+(";
          icrc1Account = opt record {
            owner = principal "vlpmr-zi3rn-n6nwy-qx6jf-snrkm-nwoje-jghma-7j3lv-irskf-yojl3-eqe";
            subaccount = null;
          };
          orderType = variant { LMT };
          filled = vec {};
          expiration = 1_676_252_725_633_900_968 : int;
          nonce = 194 : nat;
          account = blob "\c1\e9\a9\c94\8f\ef\c3\e04\c9\08L\cd\0b\b0\0f\c4\9a\8e\c2\0a\0f\89\14\b2\a5R\f3\c3P\bc";
          remaining = record {
            quantity = variant { Sell = 100_000_000 : nat };
            price = 300 : nat;
          };
          index = 940 : nat;
          orderPrice = record {
            quantity = variant { Sell = 100_000_000 : nat };
            price = 300 : nat;
          };
          refund = record { 0 : nat; 0 : nat; 0 : nat };
        };
      };
    };
    totalPage = 1 : nat;
  },
)
  1. Can get the data using Trie.mapFilter().
    code:
public query func testTrieMapFilter() : async TrieList<Txid, TradingOrder>{
        let _txid : Text = "000000c28ca1e305eeaeaccdd069e91a45f9668d6a3407e31a5c0db7c9372b28"; // blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+("
        var txid : Blob = Blob.fromArray([]);
        switch(Hex.decode(_txid)){
            case(#ok(r)){ txid := Blob.fromArray(r); };
            case(_){};
        };
        var trie = icdex_orders;
        let page = 1;
        let size = 10;
        trie := Trie.mapFilter<Txid, TradingOrder, TradingOrder>(trie, func (k:Txid, v:TradingOrder): ?TradingOrder{ 
            if (k == txid){
                return ?v;
            }else { return null; }; 
        });
        return trieItems<Txid, TradingOrder>(trie, page, size);
    };

execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieMapFilter
returns:

(
  record {
    total = 1 : nat;
    data = vec {
      record {
        blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+(";
        record {
          fee = record { fee0 = 0 : nat; fee1 = 0 : nat };
          gas = record { gas0 = 0 : nat; gas1 = 0 : nat };
          status = variant { Pending };
          toids = vec {};
          data = null;
          time = 1_668_476_725_633_900_968 : int;
          txid = blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+(";
          icrc1Account = opt record {
            owner = principal "vlpmr-zi3rn-n6nwy-qx6jf-snrkm-nwoje-jghma-7j3lv-irskf-yojl3-eqe";
            subaccount = null;
          };
          orderType = variant { LMT };
          filled = vec {};
          expiration = 1_676_252_725_633_900_968 : int;
          nonce = 194 : nat;
          account = blob "\c1\e9\a9\c94\8f\ef\c3\e04\c9\08L\cd\0b\b0\0f\c4\9a\8e\c2\0a\0f\89\14\b2\a5R\f3\c3P\bc";
          remaining = record {
            quantity = variant { Sell = 100_000_000 : nat };
            price = 300 : nat;
          };
          index = 940 : nat;
          orderPrice = record {
            quantity = variant { Sell = 100_000_000 : nat };
            price = 300 : nat;
          };
          refund = record { 0 : nat; 0 : nat; 0 : nat };
        };
      };
    };
    totalPage = 1 : nat;
  },
)
  1. Cannot query the data using Trie.get()
    code:
public query func testTrieGet() : async OrderStatusResponse{
        let _txid : Text = "000000c28ca1e305eeaeaccdd069e91a45f9668d6a3407e31a5c0db7c9372b28"; // blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+("
        var txid : Blob = Blob.fromArray([]);
        switch(Hex.decode(_txid)){
            case(#ok(r)){ txid := Blob.fromArray(r); };
            case(_){};
        };
        switch(Trie.get(icdex_orders, keyb(txid), Blob.equal)){
            case(?(order)){ return #Pending(order); };
            case(_){ return #None; };
        };
    };

execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieGet
returns:

(variant { None })

It is also

(variant { 870_530_776 })
  1. Cannot query the data using Trie.find()
    code:
public query func testTrieFind() : async OrderStatusResponse{
        let _txid : Text = "000000c28ca1e305eeaeaccdd069e91a45f9668d6a3407e31a5c0db7c9372b28"; // blob "\00\00\00\c2\8c\a1\e3\05\ee\ae\ac\cd\d0i\e9\1aE\f9f\8dj4\07\e3\1a\5c\0d\b7\c97+("
        var txid : Blob = Blob.fromArray([]);
        switch(Hex.decode(_txid)){
            case(#ok(r)){ txid := Blob.fromArray(r); };
            case(_){};
        };
        switch(Trie.find(icdex_orders, keyb(txid), Blob.equal)){
            case(?(order)){ return #Pending(order)};
            case(_){ return #None; };
        };
    };

execute:
dfx canister --network ic call ynbvm-hiaaa-aaaak-adbqa-cai testTrieFind
returns:

(variant { None })

It is also

(variant { 870_530_776 })
@iclighthouse iclighthouse changed the title Cannot query the data using Trie.get(), but you can get the data using Trie.mapFilter(). Cannot query the data using Trie.get(), but you can get the data using Trie.filter() and Trie.mapFilter(). Nov 16, 2022
@iclighthouse iclighthouse changed the title Cannot query the data using Trie.get(), but you can get the data using Trie.filter() and Trie.mapFilter(). Cannot query the data using Trie.get()/Trie.find(), but you can get the data using Trie.filter()/Trie.mapFilter(). Nov 16, 2022
@iclighthouse iclighthouse changed the title Cannot query the data using Trie.get()/Trie.find(), but you can get the data using Trie.filter()/Trie.mapFilter(). Trie.get()/Trie.find() does not return data, but the data is present. Nov 16, 2022
@iclighthouse
Copy link
Author

Environment:
MacBook
dfx 0.11.2

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 16, 2022

Thanks for reporting.

Could you try testing to see if the Trie invariants are being broken somehow?

If the find algorithm (which is dead simple) cannot find the item, but it's actually there (via filter, etc) then it seems likely that something is breaking the structural invariants of the trie itself, so it no longer represents the right thing.

There's a public function called isValid that was written long ago. It should be used in the unit tests, but is not. (Here's a small step towards correcting that.)

Could you experiment with that test within your tests (ideally after after change)? If it's violated at some point, then we know what change to the Trie corrupted the structural invariants, making items impossible to find.

Alternatively, if you post your entire test script, I could try to insert the uses of isValid myself.

@matthewhammer
Copy link
Contributor

As I consider the possible causes, I have an additional check I'd make of the code, if I could:

Namely, that there are no mutations to the objects in the Trie. (The objects in the trie are immutable).

Or, if they are mutated, then the mutated fields do not affect their hash or equality relationships. (If this is violated, it would explain the behavior you have now).

@crusso
Copy link
Contributor

crusso commented Nov 16, 2022

As I consider the possible causes, I have an additional check I'd make of the code, if I could:

Namely, that there are no mutations to the objects in the Trie. (The objects in the trie are immutable).

Or, if they are mutated, then the mutated fields do not affect their hash or equality relationships. (If this is violated, it would explain the behavior you have now).

@matthewhammer do you mean the keys, or the values? The keys here appear to be Blobs, thus immutable.

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 16, 2022

I mean the keys. The values are not relevant for any of the structural invariants of the Trie.

It is the hash of the keys that matters, where the bits of the hash determine the Trie's shape. (The trie uses these bits as needed, depending on its total size; so for n items, log_2(n) bits are used.)

If the key were to mutate, the placement in the tree would be "wrong" and would not help when locating it based on its (new) hash value.

It's also important (but not relevant for this issue) that the hash be "mostly injective", sending different keys to different hashes, almost always. There is a little tolerance for collisions, but it's "small" (like eight per hash value).

The keys here appear to be Blobs, thus immutable.

Why are they using the var keyword in these snippets?

@iclighthouse
Copy link
Author

public query func testTrieValid() : async Bool{
        return Trie.isValid(icdex_orders, true);
    };

returns :

(false)

@iclighthouse
Copy link
Author

If a Canister (e.g. a token) with a lot of data encounters this problem, how do I fix it?

@iclighthouse
Copy link
Author

Possible causes.
Cycles are heavily pre-charged during stress tests, and then Cycles are mostly returned as the executions continue to complete.
Trie's should be broken when there were not enough Cycles are pre-charged and the system did not report an error. This situation is not really a lack of available Cycles.

@chenyan-dfinity
Copy link
Contributor

Trie's should be broken when there were not enough Cycles are pre-charged and the system did not report an error.

I think in this case, the system will still report an error as not enough cycle.

@nomeata
Copy link
Contributor

nomeata commented Nov 17, 2022

There is a little tolerance for collisions, but it's "small" (like eight per hash value).

What happens when you have more than 8 keys with the same hash value?

@iclighthouse
Copy link
Author

A hash value has 4 bytes? (I think that's a little small)

So, what is the safe capacity that a Trie can store?
Errors in smart contracts due to hidden restrictions can be very costly. If such restrictions are not explicitly pointed out and the dapps that encounter such limitations should have a large number of users and assets, then the damage to the ecosystem will be significant.

@nomeata
Copy link
Contributor

nomeata commented Nov 17, 2022

Yes, especially as the Hash functions used are rather … naive, so it is likely that collisions can be actively provoked by an attacker. It’s already bad enough if performance degrades in that case (and that alone might be reason enough not to use a hash-based data structure in code under threat), but silent functional breakage is of course intolerable, and would be very embarrassing, to put it mildly…

@nomeata
Copy link
Contributor

nomeata commented Nov 17, 2022

Testing this code

import TrieMap "mo:base/TrieMap";
import Iter "mo:base/Iter";
import Hash "mo:base/Hash";
import Array "mo:base/Array";
import Debug "mo:base/Debug";

let vs = Array.map([1,2,3,4,5,6,7,8,9,10], func (n : Nat) : Nat = n * 2**32);

type T = TrieMap.TrieMap<Nat,Nat>;
let t : T = TrieMap.fromEntries(
  Iter.map<Nat,(Nat,Nat)>(vs.vals(), func (n) = (n,n)),
  func (n1 : Nat, n2 : Nat) : Bool = n1 == n2,
  func (n: Nat) : Hash.Hash = 0 // simulate hash collisions
);

for (n in vs.vals()) {
  if (t.get(n) == ?n) {
    Debug.print(debug_show n # " ok")
  } else {
    Debug.print(debug_show n # " not ok")
  }
}

I get an internal assertion

assertion failed at Hash.mo:16.5-16.27

so it seems that the Trie structure can’t handle that, but at least it falls over loudly. Slightly better than silently corrupting the data structure. Still worrysome.

@iclighthouse
Copy link
Author

iclighthouse commented Nov 17, 2022

Possible causes. Cycles are heavily pre-charged during stress tests, and then Cycles are mostly returned as the executions continue to complete. Trie's should be broken when there were not enough Cycles are pre-charged and the system did not report an error. This situation is not really a lack of available Cycles.

It should not be caused by Cycles (But I'm really not sure). The stress test also triggers this issue when there are enough Cycles in the canister.

@matthewhammer
Copy link
Contributor

It would be helpful to see the full source code of the stress test canister. In particular, what part of the Trie API does it use? Only put and get (or find)?

@matthewhammer
Copy link
Contributor

I get an internal assertion

assertion failed at Hash.mo:16.5-16.27

so it seems that the Trie structure can’t handle that, but at least it falls over loudly. Slightly better than silently corrupting the data structure. Still worrysome.

The Trie tolerates a fixed number of collisions per hash, up to 8 currently. It's a constant in the code, and could be changed easily.

When it reaches that point, the current implementation gives an admittedly confusing "error message", in the form of that assertion failure you found.

But, if you understand the behavior, you shouldn't be "worried", you should just pick a real hash function. Even the "naive
hash functions in Hash do not seem to have this issue in practice, AFAIK.

In any case, this PR from 1.5 years ago was meant to fix that "error message", but it started some discussion that was never resolved, I guess: #250

Yes, especially as the Hash functions used are rather … naive, so it is likely that collisions can be actively provoked by an attacker.

Probably true, but not relevant, so let's not just be needlessly negative here. Thanks.

(Also, feel free to open a PR with some less naive contribution at any point.)

@matthewhammer
Copy link
Contributor

A hash value has 4 bytes? (I think that's a little small)

So, what is the safe capacity that a Trie can store? Errors in smart contracts due to hidden restrictions can be very costly. If such restrictions are not explicitly pointed out and the dapps that encounter such limitations should have a large number of users and assets, then the damage to the ecosystem will be significant.

I agree that 4 bytes may be too small in the limit, yes. For the test case above (under 1k items) it should be more than plenty.

Even for a million entries, or hundreds of millions, it should still be perfectly fine, assuming the hash function is reasonable.

(Why would you think otherwise? Or, at what larger size do you want to store data?)

FWIW, the Motoko Playground has used the Trie to support its "save" and "share" features since summer 2021:

https://github.com/dfinity/motoko-playground/pull/57/files#diff-9cc04944593f6c9727e7baffce13b2d62bd26d0a55be30babfd84ea985f85b41

It would be interesting to test that isValid check on its master Trie of all such saved projects, and we should try that. But I haven't heard of anyone trying to do a "save" and then it failing to load later.

@crusso
Copy link
Contributor

crusso commented Nov 17, 2022

I think this code is using Blob.hash which is crc32 hash (unless the implementation is broken). Right @nomeata?

@nomeata
Copy link
Contributor

nomeata commented Nov 17, 2022

Yes, CRC32, so rather trivial to find collisions and, for example, register with a service with usernames that collide.

@matthewhammer
Copy link
Contributor

Yes, CRC32, so rather trivial to find collisions and, for example, register with a service with usernames that collide.

"Trivial"? You assume that the attacker controls the entire input string to the hash, right? That does not seem to be the case here, though, or in any case where there's even a "salt" value hidden by the canister (something simple to do if one was worried, though not implemented here).

Perhaps even with those measures, it could be possible to find collisions using user-controlled inputs.

Even so, who cares? Having collisions is only "bad" if the original data is lost, and hashes are totally confused as identities or “one-way secrets” where the hash is supposed to conceal info. That’s not relevant here, AFAIK, and usually not what people need from a hash function to use it for a collection which tolerates some collisions.

so it is likely that collisions can be actively provoked by an attacker. It’s already bad enough if performance degrades in that case (and that alone might be reason enough not to use a hash-based data structure in code under threat), but silent functional breakage is of course intolerable, and would be very embarrassing, to put it mildly…

Again, nothing is breaking silently when there is a collision.

As for eschewing hashing in favor of comparison-based collections. I can see that working in many cases, but not all.

For the Playground, comparing saved project entries means comparing the content of files as huge strings, whose prefixes are not very unique (import base, blah blah).

If an adversary wanted, they could save lots of programs that all share the most common prefixes, making everything slower.

Between that and the extra rebalancing costs of comparison-based structures, I worry about advocating comparison-based structures for all use cases. It's certainly not a silver bullet.

I think devs need to consider each structure and how it will be accessed in their Dapp, and who has control over its keys and insertions of them. Sometimes, hashing is better, and perhaps in other cases, one would really want comparison instead.

Each situation is different, and requires its own consideration.

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 17, 2022

It would be helpful to see the full source code of the stress test canister. In particular, what part of the Trie API does it use? Only put and get (or find)?

Just in case my plea above got buried in the discussion, I want to point at it again.

I can offer much more effective assistance if you share your test canister code, or at least the parts that build or changes
the Trie.

Seeing the accesses that work or do not is also instructive, but that's not where the structure is being corrupted.

If for some reason you cannot share the code. Please re-run the code that builds the Trie with test data and use isValid after each mutation, to isolate the one that is corrupting the Trie. Let me know what operation(s) those are.

If they are merely uses of put or replace, I will be surprised, and it will indicate some serious issue with the Trie's algorithm that have been undiscovered until now. Being able to reproduce that locally will be important for me to fix it.

@nomeata
Copy link
Contributor

nomeata commented Nov 17, 2022

"Trivial"? You assume that the attacker controls the entire input string to the hash, right? That does not seem to be the case here, though, or in any case where there's even a "salt" value hidden by the canister (something simple to do if one was worried, though not implemented here).

Developers who are worried are not the ones at risk. It's those that are not worried that we have to worry about. And I have seen plenty of real world Motoko code that's live on the IC that uses Maps (HashMap mostly, I believe, luckily, although I should check) with keys being use-chosen blobs (principals for example). That's why this worries me so much (less now that I saw it traps instead of silently becoming invalid, but is that true for all usage patterns?)

@iclighthouse
Copy link
Author

iclighthouse commented Nov 18, 2022

Just in case my plea above got buried in the discussion, I want to point at it again.

I can offer much more effective assistance if you share your test canister code, or at least the parts that build or changes the Trie.

Seeing the accesses that work or do not is also instructive, but that's not where the structure is being corrupted.

If for some reason you cannot share the code. Please re-run the code that builds the Trie with test data and use isValid after each mutation, to isolate the one that is corrupting the Trie. Let me know what operation(s) those are.

If they are merely uses of put or replace, I will be surprised, and it will indicate some serious issue with the Trie's algorithm that have been undiscovered until now. Being able to reproduce that locally will be important for me to fix it.

We are trying to build test code that reproduces this issue.

The Trie API that has been used in the code includes,

  • Trie.empty
  • Trie.put
  • Trie.remove
  • Trie.filter
  • Trie.mapFilter
  • Trie.get
  • Trie.iter

This is all.

Other symptoms of stress testing in case of issue.

  • input/output message queue size close to 500 (sometimes self-call errors are encountered)
  • 100+ accesses asynchronously being processed at the same time.
  • Cycles were heavily pre-charged, about 1.5 TCycles, and finally returned about 1.4 TCycles.
  • Sometimes an item is not readable via Trie.get, but becomes readable again when the Trie has new data stored.

I think that Trie.put has an issue when refactoring the tree structure. Or, Trie.get/find is at some critical value when routing based on hash (e.g. in the case of equality at comparison).

@iclighthouse
Copy link
Author

iclighthouse commented Nov 18, 2022

Developers who are worried are not the ones at risk. It's those that are not worried that we have to worry about. And I have seen plenty of real world Motoko code that's live on the IC that uses Maps (HashMap mostly, I believe, luckily, although I should check) with keys being use-chosen blobs (principals for example). That's why this worries me so much (less now that I saw it traps instead of silently becoming invalid, but is that true for all usage patterns?)

If the hash is 4 bytes in the discrete case, the probability of one collision is 1/(2**32) = 2.3E-10.
The probability of more than 8 collisions is 1/((2**32)**8) = 8.6E-78.
So it can be considered safe.
However, in the non-discrete case, if the hacker influences the hash value by the input value, there is an attack vector.

@iclighthouse
Copy link
Author

iclighthouse commented Nov 18, 2022

We reproduced the issue in test canister po4px-oiaaa-aaaak-acubq-cai
If you want to redeploy canister, you need to write the trieLog data in manually.

Testing:

dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reset
dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reproduce '(false)'

the Trie is broken when putting item (txid = blob "\00\00\00\0b\0e}\a7\b0\12\1d\9a|2J$\ea\04\8d\ddaS\ae\ed\f6M\ad\a0\e0\ab@\f0m")

code:

/**
 * Module     : Trie Testing
 * Author     : ICLighthouse Team
 */

import Trie "mo:base/Trie";
import Principal "mo:base/Principal";
import Blob "mo:base/Blob";
import Array "mo:base/Array";
import Nat "mo:base/Nat";
import Nat32 "mo:base/Nat32";
import Time "mo:base/Time";
import Option "mo:base/Option";
import Iter "mo:base/Iter";
import List "mo:base/List";

// Canister-id is po4px-oiaaa-aaaak-acubq-cai
// If you want to redeploy canister, you need to write the trieLog data in manually.
// Testing:
//      dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reset
//      dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reproduce '(false)'

shared(installMsg) actor class ICDexPair() = this {
    // Types
    type Txid = Blob;
    type BalanceChange = {
        #DebitRecord: Nat; 
        #CreditRecord: Nat; 
        #NoChange;
    };
    type TradingOrder = {
        account: Blob;
        icrc1Account: ?{owner: Principal; subaccount: ?Blob; };
        txid: Blob;
        orderType: { #LMT; #FOK; #FAK; #MKT; };
        orderPrice: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
        time: Time.Time;
        expiration: Time.Time;
        toids: [Nat];
        remaining: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
        refund: (token0: Nat, token1: Nat, toid: Nat);
        filled: [{counterparty: Blob; token0Value: BalanceChange; token1Value: BalanceChange; time: Time.Time }];
        status: { #Todo; #Pending; #Closed; #Cancelled; };
        gas : { gas0: Nat; gas1: Nat; };
        fee : { fee0: Nat; fee1: Nat; };
        index : Nat;
        nonce: Nat;
        data: ?Blob;
    };

    // Variables
    private stable var index : Nat = 0;
    private stable var icdex_orders : Trie.Trie<Txid, TradingOrder> = Trie.empty();

    // The trieLog data comes from the canister where the issue was found and has been pre-stored here.
    // The data can be queried by `testLog '(null)'`
    private stable var trieLog = List.nil<(Txid, Text, Bool)>(); // (Txid, Method, IsValid)

    private func keyb(t: Blob) : Trie.Key<Blob> { return { key = t; hash = Blob.hash(t) }; };
    private func buildItem(_txid: Blob) : TradingOrder{
        let tradingOrder: TradingOrder = {
                account = Blob.fromArray([]);
                icrc1Account = ?{owner = Principal.fromText("aaaaa-aa"); subaccount = null; };
                txid = _txid;
                orderType = #LMT;
                orderPrice = { quantity = #Sell(50000); price = 10000; };
                time = Time.now();
                expiration = Time.now() + 10000*1000000000;
                remaining = { quantity = #Sell(50000); price = 10000; };
                toids = [];
                filled = [];  
                status = #Todo;
                refund = (0, 0, 0);
                gas = {gas0=0; gas1=0};
                fee = {fee0=0; fee1=0};
                index = index;
                nonce = index;
                data = null;
            };
        index += 1;
        return tradingOrder;
    };

    private stable var step : Nat = 0;
    public func reset() : async Bool{
        icdex_orders := Trie.empty();
        step := 0;
        return Trie.isValid(icdex_orders, true);
    };
    public func reproduce(enStep: Bool) : async (Txid, Text, Bool){ // Returns (Txid, Method, IsValid)
        var resTxid : Blob = Blob.fromArray([]);
        var resMethod : Text = "";
        var resTrieValid : Bool = true;
        var _step : Nat = 0;
        label test for ((txid, method, isValid) in Array.reverse(List.toArray(trieLog)).vals()){
            if (_step == step){
                step += 1;
                if (method == "put"){
                    icdex_orders := Trie.put(icdex_orders, keyb(txid), Blob.equal, buildItem(txid)).0;
                }else if (method == "remove"){
                    icdex_orders := Trie.remove(icdex_orders, keyb(txid), Blob.equal).0;
                }else if (method == "filter"){
                    icdex_orders := Trie.filter(icdex_orders, func (k: Txid, v: TradingOrder):Bool{ true });
                };
                if (not(enStep) and not(Trie.isValid(icdex_orders, true))){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := false;
                    break test;
                }else if (enStep){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := Trie.isValid(icdex_orders, true);
                    break test;
                };
            };
            _step += 1;
        };
        return (resTxid, resMethod, resTrieValid);
    };

    public query func testLog(_isValid: ?Bool) : async [(Txid, Text, Bool)]{
        switch(_isValid){
            case(?(isValid)){ 
                let list = List.filter(trieLog, func (t:(Txid, Text, Bool)): Bool{
                    t.2 == isValid
                });
                return List.toArray(list);
            };
            case(_){ 
                return List.toArray(trieLog); 
            };
        };
    };
    public query func testTrieValid() : async Bool{
        return Trie.isValid(icdex_orders, true);
    };
    


};

@iclighthouse
Copy link
Author

It is recommended to do stress tests on all modules of the base library, and to issue special documents to tell developers under what scenarios to use. Give developers a basis for making decisions.

@nomeata
Copy link
Contributor

nomeata commented Nov 18, 2022

Just reading through the code and trying to understand how it works. Should bitpos be bumped in the call to leaf in

rec(bitpos, branch(leaf(ll, bitpos), leaf(lr, bitpos)), tr)

@nomeata
Copy link
Contributor

nomeata commented Nov 18, 2022

Ah, and since you mention filter, that also looks fishy: this code looks like it's doing Trie path compression, but the tries here don't support that, as the bitpos of each branch is inferred from the node depth. So this will easily break the structure:

motoko-base/src/Trie.mo

Lines 981 to 988 in add126e

case (#branch(b)) {
let fl = rec(b.left, bitpos + 1);
let fr = rec(b.right, bitpos + 1);
switch (isEmpty(fl), isEmpty(fr)) {
case (true, true) { #empty };
case (false, true) { fl };
case (true, false) { fr };
case (false, false) { branch(fl, fr) };

@crusso
Copy link
Contributor

crusso commented Nov 18, 2022

@iclighthouse @matthewhammer

Thanks for the repro!

I've managed to create a variant of your code with some additional methods to retrieve the log from your canister (method setToRemoteLog()) and uses a new checkValid function to report why Trie.isValid fails with an error message.

There are also methods to view the log as candid or Motoko values (but it is large), in case someone want to extract a non-IC repro.

https://m7sm4-2iaaa-aaaab-qabra-cai.raw.ic0.app/?tag=2343906604

After calling setToRemotLog(), the error I get for reproduce(false) is:

(vec {0; 0; 0; 18; 136; 81; 61; 140; 15; 108; 43; 193; 62; 223; 38; 115; 34; 81; 63; 200; 39; 211; 72; 78; 34; 110; 44; 210; 105; 64; 228; 186}, "put", false, "len > MAX_LEAF_SIZE9")

@matthewhammer you should be able to experiment with this code by just using upgrades to add code without have to refetch the data each time. The log is stored in a stable variable.

This is the modified canister at the link:

/**
 * Module     : Trie Testing
 * Author     : ICLighthouse Team
 */

import Trie "mo:base/Trie";
import Principal "mo:base/Principal";
import Blob "mo:base/Blob";
import Array "mo:base/Array";
import Nat "mo:base/Nat";
import Nat32 "mo:base/Nat32";
import Time "mo:base/Time";
import Option "mo:base/Option";
import Iter "mo:base/Iter";
import List "mo:base/List";
import Hash "mo:base/Hash";

// Canister-id is po4px-oiaaa-aaaak-acubq-cai
// If you want to redeploy canister, you need to write the trieLog data in manually.
// Testing:
//      dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reset
//      dfx canister --network ic call po4px-oiaaa-aaaak-acubq-cai reproduce '(false)'

shared(installMsg) actor class ICDexPair() = this {

   type Trie<K,V> = Trie.Trie<K,V>;
   type Key<K> = Trie.Key<K>;
   let MAX_LEAF_SIZE = 8;

   var error : Text = "";
   func debugPrint(t : Text) : () {
     error #= t;
   };
       /// Checks the invariants of the trie structure, including the placement of keys at trie paths
   func checkValid<K, V>(t : Trie<K, V>, enforceNormal : Bool) : Bool {
    func rec(t : Trie<K, V>, bitpos : ?Hash.Hash, bits : Hash.Hash, mask : Hash.Hash) : Bool {
      switch t {
        case (#empty) {
          switch bitpos {
            case null { true };
            case (?_) { 
              not enforceNormal };
          }
        };
        case (#leaf(l)) {
          let len = List.size(l.keyvals);
          ((len <= MAX_LEAF_SIZE) or (not enforceNormal) or
           (do { debugPrint("len > MAX_LEAF_SIZE" # debug_show len); false })
          )
          and
          (len == l.size
            or
           (do { debugPrint("len != "); false }))
          and
          ( List.all(
              l.keyvals,
              func ((k : Key<K>, v : V)) : Bool {
                ((k.hash & mask) == bits)
                or
                (do {
                   debugPrint("\nmalformed hash!:\n");
                   debugPrint(debug_show k.hash);
                   debugPrint("\n (key hash) != (path bits): \n");
                   debugPrint(debug_show bits);
                   debugPrint("\nmask  : ");
                   debugPrint(debug_show(mask));
                   debugPrint("\n");
                   false
                  })
                 }
               )
              or
             (do { debugPrint("one or more hashes are malformed"); false })
          )
        };
        case (#branch(b)) {
          let bitpos1 =
            switch bitpos {
              case null  {Nat32.fromNat(0)};
              case (?bp) {Nat32.fromNat(Nat32.toNat(bp) + 1)}
            };
          let mask1 = mask | (Nat32.fromNat(1) << bitpos1);
          let bits1 = bits | (Nat32.fromNat(1) << bitpos1);
          let sum = Trie.size(b.left) + Trie.size(b.right);
          (b.size == sum
           or (do { debugPrint("malformed size"); false })
           )
          and
          rec(b.left,  ?bitpos1, bits,  mask1)
          and
          rec(b.right, ?bitpos1, bits1, mask1)
        };
      }
    };
    rec(t, null, 0, 0)
  };

    // Types
    type Txid = Blob;
    type BalanceChange = {
        #DebitRecord: Nat; 
        #CreditRecord: Nat; 
        #NoChange;
    };
    type TradingOrder = {
        account: Blob;
        icrc1Account: ?{owner: Principal; subaccount: ?Blob; };
        txid: Blob;
        orderType: { #LMT; #FOK; #FAK; #MKT; };
        orderPrice: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
        time: Time.Time;
        expiration: Time.Time;
        toids: [Nat];
        remaining: { quantity: {#Buy: (quantity: Nat, amount: Nat); #Sell: Nat; }; price: Nat; };
        refund: (token0: Nat, token1: Nat, toid: Nat);
        filled: [{counterparty: Blob; token0Value: BalanceChange; token1Value: BalanceChange; time: Time.Time }];
        status: { #Todo; #Pending; #Closed; #Cancelled; };
        gas : { gas0: Nat; gas1: Nat; };
        fee : { fee0: Nat; fee1: Nat; };
        index : Nat;
        nonce: Nat;
        data: ?Blob;
    };

    // Variables
    private stable var index : Nat = 0;
    private stable var icdex_orders : Trie.Trie<Txid, TradingOrder> = Trie.empty();

    // The trieLog data comes from the canister where the issue was found and has been pre-stored here.
    // The data can be queried by `testLog '(null)'`
    private stable var trieLog = List.nil<(Txid, Text, Bool)>(); // (Txid, Method, IsValid)

    private func keyb(t: Blob) : Trie.Key<Blob> { return { key = t; hash = Blob.hash(t) }; };
    private func buildItem(_txid: Blob) : TradingOrder{
        let tradingOrder: TradingOrder = {
                account = Blob.fromArray([]);
                icrc1Account = ?{owner = Principal.fromText("aaaaa-aa"); subaccount = null; };
                txid = _txid;
                orderType = #LMT;
                orderPrice = { quantity = #Sell(50000); price = 10000; };
                time = Time.now();
                expiration = Time.now() + 10000*1000000000;
                remaining = { quantity = #Sell(50000); price = 10000; };
                toids = [];
                filled = [];  
                status = #Todo;
                refund = (0, 0, 0);
                gas = {gas0=0; gas1=0};
                fee = {fee0=0; fee1=0};
                index = index;
                nonce = index;
                data = null;
            };
        index += 1;
        return tradingOrder;
    };

    private stable var step : Nat = 0;
    public func reset() : async Bool{
        icdex_orders := Trie.empty();
        step := 0;
        return Trie.isValid(icdex_orders, true);
    };
    public func reproduce(enStep: Bool) : async (Txid, Text, Bool, Text){ // Returns (Txid, Method, IsValid)
        error := "";
        var resTxid : Blob = Blob.fromArray([]);
        var resMethod : Text = "";
        var resTrieValid : Bool = true;
        var _step : Nat = 0;
        label test for ((txid, method, isValid) in Array.reverse(List.toArray(trieLog)).vals()){
            if (_step == step){
                step += 1;
                if (method == "put"){
                    icdex_orders := Trie.put(icdex_orders, keyb(txid), Blob.equal, buildItem(txid)).0;
                }else if (method == "remove"){
                    icdex_orders := Trie.remove(icdex_orders, keyb(txid), Blob.equal).0;
                }else if (method == "filter"){
                    icdex_orders := Trie.filter(icdex_orders, func (k: Txid, v: TradingOrder):Bool{ true });
                };
                if (not(enStep) and not(checkValid(icdex_orders, true))){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := false;
                    break test;
                }else if (enStep){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := checkValid(icdex_orders, true);
                    break test;
                };
            };
            _step += 1;
        };
        return (resTxid, resMethod, resTrieValid, error);
    };

    public query func testLog(_isValid: ?Bool) : async [(Txid, Text, Bool)]{
        switch(_isValid){
            case(?(isValid)){ 
                let list = List.filter(trieLog, func (t:(Txid, Text, Bool)): Bool{
                    t.2 == isValid
                });
                return List.toArray(list);
            };
            case(_){ 
                return List.toArray(trieLog); 
            };
        };
    };
    public query func testTrieValid() : async Bool{
        return Trie.isValid(icdex_orders, true);
    };


    public func setToRemoteLog() : async [(Txid,Text, Bool)] {
      let remote = actor "po4px-oiaaa-aaaak-acubq-cai" : actor {
           testLog : query ?Bool -> async [(Txid,Text, Bool)];
       };
      let rlog = await remote.testLog(null);
      trieLog := List.fromArray(rlog);
      rlog;

    };

    public func testLogText() : async Text {
      debug_show trieLog;
    }
    

};

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 18, 2022

Ah, and since you mention filter, that also looks fishy: this code looks like it's doing Trie path compression, but the tries here don't support that, as the bitpos of each branch is inferred from the node depth. So this will easily break the structure:

motoko-base/src/Trie.mo

Lines 981 to 988 in add126e

case (#branch(b)) {
let fl = rec(b.left, bitpos + 1);
let fr = rec(b.right, bitpos + 1);
switch (isEmpty(fl), isEmpty(fr)) {
case (true, true) { #empty };
case (false, true) { fl };
case (true, false) { fr };
case (false, false) { branch(fl, fr) };

Yes, I totally agree. Thank you for spotting this invariant violation! It's a little subtle (unless one really internalizes the invariant about paths and bits, as you did). Thanks again.

To fix that code, it should only trim the case where there's a binary node and both of its subtrees are empty. In that case, there is no key below (in a subtree) that gets misaligned.

Update:

Just reading through the code and trying to understand how it works. Should bitpos be bumped in the call to leaf in

rec(bitpos, branch(leaf(ll, bitpos), leaf(lr, bitpos)), tr)

Thanks for catching this. It doesn't appear relevant to this specific issue, but could may well be a separate bug of its own. I'll look at it more closely after we resolve the buggy behavior here.

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 18, 2022

And I have seen plenty of real world Motoko code that's live on the IC that uses Maps (HashMap mostly, I believe, luckily, although I should check) with keys being use-chosen blobs (principals for example). That's why this worries me so much (less now that I saw it traps instead of silently becoming invalid, but is that true for all usage patterns?)

The Trie should never become invalid, no matter how the API is used.

That it did become invalid here was a result of the logical error in filter and mapFilter that you found. Since neither of those functions are used in the Playground (and neither is remove, actually), these bugs didn't come up in that simple use case of the Trie. Once the remaining latent bugs are found (if any), I expect the API to ensure structural integrity. (You found something suspicious in the merge logic, and that needs more testing too.)

However, even if the Trie is structured correctly at all times (as it should be), its hash function (independent of the Trie itself) could still be attacked, and I'm beginning to agree that's an issue that is worrying, and probably means that a comparison-based structure is the safest choice in most cases (or all cases).

FWIW, Timo Hanke has been following a related internal discussion that this one seeded. He wrote this today, and said I could share here:

I am also in favour of comparison-based data structures over hash-based ones.
If the keys are adversary controlled (generally user controlled) then hash-based data structures don’t work that well. And I don’t think changing from Nat32 to Nat64 helps changes that. There may be certain situation where it does, but not generally.

So maybe every use of HashMap or Trie should plan to migrate to a Motoko-based BTree, though none is in base today.

Relatedly, here's a candidate BTree in Motoko to consider.

@iclighthouse
Copy link
Author

iclighthouse commented Nov 19, 2022

So maybe every use of HashMap or Trie should plan to migrate to a Motoko-based BTree, though none is in base today.

Relatedly, here's a candidate BTree in Motoko to consider.

Trie is a good data structure, hopefully it will be fixed soon. 4 bytes length hash is safe for most use cases, such as using Principal or salted hash as key.

RBTree is also a kind of B-Tree, any comment on it?

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 21, 2022

RBTree is also a kind of B-Tree, any comment on it?

If you haven't yet, it may be instructive to check out the data structure comparisons by @chenyan-dfinity here:

The RBTree could have better performance as a BTree, for a few reasons:

  • Current RBTree is purely functional, which is nice for some uses but,
  • Would be faster and more space efficient if it were imperative (e.g., as a BTree) and,
  • By using a BTree sructure (not a binary RBTree), there'd be less overhead taken by pointers and rebalancing them, since each node would use arrays and only need pointers when these arrays "fill up".
  • Finally, because purely-functional removal from an RBTree is very complex (it requires a more complex color system than R or B), it is not implemented fully here in base (until someone does it, or does the BTree instead). The current implementation punts on removal a bit, by removing the data logically, but not reclaiming the key or rebalancing the tree. The tree remains balanced, but generally leaks space if the keys are coming and going frequently.

@matthewhammer
Copy link
Contributor

matthewhammer commented Nov 21, 2022

Trie is a good data structure, hopefully it will be fixed soon. 4 bytes length hash is safe for most use cases, such as using Principal or salted hash as key.

Thanks for the support. I also like the Trie, personally. (It has some uniquely good properties for incremental computing, which could be relevant in some future application of Motoko, for someone, someday.)

As its main author from many years ago, I apologize for the current poor situation regarding its unit tests, and the bugs you found in these lesser-used operations (filter and mapFilter).

The backstory for this structure is that it was one of the first modules I wrote in Motoko, and I even wrote it initially before Motoko had variant types! (I initially encoded them with the other types it had at the time). At some point along the way, it was important to add unit tests but I was probably busy doing something else, and it didn't happen. Again, sorry for that.

FWIW, this PR finally adds unit tests for put, remove, filter and mapFilter.

My conclusion after doing those tests is that the filter and mapFilter logic we had was breaking important invariants (the one that @nomeata was observing), but now they are fixed. Specifically, all the operations in that little suite preserve the invariants of the Trie, after each operation. They also appear to be functionally correct.

The unit tests I wrote demonstrate a pattern that could work for other operations, but not all. I need to think more about how to best test the binary operations like merging and joining.

@iclighthouse
Copy link
Author

iclighthouse commented Nov 22, 2022

Trie is a good data structure, hopefully it will be fixed soon. 4 bytes length hash is safe for most use cases, such as using Principal or salted hash as key.

Thanks for the support. I also like the Trie, personally. (It has some uniquely good properties for incremental computing, which could be relevant in some future application of Motoko, for someone, someday.)

As its main author from many years ago, I apologize for the current poor situation regarding its unit tests, and the bugs you found in these lesser-used operations (filter and mapFilter).

The backstory for this structure is that it was one of the first modules I wrote in Motoko, and I even wrote it initially before Motoko had variant types! (I initially encoded them with the other types it had at the time). At some point along the way, it was important to add unit tests but I was probably busy doing something else, and it didn't happen. Again, sorry for that.

FWIW, this PR finally adds unit tests for put, remove, filter and mapFilter.

My conclusion after doing those tests is that the filter and mapFilter logic we had was breaking important invariants (the one that @nomeata was observing), but now they are fixed. Specifically, all the operations in that little suite preserve the invariants of the Trie, after each operation. They also appear to be functionally correct.

The unit tests I wrote demonstrate a pattern that could work for other operations, but not all. I need to think more about how to best test the binary operations like merging and joining.

In my test example above, the Tire.put operation also breaks the Trie structure.

public func reproduce2(enStep: Bool) : async (Txid, Text, Bool){ // Returns (Txid, Method, IsValid)
        var resTxid : Blob = Blob.fromArray([]);
        var resMethod : Text = "";
        var resTrieValid : Bool = true;
        var _step : Nat = 0;
        label test for ((txid, method, isValid) in Array.reverse(List.toArray(trieLog)).vals()){
            if (_step == step){
                step += 1;
                if (method == "put"){
                    icdex_orders := Trie.put(icdex_orders, keyb(txid), Blob.equal, buildItem(txid)).0;
                };
                if (not(enStep) and not(Trie.isValid(icdex_orders, true))){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := false;
                    break test;
                }else if (enStep){
                    resTxid := txid;
                    resMethod := method;
                    resTrieValid := Trie.isValid(icdex_orders, true);
                    break test;
                };
            };
            _step += 1;
        };
        return (resTxid, resMethod, resTrieValid);
    };

Destruction occurred here

(
  blob "\00\00\00\08\c0e\1awH\aa\ee\a7\e3*A\b4\cc\05\e4\16\8a\ec\d5Qob\faIv\c4\1f\c2",
  "put",
  false,
)

@iclighthouse
Copy link
Author

By the way, the underlying data structure will be less efficient to implement in motoko, is it possible to write it at Prim?

@nomeata
Copy link
Contributor

nomeata commented Nov 22, 2022

If you haven't yet, it may be instructive to check out the data structure comparisons by @chenyan-dfinity here:

That sounds exciting! Can it be made public?

@matthewhammer
Copy link
Contributor

@chenyan-dfinity Can we make this repo public?

Apologies, I didn't realize that was internal only!

@nomeata
Copy link
Contributor

nomeata commented Nov 24, 2022

Just reading through the code and trying to understand how it works. Should bitpos be bumped in the call to leaf in

rec(bitpos, branch(leaf(ll, bitpos), leaf(lr, bitpos)), tr)

This might be ok™: The bitpos passed to leaf is wrong, but since the l1 and l2 are both short lists (< MAX_LEAF_SIZE), the leaf smart constructor will never create any branches, and the bitpos parameter is not used.

It might be clearer to not use leaf here, but #leaf directly.

@letmejustputthishere
Copy link
Member

If there is canister in production that uses TrieMap.filter, what is the recommended way to upgrade that to a new version of the base library without losing state? Is it enough to convert the TrieMap to an array in the preupgrade and then initalize a new TrieMap from that array after the upgrade?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants