Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add opt in reentrancy to soroban #1491

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

heytdep
Copy link
Contributor

@heytdep heytdep commented Nov 6, 2024

What

This PR adds two new host functions for contracts to opt in reentrancy and enable the immediate call to be re-entrant.

Implementation details

I've chosen to go with two separate host functions for two main reasons:

  1. It allows to easily verify which contracts are and are not using reentrancy looking at the imports sigs.
  2. It's fully backwards compatible.
  3. There's no overhead thanks to streamlined linking (CAP 55 or 54 iirc).

Other implementation alternatives would be for the contract to have somewhere a flag set to enable reentrancy either in the binary (e.g stylus contracts, even though it's a different story there since theirs is a heavily memory based implementation), but it limits a contract that wants to use both reentrant and non-reentrant functions and is in general a worse implementation approach imo since it likely requires xdr changes for the flag to be set.

Why

This feature has been previously requested, it seems initially by orbitcdp (likely for non-approval-based flash loans), or on the dev server (https://discord.com/channels/897514728459468821/966788672164855829/1303384724181352449).

I believe that enabling opt-in re-entrancy is in general a good idea. Especially if added as separate host functions for discoverability (easy to discover since the binary would have the reentrancy call signature explicitly hardcoded). There are some use cases here, including doing better, and more similarly to the EVM defi logic (think callbacks).

Known limitations

It's very important that:

  1. The sdk implementation advises against using reentrancy. For instance it could be set as an unsafe function.
  2. Contracts that make cross contract calls understand that they need safeguards in place if the imported binary has reentrancy enabled. I think that this could be also exposed in the contractimport! macro, i.e reading from the binary's imports and raising a warning if the imported binary uses reentrant calls.

@heytdep heytdep requested review from graydon, sisuresh, dmkozh and a team as code owners November 6, 2024 10:06
@heytdep
Copy link
Contributor Author

heytdep commented Nov 6, 2024

I think it's also worth pointing out that this change makes it so that it's not the caller contract that decides whether the called contract can be reentrant or not. Given Soroban's current codebase layout this was the obvious implementation imo, but maybe we should change the host function's name because current naming might get developers confused and thinking that If they use call and not call_reentrant then everything is fine, which could result in hacks.

Tests

Also, about tests, I have to admit I'm not completely familiar with the Soroban SDK, so I just manually imported the external function into the contract. Happy to change that once/if the sdk supports this. Currently my test binaries look much different from the others, e.g

#![no_std]
use soroban_sdk::{contract, contractimpl, Env, Val, Address, Vec, Symbol, vec};

#[link(wasm_import_module = "d")]
extern "C" {
    #[allow(improper_ctypes)]
    #[link_name = "1"]
    pub fn call_reentrant(contract: i64, func: i64, args: i64, ) -> i64;
}

#[contract]
pub struct Contract;

#[contractimpl]
impl Contract {
    pub fn do_reentry(env: Env, caller: Address) {
        let args: Vec<Val> = vec![&env];
        let func = Symbol::new(&env, "do_nothing");
        let called_val = caller.as_val().get_payload() as i64;
        let func_val = func.as_val().get_payload() as i64;
        let args_val = args.as_val().get_payload() as i64;
        
        unsafe {
            call_reentrant(called_val, func_val, args_val);
        };
    }
}

@heytdep
Copy link
Contributor Author

heytdep commented Nov 7, 2024

Following the discord discussions and concerns about the following situation: A calls B, B calls_reentrant A, A gets called without issues, i.e a situation where the caller has no control over whether the called contract makes a reentrant call or not; I've implemented a reentry guard to be specified by the caller contract which will make the linker check on whether the called contract (B in this case) uses reentry or not.

Now the caller contract looks like the following:

#![no_std]
use soroban_sdk::{contract, contractimpl, Env, Address, Symbol, TryIntoVal, Vec, Val};

#[link(wasm_import_module = "d")]
extern "C" {
    #[allow(improper_ctypes)]
    #[link_name = "_"]
    pub fn call_contract(contract: i64, func: i64, args: i64, ) -> i64;

    #[allow(improper_ctypes)]
    #[link_name = "3"]
    pub fn set_reentrant(enabled: i64, ) -> i64;
}

#[contract]
pub struct Contract;

#[contractimpl]
impl Contract {
    pub fn test_reentry(env: Env, called: Address) {
        let args: Vec<Val> = (env.current_contract_address(), ).try_into_val(&env).unwrap();
        let func = Symbol::new(&env, "do_reentry");
        let called_val = called.as_val().get_payload() as i64;
        let func_val = func.as_val().get_payload() as i64;
        let args_val = args.as_val().get_payload() as i64;

        let set_reentrant_flag = Val::from_bool(true).as_val().get_payload() as i64;
        
        unsafe {
            set_reentrant(set_reentrant_flag);
            call_contract(called_val, func_val, args_val);
        };
    }

    pub fn do_nothing(env: Env) {
        env.events().publish((Symbol::new(&env, "first_soroban_reentry"),), ());
    }
}

Note the set_reentrant, which is a new function that enables to switch reentry mode in the host's context.

This implementation makes the following assumptions which should be checked:

  1. VMs instantiated from the cache will always need a reentry guard. I'm really doubtful on whether this actually is correct.
  2. VMs instantiated for uploads and to check the protocol version should not need a reentry guard (also because these can't set the reentry enabled in the context so all uploads would fail).

Copy link
Member

@leighmcculloch leighmcculloch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heytdep Thanks for looking at how reentry could be enabled.

Please correct me if I'm interpreting incorrectly, but I think this change proposes two ways to enable reentry.

I think there are benefits to there being a single way to enable the feature, and a single pattern to look out for to know if a contract is reentrant or not.

We should consider what the SDK interface will be and design down from that. I suspect when we do we'll find only one is needed.

The call_entrant approach has clear scope with a begin and end at a single call out. The set_reentrant has with raw usage an unbounded scope, which could lead to code being in scope that isn't intended to be, however it could have bounded scope if exposes to users in a way where set_reentrant(false) was always guaranteed to be called.

Both affects use in libraries. For example, a library may wish to make a call not knowing if the importer wants reentry or not. There needs to be a way for the importer to choose reentry. On the same note, a library may use reentry, but it's important that the scope doesn't escape the library.

I think the way I'd like to see this implemented in the SDK that would serve both examples above, while retaining scope, is to offer one SDK function that controls reentry for a scope of code, and the SDK in the implementation of that function call set_reentry_enabled(true|false) at the start and at the end of the function. For example:

use soroban_sdk::{contract, contractimpl, vec, symbol_short, BytesN, Env, Symbol, Vec};

#[contract]
pub struct Contract;

#[contractimpl]
impl Contract {
    pub fn exec(env: Env) {
        // ...
        env.allow_reentry(|| {
            client_for_other_contract.call_fn(...)
            library::fncallthatmaycallothercontract(...)
        });
        // ...
    }
}

}
],
"return": "Val",
"docs": "Calls a function in another contract with arguments contained in vector `args`. If the call is successful, returns the result of the called function. Traps otherwise. This functions enables re-entrancy in the immediate cross-contract call.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions enables re-entrancy in the immediate cross-contract call.

In an outgoing call graph that looks like this:

flowchart LR
    A --> B --> C --> D
Loading

Does it mean 1️⃣ that reentry into A is only allowed from B? For example:

flowchart LR
    A --> B --> C --> D
    B --> A
Loading

Or 2️⃣ that reentry into A is allowed from any contract that is executing further down the stack than A's call out? For example:

flowchart LR
    A --> B --> C --> D
    B --> A
    C --> A
    D --> A
Loading

I think 2️⃣ fits better with narrative on what it means to call out to another contract. What that other contract does, whether it is a monolith or a micro component is irrelevant to the calling contract. The calling contract merely needs to acknowledge that it is capable to handle being re-entered during the call. By who, is mostly irrelevant.

Are there cases where the who is important?

If I'm completely misunderstanding, please correct me 😄.

cc @dmkozh @sisuresh

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was also partially discussed on the thread. Currently the implementation is fairly unsafe because when reentrancy is enabled the called contract can be reentrant on any contract down the stack. This means that scenario 2 (which is what I think we should go for) is possible, but it also makes possible a scenario where e.g D is reentrant on B which didn't have reentrancy enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to confirm that I think we should go with 2 (reentry is enabled into A only, but from any downstream contract in the reentry scope). One extension we've discussed in the thread is to also limit the depth of reentry explicitly (i.e. don't specify who can reenter, but have an ability to limit the reentry e.g. to the direct calls), but I'm not sure yet if that's necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think depth is unnecessary and somewhat breaks the abstraction and possibly interop. For eg someone puts an arbitrary depth and then some pluggavle contract invocations work while others don't.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think for depth the only useful options are 1 (self-reentry), 2 (call a contract that re-enters into me) and 'infinity' (for arbitrary reentrance). Self-reentry is something we do for auth, so it's definitely not useless, but I'm not sure if there are use cases for it in the regular contracts (likely it can be better achieved just by factoring out the function logic into regular functions). Depth 2 may be useful for some well-defined protocols that want to protect themselves from a 'man-in-the-middle' scenarios (e.g. imagine contract A calls contract B and it trusts contract B to reenter contract A, but doesn't trust any contracts C,D,... that B may call). My intuition is that A->B->A reentrance may be useful quite often and tight coupling between A and B is actually intentional for that. Anything above 2 is probably too fine-grained and should just fall into 'infinity' bucket. So if I were to implement depth, I'd probably go with these 3 options. That said, I'm also ok with leaving the depth out completely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not strong opposed to a depth parameter, but I don't yet grasp where it would move the needle on safety. It looks like it could be used to limit exposure, but on its own doesn't actually make contracts reentrant safe, and may harm interop and contract composability.

I'm saying this mostly because when it comes to reentry, A should not trust any dependency to reenter in an unsafe way. A should be internally defensively organised so that its state, irrespective of when or how it is reentered, cannot be moved into a state that is invalid.

The only exception I can think of is when A and B are components of a single system, sharing the same developer. But the presence of an intentional vulnerability in A that B promises not to exploit still doesn't feel right.

Use cases and concrete examples are needed to work out if depth is needed, or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could work through some case studies to determine usefulness.

Comment on lines +1607 to +1617
"export": "3",
"name": "set_reentrant",
"args": [
{
"name": "enabled",
"type": "Bool"
}
],
"return": "Void",
"docs": "Enables the current contract to specify the reentrancy rules.",
"min_supported_protocol": 21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea with set_reentrant that it provides an alternative way to enable reentry on all the following regular call / try_call host fns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set reentrant just instructs the host that the contract in this context (or better, currently it sets the flag on the whole host). The current implementation requires two specialized host functions to have the callparam option with the reentry enabled (while call and try_call always have it disabled), and set_reentrant is an additional guard to make sure that if it's set to false call_reentrant will fail. So set_reentrant(true) and call in the current implementation would result in a trap anyways. This would change if we inherit the reentry guard from the context i.e what dmytro brought up in the discord.

@heytdep
Copy link
Contributor Author

heytdep commented Nov 12, 2024

For future reference, this implementation is unsafe and will be changed into a call stack based reentry guard:

  1. This scenario is wrongly allowed: A calls B, B sets reentrant to true and calls C, C is reentrant on A
  2. Connecting to the above the set reentrant sets a single bool in the host which is shared among VMs in the call stack. So allowing a host function to change this is unsafe.

@jayz22
Copy link
Contributor

jayz22 commented Nov 22, 2024

For future reference, this implementation is unsafe and will be changed into a call stack based reentry guard:

  1. This scenario is wrongly allowed: A calls B, B sets reentrant to true and calls C, C is reentrant on A
  2. Connecting to the above the set reentrant sets a single bool in the host which is shared among VMs in the call stack. So allowing a host function to change this is unsafe.

Sorry for being late into the discussion. I would just like to comment that in general exposing a mechanism allowing a contract to modify the shared global state of the host should be disallowed/strongly discouraged (unless for testing).

In this particular case, iiuc the linker-level reentry guard makes a specific assumption about the ordering of when contracts are instantiated (i.e. A sets reentry flag before calling B, which gets instantiated afterwards). We should make no such assumptions, as the host makes no guarantee of contract instantiation ordering. There is a cache containing pre-instantiated contract modules and it is highly possible the cache will be more global (e.g. across multiple ledgers) or include a fully instantiated contract VM (not just a wasmi::Module) in the future. Which invalidates any temporal assumption of contract instantiation ordering.

@heytdep
Copy link
Contributor Author

heytdep commented Nov 24, 2024

@jayz22 yeah absolutely, both the linker guard (which was actually a "hotfix") and the global host reentry state would not be in the actual proposal since I plan for the guard to be inferred by the context stack as I mentioned above (it's better in every way).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants