Review discussion on new changes for stateless agents #1
0xsuryansh started this conversation in General
Moving the PR conversation here
How do we know what is up to date and what's not in the DB? And how do we know about intents that are created between agent restarts?
To solve these problems, I think we need to persist the last block we have indexed and start from there on the next run.
There would also need to be some mechanism to continue processing unfinished intents (or other events that we care about) after restart.
To me this looks like building around an event-sourcing pattern, where it's better to store the sequence of state-changing events. Applications persist events in an event store, which is a database of events.
Whenever the state of an entity changes, a new event is appended to the list of events. On reboot, the application reconstructs the current state by replaying the events.
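As a rough illustration of the replay idea (the `IntentEvent` and `IntentState` types below are hypothetical, not anything from our codebase):

```rust
// Minimal event-sourcing sketch: state is never stored directly; it is
// reconstructed by replaying an append-only event log.
#[derive(Debug, Clone)]
enum IntentEvent {
    Created { id: u64 },
    Fulfilled { id: u64 },
}

#[derive(Debug, Default)]
struct IntentState {
    open: std::collections::HashSet<u64>,
}

impl IntentState {
    fn apply(&mut self, event: &IntentEvent) {
        match event {
            IntentEvent::Created { id } => { self.open.insert(*id); }
            IntentEvent::Fulfilled { id } => { self.open.remove(id); }
        }
    }

    // On reboot, replay every persisted event in order to rebuild state.
    fn replay(events: &[IntentEvent]) -> Self {
        let mut state = Self::default();
        for e in events {
            state.apply(e);
        }
        state
    }
}

fn main() {
    let log = vec![
        IntentEvent::Created { id: 1 },
        IntentEvent::Created { id: 2 },
        IntentEvent::Fulfilled { id: 1 },
    ];
    let state = IntentState::replay(&log);
    println!("open intents: {:?}", state.open); // only intent 2 remains open
}
```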
I would also add that later, when we scale, we could use a distributed cache with built-in replication, so that if one cache node fails, another can take over with minimal impact.
In the case of complete cache failure we can fall back to the database, perhaps with a write-back policy for the data and the last indexed block.
Counter-argument to myself: I don't think we need to store the events (again); they are already stored on the blockchain. I believe it's more useful to store the state.
```rust
let in_progress_intents = self.db_client.get_in_progress_intents().await?;
```
It's a bit strange how this is implemented: a new (infinite?) intent stream is created for every in-progress intent.
I think it should work like this:
- It queries the db for any events whose handling is not finished, and handles them again. (This requires that event handling is actually idempotent, since there's a chance the handling did finish but wasn't recorded in the db.)
- The indexer starts from the block after the last block it has indexed. (Only one block number needs to be in the db: the last block we have indexed.)

It's much like having a WAL.
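A minimal sketch of this restart sequence, using an in-memory stand-in for the db (`MockDb`, `unhandled_events`, `mark_handled` are illustrative names, not our real API):

```rust
use std::collections::HashMap;

// Hypothetical in-memory stand-in for the db.
#[derive(Default)]
struct MockDb {
    last_indexed_block: u64,
    // event id -> handled?
    events: HashMap<u64, bool>,
}

impl MockDb {
    fn unhandled_events(&self) -> Vec<u64> {
        self.events
            .iter()
            .filter(|&(_, &handled)| !handled)
            .map(|(&id, _)| id)
            .collect()
    }
    fn mark_handled(&mut self, id: u64) {
        self.events.insert(id, true);
    }
}

// Handling must be idempotent: it may run again for an event whose
// handling finished but wasn't recorded before a crash.
fn handle_event(db: &mut MockDb, id: u64) {
    // ... perform the (idempotent) side effects for the event ...
    db.mark_handled(id);
}

// Returns the block to resume indexing from.
fn startup(db: &mut MockDb) -> u64 {
    // 1. Re-handle anything not recorded as finished.
    for id in db.unhandled_events() {
        handle_event(db, id);
    }
    // 2. Resume from the block after the last one indexed.
    db.last_indexed_block + 1
}

fn main() {
    let mut db = MockDb::default();
    db.last_indexed_block = 100;
    db.events.insert(1, true);
    db.events.insert(2, false);
    let start_block = startup(&mut db);
    println!("resume indexing at block {start_block}"); // block 101
}
```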
On bounded channel:
Using a bounded channel between the indexer and the handler is desirable so the indexer can wait a bit if the handler cannot keep up. (This is often called backpressure.)
Artemis doesn't support bounded channels/backpressure (it uses pub/sub, which will lose events if the handler cannot keep up). So it might actually be necessary to move away from Artemis here.
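For illustration, std's `sync_channel` shows the blocking behavior a bounded channel gives us: `send` waits when the buffer is full instead of dropping events (tokio's bounded `mpsc::channel` behaves analogously in async code).

```rust
use std::sync::mpsc::sync_channel;
use std::thread;
use std::time::Duration;

fn main() {
    // Bounded channel with capacity 2: the indexer blocks on `send`
    // when the handler falls behind, instead of dropping events.
    let (tx, rx) = sync_channel::<u64>(2);

    let indexer = thread::spawn(move || {
        for block in 0..10u64 {
            tx.send(block).unwrap(); // blocks while the buffer is full
        }
        // tx dropped here; the receiver's iterator then ends.
    });

    let handler = thread::spawn(move || {
        let mut handled = Vec::new();
        for block in rx {
            thread::sleep(Duration::from_millis(5)); // deliberately slow
            handled.push(block);
        }
        handled
    });

    indexer.join().unwrap();
    let handled = handler.join().unwrap();
    assert_eq!(handled, (0..10).collect::<Vec<_>>()); // nothing was lost
    println!("handled all {} blocks", handled.len());
}
```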
The WAL is for the agent, at the application level; it's not the db's WAL. It's also not a file: it's stored in the db, e.g. as the in_progress field of intents.
There would be a bounded channel between the indexer and the handler.
The handler may also choose to restore/maintain some in-memory state (CCMM would probably work like this). In this case it may be desirable to postpone starting the indexer, so that db state doesn't change while the handler is restoring state.
The handler may also save and emit tasks for the next stage of handlers to perform.
On second thought, even for `get_unhandled_events` we should postpone starting the indexer so that we don't get duplicated events. Having a channel in between will make things a bit more complicated. If we don't need the concurrency, we can use just a single task.
This is all to make sure that on arbitrary interruption and restart, we won't miss any events or any handling of the events.
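A sketch of the single-task alternative, with illustrative names (`fetch_events` and `handle` are placeholders for the real indexing and handling calls):

```rust
// Hypothetical single-task loop: index a block, handle its events,
// record progress, repeat. No channel, no startup-ordering concern.
struct Agent {
    last_indexed_block: u64,
}

impl Agent {
    fn fetch_events(&self, _block: u64) -> Vec<u64> {
        Vec::new() // stand-in for querying the chain for this block
    }

    fn handle(&mut self, _event: u64) {
        // idempotent handling, recorded in the db
    }

    fn run_once(&mut self) {
        let block = self.last_indexed_block + 1;
        for event in self.fetch_events(block) {
            self.handle(event);
        }
        // Persist progress only after all events are handled, so a crash
        // between steps re-processes the block rather than skipping it.
        self.last_indexed_block = block;
    }
}

fn main() {
    let mut agent = Agent { last_indexed_block: 100 };
    agent.run_once();
    println!("indexed through block {}", agent.last_indexed_block);
}
```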