
Async Firehose Client: block on make message handler call, add on error callback #157

Merged · 11 commits into MarshalX:main · Oct 27, 2023

Conversation

@DXsmiley (Contributor)

The original _AsyncWebsocketClient._process_message_frame spawns a new task and then returns immediately, allowing the client to enter a very fast client.recv() -> create_task loop.

    def _process_message_frame(self, frame: 'MessageFrame') -> None:
        # Fire-and-forget: the task is scheduled but never awaited here,
        # so this method returns before the message has been processed.
        task: asyncio.Task = self._loop.create_task(self._on_message_callback(frame))
        self._on_message_tasks.add(task)
        task.add_done_callback(self._on_message_callback_done)

This isn't a problem when consuming the head of the firehose in real time. However, when replaying past events, if message processing is at all non-trivial, tasks are spawned far faster than they can be completed. A glut of in-flight tasks creates a huge amount of scheduling overhead and heavy contention on any shared resources, such as databases or internal locks. There's no way to apply backpressure to slow down the client.recv() loop, so you end up with a memory-leak-esque situation.

I think the original behaviour is unintuitive and quite an unexpected pitfall, so I've replaced it with a normal (async) callback. The client waits for the callback to finish before receiving the next message. This means there's no concurrency out of the box, but it provides a much more predictable interface for people to build their own systems on top of; I think it's valuable for the websocket client to be as unopinionated as possible about the message-processing side of things.
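
Roughly, the new behaviour looks like this (a simplified sketch, not the exact merged code):

    # Sketch only: the receive loop now awaits the handler before pulling
    # the next frame, so backpressure propagates all the way to client.recv().
    async def _process_message_frame(self, frame: 'MessageFrame') -> None:
        await self._on_message_callback(frame)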

For example, the callback I'm currently using in my own code is simply this:

    import asyncio

    messages_to_process: 'asyncio.Queue[MessageFrame]' = asyncio.Queue(maxsize=20)

    async def on_message_handler(message: 'MessageFrame') -> None:
        await messages_to_process.put(message)

messages_to_process.put blocks while the queue is full, which means that client.recv() is not called faster than we can handle messages, and I'm able to easily configure whatever concurrency I want on the queue-consumer side. This still allows message receiving and processing to occur simultaneously, but prevents the former from outpacing the latter.
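
For completeness, the consumer side can be as simple as a few workers draining the queue. This is a sketch: handle_message and the worker count of 4 are placeholders for whatever processing and concurrency you actually want.

    # Each worker processes one message at a time, so total concurrency
    # is simply the number of workers spawned.
    async def queue_consumer() -> None:
        while True:
            message = await messages_to_process.get()
            try:
                await handle_message(message)  # placeholder for real processing
            finally:
                messages_to_process.task_done()

    workers = [asyncio.create_task(queue_consumer()) for _ in range(4)]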

I've also made some changes to _WebsocketClient and _WebsocketClientBase to account for the differences between _WebsocketClient and _AsyncWebsocketClient.

@MarshalX (Owner)

Thank you! I'll take a look as soon as I'm done with my main job.

@MarshalX (Owner) left a comment

Please make sure that the difference between the async and sync versions of the _process_message_frame method is only in the "await" statements; for now, it is not. And probably we don't need _print_exception anymore, since we are now in the catch block?
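
For example, something like this (a sketch; the two bodies should be identical apart from the awaits):

    # sync client
    def _process_message_frame(self, frame: 'MessageFrame') -> None:
        try:
            self._on_message_callback(frame)
        except Exception as exception:
            self._on_callback_error_callback(exception)

    # async client: same body, with awaits added
    async def _process_message_frame(self, frame: 'MessageFrame') -> None:
        try:
            await self._on_message_callback(frame)
        except Exception as exception:
            self._on_callback_error_callback(exception)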

@MarshalX (Owner) commented Sep 21, 2023

Could we move this part to the base class?

    def _process_raw_frame(self, data: bytes) -> None:
        frame = Frame.from_bytes(data)
        if isinstance(frame, ErrorFrame):
            raise FirehoseError(XrpcError(frame.body.error, frame.body.message))
        if isinstance(frame, MessageFrame):
            self._process_message_frame(frame)
        else:
            raise FirehoseDecodingError('Unknown frame type')

idk, something like this:

    def _pre_process_raw_frame(self, data: bytes) -> Frame:   # in base
        frame = Frame.from_bytes(data)
        if isinstance(frame, ErrorFrame):
            raise FirehoseError(XrpcError(frame.body.error, frame.body.message))
        if not isinstance(frame, MessageFrame):
            raise FirehoseDecodingError('Unknown frame type')

        return frame
    
    def _process_raw_frame(self, data: bytes) -> None:  # in sync
        frame = self._pre_process_raw_frame(data)
        self._process_message_frame(frame)

    async def _process_raw_frame(self, data: bytes) -> None:  # in async
        frame = self._pre_process_raw_frame(data)
        await self._process_message_frame(frame)

@MarshalX (Owner)

btw, don't we want to have an async version of _on_callback_error_callback in the async client?
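
i.e. something like this (just a sketch of the shape):

    # hypothetical: the async client awaits an async error callback
    # instead of calling it synchronously
    async def _process_message_frame(self, frame: 'MessageFrame') -> None:
        try:
            await self._on_message_callback(frame)
        except Exception as exception:
            await self._on_callback_error_callback(exception)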

@MarshalX (Owner) left a comment

I like the use of wait_for :)

(Several review threads on atproto/firehose/client.py, all resolved.)
@DXsmiley (Contributor, Author)

I've allowed the stop event to interrupt client.recv(), but uh... the code's not great, I'm gonna be honest.

@MarshalX (Owner) commented Sep 28, 2023

uh...

This is getting towards scope creep for this PR.

Let's not include the wait_for tricks in this PR at all; as I understood it, that will not be ported to the sync client. Let's just merge the callback improvements.

@DXsmiley (Contributor, Author) commented Oct 2, 2023

Alright, I've rolled back a few changes and cleaned things up a bit.

@MarshalX changed the title from "Rework _AsyncWebsocketClient" to "Async Firehose Client: block on make message handler call, add on error callback" on Oct 27, 2023
@MarshalX merged commit 24a19d7 into MarshalX:main on Oct 27, 2023
6 checks passed
@MarshalX (Owner)

@DXsmiley thank you so much for your contribution! And sorry for the long wait.
