Improve ITCH parser and writer #8

vgreg · 2024-04-24T15:43:56Z

See if we can speed up the parser.

vgreg · 2024-04-24T15:50:52Z

Potential ideas for speedup:

Avoid converting to a string for comparison in the long if...
Reorder the messages in the if based on frequency, most frequent first.
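A minimal sketch of both ideas, not the current parser code. It assumes the message type is the first byte of each payload; the handler names and return values are hypothetical. Indexing a `bytes` object in Python 3 already yields an `int`, so the type can be compared without decoding to `str`, and a `Counter` over a sample file can tell us which types to test first if we keep the if chain.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical handlers; real ones would unpack the payload fields.
def parse_add_order(payload: bytes) -> str:
    return "add_order"


def parse_system_event(payload: bytes) -> str:
    return "system_event"


# Keys are the integer values of the type bytes, so no bytes -> str
# decode is needed at lookup time.
HANDLERS: Dict[int, Callable[[bytes], str]] = {
    ord("A"): parse_add_order,
    ord("S"): parse_system_event,
}


def dispatch(payload: bytes) -> str:
    # payload[0] is already an int in Python 3.
    handler = HANDLERS.get(payload[0])
    if handler is None:
        raise ValueError(f"unknown message type byte: {payload[0]!r}")
    return handler(payload)


def type_frequencies(payloads: Iterable[bytes]) -> List[Tuple[int, int]]:
    # Count type bytes, most frequent first, to guide the ordering
    # of an if chain if we decide to keep one.
    return Counter(p[0] for p in payloads).most_common()
```

With a dict lookup the ordering question mostly goes away; the frequency count is only useful if the if chain stays.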

New improved output formats:

As (rich) markdown
As JSON
As arrow-ready format (common for all messages, to store in parquet file)

The markdown output is for interactive work and for the CLI.

The JSON output is to simplify development and debugging.

The arrow format is to be able to store historical files in parquet format for easy searching and extraction. That way, when we want to look at a subset of stocks on a given day, we can easily query the messages related to those symbol/days and process them.
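To make "common for all messages" concrete, here is a rough sketch of flattening every message into one row with a fixed set of columns, leaving fields a message does not have as `None`. The column names and the dict-based message representation are assumptions for illustration; the real union of fields would be larger.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Hypothetical common column set shared by all message types.
COMMON_COLUMNS = ("type", "timestamp", "stock", "order_ref", "side", "shares", "price")


def to_common_row(message: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    # Every row has the same keys in the same order; missing fields are None.
    return {column: message.get(column) for column in COMMON_COLUMNS}


def select_symbols(rows: Iterable[Dict[str, Any]], symbols: Set[str]) -> List[Dict[str, Any]]:
    # Stand-in for the kind of query we would run against the parquet file.
    return [row for row in rows if row["stock"] in symbols]
```

Rows of this shape could then be handed to a columnar library such as pyarrow (for example via `pyarrow.Table.from_pylist`) and written to parquet; that step is left out of the sketch.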

vgreg · 2024-04-24T17:49:46Z

The overall parser architecture should be overhauled. The current approach is highly inefficient because it forces us to store all messages in memory.

The more modern way to read large files like this would be to use a generator that can do automatic filtering:
https://realpython.com/introduction-to-python-generators/
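A rough sketch of what such a generator could look like. It assumes each message in the binary file is preceded by a 2-byte big-endian length and that the first payload byte is the message type; both should be checked against the spec and our files. The function name and the `wanted_types` parameter are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional, Set


def read_messages(
    stream: BinaryIO, wanted_types: Optional[Set[int]] = None
) -> Iterator[bytes]:
    """Yield raw message payloads one at a time, optionally filtered by type byte."""
    while True:
        header = stream.read(2)
        if len(header) < 2:
            return  # end of file
        (length,) = struct.unpack(">H", header)  # assumed length prefix
        payload = stream.read(length)
        if len(payload) < length:
            return  # truncated final message
        if not payload:
            continue  # skip empty records
        if wanted_types is None or payload[0] in wanted_types:
            yield payload


# Usage: only one message is held in memory at a time.
sample = io.BytesIO(struct.pack(">H", 3) + b"Axx" + struct.pack(">H", 2) + b"Sy")
for payload in read_messages(sample, wanted_types={ord("S")}):
    print(payload)
```

Because filtering happens inside the reader, messages we do not care about are never parsed or stored.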

It would also decouple two important aspects of the message parser: reading and writing. The "in-memory" representation is currently at the message level, but the code around it is very messy. We could have many readers, one for each file type: at a minimum a binary ITCH reader, but potentially also parquet, JSON, etc.

We could also have many writers, one for each file type.

The formatting logic could be defined at the message level.
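As a sketch of that split, assuming a hypothetical `SystemEvent` message class with made-up fields: the message knows how to format itself, and each writer just consumes an iterable of messages (for instance the output of a reader generator) and a stream.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Iterable, TextIO


@dataclass
class SystemEvent:  # hypothetical message class, fields are illustrative
    timestamp: int
    event_code: str

    def to_dict(self) -> dict:
        # JSON-friendly representation, defined on the message itself.
        return {"type": "S", **asdict(self)}

    def to_markdown(self) -> str:
        # One markdown table row for interactive or CLI output.
        return f"| S | {self.timestamp} | {self.event_code} |"


def write_json_lines(messages: Iterable[SystemEvent], out: TextIO) -> int:
    # Writer: one JSON object per line; returns the number of messages written.
    count = 0
    for message in messages:
        out.write(json.dumps(message.to_dict()) + "\n")
        count += 1
    return count


def write_markdown(messages: Iterable[SystemEvent], out: TextIO) -> None:
    # Writer: a markdown table built from each message's own row formatting.
    out.write("| type | timestamp | event_code |\n| --- | --- | --- |\n")
    for message in messages:
        out.write(message.to_markdown() + "\n")
```

Any reader that yields message objects can be paired with any writer, so adding a new file type means adding one reader or one writer, not touching the others.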
