This repository has been archived by the owner on Nov 27, 2024. It is now read-only.

Use COPY to speed up database writes for blocks and traces #211

Merged
merged 12 commits into main from faster-writes on Jan 4, 2022

Conversation

@lukevs (Collaborator) commented Jan 3, 2022

Following the advice of this guide (and using their string io helper)
https://hakibenita.com/fast-load-data-python-postgresql

Locally, this brings a 30-block test run from 92.63 seconds down to 77.5 seconds.

I'll make follow-on PRs for the other tables once we see how this performs in production.

Will also mess around with our block batch size (hoping we can increase it with this).
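For reference, the core pattern looks roughly like this: a minimal sketch assuming psycopg2 underneath SQLAlchemy, with illustrative names rather than the exact PR diff. The guide's StringIteratorIO helper streams rows into COPY instead of materializing a buffer first, but the COPY call itself is the same.

import io
from typing import Sequence, Tuple


def write_rows_with_copy(
    db_session, table: str, columns: Sequence[str], rows: Sequence[Tuple]
) -> None:
    # Build one tab-separated buffer for all rows; a single COPY avoids
    # the per-statement overhead of row-by-row INSERTs.
    # (Real code also needs None -> \N and escaping of tabs/newlines,
    # which the guide handles with a clean_csv_value helper.)
    buffer = io.StringIO()
    for row in rows:
        buffer.write("\t".join(str(value) for value in row) + "\n")
    buffer.seek(0)

    # Drop from the SQLAlchemy session to the raw psycopg2 cursor,
    # which exposes COPY ... FROM STDIN as copy_from
    cursor = db_session.connection().connection.cursor()
    try:
        cursor.copy_from(buffer, table, sep="\t", columns=columns)
    finally:
        cursor.close()
    db_session.commit()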

Testing

To test, I wrote this to export all of the tables for a given block

import json

from mev_inspect.db import get_inspect_session


def export_all(block_number):
    inspect_db_session = get_inspect_session()

    # (table, ORDER BY clause) pairs so exports are deterministically ordered
    tables_sorts = [
        ("arbitrages", "block_number, transaction_hash, profit_amount DESC"),
        ("blocks", "block_number DESC"),
        ("classified_traces", "block_number, transaction_hash, trace_address DESC"),
        ("liquidations", "block_number, transaction_hash, trace_address DESC"),
        ("miner_payments", "block_number, transaction_hash DESC"),
        ("nft_trades", "block_number, transaction_hash, trace_address DESC"),
        ("punk_bid_acceptances", "block_number, transaction_hash, trace_address DESC"),
        ("punk_bids", "block_number, transaction_hash, trace_address DESC"),
        ("punk_snipes", "block_number, transaction_hash, trace_address DESC"),
        ("sandwiched_swaps", "block_number, transaction_hash, trace_address DESC"),
        ("sandwiches", "block_number, frontrun_swap_transaction_hash, frontrun_swap_trace_address DESC"),
        ("swaps", "block_number, transaction_hash, trace_address DESC"),
        ("transfers", "block_number, transaction_hash, trace_address DESC"),
    ]

    skip_tables = [
        "arbitrage_swaps",
        "prices",
        "tokens",
    ]

    # columns that vary run to run (ids, timestamps), so they'd always diff
    skip_columns = [
        "created_at",
        "classified_at",
        "id",
        "sandwich_id",
    ]

    for table, sorts in tables_sorts:
        results = inspect_db_session.execute(
            f"SELECT * FROM {table} WHERE block_number = :block_number ORDER BY {sorts}",
            params={"block_number": block_number},
        )
        for result in results:
            as_dict = result._asdict()

            for column in skip_columns:
                if column in as_dict:
                    del as_dict[column]

            print(json.dumps({k: str(v) for k, v in as_dict.items()}, sort_keys=True))
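The test script below invokes this through cli.py as export-all. Wiring that up is a small click command; this sketch uses a standalone click group for illustration, while in the repo it would hang off the group cli.py already defines (the function name here is made up):

import click


@click.group()
def cli():
    pass


@cli.command(name="export-all")
@click.argument("block_number", type=int)
def export_all_command(block_number: int) -> None:
    # delegate to the export helper above
    export_all(block_number)


if __name__ == "__main__":
    cli()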

Then I wrote this bash script to compare exports across main and this branch.
We check which code is deployed by looking for a hello.txt file I committed to this branch (then removed for review):

#!/bin/bash

block_number=$1

######### main #########
# main
git checkout main

# wait for code to update
echo "Waiting for code update"
sleep 5

# verify we're on main
kubectl exec deploy/mev-inspect -- ls hello.txt
if [ $? -eq 0 ]; then
    echo "Code not updated - exiting"
    exit 1
else
    echo "Verified correct code is running"
fi

# inspect block
./mev inspect $block_number

# export block
before_file="${block_number}_before.txt"
echo "Writing to $before_file"
kubectl exec deploy/mev-inspect -- poetry run python cli.py export-all $block_number > $before_file

######### branch #########
# branch
git checkout faster-writes

# wait for code to update
echo "Waiting for code update"
sleep 5

# verify we're on the branch
kubectl exec deploy/mev-inspect -- ls hello.txt
if [ $? -eq 0 ]; then
    echo "Verified correct code is running"
else
    echo "Code not updated - exiting"
    exit 1
fi

# inspect block
./mev inspect $block_number

# export block
after_file="${block_number}_after.txt"
echo "Writing to $after_file"
kubectl exec deploy/mev-inspect -- poetry run python cli.py export-all $block_number > $after_file

echo "Diffing"
icdiff $before_file $after_file

echo "Cleaning up"
# rm $before_file
# rm $after_file

echo "Done"
  • I verified the created files include the expected data and that they show no diffs after removing ids and created_at values (which vary on each run)
  • I also updated the trace writing to write a 0 instead of the type and verified that the differ caught it

@lukevs marked this pull request as draft January 3, 2022 20:17
@lukevs marked this pull request as ready for review January 4, 2022 16:10
@lukevs changed the title from "Use COPY to speed up database writes" to "Use COPY to speed up database writes for blocks and traces" January 4, 2022

def _inputs_as_json(trace) -> str:
    inputs = json.dumps(json.loads(trace.json(include={"inputs"}))["inputs"])
    inputs_with_array = f"[{inputs}]"
@lukevs (Collaborator, author) commented:

This is because of #209. Kept it for now so the diffs work; will do a follow-on to fix and backfill.

@lukevs merged commit 379bd82 into main Jan 4, 2022
@lukevs deleted the faster-writes branch January 4, 2022 18:17
mendesfabio pushed a commit to mendesfabio/mev-inspect-py that referenced this pull request Nov 21, 2022
Use COPY to speed up database writes for blocks and traces