This repository has been archived by the owner on Nov 27, 2024. It is now read-only.

Use COPY to speed up database writes for blocks and traces #211

Merged
merged 12 commits into main from faster-writes on Jan 4, 2022

Conversation

@lukevs (Collaborator) commented Jan 3, 2022

Following the advice of this guide (and using their string io helper)
https://hakibenita.com/fast-load-data-python-postgresql

Locally, this brings a 30-block test run from 92.63 seconds down to 77.5 seconds.

I'll make follow-on PRs for the other tables once we see how this performs in production.

Will also mess around with our block batch size (hoping we can increase it with this).
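For reference, the core pattern looks roughly like this: a minimal sketch assuming psycopg2 underneath SQLAlchemy, with illustrative names rather than the exact PR diff. The guide's StringIteratorIO helper streams rows into COPY instead of materializing a buffer first, but the COPY call itself is the same.

import io
from typing import Sequence, Tuple


def write_rows_with_copy(
    db_session, table: str, columns: Sequence[str], rows: Sequence[Tuple]
) -> None:
    # Build one tab-separated buffer for all rows; a single COPY avoids
    # the per-statement overhead of row-by-row INSERTs.
    # (Real code also needs None -> \N and escaping of tabs/newlines,
    # which the guide handles with a clean_csv_value helper.)
    buffer = io.StringIO()
    for row in rows:
        buffer.write("\t".join(str(value) for value in row) + "\n")
    buffer.seek(0)

    # Drop from the SQLAlchemy session to the raw psycopg2 cursor,
    # which exposes COPY ... FROM STDIN as copy_from
    cursor = db_session.connection().connection.cursor()
    try:
        cursor.copy_from(buffer, table, sep="\t", columns=columns)
    finally:
        cursor.close()
    db_session.commit()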

Testing

To test, I wrote this to export all of the tables for a given block

import json

from mev_inspect.db import get_inspect_session


def export_all(block_number):
    inspect_db_session = get_inspect_session()

    # (table, ORDER BY clause) pairs so exports are deterministically ordered
    tables_sorts = [
        ("arbitrages", "block_number, transaction_hash, profit_amount DESC"),
        ("blocks", "block_number DESC"),
        ("classified_traces", "block_number, transaction_hash, trace_address DESC"),
        ("liquidations", "block_number, transaction_hash, trace_address DESC"),
        ("miner_payments", "block_number, transaction_hash DESC"),
        ("nft_trades", "block_number, transaction_hash, trace_address DESC"),
        ("punk_bid_acceptances", "block_number, transaction_hash, trace_address DESC"),
        ("punk_bids", "block_number, transaction_hash, trace_address DESC"),
        ("punk_snipes", "block_number, transaction_hash, trace_address DESC"),
        ("sandwiched_swaps", "block_number, transaction_hash, trace_address DESC"),
        ("sandwiches", "block_number, frontrun_swap_transaction_hash, frontrun_swap_trace_address DESC"),
        ("swaps", "block_number, transaction_hash, trace_address DESC"),
        ("transfers", "block_number, transaction_hash, trace_address DESC"),
    ]

    skip_tables = [
        "arbitrage_swaps",
        "prices",
        "tokens",
    ]

    # columns that vary run to run (ids, timestamps), so they'd always diff
    skip_columns = [
        "created_at",
        "classified_at",
        "id",
        "sandwich_id",
    ]

    for table, sorts in tables_sorts:
        results = inspect_db_session.execute(
            f"SELECT * FROM {table} WHERE block_number = :block_number ORDER BY {sorts}",
            params={"block_number": block_number},
        )
        for result in results:
            as_dict = result._asdict()

            for column in skip_columns:
                if column in as_dict:
                    del as_dict[column]

            print(json.dumps({k: str(v) for k, v in as_dict.items()}, sort_keys=True))
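The test script below invokes this through cli.py as export-all. Wiring that up is a small click command; this sketch uses a standalone click group for illustration, while in the repo it would hang off the group cli.py already defines (the function name here is made up):

import click


@click.group()
def cli():
    pass


@cli.command(name="export-all")
@click.argument("block_number", type=int)
def export_all_command(block_number: int) -> None:
    # delegate to the export helper above
    export_all(block_number)


if __name__ == "__main__":
    cli()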

Then I wrote this bash script to compare exports across main and this branch.
We check which code is deployed by looking for a hello.txt file I committed to this branch (then removed for review):

#!/bin/bash

block_number=$1

######### main #########
# main
git checkout main

# wait for code to update
echo "Waiting for code update"
sleep 5

# verify we're on main
kubectl exec deploy/mev-inspect -- ls hello.txt
if [ $? -eq 0 ]; then
    echo "Code not updated - exiting"
    exit 1
else
    echo "Verified correct code is running"
fi

# inspect block
./mev inspect $block_number

# export block
before_file="${block_number}_before.txt"
echo "Writing to $before_file"
kubectl exec deploy/mev-inspect -- poetry run python cli.py export-all $block_number > $before_file

######### branch #########
# branch
git checkout faster-writes

# wait for code to update
echo "Waiting for code update"
sleep 5

# verify we're on the branch
kubectl exec deploy/mev-inspect -- ls hello.txt
if [ $? -eq 0 ]; then
    echo "Verified correct code is running"
else
    echo "Code not updated - exiting"
    exit 1
fi

# inspect block
./mev inspect $block_number

# export block
after_file="${block_number}_after.txt"
echo "Writing to $after_file"
kubectl exec deploy/mev-inspect -- poetry run python cli.py export-all $block_number > $after_file

echo "Diffing"
icdiff $before_file $after_file

echo "Cleaning up"
# rm $before_file
# rm $after_file

echo "Done"
  • I verified the created files include the expected data and that they show no diffs after removing ids and created_at values (which vary on each run)
  • I also updated the trace writing to write a 0 instead of the type and verified that the differ caught it

@lukevs marked this pull request as draft January 3, 2022 20:17
@lukevs marked this pull request as ready for review January 4, 2022 16:10
@lukevs changed the title from "Use COPY to speed up database writes" to "Use COPY to speed up database writes for blocks and traces" January 4, 2022

def _inputs_as_json(trace) -> str:
    inputs = json.dumps(json.loads(trace.json(include={"inputs"}))["inputs"])
    inputs_with_array = f"[{inputs}]"
@lukevs (Collaborator, author) commented:

This is because of #209. Kept it for now so the diffs work; will do a follow-on to fix and backfill.

@lukevs merged commit 379bd82 into main Jan 4, 2022
@lukevs deleted the faster-writes branch January 4, 2022 18:17
mendesfabio pushed a commit to mendesfabio/mev-inspect-py that referenced this pull request Nov 21, 2022
Use COPY to speed up database writes for blocks and traces