
Analyzing the Ethereum Blockchain with Apache Flink

Jörn Franke edited this page Nov 12, 2017 · 1 revision

This is a Flink application written in Scala that demonstrates some of the capabilities of the hadoopcryptoledger library and its Flink datasource for Apache Flink. It takes as input a set of files on HDFS containing Ethereum blockchain data and writes as output the total number of transactions found in that data. It has been tested successfully with the Hortonworks Sandbox VM 2.5, but other Hadoop distributions should work equally well. Flink 1.2 was used for testing (note: you may need to put the file flink-hadoop-compatibility*.jar in the lib folder of your Flink distribution).
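The core of such a job can be sketched as follows. This is a minimal sketch, not the repository's exact source: the class names from the hadoopcryptoledger library (EthereumBlockFileInputFormat, EthereumBlock, getEthereumTransactions) and the use of Flink's Hadoop compatibility wrapper are assumptions based on the library's conventions, so check the actual API before relying on them.

```scala
// Sketch: count Ethereum transactions with Flink's DataSet API.
// NOTE: the hadoopcryptoledger class names used here are assumptions;
// verify them against the library before use.
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.zuinnote.hadoop.ethereum.format.common.EthereumBlock
import org.zuinnote.hadoop.ethereum.format.mapreduce.EthereumBlockFileInputFormat

object EthereumTransactionCount {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val job = Job.getInstance
    // Wrap the Hadoop InputFormat so Flink can read Ethereum block files.
    val hadoopInput = new HadoopInputFormat[BytesWritable, EthereumBlock](
      new EthereumBlockFileInputFormat(), classOf[BytesWritable], classOf[EthereumBlock], job)
    FileInputFormat.addInputPath(job, new Path(args(0))) // --input path
    val blocks = env.createInput(hadoopInput)
    // Emit one number per block: the size of its transaction list.
    val txCounts = blocks.map { case (_, block) => block.getEthereumTransactions.size().toLong }
    // Sum the per-block counts into the overall total.
    val total = txCounts.reduce(_ + _)
    total.writeAsText(args(1)) // --output path
    env.execute("Ethereum transaction count")
  }
}
```

The reduce step sums the per-block transaction counts, which matches the job's output described above: a single number, the total transaction count in the input data.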

Getting blockchain data

See the wiki page on fetching Ethereum blockchain data for instructions on obtaining the input files.

After the data has been copied to HDFS, you are ready to run the example.

Building the example

Execute

git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger

You can build the application by changing to the directory hadoopcryptoledger/examples/scala-flink-ethereumblock and using the following command:

sbt +clean +assembly +it:test

This will also execute the integration tests.

You will find the jar "example-hcl-flink-scala-ethereumblock.jar" in the directory ./target/scala-2.11.
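For orientation, the build definition behind these commands looks roughly like the following build.sbt fragment. The version numbers and dependency coordinates here are illustrative assumptions, not copied from the repository; consult the actual build.sbt in the example directory.

```scala
// Sketch of the relevant build.sbt settings (versions are assumptions).
// `+clean +assembly` builds the fat jar once per entry in crossScalaVersions.
crossScalaVersions := Seq("2.10.6", "2.11.8")

libraryDependencies ++= Seq(
  "com.github.zuinnote" % "hadoopcryptoledger-fileformat" % "1.0.7" % "compile",
  // Flink itself is provided by the cluster at runtime.
  "org.apache.flink" %% "flink-scala" % "1.2.1" % "provided",
  "org.apache.flink" %% "flink-hadoop-compatibility" % "1.2.1" % "compile"
)

// sbt-assembly produces the fat jar that is submitted with `flink run`.
assemblyJarName in assembly := "example-hcl-flink-scala-ethereumblock.jar"
```

Bundling flink-hadoop-compatibility into the fat jar (or copying it into Flink's lib folder, as noted above) is what lets the job use the Hadoop InputFormat from hadoopcryptoledger.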

Running the example

Make sure that the output directory is clean:

hadoop fs -rm -R /user/ethereum/output

Execute the following command to run the job using a local master:

flink run example-hcl-flink-scala-ethereumblock.jar --input hdfs://localhost:8020/user/ethereum/input --output hdfs://localhost:8020/user/ethereum/output

After the Flink job has completed, you will find the result in /user/ethereum/output. You can display it using the following command:

hadoop fs -text /user/ethereum/output

More Information

Understanding the structure of Ethereum data:

Ethereum Yellow Paper: http://yellowpaper.io/
