Analyzing the Ethereum Blockchain with Apache Flink
This is a Flink application written in Scala that demonstrates some of the capabilities of the hadoopcryptoledger library and its datasource for Apache Flink. It takes as input a set of files on HDFS containing Ethereum blockchain data and returns as output the total number of transactions found in that data. It has been tested successfully with the Hortonworks Sandbox VM 2.5, but other Hadoop distributions should work equally well. Flink 1.2 was used for testing (note: you may need to put the file flink-hadoop-compatibility*.jar into the lib folder of your Flink distribution).
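Conceptually, the job reads Ethereum blocks through the hadoopcryptoledger input format and sums up the transactions per block. The following is only a rough sketch of that idea, not the example's actual source: it assumes the class EthereumBlockFileInputFormat and the accessor getEthereumTransactions() from the hadoopcryptoledger library (names may differ between library versions), wires them in through Flink's Hadoop compatibility layer, and uses placeholder HDFS paths.

```scala
import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapreduce.HadoopInputFormat
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.BytesWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.zuinnote.hadoop.ethereum.format.common.EthereumBlock
import org.zuinnote.hadoop.ethereum.format.mapreduce.EthereumBlockFileInputFormat

object EthereumTransactionCountSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder paths; in the packaged example they come from the command line (see below)
    val inputPath = "hdfs://localhost:8020/user/ethereum/input"
    val outputPath = "hdfs://localhost:8020/user/ethereum/output"

    // Wrap the hadoopcryptoledger mapreduce input format via Flink's Hadoop compatibility layer
    val job = Job.getInstance
    val hadoopInput = new HadoopInputFormat[BytesWritable, EthereumBlock](
      new EthereumBlockFileInputFormat(), classOf[BytesWritable], classOf[EthereumBlock], job)
    FileInputFormat.addInputPath(job, new Path(inputPath))

    // Each record is one Ethereum block; count its transactions and sum over all blocks
    val totalTransactions = env.createInput(hadoopInput)
      .map(block => block._2.getEthereumTransactions().size())
      .reduce(_ + _)

    totalTransactions.writeAsText(outputPath)
    env.execute("Count Ethereum transactions (sketch)")
  }
}
```

Going through the Hadoop compatibility layer is also why the note above mentions copying flink-hadoop-compatibility*.jar into Flink's lib folder.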
See here for how to fetch Ethereum blockchain data.
After the data has been copied to HDFS, you are ready to use the example.
Execute
git clone https://github.com/ZuInnoTe/hadoopcryptoledger.git hadoopcryptoledger
You can build the application by changing to the directory hadoopcryptoledger/examples/scala-flink-ethereumblock and using the following command:
sbt +clean +assembly +it:test
This will also execute the integration tests.
You will find the jar "example-hcl-flink-scala-ethereumblock.jar" in ./target/scala-2.11.
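For orientation, the build follows the usual sbt layout for a Flink job with external dependencies. The sketch below is not the shipped build.sbt: the artifact coordinates and version numbers are assumptions and should be checked against the example's own build files.

```scala
// build.sbt (sketch) -- coordinates and versions below are assumptions,
// check the build.sbt that ships with the example
name := "example-hcl-flink-scala-ethereumblock"

scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // hadoopcryptoledger file formats (provide the Ethereum input format used by the job)
  "com.github.zuinnote" % "hadoopcryptoledger-fileformat" % "1.1.2",
  // Flink batch API, client and the Hadoop compatibility layer, supplied by the Flink installation;
  // marking flink-hadoop-compatibility "provided" matches the note above about copying its jar into Flink's lib folder
  "org.apache.flink" %% "flink-scala" % "1.2.0" % "provided",
  "org.apache.flink" %% "flink-clients" % "1.2.0" % "provided",
  "org.apache.flink" %% "flink-hadoop-compatibility" % "1.2.0" % "provided"
)

// "+assembly" and "+it:test" additionally assume the sbt-assembly plugin in
// project/plugins.sbt and an enabled IntegrationTest configuration
```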
Make sure that the output directory is clean:
hadoop fs -rm -R /user/ethereum/output
Execute the following command to run the job (here using a local master):
flink run example-hcl-flink-scala-ethereumblock.jar --input hdfs://localhost:8020/user/ethereum/input --output hdfs://localhost:8020/user/ethereum/output
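How the job picks up --input and --output is not shown here; one common pattern in Flink jobs is to parse them with ParameterTool, roughly as in the following sketch (the actual example may read its arguments differently):

```scala
import org.apache.flink.api.java.utils.ParameterTool

object ArgumentParsingSketch {
  def main(args: Array[String]): Unit = {
    // Parses "--input <path> --output <path>" as passed after the jar on the flink run command line
    val params = ParameterTool.fromArgs(args)
    val inputPath = params.getRequired("input")   // e.g. hdfs://localhost:8020/user/ethereum/input
    val outputPath = params.getRequired("output") // e.g. hdfs://localhost:8020/user/ethereum/output
    println(s"input=$inputPath, output=$outputPath")
  }
}
```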
After the Flink job has completed, you will find the result in /user/ethereum/output. You can display it using the following command:
hadoop fs -text /user/ethereum/output
Understanding the structure of Ethereum data:
Ethereum Yellow Paper: http://yellowpaper.io/