This project leverages Apache Spark to analyze data from the Ethereum blockchain spanning August 2015 to January 2019. It focuses on mining activities and transaction patterns, providing insights into the operational dynamics of the Ethereum network.
- Data Management: Load and manage large datasets using PySpark.
- Aggregation Analysis: Identify top miners by the total block size mined.
- Time Series Analysis: Convert UNIX timestamps and analyze blockchain activities over time.
- Data Integration: Merge transaction and block data to enhance analytical depth.
- Focused Monthly Analysis: Examine specific months for detailed insights into blockchain activity and transaction fees.
- PySpark DataFrames: Utilized for robust data processing and handling.
- Aggregation and Sorting: Applied to compute key mining statistics.
- Date Transformation: Used to facilitate easier analysis of temporal data patterns.
- Inner Joins: Employed to merge datasets for a holistic view.
- Data Visualization: Implemented using Matplotlib to create histograms showcasing data trends.
- Miner Centralization: A small number of miners were found to dominate block production.
- Fluctuating Activity Levels: Detailed analysis of September and October 2015 revealed significant variations in daily blockchain activities.
- Economic Insights: October 2015 data highlighted the cost dynamics of transactions, providing a snapshot of economic factors influencing blockchain operations.
- Strategic Insights: Offers valuable information for stakeholders in the Ethereum ecosystem regarding mining and transaction strategies.
- Technical Contributions: Demonstrates the effectiveness of PySpark in blockchain data analysis, setting a benchmark for similar analytics projects.
- Operational Recommendations: Insights into transaction fees and mining activities can guide adjustments in blockchain design to enhance fairness and decentralization.
The dataset comprises two main CSV files:
- `blocks.csv`: Contains data about the blocks on the Ethereum blockchain.
- `transactions.csv`: Contains details of transactions within those blocks.
- `src/`: Contains all source code used for the analysis.
- `data/`: Instructions on how to access the blockchain data (actual data not included due to size and privacy concerns).
- `docs/`: Additional documentation and images.
- `README.md`: This file, providing an overview of the project and setup instructions.
- Objective: Load `blocks.csv` and `transactions.csv` using PySpark.
- Solution: Utilized PySpark's `read.csv` method with headers and inferred schema, as sketched below.
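A minimal sketch of this loading step (the file paths, application name, and local `data/` layout are illustrative assumptions, not taken from the project):

```python
from pyspark.sql import SparkSession

# Illustrative session setup; the application name is an assumption.
spark = SparkSession.builder.appName("ethereum-analysis").getOrCreate()

# Read both files with a header row and let Spark infer the column types.
blocks = spark.read.csv("data/blocks.csv", header=True, inferSchema=True)
transactions = spark.read.csv("data/transactions.csv", header=True, inferSchema=True)

blocks.printSchema()
transactions.printSchema()
```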
- Objective: Determine the top 10 miners by the total size of blocks mined.
- Solution: Performed aggregation and sorting in PySpark to identify the top miners (see the sketch below).
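A hedged sketch of the aggregation, assuming the loaded blocks DataFrame has `miner` and `size` columns (these column names follow common Ethereum block exports and are assumptions here):

```python
from pyspark.sql import functions as F

# Sum block sizes per miner and keep the ten largest totals.
top_miners = (
    blocks.groupBy("miner")
          .agg(F.sum("size").alias("total_block_size"))
          .orderBy(F.col("total_block_size").desc())
          .limit(10)
)
top_miners.show(truncate=False)
```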
- Objective: Convert UNIX timestamps in `blocks.csv` to a readable date format.
- Solution: Used the PySpark functions `from_unixtime` and `to_date` to format timestamps, as sketched below.
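A sketch of the conversion, assuming the block timestamp column is named `timestamp` (an assumption):

```python
from pyspark.sql import functions as F

# UNIX epoch seconds -> timestamp string -> calendar date.
blocks_dated = blocks.withColumn(
    "block_date", F.to_date(F.from_unixtime(F.col("timestamp")))
)
blocks_dated.select("timestamp", "block_date").show(5)
```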
- Objective: Join `transactions.csv` and `blocks.csv` by their hash fields.
- Solution: Handled field name ambiguities by specifying dataset origins in the join operation, as in the sketch below.
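A sketch of the join, reusing `blocks_dated` from the previous step and assuming the transactions reference their parent block through a `block_hash` column matching the block's `hash` column (both column names are assumptions). Aliasing each DataFrame resolves the naming ambiguity:

```python
from pyspark.sql import functions as F

# Alias each side so identically named columns can be referenced unambiguously.
joined = transactions.alias("tx").join(
    blocks_dated.alias("blk"),
    F.col("tx.block_hash") == F.col("blk.hash"),
    "inner",
)
print(joined.count())
```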
- Objective: Analyze block production and unique senders for September 2015.
- Solution: Filtered and aggregated data to produce histograms of daily activities, as sketched below.
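A sketch of the monthly filter and daily aggregation for block production, reusing the `block_date` column added earlier; the same pattern with `countDistinct` over a sender address column (assumed to exist on the joined transactions) gives unique senders per day:

```python
import matplotlib.pyplot as plt
from pyspark.sql import functions as F

# Daily block counts for September 2015.
sept_blocks = (
    blocks_dated
    .filter((F.col("block_date") >= "2015-09-01") & (F.col("block_date") <= "2015-09-30"))
    .groupBy("block_date")
    .agg(F.count("*").alias("blocks_per_day"))
    .orderBy("block_date")
    .toPandas()
)

# Bar chart of daily block production.
plt.bar(sept_blocks["block_date"].astype(str), sept_blocks["blocks_per_day"])
plt.xticks(rotation=90)
plt.xlabel("Date")
plt.ylabel("Blocks mined")
plt.title("Daily block production, September 2015")
plt.tight_layout()
plt.show()
```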
- Objective: Calculate total transaction fees for October 2015.
- Solution: Used the `gas` and `gas_price` fields to compute fees and visualized the results with histograms (see the sketch below).
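A sketch of the fee computation, assuming `gas` and `gas_price` columns on the joined transactions (fee per transaction = gas used times gas price, in wei) and reusing `block_date` for the October 2015 filter; plotting the daily totals follows the same Matplotlib pattern as above:

```python
from pyspark.sql import functions as F

# Per-transaction fee in wei, summed per day for October 2015.
daily_fees = (
    joined
    .withColumn("fee_wei", F.col("gas") * F.col("gas_price"))
    .filter((F.col("block_date") >= "2015-10-01") & (F.col("block_date") <= "2015-10-31"))
    .groupBy("block_date")
    .agg(F.sum("fee_wei").alias("total_fees_wei"))
    .orderBy("block_date")
)
daily_fees.show(31, truncate=False)
```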
- Apache Spark: Main platform for data processing.
- Python: Used for scripting and additional data manipulation.
- Matplotlib: For creating visualizations of the analysis results.