feat: add toolkit for exporting and transforming missing block header fields #903

Status: Open. Wants to merge 5 commits into base `develop`.
@@ -0,0 +1 @@
data/
13 changes: 13 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/Dockerfile
@@ -0,0 +1,13 @@
FROM golang:1.22

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

COPY . .

RUN go build -o main .

ENTRYPOINT ["./main"]
63 changes: 63 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/README.md
@@ -0,0 +1,63 @@
# Export missing block header fields toolkit

A toolkit for exporting and transforming missing block header fields of Scroll before {{upgrade_name}} TODO: replace when upgrade is clear.

## Context
We are using the [Clique consensus](https://eips.ethereum.org/EIPS/eip-225) in Scroll L2. Amongst others, it requires the following header fields:
- `extraData`
- `difficulty`

However, before {{upgrade_name}}, these fields were not stored on L1/DA.
In order for nodes to be able to reconstruct the correct block hashes when only reading data from L1,
we need to provide the historical values of these fields to these nodes through a separate file.

This toolkit provides commands to export the missing fields, deduplicate the data and create a file
with the missing fields that can be used to reconstruct the correct block hashes when only reading data from L1.

The toolkit provides the following commands:
- `fetch` - Fetch missing block header fields from a running Scroll L2 node and store in a file
- `dedup` - Deduplicate the headers file, print unique values and create a new file with the deduplicated headers

## Binary layout of the deduplicated missing header fields file
The deduplicated header file binary layout is as follows:

```plaintext
<unique_vanity_count:uint8><unique_vanity_1:[32]byte>...<unique_vanity_n:[32]byte><header_1:header>...<header_m:header>

Where:
- unique_vanity_count: number of unique vanities n
- unique_vanity_i: unique vanity i
- header_i: block header i (m headers in total)
- header:
    <flags:uint8><seal:[65|85]byte>
    - flags: bitmask, lsb first
        - bit 0-5: index of the vanity in the sorted vanities list
        - bit 6: 0 if difficulty is 2, 1 if difficulty is 1
        - bit 7: 0 if seal length is 65, 1 if seal length is 85
```
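
The layout above can be read back with a small parser. The following is a minimal sketch, not the reader shipped in this PR: the names `missingHeader`, `decodeFlags` and `parseDedupFile` are hypothetical, and it assumes that "lsb first" means bit 0 is the least significant bit of the flags byte and that the header section simply runs to the end of the file.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

const vanitySize = 32

// missingHeader holds the fields recovered for a single block.
// The type name is hypothetical and used for illustration only.
type missingHeader struct {
	Difficulty uint64
	Vanity     [vanitySize]byte
	Seal       []byte
}

// decodeFlags splits the flags byte into vanity index, difficulty and seal length.
// Assumption: bit 0 is the least significant bit.
func decodeFlags(flags byte) (vanityIndex int, difficulty uint64, sealLen int) {
	vanityIndex = int(flags & 0x3f) // bits 0-5
	difficulty = 2
	if flags&0x40 != 0 { // bit 6
		difficulty = 1
	}
	sealLen = 65
	if flags&0x80 != 0 { // bit 7
		sealLen = 85
	}
	return vanityIndex, difficulty, sealLen
}

// parseDedupFile reads the vanity table, then headers until a clean EOF.
func parseDedupFile(r io.Reader) ([]missingHeader, error) {
	var count [1]byte
	if _, err := io.ReadFull(r, count[:]); err != nil {
		return nil, fmt.Errorf("reading vanity count: %w", err)
	}
	vanities := make([][vanitySize]byte, count[0])
	for i := range vanities {
		if _, err := io.ReadFull(r, vanities[i][:]); err != nil {
			return nil, fmt.Errorf("reading vanity %d: %w", i, err)
		}
	}

	var headers []missingHeader
	for {
		var flags [1]byte
		_, err := io.ReadFull(r, flags[:])
		if errors.Is(err, io.EOF) {
			return headers, nil // no more headers
		}
		if err != nil {
			return nil, fmt.Errorf("reading flags: %w", err)
		}
		idx, difficulty, sealLen := decodeFlags(flags[0])
		if idx >= len(vanities) {
			return nil, fmt.Errorf("vanity index %d out of range", idx)
		}
		seal := make([]byte, sealLen)
		if _, err := io.ReadFull(r, seal); err != nil {
			return nil, fmt.Errorf("reading seal: %w", err)
		}
		headers = append(headers, missingHeader{
			Difficulty: difficulty,
			Vanity:     vanities[idx],
			Seal:       seal,
		})
	}
}

func main() {
	// One all-zero vanity and one header: vanity index 0, difficulty 1, 65-byte seal.
	data := append([]byte{1}, make([]byte, vanitySize)...)
	data = append(data, 0x40)
	data = append(data, make([]byte, 65)...)

	headers, err := parseDedupFile(bytes.NewReader(data))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(headers), headers[0].Difficulty, len(headers[0].Seal)) // prints "1 1 65"
}
```

Because six bits are reserved for the vanity index, this encoding can address at most 64 distinct vanities; the sketch rejects any index that falls outside the table it has read.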

## How to run
Each command has its own set of flags and options. To display the help message, run the command with the `--help` flag.

1. Fetch the missing block header fields from a running Scroll L2 node via RPC and store them in a file (approx. 40 min for 5.5M blocks).
2. Deduplicate the headers file, print the unique values, and create a new file with the deduplicated headers.

```bash
go run main.go fetch --rpc=http://localhost:8545 --start=0 --end=100 --batch=10 --parallelism=10 --output=headers.bin --humanOutput=true
go run main.go dedup --input=headers.bin --output=headers-dedup.bin
```
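
The `dedup` command in this PR prints a SHA-256 checksum of the file it writes. A consumer of `headers-dedup.bin` could compare a copy of the file against that checksum roughly as follows. This is a sketch only: `sha256Hex` and `checksumMatches` are hypothetical helper names, and the expected value is assumed to be given as lowercase hex.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// sha256Hex streams r through SHA-256 and returns the digest as lowercase hex.
func sha256Hex(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// checksumMatches reports whether data hashes to wantHex (lowercase hex).
func checksumMatches(data []byte, wantHex string) bool {
	got, err := sha256Hex(bytes.NewReader(data))
	return err == nil && got == wantHex
}

func main() {
	// In practice the reader would be an *os.File opened on headers-dedup.bin;
	// an in-memory buffer keeps the sketch self-contained.
	sum, err := sha256Hex(bytes.NewReader([]byte("abc")))
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

Streaming through `io.Copy` avoids loading the whole file into memory, which matters for a file covering several million blocks.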


### With Docker
To run the toolkit with Docker, build the Docker image and run the commands inside the container.

```bash
docker build -t export-headers-toolkit .

# Depending on the Docker setup, you may need to look up the RPC container's IP with `docker inspect`. The host IP may also work, e.g. http://172.17.0.1:8545
docker run --rm -v "$(pwd)":/app/result export-headers-toolkit fetch --rpc=<address> --start=0 --end=5422047 --batch=10000 --parallelism=10 --output=/app/result/headers.bin --humanOutput=/app/result/headers.csv
docker run --rm -v "$(pwd)":/app/result export-headers-toolkit dedup --input=/app/result/headers.bin --output=/app/result/headers-dedup.bin
```



296 changes: 296 additions & 0 deletions rollup/missing_header_fields/export-headers-toolkit/cmd/dedup.go
@@ -0,0 +1,296 @@
package cmd

import (
"bufio"
"bytes"
"crypto/sha256"
"encoding/binary"
"fmt"
"io"
"log"
"os"
"strconv"
"strings"

"github.com/spf13/cobra"

"github.com/scroll-tech/go-ethereum/common"

"github.com/scroll-tech/go-ethereum/export-headers-toolkit/types"
)

// dedupCmd represents the dedup command
var dedupCmd = &cobra.Command{
Use: "dedup",
Short: "Deduplicate the headers file, print unique values and create a new file with the deduplicated headers",
Long: `Deduplicate the headers file, print unique values and create a new file with the deduplicated headers.

The binary layout of the deduplicated file is as follows:
- 1 byte for the count of unique vanities
- 32 bytes for each unique vanity
- for each header:
  - 1 byte (bitmask, lsb first):
    - bit 0-5: index of the vanity in the sorted vanities list
    - bit 6: 0 if difficulty is 2, 1 if difficulty is 1
    - bit 7: 0 if seal length is 65, 1 if seal length is 85
  - 65 or 85 bytes for the seal`,
Run: func(cmd *cobra.Command, args []string) {
inputFile, err := cmd.Flags().GetString("input")
if err != nil {
log.Fatalf("Error reading input flag: %v", err)
}
outputFile, err := cmd.Flags().GetString("output")
if err != nil {
log.Fatalf("Error reading output flag: %v", err)
}
verifyFile, err := cmd.Flags().GetString("verify")
if err != nil {
log.Fatalf("Error reading verify flag: %v", err)
}

if verifyFile != "" {
verifyInputFile(verifyFile, inputFile)
}

_, seenVanity, _ := runAnalysis(inputFile)
runDedup(inputFile, outputFile, seenVanity)

if verifyFile != "" {
verifyOutputFile(verifyFile, outputFile)
}

runSHA256(outputFile)
},
}

func init() {
rootCmd.AddCommand(dedupCmd)

dedupCmd.Flags().String("input", "headers.bin", "headers file")
dedupCmd.Flags().String("output", "headers-dedup.bin", "deduplicated, binary formatted file")
dedupCmd.Flags().String("verify", "", "verify the input and output files with the given .csv file")
}

func runAnalysis(inputFile string) (seenDifficulty map[uint64]int, seenVanity map[[32]byte]bool, seenSealLen map[int]int) {
reader := newHeaderReader(inputFile)
defer reader.close()

// track header fields we've seen
seenDifficulty = make(map[uint64]int)
seenVanity = make(map[[32]byte]bool)
seenSealLen = make(map[int]int)

reader.read(func(header *types.Header) {
seenDifficulty[header.Difficulty]++
seenVanity[header.Vanity()] = true
seenSealLen[header.SealLen()]++
})

// Print distinct values and report
fmt.Println("--------------------------------------------------")
for diff, count := range seenDifficulty {
fmt.Printf("Difficulty %d: %d\n", diff, count)
}

for vanity := range seenVanity {
fmt.Printf("Vanity: %x\n", vanity)
}

for sealLen, count := range seenSealLen {
fmt.Printf("SealLen %d bytes: %d\n", sealLen, count)
}

fmt.Println("--------------------------------------------------")
fmt.Printf("Unique values seen in the headers file (last seen block: %d):\n", reader.lastHeader.Number)
fmt.Printf("Distinct count: Difficulty:%d, Vanity:%d, SealLen:%d\n", len(seenDifficulty), len(seenVanity), len(seenSealLen))
fmt.Printf("--------------------------------------------------\n\n")

return seenDifficulty, seenVanity, seenSealLen
}

func runDedup(inputFile, outputFile string, seenVanity map[[32]byte]bool) {
reader := newHeaderReader(inputFile)
defer reader.close()

writer := newMissingHeaderFileWriter(outputFile, seenVanity)
defer writer.close()

writer.missingHeaderWriter.writeVanities()

reader.read(func(header *types.Header) {
writer.missingHeaderWriter.write(header)
})
}

func runSHA256(outputFile string) {
f, err := os.Open(outputFile)
if err != nil {
log.Fatalf("Error opening file: %v", err)
}
// Defer the close only after the open is known to have succeeded.
defer f.Close()

h := sha256.New()
if _, err = io.Copy(h, f); err != nil {
log.Fatalf("Error hashing file: %v", err)
}

fmt.Printf("Deduplicated headers written to %s with sha256 checksum: %x\n", outputFile, h.Sum(nil))
}

type headerReader struct {
file *os.File
reader *bufio.Reader
lastHeader *types.Header
}

func newHeaderReader(inputFile string) *headerReader {
f, err := os.Open(inputFile)
if err != nil {
log.Fatalf("Error opening input file: %v", err)
}

h := &headerReader{
file: f,
reader: bufio.NewReader(f),
}

return h
}

func (h *headerReader) read(callback func(header *types.Header)) {
headerSizeBytes := make([]byte, types.HeaderSizeSerialized)

for {
_, err := io.ReadFull(h.reader, headerSizeBytes)
if err != nil {
if err == io.EOF {
break
}
log.Fatalf("Error reading headerSizeBytes: %v", err)
}
headerSize := binary.BigEndian.Uint16(headerSizeBytes)

headerBytes := make([]byte, headerSize)
_, err = io.ReadFull(h.reader, headerBytes)
if err != nil {
// The size prefix was already read, so EOF here means the file is truncated.
log.Fatalf("Error reading headerBytes: %v", err)
}
header := new(types.Header).FromBytes(headerBytes)

// sanity check: make sure headers are in order
if h.lastHeader != nil && header.Number != h.lastHeader.Number+1 {
fmt.Println("lastHeader:", h.lastHeader.String())
log.Fatalf("Missing block: %d, got %d instead", h.lastHeader.Number+1, header.Number)
}
h.lastHeader = header

callback(header)
}
}

func (h *headerReader) close() {
h.file.Close()
}

type csvHeaderReader struct {
file *os.File
reader *bufio.Reader
}

func newCSVHeaderReader(verifyFile string) *csvHeaderReader {
f, err := os.Open(verifyFile)
if err != nil {
log.Fatalf("Error opening verify file: %v", err)
}

h := &csvHeaderReader{
file: f,
reader: bufio.NewReader(f),
}

return h
}

func (h *csvHeaderReader) readNext() *types.Header {
line, err := h.reader.ReadString('\n')
if err != nil {
if err == io.EOF {
return nil
}
log.Fatalf("Error reading line: %v", err)
}

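s := strings.Split(line, ",")
// Guard against malformed lines before indexing into the fields.
if len(s) < 3 {
log.Fatalf("Malformed CSV line, expected 3 fields: %q", line)
}
extraString := strings.Split(s[2], "\n")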

num, err := strconv.ParseUint(s[0], 10, 64)
if err != nil {
log.Fatalf("Error parsing block number: %v", err)
}
difficulty, err := strconv.ParseUint(s[1], 10, 64)
if err != nil {
log.Fatalf("Error parsing difficulty: %v", err)
}
extra := common.FromHex(extraString[0])

header := types.NewHeader(num, difficulty, extra)
return header
}

func (h *csvHeaderReader) close() {
h.file.Close()
}

func verifyInputFile(verifyFile, inputFile string) {
csvReader := newCSVHeaderReader(verifyFile)
defer csvReader.close()

binaryReader := newHeaderReader(inputFile)
defer binaryReader.close()

binaryReader.read(func(header *types.Header) {
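csvHeader := csvReader.readNext()
// readNext returns nil at EOF, so the CSV file is shorter than the binary file.
if csvHeader == nil {
log.Fatalf("Verify file %s ended before block %d", verifyFile, header.Number)
}

if !csvHeader.Equal(header) {
log.Fatalf("Header mismatch: %v != %v", csvHeader, header)
}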
})

log.Printf("All headers match in %s and %s\n", verifyFile, inputFile)
}

func verifyOutputFile(verifyFile, outputFile string) {
csvReader := newCSVHeaderReader(verifyFile)
defer csvReader.close()

dedupReader, err := NewReader(outputFile)
if err != nil {
log.Fatalf("Error opening dedup file: %v", err)
}
defer dedupReader.Close()

for {
header := csvReader.readNext()
if header == nil {
if _, _, err = dedupReader.ReadNext(); err == nil {
log.Fatalf("Expected EOF, got more headers")
}
break
}

difficulty, extraData, err := dedupReader.Read(header.Number)
if err != nil {
log.Fatalf("Error reading header: %v", err)
}

if header.Difficulty != difficulty {
log.Fatalf("Difficulty mismatch: headerNum %d: %d != %d", header.Number, header.Difficulty, difficulty)
}
if !bytes.Equal(header.ExtraData, extraData) {
log.Fatalf("ExtraData mismatch: headerNum %d: %x != %x", header.Number, header.ExtraData, extraData)
}
}
}