Incremental verify checkpoints (stellar#4487)
Resolves stellar#4454

# Description

Adds a `--trusted-hash-file` argument to the `verify-checkpoints` command
to support appending newly verified checkpoints, starting from the latest
checkpoint in the trusted hash file.

Adds `--from-ledger` to support generating a verified checkpoint hash
file that covers a specific ledger up to the LCL or a specified end ledger.

Design doc:
https://docs.google.com/document/d/1GRzHAO4_YrfanXqoVc1UDIMhUV10PFqIMQyOxlPOW_s/edit

## Usage example:
### `--from-ledger`:
`% src/stellar-core verify-checkpoints --from-ledger=53736369
--output-file=out.json --conf=../stellar-core.cfg`
Result:
```
% cat out.json
[
[53736575, "1de4bfa30f8af81716d2295b7c9f077afea250ddb88839345c13176de7b75e36"],
[53736511, "9f1bd24f21facc606b49216853c0e2162d55d2e3e898da96dd910ddd1ede784f"],
[53736447, "80a3083ea9e987b48949c2ad33006a5e750f06c6836c4814d5a853cab6bac1e3"],
[53736383, "2363bc49669667aa28da768588b5be7f09dc8c69c5e20416d870748b3739509b"],
[0, ""]
]
```
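The entries above are 64 ledgers apart, and the lowest one (53736383) is the last ledger of the checkpoint containing `--from-ledger=53736369`. A minimal sketch of that arithmetic, assuming a checkpoint frequency of 64 and checkpoints that end on ledgers of the form `64*k - 1`; the helper names are hypothetical and are not the `HistoryManager` API:

```cpp
#include <cstdint>

// Assumption: 64 ledgers per checkpoint, as the spacing of the output
// entries suggests. Not read from any stellar-core configuration.
constexpr uint32_t kCheckpointFrequency = 64;

// True if `seq` is the last ledger of its checkpoint (a checkpoint boundary).
constexpr bool
isCheckpointBoundary(uint32_t seq)
{
    return (seq + 1) % kCheckpointFrequency == 0;
}

// Last ledger of the checkpoint that contains `seq`.
constexpr uint32_t
checkpointContaining(uint32_t seq)
{
    return (seq / kCheckpointFrequency + 1) * kCheckpointFrequency - 1;
}
```

With these helpers, `checkpointContaining(53736369)` evaluates to 53736383, which matches the oldest real entry in `out.json`.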

### Append to existing file:
`src/stellar-core verify-checkpoints --trusted-hash-file=out.json
--output-file=out2.json --conf=../stellar-core.cfg`
Result:
```
cat out2.json
[
[53736959, "4b1900cb4bbaa77e86e3c8abb33be966e24a84098acdbda3d57977f237c5b13e"],
[53736895, "a163415903fa39efb53e4c79198fa2857cdbb12f92cc64f0ac3bcd0e6a7f2cce"],
[53736831, "2977e0c5653960a11359552dd74508a17982a5ca422db961f809fc335cd17901"],
[53736767, "ff7d80daad82981c1512c0f296a9ff9902f7b9d1ffa8ec8ad02e588cca16a9fd"],
[53736703, "0fb92338560bfac48ebd78dac530735ca988009132846fd93e42c061caa8cc5f"],
[53736639, "ba407b9b13e077cf9fb0a1c277416e12c6ff6857a42beef62f5805a9fdeec8ce"],
[53736575, "1de4bfa30f8af81716d2295b7c9f077afea250ddb88839345c13176de7b75e36"],
[53736511, "9f1bd24f21facc606b49216853c0e2162d55d2e3e898da96dd910ddd1ede784f"],
[53736447, "80a3083ea9e987b48949c2ad33006a5e750f06c6836c4814d5a853cab6bac1e3"],
[53736383, "2363bc49669667aa28da768588b5be7f09dc8c69c5e20416d870748b3739509b"],
[0, ""]
]
```
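The appended file is the newly verified entries followed by the contents of the trusted file minus its opening `[` line. The sketch below illustrates that merge on in-memory strings; it is modeled on `endOutputFile()` in this change, but `appendTrustedEntries` is a hypothetical stand-alone helper, not code from the PR:

```cpp
#include <sstream>
#include <string>

// `newOutput` is what has been written so far: "[" followed by one
// "\n[seq, \"hash\"]," line per newly verified checkpoint.
// `trustedFile` is the full text of the existing trusted hash file.
std::string
appendTrustedEntries(std::string newOutput, std::string const& trustedFile)
{
    std::istringstream in(trustedFile);
    std::string line;
    // Skip the opening "[" of the trusted file; the new output has its own.
    std::getline(in, line);
    // Copy every remaining line, including the "[0, \"\"]" terminator and
    // the closing "]", so the result is again a complete JSON array.
    while (std::getline(in, line))
    {
        newOutput += "\n" + line;
    }
    return newOutput;
}
```

Because the trusted checkpoint itself is already the first entry of the old file, the new entries must not repeat it; in this change, `VerifyLedgerChainWork::onSuccess()` skips it when writing.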

### Usage of both `--from-ledger` and `--trusted-hash-file` -> ERROR

```
 % src/stellar-core verify-checkpoints --trusted-hash-file=out2.json --output-file=out3.json --from-ledger=9999 --conf=../stellar-core.cfg --ll trace
Warning: running non-release version v22.0.0rc1-3-ge94e61395-dirty of stellar-core
2024-09-30T15:56:36.748 [default ERROR] Cannot specify both --from-ledger and --trusted-hash-file
```
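The two options give conflicting starting points, so the command rejects the combination. A minimal sketch of such a check, assuming both options are held as `std::optional` values; `checkStartOptions` is a hypothetical helper and not how stellar-core's command-line handling is actually structured:

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical validation helper: throws if both starting points are given.
void
checkStartOptions(std::optional<uint32_t> const& fromLedger,
                  std::optional<std::string> const& trustedHashFile)
{
    if (fromLedger && trustedHashFile)
    {
        throw std::invalid_argument(
            "Cannot specify both --from-ledger and --trusted-hash-file");
    }
}
```

Passing either option alone, or neither, returns normally.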

### Performance
Time to verify checkpoints from `--from-ledger=53737040` to
LCL=53739327.
Output: hashes for checkpoints 53737023 to 53739327, a total of 2304
ledgers = 2287 ledgers (from `--from-ledger=53737040` to LCL=53739327) +
17 ledgers (from checkpoint 53737023 to `--from-ledger=53737040`):

```
time src/stellar-core verify-checkpoints --output-file=out4.json --from-ledger=53737040 --conf=../stellar-core.cfg

src/stellar-core verify-checkpoints --output-file=out4.json    15.22s user 1.25s system 8% cpu 3:25.09 total
  0.80s user 0.31s system 18% cpu 5.825 total
```

205 seconds / 2304 ledgers ≈ 0.09 seconds (90 milliseconds) per ledger.

Caveat: this figure includes overhead, because the LCL is obtained from the
network. On average we wait half a checkpoint (32 ledgers) for a
checkpoint-boundary LCL (32 ledgers * 5 seconds = 160 seconds).
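The figures in this section can be rechecked with a few constants. This is only a worked version of the arithmetic, using the ledger numbers and the `3:25.09` wall-clock time reported above; the constant names are made up for illustration:

```cpp
#include <cstdint>

constexpr uint32_t kFirstCheckpoint = 53737023; // oldest checkpoint in output
constexpr uint32_t kFromLedger = 53737040;      // --from-ledger
constexpr uint32_t kLcl = 53739327;             // LCL at the time of the run

// Ledgers explicitly requested, plus the ledgers between the preceding
// checkpoint boundary and --from-ledger.
constexpr uint32_t kRequested = kLcl - kFromLedger;
constexpr uint32_t kRoundDown = kFromLedger - kFirstCheckpoint;
constexpr uint32_t kTotal = kRequested + kRoundDown;

// Wall-clock time from `time`: 3 minutes 25.09 seconds.
constexpr double kWallSeconds = 3 * 60 + 25.09;
constexpr double kSecondsPerLedger = kWallSeconds / kTotal;

// Average wait for a checkpoint-boundary LCL: half a 64-ledger checkpoint
// at roughly 5 seconds per ledger.
constexpr uint32_t kAvgWaitSeconds = (64 / 2) * 5;
```

Here `kRequested` is 2287, `kRoundDown` is 17, `kTotal` is 2304, and `kSecondsPerLedger` comes out just under 0.09.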


# Checklist
- [ ] Reviewed the
[contributing](https://github.com/stellar/stellar-core/blob/master/CONTRIBUTING.md#submitting-changes)
document
- [ ] Rebased on top of master (no merge commits)
- [ ] Ran `clang-format` v8.0.0 (via `make format` or the Visual Studio
extension)
- [ ] Compiles
- [ ] Ran all tests
- [ ] If change impacts performance, include supporting evidence per the
[performance
document](https://github.com/stellar/stellar-core/blob/master/performance-eval/performance-eval.md)
SirTyson authored Oct 29, 2024
2 parents 42a792c + b827e40 commit acf111d
Showing 9 changed files with 325 additions and 45 deletions.
19 changes: 17 additions & 2 deletions docs/software/commands.md
@@ -209,8 +209,23 @@ apply.
hash for a checkpoint ledger, and then verifies the entire earlier history
of an archive that ends in that ledger hash, writing the output to a reference
list of trusted checkpoint hashes.
Option **--output-filename <FILE-NAME>** is mandatory and specifies the file
to write the trusted checkpoint hashes to.
* Option **--history-hash <HASH>** is optional and specifies the hash of the ledger
at the end of the verification range. When provided, `stellar-core` will use the history
hash to verify the range, rather than the latest checkpoint hash obtained from consensus.
Used in conjunction with `--history-ledger`.
* Option **--history-ledger <LEDGER-NUMBER>** is optional and specifies the ledger
number to end the verification at. Used in conjunction with `--history-hash`.
* Option **--output-filename <FILE-NAME>** is mandatory and specifies the file
to write the trusted checkpoint hashes to. The file will contain a JSON array
of arrays, where each inner array contains the ledger number and the corresponding
checkpoint hash of the form `[[999, "hash-abc"], [935, "hash-def"], ... [0, "hash-xyz]]`.
* Option **--trusted-checkpoint-file <FILE-NAME>** is optional. If provided,
stellar-core will parse the latest checkpoint ledger number and hash from the file and verify from this ledger to the latest checkpoint ledger obtained from the network.
* Option **--from-ledger <LEDGER-NUMBER>** is optional and specifies the ledger
number to start the verification from.

> Note: It is an error to provide both the `--trusted-checkpoint-hashes` and `--from-ledger` options.
* **version**: Print version info and then exit.

## HTTP Commands
2 changes: 1 addition & 1 deletion src/catchup/CatchupWork.cpp
@@ -177,7 +177,7 @@ CatchupWork::downloadVerifyLedgerChain(CatchupRange const& catchupRange,

mVerifyLedgers = std::make_shared<VerifyLedgerChainWork>(
mApp, *mDownloadDir, verifyRange, mLastClosedLedgerHashPair,
mRangeEndFuture, std::move(fatalFailurePromise));
std::nullopt, mRangeEndFuture, std::move(fatalFailurePromise));

// Never retry the sequence: downloads already have retries, and there's no
// point retrying verification
25 changes: 23 additions & 2 deletions src/catchup/VerifyLedgerChainWork.cpp
@@ -107,6 +107,7 @@ trySetFuture(std::promise<T>& promise, T value)
VerifyLedgerChainWork::VerifyLedgerChainWork(
Application& app, TmpDir const& downloadDir, LedgerRange const& range,
LedgerNumHashPair const& lastClosedLedger,
std::optional<LedgerNumHashPair> const& maxPrevVerified,
std::shared_future<LedgerNumHashPair> trustedMaxLedger,
std::promise<bool>&& fatalFailure,
std::shared_ptr<std::ofstream> outputStream)
@@ -118,6 +119,7 @@ VerifyLedgerChainWork::VerifyLedgerChainWork(
: mApp.getHistoryManager().checkpointContainingLedger(
mRange.last()))
, mLastClosed(lastClosedLedger)
, mMaxPrevVerified(maxPrevVerified)
, mFatalFailurePromise(std::move(fatalFailure))
, mTrustedMaxLedger(trustedMaxLedger)
, mVerifiedMinLedgerPrevFuture(mVerifiedMinLedgerPrev.get_future().share())
@@ -211,7 +213,7 @@ VerifyLedgerChainWork::verifyHistoryOfSingleCheckpoint()
}

// Verify ledger with local state by comparing to LCL
// When checking against LCL, see it the local node is in the bad state,
// When checking against LCL, see if the local node is in a bad state
// or if the archive is in a bad state (in which case, retry)
if (curr.header.ledgerSeq == mLastClosed.first)
{
@@ -242,6 +244,20 @@ VerifyLedgerChainWork::verifyHistoryOfSingleCheckpoint()
mChainDisagreesWithLocalState = lclResult;
}
}
// If the curr history entry is the same ledger as our mMaxPrevVerified,
// verify that the hashes match.
if (mMaxPrevVerified &&
curr.header.ledgerSeq == mMaxPrevVerified->first &&
curr.hash != mMaxPrevVerified->second)
{
CLOG_ERROR(History,
"Checkpoint {} does not agree with trusted "
"checkpoint hash {}",
LedgerManager::ledgerAbbrev(curr),
LedgerManager::ledgerAbbrev(mMaxPrevVerified->first,
*mMaxPrevVerified->second));
return HistoryManager::VERIFY_STATUS_ERR_BAD_HASH;
}

if (beginCheckpoint)
{
@@ -365,7 +381,7 @@ VerifyLedgerChainWork::verifyHistoryOfSingleCheckpoint()
}
else
{
// Otherwise we just finished a checkpoint _after_ than the first call
// Otherwise we just finished a checkpoint _after_ the first call
// to this method and the `incoming` value we read out of
// `mVerifiedAhead` should have content, because the previous call
// should have saved something in `mVerifiedAhead`.
@@ -420,6 +436,11 @@ VerifyLedgerChainWork::onSuccess()
{
for (auto const& pair : mVerifiedLedgers)
{
if (mMaxPrevVerified && mMaxPrevVerified->first == pair.first)
{
// Skip writing the trusted hash to the output file.
continue;
}
(*mOutputStream) << "\n[" << pair.first << ", \""
<< binToHex(*pair.second) << "\"],";
}
5 changes: 5 additions & 0 deletions src/catchup/VerifyLedgerChainWork.h
@@ -26,6 +26,10 @@ class VerifyLedgerChainWork : public BasicWork
LedgerRange const mRange;
uint32_t mCurrCheckpoint;
LedgerNumHashPair const mLastClosed;
// The max ledger number and hash that we have verified up to at some time
// in the past (or genesis if we have no previous verification). Invocations
// of VerifyLedgerChainWork will verify down to this ledger.
std::optional<LedgerNumHashPair> const mMaxPrevVerified;

// Record any instance where the chain we're verifying disagrees with the
// local node state. This _might_ mean we can't possibly catch up (eg. we're
@@ -78,6 +82,7 @@ class VerifyLedgerChainWork : public BasicWork
VerifyLedgerChainWork(
Application& app, TmpDir const& downloadDir, LedgerRange const& range,
LedgerNumHashPair const& lastClosedLedger,
std::optional<LedgerNumHashPair> const& maxPrevVerified,
std::shared_future<LedgerNumHashPair> trustedMaxLedger,
std::promise<bool>&& fatalFailure,
std::shared_ptr<std::ofstream> outputStream = nullptr);
2 changes: 1 addition & 1 deletion src/history/test/HistoryTests.cpp
@@ -243,7 +243,7 @@ TEST_CASE("Ledger chain verification", "[ledgerheaderverification]")
std::shared_future<bool> fatalFailureFuture =
fataFailurePromise.get_future().share();
auto w = wm.executeWork<VerifyLedgerChainWork>(
tmpDir, ledgerRange, lclPair, ledgerRangeEndFuture,
tmpDir, ledgerRange, lclPair, std::nullopt, ledgerRangeEndFuture,
std::move(fataFailurePromise));
REQUIRE(expectedState == w->getState());
REQUIRE(fatalFailureFuture.valid());
154 changes: 133 additions & 21 deletions src/historywork/WriteVerifiedCheckpointHashesWork.cpp
@@ -4,46 +4,86 @@

#include "historywork/WriteVerifiedCheckpointHashesWork.h"
#include "catchup/VerifyLedgerChainWork.h"
#include "crypto/Hex.h"
#include "history/HistoryManager.h"
#include "historywork/BatchDownloadWork.h"
#include "ledger/LedgerManager.h"
#include "ledger/LedgerRange.h"
#include "main/Application.h"
#include "util/Fs.h"
#include "util/GlobalChecks.h"
#include "util/Logging.h"
#include "work/ConditionalWork.h"
#include <Tracy.hpp>
#include <algorithm>
#include <filesystem>
#include <fmt/format.h>

namespace stellar
{
LedgerNumHashPair
WriteVerifiedCheckpointHashesWork::loadLatestHashPairFromJsonOutput(
std::filesystem::path const& path)
{
if (!fs::exists(path.string()))
{
throw std::runtime_error("file not found: " + path.string());
}

std::ifstream in(path);
Json::Value root;
Json::Reader rdr;
if (!rdr.parse(in, root))
{
throw std::runtime_error("failed to parse JSON input " + path.string());
}
if (!root.isArray())
{
throw std::runtime_error("expected top-level array in " +
path.string());
}
if (root.size() < 2)
{
throw std::runtime_error(
"expected at least one trusted ledger, hash pair in " +
path.string());
}
// Latest hash is the first element in the array.
auto const& jpair = root[0];
if (!jpair.isArray() || (jpair.size() != 2))
{
throw std::runtime_error("expecting 2-element sub-array in " +
path.string());
}
return {jpair[0].asUInt(), hexToBin256(jpair[1].asString())};
}

Hash
WriteVerifiedCheckpointHashesWork::loadHashFromJsonOutput(
uint32_t seq, std::string const& filename)
uint32_t seq, std::filesystem::path const& path)
{
std::ifstream in(filename);
std::ifstream in(path);
if (!in)
{
throw std::runtime_error("error opening " + filename);
throw std::runtime_error("error opening " + path.string());
}
Json::Value root;
Json::Reader rdr;
if (!rdr.parse(in, root))
{
throw std::runtime_error("failed to parse JSON input " + filename);
throw std::runtime_error("failed to parse JSON input " + path.string());
}
if (!root.isArray())
{
throw std::runtime_error("expected top-level array in " + filename);
throw std::runtime_error("expected top-level array in " +
path.string());
}
for (auto const& jpair : root)
{
if (!jpair.isArray() || (jpair.size() != 2))
{
throw std::runtime_error("expecting 2-element sub-array in " +
filename);
path.string());
}
if (jpair[0].asUInt() == seq)
{
@@ -54,16 +94,26 @@ WriteVerifiedCheckpointHashesWork::loadHashFromJsonOutput(
}

WriteVerifiedCheckpointHashesWork::WriteVerifiedCheckpointHashesWork(
Application& app, LedgerNumHashPair rangeEnd, std::string const& outputFile,
uint32_t nestedBatchSize, std::shared_ptr<HistoryArchive> archive)
Application& app, LedgerNumHashPair rangeEnd,
std::filesystem::path const& outputFile,
std::optional<std::filesystem::path> const& trustedHashFile,
std::optional<LedgerNumHashPair> const& latestTrustedHashPair,
std::optional<uint32_t> const& fromLedger, uint32_t nestedBatchSize,
std::shared_ptr<HistoryArchive> archive)
: BatchWork(app, "write-verified-checkpoint-hashes")
, mNestedBatchSize(nestedBatchSize)
, mRangeEnd(rangeEnd)
, mRangeEndPromise()
, mRangeEndFuture(mRangeEndPromise.get_future().share())
, mCurrCheckpoint(rangeEnd.first)
, mArchive(archive)
, mOutputFileName(outputFile)
, mTrustedHashPath(trustedHashFile)
, mOutputPath(outputFile)
, mTmpDir("verify-checkpoints")
, mTmpOutputPath(std::filesystem::path(mTmpDir.getName()) /
outputFile.filename())
, mLatestTrustedHashPair(latestTrustedHashPair)
, mFromLedger(fromLedger)
{
mRangeEndPromise.set_value(mRangeEnd);
if (mArchive)
@@ -81,6 +131,14 @@ WriteVerifiedCheckpointHashesWork::~WriteVerifiedCheckpointHashesWork()
bool
WriteVerifiedCheckpointHashesWork::hasNext() const
{
if (mFromLedger)
{
return mCurrCheckpoint >= *mFromLedger;
}
else if (mLatestTrustedHashPair)
{
return mCurrCheckpoint >= mLatestTrustedHashPair->first;
}
return mCurrCheckpoint != LedgerManager::GENESIS_LEDGER_SEQ;
}

@@ -101,9 +159,31 @@ WriteVerifiedCheckpointHashesWork::yieldMoreWork()
std::make_optional<Hash>(lclHe.hash));
uint32_t const span = mNestedBatchSize * freq;
uint32_t const last = mCurrCheckpoint;
uint32_t const first =
last <= span ? LedgerManager::GENESIS_LEDGER_SEQ
: hm.firstLedgerInCheckpointContaining(last - span);
uint32_t first = last <= span
? LedgerManager::GENESIS_LEDGER_SEQ
: hm.firstLedgerInCheckpointContaining(last - span);
// If the first ledger in the range is less than mFromLedger then the
// range should be constrained to start at mFromLedger, or the checkpoint
// immediately before it if mFromLedger is not a checkpoint boundary.
if (mFromLedger && first < *mFromLedger)
{
if (hm.isLastLedgerInCheckpoint(*mFromLedger))
{
first = *mFromLedger;
}
else
{
first = hm.lastLedgerBeforeCheckpointContaining(*mFromLedger);
}
releaseAssertOrThrow(first <= *mFromLedger);
}
// If the latest trusted ledger is greater than the first
// ledger in the range then the range should start at the trusted ledger.
else if (mLatestTrustedHashPair && first < mLatestTrustedHashPair->first)
{
first = mLatestTrustedHashPair->first;
releaseAssertOrThrow(hm.isLastLedgerInCheckpoint(first));
}

LedgerRange const ledgerRange = LedgerRange::inclusive(first, last);
CheckpointRange const checkpointRange(ledgerRange, hm);
@@ -139,8 +219,8 @@ WriteVerifiedCheckpointHashesWork::yieldMoreWork()
: mRangeEndFuture);

auto currWork = std::make_shared<VerifyLedgerChainWork>(
mApp, *tmpDir, ledgerRange, lcl, prevTrusted, std::promise<bool>(),
mOutputFile);
mApp, *tmpDir, ledgerRange, lcl, mLatestTrustedHashPair, prevTrusted,
std::promise<bool>(), mOutputFile);
auto prevWork = mPrevVerifyWork;
auto predicate = [prevWork](Application&) {
if (!prevWork)
@@ -169,11 +249,11 @@ WriteVerifiedCheckpointHashesWork::startOutputFile()
{
releaseAssert(!mOutputFile);
auto mode = std::ios::out | std::ios::trunc;
mOutputFile = std::make_shared<std::ofstream>(mOutputFileName, mode);
mOutputFile = std::make_shared<std::ofstream>(mTmpOutputPath, mode);
if (!*mOutputFile)
{
throw std::runtime_error("error opening output file " +
mOutputFileName);
mTmpOutputPath.string());
}
(*mOutputFile) << "[";
}
@@ -183,13 +263,45 @@ WriteVerifiedCheckpointHashesWork::endOutputFile()
{
if (mOutputFile && mOutputFile->is_open())
{
// Each line of output made by a VerifyLedgerChainWork has a trailing
// comma, and trailing commas are not a valid end of a JSON array; so we
// terminate the array here with an entry that does _not_ have a
// trailing comma (and identifies an invalid ledger number anyways).
(*mOutputFile) << "\n[0, \"\"]\n]\n";
if (mTrustedHashPath)
{
if (!fs::exists(mTrustedHashPath->string()))
{
throw std::runtime_error("failed to open trusted hash file " +
mTrustedHashPath->string());
}
// Append everything except the first line of mTrustedHashFile to
// mOutputFile.
std::ifstream trustedHashFile(*mTrustedHashPath);
if (trustedHashFile)
{
std::string line;
// Ignore the first line ("["")
std::getline(trustedHashFile, line);
// Append the rest of the lines to mOutputFile.
while (std::getline(trustedHashFile, line))
{
(*mOutputFile) << "\n" << line;
}
trustedHashFile.close();
}
}
else
{
// Each line of output made by a VerifyLedgerChainWork has a
// trailing comma, and trailing commas are not a valid end of a JSON
// array; so we terminate the array here with an entry that does
// _not_ have a trailing comma (and identifies an invalid ledger
// number anyways).
(*mOutputFile) << "\n[0, \"\"]\n]\n";
}
mOutputFile->close();
mOutputFile.reset();

// The output file was written to a temporary file, so rename it to
// the output path provided by the user.
fs::durableRename(mTmpOutputPath.string(), mOutputPath.string(),
mOutputPath.relative_path().string());
}
}
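
The hunk above writes the output to a temporary path and only renames it to the user-supplied path once the file is complete, so a failed run does not leave a truncated output file behind. A rough stand-alone sketch of that pattern follows, using `std::filesystem::rename` as a stand-in; it makes no attempt to reproduce whatever additional guarantees `fs::durableRename` provides, and `writeThenRename` is a hypothetical name:

```cpp
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Write `contents` to `tmpPath`, then move it into place at `finalPath`.
// Assumption: both paths are on the same filesystem, so the rename does
// not need to copy data.
void
writeThenRename(std::filesystem::path const& tmpPath,
                std::filesystem::path const& finalPath,
                std::string const& contents)
{
    {
        std::ofstream out(tmpPath, std::ios::out | std::ios::trunc);
        if (!out)
        {
            throw std::runtime_error("error opening output file " +
                                     tmpPath.string());
        }
        out << contents;
    } // the stream is flushed and closed here, before the rename
    std::filesystem::rename(tmpPath, finalPath);
}
```

Readers of `finalPath` then see either the previous file or the complete new one, never a partially written array.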
