Save the corpus and use later as seed #633

dank-cruise · 2023-10-12T17:10:13Z

Hi there!

A. Is it possible to save or dump the corpus that's been found so far? E.g. when I terminate the fuzzing run, it should save the corpus that's been discovered so far. Presumably the corpus path would be a command line flag.
B. When I fuzz the same target again later, using the same Domains and all that, can I reuse a previously saved corpus?

Obviously, this is not a new idea. For example, Chromium fuzzing talks about it.

A on its own is useful, even if B isn't done. I think it would be very useful to take the corpus from A, create a unit test for every corpus element, and add those tests to continuous integration and pre-commit testing.
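To make the "unit test for every corpus element" idea concrete, here is a minimal sketch. It is not a FuzzTest API: `ReplayCorpusDirectory` is a hypothetical helper, and it assumes each corpus file holds raw bytes that the function under test can take directly. Files written by FuzzTest itself may use its own serialization, in which case they would have to be decoded first.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Hypothetical helper (not part of FuzzTest): feeds the raw contents of every
// regular file directly inside `corpus_dir` to `target` and returns how many
// files were replayed. Subdirectories are skipped, not descended into.
std::size_t ReplayCorpusDirectory(
    const std::filesystem::path& corpus_dir,
    const std::function<void(const std::string&)>& target) {
  std::size_t replayed = 0;
  for (const auto& entry : std::filesystem::directory_iterator(corpus_dir)) {
    if (!entry.is_regular_file()) continue;
    // Read the whole file as bytes; assumes the file is the input itself.
    std::ifstream in(entry.path(), std::ios::binary);
    std::string bytes((std::istreambuf_iterator<char>(in)),
                      std::istreambuf_iterator<char>());
    target(bytes);
    ++replayed;
  }
  return replayed;
}
```

A regression test in CI could call this once with the saved corpus directory and the function under test, and fail if the function crashes or an assertion inside it fires.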

irowebbn · 2023-12-20T16:18:16Z

There appears to be a command line flag for this (I found it by running my test binary with the --helpfull flag).

--corpus_database (The directory containing all corpora for all fuzz tests
      in the project. For each test binary, there's a corresponding
      <binary_name> subdirectory in `corpus_database`, and the <binary_name>
      directory has the following structure: (1) For each fuzz test
      `SuiteName.TestName` in the binary, there's a sub-directory with the name
      of that test ('<binary_name>/SuiteName.TestName'). (2) For each fuzz test,
      there are three directories containing `regression`, `crashing`, and
      `coverage` directories. Files in the `regression` directory will always be
      used. Files in `crashing` directory will be used when
      --reproduce_findings_as_separate_tests flag is true. And finally, all
      files in `coverage` directory will be used when --replay_corpus flag is
      true.); default: "~/.cache/fuzztest";

Unfortunately, I have not been able to get it to work.
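For reference, the directory layout that the help text describes can be sketched as below. This only mirrors the wording of the flag description; `TestCorpusDir` and `CreateCorpusLayout` are hypothetical names, not FuzzTest functions, and whether FuzzTest reads exactly this layout is an assumption taken from the help text.

```cpp
#include <filesystem>
#include <initializer_list>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: <corpus_database>/<binary_name>/SuiteName.TestName,
// as described in the --corpus_database help text.
fs::path TestCorpusDir(const fs::path& corpus_database,
                       const std::string& binary_name,
                       const std::string& test_name) {
  return corpus_database / binary_name / test_name;
}

// Creates the three per-test subdirectories named in the help text and
// returns their paths in the order regression, crashing, coverage.
std::vector<fs::path> CreateCorpusLayout(const fs::path& corpus_database,
                                         const std::string& binary_name,
                                         const std::string& test_name) {
  std::vector<fs::path> dirs;
  for (const char* kind : {"regression", "crashing", "coverage"}) {
    fs::path dir = TestCorpusDir(corpus_database, binary_name, test_name) / kind;
    fs::create_directories(dir);  // No-op if the directory already exists.
    dirs.push_back(dir);
  }
  return dirs;
}
```

For example, `CreateCorpusLayout("/tmp/db", "my_fuzztest", "My.Test")` would create `/tmp/db/my_fuzztest/My.Test/regression`, `.../crashing`, and `.../coverage`.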

racko · 2024-01-19T22:59:58Z

There is an undocumented environment variable that helps us along one step: FUZZTEST_TESTSUITE_OUT_DIR

$ FUZZTEST_TESTSUITE_OUT_DIR=/some/path my_fuzztest --fuzz My.Test

will create /some/path and create lots of beautiful corpus files in it.

FUZZTEST_TESTSUITE_IN_DIR could be used in the same way to reuse the corpus later. (This is a separate mechanism from the --corpus_database stuff.)

However, the directory structure described in the --corpus_database flag documentation is not created.
As a workaround, you can create the directory structure yourself, e.g. by running

$ FUZZTEST_TESTSUITE_OUT_DIR=~/.cache/fuzztest/<binary_name>/SuiteName.TestName/coverage <binary_name> --fuzz SuiteName.TestName

Later, to use the corpus, run

$ <binary_name> --fuzz SuiteName.TestName --corpus_database ~/.cache/fuzztest --replay_coverage_inputs

You cannot skip the --corpus_database ~/.cache/fuzztest argument: fuzztest does try to use ~/.cache/fuzztest as a default, but this doesn't actually work because ~ is not resolved by the C++ library code. Your shell, however, does expand it when you pass the argument on the command line.
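To illustrate why the default fails: file APIs treat `~` as an ordinary character, so something has to expand it explicitly. The sketch below shows one way that could be done using the `HOME` environment variable. `ExpandLeadingTilde` is a hypothetical helper, not something FuzzTest provides, and it deliberately ignores the `~user` form.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of FuzzTest): replaces a leading "~" or "~/"
// with the value of $HOME. Paths such as "~user/x", paths without a leading
// tilde, and any path when HOME is unset are returned unchanged.
std::string ExpandLeadingTilde(const std::string& path) {
  const bool tilde_only = (path == "~");
  const bool tilde_slash =
      path.size() >= 2 && path[0] == '~' && path[1] == '/';
  if (!tilde_only && !tilde_slash) return path;
  const char* home = std::getenv("HOME");
  if (home == nullptr) return path;
  return std::string(home) + path.substr(1);
}
```

Until the default is handled in some such way, passing the path on the command line and letting the shell expand it remains the workaround.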

As far as I can tell, we cannot make fuzztest write samples to the --corpus_database just by passing the argument. The path is exclusively used in

fuzztest/fuzztest/init_fuzztest.cc

Lines 154 to 162 in 4c3852b

    
    std::string binary_corpus = absl::StrCat(
        absl::GetFlag(FUZZTEST_FLAG(corpus_database)), "/", binary_identifier);
    if (getenv("TEST_SRCDIR")) {
      binary_corpus = absl::StrCat(getenv("TEST_SRCDIR"), "/", binary_corpus);
    }
    return internal::Configuration{
        .corpus_database = internal::CorpusDatabase(
            binary_corpus, absl::GetFlag(FUZZTEST_FLAG(replay_coverage_inputs)),
            absl::GetFlag(FUZZTEST_FLAG(reproduce_findings_as_separate_tests))),

to create a CorpusDatabase object:

fuzztest/fuzztest/internal/configuration.h

Lines 14 to 41 in 4c3852b

    
    class CorpusDatabase {
     public:
      explicit CorpusDatabase(absl::string_view database_path,
                              bool use_coverage_inputs, bool use_crashing_inputs)
          : database_path_(std::string(database_path)),
            use_coverage_inputs_(use_coverage_inputs),
            use_crashing_inputs_(use_crashing_inputs) {}
      // Returns set of all regression inputs from `corpus_database` for a fuzz
      // test.
      std::vector<std::string> GetRegressionInputs(
          absl::string_view test_name) const;
      // Returns set of all corpus inputs from `corpus_database` for a fuzz test.
      // Returns an empty set when `use_coverage_inputs_` is false.
      std::vector<std::string> GetCoverageInputsIfAny(
          absl::string_view test_name) const;
      // Returns set of all crashing inputs from `corpus_database` for a fuzz test.
      // Returns an empty set when `use_crashing_inputs_` is false.
      std::vector<std::string> GetCrashingInputsIfAny(
          absl::string_view test_name) const;
     private:
      std::string database_path_;
      bool use_coverage_inputs_ = false;
      bool use_crashing_inputs_ = false;
    };

As you can see, CorpusDatabase has no public API for getting database_path_, which would be needed in order to write new corpus files to it.

chandlerc · 2024-01-28T02:43:29Z

Some way of seeding with a corpus and minimizing a corpus of seeds is really needed.

For example, these workflows are well supported with libFuzzer already:
https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md#seed-corpus
https://github.com/google/fuzzing/blob/master/tutorial/libFuzzerTutorial.md#minimizing-a-corpus

I'm trying to migrate from libFuzzer to FuzzTest, and currently this is the biggest issue I'm facing.

davidben · 2024-05-04T17:49:23Z

Same. FuzzTest's model of putting all the fuzzers in one build target would be really attractive for BoringSSL (it would simplify keeping the same build across multiple build systems). But one of our workflows is that we record transcripts from our tests (a good sample of different TLS protocol flows and other hand-crafted interesting cases) and then minimize them as the starting corpus for the fuzzer, so it doesn't need to discover how the TLS protocol works from scratch.

dank-cruise changed the title ~~Save/dump corpus and use later as seed~~ Save the corpus and use later as seed Oct 12, 2023
