The benchmark directory contains several types of files as follows:
- Header files with the classes that are used to define the benchmark test routines. See Fixtures section for details.
It also includesbm_definitions.h
header that contains benchmarks utils such as:IndexType
struct- common typedefs
- macro shortcuts to the static data members of
BM_VecSimGeneral
andBM_VecSim_Index<index_type_t>
- subdirectory bm_initialization - Each file is associated with a test set (basics/batch etc) of a particular data type, and fits for both single and multi indices. Here we define and register the tests using Google benchmarks macros. See How to define and register a new test section.
- subdirectory run_files - contains a file for each combination of: set of benchmark tests, index data type, and index type(single/multi).
In this file we:- Initialize
BM_VecSimGeneral
required data members according to the set of parameters of the index loaded from the relevant serialized index file. - In addition, the macro
BM_FUNC_NAME(bm_func, algo)
is defined to be used in the initialization file, to define a unique name for the benchmark test (see Tests naming). - Include the initialization file from
bm_initialization
subdirectory that calls google benchmark definitions and registrations macros. - Call
BENCHMARK_MAIN()
- Initialize
Google benchmarks library supports templated fixtures (see Google benchmarks user guide regarding this topic).
BENCHMARK_TEMPLATE_DEFINE_F(fixture_name, test_name, data_type)(benchmark::State &st){fixture_name::test_implementation(st, test_args);}
fixture_name::test_implementation() should be defined in the fixture class that implements the required test case.
BENCHMARK_REGISTER_F(test_fixture, test_name)->Args({val1, val2})->ArgNames({"name1", "name2"})->Iterations(val0)->Unit(benchmark::kMillisecond)
Here we register to the benchmarks running loop the test that was bound to test_fixture
and has test_name
.
NOTE: The registration order is important. Changing it might violate the ability to compare one benchmark run to another. We have tests (such as addLabel and DeleteLabel) that modify the indices and are not idempotent.
Since the Google benchmarks library supports fixtures with only one template argument, we use a helper struct to define multiple types.
// benchmarks/bm_definitions.h
template <VecSimType type, typename DataType, typename DistType = DataType>
struct IndexType {
static VecSimType get_index_type() { return type; }
typedef DataType data_t;
typedef DistType dist_t;
};
// benchmarks/bm_vecsim_index.h
template <typename index_type_t>
class BM_VecSimIndex : public BM_VecSimGeneral {
// Parsing the struct's types
using data_t = typename index_type_t::data_t;
using dist_t = typename index_type_t::dist_t;
...
static std::vector<std::vector<data_t>> queries;
// Using the VecSimType getter
static void Initialize() {
VecSimType type = index_type_t::get_index_type();
...
}
};
Defined in bm_vecsim_general.h
This class inherits from benchmark::Fixture.
It includes the index parameters as static data members. The static data members are initialized in the .cpp files under benchmark/run_files.
To refer a member from one of the derived classes, BM_VecSimGeneral:: must be used. Shortcut macros such as #define N_QUERIES BM_VecSimGeneral::n_queries
are available in bm_definitions.h.
Defined in bm_vecsim_index.h
This class's purpose is to initialize the indices' and queries' lists data.
It contains:
- A list
indices
ofVecSimIndex *
type. One can refer to an index usingVecSimAlgo
enum, whereVecSimAlgo = VecSimAlgo_BF = 0
andVecSimAlgo = VecSimAlgo_HNSWLIB = 1
.
In the initialization function, we load HNSW index data from a file and insert the same vectors to the flat index under the same labels. NOTE: In the updated tests set we use this list to save the four indices described above, whereindices[0]
andindices[1]
are the flat and HNSW before the update, respectively, andindices[2]
andindices[3]
are the flat and HNSW after the update, respectively. - A list
queries
, contains the query vectors ofdata_t
type.
Both lists are populated using the files defined inBM_VecSimGeneral
. - Reference count - each benchmark definitions instantiate a new object of the
test_fixture
. Loading the indices and storing them in every single instance might take time and memory. To avoid calling the initialization function multiple times, we use the reference count in the constructor- if it's 0 we initialize the lists mentioned above, otherwise we just increase the reference count.
We also make sure we don't free the indices before we finished all the set's tests - in the class destructor we decrease the reference count, and if it has reached 0 - we free the indices.
indices
andqueries
lists need to be explicitly initialized for a specific type. For example:
template <>
std::vector<std::vector<float>> BM_VecSimIndex<fp32_index_t>::queries{};
template <>
std::vector<VecSimIndex *> BM_VecSimIndex<fp32_index_t>::indices{};
Defined in bm_common.h
In this class we define the test routines which are common between the basics benchmark and the updated index benchmarks:
TopK_BF
TopK_HNSW
Memory_FLAT
Memory_HNSW
These functions have index_offset
parameter (defaults to 0) to be used when we want to apply the function on an updated index.
In addition, in this file, we define macros with the arguments for the TopK
registration.
Defined in bm_vecsim_basics.h
In this class we define additional tests for the basic tests set:
AddLabel
- Add one label per iteration (meaning one vector for single value index, and multiple vectors for multi-value index)DeleteLabel<algo_t>
- Delete one label per iteration (meaning 1 vector for single value index, and multiple vector for multi value index).
algo_t
is the specific index type that is required to obtain vector data by its label (used to restore the deleted vectors). This requirement for thealgo_t
parameter is the reason that this test case is defined separately in each .cpp file usingDEFINE_DELETE_LABEL
macro (defined in bm_vecsim_basics.h). The registration is done like all other test cases in the initialization file.Range_BF
Range_HNSW
In addition, in this file, we define macros with the arguments for theRange
,AddLabel
, andDeleteLabel
registration.
The basic benchmark set includes functions defined both in BM_VecSimCommon
and BM_VecSimBasics
. Pay attention to refer the correct class when defining the test in the initialization file.
Defined in bm_updated_index.h
In this class constructor, we add the additional indices, the "updated" indices. The "before" indices are initialized in BM_VecSimIndex
.
This class defines no new test routines and uses only tests defined in BM_VecSimCommon
.
Defined in bm_batch_iterator.h
Here we implement the tests routines of batch iterator set to be called by the test definition in the initialization file.
In this file, we also define the arguments for each test case in the batch iterator tests set.
A test name will have the following format:
test_fixture<fixture_data_type>/test_name/arg1:val/iterations:val
for example:
BM_VecSimBasics<fp32_index_t>/Range_HNSW_Single/radiusX100:20/epsilonX1000:1/iterations:10
The test's name components are:
- The test's fixture and the fixture type (if it's templated)
- The name as given in definition and registration
- arguments names and values
NOTE: the fixture type (fp32_index_t, for example) is part of the name, so tests that use a different fixture or the same fixture of a different type, can share the same name.
This script expects to get zero or one argument.
It will print the names of the benchmarks binary files, without the bm_
prefix, separated by \n
.
You can run this script with no argument (or BM_TYPE = benchmarks-all
) to refer to all available benchmarks, or specify a set of benchmark tests.