Magic Bitboard Table is slow due to pointer indirection and cache misses #1

master-spike · 2023-09-14T05:06:18Z

The current implementation of MagicTable is as follows:

class MagicTable {
public:
  U64 knight_table[64];
  U64 king_table[64];
  MagicBB bishop_magics[64];
  MagicBB rook_magics[64];
  U64 passed_pawns[128]; // sq + color*64;
  MagicTable();
  ~MagicTable() = default;
  MagicTable(MagicTable& mt);
};

and MagicBB is like:

class MagicBB {
private:
  int square;
  U64 mask;
  U64 magicnum;
  std::vector<U64> table;
  int bits;
public:
  MagicBB() = default;
  MagicBB(int sq, U64 mg, U64 ms, int b);
  MagicBB(const MagicBB& in_mbb);
  U64 compute(U64 blockers);
  // ... et cetera
};

To find the blockers of a rook or bishop, the move generator has to hold a pointer to a MagicTable instance (in MoveGenerator it is std::shared_ptr<MagicTable> mt) and do something like mt->bishop_magics[i].compute(). To do this, the pointer to the magic table is dereferenced to find the MagicBB object for the piece-square we are looking for, then the pointer to the table entry in MagicBB must be dereferenced to find the result. Since MagicTable::compute is called very often, this is leading to a time usage of about 12% in the compute() function. The multiple levels of indirection very often leads to cache misses. Running perf stat shows a 9.2% backend bound on my machine, and the magic table and transposition table are the only large sections of memory where cache misses may be likely to occur. Since the transposition table is not accessed during most of the search (in the quiescence nodes for example), it means that almost all the cache misses are likely in MagicBB::compute(), and that stalling due to cache misses accounts for 75% of the time spent in MagicBB::compute.

The implementation of the magic table should be improved to reduce the cache miss rate. One way to do this would be to ensure that all the entries in all the magic tables are in a contiguous region of memory, and furthermore by finding better magic numbers the size of the total magic table could be potentially reduced. It may also improve performance to put the magic table meta-information as a global constant and hard-code it or compute them at compile-time using some TMP, as then the move generator and any other module using the magic bitboards don't have to dereference a pointer to a magic table object, but rather access the data directly.

The text was updated successfully, but these errors were encountered:

master-spike self-assigned this Sep 14, 2023

master-spike added help wanted Extra attention is needed performance Something is working slower than it could, potentially labels Sep 14, 2023

master-spike changed the title ~~Magic Bitboard Table is slow due to pointer indirection~~ Magic Bitboard Table is slow due to pointer indirection and cache misses Sep 14, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Magic Bitboard Table is slow due to pointer indirection and cache misses #1

Magic Bitboard Table is slow due to pointer indirection and cache misses #1

master-spike commented Sep 14, 2023

Magic Bitboard Table is slow due to pointer indirection and cache misses #1

Magic Bitboard Table is slow due to pointer indirection and cache misses #1

Comments

master-spike commented Sep 14, 2023