Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Magic Bitboard Table is slow due to pointer indirection and cache misses #1

Open
master-spike opened this issue Sep 14, 2023 · 0 comments
Assignees
Labels
help wanted Extra attention is needed performance Something is working slower than it could, potentially

Comments

@master-spike
Copy link
Owner

The current implementation of MagicTable is as follows:

class MagicTable {
public:
  U64 knight_table[64];
  U64 king_table[64];
  MagicBB bishop_magics[64];
  MagicBB rook_magics[64];
  U64 passed_pawns[128]; // sq + color*64;
  MagicTable();
  ~MagicTable() = default;
  MagicTable(MagicTable& mt);
};

and MagicBB is like:

class MagicBB {
private:
  int square;
  U64 mask;
  U64 magicnum;
  std::vector<U64> table;
  int bits;
public:
  MagicBB() = default;
  MagicBB(int sq, U64 mg, U64 ms, int b);
  MagicBB(const MagicBB& in_mbb);
  U64 compute(U64 blockers);
  // ... et cetera
};

To find the blockers of a rook or bishop, the move generator has to hold a pointer to a MagicTable instance (in MoveGenerator it is std::shared_ptr<MagicTable> mt) and do something like mt->bishop_magics[i].compute(). To do this, the pointer to the magic table is dereferenced to find the MagicBB object for the piece-square we are looking for, then the pointer to the table entry in MagicBB must be dereferenced to find the result. Since MagicTable::compute is called very often, this is leading to a time usage of about 12% in the compute() function. The multiple levels of indirection very often leads to cache misses. Running perf stat shows a 9.2% backend bound on my machine, and the magic table and transposition table are the only large sections of memory where cache misses may be likely to occur. Since the transposition table is not accessed during most of the search (in the quiescence nodes for example), it means that almost all the cache misses are likely in MagicBB::compute(), and that stalling due to cache misses accounts for 75% of the time spent in MagicBB::compute.

The implementation of the magic table should be improved to reduce the cache miss rate. One way to do this would be to ensure that all the entries in all the magic tables are in a contiguous region of memory, and furthermore by finding better magic numbers the size of the total magic table could be potentially reduced. It may also improve performance to put the magic table meta-information as a global constant and hard-code it or compute them at compile-time using some TMP, as then the move generator and any other module using the magic bitboards don't have to dereference a pointer to a magic table object, but rather access the data directly.

@master-spike master-spike self-assigned this Sep 14, 2023
@master-spike master-spike added help wanted Extra attention is needed performance Something is working slower than it could, potentially labels Sep 14, 2023
@master-spike master-spike changed the title Magic Bitboard Table is slow due to pointer indirection Magic Bitboard Table is slow due to pointer indirection and cache misses Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed performance Something is working slower than it could, potentially
Projects
None yet
Development

No branches or pull requests

1 participant