Normalizing data for cosine to temporary file, please ensure there is additional (n*d*4) bytes for storing normalized base vectors, apart from the interim indices created by DiskANN and the final index.
Normalizing FLOAT vectors in file: /mnt/raid0/DiskANN/tmp/embeddings.bin
Dataset: #pts = 214386453, # dims = 1280
# blks: 1636
tcmalloc: large alloc 1097658646528 bytes == (nil) @ 0x7f96cdbc0680 0x7f96cdbe0ff4 0x561f428bcdbd 0x561f423d1cce 0x561f42390637 0x7f96c6291083 0x561f4239115e
std::bad_alloc
Index build failed.
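As a sanity check on the log above, the failed request lines up with one buffer for the entire dataset: 214386453 points x 1280 dims x 4 bytes is 1097658639360 bytes (about 1.1 TB), and rounding that up to a multiple of 8192 bytes gives exactly the 1097658646528 bytes reported by tcmalloc. The sketch below only reproduces that arithmetic. The 8 KiB rounding granularity and the block size of 131072 points are assumptions inferred from the numbers in the log (131072 is the value that yields the reported 1636 blocks); neither is confirmed by the DiskANN source here, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold npts x ndims float32 values in a single buffer.
constexpr std::uint64_t full_buffer_bytes(std::uint64_t npts, std::uint64_t ndims) {
    return npts * ndims * 4;  // 4 bytes per float32, as in the "(n*d*4)" log message
}

// Round a request up to a multiple of page_bytes.
// ASSUMPTION: 8192 is used below because it matches the logged size.
constexpr std::uint64_t round_up(std::uint64_t bytes, std::uint64_t page_bytes) {
    return (bytes + page_bytes - 1) / page_bytes * page_bytes;
}

// Number of blocks needed to cover npts points with blk_size points per block.
constexpr std::uint64_t num_blocks(std::uint64_t npts, std::uint64_t blk_size) {
    return (npts + blk_size - 1) / blk_size;
}

// Whole-dataset buffer: matches the size in the tcmalloc message after rounding.
static_assert(full_buffer_bytes(214386453ULL, 1280ULL) == 1097658639360ULL, "n*d*4");
static_assert(round_up(1097658639360ULL, 8192ULL) == 1097658646528ULL, "logged size");

// ASSUMPTION: blk_size = 131072, inferred from "# blks: 1636".
static_assert(num_blocks(214386453ULL, 131072ULL) == 1636ULL, "block count");

// A buffer for one such block would need 640 MiB instead of about 1.1 TB.
static_assert(full_buffer_bytes(131072ULL, 1280ULL) == 671088640ULL, "per-block bytes");
```

If these assumptions hold, a per-block buffer would be smaller than the attempted allocation by a factor of roughly 1636.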
Your Environment
Ubuntu 22.04.1 LTS
DiskANN version 0.7.0
bioinsilico changed the title from "[BUG] DiskANN allocation of memory for the whole set of points" to "[BUG] DiskANN tries to allocate memory the whole set of points" on Dec 19, 2024
bioinsilico changed the title from "[BUG] DiskANN tries to allocate memory the whole set of points" to "[BUG] DiskANN tries to allocate memory for the whole set of points" on Dec 19, 2024
Expected Behavior
No memory allocation errors when building an index on a large dataset with cosine distance.
Actual Behavior
The index build crashes with std::bad_alloc when cosine distance is used on a large dataset, because the normalization step tries to allocate memory for all points at once.
Example Code
Dataset Description
Error
Additional Details
The problem seems to be in the utils function
void normalize_data_file(const std::string &inFileName, const std::string &outFileName)
DiskANN/src/utils.cpp
Line 118 in fc3c6e2
It tries to allocate memory for the whole dataset. If I had to guess, the allocation should be only for the block size, blk_size.
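To make the suggestion concrete, here is a minimal sketch of block-wise normalization in which the buffer is sized for blk_size vectors rather than for every point. This is not DiskANN's implementation or a proposed patch: normalize_file_blockwise is a hypothetical name, and the file layout (int32 point count, int32 dimension, then float32 values) is an assumption about the .bin format.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not DiskANN code: L2-normalize every vector in a file
// while holding at most blk_size vectors in memory.
// ASSUMED layout: int32 npts, int32 ndims, then npts * ndims float32 values.
void normalize_file_blockwise(const std::string &in_path, const std::string &out_path,
                              std::size_t blk_size) {
    assert(blk_size > 0);
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out) throw std::runtime_error("cannot open input or output file");

    // Copy the header through unchanged.
    std::int32_t npts = 0, ndims = 0;
    in.read(reinterpret_cast<char *>(&npts), sizeof(npts));
    in.read(reinterpret_cast<char *>(&ndims), sizeof(ndims));
    if (!in || npts < 0 || ndims <= 0) throw std::runtime_error("bad header");
    out.write(reinterpret_cast<const char *>(&npts), sizeof(npts));
    out.write(reinterpret_cast<const char *>(&ndims), sizeof(ndims));

    const std::size_t n = static_cast<std::size_t>(npts);
    const std::size_t d = static_cast<std::size_t>(ndims);

    // The buffer holds one block, never the whole dataset.
    std::vector<float> buf(std::min(blk_size, n) * d);

    for (std::size_t start = 0; start < n; start += blk_size) {
        const std::size_t count = std::min(blk_size, n - start);  // last block may be short
        const auto bytes = static_cast<std::streamsize>(count * d * sizeof(float));
        in.read(reinterpret_cast<char *>(buf.data()), bytes);
        if (!in) throw std::runtime_error("unexpected end of input");

        for (std::size_t i = 0; i < count; ++i) {
            float *v = buf.data() + i * d;
            double sq = 0.0;
            for (std::size_t j = 0; j < d; ++j) sq += static_cast<double>(v[j]) * v[j];
            const float norm = static_cast<float>(std::sqrt(sq));
            if (norm > 0.0f) {  // zero vectors are left unchanged
                for (std::size_t j = 0; j < d; ++j) v[j] /= norm;
            }
        }
        out.write(reinterpret_cast<const char *>(buf.data()), bytes);
    }
    if (!out) throw std::runtime_error("write failed");
}
```

With this shape, peak memory for the buffer is blk_size * ndims * 4 bytes regardless of how many points the file contains; the additional n*d*4 bytes mentioned in the log are still needed on disk for the normalized output file.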