README error #34

muammar · 2022-04-28T15:19:17Z

After you generate the vocabulary in the first step of the README,

python get_vocab.py --ncpu 16 < data/chembl/all.txt > vocab.txt

the next line should be:

python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single

Otherwise, you get the following error:

IndexError: tuple index out of range

The text was updated successfully, but these errors were encountered:

orubaba · 2022-04-30T15:23:27Z

I have a question. how long does it take for the training to conclude. I have been running the "python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single" for a whole day and it has not completed. Is there something I am doing wrongly?

muammar · 2022-05-02T12:48:08Z

I have a question. how long does it take for the training to conclude. I have been running the "python preprocess.py --train data/chembl/all.txt --vocab vocab.txt --ncpu 16 --mode single" for a whole day and it has not completed. Is there something I am doing wrongly?

That's not normal. It took a couple of hours. I had to change the number of CPUs used because it was killing the memory ram of my workstation, and I have 256 GB of RAM.

orubaba · 2022-05-03T05:21:03Z

wow. Thanks. I was relying on my 16 ram laptop to do the work. it seems that was an ambitious thought. Now I see why I wasn't getting any headway.

muammar · 2022-05-03T12:46:19Z

wow. Thanks. I was relying on my 16 ram laptop to do the work. it seems that was an ambitious thought. Now I see why I wasn't getting any headway.

The chembl dataset is huge, and I think the script is doing its stuff but keeping everything in memory. At some point, you will run out of RAM. There are libraries, like Dask, that could allow you to work with processes requiring a huge amount of RAM but you would need to implement it. If you read the preprocess.py script, you will realize they are doing a pickle.dump at the end of the preprocessing procedure, if you could find a way to write it before and not waiting until the end, you can clear the garbage collector and free memory.

orubaba · 2022-05-05T10:49:06Z

Thanks so much for the suggestion. I am trying to run the get_vocab.py code on a far reduced size of the chembl dataset but got this error - multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f6c291da0a0>'. Reason: 'PicklingError("Can't pickle <class 'Boost.Python.ArgumentError'>: import of module 'Boost.Python' failed" - i have checked online but i haven't worked it out.
kindly assist.

muammar · 2022-06-14T15:53:45Z

Thanks so much for the suggestion. I am trying to run the get_vocab.py code on a far reduced size of the chembl dataset but got this error - multiprocessing.pool.MaybeEncodingError: Error sending result: '<multiprocessing.pool.ExceptionWithTraceback object at 0x7f6c291da0a0>'. Reason: 'PicklingError("Can't pickle <class 'Boost.Python.ArgumentError'>: import of module 'Boost.Python' failed" - i have checked online but i haven't worked it out. kindly assist.

See #33

muammar · 2022-06-14T16:20:27Z

Forget about the message above. It is not using multiprocessing at all.

orubaba mentioned this issue Jun 14, 2022

IndexError: tuple index out of range #38

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README error #34

README error #34

muammar commented Apr 28, 2022 •

edited

Loading

orubaba commented Apr 30, 2022

muammar commented May 2, 2022

orubaba commented May 3, 2022

muammar commented May 3, 2022

orubaba commented May 5, 2022

muammar commented Jun 14, 2022

muammar commented Jun 14, 2022

README error #34

README error #34

Comments

muammar commented Apr 28, 2022 • edited Loading

orubaba commented Apr 30, 2022

muammar commented May 2, 2022

orubaba commented May 3, 2022

muammar commented May 3, 2022

orubaba commented May 5, 2022

muammar commented Jun 14, 2022

muammar commented Jun 14, 2022

muammar commented Apr 28, 2022 •

edited

Loading