Variable deletion consumes a lot of memory #17092
@gfyoung Sure, find it attached. Just to make it clear, I profile the memory usage with the memory profiler extension for Jupyter Notebooks.
Where is it stated that this actually does anything w.r.t. memory usage? It may release the memory, depending on whether the underlying data was a view or a copy.
You are much more likely, though, to release memory if you use the more idiomatic pattern of reassigning the result of `drop()` rather than deleting in place. This removes the top-level reference to the original frame. Note that none of this actually forces a garbage collection (and nothing will release the memory back to the OS).
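A minimal sketch of that reassignment pattern (the frame and column names here are made up for illustration):

```python
import gc

import pandas as pd

df = pd.DataFrame({"a": range(10), "b": range(10), "c": range(10)})

# Deleting in place (del df["a"]) mutates the existing frame, which keeps
# the original blocks referenced. Rebinding the name to the result of
# drop() instead removes the top-level reference to the original frame:
df = df.drop(columns=["a"])

# Even then, Python frees objects on its own schedule, and the allocator
# may never hand the freed pages back to the OS.
gc.collect()
```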
I know that. Anyway, this was not the topic of this conversation. Closing the issue does not help solve it, it is just sweeping the dirt under the mat... It would be better to read my main message. The problem is that there is no way of deleting variables in a big DataFrame without generating a huge memory peak, and this is a big problem, guys. In addition, regarding your comment @jreback, I do not have problems releasing memory; I have a highly unexpected memory peak. Best,
This is not going to be solved in pandas. Data of a single dtype is stored together in a block; creating a view on that block does not release the memory (and that is what you are doing). You can do this:
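One common way to do that (a sketch under the assumption that the idea is to copy the surviving columns into a fresh frame, not necessarily the exact snippet meant here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000, 20))
cols_to_remove = set(df.columns[:2])
keep = [c for c in df.columns if c not in cols_to_remove]

# copy() materializes new, smaller blocks instead of views into the old
# ones; rebinding df then drops the last reference to the original frame.
df = df[keep].copy()
```

Note that this still needs the old and the new blocks alive at the same time, which is exactly the transient peak the rest of the thread is about.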
Is there any update on this issue? So far two contradictory solutions have been proposed. What is the best way to delete a column without running out of memory?
We encountered the same issue, and just to reiterate, the problem is the huge memory peak during the column deletion itself.
@giangdaotr I've made a demo to show the cost of the different approaches. Personally I'm keen to know more, because reasoning about memory usage in pandas (and when/if you get a view or a copy) is pretty tricky; I'm measuring this with my own memory-profiling tooling.
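For reference, one way such a comparison can be measured (a sketch assuming the memory_profiler package; the frame size and dropped columns are arbitrary):

```python
import numpy as np
import pandas as pd
from memory_profiler import memory_usage  # pip install memory_profiler


def drop_some_columns():
    df = pd.DataFrame(np.random.rand(1_000_000, 20))
    df = df.drop(columns=df.columns[:5])
    return df.shape


# Sample the process RSS while the function runs and report the peak in MiB.
peak = memory_usage(drop_some_columns, interval=0.1, max_usage=True)
print("peak RSS during drop:", peak)
```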
Hi team,
I have been having issues with pandas memory management. Specifically, there is an (at least for me) unavoidable memory peak that occurs when attempting to remove variables from a data set. This should be (almost) free: I am getting rid of part of the data, yet pandas still needs to allocate a large amount of memory, producing MemoryErrors.
Just to give you a little bit of context, I am working with a DataFrame that contains 33M rows and 500 columns (just a big one!), almost all of them numeric, on a machine with 360GB of RAM. The whole data set fits in memory and I can successfully apply some transformations to the variables. The problem comes when I need to drop 10% of the columns in the table. It just produces a big memory peak leading to a `MemoryError`. Before performing this operation, there are more than 80GB of memory available! I tried the following methods for removing the columns, and all of them failed:
- `drop()` with or without the `inplace` parameter
- `pop()`
- `reindex()`
- `reindex_axis()`
- `del df[column]` in a loop over the columns to be removed
- `__delitem__(column)` in a loop over the columns to be removed
- `pop()` and `drop()` in a loop over the columns to be removed
- `.loc()` and `.iloc()`, but it does not help

I found that the drop method with inplace is the most efficient one, but it still generates a huge peak.
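For concreteness, a sketch of a few of the attempts listed above on a small, made-up frame (each variant works on its own copy so the snippet runs end to end; on the real 33M x 500 data every one of them produced the same large peak):

```python
import numpy as np
import pandas as pd

# Small stand-in for the real 33M-row, 500-column frame.
df = pd.DataFrame(np.random.rand(1_000, 500),
                  columns=[f"c{i}" for i in range(500)])
cols_to_remove = [f"c{i}" for i in range(50)]

d1 = df.copy()
d1.drop(columns=cols_to_remove, inplace=True)   # drop() with inplace

d2 = df.copy().drop(columns=cols_to_remove)     # drop() without inplace

d3 = df.copy()
for col in cols_to_remove:                      # del df[column] in a loop
    del d3[col]

d4 = df.copy()
for col in cols_to_remove:                      # pop() in a loop
    d4.pop(col)

d5 = df.reindex(columns=[c for c in df.columns  # reindex()
                         if c not in set(cols_to_remove)])
```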
I would like to discuss whether there is any way of implementing (or whether it is already implemented, by any chance) a method for removing variables more efficiently, without generating this extra memory consumption...
Thank you
Iván