!!!! erroneous table mutation when adding a large column !!!! #86
Comments
That's odd. Can you test a similar thing with just DataFrames? Since adding the new column is a DataFrames-specific operation, GeoDataFrames no longer plays a role there.
The shared file has been deleted, so I don't know what […]. Worst case, you generate a lot of memory pressure with your vector of vectors, and something is garbage collected. Then again, you also say deepcopy doesn't work, so something else is happening (or deepcopy on DataFrames is not correctly implemented).
RiverIDTrace is a column of Vector{Int}, yes. The values inside were randomly overwritten with zeros, with seemingly no correlation to row number. (I don't currently have access to the file but saw the bug being reproduced.)
@evetion @asinghvi17 please see the updated path to the file. I am working on a DataFrames-only replication of the issue but have not succeeded yet. I will keep working at it.
I tried a similar thing in pure DataFrames, and it does not pose an issue there. MWE:
using DataFrames
df = DataFrame();
int_data = [rand(Int, rand(1:1_000)) for i in 1:215_000]
df.col1 = deepcopy(int_data)
df.col2 = [zeros(1000000) for i in 1:size(df, 1)]
all(df.col1 .== int_data) # true
https://github.com/yeesian/ArchGDAL.jl/blob/a322ce6eb8a811b6ec053608c95c385464214d92/src/ogr/feature.jl#L345 looks to be where int arrays are moved from GDAL to Julia. It looks like this is an […]
Good catch! That's the culprit, and kudos to @alex-s-gardner for actually spotting it in real life (sorry for that). But you shouldn't […]
So the fix would be to at least […]
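For anyone following along, here is a minimal sketch of why an array that wraps a foreign pointer is fragile, and how taking a Julia-owned copy right away avoids the problem. It assumes a buffer from `Libc.malloc` as a stand-in for memory owned by GDAL; the buffer, the variable names, and the overwrite step are illustrative only, and the actual change in ArchGDAL.jl may look different.

```julia
# Stand-in for a buffer owned by a C library (assumption: GDAL behaves similarly).
n = 5
ptr = Ptr{Cint}(Libc.malloc(n * sizeof(Cint)))
for i in 1:n
    unsafe_store!(ptr, Cint(i), i)
end

# Aliases the foreign memory; no data is copied and Julia does not own it.
view_only = unsafe_wrap(Array, ptr, n; own = false)

# Julia-owned copy, taken while the buffer still holds the original values.
owned = copy(view_only)

# Simulate the library reusing its buffer for something else.
for i in 1:n
    unsafe_store!(ptr, Cint(0), i)
end

view_after = copy(view_only)  # what the aliasing array shows now (all zeros)
Libc.free(ptr)                # view_only must not be read after this point

println(view_after)
println(owned)
```

The aliasing array follows whatever the library does with its buffer, while the copy keeps the values it had at the time `copy` was called.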
Ah, I missed that we were wrapping the pointer returned directly. Yeah, in that case […]. Just for my satisfaction, and to document this:
julia> A = rand(10)
10-element Vector{Float64}:
0.8487556697809062
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Ap = pointer(A)
Ptr{Float64} @0x00000001ccca4f70
julia> Au = unsafe_wrap(Vector{Float64}, Ap, size(A))
10-element Vector{Float64}:
0.8487556697809062
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Auc = copy(Au)
10-element Vector{Float64}:
0.8487556697809062
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Auc[1] = 1
1
julia> A
10-element Vector{Float64}:
0.8487556697809062
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Au
10-element Vector{Float64}:
0.8487556697809062
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Auc
10-element Vector{Float64}:
1.0
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> Au[1] = 1
1
julia> Au
10-element Vector{Float64}:
1.0
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
julia> A
10-element Vector{Float64}:
1.0
0.4530489648028254
0.7915015228101486
0.5249905434671536
0.011043884362292533
0.8092927336663542
0.2807079717859139
0.5462812200563412
0.7293837731721518
0.8515677666121682
So just for my own edification, why did deepcopy not prevent this issue?
It could be that GDAL overwrote the memory before / during the […]
Fixed with yeesian/ArchGDAL.jl#442
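One way to picture that timing explanation is the sketch below. It is only an illustration under stated assumptions: a `Libc.malloc` buffer plays the role of GDAL-owned memory, and the overwrite is simulated by hand. It does not reproduce what GDAL actually did in the reported case.

```julia
# Stand-in for library-owned memory holding one row of a Vector{Int} column.
n = 4
ptr = Ptr{Int}(Libc.malloc(n * sizeof(Int)))
for i in 1:n
    unsafe_store!(ptr, i, i)
end

wrapped = unsafe_wrap(Array, ptr, n; own = false)
column = [wrapped]  # a vector of vectors, like the RiverIDTrace column

# Simulated overwrite by the library BEFORE the user gets to call deepcopy.
for i in 1:n
    unsafe_store!(ptr, 0, i)
end

# deepcopy copies whatever the memory holds at the time of the call,
# so it preserves the already overwritten values.
late = deepcopy(column)
println(late[1])

Libc.free(ptr)
```

In this reading, `deepcopy` is not at fault: it can only protect data that is still intact when it runs.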
This one caught me off guard. Large tables seem to be unsafe when manipulating this example geo parquet file (using GeoDataFrames v0.3.10 with Julia v"1.11.1"):
https://drive.google.com/file/d/1FJUbk_Smj3VoMhGeR790AtEEaEwZPiFY/view?usp=sharing
In this case, adding a new column with 'vector length = 100' does not modify existing columns.
Adding a large vector, 'vector length = 1000000', DOES MODIFY existing columns.
Adding deepcopy fixes the problem in this instance, but after more testing, deepcopy does not work in all cases.
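A rough way to check for this kind of silent mutation is to record a hash of every column before adding the new one and compare afterwards. The sketch below uses a plain `Dict` of columns so that it runs without any packages; `column_hashes` and `mutated_columns` are hypothetical helper names, and the corruption is simulated by hand.

```julia
# Hypothetical helper: one content-based hash per column.
column_hashes(cols) = Dict(name => hash(col) for (name, col) in pairs(cols))

# Hypothetical helper: names of previously recorded columns whose hash changed.
function mutated_columns(before, cols)
    after = column_hashes(cols)
    return sort!([name for (name, h) in before if after[name] != h])
end

table = Dict(:id => [1, 2, 3], :trace => [[1, 2], [3], [4, 5, 6]])
before = column_hashes(table)

# Adding a column should leave the existing columns untouched.
table[:big] = [zeros(10) for _ in 1:3]
clean = mutated_columns(before, table)

# Simulate the reported corruption: a value inside :trace becomes zero.
table[:trace][2][1] = 0
corrupted = mutated_columns(before, table)

println(clean)
println(corrupted)
```

With a DataFrame, `pairs(eachcol(df))` yields the same kind of name and column pairs, so the helpers should apply unchanged. The hashes must be taken while the data is still intact, otherwise the corruption is baked into the baseline.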