Reading in frames from an IOBuffer #46
Comments
The I/O is actually all done in C code in https://github.com/libAtoms/ExtXYZ.jl/blob/master/src/fileio.jl#L16 From a quick look, however, there is no
As a workaround you could create a ramdisk and do the I/O there, although that might not be an option on machines where you are not root.
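A rough sketch of the ramdisk idea without needing root: many Linux systems expose a user-writable tmpfs at `/dev/shm`, so the temporary file can live in memory rather than on disk. The helper name `with_ram_tempfile` and the `/dev/shm` default are assumptions for illustration and are not part of ExtXYZ.jl. The helper falls back to the ordinary temp directory when `/dev/shm` is missing.

```julia
# Hypothetical helper (not part of ExtXYZ.jl): write `contents` to a temporary
# file, preferably on a RAM-backed filesystem, call `f(path)`, then clean up.
function with_ram_tempfile(f, contents::AbstractString; dir::AbstractString="/dev/shm")
    # Assumption: /dev/shm is a user-writable tmpfs, as on most Linux systems.
    # Fall back to the default temp directory when it does not exist.
    parent = isdir(dir) ? dir : tempdir()
    path, io = mktemp(parent)
    try
        write(io, contents)
        close(io)          # flush before handing the path to the reader
        return f(path)
    finally
        isopen(io) && close(io)
        rm(path; force=true)
    end
end
```

With ExtXYZ.jl loaded, usage would look something like `with_ram_tempfile(read_frame, xyz_str)`, which still goes through the file-based reader but should avoid touching a physical disk when the tmpfs is available.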
Alternatively there is a pure Python implementation of A pure Julia parser based on the ExtXYZ Lark grammar would be possible using Lerch.jl, but this is a much bigger undertaking.
As far as I can tell, there's no way of associating a file descriptor with an A ramdisk would be usable, but yeah this is mostly problematic on networked nodes where I don't have root access. The performance hit is manageable on my laptop where all IO is done on an NVMe drive, but anything slower is a bit painful. The Python implementation could be workable, I can have a look at that.
On second thoughts, the Python version would require a bit more modification, since it doesn't currently appear to have an option to return the frames as
From memory I think the dict version is there too, just one level deeper: read_frame_dicts() perhaps.
Ah yep, got it. I'd seen that function but didn't realise it was extracting the data in the same way. It would require a bit of reorganisation on my end to get the
Something like this does the trick:

```julia
using PyCall
pyextxyz = pyimport("extxyz")

function xyz_to_frame(xyz_str::String)
    iob = IOBuffer(xyz_str)
    na, info, arrays, _ = pyextxyz.extxyz.read_frame_dicts(split(String(take!(iob)), '\n'), use_regex=false)
    close(iob)
    info = Dict{String, Any}(key => val for (key, val) in info)
    if na == 1
        arrays = Dict{String, Any}("species" => [arrays.item()[1]], "pos" => reduce(hcat, [arrays.item()[2]]))
    else
        arrays = Dict{String, Any}("species" => [a[1] for a in arrays], "pos" => reduce(hcat, [a[2] for a in arrays]))
    end
    frame = Dict{String, Any}("N_atoms" => na, "info" => info, "arrays" => arrays)
    return frame
end
```

There were a few tricky hurdles: for my input string-form XYZs I had to use

```julia
# ExtXYZ.jl implementation
frame = read_frame(xyz_file)
# Python extxyz extension
my_frame = xyz_to_frame(xyz_str)
@assert frame == my_frame
```

I haven't done any formal benchmarking to see the performance hit/gain yet, but anecdotally loading in a large chemical reaction network's worth of XYZs was much speedier than before. I have a lot of points in my workflow that require geometries to be passed back and forth between ExtXYZ form and OpenBabel (through PyCall), so I also implemented the reverse case (molecules loaded in as ExtXYZ frames with

```julia
function frame_to_xyz(frame::Dict{String, Any})
    na = frame["N_atoms"]
    s = "$na\n"
    comment = join(["$key=$value" for (key, value) in frame["info"]], " ")
    s *= "$comment\n"
    for i in 1:na
        s *= "$(frame["arrays"]["species"][i]) $(frame["arrays"]["pos"][1, i]) $(frame["arrays"]["pos"][2, i]) $(frame["arrays"]["pos"][3, i])\n"
    end
    return s
end

pbmol = pybel.readstring("xyz", frame_to_xyz(my_frame))
```

This may also be contributing a lot to the potential speedup. I'll mark this as closed here, but if you're interested in the potential speedup then I can have a go at benchmarking the difference more thoroughly later in the week. Thanks!
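As a possible refinement of the frame-to-string direction, the same output can be produced by writing into one buffer with `sprint` rather than growing a string with `*=`, which allocates a new string on every iteration. This is only a sketch: the name `frame_to_xyz_buffered` is hypothetical, and it assumes the same frame layout as `frame_to_xyz` (an `"N_atoms"` count, an `"info"` dictionary, and a 3×N `"pos"` matrix under `"arrays"`). Whether it makes a measurable difference for typical frame sizes has not been tested here.

```julia
# Hypothetical variant of frame_to_xyz that writes into a single buffer.
function frame_to_xyz_buffered(frame::AbstractDict)
    na = frame["N_atoms"]
    species = frame["arrays"]["species"]
    pos = frame["arrays"]["pos"]  # assumed to be a 3×N matrix
    return sprint() do io
        println(io, na)
        # Comment line: space-separated key=value pairs from the info dict.
        println(io, join(("$key=$value" for (key, value) in frame["info"]), " "))
        for i in 1:na
            println(io, species[i], " ", pos[1, i], " ", pos[2, i], " ", pos[3, i])
        end
    end
end
```

The result can be passed to `pybel.readstring("xyz", ...)` in the same way as the string returned by `frame_to_xyz`.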
I have some code that generates XYZ geometries, and I need these to be converted into ExtXYZ frames. I can write these geometries to temporary files and read them back in as frames from those files, but this becomes incredibly taxing on disk IO when passing around thousands of XYZs, which I unfortunately have to do quite regularly.
I can write each geometry to a string, but I can't find a way to then read this string back in as a frame, since `read_frames` only works with a string for the path to the XYZ file, or with an `IOStream` to a file.
Is there some way of extending the current `read_frames` implementation to work with `IOBuffer`s? That way I could write the string-form XYZ directly to an `IOBuffer` and read it back in as a frame entirely within memory, which would be much faster.
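To make the request concrete, here is a minimal sketch of reading a frame from any `IO`, including an `IOBuffer`, in pure Julia. It is not ExtXYZ.jl API: `read_plain_xyz` is a hypothetical name, it handles only plain XYZ (species plus three coordinates per atom), and it stores the comment line verbatim under an assumed `"comment"` key instead of parsing extended-XYZ `key=value` pairs. The returned dictionary mirrors the `"N_atoms"` / `"info"` / `"arrays"` layout used elsewhere in this thread.

```julia
# Hypothetical sketch, not ExtXYZ.jl API: read one plain-XYZ frame from any IO.
function read_plain_xyz(io::IO)
    na = parse(Int, strip(readline(io)))   # first line: atom count
    comment = readline(io)                 # second line: free-form comment
    species = Vector{String}(undef, na)
    pos = Matrix{Float64}(undef, 3, na)    # one column per atom
    for i in 1:na
        fields = split(readline(io))
        length(fields) >= 4 || error("atom line $i has fewer than 4 fields")
        species[i] = fields[1]
        for k in 1:3
            pos[k, i] = parse(Float64, fields[k + 1])
        end
    end
    # The comment is kept as-is; extended-XYZ parsing is out of scope here.
    return Dict{String, Any}(
        "N_atoms" => na,
        "info" => Dict{String, Any}("comment" => comment),
        "arrays" => Dict{String, Any}("species" => species, "pos" => pos),
    )
end
```

Calling `read_plain_xyz(IOBuffer(xyz_str))` keeps the whole round trip in memory, which is the behaviour being asked for; supporting the full extended format would need considerably more parsing than shown.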