our XYZ parser read a box even if a box is not there #999

carlocamilloni · 2023-12-01T01:15:18Z

@GiovanniBussi @gtribello @maxbonomi @Iximiel
In many of our .xyz trajectories in the regtest we do not have a box in the second line but something like:
"generated by VMD" 3 words that are then converted to numbers 👍
now I modified driver.cpp to use tools:convert but I think we should handle it better.
cf. ea418fe in #990

The text was updated successfully, but these errors were encountered:

Iximiel · 2023-12-01T07:49:41Z

Could be better to set up a regex like ^(%f %f %f|%f %f %f %f %f %f %f %f %f) instead of counting the words? so that "generated by VMD" became a "no box" instead of an error?

And set up a make-style test that tries a complete set of combinations right and wrong to check that (to future-proof the thing)?

the "%f" in the example should be something like [0-9]+(\.[0-9]*|) but correct

GiovanniBussi · 2023-12-01T08:23:54Z

Setting up a regex is tricky for numbers in scientific notation (1e-2).

One option would be to check the value returned by sscanf. It it's different from 3 (or 9), we call plumed_error(). However, I think I slightly prefer @carlocamilloni 's solution because it allows to pass box elements such as 3.2*sin(pi/3) which might help describing non orthorombic boxes.

Finally, we might even decide to allow some special string. E.g., we could say generated by VMD means "no box", and check for these string before trying to interpret the numbers. Would this be better? But in Carlo's commit I see other strings (e.g. Atoms. Timestep: 0). I am not sure it makes sense to generalize to all these options.

We might even look for a standardized syntax (such as "extended XYZ", see e.g. here). If we do this I would keep the backward compatibility with 3 (or 9) numbers. I am not sure it's worth, it depends on how many software can handle this. VMD cannot apparently.

carlocamilloni · 2023-12-01T08:31:04Z

At the moment I am fixing the few broken regtest and merging the pull request to reenable the intel CI. I think it would be better to rely on external libraries to read file (xyz, gro, etc), ideally shipping the code with plumed, and remove our own implementations. BTW we will need to implement an mmCIF reader soon

https://github.com/PDB-REDO/libcifpp
https://mmcif.wwpdb.org/docs/sw-examples/cpp/html/index.html

GiovanniBussi · 2023-12-01T08:36:10Z

The problem with xyz is that, if there's such a library, it would break our long-running convention of using box vectors in the second line.

For other formats I totally agree. Implementing the gro reader was not trivial, I had to read the gromacs code to do it properly (I hope it's correct). It would be nice to have mmCIF I agree.

carlocamilloni · 2023-12-01T08:43:55Z

if we find a stable library I would go for breaking our convention because this bug was tricky to detect, it must have been around for 8 years or so.

I have found this:
http://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats

fixes #976 related to #999 * test intel CI * switch to the new default intel compilers * fix for CC * decrease accuracy of rc-cs2backbone regtest * reduced precision of ves regtests * Revert "reduced precision of ves regtests" This reverts commit 272ff40. * decreased accuracy of ermsd derivates btw how are they so noisy? * Revert "decreased accuracy of ermsd derivates" This reverts commit 8153e68. * reduced precision of ermsd derivatives I am afraid that they are almost useless now * fix regtest comparison with +-nan * ifx is not found in intel CI revert to ifort * improve nan comparison fortran intel compiler is not yet well configured * more ifx test * updated intel compiler installation * now it intel compilers should work properly * simplified the nan comparison for the regtests * ves regtests: reduce by one more (#995) (#997) Co-authored-by: Omar Valsson <[email protected]> * further reduced precision ves regtest * more ves * reset * reset * there is an issue with many xyz file in our regtests because they do not define a box but the parser was going on without errors. now this is fixed but I think we could do it better * work around for missing boxes in xyz file of multiple regtests * astyle --------- Co-authored-by: Omar Valsson <[email protected]> Co-authored-by: Giovanni Bussi <[email protected]>

GiovanniBussi · 2023-12-01T10:52:36Z

I think the bug was not in the convention, but in the implementation. If we assume that 3 numbers represent the box, the mistake is not checking that they are numbers...

carlocamilloni · 2023-12-01T16:12:13Z

Yea I agree, anyway that chemfiles project may be a very good option

carlocamilloni added the bug label Dec 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

our XYZ parser read a box even if a box is not there #999

our XYZ parser read a box even if a box is not there #999

carlocamilloni commented Dec 1, 2023

Iximiel commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023

our XYZ parser read a box even if a box is not there #999

our XYZ parser read a box even if a box is not there #999

Comments

carlocamilloni commented Dec 1, 2023

Iximiel commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023

GiovanniBussi commented Dec 1, 2023

carlocamilloni commented Dec 1, 2023