Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

our XYZ parser read a box even if a box is not there #999

Open
carlocamilloni opened this issue Dec 1, 2023 · 7 comments
Open

our XYZ parser read a box even if a box is not there #999

carlocamilloni opened this issue Dec 1, 2023 · 7 comments
Labels

Comments

@carlocamilloni
Copy link
Member

@GiovanniBussi @gtribello @maxbonomi @Iximiel
In many of our .xyz trajectories in the regtest we do not have a box in the second line but something like:
"generated by VMD" 3 words that are then converted to numbers 👍
now I modified driver.cpp to use tools:convert but I think we should handle it better.
cf. ea418fe in #990

@Iximiel
Copy link
Member

Iximiel commented Dec 1, 2023

Could be better to set up a regex like ^(%f %f %f|%f %f %f %f %f %f %f %f %f) instead of counting the words? so that "generated by VMD" became a "no box" instead of an error?

And set up a make-style test that tries a complete set of combinations right and wrong to check that (to future-proof the thing)?

the "%f" in the example should be something like [0-9]+(\.[0-9]*|) but correct

@GiovanniBussi
Copy link
Member

Setting up a regex is tricky for numbers in scientific notation (1e-2).

One option would be to check the value returned by sscanf. It it's different from 3 (or 9), we call plumed_error(). However, I think I slightly prefer @carlocamilloni 's solution because it allows to pass box elements such as 3.2*sin(pi/3) which might help describing non orthorombic boxes.

Finally, we might even decide to allow some special string. E.g., we could say generated by VMD means "no box", and check for these string before trying to interpret the numbers. Would this be better? But in Carlo's commit I see other strings (e.g. Atoms. Timestep: 0). I am not sure it makes sense to generalize to all these options.

We might even look for a standardized syntax (such as "extended XYZ", see e.g. here). If we do this I would keep the backward compatibility with 3 (or 9) numbers. I am not sure it's worth, it depends on how many software can handle this. VMD cannot apparently.

@carlocamilloni
Copy link
Member Author

At the moment I am fixing the few broken regtest and merging the pull request to reenable the intel CI. I think it would be better to rely on external libraries to read file (xyz, gro, etc), ideally shipping the code with plumed, and remove our own implementations. BTW we will need to implement an mmCIF reader soon

https://github.com/PDB-REDO/libcifpp
https://mmcif.wwpdb.org/docs/sw-examples/cpp/html/index.html

@GiovanniBussi
Copy link
Member

The problem with xyz is that, if there's such a library, it would break our long-running convention of using box vectors in the second line.

For other formats I totally agree. Implementing the gro reader was not trivial, I had to read the gromacs code to do it properly (I hope it's correct). It would be nice to have mmCIF I agree.

@carlocamilloni
Copy link
Member Author

if we find a stable library I would go for breaking our convention because this bug was tricky to detect, it must have been around for 8 years or so.

I have found this:
http://chemfiles.org/chemfiles/latest/formats.html#list-of-supported-formats

carlocamilloni added a commit that referenced this issue Dec 1, 2023
fixes #976
related to #999

* test intel CI

* switch to the new default intel compilers

* fix for CC

* decrease accuracy of rc-cs2backbone regtest

* reduced precision of ves regtests

* Revert "reduced precision of ves regtests"

This reverts commit 272ff40.

* decreased accuracy of ermsd derivates

btw how are they so noisy?

* Revert "decreased accuracy of ermsd derivates"

This reverts commit 8153e68.

* reduced precision of ermsd derivatives

I am afraid that they are almost useless  now

* fix regtest comparison
with +-nan

* ifx is not found in intel CI revert to ifort

* improve nan comparison
fortran intel compiler is not yet well configured

* more ifx test

* updated intel compiler installation

* now it intel compilers should work properly

* simplified the nan comparison for the regtests

* ves regtests: reduce by one more (#995) (#997)

Co-authored-by: Omar Valsson <[email protected]>

* further reduced precision ves regtest

* more ves

* reset

* reset

* there is an issue with many xyz file in our regtests
because they do not define a box but the parser was
going on without errors.

now this is fixed but I think we could do it better

* work around for missing boxes in xyz file of
multiple regtests

* astyle

---------

Co-authored-by: Omar Valsson <[email protected]>
Co-authored-by: Giovanni Bussi <[email protected]>
@GiovanniBussi
Copy link
Member

I think the bug was not in the convention, but in the implementation. If we assume that 3 numbers represent the box, the mistake is not checking that they are numbers...

@carlocamilloni
Copy link
Member Author

Yea I agree, anyway that chemfiles project may be a very good option

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants