Strange bug with enumeration of rock-salt system #85
Thanks Shyue, I'll take a look at this. I'm fixing something else in enumlib
right now, so it will be a bit before I can get to this.
In the meantime, I'm curious why you didn't use a two-atom cell with a
32-atom supercell to do this problem (a la example 5 in the EXAMPLES file
in the enumlib repo). Is this functionality not a better way to achieve
what you are after? If not, I'd love your feedback. It's a *design* or *UI*
question...
Gus Hart
Professor, Physics and Astronomy
http://msg.byu.edu
Dean, Physical and Mathematical Sciences
http://cpms.byu.edu
On Mon, Jul 8, 2019 at 9:11 AM Shyue Ping Ong wrote:
I am trying to run a simple enumeration of a rocksalt-type system to
create 1 vacancy in a 2x2x2 supercell of NaCl (I am aware that this is a
trivial problem that does not require enumlib whatsoever, given that all
atoms are symmetrically equivalent in this system. But it is part of an
automated generation of structures at different concentrations).
I am encountering a strange bug. If I create a vacancy on the Na site, the
generated input file is
Na31 Cl32
bulk
8.501033 0.000000 0.000000
0.000000 8.501033 0.000000
0.000000 0.000000 8.501033
2
32
0.000000 0.000000 0.000000 0/1
0.000000 0.000000 4.250517 0/1
0.000000 4.250517 0.000000 0/1
0.000000 4.250517 4.250517 0/1
4.250517 0.000000 0.000000 0/1
4.250517 0.000000 4.250517 0/1
4.250517 4.250517 0.000000 0/1
4.250517 4.250517 4.250517 0/1
2.125258 2.125258 0.000000 0/1
2.125258 2.125258 4.250517 0/1
2.125258 6.375775 0.000000 0/1
2.125258 6.375775 4.250517 0/1
6.375775 2.125258 0.000000 0/1
6.375775 2.125258 4.250517 0/1
6.375775 6.375775 0.000000 0/1
6.375775 6.375775 4.250517 0/1
2.125258 0.000000 2.125258 0/1
2.125258 0.000000 6.375775 0/1
2.125258 4.250517 2.125258 0/1
2.125258 4.250517 6.375775 0/1
6.375775 0.000000 2.125258 0/1
6.375775 0.000000 6.375775 0/1
6.375775 4.250517 2.125258 0/1
6.375775 4.250517 6.375775 0/1
0.000000 2.125258 2.125258 0/1
0.000000 2.125258 6.375775 0/1
0.000000 6.375775 2.125258 0/1
0.000000 6.375775 6.375775 0/1
4.250517 2.125258 2.125258 0/1
4.250517 2.125258 6.375775 0/1
4.250517 6.375775 2.125258 0/1
4.250517 6.375775 6.375775 0/1
1 1
0.001
partial
310 310 320
10 10 320
and the enum.x call will fail with the error message "At line 152 of file
../aux_src/makeStr.f90 (unit = 11, file = 'struct_enum.out')
Fortran runtime error: End of file"
But if the vacancy is on the Cl site, the input file is
Na32 Cl31
bulk
8.672776 0.000000 0.000000
0.000000 8.672776 0.000000
0.000000 0.000000 8.672776
2
32
0.000000 0.000000 2.168194 0/1
0.000000 0.000000 6.504582 0/1
0.000000 4.336388 2.168194 0/1
0.000000 4.336388 6.504582 0/1
4.336388 0.000000 2.168194 0/1
4.336388 0.000000 6.504582 0/1
4.336388 4.336388 2.168194 0/1
4.336388 4.336388 6.504582 0/1
2.168194 0.000000 0.000000 0/1
2.168194 0.000000 4.336388 0/1
2.168194 4.336388 0.000000 0/1
2.168194 4.336388 4.336388 0/1
6.504582 0.000000 0.000000 0/1
6.504582 0.000000 4.336388 0/1
6.504582 4.336388 0.000000 0/1
6.504582 4.336388 4.336388 0/1
0.000000 2.168194 0.000000 0/1
0.000000 2.168194 4.336388 0/1
0.000000 6.504582 0.000000 0/1
0.000000 6.504582 4.336388 0/1
4.336388 2.168194 0.000000 0/1
4.336388 2.168194 4.336388 0/1
4.336388 6.504582 0.000000 0/1
4.336388 6.504582 4.336388 0/1
2.168194 2.168194 2.168194 0/1
2.168194 2.168194 6.504582 0/1
2.168194 6.504582 2.168194 0/1
2.168194 6.504582 6.504582 0/1
6.504582 2.168194 2.168194 0/1
6.504582 2.168194 6.504582 0/1
6.504582 6.504582 2.168194 0/1
6.504582 6.504582 6.504582 0/1
1 1
0.001
partial
310 310 320
10 10 320
and enum.x yields the correct number of enumerated structures (1).
Given that NaCl is just two intersecting fcc lattices, I see no reason why
a Na vacancy would fail while the Cl vacancy input file would succeed. My
eyeballs confirm that the two input files are essentially identical except
for the expected (0, 0, 0.5) direct coordinates shift.
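The claim that the two site lists differ only by a rigid shift can be checked mechanically rather than by eye. The following is a minimal sketch, assuming coordinates are expressed in integer units of half the conventional lattice parameter (so the 2x2x2 supercell edge is 4 units). The helper names are illustrative and are not part of enumlib or pymatgen.

```python
import itertools

N = 4  # edge of the 2x2x2 supercell in units of a/2 (a = conventional cell edge)

def fcc_sublattice(parity):
    """Sites (x, y, z) on the integer grid whose coordinate sum has the given parity."""
    return {p for p in itertools.product(range(N), repeat=3) if sum(p) % 2 == parity}

def translate(sites, shift):
    """Rigidly shift every site, wrapping around the periodic supercell."""
    return {tuple((c + s) % N for c, s in zip(p, shift)) for p in sites}

first_file_sites = fcc_sublattice(0)   # contains (0, 0, 0), like the first input file
second_file_sites = fcc_sublattice(1)  # contains (0, 0, 1), like the second input file

# One grid unit along z is a/2, i.e. 0.5 in direct coordinates of the conventional cell.
shift_matches = translate(first_file_sites, (0, 0, 1)) == second_file_sites
print(len(first_file_sites), len(second_file_sites), shift_matches)
```

Both sublattices have 32 sites and map onto each other under the shift, which supports the expectation that the two enumerations should behave identically.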
Thanks Gus. Indeed, Example 5 is a better way to achieve what I am after if I am working purely within enumlib. In this case, I am using pymatgen to generate the enumlib input file, and I find it easier to generate the supercell first and use that supercell as input for the enumeration. In general, I manipulate the structure input files within pymatgen to generate partial occupancies at different concentrations and supercell sizes, and use enumlib purely for running the enumeration, rather than having to generate the additional input files needed for Example 5.
Is it more convenient to generate the supercell instead of using the parent
because of pymatgen? Or is it a shortcoming of the enumlib design? Is there
some way I could make it easier for enumlib to interface with pymatgen?
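It is slightly more convenient. We still have to supply a supercell transformation to pymatgen. Admittedly, this is partly because pymatgen's interface to enumlib was written quite a long time ago (against one of the earliest versions of enumlib, when it first came out), so it may be out of step with new enumlib functionality. I do have some suggestions on enumlib design which I think would make it easier to use. Here they are in no particular order:
i. I believe the struct_enum.in file format is based loosely on the VASP POSCAR format. Honestly, I have never found the VASP file formats particularly intuitive. Without reading a manual, it is impossible to know, for example, that the first line is a comment line, the next three lines are the lattice vectors, etc. I would recommend that enumlib move to a more structured input file format based on key-value pairs, e.g., something like a YAML format. Just to give a sense of what it may look like, an example might be as follows: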
Comment: 2D square fixed cell
Lattice vectors: [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]
nSpecies: 4
nLatticePoints: 2
Species:
- ["Fe", "Mn"]
- ["O", "S"]
Coordinates:
- [0, 0, 0]
- [0, 0, 0]
MinCellSize: 8
MaxCellSize: 8
FinitePrecisionParameter: .0001
Mode: Full
ConcentrationRanges:
Fe: [1/16, 1/16]
Mn: [7/16, 7/16]
O: [1/16, 1/16]
  S: [7/16, 7/16]
It is more verbose, but has the advantage of being more explicit about what each parameter means and less error-prone (it is less likely that I change nSpecies when I mean to change nLatticePoints, for example). Also, you have the extensive library support of YAML parsers for all different languages, with all the proper validation. Error checking is also easy, e.g., validating that the number of species specified is correct. Adding new parameters is simply adding another key-value pair. Also, there is no issue with people swapping the order of the key-value pairs in the input file. All in all, it is a lot more flexible than line-by-line parsed text.
ii. For a fixed cell, it would be useful if someone could supply that as part of the same struct_enum.in file. In the YAML format suggested, it is straightforward to add another key called FixedSupercellTransformation. That way, users don't need to worry about multiple input files.
iii. Some of the parameters seem redundant. E.g., does the number of species need to be specified?
iv. Some useful defaults for parameters. E.g., if someone does not specify the FinitePrecisionParameter in the input file, a sensible value is chosen, a la how the VASP INCAR file works.
v. It would be useful to be able to explicitly identify the species, e.g., in the format above, I deliberately wrote Fe/Mn and O/S. This would allow makestr.x to generate the vasp.0000x files in the new VASP format where the species are explicitly identified.
vi. enumlib right now will basically just run regardless of whether the input makes sense. There are enumerations that can take O(life of the universe) to compute, but enumlib will not fail in those instances. Some timeout parameter would be useful.
vii. Finally, in a really ideal world (and I recognize this one is probably too much work), there would be a Python library version of enumlib where the Fortran code is simply wrapped as Python extensions. For example, scipy does a lot of things in Fortran but the main API is Python based. That would allow something like pymatgen to simply call the Python interface. Right now, pymatgen generates an input file and then makes command-line calls to enum.x and makestr.x to generate the enumerations and read in the resulting files. This is probably far from efficient, but it is the best that we can do at this juncture.
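Until enumlib has a timeout parameter of its own, a caller that shells out to it can enforce one from the outside. The following is a minimal sketch of that pattern using Python's standard subprocess module. The function name is illustrative, and the idea that a real caller would pass ["enum.x"] with a working directory containing struct_enum.in is an assumption about the local setup, not something enumlib or pymatgen prescribes.

```python
import subprocess
import sys

def run_with_timeout(cmd, workdir=".", timeout_s=60.0):
    """Run cmd in workdir; return its stdout, or None if the time budget is exceeded."""
    try:
        result = subprocess.run(
            cmd, cwd=workdir, capture_output=True, text=True,
            timeout=timeout_s, check=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None
    return result.stdout

# Stand-in commands so the sketch runs anywhere; a real caller might pass ["enum.x"]
# (assumed to be on PATH) together with the directory holding struct_enum.in.
quick = run_with_timeout([sys.executable, "-c", "print('done')"], timeout_s=30)
slow = run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1)
print(repr(quick), slow)
```

Returning None on timeout keeps the caller in control of what to do next, for example skipping that concentration in an automated scan.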
Also, just to illustrate the issue with a fixed, line-by-line parsed input file: right now, the struct_enum.in.ex5 file in EXAMPLES does not give the expected answer. Running Ex. 5 as is gives the following output:
It appears that the multi-line comment for "full" is causing the concentrations not to be read. If I delete the multi-line comment, the answer is then what is expected, i.e., 4 structures. A YAML-based syntax would not have such an issue, since comment parsing etc. is all handled within the YAML spec.
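To make the validation and defaults points concrete, here is a minimal sketch that checks a configuration after it has been parsed into a plain dictionary (as a YAML parser would produce, with fractions such as 1/16 arriving as strings). The key names follow the format proposed above. The default values, the function name, and the displaced second coordinate are assumptions for illustration only.

```python
from fractions import Fraction

# Hypothetical defaults; the thread does not say what values enumlib would choose.
DEFAULTS = {"FinitePrecisionParameter": 1e-4, "Mode": "Full"}

def validate(raw):
    """Check a parsed config dict and return a copy with defaults filled in."""
    cfg = {**DEFAULTS, **raw}
    if len(cfg["Species"]) != len(cfg["Coordinates"]):
        raise ValueError("need one species list per coordinate")
    names = list(dict.fromkeys(s for site in cfg["Species"] for s in site))
    # nSpecies is redundant; if it is given, it must agree with the species lists.
    if cfg.get("nSpecies", len(names)) != len(names):
        raise ValueError("nSpecies does not match the species listed")
    ranges = {}
    for name, (lo, hi) in cfg["ConcentrationRanges"].items():
        if name not in names:
            raise ValueError(f"unknown species {name!r}")
        lo, hi = Fraction(str(lo)), Fraction(str(hi))
        if not 0 <= lo <= hi <= 1:
            raise ValueError(f"bad concentration range for {name}")
        ranges[name] = (lo, hi)
    lowest = sum(lo for lo, _ in ranges.values())
    highest = sum(hi for _, hi in ranges.values())
    if lowest > 1 or highest < 1:
        raise ValueError("concentration ranges cannot sum to 1")
    cfg["ConcentrationRanges"] = ranges
    return cfg

example = {
    "nSpecies": 4,
    "Species": [["Fe", "Mn"], ["O", "S"]],
    # Second site displaced here purely for illustration.
    "Coordinates": [[0, 0, 0], [0.5, 0.5, 0.5]],
    "MinCellSize": 8,
    "MaxCellSize": 8,
    "ConcentrationRanges": {
        "Fe": ["1/16", "1/16"], "Mn": ["7/16", "7/16"],
        "O": ["1/16", "1/16"], "S": ["7/16", "7/16"],
    },
}
checked = validate(example)
print(checked["FinitePrecisionParameter"], checked["ConcentrationRanges"]["Mn"])
```

Because every value is looked up by key, reordering the entries or adding a comment line cannot silently shift which number is read as which parameter.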
> I do have some suggestions on enumlib design which I think would make it
> easier to use. Here they are in no particular order:
> 1. I believe the struct_enum.in file format is based loosely on the
> VASP POSCAR format. Honestly, I have never found the VASP file formats
> particularly intuitive. Without reading a manual, it is impossible to know
> for example that the first line is a comment line, the next three lines are
> the lattice vectors etc. I would recommend that enumlib move to a more
> structured input file format based on key-value pairs, e.g., something like
> a YAML format.
I wholly agree and thought hard about this years ago.
I know that you have successfully made a lot of software that lots of
people use, so I value your perspective. Here was the argument against
something like YAML. If we use a structured input like that, then the user
will have to have the parser (say a fortran yaml parser). As soon as we do
that, we are going to have some set of potential users that download the
code, compile it, set up an input file, and then have some error (outside
of our control) related to the parsing (they don't have a parser installed,
not interested in learning what YAML is, etc...lots of the "old" professors
are probably in this camp, even the heavy computational ones...). I worried
that we would lose more potential users this way than with the "old
fashioned" formatted input (which I REALLY don't like personally...).
[When I first tried Linux, in say 2000 or so, it was the case that one
spent more time messing around with making it work and staying out of
"dependency hell" than anything else. So I never adopted it. I think it has
come a long way since then...I worried about similar problems with enumlib
users and a parser...]
Maybe these arguments aren't as convincing in 2019 as in 2007 when we
started the code...(or in 1997 when I first started writing Fortran and
started building libraries...)
What do you think?
If I do add a parser, do I use a python parser that generates a
struct_enum.in file? Or do I rely on someone's fortran-yaml parser (google
says they exist...)? What would you do, or what would you suggest?
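For the first option, a Python front end that emits the existing line-based file, the conversion itself is small. The following is a rough sketch, assuming the line order seen in the input files quoted earlier in this thread (comment, "bulk", three lattice vectors, number of species, number of sites, labelled sites, cell-size range, precision, mode, one concentration line per species). The function name and the dictionary layout are illustrative, and the reading of each line is inferred from those files rather than taken from the enumlib documentation.

```python
from fractions import Fraction
from math import gcd

def to_struct_enum(cfg):
    """Render a parsed config dict as line-based struct_enum.in text (assumed layout)."""
    def fmt(vec):
        return " ".join(f"{x:.6f}" for x in vec)

    # Species are numbered in order of first appearance: Fe=0, Mn=1, O=2, S=3.
    names = list(dict.fromkeys(s for site in cfg["Species"] for s in site))
    index = {name: i for i, name in enumerate(names)}

    lines = [cfg["Comment"], "bulk"]
    lines += [fmt(v) for v in cfg["Lattice vectors"]]
    lines += [str(len(names)), str(len(cfg["Coordinates"]))]
    for xyz, allowed in zip(cfg["Coordinates"], cfg["Species"]):
        lines.append(fmt(xyz) + " " + "/".join(str(index[s]) for s in allowed))
    lines.append(f'{cfg["MinCellSize"]} {cfg["MaxCellSize"]}')
    lines.append(str(cfg["FinitePrecisionParameter"]))
    lines.append(cfg["Mode"].lower())
    for name in names:
        lo, hi = (Fraction(str(v)) for v in cfg["ConcentrationRanges"][name])
        # Express both bounds over a common denominator: "low high denominator".
        denom = lo.denominator * hi.denominator // gcd(lo.denominator, hi.denominator)
        lines.append(f"{int(lo * denom)} {int(hi * denom)} {denom}")
    return "\n".join(lines) + "\n"

cfg = {
    "Comment": "demo",
    "Lattice vectors": [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    "Species": [["Fe", "Mn"], ["O", "S"]],
    # Second site displaced here purely for illustration.
    "Coordinates": [[0, 0, 0], [0.5, 0.5, 0.5]],
    "MinCellSize": 8,
    "MaxCellSize": 8,
    "FinitePrecisionParameter": 0.0001,
    "Mode": "Full",
    "ConcentrationRanges": {
        "Fe": ["1/16", "1/16"], "Mn": ["7/16", "7/16"],
        "O": ["1/16", "1/16"], "S": ["7/16", "7/16"],
    },
}
text = to_struct_enum(cfg)
print(text)
```

The trade-off is the one raised in the discussion: this keeps the Fortran side untouched, but it makes Python an extra requirement for anyone who wants the structured format.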
Thanks. I agree on avoiding dependency hell. It is something we struggle with even in the Python world (which at least has proper package managers in 2019), and it is much worse in the Fortran world.
YAML is so established as a format that a system is very likely to come with a YAML library. I know that libyaml is found on practically every Linux/Unix-based system I have encountered, including those on supercomputing systems. Unfortunately, my knowledge of Fortran is very limited. I have no idea how easy it is to simply use the existing libyaml from a Fortran program.
If possible, I would suggest including the Fortran YAML library as part of enumlib, so that it is automatically compiled as part of the build and external dependencies are avoided. If the open-source fortran-yaml from someone else (e.g., https://github.com/BoldingBruggeman/fortran-yaml) works, that would be best.
It probably would not make sense to adopt YAML if you have to rely on Python (or some other language) to parse it and generate a struct_enum.in file. That is an additional step, and additional software that someone has to install. It is not difficult; in fact, pymatgen essentially generates struct_enum.in from Python objects, and it would be trivial for us to support a YAML format (pymatgen already relies on YAML for a lot of things).
If it is too difficult to incorporate YAML, some of the other minor format modifications would still be useful, e.g., explicit specification of species, parameter defaults, etc.
This is great feedback, good ideas Shuye. Thanks for taking the time to
respond. I'll open an issue on github and list this as an enhancement
request.
Sorry, misspelled your name...
I am trying to run a simple enumeration of a rocksalt-type system to create 1 vacancy in a 2x2x2 supercell of NaCl (I am aware that this is a trivial problem that does not require enumlib whatsoever, given that all atoms are symmetrically equivalent in this system. But it is part of an automated generation of structures at different concentrations).
I am encountering a strange bug. If I create a vacancy on the Na site, the generated input file is
and the enum.x call will fail with the error message "At line 152 of file ../aux_src/makeStr.f90 (unit = 11, file = 'struct_enum.out')
Fortran runtime error: End of file"
But if the vacancy is on the Cl site, the input file is
and enum.x yields the correct number of enumerated structures, i.e., 1. (Please ignore the lattice parameters, those are dummy values).
Given that NaCl is just two intersecting fcc lattices, I see no reason why a Na vacancy would fail while the Cl vacancy input file would succeed. My eyeballs confirm that the two input files are essentially identical except for the expected (0, 0, 0.5) direct coordinates shift.