Strange bug with enumeration of rock-salt system #85

Open
shyuep opened this issue Jul 8, 2019 · 9 comments

@shyuep

shyuep commented Jul 8, 2019

I am trying to run a simple enumeration of a rock-salt-type system to create one vacancy in a 2x2x2 supercell of NaCl. (I am aware that this is a trivial problem that does not require enumlib at all, given that all sites on each sublattice are symmetrically equivalent in this system, but it is part of an automated generation of structures at different concentrations.)

I am encountering a strange bug. If I create a vacancy on the Na site, the generated input file is

Na31 Cl32
bulk
8.501033 0.000000 0.000000
0.000000 8.501033 0.000000
0.000000 0.000000 8.501033
2
32
0.000000 0.000000 0.000000 0/1
0.000000 0.000000 4.250517 0/1
0.000000 4.250517 0.000000 0/1
0.000000 4.250517 4.250517 0/1
4.250517 0.000000 0.000000 0/1
4.250517 0.000000 4.250517 0/1
4.250517 4.250517 0.000000 0/1
4.250517 4.250517 4.250517 0/1
2.125258 2.125258 0.000000 0/1
2.125258 2.125258 4.250517 0/1
2.125258 6.375775 0.000000 0/1
2.125258 6.375775 4.250517 0/1
6.375775 2.125258 0.000000 0/1
6.375775 2.125258 4.250517 0/1
6.375775 6.375775 0.000000 0/1
6.375775 6.375775 4.250517 0/1
2.125258 0.000000 2.125258 0/1
2.125258 0.000000 6.375775 0/1
2.125258 4.250517 2.125258 0/1
2.125258 4.250517 6.375775 0/1
6.375775 0.000000 2.125258 0/1
6.375775 0.000000 6.375775 0/1
6.375775 4.250517 2.125258 0/1
6.375775 4.250517 6.375775 0/1
0.000000 2.125258 2.125258 0/1
0.000000 2.125258 6.375775 0/1
0.000000 6.375775 2.125258 0/1
0.000000 6.375775 6.375775 0/1
4.250517 2.125258 2.125258 0/1
4.250517 2.125258 6.375775 0/1
4.250517 6.375775 2.125258 0/1
4.250517 6.375775 6.375775 0/1
1 1
0.001
partial
310 310 320
10 10 320

and the enum.x call fails with the error message:

At line 152 of file ../aux_src/makeStr.f90 (unit = 11, file = 'struct_enum.out')
Fortran runtime error: End of file

But if the vacancy is on the Cl site, the input file is

Na32 Cl31
bulk
8.672776 0.000000 0.000000
0.000000 8.672776 0.000000
0.000000 0.000000 8.672776
2
32
0.000000 0.000000 2.168194 0/1
0.000000 0.000000 6.504582 0/1
0.000000 4.336388 2.168194 0/1
0.000000 4.336388 6.504582 0/1
4.336388 0.000000 2.168194 0/1
4.336388 0.000000 6.504582 0/1
4.336388 4.336388 2.168194 0/1
4.336388 4.336388 6.504582 0/1
2.168194 0.000000 0.000000 0/1
2.168194 0.000000 4.336388 0/1
2.168194 4.336388 0.000000 0/1
2.168194 4.336388 4.336388 0/1
6.504582 0.000000 0.000000 0/1
6.504582 0.000000 4.336388 0/1
6.504582 4.336388 0.000000 0/1
6.504582 4.336388 4.336388 0/1
0.000000 2.168194 0.000000 0/1
0.000000 2.168194 4.336388 0/1
0.000000 6.504582 0.000000 0/1
0.000000 6.504582 4.336388 0/1
4.336388 2.168194 0.000000 0/1
4.336388 2.168194 4.336388 0/1
4.336388 6.504582 0.000000 0/1
4.336388 6.504582 4.336388 0/1
2.168194 2.168194 2.168194 0/1
2.168194 2.168194 6.504582 0/1
2.168194 6.504582 2.168194 0/1
2.168194 6.504582 6.504582 0/1
6.504582 2.168194 2.168194 0/1
6.504582 2.168194 6.504582 0/1
6.504582 6.504582 2.168194 0/1
6.504582 6.504582 6.504582 0/1
1 1
0.001
partial
310 310 320
10 10 320

and enum.x yields the correct number of enumerated structures, i.e., 1. (Please ignore the lattice parameters; they are dummy values.)

Given that NaCl is just two interpenetrating fcc lattices, I see no reason why a Na vacancy would fail while a Cl vacancy would succeed. My eyeballs confirm that the two input files are essentially identical except for the expected (0, 0, 0.5) shift in direct coordinates.
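The "identical up to a shift" check can also be done mechanically. Below is a minimal stdlib sketch, assuming the struct_enum.in layout shown in the two files above (title, bulk, three lattice-vector lines, number of species, number of d-vectors, then one "x y z labels" line per Cartesian d-vector). It converts each file's d-vectors to fractional coordinates of its own lattice, which removes the dummy lattice parameters, and then looks for a single translation mapping one set onto the other modulo lattice vectors. The function names are illustrative and not part of enumlib or pymatgen.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def to_fractional(lattice, r):
    """Solve r = f1*a1 + f2*a2 + f3*a3 for f by Cramer's rule (rows of lattice are a1..a3)."""
    m = [[lattice[i][j] for i in range(3)] for j in range(3)]  # columns are a1, a2, a3
    d = det3(m)
    frac = []
    for k in range(3):
        mk = [row[:] for row in m]
        for j in range(3):
            mk[j][k] = r[j]  # replace column k with r
        frac.append(det3(mk) / d)
    return frac


def read_dvectors(text):
    """Return the d-vectors of a struct_enum.in-style text in fractional coordinates."""
    rows = [line.split() for line in text.strip().splitlines()]
    lattice = [[float(x) for x in rows[i][:3]] for i in (2, 3, 4)]
    n_d = int(rows[6][0])
    # The trailing "0/1" labels are ignored; only the positions matter here.
    return [to_fractional(lattice, [float(x) for x in fields[:3]])
            for fields in rows[7:7 + n_d]]


def _same_mod1(p, q, tol):
    """True if points p and q coincide modulo lattice translations."""
    return all(abs((a - b) - round(a - b)) < tol for a, b in zip(p, q))


def find_uniform_shift(a, b, tol=1e-4):
    """Return t with {p + t for p in a} == b (mod 1), or None if no such t exists."""
    if len(a) != len(b):
        return None
    for cand in a:  # b[0] must be the image of some point of a
        t = [b[0][k] - cand[k] for k in range(3)]
        unmatched = list(b)
        for p in a:
            q = [p[k] + t[k] for k in range(3)]
            hit = next((j for j, s in enumerate(unmatched) if _same_mod1(q, s, tol)), None)
            if hit is None:
                break
            unmatched.pop(hit)
        else:
            return [x - round(x) for x in t]  # wrap each component into [-0.5, 0.5]
    return None
```

Fed the Na-vacancy and Cl-vacancy files above as strings, find_uniform_shift(read_dvectors(na_text), read_dvectors(cl_text)) should return approximately (0, 0, 0.25) in fractional coordinates of the 2x2x2 supercell, which is the (0, 0, 0.5) shift of the conventional cell. A non-None result confirms that the two site sets are equivalent, so the failure is not explained by the geometry of the input.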

@glwhart
Contributor

glwhart commented Jul 17, 2019 via email

@shyuep
Author

shyuep commented Jul 17, 2019

Thanks Gus. Indeed, Example 5 is a better way to achieve what I am after if I am working purely within enumlib. In this case, I am using pymatgen to generate the enumlib input file, so I find it easier to generate the supercell first and use that supercell as the input for the enumeration. In general, I manipulate the structures within pymatgen to generate partial occupancies at different concentrations and in different supercell sizes, and use enumlib purely for running the enumeration, rather than having to worry about generating the additional input files needed for Example 5.

@glwhart
Contributor

glwhart commented Jul 18, 2019 via email

@shyuep
Author

shyuep commented Jul 18, 2019

It is slightly more convenient. We still have to supply a supercell transformation to pymatgen, e.g., Structure.make_supercell([[1, 0, 0], [0, 2, -2], [0, 2, 2]]), but we do not need to generate a separate fixed_cells.in file. We also do not have to ensure that the cell sizes supplied in struct_enum.in are consistent with the supercell transformation vectors supplied in fixed_cells.in.
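Part of that consistency bookkeeping is just a determinant: the number of parent cells contained in a supercell is the absolute value of the determinant of the integer transformation matrix, so a tool could derive or check the min/max cell sizes from the matrix itself. A minimal stdlib sketch, assuming the matrix is applied to the same parent lattice that is written into struct_enum.in; the function names are illustrative, not part of enumlib or pymatgen.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def supercell_index(scaling_matrix):
    """Volume multiple of the parent cell implied by an integer supercell matrix."""
    n = abs(det3(scaling_matrix))
    if n == 0:
        raise ValueError("scaling matrix is singular")
    return n


def check_cell_sizes(scaling_matrix, min_size, max_size):
    """Raise if the supercell index falls outside the struct_enum.in cell-size range."""
    n = supercell_index(scaling_matrix)
    if not min_size <= n <= max_size:
        raise ValueError(f"supercell index {n} is outside the range {min_size}..{max_size}")
    return n


print(supercell_index([[1, 0, 0], [0, 2, -2], [0, 2, 2]]))  # -> 8
```

For the matrix above the index is 8, so the corresponding struct_enum.in would need a cell-size range that includes 8.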

Admittedly, this is partly because pymatgen's interface to enumlib was written quite a long time ago (for one of the earliest versions of enumlib, when it first came out), so it may be out of step with newer enumlib functionality.

I do have some suggestions on enumlib design which I think would make it easier to use. Here they are in no particular order:

i. I believe the struct_enum.in file format is based loosely on the VASP POSCAR format. Honestly, I have never found the VASP file formats particularly intuitive. Without reading a manual, it is impossible to know, for example, that the first line is a comment line, that the next three lines are the lattice vectors, etc. I would recommend that enumlib move to a more structured input file format based on key-value pairs, e.g., something like YAML. Just to give a sense of what it may look like, an example might be as follows:
Comment: 2D square fixed cell
Lattice vectors: [[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]
nSpecies: 4
nLatticePoints: 2
Species:
 - ["Fe", "Mn"]
 - ["O", "S"]
Coordinates:
 - [0, 0, 0]
 - [0, 0, 0]
MinCellSize: 8
MaxCellSize: 8
FinitePrecisionParameter: .0001
Mode: Full
ConcentrationRanges:
  Fe: [1/16, 1/16]
  Mn: [7/16, 7/16]
  O: [1/16, 1/16]
  S: [7/16, 7/16]

It is more verbose, but it has the advantage of being explicit about what each parameter means and less error-prone (for example, I am less likely to change nSpecies when I mean to change nLatticePoints). Also, mature YAML parsers, with proper validation, exist for practically every language. Error checking is also easy, e.g., validating that the number of species specified is correct. Adding a new parameter is simply a matter of adding another key-value pair, and there is no problem if people swap the order of the key-value pairs in the input file. All in all, it is a lot more flexible than a line-by-line parsed text file.

ii. For fixed cells, it would be useful to be able to supply them as part of the same struct_enum.in file. In the suggested YAML format, it is straightforward to just add another key called FixedSupercellTransformation. That way, users don't need to worry about multiple input files. For fixed cells, there should be no need to specify cell sizes, since the cell size is implicit in the transformations.

iii. Some of the parameters seem redundant. E.g., does the number of species need to be specified?

iv. Sensible defaults for parameters. E.g., if someone does not specify the FinitePrecisionParameter in the input file, a sensible value is chosen, similar to how the VASP INCAR file works.

v. It would be useful to be able to explicitly identify the species, e.g., in the format above, I deliberately wrote Fe/Mn and O/S. This would allow makestr.x to generate the vasp.0000x files in the new VASP format where the species are explicitly identified.

vi. Right now, enumlib will run regardless of whether the input makes sense. There are enumerations that can take O(life of the universe) to compute, but enumlib will not fail in those instances. A timeout parameter would be useful.

vii. Finally, in a really ideal world (and I recognize this one is probably too much work), there would be a Python library version of enumlib in which the Fortran code is simply wrapped as Python extensions. For example, scipy does a lot of its work in Fortran, but its main API is Python-based. That would allow something like pymatgen to call the Python interface directly. Right now, pymatgen generates an input file, calls enum.x and makestr.x on the command line to generate the enumerations, and then reads in the resulting files. This is probably far from efficient, but it is the best that we can do at this juncture.
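Until enumlib has its own timeout parameter (item vi), a caller that shells out to enum.x, as described in item vii, can impose one externally. A minimal sketch using Python's standard subprocess module: subprocess.run kills the child process when the timeout expires, and the wrapper turns that into a None result. The wrapper name is illustrative, and the real-use line assumes enum.x is on the PATH and reads struct_enum.in from its working directory.

```python
import subprocess
import sys


def run_enum(cmd, workdir, timeout_s):
    """Run an enumeration executable in workdir, giving up after timeout_s seconds.

    Returns the CompletedProcess on success, or None if the time limit was hit
    (subprocess.run kills the child when the timeout expires). A nonzero exit
    status raises subprocess.CalledProcessError because check=True.
    """
    try:
        return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True,
                              timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a runaway enumeration: a child process that sleeps for 10 s.
    slow = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_enum(slow, ".", timeout_s=1))  # None
    # Hypothetical real use: run_enum(["enum.x"], "/path/to/job", 3600)
```

An in-process Python binding would make this cleaner, since the caller could interrupt the enumeration without killing a separate process, but the external timeout already prevents a pipeline from hanging on an intractable enumeration.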

@shyuep
Author

shyuep commented Jul 18, 2019

Also, just to illustrate the issue with a fixed line-by-line parsed input file: right now, struct_enum.in.ex5 in EXAMPLES does not give the expected answer. Running Ex. 5 as is gives the following output:

Concentration ranges are not specified
---------------------------------------------------------------------------------------------
Generating permutations for fixed cells. index n= 8 to  8
Be aware: non-primitive structures (super-periodic configurations) are included
in the final list in this mode.
---------------------------------------------------------------------------------------------
Calculating derivative structures for index n= 8 to  8
Volume       CPU        #HNFs  #SNFs    #reduced    % dups      volTot      RunTot
   8         0.0531        1     1         1        0.0000        1771        1771
---------------------------------------------------------------------------------------------

It appears that the multi-line comment for "full" is causing the concentrations not to be read. If I delete the multi-line comment, the answer is as expected, i.e., 4 structures. A YAML-based syntax would not have this issue, since comment parsing etc. is all handled by the YAML spec.

@glwhart
Contributor

glwhart commented Jul 18, 2019 via email

@shyuep
Author

shyuep commented Jul 18, 2019

Thanks. I agree on avoiding dependency hell. It is something we struggle with even in the Python world (which at least has proper package managers in 2019) and it is much worse in the Fortran world.

YAML is so established as a format that a system is very likely to come with a YAML library. I know that libyaml is found on practically every Linux/Unix-based system I have encountered, including supercomputing systems. Unfortunately, my knowledge of Fortran is very limited, and I have no idea how easy it is to simply use the existing libyaml from a Fortran program.

If possible, I would suggest bundling a Fortran YAML library with enumlib, so that it is compiled automatically as part of the build and avoids external dependencies. If an existing open-source library such as fortran-yaml (e.g., https://github.com/BoldingBruggeman/fortran-yaml) works, that would be best.

It probably would not make sense to adopt YAML if you had to rely on Python (or some other language) to parse it and generate a struct_enum.in file; that is an additional step and additional software that someone has to install. The conversion itself is not difficult. In fact, pymatgen essentially generates struct_enum.in from Python objects, and it would be trivial for us to support a YAML format (pymatgen already relies on YAML for a lot of things).
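To illustrate how thin such a translation layer would be, here is a minimal sketch that renders a dict shaped like the YAML proposed in item i into line-based struct_enum.in text. It assumes the dict comes from loading that YAML (e.g., with PyYAML's yaml.safe_load, which returns values such as 1/16 as the string "1/16") and assumes the legacy line layout visible in the input files at the top of this issue. It also folds in the redundancy checks from item iii (nSpecies and nLatticePoints are verified, and inferred when absent), species labels assigned in order of first appearance (item v), and a default precision (item iv). The function name, the DEFAULT_EPS value, and the hard-coded "bulk" line are illustrative assumptions, not enumlib behavior. Requires Python 3.9+ for math.lcm.

```python
from fractions import Fraction
from math import lcm

DEFAULT_EPS = 1e-4  # hypothetical default (item iv); not an enumlib value


def _vec(v):
    """Format a 3-vector the way the example input files do."""
    return " ".join(f"{float(x):.6f}" for x in v)


def to_struct_enum_in(cfg):
    """Render a dict shaped like the proposed YAML as line-based struct_enum.in text."""
    sites, coords = cfg["Species"], cfg["Coordinates"]
    if len(sites) != len(coords):
        raise ValueError("Species and Coordinates must describe the same sites")
    if cfg.get("nLatticePoints", len(coords)) != len(coords):
        raise ValueError("nLatticePoints disagrees with Coordinates")

    # Integer label per distinct species, in order of first appearance (Fe=0, Mn=1, ...).
    labels = {}
    for site in sites:
        for sp in site:
            labels.setdefault(sp, len(labels))
    if cfg.get("nSpecies", len(labels)) != len(labels):
        raise ValueError("nSpecies disagrees with the species listed")

    lines = [cfg.get("Comment", ""), "bulk"]  # "bulk" assumed; the proposal has no key for it
    lines += [_vec(v) for v in cfg["Lattice vectors"]]
    lines += [str(len(labels)), str(len(coords))]
    for xyz, site in zip(coords, sites):
        lines.append(_vec(xyz) + " " + "/".join(str(labels[sp]) for sp in site))
    lines.append(f"{cfg['MinCellSize']} {cfg['MaxCellSize']}")
    lines.append(f"{cfg.get('FinitePrecisionParameter', DEFAULT_EPS):g}")
    lines.append(cfg.get("Mode", "full").lower())

    conc = cfg.get("ConcentrationRanges")
    if conc:
        missing = [sp for sp in labels if sp not in conc]
        if missing:
            raise ValueError(f"no concentration range for {missing}")
        # "1/16" (string) or 0.0625 (float) both become exact fractions.
        bounds = [(Fraction(str(conc[sp][0])), Fraction(str(conc[sp][1]))) for sp in labels]
        den = lcm(*(f.denominator for pair in bounds for f in pair))
        for lo, hi in bounds:
            lines.append(f"{int(lo * den)} {int(hi * den)} {den}")
    return "\n".join(lines) + "\n"
```

Applied to the example in item i, this yields labels 0/1 for the Fe/Mn site and 2/3 for the O/S site, and concentration lines "1 1 16" and "7 7 16", written over a common denominator in the same numerator-numerator-denominator form as the "310 310 320" lines above.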

If it is too difficult to incorporate YAML, some of the other minor format modifications would still be useful though, e.g., explicit specification of species, parameter defaults, etc.

@glwhart
Contributor

glwhart commented Jul 26, 2019 via email

@glwhart
Contributor

glwhart commented Jul 26, 2019 via email
