AiiDA 1.0 plugin migration guide
- Migrating imports
- Migrating Data subclasses
- Migrating JobCalculation to CalcJob
- Migrating the Parser
Before you start modifying imports of AiiDA objects, please run the AiiDA plugin migrator (click on the link to find the instructions) in order to take care of a number of tedious search/replace operations.
Note: The plugin migrator will bring you only some of the way. If you discover some replacements that are missing, it's easy to add them!
- DataFactory('parameter') => DataFactory('dict')
In aiida-core<=0.12.*, the Data node class implemented some magic to automatically call certain methods if the corresponding keywords were passed in the constructor (also making sure that those specific keywords were not passed on to the constructor of the parent class).
Take Dict (formerly ParameterData) as an example: if constructed with the keyword dict, the constructor would call set_dict and remove the keyword from the kwargs before calling the constructor of the Data base class.
This magic has been dropped in favor of the standard Python approach: you now have the freedom (but also the duty) to implement the constructor as needed by your class. Take as an example the new Dict data subclass:
class Dict(Data):
    """`Data` sub class to represent a dictionary."""

    def __init__(self, **kwargs):
        """Store a dictionary as a `Node` instance.

        :param dict: the dictionary to set
        """
        dictionary = kwargs.pop('dict', None)
        super(Dict, self).__init__(**kwargs)
        if dictionary:
            self.set_dict(dictionary)
Note that we allow the user to pass a dictionary using the dict keyword.
We first pop this value from the kwargs, using the pop(key, None) syntax to make sure it simply returns None instead of raising an exception if the key is not present. Then we pass the remaining kwargs to the parent constructor.
Note: If you override the constructor, don't forget to call the parent constructor.
Finally, after having called super, we can process the dictionary (if it was actually passed).
You can choose to continue to do so in a set_dict method, but in principle you can choose whatever you like.
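The constructor pattern described above does not depend on AiiDA itself. The following is a minimal sketch in plain Python, where Base and Child are hypothetical stand-ins for Data and your own subclass (nothing is imported from aiida-core), showing that the custom keyword is removed before the parent constructor sees the remaining kwargs:

```python
class Base(object):
    """Hypothetical stand-in for the `Data` base class (not the AiiDA class)."""

    def __init__(self, **kwargs):
        # Like a strict parent constructor, reject keywords it does not know.
        label = kwargs.pop('label', '')
        if kwargs:
            raise TypeError('unexpected keyword(s): {}'.format(sorted(kwargs)))
        self.label = label


class Child(Base):
    """Hypothetical stand-in for a `Data` subclass such as `Dict`."""

    def __init__(self, **kwargs):
        # Pop our own keyword first; `None` is returned if it was not passed.
        dictionary = kwargs.pop('dict', None)
        # Pass only the remaining keywords on to the parent constructor.
        super(Child, self).__init__(**kwargs)
        self.content = {}
        if dictionary:
            self.content.update(dictionary)


node = Child(dict={'a': 1}, label='example')
empty = Child()
```

If Child forgot to pop the dict keyword, the stand-in parent would raise a TypeError here, which is the kind of failure the pop-then-super order avoids.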
Note that __init__ is called only when constructing a new node, not when reloading an existing node.
Therefore, you cannot use __init__ to set properties on the class that you also need for reloaded nodes, as these will not be there when (re)loading a node from the database. E.g., if you set self.my_property = xxx in __init__, then load_node(yyy).my_property will not be available. Instead, for instance, define my_property as a property:
class MyData(Data):

    @property
    def my_property(self):
        return xxx
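To make the difference concrete, here is a small sketch in plain Python. FakeNode and its reload method are hypothetical: reload only imitates the effect described above (an object that comes back without __init__ having run) and says nothing about how aiida-core actually loads nodes.

```python
class FakeNode(object):
    """Hypothetical stand-in for a stored node (not the AiiDA `Data` class)."""

    def __init__(self, value):
        # Persisted data: survives a reload in this sketch.
        self.attributes = {'value': value}
        # Plain instance attribute: only set when `__init__` runs.
        self.cached = value * 2

    @classmethod
    def reload(cls, attributes):
        """Imitate loading from the database: `__init__` is not called."""
        node = cls.__new__(cls)
        node.attributes = dict(attributes)
        return node

    @property
    def doubled(self):
        # Computed from persisted data, so it works for reloaded nodes too.
        return self.attributes['value'] * 2


fresh = FakeNode(21)
loaded = FakeNode.reload(fresh.attributes)
```

In this sketch, loaded.doubled works because it is derived from the persisted attributes, while loaded.cached does not exist because it was only ever set inside __init__.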
In case the plugin migrator hasn't already taken care of this, replace:
from aiida.orm.calculation.job import JobCalculation
with:
from aiida.engine import CalcJob
You can keep the name of your subclass.
Instead of defining default variables in _init_internal_params:
def _init_internal_params(self):
    super(SomeCalculation, self)._init_internal_params()
    self._INPUT_FILE_NAME = 'aiida.inp'
    self._OUTPUT_FILE_NAME = 'aiida.out'
    self._default_parser = 'quantumespresso.pw'
include them via class variables and the metadata of the input spec:
import six
from aiida import engine


class SomeCalcJob(engine.CalcJob):
    # Default input and output files
    _DEFAULT_INPUT_FILE = 'aiida.in'
    _DEFAULT_OUTPUT_FILE = 'aiida.out'

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.input('metadata.options.input_filename', valid_type=six.string_types, default=cls._DEFAULT_INPUT_FILE)
        spec.input('metadata.options.output_filename', valid_type=six.string_types, default=cls._DEFAULT_OUTPUT_FILE)
        spec.input('metadata.options.parser_name', valid_type=six.string_types, default='quantumespresso.pw')
        # withmpi defaults to "False" in aiida-core 1.0. Below, we override to default to withmpi=True
        spec.input('metadata.options.withmpi', valid_type=bool, default=True)
The parser_name key will be used by the engine to load the correct parser class after the calculation has completed. In this example, the engine will call ParserFactory('quantumespresso.pw'), which will load the PwParser class of the aiida-quantumespresso package.
To access the value of the other metadata options:
- if you are inside a method of the CalcJob class (e.g. in prepare_for_submission), you can do self.inputs.metadata.options.output_filename;
- from a stored CalcJobNode, instead, you can do node.get_option('output_filename').
Note: The code above is just an example - you don't need to expose filenames via metadata.options if you don't want users to be able to modify them.
The define method works exactly as for WorkChains; see its documentation for details.
Consider the following use method:
@classproperty
def _use_methods(cls):
    return {
        'structure': {
            'valid_types': StructureData,
            'additional_parameter': None,
            'linkname': 'structure',
            'docstring': 'the input structure',
        }
    }
This translates to the define method:
@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('structure', valid_type=StructureData,
               help='the input structure')
All input ports that are defined via spec.input are required by default.
Use required=False in order to make an input port optional.
For use_methods that used the additional_parameter keyword, spec provides input namespaces.
Consider the following use_method:
@classproperty
def _use_methods(cls):
    return {
        'structure': {
            'valid_types': UpfData,
            'additional_parameter': 'kind',
            'linkname': 'pseudos',
            'docstring': 'the input pseudo potentials',
        }
    }
This can be translated to the new process spec as follows:
@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input_namespace('pseudos', valid_type=UpfData,
                         help='the input pseudo potentials', dynamic=True)
The spec.input_namespace call and the dynamic=True keyword let the engine know that the namespace can receive inputs that are not yet explicitly defined, because at the time of definition we do not know how many UpfData nodes will be passed or under which keys. Example usage when setting up the calculation:
inputs = {
    ...
    'pseudos': {'Si': si_upf, 'C': c_upf},
    ...
}
Note: Some inputs are pre-defined by the CalcJob class. Check here for the full list of default inputs.
Rename _prepare_for_submission to prepare_for_submission (i.e. remove the leading underscore) and adjust to the new signature:
def prepare_for_submission(self, folder):
    """Create the input files from the input nodes passed to this instance of the `CalcJob`.

    :param folder: an `aiida.common.folders.Folder` to temporarily write files on disk
    :return: `aiida.common.datastructures.CalcInfo` instance
    """
Inputs are no longer passed in as a dictionary but retrieved through self.inputs (same as with WorkChains).
Importantly, the inputs provided as well as their type have already been validated: if the spec defined an input as required, the input is guaranteed to be present in self.inputs. All boilerplate code for validation of presence and type can be removed from prepare_for_submission.
For example, if the spec defines an input structure of type StructureData that is required, instead of:
try:
    structure = inputdict.pop('structure')
except KeyError:
    raise InputValidationError('No structure was passed in the inputs')
if not isinstance(structure, StructureData):
    raise InputValidationError('the input structure should be a StructureData')
Simply do:
structure = self.inputs.structure
Only for input ports that are not required and do not specify a default do you still need to check for the presence of the key in self.inputs:
@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('optional', valid_type=Int, required=False, help='an optional input')

def prepare_for_submission(self, folder):
    if 'optional' in self.inputs:
        optional = self.inputs.optional
    else:
        optional = None
This is an example of adding a SinglefileData to the local_copy_list of the CalcInfo in 0.12:
single_file = SinglefileData()
local_copy_list = [(single_file.get_file_abs_path(),
                    os.path.join('some/relative/folder', single_file.filename))]
The get_file_abs_path method has been removed, and the structure of the local_copy_list has changed to accommodate this. You can now do:
single_file = SinglefileData()
local_copy_list = [(single_file.uuid, single_file.filename, single_file.filename)]
Each tuple in the local_copy_list should have length 3 and contain:
- the UUID of the node (SinglefileData or FolderData)
- the relative file path within the node repository (for the SinglefileData this is given by its filename attribute)
- the relative path where the file should be copied in the remote folder used for the execution of the CalcJob
Naturally, this procedure also works for subclasses of SinglefileData such as UpfData, CifData etc.
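When several files need to be copied, the three-element tuples can be built in one go. The sketch below is plain Python: FakeSinglefile is a hypothetical stand-in that exposes only the uuid and filename attributes mentioned above, and build_local_copy_list and its target_folder argument are illustrative names, not part of aiida-core.

```python
import os
from collections import namedtuple

# Hypothetical stand-in exposing only the two attributes used in the text.
FakeSinglefile = namedtuple('FakeSinglefile', ['uuid', 'filename'])


def build_local_copy_list(nodes, target_folder=''):
    """Return one (uuid, path in node repository, path in remote folder) tuple per node."""
    return [
        (node.uuid, node.filename, os.path.join(target_folder, node.filename))
        for node in nodes
    ]


nodes = [FakeSinglefile('uuid-1', 'Si.upf'), FakeSinglefile('uuid-2', 'C.upf')]
local_copy_list = build_local_copy_list(nodes, target_folder='pseudo')
```

With an empty target_folder the third element is simply the filename, which matches the single-file example given earlier.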
Note: If you are creating an input file inside the Calculation and don't want to go through a node, you can simply use the folder argument of prepare_for_submission:
import io
with io.StringIO("my string") as handle:
    folder.create_file_from_filelike(handle, filename='input.txt', mode='w')
The retrieve_singlefile_list has been deprecated. The reason for its existence was to fix an intrinsic inefficiency with the retrieve_list. Imagine a code that produces an output file that you want to retrieve, but you do not want to parse and store a derivation of the content as a node, but rather you just want to store the file as a SinglefileData node as a whole. Any file that gets retrieved through the retrieve_list is also stored in the repo of the calculation node. When you then create a SinglefileData node out of those, you are again storing the content of the file, effectively duplicating the content. If this is done often, the repository bloats unnecessarily.
The retrieve_singlefile_list was the solution, where the specified files would not be stored in the retrieved folder but the engine would automatically turn them into SinglefileData nodes and attach them as outputs.
This behavior can be reproduced with the more general retrieve_temporary_list. Just like files in the retrieve_singlefile_list, they will be retrieved, but not permanently stored in the retrieved folder. The files will be stored in a temporary folder and passed to the Parser.parse method as the retrieved_temporary_folder argument. Here you will find the files from the retrieve_temporary_list. You can now do whatever you want with these: parse their content and store them in whatever node types you want. For the SinglefileData it would look something like:
def parse(self, **kwargs):
    import os
    from aiida.orm import SinglefileData

    # `retrieved_temporary_folder` is an absolute path to a temporary folder on disk
    temporary_folder = kwargs['retrieved_temporary_folder']
    with open(os.path.join(temporary_folder, 'some_output_file.txt'), 'rb') as handle:
        node = SinglefileData(file=handle)
    self.out('output_file', node)
There are two ways of restarting from a previous calculation:
1. Make a symlink to the folder with the previous calculation.
2. Copy the folder from the previous calculation.
The advantage of approach 1 is that symlinking is fast and it does not occupy additional disk space. The disadvantage is that it won't work if the parent calculation was run on a different machine, and you shouldn't use it if your new calculation can modify data in the symlinked folder.
The old way of symlinking was:
calcinfo.remote_symlink_list = []
if parent_calc_folder is not None:
    comp_uuid = parent_calc_folder.get_computer().uuid
    remote_path = parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, link_name))  # where the link_name is decided by you
Replace this by:
calcinfo.remote_symlink_list = []
if 'parent_calc_folder' in self.inputs:
    comp_uuid = self.inputs.parent_calc_folder.computer.uuid
    remote_path = self.inputs.parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, link_name))  # where the link_name is decided by you
If you want to run the calculation on a different machine, or you are afraid that the old data could be modified by a new run, you should choose approach 2, taking into account that copying the data requires time and disk space. To implement this in your plugin, add the same information to calcinfo.remote_copy_list instead of calcinfo.remote_symlink_list:
calcinfo.remote_copy_list = []
if 'parent_calc_folder' in self.inputs:
    comp_uuid = self.inputs.parent_calc_folder.computer.uuid
    remote_path = self.inputs.parent_calc_folder.get_remote_path()
    calcinfo.remote_copy_list.append((comp_uuid, remote_path, folder_name))  # where the folder_name is decided by you
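The choice between the two approaches can be sketched as a small helper. This is only an illustration in plain Python: the function add_parent_folder, its flags same_machine and may_modify_parent, and the dictionary standing in for the CalcInfo instance are all hypothetical and not part of aiida-core.

```python
def add_parent_folder(calcinfo, comp_uuid, remote_path, target_name,
                      same_machine, may_modify_parent):
    """Append the parent folder to the symlink list or to the copy list.

    `calcinfo` is a plain dict standing in for a `CalcInfo` instance.
    """
    entry = (comp_uuid, remote_path, target_name)
    if same_machine and not may_modify_parent:
        # Approach 1: fast and uses no extra disk space.
        calcinfo['remote_symlink_list'].append(entry)
    else:
        # Approach 2: safe across machines and against modification.
        calcinfo['remote_copy_list'].append(entry)
    return calcinfo


calcinfo = {'remote_symlink_list': [], 'remote_copy_list': []}
add_parent_folder(calcinfo, 'comp-uuid', '/scratch/parent', 'parent_calc',
                  same_machine=True, may_modify_parent=False)
add_parent_folder(calcinfo, 'comp-uuid', '/scratch/parent', 'parent_calc',
                  same_machine=False, may_modify_parent=False)
```

Both lists receive the same three-element tuple; only the list it lands in differs.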
The method has been renamed from parse_with_retrieved to parse, and the signature is now parse(self, retrieved_temporary_folder=None, **kwargs). The retrieved_temporary_folder argument will be passed as a keyword argument to the parse function.
The retrieved_temporary_folder is an absolute path to a temporary folder on disk that is automatically deleted after parsing is done. By default it is empty, but it will contain data if you specify files to temporarily retrieve in the retrieve_temporary_list attribute on the CalcInfo instance in CalcJob.prepare_for_submission.
output_folder.open(relative_path) replaces the deprecated output_folder.get_abs_path(relative_path).
To get the FolderData node with the raw data retrieved by the engine, use the following:
try:
    output_folder = self.retrieved
except exceptions.NotExistent:
    return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER

with output_folder.open('output_file_name', 'rb') as handle:
    self.out('output_link_label', SinglefileData(file=handle))
Note that if you use this method of passing a filelike object to the SinglefileData constructor, it is best to open it in binary mode. That is why we pass 'rb' in the open call.
As shown in the above example, the return signature of parse has changed as well.
In aiida-core 0.12, we used to return a boolean (signalling whether parsing was successful) plus the list of output nodes:
def parse(self, **kwargs):
    success = False
    node_list = []
    if some_problem:
        self.logger.error("No retrieved folder found")
        return success, node_list
This has been replaced by returning an aiida.engine.ExitCode, or nothing if parsing is successful. Adding output nodes is handled by self.out (see next section).
def parse(self, **kwargs):
    if some_error:
        return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER
Here, we are using an exit code defined in the spec of a CalcJob like so:
class SomeCalcJob(engine.CalcJob):

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.exit_code(100, 'ERROR_NO_RETRIEVED_FOLDER', message='The retrieved folder data node could not be accessed.')
Note: We recommend defining exit codes in the spec. It is also possible to define them directly in the parser, however:
def parse(self, **kwargs):
    if some_error:
        return aiida.engine.ExitCode(418)
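The new return convention (an exit code on failure, nothing on success) can be illustrated without AiiDA. In the sketch below, ExitCode is a hypothetical namedtuple standing in for aiida.engine.ExitCode with assumed field names, and exit_status is an illustrative helper that treats a missing return value as status 0; neither is the aiida-core implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for `aiida.engine.ExitCode`; the field names are assumptions.
ExitCode = namedtuple('ExitCode', ['status', 'message'])

ERROR_NO_RETRIEVED_FOLDER = ExitCode(
    100, 'The retrieved folder data node could not be accessed.')


def parse(retrieved):
    """Return an exit code on failure and nothing (`None`) on success."""
    if retrieved is None:
        return ERROR_NO_RETRIEVED_FOLDER
    # ... parse the files and attach outputs here ...
    return None


def exit_status(result):
    """Interpret the return value: nothing returned means success (assumed status 0)."""
    return 0 if result is None else result.status


ok = exit_status(parse({'aiida.out': 'data'}))
failed = exit_status(parse(None))
```

Compared to the old boolean-plus-list return value, a single optional exit code lets the caller distinguish between different failure modes by their status.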
Instead of returning a tuple of outputs after parsing is completed, use the function self.out at any point in the parse function in order to attach an output node to the CalcJobNode representing the execution of the CalcJob class.
For example, this adds a Dict node as an output with the link label results.
output_results = {'some_key': 1}
self.out('results', Dict(dict=output_results))
By default, you will need to declare your outputs in the spec of the Process:
@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.output('results', valid_type=Dict, required=True, help='the results of the calculation')
    spec.output('structure', valid_type=StructureData, required=False, help='optional relaxed structure')
    spec.default_output_node = 'results'
Like inputs, outputs can be required or not. If a required output is missing after parsing, the calculation is marked as failed.
You can choose to forego this check to gain flexibility by making the outputs dynamic in the spec:
spec.outputs.dynamic = True
spec.outputs.valid_type = Data
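The required-output check described above can be pictured with a few lines of plain Python. This is a conceptual sketch only: missing_required_outputs and the dictionary that maps each output label to its required flag are hypothetical and do not reflect how aiida-core implements the validation.

```python
def missing_required_outputs(output_spec, attached):
    """Return the labels of required outputs that were not attached."""
    return sorted(
        label for label, required in output_spec.items()
        if required and label not in attached
    )


# Mirrors the declared outputs: 'results' is required, 'structure' is optional.
output_spec = {'results': True, 'structure': False}

complete = missing_required_outputs(output_spec, {'results': {'some_key': 1}})
incomplete = missing_required_outputs(output_spec, {})
```

In this picture, a non-empty result corresponds to the situation in which the calculation would be marked as failed.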
In order to access inputs of the original calculation, you can use the property self.node to get the CalcJobNode. Example: self.node.inputs.structure.