Sphinx White Paper

#Sphinx White Paper

APBS 2.0, aka Sphinx, is an effort to create a modern platform for performing electrostatic calculations and operations. Its main feature is a "pluggable" architecture that makes it easily extensible. At the same time, we aim to keep Sphinx backward compatible with existing APBS *.in files.

Sphinx itself is simply the architecture for which plugins are written. The "real" work is performed by the plugins. For example, the functionality of APBS and PDB2PQR will be migrated to plugins. It is hoped that other community modules will follow. Besides the mechanical hosting of plugin modules, the main purpose of the Sphinx architecture is to facilitate the routing of data between its plugins.

For example, one plugin may read PDB data and pass it to PDB2PQR for processing; the result is eventually passed to the Multigrid solver, which generates a dx-file. This essentially integrates what was once several different command line programs, with several different files to keep track of, into a single, unified tool.

##Execution Pipelines

There needs to be a means of specifying how data is routed from plugin to plugin. A set of plugins connected end-to-end, where each plugin feeds its output to the input of the next, is referred to as an Execution Pipeline, or simply a pipeline.

For maximum flexibility, pipelines are specified in a text file, which by convention has the apbs extension. A set of pre-built workflows will be provided to the end user so that it's not necessary to create a pipeline configuration to perform routine tasks.
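The end-to-end chaining described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_pipeline` and the three stage functions are hypothetical stand-ins, not Sphinx's API, and the charge and radius values are placeholders rather than anything PDB2PQR would compute.

```python
from functools import reduce
from typing import Any, Callable, Iterable

# A stage is any callable that takes the previous stage's output.
Stage = Callable[[Any], Any]


def run_pipeline(stages: Iterable[Stage], initial: Any = None) -> Any:
    """Feed each stage's output into the next stage's input, in order."""
    return reduce(lambda data, stage: stage(data), stages, initial)


# Hypothetical stand-ins for real plugins (names and data are assumptions).
def read_pdb(_: Any) -> list:
    # A source ignores its input and produces data.
    return [{"name": "atom1", "x": 53.66, "y": 5.42, "z": 66.53}]


def assign_charges(atoms: list) -> list:
    # A processor transforms data; the values here are placeholders.
    return [dict(atom, charge=0.0, radius=1.0) for atom in atoms]


def summarize(atoms: list) -> dict:
    # A sink would normally write a file; here it just returns a summary.
    return {"atom_count": len(atoms),
            "total_charge": sum(a["charge"] for a in atoms)}


result = run_pipeline([read_pdb, assign_charges, summarize])
print(result)
```

A pipeline file would then only need to name the stages and their order; the format of that file is not specified here.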

###Configuration Files

##Plugins

###Sources

###Sinks

###Processors

##Databus

Of course, plugins must communicate to do real work. Given the plethora of plugins that exist or are yet to be devised, the communication mechanism must be generic enough to be useful across current and future plugins. However, it must not be so generic that an unnecessary onus is placed upon the plugin writer. Finally, the sum of the data passed around during an APBS session should be part of a cohesive whole. As a whole, the data may be operated upon later: searched, indexed, reviewed, bits pulled out for further investigation, etc. This may convey advantages to plugin writers as well.

As of this writing, the Sphinx prototype has bypassed everything outlined above in favor of getting something working in the vein of what we envision. As such, the pipelines are constructed manually, and the data format passed between plugins is tacitly agreed upon by the plugins themselves. Clearly, this is insufficient in the long run.

Going forward, we would like to develop a framework that supports the definition and use of data types useful to the subject domain of APBS. Sphinx will ship with some predefined types, but it will generally be up to plugin writers to define those that are useful for their plugins.

The process is imagined to consist of first registering a data type with the databus. This requires that the data type have a name and a structure, the latter defined via a JSON schema. Once registered, the databus incorporates its structure into the central data representation.
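A minimal sketch of what type registration might look like follows. The `Databus` class and its `register_type` and `knows` methods are hypothetical names chosen for illustration. The schema is written in JSON Schema vocabulary, but the sketch performs only a trivial sanity check; a real implementation would presumably hand the schema to a proper JSON Schema validator.

```python
class Databus:
    """Hypothetical registry mapping data type names to their schemas."""

    def __init__(self) -> None:
        self._schemas = {}

    def register_type(self, name: str, schema: dict) -> None:
        # Names are path-like, so reject empty names and stray slashes.
        if not name or name.startswith("/") or name.endswith("/"):
            raise ValueError(f"invalid type name: {name!r}")
        if name in self._schemas:
            raise ValueError(f"type already registered: {name}")
        # Assumption: only a sanity check here, not full schema validation.
        if not isinstance(schema, dict) or "type" not in schema:
            raise ValueError("schema must be a dict with a 'type' key")
        self._schemas[name] = schema

    def knows(self, name: str) -> bool:
        return name in self._schemas


position_schema = {
    "type": "object",
    "properties": {
        "x": {"type": "number"},
        "y": {"type": "number"},
        "z": {"type": "number"},
    },
    "required": ["x", "y", "z"],
}

bus = Databus()
bus.register_type("sphinx/atomic/position", position_schema)
print(bus.knows("sphinx/atomic/position"))
```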

Incorporation is possible because of the data type naming convention, which is Unix filesystem path-like, going from general to specific -- e.g., sphinx/atomic/position/x, sphinx/atomic/position/y. These names naturally form a tree, and therefore the central representation is a tree of data instances. New data types are incorporated into the tree in the appropriate location.
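To illustrate how path-like names could map onto a tree, here is a small sketch. The `Node` class and the `incorporate` and `paths` helpers are assumptions made for this example and do not come from the Sphinx prototype.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One level of the type tree; schema is set only for registered types."""
    schema: Optional[dict] = None
    children: dict = field(default_factory=dict)


def incorporate(root: Node, type_name: str, schema: dict) -> None:
    """Walk (and create as needed) one node per path segment."""
    node = root
    for part in type_name.split("/"):
        node = node.children.setdefault(part, Node())
    node.schema = schema


def paths(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield the full name of every registered type, depth first."""
    for name, child in sorted(node.children.items()):
        path = f"{prefix}/{name}" if prefix else name
        if child.schema is not None:
            yield path
        yield from paths(child, path)


root = Node()
incorporate(root, "sphinx/atomic/position/x", {"type": "number"})
incorporate(root, "sphinx/atomic/position/y", {"type": "number"})
print(list(paths(root)))
```

Because both names share the prefix sphinx/atomic/position, the second call reuses the nodes created by the first and only adds a new leaf.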

From the JSON representation, the databus can generate Python objects from instances of the data, upon which the plugins can operate. For example, the following JSON can be easily converted to a Python object.

```json
{
  "sphinx": {
    "atomic": [
      {
        "atom1": {
          "position": {
            "x": 53.66,
            "y": 5.42,
            "z": 66.53
          }
        }
      }
    ]
  }
}
```

And then the plugin can access the position information as a Python dictionary.
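As a concrete illustration, the standard library's `json` module is enough to turn the example document into nested Python dictionaries and lists. The variable names below are arbitrary; how the databus would actually hand the data to a plugin is not specified here.

```python
import json

# The example document from above, as a JSON string.
document = """
{"sphinx": {"atomic": [{"atom1": {"position":
    {"x": 53.66, "y": 5.42, "z": 66.53}}}]}}
"""

data = json.loads(document)

# "atomic" is a JSON array, so it becomes a Python list.
position = data["sphinx"]["atomic"][0]["atom1"]["position"]
print(position["x"], position["y"], position["z"])
```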

Meanwhile, plugins register with the databus, letting it know the types of data that they consume and produce. The databus confirms that it knows what the requisite types look like, and that there are plugins that provide or consume the requisite types. Given the example types above (atom position data), a plugin may choose to request data from any place in the tree, e.g., sphinx/atomic/position, or even sphinx/atomic.
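One way the databus might check that every requested type has a provider is sketched below. `PluginSpec`, `overlaps`, and `unmet_inputs` are hypothetical names, and the matching rule (a request is satisfied when the requested and produced paths are equal or one lies beneath the other in the tree) is an assumption about how requests at different levels of the tree would be resolved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PluginSpec:
    """What a plugin tells the databus about itself (hypothetical shape)."""
    name: str
    consumes: Tuple[str, ...] = ()
    produces: Tuple[str, ...] = ()


def overlaps(requested: str, produced: str) -> bool:
    """True if the two paths are equal or one is an ancestor of the other."""
    return (
        produced == requested
        or produced.startswith(requested + "/")
        or requested.startswith(produced + "/")
    )


def unmet_inputs(plugins: List[PluginSpec]) -> List[Tuple[str, str]]:
    """Return (plugin name, type) pairs that no other plugin provides."""
    problems = []
    for plugin in plugins:
        offered = [t for other in plugins if other is not plugin
                   for t in other.produces]
        for wanted in plugin.consumes:
            if not any(overlaps(wanted, t) for t in offered):
                problems.append((plugin.name, wanted))
    return problems


reader = PluginSpec("pdb_reader", produces=("sphinx/atomic/position",))
solver = PluginSpec("solver", consumes=("sphinx/atomic",))
viewer = PluginSpec("viewer", consumes=("sphinx/grid",))

print(unmet_inputs([reader, solver]))
print(unmet_inputs([reader, solver, viewer]))
```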

After the databus has validated the pipeline, in terms of impedance matching of the plugin inputs and outputs, the pipeline is executed. Plugin sources and processors place data on the bus, and the bus routes it to the appropriate downstream plugins. These destination plugins receive data from the bus, do some work on it and put it back on the bus. Or if they are terminal plugins, they write their output to a file, socket, etc.
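The routing step could be sketched as a simple publish/subscribe loop. Everything here is an assumption made for illustration: the `Bus` class, the exact-name matching of types, the `example/origin_distance` type, and the list that stands in for a file written by a terminal plugin.

```python
import math
from collections import defaultdict


class Bus:
    """Hypothetical databus that routes payloads by exact type name."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, type_name: str, handler) -> None:
        self._subscribers[type_name].append(handler)

    def publish(self, type_name: str, payload) -> None:
        # Deliver to every downstream plugin registered for this type.
        for handler in list(self._subscribers.get(type_name, [])):
            handler(payload)


bus = Bus()
written = []  # stands in for a file or socket written by a sink


def distance_processor(position: dict) -> None:
    # A processor: receives data, does some work, puts the result back.
    d = math.sqrt(position["x"] ** 2 + position["y"] ** 2 + position["z"] ** 2)
    bus.publish("example/origin_distance", d)


def sink(value: float) -> None:
    # A terminal plugin: consumes data and publishes nothing further.
    written.append(value)


bus.subscribe("sphinx/atomic/position", distance_processor)
bus.subscribe("example/origin_distance", sink)

# A source places data on the bus; the bus routes it downstream.
bus.publish("sphinx/atomic/position", {"x": 3.0, "y": 4.0, "z": 0.0})
print(written)
```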

One nice side effect of this work is that it should be possible to somewhat automagically build pipelines given input sources and output sinks. For example, given a PDB input file and an output .dx file, the databus could conceivably infer that a PDB reader plugin should feed its output to PDB2PQR, which then needs to feed a solver plugin. Given this partial pipeline information, a set of alternatives can be presented to the user to complete, and then run the pipeline.
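Pipeline inference of this kind can be framed as a graph search over data types. The sketch below uses a breadth-first search to find the shortest chain of plugins from a source type to a sink type. The plugin catalogue, the type names in it, and the `infer_pipeline` function are all hypothetical, and each plugin is simplified to a single input and a single output type.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical catalogue: plugin name -> (consumed type, produced type).
PLUGINS: Dict[str, Tuple[str, str]] = {
    "pdb_reader": ("file/pdb", "sphinx/atomic"),
    "pdb2pqr": ("sphinx/atomic", "sphinx/atomic/charged"),
    "multigrid_solver": ("sphinx/atomic/charged", "sphinx/potential"),
    "dx_writer": ("sphinx/potential", "file/dx"),
}


def infer_pipeline(source_type: str, sink_type: str,
                   plugins: Dict[str, Tuple[str, str]]) -> Optional[List[str]]:
    """Breadth-first search for the shortest plugin chain, or None."""
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current, chain = queue.popleft()
        if current == sink_type:
            return chain
        for name, (consumes, produces) in plugins.items():
            if consumes == current and produces not in seen:
                seen.add(produces)
                queue.append((produces, chain + [name]))
    return None


print(infer_pipeline("file/pdb", "file/dx", PLUGINS))
```

This search returns only one chain. Presenting a set of alternatives to the user would require enumerating every path rather than stopping at the first.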

##Remote Jobs

###MPI

###ØMQ?

##User Interaction

###Command Line

####Single Run

####REPL

####Browser-Based UI
