
Gumtree : Python Scripting Interface of Gumtree Programmer's Guide

nxi edited this page Mar 9, 2017 · 1 revision
Created by Norman Xiong, last modified on May 13, 2013
Contents
Dataset - Multi-Dimensional Data Container
Overview
The Basics
Python Import
Dataset Creation
Output
Operation
Indexing
Iterating
Set Values
The Analysis Interface
Error Propagation
Nexus Axes
Nexus Metadata
Normalisation
Nexus Import and Export
Examples
Plotting – Engine of Curve and Image Plot
Overview
Curve Plot Interface – Plot
Create Curve Plot
Dataset Management
Axis Control
Rendering Control
Mask Control
I/O Control
Image Plot Interface – Image
Create Image Plot
Dataset Management
Axis Control
Rendering Control
Mask Control
I/O Control
Examples
Plot 1D
Image 2D
Appendix – Architecture Diagram

Dataset - Multi-Dimensional Data Container

Overview


The Dataset interface borrows many function names from NumPy. In addition, it has its own functions that make Nexus data reduction easier.
In implementation, the Python dataset is a wrapper around a Java GDM object. Much of the logic runs on the Java side, and the object wrapped by a dataset can be referenced from Java.
Categories:
  • Constructing
  • Indexing
  • Iterating
  • Slicing
  • Modifying
  • Arithmetic operations
  • Mathematical and statistical functions
Implemented Numpy methods (1.6):
  • Array attributes: shape, ndim, size, dtype, tolist
  • Array methods: copy, fill, flatten, reduce, take, put, max, min, sum, prod
  • Array creation: instance, zeros, ones, linspace, arange, zeros_like, ones_like, asarray, rand
  • Array manipulation: tile, concatenate, take, column_stack, vstack, hstack, dstack, array_split, split, vsplit, hsplit, dsplit
  • Array modification: fill, append, delete, put
  • Maths: add, subtract, multiply, divide, negative, power, exp, ln, log10, sqrt, angle, remainder, sum
  • Trig: sin, cos, tan, arcsin, arccos, arctan
  • Stats: Random: rand, engine, seed
Exclusive:
  • Error propagation – quick access to var and err
  • Normalisation – normalising against nexus metadata
  • File accessing – nexus hdf file access
  • Nexus metadata – quick access to nexus metadata
  • Rich axes information – carry axis information in the nexus way

The Basics

Python Import

  • To import the project into Python workspace, use: from gumpy.nexus import *

Dataset Creation

  • arange() – creates a dataset in a way similar to the range() command, optionally with a given shape and type.

>>> from gumpy.nexus import *
>>> a = arange(12, [3, 4])
>>> a
Dataset(Array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]),
name='357',
var=Array([[0.00000000, 1.00000000, 2.00000000, 3.00000000],
[4.00000000, 5.00000000, 6.00000000, 7.00000000],
[8.00000000, 9.00000000, 10.00000000, 11.00000000]]),
axes=[SimpleData(Array([0,]),
])
  • instance(shape, init = 0, dtype = float) – creates a dataset with the given shape, type and initial value.

>>> from gumpy.nexus import *
>>> b = instance([2, 3], 3, int)
>>> b
Dataset(Array([[3.00000000, 3.00000000, 3.00000000],
[3.00000000, 3.00000000, 3.00000000]]),
name='672',
var=Array([[3.00000000, 3.00000000, 3.00000000],
[3.00000000, 3.00000000, 3.00000000]]),
axes=[SimpleData(Array([0,]),
])
  • zeros(shape, dtype = float) – creates a dataset of the given shape and type, filled with zeros.

>>> from gumpy.nexus import *
>>> b = zeros([2,])
  • ones(shape, dtype = float) – creates a dataset of the given shape and type, with every value initialised to 1.

>>> from gumpy.nexus import *
>>> b = ones([2,], int)
  • rand(shape, dtype = float) – creates a dataset of the given shape and type with random initial values. The values are uniformly distributed in [0, 1). See the Python documentation of the package for how to use different engines and seeds to generate random values.

>>> from gumpy.nexus import *
>>> b = rand([2,], float)
  • asarray(obj, dtype = None) – creates a dataset from list values. The shape and type of the dataset match those of the list object.

>>> from gumpy.nexus import *
>>> b = asarray([[1, 2], [3, 4]])
  • Other functions that create a dataset include zeros_like(), ones_like() and linspace().
 

Output

  • To print the dataset as String in console, use str(dataset).

>>> from gumpy.nexus import *
>>> b = asarray([[1, 2], [3, 4]])
>>> print str(b)
title: 912
storage: [[1 2]
[3 4]]

error: [[1.00000000, 1.41421354]
[1.73205078, 2.00000000]]

axes:
0. title: dim_0
units:
storage: [0]
1. title: dim_1
units:
storage: [0]
  • To get the printable representation of the dataset, use repr(dataset).

>>> print repr(b)
Dataset(Array([[1, 2],
[3, 4]]),
title='912',
var=Array([[1.00000000, 2.00000000],
[3.00000000, 4.00000000]]),
axes=[SimpleData(Array([0,]),
])
  • To get a simple view of the dataset, use dataset.storage.

>>> print b.storage
[[1 2]
[3 4]]
  • To present the dataset as a possibly nested list, use dataset.tolist().

>>> print b.tolist()
[[1, 2], [3, 4]]
Operation
Datasets carrying error information will perform error propagation in operations.
  • Simple operators: +, -, *, /, %, **
  • In place operators: +=, -=, *=, /=, %=
  • Comparison operators: ==, !=
  • Math functions, such as exp(), log10(), ln(), sqrt()
  • Trigonometric functions, such as sin(), cos(), tan(), arcsin(), arccos(), arctan()
  • Statistical functions, such as sum(), prod(), max(), min(), all(), any()
 

Indexing

  • Use dataset[index] syntax. For example,

>>> print b[0, 1]
2
  • Slicing: basic slicing functions are supported.
    • The simplest way to slice a dataset: dataset[i] takes the i-th slice of the dataset, assuming the rank of the dataset is greater than 1.
    • Use dataset[i:j:k] to take a slice of the dataset, where i is the starting index, j is the stopping index and k is the step (k ≠ 0).

>>> c = arange(12, [6, 2])
>>> print c.storage
[[ 0 1]
[ 2 3]
[ 4 5]
[ 6 7]
[ 8 9]
[10 11]]

>>> print c[1:6:2].storage
[[ 2 3]
[ 6 7]
[10 11]]
    • A combination of the above two methods.

>>> print c[1:6:2,].storage
[[ 3]
[ 7]
[11]]
    • To take a slice from a dimension other than the first one:

>>> print c[:,].storage
[[ 1]
[ 3]
[ 5]
[ 7]
[ 9]
[11]]
    • Multiple slicing.

>>> print c[1:6:2][1:3].storage
[[ 6 7]
[10 11]]
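
The start:stop:step expressions above read like ordinary Python slices applied to the first dimension. As a rough illustration that runs without Gumtree, the sketch below uses a plain nested list as a stand-in for the 6 x 2 dataset. The list and variable names are illustrative only and are not part of gumpy.

```python
# Plain nested list standing in for a 6 x 2 dataset holding 0..11 (not a gumpy Dataset).
rows = [[2 * i, 2 * i + 1] for i in range(6)]

# rows[i:j:k]: start at index 1, stop before index 6, step 2.
every_other = rows[1:6:2]

# Slicing twice applies the second slice to the result of the first.
chained = rows[1:6:2][1:3]

# Nested lists have no multi-dimensional indexing, so a column is picked row by row.
second_column = [row[1] for row in rows]

print(every_other)    # [[2, 3], [6, 7], [10, 11]]
print(chained)        # [[6, 7], [10, 11]]
print(second_column)  # [1, 3, 5, 7, 9, 11]
```

The first two results match the storage printed for c[1:6:2] and c[1:6:2][1:3] above.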

Iterating

  • Iterating with a for loop. If the rank of a dataset is greater than 1, a for loop iterates over the slices of the dataset. The code below uses a for loop to iterate over a 2-dimensional dataset and prints the storage of each object in the iteration. In this case each object is a 1-dimensional slice of the dataset.

>>> c = arange(12, [6, 2])
>>> for obj in c :
...     print obj.storage
[0 1]
[2 3]
[4 5]
[6 7]
[8 9]
[10 11]
The example below uses a for loop to iterate over a 1-dimensional dataset. Each iteration returns an integer.
>>> d = arange(6)
>>> for obj in d :
...     print obj
0
1
2
3
4
5
  • Iterating with iter() and next(). This behaves like for-loop iteration. The object retrieved in each iteration has one dimension fewer than the parent dataset.

>>> e = arange(6, [2, 3])
>>> it = iter(e)
>>> while it.has_next() :
...     print it.next().storage
[0 1 2]
[3 4 5]
>>> e1 = e[1]
>>> it1 = iter(e1)
>>> while it1.has_next() :
...     print it1.next()
3
4
5
  • The value iterator. Use item_iter() and the next() method of the returned iterator to iterate over all values of the dataset.

>>> e = arange(6, [2, 3])
>>> iit = e.item_iter()
>>> while iit.has_next() :
...     print iit.next()
0
1
2
3
4
5
  • A section iterator. Use section_iter(shape) and next_section() to iterate over sub-sections of the dataset, where shape is the shape of each sub-section.

>>> f = arange(16, [2,])
>>> f.section_iter([2, 2])
>>> while f.has_next_section() :
...     print f.next_section().storage
[[0 1]
[2 3]]

[[4 5]
[6 7]]

[[8 9]
[10 11]]

[[12 13]
[14 15]]
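
To make the difference between the value iterator and the section iterator concrete, here is a minimal sketch in plain Python. It assumes row-major storage and sections that tile the trailing two dimensions, which is consistent with the outputs above but is not confirmed as the gumpy implementation. The function names iter_items and iter_sections are hypothetical.

```python
def iter_items(nested):
    """Yield every scalar of a nested list in row-major order."""
    for element in nested:
        if isinstance(element, list):
            for value in iter_items(element):
                yield value
        else:
            yield element


def iter_sections(flat_values, section_shape):
    """Yield consecutive 2D sections (lists of rows) cut from a flat list."""
    n_rows, n_cols = section_shape
    size = n_rows * n_cols
    for start in range(0, len(flat_values), size):
        chunk = flat_values[start:start + size]
        yield [chunk[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


items = list(iter_items([[0, 1, 2], [3, 4, 5]]))
sections = list(iter_sections(list(range(16)), (2, 2)))
print(items)        # [0, 1, 2, 3, 4, 5]
print(sections[0])  # [[0, 1], [2, 3]]
```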

Set Values

  • Use indexing to set values in the dataset. This calls the native __setitem__() method. One can assign a scalar value to a position in the dataset, or assign values from a list to a section of the dataset, in which case the size of the list must be greater than or equal to the size of the indexed section.

>>> f[0,] = 44
>>> f[0,] = [11,]
>>> f[0,] = [1,]
>>> f[0,] = [1]
IndexError: index out of range: 1
  • To set a value to the whole dataset, use fill(value) command.

>>> f.fill(12)
  • The copy_from() command. Use this method to copy values from another dataset or from array-like data. Optionally, set a length to limit how many values are copied.

>>> a = arange(24, [4,])
>>> b = instance([10], dtype = int)
>>> b.copy_from(a, 5)
[0]
  • To make a deep copy of a dataset, use the copy(dataset) or dataset.copy() command.
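
As a rough picture of copying with a length limit, the sketch below works on flat Python lists. The helper name copy_values is hypothetical, and clamping the count to the shorter of the two lists is an assumption made for this example; gumpy may instead raise an error.

```python
def copy_values(target, source, length=None):
    """Copy up to `length` leading values of `source` into `target`, in place."""
    count = len(source) if length is None else length
    # Assumption for this sketch: never copy past the end of either list.
    count = min(count, len(source), len(target))
    target[:count] = source[:count]
    return target


b = [0] * 10
copy_values(b, list(range(24)), 5)
print(b)  # [0, 1, 2, 3, 4, 0, 0, 0, 0, 0]
```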
 

The Analysis Interface

The dataset is mapped to a Nexus file, stored either on a physical drive or only in memory. A Nexus file can be loaded into a dataset efficiently. Because the dataset is intended for use in analysis programs, it carries interfaces for data reduction and analysis.

Error Propagation

A dataset normally carries error information, which is stored in memory as variance. To access the variance, use dataset.var. To access the error, use dataset.err or dataset.error. Getting the error calls a square root function on the variance and is therefore costly.
When initialising the dataset, one can choose whether to set the variance. If no variance is set, by default a copy of the data storage is used as the variance.
  • Initialise variance: to initialise variance values when creating a dataset, add the var argument to the command. For example:

>>> a = asarray([2,], var = [1.2,])
>>> print a
title: 102
units:
storage: [2]
error: [1.09544516,]
axes:
0. title: dim_0
units:
storage: [0]
  • No variance: it is also possible to create a dataset without any variance. Set default_var to False in the argument list when creating the dataset. Error propagation behaves differently for a dataset without variance. For example:

>>> a = asarray([2,], default_var = False)
>>> print a
title: 102
units:
storage: [2]
axes:
0. title: dim_0
units:
storage: [0]
  • The variance can be set after the dataset is created, by assigning values to dataset.var or dataset.err.
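
The relationship between variance and error, and the way variance typically propagates through arithmetic, can be sketched with scalar values. The formulas below are the standard first-order rules for independent quantities; whether gumpy applies exactly these rules in every operation is an assumption, and the function names are illustrative.

```python
import math


def error_of(variance):
    """The error is the square root of the stored variance."""
    return math.sqrt(variance)


def add(a, var_a, b, var_b):
    """Sum of two independent values: variances add."""
    return a + b, var_a + var_b


def multiply(a, var_a, b, var_b):
    """Product of two independent values (first-order propagation)."""
    return a * b, b * b * var_a + a * a * var_b


print(round(error_of(1.2), 6))        # 1.095445
print(add(2.0, 1.0, 3.0, 2.0))        # (5.0, 3.0)
print(multiply(2.0, 1.0, 3.0, 1.0))   # (6.0, 13.0)
```

The first line reproduces the error of about 1.0954 shown above for a variance of 1.2.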
 

Nexus Axes

If the dataset is loaded from a Nexus file, it carries the axes information provided by the file. If a dataset is created with a helper function, the axes can be initialised in the argument list. If no axes information is provided, index values are created as the default axes.
  • Example of default axes:

>>> a = arange(24, [2,])
>>> for axis in a.axes :
...     print axis
title: dim_0
units:
storage: [0]
title: dim_1
units:
storage: [0]
title: dim_2
units:
storage: [0]
  • To set the axes of the dataset, use the dataset.axes = values syntax. If None is set, the dataset will have an empty list as its axes property.
 
  • To access the axis of a specific dimension, use dataset.axes[i]. For example:

>>> print a.axes[1]
title: dim_1
units:
storage: [0]
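
The default behaviour described above, one index axis per dimension, can be pictured with a short sketch. The helper default_axes and its dictionary layout are hypothetical; only the dim_N titles are taken from the output shown in this section.

```python
def default_axes(shape):
    """Build one index axis per dimension, titled dim_0, dim_1, ..."""
    return [
        {'title': 'dim_%d' % dim, 'storage': list(range(length))}
        for dim, length in enumerate(shape)
    ]


axes = default_axes([2, 3])
for axis in axes:
    print(axis['title'], axis['storage'])
```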

Nexus Metadata

Dataset provides an interface to access Nexus metadata. Nexus metadata are treated as public fields of the dataset. For example, to get the wavelength value of the dataset, simply use dataset.wavelength. To change the value of the property, assign to it in the same way. For example:
>>> ds.wavelength = 5.1
>>> print ds.wavelength
5.1
To expose a metadata item in the Nexus file as an easily accessible property, a path table needs to be provided. Before a dataset is created, one can set a dictionary file that contains the path table information on the Dataset class. For example:
>>> Dataset._dicpath_ = '/usr/dic'
After a dataset has been created, it is still possible to add entries to the path table. Use dataset.dict.addEntry(name, xpath) to append entries, where name is a short name for the metadata and xpath is the path used to access the metadata in the Nexus file. For example:
>>> ds = arange(5)
>>> ds.dict.addEntry('theta', '$entry/data/dim_0')
>>> print ds.theta
title: dim_0
units:
storage: [0]
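
One way to picture a path table is as a mapping from short names to locations in a tree, with attribute access redirected through that mapping. The sketch below uses nested dictionaries in place of a Nexus file. The class MetadataView, its add_entry method and the slash-separated paths are all hypothetical and do not reproduce the gumpy implementation or its xpath syntax.

```python
class MetadataView(object):
    """Hypothetical stand-in for a dataset that exposes metadata through a path table."""

    def __init__(self, tree):
        # Write to __dict__ directly so that __setattr__ below is not triggered.
        self.__dict__['_tree'] = tree    # nested dicts standing in for a Nexus file
        self.__dict__['_paths'] = {}     # short name -> slash-separated path

    def add_entry(self, name, path):
        self._paths[name] = path

    def _locate(self, name):
        # Return the parent node and the final key for a registered short name.
        if name not in self._paths:
            raise AttributeError(name)
        keys = self._paths[name].strip('/').split('/')
        node = self._tree
        for key in keys[:-1]:
            node = node[key]
        return node, keys[-1]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        node, key = self._locate(name)
        return node[key]

    def __setattr__(self, name, value):
        node, key = self._locate(name)
        node[key] = value


view = MetadataView({'entry': {'instrument': {'wavelength': 2.4}}})
view.add_entry('wavelength', 'entry/instrument/wavelength')
view.wavelength = 5.1
print(view.wavelength)  # 5.1
```

Names that are not in the path table raise AttributeError, so a typo does not silently create a new field.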

Normalisation

The dataset can be normalised against certain metadata, for example total counts or counting time. To enable normalisation, set the normalising factor on the dataset factory. For example:
>>> DatasetFactory._normalising_factor_ = 'monitor_data'
Normalisation is also performed when two datasets are added together.
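
The arithmetic behind normalising against a monitor value can be sketched as follows. This is an illustration under stated assumptions rather than the gumpy rule: the helper normalise, its reference argument and the treatment of the monitor value as exact (no variance of its own) are all choices made for the example.

```python
def normalise(counts, variances, monitor, reference=1.0):
    """Scale counts to a common monitor value; variances scale with the square."""
    scale = reference / float(monitor)
    scaled_counts = [value * scale for value in counts]
    scaled_variances = [variance * scale * scale for variance in variances]
    return scaled_counts, scaled_variances


norm_counts, norm_vars = normalise([10.0, 20.0], [10.0, 20.0], monitor=2.0)
print(norm_counts)  # [5.0, 10.0]
print(norm_vars)    # [2.5, 5.0]
```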

Nexus Import and Export

Importing:
To load a Nexus file into a dataset, simply use the constructor Dataset(filepath), for example:
>>> ds = Dataset('/user/data/nexusdata.nx.hdf')
A helper function loads ANSTO Nexus data from a preset data folder. Set the folder path and instrument prefix first, then use df[index] to access a file that follows the naming convention [instrument][seven-digit number].nx.hdf. For example:
>>> DatasetFactory._prefix_ = 'ECH'
>>> DatasetFactory._path_ = 'user/data/current'
>>> ds = df[4918]
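
The naming convention can be sketched as a small formatting helper. The function nexus_filename is hypothetical and only illustrates how an index such as 4918 maps to a file name such as ECH0004918.nx.hdf.

```python
def nexus_filename(prefix, index):
    """Build a file name from an instrument prefix and a zero-padded seven-digit index."""
    return '%s%07d.nx.hdf' % (prefix, index)


print(nexus_filename('ECH', 4918))  # ECH0004918.nx.hdf
```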
Exporting:
The Dataset interface supports exporting to a Nexus HDF file. To save a copy of the dataset to a given path, use the save_copy(file_path) command.
>>> ds.save_copy('/user/data/reduced/ECH0004918.reduced.hdf')
It is also possible to save changes back to the file from which the dataset was loaded. To save a change to a single metadata item, provide the name of that item. If no name is provided, everything is overwritten.
>>> ds.save('theta')

Examples

Here is an example of using the Dataset class in a NumPy-style routine:
from gumpy.nexus import *
# create a dataset instance with given shape
ds = instance([3, 4, 4])
# fill data by slicing
for block in ds :
    for row in block :
        row.copy_from(arange(4))
# math calculation
ds += arange(48, [3, 4, 4]) * 2.0
# array manipulation
dss = split(ds, 2, axis = 1)
ds = dss[0]
# interact with a Python list
ds[0] *= sin(asarray([[1, 2, 2, 3], [2, 1, 3, 2]]))
# construct from repr
new_ds = eval(repr(ds))
print new_ds

Below is an example of Dataset usage in Nexus data reduction.
#######################################################################
# reduction.py
# library of data reduction
# use ECH[id] to load data, e.g. use ECH[4918] to load ECH0004918.nx.hdf
#
#######################################################################
from gumpy.nexus import *
from gumpy.echidna import *
# control parameters
do_background = True
do_efficiency = True
background_ds = ECH[backgroundFile]
efficiency_ds = ECH[efficiencyMap]
def reduce(ds):
    # background correction
    if do_background :
        print 'do background correction ... ',
        do_bkg(ds)
        print 'done'
    # efficiency correction
    if do_efficiency :
        print 'do efficiency correction ... ',
        ds = do_eff(ds)
        print 'done'
    # reduce the time_of_flight dimension
    if ds.ndim > 3 :
        ds = ds.get_reduced(1)
    # do stitching
    print 'do stitching ... ',
    stds = stitch(ds)
    stds._copy_metadata_(ds, 0)
    ds = stds
    print 'done'
    # do vertical integration
    print 'do integration ... ',
    ds = v_intg(ds)
    print 'done'
    res = ds
    return res
# use this method to do background correction
def do_bkg(ds):
    for i in xrange(len(ds)) :
        if i < len(background_ds) :
            ds[i] -= background_ds[i]
    # remove negative values
    it = ds.item_iter()
    while it.has_next() :
        value = it.next_value()
        if value < 0 :
            it.set_current(0)
    return ds
# use this method to do efficiency correction
def do_eff(ds):
    ds /= efficiency_ds
    return ds
# use this method to do data stitching
def stitch(ds):
    nshape = [ds.shape[1],]
    res = dataset.instance(nshape)
    rhaxis = simpledata.instance([nshape[1]])
    haxis = ds.axes[-1]
    rhaxis.title = haxis.title
    sth = ds.sth
    i_frame = ds.shape[0]
    for i in xrange(ds.shape[0]) :
        res[:,] = ds[i]
        rhaxis[slice(i,] = haxis + sth[i]
        print '... ',
    raxes = [ds.axes[-2],]
    res.set_axes(raxes)
    return res
# use this method to do vertical integration
def v_intg(ds):
    return ds.sum(1)


#######################################################################
# testReduction.py
# batch reducing data 4918 to 4963 in Echidna data source path
#######################################################################
from gumpy.echidna.reduction import *
start_id = 4918
stop_id = 4964
viewer = browser.DataBrowser(True)
ress = []
for id in xrange(start_id, stop_id + 1) :
    ds = ECH[id]
    print ds.title + ' loaded'
    viewer.add(ds)
    res = reduce(ds)
    new_title = ds.title.split('.')[0] + '.reduced.hdf'
    res.title = new_title
    viewer.add(res)
    ress.append(res)
    print 'export result ... ',
    res.save_copy(save_path + '/' + new_title)
    print 'done'

Plotting – Engine of Curve and Image Plot

Overview

The Python interface for plotting in Gumtree provides scientific plotting for one-dimensional curve plots and two-dimensional image plots. The interface creates plot objects in Java and provides convenient access to them with Python syntax.
Curve plot:
The curve plot is also called plot 1D. It takes vector datasets as input. A dataset may carry axis information, which is used to scale the horizontal axis of the plot. Multiple datasets can be plotted together, and each dataset in the plot is assigned a unique colour. The interface provides functions for managing the datasets and how they are rendered.
Image plot:
The image plot is also called image 2D. It takes a single two-dimensional dataset as input. The dataset may carry information for up to two axes, which is used to scale the vertical and horizontal axes of the plot. The plot renders the dataset as a 2D histogram image.
The Python plot interface depends on the Gumtree workbench environment, although the Java plot engine does not.

Curve Plot Interface – Plot

Create Curve Plot

The curve plot interface is called Plot. A convenient way of creating an empty plot is:
>>> from gumpy.vis.plot1d import *
>>> p1 = plot()
To create a plot that has a dataset, use
>>> from gumpy.nexus import dataset
>>> ds = dataset.rand(100)
>>> p1 = plot(ds)

An example of the curve plot is shown in the following picture.
Image:attachments/230400516/230564809.png

Dataset Management

The Python interface provides convenient functions to manage datasets in plot.
  • set_dataset(ds) : set a single dataset to the plot. All existing datasets are removed and the given dataset is added to the plot.
  • add_dataset(ds) : add a single dataset to the plot. Existing datasets in the plot are left intact. An update event is triggered and the plot automatically re-ranges both axes.
  • remove_dataset(ds) : remove the given dataset from the plot. The plot re-ranges afterwards. If the given dataset does not exist in the plot, the plot does nothing.
  • select_dataset(ds) : select a dataset in the plot, which highlights it.
  • get_datasets() : quick access to the datasets. This returns a shallow copy of the datasets in the plot as a list. Modifying this list does not affect the datasets property of the plot.
  • *[plot].datasets : use the attribute to access the datasets in the same way as get_datasets(). It also returns a shallow copy of the real datasets.
  • *[plot].ds : access the first dataset in the plot.
 
  • [plot] means an instance of Plot.
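
The shallow-copy behaviour of get_datasets() can be illustrated without Gumtree. The class PlotSketch below is a hypothetical stand-in that only mimics the list handling; it is not the Plot class and does no rendering.

```python
class PlotSketch(object):
    """Hypothetical stand-in that mimics how a plot might hand out its dataset list."""

    def __init__(self):
        self._datasets = []

    def add_dataset(self, ds):
        self._datasets.append(ds)

    def get_datasets(self):
        # A new list holding the same objects: a shallow copy.
        return list(self._datasets)


plot_sketch = PlotSketch()
curve = [1.0, 2.0, 3.0]
plot_sketch.add_dataset(curve)

copied = plot_sketch.get_datasets()
copied.append('extra')                    # changes only the copied list
print(len(plot_sketch.get_datasets()))    # 1
print(copied[0] is curve)                 # True
```

Because the copy is shallow, the list itself is independent of the plot, but each entry is still the same dataset object.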

Axis Control

  • set_x_label(text) : change the label text of the horizontal axis.
  • set_y_label(text) : change the label text of the vertical axis.
  • set_log_x_on(flag) : set whether the horizontal axis uses a logarithmic scale instead of a linear scale.
  • set_log_y_on(flag) : set whether the vertical axis uses a logarithmic scale instead of a linear scale.
  • set_x_flipped(flag) : flip the curve horizontally.
  • set_y_flipped(flag) : flip the curve vertically.
  • set_x_range(min, max) : set the zooming range in the horizontal axis.
  • set_y_range(min, max) : set the zooming range in the vertical axis.
  • set_bounds(x_min, x_max, y_min, y_max) : set the zooming range in both axes.
  • restore_x_range() : reset the zooming in the horizontal axis.
  • restore_y_range() : reset the zooming in the vertical axis.
  • restore_bounds() : reset the zooming in both axes.
 

Rendering Control

  • set_error_bar_on(flag) : control whether to show error bars for all the curves.
  • set_marker_on(flag) : control whether to show marker for all the curves.
  • set_color(ds, color) : change the colour of the curve that represents the specific dataset.
  • set_marker_shape(ds, marker_shape) : change the marker shape of the curve that represents the specific dataset.
  • set_marker_filled(ds, flag) : set whether to fill the marker of the curve that represents the specific dataset.
  • set_legend_on(flag) : control whether to show the legend.
  • set_legend_title(ds, text) : change the title of the curve that is shown in the legend.
  • set_title(text) : change the title text of the plot.
  • update() : apply all the changes and re-render the plot.
 

Mask Control

  • add_mask(x_min, x_max, name, is_inclusive) : add a region mask to the plot with the given limit values. Optionally, provide a name for the mask. By default the mask is inclusive, unless is_inclusive is set to false.
  • select_mask(obj) : select a mask in the plot, either by reference or by name. This highlights the mask in the plot and makes it the target for changes.
  • remove_mask(obj) : remove a mask from the plot either by the reference or by name. The plot will update itself afterwards.
  • *[plot].masks : access the masks as an attribute. It will make a shallow copy of the mask list of the plot. Changing this copy list will not affect the original mask list in the plot.
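
How an analysis routine might interpret an inclusive or exclusive region mask can be sketched as below. The reading used here, where an inclusive mask keeps points inside the range and an exclusive mask drops them, is an assumption made for illustration, and the function apply_mask is not part of the plot interface.

```python
def apply_mask(xs, ys, x_min, x_max, is_inclusive=True):
    """Return the (x, y) points selected by a 1D region mask."""
    kept = []
    for x, y in zip(xs, ys):
        inside = x_min <= x <= x_max
        # Inclusive masks keep the points inside; exclusive masks keep the rest.
        if inside == is_inclusive:
            kept.append((x, y))
    return kept


xs = [0, 1, 2, 3]
ys = [10, 11, 12, 13]
print(apply_mask(xs, ys, 1, 2))                      # [(1, 11), (2, 12)]
print(apply_mask(xs, ys, 1, 2, is_inclusive=False))  # [(0, 10), (3, 13)]
```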
 

I/O Control

  • save_as_png(filename) : save the plot as a picture in PNG format.
  • save_as_jpg(filename) : save the plot as a picture in JPEG format.
 

Image Plot Interface – Image

Create Image Plot

The image plot interface is called Image. A convenient way of creating an empty image is:
>>> from gumpy.vis.image2d import *
>>> i1 = image()
To create an image with a dataset, use
>>> from gumpy.nexus import dataset
>>> ds = dataset.rand([100,])
>>> i2 = image(ds)

An example of the image plot is shown in the following picture.
Image:attachments/230400516/230564810.png

Dataset Management

The Python interface provides convenient functions to manage datasets in the image.
  • set_dataset(ds) : set a dataset to the image. The existing dataset gets removed and the given dataset gets rendered in the image.
  • get_dataset() : quick access to the dataset.
  • *[image].ds : use the attribute to access the dataset the same way as get_dataset().
 
  • [image] is an instance of Image.

Axis Control

  • set_x_label(text) : change the label text of the horizontal axis.
  • set_y_label(text) : change the label text of the vertical axis.
  • set_x_flipped(flag) : flip the image horizontally.
  • set_y_flipped(flag) : flip the image vertically.
  • set_x_range(min, max) : set the zooming range in the horizontal axis.
  • set_y_range(min, max) : set the zooming range in the vertical axis.
  • set_bounds(x_min, x_max, y_min, y_max) : set the zooming range in both axes.
  • restore_x_range() : reset the zooming in the horizontal axis.
  • restore_y_range() : reset the zooming in the vertical axis.
  • restore_bounds() : reset the zooming in both axes.
 

Rendering Control

  • set_color_scale(ds, color_scale) : change the rendering colour scale.
  • set_log_scale_on(flag) : set whether to use a logarithmic colour scale.
  • set_marker_filled(ds, flag) : set whether to fill the marker of the curve that represents the specific dataset.
  • set_title(text) : change the title text of the image.
  • update() : apply all the changes and re-render the image.
 

Mask Control

  • add_mask(x_min, x_max, y_min, y_max, name, is_inclusive, shape) : add a region mask to the image plot, with the given limit values of a rectangular frame. Optionally, provide a name for the mask. By default the mask is inclusive, unless is_inclusive is set to false. By default the mask is rectangular, unless shape is set to 'ellipse'.
  • select_mask(obj) : select a mask in the image, either by reference or by name. This highlights the mask in the image and makes it the target for changes.
  • remove_mask(obj) : remove a mask from the image, either by reference or by name. The image will update itself afterwards.
  • *[image].masks : access the masks as an attribute. It will make a shallow copy of the mask list of the image. Changing this copy list will not affect the original mask list in the image.
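
The difference between a rectangular and an elliptical mask defined by the same frame can be sketched with a point-in-mask test. Treating the ellipse as the one inscribed in the rectangular frame is an assumption, the frame is assumed to have non-zero width and height, and the function in_mask is hypothetical.

```python
def in_mask(x, y, x_min, x_max, y_min, y_max, shape='rectangle'):
    """Test whether a point lies inside a mask defined by a rectangular frame."""
    if not (x_min <= x <= x_max and y_min <= y <= y_max):
        return False
    if shape == 'rectangle':
        return True
    # Ellipse inscribed in the frame: centre and semi-axes come from the limits.
    centre_x = (x_min + x_max) / 2.0
    centre_y = (y_min + y_max) / 2.0
    radius_x = (x_max - x_min) / 2.0
    radius_y = (y_max - y_min) / 2.0
    return ((x - centre_x) / radius_x) ** 2 + ((y - centre_y) / radius_y) ** 2 <= 1.0


# A point near the corner of the frame is inside the rectangle but outside the ellipse.
print(in_mask(0.1, 0.1, 0, 4, 0, 2))                   # True
print(in_mask(0.1, 0.1, 0, 4, 0, 2, shape='ellipse'))  # False
print(in_mask(2.0, 1.0, 0, 4, 0, 2, shape='ellipse'))  # True
```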
 

I/O Control

  • save_as_png(filename) : save the image as a picture in PNG format.
  • save_as_jpg(filename) : save the image as a picture in JPEG format.
 

Examples

Plot 1D

from gumpy.nexus import *
from echidna import *
from gumpy.vis.plot1d import *
# load Echidna data with a reference number id.
d1 = ECH[4918]
# reduce the data to 3d if it's 4d.
d2 = d1.get_reduced()
# do a vertical integration for the first frame of data. Result is a 1d dataset.
d3 = d2[0].sum(1)
# open a plot with given dataset.
p1 = plot(d3, '1D Plot')
# make another 1d dataset.
d4 = d2[1].sum(1)
# add the dataset to the plot.
p1.add_dataset(d4)
# set title to the plot.
p1.set_title('Plot Example')

Image 2D

from gumpy.nexus import *
from echidna import *
from gumpy.vis.image2d import *
# load Echidna data with a reference number id.
d1 = ECH[4918]
# reduce the data to 3d if it's 4d.
d2 = d1.get_reduced()
# get the first frame of the data as a 2d dataset.
d3 = d2[0]
# open an image with given dataset.
p2 = image(d3, '2D Plot')

Appendix – Architecture Diagram


Image:attachments/230400516/230564811.png
Document generated by Confluence on Apr 01, 2015 00:11