
Commit

Documentation edits and setup.py cleanup
shoyer committed May 3, 2014
1 parent 5292c94 commit 867319e
Showing 4 changed files with 85 additions and 37 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -22,7 +22,7 @@ makes many powerful array operations possible:
dimensions (known in numpy as "broadcasting") based on dimension names,
regardless of their original order.
- Flexible split-apply-combine operations with groupby:
`x.groupby('time.dayofyear').apply(lambda y: y - y.mean())`.
`x.groupby('time.dayofyear').mean()`.
- Database-like alignment based on coordinate labels that smoothly
handles missing values: `x, y = xray.align(x, y, join='outer')`.
- Keep track of arbitrary metadata in the form of a Python dictionary:
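The two groupby one-liners swapped in this hunk compute different things: `.mean()` reduces each group to a single value, while `.apply(lambda y: y - y.mean())` keeps the original shape and subtracts each group's mean. A minimal sketch of that difference, using pandas as a stand-in rather than xray's own API (the sample dates and values are invented for illustration):

```python
import pandas as pd

# Four daily observations: the same two calendar days in two different years.
index = pd.to_datetime(["2000-01-01", "2000-01-02", "2001-01-01", "2001-01-02"])
x = pd.Series([1.0, 2.0, 3.0, 6.0], index=index)

# Split by day of year, analogous to grouping on 'time.dayofyear'.
by_day = x.groupby(x.index.dayofyear)

# Aggregation: one mean per group (the shorter example).
climatology = by_day.mean()

# Transformation: same length as x, each value minus its group's mean
# (the longer lambda example).
anomalies = by_day.transform(lambda y: y - y.mean())

print(climatology)
print(anomalies)
```

The aggregated result has one entry per day of year, whereas the transformed result is aligned with the original time index.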
19 changes: 16 additions & 3 deletions doc/index.rst
@@ -6,7 +6,8 @@ homogeneous, n-dimensional arrays. It implements flexible array
operations and dataset manipulation for in-memory datasets within the
`Common Data
Model <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/>`__
widely used for self-describing scientific data (e.g., the NetCDF file
widely used for self-describing scientific data (e.g., the
`NetCDF <http://www.unidata.ucar.edu/software/netcdf/>`__ file
format).

Why xray?
@@ -23,7 +24,7 @@ makes many powerful array operations possible:
dimensions (known in numpy as "broadcasting") based on dimension
names, regardless of their original order.
- Flexible split-apply-combine operations with groupby:
``x.groupby('time.dayofyear').apply(lambda y: y - y.mean())``.
``x.groupby('time.dayofyear').mean()``.
- Database-like alignment based on coordinate labels that smoothly
handles missing values: ``x, y = xray.align(x, y, join='outer')``.
- Keep track of arbitrary metadata in the form of a Python dictionary:
@@ -42,7 +43,19 @@ a numpy ``ndarray`` or a pandas ``DataFrame`` or ``Series``, providing
compatibility with the full `PyData ecosystem <http://pydata.org/>`__.

For a longer introduction to **xray** and its design goals, see
`the project's GitHub page <https://github.com/akleeman/xray>`__.
`the project's GitHub page <http://github.com/akleeman/xray>`__. The GitHub
page is where to go to look at the code, report a bug or make your own
contribution. You can also get in touch via `Twitter
<http://twitter.com/shoyer>`__.

.. note ::

   **xray** is still very new -- it is on its first release and is only a few
   months old. Although we will make a best effort to maintain the current
   API, it is likely that the API will change in future versions as xray
   matures. Some changes are already anticipated, as called out in the
   `Tutorial <tutorial>`_ and the project `README
   <http://github.com/akleeman/xray>`__.

Contents
--------
62 changes: 41 additions & 21 deletions doc/tutorial.rst
@@ -7,9 +7,24 @@ Tutorial
import numpy as np
np.random.seed(123456)
To get started, we will import numpy, pandas and xray:

.. ipython:: python
import numpy as np
import pandas as pd
import xray
``Dataset`` objects
-------------------

:py:class:`xray.Dataset` is xray's primary data structure. It is a dict-like
container of labeled arrays (:py:class:`xray.DataArray` objects) with aligned
dimensions. It is designed as an in-memory representation of the data model
from the `NetCDF`__ file format.

__ http://www.unidata.ucar.edu/software/netcdf/

Creating a ``Dataset``
~~~~~~~~~~~~~~~~~~~~~~

@@ -22,10 +37,6 @@ values in the form ``(dimensions, data[, attributes])``.

.. ipython:: python
import numpy as np
import pandas as pd
import xray
foo_values = np.random.RandomState(0).rand(3, 4)
times = pd.date_range('2000-01-01', periods=3)
@@ -34,7 +45,7 @@ values in the form ``(dimensions, data[, attributes])``.
ds
You can also insert :py:class:`xray.Variable` or :py:class:`xray.DataArray`
objects directly into a Dataset, or create a dataset from a
objects directly into a ``Dataset``, or create a dataset from a
:py:class:`pandas.DataFrame` with
:py:meth:`Dataset.from_dataframe <xray.Dataset.from_dataframe>` or from a
NetCDF file on disk with :py:func:`~xray.open_dataset`. See
@@ -44,8 +55,7 @@ NetCDF file on disk with :py:func:`~xray.open_dataset`. See
~~~~~~~~~~~~~~~~~~~~

:py:class:`~xray.Dataset` implements the Python dictionary interface, with
values given by :py:class:`xray.DataArray` objects. The valid keys include
each listed "coordinate" and "noncoordinate":
values given by :py:class:`xray.DataArray` objects:

.. ipython:: python
@@ -55,7 +65,19 @@ each listed "coordinate" and "noncoordinate":
ds['time']
We didn't explicitly include a variable for the "space" dimension, so it
The valid keys include each listed "coordinate" and "noncoordinate".
Coordinates are arrays that label values along a particular dimension, which
they index by keeping track of a :py:class:`pandas.Index` object. They
are created automatically from dataset arrays whose name is equal to the one
item in their list of dimensions.

Noncoordinates include all arrays in a ``Dataset`` other than its coordinates.
These arrays can exist along multiple dimensions. The numbers in the columns in
the ``Dataset`` representation indicate the order in which dimensions appear
for each array (on a ``Dataset``, the dimensions are always listed in
alphabetical order).
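The rule above (an array is a coordinate when its name equals the single entry in its dimension list, and everything else is a noncoordinate) can be sketched in plain numpy and pandas. This is only an illustration under stated assumptions: `split_variables` is a hypothetical helper, not part of xray, and the default integer index for an unlabeled dimension follows the behavior the tutorial describes for "space".

```python
import numpy as np
import pandas as pd


def split_variables(variables):
    """Hypothetical helper: split {name: (dims, data)} into two dicts."""
    coordinates = {}
    noncoordinates = {}
    sizes = {}
    for name, (dims, data) in variables.items():
        data = np.asarray(data)
        # Remember the length of every dimension we encounter.
        for dim, size in zip(dims, data.shape):
            sizes[dim] = size
        if tuple(dims) == (name,):
            # The name matches its only dimension: treat it as a coordinate
            # and index it with a pandas.Index.
            coordinates[name] = pd.Index(data, name=name)
        else:
            noncoordinates[name] = (tuple(dims), data)
    # Dimensions without an explicit coordinate get ascending integers.
    for dim, size in sizes.items():
        if dim not in coordinates:
            coordinates[dim] = pd.Index(np.arange(size), name=dim)
    return coordinates, noncoordinates


times = pd.date_range("2000-01-01", periods=3)
foo_values = np.random.RandomState(0).rand(3, 4)
coords, noncoords = split_variables({
    "time": (("time",), times),
    "foo": (("time", "space"), foo_values),
})
print(sorted(coords), sorted(noncoords))
```

Here "time" becomes a coordinate, "foo" stays a noncoordinate spanning two dimensions, and "space" receives a default integer index.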

We didn't explicitly include a coordinate for the "space" dimension, so it
was filled with an array of ascending integers of the proper length:

.. ipython:: python
@@ -64,9 +86,9 @@ was filled with an array of ascending integers of the proper length:
ds['foo']
The numbers in the columns in the ``Dataset`` representation indicate the order
of the dimension for each array (on a ``Dataset``, the dimensions are always
listed in alphabetical order).
Noncoordinates and coordinates are listed explicitly by the
:py:attr:`~xray.Dataset.noncoordinates` and
:py:attr:`~xray.Dataset.coordinates` attributes.

There are also a few derived variables based on datetime coordinates that you
can access from a dataset (e.g., "year", "month" and "day"), even if you didn't
@@ -77,7 +99,7 @@ explicitly add them. These are known as
ds['time.dayofyear']
Finally, Datasets also store arbitrary metadata in the form of `attributes`:
Finally, datasets also store arbitrary metadata in the form of `attributes`:

.. ipython:: python
@@ -225,7 +247,7 @@ including the name of only the DataArray itself:
foo2
`foo2` is generally an equivalent labeled array to `foo`, but we dropped the
non-relevant dataset variables:
dataset variables that are no longer relevant:

.. ipython:: python
@@ -309,7 +331,7 @@ which wouldn't work with numpy arrays:
This is a much simpler model than numpy's `advanced indexing`__,
and is basically the only model that works for labeled arrays. If you would
like to do this sort of indexing, you can always index ``.values`` instead:
like to do advanced indexing, you can always index ``.values`` instead:

__ http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

@@ -794,7 +816,7 @@ section below.

Although xray provides reasonable support for incremental reads of files on
disk, it does not yet support incremental writes, which is important for
dealing with datasets that do not fit into memory. This is a major
dealing with datasets that do not fit into memory. This is a significant
shortcoming which is on the roadmap for fixing in the next major version,
which will include the ability to create ``Dataset`` objects directly
linked to a NetCDF file on disk.
@@ -901,10 +923,10 @@ Now, let's access and plot a small subset:

In [6]: tmax_ss = tmax[0]

For this dataset, we still to manually fill in some of the values with `NaN`
to indicate that they are missing. As soon as we access ``tmax_ss.values``, the
values are loaded over the network and cached on the DataArray so they can
be manipulated:
For this dataset, we still need to manually fill in some of the values with
`NaN` to indicate that they are missing. As soon as we access
``tmax_ss.values``, the values are loaded over the network and cached on the
DataArray so they can be manipulated:

.. ipython::
:verbatim:
@@ -930,5 +952,3 @@ Finally, we can plot the values with matplotlib:
In [13]: plt.colorbar()

.. image:: _static/opendap-prism-tmax.png
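The load-on-access behavior described in this part of the tutorial (values are fetched when ``.values`` is first read, then cached so they can be modified) can be sketched with a small class. Everything here is hypothetical: `LazyValues`, `fake_remote_read`, and the `-9999.0` missing-value sentinel are invented for illustration and are not xray's implementation or the PRISM dataset's actual encoding.

```python
import numpy as np


class LazyValues:
    """Hypothetical stand-in for an array whose data lives on a remote server."""

    def __init__(self, loader):
        self._loader = loader  # callable that fetches the data when needed
        self._cache = None
        self.load_count = 0    # how many times the loader actually ran

    @property
    def values(self):
        # Fetch on first access only; later accesses reuse the cached array.
        if self._cache is None:
            self._cache = np.asarray(self._loader(), dtype=float)
            self.load_count += 1
        return self._cache


def fake_remote_read():
    # Pretend network read; -9999.0 is an assumed missing-value sentinel.
    return [[1.0, -9999.0], [3.0, 4.0]]


tmax_ss = LazyValues(fake_remote_read)
first = tmax_ss.values              # triggers the (fake) remote read
first[first == -9999.0] = np.nan    # manually mark missing values
second = tmax_ss.values             # served from the cache, edits preserved
print(tmax_ss.load_count, second)
```

Because the cached array is returned by reference, the in-place NaN fill persists across later accesses without another read.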


39 changes: 27 additions & 12 deletions setup.py
@@ -16,7 +16,23 @@
QUALIFIER = ''


FULL_DESCRIPTION = """
DISTNAME = 'xray'
LICENSE = 'Apache'
AUTHOR = 'Stephan Hoyer, Alex Kleeman, Eugene Brevdo'
AUTHOR_EMAIL = '[email protected]'
URL = 'https://github.com/akleeman/xray'
CLASSIFIERS = [
    'Development Status :: 3 - Alpha',
    'License :: OSI Approved :: Apache Software License',
    'Operating System :: OS Independent',
    'Intended Audience :: Science/Research',
    'Programming Language :: Python :: 2.7',
    'Topic :: Scientific/Engineering',
]


DESCRIPTION = "Extended arrays for working with scientific datasets in Python"
LONG_DESCRIPTION = """
**xray** is a Python package for working with aligned sets of
homogeneous, n-dimensional arrays. It implements flexible array
operations and dataset manipulation for in-memory datasets within the
@@ -39,7 +55,7 @@
dimensions (known in numpy as "broadcasting") based on dimension
names, regardless of their original order.
- Flexible split-apply-combine operations with groupby:
``x.groupby('time.dayofyear').apply(lambda y: y - y.mean())``.
``x.groupby('time.dayofyear').mean()``.
- Database-like alignment based on coordinate labels that smoothly
handles missing values: ``x, y = xray.align(x, y, join='outer')``.
- Keep track of arbitrary metadata in the form of a Python dictionary:
@@ -57,12 +73,11 @@
a numpy ``ndarray`` or a pandas ``DataFrame`` or ``Series``, providing
compatibility with the full `PyData ecosystem <http://pydata.org/>`__.
For a longer introduction to **xray**, see the project's README on GitHub_.
.. _GitHub: https://github.com/akleeman/xray
For more about **xray**, see the project's `GitHub page
<https://github.com/akleeman/xray>`__ and `documentation
<http://xray.readthedocs.org>`__
"""


# code to extract and write the version copied from pandas, which is available
# under the BSD license:
FULLVERSION = VERSION
@@ -130,14 +145,14 @@ def write_version_py(filename=None):
write_version_py()


setup(name='xray',
setup(name=DISTNAME,
      version=FULLVERSION,
      description='Extended arrays for working with scientific datasets in Python',
      full_description=FULL_DESCRIPTION,
      author='Stephan Hoyer, Alex Kleeman, Eugene Brevdo',
      author_email='TODO',
      description=DESCRIPTION,
      long_description=LONG_DESCRIPTION,
      author=AUTHOR,
      author_email=AUTHOR_EMAIL,
      install_requires=['numpy >= 1.8', 'pandas >= 0.13.1'],
      tests_require=['mock >= 1.0.1', 'nose >= 1.0'],
      url='https://github.com/akleeman/xray',
      url=URL,
      test_suite='nose.collector',
      packages=['xray', 'xray.backends'])
