Documentation edits and setup.py cleanup

pydata · May 3, 2014 · 867319e
1 parent 5292c94
commit 867319e

Showing 4 changed files with 85 additions and 37 deletions.
diff --git a/README.md b/README.md
@@ -22,7 +22,7 @@ makes many powerful array operations possible:
     dimensions (known in numpy as "broadcasting") based on dimension names,
     regardless of their original order.
   - Flexible split-apply-combine operations with groupby:
-    `x.groupby('time.dayofyear').apply(lambda y: y - y.mean())`.
+    `x.groupby('time.dayofyear').mean()`.
   - Database like aligment based on coordinate labels that smoothly
     handles missing values: `x, y = xray.align(x, y, join='outer')`.
   - Keep track of arbitrary metadata in the form of a Python dictionary:

diff --git a/doc/index.rst b/doc/index.rst
@@ -6,7 +6,8 @@ homogeneous, n-dimensional arrays. It implements flexible array
 operations and dataset manipulation for in-memory datasets within the
 `Common Data
 Model <http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM/>`__
-widely used for self-describing scientific data (e.g., the NetCDF file
+widely used for self-describing scientific data (e.g., the
+`NetCDF <http://www.unidata.ucar.edu/software/netcdf/>`__ file
 format).
 
 Why xray?
@@ -23,7 +24,7 @@ makes many powerful array operations possible:
    dimensions (known in numpy as "broadcasting") based on dimension
    names, regardless of their original order.
 -  Flexible split-apply-combine operations with groupby:
-   ``x.groupby('time.dayofyear').apply(lambda y: y - y.mean())``.
+   ``x.groupby('time.dayofyear').mean()``.
 -  Database like aligment based on coordinate labels that smoothly
    handles missing values: ``x, y = xray.align(x, y, join='outer')``.
 -  Keep track of arbitrary metadata in the form of a Python dictionary:
@@ -42,7 +43,19 @@ a numpy ``ndarray`` or a pandas ``DataFrame`` or ``Series``, providing
 compatibility with the full `PyData ecosystem <http://pydata.org/>`__.
 
 For a longer introduction to **xray** and its design goals, see
-`the project's GitHub page <https://github.com/akleeman/xray>`__.
+`the project's GitHub page <http://github.com/akleeman/xray>`__. The GitHub
+page is where to go to look at the code, report a bug or make your own
+contribution. You can also get in touch via `Twitter
+<http://twitter.com/shoyer>`__.
+
+.. note ::
+
+    **xray** is still very new -- it is on its first release and is only a few
+    months old. Although we will make a best effort to maintain the current
+    API, it is likely that the API will change in future versions as xray
+    matures. Some changes are already anticipated, as called out in the
+    `Tutorial <tutorial>`_ and the project `README
+    <http://github.com/akleeman/xray>`__.
 
 Contents
 --------

diff --git a/doc/tutorial.rst b/doc/tutorial.rst
@@ -7,9 +7,24 @@ Tutorial
    import numpy as np
    np.random.seed(123456)
 
+To get started, we will import numpy, pandas and xray:
+
+.. ipython:: python
+
+    import numpy as np
+    import pandas as pd
+    import xray
+
 ``Dataset`` objects
 -------------------
 
+:py:class:`xray.Dataset` is xray's primary data structure. It is a dict-like
+container of labeled arrays (:py:class:`xray.DataArray` objects) with aligned
+dimensions. It is designed as an in-memory representation of the data model
+from the `NetCDF`__ file format.
+
+__ http://www.unidata.ucar.edu/software/netcdf/
+
 Creating a ``Dataset``
 ~~~~~~~~~~~~~~~~~~~~~~
 
@@ -22,10 +37,6 @@ values in the form ``(dimensions, data[, attributes])``.
 
 .. ipython:: python
 
-    import numpy as np
-    import pandas as pd
-    import xray
-
     foo_values = np.random.RandomState(0).rand(3, 4)
     times = pd.date_range('2000-01-01', periods=3)
 
@@ -34,7 +45,7 @@ values in the form ``(dimensions, data[, attributes])``.
     ds
 
 You can also insert :py:class:`xray.Variable` or :py:class:`xray.DataArray`
-objects directly into a Dataset, or create an dataset from a
+objects directly into a ``Dataset``, or create an dataset from a
 :py:class:`pandas.DataFrame` with
 :py:meth:`Dataset.from_dataframe <xray.Dataset.from_dataframe>` or from a
 NetCDF file on disk with :py:func:`~xray.open_dataset`. See
@@ -44,8 +55,7 @@ NetCDF file on disk with :py:func:`~xray.open_dataset`. See
 ~~~~~~~~~~~~~~~~~~~~
 
 :py:class:`~xray.Dataset` implements the Python dictionary interface, with
-values given by :py:class:`xray.DataArray` objects. The valid keys include
-each listed "coordinate" and "noncoordinate":
+values given by :py:class:`xray.DataArray` objects:
 
 .. ipython:: python
 
@@ -55,7 +65,19 @@ each listed "coordinate" and "noncoordinate":
 
     ds['time']
 
-We didn't explicitly include a variable for the "space" dimension, so it
+The valid keys include each listed "coordinate" and "noncoordinate".
+Coordinates are arrays that labels values along a particular dimension, which
+they index by keeping track of a :py:class:`pandas.Index` object. They
+are created automatically from dataset arrays whose name is equal to the one
+item in their list of dimensions.
+
+Noncoordinates include all arrays in a ``Dataset`` other than its coordinates.
+These arrays can exist along multiple dimensions. The numbers in the columns in
+the ``Dataset`` representation indicate the order in which dimensions appear
+for each array (on a ``Dataset``, the dimensions are always listed in
+alphabetical order).
+
+We didn't explicitly include a coordinate for the "space" dimension, so it
 was filled with an array of ascending integers of the proper length:
 
 .. ipython:: python
@@ -64,9 +86,9 @@ was filled with an array of ascending integers of the proper length:
 
     ds['foo']
 
-The numbers in the columns in the ``Dataset`` representation indicate the order
-of the dimension for each array (on a ``Dataset``, the dimensions are always
-listed in alphabetical order).
+Noncoordinates and coordinates are listed explicitly by the
+:py:attr:`~xray.Dataset.noncoordinates` and
+:py:attr:`~xray.Dataset.coordinates` attributes.
 
 There are also a few derived variables based on datetime coordinates that you
 can access from a dataset (e.g., "year", "month" and "day"), even if you didn't
@@ -77,7 +99,7 @@ explicitly add them. These are known as
 
     ds['time.dayofyear']
 
-Finally, Datasets also store arbitrary metadata in the form of `attributes`:
+Finally, datasets also store arbitrary metadata in the form of `attributes`:
 
 .. ipython:: python
 
@@ -225,7 +247,7 @@ including the name of only the DataArray itself:
     foo2
 
 `foo2` is generally an equivalent labeled array to `foo`, but we dropped the
-non-relevant dataset variables:
+dataset variables that are no longer relevant:
 
 .. ipython:: python
 
@@ -309,7 +331,7 @@ which wouldn't work with numpy arrays:
 
 This is a much simpler model than numpy's `advanced indexing`__,
 and is basically the only model that works for labeled arrays. If you would
-like to do this sort of indexing, so you always index ``.values`` instead:
+like to do advanced indexing, so you always index ``.values`` instead:
 
 __ http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
 
@@ -794,7 +816,7 @@ section below.
 
     Although xray provides reasonable support for incremental reads of files on
     disk, it does not yet support incremental writes, which is important for
-    dealing with datasets that do not fit into memory. This is a major
+    dealing with datasets that do not fit into memory. This is a significant
     shortcoming which is on the roadmap for fixing in the next major version,
     which will include the ability to create ``Dataset`` objects directly
     linked to a NetCDF file on disk.
@@ -901,10 +923,10 @@ Now, let's access and plot a small subset:
 
     In [6]: tmax_ss = tmax[0]
 
-For this dataset, we still to manually fill in some of the values with `NaN`
-to indicate that they are missing. As soon as we access ``tmax_ss.values``, the
-values are loaded over the network and cached on the DataArray so they can
-be manipulated:
+For this dataset, we still need to manually fill in some of the values with
+`NaN` to indicate that they are missing. As soon as we access
+``tmax_ss.values``, the values are loaded over the network and cached on the
+DataArray so they can be manipulated:
 
 .. ipython::
     :verbatim:
@@ -930,5 +952,3 @@ Finally, we can plot the values with matplotlib:
     In [13]: plt.colorbar()
 
 .. image:: _static/opendap-prism-tmax.png
-
-
diff --git a/setup.py b/setup.py
@@ -16,7 +16,23 @@
 QUALIFIER = ''
 
 
-FULL_DESCRIPTION = """
+DISTNAME = 'xray'
+LICENSE = 'Apache'
+AUTHOR = 'Stephan Hoyer, Alex Kleeman, Eugene Brevdo'
+AUTHOR_EMAIL = '[email protected]'
+URL = 'https://github.com/akleeman/xray'
+CLASSIFIERS = [
+    'Development Status :: 3 - Alpha',
+    'License :: OSI Approved :: Apache Software License',
+    'Operating System :: OS Independent',
+    'Intended Audience :: Science/Research',
+    'Programming Language :: Python :: 2.7',
+    'Topic :: Scientific/Engineering',
+]
+
+
+DESCRIPTION = "Extended arrays for working with scientific datasets in Python"
+LONG_DESCRIPTION = """
 **xray** is a Python package for working with aligned sets of
 homogeneous, n-dimensional arrays. It implements flexible array
 operations and dataset manipulation for in-memory datasets within the
@@ -39,7 +55,7 @@
    dimensions (known in numpy as "broadcasting") based on dimension
    names, regardless of their original order.
 -  Flexible split-apply-combine operations with groupby:
-   ``x.groupby('time.dayofyear').apply(lambda y: y - y.mean())``.
+   ``x.groupby('time.dayofyear').mean()``.
 -  Database like aligment based on coordinate labels that smoothly
    handles missing values: ``x, y = xray.align(x, y, join='outer')``.
 -  Keep track of arbitrary metadata in the form of a Python dictionary:
@@ -57,12 +73,11 @@
 a numpy ``ndarray`` or a pandas ``DataFrame`` or ``Series``, providing
 compatibility with the full `PyData ecosystem <http://pydata.org/>`__.
 
-For a longer introduction to **xray**, see the project's README on GitHub_.
-
-.. _GitHub: https://github.com/akleeman/xray
+For more about **xray**, see the project's `GitHub page
+<https://github.com/akleeman/xray>`__ and `documentation
+<http://xray.readthedocs.org>`__
 """
 
-
 # code to extract and write the version copied from pandas, which is available
 # under the BSD license:
 FULLVERSION = VERSION
@@ -130,14 +145,14 @@ def write_version_py(filename=None):
     write_version_py()
 
 
-setup(name='xray',
+setup(name=DISTNAME,
       version=FULLVERSION,
-      description='Extended arrays for working with scientific datasets in Python',
-      full_description=FULL_DESCRIPTION,
-      author='Stephan Hoyer, Alex Kleeman, Eugene Brevdo',
-      author_email='TODO',
+      description=DESCRIPTION,
+      long_description=LONG_DESCRIPTION,
+      author=AUTHOR,
+      author_email=AUTHOR_EMAIL,
       install_requires=['numpy >= 1.8', 'pandas >= 0.13.1'],
       tests_require=['mock >= 1.0.1', 'nose >= 1.0'],
-      url='https://github.com/akleeman/xray',
+      url=URL,
       test_suite='nose.collector',
       packages=['xray', 'xray.backends'])
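
Note on the groupby one-liner changed in README.md, doc/index.rst and setup.py: the removed and added forms compute different things. The removed form subtracts each day-of-year group's mean from its members (an anomaly, same length as the input), while the added form returns one mean per day of year. The following is a minimal sketch of that difference using plain pandas as a stand-in, not xray itself; the series `x` and its values are made up for illustration, and `transform` is the pandas counterpart assumed here for the old `apply` form.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering 2000-01-01 through 2001-12-30.
times = pd.date_range('2000-01-01', periods=730)
x = pd.Series(np.arange(730, dtype=float), index=times)

# Group the values by day of year (1..366, since 2000 is a leap year).
by_day = x.groupby(x.index.dayofyear)

# New example in the docs: one mean per day of year.
climatology = by_day.mean()

# Old example in the docs: each value minus its group's mean.
anomaly = by_day.transform(lambda y: y - y.mean())

print(len(climatology), len(anomaly))
```

The reduced result has one entry per group, whereas the anomaly keeps the original time index.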