This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Merge pull request #206 from scikit-hep/issue-205
Fixes #205.
jpivarski authored Oct 18, 2019
2 parents 2a0ad59 + faa9958 commit c124877
Showing 5 changed files with 22 additions and 9 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -51,7 +51,7 @@ install:
- python -c 'import awkward; print(awkward.__version__)'
- if [[ $TRAVIS_PYTHON_VERSION != pypy* && $NUMPY == "numpy==1.13.1" ]] ; then pip install h5py ; fi
- if [[ $TRAVIS_PYTHON_VERSION != pypy* && $NUMPY != "numpy==1.13.1" ]] ; then pip install h5py pyarrow; python -c 'import pyarrow; print("pyarrow", pyarrow.__version__)' ; fi
- - if [[ $TRAVIS_PYTHON_VERSION != pypy* ]] ; then pip install numba ; ln -s ../awkward-numba/awkward/numba awkward/numba ; fi
+ # - if [[ $TRAVIS_PYTHON_VERSION != pypy* ]] ; then pip install numba ; ln -s ../awkward-numba/awkward/numba awkward/numba ; fi
- if [[ $TRAVIS_PYTHON_VERSION != pypy* ]] ; then pip install pybind11 ; cd awkward-cpp ; python setup.py build ; cd .. ; tree awkward-cpp/build/ ; cd awkward ; ln -s ../awkward-cpp/build/lib.*/awkward/cpp cpp ; cd .. ; ls -l awkward/cpp ; ls -l awkward/cpp/ ; python -c 'print("TESTING awkward-cpp"); import awkward.cpp; print(awkward.cpp.JaggedArray)' ; fi
- export AWKWARD_DEPLOYMENT=awkward
- pip install --upgrade pyOpenSSL # for deployment
8 changes: 3 additions & 5 deletions README.rst
@@ -60,17 +60,15 @@ Install awkward like any other Python package:
.. code-block:: bash
pip install awkward # maybe with sudo or --user, or in virtualenv
- pip install awkward-numba # optional: integration with and optimization by Numba
or install with `conda <https://conda.io/en/latest/miniconda.html>`__:

.. code-block:: bash
conda config --add channels conda-forge # if you haven't added conda-forge already
conda install awkward
- conda install awkward-numba # optional: integration with and optimization by Numba
- The base ``awkward`` package requires only `Numpy <https://scipy.org/install.html>`__ (1.13.1+), but ``awkward-numba`` additionally requires `Numba <https://numba.pydata.org/numba-doc/dev/user/installing.html>`__.
+ The base ``awkward`` package requires only `Numpy <https://scipy.org/install.html>`__ (1.13.1+).

Recommended packages:
---------------------
@@ -1274,7 +1272,7 @@ The following list translates awkward-array classes and features to their Arrow
High-level operations: common to all classes
--------------------------------------------

- There are three levels of abstraction in awkward-array: high-level operations for data analysis, low-level operations for engineering the structure of the data, and implementation details. Implementation details are handled in the usual way for Python: if exposed at all, class, method, and function names begin with underscores and are not guaranteed to be stable from one release to the next. There is more than one implementation of awkward: the original awkward library, which depends only on Numpy, awkward-numba, which uses Numba to just-in-time compile its operations, and awkward-cpp, which has precompiled operations. Each has its own implementation details.
+ There are three levels of abstraction in awkward-array: high-level operations for data analysis, low-level operations for engineering the structure of the data, and implementation details. Implementation details are handled in the usual way for Python: if exposed at all, class, method, and function names begin with underscores and are not guaranteed to be stable from one release to the next.

The distinction between high-level operations and low-level operations is more subtle and developed as awkward-array was put to use. Data analysts care about the logical structure of the data—whether it is jagged, what the column names are, whether certain values could be ``None``, etc. Data engineers (or an analyst in "engineering mode") care about contiguousness, how much data are in memory at a given time, whether strings are dictionary-encoded, whether arrays have unreachable elements, etc. The dividing line is between high-level types and low-level array layout (both of which are defined in their own sections below). The following awkward classes have the same high-level type as their content:

@@ -3086,7 +3084,7 @@ Functions for input/output and conversion

Most of the functions defined at the top-level of the library are conversion functions.

- * ``awkward.fromiter(iterable, awkwardlib=None, dictencoding=False, maskedwhen=True)``: convert Python or JSON data into awkward arrays. Not a fast function: it necessarily involves a Python for loop. The ``awkwardlib`` determines which awkward module to use to make arrays (``awkward`` is the default, but ``awkward.numba`` and ``awkward.cpp`` are alternatives). If ``dictencoding`` is ``True``, bytes and strings will be "dictionary-encoded" in Arrow/Parquet terms—this is an ``IndexedArray`` in awkward. The ``maskedwhen`` parameter determines whether ``MaskedArrays`` have a mask that is ``True`` when data are missing or ``False`` when data are missing.
+ * ``awkward.fromiter(iterable, awkwardlib=None, dictencoding=False, maskedwhen=True)``: convert Python or JSON data into awkward arrays. Not a fast function: it necessarily involves a Python for loop. The ``awkwardlib`` determines which awkward module to use to make arrays. If ``dictencoding`` is ``True``, bytes and strings will be "dictionary-encoded" in Arrow/Parquet terms—this is an ``IndexedArray`` in awkward. The ``maskedwhen`` parameter determines whether ``MaskedArrays`` have a mask that is ``True`` when data are missing or ``False`` when data are missing.

.. code-block:: python3
9 changes: 8 additions & 1 deletion awkward/array/base.py
@@ -572,10 +572,17 @@ def concatenate(isclassmethod, cls_or_self, arrays, axis=0):
             cls = type(self)
             arrays = (self,) + tuple(arrays)

+        def resolve(t):
+            for b in t.__bases__:
+                if issubclass(t, AwkwardArray):
+                    return resolve(b)
+                else:
+                    return t
+
         if all(type(x) == cls.numpy.ndarray for x in arrays):
             return cls.numpy.concatenate(arrays, axis=axis)

-        if not all(type(x) == type(arrays[0]) for x in arrays):
+        if not all(resolve(type(x)) == resolve(type(arrays[0])) for x in arrays):
             if axis == 0:
                 tags = cls.numpy.concatenate([cls.numpy.full(len(x), i, dtype=cls.TAGTYPE) for i, x in enumerate(arrays)])
                 return cls.UnionArray.fget(None).fromtags(tags, arrays)
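
One way to see what the new comparison does is to run the `resolve` helper from this hunk on a small stand-in hierarchy. The sketch below copies the helper as it appears in the diff. The classes `AwkwardArray`, `JaggedArray`, `MyJaggedArray`, `Table` and `Plain`, and the wrapper `same_kind`, are hypothetical placeholders rather than the library's own code, so this only illustrates the control flow.

```python
class AwkwardArray(object):
    """Stand-in root class (hypothetical, not the library's class)."""


class JaggedArray(AwkwardArray):
    """Stand-in for a built-in array type."""


class MyJaggedArray(JaggedArray):
    """Stand-in for a user-defined subclass."""


class Table(AwkwardArray):
    """Stand-in for a different built-in array type."""


class Plain(object):
    """A class unrelated to AwkwardArray."""


def resolve(t):
    # Copied from the hunk: climb through the bases while t is an AwkwardArray.
    for b in t.__bases__:
        if issubclass(t, AwkwardArray):
            return resolve(b)
        else:
            return t


def same_kind(types):
    # Mirrors the new condition: compare resolved types instead of exact types.
    return all(resolve(t) == resolve(types[0]) for t in types)


subclass_matches_parent = same_kind([JaggedArray, MyJaggedArray])  # True
jagged_vs_table = same_kind([JaggedArray, Table])                  # True
plain_vs_awkward = same_kind([Plain, JaggedArray])                 # False
```

In this stand-in hierarchy, every class derived from `AwkwardArray` resolves to the same value, so a subclass compares equal to its parent, and `JaggedArray` also compares equal to `Table`. Whether the same holds in the library depends on its actual class hierarchy.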
10 changes: 9 additions & 1 deletion awkward/array/jagged.py
@@ -1633,6 +1633,8 @@ def _concatenate_axis0(cls, arrays):

     @classmethod
     def _concatenate_axis1(cls, arrays):
+        import awkward.array.table
+
         if len(arrays) == 0:
             raise ValueError("at least one array must be provided") # this can only happen in the classmethod case
         if any(len(a) != len(arrays[0]) for a in arrays):
@@ -1645,6 +1647,12 @@ def _concatenate_axis1(cls, arrays):
         flatarrays = [a.flatten() for a in arrays]
         n_arrays = len(arrays)

+        if n_arrays > 0 and all(isinstance(a, awkward.array.table.Table) and set(cls._util_columns_descend(a, set())) == set(cls._util_columns_descend(flatarrays[0], set())) for a in flatarrays):
+            results = {}
+            for n in flatarrays[0].columns:
+                results[n] = cls._concatenate_axis1([a[n] for a in arrays])
+            return cls.zip(results)
+
         # the first step is to get the starts and stops for the stacked structure
         counts = np.vstack([a.counts for a in arrays])
         flat_counts = counts.T.flatten()
@@ -1704,7 +1712,7 @@ def get_dtype(arrays):
             content = flatarrays[0].copy(content=awkward.array.table.Table(**tablecontent))

         else:
-            raise NotImplementedError("concatenate with axis=1 is not implemented for " + type(arrays[0]).__name__)
+            raise NotImplementedError("concatenate with axis=1 is not implemented for these types")

         return arrays[0].__class__(starts[::n_arrays], stops[n_arrays-1::n_arrays], content)
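
The new `Table` branch in `_concatenate_axis1` concatenates tables along axis 1 one column at a time and then zips the per-column results back together. A rough pure-Python sketch of that idea follows, using lists of lists in place of `JaggedArray` and dicts in place of `Table`. The function names `concat_axis1` and `concat_tables_axis1` are made up for illustration and are not part of awkward.

```python
def concat_axis1(arrays):
    """Concatenate jagged lists row by row (the axis=1 case)."""
    if any(len(a) != len(arrays[0]) for a in arrays):
        raise ValueError("all arrays must have the same length for axis=1")
    # zip(*arrays) yields the i-th row of every array; join those rows.
    return [[item for row in rows for item in row] for rows in zip(*arrays)]


def concat_tables_axis1(tables):
    """Concatenate dict-of-jagged-lists tables column by column."""
    columns = set(tables[0])
    if any(set(t) != columns for t in tables):
        raise ValueError("all tables must have the same columns")
    # Recurse into each column, then reassemble the columns into one table.
    return {name: concat_axis1([t[name] for t in tables]) for name in tables[0]}


a = {"x": [[1, 2], [], [3]], "y": [[1.1, 2.2], [], [3.3]]}
b = {"x": [[4], [5, 6], []], "y": [[4.4], [5.5, 6.6], []]}

result = concat_tables_axis1([a, b])
# result["x"] == [[1, 2, 4], [5, 6], [3]]
# result["y"] == [[1.1, 2.2, 4.4], [5.5, 6.6], [3.3]]
```

As in the hunk, the column-wise path is only taken when every input has the same set of columns. The sketch raises an error otherwise, whereas the library code falls through to its existing logic.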

2 changes: 1 addition & 1 deletion awkward/version.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

import re

__version__ = "0.12.13"
__version__ = "0.12.14"
version = __version__
version_info = tuple(re.split(r"[-\.]", __version__))
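
For reference, here is a small sketch of what the `version_info` expression produces for the old and new version strings. The wrapper function `split_version` and the helper `as_ints` are hypothetical names introduced only for this illustration.

```python
import re


def split_version(version):
    """Split a version string with the same regex used in awkward/version.py."""
    return tuple(re.split(r"[-\.]", version))


def as_ints(parts):
    """Hypothetical helper: keep the purely numeric parts, as integers."""
    return tuple(int(p) for p in parts if p.isdigit())


old = split_version("0.12.13")  # ("0", "12", "13")
new = split_version("0.12.14")  # ("0", "12", "14")

# The components are strings, so tuples compare lexicographically by text:
# "9" sorts after "12", even though 9 < 12 numerically.
text_order = split_version("0.9.0") > new            # True
int_order = as_ints(split_version("0.9.0")) > as_ints(new)  # False
```

Code that compares versions numerically would need a conversion along these lines, since the tuple holds strings rather than integers.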
