fixed tests
EelcoHoogendoorn committed Apr 20, 2016
1 parent 7e07be6 commit d1892f2
Showing 2 changed files with 12 additions and 28 deletions.
35 changes: 9 additions & 26 deletions README.rst
@@ -10,12 +10,14 @@ and set operations.
- Rich and efficient grouping functionality:
- splitting of values by key-group
- reductions of values by key-group

- Generalization of existing array set operation to nd-arrays, such as:
- unique
- union
- difference
- exclusive (xor)
- contains (in1d)

- Some new functions:
- indices: numpy equivalent of list.index
- count: numpy equivalent of collections.Counter
@@ -24,11 +26,6 @@ and set operations.
- count\_table: like R's table or pandas crosstab, or an ndim version
of np.bincount

-The generalization of the existing array set operations pertains
-primarily to the extension of this functionality to different types of
-key objects, such as keys formed by slices of nd-arrays. For instance,
-we may wish to find the intersection of several sets of graph edges.

Some brief examples to give an impression hereof:

.. code:: python
@@ -64,40 +61,26 @@ Design decisions:
-----------------

This package builds upon a generalization of the design pattern as can
-be found in numpy.unique. That is, by argsorting an ndarray, subsequent
-operations can be implemented efficiently.
+be found in numpy.unique. That is, by argsorting an ndarray, many
+subsequent operations can be implemented efficiently and in a vectorized
+manner.

The sorting and related low level operations are encapsulated into a
hierarchy of Index classes, which allows for efficient lookup of many
properties for a variety of different key-types. The public API of this
package is a quite thin wrapper around these Index objects.

-The principal information exposed by an Index object is the required
-permutations to map between the original and sorted order of the keys.
-This information can subsequently be used for many purposes, such as
-efficiently finding the set of unique keys, or efficiently performing
-group\_by logic on an array of corresponding values.

The two complex key types currently supported, beyond standard sequences
-of sortable primitive types, are array keys and composite keys. For the
+of sortable primitive types, are ndarray keys (i.e, finding unique
+rows/columns of an array) and composite keys (zipped sequences). For the
exact casting rules describing valid sequences of key objects to index
objects, see as\_index().

Todo and open questions:
------------------------

-- What about nesting of key objects? This should be possible too, but
-not fully supported yet
-- What about floating point nd keys? Currently, they are treated as
-object indices. However, bitwise and floating point equality are not
-the same thing
-- Add special index classes for things like object arrays of variable
-length strings?
-- While this package is aimed more at expanding functionality than
-optimizing performance, the most common code paths might benefit from
-some specialization, such as the concatenation of sorted sets
-- There may be further generalizations that could be made. merge/join
-functionality perhaps?
+- There may be further generalizations that could be built on top of
+these abstractions. merge/join functionality perhaps?

.. |Build Status| image:: https://travis-ci.org/EelcoHoogendoorn/Numpy_arraysetops_EP.svg?branch=master
:target: https://travis-ci.org/EelcoHoogendoorn/Numpy_arraysetops_EP
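
The "Design decisions" text in this README hunk describes an argsort-based pattern like the one in numpy.unique. As a rough illustration only, the sketch below shows that pattern in plain NumPy. It is not the package's Index implementation; the variable names and sample data are made up for the example.

```python
import numpy as np

# Illustrative data: one key per value (not taken from the package).
keys = np.array([3, 1, 3, 2, 1, 3])
values = np.array([10., 20., 30., 40., 50., 60.])

# A stable argsort gives the permutation from original to sorted order.
sorter = np.argsort(keys, kind='mergesort')
sorted_keys = keys[sorter]

# A new group starts wherever a sorted key differs from its predecessor.
flag = np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
start = np.flatnonzero(flag)

# Unique keys are the first key of each group.
unique_keys = sorted_keys[start]

# Group-wise reduction: sum the values of each run of equal keys.
group_sums = np.add.reduceat(values[sorter], start)

print(unique_keys)  # [1 2 3]
print(group_sums)   # [ 70.  40. 100.]
```

A single sort serves both the unique-keys lookup and the grouped reduction, which is the efficiency argument the README makes.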
5 changes: 3 additions & 2 deletions numpy_indexed/arraysetops.py
@@ -151,8 +151,9 @@ def indices(this, that, axis=semantics.axis_default, missing='raise'):

     if missing != 'ignore':
         invalid = this._keys[indices] != that._keys
-        if missing == 'raise' and np.any(invalid):
-            raise KeyError('Not all keys in `that` are present in `this`')
+        if missing == 'raise':
+            if np.any(invalid):
+                raise KeyError('Not all keys in `that` are present in `this`')
         elif missing == 'mask':
             indices = np.ma.masked_array(indices, invalid)
         else:
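
The likely reason for nesting the check in this hunk: with the old combined condition, a call with `missing='raise'` and no invalid keys would skip the first branch and fall through to the final `else`. The self-contained sketch below reproduces the patched control flow on a sorted 1-D array. `lookup_indices` is a hypothetical stand-in, not the library function, and its `else` body (using `missing` as a fill value) is an assumption, since the hunk is cut off at that point.

```python
import numpy as np

def lookup_indices(this, that, missing='raise'):
    """Hypothetical stand-in for the patched logic; `this` must be sorted and non-empty."""
    this = np.asarray(this)
    that = np.asarray(that)
    # Candidate positions; clip so the comparison below never indexes out of range.
    indices = np.clip(np.searchsorted(this, that), 0, len(this) - 1)
    if missing != 'ignore':
        invalid = this[indices] != that
        if missing == 'raise':
            # Nested check: 'raise' with no invalid keys no longer reaches the else branch.
            if np.any(invalid):
                raise KeyError('Not all keys in `that` are present in `this`')
        elif missing == 'mask':
            indices = np.ma.masked_array(indices, invalid)
        else:
            # Assumption: any other value of `missing` is used as a fill value.
            indices[invalid] = missing
    return indices

print(lookup_indices([10, 20, 30], [30, 10]))              # [2 0]
print(lookup_indices([10, 20, 30], [20, 25], missing=-1))  # [ 1 -1]
```

With the pre-patch condition, the first call would have reached the fill-value branch with `missing` still set to the string `'raise'`.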