Releases: scikit-hep/awkward-0.x
0.9.0rc1
0.8.15
0.8.14
0.8.13
0.8.12
PR #105 fixed two cases of not checking for empty arrays before calling .max().
Added pad and fillna to turn jagged arrays into NumPy arrays:
import awkward
a = awkward.fromiter([[1.1, 2.2, 3.3, 4.4, 5.5], [], [6.6, 7.7, 8.8], [9.9]])
a.pad(3)
# returns [[1.1 2.2 3.3 4.4 5.5] [None None None] [6.6 7.7 8.8] [9.9 None None]]
a.pad(3, clip=True)
# returns [[1.1 2.2 3.3] [None None None] [6.6 7.7 8.8] [9.9 None None]]
a.pad(3, clip=True).fillna(999)
# returns [[1.1 2.2 3.3] [999.0 999.0 999.0] [6.6 7.7 8.8] [9.9 999.0 999.0]]
a.pad(3, clip=True).fillna(999).regular()
# returns [[  1.1,   2.2,   3.3],
#          [999. , 999. , 999. ],
#          [  6.6,   7.7,   8.8],
#          [  9.9, 999. , 999. ]]
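To make the semantics of pad, clip and fillna concrete, here is a minimal pure-Python sketch on nested lists. The functions pad and fillna below are hypothetical stand-ins written for illustration; they are not the library's implementation, which operates on JaggedArrays rather than lists.

```python
def pad(rows, length, clip=False):
    """Pad every inner list with None up to `length`.

    With clip=True, longer lists are also truncated to `length`,
    so every row ends up with exactly `length` items.
    """
    out = []
    for row in rows:
        row = list(row)
        if clip:
            row = row[:length]
        out.append(row + [None] * max(0, length - len(row)))
    return out


def fillna(rows, value):
    """Replace every None with `value`."""
    return [[value if x is None else x for x in row] for row in rows]


a = [[1.1, 2.2, 3.3, 4.4, 5.5], [], [6.6, 7.7, 8.8], [9.9]]
padded = pad(a, 3)                # long rows keep all their items
clipped = pad(a, 3, clip=True)    # every row has exactly 3 items
filled = fillna(clipped, 999.0)   # rectangular, no None left
```

Only the clipped and filled result is rectangular, which is why clip=True is needed before the final conversion to a regular array.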
0.8.11
0.8.9 and 0.8.10 broke uproot's tree.pandas.df() because that function (illegally!) used the private method _broadcast. This release puts it back as an alias, which will make uproot work as long as the installed version of awkward isn't in this two-version window.
This will be handled properly soon.
0.8.10
0.8.9
Various bug-fixes and improvements to broadcasting from PR #99.
The old internal member function _broadcast has been made part of the public API as tojagged. (Do not confuse this with the internal member function _tojagged, which will sooner or later be removed. The public tojagged, with no underscore, has a different definition and is intended to be maintained.)
0.8.8
All array types have an nbytes property, which determines eviction from uproot's ArrayCache. Without this property, the cache would fill up to a billion arrays rather than a billion bytes!
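The difference between counting arrays and counting bytes can be sketched with a small least-recently-used cache. ByteLimitedCache is a hypothetical class written for this note, not uproot's ArrayCache; it assumes only that cached objects expose nbytes, and it uses memoryview objects as stand-ins for arrays.

```python
from collections import OrderedDict


class ByteLimitedCache:
    """Hypothetical LRU cache whose limit is a byte budget, not an item count."""

    def __init__(self, limitbytes):
        self.limitbytes = limitbytes
        self.usedbytes = 0
        self._items = OrderedDict()

    def __setitem__(self, key, array):
        if key in self._items:
            self.usedbytes -= self._items.pop(key).nbytes
        self._items[key] = array
        self.usedbytes += array.nbytes
        # Evict least recently used entries until the byte budget is met.
        # An entry larger than the whole budget is evicted immediately.
        while self.usedbytes > self.limitbytes and self._items:
            _, evicted = self._items.popitem(last=False)
            self.usedbytes -= evicted.nbytes

    def __getitem__(self, key):
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = ByteLimitedCache(limitbytes=1000)
cache["a"] = memoryview(bytearray(400))
cache["b"] = memoryview(bytearray(400))
cache["c"] = memoryview(bytearray(400))  # total would be 1200, so "a" is evicted
```

Eviction here depends entirely on the nbytes each object reports, so an object without a meaningful nbytes could not be budgeted this way.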
The nbytes property only counts data in arrays, not the Python objects that support those arrays (whose size differs between Python 2 and 3, and which PyPy doesn't track), and it doesn't track ephemeral attributes, even if they are arrays (like JaggedArray._counts, which only exists after the first time JaggedArray.counts is requested). It also doesn't distinguish between owned and not-owned data, so views would be double-counted.
The nbytes algorithm always halts, even if structures have cyclic references (if x.content is x, the nbytes of x are not double-counted and do not lead to infinite recursion).
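One way to get that halting guarantee is to carry a set of already-visited object ids through the recursion. The sketch below is an assumption about the general technique, not the library's code: Node and the nbytes function are hypothetical, and stdlib array.array buffers stand in for NumPy arrays.

```python
from array import array


class Node:
    """Hypothetical container holding raw buffers and optional nested content."""

    def __init__(self, *buffers, content=None):
        self.buffers = list(buffers)
        self.content = content


def nbytes(node, seen=None):
    """Sum buffer sizes, visiting each object (by id) at most once."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0  # already counted: this is what breaks cycles
    seen.add(id(node))
    total = 0
    for buf in node.buffers:
        if id(buf) not in seen:
            seen.add(id(buf))
            total += len(buf) * buf.itemsize
    if node.content is not None:
        total += nbytes(node.content, seen)
    return total


offsets = array("q", [0, 2, 5])
values = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])

outer = Node(offsets, content=Node(values))  # ordinary nesting

cyclic = Node(offsets)
cyclic.content = cyclic                      # x.content is x

outer_size = nbytes(outer)
cyclic_size = nbytes(cyclic)                 # terminates, counts offsets once
```

Because visited objects are identified by id, two distinct view objects over the same memory would still be counted twice, which matches the caveat about owned and not-owned data.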
0.8.7
This release adds awkward.toarrow and awkward.toparquet, renaming old functions to awkward.fromarrow and awkward.fromparquet for symmetry. They can only be used if you have pyarrow installed, which is not a strict dependency (must be explicitly installed). String columns can be converted from Arrow to Awkward, but not from Awkward to Arrow because of an open question (see comments).
The implemented conversion is really just between Awkward and Arrow, letting pyarrow convert to and from Parquet.
Top-level Awkward Tables (possibly under ChunkedArray or any MaskedArray) are converted into Arrow Tables, but deeper Awkward Tables are converted into Arrow StructArrays.
Arrow arrays with an associated mask add a BitMaskedArray to the Awkward structure. All Awkward MaskedArrays are pushed down to the deepest Arrow level that can accept them. This might not be necessary: a better understanding of how to generate Arrow buffers could remove the need for it.
Python types in Awkward ObjectArrays can't be saved to Arrow, because Arrow is a multilingual serialization system.
Awkward VirtualArrays are evaluated before converting to Arrow. When reading from Parquet, all columns of all chunks are presented as Awkward VirtualArrays so that they may be lazily read. By default, Awkward VirtualArrays are read-once: the VirtualArray object maintains a reference to the materialized array. That's good for performance when the same data are read repeatedly, but bad for memory use. The cache parameter of fromparquet lets you pass a dict-like cache, such as from the cachetools library.
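The trade-off between the read-once default and an external cache can be sketched with a toy lazy array. LazyArray, make_reader and the key argument are hypothetical names for illustration and do not reproduce VirtualArray's interface; a plain dict stands in for a dict-like cache that could evict entries.

```python
class LazyArray:
    """Hypothetical stand-in for a lazily materialized array."""

    def __init__(self, generator, cache=None, key=None):
        self.generator = generator
        self.cache = cache
        self.key = key
        self._array = None  # only used when there is no external cache

    @property
    def array(self):
        if self.cache is None:
            # Read-once: keep a reference forever after the first read.
            if self._array is None:
                self._array = self.generator()
            return self._array
        try:
            return self.cache[self.key]
        except KeyError:
            # Not cached (or evicted): read again and let the cache decide
            # how long to keep it.
            value = self.generator()
            self.cache[self.key] = value
            return value


counts = {"pinned": 0, "cached": 0}


def make_reader(name):
    def read_column():
        counts[name] += 1  # count how many times the data are really read
        return [1.1, 2.2, 3.3]
    return read_column


pinned = LazyArray(make_reader("pinned"))
pinned.array
pinned.array            # second access reuses the retained array

cache = {}
evictable = LazyArray(make_reader("cached"), cache=cache, key="col")
evictable.array
cache.clear()           # simulate eviction by the cache
evictable.array         # data are read again
```

With an external cache, memory is bounded by whatever policy the cache implements, at the cost of re-reading evicted columns.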
Awkward ChunkedArrays become RecordBatches in a Table in toarrow but separate Tables in toparquet. When reading fromparquet, the separate Tables define the level of granularity for incremental reading.
If toparquet is given an iterable of Awkward data, it will incrementally write the Parquet file. The same can be achieved by an Awkward ChunkedArray of Tables of VirtualArray, which is what fromparquet returns, so the output of fromparquet can be used as input to toparquet.