thread 'main' panicked at 'index out of bounds: the len is 8192 but the index is 8192', /Users/runner/Library/Caches/Homebrew/cargo_cache/registry/src/github.com-1ecc6299db9ec823/parquet-26.0.0/src/encodings/rle.rs:490:25
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::panicking::panic_bounds_check
3: parquet::encodings::rle::RleDecoder::get_batch_with_dict
4: <parquet::encodings::decoding::DictDecoder<T> as parquet::encodings::decoding::Decoder<T>>::get
5: <parquet::column::reader::decoder::ColumnValueDecoderImpl<T> as parquet::column::reader::decoder::ColumnValueDecoder>::read
6: parquet::column::reader::GenericColumnReader<R,D,V>::read_batch
7: parquet::arrow::record_reader::GenericRecordReader<V,CV>::read_records
8: parquet::arrow::array_reader::read_records
9: <parquet::arrow::array_reader::primitive_array::PrimitiveArrayReader<T> as parquet::arrow::array_reader::ArrayReader>::read_records
10: <parquet::arrow::array_reader::struct_array::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::read_records
11: <parquet::arrow::arrow_reader::ParquetRecordBatchReader as core::iter::traits::iterator::Iterator>::next
12: <S as futures_core::stream::TryStream>::try_poll_next
13: <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next
14: <futures_util::stream::stream::map::Map<St,F> as futures_core::stream::Stream>::poll_next
15: <datafusion::physical_plan::file_format::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
16: <datafusion::physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next
17: <datafusion::physical_plan::limit::LimitStream as futures_core::stream::Stream>::poll_next
18: <futures_util::stream::try_stream::try_collect::TryCollect<St,C> as core::future::future::Future>::poll
19: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
20: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
21: tokio::park::thread::CachedParkThread::block_on
22: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
23: tokio::runtime::Runtime::block_on
24: qv::main
I ran into the same problem, but so far every case has turned out to be caused by writing data that violates the Parquet format.
The parquet2 crate doesn't do enough validation to guarantee its output is correct. For example, it'll happily let you skip definition levels even when they're required.
What I think the crate needs is a few solid examples of how to write a parquet file. Currently, it's up to you to figure it out and it's easy to get it wrong.
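As a rough illustration of the kind of validation that is missing, here is a minimal sketch of a writer-side check on definition levels. It is not part of parquet2 or arrow-rs: `check_definition_levels` and `LevelError` are hypothetical names, and the only format rules it assumes are that a column with a maximum definition level above 0 must carry definition levels, and that the number of levels equal to the maximum must match the number of values actually written.

```rust
/// Problems a writer could detect before emitting a data page.
/// Hypothetical type for illustration; not from parquet2 or arrow-rs.
#[derive(Debug, PartialEq)]
enum LevelError {
    /// The column is optional or nested, but no definition levels were given.
    MissingDefinitionLevels,
    /// A level lies outside 0..=max.
    LevelOutOfRange { position: usize, level: i16, max: i16 },
    /// The levels imply a different number of non-null values than were written.
    ValueCountMismatch { expected: usize, actual: usize },
}

/// Checks that `def_levels` is consistent with `num_values` non-null values.
/// Hypothetical helper for illustration only.
fn check_definition_levels(
    max_def_level: i16,
    def_levels: Option<&[i16]>,
    num_values: usize,
) -> Result<(), LevelError> {
    if max_def_level == 0 {
        // Required, non-nested column: no definition levels are stored,
        // so there is nothing to check here.
        return Ok(());
    }
    let levels = def_levels.ok_or(LevelError::MissingDefinitionLevels)?;

    let mut non_null = 0usize;
    for (position, &level) in levels.iter().enumerate() {
        if level < 0 || level > max_def_level {
            return Err(LevelError::LevelOutOfRange {
                position,
                level,
                max: max_def_level,
            });
        }
        // Only slots at the maximum level carry a value in the page.
        if level == max_def_level {
            non_null += 1;
        }
    }

    if non_null != num_values {
        return Err(LevelError::ValueCountMismatch {
            expected: non_null,
            actual: num_values,
        });
    }
    Ok(())
}
```

For an optional column, calling `check_definition_levels(1, None, 3)` returns `Err(LevelError::MissingDefinitionLevels)`, which is the "skipped definition levels" mistake described above caught at write time instead of surfacing later as a reader panic.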
The panic happens here: https://github.com/apache/arrow-rs/blob/b1642ab150ee61f730b2cda51bb917d42d9aeeb1/parquet/src/encodings/rle.rs#L490
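To make the failure mode concrete, here is a simplified sketch of dictionary decoding that reports an out-of-range index as an error instead of panicking. This is not the arrow-rs implementation, and it assumes the failing access is a slice lookup driven by decoded indices; `decode_with_dict` is a hypothetical name used only for illustration.

```rust
/// Maps dictionary indices to values, returning an error for any index that
/// falls outside the dictionary instead of panicking on a slice access.
/// Hypothetical helper for illustration; not the arrow-rs code.
fn decode_with_dict<T: Clone>(dict: &[T], indices: &[u32]) -> Result<Vec<T>, String> {
    indices
        .iter()
        .enumerate()
        .map(|(position, &index)| {
            // `get` returns None where `dict[index]` would panic.
            dict.get(index as usize).cloned().ok_or_else(|| {
                format!(
                    "dictionary index {} at position {} is out of range for a dictionary of length {}",
                    index,
                    position,
                    dict.len()
                )
            })
        })
        .collect()
}
```

With a two-entry dictionary, indices `[0, 1, 1]` decode normally, while `[0, 2]` yields an `Err` describing the bad index, so a malformed file produces a diagnosable error rather than aborting the process.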
I've noticed similar errors when trying to read with Trino (https://github.com/trinodb/trino).