feat: Support homedir expansion in lazy/scan read functions #16869

Open: wants to merge 1 commit into base: main

11 changes: 9 additions & 2 deletions crates/polars-lazy/src/scan/csv.rs
@@ -5,6 +5,7 @@ use polars_io::cloud::CloudOptions;
 use polars_io::csv::read::{
     infer_file_schema, CommentPrefix, CsvEncoding, CsvParseOptions, CsvReadOptions, NullValues,
 };
+use polars_io::prelude::resolve_homedir;
 use polars_io::utils::get_reader_bytes;
 use polars_io::RowIndex;

@@ -35,7 +36,7 @@ impl LazyCsvReader {

     pub fn new(path: impl AsRef<Path>) -> Self {
         LazyCsvReader {
-            path: path.as_ref().to_owned(),
+            path: resolve_homedir(path.as_ref()),
             paths: Arc::new([]),
             glob: true,
             cache: true,
@@ -305,11 +306,17 @@ impl LazyFileListReader for LazyCsvReader {
     }

     fn with_path(mut self, path: PathBuf) -> Self {
-        self.path = path;
+        self.path = resolve_homedir(&path);
         self
     }

     fn with_paths(mut self, paths: Arc<[PathBuf]>) -> Self {
+        let paths = paths
+            .iter()
+            .map(|p| resolve_homedir(p))
+            .collect::<Vec<_>>()
+            .into();
+
         self.paths = paths;
         self
     }
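
The diff only shows call sites of `resolve_homedir`, not its body. For orientation, here is a minimal sketch of what tilde expansion on a path could look like. The function name `expand_tilde` and its explicit `home` parameter are hypothetical and chosen for illustration; this is not the polars implementation, which may look up the home directory differently and handle more cases.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not polars API): replace a leading `~` path
/// component with `home`. Paths that do not start with a bare `~`
/// component, such as `~other/x.csv`, are returned unchanged.
fn expand_tilde(path: &Path, home: Option<&Path>) -> PathBuf {
    match (path.strip_prefix("~"), home) {
        // A bare `~` maps to the home directory itself.
        (Ok(rest), Some(home)) if rest.as_os_str().is_empty() => home.to_path_buf(),
        // `~/a/b` maps to `<home>/a/b`.
        (Ok(rest), Some(home)) => home.join(rest),
        // No leading `~` component, or no known home directory.
        _ => path.to_path_buf(),
    }
}
```

Passing the home directory as an argument keeps the sketch free of environment lookups; a caller could supply something like `std::env::var_os("HOME")` on Unix-like systems.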
11 changes: 9 additions & 2 deletions crates/polars-lazy/src/scan/ipc.rs
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
 use polars_core::prelude::*;
 use polars_io::cloud::CloudOptions;
 use polars_io::ipc::IpcScanOptions;
+use polars_io::utils::resolve_homedir;
 use polars_io::RowIndex;

 use crate::prelude::*;
@@ -41,7 +42,7 @@ impl LazyIpcReader {
     fn new(path: PathBuf, args: ScanArgsIpc) -> Self {
         Self {
             args,
-            path,
+            path: resolve_homedir(&path),
             paths: Arc::new([]),
         }
     }
@@ -96,11 +97,17 @@ impl LazyFileListReader for LazyIpcReader {
     }

     fn with_path(mut self, path: PathBuf) -> Self {
-        self.path = path;
+        self.path = resolve_homedir(&path);
         self
     }

     fn with_paths(mut self, paths: Arc<[PathBuf]>) -> Self {
+        let paths = paths
+            .iter()
+            .map(|p| resolve_homedir(p))
+            .collect::<Vec<_>>()
+            .into();
Member

Not entirely sure, but I believe we can collect into an Arc<[]> directly.

Collaborator Author
@alexander-beedie, Jun 11, 2024

(thinking out loud)
Will poke at it some more 😆

Collaborator Author
@alexander-beedie, Jun 11, 2024

Hmm, can't see a way to avoid the intermediate Vec; the Arc seems to want a constructed object to reference? 🤔 If you can spot something cunning, feel free to point me at it or commit over the top :))

Member

Now I am sure. :D

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=82ca9fa7f2c267ad4811d7714747b7ae

Collaborator Author

Lol... pretty sure that the collect there still creates an intermediate Vec, it just gets deallocated after conversion to Arc<[i32]> :)

Member

Yes, but Rust might optimize that later.
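
The suggestion in this thread can be sketched as follows. `Arc<[T]>` implements `FromIterator<T>`, so the explicit `collect::<Vec<_>>().into()` step can be dropped from the source. As far as I recall, the standard library documentation says the general case still goes through an intermediate `Vec`, while iterators of exact known length (`TrustedLen`) get a single allocation; a mapped slice iterator like the one here should fall into the second group, though that is worth checking against the docs. `resolve_homedir_stub` and `resolve_all` are hypothetical names used so the example does not depend on polars.

```rust
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for `resolve_homedir`: it only copies the path,
/// so the example stays independent of polars and of the environment.
fn resolve_homedir_stub(path: &Path) -> PathBuf {
    path.to_path_buf()
}

/// Rebuild the shared slice by collecting straight into `Arc<[PathBuf]>`,
/// without naming a `Vec` in the source.
fn resolve_all(paths: &Arc<[PathBuf]>) -> Arc<[PathBuf]> {
    paths.iter().map(|p| resolve_homedir_stub(p)).collect()
}
```

In `with_paths` this would amount to `self.paths = paths.iter().map(|p| resolve_homedir(p)).collect();`, assuming the field type is `Arc<[PathBuf]>` as the signature in the diff suggests.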


         self.paths = paths;
         self
     }
9 changes: 8 additions & 1 deletion crates/polars-lazy/src/scan/ndjson.rs
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
 use std::sync::RwLock;

 use polars_core::prelude::*;
+use polars_io::utils::resolve_homedir;
 use polars_io::RowIndex;

 use super::*;
@@ -114,11 +115,17 @@ impl LazyFileListReader for LazyJsonLineReader {
     }

     fn with_path(mut self, path: PathBuf) -> Self {
-        self.path = path;
+        self.path = resolve_homedir(&path);
         self
     }

     fn with_paths(mut self, paths: Arc<[PathBuf]>) -> Self {
+        let paths = paths
+            .iter()
+            .map(|p| resolve_homedir(p))
+            .collect::<Vec<_>>()
+            .into();
+
         self.paths = paths;
         self
     }
9 changes: 8 additions & 1 deletion crates/polars-lazy/src/scan/parquet.rs
@@ -3,6 +3,7 @@ use std::path::{Path, PathBuf};
 use polars_core::prelude::*;
 use polars_io::cloud::CloudOptions;
 use polars_io::parquet::read::ParallelStrategy;
+use polars_io::prelude::resolve_homedir;
 use polars_io::{HiveOptions, RowIndex};

 use crate::prelude::*;
@@ -112,11 +113,17 @@ impl LazyFileListReader for LazyParquetReader {
     }

     fn with_path(mut self, path: PathBuf) -> Self {
-        self.path = path;
+        self.path = resolve_homedir(&path);
         self
     }

     fn with_paths(mut self, paths: Arc<[PathBuf]>) -> Self {
+        let paths = paths
+            .iter()
+            .map(|p| resolve_homedir(p))
+            .collect::<Vec<_>>()
+            .into();
+
         self.paths = paths;
         self
     }