Merge pull request #21 from MarcoGorelli/sub
Add Expr.sub
MarcoGorelli authored Oct 27, 2023
2 parents d999f28 + 2de7a14 commit a57c7ba
Showing 14 changed files with 308 additions and 37 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/CI.yml
@@ -41,7 +41,7 @@ jobs:
working-directory: polars_business
- run: venv/bin/python -m pytest tests
working-directory: polars_business
- run: venv/bin/python -m mypy .
- run: venv/bin/python -m mypy polars_business/polars_business/ tests
working-directory: polars_business

- name: Build wheels
49 changes: 28 additions & 21 deletions README.md
@@ -52,6 +52,7 @@ Supported functions are:
- `holidays` argument, for passing custom holidays
- `weekend` argument, for passing a custom weekend (default is ('Sat', 'Sun'))
- `plb.datetime_range`: same as above, but the output will be `Datetime` dtype.
- `Expr.bdt.sub`: subtract two `Date`s and count the number of business days between them!

See `Examples` below!

@@ -74,7 +75,10 @@ Let's shift `Date` forwards by 5 days, excluding Saturday and Sunday:

```python
result = df.with_columns(
date_shifted=plb.col("date").bdt.offset_by('5bd')
date_shifted=plb.col("date").bdt.offset_by(
'5bd',
weekend=('Sat', 'Sun'),
)
)
print(result)
```
@@ -91,17 +95,18 @@ shape: (3, 2)
└────────────┴──────────────┘
```

Let's shift `Date` forwards by 5 days, excluding Saturday and Sunday and UK holidays
Let's shift `Date` forwards by 5 days, excluding Saturday, Sunday, and England holidays
for 2023 and 2024:

```python
import holidays

uk_holidays = holidays.country_holidays("UK", years=[2023, 2024])
uk_holidays = holidays.country_holidays("UK", subdiv='England', years=[2023, 2024])

result = df.with_columns(
date_shifted=plb.col("date").bdt.offset_by(
by='5bd',
weekend=('Sat', 'Sun'),
holidays=uk_holidays,
)
)
@@ -114,33 +119,34 @@ shape: (3, 2)
│ --- ┆ --- │
│ date ┆ date │
╞════════════╪══════════════╡
│ 2023-04-03 ┆ 2023-04-11 │
│ 2023-04-03 ┆ 2023-04-12 │
│ 2023-09-01 ┆ 2023-09-08 │
│ 2024-01-04 ┆ 2024-01-11 │
└────────────┴──────────────┘
```
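The `2023-04-12` result can be checked by hand: starting from Monday 2023-04-03, the count has to skip Good Friday (2023-04-07), the weekend, and Easter Monday (2023-04-10). Below is a minimal stdlib-only sketch of that stepping logic, not the plugin's implementation. It assumes a hand-picked holiday set in place of the `holidays` package, a non-negative offset, and a start date that is itself a business day (true for all three sample dates); `offset_business_days` is a hypothetical helper.

```python
from datetime import date, timedelta

# Hand-picked England bank holidays near the sample dates; an assumption
# standing in for holidays.country_holidays("UK", subdiv="England", ...).
HOLIDAYS = {date(2023, 4, 7), date(2023, 4, 10)}  # Good Friday, Easter Monday
WEEKEND = {5, 6}  # date.weekday(): Saturday=5, Sunday=6


def offset_business_days(start: date, n: int) -> date:
    """Hypothetical helper: step forward until n business days are counted.

    Only non-negative n is handled in this sketch.
    """
    if n < 0:
        raise ValueError("this sketch only handles forward offsets")
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        # A day counts only if it is neither a weekend day nor a holiday.
        if current.weekday() not in WEEKEND and current not in HOLIDAYS:
            remaining -= 1
    return current


for d in [date(2023, 4, 3), date(2023, 9, 1), date(2024, 1, 4)]:
    print(d, offset_business_days(d, 5))
```

Removing Easter Monday from the set gives `2023-04-11` for the first row, the value on the removed line of this diff. That suggests the country-level "UK" calendar lacked Easter Monday, though this is an inference rather than something the commit states.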

Let's shift `Date` forwards by 5 days, excluding only Sunday:
Count the number of business dates between two columns:
```python
result = df.with_columns(
date_shifted=plb.col("date").bdt.offset_by(
by='5bd',
weekend=['Sun'],
)
df = pl.DataFrame(
{
"start": [date(2023, 1, 4), date(2023, 5, 1), date(2023, 9, 9)],
"end": [date(2023, 2, 8), date(2023, 5, 2), date(2023, 12, 30)],
}
)
result = df.with_columns(n_business_days=plb.col("end").bdt.sub("start"))
print(result)
```
```
shape: (3, 2)
┌────────────┬──────────────┐
│ date ┆ date_shifted │
│ --- ┆ --- │
│ date ┆ date │
╞════════════╪══════════════╡
│ 2023-04-03 ┆ 2023-04-08 │
│ 2023-09-01 ┆ 2023-09-07 │
│ 2024-01-04 ┆ 2024-01-10 │
└────────────┴──────────────┘
shape: (3, 3)
┌────────────┬────────────┬─────────────────┐
│ start ┆ end ┆ n_business_days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i32 │
╞════════════╪════════════╪═════════════════╡
│ 2023-01-04 ┆ 2023-02-08 ┆ 25 │
│ 2023-05-01 ┆ 2023-05-02 ┆ 1 │
│ 2023-09-09 ┆ 2023-12-30 ┆ 80 │
└────────────┴────────────┴─────────────────┘
```
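The counts appear to follow the same half-open convention as `numpy.busday_count`, which the benchmark script compares against: the start date is counted and the end date is not, so 2023-05-01 to 2023-05-02 gives 1. A brute-force stdlib sketch that reproduces the three rows, assuming a Saturday/Sunday weekend and no holidays; `count_business_days` is a hypothetical name used only for illustration:

```python
from datetime import date, timedelta


def count_business_days(start: date, end: date) -> int:
    """Count Monday-to-Friday days in the half-open range [start, end).

    If start is after end, count the days in (end, start] and negate,
    mirroring the swap in the Rust date_diff.
    """
    if start > end:
        return -count_business_days(end + timedelta(days=1), start + timedelta(days=1))
    return sum(
        1
        for i in range((end - start).days)
        if (start + timedelta(days=i)).weekday() < 5  # Mon=0 ... Fri=4
    )


rows = [
    (date(2023, 1, 4), date(2023, 2, 8)),
    (date(2023, 5, 1), date(2023, 5, 2)),
    (date(2023, 9, 9), date(2023, 12, 30)),
]
for start, end in rows:
    print(start, end, count_business_days(start, end))
```

This is O(days) per row, so it is only useful as a correctness reference, not as an alternative to the plugin.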

Benchmarks
@@ -150,6 +156,7 @@ Single-threaded performance is:
- about on par with NumPy
- at least an order of magnitude faster than pandas.

but note that Polars will take care of parallelisation for you.
but note that Polars will take care of parallelisation for you, and that this plugin
works with Polars' lazy execution.

Check out https://www.kaggle.com/code/marcogorelli/polars-business for some comparisons.
3 changes: 3 additions & 0 deletions polars_business/Makefile
@@ -13,6 +13,9 @@ install-release: venv
unset CONDA_PREFIX && \
source venv/bin/activate && maturin develop --release -m polars_business/Cargo.toml

pre-commit: venv
cargo fmt --all --manifest-path polars_business/Cargo.toml && cargo clippy --all-features --manifest-path polars_business/Cargo.toml

clean:
-@rm -r venv
-@cd expression_lib && cargo clean
1 change: 1 addition & 0 deletions polars_business/bump_version.py
@@ -6,6 +6,7 @@
how = sys.argv[1]

subprocess.run(["cp", "../README.md", "polars_business/README.md"])
subprocess.run(["cp", "../LICENSE", "polars_business/LICENSE"])

with open("polars_business/pyproject.toml", "r", encoding="utf-8") as f:
content = f.read()
21 changes: 21 additions & 0 deletions polars_business/polars_business/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Marco Edward Gorelli

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
25 changes: 25 additions & 0 deletions polars_business/polars_business/polars_business/__init__.py
@@ -104,6 +104,31 @@ def offset_by(
return result
return result.dt.offset_by(by)

def sub(
self,
start_dates: str | pl.Expr,
*,
weekend: Sequence[str] = ("Sat", "Sun"),
holidays: Sequence[date] | None = None,
) -> pl.Expr:
"""Count business days in [start_dates, end), where end is this expression; negative if start_dates is later."""
if tuple(weekend) != ("Sat", "Sun"):
raise NotImplementedError(
"custom weekends are not yet supported - coming soon!"
)
if holidays:
raise NotImplementedError(
"custom holidays are not yet supported - coming soon!"
)
if isinstance(start_dates, str):
start_dates = pl.col(start_dates)
# On the Rust side, inputs[0] is this expression (the end dates)
# and inputs[1] is `start_dates`.
result = self._expr._register_plugin(
lib=lib,
symbol="sub",
is_elementwise=True,
args=[start_dates],
)
return result


class BExpr(pl.Expr):
@property
13 changes: 7 additions & 6 deletions polars_business/polars_business/pyproject.toml
@@ -4,15 +4,16 @@ build-backend = "maturin"

[project]
name = "polars-business"
requires-python = ">=3.8"
description = "Business day utilities for Polars"
readme = "README.md"
authors = [
{ name="Marco Gorelli", email="[email protected]" },
]
license = { file = "LICENSE" }
classifiers = [
"Programming Language :: Rust",
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
]
version = "0.1.29"
authors = [
{ name="Marco Gorelli", email="[email protected]" },
]
description = "Business day utilities for Polars"
readme = "README.md"
requires-python = ">=3.8"
8 changes: 8 additions & 0 deletions polars_business/polars_business/src/expressions.rs
@@ -1,4 +1,5 @@
use crate::business_days::*;
use crate::sub::*;
use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use serde::Deserialize;
@@ -23,3 +24,10 @@ fn advance_n_days(inputs: &[Series], kwargs: BusinessDayKwargs) -> PolarsResult<

impl_advance_n_days(s, n, holidays, weekend)
}

#[polars_expr(output_type=Int32)]
fn sub(inputs: &[Series]) -> PolarsResult<Series> {
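// inputs[0] is the expression `.bdt.sub` is called on (the end dates);
// inputs[1] is the argument passed to it (the start dates).
let end_dates = &inputs[0];
let start_dates = &inputs[1];
impl_sub(end_dates, start_dates)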
}
1 change: 1 addition & 0 deletions polars_business/polars_business/src/lib.rs
@@ -1,5 +1,6 @@
mod business_days;
mod expressions;
mod sub;

#[cfg(target_os = "linux")]
use jemallocator::Jemalloc;
79 changes: 79 additions & 0 deletions polars_business/polars_business/src/sub.rs
@@ -0,0 +1,79 @@
use crate::business_days::weekday;
use polars::prelude::arity::binary_elementwise;
use polars::prelude::*;

fn date_diff(mut start_date: i32, mut end_date: i32) -> i32 {
let swapped = start_date > end_date;
if swapped {
(start_date, end_date) = (end_date, start_date);
start_date += 1;
end_date += 1;
}

let mut start_weekday = weekday(start_date);
let end_weekday = weekday(end_date);

if start_weekday == 6 {
start_date += 2;
start_weekday = 1;
} else if start_weekday == 7 {
start_date += 1;
start_weekday = 1;
}
if end_weekday == 6 {
end_date += 2;
} else if end_weekday == 7 {
end_date += 1;
}

let diff = end_date - start_date;

let whole_weeks = diff / 7;
let mut count = 0;
count += whole_weeks * 5;
start_date += whole_weeks * 7;
while start_date < end_date {
if start_weekday < 6 {
count += 1;
}
start_date += 1;
start_weekday += 1;
if start_weekday > 7 {
start_weekday = 1;
}
}
if swapped {
-count
} else {
count
}
}
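To make the `date_diff` logic easier to follow, here is a line-by-line Python port. It assumes, as the checks against 6 and 7 suggest, that `business_days::weekday` returns 1 for Monday through 7 for Sunday, and that a Polars `Date` is an `i32` count of days since 1970-01-01 (a Thursday). It is a reading aid, not code from the repository: weekend endpoints are moved to the following Monday, each whole week contributes 5 days, and the leftover days are walked one at a time.

```python
from datetime import date


def weekday(days: int) -> int:
    """Weekday of a Date stored as days since 1970-01-01, Monday=1 ... Sunday=7.

    Assumed to match business_days::weekday; 1970-01-01 was a Thursday (4).
    """
    return (days + 3) % 7 + 1


def date_diff(start: int, end: int) -> int:
    """Port of the Rust date_diff: Monday-Friday days in [start, end)."""
    swapped = start > end
    if swapped:
        # Count (end, start] instead of [start, end); negated at the end.
        start, end = end + 1, start + 1

    start_wd = weekday(start)
    end_wd = weekday(end)

    # Move weekend endpoints to the next Monday; weekend days never count,
    # so this does not change the result.
    if start_wd == 6:
        start += 2
        start_wd = 1
    elif start_wd == 7:
        start += 1
        start_wd = 1
    if end_wd == 6:
        end += 2
    elif end_wd == 7:
        end += 1

    # Every whole week contributes exactly five business days.
    whole_weeks = (end - start) // 7
    count = whole_weeks * 5
    start += whole_weeks * 7

    # Walk the remaining days (fewer than seven) one at a time.
    while start < end:
        if start_wd < 6:
            count += 1
        start += 1
        start_wd = start_wd % 7 + 1

    return -count if swapped else count


epoch = date(1970, 1, 1)
print(date_diff((date(2023, 1, 4) - epoch).days, (date(2023, 2, 8) - epoch).days))  # 25
```

The whole-weeks shortcut keeps the per-row cost constant regardless of how far apart the dates are, which is what lets the plugin stay competitive with `np.busday_count` in the benchmark script.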

pub(crate) fn impl_sub(
end_dates: &Series,
start_dates: &Series,
// holidays: Vec<i32>,
// weekend: Vec<i32>,
) -> PolarsResult<Series> {
if (start_dates.dtype() != &DataType::Date) || (end_dates.dtype() != &DataType::Date) {
polars_bail!(InvalidOperation: "polars_business sub only works on Date type. Please cast to Date first.");
}
let start_dates = start_dates.date()?;
let end_dates = end_dates.date()?;
let out = match end_dates.len() {
1 => {
if let Some(end_date) = end_dates.get(0) {
start_dates.apply(|x_date| x_date.map(|start_date| date_diff(start_date, end_date)))
} else {
Int32Chunked::full_null(start_dates.name(), start_dates.len())
}
}
_ => binary_elementwise(start_dates, end_dates, |opt_s, opt_n| {
match (opt_s, opt_n) {
(Some(start_date), Some(end_date)) => Some(date_diff(start_date, end_date)),
_ => None,
}
}),
};
Ok(out.into_series())
}
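`impl_sub` broadcasts when the expression it was called on (`end_dates`) has a single element, and otherwise pairs the two columns elementwise, propagating nulls. A rough Python model of that dispatch, using lists with `None` for nulls. `sub_dates` and `busday_diff` are hypothetical names, `busday_diff` is a brute-force stand-in for `date_diff`, and raising on mismatched lengths is this sketch's own choice rather than a documented behaviour of `binary_elementwise`:

```python
from datetime import date, timedelta
from typing import List, Optional


def busday_diff(start: date, end: date) -> int:
    """Brute-force stand-in for date_diff: Mon-Fri days in [start, end)."""
    if start > end:
        # Mirror the swap in date_diff: count (end, start] and negate.
        return -busday_diff(end + timedelta(days=1), start + timedelta(days=1))
    return sum(
        1
        for i in range((end - start).days)
        if (start + timedelta(days=i)).weekday() < 5
    )


def sub_dates(
    end_dates: List[Optional[date]],
    start_dates: List[Optional[date]],
) -> List[Optional[int]]:
    """Model of impl_sub's dispatch; None stands in for a Polars null."""
    if len(end_dates) == 1:
        # A single end date is broadcast against every start date;
        # a null end date makes the whole output null.
        end = end_dates[0]
        if end is None:
            return [None] * len(start_dates)
        return [None if s is None else busday_diff(s, end) for s in start_dates]
    if len(end_dates) != len(start_dates):
        raise ValueError("column lengths differ")
    # Elementwise case: a null on either side gives a null result.
    return [
        None if s is None or e is None else busday_diff(s, e)
        for s, e in zip(start_dates, end_dates)
    ]


print(sub_dates([date(2023, 5, 2)], [date(2023, 5, 1), None]))  # [1, None]
```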
51 changes: 51 additions & 0 deletions polars_business/sub_perf.py
@@ -0,0 +1,51 @@
# type: ignore
import timeit
import warnings
import numpy as np

BENCHMARKS = [1, 2, 3, 4]

SIZE = 1_000_000

# BENCHMARK 1: NO HOLIDAYS INVOLVED

setup = f"""
import polars as pl
import polars_business as plb
from datetime import date
import numpy as np
import pandas as pd
import holidays
import warnings
dates = pl.date_range(date(2020, 1, 1), date(2024, 1, 1), closed='left', eager=True)
size = {SIZE}
start_dates = np.random.choice(dates, size)
end_dates = np.random.choice(dates, size)
df = pl.DataFrame({{
'start_date': start_dates,
'end_date': end_dates,
}})
"""


def time_it(statement):
results = (
np.array(
timeit.Timer(
stmt=statement,
setup=setup,
).repeat(7, 3)
)
/ 3
)
return round(min(results), 5)


if 1 in BENCHMARKS:
print(
"Polars-business: ",
time_it("result_pl = df.select(plb.col('end_date').bdt.sub('start_date'))"),
)
print("NumPy: ", time_it("result_np = np.busday_count(start_dates, end_dates)"))
19 changes: 16 additions & 3 deletions polars_business/t.py
@@ -1,7 +1,20 @@
import polars as pl
import numpy as np
import polars_business as plb
from datetime import date

df = pl.DataFrame({"ts": [date(2020, 1, 1)]})

print(df.with_columns(ts_shifted=plb.col("ts").bdt.offset_by('3bd')))
df = pl.DataFrame(
{
"start": pl.date_range(date(2019, 12, 30), date(2020, 2, 8), eager=True),
"end": [date(2020, 2, 3)] * 41,
}
)
with pl.Config(tbl_rows=100):
print(
df.with_columns(
start_weekday=pl.col("start").dt.weekday(),
end_weekday=pl.col("end").dt.weekday(),
result=plb.col("end").bdt.sub("start"),
result_np=pl.Series(np.busday_count(df["start"], df["end"])),
)
)