Commit

adds comparison docs (#112)
* adds comparison docs

* added missing space

* fixes `n` slice_min/max bug (#110)

* fixes `n` slice_min/max bug

* adds `@head`

* Clean up documentation in prep for release, bump version to v0.16.2.

* Fix doctest.

---------

Co-authored-by: Karandeep Singh <[email protected]>

* adds extra for sep and remove for unite (#113)

* adds extra for sep and remove for unite

* switch from `warn` ex to `drop` ex in docstring

* add :cat_other, :cat_replace_missing, :cat_recode to donotvec list

* fixes `n` slice_min/max bug (#110)

* fixes `n` slice_min/max bug

* adds `@head`

* Clean up documentation in prep for release, bump version to v0.16.2.

* Fix doctest.

---------

Co-authored-by: Karandeep Singh <[email protected]>

* Cleaned up docstrings.

* Clean up NEWS.md

---------

Co-authored-by: Karandeep Singh <[email protected]>

* Clean up comparison docs.

---------

Co-authored-by: Karandeep Singh <[email protected]>
drizk1 and kdpsingh authored Sep 3, 2024
1 parent ad1e8b5 commit 70b35d4
Showing 2 changed files with 59 additions and 0 deletions.
58 changes: 58 additions & 0 deletions docs/examples/UserGuide/comparisons.jl
@@ -0,0 +1,58 @@
# TidierData.jl is built on DataFrames.jl.

# This section compares the syntax of the two packages side by side.
#
# It is adapted from the DataFrames.jl documentation [comparing different workflows](https://dataframes.juliadata.org/stable/man/comparisons/#Comparison-with-the-R-package-dplyr).

# To run these examples, create the following two data frames.

# ```julia
# using DataFrames, TidierData # TidierData re-exports Statistics, so `mean` is available without loading Statistics explicitly.
# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
# df2 = DataFrame(grp = [1, 3], w = [10, 11])
# ```

# ## Basic Operations
# | Operation | TidierData.jl | DataFrames.jl |
# |:-------------------------|:-------------------------------------|:---------------------------------------|
# | Reduce multiple values | `@summarize(df, mean_x = mean(x))` | `combine(df, :x => mean)` |
# | Add new columns | `@mutate(df, mean_x = mean(x))` | `transform(df, :x => mean => :mean_x)` |
# | Rename columns | `@rename(df, x_new = x)` | `rename(df, :x => :x_new)` |
# | Pick columns | `@select(df, x, y)` | `select(df, :x, :y)` |
# | Pick & transform columns | `@transmute(df, mean_x = mean(x), y)`| `select(df, :x => mean, :y)` |
# | Pick rows | `@filter(df, x >= 1)` | `subset(df, :x => ByRow(x -> x >= 1))` |
# | Sort rows | `@arrange(df, x)` | `sort(df, :x)` |
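#
# As a quick check of the first row, the sketch below runs both forms on the setup data frame and compares the results. The names `tidier` and `base` are illustrative only, and the DataFrames.jl call adds an explicit `=> :mean_x` rename (an assumption made here so both outputs share a column name; without it, `combine` names the column `x_mean`).
#
# ```julia
# using DataFrames, TidierData
#
# # Same setup frame as above.
# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
#
# # TidierData.jl: the new column name is given on the left of `=`.
# tidier = @summarize(df, mean_x = mean(x))
#
# # DataFrames.jl: source column => function => output column name.
# base = combine(df, :x => mean => :mean_x)
#
# # `==` on data frames compares column names and values.
# tidier == base
# ```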

# As in DataFrames.jl, some of these functions can operate by group on a grouped dataframe.
# Below we show TidierData macros chained together.

# ## Grouped DataFrames
# | Operation | TidierData.jl | DataFrames.jl |
# |:-------------------------|:-----------------------------------------------------------|:--------------------------------------------|
# | Reduce multiple values | `@chain df @group_by(grp) @summarize(mean_x = mean(x))` | `combine(groupby(df, :grp), :x => mean)` |
# | Add new columns | `@chain df @group_by(grp) @mutate(mean_x = mean(x))` | `transform(groupby(df, :grp), :x => mean)` |
# | Pick & transform columns | `@chain df @group_by(grp) @transmute(mean_x = mean(x), y)`| `select(groupby(df, :grp), :x => mean, :y)` |
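#
# The grouped summary can be sketched end to end as follows. This is a minimal example on the setup data frame: the multi-line `@chain ... begin ... end` form is used for readability, the names `tidier` and `base` are illustrative, and the `=> :mean_x` rename on the DataFrames.jl side is an assumption added so the output columns line up.
#
# ```julia
# using DataFrames, TidierData
#
# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
#
# # TidierData.jl: group first, then summarize within each group.
# tidier = @chain df begin
#     @group_by(grp)
#     @summarize(mean_x = mean(x))
# end
#
# # DataFrames.jl: pass the grouped data frame to `combine`.
# base = combine(groupby(df, :grp), :x => mean => :mean_x)
#
# # Group 1 holds x = 6, 4, 2 and group 2 holds x = 5, 3, 1.
# tidier.mean_x, base.mean_x
# ```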

# ## Advanced Operations

# | Operation | TidierData.jl | DataFrames.jl |
# |:--------------------------|:----------------------------------------------------------|:---------------------------------------------------------------------------|
# | Complex Function | `@summarize(df, mean_x = mean(skipmissing(x)))` | `combine(df, :x => x -> mean(skipmissing(x)))` |
# | Transform several columns | `@summarize(df, x_max = maximum(x), y_min = minimum(y))` | `combine(df, :x => maximum => :x_max, :y => minimum => :y_min)` |
# | | `@summarize(df, across((x, y), mean))` | `combine(df, [:x, :y] .=> mean)` |
# | | `@summarize(df, across(starts_with("x"), mean))` | `combine(df, names(df, r"^x") .=> mean)` |
# | | `@summarize(df, across((x, y), (maximum, minimum)))` | `combine(df, ([:x, :y] .=> [maximum minimum])...)` |
# | DataFrame as output | `@summarize(df, test = [minimum(x), maximum(x)])` | `combine(df, :x => (x -> (value = [minimum(x), maximum(x)],)) => AsTable)` |
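#
# The setup data frame's `z` column contains a `missing`, which makes it a natural test case for the `skipmissing` pattern in the first row. The sketch below applies that pattern to `z` instead of `x`; the output name `mean_z` and the variables `tidier` and `base` are illustrative choices. On the DataFrames.jl side the anonymous function is parenthesized because a trailing `=> :mean_z` would otherwise be parsed as part of the function body.
#
# ```julia
# using DataFrames, TidierData
#
# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
#
# # TidierData.jl: ordinary function composition inside the macro.
# tidier = @summarize(df, mean_z = mean(skipmissing(z)))
#
# # DataFrames.jl: wrap the composition in an anonymous function.
# base = combine(df, :z => (z -> mean(skipmissing(z))) => :mean_z)
#
# # The non-missing values of z are 3, 4, 5, 6, 7.
# tidier.mean_z, base.mean_z
# ```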


# ## Joining DataFrames

# | Operation | TidierData.jl | DataFrames.jl |
# |:----------------------|:------------------------------------------------|:--------------------------------|
# | Inner join | `@inner_join(df, df2, grp)` | `innerjoin(df, df2, on = :grp)` |
# | Outer join | `@full_join(df, df2, grp)` | `outerjoin(df, df2, on = :grp)` |
# | Left join | `@left_join(df, df2, grp)` | `leftjoin(df, df2, on = :grp)` |
# | Right join | `@right_join(df, df2, grp)` | `rightjoin(df, df2, on = :grp)` |
# | Anti join (filtering) | `@anti_join(df, df2, grp)` | `antijoin(df, df2, on = :grp)` |
# | Semi join (filtering) | `@semi_join(df, df2, grp)` | `semijoin(df, df2, on = :grp)` |
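#
# A short sketch of the inner join on the two setup data frames, under the assumption that both calls are run as written in the table. Only `grp == 1` appears in both `df` and `df2`, so each result should keep the three `grp == 1` rows of `df` and attach `w` from `df2`. The names `tidier` and `base` are illustrative.
#
# ```julia
# using DataFrames, TidierData
#
# df = DataFrame(grp = repeat(1:2, 3), x = 6:-1:1, y = 4:9, z = [3:7; missing], id = 'a':'f')
# df2 = DataFrame(grp = [1, 3], w = [10, 11])
#
# # TidierData.jl: the join column is written as a bare name.
# tidier = @inner_join(df, df2, grp)
#
# # DataFrames.jl: the join column is passed as a Symbol via `on`.
# base = innerjoin(df, df2, on = :grp)
#
# # Compare row counts and the joined-in column.
# nrow(tidier), nrow(base), unique(base.w)
# ```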

1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -139,5 +139,6 @@ nav:
- "Interpolation" : "examples/generated/UserGuide/interpolation.md"
- "Auto-vectorization" : "examples/generated/UserGuide/autovec.md"
# - "Benchmarking" : "examples/generated/UserGuide/benchmarking.md"
- "Comparison to DF.jl" : "examples/generated/UserGuide/comparisons.md"
- "Contribute" : "examples/generated/Contributors/Howto.md"
- "Reference" : "reference.md"

2 comments on commit 70b35d4

@kdpsingh

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/114458

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.16.2 -m "<description of version>" 70b35d4f0d9708733db3a28c5bd9bc9b2f3c93db
git push origin v0.16.2
