Allow Strings as column identifiers in addition to Symbols (#63)
* Enable `String`s as identifiers

* amend docstrings and docs

* missing conversion

* change some tests to use Strings

* add changelog entry
jkrumbiegel authored Dec 17, 2024
1 parent 78d2860 commit 487d97c
Showing 7 changed files with 35 additions and 25 deletions.
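
As a rough illustration of the pattern this commit applies, the self-contained sketch below mirrors the `Group` constructors touched in `src/table.jl`: a `String` identifier is converted to a `Symbol` for internal use, while the default label keeps the original text. The `Group` struct here is only a stand-in; its `name` field and the field types are assumptions made for this sketch, not the package's actual definition.

```julia
# Stand-in for the package's internal `Group` type (assumed layout).
struct Group
    symbol::Symbol   # column identifier, always stored as a Symbol
    name::Any        # display label (assumed field name)
end

# Existing Symbol-based constructors.
Group(s::Symbol) = Group(s, string(s))
Group(p::Pair{Symbol, <:Any}) = Group(p[1], p[2])

# String-based constructors: convert the identifier, keep the label.
Group(s::String) = Group(Symbol(s), s)
Group(p::Pair{String, <:Any}) = Group(Symbol(p[1]), p[2])

# A vector yields one Group per element; a single value is wrapped in a vector.
make_groups(v::AbstractVector) = map(Group, v)
make_groups(x) = [Group(x)]

groups = make_groups([:group1, "group2" => "Group 2"])
single = make_groups("group1")
```

Because every constructor ends up storing a `Symbol`, code downstream of `make_groups` does not need to know which form the caller used.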
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -2,6 +2,7 @@

## Unreleased

+- Allowed `String`s as column identifiers in addition to `Symbol`s [#63](https://github.com/PumasAI/SummaryTables.jl/pull/63).
- Made HTML tables dark mode compatible by reusing foreground color for the lines [#62](https://github.com/PumasAI/SummaryTables.jl/pull/62).

## 3.0.2 - 2024-11-27
6 changes: 3 additions & 3 deletions docs/src/predefined_tables/listingtable.md
@@ -78,7 +78,7 @@ listingtable(data, :value, rows = :group1, cols = :group2)
## Argument 2: `variable`

The second argument primarily selects the table column whose data should populate the cells of the listing table.
-The column name is specified with a `Symbol`:
+The column name is specified with a `Symbol` or `String`:

```@example
using DataFrames
@@ -204,7 +204,7 @@ listingtable(data, :value, Pagination(cols = 3), rows = :group1, cols = :group2)
## Keyword: `rows`

The `rows` keyword determines the grouping structure along the rows.
-It can either be a `Symbol` specifying a grouping column, a `Pair{Symbol,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.
+It can either be a `Symbol` or `String` specifying a grouping column, a `Pair{Symbol,Any}` or `Pair{String,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.

This example uses a single group with default label.

@@ -252,7 +252,7 @@ listingtable(data, :value, rows = [:group1, :group2 => "Group 2"])
## Keyword: `cols`

The `cols` keyword determines the grouping structure along the columns.
-It can either be a `Symbol` specifying a grouping column, a `Pair{Symbol,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.
+It can either be a `Symbol` or `String` specifying a grouping column, a `Pair{Symbol,Any}` or `Pair{String,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.

This example uses a single group with default label.

6 changes: 3 additions & 3 deletions docs/src/predefined_tables/summarytable.md
@@ -82,7 +82,7 @@ summarytable(data, :value, cols = :group, summary = [mean, std])
## Argument 2: `variable`

The second argument primarily selects the table column whose data should populate the cells of the summary table.
-The column name is specified with a `Symbol`:
+The column name is specified with a `Symbol` or `String`:

```@example
using DataFrames
@@ -135,7 +135,7 @@ summarytable(data, :value1 => "Value", cols = :group, summary = [mean, std])
## Keyword: `rows`

The `rows` keyword determines the grouping structure along the rows.
-It can either be a `Symbol` specifying a grouping column, a `Pair{Symbol,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.
+It can either be a `Symbol` or `String` specifying a grouping column, a `Pair{Symbol,Any}` or `Pair{String,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.

This example uses a single group with default label.

@@ -186,7 +186,7 @@ summarytable(data, :value, rows = [:group1, :group2 => "Group 2"], summary = [me
## Keyword: `cols`

The `cols` keyword determines the grouping structure along the columns.
-It can either be a `Symbol` specifying a grouping column, a `Pair{Symbol,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.
+It can either be a `Symbol` or `String` specifying a grouping column, a `Pair{Symbol,Any}` or `Pair{String,Any}` where the second element overrides the group's label, or a `Vector` with multiple groups of the aforementioned format.

This example uses a single group with default label.

4 changes: 2 additions & 2 deletions docs/src/predefined_tables/table_one.md
@@ -81,7 +81,7 @@ Each analysis can have up to three parts: the variable, the analysis function an

For convenience, if the `analyses` argument is omitted, it is equivalent to passing `Tables.columnnames(table)` except that all columns referenced in `groupby` are filtered out.

-The variable is passed as a `Symbol`, corresponding to a column in the input data, and must always be specified.
+The variable is passed as a `Symbol` or `String`, corresponding to a column in the input data, and must always be specified.
The other two parts are optional.

If you specify only variables, the analysis functions are chosen automatically based on the columns, and the labels are equal to the variable names.
@@ -237,7 +237,7 @@ table_one(data, :x, groupby = :y, total_name = "Overall")

## Keyword: `group_totals`

-A `Symbol` or `Vector{Symbol}` specifying one or multiple groups for which to add subtotals. All but the topmost group can be chosen here as the topmost group is handled by `show_total` already.
+A `Symbol` or `String`, or a `Vector{Symbol}` or `Vector{String}` specifying one or multiple groups for which to add subtotals. All but the topmost group can be chosen here as the topmost group is handled by `show_total` already.
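
The accepted forms can all be reduced to a single `Vector{Symbol}` before any further processing. The following is a minimal sketch of that normalization, modelled on the local `_group_totals` helper that this commit changes in `src/table_one.jl`; the standalone function name `normalize_group_totals` is hypothetical and used only for illustration.

```julia
# Hypothetical standalone version of the normalization step.
# Vectors of Strings and/or Symbols are converted element-wise.
normalize_group_totals(a::AbstractVector{<:Union{String,Symbol}}) = Symbol.(a)
# A single identifier is wrapped in a one-element vector.
normalize_group_totals(s::Symbol) = [s]
normalize_group_totals(s::String) = [Symbol(s)]

from_string  = normalize_group_totals("group2")
from_symbol  = normalize_group_totals(:group2)
from_strings = normalize_group_totals(["group2", "group3"])
from_default = normalize_group_totals(Symbol[])
```

All four calls return a `Vector{Symbol}`, so a later check such as `first(groupsymbols) in group_totals` can compare symbols regardless of how the caller spelled the group names.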

```@example
using SummaryTables
20 changes: 12 additions & 8 deletions src/table.jl
@@ -8,6 +8,8 @@ end

Group(s::Symbol) = Group(s, string(s))
Group(p::Pair{Symbol, <:Any}) = Group(p[1], p[2])
+Group(s::String) = Group(Symbol(s), s)
+Group(p::Pair{String, <:Any}) = Group(Symbol(p[1]), p[2])
make_groups(v::AbstractVector) = map(Group, v)
make_groups(x) = [Group(x)]

@@ -34,8 +36,8 @@ struct Summary
analyses::Vector{SummaryAnalysis}
end

-function Summary(p::Pair{Symbol, <:Vector}, symbols)
-sym = p[1]
+function Summary(p::Pair{<:Union{Symbol,String}, <:Vector}, symbols)
+sym = Symbol(p[1])
summary_index = findfirst(==(sym), symbols)
if summary_index === nothing
error("Summary variable :$(sym) is not a grouping variable.")
@@ -55,7 +57,9 @@ struct Variable
end

Variable(s::Symbol) = Variable(s, string(s))
+Variable(s::String) = Variable(Symbol(s), s)
Variable(p::Pair{Symbol, <:Any}) = Variable(p[1], p[2])
+Variable(p::Pair{String, <:Any}) = Variable(Symbol(p[1]), p[2])

struct ListingTable
gdf::DataFrames.GroupedDataFrame
@@ -222,7 +226,7 @@ Create a listing table `Table` from `table` which displays raw values from colum
## Arguments
- `table`: Data source which must be convertible to a `DataFrames.DataFrame`.
-- `variable`: Determines which variable's raw values are shown. Can either be a `Symbol` such as `:ColumnA`, or alternatively a `Pair` where the second element is the display name, such as `:ColumnA => "Column A"`.
+- `variable`: Determines which variable's raw values are shown. Can either be a `Symbol` or `String` such as `:ColumnA`, or alternatively a `Pair` where the second element is the display name, such as `:ColumnA => "Column A"`.
- `pagination::Pagination`: If a pagination object is passed, the return type changes to `PaginatedTable`.
The `Pagination` object may be created with keywords `rows` and/or `cols`.
These must be set to `Int`s that determine how many group sections along each side are included in one page.
@@ -236,14 +240,14 @@ Create a listing table `Table` from `table` which displays raw values from colum
## Keyword arguments
-- `rows = []`: Grouping structure along the rows. Should be a `Vector` where each element is a grouping variable, specified as a `Symbol` such as `:Column1`, or a `Pair`, where the first element is the symbol and the second a display name, such as `:Column1 => "Column 1"`. Specifying multiple grouping variables creates nested groups, with the last variable changing the fastest.
+- `rows = []`: Grouping structure along the rows. Should be a `Vector` where each element is a grouping variable, specified as a `Symbol` or `String` such as `:Column1`, or a `Pair`, where the first element is the symbol and the second a display name, such as `:Column1 => "Column 1"`. Specifying multiple grouping variables creates nested groups, with the last variable changing the fastest.
- `cols = []`: Grouping structure along the columns. Follows the same structure as `rows`.
- `summarize_rows = []`: Specifies functions to summarize `variable` with along the rows.
Should be a `Vector`, where each entry is one separate summary.
Each summary can be given as a `Function` such as `mean` or `maximum`, in which case the display name is the function's name.
Alternatively, a display name can be given using the pair syntax, such as `mean => "Average"`.
By default, one summary is computed over all groups.
-You can also pass `Symbol => [...]` where `Symbol` is a grouping column, to compute one summary for each level of that group.
+You can also pass `name => [...]` where name, either a `Symbol` or `String`, is a grouping column, to compute one summary for each level of that group.
- `summarize_cols = []`: Specifies functions to summarize `variable` with along the columns. Follows the same structure as `summarize_rows`.
- `variable_header = true`: Controls if the cell with the name of the summarized `variable` is shown.
- `sort = true`: Sort the input table before grouping. Pre-sort as desired and set to `false` when you want to maintain a specific group order or are using non-sortable objects as group keys.
@@ -725,17 +729,17 @@ Create a summary table `Table` from `table`, which summarizes values from column
## Arguments
- `table`: Data source which must be convertible to a `DataFrames.DataFrame`.
-- `variable`: Determines which variable from `table` is summarized. Can either be a `Symbol` such as `:ColumnA`, or alternatively a `Pair` where the second element is the display name, such as `:ColumnA => "Column A"`.
+- `variable`: Determines which variable from `table` is summarized. Can either be a `Symbol` or `String` such as `:ColumnA`, or alternatively a `Pair` where the second element is the display name, such as `:ColumnA => "Column A"`.
## Keyword arguments
-- `rows = []`: Grouping structure along the rows. Should be a `Vector` where each element is a grouping variable, specified as a `Symbol` such as `:Column1`, or a `Pair`, where the first element is the symbol and the second a display name, such as `:Column1 => "Column 1"`. Specifying multiple grouping variables creates nested groups, with the last variable changing the fastest.
+- `rows = []`: Grouping structure along the rows. Should be a `Vector` where each element is a grouping variable, specified as a `Symbol` or `String` such as `:Column1`, or a `Pair`, where the first element is the symbol and the second a display name, such as `:Column1 => "Column 1"`. Specifying multiple grouping variables creates nested groups, with the last variable changing the fastest.
- `cols = []`: Grouping structure along the columns. Follows the same structure as `rows`.
- `summary = []`: Specifies functions to summarize `variable` with.
Should be a `Vector`, where each entry is one separate summary.
Each summary can be given as a `Function` such as `mean` or `maximum`, in which case the display name is the function's name.
Alternatively, a display name can be given using the pair syntax, such as `mean => "Average"`.
By default, one summary is computed over all groups.
-You can also pass `Symbol => [...]` where `Symbol` is a grouping column, to compute one summary for each level of that group.
+You can also pass `name => [...]` where name, either a `Symbol` or `String`, is a grouping column, to compute one summary for each level of that group.
- `variable_header = true`: Controls if the cell with the name of the summarized `variable` is shown.
- `sort = true`: Sort the input table before grouping. Pre-sort as desired and set to `false` when you want to maintain a specific group order or are using non-sortable objects as group keys.
11 changes: 8 additions & 3 deletions src/table_one.jl
@@ -88,7 +88,7 @@ function Analysis(s::Symbol, df::DataFrames.DataFrame)
Analysis(s, default_analysis(df[!, s]), string(s))
end

-function Analysis(p::Pair{Symbol, <:Any}, df::DataFrames.DataFrame)
+function Analysis(p::Pair{<:Union{Symbol,String}, <:Any}, df::DataFrames.DataFrame)
sym, rest = p
Analysis(sym, rest, df)
end
@@ -114,6 +114,10 @@ function Analysis(sym::Symbol, p::Pair, df::DataFrames.DataFrame)
Analysis(sym, funcs, name, df)
end

+function Analysis(sym::String, args...)
+Analysis(Symbol(sym), args...)
+end

make_analyses(v::AbstractVector, df::DataFrame) = map(x -> Analysis(x, df), v)
make_analyses(x, df::DataFrame) = [Analysis(x, df)]

@@ -226,7 +230,7 @@ can be stratified by one, or more, variables using the `groupby` keyword.
- `tests`: A `NamedTuple` of hypothesis test types to use for `categorical`, `nonnormal`, `minmax`, and `normal` variables.
- `combine`: An object from `MultipleTesting` to use when combining p-values.
- `show_total`: Display the total column summary. Default is `true`.
-- `group_totals`: A group `Symbol` or vector of symbols specifying for which group levels totals should be added. Any group levels but the topmost can be chosen (the topmost being already handled by the `show_total` option). Default is `Symbol[]`.
+- `group_totals`: A group `Symbol` or `String` or vector of symbols/strings specifying for which group levels totals should be added. Any group levels but the topmost can be chosen (the topmost being already handled by the `show_total` option). Default is `Symbol[]`.
- `total_name`: The name for all total columns. Default is `"Total"`.
- `show_n`: Display the number of rows for each group key next to its label.
- `show_pvalues`: Display the `P-Value` column. Default is `false`.
@@ -292,8 +296,9 @@ function table_one(

groupsymbols = [g.symbol for g in groups]

-_group_totals(a::AbstractVector{Symbol}) = collect(a)
+_group_totals(a::AbstractVector{<:Union{String,Symbol}}) = Symbol.(a)
_group_totals(s::Symbol) = [s]
+_group_totals(s::String) = [Symbol(s)]
group_totals = _group_totals(group_totals)
if !isempty(groupsymbols) && first(groupsymbols) in group_totals
throw(ArgumentError("Cannot show totals for topmost group $(repr(first(groupsymbols))) as it would be equivalent to the `show_total` option. Grouping is $groupsymbols"))
12 changes: 6 additions & 6 deletions test/runtests.jl
@@ -188,7 +188,7 @@ end
t = table_one(df, [:value1, :value2], groupby = [:group1, :group2, :group3], group_totals = [:group3], show_n = true)
reftest(t, "references/table_one/group_totals_three_groups_one_total_level_three")

-t = table_one(df, [:value1, :value2], groupby = [:group1, :group2, :group3], group_totals = :group2, show_n = true)
+t = table_one(df, ["value1", "value2"], groupby = ["group1", "group2", "group3"], group_totals = "group2", show_n = true)
reftest(t, "references/table_one/group_totals_three_groups_one_total_level_two")

function summarizer(col)
@@ -284,10 +284,10 @@ end
)
reftest(t, "references/listingtable/summarize_last_group_rows")

-t = listingtable(df, :value1,
-rows = [:group1, :group2],
-cols = [:group3],
-summarize_rows = :group1 => [mean]
+t = listingtable(df, "value1",
+rows = ["group1", "group2"],
+cols = ["group3"],
+summarize_rows = "group1" => [mean]
)
reftest(t, "references/listingtable/summarize_first_group_rows")

@@ -419,7 +419,7 @@ end
t = summarytable(df, :value1, rows = [:group1 => "Group 1", :group2], cols = [:group3 => "Group 3"], summary = [mean, std])
reftest(t, "references/summarytable/two_rowgroups_one_colgroup_two_summaries")

-t = summarytable(df, :value1, rows = [:group1 => "Group 1", :group2], cols = [:group3 => "Group 3"], summary = [mean, std], variable_header = false)
+t = summarytable(df, "value1", rows = ["group1" => "Group 1", "group2"], cols = ["group3" => "Group 3"], summary = [mean, std], variable_header = false)
reftest(t, "references/summarytable/two_rowgroups_one_colgroup_two_summaries_no_header")

t = summarytable(df, :value1, summary = [mean, mean])