PPL command implementation for appendCol #990

Merged
merged 64 commits into from
Jan 8, 2025
760ec87
Update grammar def
andy-k-improving Dec 6, 2024
c0e341e
Skeleton for Append fields
andy-k-improving Dec 6, 2024
6441a2e
Visitor skeleton
andy-k-improving Dec 7, 2024
9994704
Update import
andy-k-improving Dec 9, 2024
5d7f55f
Update import
andy-k-improving Dec 9, 2024
11394a7
Update osrt
andy-k-improving Dec 10, 2024
16cb357
Changes
andy-k-improving Dec 11, 2024
ae5568d
Consolidate String constant
andy-k-improving Dec 11, 2024
4440d87
Update projection clause
andy-k-improving Dec 11, 2024
81065a6
Remove dep on parent method
andy-k-improving Dec 11, 2024
03d38cc
Consolidate relation inject logic
andy-k-improving Dec 11, 2024
eb7bb74
Move constant
andy-k-improving Dec 11, 2024
eae0eeb
Move out constant from lambda
andy-k-improving Dec 11, 2024
bf45860
Consolidate method
andy-k-improving Dec 11, 2024
8d25c77
Update logic
andy-k-improving Dec 11, 2024
b10adbe
Test 1 2
andy-k-improving Dec 12, 2024
696b98e
Test-cases 3 and 4
andy-k-improving Dec 13, 2024
852e8b4
Update code format
andy-k-improving Dec 13, 2024
0b1d908
Update code style
andy-k-improving Dec 13, 2024
5dff8d4
Update scala syntax
andy-k-improving Dec 13, 2024
b2bb270
Override option
andy-k-improving Dec 13, 2024
a6ded0a
Update override option
andy-k-improving Dec 13, 2024
1077596
Enable override option
andy-k-improving Dec 13, 2024
4e590de
Override impl
andy-k-improving Dec 14, 2024
59df098
Minimise cmd permission
andy-k-improving Dec 14, 2024
e1bc98a
Refactor util class
andy-k-improving Dec 14, 2024
53fa48a
Java doc
andy-k-improving Dec 14, 2024
6198c79
Integ test 1 2
andy-k-improving Dec 14, 2024
cfc9dfb
Test cases 3 4
andy-k-improving Dec 14, 2024
bf0829d
Test code comments
andy-k-improving Dec 16, 2024
37baa76
Code tidy
andy-k-improving Dec 16, 2024
1f97167
Code refactor
andy-k-improving Dec 16, 2024
e34e731
ScalaFmt
andy-k-improving Dec 16, 2024
27f33b6
Remove sout
andy-k-improving Dec 16, 2024
040b889
Update doc
andy-k-improving Dec 16, 2024
4735970
Override option test case
andy-k-improving Dec 16, 2024
e33f7ed
Code style
andy-k-improving Dec 16, 2024
b8b1fbf
Code comment
andy-k-improving Dec 17, 2024
833561b
Deprecate visit child (1)
andy-k-improving Dec 17, 2024
3c3177b
Minimise code diff
andy-k-improving Dec 17, 2024
993e87d
Update override logic
andy-k-improving Dec 19, 2024
27c46b7
Update test-cases
andy-k-improving Dec 19, 2024
12f7817
Integ
andy-k-improving Dec 20, 2024
c2a6c22
Make append alias distinct
andy-k-improving Dec 20, 2024
e2c8875
Update integ for distinct tables
andy-k-improving Dec 20, 2024
beddbd8
Update limitation
andy-k-improving Dec 20, 2024
1d06bbd
Update code style
andy-k-improving Dec 20, 2024
ebf4078
Update docs/ppl-lang/ppl-appendcol-command.md
andy-k-improving Dec 20, 2024
03dfcfb
Update docs/ppl-lang/ppl-appendcol-command.md
andy-k-improving Dec 20, 2024
f85405f
Update docs/ppl-lang/ppl-appendcol-command.md
andy-k-improving Dec 20, 2024
5030f0c
Update readme
andy-k-improving Dec 20, 2024
7d2c36a
Mark var as final
andy-k-improving Dec 20, 2024
c5a6649
Update join type
andy-k-improving Jan 2, 2025
2b36884
Update unit tests
andy-k-improving Jan 2, 2025
e1c7fc4
Update existing integ test for full outer
andy-k-improving Jan 2, 2025
2fdbadd
Test cases for null
andy-k-improving Jan 3, 2025
d6a29fb
Update scalafmt
andy-k-improving Jan 3, 2025
45a22d2
Update doc
andy-k-improving Jan 3, 2025
0c8e3d1
Update doc
andy-k-improving Jan 3, 2025
820df4e
Multiple avg commands
andy-k-improving Jan 3, 2025
bf5b73a
Multiple avg commands
andy-k-improving Jan 3, 2025
bcfef7a
Remove debug
andy-k-improving Jan 3, 2025
05c9dff
Additional example for conflicted columns
andy-k-improving Jan 7, 2025
8e43db3
Code refactor
andy-k-improving Jan 8, 2025
5 changes: 5 additions & 0 deletions docs/ppl-lang/PPL-Example-Commands.md
@@ -1,5 +1,10 @@
## Example PPL Queries

#### **AppendCol**
[See additional command details](ppl-appendcol-command.md)
- `source=employees | stats avg(age) as avg_age1 by dept | fields dept, avg_age1 | APPENDCOL [ stats avg(age) as avg_age2 by dept | fields avg_age2 ];` (To display multiple table statistics side by side)
- `source=employees | FIELDS name, dept, age | APPENDCOL OVERRIDE=true [ stats avg(age) as age ];` (When the override option is enabled, fields from the sub-query take precedence over fields in the main query in cases of field name conflicts)

#### **Comment**
[See additional command details](ppl-comment.md)
- `source=accounts | top gender // finds most common gender of all the accounts` (line comment)
2 changes: 2 additions & 0 deletions docs/ppl-lang/README.md
@@ -74,6 +74,8 @@ For additional examples see the next [documentation](PPL-Example-Commands.md).

- [`expand commands`](ppl-expand-command.md)

- [`appendcol command`](ppl-appendcol-command.md)

* **Functions**

- [`Expressions`](functions/ppl-expressions.md)
120 changes: 120 additions & 0 deletions docs/ppl-lang/ppl-appendcol-command.md
@@ -0,0 +1,120 @@
## PPL `appendcol` command

### Description
Use the `appendcol` command to run a sub-search and append its result columns alongside the input search results (the main search).

### Syntax - APPENDCOL
`APPENDCOL <override=?> [sub-search]...`

* <override=?>: optional boolean. Specifies whether columns from the main search are overwritten by sub-search columns of the same name when column names conflict. When it is false or absent, the conflicting columns are all kept.
* sub-search: Executes PPL commands as a secondary search. The sub-search takes as its input the same data specified in the source clause of the main search.


#### Example 1: Append the result of `stats avg(age) as AVG_AGE` to the existing search result

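This example appends the result of the sub-search `stats avg(age) as AVG_AGE` alongside the main search. Because the sub-search returns a single row, `AVG_AGE` is populated only on the first row and is `NULL` on the remaining rows.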

PPL query:

os> source=employees | FIELDS name, dept, age | APPENDCOL [ stats avg(age) as AVG_AGE ];
fetched rows / total rows = 9/9
+------+-------------+-----+------------------+
| name | dept | age | AVG_AGE |
+------+-------------+-----+------------------+
| Lisa | Sales | 35 | 31.2222222222222 |
| Fred | Engineering | 28 | NULL |
| Paul | Engineering | 23 | NULL |
| Evan | Sales | 38 | NULL |
| Chloe| Engineering | 25 | NULL |
| Tom | Engineering | 33 | NULL |
| Alex | Sales | 33 | NULL |
| Jane | Marketing | 28 | NULL |
| Jeff | Marketing | 38 | NULL |
+------+-------------+-----+------------------+
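
The `NULL` padding in the table above can be modeled outside PPL. The following is a minimal Python sketch, not the implementation in this PR: it assumes rows are paired purely by position and that the shorter side is padded with `None`, which is what the output above suggests. The helper name `append_col` and the three sample rows are hypothetical.

```python
from itertools import zip_longest


def append_col(main_rows, sub_rows):
    """Pair rows by position; pad whichever side runs out with None (NULL)."""
    main_cols = list(main_rows[0].keys()) if main_rows else []
    sub_cols = list(sub_rows[0].keys()) if sub_rows else []
    combined = []
    for main_row, sub_row in zip_longest(main_rows, sub_rows):
        # Start from the main-search columns, then attach the sub-search columns.
        row = {c: (main_row[c] if main_row is not None else None) for c in main_cols}
        row.update({c: (sub_row[c] if sub_row is not None else None) for c in sub_cols})
        combined.append(row)
    return combined


# Hypothetical subset of the employees data used in the example above.
main = [
    {"name": "Lisa", "age": 35},
    {"name": "Fred", "age": 28},
    {"name": "Paul", "age": 23},
]
# A single-row aggregate, like `stats avg(age) as AVG_AGE`.
sub = [{"AVG_AGE": (35 + 28 + 23) / 3}]

result = append_col(main, sub)
for row in result:
    print(row)
```

Only the first combined row carries a value for `AVG_AGE`; the other rows get `None`, mirroring the `NULL` cells in the table.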


#### Example 2: Compare multiple stats commands side by side with appendCol

This example demonstrates a common use case: performing multiple statistical calculations and displaying the results side by side in a horizontal layout.

PPL query:

os> source=employees | stats avg(age) as avg_age1 by dept | fields dept, avg_age1 | APPENDCOL [ stats avg(age) as avg_age2 by dept | fields avg_age2 ];
fetched rows / total rows = 3/3
+-------------+-----------+----------+
| dept | avg_age1 | avg_age2 |
+-------------+-----------+----------+
| Engineering | 27.25 | 27.25 |
| Sales | 35.33 | 35.33 |
| Marketing | 33.00 | 33.00 |
+-------------+-----------+----------+


#### Example 3: Append multiple sub-search results

This example demonstrates that multiple APPENDCOL commands can be chained to provide one comprehensive view for the user.

PPL query:

os> source=employees | FIELDS name, dept, age | APPENDCOL [ stats avg(age) as AVG_AGE ] | APPENDCOL [ stats max(age) as MAX_AGE ];
fetched rows / total rows = 9/9
+------+-------------+-----+------------------+---------+
| name | dept | age | AVG_AGE | MAX_AGE |
+------+-------------+-----+------------------+---------+
| Lisa | Sales | 35 | 31.22222222222222| 38 |
| Fred | Engineering | 28 | NULL | NULL |
| Paul | Engineering | 23 | NULL | NULL |
| Evan | Sales | 38 | NULL | NULL |
| Chloe| Engineering | 25 | NULL | NULL |
| Tom | Engineering | 33 | NULL | NULL |
| Alex | Sales | 33 | NULL | NULL |
| Jane | Marketing | 28 | NULL | NULL |
| Jeff | Marketing | 38 | NULL | NULL |
+------+-------------+-----+------------------+---------+

#### Example 4: Override the main search in the case of a column name conflict

This example demonstrates the `OVERRIDE` option. When it is set to true and the sub-search also returns a column named `age`,
the `age` column from the main search is overwritten by the one from the sub-search.

PPL query:

os> source=employees | FIELDS name, dept, age | APPENDCOL OVERRIDE=true [ stats avg(age) as age ];
fetched rows / total rows = 9/9
+------+-------------+------------------+
| name | dept | age |
+------+-------------+------------------+
| Lisa | Sales | 31.22222222222222|
| Fred | Engineering | NULL |
| Paul | Engineering | NULL |
| Evan | Sales | NULL |
| Chloe| Engineering | NULL |
| Tom | Engineering | NULL |
| Alex | Sales | NULL |
| Jane | Marketing | NULL |
| Jeff | Marketing | NULL |
+------+-------------+------------------+
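
The effect of `OVERRIDE` on the output columns can be sketched in Python. This is an illustration under stated assumptions, not the code from this PR: rows are paired by position, a conflicting main-search column is dropped when `override` is true, and all columns are kept otherwise. The function `append_col`, its parameters, and the two sample rows are hypothetical.

```python
from itertools import zip_longest


def append_col(main_cols, main_rows, sub_cols, sub_rows, override=False):
    """Return (columns, rows) after attaching sub-search columns to the main search."""
    if override:
        # Drop main-search columns whose names also appear in the sub-search.
        kept = [i for i, col in enumerate(main_cols) if col not in sub_cols]
    else:
        # Keep everything, so conflicting names show up twice.
        kept = list(range(len(main_cols)))
    columns = [main_cols[i] for i in kept] + list(sub_cols)
    rows = []
    for main_row, sub_row in zip_longest(main_rows, sub_rows):
        left = [main_row[i] if main_row is not None else None for i in kept]
        right = list(sub_row) if sub_row is not None else [None] * len(sub_cols)
        rows.append(left + right)
    return columns, rows


# Hypothetical two-row subset of the employees data.
MAIN_COLS = ["name", "dept", "age"]
MAIN_ROWS = [("Lisa", "Sales", 35), ("Fred", "Engineering", 28)]
# Single-row aggregate that reuses the column name `age`.
SUB_COLS = ["age"]
SUB_ROWS = [((35 + 28) / 2,)]

cols_on, rows_on = append_col(MAIN_COLS, MAIN_ROWS, SUB_COLS, SUB_ROWS, override=True)
cols_off, rows_off = append_col(MAIN_COLS, MAIN_ROWS, SUB_COLS, SUB_ROWS, override=False)
print(cols_on, rows_on)
print(cols_off, rows_off)
```

With `override=True` the result has a single `age` column holding the sub-search value; with `override=False` both `age` columns are present.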

#### Example 5: AppendCol command with duplicated columns

This example demonstrates what happens when conflicting columns exist and `override` is set to false or absent.
Here, both the main query and the sub-query compute the average of `age` grouped by `dept`.
As a result, the main query returns `avg_age1` and `dept`, and the sub-query returns `avg_age2` and `dept`.
Because `override` is absent, the duplicated `dept` column is not dropped, so all four columns appear in the final result.

PPL query:

os> source=employees | stats avg(age) as avg_age1 by dept | APPENDCOL [ stats avg(age) as avg_age2 by dept ];
fetched rows / total rows = 3/3
+------------+--------------+------------+--------------+
| avg_age1 | dept | avg_age2 | dept |
+------------+--------------+------------+--------------+
| 35.33 | Sales | 35.33 | Sales |
| 27.25 | Engineering | 27.25 | Engineering |
| 33.00 | Marketing | 33.00 | Marketing |
+------------+--------------+------------+--------------+


### Limitation:
When override is set to true, only `FIELDS` and `STATS` commands are allowed as the final clause in a sub-search.
Otherwise, an IllegalStateException with the message `Not Supported operation: APPENDCOL should specify the output fields` will be thrown.
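
The rule can be illustrated with a small Python check. This is a hypothetical sketch, not the validation performed by the PR: it splits the sub-search naively on `|` (so it would mishandle a pipe inside a quoted string), and it raises Python's `RuntimeError` as a stand-in for the `IllegalStateException` named above. The function name `check_override_subsearch` is an assumption.

```python
ALLOWED_FINAL_COMMANDS = {"fields", "stats"}
ERROR_MESSAGE = "Not Supported operation: APPENDCOL should specify the output fields"


def check_override_subsearch(sub_search: str, override: bool) -> None:
    """Raise if override is on and the sub-search does not end in FIELDS or STATS."""
    if not override:
        return  # The restriction applies only when override is true.
    # Naive split: assumes '|' never appears inside a string literal.
    last_clause = sub_search.split("|")[-1].strip()
    command = last_clause.split(None, 1)[0].lower() if last_clause else ""
    if command not in ALLOWED_FINAL_COMMANDS:
        raise RuntimeError(ERROR_MESSAGE)


# Accepted: the final clause is STATS.
check_override_subsearch("stats avg(age) as age", override=True)

# Rejected: the final clause is WHERE, so the output fields are not explicit.
try:
    check_override_subsearch("stats avg(age) as age | where age > 30", override=True)
except RuntimeError as err:
    print(err)
```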