added test of else method using differently ordered columns #682
base: main
a06147b
to
6d235f6
Compare
@@ -0,0 +1,4 @@
{"_time":"1970-01-01T00:00:00.000000001","_subsort":0,"_key_hash":17095134351192101601,"_key":"dev","joined":{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1.0,"key":"dev"},"threads":{"text":"Thread 1","user":"UCZ4","time":1.0,"thread_ts":1.0,"key":"dev"},"non_threads":null}
Can you elaborate on what the expected results are here? These appear to be correct. If `threads` is null then `non_threads` is used as the value of `joined`.
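The fallback behaviour described in this comment can be modelled in plain Python. This is only a sketch: `else_model` is a hypothetical helper, not part of the Kaskada API, and it assumes null is represented as `None`. The rows are taken from the data shown in this PR.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def else_model(primary: Optional[T], fallback: Optional[T]) -> Optional[T]:
    """Hypothetical model: return primary unless it is null, else fallback."""
    return primary if primary is not None else fallback


# A row where thread_ts is set: `threads` is non-null, so it wins.
thread_row = {"text": "Thread 1", "user": "UCZ4", "time": 1, "thread_ts": 1.0, "key": "dev"}
joined_when_thread = else_model(thread_row, None)

# A row where thread_ts is null: `threads` is null, so `non_threads` is used.
non_thread_row = {"text": "Msg 1", "user": "U016", "time": 3, "thread_ts": None, "key": "dev"}
joined_when_no_thread = else_model(None, non_thread_row)
```

Under this model, `joined` is null only when both inputs are null.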
The expected results are correct, but the test currently fails: https://github.com/kaskada-ai/kaskada/actions/runs/5926509057/job/16068060314?pr=682#step:6:211
def record_source_slack() -> kd.sources.JsonlString:
    content = "\n".join(
        [
            """{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}""",
Shouldn't use `"""..."""` here; those are "docstrings". Lint will likely suggest changing it.
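One way to follow this suggestion is to write each JSON line as an ordinary single-quoted string literal, which avoids escaping the double quotes inside the JSON. The sketch below uses only the standard library and the two rows visible in this diff; the variable names are illustrative. Triple-quoted literals are also valid Python here, so the choice is mainly stylistic.

```python
import json

# Single-quoted literals hold JSON text without escaping the inner double quotes.
lines = [
    '{"text":"Thread 1","user":"UCZ4","time":1,"thread_ts":1,"key":"dev"}',
    '{"text":"Msg 2","user":"U016","time":4,"thread_ts":null,"key":"dev"}',
]
content = "\n".join(lines)

# Sanity check: every line parses as a standalone JSON object.
records = [json.loads(line) for line in content.splitlines()]
```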
            """{"text":"Msg 2","user":"U016","time":4,"thread_ts":null,"key":"dev"}""",
        ]
    )
    return kd.sources.JsonlString(
PyList may be even easier for this example:
source = kd.sources.PyList(
    [
        {"time": "1996-12-19T16:39:57", "text": "Thread 2", "user": "U016", "thread_ts": 1, "key": "dev"},
        ...,
    ],
    time_column_name="time",
    key_column_name="user",
)
threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_not_null())
non_threads = record_source_slack.filter(record_source_slack.col("thread_ts").is_null())

# this call re-orders the columns in the non_threads timestream
Looking at the error, it doesn't seem to care that the fields are re-ordered. Instead, it doesn't like row 3:
At positional index 2, first diff: {'user': None, 'text': None, 'time': None, 'thread_ts': None, 'key': None} != {'text': 'Msg 1', 'user': 'U016', 'time': 3, 'thread_ts': None, 'key': 'dev'}
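The two observations in this comment can be checked in plain Python: dictionary equality ignores key order, while a record whose fields are all null differs from a populated one. The sketch below is an illustration only; `first_diff_index` is a hypothetical helper, not the comparison used by the golden-test harness.

```python
from typing import Any, Optional, Sequence


def first_diff_index(actual: Sequence[Any], expected: Sequence[Any]) -> Optional[int]:
    """Return the first position where the sequences differ, or None if equal."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None


expected = {"text": "Msg 1", "user": "U016", "time": 3, "thread_ts": None, "key": "dev"}
# Same fields in a different order: still equal as Python dicts.
reordered = {"user": "U016", "text": "Msg 1", "time": 3, "thread_ts": None, "key": "dev"}
# A record whose fields are all null: not equal to the populated record.
all_null = {"user": None, "text": None, "time": None, "thread_ts": None, "key": None}

same_despite_order = reordered == expected
diff_at = first_diff_index([expected, expected, all_null], [expected, expected, expected])
```

Here the mismatch lands at positional index 2, matching the shape of the error quoted above.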
non_threads = joined.col("non_threads")

golden.jsonl(
    joined.extend({"joined": threads.else_(non_threads)})
Ah. I believe the problem here is that we're creating a new record containing `joined`. Even when `threads.else_(non_threads)` is null, this record will sometimes be non-null. Eg.
Two questions / thoughts:
- Why do we need the `joined` at all? What happens if you just output `threads.else_(non_threads)`?
- If we could do `joined.extend(threads.else_(non_threads))`, that would also help (and it is currently planned).
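The distinction between a null value and a non-null record that wraps a null value can be illustrated in plain Python. This is a simplified model under stated assumptions, not Kaskada code: `else_model` and `make_record` are hypothetical helpers, `None` stands in for null, and a `dict` stands in for a record.

```python
from typing import Any, Dict, Optional


def else_model(primary: Optional[Any], fallback: Optional[Any]) -> Optional[Any]:
    """Hypothetical model: return primary unless it is null, else fallback."""
    return primary if primary is not None else fallback


def make_record(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical model: building a record always yields a non-null container."""
    return dict(fields)


# Both inputs are null, so the else result itself is null.
inner = else_model(None, None)

# Wrapping that null in a new record produces a value that is not null,
# even though its only field is.
wrapped = make_record({"joined": inner})

inner_is_null = inner is None
wrapped_is_null = wrapped is None
```

In this model, outputting the else result directly preserves nullness, whereas wrapping it in a record does not.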
6d235f6
to
493a7a7
Compare
I was having trouble shortening this test while still being able to reproduce the bug.
Feel free to shorten it if it still reproduces the issue.
Note that I haven't fixed the issue yet; I have only added this failing test.