Rework "Normalize time series data intervals" #19

Open
amotl opened this issue Apr 27, 2021 · 6 comments
Labels
help wanted Extra attention is needed

@amotl
Member

amotl commented Apr 27, 2021

Hi there,

after reading @hammerhead's excellent Interpolation of missing time series values, I wonder if we should overhaul or even remove the Interpolate missing records section within the Normalize time series data intervals document.

Then, this document would be more like a Unix component in the spirit of "do one thing and do it well", and I would rename it to "Processing and plotting timeseries-data with CrateDB, Pandas and Matplotlib" to better reflect its content.

What do you think about this?

With kind regards,
Andreas.

cc @proddata, @NOROSA

@proddata
Member

That was the intention, to maybe integrate it into the tutorial as next step :)

@nomicode

nomicode commented Apr 27, 2021

I won't fight this work at all. the more tutorials the better

however, I don't think it's accurate to say that one is more about Pandas and Matplotlib and the other is (implicitly) more direct or otherwise somehow "closer" to the topic. both of them present the data in different ways. the new one uses screenshots of tables :) -- just another form of data representation

both are good! but I am not sure I would elevate one over the other in terms of directness. the reason I went with graphical plotting is that I think it helps to see the data. I believe that most people are visual learners (but I may be misremembering the literature on this)

@proddata
Member

@NOROSA
What I discussed with @hammerhead was less about the data presentation and more about bringing LOCF or NOCB into the normalize time series data tutorial, instead of or in addition to filling gaps with null 😉
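
For readers unfamiliar with the two terms: LOCF (last observation carried forward) fills a gap with the most recent earlier value, while NOCB (next observation carried backward) fills it with the next later value. A minimal pure-Python sketch of both, independent of CrateDB and Pandas; the function names `locf` and `nocb` and the sample readings are made up for illustration:

```python
from typing import List, Optional, Sequence


def locf(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: replace None with the previous known value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        # Leading gaps stay None because there is nothing to carry forward yet.
        filled.append(last)
    return filled


def nocb(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Next observation carried backward: LOCF applied to the reversed series."""
    return list(reversed(locf(list(reversed(values)))))


# Hypothetical readings with gaps (None stands in for NULL).
readings = [None, 1.0, None, None, 4.0, None]
locf_result = locf(readings)
nocb_result = nocb(readings)
```

Note that LOCF cannot fill a gap at the very start of a series and NOCB cannot fill one at the very end, which is one reason a tutorial might want to show both.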

@nomicode

@proddata I see. that is a meaningful distinction yes! seems to me that these tutorials could be presented side by side as different-but-related tutorials

@hammerhead
Member

We can also remove the community article again at some later point in time if we decide to merge that information into the main documentation. For now, it was really just a quick way to get the information out to the public, because users ask about this topic frequently. I won't mind if anyone takes the content of the article and works it into the existing tutorial 👍.

I also just created a ticket to improve the backend support for time series interpolation. Depending on when/if this is going to be implemented, that might also be a good point in time to revisit the documentation situation.

@amotl
Member Author

amotl commented Sep 20, 2023

Hi again,

do we have canonical tutorial-like resources, for example on the community forum or on the blog, which describe how to solve the LOCF/NOCB problem with CrateDB, using LAG/LEAD functions, which could be taken into consideration when modernizing the corresponding documentation sections?

I can see that a few query examples on the website are using LAG already, and that @andnig shared an example at 03-anomaly-detection.md:

Last Observation Carried Forward

More often than not, you are working with time series data using different sampling intervals. When
processing such data, you will most likely run into situations where you will have gaps in your data,
mostly represented by non-value symbols like NULL or NaN.

image

To fill these gaps, you can use CrateDB's LAG function with its IGNORE NULLS option.

SELECT "time",
    COALESCE(battery_level, LAG(battery_level) IGNORE NULLS OVER w) AS battery_level,
    COALESCE(battery_status, LAG(battery_status) IGNORE NULLS OVER w) AS battery_status,
    COALESCE(battery_temperature, LAG(battery_temperature) IGNORE NULLS OVER w) AS battery_temperature
FROM machine_data
WINDOW w AS (ORDER BY "time")
ORDER BY "time";
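
To make the intent of the query explicit: for each column, `COALESCE(col, LAG(col) IGNORE NULLS OVER w)` keeps the row's own value when present and otherwise falls back to the latest earlier non-NULL value in "time" order. A rough Python emulation of that logic follows; it is only a sketch, the helper name `fill_forward` and the sample rows are invented, and it does not talk to CrateDB at all:

```python
from typing import Any, Dict, List


def fill_forward(rows: List[Dict[str, Any]], columns: List[str]) -> List[Dict[str, Any]]:
    """Emulate COALESCE(col, LAG(col) IGNORE NULLS OVER (ORDER BY "time")) per column."""
    last_seen: Dict[str, Any] = {}
    result: List[Dict[str, Any]] = []
    # The window and the final ORDER BY both sort by "time".
    for row in sorted(rows, key=lambda r: r["time"]):
        filled = dict(row)
        for col in columns:
            if filled[col] is None:
                # Gap: take the latest earlier non-NULL value, if there is one.
                filled[col] = last_seen.get(col)
            else:
                last_seen[col] = filled[col]
        result.append(filled)
    return result


# Invented sample rows, deliberately out of order and with gaps.
rows = [
    {"time": 3, "battery_level": None, "battery_status": "charging"},
    {"time": 1, "battery_level": 80, "battery_status": "discharging"},
    {"time": 2, "battery_level": None, "battery_status": None},
]
filled = fill_forward(rows, ["battery_level", "battery_status"])
```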

Do you know about any other resources at cratedb.com, properly educating people about this?

With kind regards,
Andreas.

/cc @marijaselakovic, @karynzv, @hlcianfagna, @matriv, @seut

@amotl transferred this issue from crate/crate-tutorials on Feb 20, 2024
@amotl added the "good first issue" and "help wanted" labels and removed the "good first issue" label on Feb 26, 2024