Hi! I want to propose a new feature for DBSTREAM.

DBSTREAM uses a protected internal timer (`_time_stamp`) to measure the time between learning steps. There are 2 issues with this approach.

Second, it is sometimes necessary to use the natural time at which samples arrive rather than a surrogate time. DBSTREAM does not distinguish between two scenarios in which samples arrive at

100 ms and 200 ms

100 ms and 500 ms

as long as the arrivals are sequential (no other samples arrive in between). From a business perspective, however, the difference can be large.

I propose adding a `t` (time) argument to the `learn_one` function. Then we can learn from samples using the same `t` value when they arrive simultaneously, and we can supply time to this function in any units, adjusting the `fading_factor` accordingly.
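To make the proposal concrete, here is a minimal sketch of what a time-aware `learn_one` could look like. Everything here is hypothetical — the `TimeAwareFader` class and its fields are not part of River; only the decay form `2 ** (-fading_factor * dt)` follows DBSTREAM's fading scheme:

```python
# Hypothetical sketch (not River's API): a learn_one that accepts an
# explicit time t, so fading is driven by real arrival times instead of
# an internal step counter.

class TimeAwareFader:
    def __init__(self, fading_factor=0.01):
        self.fading_factor = fading_factor
        self.weight = 0.0
        self.last_t = None

    def learn_one(self, x, t):
        # Fade the existing weight by the *real* elapsed time since the
        # previous sample; dt == 0 for simultaneous samples, so no fading.
        if self.last_t is not None:
            dt = t - self.last_t
            self.weight *= 2 ** (-self.fading_factor * dt)
        self.weight += 1.0
        self.last_t = t


# With a step counter, the gaps (100 ms, 200 ms) and (100 ms, 500 ms)
# are indistinguishable; with an explicit t they fade differently.
a = TimeAwareFader(fading_factor=0.01)
a.learn_one({}, t=100)
a.learn_one({}, t=200)   # 100 ms gap

b = TimeAwareFader(fading_factor=0.01)
b.learn_one({}, t=100)
b.learn_one({}, t=500)   # 400 ms gap -> more fading

print(a.weight > b.weight)  # → True
```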
@ShkarupaDC Hi! Sorry for taking so long to get back to you.

In the original paper, the authors designed DBSTREAM around the concept of a time step. From my understanding, this means the authors only consider the order in which data arrives, not the speed at which it arrives. This is why my implementation uses `time_stamp` to represent the parameter t from the original paper.

Moreover, in data streams in general, we usually assume that data arrives one sample at a time. So when samples arrive simultaneously, the usual approach is to treat them as separate data points that come one by one, in order. This might seem unreasonable, but we decided to implement it this way to keep DBSTREAM compatible with River's overall design language and aligned with that philosophy.

Hope this answer addresses your concerns.
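To illustrate the behaviour described above, here is a simplified sketch — an assumed step-based clock, not River's actual DBSTREAM code. The internal time stamp advances by one per `learn_one` call, so only arrival order matters, and streams with different wall-clock gaps look identical internally:

```python
# Simplified sketch of a step-based clock: the internal time stamp
# advances by one per learn_one call, so the wall-clock gap between
# samples never enters the computation.

class StepClockModel:
    def __init__(self):
        self._time_stamp = 0
        self.seen = []

    def learn_one(self, x):
        self._time_stamp += 1
        self.seen.append((self._time_stamp, x))


# Two streams with the same samples but different real gaps
fast = StepClockModel()
for x in [{"x": 1.0}, {"x": 1.1}]:   # arrived at 100 ms and 200 ms
    fast.learn_one(x)

slow = StepClockModel()
for x in [{"x": 1.0}, {"x": 1.1}]:   # arrived at 100 ms and 500 ms
    slow.learn_one(x)

# Internally both look the same: time stamps 1 and 2.
print(fast.seen == slow.seen)  # → True
```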