
Float timeseries #98

Merged
merged 15 commits into from
May 10, 2024

Conversation

KaraMelih
Collaborator

@KaraMelih KaraMelih commented Apr 8, 2024

Aimed at fixing #87.
Also fixes #97 (moved this fix to #99).

Previously, we expected ISO-formattable time strings for all entries in timing_series.

However, this list can easily become too large. As suggested by others, this PR implements a float conversion.

It first checks if all the values are strings or floats/integers.

  • If they are strings, it converts them to numpy datetime objects, sorts them by value, and computes the relative time deltas from the first value in the list.
  • If they are floats/integers, it also checks whether the "neutrino_time" is explicitly given.
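
The string branch described above can be sketched roughly as follows. This is a minimal illustration with a hypothetical helper name (to_relative_ns), not the PR's actual code; it only assumes numpy is available:

```python
import numpy as np

def to_relative_ns(timing_series):
    """Parse ISO time strings, sort them, and return nanosecond
    offsets relative to the earliest entry (which becomes 0)."""
    times = np.sort(np.array(timing_series, dtype="datetime64[ns]"))
    return (times - times[0]).astype(int).tolist()

# The first value is always 0 after sorting; later entries are ns deltas.
print(to_relative_ns(["2024-01-01T12:00:00.000000001",
                      "2024-01-01T12:00:00.000000000"]))
```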

Right now, this part of the logic is missing. I do not know whether we should always require the initial neutrino time as well, or whether this should be fetched from the Coincidence Tier by Time Tier users.

I imagine a time series starting at 0 is not very useful if we do not know what the 0 refers to.

@KaraMelih
Collaborator Author

Reverted the firedrill fix (ee7f062), as I added that in a minor patch

@KaraMelih
Collaborator Author

@sybenzvi @mcolomermolla

I want to ask about the expected information in the timing tier. Currently, this PR allows for relative-to-first neutrino times with ns precision. Under this logic, the first value is always zero and the rest are relative to it.

In this case, do we also want to require the initial "neutrino_time" as a string (corresponding to the zeroth time), or do we expect that people looking at the TimeTier will fetch this from the CoincidenceTier?

Right now, if you pass

neutrino_time = "2024-01-01T12:00:00.00000000"
timing_series = [0, 119781135000, 119881135000, 179781135000, 248890124000]

the neutrino_times will be parsed and sent under CoincidenceTier message (timing series ignored) and

the timing_series will be parsed and sent under TimingTier message (neutrino time ignored).

Is this what we want, or do we want to also always see the neutrino_time in the timing tier too?

@KaraMelih KaraMelih marked this pull request as ready for review May 7, 2024 13:40
@KaraMelih KaraMelih requested a review from sybenzvi May 7, 2024 13:41
@KaraMelih
Collaborator Author

I added 'neutrino_time' as a required field for the time tier messages.

Now, it also allows either a list of strings with individual neutrino times, or a list of integers indicating the relative times from the initial neutrino time with nanosecond precision.

In the former case, it computes the relative times from the initial neutrino time itself and creates a list of relative times. So the submitted message always contains:

  • 'neutrino_time' : string, time of the first neutrino event
  • 'timing_series' : list of integers, nanosecond precision time differences from the initial neutrino time
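
Both accepted input forms normalizing to that shape can be sketched like this. This is a rough illustration with a hypothetical helper name (normalize_timing), not the snews_pt implementation:

```python
import numpy as np

def normalize_timing(neutrino_time, timing_series):
    """Normalize both accepted forms of timing_series to nanosecond
    offsets from the initial neutrino_time."""
    if all(isinstance(t, str) for t in timing_series):
        # Strings: parse and subtract the initial neutrino time.
        t0 = np.datetime64(neutrino_time, "ns")
        times = np.array(timing_series, dtype="datetime64[ns]")
        offsets = (times - t0).astype(int).tolist()
    else:
        # Numbers: taken to already be relative offsets in nanoseconds.
        offsets = [int(t) for t in timing_series]
    return {"neutrino_time": neutrino_time, "timing_series": offsets}

msg = normalize_timing("2024-01-01T12:00:00.000000000",
                       ["2024-01-01T12:00:00.000000000",
                        "2024-01-01T12:00:00.000000100"])
```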

@Storreslara Storreslara merged commit 4da6619 into main May 10, 2024
2 checks passed
@KaraMelih KaraMelih deleted the float_timeseries branch May 10, 2024 12:34
@habig
Contributor

habig commented May 10, 2024

Have we been able to reproduce #87? If there's an unexpected hard limit on message size, we need to take that to scimma as a bug, since the original specs were "sure, you guys can move anything around, including big skymaps".

@KaraMelih
Collaborator Author

I just tried, and this fails:

import numpy as np
numbers = np.linspace(1e9, 9e9, 100000).astype(int)

tims = SNEWSMessageBuilder(detector_name='XENONnT',
                           neutrino_time='2012-06-09T15:31:08.109876',
                           timing_series=numbers.tolist(),
                           machine_time='2012-06-09T15:30:00.009876',
                           firedrill_mode=False, is_test=True)
tims.send_messages()

Here is the error that it raises:

KafkaException: KafkaError{code=MSG_SIZE_TOO_LARGE,val=10,str="Unable to produce message: Broker: Message size too large"}

It was fine with 10k integers; it crashes at 100k.

@habig
Contributor

habig commented May 10, 2024

Great, thanks. If we convert that to bytes (since we're sending in ASCII, one number is a lot of characters), what does that look like? I've got the attention of the scimma devs at the moment.
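
A quick back-of-envelope for that question, using the failing 100k-entry series from the example above (this assumes the payload is JSON-encoded text, which is an assumption about the wire format):

```python
import json
import numpy as np

# The 100k-entry series that triggered MSG_SIZE_TOO_LARGE above.
numbers = np.linspace(1e9, 9e9, 100_000).astype(int)

# As ASCII/JSON: each 10-digit number plus separators costs ~12 bytes,
# so the list alone lands around 1.2 MB.
ascii_size = len(json.dumps(numbers.tolist()).encode())

# As packed binary: a fixed 8 bytes per int64, i.e. exactly 800 kB.
binary_size = len(numbers.astype(np.int64).tobytes())

print(ascii_size, binary_size)
```

So the ASCII encoding alone costs roughly 50% more than a packed-int64 representation for this series, before any broker or header overhead.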

@habig
Contributor

habig commented May 10, 2024

They're not opposed to discussing it on Slack, but suggested their ticketing system: https://support.scimma.org/

@justinvasel
Contributor

See #87 (comment) for a potential solution.

@habig
Contributor

habig commented May 13, 2024

The scimma people report that the default max message size is 1 MB. They're leery of increasing it, to head off future scaling problems, and are working on a large-file offload service. But now that we know this, we can plan for it, with efficiencies like this PR helping. We could imagine daisy-chaining things together like SMS messages that go over 140 characters, too.
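
The SMS-style daisy-chaining idea could look something like this. A minimal sketch with a hypothetical helper (chunk_timing_series) and an assumed part/of envelope; snews_pt has no such mechanism today:

```python
def chunk_timing_series(series, max_len=10_000):
    """Split a long timing_series into consecutive chunks, tagging each
    with its position so the receiver can reassemble them in order."""
    chunks = [series[i:i + max_len] for i in range(0, len(series), max_len)]
    return [{"part": i + 1, "of": len(chunks), "timing_series": chunk}
            for i, chunk in enumerate(chunks)]

# 25 entries with max_len=10 -> three parts of sizes 10, 10, 5.
parts = chunk_timing_series(list(range(25)), max_len=10)
```

The max_len here is a placeholder; a real cutoff would be derived from the broker's 1 MB limit and the serialized size per entry.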

Successfully merging this pull request may close these issues.

Corrections on the CLI
4 participants