Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug in resample function messes up timing #545

Open
pavlis opened this issue Jun 6, 2024 · 6 comments
Open

bug in resample function messes up timing #545

pavlis opened this issue Jun 6, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@pavlis
Copy link
Collaborator

pavlis commented Jun 6, 2024

After many hours over the last 24 I have established that there is a bug in the resample function. Note also there is something wrong with our configuration as none of the docstrings found in algorithms/resample.py are in the sphnyx docomentation. That is just an annoyance, but the bug is serious. It definitely messes up timing. In the draft Session1 notebook for the upcoming short course I had a line to downsample the data from 40 to 20 sps. I was mystified why I wan't getting QC plots windowed around P that looked right. The reason was the window was always at the wrong time because the resample operator did something I don't quite understand. We are going to have to do some controlled test to figure this out. The resample operator wasn't essential for this class anyway so it is easy to remove it and not cause an issue. It may increase some run times since we will be handling twice as many samples but it should not be a problem.

@wangyinz
Copy link
Member

wangyinz commented Jun 6, 2024

oh... that sounds pretty bad. We definitely want to fix it asap.

@wangyinz
Copy link
Member

wangyinz commented Jun 6, 2024

I just got the docs added. The python api doc is not automatic. It needs to be added to the index, which I think was overlooked when resample was added.

@wangyinz wangyinz added the bug Something isn't working label Jun 20, 2024
@pavlis
Copy link
Collaborator Author

pavlis commented Oct 24, 2024

The patch yesterday resolves the timing error bug but it has a minor blemish that will impact performance. I ran a minor variant of the workflow that initiated this issue. The timing problem went away but it seems every TimeSeries passed through the processing ends up with an entry like the following the elog collections:

{
  "_id": {
    "$oid": "671a2b3cfba663610cde28fe"
  },
  "logdata": [
    {
      "job_id": 0,
      "algorithm": "resample",
      "badness": "ErrorSeverity.Complaint",
      "error_message": "sampling_rate inconsistent with 1/dt; updating to 1/dt",
      "process_id": 144912
    }
  ],
  "data_tag": "serial_preprocessed",
  "wf_TimeSeries_id": {
    "$oid": "671a2a551910a104e6c014b7"
  }
}

This is being created by this section of resample:

            mspass_object.dt = self.dt
            # Check for "sampling_rate" attribute
            if mspass_object.is_defined("sampling_rate"):
                sampling_rate = mspass_object["sampling_rate"]
                # Check if sampling_rate is consistent with 1/dt
                if abs(sampling_rate - 1.0 / mspass_object.dt) > 1e-6:
                    # Record inconsistency in error log (elog)
                    message = "sampling_rate inconsistent with 1/dt; updating to 1/dt"
                    mspass_object.elog.log_error("resample",
                                                 message,
                                                 ErrorSeverity.Complaint)
                # Update sampling_rate to 1/dt
                mspass_object["sampling_rate"] = 1.0 / mspass_object.dt
            else:
                # Set sampling_rate to 1/dt
                mspass_object["sampling_rate"] = 1.0 / mspass_object.dt

There are two problems with this section is that it doing something it shouldn't do in testing the current value of "sampling_rate". In this function the "sampling_rate" attribute should just be set to self.samprate as that is what the operator is doing. There is a similar construct in TimeSeries2Trace that is appropriate but this one is not. This operator is changing the sampling rate (aka sampling interval dt) so that value will always be definitive.

I think this construct appears multiple times in the resample module. As I said it is harmless as it doesn't do anything bad but it will impact performance and lead to database bloat when every datum gets an elog entry.

@wangyinz
Copy link
Member

ahh... sorry, I overlooked this. Yes, we need to remove the elog from the resample. @Cloud1e Could you please get that fixed? Thank you!

@Cloud1e
Copy link
Collaborator

Cloud1e commented Oct 24, 2024

ahh... sorry, I overlooked this. Yes, we need to remove the elog from the resample. @Cloud1e Could you please get that fixed? Thank you!

OK. I fixed this problem in resample.py.
But in TimeSeries2Trace, it seems that the variable ts doesn't have attribute "samprate". Can I use 1/dt to update the "sampling_rate" of ts and dresult?

@wangyinz
Copy link
Member

For TimeSeries2Trace, We should add a elog there if sampling_rate doesn't match 1/dt and update it. That was the original bug. Gary is only saying for resample, we don't need to check for the original sampling_rate as we should always update it to whatever rate it is we are resampling to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants