Decoupling inference and model #281
Comments
OK, I thought a little bit more, and I have a conjecture about a sensible thing to do here. Rather than describe implementation details, I'll just describe the concept. As above, suppose observations are produced by a fast-clocked stochastic process (with observation noise, also fast-clocked). For concreteness, let's imagine this is a random walk on the plane, and the noise is Gaussian. Let's say the clock ticks every millisecond. Let's say the inference is a particle filter running at one tick per second. Let's suppose that the resampling buffer Each particle represents a path of the random walk (discretized to one jump per second), but we have access to the past. So for each particle, and a given observation (a tuple of timestamp and a position), at a time This way, observations that happen closer in time to the previous tick are more likely to reweight the previous position (smoothing) and ones nearer the current tick reweight the present position. Conceptually, the idea is that, from the perspective of the slower process, an observation's timestamp is uncertain. The true "latent" timestamp must be one of the ticks of the slow process, but some noise has altered the time.
I doubt this assumption. Let me try to explain. A model of a purely physical system is best described as a
For the random walk, this is not the case. There is no intrinsic property of the system we describe that forces us to simulate the model at a particular rate. It's an implementation detail that depends on the machine we're running it on, and on our accuracy requirements. The same goes for the inference. We tune the number of particles and the inference rate such that we use all of the available computing power. In practice, this means simulating just a bit faster than 25 fps so that the discreteness of the simulation is not visible when rendered on the screen, and using all the remaining power for inference. I don't see how feeding the inference multiple values from different past time steps can help. After all, to process each of these values, we need to do a computation, so I believe handling 1000 ticks from the last second in one inference step will not be much faster than running the inference step 1000 times and handling only one value each time. If I understand your angle correctly, you're saying that if I handle 1000 ticks at once, I only need to do one simulation step, but I still need to do 1000 calculations of the PDF for weighting. So the scaling is still the same; it's just that I don't need to repeat both simulation and weighting. Maybe you're on a good track here and in fact simulation will always/usually be the bottleneck. But I don't see why it should be. The idea with different weights for older values sounds plausible in principle, but I find it ad hoc. You're basically saying "the latent position has diffused since the older measurement, so the older value describes it with less precision, hence we give it a weaker weight". This makes a lot of sense qualitatively, but quantitatively it should depend on the particular stochastic process. For example, if the temperature is very low, the position will not move much, and then a measurement from a second ago is nearly as good as one from right now.
Also, there is a vague hidden assumption that the process is something like a Lévy process whose increments have expected value 0, i.e. older values have no systematically different expected value than newer values. But this is already not the case for our example: The particle has a velocity, so if the particle moves e.g. to the right, then older values will systematically estimate a value too far to the left. So, as a bottom line, I believe that if there is no necessity to introduce a clock into the model, one shouldn't, and instead run everything as fast as possible such that buffering won't be needed.
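The velocity objection can be checked with a small deterministic example. The sketch below is illustrative Python rather than code from the project; `buffered_mean_lag` and its parameters are hypothetical names. It assumes a noiseless latent moving at constant velocity and evenly spaced observations within one inference tick.

```python
def buffered_mean_lag(velocity, t1, t2, m):
    """Return how far the plain average of m buffered observations
    lags behind the true position at time t2.

    Assumes a noiseless latent x(t) = velocity * t, observed at m evenly
    spaced times in (t1, t2], the last one exactly at t2.
    """
    dt = (t2 - t1) / m
    # Observation times t1 + dt, t1 + 2*dt, ..., t2.
    times = [t1 + (k + 1) * dt for k in range(m)]
    observations = [velocity * t for t in times]
    mean_obs = sum(observations) / m
    # Positive result: the average sits behind the current position.
    return velocity * t2 - mean_obs
```

Under these assumptions the lag is `velocity * dt * (m - 1) / 2`, so for a particle moving right at 1 unit per second, a 1 s inference tick and 1000 observations, the average trails the current position by just under half a unit. The lag vanishes only when the velocity is zero or there is a single observation per tick.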
Thanks for writing this! I agree with you that (except for the situations you mentioned) it is natural for the model to be a behavior (i.e. unclocked). However, what I'm wondering about is this: In practice, our particle filter has to run at some rate, which is limited by speed concerns (especially if the number of particles is large). Let's say that for a given problem, it ticks as fast as it can, which is at 10 Hertz. But suppose that observations arrive at a much higher rate (e.g. around 1000 Hertz). How should the inference procedure deal with the 100 samples (all arriving at different times) that it receives since its last tick? Perhaps this is a non-problem for many common settings, since we can just run the inference fast enough. Re the ad-hoc-ness, those are good points, I agree.
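As a point of reference for this question, here is a naive baseline, sketched in Python with hypothetical names (it is not the project's API). It assumes isotropic Gaussian observation noise, as in the random-walk example, and it ignores the timestamps entirely: every buffered observation is scored against the particle's current position.

```python
import math

def batch_log_weight(particle_pos, buffered_obs, obs_std):
    """Sum the Gaussian log-likelihoods of all buffered 2D observations
    against one particle's current position (timestamps are ignored)."""
    var = obs_std ** 2
    total = 0.0
    for (ox, oy) in buffered_obs:
        r2 = (ox - particle_pos[0]) ** 2 + (oy - particle_pos[1]) ** 2
        # Log-density of an isotropic 2D Gaussian with variance `var`.
        total += -r2 / (2.0 * var) - math.log(2.0 * math.pi * var)
    return total
```

With 100 buffered samples per tick this still costs 100 density evaluations per particle, so it only saves the simulation steps, and it treats stale samples as if they were current.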
In such a situation, when we really don't have the computational power to process all observations, I guess we have to throw some away. This is what's done with the
OK, I think we're more or less in agreement except for 2 points:
I can definitely see the motivation. Still, when I try to incorporate a measurement from a past timestamp, I am conditioning the state at that time. The straightforward solution is to make sure that the simulation does a step at that timestamp, hence stepping for each input. If we somehow want to bypass that, I think there cannot be a good general solution, but there may be solutions that use a property of a particular process. Maybe I can't simulate the whole process, but that particular likelihood simplifies. E.g. for Brownian motion I know that the variance grows linearly with time, which gives me the factor by which I need to weight each observation: it will be something like
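One way to make the Brownian-motion case concrete is sketched below in Python. This is an illustration under stated assumptions and may differ from the expression the comment had in mind. The function name and parameters are hypothetical. It assumes driftless isotropic Brownian motion whose per-coordinate variance grows as `diffusion * age`, isotropic Gaussian sensor noise, and it treats the past position given the current one as Gaussian around the current position, which ignores the rest of the particle's path.

```python
import math

def stale_obs_log_likelihood(obs, particle_pos, age, diffusion, obs_std):
    """Log-likelihood of a 2D observation taken `age` seconds ago,
    evaluated against a particle's *current* position.

    Assumption: the sensor variance and the diffusion variance
    accumulated over `age` simply add, so an older observation gets
    a broader (less informative) likelihood.
    """
    var = obs_std ** 2 + diffusion * age
    dx = obs[0] - particle_pos[0]
    dy = obs[1] - particle_pos[1]
    # Log-density of an isotropic 2D Gaussian with variance `var`.
    return -(dx * dx + dy * dy) / (2.0 * var) - math.log(2.0 * math.pi * var)
```

With `age = 0` this reduces to the ordinary sensor likelihood. As `age` grows, the gap in log-likelihood between a nearby and a distant particle shrinks, which is the "weaker weight" for older values. With a small `diffusion` (low temperature) an old observation is nearly as informative as a fresh one.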
OK, so I now have a slightly better understanding of the resampling buffer, and so I thought it would be useful to turn the discussion about the particle filter where the model and inference are on different rates into an issue.
For now, I'll just state my understanding of the problem, to make sure we have common ground, and going forward we can discuss ideas building on top of it.
To make things concrete, here's some code:
(Here, I've removed the dependence on temperature, and the visualization, just because they are orthogonal to the discussion).
Now it seems reasonable to want inferenceRhine to tick at a significantly lower rate than modelRhine, since it is more computationally intensive (it basically has to run n versions of modelRhine in parallel). What then happens is that modelRhine produces m observations (i.e. sensor readings) in the time it takes inferenceRhine to tick once. The question is how the resampling buffer should combine these into 1 observation. Options include:
collect), and passes on all of the observations with their times. It is then the job of the model to receive all of them: I think this just pushes the problem down the road
inferenceRhine believes the observations all come from (what it believes the current position of the latent variable is, and that cannot be in most of the positions that the latent on the high-rate clock of modelRhine suggests)
Abstractly, I think what we want is an interpolation based on the time of the observations. That is, say that the m observations are collected between times t1 and t2 of the clock of inferenceRhine. We want to view a given observation as a time-weighted average of observations drawn from the position of the latent at t1 and t2.
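One possible reading of this time-weighted view is a two-component mixture likelihood, sketched below in Python. The function names and parameters are hypothetical and not part of the project. The sketch assumes isotropic Gaussian observation noise and that each particle keeps its positions at both t1 and t2.

```python
import math

def gaussian_pdf_2d(obs, mean, std):
    """Density of an isotropic 2D Gaussian with the given mean and std."""
    var = std ** 2
    r2 = (obs[0] - mean[0]) ** 2 + (obs[1] - mean[1]) ** 2
    return math.exp(-r2 / (2.0 * var)) / (2.0 * math.pi * var)

def time_weighted_likelihood(obs, t, pos_prev, pos_curr, t1, t2, obs_std):
    """Mixture likelihood of one timestamped observation for one particle.

    w is 0 at the previous tick t1 and 1 at the current tick t2, so an
    observation near t1 mostly scores the previous position and one near
    t2 mostly scores the current position.
    """
    w = (t - t1) / (t2 - t1)
    return ((1.0 - w) * gaussian_pdf_2d(obs, pos_prev, obs_std)
            + w * gaussian_pdf_2d(obs, pos_curr, obs_std))
```

An alternative reading would interpolate the position itself between t1 and t2 and score the observation against that single point. The mixture form matches the idea that the true timestamp is one of the two slow ticks and was perturbed by noise.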