Adding State Updating and # of Tasks to LSTM and RGCN models #104
Conversation
Hi Jake, super cool to see these updates going into the shared river-dl codebase. I've made comments that reflect on decisions made by both you and Jeff, plus a few changes to Jeff's original code to maintain readability as the scope of this repo expands.
My one other major comment is about the interfaces you've designed here. I see this block in your ipynb:
```python
old_preds_lstm = model_lstm(inputs)
old_preds_rgcn = model_rgcn(inputs)

h_lstm, c_lstm = model_lstm.rnn_layer.states
h_rgcn = model_rgcn.h_gr
c_rgcn = model_rgcn.c_gr

updated_h_lstm = h_lstm * 500  # update the states of just the last time step
updated_c_lstm = c_lstm * 500
# rgcn returns the entire timeseries of h & c states, so we need to select the most recent states
updated_h_rgcn = h_rgcn[:, -1, :] * 500
updated_c_rgcn = c_rgcn[:, -1, :] * 500

# initialize the states with the previously trained states
model_lstm.rnn_layer.reset_states(states=[updated_h_lstm, updated_c_lstm])
new_preds_lstm = model_lstm(inputs)
new_preds_rgcn = model_rgcn(inputs, h_init=updated_h_rgcn, c_init=updated_c_rgcn)

new_h_lstm, new_c_lstm = model_lstm.rnn_layer.states
new_h_rgcn = model_rgcn.h_gr
new_c_rgcn = model_rgcn.c_gr
```
I see these differences in interfaces:
- To access or reset states in the LSTM, you accept or pass in a list, whereas to access or reset states in the RGCN, you accept or pass in two objects.
- RGCN returns the entire timeseries whereas LSTM returns just the most recent states; is there a need to preserve all values in the RGCN, or could that model be changed to store/change just the most recent state as well?
- You reset states in the LSTM by calling the built-in `reset_states` method. Could you write such a method for `RGCN` as well, rather than adding arguments to `model_rgcn`?

Could you make it so that both models have the same interface? That would make it easier to switch between RGCN and LSTM in any code that uses the state information.
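For what it's worth, here's a rough sketch of what a shared interface could look like. This is purely illustrative - the `StatefulModel` class and `get_states` method are my inventions, not existing river-dl code:

```python
import tensorflow as tf

class StatefulModel(tf.keras.Model):
    """Illustrative base class: both the LSTM and RGCN models would
    implement these two hooks with identical signatures."""

    def get_states(self):
        """Return [h, c] for the most recent time step."""
        raise NotImplementedError

    def reset_states(self, states):
        """Take [h, c] to use as initial states on the next call."""
        raise NotImplementedError

# Downstream code then never branches on model type:
# h, c = model.get_states()
# model.reset_states(states=[h * 500, c * 500])
# new_preds = model(inputs)
```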
river_dl/RGCN.py
Outdated
""" | ||
|
||
:param hidden_size: [int] the number of hidden units | ||
:param A: [numpy array] adjacency matrix | ||
:param tasks: [int] number of prediction tasks to perform - currently supports either 1 or 2 prediction tasks | ||
:param dropout: [float] value between 0 and 1 for the probability of a reccurent element to be zero | ||
:param flow_in_temp: [bool] whether the flow predictions should feed |
This parameter name suggests it'd be hard to generalize to other variables. This is mostly a prod to @jsadler2 to think about whether/how this parameter name and/or functionality should be adjusted to accommodate, say, DO predictions.
I think this should be moved to the model level, not the layer level, because we will be separating out the output layer from the RGCN layer.
@jzwart - like Alison said, great to see these additions to the shared codebase. I'm wondering if we should take what you've started and try to make it even more modular.

1. Have the `RGCN` layer return the `h`'s and `c`'s - I think this makes a lot of sense. As it is written now, the RGCN and the output "layers" are both in one layer class. If we just return the `h`'s and `c`'s, we can swap out the output layer without changing the actual `RGCN` code. This will be useful if we want to expand to, say, 3 outputs later on, or if we want the model to predict a probability distribution.
2. Multi-task loss functions - I think we should rename and rewrite the loss functions so it is clear they are multi-task loss functions. These functions would take a 1-d array of lambdas, one for each output variable. That way we don't have to explicitly have a different function for 1, 2, 3, ... variables (see the sketch below).
3. Separate `SingleTaskLSTMModel` - I think it might make sense to have a separate `SingleTaskLSTMModel` class. It would be a very simple class, but I think it could make the code more maintainable/clear because we wouldn't have to write in all this logic about `lambda`, `num_tasks`, etc. That logic, I guess, would be upstream. If someone was running just a single-task model, they'd just use the `SingleTaskLSTMModel` and not the other one (maybe we call that one a `MultiTaskLSTMModel`).
4. (something to think about) One multi-task output layer - In line with simplifying, I wonder if we could just have all of the "tasks" share one output layer. This would really simplify the model definition. We could just pass (or infer) the number of outputs, and the output layer would output that number. That would preclude the fancy gradient correction routine, but I'm not sure we need that (at least in the main branch).

If you are good with it, Jake, I can commit code to your branch that addresses 1, 2, and 3 above.
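Roughly what I have in mind for point 2 is something like the following. This is only a sketch - the function and argument names are illustrative, and the masking assumes NaNs mark missing observations:

```python
import tensorflow as tf

def multitask_rmse(lambdas):
    """Sketch of a multi-task loss factory: one weight (lambda) per output
    variable, so 1, 2, or N tasks all go through the same function."""

    def loss(y_true, y_pred):
        # y_true / y_pred: [batch, time steps, n_tasks];
        # NaNs in y_true mark missing observations
        total = tf.constant(0.0)
        for i, lam in enumerate(lambdas):
            obs, pred = y_true[..., i], y_pred[..., i]
            mask = tf.math.is_finite(obs)  # drop missing observations
            sq_err = tf.square(
                tf.boolean_mask(obs, mask) - tf.boolean_mask(pred, mask)
            )
            total += lam * tf.sqrt(tf.reduce_mean(sq_err))
        return total

    return loss

# e.g. model.compile(loss=multitask_rmse([1.0, 0.5]), optimizer="adam")
```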
```python
        # was b2
        self.b_out = self.add_weight(
            shape=[1], initializer="zeros", name="b_out"
        )

    @tf.function
    def call(self, inputs, **kwargs):
```
We probably need to add a docstring to this function so that we are clear on what the arguments are.
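Something like this, maybe - a sketch only, with parameter descriptions assumed from the shapes used elsewhere in this PR:

```python
@tf.function
def call(self, inputs, **kwargs):
    """
    :param inputs: [tf.Tensor] input sequences with shape
        [batch (segments), time steps, features]
    :keyword h_init: [tf.Tensor] optional initial hidden states with
        shape [batch, hidden_size]; defaults to zeros
    :keyword c_init: [tf.Tensor] optional initial cell states with
        shape [batch, hidden_size]; defaults to zeros
    :return: [tf.Tensor] predictions with shape
        [batch, time steps, num_tasks]
    """
    ...
```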
Yes, I think that's a good idea and will update so that they have similar interfaces.
That'd be great @jsadler2, I will add a few other commits based on @aappling-usgs and your comments as well.
Co-authored-by: Alison Appling <[email protected]>
no magic numbers Co-authored-by: Alison Appling <[email protected]>
Co-authored-by: Alison Appling <[email protected]>
OK @jsadler2, I've addressed most but not all comments. If you want to commit code to this branch addressing your points 1, 2, and 3 above, then go ahead. I won't commit code for a bit.
Okay @jzwart - I added a few commits that get at my 1, 2, and 3 points from above. I'm realizing that there is some more simplifying that I think can/should be done here, but I'm not sure it belongs in this PR. For example, taking out the sample weights relates to the loss functions (#98), but that may be beyond scope here.
Co-authored-by: Alison Appling <[email protected]>
looks good @jsadler2, I added a couple comments
… adding_da_to_models
(just add losses together for multitask)
this check throws error and is handled upstream
@jzwart - I got a little carried away :). I felt like just one thing led to another, and I ended up doing a couple more major changes that related to what you started. So now, in addition to what you wrote in (i.e., …), I also … These two additional changes were mostly prompted by the … I think all the changes together will make the codebase both simpler and more flexible. I'm done making changes now, so if you could look over what I've done, that'd be great.
river_dl/train_model.py
Outdated
default="rgcn", | ||
) | ||
parser.add_argument( | ||
"--lamb", help="lambda for weighting aux gradient", default=1.0, type=float | ||
"--num_tasks", | ||
help="number of tasks (outputs to be predicted)", |
help="number of tasks (outputs to be predicted)", | |
help="number of tasks (variables to be predicted)", |
I think it looks good @jsadler2! I like the unified approach to all the models and similar output for predictions and states. There is a slight difference in how the LSTM and RGCN states are reset, and I wonder if we could make them the same. For the RGCN we'd reset by supplying the `h_init` and `c_init` arguments when calling the model, but the LSTM states would need to be reset before making predictions rather than supplied to the function call. It'd be nice to make those the same so we don't have to change the prediction / training code as much depending on the model type we're using. I think we'd just need to add `h_init` and `c_init` keyword arguments to the LSTM `call` method (see the suggestion below). Other than that, I think the changes look good and it can be merged.
```python
self.gradient_correction = gradient_correction
self.grad_log_file = grad_log_file
self.lamb = lamb
self.num_tasks = num_tasks
```
need to add:

```python
self.hidden_size = hidden_size
```

if doing the suggested h & c state reset below
```diff
-        # adjust auxiliary gradient
-        gradient_shared_aux = adjust_gradient_list(
-            gradient_shared_main, gradient_shared_aux, self.grad_log_file
+        x, h, c = self.rnn_layer(inputs)
```
I propose adding in the following right above this line to make the reset of the `h` and `c` states more similar to the RGCN model:

```python
batch_size = inputs.shape[0]
h_init = kwargs.get("h_init", tf.zeros([batch_size, self.hidden_size]))
c_init = kwargs.get("c_init", tf.zeros([batch_size, self.hidden_size]))
self.rnn_layer.reset_states(states=[h_init, c_init])
```

I think this should work?
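With those kwargs in place, both models could then be primed the same way - a sketch, reusing the variable names from the example at the top of this PR:

```python
# identical call signature for both model types
new_preds_lstm = model_lstm(inputs, h_init=updated_h_lstm, c_init=updated_c_lstm)
new_preds_rgcn = model_rgcn(inputs, h_init=updated_h_rgcn, c_init=updated_c_rgcn)
```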
good idea
Back to you, @jzwart. I'm good with this merge now if you are.
great, looks good to me. Merging
I added some functionality to the LSTM and RGCN models so that they can return the h and c states of the LSTM. The LSTM returns the most recent h and c states (i.e. the most recent time step), while the RGCN returns the h and c states for every time step, since it loops through the LSTM one timestep at a time.

I also added options for predicting either one or two tasks for the LSTM and RGCN. Previously, the models only predicted two tasks (i.e. you had to supply observations for two target features), but now both models should be able to predict either one or two tasks, defaulting to predicting only one task. This will be a big update, and I didn't change `train.py` or other relevant functions that call the LSTM/RGCN models, because I think we first need to see if these proposed changes will work for all the projects. I also wasn't sure what to call the number of prediction tasks - I ended up calling them `tasks` since that is what @jsadler2 calls them in his multi-task modeling paper, but I'm open to other suggestions (e.g. targets, target_features, etc.). I had to update some of the loss functions too, since they defaulted to assuming the model was predicting 2 tasks.

Here's an example of running these models with the new updates. This example is just predicting a single task and returning the h and c states. I show that both the LSTM and RGCN h and c states can be adjusted, updated, and influence the model predictions when supplied to the LSTM / RGCN as initial h and c states. This workflow also works for 2 tasks, but that isn't shown right now.
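For concreteness, usage would look roughly like this - the class and argument names here illustrate the idea and are not necessarily the exact API:

```python
# single task: e.g. temperature only
model_1task = LSTMModel(hidden_size=20, num_tasks=1)

# two tasks: e.g. temperature + flow
model_2task = LSTMModel(hidden_size=20, num_tasks=2)

# observations then need a matching last dimension:
# y_obs.shape == [batch (segments), time steps, num_tasks]
```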
closes #98, #106