Rationale of multiple (num_exp) draws of the exponential time delta per event in the draw_next_time_one_step() thinning algorithm #34
Comments
Hi, I think your comment makes sense. This implementation, I believe, is a kind of approximation, and the code is inherited from AttNHP. If you have time, could you improve this code to follow what is proposed in NHP, and see whether the result changes (I guess the change would be marginal) and how the efficiency is affected (it may be much slower)? If the modification does not cause efficiency issues, I will be happy to merge a PR.
Hi, thank you for your response! I'm happy to make the change and test it, but I'm still trying to figure out how to properly set the other config parameters, so hopefully I can come back later when I'm more confident :)
No problem. The thinning algorithm is the most challenging part to implement. Also, you can see there is a multi-step generation issue that I still have not fixed. There is probably some problem in this implementation of batch-wise sampling.
Hi! I can suggest an alternative implementation of thinning in HoTPP.
Great work. We will update the code based on yours. |
Thank you! Please cite the original code in the docstring or a comment if you use the HoTPP implementation in some way.
Hi. I know HoTPP should be a more promising option, but I need to use EasyTPP now, so I spent some time changing the current thinning code to what's described in Algorithm 2 of the NHP paper myself. I've made a PR. My tests on the provided datasets show that my implementation consistently outperforms the current one on both one-step-ahead prediction RMSE and efficiency. For example, using the current implementation (picking the exp sample with the smallest criterion), running the taxi dataset on the config gives:

hhhqss@hhhs-MacBook-Air examples % python train_nhp.py
2024-08-08 20:39:36,296 - config.py[pid:18330;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
2024-08-08 20:39:36,302 - runner_config.py[pid:18330;line:164:update_config] - CRITICAL: train model NHP using CPU with torch backend
2024-08-08 20:39:36,314 - runner_config.py[pid:18330;line:36:__init__] - INFO: Save the config to ./checkpoints/18330_140704444761728_240808-203936/NHP_train_output.yaml
2024-08-08 20:39:36,315 - base_runner.py[pid:18330;line:170:save_log] - INFO: Save the log to ./checkpoints/18330_140704444761728_240808-203936/log
2024-08-08 20:39:38,512 - tpp_runner.py[pid:18330;line:60:_init_model] - INFO: Num of model parameters 59146
/Users/hhhqss/Library/Python/3.8/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
2024-08-08 20:39:46,705 - base_runner.py[pid:18330;line:92:train] - INFO: Data 'taxi' loaded...
2024-08-08 20:39:46,705 - base_runner.py[pid:18330;line:97:train] - INFO: Start NHP training...
2024-08-08 20:39:49,654 - tpp_runner.py[pid:18330;line:96:_train_model] - INFO: [ Epoch 0 (train) ]: train loglike is -1.8306055527528788, num_events is 50454
2024-08-08 20:39:59,254 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):9.396914958953857
2024-08-08 20:39:59,464 - tpp_runner.py[pid:18330;line:107:_train_model] - INFO: [ Epoch 0 (valid) ]: valid loglike is -1.6224344222827596, num_events is 7204, acc is 0.8430038867295947, rmse is 0.362656055077529
2024-08-08 20:40:32,508 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):32.72161293029785
2024-08-08 20:40:35,118 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):2.1267929077148438
2024-08-08 20:40:35,219 - tpp_runner.py[pid:18330;line:122:_train_model] - INFO: [ Epoch 0 (test) ]: test loglike is -1.6084953487994105, num_events is 14420, acc is 0.8522884882108183, rmse is 0.3709726238506591
2024-08-08 20:40:35,219 - tpp_runner.py[pid:18330;line:124:_train_model] - CRITICAL: current best loglike on valid set is -1.6224 (updated at epoch-0), best updated at this epoch
2024-08-08 20:40:38,282 - tpp_runner.py[pid:18330;line:96:_train_model] - INFO: [ Epoch 1 (train) ]: train loglike is -1.493789516192968, num_events is 50454
2024-08-08 20:40:46,362 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):7.884713888168335
2024-08-08 20:40:46,514 - tpp_runner.py[pid:18330;line:107:_train_model] - INFO: [ Epoch 1 (valid) ]: valid loglike is -1.246168578220433, num_events is 7204, acc is 0.8629927817878956, rmse is 0.36263942498794405
2024-08-08 20:41:13,904 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):27.073447942733765
2024-08-08 20:41:30,474 - torch_basemodel.py[pid:18330;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):15.272824764251709
2024-08-08 20:41:30,886 - tpp_runner.py[pid:18330;line:122:_train_model] - INFO: [ Epoch 1 (test) ]: test loglike is -1.2258233397083045, num_events is 14420, acc is 0.8748266296809986, rmse is 0.37095806510138896
2024-08-08 20:41:30,887 - tpp_runner.py[pid:18330;line:124:_train_model] - CRITICAL: current best loglike on valid set is -1.2462 (updated at epoch-1), best updated at this epoch
2024-08-08 20:41:30,898 - base_runner.py[pid:18330;line:104:train] - INFO: End NHP train! Cost time: 1.736m

Using my implementation of Algorithm 2 under the same config, the result is:

hhhqss@hhhs-MacBook-Air examples % python train_nhp.py
2024-08-08 20:34:38,080 - config.py[pid:18006;line:33:build_from_yaml_file] - CRITICAL: Load pipeline config class RunnerConfig
2024-08-08 20:34:38,081 - runner_config.py[pid:18006;line:164:update_config] - CRITICAL: train model NHP using CPU with torch backend
2024-08-08 20:34:38,086 - runner_config.py[pid:18006;line:36:__init__] - INFO: Save the config to ./checkpoints/18006_140704444761728_240808-203438/NHP_train_output.yaml
2024-08-08 20:34:38,087 - base_runner.py[pid:18006;line:170:save_log] - INFO: Save the log to ./checkpoints/18006_140704444761728_240808-203438/log
2024-08-08 20:34:39,924 - tpp_runner.py[pid:18006;line:60:_init_model] - INFO: Num of model parameters 59146
/Users/hhhqss/Library/Python/3.8/lib/python/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
warnings.warn(
2024-08-08 20:34:48,340 - base_runner.py[pid:18006;line:92:train] - INFO: Data 'taxi' loaded...
2024-08-08 20:34:48,341 - base_runner.py[pid:18006;line:97:train] - INFO: Start NHP training...
2024-08-08 20:34:50,675 - tpp_runner.py[pid:18006;line:96:_train_model] - INFO: [ Epoch 0 (train) ]: train loglike is -1.8306055527528788, num_events is 50454
2024-08-08 20:34:57,606 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):6.782813787460327
2024-08-08 20:34:57,824 - tpp_runner.py[pid:18006;line:107:_train_model] - INFO: [ Epoch 0 (valid) ]: valid loglike is -1.6224344222827596, num_events is 7204, acc is 0.8428650749583565, rmse is 0.33080250566940167
2024-08-08 20:35:17,381 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):19.326049089431763
2024-08-08 20:35:20,364 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):2.4894449710845947
2024-08-08 20:35:20,542 - tpp_runner.py[pid:18006;line:122:_train_model] - INFO: [ Epoch 0 (test) ]: test loglike is -1.6084953487994105, num_events is 14420, acc is 0.8521497919556172, rmse is 0.3386021943378324
2024-08-08 20:35:20,543 - tpp_runner.py[pid:18006;line:124:_train_model] - CRITICAL: current best loglike on valid set is -1.6224 (updated at epoch-0), best updated at this epoch
2024-08-08 20:35:23,670 - tpp_runner.py[pid:18006;line:96:_train_model] - INFO: [ Epoch 1 (train) ]: train loglike is -1.493789516192968, num_events is 50454
2024-08-08 20:35:30,518 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):6.638453960418701
2024-08-08 20:35:30,677 - tpp_runner.py[pid:18006;line:107:_train_model] - INFO: [ Epoch 1 (valid) ]: valid loglike is -1.246168578220433, num_events is 7204, acc is 0.862576346474181, rmse is 0.33324353030724874
2024-08-08 20:35:52,634 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):21.6252920627594
2024-08-08 20:35:56,763 - torch_basemodel.py[pid:18006;line:182:predict_one_step_at_every_event] - CRITICAL: draw_next_time_one_step takes time (seconds):3.633074998855591
2024-08-08 20:35:57,085 - tpp_runner.py[pid:18006;line:122:_train_model] - INFO: [ Epoch 1 (test) ]: test loglike is -1.2258233397083045, num_events is 14420, acc is 0.8746879334257975, rmse is 0.34193676160808867
2024-08-08 20:35:57,086 - tpp_runner.py[pid:18006;line:124:_train_model] - CRITICAL: current best loglike on valid set is -1.2462 (updated at epoch-1), best updated at this epoch
2024-08-08 20:35:57,095 - base_runner.py[pid:18006;line:104:train] - INFO: End NHP train! Cost time: 1.146m

NHP_train_output.yaml, generated in the checkpoints folder after the runs (identical for the 2 experiments above):

data_config:
  train_dir: easytpp/taxi
  valid_dir: easytpp/taxi
  test_dir: easytpp/taxi
  data_format: easytpp/taxi
  data_specs:
    num_event_types: 10
    pad_token_id: 10
    padding_side: right
    truncation_side: null
    padding_strategy: null
    truncation_strategy: null
    max_len: null
base_config:
  stage: train
  backend: torch
  dataset_id: taxi
  runner_id: std_tpp
  model_id: NHP
  base_dir: ./checkpoints/
  specs:
    log_folder: ./checkpoints/88267_140704444761728_240806-135500
    saved_model_dir: ./checkpoints/88267_140704444761728_240806-135500/models/saved_model
    saved_log_dir: ./checkpoints/88267_140704444761728_240806-135500/log
    output_config_dir: ./checkpoints/88267_140704444761728_240806-135500/NHP_train_output.yaml
model_config:
  rnn_type: LSTM
  hidden_size: 64
  time_emb_size: 16
  num_layers: 2
  mc_num_sample_per_step: 20
  sharing_param_layer: false
  loss_integral_num_sample_per_step: 20
  dropout_rate: 0.0
  use_ln: false
  thinning:
    num_seq: 10
    num_sample: 1
    num_exp: 500
    look_ahead_time: 10
    patience_counter: 5
    over_sample_rate: 5
    num_samples_boundary: 5
    dtime_max: 5
    num_step_gen: 1
  num_event_types_pad: 11
  num_event_types: 10
  event_pad_index: 10
  model_id: NHP
  pretrained_model_dir: null
  gpu: -1
  model_specs:
    beta: 1
    bias: true
trainer_config:
  seed: 2019
  gpu: -1
  batch_size: 256
  max_epoch: 2
  shuffle: false
  optimizer: adam
  learning_rate: 0.001
  valid_freq: 1
  use_tfb: false
  metrics:
  - acc
  - rmse
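As a side note on how the thinning hyperparameters above enter the sampler: `num_sample` and `num_exp` become the trailing dimensions of the proposal tensors discussed later in this thread. Below is a toy, hedged sketch (not EasyTPP's actual code) of drawing batched exponential proposals via the inverse CDF; `sample_rate` is a hypothetical constant stand-in for the per-event intensity upper bound.

```python
import torch

# Toy sketch: shapes induced by the thinning config above.
# sample_rate is a hypothetical placeholder for the intensity upper bound.
batch_size, seq_len, num_sample, num_exp = 2, 3, 1, 500
sample_rate = 1.5

# Exponential(sample_rate) proposals via inverse-CDF on uniforms in [0, 1).
u = torch.rand(batch_size, seq_len, num_sample, num_exp)
exp_numbers = -torch.log(1 - u) / sample_rate

print(exp_numbers.shape)  # torch.Size([2, 3, 1, 500])
```

The last dimension (here 500) is the one the acceptance logic searches along.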
Hi! Algorithm 2 from NHP includes the repeat loop. I don't see the loop in the PR version. Do you think the implemented approach is a valid alternative? Can it be proved?
Hi, I think the correctness can be proved. I did avoid the explicit repeat loop: the exponential increments are drawn in parallel and accumulated into the `exp_numbers` tensor (I copied that line from #13 (comment)). Now we want to accept the first (in left-to-right order) of these accumulated exp samples that gives criterion < 1, as required by Algorithm 2; if none is accepted, we fall back to `dtime_max`. Let me go over the steps. First, we build a binary mask, `masked_crit_less_than_1`, from `criterion` ([batch_size, seq_len, num_sample, num_exp]), where criterion values smaller than 1 are labeled 1 and the rest are labeled 0. Next, `torch.argmax` returns the index of the first occurrence of the largest value if multiple equally large values are present along the dimension it searches. Since `masked_crit_less_than_1` is a binary mask tensor, `torch.argmax` finds the index of the first occurrence of 1 along the exp-sample dimension (the 4th dimension). This is exactly the sample we want to accept. There's one last thing to consider: if none of the exp samples drawn for a certain next-event-time can be accepted, the argmax is taken over [0, 0, ..., 0] and returns index 0 anyway. We filter out the exp samples that correspond to such argmax results and replace them with `self.dtime_max` using `torch.where`.
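The first-acceptance selection described above can be sketched in a few lines. This is a toy 2-D illustration, not EasyTPP's actual buffers: `criterion` and `exp_numbers` hold made-up values, with the last dimension playing the role of the exp-sample dimension.

```python
import torch

# Toy values: row 0 has its first accepted sample (criterion < 1) at index 1;
# row 1 has no accepted sample at all.
criterion = torch.tensor([[2.0, 0.5, 0.3, 4.0],
                          [3.0, 2.0, 1.5, 1.2]])
exp_numbers = torch.tensor([[0.1, 0.3, 0.7, 1.0],
                            [0.2, 0.4, 0.6, 0.9]])
dtime_max = 5.0

# Binary mask: 1 where criterion < 1, else 0.
masked_crit_less_than_1 = (criterion < 1).long()

# torch.argmax returns the FIRST index of the maximum, i.e. the first accepted sample.
first_accept = masked_crit_less_than_1.argmax(dim=-1)

# Rows whose mask is all zeros also yield argmax 0; replace those with dtime_max.
any_accept = masked_crit_less_than_1.any(dim=-1)
accepted = torch.where(
    any_accept,
    exp_numbers.gather(-1, first_accept.unsqueeze(-1)).squeeze(-1),
    torch.full_like(first_accept, dtime_max, dtype=exp_numbers.dtype),
)
print(accepted)  # tensor([0.3000, 5.0000])
```

Row 0 picks `exp_numbers[0, 1] = 0.3` (first index where the criterion dips below 1), while row 1 falls back to `dtime_max`.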
My power of explanation is limited, so it would be great if you could try running this yourself.
Thank you for the detailed explanation! It seems convincing.
Hi. I'm looking at the `draw_next_time_one_step` method of the `EventSampler` class in `torch_thinning.py`. I see that for any single event that occurred at time $t$ in the batch, `num_exp` samples of the $\Delta \sim \exp(\lambda^*)$ RV are drawn. Then, the `sample_accept` method finds the sample that gives the smallest criterion $\frac{u \lambda^*}{\lambda_k(t+\Delta)}$ among all samples of $\Delta$. If it's smaller than 1, it is accepted as a sample of the next-event-time; otherwise, the max sampled $\Delta$ plus 1 is taken as the sampled next-event-time.

My doubt is that drawing multiple one-step-ahead time deltas and accepting the one that gives the smallest criterion doesn't seem to simulate a non-homogeneous Poisson process anymore. For example, if the model is a plain Hawkes process (an assumption for simplicity), I think that if we increase `num_exp` while keeping everything else unchanged, we will certainly observe smaller accepted one-step-ahead $\Delta$ samples, as such samples make the criterion $\frac{u \lambda^*}{\lambda_k(t+\Delta)}$ smaller by making the denominator (which decays as $\Delta$ grows) bigger. For models with a non-monotonic change of intensity in the intervals between events, such as NHP, it won't be such a simple convergence, I think, so I'm unsure whether this method of multiple exp draws works there either.

So why is it OK to do the thinning like this, instead of drawing consecutive $\Delta$ samples and adding them to the last event time $t$ until the criterion is $\leq 1$, like the "repeat...until" procedure of Algorithm 2 in the NHP paper? Is it an approximation (like how the intensity boundaries are computed)? My understanding of the math or the code may be wrong, but I have been confused by this for several days, so I'm asking here. Thank you for your help!