feat: Shuffle between epochs #456

Merged
merged 25 commits into from
Jun 3, 2024

@MaxiBoether (Contributor) commented May 31, 2024

This PR introduces a shuffle option for training: if True, we shuffle the order of the partitions, as well as the order of the keys within each partition, between epochs.

Note that, as described in #460, we might need to make this more fine-grained for datasets like Criteo to optimize performance.
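A minimal sketch of the two-level shuffle described above, assuming each partition is a plain list of keys. The helper name `shuffled_epoch_order` and the way the per-epoch seed is derived from `base_seed` and `epoch` are hypothetical illustrations, not Modyn's API.

```python
import random


def shuffled_epoch_order(
    partitions: list[list[int]], epoch: int, base_seed: int = 0
) -> list[list[int]]:
    """Return a shuffled copy of the partitions (and their keys) for one epoch.

    Hypothetical helper: the input is not modified, and the RNG is seeded from
    (base_seed, epoch) so every epoch gets a different but reproducible order.
    """
    rng = random.Random(base_seed * 1_000_003 + epoch)
    # Inter-partition shuffle: permute the order in which partitions are visited.
    order = list(range(len(partitions)))
    rng.shuffle(order)
    result = []
    for idx in order:
        # Intra-partition shuffle: permute the keys inside each partition.
        keys = list(partitions[idx])
        rng.shuffle(keys)
        result.append(keys)
    return result


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for epoch in range(2):
        print(epoch, shuffled_epoch_order(parts, epoch))
```

Seeding from the epoch keeps runs reproducible while still changing the order every epoch; this PR does not state how Modyn seeds its own shuffle.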

@MaxiBoether changed the title from "[WIP] feat: Shuffle between epochs" to "feat: Shuffle between epochs" May 31, 2024
@MaxiBoether marked this pull request as ready for review May 31, 2024 20:02

github-actions bot commented May 31, 2024

✅ Result of Pytest Coverage

---------- coverage: platform linux, python 3.12.3-final-0 -----------

Name Stmts Miss Cover
modyn/common/benchmark/stopwatch.py 26 0 100%
modyn/common/example_extension/example_extension.py 28 2 93%
modyn/common/ftp/ftp_server.py 31 18 42%
modyn/common/ftp/ftp_utils.py 83 69 17%
modyn/common/grpc/grpc_helpers.py 67 36 46%
modyn/common/trigger_sample/trigger_sample_storage.py 158 9 94%
modyn/config/schema/config.py 93 0 100%
modyn/config/schema/modyn_base_model.py 5 0 100%
modyn/config/schema/pipeline.py 245 20 92%
modyn/config/schema/sampling/downsampling_config.py 50 1 98%
modyn/database/abstract_database_connection.py 35 0 100%
modyn/database/partition_by_meta.py 33 12 64%
modyn/evaluator/evaluator.py 15 0 100%
modyn/evaluator/evaluator_entrypoint.py 32 3 91%
modyn/evaluator/internal/dataset/evaluation_dataset.py 75 3 96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py 22 0 100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py 165 14 92%
modyn/evaluator/internal/metric_factory.py 18 1 94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py 10 1 90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py 29 2 93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py 10 1 90%
modyn/evaluator/internal/metrics/accuracy.py 20 2 90%
modyn/evaluator/internal/metrics/f1_score.py 63 0 100%
modyn/evaluator/internal/metrics/roc_auc.py 36 1 97%
modyn/evaluator/internal/pytorch_evaluator.py 113 28 75%
modyn/evaluator/internal/utils/evaluation_info.py 9 0 100%
modyn/evaluator/internal/utils/evaluation_process_info.py 8 0 100%
modyn/evaluator/internal/utils/evaluator_messages.py 3 0 100%
modyn/metadata_database/metadata_base.py 3 0 100%
modyn/metadata_database/metadata_database_connection.py 55 3 95%
modyn/metadata_database/models/pipelines.py 22 1 95%
modyn/metadata_database/models/sample_training_metadata.py 15 0 100%
modyn/metadata_database/models/selector_state_metadata.py 47 10 79%
modyn/metadata_database/models/trained_models.py 18 0 100%
modyn/metadata_database/models/trigger_partitions.py 10 0 100%
modyn/metadata_database/models/trigger_training_metadata.py 14 0 100%
modyn/metadata_database/models/triggers.py 10 0 100%
modyn/metadata_database/utils/model_storage_strategy_config.py 21 2 90%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py 18 0 100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py 24 0 100%
modyn/metadata_processor/internal/metadata_processor_manager.py 23 4 83%
modyn/metadata_processor/metadata_processor.py 11 0 100%
modyn/metadata_processor/metadata_processor_entrypoint.py 24 1 96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py 30 0 100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py 17 2 88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py 6 1 83%
modyn/model_storage/internal/grpc/grpc_server.py 23 0 100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py 54 0 100%
modyn/model_storage/internal/model_storage_manager.py 118 5 96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py 11 2 82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py 16 1 94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py 12 0 100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py 14 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py 26 2 92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py 16 0 100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py 15 0 100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py 26 10 62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py 99 1 99%
modyn/model_storage/internal/utils/model_storage_policy.py 35 0 100%
modyn/model_storage/model_storage.py 27 3 89%
modyn/model_storage/model_storage_entrypoint.py 32 3 91%
modyn/models/articlenet/articlenet.py 30 16 47%
modyn/models/coreset_methods_support.py 29 1 97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py 24 13 46%
modyn/models/dlrm/cuda_ext/fused_gather_embedding.py 16 16 0%
modyn/models/dlrm/cuda_ext/sparse_embedding.py 32 32 0%
modyn/models/dlrm/dlrm.py 67 9 87%
modyn/models/dlrm/nn/embeddings.py 123 64 48%
modyn/models/dlrm/nn/factories.py 24 9 62%
modyn/models/dlrm/nn/interactions.py 50 11 78%
modyn/models/dlrm/nn/mlps.py 77 23 70%
modyn/models/dlrm/nn/parts.py 60 4 93%
modyn/models/dlrm/setup.py 5 5 0%
modyn/models/dlrm/utils/install_lib.py 11 7 36%
modyn/models/dlrm/utils/utils.py 28 0 100%
modyn/models/dummy/dummy.py 12 0 100%
modyn/models/fmownet/fmownet.py 25 0 100%
modyn/models/resnet18/resnet18.py 28 0 100%
modyn/models/resnet50/resnet50.py 28 0 100%
modyn/models/resnet152/resnet152.py 28 0 100%
modyn/models/tokenizers/distill_bert_tokenizer.py 11 0 100%
modyn/models/yearbooknet/yearbooknet.py 23 0 100%
modyn/selector/internal/grpc/selector_grpc_servicer.py 78 22 72%
modyn/selector/internal/grpc/selector_server.py 33 12 64%
modyn/selector/internal/selector_manager.py 125 37 70%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py 125 8 94%
modyn/selector/internal/selector_strategies/coreset_strategy.py 66 6 91%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py 29 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py 18 12 33%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py 51 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py 14 8 43%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py 6 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py 10 0 100%
modyn/selector/internal/selector_strategies/downsampling_strategies/rho_loss_downsampling_strategy.py 36 6 83%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py 20 14 30%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py 15 9 40%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py 7 0 100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py 130 12 91%
modyn/selector/internal/selector_strategies/new_data_strategy.py 98 10 90%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py 57 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py 23 1 96%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py 7 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py 16 1 94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py 42 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py 17 0 100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py 13 1 92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py 9 0 100%
modyn/selector/internal/selector_strategies/utils.py 10 0 100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py 34 7 79%
modyn/selector/internal/storage_backend/database/database_storage_backend.py 85 7 92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py 136 5 96%
modyn/selector/selector.py 82 14 83%
modyn/selector/selector_entrypoint.py 31 3 90%
modyn/supervisor/entrypoint.py 31 3 90%
modyn/supervisor/internal/eval_strategies/abstract_eval_strategy.py 8 1 88%
modyn/supervisor/internal/eval_strategies/matrix_eval_strategy.py 17 0 100%
modyn/supervisor/internal/eval_strategies/offset_eval_strategy.py 22 0 100%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py 16 2 88%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py 23 1 96%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py 13 0 100%
modyn/supervisor/internal/grpc/enums.py 55 0 100%
modyn/supervisor/internal/grpc/supervisor_grpc_server.py 25 7 72%
modyn/supervisor/internal/grpc/supervisor_grpc_servicer.py 35 0 100%
modyn/supervisor/internal/grpc/template_msg.py 26 0 100%
modyn/supervisor/internal/grpc_handler.py 301 36 88%
modyn/supervisor/internal/pipeline_executor/models.py 256 34 87%
modyn/supervisor/internal/pipeline_executor/pipeline_executor.py 361 18 95%
modyn/supervisor/internal/supervisor.py 144 17 88%
modyn/supervisor/internal/triggers/amounttrigger.py 15 0 100%
modyn/supervisor/internal/triggers/datadrifttrigger.py 102 28 73%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder.py 30 19 37%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder_downloader.py 50 31 38%
modyn/supervisor/internal/triggers/timetrigger.py 26 3 88%
modyn/supervisor/internal/triggers/trigger.py 21 1 95%
modyn/supervisor/internal/triggers/trigger_datasets/dataloader_info.py 16 13 19%
modyn/supervisor/internal/triggers/trigger_datasets/fixed_keys_dataset.py 72 3 96%
modyn/supervisor/internal/triggers/trigger_datasets/online_trigger_dataset.py 17 1 94%
modyn/supervisor/internal/triggers/utils.py 50 37 26%
modyn/supervisor/internal/utils/evaluation_status_reporter.py 31 0 100%
modyn/supervisor/internal/utils/pipeline_info.py 30 9 70%
modyn/supervisor/internal/utils/training_status_reporter.py 24 3 88%
modyn/tests/common/example_extension/test_example_extension.py 13 0 100%
modyn/tests/common/grpc/test_grpc_helpers.py 3 0 100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py 128 0 100%
modyn/tests/config/schema/test_pipeline.py 35 0 100%
modyn/tests/config/test_config_integrity.py 36 1 97%
modyn/tests/conftest.py 39 0 100%
modyn/tests/database/test_abstract_database_connection.py 19 0 100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py 131 2 98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py 20 0 100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py 365 16 96%
modyn/tests/evaluator/internal/metrics/test_accuracy.py 45 0 100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py 53 0 100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py 31 0 100%
modyn/tests/evaluator/internal/test_metric_factory.py 13 0 100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py 163 19 88%
modyn/tests/evaluator/test_evaluator.py 30 0 100%
modyn/tests/evaluator/test_evaluator_entrypoint.py 21 0 100%
modyn/tests/metadata_database/models/test_pipelines.py 48 0 100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py 40 0 100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py 46 0 100%
modyn/tests/metadata_database/models/test_trained_models.py 48 0 100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py 38 0 100%
modyn/tests/metadata_database/models/test_triggers.py 33 0 100%
modyn/tests/metadata_database/test_metadata_database_connection.py 47 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py 26 0 100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py 27 0 100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py 42 3 93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py 60 0 100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py 43 0 100%
modyn/tests/metadata_processor/test_metadata_processor.py 22 3 86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py 21 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py 16 0 100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py 100 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py 16 0 100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py 27 1 96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py 36 1 97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py 88 2 98%
modyn/tests/model_storage/internal/test_model_storage_manager.py 217 1 99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py 28 0 100%
modyn/tests/model_storage/test_model_storage.py 37 0 100%
modyn/tests/model_storage/test_model_storage_entrypoint.py 21 0 100%
modyn/tests/models/test_bert_tokenizer.py 24 0 100%
modyn/tests/models/test_dlrm.py 46 0 100%
modyn/tests/models/test_dummy.py 8 0 100%
modyn/tests/models/test_embedding_recorder.py 27 0 100%
modyn/tests/models/test_fmownet.py 25 0 100%
modyn/tests/models/test_resnet18.py 22 0 100%
modyn/tests/models/test_resnet50.py 22 0 100%
modyn/tests/models/test_resnet152.py 22 0 100%
modyn/tests/models/test_yearbook_net.py 47 0 100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py 132 0 100%
modyn/tests/selector/internal/grpc/test_selector_server.py 16 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py 18 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py 6 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rho_loss_downsampling_strategy.py 68 0 100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py 131 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py 14 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py 0 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py 165 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py 52 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py 86 0 100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py 140 0 100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py 170 0 100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py 246 0 100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py 300 0 100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py 500 0 100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py 123 0 100%
modyn/tests/selector/internal/storage_backend/local/test_local_storage_backend.py 84 0 100%
modyn/tests/selector/internal/storage_backend/utils.py 16 5 69%
modyn/tests/selector/internal/test_selector_manager.py 148 5 97%
modyn/tests/selector/test_selector.py 95 5 95%
modyn/tests/selector/test_selector_entrypoint.py 25 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_matrix_eval_strategy.py 16 0 100%
modyn/tests/supervisor/internal/eval_strategies/test_offset_eval_strategy.py 8 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py 7 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py 16 0 100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py 21 0 100%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_server.py 29 1 97%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_servicer.py 54 0 100%
modyn/tests/supervisor/internal/pipeline_executor/test_pipeline_executor.py 348 6 98%
modyn/tests/supervisor/internal/test_grpc_handler.py 287 0 100%
modyn/tests/supervisor/internal/test_supervisor.py 179 5 97%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py 25 0 100%
modyn/tests/supervisor/internal/triggers/test_datadrifttrigger.py 94 1 99%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py 21 0 100%
modyn/tests/supervisor/internal/triggers/test_trigger.py 5 0 100%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_fixed_keys_dataset.py 123 2 98%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_online_trigger_dataset.py 28 2 93%
modyn/tests/supervisor/test_entrypoint.py 25 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py 89 0 100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py 92 0 100%
modyn/tests/trainer_server/internal/data/test_data_utils.py 22 1 95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py 59 0 100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py 367 5 99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py 53 3 94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py 17 0 100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py 406 8 98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py 41 0 100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py 51 1 98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py 21 1 95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py 75 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py 12 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py 249 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py 56 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py 116 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py 92 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py 104 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py 82 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py 101 0 100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py 49 0 100%
modyn/tests/trainer_server/internal/trainer/test_batch_accumulator.py 93 0 100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py 412 34 92%
modyn/tests/trainer_server/test_trainer_server.py 34 0 100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py 21 0 100%
modyn/tests/utils/test_timer.py 22 0 100%
modyn/tests/utils/test_utils.py 175 0 100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py 33 33 0%
modyn/trainer_server/internal/dataset/data_utils.py 17 2 88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py 21 5 76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py 23 1 96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py 54 2 96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py 55 3 95%
modyn/trainer_server/internal/dataset/online_dataset.py 308 29 91%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py 14 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py 22 0 100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py 244 38 84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py 33 0 100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py 22 2 91%
modyn/trainer_server/internal/trainer/batch_accumulator.py 30 0 100%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py 15 3 80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py 21 0 100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py 513 150 71%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py 66 4 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py 9 1 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py 32 3 91%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py 28 17 39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py 29 12 59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py 38 4 89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py 66 34 48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py 9 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py 103 15 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py 116 78 33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py 95 7 93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py 17 1 94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py 42 5 88%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py 15 0 100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py 34 5 85%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py 30 3 90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py 61 18 70%
modyn/trainer_server/internal/utils/metric_type.py 3 0 100%
modyn/trainer_server/internal/utils/trainer_messages.py 4 0 100%
modyn/trainer_server/internal/utils/training_info.py 53 2 96%
modyn/trainer_server/internal/utils/training_process_info.py 10 0 100%
modyn/trainer_server/trainer_server.py 19 0 100%
modyn/trainer_server/trainer_server_entrypoint.py 32 3 91%
modyn/utils/timer.py 8 0 100%
modyn/utils/utils.py 161 13 92%
TOTAL 18161 1581 91%
Coverage HTML written to
Required test coverage of
=============== 2426 passed, 8078

@MaxiBoether requested a review from XianzheMa May 31, 2024 20:09
@MaxiBoether (Contributor, Author) commented May 31, 2024

This touches several files, but mostly to propagate the shuffle flag through Modyn. The logic changes are mainly in the OnlineDataset and the selector gRPC servicer. Thanks in advance for the review :) The integration tests passed locally; I hope they now pass in CI as well.

@XianzheMa (Collaborator) left a comment

I didn't review everything (the comments so far are just nits), but I think I understand the logic conceptually:
We shuffle across partitions in online_dataset.py by shuffling the partition IDs, and we shuffle within a partition on the selector side by shuffling the data of each partition, or by shuffling the data directly if it is the local key store. Correct?

Right now I have two questions:

  1. For SelectorKeySource, why do we have to shuffle on the server (sender) side? Can we shuffle on the receiver/client side, as we do in LocalKeySource, to limit the scope of the changes?
  2. In online_dataset.py, there are two paths that yield data, prefetched_partition_generator and _fetch_partition_noprefetch. Why is it enough to apply shuffling only in the second path?

@MaxiBoether (Contributor, Author)

Re your questions in the main comment:

  1. The problem is that the transfer is implemented in a streaming fashion. This means we cannot shuffle on the receiving side (without implementing buffering logic). Hence, we have to shuffle on the sending side before we start sending.

  2. I don't quite follow your comment yet. In line 345, shuffling is implemented for _fetch_partition_noprefetch, and in line 317, for _prefetch_partition. Did you perhaps miss one of those lines?
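To illustrate point 1, here is a hypothetical sketch (not Modyn code) contrasting the two options. `sender_stream` shuffles the complete key list once before streaming it in chunks, which gives a uniform permutation. `buffered_shuffle` is the standard shuffle-buffer technique a receiver could apply to a stream; with a bounded buffer, keys can only move within a window, so the result is only a local shuffle unless the buffer holds the entire partition.

```python
import random
from collections.abc import Iterable, Iterator


def sender_stream(
    keys: list[int], chunk_size: int, rng: random.Random, shuffle: bool
) -> Iterator[list[int]]:
    """Sender side: optionally shuffle the full key list once, then stream it in chunks."""
    keys = list(keys)  # copy so the caller's list is not mutated
    if shuffle:
        # A uniform permutation is possible because all keys are known before sending.
        rng.shuffle(keys)
    for start in range(0, len(keys), chunk_size):
        yield keys[start : start + chunk_size]


def buffered_shuffle(
    stream: Iterable[list[int]], buffer_size: int, rng: random.Random
) -> Iterator[int]:
    """Receiver side: shuffle a stream of chunks with a bounded shuffle buffer.

    Only keys that sit in the buffer at the same time can swap places, so this is
    a local (approximate) shuffle unless buffer_size covers the whole stream.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    buffer: list[int] = []
    for chunk in stream:
        for key in chunk:
            if len(buffer) < buffer_size:
                buffer.append(key)
                continue
            # Emit a random buffered key and store the new key in its slot.
            idx = rng.randrange(buffer_size)
            yield buffer[idx]
            buffer[idx] = key
    # Stream finished: flush the remaining keys in random order.
    rng.shuffle(buffer)
    yield from buffer
```

For example, with `buffer_size=4`, the first key the receiver emits is always one of the first four keys it received, no matter how long the partition is.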

@MaxiBoether (Contributor, Author)

@XianzheMa: I will have to remove shuffling from the selector again. The problem with shuffling the keys is that the storage implementation sorts the request by the associated file IDs, which makes the order returned by the storage somewhat predictable. So basically, we can only shuffle at the partition level, but this means that if a partition is too large, shuffling has essentially no effect. Right now, we stream partition data as soon as we have it:

        while not self._is_partition_fetched(partition_id):
            max_idx = self._partition_max_index(partition_id)
            if max_idx <= last_idx:  # No new data
                self._wait_for_new_partition_data(partition_id)

            yield from self._get_partition_data(last_idx, max_idx, partition_id)
            last_idx = max_idx

I will probably have to change the logic so that, if shuffling is enabled, we wait for the entire partition to be transferred and then shuffle it. I hope this will not affect throughput too much. I can measure this at some point, but it will mostly affect datasets like Criteo. I will implement this tomorrow morning so that we can merge this PR and you can use the shuffling functionality. Sorry, I did not realize that shuffling is not straightforward due to the storage implementation.
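A rough sketch of the change proposed here, under simplifying assumptions: `PartitionStream` is a hypothetical, single-threaded stand-in for the dataset's partition state (its methods loosely mirror `_is_partition_fetched`, `_partition_max_index` and `_wait_for_new_partition_data` from the snippet above), and its `wait_for_new_data` simply makes the next simulated batch visible instead of blocking. With shuffling enabled, the whole partition is buffered and shuffled once; without it, data is still streamed as it arrives.

```python
import random
from collections.abc import Iterator


class PartitionStream:
    """Hypothetical, single-threaded stand-in for a partition being transferred.

    `batches` simulates data arriving over the network; each call to
    wait_for_new_data() makes the next batch visible.
    """

    def __init__(self, batches: list[list[int]]) -> None:
        self._batches = batches
        self._arrived = 0  # number of batches received so far
        self.data: list[int] = []

    def is_fetched(self) -> bool:
        return self._arrived == len(self._batches)

    def max_index(self) -> int:
        return len(self.data)

    def wait_for_new_data(self) -> None:
        # A real implementation would block (e.g., on a condition variable).
        if not self.is_fetched():
            self.data.extend(self._batches[self._arrived])
            self._arrived += 1


def iterate_partition(part: PartitionStream, shuffle: bool, rng: random.Random) -> Iterator[int]:
    if shuffle:
        # Wait until the entire partition has been transferred, then shuffle once.
        while not part.is_fetched():
            part.wait_for_new_data()
        keys = list(part.data)
        rng.shuffle(keys)
        yield from keys
        return

    # No shuffling: yield data as soon as it is available.
    last_idx = 0
    while not part.is_fetched():
        max_idx = part.max_index()
        if max_idx <= last_idx:  # no new data yet
            part.wait_for_new_data()
            continue
        yield from part.data[last_idx:max_idx]
        last_idx = max_idx
    # Yield anything that arrived after the last check.
    yield from part.data[last_idx:]
```

The price of the shuffled path is that nothing is yielded until the last batch of the partition has arrived, which is the throughput concern mentioned above for large partitions.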

@XianzheMa (Collaborator)

I see. So even if we shuffle the keys, the storage component still returns samples not in key order but in timestamp order; is this correct?

Do you remember why we sort samples by timestamp in the first place (instead of simply returning samples in key order)?

@MaxiBoether (Contributor, Author)

No, we don't sort by timestamp:

static void send_sample_data_for_keys_and_file( // NOLINT(readability-function-cognitive-complexity)

That function sends the results. For performance reasons, we iterate over the files in a request (not over timestamps!). If one file contains many samples, we want to open it and load it into memory only once, and then read the data.
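A small Python model (not the actual C++ `send_sample_data_for_keys_and_file`) of why shuffled keys still come back in a partly predictable order: requested keys are grouped by file so that each file is opened only once. The `key_to_file` mapping, the ascending file-ID visiting order, and the stable order of keys within a file are assumptions made for illustration.

```python
from collections import defaultdict


def storage_response_order(requested_keys: list[int], key_to_file: dict[int, int]) -> list[int]:
    """Model the order in which a file-grouping storage returns samples.

    Hypothetical model: keys are grouped by file ID so each file is opened once;
    files are visited in ascending ID order, and keys within a file keep their
    relative request order.
    """
    by_file: dict[int, list[int]] = defaultdict(list)
    for key in requested_keys:
        by_file[key_to_file[key]].append(key)
    result: list[int] = []
    for file_id in sorted(by_file):
        result.extend(by_file[file_id])
    return result


if __name__ == "__main__":
    key_to_file = {k: k // 4 for k in range(12)}  # hypothetical: 3 files, 4 samples each
    shuffled = [7, 2, 11, 0, 5, 9, 3, 8, 1, 10, 6, 4]
    print(storage_response_order(shuffled, key_to_file))
    # → [2, 0, 3, 1, 7, 5, 6, 4, 11, 9, 8, 10]
```

The keys within each file remain shuffled, but all samples from file 0 still arrive before those from file 1, and so on, which limits the effect of shuffling only the request.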

@XianzheMa (Collaborator) left a comment

LGTM! My comments are non-critical. My main concern: shouldn't we add unit tests for online_dataset.py?

The integration tests do not cover everything; for example, as you noted, even when shuffling is disabled, the order of samples can differ slightly.
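One possible shape for such tests, as a self-contained sketch: `epoch_keys` is a hypothetical stand-in for iterating over the dataset for one epoch (a real test of `online_dataset.py` would have to drive the actual dataset with mocked selector and storage responses). Because the order may differ slightly even without shuffling, the unshuffled test compares multisets rather than exact sequences.

```python
import random
from collections import Counter


def epoch_keys(partitions: list[list[int]], shuffle: bool, seed: int) -> list[int]:
    """Hypothetical stand-in for one pass over the dataset's keys."""
    rng = random.Random(seed)
    order = list(range(len(partitions)))
    if shuffle:
        rng.shuffle(order)  # shuffle across partitions
    keys: list[int] = []
    for idx in order:
        part = list(partitions[idx])
        if shuffle:
            rng.shuffle(part)  # shuffle within a partition
        keys.extend(part)
    return keys


PARTITIONS = [[1, 2, 3], [4, 5, 6, 7], [8, 9]]
ALL_KEYS = Counter(k for p in PARTITIONS for k in p)


def test_no_shuffle_yields_every_key_once() -> None:
    # Order may differ slightly even without shuffling, so compare as multisets.
    assert Counter(epoch_keys(PARTITIONS, shuffle=False, seed=0)) == ALL_KEYS


def test_shuffle_is_permutation_and_reproducible() -> None:
    first = epoch_keys(PARTITIONS, shuffle=True, seed=42)
    assert Counter(first) == ALL_KEYS
    assert first == epoch_keys(PARTITIONS, shuffle=True, seed=42)


def test_shuffle_keeps_partitions_contiguous() -> None:
    keys = epoch_keys(PARTITIONS, shuffle=True, seed=7)
    # Each partition's keys should appear as one contiguous block.
    owner = {k: i for i, p in enumerate(PARTITIONS) for k in p}
    blocks = [owner[k] for k in keys]
    changes = sum(1 for a, b in zip(blocks, blocks[1:]) if a != b)
    assert changes == len(PARTITIONS) - 1


if __name__ == "__main__":
    test_no_shuffle_yields_every_key_once()
    test_shuffle_is_permutation_and_reproducible()
    test_shuffle_keeps_partitions_contiguous()
    print("all tests passed")
```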

@MaxiBoether merged commit 838cb08 into main Jun 3, 2024
24 checks passed
@MaxiBoether deleted the feat/MaxiBoether/shuffle branch June 3, 2024 17:05
robinholzi pushed a commit that referenced this pull request Jun 4, 2024