feat: Shuffle between epochs #456

MaxiBoether · 2024-05-31T11:26:45Z

This PR introduces a shuffle option for training: If True, then we shuffle the order of the partitions and the keys within the partitions between each epoch.

Note that, as described in #460, we might need to make this more fine-grained for datasets like Criteo to optimize performance.
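A minimal sketch of the behaviour described above, assuming partitions are plain lists of keys. The function name `epoch_order` and the per-epoch seed are hypothetical and not taken from the Modyn code base:

```python
import random
from typing import Iterator


def epoch_order(partitions: list[list[int]], shuffle: bool, seed: int) -> Iterator[int]:
    """Yield all keys for one epoch, optionally shuffled on both levels.

    Hypothetical helper for illustration only; not Modyn's implementation.
    """
    rng = random.Random(seed)

    # Inter-partition shuffle: permute the order in which partitions are visited.
    partition_ids = list(range(len(partitions)))
    if shuffle:
        rng.shuffle(partition_ids)

    for pid in partition_ids:
        # Copy so that the caller's partition contents are never mutated.
        keys = list(partitions[pid])
        # Intra-partition shuffle: permute the keys inside this partition.
        if shuffle:
            rng.shuffle(keys)
        yield from keys
```

Passing a different seed per epoch would give a different order in each epoch, while every key is still visited exactly once.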

github-actions · 2024-05-31T11:57:45Z


modyn/selector/internal/selector_strategies/new_data_strategy.py

modyn/supervisor/internal/triggers/trigger.py

github-actions · 2024-05-31T20:03:35Z

✅ Result of Pytest Coverage

---------- coverage: platform linux, python 3.12.3-final-0 -----------

Name	Stmts	Miss	Cover
modyn/common/benchmark/stopwatch.py	26	0	100%
modyn/common/example_extension/example_extension.py	28	2	93%
modyn/common/ftp/ftp_server.py	31	18	42%
modyn/common/ftp/ftp_utils.py	83	69	17%
modyn/common/grpc/grpc_helpers.py	67	36	46%
modyn/common/trigger_sample/trigger_sample_storage.py	158	9	94%
modyn/config/schema/config.py	93	0	100%
modyn/config/schema/modyn_base_model.py	5	0	100%
modyn/config/schema/pipeline.py	245	20	92%
modyn/config/schema/sampling/downsampling_config.py	50	1	98%
modyn/database/abstract_database_connection.py	35	0	100%
modyn/database/partition_by_meta.py	33	12	64%
modyn/evaluator/evaluator.py	15	0	100%
modyn/evaluator/evaluator_entrypoint.py	32	3	91%
modyn/evaluator/internal/dataset/evaluation_dataset.py	75	3	96%
modyn/evaluator/internal/grpc/evaluator_grpc_server.py	22	0	100%
modyn/evaluator/internal/grpc/evaluator_grpc_servicer.py	165	14	92%
modyn/evaluator/internal/metric_factory.py	18	1	94%
modyn/evaluator/internal/metrics/abstract_decomposable_metric.py	10	1	90%
modyn/evaluator/internal/metrics/abstract_evaluation_metric.py	29	2	93%
modyn/evaluator/internal/metrics/abstract_holistic_metric.py	10	1	90%
modyn/evaluator/internal/metrics/accuracy.py	20	2	90%
modyn/evaluator/internal/metrics/f1_score.py	63	0	100%
modyn/evaluator/internal/metrics/roc_auc.py	36	1	97%
modyn/evaluator/internal/pytorch_evaluator.py	113	28	75%
modyn/evaluator/internal/utils/evaluation_info.py	9	0	100%
modyn/evaluator/internal/utils/evaluation_process_info.py	8	0	100%
modyn/evaluator/internal/utils/evaluator_messages.py	3	0	100%
modyn/metadata_database/metadata_base.py	3	0	100%
modyn/metadata_database/metadata_database_connection.py	55	3	95%
modyn/metadata_database/models/pipelines.py	22	1	95%
modyn/metadata_database/models/sample_training_metadata.py	15	0	100%
modyn/metadata_database/models/selector_state_metadata.py	47	10	79%
modyn/metadata_database/models/trained_models.py	18	0	100%
modyn/metadata_database/models/trigger_partitions.py	10	0	100%
modyn/metadata_database/models/trigger_training_metadata.py	14	0	100%
modyn/metadata_database/models/triggers.py	10	0	100%
modyn/metadata_database/utils/model_storage_strategy_config.py	21	2	90%
modyn/metadata_processor/internal/grpc/metadata_processor_grpc_servicer.py	18	0	100%
modyn/metadata_processor/internal/grpc/metadata_processor_server.py	24	0	100%
modyn/metadata_processor/internal/metadata_processor_manager.py	23	4	83%
modyn/metadata_processor/metadata_processor.py	11	0	100%
modyn/metadata_processor/metadata_processor_entrypoint.py	24	1	96%
modyn/metadata_processor/processor_strategies/abstract_processor_strategy.py	30	0	100%
modyn/metadata_processor/processor_strategies/basic_processor_strategy.py	17	2	88%
modyn/metadata_processor/processor_strategies/processor_strategy_type.py	6	1	83%
modyn/model_storage/internal/grpc/grpc_server.py	23	0	100%
modyn/model_storage/internal/grpc/model_storage_grpc_servicer.py	54	0	100%
modyn/model_storage/internal/model_storage_manager.py	118	5	96%
modyn/model_storage/internal/storage_strategies/abstract_difference_operator.py	11	2	82%
modyn/model_storage/internal/storage_strategies/abstract_model_storage_strategy.py	16	1	94%
modyn/model_storage/internal/storage_strategies/difference_operators/sub_difference_operator.py	12	0	100%
modyn/model_storage/internal/storage_strategies/difference_operators/xor_difference_operator.py	14	0	100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/abstract_full_model_strategy.py	26	2	92%
modyn/model_storage/internal/storage_strategies/full_model_strategies/binary_full_model.py	16	0	100%
modyn/model_storage/internal/storage_strategies/full_model_strategies/pytorch_full_model.py	15	0	100%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/abstract_incremental_model_strategy.py	26	10	62%
modyn/model_storage/internal/storage_strategies/incremental_model_strategies/weights_difference.py	99	1	99%
modyn/model_storage/internal/utils/model_storage_policy.py	35	0	100%
modyn/model_storage/model_storage.py	27	3	89%
modyn/model_storage/model_storage_entrypoint.py	32	3	91%
modyn/models/articlenet/articlenet.py	30	16	47%
modyn/models/coreset_methods_support.py	29	1	97%
modyn/models/dlrm/cuda_ext/dot_based_interact.py	24	13	46%
modyn/models/dlrm/cuda_ext/fused_gather_embedding.py	16	16	0%
modyn/models/dlrm/cuda_ext/sparse_embedding.py	32	32	0%
modyn/models/dlrm/dlrm.py	67	9	87%
modyn/models/dlrm/nn/embeddings.py	123	64	48%
modyn/models/dlrm/nn/factories.py	24	9	62%
modyn/models/dlrm/nn/interactions.py	50	11	78%
modyn/models/dlrm/nn/mlps.py	77	23	70%
modyn/models/dlrm/nn/parts.py	60	4	93%
modyn/models/dlrm/setup.py	5	5	0%
modyn/models/dlrm/utils/install_lib.py	11	7	36%
modyn/models/dlrm/utils/utils.py	28	0	100%
modyn/models/dummy/dummy.py	12	0	100%
modyn/models/fmownet/fmownet.py	25	0	100%
modyn/models/resnet18/resnet18.py	28	0	100%
modyn/models/resnet50/resnet50.py	28	0	100%
modyn/models/resnet152/resnet152.py	28	0	100%
modyn/models/tokenizers/distill_bert_tokenizer.py	11	0	100%
modyn/models/yearbooknet/yearbooknet.py	23	0	100%
modyn/selector/internal/grpc/selector_grpc_servicer.py	78	22	72%
modyn/selector/internal/grpc/selector_server.py	33	12	64%
modyn/selector/internal/selector_manager.py	125	37	70%
modyn/selector/internal/selector_strategies/abstract_selection_strategy.py	125	8	94%
modyn/selector/internal/selector_strategies/coreset_strategy.py	66	6	91%
modyn/selector/internal/selector_strategies/downsampling_strategies/abstract_downsampling_strategy.py	29	0	100%
modyn/selector/internal/selector_strategies/downsampling_strategies/craig_downsampling_strategy.py	18	12	33%
modyn/selector/internal/selector_strategies/downsampling_strategies/downsampling_scheduler.py	51	0	100%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradmatch_downsampling_strategy.py	14	8	43%
modyn/selector/internal/selector_strategies/downsampling_strategies/gradnorm_downsampling_strategy.py	6	0	100%
modyn/selector/internal/selector_strategies/downsampling_strategies/kcentergreedy_downsampling_strategy.py	14	8	43%
modyn/selector/internal/selector_strategies/downsampling_strategies/loss_downsampling_strategy.py	6	0	100%
modyn/selector/internal/selector_strategies/downsampling_strategies/no_downsampling_strategy.py	10	0	100%
modyn/selector/internal/selector_strategies/downsampling_strategies/rho_loss_downsampling_strategy.py	36	6	83%
modyn/selector/internal/selector_strategies/downsampling_strategies/submodular_downsampling_strategy.py	20	14	30%
modyn/selector/internal/selector_strategies/downsampling_strategies/uncertainty_downsampling_strategy.py	15	9	40%
modyn/selector/internal/selector_strategies/downsampling_strategies/utils.py	7	0	100%
modyn/selector/internal/selector_strategies/freshness_sampling_strategy.py	130	12	91%
modyn/selector/internal/selector_strategies/new_data_strategy.py	98	10	90%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_balanced_strategy.py	57	0	100%
modyn/selector/internal/selector_strategies/presampling_strategies/abstract_presampling_strategy.py	23	1	96%
modyn/selector/internal/selector_strategies/presampling_strategies/label_balanced_presampling_strategy.py	7	0	100%
modyn/selector/internal/selector_strategies/presampling_strategies/no_presampling_strategy.py	16	1	94%
modyn/selector/internal/selector_strategies/presampling_strategies/random_no_replacement_presampling_strategy.py	42	0	100%
modyn/selector/internal/selector_strategies/presampling_strategies/random_presampling_strategy.py	17	0	100%
modyn/selector/internal/selector_strategies/presampling_strategies/trigger_balanced_presampling_strategy.py	13	1	92%
modyn/selector/internal/selector_strategies/presampling_strategies/utils.py	9	0	100%
modyn/selector/internal/selector_strategies/utils.py	10	0	100%
modyn/selector/internal/storage_backend/abstract_storage_backend.py	34	7	79%
modyn/selector/internal/storage_backend/database/database_storage_backend.py	85	7	92%
modyn/selector/internal/storage_backend/local/local_storage_backend.py	136	5	96%
modyn/selector/selector.py	82	14	83%
modyn/selector/selector_entrypoint.py	31	3	90%
modyn/supervisor/entrypoint.py	31	3	90%
modyn/supervisor/internal/eval_strategies/abstract_eval_strategy.py	8	1	88%
modyn/supervisor/internal/eval_strategies/matrix_eval_strategy.py	17	0	100%
modyn/supervisor/internal/eval_strategies/offset_eval_strategy.py	22	0	100%
modyn/supervisor/internal/evaluation_result_writer/abstract_evaluation_result_writer.py	16	2	88%
modyn/supervisor/internal/evaluation_result_writer/json_result_writer.py	23	1	96%
modyn/supervisor/internal/evaluation_result_writer/tensorboard_result_writer.py	13	0	100%
modyn/supervisor/internal/grpc/enums.py	55	0	100%
modyn/supervisor/internal/grpc/supervisor_grpc_server.py	25	7	72%
modyn/supervisor/internal/grpc/supervisor_grpc_servicer.py	35	0	100%
modyn/supervisor/internal/grpc/template_msg.py	26	0	100%
modyn/supervisor/internal/grpc_handler.py	301	36	88%
modyn/supervisor/internal/pipeline_executor/models.py	256	34	87%
modyn/supervisor/internal/pipeline_executor/pipeline_executor.py	361	18	95%
modyn/supervisor/internal/supervisor.py	144	17	88%
modyn/supervisor/internal/triggers/amounttrigger.py	15	0	100%
modyn/supervisor/internal/triggers/datadrifttrigger.py	102	28	73%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder.py	30	19	37%
modyn/supervisor/internal/triggers/embedding_encoder_utils/embedding_encoder_downloader.py	50	31	38%
modyn/supervisor/internal/triggers/timetrigger.py	26	3	88%
modyn/supervisor/internal/triggers/trigger.py	21	1	95%
modyn/supervisor/internal/triggers/trigger_datasets/dataloader_info.py	16	13	19%
modyn/supervisor/internal/triggers/trigger_datasets/fixed_keys_dataset.py	72	3	96%
modyn/supervisor/internal/triggers/trigger_datasets/online_trigger_dataset.py	17	1	94%
modyn/supervisor/internal/triggers/utils.py	50	37	26%
modyn/supervisor/internal/utils/evaluation_status_reporter.py	31	0	100%
modyn/supervisor/internal/utils/pipeline_info.py	30	9	70%
modyn/supervisor/internal/utils/training_status_reporter.py	24	3	88%
modyn/tests/common/example_extension/test_example_extension.py	13	0	100%
modyn/tests/common/grpc/test_grpc_helpers.py	3	0	100%
modyn/tests/common/trigger_sample/test_trigger_sample_storage.py	128	0	100%
modyn/tests/config/schema/test_pipeline.py	35	0	100%
modyn/tests/config/test_config_integrity.py	36	1	97%
modyn/tests/conftest.py	39	0	100%
modyn/tests/database/test_abstract_database_connection.py	19	0	100%
modyn/tests/evaluator/internal/dataset/test_evaluation_dataset.py	131	2	98%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_server.py	20	0	100%
modyn/tests/evaluator/internal/grpc/test_evaluator_grpc_servicer.py	365	16	96%
modyn/tests/evaluator/internal/metrics/test_accuracy.py	45	0	100%
modyn/tests/evaluator/internal/metrics/test_f1_score.py	53	0	100%
modyn/tests/evaluator/internal/metrics/test_roc_auc.py	31	0	100%
modyn/tests/evaluator/internal/test_metric_factory.py	13	0	100%
modyn/tests/evaluator/internal/test_pytorch_evaluator.py	163	19	88%
modyn/tests/evaluator/test_evaluator.py	30	0	100%
modyn/tests/evaluator/test_evaluator_entrypoint.py	21	0	100%
modyn/tests/metadata_database/models/test_pipelines.py	48	0	100%
modyn/tests/metadata_database/models/test_sample_training_metadata.py	40	0	100%
modyn/tests/metadata_database/models/test_selector_state_metadata.py	46	0	100%
modyn/tests/metadata_database/models/test_trained_models.py	48	0	100%
modyn/tests/metadata_database/models/test_trigger_training_metadata.py	38	0	100%
modyn/tests/metadata_database/models/test_triggers.py	33	0	100%
modyn/tests/metadata_database/test_metadata_database_connection.py	47	0	100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_grpc_servicer.py	26	0	100%
modyn/tests/metadata_processor/internal/grpc/test_metadata_processor_server.py	27	0	100%
modyn/tests/metadata_processor/internal/test_metadata_processor_manager.py	42	3	93%
modyn/tests/metadata_processor/processor_strategies/test_abstract_processor_strategy.py	60	0	100%
modyn/tests/metadata_processor/processor_strategies/test_basic_processor_strategy.py	43	0	100%
modyn/tests/metadata_processor/test_metadata_processor.py	22	3	86%
modyn/tests/metadata_processor/test_metadata_processor_entrypoint.py	21	0	100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_server.py	16	0	100%
modyn/tests/model_storage/internal/grpc/test_model_storage_grpc_servicer.py	100	0	100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_sub_difference_operator.py	16	0	100%
modyn/tests/model_storage/internal/storage_strategies/difference_operators/test_xor_difference_operator.py	16	0	100%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_binary_full_model.py	27	1	96%
modyn/tests/model_storage/internal/storage_strategies/full_model_strategies/test_pytorch_full_model.py	36	1	97%
modyn/tests/model_storage/internal/storage_strategies/incremental_model_strategies/test_weights_difference.py	88	2	98%
modyn/tests/model_storage/internal/test_model_storage_manager.py	217	1	99%
modyn/tests/model_storage/internal/utils/test_model_storage_policy.py	28	0	100%
modyn/tests/model_storage/test_model_storage.py	37	0	100%
modyn/tests/model_storage/test_model_storage_entrypoint.py	21	0	100%
modyn/tests/models/test_bert_tokenizer.py	24	0	100%
modyn/tests/models/test_dlrm.py	46	0	100%
modyn/tests/models/test_dummy.py	8	0	100%
modyn/tests/models/test_embedding_recorder.py	27	0	100%
modyn/tests/models/test_fmownet.py	25	0	100%
modyn/tests/models/test_resnet18.py	22	0	100%
modyn/tests/models/test_resnet50.py	22	0	100%
modyn/tests/models/test_resnet152.py	22	0	100%
modyn/tests/models/test_yearbook_net.py	47	0	100%
modyn/tests/selector/internal/grpc/test_selector_grpc_servicer.py	132	0	100%
modyn/tests/selector/internal/grpc/test_selector_server.py	16	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_abstract_downsampling_strategy.py	14	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_gradnorm_downsampling_strategy.py	14	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_loss_downsampling_strategy.py	18	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_no_downsampling_strategy.py	6	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_rho_loss_downsampling_strategy.py	68	0	100%
modyn/tests/selector/internal/selector_strategies/downsampling_strategies/test_scheduler.py	131	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_abstract_balanced_strategy.py	14	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_empty_presampling_strategy.py	0	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_label_balanced_presampling_strategy.py	165	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_no_replacement_presampling_strategy.py	52	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_random_presampling_strategy.py	86	0	100%
modyn/tests/selector/internal/selector_strategies/presampling_strategies/test_trigger_balanced_presampling.py	140	0	100%
modyn/tests/selector/internal/selector_strategies/test_abstract_selection_strategy.py	170	0	100%
modyn/tests/selector/internal/selector_strategies/test_coreset_strategy.py	246	0	100%
modyn/tests/selector/internal/selector_strategies/test_freshness_sampling_strategy.py	300	0	100%
modyn/tests/selector/internal/selector_strategies/test_new_data_strategy.py	500	0	100%
modyn/tests/selector/internal/storage_backend/database/test_database_storage_backend.py	123	0	100%
modyn/tests/selector/internal/storage_backend/local/test_local_storage_backend.py	84	0	100%
modyn/tests/selector/internal/storage_backend/utils.py	16	5	69%
modyn/tests/selector/internal/test_selector_manager.py	148	5	97%
modyn/tests/selector/test_selector.py	95	5	95%
modyn/tests/selector/test_selector_entrypoint.py	25	0	100%
modyn/tests/supervisor/internal/eval_strategies/test_matrix_eval_strategy.py	16	0	100%
modyn/tests/supervisor/internal/eval_strategies/test_offset_eval_strategy.py	8	0	100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_abstract_evaluation_result_writer.py	7	0	100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_json_result_writer.py	16	0	100%
modyn/tests/supervisor/internal/evaluation_result_writer/test_tensorboard_result_writer.py	21	0	100%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_server.py	29	1	97%
modyn/tests/supervisor/internal/grpc/test_supervisor_grpc_servicer.py	54	0	100%
modyn/tests/supervisor/internal/pipeline_executor/test_pipeline_executor.py	348	6	98%
modyn/tests/supervisor/internal/test_grpc_handler.py	287	0	100%
modyn/tests/supervisor/internal/test_supervisor.py	179	5	97%
modyn/tests/supervisor/internal/triggers/test_amounttrigger.py	25	0	100%
modyn/tests/supervisor/internal/triggers/test_datadrifttrigger.py	94	1	99%
modyn/tests/supervisor/internal/triggers/test_timetrigger.py	21	0	100%
modyn/tests/supervisor/internal/triggers/test_trigger.py	5	0	100%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_fixed_keys_dataset.py	123	2	98%
modyn/tests/supervisor/internal/triggers/trigger_datasets/test_online_trigger_dataset.py	28	2	93%
modyn/tests/supervisor/test_entrypoint.py	25	0	100%
modyn/tests/trainer_server/internal/data/key_sources/test_local_key_source.py	89	0	100%
modyn/tests/trainer_server/internal/data/key_sources/test_selector_key_source.py	92	0	100%
modyn/tests/trainer_server/internal/data/test_data_utils.py	22	1	95%
modyn/tests/trainer_server/internal/data/test_local_dataset_writer.py	59	0	100%
modyn/tests/trainer_server/internal/data/test_online_dataset.py	367	5	99%
modyn/tests/trainer_server/internal/data/test_per_class_online_dataset.py	53	3	94%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_server.py	17	0	100%
modyn/tests/trainer_server/internal/grpc/test_trainer_server_grpc_servicer.py	406	8	98%
modyn/tests/trainer_server/internal/metadata_collector/test_metadata_collector.py	41	0	100%
modyn/tests/trainer_server/internal/trainer/metadata_pytorch_callbacks/test_loss_callback.py	51	1	98%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/deepcore_comparison_tests_utils.py	21	1	95%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_matrix_downsampling_strategy.py	75	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_abstract_remote_downsampling_strategy.py	12	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_craig_remote_downsampling.py	249	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_get_tensor_subset.py	56	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradmatch_downsampling_strategy.py	116	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_gradnorm_downsample.py	92	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_kcenter_downsampling_strategy.py	104	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_loss_downsample.py	82	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_submodular_downsampling_strategy.py	101	0	100%
modyn/tests/trainer_server/internal/trainer/remote_downsamplers/test_remote_uncertainty_downsampling_strategy.py	49	0	100%
modyn/tests/trainer_server/internal/trainer/test_batch_accumulator.py	93	0	100%
modyn/tests/trainer_server/internal/trainer/test_pytorch_trainer.py	412	34	92%
modyn/tests/trainer_server/test_trainer_server.py	34	0	100%
modyn/tests/trainer_server/test_trainer_server_entrypoint.py	21	0	100%
modyn/tests/utils/test_timer.py	22	0	100%
modyn/tests/utils/test_utils.py	175	0	100%
modyn/trainer_server/custom_lr_schedulers/dlrm_lr_scheduler/dlrm_scheduler.py	33	33	0%
modyn/trainer_server/internal/dataset/data_utils.py	17	2	88%
modyn/trainer_server/internal/dataset/key_sources/abstract_key_source.py	21	5	76%
modyn/trainer_server/internal/dataset/key_sources/local_key_source.py	23	1	96%
modyn/trainer_server/internal/dataset/key_sources/selector_key_source.py	54	2	96%
modyn/trainer_server/internal/dataset/local_dataset_writer.py	55	3	95%
modyn/trainer_server/internal/dataset/online_dataset.py	308	29	91%
modyn/trainer_server/internal/dataset/per_class_online_dataset.py	14	0	100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_server.py	22	0	100%
modyn/trainer_server/internal/grpc/trainer_server_grpc_servicer.py	244	38	84%
modyn/trainer_server/internal/metadata_collector/metadata_collector.py	33	0	100%
modyn/trainer_server/internal/mocks/mock_metadata_processor.py	22	2	91%
modyn/trainer_server/internal/trainer/batch_accumulator.py	30	0	100%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/base_callback.py	15	3	80%
modyn/trainer_server/internal/trainer/metadata_pytorch_callbacks/loss_callback.py	21	0	100%
modyn/trainer_server/internal/trainer/pytorch_trainer.py	513	150	71%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_matrix_downsampling_strategy.py	66	4	94%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_per_label_remote_downsample_strategy.py	9	1	89%
modyn/trainer_server/internal/trainer/remote_downsamplers/abstract_remote_downsampling_strategy.py	32	3	91%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/cossim.py	28	17	39%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/euclidean.py	29	12	59%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/k_center_greedy.py	38	4	89%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/orthogonal_matching_pursuit.py	66	34	48%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/shuffling.py	9	0	100%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_function.py	103	15	85%
modyn/trainer_server/internal/trainer/remote_downsamplers/deepcore_utils/submodular_optimizer.py	116	78	33%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_craig_downsampling.py	95	7	93%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_grad_match_downsampling_strategy.py	17	1	94%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_gradnorm_downsampling.py	42	5	88%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_kcenter_greedy_downsampling_strategy.py	15	0	100%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_loss_downsampling.py	34	5	85%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_submodular_downsampling_strategy.py	30	3	90%
modyn/trainer_server/internal/trainer/remote_downsamplers/remote_uncertainty_downsampling_strategy.py	61	18	70%
modyn/trainer_server/internal/utils/metric_type.py	3	0	100%
modyn/trainer_server/internal/utils/trainer_messages.py	4	0	100%
modyn/trainer_server/internal/utils/training_info.py	53	2	96%
modyn/trainer_server/internal/utils/training_process_info.py	10	0	100%
modyn/trainer_server/trainer_server.py	19	0	100%
modyn/trainer_server/trainer_server_entrypoint.py	32	3	91%
modyn/utils/timer.py	8	0	100%
modyn/utils/utils.py	161	13	92%
TOTAL	18161	1581	91%
Coverage	HTML	written	to
Required	test	coverage	of
===============	2426	passed,	8078

MaxiBoether · 2024-05-31T20:09:47Z

This touches a few files, but mostly to propagate the shuffle bool through Modyn. The logic changes are mostly in the OnlineDataset and the selector gRPC servicer. Thanks already for the review :) Locally, the integration tests worked; I hope it now runs through in CI as well.

XianzheMa

I didn't review everything (the comments so far are just nits), but I think I conceptually understand the logic:
We shuffle inter-partition in online_dataset.py by shuffling the partition ids, and shuffle intra-partition on the selector side by shuffling the data within each partition, or by directly shuffling the data if it is the local key store. Correct?

Right now I have two questions:

1. For SelectorKeySource, why do we have to shuffle on the server (sender) side? Can we shuffle on the receiver/client side, as we did in LocalKeySource, to reduce the radius of changes?
2. In online_dataset.py, there are two paths that yield data: prefetched_partition_generator and _fetch_partition_noprefetch. Why is it enough to apply shuffling only in the second path?

modyn/protos/selector.proto

modyn/selector/internal/selector_strategies/new_data_strategy.py

modyn/trainer_server/internal/dataset/online_dataset.py

MaxiBoether · 2024-06-02T10:24:44Z

Re your questions in the main comment:

1. The problem is that the transfer is implemented in a streaming fashion. This means we cannot shuffle on the receiving side (without implementing buffering logic). Hence, we have to shuffle on the sending side before we start sending.
2. I don't get your comment yet. In line 345, shuffling is implemented for _fetch_partition_noprefetch, and in line 317, for _prefetch_partition. Did you maybe miss one of those lines?
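To illustrate the kind of buffering logic that receiver-side shuffling would need, here is a rough sketch of a bounded shuffle buffer over a stream. It is not what this PR implements; `shuffle_stream` and `buffer_size` are hypothetical names, and the result is only an approximate shuffle because an item can move at most as far as the buffer allows:

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def shuffle_stream(stream: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Approximately shuffle a stream using a bounded in-memory buffer.

    Hypothetical receiver-side helper; illustration only.
    """
    rng = random.Random(seed)
    buffer: list[T] = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Emit a uniformly random buffered item; swap it to the end
            # first so that the removal is O(1).
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # The stream has ended: flush the remaining items in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With `buffer_size=1` the stream passes through unchanged, which shows why the receiver cannot shuffle without holding data back.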

MaxiBoether · 2024-06-02T19:54:36Z

@XianzheMa: I will have to remove shuffle from the selector again. The problem with shuffling the keys is that the storage implementation sorts the request by associated file ids, which makes the order returned by the storage somewhat predictable. So basically, we can shuffle only on a partition level, but this means that if the partition size is too big, there is no effect from shuffling. Right now we stream partition data as soon as we have it:

        while not self._is_partition_fetched(partition_id):
            max_idx = self._partition_max_index(partition_id)
            if max_idx <= last_idx:  # No new data
                self._wait_for_new_partition_data(partition_id)

            yield from self._get_partition_data(last_idx, max_idx, partition_id)
            last_idx = max_idx

Probably I will have to change the logic such that, if shuffling is enabled, we wait for the entire partition to be transferred and then shuffle it. I hope this will not affect throughput too much. At some point I can measure this, but it will mostly affect things like Criteo. I will implement this tomorrow morning so that we can then merge this PR and you can use the shuffling functionality. Sorry, I did not realize that shuffling is not straightforward due to the storage implementation.
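The proposed change could look roughly like the following sketch. It assumes a partition arrives as a sequence of chunks; `partition_data` and `fetch_chunks` are hypothetical names, not the actual OnlineDataset methods:

```python
import random
from typing import Callable, Iterator


def partition_data(
    fetch_chunks: Callable[[], Iterator[list[int]]],
    shuffle: bool,
    rng: random.Random,
) -> Iterator[int]:
    """Yield the samples of one partition.

    Hypothetical sketch: `fetch_chunks` stands in for whatever delivers
    the partition piece by piece as it is transferred.
    """
    if not shuffle:
        # Streaming path: hand out samples as soon as a chunk arrives.
        for chunk in fetch_chunks():
            yield from chunk
        return

    # Shuffling path: wait until the entire partition has been
    # transferred, then shuffle it as a whole before yielding.
    samples: list[int] = []
    for chunk in fetch_chunks():
        samples.extend(chunk)
    rng.shuffle(samples)
    yield from samples
```

The trade-off is visible in the two branches: the shuffling path cannot yield its first sample until the last chunk has arrived, which is where a throughput cost would come from.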

XianzheMa · 2024-06-03T11:00:23Z

I see. So even if we shuffle the keys, the storage component still returns samples in timestamp order rather than in key order, is this correct?

Do you remember why we wanted to sort samples in timestamp order in the first place (instead of naturally returning samples in key order)?

MaxiBoether · 2024-06-03T11:06:28Z

No, we don't sort by timestamp:

modyn/modyn/storage/include/internal/grpc/storage_service_impl.hpp

Line 525 in 028a646

    
           static void send_sample_data_for_keys_and_file(  // NOLINT(readability-function-cognitive-complexity)

That function implements the sending of results. For performance reasons, we iterate over files (not by timestamp!) in a request. This is because if one file contains many samples, we just want to open and load it into memory once and then read the data.
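The effect on ordering can be sketched as follows. This is a simplified Python model, not the C++ storage code; `response_order` and `key_to_file` are hypothetical, and the assumption that keys within one file keep their request order is mine:

```python
from collections import defaultdict


def response_order(requested_keys: list[int], key_to_file: dict[int, int]) -> list[int]:
    """Model the order in which samples come back when a request is
    served file by file (hypothetical simplification of the storage)."""
    # Group the requested keys by the file that contains them, so each
    # file only needs to be opened and loaded once.
    by_file: dict[int, list[int]] = defaultdict(list)
    for key in requested_keys:
        by_file[key_to_file[key]].append(key)

    # Serve the files in ascending file id order.
    ordered: list[int] = []
    for file_id in sorted(by_file):
        ordered.extend(by_file[file_id])
    return ordered
```

In this model, a shuffled request such as `[5, 1, 4, 2, 3]` still comes back grouped by file, so shuffling the keys before sending has only a limited effect on the order the trainer sees.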

integrationtests/online_dataset/test_online_dataset.py

XianzheMa

LGTM! My comments are non-critical. My main concern is: shouldn't we add unit test coverage for the online_dataset.py file?

The integration test does not cover everything. For example, as you commented, even when shuffling == False, the order of samples can be slightly different.
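One way such a unit test could cope with the non-deterministic order is to compare samples as multisets and to check the order separately. A small sketch with hypothetical helper names, not taken from the existing test suite:

```python
from collections import Counter


def same_samples(expected: list[int], actual: list[int]) -> bool:
    """True if both lists contain the same samples, ignoring order."""
    return Counter(expected) == Counter(actual)


def order_changed(first_epoch: list[int], second_epoch: list[int]) -> bool:
    """True if two epochs saw the same samples but in a different order."""
    return same_samples(first_epoch, second_epoch) and first_epoch != second_epoch
```

A test for `shuffle=False` would then assert only `same_samples`, while a test for `shuffle=True` could additionally assert `order_changed` across epochs, accepting a small chance that a random permutation repeats.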

integrationtests/online_dataset/test_online_dataset.py

modyn/tests/trainer_server/internal/data/test_online_dataset.py

modyn/supervisor/internal/triggers/datadrifttrigger.py

modyn/trainer_server/internal/dataset/online_dataset.py

store intermediate stuff

f128aa9

MaxiBoether and others added 9 commits May 31, 2024 14:10

wip

44bf525

format

996193d

format and fixes

3688a00

black

c08a0f1

mypy

6ca8d70

Merge branch 'main' into feat/MaxiBoether/shuffle

bdb2b49

some formatting and tests

bba85e9

fixing tests

c0096dd

unittests working locally

f355958

MaxiBoether changed the title ~~[WIP] feat: Shuffle between epochs~~ feat: Shuffle between epochs May 31, 2024

add test for shuffling local key source

2600065

MaxiBoether commented May 31, 2024

modyn/selector/internal/selector_strategies/new_data_strategy.py

modyn/supervisor/internal/triggers/trigger.py

remove comment

be4e0e8

MaxiBoether marked this pull request as ready for review May 31, 2024 20:02

MaxiBoether requested a review from XianzheMa May 31, 2024 20:09

XianzheMa reviewed May 31, 2024

modyn/protos/selector.proto

modyn/selector/internal/selector_strategies/new_data_strategy.py

modyn/trainer_server/internal/dataset/online_dataset.py

why deos it fail in CI but not locally?

aeaa896

MaxiBoether added 5 commits June 2, 2024 12:36

address comments and fix issue in no prefetch case

8f01021

rename variable in other case as well

50db579

more logging for dataset test

005b3a9

more debug logging

d6707d4

log storage reply

a624b65

MaxiBoether added 2 commits June 3, 2024 09:20

remove shuffling keys

962d81f

shuffle partitions

590743c

fix shuffling

219ac8a

MaxiBoether mentioned this pull request Jun 3, 2024

Allow users to specify whether to only shuffle partition order or data within partitions #460

Open

fix modynclient configs and add them to integrity check test

d883d5b

XianzheMa reviewed Jun 3, 2024

integrationtests/online_dataset/test_online_dataset.py

XianzheMa approved these changes Jun 3, 2024

MaxiBoether added 3 commits June 3, 2024 16:23

address comments

65ee365

Merge branch 'main' into feat/MaxiBoether/shuffle

4bc21b9

black...

69dc820

MaxiBoether merged commit 838cb08 into main Jun 3, 2024
24 checks passed

MaxiBoether deleted the feat/MaxiBoether/shuffle branch June 3, 2024 17:05
