-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Text generation #647
base: main
Are you sure you want to change the base?
Text generation #647
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #647 +/- ##
==========================================
- Coverage 85.58% 85.34% -0.25%
==========================================
Files 258 261 +3
Lines 11378 11464 +86
==========================================
+ Hits 9738 9784 +46
- Misses 1640 1680 +40 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
Thanks again for the changes. I did a first pass (without a detailed look at the intricacies of the OnlineDataset) and realized the GetNL call is still there. Let's make this a bit more readable by removing this as mentioned in the detailed comments, and only having the relevant diff here. If you could add some tests for the stuff you change/add, that would also be great. Thank you!
modyn/config/schema/pipeline/data.py
Outdated
no_labels: bool = Field( | ||
False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having a no_labels
flag which is conceptually not part of the pipeline, let's call it has_labels
and make it part of the dataset configuration
@@ -30,7 +30,6 @@ def __init__(self, supervisor: Supervisor, modyn_config: dict) -> None: | |||
def start_pipeline(self, request: StartPipelineRequest, context: grpc.ServicerContext) -> PipelineResponse: | |||
tid = threading.get_native_id() | |||
pid = os.getpid() | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary changes
@@ -295,33 +295,34 @@ TEST_F(BinaryFileWrapperTest, TestGetSamplesFromIndices) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add some unit tests please to test the include labels false case? (for all file wrappers)
else: # If we have failed, we need to filter out yielded samples | ||
# Note that the returned order by storage is non-deterministic | ||
if not self._include_labels: | ||
yield keys, list(response.samples), None, response_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned by codecov, we are missing a test for the include labels False case for the online dataset. Can you extend the test suite?
for batch in self._train_dataloader: | ||
stopw.stop("FetchBatch") | ||
batch_timings.append(stopw.stop("IndivFetchBatch")) | ||
retrieve_weights_from_dataloader, weighted_optimization = self.weights_handling(len(batch)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you limit the diff to the relevant changes and don't commit the changed new lines please :)? Unless it's an intended cleanup, it's better to only commit the actual changes.
# compute the scores and accumulate them | ||
model_output = self._model.model(data) if self._downsampler.forward_required else torch.Tensor() | ||
embeddings = self.get_embeddings_if_recorded() | ||
# Inform the downsampler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clean up the diff of this file to make it easier so see what is actually changing that is relevant
Took the previous feedback in mind and corrected the code.