eval.utils.load_validation_dataset #25

jettjaniak · 2024-02-08T05:48:49Z

No description provided.

siwei-li · 2024-02-08T17:26:53Z

src/delphi/eval/utils.py

+        # regardless of the files we're actually loading
+        split="train",
+    )
+    return cast(Dataset, dataset)


One question: are we still adding a BOS token to every sequence in tokenized = load_validation_dataset("tinystories-v2-clean-tokenized-v0") before batching?

jettjaniak · 2024-02-08

No, we don't prepend anything in the tokenized dataset. The -v0 version is unusual in that it has no BOS token, but it is what the v0 models were trained on. The final version will have BOS.
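Since BOS handling differs between the -v0 dataset (no BOS, matching how the v0 models were trained) and the planned final version (with BOS), batching code may want to make the choice explicit. The following is a minimal sketch, not code from this PR. It assumes each row is a plain list of token ids. The helper name prepare_batches and its bos_token_id parameter are hypothetical, and the real BOS id depends on the tokenizer.

```python
from typing import List, Optional


def prepare_batches(
    sequences: List[List[int]],
    batch_size: int,
    bos_token_id: Optional[int] = None,
) -> List[List[List[int]]]:
    """Split token sequences into batches, optionally prepending a BOS id.

    Hypothetical helper. bos_token_id=None leaves every sequence untouched,
    which matches the -v0 data the v0 models were trained on. Pass an id only
    for data/models that expect a BOS token.
    """
    if bos_token_id is not None:
        # Build new lists so the caller's sequences are not mutated.
        sequences = [[bos_token_id, *seq] for seq in sequences]
    # Consecutive slices of size batch_size; the last batch may be shorter.
    return [sequences[i : i + batch_size] for i in range(0, len(sequences), batch_size)]


if __name__ == "__main__":
    rows = [[5, 6], [7], [8, 9, 10]]
    print(prepare_batches(rows, batch_size=2))                  # v0-style, no BOS
    print(prepare_batches(rows, batch_size=2, bos_token_id=1))  # with a hypothetical BOS id 1
```

Keeping None as the default means v0 evaluation stays consistent with training unless BOS is requested explicitly.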

eval.utils.load_validation_dataset

eb038ee

jettjaniak requested review from jaidhyani, siwei-li and transcendingvictor February 8, 2024 05:48

jettjaniak linked an issue Feb 8, 2024 that may be closed by this pull request

load validation dataset only #4

Closed

jettjaniak merged commit 7ae5d16 into main Feb 8, 2024
1 check passed

jettjaniak deleted the 4-load-validation-dataset-only branch February 8, 2024 05:51

This was referenced Feb 8, 2024

inference script #16

Merged

Add utility functions for text processing and visualization #17

Merged

siwei-li reviewed Feb 8, 2024


siwei-li pushed a commit that referenced this pull request Feb 9, 2024

eval.utils.load_validation_dataset (#25)

688485c
