-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metrics are in random order #255
Comments
That is curious behavior indeed... torchmd-net/torchmdnet/module.py Line 229 in 2c2b5f0
The fact that it is random makes me think of this: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONHASHSEED This is how CSVLogger writes metrics down the line: def log_metrics(self, metrics_dict: Dict[str, float], step: Optional[int] = None) -> None:
"""Record metrics."""
def _handle_value(value: Union[Tensor, Any]) -> Any:
if isinstance(value, Tensor):
return value.item()
return value
if step is None:
step = len(self.metrics)
metrics = {k: _handle_value(v) for k, v in metrics_dict.items()}
metrics["step"] = step
self.metrics.append(metrics)
def save(self) -> None:
"""Save recorded metrics into files."""
if not self.metrics:
return
new_keys = self._record_new_keys()
file_exists = self._fs.isfile(self.metrics_file_path)
if new_keys and file_exists:
# we need to re-write the file if the keys (header) change
self._rewrite_with_new_header(self.metrics_keys)
with self._fs.open(self.metrics_file_path, mode=("a" if file_exists else "w"), newline="") as file:
writer = csv.DictWriter(file, fieldnames=self.metrics_keys)
if not file_exists:
# only write the header if we're writing a fresh file
writer.writeheader()
writer.writerows(self.metrics)
self.metrics = [] # reset I do not see anything that would make this random after python 3.6. AFAICT there is also nothing random in how LNNP constructs the metrics dict: torchmd-net/torchmdnet/module.py Lines 219 to 231 in 2c2b5f0
|
BTW I cannot think of a reason for epoch to be a float, lets change that... I will include it in #231 |
Found this issue: Lightning-AI/pytorch-lightning#18978 |
Fix is already merged and will be included in lightning 2.2. So we just have to wait for this one. Lightning-AI/pytorch-lightning#19159 |
Are you storing the data in a dictionary?
…On Fri, Jan 19, 2024 at 11:20 AM Raul ***@***.***> wrote:
Fix is already merged and will be included in lightning 2.2. So we just
have to wait for this one. Lightning-AI/pytorch-lightning#19159
<Lightning-AI/pytorch-lightning#19159>
—
Reply to this email directly, view it on GitHub
<#255 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB3KUOXKS56V52OTRLPYKIDYPJCIDAVCNFSM6AAAAABCA533POVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMBQGEZTGNJWGA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
The different losses and other metadata are packed into a dictionary at the end of each epoch and sent to log_dict from lightning. |
Dictionaries are sorted after python 3.7 by the insertion order so it's not the issue. |
The issue is explained here: Lightning-AI/pytorch-lightning#18978 |
Ok, good to know. |
Thanks, Raul |
At some point, the metrics.csv file seems to have changed to putting the columns in a random order. Literally. As in, every training run puts them in a different order! This is confusing and makes the file hard to read.
How about sorting the columns alphabetically? That will give a consistent and logical order: epoch first, all the training losses grouped together, then all the validation losses grouped together in the same order as the training losses.
Bonus points if it can output the epoch as an integer rather than a floating point number.
The text was updated successfully, but these errors were encountered: