Hi,
I'm training a YOLOv5 model on SageMaker. I've created an Experiment and a Trial for the training job, but training metrics such as precision, recall, and mAP are not being recorded in SageMaker.
I followed a process similar to the one in https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-experiments/mnist-handwritten-digits-classification-experiment/mnist-handwritten-digits-classification-experiment.ipynb
Could this be caused by the IAM role's permissions, or something similar?
I'm starting the training job with an 'Estimator', as shown below.
yolov5_experiment = Experiment.create(
    experiment_name=f"yolov5-training-job-{timenow}",
    description="yolov5n model training",
    sagemaker_boto_client=sm,
)
yolov5_training_job_name = f'yolov5-training-job-{timenow}'
trial_name = f"yolov5-training-job-{timenow}"
yolov5_trial = Trial.create(
    trial_name=trial_name,
    experiment_name=yolov5_experiment.experiment_name,
    sagemaker_boto_client=sm,
)
estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    # instance_type='local',
    input_mode='File',
    output_path=outpath,
    base_job_name='yolov5',
    sagemaker_session=sagemaker.Session(sagemaker_client=sm),
    metric_definitions=[
        {'Name': 'metrics/mAP_0.5', "Regex": "metrics/mAP_0.5: (.*?);"},
        {'Name': 'metrics/mAP_0.5:0.95', "Regex": "metrics/mAP_0.5:0.95: (.*?);"},
        {'Name': 'metrics/recall', "Regex": "metrics/recall: (.*?);"},
        {'Name': 'metrics/precision', "Regex": "metrics/precision: (.*?);"},
        {'Name': 'train/box_loss', "Regex": "train/box_loss: (.*?);"},
        {'Name': 'train/cls_loss', "Regex": "train/cls_loss: (.*?);"},
        {'Name': 'train/obj_loss', "Regex": "train/obj_loss: (.*?);"},
        {'Name': 'val/cls_loss', "Regex": "val/cls_loss: (.*?);"},
        {'Name': 'val/obj_loss', "Regex": "val/obj_loss: (.*?);"},
        {'Name': 'val/box_loss', "Regex": "val/box_loss: (.*?);"},
        {'Name': 'Epoch', "Regex": "Epoch: (.*?);"},
    ],
    enable_sagemaker_metrics=True,
)
estimator.fit(
    inputs,
    job_name=yolov5_training_job_name,
    experiment_config={
        "ExperimentName": yolov5_experiment.experiment_name,
        "TrialName": yolov5_trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
    wait=True,
)
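Since the metric definitions are regexes applied to the job's log output, a pattern that never matches a log line produces no metric. Below is a minimal sketch for checking patterns locally before starting a job. The log line is hypothetical (I'm assuming the training script prints `name: value;` pairs, which may not be what the container actually emits), and `extract_metric` is just a helper name I made up for this check.

```python
import re

# Hypothetical log line; the real format depends on what the training script prints.
sample_log = (
    "Epoch: 3; metrics/precision: 0.712; "
    "metrics/recall: 0.654; metrics/mAP_0.5: 0.689;"
)


def extract_metric(pattern, text):
    """Return the first capture group of `pattern` in `text`, or None if no match."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Lazy capture: everything between the label and the next ';'.
lazy = extract_metric(r"metrics/precision: (.*?);", sample_log)

# A capture group limited to at most one character cannot span a
# multi-digit value, so the pattern as a whole fails to match.
single = extract_metric(r"metrics/precision: (.?);", sample_log)

print("lazy capture:  ", lazy)
print("single capture:", single)
```

If a pattern returns nothing here for a line copied from the job's actual CloudWatch logs, it would not capture that metric in the training job either.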