Training metrics not being recorded in SageMaker Experiments #169

Open
santoshmedisetty opened this issue Oct 28, 2022 · 0 comments


santoshmedisetty commented Oct 28, 2022

Hi,

I'm training a YOLOv5 model on SageMaker. I've created an Experiment and a Trial for the training job, but training metrics such as precision, recall, and mAP are not being recorded in SageMaker Experiments.

I followed a process similar to this example notebook: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-experiments/mnist-handwritten-digits-classification-experiment/mnist-handwritten-digits-classification-experiment.ipynb

Could this be a problem with the IAM role, or something along those lines?
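Before suspecting the IAM role, it may help to check whether the training job itself captured any metrics. The DescribeTrainingJob response lists the configured definitions under `AlgorithmSpecification.MetricDefinitions` and the parsed values under `FinalMetricDataList`. If the latter is empty for a completed job, the regexes most likely never matched the log output, which would point at the metric definitions rather than at Experiments or permissions. This is a minimal sketch that works on that response dict; the function names are hypothetical and the sample response is made up.

```python
from typing import Any, Dict, List


def final_metrics(description: Dict[str, Any]) -> Dict[str, float]:
    """Map metric name -> final value from a DescribeTrainingJob response.

    An empty result suggests SageMaker never matched the metric regexes
    against the job's log output.
    """
    return {
        item["MetricName"]: item["Value"]
        for item in description.get("FinalMetricDataList", [])
    }


def configured_metric_names(description: Dict[str, Any]) -> List[str]:
    """List the metric names the job was launched with."""
    spec = description.get("AlgorithmSpecification", {})
    return [m["Name"] for m in spec.get("MetricDefinitions", [])]


if __name__ == "__main__":
    # Made-up stand-in for sm.describe_training_job(TrainingJobName=...) output.
    fake = {
        "AlgorithmSpecification": {
            "MetricDefinitions": [
                {"Name": "metrics/precision", "Regex": "metrics/precision: (.*?);"},
            ]
        },
        "FinalMetricDataList": [],
    }
    captured = final_metrics(fake)
    missing = [n for n in configured_metric_names(fake) if n not in captured]
    print("never captured:", missing)
```

With the boto3 client from this issue, the check would be `final_metrics(sm.describe_training_job(TrainingJobName=yolov5_training_job_name))` once the job has finished.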

I'm launching the training job with an `Estimator`, as shown below.

```python
import sagemaker
from sagemaker.estimator import Estimator
from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

# sm (boto3 SageMaker client), role, container, outpath, inputs and timenow
# are assumed to be defined earlier in the notebook (not shown here).

yolov5_experiment = Experiment.create(
    experiment_name=f"yolov5-training-job-{timenow}",
    description="yolov5n model training",
    sagemaker_boto_client=sm,
)

yolov5_training_job_name = f'yolov5-training-job-{timenow}'

trial_name = f"yolov5-training-job-{timenow}"
yolov5_trial = Trial.create(
    trial_name=trial_name,
    experiment_name=yolov5_experiment.experiment_name,
    sagemaker_boto_client=sm,
)

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type='ml.m4.xlarge',
    # instance_type='local',
    input_mode='File',
    output_path=outpath,
    base_job_name='yolov5',
    sagemaker_session=sagemaker.Session(sagemaker_client=sm),
    # Each Regex is searched for in the job's log output; group 1 is the value.
    metric_definitions=[
        {'Name': 'metrics/mAP_0.5', "Regex": "metrics/mAP_0.5: (.*?);"},
        {'Name': 'metrics/mAP_0.5:0.95', "Regex": "metrics/mAP_0.5:0.95: (.*?);"},
        {'Name': 'metrics/recall', "Regex": "metrics/recall: (.*?);"},
        {'Name': 'metrics/precision', "Regex": "metrics/precision: (.*?);"},
        {'Name': 'train/box_loss', "Regex": "train/box_loss: (.*?);"},
        {'Name': 'train/cls_loss', "Regex": "train/cls_loss: (.*?);"},
        {'Name': 'train/obj_loss', "Regex": "train/obj_loss: (.*?);"},
        {'Name': 'val/cls_loss', "Regex": "val/cls_loss: (.*?);"},
        {'Name': 'val/obj_loss', "Regex": "val/obj_loss: (.*?);"},
        {'Name': 'val/box_loss', "Regex": "val/box_loss: (.*?);"},
        {'Name': 'Epoch', "Regex": "Epoch: (.*?);"},
    ],
    enable_sagemaker_metrics=True,
)

estimator.fit(
    inputs,
    job_name=yolov5_training_job_name,
    experiment_config={
        "ExperimentName": yolov5_experiment.experiment_name,
        "TrialName": yolov5_trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
    wait=True,
)
```
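SageMaker extracts each metric by searching the job's log output (stdout/stderr, as sent to CloudWatch Logs) with the `Regex` from `metric_definitions` and reading the first capture group as the value. The definitions above therefore only record something if the container actually prints lines such as `metrics/precision: 0.8;`. A quick local check, sketched under the assumption that the training script can be changed to print metrics in that `name: value;` shape: `format_metric_line` and `parse_metrics` are hypothetical helpers, and the sample values and the table-style line are made up.

```python
import re
from typing import Dict, List

# Subset of the metric_definitions passed to the Estimator.
METRIC_DEFINITIONS: List[Dict[str, str]] = [
    {"Name": "metrics/mAP_0.5", "Regex": r"metrics/mAP_0.5: (.*?);"},
    {"Name": "metrics/mAP_0.5:0.95", "Regex": r"metrics/mAP_0.5:0.95: (.*?);"},
    {"Name": "metrics/precision", "Regex": r"metrics/precision: (.*?);"},
    {"Name": "Epoch", "Regex": r"Epoch: (.*?);"},
]


def format_metric_line(values: Dict[str, float]) -> str:
    """Hypothetical helper: render metrics as 'name: value;' pairs on one line."""
    return " ".join(f"{name}: {value};" for name, value in values.items())


def parse_metrics(line: str, definitions: List[Dict[str, str]]) -> Dict[str, float]:
    """Mimic the regex matching: keep capture group 1 of each matching Regex."""
    found: Dict[str, float] = {}
    for definition in definitions:
        match = re.search(definition["Regex"], line)
        if match:
            found[definition["Name"]] = float(match.group(1))
    return found


if __name__ == "__main__":
    line = format_metric_line(
        {"Epoch": 3, "metrics/mAP_0.5": 0.61,
         "metrics/mAP_0.5:0.95": 0.42, "metrics/precision": 0.8}
    )
    print(line, flush=True)
    print(parse_metrics(line, METRIC_DEFINITIONS))
    # A line in some other layout (e.g. a results table) matches nothing.
    print(parse_metrics("all 128 929 0.8 0.6 0.61 0.42", METRIC_DEFINITIONS))
```

In the training script, something like `print(format_metric_line(metrics), flush=True)` after each validation pass would emit a line these definitions can match; flushing keeps buffered stdout from reaching the logs late.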
