Pose estimation is a task that involves identifying the location of specific points in an image, usually referred to as keypoints. The keypoints can represent various parts of the object such as joints, landmarks, or other distinctive features. The locations of the keypoints are usually represented as a set of 2D [x, y] or 3D [x, y, visible] coordinates.
The output of a pose estimation model is a set of points that represent the keypoints on an object in the image, usually along with the confidence scores for each point. Pose estimation is a good choice when you need to identify specific parts of an object in a scene, and their location in relation to each other.
How to use YOLOv8 pretrained Pose models?
from ultralytics import YOLO

# Load a pretrained YOLOv8 pose model
model = YOLO('yolov8n-pose.pt')

# Predict with the model (returns a list of Results, one per image)
results = model('https://ultralytics.com/images/bus.jpg')

# Extract the normalized keypoints of the first detected person
result_keypoint = results[0].keypoints.xyn.cpu().numpy()[0]
Image source: https://learnopencv.com/wp-content/uploads/2021/05/fix-overlay-issue.jpg
In the output of YOLOv8 pose estimation there are no keypoint names; the model simply returns the keypoints as a plain array in a fixed (COCO) order.
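For example, a quick way to inspect what the extracted array contains (a minimal sketch building on the result_keypoint variable above):

# result_keypoint holds one row of normalized (x, y) coordinates per COCO keypoint
print(result_keypoint.shape)  # (17, 2): 17 keypoints, each with a normalized x and y
print(result_keypoint[0])     # the first keypoint in COCO order, i.e. the nose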
To obtain the x, y coordinates by keypoint name, you can create a Pydantic class whose attribute names are the keypoint names and whose values are the index of each keypoint in the YOLOv8 output.
from pydantic import BaseModel

# Map COCO keypoint names to their index in the YOLOv8 output
class GetKeypoint(BaseModel):
    NOSE: int = 0
    LEFT_EYE: int = 1
    RIGHT_EYE: int = 2
    LEFT_EAR: int = 3
    RIGHT_EAR: int = 4
    LEFT_SHOULDER: int = 5
    RIGHT_SHOULDER: int = 6
    LEFT_ELBOW: int = 7
    RIGHT_ELBOW: int = 8
    LEFT_WRIST: int = 9
    RIGHT_WRIST: int = 10
    LEFT_HIP: int = 11
    RIGHT_HIP: int = 12
    LEFT_KNEE: int = 13
    RIGHT_KNEE: int = 14
    LEFT_ANKLE: int = 15
    RIGHT_ANKLE: int = 16

# Example: look up keypoints by name
get_keypoint = GetKeypoint()
nose_x, nose_y = result_keypoint[get_keypoint.NOSE]
left_eye_x, left_eye_y = result_keypoint[get_keypoint.LEFT_EYE]
To classify poses from keypoints, you first need to create a keypoint dataset. Here I use images from the public Kaggle dataset yoga-pose-classification, which has five classes: Downdog, Goddess, Plank, Tree, and Warrior2. I run YOLOv8 pose estimation on each image, extract the x, y coordinates of every body-part keypoint, and save them in CSV format, as sketched below.
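A minimal sketch of that extraction step could look like the following (the folder layout, the output file name, and skipping images with no detections are assumptions, not the article's exact script):

import csv
from pathlib import Path

from ultralytics import YOLO

pose_model = YOLO('yolov8n-pose.pt')
dataset_root = Path('yoga-pose-classification')  # assumed layout: one sub-folder per class

rows = []
for class_dir in sorted(p for p in dataset_root.iterdir() if p.is_dir()):
    for image_path in class_dir.glob('*.jpg'):
        results = pose_model(str(image_path))
        keypoints = results[0].keypoints.xyn.cpu().numpy()
        if len(keypoints) == 0:
            continue  # no person detected in this image
        # flatten the (17, 2) keypoints of the first person into x0, y0, ..., x16, y16
        rows.append([image_path.name, class_dir.name, *keypoints[0].flatten().tolist()])

with open('keypoints_dataset.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['image', 'label'] + [f'{axis}{i}' for i in range(17) for axis in ('x', 'y')])
    writer.writerows(rows)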
Let’s proceed with training a multi-class classification model on the keypoint features, using the PyTorch neural network library.
import numpy as np
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.l2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        return out

# X_train and class_weights are assumed to come from the keypoint CSV prepared earlier
hidden_size = 256
model = NeuralNet(X_train.shape[1], hidden_size, len(class_weights))
The neural network architecture consists of two linear layers and a ReLU activation function:
- self.l1 = nn.Linear(input_size, hidden_size): the first linear layer, which maps the input features to the hidden layer.
- self.relu = nn.ReLU(): the activation function, which applies an element-wise rectified linear unit (ReLU) to introduce non-linearity.
- self.l2 = nn.Linear(hidden_size, num_classes): the second linear layer, which maps the hidden layer to the output classes.

The forward(self, x) method defines the forward pass of the network: it takes an input tensor x, passes it through the defined layers in sequence, and returns the final output.
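As a quick sanity check of the forward pass, a dummy batch can be pushed through the network (the 34-dimensional input, i.e. 17 keypoints × 2 coordinates, and the 5 output classes are assumptions based on the dataset above):

# dummy batch of 4 "poses", each a flattened vector of 17 x 2 normalized coordinates
dummy_input = torch.randn(4, 34)
logits = model(dummy_input)
print(logits.shape)  # expected: torch.Size([4, 5]) -- one raw score per yoga class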
learning_rate = 0.01
criterion = nn.CrossEntropyLoss(weight=torch.from_numpy(class_weights.astype(np.float32)))
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
In this code, learning_rate is set to 0.01, which controls the step size during optimization. The CrossEntropyLoss criterion is used for multi-class classification, and its weight parameter is set to the class weights converted to a PyTorch tensor, which handles class imbalance if it is present in the dataset.

The optimizer is the Adam optimizer, a popular optimization algorithm for neural networks. It takes model.parameters() as input, which specifies the parameters of the model to be optimized, and the lr argument sets the optimizer's learning rate.
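The training loop itself is not shown above; a minimal sketch under these settings (the epoch count and the X_train/y_train arrays are assumptions) could look like this:

num_epochs = 100  # assumed value, not specified in the article

# X_train / y_train are assumed to be NumPy arrays built from the keypoint CSV
X_train_tensor = torch.from_numpy(X_train.astype(np.float32))
y_train_tensor = torch.from_numpy(y_train.astype(np.int64))

for epoch in range(num_epochs):
    outputs = model(X_train_tensor)            # forward pass
    loss = criterion(outputs, y_train_tensor)  # weighted cross-entropy loss

    optimizer.zero_grad()  # clear accumulated gradients
    loss.backward()        # backpropagation
    optimizer.step()       # update the weights

    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')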
The results are quite good for a simple neural network and the given dataset size, with an accuracy above 90%.
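To use the trained classifier on a new image, the two models can be chained together. Here is a minimal sketch building on the objects defined above (the class-name ordering and the sample image path are assumptions):

CLASS_NAMES = ['Downdog', 'Goddess', 'Plank', 'Tree', 'Warrior2']  # assumed label order

def classify_pose(image_path, pose_model, classifier):
    # 1. Run YOLOv8 pose estimation and take the first detected person
    results = pose_model(image_path)
    keypoints = results[0].keypoints.xyn.cpu().numpy()[0]  # shape (17, 2)

    # 2. Flatten the keypoints into the feature vector the classifier was trained on
    features = torch.from_numpy(keypoints.flatten().astype(np.float32)).unsqueeze(0)

    # 3. Predict the pose class
    with torch.no_grad():
        logits = classifier(features)
    return CLASS_NAMES[logits.argmax(dim=1).item()]

# example usage (assumed image path)
yolo_pose = YOLO('yolov8n-pose.pt')
print(classify_pose('yoga_sample.jpg', yolo_pose, model))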