Unable to train model on SynthText #96

Open

deepakdebug opened this issue Jun 13, 2018 · 3 comments

@deepakdebug

I implemented TextBoxes in Keras and trained it on SynthText for about 150,000 images (I quote the number of images rather than epochs because I never feed it the complete dataset at once). When I stopped training and checked the results, the model predicts the background class in most cases; the text class is predicted with an accuracy of only 2.7%. Should I train the model for longer, or is there something wrong with what I am doing?

@deepakdebug (Author)

SSDLoss
```python
'''
The Keras-compatible loss function for the SSD model. Currently supports TensorFlow only.

Copyright (C) 2017 Pierluigi Ferrari

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
'''

import tensorflow as tf
from keras import metrics

class SSDLoss:
'''
The SSD loss, see https://arxiv.org/abs/1512.02325.
'''

def __init__(self,
             neg_pos_ratio=3,
             n_neg_min=0,
             alpha=1.0):
    '''
    Arguments:
        neg_pos_ratio (int, optional): The maximum ratio of negative (i.e. background)
            to positive ground truth boxes to include in the loss computation.
            There are no actual background ground truth boxes of course, but `y_true`
            contains anchor boxes labeled with the background class. Since
            the number of background boxes in `y_true` will usually exceed
            the number of positive boxes by far, it is necessary to balance
            their influence on the loss. Defaults to 3 following the paper.
        n_neg_min (int, optional): The minimum number of negative ground truth boxes to
            enter the loss computation *per batch*. This argument can be used to make
            sure that the model learns from a minimum number of negatives in batches
            in which there are very few, or even none at all, positive ground truth
            boxes. It defaults to 0 and if used, it should be set to a value that
            stands in reasonable proportion to the batch size used for training.
        alpha (float, optional): A factor to weight the localization loss in the
            computation of the total loss. Defaults to 1.0 following the paper.
    '''
    self.neg_pos_ratio = tf.constant(neg_pos_ratio)
    self.n_neg_min = tf.constant(n_neg_min)
    self.alpha = tf.constant(alpha)
    
def smooth_L1_loss(self, y_true, y_pred):
    '''
    Compute smooth L1 loss, see references.

    Arguments:
        y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
            In this context, the expected tensor has shape `(batch_size, #boxes, 4)` and
            contains the ground truth bounding box coordinates, where the last dimension
            contains `(xmin, ymin, xmax, ymax)`.
        y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
            the predicted data, in this context the predicted bounding box coordinates.

    Returns:
        The smooth L1 loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
        of shape (batch, n_boxes_total).

    References:
        https://arxiv.org/abs/1504.08083
    '''
    absolute_loss = tf.abs(y_true - y_pred)
    square_loss = 0.5 * (y_true - y_pred)**2
    l1_loss = tf.where(tf.less(absolute_loss, 1.0), square_loss, absolute_loss - 0.5)
    return tf.reduce_sum(l1_loss, axis=-1)

def log_loss(self, y_true, y_pred):
    '''
    Compute the softmax log loss.

    Arguments:
        y_true (nD tensor): A TensorFlow tensor of any shape containing the ground truth data.
            In this context, the expected tensor has shape (batch_size, #boxes, #classes)
            and contains the ground truth bounding box categories.
        y_pred (nD tensor): A TensorFlow tensor of identical structure to `y_true` containing
            the predicted data, in this context the predicted bounding box categories.

    Returns:
        The softmax log loss, a nD-1 Tensorflow tensor. In this context a 2D tensor
        of shape (batch, n_boxes_total).
    '''
    # Make sure that `y_pred` doesn't contain any zeros (which would break the log function)
    y_pred = tf.maximum(y_pred, 1e-15)
    # Compute the log loss
    log_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
    return log_loss

def compute_loss(self, y_true, y_pred):
    '''
    Compute the loss of the SSD model prediction against the ground truth.

    Arguments:
        y_true (array): A Numpy array of shape `(batch_size, #boxes, #classes + 12)`,
            where `#boxes` is the total number of boxes that the model predicts
            per image. Be careful to make sure that the index of each given
            box in `y_true` is the same as the index for the corresponding
            box in `y_pred`. The last axis must have length `#classes + 12` and contain
            `[classes one-hot encoded, 4 ground truth box coordinates, 8 arbitrary entries]`
            in this order, including the background class. The last eight entries of the
            last axis are not used by this function and therefore their contents are
            irrelevant, they only exist so that `y_true` has the same shape as `y_pred`,
            where the last eight entries of the last axis contain the anchor box
            coordinates and variances, which are needed during inference. Important: Boxes that
            you want the cost function to ignore need to have a one-hot
            class vector of all zeros.
        y_pred (Keras tensor): The model prediction. The shape is identical
            to that of `y_true`.

    Returns:
        A scalar, the total multitask loss for classification and localization.
    '''
    batch_size = tf.shape(y_pred)[0] # Output dtype: tf.int32
    n_boxes = tf.shape(y_pred)[1] # Output dtype: tf.int32, note that `n_boxes` in this context denotes the total number of boxes per image, not the number of boxes per cell

    # 1: Compute the losses for class and box predictions for every box

    classification_loss = tf.to_float(self.log_loss(y_true[:,:,:-12], y_pred[:,:,:-12])) # Output shape: (batch_size, n_boxes)
    localization_loss = tf.to_float(self.smooth_L1_loss(y_true[:,:,-12:-8], y_pred[:,:,-12:-8])) # Output shape: (batch_size, n_boxes)

    # 2: Compute the classification losses for the positive and negative targets

    # Create masks for the positive and negative ground truth classes
    negatives = y_true[:,:,0] # Tensor of shape (batch_size, n_boxes)
    positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)

    # Count the number of positive boxes (classes 1 to n) in y_true across the whole batch
    n_positive = tf.reduce_sum(positives)
    
    # Now mask all negative boxes and sum up the losses for the positive boxes PER batch item
    # (Keras loss functions must output one scalar loss value PER batch item, rather than just
    # one scalar for the entire batch, that's why we're not summing across all axes)
    pos_class_loss = tf.reduce_sum(classification_loss * positives, axis=-1) # Tensor of shape (batch_size,)

    # Compute the classification loss for the negative default boxes (if there are any)

    # First, compute the classification loss for all negative boxes
    neg_class_loss_all = classification_loss * negatives # Tensor of shape (batch_size, n_boxes)
    n_neg_losses = tf.count_nonzero(neg_class_loss_all, dtype=tf.int32) # The number of non-zero loss entries in `neg_class_loss_all`
    # What's the point of `n_neg_losses`? For the next step, which will be to compute which negative boxes enter the classification
    # loss, we don't just want to know how many negative ground truth boxes there are, but for how many of those there actually is
    # a positive (i.e. non-zero) loss. This is necessary because `tf.nn.top_k()` in the function below will pick the top k boxes with
    # the highest losses no matter what, even if it receives a vector where all losses are zero. In the unlikely event that all negative
    # classification losses ARE actually zero though, this behavior might lead to `tf.nn.top_k()` returning the indices of positive
    # boxes, leading to an incorrect negative classification loss computation, and hence an incorrect overall loss computation.
    # We therefore need to make sure that `n_negative_keep`, which assumes the role of the `k` argument in `tf.nn.top_k()`,
    # is at most the number of negative boxes for which there is a positive classification loss.

    # Compute the number of negative examples we want to account for in the loss
    # We'll keep at most `self.neg_pos_ratio` times the number of positives in `y_true`, but at least `self.n_neg_min` (unless `n_neg_losses` is smaller)
    n_negative_keep = tf.minimum(tf.maximum(self.neg_pos_ratio * tf.to_int32(n_positive), self.n_neg_min), n_neg_losses)

    # In the unlikely case when either (1) there are no negative ground truth boxes at all
    # or (2) the classification loss for all negative boxes is zero, return zero as the `neg_class_loss`
    def f1():
        return tf.zeros([batch_size])
    # Otherwise compute the negative loss
    def f2():
        # Now we'll identify the top-k (where k == `n_negative_keep`) boxes with the highest confidence loss that
        # belong to the background class in the ground truth data. Note that this doesn't necessarily mean that the model
        # predicted the wrong class for those boxes, it just means that the loss for those boxes is the highest.

        # To do this, we reshape `neg_class_loss_all` to 1D...
        neg_class_loss_all_1D = tf.reshape(neg_class_loss_all, [-1]) # Tensor of shape (batch_size * n_boxes,)
        # ...and then we get the indices for the `n_negative_keep` boxes with the highest loss out of those...
        values, indices = tf.nn.top_k(neg_class_loss_all_1D, n_negative_keep, False) # We don't need sorting
        # ...and with these indices we'll create a mask...
        negatives_keep = tf.scatter_nd(tf.expand_dims(indices, axis=1), updates=tf.ones_like(indices, dtype=tf.int32), shape=tf.shape(neg_class_loss_all_1D)) # Tensor of shape (batch_size * n_boxes,)
        negatives_keep = tf.to_float(tf.reshape(negatives_keep, [batch_size, n_boxes])) # Tensor of shape (batch_size, n_boxes)
        # ...and use it to keep only those boxes and mask all other classification losses
        neg_class_loss = tf.reduce_sum(classification_loss * negatives_keep, axis=-1) # Tensor of shape (batch_size,)
        return neg_class_loss

    neg_class_loss = tf.cond(tf.equal(n_neg_losses, tf.constant(0)), f1, f2)

    class_loss = pos_class_loss + neg_class_loss # Tensor of shape (batch_size,)

    # 3: Compute the localization loss for the positive targets
    #    We don't penalize localization loss for negative predicted boxes (obviously: there are no ground truth boxes they would correspond to)

    loc_loss = tf.reduce_sum(localization_loss * positives, axis=-1) # Tensor of shape (batch_size,)

    # 4: Compute the total loss

    total_loss = (class_loss + self.alpha * loc_loss) / tf.maximum(1.0, n_positive) # In case `n_positive == 0`
            
    return total_loss



def accuracy_metric(self, y_true, y_pred):
    acc = tf.to_float(tf.equal(tf.argmax(y_true[:,:,:-12], axis=-1), tf.argmax(y_pred[:,:,:-12], axis=-1)))
    
    positives = tf.to_float(tf.reduce_max(y_true[:,:,1:-12], axis=-1)) # Tensor of shape (batch_size, n_boxes)
    
    n_positive = tf.reduce_sum(positives)
    
    pos_acc = tf.reduce_sum( acc * positives, axis = -1)
    return pos_acc/n_positive

```
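
For completeness, this is roughly how the loss is wired into training (a minimal sketch; the optimizer, learning rate, input size and class count are illustrative choices, and `build_model` refers to the model code posted further down):

```python
from keras.optimizers import Adam

# Assumes the SSDLoss class above and the build_model() function posted below.
# n_classes=2 means background + text; the values here are not recommendations.
model, predictor_sizes = build_model(image_size=(300, 300, 3), n_classes=2)

ssd_loss = SSDLoss(neg_pos_ratio=3, n_neg_min=0, alpha=1.0)
model.compile(optimizer=Adam(lr=0.001),
              loss=ssd_loss.compute_loss,
              metrics=[ssd_loss.accuracy_metric])
```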

@deepakdebug (Author)

Box encoder

```python
# -*- coding: utf-8 -*-
"""
Created on Fri May 18 13:07:01 2018

@author: deepakc
"""

import numpy as np

def convert_coordinates(tensor, start_index, conversion):
'''
Convert coordinates for axis-aligned 2D boxes between two coordinate formats.

Creates a copy of `tensor`, i.e. does not operate in place. Currently there are
two supported coordinate formats that can be converted from and to each other:
    1) (xmin, ymin, xmax, ymax) - the 'minmax' (corner) format
    2) (cx, cy, w, h) - the 'centroids' format

Note that converting from one of the supported formats to another and back is
an identity operation up to possible rounding errors for integer tensors.

Arguments:
    tensor (array): A Numpy nD array containing the four consecutive coordinates
        to be converted somewhere in the last axis.
    start_index (int): The index of the first coordinate in the last axis of `tensor`.
    conversion (str): The conversion direction. Can be 'minmax2centroids'
        or 'centroids2minmax'.

Returns:
    A Numpy nD array, a copy of the input tensor with the converted coordinates
    in place of the original coordinates and the unaltered elements of the original
    tensor elsewhere.
'''
ind = start_index
tensor1 = np.copy(tensor).astype(np.float)
if conversion == 'minmax2centroids':
    tensor1[..., ind] = (tensor[..., ind] + tensor[..., ind+2]) / 2.0 # Set cx
    tensor1[..., ind+1] = (tensor[..., ind+1] + tensor[..., ind+3]) / 2.0 # Set cy
    tensor1[..., ind+2] = tensor[..., ind+2] - tensor[..., ind] # Set w
    tensor1[..., ind+3] = tensor[..., ind+3] - tensor[..., ind+1] # Set h (ymax - ymin)
elif conversion == 'centroids2minmax':
    tensor1[..., ind] = tensor[..., ind] - tensor[..., ind+2] / 2.0 # Set xmin
    tensor1[..., ind+1] = tensor[..., ind+1] - tensor[..., ind+3] / 2.0 # Set ymin
    tensor1[..., ind+2] = tensor[..., ind] + tensor[..., ind+2] / 2.0 # Set xmax
    tensor1[..., ind+3] = tensor[..., ind+1] + tensor[..., ind+3] / 2.0 # Set ymax
    
else:
    raise ValueError("Unexpected conversion value. Supported values are 'minmax2centroids' and 'centroids2minmax'.")

return tensor1

def iou(boxes1, boxes2):
'''
Compute the intersection-over-union similarity (also known as Jaccard similarity)
of two axis-aligned 2D rectangular boxes or of multiple axis-aligned 2D rectangular
boxes contained in two arrays with broadcast-compatible shapes.

Three common use cases would be to compute the similarities for 1 vs. 1, 1 vs. `n`,
or `n` vs. `n` boxes. The two arguments are symmetric.

Arguments:
    boxes1 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
        format `(xmin, ymin, xmax, ymax)` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
        Shape must be broadcast-compatible to `boxes2`.
    boxes2 (array): Either a 1D Numpy array of shape `(4, )` containing the coordinates for one box in the
        format `(xmin, ymin, xmax, ymax)` or a 2D Numpy array of shape `(n, 4)` containing the coordinates for `n` boxes.
        Shape must be broadcast-compatible to `boxes1`.

Returns:
    A 1D Numpy array of dtype float containing values in [0,1], the Jaccard similarity of the boxes in `boxes1` and `boxes2`.
    0 means there is no overlap between two given boxes, 1 means their coordinates are identical.
'''

if len(boxes1.shape) > 2: raise ValueError("boxes1 must have rank either 1 or 2, but has rank {}.".format(len(boxes1.shape)))
if len(boxes2.shape) > 2: raise ValueError("boxes2 must have rank either 1 or 2, but has rank {}.".format(len(boxes2.shape)))

if len(boxes1.shape) == 1: boxes1 = np.expand_dims(boxes1, axis=0)
if len(boxes2.shape) == 1: boxes2 = np.expand_dims(boxes2, axis=0)

if not (boxes1.shape[1] == boxes2.shape[1] == 4): raise ValueError("It must be boxes1.shape[1] == boxes2.shape[1] == 4, but it is boxes1.shape[1] == {}, boxes2.shape[1] == {}.".format(boxes1.shape[1], boxes2.shape[1]))


intersection = np.maximum(0, np.minimum(boxes1[:,2], boxes2[:,2]) - np.maximum(boxes1[:,0], boxes2[:,0])) * np.maximum(0, np.minimum(boxes1[:,3], boxes2[:,3]) - np.maximum(boxes1[:,1], boxes2[:,1]))

union = (boxes1[:,2] - boxes1[:,0]) * (boxes1[:,3] - boxes1[:,1]) + (boxes2[:,2] - boxes2[:,0]) * (boxes2[:,3] - boxes2[:,1]) - intersection

return intersection / union

def _greedy_nms2(predictions, iou_threshold=0.45):
'''
Greedy non-maximum suppression, used as an internal helper by decode_y2() and decode_y3().
'''
boxes_left = np.copy(predictions)
maxima = [] # This is where we store the boxes that make it through the non-maximum suppression
while boxes_left.shape[0] > 0: # While there are still boxes left to compare...
maximum_index = np.argmax(boxes_left[:,1]) # ...get the index of the next box with the highest confidence...
maximum_box = np.copy(boxes_left[maximum_index]) # ...copy that box and...
maxima.append(maximum_box) # ...append it to maxima because we'll definitely keep it
boxes_left = np.delete(boxes_left, maximum_index, axis=0) # Now remove the maximum box from boxes_left
if boxes_left.shape[0] == 0: break # If there are no boxes left after this step, break. Otherwise...
similarities = iou(boxes_left[:,2:], maximum_box[2:]) # ...compare (IoU) the other left over boxes to the maximum box...
boxes_left = boxes_left[similarities <= iou_threshold] # ...so that we can remove the ones that overlap too much with the maximum box
return np.array(maxima)
def decode_y3(y_pred, img_height, img_width, confidence_thresh = 0.1, iou_threshold = 0.1):
y_pred_converted = np.copy(y_pred[:,:,-14:-8]) # Slice out the four offset predictions plus two elements whereto we'll write the class IDs and confidences in the next step
y_pred_converted[:,:,0] = np.argmax(y_pred[:,:,1:-12], axis=-1) # The indices of the highest confidence values in the one-hot class vectors are the class ID
y_pred_converted[:,:,1] = np.amax(y_pred[:,:,1:-12], axis=-1) # Store the confidence values themselves, too

y_pred_converted[:,:,2:] += y_pred[:,:,-8:-4] # delta(pred) + anchor == pred for all four coordinates

y_pred_converted[:,:,[2,4]] *= img_width # Convert xmin, xmax back to absolute coordinates
y_pred_converted[:,:,[3,5]] *= img_height # Convert ymin, ymax back to absolute coordinates
y_pred_converted[:,:,1] = np.maximum(y_pred_converted[:,:,1], 0.1) # Floor the confidences at 0.1 (assign to the confidence column rather than overwriting the whole array)

y_pred_decoded = []

for batch_item in y_pred_converted:
    
  boxes = batch_item # Keep all boxes here; the class IDs in decode_y3 already exclude the background class
  
  boxes = boxes[boxes[:,1] >= confidence_thresh]
  boxes = _greedy_nms2(boxes, iou_threshold=iou_threshold)
  y_pred_decoded.append(boxes)

return y_pred_decoded

def decode_y2(y_pred, img_height, img_width, confidence_thresh = 0.5, iou_threshold = 0.7):
y_pred_converted = np.copy(y_pred[:,:,-14:-8]) # Slice out the four offset predictions plus two elements whereto we'll write the class IDs and confidences in the next step
y_pred_converted[:,:,0] = np.argmax(y_pred[:,:,:-12], axis=-1) # The indices of the highest confidence values in the one-hot class vectors are the class ID
y_pred_converted[:,:,1] = np.amax(y_pred[:,:,:-12], axis=-1) # Store the confidence values themselves, too

y_pred_converted[:,:,2:] += y_pred[:,:,-8:-4] # delta(pred) + anchor == pred for all four coordinates

y_pred_converted[:,:,[2,4]] *= img_width # Convert xmin, xmax back to absolute coordinates
y_pred_converted[:,:,[3,5]] *= img_height # Convert ymin, ymax back to absolute coordinates

y_pred_decoded = []

for batch_item in y_pred_converted:
    
  boxes = batch_item[np.nonzero(batch_item[:,0])] # ...get all boxes that don't belong to the background class
  
  boxes = boxes[boxes[:,1] >= confidence_thresh]
  boxes = _greedy_nms2(boxes, iou_threshold=iou_threshold)
  y_pred_decoded.append(boxes)

return y_pred_decoded

class SSDEncoder:
def __init__(self, img_height, img_width, n_classes, featureMapSizes, scales, aspect_ratio = [2, 3, 4], variances=[1.0, 1.0, 1.0, 1.0], pos_iou_threshold=0.7, neg_iou_threshold=0.7):
self.img_width = img_width
self.img_height = img_height
self.n_classes = n_classes
self.featureMapSizes = featureMapSizes

    self.scales = np.linspace(scales[0], scales[1], len(featureMapSizes))
    self.aspect_ratio = np.sort(aspect_ratio)
    
    self.variances = variances
    self.pos_iou_threshold = pos_iou_threshold
    self.neg_iou_threshold = neg_iou_threshold
    self.size = min(self.img_height,self.img_width)
    self.n_boxes=len(self.aspect_ratio)
    
def generate_anchor_boxes(self, feature_map_size,this_scale, batch_size):
    wh_list = []
    for ar in self.aspect_ratio:
        w = this_scale * self.size * np.sqrt(ar)
        h = this_scale * self.size / np.sqrt(ar)
        wh_list.append((w,h))
    wh_list = np.array(wh_list)
    
    cell_height = self.img_height / feature_map_size[0]
    cell_width = self.img_width / feature_map_size[1]
    cx = np.linspace(cell_width/2, self.img_width-cell_width/2, feature_map_size[1])
    cy = np.linspace(cell_height/2, self.img_height-cell_height/2, feature_map_size[0])
    cx_grid, cy_grid = np.meshgrid(cx, cy)
    cx_grid = np.expand_dims(cx_grid, -1) # This is necessary for np.tile() to do what we want further down
    cy_grid = np.expand_dims(cy_grid, -1) # This is necessary for np.tile() to do what we want further down
    
    boxes_tensor = np.zeros((feature_map_size[0], feature_map_size[1], self.n_boxes, 4))   
    boxes_tensor[:, :, :, 0] = np.tile(cx_grid, (1, 1, self.n_boxes)) # Set cx
    boxes_tensor[:, :, :, 1] = np.tile(cy_grid, (1, 1, self.n_boxes)) # Set cy
    boxes_tensor[:, :, :, 2] = wh_list[:, 0] # Set w
    boxes_tensor[:, :, :, 3] = wh_list[:, 1] # Set h
    
    boxes_tensor = convert_coordinates(boxes_tensor, start_index=0, conversion='centroids2minmax')
    
    x_coords = boxes_tensor[:,:,:,[0, 2]]
    x_coords[x_coords >= self.img_width] = self.img_width - 1
    x_coords[x_coords < 0] = 0
    boxes_tensor[:,:,:,[0, 2]] = x_coords
    y_coords = boxes_tensor[:,:,:,[1, 3]]
    y_coords[y_coords >= self.img_height] = self.img_height - 1
    y_coords[y_coords < 0] = 0
    boxes_tensor[:,:,:,[1, 3]] = y_coords
    
    
    boxes_tensor[:, :, :,0] /= self.img_width
    boxes_tensor[:, :, :,2] /= self.img_width
    boxes_tensor[:, :, :,1] /= self.img_height
    boxes_tensor[:, :, :,3] /= self.img_height
    
   
    boxes_tensor = np.expand_dims(boxes_tensor, axis=0)
    boxes_tensor = np.tile(boxes_tensor, (batch_size, 1, 1, 1, 1))
    
    boxes_tensor = np.reshape(boxes_tensor, (batch_size, -1, 4))
    
    return boxes_tensor

def generate_encode_template(self, batch_size):
    boxes_tensor = []
    for i in range(len(self.featureMapSizes)):
        boxes_tensor.append(self.generate_anchor_boxes( feature_map_size=self.featureMapSizes[i], this_scale=self.scales[i], batch_size = batch_size))
    
    boxes_tensor = np.concatenate(boxes_tensor, axis=1)
    classes_tensor = np.zeros((batch_size, boxes_tensor.shape[1], self.n_classes))
    variances_tensor = np.zeros_like(boxes_tensor)
    variances_tensor += self.variances
    
    y_encode_template = np.concatenate((classes_tensor, boxes_tensor, boxes_tensor, variances_tensor), axis=2)                                                   
    
    return y_encode_template  

def encode_y(self, ground_truth_labels, batch_size):
    y_encode_template = self.generate_encode_template(batch_size)
    y_encoded = np.copy(y_encode_template)
    
    class_vector = np.eye(self.n_classes)
    
    for i in range(len(ground_truth_labels)): # For each batch item...
        available_boxes = np.ones((y_encode_template.shape[1])) # 1 for all anchor boxes that are not yet matched to a ground truth box, 0 otherwise
        negative_boxes = np.ones((y_encode_template.shape[1])) # 1 for all negative boxes, 0 otherwise
        for true_box in ground_truth_labels[i]: # For each ground truth box belonging to the current batch item...
            true_box = np.array(true_box, dtype=np.float32)  
            true_box[0] /= self.img_width # Normalize xmin and xmax to be within [0,1]
            true_box[1] /= self.img_height # Normalize ymin and ymax to be within [0,1]
            true_box[2] /= self.img_width # Normalize xmin and xmax to be within [0,1]
            true_box[3] /= self.img_height # Normalize ymin and ymax to be within [0,1]
            
            similarities = iou(y_encode_template[i,:,-12:-8], true_box) # The iou similarities for all anchor boxes
            negative_boxes[similarities >= self.neg_iou_threshold] = 0 # If a negative box gets an IoU match >= `self.neg_iou_threshold`, it's no longer a valid negative box
            similarities *= available_boxes # Filter out anchor boxes which aren't available anymore (i.e. already matched to a different ground truth box)
            available_and_thresh_met = np.copy(similarities)
            available_and_thresh_met[available_and_thresh_met < self.pos_iou_threshold] = 0 # Filter out anchor boxes which don't meet the iou threshold
            assign_indices = np.nonzero(available_and_thresh_met)[0] # Get the indices of the left-over anchor boxes to which we want to assign this ground truth box
            if len(assign_indices) > 0: # If we have any matches
                
                y_encoded[i,assign_indices,:-8] = np.concatenate((class_vector[int(self.n_classes - 1)], true_box), axis=0) # Write the ground truth box coordinates and class to all assigned anchor box positions. Remember that the last four elements of `y_encoded` are just dummy entries.
                available_boxes[assign_indices] = 0
                # Make the assigned anchor boxes unavailable for the next ground truth box
            else: # If we don't have any matches
                best_match_index = np.argmax(similarities) # Get the index of the best iou match out of all available boxes
                y_encoded[i,best_match_index,:-8] = np.concatenate((class_vector[int(self.n_classes - 1)], true_box), axis=0) # Write the ground truth box coordinates and class to the best match anchor box position
                available_boxes[best_match_index] = 0 # Make the assigned anchor box unavailable for the next ground truth box
                negative_boxes[best_match_index] = 0 # The assigned anchor box is no longer a negative box
        # Set the classes of all remaining available anchor boxes to class zero
        background_class_indices = np.nonzero(negative_boxes)[0]
        y_encoded[i,background_class_indices,0] = 1
        
    y_encoded[:,:,-12:-8] -= y_encode_template[:,:,-12:-8] # (gt - anchor) for all four coordinates
        
     #   y_encoded[:,:,[-12,-10]] /= np.expand_dims(y_encode_template[:,:,-10] - y_encode_template[:,:,-12], axis=-1) # (xmin(gt) - xmin(anchor)) / w(anchor), (xmax(gt) - xmax(anchor)) / w(anchor)
     #   y_encoded[:,:,[-11,-9]] /= np.expand_dims(y_encode_template[:,:,-9] - y_encode_template[:,:,-11], axis=-1) # (ymin(gt) - ymin(anchor)) / h(anchor), (ymax(gt) - ymax(anchor)) / h(anchor)
     #   y_encoded[:,:,-12:-8] /= y_encode_template[:,:,-4:] # (gt - anchor) / size(anchor) / variance for all four coordinates, where 'size' refers to w and h respectively
            
    
    return y_encoded
```
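
A minimal usage sketch for the encoder (the module name `box_encoder`, the 300x300 input size, the feature map sizes and the scale range below are illustrative assumptions, not the exact training configuration):

```python
import numpy as np
from box_encoder import SSDEncoder  # hypothetical module name for the code above

# Illustrative setup: 2 classes (background + text) and six predictor layers.
encoder = SSDEncoder(img_height=300, img_width=300, n_classes=2,
                     featureMapSizes=[(37, 37), (18, 18), (9, 9), (5, 5), (3, 3), (1, 1)],
                     scales=[0.1, 0.9])

# One image with a single text box given as (xmin, ymin, xmax, ymax) in pixels.
ground_truth_labels = [[[30, 40, 120, 70]]]
y_encoded = encoder.encode_y(ground_truth_labels, batch_size=1)

# Count how many anchors were matched as positives vs. left as background.
positives = np.max(y_encoded[:, :, 1:-12], axis=-1)
print("positive anchors:", int(positives.sum()), "of", y_encoded.shape[1])
```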

@deepakdebug (Author)

Model
```python
'''
A Keras SSD model with a VGG-16 base network, adapted from Pierluigi Ferrari's 7-layer SSD template.

Copyright (C) 2017 Pierluigi Ferrari

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
'''

import numpy as np
from keras.models import Model
from keras.layers import Input, Lambda, Conv2D, MaxPooling2D, BatchNormalization, ELU, Reshape, Concatenate, Activation, ZeroPadding2D, GlobalAveragePooling2D, Dense
from keras_layer_L2Normalization import L2Normalization
from keras_layer_AnchorBoxes import AnchorBoxes

from keras.applications.vgg16 import VGG16

def build_model(image_size,
n_classes,
min_scale=0.1,
max_scale=0.9,
aspect_ratios_global=[0.5, 1.0, 2.0],
aspect_ratios_per_layer=None,
variances=[1.0, 1.0, 1.0, 1.0]):
'''
Build a Keras model with SSD architecture, see references.

The model consists of convolutional feature layers and a number of convolutional
predictor layers that take their input from different feature layers.
The model is fully convolutional.

The implementation found here follows the original architecture used in the paper:
a VGG-16 base network extended by a few convolutional feature layers. It has six
convolutional predictor layers that take their input from conv4_3, fc7, conv6_2,
conv7_2, conv8_2, and conv9_2, respectively.

In case you're wondering why this function has so many arguments: All arguments except
the first two (`image_size` and `n_classes`) are only needed so that the anchor box
layers can produce the correct anchor boxes. In case you're training the network, the
parameters passed here must be the same as the ones used to set up `SSDBoxEncoder`.
In case you're loading trained weights, the parameters passed here must be the same
as the ones used to produce the trained weights.

Note: Requires Keras v2.0 or later. Training currently works only with the
TensorFlow backend (v1.0 or later).

Arguments:
    image_size (tuple): The input image size in the format `(height, width, channels)`.
    n_classes (int): The number of categories for classification including
        the background class (i.e. the number of positive classes +1 for
        the background class).
    min_scale (float, optional): The smallest scaling factor for the size of the anchor boxes as a fraction
        of the shorter side of the input images. Defaults to 0.1.
    max_scale (float, optional): The largest scaling factor for the size of the anchor boxes as a fraction
        of the shorter side of the input images. All scaling factors between the smallest and the
        largest will be linearly interpolated. Note that the second to last of the linearly interpolated
        scaling factors will actually be the scaling factor for the last predictor layer, while the last
        scaling factor is used for the second box for aspect ratio 1 in the last predictor layer
        if `two_boxes_for_ar1` is `True`. Defaults to 0.9.
    scales (list, optional): A list of floats containing scaling factors per convolutional predictor layer.
        This list must be one element longer than the number of predictor layers. The first `k` elements are the
        scaling factors for the `k` predictor layers, while the last element is used for the second box
        for aspect ratio 1 in the last predictor layer if `two_boxes_for_ar1` is `True`. This additional
        last scaling factor must be passed either way, even if it is not being used.
        Defaults to `None`. If a list is passed, this argument overrides `min_scale` and
        `max_scale`. All scaling factors must be greater than zero.
    aspect_ratios_global (list, optional): The list of aspect ratios for which anchor boxes are to be
        generated. This list is valid for all predictor layers. The original implementation uses more aspect ratios
        for some predictor layers and fewer for others. If you want to do that, too, then use the next argument instead.
        Defaults to `[0.5, 1.0, 2.0]`.
    aspect_ratios_per_layer (list, optional): A list containing one aspect ratio list for each predictor layer.
        This allows you to set the aspect ratios for each predictor layer individually. If a list is passed,
        it overrides `aspect_ratios_global`. Defaults to `None`.
    two_boxes_for_ar1 (bool, optional): Only relevant for aspect ratio lists that contain 1. Will be ignored otherwise.
        If `True`, two anchor boxes will be generated for aspect ratio 1. The first will be generated
        using the scaling factor for the respective layer, the second one will be generated using
        geometric mean of said scaling factor and next bigger scaling factor. Defaults to `True`, following the original
        implementation.
    limit_boxes (bool, optional): If `True`, limits box coordinates to stay within image boundaries.
        This would normally be set to `True`, but here it defaults to `False`, following the original
        implementation.
    variances (list, optional): A list of 4 floats >0 with scaling factors (actually it's not factors but divisors
        to be precise) for the encoded predicted box coordinates. A variance value of 1.0 would apply
        no scaling at all to the predictions, while values in (0,1) upscale the encoded predictions and values greater
        than 1.0 downscale the encoded predictions. If you want to reproduce the configuration of the original SSD,
        set this to `[0.1, 0.1, 0.2, 0.2]`, provided the coordinate format is 'centroids'. Defaults to `[1.0, 1.0, 1.0, 1.0]`.
    coords (str, optional): The box coordinate format to be used. Can be either 'centroids' for the format
        `(cx, cy, w, h)` (box center coordinates, width, and height) or 'minmax' for the format
        `(xmin, xmax, ymin, ymax)`. Defaults to 'centroids'.
    normalize_coords (bool, optional): Set to `True` if the model is supposed to use relative instead of absolute coordinates,
        i.e. if the model predicts box coordinates within [0,1] instead of absolute coordinates. Defaults to `False`.

Returns:
    model: The Keras SSD model.
    predictor_sizes: A Numpy array containing the `(height, width)` portion
        of the output tensor shape for each convolutional predictor layer. During
        training, the generator function needs this in order to transform
        the ground truth labels into tensors of identical structure as the
        output tensors of the model, which is in turn needed for the cost
        function.

References:
    https://arxiv.org/abs/1512.02325v5
'''

n_predictor_layers = 6 # The number of predictor conv layers in the network

# Get a few exceptions out of the way first
if aspect_ratios_global is None and aspect_ratios_per_layer is None:
    raise ValueError("`aspect_ratios_global` and `aspect_ratios_per_layer` cannot both be None. At least one needs to be specified.")
if aspect_ratios_per_layer:
    if len(aspect_ratios_per_layer) != n_predictor_layers:
        raise ValueError("It must be either aspect_ratios_per_layer is None or len(aspect_ratios_per_layer) == {}, but len(aspect_ratios_per_layer) == {}.".format(n_predictor_layers, len(aspect_ratios_per_layer)))

scales = np.linspace(min_scale, max_scale, n_predictor_layers)

if len(variances) != 4: # We need one variance value for each of the four box coordinates
    raise ValueError("4 variance values must be pased, but {} values were received.".format(len(variances)))
variances = np.array(variances)
if np.any(variances <= 0):
    raise ValueError("All variances must be >0, but the variances given are {}".format(variances))

# Set the aspect ratios for each predictor layer. These are only needed for the anchor box layers.
if aspect_ratios_per_layer:
    aspect_ratios_conv3 = aspect_ratios_per_layer[0]
    aspect_ratios_conv4 = aspect_ratios_per_layer[1]
    aspect_ratios_conv5 = aspect_ratios_per_layer[2]
    aspect_ratios_conv6 = aspect_ratios_per_layer[3]
    aspect_ratios_conv7 = aspect_ratios_per_layer[4]
    aspect_ratios_conv8 = aspect_ratios_per_layer[5]
else:
    aspect_ratios_conv3 = aspect_ratios_global
    aspect_ratios_conv4 = aspect_ratios_global
    aspect_ratios_conv5 = aspect_ratios_global
    aspect_ratios_conv6 = aspect_ratios_global
    aspect_ratios_conv7 = aspect_ratios_global
    aspect_ratios_conv8 = aspect_ratios_global
    

n_boxes = len(aspect_ratios_global)
n_boxes_conv3 = n_boxes    
n_boxes_conv4 = n_boxes
n_boxes_conv5 = n_boxes
n_boxes_conv6 = n_boxes
n_boxes_conv7 = n_boxes
n_boxes_conv8 = n_boxes


# Input image format
img_height, img_width, img_channels = image_size[0], image_size[1], image_size[2]
#Define base VGG16 model

base_model = VGG16(weights='imagenet', include_top=False, input_shape=(img_height, img_width, img_channels))

freeze_layers = ['block1_conv1', 'block1_conv2', 'block2_conv1', 'block2_conv2']
for freeze_layer in freeze_layers:
    if base_model.get_layer(freeze_layer):
        base_model.get_layer(freeze_layer).trainable = False
        
x = base_model.output
    
fc6 = Conv2D(1024,(3,3), dilation_rate=(6,6), name='fc6', padding="same", activation = 'relu')(x)

fc7 = Conv2D(1024, (1,1), padding = "valid", name="fc7", activation = 'relu')(fc6)
    
conv6_1 = Conv2D(256, (1,1), padding = "valid", name = 'block6_conv1', activation = 'relu' )(fc7)
        
conv6_2 = Conv2D(512, (3,3), padding = "same", strides = (2,2), name = 'block6_conv2', activation = 'relu' )(conv6_1)
         
conv7_1 = Conv2D(128, (1,1), padding = "valid", name = 'block7_conv1', activation = 'relu' )(conv6_2)
        
conv7_2 = Conv2D(256, (3,3), padding = "same", strides = (2,2), name = 'block7_conv2', activation = 'relu' )(conv7_1)
        
conv8_1 = Conv2D(128, (1,1), padding = "valid", name = 'block8_conv1', activation = 'relu' )(conv7_2)
        
conv8_2 = Conv2D(256, (3,3), padding = "same", strides = (2,2), name = 'block8_conv2', activation = 'relu' )(conv8_1)
    
conv9_1 = Conv2D(128, (1,1), padding = "valid", name = 'block9_conv1', activation = 'relu' )(conv8_2)
        
conv9_2 = Conv2D(256, (3,3), padding = "same", strides = (2,2), name = 'block9_conv2', activation = 'relu' )(conv9_1)


# The next part is to add the convolutional predictor layers on top of the base network
# that we defined above. Note that I use the term "base network" differently than the paper does.
# To me, the base network is everything that is not convolutional predictor layers or anchor
# box layers. In this case we'll have six predictor layers, but of course you could
# easily rewrite this into an arbitrarily deep base network and add an arbitrary number of
# predictor layers on top of the base network by simply following the pattern shown here.

# Build the convolutional predictor layers on top of conv4_3, fc7, conv6_2, conv7_2, conv8_2, and conv9_2
# We build two predictor layers on top of each of these layers: One for classes (classification), one for box coordinates (localization)
# We predict `n_classes` confidence values for each box, hence the `classes` predictors have depth `n_boxes * n_classes`
# We predict 4 box coordinates for each box, hence the `boxes` predictors have depth `n_boxes * 4`

# Output shape of `classes`: `(batch, height, width, n_boxes * n_classes)`
conv4_3_norm = L2Normalization(gamma_init=20, name='conv4_3_norm')(base_model.get_layer('block4_conv3').output)

x = ZeroPadding2D(padding = (1,2))(conv4_3_norm)
classes3 = Conv2D(n_boxes_conv3 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes3')(x)

x = ZeroPadding2D(padding = (1,2))(fc7)
classes4 = Conv2D(n_boxes_conv4 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes4')(x)

x = ZeroPadding2D(padding = (1,2))(conv6_2)
classes5 = Conv2D(n_boxes_conv5 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes5')(x)

x = ZeroPadding2D(padding = (1,2))(conv7_2)
classes6 = Conv2D(n_boxes_conv6 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes6')(x)

x = ZeroPadding2D(padding = (1,2))(conv8_2)
classes7 = Conv2D(n_boxes_conv7 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes7')(x)

x = ZeroPadding2D(padding = (1,2))(conv9_2)
classes8 = Conv2D(n_boxes_conv8 * n_classes, (3, 5), strides=(1, 1), padding="valid", name='classes8')(x)

# Output shape of `boxes`: `(batch, height, width, n_boxes * 4)`
x = ZeroPadding2D(padding = (1,2))(conv4_3_norm)
boxes3 = Conv2D(n_boxes_conv3 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes3')(x)

x = ZeroPadding2D(padding = (1,2))(fc7)
boxes4 = Conv2D(n_boxes_conv4 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes4')(x)

x = ZeroPadding2D(padding = (1,2))(conv6_2)
boxes5 = Conv2D(n_boxes_conv5 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes5')(x)

x = ZeroPadding2D(padding = (1,2))(conv7_2)
boxes6 = Conv2D(n_boxes_conv6 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes6')(x)

x = ZeroPadding2D(padding = (1,2))(conv8_2)
boxes7 = Conv2D(n_boxes_conv7 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes7')(x)

x = ZeroPadding2D(padding = (1,2))(conv9_2)
boxes8 = Conv2D(n_boxes_conv8 * 4, (3, 5), strides=(1, 1), padding="valid", name='boxes8')(x)

# Generate the anchor boxes
# Output shape of `anchors`: `(batch, height, width, n_boxes, 8)`
anchors3 = AnchorBoxes(img_height, img_width, this_scale=scales[0],aspect_ratios=aspect_ratios_conv3, variances=variances, name='anchors3')(boxes3)
anchors4 = AnchorBoxes(img_height, img_width, this_scale=scales[1],aspect_ratios=aspect_ratios_conv4, variances=variances, name='anchors4')(boxes4)
anchors5 = AnchorBoxes(img_height, img_width, this_scale=scales[2],aspect_ratios=aspect_ratios_conv5, variances=variances, name='anchors5')(boxes5)
anchors6 = AnchorBoxes(img_height, img_width, this_scale=scales[3],aspect_ratios=aspect_ratios_conv6, variances=variances, name='anchors6')(boxes6)
anchors7 = AnchorBoxes(img_height, img_width, this_scale=scales[4],aspect_ratios=aspect_ratios_conv7, variances=variances, name='anchors7')(boxes7)
anchors8 = AnchorBoxes(img_height, img_width, this_scale=scales[5],aspect_ratios=aspect_ratios_conv8, variances=variances, name='anchors8')(boxes8)
# Reshape the class predictions, yielding 3D tensors of shape `(batch, height * width * n_boxes, n_classes)`
# We want the classes isolated in the last axis to perform softmax on them
classes3_reshaped = Reshape((-1, n_classes), name='classes3_reshape')(classes3)
classes4_reshaped = Reshape((-1, n_classes), name='classes4_reshape')(classes4)
classes5_reshaped = Reshape((-1, n_classes), name='classes5_reshape')(classes5)
classes6_reshaped = Reshape((-1, n_classes), name='classes6_reshape')(classes6)
classes7_reshaped = Reshape((-1, n_classes), name='classes7_reshape')(classes7)
classes8_reshaped = Reshape((-1, n_classes), name='classes8_reshape')(classes8)

# Reshape the box coordinate predictions, yielding 3D tensors of shape `(batch, height * width * n_boxes, 4)`
# We want the four box coordinates isolated in the last axis to compute the smooth L1 loss
boxes3_reshaped = Reshape((-1, 4), name='boxes3_reshape')(boxes3)
boxes4_reshaped = Reshape((-1, 4), name='boxes4_reshape')(boxes4)
boxes5_reshaped = Reshape((-1, 4), name='boxes5_reshape')(boxes5)
boxes6_reshaped = Reshape((-1, 4), name='boxes6_reshape')(boxes6)
boxes7_reshaped = Reshape((-1, 4), name='boxes7_reshape')(boxes7)
boxes8_reshaped = Reshape((-1, 4), name='boxes8_reshape')(boxes8)
# Reshape the anchor box tensors, yielding 3D tensors of shape `(batch, height * width * n_boxes, 8)`
anchors3_reshaped = Reshape((-1, 8), name='anchors3_reshape')(anchors3)
anchors4_reshaped = Reshape((-1, 8), name='anchors4_reshape')(anchors4)
anchors5_reshaped = Reshape((-1, 8), name='anchors5_reshape')(anchors5)
anchors6_reshaped = Reshape((-1, 8), name='anchors6_reshape')(anchors6)
anchors7_reshaped = Reshape((-1, 8), name='anchors7_reshape')(anchors7)
anchors8_reshaped = Reshape((-1, 8), name='anchors8_reshape')(anchors8)


# Concatenate the predictions from the different layers and the associated anchor box tensors
# Axis 0 (batch) and axis 2 (n_classes or 4, respectively) are identical for all layer predictions,
# so we want to concatenate along axis 1
# Output shape of `classes_merged`: (batch, n_boxes_total, n_classes)
classes_concat = Concatenate(axis=1, name='classes_concat')([classes3_reshaped,
                                                             classes4_reshaped,
                                                             classes5_reshaped,
                                                             classes6_reshaped,
                                                             classes7_reshaped,
                                                             classes8_reshaped])

# Output shape of `boxes_final`: (batch, n_boxes_total, 4)
boxes_concat = Concatenate(axis=1, name='boxes_concat')([boxes3_reshaped,
                                                         boxes4_reshaped,
                                                         boxes5_reshaped,
                                                         boxes6_reshaped,
                                                         boxes7_reshaped,
                                                         boxes8_reshaped])

# Output shape of `anchors_final`: (batch, n_boxes_total, 8)
anchors_concat = Concatenate(axis=1, name='anchors_concat')([anchors3_reshaped,
                                                             anchors4_reshaped,
                                                             anchors5_reshaped,
                                                             anchors6_reshaped,
                                                             anchors7_reshaped,
                                                             anchors8_reshaped])

# The box coordinate predictions will go into the loss function just the way they are,
# but for the class predictions, we'll apply a softmax activation layer first
classes_softmax = Activation('softmax', name='classes_softmax')(classes_concat)

# Concatenate the class and box coordinate predictions and the anchors to one large predictions tensor
# Output shape of `predictions`: (batch, n_boxes_total, n_classes + 4 + 8)
predictions = Concatenate(axis=2, name='predictions')([classes_softmax, boxes_concat, anchors_concat])


#model.load_weights('model_detection_Weights.h5')

model = Model(inputs=base_model.input, outputs=predictions)

# Get the spatial dimensions (height, width) of the convolutional predictor layers, we need them to generate the default boxes
# The spatial dimensions are the same for the `classes` and `boxes` predictors
predictor_sizes = np.array([classes3._keras_shape[1:3],
                            classes4._keras_shape[1:3],
                            classes5._keras_shape[1:3],
                            classes6._keras_shape[1:3],
                            classes7._keras_shape[1:3],
                            classes8._keras_shape[1:3]])


return model, predictor_sizes

```
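
To sanity-check the whole pipeline, the decoder from the box encoder comment can be run on the raw network output; a minimal sketch (the random input array and the 300x300 size are illustrative placeholders for a preprocessed SynthText image):

```python
import numpy as np

# Assumes build_model() above and decode_y2() from the box encoder comment.
# n_classes=2 means background + text; the input size is an illustrative choice.
model, predictor_sizes = build_model(image_size=(300, 300, 3), n_classes=2)
print("predictor feature map sizes:", predictor_sizes)

image = np.random.rand(1, 300, 300, 3)  # stand-in for a preprocessed SynthText image
y_pred = model.predict(image)           # shape: (1, n_boxes_total, n_classes + 4 + 8)

decoded = decode_y2(y_pred, img_height=300, img_width=300,
                    confidence_thresh=0.5, iou_threshold=0.45)
print("boxes kept for the first image:", len(decoded[0]))
```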

@deepakdebug reopened this Jun 13, 2018