
Error in gt_util.sample_random_batch(batch_size=32, input_size=model.image_size) #21

Open · kamae opened this issue Jun 24, 2019 · 20 comments

@kamae commented Jun 24, 2019

ssd_detectors-master\ssd_data.py in preprocess(img, size)
628 img = img.astype(np.float32)
629 mean = np.array([104,117,123])
--> 630 img -= mean[np.newaxis, np.newaxis, :]
631 return img
632
ValueError: operands could not be broadcast together with shapes (512,512) (1,1,3) (512,512)

@mvoelk (Owner) commented Jun 24, 2019

I guess your image data is grayscale with shape (512,512) or (512,512,1). I always used RGB images (e.g. shape (512,512,3)) and hard-coded the channel means for compatibility with the Caffe models.
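
A minimal workaround sketch, assuming the image was loaded as a 2-D grayscale array and OpenCV is available: expand it to three channels before calling preprocess.

import cv2
import numpy as np

# hedged sketch: make a grayscale image 3-channel so the BGR channel means broadcast
if img.ndim == 2:                            # shape (H, W)
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
elif img.ndim == 3 and img.shape[2] == 1:    # shape (H, W, 1)
    img = np.repeat(img, 3, axis=2)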

@kamae (Author) commented Jun 24, 2019 via email

@mvoelk (Owner) commented Jun 24, 2019

Okay, what you are looking for is probably in SL_predict.ipynb under 'Real world images', but with the SSD model and PriorUtil instead.

For training with your own dataset, you should write a custom parser (GTUtility), as is done in data_voc.py.
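
Roughly along the lines of data_voc.py, such a parser could look like the following sketch (attribute names like image_names, data, classes, the base class BaseGTUtility and the final init() call follow that file; parse_annotations is a hypothetical helper for your own annotation format):

import os
import numpy as np

from ssd_data import BaseGTUtility

class GTUtility(BaseGTUtility):
    """Sketch of a custom ground truth parser, modeled on data_voc.py."""

    def __init__(self, data_path):
        self.data_path = data_path
        self.image_path = os.path.join(data_path, 'images')
        self.classes = ['Background', 'MyClass']  # index 0 is background
        self.image_names = []
        self.data = []
        for image_name, boxes, labels in self.parse_annotations():
            # boxes: (N, 4) corner coordinates normalized to [0, 1]
            # labels: (N,) integer class indices
            one_hot = np.eye(len(self.classes))[labels]
            self.data.append(np.concatenate([boxes, one_hot], axis=1))
            self.image_names.append(image_name)
        self.init()  # as in data_voc.py; sets derived fields such as num_classes

    def parse_annotations(self):
        # hypothetical helper: yield (image_name, boxes, label_indices) per image
        raise NotImplementedError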

@kamae (Author) commented Jun 24, 2019 via email

@mvoelk (Owner) commented Jun 24, 2019

import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import cv2

from ssd_model import SSD300, SSD512
from ssd_utils import PriorUtil
from ssd_data import preprocess
from utils.model import load_weights

%matplotlib inline

# MS COCO
from data_coco import GTUtility
gt_util = GTUtility('./data/COCO/', validation=True)

# SSD512
model = SSD512(num_classes=gt_util.num_classes)
weights_path = './models/ssd512_coco_weights_fixed.hdf5'; confidence_threshold = 0.7

load_weights(model, weights_path)
prior_util = PriorUtil(model)

# predict 
inputs = []
images = []

img_paths = glob.glob('./data/images/*.jpg')

for img_path in img_paths:
    img = cv2.imread(img_path)
    inputs.append(preprocess(img, model.image_size))
    h, w = model.image_size
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR).astype('float32')
    img = img[:, :, (2,1,0)] # BGR to RGB
    img /= 255
    images.append(img)
    
inputs = np.asarray(inputs)

preds = model.predict(inputs, batch_size=1, verbose=1)

for i in range(len(images)):
    print(img_paths[i])
    plt.figure(figsize=[8]*2, frameon=True)
    plt.imshow(images[i])
    res = prior_util.decode(preds[i], confidence_threshold=confidence_threshold)
    prior_util.plot_results(res, classes=gt_util.classes)
    plt.axis('off')
    plt.show()

The converted caffe models may require fine tuning and the threshold was chosen more or less ad hoc.

@kamae (Author) commented Jun 24, 2019 via email

@mvoelk (Owner) commented Jun 24, 2019

prior_util.plot_results(res, classes=gt_util.classes, show_labels=True)

@kamae (Author) commented Jun 24, 2019 via email

@kamae (Author) commented Jun 27, 2019 via email

@mvoelk (Owner) commented Jun 27, 2019

I tried MobileNet V1, but I'm not sure if it is working...

from keras.models import Model
from keras.applications import MobileNet
from keras.layers import Input
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import SeparableConv2D
from keras.layers import BatchNormalization

from ssd_model import multibox_head  # prediction head defined alongside SSD300/SSD512 in ssd_model.py

def ssd300_mobilenet_body(x):
    
    source_layers = []
    
    mobilenet = MobileNet(input_shape=(224,224,3), include_top=False, weights='imagenet')
    x = Model(inputs=mobilenet.input, outputs=mobilenet.get_layer('conv_dw_11_relu').output)(x)

    x = Conv2D(512, (1, 1), padding='same', name='conv11')(x)
    x = BatchNormalization(name='bn11')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    x = SeparableConv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv12dw')(x)
    x = BatchNormalization(name='bn12dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv12')(x)
    x = BatchNormalization(name='bn12')(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(1024, (3, 3), padding='same', name='conv13dw')(x)
    x = BatchNormalization(name='bn13dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv13')(x)
    x = BatchNormalization(name='bn13')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    x = Conv2D(256, (1, 1), padding='same', name='conv14_1')(x)
    x = BatchNormalization(name='bn14_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv14_2')(x)
    x = BatchNormalization(name='bn14_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    x = Conv2D(128, (1, 1), padding='same', name='conv15_1')(x)
    x = BatchNormalization(name='bn15_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv15_2')(x)
    x = BatchNormalization(name='bn15_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    x = Conv2D(128, (1, 1), padding='same', name='conv16_1')(x)
    x = BatchNormalization(name='bn16_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv16_2')(x)
    x = BatchNormalization(name='bn16_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    x = Conv2D(64, (1, 1), padding='same', name='conv17_1')(x)
    x = BatchNormalization(name='bn17_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(128, (3, 3), strides=(2, 2), padding='same', name='conv17_2')(x)
    x = BatchNormalization(name='bn17_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)
    
    return source_layers


def SSD300_mobile(input_shape=(300, 300, 3), num_classes=21, softmax=True):
    """SSD300 with MobileNet architecture.
    
    Based on the Keras implementation of MobileNet.
    
    # References
        https://arxiv.org/abs/1704.04861
    """
    
    x = input_tensor = Input(shape=input_shape)
    source_layers = ssd300_mobilenet_body(x)
    
    num_priors = [4, 6, 6, 6, 4, 4]
    normalizations = [20, 20, 20, 20, 20, 20]

    output_tensor = multibox_head(source_layers, num_priors, num_classes, normalizations, softmax)
    model = Model(input_tensor, output_tensor)
    model.num_classes = num_classes

    # parameters for prior boxes
    model.image_size = input_shape[:2]
    model.source_layers = source_layers
    model.aspect_ratios = [[1,2,1/2], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2], [1,2,1/2]]
    model.minmax_sizes = [(30, 60), (60, 111), (111, 162), (162, 213), (213, 264), (264, 315)]
    model.steps = [8, 16, 32, 64, 100, 300]
    model.special_ssd_boxes = True
    
    return model

If you get SSD running with MobileNet V2, I would appreciate it if you could share your findings.
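
A hedged usage sketch, wiring the MobileNet variant up like the VGG-based models (gt_util refers to the COCO example above; whether the model actually trains well is, as said, untested):

from ssd_utils import PriorUtil

model = SSD300_mobile(num_classes=gt_util.num_classes)
prior_util = PriorUtil(model)  # prior boxes are derived from model.source_layers, aspect_ratios, etc.
model.summary()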

@kamae (Author) commented Jun 28, 2019 via email

@mvoelk (Owner) commented Jun 28, 2019

conv1_1 Weights 3x3x3x64 relu 64
conv1_2 Weights 3x3x64x64 relu 64
Question 1: depth changed from 3 (RGB?) to 64, but this is not explicitly written in ssd_model.py

Yes, the weights always have shape (kernel_size, kernel_size, input_channels, output_channels); the 3 is the number of input channels (BGR) as defined in SSD512.
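
A quick way to check this layout, assuming model is the SSD512 built as above:

for name in ['conv1_1', 'conv1_2']:
    w, b = model.get_layer(name).get_weights()
    print(name, w.shape, b.shape)  # e.g. conv1_1 (3, 3, 3, 64) (64,)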

The missing Conv2_1 and Conv2_2 layers in Fig. 3.5 are my mistake...

@mvoelk (Owner) commented Jul 1, 2019

The tensors at the branching point are collected in source_layers. multibox_head adds the prediction paths.
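
For example, the branching tensors and the resulting prediction tensor can be inspected like this (assuming model was built as above):

for t in model.source_layers:
    print(t.name, t.shape)    # feature maps that feed the multibox head
print(model.output.shape)     # concatenated predictions produced by multibox_head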

@kamae (Author) commented Jul 13, 2019 via email

@mvoelk (Owner) commented Jul 13, 2019

Am I right?

Yes

SegLink is actually not intended for the detection of curved text instances. Curved text would require a custom encoding and decoding procedure, as well as a different representation in the GTUtility and rectification before the recognition stage. It should also work to just write a new decoder and use it with the SynthText models, but I do not have the time to implement this. arXiv:1807.01544 is probably the approach that comes closest to this idea.

If you just need a custom parser for a dataset with oriented bounding boxes, see #12...

@kamae (Author) commented Jul 14, 2019 via email

@mvoelk (Owner) commented Jul 15, 2019

I have no idea how to access the email attachments from the GitHub issues, but I hope you find the answer to your question in #1 or #8.

@kamae (Author) commented Jul 17, 2019 via email

@mvoelk (Owner) commented Jul 17, 2019

"input_height = 32\n",

@kamae (Author) commented Jul 17, 2019 via email
