
service for less latency #3

Open
MartinThoma opened this issue Apr 27, 2016 · 9 comments

@MartinThoma (Contributor) commented Apr 27, 2016

I am currently writing the "Pflichtenheft" (the requirements specification we have to produce for the practical). I think the mediseg project should have its own executable. This program should start and load the model once, before we feed it images, to reduce latency. We could use a web-service approach as we did with sst before, but that is not so clean. A system service would be cleaner.

I found a couple of sources on how to create a Python service. However, I'm currently not quite sure how to call the service and pass parameters to it.

Essentially, I am thinking of the following workflow:

  1. The user (or the system) starts the medisegd daemon.
  2. The daemon loads the model, does as much precomputation as possible, and keeps the important state in memory.
  3. The user calls mediseg --input image.jpg --output segmentation.jpg as often as they want, with little latency.
@MartinThoma (Contributor, Author)

Ok, this is pretty awesome: Pyro
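
For reference, a minimal sketch of how this could look with Pyro4 (a sketch under assumptions: Pyro4 is installed, and load_model and the model's segment call are hypothetical placeholders for the actual TensorVision code):

#!/usr/bin/env python

"""medisegd -- hypothetical daemon sketch using Pyro4."""

import Pyro4


@Pyro4.expose
class MedisegServer(object):
    def __init__(self):
        # The expensive work happens exactly once, at daemon startup.
        self.model = load_model()  # hypothetical TensorVision call

    def segment(self, input_path, output_path):
        # Cheap per-request work; the model is already in memory.
        self.model.segment(input_path, output_path)  # hypothetical call


daemon = Pyro4.Daemon()
uri = daemon.register(MedisegServer(), objectId="mediseg")
print("medisegd running, uri: %s" % uri)
daemon.requestLoop()

The mediseg command-line client would then just connect and call:

import Pyro4

model = Pyro4.Proxy(uri)  # the uri printed by the daemon above
model.segment("image.jpg", "segmentation.jpg")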

@MarvinTeichmann (Member) commented Apr 27, 2016

Well, whether we need a service kind of depends on what they expect of us. So far, for deployment I have just used an eval.py. This file loads the model, takes the whole epoch of testing images, and runs a variety of evaluations. It computes an overlay of prediction and input for every image, measures the prediction time, and writes some of those images to disk (every 20th, chosen arbitrarily or deterministically depending on a flag). It also computes a variety of statistics such as the ROC curve and accuracy.
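
A rough sketch of that flow (names such as load_model, test_images, model.predict, update_statistics, and write_overlay are placeholders, not the actual TensorVision API):

import time

model = load_model()  # placeholder: the model is loaded exactly once
predict_times = []
for i, (image, label) in enumerate(test_images):
    start = time.time()
    prediction = model.predict(image)  # placeholder call
    predict_times.append(time.time() - start)
    update_statistics(prediction, label)  # e.g. ROC curve, accuracy
    if i % 20 == 0:
        write_overlay(image, prediction)  # every 20th image to disk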

I have not ported eval.py to the newest TensorVision version, but you can find an existing implementation of eval.py.

I think this might be enough, depending on Sebastian's goals. The eval.py shows clearly how to use the model, and for deployment (i.e. as part of a robotic system, or to use it inside a stream) they will need to write their own code around our model anyway.

I like the idea of having a web interface though, as it provides a more interactive way of accessing the model and is a nice way of presenting the results. (Also for externals, so they might put it on their website or start a demo on some open house day.)

@MarvinTeichmann (Member)

One further remark: a very nice advantage of TensorFlow over Theano is that there is almost no compile time for the model. An external service that precompiles the model is therefore much less needed with TensorFlow than with Theano.

@MartinThoma (Contributor, Author) commented Apr 27, 2016

@MarvinTeichmann Let's say we end up having a 100 MB model. According to tomsguide, an HDD has an average read speed of 80–160 MB/s, so just reading the model takes roughly 0.6–1.25 seconds, even if we actually reach the best possible speed. For mediseg, I would add to the Pflichtenheft, as an extended aim of the project, to segment an image in less than 0.3 seconds (or faster, depending on what counts as real time). Hence up to a second of latency is not acceptable.

I don't think creating a daemon is much work, and I think it is a good idea to get rid of a lot of the latency.

I like the idea of having a web interface though, as it provides a more interactive way of accessing the model and is a nice way of presenting the results. (Also for externals, so they might put it on their website or start a demo on some open house day.)

Well, with Pyro this seems to be quite easy to do. You can connect to the daemon much more easily than with a web service, so one is free to add a web server or an IPython notebook on top. I'll write a blog article about how to use Pyro today.

@MarvinTeichmann (Member)

Sure, adding the constraint of 0.3 seconds sounds reasonable. However, I think an eval.py as described earlier would already satisfy this constraint, right? The evaluation module does not load the model for each image; it loads it once and runs the evaluation on a whole folder of images. The loading time of the model is not measured; measurement only starts after the tf.Session is started.
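
In other words, the measurement boundary looks roughly like this (a sketch; images, prediction, and feed_image are placeholders):

import time

import tensorflow as tf

with tf.Session() as sess:
    # Graph construction and weight loading happen before this point
    # and are deliberately not part of the measurement.
    for image in images:
        start = time.time()
        result = sess.run(prediction, feed_dict=feed_image(image))
        print("prediction took %.4f s" % (time.time() - start))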

@MartinThoma (Contributor, Author)

The loading time of the model is not measured; measurement only starts after the tf.Session is started.

This is cheating. It is of no use for a surgeon to say "hey, you know, your image was actually processed in only 0.0001 seconds. It only took a while to start up. And you have to start the service every single time you're using the model. Or start coding yourself." Fixing this issue is not much work, so I will do it.

@MarvinTeichmann (Member) commented Apr 27, 2016

And you have to start the service every single time you're using the model.

The model is not loaded for every single image. It is loaded once, and then all desired testing images are processed. For scientific purposes this is what is desired. For actual usage one would integrate this into a processing pipeline and feed the model with some kind of stream. I do not think that a system daemon is useful in that case. One would rather modify the evaluation module to take a stream instead of a directory. (This is at least what they do at the FZI.) For this purpose, providing the evaluation module is much cleaner, as it is easier to understand how the model is loaded and used.
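
The directory-vs-stream change could be as small as swapping the input iterator; a sketch with hypothetical names (load_image, evaluate):

import os


def images_from_directory(directory):
    # Scientific use: iterate over a folder of test images.
    for name in sorted(os.listdir(directory)):
        yield load_image(os.path.join(directory, name))  # hypothetical loader


def images_from_stream(capture):
    # Deployment use: pull frames from a stream, e.g. cv2.VideoCapture.
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        yield frame

# evaluate(model, images_from_directory("test/")) and
# evaluate(model, images_from_stream(capture)) share the same loop.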

Fixing this issue is not much work, so I will do it.

Sure, if you really want to do that, go ahead. I still think that providing this is not too useful. On top of that, what is the disadvantage of providing a web-based system? I think providing a web service would add extra value to the project, and it seems pretty common to do that to easily show the performance of a model to others, for example.
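
A web demo along those lines could be a very small Flask app; a sketch, assuming Flask and Pillow are available and segment_array is a placeholder for the actual model call:

import io

import numpy as np
from flask import Flask, request, send_file
from PIL import Image

app = Flask(__name__)


@app.route("/segment", methods=["POST"])
def segment():
    # Accept an uploaded image, run the model, return the segmentation.
    image = Image.open(request.files["image"].stream)
    result = segment_array(np.array(image))  # placeholder for the model call
    buf = io.BytesIO()
    Image.fromarray(result).save(buf, "PNG")
    buf.seek(0)
    return send_file(buf, mimetype="image/png")


if __name__ == "__main__":
    app.run()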

@MartinThoma (Contributor, Author) commented Apr 28, 2016

The model is not loaded for every single image. It is loaded once, and then all desired testing images are processed. For scientific purposes this is what is desired. For actual usage one would integrate this into a processing pipeline and feed the model with some kind of stream.

Yes, this is what I am talking about. Batch processing works for the scientific part, but not in production. In production, when you want to apply the network in real time, you only get the images one by one. Currently, TensorVision does not support "staying alive" for multiple images.

I do not think that a system daemon is useful in that case. One would rather modify the evaluation module to take a stream instead of a directory.

Where is the difference? With a system daemon, I would expect the user to write code like this:

#!/usr/bin/env python

"""Python video stream editing."""

import cv2
import numpy as np


def edit_frame(rgb):
    """The neural network magic happens here."""
    # Here we would apply the network to the stream.
    # We can use Pyro to get the stream and edit it,
    # e.g.
    # model = Pyro.core.getProxyForURI("PYROLOC://localhost:7766/tensorvision")
    # return model.segment(rgb)
    #
    # As a stand-in, convert the frame to grayscale:
    gray = np.dot(rgb[..., :3], [0.299, 0.587, 0.114]).astype(np.uint8)
    gray_c = rgb.copy()
    gray_c[..., 0] = gray
    gray_c[..., 1] = gray
    gray_c[..., 2] = gray
    return gray_c


cv2.namedWindow("preview")
vc = cv2.VideoCapture(0)

if vc.isOpened():  # try to get the first frame
    rval, frame = vc.read()
else:
    rval = False

while rval:
    frame = edit_frame(frame)
    cv2.imshow("preview", frame)
    rval, frame = vc.read()
    key = cv2.waitKey(20)
    if key == 27:  # exit on ESC
        break
cv2.destroyWindow("preview")

However, if you have to edit the eval.py code, then the user has to look at TensorVision's code, which is something they should not be forced to do. My solution is a three-liner: import Pyro, connect to the TensorVision daemon, feed it the frame, and receive the result.
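
Roughly (a sketch, assuming Pyro4 and a Pyro name server with a hypothetical "tensorvision" entry):

import Pyro4

model = Pyro4.Proxy("PYRONAME:tensorvision")  # hypothetical name-server entry
segmentation = model.segment(frame)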

@MarvinTeichmann (Member)

I think we are arguing about different points. We have defined the target audience as researchers, scientists, and developers. TV is supposed to help with model development, not with deployment, and I do not believe that TV can do well regarding deployment. If you want to use your trained model in a real-time system, you will most likely want to use the TensorFlow C++ interface to integrate the model directly into the system. The interface allows loading the saved computational graph and weights which were trained with TV. (Btw., this is how the FZI plans to integrate TV in CoCar: training on a PC with Python; loading graph_def and weights using C++ on CoCar.) Using the C++ interface will give much better latency and performance than system calls. System calls are usually avoided in real-time systems; they are considered evil, as they do not give many performance guarantees and are much slower than function calls.
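
For reference, the Python analogue of that load step looks roughly like this (the file name is hypothetical; the C++ interface mirrors the same two steps of reading the GraphDef and importing it):

import tensorflow as tf

# Load a frozen graph definition that was exported after training.
with open("tensorvision_model.pb", "rb") as f:  # hypothetical path
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")
# The graph now contains the trained model, ready for session.run(...).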

On top of that, I believe that model deployment is highly application-specific. Who says that efficient batching is not an option? What about a system with multiple (equivalent) cameras, or a system where the trade-off between latency and throughput is a valid choice? I also believe that it is usually desirable to properly integrate the model into the overall system. I do not see a good way to generalize deployment, and I do not think that this is the primary task of TensorVision.

Yes, this is what I am talking about. Batch processing works for the scientific part, but not in production. In production, when you want to apply the network in real time, you only get the images one by one.

Well, the eval.py does process the images individually, so the time measurement fairly simulates a system that processes images one by one.
