Author: Abhishek Dutta, Version: Jan. 2017
This code documentation is based on VIA-1.0.x version. While there are some major differences between VIA-1.0.x and VIA-2.0.y codebase, this code documentation is still useful to understand the basic architecture and working of the VIA software.
VGG Image Annotator (VIA) application is contained in a single html file with definitions of CSS style and Javascript code blocks.
The VIA application code via.html
has the following structure:
<!DOCTYPE html>
<html lang="en">
<head>
[source code license declaration]
[html meta tags definition]
[css definition]
</head>
<body onload="_via_init()" onresize="_via_update_ui_components()" >
[html content definition]
[javascript definition]
</body>
</html>
The _via_init()
function is the main entry point to the javascript
code. When the browser window is resized, we have to recompute the image
canvas dimensions. Therefore, the function _via_update_ui_components()
is tied to onresize
event which is fired when browser window is
resized.
Nearly 3000 lines of javascript source code is required to support all the functionalities provided by the VIA image annotator. A majority of this javascript is needed to support an interactive user interface. Only a small portion of this javascript code is needed to support the drawing, selection, resizing, deletion, etc. of image regions of varying shapes -- rectangle, circle, ellipse and polygon.
Now we describe how some of the core actions (like loading images, drawing regions, etc) are facilitated by the javascript codebase.
- Description of VIA Project JSON File
- Core Data Structures
- Loading Images
- Displaying Image
- Moving to Next/Previous Images
- Capturing User's Mouse Interactions
- Rendering Regions
- Moving and Resizing Regions
- Updating Attribute Value
- Adding New Attributes
- Download Annotations
- Importing Annotations
- Building Applications using VIA as a Module
- Source Code License
A VIA project is simply a JSON file containing all the details associated with the project. Here is an annotated example of a project JSON file.
{
"_via_settings": { # settings used by the VIA application
"ui": {
"annotation_editor_height": 25,
"annotation_editor_fontsize": 0.8,
"leftsidebar_width": 18,
"image_grid": {
"img_height": 80,
"rshape_fill": "none",
"rshape_fill_opacity": 0.3,
"rshape_stroke": "yellow",
"rshape_stroke_width": 2,
"show_region_shape": true,
"show_image_policy": "all"
},
"image": {
"region_label": "__via_region_id__",
"region_color": "__via_default_region_color__",
"region_label_font": "10px Sans",
"on_image_annotation_editor_placement": "NEAR_REGION"
}
},
"core": {
"buffer_size": 18,
"filepath": {},
"default_filepath": ""
},
"project": {
"name": "via_project_16Feb2021_13h17m"
}
},
"_via_img_metadata": { # stores information about all images and their associated metadata
"adutta_swan.jpg-1": { # each image is indexed using a unique key: FILENAME-FILESIZE
"filename": "adutta_swan.jpg", # image filename
"size": -1, # file size in bytes (-1 indicates unknown)
"regions": [ # an array of all manually defined regions (only 1 region here)
{
"shape_attributes": { # shape of the first region
"name": "rect", # region shape: {rect, polygon, circle, ellipse, point, ...}
"x": 108, # x-coordinate of the top-left point
"y": 123, # y-coordinate of the top-left point
"width": 283, # width of rectangle
"height": 150 # height of rectangle
},
"region_attributes": { # attributes (i.e. metadata) of the first region
"name": "Swan" # "name" is a region attribute and it has a value of "Swan"
}
}
],
"file_attributes": { # attributes associated with the full image
"caption": "Swan in lake Geneve", # "caption" is a file attribute and it has a value of "Swan in ..."
}
},
"wikimedia_death_of_socrates.jpg-1": {
"filename": "wikimedia_death_of_socrates.jpg",
"size": -1,
"regions": [], # this image has no regions (so far)
"file_attributes": { # the "caption" file attribute for this image has a user defined value
"caption": "The Death of Socrates by David",
}
}
},
"_via_attributes": { # attributes that can be attached to image and its regions
"region": { # definition of region attributes (i.e. attributes belonging to an image region)
"name": { # "name" is region attribute which defines the name of the object contained in that region
"type": "text", # attribute type can be {text, dropdown, radio, checkbox, ...}
"description": "Name of the object",
"default_value": "not_defined"
}
},
"file": { # file attributes correspond to the full image (and not image region)
"caption": { # "caption" is a file attribute
"type": "text",
"description": "",
"default_value": ""
}
}
},
"_via_data_format_version": "2.0.10",
"_via_image_id_list": [ # this contains the list of image-id present in the "_via_img_metadata" dictionary
"adutta_swan.jpg-1",
"wikimedia_death_of_socrates.jpg-1"
]
}
"Bad programmers worry about the code. Good programmers worry about data structures and their relationships." Linus Torvalds
The function _via_get_image_id()
generates a unique image_id
for
each image by combining the image filename and image size in bytes. For
example, the file photo.jpg
of size 16454
bytes will
get assigned an image-id photo.jpg16454
.
The annotation data corresponding to each image is stored in the object
_via_img_metadata
indexed by its unqiue image_id
. Each such entry in
_via_img_metadata
is another object of type ImageMetadata()
having
the following properties:
fileref
: a reference to the local file uploaded by userbase64_img_data
: contains either the image URL or image data represented in base64 formatfile_attributes
: a Map() of image file's attributes. For example, image captions can be represented by file attributes as
Map { 'caption': 'a white football flying over a red car' }
regions
: an array ofImageRegion()
objects
Each image can have multiple regions of varying shape and attributes.
Therefore, each array entry in ImageMetadata.regions[]
contains an object of
type ImageRegion()
with the following properties corresponding to each
region (rectangular, circular, polygon, etc) defined in the image:
shape_attributes
: a Map() of attributes defining the shape of the region. For example, a rectangular region has the following shape attributes
Map {'name': 'rect', 'x': '115', 'y': '210', 'width': '100', 'height': '200' }
region_attributes
: a Map() of attributes corresponding to the region. For example, an image region containing a red car can have the following attributes
Map { 'object\_name': 'car', 'object\_color': 'red' }
is_user_selected
: a state variable indicating if this region has been selected by the user
Here is an example of how VIA would store file attribute and two region
annotations for a file photo.jpg
in _via_img_metadata
object:
var img_id = _via_get_image_id('photo.jpg', 16454);
var _via_img_metadata = {};
_via_img_metadata[img_id] = new ImageMetadata('', 'photo.jpg', 16454);
_via_img_metadata[img_id].file_attributes.set('caption', 'a white football flying over a red car');
_via_img_metadata[img_id].regions[0] = new ImageRegion();
_via_img_metadata[img_id].regions[0].shape_attributes.set('name', 'rect');
_via_img_metadata[img_id].regions[0].shape_attributes.set('x', '115');
_via_img_metadata[img_id].regions[0].shape_attributes.set('y', '210');
_via_img_metadata[img_id].regions[0].shape_attributes.set('width', '100');
_via_img_metadata[img_id].regions[0].shape_attributes.set('height', '200');
_via_img_metadata[img_id].regions[0].region_attributes.set('object_name', 'car');
_via_img_metadata[img_id].regions[0].region_attributes.set('object_color', 'red');
_via_img_metadata[img_id].regions[1] = new ImageRegion();
_via_img_metadata[img_id].regions[1].shape_attributes.set('name', 'circle');
_via_img_metadata[img_id].regions[1].shape_attributes.set('cx', '50');
_via_img_metadata[img_id].regions[1].shape_attributes.set('cy', '90');
_via_img_metadata[img_id].regions[1].shape_attributes.set('r', '20');
_via_img_metadata[img_id].regions[1].region_attributes.set('object_name', 'football');
_via_img_metadata[img_id].regions[1].region_attributes.set('object_color', 'white');
For the current image, we keep a copy of all region's coordinates in the
canvas space inside the object _via_canvas_regions
. Recall that the
canvas space is related to the original image space by
_via_canvas_scale
(which is a scaling factor determined by the current
browser window size and zoom level). In other words,
x_image_space = x_canvas_space * _via_canvas_scale.
Maintaining such a data structure avoids unnecessary re-computation of region
coordinates in canvas space. Therefore, you will notice that
_via_canvas_regions[i].shape_attributes
is used when rendering region
boundaries.
VIA uses the canvas
to render image, region boundaries and region labels. The image currently
being displayed is rendered by _via_img_canvas
while the region
boundaries and region labels are rendered by the canvas
_via_reg_canvas
. These two canvas are overlaid with _via_reg_canvas
being on the top.
The Set()
of image region attributes is stored in the variable _via_region_attributes
.
These attributes form the keys of the map
_via_img_metadata[img_id].regions[i].region_attributes
.
A set of variables are used to maintain the state of the VIA application. These
state variable form a crucial component of user interactions. For
example, _via_is_user_drawing_region
is true
when the user is drawing a region.
Loading (or adding) an image into VIA is initiated by
sel_local_images()
. Therefore, this function is attached to the
onclick
event of the menu entry Load or Add Images. The
function sel_local_images()
invokes local file
selector
called invisible_file_input
which is configured to triggers the
function store_local_img_ref()
when the user finishes selecting local
files. The function store_local_img_ref()
performs the following
tasks:
- Computes the
img_id
using the function_via_get_image_id()
. - Inserts an object of type
ImageMetadata()
in_via_img_metadata[img_id]
. The_via_img_metadata[img_id].fileref
property of this object containts a reference to the local file selected by the user. - Triggers
show_image()
(further discussed in Displaying Image section) to display the newly loaded image.
The VIA application displays one of the pre-loaded images using
show_image()
. This function uses
FileReader()
to load a local image using the file reference stored in
_via_img_metadata[img_id].fileref
(see Loading
Images section).
When image loading is complete, the image is scaled by
_via_canvas_scale
to fit the canvas display area in browser window. A
reference to current
Image()
is stored in _via_current_image
. The 2D drawing context of the canvas
_via_image_canvas
is stored in _via_img_ctx
. Rendering of the
_via_current_image
is handled by drawImage()
method of this 2D
context.
The base64_img_data
property of object ImageMetadata()
can store:
(a) raw image data represented in base64
format, (b) URL of the image. The base64_img_data
property is used
when the images to be annotated are hosted in a publicly accessible
server or the image data is embedded in the VIA application code. If
_via_img_metadata[img_id].base64_img_data
is available, the image is
loaded from this resource, otherwise the image is loaded from
_via_img_metadata[img_id].fileref
.
The methods move_to_next_image()
and move_to_prev_image()
handle the
user requests to switch display to next or previous image. This boils
down to invoking show_image()
(see Displaying Image
section) with appropriate image_index
.
The following event listeners attached to via_reg_canvas
handles user
interactions using a mouse:
-
_via_reg_canvas.addEventListener('dblclick', function(e) { ... }
: A double click on an image region displays the region attribute panel at the bottom to allow the user to add or update region attribute values. -
_via_reg_canvas.addEventListener('mousedown', function(e) { ... }
: Handles user interactions involving the following mouse gesture: mouse cursor is dragged while pressing the mouse button. This corresponds to actions such as drawing a region boundary (except polygon), resizing or moving regions. Furthermore, amousedown
may also indicate a prelude to other events like a region select/unselect. -
_via_reg_canvas.addEventListener('mouseup', function(e) { ... }
:- The
mouseup
event may indicate that the user has finished drawing a region, moving a region, resizing a region, etc. - A combination of
mousedown
andmouseup
events within a small region indicates a single mouse click which indicates the user's intention to select/unselect a region, define a vertex of polygon, or define a point.
- The
-
_via_reg_canvas.addEventListener('mouseover, function(e) { ... }
: Forces re-rendering of region boundaries and labels. -
_via_reg_canvas.addEventListener('mousemove', function(e) { ... }
:- When user is drawing, moving or resizing a region, a
mousemove
event renders the region at new positions as the mouse cursor moves to give interactive feedback to the user. - When mouse cursor is moved over the region edge, this methods changes the mouse cursor style to indicate region resize mode.
- In the polygon region shape mode, this function draws a temporary edge from last defined polygon vertex to current user position.
- When user is drawing, moving or resizing a region, a
Each region draw, resize, move or select/unselect triggers re-rendering
of region boundaries and labels using _via_redraw_reg_canvas()
(see
Rendering Regions section).
_via_redraw_img_canvas()
renders images onto the canvas
_via_img_canvas
using drawImage().
Image is re-rendered only when the user zoom's in/out.
Rendering of region boundaries is performed by _via_redraw_reg_canvas
.
For example, rectangular and circular regions are drawn using the 2D
context _via_reg_ctx
as follows:
function _via_draw_rect(x, y, w, h) {
_via_reg_ctx.beginPath();
_via_reg_ctx.moveTo(x , y);
_via_reg_ctx.lineTo(x+w, y);
_via_reg_ctx.lineTo(x+w, y+h);
_via_reg_ctx.lineTo(x , y+h);
_via_reg_ctx.closePath();
}
function _via_draw_circle(cx, cy, r) {
_via_reg_ctx.beginPath();
_via_reg_ctx.arc(cx, cy, r, 0, 2*Math.PI, false);
_via_reg_ctx.closePath();
}
A region has to be selected before it can be moved or resized. A single
click inside a region sets the state variable
_via_is_region_selected = true;
as follows:
_via_reg_canvas.addEventListener('mousedown', function(e) {
_via_click_x0 = e.offsetX; _via_click_y0 = e.offsetY;
...
}
_via_reg_canvas.addEventListener('mouseup', function(e) {
_via_click_x1 = e.offsetX; _via_click_y1 = e.offsetY;
var click_dx = Math.abs(_via_click_x1 - _via_click_x0);
var click_dy = Math.abs(_via_click_y1 - _via_click_y0);
...
// denotes a single click (= mouse down + mouse up)
if ( click_dx < VIA_MOUSE_CLICK_TOL ||
click_dy < VIA_MOUSE_CLICK_TOL ) {
...
var region_id = is_inside_region(_via_click_x0, _via_click_y0);
if ( region_id >= 0 ) {
// first click selects region
_via_user_sel_region_id = region_id;
_via_is_region_selected = true;
...
}
}
}
Recall that a click event is detected by checking if the mouse cursor
position during the mousedown
and mouseup
events are
close to each other. Furthermore, the function is_inside_region()
checks if the mouse cursor position is inside a pre-defined region.
Once a region has been selected, it can be moved around by clicking the
mouse button, dragging the cursor around and finally releasing the mouse
click when desired new location is reached. This mouse gesture is
captured by mousedown
, mousemove
and mouseup
events. First, the
mousedown
event sets the state variable _via_is_user_moving_region = true;
as follows:
_via_reg_canvas.addEventListener('mousedown', function(e) {
_via_click_x0 = e.offsetX; _via_click_y0 = e.offsetY;
_via_region_edge = is_on_region_corner(_via_click_x0, _via_click_y0);
var region_id = is_inside_region(_via_click_x0, _via_click_y0);
if ( _via_is_region_selected ) {
// check if user clicked on the region boundary
if ( _via_region_edge[1] > 0 ) {
...
} else {
var yes = is_inside_this_region(_via_click_x0,
_via_click_y0,
_via_user_sel_region_id);
if (yes) {
if( !_via_is_user_moving_region ) {
_via_is_user_moving_region = true;
_via_region_click_x = _via_click_x0;
_via_region_click_y = _via_click_y0;
}
}
...
}
}
Next, the mousemove
event draws intermediate regions -- to aid
with visualization -- as the user moves the mouse cursor towards the
final destination as shown by the code snippet below:
_via_reg_canvas.addEventListener('mousemove', function(e) {
_via_current_x = e.offsetX; _via_current_y = e.offsetY;
...
if ( _via_is_user_moving_region ) {
_via_redraw_reg_canvas();
var move_x = (_via_current_x - _via_region_click_x);
var move_y = (_via_current_y - _via_region_click_y);
var attr = _via_canvas_regions[_via_user_sel_region_id].shape_attributes;
switch (attr.get('name')) {
case VIA_REGION_SHAPE.RECT:
_via_draw_rect_region(attr.get('x') + move_x,
attr.get('y') + move_y,
attr.get('width'),
attr.get('height'),
true);
break;
case VIA_REGION_SHAPE.CIRCLE:
...
case VIA_REGION_SHAPE.POLYGON:
...
case VIA_REGION_SHAPE.POINT:
...
}
_via_reg_canvas.focus();
}
...
}
Finally, the mouseup
event moves the selected region to a new
location as follows:
_via_reg_canvas.addEventListener('mouseup', function(e) {
_via_click_x1 = e.offsetX; _via_click_y1 = e.offsetY;
var click_dx = Math.abs(_via_click_x1 - _via_click_x0);
var click_dy = Math.abs(_via_click_y1 - _via_click_y0);
// indicates that user has finished moving a region
if ( _via_is_user_moving_region ) {
_via_is_user_moving_region = false;
_via_reg_canvas.style.cursor = 'default';
var move_x = Math.round(_via_click_x1 - _via_region_click_x);
var move_y = Math.round(_via_click_y1 - _via_region_click_y);
if (Math.abs(move_x) > VIA_MOUSE_CLICK_TOL ||
Math.abs(move_y) > VIA_MOUSE_CLICK_TOL) {
var image_attr = _via_img_metadata[_via_image_id].regions[_via_user_sel_region_id].shape_attributes;
var canvas_attr = _via_canvas_regions[_via_user_sel_region_id].shape_attributes;
switch( canvas_attr.get('name') ) {
case VIA_REGION_SHAPE.RECT:
var xnew = image_attr.get('x') + Math.round(move_x * _via_canvas_scale);
var ynew = image_attr.get('y') + Math.round(move_y * _via_canvas_scale);
image_attr.set('x', xnew);
image_attr.set('y', ynew);
var canvas_xnew = canvas_attr.get('x') + move_x;
var canvas_ynew = canvas_attr.get('y') + move_y;
canvas_attr.set('x', canvas_xnew);
canvas_attr.set('y', canvas_ynew);
break;
case VIA_REGION_SHAPE.CIRCLE:
case VIA_REGION_SHAPE.ELLIPSE:
case VIA_REGION_SHAPE.POINT:
...
case VIA_REGION_SHAPE.POLYGON:
...
}
} else {
// indicates a user click on an already selected region
// this could indicate a user's intention to select another
// nested region within this region
...
}
_via_redraw_reg_canvas();
...
return;
}
...
}
Some notes:
_via_canvas_regions
contains the region coordinates in the canvas space and therefore needs scaling by_via_canvas_scale
to convert it to the original image space coordinates stored in_via_img_metadata
(see Core Data Structures section).- Moving circular, elliptical and point regions is conceptually similar -- move their center coordinates -- and hence are handled by a single code.
- The same user mouse interactions occur for nested regions -- one smaller region placed inside a larger region. Therefore, before moving regions, we check if the movement of user cursor is beyond certain tolerance. If it is below that level, we consider this as a gesture to select other nested region.
The update of region attributes is triggered by the function
toggle_reg_attr_panel()
toggles the region attribute panel in the
bottom of the browser window. The spreashsheet like input environment
for each region (along the rows) and each attribute (along the columns)
is generated by update_region_attributes_input_panel()
. In a similar
way, the update of file attributes are handled by
toggle_file_attr_panel()
and update_file_attributes_input_panel()
.
The spreadsheet like editing environment is setup and handled by the
function init_spreadsheet_input()
.
This is handled by the function add_new_attribute()
.
This action is initiated by the function download_all_region_data()
.
The conversion from _via_img_metadata
object to a user requested
format (CSV or
JSON) is done by the function
pack_via_metadata()
while the function save_data_to_local_file()
triggers the browser action to download this file to local disk.
By default, the CSV export uses comma "," as the separating character.
Importing existing annotation data (in CSV or JSON format) to VIA
application is initiated by the function sel_local_data_file()
which
allows the user to select a local file. Once a local file has been
selected, the function import_annotations_from_file()
is triggered to
import annotation and insert the valid ones into the _via_img_metadata
object.
Note: the CSV file containing annotation data should have comma "," as the separating character.
At the end of application initialization, VIA application invokes the function
_via_load_submodules()
if it is defined in the Javascript global namespace.
This behaviour can be used to build a lot of interesting tools that rely on VIA
for the core functionality of image annotation. See the following for examples:
- via_demo.js : VIA application packaged together with some images and their annotations for demonstration of VIA features.
- DMIAT : Distributed Manual Image
Annotation Tool (DMIAT) is built on top of VIA and isolates the image annotators
from the technical details of loading images and saving/sending annotations.
- images to be annotated are predefined in the form of http-image-url
- the annotations are automatically pushed to git repository
VIA is an open source project actively maintained by the Visual Geometry Group (VGG). Its source code is a distributed under the BSD-2 clause license.
Copyright (c) 2016-2017, Abhishek Dutta.
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.