The Flex Backend is experimental. Everything in here is subject to change.
The "Flex" backend, as the name suggests, allows for a more flexible configuration that tells osm2pgsql what OSM data to store in your database and exactly where and how. It is configured through a Lua file which
- defines the structure of the output tables and
- defines functions to map the OSM data to the database data format
See also the example config files in the flex-config
directory which contain
lots of comments to get you started.
All configuration is done through the osm2pgsql
object in Lua. It has the
following fields:
osm2pgsql.version
: The version of osm2pgsql as a string.osm2pgsql.srid
: The SRID set on the command line (with-l|--latlong
,-m|--merc
, or-E|--proj
).osm2pgsql.mode
: Either"create"
or"append"
depending on the command line options (--create
or-a|--append
).osm2pgsql.stage
: Either1
or2
(1st/2nd stage processing of the data). See below.
The following functions are defined:
osm2pgsql.define_node_table(name, columns[, options])
: Define a node table.osm2pgsql.define_way_table(name, columns[, options])
: Define a way table.osm2pgsql.define_relation_table(name, columns[, options])
: Define a relation table.osm2pgsql.define_area_table(name, columns[, options])
: Define an area table.osm2pgsql.define_table(options)
: Define a table. This is the more flexible function behind all the otherdefine_*_table()
functions. It gives you more control than the more convenient other functions.
You are expected to define one or more of the following functions:
osm2pgsql.process_node()
: Called for each new or changed node.osm2pgsql.process_way()
: Called for each new or changed way.osm2pgsql.process_relation()
: Called for each new or changed relation.osm2pgsql.select_relation_members()
: Called for each deleted or added relation. See below for more details.
Osm2pgsql also provides some additional functions in the [lua-lib.md](Lua helper library).
You have to define one or more tables where your data should end up. This
is done with the osm2pgsql.define_table()
function or one of the slightly
more convenient functions osm2pgsql.define_(node|way|relation|area)_table()
.
osm2pgsql.define_table(OPTIONS)
osm2pgsql.define_(node|way|relation|area)_table(NAME, COLUMNS[, OPTIONS])
Here NAME is the name of the table, COLUMNS is a list of Lua tables describing
the columns as documented below. OPTIONS is a Lua table with options for
the table as a whole. When using the define_table()
command, the NAME and
COLUMNS are specified as options "name" and "columns", respectively.
Each table is either a node table, way table, relation table, or area table. This means that the data for that table comes primarily from a node, way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object id will be stored in the table so that later updates to those OSM objects (or deletions) will be properly reflected in the tables. Area tables are special, they can contain data derived from ways or from relations. Way ids will be stored as is, relation ids will be stored as negative numbers.
With the osm2pgsql.define_table()
function you can also define tables that
- don't have any ids, but those tables will never be updated by osm2pgsql
- take any OSM object, in this case the type of object is stored in an
additional
char(1)
column. - are in a specific PostgresSQL tablespace (set option
data_tablespace
) or that get their indexes created in a specific tablespace (set optionindex_tablespace
). - are in a specific schema (set option
schema
). Note that the schema has to be created before you start osm2pgsql.
If you are using the osm2pgsql.define_(node|way|relation|area)_table()
convenience functions, osm2pgsql will automatically create an id column named
(node|way|relation|area)_id
, respectively. If you want more control over
the id column(s), use the osm2pgsql.define_table()
function.
Most tables will have a geometry column. (Currently only zero or one geometry columns are supported.) The types of the geometry column possible depend on the type of the input data. For node tables you are pretty much restricted to point geometries, but there is a variety of options for relation tables for instance.
The supported geometry types are:
point
: Point geometry, usually created from nodes.linestring
: Linestring geometry, usually created from ways.polygon
: Polygon geometry for area tables, created from ways or relations.multipoint
: Currently not used.multilinestring
: Created from (possibly split up) ways or relations.multipolygon
: For area tables, created from ways or relations.geometry
: Any kind of geometry. Also used for area tables that should hold both polygon and multipolygon geometries.
A column of type area
will be filled automatically with the area of the
geometry. This will only work for (multi)polygons.
In addition to id and geometry columns, each table can have any number of
"normal" columns using any type supported by PostgreSQL. Some types are
specially recognized by osm2pgsql: text
, boolean
, int2
(smallint
),
int4
(int
, integer
), int8
(bigint
), real
, hstore
, and
direction
. See the "Type conversion" section for details.
Instead of the above types you can use any SQL type you want. If you do that
you have to supply the PostgreSQL string representation for that type when
adding data to such columns (or Lua nil to set the column to NULL
).
In the table definitions the columns are specified as a list of Lua tables with the following keys:
column
: The name of the PostgreSQL column (required).type
: The type of the column as described above (required).not_null = true
: Make this aNOT NULL
column. (Optional, defaultfalse
.)create_only = true
: Add the column to theCREATE TABLE
command, but do not try to fill this column when adding data. This can be useful forSERIAL
columns or when you want to fill in the column later yourself. (Optional, defaultfalse
.)
All the osm2pgsql.define*table()
functions return a database table object.
You can call the following functions on it:
name()
: The name of the table as specified in the define function.schema()
: The schema of the table as specified in the define function.columns()
: The columns of the table as specified in the define function.add_row()
: Add a row to the database table. See below for details.
You are expected to define one or more of the following functions:
osm2pgsql.process_node(object)
: Called for each node.osm2pgsql.process_way(object)
: Called for each way.osm2pgsql.process_relation(object)
: Called for each relation.
They all have a single argument of type table (here called object
) and no
return value. If you are not interested in all object types, you do not have
to supply all the functions.
These functions are called for each new or modified OSM object in the input file. No function is called for deleted objects, osm2pgsql will automatically delete all data in your database tables that derived from deleted objects. Modifications are handled as deletions followed by creation of a "new" object, for which the functions are called.
The parameter table (object
) has the following fields:
id
: The id of the node, way, or relation.tags
: A table with all the tags of the object.version
,timestamp
,changeset
,uid
, anduser
: Attributes of the OSM object. These are only available if the-x|--extra-attributes
option is used and the OSM input file actually contains those fields. Thetimestamp
contains the time in seconds since the epoch (midnight 1970-01-01).grab_tag(KEY)
: Return the tag value of the specified key and remove the tag from the list of tags. (Example:local name = object:grab_tag('name')
) This is often used when you want to store some tags in special columns and the rest of the tags in an hstore column.get_bbox()
: Get the bounding box of the current node or way. This function returns four result values: the lot/lat values for the bottom left corner of the bounding box, followed by the lon/lat values of the top right corner. Both lon/lat values are identical in case of nodes. Example:lon, lat, dummy, dummy = object.get_bbox()
(This function doesn't work for relations currently.)
Ways have the following additional fields:
is_closed
: A boolean telling you whether the way geometry is closed, ie the first and last node are the same.nodes
: An array with the way node ids.
Relations have the following additional field:
members
: An array with member tables. Each member table has the fieldstype
(valuesn
,w
, orr
),ref
(member id) androle
.
You can do anything in those processing functions to decide what to do with
this data. If you are not interested in that OSM object, simply return from the
function. If you want to add the OSM object to some table call the add_row()
function on that table:
-- definition of the table:
table_pois = osm2pgsql.define_node_table('pois', {
{ column = 'tags', type = 'hstore' },
{ column = 'name', type = 'text' },
{ column = 'geom', type = 'point' },
})
...
function osm2pgsql.process_node(object)
...
table_pois:add_row({
tags = object.tags,
name = object.tags.name,
geom = { create = 'point' }
})
...
end
The add_row()
function takes a single table parameter, that describes what to
fill into all the database columns. Any column not mentioned will be set to
NULL
.
The geometry column in somewhat special. You have to define a geometry transformation that will be used to transform the OSM object data into a geometry that fits into the geometry column. See the next section for details.
Note that you can't set the object id, this will be handled for you behind the scenes.
Currently these geometry transformations are supported:
{ create = 'point'}
. Only valid for nodes, create a 'point' geometry.{ create = 'line'}
. For ways or relations. Create a 'linestring' or 'multilinestring' geometry.{ create = 'area'}
For ways or relations. Create a 'polygon' or 'multipolygon' geometry.
Some of these transformations can have parameters:
- The
line
transformation has an optional parametersplit_at
. If this is set to anything other than 0, linestrings longer than this value will be split up into parts no longer than this value. - The
area
transformation has an optional parametermulti
. If this is set tofalse
(the default), a multipolygon geometry will be split up into several polygons. If this is set totrue
, the multipolygon geometry is kept as one. It depends on this parameter whether you need a polygon or multipolygon geometry column.
If no geometry transformation is set, osm2pgsql will, in some cases, assume a default transformation. These are the defaults:
- For node tables, a
point
column gets the node location. - For way tables, a
linestring
column gets the complete way geometry, apolygon
column gets the way geometry as area (if the way is closed and the area is valid).
When processing OSM data, osm2pgsql reads the input file(s) in order, nodes
first, then ways, then relations. This means that when the ways are read and
processed, osm2pgsql can't know yet whether a way is in a relation (or in
several). But for some use cases we need to know in which relations a way is
and what the tags of these relations are or the roles of those member ways.
The typical case are relations of type route
(bus routes etc.) where we
might want to render the name
or ref
from the route relation onto the
way geometry.
The osm2pgsql flex backend supports this use case by adding an additional
"reprocessing" step. Osm2pgsql will call the Lua function
osm2pgsql.select_relation_members()
for each added, modified, or deleted
relation. Your job is to figure out which way members in that relation might
need the information from the relation to be rendered correctly and return
those ids in a Lua table with the only field 'ways'. This is usually done with
a function like this:
function osm2pgsql.select_relation_members(relation)
if relation.tags.type == 'route' then
return { ways = osm2pgsql.way_member_ids(relation) }
end
end
Instead of using the helper function osm2pgsql.way_member_ids()
which
returns the ids of all way members, you can write your own code, for instance
if you want to check the roles.
Note that select_relation_members()
is called for deleted relations and for
the old version of a modified relation as well as for new relations and the
new version of a modified relation. This is needed, for instance, to correctly
mark member ways of deleted relations, because they need to be updated, too.
The decision whether a way is to be marked or not can only be based on the
tags of the relation and/or the roles of the members. If you take other
information into account, updates might not work correctly.
In addition you have to store whatever information you need about the relation
in your process_relation()
function in a global variable.
After all relations are processed, osm2pgsql will reprocess all marked ways by
calling the process_way()
function for them again. This time around you have
the information from the relation in the global variable and can use it.
If you don't mark any ways, nothing will be done in this reprocessing stage.
(It is currently not possible to mark nodes or relations. This might or might not be added in future versions of osm2pgsql.)
You can look at osm2pgsql.stage
to see in which stage you are.
You want to do all the processing you can in stage 1, because it is faster and there is less memory overhead. For most use cases, stage 1 is enough.
Processing in two stages can add quite a bit of overhead. Because this feature is new, there isn't much operational experience with it. So be a bit careful when you are experimenting and watch memory and disk space consumption and any extra time you are using. Keep in mind that:
- All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.
- Keeping track of way ids marked in stage 1 needs some memory.
- To do the extra processing in stage 2, time is needed to get objects out of the object store and reprocess them.
- Osm2pgsql will create an id index on all way tables to look up ways that need to be deleted and re-created in stage 2.
Use the command line option -O flex
or --output=flex
to enable the flex
backend and the -S|--style
option to set the Lua config file.
The following command line options have a somewhat different meaning when using the flex backend:
-p|--prefix
: The table names you are setting in your Lua config files will not get this prefix. You can easily add the prefix in the Lua config yourself.-S|--style
: Use this to specify the Lua config file. Without it, osm2pgsql will not work, because it will try to read the default style file.
The following command line options are ignored by osm2pgsl
when using the
flex backend, because they don't make sense in that context:
-k|--hstore
-j|--hstore-all
-z|--hstore-column
--hstore-match-only
--hstore-add-index
-K|--keep-coastlines
(Coastline tags are not handled specially in the flex backend.)--tag-transform-script
(Set the Lua config file with the-S|--style
option.)-G|--multi-geometry
(Use themulti
option on the geometry transformation instead, see the "Geometry transformation" section above for details.)- The command line options to set the tablespace (
-i|--tablespace-index
,--tablespace-main-data
,--tablespace-main-index
) are ignored by the flex backend, instead use thedata_tablespace
orindex_tablespace
options when defining your table.
The add_row()
command will try its best to convert Lua values into
corresponding PostgreSQL values. But not all conversions make sense. Here
are the detailed rules:
- Lua values of type
function
,userdata
, orthread
will always result in an error. - The Lua type
nil
is always converted toNULL
. - If the result of a conversion is
NULL
and the column is defined asNOT NULL
, an error is thrown. - The Lua type
table
is converted to the PostgreSQL typehstore
if and only if all keys and values in the table are string values. A Luatable
can not be converted to any other PostgreSQL type. - For
boolean
columns: The number0
is converted tofalse
, all other numbers aretrue
. Strings are converted as follows:"yes"
,"true"
,"1"
aretrue
;"no"
,"false"
,"0"
arefalse
, all others areNULL
. - For integer columns (
int2
,int4
,int8
): Booleantrue
is converted to1
,false
to0
. Numbers that are not integers or outside the range of the type result inNULL
. Strings are converted to integers if possible otherwise the result isNULL
. - For
real
columns: Booleans result in an error, all numbers are used as is, strings are converted to a number, if that is not possible the result isNULL
. - For
direction
columns (stored asint2
in the database): Booleantrue
is converted to1
,false
to0
. The number0
results in0
, all positive numbers in1
, all negative numbers in-1
. Strings"yes"
and"1"
will result in1
,"no"
and"0"
in0
,"-1"
in-1
. All other strings will result inNULL
. - For text columns and any other not specially recognized column types, booleans result in an error and numbers are converted to strings.
If you want any other conversions, you have to do them yourself in your Lua code. Osm2pgsql provides some helper functions for other conversions, see the (lua-lib Documentation)[lua-lib.md].