Algorithms for managing and processing GIS data are developed on the
assumption that the geometry of the entities meets certain specifications. When
these algorithms are given data that does not respect these specifications,
the software can simply crash or, worse, the operation can succeed with no
apparent problem while the result is wrong. The subject is complicated because
there is almost no documentation about what each software package actually does. And if you
think that a “Repair geometries” command will remedy
this problem, you are far from reality …
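The "succeeds but gives a wrong result" case can be illustrated with a minimal sketch in plain Python. It is not taken from any GIS package: the function name `shoelace_area` and the two sample rings are assumptions made for the illustration. The usual shoelace area formula is applied to a valid square and to a self-intersecting "bow-tie" ring; no error is raised for the second one, but the returned area is 0 even though the shape visibly covers two triangles of area 1 each.

```python
def shoelace_area(ring):
    """Signed area of a closed ring given as a list of (x, y) tuples.

    Positive for counter-clockwise rings, negative for clockwise ones.
    The formula silently assumes the ring does not cross itself.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


# Hypothetical sample data: a valid square and a self-intersecting bow-tie.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
bowtie = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]

print(shoelace_area(square))  # 4.0
print(shoelace_area(bowtie))  # 0.0: no crash, but the two lobes cancel out
```

Nothing in the second call signals that the input was invalid, which is exactly why geometries should be checked before they are processed.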
The main source of invalid geometries is, basically, the ESRI shapefile.
This quite old format (dating from the early 1990s) was not designed to incorporate the
topology constraints of a GIS. For example, two contiguous polygons can
overlap without any error being raised, their shared boundary can be duplicated, there may be
gaps between the two boundaries, and so on.
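As a rough illustration of what the absence of topology means, the sketch below stores two neighbouring polygons the way a shapefile does: as independent rings with no link between them. The helper names (`edge_key`, `shared_edges`) and the coordinates are assumptions made for this example, and it only compares vertices for exact equality, whereas real topology tools work with a tolerance.

```python
from collections import Counter


def edge_key(p, q):
    """Direction-independent key for the segment p-q."""
    return (p, q) if p <= q else (q, p)


def shared_edges(rings):
    """Edges used by exactly two rings, i.e. boundaries that truly coincide."""
    counts = Counter(
        edge_key(p, q) for ring in rings for p, q in zip(ring, ring[1:])
    )
    return sorted(edge for edge, n in counts.items() if n == 2)


# Hypothetical data: two squares meant to be contiguous along x = 1.
left = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
right_clean = [(1, 0), (1, 1), (2, 1), (2, 0), (1, 0)]
# Same neighbour, but digitised separately: its edge sits at x = 1.1.
right_gap = [(1.1, 0), (1.1, 1), (2, 1), (2, 0), (1.1, 0)]

print(shared_edges([left, right_clean]))  # [((1, 0), (1, 1))]
print(shared_edges([left, right_gap]))    # []
```

Both pairs of rings can be stored in a shapefile equally well; nothing in the format records that the second pair leaves a gap between the two polygons.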
Unfortunately, it has become a standard format for data exchange between
different GIS software packages, and even though all software publishers, as well as the
open source projects, strive to offer alternative, powerful and much safer
storage formats (ESRI geodatabase, PostGIS, Spatialite …), users most
often opt for the ease of the shapefile.
Even for those who opt for these database formats, the problem is solved only
for data created directly in these formats. When shapefiles
are loaded into ESRI geodatabases (personal or file), into a PostGIS
database, into Spatialite, etc., geometries are copied as they are, with all their
existing geometry problems. The same care must be taken with
any other format into which these data are imported.
The essential work involves two steps, each carried out by a GIS tool:
- analysing the geometry to detect anomalies, usually in the form of a “Check geometries” tool
- correcting the detected defects, generally in the form of a “Repair geometries” tool
The Check Geometry tool generates a report of all entities with
geometry problems in the layers provided. To solve these problems,
the geometry repair tool then performs the corrections automatically. This
seems magical, but it’s a lot more complicated…
Although there are different definitions of a polygon, most current GIS
software uses the definition stated by the Open Geospatial Consortium (OGC) and
the International Organization for Standardization (ISO), and provides validation
functions to ensure compliance with this definition. There are small
variations between the different implementations, but the
validation of a two-dimensional polygon can be considered a solved problem
at the theoretical level. Having a common definition, as well as validation tools,
gives GIS users the possibility to exchange data sets and to run
spatial analysis operations on these data (a valid input is a prerequisite
for most operations).
However, if a polygon does not comply with the definition, it has to be fixed.
Most validation tools give the user a list of errors and their
locations, but the user must fix these defects manually. This is a very
tedious and time-consuming task.
Hence the obvious temptation to use the automatic geometry repair tools.
In this series of articles, we will discuss how the main GIS software packages
detect and correct invalid geometries. Let’s say right away that if you expect a tool
that is 100% effective, you will be disappointed. But it is better to know what the
possible shortcomings are than to pretend there are none.
The first problem we face in tackling this topic is the almost total lack of
software documentation. We will therefore take a layer of polygons
containing anomalies and process it with the different software packages.
We will use a layer of Italian municipalities provided by ISTAT, the
Italian Statistical Institute. This is the layer used in the page on
validating Spatialite geometries: SQL functions based on
liblwgeom support in version 4.0.0.
You can download this layer with the following link:
Geometry validation with ArcGIS
Let’s define the setup:
- we use ArcGIS 10.3 in English
- the command used is Toolbox -> Data Management Tools -> Features -> Check Geometry
- the layer to be tested is the com2011.shp layer that we have downloaded
Once the command is executed, the table with the list of invalid records is loaded in
ArcMap.
You will find that the command has not found any invalid geometry.
If we search the ArcGIS help, the only description of the work done by the
Check Geometry command is on the page http://desktop.arcgis.com/en/desktop/latest/tools/data-management-toolbox/check-geometry.htm
According to that page, the problems detected are:
- Short segment: some segments are shorter than allowed by the units of the spatial reference system associated with the geometry.
- Null geometry: the entity does not have any geometry, or has nothing in the SHAPE field.
- Incorrect ring ordering: the polygon is simple from a topological point of view, but its rings may not be oriented correctly (outer rings: clockwise, inner rings: counter-clockwise).
- Invalid segment orientation: individual segments are not consistently oriented. The end point of segment i must coincide with the start point of segment i + 1.
- Self-intersections: a polygon must not intersect itself.
- Unclosed rings: the end point of the last segment in a ring must match the start point of the first segment.
- Empty parts: the geometry has several parts and one of them is empty (has no geometry).
- Duplicate vertex: the geometry has two or more vertices with identical coordinates.
- Mismatched attributes: the Z or M coordinate at the end point of a line segment does not match the Z or M coordinate of the coincident vertex of the next segment.
- Discontinuous parts: one of the parts of the geometry is composed of disconnected or discontinuous parts.
- Empty Z values: the geometry has one or more vertices with an empty Z value (NaN, for example).
- Bad envelope: the envelope does not correspond to the extent of the coordinates of the geometry.
- Bad dataset extent: the extent of the data set does not contain all entities.
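To make a few of these checks more concrete, here is a minimal sketch in plain Python covering four of them for a single ring: unclosed rings, duplicate vertices, self-intersections and ring ordering. It is not how ArcGIS implements Check Geometry, which is not documented; all function names are hypothetical. The duplicate test only looks at consecutive vertices, and the intersection test only detects segments that properly cross, not rings that merely touch themselves.

```python
def signed_area(ring):
    """Shoelace formula: negative for clockwise rings, positive otherwise."""
    return sum(x1 * y2 - x2 * y1
               for (x1, y1), (x2, y2) in zip(ring, ring[1:])) / 2.0


def is_closed(ring):
    """A ring needs at least 4 points and identical first and last points."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def duplicate_vertices(ring):
    """Indices of vertices identical to the vertex just before them."""
    return [i for i in range(1, len(ring)) if ring[i] == ring[i - 1]]


def _orient(p, q, r):
    """Sign of the turn p -> q -> r: 1 left, -1 right, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def self_intersects(ring):
    """Brute-force test of every pair of non-adjacent segments."""
    segs = list(zip(ring, ring[1:]))
    return any(segments_cross(*segs[i], *segs[j])
               for i in range(len(segs))
               for j in range(i + 2, len(segs)))


def check_ring(ring, exterior=True):
    """Return a list of problem names, using the labels of the list above."""
    if not is_closed(ring):
        return ["unclosed ring"]  # the other checks assume a closed ring
    problems = []
    if duplicate_vertices(ring):
        problems.append("duplicate vertex")
    if self_intersects(ring):
        problems.append("self-intersection")
    elif (signed_area(ring) < 0) != exterior:
        # Convention quoted above: outer rings clockwise, inner rings not.
        problems.append("incorrect ring ordering")
    return problems


print(check_ring([(0, 0), (0, 2), (2, 2), (2, 0), (0, 0)]))  # []
print(check_ring([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]))  # ['self-intersection']
```

The ring ordering test is skipped when a self-intersection is found, because the signed area of a ring that crosses itself says nothing reliable about its orientation.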
Satisfied with our test, we will now load this shapefile into a Spatialite database.
Note that this would be exactly the same with a PostGIS database, since the SQL
validation tools are exactly the same. In the next article, we will use
Spatialite because you do not need to install PostgreSQL or anything special if
you already have ArcGIS or QGIS.