Data quality and errors in GIS

### Introduction

Wrong: the assumption that computer / GIS data and output must be without error (child's drawing analogy; rotten GIS product).

Errors occur in GIS data more frequently than in traditional map products: GIS data are often generated from maps, and each conversion step can only add error.

Some like to imagine the stages at which error enters as reality is converted to our GIS display:

Real world -> our conception of it -> measurement -> analysis -> display

Error = the difference between different observers or between measuring instruments

Accuracy = the difference between reality and our representation of reality

Uncertainty = our representations are incomplete measurements of reality

We could consider the 'errors' that can occur during the four components of GIS:

a. Input

b. Database management

c. Analysis

d. Output

## 1. Types of Errors (spatial or attribute):

### a. Positional accuracy

The general rule is to be within the best possible map resolution or about the width of a line = 0.5mm. Hence at a scale of 1:50,000, error should be no more than 25 metres on the ground; at 1:250,000 error must be <125 metres.
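
The rule of thumb above is simple arithmetic: the ground distance covered by a 0.5 mm line grows in proportion to the scale denominator. A minimal sketch follows; the function name `ground_error_m` is illustrative, not part of any GIS package.

```python
def ground_error_m(scale_denominator: int, line_width_mm: float = 0.5) -> float:
    """Ground distance in metres covered by a map line of the given width."""
    # Multiply first, then convert mm to m, to keep the arithmetic exact
    # for the round numbers used here.
    return line_width_mm * scale_denominator / 1000.0


for denom in (50_000, 250_000):
    print(f"1:{denom:,} -> {ground_error_m(denom):g} m")
```

At 1:50,000 this gives 25 m and at 1:250,000 it gives 125 m, matching the figures quoted above.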

Positional accuracy can be measured as Root Mean Square Error (RMSE or 'RMS'), a measure of the average distance between the true and estimated locations. For each point, the errors (e) in x and y are squared and summed, giving the squared distance between the true and the calculated location; RMS is the square root of the mean of these squared distances over all points.
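
A minimal sketch of the RMS calculation, assuming the common horizontal (radial) definition: the squared x and y errors are summed per point, averaged over all check points, and square-rooted. The function name and sample coordinates are made up for illustration.

```python
import math


def positional_rmse(true_pts, est_pts):
    """Horizontal RMSE between paired (x, y) true and estimated locations."""
    if not true_pts or len(true_pts) != len(est_pts):
        raise ValueError("need two non-empty point lists of equal length")
    squared = [
        (xe - xt) ** 2 + (ye - yt) ** 2  # squared distance for one point
        for (xt, yt), (xe, ye) in zip(true_pts, est_pts)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical check points (metres): each estimate is off by 3 m in x and 4 m in y.
true_pts = [(0.0, 0.0), (10.0, 10.0)]
est_pts = [(3.0, 4.0), (13.0, 14.0)]
print(positional_rmse(true_pts, est_pts))
```

Each point here is 5 m from its true location, so the RMSE is also 5 m.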

### b. Attribute accuracy

Classification and measurement may be rated in terms of % correct, with a standard of e.g. 80%. The figure may not be much higher because some features, such as vegetation boundaries, are inherently uncertain (unless man-modified). On a map and in a database, items such as forest types are grouped and placed within a boundary, which may itself be uncertain (UNBC example).

For both spatial and attribute data, digital systems enable high precision (many decimal places), but this is no guarantee of greater accuracy. Example (Alaska Highway): high precision (number of classes), but inaccurate.
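
The distinction can be illustrated with two recorded values for the same coordinate. This is a sketch with invented numbers: `describe` is a hypothetical helper that reports precision as the number of decimal places and accuracy as the distance from an assumed true value.

```python
def describe(recorded: str, true_value: float):
    """Return (decimal places recorded, absolute error) for one reading."""
    decimals = len(recorded.partition(".")[2])   # precision: digits after the point
    error = abs(float(recorded) - true_value)    # accuracy: distance from the truth
    return decimals, error


TRUE_EASTING_M = 512_340.0  # hypothetical "true" easting in metres

for label, recorded in [
    ("precise but inaccurate", "512371.482915"),
    ("coarse but accurate", "512340"),
]:
    decimals, error = describe(recorded, TRUE_EASTING_M)
    print(f"{label}: {decimals} decimal places, error {error:.1f} m")
```

The first reading carries six decimal places yet is about 31 m off; the second has none and is on target.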

## 2. Sources / Causes: errors before GIS

#### a. Instrument inaccuracies:

• satellite / air photo / (GPS) / surveying    (spatial)
• similarly for (attribute) measuring instruments

#### b. Human processing:

• misinterpretation (e.g. of photos), both spatial and attribute
• effects of scale change and generalization
• effects of classification: nominal (categorical) / ordinal / interval
• different groups and jurisdictions using variable classifications

#### c. Actual changes: (out of date)

• Gradual 'natural' changes: river courses, glacier recession
• Catastrophic change: fires, floods, landslides
• Seasonal and daily changes: lake / sea / river levels
• Hence the need to document date and collection methods in metadata (= 'information or data about data')

## 3. GIS processing errors

#### a. Input:

• Digitizing: human error and the width of a line
• Dangling nodes (connected to only one arc): permissible in arc themes (river headwaters, cul-de-sacs)
• Pseudo-nodes (connecting two arcs, or a single arc that closes on itself): permissible in island arcs, and where attributes change, e.g. where a road changes from dirt to paved or vice versa.
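
Both node types can be found by counting how many arc ends meet at each node. The sketch below assumes arcs are stored as from-node / to-node pairs; the arc and node names are invented for illustration, and real GIS packages use their own structures.

```python
from collections import Counter


def classify_nodes(arcs):
    """Return (dangling, pseudo) node lists from {arc_id: (from_node, to_node)}."""
    ends = Counter()
    for start, end in arcs.values():
        ends[start] += 1
        ends[end] += 1  # a closed (island) arc adds two ends to the same node
    dangling = sorted(node for node, count in ends.items() if count == 1)
    pseudo = sorted(node for node, count in ends.items() if count == 2)
    return dangling, pseudo


# Hypothetical network: a river with one tributary, a road that changes
# surface at n6, and an island outline that closes on itself at n8.
arcs = {
    "river_upper": ("n1", "n2"),
    "tributary": ("n3", "n2"),
    "river_lower": ("n2", "n4"),
    "road_dirt": ("n5", "n6"),
    "road_paved": ("n6", "n7"),
    "island": ("n8", "n8"),
}

dangling, pseudo = classify_nodes(arcs)
print("dangling:", dangling)
print("pseudo:", pseudo)
```

Whether each flagged node is an error or a legitimate feature (a headwater, a surface change, an island) still has to be judged by the operator.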

Sidebar on Topology: Topology, the spatial relationships between geographic features, is needed for GIS analysis. It is not to be confused with topography, the form of the land.

Topology has three fundamental components:

a. Connectivity:
Arcs are connected to others (at nodes). This identifies possible routes and networks, such as rivers and roads, via the lists of arcs and nodes in the database.

b.  Containment:
An enclosed polygon has a measurable area; lists of arcs define boundaries and closed areas.

c.  Contiguity:
The adjacency of polygons can be determined by shared arcs.

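Polygon topology (area):

| Polygon | Arcs |
|---|---|
| A | a1, a2, a3 |
| B | a2, a5, a6 |
| C | a3, a4, a5 |
| D | a1, a4, a6 |

Node topology (connectivity):

| Node | Arcs |
|---|---|
| 1 | a1, a2, a6 |
| 2 | a2, a3, a5 |
| 3 | a1, a3, a4 |
| 4 | a4, a5, a6 |

Arc topology (contiguity):

| Arc | Left & right polygons |
|---|---|
| a1 | A, D |
| a2 | A, B |
| a3 | A, C |
| a4 | C, D |
| a5 | B, C |
| a6 | B, D |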

These are fundamental to GIS analysis and queries, for example:

a. From point A, how can I get to point B using the city road system?
b. What is the area of the combined areas of all residential housing?
c. Which residential areas are next to city parks?
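
Query (c) can be answered from arc topology alone, with no coordinate geometry. The sketch below uses the left and right polygons listed for arcs a1 to a6 above; the land-use labels attached to polygons A to D are hypothetical and added only for illustration.

```python
from collections import defaultdict

# Arc -> (left polygon, right polygon), as listed in the arc topology above.
ARC_POLYGONS = {
    "a1": ("A", "D"), "a2": ("A", "B"), "a3": ("A", "C"),
    "a4": ("C", "D"), "a5": ("B", "C"), "a6": ("B", "D"),
}

# Hypothetical attributes, not part of the original table.
LAND_USE = {"A": "residential", "B": "park", "C": "residential", "D": "commercial"}


def adjacency(arc_polygons):
    """Polygons sharing an arc are neighbours (contiguity)."""
    neighbours = defaultdict(set)
    for left, right in arc_polygons.values():
        if left != right:
            neighbours[left].add(right)
            neighbours[right].add(left)
    return {poly: sorted(adj) for poly, adj in neighbours.items()}


def residential_next_to_parks(arc_polygons, land_use):
    """Residential polygons with at least one park among their neighbours."""
    adj = adjacency(arc_polygons)
    return sorted(
        poly for poly, use in land_use.items()
        if use == "residential"
        and any(land_use[n] == "park" for n in adj.get(poly, []))
    )


print(adjacency(ARC_POLYGONS))
print(residential_next_to_parks(ARC_POLYGONS, LAND_USE))
```

With spaghetti data the same question would need a geometric comparison of every pair of boundaries, which is one reason topology is worth the extra effort.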

GIS vector data can be acquired with topology (topological data) or without topology - 'spaghetti' data  (see below)

Spaghetti versus Topological data

### 'Simple' spaghetti data

Vector data created without topology are referred to as 'spaghetti' data, for reasons you can imagine (strings of unconnected lines). Such data are easier to create, but if they are to be used in a GIS, one pays for the lack of topology later: a case of "more haste, less speed". Individual features may appear the same, but:
• Arcs may not join, and polygons may not close to form areas.
• Intersections may not have nodes where two arcs cross.
• Adjacent digitized polygons may overlap, or underlap and leave empty slivers.
• Arcs may consist of many broken segments.

### 'Complex' topological data

Creating topologically correct data takes longer, but enables GIS queries and analysis.
• Points: treated as polygons of zero area and length.
• Lines (arcs): start and end at nodes.
• Polygons: given by sets of connected arcs and an interior label point.

Shared polygon arcs result in:

• A lower total number of arcs in a dataset.
• No overlap wedges or slivers between adjacent polygons.
• Cleaner map output (more evident when you zoom in or magnify).

#### b. Data manipulation:

• Interpolation of point data into lines and surfaces e.g. TIN and contours.
• Overlay of layers digitized separately from different sources or at different scales, e.g. soils and vegetation. They are likely to share borders, but slight differences cause 'slivers'.
• The compounding effects of processing and analysis of multiple layers: for example, if two layers are each 90% correct, the accuracy of the resulting overlay is around 81% (0.9 × 0.9).
• Inappropriate or inadequate inputs for models (local example)
• Inappropriate layers, e.g. future developments included in present datasets
• Dubious classifications
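
The 81% figure for overlaying two 90% layers comes from multiplying the individual accuracies. A minimal sketch, under the simplifying assumption that errors in the layers are independent of one another (real layers may violate this); the function name is illustrative.

```python
from math import prod


def overlay_accuracy(layer_accuracies):
    """Expected proportion correct after overlaying layers with independent errors."""
    if not all(0.0 <= a <= 1.0 for a in layer_accuracies):
        raise ValueError("accuracies must be proportions between 0 and 1")
    return prod(layer_accuracies)


print(f"{overlay_accuracy([0.9, 0.9]):.0%}")       # two layers
print(f"{overlay_accuracy([0.9, 0.9, 0.9]):.0%}")  # three layers
```

Adding a third 90% layer brings the combined figure down to about 73%, which shows how quickly accuracy erodes as an analysis stacks more layers.
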

#### c. Data output:

• Colour palettes: intended colours don't match between screen and printer
• Scale changes: level of detail (generalisation)
• Beware of using software design defaults
• Scale bars: round numbers, logically subdivided, in logical units
• Legend items and design ('Zorro lines', 'rotten classes')
• Examples from this week's output lab

Further readings: GIS Primer; the Geographer's Craft

## 4. Review

Things you should consider after finishing this lecture:
1. Computer data have as many or more errors than printed maps
2. The difference between accuracy and precision
3. The effects of scale and generalization
4. Lack of documentation - the need for metadata
5. Age and date of GIS data (relative to rate of change)
6. Effect of area jurisdictions, e.g. provincial and federal differences
7. The challenge of a large province and country

http://www.gis.unbc.ca/courses/geog300/lectures/lect15/index.php