Learning Geospatial Analysis with Python

Geographic Information System concepts

In order to begin geospatial analysis, it is important to understand some key underlying concepts unique to the field. The list isn't long but nearly every aspect of analysis traces back to one of these ideas.

Thematic maps

A thematic map, as its name suggests, portrays a specific theme. A general reference map visually represents features as they relate geographically for navigation or planning. A thematic map goes beyond location to provide the geographic context for information around a central idea. Usually, a thematic map is designed for a targeted audience to answer specific questions. Much of the value of a thematic map lies in what it does not show. It uses minimal geographic features to avoid distracting the reader from the theme. Most thematic maps include political boundaries, such as country or state borders, but omit navigational features, such as street names or points of interest, beyond the major landmarks that orient the reader. The cholera map earlier in this chapter is a perfect example of a thematic map. Common uses for thematic maps include visualizing health issues such as disease, election results, and environmental phenomena such as rainfall. These maps are also the most common output of geospatial analysis. The following map from the US Census Bureau shows cancer mortality rates by state:

Thematic maps tell a story and are very useful. However, it is important to remember that while thematic maps are models of reality like any other map, they are also generalizations of information. Two different analysts using the same source information will often come up with very different thematic maps depending on how they analyze and summarize the data. The technical nature of thematic maps often leads people to treat them as if they were scientific evidence. But geospatial analysis is never conclusive. While the analysis may be based on scientific data, the analyst does not follow the rigor of the scientific method. In his classic book How to Lie with Maps, Mark Monmonier demonstrates in great detail how maps are easily manipulated models of reality that are commonly abused. This fact doesn't degrade the value of these tools. The legendary statistician George Box wrote in the 1987 book Empirical Model-Building and Response Surfaces, coauthored with Norman Draper, "Essentially, all models are wrong, but some are useful." Thematic maps have been used as guides to start (and end) wars, stop deadly disease in its tracks, win elections, feed nations, fight poverty, protect endangered species, and rescue those impacted by disaster. Thematic maps may be the most useful models ever created.

Spatial databases

In its purest form, a database is simply an organized collection of information. A database management system (DBMS) is a suite of software that lets users and programs interact with a database. People often use the word "database" as a catch-all term referring to both the DBMS and the underlying data structure. Databases typically contain alphanumeric data and, in some cases, binary large objects, or blobs, which can store binary data such as images. Most databases also support a relational structure in which entries in normalized tables reference each other to create many-to-one and one-to-many relationships among data.

Spatial databases use specialized software to extend a traditional relational DBMS, or RDBMS, to store and query data defined in two-dimensional or three-dimensional space. Some systems also account for a series of data over time. In a spatial database, attributes about geographic features are stored and queried as traditional relational database structures. The spatial extensions allow you to query geometries using Structured Query Language (SQL) in a similar way to traditional database queries. Spatial queries and attribute queries can also be combined to select results based on both location and attributes.
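To make the idea of combining a spatial query with an attribute query concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only a stand-in: plain SQLite has no geometry types, so the location is stored as two numeric columns and the "spatial" part is a simple bounding-box test. A real spatial extension would store geometries and provide dedicated spatial functions instead. The cities table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (name TEXT, population INTEGER, lon REAL, lat REAL)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [
        ("Alpha", 500000, -90.1, 30.0),
        ("Bravo", 20000, -89.9, 30.2),
        ("Charlie", 750000, -75.0, 40.0),
    ],
)


def cities_in_box(conn, min_lon, min_lat, max_lon, max_lat, min_population):
    """Combine a location filter (bounding box) with an attribute filter."""
    sql = (
        "SELECT name FROM cities "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "AND population >= ? ORDER BY name"
    )
    params = (min_lon, max_lon, min_lat, max_lat, min_population)
    return [row[0] for row in conn.execute(sql, params)]


# Cities inside the box that also have at least 100,000 residents.
result = cities_in_box(conn, -91.0, 29.0, -89.0, 31.0, 100000)
print(result)  # ['Alpha']
```

Bravo falls inside the box but fails the population filter, and Charlie passes the population filter but lies outside the box, so only Alpha satisfies both conditions.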

Spatial indexing

Spatial indexing is a process that organizes geospatial vector data for faster retrieval. It is a way of prefiltering the data for common queries or rendering. Indexing is commonly used in large databases to speed up query responses. Spatial data is no different. Even a moderately sized geodatabase can contain millions of points or objects. If you perform a spatial query without an index, the system must consider every point in the database in order to include it in or eliminate it from the results. Spatial indexing groups data in ways that allow large portions of the data set to be eliminated from consideration by doing computationally simpler checks before going into detailed and slower analysis of the remaining items.
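The prefiltering idea can be sketched with a toy grid index. This is an illustrative assumption rather than the method any particular database uses; production systems typically rely on more sophisticated structures. The GridIndex class and its method names are hypothetical. Points are bucketed into square cells, and a bounding-box query looks only at the cells that overlap the box before running the exact coordinate check.

```python
from collections import defaultdict


class GridIndex:
    """A toy spatial index that buckets points into square grid cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Floor division maps a coordinate to the cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, x, y):
        self.cells[self._cell(x, y)].append((x, y))

    def query(self, min_x, min_y, max_x, max_y):
        """Return points inside the box, visiting only overlapping cells."""
        first = self._cell(min_x, min_y)
        last = self._cell(max_x, max_y)
        found = []
        for cx in range(first[0], last[0] + 1):
            for cy in range(first[1], last[1] + 1):
                # Cheap step: whole cells outside the box are never visited.
                for x, y in self.cells.get((cx, cy), []):
                    # Exact step: test the few remaining candidates.
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        found.append((x, y))
        return sorted(found)


index = GridIndex(cell_size=10.0)
for point in [(1, 1), (5, 5), (15, 15), (95, 95), (-3, 4)]:
    index.insert(*point)

hits = index.query(0, 0, 12, 12)
print(hits)  # [(1, 1), (5, 5)]
```

The point at (95, 95) is never examined because its cell does not overlap the query box, which is the saving the index is meant to provide. The point at (15, 15) shares a visited cell, so it is examined and then rejected by the exact check.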

Metadata

Metadata is defined as data about data. Accordingly, geospatial metadata is data about geospatial data sets that provides traceability for the source and history of a data set as well as summary technical details. Metadata also supports the long-term preservation of information holdings. Geospatial metadata can be represented by several possible standards. One of the most prominent is the international standard ISO 19115-1, which includes hundreds of potential fields to describe a single geospatial data set. Example fields include spatial representation, temporal extent, and lineage. The primary use of metadata is cataloging data sets. Modern metadata can be ingested by geographic search engines, which makes the data set discoverable by other systems, potentially automatically. Metadata also lists points of contact for a data set in case you have questions. It is an important support tool for geospatial analysts and adds credibility and accessibility to your work.

Map projections

Map projections can be a challenge for new analysts. If you take any three-dimensional object and flatten it onto a plane, such as your screen or a sheet of paper, the object is distorted. Many grade school geography classes demonstrated this concept by having students peel an orange and then attempt to lay the peel flat on their desk to understand the resulting distortion. The same effect occurs when you take the round shape of the earth and project it onto a computer screen.

In geospatial analysis, you can manipulate this distortion to preserve selected properties, such as area, scale, bearing, distance, or shape. There is no one-size-fits-all solution to map projections. The choice of projection is always a compromise that gains accuracy in one property in exchange for error in another. Projections are typically represented as a set of over 40 parameters, expressed either as XML or in a text format called Well-Known Text (WKT), that define the transformation algorithm.

The International Association of Oil and Gas Producers maintains a registry of most known projections. The registry was originally created by the European Petroleum Survey Group (EPSG), which later became part of the association, and the entries in the registry are still known as EPSG codes. The EPSG maintained the registry as a common benefit for the oil and gas industry, which is a prolific user of geospatial analysis for energy exploration. At last count, that registry contained over 5,000 entries.

As recently as 10 years ago, map projections were a primary concern for a geospatial analyst. Data storage was expensive, high-speed Internet was rare, and cloud computing didn't really exist. Geospatial data was typically exchanged among small groups working in separate areas of interest. The technology constraints at the time meant geospatial analysis was highly localized. Analysts would use the best projection for their area of interest. Data in different projections cannot be displayed correctly on the same map because the coordinates are based on different models of the earth. Any time an analyst received data from a third party, it had to be reprojected before it could be used with existing data. This process was tedious and time-consuming. Most geospatial data formats of the time did not provide a way to store the projection information. That information was stored in an ancillary file, usually as text or XML. Because analysts didn't exchange data often, many people wouldn't bother defining projection information. Every analyst's nightmare was to come across an extremely valuable data set that was missing its projection information. The omission rendered the data useless. The coordinates in a file are just numbers and offer no clue to the projection. With over 5,000 choices, it was nearly impossible to guess.

But now, thanks to modern software and the Internet making data exchange easier and more common, nearly every data format has added a metadata format that defines the projection, or stores the projection in the file header if the format supports it. Advances in technology have also allowed for global basemaps, which have encouraged the use of shared projections such as the Google Mercator projection (also known as Web Mercator) used for Google Maps. Geospatial portal projects like OpenStreetMap.org and NationalAtlas.gov have consolidated data sets for much of the world in common projections. Modern geospatial software can also reproject data on the fly, saving the analyst the trouble of pre-processing the data before using it.
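As a rough illustration of what a projection does, the following sketch applies the spherical Mercator formulas commonly used by web maps to convert longitude and latitude in degrees to planar coordinates in meters. It is a simplified sketch, not a substitute for a projection library: it assumes a sphere with the radius of the WGS 84 semi-major axis and is only meaningful for latitudes away from the poles. The function names are hypothetical. The second function shows the trade-off described earlier: Mercator preserves shape locally, but stretches scale by 1/cos(latitude).

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, in meters


def to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to spherical Mercator meters."""
    lon_rad = math.radians(lon)
    lat_rad = math.radians(lat)
    x = EARTH_RADIUS_M * lon_rad
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat_rad / 2))
    return x, y


def scale_factor(lat):
    """Linear scale distortion of the Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat))


x, y = to_web_mercator(180.0, 0.0)
print(round(x, 2))                    # 20037508.34
print(round(scale_factor(60.0), 6))   # 2.0
```

At 60 degrees of latitude, distances on the map are drawn about twice as long as they are on the ground, which is why high-latitude regions look oversized on web maps.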

Rendering

The exciting part of geospatial analysis is visualization. Because geospatial analysis is a computer-based process, it is good to be aware of how geographic data appears on a computer screen.

Geographic data, including points, lines, and polygons, is stored numerically as one or more points, which come in (x,y) pairs or (x,y,z) tuples. The x represents the horizontal axis on a graph. The y represents the vertical axis. The z typically represents terrain elevation. In computer graphics, a computer screen is represented by an x and a y axis. A z axis is not used because the computer screen is treated as a two-dimensional plane by most graphics software APIs.
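In Python, this storage model can be sketched with plain tuples and lists. The coordinates below are hypothetical sample values, and the helper names are illustrative rather than part of any library.

```python
# Hypothetical sample coordinates, for illustration only.
point = (-90.07, 29.95)                                      # one (x, y) pair
line = [(-90.10, 29.90), (-90.05, 29.95), (-90.00, 30.00)]   # ordered points
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # closed ring
point_3d = (-90.07, 29.95, 2.0)                              # (x, y, z) with elevation


def drop_z(coords):
    """Keep only x and y so the geometry can be drawn on a 2D screen."""
    return [(c[0], c[1]) for c in coords]


def bounding_box(coords):
    """Return (min_x, min_y, max_x, max_y) for a sequence of coordinates."""
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


print(drop_z([point_3d]))      # [(-90.07, 29.95)]
print(bounding_box(polygon))   # (0.0, 0.0, 4.0, 3.0)
```

A polygon here is simply a line whose first and last points are identical, which is a common convention for closing the ring.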

Another important factor is screen coordinates versus world coordinates. Geographic data is stored in a coordinate system representing a grid overlaid on the earth, which is three-dimensional and round. Screen coordinates, also known as pixel coordinates, represent a grid of pixels on a flat, two-dimensional computer screen. Mapping x and y world coordinates to pixel coordinates is fairly straightforward and involves a simple scaling algorithm. However, if a z coordinate exists then a more complicated transform must be performed to map coordinates from 3D space to a 2D plane. These transformations can be computationally costly and therefore slow if not handled correctly.
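The simple scaling for x and y can be sketched as follows. This is a minimal example under stated assumptions: the world extent is given as a bounding box, the image origin is the top-left corner with y increasing downward (as in most graphics APIs), and the function name and parameters are hypothetical.

```python
def world_to_screen(x, y, bbox, width, height):
    """Scale a world coordinate into pixel coordinates.

    bbox is (min_x, min_y, max_x, max_y) in world units.
    width and height are the image size in pixels.
    """
    min_x, min_y, max_x, max_y = bbox
    # Fraction of the way across the extent, from 0.0 to 1.0.
    x_ratio = (x - min_x) / (max_x - min_x)
    # Screen y grows downward, so measure from the top of the extent.
    y_ratio = (max_y - y) / (max_y - min_y)
    px = round(x_ratio * (width - 1))
    py = round(y_ratio * (height - 1))
    return px, py


world = (-180.0, -90.0, 180.0, 90.0)  # whole-earth extent in degrees
print(world_to_screen(0.0, 0.0, world, 361, 181))      # (180, 90)
print(world_to_screen(-180.0, 90.0, world, 361, 181))  # (0, 0)
```

Multiplying by width - 1 and height - 1 keeps the far edges of the extent on the last valid pixel rather than one pixel past it.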

In the case of remote sensing data, the challenge is typically file size. Even a moderately sized satellite image, compressed, can be tens, if not hundreds, of megabytes. Images can be compressed using lossless or lossy methods. Lossless methods reduce file size by encoding the data more efficiently, without discarding any of it. Lossy compression algorithms reduce file size by discarding some of the data in the image while avoiding a significant change in its appearance. Rendering an image on the screen can be computationally intensive. Most remote sensing file formats allow for storing multiple lower-resolution versions of the image, called overviews or pyramids, for the sole purpose of faster rendering at different scales. When you zoom out to a scale at which the detail of the full-resolution image would not be visible anyway, a pre-processed, lower-resolution version of the image is displayed quickly and seamlessly.
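The idea behind overviews can be sketched in a few lines of pure Python. This is an illustrative assumption about one simple way to build them, by averaging each 2x2 block of pixels to halve the resolution at every level. Real formats and libraries offer several resampling methods. The image is a hypothetical single-band grid stored as a list of rows, and the function names are made up for this example.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows = len(image) // 2
    cols = len(image[0]) // 2
    result = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            total = (
                image[2 * r][2 * c] + image[2 * r][2 * c + 1]
                + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]
            )
            new_row.append(total // 4)  # integer average of the block
        result.append(new_row)
    return result


def build_pyramid(image):
    """Return the full-resolution image followed by each smaller overview."""
    levels = [image]
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels


# Hypothetical 4x4 single-band image.
band = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 12, 12],
    [4, 4, 12, 12],
]
pyramid = build_pyramid(band)
print(pyramid[1])  # [[0, 8], [4, 12]]
print(pyramid[2])  # [[6]]
```

A viewer would pick the level whose resolution is closest to the current zoom, so a zoomed-out view reads only a small fraction of the pixels in the original image.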