Spatial Data Transformations for GEMS Exchange
To put data onto the GEMS Grid, it is often necessary to transform the original data. The specific transformations needed depend on the nature of the original dataset. Readers who work with spatial data are likely already familiar with these processes. This document has two aims: to give a general overview of the processes for those unfamiliar with them, and to provide a key to the terms used in the GEMS metadata for specific data products.
Projection transformations, datum transformations
Most of the datasets in GEMS Exchange were not originally in the EASE-Grid 2.0 projection, which is the specification the GEMS Grid itself uses. At a minimum, when the source data share the same model of the Earth (WGS84) as EASE-Grid 2.0, a projection transformation is required to convert the original spatial locations to locations within the GEMS Grid. The mathematics of these transformations can be quite complicated, but the process itself is relatively straightforward. If the original data are already projected (e.g., in a Cartesian coordinate system), the projection transformation involves transforming the source coordinates to geographic coordinates and then performing a second transformation into EASE-Grid 2.0 coordinates. Because the underlying model of the Earth is the same, a projection transformation in these cases does not change the underlying spatial data. That is, it’s a reversible transformation.
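As a rough illustration of the projection step, the sketch below implements forward and inverse cylindrical equal-area formulas on the WGS84 ellipsoid. It assumes the global EASE-Grid 2.0 definition (standard parallel of 30 degrees, central meridian of 0 degrees), which the text above does not specify. The function names are hypothetical, and the inverse uses simple bisection rather than a series expansion; production work would normally rely on an established projection library.

```python
import math

# WGS84 ellipsoid constants
A = 6378137.0                    # semi-major axis (m)
F = 1 / 298.257223563            # flattening
E2 = F * (2 - F)                 # eccentricity squared
E = math.sqrt(E2)

# Assumption: global EASE-Grid 2.0 (cylindrical equal-area, 30 degree standard parallel)
LAT_TS = math.radians(30.0)
K0 = math.cos(LAT_TS) / math.sqrt(1 - E2 * math.sin(LAT_TS) ** 2)


def _q(phi):
    """Authalic helper function q for geodetic latitude phi (radians)."""
    s = math.sin(phi)
    return (1 - E2) * (
        s / (1 - E2 * s * s)
        - (1 / (2 * E)) * math.log((1 - E * s) / (1 + E * s))
    )


def geographic_to_ease2(lon_deg, lat_deg):
    """Forward projection: WGS84 lon/lat (degrees) to x/y (meters)."""
    x = A * K0 * math.radians(lon_deg)
    y = A * _q(math.radians(lat_deg)) / (2 * K0)
    return x, y


def ease2_to_geographic(x, y):
    """Inverse projection: x/y (meters) back to WGS84 lon/lat (degrees)."""
    lon_deg = math.degrees(x / (A * K0))
    target_q = 2 * K0 * y / A
    # q(phi) increases monotonically with latitude, so bisection is sufficient.
    lo, hi = -math.pi / 2, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if _q(mid) < target_q:
            lo = mid
        else:
            hi = mid
    return lon_deg, math.degrees((lo + hi) / 2)


# Round trip for an arbitrary point: the recovered coordinates match the input
# to within numerical precision, which is the reversibility described above.
x, y = geographic_to_ease2(-93.2, 44.97)
lon, lat = ease2_to_geographic(x, y)
print(x, y, lon, lat)
```

Because both directions use the same WGS84 constants, nothing is lost in the round trip beyond floating-point error.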
When the source data do not share the WGS84 model of the Earth used by EASE-Grid 2.0, the situation is more complicated. If the source data are projected rather than in a geographic coordinate system, the first step is to transform the data from the source coordinates to geographic coordinates using the source’s model of the Earth. GRS80 is an example of an older model describing the Earth. Next, the source datum is transformed to the WGS84 Earth model used in EASE-Grid 2.0. Once the data are in WGS84, they are finally transformed into the EASE-Grid 2.0 projected coordinate system. Note that a datum transformation often changes the underlying spatial data. This arises because of differences between the underlying models and their assumptions regarding the Earth. For example, transforming from a datum that treats the Earth as a sphere to a datum that assumes the Earth is an oblate spheroid (a slightly flattened sphere), or vice versa, can lead to information loss.
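To make the effect of a datum change concrete, the following sketch converts a point from a spherical Earth model to WGS84 by passing through Earth-centered Cartesian (ECEF) coordinates. It is a simplification: it assumes both models share the same origin and axes, whereas real datum transformations usually also apply published shift parameters or correction grids. The sphere radius and the function names are illustrative assumptions, not values taken from any GEMS dataset.

```python
import math

# (semi-major axis in meters, flattening)
WGS84 = (6378137.0, 1 / 298.257223563)
SPHERE = (6371007.181, 0.0)      # hypothetical spherical source datum


def geodetic_to_ecef(lon_deg, lat_deg, h, ellipsoid):
    """Convert lon/lat/height on the given ellipsoid to ECEF x, y, z."""
    a, f = ellipsoid
    e2 = f * (2 - f)
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
    x = (n + h) * math.cos(lat) * math.cos(lon)
    y = (n + h) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - e2) + h) * math.sin(lat)
    return x, y, z


def ecef_to_geodetic(x, y, z, ellipsoid):
    """Convert ECEF x, y, z to lon/lat/height on the given ellipsoid.

    Uses a fixed-point iteration; not intended for points at the poles.
    """
    a, f = ellipsoid
    e2 = f * (2 - f)
    lon = math.atan2(y, x)
    p = math.hypot(x, y)
    lat = math.atan2(z, p * (1 - e2))
    h = 0.0
    for _ in range(10):
        n = a / math.sqrt(1 - e2 * math.sin(lat) ** 2)
        h = p / math.cos(lat) - n
        lat = math.atan2(z, p * (1 - e2 * n / (n + h)))
    return math.degrees(lon), math.degrees(lat), h


# A point at latitude 45 on the sphere, re-expressed on the WGS84 ellipsoid.
xyz = geodetic_to_ecef(10.0, 45.0, 0.0, SPHERE)
lon_wgs, lat_wgs, h_wgs = ecef_to_geodetic(*xyz, WGS84)
print(lon_wgs, lat_wgs, h_wgs)
```

Under these assumptions the longitude is unchanged, but the latitude and height both shift, which is why a datum transformation is said to change the underlying spatial data.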
Descriptor transformations
The process described in the preceding section relates only to the transformation of the spatial component of the datasets. Because GEMS Grid is a discrete and nested hierarchical gridding system, there are a fixed number of levels within the system, and each level of the hierarchy corresponds to a fixed spatial resolution (Table 1). When the original source dataset is at a spatial resolution that does not directly correspond to a GEMS Grid resolution, the analyst must choose how to transform the data descriptors themselves. These choices will vary from dataset to dataset, depending in part on whether the source data are nominal (e.g., classes or categories without ordered relationships), ordinal (e.g., classes or categories with an ordered relationship), or interval or ratio (e.g., continuous values).
A real example of this type of descriptor transformation is the 30 m observations collected by Landsat-8. Because the GEMS Grid has no corresponding 30 m resolution, the analyst must choose whether to represent the observations at the 100 m (Level 4) or the 10 m (Level 5) resolution within the GEMS Grid nested hierarchy. Neither choice is inherently better than the other; it depends on whether the analyst considers the coarser 100 m representation preferable to the finer 10 m one. In a sense, this can be thought of as either aggregating or disaggregating the descriptors spatially. Spatial analysts sometimes describe this process as resampling to a lower or higher spatial resolution.
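The trade-off in the Landsat-8 example can be quantified with a little arithmetic. The sketch below uses the Level 4 and Level 5 cell edges listed in Table 1 and a nominal 30 m Landsat pixel; the helper name is hypothetical.

```python
LANDSAT_EDGE_M = 30.0        # nominal Landsat-8 pixel edge
LEVEL4_EDGE_M = 100.08950    # GEMS Grid Level 4 cell edge (Table 1)
LEVEL5_EDGE_M = 10.00895     # GEMS Grid Level 5 cell edge (Table 1)


def area_ratio(coarse_edge, fine_edge):
    """Number of fine cells that fit, by area, inside one coarse cell."""
    return (coarse_edge / fine_edge) ** 2


# Aggregating: roughly 11 Landsat pixels contribute to each Level 4 cell.
pixels_per_level4_cell = area_ratio(LEVEL4_EDGE_M, LANDSAT_EDGE_M)

# Disaggregating: each Landsat pixel is spread over roughly 9 Level 5 cells.
level5_cells_per_pixel = area_ratio(LANDSAT_EDGE_M, LEVEL5_EDGE_M)

print(round(pixels_per_level4_cell, 2), round(level5_cells_per_pixel, 2))
```

Neither ratio is a whole number, so source pixels will straddle GEMS Grid cell boundaries in both cases.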
Table 1: Dimensionality Details of Levels Within the GEMS Grid
Level | Ref. Ratio | Num. Cells | Cell dimensions (meters) |
---|---|---|---|
0 | 16 | 391,384 | 36,032.22084 x 36,032.22084 |
1 | 9 | 1,565,536 | 9008.05521 x 9008.05521 |
2 | 9 | 4,969,608 | 3002.68507 x 3002.68507 |
3 | 100 | 14,089,824 | 1000.89502 x 1000.89502 |
4 | 100 | 140,898,240 | 100.08950 x 100.08950 |
5 | 100 | 1,408,982,400 | 10.00895 x 10.00895 |
6 | N.D. | 14,089,824,400 | 1.000895 x 1.000895 |
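The nesting of the cell dimensions can be checked with a short calculation. The sketch below assumes that "Ref. Ratio" is the refinement ratio by area, so that each cell edge is divided by the square root of the ratio when moving one level down; the names are hypothetical.

```python
import math

LEVEL0_EDGE_M = 36032.22084
# Refinement ratio applied when moving from level i to level i + 1
REF_RATIOS = [16, 9, 9, 100, 100, 100]


def cell_edges(level0_edge, ratios):
    """Return the cell edge length (m) for each level, starting at Level 0."""
    edges = [level0_edge]
    for ratio in ratios:
        edges.append(edges[-1] / math.sqrt(ratio))
    return edges


edges = cell_edges(LEVEL0_EDGE_M, REF_RATIOS)
for level, edge in enumerate(edges):
    print(f"Level {level}: {edge:.5f} m")
```

A ratio of 16 corresponds to a 4 x 4 subdivision, 9 to 3 x 3, and 100 to 10 x 10.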
Types of descriptor transformations
When transforming spatially explicit data from coarser-to-finer or finer-to-coarser resolutions, a variety of transformation techniques are possible, the choice of which depends on the nature of the data and the purpose of the transformation.
Nearest neighbor.
This is among the simplest and most straightforward descriptor transformations, as it maintains the original values of the descriptor throughout the transformation. A new raster, at either higher or lower spatial resolution, is derived from the original. That raster is then filled with the values of the source pixels nearest to each new pixel. When moving to a coarser spatial resolution, this generally means the source pixel closest to the center of each new, coarser pixel is chosen. When moving to a finer resolution than the source data, nearest neighbor generally copies the value of the parent pixel into all of its higher-resolution child pixels. This descriptor transformation can be used for nominal, ordinal, interval, or ratio data.
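A minimal sketch of nearest-neighbor resampling on a raster stored as a list of rows is given below. The function name is hypothetical, and the sketch assumes the source and target rasters cover exactly the same extent.

```python
def resample_nearest(src, out_rows, out_cols):
    """Fill a new raster with the value of the source cell nearest to each
    target cell center. Works for both finer and coarser targets."""
    in_rows, in_cols = len(src), len(src[0])
    out = []
    for r in range(out_rows):
        # Map the center of target row r into source row coordinates.
        sr = min(int((r + 0.5) * in_rows / out_rows), in_rows - 1)
        row = []
        for c in range(out_cols):
            sc = min(int((c + 0.5) * in_cols / out_cols), in_cols - 1)
            row.append(src[sr][sc])
        out.append(row)
    return out


# Finer target: each parent value is copied into its four children.
finer = resample_nearest([[1, 2], [3, 4]], 4, 4)

# Coarser target: the source cell at the center of the new cell is kept.
coarser = resample_nearest([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1, 1)
print(finer, coarser)
```

No new values are created in either direction, which is why the method is safe for categorical data.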
Majority or most frequent.
This descriptor transformation only makes sense when moving to a coarser spatial resolution, where each cell of the coarser raster contains multiple source cells. The values of all constituent cells are counted, and the value with the highest count (the most frequent value) is kept in the new target raster. This descriptor transformation can be used for nominal, ordinal, interval, or ratio data, but it is more commonly used for categorical (nominal or ordinal) data, since it represents the most frequent categorical value.
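The counting step can be sketched as follows. The function name and the land-cover labels are hypothetical, and the sketch assumes the source dimensions are exact multiples of the aggregation factor. Ties are resolved in favor of the value encountered first, which is an arbitrary choice.

```python
from collections import Counter


def aggregate_majority(src, factor):
    """Aggregate a raster by keeping the most frequent value in each
    factor x factor block of source cells."""
    rows, cols = len(src), len(src[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            block = [
                src[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            ]
            # most_common(1) returns [(value, count)] for the top value.
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out


landcover = [
    ["forest", "forest", "water", "water"],
    ["forest", "crop",   "water", "crop"],
    ["crop",   "crop",   "urban", "urban"],
    ["crop",   "forest", "urban", "water"],
]
majority = aggregate_majority(landcover, 2)
print(majority)
```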
Averaging.
This descriptor transformation is used when transforming from finer to coarser spatial resolutions. The descriptor value of each cell in the new, coarser raster is the average of its constituent finer-resolution source cells. It is also possible to use weighted averaging as a type of descriptor transformation. Weights can be defined spatially, whereby source cells located near the center of the new cell are weighted more heavily than those at its edges. Weighting by category is also possible when moving from finer to coarser spatial resolution, but generally requires additional ancillary data.
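Both the plain and the weighted average can be expressed with one small function. This is a sketch with hypothetical names; it assumes the source dimensions are exact multiples of the factor and that any weights are supplied as a factor x factor grid applied identically to every block.

```python
def aggregate_mean(src, factor, weights=None):
    """Aggregate a raster by averaging each factor x factor block.

    weights: optional factor x factor grid of non-negative weights;
    equal weights are used when it is omitted.
    """
    if weights is None:
        weights = [[1.0] * factor for _ in range(factor)]
    total_weight = sum(sum(w_row) for w_row in weights)
    rows, cols = len(src), len(src[0])
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            weighted_sum = sum(
                src[r0 + i][c0 + j] * weights[i][j]
                for i in range(factor)
                for j in range(factor)
            )
            out_row.append(weighted_sum / total_weight)
        out.append(out_row)
    return out


plain = aggregate_mean([[1, 3], [5, 7]], 2)
weighted = aggregate_mean([[1, 3], [5, 7]], 2, weights=[[1, 1], [1, 5]])
print(plain, weighted)
```

Averaging creates values that may not occur in the source, so it suits interval and ratio data rather than categories.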
Fractional.
This type of transformation is used when moving from coarser-resolution source data to finer spatial resolutions. In this case, the original value of each coarser-resolution cell is divided and apportioned among its constituent cells at the finer spatial resolution. Source descriptor values can be apportioned equally, whereby each finer-resolution cell gets an equal fraction of the original descriptor value (e.g., if each original cell is composed of 9 higher-resolution cells, each higher-resolution cell gets 1/9 of the original, coarse-resolution descriptor value). Weighted fractional transformations are also possible when moving from coarse to finer spatial resolutions. Those types of descriptor transformations tend to require additional ancillary data, assumptions, and modeling.
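The equal-share case can be sketched in a few lines. The function name is hypothetical; the example uses a factor of 3 so that each coarse cell is split into 9 children, matching the 1/9 example above.

```python
def disaggregate_equal(src, factor):
    """Split each coarse cell into factor x factor finer cells, giving each
    child an equal fraction of the parent's value."""
    children_per_parent = factor * factor
    out = []
    for src_row in src:
        fine_row = []
        for value in src_row:
            fine_row.extend([value / children_per_parent] * factor)
        # Each coarse row becomes `factor` identical fine rows.
        for _ in range(factor):
            out.append(list(fine_row))
    return out


coarse = [[9.0, 18.0]]
fine = disaggregate_equal(coarse, 3)
fine_total = sum(sum(row) for row in fine)
print(fine, fine_total)
```

The total over the finer cells equals the total of the source, which makes this approach appropriate for quantities such as counts or totals that should be conserved.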
Information
For more information on the GEMS Grid, see:
Thompson, Jeffery A., Mary J. Brodzik, Kevin A. T. Silverstein, Mason A. Hurley and Nathan L. Carlson. “EASE-DGGS: A Hybrid Discrete Global Grid System for Earth Sciences.” Big Earth Data 3 (30)(2022): 340-357. https://doi.org/10.1080/20964471.2021.2017539