Common data classification methods in GIS

Published: 2023-03-28 02:04:08 UTC


When we classify data in ArcGIS or other GIS software, some of the terms can be confusing. In this post I try to explain the common classification methods in GIS in plain English.

Manual Interval

Manual interval is the easiest to understand: you set the range of each class yourself, and the software does not interfere at all.

For example, for data from 1 to 100, I might set the classes to 1-20, 21-50, 51-54, and 55-100.
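The manual classes from this example can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the choice to store each class by its upper bound are assumptions, not how any particular GIS package represents manual breaks.

```python
from bisect import bisect_left

# Upper bounds of the hand-picked classes 1-20, 21-50, 51-54, 55-100.
MANUAL_BREAKS = [20, 50, 54, 100]

def classify(value, breaks):
    """Return the 0-based index of the first class whose upper bound covers value.

    Values below the first class's lower bound are not checked here.
    """
    if value > breaks[-1]:
        raise ValueError(f"{value} is above the last break {breaks[-1]}")
    return bisect_left(breaks, value)

for v in (1, 20, 21, 52, 100):
    print(v, "-> class", classify(v, MANUAL_BREAKS) + 1)
```

Because the breaks are typed in by hand, nothing stops the classes from being very uneven, as with the narrow 51-54 class here.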

Defined interval

Defined interval: the user defines the interval size (the width of each class), and the system automatically classifies the data based on it. For example, if I define an interval of 10, data from 1 to 100 is divided into 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, and so on up to 91-100.
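As a rough sketch, assuming integer data and a hypothetical helper named `defined_interval_classes`, the classes for a given interval size could be generated like this. How real GIS software aligns the first break may differ.

```python
def defined_interval_classes(lo, hi, width):
    """Return (start, end) integer ranges of the given width covering lo..hi."""
    if width <= 0:
        raise ValueError("width must be positive")
    classes = []
    start = lo
    while start <= hi:
        end = min(start + width - 1, hi)  # the last class may be narrower
        classes.append((start, end))
        start = end + 1
    return classes

print(defined_interval_classes(1, 100, 10))
```

Here the number of classes is a result, not an input: a width of 10 over 1-100 yields ten classes, while a width of 25 would yield four.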


Equal interval

Equal interval: the system automatically divides the range of attribute values into sub-ranges of equal size. For example, if data from 1 to 300 is split into 5 classes, each class spans 60 values and the classification results are 1-60, 61-120, 121-180, 181-240, and 241-300. The numerical spacing (data variation range) of every class in the result is the same.
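A minimal sketch of the same arithmetic, assuming integer data and a hypothetical helper `equal_interval_classes`. For continuous data the breaks would instead fall at lo + i * (hi - lo) / n.

```python
def equal_interval_classes(lo, hi, n_classes):
    """Split the integers lo..hi into n_classes ranges of equal width."""
    count = hi - lo + 1
    if count % n_classes:
        raise ValueError("range does not divide evenly into n_classes")
    width = count // n_classes
    return [(lo + i * width, lo + (i + 1) * width - 1) for i in range(n_classes)]

print(equal_interval_classes(1, 300, 5))
# [(1, 60), (61, 120), (121, 180), (181, 240), (241, 300)]
```

Note that only the range is used. The data values themselves never enter the calculation, so some classes can end up nearly empty when the data is skewed.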


Quantile

Quantile, also known as the equal frequency method, assigns the same number of data values to each class in the classification result.

[Figure: quantile classification]

In the diagram above, a two-dimensional data set is divided into 7 groups with 10 data points in each group. This method is commonly used for data that is evenly distributed across its range, such as population density.
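The equal-count idea can be sketched as follows. The helper name `quantile_classes` and the sample values are made up for illustration, and tied values are not given any special treatment.

```python
def quantile_classes(values, n_classes):
    """Sort values and cut them into n_classes groups of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    groups = []
    for i in range(n_classes):
        start = i * n // n_classes
        end = (i + 1) * n // n_classes
        groups.append(ordered[start:end])
    return groups

data = [x * x for x in range(1, 71)]  # 70 made-up, strongly skewed values
groups = quantile_classes(data, 7)
print([len(g) for g in groups])            # [10, 10, 10, 10, 10, 10, 10]
print([(g[0], g[-1]) for g in groups])     # value range of each class
```

Every class holds 10 values, but the value ranges printed on the last line differ greatly in width. That is the trade-off of this method when the data is not evenly distributed.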


Natural Breaks (Jenks)

Natural Breaks (Jenks) optimizes the grouping of data by using an algorithm that minimizes the differences between data values within the same class and maximizes the differences between classes.

This method seeks to minimize each value's average deviation from its class mean while maximizing each class mean's deviation from the means of the other classes. In other words, it reduces the variance within classes and maximizes the variance between classes. The quality of a classification is measured by the goodness of variance fit (GVF): GVF = (SDAM - SDCM) / SDAM, where SDAM is the sum of squared deviations from the array mean (the mean of the whole data set) and SDCM is the sum of squared deviations from the class means. GVF ranges from 0 (no fit) to 1 (perfect fit).
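To make the variance idea concrete, here is a small sketch that computes GVF in its usual normalized form, (SDAM - SDCM) / SDAM, and finds the best split by brute force. Trying every possible split is an assumption made to keep the example short; it is only practical for tiny data sets, and real implementations use a much faster optimization. The function names are hypothetical.

```python
from itertools import combinations

def sum_sq_dev(values):
    """Sum of squared deviations of values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def gvf(classes):
    """Goodness of variance fit: (SDAM - SDCM) / SDAM.

    Assumes the values are not all identical (SDAM must be non-zero).
    """
    all_values = [v for c in classes for v in c]
    sdam = sum_sq_dev(all_values)               # deviations from the array mean
    sdcm = sum(sum_sq_dev(c) for c in classes)  # deviations from class means
    return (sdam - sdcm) / sdam

def best_breaks(values, n_classes):
    """Try every split of the sorted values and keep the one with the highest GVF."""
    ordered = sorted(values)
    best = None
    for cuts in combinations(range(1, len(ordered)), n_classes - 1):
        bounds = (0, *cuts, len(ordered))
        classes = [ordered[a:b] for a, b in zip(bounds, bounds[1:])]
        score = gvf(classes)
        if best is None or score > best[0]:
            best = (score, classes)
    return best

score, classes = best_breaks([1, 2, 3, 10, 11, 12, 30, 31, 32], 3)
print(classes)  # [[1, 2, 3], [10, 11, 12], [30, 31, 32]]
print(round(score, 4))
```

The breaks land in the gaps between the three clusters, which is what "natural" breaks means here.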


Geometric interval

Geometric interval classifies data values into classes whose widths form a geometric series, typically with gradually increasing intervals, so each class interval is the previous one multiplied by a constant coefficient. The algorithm creates the geometric intervals by minimizing the sum of squares of the number of elements in each class. This ensures that each class contains approximately the same number of values and that the change between intervals is fairly consistent.

This algorithm was designed for continuous data. It is a compromise between equal interval, natural breaks (Jenks), and quantile. It strikes a balance between highlighting changes in the middle values and changes in the extreme values, so the result is both visually appealing and detailed in map content. It is very useful for data that is not normally distributed or whose distribution is extremely skewed.
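The sketch below shows only the geometric-series part of the idea: class widths that grow by a constant factor. It is deliberately simplified. The growth factor `ratio` is chosen by hand here, which is an assumption; the method described above picks the intervals by looking at how many values fall into each class, and that optimization step is not reproduced. The function name is hypothetical.

```python
def geometric_breaks(lo, hi, n_classes, ratio):
    """Upper bounds of n_classes classes whose widths grow by `ratio` each step."""
    if ratio <= 0 or ratio == 1:
        raise ValueError("ratio must be positive and different from 1")
    # The widths w, w*r, w*r^2, ... must add up to hi - lo.
    first_width = (hi - lo) * (ratio - 1) / (ratio ** n_classes - 1)
    breaks = []
    upper = lo
    for i in range(n_classes):
        upper += first_width * ratio ** i
        breaks.append(upper)
    return breaks

print(geometric_breaks(0, 150, 4, 2))  # widths 10, 20, 40, 80
```

With a ratio of 2, the low end of the range gets narrow classes and the high end gets wide ones, which suits data where most values are small and a few are very large.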


Standard deviation

Standard deviation classification shows how far each feature's attribute value lies from the mean. The system automatically calculates the mean and the standard deviation, then creates class breaks at equal value ranges that are a proportion of the standard deviation, typically 1, 1/2, 1/3, or 1/4 of a standard deviation, measured outward from the mean.
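A minimal sketch of how such breaks could be generated. Two choices here are assumptions for illustration: the population standard deviation is used, and a break is placed at the mean itself (some software centers a class on the mean instead). The function name `std_dev_breaks` is hypothetical.

```python
from statistics import mean, pstdev

def std_dev_breaks(values, step=1.0):
    """Breaks at the mean and at mean +/- k*step standard deviations inside the data range."""
    m, sd = mean(values), pstdev(values)
    lo, hi = min(values), max(values)
    breaks = [m]
    k = 1
    # Walk outward from the mean until both sides have left the data range.
    while m - k * step * sd > lo or m + k * step * sd < hi:
        below, above = m - k * step * sd, m + k * step * sd
        if below > lo:
            breaks.append(below)
        if above < hi:
            breaks.append(above)
        k += 1
    return sorted(breaks)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # mean 5, standard deviation 2
print(std_dev_breaks(data))              # breaks every 1 standard deviation
print(std_dev_breaks(data, step=0.5))    # breaks every 1/2 standard deviation
```

A smaller `step` produces more, narrower classes around the mean, which is the effect of choosing 1/2, 1/3, or 1/4 of a standard deviation as the interval.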

