Spatial statistical classification analysis

Published: 2023-08-23 01:42:48 UTC

Multivariate statistical analysis is mainly used for data classification and comprehensive evaluation. Data classification methods are an important part of GIS. Generally speaking, the data stored in a GIS are raw, and users extract and analyze them according to their practical purposes. For observation and sampling data in particular, different classification and interpolation methods can produce very different results. Therefore, in most cases a large amount of unclassified data is first entered into the information system database, and the user then establishes a specific classification algorithm to obtain the required information.

A comprehensive evaluation model is the basis of regionalization and planning. From the perspective of human cognition, such models are of two types: precise and fuzzy. Because most geographical phenomena are difficult to classify and express through precise quantitative relations, the fuzzy model is more practical, and its results are often closer to reality. Comprehensive evaluation generally goes through four steps:

  1. Selection and simplification of evaluation factors;

  2. Determination of multi-factor importance index (weight);

  3. Determination of the membership degree of each category within a factor to the evaluation target;

  4. Selection of an appropriate method for multi-factor integration.

Classification and evaluation problems usually involve a large number of interrelated geographical factors. Principal component analysis can statistically compress the information carried by these influencing factors into a few synthetic factors, which greatly simplifies the model.

Determining factor weights is an important step in building the evaluation model, because the correctness of the weights greatly affects the correctness of the model. Weights are usually determined largely by subjective judgment; the analytic hierarchy process is a simple and effective mathematical means of integrating people's opinions and determining the weight of each influencing factor scientifically.

The membership degree reflects the different influences that each category within a factor has on the evaluation target. It is determined according to how the factor varies and is usually calculated with a piecewise linear function or a higher-order function.

Commonly used classification and synthesis methods fall into two categories: cluster analysis and discriminant analysis. Cluster analysis divides the evaluation area into several categories according to the degree of similarity of the influencing factors between geographical entities, using distance indicators related to weight and membership degree. Discriminant analysis is similar to the classification methods of remote sensing image processing: according to the weight and membership degree of each element and certain evaluation criteria, each geographic entity is assigned to its most likely evaluation level, or to a position in a rank sequence indicated by a data value.

Classification and grading is the last step of evaluation. Clustering results are combined according to the actual situation, and an evaluation grade is determined for each category. For the result sequence of discriminant analysis, the final evaluation grades are delimited using equal or unequal intervals.
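The membership and integration steps can be illustrated with a minimal Python sketch. It assumes a piecewise linear membership function defined by breakpoints and a weighted sum as the integration method; the function names, the slope breakpoints, and the weights are hypothetical values chosen for illustration, not taken from any particular evaluation scheme.

```python
def piecewise_linear_membership(value, breakpoints):
    """Interpolate a membership degree from (factor value, membership) pairs.

    `breakpoints` must be sorted by strictly increasing factor value.
    Values outside the range take the membership of the nearest end point.
    """
    if value <= breakpoints[0][0]:
        return breakpoints[0][1]
    if value >= breakpoints[-1][0]:
        return breakpoints[-1][1]
    for (x0, y0), (x1, y1) in zip(breakpoints, breakpoints[1:]):
        if x0 <= value <= x1:
            return y0 + (y1 - y0) * (value - x0) / (x1 - x0)


def composite_score(memberships, weights):
    """Weighted sum of membership degrees, normalized by the total weight."""
    total = sum(weights)
    return sum(m * w for m, w in zip(memberships, weights)) / total


# Hypothetical factor: slope in degrees, where flatter land is more suitable.
slope_breakpoints = [(0.0, 1.0), (15.0, 0.5), (25.0, 0.0)]
slope_membership = piecewise_linear_membership(7.5, slope_breakpoints)

# Hypothetical second factor with a membership degree of 0.4; weights 0.6 and 0.4.
score = composite_score([slope_membership, 0.4], [0.6, 0.4])
```

A weighted sum is only one possible integration method; cluster analysis and discriminant analysis are the alternatives discussed in this section.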

The following is a brief introduction to several mathematical methods commonly used in classification and evaluation.

Principal Component Analysis (PCA)

Geographical problems often involve a large number of interrelated natural and social factors. So many elements make the model difficult to construct and increase the complexity of computation. To make the data easier for users to understand and to cope with limited storage capacity, it is necessary to reduce the data while retaining the most essential information. Because many geographic variables are correlated with one another, these correlations can be processed mathematically to simplify the data. Principal component analysis (PCA) is a mathematical and statistical method that derives a meaningful expression of the linear relationships among the elements. It compresses the information of many elements into a few representative synthetic variables, which overcomes redundancy and correlation in variable selection. The few factors carrying the most information are then chosen for cluster analysis and for constructing application models.

Suppose there are n samples and p variables. The original data are transformed into a new set of features, the principal components, which are linear combinations of the original variables and are mutually orthogonal. That is, x₁, x₂, …, xₚ are synthesized into m (m < p) indicators z₁, z₂, …, zₘ, as follows:

$$
\begin{aligned}
z_1 &= l_{11} x_1 + l_{12} x_2 + \dots + l_{1p} x_p \\
z_2 &= l_{21} x_1 + l_{22} x_2 + \dots + l_{2p} x_p \\
&\;\;\vdots \\
z_m &= l_{m1} x_1 + l_{m2} x_2 + \dots + l_{mp} x_p
\end{aligned}
$$

The composite indicators z₁, z₂, …, zₘ determined in this way are called the first, second, …, and m-th principal components of the original indicators. Among them, z₁ accounts for the largest proportion of the total variance, and the variances of the remaining principal components z₂, z₃, …, zₘ decrease in turn. In practice, only the first few principal components, those with the largest proportions of variance, are usually retained. This not only reduces the number of indicators but also captures the essential aspects and simplifies the relationships among the indicators.

Geometrically, determining the principal components amounts to finding the principal axes of an ellipsoid in p-dimensional space. These axes are given by the eigenvectors corresponding to the m largest eigenvalues of the correlation matrix of x₁, x₂, …, xₚ. The eigenvalues and eigenvectors are typically calculated using the Jacobi method.
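As a rough illustration of this procedure, the following Python sketch builds the correlation matrix, applies cyclic Jacobi rotations to obtain its eigenvalues and eigenvectors, and keeps the m components with the largest eigenvalues. It uses only the standard library; the function names and the sample data are hypothetical, and the sketch assumes that no variable is constant (otherwise its standard deviation is zero).

```python
import math


def correlation_matrix(samples):
    """Return the p x p correlation matrix and the standardized data."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in samples) / n)
            for j in range(p)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in samples]
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    return corr, z


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues and eigenvectors of a symmetric matrix by Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q of a and v
                    a[k][p], a[k][q] = (c * a[k][p] - s * a[k][q],
                                        s * a[k][p] + c * a[k][q])
                    v[k][p], v[k][q] = (c * v[k][p] - s * v[k][q],
                                        s * v[k][p] + c * v[k][q])
                for k in range(n):  # update rows p and q of a
                    a[p][k], a[q][k] = (c * a[p][k] - s * a[q][k],
                                        s * a[p][k] + c * a[q][k])
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[k][i] for k in range(n)] for i in range(n)]  # one per row
    return eigenvalues, eigenvectors


def pca(samples, m):
    """Return loadings l_ij, variance ratios, and scores of the first m components."""
    corr, z = correlation_matrix(samples)
    values, vectors = jacobi_eigen(corr)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:m]
    loadings = [vectors[i] for i in order]
    # The trace of a correlation matrix equals p, so each ratio is eigenvalue / p.
    ratios = [values[i] / len(values) for i in order]
    scores = [[sum(l * x for l, x in zip(load, row)) for load in loadings]
              for row in z]
    return loadings, ratios, scores


# Hypothetical data: 5 samples, 3 variables (the second is twice the first).
samples = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]]
loadings, ratios, scores = pca(samples, m=2)
```

Each row of `loadings` holds the coefficients l_i1, …, l_ip of one principal component, and `ratios` gives its share of the total variance. Note that the sign of each eigenvector is arbitrary, so scores may come out with flipped signs.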

As a data analysis technique, principal component analysis reduces data to a manageable size and serves as a powerful tool for transforming complex data into simplified categories, thereby facilitating storage and management.

Analytic Hierarchy Process

The Analytic Hierarchy Process (AHP) is one of the mathematical tools of system analysis. It structures the human thinking process into levels, quantifies it, and uses mathematical methods to provide a quantitative basis for analysis, decision-making, prediction, or control. In effect, it combines qualitative and quantitative analysis. When a model involves a large number of interrelated and mutually constraining factors, each factor has a different importance for the problem being analyzed. Determining the order of their importance to the target is therefore essential to building the model.

The AHP method arranges the interrelated elements into several levels according to their subordinate relations. Experienced experts are invited to quantify the relative importance of the factors at each level, and mathematical methods are then used to synthesize the expert opinions into relative-importance weights for each factor, which serve as the basis of comprehensive analysis.
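One common way to turn expert judgments into weights can be sketched in Python as follows. The text above does not specify the computation, so this sketch makes several assumptions: judgments are collected in a reciprocal pairwise comparison matrix, weights are approximated by the geometric mean of each row, and consistency is checked with the widely cited random index values attributed to Saaty. The function name and the example matrix are hypothetical.

```python
import math

# Commonly cited random consistency index values for matrices of order 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Approximate AHP weights and the consistency ratio.

    `pairwise[i][j]` states how much more important factor i is than factor j,
    with pairwise[j][i] == 1 / pairwise[i][j]. Supports 1 to 9 factors.
    """
    n = len(pairwise)
    # Geometric mean of each row, normalized so that the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    weights = [g / total for g in geo_means]

    # Estimate the largest eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n

    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    consistency_ratio = consistency_index / random_index if random_index else 0.0
    return weights, consistency_ratio


# Hypothetical judgments for three factors, fully consistent by construction.
pairwise = [
    [1.0, 2.0, 6.0],
    [1 / 2, 1.0, 3.0],
    [1 / 6, 1 / 3, 1.0],
]
weights, consistency_ratio = ahp_weights(pairwise)
```

A consistency ratio below roughly 0.1 is conventionally taken to mean that the judgments are acceptably consistent; larger values suggest the experts should revisit their comparisons.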

Systematic Clustering Analysis

Systematic clustering is a method of classifying geographic entities according to multiple geographic elements. Classifications based on different elements often reflect the hierarchical sequences of different objectives, such as land classification and grading or soil erosion intensity grading.

Systematic clustering generally proceeds by successively merging categories according to the degree of similarity between entities, where similarity is defined by a distance or a similarity coefficient. The criterion for merging classes is to maximize the differences between classes and minimize the differences within classes.
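The merging procedure can be sketched in Python as a simple agglomerative clustering. This is a minimal illustration under stated assumptions: similarity is measured by Euclidean distance, clusters are compared by average linkage, and merging stops at a requested number of classes. The function names and the sample points are hypothetical, and other distances or linkage rules are equally valid choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(points[i], points[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_clustering(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical entities described by two factors each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = agglomerative_clustering(points, k=2)
```

This brute-force version recomputes all cluster distances at every step, which is adequate for small examples but slow for large datasets.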

Discriminant Analysis

Discriminant analysis and cluster analysis both address classification problems. The difference is that discriminant analysis first determines, from theory and practice, the factor criteria for a grade sequence and then assigns each geographical entity to be analyzed to its appropriate position in that sequence. It is better suited to grading problems in classification systems that have an established theoretical basis, such as soil erosion evaluation and land suitability evaluation.

According to the number of classes and the method used, discriminant analysis can be divided into two-class discrimination, multi-class discrimination, and stepwise discrimination.

In two-class discriminant analysis, the known geographical feature values are usually combined linearly to form a linear discriminant function Y, namely:

$$
Y = c_1 x_1 + c_2 x_2 + \dots + c_m x_m
$$

In the formula, cₖ (k = 1, 2, …, m) represents the discriminant coefficients, which reflect the direction of influence, discriminative power, and contribution rate of each factor or characteristic value. Once cₖ is determined, the discriminant function Y is also established. After defining the discriminant function, each sample can be classified into the corresponding category by calculating its discriminant function value. Commonly used discriminant analysis methods include distance-based discrimination, Bayes minimum risk discrimination, Fisher’s discriminant criterion, and others.
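As one concrete instance, the following Python sketch estimates the coefficients cₖ with Fisher's criterion for the two-class case. It is a simplified illustration that assumes exactly two features (so the within-class scatter matrix can be inverted by hand) and places the decision threshold midway between the projected class means; the function names and the training samples are hypothetical.

```python
def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    n = len(samples)
    return [sum(s[j] for s in samples) / n for j in range(len(samples[0]))]


def scatter_matrix(samples, mean):
    """2 x 2 scatter matrix of two-feature samples around their mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in samples:
        d = [x[0] - mean[0], x[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_discriminant(class_a, class_b):
    """Return coefficients c and a threshold for Y = c_1 x_1 + c_2 x_2."""
    ma, mb = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, ma), scatter_matrix(class_b, mb)
    # Within-class scatter: sum of the two class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    # c = Sw^{-1} (mean_a - mean_b), using the closed-form 2 x 2 inverse.
    c = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    ya = sum(ci * mi for ci, mi in zip(c, ma))
    yb = sum(ci * mi for ci, mi in zip(c, mb))
    return c, (ya + yb) / 2.0


def classify(x, c, threshold):
    """Assign x to class 'A' if its discriminant value exceeds the threshold."""
    y = sum(ci * xi for ci, xi in zip(c, x))
    return "A" if y > threshold else "B"


# Hypothetical training samples for two known classes.
class_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
class_b = [(6.0, 1.0), (7.0, 1.5), (6.5, 2.5), (7.5, 2.0)]
c, threshold = fisher_discriminant(class_a, class_b)
label = classify((3.0, 4.0), c, threshold)
```

With this choice of c, the projected mean of class A is always larger than that of class B, which is why values above the threshold are assigned to class A. Distance-based and Bayes minimum risk discrimination would derive the decision rule differently.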

Principles, Technologies, and Methods of Geographic Information Systems

In recent years, Geographic Information Systems (GIS) have undergone rapid development in both theoretical and practical dimensions. GIS has been widely applied for modeling and decision-making support across various fields such as urban management, regional planning, and environmental remediation, establishing geographic information as a vital component of the information era. The introduction of the “Digital Earth” concept has further accelerated the advancement of GIS, which serves as its technical foundation. Concurrently, scholars have been dedicated to theoretical research in areas like spatial cognition, spatial data uncertainty, and the formalization of spatial relationships. This reflects the dual nature of GIS as both an applied technology and an academic discipline, with the two aspects forming a mutually reinforcing cycle of progress.