Comparing Qualitative Raster Maps

The spatial diversity of the natural environment can be represented with qualitative raster data. Such data may come from field surveys or from stochastic modelling of some aspect of environmental complexity. One example of such modelling is the delineation of similar landform elements on the basis of morphometric variables. Unsupervised clustering of raster data can yield different maps, so the agreement between the resulting maps must be assessed to judge how significant the differences between them are. This article presents selected stochastic and deterministic methods for assessing the spatial distribution of data and discusses example measures of variability at the global and local levels.


Introduction
In an era of relatively easy access to numerical data, including qualitative and quantitative raster data, the attention of scientists is directed less toward ways of obtaining data and more toward ways of using, analyzing, and modelling them, and then validating the results. Image (i.e. raster) analysis and classification are currently used in many fields. The most common applications include the analysis of satellite and meteorological images, morphometric classification, medical image analysis (including tomograms, roentgenograms, ultrasound images, etc.), monitoring of product wear and quality (for metal products, fabrics, paper, and foodstuffs), and the analysis of surface images and microscopic images of material structures to monitor tool wear.
Algorithms have already been developed for recognizing objects in satellite images or in diagnostic images used in medicine, often using neural networks or fuzzy logic. However, comparing two images and assessing whether they are statistically [1] the same, when both are modelling results and there is no reference pattern to relate them to, remains a challenge, although the topic has already been addressed in the literature [2]. This paper presents two such research situations along with proposed measures for assessing the similarity of two images. The results obtained and the limitations of the measures used are discussed. The focus is on example global and local indicators that allow differences between two images to be assessed.
The methodology presented in this article is not limited to topographic maps. Image analysis, particularly of images featuring stochastic fields, can be widely applied in industry and management, where it contributes significantly to product quality enhancement [3][4][5] and to changes in management methods [6,7] and related analytics [8,9]. Applications range from the assessment of corrosion phenomena [10], both for typical metallic construction materials [11][12][13] and for special alloys [14], through the visual evaluation of separator performance [15], to the automatic assessment of surfaces [16], coatings [17][18][19], and structural welds [20]. This significantly influences the organization of enterprises that have moved beyond Industry 3.0 and are approaching Industry 4.0, affecting the means of automatic control, for instance in the application of DLC [21][22][23] and ESD [24][25][26] coatings and in the automation of thermal insulation evaluation using infrared cameras [27][28][29]. It is also an important tool supporting the detection of machine damage [30,31], particularly in hydraulic power equipment [32][33][34]. Automated methods for processing image maps are also becoming a vital tool in process optimization [35][36][37], including the use of qualitative and/or subjective factors [38][39][40], although this typically requires strong prior dimensionality reduction [41,42] to avoid unwanted correlations.

Area of Study
A fragment of the Owl Mountains (a mountain range in the Sudetes in central-southern Poland, located in the Lower Silesian Voivodeship) was selected as the research area, with the intervention introduced so as to obtain a flat area. The area measures 8,725 m in the E-W direction and 6,725 m in the N-S direction (269 rows x 349 columns). Using a digital elevation model (DEM) with a ground resolution of 25 m, the area was classified by k-median clustering with the Manhattan metric. The clustering was based on morphometric variables calculated from the DEM: elevation, slope, aspect, and curvature.
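The clustering step can be illustrated with a minimal sketch of k-median clustering under the Manhattan (L1) metric. This is not the authors' implementation: the function names `l1_labels`, `k_medians`, and `cluster_raster`, the random initialisation, and the standardisation of the variables are all assumptions made for illustration, and aspect is treated here as an ordinary numeric layer.

```python
import numpy as np

def l1_labels(X, centers):
    """Index of the nearest center for every row of X under the Manhattan metric."""
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)

def k_medians(X, k, n_iter=100, seed=0):
    """Cluster the rows of X into k groups; centers are coordinate-wise medians."""
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct observations chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = l1_labels(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return l1_labels(X, centers), centers

def cluster_raster(layers, k=4):
    """layers: list of equally shaped 2-D arrays, one per morphometric variable."""
    stack = np.stack([layer.ravel() for layer in layers], axis=1).astype(float)
    # Assumption: standardise so that no single variable dominates the L1 distance.
    stack = (stack - stack.mean(axis=0)) / stack.std(axis=0)
    labels, _ = k_medians(stack, k)
    return labels.reshape(layers[0].shape)
```

Under these assumptions, a call such as `cluster_raster([elevation, slope, aspect, curvature], k=4)` would return a class raster with the same shape as the input layers; dropping `aspect` from the list corresponds to the three-variable variant.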

Fig.1. Models and pattern of the area subject to classification, divided into classes
Three images (Fig.1) were selected for the calculations, each divided into 4 classes (clusters) of landform elements: Model 1, developed on the basis of all 4 morphometric variables; Model 2, obtained on the basis of elevation, slope, and curvature; and the Pattern, a model obtained in the same way as Model 1 but characterized by the greatest between-cluster variability (between-cluster sum of squares). Aspect (exposure) is not a morphometric variable that is commonly included in modelling. A tool (other than expert assessment) for determining to what extent including aspect produces a significantly different modelling result would help in judging whether its inclusion in the model is justified. An expert assessment of the advantages and disadvantages of modelling landform elements with aspect was presented by Wieczorek and Migoń [43], but advanced similarity measures were not used in that assessment.

Problem to Solve
In this case, the question of interest is how to assess which model fits the pattern better. Although the maps are not identical, is their similarity large enough to consider them identical models? Where should the threshold of assumed similarity be placed? Answers to these questions would make it possible to decide whether the division into classes should use 3 or 4 morphometric variables as input.
Another situation in which a measure of statistical significance is needed to determine the differences between two rasters with qualitative data is the processing of maps by filtering. The key question is: after how many filtering passes does the image become too smooth? Fig.2 shows the pattern and its modification after the filtering procedure was carried out 5 times. The agreement between the two images is 92.6% (explained later in the article). The parameters set during filtering (the size of the filtering window and the filtering method), the number of filtering passes, and the homogeneity of the image are also important. The smaller the information noise, the greater the differences between the original and the final image; however, these considerations are left for a separate study.
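The filtering experiment can be sketched as follows. The article does not state which filter or window size was used, so the 3 x 3 majority (mode) filter, the edge padding, the tie-breaking rule, and the names `majority_filter`, `agreement`, and `agreement_after_passes` are assumptions chosen only to show how agreement with the original can be tracked pass by pass.

```python
import numpy as np

def majority_filter(raster, n_classes, size=3):
    """One pass of a size x size majority filter over class codes 0..n_classes-1."""
    pad = size // 2
    padded = np.pad(raster, pad, mode="edge")  # assumption: replicate border cells
    rows, cols = raster.shape
    counts = np.zeros((n_classes, rows, cols), dtype=int)
    for dr in range(size):
        for dc in range(size):
            window = padded[dr:dr + rows, dc:dc + cols]
            for c in range(n_classes):
                counts[c] += (window == c)
    # Ties are resolved in favour of the lowest class code.
    return counts.argmax(axis=0)

def agreement(a, b):
    """Share of cells carrying the same class in both rasters."""
    return float(np.mean(a == b))

def agreement_after_passes(raster, n_classes, n_passes):
    """Agreement between the original raster and its filtered version after each pass."""
    filtered = raster
    result = []
    for _ in range(n_passes):
        filtered = majority_filter(filtered, n_classes)
        result.append(agreement(raster, filtered))
    return result
```

With a class raster `pattern` holding codes 0 to 3, `agreement_after_passes(pattern, 4, 5)` would give the sequence of agreement values for one to five passes, which is the quantity a smoothing threshold would have to be defined on.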

Research Procedure
Map comparison procedures are derived from four spatial data analysis traditions:
1. Accuracy assessment, which characterizes the match (or mismatch) between a reference map considered to be accurate and one or more of its approximations.
2. Change detection, in which differences are interpreted as a function of time.
3. Model comparison, in which (predicted or simulated) model results are compared with observed data and/or with other model results (pixel-to-pixel consistency is not expected here).
4. Landscape comparison, whose key feature is that the comparison is made on the basis of one or more spatial metrics computed from the maps.
In practice, a number of methods are used to compare and measure differences between raster images. They can be divided into objective methods, which use image descriptors based on mathematical models, and subjective methods, which are based on observations made by expert observers. Three methods were selected in this article: the differential method, Cohen's kappa coefficient, and a method based on the Haralick co-occurrence matrix. These methods were applied globally, to the entire area, and locally, to selected fragments of the area. Both contrasting areas and adjacent areas were deliberately selected to test the effectiveness of the measures for different types of terrain.
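Of the three methods, the co-occurrence approach is the least self-explanatory, so a minimal sketch may help. It builds a normalised, symmetric co-occurrence matrix for a single pixel offset, derives four common Haralick-type texture properties, and measures the Euclidean distance between the property vectors of two maps. The choice of offset, the selection of properties, and the function names are assumptions; the article does not specify them. Note also that properties such as contrast depend on the numeric codes assigned to the classes.

```python
import numpy as np

def cooccurrence(raster, n_classes, offset=(0, 1)):
    """Normalised, symmetric co-occurrence matrix for one non-negative (row, col) offset."""
    dr, dc = offset
    rows, cols = raster.shape
    first = raster[:rows - dr, :cols - dc].ravel()
    second = raster[dr:, dc:].ravel()
    glcm = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(glcm, (first, second), 1.0)  # count each ordered pair of classes
    glcm += glcm.T                         # make the matrix symmetric
    return glcm / glcm.sum()

def texture_features(p):
    """Contrast, energy, homogeneity, and entropy of a normalised co-occurrence matrix."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return np.array([
        np.sum(p * (i - j) ** 2),            # contrast
        np.sum(p ** 2),                      # energy (angular second moment)
        np.sum(p / (1.0 + (i - j) ** 2)),    # homogeneity
        -np.sum(nonzero * np.log(nonzero)),  # entropy
    ])

def haralick_distance(raster_a, raster_b, n_classes):
    """Euclidean distance between the texture property vectors of two class rasters."""
    fa = texture_features(cooccurrence(raster_a, n_classes))
    fb = texture_features(cooccurrence(raster_b, n_classes))
    return float(np.linalg.norm(fa - fb))
```

Because the properties are on different scales, an unweighted distance lets the largest of them dominate; rescaling each property before taking the distance is one possible design choice, not something the article prescribes.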

Global Measures
The simplest approach to finding differences between two raster maps is to subtract one raster from the other and mark the differences. The results of such subtraction are shown in Fig.3, where the differences between the pattern and Model1 are marked in green, and the differences between the pattern and Model2 in yellow. The overall agreement was 62% for Model1 and 76% for Model2 (Table 1). Setting a threshold for the allowable number of differing pixels is only a partial solution, because the question of how these differences are distributed in space remains open.
The second proposed global measure is Cohen's kappa coefficient [44], determined according to equation (1):

\[
\kappa = \frac{c \sum_{i=1}^{n} c_{ii} - \sum_{i=1}^{n} c_{i\cdot}\, c_{\cdot i}}{c^{2} - \sum_{i=1}^{n} c_{i\cdot}\, c_{\cdot i}} \tag{1}
\]

where $c_{ij}$ is the value in row $i$ and column $j$ of the contingency matrix of the two maps, a dot denotes summation over the replaced index (e.g. $c_{i\cdot}$ is the sum over all columns in row $i$), $c$ is the total sum, and $n$ is the number of classes. This coefficient takes values from -1 to 1. The closer the value is to 1, the greater the agreement between the model and the pattern. For the study area, Cohen's kappa coefficient was 0.71 for Model1 and 0.82 for Model2 (Table 1). Based on these two global measures, Model2 is closer to the pattern, but is it close enough to conclude that the maps are consistent at a given level of statistical significance?
The third method used to determine the differences was the Euclidean (Cartesian) distance between the textural properties according to Haralick [45], determined on the basis of the co-occurrence matrix. Table 2 contains the values of the individual properties for the pattern and the models.
The concept of locality in similarity measures can be understood either as similarity within subclusters or as a global similarity measure calculated for a smaller area (subregions R1 and R2). Fig.3 shows that greater differences occur in the western part of the study area (rectangle R1). There, the differing pixels form larger clusters and are more numerous. In the eastern part of the area, the differences are minor and do not form compact patches (rectangle R2). This experiment shows that differences (Table 3) can be distributed very differently in space, so a single global measure of this kind is not sufficient. In addition to the spatial distribution of differences, there is also the issue of their share in a given class, which was already pointed out by Pontius et al. [46] and Boots and Csillag [42]. Hence, a local approach seems more promising.
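The first two global measures can be sketched in a few lines. The sketch cross-tabulates the two class rasters into a contingency matrix, reads the overall agreement off its diagonal, and computes Cohen's kappa with the standard formula from the matrix totals. The function names are illustrative assumptions, and the rasters are assumed to hold integer class codes starting at 0.

```python
import numpy as np

def difference_map(reference, model):
    """Boolean raster that is True wherever the two maps carry different classes."""
    return reference != model

def contingency_matrix(reference, model, n_classes):
    """Rows index the reference classes, columns index the model classes."""
    c = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(c, (reference.ravel(), model.ravel()), 1.0)
    return c

def overall_agreement(c):
    """Share of cells on the diagonal of the contingency matrix."""
    return float(np.trace(c) / c.sum())

def cohen_kappa(c):
    """Cohen's kappa; undefined (division by zero) if both maps hold one identical class."""
    total = c.sum()
    chance = float(np.sum(c.sum(axis=1) * c.sum(axis=0)))  # sum over i of c_i. * c_.i
    return float((total * np.trace(c) - chance) / (total ** 2 - chance))
```

As a small worked case, a reference of `[[0, 0], [1, 1]]` and a model of `[[0, 1], [1, 1]]` give the contingency matrix `[[1, 1], [0, 2]]`, an overall agreement of 0.75, and a kappa of (4·3 − 8)/(16 − 8) = 0.5, which illustrates how kappa discounts the agreement expected by chance.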

Local Measures
In order to determine local measures, 2 sets of sub-areas were designated. One resulted from the simple division of the entire area into equal adjacent sub-areas (Fig.4). The second method of division was to designate two sub-areas that differed according to the field expert (Fig.5).

Fig.5. Designation of two sub-areas with large geomorphological differences
The determination of local measures therefore consisted in computing the global measures for 9 modules (tiles that are sub-areas of the entire study area) and for two selected frames, for which the largest and smallest differences between the Pattern and Models 1 and 2 were recorded (Table 4). It is worth noting that the Haralick distance does not indicate Model 2 as closer to the Pattern in all sub-areas, whereas Cohen's kappa coefficient does. For Cohen's kappa coefficient, a value of 0.7 was adopted as the limit for a satisfactory fit. Although Model 2 matches the Pattern better everywhere, the critical value of 0.7 was exceeded in only 4 of the 9 sub-regions.
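The tile-based procedure can be sketched by splitting both rasters into a 3 x 3 grid and computing Cohen's kappa for each tile. The names `kappa` and `tile_kappa` are assumptions, as is the use of nearly equal tiles when the raster dimensions are not divisible by the grid size (as with 269 rows); the article does not say how the tile boundaries were set.

```python
import numpy as np

def kappa(reference, model, n_classes):
    """Cohen's kappa for two class rasters with integer codes 0..n_classes-1."""
    c = np.zeros((n_classes, n_classes), dtype=float)
    np.add.at(c, (reference.ravel(), model.ravel()), 1.0)
    total = c.sum()
    chance = np.sum(c.sum(axis=1) * c.sum(axis=0))
    if total ** 2 == chance:  # both tiles consist of one and the same class
        return 1.0
    return float((total * np.trace(c) - chance) / (total ** 2 - chance))

def tile_kappa(reference, model, n_classes, grid=(3, 3)):
    """Kappa for each tile of a grid laid over the two rasters."""
    row_blocks = np.array_split(np.arange(reference.shape[0]), grid[0])
    col_blocks = np.array_split(np.arange(reference.shape[1]), grid[1])
    result = np.empty(grid)
    for i, rb in enumerate(row_blocks):
        for j, cb in enumerate(col_blocks):
            rows = slice(rb[0], rb[-1] + 1)
            cols = slice(cb[0], cb[-1] + 1)
            result[i, j] = kappa(reference[rows, cols], model[rows, cols], n_classes)
    return result
```

Counting the tiles that pass the adopted limit is then a one-liner, for example `int((tile_kappa(pattern, model2, 4) > 0.7).sum())`, where `pattern` and `model2` are hypothetical arrays holding the class rasters.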

Conclusions
The number of pixels in which two raster maps differ is very specific information, but it is not sufficient to decide whether two maps are similar, because it says nothing about the spatial distribution of the differences. The presented measures and indicators are an attempt to develop a tool for assessing the significance of differences between qualitative raster maps in both a quantitative and a spatial context. Even with expert knowledge, it is difficult to indicate the boundary between similar and dissimilar depictions of a given phenomenon. None of the presented measures is sufficient to unequivocally indicate a critical value that allows two maps to be considered similar. The experiment indicates that a combination of different indicators should be sought. A second approach is to determine similarity measures for subregions and then combine them. At this stage, however, it is difficult to indicate the optimal number of subregions. The question of how to combine the measures obtained for subregions into one summary indicator remains open and requires further research.