Recognition & AI, part 1
Foundation


Introduction
Our ability to recognize is simply amazing. How is it that we all, without conspiring, recognize the dog in the photo above? Why are so many people convinced that it is necessarily a poodle!
Recognition is the central focus of artificial intelligence (AI) and its main advantage is the ability to objectively evaluate the effectiveness of a proposed solution.

A Simple Task that is Difficult to Delegate to a Computer
Try to pair similar objects.

I know you handled it easily, but in order to create a program that could replace you in this task, we need a strict definition of “similarity of figures of different shapes”. Any ideas?
But first, let’s be clear — what do we mean when we talk about figures having the same shape? Basically it means that by moving one of the figures and possibly rotating and resizing it along the way, you can achieve a complete match with another figure. For example, all squares have the same shape. The same is true of circles and equilateral triangles. However, ectangular triangle and equilateral triangle is obvious example of two figures with different shape.

Two Approaches to Image Processing
Most image processing operations can be performed either in the image space domain or in the frequency domain. In the former case, we manipulate individual pixels, while in the latter case we treat the image as a two-dimensional luminance function to which we have applied a Fourier transform. Since we will be making extensive use of frequency space in the following, the original images of shapes will usually be reduced to “canonical form”, i.e., represented by their smaller contour image. By completely preserving the shape of the figure, we significantly reduce the number of non-zero (informative) pixels. Such a “canonical image” is easy to analyze in frequency space, since its Fourier magnitude is a smooth function.

Here on the left we see the original image of the object, then in the center of the figure the same object is represented in the canonical form, and on the right we show the low-frequency part of the magnitude of the Fourier spectrum using the COLORMAP_HSV color palette.
The basic properties of the magnitude are well known: it does not change when the object is displaced, rotates with it, and shrinks proportionally when the object is enlarged. Equality of images implies coincidence of their magnitudes, although the reverse is not always true.

The Degree of Similarity of Geometric Figures of the Same Shape
An example of such figures together with their canonical representation and magnitudes is shown below.


Since the figure on the left in Fig. 2 is smaller than the figure on the right and rotated clockwise relative to it, in full accordance with the basic properties of the spectrum, its magnitude is proportionally stretched and rotated by the same angle.
In frequency space, it is easy to establish the exact values of this transformation, i.e., the scale factor (scale) and the rotation angle (angle). To do this, we find the coordinates of several local maxima of both magnitudes (which is not difficult, given the smoothness of these functions) and draw the vectors R1 and R2 going from the center to the maxima corresponding to each other.

The length and direction of R1 and R2 completely determine the required scale and angle parameters. The transformation using them and its result are shown below.
mat_rotate = cv2.getRotationMatrix2D(center, angle, scale)
image_left_warp = cv2.warpAffine(image_left, mat_rotate, dsize)

As we can see, after rotation and scaling, the magnitudes have completely coincided in their non-zero, low-frequency part. In other words, here we have eliminated any difference due to the size and orientation of the figures. Therefore, the distance D between figures of the same shape can be defined as the average difference between the non-zero parts of the magnitudes after the above rotation and scaling. Due to calculation errors, the value of D may vary slightly, but always remain close to zero.
Even more widely used is the notion of the degree of similarity S between figures, defined as S = 1 / (1 + coeff * D), which for figures of the same shape is close to 1 or 100%.
If the similarity values between a given figure and all figures of a certain class are close to 100%, then the figure is said to belong to this class or to be recognized.
Below is a program for calculating the similarity of two figures from Fig. 2.
import time
from pathlib import Path
from max2.src import cfg_max2
from max2.src.utils import init_directory, remove_directory
from max2.src.set_params import set_params
from max2.src.compare_shapes import compare_shapes
# -------------------------------------------------------------
cfg_max2.image_name = 'loc_left.png'
cfg_max2.templ_name = 'loc_right.png'
# -------------------------------------------------------------
cfg_max2.dir_data = 'IMAGES_DATA_max2'
cfg_max2.dir_debug = 'RESULTS_max2'
cfg_max2.path_image \
= str(Path.cwd() / cfg_max2.dir_data / cfg_max2.image_name)
cfg_max2.path_templ \
= str(Path.cwd() / cfg_max2.dir_data / cfg_max2.templ_name)
cfg_max2.debug_mode = True
if cfg_max2.debug_mode:
init_directory(cfg_max2.dir_debug)
else:
remove_directory(cfg_max2.dir_debug)
set_params(
12, # n_peaks (local maxima)
100, # canonical_size = 100 x 100 pixels
0.333) # cutoff = 0.333, low-pass filter parameter
begin = time.time()
distance, similarity = compare_shapes(cfg_max2.path_image,
cfg_max2.path_templ)
end = time.time()
print(f'\n{cfg_max2.image_name} vs. {cfg_max2.templ_name}'
f'\ndistance = {distance:.4f}\t similarity = {similarity:.4f}'
f'\ntime = {(end - begin):.1f} sec')


The Degree of Similarity of Geometric Figures of Different Shapes
We will borrow an example of such figures from Fig. 1 at the beginning of the article.


Let us again find the local maxima of both magnitudes and indicate three of them that correspond to each other: 2 –> 1, 7 –> 13, 0 –> 0.

The affine transformation that maps three points of one image to the corresponding three points of another image is shown below. Further, Fig. 8 shows the result of performing this transformation.
pts_image1 = np.float32([[x1_image1, y1_image1],
[x2_image1, y2_image1],
[x0, x0]])
pts_image2 = np.float32([[x1_image2, y1_image2],
[x2_image2, y2_image2],
[x0, x0]])
mat_affine = cv.getAffineTransform(pts_image1, pts_image2)
image_warp = cv.warpAffine(image_1, mat_affine, dsize)

And again, the non-zero parts of the magnitudes are almost identical.
To determine how similar two differently shaped figures are, we again look at the difference between the non-zero parts of the magnitudes, but now after an affine transformation based on three corresponding local maxima.
But how do we find this correspondence between the maxima? Unfortunately, we have no better solution than to go through all possible options and choose the one that gives the maximum similarity value.
We called the proposed method max3, or “method of recognizing geometric figures by three local maxima of magnitude.” Accordingly, the previous method used above for comparing figures of the same shape, we called max2, or “method of recognizing geometric figures by two local maxima.”
Note: When comparing figures of different shapes, max2 performs significantly worse than max3 and is used only when the current location of local maxima makes it impossible to apply a perspective transformation. For example, this often occurs when recognizing “1”s in the famous MNIST Dataset of handwritten digits.
Below is a program for calculating the similarity of two hearts of Fig. 1.
import time
from pathlib import Path
from max3.src import cfg_max3
from max3.src.utils import init_directory, remove_directory
from max3.src.set_params import set_params
from max3.src.compare_shapes import compare_shapes
# -------------------------------------------------------------
cfg_max3.image_name = 'heart_c.png'
cfg_max3.templ_name = 'heart_f.png'
# -------------------------------------------------------------
cfg_max3.dir_data = 'IMAGES_DATA_max3'
cfg_max3.dir_debug = 'RESULTS_max3'
cfg_max3.path_image \
= str(Path.cwd() / cfg_max3.dir_data / cfg_max3.image_name)
cfg_max3.path_templ \
= str(Path.cwd() / cfg_max3.dir_data / cfg_max3.templ_name)
cfg_max3.debug_mode = True
if cfg_max3.debug_mode:
init_directory(cfg_max3.dir_debug)
else:
remove_directory(cfg_max3.dir_debug)
set_params(
15, # n_peaks (local maxima)
100, # canonical_size = 100 x 100 pixels
0.25) # cutoff = 0.25, low-pass filter parameter
begin = time.time()
distance, similarity = compare_shapes(cfg_max3.path_image,
cfg_max3.path_templ)
end = time.time()
print(f'\n{cfg_max3.image_name} vs. {cfg_max3.templ_name}'
f'\ndistance = {distance:.4f}\t similarity = {similarity:.4f}'
f'\ntime = {(end - begin):.1f} sec')


Solving a “Simple” Task
Using the program above, let’s calculate the similarity values between all six figures of the “simple” task and fill the similarity table with these values.

As we can see, the most similar (with the highest similarity value) pairs are a-e, b-d and c-f.


Proof of effectiveness
A successful solution to a “simple” task is encouraging, but does not serve as conclusive proof of the effectiveness of the new approach. Any conclusions must be based on thousands of comparisons of geometric shapes.
Suppose you have two groups of objects of different types — type A (a1, a2, a3) and type B (b1, b2, b3). Let’s make lists MATCH and MISMATCH for subsequent calculation of the degree of similarity between the objects:
MATCH:
a1 - a2
a1 - a3
a2 - a3
b1 - b2
b1 - b3
b2 - b3
MISMATCH:
a1 - b1
a1 - b2
a1 - b3
a2 - b1
a2 - b2
a2 - b3
a3 - b1
a3 - b2
a3 - b3
As we can see, the MATCH list consists of pairs of candidates for comparison of the same type, and, accordingly, in the MISMATCH list all pairs are composed of objects of different types. If our similarity estimation adequately reflects reality, then the similarity values between the objects composing pairs of the MATCH list will, as a rule, be larger than those of the pairs of objects in the MISMATCH list. The latter means that the histogram constructed from the results of MATCH list comparisons will be shifted to the right relative to another histogram constructed from the MISMATCH list. Let’s make sure of it!
The data for our computational experiments will be randomly distorted star shapes, an example of which is given below.

In this case, we will increase the number of objects (stars) of each type to 50, which will entail an increase in the MATCH list to 2450 compared pairs and, accordingly, the MISMATCH list to 2500.
See the result of the first computational experiment. As follows from the title, four- and five-pointed stars were involved here.

And this is the result of a second computational experiment, involving five- and six-pointed stars.

In the figure above, blue color represents the results of comparison of objects of different types, and orange — objects of the same type. As we can see, orange histograms are shifted to the right relative to the blue ones, i.e., stars of the same type usually show higher values of similarity when compared. At the same time, the size of the overlapping area of the histograms reflects the probability of an error in decision-making, and with similarity values > 0.9, you reliably recognize an unknown star.

Since magnitude reflects the energy distribution in the image plane, we consider similar images to be those that have approximately the same energy distribution after simple geometric transformations.

Note
The author is interested in verifying and promoting the obtained results and therefore provides the reader with everything necessary for an independent testing (see below).
The source code for test-max2 and test-max3 is available on GitHub. At the test preparation stage, it is necessary to load data files (images of figures) into the _DATA_1 and _DATA_2 folders, and then sequentially execute four console applications contained in test-max2 or test-max3:
1_create_batch_files.py — automatic creation of lists of compared files MATCH (match.txt) and MISMATCH (mismatch.txt).
2_calc_max2 or 2_calc_max3 — calculation of the similarity for each pair of images in the match.txt and mismatch.txt files.
3_sort_max2 or 3_sort_max3 — sorting by similarity values.
4_histogram_max2 or 4_histogram_max3 — creation of histograms.
Our article Hand-Drawn Shape Simulation is dedicated to the technology of creating various geometric shapes. This technology allows you to compare your own original sets of geometric figures.
