Recognition & AI, part 1

Foundation

10 min read · Mar 1, 2025


Photo by charlesdeluvio on Unsplash

Introduction

Our ability to recognize is simply amazing. How do we all, without conferring, recognize the dog in the photo above? Why are many convinced that it must be a poodle?

Recognition is a central focus in the creation of artificial intelligence (AI), not least because recognition tasks allow the effectiveness of proposed solutions to be evaluated objectively.

A Simple Task that is Difficult to Assign to a Computer

Try to pair up similar objects.

Fig. 1. The six objects of the “simple” task

I know you managed easily, but to create a program that could replace you, we need a strict definition of “similarity between figures of different shapes”. Any ideas?

But first, let’s clarify: what do we mean when we talk about figures having the same shape? Essentially, it means that by moving one of the figures, possibly rotating and resizing it along the way, you can achieve a complete overlap with the other figure. For example, all squares have the same shape. The same applies to circles and equilateral triangles. However, a right-angled triangle and an equilateral triangle are an obvious example of figures with different shapes.
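This definition of “same shape” is easy to check numerically: one square, scaled and rotated, maps back exactly onto another. A pure-NumPy sketch (the vertices, scale, and angle below are arbitrary illustrative values):

```python
import numpy as np

# "Same shape" means some rotation + scaling maps one figure exactly onto
# the other. Here square_b is square_a enlarged and rotated.
square_a = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)

theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
square_b = 1.5 * square_a @ rot.T  # square_a enlarged 1.5x and rotated 30 deg

# Undo the rotation and the scaling: the vertices coincide exactly.
recovered = (square_b @ rot) / 1.5
print(np.allclose(recovered, square_a))  # True
```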

Two Approaches to Image Processing

Most image processing operations can be performed either in the space domain or in the frequency domain. In the first case, we manipulate individual pixels, while in the second, we treat the image as a two-dimensional brightness function to which the Fourier transform has been applied. Since we will be using the frequency domain extensively later on, the original images of figures will typically be reduced to a “canonical view,” i.e., represented by their contour image. This fully preserves the shape of the figure while significantly reducing the number of non-zero (informative) pixels. Such a “canonical image” is easy to analyze in the frequency domain, as its Fourier magnitude is a smooth function.

Fig. 2. Steps for transitioning to the frequency domain

Here, on the left, we see the original image of the object; then, in the center of the figure, the same object is presented in its canonical view; and on the right, the low-frequency part of the magnitude (modulus) of the Fourier spectrum is shown using the COLORMAP_HSV color palette.
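These steps might be sketched as follows, with a filled square standing in for the object and a simple 4-neighbourhood test extracting the contour (an illustrative stand-in, not the author's implementation):

```python
import numpy as np

# A filled figure reduced to its contour ("canonical view"), then taken to
# the frequency domain via the 2-D Fourier magnitude.
img = np.zeros((128, 128), dtype=np.uint8)
img[40:90, 40:90] = 1  # a filled 50 x 50 square

# Interior pixels: figure pixels whose four neighbours are also figure pixels.
interior = img.copy()
interior[1:-1, 1:-1] = (img[1:-1, 1:-1]
                        & img[:-2, 1:-1] & img[2:, 1:-1]
                        & img[1:-1, :-2] & img[1:-1, 2:])
canon = img - interior  # canonical view: only the boundary pixels remain

# Centered magnitude of the 2-D Fourier transform of the canonical image.
magnitude = np.abs(np.fft.fftshift(np.fft.fft2(canon.astype(float))))
print(int(img.sum()), int(canon.sum()))  # far fewer informative pixels remain
```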

The main properties of the magnitude are well-known: it does not change when the object is shifted (translation invariance), it rotates along with the object, and it proportionally compresses when the object is enlarged (scaling property). Equality of images implies the coincidence of their magnitudes, although the reverse is not always true.
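The translation-invariance property is easy to verify numerically; in this small sketch a circular shift stands in for translation:

```python
import numpy as np

# Shifting the object leaves the Fourier magnitude unchanged:
# the shift affects only the phase of the spectrum.
img = np.zeros((64, 64))
img[10:20, 10:20] = 1.0                       # a small square "object"
shifted = np.roll(img, (7, 12), axis=(0, 1))  # the same object, moved

mag_a = np.abs(np.fft.fft2(img))
mag_b = np.abs(np.fft.fft2(shifted))
print(np.allclose(mag_a, mag_b))  # True: the magnitudes coincide
```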

Determining the degree of similarity between geometric figures of the same shape, i.e., differing in size and orientation on the plane

An example of such figures, along with their canonical representation and magnitudes, is shown below.

Fig. 3. Figures of the same shape and the magnitudes of their canonical representations

Since the figure on the top left in Fig. 3 is smaller than the figure on the right and is rotated clockwise relative to it, its magnitude, in full accordance with the properties of the spectrum, is proportionally expanded (“inflated”) and rotated by the same angle.

In the frequency domain, it is easy to determine the exact values of this transformation, i.e., the scale factor (scale) and the angle of rotation (angle). To do this, we find the coordinates of the first local maxima of both magnitudes (which is not difficult, given the smoothness of these functions) and draw vectors R1 and R2 originating from the center to the corresponding maxima on each.
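Given the endpoints of R1 and R2, the two parameters follow from elementary geometry: the ratio of lengths gives the scale, the angle between the vectors gives the rotation. A sketch with hypothetical coordinates:

```python
import numpy as np

# R1 and R2 run from each magnitude's center to its first local maximum.
# The coordinates below are made-up illustrative values.
r1 = np.array([30.0, 10.0])  # maximum of the left magnitude
r2 = np.array([12.0, 16.0])  # maximum of the right magnitude

scale = np.linalg.norm(r2) / np.linalg.norm(r1)  # ratio of lengths
angle = np.degrees(np.arctan2(r2[1], r2[0])      # angle between the vectors
                   - np.arctan2(r1[1], r1[0]))
print(f'scale = {scale:.4f}, angle = {angle:.2f} deg')
```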

Fig. 4. Correspondence of the magnitudes’ local maxima: 0 -> 0, 1 -> 1

The length and direction of R1 and R2 entirely determine the sought parameters, scale and angle. The transformation using them and its result are shown below.

# Build the rotation-and-scaling matrix from the recovered angle and scale,
# then apply it to the left magnitude image.
mat_rotate = cv2.getRotationMatrix2D(center, angle, scale)
image_left_warp = cv2.warpAffine(image_left, mat_rotate, dsize)
Fig. 5. The transformed magnitude of the figure on the left and the magnitude of the figure on the right

As we can see, after scaling by the factor scale and rotating counter-clockwise by the angle angle, the left magnitude becomes practically indistinguishable from the right one within its non-zero, low-frequency part; there, the differences between the magnitudes are close to zero.

We define the measure of similarity, or distance D, between figures of the same shape as the arithmetic mean of the differences between the compared magnitudes after the aforementioned rotation and scaling.
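Under this definition, D might be computed as below. This is a minimal sketch, assuming the two magnitudes are equally sized arrays already brought into register; the function name is illustrative, and random arrays stand in for real magnitudes:

```python
import numpy as np

def shapes_distance(mag_a, mag_b):
    """Mean absolute difference between two equally sized magnitude arrays."""
    return float(np.mean(np.abs(mag_a - mag_b)))

m = np.random.default_rng(0).random((64, 64))
print(shapes_distance(m, m))            # identical magnitudes: D = 0.0
print(shapes_distance(m, 1.1 * m) > 0)  # any mismatch: True, D > 0
```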

A detailed difference map of the magnitudes is shown below in monochrome format and using a color palette. The greater the difference between the magnitude values, the brighter the corresponding location on the map.

Fig. 6. Difference map of the two magnitudes from Fig. 5 in monochrome and color format

If the distances between a given figure and all figures of a specific class are close to zero, the figure belongs to that class; in other words, it is recognized.

Below is the program for calculating the distance D between two figures of the same shape.

# max2 - https://github.com/boriskravtsov/max2

from pathlib import Path
import time

from max2.src import cfg
from max2.src.utils import init_directory, remove_directory
from max2.src.get_shapes_distance import get_shapes_distance


cfg.image_name = 'loc_left.png'
cfg.templ_name = 'loc_right.png'

cfg.dir_data = 'IMAGES_DATA'
cfg.path_image = str(Path.cwd() / cfg.dir_data / cfg.image_name)
cfg.path_templ = str(Path.cwd() / cfg.dir_data / cfg.templ_name)

begin = time.time()

distance = get_shapes_distance(cfg.path_image, cfg.path_templ)

end = time.time()

print(f'\n{cfg.image_name} & {cfg.templ_name}'
      f'\ndistance = {distance:.5f}'
      f'\ntime = {(end - begin):.1f} sec')
Fig. 7. Output of the max2 program running on PyCharm

Determining the degree of similarity between geometric figures of different shapes

Fig. 8. Figures of different shapes and the magnitudes of their canonical representations

Again, let’s find the local maxima of the magnitudes and indicate three corresponding pairs: 0 -> 0, 10 -> 10, 8 -> 7.

Fig. 9. Correspondence of local maxima between the two compared magnitudes

The affine transformation, which maps three points of one image to their corresponding three points in the other image, is presented below. Next, Fig. 10 shows the result of performing this transformation.

pts_image1 = np.float32([[x1_image1, y1_image1],
                         [x2_image1, y2_image1],
                         [x0, y0]])

pts_image2 = np.float32([[x1_image2, y1_image2],
                         [x2_image2, y2_image2],
                         [x0, y0]])

# The affine map sending the three chosen maxima of the first magnitude
# (including the common center (x0, y0), correspondence 0 -> 0)
# to their counterparts in the second.
mat_affine = cv2.getAffineTransform(pts_image1, pts_image2)

image_warp = cv2.warpAffine(image_1, mat_affine, dsize)
Fig. 10. The transformed magnitude of the figure on the left and the magnitude of the figure on the right

And again, the non-zero parts of the magnitudes practically coincide. The difference map of the magnitudes is provided below.

Fig. 11. Difference map of the two magnitudes from Fig. 10

We define the measure of similarity, or distance D, between figures of different shapes as the arithmetic mean of the differences between the compared magnitudes after the aforementioned affine transformation.

We have named this method max3, or the “method for recognizing geometric figures by three local maxima of their magnitudes”. Correspondingly, we named the previous method, used above when comparing figures of the same shape, max2, or the “method for recognizing geometric figures by two local maxima”.

We have already mentioned the correspondence of the magnitudes’ local maxima twice. But how can this correspondence be found? Which local maxima are preferable? Unfortunately, we have no better solution than trying all possible combinations and selecting the one that leads to the minimum difference between the magnitudes (see Fig. 10). This, of course, significantly impacts the execution speed.
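The exhaustive search might be sketched as follows. The maxima coordinates are made up, a least-squares fit stands in for the OpenCV affine call, and the scoring function is illustrative rather than the author's:

```python
import itertools
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine map coef (3x2) with dst ~ [src | 1] @ coef."""
    X = np.hstack([src, np.ones((len(src), 1))])
    coef, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return coef

# Hypothetical maxima relative to each magnitude's center; the center itself
# is a fixed 0 -> 0 correspondence. Here the second magnitude is simply a
# half-scale copy, so a correct pairing exists.
maxima_1 = np.array([[30.0, 5.0], [-8.0, 25.0], [14.0, -20.0]])
maxima_2 = 0.5 * maxima_1

best_err, best_pair = np.inf, None
for i, j in itertools.permutations(range(3), 2):      # two maxima of image 1
    for k, l in itertools.permutations(range(3), 2):  # paired with two of image 2
        src = np.vstack([[0.0, 0.0], maxima_1[[i, j]]])
        dst = np.vstack([[0.0, 0.0], maxima_2[[k, l]]])
        coef = fit_affine(src, dst)
        # Score the candidate pairing: map ALL maxima of image 1 and measure
        # how close each lands to some maximum of image 2.
        mapped = np.hstack([maxima_1, np.ones((3, 1))]) @ coef
        err = (np.linalg.norm(mapped[:, None, :] - maxima_2[None, :, :], axis=2)
               .min(axis=1).sum())
        if err < best_err:
            best_err, best_pair = err, ((i, j), (k, l))

print(best_pair, f'{best_err:.2e}')
```

Trying all pairings is what makes max3 slow; each candidate requires a full transform and comparison.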

Below is the program for calculating the similarity of the two distorted 4-pointed stars from Fig. 8.

# max3 - https://github.com/boriskravtsov/max3

from pathlib import Path
import time

from max3.src import cfg
from max3.src.utils import init_directory, remove_directory
from max3.src.get_shapes_distance import get_shapes_distance

cfg.image_name = '4star_1.png'
cfg.templ_name = '4star_24.png'

cfg.dir_data = 'IMAGES_DATA'
cfg.path_image = str(Path.cwd() / cfg.dir_data / cfg.image_name)
cfg.path_templ = str(Path.cwd() / cfg.dir_data / cfg.templ_name)

begin = time.time()

distance = get_shapes_distance(cfg.path_image, cfg.path_templ)

end = time.time()

print(f'\n{cfg.image_name} & {cfg.templ_name}'
      f'\ndistance = {distance:.5f}'
      f'\ntime = {(end - begin):.1f} sec')
Fig. 12. Result of the similarity calculation for the two 4-pointed stars of different shapes (note the program’s execution time)

Solution to the “Simple” Task

Using the max3 program above, let’s calculate the distances D between all six figures of the “simple” task:

As we can see, the most similar (with the minimum mutual distance) pairs turned out to be the following: a-e, b-d, and c-f. Let’s represent the obtained solution graphically by connecting similar figures with straight line segments.

Fig. 13. Solution of the “simple” task

Proof of effectiveness

The successful solution to the “simple” task is encouraging, but it does not serve as sufficient proof of the new approach’s effectiveness. Here, any conclusions must be based on thousands of comparisons of different geometric shapes.

Suppose you have two groups of objects of different types — type A (a1, a2, a3) and type B (b1, b2, b3). Let’s create MATCH and MISMATCH lists for subsequent calculation of distances between the objects in the lists:

MATCH:
a1 - a2
a1 - a3
a2 - a3
b1 - b2
b1 - b3
b2 - b3

MISMATCH:
a1 - b1
a1 - b2
a1 - b3
a2 - b1
a2 - b2
a2 - b3
a3 - b1
a3 - b2
a3 - b3

As we can see, the MATCH list consists of pairs of objects of the same type, while the MISMATCH list consists of all pairs of objects of different types. If our similarity measure adequately reflects reality, then the distances between the objects forming the pairs in the MATCH list will, as a rule, be smaller than those between the pairs in the MISMATCH list. This means that the histogram built from the comparisons in the MATCH list will be shifted to the left relative to the histogram built from the MISMATCH list. Let’s verify this!
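The two lists are easy to generate programmatically; a sketch with itertools, including the pair counts for the larger experiment:

```python
import itertools

# Building the MATCH and MISMATCH lists for two object types.
type_a = ['a1', 'a2', 'a3']
type_b = ['b1', 'b2', 'b3']

match = (list(itertools.combinations(type_a, 2))
         + list(itertools.combinations(type_b, 2)))  # same-type pairs
mismatch = list(itertools.product(type_a, type_b))   # cross-type pairs
print(len(match), len(mismatch))  # 6 9, as in the lists above

# With 50 stars of each type: 2 * C(50, 2) = 2450 MATCH pairs
# and 50 * 50 = 2500 MISMATCH pairs.
n = 50
print(2 * (n * (n - 1) // 2), n * n)  # 2450 2500
```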

The data for our computational experiments will be randomly distorted star figures, an example of which is shown below.

At the same time, we will increase the number of stars of each type to 50, which enlarges the MATCH list to 2,450 comparison pairs and, accordingly, the MISMATCH list to 2,500.

Below is the result of the first computational experiment. As the title suggests, four- and five-pointed stars participated here.

And this is the result of the second computational experiment, with the participation of five- and six-pointed stars.

In the figures above, the results of comparing objects of the same type (MATCH list) are shown in blue, and the results of comparing objects of different types (MISMATCH list) are shown in orange. As we can see, the blue histograms are shifted to the left relative to the orange ones, i.e., stars of the same type, when compared, do indeed, as a rule, show smaller distance values. At the same time, the size of the overlap area of the histograms reflects the probability of error in decision-making.

Looking at the histograms, two conclusions can be drawn:

  1. If, as a result of comparing two stars, the obtained value D < 11, then we confidently conclude that stars of the same type were compared. In other words, if we know one of the participants in the comparison, we reliably recognize the other.
  2. Otherwise, if D > 11, our conclusions are unreliable and will be accompanied by a certain degree of uncertainty. The question of how, taking this uncertainty into account, to ensure reliable recognition will be the subject of consideration in the second part of this article.
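The rule in conclusion 1 amounts to a simple threshold test. A sketch, where the threshold value is read off the histograms above and the names are illustrative:

```python
# Decision rule: distances below the threshold indicate a confident
# same-type match; everything else remains uncertain.
THRESHOLD = 11.0

def classify(distance):
    """Verdict for one comparison of two stars."""
    return 'same type' if distance < THRESHOLD else 'uncertain'

print(classify(4.2), classify(17.8))  # same type uncertain
```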

Important note. Since the magnitude reflects the distribution of energy in the image plane, we consider images to be similar if, after simple geometric transformations, they have a similar distribution of energy.

In conclusion

The author is interested in promoting the obtained results and therefore provides the reader with everything necessary for independent verification:

Source codes for max2 and max3 on GitHub.

To obtain magnitude images, local maxima, and difference maps shown in the article, you need to set the flag cfg.debug_mode = True.

Testing procedures test-max2 and test-max3.

To obtain histograms, it is necessary to load data files (figure images) into the _DATA_1 and _DATA_2 folders, and then sequentially execute four console applications contained in the test-max2 or test-max3 directories:

1_create_batch_files.py — automatic creation of lists of comparable files MATCH (match.txt) and MISMATCH (mismatch.txt).

2_calc_max2 or 2_calc_max3 — calculation of the distance D for each pair of images in the match.txt and mismatch.txt files.

3_sort_max2 or 3_sort_max3 — sorting by distance values.

4_histogram_max2 or 4_histogram_max3 — creation of histograms.

Data for those wishing to double-check the solution to the “simple” task can be downloaded here, and the images of distorted stars used in the computational experiments can be found here.

Our article Hand-Drawn Shape Generation is dedicated to the procedure of creating various geometric shapes. This technology makes it possible to create and then compare your own sets of geometric figures.

Written by Boris Kravtsov, PhD

I'm trying to share some of my old thoughts and new perspectives.