Recognition of Geometric Shapes
Part 1
Contents
- Description of a simple task that is difficult to delegate to a computer.
- Two different approaches to image processing.
- Determining the similarity of geometric shapes with different sizes and orientations on the plane.
- Determining the similarity of geometric shapes of different forms.
- Data for computer experiments.
- Two ways to prove the effectiveness of recognition methods.
- Conclusion.
1. Description of a Simple Task that is Difficult to Delegate to a Computer
Try to pair similar objects.
I know you handled it easily, but to create a program that could replace you in this task, we first need to define the concept of “similarity of shapes of different forms.” Any ideas? Note that the size or orientation of shapes on the plane plays no role. Indeed, rotating a square or increasing the size of a square does not change its shape.
2. Two Different Approaches to Image Processing
As known, most image processing operations can be performed either in the spatial domain or in the frequency domain. In the former case, we manipulate individual pixels, while in the latter, we consider the image as a two-dimensional brightness function to which Fourier transformation is applied. Since we will exclusively use the frequency domain later, it is necessary to note the following: standard color images of geometric shapes on a white background will initially be converted to the “canonical form,” i.e., transformed into monochrome, greatly reduced in size, and inverted. While preserving the shape of the figure, we significantly reduce the number of non-zero (informative) pixels in the image. The magnitude of the Fourier spectrum of such a “canonical image” is a smooth function.
The left side of the figure shows a color image of an object, then in the center, the same object is presented in canonical form, and finally, on the right, the low-frequency part of the magnitude (module) of the Fourier spectrum is shown using the COLORMAP_HSV palette.
The main properties of the magnitude of the Fourier spectrum are well-known: it does not change when the object is shifted, rotates parallel to it, and proportionally compresses when it enlarges. Equality of images entails the coincidence of their magnitudes, although the reverse is not always true.
Note. The size of the original color image does not play a special role, but usually, we choose [200 x 200] pixels. This is necessary if AI Tables are used for recognition procedure testing (see below).
3. Determining the Similarity of Geometric Shapes with Different Sizes and Orientations on the Plane
Let’s start by comparing shapes of the same form but different in size and orientation. An example of such shapes is shown below along with their canonical representation.
At the same time, let’s show the magnitudes of their spectra.
Since the shape on the left is smaller than the one on the right and is rotated clockwise relative to it, in complete accordance with the basic properties of the spectrum, its magnitude is proportionally stretched and rotated by the same angle.
Let’s establish the exact numerical values of this transformation, i.e., the scaling factor (scale) and the rotation angle (angle). To do this, find the coordinates of several first local maxima of magnitudes (which is not difficult, given the smoothness of these functions) and draw vectors R1 and R2 going from the center to the corresponding maxima.
The length and direction of these vectors entirely determine the sought parameters. Their use and the result of the transformation are shown below.
mat_rotate = cv2.getRotationMatrix2D(center, angle, scale)
image_left_warp = cv2.warpAffine(image_left, mat_rotate, dsize)
As we can see, now the magnitudes completely coincide in their central part. In other words, in the center, we eliminated any differences caused by the sizes and orientations of the shapes, and any additional differences in this area will be a consequence of the inequality of their geometric forms.
For a measure of similarity between two shapes of different forms, we propose using the difference between their magnitudes after the rotation and scaling proposed above, followed by normalization to the [0.0–1.0] interval.
Note also that if the similarity values between a geometric shape and all shapes of a certain class are close to 1, we talk about the shape belonging to this class or its recognition.
The presented method of similarity calculation was named pnt2, or the recognition method by two points. By installing pnt2 from PyPI in the standard way — pip install pnt2, you are ready to use it:
"""
HOW TO USE:
"""
from pathlib import Path
from pnt2 import get_similarity
path_1 = str(Path.cwd() / ‘first_image.png')
path_2 = str(Path.cwd() / ’second_image.png')
similarity = get_similarity(path_1, path_2)
print(f’Similarity = {round(similarity, 7)}’)
4. Determining the Similarity of Geometric Shapes of Different Forms
Let’s take examples of such shapes from our drawing at the beginning of the article.
And here is their representation in the frequency domain:
As before, let’s find the local maxima of magnitudes and indicate three of them corresponding to each other: 3 — 8, 8 — 6, and 0 — 0.
The affine transformation that translates three points of one image into the corresponding three points of another image is presented below. The result of performing this transformation is then shown.
pts_image1 = np.float32([[x1_image1, y1_image1],
[x2_image1, y2_image1],
[x0, x0]])
pts_image2 = np.float32([[x1_image2, y1_image2],
[x2_image2, y2_image2],
[x0, x0]])
mat_affine = cv.getAffineTransform(pts_image1, pts_image2)
image_warp = cv.warpAffine(image_1, mat_affine, dsize)
And again, the central parts practically coincide.
As another measure of similarity between two shapes of different forms, we propose using the difference between their magnitudes after an affine transformation based on three corresponding local maxima.
We named this method of similarity calculation pnt3 or the three-point recognition method.
"""
HOW TO USE:
"""
from pathlib import Path
from pnt3 import get_similarity
path_1 = str(Path.cwd() / ‘first_image.png')
path_2 = str(Path.cwd() / 'second_image.png')
similarity = get_similarity(path_1, path_2)
print(f’Similarity = {round(similarity, 7)}’)
Notes
a. The sources in Python of pnt2 and pnt3, data files, and examples of program operation can be downloaded from our website.
b. In debug mode cfg.debug_mode == True, both packages illustrate all stages of obtaining the result. A small part of these illustrations is used in this article.
c. There are ten different definitions of similarity between plane geometric figures. What sense is there in extending this long list? Below, we will show how the presented technologies allow solving tasks that could not be solved before.
5. Data for Computer Experiments
The first type of data (images of playing card suits) was created manually on an iPad using the INKredible drawing program, then the obtained drawings were scaled to the size [200 x 200] pixels.
The second type of data (images of stars) was generated by the Hand-Drawn Shape Simulation package automatically. This allows us to create practically an unlimited number of randomly distorted geometric figures and make our conclusions statistically justified.
6. Two Ways to Prove the Effectiveness of Recognition Methods
a. Sorting AI Tables
In the left table, data of the first type is arranged chaotically. It is necessary to change their position so that all similar figures are in a common column. The right table shows an example of one possible solution.
Note that any table sorting program is based on calculating the degree of similarity between its elements. Thus, success in sorting testifies to the effectiveness of the applied procedure for recognizing geometric figures.
A detailed discussion of this problem and its successful solution for a wide range of data can be found in our previous article — Amazing AI Tables. However, installing the new version of this program, install shapes-recognition==3.2.0, provides an important advantage: now you can choose which of the two recognition methods — pnt2 or pnt3 — to apply in each specific case.
import shapes_recognition as sr
sr.init()
"""
2 - pnt2, 3 - pnt3, others - random
"""
sr.set_calc_similarity_method(2)
# S E L F - S T U D Y
# ---------------------------------------------
# Get a list of shapes for self-study
list_self_study_in = sr.get_self_study_data('DATA_SELF_STUDY')
# Self-study
list_self_study_out = sr.self_study(list_self_study_in)
# Show self-study result
sr.show_self_study_results(list_self_study_in, list_self_study_out)
# ---------------------------------------------
# R E C O G N I T I O N
# ---------------------------------------------
# Get a list of shapes for recognition
list_recognition = sr.get_recognition_data('DATA_RECOGNITION')
# Recognition
recogn_dictionary = \
sr.recognition(list_self_study_out, list_recognition)
# Show recognition result
sr.show_recognition_results(recogn_dictionary)
# ---------------------------------------------
In addition, the program has become faster and even based on pnt2 successfully sorts images that differ significantly in shape (see the example with data of the second type below).
b. Construction and Analysis of Histograms
The best way to evaluate the effectiveness of the recognition procedure and make an unbiased comparison of different recognition methods is to analyze the results by constructing histograms. That’s what we will do now.
As a first example, let’s consider the black suits of playing cards, already used above: clubs and spades. The match list in its last column contains values of the degree of similarity between images of the same class (clubs compared with clubs, and spades with spades).
list "match", method "pnt2"
1 clubs_1.png clubs_2.png 0.56070001
2 clubs_1.png clubs_3.png 0.55136621
3 clubs_1.png clubs_4.png 0.54751737
4 clubs_1.png clubs_5.png 0.55803732
5 clubs_1.png clubs_6.png 0.59316292
6 clubs_2.png clubs_3.png 0.55408618
7 clubs_2.png clubs_4.png 0.54963147
8 clubs_2.png clubs_5.png 0.54132218
9 clubs_2.png clubs_6.png 0.55478911
10 clubs_3.png clubs_4.png 0.60338862
11 clubs_3.png clubs_5.png 0.56295087
12 clubs_3.png clubs_6.png 0.55916441
13 clubs_4.png clubs_5.png 0.56276984
14 clubs_4.png clubs_6.png 0.57363807
15 clubs_5.png clubs_6.png 0.57309041
16 spades_1.png spades_2.png 0.58182963
17 spades_1.png spades_3.png 0.52718553
18 spades_1.png spades_4.png 0.57701902
19 spades_1.png spades_5.png 0.59346156
20 spades_1.png spades_6.png 0.61351867
21 spades_2.png spades_3.png 0.52777798
22 spades_2.png spades_4.png 0.58018384
23 spades_2.png spades_5.png 0.59047866
24 spades_2.png spades_6.png 0.56184697
25 spades_3.png spades_4.png 0.52817713
26 spades_3.png spades_5.png 0.53300819
27 spades_3.png spades_6.png 0.52446784
28 spades_4.png spades_5.png 0.56555371
29 spades_4.png spades_6.png 0.59190648
30 spades_5.png spades_6.png 0.57739733
The mismatch list provides values of similarity between images of different classes (all possible comparisons of clubs with spades).
list "mismatch", method "pnt2"
1 clubs_1.png spades_1.png 0.54192863
2 clubs_1.png spades_2.png 0.53416434
3 clubs_1.png spades_3.png 0.53119831
4 clubs_1.png spades_4.png 0.5379787
5 clubs_1.png spades_5.png 0.5163439
6 clubs_1.png spades_6.png 0.53674907
7 clubs_2.png spades_1.png 0.53659498
8 clubs_2.png spades_2.png 0.54463789
9 clubs_2.png spades_3.png 0.52812881
10 clubs_2.png spades_4.png 0.53527197
11 clubs_2.png spades_5.png 0.51754406
12 clubs_2.png spades_6.png 0.51932797
13 clubs_3.png spades_1.png 0.53340233
14 clubs_3.png spades_2.png 0.51930335
15 clubs_3.png spades_3.png 0.50842647
16 clubs_3.png spades_4.png 0.53139809
17 clubs_3.png spades_5.png 0.50607686
18 clubs_3.png spades_6.png 0.52450555
19 clubs_4.png spades_1.png 0.5169884
20 clubs_4.png spades_2.png 0.52087741
21 clubs_4.png spades_3.png 0.504485
22 clubs_4.png spades_4.png 0.52916215
23 clubs_4.png spades_5.png 0.49581141
24 clubs_4.png spades_6.png 0.51879692
25 clubs_5.png spades_1.png 0.5361981
26 clubs_5.png spades_2.png 0.51875466
27 clubs_5.png spades_3.png 0.51964995
28 clubs_5.png spades_4.png 0.53019745
29 clubs_5.png spades_5.png 0.50247776
30 clubs_5.png spades_6.png 0.52523978
31 clubs_6.png spades_1.png 0.54236326
32 clubs_6.png spades_2.png 0.53349987
33 clubs_6.png spades_3.png 0.53377427
34 clubs_6.png spades_4.png 0.54604526
35 clubs_6.png spades_5.png 0.51578293
36 clubs_6.png spades_6.png 0.53814251
If the procedure for calculating similarity adequately reflects the situation, then values of similarity in the match list will generally be greater than those in the mismatch list. Therefore, the histogram built on data from the first list should be shifted to the right relative to the one built based on the second list.
The figure titled “pnt2: Clubs vs. Spades” is a graphical representation of the lists above. The histogram offset is evident, proving the usefulness of the pnt2 method for recognizing figures of different shapes. However, if we apply the pnt3 method to the same data (figure on the right), the histogram shift becomes even more pronounced, indicating the advantage of pnt3 over pnt2.
How does changing the dataset affect this? Below are the results of processing red suits — diamonds and hearts. As we can see, the histogram shift is even more pronounced, but this is a result of changing the data, not the method. Here we can assert that distinguishing between red suits is easier than distinguishing between black ones.
The figures below are obtained from the second type of data — distorted stars. In this case, 50 samples of each class were processed, increasing the match list to 2,450 comparisons and the mismatch list to 2,500. As before, it’s evident that the pnt3 method has an advantage. Also, it’s easier to distinguish three-pointed stars from four-pointed ones than five-pointed stars from six-pointed ones. In the former case, the area of overlap is smaller. Interestingly, like computers, humans also find it much easier to distinguish stars with fewer points and the simple shape of diamonds from all other playing card suits.
7. Conclusion
Some neurobiologists and vision physiologists, starting from Campbell and Robson in 1968, Pollen and Lee in 1971, Maffei and Fiorentiny in 1973, and Glezer et al. in 1989, proposed considering the visual cortex of the brain as a frequency analyzer. In other words, they hypothesized that the processing of visual information occurs in the frequency domain. In the articles listed below, we consistently confirm this point of view:
a. New High-Quality Edge Detector — Feb 2, 2022
One of the most notable achievements in experimental brain research (Nobel Prize in 1981) is the discovery by D. Hubel and T. Wiesel of orientation-selective neurons. However, the role of these neurons remained unclear until the end. By shifting the focus to the frequency domain, we clarified the purpose of orientation-selective neurons and based on that, created perhaps the best existing method for edge detection — the vc-filter.
b. Image Processing in the Visual Cortex — Feb 18, 2022
The transition to the frequency domain made it possible to explain well-known optical illusions (Müller-Lyer illusion, Vertical-horizontal illusion).
c. This article is no exception. It shows how basic pattern recognition tasks are easily solved in the frequency domain. Since the magnitude reflects the distribution of energy in the image plane, we consider images “similar” if they have a similar energy distribution after simple transformations.
The importance of the mentioned works lies in the construction of mathematical models of intelligence, initiated by observations and experiments. We strictly adhere to a system that has proven its fruitfulness in natural sciences and consider the visual cortex as another natural phenomenon alongside gravity and electromagnetism.