Handwritten Digit Recognition (MNIST Dataset) without Using Neural Networks
Introduction
Neural networks are highly effective at recognizing handwritten digits, with the best published models reaching error rates well below 1% on the MNIST dataset. This accuracy, however, comes only after extensive tuning of numerous internal parameters, a long process known as training. In addition, a trained network is highly specialized. Trained to recognize digits, it cannot be applied directly to anything else, such as letters or traffic signs.
Human intelligence is not burdened by these limitations. Consider a simple illustration: suppose we take four playing cards, one from each suit, and show them to a child. We then ask the child to sort the rest of the deck by suit. The child will manage easily, even when seeing playing cards for the first time and never having done anything like this before.
The innate ability of humans to correctly assess the similarity between objects of different shapes allows them to recognize these objects without any preliminary action.
Now, let’s simulate this scenario and imagine a recognition system with a built-in function for calculating the similarity of two images:
similarity = get_similarity(path_image1, path_image2)
If we give such a system representatives (samples) of different classes, like the four cards of different suits, it immediately gains the ability to recognize. It calculates the similarity of each input object to every representative and assigns the object to the class whose representative has the highest similarity value. The operation of such a system is indistinguishable from the behavior of a child with cards.
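The rule described above, assigning each object to the class of its most similar representative, can be sketched in a few lines. This is only an illustration: `toy_similarity` is a hypothetical stand-in (negative squared pixel distance) and is not the similarity function used by mnist-separator, and images are represented here as plain lists of numbers rather than file paths.

```python
from typing import Callable, Dict, Sequence

# Stand-in for an image: a flat sequence of pixel intensities (assumption).
Image = Sequence[float]


def toy_similarity(a: Image, b: Image) -> float:
    """Hypothetical similarity: higher means more alike.

    Uses the negative squared distance between pixel values; the real
    get_similarity() works differently and is not reproduced here.
    """
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def classify(
    obj: Image,
    representatives: Dict[str, Image],
    similarity: Callable[[Image, Image], float] = toy_similarity,
) -> str:
    """Return the label of the representative most similar to obj."""
    return max(
        representatives,
        key=lambda label: similarity(obj, representatives[label]),
    )


# One representative per class, like the four sample cards (two shown here).
representatives = {
    "hearts": [1.0, 0.0, 0.0],
    "spades": [0.0, 0.0, 1.0],
}

print(classify([0.9, 0.1, 0.0], representatives))
```

Because the similarity function is passed in as a parameter, the same `classify` routine would work unchanged with any other measure that returns larger values for more similar images.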
The New Method
The core of our approach is a function that calculates the similarity between two images, which allows the system to classify digits by comparing them against a set of predefined samples. Below is a screenshot of PyCharm with a program that compares two MNIST images and displays the result in the Terminal window.
Line 4:
here it is assumed that the mnist-separator package from PyPI
is already installed (pip install mnist-separator)
Line 6:
the path to the MNIST_DATASET_100 database, which will be
discussed below (you will have a different path)
Lines 8 and 9:
the filenames of the compared images 0_6.png and 1_7.png
Lines 11 and 13:
calculating the degree of similarity of two images and
printing the result
See our previous article “Recognition of Geometric Shapes”, which discusses the principles underlying the construction of the similarity function.
MNIST_DATASET_100
The well-known MNIST dataset consists of two unequal parts. The first part, data_train, contains 60,000 images of handwritten digits in total and is used for training neural networks, while the second part, data_test, contains 10,000 images in total for testing recognition. For our purposes, the full database is more than we need, so we will use only a small part of it, named MNIST_DATASET_100, in which each digit is represented by the first 100 instances from the original database. You can download MNIST_DATASET_100 here.
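To make the layout of this reduced database concrete, here is a small sketch that lists the file paths it would contain. It assumes the naming pattern seen in the examples later in this article (`<part>/<digit>/<digit>_<index>.png`), that indices run from 0 to 99, and that both parts follow the same pattern. The helper `subset_paths` is hypothetical and is not part of mnist-separator.

```python
from pathlib import PurePosixPath
from typing import List

# Root folder name as used in this article.
ROOT = PurePosixPath("MNIST_DATASET_100")


def subset_paths(part: str, per_digit: int = 100) -> List[PurePosixPath]:
    """List the expected image paths for one part of the subset.

    part      -- "data_train" or "data_test"
    per_digit -- number of images kept per digit (assumed to be 100)
    """
    return [
        ROOT / part / str(digit) / f"{digit}_{index}.png"
        for digit in range(10)
        for index in range(per_digit)
    ]


train = subset_paths("data_train")
print(len(train))   # 10 digits x 100 images each
print(train[0])     # first image of digit 0
```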
Program mnist-separator
Let’s make the task more challenging for our child by replacing playing cards with cards that have handwritten numbers on them. First, we will show him just two sample cards, one with a ‘0’ and the other with a ‘1’. By doing this, we introduce the child to two new concepts: ‘zero’ and ‘one.’ Next, we will give the child a deck of 20 cards with other handwritten zeros and ones (10 of each) and ask him to sort them — placing the cards that resemble the ‘0’ sample on the left and those that resemble the ‘1’ sample on the right.
Below is a screenshot of a program that implements this scenario. The Terminal window shows the result of its execution, indicating that the computation time was 44 seconds.
Line 9:
Initialization of working directories
Lines 11 and 12:
"Loading samples": samples of zero and one are copied from
MNIST_DATASET_100/data_train into two internal directories
of the program: $mnist/train_0 and $mnist/train_1
MNIST_DATASET_100/data_train/0/0_0.png -> $mnist/train_0/0_0.png
MNIST_DATASET_100/data_train/1/1_0.png -> $mnist/train_1/1_0.png
Lines 14 and 15:
"Loading data for recognition": 10 images of zero (0) and 10
images of one (1) are copied from MNIST_DATASET_100/data_test
into the common directory $mnist/test
MNIST_DATASET_100/data_test/0/0_0.png -> $mnist/test/0_0.png
MNIST_DATASET_100/data_test/0/0_1.png -> $mnist/test/0_1.png
MNIST_DATASET_100/data_test/0/0_2.png -> $mnist/test/0_2.png
MNIST_DATASET_100/data_test/0/0_3.png -> $mnist/test/0_3.png
MNIST_DATASET_100/data_test/0/0_4.png -> $mnist/test/0_4.png
MNIST_DATASET_100/data_test/0/0_5.png -> $mnist/test/0_5.png
MNIST_DATASET_100/data_test/0/0_6.png -> $mnist/test/0_6.png
MNIST_DATASET_100/data_test/0/0_7.png -> $mnist/test/0_7.png
MNIST_DATASET_100/data_test/0/0_8.png -> $mnist/test/0_8.png
MNIST_DATASET_100/data_test/0/0_9.png -> $mnist/test/0_9.png
MNIST_DATASET_100/data_test/1/1_0.png -> $mnist/test/1_0.png
MNIST_DATASET_100/data_test/1/1_1.png -> $mnist/test/1_1.png
MNIST_DATASET_100/data_test/1/1_2.png -> $mnist/test/1_2.png
MNIST_DATASET_100/data_test/1/1_3.png -> $mnist/test/1_3.png
MNIST_DATASET_100/data_test/1/1_4.png -> $mnist/test/1_4.png
MNIST_DATASET_100/data_test/1/1_5.png -> $mnist/test/1_5.png
MNIST_DATASET_100/data_test/1/1_6.png -> $mnist/test/1_6.png
MNIST_DATASET_100/data_test/1/1_7.png -> $mnist/test/1_7.png
MNIST_DATASET_100/data_test/1/1_8.png -> $mnist/test/1_8.png
MNIST_DATASET_100/data_test/1/1_9.png -> $mnist/test/1_9.png
Line 17:
The go() function - recognition or separation of handwritten
digits
The go() function sequentially calculates the similarity of each of the 20 images in the $mnist/test directory with the “zero sample” $mnist/train_0/0_0.png and the “one sample” $mnist/train_1/1_0.png. If the similarity value of the current image with the zero sample is greater, then the name of the current image is saved in the file list_result_0.txt. Otherwise, it is saved in list_result_1.txt.
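The decision rule just described can be sketched as follows. This is not the source of go(): the function `separate` and the table of scores are hypothetical, the similarity values are invented purely for the demonstration, and the results are returned as lists keyed by the output file names instead of being written to disk.

```python
from typing import Callable, Dict, Iterable, List

ZERO_SAMPLE = "$mnist/train_0/0_0.png"
ONE_SAMPLE = "$mnist/train_1/1_0.png"


def separate(
    test_images: Iterable[str],
    similarity: Callable[[str, str], float],
) -> Dict[str, List[str]]:
    """Split test images into two lists by comparing them with both samples."""
    results: Dict[str, List[str]] = {
        "list_result_0.txt": [],
        "list_result_1.txt": [],
    }
    for image in test_images:
        to_zero = similarity(image, ZERO_SAMPLE)
        to_one = similarity(image, ONE_SAMPLE)
        # Strictly greater similarity to the zero sample -> list 0,
        # otherwise (including ties) -> list 1.
        target = "list_result_0.txt" if to_zero > to_one else "list_result_1.txt"
        results[target].append(image)
    return results


# Invented similarity values, used only to exercise the rule.
fake_scores = {
    ("$mnist/test/0_0.png", ZERO_SAMPLE): 0.9,
    ("$mnist/test/0_0.png", ONE_SAMPLE): 0.2,
    ("$mnist/test/1_0.png", ZERO_SAMPLE): 0.3,
    ("$mnist/test/1_0.png", ONE_SAMPLE): 0.8,
}


def fake_similarity(image: str, sample: str) -> float:
    return fake_scores[(image, sample)]


result = separate(["$mnist/test/0_0.png", "$mnist/test/1_0.png"], fake_similarity)
print(result)
```

With a single sample per class, one unlucky pair of similarity values is enough to send an image to the wrong list, which is why the choice and number of samples matters.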
As we can see, there is an error in the obtained results: image `1_7.png` was classified as a zero. You can verify that this error disappears if we slightly increase the number of samples by changing the sample-loading call as follows:
ms.load_samples(dir_mnist, 1, 1) -> ms.load_samples(dir_mnist, 1, 3)
Another Example
We started with the first digits, 0 and 1. Now let’s work with the last three — 7, 8, and 9.
Lines 11, 12, and 13:
"Loading samples": loading 50 samples of seven, 50 samples of
eight, and 50 samples of nine into three internal directories
of the program: $mnist/train_7, $mnist/train_8, $mnist/train_9
Lines 15, 16, and 17:
"Loading data for recognition": 15 images each of seven,
eight, and nine are copied into the common directory
$mnist/test.
Line 19:
The go() function - recognition or separation of handwritten
digits
In the screenshot of the result, we manually added red marks to indicate errors. The calculation time was 2 hours, 25 minutes, and 13 seconds (iMac 2019, 3 GHz Intel Core i5, 8GB DDR4).
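The two reported run times can be put on a common scale with a little arithmetic. The sketch below assumes that every test image is compared once with every loaded sample and that the run time is dominated by these comparisons. Neither assumption is confirmed by the program's source, so the figures are rough estimates.

```python
def to_seconds(hours: int = 0, minutes: int = 0, seconds: int = 0) -> int:
    """Convert a duration to whole seconds."""
    return 3600 * hours + 60 * minutes + seconds


def seconds_per_comparison(test_images: int, samples: int, elapsed: int) -> float:
    """Average time per image pair, assuming test_images x samples comparisons."""
    return elapsed / (test_images * samples)


# Experiment 1: 20 test images, 2 samples (one zero, one one), 44 seconds.
first = seconds_per_comparison(20, 2, to_seconds(seconds=44))

# Experiment 2: 45 test images, 150 samples (50 each of 7, 8, 9), 2 h 25 min 13 s.
second = seconds_per_comparison(
    45, 150, to_seconds(hours=2, minutes=25, seconds=13)
)

print(f"0 vs 1:  {first:.2f} s per comparison")
print(f"7, 8, 9: {second:.2f} s per comparison")
```

Under these assumptions both experiments cost roughly one second per image pair, so the total run time can be estimated in advance as the number of test images multiplied by the number of samples.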
We hope that now you will be able to independently plan and conduct your own computational experiments with mnist-separator.
Conclusion
Neural networks are no longer the only way to recognize handwritten digits. An alternative has appeared, and this alternative has a number of important advantages:
- No training phase is required.
- Simplicity. Neural network frameworks (TensorFlow, Keras, PyTorch) require considerably more code and computational resources than mnist-separator. Download the source code.
- Universality. In the article “Amazing AI Tables”, we demonstrated that, in addition to MNIST images, the new technology is capable of recognizing images of other types (see below).