One-Shot Learning to Classify Omniglot Data


Overview

This is a One-Shot Learning Handwritten Character Classifier written in Python using the SciPy library. The code trains against a few examples of handwritten characters and then tries to classify new characters correctly. The error rate is around 38%. If you want to try a state-of-the-art, better-than-human, one-shot learning library that you can apply to all sorts of data, check out this repo.

One-Shot Learning

One-shot learning refers to the process of learning features from datasets that contain only a few training samples. This is in contrast to machine learning techniques like convolutional neural networks (CNNs), which require large datasets in order to perform the learning task effectively.

Humans have the ability to learn categories from very few examples and can match features and perform pattern recognition well despite having little data available. If I were to provide you with a reference card of an unfamiliar alphabet and show you a series of handwritten variants of its characters, chances are you'd be able to match a high percentage of this test set with the reference alphabet, perhaps misclassifying a few exotic variations. The key motivation for the one-shot learning technique is that systems, like humans, can use prior knowledge about object categories to classify new objects.

There was a recent paper [http://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.pdf] called Human-level concept learning through probabilistic program induction. The authors used something called Bayesian Program Learning, or BPL, to implement one-shot learning. "Bayesian" refers to Bayes' theorem, and the theory of probability it belongs to is what the approach loosely revolves around: the idea is to use simple stochastic programs, that is, probability-based programs, to represent concepts. So, given the knowledge of how a known letter is written, BPL can calculate the chance that a novel letter is the same or not. BPL builds these "simple stochastic programs compositionally from parts, subparts, and spatial relations". All of these exist in a hierarchy of knowledge which the machine has gained through little experience. The authors trained it on a dataset of handwritten characters, and it was able to recognize characters with a better error rate than deep learning, or even humans. OK, so does that mean BPL is the way to go? Well, it does have its flaws. It lacks explicit knowledge of certain things like parallel lines, symmetry, and the connections between the ends of strokes and other strokes. And the learning isn't really transferable to other tasks, so it's not better than deep learning in every way.

A few months later, though, DeepMind challenged the paper by releasing their own paper called One-shot Learning with Memory-Augmented Neural Networks. Their basic idea was that deep learning is very data intensive, but perhaps there's a way to build a deep neural net that only needs a few examples to learn: deep learning without the huge datasets. So they built what's called a Memory-Augmented Neural Network, or MANN. A MANN has two parts: a controller, which is either a feedforward neural net or an LSTM, and an external memory module. The controller interacts with the external memory module through a number of read/write heads. It's capable of long-term storage via slow updates of its weights and short-term storage via the external memory module. They fed it a few examples of handwritten characters and continuously trained it thousands of times on just those examples. It performed meta-learning, which means it learned to learn. And guess what? It outperformed humans as well! So they showed that one-shot learning can also be accomplished with a neural network architecture.

In this example, we will build a one-shot handwritten character classifier in Python using the SciPy library. First we need to import our dependencies. We'll use three libraries: os, numpy, and scipy.

Dependencies

Use pip to install any missing dependencies
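For example, both third-party dependencies can be installed with pip. Note that this demo was written against an older SciPy release, since scipy.ndimage.imread (used below) was removed in SciPy 1.2:

pip install numpy "scipy<1.2"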

Basic Usage

Step 1 - Run the code! It'll train against the handwritten character samples in the all_runs folder and then test its classification ability. It should output an average error rate of around 38%.

python one-shot-classification-demo.py

Once we have those, we can define two variables: the number of runs we want to complete and a reference variable for where our class labels are stored. Then, in our main block, we create an array of the size of runs, which is 20. We'll use this array to store all of our classification error rates, one per run. Then we write a for loop to evaluate our algorithm 20 times. For each run, we call a classification function which attempts to classify a small sample set of images, and we store the error rate in the array. We print each run's error rate to the terminal, and when we are done with all of our runs, we take the mean error rate from the array and print it as the last statement in the terminal.

In [1]:
#! Python3
# one-shot-classification-demo.py

import os
import numpy as np
# NOTE: scipy.ndimage.imread was removed in SciPy 1.2; on newer SciPy,
# imageio.imread offers a near drop-in replacement
from scipy.ndimage import imread
from scipy.spatial.distance import cdist


# Parameters
nrun = 20  # Number of classification runs
path_to_all_runs = os.path.join('all_runs')
fname_label = 'class_labels.txt'  # Where class labels are stored for each run


def classification_run(folder, f_load, f_cost, ftype='cost'):
    # Compute error rate for one run of one-shot classification
    #
    # Input
    #  folder : contains images for a run of one-shot classification
    #  f_load : itemA = f_load('file.png') should read in the image file and
    #           process it
    #  f_cost : f_cost(itemA,itemB) should compute similarity between two
    #           images, using output of f_load
    #  ftype  : 'cost' if small values from f_cost mean more similar,
    #           or 'score' if large values are more similar
    #
    # Output
    #  perror : percent errors (0 to 100% error)
    #
    assert ftype in {'cost', 'score'}

    with open(os.path.join(path_to_all_runs, folder, fname_label)) as f:
        pairs = [line.split() for line in f.readlines()]
    # Unzip the pairs into two sets of tuples
    test_files, train_files = zip(*pairs)

    answers_files = list(train_files)  # Copy the training file list
    test_files = sorted(test_files)
    train_files = sorted(train_files)
    n_train = len(train_files)
    n_test = len(test_files)

    # Load the images (and, if needed, extract features)
    train_items = [f_load(os.path.join(path_to_all_runs, f))
                   for f in train_files]
    test_items = [f_load(os.path.join(path_to_all_runs, f))
                  for f in test_files]

    # Compute cost matrix
    costM = np.zeros((n_test, n_train))
    for i, test_i in enumerate(test_items):
        for j, train_j in enumerate(train_items):
            costM[i, j] = f_cost(test_i, train_j)
    if ftype == 'cost':
        y_hats = np.argmin(costM, axis=1)
    elif ftype == 'score':
        y_hats = np.argmax(costM, axis=1)
    else:
        # This should never be reached due to the assert above
        raise ValueError('Unexpected ftype: {}'.format(ftype))

    # compute the error rate by counting the number of correct predictions
    correct = len([1 for y_hat, answer in zip(y_hats, answers_files)
                   if train_files[y_hat] == answer])
    pcorrect = correct / float(n_test)  # float() ensures true division under Python 2.x
    perror = 1.0 - pcorrect
    return perror * 100


def modified_hausdorf_distance(itemA, itemB):
    # Modified Hausdorff Distance
    #
    # Input
    #  itemA : [n x 2] coordinates of black pixels
    #  itemB : [m x 2] coordinates of black pixels
    #
    #  M.-P. Dubuisson, A. K. Jain (1994). A modified hausdorff distance for object matching.
    #  International Conference on Pattern Recognition, pp. 566-568.
    #
    D = cdist(itemA, itemB)
    mindist_A = D.min(axis=1)
    mindist_B = D.min(axis=0)
    mean_A = np.mean(mindist_A)
    mean_B = np.mean(mindist_B)
    return max(mean_A, mean_B)


def load_img_as_points(filename):
    # Load image file and return coordinates of black pixels in the binary image
    #
    # Input
    #  filename : string, absolute path to image
    #
    # Output:
    #  D : [n x 2] rows are coordinates
    #
    I = imread(filename, flatten=True)
    # Convert to a boolean array and invert so that inked pixels are True
    # (np.bool was removed in NumPy 1.24; the built-in bool works in its place)
    I = ~np.array(I, dtype=bool)
    # Create a new array of all the non-zero element coordinates
    D = np.array(I.nonzero()).T
    return D - D.mean(axis=0)

So how does this classification step work? Well, before we answer that, we need to understand two functions: load_img_as_points and modified_hausdorf_distance. The load_img_as_points function loads an image file, in our case a character image. It converts the image to a boolean array and finds all the non-zero values, that is, all of the inked pixels. It then creates an array of the coordinates of those pixels and returns it, centered on the mean coordinate. The modified_hausdorf_distance function is a metric that computes the similarity between two images by comparing their pixel coordinate arrays, which come from load_img_as_points: for each image it takes the mean distance from its points to the nearest points of the other image, and returns the larger of the two means. The last parameter of the classification function, 'cost', just notifies the function that small values from modified_hausdorf_distance mean more similar.
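Here's a minimal sketch of those two steps on toy data (the arrays below are made up for illustration and are not from the Omniglot set), mirroring what load_img_as_points and modified_hausdorf_distance do above:

import numpy as np
from scipy.spatial.distance import cdist

# A tiny 2x3 binary "image": True = white background, False = inked pixel
img = np.array([[True, False, True],
                [True, True, False]])
pts = np.array((~img).nonzero()).T   # coordinates of the inked pixels
pts = pts - pts.mean(axis=0)         # centre them, as load_img_as_points does

# Modified Hausdorff Distance between two made-up point sets
A = np.array([[0., 0.], [1., 0.]])
B = np.array([[0., 1.], [1., 1.], [2., 1.]])
D = cdist(A, B)                 # pairwise Euclidean distances, shape (2, 3)
mean_A = D.min(axis=1).mean()   # mean distance from each point of A to its nearest in B
mean_B = D.min(axis=0).mean()   # mean distance from each point of B to its nearest in A
print(max(mean_A, mean_B))      # the MHD, roughly 1.14 for these points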

Lastly, in the main classification function, we retrieve both our training and testing images and load their image point matrices into memory. Then we compute the cost matrix using the modified Hausdorff distance. After that, we compute the error rate and return it. That's all! We do this for every run and then average the results to get the overall error rate.

In [2]:
# Running this demo should lead to a result of 38.8% average error rate.
#
#   M.-P. Dubuisson, A. K. Jain (1994). A modified hausdorff distance for object matching.
#     International Conference on Pattern Recognition, pp. 566-568.
#
# ** Models should be trained on images in 'images_background' directory to
#    avoid using images and alphabets used in the one-shot evaluation **
#
print('One-shot classification demo with Modified Hausdorff Distance')
perror = np.zeros(nrun)
for r in range(nrun):
    perror[r] = classification_run('run{:02d}'.format(r + 1),
                                   load_img_as_points,
                                   modified_hausdorf_distance,
                                   'cost')
    print(' run {:02d} (error {:.1f}%)'.format(r, perror[r]))
total = np.mean(perror)
print('Average error {:.1f}%'.format(total))
One-shot classification demo with Modified Hausdorff Distance
 run 00 (error 45.0%)
 run 01 (error 35.0%)
 run 02 (error 40.0%)
 run 03 (error 25.0%)
 run 04 (error 30.0%)
 run 05 (error 15.0%)
 run 06 (error 60.0%)
 run 07 (error 35.0%)
 run 08 (error 40.0%)
 run 09 (error 55.0%)
 run 10 (error 15.0%)
 run 11 (error 70.0%)
 run 12 (error 65.0%)
 run 13 (error 35.0%)
 run 14 (error 15.0%)
 run 15 (error 25.0%)
 run 16 (error 30.0%)
 run 17 (error 40.0%)
 run 18 (error 70.0%)
 run 19 (error 30.0%)
Average error 38.8%

We can see that when we run this code, it returns an average error rate of around 38%. That isn't state-of-the-art like DeepMind's MANN or BPL, but it does make for a good baseline demo of one-shot learning.


You can download the complete source code from here.


References

Credit for this demo code goes to the authors of the original BPL paper; this was the baseline demo they used to compare their novel (much better) MATLAB results against.

Omniglot data set for one-shot learning

This dataset contains 1623 different handwritten characters from 50 different alphabets.
Each of the 1623 characters was drawn online via Amazon's Mechanical Turk by 20 different people.
The Omniglot data set contains 50 alphabets total. These are split into a background set of 30 alphabets and an evaluation set of 20 alphabets.

The data is stored in the all_runs folder, which contains the testing set, the training set, and the class labels in each of the 20 run subfolders.
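As a hypothetical illustration (the exact filenames vary from run to run), each class_labels.txt pairs a test image with its matching training image, one pair per line; the first column is what classification_run unzips into test_files and the second into train_files:

run01/test/item01.png run01/training/class03.png
run01/test/item02.png run01/training/class17.png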

Lake, B. M., Salakhutdinov, R., and Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332-1338.