The scikit-learn library provides a great implementation of the Isomap algorithm and a dataset of handwritten digits. In this post we'll see how to load the dataset and how to compute an embedding of it in a two-dimensional space.
Let's load the dataset and show some samples:
    from pylab import scatter, text, show, cm, figure
    from pylab import subplot, imshow, NullLocator
    from sklearn import manifold, datasets

    # load the digits dataset
    # 901 samples, about 180 samples per class
    # the digits represented are 0, 1, 2, 3, 4
    digits = datasets.load_digits(n_class=5)
    X = digits.data
    color = digits.target

    # show some digits
    figure(1)
    for i in range(36):
        ax = subplot(6, 6, i + 1)  # subplot indices start at 1
        ax.xaxis.set_major_locator(NullLocator())  # remove ticks
        ax.yaxis.set_major_locator(NullLocator())
        imshow(digits.images[i], cmap=cm.gray_r)

The result should be as follows:
Now X is a matrix where each row is a vector that represents a digit. Each vector has 64 elements, obtained by flattening the corresponding 8x8 image shown above. We can apply the Isomap algorithm to this data and plot the result with the following lines:
    # running Isomap
    # 5 neighbours will be considered, with reduction to a 2d space
    # (recent scikit-learn versions expect these as keyword arguments)
    Y = manifold.Isomap(n_neighbors=5, n_components=2).fit_transform(X)

    # plotting the result
    figure(2)
    scatter(Y[:, 0], Y[:, 1], c='k', alpha=0.3, s=10)
    for i in range(Y.shape[0]):
        text(Y[i, 0], Y[i, 1], str(color[i]),
             color=cm.Dark2(color[i] / 5.),
             fontdict={'weight': 'bold', 'size': 11})
    show()

The new embedding for the data will be as follows:
We computed a two-dimensional version of each pattern in the dataset, and it's easy to see that the five classes are pretty neatly separated in the new embedding.
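If you want to double-check the shapes involved without opening any plot window, a minimal sketch like the following may help. It is not part of the original walkthrough: the names `labels` and `centroids` are introduced here only for illustration, and the per-class centroid is just one rough way to see where each digit lands in the embedding.

```python
import numpy as np
from sklearn import datasets, manifold

# same data as above: digits 0-4 only
digits = datasets.load_digits(n_class=5)
X = digits.data
labels = digits.target  # illustrative name, same array as `color` above

# each row of X is the matching 8x8 image flattened to 64 values
assert X.shape == (digits.images.shape[0], 64)
assert np.array_equal(X, digits.images.reshape(len(digits.images), -1))

# same Isomap settings as in the post
Y = manifold.Isomap(n_neighbors=5, n_components=2).fit_transform(X)
assert Y.shape == (X.shape[0], 2)

# mean position of each class in the 2d embedding (illustrative summary)
centroids = np.array([Y[labels == k].mean(axis=0) for k in range(5)])
print(centroids)
```

Each row of `centroids` is the average 2d position of one digit class. Centroids that sit far apart relative to the spread of the points are consistent with the separation visible in the plot, although this is only a coarse check.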