## Thursday, December 29, 2011

### Book review: Numpy 1.5 Beginner's Guide

I got the chance to read the book NumPy 1.5 Beginner's Guide written by Ivan Idris and published by Packt Publishing last month. My impression of the book was quite positive. It's a book based on examples, which incrementally introduce all the main features of the library. It is written with a simple language, and it is always easy to understand.

## Contents and structure

The organization and flow are good. The ten chapters contain well thought out examples that you can use as building blocks for your scientific computing projects. Every example is structured in this way:
• An introduction to the problem that the example will solve.
• The code, commented line by line.
• The result of the code.
• A short recap of how the problem has been solved.
• And, sometimes, a multiple choice question to help the reader to test his own understanding.
There is no attempt at teaching the mathematics behind the examples. Every example is a "how to" that can help you to learn how to use the library and can save hours of searching through the official documentation and more complicated texts.

The chapters 1,2 and 3 contain the starting points to use NumPy. They explain how to install NumPy, how to handle the NumPy arrays and how to use some of the basic mathematical/statistical functions provided by the library. Chapters 4 through 7 cover the basics about handling matrices, how to load and write data, how to write universal functions and cover some of the basic modules that are discussed. Chapter 8 explains how to use the unit test functions provided by NumPy. Finally, chapters 9 and 10 (my favorites!) introduce how to integrate NumPy with Matplotlib and SciPy.

## Who is this book for?

This book is aimed at people who know Python and need to start using scientific computing in their programs. It is also suitable for people who use another scientific computing environment, such as Matlab, and want quick-start introduction to NumPy.

## Thursday, December 15, 2011

### Polar Charts with matplolib

A polar system is a two-dimensional coordinate system, where there are two coordinates: the radial and the angular coordinates. The radial coordinate denotes the point distance from a central point (pole) and the angular coordinate denotes the angle required to reach the point from the 0 degree ray (polar axis). Let's see an example of how to make polar charts with matplotlib:
```from pylab import figure,polar,show
from numpy import arange,pi,cos

theta = arange(0, 2, 1./180)*pi # angular coordinates
figure(1)
polar(3*theta, theta/5) # drawing a spiral
figure(2)
polar(theta, cos(4*theta)) # drawing the polar rose
show()
```
The result of this script consists of two charts. The first with the spiral and the second with the polar rose

## Thursday, December 8, 2011

### Lissajous curves

And after the Epitrochoids, we're going to see another family of wonderful figures: The Lissajous curves. The equations that describe these curves are the following

the curves vary with respect the parameter t and their appearance is determined by the ratio a/b and the value of δ.
As usual, I made a snippet to visualize them:
```from numpy import sin,pi,linspace
from pylab import plot,show,subplot

a = [1,3,5,3] # plotting the curves for
b = [1,5,7,4] # different values of a/b
delta = pi/2
t = linspace(-pi,pi,300)

for i in range(0,4):
x = sin(a[i] * t + delta)
y = sin(b[i] * t)
subplot(2,2,i+1)
plot(x,y)

show()
```
This is the result

## Wednesday, December 7, 2011

### Waiting for NumPy 1.5 Beginner’s Guide

I'm waiting to receive a copy of NumPy 1.5 Beginner’s Guide. Lately, I have read a sample chapter of the book and I was surprised because it contains a lot of code examples and clearly explains how to use the code and what it means.

I found on the publisher web site that the book is supposed to cover a lot of topics, from the NumPy installation to the integration of NumPy in real scientific computing projects. If the entire book is like the sample chapter, it will be one of my favorite resources about the NumPy. I can't wait to have the book!

Stay tuned for the complete review ;)

## Wednesday, November 23, 2011

### How to make Bubble Charts with matplotlib

In this post we will see how to make a bubble chart using matplotlib. The snippet that we are going to see was inspired by a tutorial on flowingdata.com where R is used to make a bubble chart that represents some data extracted from a csv file about the crime rates of America by states. I used the dataset provided by flowingdata to create a similar chart with Python. Let's see the code:
```from pylab import *
from scipy import *

# reading the data from a csv file
durl = 'http://datasets.flowingdata.com/crimeRatesByState2005.csv'
rdata = genfromtxt(durl,dtype='S8,f,f,f,f,f,f,f,i',delimiter=',')

rdata = zeros(8) # cutting the label's titles
rdata = zeros(8) # cutting the global statistics

x = []
y = []
color = []
area = []

for data in rdata:
x.append(data) # murder
y.append(data) # burglary
color.append(data) # larceny_theft
area.append(sqrt(data)) # population
# plotting the first eigth letters of the state's name
text(data, data,
data,size=11,horizontalalignment='center')

# making the scatter plot
sct = scatter(x, y, c=color, s=area, linewidths=2, edgecolor='w')
sct.set_alpha(0.75)

axis([0,11,200,1280])
xlabel('Murders per 100,000 population')
ylabel('Burglaries per 100,000 population')
show()
```
The following figure is the resulting bubble chart It shows the number of burglaries versus the number of murders per 100,000 population. Every bubble is a state of America, the size of the bubbles represents the population of the state and the color is the number of larcenies.

## Thursday, November 17, 2011

### Fun with Epitrochoids

An epitrochoid is a curve traced by a point attached to a circle of radius r rolling around the outside of a fixed circle of radius R, where the point is a distance d from the center of the exterior circle [Ref]. Lately I found the Epitrochoid's parametric equations on wikipedia:

So,I decided to plot them with pylab. This is the script I made
```from numpy import sin,cos,linspace,pi
import pylab

# curve parameters
R = 14
r = 1
d = 18

t = linspace(0,2*pi,300)

# Epitrochoid parametric equations
x = (R-r)*cos(t)-d*cos( (R+r)*t / r )
y = (R-r)*sin(t)-d*sin( (R+r)*t / r )

pylab.plot(x,y,'r')
pylab.axis('equal')
pylab.show()
```
And this is the result isn't it fashinating? :)

## Monday, November 7, 2011

### Computing a disparity map in OpenCV

A disparity map contains information related to the distance of the objects of a scene from a viewpoint. In this example we will see how to compute a disparity map from a stereo pair and how to use the map to cut the objects far from the cameras.
The stereo pair is represented by two input images, these images are taken with two cameras separated by a distance and the disparity map is derived from the offset of the objects between them. There are various algorithm to compute a disparity map, the one implemented in OpenCV is the graph cut algorithm. To use it we have to call the function CreateStereoGCState() to initialize the data structure needed by the algorithm and use the function FindStereoCorrespondenceGC() to get the disparity map. Let's see the code:
```def cut(disparity, image, threshold):
for i in range(0, image.height):
for j in range(0, image.width):
# keep closer object
if cv.GetReal2D(disparity,i,j) > threshold:
cv.Set2D(disparity,i,j,cv.Get2D(image,i,j))

disparity_left  = cv.CreateMat(left.height, left.width, cv.CV_16S)
disparity_right = cv.CreateMat(left.height, left.width, cv.CV_16S)

# data structure initialization
state = cv.CreateStereoGCState(16,2)
# running the graph-cut algorithm
cv.FindStereoCorrespondenceGC(left,right,
disparity_left,disparity_right,state)

disp_left_visual = cv.CreateMat(left.height, left.width, cv.CV_8U)
cv.ConvertScale( disparity_left, disp_left_visual, -16 );
cv.Save( "disparity.pgm", disp_left_visual ); # save the map

# cutting the object farthest of a threshold (120)
cut(disp_left_visual,left,120)

cv.NamedWindow('Disparity map', cv.CV_WINDOW_AUTOSIZE)
cv.ShowImage('Disparity map', disp_left_visual)
cv.WaitKey()
```
These are the two input image I used to test the program (respectively left and right):
Result using threshold = 100

## Thursday, November 3, 2011

### Face and eyes detection in OpenCV

The goal of object detection is to find an object of a pre-defined class in an image. In this post we will see how to use the Haar Classifier implemented in OpenCV in order to detect faces and eyes in a single image. We are going to use two trained classifiers stored in two XML files:
The first one is able to detect faces and the second one eyes. To use a trained classifier stored in a XML file we need to load it into memory using the function cv.Load() and call the function cv.HaarDetectObjects() to detect the objects. Let's see the snippet:
```imcolor = cv.LoadImage('detectionimg.jpg') # input image
# running the classifiers
storage = cv.CreateMemStorage()
detectedFace = cv.HaarDetectObjects(imcolor, haarFace, storage)
detectedEyes = cv.HaarDetectObjects(imcolor, haarEyes, storage)

# draw a green rectangle where the face is detected
if detectedFace:
for face in detectedFace:
cv.Rectangle(imcolor,(face,face),
(face+face,face+face),
cv.RGB(155, 255, 25),2)

# draw a purple rectangle where the eye is detected
if detectedEyes:
for face in detectedEyes:
cv.Rectangle(imcolor,(face,face),
(face+face,face+face),
cv.RGB(155, 55, 200),2)

cv.NamedWindow('Face Detection', cv.CV_WINDOW_AUTOSIZE)
cv.ShowImage('Face Detection', imcolor)
cv.WaitKey()
```
These images are produced running the script with two different inputs. The first one is obtained from an image that contains two faces and four eyes: And the second one is obtained from an image that contains one face and two eyes (the shakira.jpg we used in the post about PCA):

## Monday, October 24, 2011

### Corner Detection with OpenCV

Corner detection is an approach used within computer vision systems to extract certain kinds of features and infer the contents of an image [Ref]. We have seen in the previous post how to perform an edge detection using the Sobel operator. Using the edges, we can define a corner as a point for which there are two dominant and different edge directions in a local neighborhood of the point.
One of the most used tool for corner detection is the Harris Corner Detector operator. OpenCV provide a function that implement this operator. The name of the function is CornerHarris(...) and the corners in the image can be found as the local maxima of the returned image. Let's see how to use it in Python:
```imcolor = cv.LoadImage('stairs.jpg')
cornerMap = cv.CreateMat(image.height, image.width, cv.CV_32FC1)
# OpenCV corner detection
cv.CornerHarris(image,cornerMap,3)

for y in range(0, image.height):
for x in range(0, image.width):
harris = cv.Get2D(cornerMap, y, x) # get the x,y value
# check the corner detector response
if harris > 10e-06:
# draw a small circle on the original image
cv.Circle(imcolor,(x,y),2,cv.RGB(155, 0, 25))

cv.NamedWindow('Harris', cv.CV_WINDOW_AUTOSIZE)
cv.ShowImage('Harris', imcolor) # show the image
cv.SaveImage('harris.jpg', imcolor)
cv.WaitKey()
```
The following image is the result of the program:
The red markers are the corners found by the OpenCV's function.

## Friday, October 14, 2011

### Beginning with OpenCV in Python

OpenCV (Open Source Computer Vision) is a library of programming functions for real time computer vision [Ref]. In this post we will see how to use some of the basic functions of OpenCV in Python.

The following code opens an image from the disk, prints some image properties on the console and shows a window that contains the image.
```# load and show an image in gray scale

# print some image properties
print 'Depth:',image.depth,'# Channels:',image.nChannels
print 'Size:',image.width,image.height
print 'Pixel values average',cv.Avg(image)

# create the window
cv.NamedWindow('my window', cv.CV_WINDOW_AUTOSIZE)
cv.ShowImage('my window', image) # show the image
cv.WaitKey() # the window will be closed with a (any)key press
```
This is the image I used for this example.
And this is what the script showed on the console:
```Depth: 8 # Channels: 1
Size: 366 550
Pixel values average (80.46735717834079, 0.0, 0.0, 0.0)
```
Now we can resize the image loaded above:
```# resize the image
dst = cv.CreateImage((150,150), 8, 1)
cv.Resize(image,dst,interpolation=cv.CV_INTER_LINEAR)
cv.ShowImage('my window', dst)
cv.WaitKey()
cv.SaveImage('image2.jpg', dst) # save the image
```
And this is the result.
A Sobel operator can be applied as follow:
```# Sobel operator
dstSobel = cv.CreateMat(image.height, image.width, cv.CV_32FC1)
cv.Sobel(image,dstSobel,1,1,3)
cv.ShowImage('my window', dstSobel)
cv.WaitKey()
cv.SaveImage('imageSobel.jpg', dstSobel)
```
And this is the result on the picture that I'm using:
The final example below uses two operation, a smoothing filter and a subtraction. It applies a Gaussian Blur to the original image and subtracts the result of the filtering from the original image.
```# image smoothing and subtraction
imageBlur = cv.CreateImage(cv.GetSize(image), image.depth, image.nChannels)
# filering the original image
cv.Smooth(image, imageBlur, cv.CV_BLUR, 15, 15)
diff = cv.CreateImage(cv.GetSize(image), image.depth, image.nChannels)
# subtraction (original - filtered)
cv.AbsDiff(image,imageBlur,diff)
cv.ShowImage('my window', diff)
cv.WaitKey()
cv.SaveImage('imageDiff.jpg', diff)
```
The final output is:

## Thursday, October 6, 2011

### The Perceptron

In the field of pattern classification, the purpose of a classifier is to use the object's characteristics to identify which class it belongs to. The Perceptron is a classifier and it is one of the simplest kind of Artificial Neural Network. In this post we will see a Python implementation of the Perceptron. The code that we will see implements the schema represented below.

In this schema, an object (pattern) is represented by a vector x and its characteristics (features) are represented by the vector's elements x_1 and x_2. We call the vector w, with elements w_1 and w_2, the weights vector. The values x_1 and x_2 are the input of the Perceptron. When we activate the Perceptron each input is multiplied by the respective weight and then summed. This produces a single value that it is passed to a threshold step function. The output of this function is the output of the Perceptron. The threshold step function has only two possible output: 1 and -1. Hence, when it is activated his reponse indicates that x belong to the first class (1) or the second (-1).

A Perceptron can be trained and we have to guide his learning. In order to train the Perceptron we need something that the Perceptron can imitate, this data is called train set. So, the perceptron learns as follow: an input pattern is shown, it produces an output, compares the output to what the output should be, and then adjusts its weights. This is repeated until the Perceptron converges to the correct behavior or a maximum number of iteration is reached.

The following Python class implements the Percepron using the Rosenblatt training algorithm.
```from pylab import rand,plot,show,norm

class Perceptron:
def __init__(self):
""" perceptron initialization """
self.w = rand(2)*2-1 # weights
self.learningRate = 0.1

def response(self,x):
""" perceptron output """
y = x*self.w+x*self.w # dot product between w and x
if y >= 0:
return 1
else:
return -1

def updateWeights(self,x,iterError):
"""
updates the weights status, w at time t+1 is
w(t+1) = w(t) + learningRate*(d-r)*x
where d is desired output and r the perceptron response
iterError is (d-r)
"""
self.w += self.learningRate*iterError*x
self.w += self.learningRate*iterError*x

def train(self,data):
"""
trains all the vector in data.
Every vector in data must have three elements,
the third element (x) must be the label (desired output)
"""
learned = False
iteration = 0
while not learned:
globalError = 0.0
for x in data: # for each sample
r = self.response(x)
if x != r: # if we have a wrong response
iterError = x - r # desired response - actual response
self.updateWeights(x,iterError)
globalError += abs(iterError)
iteration += 1
if globalError == 0.0 or iteration >= 100: # stop criteria
print 'iterations',iteration
learned = True # stop learning
```
Perceptrons can only classify data when the two classes can be divided by a straight line (or, more generally, a hyperplane if there are more than two inputs). This is called linear separation. Here is a function that generates a linearly separable random dataset.
```def generateData(n):
"""
generates a 2D linearly separable dataset with n samples.
The third element of the sample is the label
"""
xb = (rand(n)*2-1)/2-0.5
yb = (rand(n)*2-1)/2+0.5
xr = (rand(n)*2-1)/2+0.5
yr = (rand(n)*2-1)/2-0.5
inputs = []
for i in range(len(xb)):
inputs.append([xb[i],yb[i],1])
inputs.append([xr[i],yr[i],-1])
return inputs
```
And now we can use the Perceptron. We generate two dataset, the first one is used to train the classifier (train set), and the second one is used to test it (test set):
```trainset = generateData(30) # train set generation
perceptron = Perceptron()   # perceptron instance
perceptron.train(trainset)  # training
testset = generateData(20)  # test set generation

# Perceptron test
for x in testset:
r = perceptron.response(x)
if r != x: # if the response is not correct
print 'error'
if r == 1:
plot(x,x,'ob')
else:
plot(x,x,'or')

# plot of the separation line.
# The separation line is orthogonal to w
n = norm(perceptron.w)
ww = perceptron.w/n
ww1 = [ww,-ww]
ww2 = [-ww,ww]
plot([ww1, ww2],[ww1, ww2],'--k')
show()
```
The script above gives the following result:

The blue points belong to the first class and the red ones belong to the second. The dashed line is the separation line learned by the Perceptron during the training.

## Thursday, September 22, 2011

### The sampling theorem explained with numpy

The sampling theorem states that a continuous signal x(t) bandlimited to B Hz can be recovered from its samples x[n] = x(n*T), where n is an integer, if T is greater than or equal to 1/(2B) without loss of any information. And we call 2B the Nyquist rate.
Sampling at a rate below the Nyquist rate is called undersampling, it leads to the aliasing effect. Let's observe the aliasing effect with the following script:
```from numpy import linspace,cos,pi,ceil,floor,arange
from pylab import plot,show,axis

# sampling a signal badlimited to 40 Hz
# with a sampling rate of 800 Hz
f = 40;  # Hz
tmin = -0.3;
tmax = 0.3;
t = linspace(tmin, tmax, 400);
x = cos(2*pi*t) + cos(2*pi*f*t); # signal sampling
plot(t, x)

# sampling the signal with a sampling rate of 80 Hz
# in this case, we are using the Nyquist rate.
T = 1/80.0;
nmin = ceil(tmin / T);
nmax = floor(tmax / T);
n = arange(nmin,nmax);
x1 = cos(2*pi*n*T) + cos(2*pi*f*n*T);
plot(n*T, x1, 'bo')

# sampling the signal with a sampling rate of 35 Hz
# note that 35 Hz is under the Nyquist rate.
T = 1/35.0;
nmin = ceil(tmin / T);
nmax = floor(tmax / T);
n = arange(nmin,nmax);
x2 = cos(2*pi*n*T) + cos(2*pi*f*n*T);
plot(n*T, x2, '-r.',markersize=8)

axis([-0.3, 0.3, -1.5, 2.3])
show()
```
The following figure is the result:
The blue curve is the original signal, the blue dots are the samples obtained with the Nyquist rate and the red dots are the samples obtainde with 35 Hz. It's easy to see that the blue samples are enough to recover the blue curve, while the red ones are not enough to capture the oscillations of the signal.

## Wednesday, September 14, 2011

### Uncertainty principle and spectrogram with pylab

The Fourier transform does not give any information on the time at which a frequency component occurs. One approach which can give information on the time resolution of the spectrum is the Short Time Fourier Transform (STFT). Here a moving window is applied to the signal and the Fourier transform is applied to the signal within the window as the window is moved [Ref]. The choice of window is very important with respect to the performance of the STFT in practice. Since the STFT is simply applying the Fourier transform to pieces of the time series of interest, a drawback of the STFT is that it will not be able to resolve events if they happen to appear within the width of the window. In this case, the lack of time resolution of the Fourier transform is present. In general, one cannot achieve simultaneous time and frequency resolution because of the Heisenberg uncertain principle. In the field of particle physics, an elementary particle does not have precise position and momentum. The better one determines the position of the particle, the less precisely is know at that time, and vice versa. For signal processing, this rule translates into the fact that a signal does not simultaneously have a precise location in time and precise frequency [Ref].

The library pylab provides the function specgram(...) to compute the spectrogram of a signal using the STFT. The following script uses that function to show the spectrogram of a signal with different windows size:
```from scipy.io.wavfile import read,write
from pylab import plot,show,subplot,specgram

# Open the Homer Simpson voice: "Ummm, Crumbled up cookie things."
# from http://www.thesoundarchive.com/simpsons/homer/mcrumble.wav

subplot(411)
plot(range(len(data)),data)
subplot(412)
# NFFT is the number of data points used in each block for the FFT
# and noverlap is the number of points of overlap between blocks
specgram(data, NFFT=128, noverlap=0) # small window
subplot(413)
specgram(data, NFFT=512, noverlap=0)
subplot(414)
specgram(data, NFFT=1024, noverlap=0) # big window

show()
```
This image is the result: The pictures shows that changing the number of data points used for each Fourier transform block, the spectrogram loses definition in frequency or in the time.

## Thursday, September 8, 2011

### Sound Synthesis

Physically, sound is an oscillation of a mechanical medium that makes the surrounding air also oscillate and transport the sound as a compression wave. Mathematically, the oscillations can be described as where t is the time, and f the frequency of the oscillation. Sound on a computer is a sequence of numbers and in this post we will see how to generate a musical tone with numpy and how to write it to a file wav file. Each musical note vibrates at a particular frequency and the following script contains a function to generate a note (tone(...)), we'll use this function to generate the A tone creating an oscillation with f = 440 Hz.
```from scipy.io.wavfile import write
from numpy import linspace,sin,pi,int16
from pylab import plot,show,axis

# tone synthesis
def note(freq, len, amp=1, rate=44100):
t = linspace(0,len,len*rate)
data = sin(2*pi*freq*t)*amp
return data.astype(int16) # two byte integers

# A tone, 2 seconds, 44100 samples per second
tone = note(440,2,amp=10000)

write('440hzAtone.wav',44100,tone) # writing the sound to a file

plot(linspace(0,2,2*44100),tone)
axis([0,0.4,15000,-15000])
show()
```
Now we can play the file 440hzAtone.wav with an external player. This plot shows a part of the signal generated by the script:
And using the plotSpectrum function defined in a previous post we can verify that 440 Hz is the fundamental frequency of the tone.

## Friday, September 2, 2011

### Eigenvectors animated gif

We have already seen how to make an animation using pylab. In this post we will use a trick proposed here by Alemi to make another animation. The animation we are going to see shows the eigenvectors of a matrix A 2-by-2 and the result of the linear application A*v when v is a vector that lies on the unit circle. Each frame of the animation is generated by the following script:
```from pylab import arrow,axis,clf,savefig,plot
from numpy import array,pi,cos,sin,dot
from numpy.linalg import eig

theta = pi/9
R = array([ [cos(theta), -sin(theta)],  # rotation matrix
[sin(theta), cos(theta)] ])
v = array([0,1]) # y axis versor

A = array([ [3, -1],  # transformation matrix
[0,  2] ])
eival,eivec = eig(A) # eigen values and eigenvectors

for i in range(18):
v = dot(R,v) # theta radiants rotation of v
y = dot(A,v) # transformation
# current original vector
arrow(0,0,v,v, width=0.01, color='b')
# current resulting vector
arrow(0,0,y,y, width=0.01, color='g')
# ellipse axis
arrow(0,0,eival*eivec[0,0],eival*eivec[1,0], width=0.01, color='y') # major
arrow(0,0,eival*eivec[0,1],eival*eivec[1,1], width=0.01, color='y') # minor
# 1st eigenvector
arrow(0,0,eivec[0,0],eivec[1,0], width=0.01, color='r')
# 2nd eigenvector
arrow(0,0,eivec[0,1],eivec[1,1], width=0.01, color='r')
axis([-3.5,3.5,-3.5,3.5])
savefig('rotation/'+'0'+str(i+1)+'.png') # save the frame
clf() # figure clear
```
And the animated gif is created using the command convert in the directory where the frames have been saved:
```\$ cd rotation
\$ convert *.png -delay 50 -layers Optimize anim.gif
```
The command is provided by the ImageMagick suite available under linux. Click on the following image to see the animation.
The vector v is represented by the blue arrow, A*v is the green arrow, the eigenvectors are the red arrows, and the yellow arrows are the eigenvectors multiplied by the their respective eigenvalues.

## Thursday, August 25, 2011

### How to use ginput

ginput is a function that enables you to select points from a figure using the mouse. This post is a simple example on how to use it.
```from pylab import plot, ginput, show, axis

axis([-1, 1, -1, 1])
pts = ginput(3) # it will wait for three clicks
print "The point selected are"
print pts # ginput returns points as tuples
x=map(lambda x: x,pts) # map applies the function passed as
y=map(lambda x: x,pts) # first parameter to each element of pts
plot(x,y,'-o')
axis([-1, 1, -1, 1])
show()
```
And after three clicks this is the result on the console:
```Please click three times
The point selected are
[(-0.77468982630272942, -0.3418367346938776), (0.11464019851116632, -0.21428571428571436), (0.420347394540943, 0.55612244897959173)]
```
and this is the figure generated:

## Wednesday, August 10, 2011

### Applying a Moebius transformation to an image

The Moebius transformation is defined as
where z is a complex variable different than -d/c. And a, b, c, d are complex numbers.
In this post we will see how to apply the Moebius transform to an image.
Given a set of three distinct points on the complex plane z1, z2, z3 and a second set of distinct points w1, w2, w3, there exists precisely one Moebius transformation f(z) which maps the zs to the ws. So, in the first step we have to determine f(z) from the given sets of points. We will use the explicit determinant formula to compute the coefficients:
```from pylab import *
from numpy import *

zp=[157+148j, 78+149j, 54+143j]; # (zs) the complex point zp[i]
wa=[147+143j, 78+140j, 54+143j]; # (ws) will be in wa[i]

# transformation parameters
a = linalg.det([[zp*wa, wa, 1],
[zp*wa, wa, 1],
[zp*wa, wa, 1]]);

b = linalg.det([[zp*wa, wa, wa],
[zp*wa, wa, wa],
[zp*wa, wa, wa]]);

c = linalg.det([[zp, wa, 1],
[zp, wa, 1],
[zp, wa, 1]]);

d = linalg.det([[zp*wa, zp, 1],
[zp*wa, zp, 1],
[zp*wa, zp, 1]]);
```
Now we can apply the transformation to the pixel coordinates.
```img = fliplr(imread('mondrian.jpg')) # load an image

r = ones((500,500,3),dtype=uint8)*255 # empty-white image
for i in range(img.shape):
for j in range(img.shape):
z = complex(i,j)
# transformation applied to the pixels coordinates
w = (a*z+b)/(c*z+d)
r[int(real(w)),int(imag(w)),:] = img[i,j,:] # copy of the pixel

subplot(1,2,1)
title('Original Mondrian')
imshow(img)
subplot(1,2,2)
title('Mondrian after the transformation')
imshow(roll(r,120,axis=1))
show()
```
And this is the result.
This is another image obtained changing the vectors zp and wa.
As we can see, the result of the transformation has some empty areas that we should fill using interpolation. So let's look to another way to implement a geometric transformation on an image. The following code uses the scipy.ndimage.geometric_transform to implement the inverse Moebius transform:
```from scipy.ndimage import geometric_transform

def shift_func(coords):
""" Define the moebius transformation, though backwards """
#turn the first two coordinates into an imaginary number
z = coords + 1j*coords
w = (d*z-b)/(-c*z+a) #the inverse mobius transform
#take the color along for the ride
return real(w),imag(w),coords

r = geometric_transform(img,shift_func,cval=255,output_shape=(450,350,3))
```
It gives the following result:
We can see that the geometric_transform provides the pixel interpolation automatically.

## Wednesday, August 3, 2011

### How to plot the frequency spectrum with scipy

Spectrum analysis is the process of determining the frequency domain representation of a time domain signal and most commonly employs the Fourier transform. The Discrete Fourier Transform (DFT) is used to determine the frequency content of signals and the Fast Fourier Transform (FFT) is an efficient method for calculating the DFT. Scipy implements FFT and in this post we will see a simple example of spectrum analysis:
```from numpy import sin, linspace, pi
from pylab import plot, show, title, xlabel, ylabel, subplot
from scipy import fft, arange

def plotSpectrum(y,Fs):
"""
Plots a Single-Sided Amplitude Spectrum of y(t)
"""
n = len(y) # length of the signal
k = arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[range(n/2)] # one side frequency range

Y = fft(y)/n # fft computing and normalization
Y = Y[range(n/2)]

plot(frq,abs(Y),'r') # plotting the spectrum
xlabel('Freq (Hz)')
ylabel('|Y(freq)|')

Fs = 150.0;  # sampling rate
Ts = 1.0/Fs; # sampling interval
t = arange(0,1,Ts) # time vector

ff = 5;   # frequency of the signal
y = sin(2*pi*ff*t)

subplot(2,1,1)
plot(t,y)
xlabel('Time')
ylabel('Amplitude')
subplot(2,1,2)
plotSpectrum(y,Fs)
show()
```
The program shows the following figure, on top we have a plot of the signal and on the bottom the frequency spectrum.

## Wednesday, July 27, 2011

### PCA and image compression with numpy

In the previous post we have seen the princomp function. This function performs principal components analysis (PCA) on the n-by-p data matrix and uses all the p principal component to computed the principal component scores. In this new post, we will see a modified version of the princomp where the representation of the original data in the in the principal component space is computed with less than p principal components:
```from numpy import mean,cov,cumsum,dot,linalg,size,flipud

def princomp(A,numpc=0):
# computing eigenvalues and eigenvectors of covariance matrix
M = (A-mean(A.T,axis=1)).T # subtract the mean (along columns)
[latent,coeff] = linalg.eig(cov(M))
p = size(coeff,axis=1)
idx = argsort(latent) # sorting the eigenvalues
idx = idx[::-1]       # in ascending order
# sorting eigenvectors according to the sorted eigenvalues
coeff = coeff[:,idx]
latent = latent[idx] # sorting eigenvalues
if numpc < p and numpc >= 0:
coeff = coeff[:,range(numpc)] # cutting some PCs if needed
score = dot(coeff.T,M) # projection of the data in the new space
return coeff,score,latent```
The following code uses the new version of the princomp to compute the PCA of a matrix that represents an image in gray scale. The PCA is computed ten times with an increasing number of principal components. The script show the images reconstructed using less than 50 principal components (out of 200).
```from pylab import imread,subplot,imshow,title,gray,figure,show,NullLocator
A = mean(A,2) # to get a 2-D array
full_pc = size(A,axis=1) # numbers of all the principal components
i = 1
dist = []
for numpc in range(0,full_pc+10,10): # 0 10 20 ... full_pc
coeff, score, latent = princomp(A,numpc)
Ar = dot(coeff,score).T+mean(A,axis=0) # image reconstruction
# difference in Frobenius norm
dist.append(linalg.norm(A-Ar,'fro'))
# showing the pics reconstructed with less than 50 PCs
if numpc <= 50:
ax = subplot(2,3,i,frame_on=False)
ax.xaxis.set_major_locator(NullLocator()) # remove ticks
ax.yaxis.set_major_locator(NullLocator())
i += 1
imshow(flipud(Ar))
title('PCs # '+str(numpc))
gray()

figure()
imshow(flipud(A))
title('numpc FULL')
gray()
show()
```
The resulting images:

We can see that 40 principal components are enough to reconstruct the original image.

At the end of this experiment, we can plot the distance of the reconstructed images from the original image in Frobenius norm (red curve) and the cumulative sum of the eigenvalues (blue curve). Recall that the cumulative sum of the eigenvalues shows the level of variance accounted by each of the corresponding eigenvectors. On the x axis there is the number of eigenvalues/eigenvectors used.
```from pylab import plot,axis
figure()
perc = cumsum(latent)/sum(latent)
dist = dist/max(dist)
plot(range(len(perc)),perc,'b',range(0,full_pc+10,10),dist,'r')
axis([0,full_pc,0,1.1])
show()
```

## Friday, July 22, 2011

### Principal Component Analysis with numpy

The following function is a three-line implementation of the Principal Component Analysis (PCA). It is inspired by the function princomp of the matlab's statistics toolbox.
```from numpy import mean,cov,double,cumsum,dot,linalg,array,rank
from pylab import plot,subplot,axis,stem,show,figure

def princomp(A):
""" performs principal components analysis
(PCA) on the n-by-p data matrix A
Rows of A correspond to observations, columns to variables.

Returns :
coeff :
is a p-by-p matrix, each column containing coefficients
for one principal component.
score :
the principal component scores; that is, the representation
of A in the principal component space. Rows of SCORE
correspond to observations, columns to components.
latent :
a vector containing the eigenvalues
of the covariance matrix of A.
"""
# computing eigenvalues and eigenvectors of covariance matrix
M = (A-mean(A.T,axis=1)).T # subtract the mean (along columns)
[latent,coeff] = linalg.eig(cov(M)) # attention:not always sorted
score = dot(coeff.T,M) # projection of the data in the new space
return coeff,score,latent
```
(In this other post you can find an updated version of this function).
In the following test a 2D dataset wil be used. The result of this test is a plot with the two principal components (dashed lines), the original data (blue dots) and the new data (red stars). As we expected the first principal component describes the direction of maximum variance and the second is orthogonal to the first.
```A = array([ [2.4,0.7,2.9,2.2,3.0,2.7,1.6,1.1,1.6,0.9],
[2.5,0.5,2.2,1.9,3.1,2.3,2,1,1.5,1.1] ])

coeff, score, latent = princomp(A.T)

figure()
subplot(121)
# every eigenvector describe the direction
# of a principal component.
m = mean(A,axis=1)
plot([0, -coeff[0,0]*2]+m, [0, -coeff[0,1]*2]+m,'--k')
plot([0, coeff[1,0]*2]+m, [0, coeff[1,1]*2]+m,'--k')
plot(A[0,:],A[1,:],'ob') # the data
axis('equal')
subplot(122)
# new data
plot(score[0,:],score[1,:],'*g')
axis('equal')
show()
```

In this second example princomp(.) is tested on a 4D dataset. In this example the matrix of the data is rank deficient and only the first two components are necessary to bring the information of the entry dataset.
```A = array([[-1, 1, 2, 2],
[-2, 3, 1, 0],
[ 4, 0, 3,-1]],dtype=double)

coeff, score, latent = princomp(A)
perc = cumsum(latent)/sum(latent)
figure()
# the following plot shows that first two components
# account for 100% of the variance.
stem(range(len(perc)),perc,'--b')
axis([-0.3,4.3,0,1.3])
show()
print 'the principal component scores'
print score.T # only the first two columns are nonzero
print 'The rank of A is'
print rank(A)  # indeed, the rank of A is 2
```

```Coefficients for principal components
[[  1.464140e+00   1.588382e+00   0.000000e+00  -4.440892e-16]
[  2.768170e+00  -1.292503e+00  -2.775557e-17   6.557254e-16]
[ -4.232310e+00  -2.958795e-01   1.110223e-16  -3.747002e-16]]
The rank of A is
2
```

## Monday, July 18, 2011

### Jukowsi Transformation

The following code applies the Jukowsi transformation to various circles on the complex plane.
```from pylab import plot,show,axis,subplot
import cmath
from numpy import *

"""Returns an array z where every z[i] is a point
on the circle's circumference on the complex plane"""
teta = linspace(0,pi*2,150)
z = []
for a in teta:
z.append(zc)
return array( z, dtype=complex )

def jukowsi(z,L):
return z+L/z # asimmetric Jukowski transformation

# let's try the transformation on three different circles
subplot(131)
z = cplxcircm(1,complex(0.5,0)) #1st circle
w = jukowsi(z,1) # applying the trasformation
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')

subplot(132)
z = cplxcircm(0.8,complex(0.4,0.3))
w = jukowsi(z,0.188)
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')

subplot(133)
z = cplxcircm(1.1,complex(0.1,0.0))
w = jukowsi(z,1)
plot(real(z),imag(z),'b',real(w),imag(w),'r')
axis('equal')
show()
```
The script shows the following figure. Every subplot shows two curves on the complex plane (x real values, y imaginary values), the blue curve is the starting curve and the red is the transformation.

## Thursday, July 14, 2011

### Polynomial curve fitting

We have seen already how to a fit a given set of points minimizing an error function, now we will see how to find a fitting polynomial for the data using the function polyfit provided by numpy:
```from numpy import *
import pylab

# data to fit
x = random.rand(6)
y = random.rand(6)

# fit the data with a 4th degree polynomial
z4 = polyfit(x, y, 4)
p4 = poly1d(z4) # construct the polynomial

z5 = polyfit(x, y, 5)
p5 = poly1d(z5)

xx = linspace(0, 1, 100)
pylab.plot(x, y, 'o', xx, p4(xx),'-g', xx, p5(xx),'-b')
pylab.legend(['data to fit', '4th degree poly', '5th degree poly'])
pylab.axis([0,1,0,1])
pylab.show()

```
Let's see the two polynomials:

## Tuesday, July 12, 2011

### Dice rolling experiment

If we roll a die a large number of times, and we compute the mean and variance, as exaplained here, we’d expect to obtain a mean = 3.5 and a variance = 2.916. Let's simulate that with a script:
```import pylab
import math

# Rolling the die 1000 times
v = pylab.randint(1,7,size=(1000))

print 'mean',pylab.mean(v)
print 'variance',pylab.var(v)

pylab.hist(v, bins=6) # histogram of the outcoming
pylab.xlim(1,6)
pylab.show()
```
Here's the result:
```mean 3.435
variance 2.781775
```

## Monday, July 11, 2011

### Prime factor decomposition of a number

The following function compute the prime factors (PFs) of an integer.
```from math import floor

def factors(n):
result = []
for i in range(2,n+1): # test all integers between 2 and n
s = 0;
while n/i == floor(n/float(i)): # is n/i an integer?
n = n/float(i)
s += 1
if s > 0:
for k in range(s):
result.append(i) # i is a pf s times
if n == 1:
return result

# test
print factors(90)
```
The result will be
```[2, 3, 3, 5]
```
This means that 90 is equal to 2*3*3*5.

## Wednesday, July 6, 2011

### How to use reflection

A simple code snippet that use reflection to print names of attributes, methods and doc strings:
```class Obj:
""" An object that use reflection """

def __init__(self,name):
""" the constructor of this object """
self.name = name

def print_methods(self):
""" print all the methods of this object and their doc string"""
print '\n* Methods *'
for names in dir(self):
attr = getattr(self,names)
if callable(attr):
print names,':',attr.__doc__

def print_attributes(self):
""" print all the attributes of this object and their value """
print '* Attributes *'
for names in dir(self):
attr = getattr(self,names)
if not callable(attr):
print names,':',attr

def print_all(self):
""" calls all the methods of this object """
for names in dir(self):
attr = getattr(self,names)
if callable(attr) and names != 'print_all' and names != '__init__':
attr() # calling the method

o = Obj('the my object')
o.print_all()
```
Which gives the following output:
```* Attributes *
__doc__ :  An object that use reflection
__module__ : __main__
name : the my object

* Methods *
__init__ :  the constructor of this object
print_all :  calls all the methods of this object
print_my_attributes :  print all the attributes of this object
print_my_methods :  print all the methods of this object
```

## Monday, July 4, 2011

### How to plot biorhythm

The following script plot the biorhythm of a person born in 14/3/1988 in a range of 20 days.
```from datetime import date
import matplotlib.dates
from pylab import *
from numpy import array,sin,pi

t0 = date(1988,3,14).toordinal()
t1 = date.today().toordinal()
t = array(range((t1-10),(t1+10))) # range of 20 days

y = 100*[sin(2*pi*(t-t0)/23),  # Physical
sin(2*pi*(t-t0)/28),  # Emotional
sin(2*pi*(t-t0)/33)]; # Intellectual

# converting ordinals to date
label = []
for p in t:
label.append(date.fromordinal(p))

fig = figure()
ax = fig.gca()
plot(label,y,label,y,label,y)
legend(['Physical', 'Emotional', 'Intellectual'])
# formatting the dates on the x axis
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%d/%b'))

show()
```
The resulting graph:

## Friday, July 1, 2011

### Approximating pi

This script uses the following formula to approximate the value of pi with a fixed number of correct digits.
```import math
def pi(digits):
k = 0
pi = 0.0
e = 1.0
tol = pow(10,-digits) # is minimum error that we want to accept
while e > tol:
pi_old = pi
pi += (2*pow(-1,k)*pow(3,.5-k))/(2*k+1)
e = abs(pi-pi_old) # current error
k += 1
print '%0.16f %20.16f' % (pi,e)
print '\nerror',e,'\niterations ',k
print '%0.16f' % pi,'result'
print '%0.16f' % math.pi,'real pi'

pi(8) # approximating pi with 8 correcet digits
```
During the execution you can see the current approximated value on the left column and the difference with the approximation at the previous step on the right column. At the end, the difference between the approximated value and the value provided by the math python module will be printed.
```\$ python pi.py
3.4641016151377544   3.4641016151377544
3.0792014356780038   0.3849001794597506
3.1561814715699539   0.0769800358919501
3.1378528915956800   0.0183285799742738
3.1426047456630846   0.0047518540674045
3.1413087854628832   0.0012959602002014
3.1416743126988376   0.0003655272359544
3.1415687159417840   0.0001055967570536
3.1415997738115058   0.0000310578697218
3.1415905109380802   0.0000092628734256
3.1415933045030817   0.0000027935650015
3.1415924542876463   0.0000008502154354
3.1415927150203800   0.0000002607327336
3.1415926345473140   0.0000000804730660
3.1415926595217138   0.0000000249743999
3.1415926517339976   0.0000000077877162

error 7.78771624965e-09
iterations  16
3.1415926517339976 result
3.1415926535897931 real pi
```

## Thursday, June 30, 2011

### Animation with matplotlib, bringin the Sierpinski triangle to life

Few post ago we have seen how to plot the Sierpinski triangle. This post is an update that shows how to create an animation where at each step a new point of the fractal is plotted:
```from numpy import *
import pylab

x = [0, 0];

A = [ [.5, 0], [0, .5] ];
b1 = [0, 0];
b2 = [.5, 0];
b3 = [.25, sqrt(3)/4];

pylab.ion() # animation on

#Note the comma after line. This is placed here because plot returns a list of lines that are drawn.
line, = pylab.plot(x,x,'m.',markersize=6)
pylab.axis([0,1,0,1])

data1 = []
data2 = []
iter = 0

while True:
r = fix(random.rand()*3)
if r==0:
x = dot(A,x)+b1
if r==1:
x = dot(A,x)+b2
if r==2:
x = dot(A,x)+b3
data1.append(x)
data2.append(x)
line.set_xdata(data1)  # update the data
line.set_ydata(data2)
pylab.draw() # draw the points again
iter += 1
print iter
```
This is the result:

## Tuesday, June 28, 2011

### Searching for IP address using regular expression

This snippet finds an IP address in a string using regex:
```import re
ip = re.compile('(([0-5]\.)|([0-4][0-9]\.)|([0-1]?[0-9]?[0-9]\.)){3}'
+'(([0-5])|([0-4][0-9])|([0-1]?[0-9]?[0-9]))')

if match:
print match.group(), # matching substring
print 'at position',match.span() # indexes of the substring found
else:
```
The output will be
```IP address found: 192.168.0.1 at position (19, 30)
```

## Thursday, June 9, 2011

### Crawling the web with SGMLParser

In this example we will use SGMLParser in order to build a simple web crawler.
```import urllib
from random import choice
from sgmllib import SGMLParser

def reset(self):
SGMLParser.reset(self)
self.links = [] # list with the urls

def start_a(self, attrs):
for k in attrs:
if k == 'href' and k.startswith('http'):

def explore(parser,s_url,maxvisit=10,iter=0):
""" pick a random link in the page s_url
if iter < maxvisit: # it will stop after maxvisit iteration
print '(',iter,') I am in',s_url
parser.reset() # reset the list
parser.feed(usock.read()) # parse the current page
else: # if the page has no links to follow
print 'the page has no links'

# test the crawler starting from the python's website
explore(parser,"http://www.python.org/")
```
Let's go!
```( 0 ) I am in http://www.python.org/
( 1 ) I am in http://wiki.python.org/moin/NumericAndScientific
( 2 ) I am in http://numpy.scipy.org/
( 3 ) I am in http://sphinx.pocoo.org/
( 4 ) I am in http://www.bitbucket.org/birkenfeld/sphinx/issues/
( 5 ) I am in http://blog.bitbucket.org
( 6 ) I am in http://haproxy.1wt.eu/
( 7 ) I am in http://www.olivepeak.com/blog/posts/read/free-your-port-80-with-haproxy
( 8 ) I am in http://www.olivepeak.com/peaknotes/
( 9 ) I am in http://notes.olivepeak.com/account/create
```

## Tuesday, June 7, 2011

### Sierpinski fractal

As described in the introduction of Numerical Computing in Matlab we can generate fractals using affine transformations. The following script uses it to generate the famous Sierpinski triangle:
```from numpy import *
import pylab

x = [0, 0];

A = [ [.5, 0], [0, .5] ];
b1 = [0, 0];
b2 = [.5, 0];
b3 = [.25, sqrt(3)/4];

for i in range(3000): # 3000 points will be generated
r = fix(random.rand()*3)
if r==0:
x = dot(A,x)+b1
if r==1:
x = dot(A,x)+b2
if r==2:
x = dot(A,x)+b3
pylab.plot(x,x,'m.',markersize=2)

pylab.show()
```
It will take a while, here's the result

## Friday, June 3, 2011

### The Collatz conjecture

The Collatz conjecture is a famous unsolved problem in number theory. Start with any
positive integer n. Repeat the following steps:
• If n = 1, stop.
• If n is even, replace it with n/2.
• If n is odd, replace it with 3n + 1

The unanswered question is, does the process always terminate?

Let to see a script that generate the sequence of number involved by the algorithm:

```import sys
from pylab import *

# take n from the command line
n = int(sys.argv)

i = 0
old_n = n
while n > 1:
if n%2 == 0: # if n is even
n = n/2
else:
n = 3*n+1
i += 1;
plot([i-1, i],[old_n,n],'-ob')
old_n = n

title('hailstone sequence for '+sys.argv+', length is '+str(i))
show()
```
The script will plot the sequence:
`\$ python collatz.py 25`

The Collatz conjecture is is also known as the 3n + 1 conjecture, the Ulam conjecture, Kakutani's problem, the Thwaites conjecture, Hasse's algorithm, or the Syracuse problem; the sequence of numbers involved is referred to as the hailstone sequence or hailstone numbers, or as wondrous numbers.

References

## Thursday, June 2, 2011

### Integration of a function using quad from scipy

Example of integration of a function using scipy:
```from scipy.integrate import quad

f = lambda x: 4-x*x # function to integrate
p = lambda x: (-x*x*x)/3 + 4*x # the primitive of f

# let's try to integrate f in the interval [0 2]
r_analytic = p(2.0)-p(0.0) # analytic integration

print 'Results'
print r
print r_analytic
```
Numerical and analytic result compared:
```Results
5.33333333333
5.33333333333
```

## Saturday, May 28, 2011

### Data fitting using fmin

We have seen already how to find the minimum of a function using fmin, in this example we will see how use it to fit a set of data with a curve minimizing an error function:
```from pylab import *
from numpy import *
from numpy.random import normal
from scipy.optimize import fmin

# parametric function, x is the independent variable
# and c are the parameters.
# it's a polynomial of degree 2
fp = lambda c, x: c+c*x+c*x*x
real_p = rand(3)

# error function to minimize
e = lambda p, x, y: (abs((fp(p,x)-y))).sum()

# generating data with noise
n = 30
x = linspace(0,1,n)
y = fp(real_p,x) + normal(0,0.05,n)

# fitting the data with fmin
p0 = rand(3) # initial parameter value
p = fmin(e, p0, args=(x,y))

print 'estimater parameters: ', p
print 'real parameters: ', real_p

xx = linspace(0,1,n*3)
plot(x,y,'bo', xx,fp(real_p,xx),'g', xx, fp(p,xx),'r')

show()
```
The following figure will be showed, in green the original curve used to generate the noisy data, in blue the noisy data and in red the curve found in the minimization process:

The parameters will be printed also:
```Optimization terminated successfully.
Current function value: 0.861885
Iterations: 77
Function evaluations: 146
estimater parameters:  [ 0.92504602  0.87328979  0.64051926]
real parameters:  [ 0.86284356  0.95994753  0.67643758]
```

## Friday, May 27, 2011

### Delaunay triangulation with matplotlib

How to plot the delaunay triangulation for a set of points in the plane using matplotlib:
```import matplotlib.delaunay as triang
import pylab
import numpy

# 10 random points (x,y) in the plane
x,y =  numpy.array(numpy.random.standard_normal((2,10)))
cens,edg,tri,neig = triang.delaunay(x,y)

for t in tri:
# t, t, t are the points indexes of the triangle
t_i = [t, t, t, t]
pylab.plot(x[t_i],y[t_i])

pylab.plot(x,y,'o')
pylab.show()
```
The output will be similar to this:

## Wednesday, May 25, 2011

### Pickling: How to serialize objects

There is an example on how to serialize and de-serialize python objects.
```import pickle
import random

print 'Write data...'
output = open('mydata.pkl', 'wb')
for i in range(0,5): # writing five lists on the file
list = random.sample(range(0,100),10) # random integers list
print 'saving:',list
pickle.dump(list, output)
output.close()

input = open("mydata.pkl","rb") # open the file in reading mode
try:
while True: # load from the file until EOF is reached
except EOFError:
print 'End of file reached'
input.close()
```
This is the output:
```Write data...
saving: [15, 10, 85, 65, 17, 31, 19, 2, 68, 44]
saving: [96, 2, 27, 90, 99, 66, 31, 97, 51, 12]
saving: [82, 61, 12, 49, 50, 84, 91, 83, 23, 89]
saving: [83, 28, 85, 75, 14, 68, 96, 58, 5, 66]
saving: [55, 73, 1, 24, 29, 92, 58, 96, 41, 10]
loaded:  [15, 10, 85, 65, 17, 31, 19, 2, 68, 44]
loaded:  [96, 2, 27, 90, 99, 66, 31, 97, 51, 12]
loaded:  [82, 61, 12, 49, 50, 84, 91, 83, 23, 89]
loaded:  [83, 28, 85, 75, 14, 68, 96, 58, 5, 66]
loaded:  [55, 73, 1, 24, 29, 92, 58, 96, 41, 10]
End of file reached
```

## Monday, May 23, 2011

### Four ways to compute the Google Pagerank

As described in THE \$25,000,000,000 EIGENVECTOR THE LINEAR ALGEBRA BEHIND GOOGLE, we can compute the score of a page on a web as the maximal eigenvector of the matrix where A is the scaled connectivity matrix of a web, S is an n × n matrix with all entries 1/n and m is a real number between 0 and 1.

Here's implemented four ways to compute the maximal eigenvector of the matrix using the numpy:
```from numpy import *

def powerMethodBase(A,x0,iter):
""" basic power method """
for i in range(iter):
x0 = dot(A,x0)
x0 = x0/linalg.norm(x0,1)
return x0

def powerMethod(A,x0,m,iter):
""" power method modified to compute
the maximal real eigenvector
of the matrix M built on top of the input matrix A """
n = A.shape
delta = m*(array(*n,dtype='float64')/n) # array(*n is [1 1 ... 1] n times
for i in range(iter):
x0 = dot((1-m),dot(A,x0)) + delta
return x0

def maximalEigenvector(A):
""" using the eig function to compute eigenvectors """
n = A.shape
w,v = linalg.eig(A)
return abs(real(v[:n,0])/linalg.norm(v[:n,0],1))

def linearEquations(A,m):
""" solving linear equations
of the system (I-(1-m)*A)*x = m*s """
n = A.shape
C = eye(n,n)-dot((1-m),A)
b = m*(array(*n,dtype='float64')/n)
return linalg.solve(C,b)

def getTeleMatrix(A,m):
""" return the matrix M
of the web described by A """
n = A.shape
S = ones((n,n))/n
return (1-m)*A+m*S

A = array([ [0,     0,     0,     1, 0, 1],
[1/2.0, 0,     0,     0, 0, 0],
[0,     1/2.0, 0,     0, 0, 0],
[0,     1/2.0, 1/3.0, 0, 0, 0],
[0,     0,     1/3.0, 0, 0, 0],
[1/2.0, 0,     1/3.0, 0, 1, 0 ] ])

n = A.shape # A is n x n
m = 0.15
M = getTeleMatrix(A,m)

x0 = *n
x1 = powerMethod(A,x0,m,130)
x2 = powerMethodBase(M,x0,130)
x3 = maximalEigenvector(M)
x4 = linearEquations(A,m)

# comparison of the four methods
labels = range(1,6)
print array([labels, x1, x2, x3, x4]).T
```

The matrix A used to the test the program describe the following web

The scores are (the first column show the labels):
```[[ 1.          0.32954577  0.32954577  0.32954577  0.32954577]
[ 2.          0.16505695  0.16505695  0.16505695  0.16505695]
[ 3.          0.0951492   0.0951492   0.0951492   0.0951492 ]
[ 4.          0.12210815  0.12210815  0.12210815  0.12210815]
[ 5.          0.05195894  0.05195894  0.05195894  0.05195894]
[ 6.          0.23618099  0.23618099  0.23618099  0.23618099]]
```