Showing posts with label google. Show all posts
Showing posts with label google. Show all posts

Monday, May 23, 2011

Four ways to compute the Google Pagerank

As described in THE $25,000,000,000 EIGENVECTOR THE LINEAR ALGEBRA BEHIND GOOGLE, we can compute the score of a page on a web as the maximal eigenvector of the matrix

Google

where A is the scaled connectivity matrix of a web, S is an n × n matrix with all entries 1/n and m is a real number between 0 and 1.

Here's implemented four ways to compute the maximal eigenvector of the matrix using the numpy:
from numpy import *

def powerMethodBase(A,x0,iter):
 """ basic power method """
 for i in range(iter):
  x0 = dot(A,x0)
  x0 = x0/linalg.norm(x0,1)
 return x0

def powerMethod(A,x0,m,iter):
 """ power method modified to compute
     the maximal real eigenvector 
     of the matrix M built on top of the input matrix A """
 n = A.shape[1]
 delta = m*(array([1]*n,dtype='float64')/n) # array([1]*n is [1 1 ... 1] n times
 for i in range(iter):
  x0 = dot((1-m),dot(A,x0)) + delta
 return x0

def maximalEigenvector(A):
 """ using the eig function to compute eigenvectors """
 n = A.shape[1]
 w,v = linalg.eig(A)
 return abs(real(v[:n,0])/linalg.norm(v[:n,0],1))

def linearEquations(A,m):
 """ solving linear equations 
     of the system (I-(1-m)*A)*x = m*s """
 n = A.shape[1]
 C = eye(n,n)-dot((1-m),A)
 b = m*(array([1]*n,dtype='float64')/n)
 return linalg.solve(C,b)

def getTeleMatrix(A,m):
 """ return the matrix M
     of the web described by A """
 n = A.shape[1]
 S = ones((n,n))/n
 return (1-m)*A+m*S

A = array([ [0,     0,     0,     1, 0, 1],
            [1/2.0, 0,     0,     0, 0, 0],
            [0,     1/2.0, 0,     0, 0, 0],
            [0,     1/2.0, 1/3.0, 0, 0, 0],
            [0,     0,     1/3.0, 0, 0, 0],
            [1/2.0, 0,     1/3.0, 0, 1, 0 ] ])

n = A.shape[1] # A is n x n
m = 0.15
M = getTeleMatrix(A,m)

x0 = [1]*n
x1 = powerMethod(A,x0,m,130)
x2 = powerMethodBase(M,x0,130)
x3 = maximalEigenvector(M)
x4 = linearEquations(A,m)

# comparison of the four methods
labels = range(1,6)
print array([labels, x1, x2, x3, x4]).T

The matrix A used to the test the program describe the following web


The scores are (the first column show the labels):
[[ 1.          0.32954577  0.32954577  0.32954577  0.32954577]
 [ 2.          0.16505695  0.16505695  0.16505695  0.16505695]
 [ 3.          0.0951492   0.0951492   0.0951492   0.0951492 ]
 [ 4.          0.12210815  0.12210815  0.12210815  0.12210815]
 [ 5.          0.05195894  0.05195894  0.05195894  0.05195894]
 [ 6.          0.23618099  0.23618099  0.23618099  0.23618099]]

Monday, May 2, 2011

How to create a chart with Google Chart API

The example shows how to create a scatter plot using the Google Chart API.
import random
import urllib

def list2String(x):
 """ from a list like [1,2,5]
     return a string like '1,2,5' """
 data = ""
 for i in x:
  data += str(i)+","
 return data[0:len(data)-1]

def makeChart(x,y,filename):
 query_url = "http://chart.apis.google.com/chart?chxt=x,y&chs=300x200&cht=s&chd=t:"
 query_url += list2String(x)+"|"+list2String(y)
 chart = urllib.urlopen(query_url) # retrieve the chart
 print "saving",query_url
 f = open(filename,"wb")
 f.write(chart.read()) # save the pic
 f.close()

x = random.sample(range(0,100),10) # list with
y = random.sample(range(0,100),10) # random values in [0 100[
makeChart(x,y,"chart.png")
You can embed the picture in a web page:
<img alt="Google chart example" src="http://chart.apis.google.com/chart?chxt=x,y&amp;chs=300x200&amp;cht=s&amp;chd=t:64,10,18,42,49,83,73,27,44,51|77,89,13,87,27,34,38,44,22,42" />
Or use it from the disk.

Google chart example