Showing posts with label network analysis. Show all posts
Showing posts with label network analysis. Show all posts

Friday, May 3, 2013

A new RefCard from the GlowingPython!

Check out the DZone RefCard from the GlowingPython:


This Refcard is a collection of code examples that introduces the reader to the principal Data Mining tasks using Python. In the RefCard you will find the following contents:
  • How to import and visualize data.
  • How to classify and cluster data.
  • How to discover relationships in the data using regression and correlation measures.
  • How to reduce the dimensionality of the data in order to compress and visualize the information it brings.
  • How to analyze structured data with networkx.
Each topic is covered with code examples based on four of the major Python libraries for data analysis and manipulation: numpy, matplotlib,sklearn and networkx. Here is a preview of the first two pages:


Click on the preview to get the RefCard!

Wednesday, February 6, 2013

Betweenness Centrality

In Network Analysis the identification of important nodes is a common task. We have various centrality measures that we can use and in this post we will focus on the Betweenness Centrality. We will see how this measure is computed and how to use the library networkx in order to create a visualization of the network where the nodes with the highest betweenness are highlighted. The betweenness focuses on the number of visits through the shortests paths. If a walker moves from one node to another node via the shortests path, then the nodes with a large number of visits have a higher centrality. The betweenness centrality is defined as


where s(s,t) is total number of shortest paths from node s to node t and sv(s,t) is the number of those paths that pass through v.
Let's see how to compute the betweenness with networkx. As first step we have to load a sample network (yes, it's the same of this post):
# read the graph (gml format)
G = nx.read_gml('lesmiserables.gml',relabel=True)
Now we have a representation G of our network and we can use the function betweenness_centrality() to compute the centrality of each node. This function returns a list of tuples, one for each node, and each tuple contains the label of the node and the centrality value. We can use this information in order to trim the original network and keep only the most important nodes:
def most_important(G):
 """ returns a copy of G with
     the most important nodes
     according to the pagerank """ 
 ranking = nx.betweenness_centrality(G).items()
 print ranking
 r = [x[1] for x in ranking]
 m = sum(r)/len(r) # mean centrality
 t = m*3 # threshold, we keep only the nodes with 3 times the mean
 Gt = G.copy()
 for k, v in ranking:
  if v < t:
   Gt.remove_node(k)
 return Gt

Gt = most_important(G) # trimming
And we can use the original network and the trimmed one to visualize the network as follows:
from pylab import show
# create the layout
pos = nx.spring_layout(G)
# draw the nodes and the edges (all)
nx.draw_networkx_nodes(G,pos,node_color='b',alpha=0.2,node_size=8)
nx.draw_networkx_edges(G,pos,alpha=0.1)

# draw the most important nodes with a different style
nx.draw_networkx_nodes(Gt,pos,node_color='r',alpha=0.4,node_size=254)
# also the labels this time
nx.draw_networkx_labels(Gt,pos,font_size=12,font_color='b')
show()
The graph should be like this one:


This graph is pretty interesting, indeed it highlights the nodes which are very influential on the way the information spreads over the network. In the sample network we used each node represents a character and the connection between two characters represent the coappearance in the same chapter of the book 'Les miserable'. Looking at the graph we can easily say what are the most important characters according to the Betweenness Centrality. We can also observe some interesting situations like the ones of Valjean and Myriel. They are to connected to groups of characters who don't have a direct connection with the main ones.

Saturday, November 17, 2012

First steps with networkx

One of my favorite topics is the study of structures and, inspired by the presentation of Jacqueline Kazil and Dana Bauer at PyCon US, I started to use networkx in order to analyze some networks. This library provides a lot facilities for the creation, the visualization and the mining of structured data. So, I decided to write this post that shows the first steps to start with it. We will see how to load a network from the gml format and how to prune the network in order to visualize only the nodes with a high degree. In the following examples the coappearance network of characters in the novel Les Miserables, freely available here, will be used. In this network each node represents a character and the connection between two characters represent the coappearance in the same chapter.

Let's start with the snippets. We can load and visualize the network with the following code:
# read the graph (gml format)
G = nx.read_gml('lesmiserables.gml',relabel=True)

# drawing the full network
figure(1)
nx.draw_spring(G,node_size=0,edge_color='b',alpha=.2,font_size=10)
show()
This should be the result:


It's easy to see that the graph is not really helpful. Most of the details of the network are still hidden and it's impossible to understand which are the most important nodes. Let's plot an histogram of the number of connections per node:
# distribution of the degree
figure(2)
d = nx.degree(G)
hist(d.values(),bins=15)
show()
The result should be as follows:


Looking at this histogram we can see that only few characters have more than ten connections. Then, we decide to visualize only them:
def trim_nodes(G,d):
 """ returns a copy of G without 
     the nodes with a degree less than d """
 Gt = G.copy()
 dn = nx.degree(Gt)
 for n in Gt.nodes():
  if dn[n] <= d:
   Gt.remove_node(n)
 return Gt

# drawing the network without
# nodes with degree less than 10
Gt = trim_nodes(G,10)
figure(3)
nx.draw(Gt,node_size=0,node_color='w',edge_color='b',alpha=.2)
show()
In the graph below we can see the final result of the analysis. This time the graph makes us able to observe which are the most relevant characters and how they are related to each other according to their coappearance through the chapters.