The Glowing Python: Visualizing correlation matrices

Friday, October 12, 2012

Visualizing correlation matrices

Correlation is one of the most common and useful statistics. A correlation coefficient is a single number that describes the degree of linear relationship between two variables. The function corrcoef provided by numpy returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are variables and whose columns are observations. Each element of the matrix R represents the correlation between two variables and is computed as

$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where cov(X,Y) is the covariance between X and Y, while σ_X and σ_Y are the standard deviations. If N is the number of variables, then R is an N-by-N matrix. When the number of variables is large, reading R as a table of numbers becomes impractical, so we need a way to visualize it. The following snippet uses a pseudocolor plot to visualize R:
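As a quick check of the formula, the sketch below rebuilds R from the covariance matrix and compares it with the output of corrcoef. The names X, C, sigma and R_manual, the fixed seed, and the small random test matrix are assumptions made for illustration and are not part of the original post.

```python
import numpy as np

# Illustrative data: 3 variables (rows) and 50 observations (columns).
rng = np.random.default_rng(0)
X = rng.random((3, 50))

# Covariance matrix; np.cov treats each row as a variable by default.
C = np.cov(X)

# Standard deviations are the square roots of the diagonal of C.
sigma = np.sqrt(np.diag(C))

# Divide each covariance by the product of the two standard deviations.
R_manual = C / np.outer(sigma, sigma)

# Compare with numpy's own result.
R_numpy = np.corrcoef(X)
print(np.allclose(R_manual, R_numpy))
```

Using the diagonal of the same covariance matrix for the standard deviations keeps the normalization consistent, so the diagonal of R_manual is exactly one.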

from numpy import corrcoef, sum, log, arange
from numpy.random import rand
from pylab import pcolor, show, colorbar, xticks, yticks

# generating some uncorrelated data
data = rand(10,100) # each row represents a variable

# creating correlation between the variables
# variable 2 is correlated with all the other variables
data[2,:] = sum(data,0)
# variable 4 is correlated with variable 8
data[4,:] = log(data[8,:])*0.5

# plotting the correlation matrix
R = corrcoef(data)
pcolor(R)
colorbar()
yticks(arange(0.5,10.5),range(0,10))
xticks(arange(0.5,10.5),range(0,10))
show()

The result should be as follows:

As expected, the correlation coefficients for variable 2 are higher than the others, and we observe a strong correlation between variables 4 and 8.
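By default, pcolor scales the colors to the smallest and largest values found in R, so the colorbar covers only the range present in the data rather than the full interval from -1 to 1. The following sketch is one possible variant, not the original author's code: it rebuilds the same data with a fixed seed, prints a few entries of R, and defines a hypothetical helper plot_correlation that pins the color range to [-1, 1]. The helper name, the seed, and the choice of the "RdBu_r" diverging colormap are all assumptions.

```python
import numpy as np


def build_data(seed=0):
    """Recreate the post's data set with a fixed seed (the seed is an assumption)."""
    rng = np.random.default_rng(seed)
    data = rng.random((10, 100))           # each row represents a variable
    data[2, :] = data.sum(axis=0)          # variable 2 depends on all variables
    data[4, :] = np.log(data[8, :]) * 0.5  # variable 4 depends on variable 8
    return data


def plot_correlation(R):
    """Pseudocolor plot of R with the color range pinned to [-1, 1]."""
    # Imported here so the numeric part runs even without matplotlib.
    import matplotlib.pyplot as plt

    n = R.shape[0]
    fig, ax = plt.subplots()
    mesh = ax.pcolor(R, cmap="RdBu_r", vmin=-1, vmax=1)
    fig.colorbar(mesh, ax=ax)
    # Put the tick labels at the center of each cell.
    ax.set_xticks(np.arange(n) + 0.5)
    ax.set_xticklabels([str(i) for i in range(n)])
    ax.set_yticks(np.arange(n) + 0.5)
    ax.set_yticklabels([str(i) for i in range(n)])
    return fig


R = np.corrcoef(build_data())
print("smallest coefficient:", R.min())
print("correlation between variables 4 and 8:", R[4, 8])
print("mean correlation of variable 2 with the others:",
      np.delete(R[2], 2).mean())
```

Calling plot_correlation(R) followed by matplotlib.pyplot.show() displays the figure. With a diverging colormap centered on zero, positive and negative correlations get different hues, which a sequential colormap would not show.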

15 comments:

Anonymous, October 12, 2012 at 4:25 PM
Don't use the jet colormap!

http://www.jwave.vt.edu/~rkriz/Projects/create_color_table/color_07.pdf

https://abandonmatlab.wordpress.com/2011/05/07/lets-talk-colormaps/

http://cresspahl.blogspot.com/2012/03/expanded-control-of-octaves-colormap.html

I think the hot colormap would be a better choice here
Jose, October 14, 2012 at 1:08 AM
In some cases, Hinton diagrams can be far more useful. See http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams
Anonymous, October 18, 2012 at 4:04 PM
hey,

i get a strange error when running the script:

/Users/xxx/src/matplotlib/lib/matplotlib/backends/backend_macosx.pyc in draw_quad_mesh(self, gc, master_transform, meshWidth, meshHeight, coordinates, offsets, offsetTrans, facecolors, antialiased, showedges)
98 facecolors,
99 antialiased,
--> 100 showedges)
101
102 def new_gc(self):

"only length-1 arrays can be converted to Python scalars"

also, the colorbar is not visible
what to do?
JustGlowing, October 18, 2012 at 4:31 PM
which version of matplotlib/python are you using?
Anonymous, October 18, 2012 at 4:41 PM
hey,

i'm using Python 2.7.3 and matplotlib '1.2.x' on os x.
btw: if i leave out the colorbar command the error doesn't show up.
Anonymous, October 18, 2012 at 6:06 PM
hello again.

actually, i dont know why i had this unstable version installed.
i used pip to install the stable 1.1.1 version and now it works like a charm.

thanks for the fast reply and keep up the good work here :)
Magnus Hummelgård, December 3, 2012 at 7:32 PM
I like the correlation example and will try that later on some of my data. It is also cool that we use the same theme on blogger. /Magnus
Unknown, June 9, 2013 at 3:22 AM
Love this blog. Here's the same matrix made in Plotly: http://on.fb.me/14oU6ej
Different colormap and 20 instead of 10 rows.
Anonymous, February 5, 2015 at 9:39 PM
I found it difficult to get result for 288 rows by 1000 columns, Any suggestion????
Anonymous, April 28, 2016 at 4:37 AM
Thanks a lot for this! very helpful!
Just one question why is the correlation coeff range going from -0.15 to 1 and not from -1 to 1 ?