where cov(X,Y) is the covariance between X and Y, while σX and σY are their standard deviations. If N is the number of variables, then R is an N-by-N matrix. When the number of variables is large, we need a way to visualize R. The following snippet uses a pseudocolor plot to visualize R:
from numpy import corrcoef, sum, log, arange
from numpy.random import rand
from pylab import pcolor, show, colorbar, xticks, yticks

# generating some uncorrelated data
data = rand(10,100) # each row represents a variable

# creating correlation between the variables
# variable 2 is correlated with all the other variables
data[2,:] = sum(data,0)
# variable 4 is correlated with variable 8
data[4,:] = log(data[8,:])*0.5

# plotting the correlation matrix
R = corrcoef(data)
pcolor(R)
colorbar()
yticks(arange(0.5,10.5),range(0,10))
xticks(arange(0.5,10.5),range(0,10))
show()

The result should be as follows:
As expected, the correlation coefficients for variable 2 are higher than the others, and we observe a strong correlation between variables 4 and 8.
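To connect the definition above with what corrcoef returns, here is a minimal sketch that computes the coefficient for a single pair of variables by hand and compares it with the corresponding entry of the matrix. The seed, the variable names and the way y is built are assumptions made only for illustration.

```python
import numpy as np

# fixed seed so the run is reproducible (an assumption, not part of the post)
rng = np.random.default_rng(0)
x = rng.random(100)
y = 2.0 * x + rng.random(100)  # y depends partly on x

# covariance and standard deviations, all using the same normalisation
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r_manual = cov_xy / (x.std() * y.std())

# corrcoef returns the full 2-by-2 matrix; entry [0, 1] is r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)
```

The two printed values should agree up to floating point rounding, since the normalisation factor cancels in the ratio.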
Don't use the jet colormap!
http://www.jwave.vt.edu/~rkriz/Projects/create_color_table/color_07.pdf
https://abandonmatlab.wordpress.com/2011/05/07/lets-talk-colormaps/
http://cresspahl.blogspot.com/2012/03/expanded-control-of-octaves-colormap.html
I think the hot colormap would be a better choice here
In some cases, Hinton diagrams can be far more useful. See http://www.scipy.org/Cookbook/Matplotlib/HintonDiagrams
hey,
i get a strange error when running the script:
/Users/xxx/src/matplotlib/lib/matplotlib/backends/backend_macosx.pyc in draw_quad_mesh(self, gc, master_transform, meshWidth, meshHeight, coordinates, offsets, offsetTrans, facecolors, antialiased, showedges)
98 facecolors,
99 antialiased,
--> 100 showedges)
101
102 def new_gc(self):
"only length-1 arrays can be converted to Python scalars"
also, the colorbar is not visible
what to do?
which version of matplotlib/python are you using?
hey,
i'm using Python 2.7.3 and matplotlib '1.2.x' on os x.
btw: if i leave out the colorbar command the error doesn't show up.
I use matplotlib 1.1.1rc.
hello again.
actually, i don't know why i had this unstable version installed.
i used pip to install the stable 1.1.1 version and now it works like a charm.
thanks for the fast reply and keep up the good work here :)
I like the correlation example and will try that later on some of my data. It is also cool that we use the same theme on blogger. /Magnus
Thanks Magnus. I like this theme because it's simple. If you're interested in matrix visualization don't forget to try Hinton diagrams also.
Love this blog. Here's the same matrix made in Plotly: http://on.fb.me/14oU6ej
Different colormap and 20 instead of 10 rows.
You should force 0 to be white dude, otherwise it's great.
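For readers who want to try the colormap suggestions made in these comments, here is a rough sketch of one way to do it with matplotlib: a diverging colormap with the color limits fixed symmetrically at -1 and 1, so that zero falls in the middle of the scale. The helper name plot_corr and the choice of the 'bwr' colormap are assumptions for illustration, not something taken from the post.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_corr(R, cmap="bwr"):
    """Hypothetical helper: draw R with a diverging colormap centred on 0."""
    fig, ax = plt.subplots()
    # symmetric limits put 0 in the middle of the colormap (white for 'bwr')
    mesh = ax.pcolormesh(R, cmap=cmap, vmin=-1, vmax=1)
    fig.colorbar(mesh, ax=ax)
    # place one tick at the centre of each cell, labelled with the variable index
    n = R.shape[0]
    centres = np.arange(0.5, n + 0.5)
    labels = [str(i) for i in range(n)]
    ax.set_xticks(centres)
    ax.set_xticklabels(labels)
    ax.set_yticks(centres)
    ax.set_yticklabels(labels)
    return mesh

data = np.random.rand(10, 100)  # each row represents a variable
mesh = plot_corr(np.corrcoef(data))
```

Calling plt.show() afterwards displays the figure. With fixed limits the colorbar always spans -1 to 1, even when the data contain no strong negative correlation.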
I found it difficult to get result for 288 rows by 1000 columns, Any suggestion????
Thanks a lot for this! very helpful!
Just one question why is the correlation coeff range going from -0.15 to 1 and not from -1 to 1 ?
Hi, correlation is between -1 and 1. When it's 1 it means that the two variables linearly increase at the same time and it is maximum when we compare a variable with itself (see the values on the diagonal). When it's -1 the correlation is still maximum but negative, it means that when one variable increases, the other decreases. We don't reach -1 because this doesn't happen in the variables we considered.
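A small sketch of the three situations described above, using made-up data (the seed and the coefficients -3.0 and 2.0 are arbitrary assumptions): a variable compared with itself, a variable compared with an exactly decreasing linear function of itself, and two independent variables.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
x = rng.random(100)
noise = rng.random(100)         # drawn independently of x

# a variable against itself: the values found on the diagonal of R
r_self = np.corrcoef(x, x)[0, 1]
# when x increases, -3*x + 2 decreases exactly linearly
r_neg = np.corrcoef(x, -3.0 * x + 2.0)[0, 1]
# unrelated variables: some small value, positive or negative
r_indep = np.corrcoef(x, noise)[0, 1]

print(r_self, r_neg, r_indep)
```

The first value should be 1 and the second -1, up to rounding, while the third should be small in magnitude. None of the variables in the post decreases linearly with another one, which is why the colorbar there stops well above -1.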