Tuesday, January 14, 2014

Review: Fundamentals of Data Analytics in Python

I use Python heavily for data analysis, so when I was offered the chance to review the video tutorial “Fundamentals of Data Analytics in Python LiveLessons”, I couldn't refuse.

The tutorial starts from the basics, showing how to install Python and its data analysis libraries. It then covers the main tasks that data scientists and engineers carry out during an analysis: importing and cleaning data, vectorized computation, visualization and data summarization.

Most of the videos are narrated IPython notebook sessions, sometimes supported by slides. The authors explain in depth how to use the data manipulation libraries (NumPy, SciPy and pandas), while they only summarize the potential of the complementary libraries. In particular, the last video is a survey of various visualization tools.

In conclusion, this video tutorial provides a solid introduction to the main tools for data analysis in Python and a clear view of the open source Python tools relevant to scientific and engineering programming. It seems ideal both for people who need to learn the technical methods of data analysis and for people who already know Python and want to acquire data analysis skills.

Thursday, December 12, 2013

Multiple axes and subplots in Plotly

Some time ago we saw how to visualize 2D histograms with Plotly. In this post we will see how to use one of the most interesting new features introduced by the Plotly guys: multiple axes and subplots. This feature lets us link subplots, so that when you zoom or pan in one subplot, the other subplots zoom and pan too, just like the graphs produced by D3.
Here's how to plot a matrix of subplots where each cell is a scatterplot of two features of the Iris dataset that we already used here. The first thing we need to do is to convert the data into the format required by the Plotly API:
from sklearn.datasets import load_iris
iris = load_iris()

attr = [f.replace(' (cm)', '') for f in iris.feature_names]
colors = {'setosa': 'rgb(31, 119, 180)', 
          'versicolor': 'rgb(255, 127, 14)', 
          'virginica': 'rgb(44, 160, 44)'}

data = []
for i in range(4):
    for j in range(4):
        for t,flower in enumerate(iris.target_names):
            data.append({"name": flower, 
                         "x": iris.data[iris.target == t,i],
                         "y": iris.data[iris.target == t,j],
                         "type":"scatter", "mode":"markers",
                         'marker': {'color': colors[flower], 
                                    'opacity':0.7},
                         "xaxis": "x"+(str(i) if i!=0 else ''),
                         "yaxis": "y"+(str(j) if j!=0 else '')})
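To make the axis assignment in the loop above explicit, here is a minimal sketch that reproduces only the naming logic, without Plotly or scikit-learn. The names `axis_ref` and `trace_axes` are hypothetical helpers introduced for illustration; the naming scheme itself (`x`, `x1`, `x2`, `x3`) is the one used in the code above.

```python
def axis_ref(prefix, index):
    """Return the axis reference of a trace: 'x' for index 0, then 'x1', 'x2', ..."""
    return prefix + (str(index) if index != 0 else '')


def trace_axes(n_features, n_classes):
    """List the (xaxis, yaxis) pair of every trace, in the order the loop creates them."""
    pairs = []
    for i in range(n_features):          # feature plotted on the x axis
        for j in range(n_features):      # feature plotted on the y axis
            for _ in range(n_classes):   # one trace per species in each cell
                pairs.append((axis_ref('x', i), axis_ref('y', j)))
    return pairs


pairs = trace_axes(n_features=4, n_classes=3)
print(len(pairs))        # total number of traces
print(len(set(pairs)))   # number of distinct cells in the matrix
```

All the traces that plot feature i horizontally point at the same x axis, and all the traces that plot feature j vertically point at the same y axis. This sharing is what should make a zoom or pan in one cell propagate to the other cells that use the same axis.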
Then, we create a layout to adjust the look and feel:
d = 0.04  # padding between adjacent subplots
dms = [[i*d+i*(1-3*d)/4,i*d+((i+1)*(1-3*d)/4)] for i in range(4)]

layout = {
    "xaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "yaxis":{"domain":dms[0], "title":attr[0], 
             'zeroline':False,'showline':False},
    "xaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "yaxis1":{"domain":dms[1], "title":attr[1], 
              'zeroline':False,'showline':False},
    "xaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "yaxis2":{"domain":dms[2], "title":attr[2], 
              'zeroline':False,'showline':False},
    "xaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "yaxis3":{"domain":dms[3], "title":attr[3], 
              'zeroline':False,'showline':False},
    "showlegend":False,
    "width": 500,
    "height": 550,
    "title":"Iris flower data set",
    "titlefont":{'color':'rgb(67,67,67)', 'size': 20}
    }
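The eight axis entries above differ only in their index, so they could also be generated in a loop. The following is a sketch under that assumption: `axis_domains` and `build_axes` are hypothetical helper names, and the titles are written out by hand, assuming they match the values of `attr` computed earlier. The domain formula is the same one used for `dms`.

```python
def axis_domains(n, pad):
    """Split [0, 1] into n equal intervals separated by gaps of width pad."""
    width = (1 - (n - 1) * pad) / n
    return [[i * (pad + width), i * (pad + width) + width] for i in range(n)]


def build_axes(titles, pad=0.04):
    """Create one xaxis/yaxis pair of layout entries per feature."""
    axes = {}
    domains = axis_domains(len(titles), pad)
    for i, (domain, title) in enumerate(zip(domains, titles)):
        suffix = str(i) if i != 0 else ''   # 'xaxis', 'xaxis1', 'xaxis2', ...
        for prefix in ('xaxis', 'yaxis'):
            axes[prefix + suffix] = {'domain': list(domain), 'title': title,
                                     'zeroline': False, 'showline': False}
    return axes


# Assumed to be the values of attr from the first snippet.
titles = ['sepal length', 'sepal width', 'petal length', 'petal width']
axes = build_axes(titles)
for name in sorted(axes):
    print(name, [round(v, 2) for v in axes[name]['domain']])
```

The resulting dictionary can be merged with the remaining layout settings (`showlegend`, `width`, `height`, `title`, `titlefont`). With a padding of 0.04, each of the four axes takes 0.22 of the plotting area, and the last domain ends at 1.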
Finally, we import the plotly module (see this page for more details about the installation) and we are ready to invoke the Plotly remote service:
import plotly
p = plotly.plotly('supersexyusername', 'mysecretkey')
# iplot shows the graph in the ipython notebook
# use plot if you're outside of the notebook
p.iplot(data,layout=layout, width=500,height=550)
The result is an interactive graph of the Iris data set, inspired by this wonderful D3 example by Mike Bostock. You can find more examples of Plotly visualizations in Python in the IPython notebook here.

Thursday, November 7, 2013

Book review: Learning IPython for Interactive Computing and Data Visualization

I use IPython almost every day, so I am very happy to review Learning IPython for Interactive Computing and Data Visualization by Cyrille Rossant, published by Packt Publishing.


The book introduces the IPython basics and then focuses on how to combine IPython with some of the most useful libraries for data analysis, such as NumPy, Matplotlib, Basemap and pandas. Every topic is covered with examples, and the code presented is also available online. The references proposed are always up to date and give the reader the opportunity to discover resources not covered in the book.

Favorite chapter

Chapter 5 is a little gem. Here you can find an introduction to writing high performance code with IPython, through Cython and IPython's parallel programming facilities. The attention the author pays to writing efficient code is remarkable.

Conclusions

This book definitely achieves its goal of providing a technical introduction to IPython. It is intended for Python users who want an easy-to-follow introduction to IPython, but experienced users will also find it useful. It is worth noting that, at the moment, this is the only book about IPython.