Thursday, August 16, 2012

Kernel Density Estimation with scipy

This post continues the last one, where we saw how to fit two types of distribution functions (Normal and Rayleigh). This time we will see how to use Kernel Density Estimation (KDE) to estimate the probability density function. KDE is a non-parametric technique for density estimation in which a known density function (the kernel) is averaged across the observed data points to create a smooth approximation. Given the non-parametric nature of KDE, the estimator has no fixed functional form; it depends on all the data points used for the estimation. Let's see the snippet:

from scipy.stats import gaussian_kde, norm
from numpy import linspace, hstack
from pylab import plot, show, hist

# creating data with two peaks
sampD1 = norm.rvs(loc=-1.0, scale=1, size=300)
sampD2 = norm.rvs(loc=2.0, scale=0.5, size=300)
samp = hstack([sampD1, sampD2])

# obtaining the pdf (my_pdf is a function!)
my_pdf = gaussian_kde(samp)

# plotting the result
x = linspace(-5, 5, 100)
plot(x, my_pdf(x), 'r')  # estimated density
hist(samp, density=True, alpha=.3)  # normalized histogram of the data
show()

The result should be as follows:
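The smoothness of the estimate is controlled by the kernel bandwidth. As a minimal sketch (reusing samp and x from the snippet above; the scalar values 0.5 and 0.05 passed to bw_method are arbitrary choices for illustration), we can compare the default Scott's rule bandwidth with a wider and a narrower one:

from scipy.stats import gaussian_kde
from pylab import plot, show, legend

# samp and x are the ones defined in the snippet above
default_pdf = gaussian_kde(samp)                 # Scott's rule (default)
wide_pdf = gaussian_kde(samp, bw_method=0.5)     # oversmoothed
narrow_pdf = gaussian_kde(samp, bw_method=0.05)  # undersmoothed

plot(x, default_pdf(x), 'r', label='default')
plot(x, wide_pdf(x), 'g', label='bw_method=0.5')
plot(x, narrow_pdf(x), 'b', label='bw_method=0.05')
legend()
show()

A bandwidth that is too small produces a spiky estimate that follows the noise in the data, while one that is too large hides the two peaks.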
Friday, July 20, 2012
Distribution fitting with scipy
Distribution fitting is the procedure of selecting a statistical distribution that best fits a dataset generated by some random process. In this post we will see how to fit a distribution using the techniques implemented in the Scipy library.
This is the first snippet:
from scipy.stats import norm
from numpy import linspace
from pylab import plot, show, hist, figure, title

# picking 150 samples from a normal distribution
# with mean 0 and standard deviation 1
samp = norm.rvs(loc=0, scale=1, size=150)

param = norm.fit(samp)  # distribution fitting
# now, param[0] and param[1] are the mean and
# the standard deviation of the fitted distribution

x = linspace(-5, 5, 100)
# fitted distribution
pdf_fitted = norm.pdf(x, loc=param[0], scale=param[1])
# original distribution
pdf = norm.pdf(x)

title('Normal distribution')
plot(x, pdf_fitted, 'r-', x, pdf, 'b-')
hist(samp, density=True, alpha=.3)
show()

The result should be as follows:
In the code above, a dataset of 150 samples is created from a normal distribution with mean 0 and standard deviation 1, then a fitting procedure is applied to the data. In the figure we can see the original distribution (blue curve) and the fitted distribution (red curve), and we can observe that they are very similar. Let's do the same with a Rayleigh distribution:
from scipy.stats import rayleigh
from numpy import linspace
from pylab import plot, show, hist, title

samp = rayleigh.rvs(loc=5, scale=2, size=150)  # samples generation

param = rayleigh.fit(samp)  # distribution fitting

x = linspace(5, 13, 100)
# fitted distribution
pdf_fitted = rayleigh.pdf(x, loc=param[0], scale=param[1])
# original distribution
pdf = rayleigh.pdf(x, loc=5, scale=2)

title('Rayleigh distribution')
plot(x, pdf_fitted, 'r-', x, pdf, 'b-')
hist(samp, density=True, alpha=.3)
show()

The resulting plot:
As expected, the two distributions are very close.
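To go beyond the visual comparison, we can quantify the agreement between the data and the fitted distribution. Here is a minimal sketch using the Kolmogorov-Smirnov test from scipy.stats, reusing samp and param from the Rayleigh snippet above:

from scipy.stats import kstest

# null hypothesis: samp was drawn from the fitted Rayleigh distribution
D, p_value = kstest(samp, 'rayleigh', args=param)
print('KS statistic:', D)
print('p-value:', p_value)

A small KS statistic (and a large p-value) indicates a good agreement. Note that, strictly speaking, the p-value is optimistic here because the parameters were estimated from the same data used for the test.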
Sunday, July 8, 2012
Color quantization
The aim of color clustering is to produce a small set of representative colors that captures the color properties of an image. Using the small set of colors found by the clustering, a quantization process can be applied to the image to obtain a new version that has been "simplified" in both colors and shapes.
In this post we will see how to use the K-Means algorithm to perform color clustering and how to apply the quantization. Let's see the code:
from pylab import imread, imshow, figure, show, subplot
from numpy import reshape, uint8
from scipy.cluster.vq import kmeans, vq

img = imread('clearsky.jpg')

# reshaping the pixels matrix and converting to float for kmeans
pixel = reshape(img, (img.shape[0]*img.shape[1], 3)).astype(float)

# performing the clustering
centroids, _ = kmeans(pixel, 6)  # six colors will be found

# quantization: each pixel is mapped to its closest centroid
qnt, _ = vq(pixel, centroids)

# reshaping the result of the quantization
centers_idx = reshape(qnt, (img.shape[0], img.shape[1]))
clustered = centroids[centers_idx].astype(uint8)

figure(1)
subplot(211)
imshow(img)
subplot(212)
imshow(clustered)
show()

The result should be as follows:
We have the original image on the top and the quantized version on the bottom. We can see that the image on the bottom has only six colors. Now, we can plot the colors found with the clustering in the RGB space with the following code:
# visualizing the centroids in the RGB space
from mpl_toolkits.mplot3d import Axes3D
from pylab import figure, show

fig = figure(2)
ax = fig.add_subplot(projection='3d')
ax.scatter(centroids[:, 0], centroids[:, 1], centroids[:, 2],
           c=centroids/255., s=100)
show()

And this is the result:
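The number of colors (six here) was chosen arbitrarily. As a minimal sketch, reusing the pixel array from the first snippet, we can compare the average distortion returned by kmeans for a few values of k and look for the point where adding more colors stops paying off:

from scipy.cluster.vq import kmeans
from pylab import plot, show, xlabel, ylabel

# pixel is the (number of pixels, 3) float array from the first snippet
ks = range(2, 11)
distortions = []
for k in ks:
    # kmeans also returns the mean distance between the
    # observations and their closest centroid
    _, distortion = kmeans(pixel, k)
    distortions.append(distortion)

plot(list(ks), distortions, 'o-')
xlabel('number of colors k')
ylabel('average distortion')
show()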