from scipy.stats import gaussian_kde, norm
from numpy import linspace, hstack
from pylab import plot, show, hist

# creating data with two peaks
sampD1 = norm.rvs(loc=-1.0, scale=1, size=300)
sampD2 = norm.rvs(loc=2.0, scale=0.5, size=300)
samp = hstack([sampD1, sampD2])

# obtaining the pdf (my_pdf is a function!)
my_pdf = gaussian_kde(samp)

# plotting the result
x = linspace(-5, 5, 100)
plot(x, my_pdf(x), 'r')  # distribution function
# density=True replaces the old normed=1 argument in current matplotlib
hist(samp, density=True, alpha=.3)  # histogram
show()

The result should be as follows:
Thursday, August 16, 2012
Kernel Density Estimation with scipy
This post continues the previous one, where we saw how to fit two types of distribution functions (Normal and Rayleigh). This time we will see how to use Kernel Density Estimation (KDE) to estimate the probability density function. KDE is a non-parametric technique for density estimation in which a known density function (the kernel) is averaged across the observed data points to create a smooth approximation. Given the non-parametric nature of KDE, the estimator has no fixed functional form; it depends on all the data points used for the estimation. Let's see the snippet:
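As a side note to the post's snippet, the "kernel averaged across the data points" idea can also be written out by hand. The sketch below is only an illustration: the helper name manual_kde and the seeded random generator are made up for this example, and it assumes a Gaussian kernel whose width h is the sample standard deviation scaled by the factor attribute of gaussian_kde, which is how the library appears to choose its bandwidth for one-dimensional data.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm


def manual_kde(points, data, h):
    """Average a Gaussian kernel of width h centred on each data point.

    manual_kde is a hypothetical helper written for this illustration.
    """
    # One row per evaluation point, one column per observed data point.
    z = (points[:, None] - data[None, :]) / h
    # Mean over the data points of the scaled kernel (1/h) * K(z).
    return norm.pdf(z).mean(axis=1) / h


# Two-peaked sample similar to the one in the post (seed chosen arbitrarily).
rng = np.random.default_rng(0)
data = np.hstack([rng.normal(-1.0, 1.0, 300), rng.normal(2.0, 0.5, 300)])
x = np.linspace(-5, 5, 100)

kde = gaussian_kde(data)
# Assumption: the kernel width is the bandwidth factor times the sample
# standard deviation (ddof=1) for one-dimensional data.
h = kde.factor * data.std(ddof=1)
density = manual_kde(x, data, h)
```

Under that assumption, density should agree with kde(x) up to floating-point error. The estimate is just a sum of small Gaussian bumps, one per observation, so its shape is driven entirely by the data and by h.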