This is the first snippet:

from scipy.stats import norm from numpy import linspace from pylab import plot,show,hist,figure,title # picking 150 of from a normal distrubution # with mean 0 and standard deviation 1 samp = norm.rvs(loc=0,scale=1,size=150) param = norm.fit(samp) # distribution fitting # now, param[0] and param[1] are the mean and # the standard deviation of the fitted distribution x = linspace(-5,5,100) # fitted distribution pdf_fitted = norm.pdf(x,loc=param[0],scale=param[1]) # original distribution pdf = norm.pdf(x) title('Normal distribution') plot(x,pdf_fitted,'r-',x,pdf,'b-') hist(samp,normed=1,alpha=.3) show()The result should be as follows

In the code above a dataset of 150 samples have been created using a normal distribution with mean 0 and standar deviation 1, then a fitting procedure have been applied on the data. In the figure we can see the original distribution (blue curve) and the fitted distribution (red curve) and we can observe that they are really similar.

Let's do the same with a Rayleigh distribution:

from scipy.stats import norm,rayleigh samp = rayleigh.rvs(loc=5,scale=2,size=150) # samples generation param = rayleigh.fit(samp) # distribution fitting x = linspace(5,13,100) # fitted distribution pdf_fitted = rayleigh.pdf(x,loc=param[0],scale=param[1]) # original distribution pdf = rayleigh.pdf(x,loc=5,scale=2) title('Rayleigh distribution') plot(x,pdf_fitted,'r-',x,pdf,'b-') hist(samp,normed=1,alpha=.3) show()The resulting plot:

As expected, the two distributions are very close.

or you could plug your samples into http://zunzun.com/ :D

ReplyDeleteThe actual direct link would be:

ReplyDeletehttp://zunzun.com/StatisticalDistributions/1/

Hurray! Been missing Glowing Python posts. Happy to see a new one, learn something new.

ReplyDeleteI think, this does nothing else than calculating the mean and standard deviation of samp:

ReplyDelete>>> samp = norm.rvs(loc=0,scale=1,size=150)

>>> param = norm.fit(samp)

>>> mu = np.mean(samp)

>>> sigma = np.std(samp)

>>> mu==param[0]

True

>>> sigma==param[1]

True

>>>

According to the scipy documentation it should perform a Maximum Likelihood Estimate.

DeleteFor the normal distribution, the sample mean ( which is what np.mean() calculates ) is the maximum likelihood estimator of the population ( parametric ) mean. This is not true of all distributions, though.

DeleteIf it helps, some code for doing this w/o normalizing, which plots the gaussian fit over the real histogram:

ReplyDeletefrom scipy.stats import norm

from numpy import linspace

from pylab import plot,show,hist

def PlotHistNorm(data, log=False):

# distribution fitting

param = norm.fit(data)

mean = param[0]

sd = param[1]

#Set large limits

xlims = [-6*sd+mean, 6*sd+mean]

#Plot histogram

histdata = hist(data,bins=12,alpha=.3,log=log)

#Generate X points

x = linspace(xlims[0],xlims[1],500)

#Get Y points via Normal PDF with fitted parameters

pdf_fitted = norm.pdf(x,loc=mean,scale=sd)

#Get histogram data, in this case bin edges

xh = [0.5 * (histdata[1][r] + histdata[1][r+1]) for r in xrange(len(histdata[1])-1)]

#Get bin width from this

binwidth = (max(xh) - min(xh)) / len(histdata[1])

#Scale the fitted PDF by area of the histogram

pdf_fitted = pdf_fitted * (len(data) * binwidth)

#Plot PDF

plot(x,pdf_fitted,'r-')

this code doesn't work….

DeleteThis comment has been removed by the author.

ReplyDeleteIs there a way to fit data to an exponential distribution such that it maximizes the entropy H(p_i) = - sum p_i*log(p_i) where p_i is the probability of a given event?

ReplyDeleteHi, scipy implements the exponential distribution and way to fit the parameters:

Deletehttp://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.expon.html

I'm not sure about the fitting technique implemented.

Hello I have precipiation data, I am trying to fit a general extreme value distribution to this data before computing percentiles of this data. will I get new data if I fit this distribution and then use this for my analysis? Thanks

ReplyDeletehi there, I'm not sure I understand your question. Anyway, if you want to estimate a percentile, I'd recommend you to use numpy.percentile. In case you want to generate data from a distribution you have fitted, the distributions in scipy have a method called rvs to generate the samples.

DeleteHello, I noticed this code fits a distribution over random data. If we had an array of data that we wanted to fit our distribution over, how would we do that?

ReplyDeleteHi there, of course you can. You just have to put your values in the variable samp.

Delete