Let's see a simple example where a smoothing parameter for a Bayesian classifier is selected using the capabilities of the sklearn library.
To begin, we load one of the test datasets provided by sklearn (the same one used here) and hold out 33% of the samples for the final evaluation:
from sklearn.datasets import load_digits
data = load_digits()

# train_test_split moved from sklearn.cross_validation
# to sklearn.model_selection in recent versions of sklearn
from sklearn.model_selection import train_test_split
X, X_test, y, y_test = train_test_split(data.data, data.target,
                                        test_size=.33, random_state=1899)

Now, we import the classifier we want to use (a Bernoulli Naive Bayes in this case), specify a set of values for the parameter we want to tune and run a grid search:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
# GridSearchCV also moved to sklearn.model_selection
from sklearn.model_selection import GridSearchCV

# test the model for alpha = 0.1, 0.2, ..., 1.0
parameters = [{'alpha': np.linspace(0.1, 1, 10)}]

# scoring='f1' is no longer accepted for multiclass targets;
# 'f1_weighted' matches the old default behaviour of the F1 score
clf = GridSearchCV(BernoulliNB(), parameters, cv=10, scoring='f1_weighted')
clf.fit(X, y)  # running the grid search

The grid search has evaluated the classifier with 10-fold cross validation (CV) for each value specified for the parameter alpha. We can visualize the results as follows:
import matplotlib.pyplot as plt

# the attribute grid_scores_ was replaced by the cv_results_ dictionary
alphas = np.array(clf.cv_results_['param_alpha'], dtype=float)
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']

plt.subplot(2, 1, 1)
plt.plot(alphas, means, '-o')
plt.subplot(2, 1, 2)
plt.plot(alphas, stds, '-o')
plt.show()
The plots above show the average score (top) and the standard deviation of the score (bottom) for each value of alpha used. Looking at the graphs, it seems plausible that a small alpha could be a good choice.
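For intuition, this is roughly what the grid search does behind the scenes: a 10-fold CV evaluation of the classifier for each candidate value of alpha. Here is a minimal sketch of the equivalent manual loop, assuming the X and y defined above and using the cross_val_score helper:

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
import numpy as np

# evaluate each candidate alpha with 10-fold CV,
# mirroring what GridSearchCV does on every point of the grid
for alpha in np.linspace(0.1, 1, 10):
    scores = cross_val_score(BernoulliNB(alpha=alpha), X, y,
                             cv=10, scoring='f1_weighted')
    print('alpha=%0.1f mean=%0.4f std=%0.4f' % (alpha, scores.mean(), scores.std()))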
We can also see that the alpha value which gave us the best results during CV produces, on the test set we held out at the beginning, a score similar to the ones obtained in the CV stage:
from sklearn.metrics import f1_score
print('Best alpha in CV = %0.01f' % clf.best_params_['alpha'])
# average='weighted' matches the 'f1_weighted' scoring used in the grid search
final = f1_score(y_test, clf.best_estimator_.predict(X_test), average='weighted')
print('F1-score on the final testset: %0.5f' % final)
Best alpha in CV = 0.1
F1-score on the final testset: 0.85861
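It is worth noting that GridSearchCV, with the default refit=True, refits the best estimator on the whole training set once the search is over, so the fitted search object can be used directly as a classifier. A small usage sketch:

# predict() delegates to best_estimator_, which was refit on all of X, y
y_pred = clf.predict(X_test)
print(f1_score(y_test, y_pred, average='weighted'))  # same score as above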