Comments on The Glowing Python: k-nearest neighbor search

Anonymous (2020-09-02 14:50):
yes

JustGlowing (2020-09-02 13:54):
The function knn_search should also work in higher dimensions. If it doesn't work, there's a bug.

Anonymous (2020-09-02 13:52):
Great post!

The only problem is that it only works for 2D spaces, while it would be much more useful if it worked in higher dimensions!

Anonymous (2016-02-02 16:19):
It should be axis=1 instead of axis=0 for the euclidean distance.

JustGlowing (2013-12-05 15:54):
Hi, I would suggest reading the CSV file using Pandas. Since your dataset has 3 dimensions, you have to make a 3D plot (or ignore one of the variables). Matplotlib has a module named mplot3d that enables 3D visualization.

Anonymous (2013-12-04 15:11):
Forgot to mention, the color feature is numeric: 1, 0.5, 0.3, or 0. I want a new random vector to be predicted.

Anonymous (2013-12-04 15:02):
I have my training data in a CSV file. The data contains 35 points, each a 3D vector in three columns x, y, and z, with a feature 'color' in the fourth column. Not being a great pythonista, how do I modify your code here to use my data to test a random new vector?

JustGlowing (2013-05-24 08:43):
Usually each row of the data matrix contains one of your samples, and the knn computes the distance between each sample you have and a query vector. At the end, it reports the k samples closest to your query vector.

Anonymous (2013-05-23 21:04):
Is the data matrix, then, some kind of similarity or distance measure if I'm doing kNN on documents?

JustGlowing (2013-05-23 20:18):
Hello, if your data matrix is of dimension n by m, then x has to be a vector of dimension n.
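The points raised above (the shape of the query point x, the axis=1 correction, and higher dimensions) can be pulled together in one sketch. This is a minimal reimplementation of the idea, not the post's original code; the data and names are illustrative, and it stores one sample per row:

```python
import numpy as np

def knn_search(x, D, K):
    """Return the indices of the K rows of D closest to the query x.

    D: (n_samples, n_features) array, one sample per row.
    x: (n_features,) vector -- a query point has the same shape as one row of D.
    """
    # Euclidean distance from x to every row; with one sample per row,
    # the squared differences are summed over axis=1 (the features).
    sqd = np.sqrt(((D - x) ** 2).sum(axis=1))
    return np.argsort(sqd)[:K]

# Works in any dimension, e.g. a 3D dataset of four points:
D = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.1, 0.1, 0.1],
              [5.0, 5.0, 5.0]])
x = np.array([0.05, 0.0, 0.05])   # the query point: one 3D vector
idx = knn_search(x, D, 2)
print(idx)  # → [0 2]
```

Nothing in the distance computation depends on the number of columns, so the same function covers 2D and higher-dimensional data alike.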
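For the CSV question above, a hedged sketch using Pandas: the column names follow the comment, but the DataFrame here is a tiny in-memory stand-in (with the real file you would call pd.read_csv instead), and knn_predict is a hypothetical helper, not part of the post:

```python
import numpy as np
import pandas as pd

# In practice: df = pd.read_csv('training.csv')  ('training.csv' is a
# placeholder name). Here, an in-memory stand-in with the layout from
# the comment: columns x, y, z and a numeric color label.
df = pd.DataFrame({'x': [0.0, 1.0, 0.1, 0.9],
                   'y': [0.0, 1.0, 0.1, 0.9],
                   'z': [0.0, 1.0, 0.0, 1.0],
                   'color': [1.0, 0.0, 1.0, 0.0]})
D = df[['x', 'y', 'z']].to_numpy()    # one 3D sample per row
labels = df['color'].to_numpy()       # numeric labels: 1, 0.5, 0.3, or 0

def knn_predict(x, D, labels, K):
    """Predict the label of query x by majority vote among the labels
    of its K nearest rows of D (hypothetical helper)."""
    idx = np.argsort(np.sqrt(((D - x) ** 2).sum(axis=1)))[:K]
    values, counts = np.unique(labels[idx], return_counts=True)
    return values[counts.argmax()]

print(knn_predict(np.array([0.05, 0.05, 0.0]), D, labels, 3))  # → 1.0
```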
Anonymous (2013-05-23 19:57):
GP,

Great tutorial. Thanks as always for uploading these. One question, though: it's not clear to me (a beginner) what form "x" should take when it's passed into the knn_search function. You say that x is "a query point," but what does a query point look like? Is it a slice of ndata, a point within the features? Thank you for your thoughts!

Anonymous (2013-03-22 11:58):
Awesome code, this really helped me out! Thanks for sharing!

Anonymous (2012-12-12 11:17):
Thanks a lot. I made a small change so that the user can make several queries with one call of knn_search(...):

from numpy import random, argsort, sqrt, array, ones
from pylab import plot, show

# Compute the euclidean distance between every point of D and each query
# in x, then return the indexes of the points with the smallest distances.
def knn_search(x, D, K):
    """ find K nearest neighbours of each query in x among D """
    ndata = D.shape[0]
    queries = x.shape  # number of query points
    K = K if K < ndata else ndata
    # euclidean distances from the other points
    diff = array(D * ones(queries, int)).T - x[:, :ndata].T
    sqd = sqrt(((diff.T) ** 2).sum(axis=2))
    # sorting
    idx = argsort(sqd)
    # return the indexes of the K nearest neighbours
    return idx[:, :K]

# Test the function on a random bidimensional dataset
data = random.rand(200, 2)  # random dataset
x = array([[[0.4, 0.4]], [[0.6, 0.8]], [[0.9, 0.2]], [[0.2, 0.9]]])  # query points

# Performing the search
neig_idx = knn_search(x, data, 10)

# Plotting the data and the input points
plot(data[:, 0], data[:, 1], 'ob', x.T[0, 0], x.T[1, 0], 'or')

# Highlighting the neighbours for each input
plot(data[neig_idx, 0], data[neig_idx, 1], 'o',
     markerfacecolor='None', markersize=15, markeredgewidth=1)
show()

JustGlowing (2012-04-19 17:49):
Hi Michael, the class scipy.spatial.cKDTree implements another algorithm for the nearest-neighbor search, based on k-d trees. Of course, k-d trees have pros and cons. For example, the search cost using a k-d tree is logarithmic (so it's faster than the naive algorithm implemented here), but you have to build the tree first, and if you need to delete or insert points in your dataset, you have to modify the tree.
If you need more details, look at this: http://en.wikipedia.org/wiki/K-d_tree

Michael Lin (2012-04-19 17:36):
How does this compare with using Scipy's cKDtree? http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.html
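The cKDTree alternative discussed in the last two comments can be sketched as a minimal usage example (not a benchmark); the dataset mirrors the post's random 2D data:

```python
import numpy as np
from scipy.spatial import cKDTree

data = np.random.rand(200, 2)   # random 2D dataset, as in the post
tree = cKDTree(data)            # one-off build cost

# Distances and indices of the 10 nearest neighbours of one query point.
# Each query is logarithmic in the dataset size, unlike the naive O(n)
# scan, but inserting or deleting points means rebuilding the tree.
dist, idx = tree.query([0.5, 0.5], k=10)
print(idx.shape)  # → (10,)
```

tree.query returns the distances already sorted in ascending order, so dist[0] and idx[0] give the single nearest neighbour.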