Small molecules, especially those which alter biological processes and disease states, are of significant interest.
Predicting the specific bioactivities of small molecules is a general problem that has attracted particular attention.
Nearest Neighbor (NN) classification is one of the most widely used methods for small molecule classification,
but it suffers from slow running time and from overfitting. We propose a variation, the Centroid based Nearest Neighbor (CBNN)
classifier, to address these problems, together with a specific approach that we call the Combinatorial Centroid Nearest Neighbor (CCNN) approach.
We applied CCNN to a number of publicly available data sets. CCNN achieved the best accuracy and the shortest running time among all the methods compared.
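
To illustrate the general idea behind centroid-based classification, here is a minimal sketch of a plain nearest-centroid classifier. It is not the CBNN or CCNN method itself, since the details of how those methods construct centroids are not given here. The sketch assumes one centroid per class (the mean feature vector), Euclidean distance, and made-up two-dimensional feature vectors; the function names `fit_centroids` and `predict` are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Return {label: centroid}, where each centroid is the mean vector of a class.

    Assumption: one centroid per class; real centroid-based methods may use more.
    """
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Toy data: hypothetical 2-D descriptor vectors, not taken from any real data set.
X = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
y = ["inactive", "inactive", "active", "active"]

centroids = fit_centroids(X, y)
label_a = predict(centroids, [1.0, 1.0])
label_b = predict(centroids, [4.0, 3.0])
```

A query is compared against one centroid per class rather than against every training molecule, which is why centroid-based variants can be faster than plain NN; averaging over a class may also make the decision less sensitive to individual noisy training points.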