The authors of the paper "Top 10 Algorithms in Data Mining" invited ACM KDD Innovation Award and IEEE ICDM Research Contributions Award winners to each nominate up to 10 best-known algorithms in data mining, giving the algorithm name, a justification for the nomination, and a representative publication reference. Other IEEE and ACM award winners then voted on the nominations to narrow them down to a top 10 list. These algorithms are used for association analysis, classification, clustering, statistical learning, and much more. You can read the paper here.
Here are the winners:
- C4.5
- The k-Means algorithm
- Support Vector Machines
- The Apriori algorithm
- Expectation-Maximization
- PageRank
- AdaBoost
- k-Nearest Neighbor Classification
- Naive Bayes
- CART (Classification and Regression Trees)
The 2007 paper gives a brief overview of what each method is commonly used for and how it works, along with lots of references. It also describes how these winners were selected in much more detail than I have here.
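To give a flavor of how compact some of these methods are, here is a minimal sketch of k-means in its usual textbook form (Lloyd's algorithm), written in plain Python. This is my own illustration, not code from the paper: the function name `k_means`, the fixed starting centroids, and the toy points are all made up for the example, and a real analysis would use a tested library implementation.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Illustrative Lloyd's algorithm: alternate assignment and update steps.

    `points` and `centroids` are sequences of equal-length numeric tuples.
    The starting centroids are passed in explicitly to keep the sketch
    deterministic; real implementations usually pick them at random.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Toy data (hypothetical): two visually separate groups of 2-D points.
toy_points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
              (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
final_centroids, final_clusters = k_means(toy_points, [(0.0, 0.0), (10.0, 10.0)])
```

On this toy input each starting centroid ends up in the middle of one of the two groups. The result of k-means generally depends on where the centroids start, which is why library implementations typically run several random restarts.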
The exciting thing is that I've seen nearly all of these algorithms used to mine genetic data for complex patterns of genetic and environmental exposures that influence complex disease. See some recent papers at EvoBio and PSB. Further, many of these methods are implemented in several R packages.
Top 10 Algorithms in Data Mining (PDF)