Data Mining Research Paper Topics

Here is the list of 50 selected papers in Data Mining and Machine Learning. You can download them for your detailed reading and research. Enjoy!


Data Mining and Statistics: What’s the Connection?

Data Mining: Statistics and More?, D. Hand, American Statistician, 52(2):112-118.

Data Mining, G. Weiss and B. Davison, in Handbook of Technology Management, John Wiley and Sons, expected 2010.

From Data Mining to Knowledge Discovery in Databases, U. Fayyad, G. Piatetsky-Shapiro & P. Smyth, AI Magazine, 17(3):37-54, Fall 1996.

Mining Business Databases, Communications of the ACM, 39(11): 42-48.

10 Challenging Problems in Data Mining Research, Q. Yang and X. Wu, International Journal of Information Technology & Decision Making, Vol. 5, No. 4, 2006, 597-604.

The Long Tail, by Anderson, C., Wired magazine.

AOL’s Disturbing Glimpse Into Users’ Lives, by McCullagh, D., August 9, 2006

General Data Mining Methods and Algorithms

Top 10 Algorithms in Data Mining, X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, D. Steinberg, Knowl Inf Syst (2008) 14:1-37.

Induction of Decision Trees, R. Quinlan, Machine Learning, 1(1):81-106, 1986.

Web and Link Mining

The Pagerank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani, T. Winograd, Technical Report, Stanford University, 1999.

The Structure and Function of Complex Networks, M. E. J. Newman, SIAM Review, 2003, 45, 167-256.

Link Mining: A New Data Mining Challenge, L. Getoor, SIGKDD Explorations, 2003, 5(1), 84-89.

Link Mining: A Survey, L. Getoor, SIGKDD Explorations, 2005, 7(2), 3-12.

Semi-supervised Learning

Semi-Supervised Learning Literature Survey, X. Zhu, Computer Sciences TR 1530, University of Wisconsin — Madison.

Introduction to Semi-Supervised Learning, in Semi-Supervised Learning (Chapter 1) O. Chapelle, B. Scholkopf, A. Zien (eds.), MIT Press, 2006. (Fordham’s library has online access to the entire text)

Learning with Labeled and Unlabeled Data, M. Seeger, University of Edinburgh (unpublished), 2002.

Person Identification in Webcam Images: An Application of Semi-Supervised Learning, M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira, X. Zhu, Proceedings of the 22nd ICML Workshop on Learning with Partially Classified Training Data, 2005.

Learning from Labeled and Unlabeled Data: An Empirical Study across Techniques and Domains, N. Chawla, G. Karakoulas, Journal of Artificial Intelligence Research, 23:331-366, 2005.

Text Classification from Labeled and Unlabeled Documents using EM, K. Nigam, A. McCallum, S. Thrun, T. Mitchell, Machine Learning, 39, 103-134, 2000.

Self-taught Learning: Transfer Learning from Unlabeled Data, R. Raina, A. Battle, H. Lee, B. Packer, A. Ng, in Proceedings of the 24th International Conference on Machine Learning, 2007.

An Iterative Algorithm for Extending Learners to a Semisupervised Setting, M. Culp, G. Michailidis, 2007 Joint Statistical Meetings (JSM), 2007.

Partially-Supervised Learning / Learning with Uncertain Class Labels

Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers, V. Sheng, F. Provost, P. Ipeirotis, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Logistic Regression for Partial Labels, in 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Volume III, pp. 1935-1941, 2002.

Classification with Partial labels, N. Nguyen, R. Caruana, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.

Imprecise and Uncertain Labelling: A Solution based on Mixture Model and Belief Functions, E. Come, 2008 (powerpoint slides).

Induction of Decision Trees from Partially Classified Data Using Belief Functions, M. Bjanger, Norwegian University of Science and Technology, 2000.

Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth, P. Smyth, M. Burl, U. Fayyad, P. Perona, KDD Workshop 1994, AAAI Technical Report WS-94-03, pp. 109-120, 1994.

Recommender Systems

Trust No One: Evaluating Trust-based Filtering for Recommenders, J. O’Donovan and B. Smyth, In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), 2005, 1663-1665.

Trust in Recommender Systems, J. O’Donovan and B. Smyth, In Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI-05), 2005, 167-174.

Learning from Imbalanced Data Sets

General resources available on this topic:

ICML 2003 Workshop: Learning from Imbalanced Data Sets II

AAAI 2000 Workshop on Learning from Imbalanced Data Sets


A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, G. Batista, R. Prati, and M. Monard, SIGKDD Explorations, 6(1):20-29, 2004.

Class Imbalance versus Small Disjuncts, T. Jo and N. Japkowicz, SIGKDD Explorations, 6(1): 40-49, 2004.

Extreme Re-balancing for SVMs: a Case Study, B. Raskutti and A. Kowalczyk, SIGKDD Explorations, 6(1):60-69, 2004.

A Multiple Resampling Method for Learning from Imbalanced Data Sets, A. Estabrooks, T. Jo, and N. Japkowicz, in Computational Intelligence, 20(1), 2004.

SMOTE: Synthetic Minority Over-sampling Technique, N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer, Journal of Artificial Intelligence Research, 16:321-357, 2002.

Generative Oversampling for Mining Imbalanced Datasets, A. Liu, J. Ghosh, and C. Martin, Third International Conference on Data Mining (DMIN-07), 66-72.

Learning from Little: Comparison of Classifiers Given Little Training, G. Forman and I. Cohen, in 8th European Conference on Principles and Practice of Knowledge Discovery in Databases, 161-172, 2004.

Issues in Mining Imbalanced Data Sets – A Review Paper, S. Visa and A. Ralescu, in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, pp. 67-73, 2005.

Wrapper-based Computation and Evaluation of Sampling Methods for Imbalanced Datasets, N. Chawla, L. Hall, and A. Joshi, in Proceedings of the 1st International Workshop on Utility-based Data Mining, 24-33, 2005.

C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling, C. Drummond and R. Holte, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure, N. Chawla, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Class Imbalances: Are we Focusing on the Right Issue?, N. Japkowicz, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Learning when Data Sets are Imbalanced and When Costs are Unequal and Unknown, M. Maloof, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Uncertainty Sampling Methods for One-class Classifiers, P. Juszczak and R. Duin, in ICML Workshop on Learning from Imbalanced Datasets II, 2003.

Active Learning

Improving Generalization with Active Learning, D. Cohn, L. Atlas, and R. Ladner, Machine Learning 15(2), 201-221, May 1994.

On Active Learning for Data Acquisition, Z. Zheng and B. Padmanabhan, In Proc. of IEEE Intl. Conf. on Data Mining, 2002.

Active Sampling for Class Probability Estimation and Ranking, M. Saar-Tsechansky and F. Provost, Machine Learning 54:2 2004, 153-178.

The Learning-Curve Sampling Method Applied to Model-Based Clustering, C. Meek, B. Thiesson, and D. Heckerman, Journal of Machine Learning Research 2:397-418, 2002.

Active Sampling for Feature Selection, S. Veeramachaneni and P. Avesani, Third IEEE Conference on Data Mining, 2003.

Heterogeneous Uncertainty Sampling for Supervised Learning, D. Lewis and J. Catlett, In Proceedings of the 11th International Conference on Machine Learning, 148-156, 1994.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction, G. Weiss and F. Provost, Journal of Artificial Intelligence Research, 19:315-354, 2003.

Active Learning using Adaptive Resampling, KDD 2000, 91-98.

Cost-Sensitive Learning

Types of Cost in Inductive Concept Learning, P. Turney, In Proceedings Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning.

Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection, P. Chan and S. Stolfo, KDD 1998.



If you want to conduct a research project on data mining and are looking for facts and topics, you have come to the right place. Our previous guide, 10 Facts on Data Mining for an Academic Research Project, gave an overview of the field, and this guide builds on it with 20 interesting topics. It also includes a sample essay on one of them to make it easier for you to start your research work today. For specifics on how to approach this academic genre, feel free to go to our guide.

Data mining is the process of examining large volumes of data, often by working with samples that are described by a set of variables. These variables are fed into mathematical calculations and algorithms, which uncover patterns that can then be used for prediction in thousands of applications. Finding patterns is the purpose of data mining, and it is also where an ethical line needs to be kept in check, since patterns mined from personal data can expose private information.

Here is a list of 20 topics which you can base your research project on:

  1. The Process of Anomaly Detection
  2. How is Dependency Modeling Performed?
  3. How is Representative-based Clustering Performed?
  4. Why Is Density-based Clustering Needed?
  5. Association Rule Learning in Data Mining
  6. How Can Linear and Nonlinear Regression Analysis Be Made More Effective?
  7. Clustering through Graphical and Spectral Representation
  8. Why is Probabilistic Classification Necessary in Data Mining?
  9. What Are Bayesian Procedures and How Can They Be Used to Classify Unlabeled Points?
  10. Reliability of Naive Bayes Classifier
  11. Applications of Hierarchical Clustering
  12. Is Kernel Estimation a Reliable Classification Algorithm?
  13. What is a Decision Tree Classifier?
  14. Keeping Data Mining Within the Constraints of Legality, Privacy and Ethics
  15. How Can Data Mining Help in the Growth of a Business?
  16. Using Data Mining Techniques to Analyze Supermarket Transaction Data
  17. Role of Subject-Based Data Mining in Reducing Terrorism
  18. Role of Data Mining in Condition Monitoring of High Voltage Electrical Equipment
  19. Using Data Mining to Perfect Expertise Finding Systems in Social Programs
  20. Role of Spatial Data Mining of Wireless Sensor Networks in Air Pollution Monitoring

Our objective is to give your train of thought a direction so you can stop procrastinating and start working on your project. You can choose a topic from the list above, or you can combine two or more of them to make an even more detailed research project. There is a wealth of information available on the internet about every one of these topics, so finding sources should not be an issue.

Sample Data Mining Project: Association Rule Learning in Data Mining

In data mining, association rule learning is an important tool for discovering relationships between variables in large databases that were not previously known to be related. The method identifies strong rules in the data using measures of interestingness. Rakesh Agrawal and his colleagues used the concept of strong rules to introduce association rules for discovering regularities between products in large volumes of supermarket transaction data.

Suppose the transaction data records a customer buying beer and potato chips, and the same combination is repeated by many other customers. We can then reasonably infer that the two products are associated: a person who buys beer is more likely than average to buy potato chips too. A supermarket owner who finds this out can put the two products side by side to encourage the joint purchase, which can ultimately increase sales. The same finding can also be used to design marketing campaigns, for example by putting the two products together in one picture to increase sales of both.
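To make the beer and potato chips example concrete, here is a minimal sketch of the three measures usually attached to a rule such as "beer, therefore chips": support, confidence and lift. The transactions below are invented purely for illustration, and the function names are our own, not part of any library.

```python
# Hypothetical toy transactions; items and counts are made up for illustration.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "diapers"},
    {"chips", "salsa"},
    {"milk", "bread"},
    {"beer", "chips", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the share that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's overall support.
    Values above 1 suggest the items co-occur more often than chance."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


print(support({"beer", "chips"}, transactions))
print(confidence({"beer"}, {"chips"}, transactions))
print(lift({"beer"}, {"chips"}, transactions))
```

With this toy data, beer and chips appear together in half of the transactions (support 0.5), three out of four beer buyers also buy chips (confidence 0.75), and the lift is 1.125, only slightly above 1. A lift that close to 1 on six transactions is weak evidence, which is why real analyses need far more data before shelves are rearranged.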

Market basket analysis is applied not only in the supermarket industry but also in web usage mining, continuous production, bioinformatics and intrusion detection. Association rule learning differs from sequence mining in that it does not take the order of items in a transaction into consideration.

Although used in many practical scenarios, association rule learning is not free of problems. One of the biggest issues is the significant chance of finding unusable or spurious associations when an algorithm searches a massive number of item combinations for items that seem to be associated.

These spurious associations occur by chance: items appear together simply because of coincidental repetitions in the data. If the number of items is in the thousands and the algorithm is looking for associations between pairs of items, there are hundreds of thousands, or even millions, of candidate pairs, so some of them will look associated by chance alone. The concept of statistically sound associations is designed to reduce this kind of error through more carefully designed statistical tests.
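The size of the search space, and one way to guard against chance findings, can be sketched as follows. The text does not say which statistical procedure is meant, so this example assumes a common choice: a one-sided Fisher exact test for each pair, with a Bonferroni correction that divides the significance level by the number of candidate pairs. All function names here are our own and only the Python standard library is used.

```python
from math import comb


def n_candidate_pairs(n_items):
    """Number of distinct item pairs an algorithm would have to examine."""
    return comb(n_items, 2)


def fisher_one_sided_p(n_total, n_a, n_b, n_both):
    """Probability of seeing at least `n_both` co-occurrences of items A and B
    if they were independent (upper tail of the hypergeometric distribution).

    n_total: number of transactions
    n_a, n_b: transactions containing A, containing B
    n_both: transactions containing both
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, x) * comb(n_total - n_a, n_b - x)
        for x in range(n_both, upper + 1)
    )
    return tail / comb(n_total, n_b)


def is_sound(n_total, n_a, n_b, n_both, n_items, alpha=0.05):
    """Assumed criterion: accept a pair only if its p-value survives a
    Bonferroni correction over all candidate pairs."""
    threshold = alpha / n_candidate_pairs(n_items)
    return fisher_one_sided_p(n_total, n_a, n_b, n_both) <= threshold


print(n_candidate_pairs(1000))  # 499500 pairs for 1,000 items
print(is_sound(1000, 100, 100, 12, n_items=1000))
print(is_sound(1000, 100, 100, 100, n_items=1000))
```

In the first `is_sound` call the two items co-occur 12 times where about 10 would be expected by chance, so the pair is rejected. In the second they always appear together, and the pair passes even the corrected threshold. Bonferroni is deliberately conservative; other corrections trade some of that strictness for more discoveries.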

Several well-known algorithms have been designed over the years for mining association rules, such as Apriori, FP-Growth and Eclat. However, these algorithms only find frequent item sets, so they cannot be expected to produce useful rules on their own. In order to achieve specific and useful association results, one needs to go beyond mining frequent item sets and generate rules from the frequent item sets found in a particular database.
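The two stages described above, mining frequent item sets and then generating rules from them, can be sketched in a few lines. This is a simplified level-wise search in the spirit of Apriori, not a faithful or optimized implementation of any published algorithm. The toy transactions, thresholds and function names are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Stage 1: level-wise search. Returns {frozenset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        level = {}
        for cand in current:
            supp = sum(cand <= t for t in transactions) / n
            if supp >= min_support:
                level[cand] = supp
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-item candidates, then prune
        # any candidate that has an infrequent k-item subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def generate_rules(frequent, min_confidence):
    """Stage 2: split each frequent item set into antecedent -> consequent."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = supp / frequent[ante]  # subsets are always frequent
                if conf >= min_confidence:
                    rules.append((ante, itemset - ante, supp, conf))
    return rules


# Hypothetical toy data, invented for illustration.
transactions = [
    {"beer", "chips"}, {"beer", "chips", "salsa"}, {"beer", "diapers"},
    {"chips", "salsa"}, {"milk", "bread"}, {"beer", "chips", "bread"},
]
itemsets = frequent_itemsets(transactions, min_support=0.3)
for ante, cons, supp, conf in generate_rules(itemsets, min_confidence=0.7):
    print(sorted(ante), "->", sorted(cons), round(supp, 2), round(conf, 2))
```

On this toy data the first stage finds six frequent item sets, and the second stage keeps three rules: beer implies chips, chips implies beer, and salsa implies chips. The thresholds do most of the work here, so choosing them for a particular database is part of the analysis rather than a detail.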

Shmueli, G., Bruce, P. C., & Patel, N. R. (2010). Data Mining For Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel® with XLMiner®, Second Edition. John Wiley & Sons.
Steinbach, M., Tan, P., & Kumar, V. (2005). Data mining. Harlow: Addison-Wesley.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques. Burlington, MA: Morgan Kaufmann.
Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
Aggarwal, C. C. (2015). Data Mining: The textbook. Cham: Springer.
Russell, M. A. (2013). Mining the Social Web: Data Mining from Facebook, Twitter, and LinkedIn, Google+, GitHub, and More (2nd Edition). O’Reilly Media.
Provost, F. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking.
