Number of Units: 4
Schedule: Three hours of lecture and one hour of discussion per week.
Prerequisites: Basic concepts and algorithms from probability and statistics
Catalog Description:
Knowledge discovery is the process of discovering useful regularities in large
and complex data sets. The field encompasses techniques from artificial
intelligence (representation and search), statistics (inference), and databases
(data storage and access). When integrated into useful systems, these techniques
can help human analysts make sense of vast stores of digital information. This
course presents the fundamental principles of the field and familiarizes students with
the technical details of representative algorithms.
Expanded Description:
- Data pre-processing
  - Data cleaning
  - Data transformation
  - Data reduction
  - Discretization
- Association rules and sequential patterns
  - Basic concepts
  - Apriori algorithm
  - Mining association rules with multiple minimum supports
  - Mining class association rules
  - Sequential pattern mining
- Supervised learning (Classification)
  - Basic concepts
  - Decision trees
  - Classifier evaluation
  - Rule induction
  - Classification based on association rules
  - Naive-Bayesian learning
  - Naive-Bayesian learning for text classification
  - Support vector machines
  - K-nearest neighbor
- Unsupervised learning (Clustering)
  - Basic concepts
  - K-means algorithm
  - Representation of clusters
  - Hierarchical clustering
  - Distance functions
  - Data standardization
  - Handling mixed attributes
  - Which clustering algorithm to use?
  - Cluster evaluation
  - Discovering holes and data regions
- Post-processing
  - Objective interestingness
  - Subjective interestingness
- Information retrieval and Web search
  - Basic text processing and representation
  - Cosine similarity
  - Relevance feedback and Rocchio algorithm
- Partially supervised learning
  - Semi-supervised learning
  - Learning from labeled and unlabeled examples using EM
  - Learning from labeled and unlabeled examples using co-training
  - Learning from positive and unlabeled examples
- Link analysis
  - Social network analysis
  - Citation analysis: co-citation and bibliographic coupling
  - The PageRank algorithm (of Google)
  - The HITS algorithm: authorities and hubs
  - Mining communities on the Web
- Data extraction and information integration
Course Objectives & Role in the Program:
This course has three objectives. First, to provide students with a sound basis
in data mining tasks and techniques. Second, to ensure that students are able to
read and critically evaluate data mining research papers. Third, to ensure that
students are able to implement and use some of the important data mining and
text mining algorithms.
Method of Evaluation:
- Midterm: 25%
- Final Exam: 40%
- Projects:
  - Project 1: Algorithm implementation (15%)
  - Project 2: Research project (including implementation) (20%)
Required Books:
Textbooks:
- Building an Intelligent Web: Theory & Practice, R. Akerkar & P. Lingras; Jones & Bartlett, 2007.
- Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan
Kaufmann Publishers, ISBN 1-55860-489-8.
Reference books:
- Principles of Data Mining, by David Hand, Heikki Mannila, Padhraic Smyth,
The MIT Press, ISBN 0-262-08290-X.
- Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin
Kumar, Pearson/Addison Wesley, ISBN 0-321-32136-7.
- Data mining resource site: KDnuggets Directory