Sengamala Thayaar Educational Trust Women’s College
(Affiliated to Bharathidasan University)
(Accredited with ‘A’ Grade {3.45/4.00} By NAAC)
(An ISO 9001: 2015 Certified Institution)
Sundarakkottai, Mannargudi-614 016.
Thiruvarur (Dt.), Tamil Nadu, India.
DATA MINING AND WARE HOUSING
V.GEETHA
ASSISTANT PROFESSOR
PG & RESEARCH DEPARTMENT OF COMPUTER SCIENCE
II M.Sc., COMPUTER SCIENCE
Semester : III
CORE COURSE VII-DATA MINING AND WARE HOUSING
P16CS31
Inst. Hours/Week : 5 Credit : 5
Objective : On successful completion of the course, the students should have understood data mining techniques and
the concepts and design of data warehousing.
UNIT I
Introduction – What is Data Mining? – Data Warehouses – Data Mining Functionalities – Basic Data Mining Tasks –
Data Mining Issues – Social Implications of Data Mining – Applications and Trends in Data Mining.
UNIT II
Data Preprocessing: Why preprocess the data? – Data Cleaning – Data Integration and Transformation – Data
Reduction – Data Cube Aggregation – Attribute Subset Selection. Classification: Introduction – Statistical-based
algorithms – Bayesian Classification – Distance-based algorithms – Decision tree-based algorithms – ID3.
UNIT III
Clustering: Introduction – Hierarchical algorithms – Partitional algorithms – Minimum spanning tree – K-Means
Clustering – Nearest Neighbour algorithm. Association Rules: What is an association rule? – Methods to discover
an association rule – APRIORI algorithm – Partitioning algorithm.
UNIT IV
Data Warehousing: An introduction – Characteristics of a data warehouse – Data marts – Other aspects of data marts.
Online analytical processing: OLTP & OLAP systems.
UNIT V
Developing a data warehouse: Why and how to build a data warehouse – Data warehouse architectural strategies
and organizational issues – Design considerations – Data content – Metadata – Distribution of data – Tools for data
warehousing – Performance considerations.
TEXT BOOKS
1. Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.
(Unit I – Chapter 1: 1.2, 1.4; Chapter 11: 11.1) (Unit II – Chapter 2: 2.1, 2.3, 2.4, 2.5.1, 2.5.2)
2. Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education, 2003.
(Unit I – Chapter 1: 1.1, 1.3, 1.5) (Unit II – Chapter 4: 4.1, 4.2, 4.3, 4.4) (Unit III – Chapter 5: 5.1, 5.4, 5.5.1,
5.5.3, 5.5.4; Chapter 6: 6.1, 6.3)
3. C. S. R. Prabhu, "Data Warehousing: Concepts, Techniques, Products and Applications", PHI, Second Edition.
(Unit IV & V)
REFERENCES
1. Pieter Adriaans and Dolf Zantinge, "Data Mining", Pearson Education, 1998.
2. Arun K. Pujari, "Data Mining Techniques", Universities Press (India) Pvt. Ltd., 2003.
3. S. Rajasekaran and G. A. Vijayalakshmi Pai, "Neural Networks, Fuzzy Logic, and Genetic Algorithms: Synthesis and
Applications", PHI.
4. Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Pearson Education, 2003.
*****
UNIT I
1.1 INTRODUCTION : WHAT IS DATA MINING?
Definition
Data Mining refers to extracting or mining knowledge from large amounts of data.
In simple words, data mining is defined as a process used to extract usable data from
a larger set of raw data.
Data mining is the practice of examining large pre-existing databases in order to
generate new information.
To understand the definition of data mining, it helps to know the following related terms:
Database
-Database is an organized collection of data, generally stored and accessed
electronically from a computer system.
DBMS
-A Database Management System is software that interacts with the end users,
applications, and the database itself to capture and analyze the data.
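The Database and DBMS definitions above can be made concrete with a minimal sketch using Python's built-in `sqlite3` module, where SQLite plays the role of the DBMS. The `customers` table, its columns and its rows are hypothetical illustration data, not part of the course material.

```python
import sqlite3

def build_customer_db():
    """Create an in-memory database (the organized collection of data),
    managed through SQLite (the DBMS)."""
    conn = sqlite3.connect(":memory:")
    # The DBMS stores data according to a declared structure (schema).
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
    # Capture: rows are written through the DBMS, not directly to files.
    conn.executemany(
        "INSERT INTO customers (name, city) VALUES (?, ?)",
        [("Asha", "Chennai"), ("Bala", "Madurai"), ("Chitra", "Chennai")],
    )
    conn.commit()
    return conn

def customers_in(conn, city):
    # Analyze: the DBMS answers a declarative query on behalf of the user.
    rows = conn.execute("SELECT name FROM customers WHERE city = ? ORDER BY name", (city,))
    return [name for (name,) in rows]

conn = build_customer_db()
print(customers_in(conn, "Chennai"))  # ['Asha', 'Chitra']
```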
Data warehouse
- A large store of data accumulated from a wide range of sources within a company
and used to guide management decisions.
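As a rough sketch of the idea (not a full warehouse design, which Units IV and V cover), the snippet below accumulates sales rows from two hypothetical branch systems into one SQLite table and then answers a summary question of the kind management might ask. The branch names, the `sales_history` table and all figures are assumptions for illustration.

```python
import sqlite3

# Hypothetical operational sources: each branch keeps its own (year, amount) records.
branch_a = [("2022", 120.0), ("2023", 150.0)]
branch_b = [("2022", 90.0), ("2023", 60.0), ("2023", 40.0)]

wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales_history (branch TEXT, year TEXT, amount REAL)")
# Load step: accumulate data from every source into one store, tagging its origin.
for branch, rows in (("A", branch_a), ("B", branch_b)):
    wh.executemany(
        "INSERT INTO sales_history VALUES (?, ?, ?)",
        [(branch, year, amount) for year, amount in rows],
    )
wh.commit()

# A decision-support question answered from the consolidated history.
yearly = wh.execute(
    "SELECT year, SUM(amount) FROM sales_history GROUP BY year ORDER BY year"
).fetchall()
print(yearly)  # [('2022', 210.0), ('2023', 250.0)]
```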
OLTP
-Online Transaction Processing is a class of software programs capable of
supporting transaction-oriented applications on the internet (e.g., log files, online
banking).
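OLTP workloads consist of many short transactions that must either complete fully or not at all. A minimal sketch of an online-banking style transfer, assuming a hypothetical `accounts` table held in SQLite (a real OLTP system would typically be a multi-user database server):

```python
import sqlite3

def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 100)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction:
    either both updates are committed or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = open_bank()
print(transfer(conn, "A", "B", 200), balances(conn))   # True {'A': 300, 'B': 300}
print(transfer(conn, "B", "A", 1000), balances(conn))  # False {'A': 300, 'B': 300}
```

The second call fails because account B would go negative, so the rollback leaves both balances unchanged.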
KDD
Many people treat data mining as a synonym for another popularly used term,
Knowledge Discovery from Data, or KDD.
More precisely, however, data mining is one essential step in the process of knowledge discovery.
Data mining as a step in the process of knowledge discovery:
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation
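The ordering of these seven steps can be sketched as a chain of functions, each consuming the previous step's output. Every stage body here is a deliberately trivial, hypothetical placeholder (for example, the "mining" stage only computes an average); the point is the sequence, not the techniques.

```python
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]

def run_kdd(data: Any, stages: List[Stage]) -> Tuple[Any, List[str]]:
    """Apply each KDD stage in order, passing each stage's output to the next."""
    trace = []
    for name, step in stages:
        data = step(data)
        trace.append(name)
    return data, trace

# Hypothetical placeholder stages; the input is a list of sources (lists of amounts).
STAGES: List[Stage] = [
    ("Data Cleaning", lambda srcs: [[x for x in s if x is not None] for s in srcs]),
    ("Data Integration", lambda srcs: [x for s in srcs for x in s]),
    ("Data Selection", lambda xs: [x for x in xs if x > 0]),
    ("Data Transformation", lambda xs: {"count": len(xs), "total": sum(xs)}),
    ("Data Mining", lambda s: {**s, "mean": s["total"] / s["count"] if s["count"] else 0.0}),
    ("Pattern Evaluation", lambda s: s if s["mean"] > 10 else None),
    ("Knowledge Presentation",
     lambda s: "no interesting pattern" if s is None
     else f"average positive amount {s['mean']:.1f} over {s['count']} records"),
]

result, trace = run_kdd([[12, None, 30], [-5, 18]], STAGES)
print(result)  # average positive amount 20.0 over 3 records
```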
Data Cleaning
-To remove noise and inconsistent data.
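A minimal data cleaning sketch, assuming hypothetical records with `id`, `age` and `gender` fields. The rules used here (drop incomplete rows, treat ages outside 0 to 120 as noise, map spellings such as "male" and "M" to one code, drop repeated entries) are illustrative choices; real projects may instead fill in missing values or smooth noisy ones.

```python
from typing import Any, Dict, List

# Hypothetical mapping that reconciles inconsistent codes for the same value.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def clean(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    cleaned, seen = [], set()
    for rec in records:
        age, gender = rec.get("age"), rec.get("gender")
        # Drop incomplete records (one simple strategy among several).
        if age is None or gender is None:
            continue
        # Treat impossible ages as noise (assumed valid range 0..120).
        if not 0 <= age <= 120:
            continue
        # Resolve inconsistent encodings such as " Female " vs "F".
        code = GENDER_CODES.get(str(gender).strip().lower())
        if code is None:
            continue
        # Remove duplicates that differ only in encoding.
        key = (rec.get("id"), age, code)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"id": rec.get("id"), "age": age, "gender": code})
    return cleaned

raw = [
    {"id": 1, "age": 34, "gender": "male"},
    {"id": 2, "age": None, "gender": "F"},
    {"id": 3, "age": 250, "gender": "f"},
    {"id": 4, "age": 28, "gender": " Female "},
    {"id": 1, "age": 34, "gender": "M"},
]
print(clean(raw))
```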
Data Integration
-where multiple sources may be combined
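Combining sources usually starts with matching attributes that mean the same thing under different names. A minimal sketch, assuming two hypothetical sources: a customer list keyed by `cust_id` and a sales list keyed by `customer_no`.

```python
from typing import Any, Dict, List

def integrate(crm: List[Dict[str, Any]], sales: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Join each sale to its customer name, mapping both source keys
    ('cust_id' and 'customer_no') onto one unified key, 'customer'."""
    names = {row["cust_id"]: row["name"] for row in crm}
    return [
        {"customer": s["customer_no"], "name": names.get(s["customer_no"]), "amount": s["amount"]}
        for s in sales
    ]

crm = [{"cust_id": 101, "name": "Asha"}, {"cust_id": 102, "name": "Bala"}]
sales = [{"customer_no": 101, "amount": 250}, {"customer_no": 103, "amount": 80}]
print(integrate(crm, sales))
```

Sale 103 has no matching customer, so its `name` comes back as `None`; gaps like this are what the cleaning step then has to resolve.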
Data Selection
-where data relevant to the analysis task are retrieved from the database
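Retrieving only the data relevant to the analysis task is often a single query. A minimal sketch with `sqlite3`, assuming a hypothetical `sales` table in which only the 2023 rows and the `region` and `amount` columns matter for the analysis:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, year INTEGER, region TEXT, amount REAL, clerk TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [(1, 2022, "North", 120.0, "x"), (2, 2023, "North", 80.0, "y"),
     (3, 2023, "South", 200.0, "x"), (4, 2023, "South", 50.0, "z")],
)

def select_relevant(conn, year):
    # Keep only the rows (this year) and columns (region, amount) the task needs;
    # irrelevant attributes such as `clerk` stay behind in the database.
    return conn.execute(
        "SELECT region, amount FROM sales WHERE year = ? ORDER BY id", (year,)
    ).fetchall()

print(select_relevant(conn, 2023))  # [('North', 80.0), ('South', 200.0), ('South', 50.0)]
```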
Data Transformation
-where data are transformed or consolidated into forms appropriate for mining
by performing summary or aggregation operations.
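A summary or aggregation operation can be as simple as totalling detail rows per group. A minimal sketch, assuming hypothetical `(region, amount)` rows:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def aggregate_by_region(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Consolidate detail rows into one total per region."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)

print(aggregate_by_region([("North", 80.0), ("South", 200.0), ("South", 50.0)]))
# {'North': 80.0, 'South': 250.0}
```

The aggregated form is smaller than the detail data and often closer to what a mining method expects as input.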
Data Mining
-an essential process where intelligent methods are applied in order to extract
data patterns
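As one small example of extracting a pattern, the sketch below finds item pairs that are bought together often, which is the idea behind the association rules of Unit III. It counts every pair by brute force rather than using the Apriori algorithm; the basket data and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def frequent_pairs(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Count every item pair that co-occurs in a transaction and keep
    pairs seen in at least `min_support` transactions (brute force)."""
    counts: Counter = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[frozenset(pair)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"milk", "tea"}, {"bread", "milk", "tea"}]
print(frequent_pairs(baskets, min_support=3))
```

Only the bread and milk pair, seen in 3 of the 4 baskets, meets the threshold.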
Pattern Evaluation
-to identify the truly interesting patterns representing knowledge based on
some interestingness measures
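Support, confidence and lift are common interestingness measures for a rule such as {butter} → {bread}. A minimal sketch, assuming hypothetical baskets and an assumed rule of thumb that a rule is interesting when its confidence is at least 0.7 and its lift exceeds 1 (lift above 1 means the two sides occur together more often than they would if independent):

```python
from typing import Dict, List, Set

def rule_measures(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> Dict[str, float]:
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n                                   # P(lhs and rhs)
    confidence = n_both / n_lhs if n_lhs else 0.0          # P(rhs | lhs)
    lift = confidence / (n_rhs / n) if n_rhs else 0.0      # confidence / P(rhs)
    return {"support": support, "confidence": confidence, "lift": lift}

def is_interesting(m: Dict[str, float], min_conf: float = 0.7) -> bool:
    # Hypothetical thresholds chosen for illustration.
    return m["confidence"] >= min_conf and m["lift"] > 1.0

baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"milk", "tea"}, {"tea"}, {"bread", "jam"}]
print(is_interesting(rule_measures(baskets, {"butter"}, {"bread"})))  # True
print(is_interesting(rule_measures(baskets, {"milk"}, {"bread"})))    # False
```

{butter} → {bread} passes (confidence 1.0, lift about 1.67), while {milk} → {bread} fails (confidence 0.5, lift about 0.83).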