Data Mining:
Concepts and Techniques
—Chapter 2 —
2nd Edition, Han and Kamber
[Note: Materials of this presentation are from Chapter 2, 2nd Edition of the textbook,
unless mentioned otherwise]
Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved
February 19, 2008 Data Mining: Concepts and Techniques 1
Chapter 2: Data Preprocessing
◼ Why preprocess the data?
◼ Descriptive data summarization (Ch. 2.1, 3rd Edition, textbook)
◼ Data cleaning
◼ Data integration and transformation
◼ Data reduction
◼ Discretization and concept hierarchy generation
◼ Summary
Why Data Preprocessing?
◼ Data in the real world is dirty
  ◼ incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
    ◼ e.g., occupation=“ ”
  ◼ noisy: containing errors or outliers
    ◼ e.g., Salary=“-10”
  ◼ inconsistent: containing discrepancies in codes or names
    ◼ e.g., Age=“42”, Birthday=“03/07/1997”
    ◼ e.g., ratings formerly coded “1, 2, 3”, now coded “A, B, C”
    ◼ e.g., discrepancies between duplicate records
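The three kinds of dirty data above can be illustrated with a small check over a single record. This is only a minimal sketch, not a method from the textbook: the function name check_record, the dictionary field names, and the reference date (taken from the slide footer) are all assumptions made for illustration, and the birthday “03/07/1997” is assumed to mean March 7, 1997.

```python
from datetime import date


def check_record(record, today=date(2008, 2, 19)):
    """Return a list of data-quality problems found in one record (a dict).

    Hypothetical helper: field names and rules are illustrative assumptions.
    """
    problems = []

    # Incomplete: an attribute value is missing or blank.
    occupation = record.get("occupation")
    if occupation is None or not occupation.strip():
        problems.append("incomplete: occupation is missing")

    # Noisy: a value that cannot be correct, such as a negative salary.
    salary = record.get("salary")
    if salary is not None and salary < 0:
        problems.append("noisy: salary is negative")

    # Inconsistent: the stated age disagrees with the age implied by birthday.
    age, birthday = record.get("age"), record.get("birthday")
    if age is not None and birthday is not None:
        # Subtract one year if the birthday has not yet occurred this year.
        implied = today.year - birthday.year - (
            (today.month, today.day) < (birthday.month, birthday.day)
        )
        if implied != age:
            problems.append("inconsistent: age does not match birthday")

    return problems


# The example values from the slide, combined into one record.
record = {
    "occupation": " ",
    "salary": -10,
    "age": 42,
    "birthday": date(1997, 3, 7),
}
issues = check_record(record)
for issue in issues:
    print(issue)
```

Each rule here only detects a problem; deciding how to repair it (fill in, smooth, or reconcile the value) belongs to the data cleaning step listed in the chapter outline.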