Chapter 1
Introduction
Data mining is often defined as finding hidden information in a database. It has
also been called exploratory data analysis, data-driven discovery, and deductive
learning. Data mining access of a database differs from traditional access in:
• Query: The query might not be well formed or precisely stated. The data
miner might not even be exactly sure of what he wants to see.
• Data: The data accessed is usually a different version from that of the
original operational database. The data have been cleansed and modified
to better support the mining process.
• Output: The output of the data mining query probably is not a subset of
the database. Instead it is the output of some analysis of the contents of
the database.
Data Mining Algorithms
DM algorithms attempt to fit a model to the data. They examine the
data and determine a model that is closest to the characteristics of the
data being examined. Such algorithms can be characterized as
consisting of three parts:
• Model: The purpose of the algorithm is to fit a model to the data.
What attributes should be used to define what class structure?
• Preference: Some criterion must be used to prefer one model over another.
Preference is given to the model that fits the data best.
• Search: All algorithms require some technique to search the data. The
criteria needed to fit the data to the classes must be properly defined.
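The three parts can be illustrated with a minimal sketch. The toy data, the line-through-the-origin model, the squared-error criterion, and the list of candidate slopes are all assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical toy data: (x, y) pairs that roughly follow y = 2x.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def model(slope, x):
    """Model: a line through the origin, y = slope * x."""
    return slope * x

def sum_squared_error(slope):
    """Preference: a lower total squared error means a better fit."""
    return sum((y - model(slope, x)) ** 2 for x, y in data)

def search(candidates):
    """Search: try every candidate slope and keep the preferred one."""
    return min(candidates, key=sum_squared_error)

candidate_slopes = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
best_slope = search(candidate_slopes)
print(best_slope)  # 2.0
```

Real algorithms typically search a much larger space of models, but the same division into model, preference, and search applies.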
1.1 Basic Data Mining Models and Tasks
• A predictive model makes a prediction about values of data using known results
found from other (historical) data.
• A descriptive model identifies patterns or relationships in data. It serves as a way
to explore the properties of the data examined, not to predict new properties.
Predictive Models
• Classification maps data into predefined groups or classes. It is often referred to as supervised
learning because the classes are determined before examining the data.
• Regression is used to map a data item to a real-valued prediction variable. Regression assumes
that the target data fit some known type of function (e.g., linear, logistic, etc.) and
determines the best function of this type that models the given data. In effect, regression
involves learning the function that does this mapping.
• Time series analysis examines the value of an attribute as it varies over time (obtained at evenly
spaced points). Three basic functions are performed in time series analysis: 1) similarity
between different time series is determined using distance measures; 2) the structure of the line
is examined to determine (perhaps classify) its behavior; 3) future values are predicted from the
historical time series plot.
• Prediction predicts future data states based on past and current data. Prediction can also be
viewed as a type of classification.
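As a rough sketch of two of the predictive tasks described above, the code below fits a linear function to historical values by least squares (regression), uses it to predict the next value, and compares two time series with a Euclidean distance. The data values and the function names fit_line and euclidean_distance are hypothetical, and Euclidean distance is only one possible distance measure.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def euclidean_distance(series_a, series_b):
    """Distance between two time series sampled at the same time points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(series_a, series_b)))

# Hypothetical attribute observed at evenly spaced time points.
times = [1, 2, 3, 4, 5]
values = [3.0, 5.0, 7.0, 9.0, 11.0]   # exactly y = 2x + 1
other = [3.0, 5.0, 7.0, 9.0, 14.0]    # a second series that differs at one point

slope, intercept = fit_line(times, values)
predicted_next = slope * 6 + intercept           # predicted value at time 6
distance = euclidean_distance(values, other)     # similarity of the two series
```

A small distance indicates similar series; the fitted line plays the role of the "known type of function" that regression assumes.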
Descriptive Models
• Clustering is similar to classification except that the groups are not predefined
but rather defined by the data alone. Clustering is usually accomplished by
determining the similarity among the data on predefined attributes. The most
similar data are grouped into clusters.
• Summarization extracts or derives representative information about the
database. It maps data into subsets with associated simple descriptions. It is also
called characterization or generalization.
• Association rules (link analysis, affinity analysis or association) refer to
uncovering relationships among data. An association rule is a model that
identifies specific types of data associations. These are not causal relationships,
and there is no guarantee that an association will apply in the future.
• Sequence discovery is used to determine sequential patterns in data. These
patterns are based on time (a sequence of actions). Temporal association rules
fall into this category.
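To make the idea of an association rule concrete, the sketch below scores the rule "bread implies milk" on a handful of hypothetical transactions. Support and confidence are the standard measures for such rules; the slides do not name them, so they are brought in here as an assumption, as are the transactions themselves.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})            # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"})    # 2 of the 3 bread transactions
```

A high confidence only says that the items occurred together in this data; it does not show that buying bread causes anyone to buy milk.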
Knowledge Discovery Steps
Data Mining Issues
• Human interaction. Experts are used to formulate the queries, identify data and desired results.
• Overfitting: It occurs when a model fits the training data so closely that it does not fit future states. This may be caused by
assumptions that are made about the data or may simply be caused by the small size of the
training database.
• Outliers.
• Interpretation of results. Output may require an expert to interpret the results correctly.
• Large databases: Sampling and parallelization are effective tools to attack the scalability problem.
• High dimensionality. One solution to this problem is to reduce the number of attributes, which is
known as dimensionality reduction.
• Multimedia data, missing data, irrelevant data, noisy data, changing data.
• Integration and application: Business practices may have to be modified to determine how to
effectively use the information uncovered.
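The overfitting issue listed above can be seen in a small sketch. A model that memorizes a tiny training set reproduces it perfectly, yet a simpler model fits unseen data better. All values are hypothetical: the true relationship is assumed to be y = 2x, and the training values carry some noise.

```python
# Hypothetical data: the true relationship is y = 2x; training values are noisy.
train = [(1, 2.5), (2, 3.5), (3, 6.5), (4, 7.5)]
future = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8)]

def memorizer(x):
    """Overfit model: return the y value of the nearest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

# Simpler model: least-squares line through the origin, y = slope * x.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def line(x):
    return slope * x

def mean_squared_error(predict, points):
    return sum((y - predict(x)) ** 2 for x, y in points) / len(points)

memorizer_train = mean_squared_error(memorizer, train)    # zero: it memorized the data
memorizer_future = mean_squared_error(memorizer, future)
line_train = mean_squared_error(line, train)
line_future = mean_squared_error(line, future)

print(memorizer_train, memorizer_future)
print(line_train, line_future)
```

The memorizing model has no training error but the larger error on the future points, which is the pattern the overfitting bullet describes.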
Data Mining Metrics
• From an overall business perspective, a measure such as the return
on investment (ROI) could be used. ROI examines the difference
between what the data mining technique costs and what the savings
or benefits from its use are. It could be measured as increased
sales, reduced advertising expenditure, or both.
• The metrics used include the traditional metrics of space and time
based on complexity analysis. In some cases, such as accuracy in
classification, more specific metrics targeted to the data mining task may
be used.
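The two kinds of metric can be sketched as simple calculations. The cost and benefit figures and the class labels below are hypothetical. Expressing ROI as a ratio of net benefit to cost is a common convention assumed here; the slides describe only the difference between benefits and cost.

```python
def net_benefit(benefits, cost):
    """Difference between what the technique returns and what it costs."""
    return benefits - cost

def roi(benefits, cost):
    """Net benefit expressed as a fraction of the cost."""
    return (benefits - cost) / cost

def accuracy(predicted, actual):
    """Fraction of items whose predicted class matches the actual class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical figures: the project cost 50,000 and returned 80,000 in benefits.
project_net = net_benefit(80_000, 50_000)
project_roi = roi(80_000, 50_000)

# Hypothetical classifier output compared with the true classes.
predicted = ["yes", "no", "yes", "yes"]
actual = ["yes", "no", "no", "yes"]
classifier_accuracy = accuracy(predicted, actual)
```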
Cross-Industry Standard Process Model for
Data Mining (CRISP-DM)
The process lifecycle consists of:
• business understanding,
• data understanding,
• data preparation,
• modeling,
• evaluation,
• deployment.
ETL, Online Analytic Processing (OLAP), BI
Examples of Data Mining Applications
• Healthcare. Mining healthcare data can identify best practices that improve care and reduce costs. Mining can be used to predict the volume
of patients in every category and to find best practices for diagnosis and the most effective treatments.
• Market Basket Analysis may allow the retailer to understand the purchase behavior of a buyer.
• Education. Students' learning patterns can be captured and used to develop techniques to teach them.
• Manufacturing Engineering. Discovering patterns in product architecture, product portfolio, and customer needs data.
Predicting product development span time, cost, or dependencies among tasks.
• Customer Relationship Management (CRM) and customer segmentation are used for implementing customer focused
strategies in acquiring and retaining customers, improving customers’ loyalty.
• Fraud Detection, image analysis, facial and speech recognition.
• Financial Banking. Finding patterns, causalities, and correlations in business information and market prices.
• Research in bioinformatics, biology, medicine, neuroscience: gene finding, protein function inference, protein and gene
interaction network reconstruction, data cleansing, and protein sub-cellular location prediction.
• The Human Genome Project. Scientists use microarray data to look at gene expression, and sophisticated data analysis
techniques are employed to account for background noise and to normalize the data.
Information Flow Diagram
References:
Dunham, Margaret H. “Data Mining: Introductory and Advanced
Topics”. Pearson Education, Inc., 2003.

Introduction to Data Mining Concepts and Techniques

  • 2. Introduction Data mining is often defined as finding hidden information in a database or exploratory data analysis, data driven discovery, deductive learning. Data mining access of a database differs from a traditional access in: • Query: The query might not be well formed or precisely stated. The data miner might not even be exactly sure of what he wants to see. • Data: The data accessed is usually a different version from that all of the original operational database. The data have been cleansed and modified to better support the mining process. • Output: The output of the data mining query probably is not at subset of the database. Instead it is the output of some analysis of the contents of the database.
  • 3. Data Mining Algorithms DM algorithms attempt to fit a model to the data. They examine the data and determine a model that is closest to the characteristics of the data being examined. Such algorithms can be characterized as consisting of three parts: • Model: The purpose of the algorithm is to fit a model to the data. What attributes should be used to define what class structure? • Preference: Some criteria must be used to fit one model over another. The preference will be given to the criteria that fits data the best. • Search: All algorithms require some technique to search the data. The criteria needed to fit the data to the classes must be properly defined.
  • 4. 1.1 Basic Data Mining Models and Tasks • A predictive model makes a prediction about values of data using known results found from other (historical) data. • A descriptive model identifies patterns or relationships in data. It serves as a way to explore the properties of the data examined, not to predict new properties.
  • 5. Predictive Models • Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before examining the data. • Regression is used to map a data item to a real-valued prediction variable. Regression assumes that the target data fit some known type of function (e.g., linear, logistic, etc.) and determines the best function of this type that models the given data. In practice, regression involves learning the function that does this mapping. • Time series analysis examines the value of an attribute as it varies over time (obtained at evenly spaced time points). There are three basic functions performed in time series analysis: 1) the similarity between different time series is determined using distance measures; 2) the structure of the line is examined to determine (perhaps classify) its behavior; 3) future values are predicted using the historical time series plot. • Prediction predicts future data states based on past and current data. Prediction can also be viewed as a type of classification.
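As a small illustration of the regression bullet above, the sketch below assumes the "known type of function" is a straight line and picks the best line by ordinary least squares. The function name `fit_line` and the sample values are hypothetical, chosen so that the points lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and variation of x, around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
# Map a new data item to a real-valued prediction.
prediction = slope * 6 + intercept
print(slope, intercept, prediction)
```

With real data the points would not lie exactly on a line, and the fitted function would only approximate them.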
  • 6. Descriptive Models • Clustering is similar to classification except that the groups are not predefined but rather defined by the data alone. Clustering is usually accomplished by determining the similarity among the data on predefined attributes. The most similar data are grouped into clusters. • Summarization extracts or derives representative information about the database. It maps data into subsets with associated simple descriptions. It is also called characterization or generalization. • Association rules (link analysis, affinity analysis, or association) refer to uncovering relationships among data. An association rule is a model that identifies specific types of data associations. These are not causal relationships, and there is no guarantee that an association will apply in the future. • Sequence discovery is used to determine sequential patterns in data. These patterns are based on time (a sequence of actions). Temporal association rules fall into this category.
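To make the association rules bullet above more concrete, here is a minimal sketch that scores one rule with support and confidence. These two measures are the ones commonly used for association rules, but the slide itself does not name them, so treat them as an assumption; the transactions and item names are invented for the example.

```python
# Hypothetical market-basket transactions, one set of items per purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# Score the rule "bread => milk".
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A high confidence only says the items co-occur in this data set; as noted above, it is not a causal claim and may not hold for future transactions.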
  • 8. Data Mining Issues • Human interaction: Experts are used to formulate the queries and to identify the data and the desired results. • Overfitting: It occurs when a model fits the data it was built from but does not fit future states. This may be caused by assumptions that are made about the data or may simply be caused by the small size of the training database. • Outliers. • Interpretation of results: The output may require an expert to interpret the results correctly. • Large databases: Sampling and parallelization are effective tools to attack the scalability problem. • High dimensionality: One solution to this problem is to reduce the number of attributes, which is known as dimensionality reduction. • Multimedia data, missing data, irrelevant data, noisy data, changing data. • Integration and application: Business practices may have to be modified to determine how to effectively use the information uncovered.
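The overfitting issue listed above can be made visible by comparing accuracy on the training data with accuracy on held-out data. The sketch below is only one way to do that: it uses a 1-nearest-neighbour classifier, which is a choice made for this example and not something the slides prescribe, and a tiny made-up training set that contains one noisy label.

```python
# Hypothetical training data: (value, label). The label at 3.0 is noise.
train = [
    (1.0, "low"), (2.0, "low"), (3.0, "high"), (4.0, "low"),
    (6.0, "high"), (7.0, "high"), (8.0, "high"),
]
# Hypothetical held-out data standing in for "future states".
holdout = [(1.4, "low"), (2.9, "low"), (3.2, "low"), (5.5, "high")]


def predict_1nn(x, training):
    """Return the label of the training point closest to x."""
    nearest = min(training, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(examples, training):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(1 for x, label in examples if predict_1nn(x, training) == label)
    return correct / len(examples)


train_accuracy = accuracy(train, train)      # the model memorizes its own data
holdout_accuracy = accuracy(holdout, train)  # the noisy point misleads it here
print(train_accuracy, holdout_accuracy)
```

A large gap between the two numbers is the symptom to look for; with a training set this small, a single noisy record is enough to cause it.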
  • 9. Data Mining Metrics • From an overall business perspective, a measure such as the return on investment (ROI) could be used. ROI examines the difference between what the data mining technique costs and what the savings or benefits from its use are. It could be measured as increased sales, reduced advertising expenditure, or both. • The metrics used include the traditional metrics of space and time based on complexity analysis. In some cases, such as accuracy in classification, more specific metrics targeted to the data mining task may be used.
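The two kinds of metric mentioned above can be written down directly. The sketch assumes the common convention of expressing ROI as net benefit divided by cost; the slide only describes ROI as the difference between costs and benefits, so the ratio form is an assumption, and all figures and labels are invented.

```python
def net_benefit(benefit, cost):
    """Difference between what the technique returns and what it costs."""
    return benefit - cost


def roi(benefit, cost):
    """Net benefit relative to cost (assumed convention, not from the slides)."""
    return net_benefit(benefit, cost) / cost


def classification_accuracy(predicted, actual):
    """Fraction of items whose predicted class matches the actual class."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Hypothetical project: 100,000 spent, 150,000 gained.
project_net = net_benefit(150_000, 100_000)
project_roi = roi(150_000, 100_000)

# Hypothetical classifier output compared with the true classes.
acc = classification_accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"])
print(project_net, project_roi, acc)
```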
  • 10. Cross-Industry Standard Process Model for Data Mining (CRISP-DM) The process lifecycle consists of six phases: • business understanding, • data understanding, • data preparation, • modeling, • evaluation, • deployment.
  • 11. ETL, Online Analytic Processing (OLAP), BI
  • 12. Examples of Data Mining Applications • Healthcare. Mining healthcare data can identify best practices that improve care and reduce costs. It can be used to predict the volume of patients in every category and to find best practices for diagnosis and the most effective treatments. • Market Basket Analysis may allow the retailer to understand the purchase behavior of a buyer. • Education. Students' learning patterns can be captured and used to develop techniques to teach them. • Manufacturing Engineering. Discovering patterns in product architecture, product portfolio, and customer needs data. Predicting product development span time, cost, or dependencies among tasks. • Customer Relationship Management (CRM) and customer segmentation are used for implementing customer-focused strategies for acquiring and retaining customers and improving customer loyalty. • Fraud detection, image analysis, facial and speech recognition. • Financial Banking. Finding patterns, causalities, and correlations in business information and market prices. • Research in bioinformatics, biology, medicine, and neuroscience: gene finding, protein function inference, protein and gene interaction network reconstruction, data cleansing, and protein sub-cellular location prediction. • The Human Genome Project. Scientists use microarray data to look at gene expression, and sophisticated data analysis techniques are employed to account for background noise and normalization of the data.
  • 14. References: Dunham, Margaret H. “Data Mining: Introductory and Advanced Topics”. Pearson Education, Inc., 2003.