Data Mining Techniques 2016
White Paper
Data Mining Techniques
Prepared by
Mehmet BEYAZ
TTG International, L.T.D.
www.ttgint.com
30/06/2016
Words of Wisdom
You will see it as you like to see.
- Mevlana Jalaluddin Rumi-
Introduction
It is widely recognised that the Internet and smartphones have changed how businesses operate,
governments function, and society lives and communicates. More recently, a new technological trend
has proved just as transformative: “big data.” Big data starts with the fact that there is far more
information in circulation today than ever before, and that it is being put to extraordinary new
uses. Big data is about more than just communication. We now live in the world of big data, and the
idea is that we can learn from a large body of information things that we could not comprehend when
we used only smaller amounts.
DATA MINING
We live in a world in which a vast amount of digital data, known as big data, is generated. In
addition, the world is becoming more and more connected via the Internet of Things (IoT), which has
been a major influence on the big data landscape. These data are collected continuously from many
different sources, at intervals ranging from five minutes to hourly and daily. The analysis of such
big data takes business competition to the next level of innovation and productivity. The extraction
and interpretation of hidden patterns in data sets is therefore of great importance. Data mining is a
modern tool that aims to discover meaningful knowledge in large data sets and to predict trends. It
offers not only a retrospective view of a business process, but also enables people to develop a
successful market strategy.
Origins
Data mining originated in the 1980s, when it was introduced and used within the research community.
Data mining is also known as KDD (Knowledge Discovery in Databases) and is sometimes referred to as
data analytics. Strictly, data mining is defined as one component of the KDD process, the one that
deals with the examination of inner patterns in databases, whereas KDD is also concerned with the
evaluation and interpretation of the discovered patterns. Although the exact meanings of the terms
KDD and data mining differ, they are often used interchangeably. In this paper I use KDD and data
mining as synonyms unless otherwise specified. Data mining is the analysis of large observational
data sets to find unknown relationships within the data and to summarize the data in novel ways
that are both understandable and useful to the data owner. The computational methods of data mining
sit at the intersection of classical statistics, artificial intelligence, and machine learning. As a
whole knowledge discovery process, data mining also involves many other disciplines, such as
databases, data cleaning, visualization, exploratory data analysis, and performance and KPI
evaluation.
Methods
Data mining techniques are categorized into supervised, semi-supervised, and unsupervised
methods. A supervised method is one where you have input variables (X) and an output variable (Y)
and you use an algorithm to learn the mapping function from the input to the output:
Y = f(X)
The goal is to approximate the mapping function so well that when you have new input data
(X) you can predict the output variable (Y) for that data.
It is called supervised learning because the process of an algorithm learning from the training
data set can be thought of as a teacher supervising the learning process.
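As an illustration of learning a mapping Y = f(X) from labelled examples, the following is a minimal sketch that fits a straight line by ordinary least squares. The training values and the names fit_line and predict are assumptions made for this example; the paper does not prescribe a particular algorithm.

```python
# Minimal sketch of supervised learning: learn f from labelled pairs (x, y).
# The data and the function names are hypothetical, chosen for illustration.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Training data: inputs X with known outputs Y (the "teacher").
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1

model = fit_line(train_x, train_y)
new_y = predict(model, 5.0)  # prediction for an input not seen in training
```

Because the toy data lie exactly on a line, the fitted model recovers a slope of 2 and an intercept of 1; real data would leave a residual error that has to be evaluated separately.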
Unlike the supervised approach, the unsupervised technique aims to model the underlying
structure or distribution of the data in order to learn more about the data.
This is called unsupervised learning because, unlike supervised learning above, there are no
correct answers and there is no teacher. Algorithms are left to their own devices to discover
and present the interesting structure in the data.
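One common way to discover structure without labels is k-means clustering. The sketch below runs k-means on plain numbers; the values, the starting centres, and the name kmeans_1d are assumptions made for this example and are not taken from the paper.

```python
# Minimal sketch of unsupervised learning: k-means on one-dimensional data.
# No labels are given; the algorithm only sees the values themselves.

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on numbers; `centers` are the starting guesses."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its previous centre).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.5]  # hypothetical measurements
centers, groups = kmeans_1d(values, centers=[0.0, 5.0])
```

With these values the algorithm separates the three small readings from the three large ones. The result of k-means generally depends on the starting centres, so in practice several starts are usually compared.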
Data scientists also identify semi-supervised learning, which is similar to supervised learning.
Problems where you have a large amount of input data (X) and only some of the data is
labelled (Y) are called semi-supervised learning problems.
These problems sit between supervised and unsupervised learning.
A good example is a photo archive where only some of the images are labelled (e.g. dog, cat,
cow, person) and the majority are unlabelled.
Many real-world machine learning problems fall into this area. This is because it can be
expensive or time-consuming to label data, as it may require access to domain experts,
whereas unlabelled data is cheap and easy to collect and store.
You can use unsupervised learning techniques to discover and learn the structure in the input
variables.
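One simple way to combine a few labels with many unlabelled records is pseudo-labelling (self-training). The sketch below is only an illustration under assumed data: it fits one centroid per class from the labelled points, assigns each unlabelled point to the nearest centroid, and then refits. The names self_train and centroid and the class names are hypothetical.

```python
# Minimal sketch of semi-supervised learning by pseudo-labelling.
# A few labelled numbers guide the labelling of many unlabelled ones.

def centroid(points):
    """Mean of a list of numbers."""
    return sum(points) / len(points)


def self_train(labelled, unlabelled):
    """labelled: dict label -> list of numbers; unlabelled: list of numbers."""
    # Step 1 (supervised): one centroid per class from the labelled data only.
    centroids = {label: centroid(pts) for label, pts in labelled.items()}
    # Step 2: give every unlabelled point the label of its nearest centroid.
    pseudo = {label: list(pts) for label, pts in labelled.items()}
    for x in unlabelled:
        guess = min(centroids, key=lambda label: abs(x - centroids[label]))
        pseudo[guess].append(x)
    # Step 3: refit the centroids on labelled plus pseudo-labelled data.
    return {label: centroid(pts) for label, pts in pseudo.items()}


labelled = {"small": [1.0], "large": [10.0]}  # expensive, expert-provided labels
unlabelled = [2.0, 3.0, 8.0, 9.0]             # cheap, unlabelled data
refined = self_train(labelled, unlabelled)
```

The refined centroids reflect the unlabelled data as well as the two labelled points. Pseudo-labels can be wrong, so this approach works best when the classes are well separated.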
In Summary
In this section you learned the difference between supervised, unsupervised and
semi-supervised learning. You now know that:
• Supervised: All data is labelled and the algorithms learn to predict the output from the input
data.
• Unsupervised: All data is unlabelled and the algorithms learn the inherent structure from the
input data.
• Semi-supervised: Some data is labelled but most of it is unlabelled, and a mixture of
supervised and unsupervised techniques can be used.
The aims of data mining can be divided into different categories of process. Discovery focuses on
searching a database for hidden patterns, without a predefined hypothesis about the nature of the
pattern, and on deriving a model of the process that generated the data. Data mining tasks usually
fall into two main categories, predictive and descriptive; see Figure 1.
Data mining requires data in which to find patterns. Predictive and descriptive data mining
are each further divided into the tasks below.
Predictive:
• Classification aims to categorize unseen input data records into known classes. The
assignment model, or classifier, learns from the training data set, where the relationship
between records and classes is provided.
• Regression aims to predict numerical values for input data records. The mapping function
learns from the training data set, where the relationship between records and their values is
known.
• Time series forecasting predicts the future value of a target function based on previously
observed measurements.
• Anomaly detection extracts points, or outliers, that are considerably different from the rest
of the data points.
Figure 1 Data mining techniques.
Descriptive:
• Clustering identifies groups of points, called clusters, with similar properties or behaviours.
• Association analysis discovers relationships between records within the same data set.
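As an illustration of a descriptive task, association analysis is often expressed through the support and confidence of a rule such as "bread implies milk". The sketch below computes both measures; the baskets and the function names are assumptions made for this example.

```python
# Minimal sketch of association analysis: support and confidence of a rule.
# The transactions are hypothetical shopping baskets.

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once, otherwise this divides by zero.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of 3 bread baskets
```

Support measures how often the items occur together at all, while confidence measures how reliably the consequent follows the antecedent; both are usually thresholded to select the rules worth reporting.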
Knowledge Discovery in Databases Process
KDD is the automatic, exploratory analysis and modelling of large data sources. It is
the organized process of identifying valid, novel, useful, and understandable patterns
in large and complex data sets. Data mining is the core of the KDD process, involving the
application of algorithms that explore the data, develop the model, and discover previously unknown
patterns. The KDD process is iterative, interactive, and consists of nine steps.
Figure 2
The unifying goal of the KDD process is to extract useful information from data in the context of
large databases. Data mining refers to the set of computational methods that extract valuable
patterns from the original data. In addition, the KDD process is concerned with the handling of
massive data, the scaling of algorithms for better performance, the proper interpretation of the
retrieved information, and human interaction with the overall process. The KDD process is a
sequential analysis that includes the following steps (see Figure 2):
• selection,
• pre-processing,
• transformation,
• data mining,
• and information interpretation.
However, this sequential approach to knowledge extraction may involve iterations, because at any
point the data analyst can change settings and repeat previous steps. The process starts with
determining the KDD goals and ends with the implementation of the discovered knowledge. The basic
KDD sequence may thus include closed loops: the effects of the implementation are measured on the
new data repositories, and the KDD process is launched again.
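The sequence of steps and its feedback loop can be sketched as a chain of functions. Everything in this sketch is a deliberately simplified assumption (the field name "value", the rescaling, the threshold rule and the stopping condition); it only illustrates how an analyst might change a setting and rerun earlier steps.

```python
# Minimal sketch of the KDD sequence with one feedback loop.
# Each stage is a hypothetical stand-in for the real work done at that step.

def select(raw):
    """Selection: keep only the target field."""
    return [row["value"] for row in raw]


def preprocess(values):
    """Pre-processing: drop missing measurements."""
    return [v for v in values if v is not None]


def transform(values):
    """Transformation: rescale to the range 0..1 (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mine(values, threshold):
    """Data mining: report values above a threshold."""
    return [v for v in values if v > threshold]


def run_kdd(raw, threshold):
    return mine(transform(preprocess(select(raw))), threshold)


raw = [{"value": 10}, {"value": None}, {"value": 20}, {"value": 50}]
threshold = 0.9
found = run_kdd(raw, threshold)

# Interpretation feeds back into the process: if too little is found, the
# analyst changes a setting and repeats the earlier steps.
while len(found) < 2 and threshold > 0:
    threshold -= 0.1
    found = run_kdd(raw, threshold)
```

In a real project the loop condition would be a judgement about whether the KDD goals have been met, not a fixed count of results.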
The knowledge exploration process starts with developing the necessary theoretical and
practical background in the application domain. Understanding the relevant knowledge is
important for achieving the customer’s goals. The following is a brief description of the steps of
the KDD process:
Selection
This step is the selection of the target data set based on the goals. It determines what data will
be used for the knowledge discovery: finding out what data is available, obtaining additional
necessary data, and integrating all the data for the knowledge discovery into one data set. This
step is very important because data mining learns and discovers from the available data.
Pre-processing
The quality of the selected data is often inadequate for further analysis, for several
reasons. Outliers, missing values, or a high level of noise in the measurements require special
data handling strategies. Data reliability is therefore enhanced in this stage.
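Two typical clean-up strategies can be sketched as follows: filling missing values with the median, and flagging outliers by their distance from the median measured in median absolute deviations (MAD). The readings, the cut-off k = 3 and the function names are assumptions made for this example; the paper does not name specific strategies.

```python
# Minimal sketch of pre-processing: median imputation and MAD outlier flags.
from statistics import median


def impute_missing(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=3.0):
    """Flag values more than k median absolute deviations from the median.

    If the MAD is zero, every value that differs from the median is flagged.
    """
    centre = median(values)
    mad = median([abs(v - centre) for v in values])
    return [abs(v - centre) > k * mad for v in values]


readings = [10.0, 11.0, None, 9.0, 10.0, 95.0]  # hypothetical measurements
cleaned = impute_missing(readings)
outliers = flag_outliers(cleaned)
```

The median is used here because, unlike the mean, it is hardly affected by the extreme reading that the second function is meant to detect.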
Transformation
This step can be crucial for the success of the entire KDD project, and it is usually very
project-specific. Transformation projects the original data into a low-dimensional embedded space
(dimension reduction) and includes linear and nonlinear methods. The reduced set of embedded
features allows visual inspection and facilitates the further mining of knowledge.
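A linear example of dimension reduction is principal component analysis (PCA). The sketch below reduces two-dimensional points to one dimension by projecting them onto the direction of greatest variance, using the closed-form angle that holds for a 2x2 covariance matrix. The points and the function names are assumptions made for this example, and PCA is only one of many possible transformations.

```python
# Minimal sketch of linear dimension reduction: PCA from 2-D to 1-D.
import math


def first_principal_axis(points):
    """Return the mean and a unit vector along the direction of greatest variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form angle of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project(points):
    """Replace each 2-D point by its coordinate along the principal axis."""
    (mx, my), (ux, uy) = first_principal_axis(points)
    return [(x - mx) * ux + (y - my) * uy for x, y in points]


points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # lie on y = 2x
centre, axis = first_principal_axis(points)
reduced = project(points)
```

Because these points lie exactly on a line, a single coordinate preserves all of their variation; with noisy data the discarded direction would carry the remaining variance.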
Data mining
The core element of the KDD process is the data mining phase, which includes several steps.
There are two major goals in data mining: prediction and description. Depending on the customer’s
goal, a specific data mining task is chosen: classification, anomaly detection, regression, or
clustering. Then the chosen data mining algorithm is executed to search for underlying patterns
and valuable knowledge.
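Assuming, for illustration, that the chosen task is classification, one simple algorithm that could be executed at this stage is k-nearest neighbours. The training records, the labels "normal" and "faulty", and the name knn_classify are hypothetical and are not taken from the paper.

```python
# Minimal sketch of the data mining step for a classification task (k-NN).
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def knn_classify(train, query, k=3):
    """train: list of (point, label); return the majority label of the k nearest."""
    by_distance = sorted(train, key=lambda item: squared_distance(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1, 1), "normal"), ((1, 2), "normal"), ((2, 1), "normal"),
    ((8, 8), "faulty"), ((9, 8), "faulty"), ((8, 9), "faulty"),
]

label_a = knn_classify(train, (2, 2))
label_b = knn_classify(train, (7, 9))
```

A different goal, such as grouping records without known classes, would lead to a different choice of algorithm at this step.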
Interpretation/Evaluation
The final step of the KDD process is the interpretation and evaluation of the retrieved information
with respect to the goals defined in the first step. This step involves techniques for visual
analysis and a number of performance metrics. The correct interpretation of results is important,
because it allows assumptions to be checked and the parameters of previous KDD components to be
tuned.
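As an example of such performance metrics, accuracy, precision and recall can be computed from the counts of correct and incorrect predictions. The labels below ("a" for anomaly, "n" for normal) and the function names are assumptions made for this example.

```python
# Minimal sketch of evaluation: accuracy, precision and recall for one class.

def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for the `positive` label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn


def metrics(actual, predicted, positive):
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        # Precision and recall are reported as 0.0 when they are undefined.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


actual = ["a", "a", "n", "n", "n", "a"]     # hypothetical ground truth
predicted = ["a", "n", "n", "a", "n", "a"]  # hypothetical model output
scores = metrics(actual, predicted, positive="a")
```

Which metric matters most depends on the goals set in the first step: for rare anomalies, for instance, accuracy alone can look good even when most anomalies are missed.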
Finally, the discovered knowledge and the designed KDD algorithm may be incorporated into an
existing business model. The possible usage scenarios encompass reporting and prediction, and the
optimization and automation of business processes.

 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024VictoriaMetrics Anomaly Detection Updates: Q1 2024
VictoriaMetrics Anomaly Detection Updates: Q1 2024
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 

Data Mining Techniques 2016 White Paper

White Paper
Data Mining Techniques
Prepared by Mehmet BEYAZ
TTG International, L.T.D.
www.ttgint.com
30/06/2016

Words of Wisdom
You will see it as you like to see.
- Mevlana Jalaluddin Rumi

Introduction
The Internet and smartphones have changed how businesses operate, how governments function, and how society lives and communicates. A more recent technological trend is just as transformative: "big data." Big data starts with the fact that far more information is in circulation than ever before, and that it is being put to extraordinary new uses. Big data is about more than just communication. We now live in the world of big data, and the idea is that we can learn things from a large body of information that we could not comprehend when we used only smaller amounts.

DATA MINING
We live in a world that holds a vast amount of digital data, commonly called big data, and that is becoming more and more connected through the Internet of Things (IoT). The IoT has been a major influence on the big data landscape. These data are collected deliberately from different sources every day, at intervals ranging from five minutes to hourly and daily. The analysis of such data takes business competition to the next level of innovation and productivity. The extraction and interpretation of hidden patterns in data sets is therefore of great importance. Data mining is a modern tool that aims to discover meaningful knowledge in large data sets and to predict trends. It offers not only a retrospective view of a business process but also helps people develop a successful market strategy.

Origins
Data mining originated in the 1980s, when it was introduced and used within the research community. It is also known as KDD (Knowledge Discovery in Databases) and is sometimes referred to as data analytics. Strictly speaking, data mining is the component of the KDD process that deals with the examination of inner patterns in databases, while KDD is additionally concerned with the evaluation and interpretation of the discovered patterns.
Although the exact meanings of the terms KDD and data mining differ, they are often used interchangeably. In this paper I use KDD and data mining as synonyms unless specified otherwise. Data mining is the analysis of large observational data sets to find previously unknown relationships within the data and to summarize the data in novel ways that are both understandable and useful to the data owner. The computational methods of data mining sit at the intersection of classical statistics, artificial intelligence, and machine learning. As a complete knowledge discovery process, data mining also draws on many disciplines, such as databases, data cleaning, visualization, exploratory data analysis, and performance and KPI evaluation.

Methods
Data mining techniques are categorized into supervised, semi-supervised, and unsupervised methods. In a supervised method you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:
Y = f(X)
The goal is to approximate the mapping function so well that, when you have new input data (X), you can predict the output variable (Y) for that data. It is called supervised learning because the process of an algorithm learning from the training data set can be thought of as a teacher supervising the learning process.
Unlike the supervised approach, the unsupervised technique models the underlying structure or distribution of the data in order to learn more about the data. It is called unsupervised learning because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.
Data scientists also identify semi-supervised learning, which is similar to supervised learning. Problems where you have a large amount of input data (X) and only some of it is labelled (Y) are called semi-supervised learning problems. These problems sit between supervised and unsupervised learning. A good example is a photo archive where only some of the images are labelled (e.g. dog, cat, cow, person) and the majority are unlabelled. Many real-world machine learning problems fall into this area, because labelling data can be expensive or time-consuming and may require access to domain experts, whereas unlabelled data is cheap and easy to collect and store. You can use unsupervised learning techniques to discover and learn the structure in the input variables.

In Summary
In this section you learned the difference between supervised, unsupervised, and semi-supervised learning. You now know that:
• Supervised: All data is labelled and the algorithms learn to predict the output from the input data.
• Unsupervised: All data is unlabelled and the algorithms learn the inherent structure from the input data.
• Semi-supervised: Some data is labelled but most of it is unlabelled, and a mixture of supervised and unsupervised techniques can be used.
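
The three learning modes summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the paper: the nearest-centroid classifier, the small k-means loop, the self-training step, the function names, and the toy points are all hypothetical choices made for this example.

```python
from math import dist
from statistics import fmean

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))

def fit_supervised(X, y):
    """Supervised: learn f from labelled pairs (X, y) as one centroid per label."""
    by_label = {}
    for point, label in zip(X, y):
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, point):
    """Apply the learned mapping Y = f(X): the label of the nearest centroid."""
    return min(model, key=lambda label: dist(model[label], point))

def cluster_unsupervised(X, k, iterations=10):
    """Unsupervised: k-means on unlabelled X; returns a cluster index per point."""
    # Naive initialisation (an assumption of this sketch): k evenly spaced points.
    centers = [X[i * len(X) // k] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: dist(centers[c], p)) for p in X]
        for c in range(k):
            members = [p for p, a in zip(X, assignment) if a == c]
            if members:  # keep the old centre if a cluster ends up empty
                centers[c] = centroid(members)
    return assignment

def self_train(X_labelled, y_labelled, X_unlabelled):
    """Semi-supervised: pseudo-label the unlabelled points, then refit on all data."""
    model = fit_supervised(X_labelled, y_labelled)
    pseudo = [predict(model, p) for p in X_unlabelled]
    return fit_supervised(X_labelled + X_unlabelled, y_labelled + pseudo)

# Hypothetical toy data: two well-separated groups of 2-D points.
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
y = ["small", "small", "small", "large", "large", "large"]

model = fit_supervised(X, y)                 # all data labelled
clusters = cluster_unsupervised(X, k=2)      # no labels used at all
semi = self_train(X[:1] + X[3:4], ["small", "large"], X[1:3] + X[4:])  # two labels only
```

The supervised model needs every label, the clustering call never sees `y`, and `self_train` starts from only two labelled points and lets the model label the rest, which mirrors the photo-archive example.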

The aims of data mining can be divided into different process categories. Discovery focuses on searching a database for hidden patterns without a predefined hypothesis about the nature of the pattern, and on deriving a model of the causal generator of the data. Data mining requires data in which to find patterns, and its tasks usually fall into two main categories, predictive and descriptive, each of which is divided further (see Figure 1).
Figure 1. Data mining techniques.
Predictive:
• Classification aims to categorize unseen input data records into known classes. The assignment model, or classifier, learns from the training data set, where the relationship between records and classes is provided.
• Time series forecasting predicts the future value of a target function based on previously observed measurements.
• Regression aims to predict numerical values for input data records. The mapping function learns from the training data set, where the relationship between records and their values is known.
• Anomaly detection extracts points, or outliers, that differ considerably from the rest of the manifold of data points.
Descriptive:
• Clustering identifies manifolds of points, called clusters, with similar properties or behaviours.
• Association analysis discovers relationships between records within the same data set.

Knowledge Discovery in Databases Process
KDD is the automatic, exploratory analysis and modelling of large data sources. It is the organized process of identifying valid, novel, useful, and humanly understandable patterns in large and complex data sets. Data mining is the core of the KDD process; it involves algorithms that explore the data, develop the model, and discover previously unknown patterns. The KDD process is iterative and interactive, and consists of nine steps.
Figure 2
The unifying goal of the KDD process is to extract useful information from data in the context of large databases. Data mining refers to the set of computational methods that extract valuable patterns from the original data. In addition, the KDD process is concerned with the handling of massive data, the scaling of algorithms for better performance, the proper interpretation of the retrieved information, and human interaction with the overall process. The KDD process is a sequential analysis that includes the following steps (see Figure 2):
• selection,
• pre-processing,
• transformation,
• data mining,
• and information interpretation.
However, this sequential knowledge extraction approach may involve iterations, because at any point the data analyst can change settings and repeat earlier steps. The process starts with determining the KDD goals and ends with the implementation of the discovered knowledge. Thus, the basic KDD sequence may include closed loops: the effects are measured on the new data repositories, and the KDD process is launched again.
The knowledge exploration process starts with developing the necessary theoretical and practical background in the application domain. Understanding the relevant knowledge is important for achieving the customer's goals. The following is a brief description of the steps of the KDD process.

Selection
This step selects the target data set based on the goals. It determines what data will be used for knowledge discovery: what data is available, what additional data must be obtained, and how all of it is integrated into one data set. This step is very important because data mining learns and discovers from the available data.

Pre-processing
The quality of the selected data is often inadequate for further analysis, for multiple reasons. Outliers, missing values, or a high level of noise in the measurements require special data strategies. Data reliability is therefore enhanced in this stage.

Transformation
This step can be crucial for the success of the entire KDD project, and it is usually very project-specific. Transformation projects the original data into a low-dimensional embedded space (dimension reduction) and includes linear and nonlinear methods. The reduced set of embedded features allows visual inspection and facilitates the further mining of knowledge.

Data mining
The core element of the KDD process is the data mining phase, which includes several steps.
Depending on the customer's goal, a specific data mining task is chosen: classification, anomaly detection, regression, or clustering. There are two major goals in data mining: prediction and description. The chosen data mining algorithm is then executed to search for underlying patterns and valuable knowledge.

Interpretation/Evaluation
The final step of the KDD process is the interpretation and evaluation of the retrieved information with respect to the goals defined in the first step. This step involves techniques for visual analysis and a number of performance metrics. The correct interpretation of results is important because it allows assumptions to be checked and the parameters of earlier KDD components to be tuned. Finally, the discovered knowledge and the designed KDD algorithm may be incorporated into an existing business model. Possible usage scenarios include reporting and prediction, and the optimization and automation of business processes.
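
The KDD steps described above (selection, pre-processing, transformation, data mining, interpretation/evaluation) can be sketched as a small Python pipeline. Everything concrete here is an assumption made for illustration: the record layout, the cell names, the dropped-call counts, and the threshold are hypothetical. The transformation step uses simple z-score scaling as a stand-in for the dimension reduction the paper describes, and the mining step is a plain z-score anomaly detector chosen for brevity.

```python
from statistics import fmean, pstdev

# Hypothetical raw records: (cell_id, hour, dropped_calls); None marks a missing value.
RAW = [
    ("cell-A", 0, 4), ("cell-A", 1, 5), ("cell-A", 2, None), ("cell-A", 3, 6),
    ("cell-A", 4, 5), ("cell-A", 5, 48), ("cell-A", 6, 4), ("cell-A", 7, 5),
    ("cell-A", 8, 6), ("cell-A", 9, 5), ("cell-B", 0, 7),
]

def select(records, cell_id):
    """Selection: keep only the target data set for the stated goal."""
    return [r for r in records if r[0] == cell_id]

def preprocess(records):
    """Pre-processing: drop records whose measurement is missing."""
    return [r for r in records if r[2] is not None]

def transform(records):
    """Transformation: rescale measurements to z-scores (stand-in for dimension reduction)."""
    values = [r[2] for r in records]
    mean = fmean(values)
    std = pstdev(values) or 1.0  # avoid dividing by zero when all values are equal
    return [(r[1], (r[2] - mean) / std) for r in records]

def mine(scored, threshold=2.0):
    """Data mining: flag hours whose absolute z-score exceeds the threshold."""
    return [hour for hour, z in scored if abs(z) > threshold]

def evaluate(found, expected):
    """Interpretation/evaluation: precision and recall against known anomalies."""
    found, expected = set(found), set(expected)
    hits = len(found & expected)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(expected) if expected else 0.0
    return precision, recall

target = select(RAW, "cell-A")
clean = preprocess(target)
anomalies = mine(transform(clean))
precision, recall = evaluate(anomalies, expected=[5])
print(anomalies, precision, recall)
```

Because each step is a separate function, an analyst can change one setting, such as the threshold in `mine`, and rerun only the later steps, which is the iterative loop the process description refers to.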