1. VAIBHAV DHATTARWAL
CSE IDD
5TH YEAR
08211018
UNDER THE GUIDANCE OF
DR. DURGA TOSHNIWAL
Artificial Neural Networks based Data
Mining Techniques
2. Introduction
Introduction to Knowledge Discovery in Databases
Process and components of the Data Mining Process.
The various Data Mining Techniques and a brief
description of these techniques.
A brief overview of artificial neural networks and their
position as an applicable tool in data mining.
Applications of the techniques available to data mining
practitioners, including Artificial Neural Networks,
Regression, and Decision Trees.
3. Presentation Overview
KDD Process
Data Mining
CRISP-DM Model
Mining Techniques
Artificial Neural Networks
Back Propagation Algorithm
Applications
Conclusion
5. Knowledge Discovery in Databases(KDD) Process
The Knowledge Discovery in Databases (KDD) process is
commonly defined with the stages:
Selection
Pre-processing
Transformation
Data Mining
Interpretation/Evaluation.
6. Data Mining
Data mining is the term used to describe the process of
extracting value from a database. A data warehouse is a
location where information is stored. The type of data stored
depends largely on the type of industry and the company.
Data mining (the analysis step of the "Knowledge Discovery in
Databases" process, or KDD), is the process that attempts to
discover patterns in large data sets.
It utilizes methods at the intersection of artificial
intelligence, machine learning, statistics, and database
systems. The overall goal of the data mining process is to
extract information from a data set and transform it into an
understandable structure for further use.
7. Data Mining Process : Steps Involved
Data cleaning: The task of this step is to remove noise
and inconsistent data.
Data integration: In this step, multiple data sources
can be combined into an integrated collection of data.
Data selection: All the data relevant to the analysis task
is retrieved from the database in this step.
Data transformation: The data is transformed or
consolidated into forms appropriate for mining by
performing summary or aggregation operations.
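The four preparation steps above can be sketched on a toy data set. This is only an illustration: the records, the field names (customer, region, amount) and the helper functions are hypothetical, and real projects would normally use a database or a data-frame library.

```python
# Hypothetical records from two sources; all field names are illustrative.
store_sales = [
    {"customer": "a", "region": "north", "amount": 120.0},
    {"customer": "b", "region": "north", "amount": None},  # missing value
    {"customer": "c", "region": "south", "amount": 80.0},
]
web_sales = [
    {"customer": "a", "region": "north", "amount": 30.0},
    {"customer": "d", "region": "south", "amount": 50.0},
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, region):
    """Data selection: keep only the records relevant to the analysis."""
    return [r for r in records if r["region"] == region]


def transform(records):
    """Data transformation: aggregate the amount per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals


integrated = integrate(clean(store_sales), clean(web_sales))
north_totals = transform(select(integrated, "north"))
print(north_totals)
```

The output of the transformation step is the form that a mining algorithm would consume in the next step.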
8. Data Mining Process : Steps Involved
Data mining: The critical step where intelligent methods
are applied in order to extract data patterns.
Pattern evaluation: This step is deployed to identify the
truly interesting patterns representing knowledge
based on certain measures.
Knowledge presentation: In the final step, various
visualization and knowledge representation techniques
are used to present the mined knowledge to the user.
9. Data Mining Functions
Classification: It infers the defining characteristics of a
certain group.
Clustering: It identifies groups of items that share a
particular characteristic.
Association: It identifies relationships between events
that occur at one time.
Sequencing: It is similar to association, except that the
relationship exists over a period of time.
Forecasting: It estimates future values based on patterns
within large sets of data.
10. Data Mining : Data Types
Data Mining is performed on the following types of data :
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
12. Cross-Industry Standard Process for Data Mining
(CRISP-DM) Model
Business understanding - In this phase, the business objectives must be
understood clearly by finding out what the client really wants to achieve.
Next, we have to assess the situation by finding out about the resources,
assumptions, constraints and other important factors. Then, from the
business objectives and the current situation, we need to create goals that
achieve the business objectives within that situation.
Data understanding - This phase starts with initial data collection from
available sources to get familiar with the data. Data load and data integration
are carried out to ensure successful data collection. Then, the data needs to
be explored by tackling the data mining questions, which can be addressed
using querying, reporting and visualization. Finally, we must check the
quality of the acquired data: whether it is complete and whether it contains
missing values.
Data preparation - Data preparation normally consumes about 90% of
the project time. The outcome of the data preparation phase is the final data set.
Once the available data sources are identified, they need to be selected,
cleaned, constructed and formatted into the desired form.
13. Cross-Industry Standard Process for Data Mining
(CRISP-DM) Model
Modelling - Several modelling techniques are selected for use on the
prepared dataset. A test scenario must be generated to validate the
model’s quality. One or more models are created by running the modelling
tool on the prepared dataset. The created models need to be assessed
carefully to ensure that they meet the business initiatives.
Evaluation - In the evaluation phase, the model results must be evaluated
in the context of the business objectives set in the first phase. In this phase, new
business requirements may be raised because new patterns have been
discovered in the model results or because of other factors. Gaining business
understanding is an iterative process in data mining. The final decision
on whether to move to the deployment phase must be made in this step.
Deployment - The knowledge or information gained through the data mining
process needs to be presented in such a way that it can be used whenever
it is needed. From a project point of view, the final evaluation of the project
needs to summarize the project experiences and review the project to see
what needs to be improved.
14. Data Mining Techniques : Classification
Classification is the most commonly applied data mining technique,
which employs a set of pre-classified examples to develop a model
that can classify the population of records at large.
Classification is a classic data mining technique based on machine
learning. It assigns each item in a data set to one of a predefined
set of classes or groups.
The data classification process involves two steps: learning and classification.
In learning, the training data are analyzed by a classification algorithm.
In classification, test data are used to estimate the accuracy of the classification
rules. If the accuracy is acceptable, the rules can be applied to new data
tuples.
Classification makes use of mathematical techniques such
as decision trees, linear programming, neural networks and statistics.
In classification, we build software that can learn how to classify
the data items into groups.
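The two steps (learning, then classification with an accuracy estimate on test data) can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, which is only one possible technique chosen here for brevity; the data, labels and function names are hypothetical.

```python
import math


def learn(training_data):
    """Learning step: compute one centroid per class from labelled examples."""
    sums, counts = {}, {}
    for features, label in training_data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, features):
    """Assign the class whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


def accuracy(model, test_data):
    """Classification step: estimate accuracy on held-out test data."""
    correct = sum(1 for features, label in test_data
                  if classify(model, features) == label)
    return correct / len(test_data)


# Hypothetical pre-classified examples (two numeric attributes per record).
training_data = [([1.0, 1.0], "low"), ([1.5, 2.0], "low"),
                 ([8.0, 9.0], "high"), ([9.0, 8.5], "high")]
test_data = [([2.0, 1.5], "low"), ([8.5, 8.0], "high")]

model = learn(training_data)
acc = accuracy(model, test_data)
print(acc)
```

If the estimated accuracy is acceptable, `classify` can then be applied to new, unlabelled records.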
15. Data Mining Techniques : Clustering
Clustering can be defined as the identification of similar classes of objects.
Clustering is a data mining technique that automatically forms meaningful or useful
clusters of objects that have similar characteristics. By using
clustering techniques we can further identify dense and sparse regions in object
space and can discover the overall distribution pattern and correlations among data
attributes.
Because the classification approach can become costly, clustering can be
used as a pre-processing approach for attribute subset selection and classification.
In clustering, the classes are derived from the data and objects are then put in
them, whereas in classification objects are assigned to predefined classes.
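As an illustration of classes being derived from the data rather than predefined, here is a minimal k-means sketch. K-means is only one clustering technique and is not named in the slides; the points and the starting centroids are hypothetical and are passed in explicitly so that the run is repeatable.

```python
import math


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroid  # keep an empty cluster's centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical two-dimensional objects with no class labels.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(clusters)
```

No labels are supplied: the two groups emerge from the distances between the objects alone.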
16. Data Mining Techniques : Regression
Regression analysis helps in understanding how the typical value of the
dependent variable changes when any one of the independent variables is
varied, while the other independent variables are held fixed. In other
words, it estimates the average value of the dependent variable when the
independent variables are fixed.
In all cases, the estimation target is a function of the independent
variables called the regression function. In regression analysis, it is also of
interest to characterize the variation of the dependent variable around the
regression function, which can be described by a probability distribution.
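For the simplest case, one independent variable and a straight-line regression function, the estimate can be sketched with ordinary least squares. The data values and function name are hypothetical; the residuals show the variation of the dependent variable around the fitted function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: independent variable xs, dependent variable ys.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
# Residuals: how far each observation lies from the regression function.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```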
17. Data Mining Techniques : Association Rules
Association is one of the best known data mining techniques.
In association, a pattern is discovered based on the relationship
between a particular item and other items in the same transaction.
Association and correlation analysis is usually used to find frequent
item sets in large data sets. This type of finding helps
businesses to make certain decisions, such as catalogue
design, cross marketing and customer shopping behaviour
analysis.
Association rules are usually required to satisfy a user-
specified minimum support and a user-specified minimum
confidence at the same time.
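Support and confidence can be computed directly from a list of transactions. The sketch below uses the standard definitions (support is the fraction of transactions containing an item set; confidence of A -> B is support(A and B) divided by support(A)). The transactions, thresholds and function names are hypothetical.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


def rule_holds(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule is kept only if it meets both user-specified thresholds."""
    return (support(antecedent | consequent, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(rule_holds({"bread"}, {"milk"}, transactions, 0.5, 0.6))
```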
18. Data Mining Techniques : Neural Networks
An Artificial Neural Network (ANN), usually called neural
network (NN), is a mathematical model or computational
model that is inspired by the structure and functional aspects
of biological neural networks.
A neural network consists of an interconnected group of
artificial neurons, and it processes information using a
connectionist approach to computation.
In most cases an ANN is an adaptive system that changes its
structure based on external or internal information that flows
through the network during the learning phase.
Modern neural networks are non-linear statistical data
modelling tools. They are usually used to model complex
relationships between inputs and outputs or to find patterns
in data.
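A minimal sketch of what an "interconnected group of artificial neurons" computes: each neuron takes a weighted sum of its inputs, adds a bias and applies a non-linear activation. The sigmoid activation, the weights and the layer sizes below are assumptions chosen for illustration only.

```python
import math


def sigmoid(x):
    """A common non-linear activation function (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def layer_output(inputs, layer):
    """A layer is a list of (weights, bias) pairs, one per neuron."""
    return [neuron_output(inputs, weights, bias) for weights, bias in layer]


# Hypothetical 2-2-1 network: two inputs, two hidden neurons, one output.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_layer = [([1.0, -1.0], 0.0)]

hidden = layer_output([1.0, 1.0], hidden_layer)
output = layer_output(hidden, output_layer)
print(hidden, output)
```

Learning then consists of adjusting the weights and biases so that the outputs move towards the desired values.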
20. Artificial Neural Network
Neural networks are non-linear statistical data modelling
tools. They can be used to model complex relationships
between inputs and outputs; or to find patterns in data and to
infer rules from them.
Neural networks are useful in providing information on
associations, classifications, clusters, and forecasting. Using
neural networks as a tool, data warehousing firms can harvest
information from datasets in the data mining process.
Neural networks are used to estimate sampled functions
when we do not know the form of the functions.
These two abilities, pattern recognition and function estimation,
make neural networks a widely used tool in data mining.
With their model-free estimators and their dual nature, neural
networks serve data mining in a variety of ways.
22. Feed Forward Neural Network : Training
Input data is presented to the network and propagated through the
network until it reaches the output layer. This forward process
produces a predicted output.
The predicted output is subtracted from the actual output and an
error value for the network is calculated.
The neural network then uses supervised learning, which in most
cases is back propagation, to train the network. Back propagation is
a learning algorithm for adjusting the weights. It starts with the
weights between the output layer processing elements (PEs) and the
last hidden layer PEs and works backwards through the network.
Once back propagation has finished, the forward process starts
again, and this cycle continues until the error between predicted
and actual outputs is minimized.
23. Back Propagation Algorithm
Initialize the weights in the network
Do
  For each example e in the training set
    O = neural-net-output(network, e) ; forward pass
    T = teacher output for e
    Calculate error (T - O) at the output units
    Compute delta_wh for all weights from hidden layer to output layer ; backward pass
    Compute delta_wi for all weights from input layer to hidden layer ; backward pass continued
    Update the weights in the network
Until all examples classified correctly or stopping criterion satisfied
Return the network
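The pseudocode above can be sketched in plain Python for a tiny network. Everything specific here is an assumption made for illustration: a 2-2-1 layout, sigmoid units, the logical OR function as training set, a fixed random seed, and a stopping criterion based on total squared error or a maximum number of epochs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """Forward pass: return the hidden activations and the single output O."""
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
              for ws, b in zip(network["w_hidden"], network["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(network["w_out"], hidden))
                     + network["b_out"])
    return hidden, output


def total_error(network, examples):
    """Sum of squared errors (T - O)^2 over all training examples."""
    return sum((t - forward(network, x)[1]) ** 2 for x, t in examples)


def train(network, examples, rate=0.5, max_epochs=5000, tolerance=0.01):
    for _ in range(max_epochs):
        for inputs, target in examples:
            hidden, output = forward(network, inputs)
            error = target - output  # T - O at the output unit
            delta_out = error * output * (1.0 - output)
            # Hidden deltas use the output weights before they are updated.
            delta_hidden = [delta_out * w * h * (1.0 - h)
                            for w, h in zip(network["w_out"], hidden)]
            # Backward pass: hidden-to-output weights.
            network["w_out"] = [w + rate * delta_out * h
                                for w, h in zip(network["w_out"], hidden)]
            network["b_out"] += rate * delta_out
            # Backward pass continued: input-to-hidden weights.
            for j, d in enumerate(delta_hidden):
                network["w_hidden"][j] = [w + rate * d * x for w, x
                                          in zip(network["w_hidden"][j], inputs)]
                network["b_hidden"][j] += rate * d
        if total_error(network, examples) < tolerance:  # stopping criterion
            break
    return network


random.seed(0)  # fixed seed so the sketch is repeatable
network = {
    "w_hidden": [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
    "b_hidden": [random.uniform(-1, 1) for _ in range(2)],
    "w_out": [random.uniform(-1, 1) for _ in range(2)],
    "b_out": random.uniform(-1, 1),
}
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # logical OR

initial_error = total_error(network, examples)
train(network, examples)
final_error = total_error(network, examples)
print(initial_error, final_error)
```

Because the deltas are built from (T - O), they already point downhill on the error, so the weights are updated by adding `rate * delta * activation`. OR does not strictly need a hidden layer; it is used here only to keep the example short.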
24. Back Propagation Algorithm
Phase 1: Propagation
Every propagation involves the following steps:
Forward propagation of a training pattern's input through the neural
network.
Backward propagation of the propagation's output activations through the
neural network using the training pattern's target, in order to generate
the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight-synapse the following steps are used:
Multiply its output delta and input activation to get the gradient of the
weight.
Move the weight in the opposite direction of the gradient by subtracting a
fraction of the gradient (set by the learning rate) from the weight.
Repeat phase 1 and 2 until the performance of the network is
satisfactory.
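The Phase 2 update for a single weight can be written in two lines. In this sketch the output delta is taken to be the derivative of the error with respect to the neuron's net input, so the gradient is subtracted; the function names and the numbers are hypothetical.

```python
def weight_gradient(output_delta, input_activation):
    """Gradient of the error with respect to one weight."""
    return output_delta * input_activation


def update_weight(weight, output_delta, input_activation, learning_rate):
    """Move the weight against the gradient by a fraction (the learning rate)."""
    return weight - learning_rate * weight_gradient(output_delta, input_activation)


new_weight = update_weight(weight=0.5, output_delta=0.2,
                           input_activation=1.0, learning_rate=0.1)
print(new_weight)
```

A positive gradient lowers the weight and a negative gradient raises it; the learning rate controls the size of each step.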
26. Spatial Data Mining
Spatial Data Cube Construction
As with relational data, we can integrate spatial data to
construct a data warehouse that facilitates spatial data
mining. A spatial data warehouse is a subject-oriented,
integrated, time variant and non-volatile collection of both
spatial and non-spatial data in support of spatial data mining
and spatial-data-related decision-making processes.
There are three types of dimensions in a spatial data cube:
A non-spatial dimension
A spatial-to-non-spatial dimension
A spatial-to-spatial dimension
28. Web Mining
The World Wide Web serves as a huge, widely distributed, global
information service centre for news, advertisements, consumer
information, financial management, education, government, e-
commerce, and many other information services. The Web also
contains a rich and dynamic collection of hyperlink information and
Web page access and usage information, providing rich sources for
data mining.
Challenges:
The Web seems to be too huge for effective data warehousing and data mining
The complexity of Web pages is far greater than that of any traditional text
document collection
The Web is a highly dynamic information source
The Web serves a broad diversity of user communities
Only a small portion of the information on the Web is truly relevant or useful
Besides mining Web contents and Web linkage structures, another
important task for Web mining is Web usage mining.
29. Applications : Intrusion Detection
The security of our computer systems and data is at continual
risk. The extensive growth of the Internet and increasing
availability of tools and tricks for intruding and attacking
networks have prompted intrusion detection to become a
critical component of network administration. Some areas in
which data mining technology is being applied or further
developed for intrusion detection:
Development of data mining algorithms for intrusion detection
Association and correlation analysis, and aggregation to help select and
build discriminating attributes
Analysis of stream data
Distributed data mining
Visualization and querying tools
30. Conclusions
Although the basic steps in data mining include data cleaning, selection and
transformation, the functions and techniques are applied only in the vital step where
intelligent methods are used to detect patterns.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) model is an effective
approach that considers business requirements at every step.
Classification and clustering techniques are popular and easily applicable in data
mining; however, classification requires prior information about the classes.
Artificial Neural Networks can be deployed to detect patterns and make predictions,
which makes them capable tools in data mining. A feed forward neural network is
trained using the back propagation algorithm.
The application of data mining techniques along with GIS techniques offers an
opportunity to explore various aspects of Spatial Data Mining.
The growth of data available for processing, including multimedia elements and the
World Wide Web, leads to greater opportunities for data mining techniques. However,
pre-processing, selection and transformation need to be handled first.