Introduction Of Artificial neural network

Madras university Department Of Computer Science

Seminar On Introduction Of ANN, Rules And Adaptive Resonance Theory

Group members: P. JayaVel, J. Joseph Amal Raj, M. Kaja Mohinden

ARTIFICIAL NEURAL NETWORK (ANN) An artificial neural network (ANN), usually called "neural network" (NN), is a mathematical model or computational model that tries to simulate the structure and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.


ARTIFICIAL NEURAL NETWORK (ANN) Why use neural networks? Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer "what if" questions. Other advantages include:

Self-Organisation: An ANN can create its own organisation or representation of the information it receives during learning time.

Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.

Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

Reinforcement learning

Supervised learning: Quickprop [fahlman88empirical]

Equation 2. Error derivative at this epoch. The Quickprop algorithm is loosely based on Newton's method. It is quicker than standard backpropagation because it uses an approximation to the error curve and second-order derivative information, which allow a quicker evaluation. Training is similar to backprop except that a copy of the error derivative at the previous epoch (eq. 1) is kept. This, and the current error derivative (eq. 2), are used to minimise an approximation to this error curve.

Supervised learning The update rule is given in equation 3. This equation uses no learning rate. If the slope of the error curve is less than that of the previous one, then the weight will change in the same direction (positive or negative). However, there need to be some controls to prevent the weights from growing too large.
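A minimal sketch of a single Quickprop weight step, assuming the commonly cited form of Fahlman's rule: the new step is the previous step scaled by S(t) / (S(t-1) - S(t)), where S is the error derivative. The function name, the argument names, and the growth limit of 1.75 are illustrative assumptions rather than details taken from the slides; the limit stands in for the "controls" mentioned above.

```python
def quickprop_step(prev_step, prev_slope, slope, max_growth=1.75):
    """Return the next weight change for one weight.

    prev_step  : weight change applied at the previous epoch
    prev_slope : error derivative dE/dw at the previous epoch (eq. 1)
    slope      : error derivative dE/dw at this epoch (eq. 2)
    max_growth : assumed cap on how much the step may grow per epoch
    """
    denom = prev_slope - slope
    if denom == 0.0:
        return 0.0  # flat secant: no curvature information to use
    # Jump to the minimum of the parabola fitted through the two slopes.
    step = prev_step * slope / denom
    # Keep the step from growing too large relative to the previous one.
    limit = abs(prev_step) * max_growth
    return max(-limit, min(limit, step))


# Example on E(w) = (w - 3)^2, whose derivative is 2 * (w - 3).
# At w = 0 the slope is -6; after a step of +1 (w = 1) the slope is -4.
unclipped = quickprop_step(1.0, -6.0, -4.0, max_growth=10.0)
clipped = quickprop_step(1.0, -6.0, -4.0)
```

Without the cap the step lands exactly on the minimum of this quadratic (w = 1 + 2 = 3); with the assumed cap the step is limited to 1.75 times the previous one.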

Unsupervised learning In unsupervised learning we are given some data x and the cost function to be minimized, that can be any function of the data x and the network's output, f. The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).

Unsupervised learning As a trivial example, consider the model f(x) = a, where a is a constant and the cost C = E[(x − f(x))^2]. Minimizing this cost will give us a value of a that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and y, whereas in statistical modelling, it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.) Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
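The trivial example above can be checked numerically. This is a small sketch with made-up data; the function name and the sample values are assumptions for illustration only.

```python
def mean_squared_cost(a, data):
    """Empirical version of C = E[(x - f(x))^2] for the constant model f(x) = a."""
    return sum((x - a) ** 2 for x in data) / len(data)


data = [1.0, 2.0, 4.0, 9.0]          # illustrative sample
mean = sum(data) / len(data)         # candidate minimiser

cost_at_mean = mean_squared_cost(mean, data)
cost_above = mean_squared_cost(mean + 0.5, data)
cost_below = mean_squared_cost(mean - 0.5, data)
```

Moving a away from the mean in either direction raises the cost, which is what the statement about the minimiser predicts.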

Unsupervised learning Unsupervised learning, in contrast to supervised learning, does not provide the network with target output values. This isn't strictly true, as often (and for the cases discussed in this section) the output is identical to the input. Unsupervised learning usually performs a mapping from input to output space, data compression or clustering.

Reinforcement learning In reinforcement learning, data x are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.

Reinforcement learning The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm.

Neural Network “Learning Rules”: Successful learning in any neural network is dependent on how the connections between the neurons are allowed to change in response to activity. The manner of change is what the majority of researchers call "a learning rule". However, we will call it a "synaptic modification rule" because although the network learned the sequence, it is not clear that the *connections* between the neurons in the network "learned" anything in particular.

Mathematical synaptic Modification rule There are many categories of mathematical synaptic modification rule which are used to describe how synaptic strengths should be changed in a neural network. Some of these categories include: backpropagation of error, correlative Hebbian, and temporally-asymmetric Hebbian.

Mathematical synaptic Modification rule Backpropagation of error states that connection strengths should change throughout the entire network in order to minimize the difference between the actual activity and the "desired" activity at the "output" layer of the network.

Mathematical synaptic Modification rule Correlative Hebbian states that any two interconnected neurons that are active at the same time should strengthen their connections, so that if one of the neurons is activated again in the future the other is more likely to become activated too.
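A minimal sketch of the correlative Hebbian idea, assuming the simplest product form of the rule (weight change proportional to the product of the two activities). The function name and the learning rate are illustrative assumptions.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Strengthen a connection when both neurons are active together.

    pre, post : activities of the two connected neurons (0 = silent, 1 = active)
    rate      : assumed learning rate
    """
    return weight + rate * pre * post


both_active = hebbian_update(0.5, 1, 1)   # connection is strengthened
one_silent = hebbian_update(0.5, 1, 0)    # connection is left unchanged
```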

Mathematical synaptic Modification rule Temporally-asymmetric Hebbian emphasizes the importance of causality: if a neuron reliably fires before another, its connection to the other neuron should be strengthened. Otherwise, it should be weakened.

Neural Network “Learning Rules”: The Delta Rule; The Pattern Associator; The Hebb Rule

The Delta Rule A generalized form of the delta rule, developed by D.E. Rumelhart, G.E. Hinton, and R.J. Williams, is needed for networks with hidden layers. They showed that this method works for the class of semilinear activation functions (non-decreasing and differentiable). Generalizing the ideas of the delta rule, consider a hierarchical network with an input layer, an output layer and a number of hidden layers.

The Delta Rule We will consider only the case where there is one hidden layer. The network is presented with input signals which produce output signals that act as input to the middle layer. Output signals from the middle layer in turn act as input to the output layer to produce the final output vector. This vector is compared to the desired output vector. Since both the output and the desired output vectors are known, the delta rule can be used to adjust the weights in the output layer.

The Delta Rule Can the delta rule be applied to the middle layer? Both the input signal to each unit of the middle layer and the output signal are known. What is not known is the error generated from the output of the middle layer, since we do not know the desired output. To get this error, backpropagate through the middle layer to the units that are responsible for generating that output. The error generated from the middle layer can then be used with the delta rule to adjust the weights.
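The procedure above (delta rule at the output layer, backpropagated error for the middle layer) can be sketched for a network with one hidden layer. This sketch assumes sigmoid units, no bias terms, and a squared-error cost; the function names, the learning rate, and the example weights are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return (hidden activations, output activations)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    """One generalized delta rule update; returns new weight matrices."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output layer: the desired output is known, so the error is direct.
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Middle layer: error is backpropagated through the outgoing weights.
    delta_hidden = [
        h * (1.0 - h) * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j, h in enumerate(hidden)
    ]
    new_w_out = [
        [w + rate * delta_out[k] * hidden[j] for j, w in enumerate(row)]
        for k, row in enumerate(w_out)
    ]
    new_w_hidden = [
        [w + rate * delta_hidden[j] * x[i] for i, w in enumerate(row)]
        for j, row in enumerate(w_hidden)
    ]
    return new_w_hidden, new_w_out


# Illustrative weights: 2 inputs, 2 hidden units, 1 output unit.
x, target = [1.0, 0.5], [1.0]
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [[0.2, -0.1]]
output_before = forward(x, w_hidden, w_out)[1][0]
w_hidden2, w_out2 = backprop_step(x, target, w_hidden, w_out)
output_after = forward(x, w_hidden2, w_out2)[1][0]
```

After one update the output for this sample moves toward the target, which is the behaviour the delta rule is meant to produce.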

The Pattern Associator A pattern associator learns associations between input patterns and output patterns. One of the most appealing characteristics of such a network is the fact that it can generalize what it learns about one pattern to other similar input patterns. Pattern associators have been widely used in distributed memory modeling.

The Pattern Associator The pattern associator is one of the more basic two-layer networks. Its architecture consists of two sets of units, the input units and the output units. Each input unit connects to each output unit via weighted connections. Connections are only allowed from input units to output units.

The Pattern Associator The effect of a unit u_i in the input layer on a unit u_j in the output layer is determined by the product of the activation a_i of u_i and the weight w_ij of the connection from u_i to u_j. The activation of a unit u_j in the output layer is given by: a_j = SUM_i(w_ij * a_i).
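The activation formula above can be written directly as a weighted sum. In this sketch the weights are stored one row per output unit, which is an assumed layout; the function name and the numbers are illustrative.

```python
def associator_output(weights, activations):
    """Activation of each output unit: sum over inputs of weight * input activation.

    weights[j][i] holds the weight of the connection from input unit i
    to output unit j (assumed storage layout).
    """
    return [
        sum(w * a for w, a in zip(row, activations))
        for row in weights
    ]


weights = [[1, 0, 2],
           [0.5, 0.5, 0]]
outputs = associator_output(weights, [1, 1, 0])
```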

Adaptive Resonance Theory (ART); Discrete Bidirectional Associative Memory; Kohonen Self-Organizing Map; Counter Propagation Network (CPN); Perceptron; Vector Representation; ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element); Madaline (Multiple Adaline); Backpropagation, or propagation of error

Adaptive Resonance Theory (ART) Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.

Discrete Bidirectional Associative Memory

Kohonen Self-Organizing Map The self-organizing map (SOM) invented by Teuvo Kohonen performs a form of unsupervised learning. A set of artificial neurons learn to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.

Kohonen Self-Organizing Map If an input space is to be processed by a neural network, the first issue of importance is the structure of this space. A neural network with real inputs computes a function f defined from an input space A to an output space B. The region where f is defined can be covered by a Kohonen network in such a way that when, for example, an input vector is selected from the region a1, only one unit in the network fires. Such a tiling, in which input space is classified in subregions, is also called a chart or map of input space. Kohonen networks learn to create maps of the input space in a self-organizing way.
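A minimal sketch of one training step for a one-dimensional Kohonen map, assuming squared Euclidean distance to pick the single firing unit and a fixed neighbourhood radius along the chain of units. The function names, the learning rate, and the radius are illustrative assumptions.

```python
def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    dists = [sum((w - xi) ** 2 for w, xi in zip(unit, x)) for unit in weights]
    return dists.index(min(dists))


def som_step(weights, x, rate=0.5, radius=1):
    """Move the winner and its chain neighbours toward x; return (winner, new weights)."""
    winner = best_matching_unit(weights, x)
    updated = []
    for j, unit in enumerate(weights):
        if abs(j - winner) <= radius:
            unit = [w + rate * (xi - w) for w, xi in zip(unit, x)]
        updated.append(list(unit))
    return winner, updated


units = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
winner, units_after = som_step(units, [1.0, 2.0], rate=0.5, radius=0)
```

With radius 0 only the winning unit moves; a larger radius also drags its neighbours along, which is what lets nearby units come to represent nearby regions of the input space.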

Kohonen Self-Organizing Map: Advantages Probably the best thing about SOMs is that they are very easy to understand. It's very simple: if they are close together and there is grey connecting them, then they are similar. If there is a black ravine between them, then they are different. Unlike Multidimensional Scaling or N-land, people can quickly pick up on how to use them in an effective manner. Another great thing is that they work very well. As I have shown you, they classify data well and are then easily evaluated for their own quality, so you can actually calculate how good a map is and how strong the similarities between objects are.

Perceptron The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier. The perceptron is a binary classifier that maps its input x (a real-valued vector) to an output value f(x) (a single binary value): f(x) = 1 if w · x + b > 0, and f(x) = 0 otherwise, where w is a vector of real-valued weights, w · x is the dot product (which computes a weighted sum), and b is the 'bias', a constant term that does not depend on any input value.
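A minimal sketch of the perceptron as a binary classifier, assuming the usual threshold form in which the output is 1 when the weighted sum plus bias is positive and 0 otherwise. The weights and bias below are hand-picked (not learned) so that the unit behaves like a logical AND; they are illustrative assumptions.

```python
def perceptron(x, w, b):
    """Binary output from the dot product of w and x plus the bias b."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum + b > 0 else 0


# Hand-picked parameters that separate (1, 1) from the other binary inputs.
w, b = [1.0, 1.0], -1.5
and_table = {(a, c): perceptron([a, c], w, b) for a in (0, 1) for c in (0, 1)}
```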

ADALINE Definition Adaline is a single layer neural network with multiple nodes where each node accepts multiple inputs and generates one output. Given the following variables: x is the input vector, w is the weight vector, n is the number of inputs, θ is some constant, and y is the output, we find that the output is y = SUM_{j=1..n}(x_j * w_j) + θ. If we further assume that x_{n+1} = 1 and w_{n+1} = θ, then the output reduces to the dot product of x and w.
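The reduction to a plain dot product can be checked with a short sketch: appending a constant 1 to the input and θ to the weights gives the same output as adding θ separately. The function names and the numbers are illustrative assumptions.

```python
def adaline_output(x, w, theta):
    """Weighted sum of the n inputs plus the constant theta."""
    return sum(xj * wj for xj, wj in zip(x, w)) + theta


def adaline_output_augmented(x, w, theta):
    """Same output as a single dot product over n + 1 components."""
    x_aug = list(x) + [1.0]      # extra input fixed at 1
    w_aug = list(w) + [theta]    # extra weight equal to theta
    return sum(xj * wj for xj, wj in zip(x_aug, w_aug))


y_plain = adaline_output([2.0, 3.0], [0.5, -1.0], 0.25)
y_augmented = adaline_output_augmented([2.0, 3.0], [0.5, -1.0], 0.25)
```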

Madaline Madaline (Multiple Adaline) is a two layer neural network with a set of ADALINEs in parallel as its input layer and a single PE (processing element) in its output layer. For problems with multiple input variables and one output, each input is applied to one Adaline. For similar problems with multiple outputs, madalines in parallel can be used. The madaline network is useful for problems which involve prediction based on multiple inputs, such as weather forecasting (Input variables: barometric pressure, difference in pressure. Output variables: rain, cloudy, sunny).

Backpropagation Backpropagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969,[1][2] but it wasn't until 1986, through the work of David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition, and it led to a “renaissance” in the field of artificial neural network research. It is a supervised learning method, and is an implementation of the Delta rule. It requires a teacher that knows, or can calculate, the desired output for any given input. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backwards propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") is differentiable.

Backpropagation Calculation of error: d_k = f(D_k) - f(O_k)

Network Structure: Back-propagation Network (figure): input units I_k, weights W_k,j, hidden units a_j, weights W_j,i, output units O_i.

Counter propagation network (CPN) (§ 5.3) Basic idea of CPN. Purpose: fast and coarse approximation of a vector mapping. The aim is not to map any given x to its exact output with a given precision; instead, input vectors x are divided into clusters/classes, and each cluster of x has one output y, which is (hopefully) the average of the desired outputs for all x in that class. Architecture (simple case: forward-only CPN): input units x_1, ..., x_i, ..., x_n; hidden (class) units z_1, ..., z_k, ..., z_p; output units y_1, ..., y_j, ..., y_m. Weights w_k,i run from input to hidden (class); weights v_j,k run from hidden (class) to output.

Training sample (x, d), where d is the desired precise mapping of x. Phase 1: weights coming into hidden nodes are trained by competitive learning to become the representative vector of a cluster of input vectors x (use only x, the input part of (x, d)). 1. For a chosen x, feed forward to determine the winning hidden node. 2. Move the winner's incoming weights toward x. 3. Reduce the learning rate, then repeat steps 1 and 2 until the stop condition is met. Phase 2: weights going out of hidden nodes are trained by the delta rule to be an average of the desired outputs d, where x is an input vector that causes that hidden node to win (use both x and d). 1. For a chosen x, feed forward to determine the winning hidden node. 2. (Optional) Update the winner's incoming weights as in phase 1. 3. Update the winner's outgoing weights toward d by the delta rule. 4. Repeat steps 1 – 3 until the stop condition is met.
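The two training phases can be sketched for a forward-only CPN. The update equations themselves are not preserved in the slides, so this sketch assumes the usual forms: the winner's incoming weights move a fraction alpha toward x, and its outgoing weights move a fraction beta toward d. The function names, the rates, the decay factor, the epoch count, and the sample data are all illustrative assumptions, and the optional incoming-weight update in phase 2 is skipped.

```python
def winner(w, x):
    """Index of the hidden (class) node whose incoming weights are closest to x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(row, x)) for row in w]
    return dists.index(min(dists))


def train_cpn(samples, w, v, alpha=0.5, beta=0.5, epochs=20, decay=0.9):
    """w[k]: incoming weights of hidden node k; v[k]: its outgoing weights."""
    w = [list(row) for row in w]
    v = [list(row) for row in v]
    # Phase 1: competitive learning on the inputs only.
    rate = alpha
    for _ in range(epochs):
        for x, _d in samples:
            k = winner(w, x)
            w[k] = [wi + rate * (xi - wi) for wi, xi in zip(w[k], x)]
        rate *= decay  # reduce the learning rate between passes
    # Phase 2: delta rule on the outgoing weights of the winning node.
    for _ in range(epochs):
        for x, d in samples:
            k = winner(w, x)
            v[k] = [vj + beta * (dj - vj) for vj, dj in zip(v[k], d)]
    return w, v


def cpn_predict(w, v, x):
    """Output is the outgoing weight vector of the winning class node."""
    return v[winner(w, x)]


samples = [([0.0], [0.0]), ([1.0], [0.0]), ([10.0], [1.0]), ([11.0], [1.0])]
w_trained, v_trained = train_cpn(samples, w=[[0.0], [10.0]], v=[[0.5], [0.5]])
low = cpn_predict(w_trained, v_trained, [0.5])
high = cpn_predict(w_trained, v_trained, [10.5])
```

Each input is answered with the stored output of its cluster, which is the coarse approximation the purpose statement describes.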

Adaptive Resonance Theory Adaptive Resonance Theory (ART) was developed by Grossberg (1976). Input vectors which are close to each other according to a specific similarity measure should be mapped to the same cluster. ART adapts itself by storing input patterns, and tries to best match the input pattern.

Adaptive Resonance Theory 1 (ART 1) ART 1 is a binary classification model. Various other versions of the model have evolved from ART 1. Pointers to these can be found in the bibliographic remarks. The main network comprises the layers F1, F2 and the attentional gain control as the attentional subsystem. The attentional vigilance node forms the orienting subsystem.

ART 1: Architecture (figure): attentional subsystem with layers F1 and F2 and gain control G; orienting subsystem with vigilance node A; input I; excitatory (+) and inhibitory (-) connections.

ART 1: 2/3 Rule (figure: F2 node J, signals S_i(y_j), outstar weights v_ji, F1 signal s_i, gain control G with signal s_G, external input I_i) Three kinds of inputs to each F1 neuron decide when the neuron fires:

Top-down feedback through outstar weights v_ji

Gain control signal s_G

Adaptive Resonance Theory (ART)

Motivations: Previous methods have the following problems: Number of class nodes is pre-determined and fixed.

Some nodes may have empty classes.

No control of the degree of similarity of inputs grouped in one class. Training is non-incremental:

Adding new samples often requires re-training the network with the enlarged training set until a new stable state is reached.

To achieve these, we need: a mechanism for testing and determining (dis)similarity between x and a stored class pattern; a control for finding/creating new class nodes; all operations implemented by units of local computation. Only the basic ideas are presented here: the model is simplified from the original ART model, and some of the control mechanisms realized by various specialized neurons are done by logic statements of the algorithm.

Working of ART1: 3 phases after each input vector x is applied. Recognition phase: determine the winner cluster for x using bottom-up weights b. The winner is j* with max y_j* = b_j* · x. x is tentatively classified to cluster j*, but the winner may be far away from x (e.g., |t_j* - x| is unacceptably large).

Working of ART1 (3 phases) Comparison phase: compute the similarity vector s using top-down weights t. If (# of 1's in s)/(# of 1's in x) > ρ, accept the classification and update b_j* and t_j*; else remove j* from further consideration and look for another potential winner, or create a new node with x as its first pattern.

Weight update/adaptive phase Initial weights (no bias) are set for the bottom-up and top-down connections. The weights of the winning node are updated when a resonance occurs. If k sample patterns are clustered to node j, then t_j = the pattern whose 1's are common to all these k samples.

Example for input x(1): node 1 wins.

Notes: Classification is a search process. No two classes have the same b and t. Outliers that do not belong to any cluster will be assigned separate nodes. Different ordering of sample input presentations may result in different classification. Increasing ρ increases the # of classes learned and decreases the average class size. Classification may shift during search, but will reach stability eventually. There are different versions of ART1 with minor variations. ART2 is the same in spirit but different in details.

ART1 Architecture (figure): control units R, G1 and G2 with excitatory (+) and inhibitory (-) connections.

Cluster units: competitive, receive input vector x through weights b to determine winner j. Input units: placeholders for external inputs. Interface units: pass s to x as the input vector for classification, compare x with the top-down weights, and are controlled by gain control unit G1. The three phases need to be sequenced (by control units G1, G2, and R).

R = 0: resonance occurs, update b_J and t_J. R = 1: fails similarity test, inhibits J from further computation.

ART clustering algorithms

Fuzzy ART Layer 1 consists of neurons that are connected to the neurons in Layer 2 through weight vectors. The number of neurons in Layer 1 depends on the characteristics of the input data. The neurons in Layer 2 represent clusters.

Fuzzy ART FMEA FMEA values are evaluated separately with severity, detection and occurrence values. The aim is to apply the Fuzzy ART algorithm to the FMEA method; by performing FMEA on test problems, the most favorable parameter combinations (α, β and ρ) are investigated.

Hand-worked Example Cluster the vectors 11100, 11000, 00001, 00011. Low vigilance: 0.3. High vigilance: 0.7.
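The vectors and vigilance values listed above can be run through a simplified ART1 loop. This is a sketch under stated assumptions, not the full architecture: the control units are replaced by plain logic, the similarity vector s is taken to be the bitwise AND of x and the winner's top-down pattern t, the bottom-up activation is computed from t with an assumed constant L = 2, and the strict "> ρ" test is used. The function name is hypothetical, and the resulting clusters are those of this sketch rather than a reproduction of the slide's worked figures.

```python
def art1_cluster(patterns, rho, L=2.0):
    """Cluster non-zero binary patterns; returns (assignments, top-down templates)."""
    templates = []       # top-down pattern t_j for each cluster node
    assignments = []
    for x in patterns:
        ones_in_x = sum(x)  # assumed non-zero

        def activation(j):
            # Bottom-up activation b_j . x, with b_j derived from t_j (assumption).
            t = templates[j]
            overlap = sum(ti & xi for ti, xi in zip(t, x))
            return L * overlap / (L - 1.0 + sum(t))

        # Recognition: try cluster nodes from strongest to weakest activation.
        candidates = sorted(range(len(templates)), key=activation, reverse=True)
        chosen = None
        for j in candidates:
            # Comparison: s keeps the 1's shared by x and the stored pattern.
            s = [ti & xi for ti, xi in zip(templates[j], x)]
            if sum(s) / ones_in_x > rho:
                templates[j] = s          # resonance: store the shared 1's
                chosen = j
                break
        if chosen is None:
            templates.append(list(x))     # create a new node with x as first pattern
            chosen = len(templates) - 1
        assignments.append(chosen)
    return assignments, templates


patterns = [[1, 1, 1, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 1], [0, 0, 0, 1, 1]]
low_assign, low_templates = art1_cluster(patterns, rho=0.3)
high_assign, high_templates = art1_cluster(patterns, rho=0.7)
```

Under these assumptions the low vigilance run forms two clusters and the high vigilance run forms three, because 00011 shares only half of its 1's with the stored pattern 00001.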

Hand-worked Example: ρ = 0.3

ART 1: Clustering Application ρ = 0.3

Hand-worked Example: ρ = 0.7

ART 1: Clustering Application ρ = 0.7

Neurophysiological Evidence for ART Mechanisms The attentional subsystem of an ART network has been used to model aspects of the inferotemporal cortex. The orienting subsystem has been used to model a part of the hippocampal system, which is known to contribute to memory functions. The feedback prevalent in an ART network can help focus attention in models of visual object recognition.

Other Applications Aircraft Part Design Classification System. See text for details.

Ehrenstein Pattern Explained by ART! Generates a circular illusory contour, a circular disc of enhanced brightness. The bright disc disappears when the alignment of the dark lines is disturbed!

Other Neurophysiological Evidence Adam Sillito [University College, London]: Cortical feedback in a cat tunes cells in its LGN to respond best to lines of a specific length. Chris Redie [MPI Entwicklungsbiologie, Germany]: Found that some visual cells in a cat's LGN and cortex respond best at line ends, more strongly to line ends than line sides. Sillito et al. [University College, London]: Provide neurophysiological data suggesting that the cortico-geniculate feedback closely resembles the matching and resonance of an ART network. Cortical feedback has been found to change the output of specific LGN cells, increasing the gain of the input for feature-linked events that are detected by the cortex.

Computational Experiment A non-binary dataset of FMEA is used to evaluate the performance of the Fuzzy ART neural network on different test problems.

Computational Experiment For a comprehensive analysis of the effects of parameters on the performance of Fuzzy ART in the FMEA case, a number of levels of parameters are considered.

Computational Experiment The Fuzzy ART neural network method is applied to determine the most favorable parameter (α, β and ρ) combinations during application of FMEA on test problems.

Results For any test problem 900 solutions are obtained. The β-ρ interactions for parameter combinations are considered where solutions are obtained. For each test problem, all the combinations are evaluated and frequency distributions of clusters are constructed.
