4. ARTIFICIAL NEURAL NETWORK (ANN) An artificial neural network (ANN), usually called "neural network" (NN), is a mathematical model or computational model that tries to simulate the structure and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
7. Self-Organisation: An ANN can create its own organisation or representation of the information it receives during learning time.
8. Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
14. Equation 2: error derivative at this epoch. The Quickprop algorithm is loosely based on Newton's method. It is quicker than standard backpropagation because it uses an approximation to the error curve, together with second-order derivative information, which allows a quicker evaluation. Training is similar to backprop, except that a copy of the error derivative at the previous epoch (eq. 1) is kept. This, together with the current error derivative (eq. 2), is used to minimise an approximation to this error curve.
16. Unsupervised learning In unsupervised learning we are given some data x and a cost function to be minimized, which can be any function of the data x and the network's output, f. The cost function depends on the task (what we are trying to model) and on our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).
17. Unsupervised learning As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))^2]. Minimizing this cost will give us a value of a that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and y, whereas in statistical modelling it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.) Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
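The trivial example can be checked numerically. The sketch below is only an illustration: the function names, the sample data, the learning rate and the step count are assumptions, and the expectation E[...] is replaced by an average over a small data set. Gradient descent on a is used to show that the minimiser of the cost coincides with the sample mean.

```python
from statistics import mean


def cost(a, data):
    """Empirical version of C = E[(x - f(x))^2] for the constant model f(x) = a."""
    return sum((x - a) ** 2 for x in data) / len(data)


def fit_constant(data, lr=0.1, steps=200):
    """Minimise the cost by gradient descent on a (dC/da = -2 * mean(x - a))."""
    a = 0.0  # arbitrary starting value (assumption)
    for _ in range(steps):
        grad = -2.0 * sum(x - a for x in data) / len(data)
        a -= lr * grad
    return a


data = [2.0, 4.0, 9.0]      # illustrative data, not from the slides
a_hat = fit_constant(data)  # converges to the sample mean, 5.0
```

Any other value of a gives a larger cost on the same data, which is what "minimizing this cost gives the mean" amounts to.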
18. Unsupervised learning Unsupervised learning, in contrast to supervised learning, does not provide the network with target output values. This isn't strictly true, as often (and for the cases discussed in this section) the output is identical to the input. Unsupervised learning usually performs a mapping from input to output space, data compression or clustering.
19. Reinforcement learning In reinforcement learning, data x are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.
20. Reinforcement learning The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm.
21. Neural Network “Learning Rules”: Successful learning in any neural network is dependent on how the connections between the neurons are allowed to change in response to activity. The manner of change is what the majority of researchers call "a learning rule". However, we will call it a "synaptic modification rule" because although the network learned the sequence, it is not clear that the *connections* between the neurons in the network "learned" anything in particular.
22. Mathematical synaptic Modification rule There are many categories of mathematical synaptic modification rule which are used to describe how synaptic strengths should be changed in a neural network. Some of these categories include: backpropagation of error, correlative Hebbian, and temporally-asymmetric Hebbian.
23. Mathematical synaptic modification rule Backpropagation of error states that connection strengths should change throughout the entire network in order to minimize the difference between the actual activity and the "desired" activity at the "output" layer of the network.
24. Mathematical synaptic Modification rule Correlative Hebbian states that any two interconnected neurons that are active at the same time should strengthen their connections, so that if one of the neurons is activated again in the future the other is more likely to become activated too.
25. Mathematical synaptic Modification rule Temporally-asymmetric Hebbian is described in more detail in the example below, but essentially emphasizes the importance of causality: if a neuron reliably fires before another, its connection to the other neuron should be strengthened. Otherwise, it should be weakened.
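The two Hebbian rules can be written as one-line weight updates. This is a minimal sketch rather than a model of any particular network: the function names, the learning rate lr and the use of single firing times t_pre and t_post are assumptions made for illustration, and the fixed-size step in the temporally-asymmetric rule is the simplest possible choice.

```python
def correlative_hebbian(w, pre, post, lr=0.1):
    """Strengthen the connection in proportion to joint activity of both neurons."""
    return w + lr * pre * post


def temporally_asymmetric_hebbian(w, t_pre, t_post, lr=0.1):
    """Strengthen if the first neuron fired before the second, otherwise weaken."""
    if t_pre < t_post:
        return w + lr
    return w - lr


w_both_active = correlative_hebbian(0.5, pre=1.0, post=1.0)             # grows
w_one_silent = correlative_hebbian(0.5, pre=1.0, post=0.0)              # unchanged
w_causal = temporally_asymmetric_hebbian(0.5, t_pre=1.0, t_post=2.0)    # grows
w_acausal = temporally_asymmetric_hebbian(0.5, t_pre=2.0, t_post=1.0)   # shrinks
```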
27. The Delta Rule A generalized form of the delta rule, developed by D.E. Rumelhart, G.E. Hinton, and R.J. Williams, is needed for networks with hidden layers. They showed that this method works for the class of semilinear activation functions (non-decreasing and differentiable). Generalizing the ideas of the delta rule, consider a hierarchical network with an input layer, an output layer and a number of hidden layers.
28. The Delta Rule We will consider only the case where there is one hidden layer. The network is presented with input signals which produce output signals that act as input to the middle layer. Output signals from the middle layer in turn act as input to the output layer to produce the final output vector. This vector is compared to the desired output vector. Since both the output and the desired output vectors are known, the delta rule can be used to adjust the weights in the output layer.
29. The Delta Rule Can the delta rule be applied to the middle layer? Both the input signal to each unit of the middle layer and the output signal are known. What is not known is the error generated from the output of the middle layer since we do not know the desired output. To get this error, backpropagate through the middle layer to the units that are responsible for generating that output. The error generated from the middle layer could be used with the delta rule to adjust the weights.
30. The Pattern Associator A pattern associator learns associations between input patterns and output patterns. One of the most appealing characteristics of such a network is the fact that it can generalize what it learns about one pattern to other similar input patterns. Pattern associators have been widely used in distributed memory modeling.
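The procedure for a single hidden layer can be sketched as follows. This is an illustrative sketch under stated assumptions, not the formulation from the original paper: it assumes logistic (sigmoid) units, a squared-error measure, no bias terms, and hypothetical names (forward, train_step, W_hidden, W_out); the starting weights and the training pattern are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W_hidden, W_out):
    """Return (middle-layer outputs, final outputs) for one input vector."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in W_out]
    return h, y


def train_step(x, target, W_hidden, W_out, lr=0.5):
    """One generalized-delta-rule update (in place); returns the error before it."""
    h, y = forward(x, W_hidden, W_out)
    # Output layer: the desired output is known, so delta = error * slope.
    delta_out = [(t - yk) * yk * (1.0 - yk) for t, yk in zip(target, y)]
    # Middle layer: its error is obtained by propagating the output deltas
    # back through the output weights (computed before those weights change).
    delta_hidden = [
        hj * (1.0 - hj) * sum(delta_out[k] * W_out[k][j] for k in range(len(W_out)))
        for j, hj in enumerate(h)
    ]
    for k, row in enumerate(W_out):
        for j in range(len(row)):
            row[j] += lr * delta_out[k] * h[j]
    for j, row in enumerate(W_hidden):
        for i in range(len(row)):
            row[i] += lr * delta_hidden[j] * x[i]
    return 0.5 * sum((t - yk) ** 2 for t, yk in zip(target, y))


W_hidden = [[0.1, -0.2], [0.3, 0.4]]  # arbitrary starting weights
W_out = [[0.2, -0.1]]
errors = [train_step([1.0, 0.5], [1.0], W_hidden, W_out) for _ in range(200)]
```

Repeating the step on the same pattern drives the recorded error down, which is the behaviour the delta rule is designed to produce.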
31. The Pattern Associator The pattern associator is one of the more basic two-layer networks. Its architecture consists of two sets of units, the input units and the output units. Each input unit connects to each output unit via weighted connections. Connections are only allowed from input units to output units.
32. The Pattern Associator The effect of a unit u_i in the input layer on a unit u_j in the output layer is determined by the product of the activation a_i of u_i and the weight w_ij of the connection from u_i to u_j. The activation of a unit u_j in the output layer is given by: a_j = SUM_i(w_ij * a_i).
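The output activation of a pattern associator is a plain weighted sum, as the following sketch shows. The function name, the weight values and the input pattern are illustrative assumptions; weights[i][j] stands for the weight of the connection from input unit i to output unit j.

```python
def associator_output(weights, input_activations):
    """Activation of each output unit j: sum over i of weights[i][j] * a_i."""
    n_out = len(weights[0])
    return [
        sum(weights[i][j] * a_i for i, a_i in enumerate(input_activations))
        for j in range(n_out)
    ]


# Three input units fully connected to two output units (illustrative values).
weights = [[0.5, -1.0],
           [0.25, 2.0],
           [1.0, 0.0]]
outputs = associator_output(weights, [1.0, 0.0, 1.0])  # [1.5, -1.0]
```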
33. Adaptive Resonance Theory (ART); Discrete Bidirectional Associative Memory; Kohonen Self Organization Map; Counter Propagation Network (CPN); Perceptron Vector Representation; ADALINE (Adaptive Linear Neuron or later Adaptive Linear Element); Madaline (Multiple Adaline); Backpropagation, or propagation of error
34. Adaptive Resonance Theory (ART) Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.
36. Kohonen Self Organization Map The self-organizing map (SOM) invented by Teuvo Kohonen performs a form of unsupervised learning. A set of artificial neurons learn to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.
37. Kohonen Self Organization Map If an input space is to be processed by a neural network, the first issue of importance is the structure of this space. A neural network with real inputs computes a function f defined from an input space A to an output space B. The region where f is defined can be covered by a Kohonen network in such a way that when, for example, an input vector is selected from the region a1, only one unit in the network fires. Such a tiling, in which input space is classified in subregions, is also called a chart or map of input space. Kohonen networks learn to create maps of the input space in a self-organizing way.
38. Kohonen Self Organization Map - Advantages Probably the best thing about SOMs is that they are very easy to understand. It's very simple: if two regions are close together and there is grey connecting them, then they are similar. If there is a black ravine between them, then they are different. Unlike Multidimensional Scaling or N-land, people can quickly pick up on how to use them in an effective manner. Another great thing is that they work very well. As I have shown you, they classify data well and are then easily evaluated for their own quality, so you can actually calculate how good a map is and how strong the similarities between objects are.
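The idea that only one unit fires for a given input, and that the map organizes itself, can be sketched with a single update step. This is a simplified illustration, not Kohonen's full algorithm: it assumes a one-dimensional chain of units, squared Euclidean distance, a fixed learning rate and a hard neighbourhood radius, and all names and values are hypothetical.

```python
def best_matching_unit(weights, x):
    """Index of the single unit that 'fires': the one whose weights are closest to x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def som_step(weights, x, lr=0.5, radius=1):
    """Move the winner and its neighbours on a 1-D chain of units towards x."""
    win = best_matching_unit(weights, x)
    for j, w in enumerate(weights):
        if abs(j - win) <= radius:
            weights[j] = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
    return win


weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # three units, 2-D input space
winner = som_step(weights, [0.9, 1.0], radius=0)  # unit 2 wins and moves towards x
```

With radius greater than zero the neighbours of the winner are pulled along as well, which is what lets nearby units end up covering nearby regions of the input space.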
40. Perceptron The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier. The perceptron is a binary classifier that maps its input x (a real-valued vector) to an output value f(x) (a single binary value): f(x) = 1 if w · x + b > 0, and f(x) = 0 otherwise, where w is a vector of real-valued weights and w · x is the dot product (which computes a weighted sum). b is the 'bias', a constant term that does not depend on any input value.
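A perceptron's output function fits in a few lines. The sketch below assumes the usual threshold form (output 1 when the weighted sum plus bias is positive, 0 otherwise); the function name and the hand-picked weights, which realise a logical AND of two binary inputs, are illustrative assumptions and are not learned.

```python
def perceptron(x, w, b):
    """Binary output: 1 if the weighted sum w . x plus the bias b is positive, else 0."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else 0


# Hand-picked weights and bias implementing AND (assumed values).
w, b = [1.0, 1.0], -1.5
and_outputs = [perceptron(x, w, b) for x in ([0, 0], [0, 1], [1, 0], [1, 1])]
# and_outputs == [0, 0, 0, 1]
```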
41. ADALINE Definition Adaline is a single layer neural network with multiple nodes where each node accepts multiple inputs and generates one output. Given the following variables: x is the input vector, w is the weight vector, n is the number of inputs, θ is some constant, and y is the output, we find that the output is y = SUM_{j=1..n}(x_j * w_j) + θ. If we further assume that x_{n+1} = 1 and w_{n+1} = θ, then the output reduces to the dot product of x and w: y = x · w.
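The step that absorbs the constant θ into the weight vector can be verified directly. In this sketch the function names and the numeric values are assumptions; the point is only that appending a constant input of 1 and a weight equal to θ leaves the output unchanged.

```python
def adaline_output(x, w, theta):
    """Weighted sum of the n inputs plus the constant theta."""
    return sum(xj * wj for xj, wj in zip(x, w)) + theta


def adaline_output_augmented(x, w, theta):
    """Same output as a pure dot product, after appending x_{n+1} = 1 and w_{n+1} = theta."""
    x_aug = list(x) + [1.0]
    w_aug = list(w) + [theta]
    return sum(xj * wj for xj, wj in zip(x_aug, w_aug))


x, w, theta = [1.0, 2.0], [0.5, -0.25], 0.75
y_plain = adaline_output(x, w, theta)                # 0.75
y_augmented = adaline_output_augmented(x, w, theta)  # 0.75
```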
42. Madaline Madaline (Multiple Adaline) is a two layer neural network with a set of ADALINEs in parallel as its input layer and a single PE (processing element) in its output layer. For problems with multiple input variables and one output, each input is applied to one Adaline. For similar problems with multiple outputs, madalines in parallel can be used. The madaline network is useful for problems which involve prediction based on multiple inputs, such as weather forecasting (Input variables: barometric pressure, difference in pressure. Output variables: rain, cloudy, sunny).
43. Backpropagation Backpropagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969,[1][2] but it wasn't until 1986, through the work of David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition, and it led to a “renaissance” in the field of artificial neural network research. It is a supervised learning method, and is an implementation of the Delta rule. It requires a teacher that knows, or can calculate, the desired output for any given input. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backwards propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") is differentiable.
46. Counter propagation network (CPN) (§ 5.3) Basic idea of CPN. Purpose: fast and coarse approximation of a vector mapping. The aim is not to map any given x to its target output with a given precision; instead, input vectors x are divided into clusters/classes, and each cluster of x has one output y, which is (hopefully) the average of the target outputs for all x in that class. Architecture, simple case (forward-only CPN): an input layer x_1, ..., x_n, a hidden (class) layer z_1, ..., z_p and an output layer y_1, ..., y_m, with weights w_{k,i} from input x_i to hidden (class) unit z_k and weights v_{j,k} from hidden (class) unit z_k to output y_j.
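The forward-only idea, one output per class equal to the average target of that class, can be sketched as below. Several simplifying assumptions are made: the class prototypes (the input-to-hidden weights) are taken as already given rather than learned competitively, classes are chosen by squared Euclidean distance, and the output weights are computed in a single batch average instead of incrementally. All names and data are hypothetical.

```python
def nearest_class(prototypes, x):
    """Winner-take-all hidden layer: index of the prototype closest to x."""
    dists = [sum((p - xi) ** 2 for p, xi in zip(proto, x)) for proto in prototypes]
    return dists.index(min(dists))


def fit_output_weights(prototypes, samples):
    """For each class, average the targets of the samples assigned to it."""
    n_out = len(samples[0][1])
    sums = [[0.0] * n_out for _ in prototypes]
    counts = [0] * len(prototypes)
    for x, target in samples:
        k = nearest_class(prototypes, x)
        counts[k] += 1
        for j, t in enumerate(target):
            sums[k][j] += t
    # Classes that received no samples keep an all-zero output.
    return [[s / c for s in row] if c else row for row, c in zip(sums, counts)]


def cpn_predict(prototypes, v, x):
    """Coarse approximation: return the stored output of the winning class."""
    return v[nearest_class(prototypes, x)]


prototypes = [[0.0, 0.0], [1.0, 1.0]]  # assumed, already-trained class centres
samples = [([0.1, 0.0], [1.0]), ([0.0, 0.2], [3.0]), ([0.9, 1.0], [10.0])]
v = fit_output_weights(prototypes, samples)         # [[2.0], [10.0]]
prediction = cpn_predict(prototypes, v, [0.2, 0.1])  # [2.0]
```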
49. Adaptive Resonance Theory Adaptive Resonance Theory (ART) was developed by Grossberg (1976). Input vectors which are close to each other according to a specific similarity measure should be mapped to the same cluster. ART adapts itself by storing input patterns, and tries to best match the input pattern.
50. Adaptive Resonance Theory 1 (ART 1) ART 1 is a binary classification model. Various other versions of the model have evolved from ART 1; pointers to these can be found in the bibliographic remarks. The main network comprises the layers F1 and F2 and the attentional gain control, which together form the attentional subsystem. The attentional vigilance node forms the orienting subsystem.
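51. ART 1: Architecture [Figure: ART 1 architecture, showing the attentional subsystem and the orienting subsystem, with layers F1 and F2, nodes G and A, input I, and connections marked as excitatory (+) or inhibitory (-).]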
60. To achieve these, we need: a mechanism for testing and determining the (dis)similarity between x and a stored class pattern; a control for finding or creating new class nodes; and all operations implemented by units of local computation. Only the basic ideas are presented here, simplified from the original ART model. Some of the control mechanisms realized by various specialized neurons are carried out by logic statements of the algorithm.
62. Working of ART1 There are 3 phases after each input vector x is applied. Recognition phase: determine the winner cluster for x using the bottom-up weights b. The winner is the node j* with the maximum y_{j*} = b_{j*} · x, and x is tentatively classified to cluster j*. The winner may still be far away from x (e.g., |t_{j*} - x| is unacceptably large).
63. Working of ART1 (3 phases) Comparison phase: compute the similarity using the top-down weights t, via the vector s whose 1's are those common to t_{j*} and x. If (# of 1's in s)/(# of 1's in x) > ρ, accept the classification and update b_{j*} and t_{j*}; else remove j* from further consideration and look for another potential winner, or create a new node with x as its first pattern.
64. Weight update/adaptive phase Initial weights (no bias) are set for the bottom-up weights b and the top-down weights t. When a resonance occurs with node j*, b_{j*} and t_{j*} are updated. If k sample patterns are clustered to node j, then t_j is the pattern whose 1's are common to all these k samples.
68. Notes Classification is a search process. No two classes have the same b and t. Outliers that do not belong to any cluster will be assigned separate nodes. Different orderings of sample input presentations may result in different classifications. An increase of ρ increases the number of classes learned and decreases the average class size. Classification may shift during the search, but will reach stability eventually. There are different versions of ART1 with minor variations. ART2 is the same in spirit but different in details.
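The recognition, comparison and weight-update phases can be put together in a short procedural sketch. This is a simplification in the spirit of the slides, with the control neurons replaced by ordinary logic statements. Two things are assumptions rather than content from the slides: the bottom-up weights are derived from the top-down pattern with the normalisation 2 t / (1 + |t|), and a brand-new node simply stores x as its pattern. The function name and the test patterns are hypothetical.

```python
def art1_cluster(patterns, rho):
    """Cluster binary patterns; returns (cluster label per pattern, top-down patterns)."""
    prototypes = []  # top-down weights t_j, one binary list per cluster node
    labels = []
    for x in patterns:
        ones_in_x = sum(x)
        if ones_in_x == 0:
            raise ValueError("each input pattern needs at least one 1")

        def bottom_up(j):
            # Recognition: y_j = b_j . x, with b_j = 2 t_j / (1 + |t_j|) (assumed form).
            t = prototypes[j]
            return 2.0 * sum(ti * xi for ti, xi in zip(t, x)) / (1.0 + sum(t))

        chosen = None
        # Search the nodes from the strongest bottom-up response downwards.
        for j in sorted(range(len(prototypes)), key=bottom_up, reverse=True):
            s = [ti & xi for ti, xi in zip(prototypes[j], x)]  # 1's common to t_j and x
            if sum(s) / ones_in_x > rho:   # comparison (vigilance) test
                prototypes[j] = s          # resonance: keep only the common 1's
                chosen = j
                break
        if chosen is None:                 # no node passed: create a new class node
            prototypes.append(list(x))
            chosen = len(prototypes) - 1
        labels.append(chosen)
    return labels, prototypes


patterns = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
labels_low, _ = art1_cluster(patterns, rho=0.6)   # [0, 0, 1]
labels_high, _ = art1_cluster(patterns, rho=0.9)  # [0, 1, 2]
```

Raising ρ from 0.6 to 0.9 splits the first two patterns into separate classes, in line with the note that a larger ρ yields more, smaller classes.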
70. Cluster units: competitive; they receive the input vector x through the weights b to determine the winner j. Input units: placeholders for external inputs. Interface units: pass s to x as the input vector for classification and compare x with the top-down weights; they are controlled by gain control unit G1. The three phases need to be sequenced (by control units G1, G2, and R).
71. R = 0: resonance occurs, update b_J and t_J. R = 1: fails the similarity test, inhibits J from further computation.
77. Fuzzy ART Layer 1 consists of neurons that are connected to the neurons in Layer 2 through weight vectors. The number of neurons in Layer 1 depends on the characteristics of the input data. The neurons in Layer 2 represent clusters.
80. Fuzzy ART FMEA FMEA values are evaluated separately with severity, detection and occurrence values. The aim is to apply the Fuzzy ART algorithm to the FMEA method and, by performing FMEA on test problems, to investigate the most favorable parameter combinations (α, β and ρ).
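To make the roles of α, β and ρ concrete, here is a compact sketch of a Fuzzy ART clustering loop. It follows the commonly published formulation (complement coding, a choice function with parameter α, a vigilance test with ρ, and a learning rate β), which is an assumption here since the slides do not spell the algorithm out. The function name, default parameter values and toy data are hypothetical, and inputs are assumed to be scaled to [0, 1].

```python
def fuzzy_art(samples, alpha=0.001, beta=1.0, rho=0.8):
    """Cluster vectors with components in [0, 1]; returns (labels, category weights)."""
    weights = []  # one weight vector per Layer 2 (cluster) neuron
    labels = []
    for a in samples:
        i_vec = list(a) + [1.0 - v for v in a]  # complement coding of the input
        norm_i = sum(i_vec)

        def choice(j):
            # Choice function: |I ^ w_j| / (alpha + |w_j|), with ^ the component-wise minimum.
            w = weights[j]
            return sum(min(p, q) for p, q in zip(i_vec, w)) / (alpha + sum(w))

        chosen = None
        for j in sorted(range(len(weights)), key=choice, reverse=True):
            overlap = [min(p, q) for p, q in zip(i_vec, weights[j])]
            if sum(overlap) / norm_i >= rho:  # vigilance test
                weights[j] = [beta * o + (1.0 - beta) * w
                              for o, w in zip(overlap, weights[j])]
                chosen = j
                break
        if chosen is None:                    # no category is similar enough
            weights.append(i_vec)
            chosen = len(weights) - 1
        labels.append(chosen)
    return labels, weights


data = [[0.1], [0.15], [0.9]]                # toy one-dimensional inputs
labels_loose, _ = fuzzy_art(data, rho=0.8)   # [0, 0, 1]
labels_strict, _ = fuzzy_art(data, rho=0.97)  # [0, 1, 2]
```

On this toy data a higher vigilance ρ produces more clusters; the sketch is not meant to reproduce the FMEA experiments.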
86. Neurophysiological Evidence for ART Mechanisms The attentional subsystem of an ART network has been used to model aspects of the inferotemporal cortex. The orienting subsystem has been used to model a part of the hippocampal system, which is known to contribute to memory functions. The feedback prevalent in an ART network can help focus attention in models of visual object recognition.
88. Ehrenstein Pattern Explained by ART The bright disc disappears when the alignment of the dark lines is disturbed! The pattern generates a circular illusory contour: a circular disc of enhanced brightness.
89. Other Neurophysiological Evidence Adam Sillito [University College, London]: cortical feedback in a cat tunes cells in its LGN to respond best to lines of a specific length. Chris Redie [MPI Entwicklungsbiologie, Germany]: found that some visual cells in a cat's LGN and cortex respond best at line ends, that is, more strongly to line ends than line sides. Sillito et al. [University College, London]: provide neurophysiological data suggesting that the cortico-geniculate feedback closely resembles the matching and resonance of an ART network. Cortical feedback has been found to change the output of specific LGN cells, increasing the gain of the input for feature linked events that are detected by the cortex.
90. Computational Experiment A non-binary dataset of FMEA is used to evaluate the performance of the Fuzzy ART neural network on different test problems.
91. Computational Experiment For a comprehensive analysis of the effects of parameters on the performance of Fuzzy ART in the FMEA case, a number of levels of parameters are considered.
92. Computational Experiment The Fuzzy ART neural network method is applied to determine the most favorable parameter (α, β and ρ) combinations during application of FMEA on test problems.
93. Results For any test problem, 900 solutions are obtained. The β-ρ interactions for parameter combinations are considered where solutions are obtained. For each test problem, all the combinations are evaluated and frequency distributions of clusters are constructed.
95. Results For example, for test problem 1, four groups which make up 70% of the combinations are selected; cluster numbers that contain at least 80% of all the combinations are determined according to the results of Pareto analysis. These are groups 2, 3 and 4.
96. Results Parameter combinations, β-ρ interactions and the number of α parameters in any combination of β and ρ are shown at the side. Favorable solutions are marked in bold and italic.
97. Results The number of clusters increases with the increase in ρ. The number of clusters increases with the increase in β. Clustering of the data in most problems depends on the interaction between the β and ρ parameters. The α parameter has no effect on the solution in small scale problems, but in large scale problems the effect of α becomes irregular. Also, with the increase in problem scale, the change in the number of clusters is defined.
98. Results In FMEA test problems, which determine the most favorable parameter combinations, β-ρ interactions providing appropriate cluster numbers are noted on the summary table that evaluates each test problem separately. The values involving favorable β-ρ combinations are marked with the blue area. This is a suitable solution area for the FMEA problem.
99. ART 1: Clustering Application Clustering pixel based alphabet images
100. Conclusion and Discussion The Fuzzy ART neural network is applied to FMEA. Appropriate parameter intervals are investigated for giving successful results of Fuzzy ART in FMEA problems. The investigations show that if the input number is smaller than or equal to 30, the FMEA problem is defined as small scale; otherwise it is large scale. We suggest that cluster numbers should be set between 2 and 6 for small scale problems in practical studies. Cluster numbers for large scale problems should be at most 12 in practical studies.
101. Conclusion and Discussion Determinations about α: In small scale problems, α increases the cluster number only if β is greater than or equal to 0.8. In other conditions, it is observed that α values have no effect on the solution. In large scale problems, an appropriate interval cannot be determined because the effect of α becomes irregular. Determinations about β: For both small and large scale problems, the number of clusters increases with the increase in β. Determinations about ρ: For both small and large scale problems, the number of clusters increases with the increase in ρ.
102. Conclusion and Discussion For small and large scale problems in FMEA, the Fuzzy ART algorithm is fast, effective and easy to implement. Parameter combinations are acquired where the best solution is obtained for non-binary problems.
104. References : Carpenter, G.A. and Grossberg, S. (1987a), "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing 37, 54-115. Carpenter, G.A. and Grossberg, S. (1987b), "ART 2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics 26, 4919-4930. Carpenter, G.A. and Grossberg, S. (1990), "ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks 3, 129-152. Carpenter, G.A., Grossberg, S. and Reynolds, J.H. (1991a), "ARTMAP: Supervised real-time learning and classification of non-stationary data by a self-organizing neural network", Neural Networks 4, 565-588.