
my IEEE


  1. 1. Comparison of Genetic Algorithm Optimization on Artificial Neural Network and Support Vector Machine in Intrusion Detection System
Amin Dastanpour*, Advanced Informatics School, Universiti Teknologi Malaysia, Kuala Lumpur, amindastanpoure@gmail.com
Suhaimi Ibrahim, Advanced Informatics School, Universiti Teknologi Malaysia, Kuala Lumpur, suhaimiibrahim@utm.my
Reza Mashinchi, Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia, r_mashinchi@yahoo.com
Ali Selamat, Faculty of Computer Science & IS, Universiti Teknologi Malaysia, Johor, asalamat@utm.my
Abstract: As technology in recent years has come to rely on network-based systems, it is crucial to protect these systems from threats. In this study, the following methods are applied to detect network attacks: the support vector machine (SVM) classifier, the artificial neural network (ANN), and the genetic algorithm (GA). The objective of this study is to compare the outcomes of GA with SVM and GA with ANN, and then to compare both with other algorithms. The Knowledge Discovery and Data Mining (KDD CUP 99) data set has been used in this paper to obtain the results.
Keywords: Genetic algorithm (GA); Artificial Neural Network (ANN); Support Vector Machine (SVM); intrusion detection; machine learning
I. INTRODUCTION
Today, the internet is the most common communication tool for people. As a result, everyone expects a secure channel or network for their communication. In recent years, numerous research studies have addressed secure communication in order to ensure the safety of stored and transmitted data. The intrusion detection system (IDS) is one of the tools that administrators apply to protect networks against unknown activities [1]. The limitation of such a system is that it can only detect previously known attacks, so the attack signatures must be updated frequently.
In addition, too many attributes need to be considered, network traffic is very high, and the distribution of the data is highly imbalanced. Therefore, the challenge is distinguishing normal behavior from abnormal behavior. Different artificial intelligence approaches have been used to overcome this particular problem [2].
The goal of machine learning is to improve machine performance by discovering, learning, and then adapting to situations that are likely to change over time. In the intrusion detection field, machine learning algorithms take reference input in order to learn the patterns of the attacks. The algorithms are then deployed on unknown input to perform the actual detection. In addition to recognizing new attack patterns, these algorithms can also clean the dataset of irrelevant and redundant features. As a result, only a few key features remain in the dataset and the detection process is optimized [3], [4]. Some of the known machine learning approaches are the artificial neural network (ANN), the support vector machine (SVM), and the genetic algorithm (GA).
The artificial neural network (ANN) is one of the most popular machine learning techniques, and it has been applied to classification and regression problems. ANNs are capable of learning and of recognizing patterns. For example, in neural recognition systems, the features of the input data may activate a set of input neurons that represent an attack or a normal activity [5]. ANNs have a number of advantages, of which the best known is their capability to learn from observed data. Used in this way, the ANN acts as an approximation tool for arbitrary functions [6]. An ANN has three layers of interconnected neurons. The first layer consists of the input neurons, which send the data to the second layer. The second layer sends its results to the neurons of the third layer, which is the output layer [7].
2014 IEEE Conference on Open Systems (ICOS), October 26-28, 2014, Subang, Malaysia. 978-1-4799-6367-6/14/$31.00 ©2014 IEEE
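The three-layer structure described above (input, hidden, output) can be illustrated with a minimal forward pass. This is only a sketch under stated assumptions: the sigmoid activation, the layer sizes, the random weights, and the 0.5 decision threshold are all hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes sigmoid(w . x + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> single output neuron."""
    hidden = layer(features, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)[0]

# Hypothetical sizes and untrained random weights, for illustration only.
rng = random.Random(0)
n_inputs, n_hidden = 4, 3
hidden_w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_b = [0.0] * n_hidden
out_w = [[rng.uniform(-1, 1) for _ in range(n_hidden)]]
out_b = [0.0]

score = forward([0.2, 0.0, 1.0, 0.5], hidden_w, hidden_b, out_w, out_b)
label = "attack" if score >= 0.5 else "normal"  # assumed threshold
print(score, label)
```

In a real detector the weights would be learned from the training data rather than drawn at random; the sketch only shows how data flows through the three layers.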
  2. 2. The ANN is applicable to the classification and recognition of data. However, for recognition and classification, the ANN needs a large dataset. To optimize this type of data and to generate a feature pattern, the ANN needs a special system or algorithm. This study proposes applying the GA to improve the ANN mechanism and to overcome this problem [8].
The support vector machine (SVM) is another of the most popular machine learning methods, and it has been applied to regression and classification problems. The SVM takes a set of input data and predicts, for each given input, which of two possible classes it belongs to, which makes it a binary linear classifier. It is trained on a set of examples, each marked as belonging to one of two categories (attack or normal). In attack detection, the SVM is responsible for predicting whether new data falls into the normal category or the attack category [9]. The SVM is helpful in recognizing and classifying data. However, for classification and recognition, the SVM requires a large data set [10]. To optimize this type of data and to generate a feature pattern, the SVM also requires a special algorithm or system. This study proposes applying the GA to improve the mechanism of the SVM and to overcome this problem [11].
The GA is one of the most widely used and most popular machine learning algorithms. It is an adaptive, exploratory search algorithm based on the evolutionary ideas of natural genetics [12]. The GA generates an initial population of individuals and evolves it toward individuals of high quality. Each of these individuals represents a solution to the problem [13].
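The binary decision made by a linear SVM, as described in the SVM paragraph above, comes down to the sign of a weighted sum. The following is a minimal sketch of that decision rule only, not of SVM training; the weight vector `w`, the bias `b`, and the two-feature inputs are hypothetical values standing in for what a trained model would supply.

```python
def svm_predict(x, w, b):
    """Linear SVM decision rule: the sign of w . x + b selects the class."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "attack" if score >= 0 else "normal"

# Hypothetical parameters, for illustration only.
w = [1.5, -2.0]
b = -0.25

print(svm_predict([1.0, 0.2], w, b))  # score = 1.5 - 0.4 - 0.25 = 0.85
print(svm_predict([0.1, 0.9], w, b))  # score = 0.15 - 1.8 - 0.25 = -1.9
```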
The GA is known as a parallel algorithm that can find a solution for a problem with many subsets, which makes it a suitable algorithm for IDS. It is capable of searching for solutions in various problem subsets simultaneously. Moreover, the GA requires no mathematical derivation and is capable of reaching proper solution sets for the problems. In addition, the GA can return a single solution whose value is optimal. The GA is also capable of distinguishing new data or attacks from previous ones, and it is considered a suitable method for intrusion detection systems, particularly for detecting attacks that are based upon human behavior [14].
In the machine learning field, the process of selecting a subset of relevant features for building a solution model is known as feature selection. When feature selection is used, the assumption is that the data contains redundant and irrelevant information. To overcome this problem, researchers use a feature selection algorithm to select the relevant and useful information [15].
II. RELATED WORK
In previous studies, researchers have tried to solve this problem by using different methods such as LCFS, FFSA and MMIFS [16], fuzzy rule based classifiers [17], SVM classification with GA optimization [18], ANN classification with GA optimization [19], and the four-angle-star approach [20]. Table 1 summarizes these methods.
TABLE I. PREVIOUS WORK
Author | Method | Objective
Bin Luo et al. | Four-angle-star based visualized feature generation approach (FASVFG) | Evaluate the distance between samples in a 5-class classification problem
Abraham et al. | Fuzzy rule based classifiers | Framework for Distributed Intrusion Detection Systems (DIDS)
Amiri et al. | Forward feature selection algorithm (FFSA); linear correlation feature selection (LCFS); modified mutual information feature selection (MMIFS) | Propose a feature selection phase that can be generally implemented on any intrusion detection system
Li et al. | Ant colony algorithm and support vector machine (SVM) | Propose a desirable IDS model with high efficiency and accuracy
Dastanpour et al. | Feature selection based on the genetic algorithm and support vector machine | Improve the detection rate with a smaller number of features
Dastanpour et al. | Genetic algorithm (GA) with an artificial neural network classifier to detect attacks in the network | Increase accuracy with the optimal number of features
  3. 3. III. DATA ANALYSIS
In this study, the Knowledge Discovery and Data Mining (KDD CUP 1999) data set has been used. This dataset was chosen for its comprehensiveness, and it is widely used to evaluate IDS performance. There are 22 attack types included in this dataset [21], and they can be classified into 4 groups [22]: probing, U2R, R2L, and DOS, with the following details [23]:
Probing: surveillance and other probing. This attack type scans the network to collect data about the targeted host.
U2R: unauthorized access to the privileges of the root (local super user). In this attack type, the attacker accesses the system and exploits vulnerabilities to gain root permissions.
R2L: unauthorized access from a remote machine. In this attack type, packets are sent over the network to gain access as a known local user.
DOS: denial of service. This attack type exhausts computing resources and memory so that the system cannot serve legitimate users.
IV. METHODOLOGY
Fig. 1 illustrates the main idea of the study and the entire method. First, the dataset is divided randomly into 2 groups, the training set and the testing set. In the training phase, the first task of the machine learning is learning and selecting the most suitable features. In the testing phase, the learned model and the features selected in the training phase are tested, and the data is categorized into two groups, attacks and normal data. In the machine learning process, the SVM and ANN receive the data, and the system uses both of them to classify the training data. The SVM and ANN are then ready to be applied to the testing set of the system.
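The data preparation described above (four attack groups, a normal/attack decision, and a random split into training and testing sets) can be sketched as follows. The attack names are the 22 labels found in the public KDD Cup 1999 training data, not a list given in this paper, and the 70/30 split ratio, the seed, and the function names are assumptions made for illustration.

```python
import random

# The 22 attack labels of the KDD Cup 1999 training data, by category.
ATTACK_GROUPS = {
    "DOS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probing": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}
GROUP_OF = {name: group for group, names in ATTACK_GROUPS.items() for name in names}

def attack_group(raw_label):
    """Map a raw label such as 'smurf.' to its category, or 'normal'/'unknown'."""
    name = raw_label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    return GROUP_OF.get(name, "unknown")

def binary_label(raw_label):
    """Collapse every attack type into a single 'attack' class."""
    return "normal" if attack_group(raw_label) == "normal" else "attack"

def split_dataset(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

labels = ["normal.", "smurf.", "nmap.", "rootkit.", "normal.",
          "neptune.", "guess_passwd.", "normal.", "back.", "satan."]
train, test = split_dataset(labels)
print(len(train), len(test))
print([binary_label(label) for label in test])
```

A fixed seed keeps the split reproducible between runs, which makes it easier to compare classifiers on the same testing set.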
Once each classifier has classified the testing data, the detection result is passed to the GA, which optimizes each algorithm to achieve higher recognition [24]. In other words, when the classification by SVM and ANN is finished, the GA improves the classification of each algorithm to achieve a high detection rate.
FIGURE 1. OVERALL METHOD OF THIS PAPER
The GA is a global optimization search method that simulates the behavior and process of natural evolution. Each possible solution is encoded as a vector known as a chromosome, and each element of the vector represents a gene. The whole set of chromosomes forms a population, and the population is evaluated by the fitness function [25]. A fitness value is used to measure the fitness of each chromosome. The initial population of the genetic process is generated randomly. The GA applies three operators to create the next generation from the current generation: mutation, crossover, and reproduction. The GA discards the chromosomes with lower fitness and preserves the chromosomes with high fitness [26]. The whole process is repeated, so that each successive generation receives more chromosomes with high fitness. This continues until a suitable individual chromosome is found [27]. In this way, the GA turns an initial set of individuals into individuals of high quality, each of which can serve as one solution. These individuals are the chromosomes, and they are composed of pre-determined genes [28].
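The GA loop described above (random initial population, fitness evaluation, reproduction, crossover, mutation) can be sketched for feature selection with one gene per feature. This is a hedged illustration, not the authors' implementation: the `fitness` function is a hypothetical stand-in for the detection rate a classifier would report, and the `INFORMATIVE` set, population size, mutation rate, and generation count are all assumed values.

```python
import random

N_FEATURES = 41  # one gene per KDD Cup 99 feature: 1 = keep, 0 = drop

# Hypothetical stand-in for classifier feedback: pretend these features matter.
INFORMATIVE = {0, 3, 5, 22, 31}

def fitness(chromosome):
    """Reward informative features and lightly penalize large subsets."""
    hits = sum(1 for i in INFORMATIVE if chromosome[i])
    return hits - 0.01 * sum(chromosome)

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, N_FEATURES)
    return a[:point] + b[point:]

def mutate(chromosome, rng, rate=0.02):
    """Flip each gene independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def evolve(generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]  # reproduction: keep the fitter half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
selected = [i for i, gene in enumerate(best) if gene]
print(len(selected), selected)
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. In the setting of this paper, the toy `fitness` would be replaced by the detection rate of the SVM or ANN trained on the selected features.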
  4. 4. V. EXPERIMENTAL RESULT
In this paper, the SVM and ANN are first used for the classification and recognition of the data into two groups: normal and attack. The genetic algorithm is then used to optimize the data recognized by the SVM and ANN. In this study, GA optimization means improving the classification of each method in terms of the percentage of correct classification and recognition. The GA and ANN results are shown in Fig. 2, and the GA and SVM results are shown in Fig. 3. Fig. 4 illustrates the effectiveness of the GA on the two classification methods. Table 2 compares the effect of the GA on SVM and ANN with the other algorithms applied in intrusion detection.
FIGURE 2. RESULT OF DETECTION RATE FOR ANN WITH GA (x-axis: Number of Feature, 1 to 41; y-axis: Detection Rate (%))
FIGURE 3. RESULT OF DETECTION RATE FOR SVM WITH GA (x-axis: Number of Feature, 1 to 41; y-axis: Detection Rate (%))
FIGURE 4. RESULT OF COMPARING ANN WITH GA AND SVM WITH GA (x-axis: Number of Feature, 1 to 41; y-axis: Detection Rate (%); series: GA-ANN, GA-SVM)
  5. 5. TABLE II. COMPARISON OF GA ON ANN AND SVM WITH OTHER ALGORITHMS
Name of algorithm | Detection rate | Number of features
LCFS | 100% | 21
FFSA | 100% | 31
MMIFS | 100% | 24
Fuzzy rule based | 100% | 41
FASVFG | 94% | 20
SVM with GA | 100% | 24
ANN with GA | 100% | 18
The comparison indicates that the GA with ANN results in better performance with a lower number of features. When the GA with SVM is compared with the GA with ANN, it can be seen that the effectiveness of the GA is higher on ANN than on SVM. Although the other algorithms can also achieve high detection rates, the GA with ANN reaches a high detection rate with a lower number of features.
VI. CONCLUSION
In this study, the GA has been proposed for producing the detection features. The SVM and ANN are then used as the classifiers of the detection system and compared with each other to show the effectiveness of the GA on these methods. The outcomes show that, in comparison with the other methods, the GA with ANN reaches the highest detection rate with the fewest features. A series of experiments was conducted on the KDD CUP 99 dataset for the detection of four categories of network attacks. The feature selection based upon the GA with ANN classification shows better detection rates in the proposed intrusion detection system. To detect the attacks efficiently, the GA with SVM requires 24 features and the GA with ANN needs 18 features to achieve a 100% detection rate. In future work, other classification methods are planned to be employed with the GA, and their effectiveness in network attack detection is planned to be explored.
VII. ACKNOWLEDGEMENT
This research is funded by the Research University grant of Universiti Teknologi Malaysia (UTM) under the Vot no. 08H28.
The authors would like to thank the Research Management Centre of UTM and the Malaysian Ministry of Education for their support and cooperation, including the students and other individuals who are either directly or indirectly involved in this project.
VIII. REFERENCES
[1] S. X. Wu and W. Banzhaf, "The use of computational intelligence in intrusion detection systems: A review," Applied Soft Computing, vol. 10, pp. 1-35, 2010.
[2] A. Simmonds, P. Sandilands, and L. Van Ekert, "An ontology for network security attacks," in Applied Computing, ed: Springer, 2004, pp. 317-323.
[3] A. Tamilarasan, S. Mukkamala, A. H. Sung, and K. Yendrapalli, "Feature ranking and selection for intrusion detection using artificial neural networks and statistical methods," in Neural Networks, 2006. IJCNN'06. International Joint Conference on, 2006, pp. 4754-4761.
[4] V. T. Goh, J. Zimmermann, and M. Looi, "Towards intrusion detection for encrypted networks," in Availability, Reliability and Security, 2009. ARES'09. International Conference on, 2009, pp. 540-545.
[5] M. S. Prasad, A. V. Babu, and M. K. B. Rao, "An Intrusion Detection System Architecture Based on Neural Networks and Genetic Algorithms," International Journal of Computer Science and Management Research, vol. 2, 2013.
[6] E. Corchado and Á. Herrero, "Neural visualization of network traffic data for intrusion detection," Applied Soft Computing, vol. 11, pp. 2042-2056, 2011.
[7] O. Linda, T. Vollmer, and M. Manic, "Neural network based intrusion detection system for critical infrastructures," in Neural Networks, 2009. IJCNN 2009. International Joint Conference on, 2009, pp. 1827-1834.
[8] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," Intelligent Engineering Systems through Artificial Neural Networks, vol. 12, pp. 579-584, 2002.
[9] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Neural Networks, 2002. IJCNN'02. Proceedings of the 2002 International Joint Conference on, 2002, pp. 1702-1707.
[10] T. Shon, J. Seo, and J. Moon, "SVM approach with a genetic algorithm for network intrusion detection," in Computer and Information Sciences-ISCIS 2005, ed: Springer, 2005, pp. 224-233.
[11] D. S. Kim and J. S. Park, "Network-based intrusion detection with support vector machines," in Information Networking, 2003, pp. 747-756.
[12] M. S. Hoque, M. Mukit, M. Bikas, and A. Naser, "An implementation of intrusion detection system using genetic algorithm," arXiv preprint arXiv:1204.1336, 2012.
[13] P. Gupta and S. K. Shinde, "Genetic algorithm technique used to detect intrusion detection," in Advances in Computing and Information Technology, ed: Springer, 2011, pp. 122-131.
[14] W. Li, "Using genetic algorithm for network intrusion detection," Proceedings of the United States Department of Energy Cyber Security Group, pp. 1-8, 2004.
[15] G. G. Helmer, J. S. Wong, V. Honavar, and L. Miller, "Intelligent agents for intrusion detection," in Information Technology Conference, 1998. IEEE, 1998, pp. 121-124.
[16] F. Amiri, M. Rezaei Yousefi, C. Lucas, A. Shakery, and N. Yazdani, "Mutual information-based feature selection for intrusion detection systems," Journal of Network and Computer Applications, vol. 34, pp. 1184-1199, 2011.
[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS: Distributed soft computing intrusion detection system," Journal of Network and Computer Applications, vol. 30, pp. 81-98, 2007.
[18] A. Dastanpour and R. A. R. Mahmood, "Feature Selection Based on Genetic Algorithm and Support Vector Machine for Intrusion Detection System," in The Second International Conference on Informatics Engineering & Information Science (ICIEIS2013), 2013, pp. 169-181.
  6. 6. [19] A. Dastanpour, S. Ibrahim, and R. Mashinchi, "Using Genetic Algorithm to Supporting Artificial Neural Network for Intrusion Detection System," in The International Conference on Computer Security and Digital Investigation (ComSec2014), 2014, pp. 1-13.
[20] B. Luo and J. Xia, "A novel intrusion detection system based on feature generation with visualization strategy," Expert Systems with Applications, 2014.
[21] M. K. Siddiqui and S. Naahid, "Analysis of KDD CUP 99 Dataset using Clustering based Data Mining," International Journal of Database Theory & Application, vol. 6, 2013.
[22] L. M. L. de Campos, R. C. L. de Oliveira, and M. Roisenberg, "Network Intrusion Detection System Using Data Mining," in Engineering Applications of Neural Networks, ed: Springer, 2012, pp. 104-113.
[23] I. Levin, "KDD-99 classifier learning contest: LLSoft's results overview," SIGKDD Explorations, vol. 1, pp. 67-75, 2000.
[24] S. Dhopte and M. Chaudhari, "Genetic Algorithm for Intrusion Detection System."
[25] Y.-X. Meng, "The practice on using machine learning for network anomaly intrusion detection," in Machine Learning and Cybernetics (ICMLC), 2011 International Conference on, 2011, pp. 576-581.
[26] H. Sarvari and M. M. Keikha, "Improving the accuracy of intrusion detection systems by using the combination of machine learning approaches," in Soft Computing and Pattern Recognition (SoCPaR), 2010 International Conference of, 2010, pp. 334-337.
[27] R. Sommer and V. Paxson, "Outside the closed world: On using machine learning for network intrusion detection," in Security and Privacy (SP), 2010 IEEE Symposium on, 2010, pp. 305-316.
[28] M. H. Mashinchi, M. R. Mashinchi, and S. M. H. Shamsuddin, "A Genetic Algorithm Approach for Solving Fuzzy Linear and Quadratic Equations," World Academy of Science, Engineering and Technology, vol. 28, 2007.