Anomaly Detection through Reinforcement Learning


Talk given at the Ottawa Artificial Intelligence and Machine Learning Meetup group on 29th Jan 2018.


Slide 1: Anomaly Detection through Reinforcement Learning
Dr. Hari Koduvely, Chief Data Scientist, ZIGHRA.COM
Slide 2: Outline of Talk
● Zighra and the SensifyID Platform
● Sequential Anomaly Detection Problem
● Introduction to Reinforcement Learning
● Markov Decision Process and Q-Learning
● Function Approximation using Neural Networks
● Application to the Network Intrusion Detection Problem
● Implementation using TensorFlow
Slide 3: ZIGHRA.COM
● Zighra (https://zighra.com) provides solutions for Continuous Behavioural Authentication & Threat Detection
● Highlights of our SensifyID Platform:
○ Core is an AI-based 6-layer Anomaly Detection System combining behavioral biometrics with contextual, social and other signals
○ Covers use cases such as User Verification, Account Takeover, Remote Attacks and Bot Attacks
○ Can be integrated into any Web, Mobile & IoT application
○ 2 patents granted and 10+ in the application stage
Slide 4: Sequential Anomaly Detection Problem
● The classical Anomaly Detection Problem is to find patterns in a dataset that do not conform to expected normal behavior
● Formulated as a one-class classification task in machine learning
● In many domains the data distribution changes continuously (concept shift)
● An online learning setting is better suited to deal with concept shifts
[Image: scatter plot of current_week_purchase vs. average_weekly_purchase. Source: https://www.linkedin.com/pulse/part-2-keep-simple-machine-learning-algorithms-big-dr-dinesh/]
Slide 5: Sequential Anomaly Detection Problem
● In the Sequential Anomaly Detection problem the goal is to determine whether a subsequence of a sequence of events is anomalous
● Each event in isolation would appear to be normal; only the sequence of events indicates an anomaly
○ Username-Password, Username-Password, Username-Password, ...
○ Login to the corporate network at midnight, access a rarely used DB, download a lot of data, transfer to USB, ...
● Straightforward supervised learning is not feasible here because of the credit assignment problem: the anomaly label applies to the whole sequence, not to any single event
Slide 6: Introduction to Reinforcement Learning
● In Reinforcement Learning, an autonomous agent interacts with an environment and takes an action a_t in each state s_t
● The environment in return supplies a reward r_t for the action the agent performed, as a supervision signal, and also a new state s_{t+1}
[Diagram: agent sends action a_t to the environment; the environment returns reward r_t and next state s_{t+1}]
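The agent-environment loop above maps directly onto code. A minimal sketch in Python, assuming a generic `env` object with Gym-style `reset()`/`step()` methods (the concrete environment used in this talk is introduced on Slide 20; names here are illustrative):

```python
import random

def run_episode(env, policy, max_steps=1000):
    """Roll out one episode: the agent observes s_t, picks a_t,
    and the environment returns reward r_t and next state s_{t+1}."""
    state = env.reset()                             # initial state s_0
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                      # agent picks a_t given s_t
        state, reward, done, _ = env.step(action)   # environment returns r_t, s_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward

# Example policy for a two-action problem: act uniformly at random.
random_policy = lambda state: random.choice([0, 1])
```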
Slide 7: Introduction to Reinforcement Learning
● Reinforcement Learning can be formally defined as a Markov Decision Process
● A Markov Decision Process (MDP) is defined by the 5-tuple {s_t, a_t, P(s_{t+1} | s_t, a_t), γ, r_t}
○ s_t - state at time t
○ a_t - action taken in state s_t
○ P(s_{t+1} | s_t, a_t) - state transition probabilities
○ γ - discount factor
○ r_t - reward function
● The objective of an MDP is to come up with an optimal policy that achieves the maximum cumulative reward over the long term
Slide 8: Q-Learning and Markov Decision Process
● Q-value function Q(s, a): an estimate of the maximum total long-term reward starting from state s and performing action a
● Bellman equation: Q(s, a) = r(s) + γ ∑_{s'} P(s' | s, a) max_{a'} Q(s', a')
  The Q-value of a state-action pair is the current reward plus the expected Q-value of its successor states
● Central theoretical concept used in almost all formulations of reinforcement learning
● It can be proved that, starting from random initial conditions, iterating the Bellman equation makes Q(s, a) converge to the optimal Q-function Q*(s, a)
● The optimal policy is given by π*(s) = argmax_a Q*(s, a)
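For small MDPs with a fully known transition model, this iteration can be written out directly. A toy sketch in Python (not from the talk; `P` and `r` are assumed arrays describing an illustrative tabular MDP):

```python
import numpy as np

def q_value_iteration(P, r, gamma=0.9, tol=1e-6):
    """Iterate the Bellman equation until convergence.
    P[s, a, s'] = transition probability P(s'|s,a); r[s] = reward in state s."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        # Q(s,a) = r(s) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')
        Q_new = r[:, None] + gamma * P @ Q.max(axis=1)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

# Optimal policy: pi*(s) = argmax_a Q*(s, a), i.e. Q.argmax(axis=1)
```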
Slide 9: Q-Learning and Markov Decision Process
● It is difficult to know the state transition probabilities P(s_{t+1} | s_t, a_t) for a given problem
● The Bellman equation can be cast in an incremental, sample-based form in which the transition probabilities are not needed
● Only the actual observed state from the environment is used
● Temporal Difference Learning Algorithm: when the agent makes a transition from state s to state s' by performing action a, its Q-value is updated as
  Q(s, a) ← Q(s, a) + α [ r(s) + γ max_{a'} Q(s', a') - Q(s, a) ]
  where α << 1 is a learning rate
● The Q-values are adjusted towards the local equilibrium at which the Bellman equation holds
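This update rule transcribes almost literally into tabular Q-learning code. A minimal sketch (assuming discrete, hashable states and a Gym-style environment; all names are illustrative):

```python
import random
from collections import defaultdict

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, n_actions=2):
    """Tabular Q-learning with the TD update from the slide."""
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(s, x)])
            s_next, r, done, _ = env.step(a)
            # TD update: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            td_target = r + gamma * max(Q[(s_next, x)] for x in range(n_actions))
            Q[(s, a)] += alpha * (td_target - Q[(s, a)])
            s = s_next
    return Q
```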
Slide 10: Function Approximation using Neural Networks
● The Bellman equation as iterated above is a deterministic update over explicitly stored Q-values
● For problems where the state and action spaces are small, one can use a table to represent Q(s, a)
● In many practical applications, state and action spaces are continuous
● One needs an efficient function approximation method for representing Q(s, a)
● Two standard approaches:
○ Tile Coding: partition the continuous space into overlapping sets of tiles
➢ Success depends on the number and width of the tiles
➢ It is a linear function approximation
○ Neural Networks: nonlinear function approximation, a more powerful representation
Slide 11: Function Approximation using Neural Networks
● One can use a Neural Network to approximate Q(s, a) as follows:
○ Inputs: state s represented by the D-dimensional vector {s_1, s_2, ..., s_D}
○ Outputs: Q-values for each of the N actions {Q_1, Q_2, ..., Q_N}
[Diagram: feed-forward network with inputs s_1 ... s_D, hidden layers, and outputs Q_1 ... Q_N]
Slide 12: Function Approximation using Neural Networks
● The loss function for training the NN is the difference between the Q-values predicted by the DNN and the target Q-values given by the Bellman equation:
  L = ½ [ (r + γ max_{a'} Q(s', a')) - Q(s, a) ]²
● The NN is trained using backpropagation as follows (a condensed code sketch follows this list):
1. Start an episode of exploration
2. Initialize the NN and start from a random state s
3. Do a forward pass of state s through the DNN and get Q-values for all actions
4. Perform an ε-greedy exploration to choose an action a for the current state s
5. Get the next state s' and reward r from the environment
6. Pass s' through the DNN as well and compute max_{a'} Q(s', a')
7. Set the target Q-value for the output node corresponding to action a to r + γ max_{a'} Q(s', a')
8. For all other output nodes, keep the target Q-value the same as the DNN prediction from step 3
9. Update the weights using backpropagation
10. Repeat steps 3-9 until a termination condition is reached
11. Repeat over episodes until the network is trained
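A condensed sketch of steps 3-9 in TensorFlow 1.x-style graph code (the API of the talk's era; the layer sizes and names are illustrative assumptions, not the exact RL4AD implementation):

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

D, N_ACTIONS, GAMMA = 41, 2, 0.9     # illustrative dimensions

# Q-network: state in, one Q-value per action out
state_ph = tf.placeholder(tf.float32, [None, D], name="state")
hidden   = tf.layers.dense(state_ph, 10, activation=tf.nn.relu)
q_values = tf.layers.dense(hidden, N_ACTIONS, name="q_values")

# Targets equal the prediction everywhere except the taken action (steps 7-8)
target_ph = tf.placeholder(tf.float32, [None, N_ACTIONS], name="target")
loss = 0.5 * tf.reduce_sum(tf.square(target_ph - q_values))  # L from the slide
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

def train_step(sess, s, a, r, s_next):
    """One TD update; assumes sess.run(tf.global_variables_initializer()) was done."""
    q      = sess.run(q_values, {state_ph: s[None]})[0]        # step 3
    q_next = sess.run(q_values, {state_ph: s_next[None]})[0]   # step 6
    target = q.copy()
    target[a] = r + GAMMA * q_next.max()                       # step 7 (step 8: rest unchanged)
    sess.run(train_op, {state_ph: s[None], target_ph: target[None]})  # step 9
```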
Slide 13: Function Approximation using Neural Networks
[Diagram: high-level TD NN learning flow, with an outer iteration over episodes and an inner iteration over exploration steps, each step updating the DNN model]
Slide 14: Network Intrusion Detection
● Can we use Reinforcement Learning for Network Intrusion Detection?
● Related research work:
○ James Cannady used a CMAC Neural Network and formulated Network Intrusion Detection as an online learning problem [1]
○ Xin Xu studied host-based intrusion detection as a multi-stage cyber attack and applied reinforcement learning [2]
○ Arturo Servin studied the DDoS attack as a traffic anomaly problem and used reinforcement learning for detection [3]
○ Kleanthis Malialis used distributed multiagent reinforcement learning for network intrusion response [4]
● None of these used a DNN for function approximation
Slide 15: Network Intrusion Detection
● Standard dataset for scientific research: the NSL-KDD dataset [5]
● The dataset contains 4 categories of attacks in a local area network:
○ DOS - Denial of Service attacks
○ R2L - Remote to Local, where a remote attacker tries to gain local user privileges
○ U2R - User to Root, where an attacker operating as a normal user exploits vulnerabilities
○ Probing - the attacker scans the machine to determine vulnerabilities
● The dataset contains 125,973 connections for training and 22,543 for testing
● The training set has 53.5% normal connections and 46.5% abnormal connections
● There are 41 features (32 continuous, 3 nominal and 6 binary)
● E.g., type of protocol (TCP, UDP), port number, packet size, rate of transmission
Slide 16: Network Intrusion Detection
[Image: overview of the NSL-KDD data. Source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
Slide 17: Network Intrusion Detection
● However, the NSL-KDD dataset cannot be used for sequential anomaly detection:
○ There is no timestamp; the dataset is not time-series data
○ There is no way to identify whether different connections come from the same user/attacker
○ One can still use the dataset for the standard anomaly detection problem using reinforcement learning
Slide 19: Network Intrusion Detection
● Reinforcement Learning formulation with the NSL-KDD dataset (a reward-function sketch follows this list):
○ States are characterized by the 41 features in the dataset
○ In every state the agent takes one of two actions:
■ Send an alert
■ Do not send an alert
○ Rewards generated by the environment:
■ +1 if the state is normal and the action is "do not send alert"
■ +1 if the state is malicious and the action is "send alert"
■ -1 if the state is malicious and the action is "do not send alert"
■ -1 if the state is normal and the action is "send alert"
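This reward scheme is simple enough to state directly in code. A minimal sketch (the action encoding and function name are my own, not from the deck):

```python
ALERT, NO_ALERT = 1, 0  # illustrative action encoding

def get_reward(is_malicious: bool, action: int) -> int:
    """+1 when the agent's action matches the ground truth, -1 otherwise."""
    correct = (action == ALERT) == is_malicious
    return 1 if correct else -1
```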
Slide 20: Implementation using TensorFlow
● Creation of the Environment
○ The goal of the environment is to simulate the reward scheme described above for the NSL-KDD dataset and to supply a new state at every step
○ This can be done using the Gym toolkit from OpenAI: https://github.com/openai/gym/tree/master/gym/envs

Package layout:

gym-network_intrusion/
  README.md
  setup.py
  gym_network_intrusion/
    __init__.py
    envs/
      __init__.py
      network_intrusion_env.py

Registration (in gym_network_intrusion/__init__.py):

```python
from gym.envs.registration import register

register(
    id='NetworkIntrusion-v0',
    entry_point='gym_network_intrusion.envs:NetworkIntrusionEnv',
)
```
Slide 21: Implementation using TensorFlow
● Creation of the Environment

```python
import gym
from gym import error, spaces, utils
from gym.utils import seeding

class NetworkIntrusionEnv(gym.Env):
    def __init__(self):
        ...

    def _step(self, action):
        ...
        return new_state, reward, episode_over, details

    def _reset(self):
        ...
        return initial_state

    def _get_reward(self, action):
        ...
```
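Once the package is installed and the registration from Slide 20 has run, the environment behaves like any other Gym environment. A minimal usage sketch (assuming the pre-0.26 Gym step API used above; the import name follows the layout on Slide 20):

```python
import gym
import gym_network_intrusion  # importing the package triggers register()

env = gym.make('NetworkIntrusion-v0')
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()             # random agent as a smoke test
    state, reward, done, info = env.step(action)
    total_reward += reward
print('episode reward:', total_reward)
```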
Slide 22: Implementation using TensorFlow
● Two architectures (a rough sketch of the second follows):
○ Deep NN architecture:
■ Discretize continuous variables and use a one-hot representation
○ Deep and Wide NN architecture:
■ Useful for combining continuous and discrete variables in one NN model
■ Also combines the power of memorization and generalization
■ https://www.tensorflow.org/tutorials/wide_and_deep
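A rough sketch of the deep-and-wide idea applied to a Q-network, in the same TF 1.x style as before: the "wide" part is a linear layer over sparse one-hot features (memorization), the "deep" part is an MLP over continuous features (generalization), and their outputs are summed. Shapes and names are illustrative assumptions, not the deck's code:

```python
import tensorflow as tf  # TensorFlow 1.x API

N_ONEHOT, N_CONT, N_ACTIONS = 226, 32, 2   # illustrative feature sizes

onehot_ph = tf.placeholder(tf.float32, [None, N_ONEHOT])  # wide (sparse) input
cont_ph   = tf.placeholder(tf.float32, [None, N_CONT])    # deep (dense) input

# Wide part: a single linear layer over one-hot features (memorization)
wide_out = tf.layers.dense(onehot_ph, N_ACTIONS, use_bias=False)

# Deep part: an MLP over continuous features (generalization)
deep_h   = tf.layers.dense(cont_ph, 32, activation=tf.nn.relu)
deep_out = tf.layers.dense(deep_h, N_ACTIONS)

# Combined Q-values: sum of the wide and deep contributions
q_values = wide_out + deep_out
```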
Slide 23: Implementation using TensorFlow
● Implementation of a simple NN using TensorFlow:
○ Discretize continuous variables and use a one-hot representation
○ Used binning (#bins = 5) to convert continuous features to categorical
○ This gives 226 one-hot features
○ 3-layer feed-forward neural network (226 × 10 × 1)
● Code available at https://github.com/harik68/RL4AD
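The preprocessing step can be sketched with pandas: 5 equal-width bins per continuous feature, then one-hot encoding of all columns. The column names are placeholders, and the exact binning in RL4AD may differ:

```python
import pandas as pd

def discretize_and_onehot(df, continuous_cols, n_bins=5):
    """Bin each continuous column into n_bins intervals, then one-hot
    encode every column so each state becomes a single binary vector."""
    df = df.copy()
    for col in continuous_cols:
        # labels=False gives integer bin indices; duplicates='drop' guards
        # against degenerate bin edges on near-constant features
        df[col] = pd.cut(df[col], bins=n_bins, labels=False, duplicates='drop')
    return pd.get_dummies(df.astype(str))  # one column per (feature, value) pair

# Usage with placeholder column names:
# states = discretize_and_onehot(nsl_kdd_df, ['duration', 'src_bytes'])
```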
Slide 24: Implementation using TensorFlow
● Model performance (work in progress!)
[Chart: TPR and FPR of the baseline vs. DNN-RL Model v0.1. Source of the baseline image: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
Slide 25: Next Steps
● Experiment with different discretization schemes, or even tile coding
● Experiment with different NN architectures (Deep and Wide)
Slide 26: References
1. J. Cannady, Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks, 23rd National Information Systems Security Conference (2000)
2. Xin Xu, Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies, Applied Soft Computing 10 (2010) 859-867
3. A. Servin, Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow [PDF, york.ac.uk]
4. K. Malialis and D. Kudenko, Distributed response to network intrusions using multiagent reinforcement learning, Engineering Applications of Artificial Intelligence, Volume 41, May 2015, Pages 270-284
5. NSL-KDD dataset, Canadian Institute for Cybersecurity, University of New Brunswick (http://www.unb.ca/cic/datasets/nsl.html)
6. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall (2009)
Slide 27: THANK YOU!
We are hiring Data Scientists, Machine Learning Engineers and Mobile Developers.
Apply at career@zighra.com