Anomaly Detection through Reinforcement Learning


Talk given at the Ottawa Artificial Intelligence and Machine Learning Meetup group on 29th Jan 2018.


Slide 1: Anomaly Detection through Reinforcement Learning
Dr. Hari Koduvely, Chief Data Scientist, ZIGHRA.COM
Slide 2: Outline of Talk
● Zighra and the SensifyID Platform
● Sequential Anomaly Detection Problem
● Introduction to Reinforcement Learning
● Markov Decision Process and Q-Learning
● Function Approximation using Neural Networks
● Application to the Network Intrusion Detection Problem
● Implementation using TensorFlow
Slide 3: ZIGHRA.COM
● Zighra (https://zighra.com) provides solutions for Continuous Behavioural Authentication & Threat Detection
● Highlights of our SensifyID Platform:
○ Core is an AI-based 6-layer Anomaly Detection System combining behavioral biometrics with contextual, social and other signals
○ Covers use cases such as User Verification, Account Takeover, Remote Attacks and Bot Attacks
○ Can be integrated into any Web, Mobile & IoT application
○ 2 patents granted and 10+ in the application stage
Slide 4: Sequential Anomaly Detection Problem
● The classical Anomaly Detection Problem is to find patterns in a dataset that do not conform to expected normal behavior
● Formulated as a one-class classification task in machine learning
● In many domains the data distribution changes continuously (concept shift)
● An online learning setting is better suited to deal with concept shifts
[Image: scatter plot of current_week_purchase vs. average_weekly_purchase. Source: https://www.linkedin.com/pulse/part-2-keep-simple-machine-learning-algorithms-big-dr-dinesh/]
Slide 5: Sequential Anomaly Detection Problem
● In the Sequential Anomaly Detection problem the goal is to determine whether a subsequence of a sequence of events is anomalous
● Each event in isolation would appear to be normal; only the sequence of events indicates an anomaly
○ Username-Password, Username-Password, Username-Password, ...
○ Login to the corporate network at midnight, access a rarely used DB, download a lot of data, transfer to USB, ...
● Straightforward supervised learning is not feasible here because of the credit assignment problem: the anomaly label applies to the whole sequence, not to any single event
Slide 6: Introduction to Reinforcement Learning
● In Reinforcement Learning, an autonomous agent interacts with an environment and takes an action a_t in each state s_t
● The environment in return supplies a reward r_t for the action the agent performed, as a supervision signal, and also a new state s_{t+1}
[Diagram: agent sends action a_t to the environment; the environment returns reward r_t and next state s_{t+1}]
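The agent-environment loop above maps directly onto code. A minimal sketch in Python, assuming a generic `env` object with Gym-style `reset()`/`step()` methods (the concrete environment used in this talk is introduced on Slide 20; names here are illustrative):

```python
import random

def run_episode(env, policy, max_steps=1000):
    """Roll out one episode: the agent observes s_t, picks a_t,
    and the environment returns reward r_t and next state s_{t+1}."""
    state = env.reset()                             # initial state s_0
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                      # agent picks a_t given s_t
        state, reward, done, _ = env.step(action)   # environment returns r_t, s_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward

# Example policy for a two-action problem: act uniformly at random.
random_policy = lambda state: random.choice([0, 1])
```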
Slide 7: Introduction to Reinforcement Learning
● Reinforcement Learning can be formally defined as a Markov Decision Process
● A Markov Decision Process (MDP) is defined by the 5-tuple {s_t, a_t, P(s_{t+1} | s_t, a_t), γ, r_t}
○ s_t - state at time t
○ a_t - action taken in state s_t
○ P(s_{t+1} | s_t, a_t) - state transition probabilities
○ γ - discount factor
○ r_t - reward function
● The objective of an MDP is to come up with an optimal policy that achieves the maximum cumulative reward over the long term
Slide 8: Q-Learning and Markov Decision Process
● Q-value function Q(s, a): an estimate of the maximum total long-term reward starting from state s and performing action a
● Bellman equation: Q(s, a) = r(s) + γ ∑_{s'} P(s' | s, a) max_{a'} Q(s', a')
  The Q-value of a state-action pair is the current reward plus the expected Q-value of its successor states
● Central theoretical concept used in almost all formulations of reinforcement learning
● It can be proved that, starting from random initial conditions, iterating the Bellman equation makes Q(s, a) converge to the optimal Q-function Q*(s, a)
● The optimal policy is given by π*(s) = argmax_a Q*(s, a)
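For small MDPs with a fully known transition model, this iteration can be written out directly. A toy sketch in Python (not from the talk; `P` and `r` are assumed arrays describing an illustrative tabular MDP):

```python
import numpy as np

def q_value_iteration(P, r, gamma=0.9, tol=1e-6):
    """Iterate the Bellman equation until convergence.
    P[s, a, s'] = transition probability P(s'|s,a); r[s] = reward in state s."""
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    while True:
        # Q(s,a) = r(s) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a')
        Q_new = r[:, None] + gamma * P @ Q.max(axis=1)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

# Optimal policy: pi*(s) = argmax_a Q*(s, a), i.e. Q.argmax(axis=1)
```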
Slide 9: Q-Learning and Markov Decision Process
● It is difficult to know the state transition probabilities P(s_{t+1} | s_t, a_t) for a given problem
● The Bellman equation can be cast in an incremental, sample-based form in which the transition probabilities are not needed
● Only the actual observed state from the environment is used
● Temporal Difference Learning Algorithm: when the agent makes a transition from state s to state s' by performing action a, its Q-value is updated as
  Q(s, a) ← Q(s, a) + α [ r(s) + γ max_{a'} Q(s', a') - Q(s, a) ]
  where α << 1 is a learning rate
● The Q-values are adjusted towards the local equilibrium at which the Bellman equation holds
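This update rule transcribes almost literally into tabular Q-learning code. A minimal sketch (assuming discrete, hashable states and a Gym-style environment; all names are illustrative):

```python
import random
from collections import defaultdict

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, n_actions=2):
    """Tabular Q-learning with the TD update from the slide."""
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0
    for _ in range(n_episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(s, x)])
            s_next, r, done, _ = env.step(a)
            # TD update: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            td_target = r + gamma * max(Q[(s_next, x)] for x in range(n_actions))
            Q[(s, a)] += alpha * (td_target - Q[(s, a)])
            s = s_next
    return Q
```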
Slide 10: Function Approximation using Neural Networks
● The Bellman equation as iterated above is a deterministic update over explicitly stored Q-values
● For problems where the state and action spaces are small, one can use a table to represent Q(s, a)
● In many practical applications, state and action spaces are continuous
● One needs an efficient function approximation method for representing Q(s, a)
● Two standard approaches:
○ Tile Coding: partition the continuous space into overlapping sets of tiles
➢ Success depends on the number and width of the tiles
➢ It is a linear function approximation
○ Neural Networks: nonlinear function approximation, a more powerful representation
Slide 11: Function Approximation using Neural Networks
● One can use a Neural Network to approximate Q(s, a) as follows:
○ Inputs: state s represented by the D-dimensional vector {s_1, s_2, ..., s_D}
○ Outputs: Q-values for each of the N actions {Q_1, Q_2, ..., Q_N}
[Diagram: feed-forward network with inputs s_1 ... s_D, hidden layers, and outputs Q_1 ... Q_N]
Slide 12: Function Approximation using Neural Networks
● The loss function for training the NN is the difference between the Q-values predicted by the DNN and the target Q-values given by the Bellman equation:
  L = ½ [ (r + γ max_{a'} Q(s', a')) - Q(s, a) ]²
● The NN is trained using backpropagation as follows (a condensed code sketch follows this list):
1. Start an episode of exploration
2. Initialize the NN and start from a random state s
3. Do a forward pass of state s through the DNN and get Q-values for all actions
4. Perform an ε-greedy exploration to choose an action a for the current state s
5. Get the next state s' and reward r from the environment
6. Pass s' through the DNN as well and compute max_{a'} Q(s', a')
7. Set the target Q-value for the output node corresponding to action a to r + γ max_{a'} Q(s', a')
8. For all other output nodes, keep the target Q-value the same as the DNN prediction from step 3
9. Update the weights using backpropagation
10. Repeat steps 3-9 until a termination condition is reached
11. Repeat over episodes until the network is trained
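A condensed sketch of steps 3-9 in TensorFlow 1.x-style graph code (the API of the talk's era; the layer sizes and names are illustrative assumptions, not the exact RL4AD implementation):

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

D, N_ACTIONS, GAMMA = 41, 2, 0.9     # illustrative dimensions

# Q-network: state in, one Q-value per action out
state_ph = tf.placeholder(tf.float32, [None, D], name="state")
hidden   = tf.layers.dense(state_ph, 10, activation=tf.nn.relu)
q_values = tf.layers.dense(hidden, N_ACTIONS, name="q_values")

# Targets equal the prediction everywhere except the taken action (steps 7-8)
target_ph = tf.placeholder(tf.float32, [None, N_ACTIONS], name="target")
loss = 0.5 * tf.reduce_sum(tf.square(target_ph - q_values))  # L from the slide
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

def train_step(sess, s, a, r, s_next):
    """One TD update; assumes sess.run(tf.global_variables_initializer()) was done."""
    q      = sess.run(q_values, {state_ph: s[None]})[0]        # step 3
    q_next = sess.run(q_values, {state_ph: s_next[None]})[0]   # step 6
    target = q.copy()
    target[a] = r + GAMMA * q_next.max()                       # step 7 (step 8: rest unchanged)
    sess.run(train_op, {state_ph: s[None], target_ph: target[None]})  # step 9
```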
Slide 13: Function Approximation using Neural Networks
[Diagram: high-level TD NN learning flow, with an outer iteration over episodes and an inner iteration over exploration steps, each step updating the DNN model]
Slide 14: Network Intrusion Detection
● Can we use Reinforcement Learning for Network Intrusion Detection?
● Related research work:
○ James Cannady used a CMAC Neural Network and formulated Network Intrusion Detection as an online learning problem [1]
○ Xin Xu studied host-based intrusion detection as a multi-stage cyber attack and applied reinforcement learning [2]
○ Arturo Servin studied the DDoS attack as a traffic anomaly problem and used reinforcement learning for detection [3]
○ Kleanthis Malialis used distributed multiagent reinforcement learning for network intrusion response [4]
● None of these used a DNN for function approximation
Slide 15: Network Intrusion Detection
● Standard dataset for scientific research: the NSL-KDD dataset [5]
● The dataset contains 4 categories of attacks in a local area network:
○ DOS - Denial of Service attacks
○ R2L - Remote to Local, where a remote attacker tries to gain local user privileges
○ U2R - User to Root, where an attacker operating as a normal user exploits vulnerabilities
○ Probing - the attacker scans the machine to determine vulnerabilities
● The dataset contains 125,973 connections for training and 22,543 for testing
● The training set has 53.5% normal connections and 46.5% abnormal connections
● There are 41 features (32 continuous, 3 nominal and 6 binary)
● E.g., type of protocol (TCP, UDP), port number, packet size, rate of transmission
Slide 16: Network Intrusion Detection
[Image: overview of the NSL-KDD data. Source: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
Slide 17: Network Intrusion Detection
● However, the NSL-KDD dataset cannot be used for sequential anomaly detection:
○ There is no timestamp; the dataset is not time-series data
○ There is no way to identify whether different connections come from the same user/attacker
○ One can still use the dataset for the standard anomaly detection problem using reinforcement learning
Slide 19: Network Intrusion Detection
● Reinforcement Learning formulation with the NSL-KDD dataset (a reward-function sketch follows this list):
○ States are characterized by the 41 features in the dataset
○ In every state the agent takes one of two actions:
■ Send an alert
■ Do not send an alert
○ Rewards generated by the environment:
■ +1 if the state is normal and the action is "do not send alert"
■ +1 if the state is malicious and the action is "send alert"
■ -1 if the state is malicious and the action is "do not send alert"
■ -1 if the state is normal and the action is "send alert"
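This reward scheme is simple enough to state directly in code. A minimal sketch (the action encoding and function name are my own, not from the deck):

```python
ALERT, NO_ALERT = 1, 0  # illustrative action encoding

def get_reward(is_malicious: bool, action: int) -> int:
    """+1 when the agent's action matches the ground truth, -1 otherwise."""
    correct = (action == ALERT) == is_malicious
    return 1 if correct else -1
```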
Slide 20: Implementation using TensorFlow
● Creation of the Environment
○ The goal of the environment is to simulate the reward scheme described above for the NSL-KDD dataset and to supply a new state at every step
○ This can be done using the Gym toolkit from OpenAI: https://github.com/openai/gym/tree/master/gym/envs

Package layout:

gym-network_intrusion/
  README.md
  setup.py
  gym_network_intrusion/
    __init__.py
    envs/
      __init__.py
      network_intrusion_env.py

Registration (in gym_network_intrusion/__init__.py):

```python
from gym.envs.registration import register

register(
    id='NetworkIntrusion-v0',
    entry_point='gym_network_intrusion.envs:NetworkIntrusionEnv',
)
```
Slide 21: Implementation using TensorFlow
● Creation of the Environment

```python
import gym
from gym import error, spaces, utils
from gym.utils import seeding

class NetworkIntrusionEnv(gym.Env):
    def __init__(self):
        ...

    def _step(self, action):
        ...
        return new_state, reward, episode_over, details

    def _reset(self):
        ...
        return initial_state

    def _get_reward(self, action):
        ...
```
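Once the package is installed and the registration from Slide 20 has run, the environment behaves like any other Gym environment. A minimal usage sketch (assuming the pre-0.26 Gym step API used above; the import name follows the layout on Slide 20):

```python
import gym
import gym_network_intrusion  # importing the package triggers register()

env = gym.make('NetworkIntrusion-v0')
state = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()             # random agent as a smoke test
    state, reward, done, info = env.step(action)
    total_reward += reward
print('episode reward:', total_reward)
```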
Slide 22: Implementation using TensorFlow
● Two architectures (a rough sketch of the second follows):
○ Deep NN architecture:
■ Discretize continuous variables and use a one-hot representation
○ Deep and Wide NN architecture:
■ Useful for combining continuous and discrete variables in one NN model
■ Also combines the power of memorization and generalization
■ https://www.tensorflow.org/tutorials/wide_and_deep
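A rough sketch of the deep-and-wide idea applied to a Q-network, in the same TF 1.x style as before: the "wide" part is a linear layer over sparse one-hot features (memorization), the "deep" part is an MLP over continuous features (generalization), and their outputs are summed. Shapes and names are illustrative assumptions, not the deck's code:

```python
import tensorflow as tf  # TensorFlow 1.x API

N_ONEHOT, N_CONT, N_ACTIONS = 226, 32, 2   # illustrative feature sizes

onehot_ph = tf.placeholder(tf.float32, [None, N_ONEHOT])  # wide (sparse) input
cont_ph   = tf.placeholder(tf.float32, [None, N_CONT])    # deep (dense) input

# Wide part: a single linear layer over one-hot features (memorization)
wide_out = tf.layers.dense(onehot_ph, N_ACTIONS, use_bias=False)

# Deep part: an MLP over continuous features (generalization)
deep_h   = tf.layers.dense(cont_ph, 32, activation=tf.nn.relu)
deep_out = tf.layers.dense(deep_h, N_ACTIONS)

# Combined Q-values: sum of the wide and deep contributions
q_values = wide_out + deep_out
```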
Slide 23: Implementation using TensorFlow
● Implementation of a simple NN using TensorFlow:
○ Discretize continuous variables and use a one-hot representation
○ Used binning (#bins = 5) to convert continuous features to categorical
○ This gives 226 one-hot features
○ 3-layer feed-forward neural network (226 × 10 × 1)
● Code available at https://github.com/harik68/RL4AD
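The preprocessing step can be sketched with pandas: 5 equal-width bins per continuous feature, then one-hot encoding of all columns. The column names are placeholders, and the exact binning in RL4AD may differ:

```python
import pandas as pd

def discretize_and_onehot(df, continuous_cols, n_bins=5):
    """Bin each continuous column into n_bins intervals, then one-hot
    encode every column so each state becomes a single binary vector."""
    df = df.copy()
    for col in continuous_cols:
        # labels=False gives integer bin indices; duplicates='drop' guards
        # against degenerate bin edges on near-constant features
        df[col] = pd.cut(df[col], bins=n_bins, labels=False, duplicates='drop')
    return pd.get_dummies(df.astype(str))  # one column per (feature, value) pair

# Usage with placeholder column names:
# states = discretize_and_onehot(nsl_kdd_df, ['duration', 'src_bytes'])
```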
Slide 24: Implementation using TensorFlow
● Model performance (work in progress!)
[Chart: TPR and FPR of the baseline vs. DNN-RL Model v0.1. Source of the baseline image: https://nycdatascience.com/blog/student-works/network-intrusion-detection-2/]
Slide 25: Next Steps
● Experiment with different discretization schemes, or even tile coding
● Experiment with different NN architectures (Deep and Wide)
Slide 26: References
1. J. Cannady, Next Generation Intrusion Detection: Autonomous Reinforcement Learning of Network Attacks, 23rd National Information Systems Security Conference (2000)
2. Xin Xu, Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies, Applied Soft Computing 10 (2010) 859-867
3. A. Servin, Towards Traffic Anomaly Detection via Reinforcement Learning and Data Flow [PDF, york.ac.uk]
4. K. Malialis and D. Kudenko, Distributed response to network intrusions using multiagent reinforcement learning, Engineering Applications of Artificial Intelligence, Volume 41, May 2015, Pages 270-284
5. NSL-KDD dataset, Canadian Institute for Cybersecurity, University of New Brunswick (http://www.unb.ca/cic/datasets/nsl.html)
6. Stuart J. Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall (2009)
Slide 27: THANK YOU!
We are hiring Data Scientists, Machine Learning Engineers and Mobile Developers.
Apply at career@zighra.com