2. Outline
• Introduction
o P2P Networks
o P2P Botnets
• Work overview
• Related Work
• Our work
o Generating traffic
o Feature extraction & selection
o Evaluation of feature selection techniques
o Future scope of work
3. What is a P2P Network?
[Figure: peers A to H connected in a P2P overlay layer, mapped onto the native IP layer, which spans autonomous systems AS1 to AS6.]
4. Generic P2P Architecture
[Figure: generic P2P architecture. Components shown: Capability & Configuration, Peer Role Selection, Operating System, NAT/Firewall Traversal, Routing and Forwarding, Neighbor Discovery, Join/Leave, Bootstrap, Overlay Messaging API, Content Storage, Search API.]
8. Work overview
• Evaluation of three feature selection algorithms:
o Correlation-based Feature Selection
o Consistency-based Subset Evaluation
o Principal Component Analysis
• Models built with three machine learning algorithms:
o Naïve Bayes classifier
o Bayes Networks
o C4.5 Decision Trees
• Performance evaluation for the detection of some recent and well-known P2P botnets.
9. Related work
• Early work using feature selection algorithms [1] [2]
used the DARPA dataset, which is no longer suitable
for today’s security research.
• Early approaches for P2P botnet detection [3]
applied static, port-based analysis, which is easily
defeated by modern botnets.
• Recent work [4] [5] has employed machine learning
and data mining techniques for detection of P2P
botnets.
10. Our work
Pipeline stages, listed from the last stage down to the first:
• Machine learning algorithms: Bayes Network, Naïve Bayes, C4.5 Decision Trees
• Feature selection: Correlation-based Feature Selection, Consistency-based Subset Evaluation, Principal Component Analysis
• Feature extraction: source min. packet size, dest. TCP Push flag count, source avg. packet size, dest. total volume, duration, …
• Flow extraction: <Source IP, Source port, Destination IP, Destination port, Protocol>
• Network captures: jNetPcap library with Java module
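The flow-extraction stage can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' jNetPcap/Java module. The `Packet` record, the `group_into_flows` helper, and the rule that the sender of the first packet is treated as the flow's "source" are all assumptions made for illustration. Real flow exporters typically also split flows on idle timeouts, which is omitted here.

```python
from collections import namedtuple

# Hypothetical packet record: only the fields needed to build flows.
Packet = namedtuple("Packet", "ts src_ip src_port dst_ip dst_port proto size psh")

def group_into_flows(packets):
    """Group packets into flows keyed by the 5-tuple of the first packet seen.

    Packets travelling in the reverse direction of an existing flow are
    attached to that flow's "dst" list (assumption: flows are bi-directional).
    """
    flows = {}
    for p in packets:
        fwd = (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.proto)
        rev = (p.dst_ip, p.dst_port, p.src_ip, p.src_port, p.proto)
        if fwd in flows:
            flows[fwd]["src"].append(p)
        elif rev in flows:
            flows[rev]["dst"].append(p)
        else:
            # First packet of a new flow defines the "source" direction.
            flows[fwd] = {"src": [p], "dst": []}
    return flows
```

Each resulting flow holds two packet lists, one per direction, which is the shape a per-direction feature extractor would consume.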
11. Generating Traffic
[Figure: campus network testbed used for traffic generation and capture. Labels shown: botnet traffic generation; Internet; Info. Sec. Lab; Dist. Sys. Lab; Multimedia Lab; Hostels Wing; data collection for P2P and web traffic; anonymization (Anon tool); botnet detection module; firewall; core switch 6509; distribution switch 4500; access switch 2500; content mgmt.; application servers; DB cluster; IDS; Ethernet.]
12. Dataset
Data                 Application                                          Number of flows
Benign data          HTTP, HTTPS, SMTP, FTP, POP                          30,000
Benign data          P2P apps: eMule, BitTorrent, Mute, Gnutella, etc.    50,000
Botnet data [4, 5]   ZeroAccess                                           720
Botnet data [4, 5]   SkyNet                                               770
Botnet data [4, 5]   Waledac                                              80,000
Botnet data [4, 5]   Storm                                                220,000
13. Feature Extraction & Selection
• A ‘flow’ is defined by the 5-tuple:
o <Source IP, Source port, Dest. IP, Dest. port, Protocol>
• Features extracted from each flow (bi-directional features are computed once per direction, source and destination):
o Packet count (bi-directional)
o Packet size in bytes: min, max, mean and standard deviation (bi-directional)
o Total volume in bytes (bi-directional)
o Inter-arrival times: min, max, mean and standard deviation (bi-directional)
o TCP Push flag count (bi-directional)
o Duration of the flow (independent of direction)
• Total: 23 features extracted from each flow
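To make the count of 23 concrete, here is a minimal Python sketch of how the listed features could be computed for a single flow. It is an illustration under stated assumptions, not the authors' extractor: the function name `flow_features`, the feature names, the packet representation as `(timestamp, size_bytes, psh_flag)` tuples, the use of the population standard deviation, and the choice of 0.0 for statistics of an empty direction are all hypothetical.

```python
from statistics import mean, pstdev

STAT_NAMES = ("min", "max", "mean", "std")

def _stats(values):
    """Return (min, max, mean, population std); zeros if there are no values."""
    if not values:
        return (0.0, 0.0, 0.0, 0.0)
    return (float(min(values)), float(max(values)),
            float(mean(values)), float(pstdev(values)))

def flow_features(src_pkts, dst_pkts):
    """Compute 11 features per direction plus the flow duration (23 in total).

    Each packet is a (timestamp, size_bytes, psh_flag) tuple, and each
    direction's list is assumed to be sorted by timestamp.
    """
    feats = {}
    for name, pkts in (("src", src_pkts), ("dst", dst_pkts)):
        sizes = [size for _, size, _ in pkts]
        times = [ts for ts, _, _ in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
        feats[name + "_pkt_count"] = len(pkts)
        for stat, value in zip(STAT_NAMES, _stats(sizes)):
            feats[name + "_size_" + stat] = value
        feats[name + "_volume"] = sum(sizes)
        for stat, value in zip(STAT_NAMES, _stats(gaps)):
            feats[name + "_iat_" + stat] = value
        feats[name + "_psh_count"] = sum(1 for _, _, psh in pkts if psh)
    all_times = [ts for ts, _, _ in list(src_pkts) + list(dst_pkts)]
    feats["duration"] = max(all_times) - min(all_times) if all_times else 0.0
    return feats
```

Per direction this yields 1 packet count, 4 size statistics, 1 volume, 4 inter-arrival statistics and 1 Push flag count (11 values); two directions plus the duration give 23.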
15. Feature Extraction & Selection
• Three Feature Selection techniques used:
1. Correlation-based Feature Selection (CFS)
2. Consistency-based Subset Evaluation (CSE)
3. Principal Component Analysis (PCA)
• Evaluated with three algorithms:
1. Naïve Bayes
2. Bayes Network
3. C4.5 Decision Trees
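Of the three selection techniques, consistency-based subset evaluation is perhaps the easiest to sketch. The usual measure is an inconsistency rate: instances that agree on every selected feature but carry different class labels count as inconsistent, apart from those in the majority class of their group. The Python sketch below assumes discrete (or already discretized) feature values. The function name `inconsistency_rate` and the toy data are hypothetical, and this is not claimed to be the exact evaluator used in this work.

```python
from collections import Counter, defaultdict

def inconsistency_rate(rows, labels, subset):
    """Fraction of instances that disagree with the majority class of the
    group of instances sharing the same values on the selected features.

    rows   -- list of feature tuples (discrete values assumed)
    labels -- class label for each row
    subset -- indices of the features being evaluated
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)

# Toy data: feature 1 separates the classes, feature 0 does not.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["benign", "bot", "benign", "bot"]
```

A subset search, such as best-first search, would then look for a small subset whose inconsistency rate is as low as that of the full feature set.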
16. Feature Extraction & Selection
Feature selection   Search method       No. of features   Description
CFS                 Best-first search   5                 source packet count, source min. packet size,
                                                          source max. packet size, dest. max. packet size,
                                                          source inter-arrival time std.
CSE                 Best-first search   8                 source min. packet size, source max. packet size,
                                                          dest. max. packet size, source avg. packet size,
                                                          dest. avg. packet size, source max. inter-arrival time,
                                                          flow duration, source volume
PCA                 -                   12                A linear combination of features
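For intuition on why CFS ends up with a small subset, recall its merit heuristic: for a subset of k features with mean feature-class correlation r_cf and mean feature-feature correlation r_ff, the merit is k·r_cf / sqrt(k + k(k−1)·r_ff). The Python sketch below uses absolute Pearson correlation as the correlation measure, which is a simplifying assumption (CFS implementations often use an entropy-based measure on discretized features). The function names and toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def cfs_merit(features, labels):
    """CFS merit of a feature subset, given as a list of feature columns."""
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(features[i], features[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Toy data: numeric class labels and three candidate feature columns.
labels = [0, 0, 1, 1]
f1 = [1, 2, 3, 4]
f1_copy = list(f1)   # fully redundant with f1
f2 = [0, 0, 1, 1]    # informative and not identical to f1
```

In this toy data, adding the redundant copy of `f1` leaves the merit unchanged, while adding `f2` raises it. That is the behaviour that steers the search toward subsets that are predictive but not redundant.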
20. Future Scope
Ensemble of classifiers
(Work in progress; paper submitted to I-CARE 2013)
Close-to-real-time Detection Tool
(Work in progress)
Space-efficient data structures
21. References
1. A. H. Sung and S. Mukkamala. The feature selection and intrusion detection problems. In Advances in Computer Science - ASIAN 2004. Higher-Level Decision Making, pages 468–482. Springer, 2005.
2. S. Chebrolu, A. Abraham, and J. P. Thomas. Feature deduction and ensemble design of intrusion detection systems. Computers & Security, 24(4):295–307, 2005.
3. R. Schoof and R. Koning. Detecting peer-to-peer botnets. University of Amsterdam, 2007.
4. S. Saad, I. Traore, A. Ghorbani, B. Sayed, D. Zhao, W. Lu, J. Felix, and P. Hakimian. Detecting P2P botnets through network behavior analysis and machine learning. In Privacy, Security and Trust (PST), 2011 Ninth Annual International Conference on, pages 174–180. IEEE, 2011.
5. B. Rahbarinia, R. Perdisci, A. Lanzi, and K. Li. PeerRush: Mining for unwanted P2P traffic. In DIMVA, 2013.