MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets, by Yuji Sekiya.
Presented at the APNIC 40 APOPS 1 session, Tue 8 Sep 2015.
1. Yuji Sekiya <sekiya@wide.ad.jp> www.necoma-project.eu
MATATABI: Cyber Threat Analysis and Defense Platform using Huge Amount of Datasets
Yuji Sekiya, The University of Tokyo, Japan
2. Multi-layer Threat Analysis
• Data collection at multiple layers/locations
– Network devices
– Servers
– User devices
• Analysis platform (Analysis 1, Analysis 2, Analysis 3)
– Threat analysis (detection) across multiple data sources
• Victim-side actions
– Filtering
– Load balancing
– Isolation
• Countermeasures against attackers
– Report to ISP
– Announce to users
– Filtering at ISP level
– Configuration to servers
• Threat information sharing
– Among organizations
– Announce to public
3. Security Information Pipeline
Building a pipeline across diverse activities
• Data collection (traffic, user behavior, etc.)
• Threat analysis
• Human decision
• Protection (enforcement)
[Figure: pipeline stages: Inputs, Data Analysis, Human, Protection]
4. Datasets
[Figure: data sources feeding MATATABI. Sources: switch, router, DNS, firewall, SPAM, phishing site, external information. Data types and formats: sFlow, NetFlow, syslog, querylog, pcap, text, URL, SPAM sender.]
5. Data Volume
• N × 10 GB/day
• 20 TB over 10 months
• Traffic sampling, packet dump, e-mail, DNS, web traffic
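As a rough consistency check of the two figures on the Data Volume slide, the sketch below converts 20 TB over 10 months into an average daily rate. It assumes 30-day months and decimal units (1 TB = 1000 GB); neither assumption comes from the slides.

```python
def daily_volume_gb(total_tb: float, months: float, days_per_month: int = 30) -> float:
    """Average daily volume in GB, assuming decimal units (1 TB = 1000 GB)."""
    return total_tb * 1000 / (months * days_per_month)


rate = daily_volume_gb(20, 10)
# N in "N*10GByte/day" is the daily rate divided by 10 GB.
n = rate / 10
print(f"{rate:.1f} GB/day, N = {n:.1f}")
```

Under these assumptions the long-run average is several tens of GB per day, which is in line with the "N*10GByte/day" figure.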
6. Goals of MATATABI
1. Forensics: preserving log data
– Keep evidence traceable
– Analyze multi-source data exhaustively
2. Scalability: must tolerate huge data volumes
– Store a huge amount of datasets
– Process datasets in a reasonable time
3. Real-time analysis: processing performance
– Analyze any dataset in real time where possible
4. Uniform programmability
– Various data formats should be easily accessible
– Various analysis programs can be used
7. NECOMA Ecosystem
[Figure: NECOMA ecosystem. Components, connected through APIs: Infrastructure Data, End Point Data, Analysis Module / Early Warning System, Threat Information Sharing, External Knowledge DB, Crawler, External Resource (web), Resilience Mechanism, Infrastructure Devices, End Point Devices, and Collection Probes. Labelled flows: get data; put analysis results; get external threat information; get threat information and other results; get threat information; control infrastructure and end point devices; crawling external resource and extracting knowledge.]
Petsas et al., A Trusted Knowledge Management System for Multi-layer Threat Analysis. TRUST '14 (poster session), June 2014
8. MATATABI Overview
4 components
1) Storage
2) Data import/process module
3) Analysis module
4) Application Programming Interface (API)
[Figure: MATATABI architecture on a Hadoop cluster. (1) Data storage: HDFS. (2) Data import of measurement data: DNS querylog, dns-pcap, sflow, netflow, spam, open resolver, phishing, darknet, topology, endpoint, user behavior, client honeypot. (3) Analysis modules: DGA analyzer, DDoS detection, anomaly detection, using Hive/Presto, Thrift, Mahout, RHadoop and hadoop-pcap. (4) MATATAPI: API (JSON).]
9. Built by Open-Source Software
Actively using open-source software
• Apache Hadoop (HDFS, MapReduce, etc.)
• Apache Hive (SQL-like language => distributed jobs)
• Facebook Presto (distributed SQL engine)
• Apache Mahout (machine learning library)
• Apache Thrift (language bindings)
• Hadoop-pcap (pcap file parser)
Fixed issues and packaged by NECOMA
https://github.com/necoma
10. MATATABI Components: 1) Storage
Storing measured data in the Hadoop Distributed File System (HDFS)
• Easily scaled out
• Data access by tools
– Hive/Presto
– Hadoop-pcap
11. MATATABI Components: 2) Data import module
Pre-processing measurement data
• Per dataset
– Raw data (e.g., pcap)
– Converted to Hive tables
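The pre-processing step can be pictured as flattening each parsed record into a delimited row stored under a per-day partition. The sketch below is only an illustration under stated assumptions: the column names (`ts`, `src`, `dns_qr`, `dns_flags`, `dns_question`, `server`, `dt`) are the ones used by the queries later in this deck, while the column order, the tab-separated layout, and the `dt=YYYYMMDD` directory path are hypothetical. The actual import uses hadoop-pcap, which is not shown here.

```python
from datetime import datetime, timezone

# Column order is an assumption; the names appear in the deck's queries.
COLUMNS = ["ts", "src", "dns_qr", "dns_flags", "dns_question", "server"]


def partition_of(ts: float) -> str:
    """Return the dt partition value (YYYYMMDD, UTC) for a Unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d")


def to_row(record: dict) -> str:
    """Flatten one parsed DNS record into a tab-separated line."""
    values = []
    for col in COLUMNS:
        value = record[col]
        if isinstance(value, bool):
            value = "true" if value else "false"
        values.append(str(value))
    return "\t".join(values)


record = {
    "ts": 1427846400,            # 2015-04-01 00:00:00 UTC
    "src": "192.0.2.10",         # documentation address, not real data
    "dns_qr": False,
    "dns_flags": "rd",
    "dns_question": "example.com. 0 IN A",
    "server": "dns1-pcap",
}
# Hypothetical storage layout: one directory per day.
path = f"dns_pcaps/dt={partition_of(record['ts'])}/part-00000.tsv"
line = to_row(record)
```

Partitioning by day is what lets queries such as `dt='20150401'` read only one day's files instead of the whole dataset.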
12. MATATABI Components: 3) (Threat) Analysis module
• Easy to implement
• Hosts a bunch of analyses
• Distributed computation (MapReduce)
13. MATATABI Components: 4) Application Programming Interface (API)
• Export analysis results
• Export the dataset itself (if needed)
• Implemented with the n6 REST API
• JSON/CSV/IODEF formats
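To make the export formats concrete, here is a minimal sketch of serializing a list of analysis results to JSON and CSV with the Python standard library. It does not use or reproduce the n6 API; the field names (`address`, `category`, `count`) and the sample values are hypothetical, and IODEF output is left out.

```python
import csv
import io
import json


def to_json(results: list) -> str:
    """Serialize results as a JSON array with stable key order."""
    return json.dumps(results, sort_keys=True)


def to_csv(results: list) -> str:
    """Serialize results as CSV with a header row; empty input gives ''."""
    if not results:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=sorted(results[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buf.getvalue()


# Hypothetical analysis results using documentation addresses.
results = [
    {"address": "192.0.2.1", "category": "ntp-amplifier", "count": 12},
    {"address": "198.51.100.7", "category": "dga-query", "count": 3},
]
json_body = to_json(results)
csv_body = to_csv(results)
```

Keeping one in-memory result shape and rendering it per format is one simple way to serve the same analysis output to different consumers.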
14. Analysis Modules (Use cases)
Name | Datasets | Frequency | LoC (#lines) | Remark
ZeuS DGA detector | DNS pcap, netflow | daily | 25 | hadoop-pcap
UDP fragmentation detector | sflow | daily | 48 |
Phishing likelihood calculator | Phishing URLs, Phishing content | 1-shot | – | Mahout (RandomForest)
NTP amplifier detector | netflow, sflow | daily | 143 | pyhive, Maxmind GeoIP
(name missing in source) | sflow | daily | 24 |
DNS amplifier detector | sflow, open resolver [19] | daily | 37 |
Anomalous heavy-hitter detector | netflow, sflow | daily | 106 | pyhive
DNS anomaly detection | DNS pcap, whois, malicious/legitimate domain list | daily | 57 | hadoop-pcap, Mahout (RandomForest)
SSL scan detector | sflow | 1-shot | 36 |
DNS failure graph analysis | DNS pcap | daily | 159 | pyhive
15. Analysis Example (1): Finding NTP Amplifiers
• Make a SQL request with Presto
• Get IP addresses that send UDP traffic from port 123 with a packet size of 468 bytes
• Packet size of a monlist reply = 468 bytes
SELECT sa FROM netflow WHERE sp=123 AND pr='UDP' AND
ibyt/ipkt=468 GROUP BY sa
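The query above can be tried on a toy table. The sketch below runs the same SELECT against an in-memory SQLite database, which stands in for Presto only for illustration. The column meanings (`sa` source address, `sp` source port, `pr` protocol, `ibyt` bytes, `ipkt` packets) are assumed from the nfdump-style names, and all rows are made-up documentation addresses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netflow (sa TEXT, sp INTEGER, pr TEXT, ibyt INTEGER, ipkt INTEGER)"
)
conn.executemany(
    "INSERT INTO netflow VALUES (?, ?, ?, ?, ?)",
    [
        ("192.0.2.1", 123, "UDP", 4680, 10),   # 468 bytes per packet: monlist-sized
        ("192.0.2.1", 123, "UDP", 936, 2),     # same host, second flow
        ("198.51.100.7", 123, "UDP", 76, 1),   # small NTP reply, not a match
        ("203.0.113.9", 53, "UDP", 468, 1),    # right size, wrong port
    ],
)
# Same statement as on the slide; ibyt/ipkt is the average packet size of a flow.
rows = conn.execute(
    "SELECT sa FROM netflow WHERE sp=123 AND pr='UDP' AND ibyt/ipkt=468 GROUP BY sa"
).fetchall()
amplifiers = [sa for (sa,) in rows]
print(amplifiers)
```

Note that `ibyt/ipkt` is an integer division here, so a flow averaging slightly more than 468 bytes per packet would also match; whether that matters depends on how the flow exporter aggregates packets.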
18. Analysis Example (2): Detecting DNS Amplifier Attacks
[Figure: attackers send spoofed packets to an open resolver DNS server.]
19. Analysis Example (2): Detecting DNS Amplifier Attacks
• Found responses with the RD (Recursion Desired) flag
• Queries from open resolver servers
• Attempts of the water torture attack
select src,count(*) from dns_pcaps where dt='20150401' and dns_qr=true and
dns_flags like '%rd%' and server='dns1-pcap' group by src;
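The per-source count in this example can be sketched in plain Python as follows. The record layout mirrors the column names in the query (`dt`, `server`, `src`, `dns_qr`, `dns_flags`); representing `dns_flags` as a space-separated string of flag names is an assumption, and the sample records are invented.

```python
from collections import Counter


def count_rd_responses(records, server, dt):
    """Count responses (dns_qr true) whose flags contain 'rd', per source address."""
    counts = Counter()
    for r in records:
        if (
            r["dt"] == dt
            and r["server"] == server
            and r["dns_qr"]
            and "rd" in r["dns_flags"]  # substring match, like dns_flags LIKE '%rd%'
        ):
            counts[r["src"]] += 1
    return counts


# Invented sample records using documentation addresses.
records = [
    {"dt": "20150401", "server": "dns1-pcap", "src": "192.0.2.1", "dns_qr": True, "dns_flags": "qr rd ra"},
    {"dt": "20150401", "server": "dns1-pcap", "src": "192.0.2.1", "dns_qr": True, "dns_flags": "qr rd"},
    {"dt": "20150401", "server": "dns1-pcap", "src": "198.51.100.7", "dns_qr": True, "dns_flags": "qr aa"},
    {"dt": "20150401", "server": "dns1-pcap", "src": "203.0.113.9", "dns_qr": False, "dns_flags": "rd"},
]
counts = count_rd_responses(records, server="dns1-pcap", dt="20150401")
```

Sources with unusually high counts are the candidates worth inspecting further; the threshold for "unusually high" is not given in the slides.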
20. Analysis Example (3): Detecting DNS Cache Poisoning Attacks
[Figure: the resolver DNS server sends queries to authoritative DNS servers; attackers send spoofed answers to the resolver.]
21. Analysis Example (3): Detecting DNS Cache Poisoning Attacks
• Normally, # of queries from the resolver server > # of answers to the resolver server
– Count the number of queries from the resolver server
– Count the number of answers to the resolver server
• If not, it is possibly a DDoS or cache poisoning attack against our DNS resolver server
select floor(ts/60),count(*) from dns_pcaps where dt = '20150401' and dns_qr=false and
dns_flags not like '%rd%' and server='ns1-pcap' group by floor(ts/60);
select floor(ts/60),count(*) from dns_pcaps where dt = '20150401' and dns_qr=true and
dns_flags like '%aa%' and server='ns1-pcap' group by floor(ts/60);
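The comparison of the two per-minute counts can be sketched as below. The filters follow the two queries (non-recursive queries sent by the resolver versus authoritative answers coming back), with the `dt` and `server` conditions omitted for brevity. Flagging only minutes where answers strictly outnumber queries is a choice made for this sketch, and the records are invented.

```python
from collections import Counter


def per_minute(records, predicate):
    """Count records matching predicate, bucketed by floor(ts / 60)."""
    return Counter(int(r["ts"] // 60) for r in records if predicate(r))


def suspicious_minutes(records):
    """Minutes in which authoritative answers outnumber the resolver's queries."""
    queries = per_minute(
        records, lambda r: not r["dns_qr"] and "rd" not in r["dns_flags"]
    )
    answers = per_minute(records, lambda r: r["dns_qr"] and "aa" in r["dns_flags"])
    return sorted(m for m in answers if answers[m] > queries[m])


base = 1427846400  # 2015-04-01 00:00:00 UTC
records = [
    # First minute: one query, one answer.
    {"ts": base + 1, "dns_qr": False, "dns_flags": ""},
    {"ts": base + 2, "dns_qr": True, "dns_flags": "qr aa"},
    # Second minute: one query, three answers.
    {"ts": base + 61, "dns_qr": False, "dns_flags": ""},
    {"ts": base + 62, "dns_qr": True, "dns_flags": "qr aa"},
    {"ts": base + 63, "dns_qr": True, "dns_flags": "qr aa"},
    {"ts": base + 64, "dns_qr": True, "dns_flags": "qr aa"},
]
flagged = suspicious_minutes(records)
```

Each flagged value is a minute index (Unix time divided by 60), matching the `floor(ts/60)` grouping key of the queries.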
22. Detecting Botnet-Infected Hosts by DGA Queries
• Domain Generation Algorithm (DGA)
– Auto-generated domain names used by botnets
– Usually the names change within a short span
– Difficult to detect botnet hosts by domain name
• ZeuS-DGA
– [a-z0-9]{32,48}.(ru|com|biz|info|org|net)
– Example: f528764d624db129b32c21fbca0cb8d6.com
[Figure: the botmaster and the bot each periodically generate the same domain list (001: gh3t852dwps7v47v4139eid62g190bjrs, 002: g22tdk3q8097o97fcs0j46fe0l7wc56us, 003: gj9d611364m0ysceiq0x250fm5u69zq5s, ...) and combine the names with TLDs (001.ru, 001.com, 002.ru), e.g. g22tdk3q8097o97fcs0j46fe0l7wc56us.ru.]
23. Analysis Example (4): Detecting DGA Queries
• Found a specific regular expression pattern in queries
• Some botnet clients generate dynamic, randomized DNS names to contact botnet C&C servers (so-called DGA)
select src,dns_question from dns_pcaps where regexp_like (dns_question,
'[a-z0-9]{32,48}.(ru|com|biz|info|org|net)') AND NOT regexp_like(dns_question,
'xn--') AND dt='20150401';
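The same matching rule can be tried offline with Python's `re` module. This is a sketch, not the deck's implementation: the pattern is the one from the query, except that the dot before the TLD is escaped here so that it matches a literal "." (in the query as presented it matches any character). The first test string is the ZeuS-DGA example name given earlier in the deck; the others are invented.

```python
import re

# Pattern from the query, with the dot escaped (a deviation from the slide).
ZEUS_DGA = re.compile(r"[a-z0-9]{32,48}\.(ru|com|biz|info|org|net)")


def is_dga_candidate(question: str) -> bool:
    """True if the question matches the ZeuS-DGA pattern and is not punycode."""
    # search() finds the pattern anywhere in the string, like regexp_like.
    return bool(ZEUS_DGA.search(question)) and "xn--" not in question


samples = [
    "f528764d624db129b32c21fbca0cb8d6.com. 0 IN A",  # example name from the deck
    "www.example.com. 0 IN A",                        # ordinary name
    "xn--" + "a" * 40 + ".com. 0 IN A",               # long punycode label
]
flags = [is_dga_candidate(s) for s in samples]
```

The `xn--` exclusion matters because internationalized domain names encoded as punycode can also contain long runs of lowercase letters and digits.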
25. Analysis Example (4): Detecting DGA Queries
Query results (source address | DNS question):
2001:XXXX:1d8:0:0:0:0:106 | cg79wo20kl92doowfn01oqpo9mdieowv5tyj. 0 IN A
2001:XXXX:0:1:0:0:0:f | cg79wo20kl92doowfn01oqpo9mdieowv5tyj.com. 0 IN A
157.XXX.234.35 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
133.XXX.127.131 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
23.XXX.104.44 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
133.XXX.124.164 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN A
157.XXX.234.35 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
133.XXX.127.131 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
23.XXX.111.231 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
133.XXX.124.164 | 96e4c3658d4cb4b559057995ae5a382c.com. 0 IN AAAA
157.XXX.193.67 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.127.131 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
173.XXX.59.40 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.124.164 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
157.XXX.193.67 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.127.131 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
192.XXX.79.30 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.127.131 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
185.XXX.155.12 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.124.164 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
157.XXX.193.67 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.127.131 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
173.XXX.58.45 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
133.XXX.124.164 | bf3b6eb48a734f3abae02ae1d7ff62e7.com. 0 IN A
27. Visualization of ZeuS DGA and Botnet
2015/07/01 – 2015/07/05
• Count of the most active DGA query: 23
• Related traffic flows are taken from the netflow datasets
29. One of Protection Methods: SDN IX (PIX-IE)
• Programmable IX in Edo: PIX-IE
• Mitigating and filtering suspicious flows at the IX
• An IX is a public space in the Internet
• Before link saturation, an ISP operator can stop DDoS flows
[Figure (before): spoofed-source UDP traffic from many ISPs crosses the IX and saturates the link to the victim ISP and the victim service. The operator has to contact each ISP and ask it to filter the DDoS packets (human interaction).]
[Figure (after): mitigation is applied at the programmable IX (PIX-IE) through a REST API, in front of the victim ISP and the victim service.]
30. Summary and Ongoing Work
• MATATABI: a platform for threat analysis
– Exploiting (existing) big data software
– Data collection to threat knowledge base
• Toward a security information pipeline
– Enrichment of analytical results
– To policy enforcement
– Real-time analysis
[Figure: pipeline stages: Inputs, Data Analysis, Human, Protection]
Editor's Notes
Building a security information pipeline
Controlling several pieces of network components (measurements, analysis, endpoints, and other activities) via threat information sharing (NECOMAtter)