Surasak Sanguanpong
Surasak.S@ku.ac.th
Applied Network Research Lab
Department of Computer Engineering
Faculty of Engineering, Kasetsart University
Software Freedom Day 2016, Sept 17, Bangkok
Experiences in ELK with D3.js for Large Log Analysis and Visualization
Photo: U-Bahn Station Candidplatz, Munich, Germany
In This Talk
• About Traffic Log (KU case study)
• Search Platform with ELK
• Real Time Visualization with D3.js
• Lessons Learnt
Chapter I
Network Traffic Log Structure and Sizing
KU Case Study
Why Keep Logs?
• Legal compliance
• Troubleshooting
• Security analysis
• Statistics/Analytics
Log Monitoring
• Collecting
• Processing
• Analysing
• Visualising
Source: https://www.flickr.com/photos/sbeebe/4772418919
Searching in Logs
• Find relevant stuff
• Find it fast
• Make our lives easier
Traffic Logging Solution
• Splunk? Great, but commercial and proprietary
• Graylog? Excellent, but too automatic
• Elasticsearch, Logstash, Kibana (ELK) and D3? That's it! Open source and fun to play with
KU Logging
2008-2015: Raw Log -> MySQL -> Simple Web GUI
• On-the-fly text-based log to MySQL converter
• Simple but slow
2015-: Raw Log -> Elasticsearch -> Web GUI/Kibana/D3
• Much faster!
KU Logging Structure
[Diagram: Network, Login Portal and Search GUI; a Logging Engine with a Packet Capture Socket on a multicore x86 machine with 10 GbE; Raw Log made up of Login Log, Web Log and Packet Log]
Raw Login Log Format
• Real-time logging, one file per day
Date Time Action IP UserName LogServer
Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1
Jul 1 10:04:58 logout 158.108.X.X YYYYY@ku.ac.th 192.168.1.2
Jul 1 10:04:59 timeout 158.108.X.X ZZZZZ@ku.ac.th 192.168.1.2
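A minimal parsing sketch for the login log lines above, assuming the fields are whitespace-separated and appear in the order of the header row. The log line carries no year, so the caller has to supply one. `LoginEvent` and `parse_login_line` are illustrative names, not part of the KU logging engine.

```python
from datetime import datetime
from typing import NamedTuple


class LoginEvent(NamedTuple):
    timestamp: datetime
    action: str       # login, logout or timeout in the sample rows
    ip: str           # kept as text because the sample addresses are masked
    username: str
    log_server: str


def parse_login_line(line: str, year: int) -> LoginEvent:
    """Split one raw login log line into typed fields.

    Assumption: exactly seven whitespace-separated tokens per line.
    """
    month, day, clock, action, ip, username, log_server = line.split()
    timestamp = datetime.strptime(
        f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )
    return LoginEvent(timestamp, action, ip, username, log_server)


event = parse_login_line(
    "Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1", year=2016
)
```

Because there is one file per day, the year could also be taken from the file name or folder instead of being passed in.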
Raw Web Log Format
• Real-time logging, one file per minute
UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedora-epel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedora-epel/6/i386/
20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS
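The web log records above can be split into typed fields along these lines. This is only a sketch under stated assumptions: every record has nine whitespace-separated tokens, a "-" marks the unused address family, and the first column holds a YYYYMMDDhhmmss value (the sample values look like that rather than epoch seconds). `WebRecord` and `parse_web_record` are hypothetical names.

```python
from datetime import datetime
from typing import NamedTuple


class WebRecord(NamedTuple):
    timestamp: datetime
    ip_version: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    url: str
    referer: str      # empty string for HTTPS records
    is_https: bool


def parse_web_record(record: str) -> WebRecord:
    """Parse one web log record; split() also tolerates wrapped lines."""
    ts, src4, src6, dst4, dst6, sport, dport, url, last = record.split()
    # Assumption: "-" in the IPv4 source column means the record is IPv6.
    ip_version = 4 if src4 != "-" else 6
    src_ip, dst_ip = (src4, dst4) if ip_version == 4 else (src6, dst6)
    is_https = last == "HTTPS"
    return WebRecord(
        timestamp=datetime.strptime(ts, "%Y%m%d%H%M%S"),
        ip_version=ip_version,
        src_ip=src_ip,
        dst_ip=dst_ip,
        src_port=int(sport),
        dst_port=int(dport),
        url=url,
        referer="" if is_https else last,
        is_https=is_https,
    )


https_record = parse_web_record(
    "20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX "
    "61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS"
)
```

A URL containing spaces would break the nine-token assumption, so a production parser would need a stricter field delimiter.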
Raw Packet Log Format (Header Log)
• Real-time logging, one file per minute
TimeStamp SrcIP DstIP Size Proto SrcPort DstPort [Flag]
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
TimeStamp SrcIP DstIP Proto Code
2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168
Time based Hierarchical Folder
Levels: Year / Month / Day / Hour / Minutely File
2015
  01
    01
      00
        201501010000.txt
        201501010001.txt
        :
        201501010059.txt
      01
      :
      23
        201501012300.txt
        201501012301.txt
        :
        201501012359.txt
    02
    :
    30
  02
  :
  12
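The folder scheme above maps a timestamp directly to a file path. A small sketch of that mapping, assuming the Year/Month/Day/Hour levels and the YYYYMMDDhhmm.txt file names shown on the slide; the root directory `/logs` and the function name are made up for illustration.

```python
from datetime import datetime
from pathlib import PurePosixPath


def minute_log_path(root: str, ts: datetime) -> PurePosixPath:
    """Return the per-minute log file path for a timestamp."""
    return (
        PurePosixPath(root)
        / f"{ts:%Y}"              # year folder
        / f"{ts:%m}"              # month folder
        / f"{ts:%d}"              # day folder
        / f"{ts:%H}"              # hour folder
        / f"{ts:%Y%m%d%H%M}.txt"  # minutely file
    )


# "/logs" is a hypothetical root directory.
path = minute_log_path("/logs", datetime(2015, 1, 1, 23, 59))
```

With this layout a search over a time range only has to open the folders that cover the range, which keeps lookups on raw files manageable.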
At What Scale?
Quite Large..
Source: http://www.24hourcampfire.com/ubbthreads/ubbthreads.php/topics/5976731/all/That_s_a_load_of_logs
• SPEED: 400,000 req/s peak
• STRUCTURE: Text/binary
• SIZE: 30 TB, 3.2 trillion docs
Facts about KU
• Accounts: 113,XXX
• 4 Campuses: BKN, KPS, SRC, SKN
• IPv4: 158.108.0.0/16, 192.102.83.0/24, 10.0.0.0/8
• IPv6: 2406:3100::/32, 2001:3c8:1303::/48, 2001:f00:2003::/48
• Concurrent Active IP Address: 50,XXX (25,XXX: Wifi)
• Registered Devices: 210,XXX
• Access Points: 1,4XX
System Structure
[Diagram: Internal network, Core Router, IPv4/IPv6 Parallel Firewalls with Load Balancers, Gateway Router, Internet; Session Manager, Login Servers, Quota Manager and Traffic Logger; the case study is marked on the diagram; links labelled 5x1 Gbps and 1x10 Gbps]
Sample Minutely HTTP Request Rate
11 days (11 x 24 x 60 = 15,840 data points)
Request Rate and Log Sizing
• 3.1 req/s, 27 MB/d
• 2,100 req/s, 33 GB/d
• 380,000 req/s, 330 GB/d
Accumulated Log Request and Size
• #Files: 120, 20M requests, 2.04 GB
• #Files: 172,800, 14.1B requests, 2.57 TB
• #Files: 172,800, 3.27T requests, 28.03 TB
New Logging Architecture
[Diagram: Network, Login Portal, DHCP and RADIUS; a Logging Engine with a Packet Capture Socket on a multicore x86 machine with 10 GbE; Raw Log made up of Login Log, Web Log and Flow Log; Session Tracking & Accounting; Elasticsearch with real-time indexing; GUI/Analytics]
Chapter II
ELK Stack Testbed
What is Elasticsearch?
• Real-time search engine software
• Document-oriented
• JSON based REST API
• Java/Lucene based
• Open source, Apache 2 License
REST: Representational State Transfer
JSON: JavaScript Object Notation
Elasticsearch and Database
• Rough layout comparison
Relational Database    Elasticsearch
Database               Index
Table                  Type
Row                    Document
Column                 Field
Schema                 Mapping
Elasticsearch Logical Layout
[Diagram: an Elasticsearch Node holding Index: social (Type: story, Type: user) and Index: blog (Type: posts), each type containing numbered documents; an Index Application and a Search Application talk to the node]
• Use any HTTP client to talk to Elasticsearch at localhost port 9200
• RESTful: interact through common HTTP methods (GET, POST, PUT, DELETE)
• Stateless: no state information is maintained between requests
• Each request is independent and resources are returned in JSON text format
How is the world using Elasticsearch?
• Analytics solution on 40 million documents per day to deliver real-time visibility
• Providing search across GitHub's code
• Full-text search to find related questions and answers
• Full-text search with highlighted search snippets
Elasticsearch and Big Data
ES-Hadoop: Connectivity of Hadoop's big data analytics and the real-time search of Elasticsearch.
Source: https://www.elastic.co/products/hadoop
What does Elasticsearch offer?
• Full Text Search
• Very Fast
• Fault Tolerance
• High Availability
• Distributed
• Scalable
• Plugin Architecture
Node and Cluster
• A Node: a running Elasticsearch process
• Px: Primary Shard, a chunk of an index
• Rx: Replica Shard, a copy of a primary shard
[Diagram: one-node cluster, Node 1 holds P0, P1, P2]
[Diagram: two-node cluster, Node 1 holds P0, P1, R2 and Node 2 holds R0, R1, P2]
[Diagram: three-node cluster (3 Shards/1 Replica), primary and replica shards spread over Node 1, Node 2 and Node 3]
When a Node Fails (3 Shards/1 Replica)
[Diagram: three-node cluster (Node 1, Node 2, Node 3) before and after a node failure, showing the placement of primary (P) and replica (R) shards]
When a Node Fails (3 Shards/2 Replicas)
[Diagram: three-node cluster (Node 1, Node 2, Node 3) before and after a node failure, showing the placement of primary (P) and replica (R) shards]
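The idea behind these failure slides can be illustrated with a toy model: when a node disappears, a surviving replica of each lost primary shard is promoted to primary. The sketch below uses the two-node layout from the Node and Cluster slide (Node 1: P0, P1, R2; Node 2: R0, R1, P2). It is a simplification written for this note, not how Elasticsearch is implemented, and it does not model the re-creation of missing replicas on the remaining nodes.

```python
from typing import Dict, List, Tuple

# ("P", 0) is primary shard 0; ("R", 0) is a replica of shard 0.
Shard = Tuple[str, int]


def fail_node(cluster: Dict[str, List[Shard]], failed: str) -> Dict[str, List[Shard]]:
    """Return the shard layout after `failed` is lost, promoting replicas."""
    survivors = {
        name: list(shards) for name, shards in cluster.items() if name != failed
    }
    for role, shard_id in cluster[failed]:
        if role != "P":
            continue  # losing a replica needs no promotion
        for shards in survivors.values():
            if ("R", shard_id) in shards:
                # Promote the first surviving replica to primary.
                shards[shards.index(("R", shard_id))] = ("P", shard_id)
                break
        else:
            raise RuntimeError(f"shard {shard_id} has no surviving copy")
    return survivors


two_nodes = {
    "Node 1": [("P", 0), ("P", 1), ("R", 2)],
    "Node 2": [("R", 0), ("R", 1), ("P", 2)],
}
after_failure = fail_node(two_nodes, "Node 1")
```

With one replica per shard the cluster in this model survives the loss of any single node; with zero replicas the same call raises, which is the data-loss case.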
Elasticsearch documents
• Document : Basic unit of user data in JSON representation
• Sample Document :
{
"user" : "Chris",
"gender" : "M",
"birthdate" : "1980-12-11"
}
URI of a document
http://localhost:9200/sample_index/sample_type/1
• http: protocol used (HTTP is supported)
• localhost: host name of the Elasticsearch node
• 9200: port to connect to (9200 by default)
• sample_index: index name
• sample_type: type name
• 1: document ID
HTTP based CRUD operations
Create
curl -XPUT "http://localhost:9200/<index>/<type>/<id>"
Read
curl -XGET "http://localhost:9200/<index>/<type>/<id>"
Update
curl -XPOST "http://localhost:9200/<index>/<type>/<id>"
Delete
curl -XDELETE "http://localhost:9200/<index>/<type>/<id>"
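The same four operations can be prepared from Python's standard library. The sketch below only builds the requests, mirroring the HTTP methods and the index/type/id URL shape shown on the slide; nothing is sent. The base URL assumes a local test node, and `document_url` and `crud_request` are illustrative helper names.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local test node


def document_url(index: str, doc_type: str, doc_id) -> str:
    """Build <base>/<index>/<type>/<id>, escaping each path segment."""
    parts = (index, doc_type, str(doc_id))
    return BASE_URL + "/" + "/".join(quote(part, safe="") for part in parts)


def crud_request(method: str, index: str, doc_type: str, doc_id,
                 body: Optional[dict] = None) -> Request:
    """Prepare (but do not send) one document request."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {} if data is None else {"Content-Type": "application/json"}
    return Request(document_url(index, doc_type, doc_id),
                   data=data, headers=headers, method=method)


doc = {"user": "Chris", "gender": "M", "birthdate": "1980-12-11"}
create = crud_request("PUT", "sample_index", "sample_type", 1, doc)
read = crud_request("GET", "sample_index", "sample_type", 1)
update = crud_request("POST", "sample_index", "sample_type", 1, doc)
delete = crud_request("DELETE", "sample_index", "sample_type", 1)
# Sending would be urllib.request.urlopen(create) and needs a running node.
```

Keeping request construction separate from sending makes the URL and method logic easy to check without a cluster.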
ELK stack from Elastic
• Elasticsearch: High-performance scalable search engine
• Logstash: Log transport and processing daemon (Log Shipper)
• Kibana: Visualisation dashboard
Logstash
• Log aggregator and parser
• Transfers parsed data to Elasticsearch
• Configuration file specifies input, filtering (parsing) and output
Sample Configuration:
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
Source: https://www.elastic.co/guide/en/logstash/current/config-examples.html
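To make the filter stage more concrete, here is a rough Python equivalent of what a grok match followed by a date filter does to one Apache combined log line: named fields are extracted with a pattern, then the timestamp text is turned into a real date. The regular expression is a simplified stand-in written for this note, not the actual COMBINEDAPACHELOG grok pattern, and the sample line is invented.

```python
import re
from datetime import datetime
from typing import Optional

# Simplified stand-in for the grok pattern (assumption, not the real one).
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_apache_line(line: str) -> Optional[dict]:
    """Extract named fields, then parse the timestamp like a date filter."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None  # Logstash would tag such an event as a parse failure
    event = match.groupdict()
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


# Hypothetical sample line for illustration only.
sample = ('192.0.2.10 - - [03/Nov/2015:01:00:00 +0700] '
          '"GET /fedora-epel/6/i386/ HTTP/1.1" 200 5120 "-" "curl/7.43.0"')
event = parse_apache_line(sample)
```

The month in an Apache timestamp is a three-letter name, which is why the date pattern needs a month-name token rather than a numeric month.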
Kibana
• General purpose query UI
• Includes many widgets
• Query Elasticsearch without coding
Alternative Stack
ELK
EFK
Indexing Performance (Weblog)
[Chart: Daily Indexing Performance on a single machine over 25 days; series: #Records and Records/s; axes 35 to 45 (thousands) and 0 to 250 (millions)]
Test machine:
• Dell R230
• Xeon E3-1271v3 3.6 GHz 4C/8T
• Hyper-threading off
• 32 GB RAM
• 2x6 TB NLSAS
• Elasticsearch 2.3.2
• 10 Shards/0 Replica
Search Performance
• Search keyword: "face", run against each daily log
• Not yet optimized
Search Performance and Hits (data labels from the chart, 19 daily logs)
Day   Search Time (ms)   Hits
1     2.01               17,551
2     2.33               22,816
3     1.99               16,346
4     2.13               18,218
5     2.67               16,240
6     2.00               7,958
7     1.33               5,622
8     1.02               1,886
9     3.00               23,559
10    2.33               9,127
11    2.00               8,221
12    2.67               12,343
13    3.00               28,259
14    2.67               25,405
15    2.43               22,092
16    3.33               33,528
17    2.67               17,683
18    2.14               12,951
19    3.33               18,054
Sample Search Performance
Kibana: Main Dashboard
Kibana: Login Profile
Kibana: Concurrent Login View
Chapter III
Playing with D3.js
Real Time Visualization with D3.js
• Data-Driven Documents (D3)
• JavaScript library for manipulating documents based on data
• Developed by Mike Bostock
https://d3js.org/
D3 Architecture
• Input data to build visualizations (JSON, CSV, …)
• Dynamic manipulation of HTML elements with JavaScript
[Diagram labels: node.js, socket.io]
Sample Gallery
Real-time makes an impression
Norse Live Attack Map: http://map.norsecorp.com/#/
Sample Concurrent Login
Sample IP Matrix Occupied
Sample Tree Map Web Access
Sample Traffic Connectivity
Chapter IV
Lessons Learned
Lessons Learned
• Elasticsearch offers a very fast full-text search service
• Index size may be 3x to 5x bigger than the source data
• Use Elasticsearch for search services, not for data archiving
• More cores or faster clock? Choose CORES
• 64 GB of RAM is ideal
• Go with SSD if possible
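As a rough planning aid, the 3x to 5x rule of thumb above can be turned into a storage estimate. The sketch applies it to one of the accumulated log sizes reported earlier in the deck (2.57 TB); the function name is made up, and real index size depends on mappings and analysis settings, so treat the result as an order-of-magnitude guide only.

```python
from typing import Tuple


def index_size_range_tb(source_tb: float,
                        low_factor: float = 3.0,
                        high_factor: float = 5.0) -> Tuple[float, float]:
    """Estimate index size from raw log size using the 3x to 5x rule."""
    return source_tb * low_factor, source_tb * high_factor


# 2.57 TB is one of the accumulated raw log sizes quoted in this deck.
low_tb, high_tb = index_size_range_tb(2.57)
print(f"Plan for roughly {low_tb:.1f} to {high_tb:.1f} TB of index storage")
```

An estimate of this kind is one reason to keep the raw files as the archive and index only the period that has to be searchable.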
Lessons Learned
• Designed to work in a trusted environment
• No built-in security
• No authentication or authorization, no concept of a user
• Anyone who can send a request to the cluster is a superuser
• Easy to erase all the data:
curl -XDELETE http://<server>:9200/_all
Lessons Learned
• Shield from Elasticsearch: a comprehensive security solution, including encrypted communications, RBAC, AD/LDAP integration and auditing
• Use with a proxy: authentication and request filtering with nginx or others
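The request filtering mentioned above amounts to a small allow-list in front of the cluster. The sketch below shows one possible rule set in Python rather than in nginx configuration: authenticated clients may read, may POST only to search endpoints, and everything else (including a DELETE on `_all`) is refused. The rules and the function name are assumptions for illustration, not the policy used at KU.

```python
def is_request_allowed(method: str, path: str, authenticated: bool) -> bool:
    """Decide whether a proxy should forward a request to Elasticsearch."""
    if not authenticated:
        return False  # no anonymous access at all
    method = method.upper()
    if method in ("GET", "HEAD"):
        return True   # read-only requests pass through
    # Searches with a JSON body arrive as POST; allow only that endpoint.
    endpoint = path.split("?", 1)[0].rstrip("/")
    if method == "POST" and endpoint.endswith("/_search"):
        return True
    return False      # PUT, DELETE and any other POST are blocked


blocked = is_request_allowed("DELETE", "/_all", authenticated=True)
```

Some GET endpoints also expose cluster internals, so a real deployment would probably restrict paths for read requests as well.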
Lessons Learned
• Logstash: a powerful tool for manipulating logs
• Kibana: simple and useful for visualizing data
Lessons Learned
• D3 pros: Flexible, Fascinating Visualization
• D3 cons: Low Level, Steep Learning Curve, CPU intensive
Thank you for your attention
Q & A Time
Kasom Koth-Arsa: Core Log Design and Development
Jautuporn Chuchuay, Peerapol Boonthaganon: Web GUI Development
Sataporn Techaaramwong: Web/Elasticsearch Development
Peerapong Thongpubeth, Jiradech Sirijantadilok: Kibana Development
Poomipat Thongudom, Nichapat Nattee: D3 Development
Surachai Chitpinijyol: Project Coordinator
Surasak Sanguanpong: Project Director
Special thanks to the Kasetsart Office of Computer Services for supporting traffic data
Photo: Sunset at Narita Airport
Kasom Koth-Arsa
Core Log Design and Development
Jautuporn Chuchuay
Peerapol Boonthaganon
Web GUI Development
Sataporn Techaaramwong
Web/Elasticsearch Development
Peerapong Thongpubeth
Jiradech Sirijantadilok
Kibana Development
Poomipat Thongudom
Nichapat Nattee
D3 Development
Surachai Chitpinijyol
Project Coordinator
Surasak Sanguanpong
Project Director
Special Thanks to Kasetsart Office of Computer
Services for supporting traffic data

More Related Content

What's hot

Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysisDivante
 
ELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementEl Mahdi Benzekri
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Airat Khisamov
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Andrii Vozniuk
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELKYuHsuan Chen
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.Renzo Tomà
 
Elk devops
Elk devopsElk devops
Elk devopsIdeato
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
'Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash''Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash'Cloud Elements
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK hypto
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Oleksiy Panchenko
 
Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logsSmartLogic
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackRohit Sharma
 
Toronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELKToronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELKAndrew Trossman
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
ELK, a real case study
ELK,  a real case studyELK,  a real case study
ELK, a real case studyPaolo Tonin
 

What's hot (20)

Introducing ELK
Introducing ELKIntroducing ELK
Introducing ELK
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
ELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log ManagementELK Elasticsearch Logstash and Kibana Stack for Log Management
ELK Elasticsearch Logstash and Kibana Stack for Log Management
 
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
Central LogFile Storage. ELK stack Elasticsearch, Logstash and Kibana.
 
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...
 
Introduction to ELK
Introduction to ELKIntroduction to ELK
Introduction to ELK
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
Elk devops
Elk devopsElk devops
Elk devops
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
'Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash''Scalable Logging and Analytics with LogStash'
'Scalable Logging and Analytics with LogStash'
 
ELK introduction
ELK introductionELK introduction
ELK introduction
 
elk_stack_alexander_szalonnas
elk_stack_alexander_szalonnaselk_stack_alexander_szalonnas
elk_stack_alexander_szalonnas
 
Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK Machine Learning in a Twitter ETL using ELK
Machine Learning in a Twitter ETL using ELK
 
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more...
 
Elk scilifelab
Elk scilifelabElk scilifelab
Elk scilifelab
 
Logstash: Get to know your logs
Logstash: Get to know your logsLogstash: Get to know your logs
Logstash: Get to know your logs
 
Centralized Logging System Using ELK Stack
Centralized Logging System Using ELK StackCentralized Logging System Using ELK Stack
Centralized Logging System Using ELK Stack
 
Toronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELKToronto High Scalability meetup - Scaling ELK
Toronto High Scalability meetup - Scaling ELK
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
ELK, a real case study
ELK,  a real case studyELK,  a real case study
ELK, a real case study
 

Viewers also liked

Building Product from ground up using Open Source Technologies
Building Product from ground up using Open Source TechnologiesBuilding Product from ground up using Open Source Technologies
Building Product from ground up using Open Source TechnologiesAmit Goel
 
ELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedTin Le
 
Warehouse based Intelligent Banking Transaction Analysis System
Warehouse based Intelligent Banking Transaction Analysis SystemWarehouse based Intelligent Banking Transaction Analysis System
Warehouse based Intelligent Banking Transaction Analysis SystemJivan Nepali
 
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkJivan Nepali
 
A Basic Guide to Server Log Analysis
A Basic Guide to Server Log AnalysisA Basic Guide to Server Log Analysis
A Basic Guide to Server Log AnalysisAndrew Halliday
 
Learn ELK in docker
Learn ELK in dockerLearn ELK in docker
Learn ELK in dockerLarry Cai
 
Framingham Go Meetup - October 2016
Framingham Go Meetup - October 2016Framingham Go Meetup - October 2016
Framingham Go Meetup - October 2016Matthew Broberg
 
From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove Kris Buytaert
 
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016Matthew Broberg
 
Data science team, a practice to setup
Data science team, a practice to setupData science team, a practice to setup
Data science team, a practice to setupOmid Mogharian
 
Send that (damn) elevator down !
Send that (damn) elevator down !Send that (damn) elevator down !
Send that (damn) elevator down !Ekta Grover
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016Matthew Broberg
 
Real-time data analysis using ELK
Real-time data analysis using ELKReal-time data analysis using ELK
Real-time data analysis using ELKJettro Coenradie
 
Monitoring with ElasticSearch
Monitoring with ElasticSearch Monitoring with ElasticSearch
Monitoring with ElasticSearch Kris Buytaert
 
Elastic Stackにハマった話
Elastic Stackにハマった話Elastic Stackにハマった話
Elastic Stackにハマった話Kazuhiro Kosaka
 
Cloud Log Analysis and Visualization
Cloud Log Analysis and VisualizationCloud Log Analysis and Visualization
Cloud Log Analysis and VisualizationRaffael Marty
 
Monitoring using Open source technologies
Monitoring using Open source technologiesMonitoring using Open source technologies
Monitoring using Open source technologiesUTKARSH BHATNAGAR
 
The Rise of Real Time
The Rise of Real TimeThe Rise of Real Time
The Rise of Real Timeconfluent
 

Viewers also liked (20)

Building Product from ground up using Open Source Technologies
Building Product from ground up using Open Source TechnologiesBuilding Product from ground up using Open Source Technologies
Building Product from ground up using Open Source Technologies
 
ELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learnedELK at LinkedIn - Kafka, scaling, lessons learned
ELK at LinkedIn - Kafka, scaling, lessons learned
 
Warehouse based Intelligent Banking Transaction Analysis System
Warehouse based Intelligent Banking Transaction Analysis SystemWarehouse based Intelligent Banking Transaction Analysis System
Warehouse based Intelligent Banking Transaction Analysis System
 
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and SparkCrystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
Crystal Ball Event Prediction and Log Analysis with Hadoop MapReduce and Spark
 
A Basic Guide to Server Log Analysis
A Basic Guide to Server Log AnalysisA Basic Guide to Server Log Analysis
A Basic Guide to Server Log Analysis
 
Learn ELK in docker
Learn ELK in dockerLearn ELK in docker
Learn ELK in docker
 
Framingham Go Meetup - October 2016
Framingham Go Meetup - October 2016Framingham Go Meetup - October 2016
Framingham Go Meetup - October 2016
 
From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove
 
Rootconf
RootconfRootconf
Rootconf
 
Mesoscon 2015
Mesoscon 2015Mesoscon 2015
Mesoscon 2015
 
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016
Snap Telemetry Framework & Plugin Architecture at GrafanaCon 2016
 
Data science team, a practice to setup
Data science team, a practice to setupData science team, a practice to setup
Data science team, a practice to setup
 
Send that (damn) elevator down !
Send that (damn) elevator down !Send that (damn) elevator down !
Send that (damn) elevator down !
 
Intro to open source telemetry linux con 2016
Intro to open source telemetry   linux con 2016Intro to open source telemetry   linux con 2016
Intro to open source telemetry linux con 2016
 
Real-time data analysis using ELK
Real-time data analysis using ELKReal-time data analysis using ELK
Real-time data analysis using ELK
 
Monitoring with ElasticSearch
Monitoring with ElasticSearch Monitoring with ElasticSearch
Monitoring with ElasticSearch
 
Elastic Stackにハマった話
Elastic Stackにハマった話Elastic Stackにハマった話
Elastic Stackにハマった話
 
Cloud Log Analysis and Visualization
Cloud Log Analysis and VisualizationCloud Log Analysis and Visualization
Cloud Log Analysis and Visualization
 
Monitoring using Open source technologies
Monitoring using Open source technologiesMonitoring using Open source technologies
Monitoring using Open source technologies
 
The Rise of Real Time
The Rise of Real TimeThe Rise of Real Time
The Rise of Real Time
 

Similar to Experiences in ELK with D3.js for Large Log Analysis and Visualization

Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Florian Lautenschlager
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for StreamSplunk
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Codemotion
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code EuropeDavid Pilato
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream csching
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC OsloDavid Pilato
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon RedshiftAmazon Web Services
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...RISC-V International
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitterTwitter Developers
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextLucidworks
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesDataWorks Summit
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsLouise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsDataconomy Media
 
Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Brendan Tierney
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineDatabricks
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Amazon Web Services
 
Code4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch PortalCode4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch Portaleby
 

Similar to Experiences in ELK with D3.js for Large Log Analysis and Visualization (20)

Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017Chronix Poster for the Poster Session FAST 2017
Chronix Poster for the Poster Session FAST 2017
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Splunk App for Stream
Splunk App for StreamSplunk App for Stream
Splunk App for Stream
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at Netflix
 
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
Managing your Black Friday Logs - Antonio Bonuccelli - Codemotion Rome 2018
 
Managing your black friday logs - Code Europe
Managing your black friday logs - Code EuropeManaging your black friday logs - Code Europe
Managing your black friday logs - Code Europe
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
 
Managing your Black Friday Logs NDC Oslo
Managing your  Black Friday Logs NDC OsloManaging your  Black Friday Logs NDC Oslo
Managing your Black Friday Logs NDC Oslo
 
(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift(DAT201) Introduction to Amazon Redshift
(DAT201) Introduction to Amazon Redshift
 
Lrz kurs: big data analysis
Lrz kurs: big data analysisLrz kurs: big data analysis
Lrz kurs: big data analysis
 
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
Klessydra t - designing vector coprocessors for multi-threaded edge-computing...
 
#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter#TwitterRealTime - Real time processing @twitter
#TwitterRealTime - Real time processing @twitter
 
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, SematextTuning Solr for Logs: Presented by Radu Gheorghe, Sematext
Tuning Solr for Logs: Presented by Radu Gheorghe, Sematext
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Fast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL ReleasesFast and Reliable Apache Spark SQL Releases
Fast and Reliable Apache Spark SQL Releases
 
Louise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx SystemsLouise McCluskey, Kx Engineer at Kx Systems
Louise McCluskey, Kx Engineer at Kx Systems
 
Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017Ireland OUG Meetup May 2017
Ireland OUG Meetup May 2017
 
Fast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL EngineFast and Reliable Apache Spark SQL Engine
Fast and Reliable Apache Spark SQL Engine
 
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
Empowering Congress with Data-Driven Analytics (BDT304) | AWS re:Invent 2013
 
Code4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch PortalCode4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch Portal
 

Experiences in ELK with D3.js for Large Log Analysis and Visualization

  • 1. Experiences in ELK with D3.js for Large Log Analysis and Visualization. Surasak Sanguanpong (Surasak.S@ku.ac.th), Applied Network Research Lab, Department of Computer Engineering, Faculty of Engineering, Kasetsart University. Software Freedom Day 2016, Sept 17, Bangkok. (Photo: U-Bahn station Candidplatz, Munich, Germany)
  • 2. In This Talk: About Traffic Log (KU case study); Search Platform with ELK; Real Time Visualization with D3.js; Lessons Learnt
  • 3. Chapter I: Network Traffic Log Structure and Sizing (KU Case Study)
  • 4. Why Keep Logs: legal compliance; troubleshooting; security analysis; statistics/analytics
  • 6. Searching in Logs: find relevant stuff; find it fast; make our lives easier
  • 7. Traffic Logging Solution. Splunk? Great, but commercial and proprietary. Graylog? Excellent, but too automatic. Elasticsearch, Logstash, Kibana (ELK) and D3? That's it: open source and fun to play with.
  • 8. KU Logging. 2008-2015: raw log -> MySQL -> simple web GUI, using an on-the-fly converter from text-based logs to MySQL; simple but slow. 2015-: raw log -> Elasticsearch -> web GUI/Kibana/D3; much faster!
  • 9. KU Logging Structure (diagram components): Network; Login Portal; Search GUI; Logging Engine (packet capture, socket) on multicore x86 with 10 GbE; raw logs (login log, web log, packet log)
  • 10. Raw Login Log Format (real-time logging, one file per day)
      Date Time Action IP UserName LogServer
      Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1
      Jul 1 10:04:58 logout 158.108.X.X YYYYY@ku.ac.th 192.168.1.2
      Jul 1 10:04:59 timeout 158.108.X.X ZZZZZ@ku.ac.th 192.168.1.2
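The login log layout above can be read with a few lines of Python. This is only a sketch under stated assumptions: the `LoginEvent` type and `parse_login_line` function are hypothetical names, fields are assumed to be separated by whitespace, and because the raw line carries no year, the caller has to supply one (for example from the daily file name).

```python
from datetime import datetime
from typing import NamedTuple


class LoginEvent(NamedTuple):
    """One record of the raw login log (hypothetical type for illustration)."""
    timestamp: datetime
    action: str
    ip: str
    username: str
    log_server: str


def parse_login_line(line: str, year: int) -> LoginEvent:
    # Whitespace-separated fields: month, day, time, action, IP, user, log server.
    month, day, clock, action, ip, username, log_server = line.split()
    # The line has no year, so it is passed in by the caller (an assumption).
    timestamp = datetime.strptime(
        f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )
    return LoginEvent(timestamp, action, ip, username, log_server)


event = parse_login_line(
    "Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1", 2015
)
```

A record parsed this way maps naturally onto one JSON document per login, logout, or timeout event.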
  • 11. Raw Web Log Format (real-time logging, one file per minute)
      UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
      20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedora-epel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedora-epel/6/i386/
      20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
      20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS
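As an illustration of the web log layout, the sketch below splits one line into its nine fields. It assumes every line has exactly nine whitespace-separated fields, that "-" marks the unused address family, and that the last field is either a referer URL or the literal `HTTPS`; `WebRequest` and `parse_web_line` are hypothetical names, not part of the KU logging engine.

```python
from typing import NamedTuple, Optional


class WebRequest(NamedTuple):
    """One record of the raw web log (hypothetical type for illustration)."""
    time: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    url: str
    referer: Optional[str]
    is_https: bool


def parse_web_line(line: str) -> WebRequest:
    # Assumes exactly nine whitespace-separated fields per line.
    time, src4, src6, dst4, dst6, sport, dport, url, last = line.split()
    # "-" marks the address family that is not in use for this request.
    src_ip = src4 if src4 != "-" else src6
    dst_ip = dst4 if dst4 != "-" else dst6
    # The last column holds either the referer or the literal "HTTPS".
    is_https = last == "HTTPS"
    referer = None if is_https else last
    return WebRequest(time, src_ip, dst_ip, int(sport), int(dport),
                      url, referer, is_https)


v6 = parse_web_line(
    "20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX "
    "61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS"
)
v4 = parse_web_line(
    "20151103010000 192.55.X.X - 158.108.X.X - 17490 80 "
    "mirror1.ku.ac.th/fedora-epel/6/i386/jday-devel-2.4-5.el6.i686.rpm "
    "http://mirror1.ku.ac.th/fedora-epel/6/i386/"
)
```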
  • 12. Raw Packet Log Format (Header Log) (real-time logging, one file per minute)
      TimeStamp SrcIP DstIP Size Proto SrcPort DstPort [Flag]
      2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
      2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
      TimeStamp SrcIP DstIP Proto Code
      2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168
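  • 13. Time-Based Hierarchical Folder. Levels: Year > Month > Day > Hour > minutely file. Example: year 2015; months 01 to 12; days 01, 02, ..., 30; hours 00 to 23; hour 00 of 2015-01-01 holds 201501010000.txt, 201501010001.txt, ..., 201501010059.txt, and hour 23 holds 201501012300.txt, 201501012301.txt, ..., 201501012359.txt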
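The folder hierarchy can be derived directly from a timestamp. The following is a minimal sketch: `minutely_log_path` is a hypothetical helper, and the root directory and the use of `/` as the separator are assumptions, since the slide shows only the nesting and the file names.

```python
from datetime import datetime
from pathlib import PurePosixPath


def minutely_log_path(root: str, ts: datetime) -> PurePosixPath:
    """Build <root>/<year>/<month>/<day>/<hour>/<yyyymmddHHMM>.txt for a timestamp."""
    return PurePosixPath(
        root,
        f"{ts:%Y}",            # year folder, e.g. 2015
        f"{ts:%m}",            # month folder, 01 to 12
        f"{ts:%d}",            # day folder
        f"{ts:%H}",            # hour folder, 00 to 23
        f"{ts:%Y%m%d%H%M}.txt" # one file per minute
    )


# "/logs" is a placeholder root, not the actual KU path.
path = minutely_log_path("/logs", datetime(2015, 1, 1, 23, 59))
```

Because the path is a pure function of time, a search over a time range only needs to open the files whose names fall inside that range.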
  • 14. At What Scale? Quite Large. Speed: 400,000 req/s peak. Structure: text/binary. Size: 30 TB, 3.2 trillion docs. (Image source: http://www.24hourcampfire.com/ubbthreads/ubbthreads.php/topics/5976731/all/That_s_a_load_of_logs)
  • 15. Facts about KU: 113,XXX accounts; 4 campuses (BKN, KPS, SRC, SKN); IPv4: 158.108.0.0/16, 192.102.83.0/24, 10.0.0.0/8; IPv6: 2406:3100::/32, 2001:3c8:1303::/48, 2001:f00:2003::/48; 50,XXX concurrent active IP addresses (25,XXX on Wi-Fi); 210,XXX registered devices; 1,4XX access points
  • 16. System Structure (diagram components): internal network; core router; IPv4/IPv6 parallel firewalls with load balancers; gateway router; Internet; session manager; login servers; quota manager; traffic logger (the case study); links of 5x1 Gbps and 1x10 Gbps
  • 17. Sample Minutely HTTP Request Rate: 11 days (11 x 24 x 60 = 15,840 data points)
  • 18. Request Rate and Log Sizing: 3.1 req/s, 27 MB/d; 2,100 req/s, 33 GB/d; 380,000 req/s, 330 GB/d
  • 19. Accumulated Log Request and Size: #Files: 120, 20M, 2.04 GB; #Files: 172,800, 14.1B, 2.57 TB; #Files: 172,800, 3.27T, 28.03 TB
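The file counts and totals on this slide can be cross-checked with simple arithmetic. The sketch below rests on assumptions not stated in the slide: that the 120 files are daily files covering 120 days, that the 172,800 files are the corresponding minutely files, and that sizes use decimal units (1 TB = 1000 GB).

```python
MINUTES_PER_DAY = 24 * 60

# Assumption: 120 daily files means the sample covers 120 days.
days = 120
minutely_files = days * MINUTES_PER_DAY  # one file per minute over the same period

# Sum of the three accumulated sizes, converted to TB (decimal units assumed).
total_tb = 2.04 / 1000 + 2.57 + 28.03

# Sum of the three accumulated counts (20M, 14.1B, 3.27T).
total_records = 20e6 + 14.1e9 + 3.27e12

print(minutely_files, round(total_tb, 1), f"{total_records:.3e}")
```

Under these assumptions the figures line up with the "30 TB, 3.2 trillion docs" scale quoted earlier in the talk.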
  • 20. New Logging Architecture (diagram components): Network; Login Portal; Logging Engine (packet capture, socket) on multicore x86 with 10 GbE; raw logs (login log, web log, flow log); DHCP and RADIUS; session tracking and accounting; Elasticsearch with real-time indexing; GUI/analytics
  • 21. Chapter II: ELK Stack Testbed
  • 22. What is Elasticsearch? A real-time search engine; document-oriented; JSON-based REST API; Java/Lucene based; open source under the Apache 2 License. REST: Representational State Transfer. JSON: JavaScript Object Notation.
  • 23. Elasticsearch and Database (rough layout comparison, relational database vs. Elasticsearch): Database vs. Index; Table vs. Type; Row vs. Document; Column vs. Field; Schema vs. Mapping
  • 24. Elasticsearch Logical Layout (diagram): an Elasticsearch node holding the indices "social" and "blog", with types "story", "user", and "posts" containing numbered documents; an index application and a search application talk to the node. Use any HTTP client to talk to Elasticsearch at localhost port 9200. RESTful: interact through common HTTP methods (GET, POST, PUT, DELETE). No state information is maintained: each request is independent, and resources are returned as JSON text.
  • 25. How is the world using Elasticsearch? An analytics solution on 40 million documents per day to deliver real-time visibility; search across GitHub's code; full-text search to find related questions and answers; full-text search with highlighted search snippets
  • 26. Elasticsearch and Big Data. ES-Hadoop: connects Hadoop's big data analytics with the real-time search of Elasticsearch. Source: https://www.elastic.co/products/hadoop
  • 27. What does Elasticsearch offer? Full-text search; very fast; fault tolerance; high availability; distributed; scalable; plugin architecture
  • 28. Node and Cluster (3 shards/1 replica). Node: a running Elasticsearch process. Px: primary shard, a chunk of the index. Rx: replica shard, a copy of a primary shard. (Diagrams: the shards placed on clusters of one, two, and three nodes.)
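To make the shard and replica terminology concrete, here is a toy placement sketch. It is not Elasticsearch's actual allocator: `place_shards` is a hypothetical round-robin function that only respects one real rule, namely that a replica is never placed on the same node as its primary, so replicas stay unassigned when there are too few nodes.

```python
from typing import Dict, List


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard exists once plus once per configured replica."""
    return primaries * (1 + replicas)


def place_shards(primaries: int, replicas: int, nodes: int) -> Dict[int, List[str]]:
    """Toy round-robin placement; copies of one shard land on distinct nodes."""
    layout: Dict[int, List[str]] = {n: [] for n in range(1, nodes + 1)}
    for shard in range(primaries):
        for copy in range(1 + replicas):
            if copy >= nodes:
                break  # not enough nodes: remaining replicas stay unassigned
            node = (shard + copy) % nodes + 1
            layout[node].append(f"P{shard}" if copy == 0 else f"R{shard}")
    return layout


three_nodes = place_shards(primaries=3, replicas=1, nodes=3)
one_node = place_shards(primaries=3, replicas=1, nodes=1)
```

With a single node only the three primaries are placed; adding nodes lets the replicas be assigned, which is what gives the cluster its fault tolerance.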
  • 29. When a Node Fails (3 shards/1 replica). (Diagram: shard layout on a three-node cluster before and after a node failure.)
  • 30. When a Node Fails (3 shards/2 replicas). (Diagram: shard layout on a three-node cluster before and after a node failure.)
  • 31. Elasticsearch Documents. Document: the basic unit of user data, in JSON representation. Sample document: { "user": "Chris", "gender": "M", "birthdate": "1980-12-11" }
  • 32. URI of a Document: http://localhost:9200/sample_index/sample_type/1. Protocol: HTTP; host name of the Elasticsearch node: localhost; port to connect to: 9200 by default; index name: sample_index; type name: sample_type; document ID: 1
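The URI anatomy can be checked by taking the slide's example apart with the standard library. This is a small sketch; `split_document_url` is a hypothetical helper, and it assumes the path always has exactly three segments (index, type, ID) as in the example.

```python
from urllib.parse import urlsplit


def split_document_url(url: str) -> dict:
    """Break a document URI into the parts named on the slide."""
    parts = urlsplit(url)
    # Assumes a path of the form /<index>/<type>/<id>.
    index, doc_type, doc_id = parts.path.strip("/").split("/")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "index": index,
        "type": doc_type,
        "id": doc_id,
    }


info = split_document_url("http://localhost:9200/sample_index/sample_type/1")
```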
  • 33. HTTP-Based CRUD Operations
      Create: curl -XPUT "http://localhost:9200/<index>/<type>/<id>"
      Read: curl -XGET "http://localhost:9200/<index>/<type>/<id>"
      Update: curl -XPOST "http://localhost:9200/<index>/<type>/<id>"
      Delete: curl -XDELETE "http://localhost:9200/<index>/<type>/<id>"
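The same CRUD mapping can be expressed from a program rather than curl. The sketch below only builds the HTTP requests with Python's standard library and does not send them; `CRUD_METHODS` and `build_request` are hypothetical names, the base URL assumes a local node on the default port, and the typed `/<index>/<type>/<id>` path follows the Elasticsearch 2.x layout used in this talk.

```python
import json
from typing import Optional
from urllib.request import Request

# CRUD operation -> HTTP method, as listed on the slide.
CRUD_METHODS = {"create": "PUT", "read": "GET", "update": "POST", "delete": "DELETE"}


def build_request(operation: str, index: str, doc_type: str, doc_id,
                  document: Optional[dict] = None,
                  base: str = "http://localhost:9200") -> Request:
    """Build (but do not send) the HTTP request for one CRUD operation."""
    body = json.dumps(document).encode("utf-8") if document is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{base}/{index}/{doc_type}/{doc_id}",
                   data=body, headers=headers, method=CRUD_METHODS[operation])


create = build_request(
    "create", "sample_index", "sample_type", 1,
    {"user": "Chris", "gender": "M", "birthdate": "1980-12-11"},
)
delete = build_request("delete", "sample_index", "sample_type", 1)
# Passing a request to urllib.request.urlopen() would send it to a running node.
```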
  • 34. ELK Stack from Elastic. Elasticsearch: high-performance, scalable search engine. Logstash: log transport and processing daemon (log shipper). Kibana: visualisation dashboard.
  • 35. Logstash: log aggregator and parser; transfers parsed data to Elasticsearch; a configuration file specifies input, filtering (parsing), and output. Sample configuration (source: https://www.elastic.co/guide/en/logstash/current/config-examples.html):
      input { stdin { } }
      filter {
        grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
        date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
      }
      output {
        elasticsearch { hosts => ["localhost:9200"] }
        stdout { codec => rubydebug }
      }
  • 36. Kibana: general-purpose query UI; includes many widgets; queries Elasticsearch without coding
  • 38. Indexing Performance (Weblog) on a single machine. (Chart: daily indexing performance for days 1 to 25, plotting #Records and Records/s, with axes in millions and thousands.) Test machine: Dell R230; Xeon E3-1271v3 3.6 GHz, 4C/8T, Hyper-Threading off; 32 GB RAM; 2x6 TB NL-SAS; Elasticsearch 2.3.2; 10 shards/0 replicas
  • 39. Search Performance (not yet optimized): search keyword "face" against each daily log. (Chart: search performance and hits for days 1 to 19.) Search time (ms): 2.01, 2.33, 1.99, 2.13, 2.67, 2.00, 1.33, 1.02, 3.00, 2.33, 2.00, 2.67, 3.00, 2.67, 2.43, 3.33, 2.67, 2.14, 3.33. Hits: 17,551; 22,816; 16,346; 18,218; 16,240; 7,958; 5,622; 1,886; 23,559; 9,127; 8,221; 12,343; 28,259; 25,405; 22,092; 33,528; 17,683; 12,951; 18,054
  • 45. Real Time Visualization with D3.js. Data-Driven Documents (D3): a JavaScript library for manipulating documents based on data, developed by Mike Bostock. https://d3js.org/
  • 46. D3 Architecture: input data to build visualizations (JSON, CSV, ...); dynamic manipulation of HTML elements with JavaScript. (Diagram components: node.js, socket.io)
  • 50. Sample IP Matrix Occupied
  • 51. Sample Tree Map Web Access
  • 54. Lessons Learned: Elasticsearch offers very fast full-text search; the index may be 3x to 5x bigger than the source data; use Elasticsearch for search services, not for data archiving; more cores or a faster clock? Choose cores; 64 GB of RAM is ideal; go with SSDs if possible
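The 3x to 5x rule of thumb translates into a quick capacity estimate. The sketch below is only illustrative: `index_size_range_tb` is a hypothetical helper, and applying it to the 30 TB figure from earlier in the talk is an assumption, since the slides do not say how much of the raw log was actually indexed.

```python
from typing import Tuple


def index_size_range_tb(source_tb: float, low: float = 3.0,
                        high: float = 5.0) -> Tuple[float, float]:
    """Estimated index size range for a given source size, using the 3x-5x rule."""
    return (source_tb * low, source_tb * high)


# Illustration only: indexing all 30 TB of raw logs with this rule of thumb.
low_tb, high_tb = index_size_range_tb(30)
print(f"Estimated index size: {low_tb:.0f} to {high_tb:.0f} TB")
```

An estimate of this size is one reason to treat Elasticsearch as a search layer over a window of recent data and to keep the raw files as the archive.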
  • 55. Lessons Learned: Elasticsearch is designed to work in a trusted environment; no built-in security; no authentication or authorization, and no concept of a user; anyone who can send a request to the cluster is a superuser; it is easy to erase all the data: curl -XDELETE http://<server>:9200/_all
  • 56. Lessons Learned: Shield from Elasticsearch is a comprehensive security solution, including encrypted communications, RBAC, AD/LDAP integration, and auditing; alternatively, use a proxy for authentication and request filtering, with nginx or others
  • 57. Lessons Learned: Logstash is a powerful tool for manipulating logs; Kibana is simple and useful for visualizing data
  • 58. Lessons Learned: D3 pros: flexible, fascinating visualization. D3 cons: low level, steep learning curve, CPU intensive
  • 59. Thank you for your attention. Q&A. Credits: Kasom Koth-Arsa (Core Log Design and Development); Jautuporn Chuchuay and Peerapol Boonthaganon (Web GUI Development); Sataporn Techaaramwong (Web/Elasticsearch Development); Peerapong Thongpubeth and Jiradech Sirijantadilok (Kibana Development); Poomipat Thongudom and Nichapat Nattee (D3 Development); Surachai Chitpinijyol (Project Coordinator); Surasak Sanguanpong (Project Director). Special thanks to Kasetsart Office of Computer Services for supporting traffic data. (Photo: sunset at Narita Airport)