Experiences in ELK with D3.js for Large Log Analysis and Visualization
1. Surasak Sanguanpong
Surasak.S@ku.ac.th
Applied Network Research Lab
Department of Computer Engineering
Faculty of Engineering, Kasetsart University
Software Freedom Day 2016, Sept 17, Bangkok
(Photo: U-Bahn Station Candidplatz, Munich, Germany)
2. In This Talk
• About Traffic Log (KU case study)
• Search Platform with ELK
• Real-Time Visualization with D3.js
• Lessons Learned
7. Traffic Logging Solution
• Splunk? Great, but commercial and proprietary
• Graylog? Excellent, but too automatic
• Elasticsearch, Logstash, Kibana (ELK) and D3? That's it! Open source and fun to play with
8. KU Logging
• 2008-2015: Raw Log → MySQL → Simple Web GUI
  - On-the-fly converter from text-based logs to MySQL
  - Simple but slow
• 2015 onward: Raw Log → Elasticsearch → Web GUI/Kibana/D3
  - Much faster!
10. Raw Login Log Format
• Real-time logging, one file per day
Date Time Action IP UserName LogServer
Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1
Jul 1 10:04:58 logout 158.108.X.X YYYYY@ku.ac.th 192.168.1.2
Jul 1 10:04:59 timeout 158.108.X.X ZZZZZ@ku.ac.th 192.168.1.2
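A minimal Python sketch of turning one login record into a document ready for indexing. It assumes whitespace-separated fields in the header's order, where "Date" is two tokens ("Jul 1"). Because the log carries no year, the caller has to supply one; that `year` parameter and the `parse_login_line` helper are hypothetical additions. The `@timestamp` key follows the Logstash naming convention.

```python
from datetime import datetime


def parse_login_line(line, year):
    """Turn one login record into a dict suitable for indexing.

    Field order follows the slide's header: Date Time Action IP UserName
    LogServer, where Date itself is two tokens ("Jul 1"). The log has no
    year, so the caller supplies one (an assumption of this sketch).
    """
    month, day, clock, action, ip, user, log_server = line.split()
    ts = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return {
        "@timestamp": ts.isoformat(),  # Logstash-style timestamp field name
        "action": action,              # login, logout or timeout
        "ip": ip,
        "user": user,
        "log_server": log_server,
    }


# The year 2016 is a hypothetical choice for the sample line.
record = parse_login_line(
    "Jul 1 10:04:57 login 158.108.X.X XXXXX@ku.ac.th 192.168.1.1", year=2016)
print(record["@timestamp"], record["action"])
```

One file per day keeps this parser trivial. Each line maps to exactly one document.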
11. Raw Web Log Format
• Real-time logging, one file per minute
UnixTime SrcIPv4 SrcIPv6 DstIPv4 DstIPv6 SrcPort DstPort URL Referer/HTTPS
20151103010000 192.55.X.X - 158.108.X.X - 17490 80 mirror1.ku.ac.th/fedora-epel/6/i386/jday-devel-2.4-5.el6.i686.rpm http://mirror1.ku.ac.th/fedora-epel/6/i386/
20151103010000 - 2406:3100:1018:1::XX - 2600:1417:a::174c:XX 61154 443 fbcdn-photos-g-a.akamaihd.net HTTPS
20151103010000 - 2406:3100:1018:1::XX - 2a03:2880:f002:105:fa:b0:0:YYXX 59960 443 edge-mqtt.facebook.com HTTPS
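A Python sketch of parsing one web log record against the header above. It makes three assumptions. First, "-" marks an empty column, for example the IPv6 fields of an IPv4 flow. Second, although the column is labeled UnixTime, the sample values read as YYYYMMDDhhmmss. Third, a final "HTTPS" means there is no referer. `WEB_FIELDS` and `parse_web_line` are hypothetical names.

```python
from datetime import datetime

# Column order from the slide's header row.
WEB_FIELDS = ["time", "src_ipv4", "src_ipv6", "dst_ipv4", "dst_ipv6",
              "src_port", "dst_port", "url", "referer"]


def parse_web_line(line):
    """Parse one web log record; "-" marks an empty column."""
    values = line.split()
    if len(values) != len(WEB_FIELDS):
        raise ValueError(f"expected {len(WEB_FIELDS)} fields, got {len(values)}")
    rec = {k: (None if v == "-" else v) for k, v in zip(WEB_FIELDS, values)}
    # The sample values read as YYYYMMDDhhmmss, not Unix epoch seconds.
    rec["time"] = datetime.strptime(rec["time"], "%Y%m%d%H%M%S").isoformat()
    rec["src_port"] = int(rec["src_port"])
    rec["dst_port"] = int(rec["dst_port"])
    # Last column: a referer URL for HTTP, or the literal "HTTPS" (assumed
    # to mean the URL column then holds only the host name).
    rec["https"] = rec["referer"] == "HTTPS"
    if rec["https"]:
        rec["referer"] = None
    return rec
```

Mapping "-" to `None` means absent fields can be dropped before indexing, so no placeholder strings get stored.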
12. Raw Packet Log Format (Header Log)
• Real-time logging, one file per minute
TCP/UDP records:
TimeStamp SrcIP DstIP Size Proto SrcPort DstPort [Flag]
2009-07-16 17:53:59.999206 208.117.8.X 158.108.234.X 1514 TCP 80 1371 0x10
2009-07-16 17:53:59.999209 158.108.2.X 202.143.136.X 90 UDP 123 123
ICMP records:
TimeStamp SrcIP DstIP Proto Code
2009-07-16 17:53:59.999210 158.108.184.X 218.164.54.X ICMP 168
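A Python sketch that parses both record shapes. For TCP/UDP rows it assumes the columns after the addresses are size, protocol, source port, destination port and an optional hexadecimal TCP flag byte. That is how the sample values read: a 1514-byte frame, ports 80 and 123. ICMP rows follow the "Proto Code" header. `parse_packet_line` is a hypothetical helper.

```python
def parse_packet_line(line):
    """Parse one packet header record (TCP/UDP or ICMP) into a dict."""
    parts = line.split()
    date, clock, src_ip, dst_ip = parts[:4]
    rec = {"timestamp": f"{date}T{clock}", "src_ip": src_ip, "dst_ip": dst_ip}
    rest = parts[4:]
    if rest[0] == "ICMP":
        # ICMP rows: Proto Code
        rec.update(proto="ICMP", code=int(rest[1]))
    else:
        # TCP/UDP rows, assumed order: Size Proto SrcPort DstPort [Flag]
        size, proto, sport, dport = rest[:4]
        rec.update(size=int(size), proto=proto,
                   src_port=int(sport), dst_port=int(dport))
        if len(rest) > 4:
            rec["tcp_flags"] = int(rest[4], 16)  # 0x10 is the ACK bit
    return rec
```

The file mixes TCP/UDP and ICMP rows, so the parser branches on the protocol token rather than on a fixed column count.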
22. What is Elasticsearch?
• Real-time search engine software
• Document-oriented
• JSON-based REST API
• Java/Lucene-based
• Open source, Apache 2 License
REST: Representational State Transfer
JSON: JavaScript Object Notation
23. Elasticsearch and Database
• Rough layout comparison (Relational Database → Elasticsearch):
  Database → Index
  Table → Type
  Row → Document
  Column → Field
  Schema → Mapping
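To make the analogy concrete, here is a Python sketch of a mapping, the counterpart of a schema. It covers a hypothetical `kulog` index that holds the login log as type `login`. It assumes Elasticsearch 2.x syntax, where types still exist and exact-match strings are declared with `"index": "not_analyzed"`. Later versions dropped types and use different field types. The `row_to_document` helper is also hypothetical.

```python
import json

# Body for a hypothetical "PUT /kulog" request (Elasticsearch 2.x syntax).
login_index = {
    "mappings": {                    # Mapping ~ schema
        "login": {                   # Type ~ table
            "properties": {          # Fields ~ columns
                "@timestamp": {"type": "date"},
                "action": {"type": "string", "index": "not_analyzed"},
                "ip": {"type": "ip"},
                "user": {"type": "string", "index": "not_analyzed"},
                "log_server": {"type": "ip"},
            }
        }
    }
}
body = json.dumps(login_index, indent=2)


def row_to_document(columns, row):
    """A relational row becomes one JSON document: column name -> field."""
    return dict(zip(columns, row))
```

`not_analyzed` keeps values such as user names whole, so they can be matched exactly and aggregated instead of being split into words.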
24. Elasticsearch Logical Layout
[Figure: an index application and a search application both talk to one Elasticsearch node that holds index "social" (type "story" with documents 1-2, type "user" with documents 1-2) and index "blog" (type "posts" with documents 1-4)]
• Any HTTP client can talk to Elasticsearch, by default at localhost on port 9200
• RESTful: interact through common HTTP methods (GET, POST, PUT, DELETE)
• Stateless: the server keeps no state information between requests
• Each request is independent, and resources are returned as JSON text
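A minimal Python sketch of the RESTful style described above, using only the standard library. It builds one self-contained request per operation but does not send it. The `blog` index and `posts` type come from the layout figure. The `title` field and the `es_request` helper are hypothetical.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # default host and port


def es_request(method, path, body=None):
    """Build (but do not send) one HTTP request to Elasticsearch.

    Method, path and optional JSON body carry everything the server needs,
    so no session state is kept between calls.
    """
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


create = es_request("PUT", "/blog/posts/1", {"title": "hello"})
read = es_request("GET", "/blog/posts/1")
search = es_request("POST", "/blog/_search", {"query": {"match": {"title": "hello"}}})
delete = es_request("DELETE", "/blog/posts/1")
# Sending requires a running node, e.g. urllib.request.urlopen(read).read()
```

Because each request is complete on its own, any node or load balancer can serve it, which is what lets plain HTTP clients such as curl, browsers or scripts work interchangeably.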
25. How is the world using Elasticsearch?
• Analytics solution on 40 million documents per day to deliver real-time visibility
• Search across GitHub's code
• Full-text search to find related questions and answers
• Full-text search with highlighted search snippets
26. Elasticsearch and Big Data
ES-Hadoop: connects Hadoop's big data analytics with the real-time search of Elasticsearch.
Source: https://www.elastic.co/products/hadoop
27. What does Elasticsearch offer?
• Full-text search
• Very fast
• Fault tolerance
• High availability
• Distributed
• Scalable
• Plugin architecture
28. Node and Cluster
[Figure: cluster layouts for an index with 3 shards/1 replica: one node (Node 1: P0, P1, P2); two nodes (Node 1: P0, P1, R2; Node 2: R0, R1, P2); three nodes (Node 1, Node 2, Node 3)]
• Px: primary shard, a chunk of an index
• Rx: replica shard, a copy of a primary shard
• Node: a running Elasticsearch process
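A toy Python sketch of the one placement rule the figure illustrates. A replica is never put on the same node as its primary, so with a single node the replicas stay unassigned. The round-robin scheme and the `allocate` function are hypothetical. Elasticsearch's real allocator uses its own balancing heuristics.

```python
def allocate(num_shards, num_replicas, nodes):
    """Toy placement: primary s on node s % n, replica r on node (s + r + 1) % n.

    No node ever holds two copies of the same shard; replicas that cannot
    be placed are returned as unassigned.
    """
    n = len(nodes)
    placement = {node: [] for node in nodes}
    unassigned = []
    for s in range(num_shards):
        placement[nodes[s % n]].append(f"P{s}")
    for s in range(num_shards):
        for r in range(num_replicas):
            if r + 1 >= n:  # every node already holds a copy of shard s
                unassigned.append(f"R{s}")
                continue
            placement[nodes[(s + r + 1) % n]].append(f"R{s}")
    return placement, unassigned


for count in (1, 2, 3):
    nodes = [f"node{i + 1}" for i in range(count)]
    print(allocate(3, 1, nodes))
```

Running it with one, two and three nodes reproduces the spirit of the figure, though not necessarily its exact shard positions. Adding nodes spreads the same six shard copies thinner, and losing any single node (with two or more nodes) still leaves one copy of every shard.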
31. Elasticsearch documents
• Document: the basic unit of user data, represented as JSON
• Sample document:
{
  "user" : "Chris",
  "gender" : "M",
  "birthdate" : "1980-12-11"
}
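A short Python sketch that parses the sample document as valid JSON (comma-separated fields, straight double quotes) and shows how a document lands in exactly one primary shard. Elasticsearch documents the rule as `shard = hash(routing) % number_of_primary_shards`, with the routing value defaulting to the document ID. Here `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, and `shard_for` is a hypothetical helper.

```python
import json
import zlib

doc_text = '{"user": "Chris", "gender": "M", "birthdate": "1980-12-11"}'
doc = json.loads(doc_text)  # raises ValueError if the JSON is malformed


def shard_for(doc_id, num_primary_shards):
    """Pick the primary shard for a document ID.

    shard = hash(routing) % number_of_primary_shards, with routing = ID.
    crc32 is a stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(str(doc_id).encode("utf-8")) % num_primary_shards


print(doc["user"], shard_for("1", 3))
```

Because the result depends on the number of primary shards, that number is fixed when an index is created in Elasticsearch 2.x. Changing it would send existing IDs to different shards.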
32. URI of a document
• http: protocol used (HTTP is supported)
• localhost: host name of the Elasticsearch node
• 9200: port to connect to (9200 by default)
• sample_index: index name
• sample_type: type name
• 1: document ID
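A Python sketch that composes and splits document URIs of that shape with the standard library. `document_uri` and `split_document_uri` are hypothetical helpers. The defaults (`http`, `localhost`, `9200`) mirror the example.

```python
from urllib.parse import quote, urlsplit


def document_uri(index, doc_type, doc_id, host="localhost", port=9200, scheme="http"):
    """Compose <scheme>://<host>:<port>/<index>/<type>/<id> for one document."""
    parts = (quote(str(p), safe="") for p in (index, doc_type, doc_id))
    return f"{scheme}://{host}:{port}/" + "/".join(parts)


def split_document_uri(uri):
    """Inverse: recover host, port, index, type and ID from a document URI."""
    u = urlsplit(uri)
    index, doc_type, doc_id = u.path.strip("/").split("/")
    return {"host": u.hostname, "port": u.port,
            "index": index, "type": doc_type, "id": doc_id}


uri = document_uri("sample_index", "sample_type", 1)
print(uri)
print(split_document_uri(uri))
```

Percent-encoding each path segment keeps IDs that contain "/" or spaces from being misread as extra path levels.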
45. Real Time Visualization with D3.js
• Data-Driven Documents (D3)
• A JavaScript library for manipulating documents based on data
• Developed by Mike Bostock
https://d3js.org/
46. D3 Architecture
• Takes input data to build visualizations (JSON, CSV, …)
• Manipulates HTML elements dynamically with JavaScript, driven by that data
[Architecture diagram components: node.js, socket.io]
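The data side of this pipeline can be sketched in Python. The sketch aggregates parsed log records into a small JSON array that D3 can bind to chart elements, for example loaded with `d3.json` or pushed to the browser over socket.io. The record shape (`@timestamp`, `action`) and the `per_minute_counts` helper are assumptions, not the talk's actual format.

```python
import json
from collections import Counter


def per_minute_counts(records):
    """Count events per (minute, action) and return a JSON array string.

    Each output row looks like {"minute": "...", "login": 3, "logout": 1},
    a flat shape that maps directly onto D3 data joins.
    """
    counts = Counter((r["@timestamp"][:16], r["action"]) for r in records)
    minutes = sorted({minute for minute, _ in counts})
    actions = sorted({action for _, action in counts})
    rows = [{"minute": m, **{a: counts[(m, a)] for a in actions}} for m in minutes]
    return json.dumps(rows)


sample = [
    {"@timestamp": "2016-07-01T10:04:57", "action": "login"},
    {"@timestamp": "2016-07-01T10:04:58", "action": "logout"},
    {"@timestamp": "2016-07-01T10:05:01", "action": "login"},
]
print(per_minute_counts(sample))
```

Filling missing (minute, action) pairs with 0 keeps every row the same shape, so the browser side never has to guard against absent keys.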
54. Lessons Learned
• Elasticsearch offers very fast full-text search services
• The index may be 3x to 5x bigger than the source data
• Use Elasticsearch for search services, not for data archiving
• More cores or a faster clock? Choose more cores
• 64 GB of RAM is ideal
• Go with SSDs if possible
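A back-of-the-envelope Python sketch of what the 3x to 5x figure means for disk planning. It assumes each replica stores a full copy of the primary data. The daily volume and retention used below are hypothetical inputs, not KU measurements, and `index_storage_gb` is a hypothetical helper.

```python
def index_storage_gb(raw_gb_per_day, days, replicas=1, expansion=(3, 5)):
    """Rough disk estimate (low, high) for an index built from raw logs.

    expansion: index size relative to the raw data (3x-5x per the slide).
    Each replica stores a full extra copy of the primary data.
    """
    low, high = expansion
    copies = 1 + replicas
    base = raw_gb_per_day * days * copies
    return (base * low, base * high)


# Hypothetical: 10 GB of raw logs per day, kept for 30 days, 1 replica.
print(index_storage_gb(10, 30))
```

With those inputs it returns `(1800, 3000)`, so 300 GB of raw logs can need up to 3 TB of disk. That gap is one reason to keep Elasticsearch for search rather than archiving.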
55. Lessons Learned
• Designed to work in a trusted environment
• No built-in security
  - No authentication or authorization, no concept of a user
  - Anyone who can send a request to the cluster is a superuser
• Easy to erase all the data:
  curl -XDELETE http://<server>:9200/_all
56. Lessons Learned
• Shield from Elastic: a comprehensive security solution, including encrypted communications, role-based access control (RBAC), AD/LDAP integration and auditing
• Use a proxy: authentication and request filtering with nginx or similar
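A Python sketch of the default-deny request filtering such a proxy could apply. It allows only searches and single-document reads, so a request like `DELETE /_all` never reaches the cluster. The rules and `is_allowed` are hypothetical. In nginx the same policy would be written with `location` blocks and directives such as `limit_except`, not in Python.

```python
import re

# Index/type names may not start with "_", which also excludes _all and
# cluster-level APIs such as /_cluster or /_shutdown.
NAME = r"[a-z0-9][a-z0-9_-]*"
RULES = [
    ("GET", re.compile(rf"^/{NAME}/_search$")),
    ("POST", re.compile(rf"^/{NAME}/_search$")),
    ("GET", re.compile(rf"^/{NAME}/{NAME}/[A-Za-z0-9_-]+$")),  # one document
]


def is_allowed(method, path):
    """Default deny: pass only requests (path without query string) that match a rule."""
    return any(method == m and rule.match(path) for m, rule in RULES)


for method, path in [("DELETE", "/_all"), ("POST", "/blog/_search"),
                     ("GET", "/blog/posts/1"), ("PUT", "/blog/posts/1")]:
    print(method, path, is_allowed(method, path))
```

An allow-list is safer than blocking known-dangerous calls. Any endpoint the rules do not mention, including ones added in future versions, is refused by default.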
57. Lessons Learned
• Logstash: a powerful tool for manipulating logs
• Kibana: simple and useful for visualizing data
59. Thank you for your attention
Q & A Time
Kasom Koth-Arsa
Core Log Design and Development
Jautuporn Chuchuay
Peerapol Boonthaganon
Web GUI Development
Sataporn Techaaramwong
Web/Elasticsearch Development
Peerapong Thongpubeth
Jiradech Sirijantadilok
Kibana Development
Poomipat Thongudom
Nichapat Nattee
D3 Development
Surachai Chitpinijyol
Project Coordinator
Surasak Sanguanpong
Project Director
Special Thanks to Kasetsart Office of Computer
Services for supporting traffic data
(Photo: Sunset at Narita Airport)