Debugging Skynet: A Machine Learning Approach to Log Analysis - Ianir Ideses, logz.io - DevOpsDays Tel Aviv 2016

Debugging Skynet
A Machine Learning Approach to Log Analysis
Ianir Ideses - Logz.io

The Problem - Overlogging
• Millions of logs per week
• Important logs get lost in the clutter
• Need to surface relevant logs and de-emphasize irrelevant ones

Proposed Solution
• A Machine Learning approach
• Can sift through large amounts of data
• Can evolve and react to changes in data
• Requires large amounts of data to be effective

Machine Learning
• Unsupervised
  • Clustering
  • Anomaly detection
• Supervised
  • Recommender systems
  • Classifiers

Unsupervised Machine Learning
• No labels are needed, just lots of data
• Useful for reducing a large number of data points to a smaller set of clusters

Unsupervised Machine Learning
"GET /twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.Confi
"GET /twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2 HTTP/1.
"GET /mailman/listinfo/hsdivision HTTP/1.1" 200 6291
"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352
"GET /twiki/bin/view/Main/DCCAndPostFix HTTP/1.1" 200 5253
"GET /twiki/bin/oops/TWiki/AppendixFileSystem?template=oopsmore&param1=1.
"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924
"GET /twiki/bin/edit/Main/Header_checks?topicparent=Main.Configuratio
"GET /twiki/bin/attach/Main/OfficeLocations HTTP/1.1" 401 12851
"GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1" 200 3732
"GET /app_dev.php/ HTTP/1.1" 200 6715 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36"
"GET /bundles/framework/css/body.css HTTP/1.1" 200 6657 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.231
"GET /bundles/framework/css/structure.css HTTP/1.1" 200 1191 "http://my.log-
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.
"GET /bundles/acmedemo/css/demo.css HTTP/1.1" 200 2204 "http://my.log-
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311
"GET /bundles/acmedemo/images/welcome-quick-tour.gif HTTP/1.1" 200 4770 "http://my.log-sandbox/app_dev.php/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko)
"GET /bundles/acmedemo/images/welcome-demo.gif HTTP/1.1" 200 4053 "http://my.log-
AppleWebKit/537.36 (KHTML, like Gecko) Chrom
Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000
Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000
Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555
Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777
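The raw lines above fall into a few evident families (TWiki requests, Symfony asset requests, syslog entries) without anyone labeling them. As a minimal sketch of how an unsupervised method can find such groups, the Python below uses sequential "leader" clustering over the Jaccard similarity of token sets, after masking digits. The algorithm, the tokenizer and the 0.5 threshold are illustrative assumptions, not the approach described in the talk.

```python
import re


def tokenize(line: str) -> set[str]:
    # Lowercase, mask every run of digits as "#", then split on whitespace,
    # slashes, quotes and URL punctuation.
    masked = re.sub(r"\d+", "#", line.lower())
    return {t for t in re.split(r"[\s/?=&\"]+", masked) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def leader_cluster(lines: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Assign each line to the first cluster whose leader (first member) is
    similar enough; otherwise the line starts a new cluster."""
    leaders: list[set[str]] = []
    clusters: list[list[str]] = []
    for line in lines:
        tokens = tokenize(line)
        for i, leader in enumerate(leaders):
            if jaccard(tokens, leader) >= threshold:
                clusters[i].append(line)
                break
        else:
            leaders.append(tokens)
            clusters.append([line])
    return clusters


logs = [
    "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000",
    '"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1" 200 7352',
    "Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000",
    '"GET /twiki/bin/view/Main/PeterThoeny HTTP/1.1" 200 4924',
    "Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555",
    "Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777",
]
for cluster in leader_cluster(logs):
    print(len(cluster), cluster[0])
```

On these six lines the four syslog entries end up in one cluster and the two TWiki requests in another. A single pass like this is cheap but order-dependent: the first line seen becomes each cluster's representative.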

Supervised Machine Learning
• Learning from labeled examples
• Requires a well-defined question:
  • Is this email spam?
  • Is this object a car?
  • Is this log interesting?
• Deployed successfully in many domains; the most notable classifiers are neural networks (NN), support vector machines (SVM) and Bayesian classifiers
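Among the Bayesian classifiers, multinomial naive Bayes is one of the simplest. The sketch below answers the question "Is this log interesting?" from a four-line, invented training set; the class names, the Laplace smoothing constant alpha=1.0 and the tokenizer are assumptions, not details from the talk.

```python
import math
import re
from collections import Counter


def tokens(line: str) -> list[str]:
    return re.findall(r"[a-z]+", line.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.word_counts: dict[str, Counter] = {}
        self.vocab: set[str] = set()

    def fit(self, samples: list[tuple[str, str]]) -> "NaiveBayesClassifier":
        for line, label in samples:
            self.class_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokens(line):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def predict(self, line: str) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_log_prob = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(label) + sum of log P(word | label); log space avoids underflow
            log_prob = math.log(doc_count / total_docs)
            for word in tokens(line):
                log_prob += math.log((counts[word] + self.alpha) / denom)
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


# Invented toy training data, for illustration only.
training = [
    ("Error: unable to connect to database", "interesting"),
    ("Fatal error in worker thread", "interesting"),
    ("User logged in", "boring"),
    ("Health check passed", "boring"),
]
clf = NaiveBayesClassifier().fit(training)
print(clf.predict("Unable to connect"))    # interesting
print(clf.predict("Health check passed"))  # boring
```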

Supervised Machine Learning - SVM
• Each log is represented as a feature vector, e.g., one index per word
• Each vector index is assigned a weight during the training phase
• A log's score is computed by summing the weights of the features it contains
Weights: connection = 0.1, error = 0.5, success = -0.9, failure = 0.3
“Connection failure”: 0.1 + 0.3 = 0.4
“Connection success”: 0.1 - 0.9 = -0.8
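The slide's arithmetic can be reproduced with a few lines of Python. This sketch assumes binary bag-of-words features (a word counts once if present) and adds an optional bias term, which a trained linear SVM normally has but the slide omits.

```python
import re

# Toy weight vector from the slide; a trained model learns one weight per feature.
WEIGHTS = {"connection": 0.1, "error": 0.5, "success": -0.9, "failure": 0.3}


def linear_score(log_line: str, weights: dict[str, float], bias: float = 0.0) -> float:
    # Binary bag-of-words: each distinct word contributes its weight once;
    # words without a learned weight contribute 0.
    words = sorted(set(re.findall(r"[a-z]+", log_line.lower())))
    return bias + sum(weights.get(w, 0.0) for w in words)


print(round(linear_score("Connection failure", WEIGHTS), 6))  # 0.4
print(round(linear_score("Connection success", WEIGHTS), 6))  # -0.8
```

Words the model never saw score zero, which is one reason the quality of the vocabulary (and of log normalization) matters.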

Log Relevancy
• An ill-posed problem
• Relevancy is user-specific
• People tend to search for known issues
• There are also unknown unknowns
• Labels can be very tedious to acquire

Proposed Solution - Labels
• Acquiring labels:
  • Implicit/explicit user behavior
  • Inter-user similarities
  • Public knowledge bases

Machine Learning in Practice
• Data is textual, numerical and alphanumerical
• Classifiers that have shown good results:
  • Random forests, which resemble flowchart decision making
  • Linear SVM
• Both classifiers are easy to interpret in the feature space

Machine Learning in Practice
connected: -0.157199772246
to provider: -0.15319903564
connected successfully: -0.15319903564
unable: 0.671539714688
topic: 0.678756599452
error: 0.788508324168
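Weights like these, on single words and word pairs, are what a linear classifier exposes after training. As a sketch of how such weights can be learned, the code below trains a linear SVM with Pegasos-style subgradient descent in plain Python. The solver, the unigram-plus-bigram features, the hyperparameters (lam=0.01, 20 epochs) and the four training lines are assumptions for illustration; the talk does not say how its model was trained.

```python
import re
from collections import defaultdict


def features(line: str) -> set[str]:
    # Lowercased unigrams plus adjacent-word bigrams (e.g. "connected successfully").
    words = re.findall(r"[a-z]+", line.lower())
    return set(words) | {f"{a} {b}" for a, b in zip(words, words[1:])}


def train_linear_svm(samples, epochs: int = 20, lam: float = 0.01) -> dict[str, float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    samples: list of (log_line, label) pairs, label +1 (relevant) or -1 (not).
    No bias term, and samples are visited in order rather than at random.
    """
    w: dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for line, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
            x = features(line)
            margin = y * sum(w.get(f, 0.0) for f in x)  # uses current weights
            for f in list(w):  # shrink all weights: gradient of the L2 term
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: step toward y * x
                for f in x:
                    w[f] += eta * y
    return dict(w)


def score(weights: dict[str, float], line: str) -> float:
    return sum(weights.get(f, 0.0) for f in features(line))


# Invented toy training set, for illustration only.
samples = [
    ("Connected to provider successfully", -1),
    ("Connection established", -1),
    ("Error: unable to connect", +1),
    ("Unable to parse topic", +1),
]
weights = train_linear_svm(samples)
# Most negative (less relevant) to most positive (more relevant) features.
for feature, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{feature}: {weight:.3f}")
print(score(weights, "Unable to open file") > 0)  # True
```

Sorting the learned weights, as in the loop above, is what makes a linear model easy to interpret: each feature's sign and size show how it pushes a log toward or away from "relevant".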

Machine Learning in Practice - Modules
• Log normalization
• Label acquisition
• Model training
• Log classification and enhancement

Log Normalization
• Lowercase, stem and remove stop words
• Identify common fields (timestamp, severity, etc.)
• Identify variable, function and class names
• Identify known reserved words
• Cluster logs that share the same prototype
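Several of these normalization steps can be sketched with the standard library alone. The code below assumes the syslog layout of the earlier example lines, uses a deliberately tiny stop-word list and a crude suffix-stripping stemmer (a stand-in for a real one such as the Porter stemmer), replaces numbers with a placeholder, and groups lines whose remaining tokens form the same prototype. Identifying variable, function and class names is not covered.

```python
import re
from collections import defaultdict

# Assumed syslog-style layout, e.g.
# "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000"
SYSLOG = re.compile(
    r"^(?P<timestamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) "
    r"(?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)
STOP_WORDS = {"a", "an", "the", "by", "of", "to", "is", "was"}  # tiny illustrative list


def crude_stem(word: str) -> str:
    # Strip one common suffix; a crude stand-in for a real stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(line: str) -> dict:
    match = SYSLOG.match(line)
    fields = match.groupdict() if match else {"message": line}
    prototype = []
    for token in re.findall(r"[a-z0-9_.]+", fields["message"].lower()):
        if token in STOP_WORDS:
            continue
        # Replace numeric values (user ids, counts) with a placeholder.
        prototype.append("<num>" if token.isdigit() else crude_stem(token))
    fields["prototype"] = " ".join(prototype)
    return fields


def cluster_by_prototype(lines: list[str]) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        groups[normalize(line)["prototype"]].append(line)
    return dict(groups)


logs = [
    "Nov 20 17:27:55 HANNIBAL MyProgram[13163]: Program started by User 1000",
    "Nov 21 17:27:53 HANNIBAL MyProgram[13163]: Program terminated by User 1000",
    "Nov 21 17:27:58 JANE MyProgram[13163]: Program started by User 555",
    "Nov 23 18:27:53 ARILOU MyProgram[13163]: Program stopped by User 777",
]
print(normalize(logs[0])["prototype"])  # program start user <num>
for prototype, members in cluster_by_prototype(logs).items():
    print(len(members), prototype)
```

Here the two "started" lines share a prototype even though their hosts and user IDs differ, while "terminated" and "stopped" remain separate groups.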

Labeler
• Different sources for labels:
  • Community question answering (CQA) sites, e.g., Stack Overflow
  • Explicit user interaction
  • Implicit user interaction
  • Heuristics
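A labeler can combine these sources by trusting the strongest available signal first and abstaining when there is none. In the sketch below, the LogEvent fields (explicit_vote, times_viewed, times_hidden), the min_events threshold and the severity heuristics are all hypothetical; CQA sites are left out to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogEvent:
    message: str
    severity: str = "INFO"
    explicit_vote: Optional[int] = None  # hypothetical +1/-1 from a "was this useful?" control
    times_viewed: int = 0  # hypothetical implicit signal: user opened the log
    times_hidden: int = 0  # hypothetical implicit signal: user filtered it out


def label(event: LogEvent, min_events: int = 3) -> Optional[int]:
    """Return +1 (relevant), -1 (irrelevant) or None (abstain), trusting the
    strongest available source first."""
    # 1. Explicit user interaction wins outright.
    if event.explicit_vote in (1, -1):
        return event.explicit_vote
    # 2. Implicit user interaction, only once there is enough of it.
    if event.times_viewed >= min_events and event.times_viewed > event.times_hidden:
        return 1
    if event.times_hidden >= min_events and event.times_hidden > event.times_viewed:
        return -1
    # 3. Fallback heuristics on severity.
    severity = event.severity.upper()
    if severity in {"ERROR", "FATAL", "CRITICAL"}:
        return 1
    if severity == "DEBUG":
        return -1
    return None


print(label(LogEvent("Disk quota exceeded", severity="ERROR")))  # 1
print(label(LogEvent("Cache warmed")))                          # None
```

Abstaining matters: only events that receive a label become training data, so weak or missing signals do not add noise to the classifier.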

Log Enhancer
• Use knowledge about log events to enrich them with prior data
• Suggest solutions to known problems
• Tag relevant logs for display to the user
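The enhancer's three jobs can be sketched as one function that scores a line, tags it for display and attaches suggestions from a knowledge base. Both the scorer and the knowledge base here are hypothetical toy stand-ins; in practice the scorer would be a trained classifier, and the knowledge base could be built from sources such as public knowledge bases or CQA sites.

```python
from typing import Callable


def enhance(line: str, scorer: Callable[[str], float],
            knowledge_base: dict[str, str], threshold: float = 0.0) -> dict:
    """Attach a relevance score, a display tag and known-fix suggestions to a log line."""
    lowered = line.lower()
    # Hypothetical knowledge base: substring pattern -> suggested solution.
    suggestions = [tip for pattern, tip in knowledge_base.items() if pattern in lowered]
    score = scorer(line)
    return {
        "message": line,
        "relevance_score": score,
        "relevant": score > threshold,  # tag deciding whether to surface the log
        "suggestions": suggestions,
    }


# Toy stand-ins for a trained classifier and a knowledge base.
def toy_scorer(line: str) -> float:
    return 1.0 if "error" in line.lower() else -1.0


KB = {"connection refused": "Check that the target service is running and listening on the expected port."}

print(enhance("ERROR: connection refused", toy_scorer, KB))
```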

Flow
[Flow diagram connecting Logs, Log Normalization, Labeler, ML - Training, Classifiers and Log Enhancer]

Machine Learning at Scale
• Use Apache Spark for high-throughput, large-scale processing
• Terabytes of data daily
• Spot Instances to keep costs at bay

To Sum Up
• Formulate your question
• Get enough data
• Get enough labels
• Clean your data
• Train your classifier
