AROUND THE WEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION
ISAAC DAWSON, VERACODE
ABOUT ME:
▸ Previously at @stake, Symantec (10 years)
▸ Moved into research role at Veracode, Inc. (6 years)
▸ Living in Japan for 12 years
▸ I <3
IT ALL STARTED IN 2012…
SECURITY HEADER SCANNING HISTORY
▸ All scanners use the Alexa Top 1 Million URLs
▸ Galexa (November 2012 - March 2014)
▸ Golexa (March 2014 - February 2016)
▸ Creeper v0-v1 (February 2016 - July 2016)
▸ Creeper v2 (July 2016 - …)
THE SYSTEM: ARCHITECTURE
SUMMARY OF SYSTEMS & COMPONENTS
▸ Admin (x1) - Manages jobs
▸ Agents (x50) - Analyze URLs
▸ DB Writers (x4) - Feed analysis data into the DB & S3
▸ Database (x1) - PostgreSQL 9.5 DB
▸ NSQ - A message queue for URLs, reports and responses
▸ S3 - Stores serialized DOM and HTML/JS
THE MESSAGE QUEUE - NSQD, NSQLOOKUPD
▸ NSQ is an easy-to-deploy message queue
▸ JSON messages between all systems
▸ All agents point to Admin service running NSQLookupd
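As a rough illustration of the JSON messages passed between systems, a URL job might be encoded as below. The `UrlMessage` type and its field names are hypothetical, not Creeper's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UrlMessage is a hypothetical payload the admin could publish to the URL topic.
type UrlMessage struct {
	JobID   int    `json:"job_id"`
	InputID int    `json:"input_id"`
	URL     string `json:"url"`
}

// encodeURLMessage serializes a message into the JSON body of a queue message.
func encodeURLMessage(m UrlMessage) ([]byte, error) {
	return json.Marshal(m)
}

// decodeURLMessage parses the JSON body an agent would receive.
func decodeURLMessage(body []byte) (UrlMessage, error) {
	var m UrlMessage
	err := json.Unmarshal(body, &m)
	return m, err
}

func main() {
	body, err := encodeURLMessage(UrlMessage{JobID: 1, InputID: 42, URL: "http://example.com/"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Keeping every payload as plain JSON means any component can consume any topic without sharing generated code.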
HELPFUL NSQ FEATURES
// Create a consumer for this job's URL topic
c.urlConsumer, err = nsq.NewConsumer(job.Topics["url"],
    creeper_types.UrlChannel, cfg)

// Process numBrowsers messages concurrently (7)
c.urlConsumer.AddConcurrentHandlers(
    nsq.HandlerFunc(c.processUrls),
    numBrowsers)

// Job taking too long to handle/process a message?
msg.Touch() // notify nsqd we are still working on this message

// Need to requeue because Chrome crashed?
msg.RequeueWithoutBackoff(-1)

// Need to change the max # of in-flight messages?
c.urlConsumer.ChangeMaxInFlight(c.getInflightCount())
DATAFLOW
[Diagram: Admin, Agents, and DB Writers, with the Writers feeding the DB and S3 (data storage)]
GETTING THE DATA WITH: CREEPER AGENTS
CREEPER AGENTS: GETTING THE DATA
BROWSER AUTOMATION REQUIREMENTS
▸ Automatable
▸ Fast
▸ Capture network traffic
▸ Capture various browser events (CSP violations)
▸ Inject JavaScript
CHOSE CHROME, FOR OBVIOUS REASONS…
▸ Each agent runs 3-6 tabs concurrently
▸ Headless, uses Xvfb
▸ Can get full read access to network response data
▸ Easily inject JavaScript
▸ Can subscribe to console messages
AGENT DESIGN
[Diagram: Creeper agent components: Browser Manager, Analyzer, Reporter, App Logic]
CONTROLLING THE BROWSER
GOOGLE CHROME REMOTE DEBUGGER
▸ Huge definition files: browser_protocol.json and js_protocol.json
{
  "version": { "major": "1", "minor": "1" },
  "domains": [{
    "domain": "Inspector",
    "hidden": true,
    "types": [],
    "commands": [{
      "name": "enable",
      "description": "Enables inspector domain...",
      "handlers": ["browser", "renderer"]
    }],
    "events": [{
      "name": "evaluateForTestInFrontend",
      "parameters": [ … ]
    }]
  }]
}
GCD
▸ GCD generates Go code using templates
▸ Remote access to debugger events, functions, types.
▸ Can be updated easily as the protocol files change
GCD WAS GOOD BUT…
▸ Needed something better
▸ Built autogcd to automate:
▸ Trapping console messages
▸ Intercepting network data
▸ Injecting JS
▸ Took some inspiration from WebDriver
GETTING CSP EVENTS
// Subscribe to console messages on this tab
func (b *Browser) StartIntercepting() error {
    b.tab.GetConsoleMessages(b.cspHandler())
    return nil
}

// cspHandler keeps only "security" console messages and parses them as CSP results
func (b *Browser) cspHandler() autogcd.ConsoleMessageFunc {
    return func(tab *autogcd.Tab, message *gcdapi.ConsoleConsoleMessage) {
        if message.Source != "security" {
            return
        }
        parseCsp(b.creeperData.CspResults,
            b.creeperData.ReportOnlyCspResults, message.Text)
    }
}
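The slides do not show parseCsp itself. A minimal sketch of the idea, assuming the console text follows Chrome's usual "Refused to ... because it violates the following Content Security Policy directive" wording, could look like this. The `Violation` type and `parseViolation` function are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Violation is a hypothetical result type for one CSP console message.
type Violation struct {
	Directive  string // e.g. "script-src"
	ReportOnly bool
}

// directiveRe assumes the console text quotes the violated directive right
// after "Content Security Policy directive:".
var directiveRe = regexp.MustCompile(`Content Security Policy directive: "([a-z-]+)`)

// parseViolation extracts the violated directive name from a console message.
// It returns false when the text does not look like a CSP violation.
func parseViolation(text string) (Violation, bool) {
	m := directiveRe.FindStringSubmatch(text)
	if m == nil {
		return Violation{}, false
	}
	return Violation{
		Directive:  m[1],
		ReportOnly: strings.HasPrefix(text, "[Report Only]"),
	}, true
}

func main() {
	msg := `Refused to load the script 'http://example.com/a.js' because it violates ` +
		`the following Content Security Policy directive: "script-src 'self'".`
	v, ok := parseViolation(msg)
	fmt.Println(v.Directive, v.ReportOnly, ok)
}
```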
TRAPPING NETWORK RESPONSES
func (b *Browser) StartIntercepting() error {
    // No request handler (nil); trap responses and "loading finished" events
    b.tab.GetNetworkTraffic(nil, b.responseHandler(), b.respFinishedHandler())
    return nil
}

func (b *Browser) responseHandler() autogcd.NetworkResponseHandlerFunc {
    return func(tab *autogcd.Tab, response *autogcd.NetworkResponse) {
        // (creation of creeperResponse is not shown on the slide)
        creeperResponse.Url = response.Response.Url
        // Wait until respFinishedHandler marks this request's body as ready
        b.networkContainer.WaitFor(response.RequestId)
        creeperResponse.ResponseBody, _ = b.encodeBody(response.RequestId,
            creeperResponse.MimeType,
            creeperResponse.Url)
        b.networkContainer.AddReady(creeperResponse)
    }
}

// Mark the body as ready
func (b *Browser) respFinishedHandler() autogcd.NetworkFinishedHandlerFunc {
    return func(tab *autogcd.Tab, requestId string, dataLength, timeStamp float64) {
        b.networkContainer.BodyReady(requestId)
    }
}
INJECTING JAVASCRIPT
▸ Extract JS libraries and versions
▸ Retire.js and Wappalyzer have some good pointers
▸ Created a JSON file with 86 frameworks
▸ Must wait for the page to be fully loaded
INJECTING JAVASCRIPT - THE QUERIES
{
  "libraries": [ {
    "url": "http://jquery.com/",
    "key": "jquery",
    "statement": "jQuery.fn.jquery"
  }, {
    "url": "https://jquerymobile.com/",
    "key": "jquery-mobile",
    "statement": "jQuery.mobile.version"
  }, {
    "url": "http://www.embeddedjs.com/",
    "key": "embeddedjs 1.0",
    "statement": "(typeof EJS === \"function\" && typeof EJS.Buffer === \"function\") ? \"ejs 1.0\":\"\""
  }, {
    "url": "http://www.embeddedjs.com/",
    "key": "embeddedjs 0.x",
    "statement": "(typeof EJS === \"function\" && typeof EjsScanner === \"function\") ? \"ejs 0.x\":\"\""
  } ]
}
INJECTING JAVASCRIPT - INJECTING
// Run each library's detection statement in the page and record non-empty results
for _, library := range JsLibs.Libraries {
    res, err := b.ExecuteScript(library.Statement)
    if err == nil && string(res) != "" {
        log.Printf("%s library result was: %s\n",
            library.Key,
            string(res))
        report.JavaScriptLibraries[library.Key] = string(res)
    }
}
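For completeness, the query file could be loaded into Go values along these lines. The struct names are assumptions; only the JSON keys ("libraries", "url", "key", "statement") come from the slides.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Library mirrors one entry of the query file; tags follow the JSON keys.
type Library struct {
	URL       string `json:"url"`
	Key       string `json:"key"`
	Statement string `json:"statement"`
}

// LibraryFile is the top-level document with its "libraries" array.
type LibraryFile struct {
	Libraries []Library `json:"libraries"`
}

// loadLibraries parses the JSON definition into Go values.
func loadLibraries(data []byte) (LibraryFile, error) {
	var f LibraryFile
	err := json.Unmarshal(data, &f)
	return f, err
}

func main() {
	data := []byte(`{"libraries":[{"url":"http://jquery.com/","key":"jquery","statement":"jQuery.fn.jquery"}]}`)
	f, err := loadLibraries(data)
	if err != nil {
		panic(err)
	}
	for _, lib := range f.Libraries {
		fmt.Printf("%s -> %s\n", lib.Key, lib.Statement)
	}
}
```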
INJECTING JAVASCRIPT - WHEN IS A PAGE DONE?
▸ DOMContentLoaded doesn’t handle dynamically loaded JS
▸ Listen for DOM change events
▸ Page loaded if no DOM change events occur for > 2 seconds
▸ Timeout after 5 seconds
CHALLENGES
CHALLENGES - CONTAMINATION
[Timeline: Start Capture → Load URL → Document Loaded → Stop Capture]
CHALLENGES - CONTAMINATION - SOLUTION
[Timeline: Borrow Browser → Start Capture → Load URL → Document Loaded → Stop Capture → Kill Browser → Start/Add Pool]
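The borrow, kill, and replace cycle can be sketched with a buffered channel as the pool. The `Browser` and `Pool` types here are stand-ins for illustration; the real agent manages Chrome processes.

```go
package main

import "fmt"

// Browser is a stand-in for a Chrome process.
type Browser struct{ id int }

// Pool hands out fresh browsers and never reuses one after a page load.
type Pool struct {
	ready  chan *Browser
	nextID int
}

// NewPool starts size browsers and adds them to the pool.
func NewPool(size int) *Pool {
	p := &Pool{ready: make(chan *Browser, size)}
	for i := 0; i < size; i++ {
		p.startAndAdd()
	}
	return p
}

// startAndAdd launches a new browser and makes it available.
// In this sketch it is only ever called from one goroutine.
func (p *Pool) startAndAdd() {
	p.nextID++
	p.ready <- &Browser{id: p.nextID}
}

// Borrow blocks until a clean browser is available.
func (p *Pool) Borrow() *Browser { return <-p.ready }

// Kill discards a used browser and replaces it with a fresh one, so state
// from one site cannot contaminate the next capture.
func (p *Pool) Kill(b *Browser) {
	// A real implementation would terminate the Chrome process for b here.
	p.startAndAdd()
}

func main() {
	pool := NewPool(2)
	for _, url := range []string{"http://a.example/", "http://b.example/", "http://c.example/"} {
		b := pool.Borrow()
		fmt.Printf("browser %d loads %s\n", b.id, url)
		pool.Kill(b)
	}
}
```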
CHALLENGES - CHROME BUG #1
▸ Opening tabs excessively can cause tabs to stop responding to the debugger protocol
CHALLENGES - CHROME BUG #1 - SOLUTION
▸ Mark tabs as ‘dead’
▸ If max dead tab count is reached, drain active URLs and kill Chrome
CRASHSAFARI.COM
CHALLENGES - CHROME BUG #2 - CRASHSAFARI.COM
▸ Would completely kill Chrome *and* the agent
▸ Lost all active tabs
▸ This site cost me about 2-3 weeks of development time
CHALLENGES - CRASHSAFARI.COM - SOLUTION
▸ Created killface package
▸ Sends a notification to stop active work
▸ Worker count dynamically adjusted to 1
▸ Pauses queue, runs all unfinished URLs again
▸ Once active count is 0, restart normally
OTHER CHALLENGES
✘ NSQ messages too large, zipping ineffective
✓ Split response data/report data
✘ Sites block AWS IP ranges (craigslist.com, etc.)
☹ Timeout…
✘ Concurrency issues
✓ Very careful use of goroutines, channels and timers
✘ Site analysis failures/timeouts
✓ Try 3 times, keep track of retry state
✓ During retry, open a new browser and work on an additional URL
STORING THE DATA WITH: DB WRITERS & S3
DB WRITERS: STORING THE DATA
PREVIOUSLY…
▸ Creeper v0 had many problems
▸ RDS did not support PostgreSQL 9.5
▸ Duplicate data
▸ For v1, wrote to disk, SHA1 of contents:
▸ /job/files/5/a/b/c/5abcfbe73e39e0572a939b09f1eb16d7.html
▸ v1 did not shard database tables
▸ Database tables were normalized
▸ Lock contention
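The v1 on-disk layout (a file named after the SHA1 of its contents, nested under directories taken from the first characters of the hash) can be sketched like this. The function name is hypothetical; the base directory, four directory levels, and extension follow the example path above.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path"
)

// shardedPath places a file under four one-character directories taken from
// the start of its content hash, e.g. <base>/5/a/b/c/5abc....html.
func shardedPath(base string, contents []byte, ext string) string {
	sum := sha1.Sum(contents)
	name := hex.EncodeToString(sum[:])
	return path.Join(base, name[0:1], name[1:2], name[2:3], name[3:4], name+ext)
}

func main() {
	fmt.Println(shardedPath("/job/files", []byte("<html></html>"), ".html"))
}
```

Identical content always maps to the same path, which removes duplicates, and the nested directories keep any single directory from growing too large.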
DATABASE REFRESHER - NORMALIZING
FLATTENED:
url                   header_name        header_value
http://veracode.com   x-xss-protection   1; mode=block
http://codeblue.jp    x-xss-protection   1; mode=block
http://google.jp      x-xss-protection   1; mode=block report-uri
NORMALIZED:
url                   header_name_id   header_value_id
http://veracode.com   0                0
http://codeblue.jp    0                0
http://google.jp      0                1

header_name_id   header_name
0                x-xss-protection

header_value_id   header_value
0                 1; mode=block
1                 1; mode=block report-uri …
CHALLENGES - GETTING THE DATA IN QUICKLY
▸ Get the data out of the DB writers as soon as possible
▸ Careful to not overload the database with many connections
▸ Reduce lock contention for writing
SOLUTION #1 - GETTING THE DATA IN QUICKLY
▸ DB Writers batch up reports and responses
▸ Inserted every 2.5-3.5 seconds
▸ Reduces number of required DB connections
SOLUTION #1 BATCHER
// AddReport queues a report for the next batch insert (blocks until the pool has room)
func (b *Batcher) AddReport(r *creeper_types.CreeperReport) {
    b.reportPool <- r
    atomic.AddInt32(&b.reportCount, 1)
}

// EmptyReports drains every queued report without blocking
func (b *Batcher) EmptyReports() []*creeper_types.CreeperReport {
    reports := make([]*creeper_types.CreeperReport, 0)
    for {
        select {
        case report := <-b.reportPool:
            reports = append(reports, report)
        default:
            return reports
        }
    }
}
SOLUTION #2 - GETTING THE DATA IN QUICKLY
▸ Insert into temporary table using COPY FROM
▸ Rows are then extracted from the temporary table and INSERTed into the final table. This allows for UPSERTs:
    INSERT INTO header_names (header_name)
    SELECT responses_tmp.header_name FROM responses_tmp
    ON CONFLICT DO NOTHING;
CHALLENGES - LARGE TABLES
▸ INSERT INTO … SELECT … against a table with 80,000,000 rows
▸ As tables got bigger, DB writers slowed down
▸ This is not scalable
SOLUTION - TABLE SHARDING
▸ Much like sharding for the file system
▸ Requires a key:
▸ URL ID (e.g. 1 = google.com, 2 = microsoft.com)
▸ Only large tables require sharding
TABLE SHARDING
[Diagram: a Writer computes the shard key from the input ID and writes to the matching shard table in the DB (shardKey = 1, shardKey = 2, shardKey = 3)]
CREATING A SHARD KEY
▸ Choose the number of times to shard your tables:
▸ shardKey = input_id % 32
▸ Created PL/pgSQL functions:
▸ EXECUTE merge_headers(job, shardKey)
create unlogged table if not exists job_0_responses (
    response_id serial primary key,
    input_id integer not null,
    body_hash varchar(64) not null,
    resp_url bytea not null,
    resp_uuid varchar(64) unique not null,
    resp_type_id integer references resp_types (resp_type_id) not null,
    status_id integer references status_lines (status_id) not null,
    status_code integer,
    mime_type_id integer references mime_types (mime_type_id) not null,
    response_time bigint
);
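On the writer side, picking the shard for a row is a single modulo. In the sketch below the shard count comes from the slide; the per-shard table naming scheme is hypothetical, since the slides only show "job_0_responses".

```go
package main

import "fmt"

const shardCount = 32 // from "shardKey = input_id % 32"

// shardKey maps a URL's input ID to one of shardCount shards.
func shardKey(inputID int) int {
	return inputID % shardCount
}

// shardTable builds a per-job, per-shard table name (hypothetical scheme).
func shardTable(jobID, inputID int) string {
	return fmt.Sprintf("job_%d_responses_%d", jobID, shardKey(inputID))
}

func main() {
	for _, id := range []int{1, 2, 33, 64} {
		fmt.Println(id, "->", shardTable(0, id))
	}
}
```

Because every row for a given URL shares one input ID, all of its responses land in the same shard.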
CONS OF SHARDING
▸ Added complexity for querying
▸ Best to create a new table with all data for reporting
▸ In the future, may use Citus for sharding across multiple databases
RESPONSE DATA (JS/HTML)
MOVING TO S3
▸ S3 limits requests to 100/s, but we were pushing 200-2000/s
▸ Had to contact support
▸ Exponential backoff, retry 10 times
▸ Hash is stored in response table
▸ HeadObject first to check existence, then PutObject
▸ HeadObject requests are way cheaper
LASTLY…
▸ Created unlogged tables
▸ Modified PostgreSQL configuration:
▸ Set checkpoints 5 minutes (max) instead of 1
▸ Enabled fsync
▸ Set max_wal_size 256
THE RESULTS
A LOOK AT THE DATA
THE RESULTS: A LOOK AT DATA
SCAN STATISTICS
Responses               72,193,155
Headers                 525,385,900
JS Results              1,943,925
URLs w/Errors           67,315
Redirected to HTTPS     145,268
URLs w/CSP Violations   740
Scan Time               15 Hours
Cost                    $343 / ¥35,063
CSP VIOLATIONS
▸ 722 out of 4965 sites using CSP had violations
▸ Security sites:
▸ https://www.globalsign.com/en/, http://secunia.com/,
▸ https://lastpass.com/, https://www.avant.com/, http://www.veracode.com/
▸ Well known organizations:
▸ http://www.alibaba.com, https://www.doubleclickbygoogle.com
▸ https://mozillians.org/en-US/
SUM OF CSP VIOLATION TYPES
[Bar chart, axis 0-3000, by violated directive: SCRIPTSRC, IMGSRC, FRAMESRC, FONTSRC, STYLESRC, CONNECTSRC, MEDIASRC, CHILDSRC, OBJECTSRC, BASEURI, FORMACTION, MANIFESTSRC]
TOP JAVASCRIPT LIBRARIES > 3000
[Bar chart, axis 0-800,000: JQUERY, JQUERY-UI, MODERNIZR, JQUERY-UI-DIALOG, YEPNOPE, JQUERY-UI-AUTOCOMPLETE, JQUERY-UI-TOOLTIP, BOOTSTRAP, HTML5SHIV, UNDERSCORE, JQUERY.PRETTYPHOTO, PROTOTYPEJS, DRUPAL, MOOTOOLS, MEJS, BACKBONE.JS, ANGULARJS, FOUNDATION, JWPLAYER, REQUIREJS, HANDLEBARS.JS, HAMMERJS, JPLAYER, MUSTACHE.JS, SCRIPTACULOUS, SHADOWBOX, ZEROCLIPBOARD, YUI, RAPHAEL, DATATABLES, KNOCKOUT]
JAVASCRIPT ‘NEXTGEN’ FRAMEWORKS > 100
[Bar chart, axis 0-18,000: BACKBONE.JS, ANGULARJS, FOUNDATION, YUI, KNOCKOUT, DOJO, REACTJS, MARIONETTEJS, VUEJS, EMBER, METEOR, MITHRIL, EXTJS, POLYMER]
VULNERABILITY COUNTS
[Bar chart, axis 0-80,000: JQUERY, JQUERY-UI-DIALOG, JQUERY.PRETTYPHOTO, ANGULARJS, JQUERY-UI-TOOLTIP, JPLAYER, HANDLEBARS.JS, ZEROCLIPBOARD, MUSTACHE.JS, YUI, PROTOTYPEJS, MEJS, JWPLAYER, DOJO, EMBER, TINYMCE, PLUPLOAD, JQUERY-MOBILE, CKEDITOR]
LONGEST SECURITY HEADER AWARD - HTTPS://WWW.INSIGHTGUIDES.COM/
Content-Security-Policy: default-src 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-
analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://
*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://
www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://
*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-
static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://
www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://
googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://
hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://
www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://
google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://
stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://
www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:; script-src 'self' http://www.googletagmanager.com https://www.googletagmanager.com http://
tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://
*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com
http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://
fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com
http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-
collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://
connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com
https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site
http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://
www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk
https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://
instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com 'unsafe-eval' 'unsafe-inline' https://apis.google.com blob:;
connect-src * 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-
analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://
www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://
fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com
https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net
https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com
https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://
www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://
www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://
www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://
google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://
platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
SOME OF MY FAVORITE HTTP STATUS LINES
▸ HTTP 500 access denied ("java.io.FilePermission" "D:\home\XXXXXXXXX.com\ori\ModelGlue\unity\eventrequest\EventRequest.cfc" "read")
▸ HTTP 500 "Duplicate entry '1473335051' for key 'timestamp' SQL=INSERT INTO `#__zt_visitor_counter` (`id`,`timestamp`,`visits`,`guests`,`ipaddress`,`useragent`) VALUES (null, '1473335051', 1 , 1 , '54.208.81.16', 'chrome')"
▸ HTTP 500 "Server Made Big Boo"
ABSOLUTE FAVORITE STATUS LINE
“NO HACKING”
CONCLUSION
▸ Use NSQ, seriously.
▸ Concurrency can be difficult
▸ Batch data before inserting to DB
▸ If DB rows > a few million, consider sharding
▸ Test different types of table schema for performance
▸ Treat browsers like garbage and handle appropriately
QUESTIONS?
▸ twitter: @_wirepair
▸ github: wirepair
▸ gcd: https://github.com/wirepair/gcd
▸ autogcd: https://github.com/wirepair/autogcd
▸ killface: https://github.com/wirepair/killface
▸ Thanks to all my coworkers supporting and listening to my daily rants!

More Related Content

More from CODE BLUE

[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...
[cb22] Hayabusa  Threat Hunting and Fast Forensics in Windows environments fo...[cb22] Hayabusa  Threat Hunting and Fast Forensics in Windows environments fo...
[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...CODE BLUE
 
[cb22] Tales of 5G hacking by Karsten Nohl
[cb22] Tales of 5G hacking by Karsten Nohl[cb22] Tales of 5G hacking by Karsten Nohl
[cb22] Tales of 5G hacking by Karsten NohlCODE BLUE
 
[cb22] Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...
[cb22]  Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...[cb22]  Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...
[cb22] Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...CODE BLUE
 
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...CODE BLUE
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之CODE BLUE
 
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...CODE BLUE
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo PupilloCODE BLUE
 
[cb22] ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...
[cb22]  ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...[cb22]  ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...
[cb22] ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...CODE BLUE
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(2)by Allan Friedman
[cb22]  「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(2)by Allan Friedman [cb22]  「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(2)by Allan Friedman
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(2)by Allan Friedman CODE BLUE
 
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...CODE BLUE
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション (1)by 高橋 郁夫
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション (1)by  高橋 郁夫[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション (1)by  高橋 郁夫
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション (1)by 高橋 郁夫CODE BLUE
 
[cb22] Are Embedded Devices Ready for ROP Attacks? -ROP verification for low-...
[cb22] Are Embedded Devices Ready for ROP Attacks? -ROP verification for low-...[cb22] Are Embedded Devices Ready for ROP Attacks? -ROP verification for low-...
[cb22] Are Embedded Devices Ready for ROP Attacks? -ROP verification for low-...CODE BLUE
 
[cb22] Wslinkのマルチレイヤーな仮想環境について by Vladislav Hrčka
[cb22] Wslinkのマルチレイヤーな仮想環境について by Vladislav Hrčka [cb22] Wslinkのマルチレイヤーな仮想環境について by Vladislav Hrčka
[cb22] Wslinkのマルチレイヤーな仮想環境について by Vladislav Hrčka CODE BLUE
 
[cb22] Under the hood of Wslink’s multilayered virtual machine en by Vladisla...
[cb22] Under the hood of Wslink’s multilayered virtual machine en by Vladisla...[cb22] Under the hood of Wslink’s multilayered virtual machine en by Vladisla...
[cb22] Under the hood of Wslink’s multilayered virtual machine en by Vladisla...CODE BLUE
 
[cb22] CloudDragon’s Credential Factory is Powering Up Its Espionage Activiti...
[cb22] CloudDragon’s Credential Factory is Powering Up Its Espionage Activiti...[cb22] CloudDragon’s Credential Factory is Powering Up Its Espionage Activiti...
[cb22] CloudDragon’s Credential Factory is Powering Up Its Espionage Activiti...CODE BLUE
 
[cb22] From Parroting to Echoing: The Evolution of China’s Bots-Driven Info...
[cb22]  From Parroting to Echoing:  The Evolution of China’s Bots-Driven Info...[cb22]  From Parroting to Echoing:  The Evolution of China’s Bots-Driven Info...
[cb22] From Parroting to Echoing: The Evolution of China’s Bots-Driven Info...CODE BLUE
 
[cb22] Who is the Mal-Gopher? - Implementation and Evaluation of “gimpfuzzy”...
[cb22]  Who is the Mal-Gopher? - Implementation and Evaluation of “gimpfuzzy”...[cb22]  Who is the Mal-Gopher? - Implementation and Evaluation of “gimpfuzzy”...
[cb22] Who is the Mal-Gopher? - Implementation and Evaluation of “gimpfuzzy”...CODE BLUE
 
[cb22] Mal-gopherとは?Go系マルウェアの分類のためのgimpfuzzy実装と評価 by 澤部 祐太, 甘粕 伸幸, 野村 和也
[cb22] Mal-gopherとは?Go系マルウェアの分類のためのgimpfuzzy実装と評価 by 澤部 祐太, 甘粕 伸幸, 野村 和也[cb22] Mal-gopherとは?Go系マルウェアの分類のためのgimpfuzzy実装と評価 by 澤部 祐太, 甘粕 伸幸, 野村 和也
[cb22] Mal-gopherとは?Go系マルウェアの分類のためのgimpfuzzy実装と評価 by 澤部 祐太, 甘粕 伸幸, 野村 和也CODE BLUE
 
[cb22] Tracking the Entire Iceberg - Long-term APT Malware C2 Protocol Emulat...
[cb22] Tracking the Entire Iceberg - Long-term APT Malware C2 Protocol Emulat...[cb22] Tracking the Entire Iceberg - Long-term APT Malware C2 Protocol Emulat...
[cb22] Tracking the Entire Iceberg - Long-term APT Malware C2 Protocol Emulat...CODE BLUE
 
[cb22] Fight Against Malware Development Life Cycle by Shusei Tomonaga and Yu...
[cb22] Fight Against Malware Development Life Cycle by Shusei Tomonaga and Yu...[cb22] Fight Against Malware Development Life Cycle by Shusei Tomonaga and Yu...
[cb22] Fight Against Malware Development Life Cycle by Shusei Tomonaga and Yu...CODE BLUE
 

More from CODE BLUE (20)

[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...
[cb22] Hayabusa  Threat Hunting and Fast Forensics in Windows environments fo...[cb22] Hayabusa  Threat Hunting and Fast Forensics in Windows environments fo...
[cb22] Hayabusa Threat Hunting and Fast Forensics in Windows environments fo...
 
[cb22] Tales of 5G hacking by Karsten Nohl
[cb22] Tales of 5G hacking by Karsten Nohl[cb22] Tales of 5G hacking by Karsten Nohl
[cb22] Tales of 5G hacking by Karsten Nohl
 
[cb22] Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...
[cb22]  Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...[cb22]  Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...
[cb22] Your Printer is not your Printer ! - Hacking Printers at Pwn2Own by A...
 
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(4) by 板橋 博之
 
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
[cb22] "The Present and Future of Coordinated Vulnerability Disclosure" Inter...
 
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo
[cb22] 「協調された脆弱性開示の現在と未来」国際的なパネルディスカッション(3) by Lorenzo Pupillo
 
[cb22] ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...
[cb22]  ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...[cb22]  ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...
[cb22] ”The Present and Future of Coordinated Vulnerability Disclosure” Inte...
 
[CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

  • 1. ISAAC DAWSON, AROUND THE WEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION VERACODE 15
  • 2. ABOUT ME:
      ▸ Previously at @stake, Symantec (10 years)
      ▸ Moved into research role at Veracode, Inc. (6 years)
      ▸ Living in Japan for 12 years
      ▸ I <3
  • 3. IT ALL STARTED IN 2012…
  • 4. SECURITY HEADER SCANNING HISTORY
      ▸ All scanners use the Alexa Top 1 Million URLs
      ▸ Galexa (November 2012 - March 2014)
      ▸ Golexa (March 2014 - February 2016)
      ▸ Creeper v0-v1 (February 2016 - July 2016)
      ▸ Creeper v2 (July 2016 - …)
  • 6. SUMMARY OF SYSTEMS & COMPONENTS
      ▸ Admin (x1) - Manages jobs
      ▸ Agents (x50) - Analyze URLs
      ▸ DB Writers (x4) - Feed analysis data into the DB & S3
      ▸ Database (x1) - PostgreSQL 9.5 DB
      ▸ NSQ - A message queue for URLs, reports and responses
      ▸ S3 - Stores serialized DOM and HTML/JS
  • 7. THE MESSAGE QUEUE - NSQD, NSQLOOKUPD
      ▸ NSQ is an easy-to-deploy message queue
      ▸ JSON messages between all systems
      ▸ All agents point to Admin service running NSQLookupd
  • 8. HELPFUL NSQ FEATURES
      // Create consumer
      c.urlConsumer, err = nsq.NewConsumer(job.Topics["url"], creeper_types.UrlChannel, cfg)

      // Process numBrowsers messages concurrently (7)
      c.urlConsumer.AddConcurrentHandlers(nsq.HandlerFunc(c.processUrls), numBrowsers)

      // Job taking too long to handle/process a message?
      msg.Touch() // notify we are still working on this message

      // Need to requeue because chrome crashed?
      msg.RequeueWithoutBackoff(-1)

      // Need to change max # of inflight messages?
      c.urlConsumer.ChangeMaxInFlight(c.getInflightCount())
  • 9. DATAFLOW (diagram; labels: ADMIN, AGENT x3, WRITER x3, DB, S3, DATA STORAGE)
  • 12. CREEPER AGENTS: GETTING THE DATA - BROWSER AUTOMATION REQUIREMENTS
      ▸ Automatable
      ▸ Fast
      ▸ Capture network
      ▸ Capture various browser events (CSP violations)
      ▸ Inject JavaScript
  • 13. CHOSE CHROME, FOR OBVIOUS REASONS…
      ▸ Each agent runs 3-6 tabs concurrently
      ▸ Headless, uses Xvfb
      ▸ Can get full read access to network response data
      ▸ Easily inject JavaScript
      ▸ Can subscribe to console messages
  • 14. AGENT DESIGN (diagram; labels: CREEPER AGENT, BROWSER MANAGER, ANALYZER, REPORTER, APP LOGIC)
  • 16. GOOGLE CHROME REMOTE DEBUGGER
      ▸ Huge definition files: browser_protocol.json and js_protocol.json
      {
        "version": { "major": "1", "minor": "1" },
        "domains": [{
          "domain": "Inspector",
          "hidden": true,
          "types": [],
          "commands": [{
            "name": "enable",
            "description": "Enables inspector domain...",
            "handlers": ["browser", "renderer"]
          }],
          "events": [{
            "name": "evaluateForTestInFrontend",
            "parameters": [ … ]
          }]
        }]
      }
  • 17. GCD
      ▸ GCD generates Go code using templates
      ▸ Remote access to debugger events, functions, types
      ▸ Can be updated easily as the protocol files change
  • 18. GCD WAS GOOD BUT…
      ▸ Needed something better
      ▸ Built autogcd to automate:
        ▸ Trapping console messages
        ▸ Intercepting network data
        ▸ Injecting JS
      ▸ Took some inspiration from WebDriver
  • 19. GETTING CSP EVENTS
      // Subscribe to console messages for this tab.
      func (b *Browser) StartIntercepting() error {
          b.tab.GetConsoleMessages(b.cspHandler())
          return nil
      }

      // Only console messages from the "security" source are parsed as CSP reports.
      func (b *Browser) cspHandler() autogcd.ConsoleMessageFunc {
          return func(tab *autogcd.Tab, message *gcdapi.ConsoleConsoleMessage) {
              if message.Source != "security" {
                  return
              }
              parseCsp(b.creeperData.CspResults, b.creeperData.ReportOnlyCspResults, message.Text)
          }
      }
  • 20. TRAPPING NETWORK RESPONSES
      func (b *Browser) StartIntercepting() error {
          b.tab.GetNetworkTraffic(nil, b.responseHandler(), b.respFinishedHandler())
          return nil
      }

      func (b *Browser) responseHandler() autogcd.NetworkResponseHandlerFunc {
          return func(tab *autogcd.Tab, response *autogcd.NetworkResponse) {
              // creeperResponse setup is not shown on the slide
              creeperResponse.Url = response.Response.Url
              // block until the finished handler marks this request's body as ready
              b.networkContainer.WaitFor(response.RequestId)
              creeperResponse.ResponseBody, _ = b.encodeBody(response.RequestId, creeperResponse.MimeType, creeperResponse.Url)
              b.networkContainer.AddReady(creeperResponse)
          }
      }

      // mark the body as ready
      func (b *Browser) respFinishedHandler() autogcd.NetworkFinishedHandlerFunc {
          return func(tab *autogcd.Tab, requestId string, dataLength, timeStamp float64) {
              b.networkContainer.BodyReady(requestId)
          }
      }
  • 21. INJECTING JAVASCRIPT
      ▸ Extract JS libraries and versions
      ▸ Retire.js and Wappalyzer have some good pointers
      ▸ Created a JSON file with 86 frameworks
      ▸ Must wait for the page to be fully loaded
  • 22. INJECTING JAVASCRIPT - THE QUERIES
      {
        "libraries": [
          {
            "url": "http://jquery.com/",
            "key": "jquery",
            "statement": "jQuery.fn.jquery"
          },
          {
            "url": "https://jquerymobile.com/",
            "key": "jquery-mobile",
            "statement": "jQuery.mobile.version"
          },
          {
            "url": "http://www.embeddedjs.com/",
            "key": "embeddedjs 1.0",
            "statement": "(typeof EJS === \"function\" && typeof EJS.Buffer === \"function\") ? \"ejs 1.0\":\"\""
          },
          {
            "url": "http://www.embeddedjs.com/",
            "key": "embeddedjs 0.x",
            "statement": "(typeof EJS === \"function\" && typeof EjsScanner === \"function\") ? \"ejs 0.x\":\"\""
          }
        ]
      }
  • 23. INJECTING JAVASCRIPT - INJECTING
      // Run each library's detection statement in the page; a non-empty result is the version.
      for _, library := range JsLibs.Libraries {
          res, err := b.ExecuteScript(library.Statement)
          if err == nil && string(res) != "" {
              log.Printf("%s library result was: %s\n", library.Key, string(res))
              report.JavaScriptLibraries[library.Key] = string(res)
          }
      }
  • 24. INJECTING JAVASCRIPT - WHEN IS A PAGE DONE?
      ▸ DOMContentLoaded doesn’t handle dynamically loaded JS
      ▸ Listen for DOM change events
      ▸ Page loaded if no DOM change events occur for > 2 seconds
      ▸ Timeout after 5 seconds
  • 26. CHALLENGES - CONTAMINATION (timeline diagram; steps: Start Capture, Load URL, Document Loaded, Stop Capture)
  • 27. CHALLENGES - CONTAMINATION - SOLUTION (timeline diagram; steps: Borrow Browser, Start Capture, Load URL, Document Loaded, Stop Capture, Kill Browser, Start/Add Pool)
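The borrow, kill and replace cycle from this diagram could look roughly like the pool below. It is a sketch under assumptions: `Browser`, `Pool`, `Borrow` and `Discard` are hypothetical names, and `Kill` only flips a flag where a real agent would terminate the Chrome process.

```go
package main

import (
	"fmt"
	"sync"
)

// Browser stands in for a running Chrome instance (hypothetical type).
type Browser struct {
	ID   int
	dead bool
}

// Kill marks the browser as terminated; a real agent would stop the process.
func (b *Browser) Kill() { b.dead = true }

// Pool hands out fresh browsers and never reuses one across URLs,
// so captured data from one URL cannot leak into the next.
type Pool struct {
	mu     sync.Mutex
	nextID int
	ready  chan *Browser
}

// NewPool starts size browsers and makes them available for borrowing.
func NewPool(size int) *Pool {
	p := &Pool{ready: make(chan *Browser, size)}
	for i := 0; i < size; i++ {
		p.ready <- p.start()
	}
	return p
}

// start creates a new browser with a unique ID.
func (p *Pool) start() *Browser {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.nextID++
	return &Browser{ID: p.nextID}
}

// Borrow blocks until a clean browser is available.
func (p *Pool) Borrow() *Browser { return <-p.ready }

// Discard kills a borrowed browser and adds a replacement to the pool.
// It must only be called with a browser obtained from Borrow.
func (p *Pool) Discard(b *Browser) {
	b.Kill()
	p.ready <- p.start()
}

func main() {
	pool := NewPool(2)
	for _, url := range []string{"http://a.example", "http://b.example", "http://c.example"} {
		b := pool.Borrow()
		fmt.Printf("browser %d analyzes %s\n", b.ID, url)
		pool.Discard(b)
	}
}
```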
  • 28. CHALLENGES - CHROME BUG #1
      ▸ Opening tabs excessively can cause tabs to stop responding to the debugger protocol
  • 29. CHALLENGES - CHROME BUG #1 - SOLUTION
      ▸ Mark tabs as ‘dead’
      ▸ If max dead tab count is reached, drain active URLs and kill Chrome
  • 31. CHALLENGES - CHROME BUG #2 - CRASHSAFARI.COM
      ▸ Would completely kill Chrome *and* the agent
      ▸ Lost all active tabs
      ▸ This site cost me about 2-3 weeks of development time
  • 32. CHALLENGES - CRASHSAFARI.COM - SOLUTION
      ▸ Created killface package
      ▸ Sends a notification to stop active work
      ▸ Worker count dynamically adjusted to 1
      ▸ Pauses queue, runs all unfinished URLs again
      ▸ Once active count is 0, restart normally
  • 33. OTHER CHALLENGES
      ✘ NSQ messages too large, zipping ineffective
        ✓ Split response data/report data
      ✘ Sites block AWS IP ranges (craigslist.com etc.)
        ☹ Timeout…
      ✘ Concurrency issues
        ✓ Very careful use of goroutines, channels and timers
      ✘ Site analysis failures/timeouts
        ✓ Try 3 times, keep track of retry state
        ✓ During retry, open a new browser and work on an additional URL
  • 34. STORING THE DATA WITH: DB WRITERS & S3
  • 35. DB WRITERS: STORING THE DATA - PREVIOUSLY…
      ▸ Creeper v0 had many problems
      ▸ RDS did not support PostgreSQL 9.5
      ▸ Duplicate data
      ▸ For v1, wrote to disk, SHA1 of contents:
        ▸ /job/files/5/a/b/c/5abcfbe73e39e0572a939b09f1eb16d7.html
      ▸ v1 did not shard database tables
      ▸ Database tables were normalized
      ▸ Lock contention
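The v1 on-disk layout shown on this slide spreads files across directories named after the first four characters of the content hash. A minimal sketch of that mapping follows, assuming a hex digest and the hypothetical helper names `shardedPath` and `contentPath`. The slide names SHA1, so that is what the sketch hashes with.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path"
)

// shardedPath builds root/d0/d1/d2/d3/digest+ext, where d0..d3 are the first
// four characters of the hex digest. The digest must be at least 4 characters.
func shardedPath(root, digest, ext string) string {
	return path.Join(root, digest[0:1], digest[1:2], digest[2:3], digest[3:4], digest+ext)
}

// contentPath hashes the body and returns the sharded location for it.
// Identical bodies map to the same path, which avoids duplicate files.
func contentPath(root string, body []byte, ext string) string {
	sum := sha1.Sum(body)
	return shardedPath(root, hex.EncodeToString(sum[:]), ext)
}

func main() {
	// Reproduces the example path from the slide.
	fmt.Println(shardedPath("/job/files", "5abcfbe73e39e0572a939b09f1eb16d7", ".html"))
	fmt.Println(contentPath("/job/files", []byte("<html></html>"), ".html"))
}
```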
  • 36. DATABASE REFRESHER - NORMALIZING
      FLATTENED:
        url                  header_name       header_value
        http://veracode.com  x-xss-protection  1; mode=block
        http://codeblue.jp   x-xss-protection  1; mode=block
        http://google.jp     x-xss-protection  1; mode=block report-uri

      NORMALIZED:
        url                  header_name_id  header_value_id
        http://veracode.com  0               0
        http://codeblue.jp   0               0
        http://google.jp     0               1

        header_name_id  header_name
        0               x-xss-protection

        header_value_id  header_value
        0                1; mode=block
        1                1; mode=block report-uri …
  • 37. CHALLENGES - GETTING THE DATA IN QUICKLY
      ▸ Get the data out of the DB writers as soon as possible
      ▸ Careful to not overload the database with many connections
      ▸ Reduce lock contention for writing
  • 38. SOLUTION #1 - GETTING THE DATA IN QUICKLY
      ▸ DB Writers batch up reports and responses
      ▸ Inserted every 2.5-3.5 seconds
      ▸ Reduces number of required DB connections
  • 39. SOLUTION #1 BATCHER
      // AddReport queues a report for the next batch insert.
      func (b *Batcher) AddReport(r *creeper_types.CreeperReport) {
          b.reportPool <- r
          atomic.AddInt32(&b.reportCount, 1)
      }

      // EmptyReports drains every queued report without blocking.
      func (b *Batcher) EmptyReports() []*creeper_types.CreeperReport {
          reports := make([]*creeper_types.CreeperReport, 0)
          for {
              select {
              case report := <-b.reportPool:
                  reports = append(reports, report)
              default:
                  return reports
              }
          }
      }
  • 40. SOLUTION #2 - GETTING THE DATA IN QUICKLY
      ▸ Insert into temporary table using COPY FROM
      ▸ Extracted from temporary table and INSERTed into final table. This allows for UPSERTs:
          INSERT INTO header_names (header_name)
          SELECT responses_tmp.header_name FROM responses_tmp
          ON CONFLICT DO NOTHING;
  • 41. CHALLENGES - LARGE TABLES
      ▸ INSERT INTO … FROM SELECT … on a table with 80,000,000 rows
      ▸ As tables got bigger, DB writers slowed down
      ▸ This is not scalable
  • 42. SOLUTION - TABLE SHARDING
      ▸ Much like sharding for the file system
      ▸ Requires a key:
        ▸ URL ID. (Ex: 1,google.com 2,microsoft.com etc)
      ▸ Only large tables require sharding
  • 43. TABLE SHARDING (diagram; labels: WRITER, shardKey % inputId, shardKey = 1, shardKey = 2, shardKey = 3, DB)
  • 44. CREATING A SHARD KEY
      ▸ Choose the number of times to shard your tables:
        ▸ shardKey = input_id % 32
      ▸ Created PL/pgSQL functions:
          create unlogged table if not exists job_0_responses (
              response_id serial primary key,
              input_id integer not null,
              body_hash varchar(64) not null,
              resp_url bytea not null,
              resp_uuid varchar(64) unique not null,
              resp_type_id integer references resp_types (resp_type_id) not null,
              status_id integer references status_lines (status_id) not null,
              status_code integer,
              mime_type_id integer references mime_types (mime_type_id) not null,
              response_time bigint
          );
          EXECUTE merge_headers(job, shardKey)
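On the writer side, the `shardKey = input_id % 32` rule means a batch has to be split by shard before it is merged. A minimal sketch of that grouping follows. `groupByShard` is a hypothetical helper, and how the shard key maps to an actual table name is not shown on the slides, so the sketch leaves that out.

```go
package main

import "fmt"

// numShards matches the modulus used on the slide.
const numShards = 32

// shardKey maps a (non-negative) input ID to its shard.
func shardKey(inputID int) int { return inputID % numShards }

// groupByShard splits a batch of input IDs so each shard's rows can be
// written to that shard's table in one statement.
func groupByShard(inputIDs []int) map[int][]int {
	groups := make(map[int][]int)
	for _, id := range inputIDs {
		k := shardKey(id)
		groups[k] = append(groups[k], id)
	}
	return groups
}

func main() {
	groups := groupByShard([]int{1, 2, 33, 34, 64})
	fmt.Println(groups[1], groups[2], groups[0])
}
```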
  • 45. CONS WITH SHARDING
      ▸ Added complexity for querying
      ▸ Best to create a new table with all data for reporting
      ▸ In the future, may use Citus for sharding across multiple databases
  • 46. RESPONSE DATA (JS/HTML)
  • 47. MOVING TO S3
      ▸ S3 limits requests to 100 rps, but we were pushing 200-2000 rps
      ▸ Had to contact support
      ▸ Exponential backoff, retry 10 times
      ▸ Hash is stored in response table
      ▸ HeadObject first to check existence, then PutObject
      ▸ HeadObjects are way cheaper
  • 48. LASTLY…
      ▸ Created unlogged tables
      ▸ Modified PostgreSQL configuration:
        ▸ Set checkpoints 5 minutes (max) instead of 1
        ▸ Enabled fsync
        ▸ Set max_wal_size 256
  • 49. THE RESULTS: A LOOK AT THE DATA
  • 50. THE RESULTS: A LOOK AT DATA - SCAN STATISTICS
      Responses              72,193,155
      Headers                525,385,900
      JS Results             1,943,925
      URLs w/Errors          67,315
      Redirected to HTTPS    145,268
      URLs w/CSP Violations  740
      Scan Time              15 Hours
      Cost                   $343 / 35,063 yen
  • 51. CSP VIOLATIONS
      ▸ 722 out of 4965 sites using CSP had violations
      ▸ Security sites:
        ▸ https://www.globalsign.com/en/, http://secunia.com/,
        ▸ https://lastpass.com/, https://www.avant.com/, http://www.veracode.com/
      ▸ Well known organizations:
        ▸ http://www.alibaba.com, https://www.doubleclickbygoogle.com
        ▸ https://mozillians.org/en-US/
  • 52. SUM OF CSP VIOLATION TYPES (bar chart, axis 0-3000; categories: SCRIPTSRC, IMGSRC, FRAMESRC, FONTSRC, STYLESRC, CONNECTSRC, MEDIASRC, CHILDSRC, OBJECTSRC, BASEURI, FORMACTION, MANIFESTSRC)
  • 53. TOP JAVASCRIPT LIBRARIES > 3000 (bar chart, axis 0-800000; categories: JQUERY, JQUERY-UI, MODERNIZR, JQUERY-UI-DIALOG, YEPNOPE, JQUERY-UI-AUTOCOMPLETE, JQUERY-UI-TOOLTIP, BOOTSTRAP, HTML5SHIV, UNDERSCORE, JQUERY.PRETTYPHOTO, PROTOTYPEJS, DRUPAL, MOOTOOLS, MEJS, BACKBONE.JS, ANGULARJS, FOUNDATION, JWPLAYER, REQUIREJS, HANDLEBARS.JS, HAMMERJS, JPLAYER, MUSTACHE.JS, SCRIPTACULOUS, SHADOWBOX, ZEROCLIPBOARD, YUI, RAPHAEL, DATATABLES, KNOCKOUT)
  • 54. JAVASCRIPT ‘NEXTGEN’ FRAMEWORKS > 100 (bar chart, axis 0-18000; categories: BACKBONE.JS, ANGULARJS, FOUNDATION, YUI, KNOCKOUT, DOJO, REACTJS, MARIONETTEJS, VUEJS, EMBER, METEOR, MITHRIL, EXTJS, POLYMER)
  • 55. VULNERABILITY COUNTS (bar chart, axis 0-80000; categories: JQUERY, JQUERY-UI-DIALOG, JQUERY.PRETTYPHOTO, ANGULARJS, JQUERY-UI-TOOLTIP, JPLAYER, HANDLEBARS.JS, ZEROCLIPBOARD, MUSTACHE.JS, YUI, PROTOTYPEJS, MEJS, JWPLAYER, DOJO, EMBER, TINYMCE, PLUPLOAD, JQUERY-MOBILE, CKEDITOR)
  • 56. LONGEST SECURITY HEADER AWARD - HTTPS://WWW.INSIGHTGUIDES.COM/
      Content-Security-Policy:
      default-src 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
      script-src 'self' http://www.googletagmanager.com https://www.googletagmanager.com http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com 'unsafe-eval' 'unsafe-inline' https://apis.google.com blob:;
      connect-src * 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;
  • 57. SOME OF MY FAVORITE HTTP STATUS LINES
      ▸ HTTP 500 access denied ("java.io.FilePermission" "D: homeXXXXXXXXX.comoriModelGlueunityeventrequestEventRequest.cfc" "read")
      ▸ HTTP 500 "Duplicate entry '1473335051' for key 'timestamp' SQL=INSERT INTO `#__zt_visitor_counter` (`id`,`timestamp`,`visits`,`guests`,`ipaddress`,`useragent`) VALUES (null, '1473335051', 1 , 1 , '54.208.81.16', 'chrome')"
      ▸ HTTP 500 "Server Made Big Boo"
  • 59. CONCLUSION
      ▸ Use NSQ, seriously.
      ▸ Concurrency can be difficult
      ▸ Batch data before inserting to DB
      ▸ If DB rows > a few million, consider sharding
      ▸ Test different types of table schema for performance
      ▸ Treat browsers like garbage and handle appropriately
  • 60. QUESTIONS?
      ▸ twitter: @_wirepair
      ▸ github: wirepair
      ▸ gcd: https://github.com/wirepair/gcd
      ▸ autogcd: https://github.com/wirepair/autogcd
      ▸ killface: https://github.com/wirepair/killface
      ▸ Thanks to all my coworkers supporting and listening to my daily rants!