Visualization Lifecycle

datainsight San Francisco 2011
Raffael Marty
“Transform a dataset into a captivating story.”



     ‣ Assess
     ‣ Parse
     ‣ Clean
     ‣ Visualize
     Visualization Tools and Libraries
     Art: you're on your own

pixlcloud | collect. visualize. understand.    Copyright (c) 2011
Audience
     (Diagram: the audience mapped on a Beginner to Expert axis and a Technical to Overview axis, with regions labeled Fun and Boring)

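Visualization Process
     (Diagram) Data Sources -> (Data Store: files, database) -> parsing -> Structured Data (filtering, aggregation, cleansing) -> feature selection, visualization -> Visual Representation
     Contextual Data augments the data along the way, and the whole process is repeated in iterations.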

Data Sources
      ‣ File: XML, JSON, CSV, TSV
      ‣ Database: mysql -u root -p mydatabase < dump.sql
      ‣ API: curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
         ‣ Factual
         ‣ Freebase
         ‣ Infochimps
         ‣ OpenStreetMap
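Whatever the source, the first practical step is to turn file contents into uniform records. The following is a minimal sketch using only the Python standard library; the helper name `parse_records` is hypothetical, and it assumes JSON files hold an array of flat objects and XML files use a simple `<root><record><field>value</field></record></root>` layout.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def parse_records(text, fmt):
    """Turn raw file contents into a list of dicts (hypothetical helper).

    fmt is one of "csv", "tsv", "json", "xml".
    """
    if fmt in ("csv", "tsv"):
        # The first line is treated as the header row.
        delimiter = "\t" if fmt == "tsv" else ","
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    if fmt == "json":
        # Assumes a JSON array of flat objects.
        return json.loads(text)
    if fmt == "xml":
        # Assumes <root><record><field>value</field>...</record>...</root>.
        root = ET.fromstring(text)
        return [{field.tag: field.text for field in record} for record in root]
    raise ValueError(f"unsupported format: {fmt}")


print(parse_records("src,dst\n10.1.222.31,10.1.222.202\n", "csv"))
```

To read from disk, pass the file contents, e.g. `parse_records(open("events.csv", newline="").read(), "csv")`, where `events.csv` is a placeholder name.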




Explore Data
      ‣ What is the data about?
      ‣ What are the data features/columns?
      ‣ Is there a common structure in the data?
      ‣ What are the data types?

                Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0

                May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
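For log lines like these, a quick way to answer "what are the features and their types?" is to split out the `KEY=VALUE` tokens and guess a type for each value. This is a rough exploratory sketch; the helper `explore` and its type labels are hypothetical, and bare flags such as `DF` or `SYN` (which have no `=`) are deliberately not captured.

```python
import re

# iptables-style log lines carry most of their fields as KEY=VALUE tokens.
PAIR = re.compile(r"(\w+)=(\S*)")


def explore(line):
    """Split a log line into KEY=VALUE features and guess each value's type."""
    features = dict(PAIR.findall(line))
    types = {}
    for key, value in features.items():
        if value == "":
            types[key] = "empty"
        elif re.fullmatch(r"\d+", value):
            types[key] = "int"
        elif re.fullmatch(r"0x[0-9a-fA-F]+", value):
            types[key] = "hex"
        elif re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", value):
            types[key] = "ipv4"
        else:
            types[key] = "string"
    return features, types


SAMPLE = ("Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
          "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
          "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
          "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0")
features, types = explore(SAMPLE)
for key in features:
    print(key, types[key], features[key])
```

Running this over a sample of lines, rather than one, is a cheap check for whether the data really shares a common structure.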



Parsing and Normalization
     ‣ Parsing
        ‣ extraction of entities / features
        ‣ imposing structure
        ‣ often uses regexes
     ‣ Normalize
        ‣ field normalization
        ‣ term normalization: block, deny, dropped
     ‣ Generate a common output format for vis-tools (e.g., CSV)

     Example log lines in three different firewall formats:
        Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
        Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
        Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
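Field and term normalization can be sketched as one parser per log format, each emitting the same named fields, plus a lookup table that maps block, deny, and dropped to a single term. This minimal sketch covers only the PIX-style and iptables-style lines shown above; the regexes, the `PARSERS`/`TERMS` names, and the choice of `block` as the canonical term are assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical per-format parsers. Named groups give every format the same fields
# (field normalization), regardless of the order they appear in the raw line.
PARSERS = [
    # Cisco PIX style: "%PIX-4-106023: Deny tcp src internet: IP/PORT dst IP/PORT"
    re.compile(r"%PIX-\d-\d+: (?P<action>\w+) \w+ src (?:\S+: )?(?P<src>[\d.]+)/(?P<sport>\d+) "
               r"dst (?:\S+: )?(?P<dst>[\d.]+)/(?P<dport>\d+)"),
    # iptables style: "kernel: DROPPED ... SRC=IP DST=IP ... SPT=PORT DPT=PORT"
    re.compile(r"kernel: (?P<action>\w+) .*?SRC=(?P<src>[\d.]+) DST=(?P<dst>[\d.]+) "
               r".*?SPT=(?P<sport>\d+) DPT=(?P<dport>\d+)"),
]
FIELDS = ["action", "src", "sport", "dst", "dport"]
# Term normalization: vendor-specific verbs collapse to one canonical term.
TERMS = {"block": "block", "deny": "block", "dropped": "block"}


def normalize(lines):
    """Parse each line with the first matching parser and emit common CSV."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for line in lines:
        for parser in PARSERS:
            m = parser.search(line)
            if m:
                row = m.groupdict()
                action = row["action"].lower()
                row["action"] = TERMS.get(action, action)
                writer.writerow([row[f] for f in FIELDS])
                break  # lines no parser recognizes are skipped
    return out.getvalue()


PIX_LINE = ('Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp '
            'src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"')
IPT_LINE = ("Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 "
            "SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 "
            "PROTO=TCP SPT=3859 DPT=135 LEN=556")
print(normalize([PIX_LINE, IPT_LINE]), end="")
```

Both lines come out as the same CSV row, which is exactly what lets a vis-tool treat events from different devices alike.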

Parser
Raw:
     Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
     Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
     Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)

Regex / Parser:
     (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)

Normalized (CSV):
     Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
     Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
     Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
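The Raw -> Regex -> CSV step on this slide can be reproduced in a few lines of Python. This sketch writes the slide's regex with its backslashes spelled out (`\d`, `\w`, `\.`, and escaped parentheses around the `match` label) and emits one CSV row per matching line; the names `FW_RULE` and `to_csv` are hypothetical.

```python
import csv
import io
import re

# The slide's regex, one capture group per output column.
FW_RULE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)


def to_csv(lines):
    """Apply FW_RULE to each raw line and emit one CSV row per match."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        m = FW_RULE.match(line)
        if m:  # non-matching lines are silently skipped
            writer.writerow(m.groups())
    return out.getvalue()


RAW = [
    "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)",
    "Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)",
    "Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)",
]
print(to_csv(RAW), end="")
```

Using the `csv` module rather than joining with commas means a field that happens to contain a comma gets quoted instead of silently shifting the columns.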




UNIX Tools
     ‣ grep
        ‣ cat file | grep -v "foo"
     ‣ awk
        ‣ awk -F, '{printf("%s,%s\n",$2,$1);}'
        ‣ awk -F, -v OFS=, '{print $2,$1}'
     ‣ sed
        ‣ sed -e 's/fubar/foobar/g' filename
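When a pipeline outgrows one-liners, the same three operations are easy to express in Python. The helper names below are hypothetical; note that Python's `re` syntax differs from sed's basic regular expressions (and from sed's `&` in replacements), so patterns may need translating.

```python
import re


def grep_v(lines, pattern):
    """Like `grep -v pattern`: keep only lines that do NOT match."""
    rx = re.compile(pattern)
    return [line for line in lines if not rx.search(line)]


def swap_first_two(lines):
    """Like `awk -F, -v OFS=, '{print $2,$1}'`: emit field 2, then field 1."""
    swapped = []
    for line in lines:
        fields = line.split(",")  # plain split, like awk -F, (no CSV quoting rules)
        second = fields[1] if len(fields) > 1 else ""  # awk yields "" for a missing $2
        swapped.append(second + "," + fields[0])
    return swapped


def sed_sub(lines, pattern, replacement):
    """Like `sed -e 's/pattern/replacement/g'`: replace every match on each line."""
    rx = re.compile(pattern)
    return [rx.sub(replacement, line) for line in lines]


print(swap_first_two(grep_v(["a,b,c", "foo,bar", "1,2"], "foo")))
```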




Regular Expression Resources
     ‣   http://regexlib.com
     ‣   http://www.regular-expressions.info
     ‣   http://gskinner.com/RegExr




Data Cleansing
     ‣ Filter
     ‣ Normalize (see earlier)
     ‣ Aggregation
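Filtering and aggregation are often just "keep the rows that match" followed by "count per key". A minimal sketch, assuming rows are dicts like those produced by a CSV reader; the field values are taken from the example log lines earlier in the deck, and the helper names are hypothetical.

```python
from collections import Counter


def filter_rows(rows, **conditions):
    """Keep rows whose fields equal every given condition (field=value)."""
    return [r for r in rows if all(r.get(k) == v for k, v in conditions.items())]


def aggregate(rows, *keys):
    """Count rows per distinct combination of the given fields."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


ROWS = [
    {"action": "pass", "src": "195.141.69.45", "dst": "62.2.32.250", "dport": "53"},
    {"action": "pass", "src": "195.141.69.45", "dst": "192.134.0.49", "dport": "53"},
    {"action": "pass", "src": "195.141.69.45", "dst": "194.25.2.133", "dport": "53"},
    {"action": "block", "src": "212.251.89.126", "dst": "212.254.110.98", "dport": "135"},
]
passed = filter_rows(ROWS, action="pass")
print(aggregate(passed, "src", "dport"))
```

Aggregating before visualizing keeps the graph readable: here three DNS lookups collapse into one (source, port) pair with a count of 3.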



Load CSV into Database
    (Sometimes you just load your data into a tool and can omit this step.)

    # mysql -u <user> -p
    mysql> create database data;
    mysql> use data;
    mysql> create table set1 (id int, address varchar(20), ...);
    mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1
           FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
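For a quick local experiment without a MySQL server, the same load can be sketched with Python's built-in `sqlite3` module. This swaps in SQLite as a stand-in for MySQL (SQLite has no `LOAD DATA INFILE`, so rows are inserted with `executemany`); the two-column `set1` schema and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3


def load_csv(conn, csv_text):
    """Create table set1 and bulk-insert (id, address) rows parsed from CSV text."""
    conn.execute("CREATE TABLE set1 (id INTEGER, address VARCHAR(20))")
    conn.executemany(
        "INSERT INTO set1 (id, address) VALUES (?, ?)",
        csv.reader(io.StringIO(csv_text)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database "data"
load_csv(conn, "1,10.1.222.31\n2,10.1.222.202\n")  # hypothetical input_file contents
print(conn.execute("SELECT id, address FROM set1 ORDER BY id").fetchall())
```

SQLite's column affinity converts the text `"1"` into an integer on insert, so `id` comes back as a number.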



Contextual Data
     ‣ Either dump it into a DB or query it via API calls to augment your data
     ‣ IP -> Geo mapping
     ‣ Information about countries
     ‣ Port number -> service name
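Augmenting a record usually means joining it against lookup tables. The sketch below uses small in-memory dictionaries as hypothetical stand-ins for a real geo database or API; on most Unix systems, `socket.getservbyport(port, "tcp")` can also resolve service names from the local services database.

```python
import ipaddress

# Hypothetical context tables; real ones would come from a DB dump or an API.
PORT_SERVICES = {53: "domain", 80: "http"}
GEO_BY_NETWORK = {
    ipaddress.ip_network("10.0.0.0/8"): "internal",
    ipaddress.ip_network("192.168.0.0/16"): "internal",
}


def augment(record):
    """Return a copy of record with 'service' and 'geo' context fields added."""
    enriched = dict(record)
    enriched["service"] = PORT_SERVICES.get(int(record["dport"]), "unknown")
    ip = ipaddress.ip_address(record["src"])
    # First network that contains the address wins; otherwise "unknown".
    enriched["geo"] = next(
        (label for net, label in GEO_BY_NETWORK.items() if ip in net), "unknown"
    )
    return enriched


print(augment({"src": "10.1.222.31", "dport": "53"}))
```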


Feature Selection
     ‣ What are the fields you are interested in?
     ‣ Compute new fields
        ‣ start time, end time -> duration
        ‣ IP subnets [ 10.2.4.2 -> 10.0.0.0/8 or 192.168.1.2 -> 192.168.1.0/24 ]
        ‣ Entropy: $H(X) = \mathbb{E}[I(X)]$, where $I(X) = -\log p(X)$ is the self-information
     ‣ Dimensionality reduction
        ‣ See Bryan’s talk!
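The three computed fields above can each be derived in a line or two. This sketch assumes timestamps in a `YYYY-MM-DD HH:MM:SS` format (adjust `fmt` to your data) and computes entropy in bits, i.e. with a base-2 logarithm; the function names are hypothetical.

```python
import ipaddress
import math
from collections import Counter
from datetime import datetime


def duration_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """start time, end time -> duration in seconds."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


def subnet(ip, prefix_len):
    """Map an address to its enclosing network, e.g. a /8 or /24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def entropy(values):
    """Shannon entropy in bits: H(X) = sum over x of p(x) * log2(1 / p(x))."""
    counts = Counter(values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))
print(entropy(["53", "53", "135", "9111"]))  # e.g. entropy of destination ports
```

A source that contacts many different ports has high port entropy, which often makes the field more telling than the raw port list.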




Choose Your Poison




Ode to the Pie




A Good Visual
     ‣ Choose the right graph
     ‣ Reduce non-data ink
     ‣ Simultaneous views
     ‣ Interactivity




Visual Transformations
     ‣ keep iterating on visual transformations; change
        ‣ color
        ‣ shape
        ‣ features display
     ‣ add new fields?
     ‣ add more context?
     ‣ is the output expressive?
     ‣ capture output and prettify it for presentation
Data Visualization Tools and Libraries
Tools and Libraries
      ‣ http://datainsightsf.com/resources/
         ‣ Choose what’s appropriate!
      ‣ DAVIX: Data Analysis and Visualization Linux
         ‣ davix.secviz.org
      ‣ Graphviz
         ‣ graphviz.org
      ‣ AfterGlow (CSV -> DOT)
         ‣ afterglow.sf.net
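The core idea behind a CSV -> DOT converter is simple: each two-column row becomes a directed edge. The following is a minimal sketch of that idea only, not AfterGlow's actual implementation or output (AfterGlow adds properties such as colors and node thresholds); `csv_to_dot` is a hypothetical name.

```python
import csv
import io


def _quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def csv_to_dot(csv_text):
    """Turn two-column CSV (source,target) into a Graphviz DOT digraph."""
    lines = ["digraph G {"]
    seen = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or short lines
        edge = (row[0], row[1])
        if edge in seen:
            continue  # draw each distinct edge only once
        seen.add(edge)
        lines.append(f"  {_quote(row[0])} -> {_quote(row[1])};")
    lines.append("}")
    return "\n".join(lines)


print(csv_to_dot("10.1.222.31,10.1.222.202\n213.175.90.24,192.168.0.15\n"))
```

Saved to a file such as `graph.dot`, the output can be rendered with Graphviz, e.g. `dot -Tpng graph.dot -o graph.png`.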


Libraries
     ‣ Reporting Libraries
        ‣ HighCharts
        ‣ Flot
        ‣ Google Chart API
        ‣ Open Flash Chart
        ‣ jQuery Sparklines
        ‣ Polymaps
     ‣ Visualization Libraries
        ‣ TheJIT
        ‣ Graphael
        ‣ Protovis
        ‣ ProcessingJS
        ‣ Flare
        ‣ D3

HighCharts   www.highcharts.com
 ‣ Click-through
 ‣ On load
    ‣ near real-time updates
 ‣ Zoom

Google Visualization API
     http://code.google.com/apis/visualization/interactive_charts.html
      ‣ JavaScript
      ‣ Based on DataTable objects
      ‣ Many graphs
      ‣ Playground
         ‣ http://code.google.com/apis/ajax/playground

Protovis   http://vis.stanford.edu/protovis/
     ‣ JavaScript-based visualization library
     ‣ Charting
     ‣ Treemaps
     ‣ Box plots
     ‣ Parallel coordinates
     ‣ etc.
TheJIT   http://thejit.org/
     ‣ JavaScript InfoVis Toolkit
     ‣ Interactive
     ‣ Link graphs




Processing   http://processing.org/
     ‣ Visualization library
     ‣ Java-based
     ‣ Interactive (event handling)
     ‣ A number of libraries to
         ‣ draw in OpenGL
         ‣ read XML files
     ‣ Processing.js   http://processingjs.org/
         ‣ JavaScript
         ‣ HTML 5 Canvas
         ‣ WebGL
         ‣ Web IDE

Visualization Tools
     ‣ Gephi
     ‣ R
     ‣ Matlab
     ‣ Mondrian
     ‣ PicViz
     ‣ Treemap 4.1
     ‣ Google Earth
Gephi   http://gephi.org
     ‣ reads: CSV, DOT, etc.
     ‣ graph analysis algorithms
     ‣ highly interactive




PicViz




                                                   http://www.wallinfire.net/picviz/

Treemap 4.1




                                                    http://www.cs.umd.edu/hcil/treemap/
Google Earth
     ‣ KML data format for encoding data
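To get data into Google Earth, each geolocated record becomes a KML `Placemark`. A minimal sketch with the standard library's `xml.etree.ElementTree`, assuming the data is already a list of `(name, longitude, latitude)` tuples; note that KML orders coordinates as longitude first. The helper `to_kml` is hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points):
    """Build a KML document with one Placemark per (name, lon, lat) tuple."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are "longitude,latitude[,altitude]".
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(to_kml([("San Francisco", -122.4194, 37.7749)]))
```

Writing the returned string to a `.kml` file lets Google Earth open it directly.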




pixlcloud                       buy now



collect. visualize. understand.



                 @raffaelmarty

More Related Content

What's hot

Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
Bhavendra Chavan
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry Hamon
Grammarly
 

What's hot (20)

Big Data Analytics (1).ppt
Big Data Analytics (1).pptBig Data Analytics (1).ppt
Big Data Analytics (1).ppt
 
Big data analytics in healthcare industry
Big data analytics in healthcare industryBig data analytics in healthcare industry
Big data analytics in healthcare industry
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Enterprise Data Management
Enterprise Data ManagementEnterprise Data Management
Enterprise Data Management
 
SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?SPSNYC2019 - What is Common Data Model and how to use it?
SPSNYC2019 - What is Common Data Model and how to use it?
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Natural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry HamonNatural Language Processing for biomedical text mining - Thierry Hamon
Natural Language Processing for biomedical text mining - Thierry Hamon
 
Data Science Project Lifecycle
Data Science Project LifecycleData Science Project Lifecycle
Data Science Project Lifecycle
 
Privacy Engineering
Privacy EngineeringPrivacy Engineering
Privacy Engineering
 
The 2012 Industry Digitization Index
The 2012 Industry Digitization IndexThe 2012 Industry Digitization Index
The 2012 Industry Digitization Index
 
Data Observability.pptx
Data Observability.pptxData Observability.pptx
Data Observability.pptx
 
Healthcare in the Metaverse.pdf
Healthcare in the Metaverse.pdfHealthcare in the Metaverse.pdf
Healthcare in the Metaverse.pdf
 
Airline Analysis of Data Using Hadoop
Airline Analysis of Data Using HadoopAirline Analysis of Data Using Hadoop
Airline Analysis of Data Using Hadoop
 
Importance of data analytics for business
Importance of data analytics for businessImportance of data analytics for business
Importance of data analytics for business
 
Getting Started With Digitisation
Getting Started With DigitisationGetting Started With Digitisation
Getting Started With Digitisation
 
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikemPPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
PPC Restart 2022: Radek Laci - Jak najít rovnováhu mezi clickem a klikem
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Data Monetization Framework
Data Monetization FrameworkData Monetization Framework
Data Monetization Framework
 
Big data project management
Big data project managementBig data project management
Big data project management
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 

Viewers also liked

Viewers also liked (6)

Analytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics WorldAnalytic Journeys from Predictive Analytics World
Analytic Journeys from Predictive Analytics World
 
Cyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock InsightCyber Security – How Visual Analytics Unlock Insight
Cyber Security – How Visual Analytics Unlock Insight
 
AfterGlow
AfterGlowAfterGlow
AfterGlow
 
Security Insights at Scale
Security Insights at ScaleSecurity Insights at Scale
Security Insights at Scale
 
Workshop: Big Data Visualization for Security
Workshop: Big Data Visualization for SecurityWorkshop: Big Data Visualization for Security
Workshop: Big Data Visualization for Security
 
Gephi Quick Start
Gephi Quick StartGephi Quick Start
Gephi Quick Start
 

Similar to Visualization Lifecycle

breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redacted
Ryan Breed
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Spark Summit
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
IndicThreads
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
mjfrankli
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
DataWorks Summit
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Databricks
 

Similar to Visualization Lifecycle (20)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Scaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter ExperienceScaling Big Data Mining Infrastructure Twitter Experience
Scaling Big Data Mining Infrastructure Twitter Experience
 
breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redacted
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0Oracle Trace File Analyzer - What's New in 12.2.1.1.0
Oracle Trace File Analyzer - What's New in 12.2.1.1.0
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
Examining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail FilesExamining Oracle GoldenGate Trail Files
Examining Oracle GoldenGate Trail Files
 
20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark20130912 YTC_Reynold Xin_Spark and Shark
20130912 YTC_Reynold Xin_Spark and Shark
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
Modus operandi of Spark Streaming - Recipes for Running your Streaming Applic...
 
Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)Python business intelligence (PyData 2012 talk)
Python business intelligence (PyData 2012 talk)
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Structuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and StreamingStructuring Spark: DataFrames, Datasets, and Streaming
Structuring Spark: DataFrames, Datasets, and Streaming
 
Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17Hopping in clouds - phpuk 17
Hopping in clouds - phpuk 17
 
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
Tim Hall [InfluxData] | InfluxDB Roadmap | InfluxDays Virtual Experience Lond...
 
GOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x HadoopGOTO 2011 preso: 3x Hadoop
GOTO 2011 preso: 3x Hadoop
 

More from Raffael Marty

AI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousAI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are Dangerous
Raffael Marty
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
 

More from Raffael Marty (20)

Exploring the Defender's Advantage
Exploring the Defender's AdvantageExploring the Defender's Advantage
Exploring the Defender's Advantage
 
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...Extended Detection and Response (XDR)An Overhyped Product Category With Ulti...
Extended Detection and Response (XDR) An Overhyped Product Category With Ulti...
 
How To Drive Value with Security Data
How To Drive Value with Security DataHow To Drive Value with Security Data
How To Drive Value with Security Data
 
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes?
 
Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?Artificial Intelligence – Time Bomb or The Promised Land?
Artificial Intelligence – Time Bomb or The Promised Land?
 
Understanding the "Intelligence" in AI
Understanding the "Intelligence" in AIUnderstanding the "Intelligence" in AI
Understanding the "Intelligence" in AI
 
Security Chat 5.0
Security Chat 5.0Security Chat 5.0
Security Chat 5.0
 
AI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are DangerousAI & ML in Cyber Security - Why Algorithms are Dangerous
AI & ML in Cyber Security - Why Algorithms are Dangerous
 
AI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are DangerousAI & ML in Cyber Security - Why Algorithms Are Dangerous
AI & ML in Cyber Security - Why Algorithms Are Dangerous
 
Delivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and VisualizationDelivering Security Insights with Data Analytics and Visualization
Delivering Security Insights with Data Analytics and Visualization
 
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't ChangedAI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
 
Creating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & VisualizationCreating Your Own Threat Intel Through Hunting & Visualization
Creating Your Own Threat Intel Through Hunting & Visualization
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big Data
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?
 
Visualization for Security
Visualization for SecurityVisualization for Security
Visualization for Security
 
The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?The Heatmap
 - Why is Security Visualization so Hard?
The Heatmap
 - Why is Security Visualization so Hard?
 
DAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization LinuxDAVIX - Data Analysis and Visualization Linux
DAVIX - Data Analysis and Visualization Linux
 
Cloud - Security - Big Data
Cloud - Security - Big DataCloud - Security - Big Data
Cloud - Security - Big Data
 

Recently uploaded

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 

Recently uploaded (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf

Visualization Lifecycle

  • 1. Visualization Lifecycle
    datainsight San Francisco 2011
    Raffael Marty
  • 2. “Transform a dataset into a captive story.”
    ‣ Assess
    ‣ Parse
    ‣ Clean
    ‣ Visualize
    [Diagram labels: You’re on your own; Visualization Tools and Libraries; Art]
    pixlcloud | collect. visualize. understand. Copyright (c) 2011
  • 3. Audience
    [Diagram: audience map spanning Beginner to Expert and Technical to Overview, with regions labeled Fun and Boring]
  • 4. Visualization Process
    [Diagram: Data Sources → (Data Store: files, database) → parsing → Structured Data (filtering, aggregation, cleansing) → feature selection, visualization → Visual Representation, with Contextual Data and an iterations loop]
  • 5. Data Sources
    ‣ File: XML, JSON, CSV, TSV
    ‣ Database: mysql -u root -p mydatabase < dump.sql
    ‣ API: curl 'http://freebase.com/api/service/search?query=al+gore&indent=1'
      ‣ Factual
      ‣ Freebase
      ‣ Infochimps
      ‣ OpenStreetMap
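The File sources listed on the Data Sources slide come in several text formats. The following is a minimal Python sketch that reads CSV, TSV or JSON into a single list-of-dicts shape. It assumes the data is already in memory as a string. XML is left out, and `read_records` is a hypothetical helper, not part of any tool mentioned in the talk.

```python
import csv
import io
import json


def read_records(text, fmt):
    """Read CSV, TSV or JSON text into a list of dicts (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "tsv":
        # TSV is CSV with a tab as the field delimiter.
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or a list of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    print(read_records("ip,port\n10.1.222.31,58485\n", "csv"))
```

CSV and TSV values come back as strings, while JSON keeps its own types. Converting the types is left to the parsing and normalization step.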
  • 6. Explore Data
    ‣ What is the data about?
    ‣ What are the data features/columns?
    ‣ Is there a common structure in the data?
    ‣ What are the data types?
    Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0
    May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0
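For key=value logs like the two samples on the Explore Data slide, some of the exploration questions can be answered programmatically. The rough Python sketch below assumes every interesting field uses the `KEY=VALUE` form, so bare flags such as `DF` or `SYN` are ignored. `fields` and `guess_type` are hypothetical helper names.

```python
import re
from collections import Counter

# The two sample lines from the Explore Data slide.
LINES = [
    "Nov 7 09:14:46 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=00:0c:29:e3:45:bd:00:0c:29:b5:5c:ee:08:00 SRC=10.1.222.31 "
    "DST=10.1.222.202 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=63849 DF "
    "PROTO=TCP SPT=58485 DPT=9111 WINDOW=5840 RES=0x00 SYN URGP=0",
    "May 25 20:24:20 ram-laptop kernel: BLOCK any in: IN=eth1 OUT= "
    "MAC=00:13:02:ac:d8:ea:00:09:5b:3d:df:00:08:00 SRC=213.175.90.24 "
    "DST=192.168.0.15 LEN=576 TOS=0x00 PREC=0x00 TTL=115 ID=23513 "
    "PROTO=TCP SPT=9030 DPT=56772 WINDOW=65535 RES=0x00 ACK URGP=0",
]

# Upper-case KEY followed by '='; the value may be empty (e.g. "OUT=").
KV = re.compile(r"\b([A-Z]+)=(\S*)")


def fields(line):
    """Return the KEY=VALUE pairs of one log line as a dict."""
    return dict(KV.findall(line))


def guess_type(value):
    """Rough type guess, good enough for a first look at the data."""
    if value == "":
        return "empty"
    if re.fullmatch(r"\d+", value):
        return "int"
    if re.fullmatch(r"0x[0-9a-fA-F]+", value):
        return "hex"
    if re.fullmatch(r"\d+\.\d+\.\d+\.\d+", value):
        return "ipv4"
    return "string"


# Keys that occur in every line describe the common structure.
key_counts = Counter(key for line in LINES for key in fields(line))
common = sorted(key for key, n in key_counts.items() if n == len(LINES))

if __name__ == "__main__":
    for key in common:
        print(key, guess_type(fields(LINES[0])[key]))
```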
  • 7. Parsing and Normalization
    ‣ Parsing
      ‣ extraction of entities / features
      ‣ imposing structure
      ‣ often use regexes
    ‣ Normalize
      ‣ field normalization
      ‣ term normalization: block, deny, dropped
    ‣ Generate a common output format for vis-tools (e.g., CSV)
    Sample lines (one blocked connection as logged by pf, Cisco PIX and iptables):
    Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: 212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 <mss 1460,nop,nop,sackOK> (DF)
    Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 by access-group "internet_access_in"
    Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 PROTO=TCP SPT=3859 DPT=135 LEN=556
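Parsing plus normalization for the three sample lines can be sketched with one regex per log format and a small lookup table for term normalization, all mapped onto a single CSV schema. The column names, the `ACTIONS` table and the choice to truncate timestamps to whole seconds are assumptions made for illustration. The pf sample carries no destination, so those columns stay empty.

```python
import csv
import io
import re

# One blocked connection, as logged by three firewalls (samples from the slide).
RAW = [
    "Oct 13 20:00:43.874401 rule 193/0(match): block in on xl0: "
    "212.251.89.126.3859 >: S 1818630320:1818630320(0) win 65535 "
    "<mss 1460,nop,nop,sackOK> (DF)",
    "Oct 13 20:00:43 fwbox local4:warn|warning fw07 %PIX-4-106023: "
    "Deny tcp src internet: 212.251.89.126/3859 dst 212.254.110.98/135 "
    'by access-group "internet_access_in"',
    "Oct 13 20:00:43 fwbox kernel: DROPPED IN=eth0 OUT= "
    "MAC=ff:ff:ff:ff:ff:ff:00:0f:cc:81:40:94:08:00 SRC=212.251.89.126 "
    "DST=212.254.110.98 LEN=576 TOS=0x00 PREC=0x00 TTL=255 ID=8624 "
    "PROTO=TCP SPT=3859 DPT=135 LEN=556",
]

IP = r"\d+\.\d+\.\d+\.\d+"
# pf: the sample has no destination, so only the source is parsed.
PF = re.compile(rf"^(\w+ +\d+ [\d:.]+) rule \S+: (\w+) \w+ on \w+: ({IP})\.(\d+)")
PIX = re.compile(
    rf"^(\w+ +\d+ [\d:]+) .*%PIX-\d-\d+: (\w+) (\w+) src \w+: "
    rf"({IP})/(\d+) dst ({IP})/(\d+)"
)
IPT = re.compile(
    r"^(\w+ +\d+ [\d:]+) \S+ kernel: (\w+) .*?SRC=(\S+) DST=(\S+) "
    r".*?PROTO=(\w+) SPT=(\d+) DPT=(\d+)"
)

# Term normalization: different words for the same action (hypothetical table).
ACTIONS = {"block": "block", "deny": "block", "dropped": "block", "pass": "pass"}
HEADER = ["time", "action", "proto", "src_ip", "src_port", "dst_ip", "dst_port"]


def normalize(line):
    """Map one raw line onto HEADER, or return None if no format matches."""
    if m := PF.match(line):
        ts, action, sip, sport = m.groups()
        proto, dip, dport = "", "", ""
    elif m := PIX.match(line):
        ts, action, proto, sip, sport, dip, dport = m.groups()
    elif m := IPT.match(line):
        ts, action, sip, dip, proto, sport, dport = m.groups()
    else:
        return None
    # Field normalization: whole-second timestamps, lower-case protocol.
    action = ACTIONS.get(action.lower(), action.lower())
    return [ts.split(".")[0], action, proto.lower(), sip, sport, dip, dport]


def to_csv(lines):
    """Emit HEADER plus one CSV row per recognized line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for line in lines:
        row = normalize(line)
        if row is not None:
            writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(RAW), end="")
```

After normalization, the PIX and iptables rows come out identical, which is what lets a visualization tool treat them as one kind of event.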
  • 8. Parser
    Raw:
      Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)
    Regex / Parser:
      (.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): (\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)
    Normalized (CSV):
      Oct 13 20:00:38.018152,57/0,match,pass,in,xl1,195.141.69.45,1030,62.2.32.250,53,34388 [1au][|domain] (DF)
      Oct 13 20:00:38.115862,57/0,match,pass,in,xl1,195.141.69.45,1030,192.134.0.49,53,49962 [1au][|domain] (DF)
      Oct 13 20:00:38.157238,57/0,match,pass,in,xl1,195.141.69.45,1030,194.25.2.133,53,14434 [1au][|domain] (DF)
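The regex from the Parser slide, with its backslashes restored, can be applied in Python roughly as follows. Its eleven capture groups become the eleven CSV columns shown under Normalized. Writing the rows through the `csv` module rather than a plain join is a choice made here, so that a comma in the trailing free-text group would be quoted instead of adding a column.

```python
import csv
import io
import re

# Regex from the Parser slide. Groups: time, rule, match, action, direction,
# interface, src_ip, src_port, dst_ip, dst_port, rest of the line.
PF_RE = re.compile(
    r"(.*) rule ([-\d]+/\d+)\((.*?)\): (pass|block) (in|out) on (\w+): "
    r"(\d+\.\d+\.\d+\.\d+)\.?(\d*) [<>] (\d+\.\d+\.\d+\.\d+)\.?(\d*): (.*)"
)

RAW = [
    "Oct 13 20:00:38.018152 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 62.2.32.250.53: 34388 [1au][|domain] (DF)",
    "Oct 13 20:00:38.115862 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 192.134.0.49.53: 49962 [1au][|domain] (DF)",
    "Oct 13 20:00:38.157238 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 194.25.2.133.53: 14434 [1au][|domain] (DF)",
]


def to_csv(lines):
    """Write one CSV row per line the regex matches; skip the rest."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for line in lines:
        m = PF_RE.match(line)
        if m:
            writer.writerow(m.groups())
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(RAW), end="")
```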
  • 9. UNIX Tools
    ‣ grep
      ‣ cat file | grep -v "foo"   (drop every line containing foo)
    ‣ awk
      ‣ awk -F, '{printf("%s,%s\n",$2,$1);}'   (print columns 2 and 1 of a CSV file, in that order)
      ‣ awk -F, -v OFS=, '{print $2,$1}'   (the same, using the output field separator)
    ‣ sed
      ‣ sed -e 's/fubar/foobar/g' filename   (replace every fubar with foobar)
  • 10. Regular Expression Resources
    ‣ http://regexlib.com
    ‣ http://www.regular-expressions.info
    ‣ http://gskinner.com/RegExr
  • 11. Data Cleansing
    ‣ Filter
    ‣ Normalize (see earlier)
    ‣ Aggregation
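Filtering and aggregation on records that have already been normalized can be sketched with the Python standard library. The sample rows are taken from the pf and iptables lines on the earlier slides, with timestamps cut to whole seconds. The column names and the helper functions are hypothetical.

```python
import csv
import io
from collections import Counter

# Normalized rows built from the pf and iptables samples on earlier slides.
CSV_TEXT = """time,action,src_ip,dst_ip,dst_port
Oct 13 20:00:38,pass,195.141.69.45,62.2.32.250,53
Oct 13 20:00:38,pass,195.141.69.45,192.134.0.49,53
Oct 13 20:00:38,pass,195.141.69.45,194.25.2.133,53
Oct 13 20:00:43,block,212.251.89.126,212.254.110.98,135
"""


def load(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, **wanted):
    """Filter: keep rows whose columns equal all the given values."""
    return [r for r in rows if all(r[k] == v for k, v in wanted.items())]


def aggregate(rows, *keys):
    """Aggregation: count rows per combination of the given columns."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


rows = load(CSV_TEXT)
passed = filter_rows(rows, action="pass")
per_source_port = aggregate(passed, "src_ip", "dst_port")

if __name__ == "__main__":
    print(per_source_port)
```

Aggregating before visualizing turns three DNS lookups into one weighted edge (195.141.69.45 to port 53, count 3). Graphs stay readable this way as data volumes grow.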
  • 12. Load CSV into Database
    # mysql -u <user> -p
    mysql> create database data;
    mysql> use data;
    mysql> create table set1 (id int, address varchar(20), ...);
    mysql> LOAD DATA LOCAL INFILE 'input_file' INTO TABLE set1 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
    ‣ Sometimes you just load your data into a tool, and you can omit this step.
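To experiment without a MySQL server, the same load step can be approximated with Python's built-in `sqlite3` module. This swaps SQLite in for MySQL and is only a sketch. `LOAD DATA LOCAL INFILE` is MySQL-specific, so the rows are inserted with `executemany` instead. Only the two columns spelled out on the slide are created, and the CSV contents are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical input file contents: id,address
CSV_TEXT = "1,10.1.222.31\n2,213.175.90.24\n"


def load_csv(conn, text):
    """Create table set1 (if needed) and insert every CSV row into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS set1 (id INTEGER, address VARCHAR(20))"
    )
    rows = csv.reader(io.StringIO(text))
    conn.executemany("INSERT INTO set1 (id, address) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path to keep the data
load_csv(conn, CSV_TEXT)

if __name__ == "__main__":
    print(conn.execute("SELECT * FROM set1").fetchall())
```

SQLite converts the CSV text "1" to an integer because the column is declared `INTEGER`, so numeric queries such as `WHERE id = 2` work as expected.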
  • 13. Contextual Data
    ‣ Either dump into a DB or use via API calls to augment the data
    ‣ IP -> Geo mapping
    ‣ Information about countries
    ‣ Port number -> service name
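Augmenting records with context usually comes down to lookups keyed on a field. Below is a minimal Python sketch that assumes small in-memory tables. `PORT_SERVICE` uses IANA service names for three ports. `NETWORK_LABEL` is a hypothetical stand-in for a real IP-to-geo database, but the longest-prefix lookup over it is the same idea.

```python
from ipaddress import ip_address, ip_network

# Hypothetical context tables; a real setup would load these from a
# database or an API (e.g. a GeoIP database or a services list).
PORT_SERVICE = {53: "domain", 80: "http", 135: "epmap"}
NETWORK_LABEL = {
    ip_network("10.0.0.0/8"): "internal",
    ip_network("192.168.0.0/16"): "internal",
}


def label_for(ip):
    """Longest-prefix match of an address against NETWORK_LABEL."""
    addr = ip_address(ip)
    matches = [net for net in NETWORK_LABEL if addr in net]
    if not matches:
        return "external"
    return NETWORK_LABEL[max(matches, key=lambda net: net.prefixlen)]


def enrich(row):
    """Return a copy of row with service and network context added."""
    out = dict(row)
    out["dst_service"] = PORT_SERVICE.get(int(row["dst_port"]), "unknown")
    out["src_label"] = label_for(row["src_ip"])
    return out


if __name__ == "__main__":
    print(enrich({"src_ip": "212.251.89.126", "dst_port": "135"}))
```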
  • 14. Feature Selection
    ‣ What are the fields you are interested in?
    ‣ Compute new fields
      ‣ start time, end time -> duration
      ‣ IP subnets (e.g., 10.2.4.2 -> 10.0.0.0/8, or 192.168.1.2 -> 192.168.1.0/24)
    ‣ Entropy: H(X) = E[I(X)]
    ‣ Dimensionality reduction
      ‣ See Bryan’s talk!
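The computed fields on the Feature Selection slide can each be sketched in a few lines of Python. The entropy function uses the standard expansion H(X) = E[I(X)] = -Σ p(x) log2 p(x), measured in bits. The time format and the function names are assumptions.

```python
import math
from collections import Counter
from datetime import datetime
from ipaddress import ip_network


def duration_seconds(start, end, fmt="%H:%M:%S"):
    """start time, end time -> duration in seconds (format is assumed)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def subnet(ip, prefix):
    """Map an address to its enclosing network, e.g. a /8 or a /24."""
    return str(ip_network(f"{ip}/{prefix}", strict=False))


def entropy(values):
    """Shannon entropy of the observed values, in bits."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    print(duration_seconds("20:00:38", "20:00:43"))
    print(subnet("10.2.4.2", 8), subnet("192.168.1.2", 24))
    print(entropy(["53", "53", "135", "9111"]))
```

A low entropy over, for example, destination ports means traffic is concentrated on a few services. A high value means it is spread out, which can itself be a useful feature to plot.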
  • 15. Choose Your Poison
  • 16. Ode to the Pie
  • 17. A Good Visual
    ‣ Choose the right graph
    ‣ Simultaneous views
    ‣ Reduce non-data ink
    ‣ Interactivity
  • 18. Visual Transformations
    ‣ keep iterating on visual transformations; change the
      ‣ color
      ‣ shape
      ‣ features displayed
    ‣ add new fields?
    ‣ add more context?
    ‣ is the output expressive?
    ‣ capture output and prettify it for presentation
  • 20. Tools and Libraries
    ‣ http://datainsightsf.com/resources/
      ‣ Choose what’s appropriate!
    ‣ Data Analysis and Visualization Linux (DAVIX)
      ‣ davix.secviz.org
    ‣ GraphViz
      ‣ graphviz.org
    ‣ AfterGlow (CSV -> DOT)
      ‣ afterglow.sf.net
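AfterGlow turns CSV into DOT for GraphViz. The following is not AfterGlow's code. It is a minimal Python sketch of the same idea for two-column `source,target` CSV that writes one directed edge per distinct pair. `csv_to_dot` and `quote` are hypothetical names.

```python
import csv
import io


def quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def csv_to_dot(text, graph_name="G"):
    """Turn two-column source,target CSV into a Graphviz digraph."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or one-column lines
        edges.append((row[0].strip(), row[1].strip()))
    lines = [f"digraph {graph_name} {{"]
    # dict.fromkeys drops duplicate edges but keeps first-seen order.
    for src, dst in dict.fromkeys(edges):
        lines.append(f"  {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(csv_to_dot("195.141.69.45,62.2.32.250\n212.251.89.126,212.254.110.98\n"))
```

The output can be saved to a file and rendered with GraphViz, for example `dot -Tpng graph.dot -o graph.png`.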
  • 21. Libraries
    ‣ Reporting Libraries and Visualization Libraries: HighCharts, TheJIT, Flot, Graphael, Google Chart API, Protovis, Open Flash Chart, ProcessingJS, JQuery Sparklines, Flare, Polymaps, D3
  • 22. HighCharts
    ‣ Click-Through
    ‣ On load
      ‣ near real-time updates
    ‣ Zoom
    www.highcharts.com
  • 23. Google Visualization API
    http://code.google.com/apis/visualization/interactive_charts.html
    ‣ JavaScript
    ‣ Based on DataTable objects (google.visualization.DataTable)
    ‣ Many graphs
    ‣ Playground
      ‣ http://code.google.com/apis/ajax/playground
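A DataTable can be built from a JSON object with `cols` and `rows`, where each row holds its cells under `c` and each cell holds its value under `v`. The hedged Python sketch below emits that shape from a server-side list of tuples. The helper name is hypothetical, and only the `label` and `type` column properties are used. On the page, the parsed object would be passed to `new google.visualization.DataTable(...)`.

```python
import json


def to_datatable_json(columns, rows):
    """Build DataTable-style JSON: columns are (label, type), rows are tuples."""
    return json.dumps({
        "cols": [{"label": label, "type": col_type} for label, col_type in columns],
        # Each row is {"c": [cell, ...]}; each cell keeps its value under "v".
        "rows": [{"c": [{"v": value} for value in row]} for row in rows],
    })


if __name__ == "__main__":
    print(to_datatable_json([("Port", "string"), ("Count", "number")],
                            [("53", 3), ("135", 1)]))
```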
  • 24. ProtoVis
    ‣ JavaScript-based visualization library
    ‣ Charting
    ‣ Treemaps
    ‣ BoxPlots
    ‣ Parallel Coordinates
    ‣ etc.
    http://vis.stanford.edu/protovis/
  • 25. TheJIT
    http://thejit.org/
    ‣ JavaScript InfoVis Toolkit
    ‣ Interactive
    ‣ Link Graphs
  • 26. Processing
    ‣ Visualization library
    ‣ Java-based
    ‣ Interactive (event handling)
    ‣ A number of libraries to
      ‣ draw in OpenGL
      ‣ read XML files
    ‣ Processing JS
      ‣ JavaScript
      ‣ HTML 5 Canvas
      ‣ WebGL
      ‣ http://processingjs.org/
    ‣ Web IDE
    http://processing.org/
  • 27. Visualization Tools
    ‣ Gephi
    ‣ R
    ‣ Matlab
    ‣ Mondrian
    ‣ PicViz
    ‣ Treemap 4.1
    ‣ Google Earth
  • 28. Gephi
    http://gephi.org
    ‣ reads: CSV, DOT, etc.
    ‣ graph analysis algorithms
    ‣ highly interactive
  • 29. PicViz http://www.wallinfire.net/picviz/
  • 30. Treemap 4.1 http://www.cs.umd.edu/hcil/treemap/
  • 31. Google Earth
    ‣ KML data format for encoding data
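A minimal KML file for Google Earth holds one `Placemark` per point. The Python sketch below builds one with `xml.etree.ElementTree`, assuming the input is a list of (name, latitude, longitude) tuples. KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def to_kml(points):
    """points: iterable of (name, lat, lon). Returns a KML document string."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in points:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are longitude,latitude[,altitude].
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


if __name__ == "__main__":
    print(to_kml([("San Francisco", 37.7749, -122.4194)]))
```

Saved with a `.kml` extension, the output can be opened directly in Google Earth. Geo coordinates for IP addresses would come from the contextual-data step.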
  • 32. pixlcloud buy now collect. visualize. understand. @raffaelmarty