SQL Ginsu
      Better Living (And Data Reduction)
      Through Databases

                             Normalize and reduce lots of data.
                             Provide quick and decisive insight.

                                    Jul 21, 2011 SANSFIRE

©2011 Lewes Technology Consulting, LLC
Phil Hagen
                                         8yrs Contract InfoSec/
                                         Forensic work with DoD,
                                         IC, LE, commercial
                                         5yrs USAF InfoSec/Comm
                                         BS: CompSci, USAFA
                                         Contacts:
                                           gplus.to/philhagen
                                           @PhilHagen
                                           stuffphilwrites.com


Why “SQL Ginsu”?
         ↓ Time spent manually
           reviewing data

         ↓ Complexity of data for
           clearer presentation

         ↑ YOUR value to the
           investigation!
         Cuts a lead pipe and still
         cuts wafer-thin tomatoes
         Infomercials are funny
         Photos: lifeandtimesincleveland.blogspot.com/2011/06/michael.html
Background
         Data continues to grow exponentially
             1995: 1GB @ $600; 2011: 1TB @ $90
             [6,826x ↓ unit cost!]
         Data sources increasingly diverse
             3+ browsers, many databases, countless mobile
             apps, tools, sites and services, cross-platform
             inconsistencies
             Core questions remain consistent:
             who/what/where/why/when/how
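The unit-cost figure on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes 1 TB = 1024 GB (the convention under which the slide's 6,826x figure works out); the variable names are illustrative only.

```python
# Unit cost per GB from the two data points on the slide.
cost_1995_per_gb = 600 / 1      # $600 for 1 GB in 1995
cost_2011_per_gb = 90 / 1024    # $90 for 1 TB in 2011, assuming 1 TB = 1024 GB

# How many times cheaper a gigabyte became.
reduction = cost_1995_per_gb / cost_2011_per_gb
print(f"{int(reduction):,}x")   # 6,826x
```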

Background
         Value of the “all in one” forensic tool is not in question,
         but...
             ...can no longer be the ONLY tool used in an examination
         Staying flexible is key!
             Cyclical nature of the collect/analyze/report processes
             New leads generated - from the exam or elsewhere
             Change in legal strategy
             New personalities with different perspectives

Foundational Skill Sets
         In computer forensics and incident response, I contend
         two foundational skills are:
             Data management and data reduction
         Many ways to accomplish, but we’ll focus on using SQL
             SQL is very scalable, relatively universal, extremely
             powerful and repeatable
             Excel, text manipulation (sed, awk, cut, grep), etc. are
             also perfectly good tools for this

SQL: It’s Not that Hard

         CREATE TABLE `tblname` (`col1` integer,
         `col2` varchar(10));
         INSERT INTO `tblname` (`col1`, `col2`)
         VALUES (1, 'val2');
         SELECT `col2` FROM `tblname` WHERE
         `col1` = 1;
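To try the three statements without a database server, they can be run through Python's built-in sqlite3 module. This is a sketch, not the talk's setup: the slides use MySQL syntax, SQLite is swapped in here only because it ships with Python, and the in-memory database is an assumption for convenience.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Same three steps as the slide: create, insert, select.
conn.execute("CREATE TABLE tblname (col1 integer, col2 varchar(10))")
conn.execute("INSERT INTO tblname (col1, col2) VALUES (?, ?)", (1, "val2"))
rows = conn.execute(
    "SELECT col2 FROM tblname WHERE col1 = ?", (1,)
).fetchall()

print(rows)  # [('val2',)]
```

The `?` placeholders let the driver handle quoting, which matters once values come from untrusted evidence rather than being typed by hand.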



OK, That’s Not Quite Fair
         SQL can be extremely powerful, but not prohibitively complicated
             JOINs: Combine data from different tables into one result set
             Sub-SELECTs: Nested queries can simplify logic and
             decrease need to master JOINs
             UNIONs: Craft similarly-structured queries against different
             data sets and present as one result set
             INDEXes: Can speed queries and JOINs when created on
             commonly-used columns
         As with anything worthwhile, it’s an ongoing educational process
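The four constructs above can be seen side by side in a small sketch. The tables (`web_hits`, `ssh_hits`, `watchlist`) and their rows are hypothetical, invented for illustration, and SQLite via Python's sqlite3 stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE web_hits  (src_ip TEXT, url TEXT);
    CREATE TABLE ssh_hits  (src_ip TEXT, userid TEXT);
    CREATE TABLE watchlist (src_ip TEXT, note TEXT);
    -- INDEX on a column that the queries below filter and join on
    CREATE INDEX idx_web_src ON web_hits (src_ip);
    INSERT INTO web_hits  VALUES ('10.0.0.5', '/login'), ('10.0.0.9', '/index');
    INSERT INTO ssh_hits  VALUES ('10.0.0.5', 'root'), ('10.0.0.7', 'phil');
    INSERT INTO watchlist VALUES ('10.0.0.5', 'known scanner');
""")

# JOIN: combine columns from two tables into one result set.
joined = conn.execute("""
    SELECT w.src_ip, w.url, l.note
    FROM web_hits AS w
    JOIN watchlist AS l ON l.src_ip = w.src_ip
""").fetchall()

# Sub-SELECT: same kind of question, expressed as a nested query.
nested = conn.execute("""
    SELECT src_ip, userid FROM ssh_hits
    WHERE src_ip IN (SELECT src_ip FROM watchlist)
""").fetchall()

# UNION: similarly-structured queries against different data sets.
combined = conn.execute("""
    SELECT 'web' AS source, src_ip FROM web_hits
    UNION
    SELECT 'ssh' AS source, src_ip FROM ssh_hits
    ORDER BY source, src_ip
""").fetchall()

print(joined)    # [('10.0.0.5', '/login', 'known scanner')]
print(nested)    # [('10.0.0.5', 'root')]
print(combined)
```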


SQL Super Functions
         Do analysis within SQL statements
             SUM(), AVG(), MAX(), MIN(), COUNT(), STD()
             Date/time addition/subtraction/slicing
                Pull date or clock time from DATETIME values
             Math, number formatting
         Use most efficient data types
             INET_ATON(), INET_NTOA()
         Start writing the report with SQL: GROUP_CONCAT()
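INET_ATON() and INET_NTOA() convert between dotted-quad strings and 32-bit integers inside MySQL. When a loader script needs the same conversion before the data reaches the database, Python's standard ipaddress module can do it. The helper names below are illustrative only and simply mirror the MySQL function names.

```python
import ipaddress


def inet_aton(dotted: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(number: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(number))


print(inet_aton("1.2.3.4"))   # 16909060
print(inet_ntoa(16909060))    # 1.2.3.4
```

Stored as an integer, every address fits in 4 bytes and sorts numerically, while a dotted-quad string needs up to 15 characters.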


Step 1: Schema Design
         Schema consists of column names and type definitions
         Your data might already be databased somewhere!
             log2timeline, SGUIL, Splunk
         Adapt to developing requirements: ALTER TABLE
             (or fix what you messed up when you started)
         Doing this efficiently takes practice
             Ranges of integers
             “Normal Forms”
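For the "ranges of integers" point, a quick way to pick a column type is to compare the largest value you expect against the upper bound of each MySQL unsigned integer type. The sketch below hard-codes those bounds; the function name is hypothetical.

```python
# Upper bounds of MySQL's unsigned integer types (1, 2, 3, 4 and 8 bytes).
UNSIGNED_TYPES = [
    ("TINYINT", 2**8 - 1),
    ("SMALLINT", 2**16 - 1),
    ("MEDIUMINT", 2**24 - 1),
    ("INT", 2**32 - 1),
    ("BIGINT", 2**64 - 1),
]


def smallest_unsigned_type(max_value: int) -> str:
    """Return the narrowest unsigned type that can hold max_value."""
    if max_value < 0:
        raise ValueError("unsigned types cannot hold negative values")
    for name, upper in UNSIGNED_TYPES:
        if max_value <= upper:
            return f"{name} UNSIGNED"
    raise ValueError("value does not fit in a 64-bit unsigned integer")


print(smallest_unsigned_type(65535))        # SMALLINT UNSIGNED (TCP/UDP ports)
print(smallest_unsigned_type(255))          # TINYINT UNSIGNED (IP protocol number)
print(smallest_unsigned_type(4294967295))   # INT UNSIGNED (IPv4 address)
```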


Step 1: Schema Tricks/Tips
         Use an ‘interesting’ column

             ‘y’, ‘n’, ‘-’ values for later data reduction
         Use 32-bit unsigned integers for IP addresses
             Might want to use dotted quads too, if you expect on-the-
             fly subnet-style queries (add/remove as needed)
         Use indexes where sensible
             Too many indexes decrease performance, increase storage
             Focus on commonly-queried columns
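Subnet-style questions can also be answered against the integer column itself, by turning the subnet into a numeric range. The sketch below is one way to do that, not the approach from the talk: `subnet_bounds` is a hypothetical helper, the sample rows are invented, and SQLite stands in for MySQL.

```python
import ipaddress
import sqlite3


def subnet_bounds(cidr: str) -> tuple[int, int]:
    """Return (first, last) address of a CIDR block as integers."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (userid TEXT, src_ip INTEGER)")
conn.execute("CREATE INDEX idx_src_ip ON logins (src_ip)")  # commonly-queried column

sample = [("phil", "10.1.2.3"), ("root", "10.1.200.9"), ("guest", "192.168.0.4")]
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [(user, int(ipaddress.IPv4Address(ip))) for user, ip in sample],
)

# Everything inside 10.1.0.0/16 is a simple range test on the integer column.
low, high = subnet_bounds("10.1.0.0/16")
hits = conn.execute(
    "SELECT userid FROM logins WHERE src_ip BETWEEN ? AND ? ORDER BY userid",
    (low, high),
).fetchall()

print(hits)  # [('phil',), ('root',)]
```

A range test like this can use the index on `src_ip`, whereas a string prefix match on a dotted-quad column only lines up with /8, /16 and /24 boundaries.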

Step 2: Data Load
         Lots of data available - how do we normalize it for a
         database? By any means possible!
             Shell commands/scripts (CLKF!): awk, sed, cut, grep
             Scripting languages: Python, Perl, PHP
             Office apps: Excel, OOCalc
             By any means possible!
         Separate individual data items to craft SQL INSERTs


             http://blog.commandlinekungfu.com
Step 3: Data Reduction

         “Load less data” or “classify after load”?
         Load less? Might not be able to go back and get it!
             Needs less disk space, simplifies queries
         Load and classify? Might need lots of disk space
             Queries can be slower and more complicated
         Me? Load everything until resources are an issue


Step 4: Analyze! (Profit?)



Step 4: Analysis Tricks/Tips
         Sub-SELECTs and UNIONs can drastically affect
         speed
             Sometimes a script or two-stage query saves hours!
         Save SQL statements with report for re-generation in
         the future - especially with new data
         Partially-tested, unverified, embarrassing but functional
         scripts at stuffphilwrites.com/2011/07/sql-ginsu
             Seriously, don’t laugh. Please.

Example 1: Login Records



Example 1, Step 1: Schema
         CREATE TABLE `logins` (
           `id` int(11) unsigned NOT NULL auto_increment,
           `userid` varchar(20) NOT NULL,
           `systemname` varchar(20) NOT NULL default '',
           `src_ip` int unsigned NOT NULL,          -- Integer!
           `logintime` datetime NOT NULL,
           `logouttime` datetime NOT NULL,
           `interesting` enum('y','n','-') NOT NULL default '-',

           PRIMARY KEY (`id`),
           KEY `src_ip` (`src_ip`),                 -- Indexes
           KEY `userid` (`userid`)
         );


Example 1, Step 2: Load
         Output of Linux “last -i” command:
             phil      pts/0             1.2.3.4   Thu May   5 16:39 - 22:02 (10+05:22)

                “phil” = username, “pts/0” = terminal
                “1.2.3.4” = source IP, “Thu May 5 16:39” = login date and time
                “22:02” = logout (time only!), “10+05:22” = duration
         INSERT INTO `logins` (`userid`, `src_ip`, `logintime`,
         `logouttime`) VALUES ('phil', INET_ATON('1.2.3.4'),
         '2010-05-05 16:39:00',
         ADDTIME('2010-05-05 16:39:00', '10 05:22:00'));

         Python script for Linux at stuffphilwrites.com/2011/07/sql-ginsu
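The parsing step for a single `last -i` line might look like the sketch below. It is not the script from the URL above. It assumes the exact field layout of the sample line, English month abbreviations, and that the caller supplies the year, since `last` does not print one. `parse_last_line` is a hypothetical name.

```python
import ipaddress
from datetime import datetime, timedelta


def parse_last_line(line: str, year: int) -> tuple[str, int, str, str]:
    """Turn one closed-session line of `last -i` into INSERT-ready values."""
    fields = line.split()
    userid, src_ip = fields[0], fields[2]
    month, day, clock = fields[4], fields[5], fields[6]
    login = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M")

    # Duration looks like "(10+05:22)" (days+HH:MM) or "(05:22)" (HH:MM).
    days, _, hhmm = fields[-1].strip("()").rpartition("+")
    hours, minutes = hhmm.split(":")
    duration = timedelta(days=int(days or 0), hours=int(hours), minutes=int(minutes))

    # The logout column shows only a clock time, so derive the full
    # logout timestamp from login time plus duration.
    logout = login + duration
    fmt = "%Y-%m-%d %H:%M:%S"
    return (userid, int(ipaddress.IPv4Address(src_ip)),
            login.strftime(fmt), logout.strftime(fmt))


sample = "phil      pts/0             1.2.3.4   Thu May   5 16:39 - 22:02 (10+05:22)"
print(parse_last_line(sample, 2010))
```

The returned tuple can be handed to a parameterized INSERT. Lines for sessions that are still logged in or ended by a crash have a different tail and would need separate handling.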


Example 1, Step 3: Reduce


         Eliminate known-good IPs, system accounts, date
         ranges, etc
             What pitfalls could this induce?




Example 1, Step 4: Analyze!
         Session info for most frequently used source IPs
             SELECT systemname, INET_NTOA(src_ip), COUNT(*) AS count
             FROM logins
             WHERE interesting != 'n'
             GROUP BY systemname, src_ip
             ORDER BY count DESC, src_ip;
         Sessions with login duration > 1 day
             SELECT *, INET_NTOA(src_ip),
                TIMEDIFF(logouttime, logintime) AS duration
             FROM logins
             WHERE interesting != 'n'
             HAVING duration > '24:00:00'
             ORDER BY duration DESC;



Example 1, Step 4: Analyze!
         Daily login window per user
             SELECT userid, systemname, COUNT(*),
                MIN(TIME(logintime)), MAX(TIME(logintime))
             FROM logins
             WHERE interesting != 'n'
             GROUP BY userid,systemname;
         IPs per username per host
             SELECT userid, systemname, COUNT(*),
                GROUP_CONCAT(DISTINCT INET_NTOA(src_ip)
                   SEPARATOR ', ')
             FROM logins
             WHERE interesting != 'n'
             GROUP BY userid, systemname
             ORDER BY systemname, userid;
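The two queries on this slide can be tried against a few invented rows. This sketch uses SQLite through Python's sqlite3 rather than MySQL, so two things differ and are assumptions of the sketch: INET_NTOA is registered as a Python function, and the distinct-IP list is built from a DISTINCT sub-select because SQLite's group_concat does not accept DISTINCT together with a custom separator.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no INET_NTOA, so supply a Python stand-in.
conn.create_function("INET_NTOA", 1, lambda n: str(ipaddress.IPv4Address(n)))
conn.execute("""CREATE TABLE logins (userid TEXT, systemname TEXT,
    src_ip INTEGER, logintime TEXT, interesting TEXT DEFAULT '-')""")

sample = [  # hypothetical sessions
    ("phil", "web01", "1.2.3.4", "2011-05-05 16:39:00", "-"),
    ("phil", "web01", "1.2.3.4", "2011-05-06 08:15:00", "-"),
    ("phil", "web01", "5.6.7.8", "2011-05-07 23:50:00", "-"),
    ("backup", "web01", "10.0.0.2", "2011-05-07 02:00:00", "n"),
]
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?, ?)",
    [(u, s, int(ipaddress.IPv4Address(ip)), t, i) for u, s, ip, t, i in sample],
)

# Daily login window per user: earliest and latest clock time seen.
window = conn.execute("""
    SELECT userid, systemname, COUNT(*),
           MIN(TIME(logintime)), MAX(TIME(logintime))
    FROM logins WHERE interesting != 'n'
    GROUP BY userid, systemname
""").fetchall()

# Distinct source IPs per username per host.
ips = conn.execute("""
    SELECT userid, systemname, COUNT(*), GROUP_CONCAT(ip, ', ')
    FROM (SELECT DISTINCT userid, systemname, INET_NTOA(src_ip) AS ip
          FROM logins WHERE interesting != 'n')
    GROUP BY userid, systemname
""").fetchall()

print(window)  # [('phil', 'web01', 3, '08:15:00', '23:50:00')]
print(ips)
```

Note that in the second query COUNT(*) counts distinct IPs, not sessions, because of the DISTINCT sub-select.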


Example 2: Network Flows



Example 2, Step 1: Schema
         CREATE TABLE `traffic` (
           `interesting` enum('y','n','-') default '-',
           `sancp_id` bigint unsigned,
           `start_time_gmt` datetime,
           `stop_time_gmt` datetime,
           `eth_proto` smallint unsigned,      -- Smaller Integers
           `ip_proto` tinyint unsigned,        -- Smaller Integers
           `src_ip` int unsigned,
           `src_port` smallint unsigned,
           `dst_ip` int unsigned,
           `dst_port` smallint unsigned,
           `duration` int unsigned,
           `src_pkts` bigint unsigned,
           `dst_pkts` bigint unsigned,
           `src_bytes` bigint unsigned,
           `dst_bytes` bigint unsigned
         );

Example 2, Step 2: Load
         Distill pcap (or live) network data to sessions with SANCP
             Creates pipe-separated list of 50 fields per session
         Use script to extract relevant fields
             INSERT INTO `traffic` (`sancp_id`, `start_time_gmt`,
             `stop_time_gmt`, `eth_proto`, `ip_proto`, `src_ip`,
             `src_port`, `dst_ip`, `dst_port`, `duration`,
             `src_pkts`, `dst_pkts`, `src_bytes`, `dst_bytes`)
             VALUES (5613951026752893809, '2011-06-03 11:17:11',
             '2011-06-03 11:17:25', 8, 6, INET_ATON('0.2.246.190'),
             3306, INET_ATON('0.3.162.24'), 14, 82, 43, 15866, 0);
         Python script for pcaps at stuffphilwrites.com/2011/07/sql-ginsu


             http://metre.net/sancp.html
Example 2, Step 3: Reduce
         Brute force SSH logins? Do one and observe network traffic
             <5 sec && <45 packets && <2500 bytes

                UPDATE traffic SET interesting='n'
                WHERE (src_port=22 OR dst_port=22)
                   AND duration < 5
                   AND src_pkts+dst_pkts < 45
                   AND src_bytes+dst_bytes < 2500;
         DNS, NTP can sometimes be ruled out

             UPDATE traffic SET interesting='n'
             WHERE eth_proto=8 AND ip_proto=17
                AND (dst_port=53 OR dst_port=123);
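Both reduction rules can be checked against a handful of invented flows before running them on real data. The sketch below uses SQLite through Python's sqlite3 and a cut-down `traffic` table. The three sample flows are hypothetical, and the `eth_proto` test is left out because the cut-down table has no such column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE traffic (
    interesting TEXT DEFAULT '-', ip_proto INTEGER,
    src_port INTEGER, dst_port INTEGER, duration INTEGER,
    src_pkts INTEGER, dst_pkts INTEGER,
    src_bytes INTEGER, dst_bytes INTEGER)""")

flows = [
    (6, 40000, 22, 2, 20, 18, 1200, 1100),            # short SSH: failed-login shape
    (6, 40001, 22, 900, 5000, 4800, 900000, 350000),  # long SSH: real session
    (17, 53000, 53, 0, 1, 1, 70, 120),                # DNS lookup
]
conn.executemany(
    """INSERT INTO traffic (ip_proto, src_port, dst_port, duration,
       src_pkts, dst_pkts, src_bytes, dst_bytes) VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
    flows,
)

# Rule 1: SSH sessions too short and small to be a successful login.
conn.execute("""UPDATE traffic SET interesting='n'
    WHERE (src_port=22 OR dst_port=22)
      AND duration < 5
      AND src_pkts+dst_pkts < 45
      AND src_bytes+dst_bytes < 2500""")

# Rule 2: UDP DNS and NTP.
conn.execute("""UPDATE traffic SET interesting='n'
    WHERE ip_proto=17 AND (dst_port=53 OR dst_port=123)""")

remaining = conn.execute(
    "SELECT src_port, dst_port FROM traffic WHERE interesting != 'n'"
).fetchall()
print(remaining)  # [(40001, 22)]
```

Because the rows are flagged rather than deleted, a rule that turns out to be too aggressive can be undone by resetting `interesting` to '-'.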

Example 2, Step 4: Analyze!

         SSH/SCP/SFTP sources
             SELECT INET_NTOA(src_ip), duration,
                INET_NTOA(dst_ip),
                ((src_bytes+dst_bytes)/1024/1024)
                   AS MB
             FROM traffic
             WHERE (src_port=22 OR dst_port=22)
                AND interesting != 'n';


Example 2, Step 4: Analyze!
         Inbound FTP: multiple sessions used - two-stage query
             SELECT GROUP_CONCAT(src_ip SEPARATOR ', ')
             FROM traffic
             WHERE (dst_port=21 AND dst_bytes>0);
             SELECT INET_NTOA(src_ip), duration,
                (src_bytes/1024/1024) AS src_MB,
                (dst_bytes/1024/1024) AS dst_MB,
                start_time_gmt, stop_time_gmt
             FROM traffic
             WHERE (src_port>1024 AND dst_port>1024)
                AND src_ip IN (<list from prev query>)
                AND interesting != 'n';
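A script can carry the list from the first query into the second without copy and paste. The sketch below shows the idea with SQLite through Python's sqlite3. The flows are invented, and the table is cut down to the columns the two stages need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE traffic (src_ip INTEGER, src_port INTEGER,
    dst_port INTEGER, src_bytes INTEGER, dst_bytes INTEGER,
    interesting TEXT DEFAULT '-')""")
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?, ?, ?, ?, '-')",
    [
        (100, 50000, 21, 300, 500),          # FTP control channel from host 100
        (100, 50001, 30000, 10, 5_000_000),  # high-port data channel, same host
        (200, 50002, 40000, 10, 999),        # high ports, but host never spoke FTP
        (100, 50003, 80, 10, 10),            # unrelated web flow
    ],
)

# Stage 1: which sources held an FTP control session that got a reply?
ftp_clients = [row[0] for row in conn.execute(
    "SELECT DISTINCT src_ip FROM traffic WHERE dst_port = 21 AND dst_bytes > 0"
)]

# Stage 2: high-port-to-high-port flows from only those sources.
placeholders = ", ".join("?" * len(ftp_clients))
data_flows = conn.execute(
    f"""SELECT src_ip, src_port, dst_port, dst_bytes FROM traffic
        WHERE src_port > 1024 AND dst_port > 1024
          AND src_ip IN ({placeholders})
          AND interesting != 'n'""",
    ftp_clients,
).fetchall()

print(data_flows)  # [(100, 50001, 30000, 5000000)]
```

Only the placeholder string is formatted into the SQL text. The addresses themselves are passed as bound parameters.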


Example 2, Step 4: Analyze!
         Bot beaconing
             Add column

                ALTER TABLE `traffic`
                ADD COLUMN `elapsed` TIME DEFAULT NULL
                AFTER `dst_bytes`;
             During data load, track “time last seen” per “srcIP:dstIP:dstport” key
             If you’ve seen the IP+IP+port before, insert time delta:

                elapsed = TIMEDIFF(start_time_gmt,
                <time_last_seen>);
                Then set “time last seen” in loader script to the new
                start_time_gmt
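The loader-side bookkeeping for `elapsed` can be sketched with a dictionary keyed on "srcIP:dstIP:dstport". This is an illustration under assumptions, not the script from the talk: `add_elapsed` is a hypothetical name, flows are assumed to arrive sorted by start time, and the delta is computed in Python rather than with TIMEDIFF().

```python
from datetime import datetime, timedelta


def add_elapsed(flows):
    """Attach the time since the same src/dst/port triple was last seen.

    flows: iterable of (start_time, src_ip, dst_ip, dst_port), sorted by
    start_time. Returns the same tuples with an extra `elapsed` field,
    which is None the first time a triple appears.
    """
    last_seen = {}
    result = []
    for start, src_ip, dst_ip, dst_port in flows:
        key = f"{src_ip}:{dst_ip}:{dst_port}"
        start_dt = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
        previous = last_seen.get(key)
        elapsed = start_dt - previous if previous is not None else None
        result.append((start, src_ip, dst_ip, dst_port, elapsed))
        last_seen[key] = start_dt  # update "time last seen" for this triple
    return result


flows = [  # hypothetical flows: one host beaconing every five minutes
    ("2011-06-03 11:00:00", "10.0.0.5", "203.0.113.7", 443),
    ("2011-06-03 11:02:10", "10.0.0.8", "198.51.100.2", 80),
    ("2011-06-03 11:05:00", "10.0.0.5", "203.0.113.7", 443),
    ("2011-06-03 11:10:00", "10.0.0.5", "203.0.113.7", 443),
]
for row in add_elapsed(flows):
    print(row)
```

Once loaded, grouping on source, destination, port and `elapsed` makes a regular interval stand out as a repeated value.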
Wrap-up
         Mo’ data, mo’ problems
         Forensicators need foundational skills of:
         data management and data reduction
         Use SQL to do this:
             1: Schema design
             2: Data load
             3: Reduce
             4: Analyze!

Questions?



Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 

SQL Ginsu (SANS @Night, SANSFIRE 2011)

  • 1. SQL Ginsu Better Living (And Data Reduction) Through Databases Normalize and reduce lots of data. Provide quick and decisive insight. Jul 21, 2011 SANSFIRE ©2011 Lewes Technology Consulting, LLC
  • 2. Phil Hagen: 8yrs Contract InfoSec/Forensic work with DoD, IC, LE, commercial; 5yrs USAF InfoSec/Comm; BS: CompSci, USAFA. Contacts: gplus.to/philhagen, @PhilHagen, stuffphilwrites.com
  • 11. Why “SQL Ginsu”? ↓ Time spent manually reviewing data. ↓ Complexity of data for clearer presentation. ↑ YOUR value to the investigation! Cuts a lead pipe and still cuts wafer-thin tomatoes. Infomercials are funny. Photos: lifeandtimesincleveland.blogspot.com/2011/06/michael.html
  • 12. Background. Data continues to grow exponentially: 1995: 1GB @ $600; 2011: 1TB @ $90 [6,826x ↓ unit cost!]. Data sources increasingly diverse: 3+ browsers, many databases, countless mobile apps, tools, sites and services, cross-platform inconsistencies. Core questions remain consistent: who/what/where/why/when/how.
  • 14. Background. Value of the “all in one” forensic tool is not in question, but it can no longer be the ONLY tool used in an examination. Staying flexible is key! Cyclical nature of the collect/analyze/report processes: new leads generated (from the exam or elsewhere), change in legal strategy, new personalities with different perspectives.
  • 16. Foundational Skill Sets. In computer forensics and incident response, I contend two foundational skills are data management and data reduction. Many ways to accomplish, but we’ll focus on using SQL. SQL is very scalable, relatively universal, extremely powerful and repeatable. Excel, text manipulation (sed, awk, cut, grep), etc. are also perfectly good tools for this.
  • 17. SQL: It’s Not that Hard
      CREATE TABLE `tblname` (`col1` integer, `col2` varchar(10));
      INSERT INTO `tblname` (`col1`, `col2`) VALUES ('1', 'val2');
      SELECT `col2` FROM `tblname` WHERE `col1` = 1;
  • 18. OK, That’s Not Quite Fair. SQL can be extremely powerful, but not prohibitively complicated. JOINs: combine data from different tables into one result set. Sub-SELECTs: nested queries can simplify logic and decrease need to master JOINs. UNIONs: craft similarly-structured queries against different data sets and present as one result set. INDEXes: can speed queries and JOINs when created on commonly-used columns. As with anything worthwhile, it’s an ongoing educational process.
  • 19. SQL Super Functions. Do analysis within SQL statements: SUM(), AVG(), MAX(), MIN(), COUNT(), STD(). Date/time addition/subtraction/slicing: pull date or clock time from DATETIME values. Math, number formatting. Use most efficient data types: INET_ATON(), INET_NTOA(). Start writing the report with SQL: GROUP_CONCAT().
  • 20. Step 1: Schema Design. Schema consists of column names and type definitions. Your data might already be databased somewhere (log2timeline, SGUIL, Splunk)! Adapt to developing requirements: ALTER TABLE (or fix what you messed up when you started). Doing this efficiently takes practice: ranges of integers, “Normal Forms”.
  • 21. Step 1: Schema Tricks/Tips. Use an ‘interesting’ column: ‘y’, ‘n’, ‘-’ values for later data reduction. Use 32-bit unsigned integers for IP addresses; might want to use dotted quads too, if you expect on-the-fly subnet-style queries (add/remove as needed). Use indexes where sensible: too many indexes decrease performance and increase storage, so focus on commonly-queried columns.
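The integer-IP tip can be sketched in Python for use in a loader script. This is only an illustration: the helper names `inet_aton`, `inet_ntoa` and `subnet_bounds` are made up here to mirror the MySQL functions, and the subnet is an arbitrary example. Once addresses are stored as integers, a subnet-style query becomes a plain range test on `src_ip`.

```python
import ipaddress
from typing import Tuple


def inet_aton(dotted: str) -> int:
    """Dotted quad -> 32-bit unsigned integer (same idea as MySQL INET_ATON)."""
    return int(ipaddress.IPv4Address(dotted))


def inet_ntoa(value: int) -> str:
    """32-bit unsigned integer -> dotted quad (same idea as MySQL INET_NTOA)."""
    return str(ipaddress.IPv4Address(value))


def subnet_bounds(cidr: str) -> Tuple[int, int]:
    """Lowest and highest integer address inside a CIDR block."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.broadcast_address)


# A subnet filter on an integer column is a BETWEEN over the two bounds.
low, high = subnet_bounds("10.1.2.0/24")  # example subnet, not from the talk
where_clause = f"src_ip BETWEEN {low} AND {high}"
```

The range form can use an index on `src_ip`, which string matching on dotted quads generally cannot do as well.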
  • 23. Step 2: Data Load. Lots of data available - how do we normalize it for a database? By any means possible! Shell commands/scripts (CLKF!, http://blog.commandlinekungfu.com): awk, sed, cut, grep. Scripting languages: Python, Perl, PHP. Office apps: Excel, OOCalc. Separate individual data items to craft SQL INSERTs.
  • 24. Step 3: Data Reduction. “Load less data” or “classify after load”? Load less? Might not be able to go back and get it! Needs less disk space, simplifies queries. Load and classify? Might need lots of disk space; queries can be slower and more complicated. Me? Load everything until resources are an issue.
  • 25. Step 4: Analyze! (Profit?)
  • 26. Step 4: Analysis Tricks/Tips. Sub-SELECTs and UNIONs can drastically affect speed; sometimes a script or two-stage query saves hours! Save SQL statements with report for re-generation in the future - especially with new data. Partially-tested, unverified, embarrassing but functional scripts at stuffphilwrites.com/2011/07/sql-ginsu. Seriously, don’t laugh. Please.
  • 27. Example 1: Login Records
  • 28. Example 1, Step 1: Schema
      CREATE TABLE `logins` (
        `id` int(11) unsigned NOT NULL auto_increment,
        `userid` varchar(20) NOT NULL,
        `systemname` varchar(20) NOT NULL default '',
        `src_ip` int unsigned NOT NULL,  /* Integer! */
        `logintime` datetime NOT NULL,
        `logouttime` datetime NOT NULL,
        `interesting` enum('y','n','-') NOT NULL default '-',
        PRIMARY KEY (`id`),
        KEY `src_ip` (`src_ip`),  /* Indexes */
        KEY `userid` (`userid`)
      );
  • 29. Example 1, Step 2: Load. Output of Linux “last -i” command:
      phil pts/0 1.2.3.4 Thu May 5 16:39 - 22:02 (10+05:22)
    “phil” = username, “pts/0” = terminal, “1.2.3.4” = source IP, “Thu May 5 16:39” = login date and time, “22:02” = logout (time only!), “10+05:22” = duration (days+hours:minutes)
      INSERT INTO `logins` (`userid`, `src_ip`, `logintime`, `logouttime`)
      VALUES ('phil', INET_ATON('1.2.3.4'), '2011-05-05 16:39:00', ADDTIME('2011-05-05 16:39:00', '10 05:22:00'));
    Python script for Linux at stuffphilwrites.com/2011/07/sql-ginsu
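A minimal sketch of the parsing step, assuming lines shaped exactly like the sample above. This is not the script from stuffphilwrites.com; the regex and the function name `parse_last_line` are illustrative. The sample line carries no year, so the caller has to supply one, and month names are parsed assuming an English locale. Logout is derived as login time plus duration, since the logout field holds a clock time only.

```python
import ipaddress
import re
from datetime import datetime, timedelta

# Matches e.g. "phil pts/0 1.2.3.4 Thu May 5 16:39 - 22:02 (10+05:22)"
LAST_LINE = re.compile(
    r"^(?P<user>\S+)\s+(?P<tty>\S+)\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"\w{3}\s+(?P<mon>\w{3})\s+(?P<day>\d+)\s+(?P<hhmm>\d\d:\d\d)\s+-\s+"
    r"\d\d:\d\d\s+\((?:(?P<days>\d+)\+)?(?P<dh>\d\d):(?P<dm>\d\d)\)"
)


def parse_last_line(line, year):
    """Return (userid, src_ip_as_int, login, logout), or None if the line does not match."""
    m = LAST_LINE.match(line)
    if m is None:
        return None  # e.g. reboot records or sessions that are still logged in
    login = datetime.strptime(
        f"{year} {m['mon']} {m['day']} {m['hhmm']}", "%Y %b %d %H:%M"
    )
    duration = timedelta(
        days=int(m["days"] or 0), hours=int(m["dh"]), minutes=int(m["dm"])
    )
    return (m["user"], int(ipaddress.IPv4Address(m["ip"])), login, login + duration)


SAMPLE = "phil pts/0 1.2.3.4 Thu May 5 16:39 - 22:02 (10+05:22)"
record = parse_last_line(SAMPLE, year=2011)  # the year is an assumption
```

The returned tuple can be passed as bound parameters to an INSERT, which avoids building SQL strings by hand.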
  • 30. Example 1, Step 3: Reduce. Eliminate known-good IPs, system accounts, date ranges, etc. What pitfalls could this induce?
  • 31. Example 1, Step 4: Analyze! Session info for most frequently used source IPs:
      SELECT systemname, INET_NTOA(src_ip), COUNT(*) AS count
      FROM logins WHERE interesting != 'n'
      GROUP BY systemname, src_ip
      ORDER BY count DESC, src_ip;
    Sessions with login duration > 1 day:
      SELECT *, INET_NTOA(src_ip), TIMEDIFF(logouttime, logintime) AS duration
      FROM logins WHERE interesting != 'n'
      HAVING duration > '24:00:00'
      ORDER BY duration DESC;
  • 32. Example 1, Step 4: Analyze! Daily login window per user:
      SELECT userid, systemname, COUNT(*), MIN(TIME(logintime)), MAX(TIME(logintime))
      FROM logins WHERE interesting != 'n'
      GROUP BY userid, systemname;
    IPs per username per host:
      SELECT userid, systemname, COUNT(*), GROUP_CONCAT(DISTINCT INET_NTOA(src_ip) SEPARATOR ', ')
      FROM logins WHERE interesting != 'n'
      GROUP BY userid, systemname
      ORDER BY systemname, userid;
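The schema/load/reduce/analyze cycle of Example 1 can be tried end to end without a MySQL server. The sketch below uses Python's built-in sqlite3 as a stand-in, which is an assumption of this example rather than what the talk used. SQLite has no INET_NTOA or TIMEDIFF, so IPs are converted in Python and the duration test uses `julianday()`. The four rows, including the host names and the "known-good" address, are invented sample data.

```python
import ipaddress
import sqlite3


def ip_int(dotted):
    return int(ipaddress.IPv4Address(dotted))


conn = sqlite3.connect(":memory:")

# Step 1: schema (SQLite flavour of the `logins` table)
conn.execute("""
    CREATE TABLE logins (
        id INTEGER PRIMARY KEY,
        userid TEXT NOT NULL,
        systemname TEXT NOT NULL DEFAULT '',
        src_ip INTEGER NOT NULL,
        logintime TEXT NOT NULL,
        logouttime TEXT NOT NULL,
        interesting TEXT NOT NULL DEFAULT '-'
            CHECK (interesting IN ('y', 'n', '-'))
    )
""")
conn.execute("CREATE INDEX idx_src_ip ON logins (src_ip)")

# Step 2: load (invented rows)
rows = [
    ("phil", "web01", ip_int("1.2.3.4"), "2011-05-05 16:39:00", "2011-05-15 22:01:00"),
    ("phil", "web01", ip_int("10.0.0.5"), "2011-05-06 09:00:00", "2011-05-06 09:30:00"),
    ("root", "web01", ip_int("1.2.3.4"), "2011-05-07 03:10:00", "2011-05-07 03:12:00"),
    ("phil", "db01", ip_int("10.0.0.5"), "2011-05-08 10:00:00", "2011-05-08 18:00:00"),
]
conn.executemany(
    "INSERT INTO logins (userid, systemname, src_ip, logintime, logouttime) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Step 3: reduce by marking a (hypothetical) known-good source as uninteresting
conn.execute("UPDATE logins SET interesting = 'n' WHERE src_ip = ?", (ip_int("10.0.0.5"),))

# Step 4: analyze - sessions per host and source IP, most frequent first
per_source = conn.execute("""
    SELECT systemname, src_ip, COUNT(*) AS sessions
    FROM logins WHERE interesting != 'n'
    GROUP BY systemname, src_ip
    ORDER BY sessions DESC, src_ip
""").fetchall()

# Step 4: analyze - sessions longer than one day
long_sessions = conn.execute("""
    SELECT userid, julianday(logouttime) - julianday(logintime) AS days
    FROM logins
    WHERE interesting != 'n'
      AND julianday(logouttime) - julianday(logintime) > 1
""").fetchall()
```

Because rows are flagged rather than deleted, the reduction can be undone with another UPDATE if a "known-good" address later turns out to matter.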
  • 33. Example 2: Network Flows
  • 34. Example 2, Step 1: Schema
      CREATE TABLE `traffic` (
        `interesting` enum ('y','n','-') default '-',
        `sancp_id` bigint unsigned,
        `start_time_gmt` datetime,
        `stop_time_gmt` datetime,
        `eth_proto` smallint unsigned,
        `ip_proto` tinyint unsigned,
        `src_ip` int unsigned,
        `src_port` smallint unsigned,  /* Smaller Integers */
        `dst_ip` int unsigned,
        `dst_port` smallint unsigned,
        `duration` int unsigned,
        `src_pkts` bigint unsigned,
        `dst_pkts` bigint unsigned,
        `src_bytes` bigint unsigned,
        `dst_bytes` bigint unsigned
      );
  • 35. Example 2, Step 2: Load. Distill pcap (or live) network data to sessions with SANCP (http://metre.net/sancp.html). Creates pipe-separated list of 50 fields per session. Use script to extract relevant fields.
      INSERT INTO `traffic` (`sancp_id`, `start_time_gmt`, `stop_time_gmt`, `eth_proto`, `ip_proto`, `src_ip`, `src_port`, `dst_ip`, `dst_port`, `duration`, `src_pkts`, `dst_pkts`, `src_bytes`, `dst_bytes`)
      VALUES (5613951026752893809, '2011-06-03 11:17:11', '2011-06-03 11:17:25', 8, 6, INET_ATON('0.2.246.190'), 3306, INET_ATON('0.3.162.24'), 14, 82, 43, 15866, 0);
    Python script for pcaps at stuffphilwrites.com/2011/07/sql-ginsu
  • 36. Example 2, Step 3: Reduce. Brute force SSH logins? Do one and observe network traffic: <5 sec && <45 packets && <2500 bytes
      UPDATE traffic SET interesting='n'
      WHERE (src_port=22 OR dst_port=22) AND duration < 5
        AND src_pkts+dst_pkts < 45 AND src_bytes+dst_bytes < 2500;
    DNS, NTP can sometimes be ruled out:
      UPDATE traffic SET interesting='n'
      WHERE eth_proto=8 AND ip_proto=17 AND (dst_port=53 OR dst_port=123);
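To see how the threshold-based reduction behaves, the two UPDATEs can be run against a few flows. This sketch again assumes sqlite3 as a stand-in for MySQL, uses a cut-down `traffic` table with only the columns the WHERE clauses touch (so the `eth_proto` test is dropped), and the three flows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE traffic (
        interesting TEXT DEFAULT '-',
        ip_proto INTEGER, src_port INTEGER, dst_port INTEGER,
        duration INTEGER, src_pkts INTEGER, dst_pkts INTEGER,
        src_bytes INTEGER, dst_bytes INTEGER
    )
""")

flows = [  # invented sample flows
    (6, 40000, 22, 2, 12, 10, 900, 800),              # short, small SSH session
    (6, 40001, 22, 600, 4000, 3500, 900000, 400000),  # long interactive SSH session
    (17, 50000, 53, 0, 1, 1, 60, 120),                # DNS lookup
]
conn.executemany(
    "INSERT INTO traffic (ip_proto, src_port, dst_port, duration, "
    "src_pkts, dst_pkts, src_bytes, dst_bytes) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    flows,
)

# Short, small SSH sessions look like failed brute-force attempts
conn.execute("""
    UPDATE traffic SET interesting = 'n'
    WHERE (src_port = 22 OR dst_port = 22) AND duration < 5
      AND src_pkts + dst_pkts < 45 AND src_bytes + dst_bytes < 2500
""")

# UDP traffic to the DNS and NTP ports
conn.execute("""
    UPDATE traffic SET interesting = 'n'
    WHERE ip_proto = 17 AND (dst_port = 53 OR dst_port = 123)
""")

remaining = conn.execute(
    "SELECT src_port, dst_port FROM traffic WHERE interesting != 'n'"
).fetchall()
```

Only the long SSH session survives the reduction here. The thresholds come from observing one real failed login, so they should be re-measured for each environment.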
  • 37. Example 2, Step 4: Analyze! SSH/SCP/SFTP sources:
      SELECT INET_NTOA(src_ip), duration, INET_NTOA(dst_ip), ((src_bytes+dst_bytes)/1024/1024) AS MB
      FROM traffic
      WHERE (src_port=22 OR dst_port=22) AND interesting != 'n';
  • 38. Example 2, Step 4: Analyze! Inbound FTP: multiple sessions used - two-stage query:
      SELECT GROUP_CONCAT(src_ip SEPARATOR ', ')
      FROM traffic WHERE (dst_port=21 AND dst_bytes>0);
      SELECT INET_NTOA(src_ip), duration, (src_bytes/1024/1024) AS src_MB, (dst_bytes/1024/1024) AS dst_MB, start_time_gmt, stop_time_gmt
      FROM traffic
      WHERE (src_port>1024 AND dst_port>1024) AND src_ip IN (<list from prev query>) AND interesting != 'n';
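The two-stage FTP query is easy to drive from a script, which also removes the manual copy/paste of the IP list. A minimal sketch, assuming sqlite3 in place of MySQL, a trimmed-down `traffic` table, and three invented flows; the stage-one result is fed to stage two through bound parameters rather than a GROUP_CONCAT string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE traffic (
        interesting TEXT DEFAULT '-',
        src_ip INTEGER, src_port INTEGER, dst_port INTEGER,
        src_bytes INTEGER, dst_bytes INTEGER
    )
""")
conn.executemany(
    "INSERT INTO traffic (src_ip, src_port, dst_port, src_bytes, dst_bytes) "
    "VALUES (?, ?, ?, ?, ?)",
    [  # invented sample flows
        (16909060, 41000, 21, 300, 900),         # FTP control channel
        (16909060, 41001, 50123, 120, 5242880),  # high-port data channel, same source
        (167772165, 42000, 8080, 500, 1000),     # unrelated high-port traffic
    ],
)

# Stage 1: sources that held an FTP control session with a server response
ftp_sources = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT src_ip FROM traffic WHERE dst_port = 21 AND dst_bytes > 0"
    )
]

# Stage 2: high-port sessions from those sources only
placeholders = ", ".join("?" for _ in ftp_sources)
data_sessions = conn.execute(
    "SELECT src_ip, src_port, dst_port, dst_bytes / 1048576.0 AS dst_mb "
    "FROM traffic "
    "WHERE src_port > 1024 AND dst_port > 1024 "
    f"AND src_ip IN ({placeholders}) AND interesting != 'n'",
    ftp_sources,
).fetchall()
```

Keeping the intermediate list in the script also makes it simple to save both statements alongside the report for later re-runs.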
  • 39. Example 2, Step 4: Analyze! Bot beaconing. Add column:
      ALTER TABLE `traffic` ADD COLUMN `elapsed` TIME DEFAULT NULL AFTER `dst_bytes`;
    During data load, track “time last seen” per “srcIP:dstIP:dstport” key. If you’ve seen the IP+IP+port before, insert time delta:
      elapsed = TIMEDIFF(start_time_gmt, <time_last_seen>);
    Set “time last seen” in loader script to new start_time_gmt.
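The “time last seen” bookkeeping could look roughly like this inside a loader script. The talk presents the idea as notional, and this sketch is equally unverified against real traffic: the function name `add_elapsed`, the dict-based session records, and the sample sessions are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def add_elapsed(sessions):
    """Attach the time since the previous session with the same
    (src_ip, dst_ip, dst_port) key; None for the first sighting."""
    last_seen = {}
    tagged = []
    for s in sorted(sessions, key=lambda s: s["start"]):
        key = (s["src_ip"], s["dst_ip"], s["dst_port"])
        previous = last_seen.get(key)
        elapsed = s["start"] - previous if previous is not None else None
        last_seen[key] = s["start"]  # becomes the new "time last seen"
        tagged.append({**s, "elapsed": elapsed})
    return tagged


# Invented sessions: one host calling out every five minutes, plus one unrelated flow
t0 = datetime(2011, 6, 3, 11, 0, 0)
sessions = [
    {"src_ip": 1, "dst_ip": 2, "dst_port": 443, "start": t0 + timedelta(minutes=5 * i)}
    for i in range(3)
] + [{"src_ip": 1, "dst_ip": 3, "dst_port": 80, "start": t0}]

tagged = add_elapsed(sessions)
```

With `elapsed` stored per row, a key whose deltas repeat with the same value (here five minutes) stands out in a simple GROUP BY on the key and `elapsed`.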
  • 40. Wrap-up. Mo’ data, mo’ problems. Forensicators need foundational skills of data management and data reduction. Use SQL to do this: 1: Schema design, 2: Data load, 3: Reduce, 4: Analyze!

Editor's Notes

  10. MOAR DATA: my HDDs. MOAR DATA SOURCES: supertimelining = consistent picture from diverse data sources.
  20. Niche data sources too new or small to get attention from “big boys”. Scripting is possible, but even that can be limiting (and proprietary). FLEXIBLE!
  21. Concept of foundational skill sets. Construction: measuring, structural integrity. Cooking: knife handling, food safety. Sales: psychology, personality. Computer Science: data structures, algorithms.
  23. SQL is an advanced art. Raw power still accessible to anyone.
  25. Data management. Don’t re-invent if not needed. “Can we fix it?”
  26. Build for reduction. Save space (and time!). Know tradeoffs.
  27. Command Line Kung Fu. Use what you know best!
  28. Pitfalls with either choice. Best of both: go big until too big.
  30. Elegant/creative not always fast. DOCUMENT!
  34. Only 800 records in VM
  35. VM
  36. VM
  38. Used schema with 8-10M sessions/day
  39. 150k records in VM
  40. Observe to characterize traffic for reduction. “Always” reduce is bad practice (tunneling).
  41. VM: Simple list of SSH sessions. Could do Mbps.
  42. Two-step query (<200ms for both, ~10s for subSELECT)
  43. Notional idea - untested. ALTER (can we fix it?). Track running “last seen” date/time stamp.