© Microsoft Corporation
Security Trend Analysis
with CVE Topic Models
Stephan Neuhaus
Università degli Studi di Trento, Italy
Thomas Zimmermann
Microsoft Research, Redmond, USA
ISSRE 2010, San Jose, CA, USA
Background
http://www.microsoft.com/security/sir/default.aspx
Steve Christey, Robert A. Martin
http://cve.mitre.org/docs/vuln-trends/index.html
http://www.sans.org/top-cyber-security-risks/
Can we automate the trend analysis of
security reports?
Trend Analysis on Vulnerability Data
Raw Data → Cleaning → Topic Models → Trends
Data Sources
• Common Vulnerabilities and Exposures
– Hosted by MITRE (large US research company
and defense contractor)
– Clearinghouse for vulnerabilities: Assigns IDs to
vulnerabilities and collects descriptions
• National Vulnerability Database
– Annotated version of the CVE data
– Downloadable from NIST
CVE Overview
• Earliest CVE has a published date of
October 1, 1988 (CVE-1999-0095)
– Reports the sendmail DEBUG hole that was
exploited by the Morris worm
• Latest CVEs in our dataset are from
December 31, 2009
• Total of 39,743 entries, 350 duplicates
• 39,393 unique CVEs remain
Number of CVEs
Annotated CVE entry (figure): Summary, Impact, Classification, Date, ID
Document Processing
Stack-based buffer overflow in vpnconf.exe in TheGreenBow
IPSec VPN Client 4.51.001, 4.65.003, and possibly other
versions, allows user-assisted remote attackers to execute
arbitrary code via a long OpenScriptAfterUp parameter in a
policy (.tgb) file, related to "phase 2."
STOP WORDS: remove common words
STEMMER: remove word suffixes
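The two processing steps can be sketched as follows. This is only an illustration: the stop-word list and the suffix rules are assumptions, not the ones used in the study, and the naive suffix stripper is a stand-in for a real stemmer (the stems shown elsewhere in these slides, such as "execut" and "arbitrari", come from a more thorough algorithm).

```python
import re

# Tiny illustrative stop-word list (assumption; the slides do not give theirs).
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to", "via"}

# Naive suffix rules, tried in this order (assumption; not a full stemmer).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and return alphabetic tokens, keeping inner hyphens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def stem(word):
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [stem(tok) for tok in tokenize(text) if tok not in STOP_WORDS]


summary = "Stack-based buffer overflow allows remote attackers to execute arbitrary code"
tokens = preprocess(summary)
print(tokens)
```

The resulting token list is the bag of words that a topic model would consume.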
Length of CVE summary
Topic Analysis
CVE
Application Server
Buffer Overflow
Cross-site Scripting
Topic Analysis
• Based on “Latent Dirichlet Allocation”
• Documents (CVEs) are bags of words
– Word order does not matter
• Words are assigned to (different) topics
– Probabilistic assessment
• We compute topics for all years 2000-2009
– Use post-hoc probabilities to find the fraction of
CVEs about a given topic in a given year
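A minimal sketch of the last step, assuming each CVE already has a per-topic probability vector from the topic model. Averaging those vectors per year is one plausible way to get "the fraction of CVEs about a topic"; the exact aggregation used in the study is not stated on the slide, and the function name and sample numbers below are hypothetical.

```python
from collections import defaultdict


def topic_share_by_year(docs):
    """docs: iterable of (year, [p_topic0, p_topic1, ...]).

    Returns {year: [mean probability per topic]}.
    """
    sums = {}
    counts = defaultdict(int)
    for year, probs in docs:
        if year not in sums:
            sums[year] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[year][k] += p
        counts[year] += 1
    return {year: [s / counts[year] for s in total] for year, total in sums.items()}


# Made-up probabilities for two topics, for illustration only.
docs = [
    (2000, [0.75, 0.25]),
    (2000, [0.25, 0.75]),
    (2009, [0.0, 1.0]),
]
shares = topic_share_by_year(docs)
print(shares)
```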
Topic Example
Buffer Overflow (11.3%)
overflow 6.29 execut 6.15 buffer 5.74 arbitrari 5.55 code 4.77
remot 4.29 command 3.05 long 2.83 craft 1.31 file 1.30
Cross-site Scripting (9.2%)
script 14.31 html 7.26 cross-sit 7.15 xss 6.92 web 6.73
inject 6.62 vulner 6.48 arbitrari 6.34 remot 5.73 paramet 4.47
28 Topics
http://www.wordle.net/show/wrdl/2674704/Relative_Importance_of_Security_Topics_identified_from_CVEs
2009
Trends
Topic 2000 2009
Application Servers 1% 5%
Arbitrary Code (PHP) 0% 2%
Buffer Overflow 19% 11%
Cross-Site Scripting 0% 10%
Link Resolution 6% 1%
Privilege Escalation 12% 2%
Resource Management 14% 10%
SQL Injection 1% 10%
Cause and Impact
“x allows someone to y”
BEA WebLogic Portal 10.0 and 9.2 through MP1,
when an administrator deletes a single instance of a
content portlet, removes entitlement policies for
other content portlets, which allows attackers to
bypass intended access restrictions.
CAUSE (x): the text before “which allows attackers to”
IMPACT (y): the text after it
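The “x allows someone to y” split can be approximated with a regular expression. This is a sketch under assumptions: the pattern and the function name are hypothetical, and real CVE summaries have variants that this simple rule will miss.

```python
import re

# Hypothetical pattern for "<cause>[, which] allows <someone> to <impact>".
ALLOWS = re.compile(r",?\s*(?:which\s+)?allows?\s+(?P<who>.+?)\s+to\s+", re.IGNORECASE)


def split_cause_impact(summary):
    """Return (cause, impact), or None if the summary lacks the pattern."""
    match = ALLOWS.search(summary)
    if match is None:
        return None
    cause = summary[: match.start()].strip()
    impact = summary[match.end():].strip().rstrip(".")
    return cause, impact


summary = (
    "BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator "
    "deletes a single instance of a content portlet, removes entitlement "
    "policies for other content portlets, which allows attackers to bypass "
    "intended access restrictions."
)
cause, impact = split_cause_impact(summary)
print("CAUSE: ", cause)
print("IMPACT:", impact)
```

Separate topic models can then be trained on the cause texts and on the impact texts.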
24 Topics for “Cause”
http://www.wordle.net/show/wrdl/2674788/Relative_Importance_of_Security_%22Cause%22_Topics_identified_from_CVEs
2009
Trends for “Cause”
Topic 2000 2009
Buffer Overflow 17% 10%
Cross-Site Scripting 11% 17%
PHP 5% 8%
SQL Injection 10% 21%
12 Topics for “Impact”
http://www.wordle.net/show/wrdl/2674920/Relative_Importance_of_Security_%22Impact%22_Topics_identified_from_CVEs
2009
Trends for “Impact”
Topic 2000 2009
Arbitrary Code 15% 24%
Arbitrary Script 17% 35%
Denial of Service 30% 11%
Information Leak 22% 15%
Privilege Escalation 11% 7%
Common Weakness Enumeration
• Supposed to be a complete dictionary for
software weaknesses
• There are lots of available CWEs (659)
• But only 19 are used in CVE entries;
73% of CVE entries have a CWE field
• How well do our LDA topics align with the
manual CWE classification?
Alignment with CWEs
Only LDA topics that mapped to CWEs are shown.
Precision (%)  Recall (%)  LDA Topic Name
97.8           94.6        SQL Injection
98.1           85.4        Cross-Site Scripting
93.1           85.6        Directory Traversal
57.6           80.1        Link Resolution
51.8           75.3        Format String
60.1           57.6        Buffer Overflow
29.7           49.3        Resource Management
24.9           54.5        Cross-Site Request Forgery
33.1           18.6        Information Leak
28.0           18.0        Cryptography
12.1           38.7        Credentials Management
14.2            8.7        Arbitrary Code
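For reference, precision and recall of one LDA topic against the manual CWE label can be computed as below. This is a sketch, assuming each CVE has exactly one LDA topic and one CWE label; the labels and sample data are made up and do not come from the study.

```python
def precision_recall(predicted, actual, label):
    """Precision and recall of `label`, treating `actual` as ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == label and a == label)
    fp = sum(1 for p, a in pairs if p == label and a != label)
    fn = sum(1 for p, a in pairs if p != label and a == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for six CVEs: LDA topic vs. manually assigned CWE.
lda = ["sqli", "sqli", "sqli", "sqli", "xss", "xss"]
cwe = ["sqli", "sqli", "sqli", "xss", "sqli", "sqli"]

p, r = precision_recall(lda, cwe, "sqli")
print(f"precision={p:.2f} recall={r:.2f}")
```

Low precision means the topic also captures CVEs that carry a different CWE; low recall means CVEs with that CWE end up in other topics.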
Buffer Overflow
• CVE-2008-0090:
– “A certain ActiveX control in npUpload.dll in DivX
Player 6.6.0 allows remote attackers to cause a
denial of service (Internet Explorer 7 crash) via a
long argument to the SetPassword method.”
– Possible classifiers: input validation, resource
management, credentials management
– Assigned classifier: buffer overflow
Buffer Overflow
• Vulnerability descriptions are sometimes
not very specific
• CWEs are not mutually exclusive (there is
“buffer overflow” and “arbitrary code”)
• CWE assignment is not quality-checked
• Only one CWE can be assigned, even
when the CVE is about multiple issues
Get the Data and Scripts!
http://tomz.me/issre2010cve
(Or follow the link in the paper.)
PHP: declining, with occasional SQL injection.
Buffer Overflows: flattening out after decline.
Format Strings: in steep decline.
SQL Injection and XSS: remaining strong, and rising.
Cross-Site Request Forgery: a sleeping giant perhaps, stirring.
Application Servers: rising steeply.
Thank you!
Mining Software Repositories 2011
http://msrconf.org

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Security trend analysis with CVE topic models

  • 1. © Microsoft Corporation Security Trend Analysis with CVE Topic Models Stephan Neuhaus Universita degli Studi di Trento, Italy Thomas Zimmermann Microsoft Research, Redmond, USA ISSRE 2010, San Jose, CA, USA
  • 2. Background
      http://www.microsoft.com/security/sir/default.aspx
      Steve Christey, Robert A. Martin: http://cve.mitre.org/docs/vuln-trends/index.html
      http://www.sans.org/top-cyber-security-risks/
  • 3. Can we automate the trend analysis of security reports?
  • 4. Trend Analysis on Vulnerability Data: Raw Data → Cleaning → Topic Models → Trends
  • 5. Data Sources
      • Common Vulnerabilities and Exposures
        – Hosted by MITRE (large US research company and defense contractor)
        – Clearinghouse for vulnerabilities: assigns IDs to vulnerabilities and collects descriptions
      • National Vulnerability Database
        – Annotated version of the CVE data
        – Downloadable from NIST
  • 6. CVE Overview
      • Earliest CVE has a published date of October 1, 1988 (CVE-1999-0095)
        – Reports the sendmail DEBUG hole that was exploited by the Morris worm
      • Latest CVEs in our dataset are from December 31, 2009
      • Total of 39,743 entries, 350 duplicates
      • 39,393 unique CVEs remain
  • 9. Document Processing
      Stack-based buffer overflow in vpnconf.exe in TheGreenBow IPSec VPN Client 4.51.001, 4.65.003, and possibly other versions, allows user-assisted remote attackers to execute arbitrary code via a long OpenScriptAfterUp parameter in a policy (.tgb) file, related to "phase 2."
  • 10. Document Processing (same CVE description)
      STOP WORDS: Remove common words
  • 11. Document Processing (same CVE description)
      STOP WORDS: Remove common words
      STEMMER: Remove word suffixes
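
The stop-word and stemming steps on slides 10 and 11 can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list is an assumed, deliberately tiny one, and `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer (the stems on the later "Topic Example" slide, such as "execut" and "remot", suggest a Porter-style stemmer, but the slides do not name one).

```python
import re

# Assumed, deliberately tiny stop-word list; the list used in the talk is not given.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to", "via", "or", "other"}

# Suffixes tried in order; a crude stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "es", "s", "e")
MIN_STEM_LENGTH = 4  # never shorten a word below this many characters


def tokenize(text):
    """Lowercase the text and keep word-like tokens that start with a letter."""
    return re.findall(r"[a-z][a-z0-9-]*", text.lower())


def crude_stem(word):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem what is left."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("Stack-based buffer overflow allows remote attackers "
                     "to execute arbitrary code"))
```

Version numbers such as 4.51.001 are dropped by the tokenizer because tokens must start with a letter; whether the original cleaning step did the same is not stated on the slides.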
  • 13. Topic Analysis: CVE → Application Server, Buffer Overflow, Cross-site Scripting
  • 14. Topic Analysis
      • Based on “Latent Dirichlet Allocation”
      • Documents (CVEs) are bags of words
        – Word order does not matter
      • Words are assigned to (different) topics
        – Probabilistic assessment
      • We compute topics for all years 2000-2009
        – Use post-hoc probabilities to find the fraction of CVEs about a given topic in a given year
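
Slide 14 does not say how the post-hoc probabilities are aggregated into a yearly fraction. One plausible reading, assumed here, is to average each CVE's topic probabilities over all CVEs published in the same year. The function name and the input layout (pairs of year and a topic-to-probability mapping) are hypothetical.

```python
from collections import defaultdict


def yearly_topic_share(docs):
    """Average per-document topic probabilities within each year.

    docs: iterable of (year, {topic_name: probability}) pairs, where the
    probabilities are assumed to come from an already fitted topic model.
    Returns {year: {topic_name: mean probability over that year's documents}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for year, topic_probs in docs:
        counts[year] += 1
        for topic, probability in topic_probs.items():
            totals[year][topic] += probability
    return {
        year: {topic: total / counts[year] for topic, total in topics.items()}
        for year, topics in totals.items()
    }


if __name__ == "__main__":
    toy_docs = [
        (2000, {"Buffer Overflow": 0.8, "Cross-Site Scripting": 0.2}),
        (2000, {"Buffer Overflow": 0.4, "Cross-Site Scripting": 0.6}),
        (2009, {"Cross-Site Scripting": 1.0}),
    ]
    for year, shares in sorted(yearly_topic_share(toy_docs).items()):
        print(year, shares)
```

A topic missing from a document's mapping counts as probability zero for that document, since the divisor is the number of documents in the year.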
  • 15. Topic Example
      Buffer Overflow (11.3%): overflow 6.29, execut 6.15, buffer 5.74, arbitrari 5.55, code 4.77, remot 4.29, command 3.05, long 2.83, craft 1.31, file 1.30
      Cross-site Scripting (9.2%): script 14.31, html 7.26, cross-sit 7.15, xss 6.92, web 6.73, inject 6.62, vulner 6.48, arbitrari 6.34, remot 5.73, paramet 4.47
  • 16. 28 Topics, 2009 (word cloud): http://www.wordle.net/show/wrdl/2674704/Relative_Importance_of_Security_Topics_identified_from_CVEs
  • 17. Trends
      Topic                   2000   2009
      Application Servers       1%     5%
      Arbitrary Code (PHP)      0%     2%
      Buffer Overflow          19%    11%
      Cross-Site Scripting      0%    10%
      Link Resolution           6%     1%
      Privilege Escalation     12%     2%
      Resource Management      14%    10%
      SQL Injection             1%    10%
  • 18. Cause and Impact: “x allows someone to y”
      BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator deletes a single instance of a content portlet, removes entitlement policies for other content portlets, which allows attackers to bypass intended access restrictions.
  • 19. Cause and Impact (same CVE description)
      CAUSE: the text before “which allows attackers to”
  • 20. Cause and Impact (same CVE description)
      CAUSE: the text before “which allows attackers to”
      IMPACT: “bypass intended access restrictions.”
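
The “x allows someone to y” split on slides 18 to 20 could be approximated with a regular expression. The pattern below is an assumption made for illustration; the slides do not show the rule the authors actually used, and real CVE descriptions will contain phrasings it misses.

```python
import re

# Assumed pattern for "x allows someone to y": everything before "allows" is
# the cause, everything after the first following "to" is the impact.
CAUSE_IMPACT = re.compile(
    r"^(?P<cause>.+?),?\s+(?:which\s+)?allows?\s+(?P<actor>.+?)\s+to\s+(?P<impact>.+)$",
    re.S,
)


def split_cause_impact(description):
    """Return (cause, impact) for a CVE description, or None if no match."""
    match = CAUSE_IMPACT.match(description.strip())
    if match is None:
        return None
    return match.group("cause"), match.group("impact").rstrip(".")


if __name__ == "__main__":
    text = (
        "BEA WebLogic Portal 10.0 and 9.2 through MP1, when an administrator "
        "deletes a single instance of a content portlet, removes entitlement "
        "policies for other content portlets, which allows attackers to "
        "bypass intended access restrictions."
    )
    cause, impact = split_cause_impact(text)
    print("CAUSE: ", cause)
    print("IMPACT:", impact)
```

Descriptions that do not follow the pattern return `None`, so a caller can decide whether to skip them or keep them whole.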
  • 21. 24 Topics for “Cause”, 2009 (word cloud): http://www.wordle.net/show/wrdl/2674788/Relative_Importance_of_Security_%22Cause%22_Topics_identified_from_CVEs
  • 22. Trends for “Cause”
      Topic                   2000   2009
      Buffer Overflow          17%    10%
      Cross-Site Scripting     11%    17%
      PHP                       5%     8%
      SQL Injection            10%    21%
  • 23. 12 Topics for “Impact”, 2009 (word cloud): http://www.wordle.net/show/wrdl/2674920/Relative_Importance_of_Security_%22Impact%22_Topics_identified_from_CVEs
  • 24. Trends for “Impact”
      Topic                   2000   2009
      Arbitrary Code           15%    24%
      Arbitrary Script         17%    35%
      Denial of Service        30%    11%
      Information Leak         22%    15%
      Privilege Escalation     11%     7%
  • 25. Common Weakness Enumeration (the “Classification” field of a CVE entry)
      • Supposed to be a complete dictionary for software weaknesses
      • There are lots of available CWEs (659)
      • But only 19 are used in CVE entries; 73% of CVE entries have a CWE field
      • How well do our LDA topics align with the manual CWE classification?
  • 26. Alignment with CWEs
      Columns: Precision, Recall, LDA Topic Name (only those that mapped to CWEs)
  • 27. Alignment with CWEs
      LDA topic names: SQL Injection, Cross-Site Scripting, Directory Traversal, Link Resolution, Format String, Buffer Overflow, Resource Management, Cross-Site Request Forgery, Information Leak, Cryptography, Credentials Management, Arbitrary Code
  • 28. Alignment with CWEs
      Precision (%)  Recall (%)  LDA Topic Name
               97.8        94.6  SQL Injection
               98.1        85.4  Cross-Site Scripting
               93.1        85.6  Directory Traversal
               57.6        80.1  Link Resolution
               51.8        75.3  Format String
               60.1        57.6  Buffer Overflow
               29.7        49.3  Resource Management
               24.9        54.5  Cross-Site Request Forgery
               33.1        18.6  Information Leak
               28.0        18.0  Cryptography
               12.1        38.7  Credentials Management
               14.2         8.7  Arbitrary Code
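
The slides report precision and recall per topic without defining them. Under the usual definitions, and assuming that a CVE counts as "predicted" when LDA assigns it to the topic and as "actual" when its CWE field carries the matching weakness, the two numbers could be computed as below. The function name and the toy CVE IDs are hypothetical.

```python
def precision_recall(predicted, actual):
    """Compare two sets of CVE IDs.

    predicted: CVEs assigned to an LDA topic (assumed definition).
    actual:    CVEs whose CWE field names the matching weakness (assumed).
    Returns (precision, recall) as fractions between 0 and 1.
    """
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy IDs for illustration only; they are not taken from the dataset.
    lda_topic = {"CVE-A", "CVE-B", "CVE-C", "CVE-D"}
    cwe_label = {"CVE-B", "CVE-C", "CVE-D", "CVE-E", "CVE-F"}
    precision, recall = precision_recall(lda_topic, cwe_label)
    print(f"precision={precision:.1%} recall={recall:.1%}")
```

High precision with lower recall, as in the Cross-Site Scripting row, would then mean that nearly every CVE the topic claims carries the matching CWE, while some CVEs with that CWE land in other topics.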
  • 29. Buffer Overflow
      • CVE-2008-0090:
        – “A certain ActiveX control in npUpload.dll in DivX Player 6.6.0 allows remote attackers to cause a denial of service (Internet Explorer 7 crash) via a long argument to the SetPassword method.”
        – Possible classifiers: input validation, resource management, credentials management
        – Assigned classifier: buffer overflow
  • 30. Buffer Overflow
      • Vulnerability descriptions are sometimes not very specific
      • CWEs are not mutually exclusive (there is “buffer overflow” and “arbitrary code”)
      • CWE assignment is not quality-checked
      • Only one CWE can be assigned, even when the CVE is about multiple issues
  • 31. Get the Data and Scripts! http://tomz.me/issre2010cve (Or follow the link in the paper.)
  • 32. PHP: declining, with occasional SQL injection.
      Buffer Overflows: flattening out after decline.
      Format Strings: in steep decline.
      SQL Injection and XSS: remaining strong, and rising.
      Cross-Site Request Forgery: a sleeping giant perhaps, stirring.
      Application Servers: rising steeply.
      http://msrconf.org
  • 34. Mining Software Repositories 2011: http://msrconf.org