FFRI,Inc.
Fourteenforty Research Institute, Inc.
FFRI, Inc
http://www.ffri.jp
Monthly Research
Consideration and evaluation of using fuzzy hashing
Ver2.00.01

Agenda
• Background and purpose
• Basis of fuzzy hashing
• An experiment
• The result
• Consideration

Background and purpose
• ‘Fuzzy hashing’ was introduced in 2006 by Jesse Kornblum
– http://dfrws.org/2006/proceedings/12-Kornblum.pdf
• In recent years, fuzzy hashing algorithms such as ssdeep have been adopted in malware analysis
• However, (IMHO) how to use them effectively has not been examined enough
• In these slides, we evaluate how effective fuzzy hashing is at classifying malware by similarity

Basis of fuzzy hashing (1/4)
• Cryptographic hash functions such as MD5 are widely used. In general, they are designed to have the following properties:
(cf. http://en.wikipedia.org/wiki/Cryptographic_hash_function)
– it is easy to compute the hash value for any given message
– it is infeasible to generate a message that has a given hash
– it is infeasible to modify a message without changing the hash
– it is infeasible to find two different messages with the same hash
• Cryptographic hashing is often used to identify identical files
• On the other hand, it is unsuitable for identifying similar files, because the digests are completely different even if only 1 bit differs between the two files
• Digital forensics and incident response (DFIR) needs to identify similar files, and fuzzy hashing was developed to solve this problem
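The point about a 1-bit change can be checked with a minimal sketch using Python's standard `hashlib`. The sample message and the helper names (`md5_hex`, `flip_lowest_bit`) are made up for illustration and are not part of the original slides.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string."""
    return hashlib.md5(data).hexdigest()


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data whose last byte has its lowest bit flipped."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


# Hypothetical sample message; any non-empty byte string works.
original = b"an example message"
altered = flip_lowest_bit(original)

digest_a = md5_hex(original)
digest_b = md5_hex(altered)

# Count the hex digits that differ between the two digests.
differing = sum(1 for x, y in zip(digest_a, digest_b) if x != y)

print(digest_a)
print(digest_b)
print(differing, "of", len(digest_a), "hex digits differ")
```

The two inputs differ by a single bit, yet the digests give no hint that the inputs are related. This is why a cryptographic hash cannot be used to measure similarity.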

Basis of fuzzy hashing (2/4)
• Fuzzy hashing = Context Triggered Piecewise Hashing (CTPH) = piecewise hashing + rolling hashing
• Piecewise hashing
– Divide the message into N blocks and calculate a hash value for each block
• Rolling hash
– A fast method for calculating the hash value of each sub-message (positions 1-3, 2-4, …)
– In general, if two calculated values are the same, the original sub-messages are very likely identical
Example: calculating a hash value for every N characters (N=3)
Position: 1 2 3 4 5 6 7 8
Message:  a b c d e f g h
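The rolling hash idea can be sketched as follows. This is a generic polynomial rolling hash, chosen only for illustration. It is not the function used by ssdeep, and `BASE`, `MOD` and the function names are assumptions.

```python
BASE = 257            # assumed multiplier, illustration only
MOD = 1_000_000_007   # assumed modulus, illustration only


def direct_hash(window: bytes) -> int:
    """Hash one window from scratch (cost grows with the window size)."""
    h = 0
    for b in window:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, n: int) -> list:
    """Hash every n-byte window of data, updating in O(1) per step."""
    if n <= 0 or len(data) < n:
        return []
    high = pow(BASE, n - 1, MOD)   # weight of the byte that leaves the window
    h = direct_hash(data[:n])
    out = [h]
    for i in range(n, len(data)):
        h = (h - data[i - n] * high) % MOD   # drop the outgoing byte
        h = (h * BASE + data[i]) % MOD       # shift and add the incoming byte
        out.append(h)
    return out


message = b"abcdefgh"
for start, value in enumerate(rolling_hashes(message, 3), start=1):
    print(f"positions {start}-{start + 2}: {value}")
```

Each step reuses the previous value instead of rehashing the whole window, which is what makes it practical to evaluate the hash at every position of a large file.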

Basis of fuzzy hashing (3/4)
• Fuzzy hashing (CTPH)
– Whenever the rolling hash produces a specific value (the trigger) at some position, CTPH calculates a cryptographic hash of the partial message from the start of the current block up to that position
– The final hash value is generated by concatenating all of the partial hashes
Example:
Position: 1 2 3 4 5 6 7 8
Message:  a a a b b b c c
① Calculate the rolling hash; the result is the specific value (trigger)
② Calculate a hash value of this block (positions 1-3) with a cryptographic hash
③ Continue calculating the rolling hash from position 4 (to determine the next block)
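Steps ① to ③ can be sketched as a toy implementation. It follows the description on this slide only. The real ssdeep uses its own rolling hash, piece hash and block-size selection, none of which are reproduced here. The window size, the trigger rule (`rolling % trigger_mod == trigger_mod - 1`) and the use of two MD5 hex digits per block are all assumptions made for illustration.

```python
import hashlib


def split_on_triggers(data: bytes, window: int = 3, trigger_mod: int = 4) -> list:
    """Cut data into blocks wherever a toy rolling sum hits the trigger value."""
    pieces = []
    start = 0
    rolling = 0
    for pos, byte in enumerate(data):
        rolling += byte                      # byte entering the window
        if pos >= window:
            rolling -= data[pos - window]    # byte leaving the window
        if rolling % trigger_mod == trigger_mod - 1:   # step 1: trigger
            pieces.append(data[start:pos + 1])         # step 2: close the block
            start = pos + 1                            # step 3: start the next block
    if start < len(data):
        pieces.append(data[start:])          # trailing bytes form the last block
    return pieces


def toy_ctph(data: bytes) -> str:
    """Concatenate a short hash of every block into one signature."""
    return "".join(hashlib.md5(p).hexdigest()[:2] for p in split_on_triggers(data))


message = b"aaabbbcc"
print(split_on_triggers(message))
print(toy_ctph(message))
```

Because a block boundary depends only on the bytes inside the rolling window, a change near the end of the input leaves the earlier blocks, and therefore the earlier part of the signature, unchanged.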

Basis of fuzzy hashing (4/4)
• Fuzzy hashing (CTPH)
– For almost identical messages, the identical parts are likely to be split into the same blocks and therefore to yield the same block hashes
– This lets us identify partial matches between similar messages
Example:
Message (positions 1-8)   Hash values for each block   Hash value for entire message
a a a b b b c c           a823 928c 817d               238c7d
b b b a a a d d           928c a823 1972               8c2372
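The example values above can be reproduced with a short sketch. The rule "keep the last two hex digits of each block hash" is inferred from the numbers in the example and is not stated on the slide. The function names are hypothetical.

```python
def combine(block_hashes: list) -> str:
    """Build the overall hash from the last two hex digits of each block hash."""
    return "".join(h[-2:] for h in block_hashes)


def chunks(signature: str, size: int = 2) -> list:
    """Split a signature back into its per-block parts."""
    return [signature[i:i + size] for i in range(0, len(signature), size)]


# Block hash values taken from the example on the slide.
blocks_1 = ["a823", "928c", "817d"]   # message: a a a b b b c c
blocks_2 = ["928c", "a823", "1972"]   # message: b b b a a a d d

sig_1 = combine(blocks_1)
sig_2 = combine(blocks_2)

# Parts that appear in both signatures indicate blocks the messages share.
shared = set(chunks(sig_1)) & set(chunks(sig_2))

print(sig_1, sig_2, sorted(shared))
```

The two signatures share the parts for the `a a a` and `b b b` blocks even though the blocks appear in a different order, which is the partial match that a cryptographic hash of the whole message cannot reveal.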

An experiment (1/2)
• In general, the following uses of fuzzy hashing have been proposed:
– Determining similar files (i.e. files that are almost identical but whose MD5s do not match)
– Matching partial data in files
• This time, we evaluate the use of fuzzy hashing to determine similar malware
• We clarify how effective it actually is and what we should consider when applying it

An experiment (2/2)
• Preparation
– Prepare 2,036 malware files that we collected ourselves, each unique by MD5
– Calculate the fuzzy hash value of each file with ssdeep, and the similarity of every pair of files
• nCr: 2,036C2 = 2,071,630 combinations
• Determining similar malware
– Extract all the pairs whose similarity is 50%-100%
– For each similarity threshold, determine whether the detection names of the two files in a pair match
Example: similarity of each pair of malware files (%)
    A   B   C   …
A       90  82  54
B           76  62
C               46
…
Pairs with similarity 80%+: A-B, A-C
・(A)Trojan.XYZ – (B)Trojan.XYX: Matched
・(A)Trojan.XYZ – (C)WORM.DEA: Not matched
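The procedure can be sketched on the small similarity matrix above. The label `D` stands in for the unnamed "…" column, the detection names are made up, and exact string equality is assumed as the matching rule. The slides do not specify how names were compared, so treat all three as assumptions.

```python
from math import comb

# Number of file pairs in the experiment: 2,036 choose 2.
print(comb(2036, 2))

# Pairwise similarities (%) from the example matrix; "D" is a hypothetical
# label for the unnamed fourth column.
similarity = {
    ("A", "B"): 90, ("A", "C"): 82, ("A", "D"): 54,
    ("B", "C"): 76, ("B", "D"): 62,
    ("C", "D"): 46,
}

# Made-up detection names, for illustration only.
detection_name = {
    "A": "Trojan.XYZ",
    "B": "Trojan.XYZ",
    "C": "WORM.DEA",
    "D": "WORM.DEA",
}


def match_rate(threshold: int) -> tuple:
    """Return (matched pairs, total pairs) with similarity >= threshold."""
    pairs = [pair for pair, score in similarity.items() if score >= threshold]
    matched = sum(1 for a, b in pairs if detection_name[a] == detection_name[b])
    return matched, len(pairs)


for threshold in range(50, 101, 10):
    matched, total = match_rate(threshold)
    print(f"{threshold}%+: {matched} of {total} pairs have matching names")
```

In the actual experiment the similarity values would come from ssdeep and the names from anti-virus detection results. Only the counting logic is shown here.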

The result
• The higher the threshold, the higher the matching rate of detection names
– Up to a threshold of 90%, the matching rate stays at around 50-60%
[Chart: the number of pairs whose similarity is above the threshold, plotted against the threshold of similarity (%)]

The rate detected by the name “Generic”?
• We divide the “matched” pairs into those with “generic” in their detection names and the others
• “matched(Generic)%” shows the same trend as the matching rate above
-> The higher the threshold, the more malware is detected as “generic”
[Chart: the number of pairs whose similarity is above the threshold, plotted against the threshold of similarity (%)]

Consideration
• The meaning of the result depends on whether anti-virus vendors (AVVs) use fuzzy hashing for generic detection
– If they do use fuzzy hashing for generic detection
• The result is natural
– If not
• By using fuzzy hashing, we may obtain results similar to generic detection
• If we use fuzzy hashing for generic detection, a similarity of 90%+ to the (fuzzy) hash values of known malware might be required
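The 90%+ rule can be sketched as a lookup against known signatures. `difflib.SequenceMatcher` is used here only as a stand-in for a similarity score. It is not ssdeep's scoring algorithm, and the signatures and function names are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 90  # the 90%+ level suggested above


def similarity(sig_a: str, sig_b: str) -> int:
    """Stand-in similarity score (0-100) between two signature strings."""
    return round(SequenceMatcher(None, sig_a, sig_b).ratio() * 100)


def matches_known(candidate: str, known_signatures: list,
                  threshold: int = THRESHOLD) -> list:
    """Return the known signatures whose similarity reaches the threshold."""
    return [k for k in known_signatures if similarity(candidate, k) >= threshold]


# Hypothetical signatures of known malware.
known = ["238c7d", "8c2372"]

print(matches_known("238c7d", known))   # candidate identical to a known signature
print(matches_known("ffffff", known))   # candidate unrelated to both
```

In practice the threshold trades detection coverage against false matches, so the 90% value should be read as a starting point drawn from this experiment rather than a fixed rule.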

Contact Information
• E-Mail: research-feedback@ffri.jp
• twitter: @FFRI_Research
