We have developed a set of techniques to detect malicious SSL certificates using data collected by Bro. Our analysis framework consists of Bro for data collection and a variety of tools, such as Splunk and AWS ML, for data analysis. We show how we used Bro to collect the certificate attributes we needed from both good and bad sources. Bro is an effective and simple tool for analyzing and extracting data from network traffic.
Next, the extracted data was loaded into Splunk, where we ran a series of machine learning algorithms to identify the attributes that correlated with malicious activity. These algorithms also allowed us to categorize certificates used in the delivery and control of malware. Our analysis revealed a number of patterns that allow classification of hijacked devices, self-signed certificates, and more. We present the results of our analysis, showing which attributes are most relevant for detecting malicious SSL certificates, as well as the performance of the ML algorithms. Finally, we show how well the training works in detecting new malicious sources. All of the source code will be made available on GitHub.
4. Motivation
• SSL traffic is increasing and so is malicious usage!
• Content visibility of SSL traffic is becoming increasingly harder
• BSides Charm talk – Using Bro IDS to Detect X509 Anomalies by Will Glodek
5. Direct application of cert feeds
• Well known SSL cert blacklist, SSLBL by abuse.ch
• Identifies certificates via hash (SHA1)
• Averages about 10 new entries per week
• Relatively high efficacy
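Matching against a feed like SSLBL is just a set lookup on the certificate SHA1 that Bro records in x509.log. A minimal Python sketch, assuming SSLBL's CSV layout of listing date, SHA1, and reason (the hashes below are the TorrentLocker entries from the Copypasta example later in this deck):

```python
import csv
import io

# Sample rows in an assumed SSLBL CSV layout: listing date, SHA1, reason
sslbl_csv = """# Listingdate,SHA1,Listingreason
2016-05-01 00:00:00,1147947433f261bcd2cd8f508461e01898c3960b,TorrentLocker C&C
2016-05-02 00:00:00,f2a61975cb541e6a62ed8ca5214020108d922a14,TorrentLocker C&C
"""

def load_blacklist(text):
    """Build a set of blacklisted SHA1 fingerprints from SSLBL-style CSV text."""
    hashes = set()
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip comment/header lines
        hashes.add(row[1].strip().lower())
    return hashes

blacklist = load_blacklist(sslbl_csv)

# Fingerprints as they might appear in Bro's x509.log
seen = ["1147947433f261bcd2cd8f508461e01898c3960b",
        "0000000000000000000000000000000000000000"]
hits = [fp for fp in seen if fp.lower() in blacklist]
print(hits)  # only the first fingerprint matches
```

In production this lookup happens inside Bro, but the same set-membership idea works for replaying feeds against logs after the fact.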
6. David Bianco’s Pyramid of Pain
• Reflects the pain you cause to an adversary
• Generating new certificates (even signed ones) causes little pain
7. Using cert feeds and Bro to greater effect
• Use the feeds as a starting point to gather and label data
• Analyze metadata from known bad certificates as a training set
• Treat other certs resulting from other feeds as maybes
• Try to find patterns in the metadata we can use to match as many known bad and maybes as possible, verify against known (or at least, heavily biased) good traffic
8. Why Bro?
• Content awareness
• Ability to apply patterns to live network traffic
• Symmetry on the front and the back end
9. I don’t have a supercomputer
• I have a 7 year old Dell workstation my wife’s IT department was throwing out
• Nothing here would be remotely considered HPC
10. Generating training sets
• Visit every potentially malicious site you can possibly find
• OSINT feeds are great for this
• Don’t have a lot of context (if any)
• Look for certificates that match our known bad ones
• “Everything else” creates a data set that isn’t totally trustworthy, use for testing
12. Problems with generating data sets
• Expect a low response rate
• Sites get taken down, no HTTPS on port 443, don’t serve anything out, unregistered DGAs, etc.
• Less than 1 in 5000 respond (with no guarantee those responses are actually bad)
• Number that match on the SSLBL is even worse, and that’s biased
• Based entirely on what’s already labeled as bad
14. Subjects and Issuers
• CN=nycards2016.com,OU=PositiveSSL,OU=Domain Control Validated
• emailAddress=ha@163.com,CN=gjf,OU=comba,O=comba,L=guangzhou,ST=china,C=CN
• CN=A_LifeSize_System,C=US,ST=Texas,L=Austin,emailAddress=hostmaster@lifesize.com,OU=IT,O=LifeSize Communications, Inc.
• CN=Symantec Class 3 Secure Server CA - G4,OU=Symantec Trust Network,O=Symantec Corporation,C=US
• OU=Test,O=Peersec Networks,L=Bellevue,ST=WA,C=US,CN=MatrixSSL Sample Server CA
15. Splitting the Attributes
• Subject and Issuer are the string representations of multiple Attribute Value Assertions (AVAs)
• Hard to compare them as big strings, but a lot more commonality once you split them up
• Not hard to parse out each attribute using something like Splunk or Kibana, but it makes matching on those fields harder later
• Split the fields into a new Bro log based on x509.log (x509_extended.log)
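The split can be prototyped outside of Bro in a few lines. A hedged Python sketch of the AVA parsing (it ignores escaped commas and multi-valued RDNs, which a real implementation would need to handle):

```python
import re

def split_dn(dn):
    """Split an RFC 2253-style subject/issuer string into AVA key/value pairs.

    Naive sketch: splits only on commas that are immediately followed by
    'key=', so embedded commas like 'O=LifeSize Communications, Inc.'
    survive. Escaped commas (\\,) are not handled.
    """
    pairs = []
    for ava in re.split(r",(?=[A-Za-z]+=)", dn):
        key, _, value = ava.partition("=")
        pairs.append((key, value))
    return pairs

subject = "CN=nycards2016.com,OU=PositiveSSL,OU=Domain Control Validated"
print(split_dn(subject))
# [('CN', 'nycards2016.com'), ('OU', 'PositiveSSL'), ('OU', 'Domain Control Validated')]
```

Note that keys can repeat (two OUs above), which is why the output is a list of pairs rather than a dictionary.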
16. Many attributes, but we’re just using a subset
• C Country
• CN Common Name (Site identifier)
• L Locality (City)
• O Organization
• OU Organizational Unit
• ST State (or Province)
• emailAddress
• unstructuredName
• serialNumber
18. Need a prototyping system
• Wanted to gather data, then test patterns on the same data sets over and over
• Could do this with Bro directly, but you don’t really need to reprocess the packets and sessions over and over again
• Process traffic into Bro logs, evaluate via Splunk or SQL
• May want to apply new certificate feeds to existing logs outside of Bro
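As a sketch of that workflow, a Bro TSV log can be loaded into SQLite so the same data set can be queried repeatedly without reprocessing traffic. The log content and field names below are illustrative stand-ins for x509_extended.log columns:

```python
import sqlite3

# A tiny synthetic log in Bro's tab-separated format (field names invented)
log_text = """#separator \\x09
#fields\tfingerprint\tsubject_cn\tsubject_o\tsubject_c
#types\tstring\tstring\tstring\tstring
1147947433f261bcd2cd8f508461e01898c3960b\tnycards2016.com\tDis\tUS
f2a61975cb541e6a62ed8ca5214020108d922a14\texample.test\tDis\tUS
"""

def load_bro_log(text):
    """Load a Bro TSV log into an in-memory SQLite table for repeated queries."""
    fields = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]   # column names from the header
        elif line and not line.startswith("#"):
            rows.append(line.split("\t"))   # data rows
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE certs (%s)" % ", ".join(fields))
    db.executemany("INSERT INTO certs VALUES (%s)" % ", ".join("?" * len(fields)), rows)
    return db

db = load_bro_log(log_text)
count = db.execute("SELECT COUNT(*) FROM certs WHERE subject_o = 'Dis'").fetchone()[0]
print(count)  # 2
```

The same table can then be hit with new feed data or candidate patterns as SQL queries, which is much faster than re-running Bro over the packets.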
19. Analysis
• Look at data in $VISUALIZATION
• Clustering -> Pattern Synthesis
• Check for hits in the bad table
• Check for hits in the unknown table
• Confirm against a known good set
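The clustering starting point described in the notes, sorting every attribute's values by count, can be prototyped with a couple of Counters. The records below are invented for illustration:

```python
from collections import Counter

# Toy records: subject attributes already split out of the DN strings
records = [
    {"C": "AU", "O": "Internet Widgits Pty Ltd"},
    {"C": "AU", "O": "Internet Widgits Pty Ltd"},
    {"C": "US", "O": "Example Corp"},
]

# For every attribute, count recurring values; heavy hitters are
# candidate cluster seeds to spider out from
counts = {}
for rec in records:
    for key, value in rec.items():
        counts.setdefault(key, Counter())[value] += 1

for key, ctr in counts.items():
    print(key, ctr.most_common(1))
```

Values that recur across many certificates, especially ones that also hit the bad table, are what get promoted into candidate patterns.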
21. Default Values
C  | ST         | O                        | emailAddress
AU | Some-State | Internet Widgits Pty Ltd | -
AU | Some-State | Internet Widgits Pty Ltd | chmod 0600 /etc/nginx/ssl/server.key
AU | Some-State | Internet Widgits Pty Ltd | -
22. openssl Command Defaults
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:
Email Address []:
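Once the subject is split into fields, flagging certificates left at these interactive defaults is a dictionary comparison. A hedged sketch with illustrative field names:

```python
# The field values a subject gets if the openssl prompts above are all
# accepted with their defaults
OPENSSL_DEFAULTS = {"C": "AU", "ST": "Some-State", "O": "Internet Widgits Pty Ltd"}

def is_default_subject(subject):
    """True if every default-bearing field matches the openssl default."""
    return all(subject.get(k) == v for k, v in OPENSSL_DEFAULTS.items())

print(is_default_subject({"C": "AU", "ST": "Some-State",
                          "O": "Internet Widgits Pty Ltd",
                          "emailAddress": "-"}))  # True
```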
23. Is it actionable?
• Very strong correlation between sites that were hosting malware or control nodes
• Gozi, Gootkit, Shifu, and others have all been identified running from servers with “Internet Widgits Pty Ltd” certificates
• Non-malicious sites are mostly default server pages and sites under development
• A user visiting such a site outside the network could be considered anomalous
• Default Company Ltd, Default City, also used by some OpenSSL distributions
24. Copypasta
sha1                                     | O   | L           | ST
1147947433f261bcd2cd8f508461e01898c3960b | Dis | Springfield | Denial
f2a61975cb541e6a62ed8ca5214020108d922a14 | Dis | Springfield | Denial
368e6beb6f8d2f6049831fe25dd397287823c5e6 | Dis | Springfield | Denial
a9650a4522140d42e5ca4529da54805625eebe64 | Dis | Springfield | Denial
• 4 cert feed matches in our original sample set
• SSLBL lists all four as TorrentLocker C2 servers
• 14 others were found with the same ST, L, and O fields (and other fields not present)
• 5 of those have shown up in the SSLBL feed since
• So far ALL TorrentLocker C2 servers seem to use the same pattern
26. “Random” Values
C  | CN                       | L                        | O                        | ST
CN | TJMauph2wkefdglVFzqmyEvM | 3KLyyRWQF0IRfH91yu5frdLX | rfUvM2rqVg1P8IpFP2mJbEjD | ST
CN | RJHeFQ9nCz69k5RNTTLmVCIf | gBEUDkp44OE7ihODZD4VbdDv | oLsGPV9bx43NaNg1ZjOqIGfJ | ST
CN | Hcoc6tfYqmEXPnDtwJ39vBFg | N9El3p9XpqOBDcqUQxKCbw5V | OJ2vl3Vz2Tn0skdsUsLUMwFz | ST
CN | X5WBo9o5AqvtVGGAVyBiNgwO | wHMhVyFMNPcbdG84Q8gKcijH | 8V3jDPLZIGdNoOmKQ42ZmhlE | ST
CN | rQ9YqiO7S1pgULTmD3nNahn7 | OBfmruLgjF88LKyg0fVHqRzU | zs3L7avZO3gDESogMpf4HBxj | ST
• Fixed C and ST values, and exactly 24 characters in the CN, L, and O fields
• Over 27 matches for the same pattern in the “maybe” set
• All C2 nodes from the same malware family
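This pattern is easy to express once the attributes are split out. A hedged sketch (the field names and the exact alphanumeric character class are assumptions):

```python
import re

def matches_random_pattern(subject):
    """Sketch of the pattern above: C and ST are the literal strings
    'CN' and 'ST', and CN, L, and O are each exactly 24 alphanumerics."""
    if subject.get("C") != "CN" or subject.get("ST") != "ST":
        return False
    return all(re.fullmatch(r"[A-Za-z0-9]{24}", subject.get(k, ""))
               for k in ("CN", "L", "O"))

# First row of the table above
sample = {"C": "CN", "CN": "TJMauph2wkefdglVFzqmyEvM",
          "L": "3KLyyRWQF0IRfH91yu5frdLX",
          "O": "rfUvM2rqVg1P8IpFP2mJbEjD", "ST": "ST"}
print(matches_random_pattern(sample))  # True
```

Note the pattern is structural (field lengths and fixed literals), not a match on any particular value, which is what gives it a longer shelf life than hash feeds.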
27. Applying Patterns to Bro
• Wrote a collection of Bro scripts that load the x509_extended module
• Hooks into an event after subject and issuer subfields have been parsed out
• Logs to notice.log
29. Recap
• Bro makes it easy to extract certificate metadata
• Using OSINT and Bro you can easily collect large sets of data on bad and suspect certificates
• Patterns in the certificate metadata can yield higher-value information than the feeds alone
• Hard to definitively say something is malicious with no context, but you can get to a high level of confidence
• Since Bro can operate at line speed, it can be used to match against those patterns in live traffic
30. Future
• Better ways of applying patterns in Bro (less hardcoding into scripts)
• Certificate analysis has potential for uncovering a lot more patterns
• Better automatic clustering
• BSides DC talk focusing on clustering and analysis (Oct 22, '16)
• Continuing to enhance our collection of good/bad certs
• Looking for collaborators - let us know if you are interested...
31. Thanks to:
• Abuse.ch
• John Bambenek and Bambenek Consulting
• AlienVault and numerous OTX contributors
• Ravi Pandey from University of Maryland
Hello folks,
My name is Ajit Thyagarajan and this is my colleague, Andrew Beard. We are both from Atomic Mole, a small startup developing a security solution for the mid-market.
Before we dive into the guts of this presentation, I thought it would be worthwhile to tell you how we got interested in this work.
The statistics are out there. SSL traffic is on the increase (courtesy Dell). Expected growth is 20% each year, and not surprisingly only a very small percentage of this traffic is analyzed.
According to Dell, there was a sharp increase in SSL/TLS encryption through 2015. In the fourth quarter of 2015 nearly 65 percent of all web connections that Dell observed were encrypted, leading to a lot more under-the-radar attacks, according to the company. Gartner has predicted that 50 percent of all network attacks will take advantage of SSL/TLS by 2017.
Another interesting statistic is that it is becoming harder and harder to inspect SSL traffic content. First you need a man-in-the-middle device, and then you have to figure out how to manage certificates. In addition, certificate pinning and the use of complex ciphers make decryption of content even more challenging. A lot of organizations also simply do not decrypt SSL traffic due to legal concerns and infringement of user privacy.
[Ref: https://securityevaluators.com/knowledge/case_studies/mutual/]
So, all of this points to a need for new innovative ways to analyze SSL traffic for maliciousness without decrypting the content by looking for other clues.
So, Andrew and I both attended BSides Baltimore, and there was a talk there about detecting anomalies in X.509 certs using Bro.
Since our security solution uses Bro for network traffic analysis, we figured it would be a great feature to explore. So we listened, and it turned out that the main hypothesis of that talk was around the certificate creation time. The author had hypothesized that if the certificate creation time was some random value like 10am, there was a high probability of it being malicious, mainly because most of the good certs apparently had a creation time of midnight. Well, that didn't make any sense to us, and we decided to really look into this further.
Turns out that the default value for creation time (it's an optional field during cert creation) is 0, so that's why a lot of the values were showing up as 12:00am. That validated our theory that the creation time probably didn't have much to do with the maliciousness of a cert.
But we left with a burning question: what attributes could potentially indicate maliciousness (if any)? We thought there were definitely some attributes that were more likely than others to be indicative of maliciousness. We also had to think like a hacker and consider how and what they might do to get certs, as compared to a legitimate organization.
In the BSides talk, the author had used 2K good certs and a "handful" of bad certs. We weren't sure this was a good training set in the first place, so we decided to go all out with much higher fidelity. We needed a setup that would pull a very large number of good and bad certs and analyze the heck out of them.
One thing I did want to point out is that when our abstract was accepted, we got some great feedback from the reviewers (kudos to them - wish more conferences gave that kind of feedback). They mentioned that since this was a Bro conference, it would be beneficial to focus more on the Bro aspect of things, so that's what we did.
Talk a little bit about certificate feeds, as these become very important to the information gathering process, as you’ll see in a moment
Mostly commodity C2 servers, Dridex, Dyre, TorrentLocker, etc
10/week excludes Dyre
High efficacy -> Ground truth
Pain on adversary by denying them access to X
Not just for self-signed certs anymore. Mention the rise of Let’s Encrypt and other no-cost ways of getting legit certificates.
As a network defender, I want to cause pain. How do we cause pain?
Identify adversary's process and tools
Make them identify and change things that take time and effort
We can do better, enriching feed data and processing results
Not just cert feeds, but they’re my confirmed bads
What do I mean by resulting from? Refer to next slide
Could have scripted something with curl, openssl, and a lot of parsing
Combine Bro for research and monitoring after the fact
Talk about feedback loop, bro -> pattern generating process -> back into bro
Tours of my datacenter usually begin with someone asking “What’s making that noise?”
All x509 certificates end up in files table with sha1s
By “isn’t totally trustworthy”, I mean we don’t know for sure it’s malicious. Things move, takedown pages, etc
Response rate on the order of most telemarketers
This is a funnel, start wide, end up with few final results.
Match rate is biased because our feeds include other Abuse.ch feeds, so there’s a better chance than looking at other sources
Mention Bsides Charm talk concentrated on not_valid_after and not_valid_before
Not scalar, categorical
Ran into problems with initial analysis, not a lot of commonality.
There is structure here. Separate attributes with key and value pairs
Yes, we’ll make it a package
ALL optional
Last point, feeds change. I’ve already processed the data; tell me, given what I now know, about the things that happened then
Look at data -> common elements. Values that recur. Every single attribute you have, sort by count
Start with a bad record, spider out of other similar records
Next steps -> automation. Still working on getting good results there.
Known good, baseline set of traffic you trust. With a large number of results there’s always the possibility that your “known good” traffic isn’t that good, so review to be sure.
The email address part didn’t make a lot of sense until we tried generating a certificate with the openssl command
Alex Kirk blog post talking about Snort signature, Talos
Netresec, post on https reverse tunnel
Note about the email address going off the end of the command. Probably not an automated system.
Also saw results for Default Company Ltd, some versions of openssl use this by default instead
These aren’t malicious, but there’s a higher probability of something bad going on
Not all patterns are in the values themselves, some are in the form
Difficult to do via automated analysis
Entropy calculation on text would probably show very high value
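As a rough illustration of that point, Shannon entropy over the characters of one of the 24-character CNs from slide 26 comes out noticeably higher than for a natural-language default like "Internet Widgits Pty Ltd":

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits per character of a string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Random-looking CN from slide 26 vs the openssl default organization
print(round(shannon_entropy("TJMauph2wkefdglVFzqmyEvM"), 2))
print(round(shannon_entropy("Internet Widgits Pty Ltd"), 2))
```

Character-level entropy alone is a weak signal on short strings, which is part of why automated detection of these form-based patterns is hard.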
What’s the best way to do these? Not really part of intel framework as I understand it.
Longer shelf life. Issuing new certs is part of bringing up new domains, but if the patterns don’t change, this puts you a step ahead.
Try to piece together why. Easy to create patterns that are coincidental.
Special thanks to Ravi Pandey, one of our summer interns from UMD, who spent a fair bit of time with the certificates, trying various ML algorithms on multiple versions of our dataset.