Presentation slides presented by Cody Thomas and Christopher Korban at x33fcon 2018 about how to jumpstart your purple teaming with the MITRE ATT&CK framework, and accompanying Adversary Emulation Plans
So, what can we do to address all of the issues Chris pointed out? We can start doing more purple teaming. What is purple teaming? Red and blue are working together for the same goal - making a network more secure. This ‘win/lose’ mentality between red and blue causes a lot of strife, without any benefit. Blue tries to keep red in the dark (security through obscurity), and red reports vague findings so they can make sure they ‘win’ again next year. You need both sides of the picture (red and blue) to make a really effective defense, so there needs to be benefits for a heightened level of transparency.
So, what does this new cycle look like?
Red and blue need to be working together more often throughout the security process. For an internal red team, this blending of efforts can happen every stage of the way. For an external red team though, this most likely means an extra week or so at the end of an engagement to sit down with the blue team and have a mini purple team
We do a similar process for development - unit testing of code. We tend to not do this for operations though. The best time to have red input into defenses is in design!
The main process for purple teaming through is that it’s a quick, iterative, and collaborative workflow that benefits most from blending all parts of red and blue, but can be done at any portion. As red and blue start working more closely together, they need a common way to talk about things that’s one step above Windows Event IDs and command lines.
What is needed for this kind of language to work well for purple teaming? It means that red and blue need to be able to communicate effectively to articulate what happened in a test and the results It means that there needs to be a way to talk about what was done during a test so that it’s repeatable And it means that the language needs some way to measure improvement between tests
We like to use ATT&CK for purple teaming. ATT&CK is Adversary Tactics, Techniques, and Common Knowledge We have a small sample of it here. There are currently 11 Tactics across the top - each one refers to a ‘goal’ of the attacker. This equates to the reason why an attacker is doing any given technique. Down each column are different techniques that achieve that tactic. These techniques equate to what the adversary is doing (creating services, using WMI for persistence, dumping credentials, etc).
If you just glance across the different techniques we have listed, you’ll notice something start to jump out - these are descriptions of adversary behaviors, not indicators of compromise. The same holds true for the information we capture about different threat groups on ATT&CK - we tie everything back to behaviors.
We focus on adversary TTPs and behaviors because that’s the hardest thing for an adversary to change.
If you look at David Bianco’s pyramid of pain, you’ll see that it’s trivial for an adversary to change IoCs (like IP addresses, domain names, file names, hashes, etc), a bit harder for them to change tooling (but still feasible), but becomes a lot harder to change how they operate (their TTPs).
If we dive into the details for a given technique … (next slide)
We get something like this. There’s a few main sections here across this slide and the next one. There’s a high level description of the technique (what it does normally and how it’s abused by the attacker). There are examples of how we’ve seen this technique used in the wild. This is an important one because ATT&CK focuses on techniques that are actually seen in use by adversaries in the wild (and cited to their respective threat intel reports). There are a few exceptions to this of course (hence the ‘Common Knowledge’) part of ATT&CK. Some techniques are known to be used by Red Teams but for some reason or another, we haven’t seen in threat intel reports. So, in an effort to make sure we’re providing the most useful information, we do include some techniques that are not backed by threat intel yet.
On the right hand side you’ll see some tactic-specific information such as what the permissions are before/after executing the technique or which defenses are being evaded. On the next slide …
We include mitigations and detections opportunities for each technique. We try to refrain from mentioning specific vendor tools, and instead try to talk to the broader capabilities that are needed for mitigation and detection.
Ok, so we talked about a common language to use, but ATT&CK is getting pretty big! We’ve scoped the realm of the possible down to the realm of the probable, but can we start to prioritize a bit more from there? We sure can! This is where we start doing Adversary Emulation, or sometimes called Threat-based Red Teaming.
In our case, we don’t want to just look like advanced adversaries, we want to look like a very specific adversary. We want to look like the adversary you’re most likely going to face (based on your industry, your company, your past incidents, etc) so that we can prioritize working on defenses for those techniques first.
Remember, this is a prioritization mechanism to help frame where you should start working on defenses and forcing your offense and defense to work together to build stronger behavior-based defensive measures.
Ok, this is cool, but how can I do this adversary emulation thing you describe?
We like ATT&CK, so we do this adversary emulation thing with ATT&CK (and we already have one example here for you). More emulation plans to come, and we welcome all community additions or edits to the emulation plans (email email@example.com)
As with lots of red teaming work, part of the initial process is a rules of engagement. Adversary emulation is no exception. We also are scoping what we’re able to do by a few different variables: How much time is allotted for the test. This can of course dictate how many techniques you’re able to use Threat intelligence abundance/quality. If you can’t get the threat intel to determine which category of actors are likely to target you or what kinds of techniques they use, it’ll be hard to prioritize defenses in this way. And lastly is capability. It’s entirely possible that the adversary you’re wanting to emulate is too sophisticated for you to emulate without a lot of development.
You might be thinking: “I’m hamstrung from doing technique X, which would get me Domain Admin. That’s not realistic, right?” Remember why we’re doing this. We want red and blue working together to solve a shared problem. We’re using red to help scope blue. We’re prioritizing which defenses we bolster first based on prior threat intelligence. This does not guarantee that you’ll be protected from all APTX in the future. This is looking at a snapshot in time in the past, and even that can be muddied a bit based on the quality of your threat intel. However, the prioritization is still extremely useful. This also helps with a coherent story for what defenders are spending money on and can help mitigate that ‘shiny object’ syndrome from higher level management.
You might be wondering though, how do I go about this whole process?
The two big pieces of developing an adversary emulation plan are getting the threat intel and then getting the right data from that intel.
For our emulation plans, since we wanted to make sure we could release them to the public, we stuck exclusively to open source data. We scoured public threat intel feeds and used some google-fu to get a big list of reports relating to APT3. Part of this involves pulling threads, so we also looked for campaigns tied to APT3 and reports on APT3’s tooling (even if they don’t call out APT3 by name)
From here, we mapped APT3’s techniques and the capabilities of their tools to ATT&CK. If they had a capability that wasn’t in ATT&CK, we added it. After reading all of these reports, we were able to come up with a general MO for APT3 and a phased approach to emulating them on a network. What you see here is the phased approach to our emulation prototype that tries to keep everything generally at the ATT&CK Tactic level
After you get this information …
You can take it one step further and start providing a possible ordering to techniques. Unfortunately, due to the kind of threat intel reports that are out there and when IR teams tend to get called in, there is some information that’s just not captured. We do our best to fill in these gaps just based on prior red teaming and threat intel reporting knowledge. With this, we come up with a possible technique flow (on the right). Our mapping of tool capabilities to ATT&CK techniques is here on the left. You can also see that for the sake of helping operators and defenders, we take this one step further and provide examples of doing the same ATT&CK technique with built-in commands, cobalt strike commands, and Metasploit. There are of course a lot of different frameworks that can be leveraged and a lot of different implementations of how to do these ATT&CK behaviors, but at this stage, we keep it light weight.
Now that you have an idea for the kinds of things that the adversary is capable of, you need to determine if you can do it as well. This involves looking through open source and commercial tools to see if they have the capabilities (natively or with some configuration/scripting) to do the same ATT&CK techniques as your adversary. Sometimes this is easy, but other times the technique you’re trying to emulate is extremely specific. In these cases, you might have to create your own tool. You need some diversity in this area because you want to make sure that the defense isn’t signaturing your tool or the way your tool works instead of detecting the malicious behavior.
An artifact of going through these phases is the creation of an adversary emulation field manual for the adversary you’re targeting. This breaks out very specific command lines, scripts, and tooling configurations needed to do the ATT&CK techniques you selected. This is where you start breaking out many different implementations for ATT&CK techniques to hone in on the behavior of what’s bad instead of tailoring a defense to a single implementation. The goal would be that you can even get more junior red teamers or even defenders able to pick up the field manual and start operating for testing purposes.
At this point, you’re almost ready to actually emulate the adversary on the network.
You need to adjust your generic APTX emulation plan to match any restrictions placed on the engagement, and you need to setup your offensive infrastructure to match your emulation plans. When adjusting your emulation plan is where you’ll take into account this specific “rules of engagement” which will limit target users, machines, groups, etc.
When you start using tools for the evaluation, remember to change the defaults!
Ok, so you emulated an adversary for a customer (or internally). Now what? What was the output of that? Remember, this is a prioritization mechanism. You can get a planning matrix like the one above. Clearly this matrix doesn’t include enough information to really tell a defender what exactly is detected, what the alerts were based on, if IoCs were involved, or anything beyond a very high level planning view. Once we start diving into this, you’ll see that there are actually many other dimensions to this that take into account the specific implementations that were used, how robust the detections/mitigations were, how noisy the collection is, etc.
This planning aid’s application is described in the next slide …
This is where we go from adversary emulation to purple teaming (it’s a blurry line). Now that you have some, high level idea of what your coverage is for the subset of techniques that adversary uses, it’s time to dig into them a bit more. This is something you’ll do for all colors of the matrix, but probably prioritized red, yellow, green, grey (yes, even green). The real purple teaming cycle comes into effect to start throwing many different implementations at the defenses to see what all is detected, what isn’t, why, how that can be updated, and continue trying.
When do you stop? No guaranteed stop point. Are you ever 100% sure you detect all possible implementations of a behavior? You can get to a point where you’re confident you detect it and accept the risk for not doing more testing.