Third in the VisibleRisk series about malicious PDF analysis. This post focuses on instrumenting SpiderMonkey to benefit the analyst when identifying shellcode in JavaScript.
Is my network traffic lying to me? Most malware authors don't seem to spend a lot of effort trying to blend into network traffic. I'm pretty sure the reason for this is "they don't need to". Why spend the extra effort figuring out how to blend in when something as simple as the following HTTP request can sneak by undetected?
GET /statistics.html HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1; sv:2; id: 1A698BE9-0211-5EB4-AFDC-644AA479D972) Gecko/20100101 Firefox/9.0.1
That. works. ಠ_ಠ People had trouble finding it initially, admit it was hard to track and identify, and there was a lot of reliance on tracking it via the User-Agent string or by domain. I'd be willing to put a beer (or two) on the line against anybody who can successfully argue that that's any kind of legit Firefox User-Agent string. Compare that request with the following:
GET /logo.gif HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1
Accept-Encoding: gzip, deflate
Yes, that is a legit Firefox connection and you'll notice it looks absolutely nothing like the prior example. Aside from both possessing Host and User-Agent headers and being GET requests, they couldn't be more different. It turns out the people who write web browsers love reading RFCs and making their requests as valid and flexible as possible (thanks guys!). They also seem to favor consistency, except for the dudes writing MSIE. One of the things I (amongst many other people) have been preaching for years is to identify what's legitimate and then stop looking there for anything malicious. What if we had the technology to determine valid HTTP client requests (which we do), and used that information to start looking at everything that wasn't a valid request? This gets a bit more interesting from the analytic side when you look at everything that says it's browser X but doesn't appear to behave like X. The hard part isn't coming up with a method of detection (there are quite a few); it's collecting the data to verify the detection mechanism.
Luckily there are a few projects that let you analyze network traffic in a way that supports this kind of questioning. I'm sure I'm leaving some out, but these are the ones I'm familiar with (short business plug: we use, develop content for, and help with analytics on them): p0f, Bro IDS, and NetWitness. Each technology does things in a slightly different way, but each provides a solid analytic experience for asking different questions of your network.
p0f is back with a vengeance, and I'm late to the game posting about it. Whatever, let's look at how it works (strictly the functionality, not the algorithms). If you dig through the README you'll find a note like this:
The signature will be matched even if other headers appear in
between, as long as the list itself is matched in the specified order.
That's key. In other words, you specify which headers should appear in what order, relative to one another rather than their absolute position in the request. The logic determines whether the request matches that order, and voilà, signature matched. There are several other really nice features of this implementation: you can look for inconsistencies between platform (OS) and browser, you can look for default header values (important for accuracy), and the signature format is highly flexible. The only real drawback (as with most tools/products/whatever) is the lack of signatures. Overall it's one of the best, if not the best (or maybe the only), out-of-the-box implementations for passive browser fingerprinting.
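To make the relative-order idea concrete, here's a minimal Python sketch of the matching concept. It assumes a "signature" is just an ordered list of header names; this is an illustration of the idea only, not p0f's actual signature format or matching code.

# Sketch: match required headers by relative order, ignoring anything in between.
# Not p0f's signature format or code; just the concept.
def matches_relative_order(signature, observed_headers):
    """True if every header in `signature` appears in `observed_headers`
    in the same relative order, even with other headers in between."""
    it = iter(observed_headers)
    return all(any(seen == wanted for seen in it) for wanted in signature)

# Hypothetical signature for a Firefox-like request (placeholder, not a real rule).
signature = ["Host", "User-Agent", "Accept", "Accept-Language",
             "Accept-Encoding", "Connection"]

observed = ["Host", "User-Agent", "Accept", "Accept-Language",
            "Accept-Encoding", "DNT", "Connection"]
print(matches_relative_order(signature, observed))   # True - extra header is fine

reordered = ["User-Agent", "Host", "Accept", "Accept-Language",
             "Accept-Encoding", "Connection"]
print(matches_relative_order(signature, reordered))  # False - order changed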
Note: The next two technologies do not include this functionality out of the box, but there is some sample content floating around to allow for it.
Bro is also benefiting from a relatively new yet major release. While it doesn't ship with the ability to profile HTTP connections out of the box, Seth Hall has put in the leg work over the years to come up with a good way to do this, including a recent rewrite. It's a slightly different way of creating the request signatures, but it still has the notion of required headers in relative order along with optional headers. This method doesn't use default header values, but you can get a surprising amount of accuracy just from header order (although the ability to say 'not this header' is important). Once again, you've got to have data to come up with good (accurate) signatures, but it's refreshing that some are provided for you. The one downside to Bro, in its most current revision, is that it normalizes HTTP headers. The normalization is useful for other analytics, but in this case case sensitivity is a really good thing, since all major browsers upper-case the first letter of each word in the header name. Bro is proving to be an exceedingly agile platform for network monitoring in general, and this could be a great place for analytics like this to live.
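Since the case-sensitivity point is easy to miss, here's a hedged Python sketch of what that check could look like: flag any header name that doesn't upper-case the first letter of each hyphen-separated word. In practice you'd need the original, non-normalized header names from your sensor for this to be useful.

# Sketch: flag header names that don't follow the browsers' title-cased
# convention (e.g. "Accept-Encoding"). Illustrative only.
def odd_casing(header_name):
    words = header_name.split("-")
    return any(word and not word[0].isupper() for word in words)

for name in ["Accept-Encoding", "accept-encoding", "X-Requested-With", "x-flash-version"]:
    if odd_casing(name):
        print("unusual header casing:", name)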
NetWitness is in the same boat as Bro in that it doesn't provide the functionality or signatures within the product, but it's another tool you can use to perform similar analysis of network traffic. Instead you (or somebody else) have to create the content to do the analysis. The real downside is that there is no freely available reference to base this on (for now, but keep reading).
This has been an area that's piqued my interest for quite a while, as I really like finding new ways to look at interesting behaviors in network traffic. When we were looking to solve this problem we took a bit of a different approach. Since we're not big on duplicating effort, we looked at the technologies we had available (at the time, NetWitness and Bro) and figured out a way to solve the problem in those systems. Doing things a bit backwards, we decided to pick the signature first and then create the mapping from signature to potential match. We started with a list of headers that we saw the majority of major browsers use, and used their relative order to determine the browser creating the traffic. Using the following headers: Host, Accept, Accept-Language, Accept-Charset, Accept-Encoding, Connection, User-Agent, UA-CPU, XMLHTTPRequest and Keep-Alive, plus the case requirement, we came up with a series of decision trees to determine the browser. We went with decision trees because they seemed a bit easier to manage than signatures; I'm not sure it will end up that way, but it's been really easy to manage thus far. The header choice lends itself to profiling HTTP 1.1 connections, but we've had some success with HTTP 1.0 requests as well. By adding more headers and some default values you can get increasingly accurate identification.
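To show the shape of the decision-tree approach, here's a minimal Python sketch. The branch conditions and labels below are illustrative placeholders, not the trees we actually built; real rules have to come from real traffic.

# Sketch of the decision-tree idea: branch on the relative positions of a
# fixed set of headers. The rules and labels are placeholders only.
HEADERS_OF_INTEREST = ["Host", "Accept", "Accept-Language", "Accept-Charset",
                       "Accept-Encoding", "Connection", "User-Agent",
                       "UA-CPU", "Keep-Alive"]

def classify(observed_headers):
    """observed_headers: header names in the order they appeared, case preserved."""
    pos = {h: i for i, h in enumerate(observed_headers) if h in HEADERS_OF_INTEREST}

    if "Host" not in pos or "User-Agent" not in pos:
        return "not-a-browser?"          # placeholder branch
    if "UA-CPU" in pos:
        return "msie-like"               # placeholder branch
    if pos["User-Agent"] < pos["Host"]:
        return "unusual-order"           # placeholder branch
    if {"Accept-Charset", "Accept-Encoding"} <= pos.keys() and \
            pos["Accept-Charset"] < pos["Accept-Encoding"]:
        return "firefox-like"            # placeholder branch
    return "unknown"

print(classify(["Host", "User-Agent", "Accept", "Accept-Language",
                "Accept-Charset", "Accept-Encoding", "Connection"]))
# -> firefox-like (under these placeholder rules)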
In the meantime I've uploaded a reference for a NetWitness parser that takes care of some of the Opera signatures. If you're not familiar with the NetWitness parsing language: the parser does a basic check to ensure it's in an HTTP session, then checks for the presence of each header we care about and notes its relative position. When it encounters (what should be) the User-Agent header, it checks whether Opera/ is present and notes that too. At the end of the headers ([CR][LF][CR][LF]) there is logic to check the order of the headers and fall through if the request looks valid. If the request doesn't match the pattern (signature) for Opera, but Opera/ was seen in the User-Agent, an alert is populated in the Alerts key.
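If it's easier to read the logic than the parser language, the flow described above boils down to roughly the following Python sketch. The expected Opera header order here is a placeholder, not the signature used in the uploaded parser.

# Rough rendering of the logic described above: note the relative position of
# each header of interest, note whether "Opera/" shows up in the User-Agent,
# and at end-of-headers alert if the order doesn't match the expected pattern.
# EXPECTED_OPERA_ORDER is a placeholder, not the actual parser's signature.
EXPECTED_OPERA_ORDER = ["Host", "User-Agent", "Accept", "Accept-Language",
                        "Accept-Charset", "Accept-Encoding", "Connection"]

def check_request(header_lines):
    """header_lines: list of (name, value) tuples in the order they appeared."""
    seen_order = []
    claims_opera = False

    for name, value in header_lines:
        if name in EXPECTED_OPERA_ORDER:
            seen_order.append(name)
        if name == "User-Agent" and "Opera/" in value:
            claims_opera = True

    # End of headers ([CR][LF][CR][LF] in the raw stream): is the relative
    # order of the headers we saw consistent with the expected pattern?
    expected = iter(EXPECTED_OPERA_ORDER)
    order_ok = all(any(seen == want for want in expected) for seen in seen_order)

    if claims_opera and not order_ok:
        return "ALERT: claims Opera but header order does not match"
    return "ok"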
It's really awesome seeing other people looking at more flexible and creative methods of analytics; it makes me feel less crazy for trying stuff like this. The best part: in a couple of cases the signatures matched up with my tree (that sounds odd). What a cool verification point, though. I hope to get some more data under our analytics and begin to contribute more back to the community.
VisibleRisk is always happy to support the local community by sponsoring events that meet both our intellectual needs and our need to be around insanely intelligent people. SANS DFIR certainly accomplishes both of those directives. This year we are sponsoring the DFIR event in a couple of ways. For those of you attending the Summit, we are sponsoring breakfast on day 1. Additionally, we will have an information table set up on the 26th and 27th. We'll be there to answer questions and simply contribute to the conversation. No sales, just peers: real practitioners sharing what we can with you! As an added incentive for you to visit our booth (and to make up for the fact that we haven't yet sold our souls and rented booth bunnies), we'll have a give-away or two that you'll enjoy.
Information Table - June 26-27, 9:00am - 5:00pm
Sponsored Breakfast - June 26, 7:00am
If you haven't registered and want to attend, let me know and we can get you a 10% discount. It is always a worthwhile event: probably one of the most deeply technical events I've ever attended, and it draws the kind of people most of us would love to work with on a daily basis.
Follow us on twitter @visiblerisk, @rockyd, @sooshie for more information during the Summit!
We here at VisibleRisk like to go through the occasional exercise; while some turn out to be really good ideas, others turn out horribly. Yesterday we decided to attempt to enumerate as many of the Flame domains and IP addresses as possible, and we decided to do it without analyzing a sample of the malware. We were able to find 76 of the domains used in the campaign and 12 of the IP addresses. The exercise was done using only open source information, and we wanted to publish the results because nobody else had, even though enough information has been published that anybody who wanted to go on a domain hunt could. Thank you to everybody who has performed some sort of analysis and published that information. Feel free to use these lists for whatever you like.
You can grab the files from here.