Sunday, September 21, 2014

"Policy of Truth" - Indirect Network Mapping via Metadata Extraction and Document Farming

You had something to hide, should've hidden it, shouldn't you?

Metadata is probably the single most devastating information gathering method you will ever be exposed to. The most jarring facts about it are:

1. It's easy to fix.
Most of the time, it's simply a click (maybe 2 or 3) to scrub. It's really that easy.
2. You and your users are giving me all of the information I need to compromise your assets.
When you see what you're giving away, your jaw will drop.
3. It's remarkably "low-tech."
I can teach this in 5 minutes; anyone could do it.
4. You've allowed search engines to do the hard work.
In fact, you and your users have probably directed a search engine to index it and expose you.
5. It's so valuable, even your favorite 3-letter agencies are collecting it!
Utah Data Center

What does that tell you?
It's just time to pay the price, for not listening to advice...

You've been told your entire life about metadata. You probably use it every day. In fact, most economies are built upon it.

Credit reporting.

Aren't you careful about what shows up on your credit report? Aren't you extremely vigilant about who gets to see your credit report? Aren't you pounded daily about how important it is to every facet of your life?

Credit reports = metadata.

Getting the picture?

Metadata is "data about the data." It describes the information you are looking at, not the information itself. Most entry-level information security folks have a hard time wrapping their head around that or the fact that in most cases, the metadata itself is more valuable than the data it describes.

If you're a criminal and you're relatively sharp, you're going to obfuscate or encode your communications. Odds are, short of having an incredible toolset or knowledge about the targets, it's incredibly difficult to decrypt or decipher what two parties are talking about.

Say you're organizing a heist or are part of a larger criminal organization and I'm investigating you. I have little to no idea what your activities are, but I have a feeling you're "planning something big." I'm going to start monitoring your phone calls. You're speaking in code to someone else, but you're doing it frequently. I can start building from there.

Who you're talking to, how long you're talking to them, how frequently it's occurring, and who they are talking to are all important. I'm not going to figure your code out if you're doing it correctly. However, I can gather a lot of useful information from watching. It may be the only information I have to go on.

I start watching you and your friends. I find out that you are going to the same places; that you're buying guns, ammunition, bolt cutters, ski masks, and two used cars; and that one of those folks is in deep gambling debt and happens to work as a teller at a bank.

Do I need to know what you're talking about to start figuring out what you're up to? Probably not.

You have already assumed from that scenario that I'm probably describing a bank heist. You took my description of something, aggregated the facts provided, inferred from them, and drew a pretty solid conclusion.

That's all metadata farming and extraction. You've been doing it your entire life.

It's too late to change events, it's time to face the consequence

There are literally thousands of methods and attacks that can be created and used against you.
I'm going to keep it simple. I'll delve into other methods at another time.
What you're going to see is how a few simple search strings, your own website, and the files your users publicly posted are enough to destroy your organization.

ON TOP OF THAT, getting rid of it is nearly impossible.

Google Hacking and your website

Google is a powerful tool. You probably already know this. What you probably weren't aware of is the list of Google operators and what they are capable of.

Google provides some operators for the savvy user or programmer to leverage. In this post, we're primarily concerned with these:

Filetype (filetype:) will give you ONLY files whose extension matches your query. If you wanted ONLY docx files (Microsoft Word 2007 and newer), filetype:docx would present only those in your search.
Site (site:) will restrict your search to a certain site or domain.
Cache (cache:) will return the cached copy of a page you point it at. Google maintains a voluminous cache of pages it has visited.
Info (info:) will give you a list of information about the site, where it's linked to, etc. It is an entry point to some of the operators above, along with some other useful bits.
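
If you'd rather script the query than type it, here is a minimal sketch of how the site and filetype operators combine into one search string. The function names are my own for illustration, and example.com is a placeholder, not the site used in this post.

```python
from urllib.parse import urlencode


def build_query(site=None, filetype=None, terms=""):
    """Combine free-text terms with the site: and filetype: operators."""
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    return " ".join(parts)


def search_url(query):
    """Return a URL for the query that can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # example.com is a placeholder domain (an assumption for this sketch).
    q = build_query(site="example.com", filetype="docx")
    print(q)
    print(search_url(q))
```

Operators are just tokens in the query string, so stacking them narrows the result set with each one you add.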
Let's build from a simple search.

The first few results are garbage or likely viruses.

We're concerned with the ones that are not and are pointed out above.

At this point, we can open up a tool like FOCA and extract the relevant metadata.

FOCA is a tool for metadata profiling of networks and organizations.

FOCA builds profiles of networks and assets based on extracted metadata. It does *much* more than digest PDF and Office docs, and it's well worth the download.

FOCA has extracted some very useful information for us.

From this, I have a good profile to work with:
Richard Lawhern and Red are usernames on this machine. The machine uses Office 2007 and was created in 2012.
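
To show there is no magic involved, here is a rough sketch of reading the same kind of fields by hand. It is not how FOCA works internally (I'm not claiming that); it only relies on the fact that a DOCX file is a ZIP archive whose docProps/core.xml and docProps/app.xml parts carry the author and software properties. The function name and the dictionary keys are my own choices.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (label, archive member, element) for the fields of interest.
# The labels are illustrative names chosen for this sketch.
FIELDS = [
    ("creator", "docProps/core.xml", "dc:creator"),
    ("last_modified_by", "docProps/core.xml", "cp:lastModifiedBy"),
    ("created", "docProps/core.xml", "dcterms:created"),
    ("application", "docProps/app.xml", "ep:Application"),
    ("app_version", "docProps/app.xml", "ep:AppVersion"),
]


def docx_metadata(path):
    """Return a dict of author/software properties found in a DOCX file.

    `path` may be a filename or a binary file-like object.
    Missing parts or empty fields are simply skipped.
    """
    found = {}
    roots = {}
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        for label, member, element in FIELDS:
            if member not in names:
                continue
            if member not in roots:
                roots[member] = ET.fromstring(zf.read(member))
            node = roots[member].find(element, NS)
            if node is not None and node.text:
                found[label] = node.text
    return found
```

Calling docx_metadata on a downloaded file returns whatever subset of those fields the document still carries. An app_version beginning with 12 corresponds to Office 2007, which is the kind of detail shown in the profile above.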
Let's go back to our search.
We can now use a separate operator to make our search more powerful.

Now, this site only has 2 DOCX files on it and I'll extract the metadata from the other.

Once again, we get Office 2007 and Red as a username. At this point, you can start digging in further on the site and this user to start profiling them. Odds are, you're going to start mining a lot more information.
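
Profiling is mostly counting. As a sketch of the idea, assume each document's metadata has already been pulled into a dictionary with keys like creator, last_modified_by, application and app_version (hypothetical names picked for this example); tallying them across documents shows which usernames and software versions keep recurring. The sample records below are made up for illustration.

```python
from collections import Counter


def summarize(records):
    """Tally usernames and software versions across per-document metadata."""
    users = Counter()
    software = Counter()
    for rec in records:
        for key in ("creator", "last_modified_by"):
            name = rec.get(key)
            if name:
                users[name] += 1
        app = rec.get("application")
        if app:
            software[(app, rec.get("app_version", "unknown"))] += 1
    return users, software


if __name__ == "__main__":
    # Illustrative records only; not real output from any tool.
    sample = [
        {"creator": "jdoe", "last_modified_by": "asmith",
         "application": "Microsoft Office Word", "app_version": "12.0000"},
        {"creator": "asmith", "last_modified_by": "asmith",
         "application": "Microsoft Office Word", "app_version": "12.0000"},
    ]
    users, software = summarize(sample)
    print(users.most_common())
    print(software.most_common())
```

The more documents you feed in, the more confident you can be that a recurring name is a real account and a recurring version is what the target actually runs.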

I picked this example mainly because there's not too much information to be found, but it makes for a quick demonstration. The point is that this can be done passively and completely outside of your realm of control.

Consider this:

1. It is very likely that you do not control or host your website. As someone who worked for several hosting companies, I will tell you that it's also very likely that no one is looking at the logs, or they're not looking often enough. The logs that are generated for your average webhost are MASSIVE, and hosts tend to look at them only after an incident. You're trusting an unknown party to keep a watch on things for you. Bad idea.

2. If this is done correctly, there is absolutely no way they can tell that the attacker is doing anything "wrong." The attacker is just looking at documents. The attacker is farming Google for links, clicking for docs and leaving. This looks like normal behavior, or it is spread across multiple hosts. This is hard to correlate without a subpoena for logs, a decent investigator and quite a bit of time (and money).

3. You may have little or no control over who is posting your documents and files to the page. This is usually a marketing function. Others may have the ability to upload files, depending on your organization. Are all of those users trained on how to remove metadata from files? If they are, do you think they are actually doing it?

4. If you started today, scrubbed all metadata, and trained your users to do the same, several web caches (including Google and others) would still hold the old copies and would require manual removal, even if they honor your request. You would need to find every copy of your files on the internet and replace/delete them.

5. The documents with metadata that can be found may not be hosted by you at all! Users send out PDF, JPG, PNG, DOCX, PPTX, etc. files to others via email, public postings, client documents that are posted to the clients' websites, etc.

What is going to irritate you more than anything is that this is a remarkably easy problem to fix.

Consider this:

Right-click the file, select PROPERTIES, then DETAILS.

See that little link that says "Remove Properties and Personal Information"?

The user can select which fields to remove or create a copy with them scrubbed. This has to be done EACH time the file is saved, or the metadata will be repopulated. PDFs written by Adobe Acrobat are a little different, but actually easier. Adobe embeds a lot of information in its files; they're probably my favorite to use. Most files need to have this done to them before public distribution.
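
If clicking through the dialog for every file is a non-starter, the same idea can be scripted. What follows is a minimal sketch for DOCX only, under a big assumption: it blanks just the creator and lastModifiedBy fields in docProps/core.xml. It is not equivalent to the built-in Remove Properties dialog and leaves other places metadata can hide (app.xml, comments, tracked changes, embedded images) untouched. The function name is mine.

```python
import zipfile
import xml.etree.ElementTree as ET

CORE = "docProps/core.xml"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
DC = "http://purl.org/dc/elements/1.1/"

# Keep the conventional prefixes when the XML is written back out.
ET.register_namespace("cp", CP)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("dcmitype", "http://purl.org/dc/dcmitype/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")

# Only these two fields are blanked in this sketch.
PERSONAL = (f"{{{DC}}}creator", f"{{{CP}}}lastModifiedBy")


def scrub_docx(src, dst):
    """Copy the DOCX at src to dst with the author fields in core.xml blanked.

    src and dst may be filenames or binary file-like objects.
    Every other archive member is copied through unchanged.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE:
                root = ET.fromstring(data)
                for tag in PERSONAL:
                    for node in root.iter(tag):
                        node.text = ""
                data = ET.tostring(root, encoding="utf-8",
                                   xml_declaration=True)
            zout.writestr(item, data)
```

Writing to a new file rather than editing in place keeps the original around in case the scrubbed copy will not open. Remember that the fields come back the next time someone saves the document.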

How do I get traction with my user base?

It may be a pain to the user, but my strategy for this is simple: I make it personal.
If they do not want to comply, have them supply personal files or find their postings on the internet. Better yet, retrieve a picture they've posted publicly or on social media. Extract the metadata. You can likely extract the following:
1. GPS data of where the photo was taken
2. The type and name of their phone/tablet/camera (if it was a mobile device)
3. Time/Date of picture
4. Tagging information
5. If it was edited with Photoshop, Paint, etc., you can tell them about their home computer

This tends to carry some serious weight. Many users are laissez-faire about their habits as custodians of data but are very protective of their private lives. In particular, they really hate the tech folks or executives having knowledge about those private lives*.

*I have zero interest in people's private lives and I have personal ethics about preserving privacy. I don't want to know anything about the people I work with. Period. It's awkward, it's creepy and you as an IT or InfoSec person have a personal duty to protect your users. You cannot abuse the power and knowledge that you have been given. It's a serious job requiring discretion and trust. Without both, you're going to have a very difficult career.

That being said, I don't care about appearing to wear a black hat into work when I walk in the door. Sometimes, you have to throw yourself on the grenade of dislike to protect your assets. I'm not advocating being hated or even encouraging it. Far from it! You have to build trust and a good relationship with your users. They are your human firewall. They are your last line of defense. Fear, Uncertainty and Doubt are poor sales tactics, but great motivators. Sometimes you need the "right tool" for the "right job" and when it comes to this, the fear tactic motivates well.

I'll cover more on metadata extraction, farming and public information gathering. Sites like LinkedIn, Monster, Craigslist, ARIN and DNSSTUFF are amazing for mapping a target without much effort.
