Metadata Poses Both Risks And Rewards
The National Security Agency’s focus on metadata has raised awareness of the threat that activity tracking poses to individual privacy and has renewed debates over the level of monitoring that should be permissible by government and businesses.
For businesses, the lessons are more subtle. Organizations can inadvertently leak metadata, giving adversaries a look into their operations and a potential covert communications channel, but they can also analyze their own metadata to gain information on anomalous activity within their networks. Metadata, a byproduct of the adoption of technology, should be helpful, and can be, if companies are aware of the issues posed by the data, says Will Irace, vice president of threat research at General Dynamics Fidelis Cybersecurity Solutions.
“I don’t look at metadata as some bogey man,” he says. “Instead, we have to figure out how to distill knowledge from the massive amounts of raw information that we are collecting.”
Metadata entered the lexicon of everyday technology users in 2013, when the leak of classified documents from the National Security Agency highlighted the amount of information collected by service providers and requested by the government. While the U.S. government is barred from collecting the content of communications without a warrant, metadata, loosely defined as data about data, has historically been fair game. Yet metadata is as important as the content of messages or documents, and many technologists argue more important, because it can be used to map the relationships between content and the creators of that content.
In an ongoing study using volunteers who allow their information to be tracked, Stanford University has found that significant information about participants can be inferred just from their phone metadata. In one instance, a subject contacted a home improvement store, locksmiths, a hydroponics dealer, and a head shop. In another instance, a participant made “calls to a firearm store that specializes in the AR semiautomatic rifle platform (and) they also spoke at length with customer service for a firearm manufacturer that produces an AR line,” according to a March 12 update on the research by Jonathan Mayer, a Ph.D. student in computer science at Stanford University.
From a privacy perspective, the term “metadata” is typically used to identify what legal experts believe is data that can be collected, whether by business or government, without infringing on the privacy of citizens. Yet, the MetaPhone project shows that such data about content still leaks significant privacy-infringing information, Mayer says.
“I think the notion of metadata and privacy as being separate … is not borne out,” he says. “Even if you excise the personally identifiable information (PII), someone could still re-identify the data set or make sensitive inferences. So getting rid of the PII does not get rid of the privacy problems.”
While the MetaPhone project focused on data about who called whom, metadata includes a wide variety of machine-generated information: browser histories, document information, network packet headers, and access logs are all common sources of metadata produced by companies and their employees. Attackers frequently seek out this information for reconnaissance against a targeted firm, using it to gain valuable knowledge about the firm's employees and network infrastructure.
While doing research on metadata leakage, Spanish security firm Eleven Paths created a tool that could mine the data from public documents available on a company’s website. Because firms frequently do not sanitize the information placed in documents, attackers can gain information about who authored the file, when they created it, and on what type of machine. In a more recent 2013 study, the company found that data-loss prevention firms do not fully sanitize their own files and documents, leaking potentially sensitive information. In some cases, file servers and printers can also be revealed.
“A persistent attacker can create a piece of malware for a specific target and use information taken from documents to create a more targeted attack,” says Chema Alonso, CEO of Eleven Paths. “By looking at metadata, they can identify people and figure out what internal servers they need to infect.”
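To make the document-leak problem concrete, here is a minimal Python sketch of how author and timestamp fields can be read from an Office Open XML file (.docx, .xlsx, .pptx). It is not the Eleven Paths tool and says nothing about how that tool works; it only illustrates that such files are ZIP archives that commonly carry a `docProps/core.xml` part with these fields. The function names, the field selection, and the sample values (`jdoe`, `asmith`) are assumptions made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical selection of fields an auditor might want to review.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_metadata(source):
    """Return the author and timestamp fields found in docProps/core.xml.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    found = {}
    for label, path in FIELDS.items():
        node = root.find(path, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found


def build_sample_document():
    """Build a tiny in-memory archive that mimics a document's metadata part."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s" xmlns:dcterms="%s">'
        "<dc:creator>jdoe</dc:creator>"
        "<cp:lastModifiedBy>asmith</cp:lastModifiedBy>"
        "<dcterms:created>2013-05-01T09:30:00Z</dcterms:created>"
        "</cp:coreProperties>"
    ) % (NS["cp"], NS["dc"], NS["dcterms"])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    return io.BytesIO(buffer.getvalue())


if __name__ == "__main__":
    for field, value in read_core_metadata(build_sample_document()).items():
        print(field, "=", value)
```

A company could run a check along these lines over files before publishing them, to see which user names and dates would be exposed. Other formats, such as PDFs and images, store metadata differently and would need their own handling.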
Yet, companies that become more aware of metadata can collect data on and analyze their employees’ activities to gain more visibility into their networks and detect anomalous activity. Frequently referred to as Big Data analytics, such monitoring and analysis projects can help companies identify what activities may need more scrutiny.
They are, however, not easy, says General Dynamics’ Irace.
“Big data analytics brings to mind a magical black box that takes in all this raw data and produces a diamond of actionable knowledge, but it is much messier than that,” he says. “It is much more human-driven.”
Companies should dip their toe into collecting and analyzing metadata to gain experience and a grasp of what kinds of information should be collected and how the company should process it correctly, he says.
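As a rough illustration of what a first, small experiment might look like, the Python sketch below compares each user's activity count for one day against that user's own history and flags large deviations. It is an assumption-laden toy, not a method described by Irace or anyone else quoted here: the choice of metric (daily outbound connections), the user names, the threshold of 3, and the minimum spread of 1.0 are all hypothetical, and any flagged result would still need a person to review it.

```python
from statistics import mean, pstdev


def flag_anomalies(history, today, threshold=3.0, min_spread=1.0):
    """Flag users whose count today deviates sharply from their own baseline.

    history: dict mapping a user to a list of past daily counts
    today:   dict mapping a user to today's count
    Returns a dict mapping each flagged user to a deviation score.
    """
    flagged = {}
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < 2:
            # Too little history to establish a baseline for this user.
            continue
        baseline = mean(past)
        # A floor on the spread keeps very steady users from
        # triggering alerts on trivial changes.
        spread = max(pstdev(past), min_spread)
        score = (count - baseline) / spread
        if abs(score) >= threshold:
            flagged[user] = round(score, 2)
    return flagged


if __name__ == "__main__":
    # Hypothetical daily outbound-connection counts per user.
    history = {
        "alice": [10, 12, 11, 9, 10],
        "bob": [5, 5, 6, 5, 4],
    }
    today = {"alice": 11, "bob": 60, "carol": 7}
    print(flag_anomalies(history, today))
```

In this sample data only `bob` is flagged, and `carol` is skipped for lack of history. Deciding which metadata to count, how long a baseline to keep, and what to do with an alert is the human-driven part of the work.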