Big Data Detectives
For Vigilant, it started in 2009. And as with most companies, it started small.
The security services startup, now part of audit and consulting firm Deloitte, wanted a way to bring information about external threats to clients that were using SIEM (security information and event management) systems to monitor their own environments. The Vigilant team knew that the combination of external threat data with internal security event data could be a powerful way to improve enterprise defenses, but crunching all that data would be a monumental task.
Vigilant began combining threat intelligence feeds, filtering the data to pull out the most important information for each client, and then transmitting the data to their clients’ SIEM systems. The company started with two threat lists: domains serving malware, and domains compromised by the Trojans SpyEye and Zeus. To reduce false alarms and aid in analysis, the company began adding more data feeds.
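The combine-and-filter step described above might look something like the following Python sketch. It is illustrative only and does not reflect Vigilant's actual code: the function names `merge_feeds` and `relevant_to_client`, the feed names, and the idea of matching on domains a client has been seen contacting are all assumptions made for the example.

```python
def merge_feeds(feeds):
    """Combine several threat lists into one lookup table.

    `feeds` maps a feed name to an iterable of domain strings.
    Returns a dict mapping each normalized domain to the set of
    feed names that reported it.
    """
    merged = {}
    for feed_name, domains in feeds.items():
        for domain in domains:
            key = domain.strip().lower()  # normalize before comparing
            merged.setdefault(key, set()).add(feed_name)
    return merged


def relevant_to_client(merged, client_domains):
    """Keep only the threat entries that match domains the client contacted."""
    seen = {d.strip().lower() for d in client_domains}
    return {domain: sources for domain, sources in merged.items() if domain in seen}


# Hypothetical feeds and client traffic, using reserved .example domains.
feeds = {
    "malware": ["Bad.example", "evil.example"],
    "zeus": ["bad.example "],
}
merged = merge_feeds(feeds)
hits = relevant_to_client(merged, ["BAD.example", "intranet.example"])
```

A domain reported by more than one feed carries more than one source name, which is one simple way that adding feeds could help reduce false alarms.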
Vigilant’s analysts quickly became addicted to the analysis. Each new data source let them tease out additional information on threats. By 2011, the company was processing about 50 to 100 GB of data per day. But its systems couldn’t keep up with the flow, and it started missing performance deadlines, says Joe Magee, co-founder and former CTO of Vigilant, who is now a director at Deloitte.
“We were not able to catch up,” Magee says. “We were not able to process the information and push it out fast enough, and that’s when it became a big data issue for us. We needed to be able to rip through this data in Google-like fashion.”
The volume of data and its rate of change caused the problem: most of the data arrived as feeds that were updated daily with gigabytes of new information. The load overwhelmed the company’s initial database, which was built on Postgres. In 2011, Vigilant moved to Hadoop and became one of many companies, both vendors and enterprises, that advocate using big data analytics to improve the response to security threats.
Big Data Still Just A Promise
For security teams, the use of analytics on massive quantities of security data — from device and application logs to collections of captured network packets and operational business data — promises better visibility into the security threats that elude current defenses.
Big data analytics can be more complex than the log collection and analysis that most SIEM systems perform, so the number crunching often has to be automated before security pros can readily use statistical correlations to discover trends and anomalies. Tracking days or weeks of business activity allows the system to find outliers, such as a user who accesses far more data on a daily basis than the average employee, or a system that has a sudden spike in resource consumption. Analysts can then dig deeper into the large sets of security data for any flagged events.
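The outlier check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular SIEM or vendor product works: the function name `flag_heavy_users`, the input layout (a mapping from user to daily byte counts) and the 3.5 cutoff are all assumptions made for the example. It uses the median and the median absolute deviation rather than the mean, so that one extreme user does not distort the baseline.

```python
from statistics import median


def flag_heavy_users(daily_bytes_by_user, threshold=3.5):
    """Return users whose average daily data access is far above the norm.

    `daily_bytes_by_user` maps a user name to a list of bytes accessed
    per day over the tracking window. A user is flagged when their
    robust z-score (based on the median absolute deviation) exceeds
    `threshold`.
    """
    # Average daily volume per user, skipping users with no data.
    averages = {
        user: sum(days) / len(days)
        for user, days in daily_bytes_by_user.items()
        if days
    }
    if not averages:
        return []

    values = list(averages.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # everyone behaves identically; nothing to flag

    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are roughly normally distributed.
    return sorted(
        user
        for user, avg in averages.items()
        if 0.6745 * (avg - med) / mad > threshold
    )


# Hypothetical three-day window of activity.
activity = {
    "alice": [100, 120, 110],
    "bob": [90, 95, 105],
    "carol": [105, 115, 100],
    "dave": [98, 102, 100],
    "eve": [5000, 5200, 4800],
}
flagged = flag_heavy_users(activity)
```

Only unusually high volumes are flagged here; a real deployment would likely compare each user against a peer group or their own history as well, and hand the flagged names to an analyst rather than act on them automatically.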
“Big data is not just about gaining insights, it’s about helping remediate issues faster,” says Jason Corbin, director of security intelligence strategy for IBM Security Systems. “The big problem is that [security teams] are overwhelmed with information they have. All that information goes to some guy who has to sift through tons of incidents or vulnerability reports and decide what they need to patch or virtually patch or fix. Security teams fall behind, and that’s how companies suffer breaches based on known but unpatched vulnerabilities.”
But for many companies, the promise of big data in security is just that: a promise. While security teams hope to gain more awareness of what is going on in their networks by collecting and analyzing more of their data, the technology is still in its adolescence. “Hadoop has been around for a while, but it is still figuring out what it is and what it wants to be,” says Adrian Lane, CTO for security consultancy Securosis.
Still, the potential is huge, Lane adds. Companies that kick off a big data project for security can collect an immense volume of data and have a security analyst poke through the information, run queries against the data and make important discoveries.