It’s Big Data Warfare - Good versus Evil

Whether your data hacker worldview is framed as "black hat" vs. "white hat" (or Gandalf's early grey), "us" vs. "them", or, as in the military, "blue" vs. "red" (the red guys are the ones you aim at), the fact is that exploitable data sets are growing larger and more available day by day. Big data techniques are maturing and ripe for easier adoption, and because they are built on commoditized infrastructure with plenty of open source tooling, both the good guys and the bad guys are quickly gaining the ability to exploit formerly unavailable information.

This week I caught a great Strata-sponsored online conference on Data Warfare, emceed by Alistair Croll (a prelude to more in-depth presentations next month at the Strata conference in CA). We've all heard about good uses of big data to combat evils like financial fraud, identity theft, tax evasion, and spam, but it was eye-opening to realize that bad guys will inevitably use big data to conduct nefarious, often criminal, activities against us.

For example, greyish marketing wizards (Target among them) can already do a credible job of assembling personally identifying information from various retail marketing, point-of-sale, and clickstream data sets. You might think your information is protected when each data set "owner" strips your name, address, and phone number out of their data before they sell or share it. But one of the great strengths of big data analytics is that it isn't limited to combing huge haystacks for specifically named needles; it can readily aggregate data across data sets, and that aggregate analysis can effectively re-create identifying features.

Recently, some big data researchers showed how easy it can be to re-identify specific individuals by aggregating anonymized genomics and marketing data sets (for more examples, see the Strata conference replay). If we think of this capability as "linkage," and recognize that big data is really all about increasing analytical linkage, then great forces that can be used for good or evil are being unleashed every day.
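To make "linkage" concrete, here is a minimal sketch of a re-identification join, assuming two hypothetical data sets: an "anonymized" retail file with names stripped but ZIP code, birth date, and sex left in, and a named roster carrying the same fields. The records, field names, and the `link` helper are illustrative inventions, not anything presented at the conference; the ZIP/birth date/sex trio is simply a combination widely cited as a quasi-identifier.

```python
from collections import defaultdict

# Hypothetical "anonymized" retail data: names removed, quasi-identifiers kept.
retail = [
    {"zip": "02138", "birth_date": "1965-07-31", "sex": "F", "purchase": "prenatal vitamins"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "purchase": "running shoes"},
    {"zip": "02139", "birth_date": "1980-01-15", "sex": "M", "purchase": "tent"},
]

# Hypothetical named roster (e.g. a membership list) sharing the same fields.
roster = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1965-07-31", "sex": "F"},
    {"name": "Bob Sample", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
    {"name": "Bill Sample", "zip": "02139", "birth_date": "1980-01-15", "sex": "M"},
]

# Fields that identify no one alone but can single someone out together (assumed).
QUASI_IDS = ("zip", "birth_date", "sex")


def key(record):
    """Build the join key from the quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDS)


def link(anonymized, named):
    """Return (name, purchase) pairs for anonymized records that match exactly one named person."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[key(person)].append(person["name"])

    linked = []
    for record in anonymized:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            linked.append((candidates[0], record["purchase"]))
    return linked


if __name__ == "__main__":
    for name, purchase in link(retail, roster):
        print(f"{name} bought {purchase}")
```

Running it prints `Alice Example bought prenatal vitamins`. The two 02139 records stay unlinked only because two named people share those fields; one more joined data set with a distinguishing attribute could break that tie.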

During the conference, LexisNexis, casting itself as a white hat fighting evildoers with big data, presented how it uses its "linkage" superpowers to build detailed profiles that identify people and prevent fraud. In fact, they want you to give them more of your information so that it's harder for someone else to pretend to be you. To me this had shades of Big Brother doublespeak ("you can trust us"), and the privacy implications abound. But what is clear is that if they can create a super-accurate "real world" social graph (with you in the middle) out of disparate sources of available data, so can black hat bad guys. (By the way, their HPCC system is interesting to big data techies as an alternative or adjunct to the Hadoop way of doing things.)

I don't think we should run scared, cancel our social media accounts, drop off the grid, and pay for groceries through barter. Rather, we need to march forward with our eyes open. As data sets grow bigger, analysis gets smarter, and analysts get better, big data efforts are going to unlock increasingly massive value. But the lesson here is that we all might need to work harder to make sure that value accrues to the good, especially in our own big data projects.

And let's not be fooled by superficial anonymity. 

(Next week: big data solutions to increase security and help with defense...)

