DARPA Grant Will Help Stanford Dig Deep into the Big Data in Social Networks
Backed by a $5.6 million grant from the U.S. Defense Advanced Research Projects Agency (DARPA), a team at Stanford has embarked on a four-year project to better understand and model complex communication patterns in social networks in real time…
Experts in the analysis of big data have noticed a curious pattern among those who tweet: Twitter accounts with the most followers are more likely to attract new ones. It’s just one of the many interesting and useful nuggets revealed when researchers peer deeply into the remarkable ocean of data being stockpiled by social media channels.
Stanford researchers from diverse disciplines are developing new and better ways to find meaning in data. The promise of their work has grown so great that the Defense Advanced Research Projects Agency (DARPA) recently stepped up to the plate with a grant of $5.6 million to support their research.
The new project is called MEGA: Modern Graph Analysis for Dynamic Networks, and is led by Associate Professor Ashish Goel of Stanford’s Management Science and Engineering department. A team of seven principal investigators, six of them Stanford faculty, will develop algorithms which model human communication and detect subtle patterns in huge data sets from social media.
DARPA is interested because, from a “national security” standpoint, big data holds the promise of “recognizing threats” in unusual or “suspicious social interactions of terrorists” and other foreign adversaries. But Goel, who also holds a courtesy appointment in computer science and serves on the technical advisory board of Twitter, Inc., said that the models and algorithms MEGA develops will also influence social media itself, leading to a more sophisticated, personalized experience for all users.
Harnessing enormous data sets
Our daily social communication is spread across many forms of interaction. E-mails, tweets, text messages and Facebook posts define our modern social lives. More than ever, information about this correspondence and behavior can be collected, stored, and made available to computer scientists.
With access to billions of tweets, e-mails and text messages, a project like MEGA can build reliable mathematical models of social phenomena, such as the way news spreads through a network or how people choose their social connections, Goel said. “From an intellectual point of view, it’s really exciting.”
One goal of the MEGA project is to model human online behavior and find how it shapes social networks. That bit about well-followed Twitter accounts attracting the most new followers is but one example. The team can then transform these known patterns into a more general, abstract theory and see if it applies across many social networks.
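The "rich get richer" pattern described above can be illustrated with a toy simulation. This is a minimal sketch, not the MEGA team's model: the function name, the number of accounts, and the rule that each new follow picks an account with probability proportional to its current follower count plus one are all assumptions chosen for illustration.

```python
import random

def simulate_preferential_attachment(num_accounts, num_new_follows, seed=0):
    """Toy model: each new follow goes to an account with probability
    proportional to (current followers + 1). The +1 is an assumed
    smoothing term so that accounts with zero followers can still be chosen."""
    rng = random.Random(seed)
    followers = [0] * num_accounts
    for _ in range(num_new_follows):
        weights = [count + 1 for count in followers]
        chosen = rng.choices(range(num_accounts), weights=weights, k=1)[0]
        followers[chosen] += 1
    return followers

followers = simulate_preferential_attachment(num_accounts=50, num_new_follows=5000)
ranked = sorted(followers, reverse=True)
top_share = sum(ranked[:5]) / sum(ranked)
print(f"Top 5 of 50 accounts hold {top_share:.0%} of all follows")
```

If follows were spread evenly, the top five accounts would hold 10 percent of the total; under this assumed rule, small early leads tend to compound into a much larger share.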
The sheer number of communications and the speed at which a network changes have given rise to new challenges, Goel said, problems that more storage or more processing power cannot solve. For anyone analyzing the masses of data flowing out of popular social media sites like Twitter, what happened yesterday might as well have happened last century. What matters most is now. The MEGA team wants to analyze it immediately, “not gather” [isn’t that what the NSA claimed? “Not wittingly”] and organize it later.
“On a site like Twitter, you’re not finding data that was there yesterday, you’re finding data that was there last second. And even one second of this data is too big to process on a single machine,” he said. To achieve real-time analysis, the data must be stored and explored across many different computers, which requires yet more new algorithms. This is a second component of MEGA’s research: writing the step-by-step procedures for processing distributed data in real time.
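One common way to spread a stream across machines is to route each item to a shard by hashing its key, keep a local tally on every shard, and merge the tallies on demand. The sketch below simulates that idea in a single process; the shard count, the hashtag stream, and all function names are hypothetical, and nothing here is taken from MEGA's actual algorithms.

```python
from collections import Counter
from zlib import crc32

def shard_for(key, num_shards):
    """Map a key to a shard index with a deterministic hash."""
    return crc32(key.encode("utf-8")) % num_shards

def count_on_shards(stream, num_shards=4):
    """Route every item to one shard and update that shard's local counter.
    In a real deployment each Counter would live on a separate machine."""
    shards = [Counter() for _ in range(num_shards)]
    for item in stream:
        shards[shard_for(item, num_shards)][item] += 1
    return shards

def merge(shards):
    """Combine the per-shard tallies into one global view."""
    total = Counter()
    for local in shards:
        total.update(local)
    return total

stream = ["#news", "#sports", "#news", "#music", "#news", "#sports"]
shards = count_on_shards(stream)
totals = merge(shards)
print(totals.most_common(2))
```

Because a given key always hashes to the same shard, no two shards ever count the same item, so the merge step is a plain sum.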
Age-old problems and futuristic solutions
Goel says they have had some early successes, and the group expects to publish high-impact results in the form of new models and algorithms within the project’s first or second year.
Some of their algorithms and programs will be passed directly to DARPA to be used in a “security context,” but the team is also tackling long-standing theoretical problems in computer science. One such problem is the notoriously difficult “travelling salesman” scenario studied by Amin Saberi and his students: if a salesman has a list of cities to visit, and he must visit each one exactly once before returning to where he started, how can we calculate the shortest possible route?
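To make the question concrete, the route can be found for a handful of cities by simply trying every ordering. This is a brute-force illustration only, not the approach studied by Saberi's group: the city names and coordinates are made up, and the running time grows factorially with the number of cities, which is exactly why the problem is considered hard.

```python
from itertools import permutations
from math import dist

def shortest_tour(cities):
    """Try every ordering of the cities, starting and ending at the first one,
    and return the shortest round trip as (order, length)."""
    names = list(cities)
    start, rest = names[0], names[1:]
    best_order, best_length = None, float("inf")
    for perm in permutations(rest):
        order = (start, *perm, start)
        length = sum(dist(cities[a], cities[b]) for a, b in zip(order, order[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return best_order, best_length

# Hypothetical cities placed at the corners of a 4-by-3 rectangle.
cities = {"A": (0, 0), "B": (0, 3), "C": (4, 3), "D": (4, 0)}
order, length = shortest_tour(cities)
print(" -> ".join(order), "length", length)
```

With four cities there are only six orderings to check; with twenty there are more than 10^17, so practical methods rely on approximation or heuristics instead.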
This problem may seem unrelated to the world of social media, but at heart, it deals with a network of access points – like mobile phones or computers on the Internet – combined with an algorithm for calculating the shortest path among them. Goel said it is important to keep making progress on these kinds of classical problems. Even when they don’t have an immediate, real-world application, he said, they advance our understanding of computer science as a discipline.
The team also plans to explore the connection between human behavior – the things we enjoy and choose to share in our social networks, or what we’re looking for when we search online – and algorithms that help shape our online experience, like friend recommendations or search engine results.
MEGA’s algorithms might, for example, lead to a search engine that takes into account not only keywords a user is typing in, but also that user’s social connections and what’s trending online at that moment. This system would essentially construct a brand new, highly personal search engine for each and every search, he said.
A tight network
Helping things along, the MEGA team enjoys close ties to networking companies including Facebook, Twitter and Cisco. This means that their work may someday be used to drive new features on popular social media sites. “It happens only occasionally that you can design an abstract system that actually affects society and the economy on such a large scale,” Goel said.
The project likewise unites a diverse group of experts. Goel’s expertise lies in algorithm design, and he is responsible for several of Twitter’s algorithmic products. Two other Management Science and Engineering professors, Amin Saberi and Ramesh Johari, will also contribute their algorithmic and modeling knowledge. Andrea Montanari, an associate professor of electrical engineering and statistics, will be the team’s statistician and information theorist, while Associate Professor of Computer Science Jure Leskovec brings expertise in data mining and modeling. Economics professor Matthew Jackson has been collecting data from villages in India, which he hopes to compare to online networks like Facebook and Twitter. Also involved in the research is John Heidemann of USC’s Information Sciences Institute.
“We were all having a lot of success in our individual research,” Goel said, “but the DARPA grant allows us to work together to understand how social networks operate.”
MUSE Envisions Mining “Big Code” to Improve Software Reliability and Construction
During the past decade, information technologies have driven the productivity gains essential to U.S. economic competitiveness, and computing systems now control significant elements of critical national infrastructure. As a result, tremendous resources are devoted to ensuring that programs are correct, especially at scale. Unfortunately, in spite of developers’ best efforts, software errors are at the root of most execution errors and security vulnerabilities.
To help improve this state, DARPA has created the Mining and Understanding Software Enclaves (MUSE) program. MUSE seeks to make significant advances in the way software is built, debugged, verified, maintained and understood. The “collective knowledge” gleaned from MUSE’s efforts would facilitate new mechanisms for dramatically improving software correctness, and help develop radically different approaches for automatically constructing and repairing complex software.
“Our goal is to apply the principles of big data analytics to identify and understand deep commonalities among the constantly evolving corpus of software drawn from the hundreds of billions of lines of open source code available today,” said Suresh Jagannathan, DARPA program manager. “We’re aiming to treat programs—more precisely, facts about programs—as data, discovering new relationships (enclaves) among this ‘big code’ to build better, more robust software.”
Central to MUSE’s approach is the creation of a community infrastructure that would incorporate a continuously operational specification-mining engine. This engine would leverage deep program analyses and foundational ideas underlying big data analytics to populate and refine a database containing inferences about salient properties, behaviors and vulnerabilities of the program components in the corpus. If successful, MUSE could provide numerous capabilities that have so far remained elusive.
The Special Notice for MUSE is available at http://go.usa.gov/BwgG. The Broad Agency Announcement (BAA) for MUSE is available at http://go.usa.gov/BuR5. To familiarize potential participants with the technical objectives of MUSE, DARPA held a Proposers’ Day on Friday, March 7, 2014, at DARPA’s offices in Arlington, Va. For details, visit www.sa-meetings.com/MUSE. For more information, please email MUSE@darpa.mil.