Pattern of Life Calculation from Big Graphs
Navy SBIR 2014.1 - Topic N141-075
ONR - Ms. Lore Anne Ponirakis
Opens: Dec 20, 2013 - Closes: Jan 22, 2014
N141-075 TITLE: Pattern of Life Calculation from Big Graphs
TECHNOLOGY AREAS: Information Systems, Human Systems
OBJECTIVE: The research objective is to mature big graph data analytics to model patterns of life and automatically detect changes indicative of anomalous events. A capability that scales linearly with graph complexity is needed.
DESCRIPTION: Sailors and Marines are responsible for conducting missions such as assaults, embassy protection, non-combatant evacuation, and disaster relief. Mission planning involves understanding normal patterns of life (POL) so that anomalies can be recognized. For example, observed changes in imagery over time have been used to derive POL for areas of interest. This type of data can be expressed as a vector of terms, so that expected values and estimates of change can be calculated as Euclidean distances from mean values. Relevant POL modeling, however, needs to consider far richer data sets that include information on the content and frequency of interactions between nodes. Relevant data sets must also include open source (e.g., social media), cyber (transactional), and messaging (theme and concept spread) data, in addition to the relationships among movement, people, and places. These data, even when bounded in time and place, result in very large graphs.
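As an illustration of the vector-based approach described above, the following minimal sketch computes a mean baseline from past observations and scores new observations by their Euclidean distance from it. The feature names and values are hypothetical and chosen only for illustration; they are not taken from any program data.

```python
import math

def mean_vector(history):
    """Component-wise mean of equally sized feature vectors."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]

def euclidean_distance(a, b):
    """Euclidean distance between two vectors of the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical daily features for one area of interest:
# (vehicles counted, new structures, lit windows at night)
history = [
    [10.0, 0.0, 4.0],
    [12.0, 0.0, 6.0],
    [11.0, 0.0, 5.0],
]
baseline = mean_vector(history)

normal_day = [11.0, 0.0, 5.0]
unusual_day = [14.0, 4.0, 5.0]

d_normal = euclidean_distance(normal_day, baseline)    # 0.0
d_unusual = euclidean_distance(unusual_day, baseline)  # 5.0
```

Each distance costs one pass over the vector, which is the linear scaling that graph analytics do not offer by default.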
A new capability to enable POL calculation, based on large graph theory, is needed. While Euclidean distance calculation scales linearly with the number of nodes, graph analytics do not. Spurred by social networking data, approximate big graph analytics are maturing toward linear scaling with the number of nodes. These developments make it possible to answer questions such as whom a person may know or what a person's preferences are. POL calculations must consider more diverse data sets and richer graph representations, calling for even more efficient distance calculations.
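One simple way to picture a distance calculation between two graph snapshots is a weighted Jaccard distance over their edges, sketched below. This is an illustrative choice, not a method prescribed by this topic. The representation (a dict from edge to interaction count) and the example node names are assumptions.

```python
def edge_distance(g_old, g_new):
    """Weighted Jaccard distance between two edge-weight maps.

    Each graph is a dict mapping a (u, v) edge to an interaction count.
    The function makes one pass over the union of edges, so its cost is
    linear in the number of edges. It returns 0.0 for identical graphs
    and 1.0 for graphs that share no edges.
    """
    edges = set(g_old) | set(g_new)
    overlap = sum(min(g_old.get(e, 0), g_new.get(e, 0)) for e in edges)
    total = sum(max(g_old.get(e, 0), g_new.get(e, 0)) for e in edges)
    if total == 0:
        return 0.0  # both graphs are empty or carry only zero weights
    return 1.0 - overlap / total

# Hypothetical interaction graphs for three days.
monday = {("a", "b"): 4, ("b", "c"): 2}
tuesday = {("a", "b"): 4, ("b", "c"): 2}
friday = {("a", "b"): 1, ("c", "d"): 5}

same = edge_distance(monday, tuesday)    # 0.0, no change
changed = edge_distance(monday, friday)  # 10/11, large change
```

A measure of this kind captures only edge-level change; richer structural metrics would generally cost more than one pass over the edges.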
Network analysis provides a powerful means of studying structural connections. Of interest for this topic is POL for situation awareness. Military and intelligence operators typically rely on their own data sources and analysis to determine threat activities. Much of the data reported describes events that are unfolding or have already taken place; such data is important for response, situation assessment, and historical baselining. However, it does not provide the normal POL that could serve as an early indicator of change. This topic will explore analytics of nontraditional open source data, fused with conventional data, that can provide insight into anomalous activities.
Nontraditional data sources can provide insight into real-time activity. Examples include open source text, satellite imagery available through search engines, video feeds of vehicle traffic flow, video feeds from public areas available on the Internet, public utility patterns (electricity, water, etc.), weather station reports, and many more. Combining multiple diverse information sources into a unified graph that can be mapped to POL indicators presents challenges. For instance, how do we assess the quality of sources? How do we normalize data types? How do we form graph structures?
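The questions above can be made concrete with a small sketch that maps records from different sources onto common graph edges, weights each observation by an assumed source quality score, and keeps track of which sources contributed to each edge. The source names, record fields, and quality scores are all hypothetical; this shows one possible arrangement under those assumptions, not a recommended design.

```python
from collections import defaultdict

# Assumed per-source reliability scores in [0, 1]; illustrative only.
SOURCE_QUALITY = {"social_media": 0.4, "traffic_video": 0.8}

# Assumed mapping from each source type to the record fields that
# name the two endpoints of an edge.
ENDPOINT_FIELDS = {
    "social_media": ("author", "mentions"),
    "traffic_video": ("camera", "road"),
}

def normalize(record):
    """Map a source-specific record to a common (u, v, weight, source) edge."""
    kind = record["source"]
    if kind not in ENDPOINT_FIELDS:
        raise ValueError(f"unknown source: {kind}")
    u_field, v_field = ENDPOINT_FIELDS[kind]
    return record[u_field], record[v_field], SOURCE_QUALITY[kind], kind

def build_graph(records):
    """Accumulate quality-weighted observations per edge, keeping pedigree."""
    graph = defaultdict(lambda: {"weight": 0.0, "sources": set()})
    for record in records:
        u, v, weight, source = normalize(record)
        edge = graph[(u, v)]
        edge["weight"] += weight
        edge["sources"].add(source)  # remember where the edge came from
    return dict(graph)

records = [
    {"source": "social_media", "author": "user_17", "mentions": "market_square"},
    {"source": "traffic_video", "camera": "cam_3", "road": "market_square"},
    {"source": "social_media", "author": "user_17", "mentions": "market_square"},
]
graph = build_graph(records)
```

Here every observation contributes only its source quality score, which sidesteps the unit-mixing problem at the cost of discarding magnitudes such as vehicle counts.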
The technical challenges of this topic are as follows: 1) automating collection of data for big graph formation; 2) data enrichment and fusion; 3) construction and maintenance of a dynamic big graph representation; 4) calculation of relevant static and dynamic POL metrics from very large and diverse graphs; 5) setting filters for event detection relevant to user needs (i.e. location, time and tasking); 6) providing means to optimize data collections over time by monitoring and adjusting data sources (i.e. user data needs and haves); and 7) identifying analytic methods to scale processing. Practical system building needs to be considered as well as metrics to measure development success.
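Challenges 4 and 5 can be pictured with a minimal sketch: a dynamic POL metric (here, a hypothetical hourly interaction count per location) is compared against a trailing-window baseline, and alerts are filtered to the locations a user cares about. The window length, threshold, location names, and counts are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def detect_changes(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window baseline.

    series holds one POL metric value per time step. A value is flagged
    when it lies more than `threshold` standard deviations from the mean
    of the preceding `window` values.
    """
    alerts = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            deviates = series[i] != mu  # flat baseline: any change counts
        else:
            deviates = abs(series[i] - mu) / sigma > threshold
        if deviates:
            alerts.append(i)
    return alerts

def alerts_for_user(metrics_by_location, locations_of_interest, **kwargs):
    """Run change detection only for the locations the user has selected."""
    return {
        location: detect_changes(series, **kwargs)
        for location, series in metrics_by_location.items()
        if location in locations_of_interest
    }

# Hypothetical hourly interaction counts per location.
metrics = {
    "port": [20, 22, 21, 19, 20, 21, 60, 20],
    "market": [5, 6, 5, 6, 5, 6, 5, 6],
}
port_alerts = alerts_for_user(metrics, {"port"})  # {"port": [6]}
```

The spike at index 6 is flagged, while the hour after it is not, because the spike itself inflates the trailing baseline; a production system would likely need a more robust baseline.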
Creative solutions are desired. Public data gathering should be done on a "not to interfere" basis with providers and should comply with policies for use of the data acquired. The data used and the modeling methods should be relevant to a potential customer for product transition, such as a government agency, program of record, or commercial marketplace. Use of open standards is encouraged to reduce costs and improve system interoperability.
PHASE I: Develop processes and techniques to characterize the content of big graphs over time as it relates to patterns of life. The data behind the graphs should contain information extracted from diverse data sources. Key technical risks should be identified, as well as key technical parameters that measure progress against the risk areas. Results from analysis and concept feasibility tests should be documented in a technical report or in a paper at a selected conference. The final Phase I brief/demonstration should show risk reduction toward the development of a fully responsive Phase II product, as well as plans for the Phase I Option and Phase II.
PHASE II: Produce a prototype system, enabled by big graph analytics, that rapidly detects changes to features describing patterns of life in large dynamic graphs populated by multiple data types and that provides mission-relevant early threat indicators. The prototype system should be able to automatically process, display, and alert on activity discoveries relevant to the specific user location and mission interests. The system should support data acquisition, large graph data storage and analytics, and alert dissemination. It is desired that the context and pedigree of information be maintained for operator review. At this point the performer should focus on a proof of concept of the capability using data sources that are of interest to a transition program. It is possible that some data sources of interest may be classified secret, such as multi-intelligence data (IMINT, HUMINT, MASINT, ELINT).
PHASE III: Produce a system capable of deployment and operational evaluation. The system should address POL indicators that are of value to a transition program or commercial application. Machine-based processing steps and inferences about patterns of life should be accessible to the operator and presented in human-understandable form. The software and hardware should be modified to operate in accordance with guidelines provided by the transition sponsor.
PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: The capability specified by this topic is highly relevant to non-government organizations involved in disaster relief that need to track life disruptions and the return to normalcy over time.
KEYWORDS: Big Graph Analytics; Patterns of Life; Activity Detection; Dynamic Analysis; Change Detection; Analytics; Graphs; Scalability