Automatic Data Representation, Analysis, and Visualization
Navy SBIR 2010.2 - Topic N102-175
ONR - Mrs. Tracy Frost - [email protected]
Opens: May 19, 2010 - Closes: June 23, 2010

N102-175 TITLE: Automatic Data Representation, Analysis, and Visualization

TECHNOLOGY AREAS: Information Systems, Sensors, Battlespace

OBJECTIVE: Develop and implement techniques for fast and accurate automatic representation, analysis, and visualization of unstructured digital data of different modalities for a large class of Navy-related applications.

DESCRIPTION: Current technologies for data representation tend to fall into two categories. The first category aims at structuring data at the semantic level that provides a more contextual description of data content. Ontologies such as OWL belong to this category. While this view is closer to human cognition, the approach requires a great deal of human interaction and expertise to design a quality information system from unstructured data such as written reports, text documents, images, or general sensor data. It also demands information update and management that cannot be executed in near real-time to support DoD's time-sensitive missions. At the other extreme, one can tag an unstructured dataset with metadata. The metadata fields, however, are often prone to errors and, more importantly, ignore data content.

This solicitation seeks a new alternative to the two formalizations described briefly above. In particular, ONR is seeking a data-driven framework that can address the stated objective with mathematical or statistical rigor. For instance, a 3-D visual presentation must accurately preserve the information in a high-dimensional data set. By the same token, a fast numerical algorithm for data analysis and extraction of salient information must be supported by a theoretical study on its efficacy and efficiency with respect to the data size and errors.

A new approach that can handle different data modalities and establish overlapping or correlated information among them, if any exists, is of great interest to ONR. Recent developments in fast numerical methods for linear algebra and optimization, together with kernel-based, manifold-based, and graph-based methods for inductive learning and classification, are pushing the frontier of data-driven techniques in several application domains, with promising preliminary results. More rigorous fine-tuning of algorithms and testing of applications are needed to ascertain the viability of a proposed methodology in real-life scenarios.

For the purpose of algorithm development, the performers may use their own data sets provided that these data are in the realm of naval applications. The Navy also reserves the right to test the algorithms and their implementation with data sets of an appropriate type in order to assess performance and scalability and to estimate the computational resources required.

PHASE I: Develop algorithms that can automatically represent, analyze, and update unstructured data of different modalities, grounded in scientific rigor. Validate the proposed approach with experiments on various Navy-related data sets.

PHASE II: Develop algorithms to find, compare, and bridge potentially correlated information in different data sets based on the results established in Phase I. Develop a visualization method for high-dimensional data that is user-friendly and allows user-interactive tasks to refine the process of data representation and information discovery. Validate all tasks in an integrated system.

PHASE III: The end product should result in a dual-use technology that serves both military and commercial interests, such as automatic surveillance, biotechnology, and biomedicine.

PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: The following industries will benefit from the product developed under this SBIR topic: biotechnology, biomedicine.

REFERENCES:
[2] M. Belkin and P. Niyogi, Laplacian Eigenmaps for Dimensionality Reduction and Data Representation, Neural Computation, Vol. 15, No. 6, pp. 1373-1396, 2003.
[3] D. Donoho, I. U. Rahman, I. Drori, V. Stodden, and P. Schroder, Multiscale Representations for Manifold-valued Data, SIAM Multiscale Model. & Simul., Vol. 4, No. 4, pp. 1201-1232, 2005.
[4] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Multiscale methods, PNAS, Vol. 102, pp. 7432-7437, 2005.
[5] G. Wahba, Encoding Dissimilarity Data for Statistical Model Building, Preprint, 2009.
[6] K. Fukumizu, F. Bach, and M. I. Jordan, Kernel Dimension Reduction in Regression, Ann. Stat., Vol. 37, No. 4, pp. 1871-1905, 2009.

KEYWORDS: fast algorithms; multimodality data; information extraction; information processing; pattern discovery; automatic learning and classification