Acoustic Training Data Prioritization

Navy SBIR 24.1 - Topic N241-030
NAVSEA - Naval Sea Systems Command
Pre-release 11/29/23   Opens to accept proposals 1/03/24   Now Closes 2/21/24 12:00pm ET

N241-030 TITLE: Acoustic Training Data Prioritization

OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Trusted AI and Autonomy

The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.

OBJECTIVE: Develop a tool that assesses training data for artificial intelligence or machine learning (AI/ML) algorithms and prioritizes current or new data for effective, complete, and precise training.

DESCRIPTION: Systems that detect and track submarines are migrating to AI/ML to improve the probability of detecting submarines and to limit the probability of false alerts. The current paradigm for training AI/ML is to use large sets of data. However, training AI/ML on large amounts of data is costly and may not yield optimal results.

No commercial tool currently exists to assess how comprehensive a training set truly is, how much of the training data is effectively redundant, or whether some data over-represents unusual conditions. Nor is there a tool that would enable researchers to determine a priori whether a newly collected data set would add useful diversity to the existing training data. Because such assessment tools are lacking, all collected data is currently used for training. This can lead to excessive training costs and to over-training on specific data that may not represent the full range of conditions in which the system will function during hostile tactical operations.
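One way to frame the a priori question is sketched below in Python. It is illustrative only and not part of the topic requirements. It assumes each data set has already been reduced to a matrix of real-valued features (one row per sample); the function name `novelty_fraction` and the 95th-percentile threshold are hypothetical choices made for this example. The score is the fraction of new samples that lie farther from the existing set than the existing samples typically lie from each other.

```python
import numpy as np

def novelty_fraction(existing, candidate, quantile=0.95):
    """Fraction of candidate rows that are farther from the existing set
    than the existing set's own typical nearest-neighbor spacing.

    Assumption for this sketch: rows are samples, columns are real-valued
    features, and Euclidean distance on standardized features is meaningful.
    """
    existing = np.asarray(existing, dtype=float)
    candidate = np.asarray(candidate, dtype=float)

    # Standardize both sets using statistics of the existing set only.
    mean = existing.mean(axis=0)
    std = existing.std(axis=0)
    std[std == 0] = 1.0
    e = (existing - mean) / std
    c = (candidate - mean) / std

    # Nearest-neighbor distance inside the existing set (self excluded).
    d_ee = np.linalg.norm(e[:, None, :] - e[None, :, :], axis=2)
    np.fill_diagonal(d_ee, np.inf)
    threshold = np.quantile(d_ee.min(axis=1), quantile)

    # Distance from each candidate row to its nearest existing row.
    d_ce = np.linalg.norm(c[:, None, :] - e[None, :, :], axis=2).min(axis=1)
    return float(np.mean(d_ce > threshold))
```

A score near 0 suggests the new collection largely repeats conditions already present; a score near 1 suggests it covers conditions the existing set does not. The pairwise distance matrix grows with the square of the set size, so a real tool would need an approximate nearest-neighbor method for large collections.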

The Navy seeks a tool for analysis of acoustic data collected by undersea warfare systems to enable selection of data that is diverse, representative, and as small as practical for training of AI/ML algorithms.
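As a minimal sketch of what "diverse and as small as practical" could mean computationally, the Python example below applies greedy k-center (farthest-point) selection to a feature matrix. This is one well-known coreset heuristic, named here only for illustration; the topic does not prescribe any selection method, and the function name `k_center_greedy` is hypothetical.

```python
import numpy as np

def k_center_greedy(features, k, start=0):
    """Return indices of k rows chosen by farthest-point (k-center) selection.

    Each step adds the row that is farthest from everything selected so far,
    so near-duplicate rows are picked last.
    """
    x = np.asarray(features, dtype=float)
    n = x.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of rows")

    selected = [start]
    # Distance from every row to its closest already-selected row.
    min_dist = np.linalg.norm(x - x[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return selected

# Three tight pairs of points; a budget of three picks one point from each pair.
points = np.array([[0, 0], [0.1, 0], [10, 0], [10.1, 0], [0, 10], [0, 10.1]])
chosen = k_center_greedy(points, k=3)
```

Farthest-point selection favors coverage and tends to pull in outliers, so it addresses diversity more directly than representativeness. A practical tool would likely need to balance the two, for example by weighting against the over-represented unusual conditions mentioned above.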

Acoustic data used for detection of submarines is collected on arrays of transducers, whether towed line receive arrays such as the Multi-Function Towed Array, or hull-mounted source/receiver arrays such as the 576-element AN/SQS-53C hull-mounted sonar array. The signals from the transducers are formed into beams representing the acoustic environment as a function of bearing at any given point in time. Key characteristics of data sets will include both meta-data (e.g., season, latitude and longitude, time of day) and attributes of the data (e.g., volume reverberation levels, numbers of "clusters" associated with reflectors such as bathymetric features, marine entities, surface ships, submarines, and wakes).
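For readers unfamiliar with beamforming, the step of forming transducer signals into beams can be illustrated with a conventional (delay-and-sum) narrowband beamformer. The Python sketch below makes strong simplifying assumptions that are not taken from the topic: an idealized uniform line array, a single frequency, plane-wave arrivals, and a nominal sound speed of 1500 m/s. It does not represent the processing in any fielded system, and all function and parameter names are hypothetical.

```python
import numpy as np

def steering_vector(n_elements, spacing_m, freq_hz, bearing_deg, sound_speed=1500.0):
    """Narrowband plane-wave phase across an idealized uniform line array.
    Bearing is measured from broadside (perpendicular to the array)."""
    positions = np.arange(n_elements) * spacing_m
    delay = positions * np.sin(np.deg2rad(bearing_deg)) / sound_speed
    return np.exp(-2j * np.pi * freq_hz * delay)

def beam_power(snapshots, spacing_m, freq_hz, bearings_deg, sound_speed=1500.0):
    """Conventional beam power for each steering bearing.

    snapshots: complex array of shape (n_elements, n_snapshots), one
    narrowband sample per element per snapshot.
    """
    n_elements = snapshots.shape[0]
    power = []
    for bearing in bearings_deg:
        w = steering_vector(n_elements, spacing_m, freq_hz, bearing, sound_speed)
        # Phase-align the elements for this bearing, then average them.
        beam = (w.conj() @ snapshots) / n_elements
        power.append(np.mean(np.abs(beam) ** 2))
    return np.array(power)
```

Evaluating `beam_power` over a grid of bearings gives the "acoustic environment as a function of bearing" for one block of time; a simulated plane wave arriving from a given bearing produces a peak at that bearing.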

The tool will need to demonstrate training data prioritization technology that reduces the amount of training data used while allowing the AI/ML algorithm(s) to maintain or improve performance. System performance is measured by the Receiver Operating Characteristic (ROC) curve: recorded data is run through the system to determine the number of true positives achieved as a function of the number of false positives.
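The ROC measurement described above can be sketched in a few lines of Python. The example assumes each recorded sample has a detector score and a truth label (1 for target, 0 for non-target) and that scores are not tied; the function names are hypothetical. Counts are normalized to rates so that curves from data sets of different sizes can be compared.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) arrays obtained by
    sweeping the detection threshold from the highest score to the lowest.
    Assumes both classes are present and scores are not tied."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    # Cumulative counts of true and false positives as the threshold drops.
    tp = np.concatenate(([0], np.cumsum(labels)))
    fp = np.concatenate(([0], np.cumsum(~labels)))
    return fp / fp[-1], tp / tp[-1]

def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Two targets and two non-targets; one non-target outscores one target.
fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
auc = area_under_curve(fpr, tpr)  # 0.75
```

The area under the curve is only one summary of an ROC curve. For a detection system, the true positive rate at a fixed, low false positive rate may be the more relevant comparison.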

Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. owned and operated with no foreign influence as defined by 32 U.S.C. § 2004.20 et seq., National Industrial Security Program Executive Agent and Operating Manual, unless acceptable mitigating procedures can and have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA) formerly Defense Security Service (DSS). The selected contractor must be able to acquire and maintain a secret level facility and Personnel Security Clearances. This will allow contractor personnel to perform on advanced phases of this project as set forth by DCSA and NAVSEA in order to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material during the advanced phases of this contract IAW the National Industrial Security Program Operating Manual (NISPOM), which can be found at Title 32, Part 2004.20 of the Code of Federal Regulations. Reference: National Industrial Security Program Executive Agent and Operating Manual (NISP), 32 U.S.C. § 2004.20 et seq. (1993). https://www.ecfr.gov/current/title-32/subtitle-B/chapter-XX/part-2004

PHASE I: Develop a concept for an AI/ML training data prioritization tool that meets the requirements in the Description and demonstrate the feasibility of that concept using unclassified data obtained or created by the company. If the unclassified data is not acoustic data, then it must be clearly extensible to the acoustic data use case. Feasibility will be demonstrated through analysis and modeling. Demonstrate the ROC curve associated with training on all data and how the ROC curve is maintained or even improved when AI/ML is trained using the prioritized subset of all data.
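A Phase I feasibility comparison of this kind could be structured along the lines of the Python sketch below. It is a hedged illustration, not a required method: the detector is a deliberately simple stand-in (projection onto the difference of class means), the data are assumed to be real-valued feature rows with 0/1 truth labels, and every function name is hypothetical. The point is the shape of the experiment: train once on all data, train once on the prioritized subset, and score both on the same held-out data.

```python
import numpy as np

def fit_mean_difference(x, y):
    """Stand-in detector: score = projection onto (target mean - non-target mean)."""
    w = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return lambda z: z @ w

def rank_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a target
    outscores a non-target (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

def compare_full_vs_subset(train_x, train_y, subset_idx, test_x, test_y):
    """Return (AUC trained on all data, AUC trained on the subset),
    both evaluated on the same held-out test data."""
    full = fit_mean_difference(train_x, train_y)
    sub = fit_mean_difference(train_x[subset_idx], train_y[subset_idx])
    return rank_auc(full(test_x), test_y), rank_auc(sub(test_x), test_y)

# Synthetic stand-in data: 20 real-valued features, targets shifted by +1.
rng = np.random.default_rng(0)
train_y = np.arange(2000) % 2
train_x = rng.normal(size=(2000, 20)) + train_y[:, None]
test_y = np.arange(1000) % 2
test_x = rng.normal(size=(1000, 20)) + test_y[:, None]

# Placeholder for a prioritization tool's output: about 5% of the training rows.
subset_idx = np.arange(0, 2000, 21)
auc_full, auc_subset = compare_full_vs_subset(train_x, train_y, subset_idx, test_x, test_y)
```

In this synthetic case the subset is an evenly spaced sample rather than the output of a real prioritization method, so it only shows how the two results would be compared.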

The Phase I Option, if exercised, will include the initial design specifications and capabilities description to build a prototype solution in Phase II.

PHASE II: Design, develop, and deliver a prototype AI/ML training data prioritization tool for testing and evaluation based on the results of Phase I. Demonstrate that the prototype meets the requirements in the Description. The government will provide data sets used to train current AI/ML algorithms that are used in the AN/SQQ-89A(V)15 sonar system, and a MATLAB implementation of at least one such algorithm.

It is probable that the work under this effort will be classified under Phase II (see Description section for details).

PHASE III DUAL USE APPLICATIONS: Support the Navy in transitioning the technology to Navy use. The Navy will establish a contract vehicle to apply the training data prioritization technology to AN/SQQ-89A(V)15 in support of additional AI/ML algorithm development opportunities, not limited to Undersea Warfare systems.

Given the emerging importance of AI/ML in numerous major industry sectors, this technology can be used in many training areas. Science and engineering professions would do well to incorporate the technology into their training centers because their data changes constantly.

REFERENCES:

  1. "AN/SQQ-89(V) Undersea Warfare / Anti-Submarine Warfare Combat System, updated 20 Sep 2021." https://www.navy.mil/Resources/Fact-Files/Display-FactFiles/Article/2166784/ansqq-89v-undersea-warfare-anti-submarine-warfare-combat-system/
  2. "The Essential Guide to Quality Training Data for Machine Learning: What You Need to Know About Data Quality and Training the Machine." Cloudfactory. https://www.cloudfactory.com/training-data-guide
  3. Rim of the Pacific (RIMPAC) international maritime exercise website, available 6 Apr 2023 at https://www.cpf.navy.mil/RIMPAC/

KEYWORDS: Artificial intelligence or machine learning (AI/ML); training data for AI/ML algorithms; acoustic data; undersea warfare systems; data that is diverse and representative; Multi-Function Towed Array; AN/SQS-53C hull-mounted sonar


** TOPIC NOTICE **

The Navy Topic above is an "unofficial" copy from the Navy Topics in the DoD 24.1 SBIR BAA. Please see the official DoD Topic website at www.defensesbirsttr.mil/SBIR-STTR/Opportunities/#announcements for any updates.

The DoD issued its Navy 24.1 SBIR Topics pre-release on November 28, 2023. The BAA opens to receive proposals on January 3, 2024, and now closes February 21, 2024 (12:00pm ET).

Direct Contact with Topic Authors: During the pre-release period (November 28, 2023 through January 2, 2024) proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. Once DoD begins accepting proposals on January 3, 2024 no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the Pre-release period.

SITIS Q&A System: After the pre-release period, until January 24, 2024, at 12:00 PM ET, proposers may submit written questions through SITIS (SBIR/STTR Interactive Topic Information System) at www.dodsbirsttr.mil/topics-app/ by logging in and following instructions. In SITIS, the questioner and respondent remain anonymous but all questions and answers are posted for general viewing.

Topics Search Engine: Visit the DoD Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoD Components participating in this BAA.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk via email at [email protected]

Topic Q & A

1/12/24  Q. Is the "training data" to be prioritized always labeled - or might it be "raw" (i.e., right from the sensor) or a mix of the two?
   A. The "training data" would initially be clusters with the ~20 features associated with each cluster. At this time we go through a "truthing" step so it is known if a cluster is believed to be associated with a submarine or not. If there is a technology that meets the topic requirement with processed and truthed data but is also extensible to pre-truthed or even pre-processed data, that would be of interest.
   Since the acoustic data collected by the AN/SQQ-89 is secret, it will not be available for use for Phase I. We encourage small businesses to propose an analogous data set they can collect or generate or obtain that allows them to demonstrate technology that is extensible to our active sonar problem.
1/10/24  Q. Are all of the features in your data real numbers or are there categorical/ordinal features as well?
   A. As shared with those who talked with authors, there are approximately twenty features for each detected cluster representing various attributes such as signal to noise ratio, perceived bearing, perceived range, and perceived reflector extent. None of the various features are categorical or ordinal.
