N241-055 TITLE: Generative Text Engine for Form Completion
OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Advanced Computing and Software; Human-Machine Interfaces; Trusted AI and Autonomy
OBJECTIVE: This SBIR topic is soliciting tools and techniques to facilitate generating semi-structured text reports with free-form text. There is a research interest in exploring the application of generative text artificial intelligence (AI) (such as ChatGPT, GPT-3/4, etc.) to facilitate the filling in of text-based data collection forms; however, other tools and approaches will be considered if it is explained how they would contribute to the requested capability. The data generated by this general-purpose form completion engine will lead to reduced data curation for subsequent analytics. The desired solution will:
(1) generate a general-purpose curation/creation text engine that facilitates completing a variety of text-based forms.
(2) describe a mechanism for incorporating technical terminology & phrasing appropriate for specific usage domains (potentially including sensitive or classified terminology and phrases) along with a general baseline generative text engine.
(3) be designed to be useful with minimal compute, and without immediate or sustained connection to cloud-based processing resources. Cloud-based, processing-intensive resources may be used in developing the general-purpose engine and achieving threshold performance, but the proposal must describe how the initial capability will be refined to be useful with a minimal compute and storage footprint. Further, the proposal must state the size and processing capabilities that shall be required to achieve threshold and objective (final) performance in the desired system.
(4) describe any key technologies being used in creating the capability, and clearly characterize the data usage rights associated with those capabilities.
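For item (3), a first-order estimate of the storage footprint of a model's weights is the parameter count multiplied by the bits stored per weight. The sketch below is only that arithmetic. The function names, the 7-billion-parameter example, and the 50% headroom default are illustrative assumptions rather than values from this topic, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(n_params: int, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(n_params: int, bits_per_weight: int, ram_gb: float,
                headroom: float = 0.5) -> bool:
    """True if the weights fit in the share of RAM left after headroom.

    headroom is the fraction of RAM reserved for the operating system and
    everything else; 0.5 is an arbitrary illustrative default.
    """
    usable_gb = ram_gb * (1 - headroom)
    return weight_memory_gb(n_params, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at three weight precisions.
    for bits in (16, 8, 4):
        print(bits, "bits per weight:", weight_memory_gb(7_000_000_000, bits), "GB")
```

Under these assumptions, a 7-billion-parameter model needs about 14 GB for weights at 16 bits per weight and about 3.5 GB at 4 bits, which is the kind of sizing statement the proposal is asked to provide for threshold and objective performance.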
The concept being proposed in this SBIR topic shall demonstrate the use of generative text algorithms to curate the text entries as they are being created. The desired solutions should:
(1) focus on a workflow / process for a prompting dialog between the generative text engine and the user vice developing large language models. It is expected that some tuning of large language models may be required to address a specific technical domain, but that should be as constrained as possible to focus on the process whereby users interact with the models to facilitate form completion.
(2) be easily adapted for incorporating technical jargon and domain-specific phrases for different usage domains. The technique(s) for incorporating specialized technical language into the application must be described.
(3) address anticipated prompt tuning techniques to adapt to specific technical domains enabling techniques for one-shot or few-shot learning.
(4) generate appropriate phrases/descriptions (an understanding of what is being described) in different task domains that are correctly structured and generate consistent and appropriate technical descriptions.
(5) be scalable for use from PCs/Tablets/Phones with limited connectivity to a local server and be cloud-connected, not cloud-dependent.
(6) provide for the use of instructions + answers as a sustainable workflow for maintaining / utilizing the authoring / curation engine.
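Items (1) through (3) and (6) above can be pictured as assembling a prompt from instructions, a domain glossary, and a few worked examples. The sketch below is illustrative only: `FewShotExample`, `build_prompt`, and the sample glossary entry are hypothetical names and data, not part of this topic, and the result is just a string that would be passed to whichever model a proposer selects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FewShotExample:
    """A worked example: the user's rough note and the curated form text."""
    raw_note: str
    curated_text: str


def build_prompt(field_name, user_note, glossary, examples):
    """Assemble a few-shot prompt string for one free-text form field.

    glossary maps an informal phrase to the preferred domain term.
    examples is a list of FewShotExample (one entry gives one-shot prompting).
    """
    lines = [f"Rewrite the technician's note for the form field '{field_name}'."]
    if glossary:
        lines.append("Use these preferred terms:")
        for informal, preferred in sorted(glossary.items()):
            lines.append(f"- say '{preferred}' instead of '{informal}'")
    # Each worked example is shown as a note followed by its curated answer.
    for ex in examples:
        lines.append(f"Note: {ex.raw_note}")
        lines.append(f"Form text: {ex.curated_text}")
    # The new note goes last, leaving the answer for the model to complete.
    lines.append(f"Note: {user_note}")
    lines.append("Form text:")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = FewShotExample("pump wont start", "Fire pump No. 2 fails to start.")
    print(build_prompt("Block 35", "ckt bkr keeps tripping",
                       {"ckt bkr": "circuit breaker"}, [demo]))
```

Keeping the glossary and examples as plain data, separate from the model, is one way to swap technical domains without retraining; whether that is sufficient for a given domain is an open question the proposal would need to address.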
DESCRIPTION: This effort is aimed at enabling the creation of text-based forms with consistent terminology and phrasing by applying generative text artificial intelligence (AI) technology during the authoring of form content. The desired technology will assist content creators by offering interactive curation during the content authoring. The application of the developed technology will result in more consistent form content that is amenable to automated analytics on the generated text and will therefore accelerate and improve accuracy of ship maintenance reporting.
New advances in integrating Large Language Models (LLMs) in application pipelines have demonstrated the potential to support a wide range of technical reporting domains; however, there are significant challenges in generating text with relevant content and terminology when completing maintenance reports. While LLMs show impressive performance in general knowledge and reasoning capabilities, they have inherent limitations and lack capabilities required for broad language understanding and use in the real world (e.g., specialized or proprietary knowledge of terms, facts and concepts). Fine tuning, parameterizing, and combining LLMs with external tools should produce capabilities that enable LLMs to be more useful in real world settings, such as that of facilitating completing form-based descriptions of technical problems and their impacts. The desired applications will provide customized content to support maintenance reporting workflows and answer technical questions across a variety of maintenance reporting use cases.
PHASE I: Conduct research in open source LLMs with commercially permissive licenses (e.g., Apache 2.0, MIT) to identify, select, and track appropriate models that have the potential to perform well for the Navy domain and desired downstream tasks. Selected models must be usable in both research and commercial settings. The solution will need to work on resource constrained devices (e.g., tablets, laptops), which may be disconnected from the Internet and cloud-based resources during form authoring. To improve the performance of models in deployment environments, different techniques (e.g., distillation, supervised fine-tuning, parameterization) should be identified, explored, and evaluated to ensure correct information is generated for the defined downstream tasks. Define the task and data sources that will be used to act as a suitable proxy for ship maintenance reporting, which involves consistently generating the text necessary to fill in ship maintenance forms. The longer-term technical objective is a general-purpose form-completion engine that can be readily adapted to various technical domains and terminologies and utilize alternative technical jargon and phraseology. The selected LLM and a systems-based approach will minimize model behaviors that generate incorrect content for the selected domains and defined tasks. It is assumed that the task being performed will require new knowledge that was not part of the pre-training data of a general large language model. Successful approaches will securely combine new private data into the workflow and customize the LLM for a target domain and authoring task. Phase I should result in proof of concept demonstrations of key capabilities so as to show how a prototype tool will be built and demonstrated during Phase II. The primary metrics for Phase I success will be the quality of proposed workflows for user interaction and a demonstrated use case showing how forms would be completed using a representative large language model.
PHASE II: Build on the tools and results of Phase I to create a viable prototype tool for form completion. Utilize real-world form completion tasks. Ideally the problems and real-world data sources would relate to Navy ship maintenance reporting and ship material readiness, although use cases for other transition customers would be acceptable. A prototype tool will be built and tested to demonstrate a proof of concept involving a user interacting with the system to produce a complete and accurate report. The Ship's Maintenance Action Form (OPNAV 4790 or two-kilo) is an example of a primary maintenance data system (MDS) form that would be of interest; it is used to report both deferred and completed maintenance actions. The mission-degrading casualty report (CASREP) is another example; it is used to report to the operational commander an equipment degradation that impacts mission readiness. Automated tools will (1) generate text and fill in these semi-structured forms with free-form text fields, (2) reduce data curation requirements, and (3) enable analytics on the curated data.
For Naval applications, the contractor will need to be able to process Controlled Unclassified Information (CUI) and/or classified data sources up to the Secret level. The government team will provide contractor access to historical reports to support development and evaluation of the proposed techniques, automated tools, and analytics (e.g., text generators, classifiers). The historical text was often written inconsistently, making it challenging to automate analytics across this data. Address inconsistencies and unique language in the various text reporting workflows and describe how the proposed capabilities will support generation of high-quality data for reporting. Describe and demonstrate analytics/metrics on the text data generated to assess the quality of the text being generated. Assess how the tool will run on resource constrained hardware (e.g., tablets, laptops) with reasonable compute capabilities and document its ability to run on-line and off-line (i.e., that the developed technology would be suitable for shipboard/at sea use with limited access to cloud/remote computing capabilities). The tool will provide a tailorable vocabulary database suitable for use across different technical reporting domains (e.g., electrical systems, distillation systems, turbine mechanics, etc.). The workflow and user interface will be fully described and demonstrated as appropriate. The workflow shall be demonstrably easy to use and will demonstrate valid, predictive results. Technical evaluations, capability demonstrations, and metrics will focus on the quality of the human-machine interaction (HMI), the completeness and correctness of reports, and the generalizability of the approach across technical reporting domains, all of which shall be addressed at the completion of Phase II.
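As one possible reading of the "tailorable vocabulary database" and the text-quality metrics described above, the sketch below maps variant phrases to a canonical term per domain and scores how consistently a piece of text already uses the canonical terms. The vocabulary entries, the function names, and the scoring rule are all hypothetical illustrations, not requirements of this topic, and the approach assumes that no variant phrase is contained inside its own canonical term.

```python
import re

# Hypothetical per-domain vocabulary: variant phrase -> canonical term.
VOCAB = {
    "electrical": {"ckt bkr": "circuit breaker", "xfmr": "transformer"},
    "turbine": {"lube oil": "lubricating oil"},
}


def _pattern(phrase):
    """Whole-phrase, case-insensitive matcher for one vocabulary entry."""
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def normalize(text, domain, vocab=VOCAB):
    """Replace known variant phrases with the domain's canonical terms."""
    for variant, canonical in vocab.get(domain, {}).items():
        text = _pattern(variant).sub(canonical, text)
    return text


def consistency_score(text, domain, vocab=VOCAB):
    """Share of vocabulary hits that already use the canonical term.

    Returns 1.0 when the text contains no vocabulary terms at all.
    """
    variant_hits = canonical_hits = 0
    for variant, canonical in vocab.get(domain, {}).items():
        variant_hits += len(_pattern(variant).findall(text))
        canonical_hits += len(_pattern(canonical).findall(text))
    total = variant_hits + canonical_hits
    return 1.0 if total == 0 else canonical_hits / total


if __name__ == "__main__":
    raw = "circuit breaker tripped; xfmr hot"
    print(consistency_score(raw, "electrical"))
    print(normalize(raw, "electrical"))
```

A lookup table like this is small enough to ship to a disconnected device and can be edited per domain without touching the model, which is why it is shown here instead of a learned approach; it does nothing for terms that are not already in the table.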
PHASE III DUAL USE APPLICATIONS: Integrate and transition the developed tools for support of the NAVSEA SEA21 Ship Maintenance Data Improvement Initiative (SMDII) Program of Record (POR) to support automated text processing requirements for Navy ship maintenance reporting and ship material readiness. The tools being developed are expected to be applicable to a broad range of form completion applications, including for medical, maintenance, and other domains reliant on text-based data entry.
REFERENCES:
KEYWORDS: Automated Text Curation, Large Language Models (LLM), 2-Kilos, CASREP, Casualty Report, Form authoring, Artificial Intelligence, AI
** TOPIC NOTICE **
The Navy Topic above is an "unofficial" copy from the Navy Topics in the DoD 24.1 SBIR BAA. Please see the official DoD Topic website at www.defensesbirsttr.mil/SBIR-STTR/Opportunities/#announcements for any updates. The DoD issued its Navy 24.1 SBIR Topics pre-release on November 28, 2023 which opens to receive proposals on January 3, 2024, and now closes February 21, (12:00pm ET). Direct Contact with Topic Authors: During the pre-release period (November 28, 2023 through January 2, 2024) proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. Once DoD begins accepting proposals on January 3, 2024 no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the Pre-release period. SITIS Q&A System: After the pre-release period, until January 24, 2024, at 12:00 PM ET, proposers may submit written questions through SITIS (SBIR/STTR Interactive Topic Information System) at www.dodsbirsttr.mil/topics-app/ by logging in and following instructions. In SITIS, the questioner and respondent remain anonymous but all questions and answers are posted for general viewing. Topics Search Engine: Visit the DoD Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoD Components participating in this BAA.
1/21/24 | Q. | If possible, can you share which systems (technical domains) generate the highest number of 2-kilos reports or tickets? |
A. | No, we do not have that information. 2-Kilos can be about any system on a ship, and are supposed to be generated anytime there is a technical issue. | |
1/17/24 | Q. | Is the prototype proof of concept demonstration required as part of the Phase I base period? Or will a demonstration during the Phase I option be acceptable so long as it falls within the total period of performance? |
A. | Prototype fabrication is not essential for Phase I, although fabricating key components will strengthen the case for a Phase II. | |
1/3/24 | Q. | 1. Can you describe the largest / most important business problem you are trying to solve for this effort for Phase I?
2. What data sources / databases are available for the Phase I effort? Is there any current data shared between systems/users?
3. Is there an existing solution that automates or facilitates the completion of text based forms? If yes, can you provide the name of the solution and if it is Government owned / operated?
4. Do you have an existing CONOPS which will eventually integrate this problem into the DON enterprise?
5. Who are the key end users for this requirement? For example, is it for maintainers or ISEAs afloat? For ashore? Is the plan to have it available to all NAVSEA users like CMAS?
6. What is the minimum compute and storage footprint you are looking to achieve in this effort? The max?
7. What are the existing maintenance workflows supported under this effort?
8. Please clarify the following requirement from the SOW under the Phase I section: Define the task and data sources that will be used to act as a suitable proxy for ship maintenance reporting. Is the contractor required to identify the data sources as part of the proposal? Will there be access to SMEs to understand current sources, workflows and tasks?
9. What data rights is the Government anticipating for this effort?
10. Will the prototype solution be deployed in a Government environment? Or is there a requirement to describe how the solution would be built on NIPRNET? |
A. | 1. The overall intent of the topic is to develop a general-purpose form completion engine for forms that require open text answers. The Navy specifically has a need for better quality 2-Kilo reports. It is asserted in this topic that the use of generative AI will allow end users to generate better 2-Kilo reports (as stated in the topic). However, the proposal may address completion of other text-based forms for other applications.
2. Perhaps. It depends on who you target as a transition customer, what their needs are, and what they have for data. If you target 2-Kilos, NAVSEA SEA21D is prepared to work with Phase 1 awardees to get a sampling of completed 2-Kilo reports that could serve as story templates for the training of generative AI.
3. No.
4. 2-Kilos are effectively trouble tickets. They are written reports with some structure. They can describe anything that is broken or not working on a ship, so they cover a broad range of technical domains (communications, networking IT, HVAC, mechanical systems, plumbing, electrical, etc.). They are currently generated on an ad hoc basis (at the end of a shift or as the technician has time) by technicians who are filling in the 2-Kilo form using a web interface. The interface involves some drop-down entries, but largely requires the users to type in text that describes what the problem is, what the system is, where it is located, symptoms, what they tried to do to fix it, what the status is when they are done, if parts are needed or follow-up action is required, etc. Block 35 is effectively free form text where the technician needs to tell a story about what the problem is and what its repair status is. 2-Kilo data is used for a variety of purposes by ship maintenance and repair entities, including shipyards scheduling and planning for maintenance. Documentation describing 2-Kilos, their purpose, and how they should be filled out is available on the web and is cited in the topic.
5. Yes. 2-Kilos are used for many purposes across the Fleet for ship maintenance and repair, and used by operational commands in reporting unit readiness. The SEA21 Ship Maintenance Data Improvement Initiative (SMDII) program of record is seeking to improve the quality and availability of data across Fleet users through their web portal. NAVSEA makes 2-Kilo data available to Fleet users.
6. We don’t have numbers. It is expected that the proposals to this topic will tell us what they expect to need for computing resources. We are anticipating that there will need to be two use cases: 1) Development and 2) Deployment. During development, the proposer can use whatever they need (or can afford) to develop and tune their large language model, and create a form completion capability. During deployment, they will need to describe what compute capabilities they will need and ensure that it is reasonable for use by their end users. For Navy applications, e.g. 2-Kilos, this specifically means use from a deployed ship. Ships have limited access to off-board resources, and users will have access to midrange computers (3-5 year old PCs running MS Windows) on a local area network. Servers aboard a ship may have limited generative AI infrastructure (GPUs and memory, and local instances of language models), but shipboard capabilities are still being defined by the Navy, and will have limited deployment. It is expected that the proposers will determine what their computing infrastructure requirements are, and provide the computing expectations for both model development and deployment. Proposers will need to discuss their expected computing infrastructure assumptions / assertions with their anticipated transition customer (whomever they expect to provide Phase 2B matching funds) to see if those expectations are acceptable.
7. Not a pertinent question. 2-Kilos are trouble ticket type reports generated for all systems in the surface fleet. They are created and updated by a range of end users throughout the life-cycle of a problem until it is successfully addressed. 2-Kilos are text-based data that are used by a range of people for different purposes at different times.
8. This topic is to build a general-purpose text form completion engine. It can be for any open text form. It is expected that you will need access to some corpus of technical jargon to build a supplemental language model that will operate along with a general-purpose large language model. You need to identify where you will get the technical jargon data you propose to use to create the specialized language model for the (transition customer) domain you are focusing on. If you focus on the Navy’s 2-Kilo problem, NAVSEA SEA21D is prepared to provide limited support from the SMDII support staff in obtaining 2-Kilo and relevant data. However, because 2-Kilos describe such a wide range of technical domains, we expect that the proposer will need to focus on only 1 or 2 domains, and then use supplemental materials (e.g. training manuals, technical service manuals, etc.) to get the technical jargon necessary to create a suitable targeted supplemental language model for those domains. We expect proposers will choose to focus on one or more technical domains, and they will do that based on their being able to find appropriate data to develop the capability. As a proposer, you will need to tell us where you expect to get that data.
9. Rights in technical data under this BAA generally remain with the contractor, except that the Government obtains a royalty-free license to use such technical data only for Government purposes during the period commencing with contract award and ending twenty years after completion of the project under which the data were generated. Please review section 8.8, Technical Data Rights, of the DoD SBIR 24.1 Preface.
10. Yes, if you are targeting completing Navy forms (e.g. the 2-Kilos), the tool will be deployed on the Navy NIPRNET (UNCLASS) environment. (Thus the need for a deployment use case.) You will need to describe what you anticipate as requirements for end users to use your “Form completion engine”. The product could be a text document that is manually pasted into the current web portal form, or it could generate a file for a database. What the output needs to be will be negotiated with your transition customer who will be investing in your capability at Phase 2B of the SBIR. |
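The answer to question 4 above lists the elements a technician's narrative is expected to cover (problem, system, location, symptoms, attempted fix, status, parts or follow-up). A prompting dialog could track which of these are still missing and ask for the next one. The sketch below shows only that bookkeeping; the element keys, the question wording, and the `NarrativeDraft` class are hypothetical and are not taken from the actual 2-Kilo form or any Navy system.

```python
from dataclasses import dataclass, field

# Narrative elements named in the answer above. The keys and question
# wording are hypothetical, not copied from the 2-Kilo form itself.
ELEMENTS = {
    "problem": "What is the problem?",
    "system": "Which system is affected?",
    "location": "Where is it located?",
    "symptoms": "What symptoms were observed?",
    "actions_tried": "What was tried to fix it?",
    "status": "What is the status now?",
    "follow_up": "Are parts or follow-up action needed?",
}


@dataclass
class NarrativeDraft:
    """Tracks which narrative elements the user has supplied so far."""
    answers: dict = field(default_factory=dict)

    def record(self, element, text):
        """Store a non-empty answer for a known element."""
        if element not in ELEMENTS:
            raise KeyError(f"unknown element: {element}")
        if text.strip():
            self.answers[element] = text.strip()

    def missing(self):
        """Elements still unanswered, in the order they are listed."""
        return [name for name in ELEMENTS if name not in self.answers]

    def next_question(self):
        """The next question to ask, or None when the draft is complete."""
        gaps = self.missing()
        return ELEMENTS[gaps[0]] if gaps else None


if __name__ == "__main__":
    draft = NarrativeDraft()
    draft.record("problem", "Pump will not start")
    print(draft.next_question())
```

In a fuller workflow the recorded answers would be handed to the generative engine to compose the free-text block, and the completed text could then be pasted into the web portal or written to a file, depending on what the transition customer requires.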