High Performance Computing (HPC) for AEGIS Combat Systems Test Bed (CSTB)
Navy SBIR 2019.1 - Topic N191-019
NAVSEA - Mr. Dean Putnam - dean.r.putnam@navy.mil
Opens: January 8, 2019 - Closes: February 6, 2019 (8:00 PM ET)

N191-019

TITLE: High Performance Computing (HPC) for AEGIS Combat Systems Test Bed (CSTB)

 

TECHNOLOGY AREA(S): Information Systems

ACQUISITION PROGRAM: PEO IWS 1.0, AEGIS Integrated Systems Program Office

The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with section 3.5 of the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.

OBJECTIVE: Provide dynamic resource allocation software for High Performance Computing (HPC) by optimizing computing hardware/software usage in response to unanticipated simulation events and/or simulations requiring more processing time in the Combat Systems Test Bed (CSTB).

DESCRIPTION: The CSTB, as an integrated model across the entire AEGIS Combat System, is computationally intensive to operate and functions in a time-managed environment. The AEGIS program office has made an investment in modeling and simulation capabilities to emulate an integrated Combat System. Ultimately this system requires a grid environment, Monte Carlo analysis capability, innovative scheduling software, and modular models to ensure necessary model speed and capacity. The required innovative methods are (1) dynamic allocation of resources during the simulation and (2) optimization of models in a modular fashion so that they can take advantage of all hardware available in a grid environment.

The CSTB will integrate 30-plus models that, when used together in a simulation, will produce a high-fidelity representation of the entire AEGIS Combat System. Every model has inherent limitations and system resource requirements, and operates at a designated speed. A High Level Architecture (HLA) enables the models to integrate and carries the interactions among them. The current paradigm is to schedule a model on one server, the next model on another server, and so forth. The necessary innovative breakthrough is for the scheduling software to be smart enough to adjust the model in a modular fashion so that the slowest part of the model is able to run as fast as its quickest part, which would achieve efficiency in runtime. Furthermore, if a model’s runtime could be sped up with a GPU (Graphics Processing Unit), the scheduling software should be aware of this and apply the appropriate resources when possible. Additionally, the system currently has no capability to reallocate resources in response to an unplanned event. For example, if a threat performed a certain maneuver or employed a type of jamming midway through the simulation, there is no way to dynamically allocate the available computing resources so that this event does not slow down the entire simulation. Because a Monte Carlo analysis repeats the simulation many times, the delay is multiplied across runs. For the CSTB to be effective in its mission and deliver critical analysis, the Navy must run the CSTB using a High Performance Computing (HPC) paradigm. This HPC environment will use servers in parallel and will need a method for maximizing the resource capability, availability, throughput, and capacity within fiscal limitations.
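The bottleneck described above can be illustrated with a minimal sketch. It assumes a time-managed simulation advances only as fast as its slowest model, and that a model's step time shrinks in proportion to the cores assigned to it (an idealization that real models will not fully achieve). The model names, work values, and the `rebalance` function are hypothetical and are not part of the CSTB.

```python
import heapq

def step_time(work, cores):
    """Idealized step time: work units divided evenly across cores."""
    return work / cores

def rebalance(work_by_model, total_cores):
    """Greedily hand each spare core to whichever model is currently slowest.

    Assumes perfect linear scaling; this is an illustration, not a real scheduler.
    """
    if total_cores < len(work_by_model):
        raise ValueError("need at least one core per model")
    cores = {name: 1 for name in work_by_model}
    # heapq is a min-heap, so step times are negated to pop the slowest model.
    heap = [(-step_time(w, 1), name) for name, w in work_by_model.items()]
    heapq.heapify(heap)
    for _ in range(total_cores - len(work_by_model)):
        _, slowest = heapq.heappop(heap)
        cores[slowest] += 1
        new_time = step_time(work_by_model[slowest], cores[slowest])
        heapq.heappush(heap, (-new_time, slowest))
    return cores

def simulation_step_time(work_by_model, cores):
    """The whole simulation waits for its slowest model at each step."""
    return max(step_time(w, cores[name]) for name, w in work_by_model.items())

# Hypothetical workload: one heavy model and two light ones.
work = {"radar": 8.0, "threat": 2.0, "launcher": 2.0}
one_each = {name: 1 for name in work}
balanced = rebalance(work, total_cores=6)
```

With one core per model the step time is set by the heavy model (8.0 units). With six cores distributed greedily, the heavy model receives four of them and the step time falls to 2.0 units, matching the lighter models.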

There are commercial off-the-shelf (COTS) solutions available for resource allocation such as Univa Grid Engine (UGE) and HTCondor. UGE optimizes throughput and performance of applications, containers, and services by maximizing shared computing resources. HTCondor provides mechanisms and policies that support High Throughput Computing (HTC) on large collections of distributively owned computing resources. Unfortunately, neither of these solutions addresses unplanned events during a simulation or compensates for additional processing requirements and resource allocation.

The Navy seeks scheduling software that allocates and monitors computing resources, as well as starts the simulations using HPC. Software used in an HPC-enabled CSTB computing environment will have to comply with the DoD Risk Management Framework (RMF) methodology for identifying, managing, and mitigating cybersecurity risk [Ref 3]. The solution will use multiple models distributed across multiple servers running the Linux operating system, allowing for extraordinary levels of processing speed. The software will start the simulations by dynamically allocating system resources to software processes, efficiently utilize the available resources, monitor resources to ensure effective execution of priorities, and enable reallocation of resources when required. Achieving these objectives requires the capability to dynamically adjust resources throughout a simulation to shorten the time it takes specific events to execute, e.g., a maneuvering threat or a scenario that requires jamming. Furthermore, the scheduling software must be intelligent enough to operate the models in a modular fashion, allowing the slowest part of the model to execute as quickly as the fastest part. For example, if a radar model could execute in a shorter time using a Graphics Processing Unit (GPU), the scheduling software should be aware of this and take advantage of this computing resource in a modular fashion. Overall, this innovation would save days of computing time required for Monte Carlo runs.
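A minimal sketch of the reallocation behavior described above follows. It assumes a single shared pool of free CPU cores and GPUs, and a policy that prefers a GPU for any model able to use one. The `Model` and `Pool` classes, their methods, and the policy itself are hypothetical illustrations, not a specification of the required software.

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    gpu_capable: bool = False
    cores: int = 1   # baseline allocation
    gpus: int = 0

@dataclass
class Pool:
    free_cores: int
    free_gpus: int
    models: dict = field(default_factory=dict)

    def register(self, model):
        """Start a model with its baseline core allocation."""
        if self.free_cores < model.cores:
            raise RuntimeError("no cores available")
        self.free_cores -= model.cores
        self.models[model.name] = model

    def on_event(self, name, extra_cores):
        """Respond to a mid-run load spike (e.g., jamming) reported for one model.

        Assumed policy: grant a GPU if the model can use one and one is free;
        otherwise grant up to `extra_cores` from the free pool.
        """
        model = self.models[name]
        if model.gpu_capable and self.free_gpus > 0:
            self.free_gpus -= 1
            model.gpus += 1
            return ("gpu", 1)
        granted = min(extra_cores, self.free_cores)
        self.free_cores -= granted
        model.cores += granted
        return ("cores", granted)

    def release(self, name, cores=0, gpus=0):
        """Return resources to the pool once the event has passed."""
        model = self.models[name]
        cores = min(cores, model.cores - 1)  # always keep the baseline core
        gpus = min(gpus, model.gpus)
        model.cores -= cores
        model.gpus -= gpus
        self.free_cores += cores
        self.free_gpus += gpus

pool = Pool(free_cores=8, free_gpus=1)
pool.register(Model("radar", gpu_capable=True))
pool.register(Model("threat"))
radar_grant = pool.on_event("radar", extra_cores=2)    # GPU-capable model
threat_grant = pool.on_event("threat", extra_cores=4)  # CPU-only model
```

In this example the radar model is granted the free GPU, while the CPU-only threat model receives four additional cores. Calling `pool.release("threat", cores=4)` afterward returns those cores so that later events can use them.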

The scheduling software will drive affordability through the Navy by reducing costs in acquisition and manning. The software will maximize shared computing resources across the server farm, which optimizes the models’ throughput. This distributed structure will reduce costs by selecting resources that are optimal for each segment of work, subsequently extending the mean time between failures of each server. Initial service-life estimates indicate a 40% cost reduction for server purchases. In addition, an estimated 20-40% reduction in staffing costs is expected for running the model. The current process to achieve high performance computing is to start each run manually on individual computers. The scheduling software will enable one individual to commence and monitor multiple simultaneous runs. Thus, through a reduction in acquisition of servers and in staffing required to commence and monitor runs, the scheduling software helps to achieve affordability for the Navy.

The AEGIS CSTB needs to execute runs in a timely manner to answer engineering questions posed by the technical team. This requires the AEGIS CSTB to run on a server farm and have the ability to spin up multiple processes in parallel to support the required analysis. HPC allows the AEGIS CSTB to operate on a server farm to execute parallel processing, and cuts overall run time for Monte Carlo analysis. This will allow the CSTB to conduct multiple runs concurrently. The requirement is to reduce the time it takes to run 100 Monte Carlo sets in series down to the time it would take to run 2-10 sets in series. In this manner, runtime performance will be optimized, allowing the response time to be decreased by at least a factor of 10.
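The arithmetic behind the runtime requirement can be checked with a short sketch. It assumes every Monte Carlo set takes the same time and that parallel execution adds no overhead, neither of which is stated in the requirement. The function names are hypothetical.

```python
import math

def required_speedup(total_sets, target_sets):
    """Speedup needed so `total_sets` finish in the time of `target_sets` run in series."""
    return total_sets / target_sets

def min_parallel_slots(total_sets, target_sets):
    """Fewest sets that must run concurrently to meet the target.

    With k slots the sets run in ceil(total_sets / k) back-to-back batches,
    so k must be at least total_sets / target_sets.
    """
    return math.ceil(total_sets / target_sets)

for target in (10, 2):
    print(target, required_speedup(100, target), min_parallel_slots(100, target))
```

Under these assumptions, finishing 100 sets in the time of 10 requires a 10x speedup and at least 10 sets running concurrently; finishing them in the time of 2 requires a 50x speedup and at least 50 concurrent sets. The stated factor of 10 therefore corresponds to the upper end of the 2-10 range.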

To serve its intended use, the scheduling software must perform the following functions: accepting and starting modeling jobs; allocating jobs to available resources; monitoring the jobs; ensuring the jobs are executed to completion; saving the resulting data to network-attached storage; and confirming the validity of the data. User prioritization of jobs will guarantee that high-priority jobs are finished first.
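The job-handling functions listed above can be sketched as a small priority queue with a tracked job state. This is an illustration only: the `JobQueue` class, the state names, and the convention that a lower number means higher priority are all assumptions. The sketch covers dispatch order; it does not by itself guarantee the order in which jobs finish, and it omits storage and data validation.

```python
import heapq
import itertools
from enum import Enum

class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"

class JobQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: first in, first out
        self.state = {}

    def submit(self, job_id, priority):
        """Accept a job. Lower number means higher priority (assumed convention)."""
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))
        self.state[job_id] = State.QUEUED

    def start_next(self):
        """Dispatch the highest-priority queued job, or return None if idle."""
        if not self._heap:
            return None
        _, _, job_id = heapq.heappop(self._heap)
        self.state[job_id] = State.RUNNING
        return job_id

    def complete(self, job_id):
        """Record that a running job has executed to completion."""
        self.state[job_id] = State.DONE

queue = JobQueue()
queue.submit("sweep-a", priority=5)
queue.submit("urgent-b", priority=1)
queue.submit("sweep-c", priority=5)
order = [queue.start_next() for _ in range(3)]
queue.complete("urgent-b")
```

Here `urgent-b` is dispatched first despite being submitted second, and the two equal-priority jobs keep their submission order.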

The CSTB operates in a test environment that consists of desktops and a server farm. The desktops give end users access to the server farm, where multiple simulations are executed concurrently. The desktops are also used for analyzing the data produced by the simulations.

The Phase II effort will likely require secure access, and NAVSEA will process the DD254 to support the contractor for personnel and facility certification for secure access. The Phase I effort will not require access to classified information. If need be, data of the same level of complexity as secured data will be provided to support Phase I work.

Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. Owned and Operated with no Foreign Influence as defined by DoD 5220.22-M, National Industrial Security Program Operating Manual, unless acceptable mitigating procedures can be and have been implemented and approved by the Defense Security Service (DSS). The selected contractor and/or subcontractor must be able to acquire and maintain a secret level facility and Personnel Security Clearances, in order to perform on advanced phases of this contract as set forth by DSS and NAVSEA in order to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material IAW DoD 5220.22-M during the advanced phases of this contract.

PHASE I: Define and develop a concept for scheduling software relative to HPC. Demonstrate that the concept can feasibly support the test environments identified in the Description. Determine feasibility by an assessment of analysis and simulation runtime. Develop a Phase II plan. The Phase I Option, if exercised, will include the initial design specifications and capabilities included in the Description to build a prototype solution in Phase II.

PHASE II: Design, develop, and deliver prototype scheduling software that efficiently allocates and monitors resources and starts simulations across a server farm. Ensure that the prototype system will be capable of accepting CSTB modeling jobs in accordance with the Description requirements.

It is probable that the work under this effort will be classified under Phase II (see Description section for details).

PHASE III DUAL USE APPLICATIONS: Support the Navy in transitioning the technology to Navy use in order to meet a critical Navy need to decrease the amount of time it takes to generate data required to answer engineering questions posed by the technical team. Test the product in the CSTB Laboratory to verify and validate its functionality. The final product must be approved by the AEGIS CSTB program office.

This scheduling software can be utilized across the motor vehicle industry and other large industries that have intensive computational needs. Academia, the aviation industry, the weather industry, and the energy industry could benefit from this technology.

REFERENCES:

1. “Introduction to High Performance Computing.” HPC Advisory Council, 18 March 2018. http://www.hpcadvisorycouncil.com/pdf/Intro_to_HPC.pdf

2. Newton, Randall. “What’s Happening to Cluster Computing?” Digital Engineering, 1 November 2016. http://www.digitaleng.news/de/whats-happening-to-cluster-computing/

3. DODI 8510.01, Risk Management Framework (RMF) for DoD Information Technology (IT), 12 March 2014. http://www.esd.whs.mil/Portals/54/Documents/DD/issuances/dodi/851001_2014.pdf

KEYWORDS: High Performance Computing; HPC; Monte Carlo Analysis; Dynamically Allocating System Resources; Parallel Processing; Combat Systems Test Bed; CSTB; Scheduling Software; ACS; AEGIS Combat System

TPOC-1: John Clarke
Phone: 202-781-3922
Email: john.r.clarke3@navy.mil

TPOC-2: Robert Rumbaugh
Phone: 202-781-4932
Email: robert.rumbaugh@navy.mil

 

** TOPIC NOTICE **

These Navy Topics are part of the overall DoD 2019.1 SBIR BAA. The DoD issued its 2019.1 BAA SBIR pre-release on November 28, 2018, which opens to receive proposals on January 8, 2019, and closes February 6, 2019 at 8:00 PM ET.

Between November 28, 2018 and January 7, 2019 you may communicate directly with the Topic Authors (TPOC) to ask technical questions about the topics. During these dates, their contact information is listed above. For reasons of competitive fairness, direct communication between proposers and topic authors is not allowed starting January 8, 2019, when DoD begins accepting proposals for this BAA. However, until January 23, 2019, proposers may still submit written questions about solicitation topics through the DoD's SBIR/STTR Interactive Topic Information System (SITIS), in which the questioner and respondent remain anonymous and all questions and answers are posted electronically for general viewing until the solicitation closes. All proposers are advised to monitor SITIS during the Open BAA period for questions and answers and other significant information relevant to their SBIR/STTR topics of interest.

Topics Search Engine: Visit the DoD Topic Search Tool at sbir.defensebusiness.org/topics/ to find topics by keyword across all DoD Components participating in this BAA.

Proposal Submission: All SBIR/STTR Proposals must be submitted electronically through the DoD SBIR/STTR Electronic Submission Website, as described in the Proposal Preparation and Submission of Proposal sections of the program Announcement.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk at 800-348-0787 or via email at sbirhelp@bytecubed.com