Disaggregated mission architectures, in which multiple vehicles work cooperatively and autonomously in a cluster or formation to achieve mission objectives, depend on several key software technologies. Fault management is one of them. Emergent is developing the Separable Architecture for Fault Isolation and Recovery (SAFIR) for these missions through NASA’s SBIR program. Although SAFIR is applicable to any system of systems, our Phase 2 SBIR demonstrates the architecture for a cluster of space vehicles. The demonstration includes fault detection, isolation, and recovery (FDIR) algorithms for the cluster and protocols for communicating fault information between vehicles. The SAFIR software was developed as apps for the Core Flight System (cFS) and was run successfully on representative hardware using a high-fidelity simulation of vehicles in low Earth orbit.


The figure above provides a notional diagram of SAFIR. On each vehicle (or module), FDIR algorithms use sensors already in place on the module to detect faults. In addition, modules in the cluster exchange FDIR data, which improves performance over purely local FDIR algorithms while keeping the system robust to communication interruptions or outages. Data from each module’s FDIR system can be relayed to a ground station in case operators are needed to determine further actions.
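The exchange of FDIR data between modules can be illustrated with a minimal sketch. It is not SAFIR's actual protocol or algorithm: the `HealthReport` fields, the `ClusterHealthView` class, the staleness timeout, and the majority-vote rule are all assumptions chosen for illustration. The point it shows is that reports from peers refine the local assessment when links are up, and that the module falls back to its own report when peer data goes stale during an outage.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class HealthReport:
    """Hypothetical health message exchanged between modules."""
    sender: str                # module that produced the report
    suspects: FrozenSet[str]   # modules the sender believes are faulty
    timestamp: float           # time the report was produced, in seconds


class ClusterHealthView:
    """Keeps the latest report per module and ignores stale ones."""

    def __init__(self, local_id: str, stale_after: float = 5.0) -> None:
        self.local_id = local_id
        self.stale_after = stale_after  # assumed timeout, in seconds
        self._latest: Dict[str, HealthReport] = {}

    def update(self, report: HealthReport) -> None:
        # Keep only the newest report from each sender.
        prev = self._latest.get(report.sender)
        if prev is None or report.timestamp >= prev.timestamp:
            self._latest[report.sender] = report

    def _fresh(self, now: float) -> List[HealthReport]:
        # Reports older than the timeout are treated as a lost link.
        return [r for r in self._latest.values()
                if now - r.timestamp <= self.stale_after]

    def suspected(self, now: float) -> Set[str]:
        """Modules suspected by a strict majority of fresh reporters.

        With only the local report fresh (communication outage), this
        reduces to the local module's own suspicion list.
        """
        fresh = self._fresh(now)
        votes: Dict[str, int] = {}
        for report in fresh:
            for module in report.suspects:
                votes[module] = votes.get(module, 0) + 1
        return {m for m, n in votes.items() if 2 * n > len(fresh)}
```

In this sketch a single dissenting peer cannot declare another module faulty while the rest of the cluster is reachable, yet a module that has lost all links still acts on what its own sensors report.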

Flexibility in SAFIR comes from the fact that the FDIR algorithms are customizable to an individual mission. While our reference FDIR system employs algorithms designed to detect, isolate, and recover from the major types of faults that affect multi-vehicle operations, FDIR algorithms can be added or removed for future applications. The service-oriented architecture eases portability to new platforms and subsystems by providing flexibility, scalability, and robustness. A library of detection algorithms that have already been developed will be made available for reuse, and SAFIR can be expanded with any number of detection algorithms that follow the communications paradigm. The Diagnosis and Recovery components are customizable for new aspects of the health message.

This customizability eases adaptation to new systems and future missions, such as unmanned vehicles, cloud servers, or networks of biomechanical devices.
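The idea of adding and removing detection algorithms behind a common interface can be sketched as follows. This is an illustrative assumption rather than SAFIR's implementation, which is built as cFS apps: the `DetectorRegistry` class, the telemetry keys, the detector functions, and the threshold values are all hypothetical.

```python
from typing import Callable, Dict, Optional

Telemetry = Dict[str, float]
# A detector inspects telemetry and returns a fault description or None.
Detector = Callable[[Telemetry], Optional[str]]


class DetectorRegistry:
    """Holds named detectors that can be added or removed per mission."""

    def __init__(self) -> None:
        self._detectors: Dict[str, Detector] = {}

    def register(self, name: str, detector: Detector) -> None:
        self._detectors[name] = detector

    def unregister(self, name: str) -> None:
        self._detectors.pop(name, None)

    def run(self, telemetry: Telemetry) -> Dict[str, str]:
        """Run every detector and collect findings into one health message."""
        findings: Dict[str, str] = {}
        for name, detector in self._detectors.items():
            result = detector(telemetry)
            if result is not None:
                findings[name] = result
        return findings


def low_battery(telemetry: Telemetry) -> Optional[str]:
    # Hypothetical threshold of 24.0 V, chosen only for the example.
    volts = telemetry.get("battery_v")
    if volts is not None and volts < 24.0:
        return f"battery voltage {volts:.1f} V below 24.0 V"
    return None


def min_separation(telemetry: Telemetry) -> Optional[str]:
    # Hypothetical 100 m keep-out distance between vehicles.
    range_m = telemetry.get("range_m")
    if range_m is not None and range_m < 100.0:
        return f"inter-vehicle range {range_m:.0f} m below 100 m"
    return None
```

A mission would register only the detectors it needs, for example `registry.register("low_battery", low_battery)`, and downstream diagnosis code would consume the dictionary returned by `run` without knowing which detectors produced it.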


During Phase 2, we demonstrated SAFIR on four HummingBoard i2 boards and one Raspberry Pi 2. All four boards run Debian Jessie. They communicate with a simulation of the spacecraft bus over Ethernet, but they communicate with each other over Wi-Fi. The figure above is a diagram of the demonstration environment setup. Ethernet connects the HummingBoards through a 10/100 network switch. The Ethernet connections facilitate communications between Trick and the HummingBoards as well as the inter-module links between the HummingBoards. Each SAFIR component is implemented as a cFS app. The F6 CFA apps for cFS are also run for the demonstration. Trick is a NASA simulation environment that we previously used to test and demonstrate CFA for System F6. Trick is set up with JEOD (JSC Engineering Orbital Dynamics) and EDGE (Engineering Dynamic On-board Ubiquitous Graphics (DOUG) for Exploration). The integrated demonstration is shown running below: