

Page 1: D-AV-3 2 2B Beschreibung Time & Space Partitioning and the ...spes2020.informatik.tu-muenchen.de/results/AV-AP3 D-AV-3.2.B.pdf · D-AV-3.2.2B: Beschreibung von Time & Space Partitioning

D-AV-3.2.2B: Description of Time & Space Partitioning

Version: 1.0

Project name: SPES 2020

Responsible: Dr. Rupert Reiger

QA responsible: Stephan Stilkerich

Created on: 09.09.2009

Last modified: 17.02.2011 17:27

Release status:
[ ] Confidential for partners: all partners
[ ] Project-public
[X] Public

Processing state:
[ ] In progress
[ ] Submitted
[X] Completed


Additional product information

Created by: Dr. Rupert Reiger

Contributors:

Change log

No. | Date | Version | Changed chapters | Description of change | Author | State
1 | 02.11.10 & following weeks | 0.1 | all | Initial creation of the product; collecting all matters on the topic | Reiger | In progress
2 | 16.11.10 & following weeks | 0.15 | all | State-of-the-art research | Reiger | In progress
3 | 08.12.10 & following weeks | 0.2 | all | Document structure fixed | Reiger | In progress
4 | 15.12.10 & following weeks | 0.3 | all | Figures; all aspects as bullet points within the structure | Reiger | In progress
5 | 19.01.11 & following weeks | 0.5 | all | Restructuring of the document; bullet points into prose | Reiger | In progress
6 | 25.01.11 & following weeks | 0.7 | all | Changing figures; final version of bullet points into prose | Reiger | In progress
7 | 28.01.11 & following weeks | 0.85 | all | Tailoring; final version of structure; bullet points into prose | Reiger | In progress
8 | 05.02.11 & following weeks | | | Writing | Reiger | In progress
9 | 11.02.11 | 0.99 | | Completion of version 0.99 | Reiger | Final
10 | 17.02.11 | 0.99 | | QA | Stilkerich | Final
11 | 17.02.11 | 1.00 | | Completion | Reiger | Final


D-AV-3.2.1A: Anforderungen zu Time & Space Partitioning in der Avionik


Abstract

This document builds on and continues the following document:

D-AV-3 2 2A Anforderungen zu Time & Space Partitioning V1.00 Reiger.doc

After stating the requirements for time & space partitioning, that document already, by logical necessity, also describes the IMA architecture, essentially time and space partitioning. It also introduces the ASAAC and ARINC 653 standards.

The above document should therefore be read before the present one.

The present document

D-AV-3 2 2B Beschreibung Time & Space Partitioning V1.00 Reiger.doc

thus elaborates on the document

D-AV-3 2 2A Anforderungen zu Time & Space Partitioning V1.00 Reiger.doc

and is to be regarded as a supplement to it.


Contents:

List of Figures
1 Context and Brief Description
1.1 Motivation and Context
1.2 Management Summary
2 Discussion of Partitioning in Avionics Architectures with the Aspect of Certification
2.1 Mechanisms and Assurance
2.1.1 Mechanisms: Partitioning within a single Processor
2.1.1.1 Spatial Partitioning
2.1.1.2 Temporal Partitioning
2.1.2 Mechanisms: Partitioning within a distributed System
2.2 System Considerations & Standards ARINC 653 and DO-178B
2.3 The Must (sometime) 1: IMA driven Modular Certification
2.4 The Must (some day) 2: IMA driven In-Time Certification
2.4.1 Only with IMA: Compositional Certification
2.4.2 Only with IMA: the next step: Synthesis and a further step: Just-In-Time Certification
2.5 The Must (right now) 3: Modular Certification – CMU SEI PACC and IMA
References


List of Figures:

Figure 1: Alternative Operating System/Partition Designs 1: (a) VxWorks type (b) PikeOS type

Figure 2: Distinguish between shared resources in time/space and time & space partitioning, so also shared resources in time need space partitioning

Figure 3: Principle example of a partitioning system

Figure 4: Robust partitioning impact by DMA-induced temporal violation

Figure 5: Modular versus traditional certification

Figure 6: Multi module modular certification

Figure 7: Assume-guarantee modular certification

Figure 8: Component assemblies are “in the zone” if their runtime behavior is analytically predictable.

Figure 9: Each assembly (system assembled from components) in the zone has a corresponding model in that zone’s analytic theory.

Figure 10: PACC Assignment of a statistical confidence label to analytic theories

Figure 11: Trusted components must provide more insight into their inner workings. Analytic theories require additional quality-specific properties such as task execution time to make predictions. These quality-specific properties are exposed as analytic interfaces of components

Figure 12: To support predictability, a component technology must exhibit the minimal, ideal design pattern shown above. In this ideal pattern, all interactions among components are exposed and use standard connection mechanisms. Additionally, resource management and coordination policies are defined and enforced by a standard component runtime environment


1 Context and Brief Description

1.1 Motivation and Context

The document preceding the present one,

D-AV-3.2.1A: Anforderungen zu Time & Space Partitioning in der Avionik

derives the requirements that an Integrated Modular Avionics (IMA) system must satisfy. The essential requirements concern avionics safety aspects and mixed criticalities and thus, following DO-178B, time and space partitioning with reference to ARINC 653.

To maintain a common thread, the architectures are also already described in all essential aspects in that preceding document, namely in the chapters:

The Avionics Architectures – IMA and Beyond
5.1 Architectures descriptures
5.1.1 A short overview and a couple of scheduling policies
5.1.2 The federated system architecture example
5.1.3 The IMA Architecture
5.1.4 The APEX Example
5.1.5 The Replication and Reconfiguration Example
5.2 ASAAC / ARINC 653
5.3 DIMA: Integrated Systems-of-Systems

The present document

D-AV-3 2 2B Beschreibung Time & Space Partitioning V1.00 Reiger.doc

thus elaborates, with regard to the description of time & space partitioning, on the document

D-AV-3 2 2A Anforderungen zu Time & Space Partitioning V1.00 Reiger.doc

and is to be regarded as a supplement to it.

In terms of content, this document is a compilation of the references listed in the bibliography, since the treatments there are already excellently worked out; the reader is therefore referred to the originals.

1.2 Management Summary

The present document D-AV-3 2 2B builds on document D-AV-3 2 2A, which already describes time & space partitioning. As a supplement to D-AV-3 2 2A, it discusses the description of time & space partitioning in the following chapters:

2 Discussion of Partitioning in Avionics Architectures & the Aspect of Certification

2.1 Mechanisms and Assurance

A centralized IMA architecture must provide replicated and physically distributed hardware for fault tolerance, together with mechanisms for redundancy management. So a conceptually centralized IMA will be, internally, a distributed system.


The traditional federated architecture is a major obstacle to a rational organization of flight functions, and IMA is the best hope for removing this obstacle. That suggests examining partitioning for IMA as a distributed system in which flight functions are each allocated to separate processors, replicated as necessary for fault tolerance. In this model, the task of partitioning is to limit fault propagation between the processors supporting each function, but not within them.

If functions have no internal partitioning, then all their software must be assured and certified to the level appropriate for that function. Thus, all the software in an autopilot function is likely to require assurance to Level A of DO-178B if part of it has that requirement.

With IMA, partitioning within a processor could allow an individual function to be divided into software components of different criticalities; each could then be developed and certified to the level appropriate to its criticality.

The chapter describes spatial partitioning and temporal (time) partitioning, how to implement them, and what to take care of.

2.2 System Considerations & Standards ARINC 653 and DO-178B

Several IMA system design challenges are posed by common hardware features such as interrupts, Direct Memory Access (DMA), and system Input/Output (I/O), as they may produce unpredictable distortions that violate the constraints of time and space partitioning.

Time and space partitioning must be deterministic in an ARINC 653-based IMA system in order to avoid aircraft certification concerns and incompatibilities with the standards and guidelines set forth in DO-178B. The chapter presents methods for identifying, analyzing, and rectifying potential determinism issues in the IMA system.

As described, robust time partitioning “must ensure that the service received from shared resources by the software in one partition cannot be affected by the software in another partition. This includes the performance of the resource concerned, as well as the rate, latency, jitter, and duration of scheduled access to it.”

The following issues therefore have to be dealt with:

• Potential Issues with DMA Transfers in Robustly-Partitioned IMA Systems

• Potential Issues with Interrupts in Robustly-Partitioned IMA Systems

• Potential Issues with I/O in Robustly-Partitioned IMA Systems

2.3 The Must (sometime) 1: IMA driven Modular Certification

Airplanes are certified as a whole: there is no established basis for separately certifying some components, particularly software-intensive ones, independently of their specific application in a given airplane.

The absence of separate certification inhibits the development of modular components that could be largely "precertified" and used in several different contexts within a single airplane, or across many different airplanes.

The chapter examines the issues in modular certification of software components and proposes an approach based on assume-guarantee reasoning. The method is extended from verification to certification by considering behavior in the presence of failures. This exposes the need for partitioning.

2.4 The Must (some day) 2: IMA driven In-Time Certification


In-time certification is a goal-based approach in which unconditional claims delivered by formal methods are combined with other evidence in multi-legged cases supported by Bayesian analysis. The chapter also covers:

• the necessity of extending this to compositional certification and

• the possibility of adaptive systems in which methods of analysis traditionally used to support certification at design time are instead used for synthesis and monitoring at runtime and certification is performed “just-in-time.”

The traditional approach to software certification may be called “standards based” and largely requires the applicant to follow prescribed processes (e.g., DO-178B for airborne software, the Common Criteria for computer security, or IEC-61508 for programmable devices) and to develop specified evidence (e.g. MC/DC tests for DO-178B Level A).

In contrast, the chapter discusses a goal-based approach to software certification that builds on advances in the power of automated formal methods of analysis.

2.5 The Must (right now) 3: Modular Certification – CMU SEI PACC and IMA

The aim is to rapidly assemble new and innovative software-intensive systems for high-stakes applications from certified, trusted software components. In this vision, timing estimates made at design time are routinely within the required tolerance with 99.99% confidence or better.

Critical safety and security properties are formally verified, and firm control is obtained over the quality of software components developed by suppliers from around the world. These capabilities provide a potent differentiator in an increasingly competitive industry. Engineering predictability beyond testing means using software architecture to predictably satisfy the quality requirements of software systems.

The approach is to develop analytic theories that predict the behavior of entire classes of systems. In this approach, the emphasis is on confirming the validity of theories. Once a theory is confirmed, one can be sure that all systems satisfying its assumptions will have behavior that is predictable in that theory, within defined zones of predictability.


2 Discussion of Partitioning in Avionics Architectures with the Aspect of Certification

2.1 Mechanisms and Assurance 1

A centralized IMA architecture must provide replicated and physically distributed hardware for fault tolerance, together with mechanisms for redundancy management. So a conceptually centralized IMA will be, internally, a distributed system.

The traditional federated architecture is a major obstacle to a rational organization of flight functions, and IMA is the best hope for removing this obstacle. That suggests examining partitioning for IMA as a distributed system in which flight functions are each allocated to separate processors, replicated as necessary for fault tolerance. In this model, the task of partitioning is to limit fault propagation between the processors supporting each function, but not within them.

If functions have no internal partitioning, then all their software must be assured and certified to the level appropriate for that function. Thus, all the software in an autopilot function is likely to require assurance to Level A of DO-178B if part of it has that requirement.

With IMA, partitioning within a processor could allow an individual function to be divided into software components of different criticalities; each could then be developed and certified to the level appropriate to its criticality.

The design choices for partitioning interact with those for providing operating system services. The major decision is whether partitioning is provided above an operating system layer or above a minimal kernel or executive, with most operating system services then provided separately in each partition (see Figure 1). The first choice is the way standard operating systems are structured, with partitions being client processes, but it has the disadvantage that partitioning then relies on a great deal of operating system software. The second choice is sometimes called the “virtual machine” approach, and it has the advantage that partitioning relies only on the kernel and its supporting hardware.

Figure 1: Alternative Operating System/Partition Designs 1: (a) VxWorks type (b) PikeOS type


Partitioning should be considered both within a single processor and across a distributed system; it also interacts with the provision of operating system services and with fault tolerance.

The purpose of partitioning is fault containment: a failure in one partition must not propagate to cause failure in another partition. However, care is needed about which kinds of faults and failures are considered. The function in a partition depends on the correct operation of its processor and associated peripherals, and partitioning is not intended to protect against their failure. Protection against such hardware failures can be achieved only by replicating functions across multiple processors in a fault-tolerant manner.

After all, each function would be just as vulnerable to hardware failure if it had its own processor. Rather, the intent of partitioning is to control the additional hazard that is created when a function shares its processor or, more generally, a resource with other functions. The additional hazard is that faults in the design or implementation of one function may affect the operation of other functions that share resources with it.

Now a design or implementation fault in a flight function is surely a very serious event, and it might be supposed either that such faults are so serious that it does not matter what else goes wrong, or that certification ensures that such faults cannot occur. Neither supposition holds: such a fault need not be catastrophic on its own.

Thus, while a design fault in, say, the autothrottle function would be serious, appropriate design and system-level hazard analysis will ensure that it is not catastrophic, provided other functions do not fail at the same time. Allowing a fault in this function to propagate to another function, e.g., autoland, would violate the assumption of independent failures. Thus, far from a fault in a critical function being so serious as to render concern for partitioning irrelevant, it is the need to contain the consequences of such a fault that renders partitioning essential.

Time and space partitioning (TSP) is thus a technique which permits the sharing of a computing platform between multiple independent applications.

Spatial partitioning indicates the division of shared resources, such as memory, which may be utilized by multiple applications simultaneously. Spatial regions such as memory address ranges can be limited to exclusive access by one application, or shared access by multiple applications can be granted.

Temporal/Time partitioning indicates the division of shared resources, such as a simple processor, which cannot be utilized by multiple applications simultaneously. Such resources must be wholly ‘owned’ by a single application at any point in time, and are shared by multiplexing the access to the resource by applications in time.

So:

No partition can:

1. Space partitioning: contaminate another partition’s code, I/O, or data storage areas

2. Time partitioning: consume shared processor resources to the exclusion of any other partition

3. I/O (space and time partitioning): consume I/O resources to the exclusion of any other partition

4. Cause adverse effects to any other partition as a result of a hardware failure unique to that partition
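Some of the constraints above can already be checked on a static configuration before the system runs. The following Python sketch is illustrative only: the names `Partition`, `regions_disjoint` and `windows_disjoint` are hypothetical and are not taken from ARINC 653 or from any kernel API. It assumes that each partition owns exactly one memory range and one time window per major frame.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence


@dataclass(frozen=True)
class Partition:
    """Hypothetical static description of one partition."""
    name: str
    mem_start: int   # first address owned by the partition
    mem_end: int     # one past the last owned address
    win_start: int   # start of its time window in the major frame (ms)
    win_end: int     # end of its time window (ms, exclusive)


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def regions_disjoint(parts: Sequence[Partition]) -> bool:
    """Space partitioning: no two partitions own a common address."""
    return not any(
        overlaps(p.mem_start, p.mem_end, q.mem_start, q.mem_end)
        for p, q in combinations(parts, 2)
    )


def windows_disjoint(parts: Sequence[Partition]) -> bool:
    """Time partitioning: no two partitions own the processor at once."""
    return not any(
        overlaps(p.win_start, p.win_end, q.win_start, q.win_end)
        for p, q in combinations(parts, 2)
    )
```

A configuration tool could reject any partition table for which either function returns `False`. Such a check covers only the static layout; it says nothing about whether the kernel enforces that layout at runtime.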

Alternative formulation:


Spatial Partitioning:

Spatial partitioning must ensure that software in one partition cannot change the software or private data of another partition (either in memory or in transit), nor command the private devices or actuators of other partitions.

Temporal Partitioning:

Temporal partitioning must ensure that the service received from shared resources by the software in one partition cannot be affected by the software in another partition. This includes the performance of the resource concerned, as well as the rate, latency, jitter, and duration of scheduled access to it.

So the mechanisms of partitioning must block the spatial and temporal pathways for fault propagation by interposing themselves between avionics software functions and the shared resources that they use.

Leading to:

Space partitioning:

• Protected system page tables

• Constructed at build time, or otherwise proven correct (see 2.4.2 Only with IMA: the next step: Synthesis and a further step: Just-In-Time Certification)

• Validated MMU

Time partitioning:

• Non-user-maskable

• SAFEbus interrupt drives OS schedule

• No other interrupts allowed
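The time-partitioning points above amount to a fixed schedule that is advanced by a single periodic interrupt which partitions cannot mask. A minimal Python sketch of such a table-driven schedule follows. The schedule contents, the tick unit and the function name `partition_at_tick` are assumptions made for illustration; they are not taken from the source or from SAFEbus.

```python
# Hypothetical major frame: (partition name, window length in ticks).
SCHEDULE = [("autopilot", 4), ("autothrottle", 2), ("maintenance", 2)]
MAJOR_FRAME = sum(length for _, length in SCHEDULE)  # 8 ticks in total


def partition_at_tick(tick: int) -> str:
    """Return the partition that owns the processor at a given tick.

    The result depends only on the tick count, never on what the
    partitions themselves do, so no partition can extend its window.
    """
    offset = tick % MAJOR_FRAME
    for name, length in SCHEDULE:
        if offset < length:
            return name
        offset -= length
    raise AssertionError("unreachable: offset always lies inside the frame")
```

Because the table is fixed at build time, the partition running at any tick can be determined offline, which is what makes the schedule analyzable for certification.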


Figure 2: Distinguish between shared resources in time/space and time & space partitioning, so also shared resources in time need space partitioning


2.1.1 Mechanisms: Partitioning within a single Processor

An application will generally be composed of smaller units of computation that are called or scheduled separately; we generally refer to these as tasks. Again depending on the implementation, these may correspond to an operating system notion such as thread.

2.1.1.1 Spatial Partitioning

The basic concern of spatial partitioning is the possibility that software in one partition might write into the memory of another: memory is often pictured as a one- or two-dimensional grid, hence the reference to the spatial dimension for this aspect of partitioning. Memory includes that used to store programs as well as data, although in embedded systems it is sometimes possible to hold the former in ROM, where it cannot be overwritten by errant software.

Hardware mediation provided by a memory management unit (MMU) is the usual way to guard against violations of spatial partitioning.

The basic idea is that the processor has (at least) two modes of operation and, when it is in “user” mode, all accesses to memory addresses are either checked or translated using tables held in the MMU. A layer of operating system software (the kernel) manages the MMU tables so that the memory locations that can be read and written in each partition are disjoint, apart, possibly, from certain locations used for inter-partition communications.
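The checking variant of this mediation can be modelled in a few lines. The Python sketch below is a toy model, not a description of any real MMU: the class `ToyMMU`, the `Region` record and the exception name are hypothetical. It assumes that only the kernel calls `load_table` and `switch_to`, and that every user-mode access passes through `check`.

```python
from dataclasses import dataclass


class PartitionViolation(Exception):
    """Raised when a user-mode access falls outside the partition's table."""


@dataclass(frozen=True)
class Region:
    start: int       # first address of the region
    end: int         # one past the last address
    writable: bool   # False for read-only regions


class ToyMMU:
    """Toy model: one access table per partition, installed by the kernel."""

    def __init__(self):
        self._tables = {}
        self._current = None

    def load_table(self, partition, regions):
        """Kernel-only: install the regions a partition may touch."""
        self._tables[partition] = list(regions)

    def switch_to(self, partition):
        """Kernel-only: make a partition's table the active one."""
        self._current = partition

    def check(self, addr, write):
        """Mediate one user-mode access; return the address if permitted."""
        for region in self._tables[self._current]:
            in_range = region.start <= addr < region.end
            if in_range and (region.writable or not write):
                return addr
        raise PartitionViolation(f"{self._current}: access to {addr:#x} denied")
```

In this model a shared communication area can be expressed by listing the same address range in two tables, writable in one and read-only in the other.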

The kernel also uses the MMU to protect itself from being modified by software in its client partitions, and must be careful to manage the user-supervisor mode distinctions of the processor correctly to ensure that the mediation provided by the MMU cannot be bypassed. In particular, entry to and exit from the kernel need to be handled carefully so that software in a partition cannot gain supervisor mode.

Software executing in a partition accesses processor registers such as accumulators and index registers as well as memory. Generally, the kernel arranges things so that the software in one partition executes for a while, then another partition is given control, and so on; when one partition is suspended and another started, the kernel first saves the contents of all the processor registers in memory locations dedicated to the partition being suspended, and then reloads the registers (including those in the MMU that determine which memory locations are accessible) with values saved for the partition that executes next. The software in the partition resumes where it left off and cannot tell, apart from the passage of time while it was suspended, that it is sharing the processor with other partitions.

The description just given resembles classical time-sharing, where partitions can be suspended at arbitrary points and resumed later. Some variations are possible for embedded systems. For example, if partitions are guaranteed an uninterruptible time slice of known duration, they can be expected to have finished their tasks before being suspended and can then be restarted in some standard state, rather than resumed where they left off. This eliminates the cost of saving the processor registers when a partition is suspended, but at least some of them, including the program counter, must be restored to standard values when the partition is restarted. We can refer to the two types of partition swapping arrangements as the restoration and restart models, respectively.

In the “restoration” model, the processor state must be restored to exactly what it was before suspension; in the “restart” model, it must be restored to some known state.
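The difference between the two models can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: the register set, the `RESET_STATE` values and all function names are hypothetical, and real kernels must also handle MMU registers and processor status bits.

```python
# Hypothetical standard state used by the restart model.
RESET_STATE = {"pc": 0x100, "acc": 0, "index": 0}


class Cpu:
    """Toy processor holding a small register file."""

    def __init__(self):
        self.regs = dict(RESET_STATE)


def suspend_restoration(cpu, save_area):
    """Restoration model: save every register of the outgoing partition."""
    save_area.clear()
    save_area.update(cpu.regs)


def resume_restoration(cpu, save_area):
    """Restoration model: reload exactly what was saved at suspension."""
    cpu.regs = dict(save_area)


def resume_restart(cpu):
    """Restart model: nothing was saved; load the known standard state."""
    cpu.regs = dict(RESET_STATE)
```

In both functions that resume a partition, every register is overwritten. That is the property that matters for partitioning: no value left behind by the previously running partition remains visible.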


Although the description just given resembles classical time-sharing, in dependable systems partition switching must withstand any possible fault, and this must be demonstrated and certified.

The requirement to make behavior predictable across the suspension and resumption of a partition generates in turn the requirement that the operation of the processor must be specified precisely and accurately with respect to all of its registers.

An alternative to spatial partitioning using hardware mediation is Software Fault Isolation (SFI). The idea here is similar to array bounds checking in high-level programming languages, except that it is applied to all memory references, not just those that index into arrays. By examining the machine code of the software in a partition, it is possible to determine the destinations of some memory references and jumps and hence to check, statically, whether they are safe.
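For the references that cannot be resolved statically, SFI relies on checks that run with the program. The Python sketch below illustrates two commonly described variants, checking and masking. It is an assumption-laden model rather than a real SFI tool: the segment constants and the function names are hypothetical, and the masking variant assumes a segment whose size is a power of two and whose base is aligned to that size.

```python
SEGMENT_BASE = 0x4000          # hypothetical start of the partition's segment
SEGMENT_SIZE = 0x1000          # power of two, required by the masking variant
OFFSET_MASK = SEGMENT_SIZE - 1


class SfiViolation(Exception):
    """Raised by the checking variant for an out-of-segment reference."""


def check_address(addr):
    """Checking variant: reject any reference outside the segment."""
    if not (SEGMENT_BASE <= addr < SEGMENT_BASE + SEGMENT_SIZE):
        raise SfiViolation(hex(addr))
    return addr


def sandbox_address(addr):
    """Masking variant: force the reference into the segment."""
    return SEGMENT_BASE | (addr & OFFSET_MASK)


def store(memory, addr, value):
    """A store as the (hypothetical) SFI rewriter would emit it."""
    memory[check_address(addr)] = value
```

The checking variant reports the fault, whereas the masking variant silently redirects the access into the partition's own segment. Either way, the partition cannot write outside its segment.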

Static (i.e. compile-time) analysis, for example by abstract interpretation, of information flow within individual programs written in high-level languages has long been a topic in computer security. In its simplest form, some of the variables used by the program are labeled high and some low, and the goal is to check whether information from a high variable can ever influence the final value of one labeled low. Techniques for information flow analysis include approximate methods similar to type checking or to data flow analysis as well as exact methods and those that rely on formal proof. It is possible that approaches based on these techniques could reduce, or even eliminate, the runtime overhead of SFI.
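The simplest form of this analysis can be sketched for straight-line code. The Python fragment below is a deliberately reduced illustration: the program representation and the function names are hypothetical, and only explicit flows through assignments are tracked. Implicit flows through branches, loops and pointers are outside its scope.

```python
def tainted_variables(program, high_vars):
    """Return every variable that may carry information from a high variable.

    `program` is a list of (target, sources) pairs, one per assignment
    `target = f(sources)`, executed in the given order.
    """
    tainted = set(high_vars)
    for target, sources in program:
        if any(source in tainted for source in sources):
            tainted.add(target)
        else:
            tainted.discard(target)   # overwritten with low-only data
    return tainted


def leaks(program, high_vars, low_vars):
    """True if some low-labelled variable ends up influenced by a high one."""
    return bool(tainted_variables(program, high_vars) & set(low_vars))
```

For example, the two assignments `tmp = key + x; out = tmp` leak `key` into `out`, whereas inserting `tmp = x` between them removes the flow.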

A disadvantage of SFI compared with hardware-mediated partitioning is that it imposes an additional analysis and certification cost on every program, whereas hardware mediation has the one-time cost of designing, implementing, and certifying the partitioning mechanisms of the kernel and its supporting hardware. On the other hand, the analysis required for SFI lends itself to powerful automation, extended static checking, and proof-carrying code, where the certification cost would be transferred to the one-time cost of certifying the tools.

One concern about SFI, especially when static analysis is used to optimize away many of the runtime checks, is that it provides little protection against hardware faults (e.g. SEU-induced bit-flips) that cause memory addresses that were correct when analyzed to be turned into ones that are incorrect when executed. The bad memory reference will be caught only if a runtime check is in the right place; a hardware MMU, on the other hand, mediates every reference at its time of execution. It was stated above that the purpose of partitioning is to protect functions against faults of design and implementation in other functions, not to guard against hardware faults, since these could afflict the function even if it had its own dedicated processor. However, a hardware fault that leads to a violation of partitioning is not a fault that would have afflicted the function if it had its own processor, so the concern appears legitimate.
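The concern can be made concrete with a single flipped bit. In the Python sketch below, the segment bounds and all names are hypothetical, and the bit flip is simulated in software. It shows an address that passes the offline analysis, is corrupted afterwards, and is stopped only because a check is still performed at the time of execution.

```python
SEGMENT = range(0x4000, 0x5000)   # hypothetical partition segment


def flip_bit(value, bit):
    """Model a single-event upset: invert one bit of a stored address."""
    return value ^ (1 << bit)


def statically_safe(addr):
    """What an offline analysis established before the program ran."""
    return addr in SEGMENT


def runtime_checked_store(memory, addr, value):
    """What an MMU, or a retained runtime check, does at execution time."""
    if addr not in SEGMENT:
        raise MemoryError(f"reference to {addr:#x} blocked")
    memory[addr] = value
```

If the runtime check had been optimized away on the strength of `statically_safe`, the corrupted store would have landed outside the partition's segment.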

In designs where it is possible to provide a custom MMU, it would be prudent to ensure that this is either fault tolerant, or that it merely checks rather than translates addresses, so that a double fault would be needed to violate partitioning; best of all might be relocation or checking with hardwired values.

Processor and memory → inter-partition communication:

So far, the discussion of partitioning has covered only the processor and the memory, and has assumed that different partitions are meant to be isolated from each other; we now consider inter-partition communication and devices. Like partitioning itself, inter-partition communication has two dimensions:

• The spatial dimension is concerned with where and how data is transferred from one partition to another, while

• The temporal dimension is concerned with whether and how synchronization is performed, and how one partition invokes services.

Inter-partition communication must not introduce any interference between partitions (such as synchronization); this restricts the communication principles that can be used.

The obvious way to communicate data from one partition to another is to copy it from a buffer in memory belonging to the first partition into a separate buffer in the memory of the second. Because only the kernel has access to the memory of both partitions, it must perform the copying and, since it generally runs without memory protection, it must check carefully against buffer overruns. A more efficient scheme uses a single buffer in memory locations that are among those the sending partition can write and the receiver can read. Both MMU and SFI forms of memory protection can do this; data can then be copied into the shared buffer by the sending partition without the active participation of the kernel. The receiving partition must assume that the sending one can write arbitrary data anywhere in their shared buffers whenever it has control, and its verification must be performed under this assumption. It seems cleanest if separate buffers are used for each direction of transfer, but bidirectional buffers may also be acceptable. It is, however, important that separate buffers are used for each pair of partitions; otherwise partition A could overwrite the data of B in C's single input buffer.

Observe that it is important to restrict inter-partition communications to those that are intended: one partition should be able to send data to another only if that communication is authorized in the specification of the system configuration, and the receiving partition must then have a buffer to receive it.
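The two rules above (one buffer per pair of partitions, and communication only where the configuration authorizes it) can be sketched as follows. The class name, the method names and the fixed capacity are assumptions made for illustration; they are not taken from any particular kernel.

```python
class ChannelTable:
    """One unidirectional buffer per authorized (sender, receiver) pair."""

    def __init__(self, authorized_pairs, capacity):
        # Buffers exist only for pairs named in the system configuration.
        self._buffers = {pair: [] for pair in authorized_pairs}
        self._capacity = capacity

    def send(self, sender, receiver, datum):
        buffer = self._buffers.get((sender, receiver))
        if buffer is None:
            return False              # communication not authorized
        if len(buffer) >= self._capacity:
            return False              # refuse rather than overrun the buffer
        buffer.append(datum)
        return True

    def receive(self, receiver, sender):
        buffer = self._buffers.get((sender, receiver))
        return buffer.pop(0) if buffer else None


channels = ChannelTable({("A", "C"), ("B", "C")}, capacity=2)
channels.send("A", "C", "sample from A")
channels.send("B", "C", "sample from B")
print(channels.receive("C", "B"))
```

Since A and B write into different buffers, nothing A sends can overwrite what B has placed in its own channel to C.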

A related topic is how one partition should name the other partitions with which it communicates.

Absolute addresses lead to a rigid and fragile system organization and are to be deprecated on this account. Functional addresses are little better: they build assumptions about the system structure into individual applications and limit the opportunities for reuse and reconfiguration. Relative addressing allows the binding of names to specific inter-partition communication channels to be postponed until system configuration time and may allow some dynamic reconfiguration, but requires a database to record what type of data or service is provided on a given port. The best arrangement may be one where partitions use the type of data or service provided or expected as the name of the port concerned, e.g. “send this datum out on my air-data-samples port” or “get me an air-data-sample”. The binding of these names to inter-partition channels can be done during system configuration, or at runtime. In the latter case, we have something like a publish-subscribe architecture; this provides excellent support for dynamic reconfiguration, but its application to life-critical systems is still an issue for research.

Software in one partition should not make assumptions about when tasks in other partitions are scheduled. Tasks within some partitions may be dynamically scheduled; this, combined with normally asynchronous communication, means that care is needed when communicating time-sensitive data. For example, a task that collects from its input buffer a sensor sample contributed by another partition needs to know when that sample was taken. The usual arrangement is to attach a time stamp to the sample; since both partitions are running on the same processor, they have access to a common clock.

In addition to communications between partitions, we must examine communications between partitions and devices. Devices, which include sensors and actuators as well as peripherals such as mass storage, have implications for both temporal and spatial partitioning. Most devices raise an interrupt when data is available, or when they need service. Such interrupts affect the timing and locus of control, and consideration of their impact is postponed to the discussion on temporal partitioning.

Devices impact spatial partitioning in three ways: they need to be protected against access by the wrong partition, they must not be allowed to become agents for violating partitioning, and they may themselves need to be partitioned.

The simplest case is where a device “belongs” to some partition and should not be accessed by others. Most modern processors use memory-mapped I/O, meaning that interaction with devices is conducted by reading and writing to registers that are referenced like ordinary memory locations.

Some devices may be “shared” by more than one partition. Such devices come in two forms: those that need protection and those that do not. An example of the latter is a sensor that periodically places a sample in a device register: there seems no harm in allowing two partitions both to have read access to the memory location containing that device register. Devices that accept commands are more problematic, in that faulty software in one partition may issue commands that render the device inoperable or otherwise unavailable to other partitions. Protection by a special device management partition seems necessary to mediate access in these cases (the Clementine spacecraft was lost when a software fault caused garbage to be sent over an unmediated bus, where it was interpreted by an attached device as a command to fire all the thrusters without limit).

2.1.1.2 Temporal Partitioning

The context is real-time embedded systems, where correctness requires not only that the right results are produced, but that they are produced at the right time. The concern of temporal partitioning is to ensure that activities in one partition do not disturb the timing of events in other partitions.

The concerns are that faulty software in one partition might monopolize the CPU, or that it might crash the system or issue a halt instruction, effectively denying service to all other partitions.

Other scenarios that can cause a partition to fail to relinquish the CPU on time include simple schedule overruns, where particular parameter values cause a computation to take longer than its allotted time, and runaway executions, where a program gets stuck in a loop.

Although their manifestations are in the temporal dimension, system crashes and instructions that halt the CPU are usually prevented by the mechanisms of spatial partitioning.

Runaway executions in the kernel, lockups, and un-trapped halt instructions could all afflict a processor dedicated to a single function, and so their treatment is more in the domain of system-level design verification or fault tolerance than partitioning. Overruns or runaways within a function, however, are genuinely the concern of partitioning and are usually controlled through timer interrupts managed by the kernel: the kernel sets a timer when it gives control to a partition; if the partition does not relinquish control voluntarily before its time is up, the timer interrupt will activate the kernel, which will then take control away from the overrunning partition and give it to another partition under the same constraints.
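The timer mechanism can be sketched as a small simulation. It assumes each partition has a fixed budget and a (possibly faulty) demand for CPU time; the function name and the tuple layout of the log are illustrative assumptions, and a real kernel would of course act on a hardware timer interrupt rather than on known demands.

```python
def run_round(schedule, demand_ms):
    """Simulate one round of budget enforcement by the kernel.

    schedule:  list of (partition, budget_ms) in dispatch order.
    demand_ms: dict mapping partition -> time it would run if never preempted.
    Returns a log of (partition, start_ms, end_ms, was_preempted).
    """
    clock = 0
    log = []
    for partition, budget_ms in schedule:
        wanted = demand_ms.get(partition, 0)
        # The partition either yields voluntarily or the timer fires at its budget.
        used = min(wanted, budget_ms)
        log.append((partition, clock, clock + used, wanted > budget_ms))
        clock += used
    return log


for entry in run_round([("P1", 20), ("P2", 30)], {"P1": 50, "P2": 10}):
    print(entry)
```

In this run P1 would like 50 ms but is cut off at its 20 ms budget, so P2 still starts no later than 20 ms into the round.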

Merely taking control away from an overrunning partition does not guarantee that other partitions will be able to proceed, however, for the overrunning partition could be holding some shared device or other resource that is needed by those other partitions. The kernel could break any locks held by the errant partition and forcibly seize the resource, but this may do little good if the resource has been left in an inconsistent state. These considerations reinforce the earlier conclusion that devices and other resources cannot be directly shared across partitions. Instead, a management partition must own the resource and must manage it in such a way that behavior by one client partition cannot affect the service received by another.

Another problem can arise if the overrunning partition is performing some service on behalf of another partition: it will generally be necessary to notify the invoking partition, the next time it is scheduled, of the failure of the service provided by the other. The invoking partition must have enough fault tolerance that it can do something sensible despite the failure of the service. It may also be necessary for the kernel to perform some remedial action on the partition that overran its allocation. This could force that partition to do a restart next time it is scheduled, or could simply notify the partition of its failure and leave recovery (e.g., the killing of orphans) to the operating system functions resident in that partition.

Timeout mechanisms such as those just described ensure that each partition will get enough access to the CPU and other resources, but real-time systems need more than this: the tasks within partitions need to get access to the CPU and to devices and other resources at the right time and with predictability. This means that discussion of temporal partitioning cannot be divorced from consideration of scheduling issues. The real-time tasks within a partition generally consist of iterative tasks that must be run at some fixed frequency (e.g. 20 times a second) and sporadic tasks that run in response to some event or interrupt (e.g. when the pilot presses a button).

Iterative tasks often require tight bounds on jitter, meaning that they must sample sensors or deliver outputs to their actuators at very precise instants (e.g. within a millisecond of their deadline), and sporadic tasks often have tight bounds on latency, meaning that they must deliver an output within some short interval of the event that triggered them.

There are two basic ways to schedule a real-time system: statically or dynamically. In a static schedule, a list of tasks is executed cyclically at a fixed rate. Tasks that need to be executed at a faster rate are allocated multiple slots in the schedule. Even sporadic tasks are scheduled cyclically, to poll for input and process it if present.

The maximum execution time of each task is calculated, and sufficient time is allocated within the schedule to allow it to run to completion: thus, one task never interrupts execution of another, although a task may be terminated if it exceeds its allocation. Notice that this means that a long-duration task may need to be broken into several smaller pieces to make room for short tasks with higher iteration rates. The schedule is calculated during system development and is not changed at runtime, although it may be possible to select among a fixed collection of different schedules at runtime according to the current operating mode (at present, however, each schedule must be certified including all dependent systems, which makes such a process expensive).
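How faster tasks end up with multiple slots in a cyclic schedule can be shown with a short calculation. The sketch assumes that the schedule repeats after the least common multiple of the task periods (often called the major frame); the function name and the example periods are illustrative assumptions, and `math.lcm` with several arguments requires Python 3.9 or later.

```python
from math import lcm


def slots_per_major_frame(periods_ms):
    """Return (major_frame_ms, {task: number of slots per major frame})."""
    major_frame_ms = lcm(*periods_ms.values())
    slots = {task: major_frame_ms // period
             for task, period in periods_ms.items()}
    return major_frame_ms, slots


print(slots_per_major_frame({"fast": 25, "medium": 50, "slow": 100}))
```

The calculation only counts slots; placing them so that each task also meets its jitter bounds is the actual scheduling problem solved during system development.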

In a dynamic schedule, on the other hand, the choice and timing of which tasks to dispatch is decided at runtime. The usual approach allocates a fixed priority to each task, and the system always runs the highest-priority task that is ready for execution. If a high-priority task becomes ready (e.g., due to a timer or external interrupt) while a lower-priority task is running, the lower-priority task is interrupted and the high-priority task is allowed to run. Note that this requires a context-switching mechanism to save and later restore the state of the interrupted task. The challenge in dynamic scheduling is to allocate priorities to tasks in such a way that overall system behavior is predictable and all deadlines are satisfied. Originally, various plausible and ad-hoc schemes were tried, such as allocating priorities on the basis of “importance”, but the field is now dominated by rate monotonic scheduling (RMS). Under RMS, priorities are simply allocated on the basis of iteration rate, the highest priorities going to the tasks with the highest rates, and, under certain simplifying assumptions, it can be shown that all tasks will meet their deadlines as long as the utilization of the processor does not exceed 69%.
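The 69% figure is the limit, for many tasks, of the classical Liu and Layland utilization bound for RMS, U ≤ n(2^(1/n) − 1), which tends to ln 2 ≈ 0.693 as n grows. The following sketch evaluates that bound; the function names and the example task sets are illustrative assumptions. The test is sufficient but not necessary: a task set that exceeds the bound may still be schedulable and then needs a more exact analysis.

```python
import math


def rms_utilization_bound(task_count):
    """Liu and Layland bound n * (2**(1/n) - 1) for n periodic tasks."""
    return task_count * (2 ** (1 / task_count) - 1)


def passes_rms_bound(tasks):
    """tasks: list of (worst_case_execution_time, period) in the same unit.

    Returns (passes, utilization); passes is True when the sufficient
    utilization test guarantees that all deadlines are met under RMS.
    """
    utilization = sum(wcet / period for wcet, period in tasks)
    return utilization <= rms_utilization_bound(len(tasks)), utilization


print(rms_utilization_bound(2), math.log(2))
print(passes_rms_bound([(1, 4), (2, 8)]))
```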

The basic arguments in favor of static scheduling are its complete predictability and the simplicity of its implementation; the arguments against are that all tasks must run at a multiple of the basic iteration rate, so that some run more or less frequently than is ideal for their control function, the handling of sporadic tasks is wasteful, and long-running tasks must be broken into multiple, separately scheduled pieces to make room for tasks with faster iteration rates. The arguments in favor of dynamic scheduling are that it is more flexible and copes better with occasional task overruns; the arguments against hinge on the difficulty of giving complete assurance that a given task set will always meet its deadlines under all circumstances.

The mechanisms of both static and dynamic scheduling have to be modified to operate in a partitioned environment, and these modifications change some traditional expectations about the tradeoffs between the two approaches; in addition, partitioning creates opportunities for hybrid approaches that combine elements of both basic mechanisms. The traditional scheduling problem is to ensure satisfaction of all deadlines, given information about the rate and duration of the tasks concerned. It is assumed that this information is accurate; if it is not (if, for example, some task runs longer or requests service more often than expected), then the system may fail. When all the tasks in the system are contributing to some single application, such a failure may be undesirable but will not have repercussions beyond those consequent on the failure of the application concerned.

In a partitioned system, however, it is necessary to ensure that faulty assumptions about the temporal behavior of tasks belonging to one application cannot affect the temporal behavior of applications in different partitions.

There seem to be two ways to achieve this temporal partitioning: one is a two-level structure in which the kernel schedules partitions, with the application in each partition then responsible for locally scheduling its own tasks; the other is a single-level structure in which the kernel schedules tasks, but with a quota system to limit the consequences of any faults to the partition that is in violation.

The first approach usually employs static scheduling at the partition level: the kernel guarantees service to each partition for specified durations at a specified frequency (e.g., 20 ms every 100 ms), and the partitions then schedule their tasks within their individual allocations in any way they choose; in particular, partitions may use dynamic scheduling for their own tasks. Any partition that schedules its tasks dynamically must provide a mechanism for interrupting one task in favor of another. Such support for task swapping is one of the reasons for preferring dynamic over static scheduling: it simplifies application programming by allowing long-running, low-frequency tasks to be interrupted by shorter high-frequency tasks, whereas statically scheduled systems have to break long-running tasks into separately scheduled fragments that perform their own saving and restoration of local state data to create room for the higher-frequency tasks. If partition swapping uses the restoration model, however, it provides an alternative mechanism for dealing with long-running tasks within a statically scheduled environment: a single application can be divided into parts that are allocated to separate partitions that are scheduled at different rates. The partition-swapping mechanism then takes care of interrupting and restoring the long-running tasks, thereby simplifying their construction.
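A static partition schedule of the kind described (for instance 20 ms every 100 ms) amounts to a fixed window table that the kernel consults using only its clock. The sketch below assumes a table of (start offset, duration, partition) entries within a repeating frame; the window values and the function name are illustrative assumptions.

```python
def partition_at(time_ms, windows, major_frame_ms):
    """Return the partition that owns the CPU at time_ms, or None if idle.

    windows: list of (start_ms, duration_ms, partition), offsets relative
             to the start of the repeating major frame.
    """
    phase = time_ms % major_frame_ms
    for start_ms, duration_ms, partition in windows:
        if start_ms <= phase < start_ms + duration_ms:
            return partition
    return None


# Hypothetical table: partition A gets 20 ms every 100 ms.
WINDOWS = [(0, 20, "A"), (20, 50, "B"), (70, 30, "C")]
print(partition_at(120, WINDOWS, 100))
```

The lookup depends on nothing but the clock and the table, which is what makes the partition level fully predictable regardless of how the tasks inside each partition behave.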

A further convenience of dynamic scheduling is the ease with which it can accommodate aperiodic activities driven by external events, such as operator (e.g., pilot) inputs and device interrupts, and it requires care to support this on top of static partition scheduling, even when this is running at kilohertz rates.

The basic concern is that external events of interest to one partition must not disturb the temporal behavior of other partitions. If partitions are scheduled dynamically, use of suitable quota schemes can allow temporal predictability to coexist with aperiodic event-driven task activations; static partition scheduling, however, ensures predictability through temporal determinism, and this imposes strong restrictions on event-driven activations.

First and most obviously, a static partition schedule does not allow an external event to initiate a partition swap: the partition schedule is driven strictly by the processor's internal clock, so that if an event requires the services of a task in a partition other than the current one, it must wait until the next regularly scheduled activation of the partition concerned. This increases latency, but may not be a problem if partitions are scheduled at kilohertz rates. Less obvious, perhaps, are the consequences of the requirement that the currently executing partition should see no temporal impact from the arrival of events destined for other partitions.

Even the cost of a kernel activation to latch an interrupt for delivery to a later partition reduces availability of the CPU to the current partition and must be strictly controlled.

This concern is a manifestation of a more general issue: temporal partitioning requires not only that each partition has access to the resources of the system at guaranteed intervals, but that those resources provide their expected performance. A CPU whose performance is degraded by the cost of latching interrupts for later delivery is just one example; others include a memory subsystem degraded by DMA transfers on behalf of other partitions, or an I/O subsystem that is busy on their behalf.

Under static partition scheduling, temporal partitioning is predicated on determinism: because it is difficult to bound the behavior of faulty partitions, the availability and performance of each resource is ensured by guaranteeing that no other partition can initiate any activity that will compete with the partition scheduled to access the resource. This means that no CPU or memory cycles may be consumed other than at the behest of the software in the currently scheduled partition. Thus, in particular, there can be no servicing of device interrupts, nor cycle-stealing DMA transfers other than those initiated by the current partition. These requirements can be violated in two ways: a previously scheduled partition may have had some I/O activity pending when it was suspended, or the external environment may generate an interrupt spontaneously (e.g., to indicate that a button has been pressed).

Draconian measures seem necessary to prevent these sources of temporal uncertainty. External events either should not generate interrupts (the relevant partition should poll for the event instead), or it should be possible to defer handling them until the relevant partition is running (whether this is possible depends on the nature of the device and the interrupt, and on how selectively the CPU architecture allows interrupts to be masked). Similarly, interrupts due to pending I/O from a device commanded by a previous partition should be masked. If interrupts cannot be masked with sufficient selectivity, we could require the kernel to issue commands that quiet the devices of the previous partition as part of the process of suspending that partition and starting the next. Alternatively, if devices go quiet when un-commanded for some short time, the kernel could make the device registers unavailable (e.g., by changing the MMU table) during the final few milliseconds of each partition's schedule.

The restrictions just described as necessary to ensure that the temporal correctness of tasks in one partition is unaffected by software in other partitions have consequences for inter-partition communications. With static scheduling of partitions, a task that needs the services of software in another partition (e.g., to access a shared device) cannot simply issue a procedure call. In fact, there can be no synchronous services (i.e., where the caller blocks and waits for the service provider to reply) across partitions, because (a) one partition should not depend on another, which may be faulty, to unblock its progress, and (b) it would impose a large performance penalty: the caller would block at least until its next slot in the schedule after the service provider's slot.

Instead, all inter-partition communication must be asynchronous, where the caller places requests in the input buffers of tasks in other partitions and continues execution; when next activated, it looks in its own input buffers for replies, requests, and unsolicited data from other partitions. Because faulty software could generate an excessive number of requests for service by another partition, it seems necessary that fixed quotas should be imposed on the number or rate of service requests that will be honored from each partition.
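A fixed quota on service requests can be sketched as an input buffer that counts what it has accepted from each sender since the receiving partition last ran. The class name, the per-activation reset policy and the quota value are assumptions made for illustration; a rate-based quota would be an equally valid reading of the text.

```python
class QuotaInbox:
    """Asynchronous input buffer honoring at most `quota` requests per sender
    between two activations of the receiving partition."""

    def __init__(self, quota):
        self._quota = quota
        self._pending = []
        self._accepted = {}

    def post(self, sender, request):
        """Called on behalf of the sender; never blocks."""
        count = self._accepted.get(sender, 0)
        if count >= self._quota:
            return False              # excess requests are not honored
        self._accepted[sender] = count + 1
        self._pending.append((sender, request))
        return True

    def drain(self):
        """Called by the receiver when it is next activated."""
        items, self._pending = self._pending, []
        self._accepted.clear()
        return items


inbox = QuotaInbox(quota=2)
print([inbox.post("A", n) for n in range(3)], inbox.post("B", "read"))
print(inbox.drain())
```

A babbling sender A therefore cannot crowd out B: B's request is accepted even after A has exhausted its own quota.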

Some of the restrictions that are necessary when partitions are scheduled statically may possibly be relaxed when they are scheduled dynamically. It makes little sense to schedule partitions dynamically and tasks statically, and when both partitions and tasks are scheduled dynamically there is little point in maintaining two levels of scheduling, so the unit of scheduling will actually be the task. However, the concern for temporal partitioning will influence which tasks are eligible for execution.

Whereas static scheduling ensures temporal partitioning through strict preplanned determinism, dynamic scheduling relies on theorems from the mathematical study of, for example, RMS. There are two problems in applying this theory in the context of partitioning: one is that a faulty partition may violate the assumptions underlying the theorem concerned; the other is that the simplest theorems, which are therefore preferred for life-critical applications, make the strongest assumptions (e.g. that context switches take no time), whereas the many partition swaps entailed may have a more deleterious effect on the tasks of other partitions than the CPU time directly consumed by the faulty task. A plausible way to overcome this problem is to subtract the cost of a partition swap and the performance degradation caused by disturbing the caches from the quota of the task that causes it. Quotas managed in this way provide many of the guarantees of static scheduling while retaining some of the flexibility of dynamic scheduling. For example, such a scheme could allow synchronous as well as asynchronous inter-partition communications, together with the ability to service aperiodic events and interrupts. However, many of the restrictions and concerns discussed for static partition scheduling remain relevant for dynamic scheduling: for example, it still seems necessary to eliminate cycle-stealing DMA transfers and other performance-degrading activities that cannot easily be controlled by quotas, and it is also necessary to ensure that interrupts for a partition that has exceeded its quota are masked or latched at truly zero cost. Other potential sources of cross-partition interference, such as locks and semaphores, must also be suitably controlled.
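Charging swap costs to the task that causes them can be sketched as simple bookkeeping. The class name, the microsecond unit and the single fixed swap cost (standing in for both the swap itself and the cache disturbance) are assumptions made for illustration only.

```python
class TaskQuota:
    """Tracks the remaining CPU quota of one task."""

    def __init__(self, budget_us, swap_cost_us):
        self.remaining_us = budget_us
        self._swap_cost_us = swap_cost_us

    def charge_run(self, run_us):
        """Charge CPU time the task consumed directly."""
        self.remaining_us -= run_us

    def charge_swap(self):
        """Charge the cost of a partition swap triggered by this task."""
        self.remaining_us -= self._swap_cost_us

    @property
    def eligible(self):
        """A task with no quota left is not dispatched again."""
        return self.remaining_us > 0


quota = TaskQuota(budget_us=1000, swap_cost_us=100)
quota.charge_swap()
quota.charge_run(300)
print(quota.remaining_us, quota.eligible)
```

A task that provokes many swaps thus exhausts its quota even if it uses little CPU time itself, which limits the damage it can do to the other partitions.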

Quota-based dynamic scheduling may provide simple guarantees that the tasks of nonfaulty partitions receive their expected allocations (i.e. they receive enough time), but guarantees that they will hit their deadlines (i.e. they get it at the right time) are more problematic: there are, for example, scenarios under RMS where the early completion of one task causes another to miss its deadline.

Whether partitions and tasks are statically or dynamically scheduled, the kernel must collaborate with other software to provide some of the services of an operating system; at the very least it will be necessary to service interrupts. Under static partition scheduling, interrupts from external devices are allowed only when their partition is running; this means it is possible to vector interrupts directly to handlers in the partition, rather than handle them in the kernel. The advantage of the former arrangement is that it minimizes the complexity of the kernel; its difficulty is that interrupts are often vectored in supervisor mode, which can threaten hardware-mediated spatial partitioning. Compromise arrangements have the kernel fielding the hardware interrupt, but then passing it in a safe way to the partition for service.

Arguments against device handling in a partition are that this really is an operating system service that is better done by an operating system. A conventional operating system is unattractive in a partitioned environment because, as in Figure 1 (left-hand side), it is a large shared resource that must be shown to respect partitioning as well as to be free of other faults. A more suitable arrangement provides operating system services separately within each partition, as portrayed previously in Figure 1 (right-hand side). This arrangement has the additional benefit that different partitions can use different sets of operating system services: as shown in Figure 3, a critical function might use a minimal set of services (Partition C), while a less critical but more complex function might employ something close to a COTS operating system (Partition B), and a device management partition might consist largely of standardized operating system services for device management. Operating system services cannot affect basic partitioning in this arrangement.

Figure 3: Principle example of a partitioning system

2.1.2 Mechanisms: Partitioning within a distributed System

A distributed system raises new issues with respect to partitioning: if we accept that the partitioning mechanisms employed within individual processors are sound, then connecting several such systems together surely cannot do any harm…

This would be true if we could arrange dedicated physical point-to-point communications between partitions in different processors, but the only physical communications that can be provided are between processors. This limitation has a fairly significant impact, which is compounded when we consider shared communications, such as buses.

Thus, suppose we wish to communicate data from partition a1 of processor A to partition b1 in a different processor B, and that we have a suitable communications line from A to B. Interrupts will be generated at B as the data starts to arrive and, as we discovered in the previous section, some care is needed to ensure that these do not disturb temporal partitioning in B. If B is dynamically scheduled, the quota schemes discussed previously may be all that is needed, but matters can be more complicated when partitions are scheduled statically. Under static scheduling, we must require either that the interrupts can be latched at no cost until the scheduled execution of partition b1, or that partition b1 or some device management partition that handles the communications line is guaranteed to be executing when the interrupts arrive. The latter clearly requires synchronization between the partition schedules of processors A and B and, by extension to other processors, this implies global synchronization of schedules across all processors.

The only way to avoid these consequences when static partition scheduling is employed is to have a data concentrator device at B that buffers incoming data without imposing a load on the CPU or its buses. The partition b1 can then retrieve incoming data from the data concentrator as part of its normally scheduled activity. A more aggressive design would allow the data concentrator to write incoming data directly into buffers associated with each partition using dual-ported RAM.

Even these designs do not necessarily eliminate the need for global synchronization, however, because of the need to control “babbling idiot” failures in partitions and processors.

These are failures where a transmitter sends data constantly, possibly overwhelming its recipient, or denying service to other transmitters. One scenario would be a runaway in partition a1 that causes it to transmit to b1 throughout its scheduled execution. We need to be sure that this heavy load on the communications line from A does not affect the ability of the recipient B or its data concentrator to service its other lines. This requires either some kind of quota scheme at the recipient, or a global schedule that excludes simultaneous transmissions. A babbling partition can do so only during its scheduled execution, so a global schedule may be able to ensure that no two processors simultaneously schedule partitions that transmit to the same recipient. An alternative, if a1 does not drive the communications line directly but instead sends data to a device management partition, is for the management partition to impose a quota on the quantity of data that it will accept from any one partition. A babbling processor is an even more serious problem than a babbling partition; either the recipient must be able to tolerate the fault, or it must be prevented at the transmitter; mechanisms to do this are discussed in the context of bus communications.
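The global-schedule condition (no two processors simultaneously schedule partitions that transmit to the same recipient) can be checked mechanically at configuration time. The sketch below assumes each transmitting window is described by a tuple (processor, start, end, recipient) on a common time base; the tuple layout and the function name are illustrative assumptions.

```python
from itertools import combinations


def transmission_conflicts(windows):
    """Return pairs of windows from different processors that overlap in
    time while transmitting to the same recipient.

    windows: list of (processor, start_ms, end_ms, recipient), end exclusive.
    """
    conflicts = []
    for first, second in combinations(windows, 2):
        different_processor = first[0] != second[0]
        same_recipient = first[3] == second[3]
        overlap = first[1] < second[2] and second[1] < first[2]
        if different_processor and same_recipient and overlap:
            conflicts.append((first, second))
    return conflicts


schedule = [("A", 0, 20, "C"), ("B", 10, 30, "C"), ("D", 20, 40, "C")]
for pair in transmission_conflicts(schedule):
    print(pair)
```

An empty result means that even a babbling partition can only occupy the recipient during a window in which no other processor is scheduled to transmit to it.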

The measures discussed above address temporal partitioning in inter-processor communications; we also need to consider spatial partitioning. The spatial dimension to partitioning requires mechanisms to ensure that partition a1 of processor A can send data to partition b1 in a different processor B only if that communication is authorized.
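One simple way to picture such an authorization mechanism is a static table of permitted channels that is consulted before any transfer. The table layout and the function name `may_send` are hypothetical and serve only as an illustration of the idea.

```python
# Hypothetical static configuration: the only authorized
# ((processor, partition) -> (processor, partition)) channels.
AUTHORIZED_CHANNELS = frozenset({
    (("A", "a1"), ("B", "b1")),
})


def may_send(src, dst):
    """Return True only if the channel src -> dst is explicitly authorized."""
    return (src, dst) in AUTHORIZED_CHANNELS
```

With this table, `may_send(("A", "a1"), ("B", "b1"))` is permitted, while any channel not listed (including the reverse direction) is refused by default.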

2.2 System Considerations & Standards ARINC 653 and DO-178B 2

Several IMA system design challenges are posed by common hardware features such as interrupts, Direct Memory Access (DMA), and system Input/Output (I/O), as they may produce unpredictable distortions that violate the constraints of time and space partitioning.

Time and space partitioning must be deterministic in an ARINC 653-based IMA system in order to avoid aircraft certification concerns and incompatibilities with the standards and guidelines set forth in DO-178B. Methods for identifying, analyzing, and rectifying potential determinism issues in the IMA system are presented herein.

As described, robust time partitioning “must ensure that the service received from shared resources by the software in one partition cannot be affected by the software in another partition. This includes the performance of the resource concerned, as well as the rate, latency, jitter, and duration of scheduled access to it.”

Temporal partitioning in an IMA system is a zero-sum game; any utilization of the shared processing capability exceeding that allocated to a given partition occurs at the expense of the adjacent partition.

Time partitioning is realized in an ARINC 653-compliant Real Time Operating System (RTOS) by a strict scheduling mechanism that deterministically forces partition context switches in accordance with a set of predetermined configuration parameters.

The RTOS manages processing contexts by allowing each partition exclusive rights to execute only during its allocated time in the schedule. The scheduling mechanism in the RTOS typically achieves its temporal base and fidelity (the system “tick”) from a single system interrupt event driven by a hardware-based timer device.

Under normal operating conditions, software applications hosted in partitions execute as a process (or set of processes) in their own virtual memory space, during their own allocated time slice. Time slices are assigned system-wide in a perpetually repeating temporal pattern known as a major frame. Major frames are decomposed into a series of minor frames of deterministic duration, to which partitions are bound (or to which they are equal) to form a context of execution. For the following example, assume that the RTOS process model allows for the existence of multiple processes of varying priorities in a single partition.

Operation commences at the beginning of the major frame, where processing context is transferred to Partition A, since it is bound to the first minor frame.
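The repeating major frame can be sketched as a static table that maps any point in time to the partition owning the current minor frame. The partition names follow the example in the text; the durations and the helper name `active_partition` are assumptions chosen for illustration and do not come from ARINC 653 or from a particular RTOS.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical static schedule: (partition, minor-frame duration in ms), in order.
SCHEDULE = [("A", 20), ("B", 10), ("C", 20)]
MAJOR_FRAME_MS = sum(duration for _, duration in SCHEDULE)

# End time of each minor frame, measured from the start of the major frame.
_FRAME_ENDS = list(accumulate(duration for _, duration in SCHEDULE))


def active_partition(t_ms):
    """Return the partition that owns the processor at time t_ms."""
    offset = t_ms % MAJOR_FRAME_MS  # position inside the repeating major frame
    return SCHEDULE[bisect_right(_FRAME_ENDS, offset)][0]
```

Because the lookup depends only on the time offset and the fixed table, the answer is the same in every repetition of the major frame, which is the determinism the strict scheduling mechanism is meant to provide.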

During Partition A’s scheduled processing context, the Controls process executes within its own virtual memory space to perform the work of the Controls Application. Once the processing time of the running partition (here, Partition A) has expired, the RTOS saves Partition A’s processing state information and initiates a context switch that releases all resources used by Partition A in preparation for executing Partition B. The RTOS initializes the system resources needed by Partition B and the System Application becomes operational. The System Application function requires the System Management process to run to completion, after which inter-partition control will be relinquished to the Input/Output management process (assuming it has the current priority), so that it may execute in its entirety.

The RTOS performs the context switch: storing away the processing state information of, say, Partition C, releasing Partition C’s resources, initializing Partition A’s resources, and restoring Partition A’s previously saved processing state information. Partition A’s context begins such that processing resumes with the Controls Application in accordance with its process model, continuing from where it last stopped prior to the context switch to Partition B.

A system-wide perspective is necessary when designing the IMA solution in order to ensure robust time and space partitioning is preserved during normal operation. Certain hardware constructs and device software implementations may cause resource violations that disrupt robust time and space partitioning. In particular, Direct Memory Access (DMA) transfers, interrupt handling, and Input/Output (I/O) processing must be handled carefully in order to avoid temporal and spatial disruptions that undermine robust partitioning, as discussed in the following.

• Potential Issues with DMA Transfers in Robustly-Partitioned IMA Systems

In comparing standard memory transactions with DMA transactions, it should be evident that the transaction efficiency provided by DMA engines provides a powerful means of transferring large blocks of memory within the system; however, the power afforded by the DMA engine is not without potential ramifications to robust partitioning. Since the DMA engine is granted exclusive access to the memory bus to perform a block transfer, and since the memory bus is a resource that is shared by the entire IMA system, there exists an opportunity for one partition to deprive another partition of a critical resource it needs to execute if the shared resource is not made available to the subsequent requestor.

• DMA-Induced Temporal Violations

DMA-induced temporal violations occur whenever a DMA transfer is initiated by a partition that has less execution time remaining than the time required to complete the DMA transfer.
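The condition above suggests a guard that a driver could evaluate before starting a transfer. The following is only a sketch under simplifying assumptions: a constant bus throughput, a fixed setup cost, and hypothetical function and parameter names.

```python
def dma_transfer_time_us(nbytes, bytes_per_us):
    """Estimated transfer duration, assuming a constant bus throughput."""
    return nbytes / bytes_per_us


def dma_is_temporally_safe(nbytes, bytes_per_us, remaining_us, setup_us=0.0):
    """True if the transfer is expected to finish within the time
    the initiating partition still has left in its window."""
    needed_us = setup_us + dma_transfer_time_us(nbytes, bytes_per_us)
    return needed_us <= remaining_us
```

For example, with an assumed throughput of 100 bytes per microsecond and 50 microseconds left in the window, a 4096-byte transfer passes the check and an 8192-byte transfer does not. A real system would need a worst-case rather than an average transfer time.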

Figure 4: Robust partitioning impact by DMA-induced temporal violation

• Potential Issues with Interrupts in Robustly-Partitioned IMA Systems

In general, a processor interrupt is caused by an external signal that triggers an asynchronous event in the system: normal operation is suspended and processing is redirected to a piece of software that handles or services the interrupt event, provided the interrupt is enabled (that is, not masked).

When a processor interrupt is enabled and the interrupt event occurs, the processor must switch contexts to a space defined by the interrupt vector and a time specified by the execution duration of the interrupt handler software routine. As mentioned above, the only interrupt typically found in a robustly-partitioned IMA system is the system clock, and this is for good reason. The RTOS in the IMA system must be in complete control of temporal and spatial contexts in order to guarantee robust partitioning, and interrupts present the threat of injecting an unpredictable, independent context into the system that redirects normal operation. While interrupts may provide a convenient method of forcing the system to handle some high-priority external event, handling interrupts in the IMA system may cause distortions in the temporal domain that enervate robust partitioning. When context is restored, the time spent handling the interrupt is charged to the interrupted partition, since any readjustment of the partition’s original execution timeline may introduce an unacceptable amount of jitter into the system and violate the strict scheduling requirements given in ARINC 653. Thus, as a consequence of strict scheduling, if the global interrupt handler’s duration exceeds the execution time remaining in the interrupted partition, the excess execution time will be charged to the subsequent partition(s).
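The accounting rule in the last two sentences can be written down as simple arithmetic. The function name `charge_interrupt` and the microsecond units are hypothetical; the sketch only restates the rule, it does not model an actual RTOS.

```python
def charge_interrupt(remaining_us, handler_us):
    """Charge an interrupt handler's run time under strict scheduling.

    Returns a pair:
      - time still left to the interrupted partition,
      - overrun that is taken from the subsequent partition(s).
    """
    charged_here = min(remaining_us, handler_us)
    overrun_us = handler_us - charged_here
    return remaining_us - charged_here, overrun_us
```

A 30 microsecond handler arriving with 100 microseconds left costs only the interrupted partition; the same handler arriving with 20 microseconds left leaves that partition with nothing and takes 10 microseconds from the next one, which is exactly the temporal violation the text warns about.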

• Interrupt-Induced Temporal Violations

Interrupt-induced temporal violations occur whenever an interrupt other than the system clock interrupt occurs during the normal execution schedule of the system. Interrupts could be caused by external events such as I/O operations transpiring or perhaps might even be caused by activity directly initiated by the partition application. Externally-caused interrupt activity is especially problematic for the IMA system since there may be no method to completely analyze or characterize the activity unless it is periodic in nature.

• Potential Issues with I/O in Robustly-Partitioned IMA Systems

I/O in a partitioned IMA system presents the developer with one of their most challenging aspects of design. While ARINC 653 clearly defines operations and interfaces for inter-partition I/O via ARINC sampling or queuing ports, I/O to physical devices or inter-module I/O are left to RTOS implementers and other stakeholders to provide.

As discussed, use of interrupts can cause temporal violations in the IMA system due to their asynchronous nature. The removal of asynchronous behavior from the system tends to drive I/O solutions towards either polling-mode software operations or towards a hardware-based design.

The polling-mode solution clearly introduces a limit to the bandwidth of the I/O data that may be processed and also increases the latency of response; therefore, the solution must be analyzed for acceptable performance.

Ideally, the architecture of the I/O solution would isolate any I/O access to a single partition-based I/O implementation in order to afford the maximum benefit of partitioning. Such an I/O architecture defines the “classical” I/O partition model that is usually the first choice when confronted with a partitioned system. The primary limitation of the classical I/O partition solution is that the I/O bandwidth will be limited by the frequency at which the I/O partition is scheduled in the major frame. Depending on the data rate required, the classical I/O partition model may not be acceptable since the necessary periodicity of the I/O partition may be too high to be accommodated by the overall system schedule.
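The bandwidth limit of the classical I/O partition model can be estimated with a short calculation. The sketch assumes, purely for illustration, that the I/O partition moves at most one buffer of fixed size per activation; the function names and parameters are hypothetical.

```python
import math


def io_partition_throughput(buffer_bytes, activations_per_major_frame, major_frame_ms):
    """Upper bound on I/O throughput in bytes per second, assuming one
    buffer of buffer_bytes is transferred per activation."""
    return buffer_bytes * activations_per_major_frame * 1000 / major_frame_ms


def activations_needed(required_bytes_per_s, buffer_bytes, major_frame_ms):
    """Minimum number of I/O partition activations per major frame
    needed to sustain the required data rate."""
    return math.ceil(required_bytes_per_s * major_frame_ms / (1000 * buffer_bytes))
```

With a 512-byte buffer and a 50 ms major frame, two activations per frame sustain at most 20480 bytes per second, and a required rate of 100000 bytes per second would need ten activations per frame. Whether the overall schedule can accommodate that many I/O windows is the question raised in the paragraph above.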

• I/O-Induced Temporal Violations

I/O-induced temporal violations can occur for a number of reasons (as previously discussed), and in general, such violations are typically remedied either by avoiding potentially temporally unsafe operations or by designing sufficient intelligence into the drivers to prevent them from occurring.

From a performance standpoint, it is desirable to transfer data to driver memory space and allow the application to continue operating, with the expectation of a process subsequently transferring the data from driver memory space to the device.

Improper device operation can also occur if the device has some form of FIFO (First In, First Out) mechanism that would allow the data to decay since the FIFO memory is not refreshed periodically in the hardware.

Designing the system to allow the support process to run only during idle time is an acceptable practice; it does not violate the temporal partitioning or the potential certifiability of the system as long as the process is preemptable and is of a low enough priority that it does not affect the temporal schedule imposed by ARINC 653.

The following chapters are built on IMA-guaranteed time & space partitioning, fully exploiting IMA potential.

2.3 The Must (sometime) 1: IMA driven Modular Certification 3

Airplanes are certified as a whole: there is no established basis for separately certifying some components, particularly software-intensive ones, independently of their specific application in a given airplane.

The absence of separate certification inhibits the development of modular components that could be largely "precertified" and used in several different contexts within a single airplane, or across many different airplanes.

This section examines the issues in modular certification of software components and proposes an approach based on assume-guarantee reasoning. The method is extended from verification to certification by considering behavior in the presence of failures. This exposes the need for partitioning, and for separation of assumptions and guarantees into normal and abnormal cases. Three classes of property that must be verified within this framework are then identified: safe function, true guarantees, and controlled failure.

The federated architecture is expensive (because of the duplication of resources) and limited in the functionality that it can provide (because of the lack of interaction among different functions). There is therefore a move toward integrated modular avionics (IMA) architectures in which several functions share a common (fault tolerant) computing resource, and operate in a more integrated (i.e., mutually interactive) manner.

A similar transition is occurring in the lower-level "control" functions (such as engine and auxiliary power unit (APU) control, cabin pressurization), where Honeywell has developed a modular aerospace controls (MAC) architecture. IMA and MAC architectures not only allow previously separate functions to be integrated, they allow individual functions to be "deconstructed" into smaller components that can be reused across different applications and that can be developed and certified to different criticality levels.

Certification costs are a significant element in aerospace engineering, so full realization of the benefits of the IMA and MAC approach depends on modularization and reuse of certification arguments. However, there is currently no provision for separate or modular certification of components: an airplane is certified as a whole. Of course, the certification argument concerning similar components and applications is likely to proceed similarly across different aircraft, so there is informal reuse of argument and evidence, but this is not the same as a modular argument, with defined interfaces between the arguments for the components and the whole.

The basic idea that we wish to examine is portrayed in Figure 5; here X represents some function, and Y the rest of the aircraft. In the traditional method of certification, shown on the right, certification considers X and Y as an indivisible whole; in modular certification, shown on the left, the idea is to certify the whole by somehow integrating (suggested by the symbol +) properties of X considered in isolation with properties of Y considered in isolation.

Many benefits would accrue if such a process were feasible, especially if the function were reused in several different aircraft, or if there were several suppliers of X-like functions for a single aircraft.

The problem is that conventional designs, and the notion of an interface, are concerned with normal operation, whereas much of the consideration that goes into certification concerns abnormal operation, and the malfunction of components. More particularly, it concerns the hazards that one component may pose to the larger system, and these may not respect the interfaces that define the boundaries between components in normal operation.

It seems that the potential hazards between an aircraft and its functions are sufficiently rich that it is not really feasible to consider them in isolation: hazards are not included in the conventional notion of interface, and we have to consider the system as a whole for certification purposes. This is a compelling argument; it demonstrates that modular certification, construed in its most general form, is infeasible. To develop an approach that is feasible, we must focus our aims more narrowly. The main concern is software, so we might try to develop a suitable approach by supposing that the X in Figure 5 is the software for some function that is part of Y (e.g., X is software that controls the thrust reversers).

However, it is easy to see that this interpretation is completely unworkable: how could we possibly certify control software separately from the function that it controls? We therefore need to focus our interpretation even more narrowly. The essential idea of IMA and MAC architectures is that they allow software for different functions or sub-functions to interact and to share computational and communications resources: the different functions are (separately) certified with the aircraft; what is new is that we want to conclude that they can be certified to operate together in an IMA or MAC environment (Figure 6).

Figure 5: Modular versus traditional certification

Figure 6: Multi module modular certification

Figure 7 shows this approach, where A(X1) and A(X2) represent assumptions about X1 and X2, respectively, and the dotted lines are intended to indicate that we perform certification of X1, for example, in the context of Y, and assumptions about X2.

As mentioned, assume-guarantee reasoning is known and used in computer science, but it is used for verification, not certification. Verification is concerned with showing that things work correctly, whereas certification is also concerned with showing that they cannot go badly wrong even when other things are going wrong. This means that the assumptions about X2 that are used in certifying X1 must include assumptions about the way X2 behaves when it has failed! This is not such an improbable approach as it may seem: it corresponds to the way avionics functions are actually designed. Avionics functions are designed to be fault tolerant and fail safe; this means, for example, that the thrust reverser may normally use sensor data supplied by the engine controller, but that it has some way of checking the integrity and recency of that data and will do something safe if that data source ceases, or becomes corrupt. In the worst case, we may be able to establish that one function behaves in a safe way in the absence of any assumptions about other functions (but it behaves in more desirable ways when some assumptions are true). There are applications and algorithms that can indeed operate under such worst case assumptions (these are called "Byzantine fault-tolerant" algorithms).

This analysis suggests that we can adapt assume-guarantee reasoning to the needs of certification by breaking the various assumptions and guarantees into normal and (possibly several) abnormal elements. We then establish that X1 delivers its normal guarantee, assuming that X2 does the same (and vice versa), and similarly for the various abnormal assumptions and guarantees. It will be desirable to establish that the abnormal assumptions and guarantees do not have a "domino effect": that is, if X1 suffers a failure that causes its behavior to revert from guarantee G(X1) to G'(X1), we may expect that X2's behavior will revert from G(X2) to G'(X2), but we do not want the lowering of X2's guarantee to cause a further regression of X1 from G'(X1) to G''(X1) and so on. In general, there will be more than just two components,

The elements that together create the possibility of modular certification for software are now identified:

• Partitioning: protects the computational and communications environment perceived by nonfaulty components: faulty components cannot affect the computations performed by nonfaulty components, nor their ability to communicate, nor the services they provide and use. The only way a faulty component can affect nonfaulty ones is by supplying faulty data, or by performing its function incorrectly. Partitioning is achieved by architectural means: in IMA and MAC architectures it is the responsibility of the underlying bus architecture, which must be certified to construct and enforce this property, subject to a specified fault hypothesis.

Figure 7: Assume-guarantee modular certification

• Assume-guarantee reasoning: allows properties of one component to be established on the basis of assumptions about the properties of others. The precise way in which this is done requires care, as the reasoning is circular and potentially unsound.

• Separation of properties into normal and abnormal: allows assume-guarantee reasoning to be extended from verification to certification. The abnormal cases allow us to reason about the behavior of a component when components with which it interacts fail in some way. This is to say that a component is subject to an external failure when some component with which it interacts no longer delivers its normal guarantee; it suffers an internal failure when one of its own subcomponents fails. Its abnormal assumptions record the external fault hypothesis for a component; its internal fault hypothesis is a specification of the kinds, numbers, and arrival rates of possible internal failures.

Certification of an individual component must establish the following three classes of properties:

• Safe function: under all combinations of faults consistent with its external and internal fault hypotheses, the component must be shown to perform its function safely (e.g., if it is an engine controller, it must control the engine safely).

• True guarantees: under all combinations of faults consistent with its external and internal fault hypotheses, the component must be shown to satisfy one or more of its normal or abnormal guarantees.

• Controlled failure: avoids the domino effect. Normal guarantees are at level 0, abnormal guarantees are assigned to levels greater than zero. Internal faults are also allocated to severity levels in a similar manner. We must show that if a component has internal faults at severity level i, and if every component with which it interacts delivers guarantees on level i or better (i.e., numerically lower), then the component delivers a guarantee of level i or better. Notice that the requirement for true guarantees can be subsumed within that for controlled failure.
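The controlled-failure condition is an implication over guarantee levels, and it can be written as a small predicate. The sketch below checks one observed situation only; it is not a proof over all fault combinations, and the function and parameter names are hypothetical.

```python
def controlled_failure_holds(internal_severity, neighbor_levels, delivered_level):
    """Check the controlled-failure implication for one scenario.

    Levels are integers: 0 is the normal guarantee, larger numbers are worse.
    If every interacting component delivers level <= internal_severity,
    the component itself must deliver level <= internal_severity.
    """
    premise = all(level <= internal_severity for level in neighbor_levels)
    return (not premise) or delivered_level <= internal_severity
```

For a component with internal faults at severity 1 whose neighbors deliver levels 0 and 1, delivering level 1 satisfies the condition and delivering level 2 violates it (a domino step). If a neighbor is already at level 2, the premise does not hold and the condition places no constraint on this scenario.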

2.4 The Must (some day) 2: IMA driven In-Time Certification 4

In-time certification is a goal-based approach in which unconditional claims delivered by formal methods are combined with other evidence in multi-legged cases supported by Bayesian analysis. This section also considers

• the necessity of extending this to compositional certification and

• the possibility of adaptive systems in which methods of analysis traditionally used to support certification at design time are instead used for synthesis and monitoring at runtime and certification is performed “just-in-time.”

Certification is a judgment based on a body of material that consists of three elements: claims, evidence, and argument. The claims identify the adverse consequences to be considered and the degree of risk considered tolerable; evidence comprises the results of analyses, reviews, and tests; and the argument makes the case, based on the evidence, that the claims are satisfied.

The traditional approach to software certification may be called “standards based” and largely requires the applicant to follow prescribed processes (e.g., DO-178B for airborne software, the Common Criteria for computer security, or IEC-61508 for programmable devices) and to develop specified evidence (e.g., MC/DC tests for DO-178B Level A).

This section discusses a goal-based approach to software certification that builds on advances in the power of automated formal methods of analysis.

Certification is applied to complete systems, with scrutiny penetrating down into subsystems. But a transition to compositional certification of systems based on separately certified components has become urgently desirable. Beyond that, there are adaptive systems that configure, assemble, or even synthesize their behavior at runtime, which raises the provocative possibility that certification, too, could partly be performed “just-in-time”.

The conceptual basis for all methods of certification is similar in principle to formal verification: for certification it is necessary to anticipate all possible circumstances that can arise in the interaction of the system with its environment and to show that none of them poses unacceptable risks of adverse consequences; in formal verification, all reachable states of the system in interaction with its environment are considered, and it must be shown that none of them violates desired invariants (i.e., properties specified over the state variables of the system and its environment). The spaces of “all possible circumstances” or “all reachable states” are vast, if not infinite, and so abstraction or approximation is employed to group similar circumstances or states together so that only a feasibly finite number of cases need be considered.

Safety analysis methods are mostly applied by hand to the design and environment of the system as specified in documents describing its requirements, specifications, and assumptions. These descriptions are mostly informal, but as industry practice moves toward model-based development, so they become increasingly formal and it is feasible that the methods of analysis and abstraction employed in certification can be formalized also, and assisted by automated tools.

The design documents subjected to methods such as hazard analysis are generally quite high level, so it is necessary to be sure that the lower levels of design and implementation do not introduce new risks. In traditional certification practice, this is done by requiring conservative design practices (e.g., no dynamic scheduling) and extensive processes of review and evaluation to show that the detailed design and implementation exactly matches its specification. It is challenging to demonstrate exact compliance between an implementation and its specification, so it is common to require several forms of assurance: for example, conservative design practices, plus reviews, plus testing.

Certification is concerned with risk, which is understood as a combination of the severity of adverse outcome and its likelihood, and most certification regimes require an inverse relationship between these two measures (!!!). Likelihood of an adverse outcome is naturally expressed and analyzed in terms of probabilities (understood as frequencies of occurrence in the long run), so it is standard to set claims such as 10^-9 per flight hour for the probability of catastrophic failure in an airborne system.
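To give a feeling for what a claim such as 10^-9 per flight hour means, the following sketch computes the probability of at least one failure over an accumulated exposure. It assumes a constant failure rate and independent hours (a Poisson model), which is an assumption of this example and not a statement from the certification standards; the function name is hypothetical.

```python
import math


def prob_at_least_one_failure(rate_per_hour, hours):
    """P(at least one failure) = 1 - exp(-rate * hours),
    assuming a constant failure rate (Poisson model)."""
    return -math.expm1(-rate_per_hour * hours)
```

Under this model, 10^-9 per flight hour over an assumed 10^7 accumulated flight hours gives a probability of roughly 1 percent of seeing at least one such failure.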

There are actually two kinds of uncertainty in certification: one is the likelihood of adverse outcomes (e.g., the probability of failure on demand), the other concerns the efficacy of the assurance process intended to guarantee the required likelihood. Discussion and analysis of this second concern requires some mathematical framework for reasoning about uncertainty in human judgments.

Standards-based approaches to certification generally distinguish different levels of criticality (e.g., Levels A to E in DO-178B, Evaluation Assurance Levels (EALs) in the Common Criteria, and Safety Integrity Levels (SILs) in IEC-61508) and require different kinds and amounts of evidence for the different levels.

Different levels of criticality might identify, say, 10^-9, 10^-7, and 10^-5 as tolerable probabilities of adverse outcomes, but do not generally state goals for confidence in the assurance process itself.

Under DO-178B, for example, Level A software (the most critical) requires more testing evidence (MC/DC coverage) than does Level B. However, when static analysis was applied to a variety of Level A and Level B avionics software, significant numbers of anomalies were found and there was no discernible difference in anomaly rates between the two levels.

Thus diversity is the idea, or hope, that different methods (of design, implementation, or analysis) will fail independently and hence their combination should give a multiplicative increase in reliability or confidence. Independence of failures in multiple implementations (as assumed in n-version programming) is viewed skeptically and its employment in assurance cases should raise similar concerns. A more principled consideration of multiple forms of evidence uses Bayesian Belief Nets (BBNs), which explicitly represent dependence among different items of evidence, to evaluate what are called “multi-legged” assurance cases. Bayes theorem is the principal tool for analyzing subjective probabilities: it allows a prior assessment of probability to be updated by new evidence to yield a rational posterior probability; BBNs allow this computation to be extended to complex models.
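The single-step Bayesian update mentioned above can be illustrated with a few lines of arithmetic. The numbers in the usage note are invented for illustration, and the function name is hypothetical; a BBN generalizes this step to many interdependent items of evidence.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability that a claim is true after observing evidence.

    prior               -- P(claim) before the evidence
    p_evidence_if_true  -- P(evidence | claim true)
    p_evidence_if_false -- P(evidence | claim false)
    """
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1.0 - prior))
```

Starting from a prior of 0.5, evidence that is nine times more likely when the claim is true (0.9 versus 0.1) raises the posterior to 0.9. Feeding that posterior back in as the prior for a second leg of evidence is only valid if the two legs are conditionally independent, which is exactly the assumption the text advises treating with skepticism.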

Some forms of evidence confer unconditional claims: for example, suitable static analysis delivers evidence for unconditional absence of run-time exceptions or floating point over/underflow. The static analysis may be flawed, so its evidence is contingent, but the claim that it supports is unconditional.

Modern highly automated formal methods, which include static analysis, can provide evidence for many unconditional properties (possibly contingent on assumptions about other properties).

Multi-legged arguments based on this kind of evidence “add up” easily: they deliver the conjunction of their individual properties as an unconditional claim. This conjoined claim may then be combined with other forms of evidence, such as testing and reviews.

Currently, hazard analysis and other safety assurance techniques are applied to relatively high level and informal specifications because they are performed by hand. Modern formal methods could allow certain strong claims to be analyzed directly on low-level specifications.

2.4.1 Only with IMA: Compositional Certification

The FAA certifies only airplanes, engines, and propellers; there is no provision for certifying a software component such as an operating system separately from the certification of a specific airplane in which it is used. Recent advisory circulars on reusable software components and guidelines on integrated modular avionics (IMA) make some provision for taking the assurance case for a software component from one airplane certification into another, but they fall a long way short of endorsing a compositional approach to certification in which components can be separately certified and systems using these do not need to reexamine their content.

The reason why the FAA and other certification authorities consider only whole systems is that safety, for example, is a system property, so the system must be considered in its entirety. This point is, of course, true, but does not explain why system certification could not be largely a compositional argument based on separately certified components.

• Compositional verification of correctness assumes the integrity of the analyzed components. Thus, in a system composed of a sender and a receiver, verification of the sender will be performed on the basis of assumptions about the receiver. These assumptions will include failure modes, such as the possibility that the receiver may drop or duplicate messages, or even that it will respond unpredictably. But the assumptions will not include the possibility that a failure in the receiver will change the program executed by the sender, modify data in its private memory, or prevent it from running at all. Yet if both the sender and the receiver share a processor with inadequate isolation between processes, then it is quite possible for a malfunctioning receiver to write into the program or data memory of the sender, or to monopolize the CPU. Thus, certification can make use of compositional verification only if mechanisms are present that guarantee the integrity of the verified components and their interfaces.

• The property that must be guaranteed by these mechanisms is called “robust partitioning” in avionics. The mechanisms themselves include operating system kernels and distributed communication systems; their construction and certification is a specialized activity that can be performed independently of the application systems that they support (see, e.g., the protection profile for separation kernels and the consideration of safety-critical bus architectures). Partitioning mechanisms must provide composability for application software and must themselves be additively compositional. Composability means that properties of subsystems are preserved under composition. Thus, the properties of an application subsystem are unchanged when it is composed with a partitioning mechanism. More subtly, if several subsystems are composed with a partitioning mechanism, then composability ensures that no subsystem can interfere with the properties of another. Hence, composability means that properties of subsystems are both preserved and guaranteed by partitioning. Thus partitioning creates an environment in which application subsystems cannot interfere with one another but can still cooperate; e.g., an air data subsystem can provide airspeed and other sensor data to an autopilot. Partitioning guarantees the integrity of these subsystems, so it is now possible to reason about their composition using computer science methods for compositional verification.

The step from compositional verification to compositional certification must then also account for the possibility of interaction through the controlled plant.

2.4.2 Only with IMA: the next step: Synthesis and a further step: Just-In-Time Certification

The main drivers for revisiting the way certification is performed are changes in the time at which a system’s configuration and behavior are finally determined. Traditionally, critical systems were developed and assembled as bespoke artifacts and certified as a definitive unit. Compositional certification is a refinement to this approach, which recognizes that systems are now assembled from components that will be used in many different systems; hence, it is attractive to pre-certify the components so that system certification then becomes focused on their integration and the specific configuration and unique attributes of the final system.

But, in the next step, the final configuration of many systems is now determined later than the time of design and certification: system instances may configure themselves at installation or load time, and may even reconfigure themselves at runtime.

If the system had only a single configuration, then certification would consider that configuration specifically, directly checking its attributes against requirements. And this is the opportunity: if one could mechanize this check and transfer it to load time or runtime, it would retain many of the characteristics of certification; in particular, it would be performed against requirements or a model, and it would need to consider only the single configuration that is about to be installed. The result would be “just-in-time certification.”
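A mechanized load-time check of a single configuration could look like the following Python sketch. It is a simplified illustration under stated assumptions, not a method from the source: the `Partition` record, its field names, and the example values are hypothetical, and the check assumes a single processor with a fixed major frame, so that partition time windows and memory regions must not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    mem_start: int     # first address of the partition's memory region
    mem_size: int      # size of the region in bytes
    win_start_ms: int  # start of the time window within the major frame
    win_len_ms: int    # length of the time window


def overlaps(a_start, a_len, b_start, b_len):
    """True if the half-open intervals [start, start + len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def check_configuration(partitions, major_frame_ms):
    """Return a list of violations; an empty list means the check passes."""
    violations = []
    for p in partitions:
        if p.mem_size <= 0 or p.win_len_ms <= 0:
            violations.append(f"{p.name}: empty memory region or time window")
        if p.win_start_ms < 0 or p.win_start_ms + p.win_len_ms > major_frame_ms:
            violations.append(f"{p.name}: time window exceeds major frame")
    for i, a in enumerate(partitions):
        for b in partitions[i + 1:]:
            if overlaps(a.mem_start, a.mem_size, b.mem_start, b.mem_size):
                violations.append(f"{a.name}/{b.name}: memory regions overlap")
            if overlaps(a.win_start_ms, a.win_len_ms, b.win_start_ms, b.win_len_ms):
                violations.append(f"{a.name}/{b.name}: time windows overlap")
    return violations


# Hypothetical configurations checked just before loading.
good = [Partition("air_data", 0x1000, 0x1000, 0, 20),
        Partition("autopilot", 0x2000, 0x1000, 20, 30)]
bad = good + [Partition("maintenance", 0x2800, 0x0800, 45, 10)]

print(check_configuration(good, major_frame_ms=50))
print(check_configuration(bad, major_frame_ms=50))
```

The check only has to consider the one configuration that is about to be installed, which is what makes it cheap enough to run at load time.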

A design-time certification approach might formally verify that a component follows a given protocol, while a just-in-time approach could generate a runtime monitor that blocks any interactions that violate the protocol. The runtime monitor would be synthesized from the model that specifies the protocol, using techniques very similar to, and as trustworthy as, those used in formal verification.
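As a minimal sketch of this idea, the following Python code derives a monitor from a protocol model given as a finite-state machine. The protocol (`PROTOCOL`), the event names, and the function `synthesize_monitor` are hypothetical; real runtime-verification tools work from much richer specifications.

```python
# Protocol model: state -> {permitted event -> successor state}.
# Hypothetical request/acknowledge protocol used only for illustration.
PROTOCOL = {
    "idle": {"request": "waiting"},
    "waiting": {"ack": "idle", "nack": "idle"},
}


def synthesize_monitor(protocol, initial):
    """Build a monitor function from the protocol model.

    The returned function takes an event, returns True and advances the
    monitor state if the model permits the event, and returns False
    (the interaction is blocked, the state is unchanged) otherwise.
    """
    state = initial

    def monitor(event):
        nonlocal state
        successor = protocol.get(state, {}).get(event)
        if successor is None:
            return False  # violation: block the interaction
        state = successor
        return True

    return monitor


monitor = synthesize_monitor(PROTOCOL, "idle")
trace = ["request", "request", "ack", "ack", "request", "nack"]
verdicts = [monitor(event) for event in trace]
print(verdicts)  # [True, False, True, False, True, True]
```

The second `request` and the second `ack` are blocked because the model does not permit them in the current state; all other events pass through.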

Synthesis of monitors and checkers is widely researched and practiced under the name runtime verification (www.runtime-verification.org) and should be relatively uncontroversial. However, the usual interpretation has the monitor synthesized at design time and applied at runtime.

One can instead imagine adaptive systems where component compositions are created dynamically. In traditional design-time composition, each component has built-in knowledge about the components with which it interacts (e.g., the algorithm of the sender in a communication protocol is designed, and certainly verified, using knowledge about the expected behavior of the receiver).

For runtime composition, this knowledge must be acquired dynamically, and so the proposal is that components make explicit models of their behavior and of their requirements for safe operation available to other components. This idea, that components publish a model of their capabilities and that these models can be used to check or synthesize component interactions, is already found in restricted form in interface automata.
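A heavily simplified sketch of checking component interactions from published models follows. It is far weaker than interface automata and serves only to illustrate the principle; the `PublishedModel` record, the message names, and the rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PublishedModel:
    name: str
    sends: dict    # message type -> maximum send rate in Hz
    accepts: dict  # message type -> maximum tolerated rate in Hz


def incompatibilities(producer, consumer):
    """List the reasons why the producer's output violates the consumer's model."""
    problems = []
    for message, rate in producer.sends.items():
        if message not in consumer.accepts:
            problems.append(f"{consumer.name} does not accept {message}")
        elif rate > consumer.accepts[message]:
            problems.append(
                f"{producer.name} sends {message} at {rate} Hz, "
                f"{consumer.name} tolerates at most {consumer.accepts[message]} Hz")
    return problems


# Hypothetical published models.
air_data = PublishedModel("air_data", sends={"airspeed": 50}, accepts={})
autopilot = PublishedModel("autopilot", sends={"elevator_cmd": 100},
                           accepts={"airspeed": 100})
legacy_display = PublishedModel("legacy_display", sends={},
                                accepts={"airspeed": 10})

print(incompatibilities(air_data, autopilot))       # compatible: empty list
print(incompatibilities(air_data, legacy_display))  # rate too high
```

Because each component carries its own model, the check can be run when a composition is formed at runtime rather than being fixed at design time.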

Beyond runtime synthesis of monitors lies synthesis of the component interactions themselves. Service Oriented Architecture (SOA) is an approach to system composition that already does this: components are assembled (using deductive methods) to achieve specified goals based on their published service descriptions. Current realizations of the approach use weak models but could evolve, using richer forms of logic and deduction, into a certifiable form of self-assembly.

Beyond adaptation lies full behavioral synthesis. In this scenario, the environment of a component is represented by the sum of the models published by the components with which it interacts; each component then strives to discharge its claims, while avoiding behaviors that lead to unacceptable outcomes. Here, component models will include attributes of their controlled plant and will typically be hybrid systems. Calculation of a winning strategy requires a search over a large space. Hence, the computational cost of solving a synthesis problem is formidable, but is made feasible by advances in mechanized deduction, notably SMT solvers, and the power of modern processors.
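The notion of a winning strategy can be illustrated on a small finite safety game. The sketch below is a toy under stated assumptions: it uses explicit state enumeration rather than SMT solving or hybrid-system models, and the states, actions, and the function `synthesize_safe_strategy` are hypothetical. The controller chooses an action, the environment chooses which of the possible successor states occurs, and the controller wins if unacceptable states are never reached.

```python
def synthesize_safe_strategy(transitions, bad):
    """Compute the winning region and a strategy for a finite safety game.

    transitions: state -> {action -> set of possible successor states}
    bad: set of unacceptable states
    Returns (winning, strategy): the states from which the controller can
    avoid `bad` forever, and one safe action for each such state.
    """
    winning = {s for s in transitions if s not in bad}
    changed = True
    while changed:  # greatest fixed point: repeatedly remove losing states
        changed = False
        for state in list(winning):
            has_safe_action = any(
                succs and succs <= winning
                for succs in transitions[state].values())
            if not has_safe_action:
                winning.discard(state)
                changed = True
    strategy = {
        state: next(action
                    for action, succs in sorted(transitions[state].items())
                    if succs and succs <= winning)
        for state in sorted(winning)
    }
    return winning, strategy


# Hypothetical flight-control toy model.
transitions = {
    "cruise": {"hold": {"cruise", "gust"}, "dive": {"low"}},
    "gust": {"hold": {"stall"}, "correct": {"cruise"}},
    "low": {"climb": {"cruise", "stall"}, "hold": {"low", "stall"}},
    "stall": {},
}
winning, strategy = synthesize_safe_strategy(transitions, bad={"stall"})
print(winning, strategy)
```

In this toy model the state `low` is losing, because every action available there may lead to `stall`; the synthesized strategy therefore never chooses `dive` in `cruise`.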

Notice that tools, or techniques, formerly used for verification, such as model checkers, are here being used for synthesis and monitoring. Certification can build on this: we trust these techniques in the analysis of safety at design time, so why not trust them to synthesize and/or monitor safety at runtime? In fact, runtime methods could be more credible than their design-time ancestors: at design time we must anticipate all possible future states of the system, whereas in controller synthesis we need only consider those states reachable from the current state, and possibly only so far into the future. We could imagine the synthesis generating a certificate, rather as some theorem proving techniques can generate an independently checkable proof object. Such a certificate would truly be just-in-time certification.

2.5 The Must (right now) 3: Modular Certification – CMU SEI PACC and IMA 5

Predictability by Construction:

Building high-stakes systems from certified software components

or

Predictable Assembly from Certifiable Components (PACC)

The goal is to quickly assemble new and innovative software-intensive systems for high-stakes applications from certified, trusted software components, with timing estimates made at design time that are routinely within the required tolerance with 99.99% confidence or better.

In this vision, critical safety and security properties are formally verified, and firm control is obtained over the quality of software components developed by suppliers from around the world. These capabilities are a potent differentiator in an increasingly competitive industry. Engineering predictability beyond testing means using software architecture to predictably satisfy the quality requirements of software systems. PACC takes architectural predictability to the extreme ranges of rigor and objective confidence, and incorporates architecture design directly into the basic units of software construction. The approach is to develop analytic theories that predict the behavior of entire classes of systems. In this approach, the emphasis is on confirming the validity of theories. Once a theory is confirmed, one can be sure that all systems satisfying its assumptions will have behavior that is predictable in that theory.

Zones of predictability:

PACC divides the world into systems whose behaviors are predictable by analytic means and those whose behaviors are not (Figure 8, Figure 9). Analytically predictable systems are said to lie within the set, or zone, of predictable assemblies (systems assembled from components).

The behavior of assemblies that are in the zone can be determined at design time with objective confidence. Assemblies that lie outside the zone are not analytically predictable. Their behavior can only be observed after they are built.

Figure 8: Component assemblies are “in the zone” if their runtime behavior is analytically predictable.

Assemblies may be predictable using any number of analytic theories (timing, security, safety, fault tolerance, and power consumption) and may therefore lie within any number of zones of predictability.

Figure 9: Each assembly (system assembled from components) in the zone has a corresponding model in that zone’s analytic theory.

[Figure labels: not OK; OK; PACC provable OK; zone]

The PACC approach is to ensure that we build only systems that lie within the required zones of predictability. Our focus is on critical qualities that are likely to be of significant business value, such as performance, safety, and security.

Objective confidence

Predictions must be trustworthy. Objective confidence, through rigorous empirical and formal validation, is the acid test PACC applies to predictions. This distinguishes PACC from other model-based and generative approaches to software development.

The predicted behaviors of all assemblies have associated measures of confidence. This confidence is the bankable outcome of predictable assembly—leading to shorter development cycles, decreased development and testing costs, and higher quality systems.

Predictability by construction

The aim is to build only those systems that have predictable behavior, rather than to predict the behavior of any systems we build. PACC technology ensures that the assumptions of analytic theories are established as invariants during system construction.

So the systems are predictable because the component technology guarantees that assemblies always stay in their required zones.

Figure 10: PACC assignment of a statistical confidence label to analytic theories

Trusted components:

PACC establishes a foundation for trusted components, one that goes beyond functional correctness to encompass other “as built” qualities that components must possess. Analytic theories tell us which qualities are needed to support predictability.

Figure 11: Trusted components must provide more insight into their inner workings. Analytic theories require additional quality-specific properties such as task execution time to make predictions. These quality-specific properties are exposed as analytic interfaces of components.

All analytic theories make assumptions. Some of these assumptions dictate what we need to know and trust about components. For example, a performance theory might require that we know the maximum non-blocking execution time for all component operations, while a safety theory might require a state machine for each component operation.

In these examples, analytic theories require visibility into some aspect of the behavior or implementation of components.

These new visibilities make up the analytic interface of a component. Different analytic theories will, in general, require different analytic interfaces; each such interface defines a distinct, analysis-specific view of the component.

To have objective confidence in the predictions of analytic theories, one must also have objective confidence in the parameters to these theories, namely the components’ analytic interfaces. PACC therefore establishes an objective, measurable, and predictive foundation for component trust and certification.

A higher order component technology

The SEI develops a component technology that is parameterized by analytic theories. Filling in these parameters with specific analysis theories, for performance and security for example, results in a prediction-enabled component technology (PECT).

Prediction enabling means that systems built using the component technology will be guaranteed by construction to be in the zone of predictability for critical system properties.

Figure 12: To support predictability, a component technology must exhibit the minimal, ideal design pattern shown above. In this ideal pattern, all interactions among components are exposed and use standard connection mechanisms. Additionally, resource management and coordination policies are defined and enforced by a standard component runtime environment.

A strict foundation

A PECT requires a strict and well-defined component technology at its core. The component technology we use is a result of close examination of the best features and patterns found in today’s component technologies. The design pattern differs from those patterns in its combination of features and the strictness with which it is enforced. It includes only the features needed to support predictability by construction.

Reasoning frameworks

PACC extends the component technology with reasoning frameworks. The elements of a reasoning framework include an analytic theory based on a solid foundation such as queuing theory, rate monotonic scheduling theory, finite-state automata, or temporal logic.
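To illustrate what an analytic theory based on rate monotonic scheduling can predict, the following Python sketch applies the standard response-time iteration for fixed-priority preemptive scheduling. It is not the PACC implementation. It assumes independent periodic tasks on a single processor, deadlines equal to periods, no blocking, and integer time units; the task set and the function `response_times` are hypothetical. The worst-case execution times play the role of the analytic interface that each component has to expose.

```python
def response_times(tasks):
    """Worst-case response times under rate monotonic priorities.

    tasks: list of (name, wcet, period) with integer times; deadline = period.
    Returns name -> response time, or None if the task can miss its deadline.
    """
    # Rate monotonic priority assignment: shorter period means higher priority.
    ordered = sorted(tasks, key=lambda task: task[2])
    result = {}
    for i, (name, wcet, period) in enumerate(ordered):
        r = wcet
        while True:
            # Interference from higher-priority tasks released within r;
            # -(-r // t) is integer ceiling division.
            interference = sum(-(-r // t_j) * c_j for _, c_j, t_j in ordered[:i])
            r_next = wcet + interference
            if r_next > period:
                result[name] = None  # deadline can be missed
                break
            if r_next == r:
                result[name] = r     # fixed point reached
                break
            r = r_next
    return result


# Hypothetical task set: (name, worst-case execution time, period).
tasks = [("sensor", 1, 4), ("control", 2, 6), ("logging", 3, 12)]
print(response_times(tasks))  # {'sensor': 1, 'control': 3, 'logging': 10}
```

An assembly whose tasks all receive a response time no greater than their period lies within the zone of predictability for this particular timing theory; a `None` entry means the theory cannot guarantee the deadline.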

References:

1 John Rushby, Stanford Research Institute International (SRI): Partitioning in Avionics Architectures: Requirements, Mechanisms, and Assurance. NASA/CR-1999-209347. http://www.sri.com/

2 Justin Littlefield-Lawwill, GE Aviation, Grand Rapids, Michigan, and Larry Kinnan, Wind River Systems, Alameda, California: System Considerations for Robust Time and Space Partitioning in Integrated Modular Avionics. http://www.geaviationsystems.com/About/Locations/North-America/Grand-Rapids/index.asp http://www.windriver.com/

3 John Rushby, Stanford Research Institute International (SRI): Modular Certification. NASA/CR-2002-212130. http://www.sri.com/

4 John Rushby, Stanford Research Institute International (SRI): Just-in-Time Certification. AFRL, Raytheon, NASA/NNL06AA07B, NSF grant CNS-0644783. http://www.sri.com/

5 Linda M. Northrop, CMU SEI: Predictability by Construction: PACC. http://www.sei.cmu.edu/predictability/ http://www.sei.cmu.edu/library/assets/pacc.pdf Software Product Lines, ISBN-13: 978-0201703320