
Software errors are hard to guard against:

Example 1: In 1963, a US rocket bound for Mars exploded, at a loss of $10 million. Cause: the FORTRAN loop statement DO 5 I = 1, 3 was mistyped as DO 5 I = 1.3.

Example 2: [excerpted from the "REAL-TIME EXAMPLE" in Pfleeger's book]

Our real-time example is based on the embedded software in the Ariane-5, a space rocket belonging to the European Space Agency (ESA). On June 4, 1996, on its maiden flight, the Ariane-5 was launched and performed perfectly for approximately 40 seconds. Then, it began to veer off course. At the direction of an Ariane ground controller, the rocket was destroyed by remote control. The destruction of the uninsured rocket was a loss not only of the rocket itself, but also of the four satellites it contained; the total cost of the disaster was $500 million.

The business impact of the incident went well beyond the $500 million in equipment. In 1996, the Ariane-4 rocket and previous variants held more than half of the world's launch contracts, ahead of American, Russian, and Chinese launchers. Thus, the credibility of the program was at stake, as well as the potential business from future Ariane rockets.

Cause: There was no discussion in the SRI (Inertial Reference System) requirements documents of the ways in which the Ariane-5 trajectory would be different from Ariane-4.

Software testing is a key step in assuring software quality; it is the final review of the software specification, design, and code. Testing typically accounts for more than 40% of total project effort (for life-critical systems, testing can cost 3 to 5 times as much as all the other development activities combined).

Chapter 8 Testing the Software

• Software faults and failures
• The purpose of software testing
• Principles of software testing
• Testing issues
• Test methods: black-box testing & white-box testing
• Unit testing & integration testing
• Function testing & performance testing
• Acceptance testing & installation testing
• Test documentation

8.1 Software Faults and Failures

Software failure: the software does not do what the requirements describe.

Reasons:
• Wrong requirement
• Missing requirement
• Requirement impossible to implement
• Faulty design
• Faulty code
• Improperly implemented design

Source of Software Faults

8.2 Purpose of Software Testing

Testing is a process of executing a program with the intent of finding an error

A good test case is one that has a high probability of finding an as-yet-undiscovered error

A successful test is one that uncovers an as-yet-undiscovered error

Types of faults

• Algorithmic fault
• Syntax fault
• Computation and precision fault
• Documentation fault
• Stress or overload fault
• Capacity or boundary fault
• Timing or coordination fault
• Throughput or performance fault
• Recovery fault
• Hardware and system software fault
• Standards and procedures fault

8.3 Principle of Software Testing

A programmer should avoid attempting to test his or her own program.

Test cases must be designed for input conditions that are invalid, as well as for those that are valid.

Test cases should be made up of two parts: the test input data and the corresponding expected output.

Start testing as early as possible, and carry out testing continually.

Test plans should be followed strictly.

8.4 Testing issues

[Figure: the stages of testing. Component code goes through unit test, producing tested components; tested components go through integration test against the design specifications, producing integrated modules; function test against the system functional requirements produces a functioning system; performance test against the other software requirements produces verified, validated software; acceptance test against the customer requirements specification produces an accepted system; installation test in the user environment puts the system in use.]

Find faults as well as one can, and try to correct them.

Attitudes toward Testing

Who performs the test?

• The developer: understands the system, but will test "gently" and is driven by "delivery".

• An independent tester: must learn about the system, but will attempt to break it and is driven by quality.

8.5 Test Methods: Black-Box Testing

The test object is viewed from outside as a closed box, or black box, whose contents are unknown.

The goal of testing is to make sure that every kind of input is submitted and that the observed output matches the expected output.

Black-box testing validates functional requirements without regard to the internal workings of a program.

For example: ax² + bx + c = 0
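As an illustration of choosing black-box test cases from the specification alone, the sketch below tests a routine that solves this quadratic equation. The solve_quadratic function and its return convention are assumptions made for this example, not part of the original slides.

import math

def solve_quadratic(a, b, c):
    """Hypothetical unit under test: real roots of a*x^2 + b*x + c = 0."""
    if a == 0:
        raise ValueError("not a quadratic equation")
    d = b * b - 4 * a * c
    if d < 0:
        return []                          # no real roots
    if d == 0:
        return [-b / (2 * a)]              # one repeated root
    r = math.sqrt(d)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Black-box test cases: chosen from the specified input/output behaviour,
# without looking at the implementation above.
assert solve_quadratic(1, -3, 2) == [1.0, 2.0]   # two distinct real roots
assert solve_quadratic(1, 2, 1) == [-1.0]        # repeated root
assert solve_quadratic(1, 0, 1) == []            # no real roots
try:
    solve_quadratic(0, 2, 1)                     # degenerate input: a = 0
except ValueError:
    pass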

8.5 Test Methods: White-Box Testing

The test object is viewed as an open box (sometimes called a clear box or white box): its internal structure is known, and we can use that structure to test the object in different ways.

Our goal is to ensure that:
• all statements have been executed at least once;
• all logical conditions have been exercised.

The choice of test philosophy depends on many factors:

•The number of possible logical paths

•The nature of the input data

•The amount of computation involved

•The complexity of the algorithms

Test Case Design

Objective: derive a set of tests that have the highest likelihood for uncovering the most errors with a minimum amount of time and effort.

Techniques:

Exhaustive Testing

Why not try exhaustive testing?

Consider a program containing a loop that may execute up to 20 times, with 5 paths through the loop body: there are 5^20 ≈ 10^14 possible paths! If we execute one test per millisecond, it would take about 3,170 years to test this program!!
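The arithmetic behind this claim can be checked directly; a quick sketch (the one-test-per-millisecond rate is the assumption stated above):

# 5 distinct paths through the loop body, loop executed up to 20 times
paths = 5 ** 20                            # 95,367,431,640,625, roughly 10**14
seconds = paths * 1 / 1000                 # one test per millisecond
years = seconds / (365 * 24 * 3600)
print(f"{paths:.2e} paths, about {years:,.0f} years")
# Rounding the path count up to 10**14 gives the 3,170-year figure quoted above.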

Why cover?
• Logic errors and incorrect assumptions are inversely proportional to a path's execution probability.
• We often believe that a path is not likely to be executed; in fact, reality is often counterintuitive.
• Typographical errors are random; it is likely that untested paths will contain some.

Main techniques:

Logical coverage: applicable to white-box testing.

In order of increasing strength:

(1) Statement coverage: every statement is executed at least once.

[Flowchart: entry → if (A > 1) AND (B = 0) then X = X / A → if (A = 2) OR (X > 1) then X = X + 1 → return]
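For reference, the flowchart logic can also be written out as a small function (an illustrative sketch; the name and parameter order are not from the slides). The coverage criteria that follow are all stated in terms of its two decisions.

def sample(a, b, x):
    """The flowchart logic: two decisions, two assignments."""
    if a > 1 and b == 0:      # decision 1
        x = x / a             # statement S1
    if a == 2 or x > 1:       # decision 2
        x = x + 1             # statement S2
    return x

# Statement coverage: a single case that executes both S1 and S2.
assert sample(2, 0, 4) == 3.0   # decision 1 true (X becomes 2.0), decision 2 true (X becomes 3.0)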

Example test case for statement coverage: A = 2, B = 0, X = 4.

Note: if AND were mistakenly written as OR, or X > 1 were written as X < 1, the error would not be detected by this test case.

(2) Branch (decision) coverage: in addition to (1), every branch of every decision is executed at least once.

Test cases:
① A = 3, B = 0, X = 3
② A = 2, B = 1, X = 1

Note: if X > 1 were written as X < 1, the error would not be detected by these test cases.


(3) Condition coverage: in addition to (1), every condition in each decision expression takes every possible outcome at least once.

Test cases:
① A = 2, B = 0, X = 4 (satisfies A > 1, B = 0; A = 2, X > 1)
② A = 1, B = 1, X = 1 (satisfies A ≤ 1, B ≠ 0; A ≠ 2, X ≤ 1)

Q: Does condition coverage imply branch coverage?
A: Not always. Counterexample:
① A = 2, B = 0, X = 1
② A = 1, B = 1, X = 2
(Every condition takes both outcomes, but the second decision is true in both cases, so its false branch is never taken.)

(4) Branch/condition coverage: branch coverage and condition coverage are both satisfied.


(5) Condition combination coverage: every possible combination of condition outcomes in each decision expression occurs at least once.


All possible condition combinations:

① A > 1, B = 0   ② A > 1, B ≠ 0   ③ A ≤ 1, B = 0   ④ A ≤ 1, B ≠ 0
⑤ A = 2, X > 1   ⑥ A = 2, X ≤ 1   ⑦ A ≠ 2, X > 1   ⑧ A ≠ 2, X ≤ 1

Test cases:
① A = 2, B = 0, X = 4 (T T)   ② A = 2, B = 1, X = 1 (F T)
③ A = 1, B = 0, X = 2 (F T)   ④ A = 1, B = 1, X = 1 (F F)

Note: the decision-outcome pair (T, F) cannot be exercised by these test cases.
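These four cases can be checked mechanically against the decisions of the sample function sketched earlier (again purely for illustration):

# (A, B, X) for the four condition-combination test cases above
cases = [(2, 0, 4), (2, 1, 1), (1, 0, 2), (1, 1, 1)]
for a, b, x in cases:
    d1 = a > 1 and b == 0        # decision 1
    d2 = a == 2 or x > 1         # decision 2, evaluated on the original X
    print(f"A={a}, B={b}, X={x}: decision1={d1}, decision2={d2}")
# Output: (T, T), (F, T), (F, T), (F, F). The pair (T, F) never occurs,
# which is why the note above says it cannot be exercised by these cases.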

From the viewpoint of the control flow graph:

(6) Node coverage = statement coverage
(7) Edge coverage = branch coverage
(8) Path coverage: every possible path is executed at least once; if the graph contains loops, each loop is traversed at least once.

Test cases: ① A = 1, B = 1, X = 1  ② A = 1, B = 1, X = 2  ③ A = 3, B = 0, X = 1  ④ A = 2, B = 0, X = 4

Loop Testing

Types of loops: simple loops, nested loops, concatenated loops, unstructured loops.

Loop Testing: Simple Loops
1. Skip the loop entirely.
2. Only one pass through the loop.
3. Two passes through the loop.
4. m passes through the loop, where m < n.
5. (n-1), n, and (n+1) passes through the loop,
where n is the maximum number of allowable passes.
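A sketch of how these guidelines translate into concrete pass counts for a loop with at most n allowable passes (n = 20 and the choice of m are placeholders, not values from the slides):

def simple_loop_passes(n, m=None):
    """Pass counts that exercise a simple loop with at most n iterations."""
    if m is None:
        m = n // 2                       # any value with 2 < m < n will do
    return [0, 1, 2, m, n - 1, n, n + 1]

print(simple_loop_passes(20))            # [0, 1, 2, 10, 19, 20, 21]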

Objective: validate functional requirements without regard to the internal workings of a program.

[Figure: the black-box view, the requirements define the events and input (user queries, mouse picks, FK input, data) and the expected output (output formats, prompts).]

Equivalence Partitioning

An equivalence class represents a set of valid or invalid states for input conditions, so that there is no particular reason to choose one element over another as a class representative.

1. Partition the equivalence classes:

Input condition    Valid equivalence classes    Invalid equivalence classes
...                ...                          ...

Guidelines:

1. If an input condition specifies a range, one valid and two invalid equivalence classes are defined.

2. If an input condition requires a specific value, one valid and two invalid equivalence classes are defined.

3. If an input condition specifies a member of a set, one valid and one invalid equivalence class are defined.

4. If an input condition is Boolean, one valid and one invalid equivalence class are defined.

2. Design the test cases.

Example: a report-processing system requires the user to enter the date of the report to be processed. Dates are restricted to January 1990 through December 1999; the system can only process reports from that period, and if the user enters a date outside this range an error message is displayed. The date consists of 6 numeric characters: the first 4 are the year and the last 2 are the month. Use equivalence partitioning to design test cases for the program's date-check function.

Step 1: partition and number the equivalence classes.

Input condition                      Valid equivalence classes     Invalid equivalence classes
Type and length of the report date   1. 6 numeric characters       2. contains non-numeric characters
                                                                   3. fewer than 6 numeric characters
                                                                   4. more than 6 numeric characters
Year range                           5. between 1990 and 1999      6. less than 1990
                                                                   7. greater than 1999
Month range                          8. between 1 and 12           9. equal to 0
                                                                   10. greater than 12

Step 2: design a test case covering the valid equivalence classes.

Test data    Expected result    Classes covered
199905       valid input        1, 5, 8

Step 3: design at least one test case for every invalid equivalence class.

Test data    Expected result    Class covered
99MAY        invalid input      2
19995        invalid input      3
1999005      invalid input      4
198912       invalid input      6
200001       invalid input      7
199900       invalid input      9
199913       invalid input      10
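A minimal sketch of the date-check function and the test cases above; the function name check_report_date and its Boolean return value are assumptions made for illustration.

def check_report_date(date_str):
    """Valid report date: 6 digits, YYYYMM, 1990 <= YYYY <= 1999, 1 <= MM <= 12."""
    if len(date_str) != 6 or not date_str.isdigit():    # classes 1-4
        return False
    year, month = int(date_str[:4]), int(date_str[4:])
    return 1990 <= year <= 1999 and 1 <= month <= 12    # classes 5-10

# One case covering the valid classes (1, 5, 8), then one case per invalid class.
assert check_report_date("199905") is True
for bad in ["99MAY", "19995", "1999005", "198912", "200001", "199900", "199913"]:
    assert check_report_date(bad) is False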

Boundary Value Analysis (BVA)

"Bugs lurk in corners and congregate at boundaries ..."

Boris Beizer

Boundary value analysis is a test case design technique that complements equivalence partitioning.

IF (ReportData <= MaxData) AND (ReportData >= MinData)
THEN produce the report for the specified date
ELSE display an error message
ENDIF

Guidelines:
1. If an input condition specifies a range bounded by values a and b, test cases should be designed with values a and b, and with values just above and just below a and b.
2. If an input condition specifies a number of values, test cases should be developed that exercise the minimum and maximum numbers; values just above and just below the minimum and maximum are also tested.
3. Apply guidelines 1 and 2 to output conditions.
4. If internal program data structures have prescribed boundaries, be certain to design a test case to exercise the data structure at its boundary.

Input equivalence class              Test case description                 Test data    Expected result    Reason for selection
Type and length of the report date   1 numeric character                   5            error message      only 1 valid character
                                     5 numeric characters                  19995        error message      one fewer than the valid length
                                     7 numeric characters                  1999005      error message      one more than the valid length
                                     1 non-numeric character               1999.5       error message      only one invalid character
                                     all non-numeric characters            May---       error message      6 invalid characters
                                     6 numeric characters                  199905       valid input        type and length both valid
Date range                           data on the valid-range boundaries    199001       valid input        minimum date
                                                                           199912       valid input        maximum date
                                                                           199000       error message      just below the minimum date
                                                                           199913       error message      just above the maximum date
Month range                          month is January                      199801       valid input        minimum month
                                     month is December                     199812       valid input        maximum month
                                     month < 1                             199800       error message      just below the minimum month
                                     month > 12                            199813       error message      just above the maximum month
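The boundary cases in the table can be run against the same check_report_date sketch from the equivalence-partitioning example above (pytest is assumed here only as a convenient runner; plain asserts work equally well):

import pytest

BOUNDARY_CASES = [
    ("5", False), ("19995", False), ("1999005", False),     # length boundaries
    ("1999.5", False), ("May---", False), ("199905", True),
    ("199001", True), ("199912", True),                      # date-range boundaries
    ("199000", False), ("199913", False),
    ("199801", True), ("199812", True),                      # month-range boundaries
    ("199800", False), ("199813", False),
]

@pytest.mark.parametrize("date_str, expected", BOUNDARY_CASES)
def test_report_date_boundaries(date_str, expected):
    # check_report_date: the sketch from the equivalence-partitioning example
    assert check_report_date(date_str) == expected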

Other Black Box Techniques

• error guessing methods

• decision table techniques

• cause effect graphing

Practical strategy: design with black-box methods, supplement with white-box methods.

(1) Use boundary value analysis in every case.
(2) Supplement with equivalence partitioning when necessary.
(3) Supplement further with error guessing when necessary.
(4) Check the test plan against the program logic; depending on the reliability required of the program, adopt an appropriate logical coverage criterion and add test cases where necessary.

Note: even with the combined strategy above, there is no guarantee of finding every error. For example, after many kinds of testing, including line-by-line inspection of the source code, Lucent found that its software ran to specification with a success rate of only 80%.

Examining the code

Code review: ask a group of experts to review both your code and its documentation for misunderstandings, inconsistencies, and other faults.

1. Code walkthrough:

present the code and accompanying documentation to the review team, and the team comments on their correctness.

2. Code inspection:

the review team checks the code and documentation against a prepared list of concerns.

8.6 Unit Testing

Proving Code Correct

• Formal proof techniques

• Symbolic execution

• Automated theorem proving

Testing program components

[Figure: unit-testing a module. Test cases are applied to the module to be tested, exercising its interface, local data structures, boundary conditions, independent paths, and error-handling paths.]
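A minimal unit-test sketch using Python's unittest; the safe_divide function is a toy invented for illustration, but the tests touch the aspects named in the figure: the interface, a boundary condition, the independent paths, and the error-handling path.

import unittest

def safe_divide(numerator, denominator):
    """Toy module under test: division that maps divide-by-zero to None."""
    if denominator == 0:
        return None                      # error-handling path
    return numerator / denominator       # normal path

class SafeDivideUnitTest(unittest.TestCase):
    def test_interface_and_normal_path(self):
        self.assertEqual(safe_divide(6, 3), 2)
    def test_boundary_condition(self):
        self.assertEqual(safe_divide(0, 5), 0)   # numerator at its boundary
    def test_error_handling_path(self):
        self.assertIsNone(safe_divide(1, 0))

if __name__ == "__main__":
    unittest.main()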

8.7 Integration Testing

Big-bang testing

[Figure: components A, B, C, and D are each tested alone, then combined and tested all at once.]

Chaos! Isolation of the causes of failures is complicated.

Incremental testing

Top-down

[Figure: top-down integration, the top module M is tested first with stubs S1-S4 standing in for its subordinates; the stubs are then replaced, one at a time, by the actual modules M1, M2, ...]

• Major control or decision points are verified early.
• No significant data can flow upward.

Bottom-up

[Figure: bottom-up integration, the low-level modules M are combined into clusters (Cluster 1, Cluster 2, Cluster 3), each exercised by a driver D; the drivers are then removed and the clusters are combined, moving upward in the structure.]

8.8 Function Testing: Purpose and Roles

Thread: the set of actions associated with a function.

Function testing is thread testing: the system is tested one thread at a time.

Example: a water quality monitoring system.

Requirement: detect large changes in four characteristics:
• dissolved oxygen
• temperature
• acidity
• radioactivity

Tests:
• verify that a change in dissolved oxygen is detected
• verify that a change in temperature is detected
• verify that a change in acidity is detected
• verify that a change in radioactivity is detected

A test should:

• have a high probability of detecting a fault

• use a test team independent of the designers and programmers

• know the expected actions and output

• never modify the system just to make testing easier

• have stopping criteria

Cause-and-Effect Graphs

The inputs are called causes, and the outputs and transformations are effects.

The result is a Boolean graph reflecting these relationships, called a cause-and-effect graph.

Creating a cause-and-effect graph:

Step 1: the requirements are separated so that each requirement describes a single function.

Step 2: all causes and effects are described.

Example: a water-level monitoring system

Requirement: the system sends a message to the dam operator about the safety of the lake level

INPUT: The syntax of the function is LEVEL(A, B), where A is the height in meters of the water behind the dam and B is the number of centimeters of rain in the last 24-hour period.

PROCESSING: The function calculates whether the water level is within a safe range, is too high, or is too low.

OUTPUT: The screen shows one of the following messages, depending on the result of the calculation:
• "LEVEL = SAFE" when the result is safe or low.
• "LEVEL = HIGH" when the result is high.
• "INVALID SYNTAX" when the command or its parameters are not valid.

Causes:

1. The first five characters of the command “LEVEL”.

2. The command contains exactly two parameters separated by a comma and enclosed in parentheses.

3. The parameters A and B are real numbers such that the water level is calculated to be LOW.

4. The parameters A and B are real numbers such that the water level is calculated to be SAFE.

5. The parameters A and B are real numbers such that the water level is calculated to be HIGH.

Effects:

1. The message "LEVEL = SAFE" is displayed on the screen.
2. The message "LEVEL = HIGH" is displayed on the screen.
3. The message "INVALID SYNTAX" is printed out.

Intermediate nodes:

1. The command is syntactically valid.
2. The operands are syntactically valid.

Table 9.2. Decision table for the cause-and-effect graph.

           Test 1   Test 2   Test 3   Test 4   Test 5
Cause 1      I        I        I        S        I
Cause 2      I        I        I        X        S
Cause 3      I        S        S        X        X
Cause 4      S        I        S        X        X
Cause 5      S        S        I        X        X
Effect 1     P        P        A        A        A
Effect 2     A        A        P        A        A
Effect 3     A        A        A        P        P
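One way to use such a decision table is to drive tests directly from its columns. The sketch below assumes a hypothetical process_level_command function that returns the message displayed; the concrete commands and A/B values are placeholders, since the slides do not give the actual level thresholds.

# Hypothetical concrete inputs for the five columns of the decision table.
DECISION_TABLE_TESTS = [
    ("LEVEL(10.0, 0.1)",  "LEVEL = SAFE"),     # Test 1: level calculated LOW
    ("LEVEL(20.0, 1.0)",  "LEVEL = SAFE"),     # Test 2: level calculated SAFE
    ("LEVEL(35.0, 9.0)",  "LEVEL = HIGH"),     # Test 3: level calculated HIGH
    ("HEIGHT(20.0, 1.0)", "INVALID SYNTAX"),   # Test 4: command is not LEVEL
    ("LEVEL 20.0 1.0",    "INVALID SYNTAX"),   # Test 5: parameters malformed
]

def run_decision_table_tests(process_level_command):
    """process_level_command is the (hypothetical) unit under test: it takes
    the command string and returns the message that would be displayed."""
    for command, expected in DECISION_TABLE_TESTS:
        actual = process_level_command(command)
        assert actual == expected, f"{command!r}: expected {expected!r}, got {actual!r}"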

8.9 Performance Tests: Purpose and Roles

System performance is measured against the performance objectives set by the customer as expressed in the nonfunctional requirements.

Types of performance testing:

• Stress tests

• Volume tests

• Configuration tests

• Compatibility tests

• Regression tests

• Security tests

• Timing tests

• Environmental tests

• Quality tests

• Recovery tests

• Maintenance tests

• Documentation tests

• Human factors (usability) tests

8.10 Reliability, Availability, and Maintainability

Definitions

Software reliability is the probability that a system operates without failure under given conditions for a given time interval.

0: unreliable system
1: highly reliable system

Software availability is the probability that a system is operating successfully, according to specification, at a given point in time.

0: unusable system
1: completely up-and-running system

Software maintainability is the probability that, for a given condition of use, a maintenance activity can be carried out within a stated time interval using stated procedures and resources.

0: unmaintainable system
1: highly maintainable system

Four different levels of failure severity:

• Catastrophic

• Critical

• Marginal

• Minor

Measuring Reliability, Availability, and Maintainability

Mean Time To Failure (MTTF): the average of the interfailure times, or times to failure (t1, t2, ..., tn); Ti denotes the yet-to-be-observed next time to failure.

Mean Time To Repair (MTTR): the average time it takes to fix a faulty software component.

Mean Time Between Failures (MTBF): MTBF = MTTF + MTTR

Reliability: R = MTTF / (1 + MTTF)

Availability: A = MTBF / (1 + MTBF)

Maintainability: M = 1 / (1 + MTTR)
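A quick worked example with the three formulas above (the MTTF and MTTR figures are invented purely for illustration):

mttf = 1000.0    # hypothetical mean time to failure, in hours
mttr = 50.0      # hypothetical mean time to repair, in hours
mtbf = mttf + mttr

reliability     = mttf / (1 + mttf)    # R = MTTF / (1 + MTTF)
availability    = mtbf / (1 + mtbf)    # A = MTBF / (1 + MTBF)
maintainability = 1 / (1 + mttr)       # M = 1 / (1 + MTTR)

print(f"R = {reliability:.4f}, A = {availability:.4f}, M = {maintainability:.4f}")
# R = 0.9990, A = 0.9990, M = 0.0196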

Reliability Stability and Growth

Reliability Stability: if the system’s interfailure times stay the same

Reliability Growth: if the system’s interfailure times increase

Reliability Prediction

Importance of the Operational Environment

Operational profile: describe likely user input over time

Operation      CREATE    DELETE    MODIFY
Probability    0.5       0.25      0.25
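A sketch of how an operational profile can drive statistical testing: test inputs are generated with the same relative frequencies that users are expected to produce. The operation names come from the profile above; everything else is an assumption for illustration.

import random

OPERATIONAL_PROFILE = {"CREATE": 0.5, "DELETE": 0.25, "MODIFY": 0.25}

def generate_test_operations(n, seed=0):
    """Draw n operations with the probabilities given by the operational profile."""
    rng = random.Random(seed)
    ops, weights = zip(*OPERATIONAL_PROFILE.items())
    return rng.choices(ops, weights=weights, k=n)

print(generate_test_operations(10))
# Roughly half the generated operations are CREATE and a quarter each are
# DELETE and MODIFY, so testing effort mirrors expected field usage.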

Two benefits of statistical testing:

1. Testing concentrates on the parts of the system most likely to be used and hence should result in a system that the user finds more reliable.

2. Reliability predictions based on the test results should give an accurate prediction of the reliability seen by the user.

8.11 Acceptance Testing

Purpose and Roles

Enable the customers and users to determine if the system we build really meets their needs and expectations.

Types of acceptance tests

• Benchmark test: the customer prepares a set of test cases that represent typical conditions under which the system will operate when actually installed, in order to evaluate the system's performance.

• Pilot test: install on experimental basis

Alpha test: in-house test

Beta test: customer pilot

• Parallel test: the new system operates in parallel with the old system.

Results of acceptance testing:

• The system is accepted: its functions and performance accord with the requirements specification.

• The system is not accepted: its functions or performance differ from the requirements specification.

8.12 Installation Testing

The tests focus on two things:

Completeness of the installed system

Verification of any functional or nonfunctional characteristics that may be affected by site conditions