
Post on 13-Dec-2015


MT311 Java Application Development and Programming Languages

Li Tak Sing (李德成 )

Record Type

Nearly all modern programming languages have ways to represent a number of data items as one collection, usually using some form of record types. The next reading discusses the design and implementation of record types.

The key questions to answer as you look at the design of record types are:

– What is the syntax for referencing a field in a record? The -> operator in C increases writability: p->price is shorter than the equivalent (*p).price.
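As a small illustration of the -> operator, the C sketch below reaches the same fields of a record in two ways. The type `struct point` and both function names are hypothetical and chosen only for this example.

```c
/* A small record type used only for this illustration. */
struct point {
    int x;
    int y;
};

int sum_with_deref(const struct point *p) {
    return (*p).x + (*p).y;   /* parentheses needed: . binds tighter than unary * */
}

int sum_with_arrow(const struct point *p) {
    return p->x + p->y;       /* same fields, written with -> */
}
```

Both functions return the same value for the same record. The -> form just saves the programmer the parentheses and the explicit dereference.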

– Are elliptical references allowed? An elliptical reference omits the record name when referring to a field. It is permitted only when no ambiguity results.

– What operations are allowed on the record as a whole? Apart from referencing fields, typical programming languages allow only a few operations on a whole record. For example, in C the main operations on a whole structure are taking its address (the & operator) and assignment. Two structures cannot be compared with ==:

struct product {
    int prod_no;
    float price;
    float weight;
} prod1, newprod;
struct product *p;
...
newprod = prod1;   /* assigns the whole record at once */
p = &newprod;      /* takes the address of the whole record */
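The C sketch below fills in a runnable version of the fragment above. It shows that whole-record assignment makes an independent copy, and that equality has to be checked field by field. The field values and the helper names `same_product` and `copy_is_independent` are hypothetical.

```c
struct product {
    int prod_no;
    float price;
    float weight;
};

/* C does not define == on structures, so equality is checked field by field. */
int same_product(const struct product *a, const struct product *b) {
    return a->prod_no == b->prod_no
        && a->price == b->price
        && a->weight == b->weight;
}

/* Returns 1 if prod1 is unaffected by a change made to newprod through p. */
int copy_is_independent(void) {
    struct product prod1 = { 42, 9.5f, 1.25f };
    struct product newprod;
    struct product *p;

    newprod = prod1;      /* whole-record assignment copies every field */
    p = &newprod;         /* address of the whole record */
    p->price = 10.0f;     /* modifies newprod only */

    /* writing newprod == prod1 here would not compile */
    return !same_product(&newprod, &prod1) && prod1.price == 9.5f;
}
```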

– Is it possible to denote a value of a record type? Without this capability, changing the values of all the fields of a record takes one assignment statement per field. With it, a single assignment statement can assign the denoted value to the record. For example, consider the following Ada code:

type rec_type is
   record
      A: String(1..20);
      B: Integer;
   end record;

Now if we have a variable C of type rec_type, we can use the following assignment statement to change the value of C:

C := ("REC                 ", 4);   -- A is String(1..20), so the literal must have exactly 20 characters

In Pascal, the same change needs two assignment statements, one per field. In C89, a record value can be denoted only in the initializer of a variable. C99 added compound literals, which let a structure value be written in an ordinary assignment.
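The C sketch below assumes a C99 or later compiler and shows both ways of writing a whole structure value: an initializer, and a compound literal used in an assignment. The function name `make_replacement` and the values are hypothetical.

```c
struct product {
    int prod_no;
    float price;
    float weight;
};

struct product make_replacement(void) {
    /* A record value in an initializer: allowed in every version of C. */
    struct product prod1 = { 1, 2.5f, 0.75f };

    /* A C99 compound literal: a record value denoted inside an assignment. */
    prod1 = (struct product){ 7, 3.0f, 1.5f };
    return prod1;
}
```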

Union types

A union is a type whose variables can store values of different data types at different times during program execution. It is one way of working around the rule that a variable cannot change type during execution. It also lets us define heterogeneous data structures, such as trees whose nodes hold either integers or floating-point values.

However, this second use of union types is declining in object-oriented languages. There we can derive different subclasses from a root class and build a tree whose nodes have the root class type. Such a structure can store objects of any of the subclasses. This method is more reliable than one that uses union types, because the language checks the type of each object.

When you think about union types in a programming language, try to answer these questions:

– Is dynamic (run-time) type checking required? Most languages do not check the type of the value stored in a union against a tag. In that case there is no guarantee that the stored value has the intended data type, so some type errors go undetected.
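C is one of the languages that never checks a union against a tag. A program can store `u.i` and later read `u.f` without any complaint from the compiler. The minimal C sketch below keeps a tag by hand and checks it in an accessor. The names `enum kind`, `struct value` and `get_int` are hypothetical.

```c
/* The tag records which member of the union is currently valid. */
enum kind { KIND_INT, KIND_FLOAT };

struct value {
    enum kind tag;
    union {
        int i;
        float f;
    } u;
};

/* C never compares u against tag; this helper does the check by hand. */
int get_int(const struct value *v, int *out) {
    if (v->tag != KIND_INT) {
        return 0;          /* wrong type stored: refuse instead of misreading */
    }
    *out = v->u.i;
    return 1;
}
```

The check is only as reliable as the programmer's discipline in keeping `tag` up to date, which is exactly the weakness the text describes.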

– Is a union a special construct inside a record, or is it a data type in its own right? In languages such as C and C++, a union is a data type in itself. In Pascal and Ada, a union (a variant part) must be part of a record, so that a tag can be stored alongside it.

Pointer types

Allowing pointer types adds writability to a language, because the programmer can handle dynamic data structures such as binary trees more flexibly. Of course, this flexibility can lead to problems such as dangling pointers and lost heap-dynamic variables.
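As an example of the flexibility that pointers give, here is a minimal C sketch of a binary search tree built from heap-dynamic nodes. The names `struct node`, `tree_insert`, `tree_size` and `tree_free` are hypothetical. The comments mark where the two problems mentioned above would arise.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left;
    struct node *right;
};

/* Inserts key and returns the (possibly new) root; returns NULL if malloc fails on an empty tree. */
struct node *tree_insert(struct node *root, int key) {
    if (root == NULL) {
        struct node *n = malloc(sizeof *n);   /* a new heap-dynamic variable */
        if (n == NULL) {
            return NULL;
        }
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key) {
        root->left = tree_insert(root->left, key);
    } else {
        root->right = tree_insert(root->right, key);
    }
    return root;
}

int tree_size(const struct node *root) {
    return root == NULL ? 0 : 1 + tree_size(root->left) + tree_size(root->right);
}

/* Forgetting to call this leaves lost heap-dynamic variables (a memory leak). */
void tree_free(struct node *root) {
    if (root != NULL) {
        tree_free(root->left);
        tree_free(root->right);
        free(root);   /* any pointer still referring to root now dangles */
    }
}
```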

The key questions in the design of pointer types are:

– What are the lifetimes of a pointer variable and of the variable it points to? The variables that pointers refer to are usually bound to storage at run time, with memory allocated from the heap. They are called heap-dynamic variables.

– Are pointers restricted as to the data types of values to which they can point? In C, for example, you usually have to declare the type of value a pointer points to, e.g.:

int *ptr1;
double *ptr2, *ptr3;

This is not always required: a void * pointer can point to a value of any type.
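To show the exception, the C sketch below uses a void * pointer to hold the addresses of values of different types. Before the value can be read, the pointer has to be converted back to a typed pointer. The function names are hypothetical.

```c
/* A void * can hold the address of any object type, but it must be
   converted back to a typed pointer before the value can be used. */
double read_as_double(const void *p) {
    const double *dp = p;   /* implicit conversion from void * is allowed in C */
    return *dp;
}

int demo_void_pointer(void) {
    int n = 7;
    double d = 2.5;
    const void *any;

    any = &n;                       /* points to an int ... */
    int back = *(const int *)any;   /* ... and must be cast back to read it */
    any = &d;                       /* now points to a double */
    return back == 7 && read_as_double(any) == 2.5;
}
```

The compiler cannot check that the conversion back matches the type actually stored. The restriction is relaxed at the cost of type safety.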

– Does the language provide pointer types, reference types or both? We can view reference type variables as a restricted but safer form of pointer, since pointer arithmetic is not allowed on them. C++ provides both pointers and references, while Java provides only references.

Expression

In a programming language, variables are among the building blocks of expressions, and it is through expressions that we perform calculations of various kinds. The other building blocks of an expression are constants, function calls, parentheses and operators.

Operator evaluation order

We all know that * and / have higher precedence than + and -. But what about Boolean operators, and unary operators such as ++ and -- in C or Java?

Relational operators (such as =) and Boolean operators usually have lower precedence than + and -. The reason is that the results of + and - can be operands of these operators, but not the reverse.

For example, the expression

4+2=3*4

is first evaluated to

6=12

and then to false. If = had a higher precedence, the expression would first be evaluated to

4+false*4

which is an invalid expression.
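The same expression can be tried in C, where equality is written ==. The sketch below shows the default parse and a version with parentheses that forces the comparison first. C accepts the second form because a comparison yields the int 0 or 1 rather than a separate Boolean type. In Java, the equivalent of the second expression is a compile-time error, which matches the "invalid expression" in the text. The function names are hypothetical.

```c
/* == binds less tightly than + and *, so this is (4 + 2) == (3 * 4). */
int precedence_default(void) {
    return 4 + 2 == 3 * 4;          /* 6 == 12, i.e. false (0) */
}

/* Forcing == to be applied first, as in the hypothetical case above. */
int precedence_forced(void) {
    return 4 + (2 == 3) * 4;        /* 4 + 0 * 4 = 4 */
}
```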

Associative, left-associative, and right-associative

An operator $\circ$ is associative if $(a \circ b) \circ c = a \circ (b \circ c)$. We know that + and * are associative, whereas - and / are not.

An operator $\circ$ is left-associative if $a \circ b \circ c$ is evaluated as $(a \circ b) \circ c$. In most languages, - and / are left-associative.

An operator $\circ$ is right-associative if $a \circ b \circ c$ is evaluated as $a \circ (b \circ c)$. For example, exponentiation of real numbers is right-associative in FORTRAN, so 2**3**2 means 2**(3**2) = 512.
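C has no exponentiation operator, so the sketch below uses assignment, which is right-associative in C, as the right-associative case. It also shows a caveat: + is associative in mathematics, but floating-point + is not. All function names are hypothetical. The floating-point results assume IEEE 754 double arithmetic.

```c
/* - and / are left-associative in C. */
int left_assoc_minus(void)  { return 10 - 4 - 3; }    /* (10 - 4) - 3 = 3, not 10 - (4 - 3) = 9 */
int left_assoc_divide(void) { return 100 / 10 / 5; }  /* (100 / 10) / 5 = 2, not 100 / (10 / 5) = 50 */

/* Assignment is right-associative: a = b = 5 means a = (b = 5). */
int right_assoc_assign(void) {
    int a, b;
    a = b = 5;
    return a + b;   /* 10 */
}

/* Floating-point + is not associative: grouping changes the result
   when the magnitudes of the operands differ greatly. */
double float_grouping_left(void)  { return (1e20 + -1e20) + 1.0; }  /* 1.0 */
double float_grouping_right(void) { return 1e20 + (-1e20 + 1.0); }  /* 0.0: the 1.0 is lost */
```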

Operand evaluation order

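When an operator has two operands, the result may depend on the order in which the operands are evaluated. Consider the following example:

int i = 4;
int j = (i++) + (i--);

If the left operand of + is evaluated first, the result is j = 4 + 5 = 9. Otherwise, the result is j = 3 + 4 = 7. Java always evaluates operands from left to right, so in Java j is 9. In C and C++, modifying i twice in one expression like this is undefined behavior, so the result is not even guaranteed to be one of these two values.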

This can only happen if an operand has side effects, i.e., evaluating the operand changes some variables. If no operand has side effects, the order in which the operands are evaluated does not matter.
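Side effects often hide inside function calls. In the C sketch below, `next_value()` changes a global counter. C does not specify which operand of - is evaluated first, so `next_value() - next_value()` may give -1 or 1. Splitting the expression into separate statements fixes the order. All names here are hypothetical.

```c
/* A function with a side effect: each call changes the global counter. */
static int counter = 0;

int next_value(void) {
    return ++counter;
}

/* Separate statements make the evaluation order explicit. */
int ordered_difference(void) {
    int first = next_value();    /* evaluated first */
    int second = next_value();   /* evaluated second */
    return first - second;       /* always -1 when counter starts at 0 */
}
```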

Overloaded operator

The same operator can mean different operations in different situations. In C, binary * means multiplication of two numbers, but unary * means dereferencing a pointer. Giving one operator multiple meanings in different contexts is called operator overloading.
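The C sketch below shows the built-in overloading of * described above, together with & (address-of when unary, bitwise AND when binary). The function name is hypothetical.

```c
/* The same symbols carry different meanings depending on context. */
int overloaded_symbols(void) {
    int x = 6;
    int *p = &x;          /* unary &: address-of */
    int product = x * 7;  /* binary *: multiplication (42) */
    int value = *p;       /* unary *: dereference (6) */
    int masked = x & 3;   /* binary &: bitwise AND (6 & 3 == 2) */
    return product + value + masked;   /* 42 + 6 + 2 = 50 */
}
```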

Some languages, such as C++ and Ada, even allow users to define additional meanings for an operator. This is called user-defined operator overloading.

Type conversion

Coercion is an implicit type conversion performed when an operator encounters operands of different types. For example:

int a = 2;
float b = 2.3;
float c = a + b;

In this example, the value of a is converted to float before the addition.

Too much coercion may defeat the type-checking mechanism of the language. Incompatible operands are silently converted to compatible types without the programmer noticing, so reliability is reduced.

Too little coercion puts the burden of type conversion on the programmer, so writability is reduced.
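To make the reliability point concrete, the C sketch below shows coercions that compile silently but may not be what the programmer intended. The function names and values are hypothetical.

```c
/* double -> int: the fractional part is silently discarded. */
int truncated(void) {
    int i = 3.9;
    return i;                   /* 3 */
}

/* Integer division happens first (7 / 2 == 3), then 3 is coerced to 3.0. */
double integer_average(void) {
    int sum = 7, count = 2;
    double avg = sum / count;
    return avg;                 /* 3.0 */
}

/* An explicit conversion states the intent and gives 3.5. */
double intended_average(void) {
    int sum = 7, count = 2;
    return (double)sum / count;
}
```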

Explicit type conversion

Explicit type conversion is required in a strongly typed language because, from time to time, the value of a variable has to be used in an expression that is not type compatible with the variable. However, this kind of conversion may lead to problems. In C, you can convert a pointer of one type into a pointer to a value of any other type. For example, the following program compiles without error:

#include <stdio.h>

int main(void) {
    float a, b;          /* a is not used in this example */
    int c, *t;           /* c is not used in this example */
    b = 1.0;
    t = (int *) &b;      /* the cast makes an int pointer point at a float */
    return 0;
}

t points to a value that is supposed to be of type int, but the value is actually of type float. Reading *t would reinterpret the bit pattern of 1.0 as an integer. In standard C, such an access is undefined behavior.
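To see what reading through `t` would expose, the C sketch below copies the bytes of a float into an integer with `memcpy`. This is the well-defined way to inspect the bits. It assumes that float is a 32-bit IEEE 754 type, as on common platforms. The function name `float_bits` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the bit pattern of f; assumes a 32-bit IEEE 754 float. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copies bytes; no aliasing violation */
    return bits;
}
```

For 1.0f the result is 0x3F800000, which is 1065353216 as an integer rather than 1. That is why treating the float's storage as an int is a type error.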

Short-circuit evaluations

The binary Boolean operators AND and OR seem simple and innocent enough. On closer inspection, however, the Pascal and and or operators are very different from the Java && and || operators, because the Java operators use short-circuit evaluation.

Consider the expression

A and B

If A has been evaluated to false, then it does not matter whether B is true or false: the result will always be false. So in short-circuit evaluation, once the result of an expression is determined regardless of the values of the remaining operands, evaluation stops and the result is returned.

Similarly, if in the expression A or B, A has been evaluated to true, then in short-circuit evaluation B is not evaluated and true is returned.

Then, what is the problem if short-circuit evaluation is used?

Suppose that evaluating the operands of an expression causes side effects that the programmer relies on. In that case short-circuit evaluation can cause problems, because some of these side effects may not take place.

In any case, a good programmer should not rely on side effects in this way.
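The C sketch below shows a side effect that silently disappears under short-circuit evaluation. The increment in the right operand of && happens only when the left operand is true. The names `evaluations` and `positive_and_count` are hypothetical.

```c
/* Counts how many times the right operand of && is actually evaluated. */
static int evaluations = 0;

int positive_and_count(int x) {
    return x > 0 && ++evaluations > 0;   /* ++evaluations is skipped when x <= 0 */
}
```

A call with a negative argument leaves `evaluations` unchanged. This is easy to miss if the programmer expected the counter to track every call.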

So what are the advantages of short-circuit evaluation?

– It is more efficient, as sometimes there is no need to evaluate the whole expression.

– It allows a conjunction of two conditions to be evaluated safely when the second conjunct is undefined whenever the first conjunct is false.

– The following is an example:

(i>0) and (log(i)>0)

– If i is not positive and the expression is not short-circuit evaluated, an error occurs because log(i) is undefined. If the expression is short-circuit evaluated, it simply has the value false. Thus writability is increased.
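The same guarding idea in C, where && is short-circuit. To keep the sketch self-contained without linking the math library, it guards an integer division instead of log(). The function name `ratio_exceeds_one` is hypothetical.

```c
/* The division is evaluated only when d != 0, so it can never divide by zero. */
int ratio_exceeds_one(int n, int d) {
    return d != 0 && n / d > 1;
}
```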

Assignments

The assignment statement is another feature that started out simple in older languages but has evolved into many different forms in modern languages such as C, C++ and Java.

Multiple targets. Statements like

a, b, c := d;

not only increase writability but can also increase efficiency. After the value of d has been assigned to c, this value should already be in a CPU register, so it can be copied to the address of b directly.

If the above statement were separated into three assignments, the value would have to be loaded into a register three times.

Assignment as expression. Again, this lets the compiler produce more efficient code, because the assigned value is already in a register after the assignment is executed. However, it may decrease readability if an assignment is mixed with other expressions.
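C has no a, b, c := d statement. However, because assignment is an expression and is right-associative in C, a chained assignment gives the same effect. The C sketch below shows that, plus the common idiom of testing an assigned value inside a loop condition. The function names are hypothetical.

```c
/* a = b = c = d is parsed as a = (b = (c = d)). */
int chained_assignment(void) {
    int a, b, c, d = 9;
    a = b = c = d;
    return a + b + c;         /* 27 */
}

/* The value assigned to ch is also the value tested by the loop condition. */
int count_chars(const char *s) {
    int n = 0;
    char ch;
    while ((ch = *s++) != '\0') {
        n++;
    }
    return n;
}
```

The second function also shows the readability cost the text mentions: the assignment, the pointer increment and the test all sit in one condition.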

Compound statements

A compound statement is a collection of statements that the programmer can place anywhere a single statement is allowed. In Pascal, a compound statement is marked at the two ends by the keywords begin and end. In C, C++ and Java, it is marked at the two ends by { and }.

Some languages do not need compound statements, because they allow the programmer to put multiple statements directly inside control statements such as selections and loops. For example, in Ada, you can write:

if a = b then
   a := 4;
   b := 2;
end if;

So multiple statements can be executed when the condition is satisfied without the need for a compound statement. On the other hand, if the same logic is implemented in Java, we need a compound statement:

if (a == b) {
    a = 4;
    b = 2;
}

In some languages, you can declare variables at the beginning of a compound statement. Compound statements of this kind are called blocks. The compound statements in C are blocks, while those in Pascal are not.

Variable declarations within a block

Below is a block in C, embedded in an if statement. C allows data declarations at the beginning of a block, and the scope of temp is the statements inside the block.

if (a < b) {
    int temp;
    temp = a; a = b; b = temp;
}

There are two advantages of allowing declarations within a block:

– A variable can be declared close to where it is used. There is no need to go to the beginning of the subprogram to look at the declaration of a variable.

– The variable is not visible outside the block, where it should not be used. This also avoids possible tampering with a global variable that is used in other parts of the program.

A pitfall of allowing declarations within a block is that variables with the same name can be declared at different levels of nested blocks. This decreases readability, as it is difficult to work out which declaration each use of the name refers to.
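The C sketch below has three variables named x at different nesting levels. It is legal C, but the reader has to trace each use of x to its innermost enclosing declaration. The function name `shadowing` is hypothetical.

```c
int shadowing(void) {
    int x = 1;                /* outer x */
    int total = 0;
    {
        int x = 10;           /* hides the outer x inside this block */
        {
            int x = 100;      /* hides both outer declarations */
            total += x;       /* adds 100 */
        }
        total += x;           /* adds 10 */
    }
    total += x;               /* adds 1 */
    return total;             /* 111 */
}
```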

Selection statements

A frequently used control statement is the selection statement. The if statement in many languages is called a two-way selection statement, since the control flow is directed towards one of two available selections.

Two-way selection statements

Two-way selection statements seem very simple, but in the reading we have learned that different languages handle the problem of an ambiguous nested if statement in different ways. For example:

– In Pascal and C, a semantic rule specifies that an else clause is always paired with the nearest preceding unpaired if.

– In Algol 60, a compound statement must be used to enclose an if statement nested inside a then clause, so no nested if statement is ambiguous.

– In Algol 68, FORTRAN 77, Ada, etc., a special marker, such as END IF in FORTRAN 77, marks the end of an if statement. Again, no nested if statement is ambiguous.
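The pairing rule for Pascal and C can trip up readers when indentation suggests otherwise. In the C sketch below, the else looks as if it belongs to the outer if, but C pairs it with the inner one. The function name is hypothetical. Many compilers warn about this layout.

```c
int dangling_else(int a, int b) {
    int result = 0;
    if (a > 0)
        if (b > 0)
            result = 1;
    else                  /* misleading indentation: pairs with if (b > 0) */
        result = 2;
    return result;
}
```

With a = -1 the function returns 0, not 2, because the whole inner if-else is skipped. Braces around the inner if, or a language with an END IF marker, removes the ambiguity.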

In terms of readability, the IF-THEN-ELSE-END IF statements of FORTRAN 77 are much more readable than the if-else statements of C, because nested statements cannot be ambiguous.

In addition, the former allows each branch to contain multiple statements and allows a chain of alternatives using ELSE IF, so there is no need for compound statements. This also simplifies the structure.

Multiple selection statements

Some languages have multiple selection statements in addition to if statements. They are used to direct the control flow towards one out of many options. They are available as case statements and if-then-else-elseif-endif statements in many languages.

The writability of case or switch statements can be increased by:

– allowing subranges to be specified in constant lists;
– allowing the use of OR in constant lists.

Reliability is increased by:

– enforcing that the constant lists are exhaustive, which the compiler should be able to check;
– prohibiting control from flowing from one branch into another. C does not enforce this, since without a break control falls through into the next case, and it is therefore less reliable.
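The C sketch below shows the fall-through behavior with a forgotten break. The grading scheme and the function name are hypothetical.

```c
int points_missing_break(int grade) {
    int points = 0;
    switch (grade) {
    case 'A':
        points += 4;      /* no break: control falls through into case 'B' */
    case 'B':
        points += 3;
        break;
    default:
        points = 0;
        break;
    }
    return points;
}
```

A grade of 'A' yields 7 instead of 4. The compiler accepts this, so the error surfaces only at run time.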

Iterative statements

Selection statements are not the only type of control statement. We also have iterative constructs, such as for loops and while loops, that let the programmer specify iteration. We classify these iterative statements into two types:

– counter-controlled loops; and
– logically controlled loops.

Counter-controlled loops

The for loops in C and Pascal are examples of counter-controlled loops. As the name suggests, usually a counter called the loop variable is used to control the number of times the iteration executes.

The following points explore some of the design issues of counter-controlled loops in different languages:

– What are the pros and cons of allowing the program to change the value of a loop variable explicitly? Ada does not allow a value to be assigned to the loop variable explicitly. This avoids the problem of mistakenly changing the value of the loop variable, and therefore increases the reliability of the language.

If we have to change the value of the loop variable inside a loop, we should not have used a for loop in the first place; a while loop would have been more appropriate. C, C++ and Java allow loop variables to be assigned new values explicitly within the loop. In fact, the for loop in C, C++ and Java is semantically closer to the while loop than to the for loop in other languages.

The for loop:

for (expr1; expr2; expr3)
    statement;

is semantically equivalent to the following while loop, provided that statement contains no continue (a continue in the for loop still executes expr3):

expr1;
while (expr2) {
    statement;
    expr3;
}
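The equivalence above can be checked mechanically. The C sketch below writes a summing loop both ways, with the while version built by the rewrite rule. The function names are hypothetical.

```c
/* Sums 1..n with a for loop. */
int sum_for(int n) {
    int sum = 0;
    int i;
    for (i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* The same loop as expr1; while (expr2) { statement; expr3; } */
int sum_while(int n) {
    int sum = 0;
    int i;
    i = 1;                 /* expr1 */
    while (i <= n) {       /* expr2 */
        sum += i;          /* statement */
        i++;               /* expr3 */
    }
    return sum;
}
```

If the body contained a continue, the rewrite would need the increment placed before the continue. Otherwise the while version would skip expr3 and might not terminate.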

Therefore, the loop variable in C, C++ or Java does not receive any of the special treatment that it receives in Ada.

– Should an ordinary variable be used as the loop variable? In Ada, we cannot use an ordinary variable as the loop variable. The declaration of the loop variable is integrated into the loop statement, and the variable is available only within the loop.

An example of an Ada counter-controlled loop is:

for i in 1..10 loop
   sum := sum + a(i);
end loop;

i is declared by the for loop statement itself, and its type is determined by the range specified.

In some other languages, such as Pascal and C, the loop variable is declared in the same way as other variables. The Ada design has one advantage here. Because the loop variable is specially declared and is not visible outside the loop, it can never be mistakenly assigned a different value within the loop or by a subprogram called within the loop. On the other hand, when a global variable is used as a loop variable, it may be visible in other subprograms. There is then no guarantee that the variable will not be changed when a subprogram is called within the loop.
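To illustrate the danger of a global loop variable, the C sketch below uses a hypothetical helper that happens to modify the same global. The loop runs half as many times as intended. A loop-local variable (C99 `for (int k = ...)`) is out of the helper's reach. Unlike Ada's loop variable, however, it can still be assigned inside the loop body. All names are hypothetical.

```c
/* A global loop variable, visible to every function in the file. */
int loop_i;

/* A hypothetical subprogram that happens to modify the same global. */
void helper(void) {
    loop_i = loop_i + 1;
}

/* Intended to run 10 times, but helper() also advances loop_i. */
int iterations_with_global(void) {
    int n = 0;
    for (loop_i = 0; loop_i < 10; loop_i++) {
        helper();
        n++;
    }
    return n;   /* 5, not 10 */
}

/* The C99 loop-local k cannot be reached by helper(). */
int iterations_with_local(void) {
    int n = 0;
    for (int k = 0; k < 10; k++) {
        helper();
        n++;
    }
    return n;   /* 10 */
}
```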

– How flexible is the counter-controlled loop in changing the value of the loop variable after each iteration? In most languages, such as Pascal and Ada, we can only increment or decrement the loop variable by a constant amount (one, in Pascal and Ada) after each iteration. C and C++ allow the programmer to use any expression to change the value of the loop variable.

The advantages of the Pascal and Ada design in this respect are:

1 Both languages allow the compiler to generate very efficient code for changing the value of the loop variable after each iteration, because the amount added or subtracted is the same every time. The compiler can therefore keep the loop variable and the increment in two CPU registers.

2 As the loop variable changes by a constant amount, the loop is less likely to fail to terminate.

On the other hand, the design of C has the advantage of being very flexible.
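As an example of that flexibility, the third expression of a C for loop can be any expression, not just a constant step. The sketch below doubles the loop variable on each iteration. The function name is hypothetical. The comment notes the assumption that limit is small enough for p not to overflow.

```c
/* Counts the powers of two that are less than limit.
   Assumes limit <= 2^30 so that p never overflows. */
int powers_of_two_below(int limit) {
    int count = 0;
    for (int p = 1; p < limit; p *= 2)
        count++;
    return count;
}
```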

– Does the language provide a construct, other than goto, for exiting from the middle of a loop? Pascal and FORTRAN do not allow us to exit from the middle of a loop except with goto statements. Ada and C allow us to exit from the middle of a loop with exit and break respectively. The difference between goto and exit is that goto can redirect control anywhere, while exit can only redirect control to the statement after the loop.

Therefore, exit is much more readable. The ability to exit from the middle of a loop increases the writability and efficiency of the language. C also provides the continue construct to skip the rest of the statements in the current iteration.
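The C sketch below uses both constructs: break leaves the loop at the first negative value, and continue skips zeros. The function name `sum_until_negative` is hypothetical.

```c
/* Sums the values up to the first negative one, skipping zeros. */
int sum_until_negative(const int *a, int n) {
    int sum = 0;
    for (int k = 0; k < n; k++) {
        if (a[k] < 0)
            break;        /* leave the loop; control goes to the return */
        if (a[k] == 0)
            continue;     /* skip the rest of this iteration only */
        sum += a[k];
    }
    return sum;
}
```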

– Are the loop parameters evaluated once, or for every iteration? The loop parameters include the initial value of the loop variable, the amount added to or subtracted from the loop variable each time, and the final value at which the loop terminates. Most languages evaluate these parameters only once. However, as mentioned earlier, the for loop in C is semantically similar to a while loop.

Therefore, the condition that decides whether the loop should terminate is evaluated before every iteration. It is, however, not advisable to write the condition so that its terminating value changes from one iteration to the next, because:

– this makes the loop very difficult to read and check;
– it makes it more likely that the loop will not terminate.

Logically controlled loops

In contrast to counter-controlled loops, logically controlled loops use a Boolean expression to decide whether the iteration continues. Pascal’s while-do and repeat-until loops, and C’s while and do-while loops, are examples of logically controlled loop constructs.

The design issues of logically controlled loops are quite similar to those of counter-controlled loops:

– The condition is tested in every iteration, either before or after the loop body, depending on whether the loop is a pre-test or a post-test loop.

– Some languages provide exit or break to exit from the middle of the loop without the need for goto. This increases the writability of the language.
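To contrast the two testing positions, the C sketch below uses the same body in a pre-test while loop and a post-test do-while loop. The function names are hypothetical.

```c
/* Pre-test: the body may run zero times. */
int pretest_runs(int start) {
    int runs = 0;
    int x = start;
    while (x > 0) {
        runs++;
        x--;
    }
    return runs;
}

/* Post-test: the body always runs at least once. */
int posttest_runs(int start) {
    int runs = 0;
    int x = start;
    do {
        runs++;
        x--;
    } while (x > 0);
    return runs;
}
```

With a starting value of 0, the pre-test loop runs 0 times and the post-test loop runs once. For positive starting values the two agree.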

Generally, a while loop provides a much more flexible way of iterating than a for loop, since the programmer has greater control over what happens in each iteration. However, greater flexibility also means that the loop is more likely not to terminate. In addition, most languages can generate more efficient code for manipulating the loop variable of a for loop. This is generally harder to do for a while loop.

Unconditional branching

Unconditional branching, also called the goto statement, is considered a dangerous construct. Undisciplined use of unconditional branching harms readability and renders a program very difficult to understand. Nevertheless, many commonly used languages still include a goto statement in case there is a real need for it in a program.

Although goto is generally considered hazardous, there are some reasons why it is still available in some modern languages:

– It provides a way of exiting from a number of deeply nested loops, whereas an unlabeled break or exit statement only exits from the innermost loop.

– It provides a way of exiting from a number of deeply nested procedure calls. This is useful when a run-time error is detected and we want to restart the computation.
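The first reason can be illustrated in C. In the sketch below, goto leaves two nested loops at once, which a single break cannot do. The code after the label still sees the loop variables. The function name and the 3x3 grid are hypothetical.

```c
/* Returns the position (row * 3 + column) of the first match, or -1. */
int first_match_index(int grid[3][3], int target) {
    int r, c;
    for (r = 0; r < 3; r++)
        for (c = 0; c < 3; c++)
            if (grid[r][c] == target)
                goto done;    /* leaves both loops in one step */
    return -1;                /* not found */
done:
    return r * 3 + c;
}
```

Java offers a more disciplined alternative in the form of labeled break, which can only jump out of an enclosing loop rather than to an arbitrary label.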

The sensible way is, of course, to return from all the procedure calls and restart. This can be difficult, because each of the called procedures has to be aware of the error and return accordingly. Note that this use of goto can be handled more appropriately by the exception handling provided in Java, C++ and Ada.

Subprograms

We have already discussed the various building blocks of a program — the variables, expressions, assignments and control flow statements. One fundamental building block of a program remains to be studied: the subprograms. You should have written subprograms before: subprograms are procedures or functions in Pascal, or functions in C.

You should therefore already understand the concepts of subprogram calls and parameter passing.

We have shown earlier that a subprogram can access non-local variables. So when you call a subprogram, you can pass information to it in two ways:

– as parameters;
– through non-local variables.

Passing information to a subprogram

The problem with accessing non-local variables is that, if recursion is allowed, there may be several active instances of the same subprogram at any time. All of these instances share the same non-local variables as the information passed to them.

This means that the information may be tampered with by other instances of the subprogram, and may therefore be changed unintentionally.
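The C sketch below makes this concrete with a factorial that receives its "argument" through a global variable. Each recursive instance changes the shared global, so after the recursive call returns, the caller no longer sees its own value. The parameter version gives each instance its own copy. The names `g_n`, `fact_global` and `fact_param` are hypothetical.

```c
/* Information passed through a global: all recursive instances share it. */
int g_n;

int fact_global(void) {
    if (g_n <= 1)
        return 1;
    g_n = g_n - 1;               /* set up the "argument" for the next call */
    int rest = fact_global();    /* deeper instances keep changing g_n */
    return (g_n + 1) * rest;     /* g_n is no longer this instance's value */
}

/* Information passed as a parameter: each instance has its own copy. */
int fact_param(int n) {
    return n <= 1 ? 1 : n * fact_param(n - 1);
}
```

Starting from g_n = 4, `fact_global()` returns 8 instead of 24, while `fact_param(4)` returns 24.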

Design issues for subprograms

In different languages, the design issues of subprogram calls are resolved in different ways. For example, languages may have different parameter-passing methods. C uses only pass-by-value. Pascal also provides var parameters (pass-by-reference), through which a subprogram can pass values back to its caller as well as receive them.

Local referencing environments

In the discussion of storage bindings and lifetime in the previous unit, you were told that variables can be static, stack-dynamic or heap-dynamic. Local variables in a subprogram are implemented as either static or stack-dynamic variables.

If recursion is allowed in a programming language, as it is in Pascal, C and many other modern imperative languages, local variables in a subprogram must by default be stack-dynamic. This ensures that a new storage area is allocated to the local variables for every recursive call to a subprogram.

On the other hand, if recursion is not allowed in the language, the implementer has to weigh the relative advantages and disadvantages of static and stack-dynamic variables. Usually, they would all be static variables.
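C offers both kinds of local variable, which makes the difference easy to observe. In the sketch below, a static local keeps its value between calls, while an ordinary (stack-dynamic) local is created afresh for each activation. The function name is hypothetical.

```c
int call_counter(void) {
    static int count_calls = 0;   /* one copy for the whole run of the program */
    int scratch = 0;              /* a fresh copy for every activation */
    scratch++;
    count_calls += scratch;       /* scratch is always 1 here */
    return count_calls;
}
```

Successive calls return 1, 2, 3, and so on. If a recursive subprogram relied on a static local like `count_calls` for per-call data, all activations would share it, which is why recursion requires stack-dynamic locals by default.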

Parameter-passing methods

The main design issue of subprograms concerns parameter-passing methods. A programming language should have facilities for passing data into a subprogram through the parameter list. Some languages also allow data to be passed out, and some allow data to be passed both ways.

We will look at how these requirements are implemented in today’s programming languages by reading about four implementation models of parameter passing:

– pass-by-value;
– pass-by-result;
– pass-by-value-result;
– pass-by-reference.

Pass-by-value

Pass-by-value is usually implemented by using additional storage for the formal parameters. The storage is usually allocated from the stack. Then the actual parameters are copied to the storage allocated.

This method of parameter passing has the following properties:

– It allows the programmer to pass a value to a function and be sure that the value will not have been changed after the function returns.

– In most implementations, value parameters and local variables are both located on the run-time stack and are referenced in the same way. Therefore, referencing a value parameter is as efficient as referencing a local variable.

– The actual parameters can be variables, constants or expressions.

The disadvantage of this parameter-passing method is:

– The copying process is inefficient if the parameter is large. For example, if an array of 5000 integers were passed by value, copying it would be very time-consuming.
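The C sketch below illustrates the first and third properties listed above. The function works on its own copy, so the caller's variable is unchanged. Both a variable and an expression can be actual parameters. The function names are hypothetical.

```c
/* C passes parameters by value: the function works on its own copy. */
int double_it(int x) {
    x = 2 * x;        /* changes only the local copy */
    return x;
}

/* Returns 1 if the caller's variable is unchanged after the calls. */
int value_parameter_demo(void) {
    int n = 5;
    int r1 = double_it(n);       /* a variable as the actual parameter */
    int r2 = double_it(3 + 4);   /* an expression as the actual parameter */
    return n == 5 && r1 == 10 && r2 == 14;
}
```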

A parameter passed by value is not changed when the subprogram returns. Note, however, that if such a parameter contains pointers, the subprogram can still change the values those pointers point to. For example, consider a tree structure that is passed to a subprogram.

This tree structure is, of course, implemented using pointers. Even if the pointer to its root is passed by value, the subprogram can still modify the whole tree structure, and the changes remain after it returns.
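To demonstrate this, the C sketch below passes the root pointer of a small tree by value. The subprogram changes every node the caller owns, yet the caller's pointer itself is unaffected. The names `struct tnode` and `add_to_all` are hypothetical.

```c
#include <stddef.h>

struct tnode {
    int value;
    struct tnode *left;
    struct tnode *right;
};

/* root is a copy of the caller's pointer, but it points to the caller's nodes. */
void add_to_all(struct tnode *root, int delta) {
    if (root == NULL)
        return;
    root->value += delta;          /* modifies the caller's tree */
    add_to_all(root->left, delta);
    add_to_all(root->right, delta);
    root = NULL;                   /* changes only the local copy of the pointer */
}
```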

Pass-by-result

Pass-by-result is very similar to pass-by-value, except that pass-by-result is used for out-mode parameters: no value is transmitted to the subprogram.

Like pass-by-value, pass-by-result uses extra storage for the formal parameters. It then copies the formal parameters to the actual parameters when the subprogram returns. The actual parameters must therefore be variables.

Problems arise if the same variable appears two or more times in the actual parameter list of a single subprogram call. In that case, different values are assigned to the same variable. Usually the language does not specify the order in which they are copied back, and hence which value the variable ends up with.

This may lead to different results when the same program is compiled and executed on different platforms. This method can also be very inefficient if the parameter is very large.
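C has no pass-by-result, so the sketch below simulates it. It models a call like sub(x, x) in a language with out-mode parameters. The subprogram assigns its two formals, and the caller then copies them back in one of two orders. All names are hypothetical, and the copy-back step is written out by hand.

```c
/* The two out-mode formal parameters of the hypothetical subprogram. */
struct out_pair {
    int first;
    int second;
};

/* The subprogram body: assigns its formals without reading any input. */
struct out_pair produce(void) {
    struct out_pair formals;
    formals.first = 1;
    formals.second = 2;
    return formals;
}

/* Copy back left to right: the second formal is stored last. */
int copy_back_left_to_right(void) {
    int x = 0;
    struct out_pair f = produce();
    x = f.first;     /* actual parameter 1 is x */
    x = f.second;    /* actual parameter 2 is also x */
    return x;        /* 2 */
}

/* Copy back right to left: the first formal is stored last. */
int copy_back_right_to_left(void) {
    int x = 0;
    struct out_pair f = produce();
    x = f.second;
    x = f.first;
    return x;        /* 1 */
}
```

The final value of x depends only on the copy-back order, which is exactly what a language may leave unspecified.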

Pass-by-value-result

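Pass-by-value-result combines pass-by-value and pass-by-result for inout-mode parameters. The value of the actual parameter is copied in when the subprogram is called, and the final value of the formal parameter is copied back when it returns.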

Pass-by-reference

In this method, only an access path, usually just a pointer, is transmitted to the subprogram. No extra storage or data copying is needed for the formal parameters. This method has the following advantages:

– It is more efficient, because no copying is required.

– Pass-by-reference is necessary if the actual size of a parameter is unknown at compile time. For example, C++ is an object-oriented language that supports inheritance. Assume that class A is derived from class B. Then an instance of A can also be treated as being of type B. If we want to write a single function that can manipulate instances of type A or B, the function should accept an instance of B as its parameter.

However, A and B may have different sizes, because the actual size of the parameter is unknown when the program is compiled. For this reason, an A cannot be passed by value as a B without losing the parts that A adds; C++ "slices" the object. If the parameter is passed by reference, the problem does not arise, because a reference has the same size whatever object it refers to.

The disadvantages of the pass-by-reference method are:

– Accessing the actual parameter is less efficient, because its value has to be accessed indirectly through the reference passed.

– The alias problem may occur.

– The actual parameters must be variables.
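To illustrate the alias problem, the C sketch below uses pointers to play the role of reference parameters. The XOR swap works for two distinct variables. When both parameters refer to the same variable, the first XOR clears it and the value is lost. The function name `xor_swap` is hypothetical.

```c
/* Swaps *a and *b without a temporary, unless a and b are aliases. */
void xor_swap(int *a, int *b) {
    *a ^= *b;   /* if a == b, this sets the shared variable to 0 */
    *b ^= *a;
    *a ^= *b;
}
```

Calling `xor_swap(&z, &z)` leaves z equal to 0 whatever its original value. Nothing in the function's text reveals this; it depends entirely on how the caller passes the arguments.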

Example

void times(int a, int b) {
    a = 2*a;
    b = 3*b;
}

int main(void) {
    int n = 1, array[3] = {4,5,6};
    times(n, array[n]);
    times(n, n);
    return 0;
}

Assume that our language uses one of the following methods for parameter passing. What are the values of n and of the array elements after each of the two calls to times? Discuss the results.

1 pass-by-value
2 pass-by-value-result
3 pass-by-reference
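One way to check an answer is to simulate the three methods in C, using pointers where the language would use addresses. This sketch makes two assumptions, and other assumptions give different answers. First, for pass-by-value-result, the addresses of the actual parameters are fixed at the time of the call. Second, copy-back happens from left to right. All function names are hypothetical.

```c
/* The state after the two calls in main: n and the three array elements. */
struct outcome {
    int n;
    int array[3];
};

/* Pass-by-value: a and b are copies, so nothing reaches the caller. */
void times_value(int a, int b) {
    a = 2 * a;
    b = 3 * b;
    (void)a;   /* silence "set but not used" warnings */
    (void)b;
}

/* Pass-by-value-result: copy in, compute on locals, copy out left to right. */
void times_value_result(int *actual_a, int *actual_b) {
    int a = *actual_a;   /* copy in */
    int b = *actual_b;
    a = 2 * a;
    b = 3 * b;
    *actual_a = a;       /* copy out: a first ... */
    *actual_b = b;       /* ... then b, which wins if both name the same variable */
}

/* Pass-by-reference: a and b are aliases for the actual parameters. */
void times_reference(int *a, int *b) {
    *a = 2 * *a;
    *b = 3 * *b;
}

/* method: 1 = pass-by-value, 2 = pass-by-value-result, 3 = pass-by-reference */
struct outcome run_example(int method) {
    struct outcome o = { 1, { 4, 5, 6 } };
    if (method == 1) {
        times_value(o.n, o.array[o.n]);
        times_value(o.n, o.n);
    } else if (method == 2) {
        times_value_result(&o.n, &o.array[o.n]);
        times_value_result(&o.n, &o.n);
    } else {
        times_reference(&o.n, &o.array[o.n]);
        times_reference(&o.n, &o.n);
    }
    return o;
}
```

Under these assumptions:

- `run_example(1)` leaves n = 1 and the array {4, 5, 6}.
- `run_example(2)` gives n = 6 and {4, 15, 6}.
- `run_example(3)` gives n = 12 and {4, 15, 6}.

Copying back from right to left would instead leave n = 4 for pass-by-value-result.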
