mt311 java programming and programming languages li tak sing ( 李德成 )

MT311 Java Programming and Programming Languages

Li Tak Sing (李德成 )

Programming Languages

You need to buy the following book:

Concepts of Programming Languages (7th Edition) by Robert W. Sebesta.

Programming Languages

We are not teaching programming in this module. We are teaching different features of different programming languages and comparing their advantages and disadvantages.

Motivation to learn programming languages

Simulate a useful facility, which is not supported in the language you are forced to use. For example, after studying how virtual methods are implemented in C++, you may try to simulate this facility in C.

Choose a suitable language for a project. After you know the criteria for critically evaluating a programming language, you are much more capable of choosing an appropriate language.


Learn or design a new language more easily. Most languages share common fundamental principles. For example, after studying the object-oriented language paradigm, you should find it easier to learn C++ or Java.


Write codes that are more efficient or fix a bug more easily. By studying the implementation and mechanism of different constructs, you should find that you are able to understand a language more deeply. For example, after studying the mechanism of parameter passing, you would avoid passing a large array as value parameter to a function in Pascal.

Criteria for language evaluation

readability writability reliability and cost.

Readability

Readability of a program measures how easily it can be understood. If a program is difficult to read, then it is difficult to implement and maintain. Since the maintenance stage is the longest stage in the software life cycle, it is important for a language to have high readability. The following reading discusses the factors that affect the readability of a language.

Readability

Five characteristics of programming languages can affect their readability:

1 Overall simplicity of a programming language. There are three factors that determine overall simplicity.

– Number of basic components. For example, there are 31 different binary operators in C:

Readability

Even a competent programmer would find it difficult to remember all of their functions.

– Alternative means for accomplishing the same operation. The method that is used by the author may not be one with which the reader is also familiar. For example, in C, the 3rd element of an array A can be addressed in two ways:

Readability

A[2] or *(A+2) (Note that the 1st element is A[0])If the programmer uses the latter form, then it is much less readable.

Readability

– The meanings of operators are re-definable. Programs are difficult to read if such definitions are not of common sense. For example, if A and B are records and A+B is defined to be the sum of one of the fields of the two records (like salary). Unless you trace back to the definition of the operator, it is highly unlikely that you will understand this statement.

Readability

2 Number of exceptional rules (orthogonality). If a programming language has only a small number of rules and the exceptions to these rules are few, then the language is easier to learn and therefore easier to read. For example, all data in Smalltalk is object and all respond to the message class, which

Readability

returns the class type of the object. Therefore, you should have no problem to understand the meaning when you encounter this statement. On the other hand, in Java, some data are primitive data while others are objects. Therefore you have to remember a different set of rules for primitive data and objects.

Readability

3 Control statements. It is known that program readability is severely reduced by indistinguishing usage of goto statements. If a programming language has sufficient control constructs, the need for goto statements can be nearly eliminated. This increases the readability.

Readability

4 Data types and data structures. The availability of appropriate facilities for defining user-defined data types and data structures in a language can also improve program readability significantly. For example, assume that an inventory record has two fields: item number and price. In Pascal, you would have defined it as:

Readability

record inventory_record

item_number: integer;

price: integer;

end;

If you want 100 such records, you would have to declare the array as:

inventory_records:array[1..100] of

inventory_record;

Readability

Therefore, the item number and price of the 8th record would be written as:

inventory_records[8].item_number

inventory_records[8].price

Readability

Their meanings are self-explanatory. However, since FORTRAN does not allow you to declare a record, the array of the records has to be separated into two arrays:

integer itemno(100)

integer price(100)

The item number and price of the 8th record would be written as: itemno(8) and price(8) respectively.

Readability

Unless you are the author of this program, it is unlikely that you will notice that price(8) is the price of the item with item number itemno(8).

5 Syntax of a programming language. Syntax affects readability in two ways:– Identifier forms. If a language allows longer names, the

program would be easier to understand. For example, inventory_level is much more self-explanatory then invlev.

Readability

– Special words.

Consider the C program:

for (int i=1;i<10;i++) {

if (a==1) {

.....

}

}

and the corresponding Ada program:

Readability

for i in 1..10

if a=1 then

.....

end if;

end loop;

The special words end if, end loop are used to end the if block and for block. So it is easier to read then the C version.

Writability

Writability is a measure of how easily a language can be used to write programs.

program writability is affected by the following factors:1 Simplicity and fewer exceptional rules. If a

language consists of a small number of primitive constructs and there are only a few exceptional rules, then it is easier to write because there is less to remember.

Writability

2 Support for abstraction. Abstraction allows programmers to hide the implementation details of complicated data structures and operations, and thus encourages and simplifies the use of these details. For example, a stack can be implemented using pointer or array. The user of the stack would not be affected if this underlying structure were changed.

Writability

– Expressivity. This means that a language has convenient ways of specifying computation. For example, there is a multiple assignment statement in C like:

a=b=c=1;

If this is to be done in Pascal, it has to be written as three separate assignment statements:

a:=1;

b:=1;

c:=1;

Reliability

A program is reliable if it performs its specified operations under all conditions.

the following features can affect program reliability:

Reliability

– Type checking. If the parameters passed to a function are not those expected, then errors will occur. Therefore, it is important to check that the types are correct. Usually, this is done in most languages by restricting the programmer to declare a function first before it can be used somewhere. Languages like FORTRAN, and the original C language do not do any type checking for parameters of function calls. Therefore, they are very unreliable.

Reliability

– Exception handling. This is a mechanism that intercepts run-time errors, takes corrective procedures and then continues the program execution. Without this, a program that encountered errors might stop or continue to do something that would cause damages to the system.

Reliability

– Alias-ing. This refers to the situation of having two or more different referencing methods for the same memory location. Alias-ing significantly reduces program reliability. In Pascal, you can define a variant record like this:

type Shape=(Square, Rectangle);

Dimensions=record

case WhatShape: Shape of

Square: (Side1: real);

Rectangle: (Length, Width: real)

end;

Reliability

In this example, Dimensions can have two forms: one for a Square and the other for a Rectangle. This means that the memory occupied by this record may be accessed by either the Side1 field or the Length field. However, Pascal will not complain when a Dimensions record of a Square is used as Rectangle. This would, of course, be an error because the Length and Width fields would be rubbish.

Reliability

– Readability and Writability. Usually the easier the program to write the more likely it is to be correct. However, we will later show that sometimes writability can decrease reliability. The easier the program is to read, the easier it is to spot errors.

Cost

the seven factors that influence the cost are:– The training cost. If the language is simpler, with

fewer exceptional rules and the programmer is more experienced, then the cost is less.

– The cost of writing programs. Higher writability reduces the cost of writing programs.

Cost

– The cost of running programs. Interpreted languages like Basic have higher running costs than compiled languages like Fortran. But, of course, Basic programs need not be compiled in the first place and therefore have no cost in compilation. There are always tradeoffs between compilation cost and running cost.

Cost

– The cost of the language implementation system. Some languages like OCCAM require a special hardware platform known as transputer to run. Therefore they are more expensive.

Cost

– The cost of reliability. For critical systems, the cost is very high.

– The cost of maintaining programs in the language. Higher readability would reduce this cost as the program could be easily understood and therefore would make enhancement and bug fixing easier.

Cost

– The cost of compiling programs. This includes the time and resources that are required in compiling a program. For example, some compilers have to read in a program source twice (known as 2 passes compilers). They are therefore more time consuming and require more resources than those that have to read in a program only once.

Influences on language design

There are two more factors that can affect the design of a language:– computer architecture– software development methodology.

Computer architecture

Computer architecture may affect the design of a language. For example a 32-bit or a 64-bit processor may affect the size of an integer.

Software development methodologies

There are mainly three kinds of software development methodologies:– Top-down design and stepwise refinement– Object-oriented design– Process-oriented design

Top-down design and stepwise refinement

The main task is subdivided into smaller tasks. These smaller tasks are further subdivided until they are small enough to be implemented as a module or a subprogram. For example, the program for controlling an automatic teller machine can be subdivided into the following sub-tasks: password validation, balance checking, money delivery, updating account. Then these tasks may be further divided.

Object-oriented design

The objects exist in the real world are modeled by encapsulating their attributes and operations. Then the whole system is modeled by the interactions between these objects. For example, one of the objects in the automatic teller machine system is the customer. The attributes of the customer are customer number, name, etc. An operation of a customer is money withdrawal.

Process-oriented design

Different processes are modeled separately and the whole system is modeled by the interactions between these processes. This is mainly used in real time concurrent systems. For example, in a nuclear power plant system, there should be a process that regulates the water temperature and a process waiting to receive commands from a control panel. When the latter process receives a command to regulate the water temperature, it passes a signal to the former process to do the job.

The stepwise refinement methodology

The stepwise refinement methodology promotes the use of subprograms and modules. For example, standard Pascal does not support modules and therefore is not very useful when this methodology is used.

The object-oriented methodology

The object-oriented methodology promotes the use of object-oriented programming languages. Therefore, when this methodology becomes more popular, object-oriented programming languages would also gain in popularity. At this moment, you have at least learned one object oriented programming language, namely Java.

Process-oriented methodology

The process-oriented programming methodology promotes the use of process-oriented programming languages like CSP or Ada.

Language categories

Imperative languages. The procedures on how to perform the computation are stated explicitly, e.g., C, Pascal;

Object-oriented languages. The behavior of different objects are stated explicitly, e.g., Smalltalk, Java;

Language categories

Functional programming languages. The result, but not the procedures, of applying a function to some parameters are stated explicitly, e.g., LISP;

Logic programming languages. The rules that have to be observed are stated explicitly, e.g., Prolog.

Variables, data types and expressions

Variables have six attributes:– name– address– value– type– lifetime– scope.

Variable Name

The maximum length of a name. Short names reduce readability. For example, item_number would be much self-explaining then itemno.

The connector characters that are allowed to be used in a name. You can see that the use of ‘_’ in names would also increases readability. For example, item_number would be easier to read than itemnumber.

Variable Name

The case sensitivity of names. Languages that are case sensitive have the potential of being misused. For example, if Edge and edge are two variables, then both the programmer and reader of the program may be confused.

Variable name

Reserved words or keywords in the language. If a language has too many reserved words, then a program in this language is more difficult to write. This problem is particular severe with COBOL, which has more than 300 reserved words. It is a headache for a COBOL programmer to avoid using a reserved word as an identifier. On the other hand, PL/1 does not have reserved words. It only has keywords.

Variable name

Therefore, you can write statements like this:

IF IF=ELSE THEN THEN=ELSE; ELSE ELSE=END; END

The disadvantage of no reserved words is that if a very common keyword, like IF, is given a new meaning, then the program would be very difficult to read.

Variable name

There are two advantages of having no reserved words:– The programmer does not have the problem to

use an identifier that has been reserved.– If reserved words are used and the language is

extended, then there may be new reserved words. Then existing codes may no longer be valid.

Variable address, type and value

Address is the location of the memory space for a variable. The time when this address is fixed is called the binding time. It can be fixed at load time (this occurs with variables in FORTRAN) or it can be fixed at run time (as for local variables of a sub-program in Pascal or C). A more detailed discussion of binding time is given in the next section of the unit.


Type is the range of possible values of a variable. Primitive data types are defined when the language is designed. User defined types are defined when a program is written.


Some languages do not check whether a value stored in a variable is within the defined range because doing so would greatly decrease the efficiency of the program. Most languages do not check whether a variable has been initialized before it is used. It is therefore the programmer’s responsibility to write extra codes to do this if he/she really wants such checking.

Lifetime and scope

The remaining two attributes of variables are lifetime and scope. The lifetime of a variable begins when memory space is allocated to the variable and ends when the space becomes unavailable. The scope of a variable is the range of statements in which the variable can be referenced.

Lifetime and scope

For example, in this C function:

double area_of_circle(double radius)

{

double area;

area = 3.1415 * radius * radius;

return area;

}

Lifetime and scope

The lifetime of the variable area begins when the function is invoked (when storage is created for area ) and ends when the function returns its value to the caller (when the storage for area is to be returned to the system). The scope of area begins just after it is declared as a double and ends at the end of the function.

The concept of binding

Binding is the abstract concept of the association between an attribute and an entity (e.g. the association of a type to a variable), or between an operation and a symbol (e.g. the association of the multiplication meaning to the symbol *). We call the time when the binding happens the binding time.

Binding

Consider the following C statements:void myfunction(int a)

{

int x;

…………

x=a+3;

…………

otherfunction(x);

}

Binding

Binding

Bindings can happen at any of these times:– Language design time. For example, the meaning

of the operation ‘+’ is defined when the language is designed.

– Language implementation time. The possible range of int is usually defined when the language is implemented. Some C compilers may define int to have the range –32768 to 32767, while others may define it differently.

Binding

– Link time. For example, the relative address of the function otherfunction is bound when the program is linked. The address is relative because it can be loaded into anywhere in the memory when the program is executed.

– Compile time. For example, the type of the variable x is defined when this program is compiled.

Binding

– Load time. For example, the absolute address of the function otherfunction is bound when the program is loaded.

– Run time. For example, the address of the variable x is bound when myfunction is called during execution. Note that if myfunction is a recursive function, then there can be multiple instances of x at one time.

Static binding or dynamic binding

We can distinguish two types of bindings. A binding is called static if it occurs before run time and the binding is unchanged throughout program execution. On the other hand, a binding is dynamic if it occurs during run time or can change during program execution. When we look at the designs of languages, we can distinguish whether bindings are static or dynamic. Each binding type can be examined according to data type and storage.

Type binding

To appreciate the designs of different programming languages and explore type bindings, let’s ask these questions:– How and when is the data type of a variable in a

program decided?– Does the language allow the data type of a

variable changed during program execution?

Static binding

Many programming languages like C, Pascal, FORTRAN and COBOL have static type bindings:– the data type of a variable is defined using

explicit declaration or implicit declaration, and the compiler takes note of the data type. Type binding occurs at compilation time;

– the data type of a variable does not change throughout the program execution.

Static binding

This method of binding has one advantage that the type of a variable is known at compile time and therefore the compiler can detect errors resulting from using incompatible types. For example, consider the following Pascal fragment:

Static binding

var a:boolean;

i:integer;

...

a:=i+1; The Pascal compiler would complain about this

because an integer value is assigned to a Boolean variable. This is possible because the types of the variables are known during compile time and they cannot be changed in the run time.

Dynamic binding

Some programming languages like APL, SNOBOL4 have dynamic type binding:– the data type of a variable is determined at run

time (by the interpreter) — type binding occurs at run time;

– the data type of a variable can change in the course of program execution.

Dynamic binding

Dynamic type binding has one advantage:– It enables us to write generic subroutines, i.e. the

same subroutine can be used for many different variables. For example, the same sorting subroutine can be used to sort different types of variable. On the other hand, if static type binding is used, then usually, a subroutine is only written for a particular type of variable.

Dynamic binding

Dynamic type binding has two disadvantages:– he type checking mechanism of static type

binding cannot operate;– the running cost is higher.

Storage bindings and the lifetime of a variable

In order to see how the designs of programming languages can differ, let’s tackle these two questions:– How and when does the address of a variable get

assigned?– How is the lifetime of a variable determined in the

language?


The lifetime of a variable is the time during which the variable is bound to specific memory locations. It begins when memory locations are taken from a pool of available memory to be used by the variable — allocation. It ends when the memory locations previously bound to the variable are placed back to the pool of available memory — deallocation.


Four categories of variables can be distinguished according to their lifetime:– Static variables — they are bound to specific memory

locations before program execution begins and remain bound to the same locations until program execution ends. In Java, all static attributes of a class are static variables.

public class AClass {

static int staticVar;

}


Stack-dynamic variables — they are bound to storage at run time, when their declaration statements are elaborated. Elaboration of a variable declaration occurs when execution reaches the code that does the allocation of memory spaces and binding to the variable. The memory for stack-dynamic variables is allocated from the runtime stack.


In Java, local variables of primitive types are stack-dynamic, e.g. the local variable sum below. Storage is allocated when the execution reaches the declaration statement, and it is deallocated when execution reaches the end of the scope of the variable.


1. class AClass {

2. public void aMethod(){

3. double sum=0;

4. for (int i=0; i<10; i++){

5. sum+=i;

6. }

7. }

8. }


In the above example, memory is allocated to sum when execution reaches line 3, and is deallocated when execution reaches line 6. Although the storage bindings for stack-dynamic variables are dynamic (the storage is bound and unbound in the course of program execution), their type bindings are still static (the data type is bound at compile time and is fixed during program execution).


Explicit heap-dynamic variables — their storage is allocated and deallocated by using system functions in a program. The memory is usually allocated from the heap. They cannot be referenced by simple variable names, and are referenced using pointer variables. In Java, object variables are explicit heap-dynamic variables. Their storage is allocated when the ‘new’ method is called for a class. In most other languages, pointers are explicit heap-dynamic variables.


Implicit heap-dynamic variables — the storage, type and value are bound at run-time only when they are assigned values. We have an example of this kind of variable in APL:

PRICE ← 48.5


PRICE is only allocated storage when execution reaches the above assignment statement. The data type is also set to be real at the same time.

Type checking

Some languages do type checking while others do not.

Type checking means that the compiler will report errors when the actual type of a variable does not match the expected type.

Strong typing

A language is strongly typed if all type errors are reported either during runtime or compile time.

Fortran is not a strongly typed language because it does not report type errors.

Pascal is a nearly strongly typed language because it reports most type errors.

Java is a strongly typed language because it reports all type errors.

Type compatibility

Consider the following declaration statements:

type arraytype1=array [1..10] of integer;

arraytype2=array [1..10] of integer;

var

A,B:arraytype1;

C: arraytype2;

Type compatibility

There are two fundamental rules for type compatibility:

Name type compatibility. Two variables are of the same type only if they are declared with the same type name. A, B in the above example are of the same type. However, C is not of the same type with A.

Type compatibility

Structure type compatibility. Two variables are of the same type if they are of the same structure. A and C would be of the same type.

A comparison of the two types of compatibility

Writability Name type compatibility has lower writability

because it is more restrictive. Structure type compatibility has higher

writability because the programmer can now treat more data types to be the same and therefore can manipulate them together.


Implementation cost Name type compatibility has lower

implementation cost because it is easy to check two variables to be of the same type.

Structure type compatibility has higher implementation case because it is more difficult to check whether two variables are of the same type.


Reliability Name type compatibility is more restrictive

means that it is less likely to mistakenly assign a value to a wrong variable. So it is more reliable.

Structure type compatibility is less restrictive and therefore is less reliable.

Type compatibility

Few programming languages use strict name type compatibility or strict structure type compatibility. Most use combinations of the two. For example, Pascal uses a slight variation of name type compatibility called declaration equivalence — a programmer may define a data type to be equivalent to another type, then the two data types are compatible even though they have different names. C uses a variation of structure type compatibility, while C++ and Ada use a variation of name type compatibility.

Scope

The scope of a variable is the range of statements in which the variable can be referenced. The next reading discusses the scope method that is used in most programming languages — static scope.

Static scope

In languages with static scoping, the compiler can determine the scope of each variable by inspecting the program.

Static scope

Using Pascal syntax:procedure sub1;

var x, y : integer;

procedure sub2;

var x : integer

begin

x := 1;

y := 2;

end

begin

...

end

Static scope

Since sub2 is nested in sub1, we called sub1 a static parent of sub2. Any static parent of sub2, and their parents, and so forth becomes a static ancestor of sub1.y is called a non-local variable of sub2 since the declaration of y is not located in sub2.

Static scope

An important task of a compiler of a language with static scope is to find the correct declaration of any variable it encounters. For example, in the above Pascal program fragment, when the compiler sees “y:=2”, it has to search for the declaration statement of y. It will try to find the declaration of y in sub2 (the current procedure), then try sub1 and every static ancestor of sub2 until it finds the declaration. In this case, the declaration is in the static parent sub1.

Dynamic scope

With dynamic scoping, the scope of variables can only be determined at run time. The calling sequence of the subprograms in a program will affect the scope of a variable.

Dynamic scope

Compared to static scoping, dynamic scoping has the following disadvantages:

Dynamic scoping is less reliable because all local variables of the calling subprogram are visible from the called subprogram. This would make information hiding impossible. For example, when subprogram A calls subprogram B, then all the local variables of A are visible to B. There is no way to prevent B from changing the values of those variables.

Dynamic scope

The compiler cannot check type compatibility because it does not know where a non-local variable is declared. This information is only available at run time.

Referencing non-local variables is more expensive; The program is more difficult to read because the

identity of a nonlocal variable is difficult to trace by just reading the source program.

Scope and lifetime

Scope is something about 'where' while lifetime is something about 'when'.

Scope is the position in a program where a variable is visible.

Lifetime of a variable is the time when it begin to exist until it is no longer referable.

Scope and lifetime

For example, consider the following C code:

void meth() {

static int a=2;

int b=3;

....

}

The scope of a is the statements after its declaration. The lifetime of it is from the beginning of the program till it ends.

Scope and lifetime

The scope of b is the statements after its declaration. The lifetime of the variable is when the function is invoked till the function ends.

Referencing environments

The scope of a variable is the range of statements that can reference the variables. The referencing environment is just the same concept, but in this case we look from the point of view of a program statement. The referencing environment of a statement is the collection of all variables that are visible in the statement.

Data types

Primitive data types: Primitive data types not only are useful by themselves; they also become building blocks for defining user-defined data types, e.g. record structures, arrays, in languages that allow them. The following primitive data types are commonly available:

Primitive data type

Numeric types — integer, floating-point and decimal. The size of integer is usually one word. The size of floating-point is usually four bytes.

Boolean types — usually has a size of one byte for efficient access.

Character types — usually has a size of one byte except those for Unicode character set.

Primitive data type

The language C is special in that the differences between these three primitive types are very vague. First of all, it has no Boolean types, and variables of both numeric types and character types can be used where a Boolean expression is required.

Primitive data types

Secondly, variables of character types and integer types are interchangeable. The only constraint regarding this is the size difference between an integer variable and a character variable. This philosophy makes the language very flexible. For example, we can change the value of a character variable from ‘a’ to ‘b’ by adding 1 to it.

Primitive data types

With other languages, you have to call a function to do that. The disadvantage is that the type checking mechanism of the compiler is defected because a mixture of different primitive types in an expression is still considered to be valid. This is another example of the conflict between writability and reliability of a language.

Character string types

The key questions that you should ask as you analyse the design of character string types in a programming language are:– Are character strings a primitive type in the language

or are they constructed as an array of characters?– Are character strings in the language declared with

fixed lengths, or can they have variable lengths?– What operations are allowed on the character string

type?

User-defined ordinal types

The two kinds of user-defined ordinal types are the enumeration type and the subrange type. The main advantage of using these types is the improved readability and reliability of the program. However, the enumeration type provided in C only increases readability because the data of enumeration type is internally converted into integer.

User-defined ordinal types

Therefore, function that accepts a parameter of an enumeration type would also accept any integer. Therefore reliability is not increased by using enumeration type in C.

Array types

The key points in the design of array types in a language can be emphasized by asking these questions:– What types are legal as subscripts? Readability

and reliability increase if enumerated types are accepted as subscripts.

Array types

– Are subscripts ranges checked at run time? Some compilers will include run time range checks into generated code to check if an array reference is out of range. Some compilers, including most C compilers, will not. Such checking increases the reliability and running cost.

Array types

– When are subscript ranges bound? Some arrays can have sizes determined at time, others must be determined at run time.

– When is storage allocated? The storage can be bound statically (at compile time) or dynamically (at run time). For dynamically bound array, the storage could be allocated from the stack or from the heap.

Array types

– How many subscripts are allowed? Most modern languages do not put any limit on the number of subscripts.

– Can arrays be initialized at storage allocation? Allowing this would increase the writability because if a language does not have this facility then initialization has to be done with a number of assignment statements.

Array types

Is there a way of defining an array type with no subscript bounds? Consider the case when we need to write a subprogram to sort an array of integers. In Pascal, we would have the following fragment:type arr_type = array [1..10] of integer;

......

procedure sort(var a:arr_type)

begin

.......

Array types

The problem of this code is that sort is only suitable for sorting arrays that are of type arr_type. This means that it cannot be used to sort an integer array of integers that has other than ten members. We would need another procedure for sorting an array with 11 members and one for 12 members, etc. Ada solves this problem by defining an unconstrained array. The same fragment in Ada would be:

Array types

type arr_type is array (Integer range <>) of

Integer;

......

procedure sort(a:in out arr_type)

begin

.......

Array type

– Now, arr_type is an array and its subscripts range is not specified. Now, if we declare two variables A and B as:A: arr_type(0..9);B: arr_type(3..11);

Then both A and B are of type arr_type and therefore can be sorted by using sort. Within sort, the lower and upper bounds of the array can be accessed using different standard attributes of arrays in Ada:

Array type

A’First is the index of the first element in A.

A’Last is the index of the last element in A.

Since C uses pointers to access array, the problem does not apply. However, there is a problem of getting the size of the array. Therefore, in C, we have to explicitly pass the size of the array to the function. Therefore, the same fragment in C would be:

Array Type

void sort(int *a, int size) {

.. .. .. ..

}

We can see that if there is a way of defining an array type without bounds, then the writability would be increased.

Row-major order

In row-major storage, a multidimensional array in linear memory is accessed such that rows are stored one after the other. It is the approach used by the C programming language as well as many other langauges, with the notable exception of Fortran. When using row-major order, the difference between addresses of array cells in increasing rows is larger than addresses of cells in increasing columns.

Row-major order

For example, consider this 2×3 array:

1 2 3

4 5 6

Declaring this array in C as

int A[2][3];

Row-major order

would find the array laid-out in linear memory as:

1 2 3 4 5 6

Row-major order

The difference in offset from one column to the next is 1 and from one row to the next is 3. The linear offset from the beginning of the array to any given element A[row][column] can then be computed as:

offset = row*NUMCOLS + column

where NUMCOLS represents the number of columns in the array—in this case, 3.

Row-major order

To generalize the above formula, if we have the following C array:

int A[n1][n2][n3][n4][n5]

Then, the offset of the element A[m1][m2][m3][m4][m5] are:

offset = m1*n2*n3*n4*n5+ m2*n3*n4*n5+m3*n4+m3*n4*n5+m4*n5+m5

Column-major order

Column-major order is a similar method of flattening arrays onto linear memory, but the columns are listed in sequence. The programming language Fortran uses column-major ordering.

Column-major order

The array

1 2 3

4 5 6

7 8 9

if stored in memory with column-major order would look like the following:

1 4 7 2 5 8 3 6 9

Column-major order

With columns listed first. The memory offset could then be computed as:

offset = row + column*NUMROWS

Where NUMROWS is the number of rows in the array.

Column-major order

To generalize the above formula, if we have the following C array:

int A[n1][n2][n3][n4][n5]

Then, the offset of the element A[m1][m2][m3][m4][m5] are:

offset = m1+ m2*n1+m3*n2*n1+m4*n3*n2*n1+m4*n3*n2*n1+m5*n4*n3*n2*n1

Example

Consider the following array:

int A[3][7][8];

Assume that A[0][0][0] is at address 20000. What is the address of A[2][3][4]

(i) if row-major order is used?

(ii) if column-major order is used?

Example

(i) an integer has 4 bytes, so the address of A[2][3][4] is:20000+(2*7*8+3*8+4)*4

(ii) if column-major order is used, the address is:20000+(2+3*3+4*3*7)*4

mt311 java programming and programming languages li tak sing ( 李德成 )

Documents