
  • 8/10/2019 Midterm SolHints


    McMaster University

    Department of Computing and Software

    Dr. W. Kahl

    COMP SCI 2GA3, SFWR ENG 3GA3

    Midterm Solution Hints

    2013-10-30

    Computer Architecture

    30 October 2013

    1 CPU Speed Calculations 10%

    Consider two different implementations, M1 and M2, of the same instruction set. There are three classes (A, B and C) of instructions in the instruction set.

    M1 has a clock rate of 1500 MHz, and M2 has a clock rate of 2000 MHz. The average number of cycles per instruction (CPI) for each instruction class on the two machines is as follows:

    Instruction class   CPI on M1   CPI on M2
    A                   2           3
    B                   4           4
    C                   5           6

    (a) If the number of instructions executed in a certain program is divided equally among the classes of instructions, how much faster is M2 than M1?

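    Solution Hints: Find the CPI of each machine first. The CPI for M1 is $\frac{2 + 4 + 5}{3} = \frac{11}{3}$; the CPI for M2 is $\frac{3 + 4 + 6}{3} = \frac{13}{3}$.

    CPU time for M1 is $\text{InstructionCount} \cdot \frac{11/3}{1500\,\text{MHz}}$.

    CPU time for M2 is $\text{InstructionCount} \cdot \frac{13/3}{2000\,\text{MHz}}$.

    M2 has the smaller execution time, and is faster by the inverse ratio of the execution times, or $\frac{11 \cdot 2000}{13 \cdot 1500} \approx 1.1282$.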

    (One could also say that M2 is about 12.82% faster than M1.)
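    A few lines of C reproduce the arithmetic of (a). This is a minimal sketch: the helper names are invented for illustration, clock rates are given in MHz so that CPI divided by clock rate yields microseconds per instruction, and the instruction count cancels in the ratio.

```c
/* Average CPI when instructions are divided equally among classes A, B, C. */
static double average_cpi(double cpi_a, double cpi_b, double cpi_c) {
    return (cpi_a + cpi_b + cpi_c) / 3.0;
}

/* Time per instruction in microseconds: cycles per instruction / clock rate in MHz. */
static double us_per_instruction(double cpi, double clock_mhz) {
    return cpi / clock_mhz;
}

/* Speedup of M2 over M1 = time on M1 / time on M2; the instruction count cancels. */
static double speedup_m2_over_m1(void) {
    double t_m1 = us_per_instruction(average_cpi(2.0, 4.0, 5.0), 1500.0); /* CPI 11/3 */
    double t_m2 = us_per_instruction(average_cpi(3.0, 4.0, 6.0), 2000.0); /* CPI 13/3 */
    return t_m1 / t_m2; /* equals (11 * 2000) / (13 * 1500) */
}
```

    Calling `speedup_m2_over_m1()` returns about 1.1282.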

    (b) Assuming the instruction distribution from (a), at what clock rate would M1 have the same performance as the 2000 MHz version of M2?

    Solution Hints: M1 would be as fast if its clock rate were higher by a factor of 1.1282: $1500\,\text{MHz} \cdot 1.1282 \approx 1692\,\text{MHz}$.
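    Equating the two execution times gives the required clock rate exactly, without rounding the speedup first (here IC is the common instruction count and $f_{M1}$ the unknown clock rate of M1):

```latex
\frac{\text{IC} \cdot \tfrac{11}{3}}{f_{M1}} = \frac{\text{IC} \cdot \tfrac{13}{3}}{2000\,\text{MHz}}
\quad\Longrightarrow\quad
f_{M1} = 2000\,\text{MHz} \cdot \frac{11}{13} = \frac{22000}{13}\,\text{MHz} \approx 1692.3\,\text{MHz}
```

    Rounding $22000/13$ gives the 1692 MHz stated above.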

    2 Amdahl's Law 10%

    You are going to enhance a machine, and there are two possible improvements: you could make memory write instructions run twice as fast as before, or you could speed multiplication instructions up by a factor of three. You repeatedly run a program that takes 100 seconds to execute.

    Of this time, 10% is used for memory writes, 20% for multiplication, and 70% for other tasks.

    (a) What will the speedup be if you improve only memory writes?

    (b) What will the speedup be if you improve only multiplication?

    (c) What will the speedup be if both improvements are made?

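    Solution Hints: Using Amdahl's Law:

    (a) Speedup for memory writes $= \frac{100}{\frac{10}{2} + 20 + 70} = \frac{100}{95} \approx 1.0526$

    (b) Speedup for multiplication $= \frac{100}{10 + \frac{20}{3} + 70} \approx 1.1538$

    (c) Speedup for both $= \frac{100}{\frac{10}{2} + \frac{20}{3} + 70} \approx 1.2245$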
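    The three speedups follow one pattern: divide each improved component's time by its speedup factor, leave the remaining time unchanged, and compare with the original 100 seconds. A minimal C sketch of that calculation (the function names `new_time` and `speedup` are hypothetical):

```c
/* Amdahl's Law for this program: each component's time is divided by its own
 * speedup factor; the "other" time is not affected by either improvement. */
static double new_time(double write_s, double write_speedup,
                       double mult_s, double mult_speedup,
                       double other_s) {
    return write_s / write_speedup + mult_s / mult_speedup + other_s;
}

/* Overall speedup = old execution time / new execution time. */
static double speedup(double old_time, double improved_time) {
    return old_time / improved_time;
}
```

    For (c), `speedup(100.0, new_time(10.0, 2.0, 20.0, 3.0, 70.0))` evaluates to about 1.2245; passing a factor of 1.0 leaves a component unimproved, which covers (a) and (b).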

    3 MIPS Assembly Programming: signum 15%

    The mathematical sign function is defined as follows:

    $$\operatorname{signum}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0 \end{cases}$$
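    For illustration only (this is C, not the MIPS code the question asks for), the definition can be computed without branches on 32-bit two's-complement integers: a logical right shift by 31 isolates the sign bit, as `srl` with a shift amount of 31 would on MIPS. The function name `signum` and the use of `int32_t` are assumptions of this sketch.

```c
#include <stdint.h>

/* signum(x): 1 if x > 0, 0 if x == 0, -1 if x < 0 (branch-free sketch). */
static int32_t signum(int32_t x) {
    /* Logical shift of the sign bit into bit 0: 1 exactly when x < 0. */
    int32_t is_negative = (int32_t)((uint32_t)x >> 31);
    /* Comparison yields 1 exactly when x > 0. */
    int32_t is_positive = (x > 0);
    return is_positive - is_negative;
}
```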


    4 Adding an Addressing Mode 15%

    In this question, we examine quantitatively the pros and cons of adding an addressing mode to MIPS that allows offsets to come from registers; for example

    sw $s1, $s2($s3)

    is now legal, and stores the contents of register $s1 into memory at the address obtained from adding the register contents of $s2 and $s3.

    Since this instruction reads from three registers, instead of from at most two like all conventional MIPS instructions, it needs significantly more wiring and logic circuitry in and around the register file.

    The primary benefit is that fewer instructions will be executed because we won't have to calculate variable-offset addresses via an addu instruction before issuing the lw or sw instruction.

    For simplicity, we assume that the primary disadvantage is that the cycle time will have to increase to account for the additional time to perform register access.

    Assume that the new instruction will cause the cycle time to increase by 10%. Use the instruction frequencies for the P1 benchmark from the table below. Assume that the new addressing mode affects only the clock speed, not the CPI. What percentage of data transfer instructions must be transformed into the new instructions, assuming each such transformation saves one add, to have at least the same performance?

    Instruction class   MIPS examples        HLL correspondence              P1 freq.  P2 freq.
    Arithmetic          add, sub, addi       operations in expressions       50%       45%
    Data transfer       lw, sw, lb, sb       references to data structures,  30%       47%
                                             such as arrays
    Conditional branch  beq, bne, slt, slti  if statements and loops         15%       7%
    Jump                j, jr, jal           procedure calls, returns, and   5%        1%
                                             case/switch statements

    Solution Hints: Let the program have $n$ instructions, let the original clock cycle time be $t$, and let $X$ be the fraction of load and store instructions that are transformed, each eliminating one add at the same time.

    Then we have:

    The original program had $0.3n$ data transfer instructions and $0.5n$ arithmetic instructions.

    The transformed program also has $0.3n$ data transfer instructions, but only $(0.5 - 0.3X)n$ arithmetic instructions, so it executes $(1 - X \cdot 0.3)n$ instructions in total.

    $\text{exec}_{\text{old}} = n \cdot \text{CPI} \cdot t$

    $\text{exec}_{\text{new}} = (1 - X \cdot 0.3) \cdot n \cdot \text{CPI} \cdot 1.1\,t$

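    Therefore:

    $$
    \begin{aligned}
    & \text{exec}_{\text{new}} \le \text{exec}_{\text{old}} \\
    \equiv\ & \langle\, \text{Def. } \text{exec}_{\text{old}}, \text{exec}_{\text{new}} \,\rangle \\
    & (1 - X \cdot 0.3) \cdot n \cdot \text{CPI} \cdot 1.1\,t \le n \cdot \text{CPI} \cdot t \\
    \equiv\ & \langle\, \text{Isotony of multiplication, with } n \cdot \text{CPI} \cdot t > 0 \,\rangle \\
    & (1 - X \cdot 0.3) \cdot 1.1 \le 1 \\
    \equiv\ & \langle\, \text{Isotony of division, with } 1.1 > 0 \,\rangle \\
    & 1 - X \cdot 0.3 \le \frac{1}{1.1} \\
    \equiv\ & \langle\, \text{Isotony of addition} \,\rangle \\
    & 1 - \frac{10}{11} \le X \cdot 0.3 \\
    \equiv\ & \langle\, \text{Isotony of division, with } 0.3 > 0 \,\rangle \\
    & \frac{1/11}{0.3} \le X \\
    \equiv\ & \langle\, \text{Arithmetic: } \tfrac{10}{33} = 0.\overline{30} \,\rangle \\
    & 0.\overline{30} \le X
    \end{aligned}
    $$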

    We need to transform at least 30.4% of the data transfers. (If we transform exactly 30.3%, the new program will still be slower on the new machine than the old program on the old machine.)
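    The break-even point can also be checked numerically. Because $n$, CPI and $t$ cancel, only the ratio $\text{exec}_{\text{new}} / \text{exec}_{\text{old}} = (1 - X \cdot 0.3) \cdot 1.1$ matters. The C sketch below (with hypothetical function names) evaluates it for the two candidate percentages and computes the exact threshold $10/33$.

```c
/* exec_new / exec_old when a fraction x of the data transfers (30% of all
 * instructions for P1) is converted, each conversion removing one add,
 * with unchanged CPI and a 10% longer cycle time. n, CPI and t cancel. */
static double exec_ratio(double x) {
    return (1.0 - 0.3 * x) * 1.1;
}

/* Fraction at which both versions take equal time: (1 - 1/1.1) / 0.3 = 10/33. */
static double break_even_fraction(void) {
    return (1.0 - 1.0 / 1.1) / 0.3;
}
```

    With these definitions, `exec_ratio(0.303)` is slightly above 1 (about 1.00001) and `exec_ratio(0.304)` slightly below 1 (about 0.99968), which matches the conclusion above.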

    5 Floating-Point Representation 15%

    Assume that $s3 contains the base address of array a. Consider the following assembly fragment:

    lui  $t0, 0x64CE
    srl  $t1, $t0, 24
    addu $t2, $t1, $s3
    sw   $t0, 0($t2)

    This can be understood as implementing the following pseudocode, with an int constant i and a float constant f:

    a[i] := f;

    Determine the decimal values of the index i and (possibly using decimal fractions $\frac{d_1}{d_2}$, so no calculator is necessary) of the floating-point number f.

    Document the intermediate states and the bit pattern of the floating-point representation of f.

    Solution Hints:

    lui  $t0, 0x64CE     # t0 = 0x64CE0000
    srl  $t1, $t0, 24    # t0 = 0x64CE0000, t1 = 0x64
    addu $t2, $t1, $s3   # t0 = 0x64CE0000, t1 = 0x64, t2 = &a[25]
    sw   $t0, 0($t2)     # t0 = 0x64CE0000, t1 = 0x64, t2 = &a[25],
                         # a[25] = 0 11001001 10011100000000000000000

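    $i = 25$, since the byte offset $\texttt{0x64} = 100$ spans $100 / 4 = 25$ words. With sign bit 0, exponent field $11001001_2 = 201$ and fraction field $.1001110_2 = \frac{39}{64}$, we get $f = (-1)^0 \cdot (1 + \frac{39}{64}) \cdot 2^{201 - 127} = (1 + \frac{39}{64}) \cdot 2^{74} = 1.609375 \cdot 2^{74} \approx 3.0400 \cdot 10^{22}$.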
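    The decoding steps can be reproduced in C as a cross-check. This sketch assumes a 32-bit IEEE 754 `float` and 4-byte array elements; the struct and function names are invented for illustration, and `memcpy` reinterprets the stored word as a float without violating C's aliasing rules.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Sign, biased exponent and fraction fields of an IEEE 754 single. */
struct float_fields {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30..23, bias 127 */
    uint32_t fraction;  /* bits 22..0 */
};

static struct float_fields split_float_bits(uint32_t bits) {
    struct float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Reinterpret the word stored by sw as a float. */
static float bits_to_float(uint32_t bits) {
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* srl $t1, $t0, 24 gives a byte offset; with 4-byte elements it selects this index. */
static uint32_t word_index(uint32_t t0) {
    return (t0 >> 24) / 4u;
}
```

    With `t0 = 0x64CEu << 16` (what `lui $t0, 0x64CE` produces), `word_index(t0)` is 25, and `split_float_bits(t0)` yields sign 0, exponent 201 (so 201 - 127 = 74) and fraction `0x4E0000`, which is 39/64 of $2^{23}$.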


    6 C to MIPS Assembly 45%

    The following C function definition is to be translated to MIPS assembly code, following the standard MIPS conventions for subroutine memory allocation and argument and result passing:

    int f(int k, int A[]) {
        int i = 0; int s = 0;
        while (i < k) {
            A[i] = 2 * i + 1;
            if (2 * i > k)
                A[i - 1] = s - i;
            else
                s = s + A[i];
            i = i + 1;
        }
        return (s + 7);
    }

    (a) Document which variables will be stored in which registers.

    (b) For the C function definition above, produce equivalent MIPS assembly code. Strive to use a minimal number of instructions and a minimal number of registers.

    (c) How many registers did you use?

    Solution Hints:

    (a) We can store s in the return value register $v0, but a temporary would of course be possible, too.

    k  $a0
    A  $a1
    s  $v0
    i  $t0

    If any variables are stored in $s* registers, these need to be saved to the stack first!

    (b) One possible solution:

    f:      addi  $t0, $zero, 0     # i := 0
            addi  $v0, $zero, 0     # s := 0
    While:  slt   $t1, $t0, $a0     # t1 := (i < k)
            beq   $t1, $zero, Done
            sll   $t1, $t0, 1       # t1 := 2 * i
            slt   $t4, $a0, $t1     # t4 := (2 * i > k), for the later branch
            addiu $t1, $t1, 1       # t1 := 2 * i + 1
            sll   $t2, $t0, 2       # t2 := 4 * i
            addu  $t3, $a1, $t2     # t3 := &A[i]
            sw    $t1, 0($t3)       # A[i] := 2 * i + 1
            beq   $t4, $zero, Else  # if (2 * i > k) ...
            subu  $t2, $v0, $t0     # t2 := s - i
            sw    $t2, -4($t3)      # A[i - 1] := s - i
            j     Incr
    Else:   addu  $v0, $v0, $t1     # s := s + A[i], i.e. s + 2 * i + 1
    Incr:   addiu $t0, $t0, 1       # i := i + 1
            j     While
    Done:   addiu $v0, $v0, 7       # return value s + 7
            jr    $ra
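    To sanity-check the assembly against the C source, one can transliterate the assembly solution back into C, with `goto` and labels standing in for branches and jumps. This is an illustration only: `f_ref` and `f_asm` are hypothetical names, each local in `f_asm` is named after the register it stands for, the byte offset `4 * i` is expressed as pointer arithmetic, and the array parameter is written in C syntax as `int A[]`.

```c
/* Reference version: the C function from the question. */
static int f_ref(int k, int A[]) {
    int i = 0;
    int s = 0;
    while (i < k) {
        A[i] = 2 * i + 1;
        if (2 * i > k)
            A[i - 1] = s - i;
        else
            s = s + A[i];
        i = i + 1;
    }
    return s + 7;
}

/* Rendering of the assembly solution; a0 = k, a1 = A, v0 = s, t0 = i. */
static int f_asm(int a0, int *a1) {
    int t0, t1, t2, t4, v0;
    int *t3;
    t0 = 0;                   /* addi  $t0, $zero, 0 */
    v0 = 0;                   /* addi  $v0, $zero, 0 */
While:
    t1 = (t0 < a0);           /* slt   $t1, $t0, $a0 */
    if (t1 == 0) goto Done;   /* beq   $t1, $zero, Done */
    t1 = t0 << 1;             /* sll   $t1, $t0, 1 */
    t4 = (a0 < t1);           /* slt   $t4, $a0, $t1 */
    t1 = t1 + 1;              /* addiu $t1, $t1, 1 */
    t3 = a1 + t0;             /* sll $t2, $t0, 2 and addu $t3, $a1, $t2 */
    t3[0] = t1;               /* sw    $t1, 0($t3) */
    if (t4 == 0) goto Else;   /* beq   $t4, $zero, Else */
    t2 = v0 - t0;             /* subu  $t2, $v0, $t0 */
    t3[-1] = t2;              /* sw    $t2, -4($t3) */
    goto Incr;                /* j     Incr */
Else:
    v0 = v0 + t1;             /* addu  $v0, $v0, $t1 */
Incr:
    t0 = t0 + 1;              /* addiu $t0, $t0, 1 */
    goto While;               /* j     While */
Done:
    return v0 + 7;            /* addiu $v0, $v0, 7 ; jr $ra */
}
```

    For example, with `k = 4` both versions return 16 and leave the array as {1, 3, 6, 7}.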

    (c) Beyond the argument and result registers $a0, $a1, and $v0, the solution here uses five more: $t0 to $t4.