040304_mmu

Upload: antonio-eleuteri

Post on 07-Jul-2018

  • 8/19/2019 040304_MMU

    1/8

    Programming an MMU

    Though their complexity can be intimidating, memory-management units offer powerful

    debugging capabilities. As explained last month ("Using an MMU," Embedded Systems

    Programming, Feb. 1991, pp. 38-55), a memory-management unit (MMU) can be very

    helpful in tracking down errant C pointers.

    This month, we'll examine details of programming the 68030 MMU and explore the code

    for an example project using a Motorola MVME147 VMEbus CPU board. You also

    might want to review last month's article, grab a copy of Motorola's MC68030 user's

    manual, and download the example code from CompuServe (Library 12 of CLMFORUM)

    or the Embedded Systems Programming BBS at (415) 905-2689.

    Last month's discussion ended with a system design that used a four-level translation table.

    To move on to the implementation, we'll need to consider table descriptors in more detail.

    (The 68030 user's manual includes a reasonable introduction to table descriptors.) In this

    design, we won't be using the short-format (4-byte) descriptors because they lack some

    necessary control bits. The four table descriptors we'll be using are:

    * LFPD, the long-format page descriptor, which is used for page tables.

    * LFTD, the long-format table descriptor, which is used for pointer tables.

    * LFID, the long-format invalid descriptor, which is a descriptor for pointer tables and

     page tables used to map areas of nonexistent memory.

    * LFET, the long-format early-termination page descriptor, which is used in pointer tables

    to describe regions of pages.

    Table 1 shows the fields within the various descriptors. An LFTD consists of several

    fields: a table-address field, a limit field, and a status-bit field. LFPDs and LFETs have a

     page-address field and a status-bit field. An LFID has only one bit, which marks it as an

    invalid descriptor.
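The placement rules in the descriptor list can be summarized as a small lookup. The C sketch below is only an illustration: the enum and function names are hypothetical and are not taken from mmu.inc. It encodes nothing more than which descriptor kinds the text allows in pointer tables and in page tables.

```c
#include <stdbool.h>

/* Hypothetical names; the article's real macros live in mmu.inc. */
enum desc_kind  { LFPD, LFTD, LFID, LFET };
enum table_kind { POINTER_TABLE, PAGE_TABLE };

/* Returns true when the text allows descriptor `d` in a table of kind `t`. */
bool desc_allowed(enum desc_kind d, enum table_kind t)
{
    switch (d) {
    case LFPD: return t == PAGE_TABLE;     /* maps a single page */
    case LFTD: return t == POINTER_TABLE;  /* points to the next table level */
    case LFET: return t == POINTER_TABLE;  /* maps a whole region of pages */
    case LFID: return true;                /* marks nonexistent memory in either */
    }
    return false;
}
```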

    TRANSLATION CONTROL

    The 68030 user's manual and the example code accompanying this article show the exact

    layout of each descriptor field and should give you some insight into using the translation-

    control register (TC). The TC register defines the translation table's structure. The

    register's page-size field is of particular interest in our design. Proper page-size selection is

    important. It should be the largest value that allows fine enough granularity for the

    application requirements.

    Certain aspects of the application affect the page-size selection. If you have small adjacent

    regions that need to be mapped differently, the page size will have to be small enough to

    fit those regions. Large pages may waste memory space. For example, if you have large

     page sizes and your code area becomes large enough to end in the front part of a page, the

    rest of the page cannot be used as read-write data and must be left unused. Large pages

    mean smaller translation tables because fewer logical bits are needed to index down to

    page level. Other criteria, such as the appropriate size to swap to disk and so on, are used

    in virtual-memory systems but don't apply here.

    Our example design arbitrarily uses a page size of 4 kbytes to keep down the page-table

    size. You can choose any page size and modify the example tables as required.
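To put rough numbers on the page-size trade-off, the following C sketch counts the page descriptors needed to map a region and the table space they occupy. It assumes power-of-two page sizes and 8 bytes per long-format descriptor; the function names are made up for illustration.

```c
#include <stdint.h>

/* Number of offset bits for a power-of-two page size (4096 -> 12). */
unsigned page_shift(uint32_t page_size)
{
    unsigned shift = 0;
    while ((page_size >> shift) > 1u)
        shift++;
    return shift;
}

/* Page descriptors needed to map `region_bytes` at page granularity. */
uint32_t pages_needed(uint32_t region_bytes, uint32_t page_size)
{
    return (region_bytes + page_size - 1u) / page_size;   /* round up */
}

/* Table space for those descriptors, at 8 bytes per long-format entry. */
uint32_t page_table_bytes(uint32_t region_bytes, uint32_t page_size)
{
    return pages_needed(region_bytes, page_size) * 8u;
}
```

For a 1-megabyte region, 4-kbyte pages need 256 descriptors (2 kbytes of table), while 256-byte pages would need 4,096 descriptors (32 kbytes).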

    SEARCHING THE TABLES

    Before laying out the translation table, we need to examine how the MMU uses the TC

    register and translation table to create translations. When the application issues a 32-bit

    logical address, the processor sends the address and control information to the MMU.

    If the translation is not in the address translation cache (ATC), the CPU stops processing

    instructions while the MMU

    resolves the translation. Once the translation is resolved, the CPU retries the instruction,

    which now succeeds in making a cache hit.

    In our case, FCL and SRE are disabled, so the MMU uses the CPU root pointer (CRP) to

    find the first-level table of the translation tree. Starting from the logical address's high bit, the

    MMU takes the number of bits specified by TIA and uses that value as an integer index for 

    the first-level table. This action may or may not index a valid descriptor. If it doesn't, the

    search terminates and a bus exception occurs. If the descriptor is a valid table descriptor,

    the MMU takes TIB bits from the logical address, starting at the next highest bit not used

    by TIA, and indexes the second-level table. It does the same for TIC and TID. At the final

    level, it must reach a page descriptor (some exceptions exist, but they are not used here).

    The page-descriptor mapping is transferred to the ATC, and the bits that were not used as

    table indexes specify an offset into the selected page.
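The search just described can be modeled in a few lines of C. This is a simplified sketch, not the 68030's actual descriptor format: the struct, the enum, and the function name are hypothetical, limit checks are left out, and a page descriptor found above the last level stands in for early termination.

```c
#include <stddef.h>
#include <stdint.h>

enum dtype { D_INVALID, D_TABLE, D_PAGE };   /* simplified descriptor types */

struct desc {
    enum dtype type;
    const struct desc *next;   /* D_TABLE: the next-level table */
    uint32_t phys;             /* D_PAGE: physical base of the mapped area */
};

/* Walks the tree from `root`. `width[]` holds the index bits per level
 * (TIA..TID in the article). Returns 0 and stores the physical address on
 * success, or -1 where the real MMU would raise a bus error. */
int walk(const struct desc *root, const unsigned width[], size_t levels,
         uint32_t logical, uint32_t *physical)
{
    const struct desc *table = root;
    unsigned shift = 32;

    for (size_t i = 0; i < levels; i++) {
        shift -= width[i];
        uint32_t index = (logical >> shift) & ((1u << width[i]) - 1u);
        const struct desc *d = &table[index];

        if (d->type == D_INVALID)
            return -1;
        if (d->type == D_PAGE) {
            /* The unused low bits become the offset into the mapped area. */
            *physical = d->phys + (logical & ((1u << shift) - 1u));
            return 0;
        }
        table = d->next;
    }
    return -1;   /* ran out of levels without reaching a page descriptor */
}
```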

    THE TRANSLATION TABLE

    Selecting the index widths in TIA..TID is important for the translation table's structure,

    and depends on what the memory map looks like. The key constraint is that

    IS + TIA + TIB + TIC + TID + PS = 32 bits. We have already selected IS = 0 and PS =

    12, which leaves 20 bits to be accounted for by the TIx fields. In selecting the TIx

    field values, you want the regions they map to align with the physical layout of memory

    and I/O devices.
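The 32-bit constraint is easy to check mechanically before committing to a table layout. A minimal sketch in C, with a hypothetical struct holding the bit counts:

```c
#include <stdbool.h>

struct tc_fields {          /* hypothetical container for the TC bit counts */
    unsigned is, tia, tib, tic, tid, ps;
};

/* True when every logical address bit is accounted for exactly once. */
bool tc_fields_consistent(const struct tc_fields *f)
{
    return f->is + f->tia + f->tib + f->tic + f->tid + f->ps == 32u;
}

/* Bits left for TIA..TID once IS and PS are fixed. */
unsigned index_bits_available(unsigned is, unsigned ps)
{
    return 32u - is - ps;
}
```

With IS = 0 and PS = 12, index_bits_available() returns 20, matching the figure in the text.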

    In the diagram below, an a marks a bit that is used to index the A-level table, a b indexes

    the B-level table, and so on. An o marks a bit that is used as an offset within the page.

    Figure 1 shows the tree structure associated with the mappings.

    10987654321098765432109876543210

    aaaabbbbccccddddddddoooooooooooo

    TIA = $4 selects the upper four bits of the 32-bit address. They index the A-level table,

    effectively dividing the address space into 16 regions of 256 megabytes each and giving a

    first-level translation table of 16 entries. This division is convenient for the MVME147,

    because the first table entry will map the DRAM in low memory, the last table entry will

    map the EPROM and I/O space in high memory, and all intermediate table descriptors can be

    marked as invalid. Now we only have two B-level tables to take care of.

    TIB = $4 selects the next four bits of the 32-bit address. They index the B-level table,

    effectively dividing each 256-megabyte region into 16 regions of 16 megabytes each. This

    method works the same as in the A-level table, further isolating the low DRAM area and

    high EPROM and I/O area. Again, we use invalid descriptors to minimize the number of 

    lower-level trees. This mapping will force two C-level trees. The descriptor that maps the

    DRAM region will have L/U bit = 0 to indicate an upper limit. The LIMIT field will be set

    to 4, to limit the C-level table describing this region to four entries. Now all four

    megabytes on the MVME147 board are mapped.

    TIC = $4 selects the next four bits of the 32-bit address. They index the C-level table, dividing

    each 16-megabyte region into 16 regions of 1 megabyte each. The C-level table mapping the

    DRAM region will have four entries, while the C-level table mapping the EPROM and I/O

    space will have 16 entries. At this point, the table mapping the DRAM region will have

    two early-termination page descriptors (LFET) for the second and third megabyte, which

    align evenly with the application's memory map. Those entries will not require a D-level

    page table. Instead, they will map their entire megabyte with a single descriptor and its

    access controls. This feature dramatically limits the overall page-table size, and can be

    used whenever a memory area aligns evenly (top and bottom) with the region mapped by

    a particular table descriptor. In addition, the table mapping the high memory area will use

    early-termination descriptors to map the EPROM and I/O space. The EPROM space is 4

    megabytes and uses four C-level LFETs. The I/O space is less than a megabyte but has a

    full megabyte region mapping it as supervisor-only, cache-inhibit (CI). The CI is important

    for I/O devices because you don't want the CPU data cache trying to remember what it

    read from an I/O device. You could also use CI for global memory between processors on

    the bus in a multiprocessing system.

    In this case, the I/O space (including VME-bus short I/O) doesn't align evenly with the

    megabyte mapped by LFET. It should have a complete D-level page table with 256 entries

    mapping with 4-kbyte granularity to mark the appropriate valid and invalid (nonexistent)

    pages in that region. However, the MVME147 will generate bus errors for invalid

    accesses in this region because no devices are present. Consequently, a finer granularity is

    not required, and we save a 256-entry page table.

    TID = $8 selects the next eight bits. They index the D-level page table for tree branches that

    need a D level. We now have a 4-kbyte granularity in mapping the region, which is

    required for some parts of the application memory map. Tables on this level have 256

    entries unless limited. In this case they are not. Since a 256-entry table is too big to fit as a

    figure here, refer to the example code for the complete mapping. It maps the regions

    exactly as described in Figure 1.

    The remaining 12 bits are offsets within a page. They are used directly, without

    involvement in the translation process.
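The region sizes quoted for each level follow directly from the index widths. As a quick check, this C sketch (the function name is hypothetical) computes how much logical address space one entry covers at each level:

```c
#include <stdint.h>

/* Bytes of logical address space covered by one entry at each level,
 * given the index widths, most significant level first. */
void entry_coverage(const unsigned width[], unsigned levels, uint64_t out[])
{
    unsigned shift = 32;
    for (unsigned i = 0; i < levels; i++) {
        shift -= width[i];              /* bits still below this level */
        out[i] = (uint64_t)1 << shift;
    }
}
```

For widths of 4, 4, 4, and 8 bits, the results are 256 megabytes, 16 megabytes, 1 megabyte, and 4 kbytes, which are the A-, B-, C-, and D-level granularities described in the text.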

    It's important to lay out a translation-table diagram on paper. Then, you can juggle TIx

    sizes and evaluate early terminations. I use the diagramming method shown in Figure 2 to

    get a clear picture of the tree. This method also translates very well to the code used to

    implement the translation table. In this method, the addresses being mapped by a particular 

     branch are progressively defined at each level until the final page descriptor is reached.

    ADDRESS TRANSLATION

    This design uses a direct logical-to-physical mapping scheme. With this type of static

    mapping, it is possible to go further and design a scheme that does real translations. In the

    static method described here, adding some real translations would enable you to move

    memory blocks and rearrange the address space to something more convenient. I call this

    remapping. Remapping is primarily used in situations where you have several

    discontiguous DRAM regions and want to remap the regions so they appear more logical

    and contiguous.

    In Figure 1, several discontiguous regions were mapped logical = physical, which resulted

    in empty spaces between the regions. We could have used last month's translation-tree

    structure, set up the C-level tree's page descriptors with the physical addresses of the

    existing RAM, and ignored the empty spaces.
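To make the remapping idea concrete, here is a minimal C sketch that makes two discontiguous physical banks appear back to back in the logical map. It uses a flat array of page bases instead of a real translation tree, and the bank addresses in the usage note are invented for illustration.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* 4-kbyte pages, as in the text */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Translates through a flat array of page-aligned physical bases, one per
 * logical page. A real table is a tree; the flat array keeps the idea visible. */
uint32_t translate(const uint32_t page_base[], uint32_t logical)
{
    return page_base[logical >> PAGE_SHIFT] | (logical & PAGE_MASK);
}

/* Fills `page_base` so two physical banks appear contiguous, starting at
 * logical address 0. Both bank addresses must be page aligned. */
void map_two_banks(uint32_t page_base[],
                   uint32_t bank0, uint32_t pages0,
                   uint32_t bank1, uint32_t pages1)
{
    for (uint32_t i = 0; i < pages0; i++)
        page_base[i] = bank0 + (i << PAGE_SHIFT);
    for (uint32_t i = 0; i < pages1; i++)
        page_base[pages0 + i] = bank1 + (i << PAGE_SHIFT);
}
```

With two pages at physical $00000000 and two at $00100000, logical address $00002010 translates to physical $00100010, so the gap between the banks disappears from the logical map.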

    Once you get the hang of the translation-tree mechanism, it is a simple extension of the

    methodology to start moving pages around until the logical memory map suits you. Just

    remember that the tree structure describes the logical memory map and each page

    descriptor at the tree's leaves can be pointed to any physical page by setting its

    address field appropriately. This leap to remapping has some drawbacks. In this case,

    remapping is not used because:

    * The RAM memory is contiguous.

    * Relocating the EPROM and I/O areas to a different location offers few benefits. Moving

    them would kill the debug monitor.

    * Relocating physical memory means the documentation for the CPU board no longer 

    matches the logical memory model, which adds complexity to the design. Once things are

    moved around, an additional layer of documentation must be available to describe the new

    logical-memory space.

    * Remapping means that the mapping scheme is no longer transparent. Programs that run

    without mapping won't work with mapping on, and vice versa.

    The system designer must decide if remapping is worth the additional complexity. In this

    case, it's not, so I didn't use it.

    DEFINING DATA STRUCTURES

    Basically, you have two available methods for setting up your translation table: at run time

    or at compile time. Most heavy-duty operating systems do it at run time, and the table

    changes continually. In this project, a static compile-time definition does the trick. Getting

    a static definition is simply a matter of laying out the appropriately initialized data

    structure. Avoid future problems by using a systematic, comprehensible method.

    This application is written mostly in C. However, it is easier to define the translation tables

    in assembler. It is important that all the bits are in the right place. You also need fine

    control over some addressing issues, such as the fact that tables need to start on 16-byte

     boundaries. That structure is hard to force in C without some artificiality.

    If you prefer C, you need to translate the structures as a combination of arrays and structs,

    then resolve the alignment issues with some sort of padding. (For more thoughts on this

    subject, see P.J. Plauger's "State of the Art" column this month--Ed.)
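For readers who do want the tables in C, one way to handle the 16-byte boundary is sketched below. It assumes a C11 compiler with alignas support, which goes beyond the toolchain used for this project, and it treats each descriptor as two raw 32-bit words. The type and array names are hypothetical, and filling in the descriptor bits is still up to you and the 68030 user's manual.

```c
#include <stdalign.h>
#include <stdint.h>

/* One 8-byte long-format descriptor, viewed as two raw 32-bit words.
 * The bit layout inside the words is deliberately not modeled here. */
typedef struct {
    uint32_t word0;
    uint32_t word1;
} long_desc;

/* C11 alignas requests the 16-byte boundary without manual padding. */
alignas(16) long_desc a_table[16];

/* Catch an unexpected struct size at compile time. */
_Static_assert(sizeof(long_desc) == 8, "long-format descriptors are 8 bytes");
```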

    To make the tables easy to construct, I use macros to define the individual descriptor 

    types and a systematic labeling convention to define tables and their addresses. To set up

    the table, start with the root pointer, which is an 8-byte structure. The root pointer points

    to the A-level table. Once the root pointer is set up, each successive level is constructed. I

    did it all manually here, but you could easily write a C program or two to spit out the text

    for big tables.

    The macro definitions and some predefined status-bit symbols are in mmu.inc. I use a

    labeling convention to address the tables because it is easy to follow the table's structure in

    the assembly-language source file that contains the definitions. The convention uses

    a two-letter prefix that defines the table's level and type. AT is prefixed to the label on the

    A-level table. BT is prefixed to all B-level table labels. After the prefix, I use an eight-digit

    hex address to define the memory region's base address. The label on each table is then

    used as the TADDRESS field in the next level table. PADDRESS values must specify the

    true physical address of the page they map.

    Here is an example of how to use the labeling convention and macros to define a four-

    entry A-level table and the subsequent B-level tables:

    * BASE OF REGION FOR A

    * TABLE IS ALWAYS $00000000

    ALIGN16 TABLE_SECTION

    AT0000000:

     LFTD MMU_LLIM,4,BT00000000 * POINT TO B TABLE

     LFTD MMU_LLIM,4,BT40000000 * POINT TO B TABLE

     LFID * INVALID

     LFID * INVALID

    BT00000000: * EARLY TERMINATION

     LFET MMU_LLIM,8,MMU_SUPV,MMU_CENA,MMU_RDWR,$00000000

    . . .

    BT40000000:

     LFTD MMU_LLIM,8,CT40000000

     LFTD MMU_LLIM,8,CT50000000

    . . .

    You must ensure that each level of the translation-table tree uses the same number of index

    bits as the corresponding TIx field in the TC register. The complete translation-table definition for the

    example mapping specified is in mmu.asm.
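Since the text suggests writing a C program to generate big tables, here is a minimal sketch of the two building blocks such a generator would need: one function that forms a label from the level letter and an eight-digit base address, and one that emits an LFTD line in the argument order shown in the listing. The function names are hypothetical, and the macro arguments are assumed to mean limit type, limit value, and next-table label.

```c
#include <stdint.h>
#include <stdio.h>

/* Builds a label such as "BT40000000": level letter, 'T', 8 hex digits.
 * `buf` must hold at least 11 bytes. */
void table_label(char level, uint32_t base, char buf[11])
{
    snprintf(buf, 11, "%cT%08lX", level, (unsigned long)base);
}

/* Writes one LFTD line: limit type, limit value, label of the next table. */
void emit_lftd(FILE *out, unsigned limit, char next_level, uint32_t base)
{
    char label[11];
    table_label(next_level, base, label);
    fprintf(out, " LFTD MMU_LLIM,%u,%s\n", limit, label);
}
```

Calling emit_lftd(stdout, 4, 'B', 0x40000000) prints a line equivalent to the second entry of the A-level table in the listing.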

    MMU INITIALIZATION CODE

    Once the translation table is set up, the final code sequence required is the MMU

    initialization code. This code gives the MMU registers their appropriate values. The MMU

    initialization is very short and sweet. The CPU root pointer register is loaded with a

     pointer to AT0000000, the two transparent translation registers are disabled, and the

    MMU is enabled with the TC register.
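The value loaded into TC can be assembled from the choices made earlier. The C sketch below assumes a particular field layout (E in bit 31, PS in bits 23-20, IS in bits 19-16, then TIA, TIB, TIC, and TID in successive 4-bit fields), with PS given as the number of page-offset bits. Treat both the layout and the function name as assumptions, and confirm the bit positions against the 68030 user's manual before using the result.

```c
#include <stdint.h>

/* Assumed TC layout: E in bit 31, PS in bits 23-20, IS in 19-16,
 * TIA in 15-12, TIB in 11-8, TIC in 7-4, TID in 3-0. SRE and FCL are
 * left at 0 (disabled), matching the design described in the article. */
uint32_t tc_value(unsigned ps, unsigned is,
                  unsigned tia, unsigned tib, unsigned tic, unsigned tid)
{
    return UINT32_C(0x80000000)              /* E = 1: enable translation */
         | ((uint32_t)(ps  & 0xFu) << 20)
         | ((uint32_t)(is  & 0xFu) << 16)
         | ((uint32_t)(tia & 0xFu) << 12)
         | ((uint32_t)(tib & 0xFu) << 8)
         | ((uint32_t)(tic & 0xFu) << 4)
         |  (uint32_t)(tid & 0xFu);
}
```

Under that assumed layout, the choices in this design (PS = 12, IS = 0, TIA = TIB = TIC = 4, TID = 8) give $80C04448.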

    The MMU registers are loaded with the 68030's unique PMOVE instruction. This

    instruction moves the appropriate data structure from memory to the MMU. For example,

    the CRP is initialized from a data structure in memory that is 64 bits long and similar to a

    table descriptor. This instruction has no immediate mode, so you must set up the values in

    memory, then PMOVE using a permitted addressing mode. In this case, I use address

    register A0 indirectly.

    If you try to load the MMU registers with some sort of invalid value, you will get an

    MMU configuration exception. That error is pretty fatal, and should be caught by the

    debug monitor in the test phase. No handler for that exception is provided here.

    A 68020 assembler that doesn't have explicit 68030 support won't swallow the PMOVE

    opcode, so you'll have to assemble it by hand. I used that method here and it wasn't

     particularly difficult.

    If your board has an on-board monitor, it is quite easy to test the MMU using some of the

    monitor functions. The procedure to test manually is:

    * Assemble and link a module that contains the MMU-initialization code and the

    translation-table definition.

    * Download the module (S-records) to the appropriate memory location.

    * Single-step through the MMU-initialization code. At this point, the MMU is enabled.

    The idea is to try to use the monitor to access memory regions in ways that work and in

    ways that don't work. The monitor is only a piece of software and is subject to the MMU-

    translation mechanism.

    * Test the supervisor-mode read-only areas with the memory-modify and memory-display

    commands. This test should generate a bus error. Attempt to read and write from user 

    mode. Since monitors usually execute in supervisor mode, you might need to assemble an

    instruction or two that access the test areas, then execute them in user mode. For example,

    if you use:

    MOVE.B (A0),D0

    MOVE.B D0,(A0)

    you can set A0 to different addresses and test them by single-stepping in user mode. Both

    attempts in user mode should fail with a bus error.

    * Test the supervisor-mode read-write areas. The easiest way is to use the debugger's

    memory-display and memory-modify commands. Now attempt to read and write from user 

    mode. Both attempts in user mode should fail with a bus error.

    * Attempt to access invalid areas, using the previously discussed procedure. The attempts

    should fail. When you have completed this test, reset the board or execute a PMOVE to

    TC that will disable the MMU.
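The expected outcome of each manual test can be written down as a small decision function, which is handy as a checklist while stepping through the monitor. This C sketch is an illustration only: the struct and its field names are hypothetical summaries of a region's mapping, not the descriptor bits themselves.

```c
#include <stdbool.h>

struct region {            /* hypothetical summary of one mapped region */
    bool valid;            /* false: mapped by an invalid descriptor */
    bool supervisor_only;  /* user-mode access is not allowed */
    bool read_only;        /* writes are not allowed */
};

/* True when the access should end in a bus error under these settings. */
bool expect_bus_error(const struct region *r, bool user_mode, bool is_write)
{
    if (!r->valid)
        return true;                      /* nothing is mapped here */
    if (user_mode && r->supervisor_only)
        return true;                      /* privilege violation */
    if (is_write && r->read_only)
        return true;                      /* write to a protected page */
    return false;
}
```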

    If this manual test works, the MMU will catch these bad accesses when you run the real

    application with MMU support. The example code contains an assembler program that

    tests appropriate accesses for the supervisor and user mode. (See the comments in the

    source-code file test.asm.)

    BUS-ERROR HANDLER 

    Before writing a test application, you must decide how to handle bus errors. If you have a

    monitor, you can let the bus error abort to the monitor. If the final application doesn't have

    a monitor, a bus-error handler is required. For this type of system, the handler is relatively

    simple if a bus error is considered a fatal exception. Motorola's 68030 user's manual

    describes how to construct a comprehensive MMU bus-error handler. If you consider a

    bus error a fatal fault, you can log the occurrence to the screen or other output device and

    take appropriate action. In this example, we used the debug monitor's exception handler,

    which provides enough information to determine what caused the problem.

    MMU SUPPORT

    Putting MMU support into a real application is easy. Just include the table definition and

    initialization code in the load module and execute the MMU initialization whenever you

    want to turn on the support. It should not be turned on prior to setting up whatever bus-

    error handler you intend to use, but anytime later is fine.

    When compiling and linking the application, be sure to place the appropriate code and

    data in the appropriate places. In this example, I have some stubs in an assembler file that

    use the assembler's SECTION placement capabilities to force the different pieces into the

    right place. Then, the command file sets up the proper memory map.

    BY DAVID M. HOWARD

    David M. Howard is a senior software engineer at Sierra Nevada Corp. in Reno, Nev. He

    can be reached on Usenet at [email protected].