Optimizing global variable accesses by using direct addressing mode


The 68HC11 and 68HC12 micro-controllers support a direct addressing mode that gives access to the lower 256 bytes of the address space. This addressing mode is sometimes called page0 addressing mode. It produces shorter instructions in most cases: the address takes one byte instead of the two bytes needed by the extended addressing mode. Global variables are generally accessed with either the page0 or the extended addressing mode. This article explains how you can tune your applications to use the direct addressing mode.

Let's suppose you have some global variable and a function that uses it. For example:

short counter;

void increment()
{
counter++;
}

The compiler will generate instructions that use the extended addressing mode (2-byte addresses). After the final link, if we put the counter variable at address 0x3000, the program will look like this (8 bytes):

68HC11
increment:
f000: fe 30 00 ldx counter
f003: 08 inx
f004: ff 30 00 stx counter
f007: 39 rts

Now if we put the counter variable in page0, we would like the direct addressing mode to be used. The compiler and linker can handle this case; we just have to tell them! To do so, you can specify the section in which the compiler must put a global variable. The section is then mapped in memory by the linker. A special section called .page0 exists to represent the first 256 bytes (page 0). You may define your own sections and map them in memory as you wish (but that is another story!). You specify that a variable goes in a particular section with a GCC __attribute__ on the variable:

short counter __attribute__((section(".page0")));

In the generated assembly file, your variable will have the following definition:

.section .page0,"aw",@progbits
.type counter, @object
.size counter, 2
counter:
.zero 2

This assembly definition defines a global variable counter which occupies 2 bytes and which is located somewhere in the .page0 section. We don't know the variable's address: it will be defined by the linker during the final link. If you recompile your program, you will also notice that nothing else has changed: the compiler still generates the extended addressing mode. This is expected!

So, does the __attribute__ section really help?

If you are using the 68HC11, yes it does! You are in fact lucky that the linker implements something called linker relaxation. At link time, the linker will notice that the counter variable is in page0. It will then change the ldx and stx instructions to use the direct addressing mode, thus reducing your program by one byte for each instruction.

If you are using the 68HC12, it will not help. Sad but true. Linker relaxation is just not implemented there. We need something else!

So, we have solved our problem only partially. Could we do better? Let's see.

The GCC compiler implements a specific attribute that tells it both to put the variable in the .page0 section and that it can use the direct addressing mode to access the variable. You declare your variable as follows:

short counter __attribute__((page0));

This time the generated code will be different: it will use the direct addressing mode. This holds for the 68HC11 as well as the 68HC12 (and of course the 68HCS12 derivatives). The final code for the 68HC11 will not change, but for the 68HC12 it will be optimized into:

68HC12
00008031 <increment>:
8031: dd 00 ldy *counter
8033: 1a 41 leax 1,Y
8035: 5e 00 stx *counter
8037: 3d rts

So, now it works, and we can optimize the addressing modes of our global variables.

Guess what: the compiler can generate other optimized instructions when the direct addressing mode is available. It can optimize the generation of bset and bclr instructions as well. Let's consider the following piece of code:

unsigned char mask __attribute__((page0));

void flag_it()
{
mask |= 0x10;
}

Because this is a global variable that is only one byte wide, the compiler can use the bset instruction to set the flag. Otherwise, the compiler is forced to load the value in a register, perform the bitwise or, and save back the value in memory.

68HC12
00008038 <flag_it>:
8038: 4c 02 10 bset *mask
803b: 3d rts
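
The same idea extends to clearing and testing the flag. The following is only a sketch: PAGE0 and USE_PAGE0 are hypothetical names introduced here so that the file also builds with a host compiler, and the function names are illustrative. Whether bset or bclr is actually emitted depends on the compiler version and optimization level, so check the generated assembly.

```c
/* PAGE0 expands to the page0 attribute only when USE_PAGE0 is defined.
   USE_PAGE0 is a hypothetical build flag for this sketch: define it when
   cross-compiling for 68HC11/68HC12, leave it undefined on a host. */
#ifdef USE_PAGE0
#define PAGE0 __attribute__((page0))
#else
#define PAGE0
#endif

unsigned char mask PAGE0;

/* Set bit 4: a candidate for a single bset instruction. */
void flag_set(void)
{
    mask |= 0x10;
}

/* Clear bit 4: a candidate for a single bclr instruction. */
void flag_clear(void)
{
    mask &= (unsigned char)~0x10;
}

/* Test bit 4: returns 1 when the flag is set, 0 otherwise. */
int flag_is_set(void)
{
    return (mask & 0x10) != 0;
}
```

Keeping the attribute behind a macro means the placement decision lives in one place and can be changed without touching the code that uses the variable.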

On the 68HC12, however, putting global variables in page0 has one caveat: you have to map the IO registers somewhere else. By default they are mapped at address 0, where they would conflict with your global variables.

On the 68HC11, we are lucky because we can arrange for both the IO registers and some global variables to be accessible from page 0. You just have to be careful when you define the memory layout of your board. Basically, you must define the page0 memory bank in such a way that it does not overlap the IO registers. For example:

MEMORY
{
page0 (rwx) : ORIGIN = 0x040, LENGTH = 0x100 - 0x40
}

Now we know how to optimize. Fine. But which global variables should you put in page0? Let's see how to find out.

Before that, let's have a look at ELF files. ELF is the binary format in which the assembler stores the result of the assembled program: the code, data, symbols, debugging information and... relocations! What is a relocation, you ask? A relocation is a kind of marker that tells the linker to adjust (fix up) the code so that it uses the final address of a symbol. If you take an object file produced by the assembler and dump its relocation section, you will see lines that look like:

m6811-elf-objdump -r file.o
...
00001043 R_M68HC11_8 _.frame
...
00001de1 R_M68HC11_16 _io_ports
...
00002b31 R_M68HC11_16 time_usec
...

The above relocation extract tells us that the symbol _.frame requires a 1-byte wide relocation in the code section at offset 0x1043. R_M68HC11_8 is the relocation type; this one corresponds to a page0 relocation. If the _.frame variable is put at address 0x10, the linker will adjust the code at relative position 0x1043 and add 0x10 there, so that the instruction that uses this variable gets the correct address. The extract also shows two relocations for the global variables _io_ports and time_usec. Their type is R_M68HC11_16, which indicates that a 16-bit address is required and must be adjusted.
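
To make the adjustment concrete, here is a simplified model in C of what the linker does for these two relocation types. It is only a sketch: the function names are made up for illustration, the real linker does considerably more, and the 16-bit case assumes the big-endian byte order visible in the listing above (fe 30 00 for address 0x3000).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a 1-byte (page0) relocation: add the final symbol
   address to the byte stored at 'offset'. Returns 0 on success, or -1 when
   the result does not fit in one byte (the symbol is not in page0). */
int apply_reloc8(uint8_t *code, size_t offset, uint16_t sym_addr)
{
    unsigned value = (unsigned)code[offset] + sym_addr;

    if (value > 0xFF)
        return -1;
    code[offset] = (uint8_t)value;
    return 0;
}

/* Simplified model of a 16-bit relocation: add the final symbol address to
   the big-endian 16-bit value stored at 'offset'. */
void apply_reloc16(uint8_t *code, size_t offset, uint16_t sym_addr)
{
    uint16_t value = (uint16_t)((code[offset] << 8) | code[offset + 1]);

    value = (uint16_t)(value + sym_addr);
    code[offset]     = (uint8_t)(value >> 8);
    code[offset + 1] = (uint8_t)(value & 0xFF);
}
```

For instance, applying apply_reloc16 at offset 1 of the bytes fe 00 00 with a symbol address of 0x3000 yields fe 30 00, the ldx instruction shown earlier.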

How can this relocation information help us?

Well, it's easy. The relocations tell us exactly what we want: how often a variable is used. If we count the R_M68HC11_16 relocations for each symbol and sort the symbols by use count, we know which variables are worth optimizing. Let's do it.

The perl program attached to this article parses the output produced by m6811-elf-objdump -r, identifies the symbols, counts them and sorts them.

When you run this script as follows:

m6811-elf-objdump -r file.o | perl page0-guess.pl

it produces an output similar to:

_.frame 1040
_.d1 377
_.d2 279
_.xy 147
_.d3 139
_.z 121
_.d4 114
_.tmp 102
_io_ports 94
time_usec 28
value 25
current 22
__assert 19
total_adjtime 18
panel_putchar 16
timer_create 16
time_sec 15
put_str 15
put_char 12
fast_timers 12
_timer_current_overflow 11

Of course, this list includes some global variables but also some functions (__assert, panel_putchar, ...). The following are the variables that can be put in page0:

_.frame _.d1 _.d2 _.xy _.d3 _.z _.d4 _.tmp _io_ports
time_usec value current time_sec fast_timers _timer_current_overflow
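
If you prefer not to use Perl, the counting step can be sketched in C as well. This is not the attached script, only an illustration of the same idea: the names count_line, sort_counts and print_counts and the table limits are assumptions made for this sketch, and it assumes each relocation line has the three-column form shown earlier (offset, type, symbol), with an optional "+0x..." addend after the symbol.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 512   /* arbitrary limits chosen for this sketch */
#define MAX_NAME 64

struct sym_count {
    char name[MAX_NAME];
    int  count;
};

struct sym_count table[MAX_SYMS];
int table_len = 0;

/* Record one line of "objdump -r" output. When 'type' is NULL every
   R_M68HC11_* relocation is counted, otherwise only that exact type.
   Lines that are not relocation entries are ignored. */
void count_line(const char *line, const char *type)
{
    char offset[32], rtype[32], name[MAX_NAME];
    int i;

    if (sscanf(line, "%31s %31s %63s", offset, rtype, name) != 3)
        return;
    if (type != NULL) {
        if (strcmp(rtype, type) != 0)
            return;
    } else if (strncmp(rtype, "R_M68HC11_", 10) != 0) {
        return;
    }
    name[strcspn(name, "+")] = '\0';   /* drop a "+0x..." addend if present */

    for (i = 0; i < table_len; i++) {
        if (strcmp(table[i].name, name) == 0) {
            table[i].count++;
            return;
        }
    }
    if (table_len < MAX_SYMS) {
        strcpy(table[table_len].name, name);
        table[table_len].count = 1;
        table_len++;
    }
}

/* qsort comparison: most frequently used symbols first. */
static int by_count_desc(const void *a, const void *b)
{
    const struct sym_count *x = a, *y = b;
    return y->count - x->count;
}

void sort_counts(void)
{
    qsort(table, (size_t)table_len, sizeof table[0], by_count_desc);
}

/* Print "symbol count" pairs, one per line. */
void print_counts(void)
{
    int i;
    for (i = 0; i < table_len; i++)
        printf("%s %d\n", table[i].name, table[i].count);
}
```

A small main that reads standard input line by line with fgets, calls count_line(line, NULL) on each line, then sort_counts() and print_counts(), would turn this into a filter that can be placed after objdump -r in a pipe.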

By knowing how your global variables are used, you can optimize their access by switching them to the direct addressing mode if you wish. You can also analyze whether mapping your IO registers in page0 is good for you or not. You may also use the attached script to identify the functions that are used most often and organize them in memory (put them in non-banked memory if you can), and so on.

Well, the ball is in your court now!

Stephane

Attachment: page0-guess.pl.txt (767 bytes)

m6812 does emit bset/bclr

talmy,
I'm not sure what you mean about adding '*' to instructions that can't take that addressing mode. Did you change gcc and recompile or just configure somehow?

I finally tried this with gnu-m68hc11 3.1 using other addresses, and you are right, it worked with a non-page0 address too. Thanks for pointing that out. Now I don't know why it was only in page0 before. Maybe older release.

#define CNSTADDR *(volatile unsigned char *)(0x20cb)

CNSTADDR |= (1<<3);

40b9: 1c 20 cb 08 bset $20cb,#$08

Oh, at first I didn't realize you had switched to branch tests (brset) in the second part. At first I thought what you wrote was not wrong, but now I see it. The reason for testing it as 16 bits was to work around a bug that was found, I think, when a for loop didn't work right. Will you submit a bug report?

The gnu-m68hc11 assembler

The gnu-m68hc11 assembler syntax allows the * symbol to indicate direct addressing mode. The problem is to get gcc to generate those instructions (which can be done by specifying page0) but only for instructions which support that mode. As it is, the C compiler will emit "ldaa *10" for instance, but also "movb #3, *10", which gives an assembler error. But getting the compiler to emit these in the first place requires contortions. If only gcc supported the "__at" operator which is in the other cross compilers I've used!

Currently I'm trying a peephole optimizer I wrote to fix these optimization problems. I don't know about filing a bug report. Is there anyone actively working on the HC12 compiler?

why not the page0 attribute already described?

The blog we are commenting on talks about something much like the "__at" operator. GNU prefers a consistent "attribute" syntax rather than just randomly throwing more "keywords" into the C language as we think of them. Hence, there is an attribute to do practically what you need. The idea is that there is always a memory region configured in the linker called "page0". It always covers the address range [00..0xFF].

Did we mention that compiler optimization needs to be turned on? I use "-Os".
In a nutshell:

# define PAGE0_ATTRIBUTE __attribute__((section(".page0")))
...
unsigned char var PAGE0_ATTRIBUTE;

with that, gcc knows it can use the direct addressing mode on porta. I just tested that, and works as expected. Maybe this is all working better than I thought it was.

When I use

#define PORTA _IO8(IO_BASE + 0x00) /* port A */

then it works equally well.

Anyone can file a bug report, but I can't say whether Stephane Carrez will have time to respond much. I think I've seen bugs fixed and released, but nothing mentioned in the bug reports. Last year he's made an extra effort to manage those better:

Bugs at http://savannah.gnu.org/bugs/?group=m68hc11

page0 optimization on HC12

Let's face it... We likely use the MCU for embedded (hardware) applications, not for the same purpose as our desktop PC. The PC would need page0 RAM to use direct mode addressing because it has software applications. For my HC12, however, I need direct mode to optimize manipulation of hardware registers. Hopefully that makes sense why the HC12 registers are by default in page0, not RAM in page0.

Since the way we allocate hardware registers and use them does not involve __attribute__((page0)), or any section name for that matter, I don't know any way to tell GCC to optimize using direct mode. It seems to me that section ".page0" should not matter. However, if this is still the best way to tell unlinked code that it should use direct mode, then what we need is to define a way to allocate the hardware register space IN page0. It will also need to change the fact that section ".page0" is currently being linked into the RAM above 0x1000, which is not page0.

* Also I should bring up the point again that the HC12 can use bset/bclr regardless of the address region. GCC does not seem to realize that, and that can really "take away the punch", since all RAM is above page0.

Not quite the problem

I was able to get hardware registers to use direct mode addressing (HC12), however I won't bother posting how since the compiler generates some instructions that have no direct addressing mode yet are marked with the asterisk "*" causing the assembler to barf.

GCC does generate bset/bclr instructions for all addresses. Maybe that was a change in recent versions. However, it never generates brset/brclr instructions, so it is extremely inefficient at bit testing. For "if (var&1)", where var is declared volatile unsigned char, one typically gets:
ldab var
anda #0
andb #1
tbeq D,xxxx

which is wrong in so many ways it's laughable.

Things like this and the absence of the __at keyword to force locations really make GCC a poor choice for embedded program development on the HC12.