Better Living With Macros

Writing a kernel from scratch is a big project, and it’s even more challenging when you are doing it in pure assembly language. It’s not that any one particular thing is necessarily hard, so much as that it is verbose and repetitive. Fortunately it’s not necessary to keep repeating the same code over and over. Any decent assembler will support macros, and the one I am using (ca65) has some pretty decent macro support. I’ve used this to create a number of macros that alleviate the most common pain points I have encountered. Let’s take a look at a couple of examples.

Walking a Linked List

Something I am doing quite a bit in my new code is walking linked lists. This is a pretty basic data structures task; in C it’s just one line:

        ptr = ptr->next;

In 65816 assembly it looks like this:

        ldyw #OFFSET_OF_NEXT
        lda [ptr],y
        tax
        iny
        iny
        lda [ptr],y
        sta ptr + 2
        stx ptr

Again, it’s not that this is hard, but it’s something I end up doing a lot. Fortunately, with a little macro work, this can be made fairly trivial:

.macro next_entry loc
        lda     [loc]
        tax
        ldyw    #2
        lda     [loc],y
        sta     loc + 2
        stx     loc
.endmacro

This macro makes one assumption: that the pointer to the next entry is always in the first four bytes of the structure. I could make the macro take an optional offset, but all of my structures are designed this way, so I didn’t feel it was necessary.

With this macro, the eight lines above are now one line:

        next_entry ptr
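
If the next pointer ever lived somewhere other than the start of a structure, the optional offset mentioned above could be handled with .ifblank. The following is only a sketch: next_entry_at is a made-up name, and it assumes, as the original macro does, that the accumulator and index registers are 16 bits wide and that ldyw is a helper macro that loads a 16-bit immediate into Y.

.macro next_entry_at loc, offset
.ifblank offset
        lda     [loc]           ; no offset given: next pointer is at offset 0
        tax                     ; hold the low word in X
        ldyw    #2
.else
        ldyw    #offset         ; caller-supplied offset of the next pointer
        lda     [loc],y
        tax                     ; hold the low word in X
        iny
        iny
.endif
        lda     [loc],y         ; high word of the next pointer
        sta     loc + 2
        stx     loc
.endmacro

With no second argument this expands to the same six lines as next_entry. With one, such as next_entry_at ptr, 4, it costs two extra INY instructions.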

Function Stack Frame Management

Now let’s look at something more in-depth, something that stretches the limits of what’s possible in macros.

On the 65816 it is often desirable to set up a local direct page on the stack, both to make it easy to access stack-based parameters and to support local variables. Setting this all up is fairly boilerplate, so I wrote macros for it. First, let’s look at the macros themselves, which are a bit more involved than the trivial example above:

.macro _BeginDirectPage
        .struct
        .res 1
.endmacro

.macro _StackFrameRTS
        s_dreg    .word
        s_ret     .word
.endmacro

.macro _StackFrameRTL
        s_dreg    .word
        s_ret     .word
        s_bank    .byte
.endmacro

.macro _EndDirectPage
        _pend  .byte
        .endstruct

@lsize := s_dreg - 1
.endmacro

.macro _SetupDirectPage
        phd
        tsc
.if @lsize > 0
        sec
        sbcw    #@lsize
        tcs
.endif
        tcd
.endmacro

.macro _RemoveParams pend
.ifblank pend
          @pend := _pend
.else
          @pend := pend
.endif
.ifdef s_bank
          @psize := @pend - s_bank - 1
.else
          @psize := @pend - s_ret - 2
.endif
.if @psize > 0
.ifdef s_bank
        shortm
        lda     s_bank,s
        sta     s_bank + @psize,s
        longm
.endif
        lda     s_ret,s
        sta     s_ret + @psize,s
        lda     s_dreg,s
        sta     s_dreg + @psize,s
.endif
        tsc
        clc
        adcw    #@psize + @lsize
        tcs
.endmacro

There’s a lot going on here, but it just involves trying to automatically figure out how many bytes of parameters and local variables need to be set up, and how many bytes of stack need to be removed before returning to the caller.

Starting at the top, the _BeginDirectPage, _EndDirectPage, and _StackFrameRTx macros define a struct that generates some function-scoped direct page labels. A struct is a ca65 construct that acts sort of like a structure in C, except it’s not allocating memory. Instead, it just defines scoped labels whose values are their offsets in the structure. By default they start at 0, but since our DP actually starts at SP + 1, _BeginDirectPage starts off by reserving an unnamed byte at 0.

Next, the _SetupDirectPage macro takes care of saving the DP, moving the stack pointer down for local variables (if needed), and setting D to point to the new direct page. At this point all parameters and local variables are accessible.

Finally, _RemoveParams cleans up the stack in preparation for returning to the caller. It takes one optional parameter, which is the name of the first output parameter. This is needed to calculate where input parameters end so that the proper number of bytes can be removed from the stack.

Here’s a snippet of my kmalloc() function, which takes a single WORD (16-bit) length parameter and returns a DWORD (32-bit) pointer. It also has two local variables, one WORD and one DWORD.

.proc kmalloc
        _BeginDirectPage
          l_best_size   .word
          l_ptr         .dword
          _StackFrameRTS
          i_size        .word
          o_block       .dword
        _EndDirectPage

        _SetupDirectPage
        ... do stuff ...
        _RemoveParams o_block
        pld
        rts
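
Before looking at the output, it may help to work through the numbers. The annotations below are worked out by hand from the macro definitions above rather than taken from an assembler listing:

        _BeginDirectPage                ; unnamed byte at $00
          l_best_size   .word           ; $01
          l_ptr         .dword          ; $03
          _StackFrameRTS                ; s_dreg = $07, s_ret = $09
          i_size        .word           ; $0B
          o_block       .dword          ; $0D
        _EndDirectPage                  ; _pend = $11, @lsize = s_dreg - 1 = 6

        ; _RemoveParams o_block
        ;   @pend  = o_block           = $0D
        ;   @psize = @pend - s_ret - 2 = $0D - $09 - 2 = 2
        ;   SP adjustment = @psize + @lsize = 2 + 6 = 8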

The resulting assembly code looks like this:

        phd
        tsc
        sec
        sbcw  #0006
        tcs
        tcd
        ... do stuff ...
        lda   09,s
        sta   0B,s
        lda   07,s
        sta   09,s
        tsc
        clc
        adcw  #0008
        tcs
        pld
        rts

The first part is the result of _SetupDirectPage. It just moves the SP down by six bytes to account for our two local variables, then sets D to SP. If there were no local variables defined the SEC/SBC/TCS lines would be omitted.

At the end is the result of _RemoveParams. It first moves the saved D and return address on the stack up by two bytes, erasing the input parameters. It then moves SP up by eight bytes: 6 for local variables and 2 for the input parameter. After that the code restores DP and returns to the caller.
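
A call consistent with this layout might look like the following sketch. Space for the output parameter is pushed first and the input parameter last, so that only the result is left on the stack after the return. It assumes a 16-bit accumulator, a hypothetical request size of $0100 bytes, and a hypothetical four-byte direct page variable named result.

        pea     0               ; reserve o_block, high word
        pea     0               ; reserve o_block, low word
        pea     $0100           ; i_size (hypothetical request size)
        jsr     kmalloc
        pla                     ; i_size is gone; this is o_block, low word
        sta     result
        pla                     ; o_block, high word
        sta     result + 2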

While these macros are much easier than doing this manually, they are not perfect and still leave room for error:

  • The D register is saved for you, but you must remember to restore it yourself. This was needed for a particular use case I won’t go into here.
  • You have to make sure to use the right _StackFrameRTx macro depending on whether your function returns with RTS or RTL.
  • The ordering of params is vital. Local variables must be listed first, followed by the stack frame, then input parameters, and finally output parameters.
  • If you have output parameters, and fail to pass the name of the first one to _RemoveParams, then it will remove the wrong number of bytes from the stack and the function will crash on RTS/RTL.
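
To make the second point concrete, here is a sketch of a hypothetical function that is called with JSL, takes one WORD parameter, and has no locals or outputs. The name far_func and its parameter are made up, and the offsets in the comments are worked out by hand from the macro definitions:

.proc far_func
        _BeginDirectPage
          _StackFrameRTL                ; s_dreg = $01, s_ret = $03, s_bank = $05
          i_value       .word           ; $06
        _EndDirectPage                  ; _pend = $08, @lsize = 0

        _SetupDirectPage                ; no locals: just PHD / TSC / TCD
        ... do stuff ...
        _RemoveParams                   ; @psize = _pend - s_bank - 1 = 2
        pld
        rtl
.endproc

Because s_bank is defined, _RemoveParams also moves the bank byte up (from 05,s to 07,s here) before the return address and saved D. Using _StackFrameRTS in a function like this would put i_value one byte too low and overwrite the saved bank byte during cleanup.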

These shortcomings are largely due to the limitations of ca65’s macro definitions. I have been considering adding a preprocessor step to my build process, either a generic C preprocessor for better macro support, or a custom one that can generate boilerplate code based on some directives in the source file.

Why not C?

Ideally I would write some of my code in a higher-level language such as C, and reserve the pure assembly for time-critical functions such as interrupt handlers. Unfortunately, the C compiler that’s part of my tool chain, cc65, does not generate native 65816 code at this time, making it unusable. In fact, there are only two C compilers for the 65816 that I’m aware of: the WDC Tools and Calypsi. Both are free for personal use, closed-source tools. Additionally, the WDC Tools are Win32 binaries, so they must be executed via WINE, which makes things messier and slower.

I avoid closed-source tools as much as possible. It’s not so much that I’m opposed to them in principle (I am) as that they tend to become unmaintained. Once that happens it rapidly becomes difficult or impossible to make them run properly on newer Linux distributions. Frequently all that is needed is to recompile the tools against newer libraries, something that’s impossible when the tool is closed source.

In the end, just getting my existing code to assemble under a new tool chain would be a major undertaking, one that I’ve decided is just not worth the effort. If I ever were to make a change, it would probably be to Calypsi, simply because it has native Linux binaries.

(By the way, I have experimented with WDC’s C compiler, and the function setup/exit code it generates is nearly identical to what my macros are generating. I have not tried this yet with Calypsi.)

Conclusions

I am still hopeful that someday somebody will finally create a good, fully open-source C compiler for the 65816. There have been rumblings about adding 65816 support to the llvm-mos project, though to date nothing has been released. If and when this happens it would provide a great ecosystem for 65816 development, and I would be tempted to port my code over at that point.

In the meantime, clever and judicious use of macros has made my work much easier.