Writing a kernel from scratch is a big project, and it’s even more challenging when you are doing it in pure assembly language. It’s not that any one particular thing is necessarily hard, so much as it is verbose and repetitive. Fortunately, it’s not necessary to keep repeating the same code over and over. Any decent assembler will support macros, and the one I am using (ca65) has some pretty decent macro support. I’ve used this to create a number of macros that alleviate the most common pain points I have encountered. Let’s take a look at a couple of examples.
Walking a Linked List
Something I am doing quite a bit in my new code is walking linked lists. This is a pretty basic data structures task; in C it’s just one line:
    ptr = ptr->next;
In 65816 assembly it looks like this:
    ldyw #OFFSET_OF_NEXT
    lda [ptr],y
    tax
    iny
    iny
    lda [ptr],y
    sta ptr + 2
    stx ptr
Again, it’s not that this is hard, but it’s something I end up doing a lot. Fortunately, with a little macro work, this can be made fairly trivial:
    .macro next_entry loc
        lda [loc]
        tax
        ldyw #2
        lda [loc],y
        sta loc + 2
        stx loc
    .endmacro
This macro makes one assumption: that the pointer to the next entry is always in the first four bytes of the structure. I could make the macro take an optional offset, but all of my structures are designed this way, so I didn’t feel it was necessary.
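For comparison, the same layout assumption can be written down in C. This is only an illustrative sketch: `struct node`, its `value` field, and `list_length` are hypothetical names, and the pointers here are native C pointers rather than the four-byte pointers used on the 65816.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type: the link is deliberately the first member,
   so its offset within the structure is 0. */
struct node {
    struct node *next;   /* must stay first for an offset-free walk */
    int          value;  /* illustrative payload, not from the kernel */
};

/* Compile-time check of the assumption the macro relies on. */
_Static_assert(offsetof(struct node, next) == 0,
               "next pointer must be the first member");

/* Count entries by repeatedly doing ptr = ptr->next. */
size_t list_length(const struct node *ptr)
{
    size_t n = 0;
    while (ptr != NULL) {
        n++;
        ptr = ptr->next;
    }
    return n;
}
```

Because the link sits at offset 0, the first load needs no index register, which is what lets the macro start with a plain `lda [loc]`.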
With this macro, the eight lines above are now one line:
    next_entry ptr
Function Stack Frame Management
Now let’s look at something more in-depth, something that stretches the limits of what’s possible in macros.
On the 65816 it is often desirable to set up a local direct page on the stack, both to make it easy to access stack-based parameters and to support local variables. Setting this all up is fairly boilerplate, so I wrote macros for it. First, let’s look at the macros themselves, which are a bit more involved than the trivial example above:
    .macro _BeginDirectPage
        .struct
            .res 1
    .endmacro

    .macro _StackFrameRTS
        s_dreg .word
        s_ret  .word
    .endmacro

    .macro _StackFrameRTL
        s_dreg .word
        s_ret  .word
        s_bank .byte
    .endmacro

    .macro _EndDirectPage
            _pend .byte
        .endstruct
        @lsize := s_dreg - 1
    .endmacro

    .macro _SetupDirectPage
        phd
        tsc
        .if @lsize > 0
            sec
            sbcw #@lsize
            tcs
        .endif
        tcd
    .endmacro

    .macro _RemoveParams pend
        .ifblank pend
            @pend := _pend
        .else
            @pend := pend
        .endif
        .ifdef s_bank
            @psize := @pend - s_bank - 1
        .else
            @psize := @pend - s_ret - 2
        .endif
        .if @psize > 0
            .ifdef s_bank
                shortm
                lda s_bank,s
                sta s_bank + @psize,s
                longm
            .endif
            lda s_ret,s
            sta s_ret + @psize,s
            lda s_dreg,s
            sta s_dreg + @psize,s
        .endif
        tsc
        clc
        adcw #@psize + @lsize
        tcs
    .endmacro
There’s a lot going on here, but it just involves trying to automatically figure out how many bytes of parameters and local variables need to be set up, and how many bytes of stack need to be removed before returning to the caller.
Starting at the top, the _BeginDirectPage, _StackFrameRTx, and _EndDirectPage macros together define a STRUCT that generates some function-scoped direct page labels. A struct is a ca65 construct that acts sort of like a structure in C, except it’s not allocating memory. Instead, it just defines scoped labels whose values are their offsets in the structure. By default they start at 0, but since our DP actually starts at SP + 1, _BeginDirectPage starts off by reserving an unnamed byte at 0.
The _SetupDirectPage macro takes care of saving the DP, moving the stack pointer down for local variables (if needed), and setting D to point to the new direct page. At this point all parameters and local variables are accessible.
_RemoveParams cleans up the stack in preparation for returning to the caller. It takes one optional parameter, which is the name of the first output parameter. This is needed to calculate where input parameters end so that the proper number of bytes can be removed from the stack.
Here’s a snippet of my kmalloc() function, which takes a single WORD (16-bit) length parameter and returns a DWORD (32-bit) pointer. It also has two local variables, one WORD and one DWORD.
    .proc kmalloc
        _BeginDirectPage
          l_best_size .word
          l_ptr       .dword
        _StackFrameRTS
          i_size      .word
          o_block     .dword
        _EndDirectPage

        _SetupDirectPage
        ... do stuff ...
        _RemoveParams o_block
        pld
        rts
The resulting assembly code looks like this:
    phd
    tsc
    sec
    sbcw #0006
    tcs
    tcd
    ... do stuff ...
    lda 09,s
    sta 0B,s
    lda 07,s
    sta 09,s
    tsc
    clc
    adcw #0008
    tcs
    pld
    rts
The first part is the result of _SetupDirectPage. It just moves the SP down by six bytes to account for our two local variables, then sets D to SP. If there were no local variables defined the SEC/SBC/TCS lines would be omitted.
At the end is the result of _RemoveParams. It first moves the saved D and return address on the stack up by two bytes, erasing the input parameters. It then moves SP up by eight bytes: 6 for local variables and 2 for the input parameter. After that the code restores DP and returns to the caller.
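The offsets in that listing can be checked by hand. The sketch below models the assemble-time arithmetic in C. It is not part of the kernel: `frame_layout`, `layout_frame`, and `param_bytes` are hypothetical names, and the formulas are my reading of the macro definitions above.

```c
#include <assert.h>
#include <stdbool.h>

/* Assemble-time quantities the macros derive, modelled as plain integers. */
struct frame_layout {
    int s_dreg;    /* stack-relative offset of the saved D register */
    int s_ret;     /* offset of the return address */
    int first_in;  /* offset of the first input parameter */
    int lsize;     /* bytes of local variables */
};

/* locals_bytes: total size of the locals declared before the stack frame.
   is_rtl: true for _StackFrameRTL (3-byte return address), false for RTS. */
struct frame_layout layout_frame(int locals_bytes, bool is_rtl)
{
    struct frame_layout f;
    f.s_dreg   = 1 + locals_bytes;           /* offset 0 is the reserved byte */
    f.s_ret    = f.s_dreg + 2;               /* saved D occupies one word */
    f.first_in = f.s_ret + (is_rtl ? 3 : 2); /* skip the return address */
    f.lsize    = f.s_dreg - 1;               /* same formula as _EndDirectPage */
    return f;
}

/* Bytes of input parameters to drop, given the offset where they end
   (the first output parameter, or _pend when there is none). */
int param_bytes(const struct frame_layout *f, int pend)
{
    return pend - f->first_in;
}
```

For kmalloc, `layout_frame(6, false)` gives `s_dreg` = 7, `s_ret` = 9, and `lsize` = 6. With o_block at offset 13, `param_bytes` yields 2, and 2 + 6 matches the `adcw #0008` in the generated code.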
While these macros are much easier than doing this manually, they are not perfect and still leave room for errors:
- The D register is saved for you, but you must remember to restore it yourself. This was needed for a particular use case I won’t go into here.
- You have to make sure to use the right _StackFrameRTx macro depending on whether your function returns with RTS or RTL.
- The ordering of params is vital. Local variables must be listed first, followed by the stack frame, then input parameters, and finally output parameters.
- If you have output parameters, and fail to pass the name of the first one to _RemoveParams, then it will remove the wrong number of bytes from the stack and the function will crash on RTS/RTL.
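To put numbers on that last pitfall, here is a small C sketch using the kmalloc offsets. The constants are worked out by hand from the struct in the example, and `psize_rts` is a hypothetical helper that follows the RTS branch of the formula in _RemoveParams.

```c
#include <assert.h>

/* Offsets from the kmalloc example, computed by hand from its struct;
   they mirror what the assembler would assign. */
enum {
    S_RET   = 9,   /* return address */
    I_SIZE  = 11,  /* input parameter (WORD) */
    O_BLOCK = 13,  /* output parameter (DWORD) */
    PEND    = 17   /* _pend, the end-of-struct marker */
};

/* Parameter bytes removed for an RTS frame: pend - s_ret - 2. */
int psize_rts(int pend)
{
    return pend - S_RET - 2;
}
```

Passing o_block gives a parameter size of 2, as intended. Omitting it falls back to _pend and gives 6, so the four bytes of the o_block slot would be removed along with the input parameter.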
These shortcomings are largely due to the limitations of ca65’s macro definitions. I have been considering adding a preprocessor step to my build process, either a generic C preprocessor for better macro support, or a custom one that can generate boilerplate code based on some directives in the source file.
Why not C?
Ideally I would write some of my code in a higher-level language such as C, and reserve the pure assembly for time-critical functions such as interrupt handlers. Unfortunately, the C compiler that’s part of my tool chain, cc65, does not generate native 65816 code at this time, making it unusable. In fact, there are only two C compilers for the 65816 that I’m aware of: the WDC Tools and Calypsi. Both are free for personal use, closed-source tools. Additionally, the WDC Tools are Win32 binaries, so they must be executed via WINE, which makes things messier and slower.
In general I avoid closed source tools as much as possible. It’s not because I’m opposed to them in general (I am), so much as that they tend to become unmaintained. Once that happens it rapidly becomes difficult or impossible to make them run properly on newer Linux distributions. Frequently all that is needed is to recompile the tools against newer libraries, something that’s impossible when the tool is closed source.
In the end, just getting my existing code to assemble under a new tool chain would be a major undertaking, one that I’ve decided is just not worth the effort. If I ever were to make a change, it would probably be to Calypsi, simply because it has native Linux binaries.
(By the way, I have experimented with WDC’s C compiler, and the function setup/exit code it generates is nearly identical to what my macros are generating. I have not tried this yet with Calypsi.)
I am still hopeful that someday somebody will finally create a good, fully open-source C compiler for the 65816. There have been rumblings about adding 65816 support to the llvm-mos project, though to date nothing has been released. If and when this happens it would provide a great ecosystem for 65816 development, and I would be tempted to port my code over at that point.
In the meantime, clever and judicious use of macros has made my work much easier.